Re: Several Topics - Nov. 19, 2013

Discussion in 'Perl Misc' started by glen herrmannsfeldt, Nov 19, 2013.

  1. In comp.lang.fortran E.D.G. <> wrote:
    >>> "E.D.G." <> wrote in message
    >>> news:...

    > Posted by E.D.G. on November 19, 2013


    > 1. PERL PDL CALCULATION SPEED VERSUS PYTHON AND FORTRAN


    (snip)

    > This program translation project has become one of the most
    > surprisingly successful programming projects I have worked on to date. A
    > considerable amount of valuable information has been sent to me by E-mail in
    > addition to all of the information posted to the Newsgroups.


    > The original posts actually discussed calculation speed matters
    > involving Perl and Python. And responses indicated that there were ways to
    > develop routines that could dramatically accelerate Python calculations.
    > But it did not sound like there were any for Perl.


    In general, language processors can be divided into two categories,
    compilers and interpreters. Compilers generate instructions for the
    target processor. Interpreters generate (usually) an intermediate
    representation which is then interpreted by a program to perform the
    desired operations. The latter tends to be much slower, but more
    portable.

    There are a few languages that allow dynamic generation of code, which
    often makes compilation impossible, and those languages tend to be
    called 'interpreted languages'.
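
    For instance (a small illustrative sketch, not from the original
    post), Perl's string eval is exactly this kind of dynamic code
    generation: the code only exists as a string at run time, so no
    ahead-of-time translation could have compiled it.

    # Build and compile a function from a string at run time.
    my $op   = 'multiply';                  # imagine this arrives as user input
    my $body = $op eq 'multiply' ? '$x * $y' : '$x + $y';
    my $f    = eval "sub { my (\$x, \$y) = \@_; $body }";
    die $@ if $@;
    print $f->(6, 7), "\n";                 # prints 42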

    Some years ago, when working with perl programs that ran too slowly, we
    found a perl-to-C translator. Surprisingly, the result ran just as
    slowly! It turns out that the perl-to-C translator generates a C program
    containing the intermediate code and the interpreter, and so runs at
    just the same speed.

    There are also JIT systems, which generate the intermediate code but
    then, at the appropriate time (Just In Time), compile it to machine
    code and execute that. This is common for Java, and more recently
    for languages like Matlab.

    -- glen
    glen herrmannsfeldt, Nov 19, 2013
    #1

  2. glen herrmannsfeldt <> writes:
    > In comp.lang.fortran E.D.G. <> wrote:
    >>>> "E.D.G." <> wrote in message
    >>>> news:...

    >> Posted by E.D.G. on November 19, 2013

    >
    >> 1. PERL PDL CALCULATION SPEED VERSUS PYTHON AND FORTRAN

    >
    > (snip)
    >
    >> This program translation project has become one of the most
    >> surprisingly successful programming projects I have worked on to date. A
    >> considerable amount of valuable information has been sent to me by E-mail in
    >> addition to all of the information posted to the Newsgroups.

    >
    >> The original posts actually discussed calculation speed matters
    >> involving Perl and Python. And responses indicated that there were ways to
    >> develop routines that could dramatically accelerate Python calculations.
    >> But it did not sound like there were any for Perl.

    >
    > In general, language processors can be divided into two categories,
    > compilers and interpreters. Compilers generate instructions for the
    > target processor. Interpreters generate (usually) an intermediate
    > representation which is then interpreted by a program to perform the
    > desired operations. The latter tends to be much slower, but more
    > portable.
    >
    > There are a few languages that allow dynamic generation of code, which
    > often makes compilation impossible, and those languages tend to be
    > called 'interpreted languages'.


    These two paragraphs use the same terms in conflicting ways, and the
    assertions in the second paragraph are wrong: Lisp is presumably the
    oldest language which allows 'dynamic code creation', and implementations
    exist which not only have a compiler but actually don't have an
    interpreter at all; cf.

    http://www.sbcl.org/manual/index.html#Compiler_002donly-Implementation

    The main difference between a compiler and an interpreter is that the
    compiler performs lexical and semantic analysis of 'the source code'
    once and then transforms it into some kind of different 'directly
    executable representation', while an interpreter would analyze some part
    of the source code, execute it, analyze the next part, execute that, and
    so forth, possibly performing the lexical and semantic analysis steps
    many times for the same bit of 'source code'.

    Some compilers produce 'machine code' which can be executed directly by
    'a CPU'; others generate 'machine code' for some kind of virtual machine
    which is itself implemented as a program. The distinction isn't really
    clear-cut, because some CPUs are designed to run 'machine code'
    originally targeted at a virtual machine (eg, what used to be ARM
    Jazelle for executing JVM byte code directly on an ARM CPU), and some
    virtual machines are supposed to execute 'machine code' which used to
    run 'directly on a CPU' in former times (eg, as used for backwards
    compatibility on Bull NovaScale computers).

    Prior to execution, Perl source code is compiled to 'machine code' for a
    (stack-based) virtual machine. Both the compiler and the VM are provided
    by the perl program. There were some attempts to create a standalone
    Perl compiler in the past but these never gained much traction.
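
    As a quick illustration (a sketch, not part of the original post):
    the core B::Concise compiler backend dumps the ops the perl compiler
    builds for a piece of source, which is one way to look at that VM
    'machine code':

    $ perl -MO=Concise -e '$x = 1 + 2'        # ops in tree order
    $ perl -MO=Concise,-exec -e '$x = 1 + 2'  # ops in execution order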
    Rainer Weikusat, Nov 19, 2013
    #2

  3. In comp.lang.fortran Rainer Weikusat <> wrote:
    > glen herrmannsfeldt <> writes:
    >> In comp.lang.fortran E.D.G. <> wrote:
    >>>>> "E.D.G." <> wrote in message
    >>>>> news:...
    >>> Posted by E.D.G. on November 19, 2013


    >>> 1. PERL PDL CALCULATION SPEED VERSUS PYTHON AND FORTRAN


    (snip)

    >>> This program translation project has become one of the most
    >>> surprisingly successful programming projects I have worked on to date. A
    >>> considerable amount of valuable information has been sent to me by E-mail in
    >>> addition to all of the information posted to the Newsgroups.


    (snip, I wrote)

    >> In general, language processors can be divided into two categories,
    >> compilers and interpreters. Compilers generate instructions for the
    >> target processor. Interpreters generate (usually) an intermediate
    >> representation which is then interpreted by a program to perform the
    >> desired operations. The latter tends to be much slower, but more
    >> portable.


    >> There are a few languages that allow dynamic generation of code, which
    >> often makes compilation impossible, and those languages tend to be
    >> called 'interpreted languages'.


    > These two paragraphs use the same terms in conflicting ways, and the
    > assertions in the second paragraph are wrong: Lisp is presumably the
    > oldest language which allows 'dynamic code creation', and implementations
    > exist which not only have a compiler but actually don't have an
    > interpreter at all; cf.


    > http://www.sbcl.org/manual/index.html#Compiler_002donly-Implementation


    > The main difference between a compiler and an interpreter is that the
    > compiler performs lexical and semantic analysis of 'the source code'
    > once and then transforms it into some kind of different 'directly
    > executable representation', while an interpreter would analyze some part
    > of the source code, execute it, analyze the next part, execute that, and
    > so forth, possibly performing the lexical and semantic analysis steps
    > many times for the same bit of 'source code'.


    OK, but many interpreters at least do a syntax check on the whole file,
    and many also convert the statements to a more convenient internal
    representation.

    For an example of something that can't be compiled, consider TeX, which
    allows the category code of characters to be changed dynamically.

    I once wrote self-modifying code for Mathematica, where the running code
    (on what Mathematica calls the back end) asked the front end (which does
    editing of input data) to change the code.

    > Some compilers produce 'machine code' which can be executed directly by
    > 'a CPU'; others generate 'machine code' for some kind of virtual machine
    > which is itself implemented as a program. The distinction isn't really
    > clear-cut, because some CPUs are designed to run 'machine code'
    > originally targeted at a virtual machine (eg, what used to be ARM
    > Jazelle for executing JVM byte code directly on an ARM CPU), and some
    > virtual machines are supposed to execute 'machine code' which used to
    > run 'directly on a CPU' in former times (eg, as used for backwards
    > compatibility on Bull NovaScale computers).


    Yes. There are also systems that do simple processing on each statement,
    with no interstatement memory: converting numerical constants to
    internal form, encoding keywords as a single byte, and such.

    It is interesting to see the program listing look different from the way
    it was entered, such as a constant coming out as 1e6 when you entered
    it as 1000000. The HP2000 BASIC system is the one I still remember.

    The popular microcomputer BASIC systems, mostly from Microsoft, allowed
    things like:

    IF I=1 THEN FOR J=1 TO 10
    PRINT J
    IF I=1 THEN NEXT J

    If you left out the IF on the last line, it would fail when it reached
    the NEXT J statement if the FOR hadn't been executed. Compare to C:

    if (i == 1) for (j = 1; j <= 10; j++) {
        printf("%d\n", j);
    }

    A compiler would match up the FOR and NEXT at compile time. Many
    interpreters do it at run time, depending on the current state.

    I also used to use a BASIC system that allowed you to stop a program
    (or let the program stop itself), change statements (fix bugs), and
    continue on from where it stopped. Not all interpreters can do that,
    but compilers pretty much never do.

    > Prior to execution, Perl source code is compiled to 'machine code' for a
    > (stack-based) virtual machine. Both the compiler and the VM are provided
    > by the perl program. There were some attempts to create a standalone
    > Perl compiler in the past but these never gained much traction.


    And, importantly, the code runs fairly slowly. Some years ago, I was
    working with simple Perl programs that could process data at 1 megabyte
    per minute. Rewriting in C, I got one megabyte per second. It is not too
    unusual to run 10 times slower, but 60 was ridiculous.

    -- glen
    glen herrmannsfeldt, Nov 19, 2013
    #3
  4. On 19/11/13 23:43, glen herrmannsfeldt wrote:
    >
    > And, importantly, the code runs fairly slowly. Some years ago, I was
    > working with simple Perl programs that could process data at 1 megabyte
    > per minute. Rewriting in C, I got one megabyte per second. It is not too
    > unusual to run 10 times slower, but 60 was ridiculous.
    >
    > -- glen
    >


    Can you provide more information on the topic? Perl version, method to
    read/write, etc.

    Thanks
    gamo, Nov 20, 2013
    #4
  5. In comp.lang.fortran gamo <> wrote:

    (snip, I wrote)

    >> And, importantly, the code runs fairly slowly. Some years ago, I was
    >> working with simple Perl programs that could process data at 1 megabyte
    >> per minute. Rewriting in C, I got one megabyte per second. It is not too
    >> unusual to run 10 times slower, but 60 was ridiculous.


    > Can you provide more information on the topic? Perl version, method to
    > read/write, etc.


    Well, it was about 10 years ago and written by someone else.

    The programs read characters from a file, did a little processing on
    them, usually with some kind of look-up table, and then wrote them out.

    One was to generate the reverse complement of DNA sequences: read
    in a string (possibly multiple lines, a few hundred characters long),
    look up each character in a table to find a different character, then
    write the string out backwards. I believe it used the perl equivalent
    to C's getchar() and putchar().

    Input and output in fasta format, and a few thousand to a few
    million sequences, so about 1MB to 1GB total.
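
    Presumably it looked something like this (a guess at the style
    described, not the original code, with fasta header lines ignored
    for brevity):

    # Character-at-a-time reverse complement: one getc() call per input
    # character -- the style that turns out to be slow in Perl.
    my %comp = (A => 'T', C => 'G', G => 'C', T => 'A');
    my @seq;
    while (defined(my $c = getc(STDIN))) {
        next if $c =~ /\s/;                  # skip line breaks
        push @seq, $comp{$c} // $c;
    }
    print reverse(@seq), "\n";               # write the string out backwards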

    -- glen
    glen herrmannsfeldt, Nov 21, 2013
    #5
  6. >
    > One was to generate the reverse complement of DNA sequences: read
    > in a string (possibly multiple lines, a few hundred characters long),
    > look up each character in a table to find a different character, then
    > write the string out backwards. I believe it used the perl equivalent
    > to C's getchar() and putchar().
    >
    > Input and output in fasta format, and a few thousand to a few
    > million sequences, so about 1MB to 1GB total.
    >
    > -- glen
    >



    You have to provide the old code and some sample data if you want
    an answer.
    George Mpouras, Nov 21, 2013
    #6
  7. >>>>> "gh" == glen herrmannsfeldt <> writes:

    gh> The programs read characters from a file, did a little
    gh> processing on them, usually with some kind of look-up table, and
    gh> then wrote them out.

    gh> One was to generate the reverse complement of DNA sequences:
    gh> read in a string (possibly multiple lines, a few hundred
    gh> characters long), look up each character in a table to find a
    gh> different character, then write the string out backwards. I
    gh> believe it used the perl equivalent to C's getchar() and putchar().

    To be honest, this sounds like someone writing idiomatic C in Perl. I'd
    expect competently written Perl to take at best ten times longer and
    probably more like 20 times longer than competently written C for a task
    like that, but it takes poor programming as well as language overhead to
    produce results like that.

    I implemented three subroutines that do what you describe - produce the
    reverse complement of a DNA sequence (alien DNA, that has ABCD codons,
    for simplicity). The first, "naive," expresses the logic clearly, but
    has a lot of intentional inefficiencies of the sort I've seen smart but
    inexperienced programmers introduce. The second, "reasonable," is the
    same code as "naive" but with the obvious inefficiencies removed. The
    third, "idiomatic," uses the most efficient and idiomatic Perl to solve
    the problem. The actual code is below; the results are thus (formatting
    adjusted):

    Benchmark: timing 100000 iterations of idiomatic, naive, reasonable...
     idiomatic:    2 wallclock secs ( 1.90 usr +  0.00 sys =    1.90 CPU)
                   @ 52631.58/s (n=100000)
         naive: 2982 wallclock secs (1310.83 usr + 1.04 sys = 1311.87 CPU)
                   @ 76.23/s (n=100000)
    reasonable: 1105 wallclock secs (1102.75 usr + 1.17 sys = 1103.92 CPU)
                   @ 90.59/s (n=100000)

                    Rate   naive reasonable idiomatic
    naive         76.2/s      --       -16%     -100%
    reasonable    90.6/s     19%         --     -100%
    idiomatic    52632/s  68946%     58001%        --

    So we're looking at the idiomatic Perl version being 580 times faster
    than the reasonable Perl version. But the idiomatic Perl version isn't
    likely to be obvious to anyone who isn't deeply steeped in Perl and Unix
    knowledge.

    Charlton




    #!/usr/bin/perl

    use strict;
    use warnings;
    use Benchmark qw/timethese cmpthese/;

    our %lookup = ( A => 'D', B => 'C', C => 'B', D => 'A' );

    my @codons        = qw/A B C D/;
    my @samples;
    my $string_length = 10000;
    my $sample_count  = 10000;

    for (1..$sample_count) {
        my $str = '';

        for (1..$string_length) {
            $str .= $codons[rand() * 4];
        }

        push @samples, $str;
    }

    my $timing = timethese( 100000, {
        naive      => sub { naive     ($samples[rand() * $sample_count]); },
        reasonable => sub { reasonable($samples[rand() * $sample_count]); },
        idiomatic  => sub { idiomatic ($samples[rand() * $sample_count]); },
    });

    cmpthese($timing);

    sub naive {
        my $in = shift;

        my @codons = split '', $in;
        my @out_codons;

        for (my $i = 0; $i < scalar @codons; $i++) {
            if ($codons[$i] eq 'A') {
                $out_codons[$i] = 'D';
            }

            if ($codons[$i] eq 'B') {
                $out_codons[$i] = 'C';
            }

            if ($codons[$i] eq 'C') {
                $out_codons[$i] = 'B';
            }

            if ($codons[$i] eq 'D') {
                $out_codons[$i] = 'A';
            }
        }

        my @reversed_out_codons = reverse @out_codons;
        my $output = join '', @reversed_out_codons;

        return $output;
    }

    sub reasonable {
        my $in = shift;

        my @codons     = split '', $in;
        my @out_codons = map { $lookup{$_} } @codons;
        my $output     = join '', reverse @out_codons;
        return $output;
    }

    sub idiomatic {
        my $in = shift;
        $in =~ y/ABCD/DCBA/;
        $in = reverse $in;
        return $in;
    }

    --
    Charlton Wilbur
    Charlton Wilbur, Nov 21, 2013
    #7
  8. Charlton Wilbur <> wrote:
    >>>>>> "gh" == glen herrmannsfeldt <> writes:

    >
    > gh> The programs read characters from a file, did a little
    > gh> processing on them, usually with some kind of look-up table, and
    > gh> then wrote them out.
    >
    > gh> One was to generate the reverse complement of DNA sequences:
    > gh> read in a string (possibly multiple lines, a few hundred
    > gh> characters long), look up each character in a table to find a
    > gh> different character, then write the string out backwards. I
    > gh> believe it used the perl equivalent to C's getchar() and putchar().
    >
    >To be honest, this sounds like someone writing idiomatic C in Perl. I'd
    >expect competently written Perl to take at best ten times longer and
    >probably more like 20 times longer than competently written C for a task
    >like that, but it takes poor programming as well as language overhead to
    >produce results like that.


    ACK

    >I implemented three subroutines that do what you describe - produce the
    >reverse complement of a DNA sequence (alien DNA, that has ABCD codons,
    >for simplicity). [...]


    My feeling is that most of this reported slow performance may come from
    reading and writing individual characters: "it used the perl equivalent
    to C's getchar() and putchar()".
    If I'm not mistaken, doing so involves a large overhead, while reading
    and writing larger blocks (several kB) is orders of magnitude faster in
    Perl.
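
    A rough way to see that per-call overhead (my sketch; the test file
    name is hypothetical): time the same file read once with getc() and
    once in 64 kB blocks.

    use Benchmark qw/timethese/;
    timethese(10, {
        per_char => sub {
            open my $fh, '<', '/tmp/data' or die $!;   # hypothetical file
            1 while defined getc($fh);
        },
        blocks => sub {
            open my $fh, '<', '/tmp/data' or die $!;
            my $buf;
            1 while read($fh, $buf, 65536);            # 64 kB per call
        },
    });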

    jue
    Jürgen Exner, Nov 21, 2013
    #8
  9. On 21/11/13 21:19, Jürgen Exner wrote:
    ....
    > If I'm not mistaken, doing so involves a large overhead, while reading
    > and writing larger blocks (several kB) is orders of magnitude faster in
    > Perl.
    >
    > jue



    What do you recommend for reading a large file (e.g. 1 GB)?
    a) sysread() with a length of, say, 8192
    b) use File::Slurp

    Thanks
    gamo, Nov 21, 2013
    #9
  10. On 21/11/13 21:39, gamo wrote:
    > On 21/11/13 21:19, Jürgen Exner wrote:
    > ...
    >> If I'm not mistaken, doing so involves a large overhead, while reading
    >> and writing larger blocks (several kB) is orders of magnitude faster in
    >> Perl.
    >>
    >> jue

    >
    >
    > What do you recommend for reading a large file (e.g. 1 GB)?
    > a) sysread() with a length of, say, 8192
    > b) use File::Slurp
    >
    > Thanks
    >
    >
    >


    I'm afraid there is a better option c): get the file size with -s
    and pass it to sysread(). However, File::Slurp does more things,
    like editing a file in place.
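
    As a sketch (the file name is made up), option c) would be:

    use Fcntl qw/O_RDONLY/;
    my $file = 'big.dat';                        # hypothetical input file
    my $size = -s $file or die "empty or missing: $file";
    sysopen(my $fh, $file, O_RDONLY) or die "cannot open $file: $!";
    my $rc = sysread($fh, my $buf, $size);       # one sysread of the whole file
    die "short read" unless defined $rc and $rc == $size;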

    Best regards
    gamo, Nov 21, 2013
    #10
  11. gamo <> wrote:
    >On 21/11/13 21:19, Jürgen Exner wrote:
    >...
    >> If I'm not mistaken, doing so involves a large overhead, while reading
    >> and writing larger blocks (several kB) is orders of magnitude faster in
    >> Perl.

    >
    >What do you recommend for reading a large file (e.g. 1 GB)?
    >a) sysread() with a length of, say, 8192
    >b) use File::Slurp


    No idea, never had that need.

    But definitely not by calling getc() 1.000.000.000 times :)
    Even the man page warns: "This is not particularly efficient."

    jue
    Jürgen Exner, Nov 21, 2013
    #11
  12. On 22/11/13 00:01, Ben Morrow wrote:
    >
    > For a file which will fit in memory, File::Slurp. For a file which might
    > not, sysread, but with a larger buffer than that; I might use an 8M or
    > 16M buffer (as opposed to your 8k), or larger.
    >


    I tried it with a big file, taking the buffer as large as the file;
    it's OK if you have the memory, and it's faster.

    size = 1042636916 bytes
    read OK
    File::Slurp takes 0.587991 s.
    sysread() takes 0.249269 s.
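
    (That matches a harness along these lines -- my reconstruction, not
    the script actually used:)

    use Time::HiRes qw/gettimeofday tv_interval/;
    use File::Slurp qw/read_file/;

    my $file = shift;                        # the ~1 GB test file
    my $size = -s $file;
    print "size = $size bytes\n";

    my $t0      = [gettimeofday];
    my $bufferA = read_file($file);
    my $slurp_t = tv_interval($t0);

    $t0 = [gettimeofday];
    open my $fh, '<', $file or die $!;
    my $bufferB;
    sysread($fh, $bufferB, $size);
    my $sysread_t = tv_interval($t0);

    print "read OK\n" if $bufferA eq $bufferB;
    printf "File::Slurp takes %f s.\n", $slurp_t;
    printf "sysread() takes %f s.\n", $sysread_t;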

    All the best
    gamo, Nov 21, 2013
    #12
  13. with <> Jürgen Exner wrote:

    *SKIP*
    >>I implemented three subroutines that do what you describe - produce
    >>the reverse complement of a DNA sequence (alien DNA, that has ABCD
    >>codons, for simplicity). [...]

    > My feeling is that most of this reported slow performance may come
    > from reading and writing individual characters: "it used the perl
    > equivalent to C's getchar() and putchar().".


    (that's me speculating here) May I remind everyone that B<reverse> was
    making duplicate lists in the not-so-distant past?

    *CUT*

    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
    Eric Pozharski, Nov 22, 2013
    #13
  14. Ben Morrow <> writes:
    > Quoth Jürgen Exner <>:


    [read a large file into memory]

    >> But definitely not by calling getc() 1.000.000.000 times :)
    >> Even the man page warns: "This is not particularly efficient."

    >
    > It shouldn't be *that* bad, given that Perl buffers IO... there will be
    > some overhead breaking the buffer into characters and returning them one
    > at a time, but if the input is to be processed character-by-character
    > that overhead has to happen somewhere.


    The Perl getc routine isn't the best way to access the contents of a
    file character by character,

    ----------
    use Benchmark;

    timethese(-3, {
        getc => sub {
            my ($c, $fh, $out);
            open($fh, '<', '/tmp/syslog');
            $out .= "$c " while defined($c = getc($fh));
        },

        substr => sub {
            my ($all, $fh, $out);
            my $pos = 0;            # start at the first character
            local $/;

            open($fh, '<', '/tmp/syslog');
            $all = <$fh>;

            $out .= substr($all, $pos++, 1).' ' while $pos < length($all);
        },
    });
    ----------

    and the 'this is not particularly efficient' is intended as a warning to
    C programmers accustomed to stdio (from before the advent of
    'multithreading for student morons', at least): there, getchar/getc are
    usually macros which load 'bytes' from an internal buffer directly into
    a register for processing, which means they're fast. The perl getc is
    not at all comparable to that, because it is still a perl 'operator'
    which needs to be invoked with arguments on a stack and which returns a
    value via the stack (and this value comes in its own SV).
    Rainer Weikusat, Nov 22, 2013
    #14
  15. On 2013-11-21 23:19, gamo <> wrote:
    > On 22/11/13 00:01, Ben Morrow wrote:
    >> For a file which will fit in memory, File::Slurp. For a file which might
    >> not, sysread, but with a larger buffer than that; I might use an 8M or
    >> 16M buffer (as opposed to your 8k), or larger.
    >>

    >
    > I tried it with a big file, taking the buffer as large as the file;
    > it's OK if you have the memory, and it's faster.
    >
    > size = 1042636916 bytes
    > read OK
    > File::Slurp takes 0.587991 s.
    > sysread() takes 0.249269 s.


    Oh, that's fast. What kind of system is this?

    I get about twice those times on a 2.4 GHz Xeon E5530 system.

    But I guess you used the return value from read_file()?

    If I use the buf_ref method (i.e. “read_file($ARGV[0], buf_ref => \$s)”
    instead of “$s = read_file($ARGV[0])”) there is no significant speed
    difference between sysread and read_file.

    hp


    --
    Peter J. Holzer | http://www.hjp.at/ | "The curse of electronic word
    processing: you keep reworking your text until the parts of the
    sentence no longer fit together." -- Ralph Babel
    Peter J. Holzer, Nov 22, 2013
    #15
  16. On 22/11/13 20:16, Peter J. Holzer wrote:
    > On 2013-11-21 23:19, gamo <> wrote:
    >> On 22/11/13 00:01, Ben Morrow wrote:
    >>> For a file which will fit in memory, File::Slurp. For a file which might
    >>> not, sysread, but with a larger buffer than that; I might use an 8M or
    >>> 16M buffer (as opposed to your 8k), or larger.
    >>>

    >>
    >> I tried it with a big file, taking the buffer as large as the file;
    >> it's OK if you have the memory, and it's faster.
    >>
    >> size = 1042636916 bytes
    >> read OK
    >> File::Slurp takes 0.587991 s.
    >> sysread() takes 0.249269 s.

    >
    > Oh, that's fast. What kind of system is this?
    >


    Linux, with an i7 CPU.

    > I get about twice those times on a 2.4 GHz Xeon E5530 system.
    >
    > But I guess you used the return value from read_file()?
    >


    Yes, I used the variables $bufferA and $bufferB and compared them
    (eq) before printing "read OK".

    > If I use the buf_ref method (i.e. “read_file($ARGV[0], buf_ref => \$s)”
    > instead of “$s = read_file($ARGV[0])”) there is no significant speed
    > difference between sysread and read_file.
    >
    > hp


    There must be a difference if you use sysopen(IN, ...) and
    sysread(IN, $bufferB, $size)

    where $size = -s $filename;

    File::Slurp uses a heuristic to determine the $size.

    Anyway, a variable of that size has to be handled with extreme caution
    to avoid overhead and to gain anything over regular line-by-line parsed
    input. The file I used is numeric, and I usually read it line by line
    (no slurp) because something can be done per number. I haven't used
    NYTProf yet to see what really happens.

    Best regards
    gamo, Nov 22, 2013
    #16
  17. On 2013-11-22 21:08, gamo <> wrote:
    > On 22/11/13 20:16, Peter J. Holzer wrote:
    >> On 2013-11-21 23:19, gamo <> wrote:
    >>> I tried it with a big file, taking the buffer as large as the file;
    >>> it's OK if you have the memory, and it's faster.
    >>>
    >>> size = 1042636916 bytes
    >>> read OK
    >>> File::Slurp takes 0.587991 s.
    >>> sysread() takes 0.249269 s.

    [...]
    >> But I guess you used the return value from read_file()?
    >>

    >
    > Yes, I used the variables $bufferA and $bufferB and compared them
    > (eq) before printing "read OK".


    I don't understand that answer.

    I was referring to the difference between

    my $s = read_file($ARGV[0]);

    and

    my $s;
    read_file($ARGV[0], buf_ref => \$s);

    The latter is about twice as fast (and consumes half the memory) because
    it reads the file directly into $s instead of reading it into a
    temporary variable then copying it into $s.


    >> If I use the buf_ref method (i.e. “read_file($ARGV[0], buf_ref => \$s)”
    >> instead of “$s = read_file($ARGV[0])”) there is no significant speed
    >> difference between sysread and read_file.

    >
    > There must be a difference if you use sysopen(IN, ...) and
    > sysread(IN, $bufferB, $size)
    >
    > where $size = -s $filename;
    >
    > File::Slurp uses a heuristic to determine the $size.


    Again, I don't understand what you are trying to say.

    my $s;
    read_file($ARGV[0], buf_ref => \$s);

    is almost exactly the same speed as

    use File::stat;    # provides the stat($fh)->size accessor used below
    open (my $fh, '<', $ARGV[0]) or die "cannot open $ARGV[0]: $!";
    my $size = stat($fh)->size;
    my $s;
    my $rc = sysread($fh, $s, $size);

    Which is hardly surprising, since it does the same thing.

    hp


    --
    Peter J. Holzer | http://www.hjp.at/ | "The curse of electronic word
    processing: you keep reworking your text until the parts of the
    sentence no longer fit together." -- Ralph Babel
    Peter J. Holzer, Nov 22, 2013
    #17
  18. On 23/11/13 00:04, Peter J. Holzer wrote:
    >
    > my $s;
    > read_file($ARGV[0], buf_ref => \$s);
    >
    > The latter is about twice as fast (and consumes half the memory) because
    > it reads the file directly into $s instead of reading it into a
    > temporary variable then copying it into $s.
    >


    You are right: this method is as fast as reading with sysread.

    >
    > Which is hardly surprising, since it does the same thing.
    >
    > hp
    >
    >


    It's a surprise that that method isn't recommended in the File::Slurp
    man page.

    Thanks
    gamo, Nov 23, 2013
    #18
  19. On 2013-11-23 01:11, gamo <> wrote:
    > On 23/11/13 00:04, Peter J. Holzer wrote:
    >> my $s;
    >> read_file($ARGV[0], buf_ref => \$s);
    >>
    >> The latter is about twice as fast (and consumes half the memory) because
    >> it reads the file directly into $s instead of reading it into a
    >> temporary variable then copying it into $s.
    >>

    >
    > You are right: this method is as fast as reading with sysread.
    >
    >>
    >> Which is hardly surprising, since it does the same thing.

    >
    > It's a surprise that that method isn't recommended in the File::Slurp
    > man page.


    The man page mentions that this “is usually the fastest way to read a
    file into a scalar”. I agree that this advantage should be pointed out
    more prominently, and I suggested to Uri to put this variant in the
    synopsis about a year ago (<>).

    hp

    --
    Peter J. Holzer | http://www.hjp.at/ | "The curse of electronic word
    processing: you keep reworking your text until the parts of the
    sentence no longer fit together." -- Ralph Babel
    Peter J. Holzer, Nov 23, 2013
    #19
  20. >>>>> "EP" == Eric Pozharski <> writes:

    EP> (that's me speculating here) May I remind everyone that
    EP> B<reverse> was making duplicate lists in the not-so-distant
    EP> past?

    Indeed; one of the things that makes my "naive" implementation so slow
    is that at every stage of the process it creates a new variable to hold
    the intermediate result. This is a pattern I've seen a great deal in
    inexperienced programmers; on small data sets it's almost always useful
    enough in exploratory programming to justify the overhead.

    Computer resources weren't so limited when I learned to program that I
    needed to learn to reverse a short string in place, but I'm glad I was
    required to; the mindset generalizes nicely.
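
    In Perl, that in-place mindset might look like this (a small sketch
    of mine, using the real DNA alphabet rather than the ABCD one above):

    sub revcomp_inplace {
        $_[0] =~ y/ACGT/TGCA/;    # $_[0] aliases the caller's variable
        $_[0] = reverse $_[0];    # result lands directly in the caller
    }

    my $dna = 'ACCGT';
    revcomp_inplace($dna);
    print "$dna\n";               # prints ACGGT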

    Charlton


    --
    Charlton Wilbur
    Charlton Wilbur, Nov 23, 2013
    #20
