Speed of reading some MB of data using qx(...)

Discussion in 'Perl Misc' started by Wolfram Humann, Jul 22, 2010.

  1. I have a program that processes PDF files by converting them to
    Postscript, reading the ps and doing something with it. I use pdftops
    (from xpdf) for the pdf->ps conversion and retrieve the result like
    this:

    $ps_text = qx( pdftops $infile - );

    On win32 using strawberry perl (tried 5.10 and 5.12) this takes much
    more time than I expected so I did a test and first converted the PDF
    to Postscript, then read the Postscript (about 12 MB) like this (cat
    on win32 provided by cygwin):

    perl -E" $t = qx(cat psfile.ps); say length $t "

    This takes about 16 seconds on win32 but less than 1 second on Linux.
    I was afraid that this might be a 'binmode' problem, so I also tried
    this:

    perl -E" open $in,'cat psfile.ps |'; binmode $in; local $/; $t=<$in>;
    say length $t "

    But the effect is the same: fast on Linux, slow on win32. Besides
    bashing win32 :) any ideas for the reason and (possibly) a cure?
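    To separate the cost of the win32 pipe from the cost of the read
    itself, it may help to time a plain file read with no child process
    involved. A minimal self-contained sketch (using a generated 1 MB
    temporary file as a stand-in for the real psfile.ps):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for psfile.ps: a 1 MB temporary file.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} '#' x 1_000_000;
close $fh or die "close: $!";

# Plain file read: no child process, no pipe.
my $data = do {
    open my $in, '<', $path or die "open $path: $!";
    binmode $in;
    local $/;          # slurp mode
    <$in>;
};
print length($data), "\n";
```

    If the plain read is fast and only the qx()/piped-open variants are
    slow, the pipe (or the process spawn) is the suspect; if the plain
    read is slow too, the problem is in perl's own string handling.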

    Wolfram
     
    Wolfram Humann, Jul 22, 2010
    #1

  2. Wolfram Humann

    Uri Guttman Guest

    >>>>> "WH" == Wolfram Humann <> writes:

    WH> perl -E" $t = qx(cat psfile.ps); say length $t "

    WH> This takes about 16 seconds on win32 but only <1 seconds on Linux. I
    WH> was afraid that this might be a 'binmode' problem so I also tried
    WH> this:

    WH> perl -E" open $in,'cat psfile.ps |'; binmode $in; local $/; $t=<$in>;
    WH> say length $t "

    WH> But the effect is the same: fast on linux, slow on win32. Besides
    WH> bashing win32 :) and ideas for reason and (possibly) cure?

    you also bashed File::Slurp in a bug report you sent me. obviously it is
    a winblows issue. one possibility is that winblows does poor process
    context switching and piping between processes causes lots of
    that. there may be other reasons but i stay away from redmond.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Jul 22, 2010
    #2

  3. On 2010-07-22, Uri Guttman <> wrote:
    > WH> But the effect is the same: fast on linux, slow on win32. Besides
    > WH> bashing win32 :) and ideas for reason and (possibly) cure?
    >
    > you also bashed File::Slurp in a bug report you sent me. obviously it is
    > a winblows issue. one possibility is that winblows does poor process
    > context switching and piping between processes causes lots of
    > that. there may be other reasons but i stay away from redmond.


    Disable your network adapters, and switch off virus checking. Recheck
    speed.

    Hope this helps,
    Ilya
     
    Ilya Zakharevich, Jul 22, 2010
    #3
  4. On Jul 22, 6:39 pm, "Uri Guttman" <> wrote:
    > >>>>> "WH" == Wolfram Humann <> writes:

    >
    >   WH> perl -E" $t = qx(cat psfile.ps); say length $t "
    >
    >   WH> This takes about 16 seconds on win32 but only <1 seconds on Linux. I
    >   WH> was afraid that this might be a 'binmode' problem so I also tried
    >   WH> this:
    >
    >   WH> perl -E" open $in,'cat psfile.ps |'; binmode $in; local $/; $t=<$in>;
    >   WH> say length $t "
    >
    >   WH> But the effect is the same: fast on linux, slow on win32. Besides
    >   WH> bashing win32 :) and ideas for reason and (possibly) cure?
    >
    > you also bashed File::Slurp in a bug report you sent me. obviously it is
    > a winblows issue. one possibility is that winblows does poor process
    > context switching and piping between processes causes lots of
    > that. there may be other reasons but i stay away from redmond.
    >
    > uri


    Uri,
    it was certainly not my intention to bash File::Slurp in my bug report
    -- and when I re-read what I wrote, I don't think I did. If I did
    bash, I am sorry for it (and would be grateful for an explanation from
    your side). I'm not a native speaker; maybe things came out
    differently from what I intended.
    Besides that, I think the two issues (my bug report and my post here)
    are unrelated. It looks like File::Slurp needs to do substitutions on
    win32. I was not aware of that, as I do not seem to need them when I
    read line-by-line on win32. Those substitutions naturally suffer from
    the use of $& elsewhere in the program. I think it would be o.k. to
    mention this briefly in the docs for File::Slurp. This is a special
    case and you might disagree that it should be in the docs; in that
    case, just reject and close my bug report.

    Regards,
    Wolfram
     
    Wolfram Humann, Jul 23, 2010
    #4
  5. Wolfram Humann

    Uri Guttman Guest

    >>>>> "WH" == Wolfram Humann <> writes:

    WH> On Jul 22, 6:39 pm, "Uri Guttman" <> wrote:

    >> you also bashed File::Slurp in a bug report you sent me. obviously it is
    >> a winblows issue. one possibility is that winblows does poor process
    >> context switching and piping between processes causes lots of
    >> that. there may be other reasons but i stay away from redmond.


    WH> it was certainly not my intention to bash File::Slurp in my bug report
    WH> -- and I when I re-read what I wrote I don't think I did. If I did
    WH> bash, I am sorry for it (and would be grateful for explanation from
    WH> your side). I'm not a native speaker; maybe things came out different
    WH> from what I intended.
    WH> Besides that, I think the two issues (my bug report and my post here)
    WH> are unrelated. It looks like File::Slurp needs to do substitutions on
    WH> win32. I was not aware of that as I do not seem to need them when I
    WH> read line-by-line on win32. Those substitutions naturally suffer from
    WH> the use of $& elsewhere in the program. I think it would be o.k. to
    WH> mention this briefly in the docs for File ::Slurp. This is a special
    WH> case and you might disagree that it should be in the docs. In that
    WH> case, just reject and close my bug report.

    i don't think i should worry about someone else using $& in their
    code. most modules use s/// with grabbing and will suffer for the same
    reason. this is a silly idea to warn about something i have no control
    over.

    but you showed with your slurp code test that it was slower on winblows
    without my module. so telling me to warn about using $& is wrong as it
    isn't the reason for the slowdown. winblows doesn't do forks/pipe well
    since it doesn't like process context switching. they push threads which
    suck for other reasons. on most unix flavors process and thread context
    switching are about equally fast.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Jul 23, 2010
    #5
  6. On Jul 22, 11:14 pm, Ilya Zakharevich <> wrote:
    > On 2010-07-22, Uri Guttman <> wrote:
    >
    > >  WH> But the effect is the same: fast on linux, slow on win32. Besides
    > >  WH> bashing win32 :) and ideas for reason and (possibly) cure?

    >
    > > you also bashed File::Slurp in a bug report you sent me. obviously it is
    > > a winblows issue. one possibility is that winblows does poor process
    > > context switching and piping between processes causes lots of
    > > that. there may be other reasons but i stay away from redmond.

    >
    > Disable your network adapters, and switch off virus checking.  Recheck
    > speed.
    >
    > Hope this helps,
    > Ilya


    Ilya,
    I tried that and it makes no difference. Also, when enabled, I don't
    see any CPU usage in the virus check process -- it all goes to
    perl.exe.
    I also tried things using cygwin perl on the same win32 pc: cygwin
    perl also runs <1 second ...

    Thanks for the suggestion,
    Wolfram
     
    Wolfram Humann, Jul 23, 2010
    #6
  7. On Jul 26, 12:10 am, Ben Morrow <> wrote:
    >
    > Win32's pipes are *really* slow. Write it to a temporary and then read
    > the file normally in perl.


    After further experiments I am now convinced that pipes are not the
    bottleneck (even on win32, piping 10 MB can't take 20 seconds...). The
    problem seems to be Strawberry's memory management for long strings. I
    need to do some more benchmarking and will report this issue
    separately.

    Does anybody know of the appropriate place to report Strawberry Perl
    specific bugs?

    Wolfram
     
    Wolfram Humann, Jul 26, 2010
    #7
  8. On Jul 26, 11:39 am, Ben Morrow <> wrote:
    >
    > I seriously doubt the issue is with Strawberry specifically; almost
    > certainly any issue applies to Win32 perl in general and should be reported
    > to p5p with perlbug. If you can confirm it is specific to Strawberry
    > (so, e.g., a self-compiled mingw perl *doesn't* have the problem) then I
    > think the correct place is the Perl::Dist::Strawberry queue on
    > rt.cpan.org (mail ).
    >
    > (IIRC malloc on Win32 is *also* known to be deadly slow, and also IIRC
    > it's impossible to use perl's malloc without breaking things...)
    >
    > Ben


    Thanks for the pointers. My comparison is Strawberry Perl (5.10 and
    5.12) against Cygwin Perl on the same machine. The latter (as well as
    Perl on Linux) doesn't have the issues I see. Is that a sufficient
    "proof" for the issues being Strawberry specific?

    Wolfram
     
    Wolfram Humann, Jul 26, 2010
    #8
  9. On 2010-07-26 10:07, Wolfram Humann <> wrote:
    > On Jul 26, 11:39 am, Ben Morrow <> wrote:
    >> I seriously doubt the issue is with Strawberry specifically; almost
    >> certainly any issue applies to Win32 perl in general and should be reported
    >> to p5p with perlbug. If you can confirm it is specific to Strawberry
    >> (so, e.g., a self-compiled mingw perl *doesn't* have the problem) then I
    >> think the correct place is the Perl::Dist::Strawberry queue on
    >> rt.cpan.org (mail ).
    >>
    >> (IIRC malloc on Win32 is *also* known to be deadly slow, and also IIRC
    >> it's impossible to use perl's malloc without breaking things...)

    >
    > Thanks for the pointers. My comparison is Strawberry Perl (5.10 and
    > 5.12) against Cygwin Perl on the same machine. The latter (as well as
    > Perl on Linux) doesn't have the issues I see. Is that a sufficient
    > "proof" for the issues being Strawberry specific?


    I remember vaguely that ActiveState Perl has similar issues. I think
    Ben is correct that this is a problem with Win32 malloc and will
    affect any Perl build which uses the Win32 malloc implementation
    (Cygwin probably provides its own malloc implementation). A fix which
    works on both ActiveState and Strawberry would certainly be preferable
    to a Strawberry-specific one.

    hp
     
    Peter J. Holzer, Jul 26, 2010
    #9
  10. On 2010-07-26 13:11, Ben Morrow <> wrote:
    > If the issue simply turns out to be 'Microsoft don't know how to write a
    > decent malloc', there is very little p5p can do about it, of course. On
    > most platforms perl can, and often does, use its own malloc
    > implementation which is optimised for perl's use (lots of tiny
    > allocations and deallocations all the time). This isn't possible on
    > Win32 unless you make a custom build of perl that doesn't support the
    > fork emulation.


    Since the fork emulation works with Win32 malloc, I think it should be
    possible to write a custom malloc based on Win32 malloc (or the
    underlying API calls) which still works with the fork emulation but is
    faster. But it's probably not easy or somebody would have done it
    already (I don't pretend to understand either memory allocation or the
    fork emulation on windows).

    hp
     
    Peter J. Holzer, Jul 26, 2010
    #10
  11. On Jul 26, 3:11 pm, Ben Morrow <> wrote:
    > Quoth Wolfram Humann <>:
    > > Thanks for the pointers. My comparison is Strawberry Perl (5.10 and
    > > 5.12) against Cygwin Perl on the same machine. The latter (as well as
    > > Perl on Linux) doesn't have the issues I see. Is that a sufficient
    > > "proof" for the issues being Strawberry specific?

    >
    > No. As far as Perl is concerned, Cygwin is a separate OS. A fair
    > comparison would be with ActiveState or (as I said) with a Win32 perl
    > you've compiled yourself.


    Oh dear, you're right: ActiveState Perl is just as bad as Strawberry.
    Here's my test case (I always append a number of chunks with a total
    size of 1E6 chars to an existing string, but the start size of the
    existing string and the chunk size vary):

    use strict;
    use warnings;
    use Time::HiRes qw(time);

    my $c1E1 = '#' x 1E1;
    my $c1E2 = '#' x 1E2;
    my $c1E3 = '#' x 1E3;
    my $c1E4 = '#' x 1E4;
    my $c1E5 = '#' x 1E5;


    my $str1 = '#' x 1E5;
    my $str2 = '#' x 1E6;
    my $str3 = '#' x 1E7;

    my $str4 = '#' x 1E7;
    my $str5 = '#' x 1E7;
    my $str6 = '#' x 1E7;
    my $str7 = '#' x 1E7;
    my $str8 = '#' x 1E7;

    my $str9 = '#' x 2E7;
    $str9 = '#' x 1E7;

    my @ar1 = map{ $c1E2 } 1..1E5;

    my @c = (
        '1E5 chars + 1E4 x 1E2 chars' => sub{ $str1 .= $c1E2 for 1..1E4 },
        '1E6 chars + 1E4 x 1E2 chars' => sub{ $str2 .= $c1E2 for 1..1E4 },
        '1E7 chars + 1E4 x 1E2 chars' => sub{ $str3 .= $c1E2 for 1..1E4 },
        '',
        '1E7 chars + 1E5 x 1E1 chars' => sub{ $str4 .= $c1E1 for 1..1E5 },
        '1E7 chars + 1E4 x 1E2 chars' => sub{ $str5 .= $c1E2 for 1..1E4 },
        '1E7 chars + 1E3 x 1E3 chars' => sub{ $str6 .= $c1E3 for 1..1E3 },
        '1E7 chars + 1E2 x 1E4 chars' => sub{ $str7 .= $c1E4 for 1..1E2 },
        '1E7 chars + 1E1 x 1E5 chars' => sub{ $str8 .= $c1E5 for 1..1E1 },
        '',
        '1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars' => sub{ $str9 .= $c1E2 for 1..1E4 },
        '1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars ' => sub{ push @ar1, $c1E2 for 1..1E4 },
    );

    while (@c) {
        my $name = shift @c;
        print("\n"), next unless $name;
        my $code = shift @c;
        my $t1 = time; &$code; my $t2 = time;
        printf "%s: %6.1f ms\n", $name, 1000 * ($t2 - $t1);
    }

    ##########################################################
    And these are the results:

    c:\cygwin\bin\perl LongStrings.pl
    1E5 chars + 1E4 x 1E2 chars: 1.6 ms
    1E6 chars + 1E4 x 1E2 chars: 2.4 ms
    1E7 chars + 1E4 x 1E2 chars: 1.5 ms

    1E7 chars + 1E5 x 1E1 chars: 11.3 ms
    1E7 chars + 1E4 x 1E2 chars: 1.5 ms
    1E7 chars + 1E3 x 1E3 chars: 0.9 ms
    1E7 chars + 1E2 x 1E4 chars: 1.0 ms
    1E7 chars + 1E1 x 1E5 chars: 0.9 ms

    1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.2 ms
    1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 5.5 ms

    ##########################################################
    c:\strawberry\perl\bin\perl LongStrings.pl
    1E5 chars + 1E4 x 1E2 chars: 94.4 ms
    1E6 chars + 1E4 x 1E2 chars: 319.9 ms
    1E7 chars + 1E4 x 1E2 chars: 2710.4 ms

    1E7 chars + 1E5 x 1E1 chars: 2656.0 ms
    1E7 chars + 1E4 x 1E2 chars: 2656.1 ms
    1E7 chars + 1E3 x 1E3 chars: 2609.1 ms
    1E7 chars + 1E2 x 1E4 chars: 1109.1 ms
    1E7 chars + 1E1 x 1E5 chars: 118.3 ms

    1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.2 ms
    1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 6.5 ms

    I compared Strawberry and ActiveState on another machine: the times
    are close to each other but even longer than the ones above due to the
    older hardware.

    Wolfram
     
    Wolfram Humann, Jul 26, 2010
    #11
  12. On 2010-07-26, Peter J. Holzer <> wrote:
    > On 2010-07-26 13:11, Ben Morrow <> wrote:
    >> If the issue simply turns out to be 'Microsoft don't know how to write a
    >> decent malloc', there is very little p5p can do about it, of course. On
    >> most platforms perl can, and often does, use its own malloc
    >> implementation which is optimised for perl's use (lots of tiny
    >> allocations and deallocations all the time). This isn't possible on
    >> Win32 unless you make a custom build of perl that doesn't support the
    >> fork emulation.


    > Since the fork emulation works with Win32 malloc, I think it should be
    > possible to write a custom malloc based on Win32 malloc (or the
    > underlying API calls) which still works with the fork emulation but is
    > faster. But it's probably not easy or somebody would have done it
    > already (I don't pretend to understand either memory allocation or the
    > fork emulation on windows).


    "My" malloc() (the one shipped with Perl) needs only 1 syscall
    implemented on a particular architecture: get some pages from the
    system. So the macro should be defined on the command line of
    $(CC) -c malloc.c.

    Hope this helps,
    Ilya
     
    Ilya Zakharevich, Jul 26, 2010
    #12
  13. I do understand how an inefficient malloc would explain the
    chunk-size dependency in the code. What I do not understand is the
    dependency on the initial size of the string that I append to. Does
    Perl re-allocate the complete string (at least from time to time) when
    I keep appending to it? Or could there be other effects involved
    besides malloc speed? (Ideally I would profile Strawberry Perl at the
    C level but I don't know how to do that...)

    Wolfram
     
    Wolfram Humann, Jul 27, 2010
    #13
  14. Wolfram Humann

    Uri Guttman Guest

    >>>>> "WH" == Wolfram Humann <> writes:

    WH> I do understand how an unefficient malloc would explain the chunk-size
    WH> dependency in the code. What I do not understand is the dependency on
    WH> the initial size of the string that I append to. Does Perl re-allocate
    WH> the complete string (at least from time to time) when I keep appending
    WH> to it? Or could there be other effects involved besides malloc speed ?
    WH> (Ideally I would profile Strawberry Perl on the C level but I don't
    WH> know how to do that...)

    perl does a realloc (or its equiv) on strings which grow too big. i
    believe it does a doubling each time (same as for hash buckets). if so,
    a large initial size will affect things as it will mean fewer malloc
    calls as it grows. if malloc is the bottleneck, this will show up.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Jul 27, 2010
    #14
  15. On 27 Jul., 08:20, "Uri Guttman" <> wrote:
    > >>>>> "WH" == Wolfram Humann <> writes:

    >
    >   WH> I do understand how an unefficient malloc would explain the chunk-size
    >   WH> dependency in the code. What I do not understand is the dependency on
    >   WH> the initial size of the string that I append to. Does Perl re-allocate
    >   WH> the complete string (at least from time to time) when I keep appending
    >   WH> to it? Or could there be other effects involved besides malloc speed ?
    >   WH> (Ideally I would profile Strawberry Perl on the C level but I don't
    >   WH> know how to do that...)
    >
    > perl does a realloc (or its equiv) on strings which grow too big. i
    > believe it does a doubling each time (same as for hash buckets). if so,
    > a large initial size will affect things as it will mean fewer malloc
    > calls as it grows. if malloc is the bottleneck, this will show up.


    Unfortunately, that makes things just more mysterious (at least for my
    understanding):
    > 1E5 chars + 1E4 x 1E2 chars: 94.4 ms
    > 1E6 chars + 1E4 x 1E2 chars: 319.9 ms
    > 1E7 chars + 1E4 x 1E2 chars: 2710.4 ms

    While appending 1E6 chars (= 1E4 chunks of 1E2 chars each) to an
    initial string size of 1E5 might require 3-4 reallocs, appending 1E6
    chars to an initial string size of 1E7 chars involves at most 1
    realloc. And (even though not measured in my sample code above) the
    from-scratch en-bloc allocation of a 2E7 chars string is reasonably
    fast in Strawberry Perl.

    Wolfram
     
    Wolfram Humann, Jul 27, 2010
    #15
  16. Alright, I did some profiling and code reading, and what I found is
    something that I would consider a bug, or at least fairly poor coding
    practice, in the core.
    Opinions very welcome!

    In Strawberry almost the entire time is spent in the following call
    sequence:

    Perl_sv_catpvn_flags -> Perl_sv_grow -> Perl_safesysrealloc -> realloc
    (msvcrt)

    Perl_sv_catpvn_flags (in sv.c) is documented as "Concatenates the
    string onto the end of the string which is in the SV". That's what my
    code does all the time. So far so good.
    Perl_sv_catpvn_flags *always* calls SvGrow -> Perl_sv_grow (also in
    sv.c). Perl_sv_grow then needs to decide if the string's memory is
    already sufficient or really needs to grow. In the latter case,
    safesysrealloc -> Perl_safesysrealloc -> realloc is called. The
    interesting point is: how much memory does it request? The answer is:

    newlen += 10 * (newlen - SvCUR(sv)); /* avoid copy each time */

    I.e. it requests 10 times as much memory as is required for the
    current append operation. So when I loop 10000 times and each time
    append 100 chars to an initial string size of 10 million, the memory
    grows from 10.000e6 to 10.001e6 to 10.002e6 and so on 1000 times till
    it ends at 11.000e6. I can sort of confirm this to be true if I look
    at the memory graph in Process Explorer: it grows smoothly (no
    discernible steps), becoming incrementally slower towards the end
    (because the amount of memory that needs to be copied for each realloc
    increases).

    Growing memory in such tiny increments is what I consider bad
    practice.

    By the way: I estimate the time required for each realloc to be around
    3 ms for 10e6 chars, growing linearly with the amount of data -- I
    consider that a fair speed and no reason to blame win32.
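    The cost of that growth rule can be illustrated with pure bookkeeping,
    no real allocations. The sketch below simulates the worst case above
    (appending 1E4 chunks of 1E2 chars to a 1E7-char string) under the
    core's 10x-increment rule and, for contrast, under a hypothetical
    geometric rule that grows the buffer by at least 25%:

```perl
use strict;
use warnings;

# Bookkeeping-only simulation: append 1E4 chunks of 1E2 chars
# to a string of 1E7 chars, counting reallocs and bytes copied.
sub simulate {
    my ($grow) = @_;                 # $grow computes the new buffer size
    my $cur      = 10_000_000;       # current string length
    my $alloc    = $cur;             # allocated buffer size
    my $reallocs = 0;
    my $copied   = 0;                # worst case: realloc moves the whole string
    for (1 .. 10_000) {
        my $newlen = $cur + 100;
        if ($newlen > $alloc) {
            $alloc = $grow->($newlen, $cur, $alloc);
            $reallocs++;
            $copied += $cur;
        }
        $cur = $newlen;
    }
    return ($reallocs, $copied);
}

# (a) the core's rule: newlen += 10 * (newlen - SvCUR(sv))
my ($ra, $ca) = simulate(sub { my ($new, $cur, $alloc) = @_;
                               $new + 10 * ($new - $cur) });
# (b) hypothetical geometric rule: grow the buffer by at least 25%
my ($rb, $cb) = simulate(sub { my ($new, $cur, $alloc) = @_;
                               my $g = int($alloc * 1.25);
                               $g > $new ? $g : $new });
printf "10x-increment rule: %4d reallocs, %.1e bytes copied\n", $ra, $ca;
printf "geometric rule:     %4d reallocs, %.1e bytes copied\n", $rb, $cb;
```

    Under the 10x-increment rule the simulation performs 910 reallocs and
    moves on the order of 1E10 bytes; the geometric rule needs a single
    realloc. That ratio is in line with the several-hundred-fold slowdown
    measured above.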

    What happens in Cygwin? The stack-sampling profiler is of little help
    because it easily misses infrequent events. I would expect that
    Perl_sv_grow is called just as often as in Strawberry Perl. The
    difference is that safesysrealloc does not call Perl_safesysrealloc ->
    realloc, it calls Perl_realloc. And Perl_realloc (in malloc.c) seems
    to have its own logic (something with '<<' and 'LOG' and 'pow' which
    I did not try to fully understand) to determine what amount of memory
    it finally allocates. When I add some sleep() to the string append
    process, I can see how memory grows in Process Explorer: There are 5
    steps (probably corresponding to 5 calls to Perl_realloc) of growing
    size when I start with 0.1e6 chars and then grow to 1.1e6 chars. When
    I start with 10e6 chars and grow to 11e6 chars, there is just 1 step
    in memory size. This looks like a sensible memory growth strategy to
    me. It explains why Cygwin is several 100 times faster than Strawberry
    Perl. It also explains why I observed during my experiments that
    Cygwin Perl consistently needs more memory than Strawberry Perl -- but
    that's a small price to pay for such a dramatic speedup.

    Wolfram
     
    Wolfram Humann, Jul 27, 2010
    #16
  17. On Jul 27, 4:13 pm, Ben Morrow <> wrote:
    > Would you be able to repeat these tests with 5.12.0: that is, build
    > (under Cygwin, if you don't have access to a Unix system) matched perls
    > configured with and without -Dusemymalloc, and run the test on both?
    > I'll try and do the same here, but I can't promise I'll have time. If
    > the slowdown still exists in 5.12, I think you have a good case for a
    > bug report. I'm not sure how possible it would be to fix, but it would
    > clearly be a big win under some circumstances to be able to build Win32
    > perl with perl's malloc.


    I do have a Linux machine and I did compile my own perl there, so I
    think I could redo that (possibly easier than recompiling Perl for
    Cygwin). The strange thing is that on Cygwin perl -V says
    'usemymalloc=y' while the one on Linux says 'usemymalloc=n'. And on
    Linux my benchmark runs everything under 12 ms. Are you certain
    changing usemymalloc would have much effect there?


    What I would much more *like* to try is to recompile a perl (e.g.
    Strawberry Perl) on win32 and replace

    newlen += 10 * (newlen - SvCUR(sv));

    with something like

    newlen += 10 * (newlen - SvCUR(sv)) + 0.5 * SvCUR(sv);

    (with the factor reasonably somewhere between 0.2 and 1), but a quick
    attempt to follow http://perldoc.perl.org/perlwin32.html was not
    successful :(

    Wolfram
     
    Wolfram Humann, Jul 27, 2010
    #17
  18. Wolfram Humann

    Guest

    On Jul 22, 7:02 am, Wolfram Humann <> wrote:
    > I have a program that processes PDF files by converting them to
    > Postscript, read the ps and do something with it. I use pdftops (from
    > xpdf) for the pdf->ps conversion and retrieve the result like this:
    >
    > $ps_text = qx( pdftops $infile - );
    >
    > On win32 using strawberry perl (tried 5.10 and 5.12) this takes much
    > more time than I expected so I did a test and first converted the PDF
    > to Postscript, then read the Postscript (about 12 MB) like this (cat
    > on win32 provided by cygwin):
    >
    > perl -E" $t = qx(cat psfile.ps); say length $t "
    >
    > This takes about 16 seconds on win32 but only <1 seconds on Linux.



    Dear Wolfram,

    I've encountered a similar problem on Strawberry Perl before.

    I'm curious: Could you try "pre-allocating" the needed space to
    $ps_text (or $t) before you set it? For example, try this:

    perl -E "$t = ' ' x (-s 'psfile.ps'); $t = qx(cat psfile.ps); say length $t"

    See if that helps. I've found that setting my variable to the
    target length BEFORE it's set to the proper string can reduce time
    significantly (when it is eventually being set to its target value).
    I'm not sure why this is so, but I can guess that it's because it can
    avoid the time-consuming process of "growing" the string a little at a
    time.
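    The pre-extend trick also works inside a script, not just in a
    one-liner: assigning a long string forces one big allocation up
    front, and assigning '' afterwards empties the scalar while (in
    practice) keeping its buffer, so later appends fit without growing.
    A minimal sketch with made-up sizes:

```perl
use strict;
use warnings;

# Hypothetical target size; for the real case it would be -s 'psfile.ps'.
my $size = 1_000_000;

my $t = ' ' x $size;   # force one big allocation up front
$t = '';               # emptying the scalar keeps its buffer allocated
$t .= '#' x 100 for 1 .. 10_000;   # appends now fit into the buffer
print length($t), "\n";
```

    Whether this actually avoids reallocs depends on perl keeping the
    buffer across the assignment, which is the usual behavior but not
    something the documentation guarantees.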

    I hope this helps,

    -- Jean-Luc
     
    , Jul 27, 2010
    #18
  19. On 2010-07-27 17:07, Ben Morrow <> wrote:
    >
    > Quoth Wolfram Humann <>:
    >> On Jul 27, 4:13 pm, Ben Morrow <> wrote:
    >> > Would you be able to repeat these tests with 5.12.0: that is, build
    >> > (under Cygwin, if you don't have access to a Unix system) matched perls
    >> > configured with and without -Dusemymalloc, and run the test on both?
    >> > I'll try and do the same here, but I can't promise I'll have time. If
    >> > the slowdown still exists in 5.12, I think you have a good case for a
    >> > bug report. I'm not sure how possible it would be to fix, but it would
    >> > clearly be a big win under some circumstances to be able to build Win32
    >> > perl with perl's malloc.

    >>
    >> I do have a linux machine and I did comile my own perl there so I
    >> think a could redo that (possibly easier than recompiling Perl for
    >> Cygwin). The strange thing is that on Cygwin perl -V says
    >> 'usemymalloc=y' while the one on Linux says 'usemymalloc=n'. And on
    >> Linux my bechmark runs everything under 12 ms. Are you certain
    >> changing usemymalloc would have much effect there?

    >
    > No. It's possible that glibc's malloc already behaves the way perl is
    > expecting it to, so using perl's malloc doesn't change the performance
    > much.


    I'm pretty sure that GNU malloc doesn't round up to powers of two or
    something like that. However, the performance difference between GNU
    malloc and Perl malloc is rather small:

    perl 5.12.1, default config, EGLIBC 2.11.2-2:

    1E5 chars + 1E4 x 1E2 chars: 3.9 ms
    1E6 chars + 1E4 x 1E2 chars: 3.8 ms
    1E7 chars + 1E4 x 1E2 chars: 4.4 ms

    1E7 chars + 1E5 x 1E1 chars: 28.4 ms
    1E7 chars + 1E4 x 1E2 chars: 4.5 ms
    1E7 chars + 1E3 x 1E3 chars: 2.6 ms
    1E7 chars + 1E2 x 1E4 chars: 2.0 ms
    1E7 chars + 1E1 x 1E5 chars: 1.9 ms

    1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 2.0 ms
    1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 4.4 ms

    perl 5.12.1, usemymalloc=y, EGLIBC 2.11.2-2:

    1E5 chars + 1E4 x 1E2 chars: 2.6 ms
    1E6 chars + 1E4 x 1E2 chars: 3.8 ms
    1E7 chars + 1E4 x 1E2 chars: 2.5 ms

    1E7 chars + 1E5 x 1E1 chars: 18.8 ms
    1E7 chars + 1E4 x 1E2 chars: 2.5 ms
    1E7 chars + 1E3 x 1E3 chars: 0.9 ms
    1E7 chars + 1E2 x 1E4 chars: 0.9 ms
    1E7 chars + 1E1 x 1E5 chars: 1.1 ms

    1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.9 ms
    1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 3.4 ms

    That may be accidental, though: strace output (for GNU malloc) shows
    that very few reallocations actually result in a different address;
    mostly the allocated area can be grown in place because there is
    nothing after it. This is even true if two strings grow in parallel:
    each time one of
    the strings moves it leaves a hole which the other string can grow into,
    so in practice this works like a binary backoff. I guess there are
    allocation patterns which spoil this effect (e.g. if you allocate lots
    of small objects while growing large strings) but I haven't tried to
    find them.

    hp
     
    Peter J. Holzer, Jul 27, 2010
    #19
  20. On 2010-07-27, Wolfram Humann <> wrote:
    > sv.c). Perl_sv_grow then needs to decide if the string's memory is
    > already sufficient or really needs to grow. In the latter case,
    > safesysrealloc -> Perl_safesysrealloc -> realloc is called. The
    > interesting point is: how much memory does it request? The answer is:
    >
    > newlen += 10 * (newlen - SvCUR(sv)); /* avoid copy each time */
    >
    > I.e. it requests 10 times as much memory as is required for the
    > current append operation. So when I loop 10000 times and each time
    > append 100 chars to an initial string size of 10 million, the memory
    > grows from 10.000e6 to 10.001e6 to 10.002e6 and so on 1000 times till
    > it ends at 11.000e6.


    Good l*rd!

    The current algorithm is optimized to work in tandem with "my"
    malloc(), which would round up to a certain geometric progression
    anyway. So if one uses a different malloc(), one had better use

    newlen += (newlen >> 4) + 10; /* avoid copy each time */

    Thanks for the analysis (and please report this on p5p)!

    Yours,
    Ilya

    P.S. And the current algorithm is a disaster if you try to append a
    string of length 200e6 to a short string...
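    Plugging the proposed rule into a bookkeeping-only simulation of the
    thread's worst case (1E7 chars plus 1E4 appends of 1E2 chars) shows
    how much it changes the realloc count:

```perl
use strict;
use warnings;

# Bookkeeping-only check of the proposed rule,
#     newlen += (newlen >> 4) + 10;
# on the worst case: a 1E7-char string plus 1E4 appends of 1E2 chars.
my $cur      = 10_000_000;
my $alloc    = $cur;
my $reallocs = 0;
for (1 .. 10_000) {
    my $newlen = $cur + 100;
    if ($newlen > $alloc) {
        $alloc = $newlen + ($newlen >> 4) + 10;   # ~6% geometric headroom
        $reallocs++;
    }
    $cur = $newlen;
}
print "reallocs: $reallocs\n";
```

    This prints "reallocs: 2": the ~6% geometric headroom turns roughly a
    thousand buffer moves under the current 10x-increment rule into two.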
     
    Ilya Zakharevich, Jul 28, 2010
    #20
