Speed of reading some MB of data using qx(...)

Ilya Zakharevich · Jul 28, 2010

If you timed perl's own realloc, you would (I believe) find it does much
better than this. AFAICS from the code, it has a fixed set of block
sizes it actually allocates. Enlarging a block such that it doesn't go
over the block size actually allocated is *free*, not even linear in the
size of the block, since all it does is adjust the end marker. This is
the logic you are expecting sv_grow to implement, but perl has decided
that this is the allocator's responsibility.

Wrong analysis. *In this particular situation* one would find Perl's
realloc() as slow as M$'s one - but it would be called MUCH more
rarely. With my malloc(), I implemented an extra call:
malloced_size(). So on the 1st iteration, things would go similarly
to "their-malloc", and realloc() would be called - and your analysis
would apply.

But on the 2nd iteration of appending, Perl would discover the new
*really* allocated size for the string, and would update the buffer
size stored in SV. After this, for many iteractions the realloc() is
out-of-the-loop altogether.

BTW, IIRC, my malloc() does not store the "end marker" at all (unless
a debugging version is used). (The bucket size of short strings
depends only on which page of memory they belong to - well, actually,
half-page.)

Hope this helps,
Ilya

Ilya Zakharevich · Jul 28, 2010

I'm pretty sure that GNU malloc doesn't round up to powers of two or
something like that. However, the performance difference between GNU
malloc and Perl malloc is rather small:

Yeah, right. Not more than 3 times. ;-)

perl 5.12.1, default config, EGLIBC 2.11.2-2:

1E5 chars + 1E4 x 1E2 chars: 3.9 ms
1E6 chars + 1E4 x 1E2 chars: 3.8 ms
1E7 chars + 1E4 x 1E2 chars: 4.4 ms

1E7 chars + 1E5 x 1E1 chars: 28.4 ms
1E7 chars + 1E4 x 1E2 chars: 4.5 ms
1E7 chars + 1E3 x 1E3 chars: 2.6 ms
1E7 chars + 1E2 x 1E4 chars: 2.0 ms
1E7 chars + 1E1 x 1E5 chars: 1.9 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 2.0 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 4.4 ms

perl 5.12.1, usemymalloc=y, EGLIBC 2.11.2-2:

1E5 chars + 1E4 x 1E2 chars: 2.6 ms
1E6 chars + 1E4 x 1E2 chars: 3.8 ms
1E7 chars + 1E4 x 1E2 chars: 2.5 ms

1E7 chars + 1E5 x 1E1 chars: 18.8 ms
1E7 chars + 1E4 x 1E2 chars: 2.5 ms
1E7 chars + 1E3 x 1E3 chars: 0.9 ms
1E7 chars + 1E2 x 1E4 chars: 0.9 ms
1E7 chars + 1E1 x 1E5 chars: 1.1 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.9 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 3.4 ms

Yours,
Ilya

Peter J. Holzer · Jul 28, 2010

Yeah, right. Not more than 3 times. ;-)

Compared to the factor of about 1000 that Wolfram and Ben observed this
is small. More importantly, growing a string to length n in constant
increments appears to be an O(n) operation with both your malloc and GNU
malloc, but O(n²) with Win32 malloc and BSD malloc.

Your malloc is certainly faster than GNU malloc. OTOH, AFAICS from
quickly scanning the comments in malloc.c, it always pads allocation to
the next power of two (minus 4) and it doesn't return unused memory back
to the OS. So it might use quite a bit more memory. Which of these
effects is more important depends on the program and is hard to
determine analytically. Which means that I should benchmark some of my
more performance-critical scripts with both allocators ;-).

hp

Wolfram Humann · Jul 28, 2010

Good l*rd!

The current algorithm is optimized to work in tandem with "my"
malloc(), which would round up to a certain geometric progression
anyway. So if one use as different malloc()s, one should better use

newlen += (newlen >> 4) + 10; /* avoid copy each time */

I finally managed to compile my own win32 perl. (Actually it was quite
easy once I refrained from doing mistakes so stupid I do not dare to
talk about them...)
Now I could modify Perl_sv_grow() and insert debugging prints and I
found good and bad news.

The bad Looks like I was *overly optimistic* (LOL!) concerning
the efficiency of the current string memory allocation on win32. The
"newlen += 10 * (newlen - SvCUR(sv))" line is only executed if
SvOOK(sv) -- i.e. in most cases it is *not* executed. Therefore win32
system realloc is not called every tenth string-append operation but
*every* time something gets appended to a string.

The good A single additional line of code makes win32 perl
100...1000 times faster!
(for code that appends to strings very frequently)

I went with Ilya's proposal but inserted the line a little further
down, just after
if (newlen > SvLEN(sv)) { /* need more room? */

So now we have:
if (newlen > SvLEN(sv)) { /* need more room? */
newlen += (newlen >> 2) + 10;
#ifndef Perl_safesysmalloc_size
newlen = PERL_STRLEN_ROUNDUP(newlen);
#endif
if (SvLEN(sv) && s) {
s = (char*)saferealloc(s, newlen);
}

The remaining question is by what ratio a string's memory should grow.
I tried several values from (newlen >> 0) to (newlen >> 6) for the
best compromise between execution time and memory usage and my
personal favorite is (newlen >> 2). What do others here think? At the
end of this post I will attach the results for my benchmark script
starting with Cygwin Perl followed by several versions of (newlen >>
x) and finally the unpatched Strawberry Perl. These reports now also
include memory footprint info (courtesy of pslist from the
Sysinternals suite). I also went back to my original task of reading a
12 MB postscript file using qx(cat ...) and in some cases I also
report times for that -- here Cygwin (70 ms) still beats my modified
perl (210 ms), but that's still waaaaay better than the original 18000
ms

I will also report to p5p.

Wolfram

###########################################################

c:\cygwin\bin\perl d:\exe\LongStrings.pl

1E5 chars + 1E4 x 1E2 chars: 1.5 ms
1E6 chars + 1E4 x 1E2 chars: 2.3 ms
1E7 chars + 1E4 x 1E2 chars: 1.5 ms

1E7 chars + 1E5 x 1E1 chars: 12.2 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms
1E7 chars + 1E3 x 1E3 chars: 0.6 ms
1E7 chars + 1E2 x 1E4 chars: 0.6 ms
1E7 chars + 1E1 x 1E5 chars: 0.8 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.2 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 5.9 ms

Private MB: 326.5
Peak Private MB: 326.5

--------------

qx(cat postscriptfile.ps): 68.7 ms
Private MB: 38.5
Peak Private MB: 38.5

###########################################################

newlen += (newlen >> 0) + 10;

C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl

1E5 chars + 1E4 x 1E2 chars: 2.2 ms
1E6 chars + 1E4 x 1E2 chars: 1.4 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms

1E7 chars + 1E5 x 1E1 chars: 10.4 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms
1E7 chars + 1E3 x 1E3 chars: 0.6 ms
1E7 chars + 1E2 x 1E4 chars: 0.6 ms
1E7 chars + 1E1 x 1E5 chars: 0.6 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.2 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 6.0 ms

Private MB: 378.3
Peak Private MB: 418.0

--------------

qx(cat postscriptfile.ps): 181.2 ms
Private MB: 25.1
Peak Private MB: 40.3

###########################################################

newlen += (newlen >> 1) + 10;

C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl

1E5 chars + 1E4 x 1E2 chars: 2.5 ms
1E6 chars + 1E4 x 1E2 chars: 2.4 ms
1E7 chars + 1E4 x 1E2 chars: 1.3 ms

1E7 chars + 1E5 x 1E1 chars: 9.6 ms
1E7 chars + 1E4 x 1E2 chars: 1.3 ms
1E7 chars + 1E3 x 1E3 chars: 0.7 ms
1E7 chars + 1E2 x 1E4 chars: 0.7 ms
1E7 chars + 1E1 x 1E5 chars: 0.6 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.1 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 6.4 ms

Private MB: 290.2
Peak Private MB: 319.5

###########################################################

newlen += (newlen >> 2) + 10;

C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl

1E5 chars + 1E4 x 1E2 chars: 9.2 ms
1E6 chars + 1E4 x 1E2 chars: 5.3 ms
1E7 chars + 1E4 x 1E2 chars: 1.5 ms

1E7 chars + 1E5 x 1E1 chars: 9.9 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms
1E7 chars + 1E3 x 1E3 chars: 0.5 ms
1E7 chars + 1E2 x 1E4 chars: 0.5 ms
1E7 chars + 1E1 x 1E5 chars: 0.6 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.1 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 5.4 ms

Private MB: 244.9
Peak Private MB: 270.1

--------------

qx(cat postscriptfile.ps): 209.8 ms
Private MB: 16.2
Peak Private MB: 29.0

###########################################################

newlen += (newlen >> 3) + 10;

C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl

1E5 chars + 1E4 x 1E2 chars: 12.1 ms
1E6 chars + 1E4 x 1E2 chars: 6.9 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms

1E7 chars + 1E5 x 1E1 chars: 10.3 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms
1E7 chars + 1E3 x 1E3 chars: 0.5 ms
1E7 chars + 1E2 x 1E4 chars: 0.5 ms
1E7 chars + 1E1 x 1E5 chars: 0.5 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.1 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 5.6 ms

Private MB: 221.9
Peak Private MB: 244.3

###########################################################

newlen += (newlen >> 4) + 10;

C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl

1E5 chars + 1E4 x 1E2 chars: 17.0 ms
1E6 chars + 1E4 x 1E2 chars: 13.8 ms
1E7 chars + 1E4 x 1E2 chars: 11.2 ms

1E7 chars + 1E5 x 1E1 chars: 19.4 ms
1E7 chars + 1E4 x 1E2 chars: 10.1 ms
1E7 chars + 1E3 x 1E3 chars: 10.9 ms
1E7 chars + 1E2 x 1E4 chars: 11.1 ms
1E7 chars + 1E1 x 1E5 chars: 11.0 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.2 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 6.3 ms

Private MB: 219.4
Peak Private MB: 233.8

--------------

qx(cat postscriptfile.ps): 312.0 ms
Private MB: 14.0
Peak Private MB: 25.8

###########################################################

newlen += (newlen >> 6) + 10;

C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl

1E5 chars + 1E4 x 1E2 chars: 57.7 ms
1E6 chars + 1E4 x 1E2 chars: 59.8 ms
1E7 chars + 1E4 x 1E2 chars: 67.9 ms

1E7 chars + 1E5 x 1E1 chars: 69.4 ms
1E7 chars + 1E4 x 1E2 chars: 71.6 ms
1E7 chars + 1E3 x 1E3 chars: 69.6 ms
1E7 chars + 1E2 x 1E4 chars: 64.8 ms
1E7 chars + 1E1 x 1E5 chars: 53.8 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.2 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 5.7 ms

Private MB: 219.8
Peak Private MB: 230.0

###########################################################

unpatched Strawberry Perl

c:\strawberry\perl\bin\perl d:\exe\LongStrings.pl

1E5 chars + 1E4 x 1E2 chars: 96.2 ms
1E6 chars + 1E4 x 1E2 chars: 325.7 ms
1E7 chars + 1E4 x 1E2 chars: 2655.9 ms

1E7 chars + 1E5 x 1E1 chars: 2687.3 ms
1E7 chars + 1E4 x 1E2 chars: 2687.4 ms
1E7 chars + 1E3 x 1E3 chars: 2656.1 ms
1E7 chars + 1E2 x 1E4 chars: 1093.6 ms
1E7 chars + 1E1 x 1E5 chars: 108.3 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.1 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 6.1 ms

Private MB: 200.4
Peak Private MB: 210.2

Peter J. Holzer · Jul 28, 2010

I went with Ilya's proposal but inserted the line a little further
down, just after
if (newlen > SvLEN(sv)) { /* need more room? */

So now we have:
if (newlen > SvLEN(sv)) { /* need more room? */
newlen += (newlen >> 2) + 10;
#ifndef Perl_safesysmalloc_size
newlen = PERL_STRLEN_ROUNDUP(newlen);
#endif
if (SvLEN(sv) && s) {
s = (char*)saferealloc(s, newlen);
}

The remaining question is by what ratio a string's memory should grow.
I tried several values from (newlen >> 0) to (newlen >> 6) for the
best compromise between execution time and memory usage and my
personal favorite is (newlen >> 2). What do others here think?

That sounds about right. I've used factors between 1.2 and 1.5 in the
past for similar problems. I suggest you base the growth on the old
size, though, something like:

if (newlen > SvLEN(sv)) { /* need more room? */
size_t min = SvLEN(sv) * 5/4 + 10;
if (newlen < min) newlen = min;
...

This gives you the same growth pattern if the increments are small, but
it doesn't allocate extra memory if you append a large chunk.

hp

Ilya Zakharevich · Jul 29, 2010

So now we have:
if (newlen > SvLEN(sv)) { /* need more room? */
newlen += (newlen >> 2) + 10;
#ifndef Perl_safesysmalloc_size
newlen = PERL_STRLEN_ROUNDUP(newlen);
#endif
if (SvLEN(sv) && s) {
s = (char*)saferealloc(s, newlen);
}

I think you make your life too simple. What you must do is find the
chain of events which sets
Perl_safesysmalloc_size/PERL_STRLEN_ROUNDUP, and modify this chain. :-(

I tried several values from (newlen >> 0) to (newlen >> 6) for the
best compromise between execution time and memory usage and my
personal favorite is (newlen >> 2). What do others here think?

My approach is never to take responsibility for such decisions.
Make a default value, and shift responsibility to the user. ;-)

#ifndef PERL_STRLEN_ROUNDUP_SHIFT
# define PERL_STRLEN_ROUNDUP_SHIFT 2
#endif

The suggestion to use the OLD length is also very viable...

1E5 chars + 1E4 x 1E2 chars: 1.5 ms

I hope your `ms' are actually seconds. It does not make sense to
measure performance on runs shorter than a second (maybe more on Win,
which is doing more unknown stuff in background)...

Yours,
Ilya

Wolfram Humann · Jul 29, 2010

I think you make your life too simple. What you must do is find the
chain of events which sets
Perl_safesysmalloc_size/PERL_STRLEN_ROUNDUP, and modify this chain. :-(

I'm traveling territory that's fairly unknown to me already, so if
this is too simple I should leave it to someone more knowledgeable to
do it right. What I *think* is, that PERL_STRLEN_ROUNDUP is just
concerned with memory boundary alignment, e.g. roundup to the next
multiple of 4. If this is the case, it should be independent of any
string memory expansion strategy.

My approach is never to take responsibility for such decisions.
Make a default value, and shift responsibility to the user. ;-)

#ifndef PERL_STRLEN_ROUNDUP_SHIFT
# define PERL_STRLEN_ROUNDUP_SHIFT 2
#endif

Given that nobody stumbled across the devastating current state, I
don't envision hoards of users trying to optimize this. But I agree
that it serves well as a reminder for someone reading the code that
this is not *the* correct value but a trade-off and could be decided
differently. However, given that IMO it's a different concept from the
existing PERL_STRLEN_ROUNDUP, I would prefer to give it a different
name. How about PERL_STRLEN_EXPAND_SHIFT?

The suggestion to use the OLD length is also very viable...
Agreed.

I hope your `ms' are actually seconds. It does not make sense to
measure performance on runs shorter than a second (maybe more on Win,
which is doing more unknown stuff in background)...

No, these are milliseconds. Yes, this makes it a lousy benchmark. Yes,
these numbers do vary easily by +-30% and sometimes more from run to
run. I tried to post "average" runs, but that's even more subjective,
of course. However the time differences encountered between different
cases are often a factor of 10 or even much more (in the unmodified
case, many of these do run for several seconds ;-)), so I think this
is a valid base for comparison.

So my current proposal reads like this:

#ifndef PERL_STRLEN_EXPAND_SHIFT
# define PERL_STRLEN_EXPAND_SHIFT 2
#endif

if (newlen > SvLEN(sv)) { /* need more room? */
size_t minlen = SvCUR(sv);
minlen += (minlen >> PERL_STRLEN_EXPAND_SHIFT) + 10;
if (newlen < minlen) newlen = minlen;
#ifndef Perl_safesysmalloc_size
newlen = PERL_STRLEN_ROUNDUP(newlen);
#endif
if (SvLEN(sv) && s) {
s = (char*)saferealloc(s, newlen);
}

My benchmark script does run slower now. The previous version did
expand allocated memory beyond minimum requirements during the initial
assignment, so that the reallocation count during append was 0 in many
cases. The new version does not do that so that at least 1 realloc is
required when one starts appending to the string.

Wolfram

speed of subroutine calls in perl	1	Jul 5, 2013
Backtick command with long output super slow	5	Dec 4, 2012
polling TCP server on Win32 (with Tk)	1	Mar 22, 2011
The devolution of English language and slothful c.l.p behaviors exposed!	50	Jan 24, 2012
Out of memory in vec	10	Aug 10, 2007
abort at the end of data transfer	1	Mar 21, 2008
need help with a cart I inherited, need to increase number of total characters allowed	3	Oct 22, 2007
How bad is $'? (Was: "Get substring of line")	4	Jan 18, 2005

Speed of reading some MB of data using qx(...)

Ilya Zakharevich

Ilya Zakharevich

Peter J. Holzer

Wolfram Humann

Peter J. Holzer

Ilya Zakharevich

Wolfram Humann

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads