Fast random string generation

Discussion in 'Perl Misc' started by Derek Fountain, Oct 24, 2004.

  1. I need to generate a string of random characters, say about 20,000
    characters long. And I need to do it quickly! I tried the obvious:

    for( my $i=0; $i<20000; $i++ ) {
    $str .= chr(int(rand(256)));
    }

    which works, but I have a feeling there must be a faster way than doing
    20,000 string concatenations...?
     
    Derek Fountain, Oct 24, 2004
    #1
    1. Advertising

  2. Derek Fountain

    Dave Oswald Guest

    "Derek Fountain" wrote in message
    > I need to generate a string of random characters, say about 20,000
    > characters long. And I need to do it quickly! I tried the obvious:
    >
    > for( my $i=0; $i<20000; $i++ ) {
    > $str .= chr(int(rand(256)));
    > }
    >
    > which works, but I have a feeling there must be a faster way than doing
    > 20,000 string concatenations...?


    If speed actually matters (which it may not), you'll have to benchmark this
    solution, comparing it to yours. They may turn out to be reasonably close
    to each other in speed.

    my $string = join '', map { chr int rand(256) } 1 .. 20_000;

    Here's another solution that pre-generates the 'chr' possibilities, and
    thus, MAY be a little more efficient.

    my @characters = map { chr $_ } 0 .. 255;
    my $string = join '', map { $characters[ rand 256 ] } 1 .. 20_000;

    Dave
     
    Dave Oswald, Oct 24, 2004
    #2
    1. Advertising

  3. Derek Fountain

    Ala Qumsieh Guest

    Derek Fountain wrote:

    > I need to generate a string of random characters, say about 20,000
    > characters long. And I need to do it quickly! I tried the obvious:
    >
    > for( my $i=0; $i<20000; $i++ ) {
    > $str .= chr(int(rand(256)));
    > }
    >
    > which works, but I have a feeling there must be a faster way than doing
    > 20,000 string concatenations...?


    I didn't benchmark, but I would guess it's faster to call chr() 256
    times instead of 20000 times:

    # untested code
    my @list = map chr, 0 .. 256;
    my $str = '';
    $str .= $list[rand @list] for 1 .. 20_000;

    --Ala
     
    Ala Qumsieh, Oct 24, 2004
    #3
  4. Derek Fountain <> wrote:

    > I need to generate a string of random characters, say about
    > 20,000 characters long. And I need to do it quickly! I tried the
    > obvious:
    >
    > for( my $i=0; $i<20000; $i++ ) {
    > $str .= chr(int(rand(256)));
    > }
    >
    > which works, but I have a feeling there must be a faster way
    > than doing 20,000 string concatenations...?


    my $str = "\000" x 20_000;
    my @chr = map { chr $_ } 0 .. 255;
    substr $str, $_, 1, $chr[rand 256] for 1 .. 20_000;

    Peter

    --
    #!/local/bin/perl5 -wp -*- mode: cperl; coding: iso-8859-1; -*-
    # matlab comment stripper (strips comments from Matlab m-files)
    s/^((?:(?:[])}\w.]'+|[^'%])+|'[^'\n]*(?:''[^'\n]*)*')*).*/$1/x;
     
    Peter J. Acklam, Oct 24, 2004
    #4
  5. >>>>> "Peter" == Peter J Acklam <> writes:

    Peter> Derek Fountain <> wrote:
    >> I need to generate a string of random characters, say about
    >> 20,000 characters long. And I need to do it quickly! I tried the
    >> obvious:
    >>
    >> for( my $i=0; $i<20000; $i++ ) {
    >> $str .= chr(int(rand(256)));
    >> }
    >>
    >> which works, but I have a feeling there must be a faster way
    >> than doing 20,000 string concatenations...?


    Peter> my $str = "\000" x 20_000;
    Peter> my @chr = map { chr $_ } 0 .. 255;
    Peter> substr $str, $_, 1, $chr[rand 256] for 1 .. 20_000;

    maybe:

    pack "C*", map rand 256 for 1 .. 20_000;

    --
    Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
    <> <URL:http://www.stonehenge.com/merlyn/>
    Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
    See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
     
    Randal L. Schwartz, Oct 24, 2004
    #5
  6. Derek Fountain

    Bart Lateur Guest

    Ala Qumsieh wrote:

    >I didn't benchmark, but I would guess it's faster to call chr() 256
    >times instead of 20000 times:
    >
    > # untested code
    > my @list = map chr, 0 .. 256;
    > my $str = '';
    > $str .= $list[rand @list] for 1 .. 20_000;


    It's also not very random. I don't think it's the idea to have a repeat
    period of 256 cycles.

    --
    Bart.
     
    Bart Lateur, Oct 25, 2004
    #6
  7. Derek Fountain

    Bart Lateur Guest

    Derek Fountain wrote:

    >I need to generate a string of random characters, say about 20,000
    >characters long. And I need to do it quickly! I tried the obvious:
    >
    > for( my $i=0; $i<20000; $i++ ) {
    > $str .= chr(int(rand(256)));
    > }
    >
    >which works, but I have a feeling there must be a faster way than doing
    >20,000 string concatenations...?


    I have the following basic 3 ideas:

    1) prebuild a string of 20000 bytes, and use substr to replace the
    concatenation:

    $x = " " x 20000;
    substr($x, $_, 1) = chr rand 256 for 0 .. 19999;


    2) ditto, but use vec() (bypasses chr()):

    $x = " " x 20000;
    vec($x, $_, 8) = rand 256 for 0 .. 19999;


    3) use pack 'C*' to replace all of the chr() calls + concat/join:

    $x = pack 'C*', map int rand 256, 1 .. 20000;

    --
    Bart.
     
    Bart Lateur, Oct 25, 2004
    #7
  8. Bart Lateur wrote:
    > Ala Qumsieh wrote:
    >
    >
    >>I didn't benchmark, but I would guess it's faster to call chr() 256
    >>times instead of 20000 times:
    >>
    >> # untested code
    >> my @list = map chr, 0 .. 256;
    >> my $str = '';
    >> $str .= $list[rand @list] for 1 .. 20_000;

    >
    >
    > It's also not very random. I don't think it's the idea to have a repeat
    > period of 256 cycles.


    Neither suggestion does.

    --
    John W. Kennedy
    "The poor have sometimes objected to being governed badly; the rich have
    always objected to being governed at all."
    -- G. K. Chesterton. "The Man Who Was Thursday"
     
    John W. Kennedy, Oct 25, 2004
    #8
  9. On Sun, 24 Oct 2004 11:38:30 +0800, Derek Fountain
    <> wrote:

    >I need to generate a string of random characters, say about 20,000
    >characters long. And I need to do it quickly! I tried the obvious:


    OK, here are the benchmarks of some of the proposed solutions along
    with some other idea.

    BTW: I'm to say the least in a rush and I could absoultely NOT verify
    that my 'Pack2' solution is correct. But in case it is notm then I
    guess that it can be easily corrected.

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Benchmark qw/:all/;

    cmpthese 500, {
    Loop1 => \&Loop1,
    Loop2 => \&Loop2,
    Map1 => \&Map1,
    Map2 => \&Map2,
    Subst => \&Subst,
    Vec => \&Vec,
    Pack1 => \&Pack1,
    Pack2 => \&Pack2,
    S => \&S
    };


    sub Loop1 {
    my $str = '';
    $str .= chr int rand 256 for 1..20_000;
    $str;
    }

    sub Loop2 {
    my @chrs = map chr, 0..255;
    my $str = '';
    $str .= $chrs[rand 256] for 1..20_000;
    $str;
    }

    sub Map1 {
    join '', map { chr int rand 256 } 1..20_000;
    }

    sub Map2 {
    my @chrs = map chr, 0..255;
    join '', map $chrs[rand 256], 1..20_000;
    }

    sub Subst {
    my $str = "\000" x 20_000;
    my @chrs = map chr, 0..255;
    substr $str, $_, 1, $chrs[rand 256] for 0..19_999;
    $str;
    }

    sub Vec {
    my $str = ' ' x 20000;
    vec($str, $_, 8) = rand 256 for 0..19_999;
    $str;
    }

    sub Pack1 {
    pack 'C*', map int rand 256, 1..20_000;
    }

    sub Pack2 {
    # Is this OK, BTW?
    pack 'L*', map rand ~0, 1..5_000;
    }

    sub S {
    local $_ = ' ' x 20_000;
    s/ /chr int rand 256/ge;
    $_;
    }

    __END__


    Rate Map2 Map1 S Subst Vec Pack1 Loop1 Loop2 Pack2
    Map2 47.8/s -- -1% -5% -32% -36% -39% -44% -48% -84%
    Map1 48.3/s 1% -- -4% -32% -35% -38% -43% -47% -84%
    S 50.2/s 5% 4% -- -29% -33% -36% -41% -45% -84%
    Subst 70.6/s 48% 46% 41% -- -5% -9% -17% -22% -77%
    Vec 74.6/s 56% 54% 49% 6% -- -4% -12% -18% -76%
    Pack1 78.0/s 63% 61% 55% 10% 5% -- -8% -14% -75%
    Loop1 84.7/s 77% 75% 69% 20% 14% 9% -- -7% -72%
    Loop2 91.1/s 91% 89% 81% 29% 22% 17% 7% -- -70%
    Pack2 307/s 542% 535% 511% 334% 311% 293% 262% 237% --


    Any correction/cmt/etc. welcome!


    HTH,
    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
     
    Michele Dondi, Oct 25, 2004
    #9
  10. Derek Fountain

    Bart Lateur Guest

    Michele Dondi wrote:

    > sub Pack2 {
    > # Is this OK, BTW?
    > pack 'L*', map rand ~0, 1..5_000;
    > }


    I don't think you'll ever see "\xFF\xFF\xFF\xFF" there. Better add 1 to
    it.

    --
    Bart.
     
    Bart Lateur, Oct 25, 2004
    #10
  11. On Mon, 25 Oct 2004 16:36:17 +0200, Michele Dondi
    <> wrote:

    >OK, here are the benchmarks of some of the proposed solutions along
    >with some other idea.
    >
    >BTW: I'm to say the least in a rush and I could absoultely NOT verify
    >that my 'Pack2' solution is correct. But in case it is notm then I
    >guess that it can be easily corrected.


    OK, I'm back again and now I have some more time. I done some *quick*
    tests on 'Pack2' and[*] it *seems* to work as expected. At least, I
    can't see any big misbehaviour...

    In the meanwhile I thought that *if* the program will be run only
    under Linux (don't know about other unices), then it may read from
    /dev/urandom as well. So I added, more out of curiosity than for other
    reasons, a sub like

    sub Dev1 {
    open my $fh, '<', '/dev/urandom' or
    die "Can't access `/dev/urandom': $!\n";
    local $/ = \20_000;
    scalar <$fh>;
    }

    and I noticed that it was much faster than I had expected, so I also
    tested a version

    sub Dev2 {
    local $/ = \20_000;
    scalar <$fh>;
    }

    that doesn't open() /dev/urandom each time. The speed increase is
    noticeable but not impressive. Here are the updated results (omitting
    qw/Map1 Map2 S/ solutions from previous post):


    Rate Subst Vec Pack1 Loop1 Loop2 Dev1 Dev2 Pack2
    Subst 71.0/s -- -3% -9% -18% -24% -73% -73% -78%
    Vec 73.2/s 3% -- -6% -15% -21% -72% -73% -77%
    Pack1 78.1/s 10% 7% -- -10% -16% -70% -71% -76%
    Loop1 86.4/s 22% 18% 11% -- -7% -67% -68% -73%
    Loop2 93.1/s 31% 27% 19% 8% -- -64% -65% -71%
    Dev1 259/s 265% 254% 232% 200% 178% -- -3% -19%
    Dev2 267/s 276% 265% 242% 210% 187% 3% -- -17%
    Pack2 321/s 351% 338% 310% 271% 244% 24% 20% --


    [*] Apart the naive assumption that ~0 will yield (2**32-1), which in
    fact does (here, now!) - but of course it is not reliable in terms of
    portability...


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
     
    Michele Dondi, Oct 25, 2004
    #11
  12. Also sprach Michele Dondi:

    > On Mon, 25 Oct 2004 16:36:17 +0200, Michele Dondi


    > [*] Apart the naive assumption that ~0 will yield (2**32-1), which in
    > fact does (here, now!) - but of course it is not reliable in terms of
    > portability...


    It's a pretty safe assumption, though, that nowadays integers are at
    least 32 bits wide. So you will have to guard against 64-bit machines.
    You can force 32 bits by doing C<~0 & Oxffffffff>.

    Tassilo
    --
    $_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
    pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
    $_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
     
    Tassilo v. Parseval, Oct 26, 2004
    #12
  13. On Mon, 25 Oct 2004 15:02:17 GMT, Bart Lateur <>
    wrote:

    >> pack 'L*', map rand ~0, 1..5_000;
    >> }

    >
    >I don't think you'll ever see "\xFF\xFF\xFF\xFF" there. Better add 1 to
    >it.


    Obviously! As I wrote, I was really in a big hurry, so I just put down
    what seemed to me the fastest way (in terms of keystrokes/thinkering)
    to do it.


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
     
    Michele Dondi, Oct 26, 2004
    #13
  14. Derek Fountain

    Bart Lateur Guest

    Michele Dondi wrote:

    >OK, I'm back again and now I have some more time. I done some *quick*
    >tests on 'Pack2' and[*] it *seems* to work as expected. At least, I
    >can't see any big misbehaviour...


    I can think of a caveat: I seem to recall that at least some random
    generators only had 24 significant bits. Now if you plan on using 32, it
    won't be completely random. Instead, I expect the lowest byte to always
    be the same, or at least, close to the same few values. (taking odd
    rounding into account)

    --
    Bart.
     
    Bart Lateur, Oct 26, 2004
    #14
  15. On Tue, 26 Oct 2004 06:27:33 +0200, "Tassilo v. Parseval"
    <> wrote:

    >> [*] Apart the naive assumption that ~0 will yield (2**32-1), which in
    >> fact does (here, now!) - but of course it is not reliable in terms of
    >> portability...

    >
    >It's a pretty safe assumption, though, that nowadays integers are at
    >least 32 bits wide. So you will have to guard against 64-bit machines.
    >You can force 32 bits by doing C<~0 & Oxffffffff>.


    But then I would loose the only reason why I used ~0 in the first
    place, i.e. to write 2**32 (err, well: not exactly, as it has already
    been pointed out!) in just as few keystrokes as possible: and no, no
    good reason for golfing but the fact that I *had* to leave home in a
    minute or so! (so that even thinking better about it was not an
    option! ;-)

    Oh, and BTW: if it could have been C<~0 & Oxffffffff>, then it would
    have been C<Oxffffffff>! :)


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
     
    Michele Dondi, Oct 26, 2004
    #15
  16. On Tue, 26 Oct 2004 09:55:40 GMT, Bart Lateur <>
    wrote:

    >I can think of a caveat: I seem to recall that at least some random
    >generators only had 24 significant bits. Now if you plan on using 32, it
    >won't be completely random. Instead, I expect the lowest byte to always
    >be the same, or at least, close to the same few values. (taking odd
    >rounding into account)


    Well, as I wrote, this doesn't seem to be the case. At least under
    Linux, but then see the "Linux vs. Windows: different behaviour [re
    rand()]" thread started anew about this very topic...


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
     
    Michele Dondi, Oct 27, 2004
    #16
  17. On 24 Oct 2004 08:00:46 -0700, (Randal L.
    Schwartz) wrote:

    >Peter> my $str = "\000" x 20_000;
    >Peter> my @chr = map { chr $_ } 0 .. 255;
    >Peter> substr $str, $_, 1, $chr[rand 256] for 1 .. 20_000;


    >maybe:
    >
    > pack "C*", map rand 256 for 1 .. 20_000;


    If you read my other posts in this thread (those with the benchmarks)
    you'll notice that this solution doesn't perform well if compared to a
    naive loop of the kind of that used by the OP. But a pack() solution
    wins if we pack at four[*] bytes at a time (and in the meanwhile we
    discovered that this depends on the RANDBITS compile option).

    However it also resulted that map()-versions of the naive loop
    solutions were, reasonably, slower than these. So I wondered wether a
    solution like 'Loop3' below could perform even better. For ease of
    reading I'll repeat my complete test here:


    #!/usr/bin/perl

    use strict;
    use warnings;

    use Benchmark qw/:all/;

    # For 'Dev2'
    open my $fh, '<', '/dev/urandom' or
    die "Can't access `/dev/urandom': $!\n";

    cmpthese 500, {
    Loop1 => \&Loop1,
    Loop2 => \&Loop2,
    # Map1 => \&Map1,
    # Map2 => \&Map2,
    # Subst => \&Subst,
    Vec => \&Vec,
    Pack1 => \&Pack1,
    Pack2 => \&Pack2,
    # S => \&S,
    Dev1 => \&Dev1,
    Dev2 => \&Dev2,
    Loop3 => \&Loop3 };


    sub Loop1 {
    my $str = '';
    $str .= chr int rand 256 for 1..20_000;
    $str;
    }

    sub Loop2 {
    my @chrs = map chr, 0..255;
    my $str = '';
    $str .= $chrs[rand 256] for 1..20_000;
    $str;
    }

    sub Map1 {
    join '', map { chr int rand 256 } 1..20_000;
    }

    sub Map2 {
    my @chrs = map chr, 0..255;
    join '', map $chrs[rand 256], 1..20_000;
    }

    sub Subst {
    my $str = "\000" x 20_000;
    my @chrs = map chr, 0..255;
    substr $str, $_, 1, $chrs[rand 256] for 0..19_999;
    $str;
    }

    sub Vec {
    my $str = ' ' x 20000;
    vec($str, $_, 8) = rand 256 for 0..19_999;
    $str;
    }

    sub Pack1 {
    pack 'C*', map int rand 256, 1..20_000;
    }

    sub Pack2 {
    # Is this OK, BTW?
    pack 'L*', map rand ~0, 1..5_000;
    }

    sub S {
    local $_ = ' ' x 20_000;
    s/ /chr int rand 256/ge;
    $_;
    }

    sub Dev1 {
    open my $fh, '<', '/dev/urandom' or
    die "Can't access `/dev/urandom': $!\n";
    local $/ = \20_000;
    scalar <$fh>;
    }

    sub Dev2 {
    local $/ = \20_000;
    scalar <$fh>;
    }

    sub Loop3 {
    my $str = '';
    $str .= pack 'L*', rand 2**32 for 1..5_000;
    $str;
    }

    __END__


    Results:

    Rate Vec Pack1 Loop1 Loop2 Loop3 Dev1 Dev2 Pack2
    Vec 74.7/s -- -5% -12% -16% -70% -71% -72% -77%
    Pack1 78.7/s 5% -- -7% -11% -68% -70% -71% -75%
    Loop1 84.7/s 13% 8% -- -4% -65% -67% -68% -74%
    Loop2 88.5/s 18% 12% 4% -- -64% -66% -67% -72%
    Loop3 245/s 228% 211% 189% 177% -- -6% -8% -24%
    Dev1 260/s 248% 231% 207% 194% 6% -- -3% -19%
    Dev2 267/s 258% 240% 216% 202% 9% 3% -- -17%
    Pack2 321/s 329% 307% 278% 262% 31% 23% 20% --


    So: no, it wasn't a good idea...


    [*] Eight on 64-bit CPUs?


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
     
    Michele Dondi, Oct 28, 2004
    #17
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Mathias
    Replies:
    1
    Views:
    1,503
    Robert Kern
    Mar 5, 2005
  2. Mathias
    Replies:
    0
    Views:
    386
    Mathias
    Mar 4, 2005
  3. globalrev
    Replies:
    4
    Views:
    782
    Gabriel Genellina
    Apr 20, 2008
  4. John W. Long

    HTML Generation (Next Generation CGI)

    John W. Long, Nov 22, 2003, in forum: Ruby
    Replies:
    4
    Views:
    367
    John W. Long
    Nov 24, 2003
  5. VK
    Replies:
    15
    Views:
    1,210
    Dr J R Stockton
    May 2, 2010
Loading...

Share This Page