perl out of memory

Discussion in 'Perl Misc' started by xlue897@rogers.com, May 2, 2007.

  1. Guest

    Hi,

    I need to find the maximum number in a large file. When I use the
    Perl code below, it fails with the error: Out of memory! Bus
    error.

    Perl Code:
    perl -e '
    for(<>)
    {if ($_>$max){$max=$_;}}
    print $max;'
    <large_size_file

    Also, can command-line perl with -n be structured like awk, i.e.
    'BEGIN{code}
    {code}
    END{code}
    '

    Thanks

    Steven
    , May 2, 2007
    #1

  2. wrote:
    > Hi,
    >
    > I have a query to search the max number in a large size file. When I
    > use the perl code below, it generates error : Out of memory! Bus
    > error.
    >
    > Perl Code:
    > perl -e '
    > for(<>)


    Replace 'for' with 'while'.
    The magic of reading a line at a time works for 'while(<>)' only.

    jue
    Jürgen Exner, May 2, 2007
    #2

  3. wrote:
    >
    > I have a query to search the max number in a large size file. When I
    > use the perl code below, it generates error : Out of memory! Bus
    > error.
    >
    > Perl Code:
    > perl -e '
    > for(<>)


    You are using a for loop so perl has to read the entire file first into a list
    in memory. Use a while loop instead.


    > {if ($_>$max){$max=$_;}}
    > print $max;'
    > <large_size_file


    perl -lne'$max = $_ if $_ > $max; END { print $max }' large_size_file


    > Also, can command line perl with -n run like awk -
    > 'BEGIN{code}
    > {code}
    > END{code}
    > '


    Yes.



    John
    --
    Perl isn't a toolbox, but a small machine shop where you can special-order
    certain sorts of tools at low cost and in short order. -- Larry Wall
    John W. Krahn, May 2, 2007
    #3
  4. On Wed, 02 May 2007 15:31:13 GMT, "Jürgen Exner"
    <> wrote:

    >The magic of reading a line at a time works for 'while(<>)' only.


    ATM! :)


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
    Michele Dondi, May 2, 2007
    #4
  5. Guest

    On May 2, 1:21 pm, Michele Dondi <> wrote:
    > On Wed, 02 May 2007 15:31:13 GMT, "Jürgen Exner"
    >
    > <> wrote:
    > >The magic of reading a line at a time works for 'while(<>)' only.

    >
    > ATM! :)
    >
    > Michele
    > --
    > {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    > (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    > .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    > 256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,


    Thanks, everyone, for your help. The while loop works. However, the
    perl code seems much slower than the awk code. For the same file of
    around 5M records, awk takes only 1 min to find the max value, while
    perl takes around 20 mins. Is perl slower than awk?


    Thanks.

    Steven
    , May 4, 2007
    #5
  6. On 4 May 2007 10:33:26 -0700, wrote:

    >> ATM! :)
    >>
    >> Michele
    >> --
    >> {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr


    What's with quoting the .sig? (If not discussing it, that is. But this
    is generally the case with Abigail's!)

    >Thanks everyone for your help. The while loop works. However, the
    >perl code seems much slower than awk code. For the same file size
    >around 5M records, the awk takes only 1 min to loop to find the max
    >value, the perl takes around 20 mins. Does perl slower than awk?


    Hard to say, without seeing any code. Find it hard to believe, though:

    cognac:~ [21:23:58]$ perl -le 'print rand for 1..5_000_000' > test
    cognac:~ [21:24:19]$ time perl -ne '$m=$_>$m?$_:$m;END{print $m}' test
    0.999999995290754

    real 0m8.604s
    user 0m7.160s
    sys 0m1.368s


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
    Michele Dondi, May 4, 2007
    #6
  7. Guest

    On May 4, 3:25 pm, Michele Dondi <> wrote:
    > On 4 May 2007 10:33:26 -0700, wrote:
    >
    > >> ATM! :)

    >
    > >> Michele
    > >> --
    > >> {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr

    >
    > What's with quoting the .sig? (If not discussing it, that is. But this
    > is generally the case with Abigail's!)
    >
    > >Thanks everyone for your help. The while loop works. However, the
    > >perl code seems much slower than awk code. For the same file size
    > >around 5M records, the awk takes only 1 min to loop to find the max
    > >value, the perl takes around 20 mins. Does perl slower than awk?

    >
    > Hard to say, without seeing any code. Find it hard to believe, though:
    >
    > cognac:~ [21:23:58]$ perl -le 'print rand for 1..5_000_000' > test
    > cognac:~ [21:24:19]$ time perl -ne '$m=$_>$m?$_:$m;END{print $m}'
    > test
    > 0.999999995290754
    >
    > real 0m8.604s
    > user 0m7.160s
    > sys 0m1.368s
    >
    > Michele
    > --
    > {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    > (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    > .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    > 256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,


    Here is the test code with its results. The test file was generated by:
    perl -le 'print rand for 1..5_000_000' > test.txt

    $ time awk -F'.' '{ if($2 > max) {max = $2;} } END{print max;}' <test.txt
    999969482421875

    real 0m18.16s
    user 0m17.38s
    sys 0m0.18s

    $ time perl -a -F'\.' -n -e '{ if($F[1] >$max) {$max=$F[1];} } END{print $max;}' test.txt
    999969482421875

    real 0m41.57s
    user 0m41.14s
    sys 0m0.16s


    BTW, why doesn't the code below work?
    perl -a -F/\./ -n -e '{print $F[1], "\n";} ' test.txt


    Thanks,
    Steven
    , May 4, 2007
    #7
  8. On 4 May 2007 13:33:34 -0700, wrote:

    >> >perl code seems much slower than awk code. For the same file size
    >> >around 5M records, the awk takes only 1 min to loop to find the max
    >> >value, the perl takes around 20 mins. Does perl slower than awk?

    [snip]
    >Here is the test code with result. test file generated by (perl -le
    >'print rand for 1..5_000_000' > test.txt)
    >$time awk -F'.' '{ if($2 > max) {max = $2;} } END{print max;}'
    ><test.txt
    >999969482421875
    >
    >real 0m18.16s
    >user 0m17.38s
    >sys 0m0.18s
    >
    >$time perl -a -F'\.' -n -e '{ if($F[1] >$max) {$max=$F[1];} }
    >END{print $max;}' test.txt
    >999969482421875
    >
    >real 0m41.57s
    >user 0m41.14s
    >sys 0m0.16s


    Well, awk does indeed appear to be faster, but not by the margin you
    suggested above. Anyway, this *does* surprise me, but not too much:
    afaik awk is a specialized tool and Perl a full-fledged language
    (although one supposed to excel in the same areas).

    >BTW, why the code below doesn't work?
    >perl -a -F/\./ -n -e '{print $F[1], "\n";} ' test.txt


    That should be -F'/\./'. Otherwise the shell consumes the backslash
    (to the shell it merely quotes the dot), so perl sees the pattern
    /./, which matches any character and is *not* what you want.


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
    Michele Dondi, May 5, 2007
    #8
  9. Guest

    wrote:
    > >
    > > >Thanks everyone for your help. The while loop works. However, the
    > > >perl code seems much slower than awk code. For the same file size
    > > >around 5M records, the awk takes only 1 min to loop to find the max
    > > >value, the perl takes around 20 mins. Does perl slower than awk?

    > >

    ....
    >
    > Here is the test code with result. test file generated by (perl -le
    > 'print rand for 1..5_000_000' > test.txt)
    > $time awk -F'.' '{ if($2 > max) {max = $2;} } END{print max;}'
    > <test.txt
    > 999969482421875
    >
    > real 0m18.16s
    > user 0m17.38s
    > sys 0m0.18s
    >
    > $time perl -a -F'\.' -n -e '{ if($F[1] >$max) {$max=$F[1];} }
    > END{print $max;}' test.txt
    > 999969482421875
    >
    > real 0m41.57s
    > user 0m41.14s
    > sys 0m0.16s


    So the difference here is less than a factor of 3, rather than the factor
    of 20 you originally said. A factor of 3 is easy to believe. Different
    languages have different strengths.

    >
    > BTW, why the code below doesn't work?
    > perl -a -F/\./ -n -e '{print $F[1], "\n";} ' test.txt


    The shell eats the backslash, so Perl never sees it and treats . as the
    special character rather than as a literal. It often helps to use echo
    to tell you exactly what Perl is seeing once the shell is done:


    $ echo F/\./
    F/./

    $ echo 'F/\./'
    F/\./

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
    , May 10, 2007
    #9
