Opening large data file with Perl

Discussion in 'Perl Misc' started by pppe, Apr 15, 2006.

  1. pppe

    pppe Guest

    Hi

    I am reposting this from my post to the Perl Programming group.
    --

    Can someone tell me: when using the open command to read data
    from a large file, say 80 MB, would it load much of the file into
    memory, or would it only take a record at a time?

    To expand further: I currently use a Perl script to open a data file,
    search each record for a match to a user's input, and then store the
    matched data in an array. I stop the search once there are 200
    results, so the array never exceeds that size, but as the data file is
    so large I wonder whether it puts much load on server memory,
    especially if numerous users were accessing it at once.

    Also, should I expect the server load to be greater with such a large
    file due to execution time? I already run this type of script on
    smaller files (5 MB+), but an 80 MB file concerns me on a shared server.

    Thanks
    pppe
     
    pppe, Apr 15, 2006
    #1

  2. robic0

    robic0 Guest

    On 14 Apr 2006 22:14:54 -0700, "pppe" <> wrote:

    >Hi


    >I am reposting this from my post to the Perl Programming group.
    >--


    >Can someone tell me: when using the open command to read data
    >from a large file, say 80 MB, would it load much of the file into
    >memory, or would it only take a record at a time?

    [snip]

    No problem; post some code. This is the Perl code group, so post
    your Perl sample code. If you post only abstract concepts, you
    might not get a response here.
     
    robic0, Apr 15, 2006
    #2

  3. Anno Siegel

    Anno Siegel Guest

    pppe <> wrote in comp.lang.perl.misc:
    > Hi
    >
    > I am reposting this from my post to the Perl Programming group.
    > --
    >
    > Can someone tell me: when using the open command to read data
    > from a large file, say 80 MB, would it load much of the file into
    > memory, or would it only take a record at a time?


    The size of the file doesn't matter when you open it. How much of the
    file is in memory during processing depends on how you do the processing.
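
    For example, a sketch (not from the original question; the file name
    and chunk size are illustrative) that reads fixed-size chunks keeps
    memory bounded by the buffer size no matter how large the file is:

    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $fh, '<', 'file.dat' or die "Can't open file.dat: $!";
    my $buf;
    while (read($fh, $buf, 8192)) {   # at most 8192 characters in $buf
        # process each chunk in $buf here
    }
    close $fh or die "Can't close file.dat: $!";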

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
     
    Anno Siegel, Apr 15, 2006
    #3
  4. Joe Smith

    Joe Smith Guest

    pppe wrote:

    > Can someone tell me: when using the open command to read data
    > from a large file, say 80 MB, would it load much of the file into
    > memory, or would it only take a record at a time?


    Depends entirely on what code you are using to do the reading.

    @entire_file = <>;    # reads the entire file all at once
    whereas
    while (<>) { ... }    # reads one line at a time

    > Also, should I expect the server load to be greater with such a large
    > file due to execution time?


    If the average number of lines required to get the data you want is
    the same, then it does not matter how big the unread portion of
    the file is.
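
    A sketch of that point (the /pattern/ match and the 200-result cap
    come from your description; adjust as needed): once enough matches
    are found, the rest of the file is simply never read.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $count = 0;
    while (my $line = <>) {
        next unless $line =~ /pattern/;
        print $line;
        last if ++$count >= 200;   # stop; unread lines never load
    }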
    -Joe
     
    Joe Smith, Apr 15, 2006
    #4
  5. Bart Van der Donck

    Bart Van der Donck Guest

    pppe wrote:

    > Can someone tell me: when using the open command to read data
    > from a large file, say 80 MB, would it load much of the file into
    > memory, or would it only take a record at a time?


    You should process an 80 MB file line by line and operate on each
    record directly in the read loop, without copying it into extra variables.

    > To expand further: I currently use a Perl script to open a data file,
    > search each record for a match to a user's input, and then store the
    > matched data in an array. I stop the search once there are 200
    > results, so the array never exceeds that size, but as the data file is
    > so large I wonder whether it puts much load on server memory,
    > especially if numerous users were accessing it at once.


    I think the most memory-efficient way is the following:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Fcntl ':flock';   # for the LOCK_SH constant

    my @matched = ();
    my $file = 'file.dat';

    # Note the low-precedence 'or' rather than '||': with '||' the die
    # would bind to the last argument, not to open's return value.
    open my $F, '<', $file or die "Can't open $file: $!";
    flock($F, LOCK_SH) or die "Can't get LOCK_SH on $file: $!";
    while (<$F>) {
        push @matched, $_ if /pattern/;
        last if @matched >= 200;   # the 200-result cap you mentioned
    }
    close $F or die "Can't close $file: $!";

    # small report utility
    print for @matched;

    See also perlfaq3: "How can I make my Perl program take less
    memory?"

    Hope this helps,

    --
    Bart
     
    Bart Van der Donck, Apr 15, 2006
    #5
  6. Jürgen Exner

    Jürgen Exner Guest

    pppe wrote:
    > I am reposting this from my post to the Perl Programming group.


    Why did you separate the bulk of your post with a signature separator?
    My newsreader automatically snips signatures, so now I have to manually
    copy and paste the text I want to quote.

    <quote>
    when using the open command to read data
    from a large file, say 80 MB, would it load much of the file into
    memory, or would it only take a record at a time?
    </quote>

    Neither. open() doesn't load any content into memory; it just opens
    the file. How you read it is entirely up to you:

    To read the next line:
    while (<MYFILE>)
    or
    $line = <MYFILE>

    To read the whole file:
    @everything = <MYFILE>
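
    The same reads with a three-argument open and a lexical filehandle
    (the file name here is illustrative):

    open my $fh, '<', 'data.txt' or die "Can't open data.txt: $!";

    my $line = <$fh>;         # reads only the next line
    my @everything = <$fh>;   # reads all remaining lines at once

    close $fh;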


    jue
     
    Jürgen Exner, Apr 15, 2006
    #6
  7. Brian Wakem

    Brian Wakem Guest

    Jürgen Exner wrote:

    > pppe wrote:
    >> I am reposting this from my post to the Perl Programming group.

    >
    > Why did you separate the bulk of your post with a signature separator?
    > My newsreader automatically snips signatures, so now I have to manually
    > copy and paste the text I want to quote.



    Your newsreader is broken; that was not a sig separator.



    --
    Brian Wakem
    Email: http://homepage.ntlworld.com/b.wakem/myemail.png
     
    Brian Wakem, Apr 15, 2006
    #7
  8. Tad McClellan

    Tad McClellan Guest

    pppe <> wrote:

    > Can someone tell me: when using the open command to read data



    The open() function does not read data!

    "opening a file" and "reading from a file" are distinct operations.


    > from a large file, say 80 MB, would it load much of the file into
    > memory, or would it only take a record at a time?



    It would not load _any_ of the file contents into memory.

    How much memory will be used depends on how you are *reading* the file.

    How are you reading the file?
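
    To make the distinction concrete (the file name is illustrative):

    # open() only creates the filehandle; nothing has been read yet,
    # no matter how large big.dat is.
    open my $fh, '<', 'big.dat' or die "Can't open big.dat: $!";

    my $record = <$fh>;   # the first read happens here: one record
    close $fh;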


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Apr 15, 2006
    #8
