using LWP to get a very large file

Discussion in 'Perl Misc' started by justme, Apr 28, 2005.

  1. justme

    justme Guest

    hi

    I have a remote machine running an application that generates very
    large log files, on the order of megabytes; say 60 MB on average.
    Normally, we need to connect to this remote machine by keying in a URL
    in a browser, such as http://remote:123/logs. The browser then
    displays tab-delimited columns of data.

    What I want to do is use the Perl LWP module to get this log, as I do
    not want to go to the physical machine and use the browser to get the
    logs. Then, according to some filtering parameters, display the log
    entries that match. For example, when filtering by a certain date,
    display only the contents for that date.

    Questions:
    1) This log file is very big, 60 MB at least. So, is LWP the right
    module to use, or is there a better module for dealing with large files?
    2) While getting the log file, is it better in terms of memory usage
    to parse the data "on the fly", or to get the whole file and do the
    parsing afterwards?
    3) Because I am not at the physical machine, I can't really do
    something like a "tail" that displays the data in real time. Is there
    any way to do a "tail" on the log file remotely?

    thanks..
     
    justme, Apr 28, 2005
    #1

  2. John Bokma

    John Bokma Guest

    justme wrote:

    > hi
    >
    > I have a remote machine running an application that generates very
    > large log files, on the order of megabytes; say 60 MB on average.
    > Normally, we need to connect to this remote machine by keying in a URL
    > in a browser, such as http://remote:123/logs. The browser then
    > displays tab-delimited columns of data.
    >
    > What I want to do is use the Perl LWP module to get this log, as I do
    > not want to go to the physical machine and use the browser to get the
    > logs. Then, according to some filtering parameters, display the log
    > entries that match. For example, when filtering by a certain date,
    > display only the contents for that date.
    >
    > Questions:
    > 1) This log file is very big, 60 MB at least. So, is LWP the right
    > module to use, or is there a better module for dealing with large files?
    > 2) While getting the log file, is it better in terms of memory usage
    > to parse the data "on the fly", or to get the whole file and do the
    > parsing afterwards?
    > 3) Because I am not at the physical machine, I can't really do
    > something like a "tail" that displays the data in real time. Is there
    > any way to do a "tail" on the log file remotely?


    I use plink for stuff like this (part of PuTTY), e.g.:

    plink -ssh -pw password gzip -9 -c logs/error_log |
    gzip -d > site/logs/error_log

    But this only works if you can get the log via SSH.

    My log, uncompressed, is around 130 MB :)

    --
    John Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Apr 28, 2005
    #2

  3. Mark Clements

    Mark Clements Guest

    John Bokma wrote:
    > justme wrote:
    >>I have a remote machine running an application that generates very
    >>large log files, on the order of megabytes; say 60 MB on average.
    >>Normally, we need to connect to this remote machine by keying in a URL
    >>in a browser, such as http://remote:123/logs. The browser then
    >>displays tab-delimited columns of data.

    <snip>
    >>Questions:
    >>1) This log file is very big, 60 MB at least. So, is LWP the right
    >>module to use, or is there a better module for dealing with large files?
    >>2) While getting the log file, is it better in terms of memory usage
    >>to parse the data "on the fly", or to get the whole file and do the
    >>parsing afterwards?
    >>3) Because I am not at the physical machine, I can't really do
    >>something like a "tail" that displays the data in real time. Is there
    >>any way to do a "tail" on the log file remotely?

    >
    >
    > I use plink for stuff like this (part of PuTTY), e.g.:
    >
    > plink -ssh -pw password gzip -9 -c logs/error_log |
    > gzip -d > site/logs/error_log
    >
    > But this only works if you can get the log via SSH.
    >
    > My log, uncompressed, is around 130 MB :)
    >

    Numbered answers:
    1. You might be better off using something like wget if you want to do
    this, not because LWP can't handle files of that size, but because it
    gives you a bunch of functionality (e.g. resume) without doing any
    programming.

    2. It depends on the structure of the data and the amount of memory on
    the parsing machine. 60 MB isn't a vast amount of data to slurp in one
    go, but bear in mind that any in-memory data structure you build will
    take more space than that. Check out perldoc -q memory. If memory is a
    worry, you can also have LWP write straight to disk (see the sketch
    below).

    3. Well: if he can get SSH access, then he could run tail that way.
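
    If you do want to stay with LWP, here is a minimal sketch that streams
    the response to a file and filters it afterwards. The output filename,
    the date value, and the assumption that the date is the first
    tab-delimited column are made up for illustration; the URL is the one
    from the original post:

    use strict;
    use warnings;
    use LWP::UserAgent;

    my $url  = 'http://remote:123/logs';   # URL from the original post
    my $file = 'remote_logs.tsv';          # local copy (name made up)

    my $ua = LWP::UserAgent->new;

    # ':content_file' makes LWP write the body to disk as it arrives,
    # so the 60 MB log never has to fit in memory in one piece.
    my $response = $ua->get( $url, ':content_file' => $file );
    die "Download failed: ", $response->status_line, "\n"
        unless $response->is_success;

    # Filter afterwards, line by line.
    my $wanted_date = '2005-04-28';        # example filter value
    open my $fh, '<', $file or die "Can't read $file: $!";
    while ( my $line = <$fh> ) {
        my @cols = split /\t/, $line;      # tab-delimited columns
        # Assumes the date is the first column; adjust to the real layout.
        print $line if $cols[0] eq $wanted_date;
    }
    close $fh;

    If you would rather filter while downloading, the same request methods
    take a ':content_cb' callback instead of ':content_file'.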

    Mark
     
    Mark Clements, Apr 28, 2005
    #3
  4. peter pilsl

    peter pilsl Guest

    justme wrote:
    > hi
    >
    > I have a remote machine running an application that generates very
    > large log files, on the order of megabytes; say 60 MB on average.
    > Normally, we need to connect to this remote machine by keying in a URL
    > in a browser, such as http://remote:123/logs. The browser then
    > displays tab-delimited columns of data.
    >
    > Questions:
    > 1) This log file is very big, 60 MB at least. So, is LWP the right
    > module to use, or is there a better module for dealing with large files?
    > 2) While getting the log file, is it better in terms of memory usage
    > to parse the data "on the fly", or to get the whole file and do the
    > parsing afterwards?
    > 3) Because I am not at the physical machine, I can't really do
    > something like a "tail" that displays the data in real time. Is there
    > any way to do a "tail" on the log file remotely?
    >


    wget is the tool of choice. Compression on the server side is
    recommended; you can save a lot of traffic and download time that way.
    If you run a normal web server such as Apache on the server side,
    enabling optional compression is easy.
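
    On the client side, a minimal sketch of taking advantage of that
    compression with LWP (assuming a reasonably recent LWP/HTTP::Message;
    the URL is the placeholder from the original post, and note that
    decoded_content() holds the whole decompressed body in memory):

    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Message;

    my $ua = LWP::UserAgent->new;

    # Advertise whichever compressed encodings this install can decode
    # (typically "gzip, x-gzip, deflate").
    my $response = $ua->get(
        'http://remote:123/logs',
        'Accept-Encoding' => HTTP::Message::decodable(),
    );
    die "Fetch failed: ", $response->status_line, "\n"
        unless $response->is_success;

    # decoded_content() transparently gunzips the body if the server
    # compressed it.
    my $log_text = $response->decoded_content;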

    It's better to fetch the file and do the parsing afterwards. That just
    makes things easier.

    You can do a remote tail if the server supports resumed downloads. I
    don't know exactly how I would implement it, but it is much the same
    thing as a resumed download.
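
    Something along these lines might work, assuming the server honours
    HTTP Range requests (the URL is the placeholder from the original post
    and the poll interval is arbitrary):

    use strict;
    use warnings;
    use LWP::UserAgent;

    my $url    = 'http://remote:123/logs';   # URL from the original post
    my $ua     = LWP::UserAgent->new;
    my $offset = 0;                          # bytes seen so far

    while (1) {
        # Ask only for the bytes we have not seen yet, exactly like a
        # resumed download.
        my $res = $ua->get( $url, 'Range' => "bytes=$offset-" );

        if ( $res->code == 206 ) {           # Partial Content: new bytes only
            print $res->content;
            $offset += length $res->content;
        }
        elsif ( $res->code == 200 ) {        # server ignored Range entirely
            print $res->content;
            $offset = length $res->content;
        }
        # A 416 response just means nothing new has been appended yet.

        sleep 5;                             # arbitrary poll interval
    }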

    best,
    peter


    --
    http://www.goldfisch.at/know_list
     
    peter pilsl, Apr 28, 2005
    #4
  5. Joe Smith

    Joe Smith Guest

    justme wrote:

    > very large log files, on the order of megabytes; say 60 MB on average.


    60 MB is not large. 2 GB is large.

    > 3) Because I am not at the physical machine, I can't really do
    > something like a "tail" that displays the data in real time. Is there
    > any way to do a "tail" on the log file remotely?


    I'm doing something like that at work.

    *) The account on the remote server has read access to the log files
    and an SSH key to connect to a local server.
    *) The account has a cron job that periodically runs rsync over ssh
    to propagate changes in the log files to the local server.
    *) Use File::Tail on the local server to follow the changes (see the
    sketch below).
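
    A minimal File::Tail sketch for that last step; the path to the
    rsync'd copy of the log is made up:

    use strict;
    use warnings;
    use File::Tail;

    # Follow the local, rsync'd copy of the remote log as it grows.
    # File::Tail starts reading near the end of the file, like "tail -f".
    my $tail = File::Tail->new(
        name        => '/var/log/mirror/remote-app.log',   # placeholder path
        maxinterval => 10,   # check for new data at least every 10 seconds
    );

    # read() blocks until a new line shows up, then returns it.
    while ( defined( my $line = $tail->read ) ) {
        print $line;
    }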

    -Joe
     
    Joe Smith, Apr 28, 2005
    #5
  6. A. Sinan Unur

    A. Sinan Unur Guest

    Joe Smith <> wrote in news:LYednUg2OsnzpOzfRVn-:

    > justme wrote:
    >
    >> very large log files, on the order of megabytes; say 60 MB on average.

    >
    > 60 MB is not large. 2 GB is large.


    Just an observation on this matter. When I was downloading the Fedora
    DVD ISO, I found out too late that the Cygwin version of wget that was
    on my machine at the time could not handle file sizes larger than
    2 GB. OTOH, using LWP::Simple, a Perl one-liner downloaded the whole
    DVD image with no problem. Of course, the progress indicator was not
    there, but such is life.
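
    For reference, the one-liner would have looked something like this
    (the mirror URL and filename here are made up; getstore() writes the
    response straight to the named file rather than holding it in memory):

    perl -MLWP::Simple -e "is_success(getstore('http://mirror.example.org/FC3-dvd.iso', 'FC3-dvd.iso')) or die 'download failed'"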

    Sinan
    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
     
    A. Sinan Unur, Apr 28, 2005
    #6
  7. Joe Smith

    Joe Smith Guest

    A. Sinan Unur wrote:
    > I found out too late that the Cygwin version of wget that was on my
    > machine at the time could not handle file sizes larger than 2 GB. OTOH,
    > using LWP::Simple, a Perl one-liner downloaded the whole DVD image with
    > no problem. Of course, the progress indicator was not there, but such
    > is life.


    Yep, I had to give up on wget for that very reason.

    If the server hosting the large file is an FTP server, then you can get
    some sense of progress, as shown in http://www.inwap.com/tivo/from-tivo

    -Joe
     
    Joe Smith, Apr 30, 2005
    #7