Using LWP to get a very large file


justme

Hi,

I have a remote machine running an application that generates very
large log files, averaging about 60 MB. Normally we need to connect to
this remote machine by keying a URL such as http://remote:123/logs into
a browser; the browser then displays tab-delimited columns of data.

What I want to do is use the Perl LWP module to get this log, since I
don't want to go to the physical machine and use the browser to get the
logs. Then, according to some filtering parameters, I want to display
the matching entries; for example, filtering by a certain date would
display only the contents for that date.

Questions:
1) This log file is very big, 60 MB at least. So, is LWP the module to
use, or is there a better module for dealing with large files?
2) While getting the log file, is it better in terms of memory usage
to parse the data "on the fly", or to get the whole file and do the
parsing afterwards?
3) Because I am not at the physical machine, I can't really do something
like a "tail" feature, which displays the data in real time. Is there
any way to do a "tail" on the log file remotely?

Thanks.
 

John Bokma

justme said:
I have a remote machine running an application that generates very
large log files, averaging about 60 MB.
[snip]
3) Because I am not at the physical machine, I can't really do something
like a "tail" feature, which displays the data in real time. Is there
any way to do a "tail" on the log file remotely?

I use plink for stuff like this (part of PuTTY), e.g.:

plink -ssh -pw password (e-mail address removed) gzip -9 -c logs/error_log | gzip -d > site/logs/error_log

But this only works if you can get the log via SSH.

My log is around 130 MB uncompressed :)
 

Mark Clements

John said:
[snip]
But this only works if you can get the log via SSH.
Numbered answers:
1. You might be better off using something like wget if you want to do
this, not because LWP can't handle files of that size, but because it
gives you a bunch of functionality (e.g. resume) without doing any
programming.

2. It depends on the structure of the data and the amount of memory on
the parsing machine. 60 MB isn't a vast amount of data to suck in in one
go, but bear in mind that any in-memory data structure you build will
take more space than that. Check out perldoc -q memory. (A streaming
sketch follows below.)

3. Well: if he can get SSH access, then he could run tail that way.
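
Re point 2: LWP can hand you the document in chunks as it arrives, via
the :content_cb option to get(), so you can filter on the fly without
ever holding the whole 60 MB in memory. A rough, untested sketch; the
URL is the one from the original post, and the filter date and the idea
that the date is the first tab-delimited column are my assumptions:

  #!/usr/bin/perl
  use strict;
  use warnings;
  use LWP::UserAgent;

  my $url    = 'http://remote:123/logs';  # from the original post
  my $wanted = '2005-08-01';              # hypothetical filter date
  my $ua     = LWP::UserAgent->new;
  my $buf    = '';                        # carries a partial line between chunks

  my $res = $ua->get($url, ':content_cb' => sub {
      my ($chunk) = @_;                   # LWP also passes the response object
      $buf .= $chunk;
      while ($buf =~ s/^(.*?)\n//) {      # peel off complete lines only
          my $line = $1;
          my @cols = split /\t/, $line;   # tab-delimited columns, per the post
          print "$line\n" if @cols and $cols[0] eq $wanted;
      }
  });
  die $res->status_line, "\n" unless $res->is_success;

The callback runs once per chunk received, so peak memory stays around
one chunk plus one partial line.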

Mark
 

peter pilsl

justme said:
[snip]
1) This log file is very big, 60 MB at least. So, is LWP the module to
use, or is there a better module for dealing with large files?
2) While getting the log file, is it better in terms of memory usage
to parse the data "on the fly", or to get the whole file and do the
parsing afterwards?
3) Because I am not at the physical machine, I can't really do something
like a "tail" feature, which displays the data in real time. Is there
any way to do a "tail" on the log file remotely?

wget is the tool of choice. Compression on the server side is
recommended; you can save a lot of traffic and download time that way.
If you run a common web server such as Apache on the server side,
enabling optional compression is easy.

It's better to fetch the file and do the parsing afterwards. It just
makes things easier.

You can do a remote tail if the server supports resumed downloads. I
don't know offhand exactly how I would implement it, but it's
essentially the same mechanism as a resumed download.
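
A rough, untested sketch of that idea in Perl: poll with HTTP Range
requests and print whatever has been appended since the last poll. The
URL is from the original post; the poll interval and the die() when the
server lacks Range support are my choices:

  #!/usr/bin/perl
  use strict;
  use warnings;
  use LWP::UserAgent;

  my $url    = 'http://remote:123/logs';
  my $offset = 0;                          # bytes seen so far
  my $ua     = LWP::UserAgent->new;

  while (1) {
      my $res = $ua->get($url, 'Range' => "bytes=$offset-");
      if ($res->code == 206) {             # Partial Content: new data arrived
          print $res->content;
          $offset += length $res->content;
      }
      elsif ($res->code == 200) {          # server ignored Range entirely
          die "Server does not support resumed downloads\n";
      }
      # 416 (Range Not Satisfiable) just means nothing new yet
      sleep 10;
  }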

best,
peter
 

Joe Smith

justme said:
very large log files, averaging about 60 MB.

60 MB is not large. 2 GB is large.
justme said:
3) Because I am not at the physical machine, I can't really do something
like a "tail" feature, which displays the data in real time. Is there
any way to do a "tail" on the log file remotely?

I'm doing something like that at work.

*) The account on the remote server has read access to the log files
and an SSH key to connect to a local server.
*) The account has a cron job that periodically runs rsync over ssh
to propagate changes in the log files to the local server.
*) Use File::Tail on the local server to follow the changes (see the
sketch below).
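
The last step looks roughly like this; the path is hypothetical
(wherever rsync drops the copy), and the intervals should be tuned to
how often the cron job runs:

  #!/usr/bin/perl
  use strict;
  use warnings;
  use File::Tail;

  my $tail = File::Tail->new(
      name        => '/var/local/logs/app.log',  # local rsync'd copy (made up)
      interval    => 5,       # start by checking every 5 seconds
      maxinterval => 60,      # back off to once a minute when idle
  );

  while (defined(my $line = $tail->read)) {      # blocks until a new line arrives
      print $line;
  }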

-Joe
 

A. Sinan Unur

Joe Smith said:
60 MB is not large. 2 GB is large.

Just an observation on this matter. When I was downloading the Fedora
DVD ISO, I found out too late that the Cygwin version of wget on my
machine at the time could not handle file sizes larger than 2 GB. OTOH,
using LWP::Simple, a Perl one-liner downloaded the whole DVD image with
no problem. Of course, there was no progress indicator, but such is
life.
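
The one-liner was along these lines (the URL and output filename here
are placeholders, not the real ones):

  perl -MLWP::Simple -e 'getstore("http://example.com/FC4-i386-DVD.iso", "dvd.iso")'

getstore() writes straight to disk rather than slurping the response
into memory, which is why the multi-GB image was no problem.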

Sinan
 

Joe Smith

A. Sinan Unur said:
I found out too late that the Cygwin version of wget on my machine at
the time could not handle file sizes larger than 2 GB.
[snip]

Yep, I had to give up on wget for that very reason.

If the server hosting the large file is an FTP server, then you can get
some sense of progress, as shown in http://www.inwap.com/tivo/from-tivo
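
With Net::FTP you can at least get hash marks on STDERR as the file
comes down; a minimal, untested sketch with a made-up host and path:

  #!/usr/bin/perl
  use strict;
  use warnings;
  use Net::FTP;

  my $ftp = Net::FTP->new('ftp.example.com', Passive => 1)
      or die "Cannot connect: $@";
  $ftp->login('anonymous', 'me@example.com') or die $ftp->message;
  $ftp->binary;                       # binary mode for a large image
  $ftp->hash(\*STDERR, 64 * 1024);    # one '#' per 64 KB transferred
  $ftp->get('pub/big-image.iso') or die $ftp->message;
  $ftp->quit;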

-Joe
 
