Extracting and Storing a String

Digger

I am trying to extract a URL from a file and store it. The problem is
that I only want the first occurrence of that URL that meets certain
criteria.

How can I get that single URL out of a file and store it to be used
for something else?

Thanks
 
Sherm Pendley

Digger said:
How can I get that single URL out of a file and store it to be used
for something else?

You left out a critical bit of information: What format is the file in?
If it's HTML, use HTML::Parser.

sherm--
 
Digger

Sherm said:
You left out a critical bit of information: What format is the file in?
If it's HTML, use HTML::Parser.

sherm--

Sorry, yes......

It's a flat text log file.....

date : error message: url: other garbage
 
Gunnar Hjalmarsson

Sherm said:
You left out a critical bit of information: What format is the file in?
If it's HTML, use HTML::Parser.

Not necessarily. The OP didn't say which criteria will be used to
identify the URL, but if those criteria have nothing to do with the
position of the URL in relation to various HTML elements,
HTML::Parser won't reasonably be useful for the task, even if the file
happens to be an HTML page.
 
mjl69

Digger said:
I am trying to extract a URL from a file and store it. The problem is
that I only want the first occurrence of that URL that meets certain
criteria.

How can I get that single URL out of a file and store it to be used
for something else?

Thanks

use HTML::LinkExtor;

mjl
 
Gunnar Hjalmarsson

Digger said:
Sorry, yes......

It's a flat text log file.....

date : error message: url: other garbage

What part of the task do you have difficulties with? Show us what you
have tried so far, and somebody may be able to point you in the right
direction.

A hint: check out the split() function.
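
As a rough illustration of that hint, here is a minimal sketch. It assumes the fields are separated by a colon followed by whitespace, and the sample line is invented to fit the "date : error message: url: other garbage" layout; the real log may differ.

```perl
use strict;
use warnings;

# Hypothetical line in the "date : error message: url: other garbage" layout.
my $line = '2004-12-25 09:20:12 : connection refused: http://example.com/x: other garbage';

# Split on a colon followed by whitespace, into at most four fields.
# The colons inside the time and inside "http://" are not followed by
# whitespace, so they do not act as separators.
my ($date, $message, $url, $rest) = split /\s*:\s+/, $line, 4;

print "$url\n";
```

The limit of 4 keeps any further colons in the trailing field from producing extra pieces.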
 
mjl69

Gunnar said:
What part of the task do you have difficulties with? Show us what you
have tried so far, and somebody may be able to point you in the right
direction.

A hint: check out the split() function.

#!/usr/bin/perl

use strict;
use warnings;

open my $file, '<', 'log.txt' or die "error: could not open file: $!";
while (<$file>)
{
    # Replace the whole line with just the URL, then print it
    print if s/.*url:\s+(\S+)\s+.*/$1/;
}

For the flat text log file described, I was thinking of something like
this, but it won't work if the url has spaces in it (like local paths
in Windows) or if there is not at least one space on each side of the
url.


mjl
 
Sherm Pendley

Digger said:
The criteria to extract the URL will be either "FAILED" or
"SUCCESS"...

Example...


[2004-12-25 9:20:12] FAILED http://hotmail.com/bla
[2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla
[2004-12-25 9:26:12] FAILED http://abc.com
[2004-12-25 9:27:12] FAILED http://123.com

Just loop through the lines in the file. Use a regex to examine each
line and use last to exit from the loop as soon as you find what you're
looking for.

For example:

#!/usr/bin/perl

use strict;
use warnings;

# These are declared outside the while loop so you
# can use them after the loop exits
my $flag;
my $url;

while (<>) {
    ($flag, $url) = /(FAILED|SUCCESS) (.*)$/;
    last if ($flag && $flag eq 'SUCCESS');
}

# Do something with $url ...

sherm--
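
The same idea can be wrapped in a subroutine that returns as soon as the first matching line is seen. This is only a sketch: the name first_url_with_status is made up for illustration, and the sample lines are held in an array rather than read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the URL from the first line whose status
# equals $wanted, or undef if no line matches.
sub first_url_with_status {
    my ($wanted, @lines) = @_;
    for my $line (@lines) {
        # Capture the status word and the non-blank text right after it.
        if ($line =~ /\b(FAILED|SUCCESS)\s+(\S+)/ && $1 eq $wanted) {
            return $2;
        }
    }
    return;
}

my @log = (
    '[2004-12-25 9:20:12] FAILED http://hotmail.com/bla',
    '[2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla',
    '[2004-12-25 9:26:12] FAILED http://abc.com',
);

my $url = first_url_with_status('SUCCESS', @log);
print defined $url ? "$url\n" : "no match\n";
```

Returning from inside the loop plays the same role as last in the loop above, and the undef result makes the "nothing found" case easy to test for.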
 
Joe Smith

mjl69 said:
/.*url:\s+(\S+)\s+.*/;
but it won't work if there is not at least one space on each side of the
url.

Then use \s* instead of the first \s+ and get rid of the second.
You want either /.*?url:/ or /url:/ to ignore potential matches
in the garbage field.
-Joe
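
A small sketch of the adjusted pattern, assuming the log really does contain a literal "url:" label before the address. The two sample lines are invented; the second has no space after the label.

```perl
use strict;
use warnings;

# Hypothetical lines; the second has no space after "url:".
my @lines = (
    '2004-12-25 : timeout: url: http://example.com/a other garbage',
    '2004-12-25 : timeout: url:http://example.com/b other garbage',
);

my @urls;
for my $line (@lines) {
    # \s* tolerates a missing space after "url:". No trailing \s+ is
    # needed because \S+ stops at the first whitespace or end of line.
    push @urls, $1 if $line =~ /url:\s*(\S+)/;
}

print "$_\n" for @urls;
```

This still cannot handle addresses that contain spaces, which was the other limitation mentioned above.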
 
Joe Smith

Digger said:
[2004-12-25 9:20:12] FAILED http://hotmail.com/bla
[2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla
[2004-12-25 9:26:12] FAILED http://abc.com
[2004-12-25 9:27:12] FAILED http://123.com

my %status;
while (<>) {
/ (FAILED|SUCCESS) (.*)/ and $status{$2} = $1;
}
print "URLs whose last status was SUCCESS:\n";
$status{$_} eq 'SUCCESS' and print " $_\n" for sort keys %status;

print "\nURLs whose last status was FAILED:\n";
$status{$_} eq 'FAILED' and print " $_\n" for sort keys %status;

-Joe
 
Digger

Digger said:
[2004-12-25 9:20:12] FAILED http://hotmail.com/bla
[2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla
[2004-12-25 9:26:12] FAILED http://abc.com
[2004-12-25 9:27:12] FAILED http://123.com

my %status;
while (<>) {
/ (FAILED|SUCCESS) (.*)/ and $status{$2} = $1;
}
print "URLs whose last status was SUCCESS:\n";
$status{$_} eq 'SUCCESS' and print " $_\n" for sort keys %status;

print "\nURLs whose last status was FAILED:\n";
$status{$_} eq 'FAILED' and print " $_\n" for sort keys %status;

-Joe

So how do I go about opening the logfile and running your while loop
on it????
 
Sherm Pendley

Digger said:
So how do I go about opening the logfile and running your while loop
on it????

Have you read the posting guidelines that appear here frequently?

This is a very basic question that's answered in any number of tutorials
and books. It's considered rude to ask such questions without making at
least *some* effort to read and understand such material first.

Have a look at "perldoc perlintro" for a good start.

sherm--
 
Tad McClellan

Digger said:
So how do I go about opening the logfile and running your while loop
on it????


Put the filename into @ARGV before the while loop, and _perl_
will handle the file-opening for you.

$ARGV[0] = 'some.file';
while ( <> ) {
...

or, use open() and a different while loop that uses the open's filehandle.
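
For the second option, a minimal sketch with an explicit open() and a loop over that filehandle might look like the following. The helper name first_url_in_file is hypothetical, and the temporary file exists only so the example runs on its own; in practice you would pass the path of the real log.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open $path and return the URL from the first
# line carrying the given status, or undef if there is none.
sub first_url_in_file {
    my ($path, $wanted) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $found;
    while (my $line = <$fh>) {
        if ($line =~ /\b\Q$wanted\E\s+(\S+)/) {
            $found = $1;
            last;    # stop at the first match
        }
    }
    close $fh;
    return $found;
}

# Demo only: write a small sample log to a temporary file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "[2004-12-25 9:20:12] FAILED http://hotmail.com/bla\n",
                "[2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla\n",
                "[2004-12-25 9:26:12] FAILED http://abc.com\n";
close $tmp_fh;

my $url = first_url_in_file($tmp_name, 'FAILED');
print "$url\n" if defined $url;
```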
 
Joe Smith

Digger said:
So how do I go about opening the logfile and running your while loop
on it????

You don't have to do anything. Just specify the log file name(s)
on the command line.

perl logchecker.pl file1.log file2.log file3.log

Now that you know it is possible, go and study how while(<>){} works.
-Joe
 
Digger

Joe said:
You don't have to do anything. Just specify the log file name(s)
on the command line.

perl logchecker.pl file1.log file2.log file3.log

Now that you know it is possible, go and study how while(<>){} works.
-Joe

lol...... I had a typo in the syntax I was using.......
 
Digger

lol...... I had a typo in the syntax I was using.......

Ok here is what's happening...

script:


#!/usr/bin/perl -w
#
$ARGV[0] = 'url2.log';
my %status;
while (<>) {
/ (FAILED|SUCCESS) (.*)/ and $status{$2} = $1;
}
print "URLs whose last status was SUCCESS:\n";
$status{$_} eq 'SUCCESS' and print " $_\n" for sort keys %status;

print "\nURLs whose last status was FAILED:\n";
$status{$_} eq 'FAILED' and print " $_\n" for sort keys %status;


Log File:

root@digger > more url2.log
[2005-01-04 09:17:59] FAILURE RESPONSE: Exceeded retry count (1) from http://192.168.6.7:2888/
[2005-01-04 09:17:59] FAILURE RESPONSE: Exceeded retry count (1) from http://192.9.6.7:2888/
[2005-01-04 09:18:57] SUCCESS RESPONSE from http://192.168.6.7:2888/
[2005-01-04 09:26:57] FAILURE RESPONSE from http://192.55.6.7:2888/


Output:

root@digger > ./test2.pl
URLs whose last status was SUCCESS:
RESPONSE from http://192.168.6.7:2888/

URLs whose last status was FAILED:



As we can see, it did pick up the first URL, which initially FAILED and
then had a SUCCESS a few minutes later. But it didn't pick up
http://192.9.6.7:2888/
http://192.55.6.7:2888/

which both had a FAILURE status, and those are the ones I am concerned
about.....
 
charley

Digger said:
#!/usr/bin/perl -w
#
$ARGV[0] = 'url2.log';
my %status;
while (<>) {
/ (FAILED|SUCCESS) (.*)/ and $status{$2} = $1;
}

The line above should be:
/ (FAILURE|SUCCESS).+?from (.+)/ and $status{$2} = $1;

print "URLs whose last status was SUCCESS:\n";
$status{$_} eq 'SUCCESS' and print " $_\n" for sort keys %status;

print "\nURLs whose last status was FAILED:\n";
$status{$_} eq 'FAILED' and print " $_\n" for sort keys %status;

In the line above, 'FAILED' should be 'FAILURE':
$status{$_} eq 'FAILURE' and print " $_\n" for sort keys %status;

Output:

root@digger > ./test2.pl
URLs whose last status was SUCCESS:
RESPONSE from http://192.168.6.7:2888/

URLs whose last status was FAILED:



As we can see it did pick up the first URL that initially FAILED then
a few minutes later had a SUCCESS. But it didn't pickup
http://192.9.6.7:2888/
http://192.55.6.7:2888/

Output with my changes to the code:

URLs whose last status was SUCCESS:
http://192.168.6.7:2888/

URLs whose last status was FAILED:
http://192.55.6.7:2888/
http://192.9.6.7:2888/

I think these changes should give you the desired results.

Chris
 
charley

Wow, there is a piece missing in the first change;

/ (FAILURE|SUCCESS).+?from (.+)/ (** wrong)
/ (FAILURE|SUCCESS).+?from (.+)/ and $status{$2} = $1; (** right)
 
charley

Sorry, my bad. Joe's code was correct for the data example that the
poster provided. I should've read the thread more closely.
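
Since the two log samples in this thread use different layouts, one possible way to cover both is to accept either spelling of the failure status and skip ahead to the first http(s) address on the line. This is a sketch under assumptions: it presumes each log entry sits on a single line and that every address starts with http:// or https://.

```perl
use strict;
use warnings;

# Sample lines taken from both log layouts posted in the thread.
my @lines = (
    '[2004-12-25 9:20:12] FAILED http://hotmail.com/bla',
    '[2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla',
    '[2005-01-04 09:17:59] FAILURE RESPONSE: Exceeded retry count (1) from http://192.9.6.7:2888/',
    '[2005-01-04 09:18:57] SUCCESS RESPONSE from http://192.168.6.7:2888/',
);

my %status;
for my $line (@lines) {
    # Status word first, then skip lazily to the first http(s) URL.
    if ($line =~ m{\b(FAILED|FAILURE|SUCCESS)\b.*?(https?://\S+)}) {
        $status{$2} = $1;    # later lines overwrite earlier ones
    }
}

printf "%-8s %s\n", $status{$_}, $_ for sort keys %status;
```

Keying the hash on the address alone avoids the "RESPONSE from ..." text ending up as part of the key.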
 
