Extracting And Storing a String

Digger · Jan 7, 2005

I am trying to extract a url from a file and store it. The problem is
that I only want the first occurrence of that url that meets certain
criteria.

How can I get that single url out of a file and store it to be used
for something else?

Thanks

Sherm Pendley · Jan 7, 2005

Digger said:
How can I get that single url out of a file and store it to be used
for something else?

You left out a critical bit of information: What format is the file in?
If it's HTML, use HTML::Parser.

sherm--

Digger · Jan 7, 2005

You left out a critical bit of information: What format is the file in?
If it's HTML, use HTML::Parser.

sherm--

Sorry, yes......

It's a flat text log file.....

date : error message: url: other garbage

Gunnar Hjalmarsson · Jan 7, 2005

Sherm said:
You left out a critical bit of information: What format is the file in?
If it's HTML, use HTML::Parser.

Not necessarily. The OP didn't tell which criteria will be used to
identify the URL, but if those criteria have nothing to do with the
positioning of the URL in relation to various HTML elements,
HTML::Parser won't reasonably be useful for the task, even if the file
happens to be an HTML page.

mjl69 · Jan 7, 2005

Digger said:
I am trying to extract a url from a file and store it. The problem is
that I only want the first occurrence of that url that meets certain
criteria.

How can I get that single url out of a file and store it to be used
for something else?

Thanks

use HTML::LinkExtor;

mjl

Gunnar Hjalmarsson · Jan 7, 2005

Digger said:
Sorry, yes......

It's a flat text log file.....

date : error message: url: other garbage

What part of the task do you have difficulties with? Show us what you
have tried so far, and somebody may be able to point you in the right
direction.

A hint: check out the split() function.
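
As a rough sketch of the split() hint: this assumes the fields are separated by a colon followed by whitespace (so the colons inside a timestamp or inside http:// are left alone) and that the url is the third field. The sample line and the name url_field are made up for illustration, not taken from the real log.

```perl
use strict;
use warnings;

# Hypothetical helper: return the third colon-separated field of a log line.
# Assumes a separator of ":" followed by whitespace, so "http://" stays whole.
sub url_field {
    my ($line) = @_;
    chomp $line;
    # A limit of 4 keeps any later separators inside the "other garbage" field
    my @fields = split /\s*:\s+/, $line, 4;
    return $fields[2];
}

# Made-up line in the "date : error message: url: other garbage" layout
my $sample = "2005-01-07 : connection refused: http://example.com/page: retry later\n";
print url_field($sample), "\n";
```

If the real separator differs, only the pattern passed to split() should need to change.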

mjl69 · Jan 7, 2005

Gunnar said:
What part of the task do you have difficulties with? Show us what you
have tried so far, and somebody may be able to point you in the right
direction.

A hint: check out the split() function.

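#!/usr/bin/perl

use strict;
use warnings;

# Three-argument open; "or" binds loosely, so die sees open's result
open my $file, '<', 'log.txt' or die "error: could not open file: $!";
# Read one line at a time instead of building a list of every line
while (<$file>)
{
    # Replace the line with just the captured url; print only on a match
    print if s/.*url:\s+(\S+)\s+.*/$1/;
}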

For the flat text log file described, I was thinking of something like
this, but it won't work if the url has spaces in it (like local paths
in Windows) or if there is not at least one space on each side of the
url.

mjl

Digger · Jan 7, 2005

use HTML::LinkExtor;

mjl

The criterion for extracting the URL will be either "FAILED" or
"SUCCESS"...

Example...

[2004-12-25 9:20:12] FAILED http://hotmail.com/bla
[2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla
[2004-12-25 9:26:12] FAILED http://abc.com
[2004-12-25 9:27:12] FAILED http://123.com

etc.....

Sherm Pendley · Jan 7, 2005

Digger said:
The criterion for extracting the URL will be either "FAILED" or
"SUCCESS"...

Example...

[2004-12-25 9:20:12] FAILED http://hotmail.com/bla
[2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla
[2004-12-25 9:26:12] FAILED http://abc.com
[2004-12-25 9:27:12] FAILED http://123.com

Just loop through the lines in the file. Use a regex to examine each
line and use last to exit from the loop as soon as you find what you're
looking for.

For example:

#!/usr/bin/perl

use strict;
use warnings;

# These are declared outside the while loop so you
# can use them after the loop exits
my $flag;
my $url;

while (<>) {
    ($flag, $url) = /(FAILED|SUCCESS) (.*)$/;
    last if ($flag && $flag eq 'SUCCESS');
}

# Do something with $url ...

sherm--
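
One thing to watch with the loop above, as far as I can tell: if no SUCCESS line is ever found, $url ends up holding whatever the last line read produced (a FAILED url, or undef). A possible variation, sketched here under the assumption that the url contains no spaces, wraps the search in a subroutine that returns only on a real match. The name first_url_with_status is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the URL from the first line whose status
# equals $wanted, or undef if no line matches.
sub first_url_with_status {
    my ($wanted, @lines) = @_;
    for my $line (@lines) {
        # Capture the status word and the run of non-space characters after it
        if (my ($flag, $url) = $line =~ /(FAILED|SUCCESS) (\S+)/) {
            return $url if $flag eq $wanted;
        }
    }
    return;    # nothing matched
}

# Sample lines copied from the example log in this thread
my @log = (
    "[2004-12-25 9:20:12] FAILED http://hotmail.com/bla\n",
    "[2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla\n",
    "[2004-12-25 9:26:12] FAILED http://abc.com\n",
);
my $url = first_url_with_status('SUCCESS', @log);
print defined $url ? "$url\n" : "no match\n";
```

The caller can then test the result with defined() before storing or using it.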

Joe Smith · Jan 8, 2005

mjl69 said:
/.*url:\s+(\S+)\s+.*/;
but it won't work if there is not at least one space on each side of the
url.

Then use \s* instead of the first \s+ and get rid of the second.
You want either /.*?url:/ or /url:/ to ignore potential matches
in the garbage field.
-Joe
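
A minimal sketch of that suggestion, assuming the line really contains a literal "url:" label as in mjl69's regex. The sample lines and the name url_after_label are made up for illustration. It still will not cope with a url that contains spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: capture the first run of non-space characters
# after the first "url:" on the line; whitespace after the label is optional.
sub url_after_label {
    my ($line) = @_;
    return $line =~ /url:\s*(\S+)/ ? $1 : undef;
}

# Made-up lines: one with a space after the label, one without
for my $line ("date : oops: url: http://a.example/x more url: junk\n",
              "date : oops: url:http://b.example/y\n") {
    my $url = url_after_label($line);
    print "$url\n" if defined $url;
}
```

Because the regex engine reports the leftmost match, a second "url:" later in the garbage field is ignored.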

Joe Smith · Jan 8, 2005

Digger said:
[2004-12-25 9:20:12] FAILED http://hotmail.com/bla
[2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla
[2004-12-25 9:26:12] FAILED http://abc.com
[2004-12-25 9:27:12] FAILED http://123.com

my %status;
while (<>) {
    / (FAILED|SUCCESS) (.*)/ and $status{$2} = $1;
}
print "URLs whose last status was SUCCESS:\n";
$status{$_} eq 'SUCCESS' and print " $_\n" for sort keys %status;

print "\nURLs whose last status was FAILED:\n";
$status{$_} eq 'FAILED' and print " $_\n" for sort keys %status;

-Joe

Digger · Jan 8, 2005

Joe Smith said:
Digger said:

[2004-12-25 9:20:12] FAILED http://hotmail.com/bla
[2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla
[2004-12-25 9:26:12] FAILED http://abc.com
[2004-12-25 9:27:12] FAILED http://123.com

my %status;
while (<>) {
/ (FAILED|SUCCESS) (.*)/ and $status{$2} = $1;
}
print "URLs whose last status was SUCCESS:\n";
$status{$_} eq 'SUCCESS' and print " $_\n" for sort keys %status;

print "\nURLs whose last status was FAILED:\n";
$status{$_} eq 'FAILED' and print " $_\n" for sort keys %status;

-Joe

So how do I go about opening the logfile and running your while loop
on it????

Sherm Pendley · Jan 8, 2005

Digger said:
So how do I go about opening the logfile and running your while loop
on it????

Have you read the posting guidelines that appear here frequently?

This is a very basic question that's answered in any number of tutorials
and books. It's considered rude to ask such questions without making at
least *some* effort to read and understand such material first.

Have a look at "perldoc perlintro" for a good start.

sherm--

Tad McClellan · Jan 9, 2005

Digger said:
So how do I go about opening the logfile and running your while loop
on it????

Put the filename into @ARGV before the while loop, and _perl_
will handle the file-opening for you.

$ARGV[0] = 'some.file';
while ( <> ) {
...

or, use open() and a different while loop that uses the open's filehandle.
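
The open() route could look roughly like this. It is only a sketch: the subroutine name last_status_in_file is hypothetical, 'some.file' is the placeholder name used above, and the regex assumes the "FAILED url" / "SUCCESS url" layout from the earlier example with no spaces in the url.

```perl
use strict;
use warnings;

# Hypothetical helper: record the last status seen for each URL in one file.
sub last_status_in_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "could not open $path: $!";
    my %status;
    # Read from the lexical filehandle instead of the magic <> handle
    while (my $line = <$fh>) {
        $line =~ / (FAILED|SUCCESS) (\S+)/ and $status{$2} = $1;
    }
    close $fh;
    return %status;
}

# 'some.file' is a placeholder; the report is skipped if it does not exist
if (-e 'some.file') {
    my %status = last_status_in_file('some.file');
    print "$_ => $status{$_}\n" for sort keys %status;
}
```

Compared with setting $ARGV[0], this keeps the file name out of global state, at the cost of handling the open failure yourself.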

Joe Smith · Jan 9, 2005

Digger said:
So how do I go about opening the logfile and running your while loop
on it????

You don't have to do anything. Just specify the log file name(s)
on the command line.

perl logchecker.pl file1.log file2.log file3.log

Now that you know it is possible, go and study how while(<>){} works.
-Joe

Digger · Jan 9, 2005

You don't have to do anything. Just specify the log file name(s)
on the command line.

perl logchecker.pl file1.log file2.log file3.log

Now that you know it is possible, go and study how while(<>){} works.
-Joe

lol...... I had a typo in the syntax I was using.......

Digger · Jan 9, 2005

lol...... I had a typo in the syntax I was using.......

Ok here is what's happening...

script:

#!/usr/bin/perl -w
#
$ARGV[0] = 'url2.log';
my %status;
while (<>) {
/ (FAILED|SUCCESS) (.*)/ and $status{$2} = $1;
}
print "URLs whose last status was SUCCESS:\n";
$status{$_} eq 'SUCCESS' and print " $_\n" for sort keys %status;

print "\nURLs whose last status was FAILED:\n";
$status{$_} eq 'FAILED' and print " $_\n" for sort keys %status;

Log File:

root@digger > more url2.log
[2005-01-04 09:17:59] FAILURE RESPONSE: Exceeded retry count (1) from http://192.168.6.7:2888/
[2005-01-04 09:17:59] FAILURE RESPONSE: Exceeded retry count (1) from http://192.9.6.7:2888/
[2005-01-04 09:18:57] SUCCESS RESPONSE from http://192.168.6.7:2888/
[2005-01-04 09:26:57] FAILURE RESPONSE from http://192.55.6.7:2888/

Output:

root@digger > ./test2.pl
URLs whose last status was SUCCESS:
RESPONSE from http://192.168.6.7:2888/

URLs whose last status was FAILED:

As we can see, it did pick up the first URL, which initially FAILED and
then had a SUCCESS a few minutes later. But it didn't pick up
http://192.9.6.7:2888/
http://192.55.6.7:2888/

which both had a FAILURE status, and those are the ones I am concerned
about.....

charley · Jan 9, 2005

Digger said:
#!/usr/bin/perl -w
#
$ARGV[0] = 'url2.log';
my %status;
while (<>) {
/ (FAILED|SUCCESS) (.*)/ and $status{$2} = $1;
}

The regex line inside the loop above should be:
/ (FAILURE|SUCCESS).+?from (.+)/ and $status{$2} = $1;

print "URLs whose last status was SUCCESS:\n";
$status{$_} eq 'SUCCESS' and print " $_\n" for sort keys %status;

print "\nURLs whose last status was FAILED:\n";
$status{$_} eq 'FAILED' and print " $_\n" for sort keys %status;

In the line above, 'FAILED' should be 'FAILURE':
$status{$_} eq 'FAILURE' and print " $_\n" for sort keys %status;

Output:

root@digger > ./test2.pl
URLs whose last status was SUCCESS:
RESPONSE from http://192.168.6.7:2888/

URLs whose last status was FAILED:

As we can see, it did pick up the first URL, which initially FAILED and
then had a SUCCESS a few minutes later. But it didn't pick up
http://192.9.6.7:2888/
http://192.55.6.7:2888/

Output with my changes to the code:

URLs whose last status was SUCCESS:
http://192.168.6.7:2888/

URLs whose last status was FAILED:
http://192.55.6.7:2888/
http://192.9.6.7:2888/
I think these changes should give you the desired results.

Chris
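
Putting those two changes together, a self-contained sketch for the url2.log layout might look like this. It assumes each log entry sits on a single line and that the url is the first run of non-space characters after the word "from". The subroutine name last_status is hypothetical, and the sample lines are copied from the log posted above.

```perl
use strict;
use warnings;

# Hypothetical helper: map each URL to the last status logged for it,
# for lines like "[...] FAILURE RESPONSE ... from http://host:port/".
sub last_status {
    my (@lines) = @_;
    my %status;
    for my $line (@lines) {
        # Status word first, then (lazily) anything up to "from", then the url
        $line =~ / (FAILURE|SUCCESS)\b.*?\bfrom (\S+)/ and $status{$2} = $1;
    }
    return %status;
}

my @log = (
    "[2005-01-04 09:17:59] FAILURE RESPONSE: Exceeded retry count (1) from http://192.168.6.7:2888/\n",
    "[2005-01-04 09:17:59] FAILURE RESPONSE: Exceeded retry count (1) from http://192.9.6.7:2888/\n",
    "[2005-01-04 09:18:57] SUCCESS RESPONSE from http://192.168.6.7:2888/\n",
    "[2005-01-04 09:26:57] FAILURE RESPONSE from http://192.55.6.7:2888/\n",
);
my %status = last_status(@log);
for my $wanted ('SUCCESS', 'FAILURE') {
    print "URLs whose last status was $wanted:\n";
    print " $_\n" for grep { $status{$_} eq $wanted } sort keys %status;
}
```

Later lines overwrite earlier ones in the hash, which is why a url that failed and then succeeded is reported only under SUCCESS.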

charley · Jan 9, 2005

Wow, there is a piece missing in the first change;

/ (FAILURE|SUCCESS).+?from (.+)/ (** wrong)
/ (FAILURE|SUCCESS).+?from (.+)/ and $status{$2} = $1; (** right)

charley · Jan 10, 2005

Sorry, my bad. Joe's code was correct for the data example that the
poster provided. I should've read the thread more closely.
