Extracting And Storing a String

Discussion in 'Perl Misc' started by Digger, Jan 7, 2005.

  1. Digger

    Digger Guest

    I am trying to extract a url from a file and store it, the problem is
    I only want the first occurrence of that url that meets certain
    criteria.

    How can I get that single url out of a file and store it to be used
    for something else?

    Thanks
     
    Digger, Jan 7, 2005
    #1

  2. Digger wrote:

    > How can I get that single url out of a file and store it to be used
    > for something else?


    You left out a critical bit of information: What format is the file in?
    If it's HTML, use HTML::Parser.

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
     
    Sherm Pendley, Jan 7, 2005
    #2

  3. Digger

    Digger Guest

    On Fri, 07 Jan 2005 12:48:20 -0500, Sherm Pendley wrote:

    >Digger wrote:
    >
    >> How can I get that single url out of a file and store it to be used
    >> for something else?

    >
    >You left out a critical bit of information: What format is the file in?
    >If it's HTML, use HTML::Parser.
    >
    >sherm--


    Sorry, yes......

    It's a flat text log file.....

    date : error message: url: other garbage
     
    Digger, Jan 7, 2005
    #3
  4. Sherm Pendley wrote:
    > Digger wrote:
    >> I am trying to extract a url from a file and store it, the problem is
    >> I only want the first occurrence of that url that meets certain
    >> criteria.
    >>
    >> How can I get that single url out of a file and store it to be used
    >> for something else?

    >
    > You left out a critical bit of information: What format is the file in?
    > If it's HTML, use HTML::Parser.


    Not necessarily. The OP didn't say which criteria will be used to
    identify the URL, but if those criteria have nothing to do with the
    position of the URL relative to the HTML elements, HTML::Parser won't
    be much use for the task, even if the file happens to be an HTML page.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jan 7, 2005
    #4
  5. mjl69

    mjl69 Guest

    Digger wrote:

    > I am trying to extract a url from a file and store it, the problem is
    > I only want the first occurrence of that url that meets certain
    > criteria.
    >
    > How can I get that single url out of a file and store it to be used
    > for something else?
    >
    > Thanks


    use HTML::LinkExtor;

    mjl
     
    mjl69, Jan 7, 2005
    #5
  6. Digger wrote:
    > Sherm Pendley wrote:
    >> Digger wrote:
    >>> How can I get that single url out of a file and store it to be
    >>> used for something else?

    >>
    >> You left out a critical bit of information: What format is the file
    >> in? If it's HTML, use HTML::Parser.

    >
    > Sorry, yes......
    >
    > It's a flat text log file.....
    >
    > date : error message: url: other garbage


    What part of the task do you have difficulties with? Show us what you
    have tried so far, and somebody may be able to point you in the right
    direction.

    A hint: check out the split() function.
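
    As a rough sketch of that hint (not the poster's own code): assuming the fields really are separated by a colon followed by whitespace, split() can hand back the URL as the third field. The helper name url_field and the sample line are made up for illustration, and the real log layout may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: return the third field of a
# "date : error message: url: other garbage" line.
sub url_field {
    my ($line) = @_;
    chomp $line;
    # Split on a colon followed by whitespace, so the "://" inside the URL
    # and the colons inside a time stamp are left alone. The limit of 4
    # keeps any later separators inside the trailing field.
    my ($date, $message, $url, $rest) = split /\s*:\s+/, $line, 4;
    return $url;
}

# Made-up sample line, used only to exercise the helper.
print url_field("2005-01-07 12:48:20 : connection refused: http://example.com/page: retry later"), "\n";
```

    If the separators in the real file are not consistently "colon then space", a regex match is likely the safer tool.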

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jan 7, 2005
    #6
  7. mjl69

    mjl69 Guest

    Gunnar Hjalmarsson wrote:

    > Digger wrote:
    > > Sherm Pendley wrote:
    > >> Digger wrote:
    > >>> How can I get that single url out of a file and store it to be
    > >>> used for something else?
    > >>
    > >> You left out a critical bit of information: What format is the file
    > >> in? If it's HTML, use HTML::Parser.

    > >
    > > Sorry, yes......
    > >
    > > It's a flat text log file.....
    > >
    > > date : error message: url: other garbage

    >
    > What part of the task do you have difficulties with? Show us what you
    > have tried so far, and somebody may be able to point you in the right
    > direction.
    >
    > A hint: check out the split() function.


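    #!/usr/bin/perl

    use strict;
    use warnings;

    # Three-argument open avoids surprises with unusual file names
    open my $file, '<', 'log.txt' or die "error: could not open file: $!";

    # Read one line at a time rather than slurping the whole file
    while (<$file>)
    {
        # Reduce the line to the text after "url:" and print it
        print if s/.*url:\s+(\S+)\s+.*/$1/;
    }
    close $file;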

    For the flat text log file described, I was thinking of something like
    this, but it won't work if the url has spaces in it (like local paths
    in Windows) or if there is not at least one space on each side of the
    url.


    mjl
     
    mjl69, Jan 7, 2005
    #7
  8. Digger

    Digger Guest

    On 7 Jan 2005 18:20:33 GMT, "mjl69" wrote:

    >Digger wrote:
    >
    >> I am trying to extract a url from a file and store it, the problem is
    >> I only want the first occurrence of that url that meets certain
    >> criteria.
    >>
    >> How can I get that single url out of a file and store it to be used
    >> for something else?
    >>
    >> Thanks

    >
    >use HTML::LinkExtor;
    >
    >mjl


    The criteria to extract the URL will be either "FAILED" or
    "SUCCESS"...

    Example...


    [2004-12-25 9:20:12] FAILED http://hotmail.com/bla
    [2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla
    [2004-12-25 9:26:12] FAILED http://abc.com
    [2004-12-25 9:27:12] FAILED http://123.com

    etc.....
     
    Digger, Jan 7, 2005
    #8
  9. Digger wrote:

    > The criteria to extract the URL will be either "FAILED" or
    > "SUCCESS"...
    >
    > Example...
    >
    >
    > [2004-12-25 9:20:12] FAILED http://hotmail.com/bla
    > [2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla
    > [2004-12-25 9:26:12] FAILED http://abc.com
    > [2004-12-25 9:27:12] FAILED http://123.com


    Just loop through the lines in the file. Use a regex to examine each
    line and use last to exit from the loop as soon as you find what you're
    looking for.

    For example:

    #!/usr/bin/perl

    use strict;
    use warnings;

    # These are declared outside the while loop so you
    # can use them after the loop exits
    my $flag;
    my $url;

    while(<>) {
        ($flag, $url) = /(FAILED|SUCCESS) (.*)$/;
        last if ($flag && $flag eq 'SUCCESS');
    }

    # Do something with $url ...

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
     
    Sherm Pendley, Jan 7, 2005
    #9
  10. Joe Smith

    Joe Smith Guest

    mjl69 wrote:
    > /.*url:\s+(\S+)\s+.*/;
    > but it won't work if there is not at least one space on each side of the
    > url.


    Then use \s* instead of the first \s+ and get rid of the second.
    You want either /.*?url:/ or /url:/ to ignore potential matches
    in the garbage field.
    -Joe
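
    One possible reading of that advice, as a sketch only: it assumes the URL field is introduced by the literal text "url:" and contains no spaces. The helper name extract_url and the sample line are illustrative, not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text that follows "url:", or undef
# when the line has no such field.
sub extract_url {
    my ($line) = @_;
    # \s* allows zero or more spaces after "url:". There is no trailing
    # \s+, so a URL at the very end of the line still matches.
    return $line =~ /url:\s*(\S+)/ ? $1 : undef;
}

# Made-up sample line with no space between "url:" and the URL.
print extract_url("Jan 7 : timeout : url:http://example.com/a other garbage"), "\n";
```

    Because the pattern is not anchored with a leading greedy .*, the first "url:" on the line wins, which is what keeps a stray match in the garbage field from being picked up.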
     
    Joe Smith, Jan 8, 2005
    #10
  11. Joe Smith

    Joe Smith Guest

    Digger wrote:

    > [2004-12-25 9:20:12] FAILED http://hotmail.com/bla
    > [2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla
    > [2004-12-25 9:26:12] FAILED http://abc.com
    > [2004-12-25 9:27:12] FAILED http://123.com


    my %status;
    while (<>) {
        / (FAILED|SUCCESS) (.*)/ and $status{$2} = $1;
    }
    print "URLs whose last status was SUCCESS:\n";
    $status{$_} eq 'SUCCESS' and print " $_\n" for sort keys %status;

    print "\nURLs whose last status was FAILED:\n";
    $status{$_} eq 'FAILED' and print " $_\n" for sort keys %status;

    -Joe
     
    Joe Smith, Jan 8, 2005
    #11
  12. Digger

    Digger Guest

    On Sat, 08 Jan 2005 05:22:42 -0800, Joe Smith wrote:

    >Digger wrote:
    >
    >> [2004-12-25 9:20:12] FAILED http://hotmail.com/bla
    >> [2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla
    >> [2004-12-25 9:26:12] FAILED http://abc.com
    >> [2004-12-25 9:27:12] FAILED http://123.com

    >
    >my %status;
    >while (<>) {
    > / (FAILED|SUCCESS) (.*)/ and $status{$2} = $1;
    >}
    >print "URLs whose last status was SUCCESS:\n";
    >$status{$_} eq 'SUCCESS' and print " $_\n" for sort keys %status;
    >
    >print "\nURLs whose last status was FAILED:\n";
    >$status{$_} eq 'FAILED' and print " $_\n" for sort keys %status;
    >
    > -Joe


    So how do I go about opening the logfile and running your while loop
    on it????
     
    Digger, Jan 8, 2005
    #12
  13. Digger wrote:

    > So how do I go about opening the logfile and running your while loop
    > on it????


    Have you read the posting guidelines that appear here frequently?

    This is a very basic question that's answered in any number of tutorials
    and books. It's considered rude to ask such questions without making at
    least *some* effort to read and understand such material first.

    Have a look at "perldoc perlintro" for a good start.

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
     
    Sherm Pendley, Jan 8, 2005
    #13
  14. Digger wrote:
    > On Sat, 08 Jan 2005 05:22:42 -0800, Joe Smith wrote:



    >>while (<>) {



    > So how do I go about opening the logfile and running your while loop
    > on it????



    Put the filename into @ARGV before the while loop, and _perl_
    will handle the file-opening for you.

    $ARGV[0] = 'some.file';
    while ( <> ) {
    ...

    or, use open() and a different while loop that uses the open's filehandle.
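
    A minimal sketch of the open() route, assuming the FAILED/SUCCESS layout from Digger's example lines. The helper name first_url_with_status and the temporary file are illustrative only. The loop stops at the first line whose status matches, so the stored URL is the first occurrence.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the first URL in $path that was logged with
# the status $wanted, or undef if there is none.
sub first_url_with_status {
    my ($path, $wanted) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $url;
    while (my $line = <$fh>) {
        if ($line =~ / (FAILED|SUCCESS) (\S+)/ && $1 eq $wanted) {
            $url = $2;
            last;    # stop reading at the first match
        }
    }
    close $fh;
    return $url;
}

# Demo on a throwaway copy of the sample lines from the thread.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print {$tmp} "[2004-12-25 9:20:12] FAILED http://hotmail.com/bla\n",
             "[2004-12-25 9:25:12] SUCCESS http://hotmail.com/bla\n",
             "[2004-12-25 9:26:12] FAILED http://abc.com\n";
close $tmp;

my $url = first_url_with_status($tmpname, 'SUCCESS');
print "First SUCCESS URL: $url\n";
```

    Returning undef when nothing matches lets the caller tell "no such line" apart from a real URL.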


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Jan 9, 2005
    #14
  15. Joe Smith

    Joe Smith Guest

    Digger wrote:

    > So how do I go about opening the logfile and running your while loop
    > on it????


    You don't have to do anything. Just specify the log file name(s)
    on the command line.

    perl logchecker.pl file1.log file2.log file3.log

    Now that you know it is possible, go and study how while(<>){} works.
    -Joe
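
    For anyone studying it, here is a small self-contained sketch of what while(<>) does. The two temporary files stand in for log files named on the command line, and the @seen array is only there to make the behaviour easy to inspect; both are assumptions for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small throwaway files so the sketch runs on its own.
my @names;
for my $text ("first file, line 1\n", "second file, line 1\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh;
    push @names, $name;
}

# <> reads from the files listed in @ARGV, one after another
# (or from STDIN when @ARGV is empty).
@ARGV = @names;

my @seen;
while (my $line = <>) {
    chomp $line;
    push @seen, $line;
    print "$ARGV: $line\n";    # $ARGV holds the name of the file being read
}
```

    Run as a normal script, the assignment to @ARGV would be dropped and the file names would simply come from the command line.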
     
    Joe Smith, Jan 9, 2005
    #15
  16. Digger

    Digger Guest

    On Sat, 08 Jan 2005 23:17:53 -0800, Joe Smith wrote:

    >Digger wrote:
    >
    >> So how do I go about opening the logfile and running your while loop
    >> on it????

    >
    >You don't have to do anything. Just specify the log file name(s)
    >on the command line.
    >
    > perl logchecker.pl file1.log file2.log file3.log
    >
    >Now that you know it is possible, go and study how while(<>){} works.
    > -Joe

    lol...... I had a typo in the syntax I was using.......
     
    Digger, Jan 9, 2005
    #16
  17. Digger

    Digger Guest

    On Sun, 09 Jan 2005 12:45:22 -0500, Digger wrote:

    >On Sat, 08 Jan 2005 23:17:53 -0800, Joe Smith wrote:
    >
    >>Digger wrote:
    >>
    >>> So how do I go about opening the logfile and running your while loop
    >>> on it????

    >>
    >>You don't have to do anything. Just specify the log file name(s)
    >>on the command line.
    >>
    >> perl logchecker.pl file1.log file2.log file3.log
    >>
    >>Now that you know it is possible, go and study how while(<>){} works.
    >> -Joe

    >lol...... I had a typo in the syntax I was using.......


    Ok here is what's happening...

    script:


    #!/usr/bin/perl -w
    #
    $ARGV[0] = 'url2.log';
    my %status;
    while (<>) {
    / (FAILED|SUCCESS) (.*)/ and $status{$2} = $1;
    }
    print "URLs whose last status was SUCCESS:\n";
    $status{$_} eq 'SUCCESS' and print " $_\n" for sort keys %status;

    print "\nURLs whose last status was FAILED:\n";
    $status{$_} eq 'FAILED' and print " $_\n" for sort keys %status;


    Log File:

    root@digger > more url2.log
    [2005-01-04 09:17:59] FAILURE RESPONSE: Exceeded retry count (1) from http://192.168.6.7:2888/
    [2005-01-04 09:17:59] FAILURE RESPONSE: Exceeded retry count (1) from http://192.9.6.7:2888/
    [2005-01-04 09:18:57] SUCCESS RESPONSE from http://192.168.6.7:2888/
    [2005-01-04 09:26:57] FAILURE RESPONSE from http://192.55.6.7:2888/


    Output:

    root@digger > ./test2.pl
    URLs whose last status was SUCCESS:
    RESPONSE from http://192.168.6.7:2888/

    URLs whose last status was FAILED:



    As we can see it did pick up the first URL that initially FAILED then
    a few minutes later had a SUCCESS. But it didn't pick up
    http://192.9.6.7:2888/
    http://192.55.6.7:2888/

    that both had a FAILURE status, which is what I am concerned
    about.....
     
    Digger, Jan 9, 2005
    #17
  18. Digger

    Guest

    Digger wrote:
    > #!/usr/bin/perl -w
    > #
    > $ARGV[0] = 'url2.log';
    > my %status;
    > while (<>) {
    > / (FAILED|SUCCESS) (.*)/ and $status{$2} = $1;
    > }


    The line above should be:
    / (FAILURE|SUCCESS).+?from (.+)/ and $status{$2} = $1;


    > print "URLs whose last status was SUCCESS:\n";
    > $status{$_} eq 'SUCCESS' and print " $_\n" for sort keys %status;
    >
    > print "\nURLs whose last status was FAILED:\n";
    > $status{$_} eq 'FAILED' and print " $_\n" for sort keys %status;


    In the line above, 'FAILED' should be 'FAILURE':
    $status{$_} eq 'FAILURE' and print " $_\n" for sort keys %status;


    > Output:
    >
    > root@digger > ./test2.pl
    > URLs whose last status was SUCCESS:
    > RESPONSE from http://192.168.6.7:2888/
    >
    > URLs whose last status was FAILED:
    >
    >
    >
    > As we can see it did pick up the first URL that initially FAILED then
    > a few minutes later had a SUCCESS. But it didn't pick up
    > http://192.9.6.7:2888/
    > http://192.55.6.7:2888/


    Output with my changes to the code:

    URLs whose last status was SUCCESS:
    http://192.168.6.7:2888/

    URLs whose last status was FAILED:
    http://192.55.6.7:2888/
    http://192.9.6.7:2888/

    I think these changes should give you the desired results.

    Chris
     
    , Jan 9, 2005
    #18
  19. Digger

    Guest

    Wow, there is a piece missing in the first change;

    / (FAILURE|SUCCESS).+?from (.+)/ (** wrong)
    / (FAILURE|SUCCESS).+?from (.+)/ and $status{$2} = $1; (** right)
     
    , Jan 9, 2005
    #19
  20. Digger

    Guest

    Sorry, my bad. Joe's code was correct for the data example that the
    poster provided. I should've read the thread more closely.
     
    , Jan 10, 2005
    #20