Parse what's in a URL

Discussion in 'Perl Misc' started by donfanning@msn.com, Sep 26, 2005.

  1. Guest

    Is there a way using perl to take a URL, submit it, then parse the
    resulting url it returns after the page pulls? Like for submitting a
    query to a database and getting a status code in return (success,
    failure, reason, etc.)
    , Sep 26, 2005
    #1

  2. wrote:
    > Is there a way using perl to take a URL, submit it, then parse the
    > resulting url it returns after the page pulls? Like for submitting a
    > query to a database and getting a status code in return (success,
    > failure, reason, etc.)


    If you want the resulting _content_, this FAQ entry is applicable:

    perldoc -q "HTML file"

    If you are after only the HTTP status, this may help:

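    use strict;
    use warnings;
    use LWP::UserAgent;

    my $url = 'http://www.example.com/';   # URL to check (placeholder)
    my $ua  = LWP::UserAgent->new;
    my $response = $ua->get($url);
    print $response->status_line, "\n";   # e.g. "200 OK" or "404 Not Found"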

    In general, please make yourself comfortable with the LWP family of CPAN
    modules.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Sep 26, 2005
    #2

  3. Eric Bohlman Guest

    wrote in news:1127773035.855380.237430@z14g2000cwz.googlegroups.com:

    > Is there a way using perl to take a URL, submit it, then parse the
    > resulting url it returns after the page pulls? Like for submitting a
    > query to a database and getting a status code in return (success,
    > failure, reason, etc.)


    Your terminology is a bit confused here; what an HTTP request sent to a
    particular URL returns is not a URL, but a response. The response may be
    just a status code for the HTTP request, or a resource like an HTML page
    (which may contain status codes not related to HTTP, such as database query
    results). I assume you want to parse the returned document. If so, you
    probably want to look into WWW::Mechanize.
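A minimal sketch of the WWW::Mechanize route, assuming the module is installed from CPAN. fetch_page is a hypothetical helper name; it returns the HTTP status, the URL Mechanize ended up on (normally the final URL after any redirects), and the returned document for whatever parsing comes next.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch a URL and report what came back.
sub fetch_page {
    my ($url) = @_;
    # autocheck => 0 so an HTTP error is reported instead of dying
    my $mech = WWW::Mechanize->new(autocheck => 0);
    $mech->get($url);
    return {
        ok      => $mech->success ? 1 : 0,
        status  => $mech->status,
        uri     => $mech->uri->as_string,   # current page URL, normally after redirects
        content => $mech->content,          # the returned document
    };
}

# Run as: perl fetch.pl URL
if (@ARGV) {
    my $page = fetch_page($ARGV[0]);
    print "$page->{status} $page->{uri}\n";
}
```

Turning autocheck off is a design choice for a status checker: the caller decides what a 404 or 500 means instead of the script dying on it.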
    Eric Bohlman, Sep 27, 2005
    #3
  4. Guest

    I was thinking more along the lines of HTML::SimpleLinkExtor where I
    submit it a link. The remote page needs time to pull from a database
    and it spits out information in the URL which I would like to parse
    out.
    , Sep 27, 2005
    #4
  5. Guest

    Nope... The returned document doesn't matter. I can use the example
    that Gunter listed to pull 404's and stuff. The information I'm
    looking for is embedded in the URL.
    , Sep 27, 2005
    #5
  6. wrote:
    > I was thinking more along the lines of HTML::SimpleLinkExtor where I
    > submit it a link. The remote page needs time to pull from a database
    > and it spits out information in the URL which I would like to parse
    > out.


    Now I'm confused. HTML::SimpleLinkExtor extracts links from an HTML
    document, while you said in your reply to Eric that the returned
    document doesn't matter. Either you don't explain accurately enough what
    it is you want, or I'm unusually stupid.
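For comparison, a minimal sketch of what HTML::SimpleLinkExtor works on, assuming it is installed from CPAN: it pulls links out of HTML you hand it, not out of the URL a server redirected you to. anchor_links is a hypothetical helper name.

```perl
use strict;
use warnings;
use HTML::SimpleLinkExtor;

# Hypothetical helper: return the href of every <a> tag in an HTML string.
sub anchor_links {
    my ($html) = @_;
    my $extor = HTML::SimpleLinkExtor->new;
    $extor->parse($html);   # parse() is inherited from HTML::Parser
    $extor->eof;            # signal end of document, flushing buffered text
    return $extor->a;       # links from <a href="..."> tags only
}

my @links = anchor_links('<a href="/next">next</a> <img src="logo.png">');
print "$_\n" for @links;
```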

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Sep 27, 2005
    #6
  7. wrote in news:1127776150.886370.189270@z14g2000cwz.googlegroups.com:

    [ Please quote an appropriate amount of context when you reply ]

    > Nope... The returned document doesn't matter. I can use the example
    > that Gunter


    s/Gunter/Gunnar/

    > listed to pull 404's and stuff. The information I'm
    > looking for is embedded in the URL.


    You want to parse the URL that you use to invoke the script? I am a
    little confused. Don't you know how you constructed the URL?

    In any case, there probably is an answer to your question in the LWP
    documentation, I am just not sure what the question is.

    http://search.cpan.org/~gaas/libwww-perl-5.803/lib/LWP.pm
    http://search.cpan.org/~gaas/libwww-perl-5.803/lib/LWP.pm#The_Response_Object

    http://search.cpan.org/~rse/lcwa-1.0.0/lib/lwp/lib/URI/URL.pm

    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
    A. Sinan Unur, Sep 27, 2005
    #7
  8. Guest

    My apologies for not clarifying:

    So if I take a URL, say http://www.test.com/&search=123, and submit it,
    the server will respond back with a page, but the URL will have the
    information I am looking for, i.e.:
    http://www.test.com/&result=0&status=true or something of that nature.

    What I want is the Result=0 and Status=True portion of the URL that it
    returns.

    My apologies about your name, Gunnar. I always get Icelandic names wrong.
    ;-)
    , Sep 27, 2005
    #8
  9. wrote in news:1127779163.028178.101840@g47g2000cwa.googlegroups.com:

    > My apologies for not clarifying:


    [ Please quote an appropriate amount of context when replying ]

    > So if I take a URL, say http://www.test.com/&search=123, and submit it,
    > the server will respond back with a page, but the URL will have the
    > information I am looking for, i.e.:
    > http://www.test.com/&result=0&status=true or something of that nature.


    Then you should read the LWP docs.

    > What I want is the Result=0 and Status=True portion of the URL that it
    > returns.


    Go ahead and read the docs. If you hit a snag, post some code. Before
    posting code, read the posting guidelines.

    > My apologies about your name, Gunnar. I always get Icelandic names wrong.
    > ;-)


    He is from Sweden, though. (Gunnar: I don't know if you care, and
    apologies if I am overstepping my bounds here).

    Sinan

    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
    A. Sinan Unur, Sep 27, 2005
    #9
  10. Matt Garrish Guest

    "A. Sinan Unur" <> wrote in message
    news:Xns96DDD0A353291asu1cornelledu@127.0.0.1...
    > wrote in news:1127779163.028178.101840@g47g2000cwa.googlegroups.com:
    >
    >
    >> My apologies about your name, Gunnar. I always get Icelandic names wrong.
    >> ;-)

    >
    > He is from Sweden, though. (Gunnar: I don't know if you care, and
    > apologies if I am overstepping my bounds here).
    >


    <quote>
    From the Old Norse name Gunnarr which was derived from the elements gunnr
    "war" and arr "warrior". It is thus a cognate of GÜNTHER. Gunnar was a
    character in Norse legend, the husband of Brynhild.
    </quote>

    I'm sure, considering its history, that there are many people in Iceland with
    Nordic names. I have nothing else to add, but just thought it would be fun
    to join in this totally off-topic discussion of Gunnar's name... : )

    Matt
    Matt Garrish, Sep 27, 2005
    #10
  11. wrote:

    > My apologies for not clarifying:
    >
    > So if I take a URL, say http://www.test.com/&search=123, and submit it,
    > the server will respond back with a page, but the URL will have the
    > information I am looking for, i.e.:
    > http://www.test.com/&result=0&status=true or something of that nature.


    It's doing an HTTP redirection if it's changing the URL.
    If you are using Apache you MAY be able to look at the environment variables
    in your Perl program to see what's happening.
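One way to confirm the redirection from the client side is a minimal sketch like the following, assuming LWP::UserAgent, which follows redirects for GET and HEAD requests by default. Each HTTP::Response it returns keeps the response that led to it in previous(), so walking that chain lists every URL that was requested. requested_urls is a hypothetical helper name.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: list the URLs LWP requested on the way to this
# response, oldest first. previous() links each response to the one
# (usually a 3xx) that caused the follow-up request.
sub requested_urls {
    my ($response) = @_;
    my @urls;
    for (my $r = $response; defined $r; $r = $r->previous) {
        unshift @urls, $r->request->uri->as_string;
    }
    return @urls;
}

# Run as: perl chain.pl URL
if (@ARGV) {
    my $ua       = LWP::UserAgent->new;   # follows GET/HEAD redirects
    my $response = $ua->get($ARGV[0]);
    print "$_\n" for requested_urls($response);
    print 'Final status: ', $response->status_line, "\n";
}
```

If the server does redirect, the last URL printed is the one it sent the client to, which is where a result=0&status=true style query would show up.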

    gtoomey
    Gregory Toomey, Sep 27, 2005
    #11
  12. Matt Garrish wrote:
    > "A. Sinan Unur" wrote:
    >> wrote:
    >>> My apologies about your name, Gunnar. I always get Icelandic names wrong.
    >>> ;-)

    >>
    >> He is from Sweden, though. (Gunnar: I don't know if you care, and
    >> apologies if I am overstepping my bounds here).

    >
    > <quote>
    > From the Old Norse name Gunnarr which was derived from the elements gunnr
    > "war" and arr "warrior". It is thus a cognate of GÜNTHER. Gunnar was a
    > character in Norse legend, the husband of Brynhild.
    > </quote>
    >
    > I'm sure, considering its history, that there are many people in Iceland with
    > Nordic names. I have nothing else to add, but just thought it would be fun
    > to join in this totally off-topic discussion of Gunnar's name... : )


    Not much for me to add either, it seems, other than:

    - Yes, Gunnar _is_ a common name in Iceland.

    - The meaning of it ("warrior") may explain my occasional stubbornness. ;-)

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Sep 27, 2005
    #12
  13. Brian Wakem Guest

    wrote:

    > My apologies for not clarifying:
    >
    > So if I take a URL, say http://www.test.com/&search=123, and submit it,
    > the server will respond back with a page, but the URL will have the
    > information I am looking for, i.e.:
    > http://www.test.com/&result=0&status=true or something of that nature.
    >
    > What I want is the Result=0 and Status=True portion of the URL that it
    > returns.
    >
    > My apologies about your name, Gunnar. I always get Icelandic names wrong.
    > ;-)



    The target server is presumably sending a 302 Moved response and a location
    header. You could use LWP or WWW::Mechanize and empty the
    requests_redirectable array so you will end up with the 302 response rather
    than the final 200. Then extract the location header.
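A minimal sketch of that approach with LWP::UserAgent. Passing an empty array reference to requests_redirectable stops the agent from following any redirect, so the 3xx response itself comes back and its Location header can be read. location_of is a hypothetical helper name.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: return the Location header of a redirect
# response, or undef if the response is not a 3xx redirect.
sub location_of {
    my ($response) = @_;
    return unless $response->is_redirect;
    return scalar $response->header('Location');
}

# Run as: perl noredirect.pl URL
if (@ARGV) {
    my $ua = LWP::UserAgent->new;
    $ua->requests_redirectable([]);        # empty list: follow no redirects
    my $response = $ua->get($ARGV[0]);
    print $response->status_line, "\n";    # the 3xx itself, not the final page
    my $loc = location_of($response);
    print "Location: $loc\n" if defined $loc;
}
```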


    --
    Brian Wakem
    Email: http://homepage.ntlworld.com/b.wakem/myemail.png
    Brian Wakem, Sep 27, 2005
    #13
  14. Joe Smith Guest

    wrote:
    > My apologies for not clarifying:
    >
    > So if I take a URL, say http://www.test.com/&search=123, and submit it,
    > the server will respond back with a page,


    Hold it right there. The server will respond back with a response.

    The response may be an HTTP response with no content, or an HTML
    page with embedded URLs, or something else. The first of these may
    carry a URL in its headers.

    > but the URL will have the information I am looking for, i.e.:
    > http://www.test.com/&result=0&status=true or something of that nature.


    The only way that makes sense is if the server actually returns
    "302 Moved\nLocation: http://www.test.com/&result=0&status=true\n\n"
    and your browser followed the redirection. LWP::Simple follows
    redirections.

    If this is the case, you need to tell the useragent to not follow HTTP
    redirects, and then look at the headers in the response it did get.
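The same idea with HTTP::Tiny, swapped in here because it ships with Perl itself (core since 5.14, well after this thread); the thread does not mention it. Setting max_redirect to 0 makes it hand back the 3xx response instead of following it, and header names in its response hash are lower-cased. location_without_following and location_from are hypothetical helper names.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: pick the Location header out of an HTTP::Tiny
# response hash, or return undef if the status is not 3xx.
sub location_from {
    my ($res) = @_;
    return unless $res->{status} =~ /^3\d\d$/;
    my $loc = $res->{headers}{location};
    return ref $loc ? $loc->[0] : $loc;   # repeated headers arrive as an array ref
}

# Hypothetical helper: request a URL without following redirects.
sub location_without_following {
    my ($url) = @_;
    my $http = HTTP::Tiny->new(max_redirect => 0);   # do not follow redirects
    return location_from($http->get($url));
}

# Run as: perl tiny.pl URL
if (@ARGV) {
    my $loc = location_without_following($ARGV[0]);
    print defined $loc ? "Redirected to: $loc\n" : "No redirect\n";
}
```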

    > What I want is the Result=0 and Status=True portion of the URL that it
    > returns.


    That is easy, once you recognize exactly where that URL is coming
    from.
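A minimal sketch of that last step, assuming the redirect URL is already in hand. url_params is a hypothetical helper name. Well-formed URLs carry such pairs in the query string after '?'; the example URL in this thread has no '?' and appends them after '&', so the sketch falls back to that layout, which is an assumption about this particular server.

```perl
use strict;
use warnings;

# Hypothetical helper: return the key/value pairs carried by a URL as a hash.
# Prefer the query string after '?'; otherwise fall back to whatever
# follows the first '&' (the layout of the example URL in this thread).
sub url_params {
    my ($url) = @_;
    my ($params) = $url =~ /\?([^#]*)/;
    ($params) = $url =~ /&([^#]*)/ unless defined $params;
    return () unless defined $params;

    my %p;
    for my $pair (split /[&;]/, $params) {
        next unless length $pair;
        my ($key, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($key, $value) {            # undo form encoding in place
            tr/+/ /;
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
        }
        $p{$key} = $value;
    }
    return %p;
}
```

For http://www.test.com/&result=0&status=true this yields result => '0' and status => 'true'. When the URL does have a proper '?' query string, the query_form method of the CPAN URI module is the more robust choice.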
    -Joe
    Joe Smith, Oct 5, 2005
    #14
