Parse what's in a URL

donfanning

Is there a way, using Perl, to take a URL, submit it, then parse the
resulting URL it returns after the page loads? Like for submitting a
query to a database and getting a status code in return (success,
failure, reason, etc.)
 
Gunnar Hjalmarsson

> Is there a way, using Perl, to take a URL, submit it, then parse the
> resulting URL it returns after the page loads? Like for submitting a
> query to a database and getting a status code in return (success,
> failure, reason, etc.)

If you want the resulting _content_, this FAQ entry is applicable:

perldoc -q "HTML file"

If you are after only the HTTP status, this may help:

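use strict;
use warnings;
use LWP::UserAgent;

my $url = shift or die "Usage: $0 URL\n";   # URL to request, from the command line
my $ua = LWP::UserAgent->new;
my $response = $ua->get($url);
print $response->status_line, "\n";         # e.g. "200 OK" or "404 Not Found"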

In general, please make yourself comfortable with the LWP family of CPAN
modules.
 
Eric Bohlman

donfanning wrote:

> Is there a way, using Perl, to take a URL, submit it, then parse the
> resulting URL it returns after the page loads? Like for submitting a
> query to a database and getting a status code in return (success,
> failure, reason, etc.)

Your terminology is a bit confused here; what an HTTP request sent to a
particular URL returns is not a URL, but a response. The response may be
just a status code for the HTTP request, or a resource like an HTML page
(which may contain status codes not related to HTTP, such as database query
results). I assume you want to parse the returned document. If so, you
probably want to look into WWW::Mechanize.
 
donfanning

I was thinking more along the lines of HTML::SimpleLinkExtor where I
submit it a link. The remote page needs time to pull from a database
and it spits out information in the URL which I would like to parse
out.
 
donfanning

Nope... The returned document doesn't matter. I can use the example
that Gunter listed to pull 404's and stuff. The information I'm
looking for is embedded in the URL.
 
Gunnar Hjalmarsson

> I was thinking more along the lines of HTML::SimpleLinkExtor where I
> submit it a link. The remote page needs time to pull from a database
> and it spits out information in the URL which I would like to parse
> out.

Now I'm confused. HTML::SimpleLinkExtor extracts links from an HTML
document, while you said in your reply to Eric that the returned
document doesn't matter. Either you don't explain accurately enough what
it is you want, or I'm unusually stupid.
 
A. Sinan Unur

donfanning wrote:

[ Please quote an appropriate amount of context when you reply ]

> Nope... The returned document doesn't matter. I can use the example
> that Gunter

s/Gunter/Gunnar/

> listed to pull 404's and stuff. The information I'm
> looking for is embedded in the URL.

You want to parse the URL that you use to invoke the script? I am a
little confused. Don't you know how you constructed the URL?

In any case, there probably is an answer to your question in the LWP
documentation, I am just not sure what the question is.

http://search.cpan.org/~gaas/libwww-perl-5.803/lib/LWP.pm
http://search.cpan.org/~gaas/libwww-perl-5.803/lib/LWP.pm#The_Response_Object

http://search.cpan.org/~rse/lcwa-1.0.0/lib/lwp/lib/URI/URL.pm
 
A. Sinan Unur

donfanning wrote:

> My apologies for not clarifying:

[ Please quote an appropriate amount of context when replying ]

> So if I take a URL say http://www.test.com/&search=123 and submit it
> The server will respond back with a page but the url will have the
> information I am looking for ie:
> http://www.test.com/&result=0&status=true or something to that nature.

Then you should read the LWP docs.

> What I want is the Result=0 and Status=True portion of the URL that it
> returns.

Go ahead and read the docs. If you hit a snag, post some code. Before
posting code, read the posting guidelines.

> My apologies on your name Gunnar. I always get icelandic names wrong.
> ;-)

He is from Sweden, though. (Gunnar: I don't know if you care, and
apologies if I am overstepping my bounds here).

Sinan
 
Matt Garrish

A. Sinan Unur said:

> He is from Sweden, though. (Gunnar: I don't know if you care, and
> apologies if I am overstepping my bounds here).

<quote>
From the Old Norse name Gunnarr which was derived from the elements gunnr
"war" and arr "warrior". It is thus a cognate of GÜNTHER. Gunnar was a
character in Norse legend, the husband of Brynhild.
</quote>

I'm sure considering its history there are many people in Iceland with
Nordic names. I have nothing else to add, but just thought it would be fun
to join in this totally off-topic discussion of Gunnar's name... : )

Matt
 
Gregory Toomey

> My apologies for not clarifying:
>
> So if I take a URL say http://www.test.com/&search=123 and submit it
> The server will respond back with a page but the url will have the
> information I am looking for ie:
> http://www.test.com/&result=0&status=true or something to that nature.

It's doing an HTTP redirect if it's changing the URL.
If you are using Apache you MAY be able to look at the environment variables
in your Perl program to see what's happening.

gtoomey
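
If it is a redirect and LWP is left to follow it, the URL that was finally fetched can be read back from the response object. A rough sketch, assuming LWP::UserAgent is installed; the helper name final_url is made up for illustration, and the network is only touched when a URL is passed on the command line:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# URL of the request that produced this response. After LWP has followed
# redirects, this is the final URL rather than the one originally submitted.
sub final_url {
    my ($response) = @_;
    return $response->request->uri->as_string;
}

# Only go out to the network when a URL is given on the command line.
if (my $url = shift @ARGV) {
    my $ua = LWP::UserAgent->new;
    my $response = $ua->get($url);    # GET redirects are followed by default
    print "Hops:  ", scalar($response->redirects), "\n";
    print "Final: ", final_url($response), "\n";
}
```

This only helps if the server issues a real HTTP redirect; a URL changed by JavaScript in the page would not show up here.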
 
Gunnar Hjalmarsson

Matt said:

> <quote>
> From the Old Norse name Gunnarr which was derived from the elements gunnr
> "war" and arr "warrior". It is thus a cognate of GÜNTHER. Gunnar was a
> character in Norse legend, the husband of Brynhild.
> </quote>
>
> I'm sure considering its history there are many people in Iceland with
> Nordic names. I have nothing else to add, but just thought it would be fun
> to join in this totally off-topic discussion of Gunnar's name... : )

Not much for me to add either, it seems, other than:

- Yes, Gunnar _is_ a common name in Iceland.

- The meaning of it ("warrior") may explain my occasional stubbornness. ;-)
 
Brian Wakem

> My apologies for not clarifying:
>
> So if I take a URL say http://www.test.com/&search=123 and submit it
> The server will respond back with a page but the url will have the
> information I am looking for ie:
> http://www.test.com/&result=0&status=true or something to that nature.
>
> What I want is the Result=0 and Status=True portion of the URL that it
> returns.
>
> My apologies on your name Gunnar. I always get icelandic names wrong.
> ;-)


The target server is presumably sending a 302 redirect response with a
Location header. You could use LWP or WWW::Mechanize and empty the
requests_redirectable list so that you end up with the 302 response rather
than the final 200. Then extract the Location header.
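
A minimal sketch of that approach, assuming the server really does answer with a redirect. The helper name redirect_target is made up for illustration; requests_redirectable, is_redirect, and header are existing LWP::UserAgent and HTTP::Response methods. The script only contacts a server when a URL is passed on the command line:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Return the Location header of a redirect response, or undef if the
# response is not a 3xx redirect.
sub redirect_target {
    my ($response) = @_;
    return unless $response->is_redirect;
    return $response->header('Location');
}

# Only go out to the network when a URL is given on the command line.
if (my $url = shift @ARGV) {
    my $ua = LWP::UserAgent->new;
    $ua->requests_redirectable([]);    # follow no redirects at all
    my $response = $ua->get($url);
    my $target = redirect_target($response);
    print defined $target ? "$target\n" : $response->status_line . "\n";
}
```

If the response turns out not to be a redirect, the sketch prints the status line instead, which is a quick way to check whether the redirect assumption holds.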
 
Joe Smith

> My apologies for not clarifying:
>
> So if I take a URL say http://www.test.com/&search=123 and submit it
> The server will respond back with a page

Hold it right there. The server will respond back with a response.

The response may be an HTTP response with no content, or an HTML
page with embedded URLs, or something else. The former may have
a URL inside the header.

> but the url will have the information I am looking for ie:
> http://www.test.com/&result=0&status=true or something to that nature.

The only way that makes sense is if the server actually returns
"302 Moved\nLocation: http://www.test.com/&result=0&status=true\n\n"
and your browser followed the redirection. LWP::Simple follows
redirections.

If this is the case, you need to tell the useragent to not follow HTTP
redirects, and then look at the headers in the response it did get.

> What I want is the Result=0 and Status=True portion of the URL that it
> returns.

That is easy, once you recognize exactly where that URL is coming
from.
-Joe
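
Once the URL is in hand, pulling out the result and status values might look like the sketch below. It assumes the URI module is installed; the helper name url_params is made up for illustration. The sample URL in this thread has no "?", so the name=value pairs sit in the path rather than in a query string, and the sketch falls back to the path in that case. It does no percent-decoding.

```perl
use strict;
use warnings;
use URI;

# Split "name=value" pairs out of a URL. Use the query string when there
# is one; otherwise fall back to the path, as in the thread's sample URL.
sub url_params {
    my ($url) = @_;
    my $uri   = URI->new($url);
    my $pairs = defined $uri->query ? $uri->query : $uri->path;
    my %param;
    for my $pair (split /[&;]/, $pairs) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $value;    # skip segments such as "/"
        $param{$name} = $value;
    }
    return %param;
}

my %param = url_params('http://www.test.com/&result=0&status=true');
print "result=$param{result} status=$param{status}\n";
```

Run as is, this prints "result=0 status=true". For URLs with a proper query string, URI's own query_form method is probably the more robust choice.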
 
