Looking for Perl Script, which retrieves all Links from a site

fritz-bayer

Hello,

I'm searching for a powerful yet easy to use Perl script, which will
retrieve all the links from a website and list them.

I need it because I would like to perform some automated tests on all
the pages of a site, like checking that the HTML code is valid etc.

Any hints to concrete resources are very welcome!

Thanks,
Fritz
 
A. Sinan Unur

> No, I meant something helpful.

Please read the posting guidelines for this group to learn how you can
help yourself, and help others help you.

In particular, we do not write your program for you.

Also, when replying, please make sure to quote enough of a context so we
can figure out the topic just by looking at your post.

This is a trivial task using any one of the HTML parsers on CPAN. In
fact, HTML::Parser includes an example script that does exactly that for
a single HTML document. See:

<URL:http://search.cpan.org/src/GAAS/HTML-Parser-3.48/eg/>

You can use that as a basis, and combine it with page retrieving code.
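
A minimal sketch of that combination, for illustration only. It is not the
script from the eg/ directory: it assumes HTML::LinkExtor (which ships with
the HTML-Parser distribution) for parsing and LWP::Simple for retrieval.
The helper name extract_links and the file name links.pl are made up for
this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;        # part of the HTML-Parser distribution
use LWP::Simple qw(get);    # the "page retrieving code"

# Return the absolute URLs of all <a href="..."> links found in $html.
# $base is the URL the page came from; relative links are resolved against it.
sub extract_links {
    my ($html, $base) = @_;
    my $parser = HTML::LinkExtor->new(undef, $base);
    $parser->parse($html);
    $parser->eof;

    my @urls;
    for my $link ($parser->links) {
        my ($tag, %attr) = @$link;           # [ tag, attr => url, ... ]
        next unless $tag eq 'a' && defined $attr{href};
        push @urls, "$attr{href}";           # stringify the URI object
    }
    return @urls;
}

# Usage (hypothetical file name): perl links.pl http://www.example.com/
if (@ARGV) {
    my $url  = shift @ARGV;
    my $html = get($url);
    die "Could not fetch $url\n" unless defined $html;
    print "$_\n" for extract_links($html, $url);
}
```

This handles a single document only; walking a whole site means feeding the
returned URLs back into the fetch step.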

On the other hand, are you sure you cannot use wget?

Sinan
 
Jürgen Exner

> I'm searching for a powerful yet easy to use Perl script, which will
> retrieve all the links from a website

See
perldoc -q HTML: "How do I fetch an HTML file?"
perldoc -q HTML: "How do I remove HTML from a string?" (OK, granted, you
don't want to remove HTML but extract certain HTML elements. But the
technique to use is identical.)

> and list them.

perldoc -f print

jue
 
Jürgen Exner

> No, I meant something helpful.

What do you mean by "No"? You are not looking for a Perl script that
retrieves links? Or you don't think such a script would be helpful?

jue
 
Sherm Pendley

> I'm searching for a powerful yet easy to use Perl script, which will
> retrieve all the links from a website and list them.
>
> I need it because I would like to perform some automated tests on all
> the pages of a site, like checking that the HTML code is valid etc.
>
> Any hints to concrete resources are very welcome!

If you're looking for a ready-made, no-programming-required solution, you're
probably in the wrong group. You might try alt.www.webmaster for that.

If you're looking to write something and want to avoid reinventing the wheel,
have a look at WWW::Mechanize. It will take care of most of the heavy lifting
for you, so you can get on with writing the site-specific pieces.
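
As a rough illustration of how little code that leaves, here is a sketch
that lists the links on one page with WWW::Mechanize. The helper name
page_links is invented for this example, and following the links across
the site is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Fetch $url and return the absolute URLs of its links, duplicates removed.
# Returns an empty list if the fetch fails or the response is not HTML.
sub page_links {
    my ($mech, $url) = @_;
    $mech->get($url);
    return () unless $mech->success && $mech->is_html;

    my %seen;
    return grep { !$seen{$_}++ }
           map  { $_->url_abs->as_string } $mech->links;
}

if (@ARGV) {
    # autocheck => 0: report failures via success() instead of dying
    my $mech = WWW::Mechanize->new(autocheck => 0);
    print "$_\n" for page_links($mech, $ARGV[0]);
}
```

The site-specific parts (which links to follow, which pages to validate)
would be built around the list this returns.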

sherm--
 
fritz-bayer

Thanks, but I already read those a while ago. I know basic Perl
programming.
 
fritz-bayer

I'm looking for a Perl script which will spider a site and print out
all the links. You call it, pass a domain name to it, and then all the
links get printed out. Kind of like mirroring a site, with the
difference that the content is not stored on the computer. Just the
links get printed out.
 
Anno Siegel

Please post attributions and some context with your followup.

> Thanks, but I already read those a while ago. I know basic Perl
> programming.

So what's stopping you from applying what you've read?

Anno
 
Dr.Ruud

(e-mail address removed) wrote:
> I'm looking for a Perl script which will spider a site and print out
> all the links. You call it, pass a domain name to it, and then all
> the links get printed out. Kind of like mirroring a site, with the
> difference that the content is not stored on the computer. Just
> the links get printed out.

bye
 
Anno Siegel

> I'm looking for a Perl script which will spider a site and print out
> all the links. You call it, pass a domain name to it, and then all the
> links get printed out. Kind of like mirroring a site, with the
> difference that the content is not stored on the computer. Just the
> links get printed out.

"I want to dig a hole. It's kind of like building a house, only you
put no house in the foundation."

Anno
 
Jürgen Exner

> Thanks

Thanks to whom for what? Please quote the appropriate amount of context -as
has been customary for two decades- such that your readers have a chance to
know what you are talking about.

> but I already read those a while ago.

What did you read? Please quote the appropriate amount of context -as has
been customary for two decades- such that your readers have a chance to know
what you are talking about.

> I know basic Perl programming.

Good. Then, what is your problem?
Please quote the appropriate amount of context -as has been customary for
two decades- such that your readers have a chance to know what you are
talking about.

jue
 
Jürgen Exner

> I'm looking for a Perl script which will spider a site and print out
> all the links. You call it, pass a domain name to it, and then all
> the links get printed out. Kind of like mirroring a site, with the
> difference that the content is not stored on the computer. Just
> the links get printed out.

perldoc LWP
perldoc -q HTML: "How do I remove HTML from a string?" (removing HTML is
akin to extracting the text portion of an HTML file, you want to extract the
links. Therefore you can use exactly the same technique)
perldoc -f print
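
Put together, those three pieces (fetch with LWP, extract the links, print)
could look roughly like the sketch below. It is one possible arrangement,
not taken from the FAQ: it assumes HTML::LinkExtor for the extraction step,
stays on the starting host, and stops after an arbitrary page limit. The
names on_host and spider and the limit of 50 are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

# True if $url is an http(s) URL on the host $host.
sub on_host {
    my ($url, $host) = @_;
    my $uri = URI->new($url);
    return 0 unless defined $uri->scheme && $uri->scheme =~ /^https?$/;
    return lc($uri->host) eq lc($host);
}

# Breadth-first walk starting at $start. Prints every link once and
# follows only links on the starting host. Nothing is written to disk.
sub spider {
    my ($start, $max_pages) = @_;
    my $ua    = LWP::UserAgent->new(timeout => 15);
    my $host  = URI->new($start)->host;
    my @queue = ($start);
    my %seen  = ($start => 1);

    while (@queue && $max_pages-- > 0) {
        my $url = shift @queue;
        my $res = $ua->get($url);
        next unless $res->is_success && $res->content_type eq 'text/html';
        my $html = $res->decoded_content;
        next unless defined $html;

        # Passing the response base makes the parser return absolute URLs.
        my $parser = HTML::LinkExtor->new(undef, $res->base);
        $parser->parse($html);
        $parser->eof;

        for my $link ($parser->links) {
            my ($tag, %attr) = @$link;
            next unless $tag eq 'a' && defined $attr{href};
            my $abs = URI->new($attr{href});
            $abs->fragment(undef);       # page#a and page#b are the same page
            next if $seen{$abs}++;
            print "$abs\n";
            push @queue, "$abs" if on_host($abs, $host);
        }
    }
}

# Usage: pass the start URL as the first argument.
spider($ARGV[0], 50) if @ARGV;
```

A crawler run against someone else's site should also respect robots.txt;
LWP::RobotUA from the same distribution is one way to get that.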

jue
 
Scott Bryce

> I'm looking for a Perl script which will spider a site and print out
> all the links. You call it, pass a domain name to it, and then all the
> links get printed out. Kind of like mirroring a site, with the
> difference that the content is not stored on the computer. Just the
> links get printed out.

I think you are missing an important point. This newsgroup is not about
writing scripts for people. Neither is it about finding existing scripts
for people. It is about helping people write Perl.

There are numerous script repositories that contain hundreds of scripts
of varying quality. You might look there. Or you might decide to write
the script yourself. If you decide to write it yourself, we may be able
to help you, should you get stuck on some detail. But PLEASE, read the
posting guidelines for this newsgroup before you ask for help. They are
posted here about twice a week.
 
