HTML info extraction utility

MaggieMagill

Is there any utility that can gather info such as a list of images, fonts
used, links used, etc? Something that could start at "index.html" and run
thru all other html (local) files that are referenced along the way?
 
Richard

Is there any utility that can gather info such as a list of images, fonts
used, links used, etc? Something that could start at "index.html" and run
thru all other html (local) files that are referenced along the way?

Perhaps with a javascript routine.
Not directly possible with pure html.
 
Andy Dingley

It was somewhere outside Barstow when MaggieMagill wrote:
Is there any utility that can gather info such as a list of images, fonts
used, links used, etc?

Any number of them. They're usually written in Perl, because it has
usable parsing modules available off the shelf.
 
data64

Is there any utility that can gather info such as a list of images, fonts
used, links used, etc? Something that could start at "index.html" and run
thru all other html (local) files that are referenced along the way?

As Andy replied, using Perl this can be put together in short order. I think
Dreamweaver also has some such capabilities (from what little I have used
it). You can run reports on local sites that would give you this information.

data64
 
MaggieMagill

Andy Dingley wrote:

Any number of them. They're usually written in Perl, because it has
usable parsing modules available off the shelf.

Could you direct me to where I would find these types of utilities? I'm not
quite sure what search terms I would use.

I would be using them on a local machine that now has Apache 1.3.33 running
and a bunch of 8-9 year old html pages (no styles used) that I need to sift
thru. Images, fonts & links is really the only data I need to extract.

I was thinking of breaking out the PASCAL until I realized that 15 years
of not using it might have dulled my minimal skills.
 
Andy Dingley

It was somewhere outside Barstow when MaggieMagill wrote:
Could you direct me to where I would find these types of utilities? I'm not
quite sure what search terms I would use.

Google for "HTML analysis" or somesuch ought to give you tools that
meet your immediate needs, immediately.

If you want to write some Perl (which is a worthy goal, but probably
overkill for this) then look at the LWP module and the HTML::Parser
class (HTML::TokeParser is sometimes easier to use for people less
familiar with Perl). This will spit every tag back at you and a
simple switch statement can recognise the tag types and analyse
accordingly. A couple of hashes (associative arrays) to store things
in and off you go.
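
For anyone who wants to see the shape of that approach, here is a minimal
sketch. It is not the Perl described above: it swaps in Python's standard
html.parser module, which hands back each tag in much the same way. The
names PageInfo and scan_site are made up for illustration, and it assumes
old-style pages where fonts appear as <font face="...">, images as
<img src="..."> and links as <a href="...">. The "switch statement" becomes
an if/elif chain and the "couple of hashes" become a dictionary of sets.

```python
from collections import defaultdict
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urldefrag, urlparse


class PageInfo(HTMLParser):
    """Collects image sources, font faces and link targets from one page."""

    def __init__(self):
        super().__init__()
        # One set of values per category (images, fonts, links).
        self.found = defaultdict(set)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lower-cased
        if tag == "img" and attrs.get("src"):
            self.found["images"].add(attrs["src"])
        elif tag == "font" and attrs.get("face"):
            self.found["fonts"].add(attrs["face"])
        elif tag == "a" and attrs.get("href"):
            self.found["links"].add(attrs["href"])


def scan_site(start):
    """Start at one local file and follow links to other local HTML files."""
    todo, seen = [Path(start).resolve()], set()
    report = defaultdict(set)
    while todo:
        page = todo.pop()
        if page in seen or not page.is_file():
            continue
        seen.add(page)
        parser = PageInfo()
        parser.feed(page.read_text(errors="replace"))
        parser.close()
        for kind, values in parser.found.items():
            report[kind] |= values
        for href in parser.found["links"]:
            target, _fragment = urldefrag(href)
            parsed = urlparse(target)
            # Skip remote links (http:, mailto:, ...) and non-HTML files.
            if parsed.scheme or parsed.netloc:
                continue
            if not target.lower().endswith((".html", ".htm")):
                continue
            todo.append((page.parent / unquote(target)).resolve())
    return report


if __name__ == "__main__":
    # Assumes an index.html in the current directory.
    for kind, values in sorted(scan_site("index.html").items()):
        print(kind, sorted(values))
```

Run from the folder holding index.html, this prints one line each for fonts,
images and links. It reads files straight off the disk, so Apache does not
need to be running, and it ignores links with query strings or
server-absolute paths, which would need extra handling.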

You could probably do this task with anything, and I'm not smart
enough to know what the best tool is. But when I do it, I re-write the
nasty hacky Perl I used last time and change a handful of lines for my
specific need.
 
