HTML info extraction utility

Discussion in 'HTML' started by MaggieMagill, Mar 3, 2005.

  1. MaggieMagill

    MaggieMagill Guest

    Is there any utility that can gather info such as a list of images, fonts
    used, links used, etc? Something that could start at "index.html" and run
    thru all other html (local) files that are referenced along the way?
     
    MaggieMagill, Mar 3, 2005
    #1

  2. MaggieMagill

    Richard Guest

    On Thu, 03 Mar 2005 04:43:58 GMT MaggieMagill wrote:

    > Is there any utility that can gather info such as a list of images, fonts
    > used, links used, etc? Something that could start at "index.html" and run
    > thru all other html (local) files that are referenced along the way?


    Perhaps with a javascript routine.
    Not directly possible with pure html.
     
    Richard, Mar 3, 2005
    #2

  3. MaggieMagill

    Andy Dingley Guest

    It was somewhere outside Barstow when MaggieMagill wrote:

    >Is there any utility that can gather info such as a list of images, fonts
    >used, links used, etc?


    Any number of them. They're usually written in Perl, because it has
    usable parsing modules available off the shelf.
     
    Andy Dingley, Mar 3, 2005
    #3
  4. MaggieMagill

    data64 Guest

    MaggieMagill wrote in news:iCwVd.25521$7z6.66@lakeread04:

    > Is there any utility that can gather info such as a list of images, fonts
    > used, links used, etc? Something that could start at "index.html" and run
    > thru all other html (local) files that are referenced along the way?


    As Andy replied, using Perl this can be put together in short order. I think
    Dreamweaver also has some such capabilities (from what little I have used
    it). You can run reports on local sites that would give you this information.

    data64
     
    data64, Mar 4, 2005
    #4
  5. MaggieMagill

    MaggieMagill Guest

    Andy Dingley wrote:

    > It was somewhere outside Barstow when MaggieMagill wrote:
    >
    >>Is there any utility that can gather info such as a list of images,
    >>fonts used, links used, etc?

    >
    > Any number of them. They're usually written in Perl, because it has
    > usable parsing modules available off the shelf.
    >


    Could you direct me to where I would find these types of utilities? I'm not
    quite sure what search terms I would use.

    I would be using them on a local machine that now has Apache 1.3.33 running
    and a bunch of 8-9 year old html pages (no styles used) that I need to sift
    thru. Images, fonts & links is really the only data I need to extract.

    I was thinking of breaking out the PASCAL until I realized that 15 years
    of not using it might have dulled my minimal skills.
     
    MaggieMagill, Mar 4, 2005
    #5
  6. MaggieMagill

    Andy Dingley Guest

    It was somewhere outside Barstow when MaggieMagill wrote:

    >Could you direct me to where I would find these types of utilities? I'm not
    >quite sure what search terms I would use.


    Google for "HTML analysis" or somesuch ought to give you tools that
    meet your immediate needs, immediately.

    If you want to write some Perl (which is a worthy goal, but probably
    overkill for this) then look at the LWP module and the HTML::Parser
    class (HTML::TokeParser is sometimes easier to use for people less
    familiar with Perl). This will spit every tag back at you and a
    simple switch statement can recognise the tag types and analyse
    accordingly. A couple of hashes (associative arrays) to store things
    in and off you go.

    You could probably do this task with anything, and I'm not smart
    enough to know what the best tool is. But when I do it, I re-write the
    nasty hacky Perl I used last time and change a handful of lines for my
    specific need.
     
    Andy Dingley, Mar 4, 2005
    #6
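
The approach described in the last post (have a parser hand back every tag, branch on the tag type, keep the results in a couple of hashes) can be sketched roughly as follows. This is only an illustration, and it swaps in Python's standard-library `html.parser` for the Perl modules named in the thread. The names `TagCollector` and `scan_site` are made up for this example, and it assumes the pages use `<img src>`, `<font face>` and `<a href>` and link to each other with relative paths.

```python
from collections import Counter
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urlparse


class TagCollector(HTMLParser):
    """Tallies image sources, font faces and link targets for one page."""

    def __init__(self):
        super().__init__()
        self.images = Counter()
        self.fonts = Counter()
        self.links = Counter()

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag and attribute names, so old
        # upper-case markup such as <IMG SRC=...> is matched too.
        attr = dict(attrs)
        if tag == "img" and attr.get("src"):
            self.images[attr["src"]] += 1
        elif tag == "font" and attr.get("face"):
            self.fonts[attr["face"]] += 1
        elif tag == "a" and attr.get("href"):
            self.links[attr["href"]] += 1


def scan_site(start):
    """Walk local pages reachable from `start` and merge their tallies."""
    totals = {"images": Counter(), "fonts": Counter(), "links": Counter()}
    pending = [Path(start).resolve()]
    seen = set()
    while pending:
        page = pending.pop()
        if page in seen or not page.is_file():
            continue
        seen.add(page)
        collector = TagCollector()
        collector.feed(page.read_text(errors="replace"))
        collector.close()
        totals["images"] += collector.images
        totals["fonts"] += collector.fonts
        totals["links"] += collector.links
        for href in collector.links:
            target, _fragment = urldefrag(href)
            parsed = urlparse(target)
            # Follow only relative links; skip http:, mailto: and so on.
            if not target or parsed.scheme or parsed.netloc:
                continue
            if target.lower().endswith((".html", ".htm")):
                pending.append((page.parent / target).resolve())
    return totals, seen


if __name__ == "__main__":
    totals, pages = scan_site("index.html")
    print(f"{len(pages)} page(s) scanned")
    for kind, counts in totals.items():
        print(kind)
        for value, count in sorted(counts.items()):
            print(f"  {count:4d}  {value}")
```

Run from the directory that holds "index.html", this prints each image, font face and link with the number of times it appears. Root-relative links (starting with "/") and links with query strings are not followed in this sketch; mapping those back to local files would depend on how the Apache document root is laid out.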
