Retrieve description/meta tags from a website and remove HTML

Discussion in 'ASP .Net' started by Mark, Jun 24, 2005.

  1. Mark

    Mark Guest

    Hi all, does anyone know of a nice utility or class that will let me
    retrieve the details of a web page?

    Specifically, I would like to be able to retrieve the HTML and then call
    one method that returns the meta tags, and another method that removes all
    the HTML from the string, starting at the body tag.

    Does one exist? I know I can write one using regular expressions etc., but
    I'd rather not reinvent the wheel :)

    Mark, Jun 24, 2005
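
For readers landing on this thread: the two methods Mark asks for can be sketched with a tolerant, event-based HTML parser instead of regular expressions. The sketch below uses Python's standard `html.parser` purely as an illustration (the thread itself is about ASP .Net, so this is not a drop-in answer). The names `PageDetails`, `parse_page`, and `fetch` are made up for this example, and `fetch` assumes the page is UTF-8 encoded.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class PageDetails(HTMLParser):
    """Collects <meta> name/content pairs and the visible text inside <body>."""

    SKIP = {"script", "style"}  # contents of these tags are not visible text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.meta = {}
        self._chunks = []
        self._in_body = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Accept both <meta name=...> and <meta http-equiv=...>.
            key = attrs.get("name") or attrs.get("http-equiv")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]
        elif tag == "body":
            self._in_body = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only once <body> has started and outside script/style.
        if self._in_body and not self._skip_depth:
            self._chunks.append(data)

    @property
    def text(self):
        # Chunks are joined with a space, so text split by inline tags
        # (e.g. <b>Hel</b>lo) gains a space; acceptable for a sketch.
        return " ".join(" ".join(self._chunks).split())


def parse_page(html):
    """Return (meta_dict, body_text) for an HTML string."""
    parser = PageDetails()
    parser.feed(html)
    parser.close()
    return parser.meta, parser.text


def fetch(url):
    """Download a page as text. Assumption: the page is UTF-8 encoded."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage would look like `meta, text = parse_page(fetch(some_url))`, after which `meta.get("description")` holds the description tag if the page has one. Meta names are lower-cased so that `Description` and `description` are treated the same.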

  2. JV

    JV Guest

    I assume you mean programmatically, since you can obviously hand-edit in VS
    or even just Notepad.

    I had to do something like this to work around the VS bug where it
    occasionally eats the closing tag on a <link> tag. I didn't do a whole lot
    of research, but here is what I can tell you.

    1) The HTML parsers I found were expensive. I didn't find a free one, at
    least not one that was useful.
    2) Sometimes people use the IE browser control for DOM access, but I found
    it to be pretty clunky for my purposes.
    3) You can't really load it in an XML document because the HTML is rarely
    well-formed XML (though maybe in VS2005 using XHTML it will be?)

    I ended up doing some of my own string parsing since my need was relatively
    JV, Jun 24, 2005
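
JV's point 3 can be shown in a few lines: a strict XML parser rejects ordinary HTML with an unclosed `<link>` or `<br>`, while a tolerant HTML parser still reports every tag. This is a small illustration in Python's standard library, not the approach JV used. The sample markup and the helper names `is_well_formed_xml` and `list_tags` are invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Typical "tag soup": <link>, <meta> and <br> are never closed, <p> is left open.
HTML_SAMPLE = (
    '<html><head><link rel="stylesheet" href="site.css">'
    '<meta name="description" content="demo"></head>'
    '<body><p>Hi<br>there</body></html>'
)

# The same page written as XHTML, with every element closed.
XHTML_SAMPLE = (
    '<html><head><link rel="stylesheet" href="site.css"/>'
    '<meta name="description" content="demo"/></head>'
    '<body><p>Hi<br/>there</p></body></html>'
)


def is_well_formed_xml(markup):
    """True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagLister(HTMLParser):
    """Records the name of every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def list_tags(markup):
    """Return start-tag names found by a tolerant HTML parser."""
    parser = TagLister()
    parser.feed(markup)
    parser.close()
    return parser.tags
```

Here `is_well_formed_xml(HTML_SAMPLE)` is false and `is_well_formed_xml(XHTML_SAMPLE)` is true, while `list_tags` walks both without complaint. That is why an XML document loader only becomes an option once the pages are known to be valid XHTML.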

  3. Wilbur Slice

    Wilbur Slice Guest

    Yeah, take a look at this:
    Wilbur Slice, Jun 24, 2005
  4. Mark

    Mark Guest

    Mark, Jun 25, 2005