Parsing a html page

Discussion in 'Java' started by swetha, Jun 8, 2006.

  1. swetha

    swetha Guest

    I'm working on a research project in which I need to extract specific
    content from an HTML page using Java. The page contains many URLs, and
    I have to write code that follows each link and extracts the content
    from the linked page, repeating this for every link. Can anyone help me
    with this? If anyone has code for it, please share it.
    thanks,
    Swetha.
     
    swetha, Jun 8, 2006
    #1

  2. Boris Werner

    Boris Werner Guest

    Hi!

    I found this article about programming a web crawler:

    http://www.devarticles.com/c/a/Java/Crawling-the-Web-with-Java/

    It might help, because it does exactly what you want. Have a closer
    look at its methods, especially retrieveLinks().
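    A minimal sketch of the kind of crawl loop such an article describes,
    assuming Java 9 or later. It is not the article's code: SimpleCrawler,
    crawl and extractLinks are hypothetical names. The page fetcher is
    passed in as a function so the logic can be tried without a network
    (in real use it might wrap java.net.http.HttpClient), and links are
    found with a naive href regular expression rather than a real HTML
    parser. Each URL is queued at most once, which is what stops the crawl
    from looping forever between pages that link to each other.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical breadth-first crawler sketch; not the article's implementation.
final class SimpleCrawler {
    // Naive href matcher: quoted values only, stops at '#' so fragments are dropped.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    // Visits pages breadth-first from startUrl; returns url -> HTML in visiting order.
    // fetch returns the page's HTML, or null if it could not be retrieved.
    static Map<String, String> crawl(String startUrl, int maxPages,
                                     Function<String, String> fetch) {
        Map<String, String> pages = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(startUrl);
        queue.add(startUrl);
        while (!queue.isEmpty() && pages.size() < maxPages) {
            String url = queue.poll();
            String html = fetch.apply(url);
            if (html == null) {
                continue; // unreachable page: skip it
            }
            pages.put(url, html);
            for (String link : extractLinks(url, html)) {
                if (seen.add(link)) { // true only the first time a URL is seen
                    queue.add(link);
                }
            }
        }
        return pages;
    }

    // Finds href targets and resolves relative ones against the page URL.
    static List<String> extractLinks(String pageUrl, String html) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(pageUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI target = base.resolve(m.group(1).trim());
                String scheme = target.getScheme();
                // Keep only web links; skips mailto:, javascript:, etc.
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    links.add(target.toString());
                }
            } catch (IllegalArgumentException e) {
                // malformed URL in the page: ignore it
            }
        }
        return links;
    }
}
```

    The maxPages cap keeps a test run from wandering across a whole site;
    pages that fail to load are simply skipped.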

    Hope this helps

    Boris
     
    Boris Werner, Jun 8, 2006
    #2

  3. jcsnippets.atspace.com

    Have a look at the following articles; I think they will help you get
    started:
    http://jcsnippets.atspace.com/java/network-stuff/how-to-save-a-webpage.html
    http://jcsnippets.atspace.com/java/regular-expressions/regular-expressions-find-href.html

    These will allow you to save a web page and extract the links from it.
    If you'd like to extract more information, you will need another
    regular expression.
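    A rough Java sketch of the save step plus one "other" regular
    expression, assuming Java 11 or later for java.net.http. PageTools,
    savePage and extractTitle are hypothetical names, and this is not the
    code from the linked articles. extractTitle pulls out the page's
    <title> text as an example of extracting more information. Patterns
    like this work on simple, well-formed pages but can miss or mismatch
    content in messy real-world HTML, where a proper HTML parser is more
    robust.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the two steps; not the articles' code.
final class PageTools {
    // Case-insensitive, DOTALL so a title spanning several lines still matches.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Downloads url and writes the response body to target (Java 11+).
    static void savePage(String url, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<Path> response =
                client.send(request, HttpResponse.BodyHandlers.ofFile(target));
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
    }

    // "Another regular expression": returns the <title> text, or null if absent.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }
}
```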

    Once you have a list of links, repeat the process for each of those
    pages.

    Best regards,

    JayCee
    --
    http://jcsnippets.atspace.com/
    a collection of source code, tips and tricks
     
    jcsnippets.atspace.com, Jun 9, 2006
    #3
  4. Wibble

    Wibble Guest

    We use HtmlUnit for testing servlets and JSPs, but it's also a pretty
    good screen scraper, and it's JavaScript-aware.

    http://htmlunit.sourceforge.net/
     
    Wibble, Jun 11, 2006
    #4
  5. Wibble

    Wibble Guest

    Wibble wrote:
    > We use HtmlUnit for testing servlets and JSPs, but it's also a pretty
    > good screen scraper, and it's JavaScript-aware.
    >
    > http://htmlunit.sourceforge.net/

    Oops, I actually meant HttpUnit:

    http://httpunit.sourceforge.net/
     
    Wibble, Jun 11, 2006
    #5
  6. rujutanap

    rujutanap

    Parsing a web page in PHP

    I'm working on a project in which I need to extract images and their
    alt attributes from an HTML page using PHP. The page contains many
    images, and I have to write code that goes through each image and
    extracts the alt attribute from its <img> tag.

    Can anyone help me with this?

    If anyone has code for it, please share it.

    Thanks,
    Rujuta
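    The question asks for PHP, but to stay with the Java used in the rest
    of this thread, here is a minimal sketch of the same idea using the
    HTML parser bundled with the JDK (ParserDelegator with an
    HTMLEditorKit.ParserCallback). ImgAltExtractor and extract are
    hypothetical names. The parser reports empty elements such as <img>
    through handleSimpleTag, and each image's src and alt are read from its
    attribute set; a missing alt comes back as an empty string. This parser
    targets HTML 3.2, so it may not cope with every modern page.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: lists {src, alt} for every <img> using the JDK's HTML parser.
final class ImgAltExtractor {
    static List<String[]> extract(String html) {
        List<String[]> images = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <img> has no end tag, so it arrives here rather than in handleStartTag.
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    Object alt = attrs.getAttribute(HTML.Attribute.ALT);
                    images.add(new String[] {
                            src == null ? "" : src.toString(),
                            alt == null ? "" : alt.toString() // "" when alt is missing
                    });
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a String
        }
        return images;
    }
}
```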
     
    rujutanap, Jun 24, 2010
    #6

Similar Threads
  1. Mark Kamoski
    Replies: 1, Views: 7,120
  2. Stu
    Replies: 2, Views: 795
    Rob McAninch, Apr 6, 2004
  3. Replies: 7, Views: 1,383
  4. Ninja Li

    Parsing HTML with HTML::TableExtract

    Ninja Li, Nov 27, 2009, in forum: Perl Misc
    Replies: 2, Views: 228
    Martien Verbruggen, Nov 28, 2009
  5. Ninja Li

    Parsing HTML with HTML::Tree

    Ninja Li, Mar 1, 2010, in forum: Perl Misc
    Replies: 1, Views: 150
    Ninja Li, Mar 1, 2010