parse HTML

Discussion in 'Java' started by VitaminB, Apr 25, 2006.

  1. VitaminB

    VitaminB Guest

    Hello,

    I want to parse an HTML document to get the URLs of all frames in a
    frameset. I get a "NullPointer Exception" in the System.out.println...

    Thanks a lot for your help.

    Regards,
    Marcus


    ##################
    Java Code:
    ##################

    URL urlobj = new URL(str);

    HttpURLConnection uc = null;
    uc = (HttpURLConnection)urlobj.openConnection();
    uc.setUseCaches(false);
    DataInputStream is = new DataInputStream(uc.getInputStream());

    HTMLEditorKit hKit = new HTMLEditorKit();
    HTMLDocument hDoc = new HTMLDocument();
    hKit.read(is, hDoc, 0);
    HTMLDocument.Iterator it = hDoc.getIterator(HTML.Tag.FRAME);

    AttributeSet attSet = it.getAttributes();
    String s = (String)attSet.getAttribute(HTML.Attribute.SRC);
    System.out.println(s);





    ##################
    Example HTML page:
    ##################

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN">
    <html>
    <head>

    <script language="JavaScript" type="text/javascript">
    <!--
    self._domino_name = "_Main";
    // -->
    </script>
    </head>

    <frameset cols="45%,55%">

    <frame
    src="/Test/HET/PerformanceTestDB.nsf/ContentDeliveryMeasurement?OpenForm">


    <frameset rows="1*,1*">

    <frame src="/Test/HET/PerformanceTestDB.nsf/DocsInserted?OpenView">

    <frame name="docPreviewFrame"
    src="/Test/HET/PerformanceTestDB.nsf/select?OpenForm">
    </frameset>
    </frameset>
    </html>
    VitaminB, Apr 25, 2006
    #1

  2. Amfur Kilnem

    Amfur Kilnem Guest

    "VitaminB" <> wrote in message
    news:...
    > AttributeSet attSet = it.getAttributes();
    > String s = (String)attSet.getAttribute(HTML.Attribute.SRC);
    > System.out.println(s);


    attSet.getAttribute must've returned null.
    Amfur Kilnem, Apr 25, 2006
    #2
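
One way to narrow this down, sketched under assumptions: as far as the Javadoc for `HTMLDocument.Iterator.getAttributes()` goes, that call itself may return null when nothing is found, in which case the NPE would come from calling `getAttribute` on a null `attSet` rather than from the return value. The class and method names below (`FrameIteratorGuard`, `parse`, `firstFrameSrc`) are made up for illustration, and the test page is a stand-in without any frames.

```java
import java.io.StringReader;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class FrameIteratorGuard {

    /** Parses HTML text into a Swing HTMLDocument (hypothetical helper). */
    static HTMLDocument parse(String html) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(html), doc, 0);
        return doc;
    }

    /** Returns the src of the first FRAME element, or null if none is found. */
    static String firstFrameSrc(HTMLDocument doc) {
        HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.FRAME);
        // Guard every step that may legitimately yield null.
        if (it == null || !it.isValid()) {
            return null;
        }
        AttributeSet attrs = it.getAttributes();
        if (attrs == null) {
            return null;
        }
        Object src = attrs.getAttribute(HTML.Attribute.SRC);
        return src == null ? null : src.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in page with no frames: the guarded lookup returns null
        // instead of throwing.
        HTMLDocument doc = parse("<html><body><p>no frames here</p></body></html>");
        System.out.println(firstFrameSrc(doc));
    }
}
```

Checking `isValid()` before touching the attributes separates "no FRAME element was found" from "the FRAME has no src", which the original three lines cannot tell apart.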

  3. VitaminB

    VitaminB Guest

    Why?
    VitaminB, Apr 25, 2006
    #3
  4. Oliver Wong

    Oliver Wong Guest

    "VitaminB" <> wrote in message
    news:...
    >
    > I want to parse an HTML document to get the URLs of all frames in a
    > frameset. I get a "NullPointer Exception" in the System.out.println...

    [...]
    >
    > ##################
    > Java Code:
    > ##################
    >
    > URL urlobj = new URL(str);
    >
    > HttpURLConnection uc = null;
    > uc = (HttpURLConnection)urlobj.openConnection();
    > uc.setUseCaches(false);
    > DataInputStream is = new DataInputStream(uc.getInputStream());
    >
    > HTMLEditorKit hKit = new HTMLEditorKit();
    > HTMLDocument hDoc = new HTMLDocument();
    > hKit.read(is, hDoc, 0);
    > HTMLDocument.Iterator it = hDoc.getIterator(HTML.Tag.FRAME);
    >
    > AttributeSet attSet = it.getAttributes();
    > String s = (String)attSet.getAttribute(HTML.Attribute.SRC);
    > System.out.println(s);


    I don't see how you could have gotten an NPE from the
    System.out.println statement. Are you sure you didn't get it from the line
    above, or possibly somewhere else? See the section titled "If you get an
    error message, repeat it exactly." at
    http://riters.com/JINX/index.cgi/Suggestions_20for_20Asking_20Questions_20on_20Newsgroups

    - Oliver
    Oliver Wong, Apr 25, 2006
    #4
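
A small sketch of why the println line is an unlikely source of the NPE: `System.out.println(String)` prints the text "null" for a null argument, whereas calling a method on a null reference throws. The class name `PrintlnNullDemo` and the helper `throwsNpe` are invented for this illustration.

```java
public class PrintlnNullDemo {

    /** Returns true if running the action throws a NullPointerException. */
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = null;

        // Printing a null String is fine: the text "null" is written.
        System.out.println(throwsNpe(() -> System.out.println(s)));

        // Dereferencing the null reference is what throws.
        System.out.println(throwsNpe(() -> s.length()));
    }
}
```

Run as is, this prints `null`, then `false`, then `true`. That points at the line before the println, where a method is called on `attSet`.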
  5. VitaminB

    VitaminB Guest

    OK, I have reworked my code and now get another exception, but again I
    don't know why.

    Here is the failure:
    javax.swing.text.ChangedCharSetException
    at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(DocumentParser.java:198)
    at javax.swing.text.html.parser.Parser.startTag(Parser.java:401)
    at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1875)
    at javax.swing.text.html.parser.Parser.parseContent(Parser.java:1910)
    at javax.swing.text.html.parser.Parser.parse(Parser.java:2076)
    at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:135)
    at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:107)
    at javax.swing.text.html.HTMLEditorKit.read(HTMLEditorKit.java:262)
    at javax.swing.text.html.HTMLEditorKit.read(HTMLEditorKit.java:262)
    at javax.swing.text.DefaultEditorKit.read(DefaultEditorKit.java:163)
    at Stress.urlRequest(Stress.java:76)
    at Stress.run(Stress.java:40)
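
A note on the trace above, offered as a sketch rather than a diagnosis: the Swing parser throws `ChangedCharSetException` from `handleEmptyTag` when it meets a META tag that declares a charset, unless the document property `"IgnoreCharsetDirective"` is set to `Boolean.TRUE`. The example page posted earlier contains no META tag, so this assumes the page actually fetched does. The class name `CharsetDirectiveDemo` and the test page are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.ChangedCharSetException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class CharsetDirectiveDemo {

    // Hypothetical page: the META charset declaration is what the parser reacts to.
    static final String PAGE =
        "<html><head>"
        + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=ISO-8859-1\">"
        + "</head><body><p>hello</p></body></html>";

    /** Returns true if PAGE parses, false if ChangedCharSetException is thrown. */
    static boolean parses(boolean ignoreCharsetDirective)
            throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        if (ignoreCharsetDirective) {
            // Tell the parser not to abort on a charset declaration.
            doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        }
        try {
            kit.read(new StringReader(PAGE), doc, 0);
            return true;
        } catch (ChangedCharSetException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without property: " + parses(false));
        System.out.println("with property:    " + parses(true));
    }
}
```

Setting the property skips the charset switch entirely, so it only suits cases where the stream is already decoded with the right encoding, or where the encoding does not matter for the attributes being read.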



    Here is the code:

    public long[] urlRequest(String str) {
        Cal starttime = new Cal();
        long[] read = new long[2];
        try {
            int c = 0;
            byte[] rc = new byte[1024];
            URL urlobj = new URL(str);

            HTTPRequest request = new HTTPRequest(str, user, pass);
            DataInputStream is = new DataInputStream( request.get() );

            HTMLEditorKit hKit = new HTMLEditorKit();
            HTMLDocument hDoc = new HTMLDocument();
            hKit.read(is, hDoc, 0);

            HTMLDocument.Iterator it = hDoc.getIterator(HTML.Tag.FRAME);
            it.next();
            AttributeSet attSet = it.getAttributes();
            String s = (String)attSet.getAttribute(HTML.Attribute.SRC);
            System.out.println(s);

            //System.out.println(attSet.getAttributeCount());

            while (( c = is.read(rc)) != -1 ) {
                read[0] = read[0] + c;
            }
            Cal endtime = new Cal();
            read[1] = endtime.getTimeInMillis() - starttime.getTimeInMillis();
            return read;
        }
        catch ( Exception e ) {
            e.printStackTrace();
        }
        return read;
    }

    }
    VitaminB, Apr 25, 2006
    #5
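
For the original goal of listing every frame URL, here is a minimal sketch that swaps in a different technique from the one in the post: it drives `ParserDelegator` with an `HTMLEditorKit.ParserCallback` instead of building an `HTMLDocument` and using `HTMLDocument.Iterator`. The class name `FrameSrcCollector` and the shortened frame paths are invented for illustration. Passing `true` as the last argument of `parse` asks the parser to ignore charset directives.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class FrameSrcCollector extends HTMLEditorKit.ParserCallback {

    private final List<String> sources = new ArrayList<>();

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        record(tag, attrs);
    }

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        // Covered as well, in case FRAME is not reported as a simple tag.
        record(tag, attrs);
    }

    /** Remembers the src attribute of FRAME tags; other tags are skipped. */
    private void record(HTML.Tag tag, MutableAttributeSet attrs) {
        if (!HTML.Tag.FRAME.equals(tag)) {
            return;
        }
        Object src = attrs.getAttribute(HTML.Attribute.SRC);
        if (src != null) {
            sources.add(src.toString());
        }
    }

    /** Returns the src values of all FRAME tags, in document order. */
    static List<String> frameSources(Reader html) throws IOException {
        FrameSrcCollector collector = new FrameSrcCollector();
        new ParserDelegator().parse(html, collector, true);
        return collector.sources;
    }

    public static void main(String[] args) throws IOException {
        // Shortened stand-in for the frameset page from the first post.
        String page = "<html><frameset cols=\"45%,55%\">"
            + "<frame src=\"/a?OpenForm\">"
            + "<frameset rows=\"1*,1*\">"
            + "<frame src=\"/b?OpenView\">"
            + "<frame name=\"docPreviewFrame\" src=\"/c?OpenForm\">"
            + "</frameset></frameset></html>";
        for (String src : frameSources(new StringReader(page))) {
            System.out.println(src);
        }
    }
}
```

Two things in the posted method are worth checking in any case. The iterator returned by `getIterator` is normally walked with `for (...; it.isValid(); it.next())`, so an extra `it.next()` before the first `getAttributes()` would skip the first frame. And since `hKit.read` has presumably consumed the stream already, the byte-counting loop afterwards would likely read nothing.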