HTML Parser - problem with multiple instances

Discussion in 'Java' started by Matt, Apr 29, 2005.

  1. Matt

    Matt Guest

    I have a parser program that queries an online shopping comparison web
    page and extracts the information needed. I am trying to run it once
    for each search term in a sentence, so each word is sent as a separate
    query. However, the outputs (text files) are the same for every word,
    even though the correct term and output file seem to be passed each
    time. I suspect the connection is not being closed after each run, but
    I am not sure why that would happen.

    If I create an identical copy of the program and run that after the
    first one, it works, but this is not an appropriate solution.

    Any help would be much appreciated. Here is some of my code; if more
    is required I will post it.

    To run the program:

    StringTokenizer t = new StringTokenizer("red green yellow", " ");
    int c = 0;
    Parser1 p = new Parser1();
    while (t.hasMoreTokens()) {
        c++;
        String tok = t.nextToken();

        // one output file per search term: C:/1.txt, C:/2.txt, ...
        File tem = new File("C:/" + c + ".txt");

        p.mainprog(tok, tem);
    }
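
    A note on the loop above: `mainprog` is declared static in the parser, so calling it through `p` does not give each term its own state. If the results list is also static and never cleared (an assumption, since the variable declarations are not shown), every call would see the results of the earlier calls. A minimal sketch of that effect, using a hypothetical `StaticStateDemo` class in place of `Parser1` and no network access:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Parser1: the list and the method are both static.
class StaticStateDemo {
    // Shared by every call; assumed here to mirror a static results list.
    static final List<String> results = new ArrayList<>();

    static List<String> run(String term) {
        results.add("result for " + term);  // appended, never cleared between calls
        return new ArrayList<>(results);    // snapshot of what would be written out
    }

    public static void main(String[] args) {
        for (String term : "red green yellow".split(" ")) {
            // Each snapshot still contains the entries from earlier terms.
            System.out.println(term + " -> " + StaticStateDemo.run(term));
        }
    }
}
```

    If this is what is happening, it would also fit the observation that a fresh copy of the program works, because a separate class has its own static fields.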

    The parser:

    import javax.swing.text.html.parser.*;
    import javax.swing.text.html.*;
    import javax.swing.text.*;
    import java.awt.*;
    import java.util.*;
    import javax.swing.*;
    import java.io.*;
    import java.net.*;

    public class Parser1 extends HTMLEditorKit.ParserCallback {

        variable declarations

        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            ...methods
        }

        public void handleText(char[] data, int pos) {
            ...methods
        }

        public void handleTitleTag(HTML.Tag t, char[] data) {

        }

        public void handleEmptyTag(HTML.Tag t, char[] data) {

        }

        public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {

            ...methods
        }

        static void mainprog(String term, File file) {

            ....proxy and authentication methods


            Authenticator.setDefault(new MyAuthenticator());

            HTMLEditorKit editorKit = new HTMLEditorKit();
            HTMLDocument HTMLDoc;
            Reader HTMLReader;

            try {
                String temp = new String(term);
                String fullurl = new String(MainUrl + temp);
                url = new URL(fullurl);
                InputStream myInStream;
                myInStream = url.openConnection().getInputStream();
                HTMLReader = (new InputStreamReader(myInStream));
                HTMLDoc = (HTMLDocument) editorKit.createDefaultDocument();
                HTMLDoc.putProperty("IgnoreCharsetDirective", new Boolean(true));

                ParserDelegator parser = new ParserDelegator();
                HTMLEditorKit.ParserCallback callback = new Parser1();
                parser.parse(HTMLReader, callback, true);

                callback.flush();

                HTMLReader.close();
                myInStream.close();
            }
            catch (IOException IOE) {
                IOE.printStackTrace();
            }
            catch (Exception e) {
                e.printStackTrace();
            }

            try {
                FileWriter writer = new FileWriter(file);
                BufferedWriter bw = new BufferedWriter(writer);
                for (int i = 0; i < vect.size(); i++) {

                    bw.write((String) vect.elementAt(i));
                    if (vect.elementAt(i) != vect.lastElement()) {
                        bw.newLine();
                    }
                }

                bw.flush();
                bw.close();
                writer.close();
            }
            catch (IOException IOE) {
                IOE.printStackTrace();
            }
            catch (Exception e) {
                e.printStackTrace();
            }

            } catch (IOException IOE) {
                System.out.println("User options not found.");
            }


        }
    }
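
    One way to rule out state carried over between runs is to keep the collected text in an instance field and create a new callback for every parse. The following is only a sketch under assumptions: `TermCollector`, its `lines` field and the `collect` helper are made-up names, and it parses an in-memory string rather than the real shopping page so it can be run without a connection or proxy.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback: results live in an instance field, not a static one.
class TermCollector extends HTMLEditorKit.ParserCallback {
    private final List<String> lines = new ArrayList<>();

    @Override
    public void handleText(char[] data, int pos) {
        String text = new String(data).trim();
        if (!text.isEmpty()) {
            lines.add(text);  // belongs to this parse only
        }
    }

    // Parses one HTML string with a fresh callback and returns its text chunks.
    static List<String> collect(String html) {
        TermCollector callback = new TermCollector();
        try (StringReader reader = new StringReader(html)) {
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.lines;
    }

    public static void main(String[] args) {
        for (String term : "red green yellow".split(" ")) {
            // Stand-in page; the real program would read from the URL instead.
            String page = "<html><body><p>" + term + "</p></body></html>";
            System.out.println(term + " -> " + collect(page));
        }
    }
}
```

    With this layout each call starts from an empty list, so nothing from the "red" query can end up in the file for "green". Whether that is the actual cause here depends on the declarations that were left out of the post.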
     
    Matt, Apr 29, 2005