SAX Parser for creating (not parsing) XML documents

Discussion in 'XML' started by Naresh Agarwal, Jul 21, 2008.

  1. Hi

    I have been using a DOM parser to create XML documents. I want to use an
    alternative mechanism, as my XML document is large and I don't want the
    overhead of DOM (where the entire tree is constructed in memory).

    Like the SAX APIs for parsing XML documents, is there anything like a
    SAX parser/API for creating XML documents?

    Is there any other standard mechanism for creating an XML document
    without requiring the entire tree structure to be constructed in
    memory?

    Thanks,
    Naresh
    Naresh Agarwal, Jul 21, 2008
    #1

  2. Naresh Agarwal wrote:
    > Hi
    >
    > I have been using a DOM parser to create XML documents. I want to use an
    > alternative mechanism, as my XML document is large and I don't want the
    > overhead of DOM (where the entire tree is constructed in memory).
    >
    > Like the SAX APIs for parsing XML documents, is there anything like a
    > SAX parser/API for creating XML documents?


    Java 6 has XMLStreamWriter
    http://www.java2s.com/Tutorial/Java/0440__XML/UsingXMLStreamWritertocreateXMLfile.htm
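
    A minimal sketch of how XMLStreamWriter could address the original
    question. The class name, method name and the <items>/<item> element
    names are made up for illustration; only the javax.xml.stream calls are
    real API. Each element goes to the Writer as it is produced, so no tree
    of the document is built up front.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StreamWriterSketch {

    // Writes <items> with `count` <item> children straight to `out`.
    // Element and attribute names here are illustrative only.
    static void writeItems(Writer out, int count) throws XMLStreamException {
        XMLStreamWriter xsw =
            XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xsw.writeStartDocument();
        xsw.writeStartElement("items");
        for (int i = 0; i < count; i++) {
            xsw.writeStartElement("item");
            xsw.writeAttribute("id", Integer.toString(i));
            // The writer escapes '<' and '&' in character data.
            xsw.writeCharacters("a < b & c");
            xsw.writeEndElement(); // closes the current <item>
        }
        xsw.writeEndElement();     // closes <items>
        xsw.writeEndDocument();
        xsw.close();               // does not close the underlying Writer
    }

    public static void main(String[] args) throws XMLStreamException {
        StringWriter sw = new StringWriter();
        writeItems(sw, 2);
        System.out.println(sw);
    }
}
```

    For a really large document you would pass a buffered file or socket
    writer instead of the StringWriter used in this demo.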


    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
    Martin Honnen, Jul 21, 2008
    #2

  3. Martin Honnen <> writes:
    >Java 6 has XMLStreamWriter


    Often, people use »tag« for »element«.

    Here, it seems to be the other way round:

    xtw.writeStartElement("http://www.w3.org/TR/REC-html40", "a");
    xtw.writeAttribute("href", "http://www.java2s.com");
    xtw.writeCharacters("here");
    xtw.writeEndElement();

    It seems that »writeEndElement« is supposed
    to write the end /tag/, i.e., "</a>".

    So, to me, more accurate names would be:

    »writeEndTag()«,
    »writeEndOfElement()«, or
    »writeElementEnding()«.
    Stefan Ram, Jul 21, 2008
    #3
  4. On Jul 21, 4:09 pm, Martin Honnen <> wrote:
    > Naresh Agarwal wrote:
    > > Hi
    > >
    > > I have been using a DOM parser to create XML documents. I want to use an
    > > alternative mechanism, as my XML document is large and I don't want the
    > > overhead of DOM (where the entire tree is constructed in memory).
    > >
    > > Like the SAX APIs for parsing XML documents, is there anything like a
    > > SAX parser/API for creating XML documents?
    >
    > Java 6 has XMLStreamWriter
    > http://www.java2s.com/Tutorial/Java/0440__XML/UsingXMLStreamWritertoc...
    >
    > --
    >
    > Martin Honnen
    > http://JavaScript.FAQTs.com/


    Thanks. Is there anything available for this with Java 1.5, any open
    source app?

    Best
    Naresh
    Naresh Agarwal, Jul 22, 2008
    #4
  5. Mon, 21 Jul 2008 21:46:42 -0700 (PDT), /Naresh Agarwal/:
    > On Jul 21, 4:09 pm, Martin Honnen <> wrote:
    >
    >> Java 6 has XMLStreamWriter
    >> http://www.java2s.com/Tutorial/Java/0440__XML/UsingXMLStreamWritertoc...
    >
    > Thanks. Is there anything available for this with Java 1.5, any open
    > source app?


    <http://en.wikipedia.org/wiki/StAX> lists three implementations:

    > * http://stax.codehaus.org/ Reference Implementation
    > * Woodstox Open source StAX implementation
    > * https://sjsxp.dev.java.net is Sun's Stax implementation


    --
    Stanimir
    Stanimir Stamenkov, Jul 22, 2008
    #5
  6. Stefan Ram wrote:
    > Martin Honnen <> writes:
    >> Java 6 has XMLStreamWriter
    >
    > Often, people use »tag« for »element«.
    >
    > Here, it seems to be the other way round:
    >
    > xtw.writeStartElement("http://www.w3.org/TR/REC-html40", "a");
    > xtw.writeAttribute("href", "http://www.java2s.com");
    > xtw.writeCharacters("here");
    > xtw.writeEndElement();
    >
    > It seems that »writeEndElement« is supposed
    > to write the end /tag/, i.e., "</a>".
    >
    > So, to me, more accurate names would be:
    >
    > »writeEndTag()«,
    > »writeEndOfElement()«, or
    > »writeElementEnding()«.


    You're not writing tags with this, you're writing /events/.

    So writeStartElement means you are sending the StartElement
    /event/ into the stream. writeEndElement means you are
    sending the EndElement /event/ into the stream.

    When using XMLStreamWriter, you must remember that you are
    acting as the parser.
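
    One way to picture the event view (a sketch only; the class name and
    the <p> wrapper element are made up for illustration): writeEndElement()
    takes no name, because it just signals "the innermost open element ends
    here", and the writer works out which end tag that corresponds to.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class EventOrderSketch {

    // Emits the same sequence of events a parser would report when
    // reading <p><a href="...">here</a></p>.
    static String render() throws XMLStreamException {
        StringWriter sw = new StringWriter();
        XMLStreamWriter xsw =
            XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
        xsw.writeStartElement("p");   // StartElement event for p
        xsw.writeStartElement("a");   // StartElement event for a
        xsw.writeAttribute("href", "http://www.java2s.com");
        xsw.writeCharacters("here");  // Characters event
        xsw.writeEndElement();        // EndElement event: ends a (innermost)
        xsw.writeEndElement();        // EndElement event: ends p
        xsw.close();
        return sw.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(render());
    }
}
```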


    --
    Tony Lavinio <> DataDirect <> Stylus Studio XML <>
    XQuery, XSLT, XML Schema and EDI Toolset <> http://www.stylusstudio.com/
    <> There is no problem that brute force and ignorance cannot overcome <>
    Tony Lavinio, Jul 22, 2008
    #6
  7. Naresh Agarwal <> writes:
    >Is there any thing available for this with Java 1.5, any open
    >source app?


    Writing XML is simpler than reading XML.

    So, sometimes, one can get by with no library at all:

    outStream.print( "<" );
    outStream.print( type );
    outStream.print( ">" );

    See also

    XML IN PRACTICE - APIs Considered Harmful
    http://www.itworld.com/nl/xml_prac/04182002/
    http://web.archive.org/web/*/http://open.itworld.com/nl/xml_prac/04182002/pf_index.html

    This approach was also endorsed by Sun Microsystems, Inc.:

    http://web.archive.org/web/*/http:/.../jaxp/dist/1.1/docs/tutorial/sax/2a_echo.html

    See also:

    http://www.cafeconleche.org/slides/sd2002west/xmlandjava/52.html

    Another way to write XML with pre-1.6-Java:

    // Note: com.sun.org.apache.xml.internal.* are JDK-internal classes,
    // not part of the public Java API.
    import java.io.FileOutputStream;
    import com.sun.org.apache.xml.internal.serialize.OutputFormat;
    import com.sun.org.apache.xml.internal.serialize.XMLSerializer;
    import com.sun.org.apache.xml.internal.serializer.AttributesImplSerializer;

    public class Main {
        public static void main(final String[] args) throws Throwable {
            final FileOutputStream fileOutputStream = new FileOutputStream("tmp.txt");
            try {
                final OutputFormat outputFormat = new OutputFormat("XML", "UTF-8", true);
                outputFormat.setIndent(1);
                outputFormat.setIndenting(true);
                // outputFormat.setDoctype( null, "users.dtd" );
                final XMLSerializer serializer =
                    new XMLSerializer(fileOutputStream, outputFormat);
                serializer.startDocument();
                // <example> with an empty attribute list
                serializer.startElement("", "example", "example",
                    new AttributesImplSerializer());
                final char[] data = "test text".toCharArray();
                serializer.characters(data, 0, data.length);
                serializer.endElement("", "example", "example");
                serializer.endDocument();
            } finally {
                // Release the file handle even if serialization fails.
                fileOutputStream.close();
            }
        }
    }

    Giving:

    <?xml version="1.0" encoding="UTF-8"?>
    <example>test text</example>

    Other pages about writing XML with pre-1.6-Java:

    http://www.javazoom.net/services/newsletter/xmlgeneration.html
    http://dev.w3.org/cvsweb/java/class.../serialization/xml/XMLWriter.java?rev=1.1.2.4
    http://itext.sourceforge.net/src/com/lowagie/text/xml/XmlWriter.java
    http://xmlenc.sourceforge.net/j2h/0.39/org/znerd/xmlenc/XMLOutputter.java.html
    http://gridportal.fzk.de/cgi-bin/vi...src/util/ucy/old/Attic/XMLWriter.java?rev=1.2
    http://www.cin.ufpe.br/~avcc/Projeto-AI/xml4j-2.0/src/com/ibm/xml/parser/FormatPrintVisitor.java
    http://www.cin.ufpe.br/~avcc/Projeto-AI/xml4j-2.0/src/com/ibm/xml/parser/util/HTMLPrintVisitor.java
    http://ptolemy.berkeley.edu/ptolemyii/ptIIlatest/ptII/diva/gui/MultipageWriter.java
    http://search.cpan.org/src/PERRAD/CORBA-JAVA-2.48/XMLOutputStream.java

    (If a page is gone from the web, try web.archive.org;
    I have not rechecked all the links from my notes.)

    I have also written a small API to generate text with nested
    structures as part of the GPL Java library »ram.jar«.

    http://www.purl.org/stefan_ram/pub/ram-jar

    Here is an example showing how to use this to create XML.

    public class Main {
        public static void main(final java.lang.String[] args) {
            final de.dclj.ram.system.text.LineAppendable lineAppendable =
                new de.dclj.ram.system.text.LineAppender(System.out);
            lineAppendable.appendLine("+html", "<html>");
            lineAppendable.appendLine("+body", "<body>");
            lineAppendable.appendLine("+p", "<p>");
            lineAppendable.indent();
            {
                java.util.StringTokenizer tokenizer = new java.util.StringTokenizer(
                    de.dclj.ram.notation.xml.Text.sourceText(
                        "Falls jedoch \"i<0\" gilt, dann ist \"i&2\" negativ und der" +
                        " Leitwert des Materials ist dann nicht definiert. Um dies zu" +
                        " verhindern, kann eine sogenannte \"Halbfunktion\" in die" +
                        " Platte gebaut werden. Alle Schwingungen werden dann im" +
                        " wesentlichen umgeleitet.").toString());
                boolean next = false;
                while (tokenizer.hasMoreTokens()) {
                    if (next) {
                        lineAppendable.addWord(tokenizer.nextToken());
                    } else {
                        lineAppendable.appendWord(tokenizer.nextToken());
                        next = true;
                    }
                }
            }
            lineAppendable.appendLine();
            lineAppendable.appendLine("-p", "</p>");
            lineAppendable.appendLine("-body", "</body>");
            lineAppendable.appendLine("-html", "</html>");
            lineAppendable.finish();
        }
    }

    <html>
    <body>
    <p>
    Falls jedoch "i<0" gilt, dann ist "i&2" negativ
    und der Leitwert des Materials ist dann nicht definiert. Um dies zu
    verhindern, kann eine sogenannte "Halbfunktion" in die Platte
    gebaut werden. Alle Schwingungen werden dann im wesentlichen umgeleitet.
    </p>
    </body>
    </html>

    This API is not actually aware of XML or DTDs, but checks
    proper nesting (i.e., "+body" must be paired with "-body" at
    the same level) and handles indentation and source text
    paragraph breaking. It might also be used to generate texts of
    other formal languages such as Java source code.
    Stefan Ram, Jul 22, 2008
    #7
  8. 22 Jul 2008 16:31:54 GMT, /Stefan Ram/:

    > Another way to write XML with pre-1.6-Java:
    >
    > com.sun.org.apache.xml.internal.serialize.XMLSerializer serializer =
    > new com.sun.org.apache.xml.internal.serialize.XMLSerializer
    > ( fileOutputStream, outputFormat);
    > serializer.startDocument();
    > { serializer.startElement("", "example", "example",
    > new com.sun.org.apache.xml.internal.serializer.
    > AttributesImplSerializer() );
    > { final char[] data = "test text" . toCharArray();
    > serializer.characters( data, 0, data.length );
    > serializer.endElement( "", "example", "example" ); }
    > serializer.endDocument(); }}}


    Here's a more standard way of doing it, without requiring
    implementation-specific classes (works with Java 1.4):

    http://mail-archives.apache.org/mod...s/200801.mbox/<>
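
    The link above is truncated, so the following is not a reproduction of
    that message. It is only a sketch of one standard JAXP route that fits
    the description: a TransformerHandler from SAXTransformerFactory,
    pointed at a StreamResult, accepts the same SAX calls as the serializer
    quoted above. The class and method names are made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

class SaxOutputSketch {

    // Feeds SAX events to an identity transformer, which serializes
    // them to the given result as they arrive.
    static String render() throws Exception {
        StringWriter sw = new StringWriter();
        SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.setResult(new StreamResult(sw));

        handler.startDocument();
        handler.startElement("", "example", "example", new AttributesImpl());
        char[] data = "test text".toCharArray();
        handler.characters(data, 0, data.length);
        handler.endElement("", "example", "example");
        handler.endDocument();
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render());
    }
}
```

    A StreamResult wrapping a FileOutputStream would be the obvious choice
    for a large document; the StringWriter is only there to keep the demo
    self-contained.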

    --
    Stanimir
    Stanimir Stamenkov, Jul 23, 2008
    #8
  9. Stefan Ram wrote:
    > Naresh Agarwal <> writes:
    >> Is there any thing available for this with Java 1.5, any open
    >> source app?

    >
    > Writing XML is simpler than reading XML.
    >
    > So, sometimes, one can get by with no library at all:
    >
    > outStream.print( "<" );
    > outStream.print( type );
    > outStream.print( ">" );
    >


    Hi,

    IMHO, this is one of the worst practices:

    outStream.print( "<" );
    outStream.print( type );
    outStream.print( ">" );
    outStream.print( "if (a<b ) {}" );

    There are too many ways to break your XML output this way; rely on tools
    made for the purpose, which will produce well-formed XML.
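
    To make the failure concrete, here is a small sketch (class and method
    names are made up for illustration). Printing the text as-is puts a raw
    "<" into element content, which is not well-formed. The hand-rolled
    escapeText() below covers only the three characters that matter in
    element text; attribute values, illegal characters and encoding are
    exactly the cases where a real writer earns its keep.

```java
class EscapeSketch {

    // Minimal escaping for element text content only (not attributes).
    static String escapeText(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;");  break;
                case '>': sb.append("&gt;");  break;
                case '&': sb.append("&amp;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String type = "code";
        String text = "if (a<b ) {}";
        // Naive: the '<' in the text is emitted raw, so this is not XML.
        System.out.println("<" + type + ">" + text + "</" + type + ">");
        // Escaped: the text survives as character data.
        System.out.println("<" + type + ">" + escapeText(text) + "</" + type + ">");
    }
}
```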

    You can also use templates for creating SAX documents with RefleX:
    http://reflex.gforge.inria.fr/tips.html#createFromScratch
    Of course, you can loop over whatever you want to create content
    (merging thousands of XML files, for example:
    http://reflex.gforge.inria.fr/tips.html#parsingFragments )

    Then pipe the created document to an XSLT serializer:
    http://reflex.gforge.inria.fr/tips.html#xsltSerialization

    You can use it from the command line, embed it in a program, or deploy
    it within a web server.

    Tutorials are available here:
    http://reflex.gforge.inria.fr/tutorial.html

    I will talk about all of this at Balisage 2008 in Montreal, with a
    focus on schema languages:
    http://www.balisage.net/Program.html#h245p

    --
    Cordialement,

    ///
    (. .)
    --------ooO--(_)--Ooo--------
    | Philippe Poulard |
    -----------------------------
    http://reflex.gforge.inria.fr/
    Have the RefleX !
    Philippe Poulard, Jul 24, 2008
    #9