SAX Parser for creating (not parsing) XML document

Naresh Agarwal

Hi

I have been using a DOM parser to create XML documents. I want to use an
alternative mechanism to create XML documents, as my XML documents are
large and I don't want the overhead of DOM (where the entire tree is
constructed in memory).

Just as SAX provides APIs for parsing XML documents, is there anything like
a SAX parser/API for creating XML documents?

Is there any other standard mechanism for creating XML documents
without requiring the entire tree structure to be constructed in
memory?

Thanks,
Naresh
 
Martin Honnen

Stefan Ram

Martin Honnen said:
Java 6 has XMLStreamWriter

Often, people use »tag« for »element«.

Here, it seems to be the other way round:

xtw.writeStartElement("http://www.w3.org/TR/REC-html40", "a");
xtw.writeAttribute("href", "http://www.java2s.com");
xtw.writeCharacters("here");
xtw.writeEndElement();

. It seems as if »writeEndElement« is supposed
to write the end /tag/, i.e., "</a>".

So, more correct names to me would be:

»writeEndTag()«,
»writeEndOfElement()«, or
»writeElementEnding()«.
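For reference, here is a minimal self-contained version of the snippet above, assuming a Java 6+ JDK with its built-in StAX implementation. To keep it short it drops the namespace URI and uses the one-argument writeStartElement; the class and method names are illustrative. The comments mark which call produces which piece of markup; writeEndElement is the call that emits the end tag "</a>".

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxLinkExample {
    /** Streams a single <a> element into a String; no tree is built in memory. */
    static String writeLink() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xtw = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xtw.writeStartElement("a");                           // emits "<a" (start tag left open)
        xtw.writeAttribute("href", "http://www.java2s.com");  // emits ' href="..."'
        xtw.writeCharacters("here");                          // closes the start tag, then the text
        xtw.writeEndElement();                                // emits "</a>"
        xtw.flush();                                          // push buffered output into the StringWriter
        xtw.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeLink());
    }
}
```

Running main prints <a href="http://www.java2s.com">here</a>.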
 
Tony Lavinio

Stefan said:
Often, people use »tag« for »element«.

Here, it seems to be the other way round:

xtw.writeStartElement("http://www.w3.org/TR/REC-html40", "a");
xtw.writeAttribute("href", "http://www.java2s.com");
xtw.writeCharacters("here");
xtw.writeEndElement();

. It seems as if »writeEndElement« is supposed
to write the end /tag/, i.e., "</a>".

So, more correct names to me would be:

»writeEndTag()«,
»writeEndOfElement()«, or
»writeElementEnding()«.

You're not writing tags with this, you're writing /events/.

So writeStartElement means you are sending the StartElement
/event/ into the stream. writeEndElement means you are
sending the EndElement /event/ into the stream.

When using XMLStreamWriter, you must remember that you are
acting as the parser.
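The event view is explicit in StAX's other writer interface, XMLEventWriter, where StartElement and EndElement are literally objects created by XMLEventFactory and handed to the stream one by one. A minimal sketch, assuming Java 6+; the class and method names are illustrative.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

public class StaxEventExample {
    /** Feeds StartElement, Attribute, Characters and EndElement events into a writer. */
    static String writeLinkAsEvents() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLEventFactory events = XMLEventFactory.newInstance();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        // Each add() sends one event into the stream, as a parser would report it.
        writer.add(events.createStartElement("", "", "a"));  // empty prefix, no namespace
        writer.add(events.createAttribute("href", "http://www.java2s.com"));
        writer.add(events.createCharacters("here"));
        writer.add(events.createEndElement("", "", "a"));
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeLinkAsEvents());
    }
}
```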
 
Stefan Ram

Naresh Agarwal said:
Is there anything available for this with Java 1.5, any open
source app?

Writing XML is simpler than reading XML.

So, sometimes, one can get by with no library at all:

outStream.print( "<" );
outStream.print( type );
outStream.print( ">" );
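If you do go without a library, text content at least needs escaping. A minimal sketch, assuming a PrintStream named outStream as in the snippet above; the helpers escape and printElement are hypothetical, and the element name is assumed to already be a valid XML name (it is not checked).

```java
import java.io.PrintStream;

public class PlainXmlOutput {
    /** Escapes characters that are unsafe in text content and double-quoted attribute values. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Prints <type>text</type>; only the text is escaped, the name is trusted. */
    static void printElement(PrintStream outStream, String type, String text) {
        outStream.print("<");
        outStream.print(type);
        outStream.print(">");
        outStream.print(escape(text));
        outStream.print("</");
        outStream.print(type);
        outStream.print(">");
    }

    public static void main(String[] args) {
        printElement(System.out, "code", "a<b & c");
        System.out.println();
    }
}
```

Running main prints <code>a&lt;b &amp; c</code>.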

See also

XML IN PRACTICE - APIs Considered Harmful
http://www.itworld.com/nl/xml_prac/04182002/
http://web.archive.org/web/*/http://open.itworld.com/nl/xml_prac/04182002/pf_index.html

This approach was also endorsed by Sun Microsystems, Inc.:

http://web.archive.org/web/*/http:/.../jaxp/dist/1.1/docs/tutorial/sax/2a_echo.html

See also:

http://www.cafeconleche.org/slides/sd2002west/xmlandjava/52.html

Another way to write XML with pre-1.6-Java:

public class Main
{ public static void main( final java.lang.String[] args )
  throws java.lang.Throwable
  { final java.io.FileOutputStream fileOutputStream =
      new java.io.FileOutputStream( "tmp.txt" );
    // The com.sun.org.apache...internal classes below are JDK-internal,
    // not a public API, so they may be unavailable in other JREs
    // or in later JDK versions.
    com.sun.org.apache.xml.internal.serialize.OutputFormat outputFormat =
      new com.sun.org.apache.xml.internal.serialize.OutputFormat
      ( "XML", "UTF-8", true );
    outputFormat.setIndent( 1 );
    outputFormat.setIndenting( true );
    // outputFormat.setDoctype( null, "users.dtd" );
    com.sun.org.apache.xml.internal.serialize.XMLSerializer serializer =
      new com.sun.org.apache.xml.internal.serialize.XMLSerializer
      ( fileOutputStream, outputFormat );
    serializer.startDocument();
    { serializer.startElement( "", "example", "example",
        new com.sun.org.apache.xml.internal.serializer.
          AttributesImplSerializer() );
      { final char[] data = "test text" . toCharArray();
        serializer.characters( data, 0, data.length );
        serializer.endElement( "", "example", "example" ); }
      serializer.endDocument();
      fileOutputStream.close(); /* release the file handle */ }}}

Giving:

<?xml version="1.0" encoding="UTF-8"?>
<example>test text</example>

Other pages about writing XML with pre-1.6-Java:

http://www.javazoom.net/services/newsletter/xmlgeneration.html
http://dev.w3.org/cvsweb/java/class.../serialization/xml/XMLWriter.java?rev=1.1.2.4
http://itext.sourceforge.net/src/com/lowagie/text/xml/XmlWriter.java
http://xmlenc.sourceforge.net/j2h/0.39/org/znerd/xmlenc/XMLOutputter.java.html
http://gridportal.fzk.de/cgi-bin/vi...src/util/ucy/old/Attic/XMLWriter.java?rev=1.2
http://www.cin.ufpe.br/~avcc/Projeto-AI/xml4j-2.0/src/com/ibm/xml/parser/FormatPrintVisitor.java
http://www.cin.ufpe.br/~avcc/Projeto-AI/xml4j-2.0/src/com/ibm/xml/parser/util/HTMLPrintVisitor.java
http://ptolemy.berkeley.edu/ptolemyii/ptIIlatest/ptII/diva/gui/MultipageWriter.java
http://search.cpan.org/src/PERRAD/CORBA-JAVA-2.48/XMLOutputStream.java

(If a page has disappeared from the web, try web.archive.org;
I have not rechecked all the links from my notes.)

I have also written a small API to generate text with nested
structures as part of the GPL Java library »ram.jar«.

http://www.purl.org/stefan_ram/pub/ram-jar

Here is an example showing how to use this to create XML.

public class Main
{ public static void main( final java.lang.String[] args )
  { final de.dclj.ram.system.text.LineAppendable lineAppendable = new
      de.dclj.ram.system.text.LineAppender( System.out );
    lineAppendable.appendLine( "+html", "<html>" );
    lineAppendable.appendLine( "+body", "<body>" );
    lineAppendable.appendLine( "+p", "<p>" );
    lineAppendable.indent();
    { java.util.StringTokenizer tokenizer = new java.util.StringTokenizer
      ( de.dclj.ram.notation.xml.Text.sourceText
        ( "Falls jedoch \"i<0\" gilt, dann ist \"i&2\" negativ und der" +
          " Leitwert des Materials ist dann nicht definiert. Um dies zu" +
          " verhindern, kann eine sogenannte \"Halbfunktion\" in die" +
          " Platte gebaut werden. Alle Schwingungen werden dann im" +
          " wesentlichen umgeleitet." ).toString());
      boolean next = false;
      while( tokenizer.hasMoreTokens() )
      { if( next )
        { lineAppendable.addWord( tokenizer.nextToken() ); }
        else
        { lineAppendable.appendWord( tokenizer.nextToken() ); next = true; }}}
    lineAppendable.appendLine();
    lineAppendable.appendLine( "-p", "</p>" );
    lineAppendable.appendLine( "-body", "</body>" );
    lineAppendable.appendLine( "-html", "</html>" );
    lineAppendable.finish(); }}

Giving:

<html>
<body>
<p>
Falls jedoch "i<0" gilt, dann ist "i&2" negativ
und der Leitwert des Materials ist dann nicht definiert. Um dies zu
verhindern, kann eine sogenannte "Halbfunktion" in die Platte
gebaut werden. Alle Schwingungen werden dann im wesentlichen umgeleitet.
</p>
</body>
</html>

This API is not actually aware of XML or DTDs, but it checks
proper nesting (i.e., "+body" must be paired with "-body" at
the same level) and handles indentation and the line-breaking of
paragraphs in the generated text. It might also be used to generate
texts in other formal languages, such as Java source code.
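The nesting check described above can be illustrated with a small stack. This is a hypothetical sketch, not the ram.jar API (whose internals are not shown here): NestingChecker, line and finish are invented names, and the two-space indentation per level is an arbitrary choice.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch (NOT the ram.jar API): pairs "+name"/"-name" markers and indents by depth. */
public class NestingChecker {
    private final StringBuilder out = new StringBuilder();
    private final Deque<String> open = new ArrayDeque<>(); // innermost open level at the front

    /** "+x" opens level x; "-x" must close the innermost open level, which must be x. */
    public NestingChecker line(String marker, String text) {
        if (marker.startsWith("-")) {
            String name = marker.substring(1);
            String innermost = open.peekFirst();
            if (!name.equals(innermost)) {
                throw new IllegalStateException(
                    marker + " does not match innermost open level " + innermost);
            }
            open.removeFirst();
        }
        for (int i = 0; i < open.size(); i++) {
            out.append("  ");                     // two spaces per enclosing level
        }
        out.append(text).append('\n');
        if (marker.startsWith("+")) {
            open.push(marker.substring(1));       // later lines are nested inside this level
        }
        return this;
    }

    /** Returns the accumulated text, refusing if any level is still open. */
    public String finish() {
        if (!open.isEmpty()) {
            throw new IllegalStateException("unclosed levels: " + open);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(new NestingChecker()
            .line("+html", "<html>")
            .line("+body", "<body>")
            .line("-body", "</body>")
            .line("-html", "</html>")
            .finish());
    }
}
```

A mismatched pair such as "+p" followed by "-body" throws an IllegalStateException instead of producing badly nested output.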
 
Stanimir Stamenkov

22 Jul 2008 16:31:54 GMT, /Stefan Ram/:
Another way to write XML with pre-1.6-Java:

com.sun.org.apache.xml.internal.serialize.XMLSerializer serializer =
new com.sun.org.apache.xml.internal.serialize.XMLSerializer
( fileOutputStream, outputFormat);
serializer.startDocument();
{ serializer.startElement("", "example", "example",
new com.sun.org.apache.xml.internal.serializer.
AttributesImplSerializer() );
{ final char[] data = "test text" . toCharArray();
serializer.characters( data, 0, data.length );
serializer.endElement( "", "example", "example" ); }
serializer.endDocument(); }}}

Here's a more standard way of doing it, without requiring
implementation-specific classes (works with Java 1.4):

http://mail-archives.apache.org/mod...s/200801.mbox/<[email protected]>
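The linked message is not reproduced here. One standard JAXP technique that fits the description (available since JAXP 1.1, which ships with Java 1.4) is to push SAX events into the identity TransformerHandler of a SAXTransformerFactory, which serializes them to a StreamResult without building a tree. A sketch mirroring the XMLSerializer example above, assuming the platform's default TransformerFactory is a SAXTransformerFactory (the JDK's is); the class name is illustrative.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

public class SaxToStream {
    /** Sends SAX events to an identity transformer that writes them out as text. */
    static String writeExample() throws Exception {
        StringWriter out = new StringWriter();
        SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler(); // identity transform
        handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        handler.setResult(new StreamResult(out));  // must be set before startDocument()

        handler.startDocument();
        handler.startElement("", "example", "example", new AttributesImpl());
        char[] data = "test text".toCharArray();
        handler.characters(data, 0, data.length);
        handler.endElement("", "example", "example");
        handler.endDocument();                     // flushes the serialized output
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeExample());
    }
}
```

Since only public javax.xml.transform and org.xml.sax types are used, the same code should run on any conforming JAXP implementation; the exact XML declaration it writes may differ slightly between implementations.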
 
Philippe Poulard

Stefan Ram wrote:
Writing XML is simpler than reading XML.

So, sometimes, one can get by with no library at all:

outStream.print( "<" );
outStream.print( type );
outStream.print( ">" );

hi,

IMHO, this is one of the worst practices:

outStream.print( "<" );
outStream.print( type );
outStream.print( ">" );
outStream.print( "if (a<b ) {}" );

you'll have too many chances to break your XML output; rely on tools
made for that purpose, which will produce well-formed XML
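With a writer API the escaping is handled for you. A minimal sketch, assuming Java 6+ StAX; the class and method names are illustrative. The '<' in the text is written as &lt;, so the result stays well-formed.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class EscapingExample {
    /** Writes <type>text</type>, letting the StAX writer escape the text content. */
    static String writeCode(String type, String text) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement(type);
        w.writeCharacters(text);   // markup characters such as '<' and '&' are escaped here
        w.writeEndElement();
        w.flush();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeCode("code", "if (a<b ) {}"));
    }
}
```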

you can also use templates for creating SAX documents with RefleX:
http://reflex.gforge.inria.fr/tips.html#createFromScratch
of course, you can loop over whatever you want to create content (for
example, to merge thousands of XML files:
http://reflex.gforge.inria.fr/tips.html#parsingFragments )

then pipe the created document to an XSLT serializer:
http://reflex.gforge.inria.fr/tips.html#xsltSerialization

you can use it from the command line, embed it in a program, or deploy
it within a web server

tutorials are available here:
http://reflex.gforge.inria.fr/tutorial.html

I will talk about all that stuff at Balisage 2008 in Montreal, with a
focus on schema languages
http://www.balisage.net/Program.html#h245p

--
Regards,

///
(. .)
--------ooO--(_)--Ooo--------
| Philippe Poulard |
-----------------------------
http://reflex.gforge.inria.fr/
Have the RefleX !
 
