REXML help - Insert newlines into large XML file

Sean Nakasone

Hello, I have a large XML file that does not have any newlines in it. Can
someone please provide some code that uses REXML to simply read in an XML file,
insert newlines after the XML sections or elements, then spit it out to
stdout? This way I'd at least be able to open the XML file in an editor
so I can read what kind of format it has. I don't know anything about
REXML, so it needs to be a somewhat complete script. Thank you.
 
Mikel Lindsaar

You could use HTML Tidy for this, I think.
Mikel
 
Phrogz

irb(main):001:0> require 'rexml/document'

irb(main):002:0> doc = REXML::Document.new( "<root><child><grandkid/></child></root>" )

irb(main):003:0> doc.write $stdout

irb(main):004:0> doc.write $stdout, 0
<root>
<child>
<grandkid/>
</child>
</root>


You can use IO.read("somefile.xml") to read the contents into a single
string.
You can pass a file to the REXML::Document#write method instead of
$stdout, e.g.

File.open( "with_newlines.xml", "w" ){ |file|
  doc.write( file, 0 )
}
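
Since the original question asked for a somewhat complete script, here is a minimal sketch that combines the pieces above: IO.read to load the file, REXML::Document.new to parse it, and write with an indent of 0 to add the newlines. The method name xml_with_newlines and the script name add_newlines.rb are made up for illustration. Note that this loads the whole document into memory, which may be slow for a very large file.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of REXML): parse an XML string and
# return it with one element per line. An indent of 0 asks REXML for
# newlines without leading spaces, like doc.write( $stdout, 0 ) above.
def xml_with_newlines(xml, indent = 0)
  doc = REXML::Document.new(xml)
  out = String.new
  doc.write(out, indent)
  out
end

# Usage (add_newlines.rb is a made-up script name):
#   ruby add_newlines.rb somefile.xml > with_newlines.xml
if (path = ARGV.first)
  puts xml_with_newlines(IO.read(path))
end
```

Passing a larger indent (for example 2) should give nested indentation instead of flat lines.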
 
Phrogz

For more information on REXML, see the official tutorial. It covers
this question directly and plainly, as well as a whole host of others.

http://www.germane-software.com/software/rexml/docs/tutorial.html
 
Michael Fellinger

Step 1) Install tidy
Step 2) tidy -xml -i yourfile.xml
Step 3) tidy --help

REXML is very bad for handling such things.

^ manveru
 
Phrogz

Step 1) Install tidy
Step 2) tidy -i yourfile.xml
Step 3) tidy --help

REXML is very bad for handling such things.

For the record, would you care to clarify and justify that statement?
 
Jari Williamsson

Michael said:
Step 1) Install tidy
Step 2) tidy -i yourfile.xml
Step 3) tidy --help

REXML is very bad for handling such things.

Not at all. If you're using an older version of the standard library,
you prettify the XML using doc.write(output, 0), as in Phrogz's example.

For newer versions of REXML, use the REXML::Formatters classes instead,
which give you much more control over the prettifier.


Best regards,

Jari Williamsson
 
Michael Fellinger

For the record, would you care to clarify and justify that statement?

Sure. I tried for quite some time to get REXML to a point where it
really pretty-prints any document, but apart from implementing a whole
stream listener that keeps track of indentation and width, there doesn't
seem to be any way. The new REXML works a bit better but inserts lots of
whitespace in the wrong places.
Unfortunately, tidy has a memory leak, so I cannot recommend the
bindings if your process runs for a long time. Of course,
you could start it in another process, but then the CLI tool is good
enough already.
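
For readers wondering what the stream-listener approach mentioned above might look like, here is a rough sketch. The class IndentingListener and the method stream_pretty are hypothetical names, not part of REXML; only REXML::StreamListener and REXML::Document.parse_stream come from the library. It tracks indentation only (not line width), writes empty elements as a start/end pair, strips whitespace around text, and silently drops comments, CDATA markers, processing instructions and the XML declaration, so treat it as a way to inspect a file rather than a faithful rewrite.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener (not part of REXML): writes each start tag,
# end tag and non-blank text run on its own indented line.
class IndentingListener
  include REXML::StreamListener

  ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

  def initialize(out, step = 2)
    @out = out      # anything that responds to <<
    @step = step    # spaces per nesting level
    @depth = 0
  end

  def tag_start(name, attrs)
    attr_text = attrs.map { |key, value| %( #{key}="#{escape(value)}") }.join
    line("<#{name}#{attr_text}>")
    @depth += 1
  end

  def tag_end(name)
    @depth -= 1
    line("</#{name}>")
  end

  def text(content)
    stripped = content.strip
    line(escape(stripped)) unless stripped.empty?
  end

  private

  # The stream parser hands over unescaped values, so escape them again.
  def escape(string)
    string.gsub(/[&<>"]/, ESCAPES)
  end

  def line(string)
    @out << (' ' * (@step * @depth)) << string << "\n"
  end
end

# Hypothetical convenience wrapper around REXML::Document.parse_stream.
def stream_pretty(source, out = String.new)
  REXML::Document.parse_stream(source, IndentingListener.new(out))
  out
end
```

Because parse_stream does not build a document tree, this should use less memory than REXML::Document.new on a large file; passing an open File as the source and $stdout as out avoids holding the whole text in a string.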

REXML::VERSION
# "3.1.6"

http://pastie.caboo.se/126905

^ manveru
 
Jari Williamsson

Michael said:
Sure, i've tried for quite some time to get REXML to a point where it
really pretty-prints any document, but apart from implementing a whole
streamlistener that keeps track of indentation and width there doesn't
seem to be any. The new REXML works a bit better but inserts lots of
whitespace at the wrong places.
[]

REXML::VERSION
# "3.1.6"

Don't understand what you mean by "new REXML", since you seem to be using
an old one. I'm on 3.1.7.1, and here you can do, for example:

formatter = REXML::Formatters::Pretty.new( 3 )
formatter.compact = true
formatter.write( doc, $stdout )
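
To make that snippet self-contained, here is a small sketch that wraps it in a method and applies it to a sample document. The method name pretty_xml and the sample XML are made up for illustration; REXML::Formatters::Pretty, its compact accessor and its write method are the parts that come from REXML. This assumes a REXML version that ships the formatters (3.1.7 or later, per the discussion above).

```ruby
require 'rexml/document'
require 'rexml/formatters/pretty'

# Hypothetical helper (not part of REXML): pretty-print an XML string
# with the Pretty formatter and return the result as a string.
def pretty_xml(xml, indent = 3, compact = true)
  doc = REXML::Document.new(xml)
  formatter = REXML::Formatters::Pretty.new(indent)
  # With compact set, an element that contains only short text is kept
  # on one line instead of having its text moved to a separate line.
  formatter.compact = compact
  out = String.new
  formatter.write(doc, out)
  out
end

# Made-up sample document for illustration.
sample = "<people><person><name>Sean</name></person></people>"
puts pretty_xml(sample)
```

Comparing pretty_xml(sample) with pretty_xml(sample, 3, false) shows what compact changes, which may help with the complaint about whitespace ending up in unwanted places.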


Best regards,

Jari Williamsson
 
Michael Fellinger

Michael said:
Sure, i've tried for quite some time to get REXML to a point where it
really pretty-prints any document, but apart from implementing a whole
streamlistener that keeps track of indentation and width there doesn't
seem to be any. The new REXML works a bit better but inserts lots of
whitespace at the wrong places.
[]

REXML::VERSION
# "3.1.6"

Don't understand what you mean by "new REXML", since you seem to be using
an old one. I'm on 3.1.7.1, and here you do, for example:

By new I mean 3.1.7, which has formatters. But the one that ships
with Ruby is still 3.1.6. If I have to require a dependency, then I can just
use tidy instead, no?
 