Help needed with rexml

Michael · Aug 28, 2005

I've been struggling to properly parse some XML with rexml. I will fully
admit my XML ignorance in advance. It would be easy enough to parse
this with a regular expression instead, but I would prefer to use the
right tool.

Here's a sample XML response:

<?xml version='1.0' encoding="iso-8859-1" ?>
<methodResponse>
<fault>
<value>
<struct>
<member>
<name>faultCode</name>
<value>
<int>5</int>
</value>
</member>
<member>
<name>faultString</name>
<value>
<string>system error (nologin)</string>
</value>
</member>
</struct>
</value>
</fault>
</methodResponse>

However, I can't get anything useful out of it. For instance, I've been
trying something like this:

require 'rexml/document'

file = File.new("test.xml")
xml = REXML::Document.new(file)
xml.elements.each { |i|
  i.texts.each { |t|
    puts "Class: #{t.class}"
    puts "Value: #{t.value}"
    puts "String: #{t.to_s}"
  }
}

This doesn't print anything useful for the class. Where am I going wrong
with this? I've been digging through the documentation but I must not be
getting it.

For what it's worth, I can parse this in perl easily enough (which
suggests to me the XML is valid):

use Data::Dumper;
use XML::Simple; # AKA "XML For Idiots"

my $ref = XMLin("test.xml"); # A file containing the XML above
print Dumper $ref, "\n";

I can then use the results to figure out how to dereference $ref to pull
the error information returned by the server.

Responses to the list or the newsgroup, please, for future googlers
to find.

Robert Klemme · Aug 28, 2005

2005/8/28 said:
I've been struggling to properly parse some XML with rexml. I will fully
admit my XML ignorance in advance. It would be easy enough to parse
this with a regular expression instead, but I would prefer to use the
right tool.

Here's a sample XML response:

<?xml version='1.0' encoding="iso-8859-1" ?>
<methodResponse>
<fault>
<value>
<struct>
<member>
<name>faultCode</name>
<value>
<int>5</int>
</value>
</member>
<member>
<name>faultString</name>
<value>
<string>system error (nologin)</string>
</value>
</member>
</struct>
</value>
</fault>
</methodResponse>

However, I can't get anything useful out of it. For instance, I've been
trying something like this:

require 'rexml/document'

file = File.new("test.xml")
xml = REXML::Document.new(file)
xml.elements.each { |i|
  i.texts.each { |t|
    puts "Class: #{t.class}"
    puts "Value: #{t.value}"
    puts "String: #{t.to_s}"
  }
}

I'd start with something like this (untested, from memory):

xml.elements.each do |elem|
  p elem.node_type
  p elem.text
end

This doesn't print anything useful for the class. Where am I going wrong
with this? I've been digging through the documentation but I must not be
getting it.

What exactly do you want to extract? You'll likely want some kind of
XPath expression with #each like in the tutorial.

Did you look at the tutorial?
http://www.germane-software.com/software/rexml/docs/tutorial.html

Kind regards

robert

zerohalo · Aug 28, 2005

Michael, I came across the same problem recently when using ruby/rexml
for the first time.

The reason you're not getting results is that each_element and
each_element_with_attribute only iterate through the element's
immediate children. They don't recurse through all the descendants. So
what you're probably getting is just the root element and none of the
children.

If you need to iterate through all the elements in the whole document,
then use XPath.each. For example, REXML::XPath.each(xml, '//member')
{ |x| whatever you want to do with them } should work. That's what I
finally had to do in my recent experience. I'm not sure what XPath
search you would use to go through ALL of the elements in the document,
but with some experimentation you'll probably find it. (And post what
you find!)
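
As a rough sketch of the difference described above: the XPath expression '//*' should match every element at any depth, whereas iterating over the document's elements only reaches the root. The XML is a trimmed copy of the sample response, inlined as a string (an assumption made so the snippet runs without test.xml), and the variable names are made up for illustration.

```ruby
require 'rexml/document'

# Trimmed copy of the sample response, inlined for illustration.
doc = REXML::Document.new(<<~XML)
  <methodResponse>
    <fault>
      <value>
        <struct>
          <member>
            <name>faultCode</name>
            <value><int>5</int></value>
          </member>
        </struct>
      </value>
    </fault>
  </methodResponse>
XML

# Iterating the document's elements only visits its direct children,
# which is just the root element.
top_level = []
doc.elements.each { |element| top_level << element.name }

# '//*' selects every element, however deeply nested.
all_names = []
REXML::XPath.each(doc, '//*') { |element| all_names << element.name }

puts "Top level: #{top_level.inspect}"
puts "All:       #{all_names.inspect}"
```

This may also explain the original output: the only element visited is methodResponse, and its text nodes are just the whitespace between the tags.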

There may be a better way to do this and I posted something about this
a couple of days ago, but received no response. It seems that
each_element and each_element_with_attribute should include an option
to recurse through all the descendants, but unfortunately they don't
seem to (or at least I couldn't find it).

Maybe that could be added to a new release of rexml (if the dev is
reading this?)

zerohalo · Aug 28, 2005

PS. There's a great XML plugin for JEdit that will show you the results
of an XPath search on your document. That way you can try out different
variations until you get the result set that you want without having to
run your script each time to test it.

zerohalo · Aug 28, 2005

Correction to my last post. It's in the XSLT plugin.

Trans · Aug 28, 2005

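Written off the top of my head, but you could write your own.

require 'rexml/document'

class REXML::Element
  # Yields every descendant element; an element's own descendants
  # are yielded before the element itself.
  def each_element_recurse(&block)
    each_element do |e|
      e.each_element_recurse(&block)
      yield e
    end
  end
end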

I made a first stab at it b/c I will probably need it myself soon.

(Yes, I know I'm reopening a standard class!)

T.

James Britt · Aug 28, 2005

Trans said:
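Written off the top of my head, but you could write your own.

require 'rexml/document'

class REXML::Element
  # Yields every descendant element; an element's own descendants
  # are yielded before the element itself.
  def each_element_recurse(&block)
    each_element do |e|
      e.each_element_recurse(&block)
      yield e
    end
  end
end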

I made a first stab at it b/c I will probably need it myself soon.

(Yes, I know I'm reopening a standard class! )

If you really think you need to visit every element you may be better
off using the stream or pull parsers.

James

--

http://www.ruby-doc.org - The Ruby Documentation Site
http://www.rubyxml.com - News, Articles, and Listings for Ruby & XML
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys

zerohalo · Aug 28, 2005

James, would you mind pointing to a link that explains how to do this?
I couldn't find reference to it in the rexml documentation. Tx!

Michael · Aug 29, 2005

Michael, I came across the same problem recently when using ruby/rexml
for the first time.

The reason you're not getting results is that each_element and
each_element_with_attribute only iterate through the element's
immediate children. They don't recurse through all the descendants. So
what you're probably getting is just the root element and none of the
children.

If you need to iterate through all the elements in the whole document,
then use XPath.each. For example, REXML::XPath.each(xml, '//member')
{ |x| whatever you want to do with them } should work. That's what I
finally had to do in my recent experience. I'm not sure what XPath
search you would use to go through ALL of the elements in the document,
but with some experimentation you'll probably find it. (And post what
you find!)

There may be a better way to do this and I posted something about this
a couple of days ago, but received no response. It seems that
each_element and each_element_with_attribute should include an option
to recurse through all the descendants, but unfortunately they don't
seem to (or at least I couldn't find it).

Thanks! This is what I'm looking for. I read through the tutorial and
the rdoc documentation but I just couldn't figure out what I was missing.

When I mentioned iterating through the document, what I'm really doing
is describing the process I've been using for deciding how to handle
some arbitrary XML document I wound up with. It's not ideal, I admit. It
might do me some good to read up on XML.
--Michael

James Britt · Aug 29, 2005

zerohalo said:
James, would you mind pointing to a link that explains how to do this?
I couldn't find reference to it in the rexml documentation. Tx!

I don't think there is much written about the pull parser (though I
should see an article of mine on the topic published in a mainstream
geek mag in the next few months. I hope.)

Back in 2001 I wrote an article on using the REXML stream parser that
may be relevant, and possibly accurate:

http://www.rubyxml.com/articles/REXML/Stream_Parsing_with_REXML
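
For readers who want a feel for the stream parser before following the link, here is a minimal sketch. It assumes a small inline document and a listener class named BazTextCollector, which is made up for this example. REXML::StreamListener and REXML::Document.parse_stream are part of REXML; everything else is illustrative.

```ruby
require 'rexml/document'
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

# Hypothetical listener (the class name is invented for this sketch).
# StreamListener provides empty callbacks; override the ones you need.
class BazTextCollector
  include REXML::StreamListener

  attr_reader :texts

  def initialize
    @texts = []
    @in_baz = false
  end

  # Called for every opening tag.
  def tag_start(name, _attributes)
    @in_baz = true if name == 'baz'
  end

  # Called for every closing tag.
  def tag_end(name)
    @in_baz = false if name == 'baz'
  end

  # Called for every run of character data, including whitespace.
  def text(content)
    @texts << content if @in_baz
  end
end

xml_source = <<~XML
  <foo>
    <baz>This is baz</baz>
    <bar>Ignore me!</bar>
    <baz>This is baz, also</baz>
  </foo>
XML

listener = BazTextCollector.new
REXML::Document.parse_stream(xml_source, listener)
puts listener.texts
```

The parser never builds a tree here; the listener keeps only the state it needs, which is the main attraction when every element has to be visited.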

The pull parser sits below all the other REXML parsers, and has a
sparser API, but is quite handy for many things.

The basic idea is to pull events from the parser, see what you've got
(start_element? end_element? text?), and act on it. You can also push
things back onto the parse stream, too, as well as peek down the stream
to see what's ahead (while not disrupting the current stream order).

# Simple example:
require 'rexml/parsers/pullparser'

# foo.xml has
# <foo>
#   <baz>This is baz</baz>
#   <bar>Ignore me!</bar>
#   <baz>This is baz, also</baz>
# </foo>

File.open( 'foo.xml', 'r' ) do |f|
  parser = REXML::Parsers::PullParser.new( f )
  while parser.has_next?
    pull_event = parser.pull
    puts( "Element: " + pull_event[0] ) if pull_event.start_element?
    if pull_event.start_element? and pull_event[0] == 'baz'
      while !(pull_event = parser.pull).end_element?
        puts pull_event[0] if pull_event.text?
      end
    end
  end
end

Or something like that.
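
The peeking mentioned earlier can be sketched along the same lines. This assumes a small inline document with no whitespace between tags, and the variable names are made up. PullParser#peek returns the next event without consuming it, so the following pull still sees it.

```ruby
require 'rexml/parsers/pullparser'

# Inline document for illustration; no whitespace between the tags,
# so the event right after an opening tag is either text or another tag.
source = '<foo><baz>This is baz</baz><bar>Ignore me!</bar></foo>'
parser = REXML::Parsers::PullParser.new(source)

seen = []
while parser.has_next?
  event = parser.pull
  next unless event.start_element?

  # Look one event ahead without taking it off the stream.
  upcoming = parser.peek
  seen << [event[0], upcoming[0]] if upcoming.text?
end

seen.each { |name, text| puts "#{name}: #{text}" }
```

Because peek leaves the event in place, the loop needs no special handling for the text nodes it has already looked at; they are pulled and skipped on the next pass.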

See also

http://www.ruby-doc.org/stdlib/libdoc/rexml/rdoc/classes/REXML/Parsers/PullParser.html

James


Guest · Aug 29, 2005

[...]

What exactly do you want to extract? You'll likely want some kind of
XPath expression with #each like in the tutorial.
Did you look at the tutorial?
http://www.germane-software.com/software/rexml/docs/tutorial.html

Thanks for the reply. I did read the tutorial--I just wasn't making the
necessary connections.
--Michael

Zach Dennis · Aug 29, 2005

What type of information do you want to get out of this? You never
posted what you *thought* your sample ruby code would give you. I ran
your perl example, and it looks like the xml document but where < > are
gsub'd for { }.

Here is an example which shows some XPath usage:

require 'rexml/document'

file = File.new("test.xml")
root = REXML::Document.new(file).root
fault_arr = root.elements.each( "fault" ) do |e1|
  e1.elements.each( "value/struct/member" ) do |e2|
    e2.elements.each( '*' ) { |e3| print e3.text.strip }
    e2.elements.each( '*/*' ){ |e3| puts " " + e3.text.strip }
  end
end

puts "Message response faulted!" if fault_arr.length > 0
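
If the goal is to use the fault values rather than print them, the same paths can feed a Hash. The following is only a sketch: parse_fault is a hypothetical helper (not part of REXML), the XML is inlined instead of read from test.xml, and converting int members to Integer is an assumption about what the caller wants.

```ruby
require 'rexml/document'

# Hypothetical helper, not part of REXML: returns a Hash of the fault
# members, or nil when the response has no <fault> element.
def parse_fault(xml_source)
  doc = REXML::Document.new(xml_source)
  fault = doc.elements['methodResponse/fault']
  return nil unless fault

  result = {}
  fault.elements.each('value/struct/member') do |member|
    name = member.elements['name'].text
    typed = member.elements['value/*'] # e.g. <int> or <string>
    next unless typed

    text = typed.text.to_s
    # Assumption: callers want <int> members as Integer values.
    result[name] = (typed.name == 'int') ? text.to_i : text
  end
  result
end

# Sample response from the original post, inlined for illustration.
sample = <<~XML
  <methodResponse>
    <fault>
      <value>
        <struct>
          <member>
            <name>faultCode</name>
            <value><int>5</int></value>
          </member>
          <member>
            <name>faultString</name>
            <value><string>system error (nologin)</string></value>
          </member>
        </struct>
      </value>
    </fault>
  </methodResponse>
XML

fault = parse_fault(sample)
puts "Message response faulted: #{fault.inspect}" if fault
```

Returning nil for a response without a fault keeps the caller's check as simple as the length test above.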

Zach

zerohalo · Aug 29, 2005

Thanks, James, I'll study that.

By the way, I've tried before to access rubyxml.com (which seems to be
your site?) which I had found when googling for rexml, and there
doesn't seem to be any way to get to past articles. Maybe there's a
sidebar or something but it doesn't show up in Firefox or Opera on
Linux (I can't try IE as I don't have it). Or am I missing something?

David Jacobs · Aug 30, 2005

2005/8/29 said:
Thanks, James, I'll study that.

By the way, I've tried before to access rubyxml.com (which seems to be
your site?) which I had found when googling for rexml, and there
doesn't seem to be any way to get to past articles. Maybe there's a
sidebar or something but it doesn't show up in Firefox or Opera on
Linux (I can't try IE as I don't have it). Or am I missing something?

Google on REXML and you get some good results!
Start here:
http://raa.ruby-lang.org/project/rexml
and you'll get to
http://www.germane-software.com/software/rexml/
and
http://www.germane-software.com/software/rexml/docs/tutorial.html

Have fun!
Cheers,
David

James Britt · Aug 31, 2005

zerohalo said:
Thanks, James, I'll study that.

By the way, I've tried before to access rubyxml.com (which seems to be
your site?) which I had found when googling for rexml, and there
doesn't seem to be any way to get to past articles. Maybe there's a
sidebar or something but it doesn't show up in Firefox or Opera on
Linux (I can't try IE as I don't have it). Or am I missing something?

No, the site is missing some obvious UI clues for friendlier usage.

You can see past items by tweaking the URL:

http://rubyxml.com/index.rb/2004/12
Shows items from December of 2004

http://rubyxml.com/index.rb/2005
Shows items from 2005.

http://rubyxml.com/index.rb/Articles
Shows items in the Articles category

http://rubyxml.com/index.rb/Applications
Shows items in the Applications category

More or less.

James

