Help needed with rexml

Michael

I've been struggling to properly parse some XML with rexml. I will fully
admit my XML ignorance in advance. It would be easy enough to parse
this with a regular expression instead, but I would prefer to use the
right tool.

Here's a sample XML response:

<?xml version='1.0' encoding="iso-8859-1" ?>
<methodResponse>
<fault>
<value>
<struct>
<member>
<name>faultCode</name>
<value>
<int>5</int>
</value>
</member>
<member>
<name>faultString</name>
<value>
<string>system error (nologin)</string>
</value>
</member>
</struct>
</value>
</fault>
</methodResponse>

However, I can't get anything useful out of it. For instance, I've been
trying something like this:

require 'rexml/document'

file = File.new("test.xml")
xml = REXML::Document.new(file)
xml.elements.each { |i|
  i.texts.each { |t|
    puts "Class: #{t.class}"
    puts "Value: #{t.value}"
    puts "String: #{t.to_s}"
  }
}

This doesn't print anything useful for the class. Where am I going wrong
with this? I've been digging through the documentation but I must not
be getting it.

For what it's worth, I can parse this in perl easily enough (which
suggests to me the XML is valid):

use Data::Dumper;
use XML::Simple; # AKA "XML For Idiots"

my $ref = XMLin("test.xml"); # A file containing the XML above
print Dumper $ref, "\n";

I can then use the results to figure out how to dereference $ref to pull
the error information returned by the server.

Responses to the list or the newsgroup, please, for future googlers
to find.
 
Robert Klemme

2005/8/28 said:
I've been struggling to properly parse some XML with rexml. I will fully
admit my XML ignorance in advance. It would be easy enough to parse
this with a regular expression instead, but I would prefer to use the
right tool.

Here's a sample XML response:

<?xml version='1.0' encoding="iso-8859-1" ?>
<methodResponse>
<fault>
<value>
<struct>
<member>
<name>faultCode</name>
<value>
<int>5</int>
</value>
</member>
<member>
<name>faultString</name>
<value>
<string>system error (nologin)</string>
</value>
</member>
</struct>
</value>
</fault>
</methodResponse>

However, I can't get anything useful out of it. For instance, I've been
trying something like this:

require 'rexml/document'

file = File.new("test.xml")
xml = REXML::Document.new(file)
xml.elements.each { |i|
  i.texts.each { |t|
    puts "Class: #{t.class}"
    puts "Value: #{t.value}"
    puts "String: #{t.to_s}"
  }
}

I'd start with something like this (untested, from memory):

xml.elements.each do |elem|
  p elem.node_type
  p elem.text
end

This doesn't print anything useful for the class. Where am I going wrong
with this? I've been digging through the documentation but I must not
be getting it.

What exactly do you want to extract? You'll likely want some kind of
XPath expression with #each like in the tutorial.

Did you look at the tutorial?
http://www.germane-software.com/software/rexml/docs/tutorial.html
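
For the fault response in the original post, a minimal sketch of the XPath approach might look like this. The XML is inlined as a string so the snippet stands alone, the `fault` hash is just an illustrative name, and the code assumes every member has a name and exactly one typed child under its value.

```ruby
require 'rexml/document'

# The fault response from the original post, inlined so the snippet is
# self-contained (the XML declaration is left out for brevity).
xml_source = <<~XML
  <methodResponse>
    <fault>
      <value>
        <struct>
          <member>
            <name>faultCode</name>
            <value><int>5</int></value>
          </member>
          <member>
            <name>faultString</name>
            <value><string>system error (nologin)</string></value>
          </member>
        </struct>
      </value>
    </fault>
  </methodResponse>
XML

doc = REXML::Document.new(xml_source)

# Collect each <member> as name => value; `fault` is an illustrative name.
fault = {}
REXML::XPath.each(doc, '/methodResponse/fault/value/struct/member') do |member|
  name  = member.elements['name'].text
  # 'value/*' selects the typed child (<int>, <string>, ...) under <value>.
  value = member.elements['value/*'].text
  fault[name] = value
end

puts "Fault #{fault['faultCode']}: #{fault['faultString']}"
```

All values come back as strings, so the fault code would still need a to_i if it is to be compared numerically.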

Kind regards

robert
 
zerohalo

Michael, I came across the same problem recently when using ruby/rexml
for the first time.

The reason you're not getting results is that the each_element and
each_element_with_attribute commands only iterate through the element's
immediate children. They don't recurse through all the descendants. So
what you're probably getting is just the root element and none of the
children.

If you need to iterate through all the elements in the whole document,
then use the XPath.each command. For example,
REXML::XPath.each(xml, '//member') { |x| whatever you want to do with them }
should work. That's what I finally had to do in my recent experience.
I'm not sure what XPath search you would use to go through ALL of the
elements in the document, but with some experimentation you'll probably
find it. (And post what you find!)
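
One XPath that should match every element in a document is `//*`. The sketch below is only an illustration: the small sample document and the `names` array are made up for the example, and the order in which REXML yields the matches is not relied on.

```ruby
require 'rexml/document'

# A made-up document, just to have a few nested elements to visit.
doc = REXML::Document.new('<foo><baz>This is baz</baz><bar><qux/></bar></foo>')

# '//*' selects every element at any depth, including the root.
names = []
REXML::XPath.each(doc, '//*') { |element| names << element.name }

puts names.join(', ')
```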

There may be a better way to do this and I posted something about this
a couple of days ago, but received no response. It seems that
each_element and each_element_with_attribute should include an option
to recurse through all the descendants, but unfortunately they don't
seem to (or at least I couldn't find it).

Maybe that could be added to a new release of rexml (if the dev is
reading this?)
 
zerohalo

PS. There's a great XML plugin for JEdit that will show you the results
of an XPath search on your document. That way you can try out different
variations until you get the result set that you want without having to
run your script each time to test it.
 
Trans

Written off the top of my head, but you could write your own.

require 'rexml/document'

class REXML::Element
  # Yields every descendant element, children before their parent.
  def each_element_recurse(&block)
    each_element do |e|
      e.each_element_recurse(&block)
      yield(e)
    end
  end
end

I made a first stab at it b/c I will probably need it myself soon.

(Yes, I know I'm reopening a standard class! :p)
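
REXML's node API also has a similar traversal, `each_recursive`, so reopening the class may not be necessary. A small sketch, assuming a REXML version that provides it; the sample document and the `visited` array are made up for the example.

```ruby
require 'rexml/document'

# A made-up document with elements nested two levels deep.
doc = REXML::Document.new('<a><b><c/></b><d/></a>')

# each_recursive walks the descendant elements of the receiver;
# the receiver itself (<a> here) is not yielded.
visited = []
doc.root.each_recursive { |element| visited << element.name }

puts visited.join(', ')
```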

T.
 
James Britt

Trans said:
Written off the top of my head, but you could write your own.

require 'rexml/document'

class REXML::Element
  # Yields every descendant element, children before their parent.
  def each_element_recurse(&block)
    each_element do |e|
      e.each_element_recurse(&block)
      yield(e)
    end
  end
end

I made a first stab at it b/c I will probably need it myself soon.

(Yes, I know I'm reopening a standard class! :p)

If you really think you need to visit every element you may be better
off using the stream or pull parsers.
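
As a rough illustration of the stream approach: REXML calls back into a listener object as it reads the document, so every element is visited without building a tree. In the sketch below, the `FaultListener` class, its `events` array, and the tiny sample document are all hypothetical names chosen for the example.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that records element names and non-blank text.
class FaultListener
  include REXML::StreamListener

  attr_reader :events

  def initialize
    @events = []
  end

  # Called once for every opening tag.
  def tag_start(name, _attributes)
    @events << [:start, name]
  end

  # Called for character data; whitespace-only runs are skipped.
  def text(content)
    stripped = content.strip
    @events << [:text, stripped] unless stripped.empty?
  end
end

listener = FaultListener.new
source = '<fault><name>faultCode</name><int>5</int></fault>'
REXML::Document.parse_stream(source, listener)

listener.events.each { |kind, value| puts "#{kind}: #{value}" }
```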


James


--

http://www.ruby-doc.org - The Ruby Documentation Site
http://www.rubyxml.com - News, Articles, and Listings for Ruby & XML
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys
 
zerohalo

James, would you mind pointing to a link that explains how to do this?
I couldn't find reference to it in the rexml documentation. Tx!
 
Michael

Michael, I came across the same problem recently when using ruby/rexml
for the first time.

The reason you're not getting results is that the each_element and
each_element_with_attribute commands only iterate through the element's
immediate children. They don't recurse through all the descendants. So
what you're probably getting is just the root element and none of the
children.

If you need to iterate through all the elements in the whole document,
then use the XPath.each command. For example,
REXML::XPath.each(xml, '//member') { |x| whatever you want to do with them }
should work. That's what I finally had to do in my recent experience.
I'm not sure what XPath search you would use to go through ALL of the
elements in the document, but with some experimentation you'll probably
find it. (And post what you find!)

There may be a better way to do this and I posted something about this
a couple of days ago, but received no response. It seems that
each_element and each_element_with_attribute should include an option
to recurse through all the descendants, but unfortunately they don't
seem to (or at least I couldn't find it).

Thanks! This is what I'm looking for. I read through the tutorial and
the rdoc documentation but I just couldn't figure out what I was missing.

When I mentioned iterating through the document, what I'm really doing
is describing the process I've been using for deciding how to handle
some arbitrary XML document I wound up with. It's not ideal, I admit. It
might do me some good to read up on XML.
--Michael
 
James Britt

zerohalo said:
James, would you mind pointing to a link that explains how to do this?
I couldn't find reference to it in the rexml documentation. Tx!

I don't think there is much written about the pull parser (though I
should see an article of mine on the topic published in a mainstream
geek mag in the next few months. I hope.)

Back in 2001 I wrote an article on using the REXML stream parser that
may be relevant, and possibly accurate:

http://www.rubyxml.com/articles/REXML/Stream_Parsing_with_REXML

The pull parser sits below all the other REXML parsers, and has a
sparser API, but is quite handy for many things.

The basic idea is to pull events from the parser, see what you've got
(start_element? end_element? text?), and act on it. You can also push
things back onto the parse stream, too, as well as peek down the stream
to see what's ahead (while not disrupting the current stream order).

# Simple example:
require 'rexml/parsers/pullparser'

# foo.xml has
# <foo>
#   <baz>This is baz</baz>
#   <bar>Ignore me!</bar>
#   <baz>This is baz, also</baz>
# </foo>

File.open( 'foo.xml', 'r' ) do |f|
  parser = REXML::Parsers::PullParser.new( f )
  while parser.has_next?
    pull_event = parser.pull
    # For a start_element event, pull_event[0] is the tag name.
    puts( "Element: " + pull_event[0] ) if pull_event.start_element?
    if pull_event.start_element? and pull_event[0] == 'baz'
      # Print the text inside <baz> until its closing tag arrives.
      while !(pull_event = parser.pull).end_element?
        puts pull_event[0] if pull_event.text?
      end
    end
  end
end

Or something like that.

See also

http://www.ruby-doc.org/stdlib/libdoc/rexml/rdoc/classes/REXML/Parsers/PullParser.html

James

 
Zach Dennis

What type of information do you want to get out of this? You never
posted what you *thought* your sample ruby code would give you. I ran
your perl example, and it looks like the xml document but where < > are
gsub'd for { }.

Here is an example which shows some XPath usage:

require 'rexml/document'

file = File.new("test.xml")
root = REXML::Document.new(file).root
fault_arr = root.elements.each( "fault" ) do |e1|
  e1.elements.each( "value/struct/member" ) do |e2|
    e2.elements.each( '*' ) { |e3| print e3.text.strip }
    e2.elements.each( '*/*' ){ |e3| puts " " + e3.text.strip }
  end
end

puts "Message response faulted!" if fault_arr.length > 0

Zach
 
zerohalo

Thanks, James, I'll study that.

By the way, I've tried before to access rubyxml.com (which seems to be
your site?) which I had found when googling for rexml, and there
doesn't seem to be any way to get to past articles. Maybe there's a
sidebar or something but it doesn't show up in Firefox or Opera on
Linux (I can't try IE as I don't have it). Or am I missing something?
 
David Jacobs

2005/8/29 said:
Thanks, James, I'll study that.

By the way, I've tried before to access rubyxml.com (which seems to be
your site?) which I had found when googling for rexml, and there
doesn't seem to be any way to get to past articles. Maybe there's a
sidebar or something but it doesn't show up in Firefox or Opera on
Linux (I can't try IE as I don't have it). Or am I missing something?

Google on REXML and you get some good results!
Start here:
http://raa.ruby-lang.org/project/rexml
and you'll get to
http://www.germane-software.com/software/rexml/
and
http://www.germane-software.com/software/rexml/docs/tutorial.html

Have fun!
Cheers,
David
 
James Britt

zerohalo said:
Thanks, James, I'll study that.

By the way, I've tried before to access rubyxml.com (which seems to be
your site?) which I had found when googling for rexml, and there
doesn't seem to be any way to get to past articles. Maybe there's a
sidebar or something but it doesn't show up in Firefox or Opera on
Linux (I can't try IE as I don't have it). Or am I missing something?

No, the site is missing some obvious UI clues for friendlier usage.

You can see past items by tweaking the URL:

http://rubyxml.com/index.rb/2004/12
Shows items from December of 2004

http://rubyxml.com/index.rb/2005
Shows items from 2005.


http://rubyxml.com/index.rb/Articles
Shows items in the Articles category

http://rubyxml.com/index.rb/Applications
Shows items in the Applications category


More or less.


James

 
