REXML - StreamListener - Where am I in the XML?

beikel.meikel

Hi,

I'm using REXML::StreamListener to analyze a big XML file. My Ruby
code looks like this:

--- code start here ---

require 'rexml/document'
require 'rexml/streamlistener'

class MyListener
  include REXML::StreamListener

  def tag_start(name, attrs)
    # anything to do ...
  end

  def text(text)
    # anything to do ...
  end
end

REXML::Document.parse_stream(File.open(xmlfile), MyListener.new)

--- code ends here ---

In the "tag_start" method I sometimes need to know where I am in the
XML: what my parent tag is, and so on. Is there a method to get this
information at that point?

Regards

Michael
 
James Britt

In general, the big value of a stream parser is that it is not holding
onto much state, so memory needs stay small regardless of the size of
the XML. State tracking is left to the application developer.
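A minimal sketch of that kind of application-side state tracking, assuming a stack of open element names is enough to answer "what is my parent tag?". The class name PathTrackingListener and the sample XML are made up for illustration; only tag_start and tag_end come from REXML::StreamListener.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that keeps its own stack of open element names.
class PathTrackingListener
  include REXML::StreamListener

  attr_reader :seen

  def initialize
    @stack = []  # names of the currently open elements, outermost first
    @seen  = []  # [path, parent] pairs, recorded only for this demo
  end

  def tag_start(name, attrs)
    parent = @stack.last  # nil when the root element is opened
    @stack.push(name)
    @seen << [@stack.join('/'), parent]
  end

  def tag_end(name)
    @stack.pop  # the element is closed, so it is no longer an ancestor
  end
end

# Made-up sample document, parsed from a String instead of a File.
xml = '<Users><User><FirstName>Michael</FirstName></User></Users>'
listener = PathTrackingListener.new
REXML::Document.parse_stream(xml, listener)

listener.seen.each do |path, parent|
  puts "#{path} (parent: #{parent.inspect})"
end
```

The stack only ever holds the ancestors of the current element, so its size is bounded by the nesting depth of the document rather than by the size of the file.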

The REXML pull-parser lets you peek at the next event; not sure offhand
if it goes the other way. But I suspect that with the stream and pull
parsers (one of which sits on the other under the hood, so they are more
or less the same), once an event is off the stack, it is gone.

Stream parsing works really well when you have a large source of
regularly structured data (e.g., XML dump of a database table), such
that you can grab and stash in memory just what you need, work with it
(perhaps as a transient DOM), then discard it and move on.
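As a rough sketch of that grab, use, and discard pattern, assuming a table dump where every record is a "row" element with flat child fields (the element names, the RowListener class and the sample data are invented for this example), and using a plain Hash in place of a transient DOM:

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: builds one Hash per <row>, hands it to a block,
# then drops it so only one record is held in memory at a time.
class RowListener
  include REXML::StreamListener

  def initialize(&on_row)
    @on_row = on_row
    @row    = nil  # Hash for the <row> being read, nil outside a row
    @field  = nil  # name of the field element currently open in the row
  end

  def tag_start(name, attrs)
    if name == 'row'
      @row = {}
    elsif @row
      @field = name
      @row[@field] = +''  # unfrozen empty String to append text to
    end
  end

  def text(data)
    @row[@field] << data if @row && @field
  end

  def tag_end(name)
    if name == 'row'
      @on_row.call(@row)  # work with the finished record ...
      @row = nil          # ... then discard it and move on
    elsif name == @field
      @field = nil
    end
  end
end

# Made-up sample data.
xml = '<table><row><id>1</id><name>Ann</name></row>' \
      '<row><id>2</id><name>Bob</name></row></table>'
names = []
REXML::Document.parse_stream(xml, RowListener.new { |row| names << row['name'] })
puts names
```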



--
James Britt

"Trying to port the desktop metaphor to the Web is like working
on how to fuel your car with hay because that is what horses eat."
- Dare Obasanjo
 
Keith Fahlgren

I'm using REXML::StreamListener to analyze a big XML file. My Ruby
code looks like this:
...
In the "tag_start" method I sometimes need to know where I am in the
XML: what my parent tag is, and so on. Is there a method to get this
information at that point?

Hi,

I've used REXML::Parsers::PullParser instead of stream parsing (same
general idea). Here's an example of a function that waits until it
sees a tag that matches element_name and then pulls the text from it:

require 'rexml/parsers/pullparser'

# Returns the text that follows the first <element_name> start tag,
# or false if no such element is found.
def self.get_element_text(filename, element_name)
  parser = REXML::Parsers::PullParser.new(File.new(filename))
  text = false
  while parser.has_next?
    el = parser.pull
    if el.start_element? and el[0] == element_name
      # Assumes the event right after the start tag is the element's text.
      text = parser.peek[0]
      break
    end
  end
  return text
end

So, while the above certainly won't work for your application, you
could try playing a little with parser.peek to see if you can find the
child node (or next node, whatever) that you're looking for.
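For the parent question itself, peek only looks forward, so one possible approach with the pull parser is to keep a stack of open elements while pulling events. The following is a sketch under that assumption; element_paths is a hypothetical helper and the sample XML is made up.

```ruby
require 'rexml/parsers/pullparser'

# Hypothetical helper: returns the slash-separated path of every element
# in document order, tracking the open elements on a stack.
def element_paths(xml)
  parser = REXML::Parsers::PullParser.new(xml)
  stack = []
  paths = []
  while parser.has_next?
    event = parser.pull
    if event.start_element?
      stack.push(event[0])  # event[0] is the element name
      paths << stack.join('/')
    elsif event.end_element?
      stack.pop
    end
  end
  paths
end

paths = element_paths('<Users><User><FirstName>Michael</FirstName></User></Users>')
puts paths
```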


HTH,
Keith
 
beikel.meikel

@James: I know a SAX parser in another language that is used much like
REXML::StreamListener. There you additionally get the information
that, for example, a FirstName tag is a child of a User tag, and so on.
You're right: with REXML::StreamListener I can track the stack myself.
I just thought there was a chance that I had overlooked the right
method in REXML::StreamListener :)

@Keith: I will check REXML::Parsers::PullParser. Thanks for the
info :)

Michael
 
