REXML exceptions

Eric Will

Is there any way to get useful data out of REXML::ParseException
when you're parsing a String? It never sets @position or @line or
anything. I need to figure out exactly where the error-causing tag
starts, and save it.

Any ideas?

-- rakaur
 
Mark Thomas

Out of curiosity, I tried libxml. It has nice error messages:

foo.xml:3: parser error : Opening and ending tag mismatch: title line 3 and txitle
<title>Foo</txitle>
^

but they go to stdout. You can capture them by registering an error
handler. Sample code:

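require 'xml/libxml'  # libxml-ruby bindings

parser = XML::Parser.new
parser.filename = "foo.xml"

# Collect libxml's messages instead of letting them print
msgs = []
XML::Parser.register_error_handler lambda { |msg| msgs << msg }

begin
  parser.parse
rescue Exception => e
  # Report what the error handler captured
  puts "Error: #{msgs.join}"
end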

-- Mark.
 
Eric Will

Interesting. I was thinking about switching to libxml anyway. I don't like REXML.

Thanks.

-- rakaur
 
Eric Will

Actually, this isn't working for me. I'm using the SAX parser, and it
just calls Listener#on_parser_error with a string. Not helping me.
 
Mark Thomas

Why would you want to do that? You already have the XML as a string.
The only reason to put up with the awful interface and extra
complexity of SAX would be if your file doesn't fit into memory. And I
don't think the SAX interface to libxml is as complete/robust yet as
the DOM interface.

Go with the DOM interface. With libxml it's plenty fast.

-- Mark.
 
Eric Will

My situation requires SAX, unfortunately.

I need to parse and react to each tag as it comes in. If there's a
broken one, all tags up to the broken one must be processed, and the
broken one must be stored. I cannot do this with DOM, because if there's
an error, the DOM parser will not process anything.
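One way to meet that requirement without relying on the parser's error object is to track offsets yourself. A minimal sketch, assuming a deliberately simplified XML subset and a hypothetical `scan_tags` helper (this is not REXML or libxml API): it yields each well-formed tag as it is read, so earlier tags are processed before any error is seen, and on the first broken or mismatched tag it returns that tag's character offset together with the unprocessed remainder, which can then be stored.

```ruby
require 'strscan'

# Hypothetical helper (not REXML or libxml API). Handles only start tags,
# end tags, self-closing tags and text: no comments, CDATA, processing
# instructions, DOCTYPE, or attribute values containing ">".
# Yields (event, name, offset) for each well-formed tag as it is read.
# Returns nil if the whole string was well-formed; otherwise a Hash with
# the character offset of the first broken tag and the unprocessed rest.
def scan_tags(xml)
  ss = StringScanner.new(xml)
  stack = [] # names of the currently open elements

  until ss.eos?
    start = ss.charpos # character (not byte) offset of this token
    if ss.skip(/[^<]+/)
      next # character data between tags
    elsif ss.scan(%r{<([A-Za-z_][\w.:-]*)(?:\s[^<>]*?)?(/?)>})
      name = ss[1]
      self_closing = ss[2] == "/"
      yield :start, name, start
      if self_closing
        yield :end, name, start # <empty/> opens and closes at once
      else
        stack.push(name)
      end
    elsif ss.scan(%r{</([A-Za-z_][\w.:-]*)\s*>})
      name = ss[1]
      # An end tag must close the innermost open element.
      return { offset: start, rest: xml[start..-1] } unless stack.last == name
      stack.pop
      yield :end, name, start
    else
      # A "<" that starts no recognizable tag: the broken tag begins here.
      return { offset: start, rest: xml[start..-1] }
    end
  end

  # Input ended with elements still open: report the end of the string.
  stack.empty? ? nil : { offset: xml.length, rest: "" }
end
```

For `<doc><title>Foo</txitle></doc>` it yields the `doc` and `title` start tags, then returns offset 15 with `</txitle></doc>` as the remainder. Returning instead of raising keeps the "process everything before the break" behaviour explicit in the caller.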
 
Eric Will

Also, I don't think those error messages can help me locate the
position of the bad XML in the string. They're pretty, for sure, but
not very useful to anyone but a human.
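The messages are not purely human-oriented, though: the sample output earlier in the thread starts with a `foo.xml:3:` prefix, so a program can at least recover the line. A sketch under that assumption, with hypothetical helpers `error_line` and `line_offset` (not libxml API): extract the line number, then convert it into the character offset where that line starts in the original string. The regex also accepts an `Entity: line N:` prefix, which libxml may use for in-memory input; treat both formats as assumptions to check against your libxml version.

```ruby
# Hypothetical helper. Assumes messages look like
#   "foo.xml:3: parser error : Opening and ending tag mismatch: ..."
# or "Entity: line 3: parser error : ..."; returns the line or nil.
def error_line(msg)
  m = msg.match(/\A[^:\n]*:(?:\s*line\s+)?(\d+):/)
  m && m[1].to_i
end

# Hypothetical helper: character offset at which the 1-based line
# +lineno+ starts in +xml+, or nil if the string has fewer lines.
def line_offset(xml, lineno)
  return nil if lineno < 1
  offset = 0
  xml.each_line.with_index(1) do |line, i|
    return offset if i == lineno
    offset += line.length
  end
  nil
end
```

For the string `"<doc>\n<a/>\n<title>Foo</txitle>\n</doc>\n"`, line 3 starts at offset 11, where `<title>` begins. This only narrows the error to a line; if that line holds several tags, you still need column information or your own scan of the line.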
 
Mark Thomas

If you can receive an entire document at a time, libxml has a
'recover' mode that will correct what it can and process the entire
document -- even if it is not well-formed. It works surprisingly well.

Another option is writing your own recursive descent parser. See
http://snippets.dzone.com/posts/show/2190 for a starting point.
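A recursive descent parser for a tiny XML subset can be quite short. The sketch below is hypothetical and is not the code at the linked snippet: `MiniXMLParser` and `MiniXMLError` are illustrative names, and the grammar (in the comment) has no attributes, comments, CDATA or entities. Its useful property here is that every failure carries the character offset where parsing stopped.

```ruby
# Raised with the character offset at which parsing failed.
class MiniXMLError < StandardError
  attr_reader :offset

  def initialize(message, offset)
    super("#{message} at offset #{offset}")
    @offset = offset
  end
end

# Hypothetical recursive descent parser for a tiny XML subset:
#   element := "<" NAME ">" content "</" NAME ">"
#   content := (text | element)*
# Returns nested arrays like [name, children]; text nodes are Strings.
class MiniXMLParser
  NAME = /[A-Za-z_][\w.:-]*/

  def initialize(src)
    @src = src
    @pos = 0 # current character offset
  end

  def parse
    el = element
    read(/\s+/) # allow trailing whitespace
    fail_at("trailing data") unless @pos == @src.length
    el
  end

  private

  def element
    expect("<")
    name = read(NAME) || fail_at("expected element name")
    expect(">")
    children = content
    close = @pos # where the end tag starts, for error reporting
    expect("</")
    closing = read(NAME)
    fail_at("mismatched end tag </#{closing}> for <#{name}>", close) unless closing == name
    expect(">")
    [name, children]
  end

  def content
    items = []
    loop do
      if (text = read(/[^<]+/))
        items << text
      elsif @src[@pos, 2] == "</" || @pos >= @src.length
        return items
      else
        items << element # recurse into a child element
      end
    end
  end

  # Match +re+ only if it starts exactly at the current position.
  def read(re)
    m = re.match(@src, @pos)
    return nil unless m && m.begin(0) == @pos
    @pos = m.end(0)
    m[0]
  end

  def expect(lit)
    fail_at("expected #{lit.inspect}") unless @src[@pos, lit.length] == lit
    @pos += lit.length
  end

  def fail_at(msg, offset = @pos)
    raise MiniXMLError.new(msg, offset)
  end
end
```

For `<doc><title>Foo</txitle></doc>` it raises `MiniXMLError` with `offset` 15, the start of `</txitle>`. Because each element is parsed by its own call, each completed element could also be handed to a callback as soon as its end tag is read, which fits a process-as-you-go requirement.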

-- Mark.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,777
Messages
2,569,604
Members
45,218
Latest member
JolieDenha

Latest Threads

Top