Which behaves correctly, Hpricot or Nokogiri?

  • Thread starter Just Another Victim of the Ambient Morality

Just Another Victim of the Ambient Morality

I've been considering switching to Nokogiri instead of Hpricot, mostly
'cause Mechanize has switched. However, the two actually behave quite
differently. The Nokogiri objects don't simulate standard container
behavior nearly as well as Hpricot. I also noticed that this:


require 'nokogiri'
require 'hpricot'

xml = '<first look="Big &amp; small...">content</first>'

doc = Nokogiri::XML(xml)
puts doc.search('first')[0].attributes['look']
doc = Hpricot(xml)
puts doc.search('first')[0].attributes['look']


...produces this output:


Big &amp; small...
Big & smal...


I don't know which output is the correct one. Does anyone know what's
going on here?
Thank you...
 

Radosław Bułat

xml = '<first look="Big & small...">content</first>'

It's not valid XML. It should be "Big &amp; small..."
I guess that for invalid XML there is no "valid" behavior. Ask the
Hpricot and Nokogiri developers what happens when the XML is not valid
(do they try to fix it or something?)
Big & small...
Big & smal...

Strange. I get:
Big small...
Big & small...

The difference comes down to the '&', which is not valid on its own in
XML (&amp; should be used instead).

--
Regards

Radosław Bułat
http://radarek.jogger.pl - my blog
 

Just Another Victim of the Ambient Morality

Radosław Bułat said:
It's not valid XML. It should be "Big &amp; small..."
I guess that for invalid XML there is no "valid" behavior. Ask the
Hpricot and Nokogiri developers what happens when the XML is not valid
(do they try to fix it or something?)

Actually, "Big &amp; small" is what I wrote in the example. The second
output is erroneously missing an "l" but I think that's understood...
I'm wondering if anyone knows what the correct behaviour is supposed to
be...

Oh, I get it. Maybe my use of & amp ; was translated in whatever client
you're using?
 

Phlip

Just said:
doc = Nokogiri::XML(xml)
puts doc.search('first')[0].attributes['look']
doc = Hpricot(xml)
puts doc.search('first')[0].attributes['look']

...produces this output:


Big &amp; small...
Big & smal...

I don't know which output is the correct one. Does anyone know what's
going on here?

The second one is correct, because &amp; is an encoding: an XML tool should
hand you & outside its interface and store &amp; inside it (in the serialized
markup).

Now try these XPaths in Hpricot and Nokogiri - which combinations find the node?

first[ @look = 'Big &amp; small...' ]
first[ @look = 'Big & small...' ]
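
For comparison, here is a minimal sketch of the same round trip using REXML, the parser that ships with Ruby. REXML is not part of the original question; it is used here only as a neutral third opinion, on the assumption that it is installed alongside your Ruby. The variable names are illustrative.

```ruby
require 'rexml/document'

# The same document as in the original post: the attribute is written
# with the entity reference &amp; in the XML source.
xml = '<first look="Big &amp; small...">content</first>'

doc = REXML::Document.new(xml)

# Reading the attribute through the parser's API gives the decoded value.
look = doc.root.attributes['look']
puts look

# Serializing the element again re-encodes the ampersand as &amp;.
serialized = doc.root.to_s
puts serialized
```

The first line printed should be the decoded string with a bare &, and the serialized element should contain &amp; again, which matches the "decoded outside, encoded inside" rule described above.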
 
