libxml2/xpath


Maxim Khesin

I do not believe it is... You can see the doc by clicking on the link.
Does it have to be?
thanks,
m

Martijn said:
Maxim said:
I am trying to do some xpath on
http://fluidobjects.com/doc.xhtml
but cannot get past 'point A' (that is, I am totally stuck):

import libxml2
mydoc = libxml2.parseDoc(text)
mydoc.xpathEval('/html')
[]



This returns an empty result list, which just seems plain wrong. Can anyone
throw a suggestion to the stupid?


Is the html element in a namespace?

Regards,

Martijn
 

Frans Englich

I do not believe it is... You can see the doc by clicking on the link.
Does it have to be?

No, but your XPath statements must match the namespace, no matter what it is.
The document does have a namespace, as XHTML should:

<html xmlns="http://www.w3.org/1999/xhtml">

The problem with this:

mydoc.xpathEval('/html')

is that it tries to match a top element whose local name is "html" and whose
namespace is null. In the source document the local name is also "html", but
the namespace is "http://www.w3.org/1999/xhtml", so the two do not match.

The solution is to have a matching namespace, such that the whole qualified
name matches. AFAIK, it is done like this in libxml2:

# confDocument is a libxml2 document, from parseFile() etc
xp = confDocument.xpathNewContext()
xp.xpathRegisterNs("xhtml", "http://www.w3.org/1999/xhtml")
dirElement = xp.xpathEval( "/xhtml:html" )
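
For anyone who wants to see the mismatch without libxml2 installed, here is a rough sketch using the standard library's xml.etree.ElementTree instead. It is not the libxml2 API, only the same idea: an unprefixed name looks in the null namespace, a prefixed name looks in the namespace the prefix is mapped to. The SAMPLE document and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical stand-in for the real doc.xhtml
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>hi</p></body>"
    "</html>"
)

root = ET.fromstring(SAMPLE)

# ElementTree stores the qualified name as "{namespace-uri}localname"
qualified_tag = root.tag

# Unprefixed path: looks for "body" in no namespace, so nothing matches
unqualified_hits = root.findall("body")

# Prefixed path plus a prefix-to-URI mapping: the qualified name matches
qualified_hits = root.findall("xhtml:body", {"xhtml": XHTML_NS})
```

The prefix itself ("xhtml") is arbitrary; only the URI it maps to has to equal the one declared in the document.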


Cheers,

Frans
 

maxk

# confDocument is a libxml2 document, from parseFile() etc
xp = confDocument.xpathNewContext()
xp.xpathRegisterNs("xhtml", "http://www.w3.org/1999/xhtml")
dirElement = xp.xpathEval( "/xhtml:html" )

Stupid question, but can the namespace somehow be changed to null to
make queries simpler?

thanks,
max.
 

Frans Englich

Stupid question, but can the namespace somehow be changed to null to
make queries simpler?

(I am no libxml2, XML, or Python expert)

There's a danger to that; the namespace is there for a reason. For example, if
you put all elements in the document into one namespace, you could end up
matching elements which are not XHTML but something else, totally different.
If you want to ignore namespaces, you must be 100% sure what the files to be
processed contain, and that all namespaces that are thrown together can be
treated equally.

Regarding removing the namespace: you could probably process the DOM tree and
remove all namespaces before doing any XPath lookups. Perhaps libxml2 has
utility functions for things like this (something like recursively setting the
namespace for an element). Standard namespace-aware DOM probably has it.
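
As a rough idea of what "process the tree and remove all namespaces" could look like, here is a sketch with the standard library's xml.etree.ElementTree rather than libxml2 (I have not checked what libxml2 itself offers for this). strip_namespaces is a hypothetical helper, SAMPLE is a made-up document, and the sketch only touches element names, not namespaced attributes.

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root):
    """Hypothetical helper: drop the "{uri}" part of every element tag, in place."""
    for element in root.iter():
        # Comment and processing-instruction nodes have non-string tags; skip them
        if isinstance(element.tag, str) and element.tag.startswith("{"):
            element.tag = element.tag.split("}", 1)[1]
    return root

# Hypothetical stand-in for the real doc.xhtml
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>hi</p></body>"
    "</html>"
)

root = strip_namespaces(ET.fromstring(SAMPLE))
stripped_root_tag = root.tag

# After stripping, plain unprefixed paths match
body_hits = root.findall("body")
```

The caveat above still applies: once the namespaces are gone, elements from different vocabularies that share a local name can no longer be told apart.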

Feel free to post your findings afterwards, although I wouldn't do it in the
first place :)


Cheers,

Frans
 
