xml:id

Michael Jung

I have a problem with transferring xml:ids from one document to another;
sample code is attached. Somehow the id attribute gets lost. Am I doing
something wrong, missing something, or is this a bug in the XML libs (I
use the ones supplied with the standard JDK)? Maybe this is a "feature"?
It is rather annoying if this doesn't work, since it forces me to
traverse the tree and do the id handling myself.

The code produces the same output under OpenJDK, Sun's JDK 1.6 and 1.5:
: [elem: null]
: [elem: null]
: null

=== SimpleTest.java ===
import java.io.File;
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class SimpleTest {
    public static void main(String[] a) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setValidating(true);
        dbf.setIgnoringElementContentWhitespace(true);
        dbf.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage", "http://www.w3.org/2001/XMLSchema");
        DocumentBuilder docBuilder = dbf.newDocumentBuilder();
        Document parsed = docBuilder.parse(new File("test.xml"));
        System.out.println(parsed.getElementById("x"));
        Document parsed2 = docBuilder.parse(new File("test.xml"));
        Element el = parsed.getElementById("x");
        el.setAttribute("id", "x2");
        System.out.println(parsed.getElementById("x2"));
        // I have tried importNode as well, that even loses the "isId"
        // property of the "id" tag.
        parsed2.adoptNode(el);
        // I definitely want to avoid the next call, since I'd need to
        // traverse in production code. But it is useless anyway.
        el.setIdAttribute("id", true);
        System.out.println(parsed2.getElementById("x2"));
    }
}
=== test.xml ===
<?xml version="1.0" encoding="UTF-8"?>
<test xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="./test.xsd">
<elem id="x"/>
</test>
=== test.xsd ===
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="test">
    <xs:complexType>
      <xs:choice>
        <xs:element name="elem">
          <xs:complexType>
            <xs:attribute name="id" type="xs:ID" />
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
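
For readers who want to try this without test.xml and test.xsd, here is a minimal sketch of the same situation. It swaps the schema for an internal DTD that declares id as type ID, which is an assumption made only to keep the example self-contained, and it places the adopted node with replaceChild. The class name IdLossRepro and its helper methods are hypothetical. It parses the same document twice, adopts the elem from one into the other, and checks whether getElementById in the target document returns the adopted node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class; the internal DTD stands in for test.xsd (assumption).
class IdLossRepro {
    static final String XML =
          "<!DOCTYPE test ["
        + "<!ELEMENT test (elem)>"
        + "<!ELEMENT elem EMPTY>"
        + "<!ATTLIST elem id ID #IMPLIED>"
        + "]>"
        + "<test><elem id=\"x\"/></test>";

    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static Document parse(DocumentBuilder builder) {
        try {
            return builder.parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the adopted element is what getElementById("x") returns in the target. */
    static boolean adoptedNodeFoundById() {
        DocumentBuilder builder = newBuilder();
        Document source = parse(builder);
        Document target = parse(builder);
        Element el = source.getElementById("x");   // ID-ness comes from the DTD
        Element el2 = target.getElementById("x");
        Node adopted = target.adoptNode(el);        // el now belongs to target
        target.getDocumentElement().replaceChild(adopted, el2);
        return target.getElementById("x") == adopted;
    }

    public static void main(String[] args) {
        System.out.println("adopted node found by id: " + adoptedNodeFoundById());
    }
}
```

With the parser bundled in the JDK, the lookup is expected not to return the adopted node, which is consistent with the null reported above.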
 
Michael Jung

Lee Fesperman said:
Try removing 'elementFormDefault="qualified"' from your schema (xsd).
Some schema processors may require qualification in your xml (for
'elem'), even though qualification is not possible in your case ...
because you have no (target)Namespace.

I have (a) removed the elementFormDefault attribute, (b) set it to
unqualified, and (c) even added namespaces:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://z" elementFormDefault="qualified">

and

<test xmlns="http://z" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://z ./test.xsd">

No difference in output. I also don't think that this has to do with
the validation process, since that has passed successfully. The ID
property is known for both parsed files. It is simply forgotten during
adopt(ion) of a node. (Though both documents even adhere to the same schema!)

Michael
 
Arved Sandstrom

On 11-12-19 05:27 AM, Michael Jung wrote:
[ SNIP ]
No difference in output. I also don't think that this has to do with
the validation process, since that has passed successfully. The ID
property is known for both parsed files. It is simply forgotten during
adopt(ion) of a node. (Though both documents even adhere to the same schema!)

Michael

Not "forgotten" exactly. I tested with JDK 1.6 and 1.7, and I find that
if the adopted node is placed _somewhere_ (appendChild or what have
you) and you then obtain the list of "elem" elements in the new
document, you'll have 2 of them, and if you inspect the isId()
value of the "id" attributes, both of them are TRUE.

Furthermore, if you retrieve by getElementById(), using "x" actually
returns an element, but using "x2" does not. So that tells me that
things are dubious overall if you've performed your scenario.

Seems to me that this is creating undefined behaviour (getElementById,
for example, calls this out). Neither importNode nor adoptNode say that
they are _replacing_ anything, for starters.

Further experimentation indicates (to me) that
Document.normalizeDocument() is either neutral or unhelpful at any point
that I've tried.

What *does* work is to (1) remove the first element, the one with value
"x", and (2) to then setIdAttribute() on the adopted/placed node with id
attribute of "x2".

Maybe I'm missing something, but if you're willing to call adoptNode()
in the course of doing what you're doing, what's the problem with
removing the element that is going to conflict, and also calling
setIdAttribute()?

AHS
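
The two steps described above can be sketched as follows. This is a minimal illustration, not the thread's actual test: the documents are built in memory instead of being parsed against test.xsd, and the initial setIdAttribute calls stand in for what schema validation would do (both are assumptions). The class name AdoptAndReRegister and its methods are hypothetical.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class illustrating "remove the old element, then setIdAttribute()".
class AdoptAndReRegister {
    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds <test><elem id="..."/></test> and marks "id" as an ID attribute. */
    static Document build(DocumentBuilder builder, String idValue) {
        Document doc = builder.newDocument();
        Element root = doc.createElement("test");
        doc.appendChild(root);
        Element elem = doc.createElement("elem");
        elem.setAttribute("id", idValue);
        root.appendChild(elem);
        elem.setIdAttribute("id", true); // assumption: replaces schema validation
        return doc;
    }

    /** True if the adopted element can be looked up by "x2" in the target document. */
    static boolean replaceAndLookup() {
        DocumentBuilder builder = newBuilder();
        Document source = build(builder, "x2");
        Document target = build(builder, "x");
        Element incoming = source.getElementById("x2");
        Element old = target.getElementById("x");

        Element adopted = (Element) target.adoptNode(incoming);
        // (1) take out the element with id "x", putting the adopted one in its place
        target.getDocumentElement().replaceChild(adopted, old);
        // (2) declare the attribute as an ID again, now against the target document
        adopted.setIdAttribute("id", true);

        return target.getElementById("x2") == adopted;
    }

    public static void main(String[] args) {
        System.out.println("lookup after setIdAttribute: " + replaceAndLookup());
    }
}
```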
 
Michael Jung

Some of this code was apparently garbled in my attempt to tone the
working example down. Here is some changed code that still yields the
output above.

[...]
Document parsed = docBuilder.parse(new File("src/test.xml"));
Element el = parsed.getElementById("x");
System.out.println(el);

Document parsed2 = docBuilder.parse(new File("src/test2.xml"));
Element el2 = parsed2.getElementById("x");
System.out.println(el2);

parsed2.adoptNode(el);
//parsed2.getDocumentElement().removeChild(el2);
//parsed2.getDocumentElement().appendChild(el);
parsed2.getDocumentElement().replaceChild(el, el2);
System.out.println(parsed2.getElementById("x"));
[...]

(Create a copy of test.xml called test2.xml.) What I am attempting to
achieve should be obvious by now: replace one node with another from
a different file. Using the commented-out lines in place of the line
that follows them doesn't work either.

Arved Sandstrom said:
What *does* work is to (1) remove the first element, the one with value
"x", and (2) to then setIdAttribute() on the adopted/placed node with id
attribute of "x2".

That is true, i.e. putting 'el.setIdAttribute("id", true);' somewhere
in the code above. This way I need to know the id attribute's name,
but I can live with that.

Arved Sandstrom said:
Maybe I'm missing something, but if you're willing to call adoptNode()
in the course of doing what you're doing, what's the problem with
removing the element that is going to conflict, and also calling
setIdAttribute()?

setIdAttribute seems out of place. It's not likely that the schema
changes in that respect but it still is odd. And parsing the schema
separately just for that is overkill. At least this issue could be
documented somewhere in the javadoc (I couldn't find anything). Now,
if there were a getIdAttribute that would be better.

Thanks.

Michael
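
On the wish for something like getIdAttribute: earlier in the thread it was observed that isId() still reports TRUE on the attributes of an adopted node. Assuming that holds for the DOM implementation in use, a small helper can walk the adopted subtree and re-declare whatever is already flagged, without knowing the attribute's name. This is only a sketch; the class IdReRegistrar, the "sub" element, and the in-memory documents are hypothetical stand-ins for the parsed files.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical helper: relies on Attr.isId() surviving adoptNode (see lead-in).
class IdReRegistrar {
    /** Re-declares every attribute flagged isId() in the subtree; returns how many. */
    static int reRegisterIds(Node node) {
        int count = 0;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            NamedNodeMap attrs = element.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr attr = (Attr) attrs.item(i);
                if (attr.isId()) {
                    element.setIdAttributeNode(attr, true);
                    count++;
                }
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += reRegisterIds(c);
        }
        return count;
    }

    /** Adopts <elem id="a"><sub id="b"/></elem> into a second document and looks both up. */
    static boolean demo() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document source = builder.newDocument();
            Element srcRoot = source.createElement("test");
            source.appendChild(srcRoot);
            Element outer = source.createElement("elem");
            outer.setAttribute("id", "a");
            srcRoot.appendChild(outer);
            Element inner = source.createElement("sub");
            inner.setAttribute("id", "b");
            outer.appendChild(inner);
            outer.setIdAttribute("id", true); // assumption: replaces schema validation
            inner.setIdAttribute("id", true);

            Document target = builder.newDocument();
            Element tgtRoot = target.createElement("test");
            target.appendChild(tgtRoot);

            Node adopted = target.adoptNode(outer);
            tgtRoot.appendChild(adopted);
            int registered = reRegisterIds(adopted);

            return registered == 2
                && target.getElementById("a") == outer
                && target.getElementById("b") == inner;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ids re-registered and found: " + demo());
    }
}
```

This would not help with importNode, where the thread reports that the isId flag itself is lost.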
 
Arved Sandstrom

I can sort of see why we end up with these problems. Neither adoptNode
nor importNode provide a parent for the adopted/imported node in the
target document, which is why both methods return the actual
adopted/imported node (one that has the correct owner document).

Until we know where the adopted/imported node is placed in the target
document, it's not possible to determine whether attribute "id" of
element "elem" is actually of type xs:ID. As you know, we could easily
have a schema that declares two elements at different places in the
hierarchy, each with tag name "elem" and an attribute "id", where
none, one, or both of those attributes are declared as type xs:ID.

So we surmise that we have to take the adopted/imported "elem" element,
and parent it somewhere, where if "Something Else" (TM) happened that
the "id" attribute would be identified as being of xs:ID type.

I might add at this juncture, I'm not convinced that your

parsed2.getDocumentElement().replaceChild(el, el2);

will work. Node el2 is in document 2, but el is still in document 1.
That's why the returned value from adoptNode() is handy. replaceChild()
does work just fine if you use that value.

I would not expect any of these methods to re-validate and therefore I
am not surprised that if this is all we do, that the newly parented
adopted/imported node is not found with getElementById().

We also know that at this point that setIdAttribute() sets things right.
I experimented with Document.normalizeDocument() and tweaking some
DOMConfiguration parameters in the hopes of finding a way of at least
not having to specify the xs:ID attribute. But this seems not to work.
So I think we're stuck with setIdAttribute().

I'm no expert at DOM (I hate it actually :)) but I hope the above
analysis shows why I at least am not particularly surprised that all of
this has to be done.

AHS
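
The difference between the two methods discussed above can be checked directly. The following sketch uses in-memory documents, and its class and method names are hypothetical. It shows that importNode leaves the source node where it was and hands back a parentless copy, while adoptNode hands back the very same node, now owned by the target document but still without a parent until it is inserted somewhere.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical class comparing Document.importNode and Document.adoptNode.
class AdoptVsImport {
    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Creates <test><elem/></test> in a new document and returns the elem. */
    static Element elemInNewDocument(DocumentBuilder builder) {
        Document doc = builder.newDocument();
        Element root = doc.createElement("test");
        doc.appendChild(root);
        Element elem = doc.createElement("elem");
        root.appendChild(elem);
        return elem;
    }

    /** importNode: a new node owned by the target; the source is untouched. */
    static boolean importMakesCopy() {
        DocumentBuilder builder = newBuilder();
        Element elem = elemInNewDocument(builder);
        Document sourceDoc = elem.getOwnerDocument();
        Node sourceParent = elem.getParentNode();
        Document target = builder.newDocument();

        Node copy = target.importNode(elem, true);
        return copy != elem
            && copy.getOwnerDocument() == target
            && copy.getParentNode() == null
            && elem.getOwnerDocument() == sourceDoc
            && elem.getParentNode() == sourceParent;
    }

    /** adoptNode: the same node, removed from its old parent and re-owned. */
    static boolean adoptMovesSameNode() {
        DocumentBuilder builder = newBuilder();
        Element elem = elemInNewDocument(builder);
        Document target = builder.newDocument();

        Node adopted = target.adoptNode(elem);
        return adopted == elem
            && elem.getOwnerDocument() == target
            && elem.getParentNode() == null; // no parent until it is inserted
    }

    public static void main(String[] args) {
        System.out.println("importNode makes a copy: " + importMakesCopy());
        System.out.println("adoptNode moves the same node: " + adoptMovesSameNode());
    }
}
```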
 
Michael Jung

Arved Sandstrom said:
So we surmise that we have to take the adopted/imported "elem" element,
and parent it somewhere, where if "Something Else" (TM) happened that
the "id" attribute would be identified as being of xs:ID type.

Clear enough. The println's up in the code come for free.

Arved Sandstrom said:
I might add at this juncture, I'm not convinced that your

parsed2.getDocumentElement().replaceChild(el, el2);

will work. Node el2 is in document 2, but el is still in document 1.

It was adopted. That should move it out of doc-1 and somewhere in the
"orphanage" of doc-2. (At least that is what the javadoc says.) Also
parsed2.adoptNode(el).equals(el) evaluates to true.

Arved Sandstrom said:
That's why the returned value from adoptNode() is handy. replaceChild()
does work just fine if you use that value.

Tried that and removed the setIdAttribute; fails.

Arved Sandstrom said:
I would not expect any of these methods to re-validate and therefore I
am not surprised that if this is all we do, that the newly parented
adopted/imported node is not found with getElementById().

That would invalidate a document any time a node is added (and removed).
I understand that this is due to performance, but shouldn't there be
helper methods that verify that the document is valid after some
transformation? And some hints in the javadoc? The only way I see out of
this (if I had deep ids) is by transforming and reparsing the full
document.

Arved Sandstrom said:
We also know that at this point that setIdAttribute() sets things
right. [...] So I think we're stuck with setIdAttribute().

See above. Luckily I don't have deep ids. (Some XPath would help that,
of course. But it all seems overkill.)

Arved Sandstrom said:
I'm no expert at DOM (I hate it actually :)) but I hope the above
analysis shows why I at least am not particularly surprised that all of
this has to be done.

I get the point :)

Michael
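
For the "deep ids" case mentioned above, the XPath idea could look roughly like the sketch below. It selects every element in the adopted subtree that carries an attribute of a given name and calls setIdAttribute on it. That treats every attribute with that name as an ID, which is an assumption a real schema might not support. The class XPathIdMarker, the "sub" element, and the in-memory documents are hypothetical.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: marks attributes as IDs by name, at any depth.
class XPathIdMarker {
    /** Calls setIdAttribute(attrName, true) on subtree elements having that attribute. */
    static int markIds(Node subtree, String attrName) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "descendant-or-self::*[@" + attrName + "]", subtree, XPathConstants.NODESET);
            for (int i = 0; i < hits.getLength(); i++) {
                ((Element) hits.item(i)).setIdAttribute(attrName, true);
            }
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Adopts <elem id="top"><sub id="deep"/></elem> and looks up the nested id. */
    static boolean demo() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document source = builder.newDocument();
            Element srcRoot = source.createElement("test");
            source.appendChild(srcRoot);
            Element outer = source.createElement("elem");
            outer.setAttribute("id", "top");
            srcRoot.appendChild(outer);
            Element inner = source.createElement("sub");
            inner.setAttribute("id", "deep");
            outer.appendChild(inner);

            Document target = builder.newDocument();
            Element tgtRoot = target.createElement("test");
            target.appendChild(tgtRoot);

            Node adopted = target.adoptNode(outer);
            tgtRoot.appendChild(adopted);
            int marked = markIds(adopted, "id");

            return marked == 2
                && target.getElementById("top") == outer
                && target.getElementById("deep") == inner;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("deep ids found after marking: " + demo());
    }
}
```

Unlike reparsing the whole document, this touches only the adopted subtree, at the cost of hard-coding the attribute name.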
 
