libhxml Node#remove! kills each loop and extension to #<<

transfire

hi,

While using #each to loop through the children of a Node, if I remove a
node the loop breaks on its own.

<root>
<a id="a"></a>
<b id="b"></b>
<c id="c"></c>
</root>

root.each { |node|
  if XML::Node === node
    node.content = "yep"
    node.remove! if node['id'] == "b"
  end
}

The result is

<root>
<a id="a">yep</a>
<c id="c"></c>
</root>

Would that be a bug? Or something that simply can't be avoided?

Also, I found this extension to #<< to be useful:

class XML::Node
  alias_method :append, :<<
  def <<( node )
    if Array === node
      node.each { |n| self.append n }
    else
      super
    end
  end
end

Thanks,
T.
 
transfire

Also, I found this extension to #<< to be useful:

class XML::Node
  alias_method :append, :<<
  def <<( node )
    if Array === node
      node.each { |n| self.append n }
    else
      super
    end
  end
end

s/super/append(node)/
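
The substitution matters because, inside a reopened class, super dispatches to an ancestor's #<< rather than to the method that was just replaced, so the old behaviour has to be reached through the alias. A minimal sketch of the corrected pattern, using a hypothetical Bag class as a stand-in (it is not XML::Node, and returning self from #<< is an assumption made here so calls can be chained):

```ruby
# Bag is a hypothetical stand-in for XML::Node, used only to show the pattern.
class Bag
  attr_reader :items

  def initialize
    @items = []
  end

  # Original single-item append.
  def <<(item)
    @items << item
    self
  end
end

# Reopen the class, keep the original method under a new name,
# then redefine #<< to accept either one item or an Array of items.
class Bag
  alias_method :append, :<<

  def <<(item)
    if Array === item
      item.each { |i| append(i) }
      self
    else
      append(item) # not super: super would skip the aliased original
    end
  end
end

bag = Bag.new
bag << 1 << [2, 3]
# bag.items is now [1, 2, 3]
```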

T.
 
Matthew Smillie

hi,

While using #each to loop through the children of a Node, if I remove a
node the loop breaks on its own.

<root>
<a id="a"></a>
<b id="b"></b>
<c id="c"></c>
</root>

root.each { |node|
  if XML::Node === node
    node.content = "yep"
    node.remove! if node['id'] == "b"
  end
}

The result is

<root>
<a id="a">yep</a>
<c id="c"></c>
</root>

Would that be a bug? Or something that simply can't be avoided?

I'm not aware of (and couldn't find) any libhxml. If you meant
ruby-libxml (which seems likely given the problem), here's what I
figured out.

At first, I thought it could be a bug caused by modification to the
structure you're iterating over, similar to this:

root = ['a','b','c']
root.each { |node| root.delete(node) if node == "b" }

which will skip over 'c' due to the deletion.
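
To make the skipping concrete, here is a small sketch with plain Ruby arrays only (no XML library involved; the variable names are just for illustration). It records which elements #each actually yields, first while deleting from the array being iterated, then while iterating over a copy:

```ruby
letters = ['a', 'b', 'c']
visited = []

# Deleting during #each shifts the later elements left, so the element
# that follows the deleted one is never yielded.
letters.each do |letter|
  visited << letter
  letters.delete(letter) if letter == 'b'
end
# visited is ['a', 'b'] and letters is ['a', 'c']

# Iterating over a snapshot (dup) while deleting from the original
# visits every element.
letters_safe = ['a', 'b', 'c']
visited_safe = []
letters_safe.dup.each do |letter|
  visited_safe << letter
  letters_safe.delete(letter) if letter == 'b'
end
# visited_safe is ['a', 'b', 'c'] and letters_safe is ['a', 'c']
```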

But while I was trying to confirm this in libxml, I found behaviour
that makes me think there's some more fundamental bug. Redefining a
variable seemed to have some very odd effects, which I managed to
reduce to this case:

irb(main):001:0> require 'rubygems' # => true
irb(main):002:0> require 'xml/libxml' # => true
irb(main):003:0> root = XML::Node.new("root") # => <root/>
irb(main):004:0> a = XML::Node.new("a") # => <a/>
irb(main):005:0> b = XML::Node.new("b") # => <b/>
irb(main):006:0> root # => <root/>
irb(main):007:0> root << a # => <a/>
irb(main):008:0> root
# everything
=> <root>
<a/>
</root>
irb(main):009:0> root << b # => <b/>
irb(main):010:0> root
=> <root>
<a/>
<b/>
</root>
irb(main):011:0> root = XML::Node.new("root") # => <root/>
irb(main):012:0> root # => <root/>
irb(main):013:0> root << a # => <a/>
irb(main):014:0> root
=> <root>
<a/>
<b/> # where did *this* come from?
</root>

(That's the existing definition of #<<, not your extension)

Exiting from the irb session results in a segmentation fault, and
running the same code outside of irb yields the same apparent results
(inclusion of 'b' where it shouldn't be) and ends in a bus
error. I have a hunch that the C extension isn't managing memory
properly, which is confirmed by one of the errors submitted on the
project page. Maybe this is just my setup (1.8.4 on OSX), but it
seems to me that the library has enough problems that it's not quite
ready for use.

matthew smillie.
 
Robert Klemme

2006/7/8 said:
hi,

While using #each to loop through the children of a Node, if I remove a
node the loop breaks on its own.

<root>
<a id="a"></a>
<b id="b"></b>
<c id="c"></c>
</root>

root.each { |node|
  if XML::Node === node
    node.content = "yep"
    node.remove! if node['id'] == "b"
  end
}

The result is

<root>
<a id="a">yep</a>
<c id="c"></c>
</root>

Would that be a bug? Or something that simply can't be avoided?

It's usually a problem to change a container while iterating through
it. This can generate all sorts of weird effects. It's generally
better to assume this is *not* possible unless explicitly stated
otherwise (e.g. most of Java's iterators implement remove(), which
safely removes an element while iterating).

In your case I'd either first remove the one you want to get rid of,
iterate using an index (if that's possible), or remember the objects to
remove in some kind of container and do the removal after the
iteration (probably the most efficient solution).
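
The last option (remember, then remove) could look roughly like this. FakeNode below is a hypothetical stand-in written for this sketch, not the ruby-libxml API; it only mimics the content= and remove! calls used in the original loop:

```ruby
# FakeNode is a hypothetical stand-in for an XML node (not ruby-libxml).
class FakeNode
  attr_reader :id
  attr_accessor :content

  def initialize(id, siblings)
    @id = id
    @content = ""
    @siblings = siblings # the child list shared with the parent
  end

  # Detach this node from the shared child list.
  def remove!
    @siblings.delete_if { |n| n.equal?(self) }
  end
end

children = []
%w[a b c].each { |id| children << FakeNode.new(id, children) }

# Pass 1: modify every node and only remember which ones to drop.
doomed = []
children.each do |node|
  node.content = "yep"
  doomed << node if node.id == "b"
end

# Pass 2: remove after the iteration has finished.
doomed.each { |node| node.remove! }

remaining = children.map { |n| [n.id, n.content] }
# remaining is [["a", "yep"], ["c", "yep"]]
```

Because nothing is removed until the first loop is done, every child is visited, including the one after the removed node.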

Kind regards

robert
 
Robert Klemme

PS: Here's another alternative that might work: use delete_if to
iterate and delete those elements you want to get rid of.

root.delete_if do |node|
  if XML::Node === node
    node.content = "yep"
    node['id'] == "b"
  else
    false
  end
end
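
Whether XML::Node actually provides delete_if is not confirmed here. For reference, this is how the same idea behaves with Ruby's built-in Array#delete_if, using plain hashes as hypothetical stand-ins for nodes: the block runs for every element, and only the elements for which it returns true are dropped.

```ruby
# Hashes stand in for XML nodes; this is Array#delete_if, not libxml.
nodes = [
  { id: "a", content: "" },
  { id: "b", content: "" },
  { id: "c", content: "" }
]

nodes.delete_if do |node|
  node[:content] = "yep" # applied to every element, including "c"
  node[:id] == "b"       # true means: delete this element
end

ids = nodes.map { |n| n[:id] }
# ids is ["a", "c"], and both remaining nodes have content "yep"
```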

Cheers

robert
 
transfire

Matthew said:
I'm not aware of (and couldn't find) any libhxml. If you meant
ruby-libxml (which seems likely given the problem), here's what I
figured out.

:) Yes, the libxml bindings are indeed what I was referring to (the h was a typo).

At first, I thought it could be a bug caused by modification to the
structure you're iterating over, similar to this:

root = ['a','b','c']
root.each { |node| root.delete(node) if node == "b" }

which will skip over 'c' due to the deletion.

But while I was trying to confirm this in libxml, I found behaviour
that makes me think there's some more fundamental bug. Redefining a
variable seemed to have some very odd effects, which I managed to
reduce to this case:
[snip]

=> <root>
<a/>
<b/> # where did *this* come from?
</root>

(That's the existing definition of #<<, not your extension)

Exiting from the irb session results in a segmentation fault, and
running the same code outside of irb yields the same apparent results
(inclusion of 'b' where it shouldn't be) and ends in a bus
error. I have a hunch that the C extension isn't managing memory
properly, which is confirmed by one of the errors submitted on the
project page. Maybe this is just my setup (1.8.4 on OSX), but it
seems to me that the library has enough problems that it's not quite
ready for use.

Thanks, Matthew. Very enlightening. I decided to write an XML wrapper
and create a common interface for both REXML and libxml. That way I
can use REXML for now and easily switch over when the libxml bindings
are fully operational.

T.
 
