--------------070706010408000105090809
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sounds like it is time for FasterXML.
One pointer: REXML comes with a fairly fast pull parser, and it should be
possible to base a lightweight XML document library on it. (The
documentation says that the API should not be considered stable, but I'm
sure that could be resolved with the REXML author.)
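For anyone who has not used it, here is a minimal sketch of the pull
parser's event loop. `pull_events` is a hypothetical helper name, not part
of REXML, and the sketch only looks at start tags, end tags and
non-blank text; every other event type is skipped.

```ruby
require 'rexml/parsers/pullparser'

# Hypothetical helper: flatten a document into a list of simple events.
def pull_events(xml)
  parser = REXML::Parsers::PullParser.new(xml)
  events = []
  while parser.has_next?
    event = parser.pull
    case event.event_type
    when :start_element
      # event[0] is the tag name, event[1] the attribute hash
      events << [:start, event[0], event[1]]
    when :end_element
      events << [:end, event[0]]
    when :text
      # skip whitespace-only text between tags
      events << [:text, event[0]] unless event[0].strip.empty?
    end
  end
  events
end

pull_events('<a x="1"><b>hi</b></a>').each { |e| p e }
```

The parser hands back one event per call to `pull`, so nothing forces a
full document tree to be built unless the caller wants one.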
As a proof of concept, see the attached code. We use it at our company
to load and process XML files generated by our tools and OpenOffice Calc.
I just tested it on a 1 MB XML file from an .ods file, which it loaded
successfully in under 2 seconds.
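To get a rough timing on another machine, something like the sketch below
may help. It assumes a synthetic document rather than the .ods content
mentioned above, and `count_start_elements` is a made-up helper, so the
result is not comparable to the figure quoted here. It only measures how
long the pull parser takes to walk the input.

```ruby
require 'benchmark'
require 'rexml/parsers/pullparser'

# Hypothetical helper: drain the parser and count opening tags.
def count_start_elements(xml)
  parser = REXML::Parsers::PullParser.new(xml)
  count = 0
  while parser.has_next?
    count += 1 if parser.pull.start_element?
  end
  count
end

# Synthetic input (an assumption): one table with 2000 two-cell rows.
rows = 2_000
xml = '<table>' + ('<row><cell>1</cell><cell>2</cell></row>' * rows) + '</table>'

count = nil
seconds = Benchmark.realtime { count = count_start_elements(xml) }
puts format('%d start tags in %.3f s', count, seconds)
```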
Writing a fast XPath implementation to match this might be quite a
challenge, though.
Dennis
--------------070706010408000105090809
Content-Type: text/plain;
name="xmlsimple2.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="xmlsimple2.rb"
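require 'rexml/parsers/pullparser'

module XmlSimple
  # Read an XML file and return its root Node.
  def self.load(filename)
    parse(File.read(filename))
  end

  # Parse an XML string. Top-level elements hang off a synthetic 'root' node.
  def self.parse(string)
    parser = REXML::Parsers::PullParser.new(string)
    return Node.new(['root', {}], parser)
  end

  class Node
    include Enumerable
    # Undefine the inherited public methods (Enumerable's included) so that
    # calls like node.foo reach method_missing. object_id is kept because
    # undefining it triggers a warning on current Ruby versions.
    instance_methods(true).each {|m| undef_method(m) unless m =~ /__.*__|\Aobject_id\z/}
    attr_reader :name, :attr, :text, :children

    # token is [name, attributes] or a start_element PullEvent. The
    # constructor consumes events until the matching end tag is seen.
    def initialize(token, parser)
      @name = token[0]
      @text = String.new
      @siblings = [self]
      @attr = token[1]
      @nodes = {}
      @children = []
      loop do
        if parser.has_next?
          tok = parser.pull
        else
          # Input exhausted: synthesize the end tag of the 'root' node.
          tok = REXML::Parsers::PullEvent.new([:end_element, 'root'])
        end
        case tok.event_type
        when :start_element
          node = Node.new(tok, parser)
          @children << node
          if @nodes[tok[0]]
            # Later elements with the same name are stored on the first one.
            @nodes[tok[0]].push_sibling(node)
          else
            @nodes[tok[0]] = node
          end
        when :end_element
          unless tok[0] == @name
            raise "unexpected </#{tok[0]}> while inside <#{@name}>"
          end
          return
        when :text
          @text << tok[0]
          @children << tok[0]
        end
      end
    end

    def push_sibling(node)
      @siblings << node
    end

    def to_a
      @siblings
    end

    # Iterate over this node and all same-named siblings.
    def each(&block)
      @siblings.each(&block)
    end

    # node.foo returns the first child element named 'foo', or nil.
    def method_missing(m)
      return @nodes[m.to_s]
    end

    def [](m)
      return @nodes[m]
    end

    def inspect(indent = '')
      r = indent + @name + ":\n"
      indent += ' '
      r << indent + 'attr: ' + attr.inspect + "\n" unless attr.empty?
      r << indent + 'text: ' + text.inspect + "\n" unless text.empty?
      @nodes.each do |k, v|
        v.each {|n| r << n.inspect(indent)}
      end
      return r
    end
  end
end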
--------------070706010408000105090809--