xerces/SAX xml search

foolproofplan

I am currently writing something in C++ that finds the locations
(line/column) of certain elements and attributes within an XML file.
For this task, I am trying to create a SAX parser that searches the
XML file as it parses. I have set up a simple SAX parser and an empty
handler.

I need a pointer in the right direction on where to begin with making
the parser search. I am not sure of the best way to pass the parser
something like a string that describes exactly what I am looking for.

I am also unsure how to handle the search itself. Should I have two
parsers? (One that finds elements fitting the first element's
description, and a second that parses each such element to see whether
it contains all the criteria/hierarchy of what I am looking for.)

Thanks in advance,
- Marc
 
Joseph Kesselman

I need a point in the right direction on where to begin with making
the parse search.

If you're doing it this way, you need to implement a SAX handler that
keeps track of what it's seen and whether that matches steps along the
way to whatever you're searching for.
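
As a rough illustration of that kind of bookkeeping, here is a minimal sketch that uses only the standard library. It assumes the search is an absolute path of element names such as "/catalog/book/title", which is a far smaller language than XPath. `PathMatcher` and `Match` are hypothetical names, not part of Xerces-C++ or SAX. The idea is to forward your handler's start/end element callbacks to it, passing whatever line and column the parser's locator reports at that moment.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One hit: the position reported when a matching start tag was seen.
struct Match {
    unsigned long line;
    unsigned long column;
};

// Hypothetical helper: matches an absolute path of element names
// (e.g. "/catalog/book/title") against a stream of start/end events.
class PathMatcher {
public:
    explicit PathMatcher(const std::string& path) {
        std::istringstream in(path);
        std::string step;
        while (std::getline(in, step, '/')) {
            if (!step.empty()) target_.push_back(step);  // skip the leading ""
        }
    }

    // Call from the handler's start-element callback.
    void startElement(const std::string& name,
                      unsigned long line, unsigned long column) {
        open_.push_back(name);
        // A match means the stack of open elements equals the target path.
        if (open_ == target_) matches_.push_back({line, column});
    }

    // Call from the handler's end-element callback.
    void endElement() {
        assert(!open_.empty());
        open_.pop_back();
    }

    const std::vector<Match>& matches() const { return matches_; }

private:
    std::vector<std::string> target_;  // steps of the path being searched for
    std::vector<std::string> open_;    // names of currently open elements
    std::vector<Match> matches_;       // positions of all hits so far
};
```

The only state kept is the stack of open element names, so memory use is proportional to nesting depth rather than document size. Attribute tests, wildcards, or namespace handling would each need extra state on top of this.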

Might make more sense to just use an off-the-shelf XPath/XSLT/XQuery
implementation, or subset implementation. XPath is the basic search
language for XML; XSLT and XQuery basically add functions and report
generation capability to that. The fully general versions of these do
require loading the entire document into memory, but subsets exist that
can be processed on the fly.
 
foolproofplan

Might make more sense to just use an off-the-shelf XPath/XSLT/XQuery
implementation, or subset implementation. XPath is the basic search
language for XML; XSLT and XQuery basically add functions and report
generation capability to that.

I understand the basics of XPath and XSLT and read about XQuery but
still do not understand how this will help in terms of creating
something that can search for specific elements and attributes I
provide. (even if I converted what I was looking for to an XPath
expression)

Can you explain more about 'off-the-shelf XPath/XSLT/XQuery
implementations'?
 
Joe Kesselman

I am currently working on coding something in c++ which allows me to
find locations (line/column) of certain elements and attributes within
an xml file.

OK, looking at this another time... You're almost certainly looking at
building your own SAX-based search, since you said you want line/column
information and most of the other APIs don't deliver that. (SAX may not
either, but you can at least try the SAX Locator API.)

Of course if you take that approach, it's entirely up to you to code the
logic that turns your search (however you want to express it) into a
state machine that can be driven by SAX events, or that runs over
whatever data structure you build from the SAX events to record the
document structure plus locator information (an annotated DOM, perhaps,
that adds location information... or some custom data structure tuned
for your own application's needs). Simple searches may not need much
stored state information; really complex ones may require the whole
document tree be available.

You've given us no indication of what kinds of searches you want to
perform, so generalities are all I can give you. You may be talking
about anything from a trivial subset of XPath to full XPath to full
XQuery to something more complicated than that. Obviously, simpler is
easier to implement.

Personal reaction: Line/column is usually a Bad Thing to use in the XML
world, because documents with identical semantics may not have the same
detailed syntax, and indeed tools don't always have that information
available to them. Expressing a point in the document as a simple XPath
to that location is often a better alternative.
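
To make that concrete, here is a minimal sketch of deriving such a path from the same start/end element events, using only the standard library. `XPathTracker` is a hypothetical name, and the sketch assumes plain element names with no namespace handling. Each step carries a positional predicate counting same-named siblings, so the result looks like /catalog[1]/book[2]/title[1].

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: builds a positional XPath for the element
// currently being parsed, driven by start/end element events.
class XPathTracker {
public:
    // Start with one counter table for the children of the document root.
    XPathTracker() : childCounts_(1) {}

    // Call on each start tag; returns the path of the element just opened.
    std::string startElement(const std::string& name) {
        // Count this element among its same-named siblings (1-based).
        int index = ++childCounts_.back()[name];
        steps_.push_back(name + "[" + std::to_string(index) + "]");
        childCounts_.emplace_back();  // fresh counters for its own children
        return current();
    }

    // Call on each end tag.
    void endElement() {
        assert(!steps_.empty());
        steps_.pop_back();
        childCounts_.pop_back();
    }

    // Path of the innermost open element ("" outside the root element).
    std::string current() const {
        std::string path;
        for (const std::string& step : steps_) path += "/" + step;
        return path;
    }

private:
    std::vector<std::string> steps_;                       // one step per open element
    std::vector<std::map<std::string, int>> childCounts_;  // sibling counts per level
};
```

A path built this way depends on element order and count rather than on whitespace or line breaks, so it stays valid when the document is reformatted.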
 
Jürgen Kahrs

Joe said:
OK, looking at this another time... You're almost certainly looking at
building your own SAX-based search, since you said you want line/column
information and most of the other APIs don't deliver that. (SAX may not
either, but you can at least try the SAX Locator API.)

Why not just use Expat?

http://expat.sourceforge.net/

XML_GetCurrentLineNumber() and XML_GetCurrentColumnNumber() now return unsigned integers.