xquery help


Jeff Kish

Greetings.

I can't invest a large amount of time into this, but it would be very helpful
if I could do this.

I have a directory full of xml files I'd like to be able to query to find out
things like:

find all elements of type "table" that contain as subelements (elements of
type "row" that have attributes of type "lookup" which have value
"timelookup")

return all the elements of type "row" with their attribute "size" (even if the
size attribute is not specified I'd like to know that) that are in elements
of type "table" where the table has attribute "virtual" not null (or perhaps
null, i.e. not set or specified)



I tried downloading saxon and galax.
I could not get saxon to run (yes, probably my own inability/impatience).
I got galax to run, but can't figure out how to get some simple queries to
work.


I'm still hammering away on the documentation, but this is a "do it in my
spare time" project, and I'd really appreciate any recommendations or
pointers.

Am I using the wrong tool(s)?


Finally, I'd like to be able to format the output to make it more useful,
possibly even as input to another query.

Thanks again, in advance if that is not too presumptuous

Jeff Kish
 

William Park

Jeff Kish said:
Greetings.

I can't invest a large amount of time into this, but it would be very helpful
if I could do this.

I have a directory full of xml files I'd like to be able to query to find out
things like:

find all elements of type "table" that contain as subelements (elements of
type "row" that have attributes of type "lookup" which have value
"timelookup")

return all the elements of type "row" with their attribute "size" (even if the
size attribute is not specified I'd like to know that) that are in elements
of type "table" where the table has attribute "virtual" not null (or perhaps
null, i.e. not set or specified)

Hmm, you can determine whether such a condition exists in an XML file. But
what do you want to do once you find out? If you want a printout of the
entire node, then a SAX-type parser (e.g. Expat) may not be the best tool.

Or, you have to parse the files in two passes: once to slice out every
node, and again to test whether the condition is true.
 

Jürgen Kahrs

Jeff said:
find all elements of type "table" that contain as subelements (elements of
type "row" that have attributes of type "lookup" which have value
"timelookup")

return all the elements of type "row" with their attribute "size" (even if the
size attribute is not specified I'd like to know that) that are in elements
of type "table" where the table has attribute "virtual" not null (or perhaps
null, i.e. not set or specified)

Each of these tasks should take an experienced
developer less than one hour.

Jeff said:
I tried downloading saxon and galax.

Why did you do this ?
Did you expect these tools to be easy to learn ?

Jeff said:
Am I using the wrong tool(s)?

Use a language that you already know.
Even bash has an XML extension today.

Jeff said:
Finally, I'd like to be able to format the output to make it possibly more
useful, even as input to another query??..??

This is no problem .. _if_ you know your tool.
 

Jeff Kish

Jürgen Kahrs said:
Each of these tasks should take an experienced
developer less than one hour.

First of all, thank you for your kind reply (no sarcasm intended). I realize
that time is precious, and I appreciate yours and the other posters'.


I have some experience.. but not in using XML libraries/objects.. That's why
I'm asking for a bit of advice since I can't pour lots of time into this task
(which fortunately does not sound too complicated). It would take me a half
day or day (which translates into a lot of spare time-days) to figure out how
to load/write a program that just accesses, say, JDOM.

Jürgen Kahrs said:
Why did you do this ?
Did you expect these tools to be easy to learn ?

I need something that can query XML in a similar fashion to SQL, and I had
heard XQuery was a good target, so I googled around on XQUERY and commandline.

I was hoping for a gentle learning curve. Since the things I want to do are
not too difficult, I figured on an hour or two to get a tool installed and
running, and an hour or two to figure out how to do these simple queries.

I ended up not being able to get the saxon tool to run (I really don't know
java very well.. it could not find some classes etc), so I left off after
about an hour or so and went to galax.

After about an hour I had the galax tool up and available, so I started
through some of the tutorials. It took another three hours to find some
beginner/clear tutorials, so I'm almost at the point where I may be able to
submit some queries.

I figured at the time I posted, that someone might offer some advice for a
tool that would be better suited to my limits of time and ability.

Jürgen Kahrs said:
Use a language that you already know.
Even bash has an XML extension today.

I would like to do this from the command line. Though I know a little java and
even more c++, it has been a while since I did anything with XML in them.

Jürgen Kahrs said:
This is no problem .. _if_ you know your tool.

Jeff Kish
 

Jeff Kish

William Park said:
Hmm, you can determine whether such a condition exists in an XML file. But
what do you want to do once you find out? If you want a printout of the
entire node, then a SAX-type parser (e.g. Expat) may not be the best tool.

Well, I figure if I can get at least the elements and their id's, I could get
some use out of that. The worst case scenario is I could load up the files of
interest in an editor and search for those id's and then examine the XML.

William Park said:
Or, you have to parse the files in two passes: once to slice out every
node, and again to test whether the condition is true.

Not sure.. but I finally have galax 3.5 up and running, and it appears to be
promising. Eventually I'd like to be able to tie java code, database contents
and xml files in some sort of process where I can find situations where things
are used different ways.. for example

find out what files use a certain attribute in a certain element tree and find
out where in the database it exists as data, or where in java code it is
referenced in a given function call.

Jeff Kish
 

Jürgen Kahrs

Jeff said:
Well, I figure if I can get at least the elements and their id's, I could get
some use out of that. The worst case scenario is I could load up the files of
interest in an editor and search for those id's and then examine the XML.

Before you do that, try using any scripting language.
I don't like Perl, but I think for your problem Perl
with a suitable XML module might be the easiest solution.

Jeff said:
Not sure.. but I finally have galax 3.5 up and running, and it appears to be
promising. Eventually I'd like to be able to tie java code, database contents
and xml files in some sort of process where I can find situations where things
are used different ways.. for example

I am surprised how many tools you are willing to employ
in this rather simple task. Again: a scripting language
might do it much easier.

I am currently working on such an extension for the
GNU Awk scripting language and I bet each of your
problems can be solved with 20 lines of code. But I
cannot offer you a proper binary distribution or
documentation yet. So, at the moment Perl, Python
or bash might be best.
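
A minimal sketch of the scripting-language route, here in Python with the standard-library `xml.etree.ElementTree` module. It covers the two queries from the original post under some assumptions that are not confirmed anywhere in the thread: the files use no XML namespaces, `table` elements are not nested inside one another, and a "null" attribute means the attribute is simply absent. The function names and the `*.xml` file pattern are illustrative only.

```python
import glob
import os
import xml.etree.ElementTree as ET


def tables_with_timelookup(root):
    """Return every <table> that has a descendant <row lookup="timelookup">."""
    return [table for table in root.iter("table")
            if table.find(".//row[@lookup='timelookup']") is not None]


def row_sizes(root, virtual_set=True):
    """Return (row, size) pairs for rows inside tables that carry a 'virtual'
    attribute (or lack it, when virtual_set is False).
    size is None when the row has no 'size' attribute."""
    pairs = []
    for table in root.iter("table"):
        if ("virtual" in table.attrib) == virtual_set:
            for row in table.iter("row"):
                pairs.append((row, row.get("size")))
    return pairs


def scan_directory(directory):
    """Run both queries over every *.xml file in a directory and print matches."""
    for path in sorted(glob.glob(os.path.join(directory, "*.xml"))):
        root = ET.parse(path).getroot()
        for table in tables_with_timelookup(root):
            print(path, "table", table.attrib)
        for row, size in row_sizes(root):
            print(path, "row", row.attrib, "size =", size)
```

Calling `scan_directory(".")` would print one line per match for the files in the current directory. A missing `size` shows up as `None`, which addresses the "even if the size attribute is not specified" requirement.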
 

William Park

Jeff Kish said:
Well, I figure if I can get at least the elements and their id's, I
could get some use out of that. The worst case scenario is I could
load up the files of interest in an editor and search for those id's
and then examine the XML.
Not sure.. but I finally have galax 3.5 up and running, and it appears
to be promising. Eventually I'd like to be able to tie java code,
database contents and xml files in some sort of process where I can
find situations where things are used different ways.. for example

find out what files use a certain attribute in a certain element tree
and find out where in the database it exists as data, or where in java
code it is referenced in a given function call.

Do it manually using an editor. It's the least time-consuming approach.

However, for a scripting approach...

I'm familiar with the Expat (SAX-style) parser. Determining whether a certain
condition is true is the easy part. If I understand you right, you then
want to print out the entire scope of the element, i.e.
<table>
... <row lookup="timelookup">...</row> ...
</table>

And I'm saying that it's difficult to do this with a SAX-style parser,
because you have to go back up to <table>, then jump forward to </table>.
The parser does not jump around; it only goes forward, calling callback
functions as it encounters different XML structures. This means that
you know you have reached the end of a 'table' element only when you
encounter the </table> tag. To print out all content since the last <table>
tag, you have to store the content manually. Pain.
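
To illustrate the manual buffering described above, here is a rough sketch using Python's standard `xml.parsers.expat` binding to Expat. The function name `matching_tables` and the `state` dictionary are illustrative. The sketch assumes the condition of interest is a `row` with `lookup="timelookup"`, and it re-serializes elements rather than copying the source bytes, so an empty tag such as `<row/>` comes back as `<row></row>`.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape, quoteattr


def matching_tables(xml_text):
    """Return the re-serialized text of each outermost <table> element that
    contains a <row lookup="timelookup"> somewhere inside it."""
    results = []
    # depth counts open <table> elements; buffer holds everything seen
    # since the outermost <table> was opened.
    state = {"depth": 0, "buffer": [], "matched": False}

    def start(name, attrs):
        if name == "table":
            if state["depth"] == 0:
                state["buffer"], state["matched"] = [], False
            state["depth"] += 1
        if state["depth"] > 0:
            attr_text = "".join(" %s=%s" % (key, quoteattr(value))
                                for key, value in attrs.items())
            state["buffer"].append("<%s%s>" % (name, attr_text))
            if name == "row" and attrs.get("lookup") == "timelookup":
                state["matched"] = True

    def end(name):
        if state["depth"] > 0:
            state["buffer"].append("</%s>" % name)
        if name == "table":
            state["depth"] -= 1
            # Only at </table> do we know the element is complete.
            if state["depth"] == 0 and state["matched"]:
                results.append("".join(state["buffer"]))

    def chars(data):
        if state["depth"] > 0:
            state["buffer"].append(escape(data))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return results
```

The bookkeeping in `state` is exactly the "store the content manually" cost: the callbacks only ever move forward, so the whole table has to be held until its closing tag arrives.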

A two-pass solution is easier. First, cut/slice your file based on
<table>...</table>
assuming it's not nested. Then, test each 'table' section to see whether your
condition exists. If so, print the whole section.
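
The two passes can be sketched in a few lines of Python. This is only an illustration under the same assumption stated above (tables are not nested), plus a few more that the thread does not confirm: no self-closing `<table/>` tags, no other element whose name begins with `table-`, and no namespace prefixes, since each slice is parsed on its own. The names `slice_tables`, `has_timelookup_row`, and `select_tables` are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Pass 1 pattern: a non-greedy match from each <table ...> to the next </table>.
TABLE_RE = re.compile(r"<table\b[^>]*>.*?</table>", re.DOTALL)


def slice_tables(text):
    """Pass 1: cut the document into its <table>...</table> sections."""
    return TABLE_RE.findall(text)


def has_timelookup_row(table_text):
    """Pass 2: parse one section and test the condition on it."""
    table = ET.fromstring(table_text)
    return table.find(".//row[@lookup='timelookup']") is not None


def select_tables(text):
    """Return the original text of every section that meets the condition."""
    return [section for section in slice_tables(text)
            if has_timelookup_row(section)]
```

Because the first pass keeps the original text of each section, printing a match gives back the XML exactly as it appears in the file.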

Use any scripting language. I recommend
- Awk with Expat parser,
- Bash with Expat parser (http://freshmeat.net/projects/bashdiff/)
 

Jeff Kish

Hi.
I've made progress, in case anyone else is trying this as well.
I downloaded galax 3.5 (4.0 does not have Windows binaries yet)
and spent a few hours going through some useful XQuery tutorials I found
elsewhere.

I may look into Qexo to compile the XQueries I use over and over again
into Java "binaries".

Eventually I'll investigate pushing the xml into a database so I can cross
query xml/database/code, or I'll figure out how to use xquery against a
database also.

Jeff

Jeff Kish
 
