Searching random XML documents

sal achhala

I'm working with Java and XML documents in order to search for keywords in a
given element, e.g. the 'author' element containing "jo blogs".

The problem is that the XML documents are downloaded (this process is automated)
from different websites, so the element name used for the author may differ!

Is there a way of dealing with this, such as a standard adopted by, say,
educational websites that agrees on element names?

Thanks very much

P.S. I'm also looking for a good, simple search method, both by element name
and by searching an XML document as a regular text document.
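Both kinds of search can be sketched with the XML parser built into the JDK. This is a minimal sketch, not a complete solution: the class and method names are hypothetical, matching is a case-insensitive substring test, and the plain-text version also matches tag names and attribute values, not just element content.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlKeywordSearch {

    // Parse an XML string into a DOM tree with the JDK's built-in parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Search by element name: true if any element with this tag name has
    // text content containing the keyword (case-insensitive substring match).
    public static boolean searchByElement(String xml, String elementName, String keyword)
            throws Exception {
        NodeList nodes = parse(xml).getElementsByTagName(elementName);
        String needle = keyword.toLowerCase();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getTextContent().toLowerCase().contains(needle)) {
                return true;
            }
        }
        return false;
    }

    // Plain-text search: treat the whole document as ordinary text, so tag
    // names and attribute values are searched as well as element content.
    public static boolean searchAsText(String xml, String keyword) {
        return xml.toLowerCase().contains(keyword.toLowerCase());
    }

    public static void main(String[] args) throws Exception {
        String xml = "<article><author>Jo Blogs</author><title>XML search</title></article>";
        System.out.println(searchByElement(xml, "author", "jo blogs")); // true
        System.out.println(searchByElement(xml, "title", "jo blogs"));  // false
        System.out.println(searchAsText(xml, "xml search"));            // true
    }
}
```

The element-name search only works if you already know what the element is called on each site, which is exactly the problem described above.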
 
Martin SChukrazy

Options:
1) If you have a limited number of schemas (for the differing XML documents),
you could transform these documents into your own common format and then
write an XQuery/XPath expression to search for keywords in a given element.
2) Store all the keywords that you encounter in a master file and then launch
a process that does the search (multi-threaded for efficiency).
3) Use a common standard directly (that would mean all the websites generate
the information in a common format).
I can't be more specific without more information.
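Option 2 is described only loosely. As a rough sketch of the "multi-threaded for efficiency" part, assuming the downloaded documents are already held in memory as strings keyed by file name and the master keyword list is a plain Java list (both hypothetical), the search could be spread over a fixed thread pool, one task per document:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelKeywordSearch {

    // For each document, submit one task that reports which keywords from the
    // (hypothetical) master keyword list appear anywhere in its text.
    public static Map<String, List<String>> search(Map<String, String> documents,
                                                   List<String> keywords,
                                                   int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<List<String>>> pending = new TreeMap<>();
            for (Map.Entry<String, String> doc : documents.entrySet()) {
                String text = doc.getValue().toLowerCase();
                pending.put(doc.getKey(), pool.submit(() -> {
                    List<String> hits = new ArrayList<>();
                    for (String kw : keywords) {
                        if (text.contains(kw.toLowerCase())) {
                            hits.add(kw);
                        }
                    }
                    return hits;
                }));
            }
            // Collect the results; Future.get() blocks until each task finishes.
            Map<String, List<String>> results = new TreeMap<>();
            for (Map.Entry<String, Future<List<String>>> e : pending.entrySet()) {
                results.put(e.getKey(), e.getValue().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> docs = new TreeMap<>();
        docs.put("siteA.xml", "<article><author>Jo Blogs</author></article>");
        docs.put("siteB.xml", "<item><name>Ann Other</name></item>");
        System.out.println(search(docs, List.of("jo blogs", "ann other"), 2));
        // {siteA.xml=[jo blogs], siteB.xml=[ann other]}
    }
}
```

A real version would probably read each file from disk inside its task; for a handful of small documents a single thread is likely fast enough.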
 
sal achhala

> 1) If you have a limited number of schemas (for the differing XML
> documents), you could transform these documents into your own common
> format and then write an XQuery/XPath expression to search for keywords
> in a given element.

Thanks Martin, the option above makes sense to me (I'm new to Java/XML); I
could transform the different formats into a common one. How easy would that
be?

The common format of my XML documents would have the elements Date, Title,
Author and articleBody.
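Once every document is in that common format, the XPath search from option 1 could look roughly like this minimal sketch using the JDK's javax.xml.xpath API. The <articles>/<article> wrapper elements, the method name and the sample data are assumptions; note that the XPath 1.0 contains() function is case-sensitive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CommonFormatSearch {

    // Return the Title of every <article> whose Author contains the keyword.
    // Element names follow the common format (Date, Title, Author,
    // articleBody); the <articles>/<article> wrappers are assumptions.
    public static List<String> titlesByAuthor(String xml, String keyword) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Supply the keyword as the XPath variable $kw (the only variable used)
        // rather than pasting it into the expression, so quotes in the keyword
        // cannot break the query.
        xpath.setXPathVariableResolver(name -> keyword);
        NodeList titles = (NodeList) xpath.evaluate(
                "//article[contains(Author, $kw)]/Title",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < titles.getLength(); i++) {
            result.add(titles.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<articles>"
                + "<article><Date>2006-01-10</Date><Title>XML basics</Title>"
                + "<Author>jo blogs</Author><articleBody>...</articleBody></article>"
                + "<article><Date>2006-02-20</Date><Title>Java tips</Title>"
                + "<Author>ann other</Author><articleBody>...</articleBody></article>"
                + "</articles>";
        System.out.println(titlesByAuthor(xml, "jo blogs")); // [XML basics]
    }
}
```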

How would one go about transforming the documents?

Considering that element names differ from site to site, how would an
automated process recognise, for instance, that 'name' is the same as
'author'?
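An automated process cannot really infer on its own that 'name' means 'author'; a common workaround is a synonym table maintained by hand, ideally one per site, since 'name' could mean something else on another site. A minimal sketch with the JDK's DOM API, where the class name and every table entry are purely hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ElementNameMapper {

    // Hypothetical hand-maintained table: site-specific element name ->
    // element name in the common format. Entries are illustrative only.
    static final Map<String, String> SYNONYMS = Map.of(
            "name", "Author",
            "author", "Author",
            "creator", "Author",
            "headline", "Title",
            "title", "Title",
            "pubDate", "Date",
            "date", "Date",
            "body", "articleBody",
            "content", "articleBody");

    // Parse the document and rename every element listed in SYNONYMS.
    public static Document normalize(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // getElementsByTagName returns a live list; copy it before renaming.
        NodeList live = doc.getElementsByTagName("*");
        List<Element> elements = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            elements.add((Element) live.item(i));
        }
        for (Element e : elements) {
            String target = SYNONYMS.get(e.getTagName());
            if (target != null && !target.equals(e.getTagName())) {
                // DOM Level 3 rename; the element's children and text are kept.
                doc.renameNode(e, null, target);
            }
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = normalize("<item><headline>XML search</headline><name>Jo Blogs</name></item>");
        NodeList authors = doc.getElementsByTagName("Author");
        System.out.println(authors.getLength());              // 1
        System.out.println(authors.item(0).getTextContent()); // Jo Blogs
    }
}
```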

Thanks very much

sal
 
Martin SChukrazy

There are several ways to go about this:
1) Use standard data transformation toolkits, which transform from text/XML
to a given XML format. Visual GUI toolkits usually make the job easier.
2) Use XSLT to transform from one XML format to a standard XML format.

Again, you can usually try GUI tools such as Stylus Studio to write the XSLT
transform and then verify the results.
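The XSLT route can also be tried without a GUI tool, since the JDK ships an XSLT 1.0 processor behind javax.xml.transform. The sketch below assumes a hypothetical site whose feed uses <item>, <pubDate>, <headline>, <name> and <text>, and maps it to the Date/Title/Author/articleBody format mentioned earlier in the thread; each site would need its own stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltNormalizer {

    // Hypothetical stylesheet for one site whose feed uses <item>, <pubDate>,
    // <headline>, <name> and <text>; other sites would get their own.
    public static final String SITE_A_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<articles><xsl:apply-templates select='//item'/></articles>"
      + "</xsl:template>"
      + "<xsl:template match='item'>"
      + "<article>"
      + "<Date><xsl:value-of select='pubDate'/></Date>"
      + "<Title><xsl:value-of select='headline'/></Title>"
      + "<Author><xsl:value-of select='name'/></Author>"
      + "<articleBody><xsl:value-of select='text'/></articleBody>"
      + "</article>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Apply an XSLT 1.0 stylesheet (JDK built-in processor) to an XML string.
    public static String transform(String xml, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String siteA = "<feed><item><pubDate>2006-03-01</pubDate><headline>XML search</headline>"
                + "<name>Jo Blogs</name><text>...</text></item></feed>";
        System.out.println(transform(siteA, SITE_A_XSL));
    }
}
```

The output is in the common format, so the same XPath search can then be run over documents from every site.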
 
