Multiple XML instance document distribution problem

  • Thread starter steve_marjoribanks

steve_marjoribanks

This isn't strictly an XML problem but I thought someone might be able
to help!

As part of my degree I am working on a new data format for use in the
geotechnical engineering domain.

The data structure we have come up with is based around a single
instance document for 'raw' data and then numerous other associated
instance documents containing 'interpreted' data (i.e. containing some
engineering interpretation of the raw data file). Because of the nature
of these documents, it is not really possible to combine them all into
one larger data structure.

However, it is also *vital* that whenever instances of the XML files
are distributed, they remain in one group (i.e. the raw data and any
number of interpreted data files). It should not be possible to
separate them, or if they are separated they *must* be able to be
reunited.

We have thought about using XLink/XPointer which is OK up to a point
but if the files are moved somehow into different directories relative
to each other it could cause problems.

Basically, what I'm asking is whether anyone is aware of any
'archiving' type of tool (similar to a zip file or something) which
means that the files are effectively distributed as one file. The only
catch is that the XML files need to be able to be compressed/extracted
by a Java application.

Many thanks!

Steve
 

Joseph Kesselman

Java has libraries that can create/explode zipfiles. That might be the
simplest solution.
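
A minimal sketch of that zip approach, using only the standard java.util.zip classes. The class name XmlBundle and the flat layout (entries named after the file names only) are assumptions made for illustration, not anything prescribed in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: bundles the raw and interpreted XML files into one archive.
final class XmlBundle {

    /** Writes the given files into one zip archive, using their file names as entry names. */
    static void pack(List<Path> xmlFiles, Path archive) throws IOException {
        try (ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(archive))) {
            for (Path file : xmlFiles) {
                zout.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zout);   // streams the file content into the current entry
                zout.closeEntry();
            }
        }
    }

    /** Extracts every entry of the archive into targetDir. */
    static void unpack(Path archive, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Reject entry names such as "../x.xml" that would land outside targetDir.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies up to the end of the current entry; does not close zin.
                    Files.copy(zin, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

Because the files travel as one archive, their relative locations are fixed at extraction time, which sidesteps the moved-directory concern raised about XLink/XPointer.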

But I'd suggest you think again about whether things which are parts of
a single data structure ("can't be separated") really belong in separate
files.

Alternatively, consider tagging them with the information (document name
or something of that sort) needed to confirm that they've been correctly
reassembled.
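
One way to sketch that tagging idea in Java: give the raw document an identifier and have each interpreted document repeat it, then compare the two on load. The attribute names datasetId and rawDatasetId, and the class BundleCheck, are invented for this example; the real format would define its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical check that an interpreted document belongs to a given raw document.
final class BundleCheck {

    /** Returns the named attribute of the root element, or "" if it is absent. */
    static String rootAttribute(String xml, String attribute) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute(attribute);
    }

    /** True when the interpreted document names the raw document's identifier. */
    static boolean belongsTo(String rawXml, String interpretedXml) throws Exception {
        // "datasetId" and "rawDatasetId" are assumed attribute names.
        String id = rootAttribute(rawXml, "datasetId");
        return !id.isEmpty() && id.equals(rootAttribute(interpretedXml, "rawDatasetId"));
    }
}
```

For example, a raw file starting with `<rawData datasetId="site-42">` would be accepted as the parent of `<slopeStability rawDatasetId="site-42">` and rejected for any other identifier.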
 

steve_marjoribanks

Ok thanks, I shall look into the Java libraries.

I understand what you're saying about whether they should actually be
in separate files or not and we have been discussing exactly how to
solve this problem for quite a while now! The problem is that if we
combine them into one file there will be a strong possibility of data
being repeated multiple times within one file which is never a good
idea. Also, the idea is to create Java applications which will be able
to parse these files and do various things with them, and extracting
the data from one large file makes it rather tricky (in our case).

I know it's not ideal, but we've been thinking about a more elegant way
to solve the problem for a while and haven't come up with anything
which is any better!

Steve
 

Joe Kesselman

steve_marjoribanks said:
The problem is that if we
combine them into one file there will be a strong possibility of data
being repeated multiple times within one file which is never a good
idea.

So instead you have data repeated multiple times across multiple files.
That's a better idea?

steve_marjoribanks said:
Also, the idea is to create Java applications which will be able
to parse these files and do various things with them, and extracting
the data from one large file makes it rather tricky (in our case).

That's a bit more reasonable. On the other hand, it shouldn't be much
more "tricky" than extracting from the individual files. It may be a bit
more computation, admittedly, to parse through a larger file.

Ship as one large file and use tools (stylesheets, perhaps) to extract
the separate files on the receiving end? That's no worse than the
zipfile approach, though you don't get compression for free.
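
A rough Java sketch of the extract-on-receipt idea. It uses DOM plus an identity Transformer rather than a stylesheet, and it assumes a made-up layout in which each child element of the combined root is one of the separate documents. The class name BundleSplitter is hypothetical, parts are keyed by element name (so two children with the same name would collide), and namespace handling is left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical splitter for a combined document shipped as one file.
final class BundleSplitter {

    /** Returns one serialized XML fragment per child element of the root, keyed by element name. */
    static Map<String, String> split(String combinedXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(combinedXml)));

        // A Transformer created without a stylesheet copies its input unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        Map<String, String> parts = new LinkedHashMap<>();
        for (Node child = doc.getDocumentElement().getFirstChild();
                child != null; child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;   // skip whitespace text and comments between the parts
            }
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(child), new StreamResult(out));
            parts.put(child.getNodeName(), out.toString());
        }
        return parts;
    }
}
```

Each returned string could then be written to its own file or handed directly to the application that handles that part.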
 

steve_marjoribanks

I understand that shipping them as individual files is far from ideal,
but equally having it all as one file without separating it at the
receiving end is fairly non-ideal as well. The problem is that the 'raw'
data file I mentioned will be able to contain information about any
geotechnical entity. The separate 'interpreted' data files will contain
the interpretations (either from an engineer or a computer) of the
'raw' data, but each file is specific to one individual geotechnical
entity. So for example, there would be one large raw data file and then
we might have a file for a slope stability application, one for a
retaining wall application, one for a foundation application, etc.
These application-specific data structures need to be kept separate
from each other (not necessarily in separate files though, as you say),
as the idea is that a software application will be able to load up one
of these files, process the data within it, and add interpretation data
to the files. It is very important to make sure that this 'interpreted'
data is easily recognisable and not confused with the raw data.

Anyhow, it's not a major part of my project, just something to think
about as an aside! :)

Joe Kesselman said:
Ship as one large file and use tools (stylesheets, perhaps) to extract
the separate files on the receiving end? That's no worse than the
zipfile approach, though you don't get compression for free.

That might be a possibility actually, thanks for the suggestion, I
shall think about it!

Thank you

Steve
 

Soren Kuula

Hi,

steve_marjoribanks said:
Basically, what I'm asking is whether anyone is aware of any
'archiving' type of tool (similar to a zip file or something) which
means that the files are effectively distributed as one file. The only
catch is that the XML files need to be able to be compressed/extracted
by a Java application.

Java can compress and uncompress zip files with ZipInputStream and
ZipOutputStream (both in java.util.zip).
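
As a small illustration, a receiving application does not have to extract to disk at all: a sketch like the one below parses each XML entry straight out of a ZipInputStream. The class name ZipXmlReader and the ".xml" suffix filter are assumptions for the example, and readAllBytes needs Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical reader: parses every .xml entry of a zip stream into a DOM document.
final class ZipXmlReader {

    /** Returns the parsed documents keyed by entry name, in archive order. */
    static Map<String, Document> readAll(InputStream zipData) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Map<String, Document> docs = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().endsWith(".xml")) {
                    continue;   // ignore anything that is not an XML file
                }
                // Buffer the entry first, so the parser never holds (and cannot
                // close) the underlying zip stream.
                byte[] bytes = zin.readAllBytes();
                docs.put(entry.getName(), builder.parse(new ByteArrayInputStream(bytes)));
            }
        }
        return docs;
    }
}
```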

Soren.
 
