New file format design

mathieu

Hello there,

I am looking for suggestions for designing a simple file format
based on XML. It will only contain text information (no binary data).
1. If I have a choice: Element or Attribute ?
2. Do I need to define my own file version (maybe as the first XML
element) ?
3. Do I need to provide a DTD or XML schema ?

Thanks for any input,
Mathieu
 
Joe Kesselman

mathieu said:
1. If I have a choice: Element or Attribute ?

This is a FAQ. What's the intent of the datum (modifier or content), and
will it ever in the future want to be structured (in which case it has
to be an element).
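To make the modifier-versus-content distinction concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The element and attribute names (measurement, unit, value, author) are hypothetical, chosen only for illustration and not taken from the format under discussion.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: "unit" modifies the value, so it is an attribute;
# "author" is content that may later need structure, so it is an element.
doc = ET.fromstring(
    '<measurement unit="mm">'
    '<value>12.5</value>'
    '<author><given>Ada</given><family>Lovelace</family></author>'
    '</measurement>'
)

unit = doc.get("unit")                   # attribute: always a flat string
value = float(doc.findtext("value"))     # element holding simple content
family = doc.findtext("author/family")   # element that has grown children
```

An attribute can only ever hold a flat string, so a datum like author that might later split into parts is safer as an element from the start.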
2. Do I need to define my own file version (maybe as the first XML
element) ?

Up to you. Will you ever need to distinguish versions?

3. Do I need to provide a DTD or XML schema ?

Up to you. Do you want the parser to help confirm the data is reasonably
structured and contains plausible values? Do you need to mark some data
as having particular kinds of meanings (ID is the obvious one that has
to be defined at this level)? Do you want to define named entities
(supported only in DTDs, and *probably* best avoided these days although
folks still debate that)?
 
mathieu

Joe said:
This is a FAQ. What's the intent of the datum (modifier or content), and
will it ever in the future want to be structured (in which case it has
to be an element).

Thanks for the reference; I am sorry I did not search for it myself
first.
http://xml.silmaril.ie/developers/attributes/

Up to you. Will you ever need to distinguish versions?

Well, I disagree, simply because I don't know. I was under the impression
that XML was designed exactly for this 'I don't know': adding
attributes or elements later still leaves the document (by design)
syntactically correct. What I am unsure about is whether this mechanism
is enough.

Up to you. Do you want the parser to help confirm the data is reasonably
structured and contains plausible values? Do you need to mark some data
as having particular kinds of meanings (ID is the obvious one that has
to be defined at this level)? Do you want to define named entities
(supported only in DTDs, and *probably* best avoided these days although
folks still debate that)?

Not really, I know what I am reading. My understanding was that DTD or
XML schema was much more explicit for a third party than if I were to
write down the file specification.

Thanks !
M
 
Joe Kesselman

mathieu said:
Well I disagree simply because I don't know.

If you don't know, you can either treat the absence of the version mark
as indicating version 0.0, or you can go ahead and design it in now.
Either solution is defensible.

In general: If in doubt, it's wise to design for a version mark, even if
you make it optional.
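
As a rough illustration of an optional version mark, the sketch below reads a version attribute from the root element and falls back to 0.0 when it is absent. The attribute name version, the root element myformat, and the major.minor layout are assumptions made for this example, not part of any existing format.

```python
import xml.etree.ElementTree as ET

DEFAULT_VERSION = "0.0"  # assumed meaning of a missing version mark

def read_version(xml_text):
    """Return (major, minor) from the root's hypothetical 'version' attribute."""
    root = ET.fromstring(xml_text)
    raw = root.get("version", DEFAULT_VERSION)  # absent attribute -> "0.0"
    major, minor = raw.split(".")
    return int(major), int(minor)

old_file = read_version("<myformat/>")
new_file = read_version('<myformat version="1.2"/>')
```

With this approach, files written before the version mark existed keep loading, and a reader can branch on the major number once an incompatible change is made.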
My understanding was that DTD or
XML schema was much more explicit for a third party than if I were to
write down the file specification.

Not entirely. The DTD/Schema may be useful for driving some tools. It
may provide some specific kinds of information that aren't expressed
directly in the instance document -- if your parser doesn't support
xml:id, and you don't have a DTD or schema, tools may not be able to
take advantage of some optimization potential. In fact, IBM has
demonstrated that a schema-aware parser can actually be made faster than
a non-validating parser, if you know which schema to expect and you do
some compilation ahead of time. (I think a paper on that topic appears
in the current issue of the IBM Systems Journal; I know the authors have
presented papers on this at conferences.)

If those issues don't concern you, you don't have to create a DTD or
schema immediately -- but the longer you wait, the more likely folks
will do things in their instance documents that you didn't expect. And
formalizing your document design is a good exercise even if you don't
enforce it.
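
Even without a DTD or schema, the content model can be written down and checked informally. The following is only a sketch of that idea in plain Python, not a substitute for real validation; the table ALLOWED_CHILDREN and the element names in it are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: element name -> allowed child element names.
ALLOWED_CHILDREN = {
    "myformat": {"title", "entry"},
    "title": set(),
    "entry": set(),
}

def unexpected_elements(xml_text):
    """Return paths of elements that the content model does not allow."""
    problems = []

    def walk(elem, path):
        allowed = ALLOWED_CHILDREN.get(elem.tag, set())
        for child in elem:
            child_path = f"{path}/{child.tag}"
            if child.tag not in allowed:
                problems.append(child_path)
            walk(child, child_path)

    root = ET.fromstring(xml_text)
    if root.tag not in ALLOWED_CHILDREN:
        problems.append(f"/{root.tag}")
    walk(root, f"/{root.tag}")
    return problems
```

Running this over incoming files reports the kind of unplanned structure mentioned above, and the table itself doubles as a first draft of a later DTD or schema.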
 
Andy Dingley

If you don't know, you can either treat the absence of the version mark
as indicating version 0.0, or you can go ahead and design it in now.


King numbering.
(Coinage is labelled 'George II' and 'George IV', but simply 'George'
for the first one)
 
Joe Kesselman

Andy Dingley said:
King numbering.
(Coinage is labelled 'George II' and 'George IV', but simply 'George'
for the first one)

I like the term; thanks!
 
