Here's the XML validation tool the world is waiting for...

Ramon F Herrera · Nov 10, 2012

I have tried some automated file-validation tools, but concluded
that their approach was fundamentally flawed: you cannot possibly
generate a schema based on only one XML sample. Not unlike electoral
polls (or any kind of sampling), the more samples you have, the more
accurate the result. Yes, I know that at some point you reach
diminishing returns.

But I digress...

Are you folks aware of any such tool? One that takes a whole bunch (I
have an infinite number, and can make a widely varied set) of XML files
and creates their best-fit schema?

-Ramon

ps: There's a business opportunity...

Ramon F Herrera · Nov 10, 2012

Never mind, I found something that looks great: Liquid XML Studio.

Any competing options?

-Ramon

Alain Ketterlin · Nov 11, 2012

Ramon F Herrera said:
I have tried some automated file-validation tools, but concluded
that their approach was fundamentally flawed: you cannot possibly
generate a schema based on only one XML sample. Not unlike electoral
polls (or any kind of sampling), the more samples you have, the more
accurate the result. Yes, I know that at some point you reach
diminishing returns.

But I digress...

Are you folks aware of any such tool? One that takes a whole bunch (I
have an infinite number, and can make a widely varied set) of XML files
and creates their best-fit schema?

This is called grammatical inference (or grammar induction sometimes).
(Warning: understatement ahead) It is difficult in the general case. All
you can hope for is a "good enough" solution (it looks like you found one).

ps: There's a business opportunity...

There are plenty of scientific opportunities...

-- Alain.

Peter Flynn · Nov 13, 2012

I have tried some automated file-validation tools, but concluded
that their approach was fundamentally flawed: you cannot possibly
generate a schema based on only one XML sample.

No, you can generate a schema that describes that single document. This
has been known for a long time, at least since the days of OCLC's Fred
(using SGML DTDs). If the document is sufficiently representative of its
type, it is a good starting-point for manual refinement, and saves a lot
of time on those (rare) occasions when it is necessary.

Not unlike electoral polls (or any kind of sampling), the more
samples you have, the more accurate the result. Yes, I know that at
some point you reach diminishing returns.

For it to be useful, the samples must describe the same type of
document. Creating the union of TEI and DocBook is probably not useful.

Are you folks aware of any such tool? One that takes a whole bunch
(I have an infinite number, and can make a widely varied set) of XML
files and creates their best-fit schema?

If you can infer a sample fragmentary grammar and express it in a
generalised machine-readable syntax, then you can probably deduce the
union of multiple instances of other fragments of the same grammar,
provided they possess sufficient commonality.
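
As a rough illustration of taking the union of several instances, here
is a minimal Python sketch. It is not the method of any particular tool.
The function name infer_model and the sample documents are hypothetical.
It only records which child elements and attributes appear under each
element name, and marks one as "required" when every occurrence of the
parent in the samples carries it. Order, repetition counts, mixed
content, and namespaces are all ignored, so the result is a starting
point for manual refinement rather than a schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def infer_model(xml_samples):
    """Merge several XML documents into one loose content model.

    Returns {element: {"children": {name: "required" | "optional"},
                       "attributes": {name: "required" | "optional"}}}.
    This is an illustrative assumption, not a real schema language.
    """
    seen = defaultdict(int)  # how many times each element name occurs
    child_hits = defaultdict(lambda: defaultdict(int))
    attr_hits = defaultdict(lambda: defaultdict(int))

    for text in xml_samples:
        root = ET.fromstring(text)
        for elem in root.iter():
            seen[elem.tag] += 1
            # Count each distinct child name once per parent occurrence.
            for name in {child.tag for child in elem}:
                child_hits[elem.tag][name] += 1
            for name in elem.attrib:
                attr_hits[elem.tag][name] += 1

    def status(hits, total):
        # Present in every occurrence of the parent: treat as required.
        return "required" if hits == total else "optional"

    model = {}
    for tag, total in seen.items():
        model[tag] = {
            "children": {c: status(n, total) for c, n in child_hits[tag].items()},
            "attributes": {a: status(n, total) for a, n in attr_hits[tag].items()},
        }
    return model


# Hypothetical samples of the same document type.
samples = [
    '<book id="1"><title>A</title><author>X</author></book>',
    '<book id="2" lang="en"><title>B</title></book>',
]
model = infer_model(samples)
print(model["book"])
```

With more samples, a child or attribute seen everywhere so far can only
move from "required" to "optional", never the other way, which is one
way to see both the benefit of extra samples and the diminishing returns.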

ps: There's a business opportunity...

Limited, I would say, but certainly there.

///Peter
