XML or home-grown format?


JS Bangs

In the continuing development of Lingua::Phonology, I'm starting to
weigh the benefits of moving my file formats from the current custom
format to XML.

Currently, two of the sub-modules do some form of file-parsing, and the
formats they use are described at:

http://search.cpan.org/author/JASPAX/Lingua-Phonology-0.25/Phonology/Features.pm#loadfile
http://search.cpan.org/author/JASPAX/Lingua-Phonology-0.25/Phonology/Symbols.pm#loadfile

The existing formats are concise and human-readable, but completely
custom. As I'm thinking of adding file-parsing to Lingua::Phonology::Rules
(and perhaps other modules), I was looking for something more reusable,
general, and powerful (especially since the Rules submodule will require
some fairly complex parsing rules). If I use XML, I can pass parsing
duties off to XML::Whatever, but I'm concerned that the costs (in terms of
verbosity) will outweigh the benefits of portability and extensibility.

For example, I can currently write the following line in a file to be
parsed by Lingua::Phonology::Symbols:

d +anterior -distributed voice

In XML, this might have to be as verbose as:

<symbol label="d">
  <feature name="anterior" value="+" />
  <feature name="distributed" value="-" />
  <feature name="voice" />
</symbol>

That is significantly heavier and less clear. I'm rather torn on this, so
I was wondering what insight the minds here have to offer. Many thanks--
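
For comparison, the one-line format and the verbose XML carry the same information, and the custom line is cheap to parse by hand. Below is a minimal Perl sketch, assuming that a bare feature name such as voice means "present, with no explicit +/-" (an assumption about the format, not something stated above). parse_symbol_line, xml_escape, and symbol_to_xml are hypothetical helpers, not part of Lingua::Phonology:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::Phonology): split one line of the
# custom format, e.g. "d +anterior -distributed voice", into a symbol label
# and a hash of feature values.
# Assumption: "+f" and "-f" are binary values; a bare "f" is stored with an
# undefined value, meaning "present, with no explicit +/-".
sub parse_symbol_line {
    my ($line) = @_;
    my ($label, @specs) = split ' ', $line;    # awk-style whitespace split
    die "empty symbol line\n" unless defined $label;

    my %features;
    for my $spec (@specs) {
        if    ($spec =~ /^([+-])(\w+)$/) { $features{$2} = $1 }
        elsif ($spec =~ /^(\w+)$/)       { $features{$1} = undef }
        else                             { die "bad feature spec '$spec'\n" }
    }
    return ($label, \%features);
}

# Escape the characters that would break a double-quoted XML attribute.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: write the same data in the one-element-per-feature
# XML layout. Features are sorted so the output is deterministic.
sub symbol_to_xml {
    my ($label, $features) = @_;
    my $xml = sprintf qq{<symbol label="%s">\n}, xml_escape($label);
    for my $name (sort keys %$features) {
        my $value = $features->{$name};
        $xml .= defined $value
            ? sprintf(qq{  <feature name="%s" value="%s" />\n},
                      xml_escape($name), xml_escape($value))
            : sprintf(qq{  <feature name="%s" />\n}, xml_escape($name));
    }
    return $xml . "</symbol>\n";
}

my ($label, $features) = parse_symbol_line('d +anterior -distributed voice');
print symbol_to_xml($label, $features);
```

Run as-is, it prints the five-line XML version of the d symbol, with the features in alphabetical order. The parsing side of the custom format is a dozen lines; the question is whether the Rules format will stay this simple.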

--
Jesse S. Bangs (e-mail address removed)
http://students.washington.edu/jaspax/
http://students.washington.edu/jaspax/blog

Jesus asked them, "Who do you say that I am?"

And they answered, "You are the eschatological manifestation of the ground
of our being, the kerygma in which we find the ultimate meaning of our
interpersonal relationship."

And Jesus said, "What?"
 

Rich

JS Bangs wrote:

snip

I'd consider YAML whenever you need XML-like structures that poor old humans
might have to read or edit.

The slight downer is that YAML seems to be developing at a pace similar to
Perl 6, though in both cases it'll be worth the wait.

Cheers
 

JS Bangs

I've added comp.text.xml to the cross-posting for this, since it's
probably more concerned with XML than anything else at this point. So far,
we've been discussing whether it's worth the trouble to move a custom file
format for the Perl module Lingua::Phonology over to XML. I pointed out an
original example line like:

d +anterior -distributed voice

Which would have to become:

<symbol label="d">
  <feature name="anterior" value="+" />
  <feature name="distributed" value="-" />
  <feature name="voice" />
</symbol>

To which MegaZone suggested the shorter version:

<symbol label="d" anterior="+" distributed="-" voice="+" />

Something like that is just as valid in XML,

My response:

The example you gave is *well-formed* XML, which is different from *valid*
XML. The problem is that your example could never be valid XML, because
the attributes needed to define a given <symbol> cannot be known ahead of
time in the module. Rather, the list of feature names is given in a
separate <featureset></featureset> section.

True, one could make the featureset declaration into a DTD, but that would
require the users of my module to write their own DTDs, which is too much
work for them. I'd rather leave the validation of features against the
featureset to the application (which I'm also writing, so it's not much of
a problem).
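
Leaving validation to the application can be a plain lookup after parsing. Below is a minimal sketch, assuming the parser has already produced the list of declared feature names and a hash of feature values per symbol; the sample data and the undeclared_features helper are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical data, standing in for what a parser of the file would produce:
# the names declared in the featureset, and each symbol's feature values.
my @featureset = qw(anterior distributed voice);
my %symbols = (
    d => { anterior => '+', distributed => '-', voice => undef },
    t => { anterior => '+', voiced => '-' },    # "voiced" is a deliberate typo
);

# Return one message per feature that a symbol uses but the featureset
# does not declare. No DTD is involved; the check is plain Perl.
sub undeclared_features {
    my ($featureset, $symbols) = @_;
    my %declared;
    $declared{$_} = 1 for @$featureset;

    my @errors;
    for my $label (sort keys %$symbols) {
        for my $name (sort keys %{ $symbols->{$label} }) {
            push @errors, "symbol '$label': undeclared feature '$name'"
                unless $declared{$name};
        }
    }
    return @errors;
}

print "$_\n" for undeclared_features(\@featureset, \%symbols);
```

With the sample data it reports the misspelled voiced feature on t. Because the featureset is ordinary data rather than a DTD, users of the module never have to write one.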

I could go your way, but it would require every XML file parsed by my
module to be processed in standalone mode, and it would make it impossible
to write any single DTD that could validate all such files.

Using XPath, you can find the element based on one attribute and get
the value of another, as in this:
---
use strict;
use warnings;
use XML::LibXML 1.0053;

my $parser = XML::LibXML->new();

# Let the parser read the file itself (it dies with a message on failure)
# instead of slurping it line by line.
my $dom = $parser->parse_file('./pcCurrencyTable.xml');

# Select the "name" attribute of the CurrencyShift element whose "number"
# attribute is 840.
my $xpath = q{//CurrencyTable/CurrencyShift[@number='840']/@name};
my ($node) = $dom->findnodes($xpath);
die "No node matches $xpath\n" unless $node;
print $node->textContent(), "\n";

Something like this could provide an elegant way for the Lingua::Phonology
module to check that a given file doesn't contain errors (i.e. that
all attributes or feature names given for a <symbol> match some feature
declared in the <featureset> section). Once I've decided on my format, I'll
have to consider exactly how to do this.
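
Along those lines, XPath alone can find undeclared features. Below is a minimal sketch, assuming XML::LibXML is installed and assuming a hypothetical layout with a <phonology> root, a <featureset> of <feature name="..."/> declarations, and <symbol> elements (not a settled Lingua::Phonology format). undeclared_by_xpath is a hypothetical helper:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical file layout (an assumption, not a settled format): a
# <featureset> that declares feature names, followed by <symbol> elements.
my $xml = <<'END_XML';
<phonology>
  <featureset>
    <feature name="anterior"/>
    <feature name="distributed"/>
    <feature name="voice"/>
  </featureset>
  <symbol label="d">
    <feature name="anterior" value="+"/>
    <feature name="distributed" value="-"/>
    <feature name="voice"/>
  </symbol>
  <symbol label="t">
    <feature name="anterior" value="+"/>
    <feature name="voiced" value="-"/>
  </symbol>
</phonology>
END_XML

# In XPath 1.0, comparing an attribute with a node-set is true when any node
# in the set matches, so not(@name = ...) selects symbol features whose name
# appears nowhere in the featureset.
my $undeclared_xpath =
    '/phonology/symbol/feature[not(@name = /phonology/featureset/feature/@name)]';

# Hypothetical helper: one message per undeclared feature, naming its symbol.
sub undeclared_by_xpath {
    my ($dom) = @_;
    return map {
        sprintf 'symbol "%s": undeclared feature "%s"',
            $_->parentNode->getAttribute('label'), $_->getAttribute('name');
    } $dom->findnodes($undeclared_xpath);
}

my $dom = XML::LibXML->load_xml(string => $xml);
print "$_\n" for undeclared_by_xpath($dom);
```

The whole check lives in one XPath expression, so a change to the file layout means editing one string rather than a DTD that every user would need.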


 

Julian Scarfe


My guess is that you find the XML version less clear because you're used to
reading the current format. However:

<symbol label="d">
  <feature name="anterior" value="true" />
  <feature name="distributed" value="false" />
  <feature name="voice" />
</symbol>

means a great deal more to me than trying to work out what your +s and -s
mean. The structure is immediately clear, and it's not hard to edit using an
XML editor or even a simple text editor. I'd check out XML Schema (rather
than playing with DTDs) if you haven't already.

Julian Scarfe
 
