What YAML engine do you use?

  • Thread starter Reinhold Birkenfeld

rm

Fredrik said:
I guess you both stopped reading before you got to the second paragraph
in my post. YAML (at least the version described in that spec) isn't easy on
users; it may look that way at a first glance, and as long as you stick to a
small subset, but it really isn't. that's not just bad design, that's plain evil.

and trust me, when things are hard to get right for developers, users will
suffer too.

</F>

you stopped reading too early as well, I guess:
"maybe their stated goal is not reached with this implementation of
their idea"

and the implementation being the spec,

furthermore, "users will suffer too", I'm suffering if I have to use
C++, with all its exceptions and special cases.

BTW, I pickpocketed the idea that if there is a choice where to put the
complexity, you never put it with the user. "pickpocket" is strong, I've
learned it from an analyst who was 30 years in the business, and I
really respect the guy, basically he was always right and right on. On
the other hand, the management did not always like what he thought :)

bye,
rm
 
Doug Holton

Fredrik said:
and trust me, when things are hard to get right for developers, users will
suffer too.

That is exactly why YAML can be improved. But XML proves that getting
it "right" for developers has little to do with getting it right for
users (or for saving bandwidth). What's right for developers is what
requires the least amount of work. The problem is, that's what is right
for end-users, too.
 
Doug Holton

rm said:
this implementation of their idea. But I'd love to see a generic,
pythonic data format.

That's a good idea. But really Python is already close to that. A lot
of times it is easier to just write out a python dictionary than using a
DB or XML or whatever. Python is already close to YAML in some ways.
Maybe even better than YAML, especially if Fredrik's claims of YAML's
inherent unreliability are to be believed. Of course he develops a
competing XML product, so who knows.
 
Fredrik Lundh

rm said:
furthermore, "users will suffer too", I'm suffering if I have to use C++, with all its exceptions
and special cases.

and when you suffer, your users will suffer. in the C++ case, they're likely to
suffer from spurious program crashes, massively delayed development projects,
obscure security holes, etc.

</F>
 
Daniel Bickett

Doug said:
What do you expect? YAML is designed for humans to use, XML is not.
YAML also hasn't had the backing and huge community behind it like XML.
XML sucks for people to have to write in, but is straightforward to
parse. The consequence is hordes of invalid XML files, leading to
necessary hacks like the mark pilgrim's universal rss parser. YAML
flips the problem around, making it harder perhaps to implement a
universal parser, but better for the end-user who has to actually use
it. More people need to work on improving the YAML spec and
implementing better YAML parsers. We've got too many XML parsers as it is.

However, one of the main reasons that XML is so successful is because
its roots are shared by (or, perhaps, in) a markup language that a
vast majority of the Internet community knows: HTML.

In its most basic form, I don't care what anyone says, XML is VERY
straightforward. Throughout the entire concept of XML (again, in its
most basic form) the idea of opening and closing tags (with the
exception of the standalone tags, however still very simple) is
constant, for all different data types.

In my (brief) experience with YAML, it seemed like there were several
different ways of doing things, and I saw this as one of its failures
(since we're all comparing it to XML). However I maintain, in spite of
all of that, that it can easily boil down to the fact that, for
someone who knows the most minuscule amount of HTML (a very easy thing
to do, not to mention most people have a tiny bit of experience to
boot), the transition to XML is painless. YAML, however, is a brand
new format with brand new semantics.

As for the human read-and-write-ability, I don't know about you, but I
have no trouble whatsoever reading and writing XML. But alas, I don't
need to. Long live elementtree (once again) :)

Daniel Bickett
 
Paul Rubin

Daniel Bickett said:
In my (brief) experience with YAML, it seemed like there were several
different ways of doing things, and I saw this as one of its failures
(since we're all comparing it to XML).

YAML looks to me to be completely insane, even compared to Python
lists. I think it would be great if the Python library exposed an
interface for parsing constant list and dict expressions, e.g.:

[1, 2, 'Joe Smith', 8237972883334L, # comment
{'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
'xyzzy', [3, 5, [3.14159, 2.71828, []]]]

I don't see what YAML accomplishes that something like the above wouldn't.

Note that all the values in the above have to be constant literals.
Don't suggest using eval. That would be a huge security hole.
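
Later Python releases ship something close to what Paul asks for here: `ast.literal_eval` in the standard library evaluates a string that contains only literals and containers, and refuses anything else. A minimal sketch, assuming Python 3 (so the `L` suffix on the long integer is dropped); `SOURCE` and `parse_literal` are names made up for this illustration:

```python
import ast

# Paul's example, minus the Python 2 "L" suffix on the long integer.
SOURCE = """
[1, 2, 'Joe Smith', 8237972883334,  # comment
 {'Favorite fruits': ['apple', 'banana', 'pear']},  # another comment
 'xyzzy', [3, 5, [3.14159, 2.71828, []]]]
"""

def parse_literal(text):
    """Parse a constant list/dict expression without running any code."""
    # literal_eval raises ValueError for names, calls, attribute access, etc.
    return ast.literal_eval(text.strip())

data = parse_literal(SOURCE)
print(data[4]["Favorite fruits"])

try:
    parse_literal("__import__('os').system('echo oops')")
except ValueError:
    print("rejected: not a constant literal")
```

Comments in the input are simply ignored by the parser, so the format stays annotatable by hand.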
 
Stephen Waterbury

Steve said:
It seems to me the misunderstanding here is that XML was ever intended
to be generated directly by typing in a text editor. It was rather
intended (unless I'm mistaken) as a process-to-process data interchange
metalanguage that would be *human_readable*.

The premise that XML had a coherent design intent
stretches my credulity beyond its elastic limit.
 
rm

Doug said:
That's a good idea. But really Python is already close to that. A lot
of times it is easier to just write out a python dictionary than using a
DB or XML or whatever. Python is already close to YAML in some ways.
Maybe even better than YAML, especially if Fredrik's claims of YAML's
inherent unreliability are to be believed. Of course he develops a
competing XML product, so who knows.

true, it's easy enough to separate the data from the functionality in
python by putting the data in a dictionary/list/tuple, but it stays
source code.

rm
 
Fredrik Lundh

Stephen said:
The premise that XML had a coherent design intent
stretches my credulity beyond its elastic limit.

the design goals are listed in section 1.1 of the specification.

see tim bray's annotated spec for additional comments by one
of the team members:

http://www.xml.com/axml/testaxml.htm

(make sure to click on all (H)'s and (U)'s in that section for the
full story).

if you think that the XML 1.0 team didn't know what they were
doing, you're seriously mistaken. it's the post-1.0 standards that
are problematic...

</F>
 
Alex Martelli

Paul Rubin said:
lists. I think it would be great if the Python library exposed an
interface for parsing constant list and dict expressions, e.g.:

[1, 2, 'Joe Smith', 8237972883334L, # comment
{'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
'xyzzy', [3, 5, [3.14159, 2.71828, []]]]

I don't see what YAML accomplishes that something like the above wouldn't.

Note that all the values in the above have to be constant literals.
Don't suggest using eval. That would be a huge security hole.

I do like the idea of a parser that's restricted to "safe expressions"
in this way. Once the AST branch merge is done, it seems to me that
implementing it should be a reasonably simple exercise, at least at a
"toy level".

I wonder, however, if, as an even "toyer" exercise, one might not
already do it easily -- by first checking each token (as generated by
tokenize.generate_tokens) to ensure it's safe, and THEN eval _iff_ no
unsafe tokens were found in the check. Accepting just square brackets,
braces, commas, constant strings and numbers, and comments, should be
pretty safe -- we'd no doubt want to also accept minus (for unary
minus), plus (to make complex numbers), and specifically None, True,
False -- but that, it appears to me, still leaves little margin for an
attacker to prepare an evil string that does bad things when eval'd...


Alex
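
The token-whitelist idea Alex describes could be sketched roughly as follows. This is only an illustration under assumptions, not Alex's code: it targets Python 3, the names `tokens_look_safe` and `checked_eval` are hypothetical, the colon is added to his list because dict literals need it, and f-string prefixes are rejected explicitly because older Python 3 versions report an f-string as an ordinary STRING token. Whether checking first makes `eval` acceptable is exactly what the rest of the thread disputes.

```python
import io
import tokenize

# Token types that carry no content of their own.
SKIPPED = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.ENDMARKER}
# Brackets, braces, comma, colon (for dicts), unary minus, plus (complex numbers).
SAFE_OPS = {"[", "]", "{", "}", ",", ":", "-", "+"}
SAFE_NAMES = {"None", "True", "False"}

def is_plain_string(text):
    """Reject f-strings, which can embed arbitrary expressions."""
    quote_at = min(i for i in (text.find("'"), text.find('"')) if i != -1)
    return "f" not in text[:quote_at].lower()

def tokens_look_safe(source):
    """Return True only if every token is on the whitelist."""
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type in SKIPPED or tok.type == tokenize.NUMBER:
                continue
            if tok.type == tokenize.STRING and is_plain_string(tok.string):
                continue
            if tok.type == tokenize.OP and tok.string in SAFE_OPS:
                continue
            if tok.type == tokenize.NAME and tok.string in SAFE_NAMES:
                continue
            return False
    except (tokenize.TokenError, SyntaxError):
        return False
    return True

def checked_eval(source):
    """Eval the source iff the token check passed, with no builtins visible."""
    if not tokens_look_safe(source):
        raise ValueError("unsafe token in input")
    return eval(source, {"__builtins__": {}}, {})
```

For example, `checked_eval("[1, -2, {'a': None}]")` returns the list, while `checked_eval("__import__('os')")` raises `ValueError` before anything is evaluated.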
 
Michael Spencer

Paul said:
YAML looks to me to be completely insane, even compared to Python
lists. I think it would be great if the Python library exposed an
interface for parsing constant list and dict expressions, e.g.:

[1, 2, 'Joe Smith', 8237972883334L, # comment
{'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
'xyzzy', [3, 5, [3.14159, 2.71828, []]]]

I don't see what YAML accomplishes that something like the above wouldn't.

Note that all the values in the above have to be constant literals.
Don't suggest using eval. That would be a huge security hole.

Not hard at all, thanks to compiler.ast:

>>> import compiler
>>> # NOTE: the two class headers and the calling line were lost in the
>>> # archive; the names AbstractVisitor, ConstEval and source are assumed.
>>> class AbstractVisitor(object):
...     def __init__(self):
...         self._cache = {} # dispatch table
...
...     def visit(self, node, **kw):
...         cls = node.__class__
...         meth = self._cache.setdefault(cls,
...             getattr(self, 'visit' + cls.__name__, self.default))
...         return meth(node, **kw)
...
...     def default(self, node, **kw):
...         for child in node.getChildNodes():
...             return self.visit(child, **kw)
...
>>> class ConstEval(AbstractVisitor):
...     def visitConst(self, node, **kw):
...         return node.value
...
...     def visitName(self, node, **kw):
...         raise NameError, "Names are not resolved"
...
...     def visitDict(self, node, **kw):
...         return dict([(self.visit(k), self.visit(v)) for k, v in node.items])
...
...     def visitTuple(self, node, **kw):
...         return tuple(self.visit(i) for i in node.nodes)
...
...     def visitList(self, node, **kw):
...         return [self.visit(i) for i in node.nodes]
...
>>> ConstEval().visit(compiler.parse(source, "eval"))  # source: Paul's list, as a string
[1, 2, 'Joe Smith', 8237972883334L, {'Favorite fruits': ['apple', 'banana',
'pear']}, 'xyzzy', [3, 5, [3.1415899999999999, 2.71828, []]]]

Add sugar to taste

Regards

Michael
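
The `compiler` package used above exists only in Python 2. For readers on Python 3, the same visitor idea can be approximated with the standard `ast` module. This is a rough sketch rather than a port of Michael's code: it assumes Python 3.8 or later (where every literal is an `ast.Constant` node), and `const_eval` is a hypothetical helper name.

```python
import ast

class ConstEval(ast.NodeVisitor):
    """Evaluate an expression tree made only of constants and containers."""

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_Constant(self, node):
        return node.value

    def visit_List(self, node):
        return [self.visit(item) for item in node.elts]

    def visit_Tuple(self, node):
        return tuple(self.visit(item) for item in node.elts)

    def visit_Dict(self, node):
        return {self.visit(k): self.visit(v)
                for k, v in zip(node.keys, node.values)}

    def generic_visit(self, node):
        # Names, calls, operators and everything else are refused outright.
        raise ValueError("unsupported syntax: " + type(node).__name__)

def const_eval(source):
    """Parse source as a single expression and evaluate it without eval()."""
    return ConstEval().visit(ast.parse(source, mode="eval"))

print(const_eval("[1, 2, 'Joe Smith', {'Favorite fruits': ['apple']}]  # comment"))
```

Because unknown node types raise instead of being ignored, negative numbers (a unary minus node) are rejected too; adding sugar such as a `visit_UnaryOp` method is left to taste.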
 
Fredrik Lundh

Alex said:
[1, 2, 'Joe Smith', 8237972883334L, # comment
{'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
'xyzzy', [3, 5, [3.14159, 2.71828, []]]]

I don't see what YAML accomplishes that something like the above wouldn't.

Note that all the values in the above have to be constant literals.
Don't suggest using eval. That would be a huge security hole.

I do like the idea of a parser that's restricted to "safe expressions"
in this way. Once the AST branch merge is done, it seems to me that
implementing it should be a reasonably simple exercise, at least at a
"toy level".

for slightly more interop, you could plug in a modified tokenizer, and use
JSON:

http://www.crockford.com/JSON/xml.html

I wonder, however, if, as an even "toyer" exercise, one might not
already do it easily -- by first checking each token (as generated by
tokenize.generate_tokens) to ensure it's safe, and THEN eval _iff_ no
unsafe tokens were found in the check. Accepting just square brackets,
braces, commas, constant strings and numbers, and comments, should be
pretty safe -- we'd no doubt want to also accept minus (for unary
minus), plus (to make complex numbers), and specifically None, True,
False

or you could use a RE to make sure the string only contains safe literals,
and pass the result to eval.

but that, it appears to me, still leaves little margin for an attacker to prepare
an evil string that does bad things when eval'd...

besides running out of parsing time or object memory, of course. unless
you check the size before/during the parse.

</F>
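
For the JSON route, Python's standard library later gained a `json` module, so no modified tokenizer is needed today. A small sketch, assuming Python 3; the size cap reflects the point about checking the size before the parse, and both `MAX_CHARS` and `load_json_limited` are hypothetical names with an arbitrary limit:

```python
import json

MAX_CHARS = 10_000  # arbitrary cap chosen for this illustration

def load_json_limited(text, max_chars=MAX_CHARS):
    """Refuse oversized input before handing it to the JSON parser."""
    if len(text) > max_chars:
        raise ValueError("input too large")
    return json.loads(text)

# Paul's example rewritten as JSON: double quotes only, no comments.
doc = ('[1, 2, "Joe Smith", 8237972883334, '
       '{"Favorite fruits": ["apple", "banana", "pear"]}, '
       '"xyzzy", [3, 5, [3.14159, 2.71828, []]]]')

data = load_json_limited(doc)
print(data[4]["Favorite fruits"])
```

The trade-off against the Python-literal format is that JSON has no comments and no tuples, but it is readable from many other languages.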
 
Paul Rubin

I wonder, however, if, as an even "toyer" exercise, one might not
already do it easily -- by first checking each token (as generated by
tokenize.generate_tokens) to ensure it's safe, and THEN eval _iff_ no
unsafe tokens were found in the check.

I don't trust that for one minute. It's like checking a gun to make
sure that it has no bullets, then putting it to your head and pulling
the trigger. Or worse, it's like checking the gun once, then putting
it to your head and pulling the trigger every day for the next N years
without checking again to see if someone has inserted some bullets
(this is what you basically do if you write your program to check if
the tokens are safe, and then let users keep running it without
re-auditing it, as newer versions of Python get released).

See the history of the pickle module to see how that kind of change
has already screwed people (some comments in SF bug #467384). "Don't
use eval" doesn't mean "check if it's safe before using it". It
means "don't use it".
 
Tim Parkin

Doug said:
That is exactly why YAML can be improved. But XML proves that getting
it "right" for developers has little to do with getting it right for
users (or for saving bandwidth). What's right for developers is what
requires the least amount of work. The problem is, that's what is right
for end-users, too.

Having spent some time with YAML and its implementations (at least
pyyaml and the ruby/python versions of syck), I thought I should
comment. The only problems with syck we've encountered have been to do
with the python wrapper rather than syck itself. Syck seems to be used
widely without problems within the Ruby community and if anybody has
evidence of issues with it I'd really like to know about them. PyYAML is
a little inactive and doesn't conform to the spec in many ways and, as
such, we prefer the syck implementation.

In my opinion there have been some bad decisions made whilst creating
YAML, but for me they are acceptable given the advantages of a data
format that is simple to read and write. Perhaps judging the utility of
a project on its documentation is one of the problems, as most people
who have 'just used it' seem to be happy enough. These people include
non-technical clients of ours who manage some of their websites by
editing YAML files directly. That said, I don't think it would be the
best way to enter data for a life support machine, but I wouldn't like
to do that with XML either ;-)

One thing that should be pointed out is that there are no parsers
available that are built directly on the YAML pseudo BNF. Such work is
in progress in two different forms but don't expect anything soon. As I
understand it, Syck has been built to pass tests rather than conform to
a constantly changing BNF and it seems to have few warts.

Tim
 
Stephen Waterbury

Fredrik said:
the design goals are listed in section 1.1 of the specification.

see tim bray's annotated spec for additional comments by one
of the team members:

http://www.xml.com/axml/testaxml.htm

(make sure to click on all (H)'s and (U)'s in that section for the
full story).

Thanks, Fredrik, I hadn't seen that. My credulity has been restored
to its original shape. Whatever that was. :)

However, now that I have direct access to the documented design
goals (intent) of XML, it's interesting to note that the intent
Steve Holden imputed to it earlier is not explicitly among them:

Steve said:
It seems to me the misunderstanding here is that XML was ever intended
to be generated directly by typing in a text editor. It was rather
intended (unless I'm mistaken) as a process-to-process data interchange
metalanguage that would be *human_readable*.

Not unless you interpret "XML shall support a wide variety of applications"
as "XML shall provide a process-to-process data interchange metalanguage".
It might have been a hidden agenda, but it certainly was not an
explicit design goal.

(The "human-readable" part is definitely there:
"6. XML documents should be human-legible and reasonably clear",
and Steve was also correct that generating XML directly by typing
in a text editor was definitely *not* a design intent. ;)

if you think that the XML 1.0 team didn't know what they were
doing, you're seriously mistaken. it's the post-1.0 standards that
are problematic...

Agreed. And many XML-based standards.

- Steve
 
Peter Hansen

Stephen said:
it's interesting to note that the intent
Steve Holden imputed to it earlier is not explicitly among them:



Not unless you interpret "XML shall support a wide variety of applications"
as "XML shall provide a process-to-process data interchange metalanguage".
It might have been a hidden agenda, but it certainly was not an
explicit design goal.

If merely thinking about the purpose of XML doesn't make it
clear where Steve got that idea, read up a little bit more in
the spec to the very first paragraph in the Introduction, and
click on the little M-in-a-circle next to the phrase "data objects".
I'll even quote it here for you, to save time:

"""What Do You Mean By "Data Object?"

Good question. The point is that an XML document is sometimes
a file, sometimes a record in a relational database, sometimes an
object delivered by an Object Request Broker, and sometimes a
stream of bytes arriving at a network socket.

These can all be described as "data objects".
"""

I would ask what part of that, or of the simple phrase
"data object", or even of the basic concept of a markup language,
doesn't cry out "data interchange metalanguage" to you?

-Peter
 
Stephen Waterbury

Peter said:
If merely thinking about the purpose of XML doesn't make it
clear where Steve got that idea ...

I meant no disparagement of Steve, and it is quite clear
where he got that (correct!) idea ...

It's also clear that the XML user community sees
that as part of *their* purpose in applying XML.
But here we are talking about intent of its designers,
and "merely thinking about the purpose of XML" won't
enable me to read their minds. ;)

read up a little bit more in
the spec [... in which it is stated rather explicitly!]

I would ask what part of that, or of the simple phrase
"data object", or even of the basic concept of a markup language,
doesn't cry out "data interchange metalanguage" to you?

It does indeed -- my apologies for not reading the annotations
more carefully! I missed that one in particular. Okay, you've
dragged me, kicking and screaming, to agree that the actual,
published design intent of XML is to provide a "data
interchange metalanguage".

Thanks to Fredrik for the link he included (elsewhere
in the "YAML" thread) to JavaScript Object Notation (JSON).
JSON looks like a notable improvement over XML for data
objects that are more fine-grained (higher ratio of markup to
non-markup -- e.g., most relational data sets, RDF, etc.)
than those at the more traditional "document" end of the
spectrum (less markup, more text).

The latter types of data objects are the ones I happen to believe
are in the sweet spot of XML's design, regardless of its designers'
more sweeping pronouncements (and hopes, no doubt).

I should note that I have to deal with XML a lot, but always
kicking and screaming (though much less now because of Fredrik's
Elementtree package ;). Thanks, Fredrik and Peter, for the
references. ;)

Peace.
Steve
 
Leif K-Brooks

Bengt said:
I thought XML was a good idea, but IMO requiring quotes around
even integer attribute values was an unfortunate decision.

I think it helps guard against incompetent authors who wouldn't
understand when they're required to use quotes and when they're not. I
see HTML pages all of the time where the author's done something like:

<img src=http://example.com/foo/bar/baz/spam/>

Sometimes it even has spaces in it. At least with a proper XML parser,
they would know where they went wrong right away.
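
To illustrate the point, a conforming XML parser rejects the unquoted attribute immediately instead of guessing. A minimal sketch using the standard library's ElementTree, assuming Python 3; `check_xml` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

def check_xml(fragment):
    """Return 'ok' if the fragment is well-formed, else the parser's complaint."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError as exc:
        return "rejected: %s" % exc
    return "ok"

# Unquoted attribute value, as in the HTML example above.
print(check_xml('<img src=http://example.com/foo/bar/baz/spam/>'))
# The same element with the value quoted and the tag closed.
print(check_xml('<img src="http://example.com/foo/bar/baz/spam/"/>'))
```

The error message includes the line and column of the first offending character, which is the "right away" feedback an HTML browser never gives.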
 
Steve Holden

Doug said:
That is exactly why YAML can be improved. But XML proves that getting
it "right" for developers has little to do with getting it right for
users (or for saving bandwidth). What's right for developers is what
requires the least amount of work. The problem is, that's what is right
for end-users, too.

Yet again I will interject that XML was only ever intended to be written
by programs. Hence its moronic stupidity and excellent uniformity.

regards
Steve
 
