What YAML engine do you use?

  • Thread starter Reinhold Birkenfeld

Doug Holton

Peter said:
Good question. The point is that an XML document is sometimes
a file, sometimes a record in a relational database, sometimes an
object delivered by an Object Request Broker, and sometimes a
stream of bytes arriving at a network socket.

These can all be described as "data objects".
"""

I would ask what part of that, or of the simple phrase
"data object", or even of the basic concept of a markup language,
doesn't cry out "data interchange metalanguage" to you?

Actually I don't see any explicit mention that XML was meant to be
limited to data interchange only.
"data object" has to do with more than data interchange. There is data
entry as well. And people are having to hand enter XML files all the
time for things like Ant, XHTML, etc.

I guess all those people who learned how to write web pages by hand were
violating some spec and so they have no cause to complain about any
difficulties doing so. Tim Berners-Lee never intended people to have to
type in URLs, either, but here we are.
 

Doug Holton

Steve said:
Yet again I will interject that XML was only ever intended to be written
by programs. Hence its moronic stupidity and excellent uniformity.

Neither was HTML, nor were URLs; many things end up being used in ways
that were never intended. YAML, however, is specifically designed to be
easier for people to write and to read, as is Python.
 

Doug Holton

Daniel said:
In my (brief) experience with YAML, it seemed like there were several
different ways of doing things, and I saw this as one of its failures
(since we're all comparing it to XML). However I maintain, in spite of
all of that, that it can easily boil down to the fact that, for
someone who knows the most minuscule amount of HTML (a very easy thing
to do, not to mention most people have a tiny bit of experience to
boot), the transition to XML is painless. YAML, however, is a brand
new format with brand new semantics.

That's true and a very good point. Like you said, that's probably the
reason XML took off, because of our familiarity with HTML.

Daniel said:
As for the human read-and-write-ability, I don't know about you, but I
have no trouble whatsoever reading and writing XML.

You might like programming in XML then: http://www.meta-language.net/
:)
 

Alan Kennedy

[Effbot]
ReST and YAML share the same deep flaw: both formats are marketed
as simple, readable formats, and at a first glance, they look simple and
readable -- but in reality, they're messy as hell, and chances are that the
thing you're looking at doesn't really mean what you think it means (unless
you're the official ReST/YAML parser implementation). experienced designers
know how to avoid that; the ReST/YAML designers don't even understand
why they should.

I'm looking for a good textual markup language at the moment, for
capturing web and similar textual content.

I don't want to use XML for this particular usage, because this content
will be entered through a web interface, and I don't want to force users
through multiple rounds of
submit/check-syntax/generate-error-report/re-submit in order to enter
their content.

I have no strong feelings about YAML: If I want structured data, e.g.
lists, dictionaries, etc, I just use python.

However, I'm torn on whether to use ReST for textual content. On the one
hand, it looks pretty comprehensive and solidly implemented. But OTOH,
I'm concerned about complexity: I don't want to commit to ReST if it's
going to become a lot of hard work or highly-inefficient when I really
need to use it "in anger".

From what I've seen, pretty much every textual markup targeted for web
content, e.g. wiki markup, seems to have grown/evolved organically,
meaning that it is either underpowered or overpowered, full of special
cases, doesn't have a meaningful object model, etc.

So, I'm hoping that the learned folks here might be able to give me some
pointers to a markup language that has the following characteristics

1. Is straightforward for non-technical users to use, i.e. can be
(mostly) explained in a two to three page document which is
comprehensible to anyone who has ever used a simple word-processor or
text-editor.

2. Allows a wide variety of content semantics to be represented, e.g.
headings, footnotes, sub/superscript, links, etc, etc.

3. Has a complete (but preferably lightweight) object model into which
documents can be loaded, for transformation to other languages.

4. Is speed and memory efficient.

5. Obviously, has a solid python implementation.

Most useful would be a pointer to a good comparison/review page which
compares multiple markup languages, in terms of the above requirements.

If I can't find such a markup language, then I might instead end up
using a WYSIWYG editing component that gives the user a GUI and
generates (x)html.

htmlArea: http://www.htmlarea.com/
Editlet: http://www.editlet.com/

But I'd prefer a markup solution.

TIA for any pointers.

regards,
 

Paul Rubin

Alan Kennedy said:
However, I'm torn on whether to use ReST for textual content. On the
one hand, it looks pretty comprehensive and solidly implemented.

It seemed both unnecessary and horrendously overcomplicated when I
looked at it. I'd stay away.

Alan Kennedy said:
So, I'm hoping that the learned folks here might be able to give me
some pointers to a markup language that has the following
characteristics

I'm a bit biased but I've been using Texinfo for a long time and have
been happy with it. It's reasonably lightweight to implement, fairly
intuitive to use, and doesn't get in the way too much when you're
writing. There are several implementations, none in Python at the
moment but that would be simple enough. It does all the content
semantics you're asking (footnotes etc). It doesn't have an explicit
object model, but is straightforward to convert into a number of
formats including high-quality printed docs (TeX); the original Info
hypertext browser that predates the web; and these days HTML.

Alan Kennedy said:
If I can't find such a markup language, then I might instead end up
using a WYSIWYG editing component that gives the user a GUI and
generates (x)html.... But I'd prefer a markup solution.

Yes, for heavy-duty users, markup is far superior to yet another
editor. Everyone has their favorite editor and doesn't want to have
to switch to another one, hence the Emacs vs. Vi wars etc.
 

Fredrik Lundh

Alan said:
From what I've seen, pretty much every textual markup targeted for web content, e.g. wiki markup,
seems to have grown/evolved organically, meaning that it is either underpowered or overpowered,
full of special cases, doesn't have a meaningful object model, etc.

I spent the eighties designing one textual markup language after another,
for a wide variety of projects (mainly for technical writing). I've since come
to the conclusion that they all suck (for exactly the reasons you mention above,
plus the usual "the implementation is the only complete spec we have" issue).

these days, I usually use HTML+custom classes for authoring (and run them
through a HTML->XHTML converter for processing).

the only markup language I've seen lately that isn't a complete mess is John
Gruber's markdown:

http://daringfireball.net/projects/markdown/

which has an underlying object model (HTML/XHTML) and doesn't have too
many warts. not sure if anyone has done a Python implementation yet, though
(for html->markdown, see http://www.aaronsw.com/2002/html2text/ ), and I
don't think it supports footnotes (HTML doesn't).

Alan said:
If I can't find such a markup language, then I might instead end up using a WYSIWYG editing
component that gives the user a GUI and generates (x)html.

htmlArea: http://www.htmlarea.com/
Editlet: http://www.editlet.com/

But I'd prefer a markup solution.

some of these are amazingly usable. have you asked your users what they
prefer? (or maybe you are your user? ;-)

</F>
 

rm

rm said:
well, I did look at it, and as a text format is more readable than XML
is. Furthermore, XML's verbosity is incredible. This format is not.
People are abusing the genericity of XML to put everything into it.

Parsing and working with XML are highly optimized, so there's not really
a problem in that sector. But transferring the same data in a YAML
format rather than an XML format is much more economical. But networks
are getting faster, right?

Nowadays, people are trying to create binary XML, XML databases,
graphics in XML (btw, I'm quite impressed by SVG), you have XSLT, you
have XSL-FO, ... .

And I think, YAML is a nice initiative.

bye,
rm

http://www.theinquirer.net/?article=20868 :)

rm
 

Alan Kennedy

[Alan Kennedy]
[Fredrik Lundh]
> I spent the eighties designing one textual markup language after
> another, for a wide variety of projects (mainly for technical
> writing). I've since come to the conclusion that they all suck
> (for exactly the reasons you mention above, plus the usual
> "the implementation is the only complete spec we have" issue).

Thanks Fredrik, I thought you might have a fair amount of experience in
this area :)

[Fredrik Lundh]
> the only markup language I've seen lately that isn't a complete mess
> is John Gruber's markdown:
>
> http://daringfireball.net/projects/markdown/
>
> which has an underlying object model (HTML/XHTML) and doesn't have
> too many warts. not sure if anyone has done a Python implementation
> yet, though (for html->markdown, see
> http://www.aaronsw.com/2002/html2text/ ), and I don't think it
> supports footnotes (HTML doesn't).

Thanks for the pointer. I took a look at Markdown, and it does look
nice. But I don't like the dual syntax, e.g. switching into HTML for
tables, etc: I'm concerned that the syntax switch might be too much for
non-techies.

[Alan Kennedy]
[Fredrik Lundh]
> some of these are amazingly usable. have you asked your users what
> they prefer? (or maybe you are your user? ;-)

Actually, I'm looking for a solution for both myself and for end-users
(who will take what they're given ;-).

For myself, I think I'll end up picking Markdown, ReST, or something
comparable from the wiki-wiki-world.

For the end-users, I'm starting to think that GUI is the only way to go.
The last time I looked at this area, a few years ago, the components
were fairly immature and pretty buggy. But the number of such components
and their quality seems to have greatly increased in recent times.

Particularly, many of them seem to address an important requirement that
I neglected to mention in my original list: unicode support. I'll be
processing all kinds of funny characters, e.g. math/scientific symbols,
European, Asian and Middle-Eastern names, etc.

thanks-and-regards-ly-y'rs,
 

Alan Kennedy

[Alan Kennedy]
[Paul Rubin]
> I'm a bit biased but I've been using Texinfo for a long time and have
> been happy with it. It's reasonably lightweight to implement, fairly
> intuitive to use, and doesn't get in the way too much when you're
> writing. There are several implementations, none in Python at the
> moment but that would be simple enough. It does all the content
> semantics you're asking (footnotes etc). It doesn't have an explicit
> object model, but is straightforward to convert into a number of
> formats including high-quality printed docs (TeX); the original Info
> hypertext browser that predates the web; and these days HTML.

Thanks Paul,

I took a look at texinfo, and it looks powerful and good ....... for
programmers.

Looks like a very steep learning curve for non-programmers though. It
seems to require just a few hundred kilobytes too much documentation ......

regards,
 

Istvan Albert

Paul Rubin wrote:


This is my favorite:

http://weblog.burningbird.net/archives/2002/10/08/the-parable-of-the-languages

"I’m considered the savior, the ultimate solution, the final word.
Odes are written to me, flowers strewn at my feet, virgins sacrificed at
my altar. Programmers speak my name with awe. Companies insist on using
me in all their projects, though they’re not sure why. And whenever a
problem occurs, someone somewhere says, “Let’s use XML”, and miracles
occur and my very name has become a talisman against evil. And yet, all
I am is a simple little markup, from humble origins.
It’s a burden, being XML."
 

Istvan Albert

rm said:

There's a lot of nonsense out there propagated by people who do not
understand XML. You can't possibly blame that on XML...

For me XSLT transformations are the main reason for using XML.
If I have an XML document I can turn it into other
formats with a few lines of code. Most importantly these
are much safer to run than a program.

I think of an XML document as a "mini-database" where one
can easily and efficiently access content via XPath. So there
is a lot more to XML than just markup and that's
why YAML vs XML comparisons make very little sense.
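
To make the "mini-database" idea concrete, here is a minimal sketch using the XPath subset supported by Python's standard xml.etree.ElementTree module. The document, its element names and its titles are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are made up.
doc = """<library>
  <book lang="en"><title>First Title</title><year>2003</year></book>
  <book lang="de"><title>Second Title</title><year>2004</year></book>
  <book lang="en"><title>Third Title</title><year>2003</year></book>
</library>"""

root = ET.fromstring(doc)

# Select the titles of all books whose lang attribute is "en".
english_titles = [t.text for t in root.findall("./book[@lang='en']/title")]
print(english_titles)
```

ElementTree only implements a limited part of XPath, so a full XPath or XSLT engine would be needed for more elaborate queries and transformations.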

Istvan.
 

Sion Arrowsmith

Paul Rubin said:
YAML looks to me to be completely insane, even compared to Python
lists. I think it would be great if the Python library exposed an
interface for parsing constant list and dict expressions, e.g.:
[1, 2, 'Joe Smith', 8237972883334L, # comment
{'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
'xyzzy', [3, 5, [3.14159, 2.71828, []]]]
[ ... ]
Note that all the values in the above have to be constant literals.
Don't suggest using eval. That would be a huge security hole.

I'm probably not thinking deviously enough here, but how are you
going to exploit an eval() which has very tightly controlled
globals and locals (eg. eval(x, {"__builtins__": None}, {})) ?
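
For readers on later Python versions: the standard library's ast.literal_eval (available since Python 2.6) appears to cover the interface Paul asks for, since it accepts only literals and containers of literals. The sketch below uses Python 3 syntax, so the trailing L on the long integer is dropped; the helper name is_rejected is an assumption made for this example.

```python
import ast

# Paul's example, minus the Python 2 "L" suffix on the long integer.
source = """[1, 2, 'Joe Smith', 8237972883334,  # comment
 {'Favorite fruits': ['apple', 'banana', 'pear']},  # another comment
 'xyzzy', [3, 5, [3.14159, 2.71828, []]]]"""

data = ast.literal_eval(source)


def is_rejected(text):
    """Return True if literal_eval refuses to evaluate the text."""
    try:
        ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return True
    return False


# Names, multiplication and function calls are not literals, so all three
# inputs below are refused instead of being executed.
for text in ("[1, 2, JoeSmith]",
             "'*'*1000000*2*2*2*2*2*2*2*2*2",
             "__import__('os').getcwd()"):
    print(text, "rejected:", is_rejected(text))
```

This is not a complete defence against hostile input: very large or deeply nested literals can still consume a lot of memory, so input size should be limited separately.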
 

Doug Holton

rm said:
true, it's easy enough to separate the data from the functionality in
python by putting the data in a dictionary/list/tuple, but it stays
source code.

Check out JSON, an alternative to XML for data interchange. It is
basically just python dictionaries and lists:
http://www.crockford.com/JSON/example.html

I think I would like this better than YAML or XML, and it looks like it
already parses as valid Python code, except for the /* */ multiline
comments (which boo supports).

It was mentioned in a story about JSON-RPC-Java:
http://developers.slashdot.org/article.pl?sid=05/01/24/125236
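
As a rough illustration of the "just dictionaries and lists" point, here is a sketch using the json module that later became part of the Python standard library (it did not exist when this was posted). The record itself is hypothetical.

```python
import json

# Hypothetical record; the field names are made up for illustration.
text = """
{
  "name": "Joe Smith",
  "favorite_fruits": ["apple", "banana", "pear"],
  "active": true,
  "nickname": null
}
"""

data = json.loads(text)

# Objects become dicts, arrays become lists, and true/null map to True/None.
print(type(data).__name__, type(data["favorite_fruits"]).__name__)
print(data["active"], data["nickname"])
```

Note that the JSON literals true, false and null are not valid Python names, so JSON text is only sometimes valid Python source; a dedicated parser avoids both that mismatch and the need for eval.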
 

Fredrik Lundh

Sion said:
I'm probably not thinking deviously enough here, but how are you
going to exploit an eval() which has very tightly controlled
globals and locals (eg. eval(x, {"__builtins__": None}, {})) ?

try this:

eval("'*'*1000000*2*2*2*2*2*2*2*2*2")

(for more on eval and builtins, see the "Evaluating Python expressions"
section here: http://effbot.org/librarybook/builtin.htm )

</F>
 

Peter Hansen

Sion said:
Paul Rubin said:
YAML looks to me to be completely insane, even compared to Python
lists. I think it would be great if the Python library exposed an
interface for parsing constant list and dict expressions, e.g.:
[1, 2, 'Joe Smith', 8237972883334L, # comment
{'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
'xyzzy', [3, 5, [3.14159, 2.71828, []]]]
[ ... ]
Note that all the values in the above have to be constant literals.
Don't suggest using eval. That would be a huge security hole.


I'm probably not thinking deviously enough here, but how are you
going to exploit an eval() which has very tightly controlled
globals and locals (eg. eval(x, {"__builtins__": None}, {})) ?

See, for example, Alex Martelli's post in an old thread from 2001:
http://groups.google.ca/[email protected]
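
Independently of the linked post (whose contents are not reproduced here), one commonly cited weakness of the restricted-globals approach is that attribute access needs no builtins at all. The sketch below, written for Python 3, only lists classes and does nothing harmful; the variable names are chosen for this example.

```python
# Sion's restricted environment: no builtins, no locals.
restricted_globals = {"__builtins__": None}

# Looking up a builtin by name fails in this environment ...
try:
    eval("open", restricted_globals, {})
    builtin_blocked = False
except Exception:
    builtin_blocked = True

# ... but starting from a literal, attribute access still reaches `object`
# and, from there, every class currently loaded in the interpreter.
expr = "().__class__.__base__.__subclasses__()"
classes = eval(expr, restricted_globals, {})

print(builtin_blocked, len(classes) > 0)
```

Once arbitrary classes are reachable, the expression is no longer confined to constant data, which is why a whitelist of allowed syntax is generally preferred over a blacklist of names.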

-Peter
 

Michael Spencer

Fredrik said:
try this:

eval("'*'*1000000*2*2*2*2*2*2*2*2*2")

I updated the safe eval recipe I posted yesterday to add the option of reporting
unsafe source, rather than silently ignoring it. Is this completely safe? I'm
interested in feedback.

Michael

Some source to try:
>>> goodsource = """[1, 2, 'Joe Smith', 8237972883334L, # comment
... {'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
... 'xyzzy', [3, 5, [3.14159, 2.71828, []]]]"""
...

Unquoted string literal
>>> badsource = """[1, 2, JoeSmith, 8237972883334L, # comment
... {'Favorite fruits': ['apple', 'banana', 'pear']}, # another comment
... 'xyzzy', [3, 5, [3.14159, 2.71828, []]]]"""
...

>>> safe_eval(goodsource)
[1, 2, 'Joe Smith', 8237972883334L, {'Favorite fruits': ['apple', 'banana',
'pear']}, 'xyzzy', [3, 5, [3.1415899999999999, 2.71828, []]]]

>>> safe_eval(badsource)
Traceback (most recent call last):
[...]
Unsafe_Source_Error: Line 1. Strings must be quoted: JoeSmith

>>> safe_eval(badsource, fail_on_error = False)
[1, 2, None, 8237972883334L, {'Favorite fruits': ['apple', 'banana', 'pear']},
'xyzzy', [3, 5, [3.1415899999999999, 2.71828, []]]]

Non-constant expression
>>> safe_eval("'*'*1000000*2*2*2*2*2*2*2*2*2")
Traceback (most recent call last):
[...]
Unsafe_Source_Error: Line 1. Unsupported source construct: compiler.ast.Mul

>>> safe_eval("'*'*1000000*2*2*2*2*2*2*2*2*2", fail_on_error = False)
...
'*'

Source:

import compiler  # Python 2 only; the compiler package is gone in Python 3

class Unsafe_Source_Error(Exception):
    def __init__(self,error,descr = None,node = None):
        self.error = error
        self.descr = descr
        self.node = node
        self.lineno = getattr(node,"lineno",None)

    def __repr__(self):
        return "Line %d. %s: %s" % (self.lineno, self.error, self.descr)
    __str__ = __repr__

class AbstractVisitor(object):
    def __init__(self):
        self._cache = {} # dispatch table

    def visit(self, node,**kw):
        # Dispatch on the node's class name, e.g. Const -> visitConst
        cls = node.__class__
        meth = self._cache.setdefault(cls,
            getattr(self,'visit'+cls.__name__,self.default))
        return meth(node, **kw)

    def default(self, node, **kw):
        # Visits the first child only; returns None if there are no children
        for child in node.getChildNodes():
            return self.visit(child, **kw)
    visitExpression = default

class SafeEval(AbstractVisitor):
    # Evaluates constants and containers; anything else falls to default()

    def visitConst(self, node, **kw):
        return node.value

    def visitDict(self,node,**kw):
        return dict([(self.visit(k),self.visit(v)) for k,v in node.items])

    def visitTuple(self,node, **kw):
        return tuple(self.visit(i) for i in node.nodes)

    def visitList(self,node, **kw):
        return [self.visit(i) for i in node.nodes]

class SafeEvalWithErrors(SafeEval):
    # Same as SafeEval, but reports unsupported constructs instead of skipping them

    def default(self, node, **kw):
        raise Unsafe_Source_Error("Unsupported source construct",
            node.__class__,node)

    def visitName(self,node, **kw):
        raise Unsafe_Source_Error("Strings must be quoted",
            node.name, node)

    # Add more specific errors if desired


def safe_eval(source, fail_on_error = True):
    walker = fail_on_error and SafeEvalWithErrors() or SafeEval()
    try:
        ast = compiler.parse(source,"eval")
    except SyntaxError, err:
        raise
    try:
        return walker.visit(ast)
    except Unsafe_Source_Error, err:
        raise
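
The recipe above depends on the Python 2 compiler package. A rough Python 3 equivalent of the same whitelist idea can be sketched with the standard ast module (Python 3.8 or later assumed). This is an adaptation rather than Michael's code: the names UnsafeSourceError and the visit_* methods follow ast.NodeVisitor conventions, and only the fail-on-error behaviour is reproduced.

```python
import ast


class UnsafeSourceError(Exception):
    def __init__(self, error, descr=None, node=None):
        super().__init__(error, descr)
        self.error = error
        self.descr = descr
        self.lineno = getattr(node, "lineno", None)

    def __str__(self):
        return "Line %s. %s: %s" % (self.lineno, self.error, self.descr)


class SafeEval(ast.NodeVisitor):
    """Evaluate constants, lists, tuples and dicts; reject everything else."""

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_Constant(self, node):
        return node.value

    def visit_Dict(self, node):
        return {self.visit(k): self.visit(v)
                for k, v in zip(node.keys, node.values)}

    def visit_Tuple(self, node):
        return tuple(self.visit(e) for e in node.elts)

    def visit_List(self, node):
        return [self.visit(e) for e in node.elts]

    def visit_Name(self, node):
        raise UnsafeSourceError("Strings must be quoted", node.id, node)

    def generic_visit(self, node):
        # Any node type without an explicit visit_* method ends up here.
        raise UnsafeSourceError("Unsupported source construct",
                                type(node).__name__, node)


def safe_eval(source):
    tree = ast.parse(source, mode="eval")
    return SafeEval().visit(tree)


good = safe_eval("[1, 2, 'Joe Smith', {'Favorite fruits': ['apple']}, (3.5, [])]")

errors = []
for bad in ("[1, 2, JoeSmith]", "'*'*1000000*2*2*2*2*2*2*2*2*2"):
    try:
        safe_eval(bad)
    except UnsafeSourceError as err:
        errors.append(str(err))
```

Because unary minus is an operator node, this sketch also rejects negative numbers such as -1; a visit_UnaryOp method would be needed to allow them.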
 

Sion Arrowsmith

Fredrik Lundh said:
try this:

eval("'*'*1000000*2*2*2*2*2*2*2*2*2")

No thanks.

I guess my problem is a tendency to view security issues from the
point of view of access to data rather than access to processing.
 
