BGB
one issue with using XML for structured data is its relative verbosity,
especially in cases where it is entered by hand or read by a human (say,
for debugging).
so, the thought here would be to allow a "modest" syntax extension
(which would probably be limited to the particular implementations that
support it).
more specifically, I was considering it as a possible extension feature
for my own implementation, but have some doubts given that, yes, this
would be a non-standard extension. note that there would probably be a
switch to manually "enable" it, so as to avoid breaking compatibility
by default. in my case, the current primary use is for things like
compiler ASTs, where it competes somewhat with the use of S-Expressions
for ASTs (Lisp style, not the "Rivest" variant / name-hijack). note that
these ASTs normally never leave the application which created them, so
the impact of using a non-standard syntax when serializing them is
likely fairly small.
for example, say that a person has an expression like:
<if>
  <cond>
    <binary op="&gt;">
      <ref name="x"/>
      <number value="3"/>
    </binary>
  </cond>
  <then>
    <funcall name="foo">
      <args/>
    </funcall>
  </then>
</if>
representing, say, the AST of the statement "if(x>3)foo();".
the parser and printer could use a more compact encoding, say:
<if
which would be regarded as functionally-equivalent to the prior
expression (and would generate equivalent DOM trees when read back in).
with the following rules:
<tag>...</tag> and <tag/> are the same as before.
while:
<tag <...> ...>
would use an alternate parsing strategy: seeing another "<" before the
opening tag has ended (the magic here) means that the child elements
follow directly inside the tag, and the ">" which finally closes the tag
is significant, since it indicates the end of the whole expression.
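as a rough illustration of the rule above, here is a minimal sketch of a recursive-descent reader in Python. this is not the implementation being discussed; the names `parse` and `_Reader` are made up for the example. it assumes that the compact form holds only child elements (no text), that attribute values are double-quoted, and that names carry no namespace prefix, and it ignores comments, CDATA, and processing instructions. it builds ordinary `xml.etree.ElementTree` elements, so both forms end up as the same kind of tree.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import unescape

_NAME = re.compile(r"[A-Za-z_][\w.\-]*")
_ATTR = re.compile(r'\s+([A-Za-z_][\w.\-]*)\s*=\s*"([^"]*)"')
_WS = re.compile(r"\s*")


class _Reader:
    """Hypothetical reader for the extended syntax (a sketch, not a full XML parser)."""

    def __init__(self, text):
        self.s, self.i = text, 0

    def skip_ws(self):
        self.i = _WS.match(self.s, self.i).end()

    def expect(self, tok):
        if not self.s.startswith(tok, self.i):
            raise SyntaxError(f"expected {tok!r} at offset {self.i}")
        self.i += len(tok)

    def element(self):
        self.expect("<")
        m = _NAME.match(self.s, self.i)
        if m is None:
            raise SyntaxError(f"expected a tag name at offset {self.i}")
        elem, self.i = ET.Element(m.group()), m.end()
        while (m := _ATTR.match(self.s, self.i)) is not None:
            elem.set(m.group(1), unescape(m.group(2)))
            self.i = m.end()
        self.skip_ws()
        if self.s.startswith("/>", self.i):      # <tag/>
            self.i += 2
        elif self.s.startswith("<", self.i):     # <tag <...> ...>: compact form
            while self.s.startswith("<", self.i):
                elem.append(self.element())
                self.skip_ws()
            self.expect(">")                     # this ">" ends the whole element
        else:                                    # <tag>...</tag>
            self.expect(">")
            self.content(elem)
            self.expect("</" + elem.tag)
            self.skip_ws()
            self.expect(">")
        return elem

    def content(self, parent):
        last = None  # most recent child, so that later text becomes its tail
        while not self.s.startswith("</", self.i):
            if self.s.startswith("<", self.i):
                last = self.element()
                parent.append(last)
                continue
            end = self.s.find("<", self.i)
            if end < 0:
                raise SyntaxError(f"missing </{parent.tag}>")
            text = unescape(self.s[self.i:end])
            self.i = end
            if not text.strip():
                continue  # assumption: whitespace-only runs are dropped
            if last is None:
                parent.text = (parent.text or "") + text
            else:
                last.tail = (last.tail or "") + text


def parse(text):
    """Parse one element written in either form and return an ET.Element."""
    reader = _Reader(text)
    reader.skip_ws()
    root = reader.element()
    reader.skip_ws()
    if reader.i != len(text):
        raise SyntaxError(f"unexpected data at offset {reader.i}")
    return root
```

with this sketch, `parse('<foo <bar <baz/>>>')` and `parse('<foo><bar><baz/></bar></foo>')` serialize to the same bytes under `ET.tostring`, which is the "equivalent trees" property the extension would need.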
similarly, maybe "<[[" could be parsed as a shorthand for "<![CDATA["
(which would also match more nicely with the closing bracket "]]>").
note that it would be possible to mix the two element forms, as in:
<foo> <bar <baz/>> </foo>
and:
<foo <bar> <baz/> </bar>>
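for the printer side, a minimal sketch in Python of how a tree might be written out in the compact form. the function name `to_compact` is hypothetical, and the behavior for text content is an assumption, since the rules above only cover child elements: here, any element that carries text falls back to the standard `<tag>...</tag>` form. the input is the AST for "if(x>3)foo();" from the earlier example, written as standard XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def to_compact(elem):
    """Serialize an ET.Element, using <tag <child/> ...> where possible."""
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in elem.attrib.items())
    kids = list(elem)
    has_text = bool((elem.text or "").strip()) or any(
        (k.tail or "").strip() for k in kids
    )
    if has_text:
        # assumption: text content forces the standard form for this element
        body = escape(elem.text or "") + "".join(
            to_compact(k) + escape(k.tail or "") for k in kids
        )
        return f"<{elem.tag}{attrs}>{body}</{elem.tag}>"
    if not kids:
        return f"<{elem.tag}{attrs}/>"
    # compact form: children sit inside the tag, and ">" ends the element
    return f"<{elem.tag}{attrs} " + " ".join(to_compact(k) for k in kids) + ">"


STANDARD = (
    '<if><cond><binary op="&gt;"><ref name="x"/><number value="3"/>'
    '</binary></cond><then><funcall name="foo"><args/></funcall></then></if>'
)

if __name__ == "__main__":
    print(to_compact(ET.fromstring(STANDARD)))
    # prints:
    # <if <cond <binary op="&gt;" <ref name="x"/> <number value="3"/>>> <then <funcall name="foo" <args/>>>>
```

the output is a single line here; a real printer would presumably add line breaks and indentation, which the compact form tolerates since whitespace between child elements is not significant in this sketch.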
maybe a different "name" would also be a good idea, like "XEML" or
similar, so as to reduce possible confusion with standard XML.
any thoughts or relevant information to look at?...