Interplatform (interprocess, interlanguage) communication

Lew

BGB said:
it is not like one can't have both:

XML is much easier to modify and maintain when flexibility is a requirement.

have a format which is at the same time a compressed binary format,
and can also retain the full flexibility of representing free-form XML
semantics, ideally without a major drop in compactness (this happens
with WBXML, and IIRC should also happen with EXI about as soon as one
starts encoding nodes which lie outside the schema).

That rather defeats the purpose of having a schema.

A schema is a contract that the various processes or other stakeholders use to
guarantee correctness of the XML and guide processing. If you develop it /ad
hoc/ you lose that contract.

this is partly why I was advocating a sort of pattern-building adaptive
format: it can build the functional analogue of a schema as it encodes
the data, and likewise does not depend on a schema to properly decode
the document. it is mostly a matter of having the format predict when it
doesn't need to specify tag and attribute names (it is otherwise similar
to a traditional data-compressor).

I'm sure that's very clever, but it defeats the purpose of XML schema.

this is functionally similar to the sliding-window as used in deflate
and LZMA (7zip) and similar (in contrast to codebook-based data
compressors). functionally, it would have a little more in common with
LZW+MTF than with LZ77 though.

.... and now you're off on some weird tangential topic.

granted, potentially a binary format could incorporate both support for
schemas and the use of adaptive compression.
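
(a minimal sketch of the kind of adaptive tag-name coding being described, in
Java; the class and its methods are hypothetical illustration, not SBXE or any
real format. Known tag names are sent as a small index, new names are sent
literally, and a move-to-front list keeps recently used tags cheap:)

    // hypothetical sketch: adaptive tag-name coding via a move-to-front list.
    // a tag seen before is emitted as an index; a new tag is emitted literally.
    class TagNameCoder {
        private final java.util.LinkedList<String> mtf = new java.util.LinkedList<>();

        // returns an index >= 0 for a known name, or -1 meaning "literal follows"
        int encode(String tag) {
            int idx = mtf.indexOf(tag);
            if (idx >= 0) {
                mtf.remove(idx);   // move-to-front: recent tags get small indices
                mtf.addFirst(tag);
                return idx;
            }
            mtf.addFirst(tag);     // first occurrence: the name itself goes on the wire
            return -1;
        }

        // the decoder maintains the same list, so no schema is needed to decode
        String decode(int code, String literalIfNew) {
            if (code < 0) { mtf.addFirst(literalIfNew); return literalIfNew; }
            String tag = mtf.remove(code);
            mtf.addFirst(tag);
            return tag;
        }
    }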


is XML really the text, or is it actually the structure?
Huh?

I had operated under the premise that it was the data-structure (tags,
attributes, namespaces, ...), which allows for pretty much anything
which can faithfully encode the structure (without imposing too many
arbitrary restrictions).

Huh?

XML is a formal specification for structured documents that is devoid of
semantics.
fair enough, I have mostly been using it "internally", and as noted, for
some of my file-formats, I had used a custom binary coded variant
(roughly similar to WBXML, but generally more compact and supporting
more features, such as namespaces and similar, which I had called SBXE).
it didn't make use of schemas, and worked by simply encoding the tag
structure into the file, and using basic contextual modeling strategies.

Bully. Good on ye.
it also compared favorably with XML+GZ in my tests (which IIRC was also
generally smaller than WBXML). remotely possible would also be XML+BZip2
or XML+LZMA.

Compared "favorably" according to what criteria?
I had considered the possibility of a more "advanced" format (with more
advanced predictive modeling), but didn't bother (couldn't see much
point at the time in trying to shave off more bytes, as it was already
working fairly well).
Huh?


well, a lot depends...

for disk files, really, who cares?...
for a link where a several kB message might only take maybe 250-500ms
and is at typical "user-interaction" speeds (say, part of a generic "web
app"), likewise, who cares?...


it may matter a little more in a 3D interactive world where everything
going on in the visible scene has to get through at a 10Hz or 24Hz
clock-tick, and if the connection bogs down the user will be rather
annoyed (as their game world has essentially stalled).

And that's a use case for XML how, exactly?

Saying "XML is bad because it doesn't keep bananas ripe" would be equally
relevant.
one may have to make do with about 16-24kB/s (or maybe less) to better
ensure a good user experience (it is not as if the user is guaranteed a
perfect internet connection either).

so, some sort of compression may be needed in this case.
(yes, XML+GZ would probably be sufficient).

Back in the universe where we're discussing XML's suitability, please.
if it were dial-up, probably no one would even consider using XML for
the network protocol in a 3D game.

Oh, you're talking about inter-node communication in a distributed game. Thanks
for finally making that clear. XML would be just fine as a transmission protocol for such a thing. I'm not saying ideal, but just fine.

If you're talking about network protocols you certainly are not talking about
frame-by-frame transmission of data with reply at 10 Hz, no matter what the
protocol, so your entire argument against XML for such a thing is moot.
it is possible, it all depends.

a swaying factor in my last choice was the effort tradeoff of writing
the code (because working with DOM is kind of a pain...). IIRC, I may
have also been worrying about performance (mostly passing around lots of
numeric data as ASCII strings, ...).

Huh? again. There's very little effort in writing XML code, whether DOM, JAXB,
SAX or StAX, given the wide availability of libraries to do so.

Based on what measurements?
but, I may eventually need to throw together a basic encoding scheme for
this case (a binary encoder for list-based data), that or just reuse an
existing data serializer of mine (mostly intended for generic data
serialization, which supports lists). it lacks any sort of prediction or
context modeling though, and is used in my stuff mostly as a container
format for bytecode for my VM and similar.



who knows?...

Anyone who thinks about it realistically.
probably delivering the best reasonable user experience?...

That's not a cost, that's a goal.
for a game:
reasonably good graphics;
reasonably good performance (ideally, consistently over 30fps);
hopefully good gameplay, plot, story, ...

well, that and "getting everything done" (this is the hard one).

Those aren't costs. Those are goals.

Clear conclusions require clear reasoning on actual facts with relevance.
 
Arved Sandstrom

On 2/9/2012 3:24 AM, Arved Sandstrom wrote: [ SNIP ]
Consider line-oriented files/messages like .properties files: these can
describe hierarchical structures perfectly well if you've got an
understood key=value syntax, specifically with a hierarchy-supporting
syntax for the keys. Easy to read and edit, easy to parse.

yes, but this defeats your own prior point, namely indirectly asserting
that line-oriented == flat-structure.

Minor quibble, I didn't make such a point, not even indirectly. You may
be confusing me with Arne.
point is, one can have hierarchical line-oriented files.
[ SNIP ]

Yes.

AHS
 
BGB

On 2/9/2012 3:24 AM, Arved Sandstrom wrote: [ SNIP ]
Consider line-oriented files/messages like .properties files: these can
describe hierarchical structures perfectly well if you've got an
understood key=value syntax, specifically with a hierarchy-supporting
syntax for the keys. Easy to read and edit, easy to parse.

yes, but this defeats your own prior point, namely indirectly asserting
that line-oriented == flat-structure.

Minor quibble, I didn't make such a point, not even indirectly. You may
be confusing me with Arne.

ok. both names started with 'Ar', so I guess I didn't notice the change...
point is, one can have hierarchical line-oriented files.
[ SNIP ]

Yes.

AHS
 
Arne Vajhøj

fair enough.

often, one can implement non-flat structures with line-oriented formats,
for example:
...
groupDef {
    ...
    groupDef {
        itemDef {
            ...
        }
        ...
    }
    ...
}

But then the parser becomes more complex than using the
builtin XML parser.
typically, I have not used validation:
if there is anything to validate, that logic is placed in the code that
parses the text.

But you get it for free with XML - you just need to enable
validation.
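
(in Java, "enabling validation" really is only a few lines with the JAXP APIs
shipped in the JDK; a sketch, assuming a schema file config.xsd:)

    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import org.w3c.dom.Document;
    import org.xml.sax.SAXParseException;
    import org.xml.sax.helpers.DefaultHandler;

    static Document parseValidated(File xml, File xsd) throws Exception {
        Schema schema = SchemaFactory
            .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(xsd);
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setSchema(schema);                    // validate while parsing
        DocumentBuilder db = dbf.newDocumentBuilder();
        db.setErrorHandler(new DefaultHandler() { // turn violations into exceptions
            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        return db.parse(xml);
    }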

Arne
 
Arne Vajhøj

As an example take a look at log4j .properties and XML configuration
files. All you gain with the XML is the ability to validate against a
log4j DTD.

And problems appending to an existing file ...
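
(for reference, the .properties flavor being compared looks roughly like this;
the hierarchy lives entirely in the dotted keys, and appending another logger
is one more line at the end of the file, which is exactly where the XML form,
with its closing root element, gets awkward:)

    # log4j 1.x configuration as line-oriented key=value pairs
    log4j.rootLogger=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d %p %c - %m%n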

Arne
 
Arne Vajhøj

On 2/8/2012 7:16 PM, Arne Vajhøj wrote:
On 2/8/2012 2:07 PM, BGB wrote:
On 2/8/2012 4:19 AM, Arved Sandstrom wrote:
On 12-02-08 04:41 AM, BGB wrote:
note: my main way of working with XML is typically via DOM-style
interfaces (if I am using it, it is typically because I am directly
working with the data structure, and not as the result of some
dumb-ass
"data binding" crud...).

I haven't been able to completely avoid using the DOM, but I
loathe the
API. If I'm using XML at all, and JAXB suits, I'll use JAXB. More
generally I'll use SAX or StAX.


I have rarely done things for which SAX has made sense...
usually in cases where SAX would make sense, I end up using
line-oriented text formats instead (because there is often little
obvious reason for why XML syntax would make much sense).

Non flat structure and validation comes to mind.

fair enough.

often, one can implement non-flat structures with line-oriented formats,
for example:
...
groupDef {
    ...
    groupDef {
        itemDef {
            ...
        }
        ...
    }
    ...
}
[ SNIP ]

No need for the braces, if you're going to use those all you gain over
the XML is terseness.

well, if the format is still line-oriented, one can still parse the
files using a loop, getting and splitting strings, and checking the
first token of each line.

parsing XML is a little more involved, since:
items may be split across lines, or multiple items may exist on the same
line;
one can no longer use whitespace or commas as the primary delimiter;
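
(a minimal sketch of that kind of loop, for the groupDef/itemDef layout shown
above; the method and its handling are hypothetical, assuming one construct
per line and whitespace as the delimiter:)

    // minimal recursive reader for the line-oriented format above:
    // one construct per line, the first token decides what it is
    static void parseBlock(java.io.BufferedReader in) throws java.io.IOException {
        String line;
        while ((line = in.readLine()) != null) {
            String[] tok = line.trim().split("\\s+");
            switch (tok[0]) {
                case "groupDef":
                case "itemDef":
                    parseBlock(in);  // recurse into the nested "{ ... }" body
                    break;
                case "}":
                    return;          // end of the current block
                default:
                    break;           // a plain item/value line would be handled here
            }
        }
    }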

????

No one in their right mind would parse XML manually.

You can pick between lots of nice XML API's (many of them
shipping with Java) that will handle all that.

Arne
 
Arne Vajhøj

On 2/9/2012 3:24 AM, Arved Sandstrom wrote: [ SNIP ]
Consider line-oriented files/messages like .properties files: these can
describe hierarchical structures perfectly well if you've got an
understood key=value syntax, specifically with a hierarchy-supporting
syntax for the keys. Easy to read and edit, easy to parse.

yes, but this defeats your own prior point, namely indirectly asserting
that line-oriented == flat-structure.

Minor quibble, I didn't make such a point, not even indirectly. You may
be confusing me with Arne.
point is, one can have hierarchical line-oriented files.
[ SNIP ]

Yes.

You can have non flat structures other than XML, but parsing
quickly becomes very complex.

Arne
 
Arne Vajhøj

yep, but one can wonder what is the gain of using a schema if one is
just going to use "xsd:any"?...

You still have some structure.
it is also a mystery how well EXI behaves in this case (admittedly, I
have not personally looked into EXI in-depth, as I only briefly skimmed
over the spec a long time ago).

No idea. But I would assume EXI supports what is valid XML and XSD.
well, there are some rules, but the question is more if a schema or the
use of validation would offer much advantage to make using it worth the
bother?...

Enforcing correctness of data is usually a good idea.

Arne
 
BGB

XML is much easier to modify and maintain when flexibility is a requirement.


That rather defeats the purpose of having a schema.

A schema is a contract that the various processes or other stakeholders use to
guarantee correctness of the XML and guide processing. If you develop it /ad
hoc/ you lose that contract.

there is no schema in use in this case, however...

hence, why the format would ideally need to be adaptive:
so one doesn't need a schema for it to work correctly, and also so the
use of free-form data will not hinder compression.

I'm sure that's very clever, but it defeats the purpose of XML schema.

which is not being used in this case to begin with...

... and now you're off on some weird tangential topic.

data compression...


as I see it, XML exists at 2 levels:
as a textual syntax;
as a semantic (a tree of tags with attributes and so on) structure,
which can be expressed via the textual syntax.

conceptually, the semantic structure of XML is more or less equivalent
to a tree of DOM nodes.

Huh?

XML is a formal specification for structured documents that is devoid of
semantics.

the existence of the tags and attributes is the semantics...

Bully. Good on ye.


Compared "favorably" according to what criteria?

smaller output size.

I was testing each scenario, and comparing the output sizes.
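
(that sort of baseline is easy to reproduce; a sketch of measuring the XML+GZ
size in Java, assuming the XML text is already in a byte array:)

    // measure the gzip'd size of an XML document held in memory,
    // for comparison against a binary format's output size
    static int gzippedSize(byte[] xmlBytes) throws java.io.IOException {
        java.io.ByteArrayOutputStream bos = new java.io.ByteArrayOutputStream();
        try (java.util.zip.GZIPOutputStream gz = new java.util.zip.GZIPOutputStream(bos)) {
            gz.write(xmlBytes);
        }
        return bos.size();
    }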

And that's a use case for XML how, exactly?

Saying "XML is bad because it doesn't keep bananas ripe" would be equally
relevant.

because one can use XML as the client/server messaging protocol, say, in
place of "well, I am going to send message tags as raw bytes and have
each followed by some values...".

Back in the universe where we're discussing XML's suitability, please.


Oh, you're talking about inter-node communication in a distributed game. Thanks
for finally making that clear. XML would be just fine as a transmission protocol for such a thing. I'm not saying ideal, but just fine.

If you're talking about network protocols you certainly are not talking about
frame-by-frame transmission of data with reply at 10 Hz, no matter what the
protocol, so your entire argument against XML for such a thing is moot.

both ends transmit concurrently and asynchronously, but one needs to get
the messages through at roughly 10Hz for things to remain playable
(otherwise, real-time interactivity starts to fall apart).


so, the server sends a 10Hz stream of updates to the client;
the client sends a 10Hz stream of movement impulses back to the server;
....

granted, ping time is a bit of an issue, as what actions the player is
trying to do, and what is going on at the servers' end, will invariably
drift somewhat.

typically, things like linear extrapolation and similar are used to try
to make up for sub-optimal ping (ideally, one tries to hide the results
of the ping time, where possible).
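
(the linear extrapolation mentioned is essentially dead reckoning; a sketch
with hypothetical names, guessing an entity's position from its last known
update:)

    // dead reckoning: estimate where an entity is now, given its last update
    static float[] extrapolate(float[] lastPos, float[] velocity,
                               double lastUpdateTime, double now) {
        float dt = (float) (now - lastUpdateTime); // seconds since the last update
        return new float[] {
            lastPos[0] + velocity[0] * dt,
            lastPos[1] + velocity[1] * dt,
            lastPos[2] + velocity[2] * dt
        };
    }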


but, it does all work, as evidenced by the prevalence of online gaming
and similar.


I doubt anyone uses ping/pong or request/response based protocols for
this, as the ping times over the internet would likely render something
like this unusable (one would probably need to be on a LAN or something...).

Huh? again. There's very little effort in writing XML code, whether DOM, JAXB,
SAX or StAX, given the wide availability of libraries to do so.

well, the issue is that one needs a method call every time they want to
fetch an attribute's value or look up a node, which is a little more
painful than it could be.

it isn't like major pain or anything, but it does tend to result in
slightly longer and more awkward code.

with lists, traditionally there are operations like:
"cadr", "caddr", "cadddr", ..., "caadr", ..., and so on, which make it a
bit easier (and more compact) to reference particular items within a
list (since the operations essentially encode where to fetch the item from).

OTOH, with DOM one might end up with a chain of several statements to
access an item, and yet more statements if one is checking for null, so
it is just a little more awkward and verbose to work with, but granted
it is not like the difference is all that huge (IIRC, the performance
concern may well have been a bigger factor).
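
(the kind of chain being described, with made-up element and attribute names;
each hop is a method call plus a null/empty check:)

    // fetching a single attribute through the DOM, defensively
    static String findSpeed(org.w3c.dom.Document doc) {
        org.w3c.dom.NodeList ents = doc.getElementsByTagName("entity");
        if (ents.getLength() == 0) return null;
        org.w3c.dom.Element ent = (org.w3c.dom.Element) ents.item(0);
        String speed = ent.getAttribute("speed"); // "" if absent: another check
        return speed.isEmpty() ? null : speed;
    }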

Based on what measurements?

this case was based mostly on speculation that if I am creating piles of
new strings to pass numbers around, and I am passing a scene-graph
update on a 10Hz basis, most of which will become garbage immediately
afterwards, then creating all of those strings could get a little
expensive (mostly causing the garbage collector to start "doing its
thing" and reduce performance and similar).

I chose another option (namely lists) which had the option of passing
the numbers without allocating any memory on the heap.

Anyone who thinks about it realistically.


That's not a cost, that's a goal.


Those aren't costs. Those are goals.

not effectively achieving a goal is a cost...

Clear conclusions require clear reasoning on actual facts with relevance.

dunno.


a lot of time one works based on "the feel of the code" or "the feel of
the problem" or similar (if one "feels" that an option will lead to
suck, it often does lead to suck). one doesn't necessarily know what the
reasoning is, one can just follow along where it leads (it can be almost
like that of a physical sensation or similar, like "what does it feel
like the code wants to do here?").

also, estimating things based on past experiences and known behaviors
and "rules of thumb" and so on.

if one knows what something does, one can make an educated guess for
what it will do in a given situation.

everything else becomes mostly likelihoods and probabilities (like, how
likely is a good outcome, vs a sucky outcome, ...).


or such...
 
BGB

On 12-02-08 10:50 PM, BGB wrote:
On 2/8/2012 7:16 PM, Arne Vajhøj wrote:
On 2/8/2012 2:07 PM, BGB wrote:
On 2/8/2012 4:19 AM, Arved Sandstrom wrote:
On 12-02-08 04:41 AM, BGB wrote:
note: my main way of working with XML is typically via DOM-style
interfaces (if I am using it, it is typically because I am directly
working with the data structure, and not as the result of some
dumb-ass
"data binding" crud...).

I haven't been able to completely avoid using the DOM, but I
loathe the
API. If I'm using XML at all, and JAXB suits, I'll use JAXB. More
generally I'll use SAX or StAX.


I have rarely done things for which SAX has made sense...
usually in cases where SAX would make sense, I end up using
line-oriented text formats instead (because there is often little
obvious reason for why XML syntax would make much sense).

Non flat structure and validation comes to mind.

fair enough.

often, one can implement non-flat structures with line-oriented
formats,
for example:
...
groupDef {
    ...
    groupDef {
        itemDef {
            ...
        }
        ...
    }
    ...
}
[ SNIP ]

No need for the braces, if you're going to use those all you gain over
the XML is terseness.

well, if the format is still line-oriented, one can still parse the
files using a loop, getting and splitting strings, and checking the
first token of each line.

parsing XML is a little more involved, since:
items may be split across lines, or multiple items may exist on the same
line;
one can no longer use whitespace or commas as the primary delimiter;

????

No one in their right mind would parse XML manually.

You can pick between lots of nice XML API's (many of them
shipping with Java) that will handle all that.

depends on which language one is using at the time...

if one is using Java, then XML parsing is basically free.
if one is using C, then it is either "write some code to do it", or
suffer with a 3rd party library dependency (one might validly choose to
write the code themselves in this case).
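
(for the Java side, "basically free" means the parser ships with the JDK; a
minimal example:)

    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    // the whole parser comes with the JDK; no third-party dependency needed
    static Document load(File f) throws Exception {
        return DocumentBuilderFactory.newInstance()
                                     .newDocumentBuilder()
                                     .parse(f);
    }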


I don't expect it is all that uncommon for a person to switch between
several different languages, and maybe deal with the strengths and
weaknesses of whichever language they are using at the time.
 
Lew

BGB said:
depends on which language one is using at the time...

if one is using Java, then XML parsing is basically free.

This /is/ a Java newsgroup, as you might have noticed.
if one is using C, then it is either "write some code to do it", or
suffer with a 3rd party [sic] library dependency (one might validly choose to
write the code themselves in this case).

"Suffer"? The XML parsers for C are well-established, very reliable, and no
cause for suffering. Using a pejorative is not the same as establishing a
point.

There is nothing wrong with the third-party libraries, and the choice to
roll your own for C is rarely valid. You seem to suffer from NIH syndrome.
I don't expect it is all that uncommon for a person to switch between
several different languages, and maybe deal with the strengths and
weaknesses of whichever language they are using at the time.

Not usually in the same program. Your expectation lacks relevance here.
 
BGB

You still have some structure.

probably.



No idea. But I would assume EXI supports what is valid XML and XSD.

yes, it is just that, IIRC, EXI uses the schema to know how to
efficiently encode structures (values are directly coded), and falls
back to a more naive strategy (describing the encoded tags) if the
schema doesn't cover a given case.

admittedly, having only skimmed over the spec, I am not entirely certain
how EXI works (I would have to invest a bit more time in reading it).

note: even in the worst case, the output will still likely be tiny vs
textual XML.


more skimming... sudden mystery: if the format is a bitstream, why are
they apparently using a byte-aligned scheme for storing integers?...
(the cost here is that one has to then re-align with the next byte
boundary, potentially wasting on average several bits).

Enforcing correctness of data is usually a good idea.

potentially, but checking against schemas isn't free.
depending on the application, it could be hard to justify spending the
extra clock cycles (except maybe for debugging purposes or similar).

an issue with ASTs is that they come in several forms:
giant, like in the output of a C compiler, where many tasks tend towards
"expensive" (it may take easily anywhere from 250ms-1500ms to shove all
this stuff through the various compiler stages);
small, like in a script-language VM, where typically it is desirable
that compile times still be fairly fast, since a major strength of
scripting languages is trying to keep "eval" and similar fairly close to
free.

granted, one could debate the sanity of using XML for ASTs in the first
place, but this started originally as a historical accident in my case
(I was writing an interpreter, and it was what I had on-hand, actually:
I partly hacked an existing XML-RPC implementation into being a script
interpreter...). however, it doesn't seem to actually hurt performance
too badly (ironically, in my C compiler, much more time goes into the
preprocessor and tokenizer, which are far more efficient and more highly
optimized).

side note: the C compiler doesn't use a standard DOM, but rather a
highly specialized, but still DOM-like, system (and may still dump ASTs
as text-form XML for debugging reasons). it involves, among other
things, optimizations for numerical data (attributes may store numeric
data directly, vs needing to use a string) and large hash-tables and
chaining for look-ups, as well as specialized operations to reduce typing.

my current scripting VM, however, internally uses lists/s-expressions
(note: they are neither AST compatible, nor will C code work effectively
on my scripting VM). this was due to a later rewrite "switching over" (I
was also reusing a lot of parts from a prior Scheme interpreter of mine
for this one).


but, anyways, I am more left thinking schema-checking would probably
make sense more when either some sort of security is a concern, or maybe
when sending data "over the wire" between multiple parties.

inserting a schema check between ones' parser and ones' bytecode emitter
doesn't seem nearly as compelling.


I guess, if a person really wanted, they could write a schema for the
ASTs, but it is not clear how useful it would be to do so (since,
generally, apart from someone mucking around with the compiler
internals, there is little direct reason to know or care what is going
on in there...).


or such...
 
BGB

This /is/ a Java newsgroup, as you might have noticed.

yes, but this thread is also about cross-language message passing; one
may have to face the issue that at least one end will not be using Java.

this means, of course, that both ends will need to be able to deal with
both sending and receiving the data.

if one is using C, then it is either "write some code to do it", or
suffer with a 3rd party [sic] library dependency (one might validly choose to
write the code themselves in this case).

"Suffer"? The XML parsers for C are well-established, very reliable, and no
cause for suffering. Using a pejorative is not the same as establishing a
point.

There is nothing wrong with the third-party libraries, and the choice to
roll your own for C is rarely valid. You seem to suffer from NIH syndrome.

they introduce porting hassles:
does one bundle "libxml" with their app on Windows;
do they use MSXML and then deal with having to switch over to "libxml"
when building on Linux?
....

often, writing ones' own code to do something may be the fastest and
easiest option.


writing code to do something can also be a fun and entertaining
experience (giving oneself stuff to do, and then doing it, ...), and
also give ideas/experience which could be useful for other things.

granted, there is also the goal of getting things done in a timely
manner, so it is a tradeoff.

but, anyways, it is like asking a person never to write their own JPEG
loader/saver, or their own scripting-language compiler. yes, maybe a
person doesn't technically need to, but they may forsake potentially
valuable learning experiences (or the claim to having the skills to do so).

Not usually in the same program. Your expectation lacks relevance here.

so, then, a program written in a mix of 5 programming languages is
probably rare?...


but, anyways, whether or not it is within the same program was not the
issue:
it could be in multiple cooperating programs which share data, or in
different components (which merely share APIs or similar).
 
Lew

BGB said:
yes, but this thread is also about cross-language message passing; one
may have to face the issue that at least one end will not be using Java.

this means, of course, that both ends will need to be able to deal with
both sending and receiving the data.

This is the use case for which XML with schema excels. It is very nearly ideal
for the purpose. XML is semantically void with respect to the problem domain,
schemas provide a reliable contract for interpretation of the messages, they
provide a convenient human-readable format to ensure agreement by all
stakeholders, they drive the easy-to-use tools for XML-based message passing,
and such easy-to-use tools are abundantly available for every major platform
and computer language.

Your comments about different libraries' availability make an asset sound like
a problem. It's a *good* thing that there are so many libraries available. XML
itself provides the compatibility.
 
BGB

This is the use case for which XML with schema excels. It is very nearly ideal
for the purpose. XML is semantically void with respect to the problem domain,
schemas provide a reliable contract for interpretation of the messages, they
provide a convenient human-readable format to ensure agreement by all
stakeholders, they drive the easy-to-use tools for XML-based message passing,
and such easy-to-use tools are abundantly available for every major platform
and computer language.

yes, but it is the agreement on particular formats (say, that both
parties will use XML and have the contents laid out a particular way),
rather than the use of either schemas or validation, which allows for
said compatibility.

it is like claiming that people need to depend on standardized
dictionaries (and some sort of automatic word-use and grammar checker)
to be able to carry on a conversation, rather than, say, the
dictionaries existing as a means of recording agreed-upon word-use patterns.


or, like those people who go and claim that "math is reality" rather
than "math is a formalized system which can be used to describe
reality", and so on.

Your comments about different libraries' availability make an asset sound like
a problem. It's a *good* thing that there are so many libraries available. XML
itself provides the compatibility.

yep.


it is likewise for many common file-formats:
large numbers of people use them, write code to read and write them, ...
so, people go and write down how the file format works, such that others
can write things which can read and write the files.

luckily for everyone, most people can agree to use PNG and JPEG and so
on as well...
 
Lew

yes, but it is the agreement on particular formats (say, that both
parties will use XML and have the contents laid out a particular way),
rather than the use of either schemas or validation, which allows for
said compatibility.

Sure, and schemas give a simple, readable, clear and unambiguous means to
communicate the proposal and reach an agreement.

You might as well say that it's the intent of the carpenter that makes the
furniture, not the saw. This does not make the saw any less useful or valuable.
it is like claiming that people need to depend on standardized
dictionaries (and some sort of automatic word-use and grammar checker)
to be able to carry on a conversation, rather than, say, the
dictionaries existing as a means of recording agreed-upon word-use patterns.

No, it's nothing like that.

It is like having a dictionary to record the agreement. Following your logic,
we'd claim that a dictionary isn't useful because all it does is record an
agreement in a structured, easily-followed and standard manner.
or, like those people who go and claim that "math is reality" rather
than "math is a formalized system which can be used to describe
reality", and so on.

Huh? To make a math joke, you really are off on a tangent with that one.

What have you got against math people? Oh, and by the way, math is reality.
yep.


it is likewise for many common file-formats:
large numbers of people use them, write code to read and write them, ...
so, people go and write down how the file format works, such that others
can write things which can read and write the files.

luckily for everyone, most people can agree to use PNG and JPEG and so
on as well...

But by your logic, PNG and JPEG are not useful because all we have to do is
invent our own format and agree to use it and re-invent all the nifty (and
often free) useful tools that only work on standard formats like PNG and JPEG,
thus throwing away all the human-centuries of engineering and wisdom that went
into those standards simply because we believe we're more clever than anyone
else and can exist in a vacuum and don't need all those steenkeen' free, useful
tools.
 
BGB

Sure, and schemas give a simple, readable, clear and unambiguous means to
communicate the proposal and reach an agreement.

You might as well say that it's the intent of the carpenter that makes the
furniture, not the saw. This does not make the saw any less useful or valuable.

a saw is actually physically needed for the work to be done.


a more accurate example would likely be:
does the carpenter need a CNC milling machine?

the carpenter could just saw at the wood, and make something.
and he could draw up a diagram or make a blueprint or similar if he wanted.

but, demanding that a schema be used is about like asking that he write
the CNC program, and have the machine do it.

No, it's nothing like that.

It is like having a dictionary to record the agreement. Following your logic,
we'd claim that a dictionary isn't useful because all it does is record an
agreement in a structured, easily-followed and standard manner.

I was not saying dictionaries are not useful, only that one can carry on
a conversation without invoking one at every instant to validate what
one is saying.

a written specification for a file format will serve a similar purpose.
an XML schema could be considered a narrower machine-readable subset
of a file-format specification. although there are cases where it could
be useful to validate against the schema, it is not likely to be
worthwhile in every case.

Huh? To make a math joke, you really are off on a tangent with that one.

What have you got against math people? Oh, and by the way, math is reality.

grr, those people annoy me, especially for their whole "the theory is
too pure to be used for anything actually useful" thing (of believing
that physical reality is somehow inferior to "mathematical perfection"
or whatever...).

at least software does something, and has slightly less occurrence of
people going on endlessly about "perfection" and whatever else (or
getting all condescending and nit-picky about something being "not
sufficiently perfect enough", bleh...).


also, my reality happens to be made mostly out of matter, and "stuff".

matter is obvious enough: one can see it, one can eat it, ...
secondarily: software is "real enough", because one can run it, and one
can copy it around via drives or over the internet, ...

but, where is the "math": it is seemingly nowhere to be found, and seems
mostly just to boil down to people messing around with symbolic
notations and describing the behavior of systems otherwise made out of
matter.

IMO, it makes about as much sense as those people who believe reality is
made out of emotions, or perceptions, or morals, or is actually a huge
pile of laws and words, or whatever else.

(decided against writing a bunch of arguments for how each apparently
fails as a good basis for observable reality).

rather each is by some means built on top of reality:
emotions and perception being a byproduct of the brain (itself made out
of matter...);
morals being (probably) a byproduct of large-scale cost/benefit
tradeoffs (bad behavior -> bad results, and is a place where emotions
and economics seem to converge, ...);
and laws and words are a byproduct of language use and peoples' attempts
to organize things.

likewise, math would seem to be a byproduct of the analysis and
description of physical and mechanical systems.


not that all this stuff doesn't matter, just reality is (probably) not
made out of it.

also note: it is possible to believe in a reality made out of matter,
and also believe in religious stuff and similar as well (because, as I
see it, the belief that they necessarily conflict is probably also flawed).

( could go into the matter of "matter + religion + morals + rational
self-interest + free market + ...", but, I have probably been going off
on enough of a tangent already... )

But by your logic, PNG and JPEG are not useful because all we have to do is
invent our own format and agree to use it and re-invent all the nifty (and
often free) useful tools that only work on standard formats like PNG and JPEG,
thus throwing away all the human-centuries of engineering and wisdom that went
into those standards simply because we believe we're more clever than anyone
else and can exist in a vacuum and don't need all those steenkeen' free, useful
tools.

this is missing the point. to write ones' own code is not the same as to
forsake using an existing standardized file format.


I do use a lot of standardized formats, just I often feel little need to
use others' implementations of those formats.

for example, I have my own implementations of PNG, JPEG, Deflate, ...
granted, I didn't really "need" to do so, but often to use a library
means either creating an annoying external dependency issue, or needing
to drag around the library, when often one can get by just writing a
much smaller and more narrowly focused piece of code to deal with it.
 
Arved Sandstrom

On 12-02-10 08:10 PM, BGB wrote:
[ SNIP ]
this is missing the point. to write ones' own code is not the same as to
forsake using an existing standardized file format.

I do use a lot of standardized formats, just I often feel little need to
use others' implementations of those formats.

for example, I have my own implementations of PNG, JPEG, Deflate, ...
granted, I didn't really "need" to do so, but often to use a library
means either creating an annoying external dependency issue, or needing
to drag around the library, when often one can get by just writing a
much smaller and more narrowly focused piece of code to deal with it.
Apart from a situation where you are genuinely resource-constrained and
need to slim down the library in question [1], I don't see those factors
as justifying the effort. "External dependency"? You've already got one
- you depend on the file format specification. So would you rather spend
the (usually substantial) time understanding the spec and implementing
the format, or have other folks do it for you?

And "drag around the library"? Who are you kidding? Look at the size of
libtiff libraries on a typical Linux or Unix system, and then look at
the supported API: you think the library is bloated? You think the
effort is justified to understand the TIFF spec well enough to pick out
just the bits you need, so you can build your own library? Or look at
the Javadoc API for iText 5.1.3: http://api.itextpdf.com/itext/. You
think the 1.6 MB size of the core iText JAR is so indefensible that it's
worth your time to understand the PDF spec well enough to write your own
library for just the bits you need?

It's possible a few times in your career to adopt a new file format so
early that nobody else has a decent library for it. Or the only decent
ones are commercial, as another possibility. This is quite rare, though.

AHS

1. Possible, I suppose, if someone is asking you to do miracles with a
dinky low-end microcontroller.
 
BGB

On 12-02-10 08:10 PM, BGB wrote:
[ SNIP ]
this is missing the point. to write ones' own code is not the same as to
forsake using an existing standardized file format.

I do use a lot of standardized formats, just I often feel little need to
use others' implementations of those formats.

for example, I have my own implementations of PNG, JPEG, Deflate, ...
granted, I didn't really "need" to do so, but often to use a library
means either creating an annoying external dependency issue, or needing
to drag around the library, when often one can get by just writing a
much smaller and more narrowly focused piece of code to deal with it.
Apart from a situation where you are genuinely resource-constrained and
need to slim down the library in question [1], I don't see those factors
as justifying the effort. "External dependency"? You've already got one
- you depend on the file format specification. So would you rather spend
the (usually substantial) time understanding the spec and implementing
the format, or have other folks do it for you?

it depends some...

but, anyways, depending on the format is not a dependency, since the
code doesn't care about the format spec. maybe the programmer does when
they implement it, but this doesn't matter for the program.


what is a dependency is whether or not the library exists on the user's
system. if one needs a library, and it is not there already, well then,
the app isn't going to work (hence why one would end up having to bundle
such libraries with the app, ...).


the main issue is also copy/pasting around a bunch of extra source-code,
and dealing with making sure it all builds, some of which may have
annoying legal terms if used this way: worse if it is GPL (though GPL is
generally annoying all around in these regards). some other libraries
have requirements that one mention the library and its authors in the
credits, ...


in the case of JPEG, it was more effort probably to skim through the
spec than write the code to load/save the format (mostly because the
JPEG spec is overly long-winded, and most of its relevant contents could
be probably boiled down to a few pages).

the only real difficult part of PNG is Deflate (yes, also handled by
zlib or similar, if one wants to worry about it).

it might be a little easier to "sell" someone on using all of these
libraries if they were all aggregated into a single library (much like
"libavcodec" in the case of audio/video codecs).

And "drag around the library"? Who are you kidding? Look at the size of
libtiff libraries on a typical Linux or Unix system, and then look at
the supported API: you think the library is bloated? You think the
effort is justified to understand the TIFF spec well enough to pick out
just the bits you need, so you can build your own library? Or look at
the Javadoc API for iText 5.1.3: http://api.itextpdf.com/itext/. You
think the 1.6 MB size of the core iText JAR is so indefensible that it's
worth your time to understand the PDF spec well enough to write your own
library for just the bits you need?

TIFF: not sure why someone would want TIFF support in the first place,
so no real comment here. apparently it is mostly for people who want
48 bit color depth or something.

I have not considered PDF loading or saving (not terribly relevant in my
case).

an LWO loader might be nice, given I haven't gotten around to writing
one yet (but, the observant may notice: even if a 3rd party LWO loader
was used, it wouldn't probably load into the mesh-format my engine uses
already, making it essentially pointless). not that it really matters:
if I really cared much about LWO, I probably would have had a loader for
it already.

It's possible a few times in your career to adopt a new file format so
early that nobody else has a decent library for it. Or the only decent
ones are commercial, as another possibility. This is quite rare, though.

or, one might develop their own file-formats as well, without being
chained to the cult of "does a library already exist for that?..."


another power of writing ones' own code is that there is control over
what is done and why. with a 3rd party library, one may be stuck with
whatever way *they* chose to do something, impeding ones' own freedom to
do it differently and to try out alternate possibilities.


more so, writing code is fairly cheap.

but, anyways, most of the stuff where people are worrying about writing
code oneself, is typically in regards to trivia.

what is there really to gain from doing all of the hard parts of app
development, by actually writing the app, but then spending inordinate
time worrying about not re-implementing functionality which exists in
libraries.


probably, if a typical programmer can go read a spec for a file format,
throw something together, and have everything working ok in maybe a few
hours or so, what really is the problem? it could very well end up being
more time and effort working out differences between the library's API
and however the app does things internally.

it may even be the case that using the library would end up with one
writing more code than just doing it oneself more directly...


but, whatever, people can try to micro-optimize their productivity or
whatever if they want (ultimately, so long as one does stuff and gets
stuff done, it is probably good enough regardless of whether or not it
is the "most efficient" regarding programmer-time or whatever...).

doesn't hurt programmers too much, given it gives something to do,
especially if one is being paid by the hour, or by the kloc (arguably,
it is a win-win situation, either way the employer gets code, and the
employee gets money).

then one is all on the job, "keeping it real" and "doing their thing"
and similar.

AHS

1. Possible, I suppose, if someone is asking you to do miracles with a
dinky low-end microcontroller.

mostly it is about writing 3D engines for desktop PCs which work on both
Windows and Linux (though Windows is the much higher priority).
 
Arved Sandstrom

On 12-02-10 08:10 PM, BGB wrote:
[ SNIP ]
this is missing the point. to write ones' own code is not the same as to
forsake using an existing standardized file format.

I do use a lot of standardized formats, just I often feel little need to
use others' implementations of those formats.

for example, I have my own implementations of PNG, JPEG, Deflate, ...
granted, I didn't really "need" to do so, but often to use a library
means either creating an annoying external dependency issue, or needing
to drag around the library, when often one can get by just writing a
much smaller and more narrowly focused piece of code to deal with it.
Apart from a situation where you are genuinely resource-constrained and
need to slim down the library in question [1], I don't see those factors
as justifying the effort. "External dependency"? You've already got one
- you depend on the file format specification. So would you rather spend
the (usually substantial) time understanding the spec and implementing
the format, or have other folks do it for you?

it depends some...

but, anyways, depending on the format is not a dependency, since the
code doesn't care about the format spec. maybe the programmer does when
they implement it, but this doesn't matter for the program.

There are different types of dependencies. Which ones matter more? If
you are contractually bound to implement a given specification, I
guarantee you that except for the most trivial specs that you will spend
more time understanding the requirements than you will coding them up.
I'd call that a real dependency.

But I get that you mean only compile/link implementation dependencies. OK.
what is a dependency is whether or not the library exists on the user's
system. if one needs a library, and it is not there already, well then,
the app isn't going to work (hence why one would end up having to bundle
such libraries with the app, ...).

Sure. Or at a higher level if it's a managed or interpreted program,
does the user have the runtime or interpreter at all, let alone a
correct version. Your most elegant and compact program might be a Python
or Ruby or Windows Powershell script, but if the target user can't run
it, what's the point?

This is universal though. Like in the examples above, does the target
user have the right interpreters? If running C# or Java, do they have a
sufficiently recent runtime? Are the right versions of framework
libraries present? For C or C++ similar: what libraries exist? Do you
provide them yourself, or link them in? Do you go the GNU build route
and support only configure scripts and building from source? For Java or
C#, if using 3rd party libraries, how do you handle that? For
build/install mechanisms that support downloading of dependencies, like
Perl CPAN or Maven/Ivy or whatever, you have to configure all that.
Maybe you spend quality time configuring up a NSIS installer for
Windows, or a Mac OS X .pkg for use by Installer.

In the big scheme of things you've got enough effort devoted to all this
that I don't myself see how making use of a good 3rd party library
should be questioned...*for the reasons you are thinking of*. I can
certainly think of good reasons why a team would, and should, want to
debate the selection of a _given_ 3rd party library, but not because you
think it'll overly complicate your deployments.
the main issue is also copy/pasting around a bunch of extra source-code,
and dealing with making sure it all builds, some of which may have
annoying legal terms if used this way: worse if it is GPL (though GPL is
generally annoying all around in these regards). some other libraries
have requirements that one mention the library and its authors in the
credits, ...

Copying/pasting? !!! Annoying legal terms? !!! Mentioning folks in
credits? !!!

To borrow from Monday Night Football, "C'mon Man!" OK, granted, legal
requirements attached to candidate 3rd party libraries can be a blocker
(or a difficulty) when dealing with commercial software, but overall
this is quibbling. These are reasons you manufacture when you want to
roll your own code and won't be dissuaded.
in the case of JPEG, it was more effort probably to skim through the
spec than write the code to load/save the format (mostly because the
JPEG spec is overly long-winded, and most of its relevant contents could
be probably boiled down to a few pages).

Who cares about the quality of the spec? They are what they are. You
have to deal with them as is. Most of the W3C specs are way more turgid
and confusing than the image file format specs. Point being, you had to
read some of the spec - at *some point* - in order to load/save a legal
version of a JPEG file. That's my point: *someone* has to read and
understand as much of a spec as is needed to accomplish Task X, and why
would you want to do that if someone else did it for you?

I just glanced at the JPEG/JFIF and JPEG/EXIF file format specs, and I
gotta tell you, if you think that either of those are overly long-winded
then you haven't read very many specs. And "relevant contents...boiled
down to a few pages"??? Relevant to whom? You? There are other people
who use these file formats, and they may be interested in supporting
most or all of the spec. It sounds like to me that you know that *your*
JPEGs are a consistent small slice of the spec, and you want to write
code that only supports the BGB JPEG subset.

Good luck to the maintainers of your code after you leave.

[ SNIP ]
TIFF: not sure why someone would want TIFF support in the first place,
so no real comment here. apparently it is mostly for people who want
48 bit color depth or something.

Bit parochial, aren't we? TIFF usage is quite huge actually, in scads of
domains...but maybe not in your little niche. And it's got nothing to do
with 48 bit colour depth.

I happen to encounter TIFF a great deal, and it's almost always B/W or
grayscale when *I* do. But certain advantages of TIFF also carry over to
colour.
I have not considered PDF loading or saving (not terribly relevant in my
case).

Maybe it's not. JPEG support isn't particularly relevant to me. Point
being, if you did have to support programmatic creation or editing or
reading/display of PDFs, would you roll your own code? In 2012? That
would be insane.

[ SNIP ]
or, one might develop their own file-formats as well, without being
chained to the cult of "does a library already exist for that?..."

I have no intrinsic problem with that first bit. I like well-designed
file formats, and I've concocted a few of my own. A custom file format
can be the best thing to do as part of a solution.

I don't quite see how the first statement leads to the second, the one
about cults. I fully agree that if someone picked an unsuitable file
format simply because it had a library to accompany it, that that would
be questionable. But you're advancing a stronger argument, that even
when you've selected a suitable file format that does have a suitable
library, that you'd often prefer to dispense with the library, and write
your own code.
another power of writing ones' own code is that there is control over
what is done and why. with a 3rd party library, one may be stuck with
whatever way *they* chose to do something, impeding ones' own freedom to
do it differently and to try out alternate possibilities.

Is that a problem with libjpeg, say? You are free to use what source you
like from that codebase, make whatever changes you like, and
redistribute commercially without paying royalties. All you have to do
is assume blame and credit the original authors. Big deal.

Unless you think that their codebase is utter garbage then why not just
modify it?
more so, writing code is fairly cheap.

It is? How much do you get paid? You work for a company? What's their
overhead for keeping you on the books? Are you involved in services
work? What's the cost to the client of spending unnecessary extra time
in coding? Or do you produce product? What's the cost to the end user of
unnecessary extra coding time?

Do you subject the code to testing? Do you write developer and end user
documentation for it? Does *someone*?

It's new code, a brand new burden for maintenance programmers. Do you
factor in the cost of their time, down the road?

Coding ain't cheap. Not even relatively.
but, anyways, most of the stuff where people are worrying about writing
code oneself, is typically in regards to trivia.

??? I don't get that.
what is there really to gain from doing all of the hard parts of app
development, by actually writing the app, but then spending inordinate
time worrying about not re-implementing functionality which exists in
libraries.

probably, if a typical programmer can go read a spec for a file format,
throw something together, and have everything working ok in maybe a few
hours or so, what really is the problem? it could very well end up being
more time and effort working out differences between the library's API
and however the app does things internally.

What kind of file format are we talking about here? It's got to be
pretty twinky if someone can read the spec for it, *and* "throw
something together", *and* having it working "OK", all in a few hours.

What's "OK", anyways? You certainly didn't allow for a whole bunch of
time there to write comprehensive unit tests for the "something".
it may even be the case that using the library would end up with one
writing more code than just doing it oneself more directly...

Let's be real, one can always identify trivial cases where that's true,
but it's not a solid argument for the general case.
but, whatever, people can try to micro-optimize their productivity or
whatever if they want (ultimately, so long as one does stuff and gets
stuff done, it is probably good enough regardless of whether or not it
is the "most efficient" regarding programmer-time or whatever...).

You've mentioned "good enough" a few times in your posting history. It's
a venerable engineering concept, and can be carried over (_has been_
carried over) to software development.

Let's be clear: "good enough" in software development means [1] that the
product has sufficient benefits, has *no* critical problems, the
benefits sufficiently outweigh the problems, and further improvement is
more harmful than helpful.

In other words, "good enough" means that anything you consider doing has
an inadequate return on investment of time and money.

This is a pretty high bar, actually. It doesn't mean what most
developers seem to think it means. And I'm not convinced that your
development philosophy falls in line with "good enough".
doesn't hurt programmers too much, given it gives something to do,
especially if one is being paid by the hour, or by the kloc (arguably,
it is a win-win situation, either way the employer gets code, and the
employee gets money).

Doesn't help the client/consumer much, does it? We're all professional
developers here, aren't we? Don't they still matter?
then one is all on the job, "keeping it real" and "doing their thing"
and similar.


mostly it is about writing 3D engines for desktop PCs which work on both
Windows and Linux (though Windows is the much higher priority).

Desktop PCs: d'you think you're hurting for resources on a typical
desktop PC these days?

AHS

1. See http://www.satisfice.com/articles/good_enough_quality.pdf
 
