A little afternoon WTF

markspace · May 14, 2010

Mike said:
That is, that LF, CR, and CRLF are all translated to simple LF by a
conforming XML processor.

Hmm, you're right. The combination of CRLF is translated to a single
LF. In some ways that's worse. It seems to me it could be difficult in
many cases to remove a character already in a buffer. If you're using
something like String and not a stream, then that's a lot of places that
one character will have to be removed from the input, causing two
strings to have to be concatenated.

Mike Schilling · May 14, 2010

markspace said:
Hmm, you're right. The combination of CRLF is translated to a single
LF. In some ways that's worse. It seems to me it could be difficult
in many cases to remove a character already in a buffer. If you're
using something like String and not a stream, then that's a lot of
places that one character will have to be removed from the input,
causing two strings to have to be concatenated

If you're converting bytes to chars, dropping the CR is a very minor issue.
If not, dropping the CR is a bit of a performance issue, since it does
require copying characters around, whether to create a String (as in DOM),
or to present the client with an array of chars (as in SAX). In either
case, it's simpler than handling escapes (like &gt; for >).
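
To make the copying concrete, here is a minimal sketch of the normalization being discussed, assuming the whole input is already in a char array. The class and method names are made up for illustration and are not taken from any real XML parser. It compacts the buffer in place, so no second String or concatenation is needed:

```java
final class LineEndings {
    private LineEndings() {}

    /**
     * Rewrites buf[0..len) in place so that CRLF and a lone CR each
     * become a single LF. Returns the new logical length.
     */
    static int normalize(char[] buf, int len) {
        int out = 0;
        for (int in = 0; in < len; in++) {
            char c = buf[in];
            if (c == '\r') {
                // CRLF collapses to one LF, so skip the LF that follows.
                if (in + 1 < len && buf[in + 1] == '\n') {
                    in++;
                }
                c = '\n';
            }
            // out never overtakes in, so writing here is safe.
            buf[out++] = c;
        }
        return out;
    }

    public static void main(String[] args) {
        char[] buf = "a\r\nb\rc\n".toCharArray();
        int n = normalize(buf, buf.length);
        System.out.println(new String(buf, 0, n).equals("a\nb\nc\n"));
    }
}
```

A real parser reading in chunks would also have to cope with a CR that is the last char of one chunk and an LF that starts the next; this sketch ignores that case.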

Lew · May 14, 2010

Possibly so that all the XML is on line with nothing BUT xml,
possibly for grepping/searching purposes.

To which the empty string component contributes absolutely nothing whatsoever
at all in the least.

Tom said:
Oh god, i've [sic] just spotted another one: the hardcoded CRLF! This is a
linux [sic]-only project (up to and including developing on linux [sic] VMs - the
only time you'd ever look at this file would be on a linux [sic] machine),
and XML normalises all line breaks to LF anyway. Why would you do that?

Perhaps in multi-platform environment, the coder had occasion
to open up the XML in a windows [sic] text editor.

All but one of which handle LF-terminated lines just fine. And which tom
explained, in the passage you cited no less, is not the case here regardless.

Tom Anderson · May 14, 2010

Perhaps in multi-platform environment, the coder had occasion
to open up the XML in a windows text editor.

I doubt it's quite that, but it is surely along those lines - to do with
the fact that the person who wrote this is more used to Windows: to them,
CRLF is the standard line ending, and they probably wrote the above
reflexively, just as i would have incorrectly-ish used just LF if i was
using Windows for some reason.

tom

Tom Anderson · May 14, 2010

Well, I don't know if I can consider the SQL to be fine, not if you include
the hardcoded "id" value of 2057. I can see that it's evidently test data but
that still makes me queasy.

Oh, don't worry, that's there in the production version too!

tom

Roedy Green · May 14, 2010

What XML package? I'm not sure what you mean.

He may have experimented with various XML read/write packages, see
http://mindprod.com/jgloss/xml.html and had trouble generating
precisely what he wanted. He then decided to eschew XML packages (at
least for writing) since it is not particularly difficult to do
manually, and since you have precise control.

--
Roedy Green Canadian Mind Products
http://mindprod.com

Beauty is our business.
~ Edsger Wybe Dijkstra (born: 1930-05-11 died: 2002-08-06 at age: 72)

Referring to computer science.

Roedy Green · May 14, 2010

Possible, although it seems a bit of a stretch. There are string constants
all over that bit of the codebase, and i can't think of a single instance
of the ""+int trick in the entire system. I think it's pretty bad
practice, so i'd remember if i'd seen it. It's quite possible he learned
the idiom from that use, though.

The only other way I could see a "" + "legitimately" appearing in code
is when you convert some string to "" with global search replace. My
gut feel is the guy was just an idiot who "stuttered" for much the
same non-motive of other stuttering.

See http://mindprod.com/jgloss/stuttering.html

Tom Anderson · May 16, 2010

The only other way I could see a "" + "legitimately" appearing in code
is when you convert some string to "" with global search replace. My
gut feel is the guy was just an idiot who "stuttered" for much the
same non-motive of other stuttering.

See http://mindprod.com/jgloss/stuttering.html

Exactly. 'Stuttering' is a great name for this behaviour.

tom

Andreas Leitgeb · May 17, 2010

John B. Matthews said:
Much better! Double + is surely ungood.

I'd prefer this:

private static final String HEADER = ""
    + "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
    + "<initech:tps-report><initech:coversheet> etc";

I'm a bit over-biased towards simple diffs, therefore I prefer
if a line can easily be removed or added without also having to
change something in a neighbouring line. (that tends to spoil
tkdiff's presentation of the file's changes)
If the last line was expected to be volatile as well, I'd even
place the semicolon on a line by itself just below the +'es.
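
For anyone worried that the leading "" costs something at run time: it should not, since a concatenation of String literals is a compile-time constant expression and gets folded into one interned literal. A small sketch to check that (the class name and the shortened XML are just for illustration):

```java
final class ConstantFolding {
    private ConstantFolding() {}

    // Diff-friendly style: every real line starts with "+".
    static final String HEADER = ""
            + "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
            + "<initech:tps-report>";

    // The same text written as one literal.
    static final String SINGLE_LITERAL =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<initech:tps-report>";

    /** True if both constants refer to the very same String object. */
    static boolean sameObject() {
        return HEADER == SINGLE_LITERAL;
    }

    public static void main(String[] args) {
        System.out.println(sameObject());
    }
}
```

So the choice between the two layouts is purely about how the source reads and diffs.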

Andreas Leitgeb · May 17, 2010

Roedy Green said:
See http://mindprod.com/jgloss/stuttering.html

quote:
" String s = new String( "abc" );
" To avoid littering the constant pool with redundant Strings, this
" should be written:
" String s = "abc";

Huh? The first litters the heap with a redundant copy, but
how does it litter the constant pool?
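
A quick sketch of the distinction, as far as I understand it (class and method names are hypothetical): the literal "abc" is one pooled object however often it appears, while new String( "abc" ) makes an extra heap object with the same contents.

```java
final class StringCopies {
    private StringCopies() {}

    /** Two occurrences of the same literal share one pooled object. */
    static boolean literalsShared() {
        String a = "abc";
        String b = "abc";
        return a == b;
    }

    /** new String(...) yields a distinct object with equal contents. */
    static boolean copyIsDistinct() {
        String literal = "abc";
        String copy = new String("abc");
        return copy != literal && copy.equals(literal);
    }

    /** intern() maps the copy back to the pooled literal. */
    static boolean internReturnsPooled() {
        return new String("abc").intern() == "abc";
    }

    public static void main(String[] args) {
        System.out.println(literalsShared());
        System.out.println(copyIsDistinct());
        System.out.println(internReturnsPooled());
    }
}
```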

David Lamb · May 17, 2010

I'd prefer this:

private static final String HEADER = ""
    + "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
    + "<initech:tps-report><initech:coversheet> etc";

I'm a bit over-biased towards simple diffs,

I seem to recall, from about 30 years ago, the company I worked at had
similar formatting rules (details escape me) exactly so that simple text
diffs would work better. In your example the reason would have been
that the real first line of the string (line 2) shouldn't look any
different from the later ones, so that it would not change if one added
another string before it, e.g.
private static final String HEADER = ""
    + "some other new string"
    + "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
    + "<initech:tps-report><initech:coversheet> etc";

A simple diff would show a 1-line insertion, with no change to the other
lines.

The thing is, with current programming environments, is this still an
issue for anybody?

Lew · May 17, 2010

I seem to recall, from about 30 years ago, the company I worked at had
similar formatting rules (details escape me) exactly so that simple text
diffs would work better. In your example the reason would have been that
the real first line of the string (line 2) shouldn't look any different
from the later ones, so that it would not change if one added another
string before it, e.g.
private static final String HEADER = ""
    + "some other new string"
    + "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
    + "<initech:tps-report><initech:coversheet> etc";

A simple diff would show a 1-line insertion, with no change to the other
lines.

The thing is, with current programming environments, is this still an
issue for anybody?

The other thing is, with the rule that the <?xml ...?> line has to be first in
this case, would this trick ever actually have any value in this case?

Andreas Leitgeb · May 17, 2010

I'm a bit over-biased towards simple diffs,

I seem to recall, from about 30 years ago, the company I worked at had
similar formatting rules (details escape me) exactly so that simple text
diffs would work better. [...]
The thing is, with current programming environments, is this still an
issue for anybody?

I do use tkdiff a lot, myself, for code. It is a substantial part of my
programming environment. I make no claim about whether my p.e. is in any
way "typical", though. (It also involves vim and CVS)

Lew said:
The other thing is, with the rule that the <?xml ...?> line has to be first in
this case, would this trick ever actually have any value in this case?

It could(*) happen, that at some point, a new method is created that writes
the boilerplate itself (to avoid repetition among several xml-generating
parts of code), followed by the less boilerplate parts passed in as String.
In that case, the String itself shouldn't any longer contain the header,
so the first line would need to be removed from each such literal.

*: I won't make any guesses at how often/likely that would actually happen.
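
A rough sketch of what such a refactoring might look like. All names here are hypothetical; nothing like this is known to exist in the codebase under discussion:

```java
final class XmlDocs {
    private XmlDocs() {}

    // The boilerplate now lives in exactly one place.
    static final String XML_DECLARATION =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";

    /** Prepends the declaration to a body that no longer carries it. */
    static String withDeclaration(String body) {
        return XML_DECLARATION + body;
    }

    // A body literal after the change: its old first line is gone,
    // and with the leading "" style the diff is a one-line removal.
    static final String TPS_BODY = ""
            + "<initech:tps-report><initech:coversheet> etc";

    public static void main(String[] args) {
        System.out.println(withDeclaration(TPS_BODY));
    }
}
```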

Lars Enderin · May 17, 2010

On 17/05/2010 7:07 AM, Andreas Leitgeb wrote:
I'm a bit over-biased towards simple diffs,
I seem to recall, from about 30 years ago, the company I worked at had
similar formatting rules (details escape me) exactly so that simple text
diffs would work better. [...]
The thing is, with current programming environments, is this still an
issue for anybody?

I do use tkdiff a lot, myself, for code. It is a substantial part of my
programming environment. I make no claim about whether my p.e. is in any
way "typical", though. (It also involves vim and CVS)

I find emacs and its ediff indispensable in my programming environment.

Chris Riesbeck · May 17, 2010

Roedy said:
...My
gut feel is the guy was just an idiot who "stuttered" for much the
same non-motive of other stuttering.

See http://mindprod.com/jgloss/stuttering.html

Is that page correctly generated? The examples look fine but their
English captions are just "An," "A," "It," "A," "It" etc. as if only the
first word was being put into the page.

Jim Janney · May 17, 2010

Roedy Green said:
He may have experimented with various XML read/write packages, see
http://mindprod.com/jgloss/xml.html and had trouble generating
precisely what he wanted. He then decided to eschew XML packages (at
least for writing) since it is not particularly difficult to do
manually, and since you have precise control.

For what it's worth, here's some code I wrote in 2003. It's part of a
JUnit test for a custom XML parsing framework.

private final static String header = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

private final static String doc1 =
    header
    + "<root foo='bar' baz='quux'>"
    + "<property name='var' value='value'/>"
    + "<property name='nested' value='var'/>"
    + "yabba dabba doo"
    + "<nested blah='blah blah' subst='${var}' nested-subst='${${nested}}'/>"
    + "</root>";

I didn't use an XML writer because I didn't see any particular point in it.

David Lamb · May 17, 2010

The other thing is, with the rule that the <?xml ...?> line has to be
first in this case, would this trick ever actually have any value in
this case?

In this specific case I suppose you're saying there's no reasonable
expectation that the line would change. OTOH one wants one's
programmers to develop coding habits to the point where they're automatic.

Lew · May 17, 2010

David said:
In this specific case I suppose you're saying there's no reasonable
expectation that the line would change. OTOH one wants one's programmers
to develop coding habits to the point where they're automatic.

But not to the point where they give up thinking altogether.

Yes, you are correct about my conclusion.

John B. Matthews · May 17, 2010

Chris Riesbeck said:
Is that page correctly generated? The examples look fine but their
English captions are just "An," "A," "It," "A," "It" etc. as if only the
first word was being put into the page.

I see the same appearance in recent Safari and Firefox.

ClassCastException · May 18, 2010

I find emacs and its ediff indispensable in my programming environment.

Ewwww! Emacs!

Why do you use that waste of cycles instead of opening sixty xterms most
running vim instances like Real Men do?

ObJava: see the from-line.

Oh, all right. ObJava 2: I don't see a problem with hardcoding XML
instead of using StAX if it's just a little bit here and there versus the
complexity overhead of bringing in a whole additional tool, and with it
two whole additional dependencies (one of the project upon the tool, and
another of the programmers upon knowing that additional tool).
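
For comparison, here is roughly what the StAX route looks like, as a minimal sketch using javax.xml.stream from the JDK. The element names are simplified stand-ins (no namespace prefix) and the class name is made up:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class StaxHeader {
    private StaxHeader() {}

    /** Builds a tiny document; the writer escapes the text content. */
    static String build(String coversheetText) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartDocument("utf-8", "1.0");
            w.writeStartElement("tps-report");   // hypothetical element name
            w.writeStartElement("coversheet");   // hypothetical element name
            w.writeCharacters(coversheetText);
            w.writeEndElement();
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            // Wrapped so callers need not handle a checked exception.
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("a & b"));
    }
}
```

That is a dozen lines against one string constant, which is the trade-off: the writer buys correct escaping and well-formedness, the literal buys brevity.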

Er, that's more XML-and-generic-programming than Java. Okay, ObJava 3:
did anyone notice the further inconsistency that the TEST_DATA_QUERY *is*
declared "final" and *is* capitalized while "header" isn't?

In fact it's likely that the two different chunks of code had different
authors, based just on that.
