A critique of cgi.escape

  • Thread starter Lawrence D'Oliveiro

Jon Ribbens

it has *everything* to do with encoding of existing data into HTML
so it can be safely transported to, and recreated by, an HTML-aware
client.

I can't tell if you're disagreeing or not. You escape the character
"<" as the sequence of characters "&lt;", for example, because
otherwise the HTML user agent will treat it as the start of a tag and
not as character data. You will notice that the character encoding is
utterly irrelevant to this.

does the word "information set" mean anything to you?

You would appear to be talking about either game theory, or XML,
neither of which have anything to do with HTML.
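
Jon's claim that markup escaping is independent of the character encoding can be checked with a small sketch. It assumes Python 3, where `html.escape` took over from `cgi.escape` (removed in Python 3.8); `quote=False` is used to mimic the old default. The last step only illustrates one place where an encoding can matter (characters the target encoding cannot represent) and is not a restatement of anyone's argument in this thread.

```python
from html import escape

text = 'caf\u00e9 <b> & "x"'

# Escaping operates on characters, before any byte encoding is chosen.
# quote=False mimics the old cgi.escape default: only &, < and > are replaced.
escaped = escape(text, quote=False)
print(escaped)

# The escaped text can then be encoded however the page requires.
as_utf8 = escaped.encode("utf-8")
as_latin1 = escaped.encode("latin-1")
print(as_utf8.decode("utf-8") == as_latin1.decode("latin-1"))

# Assumption for illustration: a target encoding that cannot represent a
# character needs a numeric character reference instead.
as_ascii = escaped.encode("ascii", "xmlcharrefreplace")
print(as_ascii)
```

The markup-significant characters are handled identically in every case; only the non-ASCII character is affected by the choice of encoding.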
 

Fredrik Lundh

Jon said:
It's a pity he's being rude when presented with well-informed comment
then.

since when is the output of

import random, sys
messages = [
    "that's irrelevant",
    "then their code is broken already",
    "that's not good enough",
    "then their tests are broken already",
    "you're rude",
]
for x in xrange(sys.maxint):
    print random.choice(messages)

well-informed? heck, it doesn't even pass the turing test ;-)

</F>
 

Jon Ribbens

the same documentation tells people what function to use if they
want to quote *every-thing* that might need to be quoted, so if
people did actually understand everything that was written in a
reasonably clear way, this thread wouldn't even exist.

The fact that you don't understand that that's not true is the reason
you've been getting into such a muddle in this thread.
 

Fredrik Lundh

Jon said:
You would appear to be talking about either game theory, or XML,
neither of which have anything to do with HTML.

you see no connection between XML's concept of information set and
HTML? (hint: what's XHTML?)

</F>
 

Jon Ribbens

It's a pity he's being rude when presented with well-informed comment
then.

since when is the output of
[snip code]

well-informed? heck, it doesn't even pass the turing test ;-)

Since when did that bear any resemblance to what I have said?

Are you going to grow up and start addressing the substantial points
raised, rather than making puerile sarcastic remarks?

An apology from you would not go amiss.
 

Fredrik Lundh

Jon said:
The fact that you don't understand that that's not true is the reason
you've been getting into such a muddle in this thread.

it's a fact that it's not true that the documentation points to the function
that it points to ? exactly what definitions of the words "fact" and "true"
are you using here ?

</F>
 

Jon Ribbens

I notice that yet again you've snipped the substantial point and
failed to answer it, presumably because you don't know how.

you see no connection between XML's concept of information set and
HTML? (hint: what's XHTML?)

I am perfectly well aware of what XHTML is. If you're trying to make
a point, please get to it, rather than going off on irrelevant
tangents. What do XML Information Sets have to do with escaping
control characters in HTML?
 

Jon Ribbens

it's a fact that it's not true that the documentation points to the function
that it points to ? exactly what definitions of the words "fact" and "true"
are you using here ?

You misunderstand again. The second half of the sentence is the untrue
bit ("if people did ... understand ... this thread wouldn't even exist"),
not the first.
 

Fredrik Lundh

Jon said:
I notice that yet again you've snipped the substantial point and
failed to answer it, presumably because you don't know how.

cute.

What do XML Information Sets have to do with escaping control
characters in HTML?

figure out the connection, and you'll have the answer to your "substantial
point".

</F>
 

Jon Ribbens

figure out the connection, and you'll have the answer to your "substantial
point".

If you don't know the answer, you can say so y'know. There's no shame
in it.
 

Fredrik Lundh

Jon said:
If you don't know the answer, you can say so y'know.

I know the answer. I'm pretty sure everyone else who's actually read my posts
to this thread might have figured it out by now, too. But since you're still trying
to "win" the debate, long after it's over, I think it's safest to end this thread right
now. *plonk*
 

Jon Ribbens

I know the answer. I'm pretty sure everyone else who's actually
read my posts to this thread might have figured it out by now, too.
But since you're still trying to "win" the debate, long after it's
over, I think it's safest to end this thread right now. *plonk*

It's sad to see a grown man throw his toys out of his pram, just
because he's losing an argument...
 

Brian Quinlan

A summary of this pointless argument:

Why cgi.escape should be changed to escape double quote (and maybe
single quote) characters by default:
o escaping should be very aggressive by default to avoid subtle bugs
o over-escaping is not likely to harm most programs significantly
o people who do not read the documentation may be surprised by its
  behavior

Why cgi.escape should NOT be changed:
o it is currently used in lots of code and changing it will almost
  certainly break some of it, test suites at minimum, e.g.
    assert my_template_system("<p>{foo}</p>", foo='"') == '<p>"</p>'
o escaping attribute values is less common than escaping element
  text so people should not be punished with:
  - harder to read output
  - (slightly) increased file size
  - (slightly) decreased performance
o cgi.escape is not meant for serious web application development, so
  either roll your own (trivial) function to do escaping how you want
  it or use the one provided by your framework (if it is not automatic)
o the documentation describes the current behavior precisely and
  suggests solutions that provide more aggressive escaping, so arguing
  about surprising behavior is not reasonable
o it doesn't even make sense for an escape function to exist in the cgi
  module, so it should only be used by old applications for
  compatibility reasons


Cheers,
Brian
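
The quote-escaping point in the summary can be made concrete. The sketch below is a hand-written approximation of the Python 2 `cgi.escape` logic so that it runs on current Python; the name `old_cgi_escape` and the sample value are made up for illustration. `xml.sax.saxutils.quoteattr` is, as far as I can tell, the function the `cgi.escape` documentation referred readers to for attribute values.

```python
from xml.sax.saxutils import quoteattr


def old_cgi_escape(s, quote=False):
    """Approximation of Python 2's cgi.escape(s, quote=None)."""
    s = s.replace("&", "&amp;")  # must be done first
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s


value = 'x" onmouseover="alert(1)'

# Default behaviour: the double quote survives and closes the attribute early.
unsafe = '<input value="%s">' % old_cgi_escape(value)

# With quote=True the double quote is escaped and the attribute stays intact.
safe = '<input value="%s">' % old_cgi_escape(value, quote=True)

# quoteattr picks a delimiter itself, so no surrounding quotes are added here.
attr = "<input value=%s>" % quoteattr(value)

print(unsafe)
print(safe)
print(attr)
```

For element text the default output is fine; the difference only shows up once the value lands inside a quoted attribute.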
 

Paul Rubin

Brian Quinlan said:
o cgi.escape is not meant for serious web application development,

What is it meant for then? Why should the library ever implement
anything in a half-assed way unsuitable for serious application
development, if it can supply a robust implementation instead?

Your other points are reasonable. I like the idea of adding an option
to escape single quotes, but I don't care much what the defaults are.

I notice that the options for pickle.dump/dumps changed incompatibly
between Python 2.2 and 2.3, and nobody really cared.
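
For comparison with Paul's suggestion of an option covering single quotes: in Python 3 the replacement function `html.escape` takes a `quote` flag that defaults to true and covers both quote characters. A minimal sketch (the sample string is made up):

```python
from html import escape

s = """<a href='x'>"hi" & bye</a>"""

# Default (quote=True): &, <, > plus both " and ' are escaped.
default = escape(s)

# quote=False: only &, < and > are escaped, as cgi.escape did by default.
minimal = escape(s, quote=False)

print(default)
print(minimal)
```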
 

Jon Ribbens

A summary of this pointless argument:

Your summary seems pretty reasonable, but please note that later on,
the thread was not about cgi.escape escaping (or not) quote
characters (as described in your summary), but about Fredrik arguing,
somewhat incoherently, that it should have to take character encodings
into consideration.
 

Brian Quinlan

Paul said:
What is it meant for then? Why should the library ever implement
anything in a half-assed way unsuitable for serious application
development, if it can supply a robust implementation instead?

I'd have to dig through the revision history to be sure, but I imagine
that cgi.escape was originally only used in the cgi module (and there
only in its various print_* functions). Then it started being used by
other core Python modules, e.g. cgitb, DocXMLRPCServer.

The "mistake", if there was one, was probably that escape wasn't spelled
_escape and got documented in the LaTeX documentation system.

All of this is just speculation though.

Cheers,
Brian
 

George Sakkis

Lawrence said:
I don't have to be. Whoever the designer was, they had not properly thought
through the uses of this function. That's quite obvious already, to anybody
who works with HTML a lot. So the function is broken and needs to be fixed.

If you're worried about changing the semantics of a function that keeps the
same "cgi.escape" name, then fine. We delete the existing function and add
a new, properly-designed one. _That_ will be a wake-up call to all the
users of the existing function to fix their code.

Wow. Are you always that arrogant for things you know very little
about, or just plain stupid ?
 

Brian Quinlan

Jon said:
Your summary seems pretty reasonable, but please note that later on,
the thread was not about cgi.escape escaping (or not) quote
characters (as described in your summary), but about Fredrik arguing,
somewhat incoherently, that it should have to take character encodings
into consideration.

And, of course, about you telling people that their explanations are not
good enough :)

BTW, I am curious about how you do unit testing. The example that I used
in my summary is a very common pattern but would break if cgi.escape
changed its semantics. What do you do instead?

Cheers,
Brian
 

Jon Ribbens

And, of course, about you telling people that their explanations are not
good enough :)

I guess, if you mean the part of the thread which went "it'll break
existing code", "what existing code?" "existing code" "but what
existing code?" "i dunno, just, er, code" "ok *how* will it break it?"
"i dunno, it just will"?

BTW, I am curious about how you do unit testing. The example that I used
in my summary is a very common pattern but would break if cgi.escape
changed its semantics. What do you do instead?

To be honest I'm not sure what *sort* of code people test this way. It
just doesn't seem appropriate at all for web page generating code. Web
pages need to be manually viewed in web browsers, and validated, and
checked for accessibility. Checking they're equal to a particular
string just seems bizarre (and where does that string come from
anyway?)
 

Brian Quinlan

Jon said:
I guess, if you mean the part of the thread which went "it'll break
existing code", "what existing code?" "existing code" "but what
existing code?" "i dunno, just, er, code" "ok *how* will it break it?"
"i dunno, it just will"?

See below for a possible example.

To be honest I'm not sure what *sort* of code people test this way. It
just doesn't seem appropriate at all for web page generating code.

Well, there are dozens (hundreds?) of templating systems for Python.
Here is a (simplified/modified) unit test for my company's system (yeah,
we lifted some ideas from Django):

test.html
---------
<p>{foo | escape}</p>

test.py
-------
t = Template("test.html")
t['foo'] = 'Brian -> "Hi!"'
assert str(t) == '<p>Brian -&gt; "Hi!"</p>'

So how would you test our template system?
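
How an exact-string test like this reacts to a changed escaping default can be sketched as follows. `render` is a hypothetical stand-in for a template engine (it is not the system Brian describes), and `html.escape` from Python 3 stands in for `cgi.escape`.

```python
from html import escape


def render(template, escape_quotes, **values):
    """Hypothetical stand-in for a template engine with an escape filter."""
    escaped = {k: escape(v, quote=escape_quotes) for k, v in values.items()}
    return template.format(**escaped)


template = "<p>{foo}</p>"

# Old default: quotes are left alone, so the expected string matches.
old_style = render(template, False, foo='Brian -> "Hi!"')

# Changed default: quotes are escaped, so the same expected string no
# longer matches even though a browser would display identical text.
new_style = render(template, True, foo='Brian -> "Hi!"')

print(old_style)
print(new_style)
```

Both outputs are valid HTML with the same meaning; only a byte-for-byte comparison tells them apart.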

Web pages need to be manually viewed in web browsers, and validated, and
checked for accessibility.

True.

Checking they're equal to a particular
string just seems bizarre (and where does that string come from
anyway?)

Maybe, which is why I'm asking you how you do it. Some of our web
applications contain 100s of script-generated pages. Testing each one by
hand after making a change would be completely impossible. So we use
HTTP scripting for testing purposes, i.e. send this request, grab the
results, verify that the text in the element with id="username" equals
"Brian Quinlan", etc. The test also validates that each page is
well-formed. We also view each page at some point but not every time a
developer makes a change that might (i.e. everything) affect the entire
system.

Cheers,
Brian
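
The kind of check described above (compare the text of the element with id="username") can be sketched with the standard library parser. This is only an illustration, assuming Python 3; `TextById` and `text_of` are made-up names, and the HTTP request step is left out so the example stays self-contained.

```python
from html.parser import HTMLParser


class TextById(HTMLParser):
    """Collect the text of the first element carrying a given id.

    Simplified: assumes the target element does not contain nested
    elements with the same tag name.
    """

    def __init__(self, target_id):
        super().__init__(convert_charrefs=True)
        self.target_id = target_id
        self.open_tag = None  # tag name while inside the target element
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or self.open_tag is not None:
            return
        if dict(attrs).get("id") == self.target_id:
            self.open_tag = tag

    def handle_endtag(self, tag):
        if tag == self.open_tag:
            self.open_tag = None
            self.done = True

    def handle_data(self, data):
        if self.open_tag is not None:
            self.parts.append(data)


def text_of(page, element_id):
    parser = TextById(element_id)
    parser.feed(page)
    parser.close()
    return "".join(parser.parts)


page = '<html><body><p id="username">Brian Quinlan</p></body></html>'
print(text_of(page, "username"))

# Because the parser decodes character references, this check gives the
# same answer whether or not the quotes were escaped.
with_raw_quotes = '<p id="greeting">Brian -&gt; "Hi!"</p>'
with_escaped_quotes = '<p id="greeting">Brian -&gt; &quot;Hi!&quot;</p>'
print(text_of(with_raw_quotes, "greeting") == text_of(with_escaped_quotes, "greeting"))
```

A test written this way compares parsed content rather than raw markup, so it would not be affected by a change in which characters the escape function rewrites.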
 
