A critique of cgi.escape

Jon Ribbens · Sep 25, 2006

If the documentation isn't clear enough, that means the documentation
should be fixed.

Incorrect - documentation can and frequently does leave certain
behaviours undefined. This is deliberate and (among other things)
is to allow for the behaviour to change in future versions without
breaking backwards-compatibility.

Fredrik Lundh · Sep 25, 2006

Jon said:
It's up to me to decide whether or not an argument is good enough to
convince me, thank you very much.

not if you expect anyone to take anything you say seriously.

</F>

Jon Ribbens · Sep 25, 2006

not if you expect anyone to take anything you say seriously.

Now you're just being ridiculous. In this thread you have been rude,
evasive, insulting, vague, hypocritical, and have failed to answer
substantive points in favour of sarcastic and erroneous sniping - I'd
suggest it's you that needs to worry about being taken seriously.

Brian Quinlan · Sep 25, 2006

Jon said:
Now you're just being ridiculous. In this thread you have been rude,
evasive, insulting, vague, hypocritical, and have failed to answer
substantive points in favour of sarcastic and erroneous sniping - I'd
suggest it's you that needs to worry about being taken seriously.

Actually, at least in the context of this mailing list, Fredrik doesn't
have to worry about that at all. Why? Because he is one of the most
prolific contributors to the Python language and libraries and his
contributions have been of consistently high quality.

You, on the other hand, are "just some guy" and people don't have a lot
of incentive to convince you of anything.

I have no opinion on the actual debate though. Just trying to help with
the social analysis.

Cheers,
Brian

Jon Ribbens · Sep 25, 2006

Actually, at least in the context of this mailing list, Fredrik doesn't
have to worry about that at all. Why? Because he is one of the most
prolific contributors to the Python language and libraries

I would have hoped that people don't treat that as a licence to be
obnoxious, though. I am aware of Fredrik's history, which is why I
was somewhat surprised and disappointed that he was being so rude
and unpleasant in this thread. He is not living up to his reputation
at all. Maybe he's having a bad day ;-)

Georg Brandl · Sep 25, 2006

Jon said:
No, but if nobody else can find one either, that's a clue that maybe
it's safe to change.

Here's a point for you - the documentation for cgi.escape says that
the characters "&", "<" and ">" are converted, but not what they are
converted to.

It says "to HTML-safe sequences". That's reasonably clear without the need
to reproduce the exact replacements for each character.

If anyone doesn't know what is meant by this, he shouldn't really write apps
using the cgi module before doing a basic HTML course.

Or use the source.

Georg
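
For readers who would rather not dig through the source, the replacements under discussion can be sketched roughly as follows. This is a stand-in written for illustration (the name escape_sketch is hypothetical and this is not the library's code); it assumes the documented behaviour of converting "&", "<" and ">", plus the double quote when the optional flag is set.

```python
def escape_sketch(s, quote=False):
    """Hypothetical stand-in for cgi.escape, for illustration only."""
    # "&" has to be handled first; otherwise the "&" introduced by
    # "&lt;" and "&gt;" would itself be escaped a second time.
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        # Only on request: the double quote matters inside attribute
        # values that are delimited by double quotes.
        s = s.replace('"', "&quot;")
    return s


print(escape_sketch('a < b & "c"'))
print(escape_sketch('a < b & "c"', quote=True))
```

The only difference between the two calls is whether the double quotes survive unescaped.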

Dan Bishop · Sep 25, 2006

Fredrik said:
the "improvement with no downside" would bloat down the output for
everyone who's using the function in the intended way,

"Unless" "your" "CGI" "scripts" "output" "text" "like" "this," "I"
"think" "it's" "absurd" "to" "consider" "the" "bloat" "significant."

Jon Ribbens · Sep 25, 2006

It says "to HTML-safe sequences". That's reasonably clear without the need
to reproduce the exact replacements for each character.

If anyone doesn't know what is meant by this, he shouldn't really write apps
using the cgi module before doing a basic HTML course.

So would you like to explain the difference between &quot; and &#34; ,
or do you need to go on a "basic HTML course" first?

Lawrence D'Oliveiro · Sep 25, 2006

If I have a unicode string such as: u'\u201d' (right double quote), then I
want that encoded in my html as '&#8221;' (or &rdquo; but the numeric form
is better).

Right-double-quote is not an HTML special, so there's no need to quote it.
I'm only concerned here with characters that have special meanings in HTML
markup.

There should be a one-stop shop where I can take my unicode text and
convert it into something I can safely insert into a generated html page;
at present I need to call both cgi.escape and s.encode to get the desired
effect.

What you're really asking for is a version of cgi.escape that a) fixes the
bugs discussed in this thread, and b) copes with different encodings while
doing so.

To handle b), you would need to pass it some indication of what the encoding
of the string is. In any case, converting a literal right-double-quote to
&rdquo; is not relevant to the purpose of cgi.escape.

Lawrence D'Oliveiro · Sep 25, 2006

Lawrence is right that the escape method doesn't work the way he expects
it to.

Rewriting a library module simply because a developer is surprised is a
*very* bad idea.

I'm not surprised. Disappointed, yes. Verging on disgust at some comments in
this thread, yes. But "surprised" is what a lot of users of the existing
cgi.escape function are going to be when they discover their code isn't
doing what they thought it was.

It would break just about every web app out there that
uses the escape module...

How will it break them? Give an example.
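
One scenario in which callers might be surprised, sketched under assumptions: the value is placed inside a double-quoted attribute, and Python 3's html.escape is used here as a stand-in for cgi.escape (with quote=False it performs the same three replacements). The template and the input string are made up for illustration.

```python
import html

# Hypothetical user-supplied value containing a double quote.
user_input = '" onmouseover="alert(1)'
template = '<input type="text" value="%s">'

# Escaping without quote handling: the attribute value is closed early
# and the rest of the input becomes a new attribute.
unsafe = template % html.escape(user_input, quote=False)

# Escaping with quote handling: the input stays inside the value.
safe = template % html.escape(user_input, quote=True)

print(unsafe)
print(safe)
```

In element content the two variants render identically, which is why the difference only shows up in attribute values.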

Lawrence D'Oliveiro · Sep 25, 2006

and slows things down a bit.

(cgi.escape(s, True) is slower than cgi.escape(s), for reasons that are
obvious for anyone who's looked at the code).

What you're doing is adding to the reasons why the existing cgi.escape
function is stupidly designed and implemented. The True case is by far the
most common, so to make that the slow case, as well as being the
non-default case, is doubly brain-dead.

Lawrence D'Oliveiro · Sep 25, 2006

Jon Ribbens wrote:

Some examples are:

- Possibly any code that tests for string equality in a rendered
html/xml page.

You've got to be kidding. Any programmer knows that, to test two strings for
equality, you should do that on a canonical (non-encoded) representation.

- Code that generates cgi.escaped() markup and (rightfully) for some
reason expects the old behaviour to be used.

Whenever I use a channel-coding function, I expect the resulting output to
be only fit for feeding into the channel. I do NOT expect to do anything
else with it. Any kind of data manipulation I do, I do BEFORE feeding it
into the output channel, which means BEFORE putting it through the channel
coding.

- Third-party code that parses/scrapes content from cgi.escaped() markup.
(you could even break Java code this way :-s )

If that code follows the HTML rules, it will work.
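
A small sketch of the "compare on the canonical representation" point, assuming Python 3's html.unescape as the decoder (it is not part of the cgi module discussed here). Two differently escaped strings compare unequal as markup but equal once decoded.

```python
import html

# Two escaped forms of the same text: named and numeric references.
a = "say &quot;hi&quot; &amp; wave"
b = "say &#34;hi&#34; &#38; wave"

# Comparing the encoded forms is fragile.
print(a == b)

# Comparing the decoded (canonical) forms is not.
print(html.unescape(a) == html.unescape(b))
print(html.unescape(a))
```

Code that decodes before comparing is unaffected by which of the equivalent references the escaping function chooses.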

Lawrence D'Oliveiro · Sep 25, 2006

Sorry, that's still not good enough. Why would any code expect such a
thing?
that's not up to you to decide, though.

Yes it is. An HTML-quoting function converts a string to its HTML-compatible
representation. Since it is now HTML-compatible, any code that tries to
work with it afterwards has got to expect it to be HTML-compatible. Which
means it has to allow for what HTML allows.

Lawrence D'Oliveiro · Sep 25, 2006

you're not the designer...

I don't have to be. Whoever the designer was, they had not properly thought
through the uses of this function. That's quite obvious already, to anybody
who works with HTML a lot. So the function is broken and needs to be fixed.

If you're worried about changing the semantics of a function that keeps the
same "cgi.escape" name, then fine. We delete the existing function and add
a new, properly-designed one. _That_ will be a wake-up call to all the
users of the existing function to fix their code.

Gabriel G · Sep 26, 2006

I'm sorry, that's not good enough. How, precisely, would it break
"existing code"? Can you come up with an example, or even an
explanation of how it *could* break existing code?

FWIW, a *lot* of unit tests on *my* generated html code would break,
and I imagine a *lot* of other people's code would break too. So
changing the defaults is not a good idea.
But if you want, import this on sitecustomize.py and pretend it said
quote=True:

import cgi
# Python 2 only: make quote=True the default for cgi.escape
cgi.escape.func_defaults = (True,)
del cgi

Gabriel Genellina
Softlab SRL
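
The same trick of replacing a function's defaults can be tried on a local stand-in, shown here as a sketch. The function escape_sketch is hypothetical; `__defaults__` is the attribute name that works on current Python versions, where func_defaults no longer exists.

```python
def escape_sketch(s, quote=None):
    """Hypothetical stand-in for cgi.escape, for illustration only."""
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s


before = escape_sketch('"x"')          # default leaves the quotes alone

# Replace the tuple of default values; there is one defaulted
# parameter (quote), so the tuple has one element.
escape_sketch.__defaults__ = (True,)

after = escape_sketch('"x"')           # default now escapes the quotes
print(before, after)
```

Callers that pass quote explicitly are unaffected; only calls relying on the default change behaviour, which is exactly the class of code the thread is arguing about.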


Steven D'Aprano · Sep 26, 2006

Any change in Python that has these consequences will rightfully be
considered a bug. So what you are suggesting is to knowingly introduce a
bug in the standard library!

It isn't like there have never been backwards _in_compatible changes to
the standard library before.

Ten seconds of googling finds
http://www.python.org/download/releases/2.3/highlights/:

int() - this can now return a long when converting a string with many
digits, rather than raising OverflowError. (New in 2.3a2: issues a
FutureWarning when sign-folding an unsigned hex or octal literal.)

Bastion and rexec - these modules are disabled, because they aren't
safe in Python 2.3 (nor in Python 2.2). (New in 2.3a2.)

Hex/oct literals prefixed with a minus sign were handled
inconsistently. This has been fixed in accordance with PEP 237. (New
in 2.3a2.)

Passing a float to C functions expecting an integer now issues a
DeprecationWarning; in the future this will become a TypeError. (New
in 2.3a2.)

None - assignment to variables or attributes named None will now
trigger a warning. In the future, None may become a keyword.

And more, all from one release.

If the behaviour of cgi.escape is "broken", or incomplete, or misleading,
then Python has a great mechanism for introducing incompatible changes
slowly: warnings.

It isn't good enough to say that the function does what it says it does,
if what it does is dangerous and misleading. Artificial example:

def sqr(x):
    """Returns the square of almost all numbers."""
    if x != 1: return x**2
    else: return -1

The function does exactly what it says, and yet still has badly dangerous
behaviour that risks introducing serious bugs. If people are relying on
unit tests which include specific tests for that behaviour, then the
function and the code needs to be fixed in parallel. That's what the
warnings module is for.

So any arguments about "breaking code" are a red herring: if cgi.escape
does the wrong thing (and that's arguable), and code relies on that
behaviour, then the code is already broken and needs to be fixed in
parallel with the function. So can we accept that:

(1) *if* there is a problem with cgi.escape it needs to be fixed;

(and, dear gods, I would hope that nobody here wants to argue that Python
should make backwards compatibility a higher virtue than correctness!)

(2) it doesn't need to be fixed *immediately* without warning;

(3) but it can be fixed through a gradual process with warning; and

(4) unit tests and code that expect the (presumed) bad behaviour can be
fixed gradually?

Now that we've got that out of the way, can we CALMLY and RATIONALLY
discuss whether cgi.escape is or isn't broken?

Or, more specifically, UNDER WHAT CIRCUMSTANCES it does the wrong thing?
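
What a gradual, warning-based change could look like is sketched below. Everything here is an assumption made for illustration: the function name, the sentinel, the warning text and the choice of FutureWarning are not taken from any actual proposal or from the standard library.

```python
import warnings

_UNSET = object()  # sentinel: lets us tell "not passed" from "passed False"


def escape_sketch(s, quote=_UNSET):
    """Hypothetical escape function with a warning-based transition."""
    if quote is _UNSET:
        # Caller relies on the default, which is the thing that would change.
        warnings.warn(
            "the default for 'quote' may change to True; pass it explicitly",
            FutureWarning,
            stacklevel=2,
        )
        quote = False  # keep the old behaviour for now
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    escape_sketch('"a"')              # warns: relies on the default
    escape_sketch('"a"', quote=True)  # silent: caller was explicit

print(len(caught), caught[0].category.__name__)
```

Existing code keeps working during the transition, and unit tests that pin the old output can be updated whenever their authors see the warning.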

Steve Holden · Sep 26, 2006

Jon said:
I would have hoped that people don't treat that as a licence to be
obnoxious, though. I am aware of Fredrik's history, which is why I
was somewhat surprised and disappointed that he was being so rude
and unpleasant in this thread. He is not living up to his reputation
at all. Maybe he's having a bad day ;-)

I generally find that Fredrik's rudeness quotient is satisfactorily
biased towards discouraging ill-informed comment. As far as rudeness
goes, I've found your approach to this discussion to be pretty
obnoxious, and I'm generally known as someone with a high tolerance for
idiotic behaviour.

If your intention was to troll you could not have crafted your
contributions in a better way.

regards
Steve

Dan Bishop · Sep 26, 2006

Lawrence said:
What you're doing is adding to the reasons why the existing cgi.escape
function is stupidly designed and implemented. The True case is by far the
most common, so to make that the slow case, as well as being the
non-default case, is doubly brain-dead.

How exactly would you make s = s.replace('"',"&quot;") faster than
*not* doing the replacement?

Duncan Booth · Sep 26, 2006

Lawrence D'Oliveiro said:
What you're doing is adding to the reasons why the existing cgi.escape
function is stupidly designed and implemented. The True case is by far
the most common, so to make that the slow case, as well as being the
non-default case, is doubly brain-dead.

It is slightly slower because it does more. Both cases are about 15 times
faster than the regular expression implementation someone posted to this
thread yesterday.
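
Anyone wanting to check the relative cost on their own machine could use a sketch like the one below. The regex variant is a hypothetical one written for this comparison, not the implementation posted earlier in the thread, and the sample string and repeat count are arbitrary; timings will vary, so none are quoted here.

```python
import re
import timeit


def escape_replace(s, quote=False):
    """Chained str.replace, in the style discussed in this thread."""
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s


_TABLE = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}
_PATTERN = re.compile(r'[&<>"]')


def escape_regex(s):
    """Hypothetical regex-based variant that always escapes quotes."""
    return _PATTERN.sub(lambda m: _TABLE[m.group(0)], s)


sample = '<a href="x?a=1&b=2">link</a> plain text ' * 50

# Both approaches must agree before timing them is meaningful.
assert escape_replace(sample, quote=True) == escape_regex(sample)

t_plain = timeit.timeit(lambda: escape_replace(sample), number=2000)
t_quote = timeit.timeit(lambda: escape_replace(sample, quote=True), number=2000)
t_regex = timeit.timeit(lambda: escape_regex(sample), number=2000)

print("replace, quote=False:", t_plain)
print("replace, quote=True: ", t_quote)
print("regex:               ", t_regex)
```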

Duncan Booth · Sep 26, 2006

Lawrence D'Oliveiro said:
Right-double-quote is not an HTML special, so there's no need to quote
it. I'm only concerned here with characters that have special meanings
in HTML markup.

There is no need to quote " or ' either except in particular situations.

Would you care to suggest how you get a right double quote into any
iso-8859-1 encoded web page without quoting it? Even if the page is utf-8
encoded quoting it can be a good idea.

What you're really asking for is a version of cgi.escape that a) fixes
the bugs discussed in this thread, and b) copes with different
encodings while doing so.

To handle b), you would need to pass it some indication of what the
encoding of the string is. In any case, converting a literal
right-double-quote to &rdquo; is not relevant to the purpose of
cgi.escape.

You don't seem to understand about html entity escapes. &rdquo; is a valid
way to express right double quote whatever the page encoding. There is no
need to know the encoding of the page in order to escape entities, just
escape anything which can be problematic.
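
A sketch of the two-step approach mentioned in this thread (escape the markup characters, then encode), assuming Python 3's html.escape as a stand-in for cgi.escape and the standard "xmlcharrefreplace" error handler. The sample text is made up.

```python
import html

text = "she said \u201dhi\u201d & left <early>"

# Step 1: escape the markup-significant characters. This must come
# first, because step 2 introduces "&" characters of its own.
escaped = html.escape(text, quote=False)

# Step 2: encode to ASCII, turning everything else into numeric
# character references.
ascii_bytes = escaped.encode("ascii", "xmlcharrefreplace")
print(ascii_bytes)

# Pure ASCII output decodes identically under either page encoding.
print(ascii_bytes.decode("iso-8859-1") == ascii_bytes.decode("utf-8"))
```

Because the result is plain ASCII, it can be dropped into a page served as iso-8859-1 or utf-8 without the escaping step needing to know which.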
