A critique of cgi.escape

  • Thread starter Lawrence D'Oliveiro

Jon Ribbens

If the documentation isn't clear enough, that means the documentation
should be fixed.

Incorrect - documentation can and frequently does leave certain
behaviours undefined. This is deliberate and (among other things)
is to allow for the behaviour to change in future versions without
breaking backwards-compatibility.
 

Fredrik Lundh

Jon said:
It's up to me to decide whether or not an argument is good enough to
convince me, thank you very much.

not if you expect anyone to take anything you say seriously.

</F>
 

Jon Ribbens

not if you expect anyone to take anything you say seriously.

Now you're just being ridiculous. In this thread you have been rude,
evasive, insulting, vague, hypocritical, and have failed to answer
substantive points in favour of sarcastic and erroneous sniping - I'd
suggest it's you that needs to worry about being taken seriously.
 

Brian Quinlan

Jon said:
Now you're just being ridiculous. In this thread you have been rude,
evasive, insulting, vague, hypocritical, and have failed to answer
substantive points in favour of sarcastic and erroneous sniping - I'd
suggest it's you that needs to worry about being taken seriously.

Actually, at least in the context of this mailing list, Fredrik doesn't
have to worry about that at all. Why? Because he is one of the most
prolific contributors to the Python language and libraries and his
contributions have been of consistent high quality.

You, on the other hand, are "just some guy" and people don't have a lot
of incentive to convince you of anything.

I have no opinion on the actual debate though. Just trying to help with
the social analysis :)

Cheers,
Brian
 

Jon Ribbens

Actually, at least in the context of this mailing list, Fredrik doesn't
have to worry about that at all. Why? Because he is one of the most
prolific contributors to the Python language and libraries

I would have hoped that people don't treat that as a licence to be
obnoxious, though. I am aware of Fredrik's history, which is why I
was somewhat surprised and disappointed that he was being so rude
and unpleasant in this thread. He is not living up to his reputation
at all. Maybe he's having a bad day ;-)
 

Georg Brandl

Jon said:
No, but if nobody else can find one either, that's a clue that maybe
it's safe to change.

Here's a point for you - the documentation for cgi.escape says that
the characters "&", "<" and ">" are converted, but not what they are
converted to.

It says "to HTML-safe sequences". That's reasonably clear without the need
to reproduce the exact replacements for each character.

If anyone doesn't know what is meant by this, he shouldn't really write apps
using the cgi module before doing a basic HTML course.

Or use the source.

Georg
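
For readers without the source to hand, the behaviour being argued about can be sketched roughly as follows. This is a stand-alone illustration, not the module's code: the name escape_sketch is hypothetical, and it assumes the replacements are the usual &amp;, &lt;, &gt; and, when quote is set, &quot;. The second half shows the attribute-value case that the quote flag exists for.

```python
def escape_sketch(s, quote=False):
    """Illustrative stand-in for the function discussed in this thread."""
    s = s.replace("&", "&amp;")  # must run first, or the entities below get re-escaped
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

# Text destined for a double-quoted attribute value.
text = 'x" onmouseover="alert(1)'
unquoted = '<input value="%s">' % escape_sketch(text)
quoted = '<input value="%s">' % escape_sketch(text, quote=True)

print(unquoted)  # the attribute is closed early by the embedded quote
print(quoted)    # the embedded quotes stay inside the attribute value
```

In element content both results are equivalent; only inside a quoted attribute does the difference matter.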
 

Dan Bishop

Fredrik said:
the "improvement with no downside" would bloat down the output for
everyone who's using the function in the intended way,

"Unless" "your" "CGI" "scripts" "output" "text" "like" "this," "I"
"think" "it's" "absurd" "to" "consider" "the" "bloat" "significant."
 

Jon Ribbens

It says "to HTML-safe sequences". That's reasonably clear without the need
to reproduce the exact replacements for each character.

If anyone doesn't know what is meant by this, he shouldn't really write apps
using the cgi module before doing a basic HTML course.

So would you like to explain the difference between " and &quot; ,
or do you need to go on a "basic HTML course" first?
 

Lawrence D'Oliveiro

If I have a unicode string such as: u'\u201d' (right double quote), then I
want that encoded in my html as '&#8221;' (or &rdquo; but the numeric form
is better).

Right-double-quote is not an HTML special, so there's no need to quote it.
I'm only concerned here with characters that have special meanings in HTML
markup.

There should be a one-stop shop where I can take my unicode text and
convert it into something I can safely insert into a generated html page;
at present I need to call both cgi.escape and s.encode to get the desired
effect.

What you're really asking for is a version of cgi.escape that a) fixes the
bugs discussed in this thread, and b) copes with different encodings while
doing so.

To handle b), you would need to pass it some indication of what the encoding
of the string is. In any case, converting a literal right-double-quote to
&#8221; is not relevant to the purpose of cgi.escape.
 

Lawrence D'Oliveiro

Lawrence is right that the escape method doesn't work the way he expects
it to.

Rewriting a library module simply because a developer is surprised is a
*very* bad idea.

I'm not surprised. Disappointed, yes. Verging on disgust at some comments in
this thread, yes. But "surprised" is what a lot of users of the existing
cgi.escape function are going to be when they discover their code isn't
doing what they thought it was.

It would break just about every web app out there that
uses the escape module...

How will it break them? Give an example.
 

Lawrence D'Oliveiro

and slows things down a bit.

(cgi.escape(s, True) is slower than cgi.escape(s), for reasons that are
obvious for anyone who's looked at the code).

What you're doing is adding to the reasons why the existing cgi.escape
function is stupidly designed and implemented. The True case is by far the
most common, so to make that the slow case, as well as being the
non-default case, is doubly brain-dead.
 

Lawrence D'Oliveiro

Jon Ribbens wrote:


Some examples are:

- Possibly any code that tests for string equality in a rendered
html/xml page.

You've got to be kidding. Any programmer knows that, to test two strings for
equality, you should do that on a canonical (non-encoded) representation.

- Code that generates cgi.escaped() markup and (rightfully) for some
reason expects the old behaviour to be used.

Whenever I use a channel-coding function, I expect the resulting output to
be only fit for feeding into the channel. I do NOT expect to do anything
else with it. Any kind of data manipulation I do, I do BEFORE feeding it
into the output channel, which means BEFORE putting it through the channel
coding.

- Third-party code that parses/scrapes content from cgi.escaped() markup.
(you could even break Java code this way :-s )

If that code follows the HTML rules, it will work.
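
The point about comparing canonical representations can be sketched as below. This is only an illustration using the html module from current Python versions, not anything available when this thread was written, and the helper name same_text is hypothetical. It decodes both strings before comparing, so the two spellings of a double quote compare equal.

```python
import html

def same_text(a, b):
    """Compare two HTML-escaped strings by their decoded (canonical) text."""
    return html.unescape(a) == html.unescape(b)

old_style = 'say "hi" &amp; wave'            # double quotes left alone
new_style = 'say &quot;hi&quot; &amp; wave'  # double quotes escaped

print(old_style == new_style)           # comparing the encoded forms directly
print(same_text(old_style, new_style))  # comparing the decoded text
```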
 

Lawrence D'Oliveiro

Sorry, that's still not good enough. Why would any code expect such a
thing?

that's not up to you to decide, though.

Yes it is. An HTML-quoting function converts a string to its HTML-compatible
representation. Since it is now HTML-compatible, any code that tries to
work with it afterwards has got to expect it to be HTML-compatible. Which
means it has to allow for what HTML allows.
 

Lawrence D'Oliveiro

you're not the designer...

I don't have to be. Whoever the designer was, they had not properly thought
through the uses of this function. That's quite obvious already, to anybody
who works with HTML a lot. So the function is broken and needs to be fixed.

If you're worried about changing the semantics of a function that keeps the
same "cgi.escape" name, then fine. We delete the existing function and add
a new, properly-designed one. _That_ will be a wake-up call to all the
users of the existing function to fix their code.
 

Gabriel G

I'm sorry, that's not good enough. How, precisely, would it break
"existing code"? Can you come up with an example, or even an
explanation of how it *could* break existing code?

FWIW, a *lot* of unit tests on *my* generated html code would break,
and I imagine a *lot* of other people's code would break too. So
changing the defaults is not a good idea.
But if you want, put this in sitecustomize.py and pretend it said
quote=True:

import cgi
cgi.escape.func_defaults = (True,)
del cgi
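
On Python 3 the attribute used above is spelled __defaults__ rather than func_defaults. A less invasive variant is to wrap the function instead of patching it. The sketch below shows both on a local stand-in function; the names escape_sketch and escape_quoted are hypothetical, and the stand-in assumes the usual entity replacements.

```python
import functools

def escape_sketch(s, quote=False):
    """Illustrative stand-in for the function discussed in this thread."""
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

# Option 1: leave the original alone and bind quote=True in a wrapper.
escape_quoted = functools.partial(escape_sketch, quote=True)

# Option 2: change the default in place (Python 3 spelling of func_defaults).
before = escape_sketch('"')           # uses the original default, quote=False
escape_sketch.__defaults__ = (True,)
after = escape_sketch('"')            # now behaves as if quote=True
```

The wrapper only affects code that chooses to call it, whereas patching the default changes behaviour for every caller in the process.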



Gabriel Genellina
Softlab SRL





 

Steven D'Aprano

Any change in Python that has these consequences will rightfully be
considered a bug. So what you are suggesting is to knowingly introduce a
bug in the standard library!


It isn't like there have never been backwards _in_compatible changes to
the standard library before.

Ten seconds of googling finds
http://www.python.org/download/releases/2.3/highlights/:

int() - this can now return a long when converting a string with many
digits, rather than raising OverflowError. (New in 2.3a2: issues a
FutureWarning when sign-folding an unsigned hex or octal literal.)

Bastion and rexec - these modules are disabled, because they aren't
safe in Python 2.3 (nor in Python 2.2). (New in 2.3a2.)

Hex/oct literals prefixed with a minus sign were handled
inconsistently. This has been fixed in accordance with PEP 237. (New
in 2.3a2.)

Passing a float to C functions expecting an integer now issues a
DeprecationWarning; in the future this will become a TypeError. (New
in 2.3a2.)

None - assignment to variables or attributes named None will now
trigger a warning. In the future, None may become a keyword.

And more, all from one release.

If the behaviour of cgi.escape is "broken", or incomplete, or misleading,
then Python has a great mechanism for introducing incompatible changes
slowly: warnings.

It isn't good enough to say that the function does what it says it does,
if what it does is dangerous and misleading. Artificial example:

def sqr(x):
    """Returns the square of almost all numbers."""
    if x != 1: return x**2
    else: return -1

The function does exactly what it says, and yet still has badly dangerous
behaviour that risks introducing serious bugs. If people are relying on
unit tests which include specific tests for that behaviour, then the
function and the code needs to be fixed in parallel. That's what the
warnings module is for.
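
As a rough sketch of what such a gradual change could look like, the function below warns only when the caller relies on the default. It is a hypothetical stand-in (escape_sketch and the _UNSET sentinel are names made up here), not a proposal for the actual module code, and the choice of FutureWarning is an assumption.

```python
import warnings

_UNSET = object()  # sentinel: lets us tell "not passed" apart from False

def escape_sketch(s, quote=_UNSET):
    """Illustrative stand-in that warns when the default for quote is relied on."""
    if quote is _UNSET:
        warnings.warn(
            "the default for 'quote' may change to True; pass it explicitly",
            FutureWarning,
            stacklevel=2,
        )
        quote = False  # keep the old behaviour during the transition
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

# Record warnings so the effect is visible without printing to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    implicit = escape_sketch('a "b"')               # warns, old behaviour
    explicit = escape_sketch('a "b"', quote=True)   # no warning
```

Callers who pass quote explicitly see no warning and no change; everyone else gets advance notice before the default flips.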

So any arguments about "breaking code" are a red herring: if cgi.escape
does the wrong thing (and that's arguable), and code relies on that
behaviour, then the code is already broken and needs to be fixed in
parallel with the function. So can we accept that:

(1) *if* there is a problem with cgi.escape it needs to be fixed;

(and, dear gods, I would hope that nobody here wants to argue that Python
should make backwards compatibility a higher virtue than correctness!)

(2) it doesn't need to be fixed *immediately* without warning;

(3) but it can be fixed through a gradual process with warning; and

(4) unit tests and code that expect the (presumed) bad behaviour can be
fixed gradually?

Now that we've got that out of the way, can we CALMLY and RATIONALLY
discuss whether cgi.escape is or isn't broken?

Or, more specifically, UNDER WHAT CIRCUMSTANCES it does the wrong thing?
 

Steve Holden

Jon said:
I would have hoped that people don't treat that as a licence to be
obnoxious, though. I am aware of Fredrik's history, which is why I
was somewhat surprised and disappointed that he was being so rude
and unpleasant in this thread. He is not living up to his reputation
at all. Maybe he's having a bad day ;-)

I generally find that Fredrik's rudeness quotient is satisfactorily
biased towards discouraging ill-informed comment. As far as rudeness
goes, I've found your approach to this discussion to be pretty
obnoxious, and I'm generally known as someone with a high tolerance for
idiotic behaviour.

If your intention was to troll you could not have crafted your
contributions in a better way.

regards
Steve
 

Dan Bishop

Lawrence said:
What you're doing is adding to the reasons why the existing cgi.escape
function is stupidly designed and implemented. The True case is by far the
most common, so to make that the slow case, as well as being the
non-default case, is doubly brain-dead.

How exactly would you make s = s.replace('"',"&quot;") faster than
*not* doing the replacement?
 

Duncan Booth

Lawrence D'Oliveiro said:
What you're doing is adding to the reasons why the existing cgi.escape
function is stupidly designed and implemented. The True case is by far
the most common, so to make that the slow case, as well as being the
non-default case, is doubly brain-dead.

It is slightly slower because it does more. Both cases are about 15 times
faster than the regular expression implementation someone posted to this
thread yesterday.
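
Anyone wanting to check the relative cost on their own machine could use something like the following. It is a sketch only: escape_sketch is a hypothetical stand-in built from chained str.replace calls, the sample string and repeat count are arbitrary, and the timings it prints will vary from system to system.

```python
import timeit

def escape_sketch(s, quote=False):
    """Illustrative stand-in for the function discussed in this thread."""
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")  # the one extra pass over the string
    return s

sample = 'a < b & "c" > d ' * 1000

t_plain = timeit.timeit(lambda: escape_sketch(sample), number=200)
t_quote = timeit.timeit(lambda: escape_sketch(sample, quote=True), number=200)

print("quote=False: %.4fs  quote=True: %.4fs" % (t_plain, t_quote))
```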
 

Duncan Booth

Lawrence D'Oliveiro said:
Right-double-quote is not an HTML special, so there's no need to quote
it. I'm only concerned here with characters that have special meanings
in HTML markup.

There is no need to quote " or ' either except in particular situations.

Would you care to suggest how you get a right double quote into any
iso-8859-1 encoded web page without quoting it? Even if the page is utf-8
encoded quoting it can be a good idea.

What you're really asking for is a version of cgi.escape that a) fixes
the bugs discussed in this thread, and b) copes with different
encodings while doing so.

To handle b), you would need to pass it some indication of what the
encoding of the string is. In any case, converting a literal
right-double-quote to &#8221; is not relevant to the purpose of
cgi.escape.

You don't seem to understand about html entity escapes. &#8221; is a valid
way to express right double quote whatever the page encoding. There is no
need to know the encoding of the page in order to escape entities, just
escape anything which can be problematic.
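
A minimal sketch of that kind of "one-stop" helper, assuming the caller has a unicode string and wants pure-ASCII output that is valid in any ASCII-compatible page encoding. The name html_text is hypothetical; the xmlcharrefreplace error handler is a standard codec option that replaces each unencodable character with a decimal character reference.

```python
def html_text(s, quote=True):
    """Escape HTML specials, then turn every non-ASCII character into &#NNNN;."""
    # Escape markup characters first so the '&' in the numeric references
    # produced below is not escaped a second time.
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    # Encoding to ASCII with xmlcharrefreplace yields bytes; decode back to str.
    return s.encode("ascii", "xmlcharrefreplace").decode("ascii")

result = html_text("He said \u201cyes\u201d & left")
print(result)
```

Because the output is plain ASCII, it does not need to know the encoding the page is eventually served in.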
 
