A critique of cgi.escape

Dennis Lee Bieber · Sep 27, 2006

And if the published documentation said you had to jump off a cliff to use
it, you would do that?

Don't know about the others, but I would conclude, from such
documentation, that the unit was unsuited to my purposes and that I
should look elsewhere...
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/

Georg Brandl · Sep 27, 2006

Anthony said:
The Complaints department is down the hall...

Though some discussion participants seemingly want to stay for more
being-hit-on-the-head lessons

Georg

Ben Finney · Sep 27, 2006

Georg Brandl said:
Though some discussion participants seemingly want to stay for more
being-hit-on-the-head lessons

No no, hold your head like this, and then go "waaagh". Try it again.

Brian Quinlan · Sep 27, 2006

John said:
You must be kidding.

Nope. How do you write your templating system unit tests?

Again, you must be kidding: href="/search.cgi?query=3&results=10"

Actually, I wasn't kidding. I was basing this belief on grepping through
the Python standard library where only the quote=None form is ever used.
It also matches my experience. But I don't have a large enough sample to
make any claim either way.

Cheers,
Brian

Jon Ribbens · Sep 27, 2006

For example, I do not validate a "page". I validate that all the methods
that make up pieces of a page build them the way they should - these
are our "unit tests". Then, it's up to the templating library to join
all the pieces into the final html page.

That sounds sensible to me - and also likely to be the sort of tests
that are not going to get broken by changes to cgi.escape ;-)

I validated the original html against the corresponding dtd some time
ago (using the w3c validator), and occasionally when things "look
wrong" in a browser, but most of the time the generated html pages
are not validated nor checked as a whole.

That's possibly a mistake, but obviously that depends on details of
how your overall methodology works that I have no information about.

Duncan Booth · Sep 27, 2006

Brian Quinlan said:
Actually, I wasn't kidding. I was basing this belief on grepping through
the Python standard library where only the quote=None form is ever used.
It also matches my experience. But I don't have a large enough sample to
make any claim either way.

A better sample might be to grep the Zope sources. There are a lot of calls
to escape and the vast majority don't set the quote parameter, but most use
of escape is actually hidden by the templating system. The TAL engine uses
escape(s,1) for attribute values and escape(text) for content, so you get
the best of both worlds: you don't have to think about which form of escape
you need (or even that you need to escape strings at all), and you don't
get quotes escaped when they don't have to be.
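
The two forms described above can be sketched roughly as follows. This is only an illustration, not TAL's actual code. It assumes Python 3, where html.escape is the successor to cgi.escape (note that its quote argument defaults to True, the opposite of cgi.escape), and the helper names escape_content and escape_attribute are made up for this example.

```python
import html

def escape_content(text):
    """Escape text placed between tags: only &, < and > are replaced."""
    return html.escape(text, quote=False)

def escape_attribute(value):
    """Escape text placed inside a quoted attribute value (quotes too)."""
    return html.escape(value, quote=True)

# Quotes are left alone in element content.
print(escape_content('say "hi" & wave'))
# The ampersand in a query string must be escaped inside an attribute.
print('<a href="%s">' % escape_attribute('/search.cgi?query=3&results=10'))
```

A templating layer that picks the right helper for each context is what lets callers stop thinking about the quote parameter.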

Stuart Bishop · Sep 28, 2006

Jon said:
Sorry, that's still not good enough. Why would any code expect such a
thing?

Plenty of test suites for a start. A non-backwards compatible change such as
the one being suggested can create a huge maintenance burden on lots of people.
People also use that function to escape non-HTML too - if they are using it
as documented, and it produces the correct results for them, great. Note
that the documentation doesn't say that input has to be HTML, nor that
output must be used as HTML. It just describes the transformation that it
does clearly and unambiguously and can quite happily be used for generating
quoted text for use in, say, XML documents. Also, because Python has a
conservative policy on backwards incompatible changes, you are protected
from some wanker going and changing the HTML safe mappings arbitrarily, say
using numerical entity references instead of &gt;, &lt; and &amp;. This
policy allows large software projects to be developed in Python and
maintained with less pain than if they were written in languages with a less
conservative policy.
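
A small sketch of the kind of test that would break. It assumes Python 3's html.escape standing in for cgi.escape, and escape_numeric is a hypothetical replacement that emits numeric character references instead. Both outputs decode to the same text, yet a test that compares exact strings fails on the second.

```python
import html

def escape_numeric(text):
    """Hypothetical variant: same job, but numeric character references."""
    return (text.replace("&", "&#38;")
                .replace("<", "&#60;")
                .replace(">", "&#62;"))

source = "1 < 2 & 3 > 2"
expected = "1 &lt; 2 &amp; 3 &gt; 2"   # the string an existing test pins

named = html.escape(source, quote=False)
numeric = escape_numeric(source)

print(named == expected)     # does the current mapping match the pinned string?
print(numeric == expected)   # does the changed mapping match it?
# Both forms still decode back to the original text.
print(html.unescape(named) == html.unescape(numeric) == source)
```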

If you want to improve the situation, join the WEB-SIG to help design new
and improved APIs so that the existing ones like the ancient cgi module can
be deprecated. Or maybe just some helpers can be added to the existing
htmllib module? There are better approaches than making non-backwards
compatible changes to functions people have been relying on since Python 1.5.

--
Stuart Bishop <[email protected]>
http://www.stuartbishop.net/


Lawrence D'Oliveiro · Sep 28, 2006

People also use that function to escape non-HTML too - if they are using
it as documented, and it produces the correct results for them, great.
Note that the documentation doesn't say that input has to be HTML, nor
that output must be used as HTML.

It says that the input is converted to "HTML-safe sequences".

It just describes the transformation
that it does clearly and unambiguously and can quite happily be used for
generating quoted text for use in, say, XML documents.

And all those character entities references are also valid in XML.

Also, because Python has a
conservative policy on backwards incompatible changes, you are protected
from some wanker going and changing the HTML safe mappings arbitrarily,
say using numerical entity references instead of &gt;, &lt; and &amp;.

Why would that be wrong? It would still be consistent with the
documentation.

Duncan Booth · Sep 28, 2006

Lawrence D'Oliveiro said:
Why would that be wrong? It would still be consistent with the
documentation.

It would be wrong as he said because "Python has a conservative policy on
backwards incompatible changes". In general (although they may not always
succeed) Python's core developers try not to change functionality even when
that functionality isn't clearly documented. Rather if it becomes an issue
they would prefer to clarify the documentation.

Yes, there is a downside to this: a lot of the Python standard libraries
aren't as good as they could be if incompatible changes were allowed, but
it does reduce maintenance headaches.

The solution is usually that when the standard api is insufficient you wrap
it in something else. cgi.escape is a good example: most people writing web
applications never call it directly because they produce their html output
using a templating language which does all the necessary quoting for them
automatically (e.g. Zope's tal language). If you use tal then you have zero
chance of forgetting to use &quot; in a situation where it is required,
but an incompatible change to cgi.escape could still break your existing
code.
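
As a rough illustration of the wrapping idea (not how TAL works internally), here is a minimal sketch in Python 3. The render function is hypothetical, and html.escape stands in for cgi.escape. Unlike TAL it does not choose an escape form per context; it simply escapes quotes everywhere, which is harmless in element content.

```python
import html
from string import Template

def render(template, **values):
    """Substitute $name placeholders, escaping every value first."""
    escaped = {name: html.escape(str(value), quote=True)
               for name, value in values.items()}
    return Template(template).substitute(escaped)

print(render('<a href="$url">$label</a>',
             url='/search.cgi?query=3&results=10',
             label='Q & A'))
```

Because the caller never invokes the escape function directly, forgetting to escape a value is no longer possible, but the output still depends on what the underlying escape function emits.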

Magnus Lycka · Sep 29, 2006

Jon said:
So what's your excuse?

If you don't like Fredrik's manner I suggest that you simply
don't use any code he's written. Good luck using Python! :^)

Don't bite the hand that feeds you...

Lawrence D'Oliveiro · Oct 7, 2006

Another useful function is this:

def JSString(Str) :
    """returns a JavaScript string literal that evaluates to Str. Note
    I'm not worrying about non-ASCII characters for now."""
    Result = []
    for Ch in Str :
        if Ch == "\\" :
            Ch = "\\\\"
        elif Ch == "\"" :
            Ch = "\\\""
        elif Ch == "\t" :
            Ch = "\\t"
        elif Ch == "\n" :
            Ch = "\\n"
        #end if
        Result.append(Ch)
    #end for
    return "\"" + "".join(Result) + "\""
#end JSString

This can be used, for instance in

import sys

sys.stdout.write \
  (
        "window.setTimeout(%s, 1000)\n"
    %
        JSString("alert(%s)" % JSString("Hi There!"))
  )

Duncan Booth · Oct 8, 2006

Lawrence D'Oliveiro said:
Another useful function is this:

def JSString(Str) :
    """returns a JavaScript string literal that evaluates to Str.
    Note I'm not worrying about non-ASCII characters for now."""

<snip>

Here is a shorter alternative that handles non-ASCII sequences provided you
pass in unicode:

>>> def JSString(s):
...     return repr(unicode(s))[1:]
...
>>> print JSString("Hello 'world'")
"Hello 'world'"

For ascii strings you could also use the string-escape codec, but strangely
the unicode-escape codec doesn't escape quotes.

Scott David Daniels · Oct 8, 2006

Lawrence said:
Another useful function is this:

def JSString(Str) :
    """returns a JavaScript string literal that evaluates to Str....

You can do this more simply:

_map = {"\\" : "\\\\", "\"" : "\\\"", "\t" : "\\t", "\n" : "\\n"}

def JSString(Str) :
    mapped = [_map.get(Ch, Ch) for Ch in Str]
    return "\"" + "".join(mapped) + "\""
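
For readers on Python 3, the mapping approach might look like the sketch below. The name js_string is made up here, and the comparison with json.dumps is an addition that nobody in the thread proposed. It relies on the assumption that, for plain ASCII text containing only these escapes, a JSON string is also an acceptable JavaScript string literal.

```python
import json

# Same four escapes as in the thread: backslash, double quote, tab, newline.
_JS_ESCAPES = {"\\": "\\\\", "\"": "\\\"", "\t": "\\t", "\n": "\\n"}

def js_string(text):
    """Return a double-quoted JavaScript string literal for text."""
    return '"' + "".join(_JS_ESCAPES.get(ch, ch) for ch in text) + '"'

# Nested use, as in the window.setTimeout example earlier in the thread.
inner = js_string("Hi There!")
print("window.setTimeout(%s, 1000)" % js_string("alert(%s)" % inner))

# For this ASCII sample the result agrees with the standard json module.
sample = 'tab\there, "quoted", back\\slash\n'
print(js_string(sample) == json.dumps(sample))
```

Like the original, this ignores non-ASCII characters and other control characters, so it is a starting point rather than a complete escaper.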
