A critique of cgi.escape

Simon Brunning · Sep 26, 2006

To be honest I'm not sure what *sort* of code people test this way. It
just doesn't seem appropriate at all for web page generating code. Web
pages need to be manually viewed in web browsers, and validated, and
checked for accessibility. Checking they're equal to a particular
string just seems bizarre (and where does that string come from
anyway?)

The kind of acceptance testing that you are talking about is very
important, but you can't automate "Does it look OK in a browser?".
Unit tests suitable for automation don't have anything to work with
*but* the generated HTML.

Jon Ribbens · Sep 26, 2006

Well, there are dozens (hundreds?) of templating systems for Python.

I know, I wrote one of them ;-)

t = Template("test.html")
t['foo'] = 'Brian -> "Hi!"'
assert str(t) == 'Brian -&gt; "Hi!"'

So how would you test our template system?

What I don't get is why you are testing the above code like that at
all. Surely if the template system somehow became so broken that it
couldn't even do trivial replacements, you would notice immediately
as all your web pages would go totally wrong.

Maybe, which is why I'm asking you how you do it. Some of our web
applications contain 100s of script generated pages. Testing each one by
hand after making a change would be completely impossible. So we use
HTTP scripting for testing purposes i.e. send this request, grab the
results, verify that the text in the element with id="username" equals
"Brian Quinlan", etc. The test also validates that each page is well
formed. We also view each page at some point but not every time a
developer makes a change that might (i.e. everything) affect the entire
system.

Ah, ok, that sounds more sensible. But something as specialised and
complicated as that can surely cope with un-encoding HTML entities?

Incidentally, the company I work for, www.sitemorse.com, does
automated web site testing - and it's all done in Python!
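
The HTTP-scripting check described above (grab a page, find the element with id="username", compare its text) might look roughly like the sketch below. It makes several assumptions: it uses the modern standard-library html.parser module rather than anything available in 2006, it parses a hard-coded page instead of a real HTTP response, and the names ElementTextById and text_of are made up for illustration.

```python
from html.parser import HTMLParser


class ElementTextById(HTMLParser):
    """Collect the text inside the first element with a given id.

    Hypothetical helper. Limitation: unclosed void tags such as <br>
    inside the wanted element would confuse the depth counter.
    """

    def __init__(self, wanted_id):
        super().__init__(convert_charrefs=True)  # entities arrive decoded
        self.wanted_id = wanted_id
        self.depth = 0       # > 0 while inside the wanted element
        self.done = False    # True once the wanted element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif not self.done and dict(attrs).get("id") == self.wanted_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_of(page, element_id):
    """Return the decoded text of the element with the given id, or ''."""
    parser = ElementTextById(element_id)
    parser.feed(page)
    parser.close()
    return "".join(parser.chunks)


# Invented sample page; a real test would fetch this over HTTP.
page = '<html><body><span id="username">Brian &quot;Q&quot; Quinlan</span></body></html>'
assert text_of(page, "username") == 'Brian "Q" Quinlan'
```

Because the comparison is made on the decoded text, a check written this way passes whether the page spells the double quote as " or as &quot;, which is the "un-encoding HTML entities" point made above.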

Paul Boddie · Sep 26, 2006

Simon said:
The kind of acceptance testing that you are talking about is very
important, but you can't automate "Does it look OK in a browser?".
Unit tests suitable for automation don't have anything to work with
*but* the generated HTML.

I can understand the disbelief that a straight string comparison
involving HTML might be a robust enough test for the output of a Web
application, given that potentially many equivalent representations of
some piece of text may exist, and especially given the typical
unpredictability of many XML-based solutions with respect to things
like whitespace, encodings, entity usage, and the like. On the other
hand, the initial complaint in this thread, whilst reasonable in the
context of some ideal function for "quoting stuff in HTML pages", is
somewhat inappropriate in the context of modifying an existing, mature
function which is now in ubiquitous usage. In order to minimise the
unpredictability of solutions, we should avoid making fundamental
changes especially at the lower levels of such solutions.

I can't remember whether I have any code using cgi.escape, although
since I usually use XML APIs rather than writing HTML manually, I
suspect that I haven't. Nevertheless, the breakage potentially caused
by even one call site involving a modified variant of cgi.escape would
be enough for most people to consider reimplementing the semantics of
the existing function, thus undermining the inclusion of such a
function in the standard library in the first place.

Paul

Fredrik Lundh · Sep 26, 2006

Brian said:
I'd have to dig through the revision history to be sure, but I imagine
that cgi.escape was originally only used in the cgi module (and there
only in its various print_* functions). Then it started being used by
other core Python modules e.g. cgitb, DocXMLRPCServer.

nah, it's an official API for simple HTML/XML escaping, and it's
perfectly usable for what it's supposed to be used for.

however, if you're doing serious web hacking, you *should* of course
work at the XHTML information set level whenever you can, where you
focus on the data you want to publish (using Unicode strings for anything
that even remotely resembles human text), and the framework
makes sure that it gets to the other side in one piece, using HTML4 or
XHTML as necessary, and escaping and encoding things properly and
efficiently on the way. it's 2006. transferring data from Python
applications to web browsers is no rocket science.

</F>

Brian Quinlan · Sep 26, 2006

Jon said:
Well, there are dozens (hundreds?) of templating systems for Python.

I know, I wrote one of them ;-)

t = Template("test.html")
t['foo'] = 'Brian -> "Hi!"'
assert str(t) == 'Brian -&gt; "Hi!"'

So how would you test our template system?

What I don't get is why you are testing the above code like that at
all. Surely if the template system somehow became so broken that it
couldn't even do trivial replacements, you would notice immediately
as all your web pages would go totally wrong.

If, in the example that I showed, the less-than character was not
correctly escaped, then it might not manifest itself frequently in a
typical application because the less-than character is seldom used in
English prose.

Also, assuming that single case was trivial to test without a test
harness, how many web pages do I have to look at to be reasonably
confident that *every* feature works correctly?

Cheers,
Brian

Jon Ribbens · Sep 26, 2006

If, in the example that I showed, the less-than character was not
correctly escaped, then it might not manifest itself frequently in a
typical application because the less-than character is seldom used in
English prose.

OK, but effectively what you're talking about here is testing the
'cgi.escape' function itself - said test of course being part and
parcel of the cgi package and therefore easily updatable if the
cgi.escape function changes.

Also, assuming that single case was trivial to test without a test
harness, how many web pages do I have to look at to be reasonably
confident that *every* feature works correctly?

It depends on how many features you have! My templating system, for
example, has sections and replacements, and that's it. Replacements
can be unencoded, html-encoded or url-encoded. That's approximately
4 things to test ;-) Plus, the templating code basically never changes
so doesn't need regression testing.
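
As a rough illustration of those "approximately 4 things to test", here is a sketch of the three replacement encodings. The templating system itself is not shown in the thread, so the render function and the {mode:name} marker syntax are invented for this example, and sections are left out entirely.

```python
import html
import re
from urllib.parse import quote

# One encoder per replacement style mentioned in the post.
ENCODERS = {
    "raw": lambda s: s,                            # unencoded
    "html": lambda s: html.escape(s, quote=True),  # html-encoded
    "url": lambda s: quote(s, safe=""),            # url-encoded
}


def render(template, values):
    """Replace {mode:name} markers (made-up syntax) with encoded values."""
    def substitute(match):
        mode, name = match.group(1), match.group(2)
        return ENCODERS[mode](values[name])
    return re.sub(r"\{(raw|html|url):(\w+)\}", substitute, template)


values = {"q": 'a&b "c"'}
assert render("{raw:q}", values) == 'a&b "c"'
assert render("{html:q}", values) == 'a&amp;b &quot;c&quot;'
assert render("{url:q}", values) == "a%26b%20%22c%22"
```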

Gabriel G · Sep 26, 2006

At said:
Why did you write your code that way?

Uhm, maybe because I relied on the published documentation of a
published standard module? Just modify the behavior in *your* own
cgi.escape and all of us will be happy...

Gabriel Genellina
Softlab SRL

Gabriel G · Sep 26, 2006

To be honest I'm not sure what *sort* of code people test this way. It
just doesn't seem appropriate at all for web page generating code. Web
pages need to be manually viewed in web browsers, and validated, and
checked for accessibility. Checking they're equal to a particular
string just seems bizarre (and where does that string come from
anyway?)

For example, I do not validate a "page". I validate that all the methods
that make up pieces of a page build them the way they should - these
are our "unit tests". Then it's up to the templating library to join
all the pieces into the final html page.
I validated the original html against the corresponding dtd some time
ago (using the w3c validator), and occasionally again when things "look
wrong" in a browser, but most of the time the generated html pages
are neither validated nor checked as a whole.
What you describe is another kind of test, and it really should not
depend on the details of cgi.escape - just as the usability test of an MP3
player does not care about the hFE of some transistor used inside...

Gabriel Genellina
Softlab SRL

John Bokma · Sep 26, 2006

Brian Quinlan said:
A summary of this pointless argument:

Why cgi.escape should be changed to escape double quote (and maybe
single quote) characters by default:
o escaping should be very aggressive by default to avoid subtle bugs
o over-escaping is not likely to harm most programs significantly
o people who do not read the documentation may be surprised by its
behavior

Why cgi.escape should NOT be changed:
o it is currently used in lots of code and changing it will almost
certainly break some of it, test suites at minimum e.g.
assert my_template_system("{foo}", foo='"') == '"'

You must be kidding.

o escaping attribute values is less common than escaping element
text

Again, you must be kidding: href="/search.cgi?query=3&results=10"
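
For readers on current Python: cgi.escape was removed in Python 3.8, and its replacement, html.escape, takes the same quote flag but defaults it to True. The sketch below shows the difference the summary is arguing about, with html.escape(..., quote=False) standing in for the old cgi.escape default; the sample string is borrowed from the template test quoted earlier in the thread.

```python
import html

text = 'Brian -> "Hi!"'

# Old cgi.escape default behaviour: only &, < and > are escaped.
element_text = html.escape(text, quote=False)
assert element_text == 'Brian -&gt; "Hi!"'

# With quote=True the double quote is escaped as well, which is what
# an attribute value needs.
attr_value = html.escape(text, quote=True)
assert attr_value == 'Brian -&gt; &quot;Hi!&quot;'

# A test that pins the exact output string sees these as different.
assert element_text != attr_value
```

One difference worth knowing: html.escape with quote=True also escapes the single quote (as &#x27;), which cgi.escape never did.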

Lawrence D'Oliveiro · Sep 26, 2006

it has *everything* to do with encoding of existing data into HTML so it
can be safely transported to, and recreated by, an HTML-aware client.

does the word "information set" mean anything to you?

The special characters we're talking about escaping--ampersand, less-than,
single-quote, double-quote--are part of the basic syntax of XML and HTML.
They are information-set-independent.

Lawrence D'Oliveiro · Sep 26, 2006

Again, you must be kidding...

I don't think Brian Quinlan was seriously trying to claim that was true,
only that that was the argument some people were making. Anybody who's done much
work generating HTML for Web pages will know that dynamically-generated
attribute values occur far more often than dynamically-generated cdata. Or
is that pcdata?

... href="/search.cgi?query=3&results=10"

You _do_ realize that the "&" should be escaped as "&amp;", don't you?

Lawrence D'Oliveiro · Sep 26, 2006

Is there *any* branch of this thread that won't end with some snippy
remark from you?

And this is relevant to the argument how, exactly?

Lawrence D'Oliveiro · Sep 26, 2006

really? most HTML attributes cannot even contain things that would need
to be escaped...

Are you really serious about that?

Lawrence D'Oliveiro · Sep 26, 2006

Gabriel G said:
Uhm, maybe because I relied on the published documentation of a
published standard module?

And if the published documentation said you had to jump off a cliff to use
it, you would do that?

Lawrence D'Oliveiro · Sep 26, 2006

Wow. Are you always that arrogant for things you know very little
about, or just plain stupid ?

Wow. Express an opinion, and get called names by trolls.

John Bokma · Sep 26, 2006

Lawrence D'Oliveiro said:
In message <[email protected]>, John Bokma
wrote:

[..]

You _do_ realize that the "&" should be escaped as "&amp;", don't you?

And what's "/search.cgi?query=3&results=10"? An attribute value. Exactly
my point.

Lawrence D'Oliveiro · Sep 26, 2006

most HTML attributes cannot even contain things that would need
to be escaped ...

sys.stdout.write \
(
"Email: <INPUT TYPE=\"TEXT\" NAME=\"email_address\" VALUE=\"%s\">\n"
%
QuoteHTML(WhateverTheUserPreviouslyTyped)
)
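
A runnable take on the snippet above, for anyone who wants to try it. QuoteHTML is not defined anywhere in the thread, so the quote_html helper here is an assumption: it is taken to behave like html.escape with quote=True. The email_field name and the sample input are likewise invented.

```python
import html
import sys


def quote_html(value):
    # Assumed stand-in for the thread's undefined QuoteHTML.
    return html.escape(value, quote=True)


def email_field(previous_value):
    """Build the INPUT element, escaping the user-supplied attribute value."""
    return (
        '<INPUT TYPE="TEXT" NAME="email_address" VALUE="%s">\n'
        % quote_html(previous_value)
    )


# Invented input containing both quotes and angle brackets.
sys.stdout.write(email_field('"Brian" <brian@example.com>'))
```

Without the quote escaping, the first double quote in the user's input would end the VALUE attribute early.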

George Sakkis · Sep 26, 2006

Lawrence said:
Wow. Express an opinion, and get called names by trolls.

Funny, "troll" was my initial thought after reading your first few
posts. As you went on and on though, exposing your smugness and failure
to grasp rocket-science concepts such as "backwards compatibility", I
expressed my opinion by dismissing "troll" for a more fit description
of you.

Steve Holden · Sep 26, 2006

Lawrence said:
In message <[email protected]>, Steve
Holden wrote:

And this is relevant to the argument how, exactly?

I would really rather this were a discussion than an argument. You will
now no doubt reply telling me I wouldn't.

My posting was issued as a response to the irritation engendered by your
argumentative style of debate. Your latest response simply proves that
there is indeed no remark, however irrelevant, that you will allow to go
unanswered.

For heaven's sake, learn to shut up for at least some of the time!

regards
Steve

PS: Do you have the maturity to resist the temptation to reply to this?
I seriously doubt it.

Anthony Baxter · Sep 26, 2006

I would really rather this were a discussion than an argument. You will
now no doubt reply telling me I wouldn't.

My posting was issued as a response to the irritation engendered by your
argumentative style of debate. Your latest response simply proves that
there is indeed no remark, however irrelevant, that you will allow to go
unanswered.

The Complaints department is down the hall...
