codecs limitation

Denis S. Otkidach

While exploring Ian Bicking's idea of defining web codecs
(http://blog.colorstudy.com/ianb/weblog/2003/11/06.html#P29), I
noticed that an encoder can't return a unicode string. Here are the
corresponding lines of code from Objects/unicodeobject.c:
--->8---
    /* XXX Should we really enforce this ? */
    if (!PyString_Check(v)) {
        PyErr_Format(PyExc_TypeError,
                     "encoder did not return a string object (type=%.400s)",
                     v->ob_type->tp_name);
--->8---

I have the same question as the one in the comment: should we really
enforce this and give up on the idea of defining specialized
encodings like 'html'?
 
A.M. Kuchling

> I have the same question as the one in the comment: should we really
> enforce this and give up on the idea of defining specialized
> encodings like 'html'?

I suppose it depends on what the codecs system is *for*. If it's an
interface between the abstract world of Unicode code points and the
concrete world of 8-bit characters that represent those code points,
then the idea of returning anything but an 8-bit string from
.encode() doesn't make sense. If codecs are for arbitrary string-to-string
transformations, then the restriction should be relaxed.
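
The two readings can be seen side by side in a small sketch. It assumes a
current Python 3 interpreter rather than the 2003-era one discussed here, and
the codec name "htmlsketch" and the helper functions are hypothetical, made up
for illustration. In Python 3 the str.encode() method still insists on a bytes
result, while the codecs.encode() function accepts whatever the encoder returns.

```python
import codecs
import html

CODEC_NAME = "htmlsketch"  # hypothetical name, not a standard codec


def html_encode(text, errors="strict"):
    # An encoder returns (output, length consumed). Here the output is
    # a str, not bytes, which is exactly the case under discussion.
    return html.escape(text), len(text)


def html_decode(text, errors="strict"):
    return html.unescape(text), len(text)


def search(name):
    # Codec search function: answer only for our hypothetical codec.
    if name == CODEC_NAME:
        return codecs.CodecInfo(html_encode, html_decode, name=CODEC_NAME)
    return None


codecs.register(search)

# codecs.encode() treats the codec as an arbitrary transformation.
escaped = codecs.encode("<p>a & b</p>", CODEC_NAME)

# str.encode() treats it as text-to-bytes and rejects the str result.
try:
    "<p>".encode(CODEC_NAME)
    rejected = False
except TypeError:
    rejected = True
```

So the restriction is tied to the method, not to the codec registry itself,
at least in the interpreter assumed above.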

In any case, it's straightforward to define a separate string-like class
that escapes the string, e.g. as Quixote does:
<htmltext '<p>This is a test.&lt;a href=&quot;http://example.com&gt;'>
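
A minimal sketch of such a string-like class follows. It is not Quixote's
implementation: the class name EscapedText and its methods are hypothetical,
and it only covers concatenation. The assumption is that text passed to the
constructor is trusted markup, while anything added to it afterwards is
escaped.

```python
import html


class EscapedText:
    """Hypothetical stand-in for an htmltext-like type; not Quixote's code."""

    def __init__(self, markup):
        # The constructor argument is trusted markup and is stored as-is.
        self.markup = str(markup)

    @staticmethod
    def _escape(other):
        # Already-wrapped text is trusted; everything else is escaped.
        if isinstance(other, EscapedText):
            return other.markup
        return html.escape(str(other), quote=True)

    def __add__(self, other):
        return EscapedText(self.markup + self._escape(other))

    def __radd__(self, other):
        # Handles plain_string + EscapedText.
        return EscapedText(self._escape(other) + self.markup)

    def __str__(self):
        return self.markup

    def __repr__(self):
        return "<EscapedText %r>" % self.markup


page = EscapedText("<p>This is a test.") + '<a href="http://example.com">'
```

Because the escaping lives in the type rather than in a codec, the question of
what .encode() may return never comes up.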

--amk
 
