unicode 3 digit decimal conversion

Rune Hansen · Sep 27, 2003

Hi,
I've got the string "Gratis øl", or in English: "Free beer". I know there
is no such thing, but...

Python 2.3 (#1, Aug 1 2003, 15:23:03)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
248

What I need is the converted string to read u'Gratis \248l' (*
How do I do this without going through each and every character of the
string?
(not that I have figured out how to do that right either)

regards

/rune
*) I need to communicate with a telnet interface that only accepts
accented characters as unicode decimals

Klaus Alexander Seistrup · Sep 27, 2003

Rune said:
248

What I need is the converted string to read u'Gratis \248l' (*
How do I do this without going through each and every character
of the string?

How about

#v+

#v-

// Klaus

--

Martin v. Löwis · Sep 27, 2003

Rune Hansen said:
What I need is the converted string to read u'Gratis \248l' (*
How do I do this without going through each and every character of the
string?

You can register an error callback, like this:

import codecs

def decimal_escape(exc):
    # Error callback: replace each unencodable character with a
    # backslash followed by its three-digit decimal code point.
    try:
        data = exc.object
        res = u""
        for i in range(exc.start, exc.end):
            char = ord(data[i])
            if char < 1000:
                res += u"\\%03d" % char
            else:
                # Unsupported character
                raise exc

        return res, exc.end
    except:
        raise exc

codecs.register_error("decimal-escape", decimal_escape)

print u"Gratis \xf8l".encode("us-ascii", "decimal-escape")

Notice that your specification is a bit unclear as to what to do with
characters >= 1000; I assume they are not supported in your protocol.

Regards,
Martin
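
For readers on Python 3, a minimal sketch of the same error-callback idea might look like the following. It assumes, as Martin does, that code points needing more than three decimal digits are simply unsupported; the handler name "decimal-escape" is carried over from his post.

```python
import codecs

def decimal_escape(exc):
    """Replace each unencodable character with a backslash and three decimal digits."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    for ch in exc.object[exc.start:exc.end]:
        code = ord(ch)
        if code >= 1000:
            # Assumption: the protocol has no way to express these characters.
            raise exc
        pieces.append("\\%03d" % code)
    # Return the replacement text and the position where encoding should resume.
    return "".join(pieces), exc.end

codecs.register_error("decimal-escape", decimal_escape)

escaped = "Gratis \xf8l".encode("us-ascii", "decimal-escape")
print(escaped)  # b'Gratis \\248l'
```

The callback only runs for characters the codec cannot encode, so plain ASCII text passes through without any per-character work in Python code.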

Rune Hansen · Sep 27, 2003

Hi, yes, of course *blush*.
Thanks

/regards

/rune

Rune Hansen · Sep 27, 2003

Hi,
The tip from Klaus "solved" my problem for the time being, but your
snippet definitely goes into my "tool chest".

thanks

regards

/rune

Peter Otten · Sep 27, 2003

Rune said:
248

What I need is the converted string to read u'Gratis \248l' (*
How do I do this without going through each and every character of the
string?
(not that I have figured out how to do that right either)

I see your problem is already solved; I just want to add that normally (read:
in C and Python) the backslash notation is base 8, not base 10.
Traceback (most recent call last):

Peter
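
To make the base-8 point concrete, here is a small Python 3 sketch (the variable names are only illustrative). Written as a string literal, "\370" is the octal escape for code point 248, while "\248" is parsed as the octal escape "\24" followed by the digit 8.

```python
# "\370" is an octal escape: 3*64 + 7*8 + 0 = 248, the code point of "ø".
octal_form = "\370"
hex_form = "\xf8"
print(octal_form == hex_form == chr(248))  # True

# "\248" is NOT a decimal escape: 8 is not an octal digit, so the
# literal is read as "\24" (code point 20) followed by the character "8".
ambiguous = "\248"
print([ord(c) for c in ambiguous])  # [20, 56]
```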

Martin v. Löwis · Sep 27, 2003

Peter Otten said:
I see your problem is already solved

I'm quite uncertain as to what the solution is, though - or perhaps
what the problem is. The OP said that the telnet server expects
backslash characters in the data stream (at least I interpreted his
message that way), but then he was happy with an approach that did not
send backslash characters.

Perhaps the telnet server really expects latin-1, in which case
encoding the Unicode string in that encoding, or in iso-8859-15,
would work fine most of the time.

Regards,
Martin
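
As a rough illustration of the Latin-1 suggestion, in Python 3 syntax: for this particular string both encodings give the same single byte 0xF8 for "ø". The euro sign is included only as one example of where the two encodings differ, which is one way to read "most of the time".

```python
text = "Gratis \xf8l"

latin1 = text.encode("iso-8859-1")
latin9 = text.encode("iso-8859-15")
print(latin1)            # b'Gratis \xf8l'
print(latin1 == latin9)  # True

# The euro sign exists in iso-8859-15 but not in iso-8859-1.
euro = "\u20ac"
print(euro.encode("iso-8859-15"))  # b'\xa4'
try:
    euro.encode("iso-8859-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False
print(euro_in_latin1)  # False
```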

Rune Hansen · Sep 28, 2003

Hi Martin, you raise a very interesting question (at least for me it is).
The otherwise excellent support people at Stalker (it's a CommuniGate
Pro server I'm speaking to) have me totally confused.

What I tried to do was to send a quoteattr(u'string') (using quoteattr
from sax). The server was very happy with this, showing the accented
characters, when viewed in a telnet session, as human readable text.
This, it became apparent, was obviously wrong. The data was garbled when
viewing it in the web interface.

Stalker told me to send the letter "ø" as \248 or as xf8 (notice the
missing "\"). At this point I'm sending
quoteattr(unicode('string', "iso-8859-1").encode("utf-8")), which is
neither of the above (..?).
Anyway, the server is still happy, and the data views correctly in the
web interface.
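
For comparison, a Python 3 sketch of roughly what that call puts on the wire. The sample string is taken from the start of the thread; in Python 3 the text is already Unicode, so the unicode(...) step is not needed.

```python
from xml.sax.saxutils import quoteattr

# quoteattr wraps the text in quotes and escapes XML special characters.
quoted = quoteattr("Gratis \xf8l")

# Encoding to UTF-8 turns "ø" into the two bytes 0xC3 0xB8,
# which is neither "\248" nor "xf8".
payload = quoted.encode("utf-8")
print(payload)  # b'"Gratis \xc3\xb8l"'
```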

Stalker provides a Perl and Java API for the telnet server. I don't
read Perl code very well, and the Java API is distributed as .class
files (nothing new there, it's Java after all), so I really don't know how
Stalker is handling it.

regards

/rune

Martin v. Löwis · Sep 28, 2003

Rune said:
Stalker told me to send the letter "ø" as \248 or as xf8 (notice the
missing "\"). At this point I'm sending
quoteattr(unicode('string', "iso-8859-1").encode("utf-8")), which is
neither of the above (..?).

Correct: UTF-8 works differently. I find it surprising that anybody
actually proposes to send non-ASCII characters using xHH, as this
byte sequence may coincidentally occur in ASCII text as well.

Anyway, the server is still happy, and the data views correctly in the
web interface.

It is relatively easy to recognize UTF-8 in the input; it is unlikely
that "real" data look like UTF-8 by coincidence (unlike \-escaping
or x-escaping). So it might be that the server studies the input to
guess the encoding. This is bad style, of course - the protocol should
be clear about encodings (this protocol couldn't be published in an
IETF RFC).
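
The kind of guessing described here could be sketched as follows in Python 3. This is purely hypothetical: guess_decode is an illustrative helper, the Latin-1 fallback is an assumption, and nothing in the thread confirms how the CommuniGate Pro server actually treats its input.

```python
def guess_decode(data):
    """Hypothetical helper: treat input as UTF-8 if it is valid UTF-8, else as Latin-1."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 text with accented characters is very rarely valid UTF-8.
        return data.decode("iso-8859-1"), "iso-8859-1"

print(guess_decode(b"Gratis \xc3\xb8l"))  # ('Gratis øl', 'utf-8')
print(guess_decode(b"Gratis \xf8l"))      # ('Gratis øl', 'iso-8859-1')
```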

Stalker provides a Perl and Java API for the telnet server. I don't
read Perl code very well, and the Java API is distributed as .class
files (nothing new there, it's Java after all), so I really don't know how
Stalker is handling it.

Even then, you could only find out what the perl and java clients do -
you couldn't tell, from that, what other options the server might support.

Regards,
Martin

Fredrik Lundh · Sep 29, 2003

Martin said:
Correct: UTF-8 works differently. I find it surprising that anybody
actually proposes to send non-ASCII characters using xHH, as this
byte sequence may coincidentally occur in ASCII text as well.

unless they expect you to send "x" as "x78", of course.
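
A minimal Python 3 sketch of what such an unambiguous scheme could look like, assuming lowercase hex digits and characters limited to the Latin-1 range. The names x_escape and x_unescape are made up for illustration and are not part of any Stalker API.

```python
import re

def x_escape(text):
    """Hypothetical scheme: write "x" itself and every non-ASCII character as xHH."""
    out = []
    for ch in text:
        code = ord(ch)
        if code > 0xFF:
            raise ValueError("character does not fit in two hex digits: %r" % ch)
        if ch == "x" or code > 0x7F:
            out.append("x%02x" % code)
        else:
            out.append(ch)
    return "".join(out)

def x_unescape(escaped):
    """Reverse x_escape: every "x" in the input must start an xHH sequence."""
    return re.sub(r"x([0-9a-f]{2})", lambda m: chr(int(m.group(1), 16)), escaped)

print(x_escape("xf8 \xf8"))  # x78f8 xf8
print(x_unescape(x_escape("xf8 \xf8")) == "xf8 \xf8")  # True
```

Because a literal "x" never appears unescaped in the output, the receiver can tell a real "xf8" in the text apart from an escaped "ø".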

</F>
