How to support a non-standard encoding?

Ivan

Dear All

I'm developing a Python application for which I need to support a
non-standard character encoding (specifically ISO 6937/2-1983, Addendum
1-1989). Here are some of the properties of the encoding and its use in
the application:

- I need to read and write data to/from files. The file format
includes two sections in different character encodings (so I
shan't be able to use codecs.open()).

- iso-6937 sections include non-printing control characters

- iso-6937 is a variable width encoding, e.g. "A" = [0x41],
"Ä" = [0xC8, 0x41]; all non-spacing diacritical marks are in the
range 0xC0-0xCF.

By any chance is there anyone out there working on iso-6937?

Otherwise, I think I need to write a new codec to support reading and
writing this data. Does anyone know of any tutorials or blog posts on
implementing a codec for a non-standard character encoding? Would
anyone be interested in reading one?

With thanks and best wishes

Ivan


--
============================================================
Ivan A. Uemlianin
Llaisdy
Speech Technology Research and Development

(e-mail address removed)
www.llaisdy.com
llaisdy.wordpress.com
github.com/llaisdy
www.linkedin.com/in/ivanuemlianin

"Froh, froh! Wie seine Sonnen, seine Sonnen fliegen"
(Schiller, Beethoven)
============================================================
 
Tim Wintle

Dear All

I'm developing a python application for which I need to support a
non-standard character encoding (specifically ISO 6937/2-1983, Addendum
1-1989).

If your system version of iconv contains that encoding (mine does) then
you could use a wrapped iconv library to avoid re-inventing the wheel.

I've got a forked version of the "iconv" package from pypi available
here:

<https://github.com/timwintle/iconv-python>

... it should work on python2.5-2.7

Tim
 
Ivan Uemlianin

Dear Tim

Thanks for your help.
If your system version of iconv contains that encoding, ...

Alas, it doesn't:

$ iconv -l |grep 6937
$

Also, I'd like to package the app so other people could use it, so I
wouldn't want to depend too much on the local OS.

Best wishes

Ivan


 
jmfauth


Take a look at the files (Python modules) in
....\Lib\encodings. This is the place where all codecs
are centralized. Python picks them up automatically
as long as they are present in that directory.

I remember that, a long time ago and just for fun, I created
such a codec quite easily. I picked one of the files as a
template and modified its "table". It was a
byte <-> byte table.
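
A minimal sketch of what such a table-based single-byte codec might look like in Python 3, modelled on the layout of the modules in the encodings package. The table below is purely illustrative (Latin-1 with byte 0xA4 remapped to the euro sign) and is not an ISO 6937 table; the function names decode and encode are placeholders.

```python
import codecs

# Illustrative 256-entry decoding table: byte value -> Unicode character.
# Assumption: start from Latin-1 and remap a single byte, just to show the idea.
decoding_table = "".join(chr(i) for i in range(256))
decoding_table = decoding_table[:0xA4] + "\u20ac" + decoding_table[0xA5:]

# Build the reverse (character -> byte) map from the decoding table.
encoding_table = codecs.charmap_build(decoding_table)


def decode(data, errors="strict"):
    # Returns a (text, bytes_consumed) tuple, as codec functions are expected to.
    return codecs.charmap_decode(data, errors, decoding_table)


def encode(text, errors="strict"):
    # Returns a (bytes, characters_consumed) tuple.
    return codecs.charmap_encode(text, errors, encoding_table)
```

Because every byte maps to exactly one character, this approach only covers fixed-width encodings; it does not handle the two-byte accented letters of iso-6937.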

For a multibyte coding scheme, it may be a little bit more
complicated; you may take a look, e.g., at the mbcs.py codec.

The distribution of such a codec may be a problem.

----

Another simple approach, OS-independent.

You probably do not write your code in iso-6937; you only
need to encode/decode some byte sequences "on the fly". In
that case, work with bytes and write a pair of encoding /
decoding functions, with a <dict> you build yourself [*] as
a helper. It's not that complicated. Use <unicode> (Py2) or
<str> (Py3, the recommended way ;-) ) as the pivot encoding.

[*] I also once created such a dict from
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt

I never checked whether it corresponds to the "official" cp1252
codec.
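
A rough sketch of the dict-plus-two-functions idea applied to the variable-width part of iso-6937, using Python 3 str as the pivot. Everything here is an assumption for illustration: the names decode_6937 and encode_6937 are made up, the dict covers only four diacritics, and bytes below 0x80 are treated as plain ASCII. All of this should be checked against the actual standard before use.

```python
import unicodedata

# Assumed partial mapping: iso-6937 non-spacing mark byte -> Unicode combining mark.
COMBINING = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC8: "\u0308",  # diaeresis
}
MARK_BYTES = {mark: byte for byte, mark in COMBINING.items()}


def decode_6937(data):
    """Decode bytes to str; the mark byte comes BEFORE its base letter."""
    out = []
    pending = ""
    for byte in data:
        if 0xC0 <= byte <= 0xCF:
            if byte not in COMBINING or pending:
                raise ValueError("unsupported or misplaced mark 0x%02X" % byte)
            pending = COMBINING[byte]
        elif byte < 0x80:
            # Unicode puts the combining mark AFTER the base character.
            out.append(chr(byte) + pending)
            pending = ""
        else:
            raise ValueError("byte 0x%02X is not covered by this sketch" % byte)
    if pending:
        raise ValueError("mark byte without a following base letter")
    # NFC composes base + mark into a single code point where one exists.
    return unicodedata.normalize("NFC", "".join(out))


def encode_6937(text):
    """Encode str to bytes, moving each combining mark in front of its base."""
    out = bytearray()
    base_ready = False
    for ch in unicodedata.normalize("NFD", text):
        if ord(ch) < 0x80:
            out.append(ord(ch))
            base_ready = True
        elif ch in MARK_BYTES and base_ready:
            out.insert(len(out) - 1, MARK_BYTES[ch])
            base_ready = False  # this sketch allows one mark per base letter
        else:
            raise ValueError("cannot encode %r in this sketch" % ch)
    return bytes(out)
```

With these definitions, decode_6937(bytes([0xC8, 0x41])) gives "Ä" and encode_6937("Ä") gives the same two bytes back. Control characters below 0x80 pass through unchanged.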

jmf
 
Ivan

Dear jmf, Tim

Thanks for these pointers. They look very useful.

I'll have a go and report back (with success I hope).

Best wishes

Ivan

There is a register_codec method (or similar) in the codecs module.

Tim


 
Thomas Rachel

On 06.01.2012 at 21:00, jmfauth wrote:
Another simple approach, OS-independent.

You probably do not write your code in iso-6937; you only
need to encode/decode some byte sequences "on the fly". In
that case, work with bytes and write a pair of encoding /
decoding functions, with a <dict> you build yourself [*] as
a helper. It's not that complicated. Use <unicode> (Py2) or
<str> (Py3, the recommended way ;-) ) as the pivot encoding.

These encoding/decoding functions are exactly how you create a codec.
I.e., there is not much more to it.


Thomas
 
