EBCDIC <--> ASCII

  • Thread starter martinjamesevans

martinjamesevans

I'm having a problem trying to use the codecs package to aid me in
converting some bytes from EBCDIC into ASCII.

I have some 8bit text that is in mixed format. I extract the bytes
that are coded for EBCDIC and would like to display them correctly.
The EBCDIC bytes could have any value from 0-255. I'm only really
interested in the printable portions and could, say, leave the rest as
dots.

I've tried starting with something like this, but I assume it is
expecting the source to be in unicode already?

e.g. (pretend the second half are EBCDIC characters)

sAll = "This bit is ASCII, <this bit ebcdic>"
sSource = sAll[19:]

sEBCDIC = unicode(sSource, 'cp500', 'ignore')
sASCII = sEBCDIC.encode('ascii')

Obviously I could just knock up a 255 character lookup table and do it
myself, I was just trying to be a little more Pythonic and use that
built in table.
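
A minimal sketch of the lookup-table idea, built from the codec's own table rather than typed in by hand. It assumes Python 3 (the thread itself uses Python 2) and that cp500 is the right EBCDIC variant; build_table and ebcdic_to_ascii are hypothetical helper names, not part of the codecs package.

```python
import string

# Assumption: cp500 is the right EBCDIC variant; swap in another codec name if not.
CODEC = "cp500"

# Printable ASCII characters to keep (tabs and newlines are treated as non-printable).
PRINTABLE = set(string.ascii_letters + string.digits + string.punctuation + " ")


def build_table(codec=CODEC):
    """Return a 256-byte table mapping each EBCDIC byte to printable ASCII or '.'."""
    table = bytearray(256)
    for value in range(256):
        char = bytes([value]).decode(codec, errors="replace")
        table[value] = ord(char) if char in PRINTABLE else ord(".")
    return bytes(table)


EBCDIC_TO_ASCII = build_table()


def ebcdic_to_ascii(data):
    """Translate EBCDIC bytes to printable ASCII bytes, with dots elsewhere."""
    return data.translate(EBCDIC_TO_ASCII)
```

With this table, ebcdic_to_ascii(b"\x81\x82\x83\xf1\xf2\xf3") gives b"abc123", and a byte such as \x00 comes out as a dot.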

Thanks,

Martin
 
Ulrich Eckhardt

> I've tried starting with something like this, but I assume it is
> expecting the source to be in unicode already?
>
> e.g. (pretend the second half are EBCDIC characters)
>
> sAll = "This bit is ASCII, <this bit ebcdic>"

Why pretend? You can use this:

"abcde\x81\x82\x83\x84"

> sSource = sAll[19:]
>
> sEBCDIC = unicode(sSource, 'cp500', 'ignore')

If you mean this sSource, then no. sSource is treated as a byte string here,
which is decoded to Unicode using 'cp500' as the encoding. Note that in
interactive mode, 'print x' will actually convert the string according to
stdout's current encoding (typically ASCII or, I think, Latin-1).

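s1 = u'abcde'                # Unicode string
s2 = s1.encode('cp500')      # EBCDIC byte string
s3 = s1.encode('ascii')      # ASCII byte string
s4 = unicode(s2, 'cp500')    # EBCDIC bytes back to Unicode
s5 = unicode(s3, 'ascii')    # ASCII bytes back to Unicode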
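
For anyone reading this on Python 3, a rough equivalent of the same round trip, assuming str plays the role of unicode and bytes the role of the byte string. The variable names are only illustrative.

```python
# Python 3 sketch: str is Unicode text, bytes is the encoded byte string.
text = "abcde"
ebcdic_bytes = text.encode("cp500")   # EBCDIC encoding of the text
ascii_bytes = text.encode("ascii")    # ASCII encoding of the same text

# Decoding each byte string with its own codec recovers the same text.
from_ebcdic = ebcdic_bytes.decode("cp500")
from_ascii = ascii_bytes.decode("ascii")

# The two byte strings differ, but the decoded text is identical.
print(ebcdic_bytes != ascii_bytes, from_ebcdic == from_ascii)
```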

Uli
 
Michael Ströder

> I'm having a problem trying to use the codecs package to aid me in
> converting some bytes from EBCDIC into ASCII.

Which EBCDIC variant?

> sEBCDIC = unicode(sSource, 'cp500', 'ignore')

Are you sure CP500 is the EBCDIC variant for the language you want?

http://www.ietf.org/rfc/rfc1345.txt lists it as:

&charset IBM500
&rem source: IBM NLS RM Vol2 SE09-8002-01, March 1990
&alias CP500
&alias ebcdic-cp-be
&alias ebcdic-cp-ch
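
A quick way to check which codec Python picks for each of those names is codecs.lookup. This is only a sketch, assuming a stock CPython 3 where the aliases above are registered; the NAMES tuple and resolved dict are just illustrative.

```python
import codecs

# Charset names taken from the RFC 1345 entry quoted above.
NAMES = ("IBM500", "CP500", "ebcdic-cp-be", "ebcdic-cp-ch")

# Map each name to the canonical codec name Python resolves it to.
resolved = {name: codecs.lookup(name).name for name in NAMES}

for name, canonical in resolved.items():
    print(name, "->", canonical)
```

On a standard install these should all resolve to the same cp500 codec, so the alias only matters if the data actually uses a different EBCDIC variant.
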

> Obviously I could just knock up a 255 character lookup table and do it
> myself, I was just trying to be a little more Pythonic and use that
> built in table.

It's pythonic to implement a Unicode codec for unknown character tables.
I've put these two on my web site:

http://www.stroeder.com/pylib/encodings/ebcdicatde.py
http://www.stroeder.com/pylib/encodings/cp273.py (needs ebcdicatde)

Ciao, Michael.
 
Michael Ströder

> Thanks for the tables, ebcdicatde.py does look more suitable.
>
> My problem appears to be that my source is a byte string. In a
> nutshell I need "\x81\x82\x83\xf1\xf2\xf3" to become "abc123" in a
> byte string.
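
A minimal Python 3 sketch of that bytes-to-bytes conversion. It assumes Python 3.4 or later, where the German code page cp273 ships with the standard library, and uses it as a stand-in; it is not the third-party 'ebcdic-at-de' codec mentioned in this thread.

```python
# Assumption: Python 3.4+, where cp273 is a built-in codec.
# The thread's 'ebcdic-at-de' codec is a separate module and is not used here.
source = b"\x81\x82\x83\xf1\xf2\xf3"

# Decode the EBCDIC bytes to text, then encode back to an ASCII byte string.
# errors="replace" turns characters with no ASCII equivalent into b"?".
result = source.decode("cp273").encode("ascii", errors="replace")

print(result)
```

The result stays a byte string, so it can be spliced back next to the ASCII half of the mixed data.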

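Python 2.5.2 (r252:60911, Aug  1 2008, 00:43:38)
>>> import ebcdicatde
>>> "\x81\x82\x83\xf1\xf2\xf3".decode('ebcdic-at-de').encode('ascii')
'abc123'
>>>
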
Ciao, Michael.
 
martinjamesevans

> Python 2.5.2 (r252:60911, Aug  1 2008, 00:43:38)
> >>> import ebcdicatde
> >>> "\x81\x82\x83\xf1\xf2\xf3".decode('ebcdic-at-de').encode('ascii')
> 'abc123'
> >>>
>
> Ciao, Michael.

Many thanks for all your posts!
Just what I needed.
 
