How to use 8bit character sets?

Discussion in 'Python' started by copx, Jun 12, 2005.

  1. copx

    copx Guest

    For some reason Python (on Windows) doesn't use the system's default
    character set and that's a serious problem for me.
    I need to process German textfiles (containing umlauts and other > 7bit
    ASCII characters) and generally work with strings which need to be processed
    using the local encoding (I need to display the text using a Tk-based GUI
    for example). The only solution I managed to find was converting between
    unicode and latin-1 all the time (the textfiles aren't unicode, the output
    of the program isn't supposed to be unicode either). Everything worked fine
    until I tried to run the program on a Windows 9x machine. It seems that
    Python on Win9x doesn't really support unicode (IIRC Win9x doesn't have real
    unicode support so that's not surprising).
    Is it possible to tell Python to use an 8bit charset (latin-1 in my case)
    for textfile and string processing by default?

    copx
    copx, Jun 12, 2005
    #1

  2. Chris Curvey

    Chris Curvey Guest

    Check out sitecustomize.py.

    http://diveintopython.org/xml_processing/unicode.html

    Chris Curvey, Jun 13, 2005
    #2

  3. copx

    copx Guest

    "Chris Curvey" <> wrote in message
    news:...
    > Check out sitecustomize.py.
    >
    > http://diveintopython.org/xml_processing/unicode.html


    Thanks, but I'm looking for a way to do this at the application level (i.e. I
    want my app to run in an unmodified interpreter environment).

    copx
    copx, Jun 13, 2005
    #3
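
One way to keep the encoding at the application level, as copx asks, is to name it once in the program and decode or encode only where files are read or written. The following is a minimal sketch, not anything posted in the thread. It is written for a current Python 3 interpreter, where `open` takes an `encoding` argument; in the Python 2 of this thread, `codecs.open` would play that role. `APP_ENCODING`, `read_text`, `write_text` and `demo` are hypothetical names chosen for illustration.

```python
import os
import tempfile

# Assumption: the application's data is Latin-1, as in copx's description.
APP_ENCODING = "latin-1"


def read_text(path, encoding=APP_ENCODING):
    """Read a text file, decoding its bytes with the application's encoding."""
    with open(path, "r", encoding=encoding) as handle:
        return handle.read()


def write_text(path, text, encoding=APP_ENCODING):
    """Write text to a file, encoding it with the application's encoding."""
    with open(path, "w", encoding=encoding) as handle:
        handle.write(text)


def demo():
    """Round-trip a German sample through a temporary Latin-1 file."""
    sample = "Grüße aus Köln"
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        write_text(path, sample)
        # Read the raw bytes back to see what actually landed on disk.
        with open(path, "rb") as raw:
            raw_bytes = raw.read()
        return read_text(path), raw_bytes
    finally:
        os.remove(path)


if __name__ == "__main__":
    text, raw_bytes = demo()
    print(repr(text), raw_bytes)
```

Nothing here touches sitecustomize.py or any interpreter-wide setting; the encoding lives in one constant that the program owns.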
  4. John Machin

    John Machin Guest

    copx wrote:
    > For some reason Python (on Windows) doesn't use the system's default
    > character set and that's a serious problem for me.
    > I need to process German textfiles (containing umlauts and other > 7bit
    > ASCII characters) and generally work with strings which need to be processed
    > using the local encoding (I need to display the text using a Tk-based GUI
    > for example). The only solution I managed to find was converting between
    > unicode and latin-1 all the time (the textfiles aren't unicode, the output
    > of the program isn't supposed to be unicode either). Everything worked fine
    > until I tried to run the program on a Windows 9x machine. It seems that
    > Python on Win9x doesn't really support unicode (IIRC Win9x doesn't have real
    > unicode support so that's not surprising).
    > Is it possible to tell Python to use an 8bit charset (latin-1 in my case)
    > for textfile and string processing by default?
    >
    > copx



    1. Your description of your problem is extremely vague. If you were to
    supply a minimal script that "works" [on what platform?? what version of
    Python??], with a description of what you understand by "works", and
    what happens differently when you run that script on a Win9x box [for
    what value(s) of x?? what version of Python??], we might be able to help
    you. N.B. somewhere near the top of the script you should have something
    like:

    import sys
    print "Python version:", sys.version
    print "platform:", sys.platform
    print "default encoding:", sys.getdefaultencoding()
    try:
        print "Windows version:", sys.getwindowsversion()
    except AttributeError:
        print "sys.getwindowsversion not available"

    2. You should read this:

    http://www.catb.org/~esr/faqs/smart-questions.html

    3. You should not rely on a crutch like a default encoding, especially
    one obtained by a kludge like sitecustomize.py. If your app expects to
    receive data in encoding x and send data in encoding y, these facts are
    properties of the application and the data, NOT the box you are running
    on. If you had a requirement to read MacCyrillic from a Classic Mac and
    write KOI8 for consumption on a Windows PC, you should be able to do it
    on a SPARC Solaris box in Timbuktu or Walla Walla, Wa., without having
    to fiddle with site-wide configuration.

    4. AFAIK, support for Unicode is provided by Python with no assistance
    from the operating system. The multitudinous deficiencies in Win9x
    should have no bearing on the problem. Have you tried to run your
    program on a Win2K or WinXP box?

    HTH,

    John
    John Machin, Jun 13, 2005
    #4
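
John's third point, that the input and output encodings belong to the application and its data rather than to the machine, can be sketched with his own MacCyrillic-to-KOI8 example. This is an illustration under stated assumptions, not code from the thread: it targets a current Python 3 interpreter, it picks the `koi8_r` codec as one member of the KOI8 family, and `transcode` and `demo` are hypothetical helper names.

```python
import os
import tempfile


def transcode(src_path, dst_path, src_encoding, dst_encoding):
    """Copy a text file, decoding with src_encoding and re-encoding with dst_encoding."""
    # newline="" disables newline translation, so line endings pass through unchanged.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
    return text


def demo():
    """Write a MacCyrillic file, transcode it to KOI8-R, and decode the result."""
    sample = "Привет, мир\r"  # "\r" is the Classic Mac line ending
    workdir = tempfile.mkdtemp()
    src_path = os.path.join(workdir, "in.txt")
    dst_path = os.path.join(workdir, "out.txt")
    try:
        with open(src_path, "wb") as handle:
            handle.write(sample.encode("mac_cyrillic"))
        transcode(src_path, dst_path, "mac_cyrillic", "koi8_r")
        with open(dst_path, "rb") as handle:
            return handle.read().decode("koi8_r")
    finally:
        for path in (src_path, dst_path):
            if os.path.exists(path):
                os.remove(path)
        os.rmdir(workdir)


if __name__ == "__main__":
    print(repr(demo()))
```

Both encodings are passed in explicitly, so the result does not depend on the platform or locale of the machine running the script.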
  5. Martin v. Löwis

    copx wrote:
    > For some reason Python (on Windows) doesn't use the system's default
    > character set and that's a serious problem for me.


    I very much doubt this statement: Python does "use" the system's default
    character set on Windows. What makes you think it doesn't?

    > Is it possible to tell Python to use an 8bit charset (latin-1 in my case)
    > for textfile and string processing by default?


    That is the default.

    Regards,
    Martin
    "Martin v. Löwis", Jun 13, 2005
    #5
  6. John Roth

    John Roth Guest

    "Martin v. Löwis" <> wrote in message
    news:42ad11a7$0$24953$...
    > copx wrote:
    >> For some reason Python (on Windows) doesn't use the system's default
    >> character set and that's a serious problem for me.

    >
    > I very much doubt this statement: Python does "use" the system's default
    > character set on Windows. What makes you think it doesn't?
    >
    >> Is it possible to tell Python to use an 8bit charset (latin-1 in my case)
    >> for textfile and string processing by default?

    >
    > That is the default.


    As far as I can tell, there are actually two defaults, which tends
    to confuse things. One is used whenever a unicode to 8-bit
    conversion is needed on output to stdout, stderr or similar;
    that's usually Latin-1 (or whatever the installation has set up).
    The other is used whenever the unicode to 8-bit conversion
    doesn't have a context; that's usually 7-bit ASCII.

    John Roth

    >
    > Regards,
    > Martin
    John Roth, Jun 13, 2005
    #6
  7. Martin v. Löwis

    John Roth wrote:
    >> That is the default.

    >
    >
    > As far as I can tell, there are actually two defaults, which tends
    > to confuse things.


    Notice that there are two defaults already in the operating system:
    Windows has the notion of the "ANSI code page" and the "OEM code
    page", which are used in different contexts.

    > One is used whenever a unicode to 8-bit
    > conversion is needed on output to stdout, stderr or similar;
    > that's usually Latin-1 (or whatever the installation has set up.)


    You mean, in Python? No, this is not how it works. On output
    of 8-bit strings to stdout, no conversion is ever performed:
    the byte strings are written to stdout as-is.

    > The other is used whenever the unicode to 8-bit conversion
    > doesn't have a context - that's usually Ascii-7.


    Again, you seem to be talking about Unicode conversions -
    it's not clear that the OP is actually interested in
    Unicode conversion in the first place.

    Regards,
    Martin
    "Martin v. Löwis", Jun 13, 2005
    #7
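
Martin's remark about the two Windows code pages matters because byte strings are written as-is: the same bytes can look right in a GUI and wrong in a console window. The sketch below is an illustration only, for a current Python 3 interpreter. It assumes `cp1252` as the ANSI code page and `cp850` as the OEM code page, which is a common pairing on Western European Windows installations but is not stated anywhere in the thread. `describe` and `misread` are hypothetical names.

```python
# Assumption: a Western European Windows setup; other locales use other pairs.
ANSI_CODEC = "cp1252"
OEM_CODEC = "cp850"


def describe(text):
    """Return (character, ANSI bytes, OEM bytes) for each character in text."""
    return [(ch, ch.encode(ANSI_CODEC), ch.encode(OEM_CODEC)) for ch in text]


def misread(text, written_as=ANSI_CODEC, read_as=OEM_CODEC):
    """Show what appears when bytes written in one code page are read in the other."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    for ch, ansi, oem in describe("äöüß"):
        print(repr(ch), ansi, oem)
    # Bytes produced for the ANSI code page, interpreted by an OEM console.
    print(repr(misread("äöü")))
```

Because neither side converts the bytes, the umlauts only survive when writer and reader agree on the code page.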
  8. John Roth

    John Roth Guest

    "Martin v. Löwis" <> wrote in message
    news:...
    > John Roth wrote:
    >>> That is the default.

    >>
    >>
    >> As far as I can tell, there are actually two defaults, which tends
    >> to confuse things.

    >
    > Notice that there are two defaults already in the operating system:
    > Windows has the notion of the "ANSI code page" and the "OEM code
    > page", which are used in different contexts.
    >
    >> One is used whenever a unicode to 8-bit
    >> conversion is needed on output to stdout, stderr or similar;
    >> that's usually Latin-1 (or whatever the installation has set up.)

    >
    > You mean, in Python? No, this is not how it works. On output
    > of 8-bit strings to stdout, no conversion is ever performed:
    > the byte strings are written to stdout as-is.


    That's true, but I was talking about outputting unicode strings,
    not 8-bit strings. As you say below, the OP may not have
    been talking about that.

    >> The other is used whenever the unicode to 8-bit conversion
    >> doesn't have a context - that's usually Ascii-7.

    >
    > Again, you seem to be talking about Unicode conversions -
    > it's not clear that the OP is actually interested in
    > Unicode conversion in the first place.
    >
    > Regards,
    > Martin


    John Roth
    John Roth, Jun 13, 2005
    #8
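
The Unicode to 8-bit conversion that John Roth and Martin are discussing can be made concrete by inspecting what the interpreter reports and by encoding with an explicit codec. This is a hedged sketch for a current Python 3 interpreter, where `sys.getdefaultencoding()` returns 'utf-8'; in the Python 2 releases of this thread it returns 'ascii' unless reconfigured. `encoding_report` and `try_encode` are hypothetical helper names.

```python
import locale
import sys


def encoding_report():
    """Collect the encoding settings the interpreter reports."""
    return {
        "default": sys.getdefaultencoding(),
        # stdout may be replaced by an object without an encoding attribute.
        "stdout": getattr(sys.stdout, "encoding", None),
        "locale": locale.getpreferredencoding(False),
    }


def try_encode(text, codec):
    """Encode text with codec; return None if a character cannot be represented."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for name, value in sorted(encoding_report().items()):
        print(name, value)
    print(try_encode("Müller", "ascii"))    # ASCII has no code for the umlaut
    print(try_encode("Müller", "latin-1"))  # Latin-1 encodes it as one byte
```

Naming the codec at each conversion removes the question of which default applies.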