Re: Python 3 encoding question: Read a filename from stdin, subsequently open that filename

Discussion in 'Python' started by Dan Stromberg, Dec 6, 2010.

  1. Ultimately I switched to reading the filenames from file descriptor 0
    using os.read(). This gives back bytes in 3.x and strings of single-byte
    characters in 2.x, which are similar enough for my purposes, and it
    sidesteps the filesystem encoding(s) question nicely.
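
A minimal sketch of that approach, assuming newline-separated filenames on a POSIX system. The helper names read_all_bytes and split_filenames are hypothetical, chosen for illustration, and are not taken from the original programs:

```python
import os


def read_all_bytes(fd=0, chunk_size=65536):
    """Read everything from a file descriptor as raw bytes.

    os.read() never decodes, so no stdin encoding is involved.
    """
    chunks = []
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:  # an empty result means end of file
            break
        chunks.append(chunk)
    return b"".join(chunks)


def split_filenames(data, separator=b"\n"):
    """Split the raw input into filenames, dropping empty entries."""
    return [name for name in data.split(separator) if name]
```

Each name returned by split_filenames(read_all_bytes()) is still the exact byte sequence that was written to stdin, so on POSIX it can be handed to open(name, 'rb') without being decoded and re-encoded along the way.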

    I rewrote readline0
    (http://stromberg.dnsalias.org/cgi-bin/viewvc.cgi/readline0/trunk/?root=svn)
    for 2.x and 3.x to facilitate reading null-terminated strings from
    stdin. It's in better shape now anyway - more OOP than functional,
    and with a bunch of unit tests. The module now works on CPython 2.x,
    CPython 3.x and PyPy 1.4 from the same code.
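
For readers who only want the idea, reading null-terminated names (as produced by find -print0, for example) can be sketched roughly as below. This is not the readline0 module's API; iter_null_terminated is a hypothetical name, and the code is only an illustration of the technique under the assumption that input arrives on a file descriptor:

```python
import os


def iter_null_terminated(fd=0, chunk_size=65536):
    """Yield each NUL-terminated record read from fd as bytes."""
    buffer = b""
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:  # end of input
            break
        buffer += chunk
        parts = buffer.split(b"\0")
        # The last piece may be an incomplete record; keep it for later.
        buffer = parts.pop()
        for record in parts:
            yield record
    if buffer:
        # Input ended without a trailing NUL; emit what is left.
        yield buffer
```

Because a filename cannot contain a NUL byte on *ix systems, this separator is safe even for names that contain newlines.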

    On Mon, Nov 29, 2010 at 9:26 PM, Dan Stromberg wrote:
    > I've got a couple of programs that read filenames from stdin, and then
    > open those files and do things with them.  These programs sort of do
    > the *ix xargs thing, without requiring xargs.
    >
    > In Python 2, these work well.  Irrespective of how filenames are
    > encoded, things are opened OK, because it's all just a stream of
    > single byte characters.
    >
    > In Python 3, I'm finding that I have encoding issues with characters
    > with their high bit set.  Things are fine with strictly ASCII
    > filenames.  With high-bit-set characters, even if I change stdin's
    > encoding with:
    >
    >       import io
    >       STDIN = io.open(sys.stdin.fileno(), 'r', encoding='ISO-8859-1')
    >
    > ...even with that, when I read a filename from stdin with a
    > single-character Spanish n~, the program cannot open that filename
    > because the n~ is apparently internally converted to two bytes, but
    > remains one byte in the filesystem.  I decided to try ISO-8859-1 with
    > Python 3, because I have a Java program that encountered a similar
    > problem until I used en_US.ISO-8859-1 in an environment variable to
    > set the JVM's encoding for stdin.
    >
    > Python 2 shows the n~ as 0xf1 in an os.listdir('.').  Python 3 with an
    > encoding of ISO-8859-1 wants it to be 0xc3 followed by 0xb1.
    >
    > Does anyone know what I need to do to read filenames from stdin with
    > Python 3.1 and subsequently open them, when some of those filenames
    > include characters with their high bit set?
    >
    > TIA!
    >
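
The byte mismatch described in the quoted message can be reproduced without touching the filesystem. The sketch below assumes that the filesystem encoding on that machine was UTF-8, which the message does not state; the variable names are illustrative only:

```python
# The single byte stored in the filename, as reported by Python 2's os.listdir().
raw = b"\xf1"

# Decoding stdin as ISO-8859-1 turns that byte into the text character U+00F1.
text = raw.decode("iso-8859-1")

# If the text is then encoded as UTF-8 to build the path (an assumption about
# the filesystem encoding in use), it becomes two bytes and no longer matches.
reencoded = text.encode("utf-8")

# Encoding with the same codec that was used for decoding restores the
# original byte.
round_tripped = text.encode("iso-8859-1")

print(repr(reencoded))
print(repr(round_tripped))
```

Under that assumption, the name only survives if it is encoded back with the codec that decoded it, or if it is never decoded at all.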
     
    Dan Stromberg, Dec 6, 2010