Program works great, except under less, cron or execl (Unicode?)

Sam

I have a program which works great when run from the command line.

But when I run it combined with something else such as:
- piping it through less
- cron
- execl (i.e. calling it from another python program)

it gives me a unicode error

  File "../myparser.py", line 261, in set_attributes
    print "self.atd['Name'] is: ", self.atd['Name']
UnicodeEncodeError: 'ascii' codec can't encode character u'\xeb' in position 7: ordinal not in range(128)

I'd post the whole program here, except it involves weird Unicode
strings.

I could probably change the program to get it working under less/cron/
etc.

But I'd rather understand exactly what the issue is. Why does it work
fine when run directly from the command line, but not otherwise?
 
Diez B. Roggisch

Most probably because when not running directly inside a terminal, it gets
its stdin/stdout as pipes - and Python can't attempt to guess the proper
encoding on that, as it does on a terminal.

And thus, when you print unicode to the pipe, it can't decide which encoding
to use.
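
To illustrate the point above: as far as I know, when Python 2 finds no encoding on sys.stdout it falls back to the default 'ascii' codec for unicode objects, which is where the error comes from. A minimal sketch that reproduces the same failure without any pipe; the sample string is made up for illustration and is not Sam's data:

```python
# Hypothetical sample value containing u'\xeb' (e with diaeresis).
name = u"No\xebl"

# Encoding with ASCII fails the same way the piped print does.
try:
    name.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# An explicit encoding that can represent the character works.
utf8_bytes = name.encode("utf-8")
```

So the data itself is fine; only the implicit choice of codec is the problem.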

To circumvent this, try wrapping stdout in a codecs-module wrapper with a
proper encoding applied (e.g. utf-8).

You might make that conditional on the sys.stdout.encoding variable
being set or not, although I'm not 100% sure what it actually gets set to
when used in a subprocess. But this should give you an idea of where to look.
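
A rough sketch of that idea, assuming UTF-8 is the encoding you want on the other end of the pipe. The helper names wrap_utf8 and install_utf8_stdout are made up for illustration, and the code is written so that it runs on both Python 2 and Python 3:

```python
import codecs
import io
import sys


def wrap_utf8(byte_stream):
    """Return a writer that encodes unicode text to UTF-8 on byte_stream."""
    return codecs.getwriter("utf-8")(byte_stream)


def install_utf8_stdout():
    """Wrap sys.stdout only if no encoding was detected (hypothetical helper)."""
    if getattr(sys.stdout, "encoding", None) is None:
        # Python 3 text streams expose their byte stream as .buffer;
        # Python 2 file objects accept encoded bytes directly.
        raw = getattr(sys.stdout, "buffer", sys.stdout)
        sys.stdout = wrap_utf8(raw)


# Demonstration on an in-memory byte stream standing in for a pipe.
pipe = io.BytesIO()
out = wrap_utf8(pipe)
out.write(u"No\xebl\n")
encoded = pipe.getvalue()
```

Calling install_utf8_stdout() once at startup, before the first print, would apply the wrapper to the real stdout. The terminal case stays untouched, because sys.stdout.encoding is already set there.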



Diez
 
Diez B. Roggisch

Diez B. Roggisch said:
Most probably because when to running directly inside a terminal, it gets
it's stdin/stdout as pipes - and python can't attempt to guess the proper
encoding on that, as it does on a terminal.

That was of course meant to be "not running directly inside a terminal".

Diez
 
Sam

Diez for the win... :)

sys.stdout.encoding does indeed have the proper value, UTF-8, when called
from the command line.

But when piped into anything or called from anywhere else it's None.

Just for completeness, here's my test program:
#!/usr/bin/env python
import sys
print sys.stdout.encoding

And here are the results:
$ ./encoding.py
UTF-8
$ ./encoding.py | cat
None

Really, really annoying!

So how can I set sys.stdout.encoding so it's UTF-8 when piped through
cat (or anything else)?

I tried assigning to it, but no dice.
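
One possible way around this, which is not from the thread: as far as I know the attribute is read-only on Python 2 file objects, which would explain why assigning to it fails. Python 2.6 and later read the PYTHONIOENCODING environment variable at startup and use it for stdin/stdout/stderr even when they are pipes. A minimal sketch that starts a child interpreter with the variable set and reads back what it reports; the helper name piped_stdout_encoding is made up:

```python
import os
import subprocess
import sys


def piped_stdout_encoding(io_encoding):
    """Run a child Python with stdout on a pipe and return its sys.stdout.encoding."""
    # Copy the current environment and add the override for the child only.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    child_code = "import sys; sys.stdout.write(str(sys.stdout.encoding))"
    output = subprocess.check_output([sys.executable, "-c", child_code], env=env)
    return output.decode("ascii").strip()


reported = piped_stdout_encoding("utf-8")
```

From the shell the equivalent would be something like PYTHONIOENCODING=utf-8 ./encoding.py | cat, and the same variable can be set in a crontab or passed in the environment given to the exec call.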

 
