From Python to LaTeX in Emacs on Windows

Brian Elmegaard

Hi group

I hope this is not a faq...

I am trying to understand how to use the new way of specifying a
file's encoding, but no matter what I do I get strange characters in
the output.

I have a text file which I have generated in Python by parsing some
HTML.

The file contains international characters like é and ó.
In Emacs I can see that the file is encoded as
mule-utf-8-dos

I read the file into Python as a string, and suddenly the characters
look strange when printed: each one consists of two characters.
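
For example, something like this (a minimal reproduction; 'data.txt' is
just a placeholder name for my file):

    # Python 2; the file on disk is UTF-8 encoded
    data = open('data.txt', 'rb').read()
    print repr(data)   # an 'é' shows up as the two bytes '\xc3\xa9'
    print data         # and is printed as two strange characters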

First problem: How do I avoid this?

The second problem is that I make some string replacements and other
changes in the string in order to write a LaTeX output file. When I
open this file in Emacs, the characters are not the same anymore.

Second problem: How do I avoid this?

tia,

Benjamin Niemann

Brian said:
> Hi group
>
> I hope this is not a faq...
>
> I am trying to understand how to use the new way of specifying a
> file's encoding, but no matter what I do I get strange characters in
> the output.
>
> I have a text file which I have generated in Python by parsing some
> HTML.
>
> The file contains international characters like é and ó.
> In Emacs I can see that the file is encoded as
> mule-utf-8-dos
>
> I read the file into Python as a string, and suddenly the characters
> look strange when printed: each one consists of two characters.
>
> First problem: How do I avoid this?
>
> The second problem is that I make some string replacements and other
> changes in the string in order to write a LaTeX output file. When I
> open this file in Emacs, the characters are not the same anymore.
>
> Second problem: How do I avoid this?

When you read the file contents in Python, you get the "raw" byte
sequence; in this case it is the UTF-8 encoding of Unicode text. But you
probably want a unicode string. Use "text = unicode(data, 'utf-8')",
where "data" is the file content you read. After processing you probably
want to write it back to a file. Before you do this, you have to convert
the unicode string back to a byte sequence. Use "data =
text.encode('utf-8')".
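
Put together, something like this sketch (the file names and the
replacement are just examples):

    # -*- coding: utf-8 -*-
    # Python 2 sketch
    data = open('input.txt', 'rb').read()   # raw UTF-8 bytes from disk
    text = unicode(data, 'utf-8')           # decode to a unicode string
    text = text.replace(u'é', u"\\'e")      # e.g. turn é into the LaTeX escape \'e
    out = open('output.tex', 'wb')
    out.write(text.encode('utf-8'))         # encode back to bytes before writing
    out.close()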

Handling character encodings correctly *is* difficult. It's no shame if
you don't get it right on the first attempt.

Brian Elmegaard

Thanks for the help. I solved the problem by specifying the cp1252
encoding for the Python file with a magic comment, and for the input
data file.
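
Roughly like this (a reconstruction; 'data.txt' stands in for the real
file name):

    # -*- coding: cp1252 -*-
    # the magic comment above (PEP 263) declares the script's own encoding
    import codecs

    # codecs.open decodes the input file from cp1252 while reading,
    # so 'text' is already a unicode string
    f = codecs.open('data.txt', 'r', 'cp1252')
    text = f.read()
    f.close()
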
Benjamin said:
> When you read the file contents in Python, you get the "raw" byte
> sequence; in this case it is the UTF-8 encoding of Unicode text. But
> you probably want a unicode string. Use "text = unicode(data,
> 'utf-8')", where "data" is the file content you read. After processing
> you probably want to write it back to a file. Before you do this, you
> have to convert the unicode string back to a byte sequence. Use
> "data = text.encode('utf-8')".

This worked, but when I try to "print text" I get:

    UnicodeEncodeError: 'ascii' codec can't encode characters in
    position 9-10: ordinal not in range(128)

Why is that?
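
It happens even with something as simple as this (assuming the default
encoding is still ascii):

    >>> text = unicode('\xc3\xa9', 'utf-8')   # the UTF-8 bytes for 'é'
    >>> print text                            # raises the UnicodeEncodeError
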
Benjamin said:
> Handling character encodings correctly *is* difficult.

What makes it difficult? The OS, the editor, Python, LaTeX?

Benjamin Niemann

Brian said:
> Thanks for the help. I solved the problem by specifying the cp1252
> encoding for the Python file with a magic comment, and for the input
> data file.

> This worked, but when I try to "print text" I get:
>
>     UnicodeEncodeError: 'ascii' codec can't encode characters in
>     position 9-10: ordinal not in range(128)
>
> Why is that?
The console only understands "byte streams". To print a unicode string,
Python tries to encode it using the default encoding, which is 'ascii'
in your case. That encoding cannot represent characters like 'ü' or
'ä', which causes the exception. What I usually do is something like:

    print text.encode("cp1252", "ignore")

The 'ignore' argument causes all characters that cannot be represented
in cp1252 to be silently dropped - which is OK if the output is only
used, e.g., to track progress.
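
You can also use 'replace' instead of 'ignore' if you'd rather see a '?'
where a character was lost. With an arbitrary test string:

    text = u'S\u00f8ren \u2642'              # 'ø' is in cp1252, '♂' is not
    print text.encode('cp1252', 'ignore')    # the '♂' is silently dropped
    print text.encode('cp1252', 'replace')   # prints a '?' in its place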

Don't know if there's a way to tell Python to do this automagically for
all unicode strings passed to stdout...
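
Maybe wrapping sys.stdout would work - something like this (untested
sketch):

    import sys, codecs
    # from here on, unicode strings printed to stdout are encoded as
    # cp1252, with unencodable characters replaced by '?'
    sys.stdout = codecs.getwriter('cp1252')(sys.stdout, 'replace')
    print u'caf\u00e9'
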
> What makes it difficult? The OS, the editor, Python, LaTeX?
At least for me it is difficult, because I'm used to thinking "1 byte =
1 character", and when I read/write files I can simply handle the data
as strings. Unless you start parsing arbitrary data from the internet,
there is little chance that you encounter text encodings different from
your operating system's default, and you start to believe that e.g.
"ord('ü') == 252" is a universal rule sent by the gods...
If you do it right, you should convert all data that 'enters' your
application as early as possible to unicode, and encode it back when you
print/save/send it - this way you only have to deal with unicode strings
in your application code. The most difficult part is probably changing
old habits ;)