csv reader

Emmanuel · Dec 15, 2009

I have a problem with csv.reader from the csv library. I'm not able to
import accented characters. For example, I'm trying to import a
simple file containing the single word "equação" using the following
code:

import csv
arquivoCSV='test'
a=csv.reader(open(arquivoCSV),delimiter=',')
tab=[]
for row in a:
    tab.append(row)
print tab

As a result, I get:

[['equa\xe7\xe3o']]

How can I solve this problem?

Chris Rebert · Dec 15, 2009

From http://docs.python.org/library/csv.html :

"""
Note:
This version of the csv module doesn't support Unicode input. Also,
there are currently some issues regarding ASCII NUL characters.
Accordingly, all input should be UTF-8 or printable ASCII to be safe;
see the examples in section Examples. These restrictions will be
removed in the future.
"""

Thus, you'll have to decode the results into Unicode manually; this
will require knowing what encoding your file is using. Files in some
encodings may not parse correctly due to the aforementioned NUL
problem.

Cheers,
Chris
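
A minimal sketch of the manual decoding described above. It assumes the rows are lists of byte strings, as csv.reader returns them on Python 2, and that the file is known to be Windows-1252. `decode_rows` is a hypothetical helper name, not part of the csv module.

```python
def decode_rows(rows, encoding):
    """Decode every cell of every row from bytes to Unicode text."""
    return [[cell.decode(encoding) for cell in row] for row in rows]

# The row from the question, as csv.reader returned it (byte strings).
byte_rows = [[b'equa\xe7\xe3o']]

# Assumption: the file was saved as Windows-1252.
text_rows = decode_rows(byte_rows, 'windows-1252')

# The decoded cell now compares equal to the Unicode word "equação".
print(text_rows[0][0] == u'equa\xe7\xe3o')
```

If the encoding guess is wrong, the decode either raises UnicodeDecodeError or silently produces the wrong characters, so it is worth checking how the file was actually saved.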

Jerry Hill · Dec 15, 2009

I don't think it is a problem. \xe7 is the character ç encoded in
Windows-1252, which is probably the encoding of your csv file. If you
want to convert that to a unicode string, do something like the
following.

s = 'equa\xe7\xe3o'
uni_s = s.decode('Windows-1252')
print uni_s
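
To check which encoding the bytes most likely come from, it can help to encode the same word both ways and compare the escapes. A small sketch (the variable names are only illustrative):

```python
# The word "equação", written with escapes so that the encoding of this
# source file does not matter.
word = u'equa\xe7\xe3o'

as_cp1252 = word.encode('windows-1252')  # one byte per accented letter
as_utf8 = word.encode('utf-8')           # two bytes per accented letter

print(as_cp1252 == b'equa\xe7\xe3o')        # True
print(as_utf8 == b'equa\xc3\xa7\xc3\xa3o')  # True
```

So a value showing \xe7\xe3 is consistent with Windows-1252 (or Latin-1), while a UTF-8 file would show \xc3\xa7\xc3\xa3 for the same two letters.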

Emmanuel · Dec 15, 2009

Then my problem is different!

In fact, I'm reading a csv file saved from OpenOffice Calc using
UTF-8 encoding. I get a list of lists (let's call it tab) with the csv
data.
If I do:

print tab[2][4]
In ipython, I get:
equação de Toricelli. Tarefa exercícios PVR 1 e 2 ; PVP 1

If I only do:
tab[2][4]

In ipython, I get:
'equa\xc3\xa7\xc3\xa3o de Toricelli. Tarefa exerc\xc3\xadcios PVR 1 e
2 ; PVP 1'

Does that mean that my problem is not the one I think it is?

My real problem comes when I use that kind of UTF-8 encoded (?) string
with selenium.
Here is a small code example of a non-working case that gives the same
error as my bigger program:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from selenium import selenium
import sys,os,csv,re

class test:
    '''class for interacting with the academic system'''
    def __init__(self):
        self.webpage=''
        self.arquivo=''
        self.script=[]
        self.sel = selenium('localhost', 4444, '*firefox', 'http://www.google.com.br')
        self.sel.start()
        self.sel.open('/')
        self.sel.wait_for_page_to_load(30000)
        self.sel.type("q", "equação")
        #self.sel.type("q", u"equacao")
        self.sel.click("btnG")
        self.sel.wait_for_page_to_load("30000")

def main():
    teste=test()

if __name__ == "__main__":
    main()

If I just replace the following line:
self.sel.type("q", "equação")

with:
self.sel.type("q", u"equação")

It works fine!
The problem is that csv.reader gives "equação" and not
u"equação".

Here is the error produced by the failing code (with "equação"):
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (1202, 0))

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)

/home/manu/Labo/Cefetes_Colatina/Scripts/20091215_test_acentuated_caracters.py in <module>()
     27
     28 if __name__ == "__main__":
---> 29     main()
     30
     31

/home/manu/Labo/Cefetes_Colatina/Scripts/20091215_test_acentuated_caracters.py in main()
     23
     24 def main():
---> 25     teste=test()
     26
     27

/home/manu/Labo/Cefetes_Colatina/Scripts/20091215_test_acentuated_caracters.py in __init__(self)
     16         self.sel.open('/')
     17         self.sel.wait_for_page_to_load(30000)
---> 18         self.sel.type("q", "equação")
     19         #self.sel.type("q", u"equacao")
     20         self.sel.click("btnG")

/home/manu/Labo/Cefetes_Colatina/Scripts/selenium.pyc in type(self, locator, value)
    588         'value' is the value to type
    589         """
--> 590         self.do_command("type", [locator,value,])
    591
    592

/home/manu/Labo/Cefetes_Colatina/Scripts/selenium.pyc in do_command(self, verb, args)
    201         body = u'cmd=' + urllib.quote_plus(unicode(verb).encode('utf-8'))
    202         for i in range(len(args)):
--> 203             body += '&' + unicode(i+1) + '=' + urllib.quote_plus(unicode(args[i]).encode('utf-8'))
    204         if (None != self.sessionId):
    205             body += "&sessionId=" + unicode(self.sessionId)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
WARNING: Failure executing file: <20091215_test_acentuated_caracters.py>
Python 2.6.4 (r264:75706, Oct 27 2009, 06:16:59)
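
The final error in the traceback can be reproduced without selenium. This is only a sketch: it uses an explicit ASCII decode as a stand-in for what Python 2 does implicitly when unicode() is called on a byte string, and it assumes the cell holds UTF-8 bytes as in the csv data above.

```python
# UTF-8 bytes for "equação", as a csv.reader cell holds them on Python 2.
cell = b'equa\xc3\xa7\xc3\xa3o'

failing_position = None
try:
    # Stand-in for the implicit ASCII decode done by unicode(value)
    # when value is a non-ASCII byte string.
    cell.decode('ascii')
except UnicodeDecodeError as error:
    failing_position = error.start
    print(error)

# Decoding with the real encoding (assumed to be UTF-8) gives Unicode text,
# the same kind of value as the u"..." literal that works.
text = cell.decode('utf-8')
```

The failing position is 4, the first byte after "equa", which matches the traceback. Passing the decoded value, for example self.sel.type("q", cell.decode('utf-8')), should then behave like the u"equação" literal.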

Emmanuel · Dec 16, 2009

As csv.reader does not support UTF-8 encoded files, I'm using:

import codecs

fp = codecs.open(arquivoCSV, "r", "utf-8")  # each line is decoded to unicode
self.tab=[]
for l in fp:
    l=l.replace('\"','').strip()  # drop quote characters and the newline
    self.tab.append(l.split(','))

It works much better. The remaining problem is that when I do
self.sel.type("q", ustring), where ustring is a unicode string obtained
from the file using the code shown above, I get <sp> instead of a
regular space...

Gabriel Genellina · Dec 16, 2009

Emmanuel wrote:
> Then my problem is different!
>
> In fact, I'm reading a csv file saved from OpenOffice Calc using
> UTF-8 encoding. I get a list of lists (let's call it tab) with the csv
> data.
> If I do:
>
> print tab[2][4]
> In ipython, I get:
> equação de Toricelli. Tarefa exercícios PVR 1 e 2 ; PVP 1
>
> If I only do:
> tab[2][4]
>
> In ipython, I get:
> 'equa\xc3\xa7\xc3\xa3o de Toricelli. Tarefa exerc\xc3\xadcios PVR 1 e
> 2 ; PVP 1'
>
> Does that mean that my problem is not the one I think it is?

Yes. You have a real problem, but not this one. When you say `print
something`, you get a nice view of `something`, basically the result of
doing `str(something)`. When you say `something` alone in the interpreter,
you get a more formal representation, the result of calling
`repr(something)`:

py> x = "ecuação"
py> print x
ecuação
py> x
'ecua\x87\xc6o'
py> print repr(x)
'ecua\x87\xc6o'

The quotes around the text and the \xNN notation allow for an unambiguous
representation. Two strings may look the same but be different, and
repr shows that.
(Here 'ecua\x87\xc6o' is the code page 850 encoding; windows-1252 would
give 'ecua\xe7\xe3o', and you should see 'equa\xc3\xa7\xc3\xa3o' in utf-8.)
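
The same point in a form that runs unchanged on Python 2 and Python 3, as a small sketch: the UTF-8 bytes are built from escapes, so the console and source encodings play no part.

```python
# UTF-8 bytes for the seven-letter word "equação".
data = u'equa\xe7\xe3o'.encode('utf-8')

# repr() spells out each non-ASCII byte as \xNN, so the exact content is visible.
shown = repr(data)
print(shown)

# The byte string is longer than the word: each accented letter takes two bytes.
print(len(data))                  # 9
print(len(data.decode('utf-8')))  # 7
```

Only the repr form makes the difference between 9 bytes and 7 characters visible; the printed form hides it.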

> My real problem comes when I use that kind of UTF-8 encoded (?) string
> with selenium.
> If I just replace the following line:
> self.sel.type("q", "equação")
>
> with:
> self.sel.type("q", u"equação")
>
> It works fine!

Yes: you should work with unicode most of the time. The "recipe" for
having as few unicode problems as possible says:

- convert the input data (read from external sources, like a file) from
bytes to unicode, using the (known) encoding of those bytes

- handle unicode internally everywhere in your program

- and convert from unicode to bytes as late as possible, when writing
output (to screen, other files, etc) using the encoding expected by those
external files.

See the Unicode How To: http://docs.python.org/howto/unicode.html
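
The three steps of the recipe can be sketched as follows. The helper names `read_text` and `write_text`, the file names, and the two encodings are assumptions chosen for illustration; the code is written to run on both Python 2 and Python 3.

```python
import io
import os
import tempfile

def read_text(path, encoding):
    """Read bytes from disk and decode them to Unicode right away."""
    with io.open(path, 'rb') as handle:
        return handle.read().decode(encoding)

def write_text(path, text, encoding):
    """Encode Unicode to bytes only at the moment of writing."""
    with io.open(path, 'wb') as handle:
        handle.write(text.encode(encoding))

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'in.txt')
target = os.path.join(workdir, 'out.txt')

# Set up an input file holding the UTF-8 bytes for "equação".
with io.open(source, 'wb') as handle:
    handle.write(b'equa\xc3\xa7\xc3\xa3o')

text = read_text(source, 'utf-8')            # 1. bytes -> unicode on input
shouted = text.upper()                       # 2. unicode everywhere inside
write_text(target, shouted, 'windows-1252')  # 3. unicode -> bytes on output

with io.open(target, 'rb') as handle:
    written = handle.read()
```

Because the uppercasing happens on Unicode text, it handles ç and ã correctly; the same call on the raw UTF-8 bytes would leave the accented letters untouched.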

> The problem is that csv.reader gives "equação" and not
> u"equação".

The csv module cannot handle unicode text directly, but see the last
example in the csv documentation for a simple workaround:
http://docs.python.org/library/csv.html
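
What follows is not the recipe from the csv documentation, just a sketch of the same idea that keeps csv.reader doing the parsing, so quoted fields containing commas survive (a plain split(',') would break them). `read_csv_unicode` is a hypothetical helper; the Python 2 branch assumes an ASCII-compatible encoding such as UTF-8, in line with the restriction quoted from the docs earlier in the thread.

```python
import csv
import io
import os
import sys
import tempfile

def read_csv_unicode(path, encoding='utf-8', **reader_options):
    """Return all rows of a CSV file as lists of Unicode strings."""
    if sys.version_info[0] >= 3:
        # Python 3: csv works on text, so the file object does the decoding.
        with io.open(path, 'r', encoding=encoding, newline='') as handle:
            return list(csv.reader(handle, **reader_options))
    # Python 2: csv works on bytes, so parse first and decode each cell.
    with open(path, 'rb') as handle:
        return [[cell.decode(encoding) for cell in row]
                for row in csv.reader(handle, **reader_options)]

# Build a small UTF-8 file with a quoted field that contains a comma.
path = os.path.join(tempfile.mkdtemp(), 'test.csv')
with io.open(path, 'wb') as handle:
    handle.write(b'"equa\xc3\xa7\xc3\xa3o, de Toricelli",1\r\n')

rows = read_csv_unicode(path, delimiter=',')
```

Every cell in `rows` is then Unicode text, which is the kind of value that worked with self.sel.type in the example above.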
