parse a csv file into a text file


Zhen Zhang

Hi, everyone.

I am a second-year EE student.
I just started learning Python for my project.

I intend to parse a CSV file with a format like

3520005,"Toronto (Ont.)",C ,F,2503281,2481494,F,F,0.9,1040597,979330,630.1763,3972.4,1
2466023,"Montréal (Que.)",V ,F,1620693,1583590,T,F,2.3,787060,743204,365.1303,4438.7,2
5915022,"Vancouver (B.C.)",CY ,F,578041,545671,F,F,5.9,273804,253212,114.7133,5039.0,8
3519038,"Richmond Hill (Ont.)",T ,F,162704,132030,F,F,23.2,53028,51000,100.8917,1612.7,28

into a text file like the following

Toronto 2503281
Montreal 1620693
Vancouver 578041

I am extracting the 1st and 5th columns and saving them into a text file.

This is what I have so far.


Code:
import csv
file = open('raw.csv')
reader = csv.reader(file)

f = open('NicelyDone.text','w')

for line in reader:
    f.write("%s %s"%line[1],%line[5])

This is not working for me. I was able to extract the data from the CSV file as line[1] and line[5] (I am able to print them out),
but I don't know how to write them to a .text file in the format I want.

Also, I have to process the first column, e.g. "Toronto (Ont.)" into "Toronto".
I am familiar with the function find(). I assume I could extract Toronto out of "Toronto (Ont.)" using "(" as the stopping character,
but based on my research, I have no idea how to use it and have it return the string "Toronto".

Here are my questions:
1: What is the data type of line[1]? If it is a string, why does f.write() not work? If it is not a string, how do I convert it to one?
2: How do I extract the word Toronto out of "Toronto (Ont.)" as a string, using find() or other methods?

My thinking is that I could add those two strings together like c=a+' ' +b, which would give me the format I want.
Then I can use f.write() to write it into a file ;)

Sorry if my questions sound too easy or stupid.

Thanks in advance

Zhen
 
Asaf Las

Hi, every one.
Zhen
str_t = '3520005,"Toronto (Ont.)",C ,F,2503281,2481494,F,F,0.9,1040597,979330,630.1763,3972.4,1'
list_t = str_t.split(',')
print(list_t)
print("split result ", list_t[1], list_t[5])
print(list_t[1].split('"')[1])
 
Roy Smith

Zhen Zhang said:
Code:
import csv
file = open('raw.csv')
reader = csv.reader(file)

f = open('NicelyDone.text','w')

for line in reader:
f.write("%s %s"%line[1],%line[5])

Are you using Python 2 or 3?
Here is my question:
1:What is the data format for line[1],

That's something you can easily figure out by printing out the
intermediate values. Try something like:
for line in reader:
print type(line[1]), repr(line(1))

See if that prints what you expect.
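
For readers on Python 3, a version of that check might look like the sketch below. The one-row `sample` is a stand-in for the real raw.csv and is an assumption for illustration; note that fields are indexed with square brackets.

```python
import csv
import io

# Stand-in for open('raw.csv'): one shortened row from the question.
sample = io.StringIO('3520005,"Toronto (Ont.)",C ,F,2503281,2481494\n')

for line in csv.reader(sample):
    row_type = type(line)        # each row comes back as a list
    field_type = type(line[1])   # each field in the row is a str
    print(row_type, field_type, repr(line[1]))
```

The repr() call is useful here because it makes stray quotes or whitespace in a field visible.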
how come f.write() does not work.

What does "does not work" mean? What does get written to the file? Or
do you get some sort of error?

I'm pretty sure I see your error, but I'm trying to lead you to being
able to diagnose it yourself :)
 
MRAB

Code:
import csv
file = open('raw.csv')
reader = csv.reader(file)

f = open('NicelyDone.text','w')

for line in reader:
f.write("%s %s"%line[1],%line[5])

This is not working for me, I was able to extract the data from the csv file as line[1],line[5]. (I am able to print it out)
But I dont know how to write it to a .text file in the format i wanted.
% is an operator. When used with a format string on its left, its
arguments go on its right. In the general case, those arguments should
be put in a tuple, although if there's only one argument and it's not a
tuple, you can write just that argument:

f.write("%s %s" % (line[1], line[5]))
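
A small sketch of both forms of the % operator described above. The hard-coded `line` list is a stand-in for one row from csv.reader, and the trailing "\n" is an added assumption so that each record would land on its own line.

```python
# Stand-in for one parsed row (shortened to six fields).
line = ['3520005', 'Toronto (Ont.)', 'C ', 'F', '2503281', '2481494']

# Two placeholders need a two-item tuple on the right of %.
two_values = "%s %s\n" % (line[1], line[5])

# One placeholder can take a bare (non-tuple) argument.
one_value = "%s\n" % line[1]

print(repr(two_values))
print(repr(one_value))
```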
Also, I have to process the first column eg, "Toronto (Ont.)" into "Toronto".
I am familiar with the function find(), I assume that i could extract Toronto out of Toronto(Ont.) using "(" as the stopping character,
but based on my research , I have no idea how to use it and ask it to return me the string(Toronto).
Use find to tell you the index of the "(" (if there isn't one then
it'll return -1) and then slice the string to get the part preceding it.

Another way is to use the "partition" method.

Also, have a look at the "strip"/"lstrip"/"rstrip" methods.
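
As a rough illustration of those suggestions, the sketch below trims the bracketed part with find() plus slicing and again with partition(). The variable names are made up for the example.

```python
location = "Toronto (Ont.)"

# find() returns the index of "(" or -1 when it is absent.
index = location.find("(")
if index != -1:
    by_find = location[:index].rstrip()
else:
    by_find = location.strip()

# partition() returns (before, separator, after); when the separator
# is missing, `before` is simply the whole string.
before, _sep, _after = location.partition("(")
by_partition = before.strip()

print(by_find, by_partition)
```

The partition() form needs no special case for names without a bracket, which is one reason to prefer it here.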
 
Tim Chase

import csv
file = open('raw.csv')

Asaf recommended using string methods to split the file. Keep doing
what you're doing (using the csv module), as it attends to a lot of
edge-cases that will trip you up otherwise. I learned this the hard
way several years into my Python career. :)
reader = csv.reader(file)

f = open('NicelyDone.text','w')

for line in reader:
f.write("%s %s"%line[1],%line[5])

Here, I'd start by naming the pieces that you get, so do

for line in reader:
    location = line[1]
    value = line[5]
Also, I have to process the first column eg, "Toronto (Ont.)" into
"Toronto". I am familiar with the function find(), I assume that i
could extract Toronto out of Toronto(Ont.) using "(" as the
stopping character, but based on my research , I have no idea how
to use it and ask it to return me the string(Toronto).

You can use the .split() method to split a string, so you could do
something like

if '(' in location:
    bits = location.split('(')
    # at this point, bits = ['Toronto ', 'Ont.)']
    location = bits[0].strip() # also strip it to remove whitespace
1:What is the data format for line[1], if it is string how come
f.write()does not work. if it is not string, how do i convert it to
a string?

The problem is not that "it is not a string" but that you are passing
multiple parameters, the second of which is invalid Python because it
has an extra percent-sign. First create the one string that you
want to output:

output = "%s %s\n" % (location, value)

and then write it out to the file:

f.write(output)

rather than trying to do it all in one pass.
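
Put together, the steps in this reply could be sketched as follows. The two io.StringIO objects are stand-ins for open('raw.csv') and open('NicelyDone.text', 'w'), and the rows are shortened copies of the sample data; both are assumptions made so the snippet runs on its own.

```python
import csv
import io

# Stand-ins for the input and output files.
raw = io.StringIO(
    '3520005,"Toronto (Ont.)",C ,F,2503281,2481494\n'
    '5915022,"Vancouver (B.C.)",CY ,F,578041,545671\n'
)
out = io.StringIO()

for line in csv.reader(raw):
    location = line[1]
    value = line[5]          # the index used in the thread's code
    if '(' in location:
        location = location.split('(')[0].strip()
    out.write("%s %s\n" % (location, value))

print(out.getvalue())
```

Note that line[5] yields 2481494 for the Toronto row; the 2503281 shown in the question's desired output sits at line[4].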

-tkc
 
Mark Lawrence

Asaf recommended using string methods to split the file. Keep doing
what you're doing (using the csv module), as it attends to a lot of
edge-cases that will trip you up otherwise. I learned this the hard
way several years into my Python career. :)

+1
 
Dave Angel

Zhen Zhang said:
I am extracting the 1st and 5th column and save it into a text file.

Looks to me like columns 1 and 6.
This is what i have so far.


Code:
import csv
file = open('raw.csv')
reader = csv.reader(file)

f = open('NicelyDone.text','w')

for line in reader:
f.write("%s %s"%line[1],%line[5])

Why not use print to file f? The approach for redirection is
different between python 2 and 3, and you neglected to say which
you're using.
My thinking is that I could add those 2 string together like c=a+' ' +b, that would give me the format i wanted.

And don't forget the "\n" at end of line.
So i can use f.write() to write into a file ;)

Or use print, which defaults to adding in a newline.
Sorry if my questions sounds too easy or stupid.

Not in the least.
 
Terry Reedy

Code:
import csv
file = open('raw.csv')
reader = csv.reader(file)

f = open('NicelyDone.text','w')

for line in reader:
f.write("%s %s"%line[1],%line[5])

f.write("%s %s\n" % (line[1], line[5])) should do better.

This is not working for me,

Always say how something is not working. If there is a traceback, cut
and paste after reading it carefully.


 
Asaf Las

On 2014-02-05 16:10, Zhen Zhang wrote:
Asaf recommended using string methods to split the file. Keep doing
what you're doing (using the csv module), as it attends to a lot of
edge-cases that will trip you up otherwise. I learned this the hard
way several years into my Python career. :)

i did not recommend anything :)

import io
import csv

str_t = '''3520005,"Toronto (Ont.)",C ,F,2503281,2481494,F,F,0.9,1040597,979330,630.1763,3972.4,1
2466023,"Montréal (Que.)",V ,F,1620693,1583590,T,F,2.3,787060,743204,365.1303,4438.7,2
5915022,"Vancouver (B.C.)",CY ,F,578041,545671,F,F,5.9,273804,253212,114.7133,5039.0,8
3519038,"Richmond Hill (Ont.)",T ,F,162704,132030,F,F,23.2,53028,51000,100.8917,1612.7,28'''

file_t = io.StringIO(str_t)

csv_t = csv.reader(file_t, delimiter = ',')
for row in csv_t:
    print("split result ", row[1].strip('"'), row[5])
 
Tim Chase

i did not recommend anything :)

From your code,

list_t = str_t.split(',')

It might have been a short-hand for obtaining the results of a CSV
row, but it might be better written something like

list_t = csv.reader([str_t])
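
One detail worth spelling out: csv.reader returns an iterator of rows rather than a single list, so getting the row itself might need a next() call. The sketch below also shows the kind of edge case the csv module handles. The row with a comma inside the quoted field is a made-up example, not data from the thread.

```python
import csv

# Hypothetical row: the quoted field contains a comma.
row_text = '3520005,"Toronto, Ont.",C'

naive = row_text.split(',')               # cuts the quoted field in two
parsed = next(csv.reader([row_text]))     # first (and only) parsed row

print(naive)
print(parsed)
```

Here str.split() produces four pieces with the quote characters still attached, while csv.reader yields the three intended fields.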

-tkc
 
Asaf Las

On 2014-02-05 19:59, Asaf Las wrote:
From your code,
list_t = str_t.split(',')
It might have been a short-hand for obtaining the results of a CSV
row, but it might be better written something like
list_t = csv.reader([str_t])
-tkc

i was too fast to reply. you are correct!

/Asaf
 
Zhen Zhang

Hi Roy,

Thank you so much for the reply.
I am currently running Python 2.7.

I ran
print type(line[1]), repr(line(1))
and it tells me that 'list' object is not callable.

It seems the entire line is a list rather than a "line" data type as I thought.

So line[1] is a string element of that list after all.

f.write("%s %s %s" %(output,location,output)) works great;
as MRAB mentioned, I have to pass the arguments as a tuple.

This is the code I am currently using:

for line in reader:
    location ="%s"%(line[1])
    if '(' in location:
        # at this point, bits = ['Toronto ', 'Ont.)']
        bits = location.split('(')
        location = bits[0].strip()
    output = "%s %s\n" %(location,line[5])
    f.write("%s" %(output))

It extracts the desired information into a text file as I wanted.
However, the Python program gives me an error after the execution:
location="%s"%(line[1])
IndexError: list index out of range

I failed to figure out why.
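
One possible cause, offered as a guess since raw.csv itself is not shown: csv.reader yields an empty list for a blank line, so a blank line at the end of the file would make line[1] fail only after all the real rows had been written. A minimal sketch that skips rows that are too short, written for Python 3:

```python
import csv
import io

# Stand-in for raw.csv, with a blank line at the end (an assumption).
raw = io.StringIO('3520005,"Toronto (Ont.)",C ,F,2503281,2481494\n\n')

rows = list(csv.reader(raw))
print(rows[-1])   # the blank line arrives as an empty list

kept = []
for line in rows:
    if len(line) < 6:     # too short to have both line[1] and line[5]
        continue
    kept.append((line[1], line[5]))
print(kept)
```

Printing repr(line) just before the failing statement would confirm or rule out this explanation.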
 
Zhen Zhang

Thanks for the reply.
I did not understand the line
str_t = '3520005,"Toronto (Ont.)",C ,F,2503281,2481494,F,F,0.9,1040597,979330,630.1763,3972.4,1'

I am processing an entire file, not a single line, so should I do
str_t=line, maybe?

list_t = str_t.split(',')
I think you are trying to split a line into a list.

But the line is already a list, right? That is why it allows me to do
something like line[1].
But I am not sure.
 
Zhen Zhang

Thanks for the reply, especially the part about tuples.
I was not familiar with this data type,
but I guess I should be :)
 
Zhen Zhang

Hi Tim,

Thanks for the reply.

Does split() make a list or a tuple?

Also, when I do location=line[1],
it gives me an error even though the program ran correctly and output the correct file:
location=line[1]
IndexError: list index out of range

When I do print line[1], there is no error.
It is really strange.
 
Zhen Zhang

Hi Dave, thanks for the reply.
I am currently running Python 2.7.

Yes, I thought there must be a print function in Python, like fprint in C++, that allows you to print into a file directly.
But when I googled "print string into text file" I got answers using f.write() instead. :)
 
Asaf Las

On Wednesday, February 5, 2014 7:33:00 PM UTC-5, Roy Smith wrote:
I failed to figure out why.

OK, you need to look at what I posted the second time. The first one is
irrelevant. Note that the file was emulated using StringIO; in your
case it will be file name.
You can grab the script below and run it directly as a Python script:

<------------------------------------ start of script
import io
import csv

str_t = '''3520005,"Toronto (Ont.)",C ,F,2503281,2481494,F,F,0.9,1040597,979330,630.1763,3972.4,1
2466023,"Montréal (Que.)",V ,F,1620693,1583590,T,F,2.3,787060,743204,365.1303,4438.7,2
5915022,"Vancouver (B.C.)",CY ,F,578041,545671,F,F,5.9,273804,253212,114.7133,5039.0,8
3519038,"Richmond Hill (Ont.)",T ,F,162704,132030,F,F,23.2,53028,51000,100.8917,1612.7,28'''

file_t = io.StringIO(str_t)

csv_t = csv.reader(file_t, delimiter = ',')
for row in csv_t:
    print("split result ", row[1].strip('"').split('(')[0] , row[5])


<----------------------------- end of script
Output must be (i got it after run):

split result Toronto 2481494
split result Montréal 1583590
split result Vancouver 545671
split result Richmond Hill 132030



row[1].strip('"').split('(')[0] is City name
row[5] is digits at pos 5 wished



Both are strings, so save them later into file.
Regarding this one - you can split operations as below to see what is
happening:
row[1]
row[1].strip('"')
row[1].strip('"').split('(')
row[1].strip('"').split('(')[0]
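
A sketch of those four steps with their intermediate values, written for Python 3. The one-row `file_t` and the `step1` to `step4` names are assumptions made for illustration.

```python
import csv
import io

# One shortened row from the sample data, emulating the opened file.
file_t = io.StringIO('3520005,"Toronto (Ont.)",C ,F,2503281,2481494\n')
row = next(csv.reader(file_t))

step1 = row[1]              # csv has already removed the surrounding quotes
step2 = step1.strip('"')    # so this leaves the string unchanged
step3 = step2.split('(')    # a list of two pieces, cut at the bracket
step4 = step3[0]            # the city name, still with a trailing space

for step in (step1, step2, step3, step4):
    print(repr(step))
```

Appending .strip() to the last step would remove the trailing space left in front of the bracket.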

Have a nice day

/Asaf
 
Asaf Las

On Thursday, February 6, 2014 9:52:43 AM UTC+2, Zhen Zhang wrote:
case it will be file name.

A little correction: not a file name but a file object. file_t is the result of open(),
as you did in your example.
 
Jussi Piitulainen

Zhen Zhang writes:
....
I am currently running python 2.7.

Yes, i thought there must be a print function in python like fprint
in C++ that allows you to print into a file directly.

But i google about "print string into text file" I got answers using
f.write() instead. :)

Indeed. The first Python hit for me with that query was the tutorial
page on I/O in Python 2, and it does exactly that.
<http://docs.python.org/2/tutorial/inputoutput.html>

That page does refer to the spec of the print statement, where you can
find the way to redirect the output to a file, but you need to be able
to read formal syntax specifications like this:

print_stmt ::= "print" ([expression ("," expression)* [","]]
| ">>" expression [("," expression)+ [","]])

The relevant pattern is the second alternative, after the vertical
bar, which can be instantiated this way:

print >> f, e0, e1

There is one object f with a .write method, and one or more
expressions whose values get written using f.write; the effect of an
optional comma at end is also specified there. Not tutorial-level.
<http://docs.python.org/2/reference/simple_stmts.html#print>

But I use the newer print function even if I have to use 2.7,
something like this:

from __future__ import print_function
f = open("test.txt", "w")
print("hello?", "see me?", file=f)
f.close()

It does a modest amount of formatting: the value of the keyword
argument sep is written between the values, and the value of end is
written at end.
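
To make the sep and end behaviour concrete, here is a small sketch written for Python 3, where print is already a function. The io.StringIO buffer is a stand-in for the opened file, and the tab and semicolon are arbitrary choices for the demonstration.

```python
import io

f = io.StringIO()   # stand-in for open("test.txt", "w")

# Defaults: sep=" " between the values and end="\n" after them.
print("Toronto", 2503281, file=f)

# Both can be overridden per call.
print("Vancouver", 578041, sep="\t", end=";\n", file=f)

written = f.getvalue()
print(repr(written))
```

Non-string values such as the integers here are converted with str() before being written, which f.write() would not do on its own.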
 
Dave Angel

I am currently running python 2.7.

Yes, i thought there must be a print function in python like fprint in C++ that allows you to print into a file directly.
But i google about "print string into text file" I got answers using f.write() instead. :)
In Python 2.x, instead of
f.write(a + " " + b + "\n")
you can use
print >> f, a, b
 
