CSV Reader

Mike P

Hi All,

I want to read in a CSV file, but then write out a new CSV file from a
given line.

I'm using the CSV reader, and the line where I want to start
writing the new file begins with
"Transaction ID",

I thought it should be something along the lines of below. Obviously
this doesn't work, but any help would be great.

import csv
f = file(working_CSV, 'rb')
new_data = 0  # a counter to find where the line starts with "Transaction ID"
reader = csv.reader(f)
for data in reader:
    read data file

write new CSV

Cheers

Mike
 
Larry Bates

What part "obviously" doesn't work? Try something, post any tracebacks and we
will try to help. Don't ask others to write your code for you without actually
trying it yourself. It appears you are on the right track.

-Larry
 
Mike P

Hi Larry,

I'm still getting to grips with Python, but rest assured I think it's
better for me to write the code for learning purposes.

My basic file is here. It comes up with a syntax error on the
startswith line; is this because it is potentially a list?
My idea was to get the line number where I can see Transaction ID and
then write out everything from this point into a new data file.

Would a better solution be just to use readlines and search for the
string with a counter and then write out a file from there?

Any help is greatly appreciated

Mike


working_CSV = "//filer/common/technical/Research/E2C/Template_CSV/DFAExposureToConversionQueryTool.csv"

import csv
f = file(working_CSV, 'rb')
reader = csv.reader(f)
CSV_lines = ""
for data in reader:
    if lines.startswith("Transaction ID")
        append.reader.line_num()
# there will only be 1 instance of this title at the start of the CSV file
writer(Working_csv.csv[, dialect='excel'][, fmtparam])
 
Reedick, Andrew

-----Original Message-----
From: Mike P
Sent: Monday, February 11, 2008 11:10 AM
Subject: Re: CSV Reader

Hi Larry,

I'm still getting to grips with Python, but rest assured I think it's
better for me to write the code for learning purposes.

My basic file is here. It comes up with a syntax error on the
startswith line; is this because it is potentially a list?
My idea was to get the line number where I can see Transaction ID and
then write out everything from this point into a new data file.



From the docs for reader: "All data read are returned as strings. No
automatic data type conversion is performed."
Just use print or repr() to see what the row data looks like. Then the
method to check for 'transaction id' should be abundantly clear.

for data in reader:
    print data
    print repr(data)

Would a better solution be just to use readlines and search for the
string with a counter and then write out a file from there?

Yes you could, but the danger is that you get an insanely large file
that blows out your memory or causes the process to swap to disk space
(disk is slooooooooow.)
Just loop through the lines and use a boolean flag to determine when to
start printing.


*****

The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential, proprietary, and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from all computers. GA621
 
Mike P

Cheers for the help, the second way looked to be the best in the end,
and thanks for the boolean idea

Mike



working_CSV = "//filer/common/technical/Research/E2C/Template_CSV/DFAExposureToConversionQueryTool.csv"

save_file = open("//filer/common/technical/Research/E2C/Template_CSV/CSV_Data2.csv","w")

CSV_Data = open(working_CSV)
data = CSV_Data.readlines()
flag=False
for record in data:
    if record.startswith('"Transaction ID"'):
        flag=True
    if flag:
        save_file.write(record)
save_file.close()
 
Reedick, Andrew

Don't be a pansy.

Use the csv module, or add a check for
record.startswith('Transaction ID'). There's no guarantee that csv
columns will be double-quoted. (Leading whitespace may or may not be
acceptable, too.) Look at the first piece of sample code in the
documentation for the csv module. (Section 9.1.5 in python 2.5) You're
99% of the way to using csv.reader() properly.
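
To illustrate the quoting point, here is a minimal sketch, assuming Python 3 and made-up sample lines. The helper name first_field is hypothetical. It suggests that csv.reader yields the same first field whether or not the header is double-quoted, so a check on the parsed field does not depend on the quoting style.

```python
import csv

def first_field(line):
    """Parse one CSV line and return its first field, stripped of surrounding whitespace."""
    row = next(csv.reader([line]))  # csv.reader accepts any iterable of strings
    return row[0].strip() if row else ""

# Made-up sample lines: the same header, quoted and unquoted.
quoted = '"Transaction ID","Amount"'
unquoted = 'Transaction ID,Amount'

# Both spellings parse to the same first field.
print(first_field(quoted))
print(first_field(unquoted))
```

A raw-line startswith('"Transaction ID"') check would match only the first of these two lines.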


Nitpick: If the boolean check is expensive, then
if not flag and record.startswith(...):
    flag = True

Nitpick: flag is a weak name. Use something like bPrint, okay2print,
or print_now or anything that's more descriptive. In larger and/or more
complex programs, meaningful variable names are a must.



 
Gabriel Genellina

On Mon, 11 Feb 2008 14:41:54 -0200, Mike P wrote:
CSV_Data = open(working_CSV)
data = CSV_Data.readlines()
flag=False
for record in data:
    if record.startswith('"Transaction ID"'):
[...]

Files are already iterable by lines. There is no need to use readlines(),
and you can avoid the already mentioned potential slowdown problem. Just
remove the data = CSV_Data.readlines() line, and change that for statement
to be:
for record in CSV_Data:

Reading the style guide may be beneficial:
http://www.python.org/dev/peps/pep-0008/
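
A minimal sketch of that change, assuming Python 3; the function name copy_from_marker and its parameters are hypothetical. It iterates over the file object directly, so only one line needs to be held in memory at a time.

```python
def copy_from_marker(src_path, dst_path, marker='"Transaction ID"'):
    """Copy lines from src_path to dst_path, starting at the first line that begins with marker."""
    copying = False
    with open(src_path) as src, open(dst_path, "w") as dst:
        for record in src:  # the file object yields one line at a time
            if not copying and record.startswith(marker):
                copying = True
            if copying:
                dst.write(record)
```

The with statement closes both files even if an error occurs partway through.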
 
Mike P

I did just try to post, but it doesn't look like it has appeared?

I've used your advice, Andrew, and tried to use the CSV module, but now
it doesn't seem to pick up the startswith command?
Is this because of the way the CSV module is reading the data in?
I've looked into the module description but I can't find anything that
I should be using?

Can anyone offer any advice?

Cheers again

Mike

working_CSV = "//filer/common/technical/Research/E2C/Template_CSV/DFAExposureToConversionQueryTool.csv"

save_file = "//filer/common/technical/Research/E2C/Template_CSV/CSV_Data2.csv"

start_line=False
import csv
reader = csv.reader(open(working_CSV, "rb"))
writer = csv.writer(open(save_file, "wb"))
for row in reader:
    if not start_line and record.startswith("'Transaction ID'"):
        start_line=True
    if start_line:
        print row
        writer.writerows(rows)
#writer.close()
 
Mike P

Just saw I needed to change record.startswith to row.startswith,
but I get the following traceback error:

Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "Y:\technical\Research\E2C\Template_CSV\import CSV test.py", line 10, in <module>
    if not start_line and row.startswith('Transaction ID'):
AttributeError: 'list' object has no attribute 'startswith'
 
Chris

row won't have an attribute 'startswith' because row is a list
and startswith is a string method.
Also, your code isn't exactly clear on what you want to do. If it is
just "Find the first occurrence of Transaction ID and pump the file
from then onwards into a new file", why not:

output = open('output_file.csv','wb')
start_line = False
for each_line in open('input_file.csv','rb'):
    # the header field is double-quoted in the file, so match the quotes too
    if not start_line and each_line.startswith('"Transaction ID"'):
        start_line = True
    if start_line:
        output.write( each_line )
output.close()

Also, if you need a line number for any purpose, take a look at
enumerate(): it returns a counter along with your data, e.g.
'for (line_num, each_line) in enumerate(input_file):'. Counting
starts at zero, though, so you would need to add 1.
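
A small sketch of the enumerate() suggestion, assuming Python 3; the sample lines and the helper name find_marker_line are made up for illustration. Passing start=1 (available since Python 2.6, so not in the 2.5 used in this thread) avoids adding 1 by hand.

```python
def find_marker_line(lines, marker='"Transaction ID"'):
    """Return the 1-based number of the first line starting with marker, or None if absent."""
    for line_num, each_line in enumerate(lines, start=1):
        if each_line.startswith(marker):
            return line_num
    return None

# Made-up sample data: two lines of preamble before the header row.
sample = ['report header\n', '\n', '"Transaction ID","Amount"\n', '"1","9.99"\n']
print(find_marker_line(sample))  # prints 3
```

Any iterable of lines works here, including an open file object.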
 
Gabriel Genellina

On Tue, 12 Feb 2008 08:37:13 -0200, Mike P wrote:
Just saw I needed to change record.startswith to row.startswith,
but I get the following traceback error:

Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "Y:\technical\Research\E2C\Template_CSV\import CSV test.py", line 10, in <module>
    if not start_line and row.startswith('Transaction ID'):
AttributeError: 'list' object has no attribute 'startswith'

The csv reader doesn't return complete lines, but a list of fields for
each line. If you want to check the first field, use:

if not start_line and row[0].startswith('Transaction ID'):
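
Putting that check into the full loop might look like the sketch below. It assumes Python 3 (files opened in text mode with newline='' rather than the "rb"/"wb" modes used in Python 2), and the function name copy_rows_from_header is hypothetical. The extra "row and" guard skips the empty lists that csv.reader returns for blank lines, which would otherwise raise an IndexError on row[0].

```python
import csv

def copy_rows_from_header(src_path, dst_path, header="Transaction ID"):
    """Write every CSV row starting from the first row whose first field starts with header."""
    start_line = False
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for row in reader:
            # row is a list of fields, so test the first field rather than the list itself
            if not start_line and row and row[0].startswith(header):
                start_line = True
            if start_line:
                writer.writerow(row)  # writerow takes one row; writerows takes a sequence of rows
```

Note that csv.writer quotes only where needed by default, so the output may not reproduce the input's quoting character for character.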
 
Reedick, Andrew


Algorithms + Data Structures = Programs

You need to understand what kind of data structure a "list" is. You're
foundering a bit because you don't have a good understanding of your
tools (the list data structure in this case.) Try going through the
O'Reilly Learning Python book. Even better would be to take/audit a
college/university class on data structures and algorithms.



 
