finding out the number of rows in a CSV file

SimonPalmer

anyone know how I would find out how many rows are in a csv file?

I can't find a method which does this on csv.reader.

Thanks in advance
 
Jon Clements

anyone know how I would find out how many rows are in a csv file?

I can't find a method which does this on csv.reader.

Thanks in advance

You have to iterate each row and count them -- there's no other way
without supporting information (since each row length is naturally
variable, you can't even use the file size as an indicator).

Something like:

row_count = sum(1 for row in csv.reader( open('filename.csv') ) )

hth
Jon.
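A slightly fuller sketch of the counting approach above, assuming Python 3. The function name `count_rows` is made up for illustration. The `with` block closes the file afterwards, and `newline=''` is what the csv module documentation recommends when opening a file for a reader.

```python
import csv

def count_rows(path):
    """Count CSV records by iterating the reader once; no rows are kept in memory."""
    # newline='' leaves line-ending handling (including quoted newlines) to the csv module.
    with open(path, newline='') as f:
        return sum(1 for _ in csv.reader(f))
```

Called as `count_rows('filename.csv')`, this returns the number of records, header line included if the file has one.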
 
Simon Brunning

2008/8/27 SimonPalmer said:
anyone know how I would find out how many rows are in a csv file?

I can't find a method which does this on csv.reader.

len(list(csv.reader(open('my.csv'))))
 
Jon Clements

len(list(csv.reader(open('my.csv'))))

Not the best of ideas if the row size or number of rows is large!
Manufacture a list, then discard to get its length -- ouch!
 
Simon Brunning

Not the best of ideas if the row size or number of rows is large!
Manufacture a list, then discard to get its length -- ouch!

I do try to avoid premature optimization. ;-)
 
SimonPalmer

Not the best of ideas if the row size or number of rows is large!
Manufacture a list, then discard to get its length -- ouch!

Thanks to everyone for their suggestions.

In my case the number of rows is never going to be that large (<200)
so it is a practical if slightly inelegant solution
 
SimonPalmer

Thanks to everyone for their suggestions.

In my case the number of rows is never going to be that large (<200)
so it is a practical if slightly inelegant solution

actually not resolved...

After reading the file through the csv.reader to get the length, I cannot
iterate over the rows. How do I reset the row iterator?
 
Jon Clements

actually not resolved...

after reading the file through the csv.reader for the length I cannot
iterate over the rows. How do I reset the row iterator?

If you're sure that the number of rows is always less than 200,
slightly modify Simon Brunning's example and do:

rows = list( csv.reader(open('filename.csv')) )
row_count = len(rows)
for row in rows:
    pass  # do something with each row
 
John Machin

actually not resolved...

after reading the file through the csv.reader for the length I cannot
iterate over the rows.

OK, I'll bite: Why do you think you need to know the number of rows in
advance?

How do I reset the row iterator?

You don't. You throw it away and get another one. You need to seek to
the beginning of the file first. E.g.:

C:\junk>type foo.csv
blah,blah
waffle
q,w,e,r,t,y

C:\junk>type csv2iters.py
import csv
f = open('foo.csv', 'rb')
rdr = csv.reader(f)
n = 0
for row in rdr:
    n += 1
print n, f.tell()
f.seek(0)
rdr = csv.reader(f)
for row in rdr:
    print row

C:\junk>csv2iters.py
3 32
['blah', 'blah']
['waffle']
['q', 'w', 'e', 'r', 't', 'y']

HTH,
John
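The script above is Python 2. A rough Python 3 equivalent of the same seek-and-recreate idea might look like the sketch below; the function name `count_then_read` is an assumption made for illustration, not part of the original post.

```python
import csv

def count_then_read(path):
    """Count the rows, rewind the file, then read them again with a fresh reader."""
    with open(path, newline='') as f:
        n = sum(1 for _ in csv.reader(f))  # the first pass exhausts this reader
        f.seek(0)                          # rewind the underlying file object
        rows = list(csv.reader(f))         # a new reader starts from the top again
    return n, rows
```

The reader itself is never reset. It is discarded, and a second one is built on the rewound file.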
 
SimonPalmer


This is all good, and thanks for your time. I need the number of rows
because of the nature of the data and what I do with it on reading. I
need to initialise some data structures, and that is *much* more
efficient if I know the number of rows in advance. The cost
of reading the file twice is probably less than that of incrementally
extending my internal structures, because of their complexity.

To be honest, these are all good solutions. I think I have a view
of csv reading that comes from different technologies, plus a lack of
experience with Python, which just means that I don't know where to
look for answers.

Very happy that I can now proceed.
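For the use case described above (sizing structures up front from the row count), one possible shape in Python 3 is sketched here. The name `load_with_preallocation` and the `field_counts` list are hypothetical stand-ins for the "internal structures", which the post does not describe.

```python
import csv

def load_with_preallocation(path):
    """Read all rows once, then size a result structure from the known count."""
    with open(path, newline='') as f:
        rows = list(csv.reader(f))
    row_count = len(rows)
    # Hypothetical stand-in for the real structures: one preallocated slot per row.
    field_counts = [0] * row_count
    for i, row in enumerate(rows):
        field_counts[i] = len(row)  # fill each slot in place
    return row_count, field_counts
```

With a few hundred rows, holding them all in a list costs very little, and the file is only read once.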
 
TYR

Use csv.DictReader to get a list of dicts (you get one for each row,
with the values as the vals and the column headings as the keys) and
then do a len(list)?
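A minimal sketch of the DictReader suggestion, assuming Python 3 and a file whose first line holds the column headings. The function name `count_data_rows` is made up for illustration. Note that the heading line is consumed as keys, so it is not included in the count.

```python
import csv

def count_data_rows(path):
    """Load rows as dicts keyed by the header line and return (count, records)."""
    with open(path, newline='') as f:
        records = list(csv.DictReader(f))  # the first line becomes the keys
    return len(records), records
```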
 
Peter Otten

Jon said:
If you're sure that the number of rows is always less than 200.

Or 2000. Or 20000...

Actually any number that doesn't make your machine fall into a coma will do.

Slightly modify Simon Brunning's example and do:

rows = list( csv.reader(open('filename.csv')) )
row_count = len(rows)
for row in rows:
    pass  # do something with each row

Peter
 
John S

[OP] Jon Clements said:
after reading the file through the csv.reader for the length I cannot
iterate over the rows. How do I reset the row iterator?

A CSV file is just a text file. Don't use csv.reader for counting rows
-- it's overkill. You can just read the file normally, counting lines
(lines == rows).

This is similar to what Jon Clements said, but you don't need the csv
module.

num_rows = sum(1 for line in open("myfile.csv"))

As other posters have said, there is no free lunch. When you use
csv.reader, it reads the lines, so once it's finished you're at the
end of the file.
 
Peter Otten

John said:
[OP] Jon Clements said:
after reading the file through the csv.reader for the length I cannot
iterate over the rows. How do I reset the row iterator?

A CSV file is just a text file. Don't use csv.reader for counting rows
-- it's overkill. You can just read the file normally, counting lines
(lines == rows).

Wrong. A field may have embedded newlines:
>>> import csv
>>> csv.writer(open("tmp.csv", "w")).writerow(["a" + "\n"*10 + "b"])
>>> sum(1 for row in csv.reader(open("tmp.csv")))
1
>>> sum(1 for line in open("tmp.csv"))
11

Peter
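The same comparison can be written as a small Python 3 script. This is only a sketch: the helper names `write_sample` and `lines_vs_rows` are invented here, and files are opened with `newline=''` as the csv module documentation recommends.

```python
import csv

def write_sample(path):
    """Write one record whose only field contains ten embedded newlines."""
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerow(["a" + "\n" * 10 + "b"])

def lines_vs_rows(path):
    """Return (physical line count, CSV record count) for the same file."""
    with open(path, newline='') as f:
        line_count = sum(1 for _ in f)              # splits on every newline
    with open(path, newline='') as f:
        row_count = sum(1 for _ in csv.reader(f))   # honours quoted newlines
    return line_count, row_count
```

After `write_sample('tmp.csv')`, `lines_vs_rows('tmp.csv')` gives 11 lines but only 1 row, matching the session above.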
 
Fredrik Lundh

John said:
A CSV file is just a text file. Don't use csv.reader for counting rows
-- it's overkill. You can just read the file normally, counting lines
(lines == rows).

$ more sample.csv
"Except
when it
isn't."3

</F>
 
norseman

Peter said:
John said:
[OP] Jon Clements said:
after reading the file through the csv.reader for the length I cannot
iterate over the rows. How do I reset the row iterator?
A CSV file is just a text file. Don't use csv.reader for counting rows
-- it's overkill. You can just read the file normally, counting lines
(lines == rows).

Wrong. A field may have embedded newlines:
>>> import csv
>>> csv.writer(open("tmp.csv", "w")).writerow(["a" + "\n"*10 + "b"])
>>> sum(1 for row in csv.reader(open("tmp.csv")))
1
>>> sum(1 for line in open("tmp.csv"))
11

Peter

=============================
Well... a semantics problem here.

A blank line is just an EOL by itself. Yes.
I may want to count these. Could be indicative of a problem.
Besides, sum(1 for line in ... if line.strip()) handles the problem if I'm not
counting blanks and still avoids tossing, re-opening etc...

Again - it's how you look at it, but I don't want EOLs in my dbase
fields. csv was designed to 'dump' database fields into text for those
not affording a database program and/or to convert between database
programs. By the way - has anyone seen a good spreadsheet dumper? One
that dumps the underlying formulas and such along with the display
value? That would greatly facilitate portability, wouldn't it? (Yeah -
the receiving end would have to be able to read it. But it would be a start
- yes?) Everyone got the point? Just because it gets abused doesn't
mean .... Are we back on track? Number of lines equals number of
reads - which is what was requested. No bytes magically disappearing. No
sleight of hand, no one dictating how to or what with ....

The good part is everyone who reads this now knows two ways to approach
the problem and the pros/cons of each. No losers.



Steve
(e-mail address removed)
 
John Machin

Peter said:
John S wrote:
[OP] Jon Clements wrote:
after reading the file through the csv.reader for the length I cannot
iterate over the rows. How do I reset the row iterator?
A CSV file is just a text file. Don't use csv.reader for counting rows
-- it's overkill. You can just read the file normally, counting lines
(lines == rows).
Wrong. A field may have embedded newlines:
>>> import csv
>>> csv.writer(open("tmp.csv", "w")).writerow(["a" + "\n"*10 + "b"])
>>> sum(1 for row in csv.reader(open("tmp.csv")))
1
>>> sum(1 for line in open("tmp.csv"))
11

Peter

=============================
Well... a semantics problem here.

A blank line is just an EOL by itself. Yes.

Or a line containing blanks. Yes what?

I may want to count these. Could be indicative of a problem.

If you use the csv module to read the file, a "blank line" will come
out as a row with one field, the contents of which you can check.

Besides, sum(1 for line in ... if line.strip()) handles the problem if I'm not
counting blanks and still avoids tossing, re-opening etc...

What is "tossing", apart from the English slang meaning?
What re-opening?

Again - it's how you look at it, but I don't want EOLs in my dbase
fields.

<rant>
Most people don't want them, but many do have them, as well as Ctrl-Zs
and NBSPs and dial-up line noise (and umlauts/accents/suchlike
inserted by the temporarily-employed backpacker to ensure that her
compatriots' names and addresses were spelled properly) ... and the IT
department fervently believes the content is ASCII even though they
have done absolutely SFA to ensure that.

csv was designed to 'dump' database fields into text for those
not affording a database program and/or to convert between database
programs. By the way - has anyone seen a good spreadsheet dumper? One
that dumps the underlying formulas and such along with the display
value? That would greatly facilitate portability, wouldn't it? (Yeah -
the receiving end would have to be able to read it. But it would be a start
- yes?) Everyone got the point? Just because it gets abused doesn't
mean .... Are we back on track? Number of lines equals number of
reads - which is what was requested. No bytes magically disappearing. No
sleight of hand, no one dictating how to or what with ....

The good part is everyone who reads this now knows two ways to approach
the problem and the pros/cons of each. No losers.

IMHO it is very hard to discern from all that ramble what the alleged
problem is, let alone what are the ways to approach it.
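On the blank-line question discussed above, a small sketch of checking row contents with the csv module, assuming Python 3. The function name `count_rows_by_kind` is hypothetical. In current Python 3 a completely empty line comes back from csv.reader as an empty list, while a line holding only spaces comes back as a row with one field; the check below treats both as blank.

```python
import csv
import io

def count_rows_by_kind(text):
    """Return (all rows, rows with at least one non-blank field) for CSV text."""
    rows = list(csv.reader(io.StringIO(text, newline='')))
    # A row is "blank" if it has no fields, or only whitespace in every field.
    non_blank = [row for row in rows if any(field.strip() for field in row)]
    return len(rows), len(non_blank)
```

Counting both figures in one pass makes it easy to report how many blank rows a file contains without reading it twice.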
 
