Getting values out of a CSV

CarpeSkium · Jul 13, 2007

How do I access the value in the second row in the first position of a
CSV? Or the 3rd row, in the fifth position?

a,b,c,d,e,f,g,h,i
j,k,l,m,n,o,p,q,r
r,s,t,v,w,x,y,z

I'd want to get at "j" and "w". I know I can do

import csv
reader = csv.reader(open("some.csv", "rb"))
for row in reader:
    print row[0]

to get the first value in EVERY row, but I don't want that. Thanks for
the help.

Daniel · Jul 13, 2007

data = [row for row in csv.reader(open('some.csv', 'rb'))]

then you can access like so:

data[1][4]  # 'n'
data[0][0]  # 'a'
data[2][0]  # 'r'
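
For the two values the question asks about ("j" and "w"), a minimal Python 3 sketch might look like the following. It assumes the three sample rows from the question; `io.StringIO` stands in for the file so the snippet is self-contained, and the variable names are illustrative only. With a real file in Python 3 you would open it in text mode, e.g. `open("some.csv", newline="")`, rather than with "rb".

```python
import csv
import io

# Sample rows from the question; StringIO stands in for some.csv here.
sample = "a,b,c,d,e,f,g,h,i\nj,k,l,m,n,o,p,q,r\nr,s,t,v,w,x,y,z\n"

# Read every row into a list of lists so rows can be indexed directly.
data = list(csv.reader(io.StringIO(sample)))

# Indexes are zero-based: data[row][column].
second_row_first = data[1][0]   # second row, first position
third_row_fifth = data[2][4]    # third row, fifth position
print(second_row_first, third_row_fifth)
```

Loading the whole file into a list is fine for small files like this one; each row is a plain list of strings.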

Gabriel Genellina · Jul 13, 2007

Daniel said:
data = [row for row in csv.reader(open('some.csv', 'rb'))]

Note that whenever you see [x for x in ...] with no condition, you can
write list(...) instead: clearer, and faster.

data = list(csv.reader(open('some.csv', 'rb')))
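
As a hedged illustration of that equivalence in Python 3: the sketch below builds the same list both ways and shows the case where a comprehension is still needed, namely when there is a condition. The sample text and function names are made up for the example, and `io.StringIO` replaces the file.

```python
import csv
import io

# Made-up sample with a blank line in the middle.
sample = "a,b,c\n\nd,e,f\n"

def read_with_list(text):
    # No condition, no transformation: list() is enough.
    return list(csv.reader(io.StringIO(text)))

def read_with_comprehension(text):
    # Same result, spelled as a comprehension.
    return [row for row in csv.reader(io.StringIO(text))]

def read_non_empty(text):
    # With a condition, the comprehension earns its keep.
    return [row for row in csv.reader(io.StringIO(text)) if row]

rows_a = read_with_list(sample)
rows_b = read_with_comprehension(sample)
rows_c = read_non_empty(sample)
print(rows_a == rows_b, len(rows_a), len(rows_c))
```

The csv reader returns an empty list for the blank line, which is what the condition filters out.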

Daniel · Jul 13, 2007

Clearer? Maybe, but list comprehensions are clearer (at least for me)

Faster? No. List Comprehensions are faster.

Kelvie Wong · Jul 13, 2007

kelvie@valour pdfps $ python -m timeit -c 'data = list(open("make.ps"))'
100 loops, best of 3: 7.5 msec per loop
kelvie@valour pdfps $ python -m timeit -c 'data = [line for line in open("make.ps")]'
100 loops, best of 3: 9.2 msec per loop

On my system, passing the iterable straight to list() is faster. I think
this is because you don't need to assign each line to the variable 'line'
each time in the former case.

I, too, think it's faster to just use list() instead of 'line for line
in iterable', as it seems kind of redundant.

Michael Hoffman · Jul 13, 2007

Daniel said:
Faster? No. List Comprehensions are faster.

Why do you think that?

Daniel · Jul 13, 2007

$ python -m timeit -c 'import csv; data = list(csv.reader(open("some.csv", "rb")))'
10000 loops, best of 3: 44 usec per loop
$ python -m timeit -c 'import csv; data = [row for row in csv.reader(open("some.csv", "rb"))]'
10000 loops, best of 3: 37 usec per loop

I don't know why there seems to be a difference, but I know that list
comps in Python are very heavily optimised.

Marc 'BlackJack' Rintsch · Jul 13, 2007

Does the machine use power-saving features like SpeedStep or something
similar? That is, does the processor always run at full speed, or is it
stepped up dynamically when there is load? And do both tests always read
the data from cache, or did the very first loop have to fetch the CSV
file from disk?

$ python -m timeit -n 1000 -c 'import csv; data = [row for row in csv.reader(open("test.csv", "rb"))]'
1000 loops, best of 3: 1.27 msec per loop

$ python -m timeit -n 1000 -c 'import csv; data = list(csv.reader(open("test.csv", "rb")))'
1000 loops, best of 3: 1.25 msec per loop

Ciao,
Marc 'BlackJack' Rintsch

Daniel · Jul 13, 2007

No SpeedStep. I tried a few repeats in case the files were cached:
consistently 35 usec for the comprehension and 40 usec for list().

Python 2.5.1 on Linux, 1.2 GHz

Even replacing the csv lookup with a straight variable declaration,
[range(10)*3], gives the same results.

Weird.

Kelvie Wong · Jul 13, 2007

Hrm. Repeating the test several more times, it seems that the value
fluctuates, sometimes one's faster than the other, and sometimes
they're the same.

Perhaps the minute difference between the two is statistically
insignificant? Or perhaps the mechanism underlying both (i.e. the
implementation) is the same?
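
One way to see how much of the gap is noise is to repeat each measurement and compare the spread with the difference between the two approaches. The sketch below does that in Python 3 with `timeit.repeat`; the in-memory `data_source` list and the repeat/number values are arbitrary choices for illustration, not anything taken from the thread.

```python
import timeit

# Arbitrary in-memory data so no file or csv parsing is involved.
data_source = list(range(1000))

def with_list():
    return list(iter(data_source))

def with_comprehension():
    return [x for x in iter(data_source)]

# Five independent runs of 200 calls each, for both variants.
list_times = timeit.repeat(with_list, repeat=5, number=200)
comp_times = timeit.repeat(with_comprehension, repeat=5, number=200)

for name, times in (("list()", list_times), ("comprehension", comp_times)):
    # If max - min within one variant is as large as the gap between
    # variants, the gap is not telling you much.
    print(f"{name}: min={min(times):.6f}s max={max(times):.6f}s")
```

The minimum is usually the most stable figure to compare, since slower runs mostly reflect other activity on the machine.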

Gabriel Genellina · Jul 14, 2007

In principle both ways have to create and populate a list, and a list
comprehension surely is better than a loop using append() - but it still
has to create and bind the intermediate variable on each iteration.

I think that testing with a csv file can't show the difference between
the two ways of creating the list, because of the high overhead of csv
processing. Here is another example, with no I/O involved (a generator
for the first 10000 Fibonacci numbers):

C:\TEMP>python -m timeit -s "import fibo" "list(fibo.fibo())"
10 loops, best of 3: 39.4 msec per loop

C:\TEMP>python -m timeit -s "import fibo" "[x for x in fibo.fibo()]"
10 loops, best of 3: 40.7 msec per loop

(Generating fewer values shows larger differences, though they're still
not dramatic.)

So, as always, one should measure in each specific case if optimization is
worth the pain - and if csv files are involved I'd say the critical points
are elsewhere, not on how one creates the list of rows.
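
The fibo module used above is not shown in the thread, so the following Python 3 sketch uses a hypothetical stand-in generator, `fibo(count)`, to run the same kind of no-I/O comparison from inside a script. The count and the number of timing runs are arbitrary and kept small so it finishes quickly.

```python
import timeit

def fibo(count=1000):
    """Hypothetical stand-in for fibo.fibo(): yields `count` Fibonacci numbers."""
    a, b = 0, 1
    for _ in range(count):
        yield a
        a, b = b, a + b

# Time both ways of turning the generator into a list.
list_time = timeit.timeit(lambda: list(fibo()), number=20)
comp_time = timeit.timeit(lambda: [x for x in fibo()], number=20)
print(f"list(): {list_time:.4f}s  comprehension: {comp_time:.4f}s")
```

Whatever numbers this prints apply only to the machine and interpreter it runs on, which is the point being made about measuring each specific case.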

Alex Popescu · Jul 15, 2007

Gabriel Genellina said:
So, as always, one should measure in each specific case if optimization is
worth the pain [...].

I hope I am somehow misreading the above sentence. IMO synonymous
language constructs should result in the same performance, or at least
have clear, documented performance.
I don't think we really want to see in code something like:

if threshold:
    do_it_with_list_function
else:
    do_it_with_list_comprehension

bests,

../alex
--
..w( the_mindstorm )p.

Steve Holden · Jul 15, 2007

Alex said:
I hope I am somehow misreading the above sentence. IMO synonymous
language constructs should result in the same performance, or at least
have clear, documented performance.

That's a fine opinion, but how would you enforce it? Should we go through
the interpreter slowing down the faster of each pair of alternative
constructs? ;-) It's inevitable there'll be differences in execution
time between equivalent constructs, and in that case you have to test to
find the better in your specific situation.

The real issue here is that in 95% or more of the source of most
programs speed/performance isn't that much of an issue anyway.

Alex said:
I don't think we really want to see in code something like:

if threshold:
    do_it_with_list_function
else:
    do_it_with_list_comprehension

This would most certainly be a premature optimization which, as has been
repeated many times on this list, is the root of much evil in
programming. As Gabriel mentioned, you only need to do it if it's "worth
the pain", which in most cases it won't be. It isn't worth spending even
five minutes to shave a minute off the performance of a ten-minute
program that is only run once a week, for example.

Ultimately we have to be pragmatic: circumstances alter cases, and it's
usually not worth spending the time to improve execution speed except
for the most critical parts (the innermost nested loops) of production
programs.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
--------------- Asciimercial ------------------
Get on the web: Blog, lens and tag the Internet
Many services currently offer free registration
----------- Thank You for Reading -------------

Alex Popescu · Jul 15, 2007

Steve, I fully agree with you (I am a newbie only to Python and not to
programming ;-)).
My point was that this thread may be misleading to newbies, because it
is discussing corner-case performance of the two equivalent language
constructs, while it should most probably be about the fact that the two
solutions are equivalent and the only difference is probably readability
(or maybe something like: the list function is preferred when there are
no additional constraints on the list comprehension construct).

bests,

../alex
