Getting values out of a CSV

C

CarpeSkium

How do I access the value in the second row in the first position of a
CSV? Or the 3rd row, in the fifth position?

a,b,c,d,e,f,g,h,i
j,k,l,m,n,o,p,q,r
r,s,t,v,w,x,y,z

I'd want to get at "j" and "w". I know I can do

import csv
reader = csv.reader(open("some.csv", "rb"))
for row in reader:
print row[0]

to get the first value in EVERY row, but I don't want that. Thanks for
the help.
 
D

Daniel

How do I access the value in the second row in the first position of a
CSV? Or the 3rd row, in the fifth position?

a,b,c,d,e,f,g,h,i
j,k,l,m,n,o,p,q,r
r,s,t,v,w,x,y,z

I'd want to get at "j" and "w". I know I can do

import csv
reader = csv.reader(open("some.csv", "rb"))
for row in reader:
print row[0]

to get the first value in EVERY row, but I don't want that. Thanks for
the help.

data = [row for row in csv.reader(open('some.csv', 'rb'))

then you can access like so:
data[1][4] 'n'
data[0][0] 'a'
data[2][0]
'r'
 
G

Gabriel Genellina

data = [row for row in csv.reader(open('some.csv', 'rb'))

Note that every time you see [x for x in ...] with no condition, you can
write list(...) instead - more clear, and faster.

data = list(csv.reader(open('some.csv', 'rb')))
 
D

Daniel

data = [row for row in csv.reader(open('some.csv', 'rb'))

Note that every time you see [x for x in ...] with no condition, you can
write list(...) instead - more clear, and faster.

data = list(csv.reader(open('some.csv', 'rb')))

Clearer? Maybe, but list comprehensions are clearer (at least for me)

Faster? No. List Comprehensions are faster.
 
K

Kelvie Wong

data = [row for row in csv.reader(open('some.csv', 'rb'))

Note that every time you see [x for x in ...] with no condition, you can
write list(...) instead - more clear, and faster.

data = list(csv.reader(open('some.csv', 'rb')))

Clearer? Maybe, but list comprehensions are clearer (at least for me)

Faster? No. List Comprehensions are faster.

kelvie@valour pdfps $ python -m timeit -c 'data = list(open("make.ps"))'
100 loops, best of 3: 7.5 msec per loop
kelvie@valour pdfps $ python -m timeit -c 'data = [line for line in
open("make.ps")]'
100 loops, best of 3: 9.2 msec per loop

On my system just putting into a list is faster. I think this is
because you don't need to assign each line to the variable 'line' each
time in the former case.

I, too, think it's faster to just use list() instead of 'line for line
in iterable', as it seems kind of redundant.
 
M

Michael Hoffman

Daniel said:
On Fri, 13 Jul 2007 08:51:25 +0300, Gabriel Genellina
Note that every time you see [x for x in ...] with no condition, you
can write list(...) instead - more clear, and faster.
>
Faster? No. List Comprehensions are faster.

Why do you think that?
 
D

Daniel

Note that every time you see [x for x in ...] with no condition, you
can

Faster? No. List Comprehensions are faster.

kelvie@valour pdfps $ python -m timeit -c 'data = list(open("make.ps"))'
100 loops, best of 3: 7.5 msec per loop
kelvie@valour pdfps $ python -m timeit -c 'data = [line for line in
open("make.ps")]'
100 loops, best of 3: 9.2 msec per loop

On my system just putting into a list is faster. I think this is
because you don't need to assign each line to the variable 'line' each
time in the former case.

I, too, think it's faster to just use list() instead of 'line for line
in iterable', as it seems kind of redundant.

$ python -m timeit -c 'import csv; data = list(csv.reader(open("some.csv",
"rb")))'
10000 loops, best of 3: 44 usec per loop
$ python -m timeit -c 'import csv; data = [row for row in
csv.reader(open("some.csv", "rb"))]'
10000 loops, best of 3: 37 usec per loop

I don't know why there seems to be a differece, but I know that list comps
are python are very heavily optimised.
 
M

Marc 'BlackJack' Rintsch

Note that every time you see [x for x in ...] with no condition, you
can
write list(...) instead - more clear, and faster.

data = list(csv.reader(open('some.csv', 'rb')))

Faster? No. List Comprehensions are faster.

kelvie@valour pdfps $ python -m timeit -c 'data = list(open("make.ps"))'
100 loops, best of 3: 7.5 msec per loop
kelvie@valour pdfps $ python -m timeit -c 'data = [line for line in
open("make.ps")]'
100 loops, best of 3: 9.2 msec per loop

On my system just putting into a list is faster. I think this is
because you don't need to assign each line to the variable 'line' each
time in the former case.

I, too, think it's faster to just use list() instead of 'line for line
in iterable', as it seems kind of redundant.

$ python -m timeit -c 'import csv; data = list(csv.reader(open("some.csv",
"rb")))'
10000 loops, best of 3: 44 usec per loop
$ python -m timeit -c 'import csv; data = [row for row in
csv.reader(open("some.csv", "rb"))]'
10000 loops, best of 3: 37 usec per loop

I don't know why there seems to be a differece, but I know that list comps
are python are very heavily optimised.

Does the machine use power saving features like SpeedStep or
something similar, i.e. runs the processor always with 100% speed or is it
dynamically stepped if there's load on the processor? Do both tests read
the data always from cache or has the very first loop had to fetch the CSV
file from disk?

$ python -m timeit -n 1000 -c 'import csv; data = [row for row in
csv.reader(open("test.csv", "rb"))]' 1000 loops, best of 3: 1.27 msec per
loop

$ python -m timeit -n 1000 -c 'import csv; data =
list(csv.reader(open("test.csv", "rb")))' 1000 loops, best of 3: 1.25 msec
per loop

Ciao,
Marc 'BlackJack' Rintsch
 
D

Daniel

$ python -m timeit -c 'import csv; data =
list(csv.reader(open("some.csv",
"rb")))'
10000 loops, best of 3: 44 usec per loop
$ python -m timeit -c 'import csv; data = [row for row in
csv.reader(open("some.csv", "rb"))]'
10000 loops, best of 3: 37 usec per loop

I don't know why there seems to be a differece, but I know that list
comps
are python are very heavily optimised.

Does the machine use power saving features like SpeedStep or
something similar, i.e. runs the processor always with 100% speed or is
it
dynamically stepped if there's load on the processor? Do both tests read
the data always from cache or has the very first loop had to fetch the
CSV
file from disk?

$ python -m timeit -n 1000 -c 'import csv; data = [row for row in
csv.reader(open("test.csv", "rb"))]' 1000 loops, best of 3: 1.27 msec per
loop

$ python -m timeit -n 1000 -c 'import csv; data =
list(csv.reader(open("test.csv", "rb")))' 1000 loops, best of 3: 1.25
msec
per loop

No SpeedStep - tried a few repeats just in case files were cached,
consistent 35usec for comp 40usec for list

Python 2.5.1 on Linux 1.2ghz

Even replacing the csv lookup with a straight variable declaration:
[range(10)*3], same results

Weird.

Python
 
K

Kelvie Wong

Hrm. Repeating the test several more times, it seems that the value
fluctuates, sometimes one's faster than the other, and sometimes
they're the same.

Perhaps the minute difference between the two is statistically
insignificant? Or perhaps the mechanism underlying both (i.e. the
implementation) is the same?

$ python -m timeit -c 'import csv; data =
list(csv.reader(open("some.csv",
"rb")))'
10000 loops, best of 3: 44 usec per loop
$ python -m timeit -c 'import csv; data = [row for row in
csv.reader(open("some.csv", "rb"))]'
10000 loops, best of 3: 37 usec per loop

I don't know why there seems to be a differece, but I know that list
comps
are python are very heavily optimised.

Does the machine use power saving features like SpeedStep or
something similar, i.e. runs the processor always with 100% speed or is
it
dynamically stepped if there's load on the processor? Do both tests read
the data always from cache or has the very first loop had to fetch the
CSV
file from disk?

$ python -m timeit -n 1000 -c 'import csv; data = [row for row in
csv.reader(open("test.csv", "rb"))]' 1000 loops, best of 3: 1.27 msec per
loop

$ python -m timeit -n 1000 -c 'import csv; data =
list(csv.reader(open("test.csv", "rb")))' 1000 loops, best of 3: 1.25
msec
per loop

No SpeedStep - tried a few repeats just in case files were cached,
consistent 35usec for comp 40usec for list

Python 2.5.1 on Linux 1.2ghz

Even replacing the csv lookup with a straight variable declaration:
[range(10)*3], same results

Weird.

Python
 
G

Gabriel Genellina

Note that every time you see [x for x in ...] with no condition, you
can
write list(...) instead - more clear, and faster.

data = list(csv.reader(open('some.csv', 'rb')))

Faster? No. List Comprehensions are faster.

On my system just putting into a list is faster. I think this is
because you don't need to assign each line to the variable 'line' each
time in the former case.

I don't know why there seems to be a differece, but I know that list
comps
are python are very heavily optimised.

In principle both ways have to create and populate a list, and a list
comprehension surely is better than a loop using append() - but it still
has to create and bind the intermediate variable on each iteration.
I think that testing with a csv file can't show the difference between
both ways of creating the list because of the high overhead due to csv
processing.
Using another example, with no I/O involved (a generator for the first
10000 fibonacci numbers):

C:\TEMP>python -m timeit -s "import fibo" "list(fibo.fibo())"
10 loops, best of 3: 39.4 msec per loop

C:\TEMP>python -m timeit -s "import fibo" "[x for x in fibo.fibo()]"
10 loops, best of 3: 40.7 msec per loop

(Generating less values shows larger differences - anyway they're not
terrific)

So, as always, one should measure in each specific case if optimization is
worth the pain - and if csv files are involved I'd say the critical points
are elsewhere, not on how one creates the list of rows.
 
A

Alex Popescu

So, as always, one should measure in each specific case if optimization is
worth the pain [...].

I hope I am somehow misreading the above sentence :). IMO synonim
language contructs
should result in the same performance or at least have clear/
documented performance.
I don't think we really want to see in code something like:

if threshold:
do_it_with_list_function
else:
do_it_with_list_comprehension

bests,

../alex
--
..w( the_mindstorm )p.


 
S

Steve Holden

Alex said:
So, as always, one should measure in each specific case if optimization is
worth the pain [...].

I hope I am somehow misreading the above sentence :). IMO synonim
language contructs
should result in the same performance or at least have clear/
documented performance.

That's a fine opinion, how would you enforce it? Should we go throught
he interpreter slowing down the faster to each pair of alternative
constructs? ;-) It's inevitable there'll be differences in execution
time between equivalent constructs, and in that case you have to test to
find the better in your specific situation.

The real issue here is that in 95% or more of the source of most
programs speed/performance isn't that much of an issue anyway.
I don't think we really want to see in code something like:

if threshold:
do_it_with_list_function
else:
do_it_with_list_comprehension
This would most certainly be a premature optimization which, as has been
repeated many times on this list, is the root of much evil in
programming. As Gabriel mentioned, you only need to do it if it's "worth
the pain", which in most case it won't be. It isn't worth spending even
five minutes to shave a minute off the performance of a ten-minute
program that is only run once a week, for example.

Ultimately we have to be pragmatic: circumstances alter cases, and it's
usually not worth spending the time to improve execution speed except
for the most critical parts (the innermost nested loops) of production
programs.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
--------------- Asciimercial ------------------
Get on the web: Blog, lens and tag the Internet
Many services currently offer free registration
----------- Thank You for Reading -------------
 
A

Alex Popescu

Alex said:
So, as always, one should measure in each specific case if optimization is
worth the pain [...].
I hope I am somehow misreading the above sentence :). IMO synonim
language contructs
should result in the same performance or at least have clear/
documented performance.

That's a fine opinion, how would you enforce it? Should we go throught
he interpreter slowing down the faster to each pair of alternative
constructs? ;-) It's inevitable there'll be differences in execution
time between equivalent constructs, and in that case you have to test to
find the better in your specific situation.

The real issue here is that in 95% or more of the source of most
programs speed/performance isn't that much of an issue anyway.
I don't think we really want to see in code something like:
if threshold:
do_it_with_list_function
else:
do_it_with_list_comprehension

This would most certainly be a premature optimization which, as has been
repeated many times on this list, is the root of much evil in
programming. As Gabriel mentioned, you only need to do it if it's "worth
the pain", which in most case it won't be. It isn't worth spending even
five minutes to shave a minute off the performance of a ten-minute
program that is only run once a week, for example.

Ultimately we have to be pragmatic: circumstances alter cases, and it's
usually not worth spending the time to improve execution speed except
for the most critical parts (the innermost nested loops) of production
programs.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
--------------- Asciimercial ------------------
Get on the web: Blog, lens and tag the Internet
Many services currently offer free registration
----------- Thank You for Reading -------------

Steve, I fully agree with you (I am a newbie only to Python and not to
programming ;-)).
My point was that this thread may be misleading to newbies, because it
is discussing
corner cases performance of the 2 equivalent language constructs,
while it should
most probably be about the fact that the 2 solutions are equivalent
and the only
difference is probably readability (or maybe something like: list
function is
prefered when there are no additional constraints on the list
comprehension construct).

bests,

../alex
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,770
Messages
2,569,583
Members
45,073
Latest member
DarinCeden

Latest Threads

Top