There must be a better way

Colin J. Williams

Below is part of a script which shows the changes made to permit the
script to run on either Python 2.7 or Python 3.2.

I was surprised to see that the CSV next method is no longer available.

Suggestions welcome.

Colin W.


def main():
    global inData, inFile
    if ver == '2':
        headerLine= inData.next()
    else: # Python version 3.3
        inFile.close()
        inFile= open('Don Wall April 18 2013.csv', 'r', newline= '')
        inData= csv.reader(inFile)
        headerLine= inData.__next__()
 
Chris Rebert

> Below is part of a script which shows the changes made to permit the script
> to run on either Python 2.7 or Python 3.2.
>
> I was surprised to see that the CSV next method is no longer available.
>
> Suggestions welcome.
> if ver == '2':
>     headerLine= inData.next()
> else: # Python version 3.3
>     headerLine= inData.__next__()

Use the built-in next() function
(http://docs.python.org/2/library/functions.html#next ) instead:
headerLine = next(iter(inData))
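A minimal sketch of that suggestion, targeting Python 3 and assuming an in-memory file: the io.StringIO contents and column names are made up for illustration. A csv.reader object is its own iterator, so next(inData) and next(iter(inData)) do the same thing here.

```python
import csv
import io

# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO("name,amount\nalice,10\nbob,20\n")

inData = csv.reader(sample)
headerLine = next(inData)   # built-in next(): no .next() or .__next__() call
rows = list(inData)         # the remaining rows; the header is already consumed

print(headerLine)  # ['name', 'amount']
print(rows)        # [['alice', '10'], ['bob', '20']]
```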

Cheers,
Chris
 
Steven D'Aprano

> Below is part of a script which shows the changes made to permit the
> script to run on either Python 2.7 or Python 3.2.
>
> I was surprised to see that the CSV next method is no longer available.

This makes no sense. What's "the CSV next method"? Are you talking about
the csv module? It has no "next method".

py> import csv
py> csv.next
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'next'


Please *define your terms*, otherwise we are flailing in the dark trying
to guess what your code is supposed to do. The code you provide cannot
possibly work -- you use variables before they are defined, use other
variables that are never defined at all, and reference mysterious globals.
You even close a file before it is opened!

Please read this:

http://sscce.org/

and provide a *short, self-contained, correct example* that we can
actually run.

But in the meantime, I'm going to consult the entrails and try to guess
what you are doing: you're complaining that iterators have a next method
in Python 2, and __next__ in Python 3. Am I correct?

If so, this is true, but you should not be using the plain next method in
Python 2. You should be using the built-in function next(), not calling
the method directly. The plain next *method* was a mistake, only left in
for compatibility with older versions of Python. Starting from Python 2.6
the correct way to get the next value from an arbitrary iterator is with
the built-in function next(), not by calling a method directly.

(In the same way that you get the length of a sequence by calling the
built-in function len(), not by calling the __len__ method directly.)

So provided you are using Python 2.6 or better, you call:

next(inData)

to get the next value, regardless of whether it is Python 2.x or 3.x.

If you need to support older versions, you can do this:

try:
    next  # Does the built-in already exist?
except NameError:
    # No, we define our own.
    def next(iterator):
        return iterator.next()

then just use next(inData) as normal.
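The fallback above takes only the iterator, while the built-in next() also accepts an optional default that is returned instead of raising StopIteration. A hedged sketch of a fallback with that fuller signature follows; compat_next and OldStyleIterator are hypothetical names, and the lookup of __next__ is there only so the sketch can also be run on Python 3.

```python
_MISSING = object()  # sentinel meaning "the caller supplied no default"


def compat_next(iterator, default=_MISSING):
    """Hypothetical stand-in for the built-in next(iterator[, default]).

    Prefers the old Python 2 style .next() method and falls back to
    __next__, so the sketch also runs on Python 3.
    """
    method = getattr(iterator, "next", None) or iterator.__next__
    try:
        return method()
    except StopIteration:
        if default is _MISSING:
            raise          # no default: behave like the built-in
        return default


class OldStyleIterator(object):
    """Toy iterator exposing only a Python 2 style .next() method."""

    def __init__(self, items):
        self._items = list(items)

    def next(self):
        if not self._items:
            raise StopIteration
        return self._items.pop(0)


old = OldStyleIterator(["header", "row1"])
print(compat_next(old))        # header
print(compat_next(old))        # row1
print(compat_next(old, None))  # None
```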
 
Tim Chase

> This makes no sense. What's "the CSV next method"? Are you talking
> about the csv module? It has no "next method".

In 2.x, the csv.reader() class (and csv.DictReader() class) offered
a .next() method that is absent in 3.x. For those who use(d) the
csv.reader object on a regular basis, this was a pretty common
usage, particularly if you had to do your own header parsing:

f = open(...)
r = csv.reader(f)
try:
    headers = r.next()
    header_map = analyze(headers)
    for row in r:
        foo = row[header_map["FOO COLUMN"]]
        process(foo)
finally:
    f.close()

(I did this for a number of cases where the client couldn't
send column headers with consistent capitalization/spacing,
so my header-making function had to normalize the case/spaces
and then reference the normalized names.)
> So provided you are using Python 2.6 or better, you call:
>
> next(inData)
>
> to get the next value, regardless of whether it is Python 2.x or
> 3.x.
>
> If you need to support older versions, you can do this:
>
> try:
>     next  # Does the built-in already exist?
> except NameError:
>     # No, we define our own.
>     def next(iterator):
>         return iterator.next()
>
> then just use next(inData) as normal.

This is a good expansion of Chris Rebert's suggestion to use next(),
as those of us that have to support pre-2.6 code lack the next()
function out of the box.

-tkc
 
Terry Jan Reedy

> In 2.x, the csv.reader() class (and csv.DictReader() class) offered
> a .next() method that is absent in 3.x.

In Py 3, .next was renamed to .__next__ for *all* iterators. The
intention is that one iterate with for item in iterable or use builtin
functions iter() and next().
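Roughly what "for item in iterable" does with those two builtins, written out by hand. This is a simplified model, not how CPython actually implements the loop; manual_for is a hypothetical helper.

```python
def manual_for(iterable, body):
    """Apply body to each item, like `for item in iterable: body(item)`."""
    iterator = iter(iterable)        # ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)    # built-in next() calls __next__ for us
        except StopIteration:
            break                    # exhaustion ends the loop silently
        body(item)


collected = []
manual_for(["a", "b", "c"], collected.append)
print(collected)  # ['a', 'b', 'c']
```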
 
Colin J. Williams

> In Py 3, .next was renamed to .__next__ for *all* iterators. The
> intention is that one iterate with for item in iterable or use builtin
> functions iter() and next().
Thanks to Chris, Tim and Terry for their helpful comments.

I was seeking some code that would be acceptable to both Python 2.7 and 3.3.

In the end, I used:

inData= csv.reader(inFile)

def main():
    if ver == '2':
        headerLine= inData.next()
    else:
        headerLine= inData.__next__()
    ...
    for item in inData:
        assert len(dataStore) == len(item)
        j= findCardinal(item[10])
        ...

This is acceptable to both versions.

It is not usual to have a name with preceding and following
underscores in user code.

Presumably, there is a rationale for the change from csv.reader.next
to csv.reader.__next__.

If next is not acceptable for the version 3 csv.reader, perhaps __next__
could be added to the version 2 csv.reader, so that the same code can be
used in the two versions.

This would avoid the kluge I used above.
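The reader type itself can't be changed from user code, but for iterator classes you write yourself, a common way to support both 2.6/2.7 and 3.x is to implement __next__ and alias next to it, so callers only ever use the built-in next(). RowCounter is a made-up example of that pattern.

```python
class RowCounter(object):
    """Toy iterator yielding 1..limit on both 2.6+ and 3.x."""

    def __init__(self, limit):
        self.current = 0
        self.limit = limit

    def __iter__(self):
        return self

    def __next__(self):              # the name Python 3 looks up
        if self.current >= self.limit:
            raise StopIteration
        self.current += 1
        return self.current

    next = __next__                  # the name Python 2 looks up


counter = RowCounter(3)
first = next(counter)                # the same call on either major version
rest = list(counter)
print(first, rest)                   # 1 [2, 3]
```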

Colin W.
 
Jussi Piitulainen

Colin J. Williams writes:
...
> It is not usual to have a name with preceding and following
> underscores in user code.
>
> Presumably, there is a rationale for the change from csv.reader.next
> to csv.reader.__next__.
...

I think the user code is supposed to be next(reader), where reader is a
csv.reader object. For example, current documentation contains the
following.

# csvreader.__next__()
# Return the next row of the reader’s iterable object as a list,
# parsed according to the current dialect. Usually you should call
# this as next(reader).
 
Peter Otten

Colin said:
> I was seeking some code that would be acceptable to both Python 2.7 and
> 3.3.
>
> In the end, I used:
>
> inData= csv.reader(inFile)
>
> def main():
>     if ver == '2':
>         headerLine= inData.next()
>     else:
>         headerLine= inData.__next__()
>     ...

I think it was mentioned before, but to be explicit:

def main():
    headerLine = next(inData)
    ...

works in Python 2.6, 2.7, and 3.x.
 
Colin J. Williams

> Colin J. Williams writes:
> ...
> ...
>
> I think the user code is supposed to be next(reader), where reader is a
> csv.reader object. For example, current documentation contains the
> following.
>
> # csvreader.__next__()
> # Return the next row of the reader’s iterable object as a list,
> # parsed according to the current dialect. Usually you should call
> # this as next(reader).
Thanks,

This works with both 2.7 and 3.3.

Colin W.
 
Neil Cerutti

> If next is not acceptable for the version 3 csv.reader, perhaps __next__
> could be added to the version 2 csv.reader, so that the same code can be
> used in the two versions.
>
> This would avoid the kluge I used above.

Would using csv.DictReader instead of a csv.reader be an option?
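A minimal Python 3 sketch of the DictReader route, with invented column names: the header row is consumed by the reader itself, so neither .next() nor next() appears in user code, and columns are looked up by name.

```python
import csv
import io

# Hypothetical file contents; only one or two columns are of interest.
sample = io.StringIO("ID,MAJR,DIV\n1,MATH,SCI\n2,HIST,HUM\n")

reader = csv.DictReader(sample)   # reads the header row on first use
majors = [row["MAJR"] for row in reader]

print(reader.fieldnames)  # ['ID', 'MAJR', 'DIV']
print(majors)             # ['MATH', 'HIST']
```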
 
Colin J. Williams

> On 4/20/2013 8:34 PM, Tim Chase wrote:
>> In 2.x, the csv.reader() class (and csv.DictReader() class) offered
>> a .next() method that is absent in 3.x.
>
> In Py 3, .next was renamed to .__next__ for *all* iterators. The
> intention is that one iterate with for item in iterable or use builtin
> functions iter() and next().
> Would using csv.DictReader instead of a csv.reader be an option?

Since I'm only interested in one or two columns, the simpler approach is
probably better.

Colin W.
 
Neil Cerutti

> Since I'm only interested in one or two columns, the simpler
> approach is probably better.

Here's a sketch of how one of my projects handles that situation.
I think the index variables are invaluable documentation, and
make it a bit more robust. (Python 3, so not every bit is
relevant to you).

with open("today.csv", encoding='UTF-8', newline='') as today_file:
    reader = csv.reader(today_file)
    header = next(reader)
    majr_index = header.index('MAJR')
    div_index = header.index('DIV')
    for rec in reader:
        major = rec[majr_index]
        rec[div_index] = DIVISION_TABLE[major]

But a csv.DictReader might still be more efficient. I never
tested. This is the only place I've used this "optimization".
It's fast enough. ;)
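A runnable version of that sketch, under stated assumptions: DIVISION_TABLE and the file contents are invented, io.StringIO replaces today.csv, and the updated rows are written out with csv.writer, which the original loop leaves unspecified.

```python
import csv
import io

# Hypothetical lookup table and input data standing in for the real ones.
DIVISION_TABLE = {"MATH": "SCI", "HIST": "HUM"}
today_file = io.StringIO("ID,MAJR,DIV\n1,MATH,\n2,HIST,\n")

reader = csv.reader(today_file)
header = next(reader)
majr_index = header.index("MAJR")   # ValueError if the column is missing
div_index = header.index("DIV")

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
for rec in reader:
    rec[div_index] = DIVISION_TABLE[rec[majr_index]]
    writer.writerow(rec)

print(out.getvalue())
```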
 
Oscar Benjamin

> Here's a sketch of how one of my projects handles that situation.
> I think the index variables are invaluable documentation, and
> make it a bit more robust. (Python 3, so not every bit is
> relevant to you).
>
> with open("today.csv", encoding='UTF-8', newline='') as today_file:
>     reader = csv.reader(today_file)
>     header = next(reader)

I once had a bug that took a long time to track down and was caused by
using next() without an enclosing try/except StopIteration (or the
optional default argument to next).

This is a sketch of how you can get the bug that I had:

$ cat next.py
#!/usr/bin/env python

def join(iterables):
    '''Join iterable of iterables, stripping first item'''
    for iterable in iterables:
        iterator = iter(iterable)
        header = next(iterator) # Here's the problem
        for val in iterator:
            yield val

data = [
    ['foo', 1, 2, 3],
    ['bar', 4, 5, 6],
    [], # Whoops! Who put this empty iterable here?
    ['baz', 7, 8, 9],
]

for x in join(data):
    print(x)

$ ./next.py
1
2
3
4
5
6

The values 7, 8 and 9 are not printed but no error message is shown.
This is because calling next on the iterator over the empty list
raises a StopIteration that is not caught in the join generator. The
StopIteration is then "caught" by the for loop that iterates over
join() causing the loop to terminate prematurely. Since the exception
is caught and cleared by the for loop there's no practical way to get
a debugger to hook into the event that causes it.
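One way to make join() robust is to catch the StopIteration right where next() is called. Skipping an empty iterable, as below, is an assumption about the desired behavior; raising a descriptive exception would be equally reasonable. Note also that since Python 3.7 (PEP 479) a StopIteration escaping a generator body is converted to RuntimeError, so the original script now fails loudly instead of silently stopping after 6.

```python
def join(iterables):
    '''Join iterable of iterables, stripping first item'''
    for iterable in iterables:
        iterator = iter(iterable)
        try:
            next(iterator)           # discard the header item
        except StopIteration:
            continue                 # empty iterable: nothing to strip or yield
        for val in iterator:
            yield val


data = [
    ['foo', 1, 2, 3],
    ['bar', 4, 5, 6],
    [],
    ['baz', 7, 8, 9],
]

print(list(join(data)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```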

In my case this happened somewhere in the middle of a long running
process. It was difficult to pin down what was causing this as the
iteration was over non-constant data and I didn't know what I was
looking for. As a result of the time spent fixing this, I'm always very
cautious when calling next(), and stop to think about what a StopIteration
would do in that context.

In this case a StopIteration is raised when reading from an empty csv file:
...     reader = csv.reader(csvfile)
...     header = next(reader)
...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
StopIteration

If that code were called from a generator then it would most likely be
susceptible to the problem I'm describing. The fix is to use
next(reader, None) or try/except StopIteration.
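A small sketch of the default-argument fix applied to the header read; read_header is a hypothetical helper, and returning None for an empty file is just one possible policy.

```python
import csv
import io


def read_header(csvfile):
    """Return (header, reader); header is None when the file is empty."""
    reader = csv.reader(csvfile)
    header = next(reader, None)   # the default replaces the StopIteration
    return header, reader


header, reader = read_header(io.StringIO(""))
print(header)                     # None

header, reader = read_header(io.StringIO("a,b\n1,2\n"))
print(header, list(reader))       # ['a', 'b'] [['1', '2']]
```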


Oscar
 
Tim Chase

>> Since I'm only interested in one or two columns, the simpler
>> approach is probably better.
>
> Here's a sketch of how one of my projects handles that situation.
> I think the index variables are invaluable documentation, and
> make it a bit more robust. (Python 3, so not every bit is
> relevant to you).
>
> with open("today.csv", encoding='UTF-8', newline='') as today_file:
>     reader = csv.reader(today_file)
>     header = next(reader)
>     majr_index = header.index('MAJR')
>     div_index = header.index('DIV')
>     for rec in reader:
>         major = rec[majr_index]
>         rec[div_index] = DIVISION_TABLE[major]
>
> But a csv.DictReader might still be more efficient. I never
> tested. This is the only place I've used this "optimization".
> It's fast enough. ;)

I believe the csv module does all the work at the C level, rather than
in pure Python, so it should be notably faster. The only times I've
had to do things by hand like that are when there are header
peculiarities that I can't control, such as mismatched case or
added/removed punctuation (client files are notorious for this). So I
often end up doing something like

def normalize(header):
    return header.strip().upper() # other cleanup as needed

reader = csv.reader(f)
headers = next(reader)
header_map = dict(
    (normalize(header), i)
    for i, header
    in enumerate(headers)
    )
item = lambda col: row[header_map[col]].strip()
for row in reader:
    major = item("MAJR").upper()
    division = item("DIV")
    # ...

The function calls might add overhead, in which case one could
just use explicit indirect indexing for each value assignment:

major = row[header_map["MAJR"]].strip().upper()

but I usually find that processing CSV files leaves me I/O bound
rather than CPU bound.
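A self-contained Python 3 sketch of the normalization approach with explicit indirect indexing. The sample file contents are invented, and a dict comprehension (available from 2.7) stands in for the dict() call used above.

```python
import csv
import io


def normalize(header):
    return header.strip().upper()  # other cleanup as needed


# Hypothetical client file with inconsistent header case and spacing.
f = io.StringIO(" majr ,Div\n math , Sci \n")

reader = csv.reader(f)
headers = next(reader)
header_map = {normalize(h): i for i, h in enumerate(headers)}

results = []
for row in reader:
    major = row[header_map["MAJR"]].strip().upper()
    division = row[header_map["DIV"]].strip()
    results.append((major, division))

print(header_map)  # {'MAJR': 0, 'DIV': 1}
print(results)     # [('MATH', 'Sci')]
```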

-tkc
 
Skip Montanaro

> But a csv.DictReader might still be more efficient.

Depends on what efficiency you care about. The DictReader class is
implemented in Python, and builds a dict for every row. It will never
be more efficient CPU-wise than instantiating the csv.reader type
directly and only doing what you need.

OTOH, the DictReader class "just works" and its usage is more obvious
when you come back later to modify your code. It also makes the code
insensitive to column ordering (though yours seems to be as well, if
I'm reading it correctly). On the programmer efficiency axis, I score
the DictReader class higher than the reader type.

A simple test:

##########################
import csv
from timeit import Timer

setup = '''import csv
lst = ["""a,b,c,d,e,f,g"""]
lst.extend(["""05:38:24,0.6326,1,0,1.0,0.0,0.0"""] * 1000000)
reader = csv.reader(lst)
dreader = csv.DictReader(lst)
'''

t1 = Timer("for row in reader: pass", setup)
t2 = Timer("for row in dreader: pass", setup)

print(min(t1.repeat(number=10)))
print(min(t2.repeat(number=10)))
###############################

demonstrates that the raw reader is, indeed, much faster than the DictReader:

0.972723007202
8.29047989845

but that's for the basic iteration. Whatever you need to add to the
raw reader to insulate yourself from changes to the structure of the
CSV file and improve readability will slow it down, while the
DictReader will never be worse than the above.
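One caveat about the test above: timeit runs setup once per repeat() call, so only the first of the ten executions in each repeat walks through the million rows; the other nine loop over an already exhausted reader. Both timers share that quirk, so the comparison still holds, but a variant that builds a fresh reader inside every timed call measures exactly one full pass each. The row count is reduced to keep it quick, and no timings are shown since they vary by machine.

```python
import csv
from timeit import Timer

# Same row layout as the test above, but 100,000 rows instead of 1,000,000.
lst = ["a,b,c,d,e,f,g"]
lst.extend(["05:38:24,0.6326,1,0,1.0,0.0,0.0"] * 100000)


def raw_pass():
    """One full pass over a freshly created csv.reader (header row included)."""
    return sum(1 for _ in csv.reader(lst))


def dict_pass():
    """One full pass over a freshly created csv.DictReader (header consumed)."""
    return sum(1 for _ in csv.DictReader(lst))


if __name__ == "__main__":
    # number=1: each timed execution performs exactly one complete pass.
    print(min(Timer(raw_pass).repeat(repeat=3, number=1)))
    print(min(Timer(dict_pass).repeat(repeat=3, number=1)))
```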

Skip
 
Tim Chase

> I believe the csv module does all the work at the C level, rather than
> in pure Python, so it should be notably faster.

A little digging shows that csv.DictReader is pure Python, using the
underlying _csv.reader which is written in C for speed.

-tkc
 
