There must be a better way

Colin J. Williams · Apr 21, 2013

Below is part of a script which shows the changes made to permit the
script to run on either Python 2.7 or Python 3.2.

I was surprised to see that the CSV next method is no longer available.

Suggestions welcome.

Colin W.

def main():
    global inData, inFile
    if ver == '2':
        headerLine= inData.next()
    else: # Python version 3.3
        inFile.close()
        inFile= open('Don Wall April 18 2013.csv', 'r', newline= '')
        inData= csv.reader(inFile)
        headerLine= inData.__next__()

Chris Rebert · Apr 21, 2013

Below is part of a script which shows the changes made to permit the script
to run on either Python 2.7 or Python 3.2.

I was surprised to see that the CSV next method is no longer available.

Suggestions welcome.

if ver == '2':
    headerLine= inData.next()
else: # Python version 3.3

    headerLine= inData.__next__()

Use the built-in next() function
(http://docs.python.org/2/library/functions.html#next ) instead:
headerLine = next(iter(inData))

Cheers,
Chris

Steven D'Aprano · Apr 21, 2013

Below is part of a script which shows the changes made to permit the
script to run on either Python 2.7 or Python 3.2.

I was surprised to see that the CSV next method is no longer available.

This makes no sense. What's "the CSV next method"? Are you talking about
the csv module? It has no "next method".

py> import csv
py> csv.next
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'next'

Please *define your terms*, otherwise we are flailing in the dark trying
to guess what your code is supposed to do. The code you provide cannot
possibly work -- you use variables before they are defined, use other
variables that are never defined at all, reference mysterious globals.
You even close a file before it is opened!

Please read this:

http://sscce.org/

and provide a *short, self-contained, correct example* that we can
actually run.

But in the meantime, I'm going to consult the entrails and try to guess
what you are doing: you're complaining that iterators have a next method
in Python 2, and __next__ in Python 3. Am I correct?

If so, this is true, but you should not be using the plain next method in
Python 2. You should be using the built-in function next(), not calling
the method directly. The plain next *method* was a mistake, only left in
for compatibility with older versions of Python. Starting from Python 2.6
the correct way to get the next value from an arbitrary iterator is with
the built-in function next(), not by calling a method directly.

(In the same way that you get the length of a sequence by calling the
built-in function len(), not by calling the __len__ method directly.)

So provided you are using Python 2.6 or better, you call:

next(inData)

to get the next value, regardless of whether it is Python 2.x or 3.x.

If you need to support older versions, you can do this:

try:
    next # Does the built-in already exist?
except NameError:
    # No, we define our own.
    def next(iterator):
        return iterator.next()

then just use next(inData) as normal.

Tim Chase · Apr 21, 2013

This makes no sense. What's "the CSV next method"? Are you talking
about the csv module? It has no "next method".

In 2.x, the csv.reader() class (and csv.DictReader() class) offered
a .next() method that is absent in 3.x. For those who use(d) the
csv.reader object on a regular basis, this was a pretty common
usage. Particularly if you had to do your own header parsing:

f = open(...)
r = csv.reader(f)
try:
    headers = r.next()
    header_map = analyze(headers)
    for row in r:
        foo = row[header_map["FOO COLUMN"]]
        process(foo)
finally:
    f.close()

(I did this for a number of cases where the client couldn't send
column headers with consistent capitalization/spacing, so my
header-mapping function had to normalize the case and spaces and
then reference the normalized names.)

So provided you are using Python 2.6 or better, you call:

next(inData)

to get the next value, regardless of whether it is Python 2.x or
3.x.

If you need to support older versions, you can do this:

try:
    next # Does the built-in already exist?
except NameError:
    # No, we define our own.
    def next(iterator):
        return iterator.next()

then just use next(inData) as normal.

This is a good expansion of Chris Rebert's suggestion to use next(),
as those of us that have to support pre-2.6 code lack the next()
function out of the box.

-tkc

Terry Jan Reedy · Apr 21, 2013

In 2.x, the csv.reader() class (and csv.DictReader() class) offered
a .next() method that is absent in 3.x

In Py 3, .next was renamed to .__next__ for *all* iterators. The
intention is that one iterate with for item in iterable or use builtin
functions iter() and next().
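
To illustrate the renaming Terry describes, here is a minimal sketch of a hand-written iterator that should run on both Python 2.6+ and 3.x. The Countdown class is a hypothetical example, not part of the csv module. It defines __next__ and aliases next to it, so that for loops and the built-in next() find the method under either name.

```python
class Countdown(object):
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):  # the name Python 3 looks for
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__  # the name Python 2 looks for


it = Countdown(3)
first = next(it)  # built-in next(): same spelling on 2.6+ and 3.x
rest = list(it)   # the remaining items, consumed via the iterator protocol
print(first)
print(rest)
```

User code never needs to spell out either method name; only the class author does.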

Colin J. Williams · Apr 21, 2013

In Py 3, .next was renamed to .__next__ for *all* iterators. The
intention is that one iterate with for item in iterable or use builtin
functions iter() and next().

Thanks to Chris, Tim and Terry for their helpful comments.

I was seeking some code that would be acceptable to both Python 2.7 and 3.3.

In the end, I used:

inData= csv.reader(inFile)

def main():
    if ver == '2':
        headerLine= inData.next()
    else:
        headerLine= inData.__next__()
    ...
    for item in inData:
        assert len(dataStore) == len(item)
        j= findCardinal(item[10])
        ...

This is acceptable to both versions.

It is not usual to have a name with preceding and following
underscores in user code.

Presumably, there is a rationale for the change from csv.reader.next
to csv.reader.__next__.

If next is not acceptable for the version 3 csv.reader, perhaps __next__
could be added to the version 2 csv.reader, so that the same code can be
used in the two versions.

This would avoid the kluge I used above.

Colin W.

Jussi Piitulainen · Apr 21, 2013

Colin J. Williams writes:
....

It is not usual to have a name with preceding and following
underscores in user code.

Presumably, there is a rationale for the change from csv.reader.next
to csv.reader.__next__.

....

I think the user code is supposed to be next(csv.reader). For example,
current documentation contains the following.

# csvreader.__next__()
# Return the next row of the reader's iterable object as a list,
# parsed according to the current dialect. Usually you should call
# this as next(reader).

Peter Otten · Apr 21, 2013

Colin said:
I was seeking some code that would be acceptable to both Python 2.7 and
3.3.

In the end, I used:

inData= csv.reader(inFile)

def main():
    if ver == '2':
        headerLine= inData.next()
    else:
        headerLine= inData.__next__()
    ...

I think it was mentioned before, but to be explicit:

def main():
    headerLine = next(inData)
    ...

works in Python 2.6, 2.7, and 3.x.
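
A minimal sketch of how the whole header read could be made version-neutral, including the file-opening difference visible in Colin's first post (Python 3 wants text mode with newline='', while the Python 2 csv module expects binary mode). The helper names open_csv and read_header_and_rows, and the throwaway demo file, are assumptions made for illustration and do not come from the thread.

```python
import csv
import os
import sys
import tempfile


def open_csv(path):
    """Open `path` in the mode csv.reader expects on this Python version."""
    if sys.version_info[0] >= 3:
        # Python 3: text mode; newline='' lets csv handle line endings.
        return open(path, 'r', newline='')
    # Python 2: the csv module expects a file opened in binary mode.
    return open(path, 'rb')


def read_header_and_rows(path):
    """Return (header, rows) using the built-in next() on the reader."""
    in_file = open_csv(path)
    try:
        reader = csv.reader(in_file)
        header = next(reader)  # same call on 2.6, 2.7 and 3.x
        rows = list(reader)
    finally:
        in_file.close()
    return header, rows


# Demo with a temporary file holding hypothetical data.
fd, path = tempfile.mkstemp(suffix='.csv')
try:
    os.write(fd, b"name,score\r\nann,1\r\nbob,2\r\n")
    os.close(fd)
    header, rows = read_header_and_rows(path)
finally:
    os.remove(path)

print(header)
print(rows)
```

The version check is confined to the one place where the two versions really differ, so no ver == '2' branch is needed around the iteration itself.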

Colin J. Williams · Apr 21, 2013

Colin J. Williams writes:
...
...

I think the user code is supposed to be next(csv.reader). For example,
current documentation contains the following.

# csvreader.__next__()
# Return the next row of the reader's iterable object as a list,
# parsed according to the current dialect. Usually you should call
# this as next(reader).

Thanks,

This works with both 2.7 and 3.3

Colin W.

Colin J. Williams · Apr 21, 2013

I think it was mentioned before, but to be explicit:

def main():
    headerLine = next(inData)
    ...

works in Python 2.6, 2.7, and 3.x.

Yes, the penny dropped eventually. I've used your statement.

Chris's suggestion was slightly different:

Use the built-in next() function
(http://docs.python.org/2/library/functions.html#next ) instead:
headerLine = next(iter(inData))

Colin W.

Neil Cerutti · Apr 22, 2013

In Py 3, .next was renamed to .__next__ for *all* iterators. The
intention is that one iterate with for item in iterable or use builtin
functions iter() and next().

Thanks to Chris, Tim and Terry for their helpful comments.

I was seeking some code that would be acceptable to both Python 2.7 and 3.3.

In the end, I used:

inData= csv.reader(inFile)

def main():
    if ver == '2':
        headerLine= inData.next()
    else:
        headerLine= inData.__next__()
    ...
    for item in inData:
        assert len(dataStore) == len(item)
        j= findCardinal(item[10])
        ...

This is acceptable to both versions.

It is not usual to have a name with preceding and following
underscores in user code.

Presumably, there is a rationale for the change from csv.reader.next
to csv.reader.__next__.

If next is not acceptable for the version 3 csv.reader, perhaps __next__
could be added to the version 2 csv.reader, so that the same code can be
used in the two versions.

This would avoid the kluge I used above.

Would using csv.DictReader instead of csv.reader be an option?

Colin J. Williams · Apr 22, 2013

On 4/20/2013 8:34 PM, Tim Chase wrote:
In 2.x, the csv.reader() class (and csv.DictReader() class) offered
a .next() method that is absent in 3.x

In Py 3, .next was renamed to .__next__ for *all* iterators. The
intention is that one iterate with for item in iterable or use builtin
functions iter() and next().

Thanks to Chris, Tim and Terry for their helpful comments.

I was seeking some code that would be acceptable to both Python 2.7 and 3.3.

In the end, I used:

inData= csv.reader(inFile)

def main():
    if ver == '2':
        headerLine= inData.next()
    else:
        headerLine= inData.__next__()
    ...
    for item in inData:
        assert len(dataStore) == len(item)
        j= findCardinal(item[10])
        ...

This is acceptable to both versions.

It is not usual to have a name with preceding and following
underscores in user code.

Presumably, there is a rationale for the change from csv.reader.next
to csv.reader.__next__.

If next is not acceptable for the version 3 csv.reader, perhaps __next__
could be added to the version 2 csv.reader, so that the same code can be
used in the two versions.

This would avoid the kluge I used above.

Would using csv.DictReader instead of csv.reader be an option?

Since I'm only interested in one or two columns, the simpler approach is
probably better.

Colin W.

Neil Cerutti · Apr 23, 2013

Since I'm only interested in one or two columns, the simpler
approach is probably better.

Here's a sketch of how one of my projects handles that situation.
I think the index variables are invaluable documentation, and
make it a bit more robust. (Python 3, so not every bit is
relevant to you).

with open("today.csv", encoding='UTF-8', newline='') as today_file:
    reader = csv.reader(today_file)
    header = next(reader)
    majr_index = header.index('MAJR')
    div_index = header.index('DIV')
    for rec in reader:
        major = rec[majr_index]
        rec[div_index] = DIVISION_TABLE[major]

But a csv.DictReader might still be more efficient. I never
tested. This is the only place I've used this "optimization".
It's fast enough.

Oscar Benjamin · Apr 23, 2013

Here's a sketch of how one of my projects handles that situation.
I think the index variables are invaluable documentation, and
make it a bit more robust. (Python 3, so not every bit is
relevant to you).

with open("today.csv", encoding='UTF-8', newline='') as today_file:
    reader = csv.reader(today_file)
    header = next(reader)

I once had a bug that took a long time to track down and was caused by
using next() without an enclosing try/except StopIteration (or the
optional default argument to next).

This is a sketch of how you can get the bug that I had:

$ cat next.py
#!/usr/bin/env python

def join(iterables):
    '''Join iterable of iterables, stripping first item'''
    for iterable in iterables:
        iterator = iter(iterable)
        header = next(iterator) # Here's the problem
        for val in iterator:
            yield val

data = [
    ['foo', 1, 2, 3],
    ['bar', 4, 5, 6],
    [], # Whoops! Who put this empty iterable here?
    ['baz', 7, 8, 9],
]

for x in join(data):
    print(x)

$ ./next.py
1
2
3
4
5
6

The values 7, 8 and 9 are not printed but no error message is shown.
This is because calling next on the iterator over the empty list
raises a StopIteration that is not caught in the join generator. The
StopIteration is then "caught" by the for loop that iterates over
join() causing the loop to terminate prematurely. Since the exception
is caught and cleared by the for loop there's no practical way to get
a debugger to hook into the event that causes it.

In my case this happened somewhere in the middle of a long running
process. It was difficult to pin down what was causing this as the
iteration was over non-constant data and I didn't know what I was
looking for. As a result of the time spent fixing this, I'm now always
cautious when calling next() and stop to think about what a
StopIteration would do in context.

In this case a StopIteration is raised when reading from an empty csv file:
...     reader = csv.reader(csvfile)
...     header = next(reader)
...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
StopIteration

If that code were called from a generator then it would most likely be
susceptible to the problem I'm describing. The fix is to use
next(reader, None) or try/except StopIteration.

Oscar
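
A minimal sketch of the fix Oscar describes, applied to his join() example. It uses the optional default argument of next() with a private sentinel object, so a first item that happens to be None is not mistaken for "empty". Skipping the empty iterable is an assumption made here; raising a descriptive error at that point would be an equally reasonable choice. Note also that on Python 3.7 and later, PEP 479 turns a StopIteration escaping a generator body into a RuntimeError, so the original bug is no longer silent there.

```python
def join(iterables):
    """Join an iterable of iterables, dropping each one's first item."""
    missing = object()  # sentinel: cannot collide with real data
    for iterable in iterables:
        iterator = iter(iterable)
        header = next(iterator, missing)  # never raises StopIteration
        if header is missing:
            continue  # empty iterable: skip it (assumed policy)
        for val in iterator:
            yield val


data = [
    ['foo', 1, 2, 3],
    ['bar', 4, 5, 6],
    [],  # the empty iterable that used to end the loop early
    ['baz', 7, 8, 9],
]

result = list(join(data))
print(result)
```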

Tim Chase · Apr 23, 2013

Since I'm only interested in one or two columns, the simpler
approach is probably better.

Here's a sketch of how one of my projects handles that situation.
I think the index variables are invaluable documentation, and
make it a bit more robust. (Python 3, so not every bit is
relevant to you).

with open("today.csv", encoding='UTF-8', newline='') as today_file:
    reader = csv.reader(today_file)
    header = next(reader)
    majr_index = header.index('MAJR')
    div_index = header.index('DIV')
    for rec in reader:
        major = rec[majr_index]
        rec[div_index] = DIVISION_TABLE[major]

But a csv.DictReader might still be more efficient. I never
tested. This is the only place I've used this "optimization".
It's fast enough.

I believe the csv module does all the work at c-level, rather than
as pure Python, so it should be notably faster. The only times I've
had to do things by hand like that are when there are header
peculiarities that I can't control, such as mismatched case or
added/removed punctuation (client files are notorious for this). So I
often end up doing something like

def normalize(header):
    return header.strip().upper() # other cleanup as needed

reader = csv.reader(f)
headers = next(reader)
header_map = dict(
    (normalize(header), i)
    for i, header
    in enumerate(headers)
    )
item = lambda col: row[header_map[col]].strip()
for row in reader:
    major = item("MAJR").upper()
    division = item("DIV")
    # ...

The function call might add overhead, in which case one could
just use explicit indirect indexing for each value assignment:

major = row[header_map["MAJR"]].strip().upper()

but I usually find that processing CSV files leaves me I/O bound
rather than CPU bound.

-tkc

Skip Montanaro · Apr 23, 2013

But a csv.DictReader might still be more efficient.

Depends on what efficiency you care about. The DictReader class is
implemented in Python, and builds a dict for every row. It will never
be more efficient CPU-wise than instantiating the csv.reader type
directly and only doing what you need.

OTOH, the DictReader class "just works" and its usage is more obvious
when you come back later to modify your code. It also makes the code
insensitive to column ordering (though yours seems to be as well, if
I'm reading it correctly). On the programmer efficiency axis, I score
the DictReader class higher than the reader type.

A simple test:

##########################
import csv
from timeit import Timer

setup = '''import csv
lst = ["""a,b,c,d,e,f,g"""]
lst.extend(["""05:38:24,0.6326,1,0,1.0,0.0,0.0"""] * 1000000)
reader = csv.reader(lst)
dreader = csv.DictReader(lst)
'''

t1 = Timer("for row in reader: pass", setup)
t2 = Timer("for row in dreader: pass", setup)

print(min(t1.repeat(number=10)))
print(min(t2.repeat(number=10)))
###############################

demonstrates that the raw reader is, indeed, much faster than the DictReader:

0.972723007202
8.29047989845

but that's for the basic iteration. Whatever you need to add to the
raw reader to insulate yourself from changes to the structure of the
CSV file and improve readability will slow it down, while the
DictReader will never be worse than the above.

Skip
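
For comparison with the index-variable approach earlier in the thread, here is a minimal sketch of the same lookup written with csv.DictReader, which consumes the header row itself and so needs no explicit next() call. The sample lines and the contents of DIVISION_TABLE are hypothetical; only the column names MAJR and DIV come from Neil's example.

```python
import csv

# Hypothetical sample data; csv.DictReader accepts any iterable of lines.
lines = [
    "MAJR,DIV,NAME",
    "MATH,,Ada",
    "HIST,,Bede",
]

# Hypothetical lookup table standing in for Neil's DIVISION_TABLE.
DIVISION_TABLE = {"MATH": "Science", "HIST": "Humanities"}

divisions = []
for rec in csv.DictReader(lines):
    # Columns are addressed by header name, so column order does not matter.
    rec["DIV"] = DIVISION_TABLE[rec["MAJR"]]
    divisions.append((rec["NAME"], rec["DIV"]))

print(divisions)
```

This trades the per-row dict construction that Skip measured for code that reads the same way regardless of where the columns sit in the file.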

Tim Chase · Apr 23, 2013

I believe the csv module does all the work at c-level, rather than
as pure Python, so it should be notably faster.

A little digging shows that csv.DictReader is pure Python, using the
underlying _csv.reader which is written in C for speed.

-tkc
