trouble writing results to files

lisa.engblom · Nov 29, 2006

I have two semi-related questions...

First, I am trying to output a list of strings to a csv file using the
csv module. The output file separates each letter of the string with a
comma and then puts each string on a separate line. So the code is:

import csv
output = csv.writer(open('/Python25/working/output.csv', 'a'))
a = ["apple", "cranberry", "tart"]
for elem in range(len(a)):
    output.writerow(a[elem])

... and it writes this to the file:
a,p,p,l,e
c,r,a,n,b,e,r,r,y
t,a,r,t

How do I get it to write "apple", "cranberry", "tart" ?

Second, there is a significant delay (5-10 minutes) between when the
program finishes running and when the text actually appears in the
file. Any ideas for why this happens? It is the same for writing with
the csv module or the standard way.

thanks!
Lisa

Roberto Bonvallet · Nov 29, 2006

import csv
output = csv.writer(open('/Python25/working/output.csv', 'a'))
a = ["apple", "cranberry", "tart"]
for elem in range(len(a)):
    output.writerow(a[elem])

output.writerow expects a sequence as an argument. You are passing a
string, which is a sequence of characters. By the way, what output are you
expecting to get? Do you want a file with only one line (apple,
cranberry, tart), or each fruit on a different line?

BTW, iterating over range(len(a)) is an anti-pattern in Python. You should
do it like this:

for item in a:
    output.writerow([item])
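
To make the difference concrete, here is a minimal sketch. It is written for Python 3 and writes to an in-memory io.StringIO buffer instead of the real output file; both choices are assumptions made for illustration, and rows_to_text is a hypothetical helper name. It compares what csv.writer produces when writerow is given a bare string, the whole list, or a one-element list.

```python
import csv
import io

a = ["apple", "cranberry", "tart"]

def rows_to_text(rows):
    """Write each row with csv.writer and return the resulting text."""
    buf = io.StringIO()  # in-memory stand-in for the output file (assumption)
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()

# A bare string is a sequence of characters, so each letter becomes a field.
per_letter = rows_to_text(a)

# Passing the whole list as one row writes all the strings on one line.
one_line = rows_to_text([a])

# Wrapping each string in its own list writes one string per line.
one_per_line = rows_to_text([[item] for item in a])

print(per_letter)
print(one_line)
print(one_per_line)
```

The first call reproduces the letter-by-letter output from the question; the other two give the one-line and one-per-line layouts.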

Second, there is a significant delay (5-10 minutes) between when the
program finishes running and when the text actually appears in the
file.

Try closing the file explicitly.
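
One way to do that is sketched below, assuming Python 3, where a with block closes the file as soon as the block ends (on Python 2.5 the with statement needs "from __future__ import with_statement", or you can call close() on the file object yourself). The temporary directory is a hypothetical stand-in for the path used in the question.

```python
import csv
import os
import tempfile

a = ["apple", "cranberry", "tart"]

# Hypothetical output location; the question used '/Python25/working/output.csv'.
path = os.path.join(tempfile.mkdtemp(), "output.csv")

# Leaving the with block closes the file, which flushes buffered rows to disk.
with open(path, "a", newline="") as f:
    csv.writer(f).writerow(a)

# The row can be read back immediately after the block has finished.
with open(path, newline="") as f:
    contents = f.read()

print(repr(contents))
```

Keeping a reference to the file object, rather than passing open(...) straight into csv.writer, is what makes the explicit close possible.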
Cheers,

Neil Cerutti · Nov 29, 2006

BTW, iterating over range(len(a)) is an anti-pattern in Python.

Unless you're modifying elements of a, surely?

Roberto Bonvallet · Nov 29, 2006

Neil said:
Unless you're modifying elements of a, surely?

enumerate is your friend

for n, item in enumerate(a):
    if f(item):
        a[n] = whatever

Neil Cerutti · Nov 29, 2006

Roberto said:
Neil said:
Unless you're modifying elements of a, surely?

enumerate is your friend

for n, item in enumerate(a):
    if f(item):
        a[n] = whatever

I was going to bring it up but I had a brainfart about the order
of (n, item) in the tuple and was too lazy to look it up. ;-)

Fredrik Lundh · Nov 29, 2006

Neil said:
Unless you're modifying elements of a, surely?

and needs to run on a Python version that doesn't support enumerate.

</F>

lisa.engblom · Nov 29, 2006

Roberto said:
import csv
output = csv.writer(open('/Python25/working/output.csv', 'a'))
a = ["apple", "cranberry", "tart"]
for elem in range(len(a)):
    output.writerow(a[elem])

output.writerow expects a sequence as an argument. You are passing a
string, which is a sequence of characters. By the way, what output are you
expecting to get? Do you want a file with only one line (apple,
cranberry, tart), or each fruit on a different line?

I want it to print everything on one line and then create a new line
where it will print some more stuff. In my real program I am iterating
and it will eventually print the list a couple hundred times. But it
would be useful to understand how to tell it to do either.

BTW, iterating over range(len(a)) is an anti-pattern in Python. You should
do it like this:

for item in a:
    output.writerow([item])

I can try that. Is using range(len(a)) a bad solution in the sense
that it's likely to create an unexpected error? Or because there is a
more efficient way to accomplish the same thing?

thanks!
Lisa

Fredrik Lundh · Nov 29, 2006

I can try that. Is using range(len(a)) a bad solution in the sense
that it's likely to create an unexpected error? Or because there is a
more efficient way to accomplish the same thing?

for-in uses an internal index counter to fetch items from the sequence, so

for item in seq:
    function(item)

is simply a shorter and more efficient way to write

for index in range(len(seq)):
    function(seq[index])
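
A small runnable check of that equivalence, written for Python 3 (an assumption; the thread predates it). Here seq and function are hypothetical stand-ins for real data and real per-item work.

```python
seq = ["apple", "cranberry", "tart"]

def function(item):
    """Hypothetical stand-in for whatever work is done per item."""
    return item.upper()

# Direct iteration: the loop hands over each item in turn.
direct = []
for item in seq:
    direct.append(function(item))

# Index-based iteration: same result, with an extra lookup on every step.
indexed = []
for index in range(len(seq)):
    indexed.append(function(seq[index]))

print(direct == indexed)
```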

also see this article:

http://online.effbot.org/2006_11_01_archive.htm#for

</F>

Dennis Lee Bieber · Nov 29, 2006

Roberto said:
import csv
output = csv.writer(open('/Python25/working/output.csv', 'a'))
a = ["apple", "cranberry", "tart"]
for elem in range(len(a)):
    output.writerow(a[elem])

lisa.engblom said:
I want it to print everything on one line and then create a new line

output.writerow(a)
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/

Steven D'Aprano · Nov 29, 2006

and needs to run on a Python version that doesn't support enumerate.

This isn't meant as an argument against using enumerate in the common
case, but there are circumstances where iterating with an index variable
is the right thing to do. "Anti-pattern" tends to imply that it is always
wrong.

The advantage of enumerate disappears if you only need to do something to
certain items, not all of them:
>>> for i in xrange(3, 7, 2):
...     print i, alist[i]
...
3 d
5 f

Nice and clear. But this is just ugly and wasteful:
>>> for i, c in enumerate(alist):
...     if i in xrange(3, 7, 2):
...         print i, c
...
3 d
5 f

although better than the naive alternative using slicing, which is just
wrong:

>>> for i, c in enumerate(alist[3:7:2]):
...     print i, c
...
0 d
1 f

The indexes point to the wrong place in the original, non-sliced list, so
if you need to modify the original, you have to adjust the indexes by hand:

>>> for i, c in enumerate(alist[3:7:2]):
...     print 2*i+3, c
...
3 d
5 f

And remember that if alist is truly huge, you may take a performance hit
due to duplicating all those megabytes of data when you slice it. If you
are modifying the original, better to skip making a slice.
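
The three approaches above can be compared side by side in a short sketch. It assumes Python 3 and a sample list of the letters a to h (the thread does not show how alist was defined). The zip variant is a further option that is not mentioned in the thread.

```python
alist = list("abcdefgh")  # assumed sample data; the thread does not show alist

# Index loop: visits positions 3 and 5 directly.
by_index = [(i, alist[i]) for i in range(3, 7, 2)]

# enumerate over a slice: the counter restarts at 0, so it has to be mapped
# back to the original positions by hand (start 3, step 2).
by_slice = [(2 * i + 3, c) for i, c in enumerate(alist[3:7:2])]

# Pairing the index range with the slice avoids the manual arithmetic,
# but still builds the sliced copy.
by_zip = list(zip(range(3, 7, 2), alist[3:7:2]))

print(by_index)
```

All three produce the same (index, item) pairs; only the index loop does so without slicing or index arithmetic.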

I wrote a piece of code the other day that had to walk along a list,
swapping adjacent elements like this:

for i in xrange(0, len(alist)-1, 2):
    alist[i], alist[i+1] = alist[i+1], alist[i]

The version using enumerate is no improvement:

for i, x in enumerate(alist[0:len(alist)-1:2]):
    alist[i*2], alist[i*2+1] = alist[i*2+1], x

In my opinion, it actually is harder to understand what it is doing.
Swapping two items using "a,b = b,a" is a well known and easily recognised
idiom. Swapping two items using "a,b = b,c" is not.
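
For reference, a self-contained version of the index-based swap, assuming Python 3 (range instead of xrange). The function name swap_adjacent is hypothetical.

```python
def swap_adjacent(items):
    """Swap items[0] with items[1], items[2] with items[3], and so on, in place."""
    # Stopping at len(items) - 1 leaves an unpaired last item untouched.
    for i in range(0, len(items) - 1, 2):
        items[i], items[i + 1] = items[i + 1], items[i]
    return items

print(swap_adjacent([0, 1, 2, 3, 4]))
```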

Roberto Bonvallet · Nov 30, 2006

Steven said:
This isn't meant as an argument against using enumerate in the common
case, but there are circumstances where iterating with an index variable
is the right thing to do. "Anti-pattern" tends to imply that it is always
wrong.

Right, I should have said: "iterating over range(len(a)) just to obtain the
elements of a is not the pythonic way to do it".

Cheers,

Peter Otten · Nov 30, 2006

Steven said:
And remember that if alist is truly huge, you may take a performance hit
due to duplicating all those megabytes of data when you slice it.

Having the same object in two lists simultaneously does not double the total
amount of memory; you just need space for an extra pointer.
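
A quick way to check this, sketched for Python 3 on CPython (both assumptions). The sample data is hypothetical; strings are used so that the identity test compares distinct objects. Note that sys.getsizeof reports only the list's own storage, not the size of the elements it refers to.

```python
import sys

# Distinct string objects, so the identity checks below are meaningful.
alist = [str(n) for n in range(1000)]
blist = alist[::2]  # extended slice: a new list object

# The slice holds references to the same element objects, not copies of them.
shared = all(blist[k] is alist[2 * k] for k in range(len(blist)))
print(shared)

# The cost of the slice is the new list's own storage (its pointer array).
print(sys.getsizeof(blist))
```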

If you are modifying the original, better to skip making a slice.

I wrote a piece of code the other day that had to walk along a list,
swapping adjacent elements like this:

for i in xrange(0, len(alist)-1, 2):
    alist[i], alist[i+1] = alist[i+1], alist[i]

The version using enumerate is no improvement:

for i, x in enumerate(alist[0:len(alist)-1:2]):
    alist[i*2], alist[i*2+1] = alist[i*2+1], x

In my opinion, it actually is harder to understand what it is doing.
Swapping two items using "a,b = b,a" is a well known and easily recognised
idiom. Swapping two items using "a,b = b,c" is not.

That example was chosen to prove your point. The real contenders for the
"swap items" problem are slices.

def swap_slice(items):
    left = items[::2]
    items[::2] = items[1::2]
    items[1::2] = left
    return items

def swap_loop(items):
    for i in xrange(0, len(items)-1, 2):
        k = i+1
        items[i], items[k] = items[k], items[i]
    return items

$ python2.5 -m timeit -s 'from swap import swap_loop as swap; r = range(10**6)' 'swap(r)'
10 loops, best of 3: 326 msec per loop
$ python2.5 -m timeit -s 'from swap import swap_slice as swap; r = range(10**6)' 'swap(r)'
10 loops, best of 3: 186 msec per loop

$ python2.5 -m timeit -s 'from swap import swap_loop as swap; r = range(10**7)' 'swap(r)'
10 loops, best of 3: 3.27 sec per loop
$ python2.5 -m timeit -s 'from swap import swap_slice as swap; r = range(10**7)' 'swap(r)'
10 loops, best of 3: 2.29 sec per loop

With 10**7 items in the list I hear disc access on my system, so the
theoretical sweet spot where swap_slice() is already swapping while
swap_loop() is not may be nonexistent...

Peter

Duncan Booth · Nov 30, 2006

Peter Otten said:
That example was chosen to prove your point. The real contenders for the
"swap items" problem are slices.

def swap_slice(items):
    left = items[::2]
    items[::2] = items[1::2]
    items[1::2] = left
    return items

It makes no difference to the time or memory use, but you can of course
also write swap_slice using the aforementioned 'well known idiom':

items[::2], items[1::2] = items[1::2], items[::2]
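
Wrapped in a function, the one-line form looks like this (a sketch assuming Python 3, reusing the swap_slice name from above). It relies on the whole right-hand side being evaluated before either assignment takes place.

```python
def swap_slice(items):
    """Swap adjacent pairs in place; expects an even number of items."""
    # Both right-hand slices are copied before either assignment happens.
    items[::2], items[1::2] = items[1::2], items[::2]
    return items

print(swap_slice([0, 1, 2, 3, 4, 5]))
```

With an odd number of items the two slices differ in length, so the first assignment raises ValueError.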

Peter Otten · Nov 30, 2006

Duncan said:
items[::2], items[1::2] = items[1::2], items[::2]

Cool. I really should have found that myself.

Peter

Steven D'Aprano · Nov 30, 2006

Having the same object in two lists simultaneously does not double the total
amount of memory; you just need space for an extra pointer.

Sure. But if you have millions of items in a list, the pointers themselves
take millions of bytes -- otherwise known as megabytes.

[snip]

That example was chosen to prove your point.

Well, I thought about choosing an example that disproved my point, but I
couldn't think of one

The real contenders for the "swap items" problem are slices.

def swap_slice(items):
    left = items[::2]
    items[::2] = items[1::2]
    items[1::2] = left
    return items

I always forget that extended slices can be assigned to as well as
assigned from! Nice piece of code... if only it worked.
>>> def swap_slice(items):
...     left = items[::2]
...     items[::2] = items[1::2]
...     items[1::2] = left
...     return items
...
>>> alist
[0, 1, 2, 3, 4]
>>> swap_slice(alist)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in swap_slice
ValueError: attempt to assign sequence of size 2 to extended slice of size 3

Everybody always forgets corner cases... like lists with odd number of
items... *wink*

Here is a version that handles both odd and even length lists:

def swap_slice(items):
    left = items[:len(items)-1:2]
    items[:len(items)-1:2] = items[1::2]
    items[1::2] = left
    return items

Replacing the explicit Python for loop with an implicit loop in C makes it
very much faster.
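
A quick check of the odd and even cases, assuming Python 3. The intermediate variable stop is added here only for readability.

```python
def swap_slice(items):
    """Swap adjacent pairs in place; a trailing unpaired item stays put."""
    stop = len(items) - 1  # excludes the last item when the length is odd
    left = items[:stop:2]
    items[:stop:2] = items[1::2]
    items[1::2] = left
    return items

print(swap_slice([0, 1, 2, 3, 4]))
print(swap_slice([0, 1, 2, 3]))
```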

Duncan Booth · Nov 30, 2006

Steven D'Aprano said:
Everybody always forgets corner cases... like lists with odd number of
items... *wink*

I didn't forget. I just assumed that raising an exception was a more useful
response.

Here is a version that handles both odd and even length lists:

def swap_slice(items):
    left = items[:len(items)-1:2]
    items[:len(items)-1:2] = items[1::2]
    items[1::2] = left
    return items

I guess a viable alternative to raising an exception would be to pad the
list to even length:

[1, 2, 3] -> [2, 1, Padding, 3]

for some value of Padding. I don't think I would expect swap_slice to
silently fail to swap the last element.
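
If raising is the preferred behaviour, the check can also be made explicit instead of relying on the slice assignment to fail. This is only a sketch, assuming Python 3; swap_slice_strict is a hypothetical name and the error message is made up for illustration.

```python
def swap_slice_strict(items):
    """Swap adjacent pairs in place; refuse lists with an unpaired last item."""
    if len(items) % 2:
        # Fail before touching the list, with a message that names the problem.
        raise ValueError("expected an even number of items, got %d" % len(items))
    items[::2], items[1::2] = items[1::2], items[::2]
    return items

print(swap_slice_strict([1, 2, 3, 4]))
```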

Peter Otten · Dec 1, 2006

Steven said:
Sure. But if you have millions of items in a list, the pointers themselves
take millions of bytes -- otherwise known as megabytes.

I don't know the exact layout of an int, but let's assume 4 bytes each for
the class, the value, the refcount and the initial list entry -- which gives
you 16 bytes per entry for what is probably the class with the smallest
footprint in the Python universe. For the range(N) example the slicing
approach then needs an extra 4 bytes or 25 percent. On the other hand, if
you are not dealing with unique objects (say range(100) * (N//100)) the
amount of memory for the actual objects is negligible and consequently the
total amount doubles.
You should at least take that difference into account when you choose the
swapping algorithm.

Well, I thought about choosing an example that disproved my point, but I
couldn't think of one

Lack of fantasy

The real contenders for the "swap items" problem are slices.

def swap_slice(items):
    left = items[::2]
    items[::2] = items[1::2]
    items[1::2] = left
    return items

I always forget that extended slices can be assigned to as well as
assigned from! Nice piece of code... if only it worked.
>>> def swap_slice(items):
...     left = items[::2]
...     items[::2] = items[1::2]
...     items[1::2] = left
...     return items
...
>>> alist
[0, 1, 2, 3, 4]
>>> swap_slice(alist)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in swap_slice
ValueError: attempt to assign sequence of size 2 to extended slice of size 3

Everybody always forgets corner cases... like lists with odd number of
items... *wink*

True in general, but on that one I'm with Duncan.

Here is another implementation that cuts maximum memory down from 100 to
50%.

from itertools import islice
def swap(items):
    items[::2], items[1::2] = islice(items, 1, None, 2), items[::2]
    return items

Peter

Peter Otten · Dec 1, 2006

Peter said:
Here is another implementation that cuts maximum memory down from 100 to
50%.

from itertools import islice
def swap(items):
    items[::2], items[1::2] = islice(items, 1, None, 2), items[::2]
    return items

Unfortunately, the following

>>> a = [1, 2, 3]
>>> a[::2] = iter([10, 20, 30])
Traceback (most recent call last):
  ...
>>> a
[1, 2, 3]

does not support my bold claim :-( Since the list is not changed there must
be an intermediate copy.
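
The same experiment as a script, assuming Python 3 on CPython. The iterator yields three items while the slice a[::2] covers only two positions, so the assignment is rejected, and the list can be inspected afterwards.

```python
a = [1, 2, 3]

try:
    # a[::2] covers two positions, but the iterator yields three items.
    a[::2] = iter([10, 20, 30])
except ValueError as exc:
    error = exc
else:
    error = None

print(error)
print(a)
```

The list comes out unmodified, which is consistent with the iterator being collected into a temporary sequence before any element is assigned.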

Peter
