trouble writing results to files

lisa.engblom

I have two semi-related questions...

First, I am trying to output a list of strings to a csv file using the
csv module. The output file separates each letter of the string with a
comma and then puts each string on a separate line. So the code is:

import csv
output = csv.writer(open('/Python25/working/output.csv', 'a'))
a = ["apple", "cranberry", "tart"]
for elem in range(len(a)):
    output.writerow(a[elem])


.... and it would write to the file:
a,p,p,l,e
c,r,a,n,b,e,r,r,y
t,a,r,t

How do I get it to write "apple", "cranberry", "tart" ?

Second, there is a significant delay (5-10 minutes) between when the
program finishes running and when the text actually appears in the
file. Any ideas for why this happens? It is the same for writing with
the csv module or the standard way.

thanks!
Lisa
 
Roberto Bonvallet

import csv
output = csv.writer(open('/Python25/working/output.csv', 'a'))
a = ["apple", "cranberry", "tart"]
for elem in range(len(a)):
    output.writerow(a[elem])

output.writerow expects a sequence as an argument. You are passing a
string, which is a sequence of characters. By the way, what output are you
expecting to get? Do you want a file with only one line (apple,
cranberry, tart), or each fruit in a different line?
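
To make the difference concrete, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2.5). It writes to an in-memory io.StringIO buffer instead of the real file, and the lineterminator="\n" setting is only an assumption to keep the printed result tidy:

```python
import csv
import io

# In-memory buffer standing in for the real output file.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")

# A bare string is treated as a sequence of characters: one field per letter.
writer.writerow("apple")
# A one-item list gives a row with a single field.
writer.writerow(["apple"])
# The whole list gives one row with three fields.
writer.writerow(["apple", "cranberry", "tart"])

result = buf.getvalue()
print(result)
```

The first row comes out letter by letter, the other two contain whole words.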

BTW, iterating over range(len(a)) is an anti-pattern in Python. You should
do it like this:

for item in a:
    output.writerow([item])
Second, there is a significant delay (5-10 minutes) between when the
program finishes running and when the text actually appears in the
file.

Try closing the file explicitly.
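
A rough sketch of what that could look like in Python 3, where a with-block closes the file for you (in Python 2.5 you would keep a reference to the file object and call its close() method instead). The temporary path is hypothetical and only used for illustration:

```python
import csv
import os
import tempfile

# Hypothetical path in a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "output.csv")

rows = [["apple", "cranberry", "tart"]]

# Leaving the with-block closes the file, which flushes buffered data to disk.
with open(path, "a", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# The data is readable as soon as the block has ended.
with open(path, newline="") as f:
    contents = f.read()
print(repr(contents))
```

If the file has to stay open for a long-running job, calling flush() on the file object after each batch of rows should have a similar effect.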
Cheers,
 
Neil Cerutti

Neil said:
Unless you're modifying elements of a, surely?

enumerate is your friend :)

for n, item in enumerate(a):
    if f(item):
        a[n] = whatever

I was going to bring it up but I had a brainfart about the order
of (n, item) in the tuple and was too lazy to look it up. ;-)
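
For the record, the index comes first. A small runnable sketch in Python 3 syntax, where the length test and the upper() call are just made-up stand-ins for f(item) and "whatever":

```python
a = ["apple", "cranberry", "tart"]

# enumerate yields (index, item) pairs, index first.
for n, item in enumerate(a):
    if len(item) > 5:          # stand-in for the f(item) test
        a[n] = item.upper()    # stand-in for "whatever"

print(a)
```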
 
lisa.engblom

Roberto said:
import csv
output = csv.writer(open('/Python25/working/output.csv', 'a'))
a = ["apple", "cranberry", "tart"]
for elem in range(len(a)):
    output.writerow(a[elem])

output.writerow expects a sequence as an argument. You are passing a
string, which is a sequence of characters. By the way, what output are you
expecting to get? Do you want a file with only one line (apple,
cranberry, tart), or each fruit in a different line?

I want it to print everything on one line and then create a new line
where it will print some more stuff. In my real program I am iterating
and it will eventually print the list a couple hundred times. But it
would be useful to understand how to tell it to do either.
BTW, iterating over range(len(a)) is an anti-pattern in Python. You should
do it like this:

for item in a:
    output.writerow([item])

I can try that. Is using range(len(a)) a bad solution in the sense
that it's likely to create an unexpected error? Or because there is a
more efficient way to accomplish the same thing?

thanks!
Lisa
 
Fredrik Lundh

I can try that. Is using range(len(a)) a bad solution in the sense
that it's likely to create an unexpected error? Or because there is a
more efficient way to accomplish the same thing?

for-in uses an internal index counter to fetch items from the sequence, so

for item in seq:
    function(item)

is simply a shorter and more efficient way to write

for item in range(len(seq)):
    function(seq[item])

also see this article:

http://online.effbot.org/2006_11_01_archive.htm#for

</F>
 
Dennis Lee Bieber

Roberto said:
import csv
output = csv.writer(open('/Python25/working/output.csv', 'a'))
a = ["apple", "cranberry", "tart"]
for elem in range(len(a)):
    output.writerow(a[elem])
lisa.engblom said:
I want it to print everything on one line and then create a new line

output.writerow(a)
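
A minimal sketch of that in a loop, written in Python 3 syntax. The io.StringIO buffer stands in for the real file, and the loop count of 3 is an arbitrary stand-in for the "couple hundred" iterations mentioned above:

```python
import csv
import io

a = ["apple", "cranberry", "tart"]
buf = io.StringIO()  # stands in for the real output file
output = csv.writer(buf, lineterminator="\n")

for _ in range(3):       # the real program would loop many more times
    output.writerow(a)   # the whole list becomes one comma-separated line

lines = buf.getvalue().splitlines()
print(lines)
```

Each call to writerow ends the current line, so every pass through the loop starts a new one.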
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
Steven D'Aprano

and needs to run on a Python version that doesn't support enumerate.

This isn't meant as an argument against using enumerate in the common
case, but there are circumstances where iterating with an index variable
is the right thing to do. "Anti-pattern" tends to imply that it is always
wrong.

The advantage of enumerate disappears if you only need to do something to
certain items, not all of them:
>>> for i in xrange(3, 7, 2):
...     print i, alist[i]
...
3 d
5 f

Nice and clear. But this is just ugly and wasteful:
>>> for i, c in enumerate(alist):
...     if i in xrange(3, 7, 2):
...         print i, c
...
3 d
5 f

although better than the naive alternative using slicing, which is just
wrong:
>>> for i,c in enumerate(alist[3:7:2]):
...     print i, c
...
0 d
1 f

The indexes point to the wrong place in the original, non-sliced list, so
if you need to modify the original, you have to adjust the indexes by hand:
>>> for i,c in enumerate(alist[3:7:2]):
...     print 2*i+3, c
...
3 d
5 f

And remember that if alist is truly huge, you may take a performance hit
due to duplicating all those megabytes of data when you slice it. If you
are modifying the original, better to skip making a slice.
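
One way to keep the original indexes without building a sliced copy is to pair a range with itertools.islice. This is only a sketch in Python 3 syntax, and the contents of alist are an assumption chosen to match the output shown above:

```python
from itertools import islice

alist = list("abcdefgh")  # assumed sample data, not taken from the post

start, stop, step = 3, 7, 2

# range supplies the real indexes; islice walks the list without copying it.
pairs = list(zip(range(start, stop, step), islice(alist, start, stop, step)))
print(pairs)
```

Note that islice still has to step over the first `start` items, so for a slice far into a long list plain indexing may be the simpler choice.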

I wrote a piece of code the other day that had to walk along a list,
swapping adjacent elements like this:

for i in xrange(0, len(alist)-1, 2):
    alist[i], alist[i+1] = alist[i+1], alist[i]


The version using enumerate is no improvement:

for i, x in enumerate(alist[0:len(alist)-1:2]):
    alist[i*2], alist[i*2+1] = alist[i*2+1], x


In my opinion, it actually is harder to understand what it is doing.
Swapping two items using "a,b = b,a" is a well known and easily recognised
idiom. Swapping two items using "a,b = b,c" is not.
 
Roberto Bonvallet

Steven said:
This isn't meant as an argument against using enumerate in the common
case, but there are circumstances where iterating with an index variable
is the right thing to do. "Anti-pattern" tends to imply that it is always
wrong.

Right, I should have said: "iterating over range(len(a)) just to obtain the
elements of a is not the pythonic way to do it".

Cheers,
 
Peter Otten

Steven said:
And remember that if alist is truly huge, you may take a performance hit
due to duplicating all those megabytes of data when you slice it.

Having the same object in two lists simultaneously does not double the total
amount of memory; you just need space for an extra pointer.
If you
are modifying the original, better to skip making a slice.

I wrote a piece of code the other day that had to walk along a list,
swapping adjacent elements like this:

for i in xrange(0, len(alist)-1, 2):
    alist[i], alist[i+1] = alist[i+1], alist[i]


The version using enumerate is no improvement:

for i, x in enumerate(alist[0:len(alist)-1:2]):
    alist[i*2], alist[i*2+1] = alist[i*2+1], x


In my opinion, it actually is harder to understand what it is doing.
Swapping two items using "a,b = b,a" is a well known and easily recognised
idiom. Swapping two items using "a,b = b,c" is not.


That example was chosen to prove your point. The real contender for the
"swap items" problem are slices.

def swap_slice(items):
    left = items[::2]
    items[::2] = items[1::2]
    items[1::2] = left
    return items

def swap_loop(items):
    for i in xrange(0, len(items)-1, 2):
        k = i+1
        items[i], items[k] = items[k], items[i]
    return items

$ python2.5 -m timeit -s 'from swap import swap_loop as swap; r = range(10**6)' 'swap(r)'
10 loops, best of 3: 326 msec per loop
$ python2.5 -m timeit -s 'from swap import swap_slice as swap; r = range(10**6)' 'swap(r)'
10 loops, best of 3: 186 msec per loop

$ python2.5 -m timeit -s 'from swap import swap_loop as swap; r = range(10**7)' 'swap(r)'
10 loops, best of 3: 3.27 sec per loop
$ python2.5 -m timeit -s 'from swap import swap_slice as swap; r = range(10**7)' 'swap(r)'
10 loops, best of 3: 2.29 sec per loop

With 10**7 items in the list I hear disc access on my system, so the
theoretical sweet spot where swap_slice() is already swapping while
swap_loop() is not may be nonexistent...

Peter
 
Duncan Booth

Peter Otten said:
That example was chosen to prove your point. The real contender for the
"swap items" problem are slices.

def swap_slice(items):
    left = items[::2]
    items[::2] = items[1::2]
    items[1::2] = left
    return items

It makes no difference to the time or memory use, but you can of course
also write swap_slice using the aforementioned 'well known idiom':

items[::2], items[1::2] = items[1::2], items[::2]
 
Steven D'Aprano

Having the same object in two lists simultaneously does not double the total
amount of memory; you just need space for an extra pointer.

Sure. But if you have millions of items in a list, the pointers themselves
take millions of bytes -- otherwise known as megabytes.
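
Both halves of that exchange can be checked on a small scale. The following is a sketch in Python 3 that assumes CPython, where sys.getsizeof reports the list's own footprint (header plus pointer array) and not the objects it refers to:

```python
import struct
import sys

items = [object() for _ in range(1000)]
copy = items[:]  # a full slice: new list, same objects

pointer_size = struct.calcsize("P")  # bytes per pointer on this platform

# The slice shares every element with the original...
same_objects = all(x is y for x, y in zip(items, copy))
# ...but still needs its own array of pointers.
copy_bytes = sys.getsizeof(copy)

print(same_objects, copy_bytes, len(copy) * pointer_size)
```

So the objects are not duplicated, but the copy does cost roughly one pointer per item.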


[snip]
That example was chosen to prove your point.

Well, I thought about choosing an example that disproved my point, but I
couldn't think of one :)
The real contender for the "swap items" problem are slices.

def swap_slice(items):
    left = items[::2]
    items[::2] = items[1::2]
    items[1::2] = left
    return items

I always forget that extended slices can be assigned to as well as
assigned from! Nice piece of code... if only it worked.
>>> def swap_slice(items):
...     left = items[::2]
...     items[::2] = items[1::2]
...     items[1::2] = left
...     return items
...
>>> alist
[0, 1, 2, 3, 4]
>>> swap_slice(alist)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in swap_slice
ValueError: attempt to assign sequence of size 2 to extended slice of size 3

Everybody always forgets corner cases... like lists with odd number of
items... *wink*

Here is a version that handles both odd and even length lists:

def swap_slice(items):
    left = items[:len(items)-1:2]
    items[:len(items)-1:2] = items[1::2]
    items[1::2] = left
    return items


Replacing the explicit Python for loop with an implicit loop in C makes it
very much faster.
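
For anyone who wants to try this on a current interpreter, here is the odd-safe version next to the index loop, ported to Python 3 syntax (range instead of xrange). It is a sketch for comparing results, not a benchmark:

```python
def swap_loop(items):
    # Swap neighbours pairwise; a trailing unpaired item stays in place.
    for i in range(0, len(items) - 1, 2):
        items[i], items[i + 1] = items[i + 1], items[i]
    return items


def swap_slice(items):
    # Same result using extended-slice assignment.
    left = items[:len(items) - 1:2]
    items[:len(items) - 1:2] = items[1::2]
    items[1::2] = left
    return items


odd = swap_slice([0, 1, 2, 3, 4])
even = swap_slice([0, 1, 2, 3])
print(odd, even)
```

Both functions modify the list in place and return it, so each call above gets a fresh list.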
 
Duncan Booth

Steven D'Aprano said:
Everybody always forgets corner cases... like lists with odd number of
items... *wink*
I didn't forget. I just assumed that raising an exception was a more useful
response.
Here is a version that handles both odd and even length lists:

def swap_slice(items):
    left = items[:len(items)-1:2]
    items[:len(items)-1:2] = items[1::2]
    items[1::2] = left
    return items
I guess a viable alternative to raising an exception would be to pad the
list to even length:

[1, 2, 3] -> [2, 1, Padding, 3]

for some value of Padding. I don't think I would expect swap_slice to
silently fail to swap the last element.
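
If raising is the preferred behaviour, an explicit check gives a clearer message than the one from the slice assignment. A sketch in Python 3 syntax, where the name swap_slice_strict and the wording of the message are made up for illustration:

```python
def swap_slice_strict(items):
    # Hypothetical variant: refuse odd-length input up front.
    if len(items) % 2:
        raise ValueError("expected an even number of items, got %d" % len(items))
    items[::2], items[1::2] = items[1::2], items[::2]
    return items


swapped = swap_slice_strict([1, 2, 3, 4])

try:
    swap_slice_strict([1, 2, 3])
except ValueError as exc:
    message = str(exc)
else:
    message = None

print(swapped, message)
```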
 
Peter Otten

Steven said:
Sure. But if you have millions of items in a list, the pointers themselves
take millions of bytes -- otherwise known as megabytes.

I don't know the exact layout of an int, but let's assume 4 bytes for the
class, the value, the refcount and the initial list entry -- which gives
you 16 bytes per entry for what is probably the class with the smallest
footprint in the python universe. For the range(N) example the slicing
approach then needs an extra 4 bytes or 25 percent. On the other hand, if
you are not dealing with unique objects (say range(100) * (N//100)) the
amount of memory for the actual objects is negligible and consequently the
total amount doubles.
You should at least take that difference into account when you choose the
swapping algorithm.
Well, I thought about choosing an example that disproved my point, but I
couldn't think of one :)

Lack of fantasy :)
The real contender for the "swap items" problem are slices.

def swap_slice(items):
    left = items[::2]
    items[::2] = items[1::2]
    items[1::2] = left
    return items

I always forget that extended slices can be assigned to as well as
assigned from! Nice piece of code... if only it worked.
>>> def swap_slice(items):
...     left = items[::2]
...     items[::2] = items[1::2]
...     items[1::2] = left
...     return items
...
>>> alist
[0, 1, 2, 3, 4]
>>> swap_slice(alist)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in swap_slice
ValueError: attempt to assign sequence of size 2 to extended slice of size 3

Everybody always forgets corner cases... like lists with odd number of
items... *wink*

True in general, but on that one I'm with Duncan.

Here is another implementation that cuts maximum memory down from 100 to
50%.

from itertools import islice
def swap(items):
    items[::2], items[1::2] = islice(items, 1, None, 2), items[::2]
    return items

Peter
 
Peter Otten

Peter said:
Here is another implementation that cuts maximum memory down from 100 to
50%.

from itertools import islice
def swap(items):
    items[::2], items[1::2] = islice(items, 1, None, 2), items[::2]
    return items

Unfortunately, the following
>>> a = [1, 2, 3]
>>> a[::2] = iter([10, 20, 30])
Traceback (most recent call last):
  ...
>>> a
[1, 2, 3]

does not support my bold claim :-( Since the list is not changed there must
be an intermediate copy.
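
The same experiment as a script, in Python 3 syntax and assuming CPython. The iterator yields three items for a two-slot extended slice, the assignment is rejected, and the list is left untouched:

```python
a = [1, 2, 3]

try:
    # a[::2] has two slots (indexes 0 and 2), but the iterator yields three items.
    a[::2] = iter([10, 20, 30])
except ValueError as exc:
    error = str(exc)
else:
    error = None

print(a, error)
```

Since none of the three values ended up in the list, the iterator must have been consumed into a temporary sequence before the length check.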

Peter
 
