### fastwrite5.py ###
import cStringIO
size = 50*1024*1024
value = 0
filename = 'fastwrite5.dat'
x = 0
b = cStringIO.StringIO()
while x < size:
    line = '{0}\n'.format(value)
    b.write(line)
    value += 1
    x += len(line)+1   # bug: len(line) already counts the '\n'
f = open(filename, 'w')
f.write(b.getvalue())
f.close()
b.close()

Oh, I forgot to mention: you have a bug in this code. len(line) already
includes the newline, so there is no need to add one. As a result, you only
generate about 44MB (46,479,922 bytes) instead of 50MB (52,428,800 bytes).
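For comparison, here is a minimal Python 3 sketch of the same loop with the bug fixed. io.StringIO stands in for cStringIO, which Python 3 no longer has; `build_numbers` is a hypothetical name, and the function returns the text instead of writing a file so the result is easy to check:

```python
import io


def build_numbers(size):
    """Return '0\\n1\\n2\\n...' stopping once at least `size` characters exist."""
    buf = io.StringIO()
    value = 0
    x = 0
    while x < size:
        line = '{0}\n'.format(value)
        buf.write(line)
        value += 1
        x += len(line)  # len(line) already includes the newline
    return buf.getvalue()


data = build_numbers(1000)
print(len(data))  # 1002
```

The total overshoots `size` by less than one line. With the original `len(line)+1` counter, the same call would stop after only 778 characters.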
Here are the results of profiling fastwrite5.py on my computer. Including
the overhead of the profiler, it takes just over 53 seconds to run your file:
[steve@ando ~]$ python -m cProfile fastwrite5.py
         17846645 function calls in 53.575 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   30.561   30.561   53.575   53.575 fastwrite5.py:1(<module>)
        1    0.000    0.000    0.000    0.000 {cStringIO.StringIO}
  5948879    5.582    0.000    5.582    0.000 {len}
        1    0.004    0.004    0.004    0.004 {method 'close' of 'cStringIO.StringO' objects}
        1    0.000    0.000    0.000    0.000 {method 'close' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  5948879    9.979    0.000    9.979    0.000 {method 'format' of 'str' objects}
        1    0.103    0.103    0.103    0.103 {method 'getvalue' of 'cStringIO.StringO' objects}
  5948879    7.135    0.000    7.135    0.000 {method 'write' of 'cStringIO.StringO' objects}
        1    0.211    0.211    0.211    0.211 {method 'write' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {open}
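As you can see, the time is dominated by per-line work: 30.6 seconds executing
the loop itself (the tottime of the module), plus nearly 6 million calls each
to len(), str.format() and StringIO.write() (about 22.7 seconds combined).
Actually writing the data to the file takes only 0.211 seconds, a tiny
fraction of the total.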
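Sorting the report by tottime instead of by name puts the expensive entries at the top (from the command line, `python -m cProfile -s tottime fastwrite5.py` does this). A minimal Python 3 sketch doing the same from inside Python; `fastwrite_loop` is a hypothetical wrapper around the corrected loop, and the size of 100000 is an arbitrary small value chosen so it runs quickly:

```python
import cProfile
import io
import pstats


def fastwrite_loop(size):
    """Hypothetical wrapper around the (corrected) fastwrite5 loop."""
    buf = io.StringIO()
    value = x = 0
    while x < size:
        line = '{0}\n'.format(value)
        buf.write(line)
        value += 1
        x += len(line)
    return buf.getvalue()


profiler = cProfile.Profile()
profiler.enable()
fastwrite_loop(100000)
profiler.disable()

# Send the report to a string so it can be inspected or logged.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats('tottime').print_stats(5)  # only the five most expensive rows
text = report.getvalue()
print(text)
```

Timings will differ from the Python 2 figures above, but the shape of the report (loop overhead plus format/write/len calls) should be similar.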
So, here's another version, this time using a pre-calculated limit. I
cheated and just copied the number of lines (5948879, the ncalls figure for
len, format and write) from the fastwrite5 profile output:
# fasterwrite.py
filename = 'fasterwrite.dat'
with open(filename, 'w') as f:
    for i in xrange(5948879):  # Actually only 44MB, not 50MB.
        f.write('%d\n' % i)
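Rather than copying 5948879 from the profile, the limit can be worked out arithmetically. Each d-digit number produces a line of d + 1 bytes; there are 10 one-digit numbers (0 to 9) and 9 × 10^(d-1) numbers with d ≥ 2 digits, so whole blocks can be skipped and only the last one needs a ceiling division. A Python 3 sketch; `lines_needed` and its `overcount` parameter are hypothetical names, with overcount=1 mimicking fastwrite5's `len(line)+1` counter:

```python
def lines_needed(size, overcount=0):
    """Return (lines, nbytes) for the loop writing '0\\n', '1\\n', ...
    until its counter x reaches `size`.

    x grows by len(line) + overcount per line; overcount is a hypothetical
    knob (1 reproduces the buggy counter, 0 the corrected one).
    """
    lines = 0      # lines written so far
    counted = 0    # value of the loop's counter x so far
    nbytes = 0     # bytes actually written so far
    digits = 1
    while True:
        first = 0 if digits == 1 else 10 ** (digits - 1)
        count = 10 ** digits - first      # how many numbers have this many digits
        step = digits + 1 + overcount     # how much x grows per line in this block
        if counted + count * step >= size:
            # Only part of this block is needed: ceiling division to whole lines.
            k = -(-(size - counted) // step)
            return lines + k, nbytes + k * (digits + 1)
        lines += count
        counted += count * step
        nbytes += count * (digits + 1)
        digits += 1


print(lines_needed(50 * 1024 * 1024, overcount=1))  # (5948879, 46479922)
print(lines_needed(50 * 1024 * 1024))               # (6692489, 52428802)
```

With overcount=1 this reproduces both the profile's 5948879 calls and the 46479922 bytes that dd reports below; the corrected counter would need 6692489 lines for a full 50MB.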
Profiling fasterwrite.py shows it running almost twice as fast as fastwrite5
(28.8 seconds versus 53.6), with only about 8 seconds in total spent in the
file write() calls.
[steve@ando ~]$ python -m cProfile fasterwrite.py
         5948882 function calls in 28.840 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   20.592   20.592   28.840   28.840 fasterwrite.py:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  5948879    8.229    0.000    8.229    0.000 {method 'write' of 'file' objects}
        1    0.019    0.019    0.019    0.019 {open}
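To check the per-line difference between the two approaches on your own machine, here is a small Python 3 timeit sketch. The function names and N are hypothetical; both versions write into io.StringIO so that disk speed does not enter the comparison, and the fastwrite5 body is approximated with a for loop rather than its while/counter structure:

```python
import io
import timeit


def format_and_len(n):
    """fastwrite5-style body: str.format(), a buffer write and a len() counter."""
    buf = io.StringIO()
    x = 0
    for value in range(n):
        line = '{0}\n'.format(value)
        buf.write(line)
        x += len(line)
    return x


def percent_only(n):
    """fasterwrite-style body: one %-format and one write() per line."""
    buf = io.StringIO()
    for i in range(n):
        buf.write('%d\n' % i)
    return len(buf.getvalue())


N = 100000  # far fewer lines than the ~5.9 million in the post
for func in (format_and_len, percent_only):
    seconds = timeit.timeit(lambda: func(N), number=3)
    print('{0}: {1:.3f} s for 3 runs'.format(func.__name__, seconds))
```

The %-formatting version does less work per line and should come out ahead, though by how much depends on the machine and Python version.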
Without the overhead of the profiler, fasterwrite.py is considerably faster,
at about 16 seconds:
[steve@ando ~]$ time python fasterwrite.py
real 0m16.187s
user 0m13.553s
sys 0m0.508s
It is still much slower than the heavily optimized dd command (which, to be
fair, only copies the existing file rather than generating it), but not
unreasonably slow for a high-level language:
[steve@ando ~]$ time dd if=fasterwrite.dat of=copy.dat
90781+1 records in
90781+1 records out
46479922 bytes (46 MB) copied, 0.737009 seconds, 63.1 MB/s
real 0m0.786s
user 0m0.071s
sys 0m0.595s
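Much of the remaining gap is the per-line cost inside Python. One way to cut it, not something measured above, is to build the output in large chunks and call write() once per chunk, closer to dd's block-at-a-time copying. A Python 3 sketch; `chunked_write` and its `chunk_lines` parameter are hypothetical, and the actual speedup on any given machine is untested here:

```python
import os
import tempfile


def chunked_write(filename, count, chunk_lines=100000):
    """Write the integers 0 .. count-1, one per line, one write() per chunk.

    chunk_lines is a hypothetical tuning knob: bigger chunks mean fewer
    write() calls but a larger temporary string per chunk.
    """
    with open(filename, 'w') as f:
        for start in range(0, count, chunk_lines):
            stop = min(start + chunk_lines, count)
            # In CPython, map(str, ...) and join iterate in C, not in a Python loop.
            f.write('\n'.join(map(str, range(start, stop))) + '\n')


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'chunked.dat')
    chunked_write(path, 1000, chunk_lines=300)
    with open(path) as f:
        data = f.read()

print(len(data))  # 3890
```

The output is identical to writing '%d\n' % i line by line; only the number of Python-level operations per line changes.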