How to write fast into a file in python?

lokeshkoppaka · May 17, 2013

I need to write numbers into a file upto 50mb and it should be fast
can any one help me how to do that?
i had written the following code..
-----------------------------------------------------------------------------------------------------------
def create_file_numbers_old(filename, size):
start = time.clock()

value = 0
with open(filename, "w") as f:
while f.tell()< size:
f.write("{0}\n".format(value))
value += 1

end = time.clock()

print "time taken to write a file of size", size, " is ", (end -start), "seconds \n"

Steven D'Aprano · May 17, 2013

I need to write numbers into a file upto 50mb and it should be fast can
any one help me how to do that?
i had written the following code..
----------------------------------------------------------------------
def create_file_numbers_old(filename, size): start = time.clock()

value = 0
with open(filename, "w") as f:
while f.tell()< size:
f.write("{0}\n".format(value))
value += 1

end = time.clock()

print "time taken to write a file of size", size, " is ", (end -start),
"seconds \n"

20 seconds to write how many numbers? If you are doing

create_file_numbers_old(filename, 5)

then 20 seconds is really slow. But if you are doing:

create_file_numbers_old(filename, 50000000000000)

then 20 seconds is amazingly fast.

Try this instead, it may be a little faster:

def create_file_numbers_old(filename, size):
count = value = 0
with open(filename, 'w') as f:
while count < size:
s = '%d\n' % value
f.write(s)
count += len(s)
value += 1

If this is still too slow, you can try three other tactics:

1) pre-calculate the largest integer that will fit in `size` bytes, then
use a for-loop instead of a while loop:

maxn = calculation(...)
with open(filename, 'w') as f:
for i in xrange(maxn):
f.write('%d\n' % i)

2) Write an extension module in C that writes to the file.

3) Get a faster hard drive, and avoid writing over a network.

lokeshkoppaka · May 17, 2013

I need to write numbers into a file upto 50mb and it should be fast

can any one help me how to do that?

i had written the following code..

-----------------------------------------------------------------------------------------------------------

def create_file_numbers_old(filename, size):

start = time.clock()

value = 0

with open(filename, "w") as f:

while f.tell()< size:

f.write("{0}\n".format(value))

value += 1

end = time.clock()

print "time taken to write a file of size", size, " is ", (end -start), "seconds \n"

size = 50mb

Dave Angel · May 17, 2013

If you must use googlegroups, at least read this
http://wiki.python.org/moin/GoogleGroupsPython.

size = 50mb

Most of the time is spent figuring out whether the file has reached its
limit size. If you want Python to go fast, just specify the data. On
my Linux system, it takes 11 seconds to write the first 6338888 values,
which is just under 50mb. If I write the obvious loop, writing that
many values takes .25 seconds.

Carlos Nepomuceno · May 17, 2013

I've got the following results on my desktop PC (Win7/Python2.7.5):

C:\src\Python>python -m timeit -cvn3 -r3 "execfile('fastwrite2.py')"
raw times: 123 126 125
3 loops, best of 3: 41 sec per loop

C:\src\Python>python -m timeit -cvn3 -r3 "execfile('fastwrite5.py')"
raw times: 34 34.3 34
3 loops, best of 3: 11.3 sec per loop

C:\src\Python>python -m timeit -cvn3 -r3 "execfile('fastwrite6.py')"
raw times: 0.4 0.447 0.391
3 loops, best of 3: 130 msec per loop

If you can just copy a preexisting file it will surely increase the speed to the levels you need, but doing the cStringIO operations can reduce the time in 72%.

Strangely I just realised that the time it takes to complete such scripts is the same no matter what hard drive I choose to run them. The results are the same for an SSD (main drive) and a HDD.

I think it's very strange to take 11.3s to write 50MB (4.4MB/s) sequentially on a SSD which is capable of 140MB/s.

Is that a Python problem? Why does it take the same time on the HDD?

### fastwrite2.py ### <<< this is your code
size = 50*1024*1024
value = 0
filename = 'fastwrite2.dat'
with open(filename, "w") as f:
    while f.tell()< size:
        f.write("{0}\n".format(value))
        value += 1
    f.close()

### fastwrite5.py ###
import cStringIO
size = 50*1024*1024
value = 0
filename = 'fastwrite5.dat'
x = 0
b = cStringIO.StringIO()
while x < size:
    line = '{0}\n'.format(value)
    b.write(line)
    value += 1
    x += len(line)+1
f = open(filename, 'w')
f.write(b.getvalue())
f.close()
b.close()

### fastwrite6.py ###
import shutil
src = 'fastwrite.dat'
dst = 'fastwrite6.dat'
shutil.copyfile(src, dst)

----------------------------------------

Steven D'Aprano · May 17, 2013

I've got the following results on my desktop PC (Win7/Python2.7.5):

C:\src\Python>python -m timeit -cvn3 -r3 "execfile('fastwrite2.py')" raw
times: 123 126 125
3 loops, best of 3: 41 sec per loop

Your times here are increased significantly by using execfile. Using
execfile means that instead of compiling the code once, then executing
many times, it gets compiled over and over and over and over again. In my
experience, using exec, execfile or eval makes your code ten or twenty
times slower:

[steve@ando ~]$ python -m timeit 'x = 100; y = x/3'
1000000 loops, best of 3: 0.175 usec per loop
[steve@ando ~]$ python -m timeit 'exec("x = 100; y = x/3")'
10000 loops, best of 3: 37.8 usec per loop

Strangely I just realised that the time it takes to complete such
scripts is the same no matter what hard drive I choose to run them. The
results are the same for an SSD (main drive) and a HDD.

There's nothing strange here. The time you measure is dominated by three
things, in reducing order of importance:

* the poor choice of execfile dominates the time taken;

* followed by choice of algorithm;

* followed by the time it actually takes to write to the disk, which is
probably insignificant compared to the other two, regardless of whether
you are using a HDD or SSD.

Until you optimize the code, optimizing the media is a waste of time.

I think it's very strange to take 11.3s to write 50MB (4.4MB/s)
sequentially on a SSD which is capable of 140MB/s.

It doesn't. It takes 11.3 seconds to open a file, read it into memory,
parse it, compile it into byte-code, and only then execute it. My
prediction is that the call to f.write() and f.close() probably take a
fraction of a second, and nearly all of the rest of the time is taken by
other calculations.

Carlos Nepomuceno · May 17, 2013

Thank you Steve! You are totally right!

It takes about 0.2s for the f.write() to return. Certainly because it writes to the system file cache (~250MB/s).

Using a little bit different approach I've got:

C:\src\Python>python -m timeit -cvn3 -r3 -s"from fastwrite5r import run" "run()"
raw times: 24 25.1 24.4
3 loops, best of 3: 8 sec per loop


This time it took 8s to complete from previous 11.3s.

Does those 3.3s are the time to "open, read, parse, compile" steps you told me?

If so, the execute step is really taking 8s, right?

Why does it take so long to build the string to be written? Can it get faster?

Thanks in advance!

### fastwrite5r.py ###
def run():
    import cStringIO
    size = 50*1024*1024
    value = 0
    filename = 'fastwrite5.dat'
    x = 0
    b = cStringIO.StringIO()
    while x < size:
        line = '{0}\n'.format(value)
        b.write(line)
        value += 1
        x += len(line)+1
    f = open(filename, 'w')
    f.write(b.getvalue())
    f.close()
    b.close()

if __name__ == '__main__':
    run()

----------------------------------------

From: (e-mail address removed)
Subject: Re: How to write fast into a file in python?
Date: Fri, 17 May 2013 16:42:55 +0000
To: (e-mail address removed)

I've got the following results on my desktop PC (Win7/Python2.7.5):

C:\src\Python>python -m timeit -cvn3 -r3 "execfile('fastwrite2.py')" raw
times: 123 126 125
3 loops, best of 3: 41 sec per loop

Click to expand...

Your times here are increased significantly by using execfile. Using
execfile means that instead of compiling the code once, then executing
many times, it gets compiled over and over and over and over again. In my
experience, using exec, execfile or eval makes your code ten or twenty
times slower:

[steve@ando ~]$ python -m timeit 'x = 100; y = x/3'
1000000 loops, best of 3: 0.175 usec per loop
[steve@ando ~]$ python -m timeit 'exec("x = 100; y = x/3")'
10000 loops, best of 3: 37.8 usec per loop

Strangely I just realised that the time it takes to complete such
scripts is the same no matter what hard drive I choose to run them. The
results are the same for an SSD (main drive) and a HDD.

Click to expand...

There's nothing strange here. The time you measure is dominated by three
things, in reducing order of importance:

* the poor choice of execfile dominates the time taken;

* followed by choice of algorithm;

* followed by the time it actually takes to write to the disk, which is
probably insignificant compared to the other two, regardless of whether
you are using a HDD or SSD.

Until you optimize the code, optimizing the media is a waste of time.

I think it's very strange to take 11.3s to write 50MB (4.4MB/s)
sequentially on a SSD which is capable of 140MB/s.

Click to expand...

It doesn't. It takes 11.3 seconds to open a file, read it into memory,
parse it, compile it into byte-code, and only then execute it. My
prediction is that the call to f.write() and f.close() probably take a
fraction of a second, and nearly all of the rest of the time is taken by
other calculations.

Steven D'Aprano · May 17, 2013

### fastwrite5.py ###
import cStringIO
size = 50*1024*1024
value = 0
filename = 'fastwrite5.dat'
x = 0
b = cStringIO.StringIO()
while x < size:
Â Â Â line = '{0}\n'.format(value)
Â Â Â b.write(line)
Â Â Â value += 1
Â Â Â x += len(line)+1

Oh, I forgot to mention: you have a bug in this function. You're already
including the newline in the len(line), so there is no need to add one.
The result is that you only generate 44MB instead of 50MB.

f = open(filename, 'w')
f.write(b.getvalue())
f.close()
b.close()

Here are the results of profiling the above on my computer. Including the
overhead of the profiler, it takes just over 50 seconds to run your file
on my computer.

[steve@ando ~]$ python -m cProfile fastwrite5.py
17846645 function calls in 53.575 seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
1 30.561 30.561 53.575 53.575 fastwrite5.py:1(<module>)
1 0.000 0.000 0.000 0.000 {cStringIO.StringIO}
5948879 5.582 0.000 5.582 0.000 {len}
1 0.004 0.004 0.004 0.004 {method 'close' of 'cStringIO.StringO' objects}
1 0.000 0.000 0.000 0.000 {method 'close' of 'file' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
5948879 9.979 0.000 9.979 0.000 {method 'format' of 'str' objects}
1 0.103 0.103 0.103 0.103 {method 'getvalue' of 'cStringIO.StringO' objects}
5948879 7.135 0.000 7.135 0.000 {method 'write' of 'cStringIO.StringO' objects}
1 0.211 0.211 0.211 0.211 {method 'write' of 'file' objects}
1 0.000 0.000 0.000 0.000 {open}

As you can see, the time is dominated by repeatedly calling len(),
str.format() and StringIO.write() methods. Actually writing the data to
the file is quite a small percentage of the cumulative time.

So, here's another version, this time using a pre-calculated limit. I
cheated and just copied the result from the fastwrite5 output

# fasterwrite.py
filename = 'fasterwrite.dat'
with open(filename, 'w') as f:
for i in xrange(5948879): # Actually only 44MB, not 50MB.
f.write('%d\n' % i)

And the profile results are about twice as fast as fastwrite5 above, with
only 8 seconds in total writing to my HDD.

[steve@ando ~]$ python -m cProfile fasterwrite.py
5948882 function calls in 28.840 seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
1 20.592 20.592 28.840 28.840 fasterwrite.py:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
5948879 8.229 0.000 8.229 0.000 {method 'write' of 'file' objects}
1 0.019 0.019 0.019 0.019 {open}

Without the overhead of the profiler, it is a little faster:

[steve@ando ~]$ time python fasterwrite.py

real 0m16.187s
user 0m13.553s
sys 0m0.508s

Although it is still slower than the heavily optimized dd command,
but not unreasonably slow for a high-level language:

[steve@ando ~]$ time dd if=fasterwrite.dat of=copy.dat
90781+1 records in
90781+1 records out
46479922 bytes (46 MB) copied, 0.737009 seconds, 63.1 MB/s

real 0m0.786s
user 0m0.071s
sys 0m0.595s

Carlos Nepomuceno · May 17, 2013

You've hit the bullseye!

Thanks a lot!!!

Oh, I forgot to mention: you have a bug in this function. You're already
including the newline in the len(line), so there is no need to add one.
The result is that you only generate 44MB instead of 50MB.

That's because I'm running on Windows.
What's the fastest way to check if '\n' translates to 2 bytes on file?

Here are the results of profiling the above on my computer. Including the
overhead of the profiler, it takes just over 50 seconds to run your file
on my computer.

[steve@ando ~]$ python -m cProfile fastwrite5.py
17846645 function calls in 53.575 seconds

Didn't know the cProfile module.Thanks a lot!

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
1 30.561 30.561 53.575 53.575 fastwrite5.py:1(<module>)
1 0.000 0.000 0.000 0.000 {cStringIO.StringIO}
5948879 5.582 0.000 5.582 0.000 {len}
1 0.004 0.004 0.004 0.004 {method 'close' of 'cStringIO.StringO' objects}
1 0.000 0.000 0.000 0.000 {method 'close' of 'file' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
5948879 9.979 0.000 9.979 0.000 {method 'format' of 'str' objects}
1 0.103 0.103 0.103 0.103 {method 'getvalue' of 'cStringIO.StringO' objects}
5948879 7.135 0.000 7.135 0.000 {method 'write' of 'cStringIO.StringO' objects}
1 0.211 0.211 0.211 0.211 {method 'write' of 'file' objects}
1 0.000 0.000 0.000 0.000 {open}

As you can see, the time is dominated by repeatedly calling len(),
str.format() and StringIO.write() methods. Actually writing the data to
the file is quite a small percentage of the cumulative time.

So, here's another version, this time using a pre-calculated limit. I
cheated and just copied the result from the fastwrite5 output

# fasterwrite.py
filename = 'fasterwrite.dat'
with open(filename, 'w') as f:
for i in xrange(5948879): # Actually only 44MB, not 50MB.
f.write('%d\n' % i)

I had the same idea but kept the original method because I didn't want to waste time creating a function for calculating the actual number of iterations needed to deliver 50MB of data.

And the profile results are about twice as fast as fastwrite5 above, with
only 8 seconds in total writing to my HDD.

[steve@ando ~]$ python -m cProfile fasterwrite.py
5948882 function calls in 28.840 seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
1 20.592 20.592 28.840 28.840 fasterwrite.py:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
5948879 8.229 0.000 8.229 0.000 {method 'write' of 'file' objects}
1 0.019 0.019 0.019 0.019 {open}

I thought there would be a call to format method by "'%d\n' % i". It seems the % operator is a lot faster than format.
I just stopped using it because I read it was going to be deprecated.

Why replace such a great and fast operator by a slow method? I mean, why format is been preferred over %?

Without the overhead of the profiler, it is a little faster:

[steve@ando ~]$ time python fasterwrite.py

real 0m16.187s
user 0m13.553s
sys 0m0.508s

Although it is still slower than the heavily optimized dd command,
but not unreasonably slow for a high-level language:

[steve@ando ~]$ time dd if=fasterwrite.dat of=copy.dat
90781+1 records in
90781+1 records out
46479922 bytes (46 MB) copied, 0.737009 seconds, 63.1 MB/s

real 0m0.786s
user 0m0.071s
sys 0m0.595s

Carlos Nepomuceno · May 17, 2013

Think the following update will make the code more portable:

x += len(line)+len(os.linesep)-1

Not sure if it's the fastest way to achieve that. :/

Steven D'Aprano · May 18, 2013

I thought there would be a call to format method by "'%d\n' % i". It
seems the % operator is a lot faster than format. I just stopped using
it because I read it was going to be deprecated. Why replace such a
great and fast operator by a slow method? I mean, why format is been
preferred over %?

That is one of the most annoying, pernicious myths about Python, probably
second only to "the GIL makes Python slow" (it actually makes it fast).

String formatting with % is not deprecated. It will not be deprecated, at
least not until Python 4000.

The string format() method has a few advantages: it is more powerful,
consistent and flexible, but it is significantly slower.

Probably the biggest disadvantage to % formatting, and probably the main
reason why it is "discouraged", is that it treats tuples specially.
Consider if x is an arbitrary object, and you call "%s" % x:

py> "%s" % 23 # works
'23'
py> "%s" % [23, 42] # works
'[23, 42]'

and so on for *almost* any object. But if x is a tuple, strange things
happen:

py> "%s" % (23,) # tuple with one item looks like an int
'23'
py> "%s" % (23, 42) # tuple with two items fails
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting

So when dealing with arbitrary objects that you cannot predict what they
are, it is better to use format.

Chris Angelico · May 18, 2013

Consider if x is an arbitrary object, and you call "%s" % x:

py> "%s" % 23 # works
'23'
py> "%s" % [23, 42] # works
'[23, 42]'

and so on for *almost* any object. But if x is a tuple, strange things
happen

Which can be guarded against by wrapping it up in a tuple. All you're
seeing is that the shortcut notation for a single parameter can't
handle tuples.
return "%s" % (x,)

show(23) '23'
show((23,)) '(23,)'
show([23,42])

Click to expand...

Click to expand...

'[23, 42]'

One of the biggest differences between %-formatting and str.format is
that one is an operator and the other a method. The operator is always
going to be faster, but the method can give more flexibility (not that
I've ever needed or wanted to override anything).
2 0 LOAD_CONST 1 ('%s')
3 LOAD_FAST 0 (x)
6 BUILD_TUPLE 1
9 BINARY_MODULO
10 RETURN_VALUE2 0 LOAD_CONST 1 ('{}')
3 LOAD_ATTR 0 (format)
6 LOAD_FAST 0 (x)
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 RETURN_VALUE

Attribute lookup and function call versus binary operator. Potentially
a lot of flexibility, versus basically hard-coded functionality. But
has anyone ever actually made use of it?

str.format does have some cleaner features, like naming of parameters:
'1 vs 2'

Extremely handy when you're working with hugely complex format
strings, and the syntax feels a bit clunky in % (also, it's not
portable to other languages, which is one of %-formatting's biggest
features). Not a huge deal, but if you're doing a lot with that, it
might be a deciding vote.

ChrisA

Fábio Santos · May 18, 2013

Think the following update will make the code more portable:

x += len(line)+len(os.linesep)-1

Not sure if it's the fastest way to achieve that. :/

Putting len(os.linesep)'s value into a local variable will make accessing
it quite a bit faster. But why would you want to do that?

You mentioned "\n" translating to two lines, but this won't happen. Windows
will not mess with what you write to your file. It's just that
traditionally windows and windows programs use \r\n instead of just \n. I
think it was for compatibility with os/2 or macintosh (I don't remember
which), which used \r for newlines.

You don't have to follow this convention. If you open a \n-separated file
with *any* text editor other than notepad, your newlines will be okay.

88888 Dihedral · May 18, 2013

Steven D'Apranoæ–¼ 2013å¹´5æœˆ18æ—¥æ˜ŸæœŸå…UTC+8ä¸‹åˆ12æ™‚01åˆ†13ç§’å¯«é“ï¼š

That is one of the most annoying, pernicious myths about Python, probably

second only to "the GIL makes Python slow" (it actually makes it fast).

The print function in python is designed to print
any printable object with a valid string representation.

The format part of the print function has to construct
a printable string according to the format string
and the variables passed in on the fly.

If the acctual string to be printed in the format processing
is obtained in the high level OOP way , then it is definitely
slow due to the high level overheads and generality requirements.

Chris Angelico · May 18, 2013

Putting len(os.linesep)'s value into a local variable will make accessingit
quite a bit faster. But why would you want to do that?

You mentioned "\n" translating to two lines, but this won't happen. Windows
will not mess with what you write to your file. It's just that traditionally
windows and windows programs use \r\n instead of just \n. I think it was for
compatibility with os/2 or macintosh (I don't remember which), which used\r
for newlines.

You don't have to follow this convention. If you open a \n-separated file
with *any* text editor other than notepad, your newlines will be okay.

Into two characters, not two lines, but yes. A file opened in text
mode on Windows will have its lines terminated with two characters.
(And it's old Macs that used to use \r. OS/2 follows the DOS
convention of \r\n, but again, many apps these days are happy with
Unix newlines there too.)

ChrisA

Carlos Nepomuceno · May 18, 2013

Python really writes '\n\r' on Windows. Just check the files.

Internal representations only keep '\n' for simplicity, but if you wanna keep track of the file length you have to take that into account.

________________________________

Dennis Lee Bieber · May 18, 2013

tOn Sat, 18 May 2013 08:49:55 +0100, Fábio Santos
<[email protected]> declaimed the following in
gmane.comp.python.general:

You mentioned "\n" translating to two lines, but this won't happen. Windows
will not mess with what you write to your file. It's just that
traditionally windows and windows programs use \r\n instead of just \n. I
think it was for compatibility with os/2 or macintosh (I don't remember
which), which used \r for newlines.

Neither... It goes back to Teletype machines where one sent a
carriage return to move the printhead back to the left, then sent a line
feed to advance the paper (while the head was still moving left), and in
some cases also provided a rub-out character (a do-nothing) to add an
additional character time delay.

TRS-80 Mod 1-4 used <cr> for "new line", I believe Apple used <lf>
for "new line"... And both lost the ability to move down the page
without also resetting the carriage to the left. In a world where both
<cr><lf> is used, one could draw a vertical line of | by just spacing
across the first line, printing |, then repeat <lf><bkspc>| until done.
To do the same with conventional <lf> is "new line/return" one has to
transmit all those spaces for each line...

At 300baud, that took time....

Roy Smith · May 18, 2013

Dennis Lee Bieber said:
tOn Sat, 18 May 2013 08:49:55 +0100, Fábio Santos
<[email protected]> declaimed the following in
gmane.comp.python.general:

Neither... It goes back to Teletype machines where one sent a
carriage return to move the printhead back to the left, then sent a line
feed to advance the paper (while the head was still moving left), and in
some cases also provided a rub-out character (a do-nothing) to add an
additional character time delay.

The delay was important. It took more than one character time for the
print head to get back to the left margin. If you kept sending
printable characters while the print head was still flying back, they
would get printed in the middle of the line (perhaps blurred a little).

There was also a dashpot which cushioned the head assembly when it
reached the left margin. Depending on how well adjusted things were
this might take another character time or two to fully settle down.

You can still see the remnants of this in modern Unix systems:

$ stty -a
speed 9600 baud; rows 40; columns 136; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = M-^?; eol2
= M-^?; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z;
rprnt = ^R; werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon
-ixoff -iuclc ixany imaxbel -iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0
vt0 ff0
isig icanon iexten echo echoe -echok -echonl -noflsh -xcase -tostop
-echoprt echoctl echoke

The "nl0" and "cr0" mean it's configured to insert 0 delay after
newlines and carriage returns. Whether setting a non-zero delay
actually does anything useful anymore is an open question, but the
drivers still accept the settings.

Dan Stromberg · May 18, 2013

With CPython 2.7.3:
../t
time taken to write a file of size 52428800 is 15.86 seconds

time taken to write a file of size 52428800 is 7.91 seconds

time taken to write a file of size 52428800 is 9.64 seconds

With pypy-1.9:
../t
time taken to write a file of size 52428800 is 3.708232 seconds

time taken to write a file of size 52428800 is 4.868304 seconds

time taken to write a file of size 52428800 is 1.93612 seconds

Here's the code:
#!/usr/local/pypy-1.9/bin/pypy
#!/usr/bin/python

import sys
import time
import StringIO

sys.path.insert(0, '/usr/local/lib')
import bufsock

def create_file_numbers_old(filename, size):
start = time.clock()

value = 0
with open(filename, "w") as f:
while f.tell() < size:
f.write("{0}\n".format(value))
value += 1

end = time.clock()

print "time taken to write a file of size", size, " is ", (end -start),
"seconds \n"

def create_file_numbers_bufsock(filename, intended_size):
start = time.clock()

value = 0
with open(filename, "w") as f:
bs = bufsock.bufsock(f)
actual_size = 0
while actual_size < intended_size:
string = "{0}\n".format(value)
actual_size += len(string) + 1
bs.write(string)
value += 1
bs.flush()

end = time.clock()

print "time taken to write a file of size", intended_size, " is ", (end
-start), "seconds \n"

def create_file_numbers_file_like(filename, intended_size):
start = time.clock()

value = 0
with open(filename, "w") as f:
file_like = StringIO.StringIO()
actual_size = 0
while actual_size < intended_size:
string = "{0}\n".format(value)
actual_size += len(string) + 1
file_like.write(string)
value += 1
file_like.seek(0)
f.write(file_like.read())

end = time.clock()

print "time taken to write a file of size", intended_size, " is ", (end
-start), "seconds \n"

create_file_numbers_old('output.txt', 50 * 2**20)
create_file_numbers_bufsock('output2.txt', 50 * 2**20)
create_file_numbers_file_like('output3.txt', 50 * 2**20)

Fábio Santos · May 18, 2013

tOn Sat, 18 May 2013 08:49:55 +0100, Fábio Santos
<[email protected]> declaimed the following in
gmane.comp.python.general:

Neither... It goes back to Teletype machines where one sent a
carriage return to move the printhead back to the left, then sent a line
feed to advance the paper (while the head was still moving left), and in
some cases also provided a rub-out character (a do-nothing) to add an
additional character time delay.

TRS-80 Mod 1-4 used <cr> for "new line", I believe Apple used <lf>
for "new line"... And both lost the ability to move down the page
without also resetting the carriage to the left. In a world where both
<cr><lf> is used, one could draw a vertical line of | by just spacing
across the first line, printing |, then repeat <lf><bkspc>| until done.
To do the same with conventional <lf> is "new line/return" one has to
transmit all those spaces for each line...

At 300baud, that took time....

Python really writes '\n\r' on Windows. Just check the files.

Internal representations only keep '\n' for simplicity, but if you wanna keep track of the file length you have to take that into account.

Into two characters, not two lines, but yes. A file opened in text
mode on Windows will have its lines terminated with two characters.
(And it's old Macs that used to use \r. OS/2 follows the DOS
convention of \r\n, but again, many apps these days are happy with
Unix newlines there too.)

ChrisA

Thanks for your corrections and explanations. I stand corrected and
have learned something.

Python Fast I/o	5	Jan 14, 2014
How to change key name in json file with python	0	Oct 2, 2022
Fast forward-backward (write-read)	7	Oct 23, 2012
Python battle game help	2	Feb 23, 2023
How to add echo to .wav file in python ??	3	Jan 12, 2023
How do I save information from an GUI into a XML-file?	0	Aug 17, 2022
How do I write a script to generate 10 random EVEN numbers and writethem to a .txt file?	3	Jul 8, 2013
How to use Densenet121 in monai	0	Feb 16, 2024

How to write fast into a file in python?

lokeshkoppaka

Steven D'Aprano

lokeshkoppaka

Dave Angel

Carlos Nepomuceno

Steven D'Aprano

Carlos Nepomuceno

Steven D'Aprano

Carlos Nepomuceno

Carlos Nepomuceno

Steven D'Aprano

Chris Angelico

Fábio Santos

88888 Dihedral

Chris Angelico

Carlos Nepomuceno

Dennis Lee Bieber

Roy Smith

Dan Stromberg

Fábio Santos

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads