Dear pythonistas,
I am writing a tiny utility to produce a file consisting of a
specified number of lines, each of a given length, of random ASCII
characters. I am hoping to find a more time- and memory-efficient way
that is still fairly simple, clear, and _pythonic_.
I would like to have something that I can use at both extremes of
data:
32M chars per line * 100 lines
or
5 chars per line * 1e8 lines.
E.g., the output of bigrand.py for 10 characters, 2 lines might be:
gw2+M/5t&.
S[[db/l?Vx
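For concreteness, the interface I have in mind is roughly this (just a
placeholder sketch; the argument names are not final, and the scripts
below still hard-code the two numbers):

from sys import argv

# Placeholder interface, not final:  python bigrand.py NCHARS ROWS > outfile
nchars = int(argv[1])   # characters per line, e.g. 32000000
rows = int(argv[2])     # number of lines, e.g. 100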
I'm using Python 2.7.0 on Linux. I need to use only out-of-the-box
modules, since this has to work on a bunch of different computers.
At this point I'm especially concerned with the case of a few very
long lines, since that seems to use a lot of memory and take a long
time.
The characters are a slight subset of the printable ASCII set, as
specified in the examples below. My first naive try was:
from sys import stdout
import random
nchars = 32000000
rows = 10
avail_chrs = ('0123456789abcdefghijklmnopqrstuvwxyz'
              'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
              '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{}')

def make_varchar(nchars):
    return ''.join([random.choice(avail_chrs) for i in range(nchars)])

for l in range(rows):
    stdout.write(make_varchar(nchars))
    stdout.write('\n')
This version used around 1.2GB resident / 1.2GB virtual memory and
took 3min 38sec.
My second try uses much less RAM, but more CPU time, and seems
rather, umm, un-pythonic (the array module always seems a little
un-pythonic...):
from sys import stdout
from array import array
import random
nchars = 32000000
rows = 10
avail_chrs = ('0123456789abcdefghijklmnopqrstuvwxyz'
              'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
              '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{}')

a = array('c', 'X' * nchars)

for l in range(rows):
    for i in xrange(nchars):
        a[i] = random.choice(avail_chrs)
    a.tofile(stdout)
    stdout.write('\n')
This version using array took 4min 29sec, using 34MB resident /
110MB virtual. So, a much smaller footprint than the first attempt,
but a bit slower.
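One direction I have been toying with, just a rough and untested
sketch, is to build each line in fixed-size chunks so memory stays
roughly proportional to the chunk size rather than to the line length
(write_line and the chunk value are placeholders of my own, not tuned):

from sys import stdout
import random

nchars = 32000000
rows = 10
chunk = 65536   # buffer size is a guess, not tuned
avail_chrs = ('0123456789abcdefghijklmnopqrstuvwxyz'
              'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
              '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{}')

def write_line(out, nchars, chunk):
    # Build and write the line in bounded-size pieces, using xrange so
    # no huge temporary list of indices is built either.
    left = nchars
    while left > 0:
        n = min(chunk, left)
        out.write(''.join([random.choice(avail_chrs) for i in xrange(n)]))
        left -= n
    out.write('\n')

for l in range(rows):
    write_line(stdout, nchars, chunk)

But I suspect I am missing something simpler.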
Can someone suggest better code, and help me understand the
performance issues here?
-- George