Estimating memory use?

Roy Smith

I've got a large text processing task to attack (it's actually a genomics
task; matching DNA probes against bacterial genomes). I've got roughly
200,000 probes, each of which is a 25 character long text string. My first
thought is to compile these into 200,000 regexes, but before I launch into
that, I want to do a back of the envelope guess as to how much memory that
will take.

Is there any easy way to find out how much memory a Python object takes?
If there was, it would be simple to compile a random small collection of
these patterns (say, 100 of them), and add up the sizes of the resulting
regex objects to get a rough idea of how much memory I'll need. I realize
I could just compile them all and watch the size of the Python process
grow, but that seems somewhat brute force.
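
The sampling idea above can be sketched with the standard-library tracemalloc module (Python 3.4 and later, so not available when this thread was written). This is a rough sketch, not a precise measurement: the probes are random stand-in strings, the function name and parameters are made up for illustration, and the figure also counts the list holding the regexes and whatever the re module caches internally.

```python
import random
import re
import tracemalloc


def estimate_regex_memory(n_sample=100, n_total=200_000, probe_len=25, seed=0):
    """Compile a small random sample of probes and extrapolate to the full set."""
    rng = random.Random(seed)
    # Hypothetical stand-in data: random DNA strings, not real probes.
    probes = ["".join(rng.choice("ACGT") for _ in range(probe_len))
              for _ in range(n_sample)]

    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    compiled = [re.compile(p) for p in probes]  # keep references alive
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    per_regex = (after - before) / len(compiled)
    return per_regex, per_regex * n_total


per_regex, total = estimate_regex_memory()
print(f"~{per_regex:.0f} bytes per regex, ~{total / 2**20:.1f} MiB for all probes")
```

The first compile in a fresh process may pull in some one-time allocations, so a larger sample should give a steadier per-regex figure.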
 
Tim N. van der Leeuw

Hi,

What is your 'static' data (database), and what is your input-data?
Those 200,000 probes are your database? Perhaps they can be stored as
pickled compiled regexes and thus be loaded in pickled form; then you
don't need to keep them all in memory at once -- if you fear that
memory usage will be too big.

I don't know if perhaps other string-matching techniques can be used
btw; you don't need the full power of regexes I guess to match DNA
string patterns.
Perhaps you should investigate that a bit, and do some performance
tests?
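
One possible non-regex technique, as a minimal sketch: if the probes only need to match exactly and all have the same length (both are assumptions, the thread does not say), they can go into a set and every window of the genome can be looked up in it. The function name and toy data below are hypothetical.

```python
def find_probe_hits(genome, probes, probe_len=25):
    """Yield (position, probe) for every exact occurrence of a probe in genome.

    Assumes every probe is exactly probe_len characters long.
    """
    probe_set = set(probes)
    for i in range(len(genome) - probe_len + 1):
        window = genome[i:i + probe_len]
        if window in probe_set:  # average O(1) hash lookup
            yield i, window


# Toy data with 5-character probes to keep the example readable.
probes = ["ACGTA", "TTGCA"]
genome = "GGACGTATTGCAACGTA"
hits = list(find_probe_hits(genome, probes, probe_len=5))
print(hits)
```

This makes one pass over the genome regardless of the number of probes, and the only large structure in memory is the set of probe strings.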

cheers,

--Tim
 
Roy Smith

Alex said:
No, but there are a few early attempts out there at supplying SOME ways
(not necessarily "easy", but SOME). For example, PySizer, at
<http://pysizer.8325.org/>.

Looks interesting, thanks.

I've already discovered one (very) surprising thing -- if I build a dict
containing all my regexes (takes about 3 minutes on my PowerBook) and
pickle them to a file, re-loading the pickle takes just about as long as
compiling them did in the first place.
 
Fredrik Lundh

Roy said:
I've already discovered one (very) surprising thing -- if I build a dict
containing all my regexes (takes about 3 minutes on my PowerBook) and
pickle them to a file, re-loading the pickle takes just about as long as
compiling them did in the first place.

the internal RE byte code format is version dependent, so pickle stores the
patterns instead.
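
A small check of this with current CPython, as a sketch (the probe string is made up): the pickle carries the pattern source and flags, and loading it hands them back to the re module to compile again, which is why a reload costs about as much as the original compile.

```python
import pickle
import re

rx = re.compile("ACGTACGTAC")  # hypothetical probe
data = pickle.dumps(rx)

# The pattern source text is stored in the pickle itself.
source_in_pickle = b"ACGTACGTAC" in data

# Loading rebuilds the regex from the stored pattern and flags.
restored = pickle.loads(data)
print(source_in_pickle, restored.pattern == rx.pattern, restored.flags == rx.flags)
```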

</F>
 
MrJean1

There is a function mx_sizeof() in the mx.Tools module from eGenix
which may be helpful. More at


<http://www.egenix.com/files/python/eGenix-mx-Extensions.html#mxTools>

/Jean Brouwers


PS) This is an approximation for memory usage which is useful in
certain, simple cases.

Each built-in type has an attribute __basicsize__ which is the size in
bytes needed to represent the basic type. For example,
str.__basicsize__ returns 24 and int.__basicsize__ returns 12.

However, __basicsize__ does not include the space needed to store the
object value. For a string, the length of the string has to be added
(times the character width). For example, the size of string "abcd"
would be at least approximately str.__basicsize__ + len("abcd") bytes,
assuming single byte characters.

In addition, memory alignment should be taken into account by rounding
the size up to the next multiple of 8 (or maybe 16, depending on
platform, etc.).

An approximation for the amount of memory used by a string S (of single
byte characters) aligned to A bytes, where A is a power of two, would be

(str.__basicsize__ + len(S) + A - 1) & ~(A - 1)

Things are more complicated for types like list, tuple and dict and
instances of a class.
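
The rounding step for strings can be sketched as below, assuming the alignment is a power of two. The helper names are made up, and the default basicsize of 24 is just the figure quoted above; it varies between Python versions and builds.

```python
def round_up(size, align=8):
    """Round size up to the next multiple of align (align must be a power of two)."""
    return (size + align - 1) & ~(align - 1)


def approx_str_size(s, basicsize=24, align=8):
    """Rough size of a single-byte-character string, per the estimate above.

    basicsize=24 is the value quoted in this post, not a universal constant.
    """
    return round_up(basicsize + len(s), align)


print(approx_str_size("abcd"), approx_str_size("abcde", align=16))
```

On Python 2.6 and later, sys.getsizeof() reports the size an object itself declares, which is a useful cross-check for a hand estimate like this one.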
 
MrJean1

The name of the function in mx.Tools is sizeof() and not mx_sizeof().
My apologies.

Also, it turns out that the return value of the mx.Tools.sizeof() function
is non-aligned. For example, mx.Tools.sizeof("abcde") returns 29, which
is fine, but not entirely "accurate".

/Jean Brouwers
 
François Pinard

[Fredrik Lundh]
the internal RE byte code format is version dependent, so pickle
stores the patterns instead.

Oh! Nice to know. That explains why, when I was learning Python, my
initial experiment with pickles left me with the (probably wrong)
feeling that they were not worth the trouble.

It might be worth a note in the documentation, somewhere appropriate.
 
