Two candies

bearophileHUGS

Two little things I'd like to see added.

1) A fast and memory efficient way to create an array.array at a
certain size.
At the moment I can see some ways to do it:

from array import array
from itertools import repeat
a = array("l", xrange(n)) #1
a = array("l", [0]*n)) #2
a = array("l", repeat(0, n)) #3

- #1 doesn't initialize it to a constant, and it doesn't seem memory
efficient to me.
- #2 is faster, but I think it requires twice the memory (so it's not
good when a has to be quite large).
- #3 interacts badly with Psyco (Psyco doesn't digest itertools at
all), and it doesn't seem memory efficient.

So a simple syntax could be added, like a method a.allocate(item, n)
that takes an item and the length of the array to create:

a = array("l")
a.allocate(0, n)

(Note: I know about external numerical packages).
Alternatively, Psyco could be improved a bit, so that the #3 syntax
becomes the standard (efficient in both memory and time) one.

---------------------------

2) When I use MatchObjects I have to look at the docs to remember the
difference between group() and groups(), etc. So I suggest adding a
__getitem__ method to MatchObject, so that this:

mo[3]

is equivalent to:

mo.group(3)
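
For illustration, the behavior can be emulated today with a small
wrapper class (the class name is made up here, it is not an existing
stdlib feature):

import re

class IndexableMatch(object):
    """Wraps a match object so that mo[i] means mo.group(i)."""

    def __init__(self, mo):
        self._mo = mo

    def __getitem__(self, i):
        return self._mo.group(i)

    def __getattr__(self, name):
        # Delegate everything else (group(), groups(), span(), ...) to
        # the wrapped match object.
        return getattr(self._mo, name)

mo = IndexableMatch(re.match(r"(\d+)-(\d+)-(\d+)", "2008-01-02"))
print mo[3]        # same as mo.group(3), prints "02"
print mo.groups()  # still works, prints ('2008', '01', '02')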

Bye,
bearophile
 
Raymond Hettinger

[bearophileH]
1) A fast and memory efficient way to create an array.array at a
certain size.
At the moment I can see some ways to do it:

from array import array
from itertools import repeat
a = array("l", repeat(0, n)) #3 . . .
- #3 interacts badly with Psyco (Psyco doesn't digest itertools at
all), and it doesn't seem memory efficient.

#3 is a fine choice. It is memory efficient -- the repeat() itertool
takes up only a few bytes. It doesn't need psyco; you already have two
fast C routines talking to each other without having to go through the
interpreter loop.

Raymond
 
bearophileHUGS

First of all, thank you Raymond for your answer, and a happy new year
to you and to all the Python group :)

Raymond:
#3 is a fine choice. It is memory efficient -- the repeat() itertool takes up only a few bytes. It doesn't need psyco; you already have two fast C routines talking to each other without having to go through the interpreter loop.<

In my code I have found otherwise, so to be more sure I have just done
a few benchmarks on a Pentium3 CPU, 256 MB RAM, ActivePython 2.5.1.1,
on Windows.

It may sound unfair, but I think it's useful to have a basic reference
timing, so I also took timings from a program written in the D language
(it's like a better C++ mixed with some Python and other things). This
is the D code:

import std.stdio, std.conv, std.string, std.c.stdlib, std.c.time;

void main(string[] args) {
    int n = toInt(args[1].replace("_", ""));
    auto t0 = clock();
    if (args[2].strip() == "1") {
        auto a = new int[n];
    } else {
        auto a = cast(int*) calloc(n, int.sizeof);
    }
    auto t1 = clock();
    writefln("%.2f s", (t1 - t0) / cast(double) CLOCKS_PER_SEC);
}

As input it takes n and then 1 or 2.
"Warm" timings:

         n   time-1   time-2
 1_000_000     0.04     0.04
10_000_000     0.44     0.42
30_000_000     1.32     1.26

In both cases the a array is initialized to all 0.
A D "int" is always 4 bytes. The memory used by this little program is
essentially what you ask for, so it is 4 MB for n = 1 million, 40 MB
for n = 10 million, and 120 MB for n = 30 million (plus a few other
small things, like the stack, etc.; all the other extra memory is
probably less than 800 KB).


This is the Python code; as input it takes n and then 1, 2 or 3:

from array import array
from itertools import repeat
from time import clock
from sys import argv

def main():
    n = int(argv[1].replace("_", ""))
    choice = int(argv[2])
    assert choice in [1, 2, 3]

    if choice == 1:
        t0 = clock()
        a = array("l", xrange(n))
        t1 = clock()
    if choice == 2:
        t0 = clock()
        a = array("l", [0] * n)
        t1 = clock()
    if choice == 3:
        t0 = clock()
        a = array("l", repeat(0, n))
        t1 = clock()

    print round(t1 - t0, 2), "s"
    assert len(a) == n

main()


"Warm" timings:

Time (seconds):
n #1 #2 #3
1_000_000 1.98 0.79 1.91
10_000_000 21.73 7.97 21.13
30_000_000 70. 28.55 65.3

Approximate peak MB RAM used:
n #1 #2 #3
1_000_000 9 10.3 6.6
10_000_000 60 81 64
30_000_000 184 210+ 200+

At runtime the Python interpreter plus other little things take some
memory, about 1.7 MB.
In the cases with the "+" sign the RAM was not enough and I had
swapping, so the timings are probably different if you have more RAM
(but I think this benchmark is significant anyway, because IMHO a
120 MB array of integers isn't supposed to make your OS swap your
256 MB of memory if you allocate it properly).
To me it seems that array("l", repeat(0, n)) eats a lot of memory, and
it's quite a bit slower than array("l", [0]*n) too. (I presume
array("l", repeat(0, n)) converts the Python int 0 to the signed 4-byte
int over and over again...) So I still believe an allocate-like method
may be useful for the array object. It could allocate the data
structure with n = 30 million in probably less than 2 seconds. In the
meantime, for such arrays I presume I'll use an external numerical
lib :)

Bye,
bearophile
 
Raymond Hettinger

[Raymond]
#3 is a fine choice. It is memory efficient -- the repeat() itertool takes up only a few bytes. It doesn't need psyco; you already have two fast C routines talking to each other without having to go through the interpreter loop.<
[bearophile]
In my code I have found otherwise, so to be more sure I have just done
a few benchmarks on a Pentium3 CPU, 256 MB RAM, ActivePython 2.5.1.1,
on Windows.

I applaud your efforts to measure performance -- that is a reasonably
good way to find out the best approach (though you have to be very
careful about what you measure, how you measure it, whether the system
state has changed between measurements, and how you interpret the
results).

Here are a few thoughts based on knowing what's under the hood.

* xrange() and repeat() objects only take up a few bytes.

* xrange() creates a new integer object for every iteration

* repeat() will re-use the same object over and over

* the counter for repeat() does not use integer objects

* the list approach takes 4 bytes of memory per entry (pointers to a
single object)

* lists, xrange objects, and repeat objects all support the iteration
protocol

* lists and xrange objects also support the sequence protocol (getitem
and len)

* array_new has a special case for lists -- it uses the known length
to pre-size the array and it uses the getitem protocol to access the
elements

* array_new handles repeat() and xrange() by using the iterator
protocol, and it grows the array one element at a time, with periodic
resizing and over-allocation -- this is dog slow (a rough timing sketch
of the difference follows this list)

* in most apps (except for sparse arrays), the initialization time for
an array is dominated by the time spent actually doing something
useful with the array (iow, this is an odd place to be optimizing)

* the array module could be made a little smarter by checking the
iterators for a hint about their size (this would speed up the
xrange() and repeat() versions considerably).

* the array module is not designed for speed -- it is all about
storing data compactly

* in contrast, numpy and other numerical apps are more thoroughly
optimized

* memory consumption is very difficult to measure since the operating
system has a say in the matter (ask for 1 byte of allocation and you
may get an enormous chunk in return).
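
For illustration (a sketch added here, not part of the original post),
the pre-sizing difference between the list path and the iterator path
can be seen directly with timeit; the size n is arbitrary:

from timeit import Timer

n = 1000000

# Building from a list: array_new sees len() and can pre-size the buffer.
t_list = Timer("array('l', [0] * %d)" % n,
               "from array import array").timeit(number=3)

# Building from repeat(): array_new only gets the iterator protocol, so
# it grows the array element by element with periodic resizing.
t_iter = Timer("array('l', repeat(0, %d))" % n,
               "from array import array; from itertools import repeat"
               ).timeit(number=3)

print "from list:     %.2f s" % t_list
print "from repeat(): %.2f s" % t_iter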

That being said, I'll cut to the chase:

* sometimes it's easy to get so wrapped up in thinking this through
that the obvious gets missed.

* try adding this one to your test suite:

a = array('l', [0]) * n

Raymond
 
Terry Jones

Raymond> * in most apps (except for sparse arrays), the initialization time
Raymond> for an array is dominated by the time spent actually doing
Raymond> something useful with the array (iow, this is an odd place to be
Raymond> optimizing)

This brings to mind an old algorithms chestnut (exercise 2.12 in the 1st
edition of Aho, Hopcroft & Ullman [1]):

If the implementation of an algorithm uses (for simplicity's sake) a square
array to represent its data, why are _all_ such algorithms not necessarily
O(n^2) due simply to the initialization requirement (supposing a model of
computation that counts assignments)?
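
For readers without the book handy, here is a sketch of the classic
answer (my reading of the exercise, not Terry's text): keep a stack of
the cells actually written, and let a cell count as initialized only if
it points into that stack and the stack entry points back at it, so
reads of untouched cells are detected in O(1) without initializing
them. In Python the backing list still has to be allocated, so this
only shows the bookkeeping, not the uninitialized-memory model the
exercise assumes:

class LazyArray(object):
    """Array that answers reads of never-written cells with a default,
    without initializing all n cells up front (in the exercise's model)."""

    def __init__(self, n, default=0):
        self.n = n
        self.default = default
        # In the exercise's model this is uninitialized memory full of
        # garbage; in Python we have to allocate something.
        self.where = [None] * n
        self.log = []      # stack of (index, value) pairs actually written

    def _written(self, i):
        w = self.where[i]
        # A garbage value of w fails one of these checks, so untouched
        # cells are recognized without any prior initialization pass.
        return w is not None and 0 <= w < len(self.log) and self.log[w][0] == i

    def __setitem__(self, i, value):
        if self._written(i):
            self.log[self.where[i]] = (i, value)
        else:
            self.where[i] = len(self.log)
            self.log.append((i, value))

    def __getitem__(self, i):
        if self._written(i):
            return self.log[self.where[i]][1]
        return self.default

a = LazyArray(10)
a[3] = 42
assert a[3] == 42 and a[7] == 0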

Terry

[1] http://www.amazon.com/Analysis-Algorithms-Addison-Wesley-Information-Processing/dp/0201000296
 
bearophileHUGS

Raymond:
though you have to be very careful about what you measure, how you measure it, whether the system state has changed between measurements, and how you interpret the results).<

Right. This is part of the list of things they teach you to take care
of when you run experiments in biology, etc., too :)

* the list approach takes 4 bytes of memory per entry (pointers to a single object)<

I didn't know this...

* try adding this one to your test suite: a = array('l', [0]) * n<

Some time ago I used to use that way because it was faster, but I had
since forgotten about it :)
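
Here is the extra case as a small standalone sketch (option #4; the
same branch can be added to the earlier main() as choice == 4, and the
n below is just illustrative):

from array import array
from time import clock

n = 10000000
t0 = clock()
# One small list, then a single C-level repetition of the array buffer.
a = array("l", [0]) * n
t1 = clock()
print round(t1 - t0, 2), "s"
assert len(a) == n and a[0] == 0
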
Here are the results:

"Warm" timings, best of 3 (seconds):
n #1 #2 #3 #4
1_000_000 1.98 0.79 1.91 0.08
10_000_000 21.73 7.97 21.13 0.85
30_000_000 70. 28.55 65.3 2.55

Approximate peak MB RAM used:
n #1 #2 #3 #4
1_000_000 9 10.3 6.6 ~6?
10_000_000 60 81 64 ~34?
30_000_000 184 210+ 200+ ~119?

This fixes problem 1 of my original post. I suggest adding a line to
the Python docs of the array module to explain this way of creating a
uniform array (the second problem regards the regular expression match
object).

Bye and thank you,
a bear hug,
bearophile
 
Aahz

2) When I use MatchObjects I have to look at the docs to remember the
difference between group() and groups(), etc. So I suggest adding a
__getitem__ method to MatchObject, so that this:

mo[3]

is equivalent to:

mo.group(3)

Patches are wonderful, but if not, please post this RFE to the bug
tracker.
 
