Few questions

bearophile

Hello, I have a few more things to say/ask (left over from a discussion
in another Python newsgroup).

Is it possible (and useful) to write a few small sub-sections of the
Python interpreter in assembly for Pentium (III/IV)/AMD, to speed up
the interpreter on Win/Linux boxes running those CPUs? (Such parts
wouldn't replace the C versions, which would be kept for compatibility.)
I think the HLA (High Level Assembly) language could be a fit for this
purpose; it's a cute language:
http://webster.cs.ucr.edu/AsmTools/HLA/index.html

--------

I've done a little comparison of the speed of Python lists and arrays:

# speed_test.py
from time import clock
import sys

def array_test():
    from array import array
    v = array("l", [0] * n)
    t = clock()
    # Time only the item assignments, not the array creation.
    for i in xrange(len(v)):
        v[i] = i
    print "Timing:", round(clock()-t,3), "s"

def list_test():
    v = [0] * n
    t = clock()
    # Same loop as above, on a plain list.
    for i in xrange(len(v)):
        v[i] = i
    print "Timing:", round(clock()-t,3), "s"

n = 3*10**6
if str(sys.argv[1]) == "1":
    print "List test, n =", str(n) + ":"
    list_test()
else:
    print "Array test, n =", str(n) + ":"
    array_test()


On an old Win2K PC it gives:
C:\py>speed_test 1
List test, n = 3000000:
Timing: 2.804 s

C:\py>speed_test 2
Array test, n = 3000000:
Timing: 3.521 s


Python lists are arrays of pointers to objects, I think (a test shows
that here they use about 16 bytes for every number).
And the Python arrays are packed: every number here uses 4 bytes.
Why are lists faster here?

------

Memory cleaning: in the last script I've added some calls to a Win
version of the small "pslist" program, and I've put a "del v" command
after the timings. And I've seen:

C:\py>speed_test 1
List test, n = 3000000:
1) Process size: 1408 KB.
2) Process size: 48928 KB.
3) Process size: 37204 KB.

C:\py>speed_test 2
Array test, n = 3000000:
1) Process size: 1416 KB.
2) Process size: 13152 KB.
3) Process size: 1416 KB.

1 is at the start of the script, before v is created; 2 is after its
initialization loop; and 3 is after the "del v" command, like this:

used_mem(1)
from array import array
v = array("l", [0] * n)
for i in xrange(len(v)):
    v[i] = i
used_mem(2)
del v
used_mem(3)

The garbage collector removes the array at once (this is easy, it's
just a lump of memory with little extra things), but the memory used
by the list isn't freed even a little time later. (I think that to
understand how/when such a garbage collector works, I have to read
the Python C sources...)

Thank you,
bearophile
 
Josiah Carlson

bearophile wrote:

> Hello, I have a few more things to say/ask (left over from a discussion
> in another Python newsgroup).
>
> Is it possible (and useful) to write a few small sub-sections of the
> Python interpreter in assembly for Pentium (III/IV)/AMD, to speed up
> the interpreter on Win/Linux boxes running those CPUs? (Such parts
> wouldn't replace the C versions, which would be kept for compatibility.)
> I think the HLA (High Level Assembly) language could be a fit for this
> purpose; it's a cute language:
> http://webster.cs.ucr.edu/AsmTools/HLA/index.html

It may or may not be useful or faster to implement portions of the
Python interpreter in assembly. Generally assembly has performance
benefits and penalties per processor that are hard to understand.

I would be willing to wager that time would be better spent checking out
the different compile-time options for the interpreter, as C optimizes
fairly well.


> I've done a little comparison of the speed of Python lists and arrays: [snip code]
> On an old Win2K PC it gives:
> C:\py>speed_test 1
> List test, n = 3000000:
> Timing: 2.804 s
>
> C:\py>speed_test 2
> Array test, n = 3000000:
> Timing: 3.521 s
>
> Python lists are arrays of pointers to objects, I think (a test shows
> that here they use about 16 bytes for every number).
> And the Python arrays are packed: every number here uses 4 bytes.
> Why are lists faster here?

Crucial observation:
Lists are indeed arrays of pointers that point to 'int objects'.
Arrays (of integers) are arrays of actual stored x-bit integers (where x
is 32 in this case).
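As a rough illustration of that layout difference (not from the original posts), the sketch below asks a modern CPython 3 interpreter how much space each representation uses per element. The function name `storage_report` is made up for this example, and the exact numbers are an assumption that depends on the platform and interpreter build, so they will not match the 16 and 4 bytes quoted above on every machine.

```python
import sys
from array import array


def storage_report(n=1000):
    """Compare how a list and an array('l') hold n integers (CPython only)."""
    lst = [0] * n
    arr = array("l", [0] * n)
    return {
        # Bytes per list slot: each slot holds one pointer to an int object.
        "list_slot_bytes": (sys.getsizeof(lst) - sys.getsizeof([])) // n,
        # Size of one boxed int object that such a pointer can refer to.
        "int_object_bytes": sys.getsizeof(12345),
        # Bytes per packed array element (a raw C long).
        "array_item_bytes": arr.itemsize,
        # Total size of the array's packed data.
        "array_buffer_bytes": len(arr.tobytes()),
    }


report = storage_report()
for name in sorted(report):
    print(name, report[name])
```

The list figure counts only the pointer slots; the int objects they point to are separate allocations, which is why the per-number cost of a list is higher than the slot size alone.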

In order to write to an array the value of a standard Python integer,
one must look into the 16 byte Python integer to copy the proper 4 bytes
into the array, do bounds checking, etc.

In order to write to a list the value of a standard Python integer, a
pointer copy is sufficient.


Lists win because storing into a list is a pointer copy, as opposed to
a struct lookup and value copy with bounds checking.
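For anyone who wants to repeat the measurement on a current interpreter, here is a minimal Python 3 sketch of the same loop. It is an adaptation, not the original script: `fill` and `time_fill` are hypothetical helper names, `time.perf_counter` stands in for the old `time.clock`, and `n` is reduced so it finishes quickly. Which container comes out ahead may differ from the numbers in this thread, depending on interpreter version and hardware.

```python
import time
from array import array


def fill(container):
    """Store i at index i for every slot, as in the original test loop."""
    for i in range(len(container)):
        container[i] = i
    return container


def time_fill(make_container, repeats=3):
    """Best-of-N time for the fill loop only (container creation excluded)."""
    best = None
    for _ in range(repeats):
        container = make_container()
        start = time.perf_counter()
        fill(container)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


n = 100000  # smaller than the thread's 3*10**6 so the sketch runs quickly
print("list: ", round(time_fill(lambda: [0] * n), 4), "s")
print("array:", round(time_fill(lambda: array("l", [0] * n)), 4), "s")
```

Taking the best of several runs reduces noise from other processes; a single run, as in the original test, can be skewed by whatever else the machine is doing.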

> Memory cleaning: in the last script I've added some calls to a Win
> version of the small "pslist" program, and I've put a "del v" command
> after the timings. And I've seen:
>
> C:\py>speed_test 1
> List test, n = 3000000:
> 1) Process size: 1408 KB.
> 2) Process size: 48928 KB.
> 3) Process size: 37204 KB.
>
> C:\py>speed_test 2
> Array test, n = 3000000:
> 1) Process size: 1416 KB.
> 2) Process size: 13152 KB.
> 3) Process size: 1416 KB.
>
> 1 is at the start of the script, before v is created; 2 is after its
> initialization loop; and 3 is after the "del v" command, like this:
>
> used_mem(1)
> from array import array
> v = array("l", [0] * n)
> for i in xrange(len(v)):
>     v[i] = i
> used_mem(2)
> del v
> used_mem(3)
>
> The garbage collector removes the array at once (this is easy, it's
> just a lump of memory with little extra things), but the memory used
> by the list isn't freed even a little time later. (I think that to
> understand how/when such a garbage collector works, I have to read
> the Python C sources...)



Python arrays (from the array module) are allocated as a block, and the
values of integers are stored within. Because everything is all nice
and contained, it can be easily freed (just like C arrays).

With Python lists, certainly the pointers to the integer objects are
easily allocated and freed, and the integer objects themselves sit in
the integer free-list.

Now, obviously a bunch of those entries aren't being used after one
deletes the big list of integers, so why isn't the memory being freed?
Due to the semantics of the free list (you can't reorganize the integers
on the free list, etc.), it cannot be reduced in size.

- Josiah
 
Istvan Albert

bearophile wrote:

> but the memory used by the list isn't freed even a little time later.

Moreover, it won't ever be freed (from the operating system's view)
until the program ends. That's how C works, and it has nothing to do
with the kind of object the memory was allocated for. The only thing
that a free() operation is required to do is to make the freed
memory available for allocation within the same program.
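The distinction between "released inside the program" and "returned to the operating system" can be seen from the interpreter's side with the standard `tracemalloc` module in Python 3. This is only a sketch under that assumption, and `traced_bytes_around_del` is a made-up name: `tracemalloc` reports what the interpreter's allocator currently has handed out, not the process size that a tool like pslist shows, so a drop here does not imply the process shrinks.

```python
import tracemalloc


def traced_bytes_around_del(n=200000):
    """Return traced bytes (while the list is alive, after deleting it)."""
    tracemalloc.start()
    try:
        v = list(range(n))  # n int objects plus the list's pointer array
        alive, _peak = tracemalloc.get_traced_memory()
        del v
        released, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return alive, released


alive, released = traced_bytes_around_del()
print("traced while alive:", alive, "bytes")
print("traced after del:  ", released, "bytes")
```

Comparing these figures with the process size reported by an external tool at the same points shows how much memory the program is holding for reuse rather than handing back.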

Istvan.
 
