Light slices + COW

bearophileHUGS · May 3, 2008

Sometimes different languages suggest ways to cross-pollinate
them.

(Note: probably someone has already suggested (and even implemented)
the following ideas, but I'd like to know why they aren't a good fit
for Python.)

Python generators now allow me to package some code patterns that are
common in my code and that other (older) languages didn't allow me to
extract. For example, given a sequence (array, list, etc.) of items,
now and then I need to scan the whole upper triangle of their cross
matrix:

def xpairs(seq):
    len_seq = len(seq)
    for i, e1 in enumerate(seq):
        for j in xrange(i+1, len_seq):
            yield e1, seq[j]

Or adjacent ones (there are ways to generalize the following function,
but this is the most common situation for me):

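from itertools import izip, islice, tee

def xpairwise(iterable):
    a, b = tee(iterable)  # two independent iterators: one-shot iterators work too
    return izip(a, islice(b, 1, None))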

That xpairs() generator is nice, but it's not the best possible code
(you may disagree with me, and think it's better than the following D
code, which uses two slices). Inside it I can't use list slicing,
because that copies subparts of the list and is probably too slow (for
example if len(seq) == 1000).
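To make that cost concrete, here is a sketch in Python 3 (so `range` replaces `xrange`) of the slicing version being ruled out, next to the index-based one. `xpairs_slicing` is a hypothetical name used only for this comparison. Each `seq[i+1:]` builds a new list of len(seq) - i - 1 references, so a full scan copies about n²/2 references (roughly half a million for len(seq) == 1000) just to read each of them once.

```python
def xpairs(seq):
    """Index-based version: no copies, only integer indexing."""
    len_seq = len(seq)
    for i, e1 in enumerate(seq):
        for j in range(i + 1, len_seq):
            yield e1, seq[j]


def xpairs_slicing(seq):
    """Slice-based version: a simpler inner loop, but seq[i + 1:]
    copies the tail of the list on every outer iteration."""
    for i, e1 in enumerate(seq):
        for e2 in seq[i + 1:]:
            yield e1, e2


print(list(xpairs([1, 2, 3, 4])))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

Both produce the same pairs; they differ only in how much memory traffic each outer iteration causes.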

Compared to Python, D is a lower-level language, so you may expect it
to have a worse syntax (although the ShedSkin compiler has shown me
once and for all that a language can be high-level, have a short &
sexy syntax, and be very fast). But you can define an xpairs()
iterable class in D too, and at the core of that iterable class
there's a method that may look like this:

if (this.seq.length > 1)
    foreach (i, e1; this.seq[0 .. $-1])
        foreach (e2; this.seq[i+1 .. $]) {
            result = dg(e1, e2); if (result) return result; // yield
        }

The strange-looking last line is the yield, and the $ inside [] is the
length of the array being sliced (a very clever idea that avoids the
need for negative indexes).

That D code is as fast as or faster than the code you can
back-translate from Python. This is possible because D array slices
are very light: they are a struct of <length, pointer> (in the future
they may change to a pair of pointers, to speed up scanning a slice).
So if you slice an array you don't copy its contents, just the start
and end points of the slice. If you only read/scan the slice, that's
all there is. Writing through a slice modifies the original array, so
D code (the standard library in particular) follows a Copy-On-Write
convention: a just-in-time copy of the slice contents, made before
modifying them.
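To see what light slices + COW could mean for a Python list, here is a hedged sketch in Python 3. `LightSlice` is a hypothetical class for illustration, not an existing API: it stores a reference to the base list plus start/stop offsets, reads go straight to the base list, and the first write through the slice copies only the slice's own range.

```python
class LightSlice:
    """Hypothetical light slice over a list: it stores (base, start, stop)
    instead of copying, and copies its own range only on the first write."""

    def __init__(self, base, start=0, stop=None):
        if stop is None:
            stop = len(base)
        # Clamp bounds into [0, len(base)]; negative bounds are not
        # interpreted from the end in this sketch.
        start = max(0, min(start, len(base)))
        stop = max(start, min(stop, len(base)))
        self._base = base
        self._start = start
        self._stop = stop
        self._owned = False  # becomes True once we hold a private copy

    def __len__(self):
        return self._stop - self._start

    def _check(self, i):
        # Only non-negative integer indexes are supported.
        if not 0 <= i < len(self):
            raise IndexError("LightSlice index out of range")

    def __getitem__(self, i):
        self._check(i)
        return self._base[self._start + i]  # read straight from the base list

    def __iter__(self):
        for k in range(self._start, self._stop):
            yield self._base[k]

    def __setitem__(self, i, value):
        self._check(i)
        if not self._owned:
            # Copy-on-write: copy just this slice's range, just in time.
            self._base = self._base[self._start:self._stop]
            self._start, self._stop = 0, len(self._base)
            self._owned = True
        self._base[self._start + i] = value


data = [10, 20, 30, 40, 50]
view = LightSlice(data, 1, 4)   # no copy yet
print(list(view))               # [20, 30, 40]
view[0] = 99                    # first write copies [20, 30, 40]
print(data)                     # [10, 20, 30, 40, 50]
print(list(view))               # [99, 30, 40]
```

This only guards writes made through the view; a later write to `data` itself would still show through an unwritten view, which is one of the cases a real implementation for built-in lists would also have to handle.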

In practice, light slices speed up a lot of code that handles strings
and arrays (an XML parser, for example, which slicing helps make one of
the fastest ones).

Since a D slice is a struct, it's stack-allocated, so the inner
foreach doesn't even create a slice object on the heap at each
iteration of the outer foreach.

One problem with this strategy is that if you have a huge string and
keep only a small slice of it, the D garbage collector will keep the
whole string in memory. To avoid that, you have to duplicate the slice
manually:

somestring[inf .. sup].dup;
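For comparison, Python 3's built-in `memoryview` already gives light slices of bytes-like objects, with the same keep-alive behaviour: slicing a memoryview copies nothing, the view pins the whole underlying buffer, and `bytes(view)` is the explicit copy that plays the role of `.dup`. `tail_view` is a hypothetical helper name. Note that a view over a `bytes` object is read-only, and a view over a `bytearray` writes straight into it, so there is no COW here.

```python
def tail_view(data, n):
    """Return a zero-copy view of the last n bytes of data (bytes-like)."""
    start = max(0, len(data) - n)
    return memoryview(data)[start:]


big = bytes(1_000_000)       # a one-megabyte buffer of zero bytes
tail = tail_view(big, 10)    # 10-byte view: no data copied
del big                      # the name is gone, but...
print(len(tail.obj))         # 1000000: the view still pins the whole buffer

small = bytes(tail)          # explicit 10-byte copy, like D's .dup
tail.release()               # drop the view so the big buffer can be freed
print(len(small))            # 10
```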

I think Psyco is able, in some situations, to handle string slices
without copying them.

I think Python 3.0 could also benefit from a similar strategy of light
slices + COW for strings, lists and arrays (tuples and strings, being
immutable, don't need the COW).

Bye,

bearophile

castironpi · May 4, 2008


As I understand it, the operating system links a file across multiple
sections of a disk, while hiding those details from client software.

Files:

AAAA BBBB AA CCCCCCCCC

File A still reads correctly as AAAAAA.

B is then modified as follows:

Files:

AAAA BBBB AA CCCCCCCCC BBBB

With a large mutable string, a modification can cause a linear-time
operation, even when the change itself is constant-size.

String:

abcdefg

Modification:

aAbcdefg

causes the entire string to be recopied. Expensive at scale.

A string-on-disk structure could provide linking, such as a String
File Allocation Table.

abcdefg A

correctly reads as:

aAbcdefg
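The "String File Allocation Table" idea is close to what text editors call a piece table. Below is a hedged Python 3 sketch of a simplified version; `PieceString` is a hypothetical class, not a known library. The text is never moved: each piece is a (source, start, length) triple pointing into the original string or into an inserted fragment, so an insert only replaces one piece with at most three small tuples.

```python
class PieceString:
    """Hypothetical simplified piece table: the original text is never
    copied or moved; the piece list records the reading order."""

    def __init__(self, text):
        # Each piece is (source, start, length) and references its source
        # string (the original text or an inserted fragment) without copying.
        self._pieces = [(text, 0, len(text))] if text else []

    def __len__(self):
        return sum(length for _, _, length in self._pieces)

    def insert(self, pos, fragment):
        """Insert fragment before position pos (0 <= pos <= len(self))."""
        if not 0 <= pos <= len(self):
            raise IndexError("insert position out of range")
        if not fragment:
            return
        new_piece = (fragment, 0, len(fragment))
        offset = 0
        for k, (src, start, length) in enumerate(self._pieces):
            if pos <= offset + length:
                cut = pos - offset
                # Split this piece at the cut point; empty halves are dropped.
                replacement = [(src, start, cut), new_piece,
                               (src, start + cut, length - cut)]
                self._pieces[k:k + 1] = [p for p in replacement if p[2] > 0]
                return
            offset += length
        # Only reached when there are no pieces yet (empty string).
        self._pieces.append(new_piece)

    def __str__(self):
        # The one place where the text is assembled into a single string.
        return "".join(src[start:start + length]
                       for src, start, length in self._pieces)


s = PieceString("abcdefg")
s.insert(1, "A")   # "A" is stored separately and linked in after "a"
print(str(s))      # aAbcdefg
```

The splice into the piece list costs time proportional to the number of pieces, not to the number of characters, which is the point of the linking.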

David · May 4, 2008

> That xpairs() generator is nice, but it's not the best possible code
> (but you may disagree with me, and you may think that code better than
> the successive D code that uses two slices). Inside it I can't use a
> list slicing because it copies subparts of the list, probably becoming
> too much slow (for example if len(seq) == 1000).

What do you mean by best possible? Most efficient? Most readable? And
why don't you use islice?

e.g.:

from itertools import islice

def xpairs(seq):
    for i, e1 in enumerate(seq):
        for e2 in islice(seq, i+1, None):
            yield e1, e2

Here's a version that makes more use of itertools. It should be more
efficient, but it's ugly (this is my first time using itertools).

from itertools import chain, izip, islice, repeat

def xpairs(seq):
    def _subfunc():
        for i in xrange(len(seq)):
            e1 = seq[i]
            yield izip(repeat(e1), islice(seq, i+1, None))
    return chain(*_subfunc())

> That D code is as fast or faster than the code you can back-translate
> from Python, this is possible because in D arrays slices are very
> light, they are a struct of <length, pointer>


D compiles to efficient machine code so Python is at a disadvantage
even if you use the same syntax (see my first example). You can make
the Python version faster, but beware of premature optimization.

> I think Python 3.0 too may enjoy a similar strategy of light slices +
> COW for strings, lists and arrays (tuples and strings don't need the
> COW).


What I'd like to see is a rope[1] module for Python. It's in SGI's C++
STL[2], but I haven't found a Python version yet.

[1] http://en.wikipedia.org/wiki/Rope_(computer_science)
[2] http://www.sgi.com/tech/stl/Rope.html

With a Python rope library you could do things like this:

a = '<some extremely long string>'
b = rope(a)             # Contains a reference to a
c = b[0:100000]         # Get a rope object
d = b[100000:200000]    # Get another rope object
e = b + b               # Get another rope object
print e                 # Get the string representation of all the re-assembled sub-sections

And so on. In the above code there is only 1 copy of the huge string
in memory. The rope objects only contain a tree of sub-operations
(slices, concatenations, references to original sequences, etc).

This shouldn't be too hard to implement. Does anyone know of an
already-existing 'rope' module?
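A minimal sketch of such a rope in Python 3 follows. `rope`, `Rope`, `Leaf` and `Concat` are hypothetical names for this illustration, not an existing module: a leaf is a (base, start, stop) window onto a shared string, a concatenation node joins two ropes, and only str() copies characters. Since it is immutable, it needs no copy-on-write.

```python
class Rope:
    """Base class; the concrete nodes are Leaf and Concat."""

    def __add__(self, other):
        return Concat(self, other)  # O(1): just a new node

    def __getitem__(self, key):
        if not isinstance(key, slice) or key.step not in (None, 1):
            raise TypeError("this sketch supports only [start:stop] slices")
        start, stop, _ = key.indices(len(self))
        return self._slice(start, max(start, stop))

    def __str__(self):
        parts = []
        self._collect(parts)
        return "".join(parts)  # the only place characters are copied


class Leaf(Rope):
    def __init__(self, base, start=0, stop=None):
        self.base = base  # shared, never copied
        self.start = start
        self.stop = len(base) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def _slice(self, start, stop):
        # A new window over the same base string: no characters copied.
        return Leaf(self.base, self.start + start, self.start + stop)

    def _collect(self, parts):
        parts.append(self.base[self.start:self.stop])


class Concat(Rope):
    def __init__(self, left, right):
        self.left, self.right = left, right
        self._len = len(left) + len(right)

    def __len__(self):
        return self._len

    def _slice(self, start, stop):
        n = len(self.left)
        if stop <= n:
            return self.left._slice(start, stop)
        if start >= n:
            return self.right._slice(start - n, stop - n)
        return Concat(self.left._slice(start, n),
                      self.right._slice(0, stop - n))

    def _collect(self, parts):
        self.left._collect(parts)
        self.right._collect(parts)


def rope(s):
    """Wrap a string in a rope without copying it."""
    return Leaf(s)


a = "x" * 300000
b = rope(a)
c = b[0:100000]
d = b[100000:200000]
e = b + b
print(len(e))                    # 600000
print(str(c + d) == a[:200000])  # True
```

Unlike the SGI rope, this sketch never rebalances its tree and supports only step-1 slices, so long chains of concatenations would degrade; it only shows the structure.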

David.

bearophileHUGS · May 4, 2008

David:

> What do you mean by best possible? Most efficient? Most readable?

What's a good wine? It's not easy to define what's "good/best". In
this context it's a complex balance of correctness, brevity, speed and
readability (and more, because you also need to define the context;
here the context includes Psyco too).

> And why don't you use islice?

You are right, but the purpose of the light slices I have suggested
is to avoid using islice so often. And currently Psyco doesn't digest
itertools well.

> D compiles to efficient machine code so Python is at a disadvantage
> even if you use the same syntax (see my first example). You can make
> the Python version faster, but beware of premature optimization.

This time I don't agree with this "premature optimization" thing. My
original Python version is just 5 lines long, it's readable enough,
and it's part of a large library of similar functions of mine; they
must be fast because I use them all the time as building blocks in my
programs.

> What I'd like to see is a rope[1] module for Python.

People have already suggested it, and there's even an implementation
meant to replace Python strings. It was refused because... I don't
know why; maybe its implementation was much more complex than the
current one, and it wasn't faster in all situations (and I think Guido
wants data structures meant to replace the basic built-in ones to be
faster in all situations and transparent to the user too).

Bye,
bearophile

David · May 4, 2008

>> D compiles to efficient machine code so Python is at a disadvantage

> This time I don't agree with this "premature optimization" thing. My
> original Python version is just 5 lines long, it's readable enough,
> and it's a part of a large library of mine of similar functions, they
> must be fast because I use them all the time as building blocks in
> programs.

Have you looked into the 'numpy' libraries? They have highly
optimized array/numeric processing functions. They also have a concept
of getting 'views' when you slice, rather than a full copy.

I also suggest benchmarking the apps that use your libs to see where
the bottlenecks are. Then you can move those to external compiled
modules (C/Pyrex/Cython/ShedSkin/etc.). You can keep your slower
Python version around as a reference implementation.
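As a minimal example of that kind of measurement, the sketch below uses the standard `timeit` module to compare the index-based generator from the first post with the islice-based one (Python 3, so `range` replaces `xrange`; `bench` is a hypothetical helper). `islice(seq, i+1, None)` avoids the copy but still steps over the first i+1 items on every outer iteration, so only a measurement on the target interpreter says which one wins.

```python
import timeit
from itertools import islice


def xpairs_index(seq):
    """Original approach: integer indexing, no copies."""
    n = len(seq)
    for i, e1 in enumerate(seq):
        for j in range(i + 1, n):
            yield e1, seq[j]


def xpairs_islice(seq):
    """islice approach: no copies, but skips i+1 items each time."""
    for i, e1 in enumerate(seq):
        for e2 in islice(seq, i + 1, None):
            yield e1, e2


def bench(func, seq, number=5):
    """Best of 3 runs: seconds to fully consume func(seq) `number` times."""
    timer = timeit.Timer(lambda: sum(1 for _ in func(seq)))
    return min(timer.repeat(repeat=3, number=number))


if __name__ == "__main__":
    data = list(range(300))
    for f in (xpairs_index, xpairs_islice):
        print(f"{f.__name__}: {bench(f, data):.4f} s")
```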

>> What I'd like to see is a rope[1] module for Python.

> People have already suggested it, and there's even an implementation
> to replace Python strings. It was refused because... I don't know why,
> maybe its implementation was too much more complex than the current
> one, and it wasn't faster in all situations (and I think Guido wants
> data structures that try to replace the basic built-in ones to be
> faster in all situations and user-transparent too).

I'd be happy if there was a separate 'rope' class that you could wrap
arbitrarily long sequences in when you need to (the same way that the
STL keeps it separate from the string class, even though the string
class has a lot of other optimizations (internal ref counts, etc., to
avoid unnecessary copies)). Do you know of one?

David.

castironpi · May 4, 2008


Persistence for the rope class might be trivial; that is, only a
constant number of disk modifications would be needed per state
modification.
