# Light slices + COW


#### bearophileHUGS

Sometimes different languages suggest ways to cross-pollinate
them.

(Note: probably someone has already suggested (and even implemented)
the following ideas, but I'd like to know why they aren't a fit for
Python.)

Python generators now let me package some code patterns common in
my code that other (older) languages didn't let me extract. For
example, given a sequence (array, list, etc.) of items, now and then I
need to scan the upper triangle of their cross matrix:

```python
def xpairs(seq):
    len_seq = len(seq)
    for i, e1 in enumerate(seq):
        for j in xrange(i+1, len_seq):
            yield e1, seq[j]
```

Or adjacent ones (there are ways to generalize the following function,
but this is the most common situation for me):

```python
from itertools import izip, islice

def xpairwise(iterable):
    return izip(iterable, islice(iterable, 1, None))
```

That xpairs() generator is nice, but it's not the best possible code
(you may disagree with me, and you may think that code better than
the D code further down that uses two slices). Inside it I can't use
list slicing, because slicing copies subparts of the list, probably
becoming too slow (for example if len(seq) == 1000).
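For readers on Python 3, here is a self-contained back-port sketch of both generators (zip and range replace the izip and xrange used above; this is an illustration, not code from the original post):

```python
from itertools import islice

def xpairs(seq):
    # Yield every (earlier, later) pair from the upper triangle
    # of the cross matrix of seq, without copying any sublist.
    len_seq = len(seq)
    for i, e1 in enumerate(seq):
        for j in range(i + 1, len_seq):
            yield e1, seq[j]

def xpairwise(iterable):
    # Yield adjacent pairs: (a, b), (b, c), ...
    return zip(iterable, islice(iterable, 1, None))

print(list(xpairs([1, 2, 3])))    # [(1, 2), (1, 3), (2, 3)]
print(list(xpairwise("abcd")))    # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

Note that xpairwise as written assumes a re-iterable sequence (list, string), not a one-shot iterator, since it traverses the input twice.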

Compared to Python, the D language is lower-level, so you might
expect it to have worse syntax (though the ShedSkin compiler has shown
me once and for all that you can have a language that is both high-level,
with a short and pleasant syntax, and very fast). Still, you can define an
xpairs() iterable class in D too, and at the core of that iterable class
is a method that may look like this:

```d
if (this.seq.length > 1)
    foreach (i, e1; this.seq[0 .. $-1])
        foreach (e2; this.seq[i+1 .. $]) {
            result = dg(e1, e2); if (result) break; // yield
        }
```

The strange last line is the yield, and the $ inside [] is the
length of the current array (a very clever idea that avoids the need
for negative indexes).

That D code is as fast as or faster than the code you can back-translate
from Python. This is possible because in D array slices are very
light: they are a struct of <length, pointer> (in the future they may
change to a pair of pointers, to speed up array slice scanning).
So if you slice an array you don't copy its contents, just the start
and end points of the slice. If you only read/scan the slice, that's all
you get, while if you write to it, D uses a copy-on-write strategy, that
is, a just-in-time copy of the slice contents.

In practice this speeds up a lot of code that manages strings and arrays
(such as an XML parser, making it among the faster ones).
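Python already exposes a limited, read-oriented form of this copy-free slicing for binary data through memoryview; a small sketch of the idea:

```python
data = b"a fairly long byte string"
view = memoryview(data)   # no copy of the underlying bytes
part = view[2:8]          # slicing a memoryview copies nothing either:
                          # it just records offset and length
print(bytes(part))        # b'fairly' -- a copy happens only here, on demand
```

Unlike D's slices this doesn't extend to lists or str objects, which is exactly the gap the proposal above is about.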

Since the slice is a struct, it's stack-allocated, so the inner foreach
doesn't even create a slice object on the heap each time the outer
foreach loops.

One problem with this strategy is that if you have a huge string and
you keep only a little slice of it, the D garbage collector will keep
the whole string in memory. To avoid that problem you have to duplicate
the slice manually:

```d
somestring[inf .. sup].dup;
```

I think Psyco is, in some situations, able to manage string slices
while avoiding the copy.

I think Python 3.0 too may enjoy a similar strategy of light slices +
COW for strings, lists and arrays (tuples and strings, being immutable,
don't need the COW).

Bye,

bearophile


#### castironpi

> I think Python 3.0 too may enjoy a similar strategy of light slices +
> COW for strings, lists and arrays.

In my understanding, the operating system links a file across multiple
sections of a disk, while hiding those details from client software.

Files:

AAAA BBBB AA CCCCCCCCC

File A still reads, correctly, as AAAAAA.

Modifications to B leave the files as follows:

Files:

AAAA BBBB AA CCCCCCCCC BBBB

In the case of a large mutable string, modifications can cause linear-time
operations, even when making only a constant-size change.

String:

abcdefg

Modification:

aAbcdefg

causes the entire string to be recopied. Expensive at scale.

A string structure could provide the same kind of linking, such as a
string File Allocation Table: keep the original "abcdefg" plus the new
fragment "A" and a link between them, so the structure reads back as
"aAbcdefg" without recopying the whole string.
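A minimal sketch of this linked-fragment idea (the function name and representation are hypothetical, purely for illustration): the string is held as a list of pieces, so an insertion records a new piece and splits one existing piece, instead of recopying all the characters.

```python
def insert(pieces, pos, text):
    # Insert text at character position pos by splitting the piece
    # that contains pos; cost is proportional to the number of pieces,
    # not to the total number of characters.
    out, offset, done = [], 0, False
    for p in pieces:
        if not done and offset <= pos <= offset + len(p):
            cut = pos - offset
            out += [p[:cut], text, p[cut:]]
            done = True
        else:
            out.append(p)
        offset += len(p)
    if not done:          # pos past the end: append
        out.append(text)
    return [p for p in out if p]

pieces = insert(["abcdefg"], 1, "A")
print("".join(pieces))    # aAbcdefg
```

Reading the full string back ("".join) is still linear, of course; the win is only on the modification side, just as with the file table above.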


#### David

> That xpairs() generator is nice, but it's not the best possible code
> (but you may disagree with me, and you may think that code better than
> the successive D code that uses two slices). Inside it I can't use a
> list slicing because it copies subparts of the list, probably becoming
> too much slow (for example if len(seq) == 1000).

What do you mean by best possible? Most efficient? Most readable? And
why don't you use islice?

e.g.:

```python
from itertools import islice

def xpairs(seq):
    for i, e1 in enumerate(seq):
        for e2 in islice(seq, i+1, None):
            yield e1, e2
```

Here's a version which makes more use of itertools. It should be more
efficient, but it's ugly (this is my first time using itertools).

```python
from itertools import chain, islice, izip, repeat

def xpairs(seq):
    def _subfunc():
        for i in xrange(len(seq)):
            e1 = seq[i]
            yield izip(repeat(e1), islice(seq, i+1, None))
    return chain(*_subfunc())
```
> That D code is as fast or faster than the code you can back-translate
> from Python, this is possible because in D arrays slices are very
> light, they are a struct of <length, pointer>

D compiles to efficient machine code so Python is at a disadvantage
even if you use the same syntax (see my first example). You can make
the Python version faster, but beware of premature optimization.
> I think Python 3.0 too may enjoy a similar strategy of light slices +
> COW for strings, lists and arrays (tuples and strings don't need the
> COW).

What I'd like to see is a rope[1] module for Python. It's in C++'s STL
library[2], but I haven't found a Python version yet.

[1] http://en.wikipedia.org/wiki/Rope_(computer_science)
[2] http://www.sgi.com/tech/stl/Rope.html

With a Python rope library you could do things like this:

a = '<some extremely long string>'
b = rope(a) # Contains a reference to a
c = b[0:100000] # Get a rope object
d = b[100000:200000] # Get another rope object
e = b + b # Get another rope object
print e # Get the string representation of all the re-assembled sub-sections

# And so on. In the above code there was only 1 copy of the huge
string in memory. The rope objects only contain a tree of
sub-operations (slices, concatenations, references to original
sequences, etc).

This shouldn't be too hard to implement. Does anyone know of an
existing one?
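A toy sketch of the idea described above (the class and helper names are hypothetical, not a real library): operations are recorded as a tree, and text is materialized only when str() is called.

```python
class Rope:
    """Toy rope: record operations in a tree, copy no text until str()."""
    def __init__(self, node=""):
        # node is a str leaf, ('cat', l, r), or ('slice', n, start, stop)
        self.node = node

    def __add__(self, other):
        return Rope(('cat', self.node, other.node))

    def __getitem__(self, s):              # slice objects only, step 1
        start, stop, _ = s.indices(len(self))
        return Rope(('slice', self.node, start, stop))

    def __len__(self):
        return _length(self.node)

    def __str__(self):
        return _flatten(self.node)

def _length(n):
    if isinstance(n, str):
        return len(n)
    if n[0] == 'cat':
        return _length(n[1]) + _length(n[2])
    return n[3] - n[2]                     # slice node

def _flatten(n):
    if isinstance(n, str):
        return n
    if n[0] == 'cat':
        return _flatten(n[1]) + _flatten(n[2])
    return _flatten(n[1])[n[2]:n[3]]       # copying happens only here

a = Rope("some extremely long string")
b = a[0:4] + a[14:26]                      # no characters copied yet
print(str(b))                              # some long string
```

A real rope also rebalances the tree and shares leaves between ropes; this sketch only shows the laziness that keeps a single copy of the underlying string.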

David.


#### bearophileHUGS

David:
> What do you mean by best possible? Most efficient? Most readable?

What's a good wine? It's not easy to define what's "good/best". In
this context it's a complex balance of correct, short, fast and
readable (and more, because you need to define a context; this context
includes Psyco too).

> And why don't you use islice?

You are right, but the purpose of the light slices I have suggested
is to avoid using islice so often. And currently Psyco doesn't digest
itertools well.

> D compiles to efficient machine code so Python is at a disadvantage
> even if you use the same syntax (see my first example). You can make
> the Python version faster, but beware of premature optimization.

This time I don't agree with the "premature optimization" point. My
original Python version is just 5 lines long, it's readable enough,
and it's part of a large library of mine of similar functions; they
must be fast because I use them all the time as building blocks in
programs.

> What I'd like to see is a rope[1] module for Python.

People have already suggested it, and there's even an implementation
meant to replace Python strings. It was refused because... I don't know
why; maybe its implementation was too much more complex than the current
one, and it wasn't faster in all situations (and I think Guido wants
data structures that try to replace the basic built-in ones to be
faster in all situations and user-transparent too).

Bye,
bearophile


#### David

>> D compiles to efficient machine code so Python is at a disadvantage
>
> This time I don't agree with this "premature optimization" thing. My
> original Python version is just 5 lines long, it's readable enough,
> and it's a part of a large library of mine of similar functions, they
> must be fast because I use them all the time as building blocks in
> programs.

Have you looked into the 'numpy' libraries? Those have highly
optimized array/numeric processing functions. Also, they have a concept
of getting 'views' when you slice, rather than a full copy.
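For instance, a small illustration of NumPy's view semantics (assumes numpy is installed):

```python
import numpy as np

a = np.arange(10)
v = a[2:5]            # basic slicing returns a view, not a copy
v[0] = 99             # writing through the view ...
print(a[2])           # ... changes the original array: 99
print(v.base is a)    # True: the view references a's memory
```

This is much the same <length, pointer> trick as D's slices, restricted to ndarray.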

I also suggest benchmarking the apps that use your libs to see
where the bottlenecks are. Then you can move those parts to external
compiled modules (C/Pyrex/Cython/ShedSkin/etc.). You can keep your
slower Python version around as a reference implementation.
>> What I'd like to see is a rope[1] module for Python.

> People have already suggested it, and there's even an implementation
> to replace Python strings. It was refused because... I don't know why,
> maybe its implementation was too much more complex than the current
> one, and it wasn't faster in all situations (and I think Guido wants
> data structures that try to replace the basic built-in ones to be
> faster in all situations and user-transparent too).

I'd be happy if there were a separate 'rope' class that you could wrap
arbitrarily long sequences in when you need to (the same way STL
keeps it separate from the string class, even though the string class
has a lot of other optimizations: internal ref counts, etc., to avoid
unnecessary copies). Do you know of one?

David.


#### castironpi

> I'd be happy if there was a separate 'rope' class that you could wrap
> arbitrary long sequences in when you need to. Do you know one?

Persistence for the rope class might be trivial; that is, only a
constant number of disk modifications would be made per state
modification.
