socket send O(N**2) complexity


Zac Burns

The mysocket.mysend method given at
http://docs.python.org/howto/sockets.html has an (unwitting?) O(N**2)
complexity for long msg arguments due to the string slicing.

I've been looking for a way to optimize this, but aside from a pure
python 'string slice view' that looks at the original string I can't
think of anything. Perhaps start and end keywords could be added to
send? I can't think of a reason for the end keyword, but it would be
there for symmetry.
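
For reference, the send loop in question looks roughly like this; the
slice msg[totalsent:] copies every remaining byte on each pass, so a
message sent in many small chunks copies O(N**2) bytes in total:

def mysend(self, msg):
    totalsent = 0
    while totalsent < len(msg):
        # copies the whole unsent tail of msg on every iteration
        sent = self.sock.send(msg[totalsent:])
        if sent == 0:
            raise RuntimeError("socket connection broken")
        totalsent = totalsent + sent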

--
Zachary Burns
(407)590-4814
Aim - Zac256FL
Production Engineer (Digital Overlord)
Zindagi Games
 

Mike

The mysocket.mysend method given at
http://docs.python.org/howto/sockets.html has an (unwitting?) O(N**2)
complexity for long msg arguments due to the string slicing.

I've been looking for a way to optimize this, but aside from a pure
python 'string slice view' that looks at the original string I can't
think of anything. Perhaps start and end keywords could be added to
send? I can't think of a reason for the end keyword, but it would be
there for symmetry.

# Chunk msg up front so each byte is sliced and copied only once: O(N) total.
BUFSIZE = 4096
for start in range(0, len(msg), BUFSIZE):
    sock.sendall(msg[start:start + BUFSIZE])   # sendall() handles partial sends
 

exarkun

Zac Burns wrote in
(e-mail address removed)
in comp.lang.python:
The mysocket.mysend method given at
http://docs.python.org/howto/sockets.html has an (unwitting?) O(N**2)
complexity for long msg arguments due to the string slicing.

I've been looking for a way to optimize this, but aside from a pure
python 'string slice view' that looks at the original string I can't
think of anything. Perhaps start and end keywords could be added to
send? I can't think of a reason for the end keyword, but it would be
there for symmetry.

I ran this script on various versions of Python I have access to:

#encoding: utf-8
raw_input( "start" )

s = 'x' * 1000000
r = [None] * 1000

raw_input( "allocated 1 meg + " )

for i in xrange(1000):
    r[i] = s[:]    # store each copy so any per-copy allocation stays live

raw_input( "end" )

Neither of the CPython versions (2.5 and 3.0 (with modified code))
exhibited any memory increase between "allocated 1 meg + " and "end".
pypy-c (1.0.0) showed a 30k jump, and IronPython 2.0 showed a few megs
jump.

You bumped into a special case that CPython optimizes. s[:] is s. If
you repeat your test with s[1:], you'll see memory climb as one might
normally expect.
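
Concretely, the modified loop from the script above becomes:

for i in xrange(1000):
    r[i] = s[1:]    # each s[1:] allocates and copies ~1 MB, so r ends up pinning ~1 GB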

AIUI, as a Python string is immutable, a slice of a string is a
new string which points (C char *) to the start of the slice data
and with a length that is the length of the slice, about 8 bytes
on a 32-bit machine.

So even though a slice assignment new_s = s[:] appears to a Python
programmer to make a "copy" of s, it's only a few bytes of
metadata (the pointer and the length) that are really copied; the
string's character data stays where it is.

So the code you cite is in fact O(N), as each copy is constant-size.

This is all (basically) valid for the special case of s[:]. For any other
string slicing, though, the behavior is indeed O(N**2).

To the OP, you can get view-like behavior with the "buffer" builtin.
Here's an example of its usage from Twisted, where it is employed for
exactly the reason raised here:

http://twistedmatrix.com/trac/browser/trunk/twisted/internet/abstract.py#L93
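
A minimal sketch of the HOWTO loop rewritten around buffer (Python 2
only; buffer was removed in Python 3, and the sock/msg names are
carried over from the HOWTO):

def mysend(sock, msg):
    view = buffer(msg)              # zero-copy, read-only view into msg
    while len(view):
        sent = sock.send(view)      # send() accepts any buffer-compatible object
        if sent == 0:
            raise RuntimeError("socket connection broken")
        view = buffer(view, sent)   # re-view from the new offset; no bytes copied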

Jean-Paul
 

Jack Diederich

AIUI, as a Python string is immutable, a slice of a string is a
new string which points (C char *) to the start of the slice data
and with a length that is the length of the slice, about 8 bytes
on a 32-bit machine.

Not in CPython. While some special strings are re-used (the empty string,
single characters), if you take a slice of an existing string a new buffer
is allocated and the slice memcpy'd into it.
So even though a slice assignment new_s = s[:] appears to a Python
programmer to make a "copy" of s, it's only a few bytes of
metadata (the pointer and the length) that are really copied; the
string's character data stays where it is.

There is a special case for the full copy s[:] in CPython that avoids
copying the string bytes, BUT in that case it just INCREFs the existing
object and returns it. There are no new allocations at all.

Be very careful recommending optimizations based on how you think the
underlying implementation works without double-checking against the
code first.
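
A quick interactive check (CPython 2.x) shows both behaviours:

>>> s = 'x' * 1000000
>>> s[:] is s       # full-slice special case: the same object, just INCREF'd
True
>>> s[1:] is s      # any other slice allocates a new string and copies
False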

-Jack
 

Zac Burns

wrote in comp.lang.python:
Neither of the CPython versions (2.5 and 3.0 (with modified code))
exhibited any memory increase between "allocated 1 meg + " and "end"

You bumped into a special case that CPython optimizes. s[:] is s. If
you repeat your test with s[1:], you'll see memory climb as one might
normally expect.

Thanks for the correction (to Jack also)

Rob.

Thanks! The buffer object was exactly the 'slice view' that I was
looking for to fix my version of the algorithm.

--
Zachary Burns
(407)590-4814
Aim - Zac256FL
Production Engineer (Digital Overlord)
Zindagi Games
 

Antoine Pitrou

To the OP, you can get view-like behavior with the "buffer" builtin.

And, on Python 3 (or even the 2.7 in development), you can use the "memoryview"
builtin for similar effect.
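
A minimal sketch of the same loop with memoryview (Python 3, where msg
is a bytes object; names are carried over from the HOWTO):

def mysend(sock, msg):
    view = memoryview(msg)      # zero-copy view; slicing a memoryview is O(1)
    while view:
        sent = sock.send(view)
        if sent == 0:
            raise RuntimeError("socket connection broken")
        view = view[sent:]      # advance past the sent bytes without copying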

Regards

Antoine.
 

Nobody

Not in CPython. While some special strings are re-used (the empty string,
single characters), if you take a slice of an existing string a new buffer
is allocated and the slice memcpy'd into it.

Er, why?

I can understand doing this for mutable sequences, but it doesn't seem
to make much sense for strings.
 

Jack Diederich

Er, why?

I can understand doing this for mutable sequences, but it doesn't seem
to make much sense for strings.

Because it adds large amounts of complexity. If you have a string
that points to 8 bytes in a 1 GB string and the giant string goes out
of scope, what do you do? You can keep it alive, or you can copy out
the 8 bytes; you can do either now, or wait until later when some kind
of garbage collection happens. Once you have that system, why not pile
ropes on top of it? And once you have ropes, why not just make strings
mutable...

If you want much longer mailing-list threads about the topic, search
the python-dev archives.

-Jack
 

Steven D'Aprano

Er, why?

I can understand doing this for mutable sequences, but it doesn't seem
to make much sense for strings.

Consider:

huge_string = "abcdef" * 1000 * 1000 * 1000   # ~6 GB of character data
tiny_string = huge_string[42:45]              # three-byte slice
del huge_string

Under the current behaviour, huge_string will be garbage collected. With
the proposed string-view, it won't be. It would be surprising and
disturbing if taking a tiny slice of a huge string prevented the huge
string from being garbage collected.
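
Incidentally, this is exactly the trade-off memoryview makes: a view
keeps its underlying buffer alive, so code that wants a small piece
long-term should copy it out and release the view. A small sketch
(Python 3, with the sizes shrunk to something runnable):

huge = b"abcdef" * 1000000       # ~6 MB
tiny = memoryview(huge)[42:45]   # three-byte view into the 6 MB buffer
del huge                         # the buffer is NOT freed; the view still pins it
tiny_bytes = bytes(tiny)         # copy the three bytes out...
tiny.release()                   # ...then release the view so the buffer can go away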
 
