socket send O(N**2) complexity


Zac Burns

The mysocket.mysend method given at
http://docs.python.org/howto/sockets.html has an (unwitting?) O(N**2)
complexity for long msg arguments due to the string slicing.

I've been looking for a way to optimize this, but aside from a pure
python 'string slice view' that looks at the original string I can't
think of anything. Perhaps start and end keywords could be added to
send? I can't think of a reason for the end keyword, but it would be
there for symmetry.
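
For reference, the send loop in question looks roughly like this; the
slice msg[totalsent:] copies every remaining byte on each pass, so a
message sent in many small chunks copies O(N**2) bytes in total:

def mysend(self, msg):
    totalsent = 0
    while totalsent < len(msg):
        # copies the whole unsent tail of msg on every iteration
        sent = self.sock.send(msg[totalsent:])
        if sent == 0:
            raise RuntimeError("socket connection broken")
        totalsent = totalsent + sent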

--
Zachary Burns
(407)590-4814
Aim - Zac256FL
Production Engineer (Digital Overlord)
Zindagi Games
 

Mike

The mysocket.mysend method given at
http://docs.python.org/howto/sockets.html has an (unwitting?) O(N**2)
complexity for long msg arguments due to the string slicing.

I've been looking for a way to optimize this, but aside from a pure
python 'string slice view' that looks at the original string I can't
think of anything. Perhaps start and end keywords could be added to
send? I can't think of a reason for the end keyword, but it would be
there for symmetry.

# Chunk msg up front so each byte is sliced and copied only once: O(N) total.
BUFSIZE = 4096
for start in range(0, len(msg), BUFSIZE):
    sock.sendall(msg[start:start + BUFSIZE])   # sendall() handles partial sends
 

exarkun

Zac Burns wrote in
(e-mail address removed)
in comp.lang.python:
The mysocket.mysend method given at
http://docs.python.org/howto/sockets.html has an (unwitting?) O(N**2)
complexity for long msg arguments due to the string slicing.

I've been looking for a way to optimize this, but aside from a pure
python 'string slice view' that looks at the original string I can't
think of anything. Perhaps start and end keywords could be added to
send? I can't think of a reason for the end keyword, but it would be
there for symmetry.

I ran this script on various versions of Python I have access to:

#encoding: utf-8
raw_input( "start" )

s = 'x' * 1000000
r = [None] * 1000

raw_input( "allocated 1 meg + " )

for i in xrange(1000):
    r[i] = s[:]    # store each copy so any per-copy allocation stays live

raw_input( "end" )

Neither of the CPython versions (2.5 and 3.0 (with modified code))
exhibited any memory increase between "allocated 1 meg + " and "end".
pypy-c (1.0.0) showed a 30k jump, and IronPython 2.0 showed a few megs
jump.

You bumped into a special case that CPython optimizes. s[:] is s. If
you repeat your test with s[1:], you'll see memory climb as one might
normally expect.
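
Concretely, the modified loop from the script above becomes:

for i in xrange(1000):
    r[i] = s[1:]    # each s[1:] allocates and copies ~1 MB, so r ends up pinning ~1 GB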

AIUI, as a Python string is immutable, a slice of a string is a
new string which points (C char *) to the start of the slice data
and with a length that is the length of the slice, about 8 bytes
on a 32-bit machine.

So even though a slice assignment new_s = s[:] appears to a Python
programmer to make a "copy" of s, it's only a few bytes of
metadata (the pointer and the length) that are really copied; the
string's character data stays where it is.

So the code you cite is in fact O(N), as each copy is constant-size.

This is all (basically) valid for the special case of s[:]. For any other
string slicing, though, the behavior is indeed O(N**2).

To the OP, you can get view-like behavior with the "buffer" builtin.
Here's an example of its usage from Twisted, where it is employed for
exactly the reason raised here:

http://twistedmatrix.com/trac/browser/trunk/twisted/internet/abstract.py#L93
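
A minimal sketch of the HOWTO loop rewritten around buffer (Python 2
only; buffer was removed in Python 3, and the sock/msg names are
carried over from the HOWTO):

def mysend(sock, msg):
    view = buffer(msg)              # zero-copy, read-only view into msg
    while len(view):
        sent = sock.send(view)      # send() accepts any buffer-compatible object
        if sent == 0:
            raise RuntimeError("socket connection broken")
        view = buffer(view, sent)   # re-view from the new offset; no bytes copied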

Jean-Paul
 

Jack Diederich

AIUI, as a Python string is immutable, a slice of a string is a
new string which points (C char *) to the start of the slice data
and with a length that is the length of the slice, about 8 bytes
on a 32-bit machine.

Not in CPython. While some special strings are re-used (the empty string,
single characters), if you take a slice of an existing string a new buffer
is allocated and the slice memcpy'd into it.
So even though a slice assignment new_s = s[:] appears to a Python
programmer to make a "copy" of s, it's only a few bytes of
metadata (the pointer and the length) that are really copied; the
string's character data stays where it is.

There is a special case for the full copy s[:] in CPython that avoids
copying the string bytes, BUT in that case it just INCREFs the existing
object and returns it. There are no new allocations at all.

Be very careful recommending optimizations based on how you think the
underlying implementation works without double-checking against the
code first.
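
A quick interactive check (CPython 2.x) shows both behaviours:

>>> s = 'x' * 1000000
>>> s[:] is s       # full-slice special case: the same object, just INCREF'd
True
>>> s[1:] is s      # any other slice allocates a new string and copies
False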

-Jack
 

Zac Burns

wrote in comp.lang.python:
Neither of the CPython versions (2.5 and 3.0 (with modified code))
exhibited any memory increase between "allocated 1 meg + " and "end"

You bumped into a special case that CPython optimizes. s[:] is s. If
you repeat your test with s[1:], you'll see memory climb as one might
normally expect.

Thanks for the correction (to Jack also)

Rob.

Thanks! The buffer object was exactly the 'slice view' that I was
looking for to fix my version of the algorithm.

--
Zachary Burns
(407)590-4814
Aim - Zac256FL
Production Engineer (Digital Overlord)
Zindagi Games
 

Antoine Pitrou

To the OP, you can get view-like behavior with the "buffer" builtin.

And, on Python 3 (or even the 2.7 in development), you can use the "memoryview"
builtin for similar effect.
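
A minimal sketch of the same loop with memoryview (Python 3, where msg
is a bytes object; names are carried over from the HOWTO):

def mysend(sock, msg):
    view = memoryview(msg)      # zero-copy view; slicing a memoryview is O(1)
    while view:
        sent = sock.send(view)
        if sent == 0:
            raise RuntimeError("socket connection broken")
        view = view[sent:]      # advance past the sent bytes without copying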

Regards

Antoine.
 

Nobody

Not in CPython. While some special strings are re-used (the empty string,
single characters), if you take a slice of an existing string a new buffer
is allocated and the slice memcpy'd into it.

Er, why?

I can understand doing this for mutable sequences, but it doesn't seem
to make much sense for strings.
 

Jack Diederich

Er, why?

I can understand doing this for mutable sequences, but it doesn't seem
to make much sense for strings.

Because it adds large amounts of complexity. If you have a string
that points to 8 bytes in a 1 GB string and the giant string goes out
of scope, what do you do? You can keep it alive, or you can copy out
the 8 bytes; you can do either now, or wait until later when some kind
of garbage collection happens. Once you have that system, why not pile
ropes on top of it? And once you have ropes, why not just make strings
mutable...

If you want much longer mailing-list threads about the topic, search
the python-dev archives.

-Jack
 

Steven D'Aprano

Er, why?

I can understand doing this for mutable sequences, but it doesn't seem
to make much sense for strings.

Consider:

huge_string = "abcdef" * 1000 * 1000 * 1000   # ~6 GB of character data
tiny_string = huge_string[42:45]              # three-byte slice
del huge_string

Under the current behaviour, huge_string will be garbage collected. With
the proposed string-view, it won't be. It would be surprising and
disturbing if taking a tiny slice of a huge string prevented the huge
string from being garbage collected.
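
Incidentally, this is exactly the trade-off memoryview makes: a view
keeps its underlying buffer alive, so code that wants a small piece
long-term should copy it out and release the view. A small sketch
(Python 3, with the sizes shrunk to something runnable):

huge = b"abcdef" * 1000000       # ~6 MB
tiny = memoryview(huge)[42:45]   # three-byte view into the 6 MB buffer
del huge                         # the buffer is NOT freed; the view still pins it
tiny_bytes = bytes(tiny)         # copy the three bytes out...
tiny.release()                   # ...then release the view so the buffer can go away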
 
