The Running Time of += on Char Strings ?

E

Edg Bamyasi

This Is A Late Cross Post from comp.lang.python. It seems the mistery
is deeper then i expected.

What is the running time of conactination on character strings.

i.e.

is it Amortized Constant time? I don't think it would be O((number of
chars)^2) but i really don't know.

Teach me how to fish, where would i find out more about the

internal
representations of data types in python (and guarenteed run times, im
think of something like sgi.com 's info on the STL) . I have looked
through the docs but i don't seem to see these types of specifications.

thanks * 100
- Haz

P.S.
- Should Note that i am famliure with timeit, but understanding the
underly data structures and representations is an important thing to
know.

P.P.S

This a bit of what i think relevent discourse i have been having via a
email responder of my usenet posting.
Haz> i should have mentioned that i am familure with the timeit
Haz> function, but shurly there must be a specification in the language
Haz> of the running time (number of flops).

Nope. I can't think of an instance where it would be appropriate to specify
runtime properties of various algorithms in the language. For example, if
you were to specify that sorting of lists was O(n log n) that would
potentially preclude the choice of quicksort as an algorithm because its
worst case behavior is O(n * n) even though it is generally faster than most
other sorting algorithms.

The answere here is to use omega(n log n) or specify average and worst
cases. I truely do think that there can be a complexity specification
for the language. I mean all algorithms have a complexity and surely
data structures are choosen with the size/speed tradeoffs in mind.

For instance in the STL has a sorting algorithm and it specifies a
running time the latter way (why they can do assure this is by using
coding with concepts, but i think in the base language it could be
simpler because the data structures are known.) i.e. Its know what
data types += works on and thus it should be know what algoritms are
to be used (that is the highly optimal ones)
From SGI STL Page:
sort
<snip>
Complexity
O(N log(N)) comparisons (both average and worst-case), where N is last
- first. [2]
<snip>

source : http://www.sgi.com/tech/stl/sort.html

Haz> Without knowing these specifications its hard to optimize.

No, you still need to see where your program runs slow and figure out ways
to make it run faster.

Well basically my point is that it is hard to know why a code section
is running slow unless you understand the underlying data
represenations and algorithms

For instance

matlab code:

A=[]
for i=1:N
A=[A;'a']
end

is O(N^2) operation

C code:

vector<char> A;
for(i=0; i<N; i++){
A.push_back('a');
}

is Amortized O(N) [note that this isn't the correct wording exactly,
but in general is O(N)]

Python Code:

A=""
for i in range(N):
A+='a'

running time : ???

So if you are looking through your code in matlab or C and see this
concatination loop you know that it is a problem in the former and not
in the latter. But in python ???
 
D

Daniel Dittmar

Edg said:
What is the running time of conactination on character strings.

i.e.




is it Amortized Constant time? I don't think it would be O((number of
chars)^2) but i really don't know.

Strings are immutable, so
joe+="99999999999999999"
is executed as
joe = joe + "99999999999999999"

This means that there is
- one allocation
- two memcpy
- one deallocation (old value of joe)

My guess is that the allocations play a rather large part in the actual
timing.

Creating a large string by multiple concatenations is slow, instead you
should:
- use module cStringIO
- or add all the strings to a list and do "".join (listvariable)

How are you supposed to know? It's mostly Python folklore, some of which
has been written down in the Python Cookbook
(http://aspn.activestate.com/ASPN/Python/Cookbook/)

Daniel
 
S

Stefan Behnel

Edg said:
What is the running time of conactination on character strings.
i.e. ..>>>joe="123"
..>>>joe+="99999999999999999"
is it Amortized Constant time? I don't think it would be O((number of
chars)^2) but i really don't know.

First of all, this idiom is generally avoided in loops (where it actually
matters). Use ''.join() which the documentation describes as optimized for
your case and as preferable over creating a larger number of immutable
strings. Note that it does not specify the run-time complexity there.

That said, Python is a dynamic, high-level language. CPython is an
implementation. IronPython and Jython are different implementations. There are
others. Many of the internal complexities are implementation specific. Some
have good reasons for this. Also, IronPython and Jython heavily rely on the
performance of the underlying run-time environment for their own performance.
The differences between the various implementations and their run-time
environments can be big enough to render performance specifications useless in
many (though possibly not all) cases.

If you need to know the exact complexity of algorithms, read them. Download
the source distribution of the Python version you want to investigate and read
the source. But remember that there is no actual specification. Do not expect
your code to run at the same speed in all Python versions and implementations.

There are examples for basic algorithms that were exchanged during the long
evolution of the CPython implementation. One is the sort algorithm. Recent 2.4
changes in the handling of lists made some common operations considerably faster.

It is a pragmatically sane approach to accept the high programming level of
Python and to not rely on the specific performance of a specific
implementation. Just use the tool that is made for your task. The information
for choosing the right tool can already be found in the documentation.

Stefan
 
M

MyHaz

Thanks Guys It Was Great Help and I have began to mark my code for the
''.join() string conatination optimization. Upon regoogling (when you
know the right thing to google it can make a big diffrence, having not
know how to google +=, hehe). I found this commentary and set of tests.
I find it a good conclustion to this question.

http://www.skymind.com/~ocrow/python_string/


''.join(['Thank ','you])

- Haz
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,780
Messages
2,569,611
Members
45,274
Latest member
JessMcMast

Latest Threads

Top