C Slower than Python?

Bartc

I'd been benchmarking my own pet language against Python for manipulation of
short strings. This tested the expression a=b+c for strings, and the Python
code looks like:

b="abc"
c="xyz"
for i in xrange(10000000):
    a=b+c
print "A=",a

This took about 2.5 secs with Python 2.5 on my machine (my own efforts
achieved 0.7 secs).

Pretty good, but how fast could C do it? I expected both of these to be
thrashed, yet the code below took over 4 seconds (mingw 3.4.5 with -O2).

(Timings for longer 60-char strings were 3.5 secs for Python and 7.5 secs
for C. All timings are elapsed time.)

OK, this code is naive and simplistic, but how else would you do it in C?
(BTW I've omitted malloc checking, which is in my own code and I presume is
in Python.)

/* Evaluate string a=b+c 10m times */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void) {
    int i,t;
    char *a=NULL;
    char *b="abc";
    char *c="xyz";

    t=clock();

    for (i=0; i<10000000; ++i) {
        free(a);
        a=malloc(strlen(b)+strlen(c)+1);
        strcpy(a,b);
        strcat(a,c);
    }

    printf("Result=%s\n",a);

    printf("Time: %d\n",clock()-t);
}
 
vippstar

> I'd been benchmarking my own pet language against Python for manipulation of
> short strings. This tested the expression a=b+c for strings, and the Python
> code looks like:
>
> b="abc"
> c="xyz"
> for i in xrange(10000000):
>     a=b+c
> print "A=",a
>
> This took about 2.5 secs with Python 2.5 on my machine (my own efforts
> achieved 0.7 secs).
>
> Pretty good, but how fast could C do it? I expected both of these to be
> thrashed, yet the code below took over 4 seconds (mingw 3.4.5 with -O2).
>
> (Timings for longer 60-char strings were 3.5 secs for Python and 7.5 secs
> for C. All timings are elapsed time.)
>
> OK, this code is naive and simplistic, but how else would you do it in C?
> (BTW I've omitted malloc checking, which is in my own code and I presume is
> in Python.)

So WHERE's the C question? You should know better than posting
benchmarking stuff here.
> /* Evaluate string a=b+c 10m times */
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <time.h>
>
> int main(void) {
>     int i,t;
>     char *a=NULL;
>     char *b="abc";
>     char *c="xyz";
>
>     t=clock();

t is an int. clock() returns clock_t, which is an arithmetic type,
but not necessarily int.
It could be long, or unsigned; in which case, depending on the value,
you may be "invoking" implementation-defined behavior or raising an
implementation-defined signal.
It could be a floating type, in which case you'd lose information (and
if the value is bigger than INT_MAX or less than INT_MIN, also what I
said before).

>     for (i=0; i<10000000; ++i) {

If INT_MAX < 10000000, infinite loop, and then you invoke undefined
behavior (integer overflow).

>         free(a);
>         a=malloc(strlen(b)+strlen(c)+1);

What if malloc returns NULL? You don't care?

>         strcpy(a,b);
>         strcat(a,c);

Not an actual bug: if b is a zero-length string your code doesn't
work. (It obviously isn't in your code, but I'm only mentioning this
in case you decide to let the user choose the strings.)

>     }
>
>     printf("Result=%s\n",a);
>
>     printf("Time: %d\n",clock()-t);

If clock returns any integer type larger than int, or a floating-point
type, you invoke UB.

If you're going to time implementations, do it with CONFORMING
programs. Blaming the implementation for its output when your program
is incorrect is plain silly.
Moreover, if you're going to post your results, please do it in an
appropriate group, not comp.lang.c.
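
For reference, a version of the loop with those conformance points addressed
might look like the sketch below: clock_t for the timer, a long loop counter,
a malloc check, and CLOCKS_PER_SEC for the output. This is an illustration,
not code from the thread.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void) {
    char *a = NULL;
    const char *b = "abc";
    const char *c = "xyz";
    clock_t t;                    /* clock_t, not int, holds clock()'s result */
    long i;                       /* long, so 10000000 fits even if INT_MAX is 32767 */

    t = clock();

    for (i = 0; i < 10000000L; ++i) {
        free(a);
        a = malloc(strlen(b) + strlen(c) + 1);
        if (a == NULL) return EXIT_FAILURE;   /* the malloc check asked for above */
        strcpy(a, b);
        strcat(a, c);
    }

    printf("Result=%s\n", a);
    /* CLOCKS_PER_SEC converts ticks to seconds; the cast to double sidesteps
       the integer-vs-floating question about clock_t */
    printf("Time: %.3f s\n", (double)(clock() - t) / CLOCKS_PER_SEC);
    free(a);
    return 0;
}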
 
vippstar

> Not an actual bug: if b is a zero-length string your code doesn't
> work. (It obviously isn't in your code, but I'm only mentioning this
> in case you decide to let the user choose the strings.)

Hmm, never mind that. If b is a zero-length string, strcpy(a, b); sets
a[0] to 0.
 
Richard Tobin

> I'd been benchmarking my own pet language against Python for manipulation of
> short strings.

Your tests are almost certainly primarily measuring memory allocation
speed. Python may well use a more specialised allocator than
malloc(), and malloc() varies greatly between implementations.

(On my computer, the C program is about 3 times faster than the Python.)

-- Richard
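
One quick way to test that claim (a sketch, not part of the thread) is to time
the malloc/free pattern on its own, with no string copying at all; if that
accounts for most of the 4 seconds, the benchmark is mainly measuring the
allocator:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    char *a = NULL;
    clock_t start = clock();
    long i;

    /* Same malloc/free pattern as the original loop, but no copying. */
    for (i = 0; i < 10000000L; ++i) {
        free(a);
        a = malloc(7);            /* strlen("abc") + strlen("xyz") + 1 */
        if (a == NULL) return 1;
    }

    free(a);
    printf("Time: %.3f s\n", (double)(clock() - start) / CLOCKS_PER_SEC);
    return 0;
}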
 
Kenny McCormack

Give it a rest.

Oh come on, he's just doing what all the regs have been doing for years
(decades?) now. The problem is that vippy and CBF just aren't *quite*
as good at it as are Heathfield and Thompson (et al) and their coteries
(in whose numbers I count yourself). This is in much the same way that
McCain isn't quite as good at it (BS'ing the masses) as GWB is/was. The
yokels (in both cases - here and in American politics) are beginning to
figure it out.

And when they do, the curtain is going to fall. Fast and firm.

P.S. Another way to say this is that, with vippy and CBF, you can see
the wires. It ruins the effect.
 
Bartc

> So WHERE's the C question?

Here:

> t is an int. clock() returns clock_t, which is an arithmetic type,
> but not necessarily int.

To be honest, I was more interested in my point about some string ops in C
being slower (or perhaps not much faster) than in Python, a language
generally regarded as orders of magnitude slower.

> What if malloc returns NULL? You don't care?

Not much, since this probably just allocates and deallocates the same 7
bytes, but since you mention it, I did say:

I've also omitted checks for NULL and empty strings, in order to keep the
loop body to a few lines.

> If you're going to time implementations, do it with CONFORMING
> programs. Blaming the implementation for its output when your program
> is incorrect is plain silly.

Common sense should tell you that using long int or clock_t would not have
made it any faster. I'm sure anyone here can do their own 1...10 million loop
and use their own timing.
 
Dik T. Winter

> I'd been benchmarking my own pet language against Python for manipulation of
> short strings. This tested the expression a=b+c for strings, and the Python
> code looks like: ....
> OK, this code is naive and simplistic, but how else would you do it in C?
> (BTW I've omitted malloc checking, which is in my own code and I presume is
> in Python.)
....
The difference in time is almost certainly because Python keeps track of the
length of strings, which C does not. And moreover, Python probably uses a
special memory allocator.
 
Dik T. Winter

> ...
> The difference in time is almost certainly because Python keeps track of the
> length of strings, which C does not. And moreover, Python probably uses a
> special memory allocator.

I just checked. Keeping track of the string length in C reduced the time
for the program from 2.1 to 1.6 seconds on my machine.
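
As a rough illustration of what keeping track of the string lengths looks like
(a sketch, not Dik's program): compute the lengths once outside the loop and
use memcpy instead of strcpy/strcat.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void) {
    const char *b = "abc", *c = "xyz";
    size_t lb = strlen(b), lc = strlen(c);   /* lengths computed once, not per iteration */
    char *a = NULL;
    clock_t start = clock();
    long i;

    for (i = 0; i < 10000000L; ++i) {
        free(a);
        a = malloc(lb + lc + 1);
        if (a == NULL) return 1;
        memcpy(a, b, lb);              /* memcpy with known lengths replaces strcpy */
        memcpy(a + lb, c, lc + 1);     /* +1 copies c's terminating '\0' */
    }

    printf("Result=%s\n", a);
    printf("Time: %.3f s\n", (double)(clock() - start) / CLOCKS_PER_SEC);
    free(a);
    return 0;
}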
 
Willem

Dik T. Winter wrote:
) The difference in time is almost certainly because Python keeps track of the
) length of strings, which C does not. And moreover, Python probably uses a
) special memory allocator.

I disagree.
The difference is almost certainly because of the memory allocator.

Keeping track of the length of the string has a negligible effect on such
small strings.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Richard Bos

Dik T. Winter said:
> ...
> The difference in time is almost certainly because Python keeps track of the
> length of strings, which C does not. And moreover, Python probably uses a
> special memory allocator.

And moreover, Python probably optimises specifically for string
manipulation by default, while Bartc's C compiler probably optimises for
general purpose programming.

Richard
 
Antoninus Twink

> Oh come on, he's just doing what all the regs have been doing for
> years (decades?) now. The problem is that vippy and CBF just aren't
> *quite* as good at it as are Heathfield and Thompson (et al) and their
> coteries (in whose numbers I count yourself).

I don't think that's fair. RT, unlike a number of the other Richards,
has historically proved himself much more open to broader discussions in
clc.

It's interesting that even within The Clique, it's possible to just be
too much of a dick and start getting marginalized even if you have the
right topicality politics - the attacks on CBF from all quarters are one
example, and now "VIP Star" is proving too much of an embarrassment even
for Heathfield.
 
Richard

> So WHERE's the C question? You should know better than posting
> benchmarking stuff here.

It seems blindingly obvious what it is to me.

Can you not see it?

Hint: he has written some C which is much slower than Python. He wants
to know why.

You really have become an obstructive arse.
 
Richard

> Oh come on, do you really want a benchmarking discussion here?

Try reading the post before you start your posturing and showing off, you
self-important blowhard.

Writing "efficient" code can certainly be considered in terms of ISO C
even IF compilers come into it.
 
Antoninus Twink

I don't know how your code can be made more efficient, but since my
research shows that Python itself is written in C, I also don't see
how well-written C code can be slower than Python.

Moreover, in C you can always drop down to inline assembly if there's a
bottleneck you really need to speed up. I don't know how easy that is in
Python.
 
Willem

Malcolm McLean wrote:
) However you are doing the length calculation 2 * 10 million times, and not
) much else in that loop.

I don't have the original code at hand, but doesn't the 'not much else'
include calling malloc()? So malloc() is being called 10 million times
as well.

) Memory reads are expensive. Doing four * 2 instead of one could well account
) for the bulk of the difference.

When cache is involved, reading one byte usually gets the next few into
the cache as well, making those following reads a lot quicker.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Bartc

> Fundamentally, your C code is quite inefficient and the code
> inside Python that implements the loop is optimized. In what way
> is your code inefficient?
>
> (a) You don't use a scratch buffer. One of the important
> optimizations is to reuse space rather than allocating and
> freeing temporary space. This is a dangerous optimization
> because you are responsible for keeping track of whether your
> scratch buffers are big enough and whether they are actually
> free. However it is an important optimization; careful design
> can reduce the number of malloc/free calls by orders of
> magnitude.
>
> (b) You don't keep track of string lengths. Many operations are
> cheap if you know the string length and not so cheap if you
> don't.
>
> (c) You use strcat. The trouble with strcat(a,c) is that it has
> to find the end of a before it can start copying in c.
>
> Below is some code that, in the grand tradition of clc, has not
> been tested. It is probably closer to what the internal code in
> Python is actually doing in your test. You might take a look at
> it and compare its timings with your original code.
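
(The code itself was snipped from the quote above. A minimal sketch of the
scratch-buffer approach just described, assuming a single buffer reused across
iterations and both lengths known up front, might look like the following; it
is not the original poster's listing.)

/* Sketch only: a scratch-buffer version of the a=b+c loop */
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void) {
    const char *b = "abc", *c = "xyz";
    size_t lb = strlen(b), lc = strlen(c);   /* lengths computed once */
    char buf[256];                           /* reused scratch buffer: no malloc/free per pass */
    char *a = buf;
    clock_t start = clock();
    long i;

    for (i = 0; i < 10000000L; ++i) {
        memcpy(buf, b, lb);                  /* known lengths: no strcat scan for the end of a */
        memcpy(buf + lb, c, lc + 1);         /* +1 copies c's terminating '\0' */
        a = buf;
    }

    printf("Result=%s\n", a);
    printf("Time: %.3f s\n", (double)(clock() - start) / CLOCKS_PER_SEC);
    return 0;
}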

I ran your code, with a mod or two (setting a=buf for example).

My original timings were about 4.5 seconds for each of lccwin32 and Mingw;
DMC, just tested, came in at 3.2 seconds.

Your code produced 2 seconds for lccwin32, but Mingw and DMC showed a
dramatic reduction to 0.15 seconds.

However, this code doesn't realistically emulate the short string handling I
was trying to test; in practice each string will have a different length,
and the destination may already have content that needs to be cleared.
Imagine a function:

void addstring(char **a, char *b, char *c);

which does the equivalent of a=b+c, /where *a may already point to a
string/.

My own code for this (which implements an interpreted, dynamically typed
language, with a lot of overheads C doesn't have) achieved some 0.7 seconds. This
uses special fast routines for allocation and freeing and copying (for
example malloc() is only called once per megabyte, free() I think never, for
small allocations).

I suppose these same routines could be used in a C library to speed things
up. *But* using what is already provided in C, the obvious way to program my
addstring() routine is to use malloc, free, strlen, strcpy and so on:

void addstring(char **a, char *b, char *c) {
    free(*a);
    *a=malloc(strlen(b)+strlen(c)+1);
    // *a=malloc(7);
    strcpy(*a,b);
    strcat(*a,c);
    // memcpy(*a,b,3);
    // memcpy(*a+3,c,4);
}

This took DMC about 3.2 seconds to execute 10 million times.

Taking string length calculation out of the equation (using the commented
lines above) took 1.8 seconds (don't try this unless b and c have strlen
3!).

Remember my Python 2.5 took (appropriately) 2.5 seconds; nearer 2.0 seconds
without the loop overhead (C's loop overhead was negligible).
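
(For illustration, the kind of allocator being described, where malloc() is
called once per megabyte and small allocations are handed out by bumping a
pointer and never individually freed, might be sketched as below. It is a
guess at the idea, not Bartc's actual routines.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define POOL_SIZE (1024*1024)          /* ask malloc() for a megabyte at a time */

static char *pool = NULL;
static size_t pool_used = 0;

/* Bump allocator: malloc() is called once per megabyte; small
   allocations are never individually freed. */
static void *pool_alloc(size_t n) {
    void *p;
    if (pool == NULL || pool_used + n > POOL_SIZE) {
        pool = malloc(POOL_SIZE);      /* any previous pool is simply abandoned */
        if (pool == NULL) return NULL;
        pool_used = 0;
    }
    p = pool + pool_used;
    pool_used += n;
    return p;
}

/* a = b + c using the pool; the old *a is just dropped, not freed */
void addstring(char **a, const char *b, const char *c) {
    size_t lb = strlen(b), lc = strlen(c);
    char *p = pool_alloc(lb + lc + 1);
    if (p == NULL) return;
    memcpy(p, b, lb);
    memcpy(p + lb, c, lc + 1);
    *a = p;
}

int main(void) {
    char *a = NULL;
    addstring(&a, "abc", "xyz");
    printf("%s\n", a);                 /* prints abcxyz; a lives in the pool */
    return 0;
}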
 
