C++ strings & C strings

arnuld

I was doing exercise 4.3.1 - 4.29 of "C++ Primer 4/e", where the authors,
with run times shown, claim that C++ library strings are faster than
C-style character strings. I wrote the same programme in C and found
that the authors' claim is only *partly* true. If we use C-style strings
in C++ instead of the library string class, they are slower, but if I write
the same programme in C, the C strings are "faster" than both C++
library strings and C-style strings in C++. Below is the code of the 3
programmes. I would really appreciate any comments on C++ programming,
the exercise, my code, etc.

thanks

// C-style strings in C++

#include <iostream>
#include <cstring>

int main() {
    // C style character string implementation
    const char *pc = "a very long literal string";
    const size_t len = strlen(pc + 1); // for NULL terminator

    // performance test on string allocation and copy
    for(size_t ix = 0; ix != 1000000; ++ix)
    {
        char *pc2 = new char[len + 1]; // dynamic space allocation with NULL terminator
        strcpy(pc2, pc); // copying string onto "allocated space"
        if(strcmp(pc2, pc))
            ;
        delete [] pc2;
    }
}
-------------------------------------------------------------------------

// C++ Library Strings
#include <iostream>
#include <string>

int main() {
    // C++ standard library string implementation
    std::string str("a very long literal string");

    // performance test on string allocation & copy
    // automatic memory management by String Library Class
    for(unsigned long ix = 0; ix != 10000000; ++ix)
    {
        std::string str2 = str;
        if(str == str2)
            ;
    }

}
----------------------------------------------------------

// C strings

#include <string.h>

int main() {
    char *ps = "a very long literal string";
    int len = strlen(ps + 1);
    unsigned long i;
    char *ps2;

    for(i = 0; i <= 10000000; ++i)
    {
        *ps2 = malloc(len + 1);
        strcpy(ps2, ps);
        if(strcmp(ps2, ps))
            ;
        free(ps2);
        ps2 = NULL;
    }
}


-- arnuld
http://arnuld.blogspot.com
 
Alf P. Steinbach

* arnuld:
i was doing exercise 4.3.1 - 4.29 of "C++ Primer 4/e" where authors,
with "run-time shown", claim that C++ Library strings are faster than
C-style character strings. i wrote the same programme in C & hence
found that claim of the authors is *partial*. If we use C-style strings
in C++ instead of Library String class, then they are slow but if write
the same programme in C then C strings are "faster" than both C++
Library strings & C-style strings in C++. Below is the code of 3
programmes. i will really appreciate any comments on C++ programming,
regarding exercise, my code etc.

thanks

// C-style strings in C++

#include <iostream>
#include <cstring>

int main() {
// C style character string implementation
const char *pc = "a very long literal string";
const size_t len = strlen(pc + 1); // for NULL terminator

Spot the bug.

// peformance test on string allocation and copy
for(size_t ix = 0; ix != 1000000; ++ix)
{
char *pc2 = new char[len + 1]; // dynamic space allocation with NULL terminator
strcpy(pc2, pc); // copying string onto "allocated space"
if(strcmp(pc2, pc))
;
delete [] pc2;
}
}
-------------------------------------------------------------------------

// C++ Library Strings
#include <iostream>
#include <string>

int main() {
// C++ standard library string implementation
std::string str("a very long literal string");

// performance tst on staring allocation & copy
// automatic memory management by String Library Class
for(unsigned long ix = 0; ix != 10000000; ++ix)
{
std::string str2 = str;
if(str == str2)
;
}

}
----------------------------------------------------------

// C strings

#include <string.h>

int main() {
char *ps = "a very long literal string";
int len = strlen(ps + 1);

Spot the bug.

unsigned long i;
char *ps2;

for(i = 0; i <= 10000000; ++i)

Spot the difference.
 
Kai-Uwe Bux

arnuld said:
i was doing exercise 4.3.1 - 4.29 of "C++ Primer 4/e" where authors,
with "run-time shown", claim that C++ Library strings are faster than
C-style character strings. i wrote the same programme in C & hence
found that claim of the authors is *partial*. If we use C-style strings
in C++ instead of Library String class, then they are slow but if write
the same programme in C then C strings are "faster" than both C++
Library strings & C-style strings in C++. Below is the code of 3
programmes. i will really appreciate any comments on C++ programming,
regarding exercise, my code etc.

thanks

// C-style strings in C++

#include <iostream>
#include <cstring>

int main() {
// C style character string implementation
const char *pc = "a very long literal string";
const size_t len = strlen(pc + 1); // for NULL terminator

// peformance test on string allocation and copy
for(size_t ix = 0; ix != 1000000; ++ix)

This magic number is one digit shorter than the others:

for(size_t ix = 0; ix != 10000000; ++ix)
{
char *pc2 = new char[len + 1]; // dynamic space allocation with NULL terminator
strcpy(pc2, pc); // copying string onto "allocated space"
if(strcmp(pc2, pc))
;
delete [] pc2;
}
}
-------------------------------------------------------------------------

// C++ Library Strings
#include <iostream>
#include <string>

int main() {
// C++ standard library string implementation
std::string str("a very long literal string");

// performance tst on staring allocation & copy
// automatic memory management by String Library Class
for(unsigned long ix = 0; ix != 10000000; ++ix)
{
std::string str2 = str;
if(str == str2)
;
}

}
----------------------------------------------------------

// C strings

#include <string.h>

#include <stdlib.h>
int main() {
char *ps = "a very long literal string";
int len = strlen(ps + 1);
unsigned long i;
char *ps2;

for(i = 0; i <= 10000000; ++i)
{
*ps2 = malloc(len + 1);

That should be

ps2 = malloc(len + 1);

or

ps2 = (char*) malloc(len + 1);
strcpy(ps2, ps);
if(strcmp(ps2, ps))
;
free(ps2);
ps2 = NULL;
}
}


-- arnuld
http://arnuld.blogspot.com

With the fixed programs, I get:

First program:
news_group> time a1.out

real 0m5.180s
user 0m3.296s
sys 0m0.128s


Second program:
news_group> time a2.out

real 0m1.846s
user 0m1.276s
sys 0m0.036s


Third program:
news_group> time a3.out

real 0m2.434s
user 0m1.656s
sys 0m0.056s


Still C++ std::string wins (although the test does not really say anything,
as a smart compiler could more or less optimize away the loop completely).
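
One common way to make it harder for the optimizer to discard the loop is to give each iteration an observable side effect. A minimal sketch, assuming C++11 or later; the names `CopyTiming` and `time_string_copies` are made up for illustration and are not from the book or the original programs. A sufficiently clever compiler may still simplify parts of the loop, so the numbers remain only a rough indication.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Result of one timing run (hypothetical helper type).
struct CopyTiming {
    double seconds;            // elapsed wall-clock time
    std::size_t bytes_copied;  // sum of the sizes of all copies made
};

// Copies `src` `iterations` times and reports how long that took.
// Each iteration writes to a volatile object, which is an observable
// side effect the compiler is not allowed to drop.
CopyTiming time_string_copies(const std::string& src, std::size_t iterations)
{
    typedef std::chrono::steady_clock clock;

    volatile std::size_t sink = 0;
    std::size_t total = 0;

    const clock::time_point start = clock::now();
    for (std::size_t i = 0; i != iterations; ++i) {
        std::string copy = src;
        sink = copy.size();     // volatile write, once per iteration
        total += copy.size();
    }
    const clock::time_point stop = clock::now();

    CopyTiming result;
    result.seconds = std::chrono::duration<double>(stop - start).count();
    result.bytes_copied = total;
    return result;
}
```

Calling `time_string_copies(str, 10000000)` from `main` and printing `seconds` gives a figure comparable to the `time` output above, without the process start-up cost.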


Best

Kai-Uwe Bux
 
Frederick Gotham

arnuld posted:
i was doing exercise 4.3.1 - 4.29 of "C++ Primer 4/e" where authors,
with "run-time shown", claim that C++ Library strings are faster than
C-style character strings.


C++ Library strings use dynamic memory allocation, which is sluggish by
comparison to the alternative.

If we use C-style strings in C++ instead of Library String class, then
they are slow


I sincerely doubt that.

but if write the same programme in C then C strings are
"faster" than both C++ Library strings & C-style strings in C++. Below
is the code of 3 programmes. i will really appreciate any comments on
C++ programming, regarding exercise, my code etc.


I'm going to rewrite the code. When testing them, watch out for the compiler
removing redundant statements (such as strcmp).


----- (1) Null-terminated arrays in C++ -----

long unsigned const times = 1UL << 31;

#include <cstring>

char const src_lit[] = "An algorithm is a finite set of well-defined "
                       "unambiguous instructions for accomplishing a task.";

int main()
{
    char dest[sizeof src_lit];

    long unsigned i = 0;

    do
    {
        std::strcpy(dest,src_lit);
        std::strcmp(dest,src_lit);
    }
    while(times+1 != ++i);
}

----- (2) Library strings in C++ -----

long unsigned const times = 1UL << 31;

#include <string>

char const src_lit[] = "An algorithm is a finite set of well-defined "
                       "unambiguous instructions for accomplishing a task.";

int main()
{
    std::string const src(src_lit);

    std::string dest;

    long unsigned i = 0;

    do
    {
        dest = src;
        dest == src;
    }
    while(times+1 != ++i);
}

----- (3) Null-terminated arrays in C90 -----

#define TIMES (1UL << 31)

#include <string.h>

char const src_lit[] = "An algorithm is a finite set of well-defined "
                       "unambiguous instructions for accomplishing a task.";

int main(void)
{
    char dest[sizeof src_lit];

    long unsigned i = 0;

    do
    {
        strcpy(dest,src_lit);
        strcmp(dest,src_lit);
    }
    while(TIMES+1 != ++i);

    return 0;
}


These tests might be slightly biased toward null-terminated strings, given
that the length of the string is known at compile-time. The problem with
std::string, however, is that it will _always_ use dynamic allocation
irrespective of whether the length is known at compile-time, or whether it's
known to be small enough to allow an over-sized buffer.
 
Phlip

arnuld said:
i was doing exercise 4.3.1 - 4.29 of "C++ Primer 4/e" where authors,
with "run-time shown", claim that C++ Library strings are faster than
C-style character strings.

If those authors are any good, that benchmark will come with a complete
description showing where and how it is relevant. In the abstract, the claim
that one software system is always faster than another is beyond useless.
int main() {
// C style character string implementation

A C-style string could be on the stack:

char str[99];

Then you must trivially ensure your input data is shorter. And then...
for(size_t ix = 0; ix != 1000000; ++ix)
{
char *pc2 = new char[len + 1]; // dynamic space allocation with NULL terminator

....you wouldn't need a 'new' inside a loop. 'new', like your 'malloc()', is
so slow it probably dominates the entire exercise.
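
A sketch of that stack-buffer idea, with the length check made explicit. The helper name `copy_if_fits` is invented for this example; it is not a standard function.

```cpp
#include <cstddef>
#include <cstring>

// Copies src, including its terminating '\0', into a fixed-size array.
// Returns false and leaves dest untouched when src does not fit.
template <std::size_t N>
bool copy_if_fits(char (&dest)[N], const char* src)
{
    const std::size_t len = std::strlen(src);
    if (len >= N) {
        return false;               // no room for the characters plus '\0'
    }
    std::memcpy(dest, src, len + 1);
    return true;
}
```

With `char str[99];` declared once outside the loop, `copy_if_fits(str, pc)` can replace the `new` / `strcpy` / `delete []` sequence for any input shorter than 99 characters.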

(And a big tip: Never write a comment that repeats exactly what the code
says. Help the code speak for itself if possible.)
std::string str2 = str;

This might use an optimization called "copy on write". After the =, both
str2 and str share the same memory buffer, so this code probably avoids the
'new' that the other one uses.
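
To make the idea concrete, here is a toy copy-on-write wrapper. It is only an illustration, assuming C++11; the class `CowString` and its members are hypothetical, it is not thread-safe, and it is not how any particular std::string is actually written.

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy copy-on-write string: copies share one buffer until one of them
// is modified.
class CowString {
public:
    explicit CowString(std::string text)
        : data_(std::make_shared<std::string>(std::move(text))) {}

    // The compiler-generated copy constructor only copies the shared_ptr,
    // so copying does not allocate a new character buffer.

    const std::string& read() const { return *data_; }

    // Writing detaches from the other owners first (the "copy on write").
    void append(const std::string& more)
    {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::string>(*data_);
        }
        *data_ += more;
    }

    bool shares_buffer_with(const CowString& other) const
    {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<std::string> data_;
};
```

In a loop like the one in the benchmark, where the copy is only compared and never modified, such a class would never reach the allocating branch.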

Going forward, learn about optimization, but don't stress about your own
code too much. It's easier to make beautiful code fast than fast code
beautiful, and it's very hard to make code beautiful. But beautiful code is
the best to maintain, so learn more about how to make code beautiful.
 
arnuld

Kai-Uwe Bux said:
This magic number is one digit shorter than the others:

"my mistake in posting only", in actual programme there are 7 zeros.
That should be

ps2 = malloc(len + 1);

or

ps2 = (char*) malloc(len + 1);

OK, the 2nd one works fine; the 1st one gives an error on GCC 3.3.5. BUT why
did my programme work with "*ps2 = malloc(len + 1)"?
With the fixed programs, I get:

First program:
news_group> time a1.out

real 0m5.180s
user 0m3.296s
sys 0m0.128s


Second program:
news_group> time a2.out

real 0m1.846s
user 0m1.276s
sys 0m0.036s


Third program:
news_group> time a3.out

real 0m2.434s
user 0m1.656s
sys 0m0.056s

this is from GCC 3.3.5 running on my Debian Sarge

unix@debian:~/programming/cpp$ time ./1.o

real 0m2.184s
user 0m2.179s
sys 0m0.000s
unix@debian:~/programming/cpp$ time ./2.o

real 0m1.144s
user 0m1.132s
sys 0m0.001s
unix@debian:~/programming/cpp$ time ./string.o

real 0m1.147s
user 0m1.126s
sys 0m0.004s
unix@debian:~/programming/cpp$

Still C++ std::string wins (although the test does not really say anything,
as a smart compiler could more or less optimize away the loop completely).

approximately same time on my system BUT i am still amazed how
"Strings" implemented in Classes i.e. OO-Paradigm nearly match the
performance of malloc() & free(). it really broke my belief that C is
faster than C++.
 
Bo Persson

Frederick said:
These tests might be slighly biased toward null-terminated strings,
given that the length of the string is known at compile-time. The
problem with std::string, however, is that it will _always_ use
dynamic allocation irrespective of whether the length is known at
compile-time, or whether it's known to be small enough to allow a
over-sized buffer.

This is not true.

A std::string implementation can use the small string optimization, to
avoid dynamic allocation for some strings. If the string is shorter
than a certain size, and the size is known at compile time,
std::string can take advantage of that.
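
Whether a given library does this can be probed from the outside. A rough sketch, assuming only that a small-string buffer, if present, lives inside the std::string object itself; the function name `stored_inside_object` is made up for this example, and the answer for short strings depends on the implementation.

```cpp
#include <functional>
#include <string>

// True when the character data lies within the bytes of the std::string
// object itself, which is what a small-string buffer looks like from the
// outside. std::less is used because it gives a well-defined ordering for
// pointers into unrelated objects.
bool stored_inside_object(const std::string& s)
{
    const char* const begin = reinterpret_cast<const char*>(&s);
    const char* const end = begin + sizeof(std::string);
    std::less<const char*> before;
    return !before(s.data(), begin) && before(s.data(), end);
}
```

A string far longer than `sizeof(std::string)` cannot be stored inline, so the function returns false for it; for something like `std::string("hi")` the result tells you whether your library uses the optimization.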

Another thing is that std::string works even if the sizes vary, and
are not known at compile time.


You have only shown that copying a string of known size to a
preallocated buffer a fixed number of times, can be optimized in
C-style code. Where is the usability of that?


Bo Persson
 
Daniel T.

arnuld said:
approximately same time on my system BUT i am still amazed how
"Strings" implemented in Classes i.e. OO-Paradigm nearly match the
performance of malloc() & free(). it really broke my belief that C is
faster than C++.

Worst case, the string class is implemented the same way the best
programmers would implement bare c arrays. Best case, they take
advantage of compiler optimizations that even the best C programmers
can't guarantee will be available.

I would be surprised if you *could* write code emulating a standard
class but faster.

I had it out once with a fellow programmer who wrote his own
double-linked list class, he made it as fast as he possibly could, even
including some assembler. The std::list class was *still* 5% faster in
every test program he devised.
 
arnuld

Daniel said:
Worst case, the string class is implemented the same way the best
programmers would implement bare c arrays. Best case, they take
advantage of compiler optimizations that even the best C programmers
can't guarantee will be available.

i did not get whether you are praising C or C++. it is not clear.
String class is implemented as best programmers will implement the bare
C arrays, it means this is the best way to implement String class. how
is it the worst case?


The 2nd sentence could mean 2 things:

The "String" class takes advantage of compiler optimizations which are not
available in every compiler

OR

The "String" class takes advantage of compiler optimizations & even the best C
programmers don't have knowledge of those optimizations.
I would be surprised if you *could* write code emulating a standard
class but faster.

Hmmm.....it seems you are right here.
I had it out once with a fellow programmer who wrote his own
double-linked list class, he made it as fast as he possibly could, even
including some assembler. The std::list class was *still* 5% faster in
every test program he devised.

WHOOPIE!, one more shock

-- arnuld
http://arnuld.blogspot.com
 
Phlip

arnuld said:
"String" class takes advantage of compiler optimizations which are not
available at every compiler"

OR

"String" class takes advantage of compiler optimizations & even best C
programmers dont have the knowledge of those optimizations.

Try it like this:

std::string is implementation-specific. The implementor may exploit the C++
compiler's optimizations (and secret implementation-defined behavior). The
result could be faster than a C programmer could achieve with only a C
compiler's well-defined behaviors.

The distinction is between an _implementor_ and a _programmer_. The former
controls the tool, while the latter uses the tool. And because C++ supports
classes, its implementation can rely on classes directly, leading to further
internal optimizations.

In theory, all of <string> could be implemented as a compiler-specific
command to insert identifiers like std::string, all as a kind of keyword,
backed up by no textually-defined class whatsoever. And the joy of C++ is
that most implementors won't need to do that for most Standard classes.
 
Daniel T.

arnuld said:
i did not get whether you are praising C or C++. it is not clear.

Neither. I am praising the programmers who write the tools we use.
String class is implemented as best programmers will implement the
bare C arrays, it means this is the best way to implement String
class. how is it the worst case?

Exactly. When you use the stuff the vendor wrote, you get the best
self-written case for free.
2nd sentence means 2 things:

"String" class takes advantage of compiler optimizations which are
not available at every compiler"

OR

"String" class takes advantage of compiler optimizations & even best
C programmers dont have the knowledge of those optimizations.

It means both. Look at Phlips answer for more detail.
 
Roland Pibinger

i was doing exercise 4.3.1 - 4.29 of "C++ Primer 4/e" where authors,
with "run-time shown", claim that C++ Library strings are faster than
C-style character strings. ....
// C++ Library Strings
#include <iostream>
#include <string>

int main() {
// C++ standard library string implementation
std::string str("a very long literal string");

// performance tst on staring allocation & copy
// automatic memory management by String Library Class
for(unsigned long ix = 0; ix != 10000000; ++ix)
{
std::string str2 = str;
if(str == str2)
;
}
}

The performance of std::string is dependent on (non-standardized)
implementation-specific techniques, optimizations and trade-offs. If
you use, e.g., the Dinkumware implementation you will get different
results.

Best wishes,
Roland Pibinger
 
arnuld

It means the performance is entirely dependent on the C++ compiler one
uses (which relies on the talent of the implementor). Hence I can
say C++ programmers are dependent on the compiler they use & this is
not the case with C.

-- arnuld
http://arnuld.blogspot.com
 
Daniel T.

arnuld said:
it means the performance is entirely-dependent on the C++ compiler one
uses (which relies on the talent of the implementator). hence i can
say, C++ programmers are dependent on the compiler they use & this is
not the case with C.

C programmers still use a compiler... We are all dependent on the
compiler we use for the kind of micro-optimizations being talked about
here.
 
Noah Roberts

arnuld said:
it means the performance is entirely-dependent on the C++ compiler one
uses (which relies on the talent of the implementator). hence i can
say, C++ programmers are dependent on the compiler they use & this is
not the case with C.

Well that's a steaming pile of poo. You can of course say it but that
doesn't make it any more true. Of course C programmers are dependent
on the compiler they use.

Geesh...only someone that absolutely must be right and for whom C just
HAS to be the best language on the planet would say something so stupid.
 
Yannick Tremblay

* arnuld:

Spot the bug.

I think it is very important to highlight this.
Both "C" style string versions posted by the OP for this flawed
benchmark exhibit a buffer overflow bug.

If you are unlucky, you run this test. Decide that the C version
is 1% faster, copy paste that code into production code, don't
test it well enough and release it.

How much money have you lost once the bug gets out? If you are
lucky, that's just money.

Yan
 
arnuld

can't, heck i dont even know C, i was using C++ Primer & decided to
understand the problem by practically *trying* to write the code.
I think this is very important to highlight this.
Both "C" style string versions posted by the OP for this flawed
benchmark exhibit a buffer overflow bug.

If you are unlucky, you run this test. Decide that the C version
is 1% faster, copy paste that code into production code, don't
test it well enough and release it.

How much money have you lost once the bug gets out? If you are
lucky, that's just money.

Yan, your point is really serious, that is why I want to know where &
what exactly the bug is.

thanks

-- arnuld
http://arnuld.blogspot.com
 
Frederick Gotham

Bo Persson posted:
A std::string implementation can use the small string optimization, to
avoid dynamic allocation for some strings. If the string is shorter
than a certain size, and the size is known at compile time,
std::string can take advantage of that.


I've never heard of that... how would it be implemented? Something like:

#include <cstddef>

template<std::size_t i>
std::string::string(char const (&str)[i])
{
/* Something Funky... ? */
}
 
Roland Pibinger

it means the performance is entirely-dependent on the C++ compiler one
uses (which relies on the talent of the implementator). hence i can
say, C++ programmers are dependent on the compiler they use & this is
not the case with C.

Not the compiler but the string (the std::basic_string template)
implementation! The two major string optimizations are:
- reference counting with COW (copy on write), used in VC++6.0
http://www.gotw.ca/gotw/045.htm
http://www.gotw.ca/gotw/044.htm
http://www.gotw.ca/gotw/043.htm
- SSO (small string optimization): the implementation uses a small
buffer for small strings and avoids any dynamic allocation in that
case, used in VC++7.x, 8.x

Considering your example:
std::string str("a very long literal string");
std::string str2 = str;

'str2 = str' is 'cheap' in VC++ 6.0 but 'expensive' in VC++ 7/8.

Best wishes,
Roland Pibinger
 
Noah Roberts

arnuld said:
can't, heck i dont even know C, i was using C++ Primer & decided to
understand the problem by practically *trying* to write the code.


Yan, your point is rally serious, that is why i want to know where &
what exactly is the bug.

The bug is that it checks the string length of the string starting at
the second character in pc. What was actually wanted was strlen(pc) +
1. Later when you add one to the len variable you increase to the
actual length of pc and then try to copy it. This causes a buffer
overrun because the \0 character makes one more. The +1 is correct but
it is in all the wrong places.
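
The off-by-one can be checked directly. A minimal sketch; the helper names `buggy_buffer_size`, `needed_buffer_size` and `duplicate` are invented for illustration and do not appear in the original programs.

```cpp
#include <cstddef>
#include <cstring>

// Buffer size the posted code ends up allocating. strlen(pc + 1) starts
// counting at the second character, so it returns strlen(pc) - 1.
// Assumes pc is not an empty string.
std::size_t buggy_buffer_size(const char* pc)
{
    const std::size_t len = std::strlen(pc + 1);
    return len + 1;                 // equals strlen(pc): no room for '\0'
}

// Buffer size strcpy actually needs: all characters plus the '\0'.
std::size_t needed_buffer_size(const char* pc)
{
    return std::strlen(pc) + 1;
}

// Correct allocate-and-copy; the caller releases the result with delete [].
char* duplicate(const char* pc)
{
    char* copy = new char[needed_buffer_size(pc)];
    std::strcpy(copy, pc);
    return copy;
}
```

For any non-empty string the buggy size is exactly one byte smaller than the needed size, and that missing byte is the one strcpy uses for the terminator.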
 
