C-style strings vs std::string

Christopher

I am growing really tired of having to decipher 1000 functions that
were written to do simple operations on C-style strings, operations
that I could do in 50 lines with streams and std::string. My peer uses
that same old "it's more efficient" argument that I always hear. In
fact, that argument has grown into "we shouldn't use any of the STL
containers, because they allocate, which is expensive."

For example, I had to debug through 1500 lines today that simply
replaced a token in one char * with another char *, because everything
(searching for the token, converting characters to digits, checking
for digits or alphabetic characters, shifting things to make room,
replacing elements, etc.) was written by hand. I could have done this
easily with a find-and-replace call from the STL.

Well, I am tired of it. I want to write a test and profile it, one
operation at a time. I am sure the differences are negligible,
especially when weighing in the maintainability of the code.

Before I start spending time disproving what hasn't even been proven,
I want to check whether anyone has had to do this and has preexisting
code, or knows a reliable resource where I can get some instead of
writing it from scratch. Also, any advice on how to write such a test
without any points in it that could skew the results would be useful.
 
Ian Collins

I am growing really tired of having to decipher 1000 functions that
were written to do simple operations on C-style strings, operations
that I could do in 50 lines with streams and std::string. My peer uses
that same old "it's more efficient" argument that I always hear. In
fact, that argument has grown into "we shouldn't use any of the STL
containers, because they allocate, which is expensive."

For example, I had to debug through 1500 lines today that simply
replaced a token in one char * with another char *, because everything
(searching for the token, converting characters to digits, checking
for digits or alphabetic characters, shifting things to make room,
replacing elements, etc.) was written by hand. I could have done this
easily with a find-and-replace call from the STL.

Well, I am tired of it. I want to write a test and profile it, one
operation at a time. I am sure the differences are negligible,
especially when weighing in the maintainability of the code.

Assuming giving your peer a slap isn't an option, why don't you use your
existing code? Provide alternatives to the C code and compare the two.
That should be more convincing than an artificial benchmark.
 
Christopher

Assuming giving your peer a slap isn't an option, why don't you use your
existing code? Provide alternatives to the C code and compare the two.
That should be more convincing than an artificial benchmark.

Lots of dependencies, and I have to do it on my own time at home, where
I won't have access to the dependencies.
I suppose I can apply the same idea, though, and use some sort of proxies.
I'll try that route.
 
Juha Nieminen

Christopher said:
I am growing really tired of having to decipher 1000 functions that
were written to do simple operations on C-style strings, operations
that I could do in 50 lines with streams and std::string. My peer uses
that same old "it's more efficient" argument that I always hear.

The efficiency depends on a lot of things.

For example, if you only use static arrays of type char as your strings
(which are thus always allocated on the stack and never change size), then
they will be faster than using std::string (which will always allocate
memory on the heap).

The same is true for class members. Certainly "class A { char str[30]; };"
will be significantly more efficient than "class A { std::string str; };"
with a constructor that reserves 30 characters in 'str' (the class in
question will be faster to instantiate, copy and destroy).

Of course even then it depends on how much those strings are being
allocated and destroyed. If this happens very rarely, then the difference
becomes negligible.

If the C strings are being allocated, resized and freed constantly, then
it becomes more complicated. It depends on how and how much, and what kind
of operations are being applied to them, etc...

If the difference is small or even negligible, then the modularity and
safety provided by std::string become a crucial factor. This will not only
reduce the number of bugs, but in many cases will make the code shorter,
simpler and easier to understand.
In fact, that argument has grown into "we shouldn't use any of the STL
containers, because they allocate, which is expensive."

And what exactly is the proposed alternative?
 
Krice

My peer uses that same old "it's more efficient" argument that
I always hear. In fact, that argument has grown into "we
shouldn't use any of the STL containers, because they allocate,
which is expensive."

There may be a difference in speed, but std::string is better
(or should be) from the programmer's perspective, because it's:
-more reliable and less likely to produce bugs
-usually more readable
-easier to refactor

I think things like that are more important than efficiency,
which in most cases is at an acceptable level with std::string as well.
 
Goran

I am growing really tired of having to decipher 1000 functions that
were written to do simple operations on C-style strings, operations
that I could do in 50 lines with streams and std::string. My peer uses
that same old "it's more efficient" argument that I always hear. In
fact, that argument has grown into "we shouldn't use any of the STL
containers, because they allocate, which is expensive."

For example, I had to debug through 1500 lines today that simply
replaced a token in one char * with another char *, because everything
(searching for the token, converting characters to digits, checking
for digits or alphabetic characters, shifting things to make room,
replacing elements, etc.) was written by hand. I could have done this
easily with a find-and-replace call from the STL.

Well, I am tired of it. I want to write a test and profile it, one
operation at a time. I am sure the differences are negligible,
especially when weighing in the maintainability of the code.

Before I start spending time disproving what hasn't even been proven,
I want to check whether anyone has had to do this and has preexisting
code, or knows a reliable resource where I can get some instead of
writing it from scratch. Also, any advice on how to write such a test
without any points in it that could skew the results would be useful.

Anecdotal evidence: I once refactored part of a C-only codebase into
C++. The whole shebang: classes, polymorphism, exceptions (not
std::exception-based, though). I got a smaller final executable (by a
small margin, but still), and I could have done even better if I hadn't
replaced one C sort with std::sort (which I did). Performance wasn't an
issue, nor was code size, really; it was just code simplification, so
the size reduction was an added bonus.

Goran.
 
Jorgen Grahn

And what exactly is the proposed alternative?

If it's anything like a recent project: slower, type-unsafe, informal
and buggy versions written in C. (Can't blame them: it's a C project.)

Or the worse option: choosing an inappropriate algorithm that works
with C arrays, e.g. doing linear searches in an array because you
don't have std::map.

/Jorgen
 
BGB

If it's anything like a recent project: slower, type-unsafe, informal
and buggy versions written in C. (Can't blame them: it's a C project.)

Or the worse option: choosing an inappropriate algorithm that works
with C arrays, e.g. doing linear searches in an array because you
don't have std::map.

If people know what they are doing, and performance matters some, the
usual (fairly straightforward) solution is to throw a hash table at the
problem; it is not difficult to implement.


I once did a comparison between an "if(!strcmp) {...} else ..." chain
and a hash table followed by a switch, and found a break-even point of
about 6 options: with fewer than 6 options the if/else chain was
faster, and with more than 6 the hash table + switch was faster.

For larger N (hundreds or thousands of strings to match against), a
hash table is a clear win.

Binary trees (and prefix trees) can give better performance in certain
use cases, but in general they are both more complex to implement and
slower than hash tables in the average case, IME.

A chained hash is a typical way to speed up array-based lookups.

And all of this works in C.


Yes, std::map is more convenient, but it is not "essential" for
writing efficient lookups.

That does not justify blaming newbie mistakes or oversights on the
language itself, where one could just as easily condemn C++ on the
grounds that "pointer-based memory objects are almost impossible to
use without causing crashes" or "the lack of automatic bounds checking
causes memory to become corrupt", or for any of the other ways one can
shoot oneself in the foot.

One can respond, "well, it is not C++'s fault if you have no idea what
you are doing", and the same goes for C.

C just leaves a little more in the open, and may require a bit more
manual effort in such cases.
 
BGB

Fully agreed.

yep.



This is not quite exact. The std::string implementation can use the
small string optimization technique, which means there would be no heap
allocation for strings up to a certain length (e.g. 16 bytes).

IIRC, it is 12 bytes in MSVC, but I could be wrong here.


In C, a typical "default" array size is something like 256 chars (it
may be larger or smaller; it generally needs to be the largest
reasonably expected value).

Many people also use special constants, such as MAX_PATH (260),
depending on what is being done.

Another trick is to allocate a smaller fixed-size buffer and, if a
larger one is needed, allocate a larger temporary buffer on the heap
(this way one can use, say, 64 chars, and still be able to handle
anything larger that comes along).

A proponent of C recently posted a benchmark test in a thread here
("Generally, are the programs written by C++ slower than written by C
10%") where he inadvertently used strings so small that the C++
version was more than twice as fast (with VC++ 2010) as the equivalent
C code based on malloc/free, presumably because of the small string
optimization.

IMHO, string creation/management via malloc/free is evil on multiple levels:
it is slow;
it is rather awkward (one has to remember to free the strings, ...);
it tends to chew through huge amounts of memory (many malloc
implementations fall on their face with lots of tiny allocations);
it runs into the problem that malloc/free plus multiple DLLs blows up
in one's face (MSVC defaults to statically linking the C runtime
library, so each DLL has its own heap);
...

In C++, std::string is generally a much better option.


For a pure C solution, another option is essentially to regard plain
strings as immutable atomic datums, and then make use of interning
(the strings are stored in string tables or similar, and the pointer
to a string is treated as its value).

Interning strings can be made reasonably fast, and it does not burden
the rest of the code with manually managing strings.

A theoretical issue is that lots of large and/or one-off strings could
end up interned and eating lots of memory, but IME this hasn't really
been much of an issue (especially if one uses a GC and a weak hash).

In many common cases, string values tend to be very repetitive.


Note that "buffered strings" / "character buffers" are essentially a
different use case, and are generally handled independently.

In this case, the string is often assumed to be mutable, and will
typically be heap-based (allocated via malloc, a GC library, or similar).

A "reasonable" strategy for plain C here is essentially to create an
analogue of a std::string object in C (the string buffer is held and
managed by a wrapper object).

A string buffer makes sense where the string is either mutable or
potentially large (examples being input and output buffers, read-in
text files, ...).

Granted, none of this is generally needed in C++ code, except where
interoperation with C code (or other non-C++-aware languages) is needed.

This all depends very much on the compiler and optimization levels of
course.

Yep.

This applies to both languages.
 
Noah Roberts

I am growing really tired of having to decipher 1000 functions that
were written to do simple operations on C-style strings, operations
that I could do in 50 lines with streams and std::string. My peer uses
that same old "it's more efficient" argument that I always hear. In
fact, that argument has grown into "we shouldn't use any of the STL
containers, because they allocate, which is expensive."

For example, I had to debug through 1500 lines today that simply
replaced a token in one char * with another char *, because everything
(searching for the token, converting characters to digits, checking
for digits or alphabetic characters, shifting things to make room,
replacing elements, etc.) was written by hand. I could have done this
easily with a find-and-replace call from the STL.

Well, I am tired of it. I want to write a test and profile it, one
operation at a time. I am sure the differences are negligible,
especially when weighing in the maintainability of the code.

Before I start spending time disproving what hasn't even been proven,
I want to check whether anyone has had to do this and has preexisting
code, or knows a reliable resource where I can get some instead of
writing it from scratch. Also, any advice on how to write such a test
without any points in it that could skew the results would be useful.

Is he against use of malloc too? The C guy I work with is, and it's
making it really hard/interesting to do my job.

I think you may eventually find that people don't listen to logic or
reason. People make decisions and then come up with reasons to
support them. Then they trick themselves into thinking that they used
those reasons to make the decision they made. This is why no matter
how reasonable an argument you make, you simply cannot convince people
to your side most of the time... and why you can't be convinced most
of the time either.

If you really want to change their mind you'll have to use Jedi Mind
Tricks. Get some books on psychology, influence, and manipulation.

One important thing you can do to help your side is to "understand"
their side. Unless you do this, most people will simply stick to their
guns harder and harder, thinking you haven't listened to them. Act like
you've listened, like you're almost convinced, and then, "but...."
This does three things: it helps you actually listen to what they're
saying, because the best way to pretend that you have is to actually do
so. Next, it breaks down their defenses and lets them know that
you're taking their opinion seriously -- this is important to you, no?
Finally, it creates a cooperation feedback in their brain; you've done
them a 'favor' and now they need to return it by listening to your
side.

Sometimes you've got to give in to them a bit to get something you
want more.

The thing is, you've got to work with them, wrong as they are, right?
Don't spend the time fighting. Get what you can, run with it, and
prove yourself. If you fight all the time you'll have to fight all
the time and it becomes a miserable place to work. The small bit of
frustration and hit to your pride that being forced to write shitty
code sometimes causes is simply not worth that. If you can't beat
them, join them...just keep mentioning it every time it comes up, "You
know...if we used strings here, maybe it would take a few extra
microseconds, but we wouldn't have run into this bug."

Every so often you need to step past someone. Use this sparingly
though because nobody likes it.

As to your original problem: I had the same issue with someone myself,
and I did compare std::string to char*. You'll never get the same
speed out of std::string that you can get with a "speed focused" char*
function; you'll be slower by a few nanoseconds every time, because
the std::string construct simply does more. So your opponent is right,
and it should be easy to concede that to show you "understand" their
side. You will, however, run into the worst kind of bugs when that
char* function goes kaboom. They're harder to work with, impractical
to protect against, etc.
 
Balog Pal

Christopher said:
I am growing really tired of having to decipher 1000 functions that
were written to do simple operations on C-style strings, operations
that I could do in 50 lines with streams and std::string. My peer uses
that same old "it's more efficient" argument that I always hear. In
fact, that argument has grown into "we shouldn't use any of the STL
containers, because they allocate, which is expensive."

For example, I had to debug through 1500 lines today that simply
replaced a token in one char * with another char *, because everything
(searching for the token, converting characters to digits, checking
for digits or alphabetic characters, shifting things to make room,
replacing elements, etc.) was written by hand. I could have done this
easily with a find-and-replace call from the STL.

Well, I am tired of it. I want to write a test and profile it, one
operation at a time. I am sure the differences are negligible,
especially when weighing in the maintainability of the code.

Tired of bullshit? Then quit and find a proper place to work, one that
aligns with your values.

There is no point wasting time on experiments or "proof" or anything --
a shop with the described mentality is beyond repair.
Before I start spending time disproving what hasn't even been proven,
I want to check whether anyone has had to do this and has preexisting
code, or knows a reliable resource where I can get some instead of
writing it from scratch. Also, any advice on how to write such a test
without any points in it that could skew the results would be useful.

I don't get your questions. You have the baseline, and you have an
alternative in your head. So code the alternative version, run the
unit tests to prove the behavior is the same, then measure the
performance -- or rather, make the opposition measure it for you,
proving the "inefficiency".
 
Ebenezer

Is he against use of malloc too? The C guy I work with is, and it's
making it really hard/interesting to do my job.

I think you may eventually find that people don't listen to logic or
reason. People make decisions and then come up with reasons to
support them. Then they trick themselves into thinking that they used
those reasons to make the decision they made. This is why no matter
how reasonable an argument you make, you simply cannot convince people
to your side most of the time... and why you can't be convinced most
of the time either.

If you really want to change their mind you'll have to use Jedi Mind
Tricks. Get some books on psychology, influence, and manipulation.

One important thing you can do to help your side is to "understand"
their side. Unless you do this, most people will simply stick to their
guns harder and harder, thinking you haven't listened to them. Act like
you've listened, like you're almost convinced, and then, "but...."
This does three things: it helps you actually listen to what they're
saying, because the best way to pretend that you have is to actually do
so. Next, it breaks down their defenses and lets them know that
you're taking their opinion seriously -- this is important to you, no?
Finally, it creates a cooperation feedback in their brain; you've done
them a 'favor' and now they need to return it by listening to your
side.

Sometimes you've got to give in to them a bit to get something you
want more.

The thing is, you've got to work with them, wrong as they are, right?
Don't spend the time fighting.  Get what you can, run with it, and
prove yourself.  If you fight all the time you'll have to fight all
the time and it becomes a miserable place to work.  The small bit of
frustration and hit to your pride that being forced to write shitty
code sometimes causes is simply not worth that.  If you can't beat
them, join them...

If you can't join 'em, beat 'em. Remember the Alamo.
Those people died fighting for what was right.


Brian Wood
Ebenezer Enterprises
http://webEbenezer.net
 
Ebenezer

Is he against use of malloc too? The C guy I work with is, and it's
making it really hard/interesting to do my job.

I think you may eventually find that people don't listen to logic or
reason. People make decisions and then come up with reasons to
support them. Then they trick themselves into thinking that they used
those reasons to make the decision they made. This is why no matter
how reasonable an argument you make, you simply cannot convince people
to your side most of the time... and why you can't be convinced most
of the time either.

If you really want to change their mind you'll have to use Jedi Mind
Tricks. Get some books on psychology, influence, and manipulation.

One important thing you can do to help your side is to "understand"
their side. Unless you do this, most people will simply stick to their
guns harder and harder, thinking you haven't listened to them. Act like
you've listened, like you're almost convinced, and then, "but...."
This does three things: it helps you actually listen to what they're
saying, because the best way to pretend that you have is to actually do
so. Next, it breaks down their defenses and lets them know that
you're taking their opinion seriously -- this is important to you, no?
Finally, it creates a cooperation feedback in their brain; you've done
them a 'favor' and now they need to return it by listening to your
side.

Sometimes you've got to give in to them a bit to get something you
want more.

The thing is, you've got to work with them, wrong as they are, right?
Don't spend the time fighting.  Get what you can, run with it, and
prove yourself.  If you fight all the time you'll have to fight all
the time and it becomes a miserable place to work.  The small bit of
frustration and hit to your pride that being forced to write shitty
code sometimes causes is simply not worth that.  If you can't beat
them, join them...

Sometimes you have to stand and fight like they did at
the Alamo. If you can't join 'em, beat 'em.
I would appreciate it if people would watch their mouths
here.


Brian Wood
Ebenezer Enterprises
http://webEbenezer.net
 
Gerald Breuer

On 17.09.2011 00:07, Christopher wrote:
I am growing really tired of having to decipher 1000 functions that
were written to do simple operations on C-style strings, operations
that I could do in 50 lines with streams and std::string. My peer uses
that same old "it's more efficient" argument that I always hear. In
fact, that argument has grown into "we shouldn't use any of the STL
containers, because they allocate, which is expensive."

That's not a question that should be answered in principle, but in
relation to the actual problem. Bloat is only bloat if a real resource
problem arises from it.
 
Jorgen Grahn

On Sep 17, 11:43 am, Noah Roberts <[email protected]> wrote:
....
[snip good stuff]
If you can't join 'em, beat 'em. Remember the Alamo.
Those people died fighting for what was right.

What about the people who should have been at the Alamo, but got
themselves killed in a silly bar fight a week before the battle?

Sometimes it's better to give up, even when you know you're right.

(Not always; next week I hope to throw out most of a coworker's code
from the past half year and replace it with something I wrote in a
couple of nights ... Not much he can do about it if (as I suspect) I
can beat the existing code WRT performance and stability.)

/Jorgen
 
Juha Nieminen

BGB said:
if people know what they are doing, and performance matters some, the
usual (fairly straightforward) solution is to throw a hash-table at the
problem (not difficult to implement).

Actually, implementing a good hash table is not trivial, for two reasons.

Firstly, there is the decision of which kind of hash table to use.
Unlike e.g. red-black trees, there are many different possible hash
table implementations, and no single one is optimal; which type of
hash table is best may actually depend on what kind of data is
inserted into it.

Secondly, a naive hashing function may have hidden surprises. It might
seem to work like a charm in all test cases... but then there may be
some pathological input (not necessarily even deliberately chosen to
break the hash table; it could be natural input from somewhere) that
triggers a weakness in the hashing function. This usually presents
itself as the vast majority of the elements ending up in only a
fraction of the hash table positions, in other words, many elements
having the same hash value. This pathological situation might not be
discovered in testing, only when a client somewhere uses the code with
some unexpected input.

The major inefficiency in std::set and std::map is the memory allocation
that happens with each element. However, this inefficiency can be greatly
alleviated by using a fast memory allocator (which all STL containers
support). Such an allocator can make those containers faster by even an
order of magnitude or so.

Of course using a ready-made implementation of a data container (be it
std::set, std::map or their unordered variants in the new standard) also
reduces the amount of bug-hunting in the program, which is always a great
asset.
 
Balog Pal

Jorgen Grahn said:
What about the people who should have been at the Alamo, but got
themselves killed in a silly bar fight a week before the battle?

Sometimes it's better to give up, even when you know you're right.

#include "the serenity prayer"
(Not always; next week I hope to throw out most of a coworker's code
from the past half year and replace it with something I wrote in a
couple of nights ... Not much he can do about it if (as I suspect) I
can beat the existing code WRT performance and stability.)

:)
 
Ebenezer

On Sep 17, 11:43 am, Noah Roberts <[email protected]> wrote:

...
[snip good stuff]
If you can't join 'em, beat 'em.  Remember the Alamo.
Those people died fighting for what was right.

What about the people who should have been at the Alamo, but got
themselves killed in a silly bar fight a week before the battle?

Sometimes it's better to give up, even when you know you're right.

(Not always; next week I hope to throw out most of a coworker's code
from the past half year and replace it with something I wrote in a
couple of nights ... Not much he can do about it if (as I suspect) I
can beat the existing code WRT performance and stability.)

I think it would be more tactful to say:

next week I hope to replace most of a coworker's code
from the past half year with something I wrote in a
couple of nights ...
 
James

I think you may eventually find that people don't listen to logic or
reason. People make decisions and then come up with reasons to
support them. Then they trick themselves into thinking that they used
those reasons to make the decision they made. This is why no matter
how reasonable an argument you make, you simply cannot convince people
to your side most of the time... and why you can't be convinced most
of the time either.

A concise version of the above is my favorite Ben Goldacre quote.

"You cannot reason people out of a position that they did not reason
themselves into."

James
 
