C Style Strings

Earl Purple

SuperKoko said:
Even if it is popular in C, many C libraries use another way:
GString is a good example of a clean C interface:
http://developer.gnome.org/doc/API/glib/glib-strings.html#G-STRING-FREE

Instead of having to call "free", the user must call a library-provided
resource deallocating function.

That is necessary to prevent heap corruption if the library and the
client code use different heap managers.

If the client is written in C++ you can use boost::scoped_ptr (or
shared_ptr) with a custom deleter that calls the deallocating function
when appropriate. So you can add your own RAII.
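
A minimal sketch of that idea, with std::unique_ptr standing in for the Boost smart pointers. The lib_string type and the lib_string_create / lib_string_free functions are hypothetical stand-ins for a C library's interface (they are not GString's API), and live_objects exists only so the cleanup can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Hypothetical C-style library interface, invented for illustration.
struct lib_string { char *text; };

static int live_objects = 0;  // bookkeeping so the cleanup can be observed

lib_string *lib_string_create(const char *init)
{
    assert(init != nullptr);
    lib_string *s = static_cast<lib_string *>(std::malloc(sizeof(lib_string)));
    if (s == nullptr) return nullptr;
    s->text = static_cast<char *>(std::malloc(std::strlen(init) + 1));
    if (s->text == nullptr) { std::free(s); return nullptr; }
    std::strcpy(s->text, init);
    ++live_objects;
    return s;
}

void lib_string_free(lib_string *s)
{
    if (s == nullptr) return;
    std::free(s->text);
    std::free(s);
    --live_objects;
}

// Deleter that hands the object back to the library's own deallocation function.
struct lib_string_deleter {
    void operator()(lib_string *s) const { lib_string_free(s); }
};

using lib_string_ptr = std::unique_ptr<lib_string, lib_string_deleter>;

// Client code: no explicit call to lib_string_free is needed.
std::size_t text_length(const char *init)
{
    lib_string_ptr s(lib_string_create(init));
    return s ? std::strlen(s->text) : 0;
}   // s goes out of scope here and lib_string_free runs
```

The client never calls free or delete on the object, so the library's heap manager is the only one that touches it.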

I am not totally against libraries allocating resources that need to
be freed, but if they do so it should be obvious. For example, a
library will often provide a class factory. If I ever have my libraries
create anything in this way, the function name always begins with (or
is called) create.
Internally, using functions that work with any range of
user-provided memory (or iterators) is a good way to reuse code
efficiently.
However, in C, as in C++, it is good to encapsulate things and to
provide comfortable interfaces.

However, when a library is written in C it is often written for
relatively low-level programming. Often a function will require you to
provide a buffer and its length. It will never write past the end of the
buffer but will often flag you if the buffer you provided was inadequate. And
sometimes you can request ahead (at little cost) how big your buffer
needs to be.
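
The standard snprintf function is one example of this style: called with a null buffer and a size of 0, it only reports how many characters it would have written. A rough sketch of the two-step pattern follows; the format_value helper and the "%.3f" format are just illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper showing "ask for the size first, then supply the buffer".
std::string format_value(double value)
{
    // First call: no buffer at all, snprintf only reports the length needed.
    const int needed = std::snprintf(nullptr, 0, "%.3f", value);
    if (needed < 0) return std::string();  // encoding error

    // Second call: caller-owned buffer with room for the text plus '\0'.
    std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
    const int written = std::snprintf(buffer.data(), buffer.size(), "%.3f", value);
    assert(written == needed);

    return std::string(buffer.data(), static_cast<std::size_t>(written));
}
```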

C also has that wonderful technique of sticking a variable-length
buffer at the end of a structure thus:

struct some_type
{
header_info header;
char data[1];
};

but you malloc sizeof( header_info ) + actual_data_size (or safer
sizeof( some_type ) + data_size - 1 );

and then later on you call a "free" on the whole thing. It means that
you only have one "malloc" for everything. Beware that such a struct
cannot be safely copied though! (well if you do it will be sliced).
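
A sketch of how the allocation might look, using the "safer" size formula above. The contents of header_info and the helper names payload and make_some_type are assumptions made for the example. Strictly speaking, writing past data[1] relies on long-standing compiler behaviour rather than on anything the C++ standard promises, so treat this as an illustration of the C idiom.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct header_info { std::size_t length; };  // assumed header contents

struct some_type
{
    header_info header;
    char data[1];   // really the first byte of a longer trailing buffer
};

// Start of the trailing buffer, computed from the start of the allocation.
char *payload(some_type *p)
{
    return reinterpret_cast<char *>(p) + offsetof(some_type, data);
}

// One malloc covers the header and a copy of text, including its terminator.
some_type *make_some_type(const char *text)
{
    assert(text != nullptr);
    const std::size_t data_size = std::strlen(text) + 1;
    some_type *p = static_cast<some_type *>(
        std::malloc(sizeof(some_type) + data_size - 1));
    if (p == nullptr) return nullptr;
    p->header.length = data_size - 1;
    std::memcpy(payload(p), text, data_size);
    return p;   // release with a single std::free(p)
}
```

Copying such an object by value (`some_type copy = *p;`) only copies sizeof(some_type) bytes, which is the slicing problem mentioned above.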
 
Tomás

scroopy posted:
Hi,

I've always used std::string but I'm having to use a 3rd party library
that returns const char*s. Given:

char* pString1 = "Blah ";
const char* pString2 = "Blah Blah";

How do I append the contents of pString2 to pString1? (giving "Blah
Blah Blah")

I looked at strcat but that crashes with an unhandled exception.

Thanks


Maybe something like this.

Unchecked code:


void DestroyString( const char* const p )
{
delete [] p;
}

#include <cstddef>
#include <cstring>
#include <cstdlib>

char * const Concatenate( const char *a, const char *b )
{
std::size_t const len_a = std::strlen(a);
std::size_t const len_b = std::strlen(b);

char *p = new char[ len_a + len_b + 1 ];

std::memcpy( p, a, len_a );

std::memcpy( p + len_a, b, len_b + 1 );

return p;
}

#include <iostream>

int main()
{
char * const p = Concatenate("Hello ", "World!");

std::cout << p << '\n';

DestroyString(p);

std::system("PAUSE");
}
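
As for why strcat crashed in the original question: the most likely cause is that pString1 points at a string literal, which has no spare room after it and is usually stored in read-only memory, so strcat has nowhere valid to write. A small sketch of the same strcat call made against a writable buffer that is big enough (append_c_strings is a made-up name, and std::vector is just one convenient way to own the buffer):

```cpp
#include <cassert>
#include <cstring>
#include <vector>

std::vector<char> append_c_strings(const char *first, const char *second)
{
    assert(first != nullptr && second != nullptr);
    // Room for both strings plus one terminating '\0'.
    std::vector<char> buffer(std::strlen(first) + std::strlen(second) + 1);
    std::strcpy(buffer.data(), first);   // writable copy of the first string
    std::strcat(buffer.data(), second);  // now there is space to append into
    return buffer;
}
```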


-Tomás
 
Martin Ambuhl

kwikius said:
#include <malloc.h>
This is neither a C header nor a C++ header, so off-topic in both groups
to which you posted.
#include <cstring>
This is not a C header, so off-topic in one of the groups to which you
posted.

If you _must_ post to both newsgroups,
try to make your post topical in each. As it stands, your post is
topical in neither.

There is hardly ever any excuse for posting to both newsgroups; these
are concerned with two different languages, and advice given in posts to
both is almost certainly going to be wrong, or at least non-idiomatic,
in at least one of them.
 
Malcolm

kwikius said:
#include <malloc.h>
#include <cstring>

char* concat(const char * str1, const char* str2)
{
char * result = (char*) malloc(strlen( str1) + strlen (str2) + 1);
if( result != NULL){
strcpy(result,str1);
strcat(result, str2);
}
return result;
}
Perfectly unexceptional code.
It won't execute as efficiently as it might, but then most programs can
manipulate a string much faster than a human can read it, however
inefficiently written.

If we want we can do a speed-up

void fastconcat(char *out, char *str1, char *str2)
{
while(*str1)
*out++ = *str1++;
while(*str2)
*out++ = *str2++;
*out = 0;
}

This is a bit of a nuisance, since it throws the burden of memory allocation
onto the user; it is also rather dangerous, since we don't check the buffer.
But it will be very fast. That's the beauty of C, you can roll the function
to the problem you face.
#include <iostream>
#include <string>

char* pString1 = "Blah ";
const char* pString2 = "Blah Blah";


int main()
{
// C-style
char* str = concat(pString1,pString2);
if(str != NULL){
std::cout << str <<'\n';
free(str);
}

// C++ style
std::string str1=std::string(pString1) + pString2;
Ok what's going on here?
You have a string, and now you are calling what looks like a string
constructor to create another type of string. Why do you need two types of
string in the program? Do they behave differently when passed to cout? How
do I know that they will behave in the same way?
std::cout << str1 <<'\n';
}

I'm not sure if that is the optimal C method. It's interesting to note
how much better the C++ version is though!
So what's the big-O analysis of that '+' operation? Where is this
documented? What if I want to sacrifice a bit of safety for speed, as we did
with C? Can I overload the string '+' operator to achieve this?

Apologies to our friends on C++, but this was a provocative post.
 
kwikius

Martin said:
kwikius wrote:
This is neither a C header nor a C++ header, so off-topic in both groups
to which you posted.
Ok.

This is not a C header, so off-topic in one of the groups to which you
posted.
Ok.

If you _must_ post to both newsgroups, try to make your post topical in each. As it stands, your post is
topical in neither.

The code in the post attempts to highlight the different problems of
dealing with resource management in both languages. Having to deal
manually with resources and having to check results of functions for
validity are both sources of additional code complexity in C it seems.
Maybe there are plans to address this situation in the next version of
the C standard?
There is hardly ever any excuse for posting to both newsgroups; these
are concerned with two different languages, and advice given in posts to
both is almost certainly going to be wrong, or at least non-idiomatic,
in at least one of them.

If I have given incorrect advice, I apologise. FWIW I certainly don't
advocate use of malloc or C-style strings or manual memory management.
IOW I advocate use of C++ over C. C holds no advantage whatever. The
concat function shows very neatly why it is best to avoid C-style
strings in C++. In C it seems that it is possible to do better though
there seems to be no standard higher level string library. Maybe there
are plans to address this situation in the next version of the C
standard?

Whatever... Happy coding!

regards
Andy Little
 
tedu

kwikius said:
char* pString1 = "Blah ";
const char* pString2 = "Blah Blah";


int main()
{
// C-style
char* str = concat(pString1,pString2);
if(str != NULL){
std::cout << str <<'\n';
free(str);
}

// C++ style
std::string str1=std::string(pString1) + pString2;
std::cout << str1 <<'\n';
}

I'm not sure if that is the optimal C method. It's interesting to note
how much better the C++ version is though!

yeah, it's always better when programs randomly drop
"terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc
Aborted"
messages on the screen and then stop working.
 
santosh

.... snip ...
The code in the post attempts to highlight the different problems of
dealing with resource management in both languages. Having to deal
manually with resources and having to check results of functions for
validity are both sources of additional code complexity in C it seems.
Maybe there are plans to address this situation in the next version of
the C standard?

I guess you want garbage collection and exceptions support. Both impose
run-time overhead. Increasingly, C is used in embedded programming
where both might be unfeasible and unnecessary. I don't think there is
much chance that they will be standardised.

Third-party garbage collectors are available for C. But if you really
want these features built into the language, maybe you should consider
Java?
If I have given incorrect advice, I apologise. FWIW I certainly don't
advocate use of malloc or C-style strings or manual memory management.

Good for you.
IOW I advocate use of C++ over C. C holds no advantage whatever.

It depends on what you're trying to do. Sweeping generalisations aren't
correct.
The concat function shows very neatly why it is best to avoid C-style
strings in C++.

Indeed. If you decide to program in C++, then you should program in
C++.
In C it seems that it is possible to do better though
there seems to be no standard higher level string library.

Yes, anyone who wants an abstract string library has to either roll his
own or use pre-existing ones. The former case, especially, allows one
to optimise for their specific requirements, though I don't think the
code will be significantly better than std::string.
Maybe there are plans to address this situation in the next version of the C
standard?

I doubt it. Even the next revision of the standard is a minimum of 3-4
years away, and the standard committee have always resisted turning C
into another C++/Java wannabe.
 
Chris Smith

kwikius said:
In C it seems that it is possible to do better though
there seems to be no standard higher level string library. Maybe there
are plans to address this situation in the next version of the C
standard?

One very big difference between C and some other languages (say, Java
and C# and VB, for example) is that there is no big organization that's
got billions of dollars invested in making C the next Big Thing. As a
result, the language is not as large or complex, and far more stable,
than some of these other languages. I doubt you'll see future versions
of C making really huge changes in the language or APIs to conform with
higher-level programming goals. After all, if you wanted this you
wouldn't use C, and the standards organization doesn't particularly care
if you use C or not.

Short answer: probably not.
 
kwikius

santosh said:
kwikius wrote:
... snip ...


I guess you want garbage collection and exceptions support. Both impose
run-time overhead. Increasingly, C is used in embedded programming
where both might be unfeasible and unnecessary. I don't think there is
much chance that they will be standardised.

Yes. C++ claims to be as useable as C in embedded systems, but I have
heard that use of exceptions rather than error codes causes problems.
There is a marginal effect on performance but AFAIK the larger problems
are due to the latency involved in unwinding the stack as well as
apparently extra memory use for the exception handling code. I guess
that the different style of error handling also causes major problems
with integration. There is probably also an element of sticking with
the way things have always been done. AFAIK most compilers can turn
off exceptions, though I think this is non-standard even in the
so-called free-standing C++ implementations.
Third-party garbage collectors are available for C. But if you really
want these features built into the language, maybe you should consider
Java?

I kind of like Java Swing.
Good for you.


It depends on what you're trying to do. Sweeping generalisations aren't
correct.

Actually after posting I am pretty sure that the C code will be faster.
Creating a C++ string probably involves an allocation. The C++ string
concat operator(+) may also involve an allocation, whereas the C code
only has one allocation. That is the downside of automated resource
management.
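
For what it's worth, the number of allocations on the C++ side can usually be kept down by reserving the final size up front. A sketch, assuming a typical std::string implementation (the standard does not promise an exact allocation count, and short results may not touch the heap at all); concat_reserved is a made-up name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

std::string concat_reserved(const char *a, const char *b)
{
    assert(a != nullptr && b != nullptr);
    const std::size_t len_a = std::strlen(a);
    const std::size_t len_b = std::strlen(b);

    std::string result;
    result.reserve(len_a + len_b);  // ask for the final capacity once
    result.append(a, len_a);        // neither append needs to grow the buffer
    result.append(b, len_b);
    return result;
}
```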
Indeed. If you decide to program in C++, then you should program in
C++.

Well I like other languages too. I like the platform independent spirit
of Java and its GUI support, but I guess I would miss C++.

regards
Andy Little
 
websnarf

These guys are really great at keeping the snow out of their cave, even
if they don't realize that it's part of an avalanche on top of them.
The code in the post attempts to highlight the different problems of
dealing with resource management in both languages. Having to deal
manually with resources and having to check results of functions for
validity are both sources of additional code complexity in C it seems.
Maybe there are plans to address this situation in the next version of
the C standard?

The C standard is not something where people try to actually address
actual real world problems. You can look at their own manifesto --
they claim to "endorse standard practice" and things along those lines.
So in a sense they *endorse* all the problems with the C language, so
long as it is standard practice. Hence the continuing presence of
"gets" in the library.

C++ obviously goes a long way to addressing resources and error
handling in a useful way (RAII and real exception handling) however it
is not a garbage collecting language and thus it will always take a
little more effort to program in it properly. And of course C leaves
the whole concept of construction and destruction up to the programmer.
If I have given incorrect advice, I apologise. FWIW I certainly don't
advocate use of malloc or C-style strings or manual memory management.
IOW I advocate use of C++ over C. C holds no advantage whatever.

Well hang on -- this is precisely where you can make an argument for C
over C++. In C since you are forced to do everything by hand, you have
the advantage of being able to do everything by hand. For example, you
can use local stack-based memory to back certain allocations if you know
that the lifetime of the resource is equal to the lifetime of the
function call. In C++ you can hope your compiler can figure it out; if
not it will use new/delete which eventually falls back to malloc/free
which is hundreds of times slower.
[...] The
concat function shows very neatly why it is best to avoid C-style
strings in C++. In C it seems that it is possible to do better though
there seems to be no standard higher level string library. Maybe there
are plans to address this situation in the next version of the C
standard?

Take a look at http://bstring.sf.net/ . I claim that even just the C
API is generally better than C++'s std::string, or Microsoft's CString
classes. But it includes a C++ API as well which should make everyone
happy.
 
Noah Roberts

C++ obviously goes a long way to addressing resources and error
handling in a useful way (RAII and real exception handling) however it
is not a garbage collecting language and thus it will always take a
little more effort to program in it properly.

RAII is not possible in a garbage collected language. Garbage
collection can, in many real world cases, add more complexity than it
is meant to solve.
Well hang on -- this is precisely where you can make an argument for C
over C++. In C since you are forced to do everything by hand, you have
the advantage of being able to do everything by hand. For example, you
can use local stack-based memory to back certain allocations if you know
that the lifetime of the resource is equal to the lifetime of the
function call. In C++ you can hope your compiler can figure it out; if
not it will use new/delete which eventually falls back to malloc/free
which is hundreds of times slower.

That statement about C++ is simply incorrect; I can't even imagine
where it is coming from.
 
Andrew Poelstra

That statement about C++ is simply incorrect; I can't even imagine
where it is coming from.

I imagine that it comes from a basic understanding of stack-based memory.
 
Andrew Poelstra

How do you figure?

If you have stack-based memory, any memory allocated will be deallocated
by the hardware by definition when the memory is popped.

Without stack-based memory, malloc() and free() must be called (on a lower
level than C++ lets you see), which leaves it to the OS to allocate and
free the memory.

Obviously, deallocating memory as a side effect of simply popping a stack is
many times faster than calling a function and letting the OS sort it out.
 
Chris Smith

Andrew Poelstra said:
I imagine that it comes from a basic understanding of stack-based memory.

I don't believe the complaint was about stack memory. It was about the
incorrect statement regarding C++. The same statement may be considered
valid concerning Java, C#, VB, or C++/CLI, for example; but those are
different languages from C++. (The word "valid" should be taken lightly
there; I haven't verified the hundreds of times.)

C++ perfectly well allows programmers to allocate any "objects" (not
quite, really, since they don't own their identity so they are a sort of
2/3-object... but in C++ vocab they are objects) on the stack, with all
the accompanying performance benefits.
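
A small sketch of what that looks like in practice. SmallString and joined_length are hypothetical names, and the 32-byte capacity is arbitrary; the point is only that the object and its character storage are ordinary local variables, with no new/delete or malloc/free involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical fixed-capacity string: its storage is part of the object itself.
struct SmallString {
    char data[32];
    std::size_t length;
};

std::size_t joined_length(const char *a, const char *b)
{
    assert(a != nullptr && b != nullptr);
    SmallString s{};  // automatic storage, set up on entry to the function

    // snprintf never writes more than sizeof s.data bytes; its return value
    // is the length the full result would have had.
    const int n = std::snprintf(s.data, sizeof s.data, "%s%s", a, b);
    if (n < 0) return 0;

    s.length = static_cast<std::size_t>(n) < sizeof s.data
                   ? static_cast<std::size_t>(n)
                   : sizeof s.data - 1;  // result was truncated to fit
    return s.length;
}   // s is released here simply by leaving the function
```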
 
Noah Roberts

Chris said:
I don't believe the complaint was about stack memory. It was about the
incorrect statement regarding C++. The same statement may be considered
valid concerning Java, C#, VB, or C++/CLI, for example; but those are
different languages from C++. (The word "valid" should be taken lightly
there; I haven't verified the hundreds of times.)

C++ perfectly well allows programmers to allocate any "objects" (not
quite, really, since they don't own their identity so they are a sort of
2/3-object... but in C++ vocab they are objects) on the stack, with all
the accompanying performance benefits.

Yes, that is what I was talking about. C++ and C do not differ in the
way it was being stated.
 
Tomás

Malcolm posted:

If we want we can do a speed-up

void fastconcat(char *out, char *str1, char *str2)
{
while(*str1)
*out++ = *str1++;
while(*str2)
*out++ = *str2++;
*out = 0;
}


I'd say that has the potential to "run slower" than a version which uses
strcpy. A system would be more efficient copying ints than chars, so
strcpy could be implemented platform-specifically as something like:

(Unchecked code:)

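inline bool NoByteZero(unsigned const v)
{
// Assumes a 32-bit unsigned. Every bit ends up set only if no byte of v is zero.
return ( ( ( (v & 0x7F7F7F7Fu) + 0x7F7F7F7Fu ) | v ) | 0x7F7F7F7Fu ) == 0xFFFFFFFFu;
}

void strcpy(char *pdest, const char *psource)
{
unsigned *idest = reinterpret_cast<unsigned*>(pdest);
const unsigned *isource = reinterpret_cast<const unsigned*>(psource);

// Copy a whole word at a time while the source word contains no '\0'.
for ( ; NoByteZero(*isource); *idest++ = *isource++);

char *cdest = reinterpret_cast<char*>(idest);
const char *csource = reinterpret_cast<const char*>(isource);

// Finish the last few characters, including the terminator, one by one.
while( *cdest++ = *csource++ );
}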

(This code makes the presumption that on the given platform, it's okay to
access memory which isn't yours.)


-Tomás
 
kwikius

Malcolm said:
void fastconcat(char *out, char *str1, char *str2)
{
while(*str1)
*out++ = *str1++;
while(*str2)
*out++ = *str2++;
*out = 0;
}

This is a bit of a nuisance, since it throws the burden of memory allocation
onto the user; it is also rather dangerous, since we don't check the buffer.

You are joking, right?
But it will be very fast. That's the beauty of C, you can roll the function
to the problem you face.

The whole point is that you can't. C doesn't give you the tools.

[..]
Ok what's going on here?
You have a string, and now you are calling what looks like a string
constructor to create another type of string.

It is the same type.

Why do you need two types of
string in the program? Do they behave differently when passed to cout? How
do I know that they will behave in the same way?

They are the same type
So what's the big-O analysis of that '+' operation? Where is this
documented?

It's part of the C++ standard library.

What if I want to sacrifice a bit of safety for speed, as we did
with C? Can I overload the string '+' operator to achieve this?

Sure, as long as it doesn't clash with overloads defined in the C++
standard.
Apologies to our friends on C++, but this was a provocative post.

Sorry if the post was provocative. C is a wonderful language and I will
have to get back to it some time.

regards
Andy Little
 
Barry Schwarz

Perfectly unexceptional code.
It won't execute as efficiently as it might, but then most programs can
manipulate a string much faster than a human can read it, however
inefficiently written.

If we want we can do a speed-up

void fastconcat(char *out, char *str1, char *str2)
{
while(*str1)
*out++ = *str1++;
while(*str2)
*out++ = *str2++;
*out = 0;
}
Why do you believe that manually stepping through each character will
be faster when strcpy and strcat can take advantage of any CISC
instructions the hardware might offer?


Remove del for email
 
Jerry Coffin

[ ... ]

[ ... ]
Why do you believe that manually stepping through each character will
be faster when strcpy and strcat can take advantage of any CISC
instructions the hardware might offer?

The first method steps through the first string once (in
strcpy) to copy it, and then again (in strcat) to find
its end, before concatenating the second string onto it.

His method avoids stepping through the first string the
second time.
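
The same saving is available without giving up the library routines: the length of the first string is already known from sizing the allocation, so the second string can be copied straight to that offset. A sketch along the lines of the concat function quoted earlier (concat_once is a made-up name):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

char *concat_once(const char *str1, const char *str2)
{
    assert(str1 != nullptr && str2 != nullptr);
    const std::size_t len1 = std::strlen(str1);  // each input is measured once
    const std::size_t len2 = std::strlen(str2);

    char *result = static_cast<char *>(std::malloc(len1 + len2 + 1));
    if (result != nullptr) {
        std::memcpy(result, str1, len1);             // no terminator needed yet
        std::memcpy(result + len1, str2, len2 + 1);  // append at the known end
    }
    return result;  // caller releases with std::free
}
```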
 
