substitute for string 0 termination

Felix Kater

The C-faq says that "The malloc/free implementation remembers the size
of each block allocated and returned, so it is not necessary to remind
it of the size when freeing."

Could that length information somehow be used as a substitute for
0-termination of strings?

Felix
 
John Valko

Felix said:
> The C-faq says that "The malloc/free implementation remembers the size
> of each block allocated and returned, so it is not necessary to remind
> it of the size when freeing."
>
> Could that length information somehow be used as a substitute for
> 0-termination of strings?
>
> Felix

The length that the implementation keeps track of is the size of the
block allocated by malloc() which, assuming success, will be at least
the size you requested. Of course, you can store strings which are
shorter than this without any trouble, and the length will be different.
Furthermore, the way the implementation decides to keep track of the
size of the block returned by malloc() is up to the implementation. So
any hack which you discover for reading the actual size allocated is
completely non-portable. You could, however, use a structure where the
length (of the string) is stored, and use that instead of a terminating
0. However, since the C library functions require 0 termination for
strings, you'll have a much more difficult time passing the string to
these functions. It would make sense, though, to keep track of the
length if you anticipate repetitive calls to strlen() when the length is
not changing, thus avoiding needless O(n) operations.
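
A minimal sketch of that idea, assuming a small wrapper that keeps both the cached length and the terminating 0 so the buffer can still be handed to the C library. The names counted_string, cs_init, cs_append and cs_free are hypothetical and not part of any standard library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: caches the length next to a 0-terminated buffer. */
struct counted_string {
    size_t len;   /* number of characters, excluding the terminating 0 */
    char  *data;  /* still 0-terminated, so <string.h> functions accept it */
};

/* Copy src into a new counted string; returns 0 on success, -1 on failure. */
int cs_init(struct counted_string *cs, const char *src)
{
    assert(cs != NULL && src != NULL);
    size_t len = strlen(src);           /* the only O(n) scan of src */
    char *data = malloc(len + 1);
    if (data == NULL)
        return -1;
    memcpy(data, src, len + 1);
    cs->len = len;
    cs->data = data;
    return 0;
}

/* Append without rescanning the destination: its end is at data + len. */
int cs_append(struct counted_string *cs, const char *tail)
{
    size_t tail_len = strlen(tail);
    char *grown = realloc(cs->data, cs->len + tail_len + 1);
    if (grown == NULL)
        return -1;                      /* cs is left unchanged on failure */
    memcpy(grown + cs->len, tail, tail_len + 1);
    cs->data = grown;
    cs->len += tail_len;
    return 0;
}

/* Release the buffer and reset the cached length. */
void cs_free(struct counted_string *cs)
{
    free(cs->data);
    cs->data = NULL;
    cs->len = 0;
}
```

Reading cs.len is then O(1), while cs.data can still be passed to strcmp(), printf("%s") and so on.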

Hope that helps,
--John
 
alexmdac

Yes, you could speed up length tests by storing the length before the
first character like this. The "fast" strings are still
zero-terminated, so they work with the CRT string functions.

char * makeFastString( char *regularString )
{
    size_t len, *p;
    len = strlen( regularString );
    p = malloc( len + 1 + sizeof( size_t ) );
    if( !p )
        return NULL;
    *p = len;
    strcpy( (char *)( p + 1 ), regularString );
    return (char *)( p + 1 );
}

size_t lengthOfFastString( char *fastString )
{
    size_t *p = (size_t *) fastString;
    return *( p - 1 );
}

void deleteFastString( char *fastString )
{
    size_t *p = (size_t *) fastString;
    free( p );
}
 
pete

> Yes, you could speed up length tests by storing the length before the
> first character like this. The "fast" strings are still
> zero-terminated, so they work with the CRT string functions.
>
> void deleteFastString( char *fastString )
> {
>     size_t *p = (size_t *) fastString;
>     free( p );
> }

I think

free(fastString);

is what you really want to use, instead of

deleteFastString(fastString);
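
As posted, deleteFastString() hands fastString itself to free(), so it does behave exactly like free(fastString). The pointer malloc() returned, however, lies one size_t before the string, and passing any other pointer to free() is undefined behaviour. A sketch of what was presumably intended, assuming the only goal is to step back to the start of the block; the const qualifiers and the NULL check are additions, not part of the original post.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocate [size_t length][characters][0] and return a pointer to the characters. */
char *makeFastString(const char *regularString)
{
    size_t len = strlen(regularString);
    size_t *p = malloc(sizeof(size_t) + len + 1);
    if (p == NULL)
        return NULL;
    *p = len;                                   /* length stored in front */
    memcpy(p + 1, regularString, len + 1);      /* copy including the 0 */
    return (char *)(p + 1);
}

/* Read the length stored immediately before the first character. */
size_t lengthOfFastString(const char *fastString)
{
    const size_t *p = (const size_t *)fastString;
    return p[-1];
}

/* Step back to the address malloc() returned before freeing. */
void deleteFastString(char *fastString)
{
    if (fastString != NULL)
        free((size_t *)fastString - 1);
}
```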
 
Mac

> The C-faq says that "The malloc/free implementation remembers the size
> of each block allocated and returned, so it is not necessary to remind
> it of the size when freeing."
>
> Could that length information somehow be used as a substitute for
> 0-termination of strings?
>
> Felix

In some hypothetical alternate universe, the committee responsible for
defining the C programming language could have opted to handle strings
differently.

In the universe we actually inhabit, the size of a block returned by
malloc can never be accessed directly by the programmer if the code is to
remain portable.

In any event, there are plenty of times when you don't want to malloc a
chunk of memory to be exactly the size of your string.

However, you could always define a struct wrapper for your strings
which remembers how big they are. See the untested fragment below.

struct my_string
{
    size_t size;
    char *s;
};

struct my_string *new_my_string(size_t size)
{
    struct my_string *new_string;

    new_string = malloc(sizeof *new_string);
    if (new_string == NULL)
        return NULL;
    new_string->s = malloc(size);
    if (new_string->s == NULL)
    {
        free(new_string);
        return NULL;
    }
    new_string->size = size;
    return new_string;
}

That kind of thing.
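
A sketch of how such a wrapper might be used, assuming size is the capacity of the buffer rather than the length of the text in it. The helpers set_my_string and free_my_string are hypothetical additions; new_my_string is repeated only so the example stands on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct my_string {
    size_t size;  /* capacity of s in bytes */
    char *s;
};

struct my_string *new_my_string(size_t size)
{
    struct my_string *new_string = malloc(sizeof *new_string);
    if (new_string == NULL)
        return NULL;
    new_string->s = malloc(size);
    if (new_string->s == NULL) {
        free(new_string);
        return NULL;
    }
    new_string->size = size;
    return new_string;
}

/* Hypothetical helper: copy text into ms if it fits, terminator included. */
int set_my_string(struct my_string *ms, const char *text)
{
    size_t needed = strlen(text) + 1;
    if (needed > ms->size)
        return -1;              /* too long; the old contents are kept */
    memcpy(ms->s, text, needed);
    return 0;
}

/* Hypothetical helper: release both allocations. */
void free_my_string(struct my_string *ms)
{
    if (ms != NULL) {
        free(ms->s);
        free(ms);
    }
}
```

The stored text can be much shorter than size, which is the point made above about not wanting the block to match the string exactly.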

--Mac
 
Andrey Tarasevich

Felix said:
> The C-faq says that "The malloc/free implementation remembers the size
> of each block allocated and returned, so it is not necessary to remind
> it of the size when freeing."
>
> Could that length information somehow be used as a substitute for
> 0-termination of strings?
> ...

No.

Firstly, the C standard library does not provide the user with any means
to access this information.

Secondly, memory allocation functions are not guaranteed to allocate
exactly the requested amount of memory. A successful allocation request
might allocate more memory than was actually requested, which is
perfectly legal in C.
 
Servé La

Andrey Tarasevich said:
> Firstly, the C standard library does not provide the user with any means
> to access this information.
>
> Secondly, memory allocation functions are not guaranteed to allocate
> exactly the requested amount of memory. A successful allocation request
> might allocate more memory than was actually requested, which is
> perfectly legal in C.

I understand this of course, but still I'd like to know why a function like
memsize can't be introduced in the std library. Even if more bytes are
allocated than requested, a program shouldn't invoke undefined behaviour
when those extra bytes are overwritten, right?
I think a portable function like this could be easily added to C, and it
would be used a lot considering how many people have asked for something
like this.
 
alexmdac

> I think a portable function like this could be easily added to C and it
> would be used a lot considering how many people have asked for something
> like this.

I usually add this myself, by allocating extra memory and placing a
small header before each memory chunk. The header has a field to store
the chunk size. I usually put debugging information in the header too
e.g. where and when the memory was allocated - invaluable for tracing
memory leaks!
 
Jens.Toerring

> I understand this of course, but still I'd like to know why a function like
> memsize can't be introduced in the std library. Maybe more bytes are
> allocated than requested, a program shouldn't invoke undefined behaviour
> when those extra bytes are overwritten right.
> I think a portable function like this could be easily added to C and it
> would be used a lot considering how many people have asked for something
> like this.

I guess it's not entirely clear if such a function could really be added
that easily. At least it would put another constraint on the implementors
of malloc() and friends which might have been a reason not to introduce
such a function (and many implementations where it's easy to introduce
already have such a function as an extension) - but that's probably better
asked in comp.std.c where the experts for these questions can be found.
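
One example of such an extension is glibc's malloc_usable_size(), declared in <malloc.h>. The sketch below assumes a glibc system and otherwise reports 0; the wrapper name usable_size_or_zero is hypothetical. The value is whatever the allocator happened to reserve, not the size that was requested, so it is of little use for anything beyond diagnostics.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#if defined(__GLIBC__)
#include <malloc.h>   /* malloc_usable_size(): a glibc extension, not standard C */
#endif

/* Hypothetical wrapper: usable size of a malloc()ed block, or 0 if unknown. */
size_t usable_size_or_zero(void *p)
{
#if defined(__GLIBC__)
    return malloc_usable_size(p);
#else
    (void)p;          /* no such extension assumed on other systems */
    return 0;
#endif
}

/* Print the requested size next to what the allocator reports. */
void show_usable_size(size_t requested)
{
    void *p = malloc(requested);
    if (p == NULL)
        return;
    printf("requested %lu, usable %lu\n",
           (unsigned long)requested,
           (unsigned long)usable_size_or_zero(p));
    free(p);
}
```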

I am also not convinced that it would reduce the number of questions
asked - they probably would be immediately replaced by questions like:
"Why does memsize() tell me the block is xyz bytes large when I only
asked for abc bytes?" where abc < xyz. I actually guess that most people
would like to have this functionality to be able to pass pointers to
functions without the need to also pass the size of what's pointed to.
But if that would be possible (by making the memsize() function not
return the real amount of allocated memory but the size requested
in the malloc() call) then this still wouldn't work for fixed sized
arrays - unless some size information is also stored with fixed sized
arrays that can be obtained using the same hypothetical memsize()
function. I.e. it would only really make sense if you could do

void func( int *x );

int main( void ) {
    int a[ 100 ], *b = malloc( 250 * sizeof *b );
    func( a );
    func( b );
    return 0;
}

void func( int *x )
{
    printf( "%lu\n", ( unsigned long ) ( memsize( x ) / sizeof *x ) );
}

But then the way fixed sized arrays are dealt with would have to be
changed fundamentally...
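
For contrast, a sketch of what works in C as it stands: the callee receives only a pointer, so the element count travels alongside it. The names sum and demo are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* The callee cannot recover the size from x, so the caller supplies it. */
long sum(const int *x, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += x[i];
    return total;
}

/* One fixed-size array and one malloc()ed block, both filled with ones. */
long demo(void)
{
    int a[100];
    size_t nb = 250;
    int *b = malloc(nb * sizeof *b);
    if (b == NULL)
        return -1;
    for (size_t i = 0; i < sizeof a / sizeof a[0]; i++)
        a[i] = 1;
    for (size_t i = 0; i < nb; i++)
        b[i] = 1;
    /* sizeof a still knows the array size here, because a has not decayed */
    long total = sum(a, sizeof a / sizeof a[0]) + sum(b, nb);
    free(b);
    return total;
}
```

Inside sum(), sizeof x would only give the size of a pointer, which is why the count has to be a separate argument.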

And, of course, the next logical step would be a malloc() variant with
type information of the allocated memory stored and another function
for determining this type with a "return value" that can be used in
casts or tested for etc., so that void pointers could be passed around
and type-agnostic functions can be written. But then you really would
make C a rather different language...

Regards, Jens
 
pete

Servé La said:
> I understand this of course, but still I'd like to know why a function like
> memsize can't be introduced in the std library. Maybe more bytes are
> allocated than requested, a program shouldn't invoke undefined behaviour
> when those extra bytes are overwritten right.

The ordinary way to use memory is to allocate it deliberately,
rather than to check and see if any unused memory has been left around.
C code shouldn't be writing beyond the size of the target type.
 
Richard Bos

> I usually add this myself, by allocating extra memory and placing a
> small header before each memory chunk. The header has a field to store
> the chunk size.

Beware - it is not trivial to do this and still ensure that the address
following the header is properly aligned for all types; in fact, it is
AFAIK impossible to do so portably.
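
For readers with a newer compiler: C11, which postdates this thread, added max_align_t to <stddef.h>, a type whose alignment is as strict as that of any fundamental type. A sketch under that assumption follows; the names chunk_header, sized_malloc, sized_size and sized_free are hypothetical, and types with extended alignment are still not covered.

```c
#include <assert.h>
#include <stddef.h>   /* max_align_t (C11) */
#include <stdint.h>   /* SIZE_MAX, uintptr_t */
#include <stdlib.h>

/* Hypothetical header: the union pads it to the strictest fundamental alignment. */
union chunk_header {
    size_t size;
    max_align_t align;   /* present only to force size and alignment */
};

/* Allocate size bytes preceded by a header recording the requested size. */
void *sized_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(union chunk_header))
        return NULL;                       /* the addition would overflow */
    union chunk_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;                          /* first byte after the header */
}

/* Requested size of a block obtained from sized_malloc(). */
size_t sized_size(const void *p)
{
    return ((const union chunk_header *)p - 1)->size;
}

/* Free a block obtained from sized_malloc(). */
void sized_free(void *p)
{
    if (p != NULL)
        free((union chunk_header *)p - 1);
}
```

Because sizeof(union chunk_header) is a multiple of the alignment of max_align_t, the address after the header keeps the alignment malloc() provided for fundamental types.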

Richard
 
Chris Croughton

Usenet has a `supersedes' feature that you could use.

However, most news servers I know of these days don't accept it, it's
too easily abused by attackers (cancels similarly). And once it's got
to a client it's unlikely that supersedes will do anything apart from
look like a new message...

Chris C
 
Michael Mair

Richard said:
> Beware - it is not trivial to do this and still ensure that the address
> following the header is properly aligned for all types; in fact, it is
> AFAIK impossible to do so portably.

Er, not so: You could allocate the double, triple, ... amount of the
memory and pass only the address of the last "element" (i.e. the one
with the right size towards the end of allocated memory). If the
overhead is large enough to store your "small header", maybe with a
safety "distance" from the data starting address, you can memcpy() the
header you created there and retrieve it from there via memcpy() to a
"small header" object. Usually, this will require the double amount of
requested memory and be quite inefficient. (Triple and so on only for
very small amounts of dynamically allocated memory).
The fun starts with realloc(); then, you can even decide to keep the
old overhead and realloc() the difference or whatever rocks your boat.

It's just not worth it :)


Cheers
Michael
 
Lawrence Kirby

> Er, not so: You could allocate the double, triple, ... amount of the
> memory and pass only the address of the last "element" (i.e. the one
> with the right size towards the end of allocated memory).

Which one is that, and how do you know whether it is properly aligned for
*any* C type?

Lawrence
 
Michael Mair

Lawrence said:
> Which one is that, and how do you know whether it is properly aligned for
> *any* C type?

Maybe there is an error in my train of thought, but I thought
about:
MALLOC_WRAPPER(size)
checking whether size is large enough for "small header+safety distance"
I assume it is, otherwise this goes with n*size/(n-1)*size:

So I call unsigned char *p = malloc(2*size), do all the checks, and give
the user only the address q = p + size. If the user gave us the result
of a sizeof operation (or strlen() call) times something, we should
now have the proper alignment at q.
I enter all the values into a struct small_header object and memcpy() it
to a fixed negative offset with respect to q. I cannot access it there,
of course, but I can memcpy() it to a struct small_header variable and
work with that.
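
A rough sketch of the scheme just described, under its stated restriction that size is a multiple of the alignment the caller needs. The names small_header, wrap_malloc, wrap_size and wrap_free are hypothetical, and requests too small to hold the header are simply refused here instead of being scaled up to n*size.

```c
#include <assert.h>
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>
#include <string.h>

/* Hypothetical bookkeeping record stored just below the user's address. */
struct small_header {
    size_t size;      /* size requested by the user */
};

/* Allocate 2*size bytes and hand out the second half. */
void *wrap_malloc(size_t size)
{
    if (size < sizeof(struct small_header) || size > SIZE_MAX / 2)
        return NULL;                       /* sketch only: no n*size fallback */
    unsigned char *p = malloc(2 * size);
    if (p == NULL)
        return NULL;
    unsigned char *q = p + size;           /* address given to the user */
    struct small_header h = { size };
    memcpy(q - sizeof h, &h, sizeof h);    /* header at a fixed negative offset */
    return q;
}

/* Copy the header out with memcpy() instead of reading it in place. */
size_t wrap_size(const void *q)
{
    struct small_header h;
    memcpy(&h, (const unsigned char *)q - sizeof h, sizeof h);
    return h.size;
}

/* Recover the address malloc() returned (q - size) and free it. */
void wrap_free(void *q)
{
    if (q != NULL)
        free((unsigned char *)q - wrap_size(q));
}
```

Going through memcpy() means the header itself never needs to be aligned; only q does, and that depends entirely on the caller's choice of size.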

C89:
We run into problems as soon as the user is not honest with us because
(s)he allocates "too much". Example: struct hack.
Our memory alignment requirements may be violated.

C99:
Here, we obviously cannot cope with flexible array members for much
the same reasons (which is why there are no arrays of structures
with f.a.m.)

So, if used correctly and within these restrictions (and with an
overallocating solution for struct hack/f.a.m. working essentially
along the above lines but in the other direction), these should work.

Have I forgotten something? Probably yes :)


Cheers
Michael
 
Chris Croughton

> Maybe there is an error in my train of thought, but I thought
> about:
> MALLOC_WRAPPER(size)
> checking whether size is large enough for "small header+safety distance"
> I assume it is, otherwise this goes with n*size/(n-1)*size: [...]
>
> C89:
> We run into problems as soon as the user is not honest with us because
> (s)he allocates "too much". Example: struct hack.
> Our memory alignment requirements may be violated.

That isn't dishonest, it's the only way to get a variable amount of
data. Indeed, you're doing a form of it yourself.

> C99:
> Here, we obviously cannot cope with flexible array members for much
> the same reasons (which is why there are no arrays of structures
> with f.a.m.)

For exactly the same reasons, it won't work, and this use is now blessed
by the Standard.

> So, if used correctly and within these restrictions (and with an
> overallocating solution for struct hack/f.a.m. working essentially
> along the above lines but in the other direction), these should work.

If your restrictions are sufficient, yes, a non-general solution can be
made to work. Another is to use:

union AlignStuff
{
    int i;
    long l;
    long long ll;
    void *vp;
} AS[2];

and whatever other types you know your system uses, and use

ptrdiff_t align = (char*)&AS[1] - (char*)&AS[0];

(because that is guaranteed to give the distance between two objects
aligned for the worst case, so allocate space for your overhead in
n*align bytes). You can include in that all of the types you know your
program uses, but make sure that if other types are added they are
included in the union as well (for instance, a pointer to long or to a
function might be a different alignment from pointer to void).

> Have I forgotten something? Probably yes :)

As I said a few weeks back here, it is possible to make a general
allocator which isn't portable (because it knows about system details),
or a portable one which isn't general (because it doesn't know those
details), but not one which is both portable and general within the C
language (if you can run a program to get information from the system or
interactively when installing, for instance from the GNU autoconf
configure script, then you can make it portable and general -- as long
as your program or script is itself portable!).

Question:

In my suggestion above, is (char*)&AS[1] - (char*)&AS[0] guaranteed to
be the same as sizeof(AS[0])? Where?
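
A sketch of the stride calculation and of rounding a header up to whole strides. Array elements are allocated contiguously, so the byte distance between &AS[0] and &AS[1] is sizeof AS[0], trailing padding included. The function names align_stride and padded_header_size are hypothetical, and the union lists only the types shown above.

```c
#include <assert.h>
#include <stddef.h>

/* Add any other types the program stores in allocated blocks. */
union AlignStuff {
    int i;
    long l;
    long long ll;
    void *vp;
};

static union AlignStuff AS[2];

/* Distance in bytes between two consecutive array elements. */
size_t align_stride(void)
{
    return (size_t)((char *)&AS[1] - (char *)&AS[0]);
}

/* Round a header size up to a whole number of strides. */
size_t padded_header_size(size_t header_bytes)
{
    size_t stride = align_stride();
    return (header_bytes + stride - 1) / stride * stride;
}
```

A wrapper would then request padded_header_size(sizeof header) + size bytes from malloc() and hand out the address that many bytes past the start of the block.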

Chris C
 
alexmdac

> I usually add this myself, by allocating extra memory and placing a
> small header before each memory chunk. The header has a field to store
> the chunk size. I usually put debugging information in the header too
> e.g. where and when the memory was allocated - invaluable for tracing
> memory leaks!

FWIW I have used this technique for a couple of years on a number of
different architectures, including games consoles which have odd
alignment requirements. I usually make the header 256 bytes and I've
never had any problems.

For those that haven't already seen it, here's the code to pass the
allocation location into the allocation function:

void * DbgAlloc( size_t bytes, const char *file, int line );

#define MALLOC(X) DbgAlloc( (X), __FILE__, __LINE__ )

It's a very useful technique which can easily be adapted for other
resources. The code's no doubt horribly non-standard compliant (like
most of the compilers I use) :)
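
A sketch of how DbgAlloc might be filled in. The poster's actual implementation is not shown in the thread, so this is a guess: the header layout and the helpers DbgSize, DbgReport and DbgFree are hypothetical, and the header is padded with C11's max_align_t rather than fixed at 256 bytes.

```c
#include <assert.h>
#include <stddef.h>   /* max_align_t (C11) */
#include <stdint.h>   /* SIZE_MAX */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical header: size plus the allocation site, padded for alignment. */
union DbgHeader {
    struct {
        size_t bytes;
        const char *file;   /* points at a __FILE__ string literal */
        int line;
    } info;
    max_align_t align;
};

void *DbgAlloc(size_t bytes, const char *file, int line)
{
    if (bytes > SIZE_MAX - sizeof(union DbgHeader))
        return NULL;
    union DbgHeader *h = malloc(sizeof *h + bytes);
    if (h == NULL)
        return NULL;
    h->info.bytes = bytes;
    h->info.file = file;
    h->info.line = line;
    return h + 1;           /* the caller sees only the memory after the header */
}

/* Requested size of a block obtained from DbgAlloc(). */
size_t DbgSize(const void *p)
{
    return ((const union DbgHeader *)p - 1)->info.bytes;
}

/* Print where a block was allocated, e.g. when hunting a leak. */
void DbgReport(const void *p, FILE *out)
{
    const union DbgHeader *h = (const union DbgHeader *)p - 1;
    fprintf(out, "%lu bytes allocated at %s:%d\n",
            (unsigned long)h->info.bytes, h->info.file, h->info.line);
}

void DbgFree(void *p)
{
    if (p != NULL)
        free((union DbgHeader *)p - 1);
}

#define MALLOC(X) DbgAlloc((X), __FILE__, __LINE__)
```

Blocks from MALLOC() must be released with DbgFree(), never with free() directly, since the pointer the caller holds is not the one malloc() returned.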
 
