substitute for string 0 termination

Felix Kater · Jan 25, 2005

The C-faq says that "The malloc/free implementation remembers the size
of each block allocated and returned, so it is not necessary to remind
it of the size when freeing."

Could that length information somehow be used as a substitute for
0-termination of strings?

Felix

John Valko · Jan 25, 2005

Felix said:
> The C-faq says that "The malloc/free implementation remembers the size
> of each block allocated and returned, so it is not necessary to remind
> it of the size when freeing."
>
> Could that length information somehow be used as a substitute for
> 0-termination of strings?
>
> Felix

The length that the implementation keeps track of is the size of the
block allocated by malloc() which, assuming success, will be at least
the size you requested. Of course, you can store strings which are
shorter than this without any trouble, and the length will be different.
Furthermore, the way the implementation decides to keep track of the
size of the block returned by malloc() is up to the implementation. So
any hack which you discover for reading the actual size allocated is
completely non-portable. You could, however, use a structure where the
length (of the string) is stored, and use that instead of a terminating
0. However, since the C library functions require 0 termination for
strings, you'll have a much more difficult time passing the string to
these functions. It would make sense, though, to keep track of the
length if you anticipate repetitive calls to strlen() when the length is
not changing, thus avoiding needless O(n) operations.

Hope that helps,
--John
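
A minimal sketch of the length-tracking structure described above. The names `counted_str`, `cs_from` and `cs_append` are hypothetical and do not come from the thread. The buffer stays 0-terminated so it can still be passed to the standard library, while the stored length takes the place of repeated strlen() calls on the same string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: len caches what strlen(data) would return. */
struct counted_str {
    size_t len;   /* number of characters, excluding the terminating 0 */
    char *data;   /* always 0-terminated, so library functions still work */
};

/* Build a counted string from an ordinary C string; returns 0 on success. */
int cs_from(struct counted_str *cs, const char *src)
{
    size_t n = strlen(src);          /* the only length scan of src */
    char *buf = malloc(n + 1);
    if (buf == NULL)
        return -1;
    memcpy(buf, src, n + 1);         /* copies the terminating 0 too */
    cs->len = n;
    cs->data = buf;
    return 0;
}

/* Append without rescanning the destination: its end is at data + len. */
int cs_append(struct counted_str *cs, const char *src)
{
    size_t n = strlen(src);
    char *buf = realloc(cs->data, cs->len + n + 1);
    if (buf == NULL)
        return -1;                   /* the original string is left untouched */
    memcpy(buf + cs->len, src, n + 1);
    cs->data = buf;
    cs->len += n;
    return 0;
}
```

A caller would initialise a `struct counted_str` with cs_from(), read `len` wherever strlen() would otherwise be called, and free() the `data` member when done.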

alexmdac · Jan 25, 2005

Yes, you could speed up length tests by storing the length before the
first character like this. The "fast" strings are still
zero-terminated, so they work with the CRT string functions.

#include <stdlib.h>
#include <string.h>

/* Block layout: [size_t length][characters... 0] */
char * makeFastString( const char *regularString )
{
    size_t len, *p;
    len = strlen( regularString );
    p = malloc( len + 1 + sizeof( size_t ) );
    if( !p )
        return NULL;
    *p = len;
    strcpy( (char *)( p + 1 ), regularString );
    return (char *)( p + 1 );
}

size_t lengthOfFastString( char *fastString )
{
    size_t *p = (size_t *) fastString;
    return *( p - 1 );
}

void deleteFastString( char *fastString )
{
    size_t *p = (size_t *) fastString;
    free( p );
}

alexmdac · Jan 25, 2005

Oops typo: free( p ) should be free( p - 1 )
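
For reference, a sketch of the three functions with that correction applied. The function names are the ones from the post; the const qualifiers, the use of memcpy() and the NULL check in the delete function are additions of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Block layout: [size_t length][characters ... 0]. The caller only ever
   sees a pointer to the first character. */
char *makeFastString(const char *regularString)
{
    size_t len = strlen(regularString);
    size_t *p = malloc(sizeof *p + len + 1);
    if (p == NULL)
        return NULL;
    *p = len;                                /* length lives in front */
    memcpy(p + 1, regularString, len + 1);   /* includes the terminating 0 */
    return (char *)(p + 1);
}

size_t lengthOfFastString(const char *fastString)
{
    const size_t *p = (const size_t *)fastString;
    return p[-1];                            /* step back to the stored length */
}

void deleteFastString(char *fastString)
{
    /* free() must receive the pointer malloc() returned, which is one
       size_t before the characters. */
    if (fastString != NULL)
        free((size_t *)fastString - 1);
}
```

A string made this way must only be released with deleteFastString(); passing it straight to free() would hand free() a pointer that malloc() never returned.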

pete · Jan 26, 2005

alexmdac said:
> Yes, you could speed up length tests by storing the length before the
> first character like this. The "fast" strings are still
> zero-terminated, so they work with the CRT string functions.
>
> void deleteFastString( char *fastString )
> {
>     size_t *p = (size_t *) fastString;
>     free( p );
> }

I think

free(fastString);

is what you really want to use, instead of

deleteFastString(fastString);

alexmdac · Jan 26, 2005

Yeah there was a typo. Unfortunately you can't edit posts here.

Ben Pfaff · Jan 26, 2005

alexmdac said:
> Yeah there was a typo. Unfortunately you can't edit posts here.

Usenet has a `supersedes' feature that you could use.

Mac · Jan 26, 2005

Felix said:
> The C-faq says that "The malloc/free implementation remembers the size
> of each block allocated and returned, so it is not necessary to remind
> it of the size when freeing."
>
> Could that length information somehow be used as a substitute for
> 0-termination of strings?
>
> Felix

In some hypothetical alternate universe, the committee responsible for
defining the C programming language could have opted to handle strings
differently.

In the universe we actually inhabit, the size of a block returned by
malloc can never be accessed directly by the programmer if the code is to
remain portable.

In any event, there are plenty of times when you don't want to malloc a
chunk of memory to be exactly the size of your string.

However, you could always define a struct wrapper for your strings
which remembers how big they are. See the untested fragment below.

#include <stdlib.h>

struct my_string
{
    size_t size;    /* number of bytes allocated for s */
    char *s;
};

struct my_string *new_my_string(size_t size)
{
    struct my_string *new_string;

    new_string = malloc(sizeof *new_string);
    if (new_string == NULL)
        return NULL;
    new_string->s = malloc(size);
    if (new_string->s == NULL)
    {
        free(new_string);
        return NULL;
    }
    new_string->size = size;
    return new_string;
}

That kind of thing.

--Mac

Andrey Tarasevich · Jan 26, 2005

Felix said:
> The C-faq says that "The malloc/free implementation remembers the size
> of each block allocated and returned, so it is not necessary to remind
> it of the size when freeing."
>
> Could that length information somehow be used as a substitute for
> 0-termination of strings?
> ...

No.

Firstly, the C standard library does not provide the user with any means
to access this information.

Secondly, memory allocation functions are not guaranteed to allocate
exactly the requested amount of memory. A successful allocation request
might allocate more memory than was actually requested, which is
perfectly legal in C.
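
Both points can be seen on implementations that expose the block size as an extension. The sketch below assumes glibc, whose non-standard malloc_usable_size() is declared in <malloc.h>. The wrapper name `usable_size_or_zero` is hypothetical, and on any other C library this sketch simply reports 0 because it has no way to ask.

```c
#include <stdlib.h>
#if defined(__GLIBC__)
#include <malloc.h>   /* non-standard header declaring malloc_usable_size() */
#endif

/* Hypothetical helper: usable size of a malloc'd block where the glibc
   extension is available, 0 where this sketch cannot find out. */
size_t usable_size_or_zero(void *block)
{
#if defined(__GLIBC__)
    return malloc_usable_size(block);   /* may exceed the requested size */
#else
    (void)block;
    return 0;
#endif
}
```

Asking for 100 bytes and then querying the block will typically report a somewhat larger figure under glibc, which is exactly why the stored size cannot stand in for a string length.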

Servé La · Jan 26, 2005

Andrey Tarasevich said:
> Firstly, the C standard library does not provide the user with any means
> to access this information.
> Secondly, memory allocation functions are not guaranteed to allocate
> exactly the requested amount of memory. A successful allocation request
> might allocate more memory than was actually requested, which is
> perfectly legal in C.

I understand this of course, but still I'd like to know why a function like
memsize can't be introduced in the std library. Maybe more bytes are
allocated than requested, but a program shouldn't invoke undefined behaviour
when those extra bytes are overwritten, right?
I think a portable function like this could be easily added to C and it
would be used a lot considering how many people have asked for something
like this.

alexmdac · Jan 26, 2005

Servé La said:
> I think a portable function like this could be easily added to C and it
> would be used a lot considering how many people have asked for something
> like this.

I usually add this myself, by allocating extra memory and placing a
small header before each memory chunk. The header has a field to store
the chunk size. I usually put debugging information in the header too
e.g. where and when the memory was allocated - invaluable for tracing
memory leaks!

Jens.Toerring · Jan 26, 2005

Servé La said:
> I understand this of course, but still I'd like to know why a function like
> memsize can't be introduced in the std library. Maybe more bytes are
> allocated than requested, but a program shouldn't invoke undefined behaviour
> when those extra bytes are overwritten, right?
> I think a portable function like this could be easily added to C and it
> would be used a lot considering how many people have asked for something
> like this.

I guess it's not entirely clear if such a function could really be added
that easily. At least it would put another constraint on the implementors
of malloc() and friends which might have been a reason not to introduce
such a function (and many implementations where it's easy to introduce
already have such a function as an extension) - but that's probably better
asked in comp.std.c where the experts for these questions can be found.

I am also not convinced that it would reduce the number of questions
asked - they probably would be immediately replaced by questions like:
"Why does memsize() tell me the block is xyz bytes large when I only
asked for abc bytes?" where abc < xyz. I actually guess that most people
would like to have this functionality to be able to pass pointers to
functions without the need to also pass the size of what's pointed to.
But if that would be possible (by making the memsize() function not
return the real amount of allocated memory but the size requested
in the malloc() call) then this still wouldn't work for fixed sized
arrays - unless some size information is also stored with fixed sized
arrays that can be obtained using the same hypothetical memsize()
function. I.e. it would only really make sense if you could do

#include <stdio.h>
#include <stdlib.h>

void func( int *x );

int main( void ) {
    int a[ 100 ], *b = malloc( 250 * sizeof *b );
    func( a );
    func( b );
    return 0;
}

void func( int *x )
{
    /* memsize() is the hypothetical function under discussion */
    printf( "%lu\n", ( unsigned long ) ( memsize( x ) / sizeof *x ) );
}

But then the way fixed sized arrays are dealt with would have to be
changed fundamentally...

And, of course, the next logical step would be a malloc() variant with
type information of the allocated memory stored and another function
for determining this type with a "return value" that can be used in
casts or tested for etc., so that void pointers could be passed around
and type-agnostic functions can be written. But then you really would
make C a rather different language...

Regards, Jens
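
What C code can do today is carry the count next to the pointer explicitly. A minimal sketch, assuming C99 compound literals; the names `int_span`, `SPAN_OF_ARRAY` and `span_sum` are hypothetical and not part of any standard library.

```c
#include <stddef.h>

/* Hypothetical "pointer plus count" bundle; C does not provide this itself. */
struct int_span {
    int *data;
    size_t count;   /* number of elements, not bytes */
};

/* Only valid on a real array, never on a pointer: sizeof must see the
   whole array for the division to give the element count. */
#define SPAN_OF_ARRAY(a) ((struct int_span){ (a), sizeof (a) / sizeof (a)[0] })

/* The callee receives the size together with the pointer. */
long span_sum(struct int_span s)
{
    long total = 0;
    size_t i;
    for (i = 0; i < s.count; i++)
        total += s.data[i];
    return total;
}
```

For a fixed-size array the macro fills in the count at compile time. For malloc()ed memory the caller has to fill in the count by hand, which is the bookkeeping a memsize() would supposedly remove.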

pete · Jan 26, 2005

Servé La said:
> I understand this of course, but still I'd like to know why a function like
> memsize can't be introduced in the std library. Maybe more bytes are
> allocated than requested, but a program shouldn't invoke undefined behaviour
> when those extra bytes are overwritten, right?

The ordinary way to use memory is to allocate it deliberately,
rather than to check and see if any unused memory has been left around.
C code shouldn't be writing beyond the size of the target type.

Richard Bos · Jan 26, 2005

alexmdac said:
> I usually add this myself, by allocating extra memory and placing a
> small header before each memory chunk. The header has a field to store
> the chunk size.

Beware - it is not trivial to do this and still ensure that the address
following the header is properly aligned for all types; in fact, it is
AFAIK impossible to do so portably.

Richard
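
That holds for C89 and C99, the standards current when this was written. C11 later added max_align_t to <stddef.h>, which makes a header aligned for every fundamental type expressible. A minimal sketch assuming a C11 compiler; the names `sized_header`, `sized_malloc`, `sized_memsize` and `sized_free` are hypothetical, and types with extended (over-aligned) requirements are still not covered.

```c
#include <stddef.h>   /* max_align_t, C11 */
#include <stdlib.h>

/* Hypothetical header. The union pads it to the strictest fundamental
   alignment, so the bytes right after it are aligned as well as the
   pointer malloc() itself returned. */
union sized_header {
    size_t requested;
    max_align_t force_align;
};

void *sized_malloc(size_t bytes)
{
    union sized_header *h;
    if (bytes > (size_t)-1 - sizeof *h)
        return NULL;                 /* request would overflow size_t */
    h = malloc(sizeof *h + bytes);
    if (h == NULL)
        return NULL;
    h->requested = bytes;
    return h + 1;                    /* first byte after the header */
}

/* Reports the size that was requested, not the size actually allocated. */
size_t sized_memsize(const void *block)
{
    return ((const union sized_header *)block - 1)->requested;
}

void sized_free(void *block)
{
    if (block != NULL)
        free((union sized_header *)block - 1);
}
```

Blocks obtained from sized_malloc() must be released with sized_free(), never with free() directly.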

Chris Croughton · Jan 26, 2005

Ben Pfaff said:
> Usenet has a `supersedes' feature that you could use.

However, most news servers I know of these days don't accept it; it's
too easily abused by attackers (cancels similarly). And once it's got
to a client it's unlikely that supersedes will do anything apart from
look like a new message...

Chris C

Michael Mair · Jan 26, 2005

Richard said:
> Beware - it is not trivial to do this and still ensure that the address
> following the header is properly aligned for all types; in fact, it is
> AFAIK impossible to do so portably.

Er, not so: You could allocate double, triple, ... the amount of
memory and pass only the address of the last "element" (i.e. the one
with the right size towards the end of allocated memory). If the
overhead is large enough to store your "small header", maybe with a
safety "distance" from the data starting address, you can memcpy() the
header you created there and retrieve it from there via memcpy() to a
"small header" object. Usually, this will require double the
requested memory and be quite inefficient. (Triple and so on only for
very small amounts of dynamically allocated memory).
The fun starts with realloc(); then, you can even decide to keep the
old overhead and realloc() the difference or whatever rocks your boat.

It's just not worth it.

Cheers
Michael

Lawrence Kirby · Jan 26, 2005

Michael Mair said:
> Er, not so: You could allocate double, triple, ... the amount of
> memory and pass only the address of the last "element" (i.e. the one
> with the right size towards the end of allocated memory).

Which one is that, and how do you know whether it is properly aligned for
*any* C type?

Lawrence

Michael Mair · Jan 26, 2005

Lawrence said:
> Which one is that, and how do you know whether it is properly aligned for
> *any* C type?

Maybe there is an error in my train of thought, but I thought
about:
MALLOC_WRAPPER(size)
checking whether size is large enough for "small header+safety distance".
I assume it is; otherwise the same idea works with n*size allocated and an
offset of (n-1)*size:

So I set unsigned char *p=malloc(2*size), do all the checks and give the
user only the address q=p+size. If the user did give us the return value
of a sizeof operation (or strlen() call) times something, we should
now have the proper alignment at q.
I enter all the values into a struct small_header object and memcpy() it
to a fixed negative offset with respect to q. I cannot access it there,
of course, but I can memcpy() it to a struct small_header variable and
work with that.

C89:
We run into problems as soon as the user is not honest with us because
(s)he allocates "too much". Example: struct hack.
Our memory alignment requirements may be violated.

C99:
Here, we obviously cannot cope with flexible array members for much
the same reasons (which is why there are no arrays of structures
with f.a.m.)

So, if used correctly and within these restrictions (and with an
overallocating solution for struct hack/f.a.m. working essentially
along the above lines but in the other direction), these should work.

Have I forgotten something? Probably yes

Cheers
Michael
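
A sketch of the scheme as described, under its own stated restrictions. The names `wrapped_malloc`, `wrapped_size` and `wrapped_free` are hypothetical. This version refuses sizes too small to hold the header instead of switching to the n*size variant, and q is only suitably aligned when size is a multiple of the element size the caller intends to store.

```c
#include <stdlib.h>
#include <string.h>

struct small_header {
    size_t size;   /* the size the caller asked for */
};

/* Allocates 2*size and returns the second half. Returns NULL when the
   unused first half could not hold the header. */
void *wrapped_malloc(size_t size)
{
    struct small_header h;
    unsigned char *p, *q;

    if (size < sizeof h || size > (size_t)-1 / 2)
        return NULL;
    p = malloc(2 * size);
    if (p == NULL)
        return NULL;
    q = p + size;                          /* what the caller receives */
    h.size = size;
    memcpy(q - sizeof h, &h, sizeof h);    /* header may sit misaligned here */
    return q;
}

/* The header is never accessed in place, only copied out. */
size_t wrapped_size(const void *block)
{
    struct small_header h;
    memcpy(&h, (const unsigned char *)block - sizeof h, sizeof h);
    return h.size;
}

void wrapped_free(void *block)
{
    /* q - size recovers the pointer malloc() originally returned. */
    if (block != NULL)
        free((unsigned char *)block - wrapped_size(block));
}
```

The memcpy() calls are what keep this defined even when the header lands on an address that is not aligned for struct small_header. The alignment of the user's data at q remains the weak point raised in the thread.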

Chris Croughton · Jan 26, 2005

Michael Mair said:
> Maybe there is an error in my train of thought, but I thought
> about:
> MALLOC_WRAPPER(size)
> checking whether size is large enough for "small header+safety distance"
> I assume it is, otherwise this goes with n*size/(n-1)*size: [...]
> C89:
> We run into problems as soon as the user is not honest with us because
> (s)he allocates "too much". Example: struct hack.
> Our memory alignment requirements may be violated.

That isn't dishonest, it's the only way to get a variable amount of
data. Indeed, you're doing a form of it yourself.

> C99:
> Here, we obviously cannot cope with flexible array members for much
> the same reasons (which is why there are no arrays of structures
> with f.a.m.)

For exactly the same reasons, it won't work, and this use is now blessed
by the Standard.

> So, if used correctly and within these restrictions (and with an
> overallocating solution for struct hack/f.a.m. working essentially
> along the above lines but in the other direction), these should work.

If your restrictions are sufficient, yes, a non-general solution can be
made to work. Another is to use:

union AlignStuff
{
    int i;
    long l;
    long long ll;
    void *vp;
} AS[2];

and whatever other types you know your system uses, and use

ptrdiff_t align = (char*)&AS[1] - (char*)&AS[0];

(because that is guaranteed to give the distance between two objects
aligned for the worst case, so allocate space for your overhead in
n*align bytes). You can include in that all of the types you know your
program uses, but make sure that if other types are added they are
included in the union as well (for instance, a pointer to long or to a
function might be a different alignment from pointer to void).

> Have I forgotten something? Probably yes

As I said a few weeks back here, it is possible to make a general
allocator which isn't portable (because it knows about system details),
or a portable one which isn't general (because it doesn't know those
details), but not one which is both portable and general within the C
language (if you can run a program to get information from the system or
interactively when installing, for instance from the GNU autoconf
configure script, then you can make it portable and general -- as long
as your program or script is itself portable!).

Question:

In my suggestion above, is (char*)&AS[1] - (char*)&AS[0] guaranteed to
be the same as sizeof(AS[0])? Where?

Chris C
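
On that closing question, as far as I can tell the answer is yes: the Standard describes an array as a contiguously allocated set of elements, and sizeof includes any trailing padding, so adjacent elements are exactly sizeof(element) bytes apart. The sketch below checks this for the union from the post; the function name `element_stride` is hypothetical.

```c
#include <stddef.h>

union AlignStuff {
    int i;
    long l;
    long long ll;
    void *vp;
};

/* Distance in bytes between two adjacent array elements. */
ptrdiff_t element_stride(void)
{
    static union AlignStuff AS[2];
    return (char *)&AS[1] - (char *)&AS[0];
}
```

Note that this value is a multiple of the strictest alignment among the members, not necessarily the alignment itself, so reserving n*align bytes for the overhead can waste space but does not break alignment for the listed types.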

alexmdac · Jan 26, 2005

alexmdac said:
> I usually add this myself, by allocating extra memory and placing a
> small header before each memory chunk. The header has a field to store
> the chunk size. I usually put debugging information in the header too
> e.g. where and when the memory was allocated - invaluable for tracing
> memory leaks!

FWIW I have used this technique for a couple of years on a number of
different architectures, including games consoles which have odd
alignment requirements. I usually make the header 256 bytes and I've
never had any problems.

For those that haven't already seen it, here's the code to pass the
allocation location into the allocation function:

void * DbgAlloc( size_t bytes, const char *file, int line );

#define MALLOC(X) DbgAlloc( (X), __FILE__, __LINE__ )

It's a very useful technique which can easily be adapted for other
resources. The code's no doubt horribly non-standard compliant (like
most of the compilers I use).
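
A sketch of what could sit behind that declaration. Only DbgAlloc's signature and the MALLOC macro come from the post. The header layout and the `DbgFree` and `DbgReport` names are assumptions of this sketch, and it relies on C11's max_align_t rather than a fixed 256-byte header to keep the user's data aligned.

```c
#include <stddef.h>   /* max_align_t, C11 */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical debug header stored in front of every block. */
union dbg_header {
    struct {
        size_t bytes;       /* size the caller requested */
        const char *file;   /* allocation site, from __FILE__ */
        int line;           /* allocation site, from __LINE__ */
    } info;
    max_align_t force_align;
};

void *DbgAlloc(size_t bytes, const char *file, int line)
{
    union dbg_header *h;
    if (bytes > (size_t)-1 - sizeof *h)
        return NULL;
    h = malloc(sizeof *h + bytes);
    if (h == NULL)
        return NULL;
    h->info.bytes = bytes;
    h->info.file = file;    /* __FILE__ is a string literal, so it outlives the block */
    h->info.line = line;
    return h + 1;
}

void DbgFree(void *block)
{
    if (block != NULL)
        free((union dbg_header *)block - 1);
}

/* Print where a block came from, e.g. while hunting a leak. */
void DbgReport(const void *block, FILE *out)
{
    const union dbg_header *h = (const union dbg_header *)block - 1;
    fprintf(out, "%lu bytes allocated at %s:%d\n",
            (unsigned long)h->info.bytes, h->info.file, h->info.line);
}

#define MALLOC(X) DbgAlloc((X), __FILE__, __LINE__)
```

Because the macro expands at the call site, each block records the file and line of the code that asked for it rather than the location of the allocator.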
