substitute for string 0 termination

Discussion in 'C Programming' started by Felix Kater, Jan 25, 2005.

  1. Felix Kater

    Felix Kater Guest

    The C-faq says that "The malloc/free implementation remembers the size
    of each block allocated and returned, so it is not necessary to remind
    it of the size when freeing."

    Could that length information somehow be used as a substitute for
    0-termination of strings?

    Felix
     
    Felix Kater, Jan 25, 2005
    #1

  2. Felix Kater

    John Valko Guest

    Felix Kater wrote:
    > The C-faq says that "The malloc/free implementation remembers the size
    > of each block allocated and returned, so it is not necessary to remind
    > it of the size when freeing."
    >
    > Could that length information somehow be used as a substitute for
    > 0-termination of strings?
    >
    > Felix


    The length that the implementation keeps track of is the size of the
    block allocated by malloc() which, assuming success, will be at least
    the size you requested. Of course, you can store strings which are
    shorter than this without any trouble, and the length will be different.
    Furthermore, the way the implementation decides to keep track of the
    size of the block returned by malloc() is up to the implementation. So
    any hack which you discover for reading the actual size allocated is
    completely non-portable. You could, however, use a structure where the
    length (of the string) is stored, and use that instead of a terminating
    0. However, since the C library functions require 0 termination for
    strings, you'll have a much more difficult time passing the string to
    these functions. It would make sense, though, to keep track of the
    length if you anticipate repetitive calls to strlen() when the length is
    not changing, thus avoiding needless O(n) operations.
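
    A minimal sketch of the length-tracking structure described above. The
    names `struct counted_str`, `cs_new` and `cs_free` are hypothetical and
    not from any library. The sketch keeps the terminating 0 so the buffer
    can still be passed to the standard string functions, while the cached
    length avoids repeated strlen() calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: the length is cached, the terminator is kept. */
struct counted_str {
    size_t len;   /* number of characters, not counting the '\0' */
    char  *data;  /* still 0-terminated for the C library */
};

/* Copy a regular C string into a counted string; returns 0 on success. */
static int cs_new(struct counted_str *cs, const char *src)
{
    assert(cs != NULL && src != NULL);
    cs->len = strlen(src);              /* the only O(n) length scan */
    cs->data = malloc(cs->len + 1);
    if (cs->data == NULL)
        return -1;
    memcpy(cs->data, src, cs->len + 1); /* text plus terminator */
    return 0;
}

/* Release the buffer and reset the fields. */
static void cs_free(struct counted_str *cs)
{
    free(cs->data);
    cs->data = NULL;
    cs->len = 0;
}
```

    Note that len is the string length, which may be smaller than the size
    of the block malloc() actually handed out.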

    Hope that helps,
    --John
     
    John Valko, Jan 25, 2005
    #2

  3. Felix Kater

    Guest

    Yes, you could speed up length tests by storing the length before the
    first character like this. The "fast" strings are still
    zero-terminated, so they work with the CRT string functions.

    #include <stdlib.h>
    #include <string.h>

    /* Layout: [size_t length][characters...]['\0'] */
    char * makeFastString( char *regularString )
    {
        size_t len, *p;
        len = strlen( regularString );
        p = malloc( len + 1 + sizeof( size_t ) );
        if( !p )
            return NULL;
        *p = len;
        strcpy( (char *)( p + 1 ), regularString );
        return (char *)( p + 1 );
    }

    size_t lengthOfFastString( char *fastString )
    {
        size_t *p = (size_t *) fastString;
        return *( p - 1 );
    }

    void deleteFastString( char *fastString )
    {
        size_t *p = (size_t *) fastString;
        free( p );   /* as posted; the block actually starts at p - 1 */
    }
     
    , Jan 25, 2005
    #3
  4. Felix Kater

    Guest

    Oops typo: free( p ) should be free( p - 1 )
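
    With that correction applied, the three functions might look like the
    sketch below. The const qualifiers, the NULL checks and the use of
    memcpy() are additions for this sketch, not part of the original post.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocate [size_t length][characters...]['\0'] and return a pointer
   to the characters. Memory from malloc() is suitably aligned for size_t. */
char *makeFastString(const char *regularString)
{
    size_t len, *p;

    assert(regularString != NULL);
    len = strlen(regularString);
    p = malloc(sizeof *p + len + 1);
    if (p == NULL)
        return NULL;
    *p = len;                               /* header: cached length */
    memcpy(p + 1, regularString, len + 1);  /* text plus terminator */
    return (char *)(p + 1);
}

/* O(1): read the length stored just before the first character. */
size_t lengthOfFastString(const char *fastString)
{
    const size_t *p = (const size_t *)fastString;
    return *(p - 1);
}

/* Free the whole block, which starts at the header, not at the text. */
void deleteFastString(char *fastString)
{
    if (fastString != NULL)
        free((size_t *)fastString - 1);
}
```

    A pointer returned by makeFastString() must only be released through
    deleteFastString(); passing it straight to free() would hand free() an
    address that malloc() never returned.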
     
    , Jan 25, 2005
    #4
  5. Felix Kater

    pete Guest

    wrote:
    >
    > Yes, you could speed up length tests by storing the length before the
    > first character like this. The "fast" strings are still
    > zero-terminated, so they work with the CRT string functions.


    > void deleteFastString( char *fastString )
    > {
    > size_t *p = (size_t *) fastString;
    > free( p );
    > }


    I think

    free(fastString);

    is what you really want to use, instead of

    deleteFastString(fastString);

    --
    pete
     
    pete, Jan 26, 2005
    #5
  6. Felix Kater

    Guest

    Yeah there was a typo. Unfortunately you can't edit posts here.
     
    , Jan 26, 2005
    #6
  7. Felix Kater

    Ben Pfaff Guest

    writes:

    > Yeah there was a typo. Unfortunately you can't edit posts here.


    Usenet has a `supersedes' feature that you could use.
    --
    int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
    \n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
    );while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
    );}return 0;}
     
    Ben Pfaff, Jan 26, 2005
    #7
  8. Felix Kater

    Mac Guest

    On Wed, 26 Jan 2005 00:11:28 +0100, Felix Kater wrote:

    > The C-faq says that "The malloc/free implementation remembers the size
    > of each block allocated and returned, so it is not necessary to remind
    > it of the size when freeing."
    >
    > Could that length information somehow be used as a substitute for
    > 0-termination of strings?
    >
    > Felix


    In some hypothetical alternate universe, the committee responsible for
    defining the C programming language could have opted to handle strings
    differently.

    In the universe we actually inhabit, the size of a block returned by
    malloc can never be accessed directly by the programmer if the code is to
    remain portable.

    In any event, there are plenty of times when you don't want to malloc a
    chunk of memory to be exactly the size of your string.

    However, you could always define a struct wrapper for your strings
    which remembers how big they are. See the untested fragment below.

    struct my_string
    {
        size_t size;
        char *s;
    };

    struct my_string *new_my_string(size_t size)
    {
        struct my_string *new_string;

        new_string = malloc(sizeof *new_string);
        if (new_string == NULL)
            return NULL;
        new_string->s = malloc(size);
        if (new_string->s == NULL)
        {
            free(new_string);
            return NULL;
        }
        new_string->size = size;
        return new_string;
    }

    That kind of thing.
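
    A variation on the fragment above, sketched under the assumption that a
    C99 compiler is available: a flexible array member puts the size and the
    characters in one allocation, so a single free() releases both. The names
    `struct fam_string`, `new_fam_string` and `fam_string_set` are
    hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct fam_string {
    size_t size;   /* bytes available in s[] */
    char   s[];    /* C99 flexible array member */
};

/* One malloc() covers the header and `size` bytes of storage. */
struct fam_string *new_fam_string(size_t size)
{
    struct fam_string *fs = malloc(sizeof *fs + size);
    if (fs == NULL)
        return NULL;
    fs->size = size;
    if (size > 0)
        fs->s[0] = '\0';   /* start out as an empty string */
    return fs;
}

/* Copy text into fs if it fits, terminator included; 0 on success. */
int fam_string_set(struct fam_string *fs, const char *text)
{
    size_t need;

    assert(fs != NULL && text != NULL);
    need = strlen(text) + 1;
    if (need > fs->size)
        return -1;         /* would not fit; fs is left unchanged */
    memcpy(fs->s, text, need);
    return 0;
}
```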

    --Mac
     
    Mac, Jan 26, 2005
    #8
  9. Felix Kater wrote:
    > The C-faq says that "The malloc/free implementation remembers the size
    > of each block allocated and returned, so it is not necessary to remind
    > it of the size when freeing."
    >
    > Could that length information somehow be used as a substitute for
    > 0-termination of strings?
    > ...


    No.

    Firstly, the C standard library does not provide the user with any means
    to access this information.

    Secondly, memory allocation functions are not guaranteed to allocate
    exactly the requested amount of memory. A successful allocation request
    might allocate more memory than was actually requested, which is
    perfectly legal in C.

    --
    Best regards,
    Andrey Tarasevich
     
    Andrey Tarasevich, Jan 26, 2005
    #9
  10. Felix Kater

    Servé La Guest

    "Andrey Tarasevich" <> schreef in bericht
    news:...
    > Firstly, C standard library does not provide user with any means to
    > access this information.
    > Secondly, memory allocation functions are not guaranteed to allocate
    > exactly the requested amount of memory. Successful allocation request
    > might allocate more memory than was actually requested, which is
    > perfectly legal in C.


    I understand this of course, but still I'd like to know why a function like
    memsize can't be introduced in the std library. Maybe more bytes are
    allocated than requested, but a program shouldn't invoke undefined behaviour
    when those extra bytes are overwritten, right?
    I think a portable function like this could be easily added to C and it
    would be used a lot considering how many people have asked for something
    like this.
     
    Servé La, Jan 26, 2005
    #10
  11. Felix Kater

    Guest

    > I think a portable function like this could be easily added to C and it
    > would be used a lot considering how many people have asked for something
    > like this.


    I usually add this myself, by allocating extra memory and placing a
    small header before each memory chunk. The header has a field to store
    the chunk size. I usually put debugging information in the header too
    e.g. where and when the memory was allocated - invaluable for tracing
    memory leaks!
     
    , Jan 26, 2005
    #11
  12. Felix Kater

    -berlin.de Guest

    "Servé La" <> wrote:
    > "Andrey Tarasevich" <> schreef in bericht
    > news:...
    >> Firstly, C standard library does not provide user with any means to
    >> access this information.
    >> Secondly, memory allocation functions are not guaranteed to allocate
    >> exactly the requested amount of memory. Successful allocation request
    >> might allocate more memory than was actually requested, which is
    >> perfectly legal in C.


    > I understand this of course, but still I'd like to know why a function like
    > memsize can't be introduced in the std library. Maybe more bytes are
    > allocated than requested, a program shouldn't invoke undefined behaviour
    > when those extra bytes are overwritten right.
    > I think a portable function like this could be easily added to C and it
    > would be used a lot considering how many people have asked for something
    > like this.


    I guess it's not entirely clear if such a function could really be added
    that easily. At least it would put another constraint on the implementors
    of malloc() and friends which might have been a reason not to introduce
    such a function (and many implementations where it's easy to introduce
    already have such a function as an extension) - but that's probably better
    asked in comp.std.c where the experts for these questions can be found.
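
    As one illustration of such an extension (an assumption about the
    platform, not something the C standard offers): the GNU C library
    declares malloc_usable_size() in <malloc.h>. The sketch below wraps it in
    a hypothetical helper, `block_capacity`, and simply falls back to the
    requested size on other platforms.

```c
#include <assert.h>
#include <stdlib.h>   /* included first so that __GLIBC__ is defined on glibc */

#if defined(__GLIBC__)
#include <malloc.h>   /* non-standard: declares malloc_usable_size() */
#endif

/* Hypothetical helper: bytes usable at p, or `requested` when the
   platform offers no way to ask. p must come from malloc(). */
size_t block_capacity(void *p, size_t requested)
{
    assert(p != NULL);
#if defined(__GLIBC__)
    (void)requested;               /* not needed when the library can tell us */
    return malloc_usable_size(p);
#else
    return requested;
#endif
}
```

    Where the extension exists, the value may be larger than the request,
    and it says nothing about how many bytes the caller originally asked for.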

    I am also not convinced that it would reduce the number of questions
    asked - they probably would be immediately replaced by questions like:
    "Why does memsize() tell me the block is xyz bytes large when I only
    asked for abc bytes?" where abc < xyz. I actually guess that most people
    would like to have this functionality to be able to pass pointers to
    functions without the need to also pass the size of what's pointed to.
    But if that would be possible (by making the memsize() function not
    return the real amount of allocated memory but the size requested
    in the malloc() call) then this still wouldn't work for fixed sized
    arrays - unless some size information is also stored with fixed sized
    arrays that can be obtained using the same hypothetical memsize()
    function. I.e. it would only really make sense if you could do

    void func( int *x );

    int main( void ) {
        int a[ 100 ], *b = malloc( 250 * sizeof *b );
        func( a );
        func( b );
        return 0;
    }

    void func( int *x )
    {
        printf( "%lu\n", ( unsigned long ) ( memsize( x ) / sizeof *x ) );
    }

    But then the way fixed sized arrays are dealt with would have to be
    changed fundamentally...

    And, of course, the next logical step would be a malloc() variant with
    type information of the allocated memory stored and another function
    for determining this type with a "return value" that can be used in
    casts or tested for etc., so that void pointers could be passed around
    and type-agnostic functions can be written. But then you really would
    make C a rather different language...

    Regards, Jens
    --
    \ Jens Thoms Toerring ___ -berlin.de
    \__________________________ http://www.toerring.de
     
    -berlin.de, Jan 26, 2005
    #12
  13. Felix Kater

    pete Guest

    Servé La wrote:
    >
    > "Andrey Tarasevich" <> schreef in bericht
    > news:...
    > > Firstly, C standard library does not provide user with any means to
    > > access this information.
    > > Secondly, memory allocation functions are not guaranteed to allocate
    > > exactly the requested amount of memory. Successful allocation request
    > > might allocate more memory than was actually requested, which is
    > > perfectly legal in C.

    >
    > I understand this of course,
    > but still I'd like to know why a function like
    > memsize can't be introduced in the std library. Maybe more bytes are
    > allocated than requested, a program shouldn't invoke
    > undefined behaviour
    > when those extra bytes are overwritten right.


    The ordinary way to use memory is to allocate it deliberately,
    rather than to check and see if any unused memory has been left around.
    C code shouldn't be writing beyond the size of the target type.

    --
    pete
     
    pete, Jan 26, 2005
    #13
  14. Felix Kater

    Richard Bos Guest

    wrote:

    > > I think a portable function like this could be easily added to C and it
    > > would be used a lot considering how many people have asked for something
    > > like this.

    >
    > I usually add this myself, by allocating extra memory and placing a
    > small header before each memory chunk. The header has a field to store
    > the chunk size.


    Beware - it is not trivial to do this and still ensure that the address
    following the header is properly aligned for all types; in fact, it is
    AFAIK impossible to do so portably.
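
    For what it is worth, C11 (which postdates this thread) added max_align_t
    to <stddef.h>, giving a portable worst-case alignment for the fundamental
    types. A minimal sketch under that assumption, with the hypothetical
    names `sized_malloc`, `sized_size` and `sized_free`:

```c
#include <assert.h>
#include <stddef.h>   /* max_align_t (C11) */
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>

/* The union is as large and as strictly aligned as max_align_t,
   so the address right after it suits any fundamental type. */
union size_header {
    size_t      size;
    max_align_t align;
};

/* Like malloc(), but remembers the requested size in a header. */
void *sized_malloc(size_t size)
{
    union size_header *h;

    if (size > SIZE_MAX - sizeof *h)
        return NULL;                 /* request would overflow */
    h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;                    /* user memory follows the header */
}

/* Size originally requested for a block from sized_malloc(). */
size_t sized_size(const void *p)
{
    assert(p != NULL);
    return ((const union size_header *)p - 1)->size;
}

/* Release a block from sized_malloc(); NULL is accepted like free(). */
void sized_free(void *p)
{
    if (p != NULL)
        free((union size_header *)p - 1);
}
```

    This does not cover types with extended alignment (some vector types,
    for instance), so it is still not fully general.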

    Richard
     
    Richard Bos, Jan 26, 2005
    #14
  15. On Tue, 25 Jan 2005 18:42:10 -0800, Ben Pfaff
    <> wrote:

    > writes:
    >
    >> Yeah there was a typo. Unfortunately you can't edit posts here.

    >
    > Usenet has a `supersedes' feature that you could use.


    However, most news servers I know of these days don't accept it; it's
    too easily abused by attackers (as are cancels). And once it's got
    to a client it's unlikely that supersedes will do anything apart from
    look like a new message...

    Chris C
     
    Chris Croughton, Jan 26, 2005
    #15
  16. Felix Kater

    Michael Mair Guest

    Richard Bos wrote:
    > wrote:
    >
    >
    >>>I think a portable function like this could be easily added to C and it
    >>>would be used a lot considering how many people have asked for something
    >>>like this.

    >>
    >>I usually add this myself, by allocating extra memory and placing a
    >>small header before each memory chunk. The header has a field to store
    >>the chunk size.

    >
    >
    > Beware - it is not trivial to do this and still ensure that the address
    > following the header is properly aligned for all types; in fact, it is
    > AFAIK impossible to do so portably.


    Er, not so: You could allocate the double, triple, ... amount of the
    memory and pass only the address of the last "element" (i.e. the one
    with the right size towards the end of allocated memory). If the
    overhead is large enough to store your "small header", maybe with a
    safety "distance" from the data starting address, you can memcpy() the
    header you created there and retrieve it from there via memcpy() to a
    "small header" object. Usually, this will require the double amount of
    requested memory and be quite inefficient. (Triple and so on only for
    very small amounts of dynamically allocated memory).
    The fun starts with realloc(); then, you can even decide to keep the
    old overhead and realloc() the difference or whatever rocks your boat.

    It's just not worth it :)


    Cheers
    Michael
    --
    E-Mail: Mine is a gmx dot de address.
     
    Michael Mair, Jan 26, 2005
    #16
  17. On Wed, 26 Jan 2005 18:18:57 +0100, Michael Mair wrote:

    >
    >
    > Richard Bos wrote:
    >> wrote:
    >>
    >>
    >>>>I think a portable function like this could be easily added to C and it
    >>>>would be used a lot considering how many people have asked for something
    >>>>like this.
    >>>
    >>>I usually add this myself, by allocating extra memory and placing a
    >>>small header before each memory chunk. The header has a field to store
    >>>the chunk size.

    >>
    >>
    >> Beware - it is not trivial to do this and still ensure that the address
    >> following the header is properly aligned for all types; in fact, it is
    >> AFAIK impossible to do so portably.

    >
    > Er, not so: You could allocate the double, triple, ... amount of the
    > memory and pass only the address of the last "element" (i.e. the one
    > with the right size towards the end of allocated memory).


    Which one is that, and how do you know whether it is properly aligned for
    *any* C type?

    Lawrence
     
    Lawrence Kirby, Jan 26, 2005
    #17
  18. Felix Kater

    Michael Mair Guest

    Lawrence Kirby wrote:
    > On Wed, 26 Jan 2005 18:18:57 +0100, Michael Mair wrote:
    >
    >
    >>
    >>Richard Bos wrote:
    >>
    >>> wrote:
    >>>
    >>>
    >>>
    >>>>>I think a portable function like this could be easily added to C and it
    >>>>>would be used a lot considering how many people have asked for something
    >>>>>like this.
    >>>>
    >>>>I usually add this myself, by allocating extra memory and placing a
    >>>>small header before each memory chunk. The header has a field to store
    >>>>the chunk size.
    >>>
    >>>
    >>>Beware - it is not trivial to do this and still ensure that the address
    >>>following the header is properly aligned for all types; in fact, it is
    >>>AFAIK impossible to do so portably.

    >>
    >>Er, not so: You could allocate the double, triple, ... amount of the
    >>memory and pass only the address of the last "element" (i.e. the one
    >>with the right size towards the end of allocated memory).

    >
    > Which one is that, and how do you know whether it is properly aligned for
    > *any* C type?


    Maybe there is an error in my train of thought, but I thought
    about:
    MALLOC_WRAPPER(size)
    checking whether size is large enough for "small header+safety distance"
    I assume it is, otherwise this goes with n*size/(n-1)*size:

    So, I do unsigned char *p = malloc(2*size), do all the checks and give the
    user only the address q = p + size. If the user did give us the return value
    of a sizeof operation (or strlen() call) times something, we should
    now have the proper alignment at q.
    I enter all the values into a struct small_header object and memcpy() it
    to a fixed negative offset with respect to q. I cannot access it there,
    of course, but I can memcpy() it to a struct small_header variable and
    work with that.

    C89:
    We run into problems as soon as the user is not honest with us because
    (s)he allocates "too much". Example: struct hack.
    Our memory alignment requirements may be violated.

    C99:
    Here, we obviously cannot cope with flexible array members for much
    the same reasons (which is why there are no arrays of structures
    with f.a.m.)

    So, if used correctly and within these restrictions (and with an
    overallocating solution for struct hack/f.a.m. working essentially
    along the above lines but in the other direction), these should work.

    Have I forgotten something? Probably yes :)


    Cheers
    Michael
    --
    E-Mail: Mine is a gmx dot de address.
     
    Michael Mair, Jan 26, 2005
    #18
  19. On Wed, 26 Jan 2005 19:00:38 +0100, Michael Mair
    <> wrote:

    > Maybe there is an error in my train of thought, but I thought
    > about:
    > MALLOC_WRAPPER(size)
    > checking whether size is large enough for "small header+safety distance"
    > I assume it is, otherwise this goes with n*size/(n-1)*size:

    [...]
    > C89:
    > We run into problems as soon as the user is not honest with us because
    > (s)he allocates "too much". Example: struct hack.
    > Our memory alignment requirements may be violated.


    That isn't dishonest, it's the only way to get a variable amount of
    data. Indeed, you're doing a form of it yourself.

    > C99:
    > Here, we obviously cannot cope with flexible array members for much
    > the same reasons (which is why there are no arrays of structures
    > with f.a.m.)


    For exactly the same reasons, it won't work, and this use is now blessed
    by the Standard.

    > So, if used correctly and within these restrictions (and with an
    > overallocating solution for struct hack/f.a.m. working essentially
    > along the above lines but in the other direction), these should work.


    If your restrictions are sufficient, yes, a non-general solution can be
    made to work. Another is to use:

    union AlignStuff
    {
        int i;
        long l;
        long long ll;
        void *vp;
    } AS[2];

    and whatever other types you know your system uses, and use

    ptrdiff_t align = (char*)&AS[1] - (char*)&AS[0];

    (because that is guaranteed to give the distance between two objects
    aligned for the worst case, so allocate space for your overhead in
    n*align bytes). You can include in that all of the types you know your
    program uses, but make sure that if other types are added they are
    included in the union as well (for instance, a pointer to long or to a
    function might be a different alignment from pointer to void).

    > Have I forgotten something? Probably yes :)


    As I said a few weeks back here, it is possible to make a general
    allocator which isn't portable (because it knows about system details),
    or a portable one which isn't general (because it doesn't know those
    details), but not one which is both portable and general within the C
    language (if you can run a program to get information from the system or
    interactively when installing, for instance from the GNU autoconf
    configure script, then you can make it portable and general -- as long
    as your program or script is itself portable!).

    Question:

    In my suggestion above, is (char*)&AS[1] - (char*)&AS[0] guaranteed to
    be the same as sizeof(AS[0])? Where?
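
    A quick way to experiment with this, as a sketch only: the union and
    array names come from the snippet above, with the address-of operators
    written out, while the helpers `as_stride` and `round_up_to_stride` are
    hypothetical. Array elements are contiguous, so the byte distance between
    adjacent elements equals the size of one element; the sketch computes that
    distance and rounds a header size up to a multiple of it.

```c
#include <assert.h>
#include <stddef.h>

union AlignStuff {
    int       i;
    long      l;
    long long ll;
    void     *vp;
};

static union AlignStuff AS[2];

/* Byte distance between two adjacent array elements. */
ptrdiff_t as_stride(void)
{
    return (char *)&AS[1] - (char *)&AS[0];
}

/* Round n up to the next multiple of the stride, for header padding. */
size_t round_up_to_stride(size_t n)
{
    size_t stride = (size_t)as_stride();

    assert(stride > 0);
    return (n + stride - 1) / stride * stride;
}
```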

    Chris C
     
    Chris Croughton, Jan 26, 2005
    #19
  20. Felix Kater

    Guest

    wrote:
    > I usually add this myself, by allocating extra memory and placing a
    > small header before each memory chunk. The header has a field to store
    > the chunk size. I usually put debugging information in the header too
    > e.g. where and when the memory was allocated - invaluable for tracing
    > memory leaks!

    FWIW I have used this technique for a couple of years on a number of
    different architectures, including games consoles which have odd
    alignment requirements. I usually make the header 256 bytes and I've
    never had any problems.

    For those that haven't already seen it, here's the code to pass the
    allocation location into the allocation function:

    void * DbgAlloc( size_t bytes, const char *file, int line );

    #define MALLOC(X) DbgAlloc( (X), __FILE__, __LINE__ )

    It's a very useful technique which can easily be adapted for other
    resources. The code's no doubt horribly non-standard compliant (like
    most of the compilers I use) :)
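
    For illustration, one way DbgAlloc might be filled in. This is a guess at
    the shape of such a function, not the poster's code: the header layout,
    the list of live blocks, and the names `DbgFree` and `DbgReportLeaks` are
    all hypothetical, and max_align_t assumes a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Debug header placed in front of every block (hypothetical layout). */
struct dbg_header {
    struct dbg_header *next, *prev;  /* doubly linked list of live blocks */
    size_t      bytes;               /* size the caller asked for */
    const char *file;                /* where the allocation was made */
    int         line;
};

union dbg_block {                    /* pads the header to worst-case alignment */
    struct dbg_header h;
    max_align_t       align;
};

static struct dbg_header *dbg_live = NULL;

void *DbgAlloc(size_t bytes, const char *file, int line)
{
    union dbg_block *b;

    assert(file != NULL);
    if (bytes > (size_t)-1 - sizeof *b)
        return NULL;                 /* request would overflow */
    b = malloc(sizeof *b + bytes);
    if (b == NULL)
        return NULL;
    b->h.bytes = bytes;
    b->h.file = file;
    b->h.line = line;
    b->h.prev = NULL;                /* push onto the front of the live list */
    b->h.next = dbg_live;
    if (dbg_live != NULL)
        dbg_live->prev = &b->h;
    dbg_live = &b->h;
    return b + 1;                    /* user memory follows the header */
}

void DbgFree(void *p)
{
    union dbg_block *b;

    if (p == NULL)
        return;
    b = (union dbg_block *)p - 1;
    if (b->h.prev != NULL)           /* unlink from the live list */
        b->h.prev->next = b->h.next;
    else
        dbg_live = b->h.next;
    if (b->h.next != NULL)
        b->h.next->prev = b->h.prev;
    free(b);
}

/* Print every block still allocated; returns how many there are. */
size_t DbgReportLeaks(FILE *out)
{
    size_t n = 0;
    const struct dbg_header *h;

    for (h = dbg_live; h != NULL; h = h->next, n++)
        fprintf(out, "%lu bytes from %s:%d\n",
                (unsigned long)h->bytes, h->file, h->line);
    return n;
}

#define MALLOC(X) DbgAlloc((X), __FILE__, __LINE__)
```

    Calling DbgReportLeaks(stderr) at program exit would then list the file
    and line of every block that was never passed to DbgFree().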
     
    , Jan 26, 2005
    #20