Data relocation (pointer subtraction undefined except within array)

Discussion in 'C Programming' started by chrisbazley@bigfoot.com, Dec 31, 2009.

  1. Guest

    Hello,

    I wasted a lot of time yesterday writing some code which manages a
    collection of strings within a single heap block allocated by a
    function similar to 'malloc'. A separate array of structs includes
    members which point to the start of each string. My intention is to
    replace existing (working) code where each string is held in a
    separate heap block, to reduce a high turnover of blocks.

    When replacing existing strings with longer strings, or adding strings
    to the collection, the heap block containing the strings must be
    extended using a function like 'realloc'. However, 'realloc' may move
    the base address of the block, which invalidates the pointers to the
    start of each string. I don't really want to replace all my 'char *'
    members with 'ptrdiff_t' (offset from start of heap block containing
    strings) because of the loss of type-safety and additional cost of
    accessing the strings.

    I thought I could solve my problem by adding a relocation offset to
    each pointer immediately after calling 'realloc', derived by
    subtracting the old from the new address of the heap block. However,
    turning to the appendix of my copy of Kernighan & Ritchie, I
    discovered - to my dismay - that the result of subtracting one pointer
    from another is undefined unless both point to objects within the same
    array.

    Presumably that means the following code would have undefined effects:

    unsigned int i;
    char *strings, *new_strings;
    ptrdiff_t relocate;
    struct
    {
    char *string;
    }
    objs[10];

    /* Resize heap block containing strings */
    new_strings = realloc(strings, new_size);
    if (new_strings == NULL)
    {
    /* ...handle error... */
    }

    /* Relocate pointers to the start of each string */
    relocate = new_strings - strings;
    for (i = 0; i < sizeof(objs) / sizeof(objs[0]); i++)
    {
    objs.string += relocate;
    }
    strings = new_strings;

    Can anyone think of a machine architecture where the above code would
    not work? It seems likely to be a common idiom, even if not strictly
    legal.

    It occurs to me that I could circumvent the K&R restriction by
    utilitising the old address of the heap block within my relocation
    loop:

    for (i = 0; i < sizeof(objs) / sizeof(objs[0]); i++)
    {
    objs.string = new_strings + (objs.string - strings);
    }

    However, given that 'strings' is no longer a valid pointer, is this
    version any less reprehensible than my original code?

    I look forward to your comments.

    TIA,
    --
    Christopher Bazley
     
    , Dec 31, 2009
    #1
    1. Advertising

  2. writes:
    <snip>
    > When replacing existing strings with longer strings, or adding strings
    > to the collection, the heap block containing the strings must be
    > extended using a function like 'realloc'. However, 'realloc' may move
    > the base address of the block, which invalidates the pointers to the
    > start of each string. I don't really want to replace all my 'char *'
    > members with 'ptrdiff_t' (offset from start of heap block containing
    > strings) because of the loss of type-safety and additional cost of
    > accessing the strings.
    >
    > I thought I could solve my problem by adding a relocation offset to
    > each pointer immediately after calling 'realloc', derived by
    > subtracting the old from the new address of the heap block. However,
    > turning to the appendix of my copy of Kernighan & Ritchie, I
    > discovered - to my dismay - that the result of subtracting one pointer
    > from another is undefined unless both point to objects within the same
    > array.
    >
    > Presumably that means the following code would have undefined effects:
    >
    > unsigned int i;
    > char *strings, *new_strings;
    > ptrdiff_t relocate;
    > struct
    > {
    > char *string;
    > }
    > objs[10];
    >
    > /* Resize heap block containing strings */
    > new_strings = realloc(strings, new_size);
    > if (new_strings == NULL)
    > {
    > /* ...handle error... */
    > }
    >
    > /* Relocate pointers to the start of each string */
    > relocate = new_strings - strings;
    > for (i = 0; i < sizeof(objs) / sizeof(objs[0]); i++)
    > {
    > objs.string += relocate;
    > }
    > strings = new_strings;
    >
    > Can anyone think of a machine architecture where the above code would
    > not work? It seems likely to be a common idiom, even if not strictly
    > legal.


    I can't think of a current one, no, but then I don't know many
    architectures anymore. Old ones, yes. Then again, can you say for
    sure that there won't be any in the future? [On a segmented
    architecture, pointer arithmetic may be done on the offset alone,
    provided every array lies wholly within a segment. Inter-array
    arithmetic can be meaningless in such situations.]

    > It occurs to me that I could circumvent the K&R restriction by
    > utilitising the old address of the heap block within my relocation
    > loop:
    >
    > for (i = 0; i < sizeof(objs) / sizeof(objs[0]); i++)
    > {
    > objs.string = new_strings + (objs.string - strings);
    > }
    >
    > However, given that 'strings' is no longer a valid pointer, is this
    > version any less reprehensible than my original code?


    I think this is safer from a purely practical point of view. It is
    still undefined, but I suspect it is more likely to work on systems
    where the first would fail.

    Another strategy is to scan the old block replacing the pointers with
    offsets within the block. I.e. you rely on the implementation defined
    conversion from an integer type (ptrdiff_t) to char *. If you keep
    the difference positive, it is very likely to work everywhere.

    --
    Ben.
     
    Ben Bacarisse, Dec 31, 2009
    #2
    1. Advertising

  3. Seebs Guest

    On 2009-12-31, <> wrote:
    > However,
    > turning to the appendix of my copy of Kernighan & Ritchie, I
    > discovered - to my dismay - that the result of subtracting one pointer
    > from another is undefined unless both point to objects within the same
    > array.


    Yes. That's largely because of old-style DOS systems, though there may
    come future systems with similar behavior.

    > However, given that 'strings' is no longer a valid pointer, is this
    > version any less reprehensible than my original code?


    Interestingly:

    There are conceivable (probably even real) implementations on which this will
    work, and the other won't. There are conceivable (probably even real)
    implementations on which the other will work, and this won't.

    The two things you might run into are:

    1. Systems in which loading an invalid pointer into an address register
    can page fault even if you don't try to dereference it.
    2. Systems in which pointers to different objects may be in different
    memory regions such that subtraction doesn't work as expected.

    Suggestion: If you can handle the overhead, then malloc-copy-adjust-free
    rather than using realloc. In practice, if realloc is changing addresses,
    it MUST be possible to malloc, copy, and then free. However, I am sure
    that if you tried hard enough, you could find some implementation somewhere
    on which there existed M and N such that allocating M bytes and reallocing
    to N succeeded, but trying to allocate M and N bytes simultaneously would
    fail.

    I wouldn't worry about it. The canonical thing to do is indeed to do
    the copy/adjust phase yourself in cases like this.

    Or! You could use ptrdiff_t offsets internally, and provide an API which
    yields a pointer for a given object. This could be as simple as
    #define STR(x) (global_base + x->offset)

    -s
    --
    Copyright 2009, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
     
    Seebs, Dec 31, 2009
    #3
  4. bartc Guest

    <> wrote in message
    news:...
    > Hello,
    >
    > I wasted a lot of time yesterday writing some code which manages a
    > collection of strings within a single heap block allocated by a
    > function similar to 'malloc'. A separate array of structs includes
    > members which point to the start of each string. My intention is to
    > replace existing (working) code where each string is held in a
    > separate heap block, to reduce a high turnover of blocks.
    >
    > When replacing existing strings with longer strings, or adding strings
    > to the collection, the heap block containing the strings must be
    > extended using a function like 'realloc'. However, 'realloc' may move
    > the base address of the block, which invalidates the pointers to the
    > start of each string. I don't really want to replace all my 'char *'
    > members with 'ptrdiff_t' (offset from start of heap block containing
    > strings) because of the loss of type-safety and additional cost of
    > accessing the strings.
    >
    > I thought I could solve my problem by adding a relocation offset to
    > each pointer immediately after calling 'realloc', derived by
    > subtracting the old from the new address of the heap block. However,
    > turning to the appendix of my copy of Kernighan & Ritchie, I
    > discovered - to my dismay - that the result of subtracting one pointer
    > from another is undefined unless both point to objects within the same
    > array.


    If you allocate a single block with malloc(), that is effectively a single
    array. It doesn't matter if you then choose to divide it into multiple
    arrays.

    --
    Bartc
     
    bartc, Dec 31, 2009
    #4
  5. Eric Sosman Guest

    Re: Data relocation (pointer subtraction undefined except withinarray)

    On 12/31/2009 8:52 AM, wrote:
    > Hello,
    >
    > I wasted a lot of time yesterday writing some code which manages a
    > collection of strings within a single heap block allocated by a
    > function similar to 'malloc'. A separate array of structs includes
    > members which point to the start of each string. My intention is to
    > replace existing (working) code where each string is held in a
    > separate heap block, to reduce a high turnover of blocks.
    >[...]


    Others have discussed the problems of after-the-fact
    readjustment of pointers. May I suggest that you might not
    need it?

    If the goal is to reduce overhead by holding many small
    strings in one big block, perhaps you could consider allocating
    a second big block when the first is full, leaving the first in
    place and still holding its unmoved strings. If the blocks are
    "large" compared to the per-block overhead, you'd waste only a
    small amount of space -- and you'd completely avoid the dangers
    of pointer-fiddling.

    Maybe the data structure you're using isn't amenable to
    this -- but it might be worth a thought or two.

    --
    Eric Sosman
    lid
     
    Eric Sosman, Dec 31, 2009
    #5
  6. On 31 Dec 2009 16:19:06 GMT, Seebs <> wrote:

    >On 2009-12-31, <> wrote:
    >> However,
    >> turning to the appendix of my copy of Kernighan & Ritchie, I
    >> discovered - to my dismay - that the result of subtracting one pointer
    >> from another is undefined unless both point to objects within the same
    >> array.

    >
    >Yes. That's largely because of old-style DOS systems, though there may
    >come future systems with similar behavior.
    >
    >> However, given that 'strings' is no longer a valid pointer, is this
    >> version any less reprehensible than my original code?

    >
    >Interestingly:
    >
    >There are conceivable (probably even real) implementations on which this will
    >work, and the other won't. There are conceivable (probably even real)
    >implementations on which the other will work, and this won't.
    >
    >The two things you might run into are:
    >
    >1. Systems in which loading an invalid pointer into an address register
    >can page fault even if you don't try to dereference it.
    >2. Systems in which pointers to different objects may be in different
    >memory regions such that subtraction doesn't work as expected.
    >
    >Suggestion: If you can handle the overhead, then malloc-copy-adjust-free
    >rather than using realloc. In practice, if realloc is changing addresses,
    >it MUST be possible to malloc, copy, and then free. However, I am sure
    >that if you tried hard enough, you could find some implementation somewhere
    >on which there existed M and N such that allocating M bytes and reallocing
    >to N succeeded, but trying to allocate M and N bytes simultaneously would
    >fail.


    Consider the case where realloc attempts to extend the existing area
    and only assigns a new area when the extension fails. For example, if
    the "heap" is 10 bytes and the first 6 have been allocated,
    reallocating to 7 is easy but allocating a new 7 is not possible.
    There is a similar situation possible if the heap is fragmented.

    --
    Remove del for email
     
    Barry Schwarz, Dec 31, 2009
    #6
  7. Guest

    Re: Data relocation (pointer subtraction undefined except withinarray)

    On Dec 31 2009, 4:21 pm, "bartc" <> wrote:
    > <> wrote in message
    >
    > news:...
    > > Hello,

    >
    > > I wasted a lot of time yesterday writing some code which manages a
    > > collection of strings within a single heap block allocated by a
    > > function similar to 'malloc'. A separate array of structs includes
    > > members which point to the start of each string. My intention is to
    > > replace existing (working) code where each string is held in a
    > > separate heap block, to reduce a high turnover of blocks.

    >
    > > When replacing existing strings with longer strings, or adding strings
    > > to the collection, the heap block containing the strings must be
    > > extended using a function like 'realloc'. However, 'realloc' may move
    > > the base address of the block, which invalidates the pointers to the
    > > start of each string. I don't really want to replace all my 'char *'
    > > members with 'ptrdiff_t' (offset from start of heap block containing
    > > strings) because of the loss of type-safety and additional cost of
    > > accessing the strings.

    >
    > > I thought I could solve my problem by adding a relocation offset to
    > > each pointer immediately after calling 'realloc', derived by
    > > subtracting the old from the new address of the heap block. However,
    > > turning to the appendix of my copy of Kernighan & Ritchie, I
    > > discovered - to my dismay - that the result of subtracting one pointer
    > > from another is undefined unless both point to objects within the same
    > > array.

    >
    > If you allocate a single block with malloc(), that is effectively a single
    > array. It doesn't matter if you then choose to divide it into multiple
    > arrays.


    I think you misunderstood my problem. I was not doing arithmetic
    between pointers to objects within a single block; I was trying to
    relocate pointers so that they point to the new location of objects
    within a block that has been moved by 'realloc'.

    --
    Christopher Bazley
     
    , Jan 27, 2010
    #7
  8. Guest

    Re: Data relocation (pointer subtraction undefined except withinarray)

    On Dec 31 2009, 5:04 pm, Eric Sosman <>
    wrote:
    > On 12/31/2009 8:52 AM, wrote:
    >
    > > Hello,

    >
    > > I wasted a lot of time yesterday writing some code which manages a
    > > collection of strings within a single heap block allocated by a
    > > function similar to 'malloc'. A separate array of structs includes
    > > members which point to the start of each string. My intention is to
    > > replace existing (working) code where each string is held in a
    > > separate heap block, to reduce a high turnover of blocks.
    > >[...]

    >
    >      Others have discussed the problems of after-the-fact
    > readjustment of pointers.  May I suggest that you might not
    > need it?
    >
    >      If the goal is to reduce overhead by holding many small
    > strings in one big block, perhaps you could consider allocating
    > a second big block when the first is full, leaving the first in
    > place and still holding its unmoved strings.  If the blocks are
    > "large" compared to the per-block overhead, you'd waste only a
    > small amount of space -- and you'd completely avoid the dangers
    > of pointer-fiddling.
    >
    >      Maybe the data structure you're using isn't amenable to
    > this -- but it might be worth a thought or two.


    It sounds as though you are suggesting a linked list of large buffers,
    each holding multiple strings. Unfortunately I don't think that would
    suit my needs because strings are often replaced by other strings of
    different length, or deleted entirely. I don't think I would be happy
    with memory usage increasing with each modification of one of my
    objects-with-associated-strings, even if all that memory could be
    recovered (by walking the linked list) upon deletion of the object.

    --
    Christopher Bazley
     
    , Jan 27, 2010
    #8
  9. Guest

    Re: Data relocation (pointer subtraction undefined except withinarray)

    Thanks to everyone who posted replies (especially Ben, Peter and
    Eric); you helped me to organise my thoughts and eventually arrive at
    a solution. I feel like I owe an explanation of how I solved my
    problem, so here it is...

    On Dec 31 2009, 4:19 pm, Seebs <> wrote:
    [snip]
    > The two things you might run into are:
    >
    > 1.  Systems in which loading an invalid pointer into an address register
    > can page fault even if you don't try to dereference it.


    Ah, I wouldn't have thought of that! I am used to the ARM
    architecture, where all registers (except the program counter) are
    general purpose; the CPU can't tell whether a value is a pointer or
    not until it is used as the base address in a load or store
    instruction.

    > 2.  Systems in which pointers to different objects may be in different
    > memory regions such that subtraction doesn't work as expected.


    Like the BBC Master Series microcomputer, where additional memory (so-
    called 'sideways RAM') was paged into the memory map between 0x8000
    and 0xC000 in 16 KB chunks.

    > Suggestion:  If you can handle the overhead, then malloc-copy-adjust-free
    > rather than using realloc.  In practice, if realloc is changing addresses,
    > it MUST be possible to malloc, copy, and then free.  However, I am sure
    > that if you tried hard enough, you could find some implementation somewhere
    > on which there existed M and N such that allocating M bytes and reallocing
    > to N succeeded, but trying to allocate M and N bytes simultaneously would
    > fail.
    >
    > I wouldn't worry about it.  The canonical thing to do is indeed to do
    > the copy/adjust phase yourself in cases like this.


    This is closest to the solution that I eventually adopted.

    My original idea of shuffling strings around the buffer when a string
    is deleted or replaced turned into a nightmare (which is why my
    initial implementation was to put each string in a separate heap
    block). A single call through my API allows several elements of an
    array to be modified in an atomic operation. The strings associated
    with some elements may grow, and those associated with others may
    shrink. It's basically a sliding heap, but more complex because of the
    need for atomicity!

    Before any strings could be moved, I needed to calculate the 'peak'
    memory usage for the edit about to happen. This may be greater than
    the final string buffer requirement, if short strings are replaced by
    long strings before short strings are replaced by long strings. I also
    felt that I should periodically minimise the string buffer size (e.g.
    when in a quiescent state). Every time the string buffer was resized,
    I would have needed to traverse the array, relocating every string
    pointers.

    In the end, I decided that the pain of maintaining my own sliding heap
    within a single block just wasn't worth it. Without the guarantee of
    contiguous address space from a fixed base address, resizing the
    string buffer would likely require its entire content to be copied
    (whether implicitly by 'realloc' or explicitly by 'memcpy'). Although
    likely to be quicker than copying strings piecemeal using 'strcpy',
    that is offset by the fact that some of the strings will be destined
    for the scrapheap anyway.

    Before editing the array of structs, I now sum the expected change in
    string buffer requirement for each array element (e.g. -3 when
    replacing "ladder" with "car") and add it to the current buffer size.
    This allows me to calculate the required string buffer size without
    traversing the whole array twice (unless every element is to be
    modified).

    I then allocate a new string buffer of exactly the right size and
    traverse the array of structs from the beginning, copying each string
    from the old buffer to the new buffer, or else replacing it. Rather
    than relocating pointers, I get fresh pointers returned from 'strcpy'.
    Lastly, I free the old string buffer. Thus, the buffer is never longer
    than it needs to be and each edit operation requires only one
    'malloc'/'free'.

    My performance tests show speed changes of 27% to 27923% relative to
    the previous strategy of allocating a separate heap block for each
    string!

    > Or!  You could use ptrdiff_t offsets internally, and provide an API which
    > yields a pointer for a given object.  This could be as simple as
    >         #define STR(x)  (global_base + x->offset)


    Unfortunately, I'm wedded to the idea of using the same struct type on
    both sides of an API. The client passes a pointer to an array of this
    type, each element of which may contain one or more string pointers.
    Internally, my module makes a deep copy of each struct and bundles it
    with some internal data.

    Requiring the client to pass a string buffer pointer and specify
    strings as ptrdiff_t offsets within that buffer would make using my
    API more painful. Alternatively, I could store ptrdiff_t value(s) as
    part of the internal data associated with each struct, but it would be
    messy to keep that synchronised with the struct type definition.

    Cheers,
    --
    Christopher Bazley
     
    , Jan 27, 2010
    #9
  10. Re: Data relocation (pointer subtraction undefined except within array)

    In message <
    ps.com>
    wrote:

    [snip]

    I doubt anyone read this far, but there were a number of mistakes in
    my original posting...

    > Before any strings could be moved, I needed to calculate the 'peak'
    > memory usage for the edit about to happen. This may be greater than
    > the final string buffer requirement, if short strings are replaced by
    > long strings before short strings are replaced by long strings.


    Of course I meant "...if short strings are replaced by long strings
    before long strings are replaced by short strings."

    [snip]

    > My performance tests show speed changes of 27% to 27923% relative to
    > the previous strategy of allocating a separate heap block for each
    > string!


    I meant 127% to 28023% of the original speed. (One of my tests took
    1/280.23 of the number of clock ticks that it previously did.)

    --
    Chris Bazley
    Star Fighter 3000: http://starfighter.acornarcade.com/
     
    Christopher Bazley, Jan 30, 2010
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. John Salerno
    Replies:
    20
    Views:
    867
    John Salerno
    Aug 11, 2006
  2. Fabio Z Tessitore

    who is simpler? try/except/else or try/except

    Fabio Z Tessitore, Aug 12, 2007, in forum: Python
    Replies:
    5
    Views:
    384
  3. David House

    try -> except -> else -> except?

    David House, Jul 6, 2009, in forum: Python
    Replies:
    2
    Views:
    354
    Bruno Desthuilliers
    Jul 6, 2009
  4. ara howard
    Replies:
    1
    Views:
    231
  5. Replies:
    0
    Views:
    366
Loading...

Share This Page