Accessing void * buffer/array through char * pointer

Discussion in 'C Programming' started by s0suk3@gmail.com, Sep 6, 2008.

  1. Guest

    This code

    #include <stdio.h>

    int main(void)
    {
        int hello[] = {'h', 'e', 'l', 'l', 'o'};
        char *p = (void *) hello;

        for (size_t i = 0; i < sizeof(hello); ++i) {
            printf("byte %2zu: <%c>", i, p[i]);
            if (!p[i])
                printf(" (null char)");
            printf("\n");
        }

        return 0;
    }

    produces this output

    byte 0: <h>
    byte 1: <> (null char)
    byte 2: <> (null char)
    byte 3: <> (null char)
    byte 4: <e>
    byte 5: <> (null char)
    byte 6: <> (null char)
    byte 7: <> (null char)
    byte 8: <l>
    byte 9: <> (null char)
    byte 10: <> (null char)
    byte 11: <> (null char)
    byte 12: <l>
    byte 13: <> (null char)
    byte 14: <> (null char)
    byte 15: <> (null char)
    byte 16: <o>
    byte 17: <> (null char)
    byte 18: <> (null char)
    byte 19: <> (null char)

    I'm confused about the int *-to-void *-to char * conversion. The
    output shows that ints are four bytes on my machine, and the values in
    the initializer 'h', 'e', 'l', 'l', 'o' have the values 104, 101, 108,
    108, and 111, respectively, so they're able to be represented as chars
    (duh...). But if I'd change the initializer to something like

    int hello[] = {1000, 43676, 362, 6364, 2575};

    I'd get this output

    byte 0: <�>
    byte 1: <>
    byte 2: <> (null char)
    byte 3: <> (null char)
    byte 4: <�>
    byte 5: <�>
    byte 6: <> (null char)
    byte 7: <> (null char)
    byte 8: <j>
    byte 9: <>
    byte 10: <> (null char)
    byte 11: <> (null char)
    byte 12: <�>
    byte 13: <▒>
    byte 14: <> (null char)
    byte 15: <> (null char)
    byte 16: <>
    byte 17: <
    >

    byte 18: <> (null char)
    byte 19: <> (null char)

    (In other words, non-printable characters.)

    Is some kind of overflow happening when I subscript the char pointer?
    Or am I simply getting meaningless values because of accessing a char
    pointer that points to something that wasn't a char object?

    Sebastian
     
    , Sep 6, 2008
    #1

  2. writes:

    > #include <stdio.h>
    >
    > int main(void)
    > {
    > int hello[] = {'h', 'e', 'l', 'l', 'o'};
    > char *p = (void *) hello;
    >
    > for (size_t i = 0; i < sizeof(hello); ++i) {
    > printf("byte %2zu: <%c>", i, p);
    > if (!p)
    > printf(" (null char)");
    > printf("\n");
    > }
    >
    > return 0;
    > }

    <snip>
    > I'm confused about the int *-to-void *-to char * conversion. The
    > output shows that ints are four bytes on my machine, and the values in
    > the initializer 'h', 'e', 'l', 'l', 'o' have the values 104, 101, 108,
    > 108, and 111, respectively, so they're able to be represented as chars
    > (duh...). But if I'd change the initializer to something like
    >
    > int hello[] = {1000, 43676, 362, 6364, 2575};
    >
    > I'd get this output
    >
    > byte 0: <�>
    > byte 1: <>
    > byte 2: <> (null char)
    > byte 3: <> (null char)
    > byte 4: <�>
    > byte 5: <�>
    > byte 6: <> (null char)
    > byte 7: <> (null char)

    <snip>
    > (In other words, non-printable characters.)
    >
    > Is some kind of overflow happening when I subscript the char
    > pointer?


    No, no overflow is happening on access.

    > Or am I simply getting meaningless values because of accessing a char
    > pointer that points to something that wasn't a char object?


    First, they are not meaningless. Some process governs exactly what
    you see but since it may involve things like the terminal setting it
    can be a very complex one.

    Secondly, a char pointer always points at a char object. C does not
    mandate the value but any object of any type can be accessed as if it
    is a sequence of char objects. In your case the first two characters
    are almost certainly 1000 % 256 and 1000 / 256, i.e. the least and
    second least significant bytes of the binary representation of 1000.

    I know this is not an actual answer, but your either/or questions
    don't give me much room!
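
    A minimal sketch along these lines (assuming 4-byte int and
    CHAR_BIT == 8, as on your machine) prints those bytes as numbers
    rather than characters:

    #include <stdio.h>

    int main(void)
    {
        int n = 1000;   /* 1000 == 3 * 256 + 232 */
        unsigned char *bytes = (unsigned char *) &n;

        /* print each byte of n's object representation numerically */
        for (size_t i = 0; i < sizeof n; ++i)
            printf("byte %zu: %d\n", i, bytes[i]);

        return 0;
    }

    On a little-endian machine this would print 232, 3, 0, 0, i.e. the
    least significant byte (1000 % 256) first and the next byte
    (1000 / 256) second.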

    --
    Ben.
     
    Ben Bacarisse, Sep 6, 2008
    #2

  3. On Sep 6, 4:26 pm, wrote:
    > This code
    <snip>

    Your question is the same as one I asked here before. Actually, when we
    have an array of chars or ints, the linker allocates the bytes of the
    initialized values plus 4 MORE BYTES, which also include a NULL char '\0'.
     
    raashid bhatt, Sep 6, 2008
    #3
  4. On Sep 6, 4:26 pm, wrote:
    > This code
    <snip>


    here is the post
    http://groups.google.co.in/group/co...b969f/8642714e7b3e6c17?hl=en#8642714e7b3e6c17
     
    raashid bhatt, Sep 6, 2008
    #4
  5. Guest

    On Sep 6, 2:26 pm, wrote:
    > This code


    Is broken. :p

    > #include <stdio.h>
    >
    > int main(void)
    > {
    > int hello[] = {'h', 'e', 'l', 'l', 'o'};
    > char *p = (void *) hello;


    Change p to `unsigned char *'. The cast can be either (void *) or
    (unsigned char *).

    > for (size_t i = 0; i < sizeof(hello); ++i) {
    > printf("byte %2zu: <%c>", i, p);


    Remove the 2 in %2zu, else the output might not be meaningful.
    Evaluating p[i] can invoke undefined behavior. Changing p to type
    `unsigned char *' as I have suggested previously fixes this.

    > if (!p[i])
    > printf(" (null char)");
    > printf("\n");
    > }
    >
    > return 0;
    >
    > }
    >
    > produces this output


    <snip output>


    > I'm confused about the int *-to-void *-to char * conversion. The
    > output shows that ints are four bytes on my machine, and the values in
    > the initializer 'h', 'e', 'l', 'l', 'o' have the values 104, 101, 108,
    > 108, and 111, respectively, so they're able to be represented as chars
    > (duh...). But if I'd change the initializer to something like


    You can convert any pointer to object to unsigned char * to inspect
    its object representation.

    > int hello[] = {1000, 43676, 362, 6364, 2575};
    >
    > I'd get this output


    <snip>

    > (In other words, non-printable characters.)


    So what?

    > Is some kind of overflow happening when I subscript the char pointer?
    > Or am I simply getting meaningless values because of accessing a char
    > pointer that points to something that wasn't a char object?


    *ASSUMING* you change p to unsigned char *, you split the object's
    representation into CHAR_BIT-sized chunks and treat all of those bits
    as value bits, even though in the original object some of them might be
    padding bits or a sign bit.

    Change your program to this for more meaningful output:

    #include <ctype.h>
    #include <stdio.h>

    typedef int object_type;
    #define SIZE 10

    int main(void) {

        object_type object[SIZE];   /* left uninitialized, so the bytes are arbitrary */
        unsigned char *p;
        size_t i;

        p = (unsigned char *) object;

        for (i = 0; i < sizeof object; i++)
            if (isprint(p[i]))
                printf("p[%zu] = '%c'\n", i, p[i]);
            else
                printf("p[%zu] = not printable\n", i);

        return 0;
    }
     
    , Sep 6, 2008
    #5
  6. Ben Bacarisse, Sep 6, 2008
    #6
  7. writes:

    > On Sep 6, 2:26 pm, wrote:

    <snip>
    >> for (size_t i = 0; i < sizeof(hello); ++i) {
    >> printf("byte %2zu: <%c>", i, p);

    >
    > Remove the 2 in %2zu, else the output might not be meaningful.


    What on earth is wrong with the 2?

    > Evaluating p can invoke undefined behavior.


    I think it is better to say "may invoke undefined behaviour". I
    accept that you don't agree (there's been a long thread about this
    already) but, just for the benefit of the OP, there are systems on
    which the code posted can't go wrong in any way. Using a potentially
    signed char pointer merely limits the code's portability to a specific
    class of implementations, and you, as the programmer, can know (with
    absolute certainty) beforehand whether the code's behaviour is
    defined. This is quite unlike some other kinds of UB.

    However, one should always use unsigned char for this purpose since
    there is no advantage to be gained by using char *.

    --
    Ben.
     
    Ben Bacarisse, Sep 6, 2008
    #7
  8. Guest

    On Sep 6, 7:02 am, Ben Bacarisse <> wrote:
    > writes:

    <snip>
    >> Or am I simply getting meaningless values because of accessing a char
    >> pointer that points to something that wasn't a char object?

    >
    > First, they are not meaningless.  Some process governs exactly what
    > you see but since it may involve things like the terminal setting it
    > can be a very complex one.
    >


    Well, I meant meaningless in the sense that the values were produced
    in an unnatural way. For example, with the second initializer, the
    bits that make up the 1000 in the first element of the int array are
    broken apart when accessed through the char pointer, and everything
    becomes a mess! (I got -24 when trying to see the numerical value of
    the first element that the char pointer was pointing to.)

    > Secondly, a char pointer always points at a char object.  C does not
    > mandate the value but any object of any type can be accessed as if it
    > is a sequence of char objects.  In your case the first two characters
    > are almost certainly 1000 % 256 and 1000 / 256, i.e. the least and
    > second least significant bytes of the binary representation of 1000.
    >


    Changing the printf format string to "byte %2zu: <%d>" (i.e., to print
    a number instead of a character) yields -24 (which is not 1000 % 256)
    for the first character and 3 (which is indeed 1000 / 256) for the
    second.

    > I know this is not an actual answer, but your either/or questions
    > don't give me much room!
    >


    No, it was indeed useful. Anyway, I was just unsure whether accessing
    the char pointer in that way was safe. Thanks.

    Sebastian
     
    , Sep 6, 2008
    #8
  9. Guest

    On Sep 6, 4:09 pm, Ben Bacarisse <> wrote:
    > writes:
    > > On Sep 6, 2:26 pm, wrote:

    > <snip>
    > >> for (size_t i = 0; i < sizeof(hello); ++i) {
    > >> printf("byte %2zu: <%c>", i, p);

    >
    > > Remove the 2 in %2zu, else the output might not be meaningful.

    >
    > What on earth is wrong with the 2?


    Nothing, I got confused about the meaning of `2' in scanf. (I thought
    only the first two digits would be printed, just as only the first two
    digits are matched in scanf.)
    I apologize for this.
     
    , Sep 6, 2008
    #9
  10. writes:

    > On Sep 6, 7:02 am, Ben Bacarisse <> wrote:
    >> writes:

    > <snip>
    >>> Or am I simply getting meaningless values because of accessing a char
    >>> pointer that points to something that wasn't a char object?

    <snip>
    >> Secondly, a char pointer always points at a char object.  C does not
    >> mandate the value but any object of any type can be accessed as if it
    >> is a sequence of char objects.  In your case the first two characters
    >> are almost certainly 1000 % 256 and 1000 / 256, i.e. the least and
    >> second least significant bytes of the binary representation of 1000.
    >>

    >
    > Changing the printf format string to "byte %2zu: <%d>" (i.e., to print
    > a number instead of a character) yields -24 (which is not 1000 % 256)
    > for the first character and 3 (which is indeed 1000 / 256) for the
    > second.


    OK, fair cop. 1000 % 256 = 232 = 256 - 24, i.e. -24 is 1000 % 256 when
    that bit pattern is interpreted as signed using the most common
    representation of signed values (2's complement). Also -24 = 232 (mod
    256), i.e. -24 is a value of 1000 mod 256, so the result is
    understandable in terms of div and mod. I did not know your char was
    signed (though in truth that is common).
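
    A tiny sketch of that, assuming a little-endian machine with 8-bit,
    2's complement char, as yours appears to be:

    #include <stdio.h>

    int main(void)
    {
        int n = 1000;

        /* read the lowest-addressed byte of n both ways */
        signed char   s = *(signed char *) &n;
        unsigned char u = *(unsigned char *) &n;

        /* with the assumptions above this prints -24 and 232 */
        printf("as signed char: %d, as unsigned char: %d\n", s, u);
        return 0;
    }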

    >> I know this is not an actual answer, but your either/or questions
    >> don't give me much room!
    >>

    >
    > No, it was indeed useful. Anyway, I was just unsure whether accessing
    > the char pointer in that way was safe. Thanks.


    Now that is another question altogether. It is safe unless your
    implementation has signed chars and has trap representations for some
    bit patterns in signed char (-0 being the most likely on non 2s
    complement machines).

    In one sense (maximum portability) using char is not safe, but since
    any given implementation must tell you if it is not safe, I feel it is
    a bit broad to say "undefined". However (and this is a big however)
    since there is nothing at all to be gained from using char (rather
    than unsigned char), you should always use unsigned char when
    inspecting the representation of an object.

    --
    Ben.
     
    Ben Bacarisse, Sep 6, 2008
    #10
  11. "no name given" wrote:

    > On Sep 6, 7:02 am, Ben Bacarisse <> wrote:
    >> Secondly, a char pointer always points at a char object.  C does not
    >> mandate the value but any object of any type can be accessed as if it
    >> is a sequence of char objects.  In your case the first two characters
    >> are almost certainly 1000 % 256 and 1000 / 256, i.e. the least and
    >> second least significant bytes of the binary representation of 1000.
    >>

    >
    > Changing the printf format string to "byte %2zu: <%d>" (i.e., to print
    > a number instead of a character) yields -24 (which is not 1000 % 256)
    > for the first character and 3 (which is indeed 1000 / 256) for the
    > second.


    Yes, 1000 % 256 == 232. So your result shows that your characters
    are 8 bits wide and signed, and that you are using an inappropriate
    type to access them when you want only non-negative values.

    -- Ralf
     
    Ralf Damaschke, Sep 6, 2008
    #11
  12. Guest

    On Sep 6, 9:00 am, Ben Bacarisse <> wrote:
    > writes:
    >> On Sep 6, 7:02 am, Ben Bacarisse <> wrote:
    >>> writes:

    <snip>
    >>> I know this is not an actual answer, but your either/or questions
    >>> don't give me much room!

    >
    >> No, it was indeed useful. Anyway, I was just unsure whether accessing
    >> the char pointer in that way was safe. Thanks.

    >
    > Now that is another question altogether.  It is safe unless your
    > implementation has signed chars and has trap representations for some
    > bit patterns in signed char (-0 being the most likely on non 2s
    > complement machines).
    >
    > In one sense (maximum portability) using char is not safe, but since
    > any given implementation must tell you if it is not safe, I feel it is
    > a bit broad to say "undefined".  However (and this is a big however)
    > since there is nothing at all to be gained from using char
    > (rather than unsigned char) you should always use unsigned char when
    > inspecting the representation of an object.
    >


    What is the advantage of using unsigned char over char? Why can it
    even invoke undefined behavior as vippstar said? I read somewhere
    that, if char happens to be signed, and if the variable will be used
    to hold 8-bit characters (i.e., a character whose value requires the
    use of the leftmost bit), it was better to use unsigned char if the
    variable would ever be used as an integer (otherwise we might end up
    with a negative value). Are there any other advantages other than
    that? And what's the deal with UB?

    Sebastian
     
    , Sep 6, 2008
    #12
  13. writes:

    > On Sep 6, 9:00 am, Ben Bacarisse <> wrote:
    >> writes:
    >>> On Sep 6, 7:02 am, Ben Bacarisse <> wrote:
    >>>> writes:

    > <snip>
    >>>> I know this is not an actual answer, but your either/or questions
    >>>> don't give me much room!

    >>
    >>> No, it was indeed useful. Anyway, I was just unsure whether accessing
    >>> the char pointer in that way was safe. Thanks.

    >>
    >> Now that is another question altogether.  It is safe unless your
    >> implementation has signed chars and has trap representations for some
    >> bit patterns in signed char (-0 being the most likely on non 2s
    >> complement machines).
    >>
    >> In one sense (maximum portability) using char is not safe, but since
    >> any given implementation must tell you if it is not safe, I feel it is
    >> a bit broad to say "undefined".  However (and this is a big however)
    >> since there is nothing at all to be gained from using char
    >> (rather than unsigned char) you should always use unsigned char when
    >> inspecting the representation of an object.
    >>

    >
    > What is the advantage of using unsigned char over char?


    Accessing an arbitrary bit pattern using unsigned char * is 100% safe
    on all conforming C implementations, so that is what all wise C
    programmers do. There can be no reason not to (that I can see).

    This means that discussing what happens when char is used is an
    academic exercise. I am all for that, but don't confuse the fact that
    problems when using char are rare with any endorsement of using char
    rather than unsigned char!

    > Why can it
    > even invoke undefined behavior as vippstar said?


    As far as I can see, using signed char * is only a problem when some
    bit patterns are trap representations. This is very rare and I'd be
    prepared to bet that you would know if you were using such an
    implementation. If you are porting to such an implementation you will
    hope that the compiler offers a flag to make char unsigned by default,
    because quite a lot of old C code uses char * to do what you are
    doing. This is the UB that vippstar was talking about (I think).

    The most likely case is accessing a negative zero on a system where
    negative zero is a trap representation rather than an alternative form
    of zero (this is specifically permitted). On systems where negative
    zero is not a trap, your code won't be undefined but you will be
    unable to distinguish between the two forms of zero. I.e. two
    different ints might compare the same if you compare them as arrays of
    signed char values[1].

    > I read somewhere
    > that, if char happens to be signed, and if the variable will be used
    > to hold 8-bit characters (i.e., a character whose value requires the
    > use of the leftmost bit), it was better to use unsigned char if the
    > variable would ever be used as an integer (otherwise we might end up
    > with a negative value).


    When char is signed you can be led into making other mistakes -- for
    example forgetting to convert to unsigned before calling one of the
    is* functions from ctype.h, or using the char to index an array
    without checking for negative values, but these are secondary problems
    and are unrelated to what vippstar was referring to.

    [1] memcmp never does this.
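
    The usual way to avoid the ctype.h mistake is to convert to unsigned
    char at the call site; a minimal sketch:

    #include <ctype.h>
    #include <stdio.h>

    int main(void)
    {
        char c = '\xe8';   /* negative on implementations where plain char is signed */

        /* isprint(c) could pass a negative value, which is undefined
           behaviour for the ctype.h functions (unless it equals EOF);
           converting to unsigned char first is always safe */
        if (isprint((unsigned char) c))
            printf("printable\n");
        else
            printf("not printable\n");

        return 0;
    }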

    --
    Ben.
     
    Ben Bacarisse, Sep 6, 2008
    #13
  14. On Sat, 06 Sep 2008 15:31:23 +0000, Richard Heathfield wrote:
    > said:
    > <snip>
    >
    >> What is the advantage of using unsigned char over char?

    >
    > It's guaranteed that you can treat any object as if it were an array of
    > unsigned char (subject to syntax rules, of course) without breaking
    > anything - for example,
    > [snip]
    > The same guarantee is not extended to char or signed char.
    >
    > See 6.2.6.1(3) of C99.


    You're correct that this talks of unsigned char only, but:

    6.3.2.3p7:
    "When a pointer to an object is converted to a pointer to a character
    type, the result points to the lowest addressed byte of the object.
    Successive increments of the result, up to the size of the object, yield
    pointers to the remaining bytes of the object."

    6.5p7:
    "An object shall have its stored value accessed only by an lvalue
    expression that has one of the following types:
    [snip]
    -- a character type."

    6.2.6.1p5:
    "Certain object representations need not represent a value of the object
    type. If the stored value of an object has such a representation and is
    read by an lvalue expression that does not have character type, the
    behavior is undefined. If such a representation is produced by a side
    effect that modifies all or any part of the object by an lvalue
    expression that does not have character type, the behavior is
    undefined. Such a representation is called a trap representation."

    all talk of "character type", not "unsigned char". So... signed char * can
    be used to point to the bytes of an object. It can be used to access those
    bytes of the object. And if the byte happens to be a trap representation
    for signed char, then the behaviour is still not automatically undefined.
    All that 6.2.6.1p3 says is that the values you get this way are not called
    the object representation, as far as I can tell.
     
    Harald van Dijk, Sep 6, 2008
    #14
