when is typecasting (unsigned char*) to (char*) dangerous?

Discussion in 'C Programming' started by tim, Nov 15, 2011.

  1. tim

    tim Guest

    thanks in advance for your help, tim
     
    tim, Nov 15, 2011
    #1

  2. Kaz Kylheku Guest

    On 2011-11-15, tim <> wrote:
    > thanks in advance for your help, tim


    It's dangerous when it's done by someone who got other people
    to answer his technical interview or homework questions.
     
    Kaz Kylheku, Nov 15, 2011
    #2

  3. include the subject of your post in the body of your post

    "when is typecasting (unsigned char*) to (char*) dangerous?"

    On Nov 15, 7:59 pm, tim <> wrote:
    > thanks in advance for your help, tim


    what is "typecasting"? It isn't defined by the C Standard so I'm
    guessing you're talking about the problems older actors have? Must be
    some extended pun to do with stars and character actors not being
    asked for autographs...
     
    Nick Keighley, Nov 16, 2011
    #3
  4. On Nov 15, 9:59 pm, tim <> wrote:
    > thanks in advance for your help, tim
    >

    In a narrow technical sense, it's almost never dangerous. chars can have
    trap representations whilst unsigned chars can't, but it's most
    unlikely your code will ever need to run on such a machine.

    However, casting unsigned char to char, though not the other way round,
    usually means that someone doesn't know what they are doing. char
    should be used for characters, i.e. human-readable text; unsigned char
    for bytes, usually arbitrary bits, occasionally for small integers. If
    you need a tiny signed integer, use signed char. It doesn't normally
    make sense to convert an arbitrary bit pattern to a human-readable
    character.
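A minimal sketch of the division of labour described above, assuming a C99 compiler. The helper names (`plain_char_is_signed`, `low_byte`, `clamp_small`) are hypothetical and only illustrate which character type fits which job. Whether plain char is signed is implementation-defined, and `CHAR_MIN` reveals the choice.

```c
#include <limits.h>
#include <stdbool.h>

/* Whether plain char is signed is implementation-defined.
   CHAR_MIN is 0 if plain char is unsigned, SCHAR_MIN if it is signed. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* unsigned char: raw bytes, always a value in 0..UCHAR_MAX. */
unsigned char low_byte(unsigned value)
{
    return (unsigned char)(value & UCHAR_MAX);
}

/* signed char: a tiny signed integer. Only -127..127 is guaranteed
   by C99, so this hypothetical helper saturates to that range. */
signed char clamp_small(int v)
{
    if (v > SCHAR_MAX)
        return SCHAR_MAX;
    if (v < -127)
        return -127;
    return (signed char)v;
}
```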
    --
    MiniBasic - how to write a script interpreter. Working ANSI C Basic
    interpreter
    http://www.malcolmmclean.site11.com/www
     
    Malcolm McLean, Nov 16, 2011
    #4
  5. On 16.11.2011 10:36, Malcolm McLean wrote:
    > On Nov 15, 9:59 pm, tim <> wrote:
    >> thanks in advance for your help, tim
    >>

    > In a narrow technical sense, it's almost never dangerous. chars can have
    > trap representations whilst unsigned chars can't, but it's most
    > unlikely your code will ever need to run on such a machine.
    >
    > However, casting unsigned char to char, though not the other way round,
    > usually means that someone doesn't know what they are doing. char
    > should be used for characters, i.e. human-readable text; unsigned char
    > for bytes, usually arbitrary bits, occasionally for small integers. If
    > you need a tiny signed integer, use signed char. It doesn't normally
    > make sense to convert an arbitrary bit pattern to a human-readable
    > character.


    Heh! Tell that to the guys who wrote libbzip2! They have a routine
    called BZ2_bzBuffToBuffDecompress() and they think that the compressed
    buffer pointer is of type "char*". Nice one. So I had to cast that
    pointer to make the warning go away.
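Markus's situation can be sketched as follows. `legacy_checksum` is a hypothetical stand-in for a library routine that declares a binary buffer as `char *` (it is not the libbzip2 API). The point is that the caller keeps its data as `unsigned char` and casts only at the call site.

```c
/* Hypothetical stand-in for a third-party routine whose prototype uses
   char * for a binary buffer (NOT the real libbzip2 function). It sums
   the bytes, converting each one through unsigned char. */
unsigned long legacy_checksum(char *buf, unsigned int len)
{
    unsigned long sum = 0;
    for (unsigned int i = 0; i < len; i++)
        sum += (unsigned char)buf[i];
    return sum;
}

/* The caller keeps binary data as unsigned char and casts only at the
   API boundary; converting unsigned char * to char * is well defined. */
unsigned long checksum_bytes(unsigned char *data, unsigned int len)
{
    return legacy_checksum((char *)data, len);
}
```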

    CYA,
    Markus
     
    Markus Wichmann, Nov 16, 2011
    #5
  6. nroberts Guest

    On Nov 16, 1:36 am, Malcolm McLean <>
    wrote:
    > On Nov 15, 9:59 pm, tim <> wrote:
    > > thanks in advance for your help, tim
    >
    > In a narrow technical sense, it's almost never dangerous. chars can have
    > trap representations whilst unsigned chars can't, but it's most
    > unlikely your code will ever need to run on such a machine.
    >
    > However, casting unsigned char to char, though not the other way round,
    > usually means that someone doesn't know what they are doing. char
    > should be used for characters, i.e. human-readable text; unsigned char
    > for bytes, usually arbitrary bits, occasionally for small integers.


    Not since the '70s, maybe earlier. The character sets on today's
    systems span the full range of an 8-bit unsigned char, if not
    further.
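To illustrate the point about 8-bit character sets, here is a hypothetical helper that counts bytes above 0x7F in a string, such as the two bytes UTF-8 uses for 'ï'. It assumes CHAR_BIT == 8 and converts each char through unsigned char before comparing, since on implementations where plain char is signed those bytes read as negative values.

```c
#include <stddef.h>

/* Counts bytes outside 7-bit ASCII in a C string. Each char is
   converted to unsigned char first, so a byte such as 0xC3 compares
   as 195 rather than as a negative number when char is signed. */
size_t count_non_ascii(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if ((unsigned char)*s > 0x7F)
            n++;
    return n;
}
```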

    > If
    > you need a tiny signed integer, use signed char.


    Not unless you've got a really good reason. It doesn't buy you
    anything.
     
    nroberts, Nov 16, 2011
    #6
  7. tim

    tim Guest

    THIS IS NOT A HOMEWORK

    On Tue, 15 Nov 2011 20:51:01 +0000, Kaz Kylheku wrote:
    > On 2011-11-15, tim <> wrote:
    >> thanks in advance for your help, tim

    >
    > It's dangerous when it's done by someone who got other people to answer
    > his technical interview or homework questions.
     
    tim, Nov 16, 2011
    #7
  8. Ben Pfaff Guest

    When "char" has trap representations that "unsigned char" does
    not?
    --
    "When I have to rely on inadequacy, I prefer it to be my own."
    --Richard Heathfield
     
    Ben Pfaff, Nov 16, 2011
    #8
  9. Kaz Kylheku Guest

    On 2011-11-16, tim <> wrote:
    > THIS IS NOT A HOMEWORK


    In that case, ...

    If the unsigned char * value is already well defined and everything, it is
    safe to convert it to char *, and even to access the memory. In C, this
    is not considered to be invalid aliasing. Any object can be accessed
    as an array of characters, plain, signed or unsigned.
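A sketch of the byte-level access Kaz describes, using a hypothetical function `sum_object_bytes`. It walks an object of any type through an `unsigned char` pointer, the character type for which every bit pattern is a valid value.

```c
#include <stddef.h>

/* Reads every byte of an arbitrary object through an lvalue of
   character type (permitted by C99 6.5p7) and sums the byte values. */
unsigned long sum_object_bytes(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    unsigned long sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += p[i];
    return sum;
}
```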

    Sometimes this conversion will be necessary. If you know that some region of
    memory contains a null-terminated C string that you would like to compare with
    strcmp, you will end up doing that cast.
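The `strcmp` case might look like this. Where the buffer comes from (a file or network read, say) and the name `buffer_matches` are assumptions for illustration; the only requirement is that the bytes really do contain a terminating null.

```c
#include <stdbool.h>
#include <string.h>

/* A byte buffer known to hold a null-terminated string. strcmp takes
   const char *, so the unsigned char pointer must be converted. */
bool buffer_matches(const unsigned char *buf, const char *expected)
{
    return strcmp((const char *)buf, expected) == 0;
}
```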

    One thing that may be dangerous is that char may be a signed value. This
    means that through a char * pointer, some of the byte values will appear
    negative. You can trip up like this:

    int translated_char = table[*char_ptr]; /* oops, negative index */

    Accessing memory using an unsigned char * pointer ensures that bytes are
    treated as positive binary numbers.
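The negative-index trap, and the usual fix of converting through `unsigned char` before indexing, can be sketched with a hypothetical byte histogram. The example string uses Latin-1's 0xE9 ('é'); where plain char is signed, `*s` would be negative for that byte and could not be used as an index directly.

```c
#include <limits.h>
#include <stddef.h>

/* Counts how often byte value `target` occurs in a C string, using a
   table indexed by byte value. The (unsigned char) conversion keeps
   the subscript in 0..UCHAR_MAX even when plain char is signed. */
size_t count_byte(const char *s, unsigned char target)
{
    size_t counts[UCHAR_MAX + 1] = {0};
    for (; *s != '\0'; s++)
        counts[(unsigned char)*s]++;
    return counts[target];
}
```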
     
    Kaz Kylheku, Nov 16, 2011
    #9
  10. Ben Pfaff Guest

    Vincenzo Mercuri <> writes:

    > Ben Pfaff ha scritto:
    >> When "char" has trap representations that "unsigned char" does
    >> not?

    >
    > I thought about this as well, but the typecast "per se" is in fact
    > a conversion between pointer types so I think it would be safe.


    I agree that the cast itself is not the problem.
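A sketch of why the pointer conversion itself is harmless: converting `unsigned char *` to `char *` and back gives a pointer equal to the original, since both address the same byte. `round_trip_ok` is a hypothetical name; any problem arises only when bytes are read through the new type.

```c
#include <stdbool.h>

/* Converts an unsigned char pointer to char * and back, and checks
   that both conversions still designate the same byte. */
bool round_trip_ok(unsigned char *p)
{
    char *c = (char *)p;
    unsigned char *back = (unsigned char *)c;
    return back == p && (void *)c == (void *)p;
}
```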
    --
    Ben Pfaff
    http://benpfaff.org
     
    Ben Pfaff, Nov 16, 2011
    #10
  11. Kaz Kylheku <> writes:
    > On 2011-11-16, tim <> wrote:
    >> THIS IS NOT A HOMEWORK


    (no need to shout)

    > In that case, ...
    >
    > If the unsigned char * value is already well defined and everything, it is
    > safe to convert it to char *, and even to access the memory. In C, this
    > is not considered to be invalid aliasing. Any object can be accessed
    > as an array of characters, plain, signed or unsigned.


    Does the standard guarantee that? I was unable to find anything
    that permits treating arbitrary objects as arrays of anything other
    than unsigned char.

    C99 6.2.6.1p4:

    Values stored in non-bit-field objects of any other object type
    consist of n * CHAR_BIT bits, where n is the size of an object
    of that type, in bytes. The value may be copied into an object
    of type unsigned char [n] (e.g., by memcpy); the resulting set
    of bytes is called the *object representation* of the value.
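The object-representation rule quoted above can be sketched with `memcpy`: copy a `double` into `unsigned char[sizeof d]` and back, then compare the bytes. Using `memcmp` rather than `==` is a deliberate choice so the check also holds for values such as NaN. `double_round_trips` is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/* Copies d into an unsigned char array (its object representation),
   copies those bytes into a fresh double, and checks they match. */
bool double_round_trips(double d)
{
    unsigned char bytes[sizeof d];
    double copy;
    memcpy(bytes, &d, sizeof d);       /* object representation of d */
    memcpy(&copy, bytes, sizeof copy); /* rebuild a double from bytes */
    return memcmp(&copy, &d, sizeof d) == 0;
}
```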

    C99 6.2.6.2p1:

    For unsigned integer types other than unsigned char, the bits of the
    object representation shall be divided into two groups: value bits
    and padding bits (there need not be any of the latter).

    p2:

    Which of these [sign and magnitude, two's complement, ones'
    complement] applies is implementation-defined, as is whether
    the value with sign bit 1 and all value bits zero (for the
    first two), or with sign bit and all value bits 1 (for ones’
    complement), is a trap representation or a normal value.

    As far as I can tell, given that CHAR_BIT==8, it would be legal for an
    implementation to have plain char (if it's signed) and signed char have
    a range of -127 .. +127, with the extra representation being a trap
    representation. It would even be legal for signed char to have padding
    bits, possibly leading to even more trap representation; given the
    requirements for SCHAR_MIN and SCHAR_MAX, that's possible only if
    CHAR_BIT > 8.
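Whether an implementation uses that latitude can be checked from `<limits.h>`. This sketch assumes C99 rules, under which `SCHAR_MIN` need only be at most -127. On an implementation like the one Keith describes (or one using ones' complement or sign and magnitude), `signed_char_has_full_range` would return false. Both function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* True if signed char provides the extra negative value that a
   two's-complement representation without a trap pattern would give. */
bool signed_char_has_full_range(void)
{
    return SCHAR_MIN == -SCHAR_MAX - 1;
}

/* Number of value bits in signed char, derived from SCHAR_MAX
   (which is 2^N - 1). Value bits + sign bit < CHAR_BIT would mean
   signed char has padding bits. */
int signed_char_value_bits(void)
{
    int bits = 0;
    for (int m = SCHAR_MAX; m > 0; m >>= 1)
        bits++;
    return bits;
}
```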

    I seriously doubt that any real-world implementation takes advantage
    of this.

    I have a vague memory of a statement that plain and signed char cannot
    have trap representations, but I can't confirm that from the standard.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Nov 16, 2011
    #11
  12. Kaz Kylheku Guest

    On 2011-11-16, Keith Thompson <> wrote:
    > Kaz Kylheku <> writes:
    >> On 2011-11-16, tim <> wrote:
    >>> THIS IS NOT A HOMEWORK

    >
    > (no need to shout)
    >
    >> In that case, ...
    >>
    >> If the unsigned char * value is already well defined and everything, it is
    >> safe to convert it to char *, and even to access the memory. In C, this
    >> is not considered to be invalid aliasing. Any object can be accessed
    >> as an array of characters, plain, signed or unsigned.

    >
    > Does the standard guarantee that?


    Even if there is a trap representation there, it's not an aliasing issue.

    If you could not alias an object using chars, then no access at all would
    be well-defined.
     
    Kaz Kylheku, Nov 16, 2011
    #12
  13. Kaz Kylheku <> writes:
    > On 2011-11-16, Keith Thompson <> wrote:
    >> Kaz Kylheku <> writes:
    >>> On 2011-11-16, tim <> wrote:
    >>>> THIS IS NOT A HOMEWORK

    >>
    >> (no need to shout)
    >>
    >>> In that case, ...
    >>>
    >>> If the unsigned char * value is already well defined and everything, it is
    >>> safe to convert it to char *, and even to access the memory. In C, this
    >>> is not considered to be invalid aliasing. Any object can be accessed
    >>> as an array of characters, plain, signed or unsigned.

    >>
    >> Does the standard guarantee that?

    >
    > Even if there is a trap representation there, it's not an aliasing issue.
    >
    > If you could not alias an object using chars, then no access at all would
    > be well-defined.


    I don't follow your reasoning.

    Where does the standard say that you can alias any object with an
    array of plain or signed char?

    If you can't do so, how does that affect the ability to access an
    object as its declared type, or as an array of unsigned char?

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Nov 16, 2011
    #13
  14. Kaz Kylheku Guest

    On 2011-11-16, Keith Thompson <> wrote:
    > Kaz Kylheku <> writes:
    >> On 2011-11-16, Keith Thompson <> wrote:
    >>> Kaz Kylheku <> writes:
    >>>> On 2011-11-16, tim <> wrote:
    >>>>> THIS IS NOT A HOMEWORK
    >>>
    >>> (no need to shout)
    >>>
    >>>> In that case, ...
    >>>>
    >>>> If the unsigned char * value is already well defined and everything, it is
    >>>> safe to convert it to char *, and even to access the memory. In C, this
    >>>> is not considered to be invalid aliasing. Any object can be accessed
    >>>> as an array of characters, plain, signed or unsigned.
    >>>
    >>> Does the standard guarantee that?

    >>
    >> Even if there is a trap representation there, it's not an aliasing issue.
    >>
    >> If you could not alias an object using chars, then no access at all would
    >> be well-defined.

    >
    > I don't follow your reasoning.
    >
    > Where does the standard say that you can alias any object with an
    > array of plain or signed char?


    6.5 paragraph 7. An object can be accessed with an lvalue which
    is of character type.
     
    Kaz Kylheku, Nov 16, 2011
    #14
  15. James Kuyper Guest

    On 11/16/2011 02:52 PM, Keith Thompson wrote:
    ....
    > Where does the standard say that you can alias any object with an
    > array of plain or signed char?


    6.5p7, last item: "a character type". The term "alias" is used only in
    footnote 76, but that's sufficient for this purpose.
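The 6.5p7 permission can be exercised directly by reading the same object through `char *` and `unsigned char *`. This hypothetical check assumes a two's-complement implementation where plain char has no trap representations, which covers common platforms; it does not settle the conformance question being argued in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reads each byte of an object through both plain char and unsigned
   char lvalues and checks that the plain char value, converted back
   to unsigned char, equals the unsigned char reading. */
bool char_views_agree(const void *obj, size_t size)
{
    const unsigned char *u = obj;
    const char *c = obj;
    for (size_t i = 0; i < size; i++)
        if ((unsigned char)c[i] != u[i])
            return false;
    return true;
}
```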
     
    James Kuyper, Nov 16, 2011
    #15
  16. James Kuyper Guest

    On 11/16/2011 02:27 PM, Keith Thompson wrote:
    ....
    > I have a vague memory of a statement that plain and signed char cannot
    > have trap representations, but I can't confirm that from the standard.


    I know of no reason why signed char (and therefore, char) cannot have
    trap representations. However, every statement in 6.2.6.1p5 which says
    that the behavior is undefined when a trap representation is involved,
    explicitly excludes all character types, not just unsigned char. I'm not
    quite sure what to make of that fact, but I'm sure that explicitly
    excluding all character types was intentional; I'm not so sure whether
    it was intentional to allow signed char to have trap representations.
     
    James Kuyper, Nov 16, 2011
    #16
  17. On Nov 16, 9:24 pm, James Kuyper <> wrote:
    > On 11/16/2011 02:27 PM, Keith Thompson wrote:
    > ...
    >
    > > I have a vague memory of a statement that plain and signed char cannot
    > > have trap representations, but I can't confirm that from the standard.

    >
    > I know of no reason why signed char (and therefore, char) cannot have
    > trap representations. However, every statement in 6.2.6.1p5 which says
    > that the behavior is undefined when a trap representation is involved,
    > explicitly excludes all character types, not just unsigned char. I'm not
    > quite sure what to make of that fact, but I'm sure that explicitly
    > excluding all character types was intentional; I'm not so sure whether
    > it was intentional to allow signed char to have trap representations.


    6.2.6.1p5 refers to the trap representations for the type of the
    object. In other words, if an object p of type void * holds a trap
    representation, 6.2.6.1p5 makes it explicit that reading that object
    as void * is not valid. It doesn't say that signed char can be used to
    access the bytes in p, it merely doesn't say that it can't. If signed
    char has no trap representations, the required behaviour can be
    inferred from other parts of the standard. If signed char does have
    trap representations, then even though 6.2.6.1p5 doesn't explicitly
    state that the behaviour is undefined, since the standard never
    defines the behaviour, the end result is the same.
     
    Harald van Dijk, Nov 16, 2011
    #17
  18. James Kuyper Guest

    On 11/16/2011 03:41 PM, Harald van Dijk wrote:
    > On Nov 16, 9:24 pm, James Kuyper <> wrote:

    ....
    >> I know of no reason why signed char (and therefore, char) cannot have
    >> trap representations. However, every statement in 6.2.6.1p5 which says
    >> that the behavior is undefined when a trap representation is involved,
    >> explicitly excludes all character types, not just unsigned char. I'm not
    >> quite sure what to make of that fact, but I'm sure that explicitly
    >> excluding all character types was intentional; I'm not so sure whether
    >> it was intentional to allow signed char to have trap representations.

    >
    > 6.2.6.1p5 refers to the trap representations for the type of the
    > object. In other words, if an object p of type void * holds a trap
    > representation, 6.2.6.1p5 makes it explicit that reading that object
    > as void * is not valid.


    So, in your opinion, what is the significance of the exclusion of
    character types from those statements? What do those statements mean,
    with those exclusions, that differs from what they would mean if those
    exclusions were dropped? Please accompany your explanation with specific
    examples of code that would have defined behavior under the existing
    rules, but not with that modification, or vice-versa.
     
    James Kuyper, Nov 16, 2011
    #18
  19. On Nov 16, 10:02 pm, James Kuyper <> wrote:
    > On 11/16/2011 03:41 PM, Harald van Dijk wrote:
    > > On Nov 16, 9:24 pm, James Kuyper <> wrote:

    > ...
    > >> I know of no reason why signed char (and therefore, char) cannot have
    > >> trap representations. However, every statement in 6.2.6.1p5 which says
    > >> that the behavior is undefined when a trap representation is involved,
    > >> explicitly excludes all character types, not just unsigned char. I'm not
    > >> quite sure what to make of that fact, but I'm sure that explicitly
    > >> excluding all character types was intentional; I'm not so sure whether
    > >> it was intentional to allow signed char to have trap representations.

    >
    > > 6.2.6.1p5 refers to the trap representations for the type of the
    > > object. In other words, if an object p of type void * holds a trap
    > > representation, 6.2.6.1p5 makes it explicit that reading that object
    > > as void * is not valid.

    >
    > So, in your opinion, what is the significance of the exclusion of
    > character types from those statements? What do those statements mean,
    > with those exclusions, that differs from what they would mean if those
    > exclusions were dropped? Please accompany your explanation with specific
    > examples of code that would have defined behavior under the existing
    > rules, but not with that modification, or vice-versa.


    If those exclusions were dropped, then using memcpy (or rather, a
    custom function written in standard C that behaves exactly like
    memcpy) to copy an object holding a trap representation would be
    invalid.

    #include <stddef.h> /* size_t */

    /* the standard function memcpy, but implemented in 100% standard C */
    extern void *mymemcpy(void *dest, const void *src, size_t n);

    struct S
    {
        int ptrIsValid;
        void *ptr;
    };

    void example(void)
    {
        struct S s1, s2;
        s2.ptrIsValid = 0; /* ptr is left uninitialised */
        mymemcpy(&s1, &s2, sizeof(s1));
    }

    Without the exclusion in 6.2.6.1p5, if pointer types can have trap
    representations, mymemcpy would potentially use a character type to
    read a trap representation. This should be allowed, and by excluding
    character types in that paragraph, this is allowed.
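Harald's `mymemcpy` is only declared. A minimal sketch of a definition in standard C, assuming the `const`-qualified `src` that the real `memcpy` has, copies through `unsigned char` lvalues, so no byte of the uninitialised pointer member is ever read as a pointer value.

```c
#include <stddef.h>

/* Byte-by-byte copy through unsigned char, which has no trap
   representations. Like memcpy, the regions must not overlap. */
void *mymemcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n-- > 0)
        *d++ = *s++;
    return dest;
}
```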
     
    Harald van Dijk, Nov 16, 2011
    #19
  20. Kaz Kylheku <> writes:
    > On 2011-11-16, Keith Thompson <> wrote:
    >> Kaz Kylheku <> writes:
    >>> On 2011-11-16, Keith Thompson <> wrote:
    >>>> Kaz Kylheku <> writes:
    >>>>> On 2011-11-16, tim <> wrote:
    >>>>>> THIS IS NOT A HOMEWORK
    >>>>
    >>>> (no need to shout)
    >>>>
    >>>>> In that case, ...
    >>>>>
    >>>>> If the unsigned char * value is already well defined and
    >>>>> everything, it is safe to convert it to char *, and even to
    >>>>> access the memory. In C, this is not considered to be invalid
    >>>>> aliasing. Any object can be accessed as an array of characters,
    >>>>> plain, signed or unsigned.
    >>>>
    >>>> Does the standard guarantee that?
    >>>
    >>> Even if there is a trap representation there, it's not an aliasing issue.
    >>>
    >>> If you could not alias an object using chars, then no access at all would
    >>> be well-defined.

    >>
    >> I don't follow your reasoning.


    And I still don't. If, hypothetically, the standard permitted objects
    to be aliased using unsigned chars but not signed or plain chars, how
    would that imply that "no access at all would be well-defined"?

    >> Where does the standard say that you can alias any object with an
    >> array of plain or signed char?

    >
    > 6.5 paragraph 7. An object can be accessed with an lvalue which
    > is of character type.


    Ah, thank you, that's one of the clues I was missing. The other is
    6.2.6.1p5 (thanks to James Kuyper for catching that one); that says
    explicitly that you can access an object via an lvalue of character
    type.

    So let's assume that you have an object of type unsigned char with
    the value SCHAR_MAX + 1, and you access it as a signed char --
    but that representation is a trap representation
    for signed char:

    unsigned char u = SCHAR_MAX + 1;
    signed char s = *(signed char*)&u;

    My reading is that the behavior is undefined by omission. 6.2.6.1p5
    says that storing a non-character trap representation has undefined
    behavior; it explicitly excludes character types. 6.5p7 says that
    an object shall have its stored value accessed *only* by an lvalue of
    certain types, including character types, but that doesn't imply that
    the behavior of such an access is defined. For example, accessing
    an int object by an lvalue of type int is permitted by 6.5p7,
    but has undefined behavior if the object holds a trap representation.

    If the behavior is defined, what is it?
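One way to sidestep the question Keith raises is never to read the byte through a `signed char` lvalue at all and instead compute the signed interpretation arithmetically in `int`. This is a hypothetical helper, not something proposed in the thread; it assumes the two's-complement reading is what the caller wants and that `UCHAR_MAX` is less than `INT_MAX`.

```c
#include <limits.h>

/* Interprets a byte as a two's-complement value using only unsigned
   char reads and int arithmetic, so no signed char trap
   representation can be involved. */
int byte_as_twos_complement(unsigned char u)
{
    if (u <= UCHAR_MAX / 2)          /* high bit clear: non-negative */
        return u;
    return (int)u - (UCHAR_MAX + 1); /* high bit set: subtract 2^CHAR_BIT */
}
```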

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Nov 16, 2011
    #20
