Name to method?

Discussion in 'C Programming' started by superheathen@yahoo.ca, Feb 11, 2008.

  1. Guest

    Hi

    I'm reading from a database that stores information as an integer
    representing a char array of ints, it is created in the following
    way:

    unsigned char examplearray[] = {4, 3, 2, 1};
    unsigned int exampleint = *(unsigned int *)examplearray;

    and back again using:

    unsigned char examplearray2[4];
    *(unsigned int*)examplearray2 = exampleint;

    it works, I just have no clue how it works. Does this technique have a
    name so I can look into it?
     
    , Feb 11, 2008
    #1

  2. In article <>,
    <> wrote:

    >I'm reading from a database that stores information as an integer
    >representing a char array of ints, it is created in the following
    >way:


    >unsigned char examplearray[] = {4, 3, 2, 1};
    >unsigned int exampleint = *(unsigned int *)examplearray;


    >and back again using:


    >unsigned char examplearray2[4];
    >*(unsigned int*)examplearray2 = exampleint;


    >it works, I just have no clue how it works. Does this technique have a
    >name so I can look into it?


    It is sometimes called "type punning".
    --
    "Pray do not take the pains / To set me right. /
    In vain my faults ye quote; / I wrote as others wrote /
    On Sunium's hight." -- Walter Savage Landor
     
    Walter Roberson, Feb 11, 2008
    #2

  3. Guest

    I'm still stuck.

    Maybe someone can explain the math in the (de)conversion?
     
    , Feb 11, 2008
    #3
  4. Guest

    On Feb 11, 1:44 am, wrote:
    > I'm still stuck.
    >
    > Maybe someone can explain the math in the (de)conversion?


    There's not really much math involved, you're just swapping around
    pointers. Here's a rewrite that may be easier to understand:

    unsigned char examplearray[] = {4, 3, 2, 1};
    /* pointerint points to examplearray (which is 32 bits; the size of an int) */
    unsigned int *pointerint = examplearray;
    /* sets new int exampleint to what pointerint points to */
    unsigned int exampleint = *pointerint;
     
    , Feb 11, 2008
    #4
  5. Guest

    On Feb 10, 4:59 pm, wrote:
    > On Feb 11, 1:44 am, wrote:
    >
    > > I'm still stuck.

    >
    > > Maybe someone can explain the math in the (de)conversion?

    >
    > There's not really much math involved, you're just swapping around
    > pointers. Here's a rewrite that may be easier to understand:
    > unsigned char examplearray[] = {4, 3, 2, 1};
    > unsigned int *pointerint = examplearray; /* pointerint points to
    > examplearray (which is 32-bits; the size of an int) */
    > unsigned int exampleint = *pointerint; /* sets new int exampleint to
    > what pointerint points to */

    There's got to be a little; in the example I provided the integer
    isn't a pointer or array. (Though I tried using the example you
    provided and got:
    warning: initialization from incompatible pointer type.) Using some
    casting, it somehow takes the array and converts it to 16909060 (how
    does it get this number?) and then using 16909060 is able to
    reconstruct the array. To me it'd make more sense if it did use
    pointers instead of the casting.

    Sorry if I'm too dense to get what you're getting at.
     
    , Feb 11, 2008
    #5
  6. Guest

    On Feb 10, 5:27 pm, wrote:
    > On Feb 10, 4:59 pm, wrote:> On Feb 11, 1:44 am, wrote:
    >
    > > > I'm still stuck.

    >
    > > > Maybe someone can explain the math in the (de)conversion?

    >
    > > There's not really much math involved, you're just swapping around
    > > pointers. Here's a rewrite that may be easier to understand:
    > > unsigned char examplearray[] = {4, 3, 2, 1};
    > > unsigned int *pointerint = examplearray; /* pointerint points to
    > > examplearray (which is 32-bits; the size of an int) */
    > > unsigned int exampleint = *pointerint; /* sets new int exampleint to
    > > what pointerint points to */

    >
    > There's got to be a little, in the example I provided the integer
    > isn't a pointer or array. (though I tried using the example you
    > provided and got:
    > warning: initialization from incompatible pointer type) Using some
    > casting , it somehow takes the array and converts it to 16909060 (how
    > does it get this number?) and then using 16909060 is able to
    > reconstruct the array. To me it'd make more sense if it did use
    > pointers instead of the casting.
    >
    > Sorry if I'm too dense to get what you're getting at.


    also, if it were some sort of memory location, wouldn't it be
    subjected to change each compile, rendering it unable to read the
    database?
     
    , Feb 11, 2008
    #6
  7. In article <>,
    <> wrote:
    >On Feb 11, 1:44 am, wrote:


    >> Maybe someone can explain the math in the (de)conversion?


    >There's not really much math involved, you're just swapping around
    >pointers.



    There is some elementary math involved.

    The original code had,

    >>>unsigned char examplearray[] = {4, 3, 2, 1};
    >>>unsigned int exampleint = *(unsigned int *)examplearray;


    This presumes that 'unsigned int' is the same size as 4 unsigned
    char, which can also be expressed as sizeof(unsigned int) == 4.

    An unsigned char is always at least 8 bits, so unsigned int in
    this code is presumed to be at least 32 bits wide. This is a
    non-portable assumption: 'unsigned long' should be used
    instead of 'unsigned int', as unsigned long is guaranteed to
    be at least 32 bits, but unsigned int might be as small as 16 bits.

    It is possible in C to have a 32 bit unsigned int or unsigned long
    and yet for sizeof(int) to not be 4: for example it is legal in
    C for unsigned char itself to be 32 bits and sizeof(unsigned int) == 1.
    Real systems with such characteristics exist -- and the code would
    completely break on them.
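
    A minimal sketch of checking those size assumptions before relying
    on the cast. The function names pun_sizes_ok and ulong_value_bits are
    made up for illustration; they are not part of the original code.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 when the sizes the cast trick silently relies on hold:
   8-bit bytes and a 4-byte unsigned int. */
int pun_sizes_ok(void)
{
    return CHAR_BIT == 8 && sizeof(unsigned int) == 4;
}

/* Counts the value bits of unsigned long by shifting its maximum
   value down to zero, so padding bits (if any) are not counted. */
int ulong_value_bits(void)
{
    unsigned long max = ULONG_MAX;
    int bits = 0;

    while (max != 0) {
        max >>= 1;
        bits++;
    }
    assert(bits >= 32);  /* the C standard requires at least 32 */
    return bits;
}
```

    If pun_sizes_ok() returns 0, the four-byte array and the unsigned int
    do not cover the same amount of memory and the cast should not be used.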

    When unsigned char examplearray[] = {4, 3, 2, 1}; then C guarantees
    that the 4, 3, 2, 1 will be stored in memory in increasing address
    order. If I use | to mark the end of bytes in increasing memory order,
    examplearray would end up holding |4|3|2|1| in that order.

    When (unsigned int *)examplearray is done (note I removed the
    leading * from the expression), the resulting pointer will be
    a pointer to unsigned int, and it will point to the beginning of
    that memory area, |4|3|2|1| . The * in front of the pointer expression,
    *(unsigned int *)examplearray "dereferences" that pointer, so
    unsigned int exampleint will be an unsigned int loaded from memory
    that was initialized to |4|3|2|1| .

    Now this is the part that starts getting complicated: the -numeric-
    significance of each byte of the |4|3|2|1| for the purposes of
    unsigned int, is not necessarily going to be in the same order
    as the bytes are written in memory.

    On some systems ("big endian systems") the numeric order -would- be in
    exactly that order, and the numeric value of the unsigned int would be
    (4 << (3*CHAR_BIT)) + (3 << (2*CHAR_BIT)) + (2 << (1*CHAR_BIT)) + (1 << (0*CHAR_BIT))
    where CHAR_BIT (from <limits.h>) is the number of bits in a char
    (typically 8 but could be more). The parentheses are needed because
    + binds more tightly than << in C. Using a non-C notation for a moment
    where ** represents exponentiation, and writing B for 2**CHAR_BIT
    (256 when CHAR_BIT is 8), this would be
    4 * B**3 + 3 * B**2 + 2 * B**1 + 1 * B**0
    which is exactly parallel to traditional decimal (base 10) notation
    in which the base 10 number 4321 means
    4 * 10**3 + 3 * 10**2 + 2 * 10**1 + 1 * 10**0

    However, there are other systems in which the numeric order of the
    |4|3|2|1| bytes would be loaded from memory completely differently.
    On "little endian" systems the byte at the lowest address is the
    least significant, giving
    (1 << (3*CHAR_BIT)) + (2 << (2*CHAR_BIT)) + (3 << (1*CHAR_BIT)) + (4 << (0*CHAR_BIT))
    which is 16909060 when CHAR_BIT is 8. Two "mixed endian" variations
    (the first is the PDP-11 ordering) would be

    (3 << (3*CHAR_BIT)) + (4 << (2*CHAR_BIT)) + (1 << (1*CHAR_BIT)) + (2 << (0*CHAR_BIT))
    and
    (2 << (3*CHAR_BIT)) + (1 << (2*CHAR_BIT)) + (4 << (1*CHAR_BIT)) + (3 << (0*CHAR_BIT))

    which could be respectively written (in non-C notation) as

    3 * B**3 + 4 * B**2 + 1 * B**1 + 2 * B**0
    and
    2 * B**3 + 1 * B**2 + 4 * B**1 + 3 * B**0

    In base 10 terms, it is as if the byte stream |4|3|2|1| were loaded
    from memory as the decimal number 1234 on a little-endian system, and
    as 3412 or 2143 on the two mixed-endian ones.
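
    The shift-and-add sums above can be sketched in C roughly as follows,
    assuming 8-bit bytes. The names combine_big and combine_little are
    hypothetical, and unsigned long is used because it is guaranteed to
    be at least 32 bits wide. Note the explicit parentheses: in C, +
    binds more tightly than <<.

```c
#include <assert.h>
#include <limits.h>

/* Byte at the lowest address is the most significant ("big endian"). */
unsigned long combine_big(const unsigned char b[4])
{
    assert(CHAR_BIT == 8);  /* the shift counts below assume 8-bit bytes */
    return ((unsigned long)b[0] << 24) + ((unsigned long)b[1] << 16) +
           ((unsigned long)b[2] << 8)  +  (unsigned long)b[3];
}

/* Byte at the lowest address is the least significant ("little endian"). */
unsigned long combine_little(const unsigned char b[4])
{
    assert(CHAR_BIT == 8);
    return ((unsigned long)b[3] << 24) + ((unsigned long)b[2] << 16) +
           ((unsigned long)b[1] << 8)  +  (unsigned long)b[0];
}
```

    For the bytes |4|3|2|1|, combine_big gives 67305985 (0x04030201) and
    combine_little gives 16909060 (0x01020304).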

    These different ways of assigning relative numeric significance to
    streams of bytes in memory are not wrong, they are just different,
    and as long as the program is consistent about which order is used
    there is no problem (except when talking to other systems that
    use different orders.)

    Pentium-type processors use the little-endian ordering, in which the
    byte at the lowest address is the least significant; that is
    consistent with the 16909060 (0x01020304) reported earlier in the
    thread. Some processors such as MIPS R4000/R10000/R12000 etc.
    use "big-endian" orderings. If you work with more than 2 distinct
    processor architectures, you will probably encounter different
    "endian" orderings at some point.
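
    To find out which ordering a given machine uses, one option is to
    load the known bytes |4|3|2|1| and look at the resulting value. This
    is only a sketch: the enum and the function name detect_byte_order
    are invented here, and it assumes uint32_t is available (which
    implies 8-bit bytes). memcpy is used instead of the pointer cast so
    that alignment is not an issue.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum byte_order { ORDER_BIG, ORDER_LITTLE, ORDER_OTHER };

/* Loads the bytes 4,3,2,1 into a 32-bit integer and classifies the
   machine by the numeric value that comes out. */
enum byte_order detect_byte_order(void)
{
    const unsigned char bytes[4] = {4, 3, 2, 1};
    uint32_t value;

    assert(sizeof value == sizeof bytes);
    memcpy(&value, bytes, sizeof value);

    if (value == UINT32_C(0x04030201))
        return ORDER_BIG;
    if (value == UINT32_C(0x01020304))   /* 16909060 in decimal */
        return ORDER_LITTLE;
    return ORDER_OTHER;                  /* some mixed ordering */
}
```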


    Now, when the process is reversed and the character array is
    populated with the unsigned long value, the processor will take
    the numeric value it has in the processor, and will write a sequence
    of bytes into memory. The order that it does that writing in
    need not be "most significant byte first" (that is, it need not be
    the byte that denotes the highest numeric value that gets written
    first). It could be -- "big endian" systems write in that order
    for example. But lots of other systems write in some other order
    (perhaps for some attempt to maintain compatibility with
    the original 8 bit processors in their family lines). Whatever
    order the processor uses to write values to memory will be the
    exact mirror of the order that it loads from memory with,
    so if the numeric order that it picked up from loading |4|3|2|1|
    from memory was (writing B for 2**CHAR_BIT)
    3 * B**3 + 4 * B**2 + 1 * B**1 + 2 * B**0
    then whatever current value it has to deal with will be written
    reflecting that value order, producing |4|3|2|1| in memory.
    With this ordering, if the current value it had in the processor was
    141 * B**3 + 17 * B**2 + 92 * B**1 + 29 * B**0
    then to maintain consistency with the loads, the bytes it would
    write into memory would be |17|141|29|92| .

    Now, no matter what order was used to determine numeric significance upon
    load, the storage will undo the effect for the same value,
    so no matter what order your processor uses internally, loading
    |4|3|2|1| from memory into an unsigned long and storing it again
    is going to result in |4|3|2|1| (assuming that sizeof(unsigned long) == 4).
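
    The load-then-store round trip can be sketched like this. The
    function name round_trip_preserves_bytes is hypothetical, and memcpy
    stands in for the pointer casts of the original code so the sketch
    does not depend on how the array happens to be aligned.

```c
#include <assert.h>
#include <string.h>

/* Loads sizeof(unsigned int) bytes into an integer and stores them back.
   Returns 1 if the bytes written out equal the bytes read in.
   'in' must point to at least sizeof(unsigned int) bytes. */
int round_trip_preserves_bytes(const unsigned char *in)
{
    unsigned char out[sizeof(unsigned int)];
    unsigned int value;

    assert(in != NULL);
    memcpy(&value, in, sizeof value);   /* bytes -> integer */
    memcpy(out, &value, sizeof value);  /* integer -> bytes */
    return memcmp(in, out, sizeof out) == 0;
}
```

    Whatever numeric value the machine assigns to the bytes in between,
    the function returns 1, because the store mirrors the load.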

    So the matter is more complex than just "manipulating pointers",
    but the mathematics involved ends up cancelling itself out if you
    load and then store the same value. If you had, for example, added
    1 to the unsigned long and then stored the result back into memory,
    you might have ended up with |4|3|2|2| or with |4|3|3|1| or with
    |4|4|2|1| or with |5|3|2|1| and the mathematics involved would
    help describe that. And if you were working with CHAR_BIT 8
    and you had (say) |4|255|255|1| and were to add 1 to the unsigned long
    storage of that, you would need the mathematics shown above to understand
    the results you might get.
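
    A small sketch of the "add 1" experiment, assuming uint32_t exists
    (so bytes are 8 bits wide). The name add_one_in_place is made up for
    illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Treats four bytes as one uint32_t, adds 1, and writes the bytes back.
   Which byte changes (and where any carry goes) depends on the
   machine's byte order. */
void add_one_in_place(unsigned char bytes[4])
{
    uint32_t value;

    assert(sizeof value == 4);  /* holds when CHAR_BIT is 8 */
    memcpy(&value, bytes, sizeof value);
    value += 1;                 /* unsigned arithmetic wraps around */
    memcpy(bytes, &value, sizeof value);
}
```

    Starting from |4|255|255|1|, a little-endian machine ends up with
    |5|255|255|1| and a big-endian one with |4|255|255|2|. With the
    3412-style mixed ordering the carry propagates and the result is
    |4|255|0|2|.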

    For any given number of bits per char, there are 24 different values
    that |4|3|2|1| might get loaded as an unsigned long, depending upon
    the processor. A few processors, such as the ARM, are able to use
    different memory storage orderings depending on the state of a flag.
    (The MIPS Rx000 processors can as well, but it is more typical to
    hard-wire the order bit so that it is constant for any one MIPS
    motherboard.)
    --
    This is a Usenet signature block. Please do not quote it when replying
    to one of my postings.
    http://en.wikipedia.org/wiki/Signature_block
     
    Walter Roberson, Feb 11, 2008
    #7
  8. Guest

    On Feb 10, 6:45 pm, -cnrc.gc.ca (Walter Roberson)
    wrote:
    > [snip]


    excellent, appreciated tons.
     
    , Feb 11, 2008
    #8
  9. Jack Klein Guest

    On Sun, 10 Feb 2008 16:02:36 -0800 (PST), wrote
    in comp.lang.c:

    > Hi
    >
    > I'm reading from a database that stores information as an integer
    > representing a char array of ints, it is created in the following
    > way:
    >
    > unsigned char examplearray[] = {4, 3, 2, 1};
    > unsigned int exampleint = *(unsigned int *)examplearray;
    >
    > and back again using:
    >
    > unsigned char examplearray2[4];
    > *(unsigned int*)examplearray2 = exampleint;
    >
    > it works, I just have no clue how it works. Does this technique have a
    > name so I can look into it?


    It might happen to "work" for your expectation of "work" on the
    particular platform where you are using it. The C standard makes no
    such guarantee, because the behavior is undefined. On some platforms,
    if "examplearray" does not have the proper alignment, trying to access
    it as an unsigned int will generate a hardware trap.

    What you have here is an example of poorly written code by a
    programmer who isn't anywhere near as knowledgeable as he/she thinks.

    --
    Jack Klein
    Home: http://JK-Technology.Com
    FAQs for
    comp.lang.c http://c-faq.com/
    comp.lang.c++ http://www.parashift.com/c++-faq-lite/
    alt.comp.lang.learn.c-c++
    http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html
     
    Jack Klein, Feb 11, 2008
    #9
  10. In article <>,
    Jack Klein <> blathered:
    ....
    >What you have here is an example of poorly written code by a
    >programmer who isn't anywhere near as knowledgeable as he/she thinks.


    Yeah, that guy, Linus Torvalds, a real idiot. Probably lost his first
    (and only) programming job - probably homeless and out on the street by now.

    Yeah, I hear he used to do a lot of that sort of thing - type punning,
    and god knows what else.
     
    Kenny McCormack, Feb 11, 2008
    #10