endianness and sscanf/sprintf

Discussion in 'C Programming' started by pramod, Dec 31, 2003.

  1. pramod

    pramod Guest

    Two different platforms communicate over protocols whose messages
    consist of functions and arguments in ASCII form. Either system might
    be little-endian or big-endian.

    It is possible to format a string using sprintf and retrieve it using
    sscanf.
    Each parameter has a delimiter, data type size is ported to the
    platform, and expected argument order is known.

    Is this approach portable w.r.t. endianness?


    regards,
    Pramod
     
    pramod, Dec 31, 2003
    #1

  2. pramod

    John Carson Guest

    "pramod" <> wrote in message
    news:
    > Two different platforms communicate over protocols which consist of
    > functions and arguments in ascii form. System might be little
    > endian/big endian.
    >
    > It is possible to format a string using sprintf and retrieve it using
    > sscanf.
    > Each parameter has a delimiter, data type size is ported to the
    > platform, and expected argument order is known.
    >
    > Is this approach portable w.r.t. endianness?
    >
    >
    > regards,
    > Pramod



    Endianness only affects the way that integers are stored (and perhaps
    floating point numbers --- I am not sure). It does not affect the storage of
    characters, so it is not an issue if you are only sending text.


    --
    John Carson
    1. To reply to email address, remove donald
    2. Don't reply to email address (post here instead)
     
    John Carson, Dec 31, 2003
    #2

  3. EventHelix.com, Jan 1, 2004
    #3
  4. EventHelix.com wrote:

    > You will be fine as everything is being converted to characters.
    > As long as characters are represented as 8 bytes, the numbers
    > will be interpreted correctly.


    In C (and, as far as I am aware, C++ too), characters are always represented
    in a single byte. Character /constants/ are represented (in C, but not C++)
    by the int type, which might conceivably be eight bytes. Is that what you
    meant?

    (Followups set to comp.lang.c)

    --
    Richard Heathfield :
    "Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
    C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
    K&R answers, C books, etc: http://users.powernet.co.uk/eton
     
    Richard Heathfield, Jan 1, 2004
    #4
  5. On Wed, 31 Dec 2003 01:20:44 -0800, pramod wrote:

    > Two different platforms communicate over protocols which consist of
    > functions and arguments in ascii form. System might be little
    > endian/big endian.
    >
    > It is possible to format a string using sprintf and retrieve it using
    > sscanf.
    > Each parameter has a delimiter, data type size is ported to the
    > platform, and expected argument order is known.
    >
    > Is this approach portable w.r.t. endianness?


    Yes, and it is a very good way to do it. But only if you really use ASCII,
    otherwise you may end up mixing codesets. Consider using UTF-8 if you use
    characters >=128 (i.e. not ASCII).
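
    A minimal sketch of the scheme under discussion, assuming a hypothetical
    '|' delimiter and a made-up message layout (struct message,
    encode_message and decode_message are illustrative names, not from the
    original post). It uses snprintf (C99) instead of sprintf so the write is
    bounded. The values travel as decimal text, so byte order never enters
    into it; the receiver's long only has to be wide enough for the value sent.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical message: a function name plus two integer arguments,
   separated by '|' (the delimiter is an assumption for illustration). */
struct message {
    char name[16];
    long a;
    unsigned long b;
};

/* Format the message as text. Returns the number of characters written
   (excluding the terminating null), or -1 if the buffer is too small. */
int encode_message(char *buf, size_t size, const struct message *m)
{
    int n;

    assert(buf != NULL && m != NULL);
    n = snprintf(buf, size, "%s|%ld|%lu", m->name, m->a, m->b);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Parse the text back into a message. Returns 0 on success and -1 if
   fewer than three fields could be converted. */
int decode_message(const char *buf, struct message *m)
{
    assert(buf != NULL && m != NULL);
    /* %15[^|] reads at most 15 non-delimiter characters into name[16]. */
    return sscanf(buf, "%15[^|]|%ld|%lu", m->name, &m->a, &m->b) == 3 ? 0 : -1;
}
```

    A sender would call encode_message and transmit the resulting string; the
    receiver passes the received string to decode_message and checks the
    return value before trusting any field.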

    HTH,
    M4
     
    Martijn Lievaart, Jan 1, 2004
    #5
  6. pramod

    Jeff Schwab Guest

    Jeff Schwab, Jan 1, 2004
    #6
  7. "Jeff Schwab" <> wrote...
    > EventHelix.com wrote:
    > > You will be fine as everything is being converted to characters.
    > > As long as characters are represented as 8 bytes,

    >
    > bits?


    Not that it matters. The second sentence almost invalidates the otherwise
    perfectly correct first ;-)

    Peter
     
    Peter Pichler, Jan 1, 2004
    #7
  8. Richard Heathfield <> wrote in message news:<>...
    > EventHelix.com wrote:
    >
    > > You will be fine as everything is being converted to characters.
    > > As long as characters are represented as 8 bytes, the numbers
    > > will be interpreted correctly.

    >
    > In C (and, as far as I am aware, C++ too), characters are always represented
    > in a single byte. Character /constants/ are represented (in C, but not C++)
    > by the int type, which might conceivably be eight bytes. Is that what you
    > meant?
    >
    > (Followups set to comp.lang.c)


    Typo: it should have been "8 bits" (i.e. byte).

    Sandeep
     
    EventHelix.com, Jan 2, 2004
    #8
  9. EventHelix.com wrote:

    > Richard Heathfield <> wrote in message
    > news:<>...
    >> EventHelix.com wrote:
    >>
    >> > You will be fine as everything is being converted to characters.
    >> > As long as characters are represented as 8 bytes, the numbers
    >> > will be interpreted correctly.

    >>
    >> In C (and, as far as I am aware, C++ too), characters are always
    >> represented in a single byte. Character /constants/ are represented (in
    >> C, but not C++) by the int type, which might conceivably be eight bytes.
    >> Is that what you meant?
    >>

    > Typo: it should have been "8 bits" (i.e. byte).


    But there is no requirement in either C or C++ for a byte to be exactly 8
    bits; only that it must be /at least/ 8 bits.
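
    As a small illustration of this point, CHAR_BIT from <limits.h> gives the
    number of bits in a byte on the implementation at hand. The sketch below
    (BITS_IN and print_widths are names made up for the example) computes
    widths from it instead of assuming 8.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Size in bits of an object of the given type, padding bits included. */
#define BITS_IN(type) (sizeof(type) * CHAR_BIT)

/* Print the byte width and the sizes of a few integer types. */
void print_widths(void)
{
    /* The standard guarantees at least 8 bits per byte, not exactly 8. */
    assert(CHAR_BIT >= 8);
    printf("CHAR_BIT     = %d\n", CHAR_BIT);
    printf("bits in int  = %lu\n", (unsigned long)BITS_IN(int));
    printf("bits in long = %lu\n", (unsigned long)BITS_IN(long));
}
```

    The printed numbers depend on the platform, which is the point: code that
    needs a width should compute it this way.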

    --
    Richard Heathfield :
    "Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
    C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
    K&R answers, C books, etc: http://users.powernet.co.uk/eton
     
    Richard Heathfield, Jan 2, 2004
    #9
  10. On Fri, 02 Jan 2004 05:53:31 +0000, Richard Heathfield wrote:

    > EventHelix.com wrote:
    >
    >> Richard Heathfield <> wrote in message
    >> news:<>...
    >>> EventHelix.com wrote:
    >>>
    >>> > You will be fine as everything is being converted to characters.
    >>> > As long as characters are represented as 8 bytes, the numbers
    >>> > will be interpreted correctly.


    Even assuming you meant 8 bits, this is not true. If one system uses ASCII
    and the other uses EBCDIC, you're screwed. Even the subtle distinctions
    between ISO 8859-1 and ISO 8859-15, two almost compatible and often used
    character sets, might bite you. All of these use 8 bits (well OK, ASCII
    uses 7).
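
    One way around the codeset problem, sketched here under the assumption
    that the wire format is defined to be ASCII, is to produce the digit
    codes explicitly instead of relying on what '0' happens to be on the
    host. The function name to_ascii_decimal is made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Write value as ASCII decimal digits (codes 0x30..0x39), independent of
   the host's execution character set. Returns the number of digits
   written, or 0 if the buffer is too small. No terminator is added. */
size_t to_ascii_decimal(unsigned long value, unsigned char *out, size_t size)
{
    unsigned char tmp[40];   /* enough digits for a 128-bit unsigned long */
    size_t n = 0, i;

    assert(out != NULL);
    do {                     /* digits come out least significant first */
        tmp[n++] = (unsigned char)(0x30 + value % 10);
        value /= 10;
    } while (value != 0);

    if (n > size)
        return 0;
    for (i = 0; i < n; i++)  /* reverse into the caller's buffer */
        out[i] = tmp[n - 1 - i];
    return n;
}
```

    On an ASCII host this matches what sprintf("%lu") would produce; on an
    EBCDIC host it still produces ASCII codes, which is what the peer expects.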

    >>>
    >>> In C (and, as far as I am aware, C++ too), characters are always
    >>> represented in a single byte. Character /constants/ are represented (in
    >>> C, but not C++) by the int type, which might conceivably be eight bytes.
    >>> Is that what you meant?
    >>>

    >> Typo: it should have been "8 bits" (i.e. byte).

    >
    > But there is no requirement in either C or C++ for a byte to be exactly 8
    > bits; only that it must be /at least/ 8 bits.


    But note the unfortunate discrepancy between the meaning of the word byte
    in C/C++ and its meaning when measuring storage. However, C/C++ is not alone
    here: Internet standards talk about octets when they mean 8 bits.

    The same goes for the unit "word". It means different things to different
    people. The way I learned it at uni, a very long time ago, was that a word
    was the basic unit of storage, the same as the definition of byte in C/C++.
    Along came Microsoft and institutionalised the word size of the 8086 as a
    WORD, so to others a word is now 16 bits. I've even seen other uses of the
    word 'word'; anyone got an example?

    Why am I saying this? Because in the context of C/C++ a byte has a defined
    meaning. However, in the context of disks and memory, a byte has a
    different meaning. When the context is not clear, confusion arises very
    easily. Ah, I hear you say, but this is a C/C++ group, so the meaning is
    clear. That may be true, but:
    - The problem described a certain context, one where many people
    (incorrectly) use the word byte to mean 8 bits.
    - It is very confusing to people anyhow. Youngsters are raised with the
    notion that a byte is 8 bits.

    In the end, we can only conclude that this difference in meaning is very
    unfortunate. Technically, an octet is the correct term for 8 bits. But
    we're never going to change the common use of byte anymore. In the
    meantime we'll have to live with it.

    I just wish the C/C++ standards had used a different term than byte.
    Even word would have been better.

    M4
     
    Martijn Lievaart, Jan 2, 2004
    #10
  11. Martijn Lievaart <> writes:
    [...]
    > But note the unfortunate discrepancy between the meaning of the word byte
    > in C/C++ and that of measuring storage. However, C/C++ is not alone here,
    > Internet standards talk about octets when they mean 8 bits.
    >
    > Same with the unit words. That means different things to different people.
    > The way I learned it at uni, very long time ago, was that a word was the
    > basic unit of storage. Same as the definition of byte in C/C++. Along came
    > MicroSoft and institutionalised the word-size of the 8086 as a WORD, so to
    > others a word now is 16 bits. I've seen even different uses of the word
    > 'word', anyone got an example?

    [...]
    > I just wished the C/C++ standards had used a different term than byte.
    > Even word would have been better.


    I agree that it would have avoided a lot of confusion if the C and C++
    standards had used a term other than "byte" (perhaps "storage unit").
    While I'm wishing for things that didn't happen, it would also have
    been nice if the concept hadn't been tied to the size of a character.

    I think (but I'm not sure, and it doesn't really matter) that the use
    of the word "word" predates the 8086 (and it probably would have been
    Intel, not Microsoft, that introduced the word "word" in descriptions
    of CPU instruction operand sizes). Most or all CPUs I've seen use the
    words "byte" and "word" to refer to operand sizes. The meaning of a
    "word" varies across architectures far more than the meaning of
    "byte".

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
    Schroedinger does Shakespeare: "To be *and* not to be"
    (Note new e-mail address)
     
    Keith Thompson, Jan 2, 2004
    #11
  12. On Fri, 02 Jan 2004 20:45:45 +0000, Keith Thompson wrote:

    > I think (but I'm not sure, and it doesn't really matter) that the use
    > of the word "word" predates the 8086 (and it probably would have been
    > Intel, not Microsoft, that introduced the word "word" in descriptions
    > of CPU instruction operand sizes). Most or all CPUs I've seen use the
    > words "byte" and "word" to refer to operand sizes. The meaning of a
    > "word" varies across architectures far more than the meaning of
    > "byte".


    Exactly what I was trying to say. For instance, the CDC used 60-bit words.
    (No wonder that design is extinct :).

    M4
     
    Martijn Lievaart, Jan 3, 2004
    #12
  13. pramod

    Lew Pitcher Guest

    Martijn Lievaart wrote:
    [snip]
    > Same with the unit words. That means different things to different people.
    > The way I learned it at uni, very long time ago, was that a word was the
    > basic unit of storage. Same as the definition of byte in C/C++. Along came
    > MicroSoft and institutionalised the word-size of the 8086 as a WORD, so to
    > others a word now is 16 bits. I've seen even different uses of the word
    > 'word', anyone got an example?


    In the IBM mainframe world, a "word" (or "fullword") has been 32 bits for the
    last 40+ years. A 16-bit quantity is a "halfword".

    [snip]


    --
    Lew Pitcher

    Master Codewright and JOAT-in-training
    Registered Linux User #112576 (http://counter.li.org/)
    Slackware - Because I know what I'm doing.
     
    Lew Pitcher, Jan 3, 2004
    #13
  14. pramod

    pete Guest

    Lew Pitcher wrote:
    >
    > Martijn Lievaart wrote:
    > [snip]
    > > Same with the unit words.
    > > That means different things to different people.
    > > The way I learned it at uni, very long time ago,
    > > was that a word was the basic unit of storage.
    > > Same as the definition of byte in C/C++. Along came
    > > MicroSoft and institutionalised the word-size of
    > > the 8086 as a WORD, so to others a word now is 16 bits.
    > > I've seen even different uses of the word
    > > 'word', anyone got an example?

    >
    > In the IBM mainframe world, a "word" (or "fullword")
    > has been 32bits for the
    > last 40+ years. A 16bit quantity is a "halfword".


    I'm familiar with "word" having a similar meaning as
    the traditional meaning of "int", having the
    "natural size suggested by the architecture
    of the execution environment"

    --
    pete
     
    pete, Jan 4, 2004
    #14
  15. pramod

    Ron Natalie Guest

    "Lew Pitcher" <> wrote in message news:6s4x6-4.ca...

    >
    > In the IBM mainframe world, a "word" (or "fullword") has been 32bits for the
    > last 40+ years. A 16bit quantity is a "halfword".


    Back when I was heavily into PDP-11's (16 bits), my mainframe friends referred
    to my computers as halfword machines.

    Just about every 32 bit processor (with the exception of the x86 stuff) calls a
    WORD 32 bits. Even on the 386+ the word size really is 32 bits, but since
    the thing is upward compatible with the old 16 bit 8086... they call words DWORDS.

    On the 7094 and its follow-ons (including the UNIVAC and the DEC-10/20) the
    word size is 36 bits. Anything smaller is a "partial word" (for which there are no fixed
    divisions, leading to amusing things such as the same hardware supporting byte sizes
    from 5 to 9 bits).

    I've worked on 64 bit word machines. The CRAY is word addressed...there really
    is NO hardware datatype other than 64 bit integrals and 64 bit reals. Chars
    are an unholy kludge in software (they didn't even try anything else, sizeof any non-composite
    type is either 8 or 64).

    Never say die, the 64 bit word machines are coming back (AMD, IA64, etc...)!
     
    Ron Natalie, Jan 4, 2004
    #15
  16. pramod

    Ron Natalie Guest

    "pete" <> wrote in message news:...
    > I'm familiar with "word" having a similar meaning as
    > the traditional meaning of "int", having the
    > "natural size suggested by the architecture
    > of the execution environment"


    Of course even ints get perverted. For example, on many 64 bit
    architectures where 64 bits is the natural size, they've just punted and
    made ints 32 bits because that's what the larger body of code assumes.
    It took us over a decade to get people to stop expecting *0 to be 0.
     
    Ron Natalie, Jan 4, 2004
    #16
  17. pramod

    Joe Wright Guest

    Ron Natalie wrote:
    >
    > "Lew Pitcher" <> wrote in message news:6s4x6-4.ca...
    >
    > >
    > > In the IBM mainframe world, a "word" (or "fullword") has been 32bits for the
    > > last 40+ years. A 16bit quantity is a "halfword".

    >

    [ snippage ]

    > On the 7094 and it's follow ons (including the UNIVAC and the DEC-10/20) the
    > word size is 36 bits. Anything smaller is a "partial word" (which there is no fixed
    > divisions leading to amusing things such as the same hardware supporting byte sizes
    > from 5 to 9 bits).
    >

    The IBM 7094 came out in January 1963 and was the last of its ilk from
    IBM. Its follow on was the S/360 in 1964. I never came across a "partial
    word". For I/O the 36-bit word was divided into 6-bit chunks to be
    written to (and read from) 7-channel magnetic tape. For character I/O
    the 6 bits were encoded into something called BCD which translated
    directly to and from the 026 punch card. With the S/360 came the 32-bit
    word and 8-bit character, 9-channel mag tape and EBCDIC (Extended BCD
    Interchange Code).
    --
    Joe Wright http://www.jw-wright.com
    "Everything should be made as simple as possible, but not simpler."
    --- Albert Einstein ---
     
    Joe Wright, Jan 4, 2004
    #17
  18. [OT] byte sizes, was "Re: endianness and sscanf/sprintf"

    Ron Natalie wrote:

    > On the 7094 and it's follow ons (including the UNIVAC and the DEC-10/20) the
    > word size is 36 bits. Anything smaller is a "partial word" (which there is no fixed
    > divisions leading to amusing things such as the same hardware supporting byte sizes
    > from 5 to 9 bits).


    The PDP-10 and PDP-20 were "follow ons" to the PDP-6, not the 7094,
    although both derived features from earlier machines. The PDP-6/10
    family (and, to a lesser degree, the 7090/7094 family) had many
    instructions that operated on 18-bit halfwords, for the good reason that
    instructions were divided with an 18-bit address field (+indirect bit).
    This structure -- from 7094 side again -- lies behind the "car" and "cdr"
    functions in Lisp.
    The PDP-6 and -10 used byte pointers which could address bytes of any size
    from 1- to 36-bits. Some sizes, notably 19-35 bits, are obviously quite
    wasteful. The most common sizes were the ones you name (5- to 9-bit bytes).


    --
    Martin Ambuhl
     
    Martin Ambuhl, Jan 4, 2004
    #18
  19. pramod

    Ron Natalie Guest

    "Joe Wright" <> wrote in message news:...

    > > On the 7094 and it's follow ons (including the UNIVAC and the DEC-10/20) the
    > > word size is 36 bits. Anything smaller is a "partial word" (which there is no fixed
    > > divisions leading to amusing things such as the same hardware supporting byte sizes
    > > from 5 to 9 bits).
    > >

    > The IBM 7094 came out in January 1963 and was the last of its ilk from
    > IBM. Its follow on was the S/360 in 1964. I never came across a "partial
    > word".

    The follow-ons were not from IBM. The 7094 begat both the UNIVAC
    1100 series and the DEC mainframes, both of which had the arbitrary
    byte operations. The 7094 did have both 6 and 7 bit I/O bytes available.
    The UNIVAC had an even larger array of byte size usage.

    Another amusing aside is that there was a UNIVAC communications
    processor for the 1100-series (I'm spacing on its nomenclature. CSE?),
    which actually ran the 360 instruction set.

    Speaking of the 7-track tape drives, when the shop finally ditched the last
    of the 7-track UNISERVO tape drives we lost the ability to run the program
    that played Christmas carols using the sound the tape in the vacuum columns
    made. Nobody ever retuned it for the 9-track drives.
     
    Ron Natalie, Jan 4, 2004
    #19
  20. "Ron Natalie" <> writes:
    [...]
    > I've worked on 64 bit word machines. The CRAY is word
    > addressed...there really is NO such hardware datatype other than 64
    > bit integrals and 64 bit reals. Char's are a unholy kludge in
    > software (they didn't even try anything else, sizeof any
    > non-comoosite type is either 8 or 64).


    There have been a number of different Cray models, with different
    architectures, but I think the vector systems (the oldest I've worked
    on was the T90) have been fairly consistent in their data types.

    I think you're quoting bit sizes rather than byte sizes. The C
    compiler uses an 8-bit byte for compatibility with other systems, even
    though there's no real hardware support for 8-bit operands.
    sizeof(char) is 1, of course; sizeof(TYPE) is 8 (64 bits) for each of
    short, int, and long. Byte pointers are word pointers with a byte
    offset kludged into the high-order 3 bits. Carefully written C code
    works just fine; code that makes too many assumptions can fail badly.

    The T3E isn't quite so exotic; it uses Alpha CPUs.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
    Schroedinger does Shakespeare: "To be *and* not to be"
     
    Keith Thompson, Jan 5, 2004
    #20