endianness and sscanf/sprintf


pramod

Two different platforms communicate over protocols which consist of
functions and arguments in ascii form. The systems might be little
endian or big endian.

It is possible to format a string using sprintf and retrieve it using
sscanf.
Each parameter has a delimiter, the data type size is ported to the
platform, and the expected argument order is known.

Is this approach portable w.r.t. endianness?


regards,
Pramod
 

John Carson

pramod said:
Two different platforms communicate over protocols which consist of
functions and arguments in ascii form. The systems might be little
endian or big endian.

It is possible to format a string using sprintf and retrieve it using
sscanf.
Each parameter has a delimiter, the data type size is ported to the
platform, and the expected argument order is known.

Is this approach portable w.r.t. endianness?


regards,
Pramod


endianness only affects the way that integers are stored (and perhaps
floating point numbers --- I am not sure). It does not affect the storage of
characters so it is not an issue if you are only sending text.
 

Richard Heathfield

EventHelix.com said:
You will be fine as everything is being converted to characters.
As long as characters are represented as 8 bytes, the numbers
will be interpreted correctly.

In C (and, as far as I am aware, C++ too), characters are always represented
in a single byte. Character /constants/ are represented (in C, but not C++)
by the int type, which might conceivably be eight bytes. Is that what you
meant?

(Followups set to comp.lang.c)
 

Martijn Lievaart

Two different platforms communicate over protocols which consist of
functions and arguments in ascii form. The systems might be little
endian or big endian.

It is possible to format a string using sprintf and retrieve it using
sscanf.
Each parameter has a delimiter, the data type size is ported to the
platform, and the expected argument order is known.

Is this approach portable w.r.t. endianness?

Yes, and a very good way to do it. But only if really using ascii,
otherwise you may end up mixing codesets. Consider using UTF-8 if you use
characters >= 128 (i.e. not ascii).

HTH,
M4
 

EventHelix.com

Richard Heathfield said:
In C (and, as far as I am aware, C++ too), characters are always represented
in a single byte. Character /constants/ are represented (in C, but not C++)
by the int type, which might conceivably be eight bytes. Is that what you
meant?

(Followups set to comp.lang.c)

Typo: it should have been "8 bits" (i.e. byte).

Sandeep
 

Richard Heathfield

EventHelix.com said:
Typo: it should have been "8 bits" (i.e. byte).

But there is no requirement in either C or C++ for a byte to be exactly 8
bits; only that it must be /at least/ 8 bits.
 

Martijn Lievaart

Even assuming you meant 8 bits, this is not true. If one system uses ascii
and the other uses ebcdic, you're screwed. Even the subtle distinctions
between iso-latin-1 and iso-latin-15, two almost compatible and often used
character sets, might bite you. All of these use 8 bits (well OK, ascii
uses 7).
But there is no requirement in either C or C++ for a byte to be exactly 8
bits; only that it must be /at least/ 8 bits.

But note the unfortunate discrepancy between the meaning of the word byte
in C/C++ and that used when measuring storage. However, C/C++ is not alone
here; Internet standards talk about octets when they mean 8 bits.

Same with the unit 'word'. That means different things to different people.
The way I learned it at uni, a very long time ago, was that a word was the
basic unit of storage. Same as the definition of byte in C/C++. Along came
Microsoft and institutionalised the word size of the 8086 as a WORD, so to
others a word now is 16 bits. I've seen even different uses of the word
'word', anyone got an example?

Why am I saying this? Because in the context of C/C++ a byte has a defined
meaning. However, in the context of disks and memory, a byte has a
different meaning. When the context is not clear it is very easy to get
confused. Ah, I hear you say, but this is a C/C++ group, so the meaning is
clear. That may be true, but:
- The problem described a certain context, one where many people
(incorrectly) use the word byte to mean 8 bits.
- It is very confusing to people anyhow. Youngsters are raised with the
notion that a byte is 8 bits.

In the end, we can only conclude that this difference in meaning is very
unfortunate. Technically, an octet is the correct term for 8 bits. But
we're never going to change the common use of byte anymore. In the
meantime we'll have to live with it.

I just wish the C/C++ standards had used a different term than byte.
Even word would have been better.

M4
 

Keith Thompson

Martijn Lievaart said:
But note the unfortunate discrepancy between the meaning of the word byte
in C/C++ and that used when measuring storage. However, C/C++ is not alone
here; Internet standards talk about octets when they mean 8 bits.

Same with the unit 'word'. That means different things to different people.
The way I learned it at uni, a very long time ago, was that a word was the
basic unit of storage. Same as the definition of byte in C/C++. Along came
Microsoft and institutionalised the word size of the 8086 as a WORD, so to
others a word now is 16 bits. I've seen even different uses of the word
'word', anyone got an example? [...]
I just wish the C/C++ standards had used a different term than byte.
Even word would have been better.

I agree that it would have avoided a lot of confusion if the C and C++
standards had used a term other than "byte" (perhaps "storage unit").
While I'm wishing for things that didn't happen, it would also have
been nice if the concept hadn't been tied to the size of a character.

I think (but I'm not sure, and it doesn't really matter) that the use
of the word "word" predates the 8086 (and it probably would have been
Intel, not Microsoft, that introduced the word "word" in descriptions
of CPU instruction operand sizes). Most or all CPUs I've seen use the
words "byte" and "word" to refer to operand sizes. The meaning of a
"word" varies across architectures far more than the meaning of
"byte".
 

Martijn Lievaart

I think (but I'm not sure, and it doesn't really matter) that the use
of the word "word" predates the 8086 (and it probably would have been
Intel, not Microsoft, that introduced the word "word" in descriptions
of CPU instruction operand sizes). Most or all CPUs I've seen use the
words "byte" and "word" to refer to operand sizes. The meaning of a
"word" varies across architectures far more than the meaning of
"byte".

Exactly what I was trying to say. F.i. the CDC used 60-bit words. (No
wonder that design is extinct :).

M4
 

Lew Pitcher

Martijn Lievaart wrote:
[snip]
Same with the unit 'word'. That means different things to different people.
The way I learned it at uni, a very long time ago, was that a word was the
basic unit of storage. Same as the definition of byte in C/C++. Along came
Microsoft and institutionalised the word size of the 8086 as a WORD, so to
others a word now is 16 bits. I've seen even different uses of the word
'word', anyone got an example?

In the IBM mainframe world, a "word" (or "fullword") has been 32 bits for
the last 40+ years. A 16-bit quantity is a "halfword".

[snip]


--
Lew Pitcher

Master Codewright and JOAT-in-training
Registered Linux User #112576 (http://counter.li.org/)
Slackware - Because I know what I'm doing.
 

pete

Lew said:
Martijn Lievaart wrote:
[snip]
Same with the unit 'word'.
That means different things to different people.
The way I learned it at uni, a very long time ago,
was that a word was the basic unit of storage.
Same as the definition of byte in C/C++. Along came
Microsoft and institutionalised the word size of
the 8086 as a WORD, so to others a word now is 16 bits.
I've seen even different uses of the word
'word', anyone got an example?

In the IBM mainframe world, a "word" (or "fullword")
has been 32 bits for the
last 40+ years. A 16-bit quantity is a "halfword".

I'm familiar with "word" having a similar meaning to
the traditional meaning of "int": the
"natural size suggested by the architecture
of the execution environment".
 

Ron Natalie

Lew Pitcher said:
In the IBM mainframe world, a "word" (or "fullword") has been 32 bits for
the last 40+ years. A 16-bit quantity is a "halfword".

Back when I was heavily into PDP-11s (16 bits), my mainframe friends referred
to my computers as halfword machines.

Just about every 32-bit processor (with the exception of the x86 stuff) calls a
WORD 32 bits. Even on the 386+ the word size really is 32 bits, but since
the thing is upward compatible with the old 16-bit 8086... they call words DWORDs.

On the 7094 and its follow-ons (including the UNIVAC and the DEC-10/20) the
word size is 36 bits. Anything smaller is a "partial word" (for which there are
no fixed divisions, leading to amusing things such as the same hardware
supporting byte sizes from 5 to 9 bits).

I've worked on 64-bit word machines. The CRAY is word addressed... there really
is NO such hardware datatype other than 64-bit integrals and 64-bit reals. Chars
are an unholy kludge in software (they didn't even try anything else; sizeof any
non-composite type is either 8 or 64).

Never say die, the 64-bit word machines are coming back (AMD, IA64, etc.)!
 

Ron Natalie

pete said:
the traditional meaning of "int", having the
"natural size suggested by the architecture
of the execution environment"

Of course even ints get perverted. For example, on many 64-bit
architectures where 64 bits is the natural size, they've just punted and
made ints 32 bits because that's what the larger body of code assumes.
It took us over a decade to get people to stop expecting *0 to be 0.
 

Joe Wright

[ snippage ]
On the 7094 and its follow-ons (including the UNIVAC and the DEC-10/20) the
word size is 36 bits. Anything smaller is a "partial word" (for which there are
no fixed divisions, leading to amusing things such as the same hardware
supporting byte sizes from 5 to 9 bits).
The IBM 7094 came out in January 1963 and was the last of its ilk from
IBM. Its follow-on was the S/360 in 1964. I never came across a "partial
word". For I/O the 36-bit word was divided into 6-bit chunks to be
written to (and read from) 7-channel magnetic tape. For character I/O
the 6 bits were encoded into something called BCD which translated
directly to and from the 026 punch card. With the S/360 came the 32-bit
word and 8-bit character, 9-channel mag tape and EBCDIC (Extended BCD
Interchange Code).
 

Martin Ambuhl

Ron said:
On the 7094 and its follow-ons (including the UNIVAC and the DEC-10/20) the
word size is 36 bits. Anything smaller is a "partial word" (for which there are
no fixed divisions, leading to amusing things such as the same hardware
supporting byte sizes from 5 to 9 bits).

The PDP-10 and PDP-20 were "follow-ons" to the PDP-6, not the 7094,
although both derived features from earlier machines. The PDP-6/10
family (and, to a lesser degree, the 7090/7094 family) had many
instructions that operated on 18-bit halfwords, for the good reason that
instructions were divided with an 18-bit address field (+indirect bit).
This structure -- from the 7094 side again -- lies behind the "car" and "cdr"
functions in Lisp.
The PDP-6 and -10 used byte pointers which could address bytes of any size
from 1 to 36 bits. Some sizes, notably 19-35 bits, are obviously quite
wasteful. The most common sizes were the ones you name (5- to 9-bit bytes).
 

Ron Natalie

Joe Wright said:
The IBM 7094 came out in January 1963 and was the last of its ilk from
IBM. Its follow-on was the S/360 in 1964. I never came across a "partial
word".
The follow-ons were not from IBM. The 7094 begat both the UNIVAC
1100 series and the DEC mainframes, both of which had the arbitrary
byte operations. The 7094 did have both 6- and 7-bit I/O bytes available.
The UNIVAC had an even larger array of byte-size usage.

Another amusing aside is that there was a UNIVAC communications
processor for the 1100 series (I'm spacing on its nomenclature? CSE?),
which actually ran the 360 instruction set.

Speaking of the 7-track tape drives: when the shop finally ditched the last
of the 7-track UNISERVO tape drives, we lost the ability to run the program
that played Christmas carols using the sound the tape in the vacuum columns
made. Nobody ever retuned it for the 9-track drives.
 

Keith Thompson

Ron Natalie said:
I've worked on 64-bit word machines. The CRAY is word
addressed... there really is NO such hardware datatype other than
64-bit integrals and 64-bit reals. Chars are an unholy kludge in
software (they didn't even try anything else; sizeof any
non-composite type is either 8 or 64).

There have been a number of different Cray models, with different
architectures, but I think the vector systems (the oldest I've worked
on was the T90) have been fairly consistent in their data types.

I think you're quoting bit sizes rather than byte sizes. The C
compiler uses an 8-bit byte for compatibility with other systems, even
though there's no real hardware support for 8-bit operands.
sizeof(char) is 1, of course; sizeof(TYPE) is 8 (64 bits) for each of
short, int, and long. Byte pointers are word pointers with a byte
offset kludged into the high-order 3 bits. Carefully written C code
works just fine; code that makes too many assumptions can fail badly.

The T3E isn't quite so exotic; it uses Alpha CPUs.
 
