Byte swapping help please


Ann

I am opening a file which looks like 0xABCDEF01 on another machine but
0x01EFCDAB on my machine.

Is this byte swapping?

Could anyone give a good way to check if bytes are being swapped? (The code
should work smoothly across different machines.)

Thanks,
Ann
 

Craig Ruff

I am opening a file which looks like 0xABCDEF01 on another machine but
0x01EFCDAB on my machine.

Is this byte swapping?

Or possibly spouse swapping.

Could anyone give a good way to check if bytes are being swapped? (The code
should work smoothly across different machines.)

Perhaps we should just send the code directly to your instructor so
we can get the credit for your homework?
 

Keith Thompson

Ann said:
I am opening a file which looks like 0xABCDEF01 on another machine but
0x01EFCDAB on my machine.

Is this byte swapping?

Looks like it.

Could anyone give a good way to check if bytes are being swapped? (The code
should work smoothly across different machines.)

In principle, there is no reliable way to tell. If you read a 32-bit
unsigned integer from a file and get a value of 0xABCDEF01, how can
you know whether it should be 0xABCDEF01 or 0x01EFCDAB? Without more
information, you can't. Even with more information, you may not be
able to tell.

If you're storing binary data in a file, byte ordering is only one of
the problems you can run into. Sizes of types can vary across
different implementations; so can floating-point representations.

The safest approach is to write *only* byte data. For example, if you
want to write an integer value 0x01EFCDAB to a file, you can read and
write the individual bytes (0x01, 0xEF, 0xCD, 0xAB) in a fixed order.
Or you can write a textual representation of the number, which also
has the advantage of letting you view the file with a text editor.
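
A minimal sketch of the "fixed order" approach, assuming the file is opened in binary mode and that each value fits in 32 bits. The names put_u32_be and get_u32_be are made up for this illustration; most-significant-octet-first is just one possible convention, and any order works as long as reader and writer agree.

```c
#include <stdint.h>
#include <stdio.h>

/* Write a 32-bit value as four octets, most significant first.
   Returns 0 on success, -1 on a write error. */
int put_u32_be(FILE *fp, uint32_t v)
{
    for (int shift = 24; shift >= 0; shift -= 8) {
        if (fputc((int)((v >> shift) & 0xFFu), fp) == EOF)
            return -1;
    }
    return 0;
}

/* Read four octets, most significant first, into *out.
   Returns 0 on success, -1 on end of file or a read error. */
int get_u32_be(FILE *fp, uint32_t *out)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        int c = fgetc(fp);
        if (c == EOF)
            return -1;
        v = (v << 8) | (uint32_t)(c & 0xFF);
    }
    *out = v;
    return 0;
}
```

Because the value is taken apart with shifts rather than by looking at its bytes in memory, the file contents come out the same regardless of the host's byte order.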

Strictly speaking, you might still have problems on systems with byte
sizes bigger than 8 bits, or with non-ASCII character sets; the former
is unlikely to arise in practice, and the latter can be solved with
textual conversion tools. (There are systems, mostly DSPs, with bytes
bigger than 8 bits, but they're embedded systems, and you're not
likely to need to share files with them.)

If you must write raw binary data to a file, you might add information
to the file header indicating how the data is formatted.
 

ray

I am opening a file which looks like 0xABCDEF01 on another machine but
0x01EFCDAB on my machine.

Is this byte swapping?

Could anyone give a good way to check if bytes are being swapped? (The code
should work smoothly across different machines.)

Thanks,
Ann

One is 'big-endian' and one is 'little-endian'. That is exactly what htonl
and ntohl are for (host to network long and network to host long).
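
A rough sketch of how htonl and ntohl might be used here. Two caveats: they come from POSIX (<arpa/inet.h>), not from ISO C, and they only help if the file was written in network (big-endian) order in the first place. The helper names store_u32_net, load_u32_net and host_matches_network_order are invented for the example.

```c
#include <arpa/inet.h>  /* POSIX: htonl, ntohl (not part of ISO C) */
#include <stdint.h>
#include <string.h>

/* Store v into buf[0..3] in network (big-endian) byte order. */
void store_u32_net(unsigned char buf[4], uint32_t v)
{
    uint32_t net = htonl(v);
    memcpy(buf, &net, sizeof net);
}

/* Load a value that was stored in network byte order. */
uint32_t load_u32_net(const unsigned char buf[4])
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}

/* Nonzero if host order already equals network order,
   i.e. htonl leaves values unchanged on this machine. */
int host_matches_network_order(void)
{
    return htonl(1u) == 1u;
}
```

The buffers produced by store_u32_net can be passed to fwrite and read back with fread plus load_u32_net on any machine that provides these functions.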
 

Keith Thompson

Or possibly spouse swapping.


Perhaps we should just send the code directly to your instructor so
we can get the credit for your homework?

I didn't see any indication that this was homework. (If it was, it's
a very poorly stated problem; as I mentioned elsethread, there is no
reliable way to detect the byte ordering of a binary file.)
 

Roberto Waltman

Keith said:
... as I mentioned elsethread, there is no
reliable way to detect the byte ordering of a binary file.

I believe there is, if you are allowed to "cheat" by using an
auxiliary file, or a field with a known value at the beginning of the
file.
Assuming, for example, that the file will be used to store 32-bit
integers, writing 0x12345678 as the first 4 octets will provide you
with all the information you need to decode the following data
correctly.
 

Al Balmer

I believe there is, if you are allowed to "cheat" by using an
auxiliary file, or a field with a known value at the beginning of the
file.
Assuming, for example, that the file will be used to store 32-bit
integers, writing 0x12345678 as the first 4 octets will provide you
with all the information you need to decode the following data
correctly.

That can be a useful technique. William Waite used something like that
to distribute the Stage 2 macro processor. As I recall, the Fortran
bootstrap read a record containing the character set to be used.
 

Roberto Waltman

Al Balmer wrote:

William Waite used something like that
to distribute the Stage 2 macro processor. As I recall, the Fortran
bootstrap read a record containing the character set to be used.

<OT> ( I think..)
What is/was "the Stage 2 macro processor" ?
</OT>
 

Al Balmer

Al Balmer wrote:

<OT> ( I think..)

OT in comp.lang.c. I plead ignorance about the other two groups.

Roberto Waltman said:
What is/was "the Stage 2 macro processor" ?

A rather nice macro processor designed to be ported to any system
which had a Fortran compiler. Years ago, I used it to implement a
system called "SAP" (Structured Assembler Programming) which was used
successfully to implement a number of process control products. Here's
an abstract of an early paper:
http://hopl.murdoch.edu.au/showlanguage2.prx?exp=534
########
Waite, W. M. "The Mobile Programming System: STAGE2". CACM 13(09) (Sep 1970).

Abstract: STAGE2 is the second level of a bootstrap sequence
which is easily implemented on any computer. It is a flexible,
powerful macro processor designed specifically as a tool for
constructing machine-independent software. In this paper the features
provided by STAGE2 are summarized, and the implementation techniques
which have made it possible to have STAGE2 running on a new machine
with less than one man-week of effort are discussed. The approach has
been successful on over 15 machines of widely varying characteristics.
########

The published papers were not quite sufficient to implement the
system. For that, the best resource was the book:

Waite, W. M. Implementing Software for Non-numeric Applications, P-H
1973
 

Roberto Waltman

Al said:
A rather nice macro processor designed to be ported to any system
which had a Fortran compiler. Years ago, I used it to implement a
system called "SAP" (Structured Assembler Programming) which was used
successfully to implement a number of process control products. Here's
an abstract of an early paper:
http://hopl.murdoch.edu.au/showlanguage2.prx?exp=534

Thanks for the info, looks interesting. Of course now I must learn
what FLUB and LIMP are, and then... There goes my weekend...
 

Keith Thompson

Yes, the id stored at the beginning of the binary file is called a
"magic number".

Here is an example:
http://aslan.smnd.sk/anino/programming/gettext-doc/gettext_6.html

Please read <http://cfaj.freeshell.org/google/>.

Yes, if you store the right information in the file, it's possible to
determine its endianness. My point was that there's no *general* way
to do this. If I use fwrite() to write, say, an array of integers to
a binary file, there's no way to determine the endianness of the file
unless it's indicated explicitly, or unless I know something about
the expected values.
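
To make the ambiguity concrete, here is a small sketch (the function names decode_be and decode_le are mine): the same four octets from Ann's file decode to two different, equally plausible integers, and nothing in the octets themselves says which reading the writer intended.

```c
#include <stdint.h>

/* Interpret four octets as a big-endian 32-bit value. */
uint32_t decode_be(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Interpret the same four octets as a little-endian 32-bit value. */
uint32_t decode_le(const unsigned char b[4])
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}
```

For the octets AB CD EF 01, decode_be gives 0xABCDEF01 and decode_le gives 0x01EFCDAB. Both are valid values, so choosing between them takes outside knowledge such as a header field or a known range.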
 

Andy Glew

Roberto Waltman said:
Thanks for the info, looks interesting. Of course now I must learn
what FLUB and LIMP are, and then... There goes my weekend...


Cool.

Circa 2000 I defined (and had somebody implement) a macro language
that had a property similar to something I saw on a quick scan of
Stage2: the text after expansion was completely reprocessed by all
remaining patterns, repeatedly.

Now, sure, macro languages like CPP will expand a macro, then will
expand all macros in the macro, etc. But, to the best of my
knowledge, they have a single pattern match going on - macro
invocation such as FOO().

The fun part was
a) defining "best match" in a way that users found meaningful
b) reparsing completely - e.g. Foo##Bar() might concatenate,
and then expand FooBar()
c) defining things in a way so that phase ordering did not
produce unpleasant artifacts.

Unfortunately, that project evaporated when I left Intel.
 

Charles Allen

Ann:
Keith Thompson:
If you're storing binary data in a file, byte ordering is only one of
the problems you can run into. Sizes of types can vary across
...
The safest approach is to write *only* byte data. For example, if you

Depending on the level at which you control the I/O and how often this is
going to occur, you could also look at XDR, or something higher level
like netCDF.
 

Andrew Reilly

Yes, if you store the right information in the file, it's possible to
determine its endianness. My point was that there's no *general* way
to do this. If I use fwrite() to write, say, an array of integers to
a binary file, there's no way to determine the endianness of the file
unless it's indicated explicitly, or unless I know something about
the expected values.

If the numbers represent a white, random sequence, then it might not
matter which order you read them. Maybe that's good enough for the OP?

Personally, I favor the "write the bytes in the order you want them"
school. Even htonl and friends have the problem that you have to cast the
result to an unsigned char array, which can annoy the DSP processors that
you spoke of, earlier. They'll usually be quite happy with the arithmetic
of the explicit byte extraction approach, and if you have an octet-wide
peripheral or memory to stuff the results, you'll even get the same file...
(Luckily for the sanity of programmers, byte-addressability seems to be
becoming more popular in DSPs too, at least those that have some
expectation of being spoken to in C, some of the time.)
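
A sketch of the explicit byte extraction approach that avoids looking at the value through an unsigned char pointer. It assumes only that unsigned long has at least 32 bits (which the C standard guarantees) and that each array element is meant to hold one octet, so it should not care whether CHAR_BIT is 8 or larger. The names pack_u32_be and unpack_u32_be are hypothetical.

```c
/* Split v into four octets, most significant first.
   Each out[i] ends up in the range 0..255 even if CHAR_BIT > 8. */
void pack_u32_be(unsigned char out[4], unsigned long v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFFul);
    out[1] = (unsigned char)((v >> 16) & 0xFFul);
    out[2] = (unsigned char)((v >> 8) & 0xFFul);
    out[3] = (unsigned char)(v & 0xFFul);
}

/* Rebuild the value from four octets, most significant first.
   Only the low 8 bits of each element are used. */
unsigned long unpack_u32_be(const unsigned char in[4])
{
    return ((unsigned long)(in[0] & 0xFFu) << 24) |
           ((unsigned long)(in[1] & 0xFFu) << 16) |
           ((unsigned long)(in[2] & 0xFFu) << 8)  |
            (unsigned long)(in[3] & 0xFFu);
}
```

Only shifts and masks are involved, so the result depends on the value and not on how the machine lays it out in memory.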

Cheers,
 

jaysome

Keith Thompson wrote:

[snip]
Strictly speaking, you might still have problems on systems with byte
sizes bigger than 8 bits, or with non-ASCII character sets; the former
is unlikely to arise in practice, and the latter can be solved with
textual conversion tools.

In other words, strictly speaking, portable C code does not exist in the
real world. I just downloaded some code that was purported to be
portable C. But it was written using the ASCII character set, and my
development environment uses the EBCDIC character set. Needless to say,
I got compilation errors.

You cannot distribute portable C code in electronic form--you must
write a book or publish a document or use some other form of
communication that conveys your source code. Those infatuated with
portable C must somehow translate such a listing to their
platform-specific character encoding if they wish to use your portable C
code. The brute force way to do this is to type in the text manually in
a text editor. Should you choose this route, you'd be much advised to
teach your spouse to do this. I'm sure if you tell him or her that such
an effort buttresses the spirit of portable C, he or she will willingly
comply. Rest assured there are some hard-core portable C fanatics
working on a "portable" OCR solution to automate this task.

What you see is not always what you get. When you open a file in your
text editor or development environment, it assumes a certain character
encoding of the file. If the character encoding is not what your text
editor or development environment expects, then don't blame the C
standard, which says nothing about how character encoding of files is
specified: ASCII and EBCDIC, among others, are acceptable.
 

Terje Mathisen

jaysome said:
Keith Thompson wrote:

[snip]
Strictly speaking, you might still have problems on systems with byte
sizes bigger than 8 bits, or with non-ASCII character sets; the former
is unlikely to arise in practice, and the latter can be solved with
textual conversion tools.

In other words, strictly speaking, portable C code does not exist in the
real world. I just downloaded some code that was purported to be
portable C. But it was written using the ASCII character set, and my
development environment uses the EBCDIC character set. Needless to say,
I got compilation errors.

Trolls who cannot use the default ASCII-EBCDIC conversion tools, not
even to the extent of sending the code as email, need to go back into
their caves and hide.

Terje
 

Roberto Waltman

Yes, the id stored at the beginning of the binary file is called a
"magic number".

Here is an example:
http://aslan.smnd.sk/anino/programming/gettext-doc/gettext_6.html

Please include some context from the message you are replying to. This
is meaningless when read in isolation. Since I still remember what I
wrote earlier today, I will provide it for you this time. ;)

"Assuming, for example, that the file will be used to store 32 bit
integers, writing 0x12345678 as the first 4 octets will provide you
with all the information you need to decode the following data
correctly."

The intent here is not to have something that will identify what the
contents or layout of the file are, as in the common use of file
"magic numbers", but to detect byte-swapping.

When you write a magic number to identify a file, you expect to read
it back with the same value.

What I refer to in my post is that you can write

0x12345678

and, after transferring the file to another system, read back any of
the following:

0x12345678 - no change.
0x78563412 - byte reversal.
0x56781234 - high/low half swap.
0x34127856 - byte swap between each half.

( I believe these are the only permutations that make sense. Somebody
tell me I'm wrong? )

If you know that the word was indeed 0x12345678, then you know how to
correct for byte-swapping in the rest of the file.
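
As a sketch of how a reader might use this, assuming the marker has been read into a 32-bit unsigned integer in the reader's native order (all names below are invented for the example):

```c
#include <stdint.h>

enum byte_order {
    ORDER_SAME,       /* 0x12345678: no change */
    ORDER_REVERSED,   /* 0x78563412: byte reversal */
    ORDER_HALF_SWAP,  /* 0x56781234: high/low half swap */
    ORDER_PAIR_SWAP,  /* 0x34127856: byte swap within each half */
    ORDER_UNKNOWN     /* anything else: not a file of this kind, or corrupt */
};

/* Work out which permutation was applied from how the marker reads. */
enum byte_order classify_marker(uint32_t seen)
{
    switch (seen) {
    case 0x12345678u: return ORDER_SAME;
    case 0x78563412u: return ORDER_REVERSED;
    case 0x56781234u: return ORDER_HALF_SWAP;
    case 0x34127856u: return ORDER_PAIR_SWAP;
    default:          return ORDER_UNKNOWN;
    }
}

/* Undo the permutation on one data word. Each of the three swaps is
   its own inverse, so applying it again restores the original. */
uint32_t fix_word(uint32_t w, enum byte_order order)
{
    switch (order) {
    case ORDER_REVERSED:
        return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
               ((w << 8) & 0x00FF0000u) | (w << 24);
    case ORDER_HALF_SWAP:
        return (w >> 16) | (w << 16);
    case ORDER_PAIR_SWAP:
        return ((w >> 8) & 0x00FF00FFu) | ((w << 8) & 0xFF00FF00u);
    default:
        return w;  /* ORDER_SAME, or ORDER_UNKNOWN left untouched */
    }
}
```

The reader would call classify_marker once on the first word and then pass every following word through fix_word. A real program would probably reject the file on ORDER_UNKNOWN rather than carry on.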


Roberto Waltman
[ please reply to the group,
return address is invalid ]
 

Micah Cowan

jaysome said:
Keith Thompson wrote:

[snip]
Strictly speaking, you might still have problems on systems with byte
sizes bigger than 8 bits, or with non-ASCII character sets; the former
is unlikely to arise in practice, and the latter can be solved with
textual conversion tools.

In other words, strictly speaking, portable C code does not exist in
the real world. I just downloaded some code that was purported to be
portable C. But it was written using the ASCII character set, and my
development environment uses the EBCDIC character set. Needless to
say, I got compilation errors.

You make an extremely poor case.

Source code that does not consist of characters in your
implementation's source character set is obviously not C code at all,
from the perspective of your implementation.

However, mainstream methods for downloading, such as HTTP,
specifically provide information about whether a file is text or not,
and usually (always, in the case of HTTP) what encoding it is in.

Any reasonable definition for correctly and completely "downloading" a
plaintext file from one host to another would necessarily include
proper transcoding. Otherwise, what you have at the end is not the
plaintext file that was offered to you by the server.

Given this, there is certainly plenty of 100% portable C code. Though
I'm sure it's well in the minority.
 

Old Wolf

jaysome said:
In other words, strictly speaking, portable C code does not exist in the
real world. I just downloaded some code that was purported to be
portable C. But it was written using the ASCII character set, and my
development environment uses the EBCDIC character set.

Which environment is that, out of interest?

Needless to say, I got compilation errors.

The C standard defines a source character set, every character of which
is supported by both ASCII and EBCDIC. So either the source
was not actually portable in the first place (i.e. it contained an
illegal character), or you did not properly convert the source
from ASCII to EBCDIC before loading it onto your system.

You cannot distribute portable C code in electronic form--you must
write a book or publish a document or use some other form of
communication that conveys your source code. Those infatuated with
portable C must somehow translate such a listing to their
platform-specific character encoding if they wish to use your portable C
code. The brute force way to do this is to type in the text manually in
a text editor.

This is just silly. Would you also say that source files cannot
be distributed in gzip archives, because your C compiler cannot
read gzip files? No. When you receive a source file, you first
translate it to the correct form for your environment.

What you see is not always what you get. When you open a file in your
text editor or development environment, it assumes a certain character
encoding of the file. If the character encoding is not what your text
editor or development environment expects, then don't blame the C
standard

Of course not. Invoking your development environment properly
is not part of any standard, nor should it be. Read your IDE's
documentation.
 
