Byte swapping help please

Discussion in 'C Programming' started by Ann, Apr 13, 2006.

  1. Ann

    Ann Guest

    I am opening a file which looks like 0xABCDEF01 on another machine but
    0x01EFCDAB on my machine.

    Is this byte swapping?

    Could anyone suggest a good way to check whether bytes are being swapped?
    (The code should work smoothly across different machines.)

    Thanks,
    Ann
    Ann, Apr 13, 2006
    #1

  2. Ann

    Craig Ruff Guest

    In article <>,
    Ann <> wrote:
    >I am opening a file which looks like 0xABCDEF01 on another machine but
    >0x01EFCDAB on my machine.
    >
    >Is this a byte swapping?


    Or possibly spouse swapping.

    >Could anyone give a good way to check if bytes are being swapped? (code
    >should work smoothly across different machine.)


    Perhaps we should just send the code directly to your instructor so
    we can get the credit for your homework?
    --

    Craig Ruff NCAR
    303-497-1211 P.O. Box 3000
    Boulder, CO 80307
    Craig Ruff, Apr 13, 2006
    #2

  3. "Ann" <> writes:
    > I am opening a file which looks like 0xABCDEF01 on another machine but
    > 0x01EFCDAB on my machine.
    >
    > Is this a byte swapping?


    Looks like it.

    > Could anyone give a good way to check if bytes are being swapped? (code
    > should work smoothly across different machine.)


    In principle, there is no reliable way to tell. If you read a 32-bit
    unsigned integer from a file and get a value of 0xABCDEF01, how can
    you know whether it should be 0xABCDEF01 or 0x01EFCDAB? Without more
    information, you can't. Even with more information, you may not be
    able to tell.

    If you're storing binary data in a file, byte ordering is only one of
    the problems you can run into. Sizes of types can vary across
    different implementations; so can floating-point representations.

    The safest approach is to write *only* byte data. For example, if you
    want to write an integer value 0x01EFCDAB to a file, you can read and
    write the individual bytes (0x01, 0xEF, 0xCD, 0xAB) in a fixed order.
    Or you can write a textual representation of the number, which also
    has the advantage of letting you view the file with a text editor.
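
A minimal sketch of the fixed-order approach described above, assuming 8-bit bytes and a C99 compiler that provides uint32_t. The helper names put_u32_be and get_u32_be are hypothetical, chosen for illustration; they are not from the thread or from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store v in out[0..3], most significant byte first.
   Only shifts and masks are used, so the result does not depend on the
   byte order of the machine running the code. */
void put_u32_be(unsigned char *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Hypothetical helper: rebuild the value from four bytes stored in the
   same fixed order. */
uint32_t get_u32_be(const unsigned char *in)
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) |
           (uint32_t)in[3];
}
```

The four bytes can then be passed to fwrite() and fread() as plain unsigned char data, so a file written on one machine decodes to the same value on another.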

    Strictly speaking, you might still have problems on systems with byte
    sizes bigger than 8 bits, or with non-ASCII character sets; the former
    is unlikely to arise in practice, and the latter can be solved with
    textual conversion tools. (There are systems, mostly DSPs, with bytes
    bigger than 8 bits, but they're embedded systems, and you're not
    likely to need to share files with them.)

    If you must write raw binary data to a file, you might add information
    to the file header indicating how the data is formatted.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
    Keith Thompson, Apr 13, 2006
    #3
  4. Ann

    ray Guest

    On Thu, 13 Apr 2006 11:47:24 -0700, Ann wrote:

    > I am opening a file which looks like 0xABCDEF01 on another machine but
    > 0x01EFCDAB on my machine.
    >
    > Is this a byte swapping?
    >
    > Could anyone give a good way to check if bytes are being swapped? (code
    > should work smoothly across different machine.)
    >
    > Thanks,
    > Ann


    One is 'big-endian' and one is 'little-endian'. That is exactly what htonl
    and ntohl are for (host to network long and network to host long).
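
A hedged sketch of how htonl and ntohl might be used for this. Both come from POSIX (declared in <arpa/inet.h>), not from standard C, so this assumes a POSIX system. The wrapper names pack_u32_net and unpack_u32_net are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: place v in out[0..3] in network (big-endian) order. */
void pack_u32_net(unsigned char *out, uint32_t v)
{
    uint32_t net;
    assert(out != NULL);
    net = htonl(v);                 /* host order -> network order */
    memcpy(out, &net, sizeof net);  /* copy the object representation */
}

/* Hypothetical helper: read four network-order bytes back into host order. */
uint32_t unpack_u32_net(const unsigned char *in)
{
    uint32_t net;
    assert(in != NULL);
    memcpy(&net, in, sizeof net);
    return ntohl(net);              /* network order -> host order */
}
```

This only helps if the writer and the reader both agree to store values in network order; it does not detect the order of a file that was written some other way.
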
    ray, Apr 13, 2006
    #4
  5. (Craig Ruff) writes:
    > In article <>,
    > Ann <> wrote:
    >>I am opening a file which looks like 0xABCDEF01 on another machine but
    >>0x01EFCDAB on my machine.
    >>
    >>Is this a byte swapping?

    >
    > Or possibly spouse swapping.
    >
    >>Could anyone give a good way to check if bytes are being swapped? (code
    >>should work smoothly across different machine.)

    >
    > Perhaps we should just send the code directly to your instructor so
    > we can get the credit for your homework?


    I didn't see any indication that this was homework. (If it was, it's
    a very poorly stated problem; as I mentioned elsethread, there is no
    reliable way to detect the byte ordering of a binary file.)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
    Keith Thompson, Apr 13, 2006
    #5
  6. Keith Thompson wrote:

    >... as I mentioned elsethread, there is no
    >reliable way to detect the byte ordering of a binary file.


    I believe there is, if you are allowed to "cheat" by using an
    auxiliary file, or a field with a known value at the beginning of the
    file.
    Assuming, for example, that the file will be used to store 32 bit
    integers, writing 0x12345678 as the first 4 octets will provide you
    with all the information you need to decode the following data
    correctly.
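
A minimal sketch of the marker idea, assuming the writer stored 0x12345678 as a raw 32-bit integer in its own byte order and that both machines use 8-bit bytes. The names ORDER_MARK, check_order_mark and swap_u32 are hypothetical. Only the "same order" and "fully reversed" cases are handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ORDER_MARK 0x12345678u  /* the marker value suggested in the post */

/* Hypothetical helper: inspect the first four bytes of the file.
   Returns 0 if they match this machine's byte order, 1 if the bytes are
   fully reversed, and -1 if the marker is not recognized. */
int check_order_mark(const unsigned char *hdr)
{
    uint32_t seen;
    assert(hdr != NULL);
    memcpy(&seen, hdr, sizeof seen);  /* read as a native 32-bit integer */
    if (seen == ORDER_MARK)
        return 0;
    if (seen == 0x78563412u)
        return 1;
    return -1;
}

/* Hypothetical helper: reverse the four bytes of a 32-bit value. */
uint32_t swap_u32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8) |
           ((v & 0xFF000000u) >> 24);
}
```

If check_order_mark returns 1, every following 32-bit word would be passed through swap_u32 after reading; if it returns 0, the words can be used as read.
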
    Roberto Waltman, Apr 13, 2006
    #6
  7. Ann

    Al Balmer Guest

    On Thu, 13 Apr 2006 15:57:40 -0400, Roberto Waltman
    <> wrote:

    >Keith Thompson wrote:
    >
    >>... as I mentioned elsethread, there is no
    >>reliable way to detect the byte ordering of a binary file.

    >
    >I believe there is, if you are allowed to "cheat" by using an
    >auxiliary file, or a field with a known value at the beginning of the
    >file.
    >Assuming, for example, that the file will be used to store 32 bit
    >integers, writing 0x12345678 as the first 4 octets will provide you
    >with all the information you need to decode the following data
    >correctly.


    That can be a useful technique. William Waite used something like that
    to distribute the Stage 2 macro processor. As I recall, the Fortran
    bootstrap read a record containing the character set to be used.

    --
    Al Balmer
    Sun City, AZ
    Al Balmer, Apr 13, 2006
    #7
  8. Al Balmer wrote:

    <OT> ( I think..)
    > William Waite used something like that
    >to distribute the Stage 2 macro processor. As I recall, the Fortran
    >bootstrap read a record containing the character set to be used.


    What is/was "the Stage 2 macro processor" ?
    </OT>
    Roberto Waltman, Apr 13, 2006
    #8
  9. Ann

    Al Balmer Guest

    On Thu, 13 Apr 2006 16:49:16 -0400, Roberto Waltman
    <> wrote:

    >Al Balmer wrote:
    >
    ><OT> ( I think..)


    OT in comp.lang.c. I plead ignorance about the other two groups <g>.

    >> William Waite used something like that
    >>to distribute the Stage 2 macro processor. As I recall, the Fortran
    >>bootstrap read a record containing the character set to be used.

    >
    >What is/was "the Stage 2 macro processor" ?


    A rather nice macro processor designed to be ported to any system
    which had a Fortran compiler. Years ago, I used it to implement a
    system called "SAP" (Structured Assembler Programming) which was used
    successfully to implement a number of process control products. Here's
    an abstract of an early paper:
    http://hopl.murdoch.edu.au/showlanguage2.prx?exp=534
    ########
    * Waite, W. M. "The Mobile Programming System: STAGE2"
    Abstract: STAGE2 is the second level of a bootstrap sequence
    which is easily implemented on any computer. It is a flexible,
    powerful macro processor designed specifically as a tool for
    constructing machine-independent software. In this paper the features
    provided by STAGE2 are summarized, and the implementation techniques
    which have made it possible to have STAGE2 running on a new machine
    with less than one man-week of effort are discussed. The approach has
    been successful on over 15 machines of widely varying characteristics.
    in [ACM] CACM 13(09) (Sep 1970)
    ########

    The published papers were not quite sufficient to implement the
    system. For that, the best resource was the book:

    Waite, W. M. Implementing Software for Non-numeric Applications, P-H
    1973

    ></OT>


    --
    Al Balmer
    Sun City, AZ
    Al Balmer, Apr 13, 2006
    #9
  10. Al Balmer wrote:

    >>What is/was "the Stage 2 macro processor" ?

    >
    >A rather nice macro processor designed to be ported to any system
    >which had a Fortran compiler. Years ago, I used it to implement a
    >system called "SAP" (Structured Assembler Programming) which was used
    >successfully to implement a number of process control products. Here's
    >an abstract of an early paper:
    >http://hopl.murdoch.edu.au/showlanguage2.prx?exp=534


    Thanks for the info, looks interesting. Of course now I must learn
    what FLUB and LIMP are, and then... There goes my weekend...
    Roberto Waltman, Apr 13, 2006
    #10
  11. Ann

    Guest

    , Apr 14, 2006
    #11
  12. writes:
    > Yes, the id stored at the beginning of the binary file is called
    > "magic number".
    >
    > Here is an example:
    > http://aslan.smnd.sk/anino/programming/gettext-doc/gettext_6.html


    Please read <http://cfaj.freeshell.org/google/>.

    Yes, if you store the right information in the file, it's possible to
    determine its endianness. My point was that there's no *general* way
    to do this. If I use fwrite() to write, say, an array of integers to
    a binary file, there's no way to determine the endianness of the file
    unless it's indicated explicitly, or unless I know something about
    the expected values.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
    Keith Thompson, Apr 14, 2006
    #12
  13. Ann

    Andy Glew Guest

    Roberto Waltman <> writes:

    > Al Balmer wrote:
    >
    > >>What is/was "the Stage 2 macro processor" ?

    > >
    > >A rather nice macro processor designed to be ported to any system
    > >which had a Fortran compiler. Years ago, I used it to implement a
    > >system called "SAP" (Structured Assembler Programming) which was used
    > >successfully to implement a number of process control products. Here's
    > >an abstract of an early paper:
    > >http://hopl.murdoch.edu.au/showlanguage2.prx?exp=534

    >
    > Thanks for the info, looks interesting. Of course now I must learn
    > what FLUB and LIMP are, and then... There goes my weekend...



    Cool.

    Circa 2000 I defined (and had somebody implement) a macro language
    that had a property similar to something I saw on a quick scan of
    Stage2: the text after expansion was completely reprocessed by all
    remaining patterns, repeatedly.

    Now, sure, macro languages like CPP will expand a macro, then will
    expand all macros in the macro, etc. But, to the best of my
    knowledge, they have a single pattern match going on - macro
    invocation such as FOO().

    The fun part was
    a) defining "best match" in a way that users found meaningful
    b) reparsing completely - e.g. Foo##Bar() might concatenate,
    and then expand FooBar()
    c) defining things in a way so that phase ordering did not
    produce unpleasant artifacts.

    Unfortunately, that project evaporated when I left Intel.
    Andy Glew, Apr 14, 2006
    #13
  14. Ann:
    >> Could anyone give a good way to check if bytes are being swapped? (code
    >> should work smoothly across different machine.)


    Keith Thompson:
    > If you're storing binary data in a file, byte ordering is only one of
    > the problems you can run into. Sizes of types can vary across
    > ...
    > The safest approach is to write *only* byte data. For example, if you


    Depending on what level you control the I/O and how often this is
    going to occur, you could also look at XDR, or something higher level
    like netCDF.

    --
    Charles Allen
    Charles Allen, Apr 14, 2006
    #14
  15. On Fri, 14 Apr 2006 02:10:06 +0000, Keith Thompson wrote:
    > Yes, if you store the right information in the file, it's possible to
    > determine its endianness. My point was that there's no *general* way
    > to do this. If I use fwrite() to write, say, an array of integers to
    > a binary file, there's no way to determine the endianness of the file
    > unless it's indicated explicitly, or unless I know something about
    > the expected values.


    If the numbers represent a white, random sequence, then it might not
    matter which order you read them. Maybe that's good enough for the OP?

    Personally, I favor the "write the bytes in the order you want them"
    school. Even htonl and friends have the problem that you have to cast the
    result to an unsigned char array, which can annoy the DSP processors that
    you spoke of, earlier. They'll usually be quite happy with the arithmetic
    of the explicit byte extraction approach, and if you have an octet-wide
    peripheral or memory to stuff the results, you'll even get the same file...
    (Luckily for the sanity of programmers, byte-addressability seems to be
    becoming more popular in DSPs too, at least those that have some
    expectation of being spoken to in C, some of the time.)

    Cheers,

    --
    Andrew
    Andrew Reilly, Apr 14, 2006
    #15
  16. Ann

    jaysome Guest

    Keith Thompson wrote:

    [snip]

    > Strictly speaking, you might still have problems on systems with byte
    > sizes bigger than 8 bits, or with non-ASCII character sets; the former
    > is unlikely to arise in practice, and the latter can be solved with
    > textual conversion tools.


    In other words, strictly speaking, portable C code does not exist in the
    real world. I just downloaded some code that was purported to be
    portable C. But it was written using the ASCII character set, and my
    development environment uses the EBCDIC character set. Needless to say,
    I got compilation errors.

    You can not distribute portable C code in electronic form--you must
    write a book or publish a document or use some other form of
    communication that conveys your source code. Those infatuated with
    portable C must somehow translate such a listing to their
    platform-specific character encoding if they wish to use your portable C
    code. The brute force way to do this is to type in the text manually in
    a text editor. Should you choose this route, you'd be well advised to
    teach your spouse to do this. I'm sure if you tell him or her that such
    an effort buttresses the spirit of portable C, he or she will willingly
    comply. Rest assured there are some hard-core portable C fanatics
    working on a "portable" OCR solution to automate this task.

    What you see is not always what you get. When you open a file in your
    text editor or development environment, it assumes a certain character
    encoding of the file. If the character encoding is not what your text
    editor or development environment expects, then don't blame the C
    standard, which says nothing about how character encoding of files is
    specified: ASCII and EBCDIC, among others, are acceptable.

    --
    jay
    jaysome, Apr 15, 2006
    #16
  17. jaysome wrote:
    > Keith Thompson wrote:
    >
    > [snip]
    >
    >> Strictly speaking, you might still have problems on systems with byte
    >> sizes bigger than 8 bits, or with non-ASCII character sets; the former
    >> is unlikely to arise in practice, and the latter can be solved with
    >> textual conversion tools.

    >
    > In other words, strictly speaking, portable C code does not exist in the
    > real world. I just downloaded some code that was purported to be
    > portable C. But it was written using the ASCII character set, and my
    > development environment uses the EBCDIC character set. Needless to say,
    > I got compilation errors.


    Trolls who cannot use the default ASCII-EBCDIC conversion tools, not
    even to the extent of sending the code as email, need to go back into
    their caves and hide.

    Terje

    --
    - <>
    "almost all programming can be viewed as an exercise in caching"
    Terje Mathisen, Apr 15, 2006
    #17
  18. wrote:
    >Yes, the id stored at the beginning of the binary file is called
    >"magic number".
    >
    >Here is an example:
    >http://aslan.smnd.sk/anino/programming/gettext-doc/gettext_6.html


    Please include some context from the message you are replying to. This
    is meaningless when read in isolation. Since I still remember what I
    wrote earlier today, I will provide it for you this time. ;)

    "Assuming, for example, that the file will be used to store 32 bit
    integers, writing 0x12345678 as the first 4 octets will provide you
    with all the information you need to decode the following data
    correctly."

    The intent here is not to have something that will identify what the
    contents or layout of the file are, as in the common use of file
    "magic numbers", but to detect byte-swapping.

    When you write a magic number to identify a file, you expect to read
    it back with the same value.

    What I refer to in my post, is that you can write

    0x12345678

    and, after transferring the file to another system, read back any of
    the following:

    0x12345678 - no change.
    0x78563412 - byte reversal.
    0x56781234 - high/low half swap.
    0x34127856 - byte swap between each half.

    ( I believe these are the only permutations that make sense. Somebody
    tell me I'm wrong? )

    If you know that the word was indeed 0x12345678, then you know how to
    correct for byte-swapping in the rest of the file.
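
One way to turn the table above into code is to derive a byte permutation from the marker as read and apply it to every later word. This is a table-driven sketch, not something the post prescribes; it assumes 8-bit bytes and that the data words were rearranged the same way as the marker. The names build_perm and apply_perm are hypothetical. Because the four marker bytes are distinct, the same code covers all four cases listed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given the marker as read ("seen"), fill perm so that
   byte i of a corrected word (0 = most significant) comes from byte perm[i]
   of the word as read. Returns 0 on success, or -1 if "seen" is not a
   rearrangement of the bytes 12 34 56 78. */
int build_perm(uint32_t seen, int perm[4])
{
    static const unsigned char want[4] = {0x12, 0x34, 0x56, 0x78};
    int i, j;

    assert(perm != NULL);
    for (i = 0; i < 4; i++) {
        perm[i] = -1;
        for (j = 0; j < 4; j++) {
            unsigned char b = (unsigned char)((seen >> (24 - 8 * j)) & 0xFFu);
            if (b == want[i])
                perm[i] = j;
        }
        if (perm[i] < 0)
            return -1;
    }
    return 0;
}

/* Hypothetical helper: undo the rearrangement on one data word. */
uint32_t apply_perm(uint32_t raw, const int perm[4])
{
    uint32_t out = 0;
    int i;

    assert(perm != NULL);
    for (i = 0; i < 4; i++) {
        uint32_t b = (raw >> (24 - 8 * perm[i])) & 0xFFu;
        out |= b << (24 - 8 * i);
    }
    return out;
}
```

A reader would call build_perm once on the first word of the file and then run each following word through apply_perm.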


    Roberto Waltman
    [ please reply to the group,
    return address is invalid ]
    Roberto Waltman, Apr 18, 2006
    #18
  19. Ann

    Micah Cowan Guest

    jaysome <> writes:

    > Keith Thompson wrote:
    >
    > [snip]
    >
    > > Strictly speaking, you might still have problems on systems with byte
    > > sizes bigger than 8 bits, or with non-ASCII character sets; the former
    > > is unlikely to arise in practice, and the latter can be solved with
    > > textual conversion tools.

    >
    > In other words, strictly speaking, portable C code does not exist in
    > the real world. I just downloaded some code that was purported to be
    > portable C. But it was written using the ASCII character set, and my
    > development environment uses the EBCDIC character set. Needless to
    > say, I got compilation errors.


    You make an extremely poor case.

    Source code that does not consist of characters in your
    implementation's source character set, is obviously not C code at all,
    from the perspective of your implementation.

    However, mainstream methods for downloading, such as HTTP,
    specifically provide information about whether a file is text or not,
    and usually (always, in the case of HTTP) what encoding it is in.

    Any reasonable definition for correctly and completely "downloading" a
    plaintext file from one host to another would necessarily include
    proper transcoding. Otherwise, what you have at the end is not the
    plaintext file that was offered to you by the server.

    Given this, there is certainly plenty of 100% portable C code. Though
    I'm sure it's well in the minority.

    --
    Micah J. Cowan
    Programmer, musician, typesetting enthusiast, gamer...
    http://micah.cowan.name/
    Micah Cowan, Apr 19, 2006
    #19
  20. Ann

    Old Wolf Guest

    jaysome wrote:
    >
    > In other words, strictly speaking, portable C code does not exist in the
    > real world. I just downloaded some code that was purported to be
    > portable C. But it was written using the ASCII character set, and my
    > development environment uses the EBCDIC character set.


    Which environment is that, out of interest?

    > Needless to say, I got compilation errors.


    The C standard defines a basic source character set -- all of whose
    characters are supported by both ASCII and EBCDIC. So either the source
    was not actually portable in the first place (ie. it contained an
    illegal character), or you did not properly convert the source
    from ASCII to EBCDIC before loading it onto your system.

    > You can not distribute portable C code in electronic form--you must
    > write a book or publish a document or use some other form of
    > communication that conveys your source code. Those infatuated with
    > portable C must somehow translate such a listing to their
    > platform-specific character encoding if they wish to use your portable C
    > code. The brute force way to do this is to type in the text manually in
    > a text editor.


    This is just silly. Would you also say that source files cannot
    be distributed in gzip archives, because your C compiler cannot
    read gzip files? No. When you receive a source file, you first
    translate it to the correct form for your environment.

    > What you see is not always what you get. When you open a file in your
    > text editor or development environment, it assumes a certain character
    > encoding of the file. If the character encoding is not what your text
    > editor or development environment expects, then don't blame the C
    > standard


    Of course not. Invoking your development environment properly
    is not part of any standard, nor should it be. Read your IDE's
    documentation.
    Old Wolf, Apr 19, 2006
    #20