Storing data of different byte sizes

Discussion in 'C++' started by magnus.moraberg, May 23, 2009.

  1. Hi,

    I wish to read a wave file header which uses different numbers of
    bytes to store different pieces of information. For example:

    two bytes for the number of channels
    four bytes for the length of the raw data.

    But since the sizes of the basic types in C++ are system dependent, I'm
    unsure how to store these. This is currently how I read the data length:

    waveFile.seekg(40);
    waveFile.read(reinterpret_cast<char*>(&dataLength), 4);

    where dataLength is an unsigned int. This is all well and good while
    an int is four bytes, but how would you guys do this?

    Am I right in saying that a char is always one byte?
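    A small sketch of what the standard does and does not promise here:
    sizeof(char) is 1 by definition, but a "byte" is CHAR_BIT bits, which
    is at least 8 and can be more on some hardware. The helper bitsIn is
    hypothetical, added only for illustration.

```cpp
#include <climits>
#include <cstddef>

// sizeof counts in units of char, so sizeof(char) == 1 on every implementation.
static_assert(sizeof(char) == 1, "guaranteed by the C++ standard");

// A byte has CHAR_BIT bits: at least 8, but more on some DSPs.
// A reader that assumes 8-bit bytes can state that assumption up front.
static_assert(CHAR_BIT == 8, "this WAV reader assumes 8-bit bytes");

// Width in bits of an object of type T on this implementation.
template <typename T>
constexpr std::size_t bitsIn() {
    return sizeof(T) * CHAR_BIT;
}
```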

    Also, the actual data samples can themselves have different byte
    sizes. So let's say I store 10 samples in memory, each 4 bytes in size.
    I would use this code to point to a particular sample:

    byteSize = 4;
    char* sampleBufferPtr = populate(/**/);

    but how would I convert the sample to a float?
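    A minimal sketch of locating sample `index` in a raw byte buffer and
    turning it into a float, assuming each sample is a 4-byte float stored
    in exactly the host's own float format (IEEE 754 single precision, same
    byte order). sampleAsFloat is a hypothetical helper; memcpy is used
    instead of casting the pointer because the buffer need not be suitably
    aligned for a float.

```cpp
#include <cstddef>
#include <cstring>

// Assumption: samples are 4-byte floats in the host's native float format.
static_assert(sizeof(float) == 4, "this sketch assumes a 4-byte float");

float sampleAsFloat(const char* sampleBuffer, std::size_t index, std::size_t byteSize) {
    // Start of sample `index`: plain byte-offset arithmetic on a char pointer.
    const char* samplePtr = sampleBuffer + index * byteSize;
    float value;
    // memcpy has no alignment requirement and does not break aliasing rules,
    // unlike dereferencing reinterpret_cast<const float*>(samplePtr).
    std::memcpy(&value, samplePtr, sizeof value);
    return value;
}
```

    If the file's float format differs from the host's, the bytes have to be
    decoded field by field instead of copied.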

    Thanks for your help,

    magnus.moraberg, May 23, 2009

  2. magnus.moraberg

    Ron AF Greve Guest


    This is how I read the 'SubChunkSize':

    UInt32 SubChunk2Size; // == NumSamples * NumChannels * BitsPerSample/8 (is the rest of this file)

    I just read it in with the size of the UInt32 (my own type, which
    is always a 4-byte unsigned int).
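    One way to get a "UInt32 that is always 4 bytes" in standard C++ is an
    alias for std::uint32_t (optional in the standard, but when present it
    is exactly 32 bits with no padding). The sketch below also evaluates the
    SubChunk2Size formula from the comment above; the alias and function
    name are hypothetical, and it assumes BitsPerSample is a multiple of 8.

```cpp
#include <cstdint>

// Hypothetical alias mirroring the poster's own fixed-size type.
using UInt32 = std::uint32_t;
static_assert(sizeof(UInt32) * 8 == 32, "assumes 8-bit bytes");

// SubChunk2Size == NumSamples * NumChannels * BitsPerSample / 8.
// Dividing BitsPerSample first keeps the intermediate product smaller,
// which is exact only when BitsPerSample is a whole number of bytes.
constexpr UInt32 subChunk2Size(UInt32 numSamples, UInt32 numChannels,
                               UInt32 bitsPerSample) {
    return numSamples * numChannels * (bitsPerSample / 8);
}
```

    For example, one second of 16-bit stereo at 44100 Hz gives
    subChunk2Size(44100, 2, 16), i.e. 176400 bytes of sample data.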

    I just read the whole header as one struct, though (including that last
    field), and only handle one type of WAV file:

    reinterpret_cast<char*>( &WavHeader.WavHeader ), sizeof( WavHeader.WavHeader ) );

    Off the top of my head, I thought the samples are integers (8- or 16-bit,
    mono or stereo), and I think most other libraries also use ints (like
    OpenAL, Speex, etc.).
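    If the samples are integer PCM, converting one to a float is a matter of
    decoding the integer and scaling it. A sketch assuming the usual WAV
    conventions: 8-bit samples are unsigned with 128 as silence, 16-bit
    samples are signed two's complement, little-endian. Dividing by the
    largest magnitude to land in [-1.0, 1.0) is one common convention, not
    the only one; the function names are hypothetical.

```cpp
#include <cstdint>

// 8-bit WAV PCM: unsigned, 128 is silence.
float pcm8ToFloat(unsigned char sample) {
    return (static_cast<int>(sample) - 128) / 128.0f;
}

// 16-bit WAV PCM: signed, two's complement, least significant byte first.
float pcm16ToFloat(const unsigned char* p) {
    // Assemble the 16-bit pattern explicitly, independent of host byte order.
    const unsigned bits = static_cast<unsigned>(p[0]) | (static_cast<unsigned>(p[1]) << 8);
    // Interpret the pattern as two's complement without relying on an
    // implementation-defined unsigned-to-signed conversion.
    const long value = (bits >= 0x8000u) ? static_cast<long>(bits) - 0x10000L
                                         : static_cast<long>(bits);
    return static_cast<float>(value) / 32768.0f;  // maps into [-1.0, 1.0)
}
```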

    Regards, Ron AF Greve
    Ron AF Greve, May 24, 2009

  3. magnus.moraberg

    Ian Collins Guest

    The only truly portable solution is to read the data in bytes and build
    up whatever bigger types are required, depending on size and byte order.
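    A sketch of that byte-by-byte approach for the little-endian fields
    used in RIFF/WAV, assuming 8-bit bytes. The result does not depend on
    the host's byte order or alignment rules. The helper names are
    hypothetical; the stream overload mirrors the original
    seekg/read code.

```cpp
#include <cstdint>
#include <istream>

// Little-endian: least significant byte first.
std::uint16_t readLE16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

std::uint32_t readLE32(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
          | (static_cast<std::uint32_t>(p[1]) << 8)
          | (static_cast<std::uint32_t>(p[2]) << 16)
          | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Reads 4 raw bytes from the stream, then assembles them; false on short read.
bool readLE32(std::istream& in, std::uint32_t& out) {
    unsigned char b[4];
    if (!in.read(reinterpret_cast<char*>(b), 4)) return false;
    out = readLE32(b);
    return true;
}
```

    With this, the data length could be read as waveFile.seekg(40)
    followed by readLE32(waveFile, dataLength), with dataLength declared as
    std::uint32_t.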

    If the system byte order matches the file byte order, you can use the
    widely available C fixed-width types (intN_t).
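    Whether the system byte order matches can be checked rather than
    assumed. A sketch with hypothetical helper names: it inspects all four
    bytes of a probe value (so a mixed order like 2301 is not mistaken for
    little-endian) and only then copies the file bytes straight into a
    std::uint32_t. C++20 also offers std::endian in <bit> for the same
    question.

```cpp
#include <cstdint>
#include <cstring>

// True only if the host stores std::uint32_t least significant byte first.
bool hostIsLittleEndian() {
    const std::uint32_t probe = 0x04030201u;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 1 && bytes[1] == 2 && bytes[2] == 3 && bytes[3] == 4;
}

// Fast path for little-endian file fields: a plain copy, used only when the
// host layout matches. Returns false so the caller can assemble bytes instead.
bool loadLE32Native(const unsigned char* p, std::uint32_t& out) {
    if (!hostIsLittleEndian()) return false;
    std::memcpy(&out, p, sizeof out);  // memcpy also sidesteps alignment issues
    return true;
}
```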
    You would have to know the floating point format used. If the file
    format matches your host, you can get away with a cast:

    float* fp = reinterpret_cast<float*>(sampleBufferPtr);
    Ian Collins, May 24, 2009
  4. magnus.moraberg

    James Kanze Guest

    It doesn't necessarily work even when int is four bytes.

    Only if the file format uses 2's complement (usually the
    Yes, but. A byte isn't necessarily 8 bits.
    Maybe. Not with g++. It's definitely undefined behavior,
    and although IMHO, the intent of the standard was more or
    less for this to work, there are various reasons that it
    doesn't always. (I'm supposing here that sampleBufferPtr
    has type uint32_t*, and points to a valid uint32_t. If it
    is just a pointer into your buffer, the code will core dump
    on most processors, because of alignment considerations, and
    there's not much the compiler can do about it.)

    If the floating point format is the same in the file and in
    your machine, you can use memcpy to copy a uint32_t into a
    float. Otherwise, you've got to extract the fields, and use
    functions like ldexp to create the actual value. (The code
    to do so is actually fairly simple---until you add all of
    the necessary error handling:).)
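    A sketch of the ldexp route, assuming the file stores IEEE 754 single
    precision and the 32 bits have already been assembled into a
    std::uint32_t with a byte-order-aware reader. It rebuilds the value
    arithmetically, so the host's own float layout does not matter (beyond
    having an infinity and a NaN to return). decodeBinary32 is a
    hypothetical name.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

float decodeBinary32(std::uint32_t bits) {
    const bool negative = (bits >> 31) != 0;
    const int exponent = static_cast<int>((bits >> 23) & 0xFFu);  // biased by 127
    const std::uint32_t fraction = bits & 0x7FFFFFu;               // 23 bits

    float magnitude;
    if (exponent == 0xFF) {
        // All-ones exponent: infinity if the fraction is zero, NaN otherwise.
        magnitude = fraction == 0 ? std::numeric_limits<float>::infinity()
                                  : std::numeric_limits<float>::quiet_NaN();
    } else if (exponent == 0) {
        // Zero or subnormal: value = fraction * 2^-149.
        magnitude = std::ldexp(static_cast<float>(fraction), -149);
    } else {
        // Normal: 1.fraction * 2^(exponent-127) == (2^23 + fraction) * 2^(exponent-150).
        magnitude = std::ldexp(static_cast<float>(fraction | 0x800000u), exponent - 150);
    }
    return negative ? -magnitude : magnitude;
}
```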
    James Kanze, May 24, 2009
  5. magnus.moraberg

    James Kanze Guest

    Which if it works, is only by sheer luck. It's not
    guaranteed, and it doesn't work most of the time.
    (Depending on the file format, it will fail on a Sparc, or
    on an Intel machine---for most network formats, it will fail
    on the Intel.)
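    The way around this is to take the byte order from the file format,
    never from the machine. A sketch with a hypothetical ByteOrder
    parameter: the same four bytes decode to different values for a
    big-endian ("network") format and a little-endian one such as WAV, and
    the function gives the same answer on a Sparc and on an Intel machine.

```cpp
#include <cstdint>

enum class ByteOrder { Little, Big };

// Assemble an unsigned 32-bit value from 4 bytes in the byte order the
// *file format* specifies; the host's own byte order never enters into it.
std::uint32_t read32(const unsigned char* p, ByteOrder order) {
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        // Big-endian: byte 0 is most significant. Little-endian: byte 3 is.
        const int index = (order == ByteOrder::Big) ? i : 3 - i;
        value = (value << 8) | p[index];
    }
    return value;
}
```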
    James Kanze, May 24, 2009
  6. If possible, use libsndfile. If not, you can still study it.

    Martin Eisenberg, May 24, 2009
  7. magnus.moraberg

    Ron AF Greve Guest


    Well, actually the only problem I can imagine is when I move to a 128-bit
    system (then I would probably have to read field by field). However, given
    the structure of the header, this was just a bit of a timesaver instead of
    writing everything out.

    All types are my own fixed-size types, and it certainly does work on
    Intel, since that is where I use it: I read the file and can listen to
    the contents using OpenAL, so I think I can safely assume it works :)

    But you are right that on systems with other endianness some byte
    swapping is needed, and for systems larger than 64 bits, resetting the
    file pointer might be necessary.
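    The byte swapping mentioned here is a fixed shuffle for a 32-bit field.
    A sketch with a hypothetical name (C++23 adds std::byteswap in <bit>).
    Note that it only fixes a fully reversed byte order; the other
    differences raised in the reply below (padding, alignment, other byte
    orders) are not addressed by swapping.

```cpp
#include <cstdint>

// Reverse the byte order of a 32-bit value: 0x11223344 -> 0x44332211.
constexpr std::uint32_t byteSwap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24)
         | ((v & 0x0000FF00u) << 8)
         | ((v & 0x00FF0000u) >> 8)
         | ((v & 0xFF000000u) >> 24);
}

static_assert(byteSwap32(0x11223344u) == 0x44332211u, "swap reverses bytes");
```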

    Regards, Ron AF Greve

    Ron AF Greve, May 24, 2009
  8. magnus.moraberg

    James Kanze Guest

    And on systems with different integral representations you'll
    need other adjustments, and with compilers which use different
    padding, you'll need other adjustments, and on systems where
    bytes aren't 8 bits, you'll need other adjustments.

    FWIW: I've seen byte order of a 32 bit integer change from one
    version of the compiler to the next, from 2301 to 0123. Padding
    often changes according to compiler options. And most systems
    (not Intel) have alignment restrictions, which means that if the
    data in the buffer isn't aligned, you get a core dump.

    Byte order is just the tip of the iceberg. In practice, you
    need to define the format (or use an already defined format),
    and implement the correct formatting.
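    A sketch of "implementing the format" for the canonical 44-byte PCM WAV
    header: each field is decoded from a fixed byte offset, so struct
    padding, alignment and host byte order play no part. It assumes a
    canonical file with "fmt " at offset 12 and "data" at offset 36 (real
    files may carry extra chunks); the struct and function names are
    hypothetical. The data size at offset 40 matches the seekg(40) in the
    original question.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// In-memory form only; its layout never has to match the file.
struct WavHeader {
    std::uint16_t audioFormat;    // 1 = PCM
    std::uint16_t numChannels;
    std::uint32_t sampleRate;
    std::uint32_t byteRate;
    std::uint16_t blockAlign;
    std::uint16_t bitsPerSample;
    std::uint32_t dataSize;       // "Subchunk2Size"
};

namespace {
std::uint16_t le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
std::uint32_t le32(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
          | (static_cast<std::uint32_t>(p[1]) << 8)
          | (static_cast<std::uint32_t>(p[2]) << 16)
          | (static_cast<std::uint32_t>(p[3]) << 24);
}
}  // namespace

// Returns false if the buffer is too short or a chunk tag does not match.
bool parseWavHeader(const unsigned char* buf, std::size_t size, WavHeader& out) {
    if (size < 44) return false;
    if (std::memcmp(buf, "RIFF", 4) != 0 || std::memcmp(buf + 8, "WAVE", 4) != 0 ||
        std::memcmp(buf + 12, "fmt ", 4) != 0 || std::memcmp(buf + 36, "data", 4) != 0)
        return false;
    out.audioFormat   = le16(buf + 20);
    out.numChannels   = le16(buf + 22);
    out.sampleRate    = le32(buf + 24);
    out.byteRate      = le32(buf + 28);
    out.blockAlign    = le16(buf + 32);
    out.bitsPerSample = le16(buf + 34);
    out.dataSize      = le32(buf + 40);
    return true;
}
```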
    James Kanze, May 25, 2009
