Storing data of different byte sizes

Discussion in 'C++' started by magnus.moraberg@gmail.com, May 23, 2009.

  1. Guest

    Hi,

    I wish to read a wave file header which uses different amounts of
    bytes to store different pieces of information. For example -

    two bytes for the number of channels
    four bytes for the length of the raw data.

    But since the sizes of the basic types in C++ are system dependent, I'm unsure
    how to store these. This is currently how I read the data length -

    waveFile.seekg(40);
    waveFile.read(reinterpret_cast<char*>(&dataLength), 4);

    where dataLength is an unsigned int. This is all well and good while
    an int is four bytes, but how would you guys do this?

    Am I right in saying that a char is always one byte?

    Also, the actual data samples can themselves have different byte
    sizes. So let's say I store 10 samples in memory, each 4 bytes in size.
    I would use this code to point to a particular sample -

    byteSize = 4;
    char* sampleBufferPtr = populate(/**/);
    sampleBufferPtr + byteSize*sampleIndex

    but how would I convert the sample to a float?

    Thanks for your help,

    Barry.
    , May 23, 2009
    #1

  2. Ron AF Greve Guest

    Hi,

    This is how I read the 'SubChunkSize':

    UInt32 SubChunk2Size; // == NumSamples * NumChannels * BitsPerSample/8 (is the rest of this file)

    I just read in with Stream.read() the size of the UInt32 (my own type which
    is always 4 bytes unsigned int).

    I just read the whole header as one struct though (including that last
    field) and only handle one type of Wav file.

    Input.read( reinterpret_cast<char*>( &WavHeader.WavHeader ), sizeof( WavHeader.WavHeader ) );

    Off the top of my head, I think the samples are integers (8 or 16 bits,
    mono or stereo), and I think most other libraries also use ints (like
    OpenAL, speex etc.).
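
    For the common case of integer samples, a minimal sketch of turning one 16-bit sample into a float might look like the following. It assumes the sample is stored as two little-endian bytes holding a signed two's complement value (the usual layout for 16-bit PCM in WAV files) and that a byte is 8 bits; the function name pcm16_to_float is made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: convert one 16-bit PCM sample, stored as two
// little-endian bytes, to a float in the range [-1.0, 1.0).
inline float pcm16_to_float(const unsigned char* p)
{
    // Build the unsigned 16-bit pattern from the two bytes (low byte first).
    const std::uint32_t raw = static_cast<std::uint32_t>(p[0])
                            | (static_cast<std::uint32_t>(p[1]) << 8);

    // Sign-extend by hand so the result does not rely on how the host
    // converts out-of-range unsigned values to signed ones.
    const std::int32_t value = (raw & 0x8000u)
        ? static_cast<std::int32_t>(raw) - 0x10000
        : static_cast<std::int32_t>(raw);

    // Scale so that -32768 maps to -1.0.
    return static_cast<float>(value) / 32768.0f;
}
```

    A sample at index i in a char buffer would then be read with pcm16_to_float(reinterpret_cast<const unsigned char*>(buffer) + 2 * i).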

    Regards, Ron AF Greve

    http://informationsuperhighway.eu

    Ron AF Greve, May 24, 2009
    #2

  3. Ian Collins Guest

    wrote:
    > Hi,
    >
    > I wish to read a wave file header which uses different amounts of
    > bytes to store different pieces of information. For example -
    >
    > two bytes for the number of channels
    > four bytes for the length of the raw data.
    >
    > But since the basic types in c++ are system independent, I'm unsure
    > how to store these. This is currently how I read the data length -
    >
    > waveFile.seekg(40);
    > waveFile.read(reinterpret_cast<char*>(&dataLength), 4);
    >
    > where dataLength is an unsigned int. This is all well and good while
    > an int is four bytes, but how would you guys do this?


    The only truly portable solution is to read the data in bytes and build
    up whatever bigger types are required, depending on size and byte order.
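
    A minimal sketch of that byte-by-byte approach, assuming the file stores its integers least significant byte first (as RIFF/WAV does) and that a byte is 8 bits. The names read_u16_le and read_u32_le are invented for this example.

```cpp
#include <cstdint>

// Hypothetical helpers: assemble unsigned integers from bytes stored
// least significant byte first. The result is the same on any host,
// whatever its native byte order or the size of int.
inline std::uint16_t read_u16_le(const unsigned char* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline std::uint32_t read_u32_le(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

    The four bytes read from the stream go into an unsigned char buffer first, and the value is then built with read_u32_le(buffer) instead of reading directly into an unsigned int.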

    If the system byte order matches the file byte order, you can use the
    widely available C fixed width types (intN_t).

    > Am I right in saying that a char is always one byte?


    Yes.

    > Also, the actually data samples can themselves have different byte
    > sizes. So lets say I store 10 samples in memory, each 4 bytes in size.
    > I would use this code to point to a particular sample -
    >
    > byteSize = 4;
    > char* sampleBufferPtr = populate(/**/);
    > sampleBufferPtr(byteSize*sampleIndex)
    >
    > but how would I convert the sample to a float?


    You would have to know the floating point format used. If the file
    format matches your host, you can get away with a cast:

    float* fp = reinterpret_cast<float*>(sampleBufferPtr);

    --
    Ian Collins
    Ian Collins, May 24, 2009
    #3
  4. James Kanze Guest

    On May 24, 1:45 am, Ian Collins <> wrote:
    > wrote:


    > > I wish to read a wave file header which uses different
    > > amounts of bytes to store different pieces of
    > > information. For example -


    > > two bytes for the number of channels
    > > four bytes for the length of the raw data.


    > > But since the basic types in c++ are system independent,
    > > I'm unsure how to store these. This is currently how I
    > > read the data length -

    >
    > > waveFile.seekg(40);
    > > waveFile.read(reinterpret_cast<char*>(&dataLength), 4);


    > > where dataLength is an unsigned int. This is all well
    > > and good while an int is four bytes,


    It doesn't necessarily work even when int is four bytes.

    > > but how would you guys do this?


    > The only truly portable solution is to read the data in
    > bytes and build up whatever bigger types are required,
    > depending on size and byte order.


    > If the byte system order matches the file byte order, you
    > can use the widely available C fixed width types
    > (intN_t).


    Only if the file format uses 2's complement (usually the
    case).

    > > Am I right in saying that a char is always one byte?


    > Yes.


    Yes, but. A byte isn't necessarily 8 bits.
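
    One way to make that assumption explicit is to have the compiler check it. This is only a sketch, assuming a C++11 compiler (for static_assert and constexpr); the helper bits_in is a made-up name.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// sizeof(char) is 1 by definition, but the number of bits in that one
// byte is CHAR_BIT, which the standard only requires to be at least 8.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

// Hypothetical helper: number of bits occupied by an object of type T.
template <typename T>
constexpr std::size_t bits_in()
{
    return sizeof(T) * CHAR_BIT;
}
```

    On a platform where a byte is not 8 bits, the static_assert stops the build instead of letting the file parsing go wrong silently.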

    > > Also, the actually data samples can themselves have
    > > different byte sizes. So lets say I store 10 samples in
    > > memory, each 4 bytes in size. I would use this code to
    > > point to a particular sample -


    > > byteSize = 4;
    > > char* sampleBufferPtr = populate(/**/);
    > > sampleBufferPtr(byteSize*sampleIndex)


    > > but how would I convert the sample to a float?


    > You would have to know the floating point format used. If
    > the file format matches your host, you can get away with a
    > cast:


    > float* fp = reinterpret_cast<float*>(sampleBufferPtr);


    Maybe. Not with g++. It's definitely undefined behavior,
    and although IMHO, the intent of the standard was more or
    less for this to work, there are various reasons that it
    doesn't always. (I'm supposing here that sampleBufferPtr
    has type uint32_t*, and points to a valid uint32_t. If it
    is just a pointer into your buffer, the code will core dump
    on most processors, because of alignment considerations, and
    there's not much the compiler can do about it.)

    If the floating point format is the same in the file and in
    your machine, you can use memcpy to copy a uint32_t into a
    float. Otherwise, you've got to extract the fields, and use
    functions like ldexp to create the actual value. (The code
    to do so is actually fairly simple---until you add all of
    the necessary error handling:).)
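
    Both routes can be sketched roughly as follows. The sketch assumes the file holds IEEE 754 single-precision values and that the 32-bit pattern has already been assembled in the right byte order; the first function additionally assumes the host float uses that same format. The function names are invented here, and the NaN payload is not preserved by the second one.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Route 1: host float has the same representation as the file.
// memcpy avoids the aliasing and alignment problems of a pointer cast.
inline float float_from_bits_memcpy(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "float must be 32 bits wide");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Route 2: extract sign, exponent and fraction, then rebuild the value
// with ldexp. This does not depend on the host's float representation.
inline float float_from_ieee754_bits(std::uint32_t bits)
{
    const bool negative = (bits >> 31) != 0;
    const int exponent = static_cast<int>((bits >> 23) & 0xFFu);
    const std::uint32_t fraction = bits & 0x7FFFFFu;

    float magnitude;
    if (exponent == 0xFF) {
        // All-ones exponent: infinity if the fraction is zero, else NaN.
        magnitude = (fraction == 0)
            ? std::numeric_limits<float>::infinity()
            : std::numeric_limits<float>::quiet_NaN();
    } else if (exponent == 0) {
        // Zero or subnormal: no implicit leading 1 bit.
        magnitude = std::ldexp(static_cast<float>(fraction), -149);
    } else {
        // Normal number: restore the implicit leading 1 bit.
        magnitude = std::ldexp(static_cast<float>(fraction | 0x800000u),
                               exponent - 150);
    }
    return negative ? -magnitude : magnitude;
}
```

    The error handling mentioned above (for example, values that do not fit the host's float range) is left out of this sketch.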

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, May 24, 2009
    #4
  5. James Kanze Guest

    On May 24, 1:33 am, "Ron AF Greve" <me@localhost> wrote:

    > This is how I read the 'SubChunkSize':


    > UInt32 SubChunk2Size; // == NumSamples * NumChannels * BitsPerSample/8 (is
    > the rest of this file)


    > I just read in with Stream.read() the size of the UInt32
    > (my own type which is always 4 bytes unsigned int).


    > I just read the whole header as one struct though
    > (including that last field) and only handle one type of
    > Wav file.


    Which, if it works, is only by sheer luck. It's not
    guaranteed, and it doesn't work most of the time.
    (Depending on the file format, it will fail on a Sparc, or
    on an Intel machine---for most network formats, it will fail
    on the Intel.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, May 24, 2009
    #5
  6. Martin Eisenberg

    wrote:

    > I wish to read a wave file header which uses different amounts
    > of bytes to store different pieces of information.


    If possible, use libsndfile. If not, you can still study it.
    http://www.mega-nerd.com/libsndfile/


    Martin

    --
    Quidquid latine scriptum est, altum videtur.
    Martin Eisenberg, May 24, 2009
    #6
  7. Ron AF Greve Guest

    Hi,


    Well, actually the only problem I can imagine is when I would move to a
    128-bit system (I would probably have to read field by field). However,
    due to the structure of the header, this was just a bit of a timesaver
    instead of writing everything out.

    All types are my own fixed-size types and it certainly does work on Intel,
    since that is where I use it. Since I read the file and can listen to the
    contents using OpenAL, I think I can safely assume it works :)

    But you are right that on systems with other endianness some byte
    swapping needs to be done, and for systems larger than 64 bits, resetting
    the file pointer might be necessary.

    Regards, Ron AF Greve

    http://informationsuperhighway.eu

    Ron AF Greve, May 24, 2009
    #7
  8. James Kanze Guest

    On May 24, 1:46 pm, "Ron AF Greve" <me@localhost> wrote:

    > Well actually the only problem I can imagine when I would move
    > to a 128 byte system (I probably have to read field by field).
    > However due to the structure of the header this was just a bit
    > of a timesaver instead of writing everything out.


    > All types are my own fixed size types and it certainly does
    > work on intel, since there is where I use it, since I read the
    > file and can listen to the contents using OpenAL I think I can
    > safely assume it works :)


    > But you are right that on systems with other 'endiness' there
    > needs some byte swapping to be done and for systems larger
    > than 64 bits, ressetting the file pointer might be necessary.


    And on systems with different integral representations you'll
    need other adjustments, and with compilers which use different
    padding, you'll need other adjustments, and on systems where
    bytes aren't 8 bits, you'll need other adjustments.

    FWIW: I've seen byte order of a 32 bit integer change from one
    version of the compiler to the next, from 2301 to 0123. Padding
    often changes according to compiler options. And most systems
    (not Intel) have alignment restrictions, which means that if the
    data in the buffer isn't aligned, you get a core dump.

    Byte order is just the tip of the iceberg. In practice, you
    need to define the format (or use an already defined format),
    and implement the correct formatting.
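
    As an illustration of implementing a defined format field by field, here is a rough sketch for the simplest WAV case. It assumes the canonical 44-byte PCM header with no extra chunks (channel count at offset 22, sample rate at 24, bits per sample at 34, data length at 40, all little-endian) and 8-bit bytes. Real files can contain additional chunks, so this is not a general parser; WavInfo, le_field and parse_wav_header are names made up for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical result type holding the fields of interest.
struct WavInfo {
    std::uint16_t channels;
    std::uint32_t sampleRate;
    std::uint16_t bitsPerSample;
    std::uint32_t dataLength;
};

// Build an unsigned value from n little-endian bytes (n <= 4).
inline std::uint32_t le_field(const unsigned char* p, std::size_t n)
{
    std::uint32_t v = 0;
    for (std::size_t i = n; i > 0; --i) {
        v = (v << 8) | p[i - 1];
    }
    return v;
}

// Parse the assumed canonical 44-byte header from a raw byte buffer.
// Returns false if the buffer is too short or the chunk tags are wrong.
inline bool parse_wav_header(const unsigned char* h, std::size_t size,
                             WavInfo& out)
{
    if (size < 44) {
        return false;
    }
    if (std::memcmp(h, "RIFF", 4) != 0 ||
        std::memcmp(h + 8, "WAVE", 4) != 0 ||
        std::memcmp(h + 12, "fmt ", 4) != 0 ||
        std::memcmp(h + 36, "data", 4) != 0) {
        return false;
    }
    out.channels      = static_cast<std::uint16_t>(le_field(h + 22, 2));
    out.sampleRate    = le_field(h + 24, 4);
    out.bitsPerSample = static_cast<std::uint16_t>(le_field(h + 34, 2));
    out.dataLength    = le_field(h + 40, 4);
    return true;
}
```

    Because every field is built from individual bytes at a stated offset, struct padding, host byte order and alignment never come into play.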

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, May 25, 2009
    #8
