Storing data of different byte sizes

magnus.moraberg · May 23, 2009

Hi,

I wish to read a wave file header which uses different amounts of
bytes to store different pieces of information. For example -

two bytes for the number of channels
four bytes for the length of the raw data.

But since the basic types in c++ are system dependent, I'm unsure
how to store these. This is currently how I read the data length -

waveFile.seekg(40);
waveFile.read(reinterpret_cast<char*>(&dataLength), 4);

where dataLength is an unsigned int. This is all well and good while
an int is four bytes, but how would you guys do this?

Am I right in saying that a char is always one byte?

Also, the actual data samples can themselves have different byte
sizes. So let's say I store 10 samples in memory, each 4 bytes in size.
I would use this code to point to a particular sample -

const int byteSize = 4;
char* sampleBufferPtr = populate(/**/);
char* samplePtr = sampleBufferPtr + byteSize * sampleIndex; // start of the sample

but how would I convert the sample to a float?

Thanks for your help,

Barry.

Ron AF Greve · May 24, 2009

Hi,

This is how I read the 'SubChunk2Size':

UInt32 SubChunk2Size; // == NumSamples * NumChannels * BitsPerSample/8 (the rest of this file)

I just read in with Stream.read() the size of the UInt32 (my own type which
is always 4 bytes unsigned int).

I just read the whole header as one struct though (including that last
field) and only handle one type of Wav file.

Input.read( reinterpret_cast<char*>( &WavHeader.WavHeader ), sizeof(
WavHeader.WavHeader ) );

Off the top of my head, I think the samples are integers (8 or 16 bits,
mono or stereo), and I think most other libraries also use ints (like
OpenAL, speex etc.).
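
For plain PCM wave files that recollection holds: 16-bit samples are signed and stored low byte first, and 8-bit samples are unsigned with 128 as silence. As a rough sketch of how such a sample could be turned into a float in the range [-1.0, 1.0), something like the following might do. The function names are made up for illustration, and it assumes 8-bit bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decode one 16-bit PCM sample stored little-endian
// (low byte first) and scale it to the range [-1.0, 1.0).
inline float pcm16_to_float(const unsigned char* bytes)
{
    assert(bytes != nullptr);
    // Assemble the raw 16-bit pattern without relying on host byte order.
    const std::uint32_t raw = static_cast<std::uint32_t>(bytes[0])
                            | (static_cast<std::uint32_t>(bytes[1]) << 8);
    // Interpret the pattern as two's complement by hand.
    const std::int32_t value = (raw >= 0x8000u)
                             ? static_cast<std::int32_t>(raw) - 0x10000
                             : static_cast<std::int32_t>(raw);
    return static_cast<float>(value) / 32768.0f;
}

// Hypothetical helper: 8-bit PCM samples are unsigned, with 128 as silence.
inline float pcm8_to_float(unsigned char byte)
{
    return static_cast<float>(static_cast<int>(byte) - 128) / 128.0f;
}
```

Dividing by 32768 (rather than 32767) keeps the scaling exact and maps the most negative sample to exactly -1.0.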

Regards, Ron AF Greve

http://informationsuperhighway.eu

Ian Collins · May 24, 2009

Hi,

I wish to read a wave file header which uses different amounts of
bytes to store different pieces of information. For example -

two bytes for the number of channels
four bytes for the length of the raw data.

But since the basic types in c++ are system dependent, I'm unsure
how to store these. This is currently how I read the data length -

waveFile.seekg(40);
waveFile.read(reinterpret_cast<char*>(&dataLength), 4);

where dataLength is an unsigned int. This is all well and good while
an int is four bytes, but how would you guys do this?

The only truly portable solution is to read the data in bytes and build
up whatever bigger types are required, depending on size and byte order.
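
A minimal sketch of that byte-by-byte approach, assuming the file stores its integers low byte first (as wave files do) and that bytes are 8 bits wide. The names load_le and read_le are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>

// Combine `count` bytes (1 to 4), stored low byte first, into one value.
// The result does not depend on the byte order of the host.
inline std::uint32_t load_le(const unsigned char* bytes, std::size_t count)
{
    assert(count >= 1 && count <= 4);
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < count; ++i)
        value |= static_cast<std::uint32_t>(bytes[i]) << (8 * i);
    return value;
}

// Read `count` bytes (1 to 4) from the stream and decode them with load_le.
// Returns false, leaving `out` untouched, if the read comes up short.
inline bool read_le(std::istream& in, std::size_t count, std::uint32_t& out)
{
    char raw[4] = {};
    if (count < 1 || count > 4
        || !in.read(raw, static_cast<std::streamsize>(count)))
        return false;
    out = load_le(reinterpret_cast<const unsigned char*>(raw), count);
    return true;
}
```

With this, the read in the question could become waveFile.seekg(40); followed by read_le(waveFile, 4, dataLength); with dataLength declared as std::uint32_t, and the two-byte channel count could be read with a count of 2.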

If the system byte order matches the file byte order, you can use the
widely available C fixed width types (intN_t).

Am I right in saying that a char is always one byte?
Yes.

Also, the actual data samples can themselves have different byte
sizes. So let's say I store 10 samples in memory, each 4 bytes in size.
I would use this code to point to a particular sample -

const int byteSize = 4;
char* sampleBufferPtr = populate(/**/);
char* samplePtr = sampleBufferPtr + byteSize * sampleIndex; // start of the sample

but how would I convert the sample to a float?

You would have to know the floating point format used. If the file
format matches your host, you can get away with a cast:

float* fp = reinterpret_cast<float*>(sampleBufferPtr);

James Kanze · May 24, 2009

(e-mail address removed) wrote:

It doesn't necessarily work even when int is four bytes.

The only truly portable solution is to read the data in
bytes and build up whatever bigger types are required,
depending on size and byte order.

If the system byte order matches the file byte order, you
can use the widely available C fixed width types
(intN_t).

Only if the file format uses 2's complement (usually the
case).

Yes.

Yes, but. A byte isn't necessarily 8 bits.
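
One way to make that assumption explicit, sketched here rather than prescribed, is to check CHAR_BIT at compile time so that code written for 8-bit bytes refuses to build anywhere else. The helper name bits_in is made up for illustration.

```cpp
#include <climits>
#include <cstddef>

// Refuse to compile on a platform whose bytes are not 8 bits wide, rather
// than silently misreading an octet-oriented file format there.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

// sizeof(char) is 1 by definition; CHAR_BIT says how many bits that byte has.
constexpr int bits_in(std::size_t size_in_bytes)
{
    return static_cast<int>(size_in_bytes) * CHAR_BIT;
}
```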

You would have to know the floating point format used. If
the file format matches your host, you can get away with a
cast:

float* fp = reinterpret_cast<float*>(sampleBufferPtr);

Maybe. Not with g++. It's definitely undefined behavior,
and although IMHO, the intent of the standard was more or
less for this to work, there are various reasons that it
doesn't always. (I'm supposing here that sampleBufferPtr
has type uint32_t*, and points to a valid uint32_t. If it
is just a pointer into your buffer, the code will core dump
on most processors, because of alignment considerations, and
there's not much the compiler can do about it.)

If the floating point format is the same in the file and in
your machine, you can use memcpy to copy a uint32_t into a
float. Otherwise, you've got to extract the fields, and use
functions like ldexp to create the actual value. (The code
to do so is actually fairly simple---until you add all of
the necessary error handling.)
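
Both routes can be sketched roughly as follows, assuming the file stores IEEE 754 single-precision values and that the 32 bits have already been assembled into a uint32_t in the right order. The function names are invented for this example, and the error handling mentioned above is left out.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Route 1: copy the bits. Only valid when the host float has the same
// 32-bit layout (and byte order) as the value held in `bits`.
inline float float_from_bits_memcpy(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "float must be 32 bits wide for this to make sense");
    float result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}

// Route 2: extract the sign, exponent and fraction fields of an IEEE 754
// single and rebuild the value with ldexp, whatever the host format is.
inline float float_from_ieee754_bits(std::uint32_t bits)
{
    const bool negative = (bits >> 31) != 0;
    const int exponent = static_cast<int>((bits >> 23) & 0xFFu);
    const std::uint32_t fraction = bits & 0x7FFFFFu;

    double magnitude;
    if (exponent == 0xFF) {
        // All-ones exponent: infinity if the fraction is zero, else NaN.
        magnitude = (fraction == 0)
                  ? std::numeric_limits<double>::infinity()
                  : std::numeric_limits<double>::quiet_NaN();
    } else if (exponent == 0) {
        // Zero and subnormals: no implicit leading 1, exponent fixed at -126.
        magnitude = std::ldexp(static_cast<double>(fraction), -126 - 23);
    } else {
        // Normal numbers: restore the implicit leading 1, remove the bias.
        magnitude = std::ldexp(static_cast<double>(fraction | 0x800000u),
                               exponent - 127 - 23);
    }
    return static_cast<float>(negative ? -magnitude : magnitude);
}
```

Neither function dereferences a float* into the buffer, so alignment of the original bytes does not matter.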

James Kanze · May 24, 2009

This is how I read the 'SubChunk2Size':

UInt32 SubChunk2Size; // == NumSamples * NumChannels * BitsPerSample/8 (the rest of this file)

I just read in with Stream.read() the size of the UInt32
(my own type which is always 4 bytes unsigned int).

I just read the whole header as one struct though
(including that last field) and only handle one type of
Wav file.

Which, if it works, is only by sheer luck. It's not
guaranteed, and it doesn't work most of the time.
(Depending on the file format, it will fail on a Sparc, or
on an Intel machine---for most network formats, it will fail
on the Intel.)

Martin Eisenberg · May 24, 2009

I wish to read a wave file header which uses different amounts
of bytes to store different pieces of information.

If possible, use libsndfile. If not, you can still study it.
http://www.mega-nerd.com/libsndfile/

Martin

Ron AF Greve · May 24, 2009

Hi,

Well, actually the only problem I can imagine is when I move to a 128-bit
system (then I would probably have to read field by field). However, given
the structure of the header, this was just a bit of a timesaver instead of
writing everything out.

All types are my own fixed-size types and it certainly does work on Intel,
since that is where I use it. Since I read the file and can listen to the
contents using OpenAL, I think I can safely assume it works.

But you are right that on systems with a different endianness some byte
swapping needs to be done, and for systems larger than 64 bits, resetting
the file pointer might be necessary.

Regards, Ron AF Greve

http://informationsuperhighway.eu

James Kanze · May 25, 2009

Well, actually the only problem I can imagine is when I move
to a 128-bit system (then I would probably have to read field
by field). However, given the structure of the header, this was
just a bit of a timesaver instead of writing everything out.

All types are my own fixed-size types and it certainly does
work on Intel, since that is where I use it. Since I read the
file and can listen to the contents using OpenAL, I think I can
safely assume it works.

But you are right that on systems with a different endianness
some byte swapping needs to be done, and for systems larger
than 64 bits, resetting the file pointer might be necessary.

And on systems with different integral representations you'll
need other adjustments, and with compilers which use different
padding, you'll need other adjustments, and on systems where
bytes aren't 8 bits, you'll need other adjustments.

FWIW: I've seen byte order of a 32 bit integer change from one
version of the compiler to the next, from 2301 to 0123. Padding
often changes according to compiler options. And most systems
(not Intel) have alignment restrictions, which means that if the
data in the buffer isn't aligned, you get a core dump.

Byte order is just the tip of the iceberg. In practice, you
need to define the format (or use an already defined format),
and implement the correct formatting.
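
For the wave header discussed in this thread, that could look something like the sketch below. It decodes the common 44-byte PCM layout field by field from a byte buffer instead of overlaying a struct on it. WavInfo, load_le and parse_wav_header are names invented for this example. It assumes 8-bit bytes and an ASCII execution character set, and it only handles files whose "data" chunk directly follows a 16-byte "fmt " chunk; files with extra chunks would need a proper chunk walk.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical in-memory form of the header. Its layout and padding are
// free to differ from the file, because nothing is read straight into it.
struct WavInfo {
    std::uint16_t channels;
    std::uint32_t sampleRate;
    std::uint16_t bitsPerSample;
    std::uint32_t dataLength;   // size of the raw sample data in bytes
};

// Combine `count` bytes (1 to 4), stored low byte first, into one value.
inline std::uint32_t load_le(const unsigned char* bytes, std::size_t count)
{
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < count; ++i)
        value |= static_cast<std::uint32_t>(bytes[i]) << (8 * i);
    return value;
}

// Decode the common 44-byte PCM header. Returns false if the buffer is too
// short or the chunk tags are not where this simple layout expects them.
inline bool parse_wav_header(const unsigned char* h, std::size_t size,
                             WavInfo& info)
{
    if (size < 44) return false;
    if (std::memcmp(h, "RIFF", 4) != 0) return false;
    if (std::memcmp(h + 8, "WAVE", 4) != 0) return false;
    if (std::memcmp(h + 12, "fmt ", 4) != 0) return false;
    if (std::memcmp(h + 36, "data", 4) != 0) return false;

    info.channels      = static_cast<std::uint16_t>(load_le(h + 22, 2));
    info.sampleRate    = load_le(h + 24, 4);
    info.bitsPerSample = static_cast<std::uint16_t>(load_le(h + 34, 2));
    info.dataLength    = load_le(h + 40, 4);
    return true;
}
```

Because every field is built from individual bytes at a fixed file offset, host byte order, struct padding and alignment no longer enter into it.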
