WAV file question

siroregano

Hi Everyone-

I'm new to this group, and almost-as-new to asking programming
questions publicly, so please forgive me if I miss a convention or two!

I have a text file, around 40,000 lines long, where each line is a
string of 4 ASCII characters corresponding to a 12-bit hexadecimal
audio sample. The file reads something like this...

081F
081C
081A
0818
080E
etc...

I would like to write a simple C program in a Linux environment that
will read in this file, convert the strings to signed 2's-complement
int samples, and output a simple WAV audio file. I've found several
resources on "The Canonical WAV format" via Google, and they describe
the format pretty well.

I already have a program reading in the file and fprintf'ing each line
out to stdout, as a stub. Now I need to build the WAV file.
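
A minimal sketch of the conversion step described above. It assumes the 12-bit codes are offset binary (0x800 is silence), which the sample values near 0x0800 suggest but the post does not confirm. The function name hex12_to_sample is made up for illustration.

```c
#include <stdlib.h>

/* Convert one line such as "081F" to a sample scaled to the 16-bit range.
 * ASSUMPTION: the codes are offset binary, i.e. 0x800 means zero.
 * Returns 0 on success, -1 if the line is not a valid 12-bit hex value. */
int hex12_to_sample(const char *line, int *sample)
{
    char *end;
    long code = strtol(line, &end, 16);

    if (end == line || code < 0 || code > 0xFFF)
        return -1;

    /* Tolerate a trailing newline or carriage return, nothing else. */
    while (*end == '\n' || *end == '\r')
        end++;
    if (*end != '\0')
        return -1;

    /* Centre on zero, then scale 12 bits up to 16 bits. */
    *sample = (int)(code - 0x800) * 16;
    return 0;
}
```

If the codes turn out to be 12-bit two's complement instead, the centring step would become a sign extension (subtract 0x1000 when bit 11 is set) rather than a subtraction of 0x800.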

Question #1: Am I reinventing the wheel? Does anyone know of another
piece of code out there that could do this out-of-the-box? I saw that
ogg123 seems to have a "raw" device option, but there doesn't seem to
be much documentation as to what it expects as a raw sample stream.

Question #2, for those familiar with WAV files: The WAV docs seem to
indicate that there can be only one DATA chunk in each WAV file. In a
DATA chunk, you get 4 bytes to give the size of your data... But what
if you have more data bytes than you can drop into an unsigned 4-byte
int?

Question #3, which is more C-language related: The WAV format uses
little-endian bit orders for all data, but the byte-order of several of
the multibyte (either 2-byte or 4-byte) variables is big-endian. So for
an int, the most-significant-byte would be stored in the first byte of
memory, the next-most-significant byte would follow it, and so on...
What is the byte order used by x86/gnu-linux? How can I flip around the
byte order of certain variables so that when I write out the WAV file,
the bytes are in the order the player expects??

Question #4: Do I have to do anything special to write out a binary
file as opposed to an ASCII text file? Can I just use fputc() to
iteratively drop the contents of the WAV chunk structs into a file? Do
I need to do anything at the top of the file to set the mime type?

Thanks in advance for your advice,
[medic]Dave
 
SM Ryan

# Question #2, for those familiar with WAV files: The WAV docs seem to
# indicate that there can be only one DATA chunk in each WAV file. In a
# DATA chunk, you get 4 bytes to give the size of your data... But what
# if you have more data bytes than you can drop into an unsigned 4-byte
# int?

Isn't the maximum over 3 hours of sound? Going over 4 megabytes is going
to run into problems on some file systems. Sometimes there are inherent
limits you have to find some other way to do it.

# Question #3, which is more C-language related: The WAV format uses
# little-endian bit orders for all data, but the byte-order of several of

Your system probably provides various byte swapping functions that can
get them in the proper order for you. You can also do byte swapping
yourself, for example for a 4 byte integer x

(x>>24) & 0x000000FF
| (x>>8) & 0x0000FF00
| (x<<8) & 0x00FF0000
| (x<<24)& 0xFF000000
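
The expression above can be wrapped in a function, sketched here under the assumption that x is exactly 32 bits wide. The name swap32 is illustrative. An unsigned type is used so that the right shifts cannot drag sign bits along.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t x)
{
    return ((x >> 24) & 0x000000FFu)
         | ((x >>  8) & 0x0000FF00u)
         | ((x <<  8) & 0x00FF0000u)
         | ((x << 24) & 0xFF000000u);
}
```

For example, swap32(0x11223344) yields 0x44332211, and applying it twice returns the original value.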

# Question #4: Do I have to do anything special to write out a binary
# file as opposed to an ASCII text file? Can I just use fputc() to
# iteratively drop the contents of the WAV chunk structs into a file? Do
# I need to do anything at the top of the file to set the mime type?

On some systems you want to use an fopen mode like "wb", b for binary.
(Unix does not distinguish between text and binary files.)
You can use fputc or fwrite. Presumably the mime type is decided by
the initial chunk descriptor: simply writing the file in the correct
format would then suffice. Perhaps also you have to make sure the file
name ends in .wav.
 
Chris Torek

Question #3, which is more C-language related: The WAV format uses
little-endian bit orders for all data, but the byte-order of several of
the multibyte (either 2-byte or 4-byte) variables is big-endian. So for
an int, the most-significant-byte would be stored in the first byte of
memory, the next-most-significant byte would follow it, and so on...
What is the byte order used by x86/gnu-linux? How can I flip around the
byte order of certain variables so that when I write out the WAV file,
the bytes are in the order the player expects??

Usually, the best answer to the question "how do I flip the bytes
around in C code" is "don't". That is, rather than reading several
bytes into a variable, then fixing up the variable, just read and
write *values*:

int val1, val2;
unsigned int composite; /* at least 16 bits */

val1 = getc(fp);
val2 = getc(fp);
if (val1 == EOF || val2 == EOF) ... handle error ...

/* build 16-bit value from two 8-bit values */
composite =
((unsigned int)(val1 & 0xff) << 8) |
((unsigned int)(val2 & 0xff));

Do likewise when writing the values.

The result is as portable as is possible in C. By not giving up
control of the order of the two (or three, or four, or 500, or
however many) getc() or putc() operations, you do not have to
correct later for "machine used incorrect order earlier".
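
The same idea applied to the little-endian 32-bit fields of a WAV file might look like the sketch below. The helper names put32le and get32le are hypothetical; only getc() and putc() come from the standard library.

```c
#include <stdio.h>

/* Write a 32-bit value least-significant byte first.
 * Returns 0 on success, EOF on a write error. */
int put32le(unsigned long v, FILE *fp)
{
    int i;

    for (i = 0; i < 4; i++) {
        if (putc((int)((v >> (8 * i)) & 0xFF), fp) == EOF)
            return EOF;
    }
    return 0;
}

/* Read a 32-bit little-endian value into *out.
 * Returns 0 on success, EOF on end of file or read error. */
int get32le(FILE *fp, unsigned long *out)
{
    unsigned long v = 0;
    int i, c;

    for (i = 0; i < 4; i++) {
        if ((c = getc(fp)) == EOF)
            return EOF;
        v |= (unsigned long)(c & 0xFF) << (8 * i);
    }
    *out = v;
    return 0;
}
```

Because the code decides the order of the bytes itself, it behaves the same on little-endian and big-endian hosts.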
 
Keith Thompson

[ Quoting character fixed. ]

SM Ryan said:
Isn't the maximum over 3 hours of sound? Going over 4 megabytes is going
to run into problems on some file systems. Sometimes there are inherent
limits you have to find some other way to do it.

I think you mean 4 gigabytes (2**32), not 4 megabytes (2**22).
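
To put a rough number on that limit, here is a small sketch of the arithmetic. It assumes uncompressed PCM and ignores the few dozen header bytes that also count against the 32-bit size field. The function name is made up for illustration.

```c
#include <stdint.h>

/* Whole seconds of PCM audio that fit in a data chunk whose size is
 * stored as an unsigned 32-bit byte count. */
uint32_t max_wav_seconds(uint32_t sample_rate, uint32_t channels,
                         uint32_t bytes_per_sample)
{
    uint64_t bytes_per_second =
        (uint64_t)sample_rate * channels * bytes_per_sample;

    if (bytes_per_second == 0)
        return 0;
    return (uint32_t)(UINT32_MAX / bytes_per_second);
}
```

For 44,100 Hz, 16-bit stereo this gives 24,347 seconds, a little under 6.8 hours. The 40,000 samples in the original question would need only about 80,000 bytes as 16-bit mono.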

On some systems, stdio starts running into problems at 2 gigabytes,
because fseek() takes a signed long. If none of your files are that
big, you can probably afford not to worry about it. Otherwise, find
out what you have to do on your system to handle large files. Using
fgetpos/fsetpos rather than ftell/fseek might or might not be
sufficient.

[snip]
On some systems you want to use an fopen mode like "wb", b for binary.
(Unix does not distinguish between text and binary files.)

If you want to write a binary file, you should use "wb" on *all*
systems. It's true that Unix doesn't distinguish between text and
binary files, but since "w" denotes text and "wb" denotes binary, you
might as well use "wb", for documentation if nothing else. With "w",
your code will work on some systems. With "wb" it will work on all
systems, at essentially no additional cost.
 
Erik de Castro Lopo

Hi Everyone-

I'm new to this group, and almost-as-new to asking programming
questions publicly, so please forgive me if I miss a convention or two!

I have a text file, around 40,000 lines long, where each line is a
string of 4 ASCII characters corresponding to a 12-bit hexadecimal
audio sample. The file reads something like this...

081F
081C
081A
0818
080E
etc...

I would like to write a simple C program in a Linux environment that
will read in this file, convert the strings to signed 2's-complement
int samples, and output a simple WAV audio file.

Have a look at libsndfile:

http://www.mega-nerd.com/libsndfile/

Question #1: Am I reinventing the wheel?

Yes, see above.

Question #2, for those familiar with WAV files: The WAV docs seem to
indicate that there can be only one DATA chunk in each WAV file.

Never seen one.

Question #3, which is more C-language related: The WAV format uses
little-endian bit orders for all data,

libsndfile does all the hard work. It does byte swapping and format
conversion as required on the fly.

HTH,
Erik
--
+-----------------------------------------------------------+
Erik de Castro Lopo (e-mail address removed) (Yes it's valid)
+-----------------------------------------------------------+
"It's far too easy to make fun of Microsoft products, but it takes a
real man to make them work, and a god to make them do anything useful"
-- Anonymous
 
SM Ryan

# I think you mean 4 gigabytes (2**32), not 4 megabytes (2**22).

A million bytes here, a million bytes there. Eventually you're talking
real memory.
 
Malcolm

Question #1: Am I reinventing the wheel?

The task is so simple that it will be easier for an experienced C programmer
to write it than to find existing code. For an inexperienced C programmer,
it is a good learning exercise.

Question #2, for those familiar with WAV files: The WAV docs seem to
indicate that there can be only one DATA chunk in each WAV file. In a
DATA chunk, you get 4 bytes to give the size of your data... But what
if you have more data bytes than you can drop into an unsigned 4-byte
int?

Check the format on wotsit.com. It may be that the files are limited to 4GB,
in which case you'll have to break them up.

Question #3, which is more C-language related: The WAV format uses
little-endian bit orders for all data,

Don't do this

fwrite(&x, sizeof(int), 1, fp);

do this

void put16le(int x, FILE *fp)
{
    fputc(x & 0xFF, fp);
    fputc((x >> 8) & 0xFF, fp);
}

Then you don't need to worry about your machine's particular byte order.

Question #4: Do I have to do anything special to write out a binary
file as opposed to an ASCII text file?

FILE *fp = fopen(path, "wb");

will open a file in binary mode. On most systems there is actually no
difference between a text file and a binary file, but a few will insert
carriage returns before every newline, or make the terminal go into Chinese
mode if non-ASCII characters are written to a text file and then echoed to
the screen, or do other nasty things. So use the b.
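
Putting the pieces together, a sketch of a complete writer for the common 44-byte "canonical" PCM header could look like this. It assumes 16-bit mono output and takes the sample rate as a parameter, since the thread never states one. write_wav16_mono and put32le are hypothetical names; put16le follows the helper above but takes an unsigned argument.

```c
#include <stdio.h>

static void put16le(unsigned int x, FILE *fp)
{
    fputc((int)(x & 0xFF), fp);
    fputc((int)((x >> 8) & 0xFF), fp);
}

static void put32le(unsigned long x, FILE *fp)
{
    put16le((unsigned int)(x & 0xFFFF), fp);
    put16le((unsigned int)((x >> 16) & 0xFFFF), fp);
}

/* Write n 16-bit mono PCM samples as a WAV file.
 * fp must have been opened with "wb".
 * Returns 0 on success, -1 if any write failed. */
int write_wav16_mono(FILE *fp, const short *samples,
                     unsigned long n, unsigned long rate)
{
    unsigned long data_bytes = n * 2UL;
    unsigned long i;

    fputs("RIFF", fp);
    put32le(36UL + data_bytes, fp);  /* bytes following this field */
    fputs("WAVE", fp);

    fputs("fmt ", fp);
    put32le(16UL, fp);               /* size of the fmt chunk body */
    put16le(1, fp);                  /* format tag 1 = PCM */
    put16le(1, fp);                  /* channels */
    put32le(rate, fp);               /* samples per second */
    put32le(rate * 2UL, fp);         /* bytes per second */
    put16le(2, fp);                  /* block align: bytes per frame */
    put16le(16, fp);                 /* bits per sample */

    fputs("data", fp);
    put32le(data_bytes, fp);
    for (i = 0; i < n; i++)
        put16le((unsigned int)samples[i] & 0xFFFFu, fp);

    return ferror(fp) ? -1 : 0;
}
```

The chunk identifiers are written as plain characters and every numeric field goes through the little-endian helpers, so the output does not depend on the host's byte order.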
 
siroregano

Thank you all for your help - while I've been C coding for a few years
now, it has been limited to simple C to be run on microcontrollers. So
I've never really been exposed to programming for a big OS like Linux.

It looks like libsndfile has what I need in the "Raw PCM" format input.
Now I'm working on trading beer for tutoring with a C-guru friend of
mine to get the app up and running!

Thanks again,
Dave
 
siroregano

Update - After checking out libsndfile more in-depth, and examining
some of the examples provided with the source distribution, it looks
like this library will be *perfect* for what I'm trying to do. Thank
you!

Dave
 
Michel Rouzic

Question 1: Um... "I have a text file, around 40,000 lines long, where
each line is a string of 4 ASCII characters corresponding to a 12-bit
hexadecimal audio sample"

Wow, um, why don't you read directly from a .WAV file? I did that and
it's so simple (I do an fread to read each tag into a variable, then with
loops I put the sound data into a multidimensional array).

Question 2: Yeah, you can't have a .WAV file over 4 GB. I won't ask
what you're going to do with over 4 GB, but don't forget that not long
ago, if I remember right, FAT32 couldn't hold files over 4 GB either...

Question 3: Don't worry about little-endian/big-endian, or at least I
didn't have to worry about that personally. When you do an fread(&x,
sizeof(unsigned int), 1, wave_file); it just gets read properly, and
written properly when writing with fwrite.

Question 4: Wow, I'm not sure I got your question right, but for
your output file, you open it so you can write it in binary (with
something like wav_out=fopen(argv[2],"wb");), then you write all the
tags with fwrite and then all the sound data...

I wrote some simple code that reads and writes .wav files directly in ANSI
C; ask me by email if you want to see the code.
 
Keith Thompson

Michel Rouzic said:
Question 3: Don't worry about little-endian/big-endian, or at least I
didn't have to worry about that personally. When you do an fread(&x,
sizeof(unsigned int), 1, wave_file); it just gets read properly, and
written properly when writing with fwrite.

I don't know anything about how WAV files are formatted, but I suspect
you just happened to be reading a file on a system whose endianness
matches the endianness of the file. If you tried to read the same
file on a system with the opposite endianness, your fread would give
you byte-flipped garbage.
 
Michel Rouzic

oh, well yeah, .wav file are in little endian and im doing that on a
little-endian machine. so, if i compiled the same code on a Mac it
couldn't read a .wav file properly?
 
Joe Wright

Michel said:
oh, well yeah, .wav file are in little endian and im doing that on a
little-endian machine. so, if i compiled the same code on a Mac it
couldn't read a .wav file properly?

No. fread() and fwrite(), on files open in binary mode, will faithfully
read a file into memory and then write memory back to a file. If memory
was not changed in any way, the output file is identical to the input
file. fread() and fwrite() don't know from endian.

The problem with endianness comes when data objects wider than byte
(char) are written to a file on a little endian machine (Intel) and then
the file is read by a big endian machine (Apple). If the file has data
written from 'short', 'int', and/or 'long' objects, the receiving machine
(Apple) can read these objects into memory but must 'switch ends' before
using the objects as data. If the receiver (Apple) would modify the data
and create another file for the Intel machine it must 'switch ends'
again before writing the file.
 
Keith Thompson

Michel Rouzic said:
oh, well yeah, .wav file are in little endian and im doing that on a
little-endian machine. so, if i compiled the same code on a Mac it
couldn't read a .wav file properly?

Please provide some context when you post a followup. We don't
necessarily have easy access to the article to which you're replying.
The Google Groups interface makes proper followups needlessly
difficult, but not impossible. I've restored the context here.

If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.

The answer to your question is probably yes. The Mac happens to be
big-endian. If you create a binary file on a little-endian platform
by fwrite()ing, say, an int value, then copy the file to a big-endian
platform and attempt to read it by fread()ing an int, you'll get
incorrect results. If you instead read and write the file a byte at a
time, constructing larger quantities by explicitly assembling them
from bytes rather than by storing them in int objects, it should work.
Or you can read an int as an int and then apply a function that swaps
the bytes; some systems may provide functions to do this. <OT>Look
for htonl, htons, ntohl, and ntohs -- and remember that "network
order" is big-endian.</OT>
 
Erik de Castro Lopo

Michel said:
oh, well yeah, .wav file are in little endian and im doing that on a
little-endian machine.

Correct.

so, if i compiled the same code on a Mac it
couldn't read a .wav file properly?

If your code does not know how to byte-swap the values read from
the file, then the file would not be read correctly.

libsndfile ( http://www.mega-nerd.com/libsndfile ) does
the right thing regardless of what CPU it is being run
on. It also allows you to open a file with 16 bit short
int data and read that data as 32 bit floats. The short to
float conversion happens automatically inside the library.

Erik
--
+-----------------------------------------------------------+
Erik de Castro Lopo (e-mail address removed) (Yes it's valid)
+-----------------------------------------------------------+
"I consider C++ the most significant technical hazard to the survival
of your project and do so without apologies." -- Alistair Cockburn
 
Gordon Burditt

oh, well yeah, .wav file are in little endian and im doing that on a
little-endian machine. so, if i compiled the same code on a Mac it
couldn't read a .wav file properly?

Write your code properly to use Documented-endian (that is, the
endianness specified in the document describing the file format)
in the file, and write it so the Native-endian type does not matter.
This often involves shifting and masking. It almost certainly does
NOT involve using fread() to read binary data into a structure with
multi-byte integers as elements, then using it without further
adjustment. Oh, yes, you should also use the Documented integer
size in the file, and write your code so the type used in your code
is sufficient to hold that integer but won't break if the Native
integer size is bigger.
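
As one way to follow that advice, the sketch below decodes the 16-byte body of a PCM "fmt " chunk from raw bytes by shifting and masking, instead of reading it straight into a struct. The struct wav_fmt and the helper names are hypothetical. The field layout is the commonly documented canonical one.

```c
#include <stdint.h>

/* Assemble little-endian values from individual bytes. */
static uint16_t rd16le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* In-memory form of the fmt chunk body. Its layout in memory is
 * irrelevant, because it is never read or written as a block. */
struct wav_fmt {
    uint16_t format_tag;
    uint16_t channels;
    uint32_t sample_rate;
    uint32_t byte_rate;
    uint16_t block_align;
    uint16_t bits_per_sample;
};

/* Decode the 16 raw bytes of the fmt body, field by field. */
void parse_wav_fmt(const unsigned char body[16], struct wav_fmt *out)
{
    out->format_tag      = rd16le(body + 0);
    out->channels        = rd16le(body + 2);
    out->sample_rate     = rd32le(body + 4);
    out->byte_rate       = rd32le(body + 8);
    out->block_align     = rd16le(body + 12);
    out->bits_per_sample = rd16le(body + 14);
}
```

Struct padding and native integer size do not matter here, since each field is built from an explicit byte offset. On a platform without exact-width types, uint_least16_t and uint_least32_t would serve the same purpose.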

Gordon L. Burditt
 
Michel Rouzic

oh yeah sorry i didnt know about that google group reply thing.

anyways, wow, what do you do then if on a Mac you want to read a 32-bit
.wav file? i mean, not long ago i read the file byte by byte and then
did things like "x = 256*b + a", but when i wanted to read 32-bit .wav
files, i didn't wanna program anything to read 4 bytes and then put
them together into 32-bit floats cuz that's way too complicated (so i
used fread to read floats). is that what Mac programmers have to do
(unless they use some library that does it)?
 
Default User

Michel said:
oh yeah sorry i didnt know about that google group reply thing.

So why then did you refuse to use it in this post? Are you intentionally
trying to annoy people?

Brian
 
Keith Thompson

Michel Rouzic said:
oh yeah sorry i didnt know about that google group reply thing.

And now you do, but you still don't provide any context in your
followup. I don't think it's strictly necessary in this case, but
it's polite to at least acknowledge whose article you're responding to.

anyways, wow, what do you do then if on a Mac you want to read a 32-bit
.wav file? i mean, not long ago i read the file byte by byte and then
did things like "x = 256*b + a", but when i wanted to read 32-bit .wav
files, i didn't wanna program anything to read 4 bytes and then put
them together into 32-bit floats cuz that's way too complicated (so i
used fread to read floats). is that what Mac programmers have to do
(unless they use some library that does it)?

Dealing with floating-point numbers in a binary file written on
another platform is even trickier than dealing with integers. Most
systems these days use the IEEE format, or something very similar, but
there are still a number of other formats out there (Cray, IBM
mainframe, VAX, etc.). If you know the format used in the file, you
can write portable code that will extract the fields and
mathematically construct a floating-point value, though possibly with
some loss of range and/or precision.
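
A sketch of that field-extraction approach, assuming the file stores IEEE 754 single-precision values in little-endian byte order (the usual description of 32-bit float WAV data, though the format documentation should be checked). The function name decode_f32le is made up for illustration.

```c
#include <math.h>
#include <stdint.h>

/* Build a double from the four bytes of a little-endian IEEE 754
 * single-precision value, using arithmetic only (no type punning). */
double decode_f32le(const unsigned char b[4])
{
    uint32_t bits = (uint32_t)b[0]
                  | ((uint32_t)b[1] << 8)
                  | ((uint32_t)b[2] << 16)
                  | ((uint32_t)b[3] << 24);
    int      sign     = (bits >> 31) ? -1 : 1;
    int      exponent = (int)((bits >> 23) & 0xFF);
    uint32_t fraction = bits & 0x7FFFFFu;
    double   value;

    if (exponent == 0xFF)        /* infinity or NaN */
        value = fraction ? NAN : INFINITY;
    else if (exponent == 0)      /* zero or subnormal */
        value = ldexp((double)fraction, -149);
    else                         /* normal: restore the implicit leading 1 */
        value = ldexp((double)(fraction | 0x800000u), exponent - 150);

    return sign * value;
}
```

The result does not depend on how the host stores its own floating-point values, only on the host being able to represent the decoded number.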

I'm actually a bit surprised that the WAV format doesn't use just
integers. (One web page I just read didn't say anything about
floating-point, but another did mention it). The format specification
should tell you exactly how the values are represented in the file.
Extracting the information is left as an exercise.

I'm sure that plenty of code has already been written to do this.
Finding it is also left as an exercise.
 
Erik de Castro Lopo

Keith said:
Dealing with floating-point numbers in a binary file written on
another platform is even trickier than dealing with integers.

Yes, and libsndfile ( http://www.mega-nerd.com/libsndfile )
handles floating point data in WAV, AIFF, AU and a number of
other file formats correctly, regardless of whether you are
on Win32, MacOSX or Linux/Unix.

I'm sure that plenty of code has already been written to do this.
Finding it is also left as an exercise.

In case you missed it:

http://www.mega-nerd.com/libsndfile/

Released under the LGPL so you can even use it in commercial
apps. Runs on Win32, MacOS and Linux.

Erik
--
+-----------------------------------------------------------+
Erik de Castro Lopo (e-mail address removed) (Yes it's valid)
+-----------------------------------------------------------+
Learning Linux is like joining a cult. Sure it's fun at first but
you waste time, become brainwashed, and then have to be de-programmed
by Bill Gates before you can work for Him again.
- Ray Lopez, in [email protected]
 
