How to merge .wav files

Jarson · Sep 30, 2004

I'm building a web-based message alert system in Perl (CGI) using voice TTS.
Each web client will get a custom voice message that will actually consist
of selected .wav files merged together to appear as one. My problem is
that I don't know how to handle .wav files to merge them properly under
Perl. Alternatively, if there is a way for a CGI program to send a stream
of multiple separate .wav files, that would work too. Is there?

Thanks, Jarson

jarson can be found at sygration. That's a dot com company.

Jarson · Sep 30, 2004

An earlier reply said: [snip]

Have a look at http://www.xav.com/perl/site/lib/Win32/Sound.html
It might get you close enough to where you can code it.

Nothing really applicable in that library. It is for playing sound in
Windows systems. I don't wish to actually play any sound on my Unix server;
the sound will be served to the clients. There are also some
Audio::SoundFile libraries on CPAN for doing the same on unix, but I would
hope that a simple merge would not require such a complex library.

Tassilo v. Parseval · Oct 1, 2004

Also sprach Jarson:

I'm building a web-based message alert system in Perl (CGI) using voice TTS.
Each web client will get a custom voice message that will actually consist
of selected .wav files merged together to appear as one. My problem is
that I don't know how to handle .wav files to merge them properly under
Perl.

Merging two .wav files is relatively easy. All you have to do is go
through both of them in parallel, sample by sample, add the two samples (a
sample is just a signed integer) and write the new value to another
file. You can do the reading with Audio::Wav::Read::read() and the writing
with Audio::Wav::Write::write().

Some things to watch for: You have to truncate values when they would go
beyond the maximum or minimum of the sample size. For 16 bits the
range is -2**15 .. 2**15 - 1. Otherwise they wrap around. Also, the two .wav
files should have the same format. If file one is stereo and the second
one mono, you always read two samples of the first file and one of the
second and add the second value to both of the first two. When they
differ in sample size you have to convert the samples of the file with the
smaller sample size accordingly (an 8-bit sample size means that you have to
distribute the values in the range (-128 .. 127) to values in the range
(-2**15 .. 2**15 - 1)). Most of the time this distribution happens
evenly. A different sample frequency means that you skip certain samples
in the file with the higher frequency.
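
A minimal sketch of the addition-with-truncation step described above,
working on raw 16-bit little-endian sample data rather than on whole files.
The names mix_clamped and widen_8_to_16 are made up for illustration, and
the sketch assumes Perl 5.10 or later (for the "s<" pack format and "//").
8-bit WAV data is normally stored unsigned (0 .. 255), so the widening
helper subtracts 128 before scaling.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mix two buffers of raw 16-bit little-endian PCM samples by addition,
# clamping each sum to the signed 16-bit range instead of letting it wrap.
sub mix_clamped {
    my ($pcm_x, $pcm_y) = @_;
    my @x = unpack 's<*', $pcm_x;    # signed 16-bit, little-endian
    my @y = unpack 's<*', $pcm_y;
    my $n = @x > @y ? @x : @y;       # the longer buffer decides the length
    my @out;
    for my $i (0 .. $n - 1) {
        # samples missing from the shorter buffer count as silence
        my $sum = ($x[$i] // 0) + ($y[$i] // 0);
        $sum =  32767 if $sum >  32767;
        $sum = -32768 if $sum < -32768;
        push @out, $sum;
    }
    return pack 's<*', @out;
}

# Spread an unsigned 8-bit WAV sample (0 .. 255) over the 16-bit range.
sub widen_8_to_16 {
    my ($sample) = @_;
    return ($sample - 128) * 256;
}

my $mixed = mix_clamped(pack('s<*', 1000, 30000, -30000),
                        pack('s<*', 2000, 10000, -10000));
print join(' ', unpack 's<*', $mixed), "\n";    # prints "3000 32767 -32768"
```

The second and third sums would be 40000 and -40000, so they are clamped to
the ends of the 16-bit range.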

Tassilo

Sherm Pendley · Oct 1, 2004

Tassilo said:
Some things to watch for: You have to truncate values when they would go
beyond the maximum or minimum range

Ouch, just reading this makes my ears hurt! Truncation results in sound
waves that are squared off at the top and/or bottom. It's commonly known
as "clipping", and it sounds horrible.

Don't truncate when you're doing the addition. Do the math with 32, 64,
or even 96-bit ints internally to allow for plenty of headroom, and
normalize the output to the desired bit width only on output.

sherm--

Anno Siegel · Oct 1, 2004

Sherm Pendley said:
Ouch, just reading this makes my ears hurt! Truncation results in sound
waves that are squared off at the top and/or bottom. It's commonly known
as "clipping", and it sounds horrible.

Don't truncate when you're doing the addition. Do the math with 32, 64,
or even 96-bit ints internally to allow for plenty of headroom, and
normalize the output to the desired bit width only on output.

I'm no audio buff, but using more than 32 bits to calculate 16 bit
quantities sounds excessive.

As an alternative to truncation or normalization, calculating a (possibly
weighted) average looks plausible too.

Anno

kevin · Oct 1, 2004

Jarson said:
I'm building a web-based message alert system in Perl (CGI) using voice TTS.
Each web client will get a custom voice message that will actually consist
of selected .wav files merged together to appear as one. My problem is
that I don't know how to handle .wav files to merge them properly under
Perl. Alternatively, if there is a way for a CGI program to send a stream
of multiple separate .wav files, that would work too. Is there?

Jarson,
below is a Perl script I wrote to add silence to the front of a wav file,
together with my notes on the wav file header. Unfortunately,
I can't remember where I got the wav file header docs from.

The script ran on Linux.
You should be able to merge wavs in a similar way.

HTH,
kevin

#!/usr/bin/perl -w
#
# Add silence to the start of a wav file.
# Assumes a plain PCM file with the usual 44-byte header.
#
use strict;
$|=1;

my $file=shift or usage();
my $offset=shift or usage();
my $new=shift or usage();

my $bytes=getBytes($file,$offset);
print "adding $bytes bytes\n";
writeWav($file,$bytes,$new);
print "done\n";
exit;

sub usage{
    (my $prog=$0)=~ s{.*/}{};
    print <<EOH;
Usage: $prog infile.wav offset outfile.wav
Add 'offset' frames of silence to the start of infile.wav
EOH
    exit;
}

# number of bytes of silence needed for $offset frames
sub getBytes{
    my ($file,$offset)=@_;
    my $buffer;

    open WAV,'<',$file or die "cannot open $file\n";
    binmode WAV;
    read WAV,$buffer,4;
    die "invalid wav file\n" unless $buffer eq 'RIFF';
    read WAV,$buffer,8;     #skip length and "WAVE"
    read WAV,$buffer,4;
    die "invalid wav file\n" unless $buffer eq 'fmt ';
    read WAV,$buffer,12;    #skip fmt length, format tag, channels, sample rate
    read WAV,$buffer,4;     #bytes/sec
    my $bytes=unpack("V",$buffer)/24;   #bytes per frame, at 24 frames per second
    read WAV,$buffer,2;     #bytes/sample (all channels)
    my $align=unpack("v",$buffer);
    close WAV;

    #round down to a whole number of samples
    my $total=int($offset*$bytes);
    return $total-$total%$align;
}

sub writeWav{
    my ($file,$bytes,$new)=@_;
    my $buffer;

    open WAV,'<',$file or die "cannot open $file\n";
    open NEW,'>',$new or die "cannot open $new\n";
    binmode WAV;
    binmode NEW;

    #copy RIFF header, adding the silence to the length
    read WAV,$buffer,4;
    print NEW $buffer;
    read WAV,$buffer,4; #length
    print NEW pack("V",$bytes+unpack("V",$buffer));
    read WAV,$buffer,4;
    print NEW $buffer;

    #copy FORMAT chunk
    read WAV,$buffer,24;
    print NEW $buffer;

    #copy DATA chunk adding in the extra silence
    read WAV,$buffer,4;
    print NEW $buffer;
    read WAV,$buffer,4; #length of data, in bytes
    print NEW pack("V",$bytes+unpack("V",$buffer));

    #silence (zero bytes; 8-bit wavs would need 0x80 instead)
    print NEW "\0" x $bytes;

    #sound
    print NEW $buffer while read WAV,$buffer,2048;

    close NEW;
    close WAV;
}

#
# wav file
# unpack 4 bytes with V, 2 bytes with v (both little-endian)
#
# RIFF
#   4  "RIFF"
#   4  length of package (file size minus 8; binary, little-endian)
#   4  "WAVE"
#
# FORMAT
#   4  "fmt "
#   4  length (16 for PCM)
#   2  format tag (1 = PCM)
#   2  channels
#   4  sample rate
#   4  bytes/sec
#   2  bytes/sample (all channels together)
#   2  bits/sample
#
# DATA
#   4  "data"
#   4  length of data (in bytes)
#   *  data
Sherm Pendley · Oct 1, 2004

Anno said:
I'm no audio buff, but using more than 32 bits to calculate 16 bit
quantities sounds excessive.

For a single operation involving only two 16-bit tracks, yes. But
high-end apps - stuff like ProTools, Logic, etc. - use 32-bit tracks
internally, and support a ridiculous number of them.

1023 32-bit tracks need 42 bits of range to mix them all without the
risk of truncation - it's far more convenient to round that up and use
64-bit long longs.
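
The 42-bit figure can be checked with a few lines of Perl. This is just the
arithmetic, under the assumption that every track can hit full scale at the
same moment; bits_needed is a made-up helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bits needed to add $tracks samples of $bits bits each without overflow:
# the sample width plus ceil(log2($tracks)) bits of headroom.
sub bits_needed {
    my ($tracks, $bits) = @_;
    my $extra = 0;
    $extra++ while (1 << $extra) < $tracks;
    return $bits + $extra;
}

printf "%d tracks of %d bits need %d bits\n", $_->[0], $_->[1], bits_needed(@$_)
    for [2, 16], [1023, 32];
```

This prints 17 bits for two 16-bit tracks and 42 bits for 1023 32-bit
tracks.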

I'll freely admit though, that this is getting *very* far afield of the
original question.

As an alternative to truncation or normalization, calculating a (possibly
weighted) average looks plausible too.

Nope. Mixing sound means addition. If you average them, that makes the
quiet samples louder and the loud ones quieter. The effect is the most
pronounced where you least want it to be, where one track is very loud
and the other is very quiet; the mixed result has the two sounds much
closer together in volume than they should be.

sherm--

Tassilo v. Parseval · Oct 1, 2004

Also sprach Sherm Pendley:

Ouch, just reading this makes my ears hurt! Truncation results in sound
waves that are squared off at the top and/or bottom. It's commonly known
as "clipping", and it sounds horrible.

Clipping is the most basic way of doing this, indeed. But very often the
result isn't as bad as it may appear because many recordings have quite a
bit of headroom below the maximum peak (at least the ones I dealt with in
the past always did). Files that are well compressed, though, suffer from
clipping more audibly and a less simplistic approach is needed.

Don't truncate when you're doing the addition. Do the math with 32, 64,
or even 96-bit ints internally to allow for plenty of headroom, and
normalize the output to the desired bit width only on output.

Gee, 96 bits? How many streams do you usually mix together?

Tassilo

Anno Siegel · Oct 1, 2004

Sherm Pendley said:
Anno Siegel wrote:

[mixing wav files]

Nope. Mixing sound means addition. If you average them, that makes the
quiet samples louder and the loud ones quieter.

We're way off topic, but...

Unbiased averaging *is* addition, after applying a factor of 1/2 to each
summand. Weighted averaging is also addition, after applying individual
factors (whose sum is 1) to each summand. The only difference is that
the sum is immediately scaled so that it never exceeds the maximum of the
inputs. I don't see your point.

The effect is the most
pronounced where you least want it to be, where one track is very loud
and the other is very quiet; the mixed result has the two sounds much
closer together in volume than they should be.

How so?

Anno

Sherm Pendley · Oct 1, 2004

Anno said:
Unbiased averaging *is* addition, after applying a factor of 1/2 to each
summand. Weighted averaging is also addition, after applying individual
factors (whose sum is 1) to each summand. The only difference is that
the sum is immediately scaled so that it never exceeds the maximum of the
inputs. I don't see your point.

My point is that you don't know if 0.5 is the best scaling factor. It's
the safest, in that it guarantees a zero chance of clipping. But it can
also reduce the dynamic range needlessly.

For instance, assume that the highest total of two samples is 34k - to
reduce this to the 32k required to fit into 16 bits is a scaling factor
of about 0.94. Reducing all the samples by a factor of 0.5 would then
leave the loudest point at a mere 17k, effectively reducing the total
dynamic range by nearly half.

For the best audio definition, you want to scale the final result so
that the highest peak just barely fits in the range of the output
format. You can't determine what that scaling factor will be, until
you've actually added all of the samples to determine what the value of
that highest peak is.
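
A rough sketch of that two-pass idea, assuming 16-bit output and plain
lists of sample values as input. The name mix_normalized is made up for
illustration, and in this sketch the mix is only ever scaled down, never
amplified.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Pass 1: add the samples at full precision and find the highest peak.
# Pass 2: scale everything so that the peak just fits into signed 16 bits.
sub mix_normalized {
    my ($x, $y) = @_;
    my $n    = max(scalar @$x, scalar @$y);
    my @sum  = map { ($x->[$_] // 0) + ($y->[$_] // 0) } 0 .. $n - 1;
    my $peak = max(map { abs } @sum) // 0;
    my $scale = $peak > 32767 ? 32767 / $peak : 1;
    # round to the nearest integer rather than truncating toward zero
    my @out = map { my $v = $_ * $scale; int($v + ($v < 0 ? -0.5 : 0.5)) } @sum;
    return ($scale, @out);
}

# The highest sum here is 34816 ("34k"), so the factor comes out near 0.94.
my ($scale, @out) = mix_normalized([20816, -5000, 100], [14000, -3000, 50]);
printf "scale %.2f, samples %s\n", $scale, join ' ', @out;
```

Because the factor depends on the highest peak of the whole mix, nothing
can be written out until all the samples have been added once.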

sherm--

Sherm Pendley · Oct 1, 2004

Tassilo said:
Gee, 96 bits? How many streams do you usually mix together?

I've seen pro apps advertising a 96-bit internal data path. Whether it
was actually useful, really that wide, or just marketroid nonsense, is
certainly open for debate.

sherm--

Jarson · Oct 1, 2004

Wooooo! Stop! I am very sorry for using the word "merge" when I should have
said "join" or "concatenate".
I don't want to have the two files overlapping each other, I simply want one
joined after the other.

Example:
file1.wav says: "Hello. The blah blah blah system has detected an alert
in your area."
file2.wav says: "The Ohio thing-a-ma-gig is operating at 50% capacity."

fileJoin.wav says: "Hello. The blah blah blah system has detected an
alert in your area. The Ohio thing-a-ma-gig is operating at 50% capacity."

A simple UNIX cat does not work as there appears to be header information in
the first wav file that prevents the joined wav file from working properly.
How should I do a join in Perl?

Jarson

Jarson · Oct 1, 2004

kevin said:
Jarson,
below is a perl script i wrote to add silence on the front of a wav file,
together with the notes i have on the wav file header. Unfortunately,
i can't remember where i got the wav file header docs from

the script ran on linux.
you should be able to merge wavs in a similar way.

HTH,
kevin

[snip]

Ahhh, yes! Your code looks very useful. Since I will be keeping the format
of the wave files the same (channels, sample rate, bytes/sec, etc.) I should
be able to join the data sections and update the length so it corresponds to
the joined length.

Thanks Kevin.
Jarson

Tassilo v. Parseval · Oct 1, 2004

Also sprach Sherm Pendley:

I've seen pro apps advertising a 96-bit internal data path. Whether it
was actually useful, really that wide, or just marketroid nonsense, is
certainly open for debate.

Ah, but that might be something very different. Some digital signal
processing happens with floating point numbers, like when audio data is
sent through a reverb processor or equalizer, which requires a prior
Fourier synthesis. And sometimes it can mean a speed-up (especially on
modern processors) if you use floating point values instead of
integers. In the end you always need integer values, but in between it
can be beneficial to work with very wide floats because you are less
prone to losing quality due to the limited precision.

Tassilo

Tassilo v. Parseval · Oct 1, 2004

[ Please don't top-post ]

Also sprach Jarson:

Wooooo! Stop! I am very sorry for using the word "merge" when I should have
said "join" or "concatenate".
I don't want to have the two files overlapping each other, I simply want one
joined after the other.

Example:
file1.wav says: "Hello. The blah blah blah system has detected an alert
in your area."
file2.wav says: "The Ohio thing-a-ma-gig is operating at 50% capacity."

fileJoin.wav says: "Hello. The blah blah blah system has detected an
alert in your area. The Ohio thing-a-ma-gig is operating at 50% capacity."

A simple UNIX cat does not work as there appears to be header information in
the first wav file that prevents the joined wav file from working properly.
How should I do a join in Perl?

Yes, in a plain PCM wav-file the first 44 bytes are the header. If you want
to append one wave-file to another, you first have to strip off the first
44 bytes of the stream to be appended. After that, you measure the size
of the stream in bytes (it's the file size minus 44 for obvious reasons).
Then you append the stream to the file. The last thing you have to do is
update two fields in the wave-header of the first stream. Those two
fields denote the size in bytes of the file and the stream:

#!/usr/bin/perl -w

use Fcntl qw/:seek/;

my ($file1, $file2) = @ARGV;

# file1 is opened read/write and modified in place
open WAV1, "+<", $file1 or die $!;
open WAV2, "<", $file2 or die $!;

binmode WAV1;
binmode WAV2;

# the length of the second stream without header
# (parenthesized so the subtraction cannot become part of -s's operand)
my $size = (-s WAV2) - 44;

$/ = \4; # four bytes on each <>

# read the RIFF length (offset 4) and the data length (offset 40)
seek WAV1, 4, SEEK_SET;
my $filesize = unpack "V", <WAV1>; # it's little-endian AFAIK
seek WAV1, 32, SEEK_CUR;
my $streamsize = unpack "V", <WAV1>;

# write both fields back, enlarged by the appended stream
seek WAV1, 4, SEEK_SET;
print WAV1 pack "V", $filesize + $size;
seek WAV1, 32, SEEK_CUR;
print WAV1 pack "V", $streamsize + $size;

seek WAV1, 0, SEEK_END;

$/ = \4096; # increase block-size a bit

# skip header of second file
seek WAV2, 44, SEEK_SET;

# append
print WAV1 $_ while <WAV2>;

close WAV1;
close WAV2;

The above is totally untested so byte-offsets and such might be a bit
off. If you get hold of a description of a wave-header you should be
able to understand what the above does.

Needless to say, you can only concatenate two wave-files when they have
the same format. Otherwise, you have to convert the one you append
first.
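
For the case where the first file should stay untouched, here is a sketch
that joins any number of files into a new one and refuses to continue when
the formats differ. It is an illustration only: join_wavs is a made-up
name, every input is assumed to have the plain 44-byte PCM header with
nothing after the data chunk, and all sample data is held in memory, which
should be fine for short voice prompts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Join plain PCM wav files of identical format into a new file.
# Returns the number of data bytes written.
sub join_wavs {
    my ($out_name, @in_names) = @_;
    my ($format, $data) = (undef, '');
    for my $name (@in_names) {
        open my $in, '<', $name or die "cannot open $name: $!\n";
        binmode $in;
        my $got = read $in, my $header, 44;
        die "$name: short header\n" unless defined $got && $got == 44;
        my ($riff, $wave, $fmt, $tag, $len)
            = unpack 'a4 x4 a4 a4 x20 a4 V', $header;
        die "$name: not a plain PCM wav file\n"
            unless "$riff$wave$fmt$tag" eq 'RIFFWAVEfmt data';
        # bytes 20 .. 35 hold format tag, channels, rates and sample size
        my $this_format = substr $header, 20, 16;
        $format //= $this_format;
        die "$name: format differs from $in_names[0]\n"
            if $this_format ne $format;
        my $chunk = '';
        read $in, $chunk, $len;
        die "$name: data chunk is truncated\n" unless length($chunk) == $len;
        $data .= $chunk;
        close $in;
    }
    die "no input files\n" unless defined $format;

    open my $out, '>', $out_name or die "cannot open $out_name: $!\n";
    binmode $out;
    print {$out}
        pack('a4 V a4 a4 V', 'RIFF', 36 + length($data), 'WAVE', 'fmt ', 16),
        $format,
        pack('a4 V', 'data', length($data)),
        $data;
    close $out or die "cannot write $out_name: $!\n";
    return length($data);
}

# e.g.  perl join.pl fileJoin.wav file1.wav file2.wav
join_wavs(@ARGV) if @ARGV >= 2;
```

In a CGI program the same header and data could presumably be printed to
STDOUT after an audio/x-wav content type instead of being written to a
file, but that part is not covered by this sketch.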

Tassilo

Jarson · Oct 1, 2004

Tassilo v. Parseval said:
[snip]

Fantastic! I modified Kevin C's code snippet to effectively do as you
described and the joined wave file works fine. Thanks for the insight.
Now I'm riding the wave!
Jarson
