How to merge .wav files

Discussion in 'Perl Misc' started by Jarson, Sep 30, 2004.

  1. Jarson

    Jarson Guest

    I'm building a web-based message alert system in Perl (CGI) using voice TTS.
    Each web client will get a custom voice message that will actually consist
    of selected .wav files merged together to appear as one. My problem is
    that I don't know how to handle .wav files to merge them properly under
    Perl. Alternatively, if there is a way for a CGI program to send a stream
    of multiple separate .wav files, that would work too. Is there?

    Thanks, Jarson

    jarson can be found at sygration. That's a dot com company.
     
    Jarson, Sep 30, 2004
    #1

  2. Jarson

    Jarson Guest


    >"Fred Toewe" <> wrote in message
    >news:v4%6d.1748$...

    [snip]
    >
    > Have a look at http://www.xav.com/perl/site/lib/Win32/Sound.html
    > It might get you close enuff to where you can code it.
    >

    Nothing really applicable in that library. It is for playing sound in
    Windows systems. I don't wish to actually play any sound on my Unix server;
    the sound will be served to the clients. There are also some
    Audio::SoundFile libraries on CPAN for doing the same on unix, but I would
    hope that a simple merge would not require such a complex library.
     
    Jarson, Oct 1, 2004
    #2

  3. Also sprach Jarson:

    > I'm building a web-based message alert system in Perl (CGI) using voice TTS.
    > Each web client will get a custom voice message that will actually consist
    > of selected .wav files merged together to appear as one. My problem is
    > that I don't know how to handle .wav files to merge them properly under
    > Perl.


    Merging two .wav files is relatively easy. All you have to do is go
    sample-wise through both of them in parallel, add the two samples (for
    16-bit files a sample is just a signed integer) and write the new value
    to another file. You can do the reading with Audio::Wav::Read::read()
    and writing with Audio::Wav::Write::write().

    Some things to watch for: You have to truncate values when they would go
    beyond the maximum or minimum range of the sample size. For 16 bits the
    range is -2**15 .. 2**15 - 1. Otherwise they wrap around. Then the two .wav
    files should have the same format. If file one is stereo and the second
    one mono you always read two samples of the first file and one of the
    second and add the second value to the first two values. When they
    differ in sample size you have to convert the samples of the file with
    the smaller sample size accordingly (an 8-bit sample size means that you
    have to distribute the values, which .wav stores unsigned as 0 .. 255,
    to values in the range of (-2**15 .. 2**15 - 1)). Most of the time this
    distribution happens evenly. A different sample frequency means that you
    skip certain samples in the file with the higher frequency.
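
    A minimal sketch of the add-and-truncate approach described above, for
    samples that are already in memory as plain Perl integers. The helper
    names (clip16, widen8, mix16) are made up for this illustration and do
    not come from any module; reading and writing the files is left out.

```perl
use strict;
use warnings;

# Clamp a mixed value to the signed 16-bit range.
sub clip16 {
    my ($v) = @_;
    return  32767 if $v >  32767;
    return -32768 if $v < -32768;
    return $v;
}

# Convert an 8-bit WAV sample (stored unsigned, 0..255) to signed 16-bit.
sub widen8 {
    my ($s) = @_;
    return ($s - 128) * 256;
}

# Mix two sample lists by addition; the shorter one is padded with silence.
sub mix16 {
    my ($x, $y) = @_;                  # array refs of signed 16-bit samples
    my $n = @$x > @$y ? @$x : @$y;
    return [ map { clip16(($x->[$_] // 0) + ($y->[$_] // 0)) } 0 .. $n - 1 ];
}

my $mixed = mix16([1000, 30000, -30000], [500, 10000, -10000, 42]);
print "@$mixed\n";    # 1500 32767 -32768 42
```

    The second and third values show the truncation: 40000 and -40000 do
    not fit into 16 bits and are pinned to the ends of the range.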

    Tassilo
    --
    $_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
    pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
    $_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
     
    Tassilo v. Parseval, Oct 1, 2004
    #3
  4. Tassilo v. Parseval wrote:

    > Some things to watch for: You have to truncate values when they would go
    > beyond the maximum or minimum range


    Ouch, just reading this makes my ears hurt! Truncation results in sound
    waves that are squared off at the top and/or bottom. It's commonly known
    as "clipping", and it sounds horrible.

    Don't truncate when you're doing the addition. Do the math with 32, 64,
    or even 96-bit ints internally to allow for plenty of headroom, and
    normalize the output to the desired bit width only on output.

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
     
    Sherm Pendley, Oct 1, 2004
    #4
  5. Anno Siegel

    Anno Siegel Guest

    Sherm Pendley <> wrote in comp.lang.perl.misc:
    > Tassilo v. Parseval wrote:
    >
    > > Some things to watch for: You have to truncate values when they would go
    > > beyond the maximum or minimum range

    >
    > Ouch, just reading this makes my ears hurt! Truncation results in sound
    > waves that are squared off at the top and/or bottom. It's commonly known
    > as "clipping", and it sounds horrible.
    >
    > Don't truncate when you're doing the addition. Do the math with 32, 64,
    > or even 96-bit ints internally to allow for plenty of headroom, and
    > normalize the output to the desired bit width only on output.


    I'm no audio buff, but using more than 32 bits to calculate 16 bit
    quantities sounds excessive.

    Alternatively to truncation or normalization, calculating a (possibly
    weighted) average looks plausible too.

    Anno
     
    Anno Siegel, Oct 1, 2004
    #5
  6. kevin

    kevin Guest

    "Jarson" <> wrote in message news:<xuY6d.20544$>...
    > I'm building a web-based message alert system in Perl (CGI) using voice TTS.
    > Each web client will get a custom voice message that will actually consist
    > of selected .wav files merged together to appear as one. My problem is
    > that I don't know how to handle .wav files to merge them properly under
    > Perl. Alternatively, if there is a way for a CGI program to send a stream
    > of multiple separate .wav files, that would work too. Is there?
    >



    Jarson,
    below is a perl script i wrote to add silence on the front of a wav file,
    together with the notes i have on the wav file header. Unfortunately,
    i can't remember where i got the wav file header docs from :(

    the script ran on linux.
    you should be able to merge wavs in a similar way.

    HTH,
    kevin

    #!/usr/bin/perl -w
    #
    # Prepend silence to a wav file. Assumes the plain 44-byte PCM header
    # from the notes below (16-byte 'fmt ' chunk followed by 'data').
    #
    use strict;
    $|=1;

    my $file=shift or usage();
    my $offset=shift or usage();
    my $new=shift or usage();

    my $bytes=getBytes($file,$offset);
    print "adding $bytes bytes\n";
    writeWav($file,$bytes,$new);
    print "done\n";
    exit;

    sub usage{
        (my $prog=$0)=~ s{.*/}{};
        print <<EOH;
    Usage: $prog infile.wav offset outfile.wav
    Add 'offset' frames (1/24 second each) of silence to the start of infile.wav
    EOH
        exit;
    }

    # number of bytes of silence for $offset frames of 1/24 second
    sub getBytes{
        my ($file,$offset)=@_;
        my $buffer;

        open my $wav,'<',$file or die "cannot open $file: $!\n";
        binmode $wav;
        read $wav,$buffer,4;
        die "invalid wav file\n" unless $buffer eq 'RIFF';
        read $wav,$buffer,8;    # RIFF length + "WAVE"
        read $wav,$buffer,4;
        die "invalid wav file\n" unless $buffer eq 'fmt ';
        read $wav,$buffer,12;   # fmt length, format tag, channels, sample rate
        read $wav,$buffer,4;    # bytes/sec
        my $bytesPerSec=unpack("V",$buffer);
        read $wav,$buffer,2;    # bytes per sample frame (block align)
        my $align=unpack("v",$buffer)||1;
        close $wav;

        # round down to whole sample frames so the data stays aligned
        my $bytes=int($offset*$bytesPerSec/24);
        return $bytes-$bytes%$align;
    }

    sub writeWav{
        my ($file,$bytes,$new)=@_;
        my $buffer;

        open my $wav,'<',$file or die "cannot open $file: $!\n";
        open my $out,'>',$new or die "cannot open $new: $!\n";
        binmode $wav;
        binmode $out;

        #copy RIFF header
        read $wav,$buffer,4;
        print $out $buffer;
        read $wav,$buffer,4; #length of everything after this field
        print $out pack("V",$bytes+unpack("V",$buffer));
        read $wav,$buffer,4;
        print $out $buffer;

        #copy FORMAT chunk
        read $wav,$buffer,24;
        print $out $buffer;

        #copy DATA chunk adding in the extra silence
        read $wav,$buffer,4;
        print $out $buffer;
        read $wav,$buffer,4; #length of the sample data in bytes
        print $out pack("V",$bytes+unpack("V",$buffer));

        #silence (zero bytes; 8-bit wav is unsigned, so its silence is 0x80)
        print $out "\0" x $bytes;

        #sound
        print $out $buffer while read $wav,$buffer,2048;

        close $out or die "cannot write $new: $!\n";
        close $wav;
    }


    #
    # wav file
    # unpack 4 bytes in V, two bytes in v
    #
    #RIFF
    # 4 "RIFF"
    # 4 length of package (binary, little-endian)
    # 4 "WAVE"
    #
    #FORMAT
    # 4 "fmt "
    # 4 length
    # 2 format tag (1 = PCM)
    # 2 channels
    # 4 sample rate
    # 4 bytes/sec
    # 2 bytes/sample frame (block align)
    # 2 bits/sample
    #
    #DATA
    # 4 "data"
    # 4 length of data
    # * data
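
    The layout in the notes above maps directly onto an unpack template.
    A rough sketch, assuming a plain PCM file whose 'fmt ' chunk is 16
    bytes long and is followed immediately by the 'data' chunk; the sub
    name and the hash keys are invented for this example, and files with
    extra chunks would need a real chunk walker instead.

```perl
use strict;
use warnings;

# Decode the 44-byte PCM header: 4-byte tags with a4, 32-bit
# little-endian fields with V, 16-bit little-endian fields with v.
sub parse_wav_header {
    my ($hdr) = @_;
    die "header too short\n" unless length($hdr) >= 44;
    my %h;
    @h{qw(riff riff_len wave fmt fmt_len format channels rate
          bytes_per_sec block_align bits data data_len)}
        = unpack 'a4 V a4 a4 V v v V V v v a4 V', $hdr;
    die "not a plain PCM wav header\n"
        unless $h{riff} eq 'RIFF' && $h{wave} eq 'WAVE'
            && $h{fmt} eq 'fmt ' && $h{fmt_len} == 16
            && $h{format} == 1 && $h{data} eq 'data';
    return \%h;
}

# Build a header for 8000 bytes of 16-bit mono audio at 8 kHz and read it back.
my $hdr = pack 'a4 V a4 a4 V v v V V v v a4 V',
    'RIFF', 36 + 8000, 'WAVE', 'fmt ', 16, 1, 1, 8000, 16000, 2, 16,
    'data', 8000;
my $h = parse_wav_header($hdr);
print "$h->{channels} channel(s), $h->{rate} Hz, $h->{bits} bits, ",
      "$h->{data_len} data bytes\n";
# 1 channel(s), 8000 Hz, 16 bits, 8000 data bytes
```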
     
    kevin, Oct 1, 2004
    #6
  7. Anno Siegel wrote:

    > I'm no audio buff, but using more than 32 bits to calculate 16 bit
    > quantities sounds excessive.


    For a single operation involving only two 16-bit tracks, yes. But
    high-end apps - stuff like ProTools, Logic, etc. - use 32-bit tracks
    internally, and support a ridiculous number of them.

    1023 32-bit tracks need 42 bits of range to mix them all without the
    risk of truncation - it's far more convenient to round that up and use
    64-bit long longs.
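
    The arithmetic behind that figure: summing N samples of b bits each
    needs b + ceil(log2(N)) bits to rule out overflow. A small sketch
    (headroom_bits is a made-up name for this example):

```perl
use strict;
use warnings;

# Bits needed to hold the sum of $tracks samples of $bits bits each.
sub headroom_bits {
    my ($tracks, $bits) = @_;
    my $extra = 0;
    $extra++ while (1 << $extra) < $tracks;   # ceil(log2($tracks))
    return $bits + $extra;
}

print headroom_bits(1023, 32), "\n";   # 42
print headroom_bits(2, 16), "\n";      # 17
```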

    I'll freely admit though, that this is getting *very* far afield of the
    original question. :)

    > Alternatively to truncation or normalization, calculating a (possibly
    > weighted) average looks plausible too.


    Nope. Mixing sound means addition. If you average them, that makes the
    quiet samples louder and the loud ones quieter. The effect is the most
    pronounced where you least want it to be, where one track is very loud
    and the other is very quiet; the mixed result has the two sounds much
    closer together in volume than they should be.

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
     
    Sherm Pendley, Oct 1, 2004
    #7
  8. Also sprach Sherm Pendley:

    > Tassilo v. Parseval wrote:
    >
    >> Some things to watch for: You have to truncate values when they would go
    >> beyond the maximum or minimum range

    >
    > Ouch, just reading this makes my ears hurt! Truncation results in sound
    > waves that are squared off at the top and/or bottom. It's commonly known
    > as "clipping", and it sounds horrible.


    Clipping is the most basic way of doing this, indeed. But very often the
    result isn't as bad as it may appear because many recordings leave quite
    a bit of headroom below the maximum peak (at least the ones I dealt with
    in the past always did). Files that are well compressed, though, suffer
    from clipping more audibly and a less simplistic approach is needed.

    > Don't truncate when you're doing the addition. Do the math with 32, 64,
    > or even 96-bit ints internally to allow for plenty of headroom, and
    > normalize the output to the desired bit width only on output.


    Gee, 96 bits? How many streams do you usually mix together? :)

    Tassilo
    --
    $_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
    pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
    $_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
     
    Tassilo v. Parseval, Oct 1, 2004
    #8
  9. Anno Siegel

    Anno Siegel Guest

    Sherm Pendley <> wrote in comp.lang.perl.misc:
    > Anno Siegel wrote:


    [mixing wav files]

    > > Alternatively to truncation or normalization, calculating a (possibly
    > > weighted) average looks plausible too.

    >
    > Nope. Mixing sound means addition. If you average them, that makes the
    > quiet samples louder and the loud ones quieter.


    We're way off topic, but...

    Unbiased averaging *is* addition, after applying a factor of 1/2 to each
    summand. Weighted averaging is also addition, after applying individual
    factors (whose sum is 1) to each summand. The only difference is that
    the sum is immediately scaled so that it never exceeds the maximum of the
    inputs. I don't see your point.

    > The effect is the most
    > pronounced where you least want it to be, where one track is very loud
    > and the other is very quiet; the mixed result has the two sounds much
    > closer together in volume than they should be.


    How so?

    Anno
     
    Anno Siegel, Oct 1, 2004
    #9
  10. Anno Siegel wrote:

    > Unbiased averaging *is* addition, after applying a factor of 1/2 to each
    > summand. Weighted averaging is also addition, after applying individual
    > factors (whose sum is 1) to each summand. The only difference is that
    > the sum is immediately scaled so that it never exceeds the maximum of the
    > inputs. I don't see your point.


    My point is that you don't know if 0.5 is the best scaling factor. It's
    the safest, in that it guarantees a zero chance of clipping. But it can
    also reduce the dynamic range needlessly.

    For instance, assume that the highest total of two samples is 34k - to
    reduce this to the 32k required to fit into 16 bits is a scaling factor
    of about 0.94. Reducing all the samples by a factor of 0.5 would then
    leave the loudest point at a mere 17k, effectively reducing the total
    dynamic range by nearly half.

    For the best audio definition, you want to scale the final result so
    that the highest peak just barely fits in the range of the output
    format. You can't determine what that scaling factor will be, until
    you've actually added all of the samples to determine what the value of
    that highest peak is.
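
    As a rough sketch of that two-pass idea: add everything at full
    precision first, find the peak, then scale once. It assumes two
    equal-length lists of signed 16-bit samples already in memory;
    mix_normalized is a name invented for this example, and the sample
    values are chosen so that the peak is 34k (34 * 1024), as in the
    example above.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Add two tracks without clipping, then scale so the peak just fits 16 bits.
sub mix_normalized {
    my ($x, $y) = @_;                    # array refs, same length assumed
    my @sum  = map { $x->[$_] + $y->[$_] } 0 .. $#$x;
    my $peak = max(map { abs($_) } @sum) || 0;
    my $gain = $peak > 32767 ? 32767 / $peak : 1;   # never amplify
    # round to the nearest integer, away from zero on ties
    my @out  = map { int($_ * $gain + ($_ < 0 ? -0.5 : 0.5)) } @sum;
    return ($gain, \@out);
}

my ($gain, $out) = mix_normalized([20816, -5000, 100], [14000, -3000, 50]);
printf "gain %.3f\n", $gain;   # gain 0.941
print "@$out\n";
```

    Whether to also scale up quiet mixes is a separate decision; this
    sketch only ever scales down.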

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
     
    Sherm Pendley, Oct 1, 2004
    #10
  11. Tassilo v. Parseval wrote:

    > Gee, 96 bits? How many streams do you usually mix together? :)


    I've seen pro apps advertising a 96-bit internal data path. Whether it
    was actually useful, really that wide, or just marketroid nonsense, is
    certainly open for debate.

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
     
    Sherm Pendley, Oct 1, 2004
    #11
  12. Jarson

    Jarson Guest

    Wooooo! Stop! I am very sorry for using the word "merge" when I should have
    said "join" or "concatenate".
    I don't want to have the two files overlapping each other, I simply want one
    joined after the other.

    Example:
    file1.wav says: "Hello. The blah blah blah system has detected an alert
    in your area."
    file2.wav says: "The Ohio thing-a-ma-gig is operating at 50% capacity."

    fileJoin.wav says: "Hello. The blah blah blah system has detected an
    alert in your area. The Ohio thing-a-ma-gig is operating at 50% capacity."

    A simple UNIX cat does not work, as there appears to be header information
    in each wav file that prevents the joined wav file from working properly.
    How should I do a join in Perl?

    Jarson


    "Anno Siegel" <-berlin.de> wrote in message
    news:cjj7gq$pcd$-Berlin.DE...
    > Sherm Pendley <> wrote in comp.lang.perl.misc:
    >> Tassilo v. Parseval wrote:
    >>
    >> > Some things to watch for: You have to truncate values when they would
    >> > go
    >> > beyond the maximum or minimum range

    >>
    >> Ouch, just reading this makes my ears hurt! Truncation results in sound
    >> waves that are squared off at the top and/or bottom. It's commonly known
    >> as "clipping", and it sounds horrible.
    >>
    >> Don't truncate when you're doing the addition. Do the math with 32, 64,
    >> or even 96-bit ints internally to allow for plenty of headroom, and
    >> normalize the output to the desired bit width only on output.

    >
    > I'm no audio buff, but using more than 32 bits to calculate 16 bit
    > quantities sounds excessive.
    >
    > Alternatively to truncation or normalization, calculating a (possibly
    > weighted) average looks plausible too.
    >
    > Anno
     
    Jarson, Oct 1, 2004
    #12
  13. Jarson

    Jarson Guest

    "kevin" <> wrote in message
    news:...
    > Jarson,
    > below is a perl script i wrote to add silence on the front of a wav file,
    > together with the notes i have on the wav file header. Unfortunately,
    > i can't remember where i got the wav file header docs from :(
    >
    > the script ran on linux.
    > you should be able to merge wavs in a similar way.
    >
    > HTH,
    > kevin
    >

    [snip]

    Ahhh, yes! Your code looks very useful. Since I will be keeping the format
    of the wave files the same (channels, sample rate, bytes/sec, etc.) I should
    be able to join the data sections and update the length so it corresponds to
    the joined length.
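
    A sketch of that plan, under the same assumption as kevin's script:
    every input starts with the plain 44-byte PCM header, and all inputs
    share one format. The sub name join_wavs is invented for this example.
    It builds the joined file in memory, which should be fine for short
    voice prompts but not for long recordings.

```perl
use strict;
use warnings;

# Join wav files of identical format; returns the joined file as a string.
sub join_wavs {
    my @files = @_;
    my ($header, $data) = ('', '');
    for my $file (@files) {
        open my $in, '<', $file or die "cannot open $file: $!\n";
        binmode $in;
        my $bytes = do { local $/; <$in> };     # slurp the whole file
        close $in;
        $bytes = '' unless defined $bytes;
        die "$file: not a plain 44-byte-header wav file\n"
            unless length($bytes) >= 44
                && substr($bytes, 0, 4)  eq 'RIFF'
                && substr($bytes, 8, 8)  eq 'WAVEfmt '
                && substr($bytes, 36, 4) eq 'data';
        $header = substr($bytes, 0, 44) if $header eq '';
        # bytes 12..35 are the format chunk; it must match the first file
        die "$file: format differs from $files[0]\n"
            unless substr($bytes, 12, 24) eq substr($header, 12, 24);
        my $len = unpack 'V', substr($bytes, 40, 4);
        $data .= substr($bytes, 44, $len);      # sample data only
    }
    die "no input files\n" if $header eq '';
    # patch both length fields of the first header
    substr($header, 4, 4)  = pack 'V', 36 + length $data;
    substr($header, 40, 4) = pack 'V', length $data;
    return $header . $data;
}
```

    In a CGI program the result could then be sent with something like
    print "Content-Type: audio/x-wav\r\n\r\n"; binmode STDOUT;
    print join_wavs('file1.wav', 'file2.wav'); where the file names are
    placeholders.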

    Thanks Kevin.
    Jarson
     
    Jarson, Oct 1, 2004
    #13
  14. Also sprach Sherm Pendley:

    > Tassilo v. Parseval wrote:
    >
    >> Gee, 96 bits? How many streams do you usually mix together? :)

    >
    > I've seen pro apps advertising a 96-bit internal data path. Whether it
    > was actually useful, really that wide, or just marketroid nonsense, is
    > certainly open for debate.


    Ah, but that might be something very different. Some digital signal
    processing happens with floating point numbers, like when audio data is
    sent through a reverb processor or equalizer which requires a prior
    Fourier transform. And sometimes it can mean a speed-up (especially on
    modern processors) if you use floating point values instead of
    integers. In the end you always need integer values but in between it
    can be beneficial to work with very wide floats because you are less
    prone to losing quality due to the limited precision.

    Tassilo
    --
    $_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
    pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
    $_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
     
    Tassilo v. Parseval, Oct 1, 2004
    #14
  15. [ Please don't top-post ]

    Also sprach Jarson:

    > Wooooo! Stop! I am very sorry for using the word "merge" when I should have
    > said "join" or "concatenate".
    > I don't want to have the two files overlapping each other, I simply want one
    > joined after the other.
    >
    > Example:
    > file1.wav says: "Hello. The blah blah blah system has detected an alert
    > in your area."
    > file2.wav says: "The Ohio thing-a-ma-gig is operating at 50% capacity."
    >
    > fileJoin.wav says: "Hello. The blah blah blah system has detected an
    > alert in your area. The Ohio thing-a-ma-gig is operating at 50% capacity."
    >
    > A simple UNIX cat does not work as there appears to be header information in
    > the first wav file that prevents the joined wav file from working properly.
    > How should I do a join in perl.


    Yes, the first 44 bytes of a wav-file are the header (at least for plain
    PCM files without extra chunks). If you want to append one wave-file to
    another, you first have to strip off the first 44 bytes of the stream to
    be appended. After that, you measure the size of the stream in bytes
    (its file size minus 44, for obvious reasons). Then you append the
    stream to the file. The last thing you have to do is update two fields
    in the wave-header of the first stream. Those two fields denote the size
    in bytes of the file (not counting its first 8 bytes) and of the stream:

    #!/usr/bin/perl -w

    use strict;
    use Fcntl qw/:seek/;

    my ($file1, $file2) = @ARGV;

    open my $wav1, "+<", $file1 or die "$file1: $!";
    open my $wav2, "<",  $file2 or die "$file2: $!";

    binmode $wav1;
    binmode $wav2;

    # the length of the second stream without header; parenthesised so
    # that 44 is subtracted from the size and not from the -s operand
    my $size = (-s $wav2) - 44;

    $/ = \4; # four bytes on each <>

    seek $wav1, 4, SEEK_SET;
    my $filesize = unpack "V", <$wav1>; # RIFF sizes are little-endian
    seek $wav1, 32, SEEK_CUR;           # now at offset 40, the data length
    my $streamsize = unpack "V", <$wav1>;

    seek $wav1, 4, SEEK_SET;
    print $wav1 pack "V", $filesize + $size;
    seek $wav1, 32, SEEK_CUR;
    print $wav1 pack "V", $streamsize + $size;

    seek $wav1, 0, SEEK_END;

    $/ = \4096; # increase block-size a bit

    # skip header of second file
    seek $wav2, 44, SEEK_SET;

    # append
    print $wav1 $_ while <$wav2>;

    close $wav1 or die "$file1: $!";
    close $wav2;

    The above is totally untested so byte-offsets and such might be a bit
    off. If you get hold of a description of a wave-header you should be
    able to understand what the above does.

    Needless to say, you can only concatenate two wave-files when they have
    the same format. Otherwise, you have to convert the one you append
    first.

    Tassilo
    --
    $_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
    pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
    $_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
     
    Tassilo v. Parseval, Oct 1, 2004
    #15
  16. Jarson

    Jarson Guest

    "Tassilo v. Parseval" <> wrote in message
    news:...
    [snip]


    Fantastic! I modified Kevin C's code snippet to effectively do as you
    described and the joined wave file works fine. Thanks for the insight.
    Now I'm riding the wave!
    Jarson
     
    Jarson, Oct 1, 2004
    #16