mp3 file magic number identification

John Joyce

Does anybody know how to identify (validate) mp3 files (other audio
files would be interesting as well) by 'magic number'?
I never trust file extensions to be correct. It's too easy for users
to accidentally munge file names in a GUI or even for malicious users
to try bad things by simply changing file names.

Any library or code is welcome!
Daniel Berger said he'd even add it to Ptools or a similar library if
it gets posted on Ruby-Talk.

I did find this online as a purported mp3 magic number (in hex of
course),
49 44 33
but I'm not even going to bother using it since I don't know
definitively that all mp3's will have it, and I don't know where to
expect it in the file.

Thanks,
John Joyce
 
Stefan Mahlitz

John said:
Does anybody know how to identify (validate) mp3 files (other audio
files would be interesting as well) by 'magic number'?
I did find this online as a purported mp3 magic number (in hex of course),
49 44 33
but I'm not even going to bother using it since I don't know
definitively that all mp3's will have it, and I don't know where to
expect it in the file.

Does this help? http://raa.ruby-lang.org/project/filemagic/

Stefan
 
Adam Shelly

Does anybody know how to identify (validate) mp3 files (other audio
files would be interesting as well) by 'magic number'?

I know for sure that:
= wave files start with the characters 'RIFF' followed by 4 bytes
(filesize-8) followed by 'WAVE'.
= Ogg Vorbis files start with 'OggS' followed by 24 bytes then 0x01
and the string 'vorbis'
= MIDI files start with 'MThd'

and according to wikipedia (and verified with one file on my system)
= MP3 files should start with 0xFF FB or 0xFF FA.
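Putting the signatures listed above into code, here is a minimal Ruby sketch. It assumes you pass in the first few dozen bytes of the file, read in binary mode; `audio_type` is a made-up helper, not part of any library, and the MP3 branch only recognizes files that begin directly with an MPEG-1 Layer 3 frame (no tag in front).

```ruby
# Hypothetical helper: guess an audio format from the first bytes of a file,
# using only the signatures listed above. Returns nil if nothing matches.
def audio_type(head)
  head = head.b # compare raw bytes, regardless of the string's encoding
  if head.byteslice(0, 4) == "RIFF" && head.byteslice(8, 4) == "WAVE"
    :wav                                   # 'RIFF', 4-byte size, 'WAVE'
  elsif head.byteslice(0, 4) == "OggS" && head.byteslice(28, 7) == "\x01vorbis".b
    :ogg_vorbis                            # 0x01 'vorbis' at offset 28
  elsif head.byteslice(0, 4) == "MThd"
    :midi
  elsif head.bytesize >= 2 && (head.unpack("n").first & 0xFFFE) == 0xFFFA
    :mp3                                   # 0xFFFA or 0xFFFB at offset 0
  end
end
```

For example: `File.open("song.mp3", "rb") { |f| audio_type(f.read(64) || "") }` (the file name is just a placeholder).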

-Adam
 
John Joyce

I know for sure that:
= wave files start with the characters 'RIFF' followed by 4 bytes
(filesize-8) followed by 'WAVE'.
= Ogg Vorbis files start with 'OggS' followed by 24 bytes then 0x01
and the string 'vorbis'
= MIDI files start with 'MThd'

and according to wikipedia (and verified with one file on my system)
= MP3 files should start with 0xFF FB or 0xFF FA.

-Adam
Thanks Adam, that's the kind of thing I'm looking for exactly.
If anyone can contribute more audio file magic numbers, please do!

I guess video/AV files should be next as well, primarily things
like .mov, .wmv, etc...
Oh, and I think we should find out if SMAF is the same as MIDI.

These kinds of validation tools can be useful to us all these days.
 
Ben Bleything

Thanks Adam, that's the kind of thing I'm looking for exactly.
If anyone can contribute more audio file magic numbers, please do!

If you're on a *nix system, you should have a "magic" file someplace
that describes the magic of every filetype that the "file" command can
understand.

If you're not, find someone who is and can send you the file :) You
might also look at the libmagic source or the filemagic source.

Ben
 
John Joyce

Thanks Adam, Ben, and others...
found the magic number file in
/usr/share/file/magic
(on OS X, but likely in the same place on any *nix; I'm guessing it's
one of those files that is often used by people more sophisticated
than myself who write C for a living).
There is a LOT of stuff in there!!
Wish I had looked in there before!
So I've written a minimal bit of Ruby like a lazy person, basically
copying D. Berger's Ptools style by simply adding my mini methods to
the File class.

I'm going to need to do some testing, though. The magic file (not always
easy to read) says this:
# MPEG 1.0 Layer 3
0 beshort&0xfffe =0xfffa \bMP3

I'm not 100% sure, but Adam said 0xFFFB or 0xFFFA, and the magic file
lists only FFFA. Or does it mean FFFE and/or FFFA?
 
Ben Bleything

I'm going to need to do some testing, though. The magic file (not always
easy to read) says this:
# MPEG 1.0 Layer 3
0 beshort&0xfffe =0xfffa \bMP3

I'm not 100% sure, but Adam said 0xFFFB or 0xFFFA, and the magic file
lists only FFFA. Or does it mean FFFE and/or FFFA?

Do "man magic" (or possibly man 5 magic, or man -s 5 magic), and it
should describe the format of the file. Basically, it's offset, type,
magic, message. Numeric types can be specified with &0xnnnn, where the
number is ANDed with the magic. I'm basically just quoting from the
manpage, though, so give it a gander.

Unix is cool :)

Ben
 
John Joyce

Do "man magic" (or possibly man 5 magic, or man -s 5 magic), and it
should describe the format of the file. Basically, it's offset, type,
magic, message. Numeric types can be specified with &0xnnnn, where the
number is ANDed with the magic. I'm basically just quoting from the
manpage, though, so give it a gander.

Unix is cool :)

Ben
Yeah, I read that already. Seemed simple. Many file descriptions are
readable, but the MP3 one is one of many that don't make sense to me.
" Numeric types can be specified with &0xnnnn, where the
number is ANDed with the magic."
Makes no sense to me at all. I'm not a C person really.

So what does the above mean??
I see hex numbers, but what is that '=' doing?
That's cryptic.
The other lines after that make sense. They all describe the second
byte and that it determines the bitrate.
So do I care about 0xfffe? Or 0xfffa?
Or both?
I'm hoping I'm doing this right.
 
Felipe Contreras

Hi,

Does anybody know how to identify (validate) mp3 files (other audio
files would be interesting as well) by 'magic number'?
I never trust file extensions to be correct. It's too easy for users
to accidentally munge file names in a GUI or even for malicious users
to try bad things by simply changing file names.

Any library or code is welcome!
Daniel Berger said he'd even add it to Ptools or a similar library if
it gets posted on Ruby-Talk.

I did find this online as a purported mp3 magic number (in hex of
course),
49 44 33
but I'm not even going to bother using it since I don't know
definitively that all mp3's will have it, and I don't know where to
expect it in the file.

http://en.wikipedia.org/wiki/MP3

This explains it all:
http://upload.wikimedia.org/wikipedia/commons/0/01/Mp3filestructure.svg

So the first byte should be 0xFF, and the second byte ANDed with 0xFE
should equal 0xFA. That is only for MPEG-1 Layer 3.
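To see why masking the second byte with 0xFE works, here is a Ruby sketch that splits that byte into its bit fields, following the frame-header layout in the Wikipedia diagram linked above (the last 3 sync bits, a 2-bit version ID, a 2-bit layer, and 1 protection bit). The method and constant names are illustrative only.

```ruby
# Hypothetical helper: split the second byte of an MPEG audio frame header
# into its bit fields (layout: 111V VLLP).
VERSIONS = { 0b00 => "MPEG-2.5", 0b01 => "reserved", 0b10 => "MPEG-2", 0b11 => "MPEG-1" }
LAYERS   = { 0b00 => "reserved", 0b01 => "Layer III", 0b10 => "Layer II", 0b11 => "Layer I" }

def describe_second_byte(byte)
  {
    sync_ok: (byte & 0xE0) == 0xE0,        # top 3 bits end the 11-bit frame sync
    version: VERSIONS[(byte >> 3) & 0b11], # bits 4-3
    layer:   LAYERS[(byte >> 1) & 0b11],   # bits 2-1
    crc:     (byte & 0x01).zero?           # bit 0 = 0 means a CRC follows the header
  }
end

[0xFB, 0xFA].each do |b|
  d = describe_second_byte(b)
  puts format("0x%02X: %s %s, CRC: %s", b, d[:version], d[:layer], d[:crc])
end

# Masking with 0xFE clears only the protection bit, so both bytes become 0xFA.
puts format("0x%02X 0x%02X", 0xFB & 0xFE, 0xFA & 0xFE)
```

So 0xFB and 0xFA both mean "MPEG-1, Layer 3"; they differ only in whether a CRC is present, which is exactly the bit the mask throws away.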

However if the MP3 has ID3v1 tags then it will start with "ID3".

Best regards.
 
Ben Bleything

Yeah, I read that already. Seemed simple. Many file descriptions are
readable, but the MP3 one is one of many that don't make sense to me.
" Numeric types can be specified with &0xnnnn, where the
Makes no sense to me at all. I'm not a C person really.

That's not a C thing, that's just general math.
So what does the above mean??
I see hex numbers, but what is that '=' doing?

Okay, from left to right:

0: that's the offset. It means the magic starts at byte 0

beshort&0xfffe: the magic is a big-endian short (2 bytes), and you should
take the value you get from the file and AND it with 0xfffe

=0xfffa: this is what you're looking for

\bMP3: this is what file will print if it matches this magic.
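To make that breakdown concrete, here is a tiny Ruby sketch that handles only this one shape of magic line (offset, `beshort&mask`, `=value`, message). Real magic(5) files support many more types and operators, and `apply_beshort_magic` is a made-up name:

```ruby
# Hypothetical mini-interpreter for ONE kind of magic(5) line:
#   "<offset> beshort&<mask> =<value> <message>"
MAGIC_LINE = /\A(\d+)\s+beshort&(0x\h+)\s+=(0x\h+)\s+(.*)\z/

def apply_beshort_magic(line, data)
  m = MAGIC_LINE.match(line.strip) or raise ArgumentError, "unsupported line: #{line}"
  offset  = m[1].to_i
  mask    = Integer(m[2])            # Integer("0xfffe") parses the hex literal
  value   = Integer(m[3])
  message = m[4].sub(/\A\\b/, "")    # leading "\b" = print without a space before it
  bytes = data.b.byteslice(offset, 2)
  return nil unless bytes && bytes.bytesize == 2
  short = bytes.unpack("n").first    # "n" = 16-bit unsigned, big-endian
  (short & mask) == value ? message : nil
end

line = '0 beshort&0xfffe =0xfffa \bMP3'
p apply_beshort_magic(line, [0xFF, 0xFB].pack("C2"))
```

A non-matching file simply yields nil, the same way file(1) moves on to the next magic entry.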
That's cryptic.

Sure, but it's all explained in the man page.
The other lines after that make sense. They all describe the second
byte and that it determines the bitrate.
So do I care about 0xfffe? Or 0xfffa?
Or both?

Yes, that's the magic.
I'm hoping I'm doing this right.

If your script is correctly identifying MP3 files you're using as a
control, then you're probably doing it just fine :)

One thing to be careful of is that there are multiple definitions of
what an MP3 looks like (at least, there are in my magic file). For
instance, MP3s with an ID3v2 tag will start with "ID3" instead of the
magic described above.
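A sketch of handling that case in Ruby: if the data starts with an ID3v2 header, work out the tag length from the header and look for the frame sync right after it. Per the ID3v2 spec, bytes 6-9 hold a "syncsafe" size (7 bits per byte) that excludes the 10-byte header, plus a 10-byte footer when flag bit 0x10 is set. The method names are hypothetical, and real files can have padding or junk before the first frame, which this does not search for.

```ruby
# Sketch: return the offset just past an ID3v2 tag (0 if there is none).
# Assumes `data` holds the start of the file, read in binary mode.
def id3v2_end(data)
  data = data.b
  return 0 unless data.bytesize >= 10 && data.byteslice(0, 3) == "ID3"
  flags = data.getbyte(5)
  # Syncsafe size: 4 bytes, only the low 7 bits of each one count.
  size = data.byteslice(6, 4).bytes.inject(0) { |acc, byte| (acc << 7) | (byte & 0x7F) }
  footer = (flags & 0x10).zero? ? 0 : 10
  10 + size + footer
end

# Sketch: true if an MPEG-1 Layer 3 frame sync follows the (optional) tag.
def mp3_after_id3?(data)
  data = data.b
  offset = id3v2_end(data)
  pair = data.byteslice(offset, 2)
  return false unless pair && pair.bytesize == 2
  (pair.unpack("n").first & 0xFFFE) == 0xFFFA
end
```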

Make sure you search through your whole magic file for any given type
before you commit to writing code for it. You might find exceptions or
easier cases.

Cheers,
Ben
 
Ben Bleything

So the first byte should be 0xFF, and the second byte ANDed with 0xFE
should equal 0xFA. That is only for MPEG-1 Layer 3.

However if the MP3 has ID3v1 tags then it will start with "ID3".

Actually, ID3v1 tags go at the end of the file, ID3v2 tags go at the
beginning (usually; they're supported in both locations).
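Since an ID3v1 tag sits at the end, a quick way to spot one is to look at the last 128 bytes, which begin with "TAG" when the tag is present. A minimal Ruby sketch, using the fixed ID3v1 field offsets; `read_id3v1` is hypothetical, and the comment and genre fields are ignored:

```ruby
# Sketch: parse an ID3v1 tag, which (when present) occupies the last 128
# bytes of the file and starts with "TAG". Only a few fields are extracted.
ID3V1_FIELDS = [
  [:title, 3, 30], [:artist, 33, 30], [:album, 63, 30], [:year, 93, 4]
]

def read_id3v1(data)
  data = data.b                      # work on raw bytes
  return nil if data.bytesize < 128
  tag = data.byteslice(-128, 128)
  return nil unless tag.byteslice(0, 3) == "TAG"
  ID3V1_FIELDS.each_with_object({}) do |(name, offset, length), fields|
    # Fixed-width fields, padded with NULs or spaces; strip the padding.
    fields[name] = tag.byteslice(offset, length).sub(/[\x00 ]+\z/, "")
  end
end
```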

Ben
 
John Joyce

Actually, ID3v1 tags go at the end of the file, ID3v2 tags go at the
beginning (usually; they're supported in both locations).

Ben

That's one point I was definitely concerned about. Some sites
describe one or the other, but don't always carefully distinguish
which ID3 version they mean.

Well, my script seems to work. For my current purposes it should be
enough, but I'm still a little fuzzy on what it means to AND the
bytes FFFE and FFFA.

What kind of AND?

I'm not only trying to have a working script, I want to know what I'm
doing here so next time I don't have to ask
(this is the first time I've delved into binary file structures, so
bear with me here.)
I'm learning a lot from this. Thanks!
 
Ben Bleything

That's one point I was definitely concerned about. Some sites
describe one or the other, but don't always carefully distinguish
which ID3 version they mean.

Yeah. MP3s are complex :)
Well, my script seems to work. For my current purposes it should be
enough, but I'm still a little fuzzy on what it means to AND the
bytes FFFE and FFFA.

So you take the first 2 bytes of the file and AND them with FFFE. If
the result is FFFA, then you've got an mp3 file (albeit one with no
tag).
What kind of AND?

Bitwise. Boolean AND doesn't make any sense in this context.

Say the first two bytes are 0xFFFB:

magic = 0xFFFB
check = 0xFFFE

if (magic & check) == 0xFFFA
  puts "you've got an mp3"
end
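For anyone else puzzled by the AND, here is a small Ruby sketch that prints the operands and the result bit by bit. 0xFFFD is just an arbitrary counter-example chosen to show a value that does not match:

```ruby
# Print the bitwise AND one bit at a time. 0xFFFE has every bit set except
# the lowest one, so "x & 0xFFFE" simply clears bit 0 of x.
CHECK = 0xFFFE

[0xFFFB, 0xFFFA, 0xFFFD].each do |magic|
  result = magic & CHECK
  puts format("  %016b  magic  0x%04X", magic, magic)
  puts format("& %016b  check  0x%04X", CHECK, CHECK)
  puts format("= %016b  result 0x%04X  %s", result, result,
              result == 0xFFFA ? "(matches 0xFFFA)" : "(no match)")
  puts
end
```

Each output bit is 1 only where both inputs have a 1, which is why 0xFFFB and 0xFFFA collapse to the same value.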
I'm not only trying to have a working script, I want to know what I'm
doing here so next time I don't have to ask
(this is the first time I've delved into binary file structures, so
bear with me here.)
I'm learning a lot from this. Thanks!

No problem. This stuff is pretty trivial in the grand scheme of things,
but can definitely be confusing if you've never worked with binary
before.

Ben
 
darren kirby

quoth the John Joyce:
Thanks Adam, that's the kind of thing I'm looking for exactly.
If anyone can contribute more audio file magic numbers, please do!

The first 4 bytes of a FLAC file must be 0x66, 0x4C, 0x61, and 0x43,
i.e. 'fLaC'.
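A slightly stricter FLAC check, sketched in Ruby. It relies on the FLAC format rule that the first metadata block after 'fLaC' must be STREAMINFO (block type 0, 34 bytes long); `flac?` is a hypothetical helper name:

```ruby
# Sketch: 'fLaC' marker followed by a STREAMINFO metadata block header.
def flac?(head)
  head = head.b
  return false unless head.bytesize >= 8 && head.byteslice(0, 4) == "fLaC"
  block_type = head.getbyte(4) & 0x7F   # high bit is the "last block" flag
  length = ("\0".b + head.byteslice(5, 3)).unpack("N").first # 24-bit big-endian
  block_type == 0 && length == 34       # STREAMINFO is always 34 bytes
end

# Usage (placeholder file name):
#   flac?(File.open("example.flac", "rb") { |f| f.read(8) || "" })
```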
I guess video/AV files should be next as well, primarily things
like .mov, .wmv, etc...

WMA/WMV files are a bit trickier. You can do:

-----------------------------
# ASF stores the first three GUID fields little-endian and the last
# two as raw bytes, so the bytes are reordered into the usual
# textual GUID form.
def byte_string_to_guid(byte_string)
  # unpack gives Integers; String#[] returns a 1-char String on Ruby 1.9+
  b = byte_string.unpack("C16")
  format("%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-" \
         "%02X%02X%02X%02X%02X%02X",
         b[3], b[2], b[1], b[0], b[5], b[4], b[7], b[6],
         b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15])
end

ASF_HEADER_GUID = '75B22630-668E-11CF-A6D9-00AA0062CE6C'

header = File.open("example.wma", "rb") { |fh| fh.read(16) }
if header && header.bytesize == 16 &&
   byte_string_to_guid(header) == ASF_HEADER_GUID
  puts "Valid wma/wmv file"
else
  puts "Not a wma/wmv"
end
--------------------------

This will work for anything in an ASF wrapper.
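If you'd rather skip the string formatting, here is a sketch of the same ASF check that compares raw bytes directly. The byte string is just the header GUID above written out in its on-disk order (first three fields little-endian); `asf?` is a hypothetical name:

```ruby
# Sketch: ASF Header Object GUID 75B22630-668E-11CF-A6D9-00AA0062CE6C
# as it appears in the file.
ASF_HEADER_BYTES = ["3026B2758E66CF11A6D900AA0062CE6C"].pack("H*")

def asf?(head)
  head.b.byteslice(0, 16) == ASF_HEADER_BYTES
end

# Usage (placeholder file name):
#   asf?(File.open("example.wma", "rb") { |f| f.read(16) || "" })
```

Comparing bytes avoids the reordering step entirely; formatting a GUID string is mainly useful if you want to print or log it.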

Your best bet to find this info for other files is to find and read the
respective specs. These should be easy to track down using Wikipedia's audio
and video codec categories. Usually there is a direct link to the spec, or at
least the official site for the codec.

HTH

-d
 