[ANN] Metadata 0.5

Discussion in 'Ruby' started by Ilmari Heikkinen, Sep 16, 2007.

  1. Konrad Meyer <> wrote:
    > Any chance you could wrap this up as a gem?


    I already have a gemspec file, but gem screws up bin/chardet by
    plastering it with #!/usr/bin/ruby boilerplate (it's a python file).

    And I don't know how to turn it off.
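One possible workaround, sketched below under assumptions: as far as I know RubyGems only wraps or re-shebangs the files named in `executables`, so the Python script could ship in `files` without being listed there. The file names in this gemspec are guesses for illustration, not the actual layout of the tarball.

```ruby
require "rubygems"

# Hypothetical gemspec sketch; file names are assumptions.
spec = Gem::Specification.new do |s|
  s.name        = "metadata"
  s.version     = "0.5"
  s.summary     = "Probe files for their metadata"
  s.authors     = ["Ilmari Heikkinen"]
  # Ship both scripts inside the gem ...
  s.files       = ["lib/metadata.rb", "bin/mdh", "bin/chardet"]
  # ... but list only the Ruby program as an executable, so the Python
  # file's own shebang line is left alone.
  s.bindir      = "bin"
  s.executables = ["mdh"]
end
```

With this layout chardet would not end up on the PATH, so the library would have to call it by its path inside the installed gem.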


    > Another bug (Sorry :D):
    > $ mdh -p ~/music/Limp\ Bizkit\ -\ Rollin\'\ \(edited\).ogg
    > sh: -c: line 0: syntax error near unexpected token `('
    > sh: -c: line 0: `ogginfo '/home/konrad/music/Limp Bizkit - Rollin\'
    > (edited).ogg''
    >
    > (Last line was broken up to email length.) You're already escaping single
    > quotes for the shell, need to escape start-parens and end-parens as well.


    Argh, amateurish mistake on my part, thanks for catching that. Fixed.
    If in a bit over-engineered way (creating a safely named link to the file.)
    Probably impossible to safely pass a filename like "-f -i -l -e -z"
    to a shell command that doesn't support "--" in any other way, though.
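A minimal sketch of one way around both problems, assuming a Ruby newer than 1.8 (the array form of `IO.popen` is not available there). `run_tool` is a made-up helper name, not part of the library: it skips the shell entirely, so quotes and parentheses need no escaping, and it prefixes a leading dash with "./" so the name cannot be read as an option.

```ruby
# Hypothetical helper: run an external tool on a file without /bin/sh.
def run_tool(tool, path)
  # "./-f -i -l -e -z" names the same relative file as "-f -i -l -e -z",
  # but no longer looks like a list of flags to the tool.
  path = "./#{path}" if path.start_with?("-")
  # Array form: every element becomes exactly one argv entry, no shell parsing.
  IO.popen([tool, path]) { |io| io.read }
end

puts run_tool("echo", "Limp Bizkit - Rollin' (edited).ogg")
```

The "./" trick only applies to relative names; an absolute path never starts with a dash in the first place.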


    > Also:
    > For mp3 id3v2 tags, the binary string "\xCB\x99\xC5\xA3" is being inserted
    > at the front of all the string fields.
    >
    > [snip]
    >
    > I *think* this is an id3v2 thing. Also, it happens in more than one file and
    > amaroK sees the tags "correctly", so I'm thinking it's on the metadata's
    > end. Thanks!


    Right you are. Fixed. No idea what was causing it. Moved to
    using id3lib for the tags (it extracts embedded album art as well!) and
    mplayer for the rest of the metadata.


    Here we go, 0.5:

    tarball: http://dark.fhtr.org/repos/metadata/metadata-0.5.tar.gz
    git: http://dark.fhtr.org/repos/metadata


    Description
    -----------

    This package `Metadata' comes with a library called `metadata' and
    a small program called `mdh'.

    The library probes files for their metadata (e.g. jpeg dimensions
    and camera make, mp3 artist, pdf word count) and returns the metadata
    as a Hash.

    Mdh can print out file metadata as YAML and package the metadata
    with the file.

    This package has many dependencies since there is no single universal
    metadata header format that all files use. Blame resource forks, filename
    extensions, bags of bytes and mimetypes.


    Usage
    -----

    # print out metadata header
    mdh -p myfile.jpg

    # create myfile.jpg.mdh, which consists of metadata header + myfile.jpg
    mdh myfile.jpg

    # print out metadata header from mdh file
    mdh -e -p myfile.jpg.mdh

    # strip out metadata header from mdh file and save it to myfile.jpg
    mdh -e myfile.jpg.mdh

    # print out list of flags
    mdh -h

    irb> Metadata.extract('myfile.jpg')
    irb> Metadata.extract_text('myfile.jpg')
    irb> Pathname.new("myfile.jpg").metadata


    List of supported formats
    -------------------------

    Audio:
      Whatever you manage to make mplayer play.
      Plus FLAC, m4a and wma handled specially.
      Successfully tested with:
        mp3, flac, ogg, wav
      Should also work:
        wma, m4a

    Video:
      Whatever you manage to make mplayer play.
      Successfully tested with:
        wmv, mov, divx, xvid, flv, ogm, mpg

    Images:
      Should handle pretty much anything (apart from XCF and ORF.)
      Successfully tested with:
        jpeg, png, gif, nef, dng, crw, pef, psd

    Documents:
      Successfully tested with:
        pdf, ppt, odp, sxi, ps, ps.gz, html, txt
      Should work:
        - OpenOffice docs work to some degree (personally, I'm using unoconv to
          convert OO docs to temp PDFs for the text & dimensions extraction, so
          those bits of data are missing.)
        - MS Office docs to some degree (ppt at least, doc and xls should work too,
          dimensions missing due to the above temp PDF -thing.)

    Others:
      Whatever extract spits out on the five or six bits of metadata I'm using
      from it. Archive contents at least.

    Requirements
    ------------

    * Ruby 1.8

    * Tons of metadata extraction programs and libs,
      list of gems:
        flacinfo-rb
        wmainfo-rb
        MP4info
        id3lib-ruby
      list of debian packages:
        dcraw
        libimlib2-ruby
        extract
        libimage-exiftool-perl
        poppler-utils
        mplayer
        html2text
        imagemagick
        unhtml
        pstotext
        antiword
        catdoc
        shared-mime-info

    * You do want to install the latest versions of dcraw and
    shared-mime-info to be able to handle camera raw images.
    http://cybercom.net/~dcoffin/dcraw/
    http://freedesktop.org/wiki/Software/shared-mime-info

    * Python + chardet library
    http://chardet.feedparser.org/

    Install
    -------

    De-compress archive and enter its top directory.
    Then type:

    ($ su)
    # ruby setup.rb

    This simple step installs this program under the default
    location of Ruby libraries. You can also install files into
    your favorite directory by supplying setup.rb some options.
    Try "ruby setup.rb --help".


    License
    -------

    Ruby's

    --
    Ilmari Heikkinen <ilmari.heikkinen gmail com>
    http://fhtr.blogspot.com
     
    Ilmari Heikkinen, Sep 16, 2007
    #1

  2. Ilmari Heikkinen

    Konrad Meyer Guest



    Thanks, trying it out now. (I'm basically running it on every file
    in my collection and running back to you when I get errors. :D)

    --
    Konrad Meyer <> http://konrad.sobertillnoon.com/

     
    Konrad Meyer, Sep 16, 2007
    #2

  3. Ilmari Heikkinen

    Konrad Meyer Guest



    Another bug, here we go:

    undefined method `audio_x_vorbis_ogg' for Metadata:Module
    undefined method `audio_x_vorbis_ogg' for Metadata:Module
    /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:365:in `video_x_theora_ogg'
    /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:142:in `__send__'
    /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:142:in `extract'

    The code that seems to be failing is:

    def video_x_theora_ogg(filename, charset)
      h = video(filename, charset)
      wma = audio_x_vorbis_ogg(filename, charset)
      %w(
        Artist Title Album Genre ReleaseDate TrackNo VariableBitrate
      ).each{|t|
        h['Video.'+t] = wma['Audio.'+t]
      }
      h
    end

    This makes sense, as audio_x_vorbis_ogg() doesn't exist anywhere else. :D
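The backtrace suggests that `extract` picks a handler by name through `__send__`. A rough sketch of how such a dispatch could fall back gracefully when a specific handler is missing follows. `ExtractorSketch`, the `handle_` prefix and the returned keys are all invented for illustration and are not the library's code.

```ruby
# Hypothetical stand-in for a MIME-type based extractor module.
module ExtractorSketch
  extend self

  def handle_audio(filename)
    { "Audio.Source" => filename }
  end

  def handle_video(filename)
    { "Video.Source" => filename }
  end

  # "video/x-theora+ogg" -> "video_x_theora_ogg"
  def handler_name(mimetype)
    mimetype.downcase.gsub(/[^a-z0-9]+/, "_")
  end

  # Try the specific handler first, then the major type ("video"),
  # and return an empty Hash when neither is defined.
  def extract(filename, mimetype)
    candidates = [handler_name(mimetype), mimetype.split("/").first]
    handler = candidates.map { |n| "handle_#{n}" }.find { |m| respond_to?(m) }
    handler ? __send__(handler, filename) : {}
  end
end

p ExtractorSketch.extract("clip.ogg", "video/x-theora+ogg")
```

Checking `respond_to?` before sending turns a NoMethodError into a fallback to the generic handler.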

    --
    Konrad Meyer <> http://konrad.sobertillnoon.com/

     
    Konrad Meyer, Sep 16, 2007
    #3
  4. Ilmari Heikkinen, Sep 16, 2007
    #4
  5. Ilmari Heikkinen

    Konrad Meyer Guest


    Quoth Ilmari Heikkinen:
    > On 9/16/07, Konrad Meyer <> wrote:
    >
    > > Another bug, here we go:
    >
    > > undefined method `audio_x_vorbis_ogg' for Metadata:Module
    >
    > > This makes sense, as audio_x_vorbis_ogg() doesn't exist anywhere else. :D
    >
    > Fixed. And 0.6 :)
    > http://dark.fhtr.org/repos/metadata/metadata-0.6.tar.gz


    Ooh, here's another: :)

    /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:735:in `unlink': No such file or directory - _tmp_metadata_temp_22720__604265598_1189946590.27022 (Errno::ENOENT)
            from /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:735:in `secure_filename'
            from /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:590:in `extract_extract_info'
            from /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:153:in `extract'

    Not sure what that is, and frankly atm my brain is a bit too weak to think
    about it. But you should be fresh and able to solve that.

    --
    Konrad Meyer <> http://konrad.sobertillnoon.com/

     
    Konrad Meyer, Sep 16, 2007
    #5
  6. On 9/16/07, Konrad Meyer <> wrote:
    > Quoth Ilmari Heikkinen:
    > > On 9/16/07, Konrad Meyer <> wrote:
    > >
    > > > Another bug, here we go:

    > >
    > > > undefined method `audio_x_vorbis_ogg' for Metadata:Module

    > >
    > > > This makes sense, as audio_x_vorbis_ogg() doesn't exist anywhere else. :D

    > >
    > > Fixed. And 0.6 :)
    > > http://dark.fhtr.org/repos/metadata/metadata-0.6.tar.gz

    >
    > Ooh, here's another: :)
    >
    > /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:735:in `unlink': No such file
    > or directory - _tmp_metadata_temp_22720__604265598_1189946590.27022
    > (Errno::ENOENT)
    > from /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:735:in
    > `secure_filename'
    > from /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:590:in
    > `extract_extract_info'
    > from /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:153:in `extract'
    >
    > Not sure what that is, and frankly atm my brain is a bit too weak to think
    > about it. But you should be fresh and able to solve that.


    Apparently temporary hardlinks weren't such a hot idea after all. Nuts.

    Ok, now escaping filename by default, only trying to "ln rescue cp" for
    filenames starting with a dash. Running it against my downloads-dir
    presently, been working ok thus far. YMMV of course :)

    http://dark.fhtr.org/repos/metadata/metadata-0.7.tar.gz
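For reference, a sketch of what a "ln rescue cp" helper might look like when the temporary name lives in its own scratch directory, so cleanup never has to unlink a file that was not created. It assumes a current Ruby with `Dir.mktmpdir`; `with_safe_name` is a made-up name and not the library's `secure_filename`.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: yields a harmlessly named alias of `path`,
# then removes the scratch directory again.
def with_safe_name(path)
  # Keep only a plain alphanumeric extension, drop anything stranger.
  ext = File.extname(path)[/\A\.[A-Za-z0-9]+\z/].to_s
  Dir.mktmpdir("metadata") do |dir|
    safe = File.join(dir, "input#{ext}")
    begin
      File.link(path, safe)          # hard link: cheap, same filesystem only
    rescue SystemCallError, NotImplementedError
      FileUtils.cp(path, safe)       # otherwise fall back to a copy
    end
    yield safe
  end
end
```

The block form of `Dir.mktmpdir` deletes the directory and its contents when the block ends, whether the link or the copy was used.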
     
    Ilmari Heikkinen, Sep 16, 2007
    #6
  7. Ilmari Heikkinen

    Konrad Meyer Guest



    The title tag isn't being parsed out of oggs:

    $ mdh -p music/korn_-_clown.ogg
    Video.TrackNo: 16
    Video.Artist: Korn
    Video.Genre: Hard Rock
    Video.Album: Greatest Hits Vol. 1

    vs mplayer:

    Ogg file format detected.
    Clip info:
    Genre: Hard Rock
    Name: Clown
    Artist: Korn
    Album: Greatest Hits Vol. 1
    Track: 16

    Cheers,
    --
    Konrad Meyer <> http://konrad.sobertillnoon.com/

     
    Konrad Meyer, Sep 16, 2007
    #7
  8. On 9/17/07, Konrad Meyer <> wrote:
    > Quoth Ilmari Heikkinen:
    > > On 9/16/07, Konrad Meyer <> wrote:
    > > > Quoth Ilmari Heikkinen:
    > > > > On 9/16/07, Konrad Meyer <> wrote:
    > > > >
    > > > > > Another bug, here we go:
    > > > >
    > > > > > undefined method `audio_x_vorbis_ogg' for Metadata:Module
    > > > >
    > > > > > This makes sense, as audio_x_vorbis_ogg() doesn't exist anywhere else. :D
    > > > >
    > > > > Fixed. And 0.6 :)
    > > > > http://dark.fhtr.org/repos/metadata/metadata-0.6.tar.gz
    > > >
    > > > Ooh, here's another: :)
    > > >
    > > > /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:735:in `unlink': No such file
    > > > or directory - _tmp_metadata_temp_22720__604265598_1189946590.27022
    > > > (Errno::ENOENT)
    > > > from /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:735:in
    > > > `secure_filename'
    > > > from /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:590:in
    > > > `extract_extract_info'
    > > > from /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:153:in `extract'
    > > >
    > > > Not sure what that is, and frankly atm my brain is a bit too weak to think
    > > > about it. But you should be fresh and able to solve that.

    > >
    > > Apparently temporary hardlinks weren't such a hot idea after all. Nuts.
    > >
    > > Ok, now escaping filename by default, only trying to "ln rescue cp" for
    > > filenames starting with a dash. Running it against my downloads-dir
    > > presently, been working ok thus far. YMMV of course :)
    > >
    > > http://dark.fhtr.org/repos/metadata/metadata-0.7.tar.gz

    >
    > The title tag isn't being parsed out of oggs:
    >
    > $ mdh -p music/korn_-_clown.ogg
    > Video.TrackNo: 16
    > Video.Artist: Korn
    > Video.Genre: Hard Rock
    > Video.Album: Greatest Hits Vol. 1
    >
    > vs mplayer:
    >
    > Ogg file format detected.
    > Clip info:
    > Genre: Hard Rock
    > Name: Clown
    > Artist: Korn
    > Album: Greatest Hits Vol. 1
    > Track: 16
    >


    Ah, it uses Name instead of Title. Thanks!
    Added it and made 0.8.

    http://dark.fhtr.org/repos/metadata/metadata-0.8.tar.gz
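A small sketch of how such synonyms could be folded into one set of keys. The two mappings (Name to Title, Track to TrackNo) are read off the mplayer and mdh output quoted above; the table and the helper name `normalize_tags` are otherwise hypothetical, and the real list is presumably longer.

```ruby
# Assumed synonym table: mplayer clip-info name => key used in the Hash.
TAG_SYNONYMS = {
  "Name"  => "Title",
  "Track" => "TrackNo"
}

# Renames known synonyms and prefixes every key, e.g. "Name" => "Audio.Title".
def normalize_tags(clip_info, prefix = "Audio")
  clip_info.each_with_object({}) do |(name, value), tags|
    key = TAG_SYNONYMS.fetch(name, name)
    tags["#{prefix}.#{key}"] = value
  end
end

clip = { "Genre" => "Hard Rock", "Name" => "Clown", "Artist" => "Korn",
         "Album" => "Greatest Hits Vol. 1", "Track" => "16" }
p normalize_tags(clip)
```

Unknown names pass through unchanged, so a new synonym only needs one more line in the table.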

    Now I wonder what other synonyms mplayer uses...
    I'd really appreciate it if you could run the following over
    your media library and tell what field names it spews out:

    find $MEDIA_LIBRARY_DIR -type f | \
    mplayer -identify -ao null -vo null -frames 0 -playlist - | \
    grep ID_CLIP_INFO_NAME | sed 's/^.*=//' | sort | uniq

    (replace $MEDIA_LIBRARY_DIR with the directory name)

    Thanks again,
    --
    Ilmari Heikkinen
    http://fhtr.blogspot.com
     
    Ilmari Heikkinen, Sep 17, 2007
    #8
  9. Ilmari Heikkinen

    Konrad Meyer Guest



    I'd love to run that but mplayer dies rather early on on some of my files.
    Also, seems like we have another bug (not sure what kind of file it's on,
    sorry):

    undefined method `empty?' for 40:Fixnum
    undefined method `empty?' for 40:Fixnum
    /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:765:in `enc_utf8'
    /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:352:in `video'
    /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:142:in `__send__'
    /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:142:in `extract'

    I'd guess one of the libraries you're using for parsing is giving back 40
    as a genre or track number (a bit high, but might be tagged wrong) and it
    needs to be converted to a string before you can use it.

    Thanks!
    --
    Konrad Meyer <> http://konrad.sobertillnoon.com/

     
    Konrad Meyer, Sep 17, 2007
    #9
  10. On 9/17/07, Konrad Meyer <> wrote:
    > Quoth Ilmari Heikkinen:
    > > find $MEDIA_LIBRARY_DIR -type f | \
    > > mplayer -identify -ao null -vo null -frames 0 -playlist - | \
    > > grep ID_CLIP_INFO_NAME | sed 's/^.*=//' | sort | uniq
    > >
    > > (replace $MEDIA_LIBRARY_DIR with the directory name)
    > >


    > I'd love to run that but mplayer dies rather early on on some of my files.


    Hmm, here's a ruby version that should work through those:

    media_dir = "music"
    seen_names = {}  # tag names that have already been printed
    mpc = "mplayer -identify -ao null -vo null -frames 0 -playlist - 2>/dev/null"
    Dir["#{media_dir}/**/*"].each{|fn|
      if File.file?(fn)
        # one mplayer process per file, so a crash only loses that file
        IO.popen(mpc, "r+"){|mp|
          begin
            mp.puts fn
            mp.close_write
            tags = mp.read.strip.split("\n").grep(/^ID_CLIP_INFO_NAME/)
            names = tags.map{|t| t.split("=", 2)[1] }
            names.each{|n|
              seen_names[n] ||= (puts n; true)  # print each new name once
            }
          rescue
            # ignore files that mplayer chokes on
          end
        }
      end
    }

    > I'd guess one of the libraries you're using for parsing is giving back 40
    > as a genre or track number (a bit high, but might be tagged wrong) and it
    > needs to be converted to a string before you can use it.


    Good catch, thanks. Fixed.
    http://dark.fhtr.org/repos/metadata/metadata-0.9.tar.gz
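The kind of guard that avoids this class of error can be sketched as below: coerce whatever a tag library returns to a String before asking whether it is empty. `present_text` is a made-up helper name, and this is only an illustration, not the actual change made in 0.9.

```ruby
# Hypothetical helper: returns a stripped String, or nil when there is
# nothing worth storing. Integers such as 40 become "40".
def present_text(value)
  text = value.to_s.strip
  text.empty? ? nil : text
end

p present_text(40)
p present_text("  Korn ")
p present_text(nil)
```

Calling `to_s` first means `empty?` is only ever sent to a String, whatever type the parser hands back.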


    >
    > Thanks!
    > --
    > Konrad Meyer <> http://konrad.sobertillnoon.com/
    >
    >
     
    Ilmari Heikkinen, Sep 17, 2007
    #10
  11. Ilmari Heikkinen

    Konrad Meyer Guest



    Just FYI -- I've been running your script since about 6.5 hours ago, I'll
    reply again when it's actually done. So far no problems.

    HTH,
    --
    Konrad Meyer <> http://konrad.sobertillnoon.com/

     
    Konrad Meyer, Sep 18, 2007
    #11
  12. On 9/18/07, Konrad Meyer <> wrote:
    > Just FYI -- I've been running your script since about 6.5 hours ago, I'll
    > reply again when it's actually done. So far no problems.


    Big library, eh :)
    If it's still running, I doubt it's going to do much more...

    What has it printed out?

    Thanks!
    --
    Ilmari Heikkinen
    http://fhtr.blogspot.com
     
    Ilmari Heikkinen, Sep 18, 2007
    #12
  13. Ilmari Heikkinen

    Konrad Meyer Guest

    Quoth Ilmari Heikkinen:
    > On 9/18/07, Konrad Meyer <> wrote:
    > > Just FYI -- I've been running your script since about 6.5 hours ago, I'll
    > > reply again when it's actually done. So far no problems.
    >
    > Big library, eh :)
    > If it's still running, I doubt it's going to do much more...
    >
    > What has it printed out?
    >
    > Thanks!
    > --
    > Ilmari Heikkinen
    > http://fhtr.blogspot.com


    Well, I'm having it run over all of them, then print the output. But
    sometimes mplayer just starts eating 100% cpu and doesn't exit, so the
    script gets stuck on the one song.
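
One way around a hang like this is to put a time limit on each mplayer run and kill the child when it is exceeded. The following is only a sketch, not part of the script quoted above: the helper name `read_with_timeout` and the 30-second limit are made up for illustration. The command is passed as an array so that no shell sits between Ruby and the child, which means `io.pid` is the pid of the process that actually needs killing.

```ruby
require "timeout"

# Same mplayer options as in the quoted script, split into an argv array.
MPLAYER = %w[mplayer -identify -ao null -vo null -frames 0 -playlist -]

# Hypothetical helper: feed one line to the command's stdin and return its
# stdout, or nil if it has not finished within `limit` seconds.
def read_with_timeout(argv, input, limit)
  # err: File::NULL discards stderr, like the 2>/dev/null in the original.
  IO.popen(argv, "r+", err: File::NULL) do |io|
    begin
      Timeout.timeout(limit) do
        io.puts input
        io.close_write
        io.read
      end
    rescue Timeout::Error
      # The child is stuck; kill it so the popen block can reap it and return.
      Process.kill("KILL", io.pid)
      nil
    end
  end
end
```

In the loop over files this would be called as `out = read_with_timeout(MPLAYER, fn, 30)`, skipping the file when `out` is nil. How long to wait before giving up on a file is a guess and would need tuning for a given collection.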

    --
    Konrad Meyer <> http://konrad.sobertillnoon.com/

     
    Konrad Meyer, Sep 18, 2007
    #13
  14. Ilmari Heikkinen

    Konrad Meyer Guest

    Quoth Ilmari Heikkinen:
    > On 9/18/07, Konrad Meyer <> wrote:
    > > Just FYI -- I've been running your script since about 6.5 hours ago, I'll
    > > reply again when it's actually done. So far no problems.
    >
    > Big library, eh :)
    > If it's still running, I doubt it's going to do much more...
    >
    > What has it printed out?
    >
    > Thanks!
    > --
    > Ilmari Heikkinen
    > http://fhtr.blogspot.com


    As a matter of fact (I only had to kill mplayer 5-7 times):
    Genre
    Name
    Artist
    Creation Date
    Album
    Track
    Title
    Year
    Comment
    name
    author
    Comments

    HTH,
    --
    Konrad Meyer <> http://konrad.sobertillnoon.com/

     
    Konrad Meyer, Sep 18, 2007
    #14
  15. Ilmari Heikkinen

    Konrad Meyer Guest

    Quoth Ilmari Heikkinen:
    > On 9/18/07, Konrad Meyer <> wrote:
    > > Just FYI -- I've been running your script since about 6.5 hours ago, I'll
    > > reply again when it's actually done. So far no problems.
    >
    > Big library, eh :)
    > If it's still running, I doubt it's going to do much more...
    >
    > What has it printed out?
    >
    > Thanks!
    > --
    > Ilmari Heikkinen
    > http://fhtr.blogspot.com


    Ok, now it successfully runs over my entire collection. Yay! Now it's time
    to fix all those tag-less songs.

    Thanks much,
    --
    Konrad Meyer <> http://konrad.sobertillnoon.com/

     
    Konrad Meyer, Sep 19, 2007
    #15
  16. On 9/18/07, Konrad Meyer <> wrote:
    > Quoth Ilmari Heikkinen:
    > > On 9/18/07, Konrad Meyer <> wrote:
    > > > Just FYI -- I've been running your script since about 6.5 hours ago, I'll
    > > > reply again when it's actually done. So far no problems.

    > >
    > > Big library, eh :)
    > > If it's still running, I doubt it's going to do much more...
    > >
    > > What has it printed out?
    > >
    > > Thanks!
    > > --
    > > Ilmari Heikkinen
    > > http://fhtr.blogspot.com

    >
    > As a matter of fact (I only had to kill mplayer 5-7 times):
    > Genre
    > Name
    > Artist
    > Creation Date
    > Album
    > Track
    > Title
    > Year
    > Comment
    > name
    > author
    > Comments
    >


    Alright, thanks a lot! I was missing 'name' and 'author', the rest I
    had already.
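
For anyone wondering how a list like the one above might be used: mplayer's -identify output reports each tag as a pair of lines, ID_CLIP_INFO_NAMEn and ID_CLIP_INFO_VALUEn, and the names vary by container ('Title' in one file, 'name' in another). Below is a rough sketch of pairing those lines up and folding the name variants into common keys. The `CANONICAL` table and the `clip_info` method are assumptions made up for this example and are not taken from the metadata library; the sample text is invented too.

```ruby
# Hypothetical mapping from tag names (lower-cased) to common keys.
# The grouping is an assumption for illustration only.
CANONICAL = {
  "name"   => :title,
  "title"  => :title,
  "author" => :artist,
  "artist" => :artist,
  "album"  => :album,
}

# Pair ID_CLIP_INFO_NAMEn with ID_CLIP_INFO_VALUEn and return a Hash.
# Names not in CANONICAL are kept, lower-cased, as symbols.
def clip_info(identify_output)
  names, values = {}, {}
  identify_output.each_line do |line|
    case line.chomp
    when /\AID_CLIP_INFO_NAME(\d+)=(.*)\z/  then names[$1]  = $2
    when /\AID_CLIP_INFO_VALUE(\d+)=(.*)\z/ then values[$1] = $2
    end
  end
  names.each_with_object({}) do |(idx, name), info|
    key = CANONICAL.fetch(name.downcase) { name.downcase.to_sym }
    # Values are forced to strings so a numeric-looking tag cannot surprise
    # later string handling.
    info[key] = values[idx].to_s
  end
end

# Invented sample output in the NAMEn/VALUEn shape described above.
SAMPLE = <<~OUT
  ID_CLIP_INFO_NAME0=name
  ID_CLIP_INFO_VALUE0=Some Song
  ID_CLIP_INFO_NAME1=author
  ID_CLIP_INFO_VALUE1=Some Band
  ID_CLIP_INFO_N=2
OUT

p clip_info(SAMPLE)
```

With a table like this, 'name' and 'author' end up under the same keys as 'Title' and 'Artist', so callers only need to look in one place.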
     
    Ilmari Heikkinen, Sep 19, 2007
    #16
  17. Ilmari Heikkinen

    Konrad Meyer Guest

    Quoth Ilmari Heikkinen:
    > On 9/18/07, Konrad Meyer <> wrote:
    > > Quoth Ilmari Heikkinen:
    > > > On 9/18/07, Konrad Meyer <> wrote:
    > > > > Just FYI -- I've been running your script since about 6.5 hours ago, I'll
    > > > > reply again when it's actually done. So far no problems.
    > > >
    > > > Big library, eh :)
    > > > If it's still running, I doubt it's going to do much more...
    > > >
    > > > What has it printed out?
    > > >
    > > > Thanks!
    > > > --
    > > > Ilmari Heikkinen
    > > > http://fhtr.blogspot.com

    > >
    > > As a matter of fact (I only had to kill mplayer 5-7 times):
    > > Genre
    > > Name
    > > Artist
    > > Creation Date
    > > Album
    > > Track
    > > Title
    > > Year
    > > Comment
    > > name
    > > author
    > > Comments
    > >

    >
    > Alright, thanks a lot! I was missing 'name' and 'author', the rest I
    > had already.


    Alright, glad to help.

    --
    Konrad Meyer <> http://konrad.sobertillnoon.com/

     
    Konrad Meyer, Sep 19, 2007
    #17
  18. Re: Metadata 0.5

    Ilmari Heikkinen wrote:
    >
    > tarball: http://dark.fhtr.org/repos/metadata/metadata-0.5.tar.gz
    > git: http://dark.fhtr.org/repos/metadata
    >


    These links don't work. Is there somewhere else I can find this project?
    Is it available as a gem?

    >
    > Description
    > -----------
    >
    > This package `Metadata' comes with a library called `metadata' and
    > a small program called `mdh'.
    >
    > The library probes files for their metadata (e.g. jpeg dimensions
    > and camera make, mp3 artist, pdf word count) and returns the metadata
    > as a Hash.
    >
    > Mdh can print out file metadata as YAML and package the metadata
    > with the file.
    >
    > This package has many dependencies since there is no single universal
    > metadata header format that all files use. Blame resource forks, filename
    > extensions, bags of bytes and mimetypes.
    >


    Thanks.
    --
    Posted via http://www.ruby-forum.com/.
     
    Dan Tenenbaum, Sep 29, 2010
    #18