Comparing directory contents

Discussion in 'Ruby' started by dave davidson, Aug 3, 2005.

  1. Hi all,

    I work in the SCM dept of a Windows software shop. A typical software build
    involves us getting the code from an engineer, compiling the binaries, gathering
    any support files, and then wrapping it in an installer (Installshield). We run
    the installer to make sure everything looks ok. As a quick-and-dirty sanity check
    to make sure we got everything, we go into the install folder, do a 'dir /s',
    and pipe the output to a text file. If the file list in the current build
    matches the file list of the previous, we give it the ok. These lists are saved
    on disk, printed and filed with the build paperwork so we can refer to them
    again if necessary.

    This method works surprisingly well for catching files that were mistakenly
    excluded, but as you can imagine it gets very tedious and error-prone since we
    have to hand-check the output. Additionally, many times we are asked by the
    engineer to include additional support files, or remove existing ones. I'm
    thinking there must be a better way, or better yet, a Ruby Way :)
    I am relatively new to the language, so I don't really know which angle to
    attack it from. The basic gist would be to read in the previous file list
    output, strip any junk (extra spaces, line breaks, etc.), and do the same for the
    current, so what's left is two lists of just pure filenames (don't care about
    timestamps or attributes right now). The script would process the lists and the
    result would be something like "Identical" or "Extra files: [filenames]" or
    "Removed files: [filenames]".

    I'm wondering if something like this already exists. A search of rubyforge and
    RAA, however, did not turn up anything this specific, although I really wasn't
    sure what I should be looking for. If I could be pointed to a base library that
    would get me going, that would be great. Any insights on implementation would
    also be greatly appreciated. Thanks!
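A rough Ruby sketch of the "strip the junk" step, assuming the saved lists are raw 'dir /s' output in the classic cmd.exe layout: a " Directory of ..." header per folder, followed by lines with a US-style date, a 12-hour time, a size and the name. Date and time formats vary by locale and Windows version, so the regular expression is an assumption to adjust, and 'parse_dir_s' is a hypothetical name. '<DIR>' entries, volume headers and the File(s) totals simply fail to match and are dropped.

```ruby
# Hypothetical helper: extract file paths from a saved `dir /s` listing.
# ASSUMPTION: classic cmd.exe layout with US-style dates, e.g.
#    Directory of C:\App
#   08/03/2005  10:15 AM            12,288 app.exe
# Adjust FILE_LINE for other locales or Windows versions.
DIR_HEADER = /^\s*Directory of (.+?)\s*$/
FILE_LINE  = /^\d{2}\/\d{2}\/\d{4}\s+\d{1,2}:\d{2}\s*[AP]M\s+[\d,]+\s+(.+?)\s*$/

def parse_dir_s(text)
  current = nil   # folder named by the most recent "Directory of" header
  paths = []
  text.each_line do |line|
    if (m = DIR_HEADER.match(line))
      current = m[1].chomp('\\')          # avoid a doubled backslash for C:\
    elsif current && (m = FILE_LINE.match(line))
      paths << "#{current}\\#{m[1]}"      # <DIR> lines have no size, so never match
    end
  end
  paths.sort
end
```

If the lists can be regenerated, 'dir /b /s' prints one bare full path per line, which makes this parsing step unnecessary.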
    dave davidson, Aug 3, 2005
    #1

  2. On 03/08/05, dave davidson <> wrote:
    <snip>


    Does this help?

    bschroed@black:~/svn/projekte/ruby-things$ ls -1 > before.list
    bschroed@black:~/svn/projekte/ruby-things$ touch another.one
    bschroed@black:~/svn/projekte/ruby-things$ ls -1 > after.list
    bschroed@black:~/svn/projekte/ruby-things$ irb
    irb(main):001:0> before = File.readlines('before.list')
    => ["before.list\n", ...]
    irb(main):002:0> after = File.readlines('after.list')
    => ["before.list\n", "after.list\n", "another.one\n", ...]
    irb(main):003:0> before - after
    => []
    irb(main):004:0> after - before
    => ["after.list\n", "another.one\n"]
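Two details in that transcript are worth handling once this moves from irb into a script: every element still carries its trailing newline, and the list files themselves show up because the shell creates 'before.list' before 'ls' runs. A small sketch, where 'read_list' and its 'ignore:' option are hypothetical names (File.readlines with 'chomp: true' needs Ruby 2.4 or later):

```ruby
# Sketch: load a saved listing as clean names, one entry per line.
# `ignore:` (hypothetical option) drops names such as the list files
# that `ls -1 > before.list` writes into its own output.
def read_list(path, ignore: [])
  names = File.readlines(path, chomp: true).map(&:strip)  # no "\n", no stray spaces
  names.reject { |name| name.empty? || ignore.include?(name) }
end
```

With both lists loaded this way, 'after - before' and 'before - after' give the same answers as in irb, minus the newlines and the list files.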

    regards,

    Brian

    --
    http://ruby.brian-schroeder.de/

    Stringed instrument chords: http://chordlist.brian-schroeder.de/
    Brian Schröder, Aug 3, 2005
    #2

  3. Jacob Fugal, Aug 3, 2005
    #3
  4. Hello Jacob,

    JF> Though Brian Schröder gave an interesting irb implementation, what you
    JF> really need is diff[1]. And don't despair, there is diff for
    JF> Windows[2] (via the command line).

    JF> The GNU developers have put a *lot*
    JF> of work and refinement into this heavily used tool -- don't reinvent
    JF> the wheel.

    <flame>
    And they still have nothing that even comes close to "AraxisMerge" on
    Windows, neither in the GUI nor in the quality of the diff algorithm.
    </flame>

    But back to the question from the original poster: I think diff is
    completely the wrong idea, as he said he only needs the file names, and the
    content does not matter for an installer since he puts the complete file
    into the setup.exe.

    I don't see a particularly Ruby-ish way to solve it, as processing strings
    is not a very complicated task. Build two hashes over the file lists and
    compare them item by item. Parsing the previous file list would be a little
    bit more complicated if the Installshield file format must be parsed rather
    than a plain string list, but it should still be possible to write the
    script in 100 lines. Or maybe I did not understand Dave's real problem.
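A minimal sketch of the "two hashes" approach, producing the Identical / Extra files / Removed files report asked for in the original post. Keying the hashes by the downcased name is an assumption that suits Windows, where file names are normally case-insensitive; drop the downcase for case-sensitive file systems. 'compare_lists' is a hypothetical name.

```ruby
# Sketch: compare two lists of file names via hashes keyed by a
# normalized (downcased) name; values keep the original spelling.
def compare_lists(previous, current)
  index = ->(names) { names.each_with_object({}) { |n, h| h[n.downcase] = n } }
  old_h = index.call(previous)
  new_h = index.call(current)

  extra   = new_h.keys.reject { |k| old_h.key?(k) }.map { |k| new_h[k] }
  removed = old_h.keys.reject { |k| new_h.key?(k) }.map { |k| old_h[k] }

  return 'Identical' if extra.empty? && removed.empty?
  lines = []
  lines << "Extra files: #{extra.sort.join(', ')}" unless extra.empty?
  lines << "Removed files: #{removed.sort.join(', ')}" unless removed.empty?
  lines.join("\n")
end
```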


    --
    Best regards, emailto: scholz at scriptolutions dot com
    Lothar Scholz http://www.ruby-ide.com
    CTO Scriptolutions Ruby, PHP, Python IDEs
    Lothar Scholz, Aug 3, 2005
    #4
  5. On 8/3/05, Lothar Scholz <> wrote:
    > JF> Though Brian Schröder gave an interesting irb implementation, what you
    > JF> really need is diff[1]. And don't despair, there is diff for
    > JF> Windows[2] (via the command line).
    >
    > JF> The GNU developers have put a *lot*
    > JF> of work and refinement into this heavily used tool -- don't reinvent
    > JF> the wheel.
    >
    > <flame>
    > And they still got nothing what even comes close to "AraxisMerge" on
    > Windows, neither from the GUI nor from the quality of the diff algorithm.
    > </flame>


    Ok, to qualify my statement: Don't reinvent this particular wheel for
    a once-off solution. I won't say that someone else can make a better
    wheel when that's their primary goal. I don't think Dave Davidson's
    goal is to develop a new diff utility. Regarding AraxisMerge, I've
    never heard of it. It may be better than GNU DiffUtils. I can't make
    any judgement there.

    > But back to the question from the original poster, i think diff is a
    > complete wrong idea as he said he only needs the file names and the content
    > does not matter for an installer as he puts the complete file into the
    > setup.exe.


    diff -qr | grep '^Only'

    Know the tool before dismissing it.

    Jacob Fugal
    Jacob Fugal, Aug 4, 2005
    #5
  6. On 8/4/05, Jacob Fugal <> wrote:
    > diff -qr | grep '^Only'


    Rather,

    diff -qr DIR1 DIR2 | grep '^Only'

    Sorry for the shabby proofreading...
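For comparison, a rough pure-Ruby counterpart of that pipeline (a sketch, not a reimplementation of diff): it walks both trees and reports entries found in only one of them. Unlike GNU diff, which reports a directory missing on one side once, it lists every path inside it, and hidden (dot) files are skipped by the glob. 'only_in' is a hypothetical name, and Dir.glob with the 'base:' keyword needs Ruby 2.5 or later.

```ruby
# Sketch: entries present in only one of two directory trees, printed in a
# diff-like "Only in DIR: path" form (paths are relative to each DIR).
def only_in(dir1, dir2)
  a = Dir.glob('**/*', base: dir1)   # relative paths; dotfiles are skipped
  b = Dir.glob('**/*', base: dir2)
  (a - b).sort.map { |p| "Only in #{dir1}: #{p}" } +
    (b - a).sort.map { |p| "Only in #{dir2}: #{p}" }
end
```

Like the diff pipeline, this needs both directories on disk.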

    Jacob Fugal
    Jacob Fugal, Aug 4, 2005
    #6
  7. On Fri, 5 Aug 2005, Jacob Fugal wrote:

    > On 8/3/05, Lothar Scholz <> wrote:
    >> JF> Though Brian Schröder gave an interesting irb implementation, what you
    >> JF> really need is diff[1]. And don't despair, there is diff for
    >> JF> Windows[2] (via the command line).
    >>
    >> JF> The GNU developers have put a *lot*
    >> JF> of work and refinement into this heavily used tool -- don't reinvent
    >> JF> the wheel.
    >>
    >> <flame>
    >> And they still got nothing what even comes close to "AraxisMerge" on
    >> Windows, neither from the GUI nor from the quality of the diff algorithm.
    >> </flame>

    >
    > Ok, to qualify my statement: Don't reinvent this particular wheel for
    > a once-off solution. I won't say that someone else can make a better
    > wheel when that's their primary goal. I don't think Dave Davidson's
    > goal is to develop a new diff utility. Regarding AraxisMerge, I've
    > never heard of it. It may be better than GNU DiffUtils. I can't make
    > any judgement there.
    >
    >> But back to the question from the original poster, i think diff is a
    >> complete wrong idea as he said he only needs the file names and the content
    >> does not matter for an installer as he puts the complete file into the
    >> setup.exe.

    >
    > diff -qr | grep '^Only'
    >
    > Know the tool before dismissing it.


    the way i read the OP's post the original contents should be stored and
    alterable. the diff approach would require both directories to exist and be
    stored and i think the OP wanted to store only the __inventory__ of the dir -
    not the actual dir. so not only would the storage/database requirements
    skyrocket, but you'd be using a sledgehammer to pound in a mini-tack. this
    problem is quite easily solved in only a few lines of ruby - including
    database code, command line parsing, etc:


    here's the code:


    harp:~ > cat ./dirlist

    #! /usr/bin/env ruby
    require 'pstore'
    require 'yaml'
    require 'getoptlong'

    # persistent map: absolute directory path => saved FileList
    class DirDb < ::PStore
      def [] dir
        transaction{ super(exp(dir)) rescue nil }
      end
      def []= dir, filelist
        transaction{ super(exp(dir), filelist) }
      end
      def exp dir
        File::expand_path dir
      end
    end

    # every entry below dir (recursively), stored as absolute paths
    class FileList < ::Array
      def initialize dir
        @dir = File::expand_path dir
        @glob = File::join @dir, '**', '*'
        replace Dir[@glob].map{|f| File::expand_path f}
      end
      # paths relative to the scanned dir, e.g. "sub/file.txt"
      def basenames
        map{|f| f.gsub(%r|^#{ Regexp::escape @dir }/*|,'')}
      end
      def add filename
        self << File::expand_path(File::join(@dir, filename))
      end
      def delete filename
        super(File::expand_path(File::join(@dir, filename)))
      end
      # dump as a plain array of paths
      def to_yaml
        to_a.to_yaml
      end
    end

    class Main
      def self::main(*a, &b)
        new(*a, &b).run
      end
      def initialize
        gl = GetoptLong::new ['--db', '-d', GetoptLong::REQUIRED_ARGUMENT]
        gl.each do |opt, arg|
          case opt
          when /db/
            @db_path = arg
          end
        end
        @db_path ||= File::expand_path(File::join('~', '.dirdb'))
        @mode, @mode_args = ARGV.shift, ARGV
        @mode ||= 'help'
        @db = DirDb::new @db_path
      end
      def run
        send(@mode, *@mode_args)
      end
      def scan dir
        @db[dir] = FileList::new dir
        show dir
      end
      # Kernel#y is only defined under irb on newer (Psych-based) Rubies
      def show dir
        puts @db[dir].to_yaml
      end
      def report old_dir, new_dir
        previous = @db[old_dir] or abort "no saved inventory for #{ old_dir } (run scan first)"
        current = FileList::new new_dir
        report = {}
        report['identical'] = previous.basenames & current.basenames
        report['extra'] = current.basenames - previous.basenames
        report['removed'] = previous.basenames - current.basenames
        puts report.to_yaml
      end
      def add dir, filename
        filelist = @db[dir]
        filelist.add filename
        @db[dir] = filelist
      end
      def delete dir, filename
        filelist = @db[dir]
        filelist.delete filename
        @db[dir] = filelist
      end
      def help
        puts "#{ $0 } scan dir | show dir | report old_dir new_dir | add dir filename | delete dir filename"
      end
    end

    $0 == __FILE__ and Main::main



    and here's how you use it:


    harp:~ > mkdir version-1.0.0 && touch version-1.0.0/a version-1.0.0/b version-1.0.0/c


    harp:~ > ./dirlist
    ./dirlist scan dir | show dir | report new_dir old_dir | add dir filename | delete dir filename


    harp:~ > ./dirlist scan version-1.0.0/
    ---
    - /home/ahoward/version-1.0.0/a
    - /home/ahoward/version-1.0.0/b
    - /home/ahoward/version-1.0.0/c


    harp:~ > rm -rf version-1.0.0/


    harp:~ > mkdir version-2.0.0 && touch version-2.0.0/a version-2.0.0/b


    harp:~ > ./dirlist report version-1.0.0 version-2.0.0
    ---
    removed:
    - c
    extra: []
    identical:
    - a
    - b


    harp:~ > touch version-2.0.0/c


    harp:~ > ./dirlist report version-1.0.0 version-2.0.0
    ---
    removed: []
    extra: []
    identical:
    - a
    - b
    - c


    harp:~ > touch version-2.0.0/d


    harp:~ > ./dirlist report version-1.0.0 version-2.0.0
    ---
    removed: []
    extra:
    - d
    identical:
    - a
    - b
    - c


    harp:~ > ./dirlist add version-1.0.0 d


    harp:~ > ./dirlist report version-1.0.0 version-2.0.0
    ---
    removed: []
    extra: []
    identical:
    - a
    - b
    - c
    - d


    harp:~ > rm version-2.0.0/a


    harp:~ > ./dirlist report version-1.0.0 version-2.0.0
    ---
    removed:
    - a
    extra: []
    identical:
    - b
    - c
    - d


    harp:~ > ./dirlist delete version-1.0.0 a


    harp:~ > ./dirlist report version-1.0.0 version-2.0.0
    ---
    removed: []
    extra: []
    identical:
    - b
    - c
    - d
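Since the lists in the original post are also printed and filed, a plain-text inventory may be handier than a PStore file, which holds Marshal data not meant to be read by hand (the standard library's YAML::Store, a PStore subclass, is another option). A minimal sketch of the plain-text variant; 'save_inventory' and 'load_inventory' are hypothetical names, and adding or removing an expected file becomes a matter of editing the text file.

```ruby
# Sketch: keep each inventory as a sorted text file, one relative path per
# line, so it can be printed, filed, and edited or diffed by hand.
def save_inventory(dir, list_path)
  names = Dir.glob('**/*', base: dir).sort   # Ruby 2.5+ for base:
  File.write(list_path, names.join("\n") + "\n")
  names
end

def load_inventory(list_path)
  File.readlines(list_path, chomp: true).reject(&:empty?)
end
```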


    in any case, i'm all for using built-in tools to accomplish tasks - but this
    task is so basic it seems silly not to just write it in pure ruby...

    kind regards.

    -a
    --
    ===============================================================================
    | email :: ara [dot] t [dot] howard [at] noaa [dot] gov
    | phone :: 303.497.6469
    | My religion is very simple. My religion is kindness.
    | --Tenzin Gyatso
    ===============================================================================
    , Aug 4, 2005
    #7
  8. All,

    Thanks so much for the hints and pointers regarding this issue... I've not had a
    chance to try all the suggestions (too busy counting files by hand :) but I just
    wanted to let you know I appreciate the help!

    Dave
    dave davidson, Aug 4, 2005
    #8
  9. On 8/4/05, <> wrote:
    > > diff -qr | grep '^Only'

    <snip>
    >
    > the way i read the OP's post the original contents should be stored and
    > alterable. the diff approach would require both directories to exist and be
    > stored and i think the OP wanted to store only the __inventory__ of the dir -
    > not the actual dir. so not only would the storage/database requirements
    > skyrocket, but you'd be using a sledgehammer to pound in a mini-tack.


    Ok, I forgot about that constraint. I still think diff would be the
    exact tool I would use when on a *nix system:

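    # Done once to build the list compared against
    # (cd first so both lists hold relative paths; sort for a stable order)
    $ (cd master_dir && find . | sort) > master.list

    # Done each time to verify all files are there in the working copy
    $ (cd working_dir && find . | sort) | diff master.list -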

    I'll admit that once you start getting into pipes and such this
    solution probably won't work, or at least not as easily, on Windows.
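Another way around the pipe question is to do the comparison itself in Ruby. Given two sorted, duplicate-free lists of relative paths, a single merge pass finds the differences, much like 'comm -3' on Unix; the '<' and '>' markers mimic diff's normal output. 'compare_sorted' is a hypothetical name, and this sketch only checks set membership, nothing else diff does.

```ruby
# Sketch: one-pass merge over two sorted, duplicate-free path lists.
# "<" = only in the master list, ">" = only in the working copy.
def compare_sorted(master, working)
  out = []
  i = j = 0
  while i < master.size && j < working.size
    case master[i] <=> working[j]
    when 0                       # present in both: advance both
      i += 1
      j += 1
    when -1                      # master entry missing from working copy
      out << "< #{master[i]}"
      i += 1
    else                         # working entry not in master list
      out << "> #{working[j]}"
      j += 1
    end
  end
  master[i..-1].each  { |m| out << "< #{m}" }   # leftovers on either side
  working[j..-1].each { |w| out << "> #{w}" }
  out
end
```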

    Jacob Fugal
    Jacob Fugal, Aug 5, 2005
    #9

Similar Threads
  1. srinukasam

    comparing the contents of memory

    srinukasam, Jun 22, 2005, in forum: VHDL
    Replies:
    5
    Views:
    568
    Ralf Hildebrandt
    Jun 23, 2005
  2. Don Adams
    Replies:
    1
    Views:
    580
    Martin Honnen
    Mar 5, 2004
  3. mike
    Replies:
    2
    Views:
    477
    Roedy Green
    Mar 15, 2008
  4. Kamarulnizam Rahim
    Replies:
    4
    Views:
    203
    Robert Klemme
    Jan 28, 2011
  5. Replies:
    0
    Views:
    178