implementing python's os.walk

Discussion in 'Ruby' started by Brad Volz, Dec 16, 2008.

  1. Brad Volz

    Brad Volz Guest

    Hello,

    I seem to be having some difficulty creating a version of python's
    os.walk() for ruby, and I was hoping for some pointers.

    As some background, python's os.walk() [1] is a generator function.
    It is passed the top of a directory tree and it returns the following
    for each subdirectory that it encounters:
    . the working directory
    . an Array of subdirectories
    . an Array of non-directory files

    Here is some truncated output for my use case:

    >>> import os
    >>> repo = '/usr/local/nfsen/profiles-data/live/lax1er1'
    >>> for root, dirs, files in os.walk(repo):
    ...     if len(files) == 288:
    ...         print root
    ...
    /usr/local/nfsen/profiles-data/live/lax1er1/2008/11/11
    ..
    /usr/local/nfsen/profiles-data/live/lax1er1/2008/12/13

    Essentially, when there are exactly 288 files in a subdirectory, I
    want to print or otherwise do something with the working directory.

    Here is my attempt at simply translating this library function to ruby:

    #! /usr/bin/env ruby

    require 'pp'

    dirs = [ '/usr/local/nfsen/profiles-data/live/lax1er1' ]

    def find_dirs(top)
      dirs = []
      nondirs = []
      Dir.entries(top).each do |f|
        next if f =~ /(\.$|\.\.$)/
        full_path = [top, f].join('/')
        if File.directory?(full_path)
          dirs.push(full_path)
        else
          nondirs.push(full_path)
        end
      end

      yield top, dirs, nondirs

      dirs.each do |d|
        if File.directory?(d)
          for o in find_dirs(d) { |a,b,c| puts "#{a} #{b} #{c}"}
            yield o
          end
        end
      end
    end

    find_dirs(dirs[0]) do |top,dirs,nondirs|
      if nondirs.length == 288
        puts "#{top}"
      end
    end

    There are some things that I know are wrong or missing, but that is
    due to trying to get something to run at all without throwing an
    exception.

    The part that I think is totally wrong is:

    for o in find_dirs(d) { |a,b,c| puts "#{a} #{b} #{c}"}

    It's really only in there currently to keep the script from getting
    'LocalJumpError: no block given.' Unfortunately, I have no idea what
    the correct way to deal with this would be.

    The missing part would be including the directory contents in addition
    to the working directory and the Array of subdirectories.

    So, I guess my main questions would be: What do I need to do to get
    this sort of a generator to work? Do I need to wrap this up in a
    'class' or is a 'def' sufficient? What should the block look like,
    and where should it be in the code?

    Thanks for reading,

    Brad

    [1] http://svn.python.org/view/python/branches/release25-maint/Lib/os.py?rev=62757&view=auto
     
    Brad Volz, Dec 16, 2008
    #1

  2. Hugh Sasse

    Hugh Sasse Guest

    On Tue, 16 Dec 2008, Brad Volz wrote:

    > Hello,
    >
    > I seem to be having some difficulty creating a version of python's os.walk()
    > for ruby, and I was hoping for some pointers.
    >
    > As some background, python's os.walk() [1] is a generator function. It is


    You can have generators in Ruby
    ri Generator
    furnishes you with the details...

    > passed the top of a directory tree and it returns the following for each
    > subdirectory that it encounters:
    > . the working directory
    > . an Array of subdirectories
    > . an Array of non-directory files
    >
    > Here is some truncated output for my use case:
    >
    > >>> import os
    > >>> repo = '/usr/local/nfsen/profiles-data/live/lax1er1'
    > >>> for root, dirs, files in os.walk(repo):
    > ...     if len(files) == 288:
    > ...         print root
    > ...
    > /usr/local/nfsen/profiles-data/live/lax1er1/2008/11/11
    > ..
    > /usr/local/nfsen/profiles-data/live/lax1er1/2008/12/13
    >
    > Essentially, when there are exactly 288 files in a subdirectory, I want to
    > print or otherwise do something with the working directory.


    OK.
    >
    > Here is my attempt at simply translating this library function to ruby:
    >
    > #! /usr/bin/env ruby
    >
    > require 'pp'
    >
    > dirs = [ '/usr/local/nfsen/profiles-data/live/lax1er1' ]
    >
    > def find_dirs(top)
    > dirs = []
    > nondirs = []
    > Dir.entries(top).each do |f|
    > next if f =~ /(\.$|\.\.$)/


    or maybe
    next if f =~ /^\.\.?$/
    or
    next if f =~ /^\.{1,2}$/
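
The practical difference between the original pattern and the anchored alternatives shows up on names that merely end in a dot. Here is a minimal sketch with made-up entry names. It uses `\A` and `\z` rather than `^` and `$`, because in Ruby the latter anchor to lines rather than to the whole string.

```ruby
# Sample entry names, made up for illustration.
names = ['.', '..', 'foo.', '.hidden', 'a.b']

loose    = /(\.$|\.\.$)/   # pattern from the original post
anchored = /\A\.{1,2}\z/   # matches only "." and ".."

# Entries each pattern would skip.
skipped_loose    = names.select { |n| n =~ loose }
skipped_anchored = names.select { |n| n =~ anchored }

p skipped_loose     # also skips "foo.", which is a legitimate file name
p skipped_anchored
```

The unanchored pattern silently drops any entry whose name ends in a dot, so the anchored form is the safer choice.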

    > full_path = [top, f].join('/')


    full_path = File.join(top, f) # separator agnostic
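
A small sketch of one visible difference between the two approaches, assuming a hypothetical `top` that already ends in a separator: `File.join` collapses the duplicate separator, while a plain `Array#join` does not.

```ruby
# Hypothetical directory path with a trailing separator.
top = '/tmp/example/'

joined_by_hand = [top, 'file.txt'].join('/')   # keeps the doubled slash
joined_by_file = File.join(top, 'file.txt')    # collapses it

puts joined_by_hand
puts joined_by_file
```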

    > if File.directory?(full_path)
    > dirs.push(full_path)
    > else
    > nondirs.push(full_path)
    > end
    > end
    >
    > yield top, dirs, nondirs


    yielding to a proc with arity 3....

    >
    > dirs.each do |d|
    > if File.directory?(d)
    > for o in find_dirs(d) { |a,b,c| puts "#{a} #{b} #{c}"}
    > yield o


    yielding to a proc with arity 1

    That may be one problem
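
To see why the mismatch matters, here is a minimal sketch (the method name `demo` is made up). A block declared with three parameters receives `nil` for the missing ones when only a single non-Array value is yielded, so a later `nondirs.length` would fail with a NoMethodError.

```ruby
# Hypothetical method that yields once with three values and once with one,
# mirroring `yield top, dirs, nondirs` and `yield o` in the original code.
def demo
  yield 'top', ['sub'], ['file']
  yield 'sub'
end

seen = []
demo { |top, dirs, nondirs| seen << [top, dirs, nondirs] }

p seen
# In the second entry, dirs and nondirs are nil.
```

In the original code `o` appears to be a single path String, because the `for` loop iterates over the return value of `find_dirs`, which is the Array returned by `dirs.each`.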

    > end
    > end
    > end
    > end
    >
    > find_dirs(dirs[0]) do |top,dirs,nondirs|
    > if nondirs.length == 288
    > puts "#{top}"
    > end
    > end
    >
    > There are some things that I know are wrong or missing, but that is due to
    > trying to get something to run at all without throwing an exception.


    ri Find is short enough to quote:

    ------------------------------------------------------------ Class: Find
    The +Find+ module supports the top-down traversal of a set of file
    paths.

    For example, to total the size of all files under your home
    directory, ignoring anything in a "dot" directory (e.g.
    $HOME/.ssh):

    require 'find'

    total_size = 0

    Find.find(ENV["HOME"]) do |path|
      if FileTest.directory?(path)
        if File.basename(path)[0] == ?.
          Find.prune # Don't look any further into this directory.
        else
          next
        end
      else
        total_size += FileTest.size(path)
      end
    end

    ------------------------------------------------------------------------


    Instance methods:
    -----------------
    find, prune


    That will do most of the lifting for you...
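
As a self-contained illustration of `Find.find` and `Find.prune`, the sketch below builds a throwaway tree in a temporary directory (the directory and file names are made up) and records every path visited, skipping anything inside a "dot" directory.

```ruby
require 'find'
require 'tmpdir'
require 'fileutils'

visited = []

Dir.mktmpdir do |root|
  # Throwaway tree: a/b/data.txt plus a dot directory that should be skipped.
  FileUtils.mkdir_p(File.join(root, 'a', 'b'))
  FileUtils.mkdir_p(File.join(root, '.git', 'objects'))
  FileUtils.touch(File.join(root, 'a', 'b', 'data.txt'))

  Find.find(root) do |path|
    next if path == root
    # Find.prune stops the descent and skips the rest of this block iteration.
    Find.prune if File.directory?(path) && File.basename(path).start_with?('.')
    visited << path.sub(root + '/', '')
  end
end

p visited.sort
```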
    [...]
    Hugh
     
    Hugh Sasse, Dec 16, 2008
    #2

  3. Brad Volz wrote:
    > As some background, python's os.walk() [1] is a generator function.
    > It is passed the top of a directory tree and it returns the following
    > for each subdirectory that it encounters:
    > . the working directory
    > . an Array of subdirectories
    > . an Array of non-directory files


    The normal 'ruby way' to do this would be as an object which *yields*
    each of these things in turn, rather than returning them.

    In many cases you can use it directly like this. If you want to turn it
    into a generator you can wrap it using generator.rb; or more cleanly in
    ruby 1.9, turn it into an Enumerator, which has this functionality built
    in.

    class Foo
      def all_dirs
        yield "dir1"
        yield "dir2"
        yield "dir3"
      end
    end

    foo = Foo.new

    # Normal style
    foo.all_dirs { |x| p x }

    # Generator style (ruby 1.9, uses Fiber)
    g = foo.to_enum(:all_dirs)
    3.times { p g.next }

    # Generator style (ruby 1.8, beware uses callcc)
    require 'generator'
    require 'enumerator'
    g = Generator.new(foo.to_enum(:all_dirs))
    3.times { p g.next }
    --
    Posted via http://www.ruby-forum.com/.
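
A common variation on Ruby 1.9 and later, sketched here as an assumption rather than as part of the post above, is to have the method itself return an Enumerator when it is called without a block. Callers can then choose between block style and generator style without calling `to_enum` themselves.

```ruby
class Foo
  def all_dirs
    # No block given: hand back an Enumerator over this same method.
    return to_enum(:all_dirs) unless block_given?

    yield 'dir1'
    yield 'dir2'
    yield 'dir3'
  end
end

foo = Foo.new

# Generator style: pull values one at a time.
g = foo.all_dirs
first  = g.next
second = g.next

# Enumerable style: collect everything at once.
all = foo.all_dirs.to_a

p first, second, all
```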
     
    Brian Candler, Dec 16, 2008
    #3
  4. Brad Volz wrote:
    > Hello,
    >
    > I seem to be having some difficulty creating a version of python's
    > os.walk() for ruby, and I was hoping for some pointers.


    You should really know how to use Ruby's stdlib named "find".
    http://www.ruby-doc.org/stdlib/libdoc/find/rdoc/index.html

    Below is another twisted example in Ruby 1.9. It can be used much like python's
    generators:

    irb(main):060:0> for root, dirs, files in os_walk("/dev/disk") do
    irb(main):061:1* puts root
    irb(main):062:1> end



    require 'find'
    require 'pathname'

    def os_walk(dir)
      ret = Fiber.new do
        Pathname(dir).find do |ent|
          next unless ent.directory?
          dirs, files = ent.children.partition {|i| i.directory? }
          Fiber.yield ent, dirs, files
        end
        raise
      end
      def ret.each
        loop { yield resume } rescue self
      end
      ret
    end
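
The same shape can also be sketched with `Enumerator.new` instead of a hand-rolled Fiber. This is a swapped-in technique, not the code from the post above. It keeps the `os_walk` name and the `Pathname#find` traversal, and the demo tree is made up.

```ruby
require 'find'
require 'pathname'
require 'tmpdir'
require 'fileutils'

# Returns an Enumerator that produces (directory, subdirs, files) triples.
def os_walk(dir)
  Enumerator.new do |y|
    Pathname(dir).find do |ent|
      next unless ent.directory?
      dirs, files = ent.children.partition(&:directory?)
      y.yield ent, dirs, files
    end
  end
end

summary = []
Dir.mktmpdir do |tmp|
  # Throwaway tree: tmp/top.txt and tmp/x/f1
  FileUtils.mkdir_p(File.join(tmp, 'x'))
  FileUtils.touch(File.join(tmp, 'x', 'f1'))
  FileUtils.touch(File.join(tmp, 'top.txt'))

  for root, dirs, files in os_walk(tmp)
    summary << [dirs.size, files.size]
  end
end

p summary
```

Because the result is an Enumerator, it also supports `next`, `each`, and the rest of Enumerable without defining a singleton `each`.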
     
    Urabe Shyouhei, Dec 16, 2008
    #4
  5. 2008/12/16 Brad Volz:
    > I seem to be having some difficulty creating a version of python's os.walk()
    > for ruby, and I was hoping for some pointers.
    >
    > As some background, python's os.walk() [1] is a generator function. It is
    > passed the top of a directory tree and it returns the following for each
    > subdirectory that it encounters:
    > . the working directory
    > . an Array of subdirectories
    > . an Array of non-directory files
    >
    > Here is some truncated output for my use case:
    >
    > >>> import os
    > >>> repo = '/usr/local/nfsen/profiles-data/live/lax1er1'
    > >>> for root, dirs, files in os.walk(repo):
    > ...     if len(files) == 288:
    > ...         print root
    > ...
    > /usr/local/nfsen/profiles-data/live/lax1er1/2008/11/11
    > ..
    > /usr/local/nfsen/profiles-data/live/lax1er1/2008/12/13
    >
    > Essentially, when there are exactly 288 files in a subdirectory, I want to
    > print or otherwise do something with the working directory.


    > The part that I think is totally wrong is:
    >
    > for o in find_dirs(d) { |a,b,c| puts "#{a} #{b} #{c}"}
    >
    > It's really only in there currently to keep the script from getting
    > 'LocalJumpError: no block given.' Unfortunately, I have no idea what the
    > correct way to deal with this would be.
    >
    > The missing part would be including the directory contents in addition to
    > the working directory and the Array of subdirectories.
    >
    > So, I guess my main questions would be: What do I need to do to get this
    > sort of a generator to work? Do I need to wrap this up in a 'class' or is a
    > 'def' sufficient? What should the block look like, and where should it be
    > in the code?


    You have a recursive algorithm here, but you want each call of the
    method to invoke the *same block*. This can be achieved by forwarding
    the block with the &b notation:

    def find_dirs(top, &b)
      ...
      # enter recursion
      find_dirs(d, &b)
    end

    The way you did it, every invocation yields to its own caller's block.
    That is the correct block only for the outermost call; every recursive
    call yields to the block its parent passed when calling find_dirs.
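
Putting the pieces together, here is a minimal sketch of the original `find_dirs` with the block forwarded on recursion. The temporary tree and the threshold of 3 files (standing in for 288) are made up for the demo. Note that `File.directory?` follows symlinks, so this sketch does not guard against symlink cycles.

```ruby
require 'tmpdir'
require 'fileutils'

# Yields (top, dirs, nondirs) for top and, recursively, every directory below it.
def find_dirs(top, &block)
  dirs = []
  nondirs = []
  Dir.entries(top).each do |f|
    next if f =~ /\A\.{1,2}\z/          # skip "." and ".."
    full_path = File.join(top, f)
    if File.directory?(full_path)
      dirs.push(full_path)
    else
      nondirs.push(full_path)
    end
  end

  yield top, dirs, nondirs
  # Forward the very same block to each recursive call.
  dirs.each { |d| find_dirs(d, &block) }
end

matches = []
Dir.mktmpdir do |root|
  full = File.join(root, 'a', 'b')
  FileUtils.mkdir_p(full)
  3.times { |i| FileUtils.touch(File.join(full, "f#{i}")) }

  find_dirs(root) do |top, _dirs, nondirs|
    matches << top.sub(root + '/', '') if nondirs.length == 3
  end
end

p matches
```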

    You might also be able to create a totally different solution using Find:

    require 'find'

    # roots is assumed to be an Array of top-level directories
    roots.each do |root|
      dir_count = Hash.new 0

      Find.find root do |file|
        d, f = File.split file
        next if /\A\.{1,2}\z/ =~ f
        dir_count[d] += 1 if File.file? file
      end

      dir_count.each do |dir, cnt|
        puts dir if cnt == 288
      end
    end
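
The counting idea can be tried in isolation with a small helper. The name `dirs_with_file_count` and the demo tree are assumptions made for this sketch. It tallies regular files per parent directory and returns the directories whose tally equals the wanted number.

```ruby
require 'find'
require 'tmpdir'
require 'fileutils'

# Returns the sorted list of directories under root that directly contain
# exactly `wanted` regular files.
def dirs_with_file_count(root, wanted)
  counts = Hash.new(0)
  Find.find(root) do |path|
    counts[File.dirname(path)] += 1 if File.file?(path)
  end
  counts.select { |_dir, n| n == wanted }.keys.sort
end

result = nil
Dir.mktmpdir do |root|
  # Throwaway tree: "full" holds two files, "partial" holds one.
  %w[full partial].each { |d| FileUtils.mkdir_p(File.join(root, d)) }
  2.times { |i| FileUtils.touch(File.join(root, 'full', "f#{i}")) }
  FileUtils.touch(File.join(root, 'partial', 'f0'))

  result = dirs_with_file_count(root, 2).map { |d| File.basename(d) }
end

p result
```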

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
     
    Robert Klemme, Dec 16, 2008
    #5
  6. Brad Volz

    Brad Volz Guest

    Many thanks for the suggestions.

    I wasn't previously aware of Find, and I have been able to get it to
    provide all of the information that I need.

    Thanks again!

    Brad
     
    Brad Volz, Dec 17, 2008
    #6
