[SUMMARY] TumbleDRYer (#53)

R

Ruby Quiz

Great quiz Hugh! This is one of those unique ideas that was a lot of fun to
play with. There's always some neat things to be accomplished with code that
writes code.

I want to take a look at Bob Showalter's solution below. It's really just one
linear procedure, so I'll show all the code and then break it down:

require 'strscan'
require 'abbrev'

class TumbleDRYer

# minimum length of a phrase to consider
MIN_PHRASE = 10

# minimum times a phrase must occur to consider
MIN_OCCUR = 3

# minimum length for abbreviation
MIN_ABBR = 2

def initialize(string)
@input = string
end

def dry

# this will accumulate a list of repeated phrases to condense
phrases = Array.new

# this will receive the abbreviation for each phrase
abbr = Hash.new

lines = @input.to_a
loop do

# process the input data by lines. we find "phrases" by
# first finding the start and end of each "word" in the line,
# and then combining those words into longer phrases. for
# each phrase, we count the number of times it occurs in the
# total input.
phr = Hash.new
lines.each do |line|
s = StringScanner.new(line)
words = Array.new
loop do
s.scan_until(/(?=\S)/) or break
beg = s.pos
s.scan(/\S+/)
words << [ beg, s.pos ]
end

# combine words to make 'phrases'
combos(words)

# accumulate phrases, counting their occurences.
# skip phrases that are too short.
words.each do |from, to|
p = line[from, to - from]
next unless p.length >= MIN_PHRASE
phr[p] ||= 0
phr[p] += 1
end
end

# get the longest phrase that occurs the most times
longest = phr.sort_by { |k,v| -(k.length * 1000 + v)
}.find { |k,v| v >= MIN_OCCUR } or break
phrase, occurs = longest

# save the phrase, and then blank it out of the input data
# so we can search for more phrases
phrases << phrase
lines.each { |line| line.gsub!(phrase, ' ' * phrase.length) }

end

# now we have all the phrases we want to replace.
# find unique abbreviations for each phrase.
temp = Hash.new
phrases.each do |phrase|
key = phrase.scan(/\w+/).flatten.to_s.downcase
key = '_' + key unless key =~ /^[_a-zA-Z]/
key += '_' while temp.has_key? key
temp[key] = phrase
end
temp.keys.abbrev.sort.each do |s, key|
phrase = temp[key]
abbr[phrase] = s if abbr[phrase].nil? ||
abbr[phrase].length < MIN_ABBR
end

# generate the output class
puts "class WashingMachine"
puts " def initialize"
phrases.each do |phrase|
puts ' @' + abbr[phrase] + " = '" +
phrase.gsub("'", "\\\\'") + "'"
@input.gsub!(phrase, '#{@' + abbr[phrase] + '}')
end
puts " end\n"
puts " def output\nprint <<EOF"
puts @input
puts "EOF\n end\n"
puts "end"

end

private

def combos(arr, max = arr.size - 1, i = 0)
(i+1..max).each do |j|
arr << [ arr[0], arr[j][1] ]
end
combos(arr, max, i + 1) if i < max - 1
end

end

TumbleDRYer.new(ARGF.read).dry

That's the code, save one unused line I removed.

The first thing I do when reading code is to try and understand the process. To
get the lay of the land, we need to know what's called where. The path of
execution is trivial to find here. The last line is the only code, besides a
couple of require statements, outside of a class definition. We know that's
what Ruby will run.

That line tells us that we need to be examining two methods
TumbleDRYer#initialize and TumbleDRYer#dry. Don't be shy with that scrollbar
now. Zip around until you find them. TumbleDRYer#initialize is easy enough,
all it does is record the passed input. TumbleDRYer#dry is a monster, so let's
come back to it.

Anything else in this file? Well, we can see that two standard libraries are
used, strscan and abbrev. We can also see three constants at the top of the
class that start to give us ideas about how the code will break down the
sections it is meant to simplify.

Beyond that, there's only one other method, which we can assume is a helper to
TumbleDRYer#dry. It's smaller, so let's see if we can figure it out first. Uh
oh, recursion! My brain just melted and leaked out of my ear. Okay, let's see
if we can cheat and just call it. Copy and paste and irb to the rescue:
def combos(arr, max = arr.size - 1, i = 0)
(i+1..max).each do |j| ?> arr << [ arr[0], arr[j][1] ]
end
combos(arr, max, i + 1) if i < max - 1
end => nil
combos( (1..10).to_a )

=> nil

Or not. I was hoping for a more informative return value obviously.

Okay, it looks like I actually need to read a little. I see that `arr` is a
parameter of the method. That's how I decided on a sample data set. A little
farther in I can see `arr << ...`. Bingo. It modifies the array. Now I know
how to check it:
test_data = (1..10).to_a => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
combos test_data => nil
test_data
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, [1, 1], [1, 1], [1, 0], [1, 0], [1, 1],
[1, 1], [1, 0], [1, 0], [1, 1], [0, 1], [0, 0], [0, 0], [0, 1], [0, 1],
[0, 0], [0, 0], [0, 1], [1, 0], [1, 0], [1, 1], [1, 1], [1, 0], [1, 0],
[1, 1], [0, 0], [0, 1], [0, 1], [0, 0], [0, 0], [0, 1], [1, 1], [1, 1],
[1, 0], [1, 0], [1, 1], [0, 1], [0, 0], [0, 0], [0, 1], [1, 0], [1, 0],
[1, 1], [0, 0], [0, 1], [1, 1]]

Yuck. In the dictionary under "unhelpful", you will find the above listing.
More reading is required.

Well, if I finish the line I stopped at last time, I find this little nugget:
`[ arr[0], arr[j][1] ]`. That makes me think `arr` is expected to be an
Array of Arrays. It looks like the subarrays only need two items, since the
indexes are 0 and 1. Third try is a charm:
test_data = [1, 3, 5, 7, 9].map { |n| [n, n + 1] } => [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
combos test_data => nil
test_data
=> [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [1, 4], [1, 6], [1, 8], [1, 10],
[3, 6], [3, 8], [3, 10], [5, 8], [5, 10], [7, 10]]

That looks promising. Now what are we seeing? Obviously it added some
groupings to the end of the Array. 1 was already seen with 2, but it also
paired it with 4, 6, 8, and 10. That's all of the other second elements. Then
3 is paired with 6, 8, and 10 and it was already paired with 4, which leaves
only 2 missing. Now I see the pattern. Each first element is paired with all
of the second elements that come after it, in the original Array.

Even knowing how that works, I'm not sure I understand what that is actually for
yet. I'll file it away in the back of my mind and see if I'm ready to tackle
TumbleDRYer#dry.

Here's the first chunk of that method, to save you some scrolling:

def dry

# this will accumulate a list of repeated phrases to condense
phrases = Array.new

# this will receive the abbreviation for each phrase
abbr = Hash.new

lines = @input.to_a
loop do

# process the input data by lines. we find "phrases" by
# first finding the start and end of each "word" in the line,
# and then combining those words into longer phrases. for
# each phrase, we count the number of times it occurs in the
# total input.
phr = Hash.new
lines.each do |line|
s = StringScanner.new(line)
words = Array.new
loop do
s.scan_until(/(?=\S)/) or break
beg = s.pos
s.scan(/\S+/)
words << [ beg, s.pos ]
end

# combine words to make 'phrases'
combos(words)

# accumulate phrases, counting their occurrences.
# skip phrases that are too short.
words.each do |from, to|
p = line[from, to - from]
next unless p.length >= MIN_PHRASE
phr[p] ||= 0
phr[p] += 1
end
end

# get the longest phrase that occurs the most times
longest = phr.sort_by { |k,v| -(k.length * 1000 + v)
}.find { |k,v| v >= MIN_OCCUR } or break
phrase, occurs = longest

# save the phrase, and then blank it out of the input data
# so we can search for more phrases
phrases << phrase
lines.each { |line| line.gsub!(phrase, ' ' * phrase.length) }

end

# ...

Luckily, that's well commented code. The comments easily explain what the
variables are for, and then we get an introduction to the process in `loop do
... end`. As soon as I read that, TumbleDRYer#combos clicks into place.

The Array of Arrays passed to TumbleDRYer#combos holds a start and end position
for each word. The start positions are combined with all of the end positions
after that word to give the possible phrases.

Before the call to TumbleDRYer#combos, we can see the Array of Arrays being
created, with a little StringScanner work. Words are found by scanning
non-whitespace characters, because replacing at any other boundaries pretty much
kills the human readable goal of the output.

Just beneath that call to TumbleDRYer#combos the phrases are assembled, checked
for the minimum length, and tallied in the phrase count. The chunk of code
right after that selects the longest phrase that occurs the most times for
substitution. Note the clever use of negation in #sort_by here, to reverse the
order. The last couple of lines save the phrase and replace it with whitespace,
so it won't match in future iterations.

Remember, this is all inside of a big `loop do ... end`, so by the time we break
out of here, all the replacement phrases will have been selected.

# ...

# now we have all the phrases we want to replace.
# find unique abbreviations for each phrase.
temp = Hash.new
phrases.each do |phrase|
key = phrase.scan(/\w+/).flatten.to_s.downcase
key = '_' + key unless key =~ /^[_a-zA-Z]/
key += '_' while temp.has_key? key
temp[key] = phrase
end
temp.keys.abbrev.sort.each do |s, key|
phrase = temp[key]
abbr[phrase] = s if abbr[phrase].nil? ||
abbr[phrase].length < MIN_ABBR
end

# ...

This section of the method just finds names for each of the replaced phrases.
It generally choses the first few letters or numbers of the phrase, unless it
needs to insert a _ character or two to maintain Ruby syntax or break
ambiguities. A lot of the work here is done by the abbrev library. If you're
not familiar with how that works, ask irb:
require "abbrev" => true
names = %w{James Edward Gray II} => ["James", "Edward", "Gray", "II"]
names.respond_to? :abbrev => true
names.abbrev
=> {"II"=>"II", "Gr"=>"Gray", "Gra"=>"Gray", "Jam"=>"James", "Ja"=>"James",
"Edw"=>"Edward", "Edwa"=>"Edward", "Edwar"=>"Edward", "Gray"=>"Gray",
"James"=>"James", "Edward"=>"Edward", "E"=>"Edward", "G"=>"Gray",
"Jame"=>"James", "I"=>"II", "Ed"=>"Edward", "J"=>"James"}

It turns an Array into a Hash of all the unique abbreviations that can be used
to form the words in the original Array.

Last little bit of code:

# ...

# generate the output class
puts "class WashingMachine"
puts " def initialize"
phrases.each do |phrase|
puts ' @' + abbr[phrase] + " = '" +
phrase.gsub("'", "\\\\'") + "'"
@input.gsub!(phrase, '#{@' + abbr[phrase] + '}')
end
puts " end\n"
puts " def output\nprint <<EOF"
puts @input
puts "EOF\n end\n"
puts "end"

end

That just outputs the DRYed code to STDOUT. This output uses a Ruby heredoc
with String interpolation. Unfortunately, I think that means that Ruby code,
with its own String interpolation, will confuse it. Have a look at Dominik
Bathon's code for custom interpolation or Daniel Sheppard's code for ERb
templates that even make use of arguments in replacement.

As always my thanks to all the creative souls that gave this quiz a try. All of
the solutions are educational.

Ruby Quiz will take a week off now so I can run up to Denver and wish my sister,
Nicole Blake Thanner, a happy sweet sixteenth birthday!
 
J

James Edward Gray II

Ruby Quiz will take a week off now...

Just FYI, there will be a Ruby Quiz on the 11th, but it will be
later. My return flight doesn't land until that evening and I can't
launch it until I'm back home. It will make the right day, but not
at the usual morning time. Thanks for your patients.

James Edward Gray II
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top