[SUMMARY] Banned Words (#9)


Ruby Quiz

The general idea behind a lot of the solutions to this week's quiz was pretty
basic: try a big list (probably the whole list, in this problem), and if that
gets blocked, divide it into smaller lists and try again.

When one of these chunks of words gets through, we know that every word in
that chunk was clean. The higher up in our search that happens, the more work
it saves us. Because of that, this solution is ideal when there aren't a lot
of banned words, as would probably be the case in the real-world example of
this quiz.

Here's my own solution as the most basic example of this process:

def isolate( list, test )
  if test.clean? list.join(" ")
    Array.new
  elsif list.size == 1
    list
  else
    left, right = list[0...(list.size / 2)],
                  list[(list.size / 2)..-1]
    isolate(left, test) + isolate(right, test)
  end
end

isolate() is a recursive routine that takes an Array of words and a test
filter and returns the banned words. If the entire word list passes a clean?()
test, we return an empty Array (no banned words in the given list). If we
don't get an okay from clean?() and we only have one word in the list, we've
found a banned word and we return the one-word Array itself to show that.
Finally, if we didn't pass clean?() and we have more than one word, we divide
the list in half, call isolate() on each half, and combine the results of both
of those calls. Eventually, this drills down to find all the banned words.
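
To see it in action, here's a quick sketch with a stand-in filter. The real
quiz filter also tracks how many clean?() calls you spend; this fake one (my
invention, not part of any submission) only answers the question:

class FakeFilter
  def initialize( banned )
    @banned = banned
  end

  # A chunk is clean only if it shares no words with the banned list.
  def clean?( text )
    (text.split(" ") & @banned).empty?
  end
end

words  = %w[apple banana cherry damson elderberry fig grape]
filter = FakeFilter.new(%w[cherry fig])

p isolate(words, filter)    # => ["cherry", "fig"]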

Of course, that's just the basics. Wayne Vucenic found a very clever
optimization. If we check a big list and it gets banned, then we split it up
and check the first half, and that half comes back clean, we know the second
half must be banned and can skip the check. That can save a significant
number of messages that we would otherwise need to send.
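
Here's a rough sketch of that idea (mine, not Wayne's actual code), built on
the same shape as isolate():

def isolate_skipping( list, test, known_dirty = false )
  unless known_dirty
    return Array.new if test.clean? list.join(" ")
  end
  return list if list.size == 1

  left  = list[0...(list.size / 2)]
  right = list[(list.size / 2)..-1]

  in_left = isolate_skipping(left, test)
  if in_left.empty?
    # The left half was clean, so the trouble must all be in the right half
    # and we don't need to spend a message checking it.
    in_left + isolate_skipping(right, test, true)
  else
    in_left + isolate_skipping(right, test)
  end
end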

Brian Schroeder searched for the optimal divisor to use when breaking the
word list into chunks. Brian landed on 3 as the magic number, but there has
been some follow-up discussion on Ruby Talk about his findings and whether
they are complete. Regardless, do download Brian's code to see just how much
thought and effort he has put into it (and why he wants to sue me).
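
As a toy illustration of what's being tuned here (this is not Brian's code),
the bisection above generalizes to splitting a dirty chunk into any number of
roughly equal pieces:

def isolate_by( list, test, pieces = 3 )
  return Array.new if test.clean? list.join(" ")
  return list      if list.size == 1

  # Break the dirty chunk into `pieces` roughly equal slices and recurse.
  slice = (list.size / pieces.to_f).ceil
  list.each_slice(slice).inject(Array.new) do |found, chunk|
    found + isolate_by(chunk, test, pieces)
  end
end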

Jannis Harder didn't cut the word list in half at each step either. I'm not
sure exactly how Jannis arrived at the progression used, but it's an interesting
scale:

@steps = [110000,700000,300000,200000,110000,
          70000,30000,20000,11000,7000,
          3000,2000,1100,700,300,200,110,
          70,30,20,11,7,3,2,1]

In my own testing, this doesn't seem to best Wayne's optimization though.

Jannis Harder did post a test framework that ended up being used by multiple
entries. I thought that was very helpful.

The final entry was by Michael Ulm, who used knowledge of the dictionary size
and the banned word probability to optimize the number of messages needed to
find the banned words. This is a nice example of what can be done with a
little extra information, though you probably wouldn't have it in real-world
scenarios.
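
To give a feel for the kind of reasoning that information enables (this is my
own crude sketch, not Michael's solution): if each word is banned
independently with probability p, a chunk of k words comes back clean with
probability (1 - p) ** k, and you can compare chunk sizes by a rough expected
message cost per word:

def best_chunk_size( p, max = 1_000 )
  (1..max).min_by do |k|
    clean  = (1 - p)**k             # chance the whole chunk passes
    banned = k * p                  # expected banned words in the chunk
    # Crude model: one message for the chunk, plus (if it's dirty) roughly
    # log2(k) follow-up messages per banned word to bisect down to each one.
    cost = 1 + (1 - clean) * banned * Math.log2(k)
    cost / k                        # messages per word covered
  end
end

p best_chunk_size(0.01)    # few banned words:  fairly large chunks win
p best_chunk_size(0.2)     # many banned words: tiny chunks win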

The main issue brought up in discussion was that my filter class had
problems. Several alternatives were proposed, each with their own issues.
Matching a word can be surprisingly tricky when you take into account simple
things like plurals, international characters, and apostrophes. In the end,
everybody seemed to just use what they could live with.
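
As a tiny illustration of the boundary problems (the word here is my own
example, not one from the quiz), even a simple \b-delimited match behaves
unevenly:

banned = /\bgrape\b/i

p "I like grapes"       =~ banned   # => nil  (the plural slips through)
p "grape-flavored soda" =~ banned   # => 0    (a hyphen counts as a boundary)
p "the grape's skin"    =~ banned   # => 4    (and so does an apostrophe)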

My thanks to the submitters and discussers, informative as always.

Tomorrow, we'll try a classic programming challenge from Knuth...
 
