[QUIZ] Scrabble Stems (#12)

R

Ruby Quiz

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.grayproductions.net/ruby_quiz/

3. Enjoy!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Martin DeMello

In Scrabble parlance, a 'bingo stem' is defined as a set of six
letters that combines with a large fraction of the alphabet to anagram
to valid seven letter words (a 'bingo' is a move using all seven tiles
on your rack). For instance, one of the more prolific stems, SATIRE,
combines with twenty letters, A, B, C, D, E, F, G, H, I, K, L, M, N,
O, P, R, S, T, V, W (forming words like ASTERIA, BAITERS, RACIEST
...).

Write a program that, given a word list and a cutoff n, finds all 6
letter stems that combine with n or more letters, sorted in order of
how many distinct letters they combine with.
 
G

Giovanni Intini

When talking of 6 letter stems, do I have to find all the 6 letter
stems that also have a meaning by themselves? (Like Satire for
example). Or may I find even nonsense stems?
 
J

James Edward Gray II

When talking of 6 letter stems, do I have to find all the 6 letter
stems that also have a meaning by themselves? (Like Satire for
example). Or may I find even nonsense stems?

I don't believe the stems meaning or lack thereof is significant.

James Edward Gray II
 
M

Martin DeMello

Giovanni Intini said:
When talking of 6 letter stems, do I have to find all the 6 letter
stems that also have a meaning by themselves? (Like Satire for
example). Or may I find even nonsense stems?

No, all six letter stems count. The "meaningful" rearrangement is just a
handy way to refer to the stem.

martin
 
G

Glenn Parker

Ruby said:
In Scrabble parlance, a 'bingo stem' is defined as a set of six
letters that combines with a large fraction of the alphabet to anagram
to valid seven letter words.

Write a program that, given a word list and a cutoff n, finds all 6
letter stems that combine with n or more letters, sorted in order of
how many distinct letters they combine with.

If this quiz is really just about Scrabble, can we assume the input word
list will contain only 7-letter words?

If I filter my dictionary (2of4brif.txt) down to just 7-letter words, I
can handle this task in less than an hour of wall-clock time on a 2GHz
PC. If I allow all words with 7 or more letters, I don't know if I'll
be able to finish the processing in under a week's time.

I've certainly learned a few things about Ruby performance tuning along
the way.
 
J

James Edward Gray II

If this quiz is really just about Scrabble, can we assume the input
word list will contain only 7-letter words?

I filtered out words less than and more than seven letters.
If I filter my dictionary (2of4brif.txt) down to just 7-letter words,
I can handle this task in less than an hour of wall-clock time on a
2GHz PC. If I allow all words with 7 or more letters, I don't know if
I'll be able to finish the processing in under a week's time.

I've certainly learned a few things about Ruby performance tuning
along the way.

Hmmm, I would say that it doesn't HAVE to take that long. My program
runs in under 30 seconds. ;)

James Edward Gray II
 
G

Glenn Parker

James said:
Hmmm, I would say that it doesn't HAVE to take that long. My program
runs in under 30 seconds. ;)

Thanks, stripping it down to handle only 7-letter words made a big
difference. It runs in well under 4 seconds now. :)
 
G

Giovanni Intini

From the original post it looked like you could have anything longer than 6.
 
J

James Edward Gray II

From the original post it looked like you could have anything longer
than 6.

In the game of Scrabble, you are limited to seven "tiles". You get 50
points when you can play all seven at once.

Technically, that could be in longer words (using letters already on
the board). I didn't get the impression this quiz was concerned with
that though.

James Edward Gray II
 
G

Glenn Parker

James said:
In the game of Scrabble, you are limited to seven "tiles". You get 50
points when you can play all seven at once.

The quiz alluded to Scrabble, but I have to agree that the phrasing was
still ambiguous about the length of the input words.
 
M

Martin DeMello

Giovanni Intini said:
From the original post it looked like you could have anything longer than 6.

My bad, then - I meant six letter stems that form seven letter words
when combined with at least n out of the 26 letters in the alphabet.
There are seven-to-make-eight stems too, but as James noted, the quiz is
not concerned with them (it's a trivial extension, though).

martin
 
C

Carlos

In Scrabble parlance, a 'bingo stem' is defined as a set of six
letters that combines with a large fraction of the alphabet to anagram
to valid seven letter words (a 'bingo' is a move using all seven tiles
on your rack). For instance, one of the more prolific stems, SATIRE,
combines with twenty letters, A, B, C, D, E, F, G, H, I, K, L, M, N,
O, P, R, S, T, V, W (forming words like ASTERIA, BAITERS, RACIEST
...).

Write a program that, given a word list and a cutoff n, finds all 6
letter stems that combine with n or more letters, sorted in order of
how many distinct letters they combine with.

Here is a possible (I hope correct) solution. SATIRE combines only with 16
letters according to my dictionary.

DICT = "/usr/share/dict/words"
CUTOFF = ARGV[0].to_i

STEMS = {}

File.open(DICT) do |f|
f.each do |word|
word.chomp!
next if word.length != 7
word.downcase!
letters = word.split(//).sort!
uniques = letters.uniq
word = letters.join
uniques.each do |letter|
stem = word.sub(/#{letter}/, "")
(STEMS[stem] ||= {})[letter] = 1
end
end
end

result = STEMS.delete_if { |k,v| v.size < CUTOFF }.
sort_by { |k,v| v.size }.
reverse!.
collect! { |k,v| [k, v.size] }

result.each do |stem, combining| puts "#{stem} #{combining}" end
 
G

Glenn Parker

--------------030707030609070201020205
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
by Martin DeMello

In Scrabble parlance, a 'bingo stem' is defined as a set of six
letters that combines with a large fraction of the alphabet to anagram
to valid seven letter words (a 'bingo' is a move using all seven tiles
on your rack). For instance, one of the more prolific stems, SATIRE,
combines with twenty letters, A, B, C, D, E, F, G, H, I, K, L, M, N,
O, P, R, S, T, V, W (forming words like ASTERIA, BAITERS, RACIEST
...).

Write a program that, given a word list and a cutoff n, finds all 6
letter stems that combine with n or more letters, sorted in order of
how many distinct letters they combine with.

Here's my solution. My earlier performance problems resulted from
several very inefficient attempts at generating unique letter
combinations, i.e. list all unique sets of "N letters taken 6 at a
time". Reducing the problem to "7 letters taken 6 at a time" made that
part trivial, but I kept plugging at the more general problem.

The dictionary I used was 2of4brif.txt, available as part of the 12Dicts
package at http://prdownloads.sourceforge.net/wordlist/12dicts-4.0.zip

On my 2GHz P4, I get 5319 stems with a cutoff of 3 in 4.062 seconds. A
version optimized for 7-letter words (not shown) takes barely over 2
seconds.

If I swap line 47 in for line 48, so that all words of 7 letters or more
are processed, I get 167,227 stems with a cutoff of 3 in 2196.453
seconds (~37 min.).

--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/>

--------------030707030609070201020205
Content-Type: text/plain;
name="bingo1.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="bingo1.rb"

#!/usr/bin/env ruby

class Combiner
include Enumerable

def initialize(pick, from)
# How many objects should be returned in each set
@pick = pick
# List of objects from which to produce combinations
@from = from
end

def each
unpicked = @from.length - @pick
# Indices of objects in @from to return as a set.
selectors = (0...@pick).to_a
loop do
yield selectors.collect {|i| @from}
# Advance the selectors array to the next combination.
s = @pick - 1
s -= 1 while selectors == s + unpicked
break if s < 0
value = selectors
s.upto(@pick - 1) do |i|
selectors = (value += 1)
end
end
end
end

class BingoStems
STEMLENGTH = 6

def initialize(filename, cutoff)
@filename = filename
@cutoff = cutoff
@stems = {}
end

def find
start_time = Time.now
# Collect stems for each word in the word list
File.open(@filename) do |file|
file.each do |word|
word.chomp!
# Use the next line alternative to process longer words.
#next if word.length <= STEMLENGTH
next if word.length != (STEMLENGTH + 1)
word.downcase!
Combiner.new(STEMLENGTH, word.split(//).sort).each do |stem|
(@stems[stem.join('')] ||= {})[word] = true
end
end
end
# Discard stems with less than @cutoff word combos
@stems.delete_if {|k, v| v.length < @cutoff}
puts "found #{@stems.length} stems in #{Time.now - start_time} seconds"
self
end

def report(stream = $stdout, verbose = false)
keys = @stems.keys.sort_by {|a| @stems[a].length}
keys.each do |key|
stream.puts "#{key}: #{@stems[key].length}"
next unless verbose
@stems[key].each_key do |word|
stream.puts " " + word
end
end
stream.flush
end

end

CUTOFF = ARGV[0].to_i
DICT = ARGV[1]
BingoStems.new(DICT, CUTOFF).find.report(File.open('output.txt', 'w'), true)

--------------030707030609070201020205--
 
D

Dennis Ranke

Carlos said:
Here is a possible (I hope correct) solution. SATIRE combines only with 16
letters according to my dictionary.

Very simmilar to my solution: (it finds 18 combining letters for SATIRE
using the ENABLE2K word list)

hash = Hash.new {|h, k| h[k] = 0}
File.foreach(ARGV[0] || 'WORD.LST') do |line|
line.strip!
if line.size == 7
letters = line.downcase.scan(/./).sort.join
7.times do |i|
hash[letters[0, i] + letters[(i+1)..-1]] |= 1 << (letters - 97)
end
end
end

cutoff = (ARGV[1] || '15').to_i
count = {}
hash.each do |k, v|
v = (v & 0x5555555) + ((v>>1) & 0x5555555)
v = (v & 0x3333333) + ((v>>2) & 0x3333333)
v = (v & 0xf0f0f0f) + ((v>>4) & 0xf0f0f0f)
v = (v & 0x0ff00ff) + ((v>>8) & 0x0ff00ff)
v = (v & 0x000ffff) + ((v>>16) & 0x000ffff)
count[k] = v if v >= cutoff
end

count.keys.sort_by {|k| count[k]}.each do |letters|
printf "%s: (%d) ", letters, count[letters]
combi = hash[letters]
26.times do |i|
print((i+97).chr) if combi == 1
end
puts
end
 
J

Jannis Harder

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This is my solution for _meaningful_ stems (words in the dictionary):
#Usage: <Dictionary> <Stemsize> <Cutoff>

#On my machine:
#8728 stems
#13969 possible bingos
#1141 stems have more than 6 bingos
#stemsize is 6
#bingosize is 7
#brute force needs 3169957232 comparisons
#I need 226928 hash lookups
#22.270484 sec (searching bingos)
#6.235895 sec (reading dictionary)
#28.506379 sec (total)
#0.00326608375343721 sec / stem

stemsize = (ARGV[1] || 6).to_i
minbingos = (ARGV[2] || 6).to_i

alphabeth=("a".."z").to_a;
stems = []
bingowords = []

results = {}

start1 = Time.new
File.foreach(File.expand_path(ARGV[0]||"~/dict")) do |line|
chomped = line.chomp.downcase
stems << chomped if chomped.size == stemsize
bingowords << chomped if chomped.size == stemsize+1
end

start2 = Time.new

sbingos = bingowords.map{|word| word.split("").sort.join}
sbingohash={}

sbingos.each_index do |index|
sbingohash[sbingos[index]]=bingowords[index]
end



stems.each do |stem|

bingosfound = [];
xbingo = stem.split("")
alphabeth.each do |char|
stbingo = (xbingo+[char]).sort.join
fbingo = sbingohash[stbingo];
bingosfound << fbingo if fbingo
end


if bingosfound.size >= minbingos
results[stem] = bingosfound

end

end
out =""
results.to_a.sort_by{|e|-(e[1].size)}.each do |result|
out << "#{result[0]} #{result[1].size} #{result[1].join","}\n"
end
done = Time.new;

puts "\nOrdered:\n\n"
puts out
puts "#{stems.size} stems\n#{bingowords.size} possible
bingos\n#{results.size} stems have more than #{minbingos} bingos"
puts "stemsize is #{stemsize}\nbingosize is #{stemsize+1}"
puts "brute force needs #{stems.size*alphabeth.size*bingowords.size}
comparisons"
puts "I need #{stems.size*alphabeth.size} hash lookups"
puts "#{(done-start2).to_s} sec (searching bingos)"
puts "#{(start2-start1).to_s} sec (reading dictionary)"
puts "#{(done-start1).to_s} sec (total)"
puts "#{((done-start1).to_f/stems.size).to_s} sec / stem"








Jannis Harder

"jp6iSZmkLp5ISZlEiW5C".unpack("m")[0].unpack("C*").map{|x|x.chr}.join.
unpack("B*")[0].scan(/.{24}/){i=7;$&.scan(/..../){print\
"\e[3#{i-=1};1;40m ";$&.each_byte{|z|print" #"[z-?0,1]*2}};puts"\e[0m"}
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (Darwin)

iD8DBQFBxfY65YRWfc27RzQRAkJXAKCSR8rsHajHIrFORwy1mstEbQ4XOwCeIscl
Y4O3UkTlCLMZnD0F27a21cg=
=aRbb
-----END PGP SIGNATURE-----
 
C

Carlos

Oops... no need to interpolate. This line should be

stem = word.sub(letter, "")

as in Martin DeMello's solution.

Thanks.
 
J

James Edward Gray II

Here's my solution. It doesn't really have any clever optimizations,
but it runs in seconds.

James Edward Gray II

#!/usr/bin/env ruby

unless ARGV.size >= 2 and ARGV[1] =~ /^[1-9]|1\d|2[0-6]$/
puts "Usage: #$0 WORD_LIST_FILE(S) MINIMUM_STEM_LIMIT"
exit
end

$limit = ARGV.pop.to_i
$stems = { }

while line = ARGF.gets
line.chomp!
line.tr!("^a-zA-Z", "")
line.downcase!

next unless line.length == 7

word = line.split("")
word.each_index do |i|
stem = word.dup
stem.delete_at(i)
stem = stem.sort.join

if $stems.include?(stem)
$stems[stem] << word unless $stems[stem].include?(word)
else
$stems[stem] = [ word ]
end
end
end

$stems.each_pair do |stem, letters|
next if letters.size < $limit

puts stem
puts "\t#{letters.size}: #{letters.sort.join}"
end
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,042
Latest member
icassiem

Latest Threads

Top