What's the most ruby-ish way to write this python code?

D

Drew Olson

All -

I've been using ruby for quite some time and I'm only beginning to look
at python. I really like the way ruby flows as compared to python and I
can't see myself getting pulled away from ruby. However, there is one
thing I like about python, and that is the generators. What is the best
way to translate the following code into ruby? Assume word is some
string and letters is a string of all the characters a..z:

[word[0:i]+c+word[i+1:] for i in range(len(word)) for c in letters]

This is all based on trying to efficiently rewrite
http://norvig.com/spell-correct.html in ruby. Essentially what this code
does is get the array of words resulting from inserting every character
at every possible position in the given word. I find it pretty succinct,
but I know ruby can do better! I've come up with two ways to do this in
ruby, but neither seems to "click" with me:

(0...word.size).inject([]) do |words,i|
letters.split('').each do |c|
words << word[0...i]+c+word[i..-1]
end
words
end

OR

(0...words.size).map do |i|
letters.split('').map do |c|
word[0...i]+c+word[i..-1]
end
end.flatten

Any advice? Currenty, I'm using the first approach and it's sloooooow
(I'm assuming inject has high overhead).
 
R

Raf Coremans

2007/5/9 said:
All -

I've been using ruby for quite some time and I'm only beginning to look
at python. I really like the way ruby flows as compared to python and I
can't see myself getting pulled away from ruby. However, there is one
thing I like about python, and that is the generators. What is the best
way to translate the following code into ruby? Assume word is some
string and letters is a string of all the characters a..z:

[word[0:i]+c+word[i+1:] for i in range(len(word)) for c in letters]

This is all based on trying to efficiently rewrite
http://norvig.com/spell-correct.html in ruby. Essentially what this code
does is get the array of words resulting from inserting every character
at every possible position in the given word. I find it pretty succinct,
but I know ruby can do better! I've come up with two ways to do this in
ruby, but neither seems to "click" with me:

(0...word.size).inject([]) do |words,i|
letters.split('').each do |c|
words << word[0...i]+c+word[i..-1]
end
words
end

OR

(0...words.size).map do |i|
letters.split('').map do |c|
word[0...i]+c+word[i..-1]
end
end.flatten

Any advice? Currenty, I'm using the first approach and it's sloooooow
(I'm assuming inject has high overhead).

Both ways seem ruby-esque to me.

Note though that you should use word[i+1..-1] (not word[i..-1]) in
order to get the same result as your python code.

The fastest executing algorithm that I could find was:
ar = []
(0...word.size).each do |i|
letters.split(//).each do |c|
ar << word.dup
ar[-1] = c
end
end

Regards,
Raf
 
D

Drew Olson

Thanks for the response.
Note though that you should use word[i+1..-1] (not word[i..-1]) in
order to get the same result as your python code.

I noticed this just after I posted...good catch :)
The fastest executing algorithm that I could find was:
ar = []
(0...word.size).each do |i|
letters.split(//).each do |c|
ar << word.dup
ar[-1] = c
end
end


This was my suspicion as well. At least to my way of think, this seems
decidedly non-ruby-esque. How is it that map has more overhead than
duplicating the word i*c times?

Another question for the performance folks out there: will I see a
performance gain when I access the resulting words from this method if I
store them in a set rather than an array?
 
J

Jeremy Hinegardner

--u3/rZRmxL6MmkK24
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Well this was find diversion for the evening :)

Original python
[word[0:i]+c+word[i+1:] for i in range(len(word)) for c in letters]


2007/5/9 said:
(0...word.size).inject([]) do |words,i|
letters.split('').each do |c|
words << word[0...i]+c+word[i+1..-1]
end
words
end

The fastest executing algorithm that I could find was:
ar = []
(0...word.size).each do |i|
letters.split(//).each do |c|
ar << word.dup
ar[-1] = c
end
end


I managed to get a few on my machine that were as fast or faster than
what was given so far. One thing that would work in all of the above
code would be to precompute 'letters' into an array before entering any
of the loops. Since that's a known nconstant we can generate that with:

LETTERS = ('a'..'z').to_a

and then use it in everywhere. Replacing the above in Raf and Drew's
solution cut some good time off. I did play with using a Range instead
of an Array for LETTERS, but the array turned out to be more efficient.

The two solutions I came up with that were as fast or faster than what
was already give were:

1) speed up using concat instead of append (fastest I found)

word = "ruby"
ar = []
word.size.times do |x|
ar = ar.concat LETTERS.collect { |l| z = word.dup ; z[x] = l ; z }
end

2) speed up with pre computing the result set, this one is very close to
the time of Raf's using the LETTERS array, but slower than (1)

In this one we generate an array holding the original word and then a
parallel one holding the replacement letter for the first. We then
use integer math to calculate the appropriate index of the letter in
the word in the first array to replace with the letter from the
second array.

word = "ruby"
lsize = LETTERS.size
words = Array.new(word.size * lsize) { word.dup }
replacements = LETTERS * word.size
words.size.times { |i| words[i/lsize] = replacements }

Both of these generate duplicates, and so I played with using a Set
instead of an Array, but in my testing using a Set was more expensive
than using an Array and then calling .uniq! on the result.

Of the two I think (1) is the more rubyesque of what I put together.

I've attached the benchmark script I was playing with to find out what
was the fastest method for folks to have fun with.

enjoy,

-jeremy

--
========================================================================
Jeremy Hinegardner (e-mail address removed)


--u3/rZRmxL6MmkK24
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="gen-words.rb"

#!/usr/bin/env ruby

require 'benchmark'
include Benchmark
LETTERS = ('a'..'z').to_a
LETTERS_S = LETTERS.join("")
WORD = "ruby"


def drew_inject_split
(0...WORD.size).inject([]) do |words,i|
LETTERS_S.split('').each do |c|
words << WORD[0...i]+c+WORD[i+1..-1]
end
words
end
end

def drew_inject_array
(0...WORD.size).inject([]) do |words,i|
LETTERS.each do |c|
words << WORD[0...i]+c+WORD[i+1..-1]
end
words
end
end


def drew_map_split
(0...WORD.size).map do |i|
LETTERS_S.split('').map do |c|
WORD[0..i]+c+WORD[i+1..-1]
end
end.flatten
end

def drew_map_array
(0...WORD.size).map do |i|
LETTERS.map do |c|
WORD[0..i]+c+WORD[i+1..-1]
end
end.flatten
end

def raf_split
ar = []
(0...WORD.size).each do |i|
LETTERS_S.split(//).each do |c|
ar << WORD.dup
ar[-1] = c
end
end
ar
end

def raf_array
ar = []
(0...WORD.size).each do |i|
LETTERS.each do |c|
ar << WORD.dup
ar[-1] = c
end
end
ar
end

def jeremy_1
ar = []
WORD.size.times do |x|
ar = ar.concat LETTERS.collect { |l| z = WORD.dup ; z[x] = l ; z }
end
ar
end

require 'set'
def jeremy_1_set
set = Set.new
WORD.size.times do |x|
set.merge(LETTERS.collect { |l| z = WORD.dup ; z[x] = l ; z })
end
set
end

def jeremy_2
ar = Array.new(WORD.size * LETTERS.size) { WORD.dup }
ar.size.times do |i|
ar[i / LETTERS.size] = LETTERS[i % LETTERS.size]
end
ar
end

def jeremy_3
ar = Array.new(WORD.size * LETTERS.size) { WORD.dup }
lsize = LETTERS.size
ar.size.times do |i|
ar[i / lsize] = LETTERS[i % lsize]
end
ar
end

def jeremy_4
ar = Array.new(WORD.size * LETTERS.size) { WORD.dup }
lsize = LETTERS.size
replacements = LETTERS * WORD.size

ar.size.times do |i|
ar[i / lsize] = replacements
end
ar
end

def jeremy_5
ar = []
w_index = (0...WORD.size).to_a
LETTERS.each do |l|
ar = ar.concat w_index.collect { |i| z = WORD.dup ; z = l ; z }
end
ar
end

def run_benchmark
Benchmark.bmbm(20) do |x|
x.report("Drew 1 - inject/split") { 1000.times { drew_inject_split } }
x.report("Drew 1 - inject/array") { 1000.times { drew_inject_array} }
x.report("Drew 2 - map/split" ) { 1000.times { drew_map_split } }
x.report("Drew 2 - map/array" ) { 1000.times { drew_map_array} }
x.report("Raf - split" ) { 1000.times { raf_split } }
x.report("Raf - array" ) { 1000.times { raf_array} }
x.report("jeremy - 1" ) { 1000.times { jeremy_1 } }
x.report("jeremy - 1 - set " ) { 1000.times { jeremy_1_set } }
x.report("jeremy - 1 - uniq!" ) { 1000.times { jeremy_1.uniq! } }
x.report("jeremy - 2" ) { 1000.times { jeremy_2 } }
x.report("jeremy - 3" ) { 1000.times { jeremy_3 } }
x.report("jeremy - 4" ) { 1000.times { jeremy_4 } }
x.report("jeremy - 4 - uniq!" ) { 1000.times { jeremy_4.uniq! } }
x.report("jeremy - 5" ) { 1000.times { jeremy_5} }
end
end

if __FILE__ == $0
valid = drew_inject_split
[:jeremy_1, :jeremy_1_set, :jeremy_2, :jeremy_3, :jeremy_4, :jeremy_5].each do |m|
j = send(m)
puts "result of #{m} => #{(j - valid).size}"
end
run_benchmark
end


--u3/rZRmxL6MmkK24--
 
R

Robert Klemme

All -

I've been using ruby for quite some time and I'm only beginning to look
at python. I really like the way ruby flows as compared to python and I
can't see myself getting pulled away from ruby. However, there is one
thing I like about python, and that is the generators. What is the best
way to translate the following code into ruby? Assume word is some
string and letters is a string of all the characters a..z:

[word[0:i]+c+word[i+1:] for i in range(len(word)) for c in letters]

This is all based on trying to efficiently rewrite
http://norvig.com/spell-correct.html in ruby. Essentially what this code
does is get the array of words resulting from inserting every character
at every possible position in the given word. I find it pretty succinct,
but I know ruby can do better! I've come up with two ways to do this in
ruby, but neither seems to "click" with me:

(0...word.size).inject([]) do |words,i|
letters.split('').each do |c|
words << word[0...i]+c+word[i..-1]
end
words
end

OR

(0...words.size).map do |i|
letters.split('').map do |c|
word[0...i]+c+word[i..-1]
end
end.flatten

Any advice? Currenty, I'm using the first approach and it's sloooooow
(I'm assuming inject has high overhead).

Since you are splitting letters all the time, I'd start with pulling out
this from the loop. So you could do

LETTERS = "a".."z"

Then, for efficiency reasons, I would not construct the whole thing in
memory but create an Enumerable class for it

WordGen = Struct.new:)word) do
include Enumerable

def each
word.size.times do |idx|
WordGen::LETTERS.each do |chr|
yield word[0...idx] << chr << word[idx..-1]
end
end
self
end

def to_a
map {|x|x}
end
end

WordGen::LETTERS = "a".."z"

Note also, that you can use "<<" with substrings without affecting the
original string. "<<" is more efficient than "+".

Now you can do

wg = WordGen.new "foobar"
wg.each {|w| puts w}
array = wg.to_a
# etc.

Kind regards

robert
 
R

Rick DeNatale

On 09.05.2007 21:02, Drew Olson wrote:

Here's another way to do it:

letters = ('a'..'z').to_a
word = 'word'

ans = []
(0..word.size).each do |i|
letters.each do |c|
ans << word.sub(/.{#{i}}/, "\\0#{c}")
end
end

print ans.join("\n")

or as an enumerator:

class Wordgen
include Enumerable

attr_accessor :word

Letters = ('a'..'z').to_a

def initialize(word)
self.word = word
end

def each
(0..word.size).each do |i|
Letters.each do |c|
yield word.sub(/.{#{i}}/, "\\0#{c}")
end
end
end
end

Wordgen.new("word").each do |ans|
p ans
end
 
D

Drew Olson

Rick said:
Here's another way to do it:

Thanks so much for all the responses, very insightful. If anyone wants
to take a crack at rewriting http://norvig.com/spell-correct.html in
ruby, go for it. I'll post my current code later this evening (which is
dog slow) and then I'll begin updating it based on the feedback I've
received.

Thanks again,
Drew
 
G

Gordon Thiesfeld

Have a look at this:

I'm sorry, I just noticed you participated in that thread. So, I
guess, don't bother :)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top