Help me understand why the Ruby block is slower than without

A

Alan Burch

I just wrote my first Ruby script. I'm an experienced C and perl
programmer, so please, if it looks too much like these languages and not
Ruby, let me know. I've got a 100K word list (Linux dictionary) on my
Mac and am opening it then looking for any words that are exactly 10
letters long with no letters repeating ('profligate\n') == 11 is a
match. After I wrote my first version I did some playing. I first saw
that the array class mixed in enumerable and that I could use the to_a
call from there, but a quick check using -r profile showed that my
original call to split was a much quicker way to convert from a string
to an array. I then tried putting the File.open in a block and found
that this was much slower, even if I subtract out the time for the open,
which I assume is an error in how the profile is counting total time.

Here's the faster version:

f = File.open("./words")
begin
while f.gets
if $_.length == 11
ar = $_.split(//)
if ar.uniq! == nil
print "#{ar.to_s}"
end
end
end
rescue EOFError
f.close
end

And here's the slower block version:

File.open("./words") { |f|
while f.gets
if $_.length == 11
ar = $_.split(//)
if ar.uniq! == nil
print "#{ar.to_s}"
end
end
end
}

Again, the words file is just a list of about 100K unique words from the
dict command or similar on *nix....

Any critique welcome and enlightenment is encouraged.
Thanks!
 
W

William James

Alan said:
I just wrote my first Ruby script. I'm an experienced C and perl
programmer, so please, if it looks too much like these languages and not
Ruby, let me know. I've got a 100K word list (Linux dictionary) on my
Mac and am opening it then looking for any words that are exactly 10
letters long with no letters repeating ('profligate\n') == 11 is a
match. After I wrote my first version I did some playing. I first saw
that the array class mixed in enumerable and that I could use the to_a
call from there, but a quick check using -r profile showed that my
original call to split was a much quicker way to convert from a string
to an array. I then tried putting the File.open in a block and found
that this was much slower, even if I subtract out the time for the open,
which I assume is an error in how the profile is counting total time.

Here's the faster version:

f = File.open("./words")
begin
while f.gets
if $_.length == 11
ar = $_.split(//)
if ar.uniq! == nil
print "#{ar.to_s}"
end
end
end
rescue EOFError
f.close
end

And here's the slower block version:

File.open("./words") { |f|
while f.gets
if $_.length == 11
ar = $_.split(//)
if ar.uniq! == nil
print "#{ar.to_s}"
end
end
end
}

Again, the words file is just a list of about 100K unique words from the
dict command or similar on *nix....

Any critique welcome and enlightenment is encouraged.
Thanks!


File.open("wordlist") { |f|
while w = f.gets
puts w if w.size==11 && w.split(//).uniq.size == 11
end
}
 
A

Alan Burch

File.open("wordlist") { |f|
while w = f.gets
puts w if w.size==11 && w.split(//).uniq.size == 11
end
}

Ok, factor of 10 faster, and more Ruby like, much and many Thanks!
Others, any comments on the block slow down?
AB
 
A

Alan Burch

Ok, factor of 10 faster, and more Ruby like, much and many Thanks!
Others, any comments on the block slow down?
AB

I mis-spoke. Not a factor of 10 faster, just marginally. I had
"wordlist" in my directory as a list of the unique 10 letter words.
I do like the code better still, but with out the block, it's still much
faster. Also using uniq! rather than size is quicker than taking the
size twice.

My current fastest script:

f= File.open("./words")
begin
while w = f.gets
puts w if w.size == 11 && w.split(//).uniq! == nil
end
rescue EOFError
f.close
end

Not measurably faster than the first one, but seems better and more Ruby
like to me.
 
M

Mark Devlin

Alan said:
I mis-spoke. Not a factor of 10 faster, just marginally. I had
"wordlist" in my directory as a list of the unique 10 letter words.
I do like the code better still, but with out the block, it's still much
faster. Also using uniq! rather than size is quicker than taking the
size twice.

Solely for my own amusement, since I'm still trying teach myself Ruby...

File.open("./words").read.split.collect! {|x| x if x.length == 10 &&
x.split(//).uniq! == nil}.compact!.each {|x| puts x }
 
G

George Ogata

Alan Burch said:
Ok, factor of 10 faster, and more Ruby like, much and many Thanks!
Others, any comments on the block slow down?

I don't see much of a slowdown.

----------------------------------------------------------------------

g@crash:~/tmp$ cat read-slow.rb
File.open("./words") { |f|
while f.gets
if $_.length == 11
ar = $_.split(//)
if ar.uniq! == nil
print "#{ar.to_s}"
end
end
end
}
g@crash:~/tmp$ /usr/bin/time ruby read-slow.rb > out-slow
2.56user 0.01system 0:02.64elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+550minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-slow.rb > out-slow
2.55user 0.01system 0:02.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+550minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-slow.rb > out-slow
2.54user 0.01system 0:02.56elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+550minor)pagefaults 0swaps
g@crash:~/tmp$ cat read-fast.rb
f = File.open("./words")
begin
while f.gets
if $_.length == 11
ar = $_.split(//)
if ar.uniq! == nil
print "#{ar.to_s}"
end
end
end
rescue EOFError
f.close
end
g@crash:~/tmp$ /usr/bin/time ruby read-fast.rb > out-fast
2.51user 0.01system 0:02.54elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+544minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-fast.rb > out-fast
2.50user 0.01system 0:02.56elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+544minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-fast.rb > out-fast
2.51user 0.01system 0:02.53elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+544minor)pagefaults 0swaps

----------------------------------------------------------------------

There's a bit of a slowdown, but note that in your "fast" algo, the
stream is never closed, since IO#gets never throws EOFError. Do `ri
IO#gets' for the method's documentation. :)

Another speedup: replace:

w.split(//).uniq.size == 11

with:

w !~ /(.).*\1/

It's faster since there's less intermediate diddlage, but
theoretically it shouldn't scale as well. You'd have to increase your
"11" quite a lot to notice it though I think.

More shell dump.

----------------------------------------------------------------------

g@crash:~/tmp$ cat read-one.rb
File.open("words") { |f|
while w = f.gets
puts w if w.size==11 && w.split(//).uniq.size == 11
end
}
g@crash:~/tmp$ /usr/bin/time ruby read-one.rb > out-one
2.54user 0.02system 0:02.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+548minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-one.rb > out-one
2.54user 0.01system 0:02.56elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+548minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-one.rb > out-one
2.55user 0.01system 0:02.58elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+548minor)pagefaults 0swaps
g@crash:~/tmp$ cat read-two.rb
File.open("words") { |f|
while w = f.gets
puts w if w.size==11 && w !~ /(.).*\1/
end
}
g@crash:~/tmp$ /usr/bin/time ruby read-two.rb > out-two
1.23user 0.01system 0:01.25elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+713minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-two.rb > out-two
1.27user 0.01system 0:01.29elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+713minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-two.rb > out-two
1.27user 0.02system 0:01.30elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+713minor)pagefaults 0swaps
g@crash:~/tmp$
g@crash:~/tmp$
g@crash:~/tmp$ diff out-one out-two
g@crash:~/tmp$

----------------------------------------------------------------------
 
A

Alan Burch

Mark said:
Solely for my own amusement, since I'm still trying teach myself Ruby...

File.open("./words").read.split.collect! {|x| x if x.length == 10 &&
x.split(//).uniq! == nil}.compact!.each {|x| puts x }

Mark:
Thanks for doing this way. I had thought about trying to read it in and
split it up, but didn't know how to do it as I've only read a bit of the
pikaxe book. On my two processor G5 with 4 gb of memory, your version
is about 30% slower than the fastest method above. 18.69 vs 12.52
seconds. I intend to look a bit closer at your code and see if I can
see another way to speed it up.

Gary:
Thanks for your input also. I saw the redundancy when William James
gave me input, but really don't fully understand arrays vs strings in
Ruby yet and also the differences in print vs puts and other types of
output. I'll read through pikaxe a bit more right now.

Others:
Any input as to why it runs slower inside the file block? Have I
overlooked something simple?
 
D

dblack

Hi --

Also, calling to_s to convert an object to a string within a string
interpolation
block is redundant.

print "#{ar}"

works and then you realize that you don't need the interpolation so

print ar

is even better. Understanding this is what David Black called a
'Ruby right of passage'. At least I think it was David who said that
recently. I'm too lazy to google for the reference at the moment.

You're right but spelled rite wrong, Wright :)


David

--
David A. Black ([email protected])
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

"Ruby for Rails" chapters now available
from Manning Early Access Program! http://www.manning.com/books/black
 
A

Alan Burch

George said:
Another speedup: replace:

w.split(//).uniq.size == 11

with:

w !~ /(.).*\1/

It's faster since there's less intermediate diddlage, but
theoretically it shouldn't scale as well. You'd have to increase your
"11" quite a lot to notice it though I think.

George:
Much thanks, I think that you've proved what I suspected, that Ruby is
counting the time wrong with the profile (ruby -r profile script.rb) as
when I subtract the profile time for the File.open block it's only a bit
slower than the faster call. I appreciate all the help and will try to
ask a more difficult question next time.
I've always been fairly strong with regexes, but I'd have never thought
to use one here. Thanks for that as well.

David:
Thanks for chiming in, I'll check out your links as well.

Alan
 
B

Bill Kelly

Hi,

From: "Mark Devlin said:
Solely for my own amusement, since I'm still trying teach myself Ruby...

File.open("./words").read.split.collect! {|x| x if x.length == 10 &&
x.split(//).uniq! == nil}.compact!.each {|x| puts x }

One detail here is the file handle is not being closed. A few alternatives
that close the file:

# open with block
File.open("./words"){|f| f.read.split.collect! {|x| x if x.length == 10 && x.split(//).uniq! == nil}.compact.each {|x| puts x } }

# File.read method
File.read("./words").split.collect! {|x| x if x.length == 10 && x.split(//).uniq! == nil}.compact.each {|x| puts x }

# IO.readlines method
IO.readlines("./words").collect! {|x| x if x.length == 11 && x.split(//).uniq! == nil}.compact.each {|x| puts x }

Note, used length 11 because readlines keeps linefeeds; also changed all to
non-bang form of compact, as compact! would return nil if it didn't do any work.
(I.e. if all words in the input satisfied the criteria, collect! would have returned
nil, and we'd have gotten a NoMethodError: undefined method `each' for nil:NilClass.)


Regards,

Bill
 
S

Stephen Waits

--Apple-Mail-3-537470167
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed


My current fastest script:

f= File.open("./words")
begin
while w = f.gets
puts w if w.size == 11 && w.split(//).uniq! == nil
end
rescue EOFError
f.close
end

Not measurably faster than the first one, but seems better and more
Ruby
like to me.

I'm curious why you see it so? Personally, seems less Ruby-like to me.

--Steve


--Apple-Mail-3-537470167
content-type: application/pgp-signature; x-mac-type=70674453;
name=PGP.sig
content-description: This is a digitally signed message part
content-disposition: inline; filename=PGP.sig
content-transfer-encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)

iD8DBQFEElo2nnOWIarjHBkRAk2tAJ9kWfQ6/DCBzS1yIhJGvvYLBgqRKACfYFqu
HEIG8R/MiKLk5cM9rxzWerE=
=qJM9
-----END PGP SIGNATURE-----

--Apple-Mail-3-537470167--
 
A

Alan Burch

Stephen said:
I'm curious why you see it so? Personally, seems less Ruby-like to me.

--Steve

Steve:
Let me be the first to say that I certainly don't understand Ruby idiom
yet, that's why I'm using the word seems in my earlier reply. I may
feel quite different about the newer code in a few days... I like it
better because it's more succinct as there're fewer intermediate steps
and because I can see what it's doing quite quickly. My code looks a
bit like that K&R C that I learned in the early 80s. You can even see
my OTB style and probably guess that I still use vi(m) :). I'd rather
learn the Ruby idiom and that's part of what I was asking here.
Learning that 'puts' doesn't throw the EOF as I was expecting, was an
important lesson and so was the rite of passage that Mr Wright pointed
out. The Ruby language feels good to me, and I'd like it to feel as
comfortable as C does to me, so thanks again for all the kind help.

Bill:
Thanks for adding a bit more to reading it all in and processing script.

I like code that's quite readable and will take that over code that's
faster but not so readable in any case where I can get away with it.
That said, I still really like the regex that checks to see if the
string is unique. No conversion to an array and no need to re-check the
size of it all so the current piece of code I like best is:

f= File.open("./words").each { |w|
puts w if w.size == 11 && w !~ /(.).*\1/
}

It's also the fastest that I've tested. I would, however, add a comment
to explain what the regex does so I wouldn't have to stare at it for a
minute or two to figure it out after a few months away.
Alan
 
B

Benjohn Barnes

--Apple-Mail-14-557431005
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
charset=US-ASCII;
delsp=yes;
format=flowed


Another speedup: replace:

w.split(//).uniq.size == 11

with:

w !~ /(.).*\1/

!? :) How on earth does that work? Every time I think I've sort of
got the hang of regexp, they spring something new on me.

I was also going to ask why everyone was doing "split( // )" instead
of "split( '' )"?

- oooh, coffee's ready...

Cheers,
Benjohn
--Apple-Mail-14-557431005--
 
J

James Edward Gray II

File.open("wordlist") { |f|
while w = f.gets
puts w if w.size==11 && w.split(//).uniq.size == 11
end
}

That's what the foreach() iterator is for:

File.foreach("wordlist") do |word|
puts word if word.chomp.split("").uniq.size == 10
end

James Edward Gray II
 
J

James Edward Gray II

I first saw
that the array class mixed in enumerable and that I could use the to_a
call from there, but a quick check using -r profile showed that my
original call to split was a much quicker way to convert from a string
to an array.

This sounds like premature optimization. Remember, you start
worrying about speed when the code gets too slow. Not before.

James Edward Gray II
 
A

Alan Burch

Benjohn said:
!? :) How on earth does that work? Every time I think I've sort of
got the hang of regexp, they spring something new on me.

I was also going to ask why everyone was doing "split( // )" instead
of "split( '' )"?

- oooh, coffee's ready...

Cheers,
Benjohn

Benjohn:
A . matches any char except the \n, putting it in (), makes it save in
\1 each time it matches, the .* matches zero or more chars that's not a
\n, so if we try to match a string such as "profligate\n" the regex
would first look for (p).*\p, with the second p being anywhere in the
string then (r).*(r), etc. A string with a repeating set of letters
"wristwatch\n" matches (w)rist\w and returns a match. I highly
recommend O'reilly's "Mastering Regular Expressions", I've only read the
first edition, but it's an eye-opener (or maybe the opposite if you try
to read it in bed :))

A note for those following along in the DOS world, the dos string ends
\r\n and won't return the expected result as a matching DOS string will
need to be 12 long. This sacrifices portability for speed (I didn't
want to use chop after each gets).

As to split, I just used what I'm used to from perl. It's an empty
pattern and makes sense to me that way.
Alan
 
A

Alan Burch

James said:
This sounds like premature optimization. Remember, you start
worrying about speed when the code gets too slow. Not before.

James Edward Gray II

James:
I'm going to have to respectfully disagree. I guess maybe I'm getting
to old to code, but I first learned assembler and then C. Assembler
served me well in that I knew how to write the fastest, least resource
intensive C. Back in the early 80s on a VAX running V7 UNIX that was
more important than maintainability. As I've continued my craft and
learned many other languages, I've found that truly understanding what's
happening "under the hood" of any language was the key to writing code
that didn't break, executed quickly, and kept clients happy.
Furthermore, there's something in me that makes me better love the
language when I completely master it. I don't believe the language is
mastered until one understands things such as why one construct executes
quicker than another. I can now picture how a mix-in works and why
calling the to_a mix-in is a slower construct. I don't understand all
the nuances of that yet, but I intend to and that will make Ruby that
much more enjoyable to me.
Thanks for a different insight,
Alan
 
A

Alan Burch

James said:
That's what the foreach() iterator is for:

File.foreach("wordlist") do |word|
puts word if word.chomp.split("").uniq.size == 10
end

James Edward Gray II

James:
This code doesn't work on my Mac. I do have a version that uses the
file block and each/foreach above, but I'm suspecting that when the
string becomes an array after the split something's breaking down as I
get words of all sizes out???
Thanks,
Alan
 
J

James Edward Gray II

This code doesn't work on my Mac.

Something is fishy there, for it works just fine on my own Mac:

Neo:~/Desktop$ ls
tens.rb wordlist
Neo:~/Desktop$ cat wordlist
one
two
three
0123456789
five
0123456789
Neo:~/Desktop$ cat tens.rb
#!/usr/local/bin/ruby -w

File.foreach("wordlist") do |word|
puts word if word.chomp.split("").uniq.size == 10
end

__END__
Neo:~/Desktop$ ruby tens.rb
0123456789
0123456789

James Edward Gray II
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,048
Latest member
verona

Latest Threads

Top