Beautiful Code : Pity he didn't ask here...

John Carter · Jul 10, 2007

Just got the O'Reilly announcement of the book "Beautiful
Code"... here is the sample chapter by Tim Bray on Ruby!

http://www.oreilly.com/catalog/9780596510046/chapter/ch04.pdf

Pity he didn't run his examples by this forum first, I'm sure we could
have made them more beautiful..

Well, maybe for the 2nd edition, heres my bash at cleaning it up...

For example scanning a log file.....

EXAMPLE 4-2 . Printing article names
1 ARGF.each_line do |line|
2 if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
3 puts $1
4 end
5 end

Since most Linux distro's roll the log files to a reasonable size I would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
puts $1
end

In Example 4.3 he has...
1 counts = {}
2 counts.default = 0

Personally I prefer...
counts = Hash.new(0)
or
counts = Hash.new{|hash,key| hash[key]=0}
buts that's personal preference I guess.

In example 4.4 he rightly identifies line 10 as a little ugly and bemoans the lack of sort_by_value in hash.
10 keys_by_count = counts.keys.sort { |a, b| counts <=> counts[a] }
it seems he doesn't know about..

count.keys.sort_by{|key| count[key]}

Of course he could have done...

class Hash
def sort_keys_by_value
keys.sort_by{|key| fetch(key)}
end
end

Example 4.5 looks like a poster child for the Hash.new block approach...
4 @hash = {}

James Edward Gray II · Jul 10, 2007

Well, maybe for the 2nd edition, heres my bash at cleaning it up...

For example scanning a log file.....

EXAMPLE 4-2 . Printing article names
1 ARGF.each_line do |line|
2 if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/
[^ .]+) }
3 puts $1
4 end
5 end

Since most Linux distro's roll the log files to a reasonable size I
would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/
[^ .]+) }) do |match|
puts $1
end

Hmm, I don't think we should slurp when we don't need too. There's
no reason to be wasteful.

I really liked the rest of your fixes though.

James Edward Gray II

Ryan Davis · Jul 10, 2007

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/
[^ .]+) }) do |match|
puts $1
end

Click to expand...

Hmm, I don't think we should slurp when we don't need too. There's
no reason to be wasteful.

Actually there is plenty of reason to be wasteful:

1) We can.
2) It leads to code that reads like the above.
3) We can!
4) It is only a short hop to:

puts ARGF.read.scan( %r%GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/
\d\d/[^ .]+) %).join("\n")

5) We can!!
6) It is our moral obligation to use our oft-idle and underused
computers to their fullest.

There was a 7th... but I forgot what it was.

Martin DeMello · Jul 10, 2007

Since most Linux distro's roll the log files to a reasonable size I would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
puts $1
end

ARGF.read.scan( %r{GET /ongoing/When/\d{3}x/(\d#{4}/\d#{4}/[^ .]+) }) do |match|
puts $1
end

martin

Robert Dober · Jul 10, 2007

Since most Linux distro's roll the log files to a reasonable size I would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
puts $1
end

Click to expand...

I am with James do not slurp in the whole string it just does not make
sense to me. I would feel very uneasy using regexps on large strings
even if right now there seems no backtracking issues to be present,
but code evolves sometimes

ARGF.read.scan( %r{GET /ongoing/When/\d{3}x/(\d#{4}/\d#{4}/[^ .]+) }) do |match|

Nope Martin read the original code again!! All the / and \

.
I was about to fall into the same trap.
As a matter of fact I miss all beauty in that code

The regexp is
just ugly, some ideas for cosmetic surgery

def dig_dirs *args
Regexp.new args.map{|n| "\d" * n}.join("/")
end

puts ARGF.map { |line|
line =~ %r{ GET /ongoing/When/\d{3}x/(#{dig_dirs 4, 2, 2}/[^ .])}
$1
}.compact

But beauty lies to everybody (or was that "in the eyes of the beholder"? ).

Cheers
Robert

Robert Dober · Jul 10, 2007

Just got the O'Reilly announcement of the book "Beautiful

In Example 4.3 he has...
1 counts = {}
2 counts.default = 0

Personally I prefer...
counts = Hash.new(0)
or
counts = Hash.new{|hash,key| hash[key]=0}
buts that's personal preference I guess.

In example 4.4 he rightly identifies line 10 as a little ugly and bemoans the lack of sort_by_value in hash.
10 keys_by_count = counts.keys.sort { |a, b| counts <=> counts[a] }
it seems he doesn't know about..

count.keys.sort_by{|key| count[key]}

Of course he could have done...

class Hash
def sort_keys_by_value
keys.sort_by{|key| fetch(key)}
end
end

or

tops = counts.sort_by{|*kv| kv.last}.reverse[0..9]
puts tops.map{ |k,v| "#{v}: #{k}"}

#each is just a primitive for Enumerables, right?

Example 4.5 looks like a poster child for the Hash.new block approach...
4 @hash = {}
.
.

9 if @hash[s[0]]
10 @hash[s[0]] << [ s[1], article ]
11 else
12 @hash[s[0]] = [ s[1], article ]
13 end

Replace that with...

4 @hash = Hash.new{|hash,key| hash[key] = []}

10 @hash[s[0]] << [ s[1], article ]

That makes it real pretty code.

Click to expand...

Save for

File.open(some_name).each_line

Let the GC close the file!? It is not going to happen !!

<snip>
Robert

Daniel Martin · Jul 10, 2007

John Carter said:
Just got the O'Reilly announcement of the book "Beautiful
Code"... here is the sample chapter by Tim Bray on Ruby!
...

For example scanning a log file.....

Since most Linux distro's roll the log files to a reasonable size I would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
puts $1
end

As James said, I'm going to have to disagree on this, though I'd clean
it up to:

article_pattern = \
%r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
ARGF.each_line do |line|
line.scan(article_pattern) do |article,|
puts article
end
end

Look, no global variables!

I agree with your use of Hash.new(0), but disagree that it's mere
personal preference - it's much more common to use even the single
argument form in the constructor than to use default=.

It is odd that he seems unaware of Array.sort_by - anyone know when
this was added to Ruby core? The book could be rather out of date.

Example 4.5 looks like a poster child for the Hash.new block approach...

Agreed. I wonder if perhaps he's spent more time in python or some
other dynamic language than Ruby.

In fact, I can't find any pretty reason to have the class BigHash at
all....looks like his Java roots are showing through.

Yeah; in Java, you can't have code existing outside of a class, and I
suspect that's a big part of it.

On the other hand both have the classic bug that (high + low) or
(l+u) overflows on large arrays...

Actually, his code doesn't: java arrays can only have 2**31 entries
(since their index is a java int), and java ints are always signed.
Therefore, any overflow in the addition in java yields a negative
integer, and >>> in java does right shift without sign extension.
(That is, with 0s shifted in at the left).

The glibc code does have that problem, but only on arrays that have
more than (maximum size_t)/2 elements.

dblack · Jul 10, 2007

Hi --

In Example 4.3 he has...
1 counts = {}
2 counts.default = 0

Personally I prefer...
counts = Hash.new(0)
or
counts = Hash.new{|hash,key| hash[key]=0}
buts that's personal preference I guess.

Yes, but be careful: your two lines don't do the same thing as each
other:

irb(main):007:0> counts = Hash.new(0)
=> {}
irb(main):008:0> counts["hi"]
=> 0
irb(main):009:0> counts
=> {}
irb(main):010:0> counts = Hash.new{|hash,key| hash[key]=0}
=> {}
irb(main):011:0> counts["hi"]
=> 0
irb(main):012:0> counts
=> {"hi"=>0}

In example 4.4 he rightly identifies line 10 as a little ugly and bemoans the
lack of sort_by_value in hash.
10 keys_by_count = counts.keys.sort { |a, b| counts <=> counts[a] }
it seems he doesn't know about..

count.keys.sort_by{|key| count[key]}

Of course he could have done...

class Hash
def sort_keys_by_value
keys.sort_by{|key| fetch(key)}
end
end

It's best not to get into changing the core language, though, if the
book isn't going to discuss the benefits and pitfalls of doing so in
depth.

David

--
* Books:
RAILS ROUTING (new! http://www.awprofessional.com/title/0321509242)
RUBY FOR RAILS (http://www.manning.com/black)
* Ruby/Rails training
& consulting: Ruby Power and Light, LLC (http://www.rubypal.com)

Christoffer Sawicki · Jul 10, 2007

Hello.

count.keys.sort_by{|key| count[key]}

IMHO, the most elegant way to do this is:

hash.sort_by { |k, v| v }.map { |k, v| k }

...where k = key and v = value.

Cheers,

Gregory Brown · Jul 10, 2007

Hello.

count.keys.sort_by{|key| count[key]}

Click to expand...

IMHO, the most elegant way to do this is:

hash.sort_by { |k, v| v }.map { |k, v| k }

...where k = key and v = value.

Cheers,

Click to expand...

Ack, pet peeve.

hash.sort_by { | key, value | value } .map { | key, value | key }

If you have to specify "where k = key and v = value" then these
should have been used in your code.

Always favor readability at the expense of verbosity, both your
future self and whoever else maintains your code will thank you.

I find myself always using |k,v|, when used on something hashlike I
don't think readability suffers.

James Edward Gray II · Jul 10, 2007

Hello.

count.keys.sort_by{|key| count[key]}

IMHO, the most elegant way to do this is:

hash.sort_by { |k, v| v }.map { |k, v| k }

...where k =3D key and v =3D value.

Cheers,

Click to expand...

Ack, pet peeve.

hash.sort_by { | key, value | value } .map { | key, value | key }

If you have to specify "where k =3D key and v =3D value" then these
should have been used in your code.

Always favor readability at the expense of verbosity, both your
future self and whoever else maintains your code will thank you.

Click to expand...

I find myself always using |k,v|, when used on something hashlike I
don't think readability suffers.

I've also recently adopted the trick of using _ as an unused =20
parameter name. I believe it was Ara that first suggested this and I =20=

think it's a great idea:

hash.sort_by { |key, _| =85 }=85

James Edward Gray II=

Daniel Martin · Jul 10, 2007

Robert Dober said:
Save for

File.open(some_name).each_line

Let the GC close the file!? It is not going to happen !!

Good point. In many ways, it feels that the author is not as
comfortable with Ruby blocks as one should be. A better version,
since we also don't need a separate class:

def get_big_hash(filename)
hash = Hash.new {|h,k| h[k]=[]}
lines = 0
File.open(filename) do |file|
file.each_line do |line|
s = line.split
article = s[2].intern
hash[s[0]] << [ s[1], article ]
lines += 1
STDERR.puts "Line: #{lines}" if (lines % 100_000) == 0
end
end
return hash
end

I really like the block-using form of File.open. It makes me feel all
nice and safe like being wrapped in with-open-file or a carefully
constructed dynamic-wind.

Paul Battley · Jul 10, 2007

T24gMTAvMDcvMDcsIEphbWVzIEVkd2FyZCBHcmF5IElJIDxqYW1lc0BncmF5cHJvZHVjdGlvbnMu
bmV0PiB3cm90ZToKPiBJJ3ZlIGFsc28gcmVjZW50bHkgYWRvcHRlZCB0aGUgdHJpY2sgb2YgdXNp
bmcgXyBhcyBhbiB1bnVzZWQKPiBwYXJhbWV0ZXIgbmFtZS4gIEkgYmVsaWV2ZSBpdCB3YXMgQXJh
IHRoYXQgZmlyc3Qgc3VnZ2VzdGVkIHRoaXMgYW5kIEkKPiB0aGluayBpdCdzIGEgZ3JlYXQgaWRl
YToKPgo+ICAgIGhhc2guc29ydF9ieSB7IHxrZXksIF98IOKApiB94oCmCgpJIHRob3VnaHQgaXQg
Y2FtZSBmcm9tIEhhc2tlbGwsIHdoZXJlIF8gaXMgYSB3aWxkY2FyZCBwYXJhbWV0ZXI6CgpQcmVs
dWRlPiBsZXQgcG93ZXIgXyAwID0gMQpQcmVsdWRlPiBwb3dlciA0IDAKMQpQcmVsdWRlPiBwb3dl
ciAxMDAgMAoxClByZWx1ZGU+IHBvd2VyIDQgMgoqKiogRXhjZXB0aW9uOiA8aW50ZXJhY3RpdmU+
OjE6NC0xNjogTm9uLWV4aGF1c3RpdmUgcGF0dGVybnMgaW4gZnVuY3Rpb24gcG93ZXIKUHJlbHVk
ZT4gbGV0IHBvd2VyIGEgYiA9IGEgKiogYgpQcmVsdWRlPiBwb3dlciA0IDIKMTYuMAoKLi4uIGhv
d2V2ZXIsIEkgd291bGRuJ3QgYmUgc3VycHJpc2VkIHRvIGZpbmQgdGhhdCBIYXNrZWxsIHRvb2sg
aXQgZnJvbQpzb21lIG90aGVyIGxhbmd1YWdlLgoKUGF1bC4K

James Edward Gray II · Jul 10, 2007

I thought it came from Haskell, where _ is a wildcard parameter:

I meant that I believe it was Ara who recommended it as a good name =20
for an unused block parameter in Ruby.

James Edward Gray II=

Tim Bray · Jul 10, 2007

Just got the O'Reilly announcement of the book "Beautiful
Code"... here is the sample chapter by Tim Bray on Ruby!

http://www.oreilly.com/catalog/9780596510046/chapter/ch04.pdf

Pity he didn't run his examples by this forum first, I'm sure we could
have made them more beautiful..

Yep, mostly because for some reason I didn't know about sort_by. I
wonder how I missed that? Obviously, the main point of my article
was to encourage people to use the idiom of reading lines, picking
'em apart with a regex, and accumulating in a hash, invented in awk,
brought to the world in perl, perfected in Ruby.

sort_by aside, I'm not convinced that any of the alternatives to my
main loop are more beautiful or readable. I personally think that
the reason Ruby will enjoy an ever increasing market share in the
programming-languages space is because it's more readable.

I'm looking at the following...

EXAMPLE 4-2 . Printing article names
1 ARGF.each_line do |line|
2 if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/
[^ .]+) }
3 puts $1
4 end
5 end

Since most Linux distro's roll the log files to a reasonable size I
would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/
[^ .]+) }) do |match|
puts $1
end

Really? Considering most of the readership is new to Ruby?

In Example 4.3 he has...
1 counts = {}
2 counts.default = 0

Personally I prefer...
counts = Hash.new(0)
or
counts = Hash.new{|hash,key| hash[key]=0}
buts that's personal preference I guess.

Gosh, I think I totally disagree on readability grounds.

-Tim

Tim Bray · Jul 10, 2007

As James said, I'm going to have to disagree on this, though I'd clean
it up to:

article_pattern = \
%r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
ARGF.each_line do |line|
line.scan(article_pattern) do |article,|
puts article
end
end

Look, no global variables!

Yep, much better, I wish I'd done it that way.

I agree with your use of Hash.new(0), but disagree that it's mere
personal preference - it's much more common to use even the single
argument form in the constructor than to use default=.

Why? default= makes it more obvious what you're doing. I think it's
better practice.

Agreed. I wonder if perhaps he's spent more time in python or some
other dynamic language than Ruby.

Perl

-Tim

Chad Perrin · Jul 10, 2007

In Example 4.3 he has...
1 counts = {}
2 counts.default = 0

Personally I prefer...
counts = Hash.new(0)
or
counts = Hash.new{|hash,key| hash[key]=0}
buts that's personal preference I guess.

Click to expand...

Gosh, I think I totally disagree on readability grounds.

Actually, I think this:

counts = Hash.new(0)

. . is perfectly readable, and a bit more elegant than the two-line
version from "Example 4.3". On the other hand, this bit looks pretty
ugly to me:

counts = Hash.new{|hash,key| hash[key]=0]}

. . especially since it looks like someone is afraid he has a limited
number of space characters to use, and doesn't want to run out. Even
with some spaces added to it, though, it's not as pretty and readable.

Caleb Powell · Jul 10, 2007

Yep, mostly because for some reason I didn't know about sort_by. I
wonder how I missed that? O

Perhaps because sort_by is defined in Enumerable, not in the Array
class. I wish that the Ruby documentation inlined methods like these
from Mixin classes (similar to how Javadoc displays a parent class'
methods). Is this possible when generating using rdoc?

Ryan Davis · Jul 10, 2007

Always favor readability at the expense of verbosity, both your =20
future self and whoever else maintains your code will thank you.

I won't. k and v work Just Fine=99 for me and whomever else I work =20
with. Verbosity for verbosity's sake, esp when it pushes outside of =20
common and accepted idioms, is a form of mentarbation.

degei · Jul 11, 2007

I meant that I believe it was Ara who recommended it as a good name
for an unused block parameter in Ruby.

James Edward Gray II

Just as an aside, I believe Erlang adopts this style as a part of this
language, where it will warn if you have unused variables but allows
you to annotate them with _ (ex. _var) to denote that they will not be
used, or just replace the variable name with _ thus eliminating the
warning.

(I don't know Erlang yet, just remembered reading this in
http://pragdave.pragprog.com/pragdave/2007/04/a_first_erlang_.html)

Rubyisms wanted to shorten code in search program	11	Dec 4, 2007
Hash of Hash of Arrays Question	5	Jul 18, 2009
[SUMMARY] Getting to 100 (#119)	0	Apr 12, 2007
feedback on code design	23	May 30, 2012
my ruby code won't go as fast as my perl code	11	Jul 15, 2004
[SUMMARY] Shirt Reader (#140)	1	Sep 27, 2007
[SUMMARY] String Equations (#112)	0	Feb 8, 2007
Runtime disparity - Same program in Perl and Ruby	5	Jun 14, 2007

Beautiful Code : Pity he didn't ask here...

John Carter

James Edward Gray II

Ryan Davis

Martin DeMello

Robert Dober

Robert Dober

Daniel Martin

dblack

Christoffer Sawicki

Gregory Brown

James Edward Gray II

Daniel Martin

Paul Battley

James Edward Gray II

Tim Bray

Tim Bray

Chad Perrin

Caleb Powell

Ryan Davis

degei

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads