Beautiful Code : Pity he didn't ask here...

J

John Carter

Just got the O'Reilly announcement of the book "Beautiful
Code"... here is the sample chapter by Tim Bray on Ruby!

http://www.oreilly.com/catalog/9780596510046/chapter/ch04.pdf

Pity he didn't run his examples by this forum first, I'm sure we could
have made them more beautiful..

Well, maybe for the 2nd edition, heres my bash at cleaning it up...

For example scanning a log file.....

EXAMPLE 4-2 . Printing article names
1 ARGF.each_line do |line|
2 if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
3 puts $1
4 end
5 end

Since most Linux distro's roll the log files to a reasonable size I would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
puts $1
end

In Example 4.3 he has...
1 counts = {}
2 counts.default = 0

Personally I prefer...
counts = Hash.new(0)
or
counts = Hash.new{|hash,key| hash[key]=0}
buts that's personal preference I guess.

In example 4.4 he rightly identifies line 10 as a little ugly and bemoans the lack of sort_by_value in hash.
10 keys_by_count = counts.keys.sort { |a, b| counts <=> counts[a] }
it seems he doesn't know about..

count.keys.sort_by{|key| count[key]}

Of course he could have done...

class Hash
def sort_keys_by_value
keys.sort_by{|key| fetch(key)}
end
end


Example 4.5 looks like a poster child for the Hash.new block approach...
4 @hash = {}
 
J

James Edward Gray II

Well, maybe for the 2nd edition, heres my bash at cleaning it up...

For example scanning a log file.....

EXAMPLE 4-2 . Printing article names
1 ARGF.each_line do |line|
2 if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/
[^ .]+) }
3 puts $1
4 end
5 end

Since most Linux distro's roll the log files to a reasonable size I
would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/
[^ .]+) }) do |match|
puts $1
end

Hmm, I don't think we should slurp when we don't need too. There's
no reason to be wasteful.

I really liked the rest of your fixes though.

James Edward Gray II
 
R

Ryan Davis

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/
[^ .]+) }) do |match|
puts $1
end

Hmm, I don't think we should slurp when we don't need too. There's
no reason to be wasteful.

Actually there is plenty of reason to be wasteful:

1) We can.
2) It leads to code that reads like the above.
3) We can!
4) It is only a short hop to:

puts ARGF.read.scan( %r%GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/
\d\d/[^ .]+) %).join("\n")

5) We can!!
6) It is our moral obligation to use our oft-idle and underused
computers to their fullest.

There was a 7th... but I forgot what it was.
 
M

Martin DeMello

Since most Linux distro's roll the log files to a reasonable size I would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
puts $1
end

ARGF.read.scan( %r{GET /ongoing/When/\d{3}x/(\d#{4}/\d#{4}/[^ .]+) }) do |match|
puts $1
end

martin
 
R

Robert Dober

Since most Linux distro's roll the log files to a reasonable size I would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
puts $1
end
I am with James do not slurp in the whole string it just does not make
sense to me. I would feel very uneasy using regexps on large strings
even if right now there seems no backtracking issues to be present,
but code evolves sometimes ;)
ARGF.read.scan( %r{GET /ongoing/When/\d{3}x/(\d#{4}/\d#{4}/[^ .]+) }) do |match|

Nope Martin read the original code again!! All the / and \ ;).
I was about to fall into the same trap.
As a matter of fact I miss all beauty in that code :( The regexp is
just ugly, some ideas for cosmetic surgery ;)

def dig_dirs *args
Regexp.new args.map{|n| "\d" * n}.join("/")
end

puts ARGF.map { |line|
line =~ %r{ GET /ongoing/When/\d{3}x/(#{dig_dirs 4, 2, 2}/[^ .])}
$1
}.compact

But beauty lies to everybody (or was that "in the eyes of the beholder"? ).

Cheers
Robert
 
R

Robert Dober

Just got the O'Reilly announcement of the book "Beautiful
In Example 4.3 he has...
1 counts = {}
2 counts.default = 0
Personally I prefer...
counts = Hash.new(0)
or
counts = Hash.new{|hash,key| hash[key]=0}
buts that's personal preference I guess.

In example 4.4 he rightly identifies line 10 as a little ugly and bemoans the lack of sort_by_value in hash.
10 keys_by_count = counts.keys.sort { |a, b| counts <=> counts[a] }
it seems he doesn't know about..

count.keys.sort_by{|key| count[key]}

Of course he could have done...

class Hash
def sort_keys_by_value
keys.sort_by{|key| fetch(key)}
end
end

or

tops = counts.sort_by{|*kv| kv.last}.reverse[0..9]
puts tops.map{ |k,v| "#{v}: #{k}"}

#each is just a primitive for Enumerables, right?
Example 4.5 looks like a poster child for the Hash.new block approach...
4 @hash = {}
.
.

9 if @hash[s[0]]
10 @hash[s[0]] << [ s[1], article ]
11 else
12 @hash[s[0]] = [ s[1], article ]
13 end


Replace that with...

4 @hash = Hash.new{|hash,key| hash[key] = []}

10 @hash[s[0]] << [ s[1], article ]


That makes it real pretty code.

Save for

File.open(some_name).each_line

Let the GC close the file!? It is not going to happen !!

<snip>
Robert
 
D

Daniel Martin

John Carter said:
Just got the O'Reilly announcement of the book "Beautiful
Code"... here is the sample chapter by Tim Bray on Ruby!
...

For example scanning a log file.....

Since most Linux distro's roll the log files to a reasonable size I would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
puts $1
end

As James said, I'm going to have to disagree on this, though I'd clean
it up to:

article_pattern = \
%r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
ARGF.each_line do |line|
line.scan(article_pattern) do |article,|
puts article
end
end

Look, no global variables!

I agree with your use of Hash.new(0), but disagree that it's mere
personal preference - it's much more common to use even the single
argument form in the constructor than to use default=.

It is odd that he seems unaware of Array.sort_by - anyone know when
this was added to Ruby core? The book could be rather out of date.
Example 4.5 looks like a poster child for the Hash.new block approach...

Agreed. I wonder if perhaps he's spent more time in python or some
other dynamic language than Ruby.
In fact, I can't find any pretty reason to have the class BigHash at
all....looks like his Java roots are showing through.

Yeah; in Java, you can't have code existing outside of a class, and I
suspect that's a big part of it.
On the other hand both have the classic bug that (high + low) or
(l+u) overflows on large arrays...

Actually, his code doesn't: java arrays can only have 2**31 entries
(since their index is a java int), and java ints are always signed.
Therefore, any overflow in the addition in java yields a negative
integer, and >>> in java does right shift without sign extension.
(That is, with 0s shifted in at the left).

The glibc code does have that problem, but only on arrays that have
more than (maximum size_t)/2 elements.
 
D

dblack

Hi --

In Example 4.3 he has...
1 counts = {}
2 counts.default = 0

Personally I prefer...
counts = Hash.new(0)
or
counts = Hash.new{|hash,key| hash[key]=0}
buts that's personal preference I guess.

Yes, but be careful: your two lines don't do the same thing as each
other:

irb(main):007:0> counts = Hash.new(0)
=> {}
irb(main):008:0> counts["hi"]
=> 0
irb(main):009:0> counts
=> {}
irb(main):010:0> counts = Hash.new{|hash,key| hash[key]=0}
=> {}
irb(main):011:0> counts["hi"]
=> 0
irb(main):012:0> counts
=> {"hi"=>0}
In example 4.4 he rightly identifies line 10 as a little ugly and bemoans the
lack of sort_by_value in hash.
10 keys_by_count = counts.keys.sort { |a, b| counts <=> counts[a] }
it seems he doesn't know about..

count.keys.sort_by{|key| count[key]}

Of course he could have done...

class Hash
def sort_keys_by_value
keys.sort_by{|key| fetch(key)}
end
end


It's best not to get into changing the core language, though, if the
book isn't going to discuss the benefits and pitfalls of doing so in
depth.


David

--
* Books:
RAILS ROUTING (new! http://www.awprofessional.com/title/0321509242)
RUBY FOR RAILS (http://www.manning.com/black)
* Ruby/Rails training
& consulting: Ruby Power and Light, LLC (http://www.rubypal.com)
 
C

Christoffer Sawicki

Hello.

count.keys.sort_by{|key| count[key]}

IMHO, the most elegant way to do this is:

hash.sort_by { |k, v| v }.map { |k, v| k }

...where k = key and v = value.

Cheers,
 
G

Gregory Brown

Hello.

count.keys.sort_by{|key| count[key]}

IMHO, the most elegant way to do this is:

hash.sort_by { |k, v| v }.map { |k, v| k }

...where k = key and v = value.

Cheers,

Ack, pet peeve.

hash.sort_by { | key, value | value } .map { | key, value | key }

If you have to specify "where k = key and v = value" then these
should have been used in your code.

Always favor readability at the expense of verbosity, both your
future self and whoever else maintains your code will thank you.

I find myself always using |k,v|, when used on something hashlike I
don't think readability suffers.
 
J

James Edward Gray II

Hello.

count.keys.sort_by{|key| count[key]}

IMHO, the most elegant way to do this is:

hash.sort_by { |k, v| v }.map { |k, v| k }

...where k =3D key and v =3D value.

Cheers,

Ack, pet peeve.

hash.sort_by { | key, value | value } .map { | key, value | key }

If you have to specify "where k =3D key and v =3D value" then these
should have been used in your code.

Always favor readability at the expense of verbosity, both your
future self and whoever else maintains your code will thank you.

I find myself always using |k,v|, when used on something hashlike I
don't think readability suffers.

I've also recently adopted the trick of using _ as an unused =20
parameter name. I believe it was Ara that first suggested this and I =20=

think it's a great idea:

hash.sort_by { |key, _| =85 }=85

James Edward Gray II=
 
D

Daniel Martin

Robert Dober said:
Save for

File.open(some_name).each_line

Let the GC close the file!? It is not going to happen !!

Good point. In many ways, it feels that the author is not as
comfortable with Ruby blocks as one should be. A better version,
since we also don't need a separate class:

def get_big_hash(filename)
hash = Hash.new {|h,k| h[k]=[]}
lines = 0
File.open(filename) do |file|
file.each_line do |line|
s = line.split
article = s[2].intern
hash[s[0]] << [ s[1], article ]
lines += 1
STDERR.puts "Line: #{lines}" if (lines % 100_000) == 0
end
end
return hash
end

I really like the block-using form of File.open. It makes me feel all
nice and safe like being wrapped in with-open-file or a carefully
constructed dynamic-wind.
 
P

Paul Battley

T24gMTAvMDcvMDcsIEphbWVzIEVkd2FyZCBHcmF5IElJIDxqYW1lc0BncmF5cHJvZHVjdGlvbnMu
bmV0PiB3cm90ZToKPiBJJ3ZlIGFsc28gcmVjZW50bHkgYWRvcHRlZCB0aGUgdHJpY2sgb2YgdXNp
bmcgXyBhcyBhbiB1bnVzZWQKPiBwYXJhbWV0ZXIgbmFtZS4gIEkgYmVsaWV2ZSBpdCB3YXMgQXJh
IHRoYXQgZmlyc3Qgc3VnZ2VzdGVkIHRoaXMgYW5kIEkKPiB0aGluayBpdCdzIGEgZ3JlYXQgaWRl
YToKPgo+ICAgIGhhc2guc29ydF9ieSB7IHxrZXksIF98IOKApiB94oCmCgpJIHRob3VnaHQgaXQg
Y2FtZSBmcm9tIEhhc2tlbGwsIHdoZXJlIF8gaXMgYSB3aWxkY2FyZCBwYXJhbWV0ZXI6CgpQcmVs
dWRlPiBsZXQgcG93ZXIgXyAwID0gMQpQcmVsdWRlPiBwb3dlciA0IDAKMQpQcmVsdWRlPiBwb3dl
ciAxMDAgMAoxClByZWx1ZGU+IHBvd2VyIDQgMgoqKiogRXhjZXB0aW9uOiA8aW50ZXJhY3RpdmU+
OjE6NC0xNjogTm9uLWV4aGF1c3RpdmUgcGF0dGVybnMgaW4gZnVuY3Rpb24gcG93ZXIKUHJlbHVk
ZT4gbGV0IHBvd2VyIGEgYiA9IGEgKiogYgpQcmVsdWRlPiBwb3dlciA0IDIKMTYuMAoKLi4uIGhv
d2V2ZXIsIEkgd291bGRuJ3QgYmUgc3VycHJpc2VkIHRvIGZpbmQgdGhhdCBIYXNrZWxsIHRvb2sg
aXQgZnJvbQpzb21lIG90aGVyIGxhbmd1YWdlLgoKUGF1bC4K
 
J

James Edward Gray II

I thought it came from Haskell, where _ is a wildcard parameter:

I meant that I believe it was Ara who recommended it as a good name =20
for an unused block parameter in Ruby.

James Edward Gray II=
 
T

Tim Bray

Just got the O'Reilly announcement of the book "Beautiful
Code"... here is the sample chapter by Tim Bray on Ruby!

http://www.oreilly.com/catalog/9780596510046/chapter/ch04.pdf

Pity he didn't run his examples by this forum first, I'm sure we could
have made them more beautiful..

Yep, mostly because for some reason I didn't know about sort_by. I
wonder how I missed that? Obviously, the main point of my article
was to encourage people to use the idiom of reading lines, picking
'em apart with a regex, and accumulating in a hash, invented in awk,
brought to the world in perl, perfected in Ruby.

sort_by aside, I'm not convinced that any of the alternatives to my
main loop are more beautiful or readable. I personally think that
the reason Ruby will enjoy an ever increasing market share in the
programming-languages space is because it's more readable.

I'm looking at the following...
EXAMPLE 4-2 . Printing article names
1 ARGF.each_line do |line|
2 if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/
[^ .]+) }
3 puts $1
4 end
5 end

Since most Linux distro's roll the log files to a reasonable size I
would have just gone with...

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/
[^ .]+) }) do |match|
puts $1
end

Really? Considering most of the readership is new to Ruby?
In Example 4.3 he has...
1 counts = {}
2 counts.default = 0

Personally I prefer...
counts = Hash.new(0)
or
counts = Hash.new{|hash,key| hash[key]=0}
buts that's personal preference I guess.

Gosh, I think I totally disagree on readability grounds.

-Tim
 
T

Tim Bray

As James said, I'm going to have to disagree on this, though I'd clean
it up to:

article_pattern = \
%r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
ARGF.each_line do |line|
line.scan(article_pattern) do |article,|
puts article
end
end

Look, no global variables!

Yep, much better, I wish I'd done it that way.
I agree with your use of Hash.new(0), but disagree that it's mere
personal preference - it's much more common to use even the single
argument form in the constructor than to use default=.

Why? default= makes it more obvious what you're doing. I think it's
better practice.
Agreed. I wonder if perhaps he's spent more time in python or some
other dynamic language than Ruby.

Perl :)

-Tim
 
C

Chad Perrin

In Example 4.3 he has...
1 counts = {}
2 counts.default = 0

Personally I prefer...
counts = Hash.new(0)
or
counts = Hash.new{|hash,key| hash[key]=0}
buts that's personal preference I guess.

Gosh, I think I totally disagree on readability grounds.

Actually, I think this:

counts = Hash.new(0)

. . is perfectly readable, and a bit more elegant than the two-line
version from "Example 4.3". On the other hand, this bit looks pretty
ugly to me:

counts = Hash.new{|hash,key| hash[key]=0]}

. . especially since it looks like someone is afraid he has a limited
number of space characters to use, and doesn't want to run out. Even
with some spaces added to it, though, it's not as pretty and readable.
 
C

Caleb Powell

Yep, mostly because for some reason I didn't know about sort_by. I
wonder how I missed that? O

Perhaps because sort_by is defined in Enumerable, not in the Array
class. I wish that the Ruby documentation inlined methods like these
from Mixin classes (similar to how Javadoc displays a parent class'
methods). Is this possible when generating using rdoc?
 
R

Ryan Davis

Always favor readability at the expense of verbosity, both your =20
future self and whoever else maintains your code will thank you.

I won't. k and v work Just Fine=99 for me and whomever else I work =20
with. Verbosity for verbosity's sake, esp when it pushes outside of =20
common and accepted idioms, is a form of mentarbation.
 
D

degei

I meant that I believe it was Ara who recommended it as a good name
for an unused block parameter in Ruby.

James Edward Gray II

Just as an aside, I believe Erlang adopts this style as a part of this
language, where it will warn if you have unused variables but allows
you to annotate them with _ (ex. _var) to denote that they will not be
used, or just replace the variable name with _ thus eliminating the
warning.

(I don't know Erlang yet, just remembered reading this in
http://pragdave.pragprog.com/pragdave/2007/04/a_first_erlang_.html)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,754
Messages
2,569,527
Members
44,998
Latest member
MarissaEub

Latest Threads

Top