need a hash/iteration tutorial...text reading.


Steven Demonnin

I have been working my way through a Ruby book (Beginning Ruby) and I
want to expand on an interesting capability dealing with hashes.

The code:

text = ''
line_count = 0
# File.foreach closes the file once the block has finished
File.foreach("txt.txt") do |line|
  line_count += 1
  text << line
end

puts "#{line_count} lines"

total_characters = text.length
puts "#{total_characters} characters"
sentence_count = text.split(/\.|\?|!/).length
total_characters_no_spaces = text.gsub(/\s+/, "").length
puts "#{total_characters_no_spaces} without spaces"
word_count = text.split.length
puts "#{word_count} words in the text and #{sentence_count} sentences"
paragraph_count = text.split(/\n\n/).length
puts "#{paragraph_count} paragraphs"
# Integer division: both averages below are rounded down
puts "#{sentence_count / paragraph_count} sentences per paragraph on average"
puts "#{word_count / sentence_count} words per sentence"
stop_words = %w{a the by on for of are with just but and to my has some in}
words = text.scan(/\w+/)
keywords = words.reject { |word| stop_words.include?(word) }
puts "#{((keywords.length.to_f / words.length) * 100).to_i}% non stop words"




This has been a fun piece of code, and I have been running various text
files through it.

What I want to know is: is it possible to create a hash where the key is
the word and the value is the number of occurrences of that word in the
text, and then sort the hash by the values?
 

Michael Kohl


Steven Demonnin said:
What I want to know is, is it possible to create a hash where the key is
the word, and the value is the number of occurrences of the word in the
text, and then sort the hash by the values?


Super quick and dirty, but should get you started:

words = Hash.new(0)   # missing keys default to a count of 0
File.foreach("txt.txt") do |line|
  line.split.each { |w| words[w] += 1 }
end

# Print the words, most frequent first
words.sort_by { |_word, count| count }.reverse.each { |k, v| puts "#{k}: #{v}" }
 

Matt Neuburg

Steven Demonnin said:
What I want to know is, is it possible to create a hash where the key is
the word, and the value is the number of occurrences of the word in the
text, and then sort the hash by the values?

That is called a "histogram" and is one of the most common examples
(Google is your friend). The sticking point here is what you mean by a
"word". If you're willing to accept a fairly crude definition of this
notion, then that is an example I develop in the Blocks section of my
Ruby tutorial chapter here:

http://www.apeth.com/ruby/02justenoughruby.html

To sort, add this line:

wds = h.sort {|x,y| x[1] <=> y[1]}

Note that the concept "sort a hash" has no real meaning, since a hash is
not ordered. What you can do is to convert to an array and sort the
array.
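
A minimal sketch of the idea described above, assuming the crude definition of a "word" as a lowercased run of \w characters. The method names word_histogram and by_frequency and the sample text are made up for illustration; they are not taken from the tutorial linked above.

```ruby
# Count how often each word occurs, using a deliberately crude
# definition of "word": a run of \w characters, lowercased.
def word_histogram(text)
  counts = Hash.new(0)            # unseen words start at 0
  text.downcase.scan(/\w+/) { |word| counts[word] += 1 }
  counts
end

# Hash#sort_by returns an array of [word, count] pairs, not a hash.
# Sorting on [-count, word] puts the most frequent words first and
# breaks ties alphabetically.
def by_frequency(histogram)
  histogram.sort_by { |word, count| [-count, word] }
end

sample = "the cat and the hat and the bat"   # made-up sample text
by_frequency(word_histogram(sample)).each do |word, count|
  puts "#{word}: #{count}"
end
```

The result of by_frequency is an array of two-element arrays, which is why it can be ordered freely regardless of how the hash itself stores its keys.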
 

Steven Demonnin

Matt said:
Note that the concept "sort a hash" has no real meaning, since a hash is
not ordered. What you can do is to convert to an array and sort the
array.

Since arrays are key/value (from what I can understand), there are only
two parts to the array. I thought you couldn't put a third value in an
array.

Thanks for the help. I am going to check out the web page.

(Never knew of Histogram. Learn something new every other day or so.)
 

David A. Black

Hi --

Matt Neuburg said:
To sort, add this line:

wds = h.sort {|x,y| x[1] <=> y[1]}

Or, slightly more compact:

wds = h.sort_by {|x| x[1] }

Matt Neuburg said:
Note that the concept "sort a hash" has no real meaning, since a hash is
not ordered. What you can do is to convert to an array and sort the
array.

In 1.9 hashes are ordered, but by key-insertion order. You can't
change the order, so you can't sort back into a hash (unless you
create a new hash manually using the sorted order).
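
A rough sketch of what "create a new hash manually using the sorted order" could look like, assuming Ruby 1.9 or later so that insertion order is preserved. The method name sorted_by_value and the sample counts are made up for illustration.

```ruby
# Build a new hash whose insertion order is the sorted order.
# The original hash is left untouched.
def sorted_by_value(hash)
  sorted = {}
  hash.sort_by { |_key, value| value }.each do |key, value|
    sorted[key] = value           # inserted in ascending order of value
  end
  sorted
end

counts = { "the" => 3, "hat" => 1, "and" => 2 }   # made-up counts
sorted_by_value(counts).each { |word, count| puts "#{word}: #{count}" }
```

Any key added to the new hash afterwards simply goes to the end, so the hash is only "sorted" until it is next modified.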


David

--
David A. Black / Ruby Power and Light, LLC
Ruby/Rails consulting & training: http://www.rubypal.com
Now available: The Well-Grounded Rubyist (http://manning.com/black2)
"Ruby 1.9: What You Need To Know" Envycasts with David A. Black
http://www.envycasts.com
 

David A. Black

Hi --

Steven Demonnin said:
Since arrays are key/value (from what I can understand), there are only
two parts to the array. I thought you couldn't put a third value in an
array.

It's more that you convert the hash into an array of two-element arrays,
and then sort that array. Iterating through an array of two-element
arrays is similar to iterating through a hash, in the sense that each
iteration yields two values.
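
To illustrate the point, here is a small sketch with made-up data. The helper name describe is hypothetical. The same block works on a hash and on an array of two-element arrays, and an inner array can hold a third value just as easily.

```ruby
counts = { "the" => 3, "and" => 2 }   # made-up counts
pairs  = counts.to_a                  # [["the", 3], ["and", 2]]

# Each iteration yields two values, whether the collection is a
# hash or an array of two-element arrays.
def describe(collection)
  collection.map { |word, count| "#{word}: #{count}" }
end

puts describe(counts)
puts describe(pairs)

# Arrays are not limited to two elements; the block just takes
# one more parameter.
triples = [["the", 3, "article"], ["and", 2, "conjunction"]]
triples.each { |word, count, kind| puts "#{word} (#{kind}): #{count}" }
```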


David

 

Robert Dober

David A. Black said:
In 1.9 hashes are ordered, but by key-insertion order. You can't
change the order, so you can't sort back into a hash (unless you
create a new hash manually using the sorted order).

IIRC the insertion order is maintained correctly for literals and Hash[]
thus
Hash[ * a_hash.sort_by{ whatever } ]
should do the trick.

Cheers
Robert



-- 
Toutes les grandes personnes ont d’abord été des enfants, mais peu
d’entre elles s’en souviennent.

All adults have been children first, but not many remember.

[Antoine de Saint-Exupéry]
 

Robert Dober

Ooops, that was yet another mistake, thanks for telling me, David.

David A. Black said:
You'd want to throw in a flatten(1) to unwrap the inner arrays:

Hash[*hash.sort_by {...}.flatten(1)]

Well spotted.
R.
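
A small sketch of why the flatten(1) matters, using made-up counts. The method name hash_from_pairs is hypothetical.

```ruby
counts = { "the" => 3, "hat" => 1, "and" => 2 }    # made-up counts
pairs  = counts.sort_by { |_word, count| count }    # array of [word, count] pairs

# Splatting the pairs hands Hash[] three two-element arrays as
# separate arguments; an odd number of arguments raises ArgumentError.
begin
  Hash[*pairs]
rescue ArgumentError => e
  puts "without flatten: #{e.class}"
end

# flatten(1) removes exactly one level of nesting, so Hash[] receives
# alternating keys and values instead of pairs.
def hash_from_pairs(pairs)
  Hash[*pairs.flatten(1)]
end

hash_from_pairs(pairs).each { |word, count| puts "#{word}: #{count}" }
```

Hash[] also accepts a single array of pairs directly, as in Hash[pairs] without the splat, which avoids flattening keys or values that happen to be arrays themselves.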
 
