Help on best way to gather/sort results [Array/Hash]?

Tony De

Greetings ruby fans,

I'm a greenhorn at this cool lang ruby. Much to learn. Perhaps you
chaps could help me with an issue I have. I've read through a number of
the posts on sorting Arrays and Hashes. And yet I can't seem to put my
finger on the solution. I want to sort on the second column. So it
seemed, from what information I gathered, that I need to gather my
results into a hash. Am I on the right track? Oh, let me tell you what
you're looking at here: I am scanning each mail file in our queue for
commonalities (spammers) instead of using the useless (in my opinion)
qmHandle we have for qmail. So, I've got a working prototype. If you
could help me with my sort, and if you have any other comments/suggestions
to throw my way, I'm sure I could learn a thing or two. Being new to ruby,
there's a lot of new ideas here. Thanks guys.

Code:
#!/usr/local/bin/ruby -w
require 'find'

@results = Array.new

# Iterate through the child directories & call the parse file method
def scan_dirs
  root = "/var/qmail/queue/mess"
  Find.find(root) do |file|
    parse_file(file)
  end
  @results.sort!
  print_results
end

# Parse each file for the information we want
def parse_file(path)
  file = path[(path.length-7), path.length]
  sourceip = ""
  email = ""
  subject = ""
  email_found = false
  line_no = 0

  File.open(path, 'r').each do |line|

    line = line.strip # Remove any \n\r nil, etc
    line_no += 1

    if line_no == 1
      if line.match("invoked for bounce")
        # Internal Bounce Msg
        sourceip = "SMTP"
      end
    end

    if (line_no == 2 and sourceip.empty?)
      if line.match("webmail.commspeed.net")
        sourceip = "Webmail"
      else
        sourceip = line.scan(/\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b/)
        if sourceip.empty?
          sourceip = "No Source IP**"
        end
      end
    end

    if (line.match("SquirrelMail") and sourceip == "Webmail") or
       (line.match("From:") and sourceip != "Webmail")
      if email.empty?
        email = get_email(line)
      end
    end

    if line.match("Subject:") and subject.empty?
      subject = truncate(line,50)
    end

    if line_no == 20 #Nothing more we want to read in the file
      @results << ["#{file}", "#{sourceip}", "#{email}", "#{subject}"]
      line_no = 0
      return
    end
  end
end

# Truncate subject line
def truncate(string, width)
  if string.length <= width
    string
  else
    string[0, width-3] + "..."
  end
end

# Print out results
def print_results
  print "\e[2J\e[f"

  print "Mess#".ljust(10," ")
  print "Source".ljust(18," ")
  print "Email Addrress".ljust(30, " ")
  print "Subject".ljust(50, " ")
  1.times { print "\n" }
  111.times { print "-" }
  1.times { print "\n" }

  @results.each do |line|
    print line[0].ljust(10," ")
    print line[1].ljust(18," ")
    print line[2].ljust(30, " ")
    print line[3].ljust(50, " ")

    1.times { print "\n" }
  end
end

# Get email address from line/string
def get_email(line_to_parse)
  # Pull the email address from the line
  line_to_parse.scan(/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i).flatten
end

# Ok, begin our scan
scan_dirs
exit

Partial results listing: (I've modified the content to protect privacy)
Mess#     Source            Email Addrress                Subject
---------------------------------------------------------------------------------------------------------------
3360108   111.111.17.1      (e-mail address removed)
3360167   111.111.7.213     (e-mail address removed)      Subject: Removed to protect the innocent....
3360186   Webmail           (e-mail address removed)      Subject: Removed to protect the innocent
3360209   111.111.40.10     (e-mail address removed)
3360215   111.111.15.110    (e-mail address removed)      Subject: Removed to protect the innocent
3360217   111.111.9.248     (e-mail address removed)      Subject: Removed to protect the innocent
3360226   111.111.11.43     (e-mail address removed)      Subject: Removed to protect the innocent
3360228   111.111.16.34     (e-mail address removed)      Subject: Pictures
3360241   111.111.18.73     (e-mail address removed)      Subject: Removed to protect the innocent
3360242   111.111.14.109    (e-mail address removed)      Subject: Emailing: maps.htm
 
Todd Benson

Being a little lazy at the moment to go through the code, have you
looked at #sort_by{}?

Todd
 
Tony De

I believe I ran across it, but I recall it blew up when I worked with it.
Likely more an issue with me as the unskilled ruby coder than with the
method itself. I'll take another look. Thanks

tonyd
 
Tony De

Ok, tried this @results.sort_by { |a| a[1] } - thinking that I want to
sort on the second element in my array "Source". No sort was performed
at all. Scratching my head..

tonyd
 
Tony De

Christian said:
Wouldn't you want something like @results.sort {|x, y| y[1] <=> x[1]}?

--

"Every child has many wishes. Some include a wallet, two chicks and a
cigar,
but that's another story."


Just tried that. No sort. Just for ref. My array struct looks like
this:
@results << ["#{file}", "#{sourceip}", "#{email}", "#{subject}"]

I want to sort on sourceip. Thanks guys.
 
Christopher Dicely

Tony said:
Ok, tried this @results.sort_by { |a| a[1] } - thinking that I want to
sort on the second element in my array "Source". No sort was performed
at all. Scratching my head..

Did you capture the result? #sort and #sort_by are non-destructive, so this:

---
a= [ [0,3], [4,1], [5,2]]
a.sort_by {|i| i[1]}
puts a.inspect
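
The snippet above is cut off in this copy of the thread. A minimal sketch of the point it seems to be heading toward, using made-up rows in the shape of the script's @results entries (the `sample_rows` helper and the example.com addresses are illustrative assumptions, not data from the thread):

```ruby
# Made-up rows in the same shape as the script's @results entries:
# [message id, source, email, subject]. Not data from the thread.
def sample_rows
  [
    ["3360186", "Webmail",       "amy@example.com", "Subject: hello"],
    ["3360167", "111.111.7.213", "bob@example.com", "Subject: maps"],
    ["3360108", "111.111.17.1",  "cat@example.com", "Subject: pictures"]
  ]
end

rows = sample_rows

# sort_by builds and returns a new array; the return value is discarded
# here, so rows is still in its original order afterwards.
rows.sort_by { |row| row[1] }
puts rows.map { |row| row[1] }.inspect

# Capturing the return value gives the sorted rows.
sorted = rows.sort_by { |row| row[1] }
puts sorted.map { |row| row[1] }.inspect
```

In the original script, one way to apply this would be to reassign the variable, along the lines of `@results = @results.sort_by { |row| row[1] }`, before calling print_results.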
 
Tony De

Christian said:
Does the sort work if you just put in the sourceip? Say, you did
@results << "#{sourceip}", and then used @results.sort. Does it still
not sort?

Yeah, if I only collect the "sourceip" and do a @results.sort it doesn't
work. I have to do a .sort!. And the same behaviour with @results <<
["#{file}", "#{sourceip}", "#{email}", "#{subject}"]. @results.sort
does not sort. @results.sort! does. And I tried sort! as an
afterthought.

tonyd
 
Tony De

"As an afterthought" got me to thinking. Tried @results.sort! {|x, y|
y[1] <=> x[1]}. And it works. .sort fails, .sort! works. Any ideas
why? I would really like to understand this a little more. Thanks
guys, all of you, for your help.

tonyd
 
David A. Black

Hi --

Just to be clear: they both work. They're different methods, though,
and they do different things. sort! stores its results back in the
original object; sort returns the results in a new object.

The significance of the ! at the end is that the method is considered
to be the "dangerous" version of the non-! method of the same name.
(This is the conventional, intended meaning of !, although it has no
language-level significance to the interpreter.) You can think of the
"danger", in this case, as consisting of the fact that your original
object will be altered. The ! is a kind of "heads up!" sign.
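
A small sketch of that difference, checking object identity with `equal?`. The `sample_sources` helper and its values are made up for illustration and are not part of the original script:

```ruby
# Made-up "Source" values, similar in spirit to the thread's second column.
def sample_sources
  ["Webmail", "SMTP", "111.111.7.213"]
end

sources = sample_sources

# sort returns the results in a new object; the receiver keeps its order.
sorted = sources.sort
puts sorted.equal?(sources)   # false: a different Array
puts sources.inspect          # original order

# sort! stores the results back in the original object and returns it.
result = sources.sort!
puts result.equal?(sources)   # true: the same Array
puts sources.inspect          # now sorted
```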


David

 
Tony De

Oooooooo! Well, that makes perfect sense. Thanks! You know, sometimes
you read your handy pickaxe or a blog somewhere, but it slides right
past you. I appreciate the clarification.

tonyd
 
Jesús Gabriel y Galán

I've read through a number of
the posts on sorting Arrays and Hashes. And yet I can't seem to put my
finger on the solution. I want to sort on the second column.

sort_by is your friend. This is an example:

irb(main):002:0> result = [[1,2,3],[4,5,6],[1,3,7],[3,2,1]]
=> [[1, 2, 3], [4, 5, 6], [1, 3, 7], [3, 2, 1]]
irb(main):003:0> result.sort_by{|a| a[1]}
=> [[1, 2, 3], [3, 2, 1], [1, 3, 7], [4, 5, 6]]

Regards,

Jesus.
 
Todd Benson

They all work. I use #sort_by all the time for legibility, and I
don't care that much about speed for the stuff I work on. According
to the docs, sort_by is relatively expensive when the keys are simple
and the collection is large (that might be your case). It does,
however, perform better when the comparison test itself is costly,
for example when it has to create objects.

Todd
 
Tony De

Thanks Jesus & Todd for your posts also. I appreciate the education.
Forums are great for getting real world experience on language usage and
gotchas. So I do have another question on my sort. I realize that in
addition to the sort on the second element in each row of my array
(sourceip) I would also like to then sort on the third element (email).
So my current sort is:

@results.sort! {|x, y| y[1] <=> x[1]}

So this now sorts first by the second element (x[1]) and then by the third (x[2]):
new_results = @results.sort_by { |x| [x[1], x[2]] }
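
A short sketch of why the array key works: Ruby compares arrays element by element, so the second key only matters when the first one ties. The `queue_rows` data and the `sort_by_source_then_email` helper are made up for illustration:

```ruby
# Made-up rows: [message id, source, email, subject].
def queue_rows
  [
    ["3360242", "Webmail",      "zed@example.com", "Subject: d"],
    ["3360108", "111.111.17.1", "bob@example.com", "Subject: a"],
    ["3360186", "Webmail",      "amy@example.com", "Subject: c"],
    ["3360209", "111.111.17.1", "al@example.com",  "Subject: b"]
  ]
end

# Hypothetical helper: order by source (row[1]), then by email (row[2]).
# Returns a new array; the argument is not modified.
def sort_by_source_then_email(rows)
  rows.sort_by { |row| [row[1], row[2]] }
end

sort_by_source_then_email(queue_rows).each do |row|
  puts row[0].ljust(10) + row[1].ljust(18) + row[2]
end
```

Note that this orders ascending, whereas the earlier `y[1] <=> x[1]` block orders descending; calling `.reverse` on the result is one simple way to flip the whole ordering if that is what is wanted.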

There are so many ways to accomplish the same result. That doesn't
mean, however, it's the most efficient. Would there be a more efficient
way to do this? Not that this script is costing me a great deal in
resources. But it's nice to code tight when possible. Thanks again.

tonyd
 
Todd Benson

On my machine...

a = [[3, 2, 1], [4, 5, 6], [1, 5, 7], [1, 2, 3]]

t = Time.now
10_000.times do
  a.sort_by {|x| [x[1], x[2]]}
end
puts Time.now - t

t = Time.now
10_000.times do
  a.sort {|x,y| [x[1], x[2]] <=> [y[1], y[2]]}
end
puts Time.now - t

# t is not reset here, so this figure also includes the sort loop above
10_000.times do
  a.sort! {|x,y| [x[1], x[2]] <=> [y[1], y[2]]}
end
puts Time.now - t

=> 0.25 #sort_by
=> 0.453 #sort
=> 0.859 #sort!


This may be due to the creation of additional Array objects within the block.

Just a guess.
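
For anyone repeating the measurement, a sketch using the standard library's Benchmark module, which times each case separately (in the snippet above the third figure is measured from the second loop's start time). The `compare_sorts` helper is a made-up name, and the `dup` before `sort!` is an added assumption so that every run starts from the same unsorted input, at the cost of one extra copy per run:

```ruby
require 'benchmark'

# Hypothetical helper: times the three approaches on the same rows.
# Returns the array of Benchmark::Tms objects produced by Benchmark.bm.
def compare_sorts(rows, runs)
  Benchmark.bm(8) do |bm|
    bm.report("sort_by") do
      runs.times { rows.sort_by { |x| [x[1], x[2]] } }
    end
    bm.report("sort") do
      runs.times { rows.sort { |x, y| [x[1], x[2]] <=> [y[1], y[2]] } }
    end
    bm.report("sort!") do
      # dup keeps rows itself unsorted between runs
      runs.times { rows.dup.sort! { |x, y| [x[1], x[2]] <=> [y[1], y[2]] } }
    end
  end
end

a = [[3, 2, 1], [4, 5, 6], [1, 5, 7], [1, 2, 3]]
compare_sorts(a, 10_000)
```

Timings will vary by machine and Ruby version, so no figures are given here.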

Todd
 
Jesús Gabriel y Galán

The difference between sort and sort_by is that sort calls the block
every time it needs to make a comparison between two elements, passing
both elements. sort_by, on the other hand, calls the block once for each
element in the array, and calculates and records the sort value for each
element. Then it performs the sorting algorithm against those values.

So which one is more efficient depends on the length of the array
(well, the number of comparisons made by the sorting algorithm) and the
cost of calculating the sort value.
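
One way to see this is to count how often each block runs. This is only a sketch: `count_block_calls` is a made-up helper, and the number of comparisons sort makes depends on the input and on the Ruby implementation, so only the sort_by count is predictable (one call per element).

```ruby
# Hypothetical helper: counts block invocations for sort and sort_by
# when ordering rows by their second element.
def count_block_calls(rows)
  sort_calls = 0
  rows.sort do |x, y|
    sort_calls += 1          # runs once per comparison
    x[1] <=> y[1]
  end

  sort_by_calls = 0
  rows.sort_by do |x|
    sort_by_calls += 1       # runs once per element
    x[1]
  end

  { :sort => sort_calls, :sort_by => sort_by_calls }
end

rows = [[1, 2, 3], [4, 5, 6], [1, 3, 7], [3, 9, 1], [9, 8, 7], [6, 4, 2]]
p count_block_calls(rows)
```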

Jesus.
 
