Determining uniqueness on a single array element

James Byrne

I am loading file names and mtimes into an array and then putting that
array inside an outer array. I have run into the situation where the
same file sometimes exists in different places in the file system and
occasionally with a different file name.

I need to ensure that I process the contents of each file only once.
So, in addition to the two elements originally captured I now create an
MD5 hexdigest of the file contents: [ f.mtime, f.name, f.hexdigest ]
and store that.
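
For readers following along, here is a minimal sketch of how such a triple might be built. It assumes plain file paths rather than the `f` object in the post, and `file_record` is a hypothetical helper name, not something from the original code.

```ruby
require 'digest'

# Hypothetical helper: returns [mtime, name, hexdigest] for one file path.
# Two files with identical contents get the same hexdigest, whatever
# their names or locations.
def file_record(path)
  [File.mtime(path), path, Digest::MD5.file(path).hexdigest]
end
```

Calling `file_record` once per path and collecting the results would give an outer array of the shape described above.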

Now I wish to ensure that each distinct hexdigest is processed but once.
I can do this:

hex_array = []
outer_array.each do |inner_array|
  next if hex_array.include?( inner_array[2] )
  hex_array << inner_array[2]
  . . .

I wonder if there is a better way? Any suggestions?
 
Gary Wright

hex_array = []
outer_array.each do |inner_array|
next if hex_array.include?( inner_array[2] )
hex_array << inner_array[2]
. . .

This assumes Ruby 1.9.2 where Array#uniq takes a block:

outer_array.uniq { |mtime, name, md5| md5 }.each do |mtime, name, md5|
  # do stuff here
end
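
A small self-contained sketch of the block form of Array#uniq, using made-up sample data (the file names and digest strings below are placeholders, not real MD5 values). When two entries share a digest, uniq keeps the first one it encounters.

```ruby
# Placeholder data in the [mtime, name, hexdigest] layout from the question.
outer_array = [
  [Time.at(0), "a.txt",      "digest-1"],
  [Time.at(1), "copy/a.txt", "digest-1"],  # same contents as a.txt
  [Time.at(2), "b.txt",      "digest-2"]
]

# Deduplicate on the third element only; the block parameters are
# destructured from each inner array.
unique = outer_array.uniq { |_mtime, _name, md5| md5 }

unique_names = unique.map { |_mtime, name, _md5| name }
```

With this data, `unique_names` should contain "a.txt" and "b.txt" but not "copy/a.txt".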


Gary Wright
 
Brian Candler

James Byrne wrote in post #979694:
Now I wish to ensure that each distinct hexdigest is processed but once.
I can do this:

hex_array = []
outer_array.each do |inner_array|
next if hex_array.include?( inner_array[2] )
hex_array << inner_array[2]
. . .

I wonder if there is a better way? Any suggestions?

(1) Let the block parameters auto-splat each inner array, which avoids the magic index [2]:

outer_array.each do |mtime, name, hexdigest|

(2) Use a hash, rather than an array, to record ones you've processed.
This avoids a linear search on every iteration

seen = {}
outer_array.each do |mtime, name, hexdigest|
  next if seen[hexdigest]
  seen[hexdigest] = true
  ...
end
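
The same idea can also be sketched with the standard library's Set in place of the hash. This is a swapped-in variant, not the code above; the sample data and the `processed` array are placeholders for illustration. Set#add? returns nil when the element is already present, so the membership test and the insert happen in one call.

```ruby
require 'set'

# Placeholder data in the [mtime, name, hexdigest] layout from the question.
outer_array = [
  [Time.at(0), "a.txt",      "digest-1"],
  [Time.at(1), "copy/a.txt", "digest-1"],  # same contents as a.txt
  [Time.at(2), "b.txt",      "digest-2"]
]

seen = Set.new
processed = []  # stands in for the real per-file work

outer_array.each do |_mtime, name, hexdigest|
  # add? returns nil if hexdigest was already in the set, so skip it.
  next unless seen.add?(hexdigest)
  processed << name
end
```

Like the hash, a Set lookup does not scan every earlier entry, so it avoids the linear search as well.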
 
James Byrne

Brian Candler wrote in post #979745:
(1) auto-splat to avoid the [2] magic index

outer_array.each do |mtime, name, hexdigest|

(2) Use a hash, rather than an array, to record ones you've processed.
This avoids a linear search on every iteration

seen = {}
outer_array.each do |mtime, name, hexdigest|
next if seen[hexdigest]
seen[hexdigest] = true
...
end

Very nice. Thank you.
 
