More efficient comparing

Kyle Hunter

Hello,

I've got an array that holds urls. Example Element:
http://bla.random.com/bla.jpg.

I also have images in a directory. I'd like my script to compare the
bla.jpg from the URL with all files in that directory to make sure it's not
a duplicate of something that's already there. If it is, it should be
deleted from the array. My current way of doing it uses quite a bit of
resources, so I was wondering if someone could show me a more efficient
example.

Current Code:
Dir["#{$baseDir}/#{board}/#{$dateString}/**"].each do |file|
  $imgArray.delete_if { |i| i =~ /#{file.split('/').pop}/ }
end
 
Jano Svitok

1. use file.split('/').last instead of pop - pop modifies the array

2. move the regex out of the block - as written, a new Regexp is built
for every element of $imgArray. Building it once per file saves object
constructions/destructions, and it'll be easier on GC too.

3. instead of splitting, try a regex or rindex,
i.e. last_part = $1 if file =~ /\/([^\/]*)$/
or last_part = file[(file.rindex('/') + 1)..-1] or
file[((file.rindex('/') || -1) + 1)..-1] to handle the case when there's
no '/' in the filename.

4. use the Benchmark module to measure your improvements - that way, you'll
know exactly whether the new code is better or not.
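
Points 1 to 3 taken together might look like the sketch below. The helper names base_name and remove_existing and the sample data are made up for illustration and are not part of the original script. The pattern is built once, outside the block, and Regexp.escape is added so the '.' in bla.jpg matches only a literal dot. Note that this version anchors the match to the end of the URL, which is slightly stricter than the original "match anywhere" test.

```ruby
# Return the part after the last '/', or the whole string if there is none.
def base_name(path)
  slash = path.rindex('/')
  slash ? path[(slash + 1)..-1] : path
end

# Hypothetical helper: delete every URL that ends in "/<one of the file names>".
def remove_existing(urls, files)
  names = files.map { |f| Regexp.escape(base_name(f)) }
  return urls if names.empty?               # nothing on disk, nothing to remove
  pattern = /\/(?:#{names.join('|')})\z/    # built once, not once per URL
  urls.delete_if { |u| u =~ pattern }
end

# Made-up sample data standing in for the Dir[...] result and $imgArray.
files = ['/images/b/2008-01-01/bla.jpg', 'other.png']
urls  = ['http://bla.random.com/bla.jpg', 'http://bla.random.com/new.jpg']

remove_existing(urls, files)
p urls
```

With a very large directory the single alternation can get long, so it is worth measuring before settling on it.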
 
Justin Collins

Try:

require 'set'

files = Dir["#{$baseDir}/#{board}/#{$dateString}/**"].map { |file|
  File.basename(file)
}.to_set

$imgArray.delete_if { |i| files.include?(i.split("/")[-1]) }


-Justin
 
Kyle Hunter

Thanks both of you,

Justin's example seems to have made it more efficient. Later on I'll
compare my original, a version with Jano's suggestions, and Justin's
using Benchmark to see which is the most efficient.

Any more suggestions are welcome, of course, the more to benchmark the
merrier!
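
For anyone wanting to run that comparison, here is a rough sketch of how it could be set up. It uses synthetic file names and URLs (made up here, fixed-width so no name is a substring of another) instead of a real directory and $imgArray, and the lambda names are only labels for this example. Timings depend on the machine and the data sizes, so none are quoted.

```ruby
require 'benchmark'
require 'set'

# Synthetic stand-ins for the directory listing and the URL array.
files = (0...200).map { |n| format('/base/board/date/img%04d.jpg', n) }
urls  = (0...400).map { |n| format('http://bla.random.com/img%04d.jpg', n) }

# Original approach: a new Regexp is built for every file/URL pair.
original = lambda do |list|
  files.each do |file|
    list.delete_if { |i| i =~ /#{file.split('/').pop}/ }
  end
  list
end

# Regexp built once per file, outside the inner block.
hoisted = lambda do |list|
  files.each do |file|
    pattern = Regexp.new(Regexp.escape(file.split('/').last))
    list.delete_if { |i| i =~ pattern }
  end
  list
end

# Set lookup: one pass over the files, one pass over the URLs.
with_set = lambda do |list|
  names = files.map { |file| File.basename(file) }.to_set
  list.delete_if { |i| names.include?(i.split('/')[-1]) }
end

results = {}
Benchmark.bm(10) do |bm|
  # Each run gets its own copy, since delete_if modifies the array in place.
  bm.report('original:') { results[:original] = original.call(urls.dup) }
  bm.report('hoisted:')  { results[:hoisted]  = hoisted.call(urls.dup) }
  bm.report('set:')      { results[:set]      = with_set.call(urls.dup) }
end
```

Checking that all three entries in results are equal before reading the timings guards against benchmarking a variant that filters differently.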
 
