Directory searching against a text file


Stuart Clarke

I am in the middle of writing a quick program which will scan the
contents of a given file path recursively for a list of keywords stored
in a file. My code so far is below, but before moving ahead I have two
questions.

First: I am passing in a text file called "terms.txt" that holds the keywords. To search for each keyword, I assume the best way to do so is as follows:

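terms.each do |term|
  # String#include? is a plain substring test; line =~ term would raise
  # a TypeError here because term is a String, not a Regexp
  if line.include?(term)
    puts term
  end
end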
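For the first question, one alternative to looping over every term for every line is to combine the terms into a single regular expression with `Regexp.union`, which escapes each string so characters such as `.` are matched literally. This sketch assumes the terms are already in an array with blank entries removed (an empty term would match every line); `find_term_lines` is a hypothetical helper, and `Regexp#match?` needs Ruby 2.4 or later.

```ruby
# Hypothetical helper: returns [line_number, text] pairs for every line that
# contains at least one of the terms (case-sensitive substring match).
def find_term_lines(lines, terms)
  # Regexp.union escapes each string, so the term "a.b" matches only the
  # literal text "a.b", not "axb". With no terms it matches nothing.
  pattern = Regexp.union(terms)
  matches = []
  lines.each_with_index do |line, index|
    matches << [index + 1, line.chomp] if pattern.match?(line)
  end
  matches
end

lines = ["first line\n", "an invoice here\n", "axb\n", "a.b\n"]
p find_term_lines(lines, ["invoice", "a.b"])
# prints [[2, "an invoice here"], [4, "a.b"]]
```

To use it on a real file, pass `File.foreach(path)` or `File.readlines(path)` as `lines`; the pattern is built once per call rather than once per term per line.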

My second question: this program works well for searching text files, but what about Word documents and spreadsheets? Do I need some Windows API for those?


Many thanks


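require 'find'

class ESearch

  # Method which is passed a file path from the command line
  def scanFiles(path)
    # In a double-quoted string, backslashes start escape sequences, so the
    # Windows path is written with forward slashes, which Ruby accepts
    terms_file = 'C:/Documents and Settings/user/Desktop/terms.txt'
    # Read the keywords, one per line, dropping surrounding whitespace and blank lines
    terms = File.readlines(terms_file).map(&:strip).reject(&:empty?)

    # Process each file under the passed file path
    Find.find(path) do |curPath|
      next unless File.file?(curPath)
      # Process the contents of each file line by line, counting line numbers
      File.open(curPath) do |file|
        file.each_line.with_index(1) do |line, lineno|
          # If the line contains any term, output the path and line number
          if terms.any? { |term| line.include?(term) }
            puts "#{curPath}:#{lineno}"
          end
        end
      end
    end
  end
end

# Run from the command line, passing in a file path; print a usage
# message and exit if exactly one path is not given
if __FILE__ == $0
  if ARGV.size != 1
    puts "Use: #{$0} [path]"
    exit 1
  end

  esearch = ESearch.new
  esearch.scanFiles(ARGV[0])
end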
 

William James

Stuart said:
My second question: this program works well for searching text files, but what about Word documents and spreadsheets? Do I need some Windows API for those?

You can read these files if you open them in binary mode. However, they contain so much extra binary data that searching them for plain text may not be easy.
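A minimal sketch of the binary-mode approach, assuming a plain byte search is good enough: it reads the whole file with `File.binread` and looks for the term's bytes. Because some binary document formats store text as UTF-16, it also tries a UTF-16LE spelling of the term; that is a heuristic, not a document parser, and it will not find text in compressed formats such as .docx or .xlsx, which are ZIP archives. `binary_contains?` is a hypothetical helper name.

```ruby
# Hypothetical helper: true if `term` occurs in the raw bytes of `path`,
# either as plain 8-bit text or as UTF-16LE (a heuristic, not a parser).
def binary_contains?(path, term)
  # File.binread opens the file in binary mode: no newline translation,
  # and the data comes back as ASCII-8BIT bytes.
  data = File.binread(path)
  # Turn each candidate spelling of the term into raw bytes too, so the
  # comparison is byte-for-byte.
  candidates = [term.b, term.encode(Encoding::UTF_16LE).b]
  candidates.any? { |bytes| data.include?(bytes) }
end
```

For example, `binary_contains?('report.doc', 'invoice')` (a hypothetical file name) returns true or false; it only says whether the term occurs, not on which "line", since binary files have no meaningful line structure.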


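# Read the search terms, one per line, ignoring surrounding whitespace
terms = IO.read("terms.txt").strip.split(/\s*\n\s*/)

# ARGF reads every file named on the command line, one line at a time
ARGF.each do |line|
  line.strip!
  # Report lines that exactly equal one of the terms
  if terms.include?(line)
    # ARGF.file.lineno is the line number within the current file;
    # ARGF.lineno keeps counting across all the files
    puts "#{ARGF.filename}:#{ARGF.file.lineno}: #{line}"
  end
end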

Running it:

ruby scanner.rb *.dat
 
