Searching a file for multiple strings

gotbyrd

Hello,

I'm fairly new to Python and am trying to build a fairly simple
search script. Ultimately, I want to search a directory of files
for multiple user-supplied keywords. I've already written a script
that can search for a single string through multiple files; now I just
need to adapt it to multiple strings.

I found a bit of code that's a good start:

import re
test = open('something.txt', 'r').read()

list = ['a', 'b', 'c']

foundit = re.compile('|'.join(re.escape(target) for target in list))
if foundit.findall(test):
    print 'yes!'

The only trouble with this is it returns yes! if it finds any of the
search items, and I only want a return when it finds all of them. Is
there a bit of code that's similar that I can use?

Thanks
 
Tim Chase

[insert standard admonition about using "list" as a variable
name, masking the built-in "list"]
Unless there's a reason to use regular expressions, you could
simply use

test = open("something.txt").read()
items = ['a', 'b', 'c']
if all(s in test for s in items):
    print "Yes!"
else:
    print "Sorry, bub"

This presumes Python 2.5, in which the "all()" function was added.
In earlier versions, you could do

for s in items:
    if s not in test:
        print "Sorry, bub"
        break
else:
    print "Yeparoo"

(note that the "else" goes with the "for", not the "if")
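
To make the for/else behaviour concrete, here is a minimal sketch of the same loop wrapped in a function. The name missing_keyword is hypothetical, chosen only for illustration; it returns the first item that is absent, or None when the loop finishes without hitting break.

```python
def missing_keyword(text, items):
    """Return the first item not found in text, or None if all are present.

    Hypothetical helper, not part of the original post.
    """
    for s in items:
        if s not in text:
            result = s
            break
    else:
        # This branch belongs to the "for": it runs only when the loop
        # completed without executing "break", i.e. nothing was missing.
        result = None
    return result
```

A caller can then test "missing_keyword(test, items) is None" to get the same yes/no answer as the loop above.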

-tkc
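
If there is a reason to stay with regular expressions, one possible sketch is to compile one pattern per keyword instead of a single alternation. A joined 'a|ab' pattern reports only the branch that matched at each position, so overlapping keywords can hide each other; searching separately avoids that. The function name contains_all is an assumption for illustration, and the code relies on all(), so it needs Python 2.5 or later.

```python
import re


def contains_all(text, targets):
    """Return True if every string in targets occurs somewhere in text.

    Hypothetical helper: one escaped, compiled pattern per target.
    """
    # re.escape makes each target match literally, as in the original snippet.
    patterns = [re.compile(re.escape(t)) for t in targets]
    # Each pattern is searched independently, so overlapping targets
    # such as 'a' and 'ab' are both detected.
    return all(p.search(text) for p in patterns)
```

For plain literal keywords this gives the same answer as the "s in test" check; it only pays off if the patterns later become real regular expressions.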
 
Tim Chase

One more item: if your files are large, it may be more efficient
to scan through them incrementally rather than reading the whole
file into memory, assuming your patterns aren't multi-line (and
by your escaping example, I suspect they're just single words):

items = set(['a', 'b', 'c'])
for fname in ['file1.txt', 'file2.txt']:
    still_to_find = items.copy()
    for line in open(fname):
        # collect the items seen on this line, then drop them from the set
        found = set()
        for item in still_to_find:
            if item in line:
                found.add(item)
        still_to_find.difference_update(found)
        # stop reading as soon as everything has been found
        if not still_to_find: break
    if still_to_find:
        print "%s: Nope" % fname
    else:
        print "%s: Yep" % fname

just one more way to do it :)

-tkc
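
For the original goal of searching a whole directory, the line-by-line idea could be sketched roughly as follows. The names file_has_all and search_directory are hypothetical, and the sketch assumes plain text files readable in the default encoding and a Python version that supports the "with" statement, which is newer than the 2.4 mentioned elsewhere in this thread.

```python
import os


def file_has_all(path, keywords):
    """Scan path line by line; True once every keyword has been seen."""
    still_to_find = set(keywords)
    with open(path) as handle:
        for line in handle:
            # Keep only the keywords that this line did not contain.
            still_to_find = set(k for k in still_to_find if k not in line)
            if not still_to_find:
                # Everything found: stop reading the rest of the file.
                return True
    return not still_to_find


def search_directory(directory, keywords):
    """Return sorted names of regular files in directory with all keywords."""
    matches = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip subdirectories and other non-file entries.
        if os.path.isfile(path) and file_has_all(path, keywords):
            matches.append(name)
    return matches
```

Like the loop above, this only works when no keyword spans a line break, and it does not descend into subdirectories.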
 
John Machin

Not to discourage the use of Python, but it seems that fgrep with the
-f flag already does exactly what you want. If you're on Windows, you
can get the Windows version of fgrep here: http://unxutils.sourceforge.net/

That URL is antique and a dead end. When you find the actual
sourceforge project page (http://sourceforge.net/projects/unxutils/)
and browse the forums and the CVS repository, you'll see tumbleweed
blowing down Main Street and not much else (besides a few whinges
about that dead end URL, and many unanswered issues).

Alternative: http://gnuwin32.sourceforge.net/
 
John Machin

Thanks very much for the update. I had been using the older versions
(on the rare occasions I've had to use Windows). I didn't know there
was a currently-maintained version.

Read my lips: unxutils is *not* "currently-maintained". Use GnuWin32.
 
gotbyrd

Thanks, Tim. What you suggested worked perfectly!

Jason
 
gotbyrd

Not to discourage the use of Python, but it seems that fgrep with the
-f flag already does exactly what you want. If you're on Windows, you
can get the Windows version of fgrep here: http://unxutils.sourceforge.net/

Shawn

Shawn,

Thanks for your suggestion, but the office I work in has python 2.4 on
its workstations and we're prohibited from installing outside
software. Learning to use python was quicker than getting the IT
staff to approve adding new software :)

Jason
 
