how to search multiple textfiles ?

Stef Mientki

hello,

I want to search multiple textfiles (python source files) for a specific
word.
I can find all files, open them and do a search,
but I guess that will be rather slow.

I couldn't find any relevant information through google.

Does anyone know of a search library that performs this task fast ?

If it indeed only concerns py-files,
is there another way of searching words ?
( I could imagine that such a "py-only-search" would have benefits,
because you could set a flag to see the words in comment yes or no )
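
A "py-only" search with a comment flag could be sketched with the standard library's tokenize module. This is only a minimal illustration, not an existing search library: the function name find_word and the include_comments flag are made up for this example. It matches identifiers exactly, looks inside comments only when asked, and ignores string literals.

```python
import io
import tokenize

def find_word(source, word, include_comments=True):
    """Return sorted line numbers where `word` occurs in Python source text.

    NAME tokens (identifiers and keywords) are compared exactly; COMMENT
    tokens are searched as substrings, and only if include_comments is true.
    """
    hits = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string == word:
            hits.add(tok.start[0])
        elif (include_comments and tok.type == tokenize.COMMENT
              and word in tok.string):
            hits.add(tok.start[0])
    return sorted(hits)

sample = "x = spam  # eggs here\n# spam in a comment\ny = eggs\n"
print(find_word(sample, "spam"))                          # [1, 2]
print(find_word(sample, "spam", include_comments=False))  # [1]
```

Tokenizing is slower than a plain substring search and raises an error on source that does not tokenize, so it probably only pays off when the comment/code distinction matters.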

thanks,
Stef Mientki


Het UMC St Radboud staat geregistreerd bij de Kamer van Koophandel in het handelsregister onder nummer 41055629.
The Radboud University Nijmegen Medical Centre is listed in the Commercial Register of the Chamber of Commerce under file number 41055629.
 
Mike Driscoll

On Windows I use the free version of Bare Grep: http://www.baremetalsoft.com/baregrep/

No, it's not a Python solution, but it works for my needs. You should
try using Python to search your script files and see if it really is
too slow though.

Mike
 
George Sakkis

If you're on a *nix platform, you can use:

$ find . -name "*.py" | xargs egrep "\bword\b"

HTH,
George
 
Méta-MCI (MVP)

Hi!

On Windows, you can use the (standard) command findstr

Example:
findstr /n /s /I strsearched *.py

@-salutations
 
Paul Rubin

Stef Mientki said:
Does anyone know of a search library that performs this task fast ?

You mean you want a Python search engine (with inverted indexes and all that)?
Try: nucular.sf.net
 
Sean DiZazzo

I use 'fgrep', i.e. `fgrep -r "toFind" /source`

~Sean
 
Stef Mientki

Mike said:
On Windows I use the free version of Bare Grep: http://www.baremetalsoft.com/baregrep/

No, it's not a Python solution, but it works for my needs. You should
try using Python to search your script files and see if it really is
too slow though.

Hi guys,
I did some tests and I'm amazed by the results:

I did a search on the Python directory: 300 MB, 10325 files in 660 folders.

I did several searches: case-sensitive and case-insensitive, whole words
or not, many or few occurrences,
but the differences between those were negligible.

Finding all occurrences with line numbers:
- PyScripter 110 sec (PyScripter is the default IDE I use now)
- Delphi 20 .. 35 sec
- Findstr 4 sec

With the following programs I only searched for the first occurrence in each file,
which I think is good enough when there are many files, because you can only
view 20 or 30 lines on one screen.
- Explorer XXX didn't find anything
- FileNurse 5 sec (FileNurse is my own Explorer replacement, written in
Delphi)
- Python 3 sec (very quick and dirty procedure with the string.find method)

I'm really amazed by the speed of Python !!
It can only be beaten by findstr, which is only available on windows.
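
For reference, a quick procedure of this kind might look like the sketch below. It is not the code used for the timings above, which was not posted. The name search_tree and its parameters are hypothetical, and it reports every matching line rather than stopping at the first hit per file.

```python
import os

def search_tree(root, word, suffix=".py", ignore_case=False):
    """Yield (path, line_number, line) for every line containing `word`."""
    needle = word.lower() if ignore_case else word
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps the scan going on oddly encoded files
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        haystack = line.lower() if ignore_case else line
                        if haystack.find(needle) != -1:
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it

# Example call (the directory is an assumption, adjust as needed):
# for path, number, line in search_tree(r"P:\Python", "ponx", ignore_case=True):
#     print(f"{path}:{number}: {line}")
```

Because it is a generator, a caller that only wants the first hit can stop iterating early instead of scanning the whole tree.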

Paul: nucular looks very promising, but I couldn't get it working
within a few minutes. It might also be a little overkill,
but I'll certainly bookmark the link for future use.

thanks again,
cheers,
Stef
 
Méta-MCI (MVP)

Hi !

Thanks for the feedback.

Some information: for a long time I have found that it is often faster to use
Windows commands than to develop something in a high-level language (or even
a low-level one).

FINDSTR is fast, OK. But internal commands are even faster. Example: DIR
(with all its options).
And it is faster to read the result via a pipe.
Thus, I frequently use this sort of function:


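import os

def cmdone(repstart, commande, moderetour="LIST"):
    # Run `commande` from the directory `repstart` and read its output via a pipe.
    os.chdir(repstart)
    sret = ''.join(os.popen(commande))
    # "STR" returns the raw output; any other mode returns a list of lines.
    if moderetour.upper() == "STR":
        return sret
    else:
        return sret.split('\n')

print(cmdone('D:\\dev\\python', 'findstr /N /I ponx *.py', 'STR'))
print()
print(cmdone('D:\\dev\\python', 'dir *.jpg /B'))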
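
The same idea can also be written with the subprocess module, which avoids changing the working directory of the whole process. This is only a sketch of an alternative to cmdone, not a replacement endorsed anywhere in this thread. The name run_command and the as_string flag are invented here, and capture_output needs Python 3.7 or later.

```python
import subprocess

def run_command(start_dir, command, as_string=False):
    """Run a shell command in start_dir and return what it printed."""
    completed = subprocess.run(
        command,
        shell=True,           # needed for shell built-ins such as DIR
        cwd=start_dir,        # only the child process runs in start_dir
        capture_output=True,  # collect stdout and stderr through pipes
        text=True,            # decode the output to str
    )
    output = completed.stdout
    return output if as_string else output.splitlines()

# Hypothetical usage on Windows, mirroring the calls above:
# print(run_command(r"D:\dev\python", "findstr /N /I ponx *.py", as_string=True))
# print(run_command(r"D:\dev\python", "dir *.jpg /B"))
```

Note that shell=True passes the string to the system shell, so it should only be given commands you wrote yourself.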




Sorry for my bad english, and have a good day...
 
Lawrence D'Oliveiro

Stef Mientki said:
- Pyscripter 110 sec ( PyScripter is the default IDE I use now)
- Delphi 20 .. 35 sec
- Findstr 4 sec

What order did you try them in? Did you try each one more than once, in
different orders? Just to rule out filesystem caching effects.
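
One way to check for such caching effects is to repeat the measurements while rotating the order in which the tools run. The helper below is a rough sketch. The name time_in_rotated_orders is made up, and each task is assumed to be a zero-argument callable that performs one complete search.

```python
import time

def time_in_rotated_orders(tasks, rounds=3):
    """Time each named callable several times, rotating the run order.

    tasks: dict mapping a label to a zero-argument callable.
    Returns a dict mapping each label to its list of timings in seconds.
    """
    names = list(tasks)
    timings = {name: [] for name in names}
    if not names:
        return timings
    for round_index in range(rounds):
        # Start each round one position later so no task always runs first.
        shift = round_index % len(names)
        order = names[shift:] + names[:shift]
        for name in order:
            start = time.perf_counter()
            tasks[name]()
            timings[name].append(time.perf_counter() - start)
    return timings
```

If one tool is much slower only in the round where it runs first, the difference is more likely the cold file cache than the tool itself.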

Stef Mientki said:
I'm really amazed by the speed of Python !!
It can only be beaten by findstr, which is only available on windows.

Did you try find -exec grep -F?
 
S

Stef Mientki

Lawrence said:
What order did you try them in? Did you try each one more than once, in
different orders? Just to rule out filesystem caching effects.

I repeated all of them at least twice, to see if I got the same result.
And indeed the very first run (PyScripter) took about 150 sec.
So I think the values mentioned above give a good impression, nothing more.

Lawrence said:
Did you try find -exec grep -F?

well, my Windows version doesn't understand that:

P:\Python>find /?
Searches for a text string in a file or files.

FIND [/V] [/C] [/N] [/I] [/OFF[LINE]] "string" [[drive:][path]filename[ ...]]

  /V          Displays all lines NOT containing the specified string.
  /C          Displays only the count of lines containing the string.
  /N          Displays line numbers with the displayed lines.
  /I          Ignores the case of characters when searching for the string.
  /OFF[LINE]  Do not skip files with offline attribute set.
  "string"    Specifies the text string to find.
  [drive:][path]filename
              Specifies a file or files to search.

If a path is not specified, FIND searches the text typed at the prompt
or piped from another command.

cheers,
Stef
 
Lawrence D'Oliveiro

Stef Mientki said:
well, my Windows version doesn't understand that:

I assumed when you said "It can only be beaten by findstr, which is only
available on windows", that meant you had tried some non-Windows options,
before concluding that Windows "findstr" was the fastest.
 
