searching a substring

J

junky_fellow

Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.

thanx in advance.....
 
M

Martin Johansen

junky_fellow said:
Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.

thanx in advance.....


It is a fact that you must check all characters (except the last two). The
simplest method would be to search for 'a' and if you find it, test for 'b'
then 'c'.

The fastest method in this case (byte data[4] = a,b,c) would be to cast data
to an int (if you work with a 32-bit processor) then "xor" it with every
third byte of the data string, if the result is 0 you have found a sub
string. This can be optimized in assembler as well, and I think this is the
fastest method if combined with a suitable buffering method.
 
P

pete

junky_fellow said:
Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.

Use srstr(), from <string.h>.
 
D

Darrell Grainger

Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.

You have to define efficient. If primary memory is a concern then
efficient would be reading one character at a time and seeing if it
matches the first letter of the string. If it does then compare the
second, third, etc. characters of the string.

If efficient is as fast as possible without concern for primary memory
then determine file size, allocate a block of memory the size of the file,
read the entire contents into memory then search the memory for the string
(maybe using strstr() or just using the search for the first letter, if
you find it search the the entire string).

Or you could try a combination. Allocate a large chunk of memory, read in
part of the file and search the memory for the string you are looking for.
Only danger here is if the last letter of block #1 is 'a' and the first
letters of block #2 are "bc". You'll never to figure out how to handle
that situations.
 
P

pete

Martin said:
junky_fellow said:
Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.

thanx in advance.....

It is a fact that you must check all characters
(except the last two). The simplest method would be to
search for 'a' and if you find it, test for 'b' then 'c'.

The fastest method in this case (byte data[4] = a,b,c)
would be to cast data to an int
(if you work with a 32-bit processor) then "xor" it with every

xor "what" with every third byte ?
third byte of the data string,
if the result is 0 you have found a sub string.

Even assuming CHAR_BIT equals 8 and sizeof(int) equals 4, and
ignoring alignment problems,
I have no idea of what you think you're saying.
 
M

Martin Johansen

pete said:
Martin said:
junky_fellow said:
Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.

thanx in advance.....

It is a fact that you must check all characters
(except the last two). The simplest method would be to
search for 'a' and if you find it, test for 'b' then 'c'.

The fastest method in this case (byte data[4] = a,b,c)
would be to cast data to an int
(if you work with a 32-bit processor) then "xor" it with every

xor "what" with every third byte ?

Oh, sorry, I ment xor every third byte with data[3] casted to int, three
byte from a file could be read into an array in the same manner as a,b,c in
data[3].
 
P

pete

Martin said:
pete said:
Martin said:
Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.

thanx in advance.....

It is a fact that you must check all characters
(except the last two). The simplest method would be to
search for 'a' and if you find it, test for 'b' then 'c'.

The fastest method in this case (byte data[4] = a,b,c)
would be to cast data to an int
(if you work with a 32-bit processor) then "xor" it with every

xor "what" with every third byte ?

Oh, sorry, I ment xor every third byte with data[3]
casted to int, three byte from a file could be read
into an array in the same manner as a,b,c in data[3].

I think you mean data[2], if you mean 'c'.

I don't understand why you prefer to check the result of an xor,
instead of checking equality directly.
Your described procedure,
only checks for one letter, not a substring.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
473,754
Messages
2,569,528
Members
45,000
Latest member
MurrayKeync

Latest Threads

Top