Find first matching substring

Eloff

Almost every time I've had to do parsing of text over the last 5 years
I've needed this function:

def find_first(s, subs, start=None, end=None):
    results = [s.find(sub, start, end) for sub in subs]
    results = [r for r in results if r != -1]
    if results:
        return min(results)

    return -1

It finds the first matching substring in the target string. If more
than one substring matches, it returns the position of the match that
occurs at the lowest index in the string.

Has anyone else had problems where they could have applied this
function?

It seems to me that Python's find (and rfind, index, rindex) could be
modified (the same way that startswith and endswith have been) to
behave this way if a tuple were passed. Do others agree that this
would be desirable?

Thanks,
Dan
 
MRAB

Eloff said:
> Almost every time I've had to do parsing of text over the last 5 years
> I've needed this function:
>
> def find_first(s, subs, start=None, end=None):
>     results = [s.find(sub, start, end) for sub in subs]
>     results = [r for r in results if r != -1]
>     if results:
>         return min(results)
>
>     return -1
>
> It finds the first matching substring in the target string, if there
> is more than one match, it returns the position of the match that
> occurs at the lowest index in the string.
>
One possible optimisation for your code is to note that if you find that
one of the substrings starts at a certain position then you're not
interested in any subsequent substring which might start at or after
that position, so you could reduce the search space for each substring
found.
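A minimal sketch of that narrowing, assuming the same signature as the original find_first. The slice(...).indices() normalisation of start and end is an addition made here so the bounds can be compared as plain integers; it is not part of the posted code.

```python
def find_first(s, subs, start=None, end=None):
    """Return the lowest index in s[start:end] where any of subs occurs, or -1."""
    # Normalise start/end (None or negative) to plain indices, as slicing would.
    lo, hi, _ = slice(start, end).indices(len(s))
    best = -1
    for sub in subs:
        if best == lo:
            break  # a match at the very start of the range cannot be beaten
        # Before any match is found, search the whole range. Afterwards, only a
        # match that starts before `best` matters, so it must end before
        # best + len(sub).
        limit = hi if best == -1 else min(hi, best + len(sub) - 1)
        pos = s.find(sub, lo, limit)
        if pos != -1:
            best = pos
    return best
```

How much this saves depends on the order of subs: the search space only shrinks after an early match has been found.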
> Has anyone else had problems where they could have applied this
> function?
>
> It seems to me that Python's find (and rfind, index, rindex) could be
> modified (the same way that startswith and endswith have been) to
> behave this way if a tuple were passed. Do others agree that this
> would be desirable?
>
Possibly. I think that allowing a tuple in the partition and rpartition
methods might also be useful.
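To illustrate what a tuple-aware partition might look like, here is a rough sketch. The name partition_first is hypothetical, str.partition itself accepts only a single separator, and the tie-breaking rule (the separator listed first wins when two match at the same index) is an assumption rather than anything proposed in this thread.

```python
def partition_first(s, seps):
    """Split s at the earliest occurrence of any separator in seps.

    Returns (head, sep, tail), or (s, "", "") if no separator is found,
    mirroring the return shape of str.partition.
    """
    best_pos, best_sep = -1, ""
    for sep in seps:
        if not sep:
            raise ValueError("empty separator")  # same rule as str.partition
        pos = s.find(sep)
        # Strict "<" keeps the earlier-listed separator on a tie.
        if pos != -1 and (best_pos == -1 or pos < best_pos):
            best_pos, best_sep = pos, sep
    if best_pos == -1:
        return s, "", ""
    return s[:best_pos], best_sep, s[best_pos + len(best_sep):]
```

For example, partition_first("a;b=c", ("=", ";")) splits at the ";" because it occurs before the "=".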
 
John Machin

> Almost every time I've had to do parsing of text over the last 5 years
> I've needed this function: [snip]
> It finds the first matching substring in the target string, if there
> is more than one match, it returns the position of the match that
> occurs at the lowest index in the string.

Alternatives:
(1) re.search(r"sub0|sub1|...", ...)
(2) google "Aho Corasick Python" (one result should be a thread in
this newsgroup within the last week)
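
Alternative (1) could be sketched roughly as follows. The name find_first_re is made up for illustration, and the sketch assumes non-negative start and end, since the pos and endpos arguments of a compiled pattern's search() do not follow slice semantics for negative values. Each substring is passed through re.escape so that characters such as "." are matched literally.

```python
import re

def find_first_re(s, subs, start=0, end=None):
    """Return the lowest index at which any of subs occurs in s, or -1."""
    subs = list(subs)
    if not subs:
        return -1  # an empty alternation would match everywhere
    # Build "sub0|sub1|..." with regex metacharacters escaped.
    pattern = re.compile("|".join(re.escape(sub) for sub in subs))
    if end is None:
        m = pattern.search(s, start)
    else:
        m = pattern.search(s, start, end)
    return m.start() if m else -1
```

The regex engine scans the string once from left to right, so the leftmost match is reported regardless of the order in which the substrings are listed.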
 
