Why re.match()?

kj · Jul 1, 2009

For a recovering Perl-head like me it is difficult to understand
why Python's re module offers both match and search. Why not just
use search with a beginning-of-string anchor? I find it particularly
puzzling because I have this (possibly mistaken) idea that the
Python design philosophy tends towards minimalism, a sort of Occam's
razor, when it comes to language entities; i.e. having re.match
along with re.search seems to me like an "unnecessary multiplication
of entities". What am I missing?

TIA!

kj

Carl Banks · Jul 1, 2009

For a recovering Perl-head like me it is difficult to understand
why Python's re module offers both match and search. Why not just
use search with a beginning-of-string anchor? I find it particularly
puzzling because I have this (possibly mistaken) idea that the
Python design philosophy tends towards minimalism, a sort of Occam's
razor, when it comes to language entities; i.e. having re.match
along with re.search seems to me like an "unnecessary multiplication
of entities". What am I missing?

It always seemed redundant to me also (notwithstanding Duncan Booth's
explanation of slight semantic differences). However, I find myself
using re.match much more often than re.search, so perhaps in this case
a "second obvious way" is justified.

Carl Banks

MRAB · Jul 2, 2009

Carl said:
It always seemed redundant to me also (notwithstanding Duncan Booth's
explanation of slight semantic differences). However, I find myself
using re.match much more often than re.search, so perhaps in this case
a "second obvious way" is justified.

re.match is anchored at a certain position, whereas re.search isn't. The
re module doesn't support Perl's \G anchor.
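
A minimal sketch of the anchoring difference, reusing the "abcdef" example quoted elsewhere in this thread. The variable names are illustrative only. It relies on the documented behaviour that "^" refers to the real start of the string, not to the pos argument.

```python
import re

text = "abcdef"

# match() is anchored at the position it is given.
anchored = re.compile("c").match(text, 2)

# search() with "^" is not equivalent: "^" means the real start of the
# string (or a line start in MULTILINE mode), not the pos argument.
caret = re.compile("^c").search(text, 2)

# search() without an anchor scans forward until something matches.
scanning = re.compile("c").search(text, 0)

print(anchored.span())   # (2, 3)
print(caret)             # None
print(scanning.span())   # (2, 3)
```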

John Machin · Jul 2, 2009

re.match is anchored at a certain position, whereas re.search isn't. The
re module doesn't support Perl's \G anchor.

re.search is obstinately determined when it comes to flogging dead
horses:

C:\junk>\python26\python -mtimeit -s"import re" "re.match('xxx', 'y'*100)"
100000 loops, best of 3: 3.37 usec per loop

C:\junk>\python26\python -mtimeit -s"import re" "re.search('^xxx', 'y'*100)"
100000 loops, best of 3: 7.01 usec per loop

C:\junk>\python26\python -mtimeit -s"import re" "re.match('xxx', 'y'*1000)"
100000 loops, best of 3: 3.85 usec per loop

C:\junk>\python26\python -mtimeit -s"import re" "re.search('^xxx', 'y'*1000)"
10000 loops, best of 3: 37.9 usec per loop
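
For anyone who wants to repeat the measurement from inside Python rather than from the command line, here is a rough sketch using the timeit module. The helper name time_call and the repeat count are assumptions of this sketch, and unlike the commands above the subject string is built once outside the timed statement. Absolute numbers will differ by interpreter version and machine, so none are shown.

```python
import re
import timeit

def time_call(stmt, subject, number=5000):
    """Return seconds per call for stmt, run against the given subject string."""
    namespace = {"re": re, "subject": subject}
    total = timeit.timeit(stmt, globals=namespace, number=number)
    return total / number

results = {}
for length in (100, 1000):
    subject = "y" * length
    results[("match", length)] = time_call("re.match('xxx', subject)", subject)
    results[("search", length)] = time_call("re.search('^xxx', subject)", subject)

for (kind, length), seconds in sorted(results.items()):
    print(f"{kind:6s} len={length:5d}: {seconds * 1e6:.2f} usec per call")
```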

kj · Jul 2, 2009

In said:
So, for example:

I find this unconvincing; with re.search alone one could simply
do:

re.compile("^c").search("abcdef"[2:])

<_sre.SRE_Match object at 0x75918>

No need for re.match(), at least as far as your example shows.

Maybe there are times when re.match() is more "convenient" in some
way, but it is awfully Perlish to "multiply language elements" for
the sake of this trifling convenience.

kynn

Steven D'Aprano · Jul 2, 2009

In said:
In said:

So, for example:

I find this unconvincing; with re.search alone one could simply do:

re.compile("^c").search("abcdef"[2:])

<_sre.SRE_Match object at 0x75918>

No need for re.match(), at least as far as your example shows.

Your source string "abcdef" is tiny. Consider the case where the source
string is 4GB of data. You want to duplicate the whole lot, minus two
characters. Not so easy now.
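
A small sketch of the trade-off, using the thread's tiny string in place of a multi-gigabyte one. Slicing a str builds a new string before matching, while the pos argument matches in place; the two also report offsets differently. Variable names are illustrative.

```python
import re

pattern = re.compile("c")
text = "abcdef"

# Slicing builds a new string first; offsets are relative to the slice.
sliced = pattern.match(text[2:])

# Passing pos matches in place; offsets stay relative to the original.
in_place = pattern.match(text, 2)

print(sliced.span())    # (0, 1)
print(in_place.span())  # (2, 3)
```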

Maybe there are times when re.match() is more "convenient" in some way,
but it is awfully Perlish to "multiply language elements" for the sake
of this trifling convenience.

No, it's absolutely Pythonic.

....
Although practicality beats purity.

kj · Jul 2, 2009

In said:
In said:

So, for example:

re.compile("c").match("abcdef", 2)
<_sre.SRE_Match object at 0x0000000002C09B90>
re.compile("^c").search("abcdef", 2)

I find this unconvincing; with re.search alone one could simply do:

re.compile("^c").search("abcdef"[2:])

<_sre.SRE_Match object at 0x75918>

No need for re.match(), at least as far as your example shows.

Your source string "abcdef" is tiny. Consider the case where the source
string is 4GB of data. You want to duplicate the whole lot, minus two
characters. Not so easy now.

I'm sure that it is possible to find cases in which the *current*
implementation of re.search() would be inefficient, but that's
because this implementation is perverse, which, I guess, is ultimately
the point of my original post. Why privilege the special case of
a start-of-string anchor? What if you wanted to apply an end-anchored
pattern to some prefix of your 4GB string? Why not have a special
re method for that? And another for every possible special case?

If the concern is efficiency for such cases, then simply implement
optional offset and length parameters for re.search(), to specify
any arbitrary substring to apply the search to. To have a special-case
re.match() method in addition to a general re.search() method is
antithetical to language minimalism, and plain-old bizarre. Maybe
there's a really good reason for it, but it has not been mentioned
yet.

kj
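
For reference, compiled pattern objects already accept optional pos and endpos arguments on both search() and match(). The sketch below, with illustrative variable names, shows how they interact with anchors: endpos behaves as if the string ended there, whereas pos does not move the place where "^" matches.

```python
import re

text = "abcdef"

# endpos acts as if the string were only that long, so "$" can match there.
end_anchored = re.compile("c$").search(text, 0, 3)

# pos is different: "^" still means the real start of the string.
start_anchored = re.compile("^c").search(text, 2)

# match() supplies the anchoring at pos that "^" does not.
matched = re.compile("c").match(text, 2, 3)

print(end_anchored.span())  # (2, 3)
print(start_anchored)       # None
print(matched.span())       # (2, 3)
```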

Hrvoje Niksic · Jul 2, 2009

kj said:
For a recovering Perl-head like me it is difficult to understand
why Python's re module offers both match and search. Why not just
use search with a beginning-of-string anchor?

I need re.match when parsing the whole string. In that case I never
want to search through the string, but process the whole string with
some regular expression, for example when tokenizing. For example:

pos = 0
while pos != len(s):
    match = TOKEN_RE.match(s, pos)
    if match:
        process_token(match)
        pos = match.end()
    else:
        raise ParseError('invalid syntax at position %d' % pos)
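
A self-contained version of the loop above might look like the following. The token grammar, the ParseError class and the tokenize wrapper are all assumptions made for this sketch; only the match(s, pos) loop itself comes from the post.

```python
import re

class ParseError(Exception):
    """Raised when no token matches at the current position."""

# Hypothetical grammar: integers, identifiers, one-character operators, whitespace.
TOKEN_RE = re.compile(
    r"(?P<number>\d+)|(?P<name>[A-Za-z_]\w*)|(?P<op>[-+*/=()])|(?P<space>\s+)"
)

def tokenize(s):
    """Split s into (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos != len(s):
        # match() only succeeds if a token starts exactly at pos.
        match = TOKEN_RE.match(s, pos)
        if not match:
            raise ParseError('invalid syntax at position %d' % pos)
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("x = 40 + 2"))
# [('name', 'x'), ('op', '='), ('number', '40'), ('op', '+'), ('number', '2')]
```

Using search() here instead would silently skip over any invalid text between tokens rather than reporting it.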

Steven D'Aprano · Jul 3, 2009

I'm sure that it is possible to find cases in which the *current*
implementation of re.search() would be inefficient, but that's because
this implementation is perverse, which, I guess, is ultimately the point
of my original post. Why privilege the special case of a
start-of-string anchor?

Because wanting to see if a string matches from the beginning is a very
important and common special case.

What if you wanted to apply an end-anchored
pattern to some prefix of your 4GB string? Why not have a special re
method for that? And another for every possible special case?

Because they're not common special cases. They're rare and not special
enough to justify special code.

If the concern is efficiency for such cases, then simply implement
optional offset and length parameters for re.search(), to specify any
arbitrary substring to apply the search to. To have a special-case
re.match() method in addition to a general re.search() method is
antithetical to language minimalism, and plain-old bizarre. Maybe
there's a really good reason for it, but it has not been mentioned yet.

There is, and it has. You're welcome to keep your own opinion, but I
don't think you'll find many experienced Python coders will agree with it.

kj · Jul 3, 2009

There is, and it has.

I "misspoke" earlier. I should have written "I'm *sure* there's
a really good reason for it." And I think no one in this thread
(myself included, of course) has a clue of what it is. I miss the
days when Guido still posted to comp.lang.python. He'd know.

Regarding the "practicality beats purity" line, it's hard to think
of a better motto for *Perl*, with all its practicality-oriented
special doodads. (And yes, I know where the "practicality beats
purity" line comes from.) Even *Perl* does not have a special
syntax for the task that re.match is supposedly tailor-made for,
according to the replies I've received. Given that it is so trivial
to implement all of re.match's functionality with only one additional
optional parameter for re.search (i.e. pos), it is absurd to claim
that re.match is necessary for the sake of this special functionality.
The justification for re.match must be elsewhere.

But thanks for letting me know that I'm entitled to my opinion.
That's a huge relief.

kj

MRAB · Jul 3, 2009

kj said:
I "misspoke" earlier. I should have written "I'm *sure* there's
a really good reason for it." And I think no one in this thread
(myself included, of course) has a clue of what it is. I miss the
days when Guido still posted to comp.lang.python. He'd know.

Regarding the "practicality beats purity" line, it's hard to think
of a better motto for *Perl*, with all its practicality-oriented
special doodads. (And yes, I know where the "practicality beats
purity" line comes from.) Even *Perl* does not have a special
syntax for the task that re.match is supposedly tailor-made for,
according to the replies I've received. Given that it is so trivial
to implement all of re.match's functionality with only one additional
optional parameter for re.search (i.e. pos), it is absurd to claim
that re.match is necessary for the sake of this special functionality.
The justification for re.match must be elsewhere.

But thanks for letting me know that I'm entitled to my opinion.
That's a huge relief.

As I wrote, re.match anchors the match whereas re.search doesn't. An
alternative would have been to implement Perl's \G anchor, but I believe
that that was invented after the re module was written.

Aahz · Jul 3, 2009

I "misspoke" earlier. I should have written "I'm *sure* there's
a really good reason for it." And I think no one in this thread
(myself included, of course) has a clue of what it is. I miss the
days when Guido still posted to comp.lang.python. He'd know.

You may find this enlightening:

http://www.python.org/doc/1.4/lib/node52.html

Bruno Desthuilliers · Jul 3, 2009

kj wrote:
(snip)

To have a special-case
re.match() method in addition to a general re.search() method is
antithetical to language minimalism,

FWIW, Python has no pretension to minimalism.

kj · Jul 3, 2009

Aahz said:
You may find this enlightening:

http://www.python.org/doc/1.4/lib/node52.html

Indeed. Thank you.

kj

Lie Ryan · Jul 3, 2009

Steven said:
Because wanting to see if a string matches from the beginning is a very
important and common special case.

The oddest thing about re.match, to me, is that it has an implicit
beginning anchor but no implicit end anchor. I thought it was much
more common to check that a whole string matches a certain pattern than
just to match the beginning. But everyone's mileage varies.
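
For readers on newer interpreters: Python 3.4 and later provide re.fullmatch(), which anchors at both ends. A brief sketch of the difference, with illustrative variable names:

```python
import re

pattern = re.compile(r"\d+")

# match() only requires the pattern to match at the start.
prefix_only = pattern.match("123abc")

# fullmatch() requires the pattern to cover the whole string.
whole_fail = pattern.fullmatch("123abc")
whole_ok = pattern.fullmatch("123")

# Without fullmatch(), an explicit end anchor does the same job.
explicit = re.match(r"\d+\Z", "123abc")

print(prefix_only.group())  # 123
print(whole_fail)           # None
print(whole_ok.group())     # 123
print(explicit)             # None
```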

kj · Jul 6, 2009

Bruno Desthuilliers said:
kj wrote:
(snip)

FWIW, Python has no pretension to minimalism.

Assuming that you mean by this that Python's authors have no such
pretensions:

"There is real value in having a small language."

Guido van Rossum, 2007.07.03
http://mail.python.org/pipermail/python-3000/2007-July/008663.html

So there.

BTW, that's just one example. I've seen similar sentiments expressed
by Guido over and over and over: any new proposed enhancement to
Python must be good enough in his mind to justify cluttering the
language. That attitude counts as minimalism in my book.

The best explanation I have found so far for re.match is that it
is an unfortunate bit of legacy, something that would not be there
if the design of Python did not have to be mindful of keeping old
code chugging along...

kj

Diez B. Roggisch · Jul 6, 2009

kj said:
Assuming that you mean by this that Python's authors have no such
pretensions:

"There is real value in having a small language."

Guido van Rossum, 2007.07.03
http://mail.python.org/pipermail/python-3000/2007-July/008663.html

So there.

BTW, that's just one example. I've seen similar sentiments expressed
by Guido over and over and over: any new proposed enhancement to
Python must be good enough in his mind to justify cluttering the
language. That attitude counts as minimalism in my book.

The best explanation I have found so far for re.match is that it
is an unfortunate bit of legacy, something that would not be there
if the design of Python did not have to be mindful of keeping old
code chugging along...

language != libraries.

Diez

Rhodri James · Jul 7, 2009

kj said:

Assuming that you mean by this that Python's authors have no such
pretensions:

"There is real value in having a small language."

Guido van Rossum, 2007.07.03
http://mail.python.org/pipermail/python-3000/2007-July/008663.html

re.match() is part of the library, not the language. The standard
library is in no sense of the word small. It has a mild tendency
to avoid repeating itself, but presumably the stonkingly obvious
optimisation possibilities of re.match() over re.search() are
considered worth the (small) increase in size.

Terry Reedy · Jul 7, 2009

kj said:
"There is real value in having a small language."

Guido van Rossum, 2007.07.03
http://mail.python.org/pipermail/python-3000/2007-July/008663.html

So there.

small != minimal

BTW, that's just one example. I've seen similar sentiments expressed
by Guido over and over and over: any new proposed enhancement to
Python must be good enough in his mind to justify cluttering the
language. That attitude counts as minimalism in my book.

The best explanation I have found so far for re.match is that it
is an unfortunate bit of legacy, something that would not be there
if the design of Python did not have to be mindful of keeping old
code chugging along...

It is possible that someone proposed removing re.match for 3.0, but I do
not remember any such discussion. Some things were dropped when that
constraint was (temporarily) dropped.

tjr

Bruno Desthuilliers · Jul 7, 2009

kj wrote:

Assuming that you mean by this that Python's authors have no such
pretensions:

"There is real value in having a small language."

Guido van Rossum, 2007.07.03
http://mail.python.org/pipermail/python-3000/2007-July/008663.html

There are some differences between "small" and "minimal"...

So there.

BTW, that's just one example. I've seen similar sentiments expressed
by Guido over and over and over: any new proposed enhancement to
Python must be good enough in his mind to justify cluttering the
language. That attitude counts as minimalism in my book.

And in mine, it counts as "keeping the language's evolution under
control" - which is still not the same thing as being "minimalist". If
Python really was on the "minimalist" side, you wouldn't even have
"class" or "def" statements - both being mostly syntactic sugar. And
let's not even talk about @decorator syntax...
