Why re.match()?

kj

For a recovering Perl-head like me it is difficult to understand
why Python's re module offers both match and search. Why not just
use search with a beginning-of-string anchor? I find it particularly
puzzling because I have this (possibly mistaken) idea that the
Python design philosophy tends towards minimalism, a sort of Occam's
razor, when it comes to language entities; i.e. having re.match
along with re.search seems to me like an "unnecessary multiplication
of entities". What am I missing?

TIA!

kj
 
Carl Banks

kj said:
For a recovering Perl-head like me it is difficult to understand
why Python's re module offers both match and search.  Why not just
use search with a beginning-of-string anchor?  I find it particularly
puzzling because I have this (possibly mistaken) idea that the
Python design philosophy tends towards minimalism, a sort of Occam's
razor, when it comes to language entities; i.e. having re.match
along with re.search seems to me like an "unnecessary multiplication
of entities".  What am I missing?

It always seemed redundant to me also (notwithstanding Duncan Booth's
explanation of slight semantic differences). However, I find myself
using re.match much more often than re.search, so perhaps in this case
a "second obvious way" is justified.


Carl Banks
 
MRAB

Carl said:
It always seemed redundant to me also (notwithstanding Duncan Booth's
explanation of slight semantic differences). However, I find myself
using re.match much more often than re.search, so perhaps in this case
a "second obvious way" is justified.

re.match is anchored at a certain position, whereas re.search isn't. The
re module doesn't support Perl's \G anchor.
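
To make the anchoring difference concrete, here is a minimal sketch. The sample strings are only illustrative; the calls are the standard compiled-pattern match() and search() methods with their optional pos argument, and the "abcdef" case mirrors the example quoted later in the thread.

```python
import re

text = "abcdef"

# match() is anchored at the position it is given (here, index 2).
anchored = re.compile("c").match(text, 2)

# search() with "^" is not equivalent: "^" refers to the real start of the
# string (or of a line, in MULTILINE mode), not to the pos argument.
caret_at_pos = re.compile("^c").search(text, 2)

# Without an anchor, search() scans forward from pos until something matches,
# while match() only ever tries the starting position.
scanned = re.compile("e").search(text, 2)
not_at_pos = re.compile("e").match(text, 2)

# In MULTILINE mode "^" also matches after each newline, so search() finds
# the "b" on the second line while match() still only tries index 0.
multiline_search = re.search("^b", "a\nb", re.MULTILINE)
multiline_match = re.match("b", "a\nb", re.MULTILINE)

print(anchored.span())          # (2, 3)
print(caret_at_pos)             # None
print(scanned.span())           # (4, 5)
print(not_at_pos)               # None
print(multiline_search.span())  # (2, 3)
print(multiline_match)          # None
```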
 
John Machin

MRAB said:
re.match is anchored at a certain position, whereas re.search isn't. The
re module doesn't support Perl's \G anchor.

re.search is obstinately determined when it comes to flogging dead
horses:

C:\junk>\python26\python -mtimeit -s"import re" "re.match('xxx', 'y'*100)"
100000 loops, best of 3: 3.37 usec per loop

C:\junk>\python26\python -mtimeit -s"import re" "re.search('^xxx', 'y'*100)"
100000 loops, best of 3: 7.01 usec per loop

C:\junk>\python26\python -mtimeit -s"import re" "re.match('xxx', 'y'*1000)"
100000 loops, best of 3: 3.85 usec per loop

C:\junk>\python26\python -mtimeit -s"import re" "re.search('^xxx', 'y'*1000)"
10000 loops, best of 3: 37.9 usec per loop

C:\junk>
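
For anyone who wants to repeat the comparison from a script rather than the command line, something like the following should work. It is only a sketch: the compare() helper is a made-up name, the subject string is built once outside the timed statement (unlike the commands above), and the numbers will depend on the interpreter version, so no expected timings are given. Newer interpreters may well behave differently from 2.6 here.

```python
import re
import timeit


def compare(length, number=10000):
    """Time a failing re.match against a failing '^'-anchored re.search.

    Hypothetical helper: returns (match_seconds, search_seconds) for
    `number` repetitions on a string of `length` 'y' characters.
    """
    namespace = {"re": re, "subject": "y" * length}
    match_time = timeit.timeit(
        "re.match('xxx', subject)", globals=namespace, number=number
    )
    search_time = timeit.timeit(
        "re.search('^xxx', subject)", globals=namespace, number=number
    )
    return match_time, search_time


for length in (100, 1000):
    match_time, search_time = compare(length)
    print(f"len={length}: match {match_time:.4f}s, search {search_time:.4f}s")
```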
 
kj

In said:
So, for example:

I find this unconvincing; with re.search alone one could simply
do:
re.compile("^c").search("abcdef"[2:])
<_sre.SRE_Match object at 0x75918>

No need for re.match(), at least as far as your example shows.

Maybe there are times when re.match() is more "convenient" in some
way, but it is awfully Perlish to "multiply language elements" for
the sake of this trifling convenience.

kynn
 
Steven D'Aprano

In said:
So, for example:
I find this unconvincing; with re.search alone one could simply do:
re.compile("^c").search("abcdef"[2:])
<_sre.SRE_Match object at 0x75918>

No need for re.match(), at least as far as your example shows.

Your source string "abcdef" is tiny. Consider the case where the source
string is 4GB of data. You want to duplicate the whole lot, minus two
characters. Not so easy now.

Maybe there are times when re.match() is more "convenient" in some way,
but it is awfully Perlish to "multiply language elements" for the sake
of this trifling convenience.

No, it's absolutely Pythonic.

....
Although practicality beats purity.
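
On the copying point, a small sketch of the two approaches side by side. The WORD pattern and the data string are stand-ins invented for illustration; the only claim is that slicing a str builds a new string, while the pos argument works on the original.

```python
import re

WORD = re.compile(r"\w+")
data = "ab" + "cdef" * 1000  # stand-in for a much larger string

# Slicing builds a new string holding everything after the offset...
via_slice = WORD.match(data[2:])

# ...whereas the pos argument starts matching inside the original string.
via_pos = WORD.match(data, 2)

# The same text is matched, but the offsets refer to different strings.
print(via_slice.group() == via_pos.group())  # True
print(via_slice.start(), via_pos.start())    # 0 2
```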
 
kj

In said:
In said:
So, for example:
re.compile("c").match("abcdef", 2)
<_sre.SRE_Match object at 0x0000000002C09B90>
re.compile("^c").search("abcdef", 2)
I find this unconvincing; with re.search alone one could simply do:
re.compile("^c").search("abcdef"[2:])
<_sre.SRE_Match object at 0x75918>

No need for re.match(), at least as far as your example shows.
Your source string "abcdef" is tiny. Consider the case where the source
string is 4GB of data. You want to duplicate the whole lot, minus two
characters. Not so easy now.

I'm sure that it is possible to find cases in which the *current*
implementation of re.search() would be inefficient, but that's
because this implementation is perverse, which, I guess, is ultimately
the point of my original post. Why privilege the special case of
a start-of-string anchor? What if you wanted to apply an end-anchored
pattern to some prefix of your 4GB string? Why not have a special
re method for that? And another for every possible special case?

If the concern is efficiency for such cases, then simply implement
optional offset and length parameters for re.search(), to specify
any arbitrary substring to apply the search to. To have a special-case
re.match() method in addition to a general re.search() method is
antithetical to language minimalism, and plain-old bizarre. Maybe
there's a really good reason for it, but it has not been mentioned
yet.

kj
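
For reference, the search() method of a compiled pattern does already take optional pos and endpos arguments, but the two are not treated symmetrically with respect to anchors. A short sketch, using the same toy string as the examples above:

```python
import re

text = "abcdef"

# endpos makes the pattern behave as if the string ended there, so "$"
# can match at the end of the prefix "abc" without slicing.
end_anchored = re.compile("c$").search(text, 0, 3)

# pos does not get the same treatment: "^" still means the real start of
# the string, so this finds nothing even though "c" sits at index 2.
start_anchored = re.compile("^c").search(text, 2)

print(end_anchored.span())  # (2, 3)
print(start_anchored)       # None
```

So an end-anchored pattern on a prefix is covered by endpos, whereas "anchored at pos" is exactly what match() provides.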
 
Hrvoje Niksic

kj said:
For a recovering Perl-head like me it is difficult to understand
why Python's re module offers both match and search. Why not just
use search with a beginning-of-string anchor?

I need re.match when parsing the whole string. In that case I never
want to search through the string, but process the whole string with
some regular expression, for example when tokenizing. For example:

pos = 0
while pos != len(s):
    match = TOKEN_RE.match(s, pos)
    if match:
        process_token(match)
        pos = match.end()
    else:
        raise ParseError('invalid syntax at position %d' % pos)
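
A self-contained version of the loop above might look like this. The token grammar, the ParseError class and the tokenize() wrapper are all assumptions made up for the sketch (the post does not define TOKEN_RE or process_token); only the match(s, pos) loop itself comes from the post.

```python
import re


class ParseError(Exception):
    """Hypothetical error type standing in for the one used in the post."""


# Hypothetical grammar: whitespace, integers, identifiers, one-char operators.
TOKEN_RE = re.compile(r"\s+|\d+|[A-Za-z_]\w*|[-+*/()=]")


def tokenize(s):
    """Split s into tokens, failing at the first position nothing matches."""
    tokens = []
    pos = 0
    while pos != len(s):
        # Anchored at pos: never skips ahead over unrecognised input.
        match = TOKEN_RE.match(s, pos)
        if match:
            if not match.group().isspace():
                tokens.append(match.group())
            pos = match.end()
        else:
            raise ParseError('invalid syntax at position %d' % pos)
    return tokens


print(tokenize("x = 3 + (y*42)"))
# ['x', '=', '3', '+', '(', 'y', '*', '42', ')']
```

With search() in place of match(), an unrecognised character such as "$" would be silently skipped instead of raising the error.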
 
Steven D'Aprano

kj said:
I'm sure that it is possible to find cases in which the *current*
implementation of re.search() would be inefficient, but that's because
this implementation is perverse, which, I guess, is ultimately the point
of my original post. Why privilege the special case of a
start-of-string anchor?

Because wanting to see if a string matches from the beginning is a very
important and common special case.

What if you wanted to apply an end-anchored
pattern to some prefix of your 4GB string? Why not have a special re
method for that? And another for every possible special case?

Because they're not common special cases. They're rare and not special
enough to justify special code.

If the concern is efficiency for such cases, then simply implement
optional offset and length parameters for re.search(), to specify any
arbitrary substring to apply the search to. To have a special-case
re.match() method in addition to a general re.search() method is
antithetical to language minimalism, and plain-old bizarre. Maybe
there's a really good reason for it, but it has not been mentioned yet.

There is, and it has. You're welcome to keep your own opinion, but I
don't think you'll find many experienced Python coders will agree with it.
 
kj

Steven said:
There is, and it has.

I "misspoke" earlier. I should have written "I'm *sure* there's
a really good reason for it." And I think no one in this thread
(myself included, of course) has a clue of what it is. I miss the
days when Guido still posted to comp.lang.python. He'd know.

Regarding the "practicality beats purity" line, it's hard to think
of a better motto for *Perl*, with all its practicality-oriented
special doodads. (And yes, I know where the "practicality beats
purity" line comes from.) Even *Perl* does not have a special
syntax for the task that re.match is supposedly tailor-made for,
according to the replies I've received. Given that it is so trivial
to implement all of re.match's functionality with only one additional
optional parameter for re.search (i.e. pos), it is absurd to claim
that re.match is necessary for the sake of this special functionality.
The justification for re.match must be elsewhere.

But thanks for letting me know that I'm entitled to my opinion.
That's a huge relief.

kj
 
MRAB

kj said:
I "misspoke" earlier. I should have written "I'm *sure* there's
a really good reason for it." And I think no one in this thread
(myself included, of course) has a clue of what it is. I miss the
days when Guido still posted to comp.lang.python. He'd know.

Regarding the "practicality beats purity" line, it's hard to think
of a better motto for *Perl*, with all its practicality-oriented
special doodads. (And yes, I know where the "practicality beats
purity" line comes from.) Even *Perl* does not have a special
syntax for the task that re.match is supposedly tailor-made for,
according to the replies I've received. Given that it is so trivial
to implement all of re.match's functionality with only one additional
optional parameter for re.search (i.e. pos), it is absurd to claim
that re.match is necessary for the sake of this special functionality.
The justification for re.match must be elsewhere.

But thanks for letting me know that I'm entitled to my opinion.
That's a huge relief.

As I wrote, re.match anchors the match whereas re.search doesn't. An
alternative would have been to implement Perl's \G anchor, but I believe
that that was invented after the re module was written.
 
Bruno Desthuilliers

kj said:
(snip)
To have a special-case
re.match() method in addition to a general re.search() method is
antithetical to language minimalism,

FWIW, Python has no pretension to minimalism.
 
Lie Ryan

Steven said:
Because wanting to see if a string matches from the beginning is a very
important and common special case.

I find the oddest thing about re.match is that it has an implicit
beginning anchor, but no implicit end anchor. I thought it was much
more common to ensure that a whole string matches a certain pattern
than just to match the beginning. But everyone's mileage varies.
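
A quick illustration of the missing end anchor, as a sketch with made-up sample strings. The last two lines use fullmatch(), which was added to the re module in Python 3.4; on older versions an explicit \Z at the end of the pattern is the usual substitute.

```python
import re

digits = re.compile(r"\d+")

# Only the start is anchored, so trailing junk is accepted.
prefix_only = digits.match("123abc")

# An explicit end anchor rejects it. Note that "$" also matches just
# before a trailing newline, whereas \Z matches only at the very end.
dollar = re.match(r"\d+$", "123abc")
dollar_newline = re.match(r"\d+$", "123\n")
z_newline = re.match(r"\d+\Z", "123\n")

# fullmatch() (Python 3.4+) anchors both ends without extra syntax.
full_bad = digits.fullmatch("123abc")
full_ok = digits.fullmatch("123")

print(prefix_only.group())     # 123
print(dollar)                  # None
print(dollar_newline.group())  # 123
print(z_newline)               # None
print(full_bad)                # None
print(full_ok.group())         # 123
```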
 
kj

In said:
kj said:
(snip)
FWIW, Python has no pretension to minimalism.

Assuming that you mean by this that Python's authors have no such
pretensions:

"There is real value in having a small language."

Guido van Rossum, 2007.07.03
http://mail.python.org/pipermail/python-3000/2007-July/008663.html

So there.

BTW, that's just one example. I've seen similar sentiments expressed
by Guido over and over and over: any new proposed enhancement to
Python must be good enough in his mind to justify cluttering the
language. That attitude counts as minimalism in my book.

The best explanation I have found so far for re.match is that it
is an unfortunate bit of legacy, something that would not be there
if the design of Python did not have to be mindful of keeping old
code chugging along...

kj
 
Diez B. Roggisch

kj said:
Assuming that you mean by this that Python's authors have no such
pretensions:

"There is real value in having a small language."

Guido van Rossum, 2007.07.03
http://mail.python.org/pipermail/python-3000/2007-July/008663.html

So there.

BTW, that's just one example. I've seen similar sentiments expressed
by Guido over and over and over: any new proposed enhancement to
Python must be good enough in his mind to justify cluttering the
language. That attitude counts as minimalism in my book.

The best explanation I have found so far for re.match is that it
is an unfortunate bit of legacy, something that would not be there
if the design of Python did not have to be mindful of keeping old
code chugging along...

language != libraries.

Diez
 
Rhodri James

kj said:
Assuming that you mean by this that Python's authors have no such
pretensions:

"There is real value in having a small language."

Guido van Rossum, 2007.07.03
http://mail.python.org/pipermail/python-3000/2007-July/008663.html

re.match() is part of the library, not the language. The standard
library is in no sense of the word small. It has a mild tendency
to avoid repeating itself, but presumably the stonkingly obvious
optimisation possibilities of re.match() over re.search() are
considered worth the (small) increase in size.
 
Terry Reedy

kj said:
"There is real value in having a small language."

Guido van Rossum, 2007.07.03
http://mail.python.org/pipermail/python-3000/2007-July/008663.html

So there.

small != minimal

BTW, that's just one example. I've seen similar sentiments expressed
by Guido over and over and over: any new proposed enhancement to
Python must be good enough in his mind to justify cluttering the
language. That attitude counts as minimalism in my book.

The best explanation I have found so far for re.match is that it
is an unfortunate bit of legacy, something that would not be there
if the design of Python did not have to be mindful of keeping old
code chugging along...

It is possible that someone proposed removing re.match for 3.0, but I do
not remember any such discussion. Some things were dropped when that
constraint was (temporarily) dropped.

tjr
 
Bruno Desthuilliers

kj said:
Assuming that you mean by this that Python's authors have no such
pretensions:

"There is real value in having a small language."

Guido van Rossum, 2007.07.03
http://mail.python.org/pipermail/python-3000/2007-July/008663.html

There are some differences between "small" and "minimal"...
So there.

BTW, that's just one example. I've seen similar sentiments expressed
by Guido over and over and over: any new proposed enhancement to
Python must be good enough in his mind to justify cluttering the
language. That attitude counts as minimalism in my book.

And in mine, it counts as "keeping the language's evolution under
control" - which is still not the same thing as being "minimalist". If
Python really was on the "minimalist" side, you wouldn't even have
"class" or "def" statements - both being mostly syntactic sugar. And
let's not even talk about @decorator syntax...
 
