idiom for RE matching

Gordon Airporte

I have some code which relies on running each line of a file through a
large number of regexes which may or may not apply. For each pattern I
want to match I've been writing

gotit = mypattern.findall(line)
if gotit:
    gotit = gotit[0]
    ...do whatever else...

This seems kind of clunky. Is there a prettier way to handle this?
I've also been assuming that using the re functions that create match
objects is slower/heavier than dealing with the simple list returned by
findall(). I've profiled it and these matches are the biggest part of
the running time of the program, so I really would rather not use
anything slower.
 
Roger Miller

...
I've also been assuming that using the re functions that create match
objects is slower/heavier than dealing with the simple list returned by
findall(). I've profiled it and these matches are the biggest part of
the running time of the program, so I really would rather not use
anything slower.

My guess would be that searching for more matches after finding the
first would be more expensive than creating a match object. But that
would probably depend on the nature of your data and REs, so you need
to test it both ways if you are concerned about performance.

It would be nice if findall() had an optional parameter to limit the
number of matches, similar to the maxsplit parameter of string.split().
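
A minimal sketch of two ways to avoid scanning for every match, assuming a made-up pattern and helper names (first_groups and limited_findall are not part of the re module). search() stops at the first match, and slicing finditer() gives roughly the "limited findall" wished for above:

```python
import re
from itertools import islice

# Hypothetical pattern, used only for illustration.
bets_re = re.compile(r'(\w+) bets ([\d,]+)')

def first_groups(pattern, line):
    """Return the groups of the first match, or None if nothing matches."""
    m = pattern.search(line)  # stops scanning at the first match
    return m.groups() if m else None

def limited_findall(pattern, text, limit):
    """Return at most `limit` matches as tuples of groups."""
    # finditer() is lazy, so matches beyond `limit` are never searched for.
    return [m.groups() for m in islice(pattern.finditer(text), limit)]
```

Whether either is faster than findall() on real data still has to be measured, as suggested above.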
 
memracom

I have some code which relies on running each line of a file through a
large number of regexes which may or may not apply.

Have you read and understood what MULTILINE means in the manual
section on re syntax?

Essentially, you can make a single pattern which tests a match against
each line.

-- Michael Dillon
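
As a rough illustration of the MULTILINE suggestion, assuming some made-up sample lines and a made-up pattern: with the flag set, ^ and $ match at the start and end of every line, so a single call can cover a whole block of text.

```python
import re

# Hypothetical sample text, several lines in one string.
hand = """justnoldfoolm bets 10
harlan312 raises to 20
grandmarambo calls 20
justnoldfoolm bets 20"""

# re.MULTILINE makes ^ and $ match at every line boundary,
# not only at the start and end of the whole string.
bets_re = re.compile(r'^(.*) bets ([\d,]+)$', re.MULTILINE)

bets = bets_re.findall(hand)
```

Here bets holds one (player, amount) tuple per matching line. Note that this drops the information about which line each match came from, which may or may not matter for a given program.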
 
Gordon Airporte

Have you read and understood what MULTILINE means in the manual
section on re syntax?

Essentially, you can make a single pattern which tests a match against
each line.

-- Michael Dillon

No, I have not looked into this - thank you. RE's are hard enough to get
into that I didn't want the added complication of the flags. Now that
I'm comfortable writing patterns I guess I never got around to the rest
of the options.
 
mik3l3374

I have some code which relies on running each line of a file through a
large number of regexes which may or may not apply. For each pattern I
want to match I've been writing

gotit = mypattern.findall(line)
if gotit:
    gotit = gotit[0]
    ...do whatever else...

This seems kind of clunky. Is there a prettier way to handle this?
I've also been assuming that using the re functions that create match
objects is slower/heavier than dealing with the simple list returned by
findall(). I've profiled it and these matches are the biggest part of
the running time of the program, so I really would rather not use
anything slower.

if your search is not overly complicated, i think regexp is not
needed. if you want, you can post a sample what you want to search,
and some sample input.
 
Gordon Airporte

if your search is not overly complicated, i think regexp is not
needed. if you want, you can post a sample what you want to search,
and some sample input.

I'm afraid it's pretty complicated :). I'm doing analysis of hand
histories that online poker sites leave for you. Here's one hand of a
play money ring game:


Full Tilt Poker Game #2042984473: Table Play Chip 344 - 10/20 - Limit
Hold'em - 18:07:20 ET - 2007/03/22
Seat 1: grandmarambo (1,595)
Seat 4: justnoldfoolm (2,430)
Seat 5: rickrn (1,890)
Seat 7: harlan312 (820)
Seat 8: moi (335)
justnoldfoolm posts the small blind of 5
rickrn posts the big blind of 10
The button is in seat #1
*** HOLE CARDS ***
Dealt to moi [Jd 2c]
harlan312 calls 10
moi folds
grandmarambo calls 10
justnoldfoolm raises to 20
rickrn folds
harlan312 calls 10
grandmarambo calls 10
*** FLOP *** [7s 8s 2s]
justnoldfoolm bets 10
harlan312 raises to 20
grandmarambo calls 20
justnoldfoolm raises to 30
harlan312 calls 10
grandmarambo calls 10
*** TURN *** [7s 8s 2s] [3d]
justnoldfoolm bets 20
harlan312 calls 20
grandmarambo calls 20
*** RIVER *** [7s 8s 2s 3d] [7h]
justnoldfoolm bets 20
harlan312 folds
grandmarambo folds
Uncalled bet of 20 returned to justnoldfoolm
justnoldfoolm mucks
justnoldfoolm wins the pot (220)
*** SUMMARY ***
Total pot 220 | Rake 0
Board: [7s 8s 2s 3d 7h]
Seat 1: grandmarambo (button) folded on the River
Seat 4: justnoldfoolm (small blind) collected (220), mucked
Seat 5: rickrn (big blind) folded before the Flop
Seat 7: harlan312 folded on the River
Seat 8: moi didn't bet (folded)


So I'm picking out all kinds of info about my cards, my stack, my
betting, my position, board cards, other people's cards, etc. For
example, this pattern picks out which player bet and how much:

betsRe = re.compile('^(.*) bets ([\d,]*)')

I have 13 such patterns. The files I'm analyzing are just a session's
worth of histories like this, separated by \n\n\n. All of this
information needs to be organized by hand or by when it happened in a
hand, so I can't just run patterns over the whole file or I'll lose context.
(Of course, in theory I could write a single monster expression that
would chop it all up properly and organize by context, but it would be
next to impossible to write/debug/maintain.)
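
One way the structure described above might be sketched, keeping per-hand context by splitting on the \n\n\n separator first and then trying each pattern line by line. The PATTERNS table and parse_session are hypothetical names, and only two of the 13 patterns are guessed at here:

```python
import re

# Hypothetical pattern table; only the shape of the "bets" pattern
# comes from the post above.
PATTERNS = {
    "bets": re.compile(r'^(.*) bets ([\d,]+)'),
    "calls": re.compile(r'^(.*) calls ([\d,]+)'),
}

def parse_session(text):
    """Return one list of (action, player, amount) tuples per hand."""
    hands = []
    for hand_text in text.split("\n\n\n"):  # hands are separated by \n\n\n
        actions = []
        for line in hand_text.splitlines():
            for name, pattern in PATTERNS.items():
                m = pattern.match(line)
                if m:
                    actions.append((name,) + m.groups())
                    break  # assumption: at most one pattern applies per line
        hands.append(actions)
    return hands
```

Because each hand gets its own list, the order of events within a hand is preserved without one monster expression.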
 
Gabriel Genellina

if your search is not overly complicated, i think regexp is not
needed. if you want, you can post a sample what you want to search,
and some sample input.

I'm afraid it's pretty complicated :). I'm doing analysis of hand
histories that online poker sites leave for you. Here's one hand of a
play money ring game:


Full Tilt Poker Game #2042984473: Table Play Chip 344 - 10/20 - Limit
Hold'em - 18:07:20 ET - 2007/03/22
Seat 1: grandmarambo (1,595)
Seat 4: justnoldfoolm (2,430)
justnoldfoolm posts the small blind of 5
rickrn posts the big blind of 10
The button is in seat #1
*** HOLE CARDS ***
Dealt to moi [Jd 2c]
justnoldfoolm bets 10
[more sample lines]

So I'm picking out all kinds of info about my cards, my stack, my
betting, my position, board cards, other people's cards, etc. For
example, this pattern picks out which player bet and how much:

betsRe = re.compile('^(.*) bets ([\d,]*)')

I have 13 such patterns. The files I'm analyzing are just a session's
worth of histories like this, separated by \n\n\n. All of this
information needs to be organized by hand or by when it happened in a
hand, so I can't just run patterns over the whole file or I'll lose
context.
(Of course, in theory I could write a single monster expression that
would chop it all up properly and organize by context, but it would be
next to impossible to write/debug/maintain.)

But you don't HAVE to use a regular expression. For input this simple and
predictable, using partition or 'xxx in string' is around 4x faster:

import re
import timeit

betsRe = re.compile('^(.*) bets ([\d,]*)')

def test_partition(line):
    who, bets, amount = line.partition(" bets ")
    if bets:
        return who, amount

def test_re(line):
    r = betsRe.match(line)
    if r:
        return r.group(1), r.group(2)

line1 = "justnoldfoolm bets 10"
assert test_re(line1) == test_partition(line1) == ("justnoldfoolm", "10")
line2 = "Uncalled bet of 20 returned to justnoldfoolm"
assert test_re(line2) == test_partition(line2) == None

py> timeit.Timer("test_partition(line1)", "from __main__ import *").repeat()
<timeit-src>:2: SyntaxWarning: import * only allowed at module level
[1.1922188434563594, 1.2086988709458808, 1.1956522407177488]
py> timeit.Timer("test_re(line1)", "from __main__ import *").repeat()
<timeit-src>:2: SyntaxWarning: import * only allowed at module level
[5.2871529761464018, 5.2763971398599523, 5.2791986132315714]

As is often the case, a regular expression is NOT the right tool to use in
this case.
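
The same partition approach could plausibly be stretched to the other action lines in the sample hand. This is only a sketch: the ACTIONS tuple and parse_action are made-up names, and unlike the regex it does not check that the amount is numeric.

```python
# Hypothetical separators, based on the action lines in the sample hand.
ACTIONS = (" bets ", " calls ", " raises to ")

def parse_action(line):
    """Return (player, action, amount), or None if no separator is found."""
    for sep in ACTIONS:
        who, found, amount = line.partition(sep)
        if found:  # partition() returns an empty middle item when sep is absent
            return who, sep.strip(), amount
    return None
```

A player whose name happens to contain one of the separators would be split in the wrong place, so real input may need a stricter check.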
 
Gordon Airporte

Gabriel said:
As is often the case, a regular expression is NOT the right tool to use
in this case.

--Gabriel Genellina

Very interesting, thank you. I think 'pattern matching' and I
automatically think 'regular expressions'.
I did already find that it speeds things up to pre-test a line like

if 'bets' or 'calls' or 'raises' in line:
    run the appropriate re's

which isn't very pretty at all, and it seems I didn't manage to take the
next steps to doing away with the re's altogether.
 
Miles

I did already find that it speeds things up to pre-test a line like

if 'bets' or 'calls' or 'raises' in line:
    run the appropriate re's

Be careful: unless this is just pseudocode, this Python doesn't do
what you think it does; it always runs the regular expressions, so any
speed-up is imaginary.
'spam'

AFAIK, the (Python 2.5) idiom for what you want is:
True

-Miles
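
The interactive examples in the post above did not survive in this archive. A sketch of one idiom that fits the description, using the any() built-in added in Python 2.5 (KEYWORDS and has_action are illustrative names, not taken from the post):

```python
KEYWORDS = ("bets", "calls", "raises")

def has_action(line):
    """True if at least one keyword occurs as a substring of line."""
    # any() stops at the first keyword that is found.
    return any(word in line for word in KEYWORDS)
```

With this, has_action("harlan312 calls 10") is True and has_action("moi folds") is False, so the regexes would only run on lines that can match.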
 
Ben Finney

Gordon Airporte said:
I did already find that it speeds things up to pre-test a line like

if 'bets' or 'calls' or 'raises' in line:
    run the appropriate re's

which isn't very pretty at all

Nor does it work the way you might suppose:
...     print "Yes"
...
Yes
...     print "Yes"
...
Yes

Hint: the expression being evaluated for the 'if' statement above is
equivalent to:

if (('bets') or ('calls') or ('raises' in line)):
    print "Yes"
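
To make the hint concrete, here is a small check on a line that contains none of the three words (the sample line is made up). The first operand of `or` is a non-empty string, which is truthy, so it is returned at once and the `in` test never runs:

```python
line = "moi folds"  # contains none of the three words

# 'bets' is truthy, so `or` returns it without evaluating the rest.
result = 'bets' or 'calls' or 'raises' in line

# Each substring needs its own `in` test.
fixed = 'bets' in line or 'calls' in line or 'raises' in line
```

Here result is the string 'bets', which counts as true in an if statement, while fixed is False.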
 
Gordon Airporte

Miles said:
Be careful: unless this is just pseudocode, this Python doesn't do
what you think it does; it always runs the regular expressions, so any
speed-up is imaginary.

Yes, that's pseudo code even though I didn't really mean it that way
when I typed it. The actual code uses the proper 'if foo in line or if
bar in line:' form.
 
Wildemar Wildenburger

Gordon said:
Yes, that's pseudo code even though I didn't really mean it that way
when I typed it. The actual code uses the proper 'if foo in line or if
bar in line:' form.
One 'if' too many.
/W
 
Ben Finney

Gordon Airporte said:
The actual code uses the proper 'if foo in line or if bar in line:'
form.
  File "<stdin>", line 1
    if foo in line or if bar in line:
                       ^
SyntaxError: invalid syntax

Not that I want to pick on you; I just don't want something wrong
labelled as "proper" to go unchallenged in the archives :)
 
Gordon Airporte

Ben said:
Not that I want to pick on you; I just don't want something wrong
labelled as "proper" to go unchallenged in the archives :)

Oh gawd :p

I swear I have it right in the actual file! heh.
Copy and paste something that's compiled kids, copy and paste.
 
