Matching zero only once using RE

GregM

Hi,

I've looked at a lot of pages on the net and still can't seem to nail
this. Would someone more knowledgeable in regular expressions please
provide some help to point out what I'm doing wrong?

I am trying to see if a web page contains the exact text:
You have found 0 matches

But instead I seem to be matching all sorts of other lines, like
You have found <a number up to 5 digits long with comma> matches
for example:
You have found 34 matches
You have found 189 matches
You have found 16,734 matches
You have found 1,706 matches
You have found 300 matches

The last 2 I thought I had eliminated, but sadly it seems not: the
examples above all actually seem to match my expression below. :(

Here is what I'm doing:
import re

zeromatch = []
SecondarySearchTerm = 'You found (0){1} matches'
primarySearchTerm = 'Looking for Something'
primarySearchTerm2 = 'has been an error connecting'

# pagetext is all the body text on a web page.
# I'm using COM to drive MSIE and pagetext = doc.body.outerText

if (re.search(primarySearchTerm, pagetext) or
        re.search(primarySearchTerm2, pagetext)):
    failedlinks.append(link)
elif re.search(SecondarySearchTerm, pagetext):
    zeromatch.append(link)

I've tried other REs but had even more spectacular failures. Any help
would be greatly appreciated.

Thanks in Advance,
Greg Moore
Software Test
Shop.com
 
Jaime Wyant

GregM said:
I've looked at a lot of pages on the net and still can't seem to nail
this. Would someone more knowledgeable in regular expressions please
provide some help to point out what I'm doing wrong?

I am trying to see if a web page contains the exact text:
You have found 0 matches

Shouldn't your regular expression be "You have found 0 matches"? If
you're looking for that exact string, then you should use that.

This works for me:
<_sre.SRE_Match object at 0x00B21800>

Also, it looks like your pattern is off. The pattern you use is given below:

SecondarySearchTerm = 'You found (0){1} matches'

However, you state that you are looking for 'You have found 0 matches'

* Notice the 'have' in the string you are searching for and its
absence in your search term ;)
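
To make the mismatch concrete, here is a small sketch that runs both patterns against the sample lines from the original post. The `samples` list and the names `posted` and `corrected` are illustrative assumptions, not code from the thread. Note that `(0){1}` means "the group `0`, exactly once", so it behaves the same as a plain `0`.

```python
import re

# Sample lines taken from the original post.
samples = [
    "You have found 0 matches",
    "You have found 34 matches",
    "You have found 189 matches",
    "You have found 16,734 matches",
    "You have found 1,706 matches",
    "You have found 300 matches",
]

# Pattern as posted: note the missing "have".
posted = re.compile(r"You found (0){1} matches")
# Corrected pattern: a plain literal, no group or repeat count needed.
corrected = re.compile(r"You have found 0 matches")

# Collect the sample lines each pattern finds a match in.
posted_hits = [s for s in samples if posted.search(s)]
corrected_hits = [s for s in samples if corrected.search(s)]

print(posted_hits)     # lines matched by the pattern as posted
print(corrected_hits)  # lines matched by the corrected pattern
```

With these samples, the posted pattern matches nothing, while the corrected one matches only the "0 matches" line, because the literal text "found " must sit directly before the `0`.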

HTH,
jw
 
Mike Meyer

GregM said:
I've looked at a lot of pages on the net and still can't seem to nail
this. Would someone more knowledgeable in regular expressions please
provide some help to point out what I'm doing wrong?

I am trying to see if a web page contains the exact text:
You have found 0 matches

Why in the gods' names are you using an re for this? Just use in:

I think it's time to form a Committee for the Prevention of Regular
Expression Abuse.

<mike
 
Benji York

GregM said:
I am trying to see if a web page contains the exact text:
You have found 0 matches

It is unclear to me why you're using a regex at all. If you want to
find the *exact* text "You have found 0 matches" perhaps you should do
something like this:

if "You have found 0 matches" in pagetext:
    print('yes')
else:
    print('no')
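
Applied to the original poster's three search strings, the same idea might look like the sketch below. The function name `classify` and its return labels are hypothetical, chosen only for illustration; the search strings are the ones quoted in the thread, with the missing "have" restored.

```python
def classify(pagetext):
    """Return 'failed', 'zero', or 'other' for a page's body text."""
    # Error pages are checked first, mirroring the order in the original code.
    if ("Looking for Something" in pagetext
            or "has been an error connecting" in pagetext):
        return "failed"
    # Plain substring test: no regular expression required.
    if "You have found 0 matches" in pagetext:
        return "zero"
    return "other"


print(classify("You have found 0 matches"))
print(classify("You have found 300 matches"))
print(classify("There has been an error connecting to the server"))
```

The caller can then append the link to `failedlinks` or `zeromatch` depending on the returned label.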
 
GregM

Hi,
Thanks to all of you. Mike, I like your committee idea. Where can I
join? lol

Greg.
 
Aahz

Mike Meyer said:
I think it's time to form a Committee for the Prevention of Regular
Expression Abuse.

'Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.' --Jamie Zawinski
 
Fredrik Lundh

Mike said:
I think it's time to form a Committee for the Prevention of Regular
Expression Abuse.

on the other hand, the RE engine uses a more advanced scanning
algorithm than string find, which means that constant RE:s can in
fact be faster under some circumstances (certain patterns, target
strings with lots of false partial matches, etc).

see "searching for literal text" on this page

http://mail.python.org/pipermail/python-dev/2000-August/007797.html

for some figures.

(things have improved since then, especially in 2.4. in 2.3, "in" was
also a lot slower than "find". and all three are still slower than they
have to be: http://online.effbot.org/2004_08_01_archive.htm#find2 )
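
Rather than take any of these figures on faith, one can measure on one's own interpreter. The following is a rough sketch using the standard `timeit` module; the haystack (many near-miss lines followed by one real match) and the helper names are assumptions made up for this example, and the relative results will vary with Python version and input.

```python
import re
import timeit

needle = "You have found 0 matches"
# Hypothetical haystack: many false partial matches, then one real match.
haystack = "You have found 300 matches\n" * 10000 + needle
# re.escape keeps the needle a pure literal inside the pattern.
pattern = re.compile(re.escape(needle))


def with_in():
    return needle in haystack


def with_find():
    return haystack.find(needle) != -1


def with_re():
    return pattern.search(haystack) is not None


# Time 100 runs of each approach and report the totals in seconds.
timings = {}
for name, func in [("in", with_in), ("find", with_find), ("re", with_re)]:
    timings[name] = timeit.timeit(func, number=100)
    print(name, timings[name])
```

All three helpers return the same truth value, so only the timings differ.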

</F>
 
Tim Roberts

Mike Meyer said:
I think it's time to form a Committee for the Prevention of Regular
Expression Abuse.

As I learned from personal experience, this is a disease which one
contracts when moving to Python from Perl. Perl teaches you that the
entire world is a string, and every operation is a regular expression match
upon that string.
 
