matching exactly a 4 digit number in python

harijay

Hi
I am a few months into Python. I have used regexps before in Perl
and Java, but I am a little confused by this problem.

I want to parse a number of strings and extract only those that
contain a 4-digit number anywhere inside the string.

However, the regexp
p = re.compile(r'\d{4}')

matches even strings that contain numbers longer than 4 digits;
for example, it matches "I have 3324234 and more".

I am very confused. Shouldn't \d{4,} match exactly four-digit
numbers, so that a sentence containing a 5-digit number is not matched?

Here is the output of my test program, followed by the program itself.
Thanks for your help
Harijay

PyMate r8111 running Python 2.5.1 (/usr/bin/python)
Matched I have 2004 rupees
Matched I have 3324234 and more
Matched As 3233
Matched 2323423414 is good
Matched 4444 dc sav 2412441 asdf
SKIPPED random1341also and also
SKIPPED
SKIPPED 13
Matched a 1331 saves
SKIPPED and and as dad
SKIPPED A has 13123123
SKIPPED A 13123
SKIPPED 123 adn
Matched 1312 times I have told you
DONE

#!/usr/bin/python
import re

# Test strings: some contain a 4-digit number, others longer runs of digits
x = [" I have 2004 rupees ", " I have 3324234 and more", " As 3233 ",
     "2323423414 is good", "4444 dc sav 2412441 asdf ",
     "random1341also and also", "", "13", " a 1331 saves",
     " and and as dad", " A has 13123123", " A 13123", "123 adn",
     "1312 times I have told you"]

# Four digits followed by a literal space (note the space inside the pattern)
p = re.compile(r'\d{4} ')

for elem in x:
    if re.search(p, elem):
        print "Matched " + elem
    else:
        print "SKIPPED " + elem

print "DONE"
 
Mr.SpOOn

2008/11/21 harijay said:
Hi
I am a few months new into python. I have used regexps before in perl
and java but am a little confused with this problem.

I want to parse a number of strings and extract only those that
contain a 4 digit number anywhere inside a string

However the regexp
p = re.compile(r'\d{4}')

Matches even sentences that have longer than 4 numbers inside
strings ..for example it matches "I have 3324234 and more"

Try with this:

p = re.compile(r'\d{4}$')

The $ character matches the end of the string. It should work.
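
As a quick check of what the `$`-anchored pattern does, here is a minimal sketch in Python 3 syntax (the thread itself runs Python 2.5). The sample strings are copied from the original test list; the `results` name is only for illustration. Since `$` pins the four digits to the end of the string, the pattern only finds digits in final position, and on its own it does not reject longer runs of digits.

```python
import re

p = re.compile(r'\d{4}$')

# Strings taken from the original poster's test list
samples = [" I have 2004 rupees ", " As 3233 ", " A has 13123123",
           " A 13123", "1312 times I have told you"]

# True where the string ends in (at least) four digits
results = {s: bool(p.search(s)) for s in samples}

for s, hit in results.items():
    print(repr(s), "matches" if hit else "does not match")
```

On these samples only the two strings that end in a long number are reported as matches, so the anchor addresses a different question from "contains exactly a 4-digit number anywhere".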
 
John Machin

harijay said:
Hi
I am a few months new into python. I have used regexps before in perl
and java but am a little confused with this problem.

I want to parse a number of strings and extract only those that
contain a 4 digit number anywhere inside a string

However the regexp
p = re.compile(r'\d{4}')

Matches even sentences that have longer than 4 numbers inside
strings ..for example it matches "I have 3324234 and more"

No it doesn't. When used with re.search on that string it matches
3324; it doesn't "match" the whole sentence.

harijay said:
I am very confused. Shouldn't \d{4,} match exactly four-digit
numbers, so that a sentence containing a 5-digit number is not matched?

{4} does NOT mean the same as {4,}.
{4} is the same as {4,4}
{4,} means {4,INFINITY}
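
The difference between the two quantifiers, and the fact that re.search returns only the matched substring, can be seen in a small sketch like the following (Python 3 syntax; the variable names are just for illustration):

```python
import re

s = "I have 3324234 and more"

exactly_four = re.search(r'\d{4}', s)    # {4} is the same as {4,4}
four_or_more = re.search(r'\d{4,}', s)   # {4,} is four or more, greedy

print(exactly_four.group())   # the first four digits of the run
print(four_or_more.group())   # the whole run of digits
```

Neither pattern says anything about what surrounds the digits, which is why both succeed on a 7-digit number.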

Ignoring {4,}:

You need to specify a regex that says "4 digits followed by (non-digit
or end-of-string)". Have a try at that and come back here if you have
any more problems.

some test data:
xxx1234
xxx12345
xxx1234xxx
xxx12345xxx
xxx1234xxx1235xxx
xxx12345xxx1234xxx
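
One possible reading of the hint, run against the test data above, is sketched below in Python 3 syntax. The first pattern implements only the stated condition (4 digits followed by a non-digit or end-of-string). The second adds a mirror-image condition on the left, which is an assumption of this sketch rather than part of the hint. The names `trailing_only` and `both_sides` are illustrative.

```python
import re

tests = ["xxx1234", "xxx12345", "xxx1234xxx", "xxx12345xxx",
         "xxx1234xxx1235xxx", "xxx12345xxx1234xxx"]

# 4 digits followed by (non-digit or end-of-string)
after_only = re.compile(r'\d{4}(?:\D|$)')
# Assumed extension: also require (start-of-string or non-digit) before
before_and_after = re.compile(r'(?:^|\D)\d{4}(?:\D|$)')

trailing_only = [bool(after_only.search(s)) for s in tests]
both_sides = [bool(before_and_after.search(s)) for s in tests]

for s, a, b in zip(tests, trailing_only, both_sides):
    print(s, a, b)
```

With only the trailing condition, the last four digits of a longer run still qualify, so every test line matches; constraining both sides rejects the lines whose only number has 5 digits.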
 
skip

Try this instead:
...     m = pat.search(s)
...     print repr(s),
...     print (m is not None) and "matches" or "does not match"
...
' I have 2004 rupees ' matches
' I have 3324234 and more' does not match
' As 3233 ' matches
'2323423414 is good' does not match
'4444 dc sav 2412441 asdf ' matches
'random1341also and also' matches
'' does not match
'13' does not match
' a 1331 saves' matches
' and and as dad' does not match
' A has 13123123' does not match
'A 13123' does not match
'123 adn' does not match
'1312 times I have told you' matches
 
George Sakkis

harijay said:
Hi
I am a few months new into python. I have used regexps before in perl
and java but am a little confused with this problem.

I want to parse a number of strings and extract only those that
contain a 4 digit number anywhere inside a string

However the regexp
p = re.compile(r'\d{4}')

Matches even sentences that have longer than 4 numbers inside
strings ..for example it matches "I have 3324234 and more"

I am very confused. Shouldnt the \d{4,} match exactly four digit
numbers so a 5 digit number sentence should not be matched .

No, why should it ? What you're saying is "give me 4 consecutive
digits", without specifying what should precede or follow these
digits. A correct expression is a bit more hairy:

p = re.compile(r'''
    (?:\D|\b)   # find a non-digit or word boundary..
    (\d{4})     # .. followed by the 4 digits to be matched as group #1..
    (?:\D|\b)   # .. which are followed by non-digit or word boundary
''', re.VERBOSE)
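
A short sketch of how this expression might be used (Python 3 syntax; the sample strings come from the original test list). Because the surrounding non-digits are consumed by the match, the four digits themselves are read from group 1 rather than from the whole match.

```python
import re

p = re.compile(r'''
    (?:\D|\b)   # a non-digit or a word boundary..
    (\d{4})     # .. then the 4 digits, captured as group 1..
    (?:\D|\b)   # .. then a non-digit or a word boundary
''', re.VERBOSE)

m = p.search("random1341also and also")
print(repr(m.group(0)))   # whole match, including the neighbouring letters
print(repr(m.group(1)))   # just the four digits

no_match = p.search(" A 13123")   # a 5-digit number
print(no_match)
```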


HTH,
George
 
MRAB

George said:
No, why should it ? What you're saying is "give me 4 consecutive
digits", without specifying what should precede or follow these
digits. A correct expression is a bit more hairy:

p = re.compile(r'''
    (?:\D|\b)   # find a non-digit or word boundary..
    (\d{4})     # .. followed by the 4 digits to be matched as group #1..
    (?:\D|\b)   # .. which are followed by non-digit or word boundary
''', re.VERBOSE)

You want to match a sequence of 4 digits: \d{4}
not preceded by a digit: (?<!\d)
not followed by a digit: (?!\d)

which is: re.compile(r'(?<!\d)\d{4}(?!\d)')
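
As a sketch, the lookaround version can be tried on the test data posted earlier in the thread (Python 3 syntax; `found` is an illustrative name). The lookbehind and lookahead are zero-width, so findall returns only the digits.

```python
import re

p = re.compile(r'(?<!\d)\d{4}(?!\d)')

tests = ["xxx1234", "xxx12345", "xxx1234xxx", "xxx12345xxx",
         "xxx1234xxx1235xxx", "xxx12345xxx1234xxx"]

# Every standalone 4-digit run in each test string
found = {s: p.findall(s) for s in tests}

for s in tests:
    print(s, found[s])
```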
 
harijay

Thanks John Machin and Mark Tolonen.
So I guess the correct approach is to use the word-boundary
metacharacter "\b",

so r'\b\d{4}\b' is what I need, since it reads as

a 4-digit number in between word boundaries.
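
For comparison, a small sketch (Python 3 syntax, illustrative names) of how the word-boundary pattern and the digit-lookaround pattern from earlier in the thread behave on three of the original test strings. They agree when the number is set off by spaces, but differ when the digits touch letters, because letters and digits are both word characters and so there is no \b between them.

```python
import re

word_bounded = re.compile(r'\b\d{4}\b')
digit_bounded = re.compile(r'(?<!\d)\d{4}(?!\d)')

samples = ["random1341also and also", " I have 2004 rupees ", " A 13123"]

# (matched by \b version, matched by lookaround version) for each string
compared = {s: (bool(word_bounded.search(s)), bool(digit_bounded.search(s)))
            for s in samples}

for s, (wb, db) in compared.items():
    print(repr(s), wb, db)
```

Which behaviour is wanted depends on whether a number embedded in a word, as in "random1341also", should count as a 4-digit number.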

Thanks a tonne. This is only my second post to comp.lang.python, and I
am always amazed at how helpful everyone on this group is.

Hari
 
