Finding # prefixing numbers

peterbe

In a text that contains references to numbers like this: #583 I want to
find them with a regular expression but I'm having problems with the
hash. Hopefully this code explains where I'm stuck:
>>> import re
>>> re.compile(r'\b(\d\d\d)\b').findall('#123 x (#234) or:#456 #6789')
['123', '234', '456']
>>> re.compile(r'\b(X\d\d\d)\b').findall('X123 x (X234) or:X456 X6789')
['X123', 'X234', 'X456']
>>> re.compile(r'\b(#\d\d\d)\b').findall('#123 x (#234) or:#456 #6789')
[]
>>> re.compile(r'\b(\#\d\d\d)\b').findall('#123 x (#234) or:#456 #6789')
[]

As you can guess, I'm trying to find a hash followed by exactly 3 digits,
delimited by word boundaries. As the example above shows, it wouldn't be
a problem if the prefix were an 'X', but that's not the case here.
 
Duncan Booth

From the re documentation:
\b
Matches the empty string, but only at the beginning or end of a word.
A word is defined as a sequence of alphanumeric or underscore
characters, so the end of a word is indicated by whitespace or a
non-alphanumeric, non-underscore character. Note that \b is defined as
the boundary between \w and \W, so the precise set of characters
deemed to be alphanumeric depends on the values of the UNICODE and
LOCALE flags. Inside a character range, \b represents the backspace
character, for compatibility with Python's string literals.

# is not a word character (letter, digit or underscore), so \b# will match
only if the # is directly preceded by a word character, which isn't the
case in any of your examples. Use \B (which is the opposite of \b) instead:
['#123', '#234', '#456']
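
The call that produced the result above did not survive in this copy of the thread. The following is a minimal sketch of one pattern that gives the same output, assuming \B is simply swapped in for the leading \b in the original poster's regex. The variable names and the 'a#123' test string are illustrative, not from the thread, and the lookbehind variant is an alternative spelling rather than anything Duncan suggested.

```python
import re

text = '#123 x (#234) or:#456 #6789'

# \b only matches between a word character and a non-word character.
# In this text every '#' is preceded by a non-word character (or the
# start of the string), so \b can never match right before it.
with_b = re.findall(r'\b(#\d\d\d)\b', text)

# \B matches wherever \b does not, i.e. where both sides are non-word
# (or both are word characters), so it succeeds right before each '#'.
# The trailing \b still rejects '#6789', because '8' is followed by '9'.
with_B = re.findall(r'\B(#\d\d\d)\b', text)

# Alternative spelling (an assumption, not from the thread): a negative
# lookbehind that says "not preceded by a word character".
with_lookbehind = re.findall(r'(?<!\w)(#\d\d\d)\b', text)

# The leading \b version does match when '#' follows a word character.
glued = re.findall(r'\b(#\d\d\d)\b', 'a#123')

print(with_b)           # []
print(with_B)           # ['#123', '#234', '#456']
print(with_lookbehind)  # ['#123', '#234', '#456']
print(glued)            # ['#123']
```

Note that \B and the lookbehind both skip a reference glued to a preceding word, such as 'a#123'. If those should be found too, dropping the leading assertion altogether is the simpler choice.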
 
Caleb Hattingh

You really owe it to yourself to try the PyParsing package, if you have to
do this kind of thing with any frequency.

The syntactic difference between PyParsing and regular expressions is
greater than the syntactic difference between Python and C.

thx
Caleb
