Finding # prefixing numbers

peterbe · Jul 19, 2005

In a text that contains references to numbers like this: #583 I want to
find them with a regular expression but I'm having problems with the
hash. Hopefully this code explains where I'm stuck:

>>> import re
>>> re.compile(r'\b(\d\d\d)\b').findall('#123 x (#234) or:#456 #6789')
['123', '234', '456']
>>> re.compile(r'\b(X\d\d\d)\b').findall('X123 x (X234) or:X456 X6789')
['X123', 'X234', 'X456']
>>> re.compile(r'\b(#\d\d\d)\b').findall('#123 x (#234) or:#456 #6789')
[]
>>> re.compile(r'\b(\#\d\d\d)\b').findall('#123 x (#234) or:#456 #6789')
[]

As you can guess, I'm trying to find a hash followed by exactly 3 digits,
delimited by word boundaries. As the example above shows, it wouldn't be
a problem if the prefix were an 'X', but that's not the case here.

Duncan Booth · Jul 19, 2005

From the re documentation:

\b
Matches the empty string, but only at the beginning or end of a word.
A word is defined as a sequence of alphanumeric or underscore
characters, so the end of a word is indicated by whitespace or a
non-alphanumeric, non-underscore character. Note that \b is defined as
the boundary between \w and \W, so the precise set of characters
deemed to be alphanumeric depends on the values of the UNICODE and
LOCALE flags. Inside a character range, \b represents the backspace
character, for compatibility with Python's string literals.

# is not a letter, digit or underscore, so \b# will match only if the # is
directly preceded by one of those, which isn't the case in any of your
examples. Use \B (which is the opposite of \b) instead:

>>> re.compile(r'\B(#\d\d\d)\b').findall('#123 x (#234) or:#456 #6789')
['#123', '#234', '#456']
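
A minimal sketch of why this works, assuming Python's standard re module with default flags. The helper name find_refs is made up for illustration and is not from the thread.

```python
import re

text = '#123 x (#234) or:#456 #6789'

# \b needs a word character on exactly one side of the position.
# '#' is not a word character, so \b directly before '#' only matches
# when a letter, digit or underscore comes immediately before it.
print(re.findall(r'\b#\d\d\d\b', text))     # no match anywhere in text
print(re.findall(r'\b#\d\d\d\b', 'a#123'))  # matches, because of the 'a'

# \B matches wherever \b does not. Before '#', that means the previous
# character is also a non-word character, or '#' starts the string.
def find_refs(s):
    """Hypothetical helper: '#' plus exactly three digits, not glued to a word."""
    return re.findall(r'\B#\d\d\d\b', s)

print(find_refs(text))
```

Note that with \B the pattern deliberately skips a reference glued to a preceding word, such as 'a#123'. The trailing \b is what rejects '#6789', since there is no word boundary between '8' and '9'.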

peterbe · Jul 19, 2005

Thank you! That solved my problem.

Caleb Hattingh · Jul 19, 2005

You really owe it to yourself to try the PyParsing package, if you have to
do this kind of thing with any frequency.

The syntactic difference between PyParsing and regular expressions is
greater than the syntactic difference between Python and C.

thx
Caleb
