Regex similar to "^(?u)\w$", but without digits?

Andreas · Apr 11, 2009

Hello,

I'd like to create a regex that captures any unicode character, but
not the underscore and the digits 0-9. "^(?u)\w$" captures them also.
Is there a possibility to restrict an expression like "\w" to "\w
without [0-9_]"?
I'm using Python 2.5.4.

Thanks in advance,
Andreas

John Machin · Apr 12, 2009

Andreas said:
Hello,

I'd like to create a regex that captures any unicode character, but
not the underscore and the digits 0-9.

[requirement 1]

"^(?u)\w$" captures them also.
Is there a possibility to restrict an expression like "\w" to "\w
without [0-9_]"?

[requirement 2]

The two requirements are not the same.
R1: [^0-9_] matches any character except the underscore and the digits
0-9
R2: To match "like \w except for underscore and digits 0-9", find
"negative lookbehind assertion" in the re docs.

I've omitted the ^, $ and (?u) because the above advice is general.
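
The difference between the two requirements can be checked directly. What follows is only a minimal sketch in Python 3 syntax (the thread targets Python 2.5, where the patterns would be unicode literals). The names r1 and r2 are made up for illustration, the anchors are added back so that exactly one character is tested, and the lookbehind form is just one way to read the advice above.

```python
import re

# R1: any single character except the underscore and the ASCII digits.
# Punctuation and whitespace still match.
r1 = re.compile(r'^[^0-9_]$')

# R2: a \w character, followed by a negative lookbehind that rejects it
# when the character just consumed is an underscore or an ASCII digit.
r2 = re.compile(r'(?u)^\w(?<![0-9_])$')

for ch in ['a', 'ä', '!', '5', '_']:
    print(repr(ch), bool(r1.match(ch)), bool(r2.match(ch)))
```

The '!' row is where the two differ: R1 accepts it, R2 does not.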

HTH,
John

Mark Tolonen · Apr 12, 2009

Andreas said:
Hello,

I'd like to create a regex that captures any unicode character, but
not the underscore and the digits 0-9. "^(?u)\w$" captures them also.
Is there a possibility to restrict an expression like "\w" to "\w
without [0-9_]"?

'(?u)[^\W0-9_]' removes 0-9_ from \w.
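
A quick way to try the pattern, sketched in Python 3 syntax and anchored with ^ and $ as in the original question. The helper name is_word_letter is an assumption introduced for this example only.

```python
import re

# The negated class excludes everything \W matches, plus 0-9 and the
# underscore, which leaves "\w without [0-9_]".
letters = re.compile(r'(?u)^[^\W0-9_]$')


def is_word_letter(ch):
    """Return True if the single character ch matches the pattern."""
    return letters.match(ch) is not None


for ch in ['a', 'Ä', '7', '_', '-', ' ']:
    print(repr(ch), is_word_letter(ch))
```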

-Mark

Andreas Pfrengle · Apr 13, 2009

Mark Tolonen said:
'(?u)[^\W0-9_]' removes 0-9_ from \w.

Hello Mark,

haven't tried it yet, but it looks good!
@John: Sorry for being imprecise, I meant *letters*, not *characters*,
so requirement 2 fits my needs.

Regards,
Andreas

Mark Tolonen · Apr 13, 2009

Andreas Pfrengle said:
haven't tried it yet, but it looks good!
@John: Sorry for being imprecise, I meant *letters*, not *characters*,
so requirement 2 fits my needs.

Note that \w matches alphanumeric Unicode characters, not just letters. If you
only want letters, bear in mind that superscripts (¹²³), fractions (¼½¾), and
other characters also count as numbers in Unicode. See the unicodedata.category
function and
http://www.unicode.org/Public/UNIDATA/UCD.html#General_Category_Values.
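
To see which general category a character falls into, unicodedata.category can be called on a few samples. This is a small sketch in Python 3 syntax; the sample list and the name categories are chosen here purely for illustration.

```python
import unicodedata

samples = ['a', 'é', '5', '²', '½', '_']

# Map each sample character to its two-letter general category.
# Categories starting with 'L' are letters, those starting with 'N' are numbers.
categories = dict((ch, unicodedata.category(ch)) for ch in samples)

for ch in samples:
    print(repr(ch), categories[ch])
```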

If you only want letters as defined by the Unicode standard, something
like this would give you only Unicode letters (it could be optimized to list
ranges of characters):

import unicodedata as ud  # ud.category() is used below
u'(?u)[' + u''.join(unichr(n) for n in xrange(65536) if
ud.category(unichr(n))[0]=='L') + u']'
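
The range optimization mentioned above might look roughly like the following. It is a sketch in Python 3 syntax (chr and range instead of unichr and xrange), not a tuned implementation. The function name letter_class and its limit parameter are assumptions made for this example, and the default limit of 0x10000 mirrors the 65536 used above, so characters outside the Basic Multilingual Plane are not covered.

```python
import re
import unicodedata


def letter_class(limit=0x10000):
    """Build a regex character class for all code points below `limit`
    whose Unicode general category starts with 'L'."""
    ranges = []  # list of [first, last] code point pairs
    for cp in range(limit):
        if unicodedata.category(chr(cp)).startswith('L'):
            if ranges and ranges[-1][1] == cp - 1:
                ranges[-1][1] = cp        # extend the current run
            else:
                ranges.append([cp, cp])   # start a new run
    parts = []
    for first, last in ranges:
        # Letters are never special inside a character class,
        # so no escaping is needed here.
        if first == last:
            parts.append(chr(first))
        else:
            parts.append('%s-%s' % (chr(first), chr(last)))
    return '[' + ''.join(parts) + ']'


# Anchored so that exactly one letter is matched.
LETTER = re.compile('^' + letter_class() + '$')

for ch in ['a', 'é', 'Ж', '5', '_', '½', '²']:
    print(repr(ch), bool(LETTER.match(ch)))
```

Collapsing consecutive code points into ranges keeps the pattern to a few hundred entries instead of tens of thousands of individual characters.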

Hmm, maybe Python 3.0 with its default Unicode strings needs a regex
extension to specify the Unicode category to match.

-Mark
