regular expression unicode character class trouble

  • Thread starter Diez B. Roggisch

Diez B. Roggisch

Hi,

In a Unicode environment, I need the character class

set("\w") - set("[0-9]")

that is, alpha without numerals. Any ideas on how to create that? And what
performance implications should I expect? I assume character classes
aren't implemented as sets but as a comparison function that checks a
value against certain well-defined ranges.

Regards,

Diez
 

Steven Bethard

Diez said:
Hi,

In a Unicode environment, I need the character class

set("\w") - set("[0-9]")

that is, alpha without numerals. Any ideas on how to create that?

I'd use something like r"[^_\d\W]", that is, any character that is not an
underscore, a digit, or a non-word character. In action:

py> import re
py> re.findall(r'[^_\d\W]+', '42badger100x__xxA1BC')
['badger', 'x', 'xxA', 'BC']
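
For the Unicode side of the question, here is a minimal sketch of the same pattern applied to non-ASCII text. It assumes Python 3, where str patterns are Unicode-aware by default (on Python 2 the pattern would need the re.UNICODE flag). The names LETTERS and letter_runs are made up for illustration and are not from the thread.

```python
import re

# Assumption: Python 3, so \w, \d and \W follow Unicode rules for str patterns.
# The class reads as "word characters, minus digits, minus the underscore".
LETTERS = re.compile(r'[^_\d\W]+')

def letter_runs(text):
    """Return maximal runs of word characters that are neither digits nor underscores."""
    return LETTERS.findall(text)

print(letter_runs('42badger100x__xxA1BC'))  # ['badger', 'x', 'xxA', 'BC']
print(letter_runs('über42straße_año'))      # ['über', 'straße', 'año']
```

The pattern is compiled once at module level and reused, which is probably all the performance tuning a class like this needs.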

HTH,

STeVe
 

Diez B. Roggisch

Steven said:
I'd use something like r"[^_\d\W]", that is, any character that is not an
underscore, a digit, or a non-word character. In action:

py> import re
py> re.findall(r'[^_\d\W]+', '42badger100x__xxA1BC')
['badger', 'x', 'xxA', 'BC']

HTH,

Seems so, great!

Diez
 
