Regex for Unicode letter characters

Discussion in 'Python' started by schickb, Jan 11, 2009.

  1. schickb

    schickb Guest

    I need a regex that will match strings containing only Unicode letter
    characters (not including digits or the _ character). I was surprised
    to find that the 're' module does not already include a special character
    class for this (Python 2.6). Or did I miss something?

    It seems like this would be a very common need. Is the following the
    only option to generate the character class (based on an old post by
    Martin v. Löwis )?

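    import unicodedata, sys

    def letters():
        # Build a regex character class covering every code point whose
        # Unicode general category starts with 'L' (Lu, Ll, Lt, Lm, Lo).
        start = end = None
        result = []
        for index in xrange(sys.maxunicode + 1):
            c = unichr(index)
            if unicodedata.category(c)[0] == 'L':
                if start is None:
                    start = end = c
                else:
                    end = c
            elif start:
                # A run of letters just ended: emit one character or a range.
                if start == end:
                    result.append(start)
                else:
                    result.append(start + "-" + end)
                start = None
        return u'[' + u''.join(result) + u']'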

    Seems rather cumbersome.

    -Brad
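
For readers on a current interpreter, the same range-building idea might look like the sketch below. It assumes Python 3 (so `range` and `chr` replace `xrange` and `unichr`); the names `letter_class`, `LETTER_CLASS` and `LETTERS_ONLY` are illustrative and do not come from the thread or from the `re` module.

```python
import re
import sys
import unicodedata


def letter_class():
    """Return a regex character class for all code points in category L*."""
    parts = []
    start = end = None
    for index in range(sys.maxunicode + 1):
        if unicodedata.category(chr(index)).startswith("L"):
            if start is None:
                start = index
            end = index
        elif start is not None:
            # A run of letters just ended: emit one character or a range.
            parts.append(chr(start) if start == end
                         else chr(start) + "-" + chr(end))
            start = None
    if start is not None:
        # Flush a run that reaches the last code point (defensive only).
        parts.append(chr(start) if start == end
                     else chr(start) + "-" + chr(end))
    return "[" + "".join(parts) + "]"


# Building the class scans every code point, so do it once and reuse it.
LETTER_CLASS = letter_class()
LETTERS_ONLY = re.compile(LETTER_CLASS + "+")

print(bool(LETTERS_ONLY.fullmatch("Löwis")))
print(bool(LETTERS_ONLY.fullmatch("schickb_1")))
```

Building the class once at import time keeps the cost of the full code-point scan out of each match.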
     
    schickb, Jan 11, 2009
    #1

  2. MRAB

    MRAB Guest

    schickb wrote:
    > I need a regex that will match strings containing only unicode letter
    > characters (not including numeric or the _ character). I was surprised
    > to find the 're' module does not include a special character class for
    > this already (python 2.6). Or did I miss something?
    >
    > It seems like this would be a very common need. Is the following the
    > only option to generate the character class (based on an old post by
    > Martin v. Löwis )?
    >

    [snip]
    Basically, yes.

    The re module was last worked on in 2003 (remember it's all voluntary!).
    Such omissions should be addressed in Python 2.7.
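
If the goal is only to validate whole strings, one possible alternative is to skip the regex and test each character's general category directly. This is a minimal sketch, assuming Python 3; `is_all_letters` is a hypothetical helper name, not something provided by `re` or `unicodedata`.

```python
import unicodedata


def is_all_letters(text):
    """True if text is non-empty and every character is in category L*."""
    return bool(text) and all(
        unicodedata.category(ch).startswith("L") for ch in text
    )


print(is_all_letters("Löwis"))
print(is_all_letters("schickb_1"))
```

This avoids building a large character class, at the price of not being usable inside a larger pattern.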
     
    MRAB, Jan 11, 2009
    #2

  3. Steve Holden

    Steve Holden Guest

    MRAB wrote:
    > schickb wrote:
    >> I need a regex that will match strings containing only unicode letter
    >> characters (not including numeric or the _ character). I was surprised
    >> to find the 're' module does not include a special character class for
    >> this already (python 2.6). Or did I miss something?
    >>
    >> It seems like this would be a very common need. Is the following the
    >> only option to generate the character class (based on an old post by
    >> Martin v. Löwis )?
    >>

    > [snip]
    > Basically, yes.
    >
    > The re module was last worked on in 2003 (remember it's all voluntary!).
    > Such omissions should be addressed in Python 2.7.


    By "should be" do you mean "ought to be (but I have no intention of
    helping)", "are expected to be (but someone else will be doing the
    work)", "it's on my list and I am expecting to get finished in time for
    2.7 integration" or something else?

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
     
    Steve Holden, Jan 11, 2009
    #3
  4. MRAB

    MRAB Guest

    Steve Holden wrote:
    > MRAB wrote:
    >> schickb wrote:
    >>> I need a regex that will match strings containing only unicode letter
    >>> characters (not including numeric or the _ character). I was surprised
    >>> to find the 're' module does not include a special character class for
    >>> this already (python 2.6). Or did I miss something?
    >>>
    >>> It seems like this would be a very common need. Is the following the
    >>> only option to generate the character class (based on an old post by
    >>> Martin v. Löwis )?
    >>>

    >> [snip]
    >> Basically, yes.
    >>
    >> The re module was last worked on in 2003 (remember it's all voluntary!).
    >> Such omissions should be addressed in Python 2.7.

    >
    > By "should be" do you mean "ought to be (but I have no intention of
    > helping)", "are expected to be (but someone else will be doing the
    > work)", "it's on my list and I am expecting to get finished in time for
    > 2.7 integration" or something else?
    >

    The third one.
     
    MRAB, Jan 11, 2009
    #4
  5. Steve Holden

    Steve Holden Guest

    MRAB wrote:
    > Steve Holden wrote:
    >> MRAB wrote:
    >>> schickb wrote:
    >>>> I need a regex that will match strings containing only unicode letter
    >>>> characters (not including numeric or the _ character). I was surprised
    >>>> to find the 're' module does not include a special character class for
    >>>> this already (python 2.6). Or did I miss something?
    >>>>
    >>>> It seems like this would be a very common need. Is the following the
    >>>> only option to generate the character class (based on an old post by
    >>>> Martin v. Löwis )?
    >>>>
    >>> [snip]
    >>> Basically, yes.
    >>>
    >>> The re module was last worked on in 2003 (remember it's all voluntary!).
    >>> Such omissions should be addressed in Python 2.7.

    >>
    >> By "should be" do you mean "ought to be (but I have no intention of
    >> helping)", "are expected to be (but someone else will be doing the
    >> work)", "it's on my list and I am expecting to get finished in time for
    >> 2.7 integration" or something else?
    >>

    > The third one.


    Well, that's good news. Let me know if you need help.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC http://www.holdenweb.com/
     
    Steve Holden, Jan 11, 2009
    #5
