Python regex: misbehaviour with "\r" (0x0D) as newline character in Unicode mode

Discussion in 'Python' started by Arian Sanusi, Jan 27, 2008.

    Hi,

    According to Unicode, "\n", "\r" and "\r\n" (0x000A, 0x000D and
    0x000D+0x000A) should all be treated as newline characters.
    At least this is how I understand it:
    (http://en.wikipedia.org/wiki/Newline#Unicode)

    Apparently the re module does not care and, on Unix, treats only \n
    as a newline character:

    >>> import re
    >>> a = re.compile(u"^a", re.U | re.M)
    >>> a.search(u"bc\ra")
    >>> a.search(u"bc\na")
    <_sre.SRE_Match object at 0xb5908fa8>

    Same thing for $:
    >>> b = re.compile(u"c$", re.U | re.M)
    >>> b.search(u"bc\r\n")
    >>> b.search(u"abc")
    <_sre.SRE_Match object at 0xb5908f70>
    >>> b.search(u"bc\nde")
    <_sre.SRE_Match object at 0xb5908fa8>
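One possible workaround, sketched here rather than taken from the re module itself, is to spell the line anchors out with lookarounds instead of relying on re.M. The names LINE_START and LINE_END are hypothetical helpers, and the sketch assumes that only "\n", "\r" and "\r\n" need to count as line breaks. It is written for Python 3.

```python
import re

# Hypothetical helper patterns, not part of the re module.
# LINE_START: start of string, right after "\n", or right after a "\r"
# that is not the first half of a "\r\n" pair.
LINE_START = r"(?:\A|(?<=\n)|(?<=\r)(?!\n))"
# LINE_END: right before "\r", right before a "\n" that is not the second
# half of a "\r\n" pair, or at the very end of the string.
LINE_END = r"(?:(?=\r)|(?<!\r)(?=\n)|\Z)"

start_a = re.compile(LINE_START + "a")
end_c = re.compile("c" + LINE_END)

# The two cases from the post that return None with plain ^ and $:
print(start_a.search("bc\ra"))   # a match object instead of None
print(end_c.search("bc\r\n"))    # a match object instead of None
```

The lookbehinds are all fixed-width, which the re module requires. Unlike $ in re.M mode, this LINE_END never matches between the "\r" and the "\n" of a "\r\n" pair.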

    Is this a known bug in the re module? I couldn't find any issues in the
    bug tracker.
    Or is this just a user error that you guys can help me with?

    arian

    P.S.: This appears in both Python 2.4 and 2.5.
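Another way to sidestep the question, again only a sketch, is to let str.splitlines() do the line splitting and run a non-multiline pattern on each line. In Python 3, splitlines() breaks on "\r", "\n" and "\r\n", and also on further boundaries such as "\x0b", "\x0c", "\x85", "\u2028" and "\u2029". The function name search_lines is made up for this illustration.

```python
import re

def search_lines(pattern, text):
    """Yield (line_number, match) for every line of text that matches.

    Hypothetical helper: lines are split with str.splitlines(), so the
    pattern is applied without re.M and ^/$ refer to one line at a time.
    """
    compiled = re.compile(pattern)
    for number, line in enumerate(text.splitlines(), start=1):
        match = compiled.search(line)
        if match:
            yield number, match

# The "\r"-separated example from the post: the second line starts with "a".
for number, match in search_lines(r"^a", "bc\ra"):
    print(number, match.group())
```

The trade-off is that match positions are relative to each line rather than to the whole string, and a pattern can no longer span several lines.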
     
    Arian Sanusi, Jan 27, 2008