Python 2.7 re.IGNORECASE broken in re.sub?

Discussion in 'Python' started by Christopher, Aug 16, 2010.

  1. Christopher

    Christopher Guest

    I have the following problem:

    Python 2.7 (r27:82525, Jul 4 2010, 07:43:08) [MSC v.1500 64 bit
    (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> t="Python26"
    >>> import re
    >>> re.sub(r"python\d\d", "Python27", t)
    'Python26'
    >>> re.sub(r"python\d\d", "Python27", t, re.IGNORECASE)
    'Python26'
    >>> re.sub(r"Python\d\d", "Python27", t, re.IGNORECASE)
    'Python27'
    >>>


    ---

    Perhaps this is intended behavior, but it seems like the last two
    results should be the same, not the first two. In other words, the
    call to re.sub with re.IGNORECASE on should return "Python27" not
    "Python26".

    This appears to be the case when using compiled pattern matching:

    >>> r=re.compile(r"python\d\d", re.IGNORECASE)
    >>> r.sub("Python27", t)
    'Python27'

    ---

    Is this a known bug? Is it by design for some odd reason?
     
    Christopher, Aug 16, 2010
    #1

  2. On Sun, 15 Aug 2010 16:45:49 -0700, Christopher wrote:

    > I have the following problem:
    >
    > >>> t="Python26"
    > >>> import re
    > >>> re.sub(r"python\d\d", "Python27", t)
    > 'Python26'
    > >>> re.sub(r"python\d\d", "Python27", t, re.IGNORECASE)
    > 'Python26'
    > >>> re.sub(r"Python\d\d", "Python27", t, re.IGNORECASE)
    > 'Python27'
    >
    > Is this a known bug? Is it by design for some odd reason?



    >>> help(re.sub)
    Help on function sub in module re:

    sub(pattern, repl, string, count=0)
    ...


    You're passing re.IGNORECASE (which happens to equal 2) as a count
    argument, not as a flag. Try this instead:

    >>> re.sub(r"python\d\d" + '(?i)', "Python27", t)
    'Python27'
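
The mix-up described above can be sketched as follows. This is a minimal illustration, assuming a current Python 3 interpreter, where it is safest to pass `count` and `flags` by keyword. The variable names `flag_value`, `as_count` and `as_flag` are made up for the example and do not come from the thread.

```python
import re

t = "Python26"

# re.IGNORECASE is an integer-valued flag; its numeric value is 2.
flag_value = int(re.IGNORECASE)

# What the original call effectively did: a case-sensitive substitution
# limited to at most 2 replacements. The lowercase pattern never matches.
as_count = re.sub(r"python\d\d", "Python27", t, count=flag_value)

# What was intended: a case-insensitive substitution.
as_flag = re.sub(r"python\d\d", "Python27", t, flags=re.IGNORECASE)

print(flag_value)  # 2
print(as_count)    # Python26
print(as_flag)     # Python27
```

Naming the argument makes the intent explicit, so a flag can no longer be silently read as a replacement limit.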



    --
    Steven
     
    Steven D'Aprano, Aug 16, 2010
    #2

  3. Alex Willmer

    Alex Willmer Guest

    On Aug 16, 1:07 am, Steven D'Aprano <st...@REMOVE-THIS-
    cybersource.com.au> wrote:
    > You're passing re.IGNORECASE (which happens to equal 2) as a count
    > argument, not as a flag. Try this instead:
    >
    > >>> re.sub(r"python\d\d" + '(?i)', "Python27", t)

    > 'Python27'


    Basically right, but in-line flags must be placed at the start of a
    pattern, or the result is undefined. Also in Python 2.7 re.sub() has a
    flags argument.

    Python 2.7.0+ (release27-maint:83286, Aug 16 2010, 01:25:58)
    [GCC 4.4.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import re
    >>> t = 'Python26'
    >>> re.sub(r'(?i)python\d\d', 'Python27', t)
    'Python27'
    >>> re.sub(r'python\d\d', 'Python27', t, flags=re.IGNORECASE)
    'Python27'
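
Since `count` and `flags` are separate arguments, they can also be combined. The sketch below is only an illustration: the sample string `text` and the names `limited` and `case_sensitive` are hypothetical, chosen to show both arguments having a visible effect.

```python
import re

# Hypothetical sample text with three differently cased occurrences.
text = "python26 PYTHON26 Python26"

# Replace at most two matches, ignoring case; both options passed by keyword.
limited = re.sub(r"python\d\d", "Python27", text, count=2, flags=re.IGNORECASE)

# Without the flag, only the all-lowercase occurrence matches.
case_sensitive = re.sub(r"python\d\d", "Python27", text, count=2)

print(limited)         # Python27 Python27 Python26
print(case_sensitive)  # Python27 PYTHON26 Python26
```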

    Alex
     
    Alex Willmer, Aug 16, 2010
    #3
  4. MRAB

    MRAB Guest

    Alex Willmer wrote:
    > On Aug 16, 1:07 am, Steven D'Aprano <st...@REMOVE-THIS-
    > cybersource.com.au> wrote:
    >> You're passing re.IGNORECASE (which happens to equal 2) as a count
    >> argument, not as a flag. Try this instead:
    >>
    >> >>> re.sub(r"python\d\d" + '(?i)', "Python27", t)
    >> 'Python27'
    >
    > Basically right, but in-line flags must be placed at the start of a
    > pattern, or the result is undefined. Also in Python 2.7 re.sub() has a
    > flags argument.
    >

    [snip]
    In re such flags apply to the entire regex, no matter where they appear.
    This even applies to the (?x) (VERBOSE) flag; if re sees it at the end
    of the regex then it has to re-scan the entire regex!

    For clarity and compatibility with other regex implementations, put it
    at the start of the pattern.
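
A small sketch of why the start of the pattern is the portable place for an inline flag. The thread is about Python 2.7, where a trailing `(?i)` was accepted. To my knowledge, later Python 3 releases first deprecated and then (from 3.11) rejected global inline flags that are not at the start, so the check below reports what the running interpreter does rather than assuming either behaviour. The helper name `trailing_flag_accepted` is hypothetical.

```python
import re
import warnings


def trailing_flag_accepted(pattern):
    """Return True if this interpreter compiles the given pattern."""
    try:
        with warnings.catch_warnings():
            # Some versions only warn about misplaced inline flags.
            warnings.simplefilter("ignore", DeprecationWarning)
            re.compile(pattern)
    except re.error:
        return False
    return True


# A leading inline flag applies to the whole pattern on every version.
leading = re.sub(r"(?i)python\d\d", "Python27", "Python26")
print(leading)  # Python27

# Version dependent: the flag is placed at the end of the pattern.
print(trailing_flag_accepted(r"python\d\d(?i)"))
```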
     
    MRAB, Aug 16, 2010
    #4
  5. On Sun, 15 Aug 2010 17:36:07 -0700, Alex Willmer wrote:

    > On Aug 16, 1:07 am, Steven D'Aprano <st...@REMOVE-THIS-
    > cybersource.com.au> wrote:
    >> You're passing re.IGNORECASE (which happens to equal 2) as a count
    >> argument, not as a flag. Try this instead:
    >>
    >> >>> re.sub(r"python\d\d" + '(?i)', "Python27", t)

    >> 'Python27'

    >
    > Basically right, but in-line flags must be placed at the start of a
    > pattern, or the result is undefined.


    Pardon me, but that's clearly not correct, as proven by the fact that the
    above example works.

    You can say that the flags *should* go at the start, for the sake of
    efficiency, or ease of comprehension, or tradition, or to appease the
    Regex Cops who roam the streets beating up those who don't write regexes
    in the approved fashion. But it isn't true that they *must* go at the
    front.


    --
    Steven
     
    Steven D'Aprano, Aug 16, 2010
    #5
  6. Christopher

    Christopher Guest

    On Aug 15, 8:07 pm, Steven D'Aprano <st...@REMOVE-THIS-
    cybersource.com.au> wrote:
    > On Sun, 15 Aug 2010 16:45:49 -0700, Christopher wrote:
    > > I have the following problem:
    > >
    > > >>> t="Python26"
    > > >>> import re
    > > >>> re.sub(r"python\d\d", "Python27", t)
    > > 'Python26'
    > > >>> re.sub(r"python\d\d", "Python27", t, re.IGNORECASE)
    > > 'Python26'
    > > >>> re.sub(r"Python\d\d", "Python27", t, re.IGNORECASE)
    > > 'Python27'
    > > Is this a known bug?  Is it by design for some odd reason?
    >
    > >>> help(re.sub)
    >
    > Help on function sub in module re:
    >
    >     sub(pattern, repl, string, count=0)
    >     ...
    >
    > You're passing re.IGNORECASE (which happens to equal 2) as a count
    > argument, not as a flag. Try this instead:
    >
    > >>> re.sub(r"python\d\d" + '(?i)', "Python27", t)
    > 'Python27'


    Thanks. Somehow I didn't notice that other argument after looking at
    it a million times. :)
     
    Christopher, Aug 16, 2010
    #6
  7. Alex Willmer

    Alex Willmer Guest

    On Aug 16, 12:23 pm, Steven D'Aprano <st...@REMOVE-THIS-
    cybersource.com.au> wrote:
    > On Sun, 15 Aug 2010 17:36:07 -0700, Alex Willmer wrote:
    > > On Aug 16, 1:07 am, Steven D'Aprano <st...@REMOVE-THIS-
    > > cybersource.com.au> wrote:
    > >> You're passing re.IGNORECASE (which happens to equal 2) as a count
    > >> argument, not as a flag. Try this instead:

    >
    > >> >>> re.sub(r"python\d\d" + '(?i)', "Python27", t)
    > >> 'Python27'

    >
    > > Basically right, but in-line flags must be placed at the start of a
    > > pattern, or the result is undefined.

    >
    > Pardon me, but that's clearly not correct, as proven by the fact that the
    > above example works.


    Undefined includes 'might work sometimes'. I refer you to the Python
    documentation:

    "Note that the (?x) flag changes how the expression is parsed. It
    should be used first in the expression string, or after one or more
    whitespace characters. If there are non-whitespace characters before
    the flag, the results are undefined."
    http://docs.python.org/library/re.html#regular-expression-syntax
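
The placement the documentation recommends for `(?x)` can be sketched like this. It is a minimal example under the assumption that the verbose and ignore-case flags are combined in a single leading group; the names `pattern` and `result` are illustrative.

```python
import re

# (?x) comes first, so whitespace and "#" comments inside the pattern are
# ignored from the very beginning; (?i) is combined in the same group.
pattern = r"""(?xi)
    python   # the literal word, matched case-insensitively
    \d\d     # exactly two digits
"""

result = re.sub(pattern, "Python27", "Python26")
print(result)  # Python27
```

Passing `flags=re.VERBOSE | re.IGNORECASE` instead of the inline group should give the same result and avoids the placement question entirely.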
     
    Alex Willmer, Aug 16, 2010
    #7
  8. Alex Willmer

    Alex Willmer Guest

    On Aug 16, 1:46 pm, Alex Willmer wrote:
    > "Note that the (?x) flag changes how the expression is parsed. It
    > should be used first in the expression string, or after one or more
    > whitespace characters. If there are non-whitespace characters before
    > the flag, the results are undefined."
    > http://docs.python.org/library/re.html#regular-expression-syntax


    Hmm, I found a lot of instances that place (?iLmsux) after non-
    whitespace characters

    http://google.com/codesearch?hl=en&lr=&q=file:\.py[w]%3F$+[^[:space:]%22']%2B\(\%3F[iLmsux]%2B\)

    including two from the Python unit tests, re_test.py lines 109-110.
    Perhaps the documentation is overly cautious.
     
    Alex Willmer, Aug 16, 2010
    #8
  9. On Mon, 16 Aug 2010 05:46:17 -0700, Alex Willmer wrote:

    > On Aug 16, 12:23 pm, Steven D'Aprano <st...@REMOVE-THIS-
    > cybersource.com.au> wrote:
    >> On Sun, 15 Aug 2010 17:36:07 -0700, Alex Willmer wrote:
    >> > On Aug 16, 1:07 am, Steven D'Aprano <st...@REMOVE-THIS-
    >> > cybersource.com.au> wrote:
    >> >> You're passing re.IGNORECASE (which happens to equal 2) as a count
    >> >> argument, not as a flag. Try this instead:

    >>
    >> >> >>> re.sub(r"python\d\d" + '(?i)', "Python27", t)
    >> >> 'Python27'

    >>
    >> > Basically right, but in-line flags must be placed at the start of a
    >> > pattern, or the result is undefined.

    >>
    >> Pardon me, but that's clearly not correct, as proven by the fact that
    >> the above example works.

    >
    > Undefined includes 'might work sometimes'. I refer you to the Python
    > documentation:
    >
    > "Note that the (?x) flag changes how the expression is parsed. It should
    > be used first in the expression string, or after one or more whitespace
    > characters. If there are non-whitespace characters before the flag, the
    > results are undefined."
    > http://docs.python.org/library/re.html#regular-expression-syntax



    Well so it does. I stand corrected.

    I note though that even the docs say "should" rather than "must". I
    wonder whether the documentation author is just being cautious, because
    I've seen comments on the python-dev list that imply that the current
    behaviour of flags (that their effect is global to the regex) is
    supported. E.g.:

    http://code.activestate.com/lists/python-dev/98681/

    At the point that people are seriously considering changing the behaviour
    of a replacement re engine in order to support the current "undefined"
    behaviour, perhaps that behaviour isn't quite so undefined and the docs
    need to be re-written?



    --
    Steven
     
    Steven D'Aprano, Aug 17, 2010
    #9
