re.sub unexpected behaviour

Discussion in 'Python' started by Javier Collado, Jul 6, 2010.

  1. Hello,

    Let's imagine that we have a simple function that generates a
    replacement for a regular expression:

    def process(match):
        return match.string

    If we use that simple function with re.sub using a simple pattern and
    a string we get the expected output:
    re.sub('123', process, '123')
    '123'

    However, if the string passed to re.sub contains a trailing new line
    character, then we get an extra new line character unexpectedly:
    re.sub(r'123', process, '123\n')
    '123\n\n'

    If we try to get the same result using a replacement string, instead
    of a function, the strange behaviour cannot be reproduced:
    re.sub(r'123', '123', '123')
    '123'

    re.sub('123', '123', '123\n')
    '123\n'

    Is there any explanation for this? If I'm skipping something when
    using a replacement function with re.sub, please let me know.

    Best regards,
    Javier
    Javier Collado, Jul 6, 2010
    #1

  2. On Tue, 06 Jul 2010 19:10:17 +0200, Javier Collado wrote:

    > Hello,
    >
    > Let's imagine that we have a simple function that generates a
    > replacement for a regular expression:
    >
    > def process(match):
    >     return match.string
    >
    > If we use that simple function with re.sub using a simple pattern and a
    > string we get the expected output:
    > re.sub('123', process, '123')
    > '123'
    >
    > However, if the string passed to re.sub contains a trailing new line
    > character, then we get an extra new line character unexpectedly:
    > re.sub(r'123', process, '123\n')
    > '123\n\n'


    I don't know why you say it is unexpected. The regex "123" matched the
    first three characters of "123\n". Those three characters are replaced by
    a copy of the whole string you are searching, "123\n" (which is what
    match.string holds), giving "123\n\n" exactly as expected.

    Perhaps these examples might help:

    >>> re.sub('W', process, 'Hello World')
    'Hello Hello Worldorld'
    >>> re.sub('o', process, 'Hello World')
    'HellHello World WHello Worldrld'


    Here's a simplified pure-Python equivalent of what you are doing:

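    def replace_with_match_string(target, s):
        # Find the first occurrence of target; -1 means no match.
        n = s.find(target)
        if n != -1:
            # Splice a copy of the whole string in place of the match.
            s = s[:n] + s + s[n+len(target):]
        return s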
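
A minimal sketch comparing the pure-Python function above with re.sub. Note that the simplified version replaces only the first occurrence, so the comparison assumes count=1 is passed to re.sub, and it uses re.escape so the target is treated as literal text. Both of those choices are illustrative assumptions, not part of the original posts.

```python
import re

def process(match):
    # Returns the entire searched string, not just the matched text.
    return match.string

def replace_with_match_string(target, s):
    # Find the first occurrence of target; -1 means no match.
    n = s.find(target)
    if n != -1:
        # Splice a copy of the whole string in place of the match.
        s = s[:n] + s + s[n + len(target):]
    return s

text = 'Hello World'
pure = replace_with_match_string('o', text)
# count=1 limits re.sub to the first match, like the simplified function.
via_re = re.sub(re.escape('o'), process, text, count=1)
print(pure)    # HellHello World World
print(via_re)  # HellHello World World
```

Without count=1, re.sub would call process once per match, which is why the 'o' example earlier shows two copies of the string.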



    > If we try to get the same result using a replacement string, instead of
    > a function, the strange behaviour cannot be reproduced:
    > re.sub(r'123', '123', '123')
    > '123'
    >
    > re.sub('123', '123', '123\n')
    > '123\n'


    The regex "123" matches the first three characters of "123\n", which is
    then replaced by "123", giving "123\n", exactly as expected.

    >>> re.sub("o", "123", "Hello World")
    'Hell123 W123rld'




    --
    Steven
    Steven D'Aprano, Jul 6, 2010
    #2

  3. Thanks for your answers. They helped me to realize that I was
    mistakenly using match.string (the whole string) when I should be
    using match.group(0) (the whole match).
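
A short sketch of the difference between the two attributes, using the same pattern and input as the first post. The function names process_whole_string and process_matched_text are made up for illustration.

```python
import re

def process_whole_string(match):
    # match.string is the entire string that was passed to re.sub.
    return match.string

def process_matched_text(match):
    # match.group(0) is only the text matched by the pattern.
    return match.group(0)

with_string = re.sub(r'123', process_whole_string, '123\n')
with_group = re.sub(r'123', process_matched_text, '123\n')
print(repr(with_string))  # '123\n\n'
print(repr(with_group))   # '123\n'
```

With match.group(0) the replacement function reproduces the input unchanged, which matches the result of using the replacement string '123'.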

    Best regards,
    Javier
    Javier Collado, Jul 6, 2010
    #3
