Re: Raw string substitution problem

Discussion in 'Python' started by Ed Keith, Dec 16, 2009.

  1. Ed Keith

    Ed Keith Guest

    --- On Wed, 12/16/09, Gabriel Genellina <> wrote:

    > From: Gabriel Genellina <>
    > Subject: Re: Raw string substitution problem
    > To:
    > Date: Wednesday, December 16, 2009, 9:35 AM
    > On Wed, 16 Dec 2009 11:09:32 -0300, Ed Keith <> wrote:
    >
    > > I am having a problem when substituting a raw string.
    > > When I do the following:
    > >
    > > re.sub('abc', r'a\nb\nc', '123abcdefg')
    > >
    > > I get
    > >
    > > """
    > > 123a
    > > b
    > > cdefg
    > > """
    > >
    > > what I want is
    > >
    > > r'123a\nb\ncdefg'
    >
    > From http://docs.python.org/library/re.html#re.sub
    >
    >     re.sub(pattern, repl, string[, count])
    >
    >     ...repl can be a string or a function; if it is a string, any
    >     backslash escapes in it are processed. That is, \n is converted
    >     to a single newline character, \r is converted to a carriage
    >     return, and so forth.
    >
    > So you'll have to double your backslashes:
    >
    > py> re.sub('abc', r'a\\nb\\nc', '123abcdefg')
    > '123a\\nb\\ncdefg'
    >
    > --Gabriel Genellina
    >
    > --http://mail.python.org/mailman/listinfo/python-list
    >


    That is going to be a nontrivial exercise. I have control over the pattern, but the texts to be substituted and substituted into will be read from user-supplied files. I need to reproduce the exact text that is read from the file.

    Maybe what I should do is use re to break the string into two pieces, the part before the pattern to be replaced and the part after it, then splice the replacement text in between them. Seems like doing it the hard way, but it should work.
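    A minimal sketch of that split-and-splice idea, written for Python 3. The helper name splice_literal is hypothetical, and unlike re.sub this version replaces only the first match:

```python
import re

def splice_literal(pattern, replacement, text):
    """Replace the first match of pattern with replacement, taken literally."""
    match = re.search(pattern, text)
    if match is None:
        return text  # no match, nothing to replace
    # Slice around the match and insert the replacement untouched,
    # so no backslash processing is ever applied to it.
    return text[:match.start()] + replacement + text[match.end():]

result = splice_literal('abc', r'a\nb\nc', '123abcdefg')
print(result)
```

    Because the replacement never passes through re.sub, whatever was read from the file is inserted character for character.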

    Thanks,

    -EdK
    Ed Keith, Dec 16, 2009
    #1

  2. Peter Otten

    Peter Otten Guest

    Ed Keith wrote:

    > That is going to be a nontrivial exercise. I have control over the
    > pattern, but the texts to be substituted and substituted into will be read
    > from user supplied files. I need to reproduce the exact text that is read
    > from the file.


    There is a helper function re.escape() that you can use to sanitize the
    substitution:

    >>> print re.sub('abc', re.escape(r'a\nb\nc'), '123abcdefg')
    123a\nb\ncdefg

    Peter
    Peter Otten, Dec 16, 2009
    #2

  3. Gabriel Genellina

    On Wed, 16 Dec 2009 14:51:08 -0300, Peter Otten <> wrote:

    > There is a helper function re.escape() that you can use to sanitize the
    > substitution:
    >
    > >>> print re.sub('abc', re.escape(r'a\nb\nc'), '123abcdefg')
    > 123a\nb\ncdefg


    Unfortunately re.escape does much more than that:

    py> print re.sub('abc', re.escape(r'a.b.c'), '123abcdefg')
    123a\.b\.cdefg

    I think the string_escape encoding is what the OP needs:

    py> print re.sub('abc', r'a\n(b.c)\nd'.encode("string_escape"),
    '123abcdefg')
    123a\n(b.c)\nddefg
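    The string_escape codec exists only in Python 2. As a rough Python 3 equivalent, one option is to double every backslash in the replacement before handing it to re.sub, so the template processing turns each pair back into a single backslash. The helper name escape_repl below is hypothetical:

```python
import re

def escape_repl(replacement):
    """Double every backslash so re.sub inserts the text literally."""
    return replacement.replace('\\', '\\\\')

user_text = r'a\n(b.c)\nd'  # stands in for text read from a user file
result = re.sub('abc', escape_repl(user_text), '123abcdefg')
print(result)
```

    Only backslashes are touched, so characters such as '.' and '(' come through unchanged, and text that looks like a group reference (for example \1) is also inserted literally.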

    --
    Gabriel Genellina
    Gabriel Genellina, Dec 16, 2009
    #3
  4. Peter Otten

    Peter Otten Guest

    Gabriel Genellina wrote:

    > Unfortunately re.escape does much more than that:
    >
    > py> print re.sub('abc', re.escape(r'a.b.c'), '123abcdefg')
    > 123a\.b\.cdefg


    Sorry, I didn't think of that.

    > I think the string_escape encoding is what the OP needs:
    >
    > py> print re.sub('abc', r'a\n(b.c)\nd'.encode("string_escape"),
    > '123abcdefg')
    > 123a\n(b.c)\nddefg


    Another possibility:

    >>> print re.sub('abc', lambda m: r'a\nb\n.c\a', '123abcdefg')
    123a\nb\n.c\adefg
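    This works because re.sub applies backslash processing only when repl is a string; the return value of a callable is inserted as-is. A small Python 3 sketch wrapping the idea for replacement text read at run time (the name sub_literal is hypothetical):

```python
import re

def sub_literal(pattern, replacement, text, count=0):
    """Like re.sub, but the replacement is inserted without escape processing."""
    # The lambda ignores the match object and returns the text unchanged.
    return re.sub(pattern, lambda match: replacement, text, count=count)

result = sub_literal('abc', r'a\nb\n.c\a', '123abcdefg')
print(result)
```

    Unlike a split-and-splice approach, this still replaces every match and honors the count argument.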

    Peter
    Peter Otten, Dec 16, 2009
    #4
