substitution

Discussion in 'Python' started by Adi Eyal, Jan 18, 2010.

  1. Adi Eyal

    Adi Eyal Guest

    > From: superpollo <>
    > To:
    > Date: Mon, 18 Jan 2010 11:15:37 +0100
    > Subject: substitution
    > hi.
    >
    > what is the most pythonic way to substitute substrings?
    >
    > eg: i want to apply:
    >
    > foo --> bar
    > baz --> quux
    > quuux --> foo
    >
    > so that:
    >
    > fooxxxbazyyyquuux --> barxxxquuxyyyfoo
    >
    > bye



    Using regular expressions the answer is short (and sweet)

    mapping = {
    "foo" : "bar",
    "baz" : "quux",
    "quuux" : "foo"
    }

    pattern = "(%s)" % "|".join(mapping.keys())
    repl = lambda x : mapping.get(x.group(1), x.group(1))
    s = "fooxxxbazyyyquuux"
    re.subn(pattern, repl, s)
    Adi Eyal, Jan 18, 2010
    #1
    1. Advertising

  2. Adi Eyal

    Iain King Guest

    On Jan 18, 2:17 pm, Adi Eyal <> wrote:
    > > From: superpollo <>
    > > To:
    > > Date: Mon, 18 Jan 2010 11:15:37 +0100
    > > Subject: substitution
    > > hi.

    >
    > > what is the most pythonic way to substitute substrings?

    >
    > > eg: i want to apply:

    >
    > > foo --> bar
    > > baz --> quux
    > > quuux --> foo

    >
    > > so that:

    >
    > > fooxxxbazyyyquuux --> barxxxquuxyyyfoo

    >
    > > bye

    >
    > Using regular expressions the answer is short (and sweet)
    >
    > mapping = {
    >         "foo" : "bar",
    >         "baz" : "quux",
    >         "quuux" : "foo"
    >
    > }
    >
    > pattern = "(%s)" % "|".join(mapping.keys())
    > repl = lambda x : mapping.get(x.group(1), x.group(1))
    > s = "fooxxxbazyyyquuux"
    > re.subn(pattern, repl, s)


    Winner! :)

    Iain
    Iain King, Jan 18, 2010
    #2
    1. Advertising

  3. On Mon, 18 Jan 2010 06:23:44 -0800, Iain King wrote:

    > On Jan 18, 2:17 pm, Adi Eyal <> wrote:

    [...]
    >> Using regular expressions the answer is short (and sweet)
    >>
    >> mapping = {
    >>         "foo" : "bar",
    >>         "baz" : "quux",
    >>         "quuux" : "foo"
    >>
    >> }
    >>
    >> pattern = "(%s)" % "|".join(mapping.keys())
    >> repl = lambda x : mapping.get(x.group(1), x.group(1))
    >> s = "fooxxxbazyyyquuux"
    >> re.subn(pattern, repl, s)

    >
    > Winner! :)


    What are the rules for being declared "Winner"? For the simple case
    given, calling s.replace three times is much faster: more than twice as
    fast.

    But a bigger problem is that the above "winner" may not work correctly if
    there are conflicts between the target strings (e.g. 'a'->'X',
    'aa'->'Y'). The problem is that the result you get depends on the order
    of the searches, BUT as given, that order is non-deterministic.
    dict.keys() returns in an arbitrary order, which means the caller can't
    specify the order except by accident. For example:

    >>> repl = lambda x : m[x.group(1)]
    >>> m = {'aa': 'Y', 'a': 'X'}
    >>> pattern = "(%s)" % "|".join(m.keys())
    >>> subn(pattern, repl, 'aaa') # expecting 'YX'

    ('XXX', 3)

    The result that you get using this method will be consistent but
    arbitrary and unpredictable.




    For those who care, here's my timing code:

    from timeit import Timer

    setup = """
    mapping = {"foo" : "bar", "baz" : "quux", "quuux" : "foo"}
    pattern = "(%s)" % "|".join(mapping.keys())
    repl = lambda x : mapping.get(x.group(1), x.group(1))
    repl = lambda x : mapping[x.group(1)]
    s = "fooxxxbazyyyquuux"
    from re import subn
    """

    t1 = Timer("subn(pattern, repl, s)", setup)
    t2 = Timer(
    "s.replace('foo', 'bar').replace('baz', 'quux').replace('quuux', 'foo')",
    "s = 'fooxxxbazyyyquuux'")


    And the results on my PC:

    >>> min(t1.repeat(number=100000))

    1.1273870468139648
    >>> min(t2.repeat(number=100000))

    0.49491715431213379



    --
    Steven
    Steven D'Aprano, Jan 18, 2010
    #3
  4. Adi Eyal

    Iain King Guest

    On Jan 18, 4:26 pm, Steven D'Aprano <st...@REMOVE-THIS-
    cybersource.com.au> wrote:
    > On Mon, 18 Jan 2010 06:23:44 -0800, Iain King wrote:
    > > On Jan 18, 2:17 pm, Adi Eyal <> wrote:

    > [...]
    > >> Using regular expressions the answer is short (and sweet)

    >
    > >> mapping = {
    > >>         "foo" : "bar",
    > >>         "baz" : "quux",
    > >>         "quuux" : "foo"

    >
    > >> }

    >
    > >> pattern = "(%s)" % "|".join(mapping.keys())
    > >> repl = lambda x : mapping.get(x.group(1), x.group(1))
    > >> s = "fooxxxbazyyyquuux"
    > >> re.subn(pattern, repl, s)

    >
    > > Winner! :)

    >
    > What are the rules for being declared "Winner"? For the simple case
    > given, calling s.replace three times is much faster: more than twice as
    > fast.
    >
    > But a bigger problem is that the above "winner" may not work correctly if
    > there are conflicts between the target strings (e.g. 'a'->'X',
    > 'aa'->'Y'). The problem is that the result you get depends on the order
    > of the searches, BUT as given, that order is non-deterministic.
    > dict.keys() returns in an arbitrary order, which means the caller can't
    > specify the order except by accident. For example:
    >
    > >>> repl = lambda x : m[x.group(1)]
    > >>> m = {'aa': 'Y', 'a': 'X'}
    > >>> pattern = "(%s)" % "|".join(m.keys())
    > >>> subn(pattern, repl, 'aaa')  # expecting 'YX'

    >
    > ('XXX', 3)
    >
    > The result that you get using this method will be consistent but
    > arbitrary and unpredictable.
    >
    > For those who care, here's my timing code:
    >
    > from timeit import Timer
    >
    > setup = """
    > mapping = {"foo" : "bar", "baz" : "quux", "quuux" : "foo"}
    > pattern = "(%s)" % "|".join(mapping.keys())
    > repl = lambda x : mapping.get(x.group(1), x.group(1))
    > repl = lambda x : mapping[x.group(1)]
    > s = "fooxxxbazyyyquuux"
    > from re import subn
    > """
    >
    > t1 = Timer("subn(pattern, repl, s)", setup)
    > t2 = Timer(
    > "s.replace('foo', 'bar').replace('baz', 'quux').replace('quuux', 'foo')",
    > "s = 'fooxxxbazyyyquuux'")
    >
    > And the results on my PC:
    >
    > >>> min(t1.repeat(number=100000))

    > 1.1273870468139648
    > >>> min(t2.repeat(number=100000))

    >
    > 0.49491715431213379
    >
    > --
    > Steven


    Adi elicited that response from me because his solution was vastly
    more succinct than everything else that had appeared up til that point
    while still meeting the OP's requirements. The OP never cared about
    overlap between 2 'find' strings, just between the 'find' and
    'replace' strings (though I did take it into account in my second post
    for the sake of completeness). His code could have been a little
    cleaner, I'd have trimmed it to:

    mapping = {"foo": "bar", "baz": "quux", "quuux": "foo"}
    pattern = "(%s)" % "|".join(mapping)
    repl = lambda x : mapping[x.group(1)]
    s = "fooxxxbazyyyquuux"
    re.subn(pattern, repl, s)

    but apart from that was very pythonic: explicit, succinct, and all the
    heavy work is done by the module (i.e. in compiled c code in the
    majority case of CPython). It can be 'upgraded' to cover the find-
    find overlap if you really want (I *believe* regexps will match the
    leftmost option in a group first):

    subs = [("foo", "bar"), ("baz", "quux"), ("quuux", "foo")]
    pattern = "(%s)" % "|".join((x[0] for x in subs))
    mapping = dict(subs)
    repl = lambda x : mapping[x.group(1)]
    s = "fooxxxbazyyyquuux"
    re.subn(pattern, repl, s)

    Anyway, there's no prize for winning, but by all means: if you think
    someone else's code and not a variation on this should win for most
    pythonic, then make your nomination :)

    Iain
    Iain King, Jan 18, 2010
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. valentin tihomirov

    Should this substitution be compilable?

    valentin tihomirov, Nov 28, 2004, in forum: VHDL
    Replies:
    12
    Views:
    773
    valentin tihomirov
    Nov 30, 2004
  2. Troll
    Replies:
    6
    Views:
    2,429
    Kris Wempa
    Sep 26, 2003
  3. Justin

    adobe multiline substitution

    Justin, Dec 8, 2003, in forum: Perl
    Replies:
    0
    Views:
    488
    Justin
    Dec 8, 2003
  4. Ashok

    Substitution Problem

    Ashok, Jul 18, 2004, in forum: Perl
    Replies:
    1
    Views:
    645
    Gunnar Hjalmarsson
    Jul 18, 2004
  5. Ian
    Replies:
    4
    Views:
    2,296
    Ben Bacarisse
    Feb 2, 2006
Loading...

Share This Page