Re: substitution

Discussion in 'Python' started by Iain King, Jan 18, 2010.

  1. Iain King

    Iain King Guest

    On Jan 18, 10:21 am, superpollo <> wrote:
    > superpollo wrote:
    >
    > > hi.
    > >
    > > what is the most pythonic way to substitute substrings?
    > >
    > > eg: i want to apply:
    > >
    > > foo --> bar
    > > baz --> quux
    > > quuux --> foo
    > >
    > > so that:
    > >
    > > fooxxxbazyyyquuux --> barxxxquuxyyyfoo
    > >
    > > bye
    >
    > i explain better:
    >
    > say the subs are:
    >
    > quuux --> foo
    > foo --> bar
    > baz --> quux
    >
    > then i cannot apply the subs in sequence (say, .replace() in a loop),
    > otherwise:
    >
    > fooxxxbazyyyquuux --> fooxxxbazyyyfoo --> barxxxbazyyybar -->
    > barxxxquuxyyybar
    >
    > not as intended...
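
The problem described in the quoted text can be reproduced with a few lines. This is only an illustrative sketch; the name sequential_replace is made up for the example and does not come from the thread.

```python
def sequential_replace(text, subs):
    # Naive approach: apply each (old, new) pair in turn with str.replace().
    # Text produced by an earlier substitution can be rewritten by a later one.
    for old, new in subs:
        text = text.replace(old, new)
    return text


subs = [("quuux", "foo"), ("foo", "bar"), ("baz", "quux")]
# The "foo" produced from "quuux" is later turned into "bar".
print(sequential_replace("fooxxxbazyyyquuux", subs))
```

The trailing "foo" that came from "quuux" gets rewritten by the second pair, which is exactly the unintended chain shown above.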



    Not sure if it's the most pythonic, but I'd probably do it like this:

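    def token_replace(string, subs):
        subs = dict(subs)
        # Map each search string to an integer token, and each token back
        # to its search string.
        tokens = {}
        for i, sub in enumerate(subs):
            tokens[sub] = i
            tokens[i] = sub
        # Split on each search string in turn, leaving integer tokens as
        # placeholders so that later substitutions cannot touch them.
        current = [string]
        for sub in subs:
            new = []
            for piece in current:
                if isinstance(piece, str):
                    chunks = piece.split(sub)
                    new.append(chunks[0])
                    for chunk in chunks[1:]:
                        new.append(tokens[sub])
                        new.append(chunk)
                else:
                    new.append(piece)
            current = new
        # Swap every token for the replacement of its search string.
        output = []
        for piece in current:
            if isinstance(piece, str):
                output.append(piece)
            else:
                output.append(subs[tokens[piece]])
        return ''.join(output)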

    >>> token_replace("fooxxxbazyyyquuux", [("quuux", "foo"), ("foo", "bar"), ("baz", "quux")])

    'barxxxquuxyyyfoo'

    I'm sure someone could whittle that down to a handful of list comps...
    Iain
    Iain King, Jan 18, 2010
    #1

  2. Iain King

    Iain King Guest

    On Jan 18, 12:41 pm, Iain King <> wrote:
    > On Jan 18, 10:21 am, superpollo <> wrote:
    >
    >
    >
    > > superpollo wrote:

    >
    > > > hi.

    >
    > > > what is the most pythonic way to substitute substrings?

    >
    > > > eg: i want to apply:

    >
    > > > foo --> bar
    > > > baz --> quux
    > > > quuux --> foo

    >
    > > > so that:

    >
    > > > fooxxxbazyyyquuux --> barxxxquuxyyyfoo

    >
    > > > bye

    >
    > > i explain better:

    >
    > > say the subs are:

    >
    > > quuux --> foo
    > > foo --> bar
    > > baz --> quux

    >
    > > then i cannot apply the subs in sequence (say, .replace() in a loop),
    > > otherwise:

    >
    > > fooxxxbazyyyquuux --> fooxxxbazyyyfoo --> barxxxbazyyybar -->
    > > barxxxquuxyyybar

    >
    > > not as intended...

    >
    > Not sure if it's the most pythonic, but I'd probably do it like this:
    >
    > def token_replace(string, subs):
    >         subs = dict(subs)
    >         tokens = {}
    >         for i, sub in enumerate(subs):
    >                 tokens[sub] = i
    >                 tokens[i] = sub
    >         current = [string]
    >         for sub in subs:
    >                 new = []
    >                 for piece in current:
    >                         if type(piece) == str:
    >                                 chunks = piece.split(sub)
    >                                 new.append(chunks[0])
    >                                 for chunk in chunks[1:]:
    >                                         new.append(tokens[sub])
    >                                         new.append(chunk)
    >                         else:
    >                                 new.append(piece)
    >                 current = new
    >         output = []
    >         for piece in current:
    >                 if type(piece) == str:
    >                         output.append(piece)
    >                 else:
    >                         output.append(subs[tokens[piece]])
    >         return ''.join(output)
    >
    > >>> token_replace("fooxxxbazyyyquuux", [("quuux", "foo"), ("foo", "bar"), ("baz", "quux")])

    >
    > 'barxxxquuxyyyfoo'
    >
    > I'm sure someone could whittle that down to a handful of list comps...
    > Iain


    Slightly better (lets you have overlapping search strings, used in the
    order they are fed in):

    def token_replace(string, subs):
        # tokens maps each search string to an integer token, and each
        # token to its replacement string.
        tokens = {}
        if isinstance(subs, dict):
            for i, sub in enumerate(subs):
                tokens[sub] = i
                tokens[i] = subs[sub]
        else:
            # A sequence of (search, replacement) pairs keeps its order.
            s = []
            for i, (k, v) in enumerate(subs):
                tokens[k] = i
                tokens[i] = v
                s.append(k)
            subs = s
        current = [string]
        for sub in subs:
            new = []
            for piece in current:
                if isinstance(piece, str):
                    chunks = piece.split(sub)
                    new.append(chunks[0])
                    for chunk in chunks[1:]:
                        new.append(tokens[sub])
                        new.append(chunk)
                else:
                    new.append(piece)
            current = new
        output = []
        for piece in current:
            if isinstance(piece, str):
                output.append(piece)
            else:
                output.append(tokens[piece])
        return ''.join(output)
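
For comparison, a different technique that nobody in this thread proposed: a single pass with the standard re module, using an alternation of the escaped search strings. This is a hedged sketch, and the name regex_replace is invented for the example. It is not equivalent to token_replace in every case: the regex takes the leftmost match in the text and only uses the order of the pairs to break ties at the same position, whereas token_replace handles each search string over the whole text in turn.

```python
import re


def regex_replace(text, subs):
    # subs is a sequence of (old, new) pairs; if an "old" string appears
    # twice, the last replacement wins because of the dict lookup.
    subs = list(subs)
    mapping = dict(subs)
    # re.escape() keeps characters such as "." or "*" literal.
    pattern = re.compile("|".join(re.escape(old) for old, _ in subs))
    # Replaced text is never rescanned, so substitutions cannot chain.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


result = regex_replace(
    "fooxxxbazyyyquuux",
    [("quuux", "foo"), ("foo", "bar"), ("baz", "quux")],
)
print(result)
```

For the example from the original question this gives the intended 'barxxxquuxyyyfoo'.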
    Iain King, Jan 18, 2010
    #2

  3. Peter Otten

    Peter Otten Guest

    Iain King wrote:

    > Not sure if it's the most pythonic, but I'd probably do it like this:
    >
    > def token_replace(string, subs):
    >     subs = dict(subs)
    >     tokens = {}
    >     for i, sub in enumerate(subs):
    >         tokens[sub] = i
    >         tokens[i] = sub
    >     current = [string]
    >     for sub in subs:
    >         new = []
    >         for piece in current:
    >             if type(piece) == str:
    >                 chunks = piece.split(sub)
    >                 new.append(chunks[0])
    >                 for chunk in chunks[1:]:
    >                     new.append(tokens[sub])
    >                     new.append(chunk)
    >             else:
    >                 new.append(piece)
    >         current = new
    >     output = []
    >     for piece in current:
    >         if type(piece) == str:
    >             output.append(piece)
    >         else:
    >             output.append(subs[tokens[piece]])
    >     return ''.join(output)
    >
    > >>> token_replace("fooxxxbazyyyquuux", [("quuux", "foo"), ("foo", "bar"), ("baz", "quux")])
    > 'barxxxquuxyyyfoo'
    >
    > I'm sure someone could whittle that down to a handful of list comps...


    I tried, but failed:

    def join(chunks, separator):
        # Yield the chunks with separator placed between neighbours.
        chunks = iter(chunks)
        yield next(chunks)
        for chunk in chunks:
            yield separator
            yield chunk

    def token_replace(string, subs):
        # Integer index i stands in for the i-th replacement string.
        tokens = {}

        current = [string]
        for i, (find, replace) in enumerate(subs):
            tokens[i] = replace
            new = []
            for piece in current:
                if piece in tokens:
                    # Already a placeholder; leave it alone.
                    new.append(piece)
                else:
                    new.extend(join(piece.split(find), i))
            current = new

        return ''.join(tokens.get(piece, piece) for piece in current)

    You could replace the inner loop with sum(..., []), but that would be really
    ugly.
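
For the curious, the sum(..., []) variant mentioned above might look like the sketch below. It is a guess at what was meant, not code posted in the thread, and it repeats the join() helper so that it runs on its own.

```python
def join(chunks, separator):
    # Yield the chunks with separator placed between neighbours.
    chunks = iter(chunks)
    yield next(chunks)
    for chunk in chunks:
        yield separator
        yield chunk


def token_replace(string, subs):
    tokens = {}
    current = [string]
    for i, (find, replace) in enumerate(subs):
        tokens[i] = replace
        # sum() with a list start value concatenates the per-piece lists.
        # It copies the accumulated list at every step, so it is slower
        # than the explicit loop as well as harder to read.
        current = sum(
            ([piece] if piece in tokens else list(join(piece.split(find), i))
             for piece in current),
            [],
        )
    return ''.join(tokens.get(piece, piece) for piece in current)


result = token_replace(
    "fooxxxbazyyyquuux",
    [("quuux", "foo"), ("foo", "bar"), ("baz", "quux")],
)
print(result)
```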

    Peter
    Peter Otten, Jan 18, 2010
    #3
  4. On Mon, 18 Jan 2010 14:43:46 +0100, superpollo <>
    declaimed the following in gmane.comp.python.general:

    >
    > i guess that the algorithm would be easier if it was known in advance
    > that the string to substitute must have some specific property, say:
    >
    > 1) they all must start with "XYZ"
    > 2) they all have the same length N (e.g. 5)
    >

    That now seems to conflict with your previous sample where old=>new
    terms were different lengths.

    The original description is one in which I'd probably have done a
    series of .split()/.join() operations, using some sort of marker string
    that is not valid for the original input to hold the position of
    "old"... (repeat for each "old", with unique markers) Then repeating the
    .split()/.join(), replacing the markers with the proper "new" strings...
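
A minimal sketch of the marker idea described above, under stated assumptions: the name marker_replace and the marker format (a NUL byte, the index of the pair, another NUL byte) are invented for illustration. It assumes the input never contains NUL bytes and that no search string consists only of digits, since such a string could match inside a marker.

```python
def marker_replace(text, subs):
    # First pass: swap every "old" string for a unique marker.
    # Assumption: "\x00<index>\x00" never occurs in the input text.
    markers = []
    for i, (old, new) in enumerate(subs):
        marker = "\x00%d\x00" % i
        text = marker.join(text.split(old))
        markers.append((marker, new))
    # Second pass: swap every marker for its "new" string.
    for marker, new in markers:
        text = new.join(text.split(marker))
    return text


result = marker_replace(
    "fooxxxbazyyyquuux",
    [("quuux", "foo"), ("foo", "bar"), ("baz", "quux")],
)
print(result)
```

Because the "new" strings only appear in the second pass, none of them is ever searched for an "old" string.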

    So how do you combine item 1 above with the prior multiple
    replacements?

    -=-=-=-=-=-=-=-=-

    INPUT = "qweXYZ12asdXYZ1345XYZ"
    OLD = "XYZ"
    NEW = "IWAS"
    MINLEN = 5

    res = []
    oldlen = len(OLD)
    # Characters that must follow OLD for the term to reach MINLEN.
    tail = MINLEN - oldlen
    parts = INPUT.split(OLD)
    res.append(parts[0])
    for term in parts[1:]:
        if len(term) >= tail:
            res.append(NEW + term)
        else:
            # Too short to qualify: put the original text back.
            res.append(OLD + term)
    output = "".join(res)

    print("'%s'" % output)
    -=-=-=-=-=-=-=-=-=-
    'qweIWAS12asdIWAS1345XYZ'


    --
    Wulfraed Dennis Lee Bieber KD6MOG
    HTTP://wlfraed.home.netcom.com/
    Dennis Lee Bieber, Jan 19, 2010
    #4
