splitting perl-style find/replace regexp using python

Discussion in 'Python' started by John Pye, Mar 1, 2007.

  1. John Pye

    John Pye Guest

    Hi all

    I have a file with a bunch of perl regular expressions like so:

    /(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/ # bold
    /(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/ # italic bold
    /(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/ # italic

    These are all find/replace expressions delimited as '/search/replace/
    # comment' where 'search' is the regular expression we're searching
    for and 'replace' is the replacement expression.

    Is there an easy and general way that I can split these perl-style
    find-and-replace expressions into something I can use with Python, eg
    re.sub('search','replace',str) ?

    I thought it would generally be good enough to split on '/', but as you
    can see the <\/b> messes that up. I really don't want to learn perl
    here :)

    Cheers
    JP
     
    John Pye, Mar 1, 2007
    #1

  2. John Pye

    James Stroud Guest

    This could be made more general: in principle the search pattern could
    end with an escaped backslash, e.g. "\\/", but I'm guessing that won't
    happen here.

    py> for p in perlish:
    ...     print(p)
    ...
    /(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/
    /(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/
    /(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/
    py> import re
    py> splitter = re.compile(r'[^\\]/')
    py> for p in perlish:
    ...     print(splitter.split(p))
    ...
    ['/(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1'''$2'''$", '']
    ['/(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1''<b>$2<\\/b>''$", '']
    ['/(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1''$2''$", '']

    (I'm hoping this doesn't wrap!)

    James
     
    James Stroud, Mar 1, 2007
    #2

  3. John Pye

    Peter Otten Guest

    How about matching all escaped chars and '/', and then throwing away the
    former:

    import re

    def split(s):
        # Group 1 matches an escaped char, group 2 a bare '/' delimiter.
        breaks = re.compile(r"(\\.)|(/)").finditer(s)
        left, mid, right = [b.start() for b in breaks if b.group(2)]
        return s[left+1:mid], s[mid+1:right]
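
A minimal sketch of how this approach might be applied to one of the lines from the original post. The name `split_rule` is made up for illustration, and taking only the first three bare slashes is an added assumption so that a '/' in the trailing comment would not break the unpacking.

```python
import re

# Group 1 matches any backslash-escaped character, group 2 a bare '/'.
_BREAKS = re.compile(r"(\\.)|(/)")

def split_rule(line):
    """Return (search, replace) from a '/search/replace/ # comment' line."""
    slashes = [m.start() for m in _BREAKS.finditer(line) if m.group(2)]
    # Assumption: only the first three unescaped slashes are delimiters.
    left, mid, right = slashes[:3]
    return line[left + 1:mid], line[mid + 1:right]

rule = r"/(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/ # italic bold"
search, replace = split_rule(rule)
print(search)
print(replace)
```

A line with fewer than three unescaped slashes raises ValueError at the unpacking step, which is probably the behaviour you want for a malformed rule.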

    Peter
     
    Peter Otten, Mar 1, 2007
    #3
  4. John Pye

    James Stroud Guest

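    I realized that version threw away the character before each delimiter
    (the closing parenthesis and the "3" of "$3"), because [^\\] consumes
    it. This is the corrected version, using a negative lookbehind: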

    py> splitter = re.compile(r'(?<!\\)/')
    py> for p in perlish:
    ...     print(splitter.split(p))
    ...
    ['', '(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1'''$2'''$3", '']
    ['', '(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1''<b>$2<\\/b>''$3", '']
    ['', '(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1''$2''$3", '']

    James
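
For readers comparing the two splitters, here is a small sketch of the difference on a made-up input (the string `/ab/c\/d/` is not from the original file). The character class `[^\\]` matches and therefore consumes the character before each slash, while the lookbehind `(?<!\\)` only checks it.

```python
import re

line = r"/ab/c\/d/"  # hypothetical rule: search 'ab', replace 'c\/d'

consuming = re.compile(r"[^\\]/")     # eats the character before the slash
lookbehind = re.compile(r"(?<!\\)/")  # only asserts it is not a backslash

print(consuming.split(line))   # ['/a', 'c\\/', '']
print(lookbehind.split(line))  # ['', 'ab', 'c\\/d', '']
```

Note that the consuming version also fails to split on the leading slash, since there is no character before it to match.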
     
    James Stroud, Mar 1, 2007
    #4
  5. John Pye

    Peter Otten Guest

    There is another problem with escaped backslashes: a '/' that follows an
    escaped backslash ("\\") is a real delimiter, but the lookbehind skips it:

    >>> re.compile(r'(?<!\\)/').split(r"/abc\\/def/")
    ['', 'abc\\\\/def', '']
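
One way around this, sketched here as an alternative that nobody in the thread proposed, is to scan the string once and treat every backslash together with the character after it as an inseparable pair. The function name `split_unescaped` is invented for this example.

```python
def split_unescaped(text, delim="/"):
    """Split text on delim, keeping backslash escape pairs intact."""
    parts, current = [], []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # A backslash escapes exactly one following character,
            # so '\\' is a complete pair and does not escape the next '/'.
            current.append(text[i:i + 2])
            i += 2
        elif ch == delim:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts

print(split_unescaped(r"/abc\\/def/"))  # ['', 'abc\\\\', 'def', '']
```

The escapes are left in place in the returned pieces, so the search part can still be handed to the re module unchanged.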

    Peter
     
    Peter Otten, Mar 1, 2007
    #5
  6. John Pye

    James Stroud Guest

    Yes, this would be a case of the expression (left side) ending with a
    "\" as I mentioned above.

    James
     
    James Stroud, Mar 1, 2007
    #6
  7. John Pye

    Peter Otten Guest

    James Stroud wrote:

    > Yes, this would be a case of the expression (left side) ending with a
    > "\" as I mentioned above.


    Sorry for not tracking the context.

    Peter
     
    Peter Otten, Mar 1, 2007
    #7
  8. John Pye

    Peter Otten Guest

    John Pye wrote:

    > Is there an easy and general way that I can split these perl-style
    > find-and-replace expressions into something I can use with Python, eg
    > re.sub('search','replace',str) ?


    Another candidate:

    >>> re.compile(r"(?:/((?:\\.|[^/])*))").findall(r"/abc\\/def\/ghi//jkl")
    ['abc\\\\', 'def\\/ghi', '', 'jkl']
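
To get from there to something usable with re.sub, the replacement part probably needs translating as well, since Python's re module writes group references as \1 or \g<1> rather than $1. The following is a rough sketch under that assumption; the helper name `perl_rule_to_python` is hypothetical, and it only handles numbered groups and escaped slashes, not the rest of Perl's replacement syntax.

```python
import re

# Each field starts at a '/' and runs to the next unescaped '/'.
_FIELDS = re.compile(r"/((?:\\.|[^/])*)")

def perl_rule_to_python(line):
    """Turn '/search/replace/ # comment' into a (pattern, repl) pair."""
    fields = _FIELDS.findall(line)
    if len(fields) < 2:
        raise ValueError("not a /search/replace/ rule: %r" % line)
    search, replace = fields[0], fields[1]
    # Inside a field every '/' is preceded by its escaping backslash.
    replace = replace.replace("\\/", "/")
    # Perl's $1 becomes Python's \g<1>.
    replace = re.sub(r"\$(\d+)", lambda m: "\\g<%s>" % m.group(1), replace)
    return search, replace

rule = r"/(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/ # bold"
search, replace = perl_rule_to_python(rule)
print(re.sub(search, replace, "some *bold* text"))  # some '''bold''' text
```

The search part is passed through unchanged here, on the assumption that the patterns stick to syntax both regex dialects share, as the three rules in the original post appear to.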

    Peter
     
    Peter Otten, Mar 1, 2007
    #8
  9. John Pye

    John Pye Guest

    Thanks all for your suggestions on this. The 'splitter' idea was
    particularly good, not something I'd thought of. Sorry for my late
    reply.
     
    John Pye, Mar 22, 2007
    #9
