Wolfgang Maier said:
On 27.05.2014 14:09, Vlastimil Brom wrote:
you can just escape the pipe with a backslash like any other metacharacter:
r"start=\|ID=ter54rt543d"
Be sure to use the raw string notation r"...", or else double all the
backslashes in the string.
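To see why the escape matters: an unescaped `|` is the alternation operator, so it splits the pattern into two alternatives, while `\|` matches a literal pipe character. A small sketch (the strings are made up for illustration):

```python
import re

text = "a|b"

# Unescaped: "|" means "match 'a' OR 'b'", so each letter matches on its own.
alternation = re.findall("a|b", text)

# Escaped in a raw string: "\|" matches the pipe character itself.
literal = re.findall(r"a\|b", text)

# Without the r prefix, the backslash must be doubled to reach the regex engine.
doubled = re.findall("a\\|b", text)

print(alternation)  # ['a', 'b']
print(literal)      # ['a|b']
print(doubled)      # ['a|b']
```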
Thanks for the response.
I finally got the answer.
This is the regular expression to be used:
\\|ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*\\|
Or, more readably:
r'\|ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*\|'
This is what Vlastimil was talking about. It saves you from having to
escape the backslashes.
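As a quick check, the doubled-backslash version and the raw-string version are the very same string once Python has parsed the literal, so they compile to the same pattern. The input line below is hypothetical, built around the `ID=ter54rt543d` value quoted at the top of the thread:

```python
import re

# Ordinary string literal: each "\\" becomes a single backslash.
plain = "\\|ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*\\|"
# Raw string literal: backslashes are kept exactly as written.
raw = r'\|ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*\|'

print(plain == raw)  # True: identical pattern text

# Hypothetical input line containing a pipe-delimited ID field.
line = "start=|ID=ter54rt543d|end"
match = re.search(raw, line)
print(match.group())  # |ID=ter54rt543d|
```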
Sometimes, instead of using backslashes, I put the problem character into
a character class by itself (inside [...], | has no special meaning).
Which way is easier to read is a matter of personal opinion, but it
certainly eliminates any question of "how many backslashes do I need?"
r'[|]ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*[|]'
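The two spellings are interchangeable: `[|]` and `\|` both match exactly one pipe character. A minimal comparison on two made-up inputs, one with pipes around the ID and one without:

```python
import re

escaped = re.compile(r'\|ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*\|')
char_class = re.compile(r'[|]ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*[|]')

# Hypothetical test inputs.
samples = ["start=|ID=ter54rt543d|end", "start=ID=ter54rt543d"]

results = []
for s in samples:
    m1 = escaped.search(s)
    m2 = char_class.search(s)
    # Record what each pattern matched (None if it did not match).
    results.append((m1.group() if m1 else None, m2.group() if m2 else None))

print(results)
# [('|ID=ter54rt543d|', '|ID=ter54rt543d|'), (None, None)]
```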
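Another thing that can make regexes easier to read is the VERBOSE
flag. It tells the regex engine to ignore whitespace in the pattern
(except inside a character class or when backslash-escaped) and to treat
an unescaped # as the start of a comment (see
https://docs.python.org/2/library/re.html#module-contents for details).
So you can write something like: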
import re

pattern = re.compile(r'''[|]
                         ID=
                         [a-z]*
                         [0-9]*
                         [a-z]*
                         [0-9]*
                         [a-z]*
                         [|]''',
                     re.VERBOSE)
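Beyond ignoring whitespace, VERBOSE also lets you annotate each piece of the pattern with a `#` comment. A sketch showing that a commented version behaves like the compact one-line pattern (the input line is made up for illustration):

```python
import re

# re.VERBOSE ignores unescaped whitespace outside character classes and
# treats an unescaped "#" outside a class as the start of a comment.
verbose = re.compile(r'''
    [|]        # opening pipe
    ID=        # literal field name
    [a-z]*     # letters
    [0-9]*     # digits
    [a-z]*
    [0-9]*
    [a-z]*
    [|]        # closing pipe
''', re.VERBOSE)

compact = re.compile(r'[|]ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*[|]')

line = "start=|ID=ter54rt543d|end"  # hypothetical input
v = verbose.search(line)
c = compact.search(line)
print(v.group())  # |ID=ter54rt543d|
print(c.group())  # |ID=ter54rt543d|
```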
Alternatively, take advantage of the fact that Python concatenates
adjacent string literals and write it like this:
import re

pattern = re.compile(r'[|]'
                     r'ID='
                     r'[a-z]*'
                     r'[0-9]*'
                     r'[a-z]*'
                     r'[0-9]*'
                     r'[a-z]*'
                     r'[|]'
                     )
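The joining happens in the Python parser, before `re.compile` ever sees the string, so no VERBOSE flag is needed: whitespace and newlines between the literals are simply not part of the pattern. A side benefit is that ordinary Python `#` comments can sit between the pieces. A quick check, using a made-up input line:

```python
import re

# Adjacent string literals are concatenated by the Python parser;
# the comments are Python comments, not part of the pattern.
pieces = (r'[|]'     # opening pipe
          r'ID='     # literal field name
          r'[a-z]*'
          r'[0-9]*'
          r'[a-z]*'
          r'[0-9]*'
          r'[a-z]*'
          r'[|]')    # closing pipe

one_line = r'[|]ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*[|]'
print(pieces == one_line)  # True

pattern = re.compile(pieces)  # no re.VERBOSE required
m = pattern.search("start=|ID=ter54rt543d|end")  # hypothetical input
print(m.group())  # |ID=ter54rt543d|
```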