candide
Another question about regular expressions.
How can I extract all duplicated words in a given text using only regular
expression methods? To make the question concrete, if the text is
------------------
Now is better than never.
Although never is often better than *right* now.
------------------
the duplicates are:
------------------------
better is now than never
------------------------
Ordinary code can solve the problem, for instance:
# ------------------
import re
regexp=r"\w+"
c=re.compile(regexp, re.IGNORECASE)
text="""
Now is better than never.
Although never is often better than *right* now."""
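# Lowercase every word so the comparison is case-insensitive
z = [s.lower() for s in c.findall(text)]
# Print each word that occurs more than once (set order is arbitrary)
for d in set(s for s in z if z.count(s) > 1):
    print(d, end=" ")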
# ------------------
but I am looking for a "plain" re solution.
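To show the kind of thing meant by "plain" re, here is a minimal sketch, not a definitive answer. It assumes a backreference inside a lookahead, so that a word matches only if the same word occurs again later in the text. The function name find_duplicate_words is hypothetical, and a set is still used to collapse words that occur three or more times.

```python
import re

# A word, followed somewhere later in the text by the same word.
# \1 refers back to the captured word; the lookahead (?=...) checks
# for the repeat without consuming any text.
DUPLICATE = re.compile(r"\b(\w+)\b(?=.*\b\1\b)", re.IGNORECASE | re.DOTALL)


def find_duplicate_words(text):
    """Return the sorted, lowercased words that occur more than once."""
    # findall returns the captured group for every match
    return sorted({w.lower() for w in DUPLICATE.findall(text)})


text = """
Now is better than never.
Although never is often better than *right* now."""

print(" ".join(find_duplicate_words(text)))
```

re.DOTALL lets .* cross line breaks, and re.IGNORECASE makes the backreference match "Now" against "now". The lookahead rescans the rest of the text for every word, so this is only reasonable for short inputs.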