Extracting repeated words

candide

Another question about regular expressions.

How can I extract all duplicated words in a given text using regular
expression methods? To make the question concrete, if the text is

------------------
Now is better than never.
Although never is often better than *right* now.
------------------

the duplicates are:

------------------------
better is now than never
------------------------

Some code can solve this, for instance:

# ------------------
import re

regexp=r"\w+"

c=re.compile(regexp, re.IGNORECASE)

text="""
Now is better than never.
Although never is often better than *right* now."""

z=[s.lower() for s in c.findall(text)]

for d in set([s for s in z if z.count(s)>1]):
    print d,
# ------------------

but I'm in search of "plain" re code.
 
Ian Kelly

You could use a look-ahead assertion with a captured group:
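
A minimal sketch of the look-ahead idea, assuming Python 3. This is one way to write such a pattern, not necessarily the exact one from this message, and the name `find_duplicates_lookahead` is made up for illustration. The group captures a whole word, and the look-ahead `(?=.*\b\1\b)` only succeeds if the same word shows up again later in the text. `re.DOTALL` lets `.` cross line breaks, and `re.IGNORECASE` makes the backreference case-insensitive as well.

```python
import re

text = """
Now is better than never.
Although never is often better than *right* now."""

# (\w+) captures one whole word (the \b anchors keep it from matching a fragment).
# (?=.*\b\1\b) is a zero-width look-ahead: it requires the captured word
# to occur again, as a whole word, somewhere later in the text.
DUPLICATE_WORD = re.compile(r"\b(\w+)\b(?=.*\b\1\b)", re.IGNORECASE | re.DOTALL)


def find_duplicates_lookahead(s):
    """Return the sorted, lower-cased words that occur more than once in s."""
    return sorted({m.group(1).lower() for m in DUPLICATE_WORD.finditer(s)})


print(" ".join(find_duplicates_lookahead(text)))  # better is never now than
```

Only the earlier occurrences of a repeated word match, since the last occurrence has nothing left to look ahead to; the set removes repeats of words that occur three or more times.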

But note that this is computationally expensive. The regex that you
posted is probably more efficient if you use a collections.Counter
object instead of z.count.
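
A rough sketch of the `collections.Counter` variant, assuming Python 3 (the helper name `find_duplicates_counter` is hypothetical). It keeps the simple `\w+` regex from the original post but counts every word in a single pass, instead of calling `z.count` once per word or rescanning the text for every match.

```python
import re
from collections import Counter

text = """
Now is better than never.
Although never is often better than *right* now."""


def find_duplicates_counter(s):
    """Return the sorted, lower-cased words that occur more than once in s."""
    # One pass to extract the words, one pass to count them.
    counts = Counter(word.lower() for word in re.findall(r"\w+", s))
    return sorted(word for word, n in counts.items() if n > 1)


print(" ".join(find_duplicates_counter(text)))  # better is never now than
```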

Cheers,
Ian
 
candide

On 02/04/2011 00:42, Ian Kelly wrote:
You could use a look-ahead assertion with a captured group:

It works fine. Lookahead assertions in action are exactly what I was
looking for, many thanks.
 
