Extracting repeated words

candide · Apr 1, 2011

Another question about regular expressions.

How can I extract all the duplicated words in a given text using regular
expression methods? To make the question concrete, if the text is

------------------
Now is better than never.
Although never is often better than *right* now.
------------------

then the duplicates are:

------------------------
better is now than never
------------------------

Some code can solve the problem, for instance:

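# ------------------
import re

regexp = r"\w+"

c = re.compile(regexp, re.IGNORECASE)

text = """
Now is better than never.
Although never is often better than *right* now."""

# Lowercase every word so that "Now" and "now" count as the same word
z = [s.lower() for s in c.findall(text)]

# Print each word that occurs more than once (set order is arbitrary)
for d in set(s for s in z if z.count(s) > 1):
    print(d, end=" ")
# ------------------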

but I'm in search of "plain" re code.

Ian Kelly · Apr 1, 2011

You could use a look-ahead assertion with a captured group:
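A minimal sketch of such a pattern, not necessarily the exact one meant here. The names DUP_WORD and repeated_words are made up for illustration, and it assumes that "word" means a run of \w characters and that case should be ignored, as in the posted code:

```python
import re

# Match a whole word, then assert with a look-ahead that the same word
# (back-reference \1) occurs again somewhere later in the text.
# re.DOTALL lets ".*" cross line breaks; re.IGNORECASE makes the
# back-reference match "now" against "Now".
DUP_WORD = re.compile(r"\b(\w+)\b(?=.*\b\1\b)", re.IGNORECASE | re.DOTALL)

text = """
Now is better than never.
Although never is often better than *right* now."""


def repeated_words(text):
    # A word occurring three times would be matched twice, so collect
    # the lowercased matches in a set before sorting them.
    return sorted({m.group(1).lower() for m in DUP_WORD.finditer(text)})


print(" ".join(repeated_words(text)))  # better is never now than
```

The set and the lower() call are still needed after the match, so this is not purely "plain" re, but the duplicate detection itself is done by the pattern.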

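But note that this is computationally expensive, since the look-ahead
rescans the rest of the text for every word. The code that you posted
is probably more efficient, especially if you use a collections.Counter
object instead of z.count, which walks the whole list once per word.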
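For comparison, the Counter variant could look roughly like this. It is only a sketch, assuming the same \w+ tokenization and lowercasing as in the posted code; the variable names are illustrative:

```python
import re
from collections import Counter

text = """
Now is better than never.
Although never is often better than *right* now."""

# Tokenize once and lowercase, as in the posted code
words = [w.lower() for w in re.findall(r"\w+", text)]

# Counter tallies every word in a single pass over the list
counts = Counter(words)

# Keep the words seen more than once, sorted for a stable order
duplicates = sorted(w for w, n in counts.items() if n > 1)

print(" ".join(duplicates))  # better is never now than
```

Each word is counted once here, whereas z.count(s) rescans the full list for every element.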

Cheers,
Ian

candide · Apr 2, 2011

On 02/04/2011 00:42, Ian Kelly wrote:

You could use a look-ahead assertion with a captured group:

It works fine. Lookahead assertions in action are exactly what I was
looking for, many thanks.
