Replace stop words (remove words from a string)

BerlinBrown

If I have an array of "stop" words and I want to replace those values
with something else in a string, how would I go about doing this? I
have code that splits the string and then does a set difference, but I
think there is an easier approach:

E.g.

mystr =
kljsldkfjksjdfjsdjflkdjslkf[BAD]Kkjkkkkjkkjk[BAD]LSKJFKSFJKSJF;L[BAD2]kjsldfsd;

If I have an array stop_list = [ "[BAD]", "[BAD2]" ]

I want to replace the values in that list with a zero-length string.

I had this before, but I don't want to use this approach; I don't want
to use the split.

line_list = line.lower().split()
res = list(set(keywords_list).difference(set(ENTITY_IGNORE_LIST)))
 
Karthik

How about -

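import string  # Python 2 only: string.replace() was removed in Python 3
for s in stoplist:
    # replace() returns a new string, so assign the result back
    mystr = string.replace(mystr, s, "")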

Hope this works.

-----Original Message-----
From: (e-mail address removed) On Behalf Of BerlinBrown
Sent: Thursday, January 17, 2008 1:55 PM
To: (e-mail address removed)
Subject: Replace stop words (remove words from a string)

Gary Herron

Strings have a replace method that produces a new string with all
occurrences of one substring replaced with another. You'd have to loop
through your stop_list one word at a time.
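A minimal sketch of that loop, using the mystr and stop_list from the question; the helper name remove_stop_words is made up for illustration. Because strings are immutable, replace returns a new string rather than changing the original, so the result has to be assigned back on every pass.

```python
def remove_stop_words(text, stop_list):
    """Return text with every occurrence of each stop word removed."""
    for stop_word in stop_list:
        # str.replace returns a new string; rebind the name to keep the result.
        text = text.replace(stop_word, "")
    return text


mystr = "kljsldkfjksjdfjsdjflkdjslkf[BAD]Kkjkkkkjkkjk[BAD]LSKJFKSFJKSJF;L[BAD2]kjsldfsd;"
stop_list = ["[BAD]", "[BAD2]"]

cleaned = remove_stop_words(mystr, stop_list)
print(cleaned)
```

The original mystr is left untouched; only cleaned holds the stripped text.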


If either the string or the stop_list grows particularly large, this
approach won't scale very well since the whole string would be
re-created anew for each stop_list entry. In that case, I'd look into
the regular expression (re) module. You may be able to finagle a way to
find and replace all stop_list entries in one pass. (Finding them all
is easy; I'm not so sure you could replace them all at once, though.)


Gary Herron
 
Gary Herron

Karthik said:
How about -

for s in stoplist:
    mystr = string.replace(mystr, s, "")
That will work, but the string module is long outdated. Better to use
string methods:

for s in stoplist:
    mystr = mystr.replace(s, "")

Gary Herron

Raymond Hettinger

BerlinBrown said:
if I have an array of "stop" words, and I want to replace those values
with something else;
mystr =
kljsldkfjksjdfjsdjflkdjslkf[BAD]Kkjkkkkjkkjk[BAD]LSKJFKSFJKSJF;L[BAD2]kjsldfsd;
if I have an array stop_list = [ "[BAD]", "[BAD2]" ]
I want to replace the values in that list with a zero length string.

Regular expressions should do the trick.

Try this:
>>> mystr = 'kljsldkfjksjdfjsdjflkdjslkf[BAD]Kkjkkkkjkkjk[BAD]LSKJFKSFJKSJF;L[BAD2]kjsldfsd;'
>>> stoplist = ["[BAD]", "[BAD2]"]
>>> import re
>>> stoppattern = '|'.join(map(re.escape, stoplist))
>>> re.sub(stoppattern, '', mystr)
'kljsldkfjksjdfjsdjflkdjslkfKkjkkkkjkkjkLSKJFKSFJKSJF;Lkjsldfsd;'
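
The same idea as a reusable helper, sketched under a couple of assumptions: the names build_stop_pattern and strip_stop_words are invented here, and the longest-first sort is an addition that is not part of the session above. Regex alternation tries alternatives left to right, so if one stop word is a prefix of another (say "ab" and "abc"), putting the longer one first keeps the shorter one from matching early and leaving a stray tail behind.

```python
import re


def build_stop_pattern(stoplist):
    """Compile one regex that matches any of the stop words literally."""
    # Longest first: alternation is ordered, so "abc" must come before "ab".
    ordered = sorted(stoplist, key=len, reverse=True)
    return re.compile("|".join(map(re.escape, ordered)))


def strip_stop_words(text, pattern):
    """Remove every match of pattern from text in a single pass."""
    return pattern.sub("", text)


pattern = build_stop_pattern(["ab", "abc"])
print(strip_stop_words("xabcy", pattern))  # prints "xy"
```

Compiling once and reusing the pattern is mainly worthwhile when the same stop list is applied to many strings.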

Raymond
 
Bruno Desthuilliers

BerlinBrown wrote:
if I have an array stop_list = [ "[BAD]", "[BAD2]" ]
s/array/list/

I want to replace the values in that list with a zero length string.

res = mystr
for stop_word in stop_list:
    res = res.replace(stop_word, '')
 
bearophileHUGS

Raymond Hettinger:
Regular expressions should do the trick.

If the stop words are many (and similar) then that RE can be optimized
with a trie-based strategy, like this one called "List":
http://search.cpan.org/~dankogai/Regexp-Optimizer-0.15/lib/Regexp/List.pm

"List" is used by something more complex called "Optimizer" that's
overkill for the OP problem:
http://search.cpan.org/~dankogai/Regexp-Optimizer-0.15/lib/Regexp/Optimizer.pm

I don't know whether a Python module similar to "List" is available; I may
write one :)
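
For a rough idea of what a trie-based strategy could look like in Python, here is a small sketch. It is not the algorithm used by the CPAN module, only an illustration of merging shared prefixes; the function name trie_regex is hypothetical, and it assumes a non-empty list of non-empty strings. Whether it actually beats a plain alternation would have to be measured.

```python
import re


def trie_regex(words):
    """Return regex source matching any word in words, with shared prefixes merged."""
    trie = {}
    for word in words:
        node = trie
        for char in word:
            node = node.setdefault(char, {})
        node[""] = {}  # an empty-string key marks "a word ends here"

    def to_pattern(node):
        branches = [
            re.escape(char) + to_pattern(child)
            for char, child in node.items()
            if char != ""
        ]
        if not branches:
            return ""  # leaf: nothing follows the end-of-word marker
        if len(branches) == 1 and "" not in node:
            return branches[0]  # plain concatenation, no group needed
        group = "(?:" + "|".join(branches) + ")"
        # If a word also ends at this node, whatever follows is optional.
        return group + "?" if "" in node else group

    return to_pattern(trie)


stoplist = ["[BAD]", "[BAD2]", "[BADDER]"]
pattern = re.compile(trie_regex(stoplist))
print(pattern.sub("", "a[BAD]b[BAD2]c[BADDER]d"))  # prints "abcd"
```

The three stop words share the prefix "[BAD", so the generated pattern matches that prefix once and only then branches on what follows.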

Bye,
bearophile
 
