Replace stop words (remove words from a string)

Discussion in 'Python' started by BerlinBrown, Jan 17, 2008.

  1. BerlinBrown

    BerlinBrown Guest

    If I have an array of "stop" words and I want to replace those values
    in a string with something else, how would I go about doing this? I
    have this code that splits the string and then takes a set difference,
    but I think there is an easier approach:

    E.g.

    mystr = "kljsldkfjksjdfjsdjflkdjslkf[BAD]Kkjkkkkjkkjk[BAD]LSKJFKSFJKSJF;L[BAD2]kjsldfsd;"

    if I have an array stop_list = [ "[BAD]", "[BAD2]" ]

    I want to replace the values in that list with a zero length string.

    I had this before, but I don't want to use this approach; I don't want
    to use the split.

    line_list = line.lower().split()
    res = list(set(keywords_list).difference(set(ENTITY_IGNORE_LIST)))
    BerlinBrown, Jan 17, 2008
    #1

  2. BerlinBrown

    Karthik Guest

    How about -

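    import string  # Python 2 only; these functions were removed in Python 3
    for s in stoplist:
        mystr = string.replace(mystr, s, "")  # assign the result back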

    Hope this works.

    Karthik, Jan 17, 2008
    #2

  3. BerlinBrown

    Gary Herron Guest

    BerlinBrown wrote:
    > if I have an array of "stop" words, and I want to replace those values
    > with something else; in a string, how would I go about doing this. I
    > have this code that splits the string and then does a difference but I
    > think there is an easier approach:
    >
    > E.g.
    >
    > mystr =
    > kljsldkfjksjdfjsdjflkdjslkf[BAD]Kkjkkkkjkkjk[BAD]LSKJFKSFJKSJF;L[BAD2]kjsldfsd;
    >
    > if I have an array stop_list = [ "[BAD]", "[BAD2]" ]
    >
    > I want to replace the values in that list with a zero length string.
    >
    > I had this before, but I don't want to use this approach; I don't want
    > to use the split.
    >
    > line_list = line.lower().split()
    > res = list(set(keywords_list).difference(set(ENTITY_IGNORE_LIST)))
    >

    Strings have a replace method that produces a new string with all
    occurrences of one substring replaced by another. You'd have to loop
    through your stop_list one word at a time.

    >>> s = 'abcxyzabc'
    >>> s.replace('xyz','')
    'abcabc'


    If either the string or the stop_list grows particularly large, this
    approach won't scale very well since the whole string would be
    re-created anew for each stop_list entry. In that case, I'd look into
    the regular expression (re) module. You may be able to finagle a way to
    find and replace all stop_list entries in one pass. (Finding them all
    is easy -- not so sure you could replace them all at once though.)
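
On the question of replacing them all at once: a minimal sketch, assuming each stop word may need its own replacement text. re.sub accepts a function as the replacement argument, so a single pass can look up each match in a dict. The `replacements` mapping and the `replace_all` helper are hypothetical names introduced here for illustration, not part of the original post.

```python
import re

# Hypothetical mapping: each stop word and the text that should replace it.
replacements = {"[BAD]": "", "[BAD2]": "<removed>"}

# Escape each key so characters such as "[" are matched literally.
pattern = re.compile("|".join(map(re.escape, replacements)))

def replace_all(text):
    """Replace every stop word in a single pass over text."""
    # re.sub calls the function once per match; the matched text is the dict key.
    return pattern.sub(lambda m: replacements[m.group(0)], text)

result = replace_all("abc[BAD]def[BAD2]ghi")
```

Here `result` is `'abcdef<removed>ghi'`. Mapping every key to an empty string gives the plain removal the original question asks for.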


    Gary Herron
    Gary Herron, Jan 17, 2008
    #3
  4. BerlinBrown

    Gary Herron Guest

    Karthik wrote:
    > How about -
    >
    > for s in stoplist:
    > string.replace(mystr, s, "")
    >

    That will work once the result is assigned back to mystr (replace returns
    a new string), but the string module's functions are long outdated.
    Better to use string methods:

    for s in stoplist:
        mystr = mystr.replace(s, "")

    Gary Herron


    Gary Herron, Jan 17, 2008
    #4
  5. On Jan 17, 12:25 am, BerlinBrown <> wrote:
    > if I have an array of "stop" words, and I want to replace those values
    > with something else;
    > mystr =
    > kljsldkfjksjdfjsdjflkdjslkf[BAD]Kkjkkkkjkkjk[BAD]LSKJFKSFJKSJF;L[BAD2]kjsldfsd;
    > if I have an array stop_list = [ "[BAD]", "[BAD2]" ]
    > I want to replace the values in that list with a zero length string.


    Regular expressions should do the trick.

    Try this:

    >>> mystr = 'kljsldkfjksjdfjsdjflkdjslkf[BAD]Kkjkkkkjkkjk[BAD]LSKJFKSFJKSJF;L[BAD2]kjsldfsd;'
    >>> stoplist = ["[BAD]", "[BAD2]"]
    >>> import re
    >>> stoppattern = '|'.join(map(re.escape, stoplist))
    >>> re.sub(stoppattern, '', mystr)
    'kljsldkfjksjdfjsdjflkdjslkfKkjkkkkjkkjkLSKJFKSFJKSJF;Lkjsldfsd;'
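
One caveat with a joined alternation: the regex engine takes the first alternative that matches at a position, so if one stop word is a prefix of another, the shorter one can win and leave a fragment behind. The sketch below assumes a hypothetical list where that happens and sorts the words longest first; `build_stop_pattern` is a name made up for this example.

```python
import re

def build_stop_pattern(stop_words):
    """Compile an alternation that prefers the longest stop word at each position."""
    # Sort longest first: with "BAD" listed before "BAD2", the engine
    # would match "BAD" and leave the "2" behind.
    ordered = sorted(stop_words, key=len, reverse=True)
    return re.compile("|".join(re.escape(w) for w in ordered))

stop_words = ["BAD", "BAD2"]  # hypothetical: one word is a prefix of the other

# Unsorted alternation: "BAD" matches first and the "2" survives.
naive = re.sub("|".join(map(re.escape, stop_words)), "", "xBAD2y")

# Longest-first alternation removes the whole stop word.
fixed = build_stop_pattern(stop_words).sub("", "xBAD2y")
```

With these inputs `naive` is `'x2y'` and `fixed` is `'xy'`. The list in the original question is not affected, because "[BAD]" is not a prefix of "[BAD2]".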

    Raymond
    Raymond Hettinger, Jan 17, 2008
    #5
  6. BerlinBrown wrote:
    > if I have an array of "stop" words, and I want to replace those values
    > with something else; in a string, how would I go about doing this. I
    > have this code that splits the string and then does a difference but I
    > think there is an easier approach:
    >
    > E.g.
    >
    > mystr =
    > kljsldkfjksjdfjsdjflkdjslkf[BAD]Kkjkkkkjkkjk[BAD]LSKJFKSFJKSJF;L[BAD2]kjsldfsd;
    >


    <ot>you forgot the quotes</ot>

    > if I have an array stop_list = [ "[BAD]", "[BAD2]" ]


    s/array/list/

    > I want to replace the values in that list with a zero length string.
    >
    > I had this before, but I don't want to use this approach; I don't want
    > to use the split.
    >
    > line_list = line.lower().split()
    > res = list(set(keywords_list).difference(set(ENTITY_IGNORE_LIST)))


    res = mystr
    for stop_word in stop_list:
        res = res.replace(stop_word, '')
    Bruno Desthuilliers, Jan 17, 2008
    #6
  7. BerlinBrown

    Guest

    Raymond Hettinger:
    > Regular expressions should do the trick.
    > >>> stoppattern = '|'.join(map(re.escape, stoplist))
    > >>> re.sub(stoppattern, '', mystr)


    If the stop words are many (and similar) then that RE can be optimized
    with a trie-based strategy, like this one called "List":
    http://search.cpan.org/~dankogai/Regexp-Optimizer-0.15/lib/Regexp/List.pm

    "List" is used by something more complex called "Optimizer" that's
    overkill for the OP problem:
    http://search.cpan.org/~dankogai/Regexp-Optimizer-0.15/lib/Regexp/Optimizer.pm

    I don't know if a Python module similar to "List" is available, I may
    write it :)
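
A rough sketch of what such a trie-based pattern builder might look like in Python. This is an illustration of the idea only, not a port of the Perl module, and `trie_regex` is a hypothetical name. It merges shared prefixes so that "[BAD]" and "[BAD2]" share the leading `\[BAD`. Whether this is actually faster with Python's re engine would need measuring.

```python
import re

def trie_regex(words):
    """Build a regex source string matching any of words, sharing common prefixes."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # the empty key marks the end of a word

    def to_pattern(node):
        ends_here = "" in node
        # One alternative per child character, each followed by its subtree.
        branches = [re.escape(ch) + to_pattern(node[ch])
                    for ch in sorted(node) if ch]
        if not branches:
            return ""
        if len(branches) == 1 and not ends_here:
            return branches[0]
        body = "(?:" + "|".join(branches) + ")"
        # If a word also ends at this node, the continuation is optional;
        # the greedy "?" still prefers the longer word.
        return body + "?" if ends_here else body

    return to_pattern(trie)

mystr = "kljsldkfjksjdfjsdjflkdjslkf[BAD]Kkjkkkkjkkjk[BAD]LSKJFKSFJKSJF;L[BAD2]kjsldfsd;"
cleaned = re.sub(trie_regex(["[BAD]", "[BAD2]"]), "", mystr)
```

Because the optional group is greedy, a word that extends another (such as "BAD2" over "BAD") is matched in full without any extra sorting.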

    Bye,
    bearophile
    , Jan 17, 2008
    #7
