What is the best way to delete strings in a string list that match a certain pattern?

Discussion in 'Python' started by Peng Yu, Nov 6, 2009.

  1. Peng Yu

    Peng Yu Guest

    Suppose I have a list of strings, A. I want to compute the list (call
    it B) of strings that are elements of A but don't match a regex. I
    could use a for loop to do so. In a functional language, there is a way
    to do so without using a for loop.

    I'm wondering what the best way to compute B in Python is.
    Peng Yu, Nov 6, 2009
    #1

  2. Lie Ryan

    Lie Ryan Guest

    Re: What is the best way to delete strings in a string list that match a certain pattern?

    Peng Yu wrote:
    > Suppose I have a list of strings, A. I want to compute the list (call
    > it B) of strings that are elements of A but doesn't match a regex. I
    > could use a for loop to do so. In a functional language, there is way
    > to do so without using the for loop.


    In a functional language there is no looping, so that argument is kind of
    pointless. The looping construct in many functional languages is syntactic
    sugar for recursion.

    In Python, instead of an explicit loop, you can use either:
    map(pattern.match, list_of_strs)
    or
    [pattern.match(mystr) for mystr in list_of_strs]

    or if you want to be wicked evil, you can write a recursive function as
    such:

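    def multimatcher(list_of_strs, index=0):
        # Recurse over the list one index at a time; assumes a compiled
        # regex named `pattern` is defined elsewhere.
        if index >= len(list_of_strs):
            return []
        return [pattern.match(list_of_strs[index])] + multimatcher(
            list_of_strs, index + 1
        )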
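
    Note that the snippets above return a match object (or None) for every
    string rather than the filtered list B from the question. A minimal sketch
    of the filtering itself follows; the sample list and the pattern are
    made up purely for illustration, and search() is used instead of match()
    only as an assumption about what "match" should mean here.

```python
import re

# Hypothetical sample data and pattern, chosen only for illustration.
A = ["apple", "banana", "avocado", "cherry"]
pattern = re.compile(r"^a")

# List comprehension: keep the strings the pattern does NOT match.
B = [s for s in A if not pattern.search(s)]

# Equivalent with filter(); list() is needed on Python 3, where filter() is lazy.
B_filtered = list(filter(lambda s: not pattern.search(s), A))

print(B)
```

    Both forms still visit every element once; they just hide the loop.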
    Lie Ryan, Nov 6, 2009
    #2

  3. Re: What is the best way to delete strings in a string list that match a certain pattern?

    Peng Yu schrieb:
    > Suppose I have a list of strings, A. I want to compute the list (call
    > it B) of strings that are elements of A but doesn't match a regex. I
    > could use a for loop to do so. In a functional language, there is way
    > to do so without using the for loop.


    Nonsense. For processing over each element, you have to loop over them,
    either with or without growing a call-stack at the same time.

    FP languages can optimize away the stack-frame-growth (tail recursion) -
    but this isn't reducing complexity in any way.

    So use a loop, either directly, or using a list-comprehension.

    Diez
    Diez B. Roggisch, Nov 6, 2009
    #3
  4. Peng Yu

    Peng Yu Guest

    On Fri, Nov 6, 2009 at 3:05 AM, Diez B. Roggisch <> wrote:
    > Peng Yu schrieb:
    >>
    >> Suppose I have a list of strings, A. I want to compute the list (call
    >> it B) of strings that are elements of A but doesn't match a regex. I
    >> could use a for loop to do so. In a functional language, there is way
    >> to do so without using the for loop.

    >
    > Nonsense. For processing over each element, you have to loop over them,
    > either with or without growing a call-stack at the same time.
    >
    > FP languages can optimize away the stack-frame-growth (tail recursion) - but
    > this isn't reducing complexity in any way.
    >
    > So use a loop, either directly, or using a list-comprehension.


    What is a list-comprehension?

    I tried the following code. The list 'l' will be ['a','b','c'] rather
    than ['b','c'], which is what I want. It seems 'remove' will disrupt
    the iterator, right? I am wondering how to make the code correct.

    l = ['a', 'a', 'b', 'c']
    for x in l:
        if x == 'a':
            l.remove(x)

    print(l)
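
    The skipped element can be reproduced, and avoided, with a small sketch.
    It assumes nothing beyond the list from the post: the loop's internal
    index keeps advancing while remove() shifts the remaining items left,
    so iterating over a shallow copy (l[:]) sidesteps the problem.

```python
l = ['a', 'a', 'b', 'c']
for x in l:
    if x == 'a':
        l.remove(x)
# The second 'a' survives: after the first removal it slides to index 0,
# while the loop has already moved on to index 1.
skipped = list(l)

l = ['a', 'a', 'b', 'c']
for x in l[:]:          # iterate over a shallow copy, mutate the original
    if x == 'a':
        l.remove(x)
fixed = list(l)

print(skipped, fixed)
```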
    Peng Yu, Nov 6, 2009
    #4
  5. Re: What is the best way to delete strings in a string list that match a certain pattern?

    On Fri, 6 Nov 2009, Peng Yu wrote:

    > On Fri, Nov 6, 2009 at 3:05 AM, Diez B. Roggisch <> wrote:
    > > Peng Yu schrieb:
    > >>
    > >> Suppose I have a list of strings, A. I want to compute the list (call
    > >> it B) of strings that are elements of A but doesn't match a regex. I
    > >> could use a for loop to do so. In a functional language, there is way
    > >> to do so without using the for loop.

    > >
    > > Nonsense. For processing over each element, you have to loop over them,
    > > either with or without growing a call-stack at the same time.
    > >
    > > FP languages can optimize away the stack-frame-growth (tail recursion) - but
    > > this isn't reducing complexity in any way.
    > >
    > > So use a loop, either directly, or using a list-comprehension.

    >
    > What is a list-comprehension?
    >
    > I tried the following code. The list 'l' will be ['a','b','c'] rather
    > than ['b','c'], which is what I want. It seems 'remove' will disrupt
    > the iterator, right? I am wondering how to make the code correct.
    >
    > l = ['a', 'a', 'b', 'c']
    > for x in l:
    >     if x == 'a':
    >         l.remove(x)
    >
    > print l


    list comprehension seems to be what you want:

    l = [i for i in l if i != 'a']

    rday
    --


    ========================================================================
    Robert P. J. Day Waterloo, Ontario, CANADA

    Linux Consulting, Training and Kernel Pedantry.

    Web page: http://crashcourse.ca
    Twitter: http://twitter.com/rpjday
    ========================================================================
    Robert P. J. Day, Nov 6, 2009
    #5
  6. Peng Yu

    Peng Yu Guest

    On Fri, Nov 6, 2009 at 10:42 AM, Robert P. J. Day <> wrote:
    > On Fri, 6 Nov 2009, Peng Yu wrote:
    >
    >> On Fri, Nov 6, 2009 at 3:05 AM, Diez B. Roggisch <> wrote:
    >> > Peng Yu schrieb:
    >> >>
    >> >> Suppose I have a list of strings, A. I want to compute the list (call
    >> >> it B) of strings that are elements of A but doesn't match a regex. I
    >> >> could use a for loop to do so. In a functional language, there is way
    >> >> to do so without using the for loop.
    >> >
    >> > Nonsense. For processing over each element, you have to loop over them,
    >> > either with or without growing a call-stack at the same time.
    >> >
    >> > FP languages can optimize away the stack-frame-growth (tail recursion) - but
    >> > this isn't reducing complexity in any way.
    >> >
    >> > So use a loop, either directly, or using a list-comprehension.

    >>
    >> What is a list-comprehension?
    >>
    >> I tried the following code. The list 'l' will be ['a','b','c'] rather
    >> than ['b','c'], which is what I want. It seems 'remove' will disrupt
    >> the iterator, right? I am wondering how to make the code correct.
    >>
    >> l = ['a', 'a', 'b', 'c']
    >> for x in l:
    >>   if x == 'a':
    >>     l.remove(x)
    >>
    >> print l

    >
    >  list comprehension seems to be what you want:
    >
    >  l = [i for i in l if i != 'a']


    My problem comes from the context of using os.walk(). Please see the
    description of the following webpage. Somehow I have to modify the
    list inplace. I have already tried 'dirs = [i for i in l if dirs !=
    'a']'. But it seems that it doesn't "prune the search". So I need the
    inplace modification of list.

    http://docs.python.org/library/os.html

    When topdown is True, the caller can modify the dirnames list in-place
    (perhaps using del or slice assignment), and walk() will only recurse
    into the subdirectories whose names remain in dirnames; this can be
    used to prune the search, impose a specific order of visiting, or even
    to inform walk() about directories the caller creates or renames
    before it resumes walk() again. Modifying dirnames when topdown is
    False is ineffective, because in bottom-up mode the directories in
    dirnames are generated before dirpath itself is generated.
    Peng Yu, Nov 6, 2009
    #6
  7. Peter Otten

    Peter Otten Guest

    Re: What is the best way to delete strings in a string list that match a certain pattern?

    Peng Yu wrote:

    > My problem comes from the context of using os.walk(). Please see the
    > description of the following webpage. Somehow I have to modify the
    > list inplace. I have already tried 'dirs = [i for i in l if dirs !=
    > 'a']'. But it seems that it doesn't "prune the search". So I need the
    > inplace modification of list.


    Use

    dirs[:] = [d for d in dirs if d != "a"]

    or

    try:
        dirs.remove("a")
    except ValueError:
        pass
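
    Put together with os.walk() and a regex, the pruning might look like the
    sketch below. The function name walk_pruned, the skip pattern, and the
    throwaway directory tree are all hypothetical; only os.walk() and the
    slice assignment come from the discussion.

```python
import os
import re
import shutil
import tempfile

# Hypothetical pattern: skip any directory whose name starts with a dot.
skip = re.compile(r"^\.")

def walk_pruned(top):
    """Return the relative directory paths visited, never descending into skipped ones."""
    visited = []
    for dirpath, dirnames, filenames in os.walk(top):  # topdown=True by default
        # Slice assignment edits the very list os.walk() consults next.
        dirnames[:] = [d for d in dirnames if not skip.search(d)]
        visited.append(os.path.relpath(dirpath, top))
    return sorted(visited)

# Build a small throwaway tree to try it on.
root = tempfile.mkdtemp()
for sub in ("src/pkg", ".git/objects", "docs"):
    os.makedirs(os.path.join(root, *sub.split("/")))

result = walk_pruned(root)
shutil.rmtree(root)  # clean up the temporary tree
print(result)
```

    The .git directory and everything under it never show up in the result,
    because walk() only recurses into names still present in dirnames.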
    Peter Otten, Nov 6, 2009
    #7
  8. MRAB

    MRAB Guest

    Re: What is the best way to delete strings in a string list that match a certain pattern?

    Peng Yu wrote:
    > On Fri, Nov 6, 2009 at 10:42 AM, Robert P. J. Day <> wrote:
    >> On Fri, 6 Nov 2009, Peng Yu wrote:
    >>
    >>> On Fri, Nov 6, 2009 at 3:05 AM, Diez B. Roggisch <> wrote:
    >>>> Peng Yu schrieb:
    >>>>> Suppose I have a list of strings, A. I want to compute the list (call
    >>>>> it B) of strings that are elements of A but doesn't match a regex. I
    >>>>> could use a for loop to do so. In a functional language, there is way
    >>>>> to do so without using the for loop.
    >>>> Nonsense. For processing over each element, you have to loop over them,
    >>>> either with or without growing a call-stack at the same time.
    >>>>
    >>>> FP languages can optimize away the stack-frame-growth (tail recursion) - but
    >>>> this isn't reducing complexity in any way.
    >>>>
    >>>> So use a loop, either directly, or using a list-comprehension.
    >>> What is a list-comprehension?
    >>>
    >>> I tried the following code. The list 'l' will be ['a','b','c'] rather
    >>> than ['b','c'], which is what I want. It seems 'remove' will disrupt
    >>> the iterator, right? I am wondering how to make the code correct.
    >>>
    >>> l = ['a', 'a', 'b', 'c']
    >>> for x in l:
    >>>     if x == 'a':
    >>>         l.remove(x)
    >>>
    >>> print l

    >> list comprehension seems to be what you want:
    >>
    >> l = [i for i in l if i != 'a']

    >
    > My problem comes from the context of using os.walk(). Please see the
    > description of the following webpage. Somehow I have to modify the
    > list inplace. I have already tried 'dirs = [i for i in l if dirs !=
    > 'a']'. But it seems that it doesn't "prune the search". So I need the
    > inplace modification of list.
    >

    [snip]
    You can replace the contents of a list like this:

    l[:] = [i for i in l if i != 'a']
    MRAB, Nov 6, 2009
    #8
  9. Dave Angel

    Dave Angel Guest

    Re: What is the best way to delete strings in a string list that match a certain pattern?

    Peng Yu wrote:
    > On Fri, Nov 6, 2009 at 10:42 AM, Robert P. J. Day <> wrote:
    >
    >> On Fri, 6 Nov 2009, Peng Yu wrote:
    >>
    >>
    >>> On Fri, Nov 6, 2009 at 3:05 AM, Diez B. Roggisch <> wrote:
    >>>
    >>>> Peng Yu schrieb:
    >>>>
    >>>>> Suppose I have a list of strings, A. I want to compute the list (call
    >>>>> it B) of strings that are elements of A but doesn't match a regex. I
    >>>>> could use a for loop to do so. In a functional language, there is way
    >>>>> to do so without using the for loop.
    >>>>>
    >>>> Nonsense. For processing over each element, you have to loop over them,
    >>>> either with or without growing a call-stack at the same time.
    >>>>
    >>>> FP languages can optimize away the stack-frame-growth (tail recursion) - but
    >>>> this isn't reducing complexity in any way.
    >>>>
    >>>> So use a loop, either directly, or using a list-comprehension.
    >>>>
    >>> What is a list-comprehension?
    >>>
    >>> I tried the following code. The list 'l' will be ['a','b','c'] rather
    >>> than ['b','c'], which is what I want. It seems 'remove' will disrupt
    >>> the iterator, right? I am wondering how to make the code correct.
    >>>
    >>> l = ['a', 'a', 'b', 'c']
    >>> for x in l:
    >>>     if x == 'a':
    >>>         l.remove(x)
    >>>
    >>> print l
    >>>

    >> list comprehension seems to be what you want:
    >>
    >> l = [i for i in l if i != 'a']
    >>

    >
    > My problem comes from the context of using os.walk(). Please see the
    > description of the following webpage. Somehow I have to modify the
    > list inplace. I have already tried 'dirs = [i for i in l if dirs !=
    > 'a']'. But it seems that it doesn't "prune the search". So I need the
    > inplace modification of list.
    >
    > http://docs.python.org/library/os.html
    >
    > When topdown is True, the caller can modify the dirnames list in-place
    > (perhaps using del or slice assignment), and walk() will only recurse
    > into the subdirectories whose names remain in dirnames; this can be
    > used to prune the search, impose a specific order of visiting, or even
    > to inform walk() about directories the caller creates or renames
    > before it resumes walk() again. Modifying dirnames when topdown is
    > False is ineffective, because in bottom-up mode the directories in
    > dirnames are generated before dirpath itself is generated.
    >
    >

    The context is quite important in this case. The os.walk() iterator
    gives you a tuple of three values, and one of them is a list. You do
    indeed want to modify that list, but you usually don't want to do it
    "in-place." I'll show you the in-place version first, then show you
    the slice approach.

    If all you wanted to do was to remove one or two specific items from the
    list, then the remove method would be good. So in your example, you
    don't need a loop. Just say:

        if 'a' in dirs:
            dirs.remove('a')

    But if you have an expression you want to match each dir against, the
    list comprehension is the best answer. And the trick to stuffing that
    new list into the original list object is to use slicing on the left
    side. The [:] notation is a default slice that means the whole list.

    dirs[:] = [ item for item in dirs if bool_expression_on_item ]
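
    Why the slice matters can be seen with a second reference to the same
    list. In this sketch the name alias is hypothetical and stands in for the
    reference that os.walk() keeps to the list it handed out.

```python
dirs = ["a", "b", "c"]
alias = dirs                              # second reference to the same list object

dirs[:] = [d for d in dirs if d != "a"]   # mutates the existing list object
same_object = alias is dirs
alias_after_slice = list(alias)

dirs = [d for d in dirs if d != "b"]      # rebinds the name to a brand-new list
rebound = alias is not dirs
alias_after_rebind = list(alias)

print(alias_after_slice, alias_after_rebind, dirs)
```

    After the slice assignment both names see the filtered contents; after
    the plain assignment the alias still holds the old list and never learns
    about the change.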


    HTH
    DaveA
    Dave Angel, Nov 6, 2009
    #9
  10. Re: What is the best way to delete strings in a string list that match a certain pattern?

    On Fri, 06 Nov 2009 10:16:58 -0600, Peng Yu wrote:

    > What is a list-comprehension?


    Time for you to Read The Fine Manual.

    http://docs.python.org/tutorial/index.html


    > I tried the following code. The list 'l' will be ['a','b','c'] rather
    > than ['b','c'], which is what I want. It seems 'remove' will disrupt the
    > iterator, right? I am wondering how to make the code correct.
    >
    > l = ['a', 'a', 'b', 'c']
    > for x in l:
    >     if x == 'a':
    >         l.remove(x)



    Oh lordy, it's Shlemiel the Painter's algorithm. Please don't do that for
    lists with more than a handful of items. Better still, please don't do
    that.

    http://www.joelonsoftware.com/articles/fog0000000319.html
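
    The cost difference can be sketched as follows. Each list.remove() call
    searches from the front of the list and then shifts the tail, so calling
    it once per unwanted item adds up to roughly quadratic work, while a list
    comprehension makes a single pass. The function names and test data are
    hypothetical, and the printed timings will vary by machine.

```python
import timeit

def remove_with_loop(items, value):
    """Repeated list.remove(): every call rescans from the front and shifts the tail."""
    items = list(items)          # work on a copy so the caller's list is untouched
    while value in items:
        items.remove(value)
    return items

def remove_with_comprehension(items, value):
    """One pass that builds the filtered list directly."""
    return [x for x in items if x != value]

data = ['b', 'a'] * 2000         # hypothetical test data
slow = remove_with_loop(data, 'a')
fast = remove_with_comprehension(data, 'a')

# Timings are machine-dependent; the gap widens as the list grows.
print(timeit.timeit(lambda: remove_with_loop(data, 'a'), number=3))
print(timeit.timeit(lambda: remove_with_comprehension(data, 'a'), number=3))
```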



    --
    Steven
    Steven D'Aprano, Nov 7, 2009
    #10
  11. Peng Yu

    Peng Yu Guest

    On Sat, Nov 7, 2009 at 8:54 AM, Steven D'Aprano
    <> wrote:
    > On Fri, 06 Nov 2009 10:16:58 -0600, Peng Yu wrote:
    >
    >> What is a list-comprehension?

    >
    > Time for you to Read The Fine Manual.
    >
    > http://docs.python.org/tutorial/index.html
    >
    >
    >> I tried the following code. The list 'l' will be ['a','b','c'] rather
    >> than ['b','c'], which is what I want. It seems 'remove' will disrupt the
    >> iterator, right? I am wondering how to make the code correct.
    >>
    >> l = ['a', 'a', 'b', 'c']
    >> for x in l:
    >>   if x == 'a':
    >>     l.remove(x)

    >
    >
    > Oh lordy, it's Shlemiel the Painter's algorithm. Please don't do that for
    > lists with more than a handful of items. Better still, please don't do
    > that.
    >
    > http://www.joelonsoftware.com/articles/fog0000000319.html


    I understand what Shlemiel the Painter's algorithm is. But if the
    iterator could be intelligently adjusted in my code upon 'remove()',
    would my code still be Shlemiel the Painter's algorithm?
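
    For what it's worth, adjusting the iterator would not change the cost of
    remove() itself, which still searches from the front and shifts the tail
    on every call. A one-pass, in-place alternative is sketched below; the
    helper name remove_in_place is hypothetical, and the approach is a
    swapped-in compaction technique, not something proposed in the thread.

```python
def remove_in_place(items, value):
    """Delete every occurrence of value from items in one pass, without a second list."""
    write = 0
    for x in items:              # the read position always stays at or ahead of write
        if x != value:
            items[write] = x     # compact the kept items toward the front
            write += 1
    del items[write:]            # drop the leftover tail in a single operation

l = ['a', 'a', 'b', 'c']
remove_in_place(l, 'a')
print(l)
```

    The list length does not change while the loop runs, so the iterator is
    never disturbed; the single del at the end does all the shrinking.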
    Peng Yu, Nov 7, 2009
    #11
  12. Peng Yu

    Peng Yu Guest

    On Fri, Nov 6, 2009 at 5:57 PM, Dave Angel <> wrote:
    >
    >
    > Peng Yu wrote:
    >>
    >> On Fri, Nov 6, 2009 at 10:42 AM, Robert P. J. Day <>
    >> wrote:
    >>
    >>>
    >>> On Fri, 6 Nov 2009, Peng Yu wrote:
    >>>
    >>>
    >>>>
    >>>> On Fri, Nov 6, 2009 at 3:05 AM, Diez B. Roggisch <>
    >>>> wrote:
    >>>>
    >>>>>
    >>>>> Peng Yu schrieb:
    >>>>>
    >>>>>>
    >>>>>> Suppose I have a list of strings, A. I want to compute the list (call
    >>>>>> it B) of strings that are elements of A but doesn't match a regex. I
    >>>>>> could use a for loop to do so. In a functional language, there is way
    >>>>>> to do so without using the for loop.
    >>>>>>
    >>>>>
    >>>>> Nonsense. For processing over each element, you have to loop over them,
    >>>>> either with or without growing a call-stack at the same time.
    >>>>>
    >>>>> FP languages can optimize away the stack-frame-growth (tail recursion)
    >>>>> - but
    >>>>> this isn't reducing complexity in any way.
    >>>>>
    >>>>> So use a loop, either directly, or using a list-comprehension.
    >>>>>
    >>>>
    >>>> What is a list-comprehension?
    >>>>
    >>>> I tried the following code. The list 'l' will be ['a','b','c'] rather
    >>>> than ['b','c'], which is what I want. It seems 'remove' will disrupt
    >>>> the iterator, right? I am wondering how to make the code correct.
    >>>>
    >>>> l = ['a', 'a', 'b', 'c']
    >>>> for x in l:
    >>>>  if x == 'a':
    >>>>    l.remove(x)
    >>>>
    >>>> print l
    >>>>
    >>>
    >>>  list comprehension seems to be what you want:
    >>>
    >>>  l = [i for i in l if i != 'a']
    >>>

    >>
    >> My problem comes from the context of using os.walk(). Please see the
    >> description of the following webpage. Somehow I have to modify the
    >> list inplace. I have already tried 'dirs = [i for i in l if dirs != 'a']'. But
    >> it seems that it doesn't "prune the search". So I need the
    >> inplace modification of list.
    >>
    >> http://docs.python.org/library/os.html
    >>
    >> When topdown is True, the caller can modify the dirnames list in-place
    >> (perhaps using del or slice assignment), and walk() will only recurse
    >> into the subdirectories whose names remain in dirnames; this can be
    >> used to prune the search, impose a specific order of visiting, or even
    >> to inform walk() about directories the caller creates or renames
    >> before it resumes walk() again. Modifying dirnames when topdown is
    >> False is ineffective, because in bottom-up mode the directories in
    >> dirnames are generated before dirpath itself is generated.
    >>
    >>

    >
    > The context is quite important in this case.  The os.walk() iterator gives
    > you a tuple of three values, and one of them is a list.  You do indeed want
    > to modify that list, but you usually don't want to do it "in-place."   I'll
    > show you the in-place version first, then show you the slice approach.
    >
    > If all you wanted to do was to remove one or two specific items from the
    > list, then the remove method would be good.  So in your example, you don't
    > need a loop.  Just say:
    >   if 'a' in dirs:
    >        dirs.remove('a')
    >
    > But if you have an expression you want to match each dir against, the list
    > comprehension is the best answer.  And the trick to stuffing that new list
    > into the original list object is to use slicing on the left side.  The [:]
    > notation is a default slice that means the whole list.
    >
    >   dirs[:] = [ item for item in dirs if     bool_expression_on_item ]


    I suggest adding this example to the os.walk() documentation to make
    other users' lives easier.
    Peng Yu, Nov 7, 2009
    #12
  13. Re: What is the best way to delete strings in a string list that match a certain pattern?

    On Sat, 7 Nov 2009, Peng Yu wrote:

    > On Fri, Nov 6, 2009 at 5:57 PM, Dave Angel <> wrote:


    > > But if you have an expression you want to match each dir against,
    > > the list comprehension is the best answer.  And the trick to
    > > stuffing that new list into the original list object is to use
    > > slicing on the left side.  The [:] notation is a default slice
    > > that means the whole list.
    > >
    > >   dirs[:] = [ item for item in dirs if     bool_expression_on_item ]

    >
    > I suggest to add this example to the document of os.walk() to make
    > other users' life easier.


    huh? why do you need the slice notation on the left? why can't you
    just assign to "dirs" as opposed to "dirs[:]"? using the former seems
    to work just fine. is this some kind of python optimization or idiom?

    rday
    Robert P. J. Day, Nov 7, 2009
    #13
  14. Peter Otten

    Peter Otten Guest

    Re: What is the best way to delete strings in a string list that match a certain pattern?

    Robert P. J. Day wrote:

    > On Sat, 7 Nov 2009, Peng Yu wrote:
    >
    >> On Fri, Nov 6, 2009 at 5:57 PM, Dave Angel <> wrote:

    >
    >> > But if you have an expression you want to match each dir against,
    >> > the list comprehension is the best answer. And the trick to
    >> > stuffing that new list into the original list object is to use
    >> > slicing on the left side. The [:] notation is a default slice
    >> > that means the whole list.
    >> >
    >> > dirs[:] = [ item for item in dirs if bool_expression_on_item ]

    >>
    >> I suggest to add this example to the document of os.walk() to make
    >> other users' life easier.

    >
    > huh? why do you need the slice notation on the left? why can't you
    > just assign to "dirs" as opposed to "dirs[:]"? using the former seems
    > to work just fine. is this some kind of python optimization or idiom?


    dirs = [...]

    rebinds the name "dirs" while

    dirs[:] = [...]

    updates the contents of the list currently bound to the "dirs" name. The
    latter is necessary in the context of os.walk() because it yields a list of
    subdirectories, gives the user a chance to update it, and then uses this
    potentially updated list to decide which subdirectories to descend into.
    A simplified example:

    >>> def f():
    ...     items = ["a", "b", "c"]
    ...     yield items
    ...     print(items)
    ...
    >>> for items in f():
    ...     items = ["x", "y"]
    ...
    ['a', 'b', 'c']
    >>> for items in f():
    ...     items[:] = ["x", "y"]
    ...
    ['x', 'y']

    Peter
    Peter Otten, Nov 7, 2009
    #14
