What is the best way to delete strings in a string list that match a certain pattern?

Peng Yu · Nov 6, 2009

Suppose I have a list of strings, A. I want to compute the list (call
it B) of strings that are elements of A but don't match a regex. I
could use a for loop to do so. In a functional language, there is a way
to do so without using the for loop.

I'm wondering what is the best way to compute B in python.

Lie Ryan · Nov 6, 2009

Peng said:
Suppose I have a list of strings, A. I want to compute the list (call
it B) of strings that are elements of A but don't match a regex. I
could use a for loop to do so. In a functional language, there is a way
to do so without using the for loop.

In a functional language, there is no looping, so that argument is kind of
pointless. The looping construct in many functional languages is syntactic
sugar for recursion.

In Python, instead of an explicit loop, you can use either:
map(pattern.match, list_of_strs)
or
[pattern.match(mystr) for mystr in list_of_strs]

or, if you want to be wicked evil, you can write a recursive function such
as:

    def multimatcher(list_of_strs, index=0):
        # Recursive equivalent of the list comprehension above (not recommended).
        if index >= len(list_of_strs):
            return []
        return [pattern.match(list_of_strs[index])] + multimatcher(list_of_strs, index + 1)
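
Note that both one-liners above produce match objects (or None) for every string rather than a filtered list. For the original question (keeping only the strings that do not match), a filtering condition is needed. A minimal sketch in Python 3, assuming a hypothetical pattern and sample list that are not from the thread:

```python
import re

# Hypothetical pattern and data, chosen only for illustration.
pattern = re.compile(r"^tmp")
A = ["tmp1", "data", "tmp_old", "notes"]

# Keep the strings that do NOT match; pattern.match() anchors at the start of the string.
B = [s for s in A if not pattern.match(s)]

# Same result with filter() and a predicate (list() is needed on Python 3).
B_filter = list(filter(lambda s: not pattern.match(s), A))

print(B)  # ['data', 'notes']
```

If the pattern may occur anywhere in the string, pattern.search() would be the call to use instead of pattern.match().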

Diez B. Roggisch · Nov 6, 2009

Peng said:
Suppose I have a list of strings, A. I want to compute the list (call
it B) of strings that are elements of A but don't match a regex. I
could use a for loop to do so. In a functional language, there is a way
to do so without using the for loop.

Nonsense. For processing over each element, you have to loop over them,
either with or without growing a call-stack at the same time.

FP languages can optimize away the stack-frame-growth (tail recursion) -
but this isn't reducing complexity in any way.

So use a loop, either directly, or using a list-comprehension.

Diez

Peng Yu · Nov 6, 2009

Nonsense. For processing over each element, you have to loop over them,
either with or without growing a call-stack at the same time.

FP languages can optimize away the stack-frame-growth (tail recursion) - but
this isn't reducing complexity in any way.

So use a loop, either directly, or using a list-comprehension.

What is a list-comprehension?

I tried the following code. The list 'l' will be ['a','b','c'] rather
than ['b','c'], which is what I want. It seems 'remove' will disrupt
the iterator, right? I am wondering how to make the code correct.

l = ['a', 'a', 'b', 'c']
for x in l:
    if x == 'a':
        l.remove(x)

print(l)

Robert P. J. Day · Nov 6, 2009

list comprehension seems to be what you want:

l = [i for i in l if i != 'a']
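
As for why the original loop leaves one 'a' behind: the for loop tracks a position in the list, and each remove() shifts the remaining items left, so the item that slides into the current slot is never visited. A small sketch in Python 3 using the same list; the variable names are only illustrative:

```python
original = ['a', 'a', 'b', 'c']

# Removing while iterating over the same list skips the element
# that follows each removal.
broken = list(original)
for x in broken:
    if x == 'a':
        broken.remove(x)

# Iterating over a copy (a full slice) keeps the loop position stable.
fixed_loop = list(original)
for x in fixed_loop[:]:
    if x == 'a':
        fixed_loop.remove(x)

# The list comprehension builds the filtered list in a single pass.
filtered = [i for i in original if i != 'a']

print(broken)      # ['a', 'b', 'c']
print(fixed_loop)  # ['b', 'c']
print(filtered)    # ['b', 'c']
```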

rday
--

========================================================================
Robert P. J. Day Waterloo, Ontario, CANADA

Linux Consulting, Training and Kernel Pedantry.

Web page: http://crashcourse.ca
Twitter: http://twitter.com/rpjday
========================================================================

Peng Yu · Nov 6, 2009

My problem comes from the context of using os.walk(). Please see the
description on the following webpage. Somehow I have to modify the
list in place. I have already tried 'dirs = [i for i in dirs if i !=
'a']', but it doesn't seem to "prune the search". So I need an
in-place modification of the list.

http://docs.python.org/library/os.html

When topdown is True, the caller can modify the dirnames list in-place
(perhaps using del or slice assignment), and walk() will only recurse
into the subdirectories whose names remain in dirnames; this can be
used to prune the search, impose a specific order of visiting, or even
to inform walk() about directories the caller creates or renames
before it resumes walk() again. Modifying dirnames when topdown is
False is ineffective, because in bottom-up mode the directories in
dirnames are generated before dirpath itself is generated.

Peter Otten · Nov 6, 2009

Peng said:
My problem comes from the context of using os.walk(). Please see the
description on the following webpage. Somehow I have to modify the
list in place. I have already tried 'dirs = [i for i in dirs if i !=
'a']', but it doesn't seem to "prune the search". So I need an
in-place modification of the list.

Use

dirs[:] = [d for d in dirs if d != "a"]

or

try:
    dirs.remove("a")
except ValueError:
    pass
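
The difference between plain assignment and slice assignment can be checked with a second name for the same list. A minimal sketch in Python 3; the name alias is hypothetical and merely stands in for the reference that os.walk() keeps to the list it handed out:

```python
dirs = ["a", "b", "c"]
alias = dirs  # stands in for the reference that os.walk() keeps

# Rebinding: the name 'dirs' now points at a brand-new list.
dirs = [d for d in dirs if d != "a"]
after_rebinding = list(alias)

# Slice assignment: the existing list object gets new contents.
dirs = alias
dirs[:] = [d for d in dirs if d != "a"]
after_slice_assignment = list(alias)

print(after_rebinding)         # ['a', 'b', 'c']
print(after_slice_assignment)  # ['b', 'c']
print(dirs is alias)           # True
```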

MRAB · Nov 6, 2009

Peng said:
My problem comes from the context of using os.walk(). Please see the
description on the following webpage. Somehow I have to modify the
list in place. I have already tried 'dirs = [i for i in dirs if i !=
'a']', but it doesn't seem to "prune the search". So I need an
in-place modification of the list.

[snip]
You can replace the contents of a list like this:

l[:] = [i for i in l if i != 'a']

Dave Angel · Nov 6, 2009

Peng said:
My problem comes from the context of using os.walk(). Please see the
description on the following webpage. Somehow I have to modify the
list in place. I have already tried 'dirs = [i for i in dirs if i !=
'a']', but it doesn't seem to "prune the search". So I need an
in-place modification of the list.

The context is quite important in this case. The os.walk() iterator
gives you a tuple of three values, and two of them are lists; the one
that matters here is the list of subdirectory names. You do indeed want
to modify that list object itself (not rebind the name), but you usually
don't want to do it one element at a time. I'll show you the remove()
version first, then show you the slice approach.

If all you wanted to do was to remove one or two specific items from the
list, then the remove method would be good. So in your example, you
don't need a loop. Just say:
    if 'a' in dirs:
        dirs.remove('a')

But if you have an expression you want to match each dir against, the
list comprehension is the best answer. And the trick to stuffing that
new list into the original list object is to use slicing on the left
side. The [:] notation is a default slice that means the whole list.

dirs[:] = [ item for item in dirs if bool_expression_on_item ]
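
Putting this together with the regex from the original question, a rough Python 3 sketch of pruning an os.walk() traversal might look like the following. The pattern and the throwaway directory tree are assumptions made up for the illustration, not anything from the thread:

```python
import os
import re
import tempfile

# Hypothetical pattern: skip directories whose names start with "." or "build".
skip = re.compile(r"^(\.|build)")

with tempfile.TemporaryDirectory() as top:
    # Build a small throwaway tree to walk.
    for sub in (".git/objects", "build/tmp", "src/pkg"):
        os.makedirs(os.path.join(top, *sub.split("/")))

    visited = []
    for dirpath, dirs, files in os.walk(top):  # topdown=True is the default
        # Slice assignment edits the very list os.walk() consults next.
        dirs[:] = [d for d in dirs if not skip.match(d)]
        visited.append(os.path.relpath(dirpath, top).replace(os.sep, "/"))

print(sorted(visited))  # ['.', 'src', 'src/pkg']
```

Neither .git nor build is entered, so their subdirectories never show up in the traversal.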

HTH
DaveA

Steven D'Aprano · Nov 7, 2009

What is a list-comprehension?

Time for you to Read The Fine Manual.

http://docs.python.org/tutorial/index.html

I tried the following code. The list 'l' will be ['a','b','c'] rather
than ['b','c'], which is what I want. It seems 'remove' will disrupt the
iterator, right? I am wondering how to make the code correct.

l = ['a', 'a', 'b', 'c']
for x in l:
    if x == 'a':
        l.remove(x)

Oh lordy, it's Shlemiel the Painter's algorithm. Please don't do that for
lists with more than a handful of items. Better still, please don't do
that.

http://www.joelonsoftware.com/articles/fog0000000319.html

Peng Yu · Nov 7, 2009

I understand what Shlemiel the Painter's algorithm is. But if the
iterator could be intelligently adjusted in my code upon 'remove()',
would my code still be Shlemiel the Painter's algorithm?
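
On that question: even with the loop position adjusted correctly, each list.remove(x) call still searches from the front of the list and then shifts the later items down, so the total work can grow roughly quadratically when many items are removed. Building the result in a single pass avoids the rescanning. A rough Python 3 sketch, with hypothetical function names:

```python
def remove_repeatedly(items, unwanted):
    # Correct but Shlemiel-style: every remove() rescans from index 0
    # and shifts the tail of the list.
    result = list(items)
    while unwanted in result:
        result.remove(unwanted)
    return result


def filter_once(items, unwanted):
    # One pass over the data, no rescanning.
    return [x for x in items if x != unwanted]


data = ['a', 'a', 'b', 'c'] * 3
print(remove_repeatedly(data, 'a') == filter_once(data, 'a'))  # True
```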

Peng Yu · Nov 7, 2009

dirs[:] = [ item for item in dirs if bool_expression_on_item ]

I suggest adding this dirs[:] example to the documentation of os.walk()
to make other users' lives easier.

Robert P. J. Day · Nov 7, 2009

huh? why do you need the slice notation on the left? why can't you
just assign to "dirs" as opposed to "dirs[:]"? using the former seems
to work just fine. is this some kind of python optimization or idiom?

rday

Peter Otten · Nov 7, 2009

Robert said:
huh? why do you need the slice notation on the left? why can't you
just assign to "dirs" as opposed to "dirs[:]"? using the former seems
to work just fine. is this some kind of python optimization or idiom?

dirs = [...]

rebinds the name "dirs" while

dirs[:] = [...]

updates the contents of the list currently bound to the "dirs" name. The
latter is necessary in the context of os.walk() because it yields a list of
subdirectories, gives the user a chance to update it and then uses this
potentially updated list to decide which subdirectories to descend into.
A simplified example:
>>> def f():
...     items = ["a", "b", "c"]
...     yield items
...     print(items)
...
>>> for items in f():
...     items = ["x", "y"]
...
['a', 'b', 'c']
>>> for items in f():
...     items[:] = ["x", "y"]
...
['x', 'y']
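
The same experiment can be run as a Python 3 script that records what the generator sees once it resumes. This is only a sketch: fake_walk and seen_after_yield are hypothetical names, and the generator is a stand-in for os.walk(), not its actual implementation.

```python
seen_after_yield = []


def fake_walk():
    # Hypothetical stand-in for os.walk(): yields a list, then reads it back.
    items = ["a", "b", "c"]
    yield items
    # Whatever the caller did to the *same list object* is visible here.
    seen_after_yield.append(list(items))


for items in fake_walk():
    items = ["x", "y"]      # rebinds the loop variable only

for items in fake_walk():
    items[:] = ["x", "y"]   # mutates the list the generator still holds

print(seen_after_yield)  # [['a', 'b', 'c'], ['x', 'y']]
```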

Peter
