Remove items from a list


Stan Cook

I was trying to take a list of files in a directory and remove all but the ".dbf" files. I used the following to try to remove the items, but they would not remove. Any help would be greatly appreciated.

x = 0
for each in _dbases:
    if each[-4:] <> ".dbf":
        del each # also tried: del _dbases[x]
    x = x + 1

I must be doing something wrong, but it acts as though it is....

signed
.. . . . . at the end of my rope....!
 

Dan Perl

What is the content of _dbases? How do you create that list? If I
understand correctly, it is a list of file names that you may have gotten
with os.listdir( ).

And I want to make sure I understand the problem. Are you trying to remove
the names from the list or are you trying to remove the files themselves?
Just making sure that it's not the latter...

Can you put more in your example, something that I may be able to run and
see the results?

Dan


Stan Cook

Yes, I used the listdir. The list is a list of files in the
directory. I want to filter everything out but the ".dbf"
files.



Paul McGuire

Stan Cook said:
Yes, I used the listdir. The list is a list of files in the
directory. I want to filter everything out but the ".dbf"
files.

You said the answer yourself - "I want to _filter_ everything out but the
".dbf" files."

Use the filter built-in, and use str's endswith() method in place of [-4:]
slicing.

dirlist = [ "a.txt", "b.txt", "c.dbf", "d.txt", "e.dbf" ]
isdbf = lambda x : x.endswith(".dbf")
print filter( isdbf, dirlist )

gives:

['c.dbf', 'e.dbf']
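
For readers on Python 3, here is a minimal sketch of the same idea. It assumes Python 3 semantics, where filter() returns a lazy iterator (hence the list() wrapper) and print is a function. The sample names are the ones from the example above; `is_dbf` is only an illustrative helper name.

```python
# Python 3 sketch of the filter() approach (sample data from the post above).
dirlist = ["a.txt", "b.txt", "c.dbf", "d.txt", "e.dbf"]

def is_dbf(name):
    # endswith() states the intent directly and does not depend on
    # the suffix being exactly four characters long.
    return name.endswith(".dbf")

# In Python 3, filter() returns an iterator, so materialize it as a list.
dbf_files = list(filter(is_dbf, dirlist))
print(dbf_files)  # ['c.dbf', 'e.dbf']
```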


-- Paul
 

Leif K-Brooks

Stan said:
I was trying to take a list of files in a directory and remove all but the ".dbf" files.

Assuming the variable "files" is the list of files:

files = [fname for fname in files if fname.endswith('.dbf')]
 

William Park


Off topic... but if OP is interested in shell solution comparable to
above, then there are three I can offer:

dirlist=( a.txt b.txt c.dbf d.txt e.dbf )

1. echo ${dirlist[*]|/*.dbf}

2. func () {
       [[ $1 == *.dbf ]]
   }
   arrayfilter func dirlist

3. for i in "${dirlist[@]}"; do
       [[ $i == *.dbf ]] && echo "$i"
   done

The first is shell version of Python's list comprehension, the second is
shell version of Python's filter(), and the third is standard loop
solution.

Ref:
http://freshmeat.net/projects/bashdiff/
help '${var|'
help arrayfilter
 

Tor Iver Wilhelmsen

Stan Cook said:
for each in _dbases:
if each[-4:] <> ".dbf":

List comprehension to the rescue!

_dbases = [each for each in _dbases if each[-4:] == ".dbf"]
 

Egbert Bouwman

The answers you received don't tell you what you are doing wrong.
If you replace 'del each' with 'print each' it works,
so it seems that you cannot delete elements of a list you are
looping over. But I would like to know more about it as well.

egbert
 

Mel Wilson


One use of `del` is to remove a name from a namespace,
and that's what it's doing here: removing the name 'each'.

A paraphrase of what's going on is:

    for i in xrange (len (_dbases)):
        each = _dbases[i]
        if each[-4:] <> ".dbf":
            del each

and we happily throw away the name 'each' without touching
the item in the list.

The way to remove items from a list is (untested code):

    for i in xrange (len (a_list)-1, -1, -1):
        if i_want_to_remove (a_list[i]):
            del a_list[i]

Going through the list backwards means that deleting an item
doesn't change the index numbers of items we've yet to
process. `del a_list[i]` removes from the list the
reference to the object that was the i'th item in the list
(under the hood, a Python list is implemented as an array of
references).
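
To make the difference concrete, here is a small sketch in Python 3 syntax (so `range` and `!=`-style tests instead of `xrange` and `<>`). The list contents are made up for illustration and are not from the original poster's directory.

```python
# Python 3 sketch; the list contents are made up for illustration.
files = ["a.txt", "b.txt", "c.dbf", "d.txt", "e.dbf"]

# 1) `del each` only unbinds the loop variable; the list is untouched.
for each in files:
    if not each.endswith(".dbf"):
        del each
unchanged = list(files)
print(unchanged)  # still all five names

# 2) Deleting by index while walking backwards removes the items,
#    because the indexes still to be visited are never shifted.
for i in range(len(files) - 1, -1, -1):
    if not files[i].endswith(".dbf"):
        del files[i]
print(files)  # ['c.dbf', 'e.dbf']
```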

This is one reason list comprehensions became popular so
fast.

Regards. Mel.
 

John Lenton

I was trying to take a list of files in a directory and remove all
but the ".dbf" files. I used the following to try to remove the
items, but they would not remove. Any help would be greatly
appreciated.

any reason you aren't doing glob.glob('*.dbf')?


Dan Perl

But Stan says he tried something like that (see the comment in his code) and
it was still not working. I would still need a more complete code example
to reproduce the problem and figure out what went wrong.

Dan


Dennis Lee Bieber

Dan Perl said:
But Stan says he tried something like that (see the comment in his code) and
it was still not working. I would still need a more complete code example
to reproduce the problem and figure out what went wrong.

I seem to recall that he was still processing front to back,
which meant that with each delete, the remaining entries shifted
position.

t = ["a.txt", "b.txt", "c.dbf", "d.txt", "e.dbf"]

After deleting "a.txt" (t[0]) the loop examines t[1], but that is now
"c.dbf" as everything shifted left, and "b.txt" is now in t[0].

And I'm not sure what Python does at the end of the list: is the
termination dynamic (i.e., it stops whenever the shortened list ends) or
is it based on the original length...


Peter Otten

Dan said:
But Stan says he tried something like that (see the comment in his code) and
it was still not working. I would still need a more complete code example
to reproduce the problem and figure out what went wrong.

The following example might make it clearer:

...     print "checking", fn
...     if fn.endswith(".dbf"):
...         print "deleting", files[index]
...         del files[index]
...
checking a.dbf
deleting a.dbf
checking x.txt

The iterator operating on the files list keeps track of its current position
in the list by a simple index and is unaware of any changes to that list.
If you delete an item _before_ or equal to that index position it will
still be incremented on the next pass of the for loop, and therefore you
never see item[n] that has become item[n-1] effectively by deleting one of
its predecessors.
To avoid this kind of trouble, Mel iterates over the list in reverse order -
deleting items _after_ the current position cannot confuse the iteration.
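
A self-contained way to watch the skipping happen, as a sketch in Python 3 syntax. The three-element list is an assumption chosen to be consistent with the output shown above, and `checked` is a hypothetical helper list that records which names the loop actually examined.

```python
# Python 3 sketch. The list is an assumption chosen to match the
# output shown above.
files = ["a.dbf", "b.dbf", "x.txt"]
checked = []

for index, fn in enumerate(files):
    checked.append(fn)
    if fn.endswith(".dbf"):
        # Everything after `index` shifts left by one, but the loop's
        # internal position still advances, so the next item is skipped.
        del files[index]

print(checked)  # ['a.dbf', 'x.txt']  ("b.dbf" was never examined)
print(files)    # ['b.dbf', 'x.txt']
```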

Peter
 

Alex Martelli

Dan Perl said:
But Stan says he tried something like that (see the comment in his code) and
it was still not working. I would still need a more complete code example
to reproduce the problem and figure out what went wrong.

Stan was not looping backwards through the list, which as Mel indicated
is a crucial part of making this clunky idiom "work" (sorta...):
Mel Wilson said:
The way to remove items from a list is (untested code):

    for i in xrange (len (a_list)-1, -1, -1):
        if i_want_to_remove (a_list[i]):
            del a_list[i]

Going through the list backwards means that deleting an item
doesn't change the index numbers of items we've yet to
process.

If you can make this work, you'll end up with *horrible* performance;
assuming that on average you're removing a number of items proportional
to len(a_list), this loop has O(N^2) performance.

That's because a Python list is not a linked-list, but rather a
compact-in-memory array... so, while on one hand indexing L[x] is O(1)
[NOT O(x) as it would be in a linked list], insertions and deletions
somewhere inside the list _are_ O(len(L)), since all items following the
insertion or deletion point must be shifted ('down' for a deletion, 'up'
for an insertion) to keep the array compact in memory.

Building a new list with a list comprehension (or with 'filter') and
possibly assigning it to the same name as the old list (or as the
contents of the old list, without name rebinding) OTOH is O(N), so it's
clearly the right way to go.
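
The argument above can be illustrated with a rough cost model. This is a sketch, not a benchmark and not a measurement of CPython internals: it simply counts how many items would have to move down one slot in a compact array for each in-place deletion. The function name and the "remove every even number" rule are made up for the illustration.

```python
# Rough cost model, not a benchmark: count how many items a compact
# array has to shift when deleting in place, going backwards.
def shifts_for_in_place_delete(n):
    items = list(range(n))
    shifts = 0
    for i in range(n - 1, -1, -1):
        if items[i] % 2 == 0:             # remove every even number
            shifts += len(items) - i - 1  # items after i move down one slot
            del items[i]
    return shifts

small = shifts_for_in_place_delete(1000)
large = shifts_for_in_place_delete(2000)
print(small, large)  # doubling n roughly quadruples the shifting work

# The O(N) alternative: build the filtered list in a single pass.
odds = [x for x in range(2000) if x % 2 != 0]
```

In this model the shift count grows with the square of the list length, while the list comprehension touches each item once.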


Alex
 

Alex Martelli

Dennis Lee Bieber said:
And I'm not sure what Python does on the end of the list; is the
termination dynamic (ie, it stops whenever the shortened list ends) or
is based on the original length...

The former: it's based on an IndexError being raised when the index
becomes too big for the list (so a nice MemoryError will result from

    for x in mylist: mylist.append(x)

but that's another issue ;-).


Alex
 

Dan Perl

Sorry, I missed that. And yes, that should be the problem. It's *A*
problem, for sure. Always a bad idea to modify the structure of a list
(deleting or inserting items) while iterating through it, but it's so easy
to forget that. Creating another list from the first one by filtering or
with a list comprehension should be the preferred solution, unless the
intention is to have this list used in more than one place and have the
changes reflected in all those places.

Dan


Peter Otten

Dan said:
Creating another list from the first one by filtering or
with a list comprehension should be the preferred solution, unless the
intention is to have this list used in more than one place and have the
changes reflected in all those places.

Even then there is a less painful solution using slices:
>>> a = b = [1,2,3]
>>> a[:] = [2*i for i in a if i != 2]
>>> a
[2, 6]
>>> b
[2, 6]
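
Applied to the file-name case, the difference between rebinding a name and assigning to a slice might look like the sketch below (Python 3 syntax). The names `alias`, `shared` and `other` are hypothetical and stand in for "the list used in more than one place".

```python
# Python 3 sketch; names and data are illustrative.
files = ["a.txt", "c.dbf", "e.dbf"]
alias = files                      # second reference to the same list

# Rebinding: `files` now names a new list, `alias` still sees the old one.
files = [f for f in files if f.endswith(".dbf")]
print(alias)   # ['a.txt', 'c.dbf', 'e.dbf']

# Slice assignment: the existing list object is modified in place.
shared = ["a.txt", "c.dbf", "e.dbf"]
other = shared
shared[:] = [f for f in shared if f.endswith(".dbf")]
print(other)   # ['c.dbf', 'e.dbf']
print(other is shared)  # True
```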

Peter
 

Peter Abel


When you iterate over a list with a for-loop as you do,
the name *each* is bound to each item of the list in turn; it is
not a copy, just another reference to the same object. `del each`
only removes that name and leaves the list untouched.
If you want to delete an item of a list you have to code:
    del dbases[i]
e.g.
Example 1:
    for i in range(len(dbases)):
        if dbases[i][-4:] <> ".dbf":
            del dbases[i]

But now you'll get some trouble: you're indexing dbases from 0 .. len(dbases)-1,
but len(dbases) will change every time you delete an item, so items get
skipped and the loop eventually raises an IndexError. So the better way
is to run over the list from the end.

Example 2:
    for i in range(len(dbases)-1,-1,-1):
        if dbases[i][-4:] <> ".dbf":
            del dbases[i]
Now you only delete items of dbases where the item indexed by i still exists.

As others already pointed out it would be better to write your condition
in a more general form:

Example 3:
    for i in range(len(dbases)-1,-1,-1):
        if not dbases[i].endswith(".dbf"):
            del dbases[i]

Or still better, generate a new list with a list comprehension
Example 4:
    dbf_names=[name for name in dbases if name.endswith('.dbf')]
or the filter function
Example 5:
    dbf_names=filter(lambda name:name.endswith('.dbf'),dbases)

But I think the best way is to have in dbases only the filenames which
end with '.dbf' from the beginning. You can get this with the glob module
instead of os.listdir():

Example 6:
    import glob
    dbases=glob.glob('/any/path/or/directory/*.dbf')
You'll get only filenames in dbases which end with '.dbf' or
an empty list if there are none.
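
A runnable sketch of the glob approach, in Python 3 syntax. It creates a throwaway directory with the sample file names used earlier in the thread, so the directory, the `folder` variable and the empty files are all assumptions made only to keep the example self-contained.

```python
import glob
import os
import tempfile

# Build a throwaway directory so the sketch is self-contained;
# the file names are the sample names used earlier in the thread.
with tempfile.TemporaryDirectory() as folder:
    for name in ["a.txt", "b.txt", "c.dbf", "d.txt", "e.dbf"]:
        open(os.path.join(folder, name), "w").close()

    # glob returns paths that include the directory part of the pattern.
    matches = sorted(glob.glob(os.path.join(folder, "*.dbf")))
    dbf_names = [os.path.basename(path) for path in matches]

print(dbf_names)  # ['c.dbf', 'e.dbf']
```

Note that glob makes no promise about ordering, which is why the result is sorted here.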

Regards
Peter
 

Elbert Lev

Stan Cook said:
I was trying to take a list of files in a directory and remove all but
the ".dbf" files. I used the following to try to remove the items, but
they would not remove. Any help would be greatly appreciated.

# Assume we want files in working directory
import glob
import os.path

#If you want to have the list of dbf files try this:
lst = glob.glob("*.dbf")
print "lst = glob.glob(\"*.dbf\")"
print lst

# If you want to have the list of files, which do not have dbf extension:
lst = [x for x in glob.glob("*.*") if os.path.splitext(x)[1] != ".dbf"]
print "lst = [x for x in glob.glob(\"*.*\") if os.path.splitext(x)[1] != \".dbf\"]"
print lst
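
A short sketch (Python 3 syntax) of what os.path.splitext() hands back, since the comparison only works against the second element of the pair, and that element keeps its leading dot. The `names` list reuses the sample file names from earlier in the thread rather than reading a real directory.

```python
import os.path

# splitext() returns a (root, extension) pair; the extension keeps its dot.
print(os.path.splitext("c.dbf"))  # ('c', '.dbf')

names = ["a.txt", "b.txt", "c.dbf", "d.txt", "e.dbf"]
not_dbf = [n for n in names if os.path.splitext(n)[1] != ".dbf"]
print(not_dbf)  # ['a.txt', 'b.txt', 'd.txt']
```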
 

Jeff Shannon

Paul McGuire said:
Use filter built-in, and use str's endswith() method in place of [-4:] list
slicing.

dirlist = [ "a.txt", "b.txt", "c.dbf", "d.txt", "e.dbf" ]
isdbf = lambda x : x.endswith(".dbf")
print filter( isdbf, dirlist )

Or, one could use a list comprehension and avoid the lambda:

isdbf = [ item for item in dirlist if item.endswith('.dbf') ]

Or better yet ;) one could trade listdir()/filtering for a single call
to glob:

import glob
isdbf = glob.glob('*.dbf')

Jeff Shannon
Technician/Programmer
Credit International
 
