Lining Up and Padding Two Similar Lists

W. eWatson

Maybe there's some function like zip or map that does this. If not, it's
probably fairly easy to do with push and pop. I'm just checking to see if
there's not some known simple single function that does what I want. Here's
what I'm trying to do.

I have a list dat like (assume the items are strings even though I'm
omitting quotes):
[a.dat, c.dat, g.dat, k.dat, p.dat]

I have another list called txt that looks like:
[a.txt, b.txt, g.txt, k.txt, r.txt, w.txt]

What I need is to pair up items with the same prefix and use "None", or some
marker, to indicate the absence of the opposite item. That is, in non-list
form, I want:
a.dat a.txt
None b.txt
c.dat None
g.dat g.txt
k.dat k.txt
p.dat None
None r.txt
None w.txt

Ultimately, what I'm doing is to find the missing member of pairs.
--
Wayne Watson (Watson Adventures, Prop., Nevada City, CA)

(121.015 Deg. W, 39.262 Deg. N) GMT-8 hr std. time)
Obz Site: 39° 15' 7" N, 121° 2' 32" W, 2700 feet

Web Page: <www.speckledwithstars.net/>
 
castironpi

This gets you your list. What do you mean by 'missing member of
pairs'? If you mean, 'set of elements that appear in both' or 'set
that appears in one but not both', you can short circuit it at line
14.

-warning, spoiler-

dat= ['a.dat', 'c.dat', 'g.dat', 'k.dat', 'p.dat']
dat.sort()
txt= ['a.txt', 'b.txt', 'g.txt', 'k.txt', 'r.txt', 'w.txt']
txt.sort()
import os.path
datD= {}
for d in dat:
    r,_= os.path.splitext( d )
    datD[ r ]= d
txtD= {}
for d in txt:
    r,_= os.path.splitext( d )
    txtD[ r ]= d
both= sorted( list( set( datD.keys() )| set( txtD.keys() ) ) )

print datD
print txtD
print both

for i, x in enumerate( both ):
    both[ i ]= datD.get( x, None ), txtD.get( x, None )

print both

OUTPUT:

{'a': 'a.dat', 'p': 'p.dat', 'c': 'c.dat', 'k': 'k.dat', 'g': 'g.dat'}
{'a': 'a.txt', 'b': 'b.txt', 'g': 'g.txt', 'k': 'k.txt', 'r': 'r.txt',
'w': 'w.txt'}
['a', 'b', 'c', 'g', 'k', 'p', 'r', 'w']
[('a.dat', 'a.txt'), (None, 'b.txt'), ('c.dat', None), ('g.dat',
'g.txt'), ('k.dat', 'k.txt'), ('p.dat', None), (None, 'r.txt'),
(None, 'w.txt')]
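
For readers on Python 3, the same dictionary idea might be sketched as
below. This is only an illustration, not castironpi's code: the function
name pair_by_prefix is made up here, and it assumes each prefix occurs at
most once per list.

```python
import os.path

def pair_by_prefix(dat, txt):
    """Pair file names that share a prefix; None marks a missing partner."""
    # Map each prefix (the name without its extension) to the full name.
    dat_by_prefix = {os.path.splitext(name)[0]: name for name in dat}
    txt_by_prefix = {os.path.splitext(name)[0]: name for name in txt}
    # The union of the key sets covers every prefix seen in either list.
    prefixes = sorted(set(dat_by_prefix) | set(txt_by_prefix))
    # dict.get returns None when a prefix is absent from that dictionary.
    return [(dat_by_prefix.get(p), txt_by_prefix.get(p)) for p in prefixes]

dat = ['a.dat', 'c.dat', 'g.dat', 'k.dat', 'p.dat']
txt = ['a.txt', 'b.txt', 'g.txt', 'k.txt', 'r.txt', 'w.txt']
pairs = pair_by_prefix(dat, txt)
for d, t in pairs:
    print(d, t)
```

Sorting the union of prefixes is what fixes the output order, so the input
lists do not need to be sorted first.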
 
W. eWatson

castironpi said:
This gets you your list. What do you mean by 'missing member of
pairs'?

(a.dat, a.txt) is a pair. (None, a.txt) has a.dat missing. I just need to
issue a msg to the user that one member of a file pair is missing. Both
files need to be present to make sense of the data.

castironpi said:
-warning, spoiler-

It looks like you went beyond the call of duty, but that's fine. It looks
like I have a few new features to learn about in Python. In particular,
dictionaries. Thanks.

Actually, the file names are probably in order as I pick them up in XP. I
would think if someone had sorted the folder, that as one reads the folder
they are in alpha order, low to high.
 
castironpi

W. eWatson said:
Actually, the file names are probably in order as I pick them up in XP. I
would think if someone had sorted the folder, that as one reads the folder
they are in alpha order, low to high.

I don't think that's guaranteed by anything. I realized that
'dat.sort()' and 'txt.sort()' weren't necessary, since their contents
are moved to a dictionary, which isn't sorted.

both= set( datD.keys() )& set( txtD.keys() )

This will get you the keys (prefixes) that are in both. Then, for
every prefix that is not in 'both', you can report it.

Lastly, since you suggest you're guaranteed that 'txt' will all share
the same extension, you can do away with the dictionary and use sets
entirely. Only if you can depend on that assumption.

I took a look at this. It's probably more what you had in mind, and
the dictionaries are overkill.

import os.path
dat= ['a.dat', 'c.dat', 'g.dat', 'k.dat', 'p.dat']
datset= set( [ os.path.splitext( x )[ 0 ] for x in dat ] )
print datset
txt= ['a.txt', 'b.txt', 'g.txt', 'k.txt', 'r.txt', 'w.txt']
txtset= set( [ os.path.splitext( x )[ 0 ] for x in txt ] )
print txtset
both= txtset & datset
for d in datset- both:
    print '%s.dat not matched'% d
for t in txtset- both:
    print '%s.txt not matched'% t

OUTPUT:

set(['a', 'p', 'c', 'k', 'g'])
set(['a', 'b', 'g', 'k', 'r', 'w'])
p.dat not matched
c.dat not matched
r.txt not matched
b.txt not matched
w.txt not matched
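
If the goal is to name the file that is absent rather than the one left
unmatched, the set version could be turned around as in this Python 3
sketch. The function name missing_partners is hypothetical, and it assumes
the only extensions in play are '.dat' and '.txt'.

```python
import os.path

def missing_partners(dat, txt):
    """Return the sorted names of the files needed to complete every pair."""
    dat_prefixes = {os.path.splitext(name)[0] for name in dat}
    txt_prefixes = {os.path.splitext(name)[0] for name in txt}
    # A prefix found only among the dat files lacks its txt partner ...
    missing = [p + '.txt' for p in dat_prefixes - txt_prefixes]
    # ... and a prefix found only among the txt files lacks its dat partner.
    missing += [p + '.dat' for p in txt_prefixes - dat_prefixes]
    return sorted(missing)

dat = ['a.dat', 'c.dat', 'g.dat', 'k.dat', 'p.dat']
txt = ['a.txt', 'b.txt', 'g.txt', 'k.txt', 'r.txt', 'w.txt']
missing = missing_partners(dat, txt)
for name in missing:
    print('%s is missing' % name)
```

Sets iterate in no particular order, which is why the result is sorted
before it is reported.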
 
Paul Rubin

W. eWatson said:
[a.dat, c.dat, g.dat, k.dat, p.dat]
[a.txt, b.txt, g.txt, k.txt, r.txt, w.txt]

What I need is to pair up items with the same prefix and use "None",
or some marker, to indicate the absence of the opposite item.

This is functionally influenced but should be straightforward:

dat = ['a.dat', 'c.dat', 'g.dat', 'k.dat', 'p.dat']
txt = ['a.txt', 'b.txt', 'g.txt', 'k.txt', 'r.txt', 'w.txt']

# just get the portion of the filename before the first period
def prefix(filename):
    return filename[:filename.find('.')]

# make a dictionary mapping prefixes to filenames
def make_dict(plist):
    return dict((prefix(a),a) for a in plist)

pdat = make_dict(dat)
ptxt = make_dict(txt)

# get a list of all the prefixes, use "set" to remove
# duplicates, then sort the result and look up each prefix.
for p in sorted(set(pdat.keys() + ptxt.keys())):
    print pdat.get(p), ptxt.get(p)
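
One detail worth knowing about the prefix helper: str.find returns -1 when
the name has no period, so filename[:filename.find('.')] would drop the
last character of such a name. Cutting at the first period also differs
from os.path.splitext, which cuts at the last one. The Python 3 sketch
below compares two alternatives; both helper names and the sample file
names are made up for illustration.

```python
import os.path

def prefix_first_dot(filename):
    # Text before the first period; the whole name if there is no period.
    return filename.split('.', 1)[0]

def prefix_last_dot(filename):
    # Text before the final extension, as os.path.splitext defines it.
    return os.path.splitext(filename)[0]

for name in ['a.dat', 'run.2008.dat', 'README']:
    print(name, prefix_first_dot(name), prefix_last_dot(name))
```

Which rule is right depends on how the files are named. For names with a
single period, such as the ones in this thread, both give the same prefix.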
 
Boris Borcic

D,T=[dict((x.split('.')[0],x) for x in X) for X in (dat,txt)]
for k in sorted(set(D).union(T)) :
    for S in D,T :
        print '%-8s' % S.get(k,'None'),
    print

HTH
 
W. eWatson

castironpi wrote:
I don't think that's guaranteed by anything. I realized that
'dat.sort()' and 'txt.sort()' weren't necessary, since their contents
are moved to a dictionary, which isn't sorted.

Actually, I'm getting the file names from listdir, and they appear to be
sorted low to high. I tried it on a folder with lots of dissimilar files.

castironpi wrote:
Lastly, since you suggest you're guaranteed that 'txt' will all share
the same extension, you can do away with the dictionary and use sets
entirely. Only if you can depend on that assumption.

Each dat file contains an image, and its description and related parameters
are in the corresponding txt file.
 
George Sakkis

It looks like I have a few new features to learn about in Python. In particular,
dictionaries.

In Python it's hard to think of many non-trivial problems for which you
*don't* have to know about dictionaries.

George
 
Marc 'BlackJack' Rintsch

Actually, I'm getting the file names from listdir, and they appear to be
sorted low to high. I tried it on a folder with lots of dissimilar
files.

But that's not guaranteed. It depends on the operating system and file
system driver if the names are sorted or not.

In [14]: os.listdir?
Type: builtin_function_or_method
Base Class: <type 'builtin_function_or_method'>
String Form: <built-in function listdir>
Namespace: Interactive
Docstring:
listdir(path) -> list_of_strings

Return a list containing the names of the entries in the directory.

path: path of directory to list

The list is in arbitrary order. It does not include the special
entries '.' and '..' even if they are present in the directory.
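
Since the order is arbitrary, one way to avoid relying on it is to sort
the names explicitly before splitting them by extension. The Python 3
sketch below creates a throwaway folder with made-up file names purely so
it can run on its own; in real use the folder would be the one holding the
dat and txt files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create a few empty sample files, deliberately out of order.
    for name in ['k.dat', 'a.txt', 'g.dat', 'a.dat']:
        open(os.path.join(folder, name), 'w').close()

    # os.listdir makes no promise about order, so sort the names here.
    names = sorted(os.listdir(folder))
    dat = [n for n in names if n.endswith('.dat')]
    txt = [n for n in names if n.endswith('.txt')]

print(dat)
print(txt)
```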

Ciao,
Marc 'BlackJack' Rintsch
 
