simple way to un-nest (flatten?) list

djc

There is I am sure an easy way to do this, but I seem to be brain dead
tonight. So:

I have a table such that I can do

[line for line in table if line[7]=='JDOC']
and
[line for line in table if line[7]=='Aslib']
and
[line for line in table if line[7]=='ASLIB']
etc

I also have a dictionary
r= {'a':('ASLIB','Aslib'),'j':('JDOC', 'jdoc')}
so I can extract values
r.values()
[('ASLIB', 'Aslib'), ('JDOC', 'jdoc')]

I would like to do

[line for line in table if line[7] in ('JDOC','jdoc','Aslib','ASLIB')]

so how should I get from
{'a':('ASLIB','Aslib'),'j':('JDOC','jdoc')}
to
('Aslib','ASLIB','JDOC','jdoc')
 
George Sakkis

Meet itertools:

from itertools import chain
names = set(chain(*r.itervalues()))
print [line for line in table if line[7] in names]
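For readers on Python 3, where `dict.itervalues()` and the `print` statement no longer exist, the same idea might look roughly like this. The `table` rows below are made-up sample data (only column 7 matters) and `matching` is an illustrative name, not part of the original post.

```python
from itertools import chain

# Mapping from the question: each key maps to a tuple of accepted spellings.
r = {'a': ('ASLIB', 'Aslib'), 'j': ('JDOC', 'jdoc')}

# Flatten all the tuples into one set for fast membership tests.
names = set(chain.from_iterable(r.values()))

# Hypothetical sample rows; index 7 holds the value being filtered on.
table = [
    ['x'] * 7 + ['JDOC'],
    ['x'] * 7 + ['Aslib'],
    ['x'] * 7 + ['other'],
]

matching = [line for line in table if line[7] in names]
print(len(matching))  # 2
```

`chain.from_iterable` avoids unpacking the whole dictionary into function arguments, which `chain(*...)` does.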


George
 
Dennis Lee Bieber

> There is I am sure an easy way to do this, but I seem to be brain dead
> tonight. So:
>
> I have a table such that I can do
>
> [line for line in table if line[7]=='JDOC']
> and
> [line for line in table if line[7]=='Aslib']
> and
> [line for line in table if line[7]=='ASLIB']
> etc

First suggestion: Forget the various cases when doing the tests...
You can always sort the result later (since the above is returning
individual lists for each test you must have some means of combining
them... It also makes a complete pass over "table" for each test)

line[7].upper() == "ASLIB"

will match any version of letter case (even "aSliB")

> I also have a dictionary
> r= {'a':('ASLIB','Aslib'),'j':('JDOC', 'jdoc')}

Let's see... A dictionary where the "value" is a tuple of the same
word in different letter cases... As shown above, the cases aren't
needed... So the need for the tuple seems to have gone... That leaves a
dictionary using the lowercase first letter of the value as the lookup
key to the single value...

> so I can extract values
> r.values()
> [('ASLIB', 'Aslib'), ('JDOC', 'jdoc')]

See above re: unified case

> I would like to do
>
> [line for line in table if line[7] in ('JDOC','jdoc','Aslib','ASLIB')]
>
> so how should I get from
> {'a':('ASLIB','Aslib'),'j':('JDOC','jdoc')}
> to
> ('Aslib','ASLIB','JDOC','jdoc')

*** untested ***

r = { "a" : "ASLIB",    # unified case
      "j" : "JDOC" }    # though I'd be likely to uppercase key

[line for line in table
    if line[7].upper() ==                   # set to use same case
        r.get(line[7].lower()[0], None) ]
        # lowercase to match key, extract first character
        # return None if not a valid key

If you have the possibility of, say

r = { "a" : ("AVALUE", "ANOTHERVALUE"), ...

make sure all entries are then as tuples

... "j" : ("JDOC", ) ...


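change the == to "in" (and the default from None to an empty tuple, so
that an unknown key gives False rather than a TypeError from "in None")

>>> r
{'a': ('AVALUE', 'ANOTHERVALUE'), 'j': ('JDOC',)}
>>> v = "anotherValue"
>>> v.upper() in r.get(v.lower()[0], ())
True
>>> v = "AfalseValue"
>>> v.upper() in r.get(v.lower()[0], ())
False
>>> v = "jDoC"
>>> v.upper() in r.get(v.lower()[0], ())
True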
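As a rough Python 3 sketch of the case-insensitive lookup described above: the helper name `is_known` is hypothetical, and the empty-tuple default is an assumption made here so that a value whose first letter is not a key simply fails the test.

```python
# Upper-case spellings only, keyed by lower-case first letter.
r = {'a': ('AVALUE', 'ANOTHERVALUE'), 'j': ('JDOC',)}

def is_known(value, mapping=r):
    """Return True if value matches an entry, ignoring letter case."""
    # value[:1] is '' for an empty string, so no IndexError is raised.
    # The () default means an unknown first letter yields False.
    return value.upper() in mapping.get(value[:1].lower(), ())

print(is_known("anotherValue"))  # True
print(is_known("AfalseValue"))   # False
print(is_known("jDoC"))          # True
print(is_known("zebra"))         # False
```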
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
HTTP://www.bestiaria.com/
 
Steven D'Aprano

> I would like to do
>
> [line for line in table if line[7] in ('JDOC','jdoc','Aslib','ASLIB')]

What is the purpose of the "if line[7]" bit?


> so how should I get from
> {'a':('ASLIB','Aslib'),'j':('JDOC','jdoc')}
> to
> ('Aslib','ASLIB','JDOC','jdoc')


Assuming you don't care what order the strings are in:

r = {'a':('ASLIB','Aslib'),'j':('JDOC','jdoc')}
result = sum(r.values(), ())

If you do care about the order:

r = {'a':('ASLIB','Aslib'),'j':('JDOC','jdoc')}
keys = r.keys()
keys.sort()
result = []
for key in keys:
    result.extend(r[key])
result = tuple(result)
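In Python 3, `r.keys()` returns a view with no `sort()` method, so the ordered version needs a small change. One possible equivalent, assuming sorted key order is the order wanted:

```python
r = {'a': ('ASLIB', 'Aslib'), 'j': ('JDOC', 'jdoc')}

# Visit the keys in sorted order and flatten each tuple in turn.
result = tuple(name for key in sorted(r) for name in r[key])
print(result)  # ('ASLIB', 'Aslib', 'JDOC', 'jdoc')
```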
 
djc

Thank you everybody.
As it is possible that the tuples will not always be the same word in
variant cases,

result = sum(r.values(), ())

will do fine and is as simple as I suspected the answer would be.
 
bearophileHUGS

djc:
> As it is possible that the tuples will not always be the same word in
> variant cases
> result = sum(r.values(), ())
> will do fine and is as simple as I suspected the answer would be.

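It is simple, but I suggest you take a look at the speed of that
part of your code in your program. sum() builds a new list for every
addition, so its cost grows quadratically with the total number of
items, while extend() grows a single list in place. With this you can
see the difference: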

from time import clock
d = dict((i,range(300)) for i in xrange(300))

t = clock()
r1 = sum(d.values(), [])
print clock() - t

t = clock()
r2 = []
for v in d.values(): r2.extend(v)
print clock() - t

assert r1 == r2
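The snippet above is Python 2 (`xrange`, the `print` statement, and `time.clock`, which was removed in Python 3.8). A comparable Python 3 measurement could be sketched with `timeit`; the size `n = 200` and the function names are choices made here for illustration, and the printed timings will vary by machine.

```python
from itertools import chain
from timeit import timeit

n = 200  # assumed size, kept small so the quadratic case finishes quickly
d = {i: list(range(n)) for i in range(n)}

def flatten_sum():
    # Creates a brand-new list on every addition: quadratic in total size.
    return sum(d.values(), [])

def flatten_extend():
    # Grows one list in place: linear in total size.
    out = []
    for v in d.values():
        out.extend(v)
    return out

def flatten_chain():
    # Walks every sublist lazily, then materialises a single list.
    return list(chain.from_iterable(d.values()))

# All three approaches must produce the same flattened list.
assert flatten_sum() == flatten_extend() == flatten_chain()

for fn in (flatten_sum, flatten_extend, flatten_chain):
    print(fn.__name__, timeit(fn, number=5))
```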

Bye,
bearophile
 
djc

Yes, interesting, and well worth noting

1.  for v in d.values(): r1.extend(v)

2.  from itertools import chain
    set(chain(*d.itervalues()))

3.  set(v for t in d.values() for v in t)

4.  sum(d.values(), [])

5.  reduce((lambda l,v: l+v), d.values())

on IBM R60e [CoreDuo 1.6GHz/2GB]
d = dict((i,range(x)) for i in xrange(x))
(times in seconds; t1..t5 are methods 1..5 above)

   x     t1     t2     t3       t4       t5
 300   0.0    0.02   0.04     0.31     0.32
 500   0.01   0.09   0.1      1.67     1.69
1000   0.02   0.3    0.4     16.17    16.15
       0.03   0.28   0.42    16.37    16.31
1500   0.03   0.76   0.94    57.05    57.13
2000   0.07   1.2    1.66   136.6    136.97
2500   0.11   2.34   2.64   268.44   268.85

but on the other hand, as the intended application is a small command
line app where x is unlikely to reach double figures and there are only
two users, myself included:
d = {'a':['ASLIB','Aslib'],'j':['JDOC','jdoc'],'x':['test','alt','3rd'],'y':['single',]}
0.0 0.0 0.0 0.0 0.0

And sum(d.values(), []) has the advantage of raising a TypeError in the
case of a possible mangled input.

>>> d = {'a':['ASLIB','Aslib'],'j':['JDOC','jdoc'],'x':['test','alt','3rd'],'y':'single'}
>>> r1
['ASLIB', 'Aslib', 'test', 'alt', '3rd', 'JDOC', 'jdoc', 's', 'i', 'n',
'g', 'l', 'e']
>>> r2
set(['Aslib', 'JDOC', 'g', '3rd', 'i', 'l', 'n', 'ASLIB', 's', 'test',
'jdoc', 'alt', 'e'])
>>> r4 = sum(d.values(), [])
TypeError: can only concatenate list (not "str") to list
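The same contrast can be reproduced in Python 3 with a smaller dictionary. This is only a sketch: the shortened `d` and the name `flat` are chosen here for illustration.

```python
# 'y' was meant to be ['single'] but arrived as a bare string.
d = {'a': ['ASLIB', 'Aslib'], 'y': 'single'}

# extend() accepts any iterable, so the string is silently split
# into individual characters.
flat = []
for v in d.values():
    flat.extend(v)
print(flat)  # ['ASLIB', 'Aslib', 's', 'i', 'n', 'g', 'l', 'e']

# sum() relies on list + str, which Python refuses.
try:
    sum(d.values(), [])
except TypeError as exc:
    print('TypeError:', exc)
```

Whether a loud failure or a silent split is preferable depends on how much the input can be trusted.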
 
