flattening a dict

B

Benjamin

How would I go about "flattening" a dict with many nested dicts
within? The dicts might look like this:
{"mays" : {"eggs" : "spam"},
"jam" : {"soda" : {"love" : "dump"}},
"lamba" : 23
}
I'd like it to put "/" inbetween the dicts to make it a one
dimensional dict and look like this:
{"mays/eggs" : "spam",
"jam/soda/love" : "dump",
"lamba" : 23
}
 
J

Jeff Schwab

Benjamin said:
How would I go about "flattening" a dict with many nested dicts
within? The dicts might look like this:
{"mays" : {"eggs" : "spam"},
"jam" : {"soda" : {"love" : "dump"}},
"lamba" : 23
}
I'd like it to put "/" inbetween the dicts to make it a one
dimensional dict and look like this:
{"mays/eggs" : "spam",
"jam/soda/love" : "dump",
"lamba" : 23
}

What have you tried so far?
 
A

Arnaud Delobelle

How would I go about "flattening" a dict with many nested dicts
within? The dicts might look like this:
{"mays" : {"eggs" : "spam"},
"jam" : {"soda" : {"love" : "dump"}},
"lamba" : 23}

I'd like it to put "/" inbetween the dicts to make it a one
dimensional dict and look like this:
{"mays/eggs" : "spam",
"jam/soda/love" : "dump",
"lamba" : 23

}

In Python you can do anything, even flatten a dictionary:

from itertools import chain

def flattendict(d, pfx='', sep='/'):
if isinstance(d, dict):
if pfx: pfx += sep
return chain(*(flattendict(v, pfx+k, sep) for k, v in
d.iteritems()))
else:
return (pfx, d),

test = {"mays" : {"eggs" : "spam"},
"jam" : {"soda" : {"love" : "dump"}},
"lamba" : 23
}
{'lamba': 23, 'mays/eggs': 'spam', 'jam/soda/love': 'dump'}

You an even have other separators ;)
{'lamba': 23, 'jam.soda.love': 'dump', 'mays.eggs': 'spam'}
 
B

Boris Borcic

Arnaud said:
In Python you can do anything, even

....pass the Turing test with a one-liner. Back after 9/11, when US patriotism
was the rage, Python knew how to answer correctly the query

filter(lambda W : W not in 'ILLITERATE','BULLSHIT')

And Python 3.0 slated for next August offers it a chance to change its mind
before the next elections...

Cheers, BB
 
T

Terry Jones

Hi Arnaud & Benjamin

Here's a version that's a bit more general. It handles keys whose values
are empty dicts (assigning None to the value in the result), and also dict
keys that are not strings (see the test data below). It's also less
recursive as it only calls itself on values that are dicts.

But.... it returns a dict whose keys are tuples. You then need to decide
what to do with this. Hence the helper function strdictflatten for the case
when all dict keys can be converted to str.

As I told Arnaud in email, I greatly prefer his version for its elegance.

Terry


def dictflatten(d, prefix=None):
result = {}
if prefix is None: prefix = tuple()
for k, v in d.iteritems():
key = prefix + (k,)
if isinstance(v, dict):
if v:
result.update(dictflatten(v, key))
else:
result[key] = None
else:
result[key] = v
return result

def strdictflatten(d, sep='/'):
return dict((sep.join(map(str, k)), v) for k, v in dictflatten(d).iteritems())

if __name__ == '__main__':
test = {
"" : {},
4 : 5,
"mays" : {"eggs" : "spam"},
"jam" : {"soda" : {"love" : "dump"}},
"lamba" : 23
}

d = dictflatten(test)
print strdictflatten(test)
 
A

Arnaud Delobelle

Hi Arnaud & Benjamin

Here's a version that's a bit more general. It handles keys whose values
are empty dicts (assigning None to the value in the result), and also dict
keys that are not strings (see the test data below). It's also less
recursive as it only calls itself on values that are dicts.

But.... it returns a dict whose keys are tuples. You then need to decide
what to do with this. Hence the helper function strdictflatten for the case
when all dict keys can be converted to str.

As I told Arnaud in email, I greatly prefer his version for its elegance.

Terry

def dictflatten(d, prefix=None):
    result = {}
    if prefix is None: prefix = tuple()
    for k, v in d.iteritems():
        key = prefix + (k,)
        if isinstance(v, dict):
            if v:
                result.update(dictflatten(v, key))
            else:
                result[key] = None
        else:
            result[key] = v
    return result

It's nice to do it this way I think. Here I post a generator-powered
version of the same idea.

from itertools import chain

def flattendict(d):
def gen(d, pfx=()):
return chain(*(gen(v, pfx+(k,)) if isinstance(v, dict)
else ((pfx+(k,), v),)
for k, v in d.iteritems()))
return dict(gen(d))

BTW, I keep using the idiom itertools.chain(*iterable). I guess that
during function calls *iterable gets expanded to a tuple. Wouldn't it
be nice to have an equivalent one-argument function that takes an
iterable of iterables and return the 'flattened' iterable?
 
G

George Sakkis

BTW, I keep using the idiom itertools.chain(*iterable). I guess that
during function calls *iterable gets expanded to a tuple. Wouldn't it
be nice to have an equivalent one-argument function that takes an
iterable of iterables and return the 'flattened' iterable?

Indeed; I don't have any exact numbers but I roughly use this idiom as
often or more as the case where chain() takes a known fixed number of
arguments. The equivalent function you describe is trivial:

def chain2(iter_of_iters):
for iterable in iter_of_iters:
for i in iterable:
yield i

but I usually don't bother, although if it was available in itertools
I'd use it instead.

Apart from introducing a new function for something quite similar,
another idea would be to modify chain(*iterables) semantics if it is
passed a single argument. Currently chain() is useful for 2 or more
arguments; chain(x) is equivalent to iter(x), there's no reason to use
it ever. On the downside, this might break backwards compatibility for
cases like chain(*some_iterable) where some_iterable has length of 1
but I'd guess this is quite rare.

George
 
B

Boris Borcic

George said:
Indeed; I don't have any exact numbers but I roughly use this idiom as
often or more as the case where chain() takes a known fixed number of
arguments. The equivalent function you describe is trivial:

def chain2(iter_of_iters):
for iterable in iter_of_iters:
for i in iterable:
yield i

or fwiw

chainstar = lambda iters : (x for it in iters for x in it)

- a form that better suggests how to inline it in the calling expression, if
applicable.

Cheers, BB
 
T

Terry Jones

Arnaud> BTW, I keep using the idiom itertools.chain(*iterable). I guess
Arnaud> that during function calls *iterable gets expanded to a tuple.
Arnaud> Wouldn't it be nice to have an equivalent one-argument function
Arnaud> that takes an iterable of iterables and return the 'flattened'
Arnaud> iterable?

This reminds me of a function I wrote a while back that iterates over all
its arguments, even calling passed functions and iterating over their
results. I knew *even less* about Python then than I do now; please set
flamethrowers to "Constructive Criticism".

http://www.fluidinfo.com/terry/2007/05/07/iteranything/

Terry
 
A

Arnaud Delobelle

or fwiw

chainstar = lambda iters : (x for it in iters for x in it)

- a form that better suggests how to inline it in the calling expression, if
applicable.

Indeed:

def flattendict(d):
def gen(d, pfx=()):
return (x for k, v in d.iteritems()
for x in (gen(v, pfx+(k,)) if isinstance(v, dict)
else ((pfx+(k,), v),)))
return dict(gen(d))

I don't know, I find the chain(*...) version more readable, although
this one is probably better. Please, Mr itertools, can we have
chainstar?
 
B

Boris Borcic

Arnaud said:
Indeed:

def flattendict(d):
def gen(d, pfx=()):
return (x for k, v in d.iteritems()
for x in (gen(v, pfx+(k,)) if isinstance(v, dict)
else ((pfx+(k,), v),)))
return dict(gen(d))

I don't know, I find the chain(*...) version more readable,

In this case I could concede that it is, although I find both forms overly
convoluted for easy reading.
although
this one is probably better.

It is more elementary in the mathematician's sense, and therefore preferable all
other things being equal, imo. I've tried to split 'gen' but I can't say the
result is so much better.

def flattendict(d) :
gen = lambda L : (x for M in exp(L) for x in rec(M))
exp = lambda L : (L+list(kv) for kv in L.pop().iteritems())
rec = lambda M : gen(M) if isinstance(M[-1],dict) else [M]
return dict((tuple(L[:-1]),L[-1]) for L in gen([d]))

For fun I've also written a (vaguely prologish) non-recursive generator-based
version that exploits undocumented (but logical) behavior of list.extend

class iterstack(list) :
__iter__ = lambda self : self
def next(self) :
while self :
try :
return self[-1].next()
except StopIteration :
self.pop()
raise StopIteration

def flattendict(d,stk=[]) :
res={}
its=iterstack()
its.extend(stk[-1].iteritems()
for stk[len(its)-1:] in chain([[d]],its)
if isinstance(stk[-1],dict)
or res.update([(stk.pop(),tuple(stk))[::-1]]))
return res

(testing was minimal)

Challenge for who really has time to waste : replace the iterstack list subclass
definition with a simple list + a generator expression inside flattendict.

Cheers, BB
 
D

Duncan Booth

Boris Borcic said:
It is more elementary in the mathematician's sense, and therefore
preferable all other things being equal, imo. I've tried to split
'gen' but I can't say the result is so much better.

def flattendict(d) :
gen = lambda L : (x for M in exp(L) for x in rec(M))
exp = lambda L : (L+list(kv) for kv in L.pop().iteritems())
rec = lambda M : gen(M) if isinstance(M[-1],dict) else [M]
return dict((tuple(L[:-1]),L[-1]) for L in gen([d]))

Why, why, why, why are you using lambda here? It only makes the code harder
to read (and it is bad enough without that). A lambda which is assigned
directly to a variable is a bad code smell.
 
B

Boris Borcic

Duncan said:
Boris Borcic said:
It is more elementary in the mathematician's sense, and therefore
preferable all other things being equal, imo. I've tried to split
'gen' but I can't say the result is so much better.

def flattendict(d) :
gen = lambda L : (x for M in exp(L) for x in rec(M))
exp = lambda L : (L+list(kv) for kv in L.pop().iteritems())
rec = lambda M : gen(M) if isinstance(M[-1],dict) else [M]
return dict((tuple(L[:-1]),L[-1]) for L in gen([d]))

Why, why, why, why are you using lambda here?

Because the 3 lambdas make manifest that they could be combined into a single
expression and thus reveal that they replace a single expression.
It only makes the code harder
to read

Matter of perceptions. I myself tend to find

def g(...) :
def f(x) :
return (whatever)
return y(f)

a bit hard to follow. So frankly I don't feel it betters anything to write

def flattendict(d) :
def gen(L) :
return (x for M in exp(L) for x in rec(M))

def exp(L) :
return (L+list(kv) for kv in L.pop().iteritems())

def rec(M) :
return gen(M) if isinstance(M[-1],dict) else [M]

return dict((tuple(L[:-1]),L[-1]) for L in gen([d]))

But whatever to please you

Cheers, BB
 
S

Steven D'Aprano

Why, why, why, why are you using lambda here? It only makes the code
harder to read (and it is bad enough without that). A lambda which is
assigned directly to a variable is a bad code smell.

Oh come on. I don't get this allergy to lambda that so many people have.
Look at the example given:


def flattendict(d) :
gen = lambda L : (x for M in exp(L) for x in rec(M))
exp = lambda L : (L+list(kv) for kv in L.pop().iteritems())
rec = lambda M : gen(M) if isinstance(M[-1],dict) else [M]
return dict((tuple(L[:-1]),L[-1]) for L in gen([d]))


The hard-to-read doesn't come from the lambda (which only adds a keyword
and a name to each function), but the algorithm, which combines
recursion, generator expressions, and tight coupling between three
functions. Would the function be any easier to read written like this?

# Untested
def flattendict(d):
def gen(L):
return (x for M in exp(L) for x in rec(M))
def exp(L):
return (L+list(kv) for kv in L.pop().iteritems())
def rec(M):
return gen(M) if isinstance(M[-1],dict) else [M]
return dict((tuple(L[:-1]),L[-1]) for L in gen([d]))

No. The function is hard to read, not because it uses lambdas, but
because it is obfuscated Python. The lambda syntax doesn't contribute to
the obfuscation.

And as for your point about bad code smells, no, I don't agree. If your
function consists of a single expression, and you don't expect
func.__name__ to have a meaningful value, then there's nothing wrong with
using a "named lambda". Anonymous functions are first-class objects in
Python, just as classes and modules and named functions are, and people
shouldn't make up superstitious rules about not assigning them to names.

def foo(x):
return x+1

foo = lambda x: x+1

The first case uses TWO keywords, a name, a declared argument and an
expression; the lambda form uses ONE keyword, a name, a declared argument
and an expression.

The problem with lambdas comes from people trying to hammer multi-
expression functions into a single-expression lambda, hence obfuscating
the algorithm. That's no different from people who obfuscate multi-
expression functions by writing them as a generator expression.
 
A

Arnaud Delobelle

On Feb 18, 10:22 pm, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:
[...]
The problem with lambdas comes from people trying to hammer multi-
expression functions into a single-expression lambda, hence obfuscating
the algorithm. That's no different from people who obfuscate multi-
expression functions by writing them as a generator expression.

I'm sorry if my Python is difficult to understand. That's because
I've got a bit of a Lisp...
 
J

Jeff Schwab

Arnaud said:
On Feb 18, 10:22 pm, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:
[...]
The problem with lambdas comes from people trying to hammer multi-
expression functions into a single-expression lambda, hence obfuscating
the algorithm. That's no different from people who obfuscate multi-
expression functions by writing them as a generator expression.

I'm sorry if my Python is difficult to understand. That's because
I've got a bit of a Lisp...

That may be the first lambda-related humor I've ever heard.
 
D

Duncan Booth

Steven D'Aprano said:
# Untested
def flattendict(d):
def gen(L):
return (x for M in exp(L) for x in rec(M))
def exp(L):
return (L+list(kv) for kv in L.pop().iteritems())
def rec(M):
return gen(M) if isinstance(M[-1],dict) else [M]
return dict((tuple(L[:-1]),L[-1]) for L in gen([d]))

No. The function is hard to read, not because it uses lambdas, but
because it is obfuscated Python. The lambda syntax doesn't contribute
to the obfuscation.

In this particular case I think the lambda does contribute to the
obfuscation. Yes, they are single expressions, but only because that
have been contorted to become single expressions. The first two return
generators, so if you don't force them into a lambda you can write them
out as for loops with yield. The last one is an if..else statement.
Splitting them into multiple statements also removes the need to keep
them as short as possible, so you don't need to use single letter
variable names everywhere.

And as for your point about bad code smells, no, I don't agree. If
your function consists of a single expression, and you don't expect
func.__name__ to have a meaningful value, then there's nothing wrong
with using a "named lambda". Anonymous functions are first-class
objects in Python, just as classes and modules and named functions
are, and people shouldn't make up superstitious rules about not
assigning them to names.

You've got it backwards here: the lambdas were given names. If they hadn't
been assigned to names I'd have no objection.
 
B

Boris Borcic

Duncan said:
In this particular case I think the lambda does contribute to the
obfuscation. Yes, they are single expressions, but only because that
have been contorted to become single expressions. The first two return
generators, so if you don't force them into a lambda you can write them
out as for loops with yield. The last one is an if..else statement.

The key word is "become", and in this particular case the discussion thread is
witness to the fact that it doesn't apply as you say. The three single
expressions came about from a (misshappen, OK, but I never pretended otherwise)
attempt to divide into pieces what was already a single expression in Arnaud's
code while keeping in the spirit.

BTW, I have a particular issue with your statement :

My problem is that I can't discuss its pure content with which I simply
disagree, because that would legitimize its abstracted form with which I have
the strongest issues, although largely OT.

Best, BB
 
B

Benjamin

Hi Arnaud & Benjamin

Here's a version that's a bit more general. It handles keys whose values
are empty dicts (assigning None to the value in the result), and also dict
keys that are not strings (see the test data below). It's also less
recursive as it only calls itself on values that are dicts.

But.... it returns a dict whose keys are tuples. You then need to decide
what to do with this. Hence the helper function strdictflatten for the case
when all dict keys can be converted to str.

As I told Arnaud in email, I greatly prefer his version for its elegance.

Terry

def dictflatten(d, prefix=None):
result = {}
if prefix is None: prefix = tuple()
for k, v in d.iteritems():
key = prefix + (k,)
if isinstance(v, dict):
if v:
result.update(dictflatten(v, key))
else:
result[key] = None
else:
result[key] = v
return result

def strdictflatten(d, sep='/'):
return dict((sep.join(map(str, k)), v) for k, v in dictflatten(d).iteritems())
Thanks. This is great. Now, how would I restrict the depth of the
flattening? For example, what if I just wanted something like this:
{
"mays" : {"eggs" : "spam"},
"jam" : {"soda" : "soda/love" : "dump"},
lamba" : 23
}
They are left alone up the to second level of recursion.
 
G

George Sakkis

Hi Arnaud & Benjamin
Here's a version that's a bit more general. It handles keys whose values
are empty dicts (assigning None to the value in the result), and also dict
keys that are not strings (see the test data below). It's also less
recursive as it only calls itself on values that are dicts.
But.... it returns a dict whose keys are tuples. You then need to decide
what to do with this. Hence the helper function strdictflatten for the case
when all dict keys can be converted to str.
As I told Arnaud in email, I greatly prefer his version for its elegance.

def dictflatten(d, prefix=None):
result = {}
if prefix is None: prefix = tuple()
for k, v in d.iteritems():
key = prefix + (k,)
if isinstance(v, dict):
if v:
result.update(dictflatten(v, key))
else:
result[key] = None
else:
result[key] = v
return result
def strdictflatten(d, sep='/'):
return dict((sep.join(map(str, k)), v) for k, v in dictflatten(d).iteritems())

Thanks. This is great. Now, how would I restrict the depth of the
flattening? For example, what if I just wanted something like this:
{
"mays" : {"eggs" : "spam"},
"jam" : {"soda" : "soda/love" : "dump"},
lamba" : 23}

They are left alone up the to second level of recursion.

Why don't you do your homework on your own for a change ? Or at least
pretend you're even trying ?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,581
Members
45,056
Latest member
GlycogenSupporthealth

Latest Threads

Top