flattening a dict

Benjamin · Feb 17, 2008

How would I go about "flattening" a dict with many nested dicts
within? The dicts might look like this:
{"mays" : {"eggs" : "spam"},
"jam" : {"soda" : {"love" : "dump"}},
"lamba" : 23
}
I'd like it to put "/" inbetween the dicts to make it a one
dimensional dict and look like this:
{"mays/eggs" : "spam",
"jam/soda/love" : "dump",
"lamba" : 23
}

Jeff Schwab · Feb 17, 2008

Benjamin said:
How would I go about "flattening" a dict with many nested dicts
within? The dicts might look like this:
{"mays" : {"eggs" : "spam"},
"jam" : {"soda" : {"love" : "dump"}},
"lamba" : 23
}
I'd like it to put "/" inbetween the dicts to make it a one
dimensional dict and look like this:
{"mays/eggs" : "spam",
"jam/soda/love" : "dump",
"lamba" : 23
}

What have you tried so far?

Arnaud Delobelle · Feb 17, 2008

How would I go about "flattening" a dict with many nested dicts
within? The dicts might look like this:
{"mays" : {"eggs" : "spam"},
"jam" : {"soda" : {"love" : "dump"}},
"lamba" : 23}

I'd like it to put "/" inbetween the dicts to make it a one
dimensional dict and look like this:
{"mays/eggs" : "spam",
"jam/soda/love" : "dump",
"lamba" : 23

}

In Python you can do anything, even flatten a dictionary:

from itertools import chain

def flattendict(d, pfx='', sep='/'):
if isinstance(d, dict):
if pfx: pfx += sep
return chain(*(flattendict(v, pfx+k, sep) for k, v in
d.iteritems()))
else:
return (pfx, d),

test = {"mays" : {"eggs" : "spam"},
"jam" : {"soda" : {"love" : "dump"}},
"lamba" : 23
}
{'lamba': 23, 'mays/eggs': 'spam', 'jam/soda/love': 'dump'}

You an even have other separators

{'lamba': 23, 'jam.soda.love': 'dump', 'mays.eggs': 'spam'}

Boris Borcic · Feb 17, 2008

Arnaud said:
In Python you can do anything, even

....pass the Turing test with a one-liner. Back after 9/11, when US patriotism
was the rage, Python knew how to answer correctly the query

filter(lambda W : W not in 'ILLITERATE','BULLSHIT')

And Python 3.0 slated for next August offers it a chance to change its mind
before the next elections...

Cheers, BB

Terry Jones · Feb 17, 2008

Hi Arnaud & Benjamin

Here's a version that's a bit more general. It handles keys whose values
are empty dicts (assigning None to the value in the result), and also dict
keys that are not strings (see the test data below). It's also less
recursive as it only calls itself on values that are dicts.

But.... it returns a dict whose keys are tuples. You then need to decide
what to do with this. Hence the helper function strdictflatten for the case
when all dict keys can be converted to str.

As I told Arnaud in email, I greatly prefer his version for its elegance.

Terry

def dictflatten(d, prefix=None):
result = {}
if prefix is None: prefix = tuple()
for k, v in d.iteritems():
key = prefix + (k,)
if isinstance(v, dict):
if v:
result.update(dictflatten(v, key))
else:
result[key] = None
else:
result[key] = v
return result

def strdictflatten(d, sep='/'):
return dict((sep.join(map(str, k)), v) for k, v in dictflatten(d).iteritems())

if __name__ == '__main__':
test = {
"" : {},
4 : 5,
"mays" : {"eggs" : "spam"},
"jam" : {"soda" : {"love" : "dump"}},
"lamba" : 23
}

d = dictflatten(test)
print strdictflatten(test)

Arnaud Delobelle · Feb 17, 2008

Hi Arnaud & Benjamin

Here's a version that's a bit more general. It handles keys whose values
are empty dicts (assigning None to the value in the result), and also dict
keys that are not strings (see the test data below). It's also less
recursive as it only calls itself on values that are dicts.

But.... it returns a dict whose keys are tuples. You then need to decide
what to do with this. Hence the helper function strdictflatten for the case
when all dict keys can be converted to str.

As I told Arnaud in email, I greatly prefer his version for its elegance.

Terry

def dictflatten(d, prefix=None):
result = {}
if prefix is None: prefix = tuple()
for k, v in d.iteritems():
key = prefix + (k,)
if isinstance(v, dict):
if v:
result.update(dictflatten(v, key))
else:
result[key] = None
else:
result[key] = v
return result

It's nice to do it this way I think. Here I post a generator-powered
version of the same idea.

from itertools import chain

def flattendict(d):
def gen(d, pfx=()):
return chain(*(gen(v, pfx+(k,)) if isinstance(v, dict)
else ((pfx+(k,), v),)
for k, v in d.iteritems()))
return dict(gen(d))

BTW, I keep using the idiom itertools.chain(*iterable). I guess that
during function calls *iterable gets expanded to a tuple. Wouldn't it
be nice to have an equivalent one-argument function that takes an
iterable of iterables and return the 'flattened' iterable?

George Sakkis · Feb 17, 2008

BTW, I keep using the idiom itertools.chain(*iterable). I guess that
during function calls *iterable gets expanded to a tuple. Wouldn't it
be nice to have an equivalent one-argument function that takes an
iterable of iterables and return the 'flattened' iterable?

Indeed; I don't have any exact numbers but I roughly use this idiom as
often or more as the case where chain() takes a known fixed number of
arguments. The equivalent function you describe is trivial:

def chain2(iter_of_iters):
for iterable in iter_of_iters:
for i in iterable:
yield i

but I usually don't bother, although if it was available in itertools
I'd use it instead.

Apart from introducing a new function for something quite similar,
another idea would be to modify chain(*iterables) semantics if it is
passed a single argument. Currently chain() is useful for 2 or more
arguments; chain(x) is equivalent to iter(x), there's no reason to use
it ever. On the downside, this might break backwards compatibility for
cases like chain(*some_iterable) where some_iterable has length of 1
but I'd guess this is quite rare.

George

Boris Borcic · Feb 17, 2008

George said:
Indeed; I don't have any exact numbers but I roughly use this idiom as
often or more as the case where chain() takes a known fixed number of
arguments. The equivalent function you describe is trivial:

def chain2(iter_of_iters):
for iterable in iter_of_iters:
for i in iterable:
yield i

or fwiw

chainstar = lambda iters : (x for it in iters for x in it)

- a form that better suggests how to inline it in the calling expression, if
applicable.

Cheers, BB

Terry Jones · Feb 17, 2008

Arnaud> BTW, I keep using the idiom itertools.chain(*iterable). I guess
Arnaud> that during function calls *iterable gets expanded to a tuple.
Arnaud> Wouldn't it be nice to have an equivalent one-argument function
Arnaud> that takes an iterable of iterables and return the 'flattened'
Arnaud> iterable?

This reminds me of a function I wrote a while back that iterates over all
its arguments, even calling passed functions and iterating over their
results. I knew *even less* about Python then than I do now; please set
flamethrowers to "Constructive Criticism".

http://www.fluidinfo.com/terry/2007/05/07/iteranything/

Terry

Arnaud Delobelle · Feb 17, 2008

or fwiw

chainstar = lambda iters : (x for it in iters for x in it)

- a form that better suggests how to inline it in the calling expression, if
applicable.

Indeed:

def flattendict(d):
def gen(d, pfx=()):
return (x for k, v in d.iteritems()
for x in (gen(v, pfx+(k,)) if isinstance(v, dict)
else ((pfx+(k,), v),)))
return dict(gen(d))

I don't know, I find the chain(*...) version more readable, although
this one is probably better. Please, Mr itertools, can we have
chainstar?

Boris Borcic · Feb 18, 2008

Arnaud said:
Indeed:

def flattendict(d):
def gen(d, pfx=()):
return (x for k, v in d.iteritems()
for x in (gen(v, pfx+(k,)) if isinstance(v, dict)
else ((pfx+(k,), v),)))
return dict(gen(d))

I don't know, I find the chain(*...) version more readable,

In this case I could concede that it is, although I find both forms overly
convoluted for easy reading.

although
this one is probably better.

It is more elementary in the mathematician's sense, and therefore preferable all
other things being equal, imo. I've tried to split 'gen' but I can't say the
result is so much better.

def flattendict(d) :
gen = lambda L : (x for M in exp(L) for x in rec(M))
exp = lambda L : (L+list(kv) for kv in L.pop().iteritems())
rec = lambda M : gen(M) if isinstance(M[-1],dict) else [M]
return dict((tuple(L[:-1]),L[-1]) for L in gen([d]))

For fun I've also written a (vaguely prologish) non-recursive generator-based
version that exploits undocumented (but logical) behavior of list.extend

class iterstack(list) :
__iter__ = lambda self : self
def next(self) :
while self :
try :
return self[-1].next()
except StopIteration :
self.pop()
raise StopIteration

def flattendict(d,stk=[]) :
res={}
its=iterstack()
its.extend(stk[-1].iteritems()
for stk[len(its)-1:] in chain([[d]],its)
if isinstance(stk[-1],dict)
or res.update([(stk.pop(),tuple(stk))[::-1]]))
return res

(testing was minimal)

Challenge for who really has time to waste : replace the iterstack list subclass
definition with a simple list + a generator expression inside flattendict.

Cheers, BB

Duncan Booth · Feb 18, 2008

Boris Borcic said:
It is more elementary in the mathematician's sense, and therefore
preferable all other things being equal, imo. I've tried to split
'gen' but I can't say the result is so much better.

def flattendict(d) :
gen = lambda L : (x for M in exp(L) for x in rec(M))
exp = lambda L : (L+list(kv) for kv in L.pop().iteritems())
rec = lambda M : gen(M) if isinstance(M[-1],dict) else [M]
return dict((tuple(L[:-1]),L[-1]) for L in gen([d]))

Why, why, why, why are you using lambda here? It only makes the code harder
to read (and it is bad enough without that). A lambda which is assigned
directly to a variable is a bad code smell.

Boris Borcic · Feb 18, 2008

Duncan said:
Boris Borcic said:

It is more elementary in the mathematician's sense, and therefore
preferable all other things being equal, imo. I've tried to split
'gen' but I can't say the result is so much better.

def flattendict(d) :
gen = lambda L : (x for M in exp(L) for x in rec(M))
exp = lambda L : (L+list(kv) for kv in L.pop().iteritems())
rec = lambda M : gen(M) if isinstance(M[-1],dict) else [M]
return dict((tuple(L[:-1]),L[-1]) for L in gen([d]))

Click to expand...

Why, why, why, why are you using lambda here?

Because the 3 lambdas make manifest that they could be combined into a single
expression and thus reveal that they replace a single expression.

It only makes the code harder
to read

Matter of perceptions. I myself tend to find

def g(...) :
def f(x) :
return (whatever)
return y(f)

a bit hard to follow. So frankly I don't feel it betters anything to write

def flattendict(d) :
def gen(L) :
return (x for M in exp(L) for x in rec(M))

def exp(L) :
return (L+list(kv) for kv in L.pop().iteritems())

def rec(M) :
return gen(M) if isinstance(M[-1],dict) else [M]

return dict((tuple(L[:-1]),L[-1]) for L in gen([d]))

But whatever to please you

Cheers, BB

Steven D'Aprano · Feb 18, 2008

Why, why, why, why are you using lambda here? It only makes the code
harder to read (and it is bad enough without that). A lambda which is
assigned directly to a variable is a bad code smell.

Oh come on. I don't get this allergy to lambda that so many people have.
Look at the example given:

def flattendict(d) :
gen = lambda L : (x for M in exp(L) for x in rec(M))
exp = lambda L : (L+list(kv) for kv in L.pop().iteritems())
rec = lambda M : gen(M) if isinstance(M[-1],dict) else [M]
return dict((tuple(L[:-1]),L[-1]) for L in gen([d]))

The hard-to-read doesn't come from the lambda (which only adds a keyword
and a name to each function), but the algorithm, which combines
recursion, generator expressions, and tight coupling between three
functions. Would the function be any easier to read written like this?

# Untested
def flattendict(d):
def gen(L):
return (x for M in exp(L) for x in rec(M))
def exp(L):
return (L+list(kv) for kv in L.pop().iteritems())
def rec(M):
return gen(M) if isinstance(M[-1],dict) else [M]
return dict((tuple(L[:-1]),L[-1]) for L in gen([d]))

No. The function is hard to read, not because it uses lambdas, but
because it is obfuscated Python. The lambda syntax doesn't contribute to
the obfuscation.

And as for your point about bad code smells, no, I don't agree. If your
function consists of a single expression, and you don't expect
func.__name__ to have a meaningful value, then there's nothing wrong with
using a "named lambda". Anonymous functions are first-class objects in
Python, just as classes and modules and named functions are, and people
shouldn't make up superstitious rules about not assigning them to names.

def foo(x):
return x+1

foo = lambda x: x+1

The first case uses TWO keywords, a name, a declared argument and an
expression; the lambda form uses ONE keyword, a name, a declared argument
and an expression.

The problem with lambdas comes from people trying to hammer multi-
expression functions into a single-expression lambda, hence obfuscating
the algorithm. That's no different from people who obfuscate multi-
expression functions by writing them as a generator expression.

Arnaud Delobelle · Feb 18, 2008

On Feb 18, 10:22 pm, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:
[...]

The problem with lambdas comes from people trying to hammer multi-
expression functions into a single-expression lambda, hence obfuscating
the algorithm. That's no different from people who obfuscate multi-
expression functions by writing them as a generator expression.

I'm sorry if my Python is difficult to understand. That's because
I've got a bit of a Lisp...

Jeff Schwab · Feb 18, 2008

Arnaud said:
On Feb 18, 10:22 pm, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:
[...]

The problem with lambdas comes from people trying to hammer multi-
expression functions into a single-expression lambda, hence obfuscating
the algorithm. That's no different from people who obfuscate multi-
expression functions by writing them as a generator expression.

Click to expand...

I'm sorry if my Python is difficult to understand. That's because
I've got a bit of a Lisp...

That may be the first lambda-related humor I've ever heard.

Duncan Booth · Feb 19, 2008

Steven D'Aprano said:
# Untested
def flattendict(d):
def gen(L):
return (x for M in exp(L) for x in rec(M))
def exp(L):
return (L+list(kv) for kv in L.pop().iteritems())
def rec(M):
return gen(M) if isinstance(M[-1],dict) else [M]
return dict((tuple(L[:-1]),L[-1]) for L in gen([d]))

No. The function is hard to read, not because it uses lambdas, but
because it is obfuscated Python. The lambda syntax doesn't contribute
to the obfuscation.

In this particular case I think the lambda does contribute to the
obfuscation. Yes, they are single expressions, but only because that
have been contorted to become single expressions. The first two return
generators, so if you don't force them into a lambda you can write them
out as for loops with yield. The last one is an if..else statement.
Splitting them into multiple statements also removes the need to keep
them as short as possible, so you don't need to use single letter
variable names everywhere.

And as for your point about bad code smells, no, I don't agree. If
your function consists of a single expression, and you don't expect
func.__name__ to have a meaningful value, then there's nothing wrong
with using a "named lambda". Anonymous functions are first-class
objects in Python, just as classes and modules and named functions
are, and people shouldn't make up superstitious rules about not
assigning them to names.

You've got it backwards here: the lambdas were given names. If they hadn't
been assigned to names I'd have no objection.

Boris Borcic · Feb 19, 2008

Duncan said:
In this particular case I think the lambda does contribute to the
obfuscation. Yes, they are single expressions, but only because that
have been contorted to become single expressions. The first two return
generators, so if you don't force them into a lambda you can write them
out as for loops with yield. The last one is an if..else statement.

The key word is "become", and in this particular case the discussion thread is
witness to the fact that it doesn't apply as you say. The three single
expressions came about from a (misshappen, OK, but I never pretended otherwise)
attempt to divide into pieces what was already a single expression in Arnaud's
code while keeping in the spirit.

BTW, I have a particular issue with your statement :

My problem is that I can't discuss its pure content with which I simply
disagree, because that would legitimize its abstracted form with which I have
the strongest issues, although largely OT.

Best, BB

Benjamin · Feb 22, 2008

Hi Arnaud & Benjamin

Here's a version that's a bit more general. It handles keys whose values
are empty dicts (assigning None to the value in the result), and also dict
keys that are not strings (see the test data below). It's also less
recursive as it only calls itself on values that are dicts.

But.... it returns a dict whose keys are tuples. You then need to decide
what to do with this. Hence the helper function strdictflatten for the case
when all dict keys can be converted to str.

As I told Arnaud in email, I greatly prefer his version for its elegance.

Terry

def dictflatten(d, prefix=None):
result = {}
if prefix is None: prefix = tuple()
for k, v in d.iteritems():
key = prefix + (k,)
if isinstance(v, dict):
if v:
result.update(dictflatten(v, key))
else:
result[key] = None
else:
result[key] = v
return result

def strdictflatten(d, sep='/'):
return dict((sep.join(map(str, k)), v) for k, v in dictflatten(d).iteritems())

Thanks. This is great. Now, how would I restrict the depth of the
flattening? For example, what if I just wanted something like this:
{
"mays" : {"eggs" : "spam"},
"jam" : {"soda" : "soda/love" : "dump"},
lamba" : 23
}
They are left alone up the to second level of recursion.

George Sakkis · Feb 22, 2008

Hi Arnaud & Benjamin

Click to expand...

Here's a version that's a bit more general. It handles keys whose values
are empty dicts (assigning None to the value in the result), and also dict
keys that are not strings (see the test data below). It's also less
recursive as it only calls itself on values that are dicts.

Click to expand...

But.... it returns a dict whose keys are tuples. You then need to decide
what to do with this. Hence the helper function strdictflatten for the case
when all dict keys can be converted to str.

Click to expand...

As I told Arnaud in email, I greatly prefer his version for its elegance.

Terry

Click to expand...

def dictflatten(d, prefix=None):
result = {}
if prefix is None: prefix = tuple()
for k, v in d.iteritems():
key = prefix + (k,)
if isinstance(v, dict):
if v:
result.update(dictflatten(v, key))
else:
result[key] = None
else:
result[key] = v
return result

Click to expand...

def strdictflatten(d, sep='/'):
return dict((sep.join(map(str, k)), v) for k, v in dictflatten(d).iteritems())

Click to expand...

Thanks. This is great. Now, how would I restrict the depth of the
flattening? For example, what if I just wanted something like this:
{
"mays" : {"eggs" : "spam"},
"jam" : {"soda" : "soda/love" : "dump"},
lamba" : 23}

They are left alone up the to second level of recursion.

Why don't you do your homework on your own for a change ? Or at least
pretend you're even trying ?

Pythonic list/tuple/dict layout?	8	Jan 25, 2009
a flattening operator?	2	Apr 18, 2006
Multikey Dict?	6	Nov 13, 2005
Ordered dict by default	14	Feb 5, 2009
expanding a variable to a dict	8	Apr 3, 2008
Using a dict as if it were a module namespace	5	Jan 27, 2008
Doing set operation on non-hashable objects	5	Dec 24, 2008
stable object serialization to text file	1	Jan 11, 2012

flattening a dict

Benjamin

Jeff Schwab

Arnaud Delobelle

Boris Borcic

Terry Jones

Arnaud Delobelle

George Sakkis

Boris Borcic

Terry Jones

Arnaud Delobelle

Boris Borcic

Duncan Booth

Boris Borcic

Steven D'Aprano

Arnaud Delobelle

Jeff Schwab

Duncan Booth

Boris Borcic

Benjamin

George Sakkis

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads