Dict to "flat" list of (key,value)

Discussion in 'Python' started by Nicolas Girard, Jul 30, 2003.

  1. Hi,

    Forgive me if the answer is trivial, but could you tell me how to achieve
    the following:

    {k1:[v1,v2],k2:v3,...} --> [[k1,v1],[k1,v2],[k2,v3],...]

    The subtle point (at least to me) is to "flatten" values that are lists.

    Thanks in advance,
    Nicolas



    ----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
    http://www.newsfeed.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
    ---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---
    Nicolas Girard, Jul 30, 2003
    #1
    1. Advertising

  2. On Wed, 30 Jul 2003 21:26:19 +0200, Nicolas Girard <> wrote:

    >Hi,
    >
    >Forgive me if the answer is trivial, but could you tell me how to achieve
    >the following:
    >
    >{k1:[v1,v2],k2:v3,...} --> [[k1,v1],[k1,v2],[k2,v3],...]
    >
    >The subtle point (at least to me) is to "flatten" values that are lists.
    >

    Assuming v3 and others like it can't be references to lists
    (or how could you tell it from the format of k1's list?):

    (I quoted your names to avoid haing to define them separately)

    >>> d = {'k1':['v1','v2'],'k2':'v3'}
    >>> flat = []
    >>> for k,v in d.items():

    ... if isinstance(v,list):
    ... for v2 in v:
    ... flat.append([k,v2])
    ... else:
    ... flat.append([k,v])
    ...
    >>> flat

    [['k2', 'v3'], ['k1', 'v1'], ['k1', 'v2']]

    Or would you prefer an inscrutable one-liner ;-)

    Note that there is no guarantee of any particular order in d.items(),
    though you could sort them:

    >>> d = {'k1':['v1','v2'],'k2':'v3'}
    >>> flat = []
    >>> items = d.items()
    >>> items.sort() # on keys
    >>> for k,v in items:

    ... if isinstance(v,list):
    # Note that v is a list here, but does have pre-existing order you might want to keep.
    # If you wanted to guarantee sorted output of the v2's, you could
    # do v.sort() here, though you might surprise others holding references
    # to those lists, since they would see the sorting. To avoid that, make a copy first, e.g.,
    # v = v[:]; v.sort() #(untested, but I think it should work ;-)

    ... for v2 in v:
    ... flat.append([k,v2])
    ... else:
    ... flat.append([k,v])
    ...
    >>> flat

    [['k1', 'v1'], ['k1', 'v2'], ['k2', 'v3']]

    Regards,
    Bengt Richter
    Bengt Richter, Jul 30, 2003
    #2
    1. Advertising

  3. Nicolas Girard

    Brandon Beck Guest

    Someone will definitely come up with a better solution than this, but
    this could work for you at least as a first try

    import types
    def flatten(d, iterables=(types.ListType, types.TupleType)):
    rv = []
    for k, v in d.items():
    if type(v) in iterables:
    rv += [[k,x] for x in v]
    else:
    rv += [[k,v]]
    return rv

    If you want to change the types of things that get iterated over, just
    pass in a second parameter of your own.

    Hope that helps,
    Brandon



    "Nicolas Girard" <> wrote in
    message news:p...
    > Hi,
    >
    > Forgive me if the answer is trivial, but could you tell me how to

    achieve
    > the following:
    >
    > {k1:[v1,v2],k2:v3,...} --> [[k1,v1],[k1,v2],[k2,v3],...]
    >
    > The subtle point (at least to me) is to "flatten" values that are

    lists.
    >
    > Thanks in advance,
    > Nicolas
    >
    >
    >
    > ----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet

    News==----
    > http://www.newsfeed.com The #1 Newsgroup Service in the World!
    >100,000 Newsgroups
    > ---= 19 East/West-Coast Specialized Servers - Total Privacy via

    Encryption =---
    Brandon Beck, Jul 30, 2003
    #3
  4. Nicolas Girard

    David C. Fox Guest

    Nicolas Girard wrote:
    > Hi,
    >
    > Forgive me if the answer is trivial, but could you tell me how to achieve
    > the following:
    >
    > {k1:[v1,v2],k2:v3,...} --> [[k1,v1],[k1,v2],[k2,v3],...]
    >
    > The subtle point (at least to me) is to "flatten" values that are lists.


    Well, I can get you at least part way, but there is an ambiguity in your
    original data structure which causes problems:

    1. d.items() will get you

    [(k1, [v1, v2]), (k2, v3), ...]

    2. Now, you want to expand out the list elements where the second part
    of each tuple is a list, which you can do with.

    flatten = lambda u: map(lambda x: (u[0], x), u[1])

    or, if the double lambda expression is confusing:

    def flatten(u):
    return map(lambda x: (u[0], x), u[1])

    (I've used a tuple (u[0], x) instead of a list, because your [k1, vn]
    are always pairs, but you can use a list if you prefer)

    Then flatten will take the nested element

    (k1, [v1, v2])

    and convert it to a list of tuples

    [(k1, v1), [k2, v2])]

    3. You want to apply flatten to each element of d.items(), which you
    can do with

    lol = map(flatten, d.items())

    which will give you a list of lists of tuples,

    4. and then reduce that to a list of tuples with

    reduce(operator.add, lol)


    Unfortunately, this doesn't quite work with your example above, because
    flatten won't work properly when applied to (k2, v3). If v3 is a
    sequence, it will instead give you [(k2, v3[0]), (k2, v3[1]), ...],
    which is probably not what you want (especially if v3 is a string - try
    flatten manually and see what happens). If v3 is not a sequence, you'll
    get a TypeError saying that argument 2 to map() must support iteration.

    If you *know* that v3 will NEVER be a list, you can modify flatten to
    handle the special case like so:

    def flatten(u):
    if isinstance(u[1], type([])):
    return (u[1], [u[2]])
    return map(lambda x: (u[0], x), u[1])

    Alternatively, if you can modify how the initial dictionary is generated
    to ensure that cases where a key has a single value always appear as
    k2: [v3] instead of k2: v3, then you can use the steps above without
    modification. This is probably better if it is feasible, because your
    original data structure is inherently ambiguous unless you know that v3
    will never be a list.

    David




    >
    > Thanks in advance,
    > Nicolas
    >
    >
    >
    > ----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
    > http://www.newsfeed.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
    > ---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---
    David C. Fox, Jul 30, 2003
    #4
  5. Nicolas Girard

    David C. Fox Guest

    David C. Fox wrote:

    Oops - a mistake and an omission

    > 4. and then reduce that to a list of tuples with
    >
    > reduce(operator.add, lol)


    You need to import the operator module first.

    > def flatten(u):
    > if isinstance(u[1], type([])):
    > return (u[1], [u[2]])
    > return map(lambda x: (u[0], x), u[1])


    I got my cases reversed - that should be "if not isinstance...
    David C. Fox, Jul 30, 2003
    #5
  6. Nicolas Girard

    Mike Rovner Guest

    Nicolas Girard wrote:
    > Forgive me if the answer is trivial, but could you tell me how to
    > achieve the following:
    >
    > {k1:[v1,v2],k2:v3,...} --> [[k1,v1],[k1,v2],[k2,v3],...]


    [list(i) for i in d.items()]

    Mike
    Mike Rovner, Jul 31, 2003
    #6
  7. Nicolas Girard

    Mike Rovner Guest

    Mike Rovner wrote:
    > Nicolas Girard wrote:
    >> Forgive me if the answer is trivial, but could you tell me how to
    >> achieve the following:
    >>
    >> {k1:[v1,v2],k2:v3,...} --> [[k1,v1],[k1,v2],[k2,v3],...]

    >
    > [list(i) for i in d.items()]


    Sorry, I missed v1,v2 flattening.
    Mike Rovner, Jul 31, 2003
    #7
  8. "Nicolas Girard" <> wrote in message
    news:p...
    > Hi,
    >
    > Forgive me if the answer is trivial, but could you tell me how to achieve
    > the following:
    >
    > {k1:[v1,v2],k2:v3,...} --> [[k1,v1],[k1,v2],[k2,v3],...]
    >
    > The subtle point (at least to me) is to "flatten" values that are lists.


    The other posters answered the question (as asked)
    by showing a loop that differentiated the two cases
    of inner lists vs single values. One further thought,
    is that the original data structure could be improved
    by building it so that every value is in a list:

    {k1:[v1,v2],k2:[v3],...}

    This is typically done by building the values with setdefault:

    index = {}
    for pagenum in range(len(pages)):
    page = pages[pagenum]
    for word in page:
    index.setdefault(word, []).append(pagenum)

    The new structure is much more easily flattened:

    [(k,v) for k, values in newstruct.iteritems() for v in values]


    it-all-starts-with-a-good-data-structure-ly yours,


    Raymond Hettinger
    Raymond Hettinger, Aug 2, 2003
    #8
  9. Nicolas Girard

    John Machin Guest

    On Sat, 02 Aug 2003 12:41:19 GMT, "Raymond Hettinger"
    <> wrote:


    >it-all-starts-with-a-good-data-structure-ly yours,


    Amen, brother.

    Even worse: I recall seeing code somewhere that had a data structure
    like this:

    {k1: [v1,v2], k2: v3, k3: None, ...}
    instead of
    {k1: [v1,v2], k2: [v3], k3: [], ...}

    I liked the elegant code example for building a book index. However in
    practice the user requirement would be not to have duplicated page
    numbers when a word occurs more than once on the same page. If you can
    achieve that elegantly, please post it!

    Cheers,
    John
    John Machin, Aug 3, 2003
    #9
  10. John Machin wrote:

    > On Sat, 02 Aug 2003 12:41:19 GMT, "Raymond Hettinger"
    > <> wrote:
    >>

    [a perfectly fine chunk of code]
    >>it-all-starts-with-a-good-data-structure-ly yours,

    >
    > I liked the elegant code example for building a book index. However in
    > practice the user requirement would be not to have duplicated page
    > numbers when a word occurs more than once on the same page. If you can
    > achieve that elegantly, please post it!


    OK, I'll bite:
    def genindex(pages):
    index = {}
    for pagenum in range(len(pages)):
    page = pages[pagenum]
    for word in page:
    index.setdefault(word, {})[pagenum] = None

    or with 2.3:
    def genindex(pages):
    index = {}
    for pagenum, page in enumerate(pages):
    for word in page:
    index.setdefault(word, {})[pagenum] = None
    return index

    With either one, flattening looks like:
    def flatten(index):
    flattened = [(word, list(pages)) for word, pages
    in index.items()]
    for word, pages in flattened:
    pages.sort()
    flattened.sort()
    return flatten

    For fun, try:

    book = ['this is a test'.split(),
    'this is only a test'.split(),
    'had this been a real function, you care a bit'.split()]
    for word, pages in flatten(index(book)):
    print word, pages

    -Scott David Daniels
    Scott David Daniels, Aug 3, 2003
    #10
  11. On Sun, 03 Aug 2003 00:28:53 GMT, (John Machin) wrote:

    >On Sat, 02 Aug 2003 12:41:19 GMT, "Raymond Hettinger"
    ><> wrote:
    >
    >
    >>it-all-starts-with-a-good-data-structure-ly yours,

    >
    >Amen, brother.
    >
    >Even worse: I recall seeing code somewhere that had a data structure
    >like this:
    >
    >{k1: [v1,v2], k2: v3, k3: None, ...}
    >instead of
    >{k1: [v1,v2], k2: [v3], k3: [], ...}
    >

    Page 1

    >I liked the elegant code example for building a book index. However in
    >practice the user requirement would be not to have duplicated page
    >numbers when a word occurs more than once on the same page. If you can
    >achieve that elegantly, please post it!
    >

    Page 2

    Don't know about elegance, but I wanted to play with my new 2.3 version ;-)

    Assuming the Timbot has special-cased small sets efficiently, and
    assuming pageSeq below is a generator can yield a sequence of word-iterable
    page objects (e.g. lists of words left after cleaning out punctuation
    etc of a page, however page boundaries are detected, etc etc):

    (FTHOI, the following pageSeq generator expects to find a "Page xxx"
    line ending each page, with nothing else on the line). I'm going to use
    this post as a "book" ;-)

    Page 3


    ====< bookindex.py >=================================================

    keepalnum = ''.join([c.isalnum() and c or ' ' for c in [chr(i) for i in range(256)]])

    def pageSeq(bookFile): # expect open bookFile ready to read
    pageWords = []
    pageId = '<prologue>'
    while 1:
    line = bookFile.readline()
    if not line: break
    lineToks = line.translate(keepalnum).split()
    if not lineToks: continue
    # expect single footer line with only "Page pgid" at end of page
    if len(lineToks)==2 and lineToks[0]=='Page':
    pageId = lineToks[1]
    yield pageId, pageWords
    pageWords = []; pageId = '(after %s)'% pageId
    else: pageWords.extend([tok for tok in lineToks if tok.isalpha()]) # rej digits in words
    if pageWords:
    yield pageId, pageWords

    def bookIndex(bookFile):
    # (untested, requires 2.3 features)
    from sets import Set
    wordIndex = {}
    for pgId, page in pageSeq(bookFile):
    for word in page:
    try:
    wordIndex[word].add(pgId)
    except KeyError:
    wordIndex[word] = Set([pgId])

    words = wordIndex.keys()
    words.sort()
    for word in words:
    pageIds = list(wordIndex[word])
    pnMaxWidth = max(map(len,pageIds))
    pageIds = [pid.rjust(pnMaxWidth) for pid in pageIds]
    pageIds.sort()
    pageIds = map(str.strip, pageIds)
    yield (word, ', '.join(pageIds))

    if __name__ == '__main__':
    import sys
    args = sys.argv[1:]
    if args:
    arg = args.pop(0)
    if arg == '-': f = sys.stdin
    else: f = file(arg)
    iCol=0
    for wordOccLine in bookIndex(f):
    print ('%14s: %-10s' % wordOccLine),
    if iCol%3 == 2: print
    iCol += 1
    print
    else:
    raise SystemExit(
    'Usage: python bookindex.py bookfilepath\n'
    ' a single minus for bookfilepath uses sys.stdin')
    =====================================================================
    Page CODE

    I started to post just the bookIndex generator (still marked untested, which is
    almost right still ;-) Then one think led to another ;-)

    Page LAST

    I'll copy above this line to the clip board and filter it through. Result:

    [22:22] C:\pywk\clp>getclip | python bookindex.py - |more
    Amen: 1 Assuming: 3 Aug: 1
    Don: 3 Even: 1 FTHOI: 3
    GMT: 1 Hettinger: 1 However: 2
    I: 1, 2, 3, LAST If: 2 John: 1
    KeyError: CODE Machin: 1 None: 1
    On: 1 Page: 3, CODE Raymond: 1
    Sat: 1 Set: CODE Sun: 1
    SystemExit: CODE Then: LAST Timbot: 3
    Usage: CODE a: 1, 2, 3, CODE about: 3
    achieve: 2 add: CODE after: 3, CODE
    all: 1 almost: LAST and: 3, CODE
    another: LAST are: 3 arg: CODE
    args: CODE argv: CODE as: 3
    assuming: 3 at: CODE be: 2
    below: 3 book: 2, 3 bookFile: CODE
    bookIndex: CODE, LAST bookfilepath: CODE bookindex: CODE
    boundaries: 3 break: CODE brother: 1
    building: 2 but: 3 c: CODE
    can: 2, 3 cased: 3 chr: CODE
    cleaning: 3 code: 1, 2 continue: CODE
    data: 1 def: CODE detected: 3
    digits: CODE duplicated: 2 e: 3
    each: 3 efficiently: 3 elegance: 3
    elegant: 2 elegantly: 2 else: 3, CODE
    end: CODE ending: 3 etc: 3
    example: 2 except: CODE expect: CODE
    expects: 3 extend: CODE f: CODE
    features: CODE file: CODE find: 3
    following: 3 footer: CODE for: 2, CODE
    from: CODE g: 3 generator: 3, LAST
    going: 3 good: 1 had: 1
    has: 3 have: 2 however: 3
    i: CODE iCol: CODE if: CODE
    import: CODE in: 2, CODE index: 2
    instead: 1 is: 3, LAST isalnum: CODE
    isalpha: CODE it: 1, 2 iterable: 3
    join: CODE just: LAST keepalnum: CODE
    keys: CODE know: 3 led: LAST
    left: 3 len: CODE lexicon: 1
    like: 1 liked: 2 line: 3, CODE
    lineToks: CODE list: CODE lists: 3
    ly: 1 m: 3 main: CODE
    map: CODE marked: LAST max: CODE
    minus: CODE more: 2 my: 3
    n: CODE name: CODE net: 1
    new: 3 not: 2, CODE nothing: 3
    numbers: 2 objects: 3 occurs: 2
    of: 1, 3, CODE on: 2, 3 once: 2
    one: LAST only: CODE open: CODE
    or: CODE out: 3 page: 2, 3, CODE
    pageId: CODE pageIds: CODE pageSeq: 3, CODE
    pageWords: CODE pgId: CODE pgid: CODE
    pid: CODE play: 3 please: 2
    pnMaxWidth: CODE pop: CODE post: 2, 3, LAST
    practice: 2 print: CODE prologue: CODE
    punctuation: 3 py: CODE python: CODE
    raise: CODE range: CODE read: CODE
    readline: CODE ready: CODE recall: 1
    rej: CODE requirement: 2 requires: CODE
    right: LAST rjust: CODE s: CODE
    same: 2 seeing: 1 sequence: 3
    sets: 3, CODE single: CODE sjmachin: 1
    small: 3 somewhere: 1 sort: CODE
    special: 3 split: CODE started: LAST
    starts: 1 stdin: CODE still: LAST
    str: CODE strip: CODE structure: 1
    sys: CODE t: 3 than: 2
    that: 1, 2 the: 2, 3, LAST think: LAST
    this: 1, 3 to: 2, 3, CODE, LAST tok: CODE
    translate: CODE try: CODE untested: CODE, LAST
    use: 3 user: 2 uses: CODE
    verizon: 1 version: 3 wanted: 3
    when: 2 which: LAST while: CODE
    with: 1, 3, CODE word: 2, 3, CODE wordIndex: CODE
    wordOccLine: CODE words: 3, CODE worse: 1
    would: 2 wrote: 1 xxx: 3
    yield: 3, CODE you: 2 yours: 1



    Regards,
    Bengt Richter
    Bengt Richter, Aug 3, 2003
    #11
  12. [Raymond]
    > >index = {}
    > > for pagenum in range(len(pages)):
    > > page = pages[pagenum]
    > > for word in page:
    > > index.setdefault(word, []).append(pagenum)
    > >
    > >it-all-starts-with-a-good-data-structure-ly yours,


    [John Machin]
    > Amen, brother.
    >
    > Even worse: I recall seeing code somewhere that had a data structure
    > like this:
    >
    > {k1: [v1,v2], k2: v3, k3: None, ...}
    > instead of
    > {k1: [v1,v2], k2: [v3], k3: [], ...}
    >
    > I liked the elegant code example for building a book index. However in
    > practice the user requirement would be not to have duplicated page
    > numbers when a word occurs more than once on the same page. If you can
    > achieve that elegantly, please post it!


    How about a simple Py2.3 clean-up pass:

    # Uniquify and sort each list of page numbers
    for word, pagelist in index.iteritems():
    newlist = dict.fromkeys(pagelist).keys()
    newlist.sort()
    index[word] = newlist


    Raymond Hettinger
    Raymond Hettinger, Aug 4, 2003
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Seongsu Lee
    Replies:
    21
    Views:
    576
    MonkeeSage
    Dec 14, 2007
  2. Albert van der Horst

    dict's as dict's key.

    Albert van der Horst, Jan 13, 2010, in forum: Python
    Replies:
    5
    Views:
    255
    Lie Ryan
    Jan 17, 2010
  3. Christoph Groth

    dict: retrieve the original key by key

    Christoph Groth, May 15, 2011, in forum: Python
    Replies:
    0
    Views:
    194
    Christoph Groth
    May 15, 2011
  4. Christoph Groth

    Re: dict: retrieve the original key by key

    Christoph Groth, May 15, 2011, in forum: Python
    Replies:
    5
    Views:
    289
    Ian Kelly
    May 16, 2011
  5. Victor Hooi
    Replies:
    1
    Views:
    107
    Victor Hooi
    Oct 29, 2013
Loading...

Share This Page