updating dictionaries from/to dictionaries

Discussion in 'Python' started by Brandon, Aug 11, 2008.

  1. Brandon

    Brandon Guest

    Hi all,

    I am not altogether experienced in Python, but I haven't been able to
    find a good example of the syntax that I'm looking for in any tutorial
    that I've seen. Hope somebody can point me in the right direction.

    This should be pretty simple: I have two dictionaries, foo and bar.
    I am certain that all keys in bar belong to foo as well, but I also
    know that not all keys in foo exist in bar. All the keys in both foo
    and bar are tuples (in the bigram form ('word1', 'word2')). I have to
    prime foo so that each key has a value of 1. The values for the keys
    in bar are variable integers. All I want to do is run a loop through
    foo, match any of its keys that also exist in bar, and add those keys'
    values in bar to the preexisting value of 1 for the corresponding key
    in foo. So in the end the key,value pairs in foo won't necessarily
    be, for example, 'tuple1: 1', but also 'tuple2: 31' if tuple2 had a
    value of 30 in bar.

    I *think* the get method might work, but I'm not sure that it can work
    on two dictionaries the way that I'm getting at. I thought that
    converting the dictionaries to lists might work, but I can't see a way
    yet to match the tuple key as x[0][0] in one list for all y in the
    other list. There's just got to be a better way!

    Thanks for any help,
    Brandon
    (trying hard to be Pythonic but isn't there yet)
     
    Brandon, Aug 11, 2008
    #1

  2. for k in foo:
        foo[k] += bar.get(k, 0)

    --
    Read my blog! I depend on your acceptance of my opinion! I am interesting!
    http://ironfroggy-code.blogspot.com/
     
    Calvin Spealman, Aug 11, 2008
    #2

  3. John Machin

    John Machin Guest

    On Aug 11, 6:24 pm, "Calvin Spealman" <> wrote:
    > for k in foo:
    >   foo[k] += bar.get(k, 0)


    An alternative:

    for k in bar:
        foo[k] += bar[k]

    The OP asserts that foo keys are a superset of bar keys. If that
    assertion is not true (i.e. there are keys in bar that are not in foo),
    your code will silently ignore them whereas mine will cause an
    exception to be raised (better behaviour IMHO). If the assertion is
    true, mine runs faster (even when len(foo) == len(bar)).
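
    A minimal sketch of the behavioural difference described above. The
    sample bigrams are made up for illustration, and bar deliberately
    contains one key that foo lacks:

```python
# Hypothetical sample data; ("dog", "ran") is in bar but not in foo.
foo = {("the", "cat"): 1, ("cat", "sat"): 1}
bar = {("the", "cat"): 30, ("dog", "ran"): 5}

# Iterating over foo with get(): the stray key in bar is silently ignored.
merged_quiet = dict(foo)
for k in merged_quiet:
    merged_quiet[k] += bar.get(k, 0)

# Iterating over bar: the stray key raises KeyError.
merged_strict = dict(foo)
try:
    for k in bar:
        merged_strict[k] += bar[k]
    stray_key = None
except KeyError as exc:
    stray_key = exc.args[0]

print(merged_quiet)
print("stray key:", stray_key)
```

    Which behaviour is preferable depends on whether a key missing from
    foo should count as a bug in the data.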
     
    John Machin, Aug 11, 2008
    #3
  4. On Mon, 11 Aug 2008 00:27:46 -0700, Brandon wrote:

    > This should be pretty simple: I have two dictionaries, foo and bar. I
    > am certain that all keys in bar belong to foo as well, but I also know
    > that not all keys in foo exist in bar. All the keys in both foo and bar
    > are tuples (in the bigram form ('word1', 'word2')). I have to prime foo
    > so that each key has a value of 1.


    The old way:

    foo = {}
    for key in all_the_keys:
        foo[key] = 1


    The new way:

    foo = dict.fromkeys(all_the_keys, 1)
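
    As a quick check that the two ways give the same result, here is a
    small sketch; the contents of all_the_keys are invented for the
    example:

```python
# Hypothetical bigram keys standing in for all_the_keys.
all_the_keys = [("the", "cat"), ("cat", "sat"), ("sat", "down")]

# The explicit loop.
foo_loop = {}
for key in all_the_keys:
    foo_loop[key] = 1

# The one-liner.
foo_fromkeys = dict.fromkeys(all_the_keys, 1)

print(foo_loop == foo_fromkeys)
```

    One caveat: fromkeys gives every key the very same value object. That
    is harmless for an integer like 1, but it would be a surprise with a
    mutable default such as a list.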


    > The values for the keys in bar are
    > variable integers. All I want to do is run a loop through foo, match
    > any of its keys that also exist in bar, and add those key's values in
    > bar to the preexisting value of 1 for the corresponding key in foo. So
    > in the end the key,value pairs in foo won't necessarily be, for example,
    > 'tuple1: 1', but also 'tuple2: 31' if tuple2 had a value of 30 in bar.


    Harder to say what you want to do than to just do it.

    The long way:

    for key in foo:
        if bar.has_key(key):
            foo[key] = foo[key] + bar[key]



    Probably a better way:

    for key, value in foo.iteritems():
        foo[key] = value + bar.get(key, 0)



    You should also investigate the update method of dictionaries. From an
    interactive session, type:

    help({}.update)

    then the Enter key.



    --
    Steven
     
    Steven D'Aprano, Aug 11, 2008
    #4
  5. Brandon

    Brandon Guest

    "Harder to say what you want to do than to just do it."

    The truly terrible thing is when you know that's the case even as
    you're saying it. Thanks for the help, all!
     
    Brandon, Aug 12, 2008
    #5
  6. John Machin

    John Machin Guest

    On Aug 12, 2:52 am, Steven D'Aprano
    <st...@REMOVE-THIS-cybersource.com.au> wrote:
    > On Mon, 11 Aug 2008 00:27:46 -0700, Brandon wrote:
    > > This should be pretty simple: I have two dictionaries, foo and bar. I
    > > am certain that all keys in bar belong to foo as well, but I also know
    > > that not all keys in foo exist in bar. All the keys in both foo and bar
    > > are tuples (in the bigram form ('word1', 'word2')). I have to prime foo
    > > so that each key has a value of 1.

    >

    [snip]
    > > The values for the keys in bar are
    > > variable integers. All I want to do is run a loop through foo, match
    > > any of its keys that also exist in bar, and add those key's values in
    > > bar to the preexisting value of 1 for the corresponding key in foo. So
    > > in the end the key,value pairs in foo won't necessarily be, for example,
    > > 'tuple1: 1', but also 'tuple2: 31' if tuple2 had a value of 30 in bar.

    >
    > Harder to say what you want to do than to just do it.
    >
    > The long way:
    >
    > for key in foo:
    >     if bar.has_key(key):


    dict.has_key(key) is nigh on obsolete since Python 2.2 introduced the
    "key in dict" syntax.

    >         foo[key] = foo[key] + bar[key]


    and foo[key] += bar[key] works in Python 2.1, maybe earlier.

    >
    > Probably a better way:
    >
    > for key, value in foo.iteritems():
    >     foo[key] = value + bar.get(key, 0)


    Yeah, probably better than using has_key ...

    >
    > You should also investigate the update method of dictionaries. From an
    > interactive session, type:
    >
    > help({}.update)
    >
    > then the Enter key.


    I'm not sure what relevance dict.update has to the OP's problem.

    Help is fine for when you need a reminder of the syntax of some method
    you already know about. I'd suggest reading the manual of a modern
    version of Python (http://docs.python.org/lib/typesmapping.html) to
    get an overview of all the dict methods. The manual includes useful
    information that isn't in help, like "a.has_key(k) Equivalent to k
    in a, use that form in new code".
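
    For readers on Python 3, where dict.has_key and dict.iteritems no
    longer exist, the loops discussed above might be sketched as follows.
    The sample data is an assumption made for the example:

```python
# Hypothetical sample data.
foo = {("the", "cat"): 1, ("cat", "sat"): 1}
bar = {("the", "cat"): 30}

# items() takes the place of iteritems(); this builds a new dict.
combined = {key: value + bar.get(key, 0) for key, value in foo.items()}

# "key in bar" takes the place of bar.has_key(key); this updates foo in place.
for key in foo:
    if key in bar:
        foo[key] += bar[key]

print(combined == foo)
```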
     
    John Machin, Aug 12, 2008
    #6
  7. Brandon

    Brandon Guest

    I wasn't sure about the update method either, since AFAICT (not far)
    the values would in fact update, not append as I needed them to. But
    the iteritems and get combo definitely worked for me.
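
    A short sketch of that difference, with invented data: update
    replaces the value stored under a key, while the loop adds to it.

```python
# Hypothetical sample data.
foo = {("the", "cat"): 1, ("cat", "sat"): 1}
bar = {("the", "cat"): 30}

# update() overwrites: the primed 1 is lost.
replaced = dict(foo)
replaced.update(bar)

# Adding in a loop keeps the primed 1 and adds the count from bar.
added = dict(foo)
for key, value in bar.items():
    added[key] += value

print(replaced[("the", "cat")], added[("the", "cat")])
```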

    Thank you for the suggested link. I'm familiar with that page, but my
    skill level isn't so far along yet that I can more or less intuitively
    see how to combine methods, particularly in dictionaries. What would
    be a dream for me is if somebody just had tons of use-case examples -
    basically this post, condensed, for every potent combination of
    dictionary methods. A guy can dream.
     
    Brandon, Aug 12, 2008
    #7
  8. John Machin

    John Machin Guest

    On Aug 12, 9:14 am, Brandon <> wrote:
    > I wasn't sure about the update method either, since AFAICT (not far)
    > the values would in fact update, not append as I needed them to.


    "append"? Don't you mean "add"???

    > But
    > the iteritems and get combo definitely worked for me.


    Under some definition of "worked", yes, it would. What were your
    selection criteria?

    >
    > Thank you for the suggested link. I'm familiar with that page, but my
    > skill level isn't so far along yet that I can more or less intuitively
    > see how to combine methods, particularly in dictionaries. What would
    > be a dream for me is if somebody just had tons of use-case examples -
    > basically this post, condensed, for every potent combination of
    > dictionary methods. A guy can dream.


    Nobody is going to write that, and if they did, what would you do?
    Read it linearly, trying to find a match to your use-case? Forget
    dreams. What you need to do is practice translating from your
    requirements into Python, and it's not all that hard:

    "run a loop through foo" -> for key in foo:
    "match any of its keys that also exist in bar" -> if key in bar:
    "add those key's values in bar to the preexisting value for the
    corresponding key in foo" -> foo[key] += bar[key]

    But you also need to examine your requirements:
    (1) on a mechanical level, as I tried to point out in my first
    response, if as you say all keys in bar are also in foo, you can
    iterate over bar instead of foo, which is faster.
    (2) at a higher level, it looks like bar contains a key for every
    possible bigram, and you are tallying actual counts in bar, and what
    you want out for any bigram is (1 + number_of_occurrences) i.e.
    Laplace adjustment. Are you sure you really need to do this two-dict
    caper? Consider using only one dictionary (zot):

    Initialise:
    zot = {}

    To tally:
    if key in zot:
        zot[key] += 1
    else:
        zot[key] = 1

    Adjusted count (irrespective of whether bigram exists or not):
    zot.get(key, 0) + 1

    This method uses space proportional to the number of bigrams that
    actually exist. You might also consider collections.defaultdict, but
    such a dict may end up containing entries for keys that you ask about
    (depending on how you ask), not just ones that exist.
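
    A small sketch of the single-dictionary tally and of the defaultdict
    caveat. The token list is made up, and the names dd and unseen are
    only for illustration:

```python
from collections import defaultdict

# Hypothetical token stream and its bigrams.
words = ["the", "cat", "sat", "the", "cat"]
bigrams = list(zip(words, words[1:]))

# Plain dict tally, as described above.
zot = {}
for key in bigrams:
    if key in zot:
        zot[key] += 1
    else:
        zot[key] = 1

unseen = ("cat", "ran")
adjusted = zot.get(unseen, 0) + 1   # get() does not add the key to zot

# defaultdict: indexing a missing key inserts it; get() does not.
dd = defaultdict(int)
for key in bigrams:
    dd[key] += 1
size_before = len(dd)
_ = dd[unseen]                      # creates an entry with value 0
_ = dd.get(("dog", "ran"), 0)       # leaves dd unchanged
size_after = len(dd)

print(adjusted, size_before, size_after)
```

    So with a defaultdict, asking via dd[key] grows the dict, while asking
    via dd.get(key, 0) or "key in dd" does not.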

    HTH,
    John
     
    John Machin, Aug 12, 2008
    #8
  9. Brandon

    Brandon Guest

    John:

    > "append"? Don't you mean "add"???


    Yes, that is what I meant, my apologies.

    > What you need to do is practice translating from your
    > requirements into Python, and it's not all that hard:
    >
    > "run a loop through foo" -> for key in foo:
    > "match any of its keys that also exist in bar" -> if key in bar:
    > "add those key's values in bar to the preexisting value for the
    > corresponding key in foo" -> foo[key] += bar[key]


    Due to my current level of numbskullery, when I start to see things
    like tuples as keys, the apparent ease of this evaporates in front of
    my eyes! I know that I need more practice, though, and it will come.
    >
    > But you also need to examine your requirements:
    > (1) on a mechanical level, as I tried to point out in my first
    > response, if as you say all keys in bar are also in foo, you can
    > iterate over bar instead of and faster than iterating over foo.
    > (2) at a higher level, it looks like bar contains a key for every
    > possible bigram, and you are tallying actual counts in bar, and what
    > you want out for any bigram is (1 + number_of_occurrences) i.e.
    > Laplace adjustment. Are you sure you really need to do this two-dict
    > caper? Consider using only one dictionary (zot):
    >
    > Initialise:
    > zot = {}
    >
    > To tally:
    > if key in zot:
    >     zot[key] += 1
    > else:
    >     zot[key] = 1
    >
    > Adjusted count (irrespective of whether bigram exists or not):
    > zot.get(key, 0) + 1
    >
    > This method uses space proportional to the number of bigrams that
    > actually exist. You might also consider collections.defaultdict, but
    > such a dict may end up containing entries for keys that you ask about
    > (depending on how you ask), not just ones that exist.


    You are very correct about the Laplace adjustment. However, a more
    precise statement of my overall problem would involve training and
    testing which utilizes bigram probabilities derived in part from the
    Laplace adjustment; as I understand the workflow that I should follow,
    I can't allow myself to be constrained only to bigrams that actually
    exist in training or my overall probability when I run through testing
    will be thrown off to 0 as soon as a test bigram that doesn't exist in
    training is encountered. Hence my desire to find all possible bigrams
    in train (having taken steps to ensure proper set relations between
    train and test). The best way I can currently see to do this is with
    my current two-dictionary "caper", and by iterating over foo, not
    bar :)

    And yes, I know it seems silly to wish for that document with the use-
    cases, but personally speaking, even if the thing is rather lengthy, I
    would probably pick up better techniques for general knowledge by
    reading through it and seeing the examples.

    I actually think that there would be a good market (if only in
    mindshare) for a thorough examination of the power of lists, nested
    lists, and dictionaries (with glorious examples) - something that
    might appeal to a lot of non-full time programmers who need to script
    a lot but want to be efficient about it, yet don't want to deal with a
    tutorial that unnecessarily covers all the aspects of Python. My
    $0.027 (having gone up due to the commodities markets).

    Thanks again for the input, I do appreciate it!

    Brandon
     
    Brandon, Aug 12, 2008
    #9
  10. John Machin

    John Machin Guest

    On Aug 12, 12:26 pm, Brandon <> wrote:

    >
    > You are very correct about the Laplace adjustment. However, a more
    > precise statement of my overall problem would involve training and
    > testing which utilizes bigram probabilities derived in part from the
    > Laplace adjustment; as I understand the workflow that I should follow,
    > I can't allow myself to be constrained only to bigrams that actually
    > exist in training or my overall probability when I run through testing
    > will be thrown off to 0 as soon as a test bigram that doesn't exist in
    > training is encountered. Hence my desire to find all possible bigrams
    > in train (having taken steps to ensure proper set relations between
    > train and test).
    > The best way I can currently see to do this is with
    > my current two-dictionary "caper", and by iterating over foo, not
    > bar :)


    I can't grok large chunks of the above, especially these troublesome
    test bigrams that don't exist in training but which you desire to find
    in train(ing?).

    However let's look at the mechanics: Are you now saying that your
    original assertion "I am certain that all keys in bar belong to foo as
    well" was not quite "precise"? If not, please explain why you think
    you need to iterate (slowly) over foo in order to accomplish your
    stated task.
     
    John Machin, Aug 12, 2008
    #10
  11. Brandon

    Brandon Guest

    On Aug 12, 7:26 am, John Machin <> wrote:
    > On Aug 12, 12:26 pm, Brandon <> wrote:
    >
    >
    >
    > > You are very correct about the Laplace adjustment. However, a more
    > > precise statement of my overall problem would involve training and
    > > testing which utilizes bigram probabilities derived in part from the
    > > Laplace adjustment; as I understand the workflow that I should follow,
    > > I can't allow myself to be constrained only to bigrams that actually
    > > exist in training or my overall probability when I run through testing
    > > will be thrown off to 0 as soon as a test bigram that doesn't exist in
    > > training is encountered. Hence my desire to find all possible bigrams
    > > in train (having taken steps to ensure proper set relations between
    > > train and test).
    > > The best way I can currently see to do this is with
    > > my current two-dictionary "caper", and by iterating over foo, not
    > > bar :)

    >
    > I can't grok large chunks of the above, especially these troublesome
    > test bigrams that don't exist in training but which you desire to find
    > in train(ing?).
    >
    > However let's look at the mechanics: Are you now saying that your
    > original assertion "I am certain that all keys in bar belong to foo as
    > well" was not quite "precise"? If not, please explain why you think
    > you need to iterate (slowly) over foo in order to accomplish your
    > stated task.


    I was merely trying to be brief. The statement of my certainty about
    foo/bar was precise as a stand-alone statement, but I was attempting
    to say that within the context of the larger problem, I need to
    iterate over foo.

    This is actually for a school project, but as I have already worked
    out a feasible (if perhaps not entirely optimized) workflow, I don't
    feel overly guilty about sharing this or getting some small amount of
    input - but certainly none is asked for beyond what you've given
    me :) I am tasked with finding the joint probability of a test
    sequence, utilizing bigram probabilities derived from train(ing)
    counts.

    I have ensured that all members (unigrams) of test are also members of
    train, although I do not have any idea as to bigram frequencies in
    test. Thus I need to iterate over all members of train for training
    bigram frequencies in order to be prepared for any test bigram I might
    encounter.

    The problem is that without Laplace smoothing, many POTENTIAL bigrams
    in train might have an ACTUAL frequency of 0 in train. And if one or
    more of those bigrams which have 0 frequency in train is actually
    found in test, the joint probability of test will become 0, and that's
    no fun at all. So I made foo dictionary that creates all POTENTIAL
    training bigrams with a smoothed frequency of 1. I also made bar
    dictionary that creates keys of all ACTUAL training bigrams with their
    actual values. I needed to combine the two dictionaries as a first
    step to eventually finding the test sequence probability. So any
    bigram in test will at least have a smoothed train frequency of 1 and
    possibly a smoothed train frequency of the existing train value + 1.
    Having iterated over foo, foo becomes the dictionary which holds these
    smoothed & combined train frequencies. I don't see a way to combine
    the two types of counts into one dictionary without keeping them
    separate first. Hence the caper.
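
    The two-dictionary workflow described above could be sketched roughly
    like this. The training tokens and the names train and vocab are
    assumptions for the example, not the actual project code:

```python
from itertools import product

# Hypothetical training tokens.
train = ["the", "cat", "sat", "on", "the", "mat"]
vocab = sorted(set(train))

# foo: every potential bigram over the vocabulary, primed to 1.
foo = dict.fromkeys(product(vocab, repeat=2), 1)

# bar: bigrams actually observed in training, with their counts.
bar = {}
for bigram in zip(train, train[1:]):
    bar[bigram] = bar.get(bigram, 0) + 1

# Combine: add the observed counts onto the primed values.
for key in foo:
    foo[key] += bar.get(key, 0)

print(len(foo), foo[("the", "cat")], foo[("cat", "the")])
```

    Note that foo holds len(vocab) squared entries, so it grows quickly
    with the vocabulary size.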

    Sorry for the small essay.

    P.S. I do realize that there are better smoothing methods than
    Laplace, but that is what the problem has specified.
     
    Brandon, Aug 12, 2008
    #11
  12. John Machin

    John Machin Guest

    On Aug 13, 5:33 am, Brandon <> wrote:
    > On Aug 12, 7:26 am, John Machin <> wrote:
    >
    >
    >
    > > On Aug 12, 12:26 pm, Brandon <> wrote:

    >
    > > > You are very correct about the Laplace adjustment. However, a more
    > > > precise statement of my overall problem would involve training and
    > > > testing which utilizes bigram probabilities derived in part from the
    > > > Laplace adjustment; as I understand the workflow that I should follow,
    > > > I can't allow myself to be constrained only to bigrams that actually
    > > > exist in training or my overall probability when I run through testing
    > > > will be thrown off to 0 as soon as a test bigram that doesn't exist in
    > > > training is encountered. Hence my desire to find all possible bigrams
    > > > in train (having taken steps to ensure proper set relations between
    > > > train and test).
    > > > The best way I can currently see to do this is with
    > > > my current two-dictionary "caper", and by iterating over foo, not
    > > > bar :)

    >
    > > I can't grok large chunks of the above, especially these troublesome
    > > test bigrams that don't exist in training but which you desire to find
    > > in train(ing?).

    >
    > > However let's look at the mechanics: Are you now saying that your
    > > original assertion "I am certain that all keys in bar belong to foo as
    > > well" was not quite "precise"? If not, please explain why you think
    > > you need to iterate (slowly) over foo in order to accomplish your
    > > stated task.

    >
    > I was merely trying to be brief. The statement of my certainty about
    > foo/bar was precise as a stand-alone statement, but I was attempting
    > to say that within the context of the larger problem, I need to
    > iterate over foo.
    >
    > This is actually for a school project, but as I have already worked
    > out a feasible (if perhaps not entirely optimized) workflow, I don't
    > feel overly guilty about sharing this or getting some small amount of
    > input - but certainly none is asked for beyond what you've given
    > me :) I am tasked with finding the joint probability of a test
    > sequence, utilizing bigram probabilities derived from train(ing)
    > counts.
    >
    > I have ensured that all members (unigrams) of test are also members of
    > train, although I do not have any idea as to bigram frequencies in
    > test. Thus I need to iterate over all members of train for training
    > bigram frequencies in order to be prepared for any test bigram I might
    > encounter.
    >
    > The problem is that without Laplace smoothing, many POTENTIAL bigrams
    > in train might have an ACTUAL frequency of 0 in train. And if one or
    > more of those bigrams which have 0 frequency in train is actually
    > found in test, the joint probability of test will become 0, and that's
    > no fun at all. So I made foo dictionary that creates all POTENTIAL
    > training bigrams with a smoothed frequency of 1. I also made bar
    > dictionary that creates keys of all ACTUAL training bigrams with their
    > actual values. I needed to combine the two dictionaries as a first
    > step to eventually finding the test sequence probability.


    Let's assume this need is real for the moment. Put this loop in your
    code after the creation of foo and bar and before you "combine" them:

    for key in bar:
        assert key in foo

    Does it cause an exception? If so, either:
    you have a bug in the creation of foo or bar (or both!),
    or:
    the certainty you had in making your opening statement "I am
    certain that all keys in bar belong to foo as well" was not well-
    founded.

    If however it is correct that all keys in bar are also to be found in
    foo, then the following snippets of code are equivalent for your
    purpose of adding bar frequencies into foo:

    (1) iterating over foo:
    for key in foo:
        foo[key] += bar.get(key, 0)

    (2) iterating over bar:
    for key in bar:
        foo[key] += bar[key]

    I (again) challenge you to say *why* you feel that the "iterating over
    bar" solution will not work.
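
    To make the equivalence concrete, here is a sketch that runs both
    snippets on copies of the same made-up data, after checking the
    superset assumption:

```python
# Hypothetical data in which every key of bar is also a key of foo.
foo_a = dict.fromkeys([("a", "b"), ("b", "c"), ("c", "d")], 1)
bar = {("a", "b"): 30, ("c", "d"): 4}
assert all(key in foo_a for key in bar)

foo_b = dict(foo_a)

# (1) iterating over foo
for key in foo_a:
    foo_a[key] += bar.get(key, 0)

# (2) iterating over bar
for key in bar:
    foo_b[key] += bar[key]

print(foo_a == foo_b)
```

    Version (2) only touches len(bar) keys, which is where any speed
    difference would come from when bar is much smaller than foo.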


    > So any
    > bigram in test will at least have a smoothed train frequency of 1 and
    > possibly a smoothed train frequency of the existing train value + 1.
    > Having iterated over foo, foo becomes the dictionary which holds these
    > smoothed & combined train frequencies. I don't see a way to combine
    > the two types of counts into one dictionary without keeping them
    > separate first. Hence the caper.
    >


    Let's start with "So I made foo dictionary that creates all POTENTIAL
    training bigrams with a smoothed frequency of 1". Let me guess that
    you have a set W of all words ever used/usable in the language of the
    texts that you are considering ... let N = len(W). So the number of
    potential bigrams is N**2. Hmmm, how large is N, and have you actually
    run the foo-building code yet?

    Now, assuming foo does fit in memory etc, you get to the stage where
    you have a test message containing a bigram b = (word1, word2). Its
    smoothed frequency will be foo[b]. If b is in bar, this should be
    equal to bar[b] + 1. Otherwise it will be 1.

    So:
    (1) foo[b] == bar.get(b, 0) + 1
    (2) foo is redundant. If you want to check that b is "legal", use
    (word1 in W and word2 in W).
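
    Points (1) and (2) might be sketched as below. The function name
    smoothed_count, the vocabulary W and the counts in bar are all
    hypothetical, chosen just to illustrate the idea:

```python
# Hypothetical vocabulary and observed bigram counts.
W = {"the", "cat", "sat"}
bar = {("the", "cat"): 30, ("cat", "sat"): 2}

def smoothed_count(b):
    """Return the add-one count for bigram b without needing foo."""
    word1, word2 = b
    if not (word1 in W and word2 in W):
        raise KeyError(b)           # b is not a "legal" bigram
    return bar.get(b, 0) + 1

# Size foo would need if it were built explicitly: N**2 entries.
potential = len(W) ** 2

print(smoothed_count(("the", "cat")), smoothed_count(("sat", "the")), potential)
```

    Here only bar is stored, so memory use follows the number of bigrams
    actually seen rather than N**2.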

    Please attempt to refute the specific points above, rather than
    writing another essay :)

    Cheers,
    John
     
    John Machin, Aug 12, 2008
    #12
  13. Brandon

    Brandon Guest


    > (1) iterating over foo:
    > for key in foo:
    >     foo[key] += bar.get(key, 0)
    >
    > (2) iterating over bar:
    > for key in bar:
    >     foo[key] += bar[key]
    >
    > I (again) challenge you to say *why* you feel that the "iterating over
    > bar" solution will not work.



    Well if you're going to be clever enough to iterate over bar and then
    send the results to another dictionary altogether, I obviously cannot
    put up a good argument on this matter!

    Thanks for the input, I appreciate it.
     
    Brandon, Aug 15, 2008
    #13