updating dictionaries from/to dictionaries

Brandon

Hi all,

I am not altogether experienced in Python, but I haven't been able to
find a good example of the syntax that I'm looking for in any tutorial
that I've seen. Hope somebody can point me in the right direction.

This should be pretty simple: I have two dictionaries, foo and bar.
I am certain that all keys in bar belong to foo as well, but I also
know that not all keys in foo exist in bar. All the keys in both foo
and bar are tuples (in the bigram form ('word1', 'word2')). I have to
prime foo so that each key has a value of 1. The values for the keys
in bar are variable integers. All I want to do is run a loop through
foo, match any of its keys that also exist in bar, and add those key's
values in bar to the preexisting value of 1 for the corresponding key
in foo. So in the end the key,value pairs in foo won't necessarily
be, for example, 'tuple1: 1', but also 'tuple2: 31' if tuple2 had a
value of 30 in bar.

I *think* the get method might work, but I'm not sure that it can work
on two dictionaries the way that I'm getting at. I thought that
converting the dictionaries to lists might work, but I can't see a way
yet to match the tuple key as x[0][0] in one list for all y in the
other list. There's just got to be a better way!

Thanks for any help,
Brandon
(trying hard to be Pythonic but isn't there yet)
 
Calvin Spealman

for k in foo:
    foo[k] += bar.get(k, 0)

 
John Machin

for k in foo:
  foo[k] += bar.get(k, 0)

An alternative:

for k in bar:
    foo[k] += bar[k]

The OP asserts that foo keys are a superset of bar keys. If that
assertion is not true (i.e. there are keys in bar that are not in foo),
your code will silently ignore them whereas mine will cause an
exception to be raised (better behaviour IMHO). If the assertion is
true, mine runs faster (even when len(foo) == len(bar)).
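
To make the difference concrete, here is a minimal sketch in Python 3. The sample bigrams and the helper names `add_by_foo` and `add_by_bar` are invented for illustration; they are not from the original posts.

```python
def add_by_foo(foo, bar):
    """Iterate over foo; keys of bar that foo lacks are silently skipped."""
    for k in foo:
        foo[k] += bar.get(k, 0)


def add_by_bar(foo, bar):
    """Iterate over bar; a key of bar that foo lacks raises KeyError."""
    for k in bar:
        foo[k] += bar[k]


# Hypothetical data: bar contains one key that is missing from foo.
bar = {("the", "cat"): 30, ("stray", "key"): 5}

foo_a = {("the", "cat"): 1, ("cat", "sat"): 1}
add_by_foo(foo_a, bar)      # ("stray", "key") is ignored without warning

foo_b = {("the", "cat"): 1, ("cat", "sat"): 1}
try:
    add_by_bar(foo_b, bar)
    raised = False
except KeyError:
    raised = True           # the broken assumption is reported
```

Note that when the exception fires, the keys processed before the offending one have already been updated, so foo should be rebuilt after fixing the data.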
 
Steven D'Aprano

This should be pretty simple: I have two dictionaries, foo and bar. I
am certain that all keys in bar belong to foo as well, but I also know
that not all keys in foo exist in bar. All the keys in both foo and bar
are tuples (in the bigram form ('word1', 'word2')). I have to prime foo
so that each key has a value of 1.

The old way:

foo = {}
for key in all_the_keys:
    foo[key] = 1


The new way:

foo = dict.fromkeys(all_the_keys, 1)
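
As a quick check that the two forms build the same dictionary, here is a small sketch. The contents of `all_the_keys` are made up; in the thread they would be all the potential bigrams.

```python
# Hypothetical sample keys standing in for the real bigram list.
all_the_keys = [("the", "cat"), ("cat", "sat"), ("sat", "down")]

# The explicit loop.
foo_loop = {}
for key in all_the_keys:
    foo_loop[key] = 1

# The one-liner: every key gets the same starting value.
foo_fromkeys = dict.fromkeys(all_the_keys, 1)

same = foo_loop == foo_fromkeys
```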

The values for the keys in bar are
variable integers. All I want to do is run a loop through foo, match
any of its keys that also exist in bar, and add those key's values in
bar to the preexisting value of 1 for the corresponding key in foo. So
in the end the key,value pairs in foo won't necessarily be, for example,
'tuple1: 1', but also 'tuple2: 31' if tuple2 had a value of 30 in bar.

Harder to say what you want to do than to just do it.

The long way:

for key in foo:
    if bar.has_key(key):
        foo[key] = foo[key] + bar[key]



Probably a better way:

for key, value in foo.iteritems():
    foo[key] = value + bar.get(key, 0)



You should also investigate the update method of dictionaries. From an
interactive session, type:

help({}.update)

then the Enter key.
 
Brandon

"Harder to say what you want to do than to just do it."

The truly terrible thing is when you know that's the case even as
you're saying it. Thanks for the help, all!
 
John Machin

This should be pretty simple: I have two dictionaries, foo and bar. I
am certain that all keys in bar belong to foo as well, but I also know
that not all keys in foo exist in bar. All the keys in both foo and bar
are tuples (in the bigram form ('word1', 'word2')). I have to prime foo
so that each key has a value of 1. [snip]
The values for the keys in bar are
variable integers. All I want to do is run a loop through foo, match
any of its keys that also exist in bar, and add those key's values in
bar to the preexisting value of 1 for the corresponding key in foo. So
in the end the key,value pairs in foo won't necessarily be, for example,
'tuple1: 1', but also 'tuple2: 31' if tuple2 had a value of 30 in bar.

Harder to say what you want to do than to just do it.

The long way:

for key in foo:
    if bar.has_key(key):

dict.has_key(key) is nigh on obsolete since Python 2.2 introduced the
"key in dict" syntax.

        foo[key] = foo[key] + bar[key]

and foo[key] += bar[key] works in Python 2.1, maybe earlier.

Probably a better way:

for key, value in foo.iteritems():
    foo[key] = value + bar.get(key, 0)

Yeah, probably better than using has_key ...

You should also investigate the update method of dictionaries. From an
interactive session, type:

help({}.update)

then the Enter key.

I'm not sure what relevance dict.update has to the OP's problem.

Help is fine for when you need a reminder of the syntax of some method
you already know about. I'd suggest reading the manual of a modern
version of Python (http://docs.python.org/lib/typesmapping.html) to
get an overview of all the dict methods. The manual includes useful
information that isn't in help, like "a.has_key(k) Equivalent to k
in a, use that form in new code".
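
For readers on Python 3, where `has_key` and `iteritems` no longer exist, the loops discussed in this thread can be written roughly as follows. This is only a sketch with invented sample data.

```python
foo = {("the", "cat"): 1, ("cat", "sat"): 1}
bar = {("the", "cat"): 30}

# "key in dict" replaces dict.has_key(key).
for key in foo:
    if key in bar:
        foo[key] += bar[key]

# dict.items() replaces dict.iteritems(). Assigning to existing keys
# while iterating is safe because the set of keys does not change.
foo2 = {("the", "cat"): 1, ("cat", "sat"): 1}
for key, value in foo2.items():
    foo2[key] = value + bar.get(key, 0)
```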
 
Brandon

I wasn't sure about the update method either, since AFAICT (not far)
the values would in fact update, not append as I needed them to. But
the iteritems and get combo definitely worked for me.
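
A small sketch of that difference, with invented data: `update` replaces the value stored under a shared key, while the `get` loop adds to it.

```python
foo = {("the", "cat"): 1, ("cat", "sat"): 1}
bar = {("the", "cat"): 30}

# update() overwrites: the primed value of 1 is lost.
overwritten = dict(foo)
overwritten.update(bar)

# The get() loop adds bar's value to what foo already holds.
summed = dict(foo)
for key in summed:
    summed[key] += bar.get(key, 0)
```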

Thank you for the suggested link. I'm familiar with that page, but my
skill level isn't so far along yet that I can more or less intuitively
see how to combine methods, particularly in dictionaries. What would
be a dream for me is if somebody just had tons of use-case examples -
basically this post, condensed, for every potent combination of
dictionary methods. A guy can dream.
 
John Machin

I wasn't sure about the update method either, since AFAICT (not far)
the values would in fact update, not append as I needed them to.

"append"? Don't you mean "add"???

But the iteritems and get combo definitely worked for me.

Under some definition of "worked", yes, it would. What were your
selection criteria?

Thank you for the suggested link. I'm familiar with that page, but my
skill level isn't so far along yet that I can more or less intuitively
see how to combine methods, particularly in dictionaries. What would
be a dream for me is if somebody just had tons of use-case examples -
basically this post, condensed, for every potent combination of
dictionary methods. A guy can dream.

Nobody is going to write that, and if they did, what would you do?
Read it linearly, trying to find a match to your use-case? Forget
dreams. What you need to do is practice translating from your
requirements into Python, and it's not all that hard:

"run a loop through foo" -> for key in foo:
"match any of its keys that also exist in bar" -> if key in bar:
"add those key's values in bar to the preexisting value for the
corresponding key in foo" -> foo[key] += bar[key]

But you also need to examine your requirements:
(1) on a mechanical level, as I tried to point out in my first
response, if as you say all keys in bar are also in foo, you can
iterate over bar instead of and faster than iterating over foo.
(2) at a higher level, it looks like bar contains a key for every
possible bigram, and you are tallying actual counts in bar, and what
you want out for any bigram is (1 + number_of_occurrences) i.e.
Laplace adjustment. Are you sure you really need to do this two-dict
caper? Consider using only one dictionary (zot):

Initialise:
zot = {}

To tally:
if key in zot:
    zot[key] += 1
else:
    zot[key] = 1

Adjusted count (irrespective of whether bigram exists or not):
zot.get(key, 0) + 1

This method uses space proportional to the number of bigrams that
actually exist. You might also consider collections.defaultdict, but
such a dict may end up containing entries for keys that you ask about
(depending on how you ask), not just ones that exist.
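
The single-dictionary approach and the defaultdict caveat can be sketched as below (Python 3). The training text and the helper name `adjusted` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical training text, split into bigrams.
words = "the cat sat on the cat".split()
bigrams = list(zip(words, words[1:]))

# Plain dict: tally only the bigrams that actually occur.
zot = {}
for key in bigrams:
    zot[key] = zot.get(key, 0) + 1


def adjusted(key):
    """Adjusted count, whether or not the bigram was ever seen."""
    return zot.get(key, 0) + 1


# defaultdict: indexing with [] inserts a missing key, get() does not.
dd = defaultdict(int)
for key in bigrams:
    dd[key] += 1
size_before = len(dd)
dd.get(("never", "seen"), 0)       # no entry created
size_after_get = len(dd)
dd[("never", "seen")]              # creates an entry with value 0
size_after_index = len(dd)
```

So with a defaultdict, asking via `get` keeps the dictionary limited to bigrams that exist, while asking via `dd[key]` grows it.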

HTH,
John
 
Brandon

John:
"append"? Don't you mean "add"???

Yes, that is what I meant, my apologies.

What you need to do is practice translating from your
requirements into Python, and it's not all that hard:

"run a loop through foo" -> for key in foo:
"match any of its keys that also exist in bar" -> if key in bar:
"add those key's values in bar to the preexisting value for the
corresponding key in foo" -> foo[key] += bar[key]

Due to my current level of numbskullery, when I start to see things
like tuples as keys, the apparent ease of this evaporates in front of
my eyes! I know that I need more practice, though, and it will come.

But you also need to examine your requirements:
(1) on a mechanical level, as I tried to point out in my first
response, if as you say all keys in bar are also in foo, you can
iterate over bar instead of and faster than iterating over foo.
(2) at a higher level, it looks like bar contains a key for every
possible bigram, and you are tallying actual counts in bar, and what
you want out for any bigram is (1 + number_of_occurrences) i.e.
Laplace adjustment. Are you sure you really need to do this two-dict
caper? Consider using only one dictionary (zot):

Initialise:
zot = {}

To tally:
if key in zot:
    zot[key] += 1
else:
    zot[key] = 1

Adjusted count (irrespective of whether bigram exists or not):
zot.get(key, 0) + 1

This method uses space proportional to the number of bigrams that
actually exist. You might also consider collections.defaultdict, but
such a dict may end up containing entries for keys that you ask about
(depending on how you ask), not just ones that exist.

You are very correct about the Laplace adjustment. However, a more
precise statement of my overall problem would involve training and
testing which utilizes bigram probabilities derived in part from the
Laplace adjustment; as I understand the workflow that I should follow,
I can't allow myself to be constrained only to bigrams that actually
exist in training or my overall probability when I run through testing
will be thrown off to 0 as soon as a test bigram that doesn't exist in
training is encountered. Hence my desire to find all possible bigrams
in train (having taken steps to ensure proper set relations between
train and test). The best way I can currently see to do this is with
my current two-dictionary "caper", and by iterating over foo, not
bar :)

And yes, I know it seems silly to wish for that document with the use-
cases, but personally speaking, even if the thing is rather lengthy, I
would probably pick up better techniques for general knowledge by
reading through it and seeing the examples.

I actually think that there would be a good market (if only in
mindshare) for a thorough examination of the power of lists, nested
lists, and dictionaries (with glorious examples) - something that
might appeal to a lot of non-full time programmers who need to script
a lot but want to be efficient about it, yet don't want to deal with a
tutorial that unnecessarily covers all the aspects of Python. My
$0.027 (having gone up due to the commodities markets).

Thanks again for the input, I do appreciate it!

Brandon
 
John Machin

You are very correct about the Laplace adjustment. However, a more
precise statement of my overall problem would involve training and
testing which utilizes bigram probabilities derived in part from the
Laplace adjustment; as I understand the workflow that I should follow,
I can't allow myself to be constrained only to bigrams that actually
exist in training or my overall probability when I run through testing
will be thrown off to 0 as soon as a test bigram that doesn't exist in
training is encountered. Hence my desire to find all possible bigrams
in train (having taken steps to ensure proper set relations between
train and test). The best way I can currently see to do this is with
my current two-dictionary "caper", and by iterating over foo, not
bar :)

I can't grok large chunks of the above, especially these troublesome
test bigrams that don't exist in training but which you desire to find
in train(ing?).

However let's look at the mechanics: Are you now saying that your
original assertion "I am certain that all keys in bar belong to foo as
well" was not quite "precise"? If not, please explain why you think
you need to iterate (slowly) over foo in order to accomplish your
stated task.
 
Brandon

I can't grok large chunks of the above, especially these troublesome
test bigrams that don't exist in training but which you desire to find
in train(ing?).

However let's look at the mechanics: Are you now saying that your
original assertion "I am certain that all keys in bar belong to foo as
well" was not quite "precise"? If not, please explain why you think
you need to iterate (slowly) over foo in order to accomplish your
stated task.

I was merely trying to be brief. The statement of my certainty about
foo/bar was precise as a stand-alone statement, but I was attempting
to say that within the context of the larger problem, I need to
iterate over foo.

This is actually for a school project, but as I have already worked
out a feasible (if perhaps not entirely optimized) workflow, I don't
feel overly guilty about sharing this or getting some small amount of
input - but certainly none is asked for beyond what you've given
me :) I am tasked with finding the joint probability of a test
sequence, utilizing bigram probabilities derived from train(ing)
counts.

I have ensured that all members (unigrams) of test are also members of
train, although I do not have any idea as to bigram frequencies in
test. Thus I need to iterate over all members of train for training
bigram frequencies in order to be prepared for any test bigram I might
encounter.

The problem is that without Laplace smoothing, many POTENTIAL bigrams
in train might have an ACTUAL frequency of 0 in train. And if one or
more of those bigrams which have 0 frequency in train is actually
found in test, the joint probability of test will become 0, and that's
no fun at all. So I made foo dictionary that creates all POTENTIAL
training bigrams with a smoothed frequency of 1. I also made bar
dictionary that creates keys of all ACTUAL training bigrams with their
actual values. I needed to combine the two dictionaries as a first
step to eventually finding the test sequence probability. So any
bigram in test will at least have a smoothed train frequency of 1 and
possibly a smoothed train frequency of the existing train value + 1.
Having iterated over foo, foo becomes the dictionary which holds these
smoothed & combined train frequencies. I don't see a way to combine
the two types of counts into one dictionary without keeping them
separate first. Hence the caper.
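
The two-dictionary workflow described above might look roughly like this in Python 3. The training words are invented, and building foo with `itertools.product` is an assumption about how "all POTENTIAL bigrams" were generated; the post does not say.

```python
from itertools import product

# Hypothetical training words.
train = "the cat sat on the cat".split()
vocab = sorted(set(train))

# foo: every POTENTIAL bigram over the vocabulary, primed with 1.
foo = dict.fromkeys(product(vocab, repeat=2), 1)

# bar: ACTUAL bigram counts observed in training.
bar = {}
for bigram in zip(train, train[1:]):
    bar[bigram] = bar.get(bigram, 0) + 1

# Combine: foo ends up holding (actual count + 1) for each potential bigram.
for key in foo:
    foo[key] += bar.get(key, 0)
```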

Sorry for the small essay.

P.S. I do realize that there are better smoothing methods than
Laplace, but that is what the problem has specified.
 
John Machin

I was merely trying to be brief. The statement of my certainty about
foo/bar was precise as a stand-alone statement, but I was attempting
to say that within the context of the larger problem, I need to
iterate over foo.

This is actually for a school project, but as I have already worked
out a feasible (if perhaps not entirely optimized) workflow, I don't
feel overly guilty about sharing this or getting some small amount of
input - but certainly none is asked for beyond what you've given
me :) I am tasked with finding the joint probability of a test
sequence, utilizing bigram probabilities derived from train(ing)
counts.

I have ensured that all members (unigrams) of test are also members of
train, although I do not have any idea as to bigram frequencies in
test. Thus I need to iterate over all members of train for training
bigram frequencies in order to be prepared for any test bigram I might
encounter.

The problem is that without Laplace smoothing, many POTENTIAL bigrams
in train might have an ACTUAL frequency of 0 in train. And if one or
more of those bigrams which have 0 frequency in train is actually
found in test, the joint probability of test will become 0, and that's
no fun at all. So I made foo dictionary that creates all POTENTIAL
training bigrams with a smoothed frequency of 1. I also made bar
dictionary that creates keys of all ACTUAL training bigrams with their
actual values. I needed to combine the two dictionaries as a first
step to eventually finding the test sequence probability.

Let's assume this need is real for the moment. Put this loop in your
code after the creation of foo and bar and before you "combine" them:

for key in bar:
    assert key in foo

Does it cause an exception? If so, either:
you have a bug in the creation of foo or bar (or both!),
or:
the certainty you had in making your opening statement "I am
certain that all keys in bar belong to foo as well" was not well-
founded.

If however it is correct that all keys in bar are also to be found in
foo, then the following snippets of code are equivalent for your
purpose of adding bar frequencies into foo:

(1) iterating over foo:
for key in foo:
    foo[key] += bar.get(key, 0)

(2) iterating over bar:
for key in bar:
    foo[key] += bar[key]

I (again) challenge you to say *why* you feel that the "iterating over
bar" solution will not work.
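
A sketch that checks the claim on invented data (Python 3). The subset test on key views is an alternative spelling of the assert loop, not something from the original post.

```python
foo = {("the", "cat"): 1, ("cat", "sat"): 1, ("sat", "the"): 1}
bar = {("the", "cat"): 30, ("cat", "sat"): 2}

# Same check as the assert loop: dict key views behave like sets.
assert bar.keys() <= foo.keys()

# (1) iterating over foo
via_foo = dict(foo)
for key in via_foo:
    via_foo[key] += bar.get(key, 0)

# (2) iterating over bar
via_bar = dict(foo)
for key in bar:
    via_bar[key] += bar[key]

equivalent = via_foo == via_bar
```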

So any
bigram in test will at least have a smoothed train frequency of 1 and
possibly a smoothed train frequency of the existing train value + 1.
Having iterated over foo, foo becomes the dictionary which holds these
smoothed & combined train frequencies. I don't see a way to combine
the two types of counts into one dictionary without keeping them
separate first. Hence the caper.

Let's start with "So I made foo dictionary that creates all POTENTIAL
training bigrams with a smoothed frequency of 1". Let me guess that
you have a set W of all words ever used/usable in the language of the
texts that you are considering ... let N = len(W). So the number of
potential bigrams is N**2. Hmmm, how large is N, and have you actually
run the foo-building code yet?

Now, assuming foo does fit in memory etc, you get to the stage where
you have a test message containing a bigram b = (word1, word2). Its
smoothed frequency will be foo[b]. If b is in bar, this should be
equal to bar[b] + 1. Otherwise it will be 1.

So:
(1) foo[b] == bar.get(b, 0) + 1
(2) foo is redundant. If you want to check that b is "legal", use
(word1 in W and word2 in W).
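
Both points can be checked with a short sketch (Python 3). The vocabulary, the counts, the helper name `smoothed`, and the choice to raise KeyError for an illegal bigram are all assumptions for illustration.

```python
from itertools import product

W = {"the", "cat", "sat"}                       # hypothetical vocabulary
bar = {("the", "cat"): 30, ("cat", "sat"): 2}   # hypothetical actual counts


def smoothed(b):
    """Smoothed frequency of bigram b, computed without a foo dictionary."""
    word1, word2 = b
    if not (word1 in W and word2 in W):
        raise KeyError(b)           # b is not a "legal" bigram
    return bar.get(b, 0) + 1


# Build foo the two-dictionary way, only to confirm point (1).
foo = dict.fromkeys(product(W, repeat=2), 1)
for key in bar:
    foo[key] += bar[key]

all_match = all(foo[b] == smoothed(b) for b in foo)
```

Here foo needs len(W) ** 2 entries, while `smoothed` needs only bar and W.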

Please attempt to refute the specific points above, rather than
writing another essay :)

Cheers,
John
 
Brandon

(1) iterating over foo:
for key in foo:
    foo[key] += bar.get(key, 0)

(2) iterating over bar:
for key in bar:
    foo[key] += bar[key]

I (again) challenge you to say *why* you feel that the "iterating over
bar" solution will not work.


Well if you're going to be clever enough to iterate over bar and then
send the results to another dictionary altogether, I obviously cannot
put up a good argument on this matter!

Thanks for the input, I appreciate it.
 
