# Pickling dictionaries containing dictionaries: failing, recursion-style!

Discussion in 'Python' started by lysdexia, Dec 1, 2007.

1. ### lysdexia

I'm having great fun playing with Markov chains. I am making a
dictionary of all the words in a given string, getting a count of how
many appearances word1 makes in the string, getting a list of all the
word2s that follow each appearance of word1 and a count of how many
times word2 appears in the string as well. (I know I should probably
be only counting how many times word2 actually follows word1, but as I
said, I'm having great fun playing ...)

The printed output of the dictionary looks like so:

```
{'and': [1, {'to': 1}],
 'down': [1, {'upon': 1}],
 'them': [1, {'down': 1}],
 'no': [1, {'others': 1}],
 'this': [1, {'it': 1}],
 'is': [2, {'a': 2}],
 'upon': [1, {'a': 1}],
 'it': [2, {'is': 2}],
 'think': [2, {'and': 1, 'words': 1}],
 'write': [1, {'this': 1}],
 'to': [3, {'write': 1, 'put': 1, 'think': 1}],
 'words': [1, {'no': 1}],
 'others': [1, {'think': 1}],
 'put': [1, {'them': 1}],
 'sin': [2, {'to': 2}]}
```

Here's the actual function.

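```python
import pickle

def assembleVocab(self):
    self.wordDB = {}
    for word in self.words:
        try:
            if word not in self.wordDB:
                wordsWeights = {}
                # every word that immediately follows an occurrence of `word`
                afterwords = [self.words[i + 1] for i, e in
                              enumerate(self.words) if e == word]
                for aw in afterwords:
                    if aw not in wordsWeights:
                        wordsWeights[aw] = afterwords.count(aw)
                # [occurrences of word, {following word: count}]
                self.wordDB[word] = [self.words.count(word), wordsWeights]
        except:
            # NOTE: this bare except also swallows the IndexError raised when
            # `word` is the last item of self.words, so that word is skipped
            pass
    out = open("mchain.pkl", 'wb')
    pickle.dump(self.wordDB, out, -1)  # -1 selects the highest protocol
    out.close()
```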
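For comparison, the same word -> [count, {next word: count}] structure can be sketched more compactly in Python 3 with `collections.Counter` and `zip` over adjacent word pairs. This is an illustrative alternative, not the OP's code; the helper name `assemble_vocab` and the sample sentence are hypothetical. Unlike the function above, it keeps a word that ends the text (with an empty follower dict) instead of silently dropping it through the bare `except`.

```python
from collections import Counter, defaultdict


def assemble_vocab(words):
    """Map each word to [occurrences, {next_word: times it follows word}].

    Hypothetical helper sketching the OP's structure; not the original code.
    """
    totals = Counter(words)                 # occurrences of each word
    followers = defaultdict(Counter)        # word -> Counter of next words
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    # A word seen only at the very end has no follower; it gets an empty dict.
    return {w: [n, dict(followers[w])] for w, n in totals.items()}


vocab = assemble_vocab("to be or not to be".split())
print(vocab)
# {'to': [2, {'be': 2}], 'be': [2, {'or': 1}], 'or': [1, {'not': 1}], 'not': [1, {'to': 1}]}
```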

My problem is, I can't seem to get it to unpickle. When I attempt to
load the saved data, I get:

AttributeError: 'tuple' object has no attribute 'readline'

with pickle, and

Looking at the pickle pages on docs.python.org, I see that I am indeed
supposed to be able to pickle "tuples, lists, sets, and dictionaries
containing only picklable objects".

I'm sure I'm missing something obvious. Clues?

lysdexia, Dec 1, 2007

2. ### Paul Rubin


lysdexia <> writes:
> self.wordDB[word] = [self.words.count(word), wordsWeights]

what is self.words.count? Could it be an iterator? I don't think you
can pickle those.

Paul Rubin, Dec 1, 2007

3. ### David Tweet

Are you opening the file in binary mode ("rb") before doing pickle.load on it?
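David's suggestion, sketched in Python 3 (the thread itself is Python 2.5): write the pickle with `'wb'` and read it back with `'rb'`. The temporary directory and the small sample dict are assumptions for illustration only.

```python
import os
import pickle
import tempfile

word_db = {'it': [2, {'is': 2}], 'sin': [2, {'to': 2}]}  # sample data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mchain.pkl")

    # Write in binary mode...
    with open(path, "wb") as out:
        pickle.dump(word_db, out, pickle.HIGHEST_PROTOCOL)

    # ...and read it back in binary mode as well.
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded == word_db)  # True
```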


--
-David

David Tweet, Dec 1, 2007

4. ### John Machin

On Dec 2, 9:13 am, Paul Rubin <http://> wrote:
> lysdexia <> writes:
> > self.wordDB[word] = [self.words.count(word), wordsWeights]

>
> what is self.words.count? Could it be an iterator? I don't think you
> can pickle those.

Whaaaat??
self.words is obviously an iterable (can you see "for word in
self.words" in his code?), probably just a list.
self.words.count looks like a standard sequence method to me.
self.words.count(word) will return an int -- can you see all those
"[1,", "[2," etc in his printed dict output?

John Machin, Dec 1, 2007
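John's point is easy to check: `list.count` is an ordinary sequence method that returns an `int`, and a nested dict built from such values pickles without complaint. A minimal Python 3 sketch with a made-up word list:

```python
import pickle

words = "to be or not to be".split()   # hypothetical input
n = words.count("to")                   # plain list method
print(type(n).__name__, n)              # int 2

nested = {"to": [n, {"be": 2}]}         # same shape as the OP's wordDB values
restored = pickle.loads(pickle.dumps(nested, -1))
print(restored == nested)               # True
```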

5. ### Paul Rubin


John Machin <> writes:
> self.words is obviously an iterable (can you see "for word in
> self.words" in his code?), probably just a list.

It could be a file, in which case its iterator method would read lines
from the file and cause that error message. But I think the answer is
that the pickle itself needs to be opened in binary mode, as someone
else posted.

Paul Rubin, Dec 1, 2007

6. ### John Machin

On Dec 2, 8:59 am, lysdexia <> wrote:
> My problem is, I can't seem to get it to unpickle. When I attempt to
> load the saved data, I get:
>
> AttributeError: 'tuple' object has no attribute 'readline'
>
> with pickle, and
>

The code that created the dictionary is interesting, but not very
relevant. Please consider posting the code that is actually giving the
error!
>
> Looking at the pickle pages on docs.python.org, I see that I am indeed
> supposed to be able to pickle "tuples, lists, sets, and dictionaries
> containing only picklable objects".
>
> I'm sure I'm missing something obvious. Clues?

The docs for pickle.load(file) say """
Read a string from the open file object file and interpret it as a
pickle data stream, reconstructing and returning the original object
hierarchy. This is equivalent to Unpickler(file).load().

file must have two methods, a read() method that takes an integer
argument, and a readline() method that requires no arguments. Both
methods should return a string. Thus file can be a file object opened
for reading, a StringIO object, or any other custom object that meets
this interface.
"""

The error message(s) [plural??] that you are getting suggest(s) that
the argument that you supplied was *not* an open file object nor
anything else with both a read and readline method. Open the file in
binary mode ('rb') and pass the result to pickle.load.
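A Python 3 sketch of the interface quoted above: `pickle.load` needs an object with `read()` and `readline()`, so a non-file such as a tuple is rejected, while an in-memory `io.BytesIO` works. Python 3's C unpickler reports the bad argument as a `TypeError` rather than the `AttributeError` the OP saw under Python 2; the tuple here is only a hypothetical stand-in for whatever was actually passed.

```python
import io
import pickle

word_db = {'it': [2, {'is': 2}]}
data = pickle.dumps(word_db, pickle.HIGHEST_PROTOCOL)

# Hypothetical mistake: handing pickle.load a tuple instead of a file object.
rejected = None
try:
    pickle.load(("mchain.pkl", "rb"))
except (TypeError, AttributeError) as exc:   # Python 3 raises TypeError here
    rejected = exc
    print(type(exc).__name__, exc)

# Anything with read() and readline() is accepted, e.g. an in-memory buffer.
print(pickle.load(io.BytesIO(data)))  # {'it': [2, {'is': 2}]}
```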

John Machin, Dec 1, 2007

7. ### John Machin

On Dec 2, 9:49 am, Paul Rubin <http://> wrote:
> John Machin <> writes:
> > self.words is obviously an iterable (can you see "for word in
> > self.words" in his code?), probably just a list.

>
> It could be a file, in which case its iterator method would read lines
> from the file and cause that error message.

Impossible:
(1) in "for word in words:" each word would end in "\n" and he'd have
to strip those and there's no evidence of that.
(2) Look at the line """afterwords = [self.words[i + 1] for i, e in
enumerate(self.words) if e == word]"""
and tell me how that works if self.words is a file!
(3) "self.words.count(word)" -- AttributeError: 'file' object has no
attribute 'count'
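John's three objections can be reproduced with `io.StringIO` standing in for a text file (an assumption for illustration; real Python 3 text files behave the same way here): iteration yields newline-terminated lines, and the object supports neither indexing nor a `count()` method.

```python
import io

fake_file = io.StringIO("it\nis\na\nsin\n")  # stands in for a file of words

# (1) Iterating a file yields lines that still end in "\n".
lines = list(fake_file)
print(lines)                          # ['it\n', 'is\n', 'a\n', 'sin\n']

# (2) File objects cannot be indexed, so self.words[i + 1] would fail.
try:
    fake_file[0]
    indexable = True
except TypeError:
    indexable = False
print(indexable)                      # False

# (3) There is no count() method either.
print(hasattr(fake_file, "count"))    # False
```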

> But I think the answer is
> that the pickle itself needs to be opened in binary mode, as someone
> else posted.

The answer is (1) he needs to supply a file of any kind for a start
[read the error messages that he got!!]
(2) despite the silence of the docs, it is necessary to have opened
the file in binary mode on systems where it makes a difference
(notably Windows)

[If the OP is still reading this thread, here's an example of how to
show a problem, with minimal code that reproduces the problem, and all
the output including the stack trace]

```
C:\junk>type dpkl.py
import pickle

d = {'and': [1, {'to': 1}], 'down': [1, {'upon': 1}], 'them': [1, {'down':
1}], 'no': [1, {'others': 1}], 'this': [1, {'it': 1}], 'is': [2, {'a':
2}], 'upon': [1, {'a': 1}], 'it': [2, {'is': 2}], 'think': [2, {'and':
1, 'words': 1}], 'write': [1, {'this': 1}], 'to': [3, {'write': 1,
'put': 1, 'think': 1}], 'words': [1, {'no': 1}], 'others': [1,
{'think': 1}], 'put': [1, {'them': 1}], 'sin': [2, {'to': 2}]}

s = pickle.dumps(d, -1)
dnews = pickle.loads(s)
print "string", dnews == d

out = open("mchain.pkl",'wb')
pickle.dump(d, out, -1)
out.close()

f = open("mchain.pkl", "rb")
dnewb = pickle.load(f)
f.close()
print "load binary", dnewb == d

f = open("mchain.pkl", "r")
dnewa = pickle.load(f)
f.close()
print "load text", dnewa == d
```

```
C:\junk>python dpkl.py
string True
Traceback (most recent call last):
  File "dpkl.py", line 24, in <module>
  File "c:\python25\lib\pickle.py", line 1370, in load
  File "c:\python25\lib\pickle.py", line 858, in load
    dispatch[key](self)
  File "c:\python25\lib\pickle.py", line 1169, in load_binput
TypeError: ord() expected a character, but string of length 0 found
```

Changing the first line to
import cPickle as pickle
gives this:

```
C:\junk>python dpkl.py
string True
Traceback (most recent call last):
  File "dpkl.py", line 24, in <module>
EOFError
```

Both errors indicate that reading was terminated prematurely by the
presence of the good ol' ^Z aka CPMEOF in the file:

```
>>> s.find(chr(26))
179
>>> len(s)
363
```

HTH,
John

John Machin, Dec 2, 2007
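The ^Z effect John describes can be simulated in Python 3 on any platform by cutting the pickle at the first `chr(26)` byte, roughly as a Windows text-mode read would. This is a sketch under assumptions: it uses protocol 2 (what `-1` selected in Python 2.5), and the offset it prints will likely differ from the 179 above because Python 3 serializes strings differently. In protocol 2 a byte 26 can turn up as the one-byte memo index of a `BINPUT` opcode, which is consistent with the Python 2.5 traceback ending in `load_binput`.

```python
import pickle

d = {'and': [1, {'to': 1}], 'down': [1, {'upon': 1}], 'them': [1, {'down': 1}],
     'no': [1, {'others': 1}], 'this': [1, {'it': 1}], 'is': [2, {'a': 2}],
     'upon': [1, {'a': 1}], 'it': [2, {'is': 2}], 'think': [2, {'and': 1, 'words': 1}],
     'write': [1, {'this': 1}], 'to': [3, {'write': 1, 'put': 1, 'think': 1}],
     'words': [1, {'no': 1}], 'others': [1, {'think': 1}], 'put': [1, {'them': 1}],
     'sin': [2, {'to': 2}]}

data = pickle.dumps(d, 2)          # protocol 2, i.e. Python 2.5's "-1"
cut = data.find(b"\x1a")           # first ^Z (chr(26)) byte, -1 if absent
print("first ^Z at", cut, "of", len(data))

# A Windows text-mode read stops at ^Z; simulate that by truncating.
truncated = data if cut == -1 else data[:cut]
try:
    pickle.loads(truncated)
    truncated_ok = True
except (EOFError, pickle.UnpicklingError) as exc:
    truncated_ok = False
    print("truncated load failed:", type(exc).__name__, exc)

# The complete bytes, as read in binary mode, load fine.
print(pickle.loads(data) == d)     # True
```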