Pickling dictionaries containing dictionaries: failing, recursion-style!

Discussion in 'Python' started by lysdexia, Dec 1, 2007.

  1. lysdexia

    I'm having great fun playing with Markov chains. I am making a
    dictionary of all the words in a given string, getting a count of how
    many appearances word1 makes in the string, getting a list of all the
    word2s that follow each appearance of word1 and a count of how many
    times word2 appears in the string as well. (I know I should probably
    be only counting how many times word2 actually follows word1, but as I
    said, I'm having great fun playing ...)


    printed output of the dictionary looks like so:

    {'and': [1, {'to': 1}], 'down': [1, {'upon': 1}], 'them': [1, {'down':
    1}], 'no': [1, {'others': 1}], 'this': [1, {'it': 1}], 'is': [2, {'a':
    2}], 'upon': [1, {'a': 1}], 'it': [2, {'is': 2}], 'think': [2, {'and':
    1, 'words': 1}], 'write': [1, {'this': 1}], 'to': [3, {'write': 1,
    'put': 1, 'think': 1}], 'words': [1, {'no': 1}], 'others': [1,
    {'think': 1}], 'put': [1, {'them': 1}], 'sin': [2, {'to': 2}]}

    Here's the actual function.

    def assembleVocab(self):
        self.wordDB = {}
        for word in self.words:
            try:
                if not word in self.wordDB.keys():
                    wordsWeights = {}
                    afterwords = [self.words[i + 1] for i, e in
                                  enumerate(self.words) if e == word]
                    for aw in afterwords:
                        if not aw in wordsWeights.keys():
                            wordsWeights[aw] = afterwords.count(aw)
                    self.wordDB[word] = [self.words.count(word), wordsWeights]
            except:
                pass
        out = open("mchain.pkl",'wb')
        pickle.dump(self.wordDB, out, -1)
        out.close()
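    A minimal Python 3 sketch of the same table, assuming `self.words` is a plain list of strings; `build_vocab` is a hypothetical name, not part of the original class. It pairs each word with its successor via `zip`, so the final word simply has no follower. In the function above, by contrast, `self.words[i + 1]` raises `IndexError` whenever `word` occurs as the last element, and the bare `except: pass` silently drops that word from the table; that is probably why 'a' shows up as a follower but never as a key in the printed output.

```python
from collections import Counter, defaultdict


def build_vocab(words):
    """Return {word: [occurrences, {next_word: times_it_follows_word}]}.

    Hypothetical helper; assumes `words` is a list of strings.
    """
    counts = Counter(words)                 # total occurrences of each word
    followers = defaultdict(Counter)        # word -> Counter of its successors
    # zip() stops at the shorter sequence, so the last word gets no
    # successor instead of raising IndexError.
    for current, following in zip(words, words[1:]):
        followers[current][following] += 1
    return {w: [counts[w], dict(followers[w])] for w in counts}


print(build_vocab("to be or not to be".split()))
# {'to': [2, {'be': 2}], 'be': [2, {'or': 1}], 'or': [1, {'not': 1}], 'not': [1, {'to': 1}]}
```

    The nested values are plain ints, lists and dicts, so the result can be written with `pickle.dump` exactly as before.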

    My problem is, I can't seem to get it to unpickle. When I attempt to
    load the saved data, I get:

    AttributeError: 'tuple' object has no attribute 'readline'

    with pickle, and

    TypeError: argument must have 'read' and 'readline' attributes

    Looking at the pickle pages on docs.python.org, I see that I am indeed
    supposed to be able to pickle "tuples, lists, sets, and dictionaries
    containing only picklable objects".

    I'm sure I'm missing something obvious. Clues?
    lysdexia, Dec 1, 2007

  2. Paul Rubin

    Re: Pickling dictionaries containing dictionaries: failing, recursion-style!

    lysdexia <> writes:
    > self.wordDB[word] = [self.words.count(word), wordsWeights]


    what is self.words.count? Could it be an iterator? I don't think you
    can pickle those.
    Paul Rubin, Dec 1, 2007

  3. David Tweet

    Are you opening the file in binary mode ("rb") before doing pickle.load on it?

    On 01 Dec 2007 14:13:33 -0800, Paul Rubin
    <"http://phr.cx"@nospam.invalid> wrote:
    > lysdexia <> writes:
    > > self.wordDB[word] = [self.words.count(word), wordsWeights]

    >
    > what is self.words.count? Could it be an iterator? I don't think you
    > can pickle those.

    --
    -David
    David Tweet, Dec 1, 2007
  4. John Machin

    On Dec 2, 9:13 am, Paul Rubin <http://> wrote:
    > lysdexia <> writes:
    > > self.wordDB[word] = [self.words.count(word), wordsWeights]

    >
    > what is self.words.count? Could it be an iterator? I don't think you
    > can pickle those.


    Whaaaat??
    self.words is obviously an iterable (can you see "for word in
    self.words" in his code?), probably just a list.
    self.words.count looks like a standard sequence method to me.
    self.words.count(word) will return an int -- can you see all those
    "[1,", "[2," etc in his printed dict output?
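    A small Python 3 check of John's point, using a hypothetical word list and hypothetical helper names: `list.count` returns an ordinary `int`, so an entry shaped like `[count, {follower: n}]` pickles without trouble. Paul's worry is real for generator objects, which `pickle` refuses with a `TypeError`, but nothing in the posted code produces one.

```python
import pickle


def entry_roundtrips(words, word, followers):
    """Pickle one wordDB-style entry, [count, {follower: n}], and reload it."""
    count = words.count(word)               # list.count returns a plain int
    entry = [count, followers]
    return isinstance(count, int) and pickle.loads(pickle.dumps(entry, -1)) == entry


def generator_is_picklable(words):
    """Generators, unlike ints, lists and dicts, cannot be pickled."""
    try:
        pickle.dumps(w for w in words)
    except TypeError:
        return False
    return True


words = "to be or not to be".split()             # hypothetical sample
print(entry_roundtrips(words, "to", {"be": 2}))  # True
print(generator_is_picklable(words))             # False
```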
    John Machin, Dec 1, 2007
  5. Paul Rubin

    Re: Pickling dictionaries containing dictionaries: failing, recursion-style!

    John Machin <> writes:
    > self.words is obviously an iterable (can you see "for word in
    > self.words" in his code?), probably just a list.


    It could be a file, in which case its iterator method would read lines
    from the file and cause that error message. But I think the answer is
    that the pickle itself needs to be opened in binary mode, as someone
    else posted.
    Paul Rubin, Dec 1, 2007
  6. John Machin

    On Dec 2, 8:59 am, lysdexia <> wrote:
    > I'm having great fun playing with Markov chains. I am making a
    > dictionary of all the words in a given string, getting a count of how
    > many appearances word1 makes in the string, getting a list of all the
    > word2s that follow each appearance of word1 and a count of how many
    > times word2 appears in the string as well. (I know I should probably
    > be only counting how many times word2 actually follows word1, but as I
    > said, I'm having great fun playing ...)
    >
    > printed output of the dictionary looks like so:
    >
    > {'and': [1, {'to': 1}], 'down': [1, {'upon': 1}], 'them': [1, {'down':
    > 1}], 'no': [1, {'others': 1}], 'this': [1, {'it': 1}], 'is': [2, {'a':
    > 2}], 'upon': [1, {'a': 1}], 'it': [2, {'is': 2}], 'think': [2, {'and':
    > 1, 'words': 1}], 'write': [1, {'this': 1}], 'to': [3, {'write': 1,
    > 'put': 1, 'think': 1}], 'words': [1, {'no': 1}], 'others': [1,
    > {'think': 1}], 'put': [1, {'them': 1}], 'sin': [2, {'to': 2}]}
    >
    > Here's the actual function.
    >
    > def assembleVocab(self):
    >     self.wordDB = {}
    >     for word in self.words:
    >         try:
    >             if not word in self.wordDB.keys():
    >                 wordsWeights = {}
    >                 afterwords = [self.words[i + 1] for i, e in
    >                               enumerate(self.words) if e == word]
    >                 for aw in afterwords:
    >                     if not aw in wordsWeights.keys():
    >                         wordsWeights[aw] = afterwords.count(aw)
    >                 self.wordDB[word] = [self.words.count(word), wordsWeights]
    >         except:
    >             pass
    >     out = open("mchain.pkl",'wb')
    >     pickle.dump(self.wordDB, out, -1)
    >     out.close()
    >
    > My problem is, I can't seem to get it to unpickle. When I attempt to
    > load the saved data, I get:
    >
    > AttributeError: 'tuple' object has no attribute 'readline'
    >
    > with pickle, and
    >
    > TypeError: argument must have 'read' and 'readline' attributes


    The code that created the dictionary is interesting, but not very
    relevant. Please consider posting the code that is actually giving the
    error!
    >
    > Looking at the pickle pages on docs.python.org, I see that I am indeed
    > supposed to be able to pickle "tuples, lists, sets, and dictionaries
    > containing only picklable objects".
    >
    > I'm sure I'm missing something obvious. Clues?


    The docs for pickle.load(file) say """
    Read a string from the open file object file and interpret it as a
    pickle data stream, reconstructing and returning the original object
    hierarchy. This is equivalent to Unpickler(file).load().

    file must have two methods, a read() method that takes an integer
    argument, and a readline() method that requires no arguments. Both
    methods should return a string. Thus file can be a file object opened
    for reading, a StringIO object, or any other custom object that meets
    this interface.
    """
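    In Python 3 a pickle is `bytes`, so the in-memory counterpart of the StringIO object mentioned in these docs is `io.BytesIO`. A minimal sketch; the helper name `roundtrip_in_memory` is hypothetical:

```python
import io
import pickle


def roundtrip_in_memory(obj):
    """Dump obj into a BytesIO buffer and load it back (hypothetical helper)."""
    buf = io.BytesIO()                              # provides read(n) and readline()
    pickle.dump(obj, buf, pickle.HIGHEST_PROTOCOL)  # same as protocol -1
    buf.seek(0)                                     # rewind before reading
    return pickle.load(buf)


d = {'and': [1, {'to': 1}], 'sin': [2, {'to': 2}]}
print(roundtrip_in_memory(d) == d)   # True
```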

    The error message(s) [plural??] that you are getting suggest(s) that
    the argument that you supplied was *not* an open file object nor
    anything else with both a read and readline method. Open the file in
    binary mode ('rb') and pass the result to pickle.load.
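    A sketch of that advice in Python 3 syntax. The helper names and the tuple mistake are hypothetical, since the OP never posted the loading code; a tuple is simply one way to reproduce errors like the ones above, because neither the C nor the pure-Python unpickler can find `read`/`readline` on it.

```python
import os
import pickle
import tempfile


def save(obj, path):
    with open(path, "wb") as out:     # write in binary mode
        pickle.dump(obj, out, -1)     # -1 selects the highest protocol


def load(path):
    # pickle.load needs the file object returned by open(), opened with "rb".
    with open(path, "rb") as f:
        return pickle.load(f)


def tuple_argument_fails(path):
    """Hypothetical mistake: passing a tuple instead of an open file."""
    try:
        pickle.load((path, "rb"))
    except (TypeError, AttributeError):   # C and pure-Python pickle differ
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mchain.pkl")
    save({'it': [2, {'is': 2}]}, path)
    print(load(path))                  # {'it': [2, {'is': 2}]}
    print(tuple_argument_fails(path))  # True
```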
    John Machin, Dec 1, 2007
  7. John Machin

    On Dec 2, 9:49 am, Paul Rubin <http://> wrote:
    > John Machin <> writes:
    > > self.words is obviously an iterable (can you see "for word in
    > > self.words" in his code?), probably just a list.

    >
    > It could be a file, in which case its iterator method would read lines
    > from the file and cause that error message.


    Impossible:
    (1) in "for word in words:" each word would end in "\n" and he'd have
    to strip those and there's no evidence of that.
    (2) Look at the line """afterwords = [self.words[i + 1] for i, e in
    enumerate(self.words) if e == word]"""
    and tell me how that works if self.words is a file!
    (3) "self.words.count(word)" -- AttributeError: 'file' object has no
    attribute 'count'
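    John's three points can be checked with a short sketch in which `io.StringIO` stands in for a text file opened with `open(name)` (an assumption for illustration; a real text file object behaves the same way for these checks). The helper name is hypothetical.

```python
import io


def file_vs_list():
    """Contrast a file-like object with a list of words."""
    text_file = io.StringIO("it\nis\na\n")
    words = ["it", "is", "a"]

    lines = list(text_file)                  # (1) iteration yields lines, "\n" kept
    has_count = hasattr(text_file, "count")  # (3) files have no .count() method
    try:
        text_file[0]                         # (2) files do not support indexing
        indexable = True
    except TypeError:
        indexable = False
    return lines, has_count, indexable, words.count("it"), words[1]


print(file_vs_list())   # (['it\n', 'is\n', 'a\n'], False, False, 1, 'is')
```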


    > But I think the answer is
    > that the pickle itself needs to be opened in binary mode, as someone
    > else posted.


    The answer is (1) he needs to supply a file of any kind for a start
    [read the error messages that he got!!]
    (2) despite the silence of the docs, it is necessary to have opened
    the file in binary mode on systems where it makes a difference
    (notably Windows)
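    Python 3 enforces the binary-mode requirement on every platform: a text-mode file returns `str` from `read()`, while the unpickler needs `bytes`. A minimal sketch, assuming a temporary file stands in for mchain.pkl; the helper name is hypothetical, and latin-1 is chosen only so that every byte decodes and the failure comes from pickle itself rather than from decoding.

```python
import os
import pickle
import tempfile


def load_text_vs_binary(obj):
    """Return (binary_mode_ok, text_mode_ok) for a pickle of obj."""
    fd, path = tempfile.mkstemp(suffix=".pkl")
    os.close(fd)
    try:
        with open(path, "wb") as out:
            pickle.dump(obj, out, -1)
        with open(path, "rb") as f:            # binary mode: works
            binary_ok = pickle.load(f) == obj
        try:
            with open(path, "r", encoding="latin-1") as f:   # text mode
                pickle.load(f)                 # read() hands back str, not bytes
            text_ok = True
        except (TypeError, ValueError):
            text_ok = False
        return binary_ok, text_ok
    finally:
        os.remove(path)


print(load_text_vs_binary({'is': [2, {'a': 2}]}))   # (True, False)
```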

    [If the OP is still reading this thread, here's an example of how to
    show a problem, with minimal code that reproduces the problem, and all
    the output including the stack trace]

    C:\junk>type dpkl.py
    import pickle

    d = {'and': [1, {'to': 1}], 'down': [1, {'upon': 1}], 'them': [1, {'down':
    1}], 'no': [1, {'others': 1}], 'this': [1, {'it': 1}], 'is': [2, {'a':
    2}], 'upon': [1, {'a': 1}], 'it': [2, {'is': 2}], 'think': [2, {'and':
    1, 'words': 1}], 'write': [1, {'this': 1}], 'to': [3, {'write': 1,
    'put': 1, 'think': 1}], 'words': [1, {'no': 1}], 'others': [1,
    {'think': 1}], 'put': [1, {'them': 1}], 'sin': [2, {'to': 2}]}

    s = pickle.dumps(d, -1)
    dnews = pickle.loads(s)
    print "string", dnews == d

    out = open("mchain.pkl",'wb')
    pickle.dump(d, out, -1)
    out.close()

    f = open("mchain.pkl", "rb")
    dnewb = pickle.load(f)
    f.close()
    print "load binary", dnewb == d

    f = open("mchain.pkl", "r")
    dnewa = pickle.load(f)
    f.close()
    print "load text", dnewa == d

    C:\junk>python dpkl.py
    string True
    load binary True
    Traceback (most recent call last):
      File "dpkl.py", line 24, in <module>
        dnewa = pickle.load(f)
      File "c:\python25\lib\pickle.py", line 1370, in load
        return Unpickler(file).load()
      File "c:\python25\lib\pickle.py", line 858, in load
        dispatch[key](self)
      File "c:\python25\lib\pickle.py", line 1169, in load_binput
        i = ord(self.read(1))
    TypeError: ord() expected a character, but string of length 0 found

    Changing the first line to
    import cPickle as pickle
    gives this:

    C:\junk>python dpkl.py
    string True
    load binary True
    Traceback (most recent call last):
      File "dpkl.py", line 24, in <module>
        dnewa = pickle.load(f)
    EOFError

    Each of the two different errors indicates that reading was terminated
    prematurely by the presence of the good ol' ^Z aka CPMEOF in the file:

    >>> s = open('mchain.pkl', 'rb').read()
    >>> s.find(chr(26))
    179
    >>> len(s)
    363
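    Binary pickles can legitimately contain byte 26. A minimal Python 3 illustration: with protocol 2, a small non-negative int is written as the `K` (BININT1) opcode followed by one raw byte, so pickling the integer 26 embeds a literal Ctrl-Z, the byte at which the Python 2 text-mode read stopped on Windows in the transcript above.

```python
import pickle

CTRL_Z = b"\x1a"   # chr(26), the CP/M end-of-file marker

# Protocol 2: PROTO opcode (b'\x80\x02'), then 'K' + one raw byte, then STOP ('.')
data = pickle.dumps(26, protocol=2)
print(data)                 # b'\x80\x02K\x1a.'
print(data.find(CTRL_Z))    # 3
```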

    HTH,
    John
    John Machin, Dec 2, 2007
