pickle module doesn't work

Discussion in 'Python' started by Omer Korat, Dec 27, 2012.

  1. Omer Korat

    Omer Korat Guest

    Hi all,

    I'm working on a project in Python 2.7. I have a few large objects that I want to save for later use, so that they can be loaded whole from a file instead of being rebuilt every time. It is critical that they be transportable between platforms. The problem is that when I use the 2.7 pickle module, all I get is a file containing a string that represents the commands used to create the object. There's nothing I can do with this string, because it only contains information about the object's module, class and parameters, so the objects aren't transportable.
    In Python 3.3 this problem is solved: pickle.dump generates a series of bytes, which can be loaded in any other module independently of anything. But my project needs NLTK 2.0, which is written for Python 2.7...

    Does anybody have suggestions? Maybe there is a way to use pickle so that it yields the results I need? Or is there another module that does pickle's job? Or perhaps there is a way to mechanically translate between Python versions, so that I can use pickle from 3.3 inside an application written in 2.7? Or perhaps somebody knows of a way to embed a piece of 3.3 code inside a 2.7 program?

    I can't be the only one who wants to save Python objects for later use! There must be a standard method to do this, but I couldn't find any on the web!
    If someone can solve this for me I'll be so grateful.
     
    Omer Korat, Dec 27, 2012
    #1

  2. Omer Korat

    Peter Otten Guest

    Omer Korat wrote:

    > I'm working on a project in Python 2.7. I have a few large objects, and I
    > want to save them for later use, so that it will be possible to load them
    > whole from a file, instead of creating them every time anew. It is
    > critical that they be transportable between platforms. Problem is, when I
    > use the 2.7 pickle module, all I get is a file containing a string
    > representing the commands used to create the object. But there's nothing I
    > can do with this string, because it only contains information about the
    > object's module, class and parameters. And that way, they aren't
    > transportable. In python 3.3 this problem is solved, and the pickle.dump
    > generates a series of bytes, which can be loaded in any other module
    > independently of anything. But in my project, I need NLTK 2.0, which is
    > written in python 2.7...
    >
    > Anybody has suggestions? Maybe there is a way to use pickle so that it
    > yields the results I need? Or is there any other module that does pickle's
    > job? Or perhaps there is a way to mechanically translate between python
    > versions, so I'll be able to use pickle from 3.3 inside an application
    > written in 2.7? Or perhaps somebody knows of a way to embed a piece of 3.3
    > code inside a 2.7 program?
    >
    > It can't be I'm the only one who wants to save python objects for later
    > use! There must be a standard method to do this, but I couldn't find any
    > on the web! If someone can solve this for me I'll be so grateful.


    Pickling works the same way in Python 2 and Python 3. For classes only the
    names are dumped, so you need (the same version of) NLTK on the source and
    the destination platform.
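
A minimal sketch of the "only the names are dumped" point, using nothing outside the standard library. Here json.dumps is just a stand-in (my assumption, not from the thread) for a library function such as NLTK's sent_tokenize; protocol 2 is chosen because it is the highest protocol Python 2.7 supports.

```python
import json
import pickle

# json.dumps is only a stand-in for a library function such as
# nltk.tokenize.sent_tokenize; any importable function behaves the same.
payload = pickle.dumps(json.dumps, protocol=2)

# The pickle holds the module and function names, not the function's code.
names_only = b"json" in payload and b"dumps" in payload

# Loading imports the module again and looks the name up, so the result
# is the very same function object, provided the module is importable.
restored = pickle.loads(payload)
same_object = restored is json.dumps
```

On a machine where the module is missing, the lookup during loading is what raises ImportError.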

    If you can provide a short demo of what works in Python 3 but fails in
    Python 2 we may be able to find the actual problem or misunderstanding.
    Maybe it is just that different protocols are used by default? If so, try

    import pickle
    # pickle.dump takes the object first, then the open binary file
    with open(filename, "wb") as f:
        pickle.dump(your_data, f, protocol=pickle.HIGHEST_PROTOCOL)
     
    Peter Otten, Dec 27, 2012
    #2

  3. Omer Korat

    Omer Korat Guest

    You're probably right in general, but for me the 3.3 and 2.7 pickles definitely don't work the same:

    3.3:
    >>> type(pickle.dumps(1))
    <class 'bytes'>

    2.7:
    >>> type(pickle.dumps(1, pickle.HIGHEST_PROTOCOL))
    <type 'str'>


    As you can see, in 2.7 when I try to dump something, I get a useless string. Look what I get when I dump an NLTK object such as the sent_tokenize function:

    '\x80\x02cnltk.tokenize\nsent_tokenize\ng\x00'

    Now, this is useless. If I try to load it on a platform without NLTK installed on it, I get:

    ImportError: No module named 'nltk'

    So it means the actual sent_tokenize function wasn't saved, just its module.
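
One way to check what a pickle really contains is the standard library's pickletools module. The sketch below is an illustration rather than anything posted in the thread: it shows that the result is binary data in both versions (in Python 2.7, bytes is simply another name for str) and lists the opcodes in the stream.

```python
import pickle
import pickletools

payload = pickle.dumps(1, protocol=2)

# In Python 2.7 `bytes` is just another name for `str`, so this check
# passes on both 2.7 and 3.x: the pickle is binary data either way.
is_binary = isinstance(payload, bytes)

# Walk the stream and collect the name of each opcode it contains.
opcodes = [op.name for op, arg, pos in pickletools.genops(payload)]
```

pickletools.dis(payload) prints the same information as a readable listing, which is handy for seeing whether a stream holds data or only a module and attribute name.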
     
    Omer Korat, Dec 27, 2012
    #3
  5. Omer Korat

    Omer Korat Guest

    I see. In that case, all I have to do is make sure NLTK is available when I load the pickled objects. That pretty much solves my problem. Thanks!
    So does that mean pickle never saves the object's values, only how it was created?

    Say I have a large object that takes a long time to train on data. Does pickle not save its values, so that you have to train it anew every time? Is there no way to save its trained values?
     
    Omer Korat, Dec 27, 2012
    #5
  7. On Fri, Dec 28, 2012 at 12:16 AM, Omer Korat wrote:
    > I see. In that case, all I have to do is make sure NLTK is available when I load the pickled objects. That pretty much solves my problem. Thanks!
    > So it means pickle doesn't ever save the object's values, only how it was created?
    >
    > Say I have a large object that requires a lot of time to train on data. It means pickle doesn't save its values, so you have to train it every time anew? Is there no way to save its trained values?


    It'll save instance data but not class data or code. So it'll save all
    that content, and it assumes that class data is either static or will
    be recreated appropriately during unpickling.
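
A small sketch of that distinction, assuming a collections.Counter as a stand-in for a "trained" object (the Counter and the sample sentence are illustrative choices, not from the thread). The counts are instance data and travel with the pickle; the class is stored by name only, so its methods are not in the stream.

```python
import pickle
from collections import Counter

# A Counter stands in for a "trained" object: its counts are instance data.
trained = Counter("the cat sat on the mat".split())

payload = pickle.dumps(trained, protocol=2)

# The class is referenced by name; its method code is not in the stream.
class_by_name = b"Counter" in payload
method_code_absent = b"most_common" not in payload

# The instance data is in the stream, so nothing has to be recomputed.
restored = pickle.loads(payload)
data_survived = restored == trained and restored["the"] == 2
```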

    ChrisA
     
    Chris Angelico, Dec 27, 2012
    #7
  8. Omer Korat

    Terry Reedy Guest

    On 12/27/2012 7:34 AM, Dave Angel wrote:

    > Perhaps you'd rather see it in the Python docs.
    >
    > http://docs.python.org/2/library/pickle.html
    > http://docs.python.org/3.3/library/pickle.html
    >
    > pickle <http://docs.python.org/2/library/pickle.html#module-pickle> can
    > save and restore class instances transparently, however the class
    > definition must be importable and live in the same module as when the
    > object was stored.
    > and
    > Similarly, when class instances are pickled, their class’s code and data
    > are not pickled along with them. Only the instance data are pickled.
    > This is done on purpose, so you can fix bugs in a class or add methods
    > to the class and still load objects that were created with an earlier
    > version of the class.


    I should point out that the above was probably written before the
    (partial) unification of types and classes in 2.2 (completed in 3.3). So
    'class' is referring to 'Python-coded class' and 'code' is referring to
    '(compiled) Python code', and not machine code. Now, everything that
    pickle pickles is a 'class instance' and class code can be compiled from
    either Python or the interpreter's system language (C, Java, C#, others,
    or even Python itself).

    --
    Terry Jan Reedy
     
    Terry Reedy, Dec 27, 2012
    #8
  9. Omer Korat

    Omer Korat Guest

    I am using nltk.classify.MaxentClassifier. This object has a set of labels and a set of probabilities, P(label | features), which it adjusts given data. So, for example, if you tell this object that the label L appears 60% of the time with the feature F, then P(L | F) = 0.6.
    The point is, there is no way to access the probabilities directly. The object's 'classify' method uses these probabilities, but you can't read them as an object attribute.
    In order to adjust the probabilities, you have to call the object's 'train' method and feed classified data in.
    So is there any way to save a MaxEntClassifier object, with its classification probabilities, without having to call the 'train' method?
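
A hedged sketch of the usual "train once, then reload" pattern. It assumes the trained object keeps its state in ordinary picklable instance attributes, which the thread does not confirm for this classifier. The helper load_or_train and its train argument are hypothetical names made up for the illustration; they are not part of pickle or NLTK.

```python
import os
import pickle


def load_or_train(path, train):
    """Return the object stored at `path`, training and saving it if absent.

    `train` is a hypothetical zero-argument callable that builds the
    expensive object; it is not part of pickle or NLTK.
    """
    if os.path.exists(path):
        # A saved copy exists: load it instead of training again.
        with open(path, "rb") as f:
            return pickle.load(f)
    model = train()
    with open(path, "wb") as f:
        # Protocol 2 is the highest one Python 2.7 can read and write.
        pickle.dump(model, f, protocol=2)
    return model
```

The callable would wrap whatever training call the project already uses, and the library that defines the object's class still has to be importable wherever the file is loaded.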
     
    Omer Korat, Jan 1, 2013
    #9
  10. Omer Korat

    Omer Korat Guest

    Yeah, right. I didn't think about that. I'll check in the source how the data is stored.
    Thanks for helping sort it all out.
     
    Omer Korat, Jan 2, 2013
    #10