pickle module doesn't work


Omer Korat

Hi all,

I'm working on a project in Python 2.7. I have a few large objects, and I want to save them for later use, so that it will be possible to load them whole from a file, instead of creating them every time anew. It is critical that they be transportable between platforms. Problem is, when I use the 2.7 pickle module, all I get is a file containing a string representing the commands used to create the object. But there's nothing I can do with this string, because it only contains information about the object's module, class and parameters. And that way, they aren't transportable.
In Python 3.3 this problem is solved, and pickle.dump generates a series of bytes, which can be loaded in any other module independently of anything. But in my project, I need NLTK 2.0, which is written in Python 2.7...

Anybody has suggestions? Maybe there is a way to use pickle so that it yields the results I need? Or is there any other module that does pickle's job? Or perhaps there is a way to mechanically translate between Python versions, so I'll be able to use pickle from 3.3 inside an application written in 2.7? Or perhaps somebody knows of a way to embed a piece of 3.3 code inside a 2.7 program?

It can't be I'm the only one who wants to save python objects for later use! There must be a standard method to do this, but I couldn't find any on the web!
If someone can solve this for me I'll be so grateful.
 
Peter Otten

Omer said:
I'm working on a project in Python 2.7. I have a few large objects, and I
want to save them for later use, so that it will be possible to load them
whole from a file, instead of creating them every time anew. It is
critical that they be transportable between platforms. Problem is, when I
use the 2.7 pickle module, all I get is a file containing a string
representing the commands used to create the object. But there's nothing I
can do with this string, because it only contains information about the
object's module, class and parameters. And that way, they aren't
transportable. In python 3.3 this problem is solved, and the pickle.dump
generates a series of bytes, which can be loaded in any other module
independently of anything. But in my project, I need NLTK 2.0, which is
written in python 2.7...

Anybody has suggestions? Maybe there is a way to use pickle so that it
yields the results I need? Or is there any other module that does pickle's
job? Or perhaps there is a way to mechanically translate between python
versions, so I'll be able to use pickle from 3.3 inside an application
written in 2.7? Or perhaps somebody knows of a way to embed a piece of 3.3
code inside a 2.7 program?

It can't be I'm the only one who wants to save python objects for later
use! There must be a standard method to do this, but I couldn't find any
on the web! If someone can solve this for me I'll be so grateful.

Pickling works the same way in Python 2 and Python 3. For classes only the
names are dumped, so you need (the same version of) NLTK on the source and
the destination platform.
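
This can be seen without NLTK. A minimal sketch, using the standard library's json.dumps purely as a stand-in for an NLTK function such as sent_tokenize (an assumption made so the example runs anywhere): functions and classes are pickled by reference, as a module name plus an attribute name, so the payload is tiny and unpickling has to import that module again.

```python
import json
import pickle

# json.dumps stands in for an NLTK function such as sent_tokenize.
payload = pickle.dumps(json.dumps, protocol=2)

# The payload holds the module and function names, not the function's code.
print(repr(payload))

# Loading imports the module and looks the name up again.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

If the module cannot be imported on the loading side, the lookup fails with an ImportError, which is why the library has to be installed there too.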

If you can provide a short demo of what works in Python 3 but fails in
Python 2 we may be able to find the actual problem or misunderstanding.
Maybe it is just that different protocols are used by default? If so, try

import pickle

with open(filename, "wb") as f:
    # pickle.dump takes the object first, then the open file
    pickle.dump(your_data, f, protocol=pickle.HIGHEST_PROTOCOL)
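
A complete round trip along those lines might look like the sketch below. The sample data and the file name are made up for illustration. The file is opened in binary mode for both writing and reading, and it does not matter that the raw payload is a str in 2.7 and bytes in 3.x.

```python
import os
import pickle
import tempfile

# Hypothetical sample data, standing in for a large object.
your_data = {"labels": ["pos", "neg"], "weights": [0.25, 0.75]}

# Hypothetical file name inside a temporary directory.
filename = os.path.join(tempfile.mkdtemp(), "data.pkl")

# Write the object with the highest protocol this interpreter supports.
with open(filename, "wb") as f:
    pickle.dump(your_data, f, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back; the protocol is detected automatically on load.
with open(filename, "rb") as f:
    restored = pickle.load(f)

print(restored == your_data)
```

Note that the highest protocol of a newer interpreter may not be readable by an older one, so a file meant to be shared between 2.7 and 3.x would need protocol 2 or lower.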
 
Omer Korat

You're probably right in general, but for me the 3.3 and 2.7 pickles definitely don't work the same:

3.3: <type 'bytes'>

2.7: <type 'str'>


As you can see, in 2.7 when I try to dump something, I get a useless string. Look what I get when I dump an NLTK object such as the sent_tokenize function:

'\x80\x02cnltk.tokenize\nsent_tokenize\ng\x00'

Now, this is useless. If I try to load it on a platform without NLTK installed on it, I get:

ImportError: No module named 'nltk'

So it means the actual sent_tokenize wasn't saved. Just its module.
 
Omer Korat

I see. In that case, all I have to do is make sure NLTK is available when I load the pickled objects. That pretty much solves my problem. Thanks!
So it means pickle doesn't ever save the object's values, only how it was created?

Say I have a large object that requires a lot of time to train on data. It means pickle doesn't save its values, so you have to train it every time anew? Is there no way to save its trained values?
 
Chris Angelico

I see. In that case, all I have to do is make sure NLTK is available when I load the pickled objects. That pretty much solves my problem. Thanks!
So it means pickle doesn't ever save the object's values, only how it was created?

Say I have a large object that requires a lot of time to train on data. It means pickle doesn't save its values, so you have to train it every time anew? Is there no way to save its trained values?

It'll save instance data but not class data or code. So it'll save all
that content, and it assumes that class data is either static or will
be recreated appropriately during unpickling.
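
A small sketch of that distinction, using a hypothetical Model class invented for this example: the instance attribute survives the round trip, while the class attribute is taken from whatever the class looks like at load time.

```python
import pickle


class Model(object):
    # Class attribute: belongs to the class, so it is not stored in the pickle.
    default_rate = 0.1

    def __init__(self):
        # Instance attribute: stored in the pickle.
        self.weights = {}

    def train(self, label, value):
        self.weights[label] = value


model = Model()
model.train("L", 0.6)

payload = pickle.dumps(model, protocol=2)

# Change the class data after pickling; the pickle knows nothing about it.
Model.default_rate = 0.5

restored = pickle.loads(payload)
print(restored.weights)       # instance data comes back from the pickle
print(restored.default_rate)  # class data comes from the current class
```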

ChrisA
 
Terry Reedy

Perhaps you'd rather see it in the Python docs.

http://docs.python.org/2/library/pickle.html
http://docs.python.org/3.3/library/pickle.html

pickle <http://docs.python.org/2/library/pickle.html#module-pickle> can
save and restore class instances transparently, however the class
definition must be importable and live in the same module as when the
object was stored.
and
Similarly, when class instances are pickled, their class’s code and data
are not pickled along with them. Only the instance data are pickled.
This is done on purpose, so you can fix bugs in a class or add methods
to the class and still load objects that were created with an earlier
version of the class.

I should point out that the above was probably written before the
(partial) unification of types and classes in 2.2 (completed in 3.0). So
'class' is referring to 'Python-coded class' and 'code' is referring to
'(compiled) Python code', and not machine code. Now, everything that
pickle pickles is a 'class instance' and class code can be compiled from
either Python or the interpreter's system language (C, Java, C#, others,
or even Python itself).
 
Omer Korat

I am using the nltk.classify.MaxEntClassifier. This object has a set of labels, and a set of probabilities: P(label | features). It modifies this probability given data. So, for example, if you tell this object that the label L appears 60% of the time with the feature F, then P(L | F) = 0.6.
The point is, there is no way to access the probabilities directly. The object's 'classify' method uses these probabilities, but you can't call them as an object property.
In order to adjust probabilities, you have to call the object's 'train' method, and feed classified data in.
So is there any way to save a MaxEntClassifier object, with its classification probabilities, without having to call the 'train' method?
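
Pickle stores an instance's attributes whether or not the class exposes them publicly, so in principle a trained object can be dumped once and loaded later without retraining. The sketch below uses TinyClassifier, a hypothetical stand-in written for this example; it is not the NLTK API, and whether the real classifier pickles cleanly depends on how it stores its weights, which is not verified here.

```python
import os
import pickle
import tempfile


class TinyClassifier(object):
    """Hypothetical stand-in for a trained classifier; not the NLTK API."""

    def __init__(self):
        # "Private" state that callers never read directly.
        self._prob = {}

    def train(self, labeled_features):
        # Estimate P(label | feature) by relative frequency.
        counts = {}
        for feature, label in labeled_features:
            per_feature = counts.setdefault(feature, {})
            per_feature[label] = per_feature.get(label, 0) + 1
        for feature, per_feature in counts.items():
            total = float(sum(per_feature.values()))
            self._prob[feature] = dict(
                (label, n / total) for label, n in per_feature.items()
            )

    def classify(self, feature):
        # Return the most probable label for a known feature.
        probs = self._prob[feature]
        return max(probs, key=probs.get)


# Label L appears with feature F in 3 of 5 examples, so P(L | F) = 0.6.
data = [("F", "L")] * 3 + [("F", "M")] * 2
clf = TinyClassifier()
clf.train(data)  # the expensive step, done once

path = os.path.join(tempfile.mkdtemp(), "classifier.pkl")
with open(path, "wb") as f:
    pickle.dump(clf, f, protocol=2)

# Later, in any program that can import the class definition:
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded.classify("F"))  # no call to train() was needed
```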
 
Omer Korat

Yeah, right. I didn't think about that. I'll check in the source how the data is stored.
Thanks for helping sort it all out.
 
