Performance of cPickle module

sh · May 11, 2004

Hi guys,

Well, I have a (maybe dumb) question.

I want to write my own little blog using Python (as a fairly small but doable
project for myself to learn Python more deeply in a web context).

For now I don't want to use a database as a backend; I'd prefer to use XML,
which is enough for the small amount of data the blog would have to deal with.

My problem is that, as HTTP is stateless, I can't keep objects alive across
multiple requests, for instance an instance of a Users class which is an
interface to manage my users.

Let's say, for example, that a user wants to access a resource (a simple web
page). My code would call that Users class through an instance of it and would
call something like getUser(self,login), returning an instance of a UserData
class which would provide me with all the details of that user (name, email,
etc.).

I want to save my users not in a database like I said but in an XML file on the
server.

As HTTP is stateless, I believe that I will have to create the Users object
again and again, once for every request.

I don't want to parse the xml file each time, instead I want to save the Users
object (that keeps a map to all my UserData objects) into a file using the
cPickle module.

My question therefore is, is my architecture efficient enough ?

If I had to use a database, the database would keep track of my users and I
would only need to issue an SQL statement. Would cPickle be more efficient in
my case than a database ?

To give a bit of code let's say that I have something like :

import cPickle

class UserData:
    def __init__(self, name, email):
        self.name = name
        self.email = email

class Users:
    def __init__(self):
        self.users = {}
        self.hasChanged = False

    def _deserialize(self):
        if not self.hasChanged:
            # cPickle.load() expects an open file object, not a file name
            f = open('users.dat', 'rb')
            self.users = cPickle.load(f)
            f.close()
        else:
            pass  # parse the xml file...

Is it an efficient method ?

Thanks
- Sylvain
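
The caching idea described in this post (parse the XML only when needed, otherwise load a pickle) could be sketched roughly as follows. This is only an illustration under assumptions: it is written for current Python, where the `pickle` module uses the C implementation that used to be `cPickle`; the XML layout (`<users><user login="..." name="..." email="..."/></users>`), the function names and the use of file modification times to detect changes are all hypothetical and do not come from the thread.

```python
import os
import pickle
import xml.etree.ElementTree as ET


def parse_users_xml(xml_path):
    """Build {login: {"name": ..., "email": ...}} from the assumed XML layout."""
    users = {}
    for node in ET.parse(xml_path).getroot().iter("user"):
        users[node.get("login")] = {
            "name": node.get("name"),
            "email": node.get("email"),
        }
    return users


def load_users(xml_path, cache_path):
    """Return (users, source), where source is "cache" or "xml"."""
    try:
        # The cache counts as fresh if it is at least as new as the XML file.
        cache_is_fresh = os.path.getmtime(cache_path) >= os.path.getmtime(xml_path)
    except OSError:
        cache_is_fresh = False  # one of the files is missing

    if cache_is_fresh:
        with open(cache_path, "rb") as f:
            return pickle.load(f), "cache"

    # Slow path: parse the XML and refresh the pickle for later requests.
    users = parse_users_xml(xml_path)
    with open(cache_path, "wb") as f:
        pickle.dump(users, f, pickle.HIGHEST_PROTOCOL)
    return users, "xml"
```

Comparing modification times replaces the hasChanged flag, which cannot survive between requests because each request starts with a fresh object.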

Holger Türk · May 11, 2004

> [...]

Hi,

this may be interesting for others, too.
I modified the example given above a little, entered
1000 users and saved and loaded the Users object
1000 times using cPickle on an Athlon 1GHz.
The results are:

[...]
995
996
997
998
999

real 5m26.115s
user 3m59.570s
sys 0m5.060s

That is 0.326s per save/load round trip.

-rw-r--r-- 1 holger users 173972 2004-05-11 15:23 test.pickle

For 10 users:

[...]
995
996
997
998
999

real 0m5.148s
user 0m2.710s
sys 0m0.740s

0.005s per roundtrip.

-rw-r--r-- 1 holger users 1708 2004-05-11 15:25 test.pickle

That should be fast enough for a weblog application.
The HTTP/CGI overhead and concurrent access to the
pickled objects when writing them will probably be
the harder problems.
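
One common way to soften the concurrent-write problem is to write the pickle to a temporary file and then rename it over the real one, so that a reader never sees a half-written file. The sketch below assumes current Python (`pickle` instead of `cPickle`) and a hypothetical helper name `save_atomically`; it does not stop two writers from overwriting each other's changes, which would need a lock on top.

```python
import os
import pickle
import tempfile


def save_atomically(obj, path):
    """Pickle obj to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        # os.replace overwrites the destination; on POSIX the rename is atomic.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The temporary file is created in the same directory as the target because a rename is only atomic within one filesystem.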

Greetings,

Holger

Here's the program:
#!/usr/bin/python

import string, random, cPickle

def randString (l):
    return "".join ([string.letters [random.randrange (l)] for i in range (l)])

class UserData:
    def __init__(self,name,email):
        self.name = name
        self.email = email

class Users:
    def __init__(self):
        self.users = {}

u = Users ()

for a in range (1000):
    u.users [randString (20)] = (UserData (randString (40), randString (40)))

for a in range (1000):
    print a

    f = open ("test.pickle", "w")
    p = cPickle.Pickler (f)
    p.dump (u)
    f.close ()

    f = open ("test.pickle", "r")
    p = cPickle.Unpickler (f)
    u = p.load ()
    f.close ()
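
For anyone repeating the measurement on a current Python, here is a rough port of the same idea. It is a sketch with several assumptions: `pickle` replaces `cPickle`, the records are plain dicts rather than UserData instances, the binary `HIGHEST_PROTOCOL` is used instead of the old text default, and the function name `time_roundtrips` is made up here. The numbers it produces are therefore not directly comparable with the ones above.

```python
import pickle
import random
import string
import time


def rand_string(n):
    """Return n random ASCII letters."""
    return "".join(random.choice(string.ascii_letters) for _ in range(n))


def time_roundtrips(n_users, n_rounds, path):
    """Dump and reload n_users records n_rounds times.

    Returns (average seconds per round trip, the reloaded users dict).
    """
    users = {
        rand_string(20): {"name": rand_string(40), "email": rand_string(40)}
        for _ in range(n_users)
    }
    start = time.perf_counter()
    for _ in range(n_rounds):
        with open(path, "wb") as f:
            pickle.dump(users, f, pickle.HIGHEST_PROTOCOL)
        with open(path, "rb") as f:
            users = pickle.load(f)
    elapsed = time.perf_counter() - start
    return elapsed / n_rounds, users
```

Calling `time_roundtrips(1000, 1000, "test.pickle")` mirrors the first run above; no timings are quoted here because they depend on the machine.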

Paul Rubin · May 11, 2004

> If I had to use a database, the database would keep track of my users and I
> would only need to do a SQL statement. Would the cPickle more efficient in my
> case than a database ?

Not if you had more than a few users. Why don't you look at the dbm
or shelve modules? The dbm module lets you store strings (including
pickles) in a disk file that works like a hash table (much less hassle
than messing with an SQL server). The shelve module uses dbm and
handles the pickling automatically. Note that all these approaches
share a terrible pitfall: what happens if the web page needs to update
the database, say because you want to let people create their own user
accounts through the site? If two people try to update the dbm file
(or an XML file) simultaneously, things can get completely screwed up
unless you're careful. The idea of a real database is to take care of
those issues for you.
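
A minimal sketch of the shelve approach, assuming current Python, where `shelve.open` can be used in a `with` block. The helper names `add_user` and `get_user` and the dict layout of a user record are hypothetical. As noted above, nothing here guards against two processes writing at once.

```python
import shelve


def add_user(db_path, login, name, email):
    """Store one user record under its login; shelve pickles the value."""
    with shelve.open(db_path) as db:
        db[login] = {"name": name, "email": email}


def get_user(db_path, login):
    """Return the stored record for login, or None if there is none."""
    with shelve.open(db_path) as db:
        return db.get(login)
```

Only the requested record is unpickled on each lookup, which is why this scales better than loading one pickle that holds every user.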

Another thing you could do is put the session state in a browser
cookie. Be careful when you do that though, since a malicious user
could concoct a cookie that lets him seize some other user's session,
or even take over your server if you unpickle the cookie. The best
way to handle that is to encrypt the cookies. See

http://www.nightsong.com/phr/crypto/p3.py

for a simple encryption function that should be sufficient for this
purpose.
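
As an illustration of the tampering concern, here is a sketch that uses a different technique from the encryption recommended above: it signs the cookie with an HMAC instead of encrypting it. The user can still read the contents, but a forged or modified cookie is rejected. It also encodes the session as JSON rather than a pickle, so decoding cannot run code. The key value and the function names are placeholders, not taken from p3.py or from the thread.

```python
import base64
import hashlib
import hmac
import json

# Placeholder secret: a real deployment would need a long random value kept private.
SECRET_KEY = b"replace-with-a-long-random-secret"


def make_cookie(session, key=SECRET_KEY):
    """Encode a dict as '<base64 payload>.<hex HMAC-SHA256 tag>'."""
    payload = base64.urlsafe_b64encode(
        json.dumps(session, sort_keys=True).encode("utf-8")
    )
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest().encode("ascii")
    return (payload + b"." + tag).decode("ascii")


def read_cookie(cookie, key=SECRET_KEY):
    """Return the session dict, or None if the cookie is malformed or forged."""
    try:
        payload, tag = cookie.encode("ascii").rsplit(b".", 1)
    except ValueError:  # no separator, or non-ASCII characters
        return None
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode("ascii")
    # compare_digest avoids leaking how many leading characters matched
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload).decode("utf-8"))
```

The tag is checked before the payload is decoded, so untrusted bytes are never parsed unless they were produced with the server's key.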
