Python choice of database

Philippe C. Martin

Yes, I agree, but as most of the customer base I target uses the O/S that
cannot be named ;-), file names could become a problem, just as 'ln -s' is
out of the question.

Yet, this might be the best trade-off.

Regards,

Philippe
 
EP

Oren suggested:
How about using the filesystem as a database? For the number of records
you describe it may work surprisingly well. A bonus is that the
database is easy to manage manually.

I tried this for one application under the Windows OS and it worked fine...

until my records (text - maybe 50KB average) unexpectedly blossomed into
the 10,000-1,000,000 ranges. If I or someone else (who innocently doesn't
know better) opens up one of the directories with ~150,000 files in it,
the machine's personality gets a little ugly (it seems buggy but is just
very busy; no crashing). Under 10,000 files per directory seems to work
just fine, though.
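For reference, a minimal sketch of the one-file-per-record scheme being
discussed; the directory name and helper functions here are illustrative,
not from the thread:

import os

DB_DIR = 'db'  # illustrative location; one file per record, keyed by name

def put(key, text):
    with open(os.path.join(DB_DIR, key), 'w') as f:
        f.write(text)

def get(key):
    with open(os.path.join(DB_DIR, key)) as f:
        return f.read()

if not os.path.isdir(DB_DIR):
    os.mkdir(DB_DIR)
put('0001', 'some record contents')
print(get('0001'))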

For less expansive (and more structured) data, cPickle is a favorite.
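A small sketch of that pattern (cPickle is the Python 2 name; plain pickle
in Python 3; the record shown is made up):

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

record = {'name': 'example', 'size': 42}

# Serialize any (picklable) Python object straight to a file...
with open('record.pkl', 'wb') as f:
    pickle.dump(record, f, pickle.HIGHEST_PROTOCOL)

# ...and get the same structure back later.
with open('record.pkl', 'rb') as f:
    print(pickle.load(f))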
 
Jeremy Sanders

EP wrote:
until my records (text - maybe 50KB average) unexpectedly blossomed into
the 10,000-1,000,000 ranges. If I or someone else (who innocently doesn't
know better) opens up one of the directories with ~150,000 files in it,
the machine's personality gets a little ugly (it seems buggy but is just
very busy; no crashing). Under 10,000 files per directory seems to work
just fine, though.

Yes. Programs like "squid" use subdirectories to avoid this problem. If
your key is a surname, for instance, you can use its first letter to
divide the names up, or part of the key's hash value.

Many Linux FSs can cope with lots of files, but it doesn't hurt to try to
avoid this.
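A sketch of that squid-style fan-out, using a hash prefix as the bucket
(the 'db' directory and two-character fan-out are illustrative choices):

import hashlib
import os

def record_path(key):
    # The first two hex digits of the key's hash pick one of 256
    # buckets, so no single directory accumulates too many files.
    bucket = hashlib.md5(key.encode()).hexdigest()[:2]
    return os.path.join('db', bucket, key)

path = record_path('martin')
os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, 'w') as f:
    f.write('record contents')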

Jeremy
 
Charles Krug

EP wrote:

Oren suggested:
How about using the filesystem as a database? For the number of records
you describe it may work surprisingly well. A bonus is that the
database is easy to manage manually.
I tried this for one application under the Windows OS and it worked fine...

until my records (text - maybe 50KB average) unexpectedly blossomed
into the 10,000-1,000,000 ranges. If I or someone else (who
innocently doesn't know better) opens up one of the directories with
~150,000 files in it, the machine's personality gets a little ugly (it
seems buggy but is just very busy; no crashing). Under 10,000 files
per directory seems to work just fine, though.

For less expansive (and more structured) data, cPickle is a favorite.

Related question:

What if I need to create/modify MS-Access or SQL Server dbs?
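One common route (the thread doesn't name a library) is ODBC, for example
via the third-party pyodbc module; the driver string, file path, and table
below are made up for illustration:

import pyodbc  # third-party: pip install pyodbc

conn = pyodbc.connect(
    r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};'
    r'DBQ=C:\data\example.mdb')
cur = conn.cursor()
cur.execute('SELECT fName, lName FROM users')  # hypothetical table
for row in cur.fetchall():
    print(row.fName, row.lName)
conn.close()

Swapping the driver string, e.g.
'DRIVER={SQL Server};SERVER=myhost;DATABASE=mydb;Trusted_Connection=yes',
points the same code at SQL Server.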
 
GMane Python

For my database, I have a table of user information with a unique
identifier, and then I save my bitmap files to the filesystem, placing the
unique identifier and the date and time information into the filename. Why
stick a photo into a database?

For instance:

User Table:
uniqueID: 0001
lName: Rose
fName: Dave

Then save the bitmap with filename:
0001_13:00:00_06-21-2005.bmp

To make things faster, I also have a table of filenames saved, so I can know
exactly which files I want to read in.
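A sketch of that naming scheme (note that ':' is not legal in Windows
filenames, so this version substitutes '.' in the time part; the helper
name is illustrative):

import datetime

def bitmap_filename(unique_id):
    now = datetime.datetime.now()
    # e.g. 0001_13.00.00_06-21-2005.bmp
    return '%s_%s_%s.bmp' % (
        unique_id, now.strftime('%H.%M.%S'), now.strftime('%m-%d-%Y'))

print(bitmap_filename('0001'))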

-Dave
 
Peter Hansen

GMane said:
For my database, I have a table of user information with a unique
identifier, and then I save my bitmap files to the filesystem, placing the
unique identifier and the date and time information into the filename. Why
stick a photo into a database?

There are various possible reasons, depending on one's specific situation.

A database allows you to store the date and time info, or other
attributes, as separate fields so you can use standard SQL (or whatever
your favourite DB supports) to sort, query, and retrieve.

A database makes it possible to update or remove the photo in the same
manner you use to access all your other data, rather than requiring you
to deal with filesystem idiosyncrasies and exceptional conditions.

A database such as SQLite will store *all* your data in a single file,
making it much easier to copy for archival purposes, to send to someone
else, or to move to another machine.

Then save the bitmap with filename:
0001_13:00:00_06-21-2005.bmp

A database shouldn't make you jump through hoops to create "interesting"
file names just to store your data. :)

To make things faster, I also have a table of filenames saved, so I can know
exactly which files I want to read in.

Oh yeah, databases can have indexes (or indices, if you will) which let
you get that sort of speedup without having to resort to still more
custom programming. :)

Not that a database is always the best way to store an image (or to do
anything else, for that matter), but there are definitely times when it
can be a better approach than simple files. (There are disadvantages
too, of course, such as making it harder to "get at" the data from
outside the application which created it. In the case of bitmaps, this
might well be a deciding factor, but each case should be addressed on
its own merits.)
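A minimal sketch of the SQLite point above, using the standard sqlite3
module (the table, column names, and file paths are illustrative):

import sqlite3

conn = sqlite3.connect('photos.db')  # the whole database is this one file
conn.execute('''CREATE TABLE IF NOT EXISTS photos
                (user_id TEXT, taken TEXT, image BLOB)''')

# Assumes a bitmap file exists to load; names here are made up.
with open('0001.bmp', 'rb') as f:
    conn.execute('INSERT INTO photos VALUES (?, ?, ?)',
                 ('0001', '2005-06-21 13:00:00', sqlite3.Binary(f.read())))
conn.commit()

# Sorting and filtering by date need no filename parsing:
for user_id, taken in conn.execute(
        'SELECT user_id, taken FROM photos ORDER BY taken'):
    print(user_id, taken)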

-Peter
 
Philippe C. Martin

I guess I use databases to store .... data ;-) and I do not wish to worry
about the type of data I'm storing. That's why I love to pickle.

I understand that during an optimization phase, decisions might be taken to
handle data otherwise.

Regards,

Philippe
 
Christos TZOTZIOY Georgiou

For very short keys and records (e.g. email addresses) you can use
symbolic links instead of files. The advantage is that you have a
single system call (readlink) to retrieve the contents of a link. No
need to open, read and close.

readlink does open, read and close too. And why go through
indirection? Why not make indexes into subdirectories, say, and
hard-link the records under different filenames?

This works only on posix systems, of course.

There aren't any non-posix-conformant --or, at least, any
non-self-described-as-posix-conformant :)-- operating systems in wide
use today.

Hint: win32file.CreateHardLink
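A sketch of the symlink trick quoted above (POSIX only; the 'db'
directory and helper names are illustrative):

import os

def put(key, value):
    # The record's contents live in the link *target*, so no file body
    # is ever written; one readlink() call retrieves them.
    path = os.path.join('db', key)
    if os.path.islink(path):
        os.remove(path)
    os.symlink(value, path)

def get(key):
    return os.readlink(os.path.join('db', key))

os.makedirs('db', exist_ok=True)
put('philippe', 'pcm@example.com')
print(get('philippe'))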
 
Christos TZOTZIOY Georgiou

EP wrote:
I tried this for one application under the Windows OS and it worked fine...

until my records (text - maybe 50KB average) unexpectedly blossomed into
the 10,000-1,000,000 ranges. If I or someone else (who innocently doesn't
know better) opens up one of the directories with ~150,000 files in it,
the machine's personality gets a little ugly (it seems buggy but is just
very busy; no crashing). Under 10,000 files per directory seems to work
just fine, though.

Although I am not a pro-Windows person, I have to say that directories
containing more than 10,000 files are not a problem for NTFS (at least
the NTFS of Win2000 and WinXP, in my experience), since AFAIK
directories are stored in B-tree format; the problem arises if one tries
to *view* the directory contents using Explorer. A command-line dir had
no problem on a directory with >15,000 files.
 
