A wish: Simple database


Bill Kelly

Hi,

Hal Fulton said:
Hi, all...

I sometimes wish for a very simple database with the
following features:

1. Distributed as part of Ruby
2. Need not store entire db in memory
3. No SQL requirement
4. No special efficiency requirement
5. Available cross-platform
6. Database files are readable cross-platform

Typically I use DBM in this case. But it doesn't
meet (5) and (6), since it's not there on Windows
and the files can't be moved even across Linux
systems.

A simple marshal would be fine, but it violates (2).
I believe PStore would also?

SDBM works by default on Windows, but it is severely
limited if not actually buggy. Probably violates (6)
and (5) also.

Does anyone have any recommendation? Or would a
"universal built-in database" make an interesting
addition to our world?

I, too, share the same wish...

(Side note - agree about SDBM. Doesn't work properly
on Windows, and even on Linux if you try to store anything
but really tiny key/value pairs, the data file bloats up
to gigabyte size in no time, and pretty soon it just fails
to find a place to store the data.... :( ...)

In a current project, I started out by using YAML::Store,
because the human-readable data format was especially
useful to me. But it didn't take long before we'd added
enough records that the YAML file size grew too large to
be loaded in a reasonable time by the CGI app that needed
it. I considered switching to PStore, but really wanted
to keep the human-readable data. Instead I made a small
wrapper for YAML::Store (would work the same with PStore)
that, based on the key you're requesting, hashes out to
one of (currently 256) files on disk.
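A minimal sketch of that key-to-file bucketing, assuming Zlib.crc32 as the hash (a substitution; db-store.rb below uses key.hash & 255). The reason for the swap: on Ruby 1.9 and later, String#hash is randomly seeded per process, so the same key can map to a different file on the next run, while a CRC32 checksum does not change between runs. `bucket_filename` is a hypothetical helper name.

```ruby
require 'zlib'

# Map a key to one of +buckets+ files under dbdir.
# Zlib.crc32 returns the same value in every process, unlike String#hash
# on Ruby 1.9+, so a key always lands in the same file.
def bucket_filename(dbdir, key, buckets = 256)
  idx = Zlib.crc32(key.to_s) % buckets
  File.join(dbdir, format("%02x.ystore", idx))
end
```

For example, bucket_filename("path/to/datadir", "alice@example.com") gives a path such as "path/to/datadir/3f.ystore", and the same key gives the same path every time.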

It could certainly be more sophisticated; but right now
it's been meeting my specific needs quite well. On the
chance that it may be useful to anyone else, here it is:

db-store.rb
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
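require 'yaml/store'

class DBStore
  # dbname is the directory holding the bucket files; it must already
  # exist (PStore/YAML::Store raise an error if it doesn't).
  def initialize(dbname)
    @dbname = dbname
  end

  # Open the one bucket file that must contain +key+ and yield its store.
  def transaction(key)
    hashname = hashname_for_key(key)
    ystore = YAML::Store.new(hashname_to_store_filename(hashname))
    ystore.transaction do
      yield(ystore)
    end
  end

  # Yield the store for every bucket file that exists, each in its own transaction.
  def each_ystore
    each_store_filename do |fname|
      next unless File.exist? fname
      ystore = YAML::Store.new(fname)
      ystore.transaction do
        yield ystore
      end
    end
  end

  # Yield every record in every bucket.
  def each
    each_ystore do |ystore|
      ystore.roots.each do |key|
        rec = ystore[key]
        yield rec
      end
    end
  end

  protected

  def each_store_filename
    each_hashname {|hn| yield hashname_to_store_filename(hn) }
  end

  def hashname_to_store_filename(hashname)
    "#@dbname/#{hashname}.ystore"
  end

  def each_hashname
    0.upto(255) {|n| yield idx_to_hashname(n) }
  end

  def idx_to_hashname(idx)
    sprintf("%02x", idx)
  end

  # NOTE: String#hash is randomly seeded per process on Ruby 1.9 and later,
  # so there a key can map to a different bucket on the next run. A checksum
  # such as Zlib.crc32(key.to_s) & 255 (require 'zlib') stays stable.
  def hashname_for_key(key)
    sprintf("%02x", key.hash & 255)
  end
end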

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Usage example:

customer = Customer.new( *CGI_stuff )

dbstore = DBStore.new("path/to/datadir")
dbstore.transaction(customer.email) do |keystore|
  if keystore.root?(customer.email)  # record for this user already in database?
    existing_cust = keystore[customer.email]
    existing_cust.update( customer )
    # ...
  else
    keystore[customer.email] = customer
    # ...
  end
end

So the main difference is that transaction now needs
the database key you want to work with, so it
can go fetch the appropriate YAML::Store (or PStore,
etc.) database chunk from a disk file it knows must
contain that key. What transaction() then yields to
its block is just a normal YAML::Store object (or
PStore, etc.)
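For readers new to YAML::Store, here is a small self-contained sketch of the calls the db-store.rb code relies on (`transaction`, `root?`, `[]`, `[]=`, `roots`). The file name, key, and record are made up for illustration; `store_roundtrip` is a hypothetical helper.

```ruby
require 'yaml/store'
require 'tmpdir'

# Store one record, then read it back in a read-only transaction.
def store_roundtrip(path)
  store = YAML::Store.new(path)

  # Changes made inside the block are written to the file when it ends.
  store.transaction do
    store["alice@example.com"] = { "name" => "Alice", "visits" => 1 }
  end

  # transaction(true) is read-only; assigning inside it raises PStore::Error.
  # transaction returns the value of its block.
  store.transaction(true) do
    [store.root?("alice@example.com"), store.roots, store["alice@example.com"]["visits"]]
  end
end

Dir.mktmpdir do |dir|
  p store_roundtrip(File.join(dir, "00.ystore"))  # [true, ["alice@example.com"], 1]
end
```

Because YAML::Store is a subclass of PStore, the same calls work unchanged on a PStore.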

If you want to iterate over all the *Store database
chunks on disk, there's DBStore#each_ystore.

If you want to just iterate over all the records,
there's DBStore#each.

dbstore = DBStore.new("path/to/datadir")
dbstore.each do |customer|
  puts customer.to_s
end

* * *

Looking at DBStore, it would seem to be easy to change so
that one could pass in the preferred "store" mechanism...
What if initialize() looked like this:

def initialize(dbname, storeclass=YAML::Store)
  @dbname = dbname
  @storeclass = storeclass
end

I believe it would then work with PStore as well, just
changing the explicit occurrences of YAML::Store to
@storeclass.
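A sketch of that change, under two assumptions: the store class only needs `new(filename)` and `transaction`, which PStore and YAML::Store both provide, and bucket names come from Zlib.crc32 rather than key.hash (String#hash is randomly seeded per process on Ruby 1.9+). `PluggableDBStore` is a hypothetical, cut-down stand-in for DBStore with only `transaction`.

```ruby
require 'pstore'
require 'yaml/store'
require 'zlib'
require 'tmpdir'

# Cut-down DBStore variant whose backing store class is a parameter.
class PluggableDBStore
  def initialize(dbname, storeclass = YAML::Store)
    @dbname = dbname          # existing directory holding the bucket files
    @storeclass = storeclass  # any PStore-compatible class
  end

  # Open the bucket file that must hold +key+ and yield its store object.
  def transaction(key)
    store = @storeclass.new(store_filename(key))
    store.transaction { yield store }
  end

  private

  def store_filename(key)
    format("%s/%02x.store", @dbname, Zlib.crc32(key.to_s) & 255)
  end
end

Dir.mktmpdir do |dir|
  db = PluggableDBStore.new(dir, PStore)   # Marshal-based files instead of YAML
  db.transaction("bob") { |s| s["bob"] = [1, 2, 3] }
  db.transaction("bob") { |s| p s["bob"] }  # [1, 2, 3]
end
```

Passing PStore trades human-readable files for faster loading; the calling code stays the same.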

Also, that it hashes out to 256 files on disk is of
course arbitrary. It would certainly be trivial to
make it hash to a million files on disk and create
subdirectories as necessary... Perhaps just

def hashname_for_key(key)
  hv = key.hash % 1000000
  sprintf("%03d/%03d", hv / 1000, hv % 1000)
end

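then creating the subdirectory in transaction() when it doesn't
exist yet, i.e. Dir.mkdir(File.dirname(hashname_to_store_filename(hashname)))
(the directory has to be under @dbname, not relative to the current directory)...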
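One way to do the subdirectory part, sketched with FileUtils.mkdir_p so the directory is created on demand and no error is raised if it already exists. It uses Zlib.crc32 instead of key.hash because String#hash is randomly seeded per process on Ruby 1.9+. `nested_hashname` and `nested_store_filename` are hypothetical names.

```ruby
require 'fileutils'
require 'zlib'
require 'tmpdir'

# Turn a key into a bucket number 0..999_999 and split it as "NNN/NNN",
# giving 1000 subdirectories of up to 1000 files each.
def nested_hashname(key)
  hv = Zlib.crc32(key.to_s) % 1_000_000
  format("%03d/%03d", hv / 1000, hv % 1000)
end

# Full path of the store file for +key+; creates its subdirectory if needed.
def nested_store_filename(dbname, key)
  path = File.join(dbname, "#{nested_hashname(key)}.ystore")
  FileUtils.mkdir_p(File.dirname(path))  # no error if it already exists
  path
end

Dir.mktmpdir do |dbdir|
  puts nested_store_filename(dbdir, "alice@example.com")
end
```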


Anyway, for what it's worth . . .

Regards,

Bill
 

Jeff Moss

Try DyBASE, object oriented dynamically typed database storage, for
ruby, python, php and some other language I've never heard of.... it's
linked into the interpreter as a shared object so it's quick. I don't
know if this counts as cross platform though, not sure what you're
looking for there.

-Jeff
 

Ezra Zygmuntowicz


Can someone please give me a hand here? I have just built a brand new
RHEL 4 server for my company and I am trying to get rails up and
running on it. I have ruby 1.8.2 installed fine. And the gem install
process went fine as well. But when I try to do the gem install rails
as the root user, this is what I get:


[root@yhrws root]# gem instal rails
Config file /root/.gemrc does not exist
Attempting local installation of 'rails'
Local gem file not found: rails*.gem
Attempting remote installation of 'rails'
Updating Gem source index for: http://gems.rubyforge.org
/usr/local/lib/ruby/1.8/timeout.rb:42:in `new': execution expired (Timeout::Error)
from /usr/local/lib/ruby/1.8/net/protocol.rb:83:in `connect'
from /usr/local/lib/ruby/1.8/net/protocol.rb:82:in `timeout'
from /usr/local/lib/ruby/1.8/timeout.rb:55:in `timeout'
from /usr/local/lib/ruby/1.8/net/protocol.rb:82:in `connect'
from /usr/local/lib/ruby/1.8/net/protocol.rb:64:in `initialize'
from /usr/local/lib/ruby/1.8/net/http.rb:430:in `open'
from /usr/local/lib/ruby/1.8/net/http.rb:430:in `do_start'
from /usr/local/lib/ruby/1.8/net/http.rb:419:in `start'
... 22 levels...
from /usr/local/lib/ruby/site_ruby/1.8/rubygems/cmd_manager.rb:90:in `process_args'
from /usr/local/lib/ruby/site_ruby/1.8/rubygems/cmd_manager.rb:63:in `run'
from /usr/local/lib/ruby/site_ruby/1.8/rubygems/gem_runner.rb:9:in `run'
from /usr/local/bin/gem:17


I'm sorry if someone has already answered this question or if it is
totally obvious what the problem is, but can someone give me a little
help here? Again, this is on Red Hat Enterprise Linux, a totally fresh
install with Ruby 1.8.2 installed in /usr/local/bin. Any pointers?
Many many thanks in advance for any help

-Ezra Zygmuntowicz
Yakima Herald-Republic
WebMaster
509-577-7732
(e-mail address removed)

 

George Moschovitis

I believe George added SQLite support into og as of a couple of
releases

this is true. Btw, in version 0.11.0 there is an experimental
filesystem adapter, that uses filesystem directories instead of tables
and yaml files instead of rows. Of course this adapter is very limited
at the moment, but I am working on it. However, I am not sure if I'll
come up with something useful.
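To make that layout concrete, here is a toy version: one directory per table, one YAML file per row named after its id. It is only an illustration under those assumptions, not Og's adapter; `FsTable` and its methods are hypothetical.

```ruby
require 'yaml'
require 'fileutils'
require 'tmpdir'

# A "table" is a directory; each row is a YAML file named <id>.yaml.
class FsTable
  def initialize(root, name)
    @dir = File.join(root, name)
    FileUtils.mkdir_p(@dir)
  end

  def insert(id, row)
    File.write(path(id), YAML.dump(row))
  end

  # Return the row with this id, or nil if there is no such file.
  def find(id)
    File.exist?(path(id)) ? YAML.safe_load(File.read(path(id))) : nil
  end

  # Yield every row; glob order isn't guaranteed, so sort for stability.
  def each
    Dir.glob(File.join(@dir, "*.yaml")).sort.each do |f|
      yield YAML.safe_load(File.read(f))
    end
  end

  private

  def path(id)
    File.join(@dir, "#{id}.yaml")
  end
end

Dir.mktmpdir do |root|
  customers = FsTable.new(root, "customers")
  customers.insert(42, { "name" => "Ann" })
  p customers.find(42)
end
```

YAML.safe_load keeps this limited to plain data (hashes, arrays, strings, numbers); storing arbitrary Ruby objects would need a different loader.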

On the topic of avoiding SQL, the development (not released) version
supports pregenerated finders for all properties for even easier
querying.

regards,
George
 

Joel VanderWerf

Hal said:
I've used FSDB and I like it.

Well, thank you, Hal. And thanks to Michael, too.
But I was always a little nervous about its disk usage. Is it
excessive?

Have you seen any problems? The disk usage should be equal to your data
(stored using your choices of serialization methods), plus the extra
dirs to provide the hierarchical db structure, plus some small
(typically 4 bytes each for versioning data) files with names like
"..fsdb.meta.yourfilename".

So the overhead (due to the file system's block size, for instance) will
be bad if you have lots of little files. (Maybe Reiser FS can help with
that.)
I know I said I wasn't terribly worried about performance, but
it does need to be *reasonable* in terms of speed and space.

The benchmarks report 900+ transactions per sec with 1 process, 1 thread
on 850MHz Pentium. It degrades a little when you add processes, and it
degrades a bit more as the thread count increases.

That brings up another advantage of fsdb over PStore: fsdb is both
process and thread safe. That's assuming the OS has a reasonable file
lock (FSDB is not process-safe on WinME). Process safety means that FSDB
can be used for _persistent_ asynchronous IPC in situations where some
delay is acceptable. Also note that FSDB uses its own mutex classes that
don't break when you fork from a multithreaded app, so you can easily
use FSDB for communicating among parent and child processes. If you fork
with a db object in scope, you can just keep talking to it.
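FSDB's own locking code isn't shown in this thread. As a rough illustration of the underlying idea only, here is a read-modify-write on a single file guarded by an exclusive File#flock, which serializes access among processes (including forked children) where the OS provides working file locks. `locked_update` is a hypothetical helper, Marshal is an arbitrary choice of format, and the demo is skipped on platforms without fork.

```ruby
require 'tmpdir'

# Read a Marshal-ed value, pass it to the block, and write the block's
# result back, holding an exclusive lock for the whole read-modify-write.
def locked_update(path, default)
  File.open(path, File::RDWR | File::CREAT | File::BINARY, 0644) do |f|
    f.flock(File::LOCK_EX)   # waits until no other process holds the lock
    data = f.size.zero? ? default : Marshal.load(f.read)
    data = yield data
    f.rewind
    f.truncate(0)
    f.write(Marshal.dump(data))
    f.flush
    data
  end                        # closing the file releases the lock
end

# Four forked children each increment a shared counter ten times.
if Process.respond_to?(:fork)
  Dir.mktmpdir do |dir|
    counter = File.join(dir, "counter")
    pids = Array.new(4) do
      fork do
        10.times { locked_update(counter, 0) { |n| n + 1 } }
        exit!(0)             # skip at_exit handlers in the child
      end
    end
    pids.each { |pid| Process.wait(pid) }
    p locked_update(counter, 0) { |n| n }   # 40
  end
end
```

Without the flock call, two children could read the same value and both write back n + 1, losing an increment.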

My only objection to putting fsdb in the standard library is that I've
only tested it on Windows (noting the problem on ME/98/95), Solaris, and
Linux. Someone was trying to get it to work on OSX, but we couldn't get
file locking to work (I didn't have access to a mac, and I couldn't
really debug remotely).

It's already Ruby-licensed.

Regarding Hal's feature list:
1. Distributed as part of Ruby

No. BTW, FSDB is in RPA.
2. Need not store entire db in memory

Yes, need not. FSDB uses a cache that can be cleared and only loads
requested objects into the cache. If you want, you can subclass the
CacheEntry class to use weak references, so that ruby GC can clear the
cache as needed. (I may make this standard if it is not too much of a
speed hit--it's on my todo list to investigate.)
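Roughly what a weak-reference cache could look like, using Ruby's standard `weakref` library. This is not FSDB's CacheEntry; `WeakCache` and its loader block are hypothetical. If the GC has reclaimed an entry, the next lookup simply reloads it.

```ruby
require 'weakref'

# A cache whose entries the garbage collector may reclaim at any time;
# a reclaimed entry is transparently reloaded on the next lookup.
class WeakCache
  # loader: called with a key to (re)load its object, e.g. from a file.
  def initialize(&loader)
    @loader = loader
    @entries = {}
  end

  def [](key)
    if (ref = @entries[key])
      begin
        return ref.__getobj__
      rescue WeakRef::RefError
        # collected by GC since it was cached; reload below
      end
    end
    obj = @loader.call(key)
    @entries[key] = WeakRef.new(obj)
    obj
  end

  # Drop every entry explicitly.
  def clear
    @entries.clear
  end
end

loads = 0
cache = WeakCache.new { |key| loads += 1; "contents of #{key}" }
doc = cache["a.yml"]
cache["a.yml"]     # doc still references the object, so no reload here
p loads            # 1
```

Catching WeakRef::RefError, rather than checking `weakref_alive?` first, avoids a race where the GC runs between the check and the access.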
3. No SQL requirement

None. Unfortunately, there is no SQL interface, either.
4. No special efficiency requirement

None. It's a pretty basic ruby wrapper on top of file system calls.
5. Available cross-platform

Sorta. See above. The difficulty is that each platform generates file
exceptions differently (and it may even differ from file system to file
system AFAIK). For instance, WinME raises Errno::ENOENT if you try to
open a dir. FSDB maps this to reasonably platform-independent behavior.
6. Database files are readable cross-platform

Yes, and you have your choice (per file, using regex matching on the
db-relative file path) of YAML, Marshal, ASCII, tabular data, CSV, etc.
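The idea of choosing a serialization format per file by matching its database-relative path can be sketched like this. The `FORMATS` table and the helper methods are hypothetical illustrations, not FSDB's API; YAML, JSON and Marshal stand in for whatever formats are configured.

```ruby
require 'yaml'
require 'json'

# Ordered [pattern, dumper, loader] rules; the first matching pattern wins.
FORMATS = [
  [/\.ya?ml\z/, ->(o) { YAML.dump(o) },     ->(s) { YAML.safe_load(s) }],
  [/\.json\z/,  ->(o) { JSON.generate(o) }, ->(s) { JSON.parse(s) }],
  [//,          ->(o) { Marshal.dump(o) },  ->(s) { Marshal.load(s) }]  # fallback
].freeze

# Find the rule for a db-relative path such as "config/site.yml".
def format_for(path)
  FORMATS.find { |pattern, *| pattern.match?(path) }
end

def dump_for(path, obj)
  format_for(path)[1].call(obj)
end

def load_for(path, str)
  format_for(path)[2].call(str)
end
```

Ordering the rules from most to least specific and ending with a catch-all keeps every path mapped to exactly one format.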

Of course, if you want the whole database in one file, this file tree
stuff isn't gonna make you happy.

But that's another nice thing about FSDB: if you have an existing file
tree, even with a mix of file formats, you can treat it as a database.
It's a nice way to have ruby interact somewhat safely with external
programs that have their own ideas about where data files go and what
format they are in.

Conversely, you can treat a FSDB database as a file tree: you can grep
it, rsync it, etc.

More details on:

http://redshift.sourceforge.net/fsdb/doc/api/index.html
 

Joel VanderWerf

if you were interested we could work on a patch and submit to matz?

Great! I'm already using your fcntl-lock stuff from a couple of years
ago as an option (and the default on some platforms). In fact, that's
what was breaking on Darwin. Have you had any luck there?

Anyway, if sqlite has a good general solution, let's try to adapt it.
I'll take a look this weekend.
 

James Edward Gray II

See YAML::DBM, which comes with Ruby.

require 'yaml/dbm'
YAML::DBM.open( "/tmp/blog" ) do |db|
  db['name'] = 'RedHanded'
  db['url'] = 'http://redhanded.hobix.com'
  db['contact'] = ['(e-mail address removed)', '(e-mail address removed)']
end

YAML::DBM.open( "/tmp/blog" ) do |db|
  p db['contact']
end
#=> ['(e-mail address removed)', '(e-mail address removed)']
and


joel has already written it

http://raa.ruby-lang.org/project/fsdb/

On one hand, it would be nice if I didn't always look so dumb. On the
other, I would miss out on all the great tips you guys are always
giving me!

Thanks for the excellent links/code/education.

James Edward Gray II
 

Ara.T.Howard

Great! I'm already using your fcntl-lock stuff from a couple of years ago as
an option (and the default on some platforms). In fact, that's what was
breaking on Darwin. Have you had any luck there?

no - see below. i can get access to a mac to test though... doug?
Anyway, if sqlite has a good general solution, let's try to adapt it. I'll
take a look this weekend.

o.k. cool.

if you download the src check out

./src/os.c

this file might be generated (i forget) so you may have to run ./configure &&
make...

then grep for

sqliteOsReadLock

it's got

#if OS_UNIX
...
#if OS_WIN
...
#if OS_MAC

and it compiles all over the place.

also, the apache portable runtime has a portable file locking interface.
check out

http://apr.apache.org/

specifically

http://apr.apache.org/docs/apr/group__apr__file__lock__types.html

and

http://apr.apache.org/docs/apr/apr__file__io_8h-source.html

(they don't seem to have links in doxygen for the *c files?)


anyhow - we could start by creating an extension. if it worked on win/nix/mac
we could then submit to core or something. it shouldn't be too bad i think...

i'll probably bug my friend doug to run the tests on his mac. doug?

kind regards.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself. --Shunryu Suzuki
===============================================================================
 

Hal Fulton

Austin said:
Wouldn't this be an RCR?

I don't know. I've only perceived RCRs as calling for changes
in the language or the core.

But I see your point.


Hal
 

Martin DeMello

Jeff Moss said:
Try DyBASE, object oriented dynamically typed database storage, for
ruby, python, php and some other language I've never heard of.... it's

Rebol? It's a sort of lisp/forth/tcl cross from the guy behind the
Amiga, with a very rich set of built in datatypes, targeted at net and
distributed scripting. It has an extremely impressive runtime, which
packs a *lot* into a very small (~250kb) executable.

martin
 

gabriele renzi

Hal Fulton wrote:
not gdbm, but on windows it seems that sdbm works fine and comes by default.
 

Axel Friedrich

Hi,

for backups, must I always back up the many MBs of the whole database,
or is there a way to back up only the few KBs added since the last
backup (an incremental backup)?

Axel
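One possible approach, assuming the database is stored as many files (as with FSDB, or the DBStore buckets earlier in the thread) rather than one big file: copy only the files modified since the previous backup. This is a rough sketch; `incremental_backup` is a hypothetical helper, and a real tool would also need to handle deleted files and record when the last backup ran. It does not help with a single-file store such as PStore or DBM.

```ruby
require 'fileutils'
require 'find'

# Copy every regular file under src that changed after +since+ into dest,
# keeping relative paths. Returns the relative paths that were copied.
def incremental_backup(src, dest, since)
  copied = []
  Find.find(src) do |path|
    next unless File.file?(path) && File.mtime(path) > since
    rel = path.delete_prefix(src).sub(%r{\A/}, "")
    target = File.join(dest, rel)
    FileUtils.mkdir_p(File.dirname(target))
    FileUtils.cp(path, target, preserve: true)  # keep mtime and permissions
    copied << rel
  end
  copied
end
```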
 

Zach Dennis

Ara.T.Howard said:
no - see below. i can get access to a mac to test though... doug?

I can hook someone up to a G3, 512MB RAM running OS X 10.3.7 via ssh and
OSXvnc, and allow you to have root access for working on this patch.
Just let me know,

Zach
 

Mathieu Bouchard

I'd like to think so. But I've found that DBM files
created on one Linux system can't (necessarily) be
read on another. For gdbm, I don't know.

I recall that they are prone to endianness issues, and so i would guess
they are also prone to number-size issues, e.g. two versions of the
same program on the same machine may not be compatible if one runs in
32-bit mode and the other in 64-bit mode.

But then, I tried in 1998, so it may have been fixed since. I tested
between a Cyrix box (little-endian) at home and an UltraSPARC box
(big-endian) at work.

_____________________________________________________________________
Mathieu Bouchard -=- Montréal QC Canada -=- http://artengine.ca/matju
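To make the endianness point concrete: the same 32-bit integer serializes to different bytes depending on byte order, so a file written in the machine's native order can be misread on a machine that uses the other order. Writing with an explicit order (here big-endian "network" order via `pack("N")`) is one common way portable formats avoid this. This illustrates the general issue only; it is not a description of any DBM's actual on-disk layout.

```ruby
n = 1

little = [n].pack("V")   # 32-bit unsigned, little-endian (x86, e.g. Cyrix)
big    = [n].pack("N")   # 32-bit unsigned, big-endian (e.g. UltraSPARC)
native = [n].pack("L")   # 32-bit unsigned, whatever this CPU uses

p little.bytes           # [1, 0, 0, 0]
p big.bytes              # [0, 0, 0, 1]
p native.bytes           # same as one of the two above, depending on the CPU

# Reading little-endian bytes as if they were big-endian gives a wrong value:
p little.unpack1("N")    # 16777216

# Writing and reading with the same explicit order is portable:
p big.unpack1("N")       # 1
```

Integer width has the same problem: a format that stores a native `long` gets 4 bytes in one build and 8 in another, which matches Mathieu's 32-bit versus 64-bit concern.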
 

vruz

I sometimes wish for a very simple database with the
following features:

1. Distributed as part of Ruby
2. Need not store entire db in memory
3. No SQL requirement
4. No special efficiency requirement
5. Available cross-platform
6. Database files are readable cross-platform

Jamis Buck's SQLite bindings library is very good, very well
documented, and it is built on top of the SQLite public domain library
(www.sqlite.org)

It obviously doesn't comply with (1) and (3), but even if one doesn't
like SQL, I've found the API Jamis has built to be quite ruby-esque
and intuitive.

The SQLite3/Ruby Manual:
http://docs.jamisbuck.org/read/book/3

Of course if the goal is *not having* any SQL at all, this will never be useful.
 

Sean McEligot

1. Distributed as part of Ruby
2. Need not store entire db in memory
3. No SQL requirement
4. No special efficiency requirement
5. Available cross-platform
6. Database files are readable cross-platform

I just started on this YAML based database this week. The code is
below. I'm not sure where I'm going with this yet. It does not meet
requirement 2. I'm going to do a *dbm version also. That one will meet
requirement 2, but not 6. I'm also working on a C++ database with
sleepycat here http://dbapi.sourceforge.net/ but that isn't ruby and
the ruby bindings aren't checked in. Here's my simple yaml db. Maybe
it will give you some ideas.

#! /usr/bin/env ruby
require 'yaml'

$ydb_default_directory = "~"

class Ydb
  def initialize(dbname, directory=$ydb_default_directory)
    @directory = File.expand_path(directory)
    @dbname = dbname
    @db = {}
    @modified = false
  end

  def load
    filename = db_filename
    if File.exist?(filename)
      # Psych 4 (Ruby 3.1+) rejects custom classes in YAML.load_file, so use
      # unsafe_load_file where it exists. Only load files you trust.
      @db = if YAML.respond_to?(:unsafe_load_file)
              YAML.unsafe_load_file(filename)
            else
              YAML.load_file(filename)
            end
    end
    @modified = false
  end

  def open
    load
  end

  def db_filename
    unless File.directory?(@directory)
      p "mkdir: #{@directory}"
      Dir.mkdir(@directory)
    end
    File.join(@directory, @dbname+".yaml")
  end

  # Ruby has no overloading: a second `def put` silently replaces the first,
  # so the key/value form lives only in []=.
  def []=(key,val)
    @modified = true
    @db[key]=val
  end

  def [](key)
    @db[key]
  end

  # Store obj under the key returned by the block, or by obj.pk.
  def put(obj)
    @modified = true
    if block_given?
      key = yield obj
    else
      key = obj.pk
    end
    @db[key]=obj
  end

  def has_key?(key)
    @db.has_key?(key)
  end

  def each
    @db.each do |key, val|
      yield key, val
    end
  end

  # Write the whole hash back to disk, but only if something changed.
  def save
    if @modified
      File.open( db_filename, 'w' ) do |out|
        YAML.dump( @db, out )
      end
    end
  end

  def close
    save
  end

  def keys
    @db.keys
  end

  def values
    @db.values
  end

  # Build a secondary Ydb mapping each derived key to the primary key.
  def index(name)
    ix = Ydb.new("#{@dbname}-#{name}")
    @db.each do |key, val|
      skey = yield val
      ix[skey] = key
    end
    return ix
  end

  # In-memory index: derived key => record.
  def index!
    ix = {}
    @db.values.each do |val|
      key = yield val
      p "#{key}=#{val}"
      ix[key] = val
    end
    return ix
  end
end

class TestItem
  attr_accessor :id
  attr_accessor :fname
  attr_accessor :lname
  def initialize(id, fname, lname)
    @id = id
    @fname = fname
    @lname = lname
  end
  def pk
    @id
  end
  def to_s
    "#{id}:#{fname} #{lname}"
  end
end

def main
  if ARGV.length == 3
    db = Ydb.new("test")
    db.open
    row = TestItem.new(ARGV[0], ARGV[1], ARGV[2])
    db.put(row)
    db.close
  end

  db = Ydb.new("test")
  db.open
  db.each do |k,v|
    p "#{k}=#{v}"
  end
  by_fname = db.index! do |i|
    i.fname
  end
  by_lname = db.index! do |i|
    i.lname
  end
  p "first names: #{by_fname.keys.join(" ")}"
  p "last names: #{by_lname.keys.join(" ")}"

  p "fdb"
  fdb = db.index("fname") do |i|
    i.fname
  end
  fdb.each do |k,v|
    p "#{k}=#{v}"
  end
  db.close
end

if __FILE__ == $0
  main
end
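Ydb's `index!` keeps only one value per derived key (`ix[key] = val` overwrites), so two records sharing a last name collapse into a single entry. If every match matters, Enumerable#group_by builds an index that keeps them all. A minimal sketch, with a hypothetical `Person` Struct standing in for TestItem:

```ruby
# Stand-in for TestItem with the same three fields.
Person = Struct.new(:id, :fname, :lname)

rows = [
  Person.new("1", "Ann", "Lee"),
  Person.new("2", "Bob", "Lee"),
  Person.new("3", "Ann", "Kim")
]

# Map each last name to every matching row, not just the last one seen.
by_lname = rows.group_by(&:lname)

p by_lname.keys                # ["Lee", "Kim"]
p by_lname["Lee"].map(&:id)    # ["1", "2"]
```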
 

Hal Fulton

vruz said:
Jamis Buck's SQLite bindings library is very good, very well
documented, and it is built on top of the SQLite public domain library
(www.sqlite.org)

It obviously doesn't comply with (1) and (3), but even if one doesn't
like SQL, I've found the API Jamis has built to be quite ruby-esque
and intuitive.

The SQLite3/Ruby Manual:
http://docs.jamisbuck.org/read/book/3

Of course if the goal is *not having* any SQL at all, this will never be useful.

I realized later that this was unclear.

I meant only that I don't *require* SQL, not that I
reject any package that supports it.


Hal
 
