Database applications and OOness

Tim Bates

People,
I'm going to write an application that needs to deal with reasonably
large amounts of data over a reasonably long period of time. So the
question that immediately comes up is, how do I store said data when the
program is not running? As I see it, the options can be divided into two
categories in two different ways: How the data is stored and how it is
interfaced.

The data can be stored in (essentially) one of two ways: in memory, or
on disk. In my case, in-memory systems are probably no good - with the
volumes of data I want to hold, and the periods of time I want to hold
it over (some of it is accounting data, which is likely to be kept for
years), I don't want it all in memory. Therefore, I will choose a
disk-based system such as an RDBMS. My RDBMS of choice is PostgreSQL, so
I'll run with that for now.

The other way my options can be divided is (and I'm simplifying things a
bit again) between those libraries that present the data as objects and
those that present it as database rows. Let me clarify:

#<Person:0x40132728 @name="name", @number="phone number", @address="address">
vs
["name","address","phone number"]
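Mechanically, the gap between the two is small; promoting a row to an object takes only a few lines. A minimal sketch (this Person class and column order are illustrative, not from any particular library):

```ruby
# A plain value object: behaviour attached to the data.
class Person
  attr_reader :name, :address, :number

  def initialize(name, address, number)
    @name    = name
    @address = address
    @number  = number
  end

  # Example of behaviour you get "for free" once rows become objects.
  def mailing_label
    "#{name}\n#{address}"
  end
end

# Promote a raw database row (an array of column values) to a Person.
row = ["name", "address", "phone number"]
person = Person.new(*row)
```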

I like the first one better, because you get encapsulation and behaviour
with your data, but most of the database interface libraries that I've
seen of that type tend to take most of the querying out of the hands of
the user. They allow you to retrieve objects that have properties that
match a given example, but very seldom any more complex than that - the
equivalent of limiting yourself to "SELECT * FROM table WHERE
table.column = ?" type queries only. Part of the reason I like RDBMSs is
that I can do complex queries which don't necessarily return a simple
list of objects, like:

"SELECT s.name AS name, COUNT(DISTINCT t.customer_id) AS customers
FROM salesperson s JOIN transactions t ON s.id = t.salesperson
GROUP BY s.name
ORDER BY customers DESC;"

This really returns a salesperson-to-integer mapping of how many
customers a salesperson has dealt with. You can't represent this as a
plain list of objects, unless you create a special type of object just
for it (which is silly), and most OO-type database libraries won't let
you make that sort of query at all without going behind their backs,
which is really ugly.
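For what it's worth, that mapping falls out naturally as a Hash once the rows come back. The sketch below fakes the result set as an array of [name, count] pairs, since the interesting part is the shape of the result rather than the driver call (with DBI the rows would come from something like dbh.select_all):

```ruby
# Pretend these rows came back from the aggregate query above;
# each row is [salesperson_name, distinct_customer_count].
rows = [["Alice", 42], ["Bob", 17]]

# Fold the rows into the mapping the query logically represents.
customers_by_salesperson = rows.inject({}) do |map, (name, count)|
  map[name] = count
  map
end
```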

The alternative is to use something like vanilla DBI, in which you write
your own SQL and everything comes back as arrays of values. This allows
you to do whatever queries you want but you lose the niceness of the OO
approach. I don't want to do vanilla DBI, it's too messy.

What I want is some sort of happy medium, and I'm starting to think I'll
have to write it as none of the database interface libraries I've
evaluated do it the way I want - which is the motivation for writing a
lot of software, is it not? I believe that's how Ruby itself got its
start, so I'm in noble company...

But also on a deadline. Any suggestions?

Tim Bates

Robert Klemme

[snip persistent storage selection]
The other way my options can be divided is (and I'm simplifying things a
bit again) between those libraries that present the data as objects and
those that present it as database rows. Let me clarify:

#<Person:0x40132728 @name="name", @number="phone number", @address="address">
vs
["name","address","phone number"]

I like the first one better, because you get encapsulation and behaviour
with your data, but most of the database interface libraries that I've
seen of that type tend to take most of the querying out of the hands of
the user. They allow you to retrieve objects that have properties that
match a given example, but very seldom any more complex than that - the
equivalent of limiting yourself to "SELECT * FROM table WHERE
table.column = ?" type queries only. Part of the reason I like RDBMSs is
that I can do complex queries which don't necessarily return a simple
list of objects, like:

"SELECT s.name AS name, COUNT(DISTINCT t.customer_id) AS customers
FROM salesperson s JOIN transactions t ON s.id = t.salesperson
GROUP BY s.name
ORDER BY customers DESC;"

This really returns a salesperson-to-integer mapping of how many
customers a salesperson has dealt with. You can't represent this as a
plain list of objects, unless you create a special type of object just
for it (which is silly), and most OO-type database libraries won't let
you make that sort of query at all without going behind their backs,
which is really ugly.

There are however object relational mapping tools that come with their own
query language. These query languages are most likely not as powerful and
flexible as SQL, but they tend to be more flexible than QBE (query by
example), which you mention.

The alternative is to use something like vanilla DBI, in which you write
your own SQL and everything comes back as arrays of values. This allows
you to do whatever queries you want but you lose the niceness of the OO
approach. I don't want to do vanilla DBI, it's too messy.

What I want is some sort of happy medium, and I'm starting to think I'll
have to write it as none of the database interface libraries I've
evaluated do it the way I want - which is the motivation for writing a
lot of software, is it not? I believe that's how Ruby itself got its
start, so I'm in noble company...

But also on a deadline. Any suggestions?

You didn't say whether your project has Ruby as a requirement written on
it or whether tools you use must be free. In case not, you could consider
object relational binding tools like

- Lafcadio (Ruby, free)
http://rubyforge.org/projects/lafcadio/

- Apache OJB (Java, free)
http://db.apache.org/ojb/

- JDO implementations (Java, not all free, comes with its own query
language)
http://java.sun.com/products/jdo/
http://jdocentral.com/

etc.


Or you consider an OO database and write only the query language parser
yourself. You could then use

- db4o (Java, commercial with reasonable pricing) with SODA query
language
http://www.db4o.com/
http://www.odbms.org/soda/

etc.

You'll probably notice that I'm coming from a Java background. :) I'm
sure there are similar tools for other languages.

Kind regards

robert

Tim Bates

Have you ever looked at the Criteria library?

Yes, in fact I wrote a DBI module for it. It's great; if I were going to
write an OO DBMS interface I'd use it, or at least some of the code and
ideas from it. But it doesn't solve the whole problem - only the
querying bit, and even then it can't quite manage queries as complex as
the example I gave in the OP. It still only returns (at best)
arrays of data, and does nothing about turning them into first-class
objects.

I don't know if there is a solution to my problem, whether what I want
can be done cleanly. To tell the truth I'm not 100% certain of what I
want, but I visualise something like a cross between Vapor and Criteria.
If no such thing exists, and I can't find an alternate solution I guess
I'm back to writing it myself, or settling for some other method. It
just seems like there should be a good way to marry the power of SQL
(and the optimised searching/indexing and concurrency routines a good
RDBMS provides) with the pristine Object-Orientedness of Ruby.

Tim Bates

Martin DeMello

Tim Bates said:
I like the first one better, because you get encapsulation and behaviour
with your data, but most of the database interface libraries that I've
seen of that type tend to take most of the querying out of the hands of
the user. They allow you to retrieve objects that have properties that
match a given example, but very seldom any more complex than that - the
equivalent of limiting yourself to "SELECT * FROM table WHERE
table.column = ?" type queries only. Part of the reason I like RDBMSs is
that I can do complex queries which don't necessarily return a simple
list of objects, like:

"SELECT s.name AS name, COUNT(DISTINCT t.customer_id) AS customers
FROM salesperson s JOIN transactions t ON s.id = t.salesperson
GROUP BY s.name
ORDER BY customers DESC;"

This really returns a salesperson-to-integer mapping of how many
customers a salesperson has dealt with. You can't represent this as a
plain list of objects, unless you create a special type of object just
for it (which is silly), and most OO-type database libraries won't let
you make that sort of query at all without going behind their backs,
which is really ugly.

What would you like it to return? Should it construct a class on the fly,
with fields 'name' and 'customers'? And should it use its own query
language to do so, or parse the sql query?

martin

Hal Fulton

Tim said:
I don't know if there is a solution to my problem, whether what I want
can be done cleanly. To tell the truth I'm not 100% certain of what I
want, but I visualise something like a cross between Vapor and Criteria.
If no such thing exists, and I can't find an alternate solution I guess
I'm back to writing it myself, or settling for some other method. It
just seems like there should be a good way to marry the power of SQL
(and the optimised searching/indexing and concurrency routines a good
RDBMS provides) with the pristine Object-Orientedness of Ruby.

Tim,

I think I have an inkling of what you want.

My take on it is this: If you're going to do some coding to
make this happen, it's easier to take an existing DB (with
fancy query capabilities and presumably efficient, reliable
data storage) and graft an OO layer onto it. (Easier, that is,
than vice versa.)

For example, you might wrap a query in a method that returned
a Struct whose member names were the database field names.
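Hal's Struct idea can be sketched in a few lines. The column names and rows are stubbed in below in place of a live statement handle (with Ruby DBI they would come from something like sth.column_names and sth.fetch_all, if memory serves):

```ruby
# Build a Struct class whose members are the result set's column names,
# then wrap each row in it. Column names and rows are stubbed here in
# place of a live statement handle.
def rows_to_structs(column_names, rows)
  klass = Struct.new(*column_names.map { |c| c.to_sym })
  rows.map { |row| klass.new(*row) }
end

records = rows_to_structs(%w(name customers), [["Alice", 42]])
```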

I'm sure there are complications there, but they shouldn't be
of the same order of magnitude as a full database with a full
SQL parser.


Hal

walter

I write a lot of code that provides some sort of web based view or
application to data stored in a database, and I find myself following
a repeated model where I define a Struct that represents a row of
data. This is very convenient for my manipulation of the data, but
there are complications that arise.

The first is that if I change the database schema, I also have to then
go change the struct code to match, and the second is that I write a
lot of the exact same sort of SQL to query data with the same Ruby
around it to put that data into my struct, and when I change the
database schema, I then need to go change all of the SQL that is
affected.

What I need is a layer that, given a db table, will generate a Struct
that encapsulates that data and provides a few convenience methods for
querying data from the table into one or more objects, and for pushing
the data from the objects back to the db.
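A sketch of the layer described above, with the table's column list stubbed as a constant (against a real database it would be read from the driver's column metadata), so that a schema change only has to be made in one place:

```ruby
# Derive a row Struct from the table's column list so a schema change
# only has to happen in one place. The schema lookup is a stand-in for
# real database metadata.
SCHEMA = { "accounts" => %w(accnum balance owner) }

def struct_for(table)
  Struct.new(*SCHEMA.fetch(table).map { |c| c.to_sym })
end

Account = struct_for("accounts")
acct = Account.new(1743, 5000, "Tim")
```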

So far as I have been able to determine, there's nothing on RAA that
does this. Am I overlooking something, or is this something that I
need to write myself?

I have a system that was developed in house. It is not exactly what
you are asking for. I call it SPOM (Simple Persistent Object
Manager). The keyword is SIMPLE.

It basically has 2 layers. SpomDB is a wrapper over a DBI
connection. It can take an arbitrary sql query and will return an
array of newly created Objects. The objects will have members that
are the same names as the query names (unless they are invalid ruby
names) but they can also be accessed via [], like an array would.
This makes it convenient to do an arbitrary query. But it does not
currently support modifying and resaving this type of object (it would
not be hard to add, but I haven't needed that).
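The SpomDB behaviour described above (members named after the query's columns, plus array-style [] access) can be sketched from scratch like so; this is an illustration of the idea, not SPOM's actual code:

```ruby
# A row object that exposes each column both as a method and via [].
class RowObject
  def initialize(column_names, values)
    @values = values
    @index  = {}
    column_names.each_with_index do |col, i|
      @index[col] = i
      # Define a reader only when the column is a valid Ruby method name.
      if col =~ /\A[a-z_][a-zA-Z0-9_]*\z/
        define_singleton_method(col) { @values[i] }
      end
    end
  end

  def [](key)
    key.is_a?(Integer) ? @values[key] : @values[@index[key]]
  end
end

row = RowObject.new(%w(name customers), ["Alice", 42])
```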

Above this is SPOM, a manager of sorts. It takes a SpomDB and you
can add xml file mappings to it. The xml file mappings allow you to
map between the db and any given object. You can add data type
mappings (ie it is stored as a char in the db but in ruby it is a
boolean). You can also have sub-objects mapped. So you can have a
person object, that has an address object that has a zip code object.
You can then access the ruby object via person.address.zip.code (ie
it creates the sub objects and fills them for you). Please keep in
mind this is a simple one object to one table mapping not
relationships between tables (it is simple).

It is used in house to read in and transfer data between SQL Server,
Informix, Dataflex, and DBF databases. Its primary intent was to
extract data from legacy systems. Using the mapping files, you can
also do basic CRUD operations, but there currently is NO transaction
support, no connection pooling, no object query interface (ie you use
sql), no multi-table mappings or join support. I do plan to add
these things, but it will take time; right now SPOM supports our
basic in-house needs.

If there is an interest in this I could release it next Tuesday to
RubyForge. (I would simply need to present this to our executive
group meeting on Tuesday; more of a formality, but that's our
protocol for this.) Also note that there is very little documentation;
I'll try to add some if there is interest.
Thanks,

Kirk Haines



Walt
*****************************************************
Walter Szewelanczyk
IS Director
M.W. Sewall & CO. email : (e-mail address removed)
259 Front St. Phone : (207) 442-7994 x 128
Bath, ME 04530 Fax : (207) 443-6284
*****************************************************

Francis Hwang

As somebody who's done some of this work (I wrote Lafcadio), I
strongly recommend that you consider extending somebody else's work
before doing it yourself. When you enter this problem domain there are
a lot of little wrinkles you run into, and the best solution isn't
always obvious from the start. I know that personally it
took me a while to figure out the best way to transparently handle
relations between different rows, for example.

Now, I've only used Lafcadio, but I have the feeling that right now
the Ruby OR-mapping tools out there are either 1) strong on mapping
rows to objects, or 2) strong on ad-hoc querying, but not both. I'd
put Lafcadio and Vapor in category 1, and Criteria in category 2.
Either requirement is pretty big; both are twice as big. In the
long-term, my feeling is that any library that sticks around in this
space will have to satisfy both needs.

In the short-term, I can tell you what you'd have to do for Lafcadio
to get it to do what you want. First, Lafcadio was originally written
for MySQL, and although it should support Postgres with minimal changes, I
have yet to implement or test such a change. (At RubyConf Ryan Davis
told me he was very interested in helping add Postgres functionality,
and I moved the library to DBI to enable that, but I suspect that,
like everybody else, Ryan's pretty busy.) Second, Lafcadio's support
for querying works, but I'll be the first to admit that it's verbose
and ugly as hell. Query inference is very high on my list, but then,
I'm busy, too. So Vapor or Criteria might be a better choice for you,
depending.

At the least you might want to poke around in the various libraries
and steal the best ideas for yourself.

Francis

gabriele renzi

On Wed, 7 Jan 2004 19:06:32 +0900, Tim Bates <[email protected]> wrote:

on a side note: does anyone agree with me that we should have a common,
shared, abstract Ruby/DBMS API like Python/Java/whatever, instead of
a wrapper package (DBI)?


(I hinted at this a long time ago and got no answer at all, falling into
Warnock's Dilemma, so I'm posting again.)

Ara.T.Howard

Date: Wed, 07 Jan 2004 20:31:44 GMT
From: gabriele renzi <[email protected]>
Newsgroups: comp.lang.ruby
Subject: Re: Database applications and OOness

On Wed, 7 Jan 2004 19:06:32 +0900, Tim Bates <[email protected]> wrote:

on a side note: does anyone agree with me that we should have a common,
shared, abstract Ruby/DBMS API like Python/Java/whatever, instead of
a wrapper package (DBI)?
yaml?


(I hinted at this a long time ago and got no answer at all, falling into
Warnock's Dilemma, so I'm posting again.)

-a
--

ATTN: please update your address books with address below!

===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| ADDRESS :: E/GC2 325 Broadway, Boulder, CO 80305-3328
| STP :: http://www.ngdc.noaa.gov/stp/
| NGDC :: http://www.ngdc.noaa.gov/
| NESDIS :: http://www.nesdis.noaa.gov/
| NOAA :: http://www.noaa.gov/
| US DOC :: http://www.commerce.gov/
|
| The difference between art and science is that science is what we
| understand well enough to explain to a computer.
| Art is everything else.
| -- Donald Knuth, "Discover"
|
| /bin/sh -c 'for l in ruby perl;do $l -e "print \"\x3a\x2d\x29\x0a\"";done'
===============================================================================

Ryan Pavlik

Yes, in fact I wrote a DBI module for it. It's great; if I were
going to write an OO DBMS interface I'd use it, or at least some of
the code and ideas from it. But it doesn't solve the whole problem -
only the querying bit, and even then it can't quite manage queries
as complex as the example I gave in the OP. It still only
returns (at best) arrays of data, and does nothing about turning
them into first-class objects.

I just noticed a thread discussing Criteria in passing. ;-) Looking
at the original post, I don't think the query is that hard to
generate:

require 'criteria/sql'
include Criteria

s = SQLTable.new("salesperson")
t = SQLTable.new("transactions")

q = s.*.left_outer_join(s.id == t.salesperson, s)
q.order_by = :customers
q.order = :desc

q.select(s.name, "COUNT(DISTINCT #{t.customer_id})")

# => SELECT salesperson.name, COUNT(DISTINCT
# transactions.customer_id) FROM salesperson LEFT OUTER JOIN
# salesperson ON (salesperson.id = transactions.salesperson) ORDER
# BY customers desc

Granted, it doesn't generate column headings, and you currently have
to use a literal string hack to get the SQL function. This should
hopefully be fixed before long.
I don't know if there is a solution to my problem, whether what I
want can be done cleanly. To tell the truth I'm not 100% certain of
what I want, but I visualise something like a cross between Vapor
and Criteria. If no such thing exists, and I can't find an
alternate solution I guess I'm back to writing it myself, or
settling for some other method. It just seems like there should be a
good way to marry the power of SQL (and the optimised
searching/indexing and concurrency routines a good RDBMS provides)
with the pristine Object-Orientedness of Ruby.

I'm not sure what Vapor is, but---and I know you've mentioned it so
you're aware of it---you may take a look at Mephle, since I think it
does what you want:

class Foo
  # When you call these, they automatically update the proper
  # row in the table.
  attr_accessor_indexed String, :bar, :baz
  attr_accessor_indexed Integer, :quux

  :
end

Things are a little less automatic than I'd like at the moment.
Writing this has given me an idea of how to fix it, though, with an
'auto_index' function, so you'd have something like:

class Foo
  IDX = DBITable.new(...)

  def initialize(...)
    :

    auto_index(IDX)
  end
end

This could look at the column titles (or a list of what
attr_accessor_indexed declares) and insert a row with the proper
data. (Right now you have to do a manual insertion, which isn't too
tough either.)

There are a few reasons I still serialize and index instead of using
tables exclusively:

1. Some objects are large, and you don't want to index everything
2. Looking up attributes every time is slow
3. Not all ruby things can be represented as SQL things
(4. SQL isn't guaranteed to be the storage mechanism, either)

I'm open to suggestion though. It's conceivable that there's a
"CachedTableObject" class or something where you give up the ability
to write into a live system, and you avoid point 2, for objects that
work. Of course, you still need to be able to load that object later,
and using a unique OID for every object in the system is nice, so
you'd still have something in the object table. I'm not sure what you
gain by this one other than a few extra bytes, but it could be done.

Making Mephle indexing less manual is probably a lot more useful---you
get what you want without any work (and there's not really much as it
is).

Tim Bates

As somebody who's done some of this work (I wrote Lafcadio), I
strongly recommend that you consider extending somebody else's work
before doing it yourself. When you enter this problem domain there are
a lot of little wrinkles you run into, and the best solution isn't
always obvious from the start.

Absolutely, I wouldn't want to write yet another database interface if I
could find one that suited me, with or without some modification. The
interface is not the point of this project.
Now, I've only used Lafcadio, but I have the feeling that right now
the Ruby OR-mapping tools out there are either 1) strong on mapping
rows to objects, or 2) strong on ad-hoc querying, but not both.

This is my experience also, which is why I started this thread.
I'd put Lafcadio and Vapor in category 1, and Criteria in category 2.

Criteria doesn't do any of category 1 at all, so yes.
Either requirement is pretty big; both are twice as big. In the
long-term, my feeling is that any library that sticks around in this
space will have to satisfy both needs.

Uhuh. But none of them have quite cracked it yet.
In the short-term, I can tell you what you'd have to do for Lafcadio
to get it to do what you want. First, Lafcadio was originally written
for MySQL, and although it should support Postgres with minimal changes, I
have yet to implement or test such a change.

Vapor, to give an example, is very targeted at Postgres and uses a lot of its
more advanced features (in particular it integrates Postgres's
transaction model quite closely into its behaviour). Criteria, to give
another example, is running into problems because of subtly different
query syntaxes between different databases (MySQL's "RLIKE" vs
Postgres's "~", for example). Does Lafcadio suffer from either of these
constraints? In particular I'd like to have a fair bit of control over
when and where transactions are used - it's particularly critical in
cases like

"BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE accnum = 1743;
UPDATE accounts SET balance = balance + 100 WHERE accnum = 329;
COMMIT;"

If I were to do a similar thing in Ruby code, I'd need to be sure that
the account balance that I was using in the calculation was up-to-date
(otherwise the effects of someone else's transaction might get wiped)
and also that if the second UPDATE in the example above failed for some
reason then the first one would get rolled back. Can Lafcadio do, or be
made to do this?
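The guarantee being asked for maps onto a block-scoped transaction: BEGIN on entry, COMMIT on normal exit, ROLLBACK if the block raises. Ruby DBI has a transaction method on database handles that behaves this way, if I remember correctly; the sketch below reimplements the shape against a stub handle so the rollback path is explicit:

```ruby
# Run a block inside a transaction: COMMIT on success, ROLLBACK on error.
# (With Ruby DBI the analogous built-in is DatabaseHandle#transaction.)
def with_transaction(dbh)
  dbh.execute("BEGIN")
  result = yield dbh
  dbh.execute("COMMIT")
  result
rescue Exception
  dbh.execute("ROLLBACK")
  raise
end

# A stub handle that just records the statements given to it; a real
# handle would send them to the database.
class RecordingHandle
  attr_reader :log
  def initialize; @log = []; end
  def execute(sql); @log << sql; end
end

dbh = RecordingHandle.new
with_transaction(dbh) do |h|
  h.execute("UPDATE accounts SET balance = balance - 100 WHERE accnum = 1743")
  h.execute("UPDATE accounts SET balance = balance + 100 WHERE accnum = 329")
end
```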
(At RubyConf Ryan Davis told me he was very interested in helping add
Postgres functionality, and I moved the library to DBI to enable that,
but I suspect that, like everybody else, Ryan's pretty busy.)

I am also interested in getting it to work with Postgres, if it solves
my problem. I'm pretty busy too, but I have time set aside for this
project that I can't really start on until I deal with this issue.
Second, Lafcadio's support for querying works, but I'll be the first
to admit that it's verbose and ugly as hell. Query inference is very
high on my list, but then, I'm busy, too. So Vapor or Criteria might
be a better choice for you, depending.

I wouldn't consider Criteria to be in quite the same basket as Lafcadio
and Vapor. It doesn't have any OO abstraction at all on the data side
(the project is all about OO abstraction of queries). I think Lafcadio
(or Vapor) could take a lot of ideas (and/or code) from Criteria for their
own query interfaces. I had independently thought of some of the
concepts behind Criteria when I discovered that somebody had already
written it, and in a much cleverer way than I would have thought of, but
it's not primarily a database interface library, it's a query
construction library. A project that combined the best aspects of the OO
data interfaces and the OO query interfaces that we already have would
be exactly what I was looking for.

Perhaps I'll start looking into how to a) get Lafcadio to work with
Postgres and b) get Lafcadio or Vapor to work with Criteria.

Tim Bates

Tim Bates

# => SELECT salesperson.name, COUNT(DISTINCT
# transactions.customer_id) FROM salesperson LEFT OUTER JOIN
# salesperson ON (salesperson.id = transactions.salesperson) ORDER
# BY customers desc

Granted, it doesn't generate column headings, and you currently have
to use a literal string hack to get the SQL function. This should
hopefully be fixed before long.

Well this doesn't actually work, because the ORDER BY clause references
a column name ("customers") that doesn't exist, and also in
Criteria-1.1a at least when I tried this it failed with
NoMethodError: undefined method `join' for :customers:Symbol
from /home/tim/ruby/ln/criteria/sql.rb:243:in `_mkorder'
from /home/tim/ruby/ln/criteria/sql.rb:287:in `select'

I realise that these are minor things. If I can help fix them, I will -
tell me what needs to be done. I am keen to get this working.
I'm not sure what Vapor is, but---and I know you've mentioned it so
you're aware of it---you may take a look at Mephle, since I think it
does what you want:

I did look at Mephle, but I seem to remember you telling me not that
long ago that it *wasn't* what I wanted. :p
There are a few reasons I still serialize and index instead of using
tables exclusively:

1. Some objects are large, and you don't want to index everything
2. Looking up attributes every time is slow
3. Not all ruby things can be represented as SQL things
(4. SQL isn't guaranteed to be the storage mechanism, either)

I'm open to suggestion though. It's conceivable that there's a
"CachedTableObject" class or something where you give up the ability
to write into a live system, and you avoid point 2, for objects that
work. Of course, you still need to be able to load that object later,
and using a unique OID for every object in the system is nice, so
you'd still have something in the object table. I'm not sure what you
gain by this one other than a few extra bytes, but it could be done.

I'm not sure I quite follow you here. The approach that Vapor (and I
think Lafcadio) takes is to look up an attribute only when the object is
loaded or explicitly refreshed, and just keep the value in memory, so if
the same accessor method is called multiple times in succession only one
database query is made. I think this is what you mean by "giving up the
ability to write into a live system".
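The load-once-and-cache behaviour described here, as opposed to hitting the database on every accessor call, looks roughly like this (the fetch_row lambda stands in for the real database read; no actual library's API is being quoted):

```ruby
# An object that reads its row once, caches the values, and only goes
# back to the database when explicitly refreshed.
class CachedRecord
  def initialize(fetch_row)
    @fetch_row = fetch_row   # stand-in for the real DB read
    refresh
  end

  def refresh
    @attrs = @fetch_row.call
    self
  end

  def balance
    @attrs[:balance]         # served from memory, no query
  end
end

reads = 0
fetch = lambda { reads += 1; { :balance => 100 } }
acct = CachedRecord.new(fetch)
acct.balance
acct.balance
# reads is still 1 here; an explicit refresh would make it 2.
```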

Also, I'm not entirely clear on what you mean by indexing in this
context. I am very likely to set up the database tables and SQL indexes
manually, so those columns that will contain a lot of data will
naturally not be indexed. I'm not sure if you're referring to SQL
indexing though, I haven't looked that closely at Mephle.

Points 3 and 4 are rather moot. Point 3 because in my case I will be careful to
only put into the database that which can be represented in SQL. I don't
really want a library that hides all the SQL from me, I know what I'm
doing in that regard. Point 4 because my project will only ever store
stuff in SQL, although I realise that this is not a reason for you to
abandon that approach as I'm not the only one who may use your library.

Tim Bates

Ryan Pavlik

Well this doesn't actually work, because the ORDER BY clause references
a column name ("customers") that doesn't exist, and also in
Criteria-1.1a at least when I tried this it failed with
NoMethodError: undefined method `join' for :customers:Symbol
from /home/tim/ruby/ln/criteria/sql.rb:243:in `_mkorder'
from /home/tim/ruby/ln/criteria/sql.rb:287:in `select'

Hmm. I really need to release 1.2. You can always grab the latest
from svn:

http://svn.mephle.org/svn/mephle/criteria

IIRC, 1.1a had a number of things in need of fixing.
I realise that these are minor things. If I can help fix them, I will -
tell me what needs to be done. I am keen to get this working.

Oh, yeah, you need "COUNT(...) AS customers". Hopefully this will be
fixed by 1.2 as well.
I did look at Mephle, but I seem to remember you telling me not that
long ago that it *wasn't* what I wanted. :p

Oh? ;) Hrm, I probably misunderstood what you were after.
I'm not sure I quite follow you here. The approach that Vapor (and I
think Lafcadio) takes is to look up an attribute only when the
object is loaded or explicitly refreshed, and just keep the value in
memory, so if the same accessor method is called multiple times in
succession only one database query is made. I think this is what you
mean by "giving up the ability to write into a live system".

Right. You couldn't have someone else talking to the database at the
same time, at least to write. If you can't do that, I don't see what
advantage it provides to store objects this way (instead of just
indexing them).
Also, I'm not entirely clear on what you mean by indexing in this
context. I am very likely to set up the database tables and SQL
indexes manually, so those columns that will contain a lot of data
will naturally not be indexed. I'm not sure if you're referring to
SQL indexing though, I haven't looked that closely at Mephle.

Yes, this is pretty much what I do with Mephle. Objects are stored by
serializing (via Marshal) and writing that to an Object table.
Anything I need for quick lookup, I stick into a table using Criteria.
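That serialize-plus-index scheme can be shown without a database at all. In this sketch the "object table" is a Hash from OID to a Marshal blob and the "index" is a second Hash over the one queryable attribute; in Mephle, as described, the real versions would live in SQL tables:

```ruby
require 'ostruct'

# Object table: OID => marshalled blob. Index: name => OID, so lookups
# don't have to unmarshal everything. Both Hashes stand in for tables.
object_table = {}
name_index   = {}

def store(object_table, name_index, oid, obj)
  object_table[oid] = Marshal.dump(obj)  # opaque serialized form
  name_index[obj.name] = oid             # only this field is queryable
end

def find_by_name(object_table, name_index, name)
  oid = name_index[name] or return nil
  Marshal.load(object_table[oid])        # rehydrate the full object
end

store(object_table, name_index, 1,
      OpenStruct.new(:name => "Alice", :notes => "big blob"))
alice = find_by_name(object_table, name_index, "Alice")
```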
Points 3 and 4 are rather moot. Point 3 because in my case I will be
careful to only put into the database that which can be represented
in SQL. I don't really want a library that hides all the SQL from
me, I know what I'm doing in that regard. Point 4 because my project
will only ever store stuff in SQL, although I realise that this is
not a reason for you to abandon that approach as I'm not the only
one who may use your library.

Well, it hides the object storage bits, since they're the same for
everything. Indexing and SQL stuff is abstracted via Criteria of
course, but that's still "right there". Viability depends on how you
want things to work, but you might check it out.

Tim Bates

What would you like it to return? Should it construct a class on the fly,
with fields 'name' and 'customers'? And should it use its own query
language to do so, or parse the sql query?

I don't know. Probably an array of arrays or an array of hashes - which,
I know, is exactly what DBI would return for such a query. The point I'm
trying to make is I'd like the system to be able to handle this sort of
query _as_well_as_ the "SELECT * FROM table WHERE property = ?" type
return-a-list-of-objects query. I don't know of any system that can do
that, or even if it could be done neatly. Possibly such a system would
have to accept two sorts of query and handle both separately, but that
introduces its own ugliness.
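One way to picture that compromise is a single interface with two entry points: a find that hydrates first-class objects for the simple case, and a select that hands back raw rows for everything else. A toy sketch over an in-memory table (the API names here are invented):

```ruby
Person = Struct.new(:name, :address)

# A toy store exposing both styles: object-returning find_by_name and
# row-returning select for ad-hoc result shapes.
class PersonStore
  def initialize(rows)
    @rows = rows  # each row is [name, address]
  end

  # The "SELECT * WHERE column = ?" style: returns first-class objects.
  def find_by_name(name)
    @rows.select { |r| r[0] == name }.map { |r| Person.new(*r) }
  end

  # The ad-hoc style: the caller gets plain rows to shape as needed.
  def select
    @rows.select { |r| yield r }
  end
end

store  = PersonStore.new([["Alice", "here"], ["Bob", "there"]])
people = store.find_by_name("Alice")           # array of Person objects
rows   = store.select { |r| r[1] == "there" }  # raw rows
```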

Tim Bates

Tim Bates

I like the look of ROE
(http://www.cincomsmalltalk.com/userblogs/avi/blogView?showComments=true&entry=3246121322)
on Squeak. Don't know if it's what you're looking for of
course, since even you don't know what you're looking for :)

This article has a point that I've seen before, namely that object
models don't necessarily have a 1-to-1 mapping to relational models:
"[Object-relational mapping] adds a huge amount of complexity to a
project, and with dubious benefits: when you're not tracking down
bugs in the mapping framework or obsessing about performance, you're
chopping toes off your object model so you can shoehorn it into a
relational schema."

The OO relational-query model he describes looks very much like
Criteria, and I think if I were to go along that route I would
definitely use Criteria and DBI. But some inner Ruby demon is constantly
telling me that I'd really like to represent my data as first-class
objects _when_possible_, which Criteria doesn't even try to do.

Essentially, now, I think I want Lafcadio on Postgres with Criteria (or
Criteria-like) queries and transaction support. I like the way (and
correct me if I'm wrong about this) Lafcadio uses the RDBMS the way it
was meant to be used - unlike Vapor (which adds extra tables and columns
for metadata and generally wrests control of the RDBMS out of the hands
of the user) or Mephle (which basically only uses the RDBMS as somewhere
to write data to). I don't like Lafcadio's query model (which I
haven't tried to use, but the author himself assures me is "ugly as
hell") and I do really like Criteria's query model, so the best of both
worlds there would just about be enough to satisfy me.

I'm pretty fixed on using an SQL back-end, because it outsources much of
the calculation and concurrency work to another program which is built
for it. But, just so I can say I have considered all the options, can
someone tell me if there's a non-SQL OO solution out there which, most
importantly, doesn't store all its data in memory all the time? Some of
my data needs to be kept for a long time but will very seldom be
referenced, and I don't want to have to manually write it to disk. I've
seen things like Madeleine and other Prevayler-type systems, and they seem
to want to keep everything in memory and only write snapshots and
logs to disk. A neat query model and some in-built concurrency
support (even if it's just read/write locking) would be added bonuses.

Tim Bates
 

Robert Klemme

gabriele renzi said:
on Wed, 7 Jan 2004 19:06:32 +0900, Tim Bates <[email protected]>
wrote:

on a sidenote: someone agrees with me that we should have a common
shared and abstract ruby/DBMS api like python/java/whatever instead of
a wrapper package (DBI)?

What exactly is the difference between the "shared and abstract ruby/DBMS
api" and DBI? IMHO DBI is exactly an abstract DBMS API, albeit only for
relational databases. Do you want to unify the different SQL dialects?
Or do you want only certain things unified, e.g. date and time formats
(like JDBC tries with its escape sequences)?

Kind regards

robert
 

Francis Hwang

Tim Bates said:
Vapor, to give an example, is very targeted at Postgres and uses a lot of its
more advanced features (in particular it integrates Postgres's
transaction model quite closely into its behaviour). Criteria, to give
another example, is running into problems because of subtly different
query syntaxes between different databases (MySQL's "RLIKE" vs
Postgres's "~", for example). Does Lafcadio suffer from either of these
constraints? In particular I'd like to have a fair bit of control over
when and where transactions are used - it's particularly critical in
cases like

"BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE accnum = 1743;
UPDATE accounts SET balance = balance + 100 WHERE accnum = 329;
COMMIT;"

If I were to do a similar thing in Ruby code, I'd need to be sure that
the account balance that I was using in the calculation was up-to-date
(otherwise the effects of someone else's transaction might get wiped)
and also that if the second UPDATE in the example above failed for some
reason then the first one would get rolled back. Can Lafcadio do, or be
made to do, this?

1. Regarding subtle differences in query syntax: If I had a
comprehensive list of things to look out for I could write tests
against them. I imagine most of them would be easy to fix.
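For what it's worth, one way such syntax differences could be papered over is a per-adapter lookup table for the operators that vary. A minimal sketch, with invented names (this is not Lafcadio's or Criteria's API):

```ruby
# One operator that differs across databases: regular-expression match.
# Query generation code looks the operator up instead of hard-coding it.
REGEX_OP = { :mysql => "RLIKE", :postgres => "~" }

def regex_condition(db, column, pattern)
  op = REGEX_OP[db] or raise ArgumentError, "unknown database #{db}"
  "#{column} #{op} '#{pattern}'"
end

regex_condition(:mysql, "name", "^Tim")     # => "name RLIKE '^Tim'"
regex_condition(:postgres, "name", "^Tim")  # => "name ~ '^Tim'"
```

A comprehensive list of divergent operators would become a table like REGEX_OP per adapter, which is easy to write tests against.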

2. Regarding transactions: I haven't done this, but I don't imagine it
would be difficult. A Ruby-like style for it might be something like:

objectStore.runTransaction {
  account1743.balance -= 100
  account1743.commit
  account329.balance += 100
  account329.commit
}

... and then I'd just write an ObjectStore#runTransaction method like:

def runTransaction
  beginTransaction
  yield
  commitTransaction
end

right? One question: When using DBI/Postgres, do you issue begin and
commit commands as separate lines, or do they need to be mixed in with
your SQL strings?
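To get the rollback behaviour Tim asks about, the method would also need a rescue clause. A self-contained sketch, with a stub connection standing in for a real database handle (all names here are invented, not any library's API):

```ruby
# StubConnection records the commands it receives, so the transaction
# logic can be checked without a real database.
class StubConnection
  attr_reader :log
  def initialize; @log = []; end
  def begin_transaction;    @log << "BEGIN";    end
  def commit_transaction;   @log << "COMMIT";   end
  def rollback_transaction; @log << "ROLLBACK"; end
end

# Commit if the block finishes; roll back and re-raise if it raises.
def run_transaction(conn)
  conn.begin_transaction
  yield
  conn.commit_transaction
rescue
  conn.rollback_transaction
  raise
end

conn = StubConnection.new
run_transaction(conn) { }    # logs BEGIN, COMMIT
begin
  run_transaction(conn) { raise "second UPDATE failed" }
rescue RuntimeError
end
conn.log.last                # => "ROLLBACK"
```

With a real handle, the three stub methods would just issue BEGIN, COMMIT and ROLLBACK to the database.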
I wouldn't consider Criteria to be in quite the same basket as Lafcadio
and Vapor. It doesn't have any OO abstraction at all on the data side -
the project is all about OO abstraction of queries. I think Lafcadio (or
Vapor) could take a lot of ideas (and/or code) from Criteria for their
own query interfaces. I had independently thought of some of the
concepts behind Criteria when I discovered that somebody had already
written it, and in a much cleverer way than I would have thought of, but
it's not primarily a database interface library, it's a query
construction library. A project that combined the best aspects of the OO
data interfaces and the OO query interfaces that we already have would
be exactly what I was looking for.

There's an extremely alpha version of query inference in the last dev
release, 0.3.4. I wouldn't suggest trying to use it, though, 'cause
the interface is likely to change. I tried doing something similar to
what Ryan did with Criteria, but I found that as soon as I started seeing
pseudo-Ruby syntactical symbols like &, |, <, >, etc., I instinctively
wanted the entire range of Ruby primitives to play with. Hence my
weird question here earlier about whether it's possible to override
unary negation.

The more I think about it, the more this sort of interface makes me
nervous. Maybe Criteria users can pipe in here with their perspective,
but it seems to me that once you tell the programmer that she can
write code like

(tbl.bday == "2003/10/04").update(:age => tbl.age + 1)

then it's natural for her to assume she can write the opposite:

(tbl.bday != "2003/10/04").update(:age => tbl.age + 1)

... but the second case will fail, and probably it will fail quietly
since (I think) there's no way for a Table object to detect negation
like that. I'm starting to think I'd want a query syntax that trades
some cleverness for clarity, something like maybe

(tbl.bday.equals( "2003/10/04" )).update(:age => tbl.age + 1)
(tbl.bday.equals( "2003/10/04" ).not!).update(:age => tbl.age + 1)

Or maybe that's trading one sort of ugliness for another. I'm not sure
yet.
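As a sketch of how the explicit style might hang together - conditions as plain objects that render themselves to SQL, with not! wrapping one in NOT(...) - assuming invented names and no real library behind it:

```ruby
# A condition holds a fragment of SQL; not! returns a new, negated one,
# so negation is explicit rather than inferred from Ruby operators.
class Condition
  def initialize(sql); @sql = sql; end
  def to_sql; @sql; end
  def not!; Condition.new("NOT (#{@sql})"); end
end

class Field
  def initialize(name); @name = name; end
  def equals(value); Condition.new("#{@name} = '#{value}'"); end
end

bday = Field.new("bday")
bday.equals("2003/10/04").to_sql       # => "bday = '2003/10/04'"
bday.equals("2003/10/04").not!.to_sql  # => "NOT (bday = '2003/10/04')"
```

Because negation is an ordinary method call, the library always sees it, which is exactly what the == / != operator style can't guarantee.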

One more point of disclosure in the interest of not wasting your time,
Tim: Lafcadio does almost no handling of group functions yet. I
suspect it wouldn't be too difficult to add on, but I haven't thought
about the problem much yet so I can't really be certain of it.

Francis
 

dhtapp

I've been watching this thread with a great deal of interest. I'm
relatively new to Ruby, and have no experience with the various db
approaches (well, except for playing with pstore for a few hours). But I
did use NeXT/Apple's EOF product for a few years. I thought I'd just post a
few remarks in case any of you gurus are trying to strategize something.

That framework used a model object (an EOModel), with which the application
communicated directly. The model itself used vendor-free semantics to
describe standard database ops, and then the app loaded an appropriate
adaptor (EOOracleAdaptor, EOSybaseAdaptor) at runtime to translate the join
and fetch requirements into a particular vendor's SQL flavor.

In the model, Entities (think tables) could be specified to return either
instances of specific Java classes (and earlier, Objective-C classes), or
dictionaries. The dictionaries were basically hashes, but they shared the
ability with the custom Java classes to respond to key-value coding for
fetching and traversal, so that a "keypath" like
"someVendor.principalContact.phoneNumber" behaved the same, whatever the
type of object loaded. (Note: Apple relied heavily enough on key-value
coding and keypath notation to obtain at least one patent on it. I suppose
someone looking to implement from the ground up would blow past that
pioneering effort and look straight at OGNL, or somesuch.)
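In Ruby the keypath half of key-value coding is a few lines with send. A minimal sketch (classes invented for the example) that, like EOF, lets hashes and custom classes answer the same keypath:

```ruby
# Walk a dotted keypath, treating hashes as dictionaries and anything
# else as an object with accessor methods.
def value_for_keypath(obj, keypath)
  keypath.split(".").inject(obj) do |o, key|
    o.is_a?(Hash) ? o[key] : o.send(key)
  end
end

Contact = Struct.new(:phoneNumber)
Vendor  = Struct.new(:principalContact)

vendor = Vendor.new(Contact.new("555-1234"))
dict   = { "principalContact" => { "phoneNumber" => "555-1234" } }

value_for_keypath(vendor, "principalContact.phoneNumber")  # => "555-1234"
value_for_keypath(dict,   "principalContact.phoneNumber")  # => "555-1234"
```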

The model had some support for groupings, and custom read-write statements
could be recorded in the model on a per-attribute basis, to override the
model's default SQL generation. For instance, you could model two Entities
on an invoice table, say "Invoice", which did the standard CRUD stuff, and
"InvoiceSummary", which had a couple of standard attributes and a custom
reader defined as "sum %amount_due%" (sorry, can't remember the specific
syntax, but you get the idea).

And there are/were some other nifty ideas. For instance, a "fault" system
(much easier for the product engineers to implement in the earlier versions,
since Obj-C instances can be coerced into swizzling their own class
pointers at runtime...the Java version, from what I understand, took the
midnight sacrifice of multiple chickens to come together). Anyway, the idea
was that objects fetched from the db would know just enough about themselves
to be able to identify (actually, lie about) their Entity types and locate
the rest of their data in db. Then, they'd stay dormant until touched for
the first time, at which point they'd swap in their correct class
information and fetch their individual underlying rows. (That, too, could
obviously be a performance problem, but the system provided "batch faulting"
and other ways to tune for that.)
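A rough Ruby sketch of the faulting idea, using method_missing in place of Obj-C's class-pointer swizzling; the "database" here is just a hash, and all names are invented:

```ruby
# A Fault knows only its row id; it stays dormant until first touched,
# at which point it fetches its row and answers from it.
DB = { 7 => { :name => "ACME", :phone => "555-0100" } }

class Fault
  def initialize(id)
    @id  = id
    @row = nil
  end

  def fired?; !@row.nil?; end

  def method_missing(sym, *args)
    @row ||= DB[@id]            # fetch the underlying row on first access
    @row.fetch(sym) { super }   # unknown attributes still raise
  end
end

v = Fault.new(7)
v.fired?   # => false  (dormant until touched)
v.name     # => "ACME" (fires the fault)
v.fired?   # => true
```

Batch faulting would amount to filling @row for many Fault objects from one query instead of one fetch per touch.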

As my dear (but long-winded) great uncle used to say, "I said all of that to
say this:" Anyone thinking of doing this kind of stuff from the ground up
could maybe profit from a few weeks' study of the EOF architecture. No
need to buy the product; I think there are ample whitepapers up at the Apple
site to give an overview of the system. And, given Ruby's runtime dynamics
(compared even to Obj-C, to say nothing of Java), and the caliber of folks
who hang out here, I have no doubt someone could put together a Clean, Lean
Persistence Machine ;-)

-dan
 
