Object/Relational Mapping is the Vietnam of Computer Science

  • Thread starter Demetrius Gallitzin

Eivind Eklund

[1] I'd love it if there were knowledge consultants in the world. Just
like I can go to a shop and have someone assist me in buying clothes,
I'd like to be able to go to someone and say "I'd like to know about x",
and they would say "ah perhaps you'd like this, this and this, then",
and I'd say "yes, those sound good, but this isn't quite right", etc. x
could be music, or part of biology, or international relations. And then
I could just drink things up and appreciate everything out there that
interests me.

They exist, though they have a different title: Librarians.

And yes, I'm quite serious.

Eivind.
 

Clifford Heath

Christian said:
Interestingly, it can be shown that the Relational Model still is
perfectly sound even if you remove the first normal-form (as opposed
to all higher normal forms).

Yes. You just allow relation-valued attributes and nest/unnest operators,
and you can easily define a non-1NF SQL. I have a paper somewhere that
studies this, and I believe it worked quite well. It also happens not
to break relational algebra or calculus at all. But that's not the
problem that needs solving; in fact it makes it slightly worse.
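A minimal Python sketch of the nest/unnest idea (an editorial illustration, not taken from the paper mentioned): a relation is a set of tuples, the hypothetical `nest` function groups on the first attribute into a relation-valued attribute, and `unnest` flattens it back.

```python
from collections import defaultdict

def nest(rel):
    """Group a flat relation on its first attribute; the remaining
    attributes become a relation-valued attribute (a frozenset of tuples)."""
    groups = defaultdict(set)
    for row in rel:
        groups[row[0]].add(row[1:])
    return {(key, frozenset(inner)) for key, inner in groups.items()}

def unnest(nested):
    """Flatten (key, relation) pairs back into a flat set of tuples."""
    return {(key,) + inner for key, rel in nested for inner in rel}

likes = {("Alice", "Pilsner"), ("Alice", "Stout"), ("Bob", "Stout")}
nested = nest(likes)
# nested == {("Alice", frozenset({("Pilsner",), ("Stout",)})),
#            ("Bob", frozenset({("Stout",)}))}
assert unnest(nested) == likes  # nest followed by unnest loses nothing
```

The reverse direction is not always lossless: a key whose nested relation is empty simply disappears on unnest.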
The real problem IMO is not that we need a "OODB"...
but they are hardly useful for expressing database
operations). It's clear to me that a good datastore will be based on
the relational model.
The real problem is its implementation at the moment, which
constrains the developers a lot.

Yes. Relations are the right way to store data, objects are the
right way to manipulate them, but facts are the right way to conceive
of them, and hence to query them. Both ER and OO schemata are absorptions
of fact-based schemata to suit the physical characteristics of disk
storage and RAM storage/allocation respectively. IOW they're both
derived, to some extent contrived, for different purposes. Neither
can ever be the "one true way".
I argue that the following features
would remove a great deal of the pain usually associated with ORM without
changing the validity of the Relational Model.
* Dynamically, strongly typed values. (Types can be enforced (or even
ducktyped!) by help of constraints.)
* Dynamically defined relvars.
* Dynamic fields. (This is, in the end, dynamic relvars and DKNF).

What you are describing is a conceptual query language. Look at
ConQuer - not a lot of information about it is available, and
Microsoft owns (but appears not to be progressing) the only
implementation.
The problem is not the Relational Model, but the current
implementations of it: they live in a world that's very different from
the one of a Ruby program. Still, I argue that all three above
features are consistent with the Relational Model.

Any absorption (clumping) of a fact-based schema obfuscates the
original intent. Each of the common types of absorption (ER and OO)
has its advocates, as we've seen in this thread. Can you please both
stop throwing rocks at each other and figure out how to get the best
of both worlds?
That said, I honestly have no idea how to implement all this
efficiently. :-(

I do, and I'm working on it. Anyone want to help? Or are you happy
just throwing rocks?

Clifford Heath.
 

Austin Ziegler

[1] I'd love it if there were knowledge consultants in the world. Just
like I can go to a shop and have someone assist me in buying clothes,
I'd like to be able to go to someone and say "I'd like to know about x",
and they would say "ah perhaps you'd like this, this and this, then",
and I'd say "yes, those sound good, but this isn't quite right", etc. x
could be music, or part of biology, or international relations. And then
I could just drink things up and appreciate everything out there that
interests me.
They exist, though they have a different title: Librarians.

And yes, I'm quite serious.

And, like other knowledge workers, there are good ones and bad ones.
But yeah, a good reference librarian can be an amazing resource.

-austin
 

Rick DeNatale

Yes. Relations are the right way to store data, objects are the
right way to manipulate them, but facts are the right way to conceive
of them, and hence to query them. Both ER and OO schemata are absorptions
of fact-based schemata to suit the physical characteristics of disk
storage and RAM storage/allocation respectively. IOW they're both
derived, to some extent contrived, for different purposes. Neither
can ever be the "one true way".

And there's a fundamental tension between the driving ideas of a
database and object oriented programming.

Databases come from a philosophy of separating data and its
representation from the programs which operate on it. In the case of
relational databases, that representation is encapsulated behind an
interface defined by SQL.

To many this separation is the sine qua non of what it means to be a database.

On the other hand, Object Oriented Programming comes from the
philosophy that the representation of the data in an object should
only be known by the methods which operate on that object. Alan Kay's
conception of an object was that it was a tiny, complete von Neumann
computer with data and code co-located and encapsulated from the other
objects with which it interacts behind an interface defined by its set
of methods.

From the traditional point of view, one could consider the term Object
Oriented database to be an oxymoron, since the data representation is
not hidden from the objects which represent the data in the database.
These folks might look at OO db systems and see them as shared
persistent object stores, but not as databases.

Of course if you look at an OODB from the point of view of the
interface between client objects and database objects, the OO
encapsulation does provide that separation. The complaint then is
that the database representation of the persistent objects is not as
portable as a more "standardized" database. This is often a valid
point, when a requirement exists for interfacing with code outside of
the object system.

ORM approaches like ActiveRecord (and the pattern it's named after)
provide an Object Oriented interface on top of a standard relational
model. The active record objects provide OO encapsulation between the
DB and the rest of the code, and SQL provides the DB encapsulation
between the active record objects and the actual database engine. Of
course, the encapsulation of the OO system can be broken because code
other than the active record objects can directly access and alter
their state via SQL, but that's often a requirement, and knowing this
the issue can be dealt with.

It seems to me as if the all out war between the proponents of OO dbs
vs. the relationalists which raged in the mid-1980s to mid-1990s has
toned down quite a bit since then. Tying this to the title of the
thread, I'd say compare OO DBs and Relational DBs to North and South
Vietnam (in no particular order). For the most part the users of these
technologies are now living in a land where the communist and
capitalist approaches have merged, and things are quite a bit more
peaceful.
 

Austin Ziegler

And there's a fundamental tension between the driving ideas of a
database and object oriented programming.

I don't think that there's a fundamental tension, but that's just me.
Databases come from a philosophy of separating data and its
representation from the programs which operate on it. In the case of
relational databases, that representation is encapsulated behind an
interface defined by SQL.

This is not correct. Fabian Pascal would have a field day with this
statement. Databases come from a formalization of the need to store
data. Hierarchical databases (and, to a degree, object databases) store
the data in the same way that the programs which manipulate the data
represent the data.

Relational databases come from the realisation that more than one
program needs to work with a given set of data, you don't want to store
more than one copy of any given datum, and there should be a formal way
of modelling such things. Thus, you have the Relational Model of Data --
which, as I have mentioned several times, is a model that represents
tuples, attributes, and relations that can be queried using relational
algebra.

Mr Heath's assertion that ER modelling is about physical storage is
completely incorrect; there *is* a physical layer that can be modelled,
but it is *primarily* about the logical layer, and the physical
characteristics DO NOT MATTER at that logical layer. You continue that
false assertion and add another one, suggesting that SQL databases are
relational databases. They are not. They follow *portions* of the
relational model, but do not completely implement relational algebra and
have followed hype and fashions to incorporate more features of
hierarchical and object data models because developers are fickle
creatures who aren't given enough time to really understand most of the
technology that they are required to implement or use by people who
understand even less.

SQL is, at best, an approximation of the relational model and it is
generally a failure at that, because SQL doesn't let you think in
logical terms more often than not.

-austin
 

Jochen Theodorou

Austin Ziegler wrote:
[...]
SQL is, at best, an approximation of the relational model and it is
generally a failure at that, because SQL doesn't let you think in
logical terms more often than not.

Say, is there a better way to make queries? I mean one that is actually
used in one of the big databases and that isn't an SQL dialect.

bye blackdrag
 

Clifford Heath

Austin said:
Relational databases come from the realisation that more than one
program needs to work with a given set of data, you don't want to store
more than one copy of any given datum, and there should be a formal way
of modelling such things. Thus, you have the Relational Model of Data --
which, as I have mentioned several times, is a model that represents
tuples, attributes, and relations that can be queried using relational
algebra.

Yes, thanks for re-iterating that... full support from me.
Mr Heath's assertion that ER modelling is about physical storage is
completely incorrect;

Austin, please put the rocks down again, and listen carefully.
You'll see I'm supporting your view, and adding to it. Don't
immediately assume that because I said something you didn't
relate to, that you can't learn something new from someone
who's been doing relational *and* object modeling at both
conceptual and physical levels since you were in short trousers.

But before I show why I'm not incorrect, let me first say that
we have an unavoidable need to work with the physical models
we must create, and that there exist no tools (to my knowledge)
that do enough to alleviate the problems that arise from that.
It's not a problem with the relational model, but it *is* a
problem with our need for physical mappings.
there *is* a physical layer that can be modelled,
but it is *primarily* about the logical layer, and the physical
characteristics DO NOT MATTER at that logical layer.
SQL is, at best, an approximation of the relational model

You can spit all you like about how SQL causes these
problems, but it isn't really the cause; rather it's the
absence of a better alternative. Until you can point to
a better alternative, you're fighting a pointless battle
with the practitioners who maintain that relational
databases are hard to work with. You're saying they
shouldn't be hard, and you're right. They're saying they
*are* hard, and they're right. Read on to see why.

Ok, with that out of the way, let me introduce fact-based
modeling, and show how it solves problems that recur
even in properly designed relational databases. Sit
tight, this is going to take a while. I'll try to make it
shorter; please don't pick nits with my shortcuts until
you've seen the whole picture.

Suppose we want to build a simple schema to record which types
of beer people like. We can record the following elementary
facts:

Person is known by Name.
Beer is known by BeerName.
Person is fond of Beer.

Here we have two entity types (Person and Beer), and two
things that might be value types (Name and BeerName). Each
entity type has a defined reference mode ("is known by"),
and each reference mode forms a fact type that's a 1:1 relationship.
Finally we have one fact type, with the reading "is fond of",
in which Person and Beer both play a role.

Now this is enough information for most folk to work out
what's going on, but not yet enough to define a relational
schema; we need to know whether a person may be fond of
more than one beer. I'm assuming that a beer might be liked
by more than one person, but let's assume that my initial
fact said "Person has favourite Beer" instead of "Person is
fond of Beer". It's also the case that a given Person might
not like any beer, but let's assume that we have other
reasons to record such a Person.

We can represent this model using the relation:

Person(Name*, BeerName?)

where * means primary key, and ? means null is allowed.

Further normalization requires:

Person(Name*)
FavouriteBeer(Name*, BeerName*)

which avoids the NULL value. Both are valid choices, since
they can be mapped to one another without loss - though the
former is a preferable physical model. Notice how in neither
case did we need a Beer() relation. That's because all fact
roles of Beer have been absorbed into one of the tables you
see. So far so good... until we get a change request.
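A minimal sketch of the two forms above using Python's built-in sqlite3 module, with made-up rows and one in-memory database per form. The UNIQUE (Name) constraint on FavouriteBeer is an editorial assumption that enforces "has favourite" (at most one beer per person). The LEFT JOIN rebuilds the single-table form from the normalized one, which is the "mapped to one another without loss" claim in miniature.

```python
import sqlite3

# Form 1: a single relation with a nullable BeerName.
flat = sqlite3.connect(":memory:")
flat.execute("CREATE TABLE Person (Name TEXT PRIMARY KEY, BeerName TEXT)")
flat.executemany("INSERT INTO Person VALUES (?, ?)",
                 [("Alice", "Stout"), ("Bob", None)])

# Form 2: no NULLs; the optional fact lives in its own relation.
# UNIQUE (Name) is an assumption: at most one favourite per person.
norm = sqlite3.connect(":memory:")
norm.execute("CREATE TABLE Person (Name TEXT PRIMARY KEY)")
norm.execute("""CREATE TABLE FavouriteBeer (
    Name TEXT NOT NULL REFERENCES Person(Name),
    BeerName TEXT NOT NULL,
    PRIMARY KEY (Name, BeerName),
    UNIQUE (Name))""")
norm.executemany("INSERT INTO Person VALUES (?)", [("Alice",), ("Bob",)])
norm.execute("INSERT INTO FavouriteBeer VALUES ('Alice', 'Stout')")

# Rebuild Form 1 from Form 2: the LEFT JOIN reintroduces the NULL.
rebuilt = norm.execute("""
    SELECT p.Name, f.BeerName FROM Person p
    LEFT JOIN FavouriteBeer f ON f.Name = p.Name
    ORDER BY p.Name""").fetchall()
original = flat.execute(
    "SELECT Name, BeerName FROM Person ORDER BY Name").fetchall()
assert rebuilt == original  # [('Alice', 'Stout'), ('Bob', None)]
```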

We have to record all the beers a person likes in priority
order. Now the second form, which wasn't preferred because
of its additional physical cost, is preferable, because we
can simply add a column "priority" to FavouriteBeer, renaming
it PreferredBeers.

All we did was add a fact "Person has Preference for Beer",
and the new constraint allows more than one Beer per Person,
yet all our existing relational queries are now wrong, and
we have to create migration code to construct a new table,
and revisit all our queries to map to the new tables. SQL
helps a little by allowing us to construct views, but the
views can't hide the fact that the Person table no longer
has a BeerName attribute.
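A rough sqlite3 sketch of that change request against the single-table form, with made-up data. Treating existing favourites as priority 1 is an assumed convention. The migration works, but the old query that named Person.BeerName breaks, and a view can only restore the old answer under a new name (TopBeer, a hypothetical name), so the query still has to be rewritten.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Person (Name TEXT PRIMARY KEY, BeerName TEXT)")
db.executemany("INSERT INTO Person VALUES (?, ?)",
               [("Alice", "Stout"), ("Bob", None)])
old_query = "SELECT Name FROM Person WHERE BeerName = 'Stout'"
assert db.execute(old_query).fetchall() == [("Alice",)]

# Migration: many beers per person, in priority order. Existing
# favourites become priority 1 (an assumed convention).
db.executescript("""
    CREATE TABLE PreferredBeers (
        Name     TEXT NOT NULL,
        BeerName TEXT NOT NULL,
        Priority INTEGER NOT NULL,
        PRIMARY KEY (Name, BeerName),
        UNIQUE (Name, Priority));
    INSERT INTO PreferredBeers
        SELECT Name, BeerName, 1 FROM Person WHERE BeerName IS NOT NULL;
    -- Rebuild Person without the BeerName column.
    CREATE TABLE Person_new (Name TEXT PRIMARY KEY);
    INSERT INTO Person_new SELECT Name FROM Person;
    DROP TABLE Person;
    ALTER TABLE Person_new RENAME TO Person;
""")

# Every query that touched Person.BeerName now fails...
try:
    db.execute(old_query)
    old_query_broke = False
except sqlite3.OperationalError:  # e.g. "no such column: BeerName"
    old_query_broke = True

# ...and a view can restore the old answer only under a new name.
db.execute("""CREATE VIEW TopBeer AS
    SELECT Name, BeerName FROM PreferredBeers WHERE Priority = 1""")
new_query = "SELECT Name FROM TopBeer WHERE BeerName = 'Stout'"
```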

The story goes on... now change things so that a Name
is made up of a FamilyName and a GivenName, and again so
that FamilyName is the primary key of a Family entity which
has a functionally dependent attribute MedicareNumber. You
can't avoid such schema migrations, and you can't avoid
the fact that you must create an efficient physical schema
each time, so you wind up in a constant tradeoff between
keeping your schema clean and refactoring your queries.

Notice that I've said *nothing* here about SQL that isn't
true of any effective relational system in existence. The
need to store more than one fact per tuple (for performance
reasons) is the cause. It's these *compound facts* that
create the schema evolution problems that SQL suffers from,
yet compounding is unavoidable for performance reasons.
Hence my comments about disk storage, which I stand by.

The only solution is a system that enables us to completely
hide the physical layer from the user (from the queries),
and that's what fact-based modeling can do. The details of
how are contained in Terry Halpin's book "Information
Modeling and Relational Databases", and are implemented in
at least four significant database design tools, three of
which he designed (the other is CaseTalk, www.casetalk.com).
The most recent is an open source plugin for Visual Studio
called NORMA, which is available as a CTP as the "orm"
project at <http://sourceforge.net/projects/orm/>. Talk
about an iceberg in hell! This thing is already *way* cool.
I'm visiting the team (which Terry leads) for two days in
May before the Rails conference.

A fact-based model is still relational, and still has all
the benefits of being built on first order logic and the
predicate calculus, yet it's also intrinsically different
from what has come to be commonly known as "the relational
model".

In particular, all relevant join paths are known in the
schema or can be added without breaking it. So I can say
the conceptual query:

Person(Name@) who
has more than 4 Preference
for Beer(BeerName@)

where the @ sign says that I want this as part of my result
set, and this will return the names and preferred beers
(without priority details) of all beer connoisseurs. Leave
off the @ sign from Person(Name) and you just get the beer
names without repetition.
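For comparison, here is one hand-written SQL translation of that conceptual query. It is an editorial sketch in Python's sqlite3 against a PreferredBeers(Name, BeerName, Priority) table like the one described earlier in the post, with made-up data, and not what any fact-based tool would generate. It makes explicit the grouping and subquery that the conceptual query leaves implicit.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE PreferredBeers (
    Name TEXT NOT NULL, BeerName TEXT NOT NULL, Priority INTEGER NOT NULL,
    PRIMARY KEY (Name, BeerName))""")
rows = []
for name, beers in {"Carol": ["Bock", "Lager", "Pilsner", "Porter", "Stout"],
                    "Erin":  ["Bock", "Lager", "Pilsner", "Porter", "Saison"],
                    "Dave":  ["Stout", "Lager"]}.items():
    rows += [(name, beer, i + 1) for i, beer in enumerate(beers)]
db.executemany("INSERT INTO PreferredBeers VALUES (?, ?, ?)", rows)

# "who has more than 4 Preference": the grouping left implicit above.
connoisseurs = """SELECT Name FROM PreferredBeers
                  GROUP BY Name HAVING COUNT(*) > 4"""

# Person(Name@) ... Beer(BeerName@): names and beers in the result.
with_names = db.execute(f"""SELECT Name, BeerName FROM PreferredBeers
    WHERE Name IN ({connoisseurs}) ORDER BY Name, BeerName""").fetchall()

# Person(Name) without @: only the beer names, without repetition.
beers_only = db.execute(f"""SELECT DISTINCT BeerName FROM PreferredBeers
    WHERE Name IN ({connoisseurs}) ORDER BY BeerName""").fetchall()
```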

I didn't need to say JOIN or WHERE anywhere there, so it
was easy to write (also easy to build with a graphical
tool), and extensive study has shown that such queries are
highly resistant to being broken as the schema grows.

The result set is not in first normal form, BTW. It's a
tree, or in more complex cases, a graph. It starts to look
very like a de-serialized collection of objects (what I call
a fact constellation, because it selects a meaningful group
of stars from a starry sky of facts), and that's how
programmers need it to be - it immediately addresses O/RM
requirements with no further work.
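A toy sketch of the last step only: re-assembling the flat (person, beer, priority) rows a SQL engine returns into a nested, non-1NF structure keyed by person. The function name `to_constellation` and the dict-of-lists shape are hypothetical, chosen for illustration.

```python
from collections import defaultdict

def to_constellation(rows):
    """Group flat (person, beer, priority) rows into person -> beers,
    with each person's beers ordered by priority."""
    tree = defaultdict(list)
    for person, beer, priority in sorted(rows, key=lambda r: (r[0], r[2])):
        tree[person].append(beer)
    return dict(tree)

rows = [("Carol", "Stout", 2), ("Dave", "Lager", 1),
        ("Carol", "Bock", 1), ("Dave", "Stout", 2)]
print(to_constellation(rows))
# {'Carol': ['Bock', 'Stout'], 'Dave': ['Lager', 'Stout']}
```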

This is what relational databases *should* be like, but
aren't. With a graphical query builder (like the one in
InfoModeler - Google that), a raw beginner can write queries
that would make an experienced DBA quake in their boots. Such queries aren't
hard to translate to SQL against the underlying physical
model, as long as you preserve the derivations by which
the physical schema was created. I've written Ruby code that
already handles some cases.

The remaining problem is that the existing tools are only
design tools, that generate database schemata and static
data layers. What's needed in addition is a flexible runtime
and query processor, written in a dynamic language, and it's
to create such an animal that I registered the ActiveFacts
project on RubyForge. There's no content there yet (sorry
Christian), but anyone who's willing to do the background
study is welcome to help out. I hang out on the new
Yahoo group "information_modeling", and need folk to discuss
ActiveFacts with while I develop it to the point where I have
enough on which to base collaboration.

I plan to post a significant literature review there sometime
soon, to help people to get started. It's somewhat of a pity
that Terry named his method Object Role Modeling, though at the
time neither the word "Object" nor the acronym "ORM" was used
as it is now... but if we agree to call it fact-based
modeling or information modeling I think we can avoid
confusion. A good overview of Object Role Modeling is at
<http://msdn2.microsoft.com/en-us/library/aa290383(VS.71).aspx>

Clifford Heath.
 

Rich Morin

The only solution is a system that enables us to completely
hide the physical layer from the user (from the queries),
and that's what fact-based modeling can do. The details of
how are contained in Terry Halpin's book "Information
Modeling and Relational Databases", and are implemented in
at least four significant database design tools, three of
which he designed (the other is CaseTalk, www.casetalk.com).
The most recent is an open source plugin for Visual Studio
called NORMA, which is available as a CTP as the "orm"
project at <http://sourceforge.net/projects/orm/>. Talk
about an iceberg in hell! This thing is already *way* cool.
I'm visiting the team (which Terry leads) for two days in
May before the Rails conference.

I'm a big fan of ORM2 (The current name for the ORM effort).
I'm not a database wonk, but I play around a lot with graph-
based data structures, and ORM2 fits my needs very well. I
would like to mention 2.5 other technologies that I'm hoping
to tie together with ORM2, Ruby, and Rails.

* Conceptual Graphs (Dr. John Sowa, et al)

Although CGs use a different notation than ORM2, both
systems describe hypergraphs (graphs in which edges
may have more than two endpoints: "John took the train
to Chicago"). While ORM2 maps nicely into relational
DBs, CG maps nicely into predicate calculus.

* Ruby Graph Library (RGL; Horst Duchene)
GRAph Theory in Ruby (GRATR; Shawn Garbett)

The basic idea in RGL and GRATR is a mapping between
graph nodes and objects. This lets you, for example,
ask a node about its neighbors, forward messages to
them, etc.

It seems to me (SciFi alert) that it should be possible to:

* extend RGL/GRATR to have both nodes and edges as objects,
such that you can ask an edge about the objects it links

* back the resulting graph by an ORM2-style database (this
would let me ask cross-graph questions such as "which As
have a B relationship with Cs").

* send links (predicates) to a KR system (e.g., PowerLoom)

In fact, I've been musing about how close Active Record is to
being able to support a subset of this scheme. Consequently,
I found your posting _quite_ interesting.


One problem I see with using AR join tables for relationships
is that there seems to be a presumption that there is only a
single reason why a set of tables would be joined. So, "Rich
drives a Camry" and "Rich owns a Camry" live in cars_people.

My working plan is to use a "type" field (à la STI) in each
join table to disambiguate these cases. Not pretty, however.
The discussion in

http://wiki.rubyonrails.org/rails/pages/ManytoManyPolymorphicAssociations

hints at some of the difficulties with this approach.
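A minimal sketch of that workaround in plain SQL via Python's sqlite3, not ActiveRecord's API: one cars_people join table with a hypothetical `relationship` discriminator column distinguishing "drives" from "owns". The sample rows are made up.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cars   (id INTEGER PRIMARY KEY, model TEXT);
    -- 'relationship' is the STI-like discriminator (hypothetical name).
    CREATE TABLE cars_people (
        person_id    INTEGER REFERENCES people(id),
        car_id       INTEGER REFERENCES cars(id),
        relationship TEXT NOT NULL,
        PRIMARY KEY (person_id, car_id, relationship));
    INSERT INTO people VALUES (1, 'Rich'), (2, 'Pat');
    INSERT INTO cars   VALUES (1, 'Camry');
    INSERT INTO cars_people VALUES (1, 1, 'drives'), (1, 1, 'owns'),
                                   (2, 1, 'drives');
""")

def people_who(relationship, model):
    """Every query has to remember to filter on the discriminator."""
    return [n for (n,) in db.execute("""
        SELECT p.name FROM people p
        JOIN cars_people cp ON cp.person_id = p.id
        JOIN cars c ON c.id = cp.car_id
        WHERE cp.relationship = ? AND c.model = ?
        ORDER BY p.name""", (relationship, model))]
```

The "not pretty" part shows up here: a query that forgets the `relationship` filter silently merges the two different facts.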

Another problem is that (apparently) AR doesn't support join
tables with more than two columns. Given that I need hyper-
edges, this loses badly. So, it looks like I need some model
(e.g., acts as) code...
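At the SQL level nothing stops a table from holding more than two role columns; the two-column assumption belongs to the join-table convention. A sketch of a ternary fact table for the "John took the train to Chicago" example above, in plain sqlite3, with hypothetical table and column names and made-up rows:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One row per fact instance; each column plays one role in the hyperedge.
db.execute("""CREATE TABLE took_to (
    person TEXT NOT NULL, vehicle TEXT NOT NULL, destination TEXT NOT NULL,
    PRIMARY KEY (person, vehicle, destination))""")
db.executemany("INSERT INTO took_to VALUES (?, ?, ?)", [
    ("John", "train", "Chicago"),
    ("John", "bus", "Denver"),
    ("Mary", "train", "Chicago"),
])

# A cross-role question: who took the train to Chicago?
who = [p for (p,) in db.execute(
    "SELECT person FROM took_to WHERE vehicle = ? AND destination = ? "
    "ORDER BY person", ("train", "Chicago"))]
```

Splitting such a fact into binary join tables can produce spurious combinations when they are joined back together, which is one reason hyperedges want a single table.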

The remaining problem is that the existing tools are only
design tools, that generate database schemata and static
data layers.

There's also the slight problem that NORMA requires Visual
Studio, C#, M$Windows, etc.
What's needed in addition is a flexible runtime
and query processor, written in a dynamic language, and it's
to create such an animal that I registered the ActiveFacts
project on RubyForge. There's no content there yet (sorry
Christian), but if anyone who's willing to do the background
study, they're welcome to help out. I hang out on the new
Yahoo group "information_modeling", and need folk to discuss
ActiveFacts with while I develop it to the point where I have
enough on which to base collaboration.

I'd be happy to be part of the discussion.

-r
--
http://www.cfcl.com/rdm Rich Morin
http://www.cfcl.com/rdm/resume (e-mail address removed)
http://www.cfcl.com/rdm/weblog +1 650-873-7841

Technical editing and writing, programming, and web development
 

Clifford Heath

Cross-posted and followups to information_modeling from
ruby-talk.

Rich said:
I'm a big fan of ORM2 (The current name for the ORM effort).

Hooray - someone else! ORM2 is a minor redrawing of the
graphical symbols for ORM; I'm not aware of any semantic
changes. And though the current mapper is relational,
there's no good reason not to do UML and other kinds of
artifacts from it.

Where did you come across ORM?
Did you study at Neumont/Northface?
I'm not a database wonk, but I play around a lot with graph-
based data structures, and ORM2 fits my needs very well. I
would like to mention 2.5 other technologies that I'm hoping
to tie together with ORM2, Ruby, and Rails.
* Conceptual Graphs (Dr. John Sowa, et al)
Although CGs use a different notation than ORM2, both
systems describe hypergraphs (graphs in which edges
may have more than two endpoints: "John took the train
to Chicago"). While ORM2 maps nicely into relational
DBs, CG maps nicely into predicate calculus.

Well, ORM has ternary and higher-order facts, but each
fact role connector has exactly two ends. The NORMA mapper
maps higher-order facts to binary ones before absorption,
which is strictly unnecessary, but simplified the XSLT
approach they were using before LiveOIAL foiled it.
The point is that you have object types (entity, value
and nested types) and you have fact types with one or
more roles - each role is a 2-ended connector from a
fact type to an object type.
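One way to read that description as a data structure, sketched in Python. The class and field names are editorial assumptions, not NORMA's or ActiveFacts' actual meta-model: object types have a kind, fact types own one or more roles, and each role connects its fact type to the object type that plays it.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ObjectType:
    name: str
    kind: str  # "entity", "value" or "nested"

@dataclass(frozen=True)
class Role:
    # One end of this connector is the fact type that owns the role;
    # the other end is the object type that plays it.
    player: ObjectType

@dataclass
class FactType:
    reading: str
    roles: list = field(default_factory=list)

    @property
    def arity(self):
        return len(self.roles)

def fact_types_involving(obj, fact_types):
    """Readings of the fact types in which obj plays at least one role."""
    return [ft.reading for ft in fact_types
            if any(role.player == obj for role in ft.roles)]

person = ObjectType("Person", "entity")
beer = ObjectType("Beer", "entity")
vehicle = ObjectType("Vehicle", "entity")
city = ObjectType("City", "entity")

fond = FactType("Person is fond of Beer", [Role(person), Role(beer)])
took = FactType("Person took Vehicle to City",
                [Role(person), Role(vehicle), Role(city)])
```

A ternary fact type is then just one with three roles; no role ever has more than two ends.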
* Ruby Graph Library (RGL; Horst Duchene)
GRAph Theory in Ruby (GRATR; Shawn Garbett)
The basic idea in RGL and GRATR is a mapping between
graph nodes and objects.

I don't see why the objects couldn't *be* graph nodes.
You could re-open the Object class to add the required
single attr_accessor. But anyhow, you're saying that's
not what they do, and that's ok too.
It seems to me (SciFi alert) that it should be possible to:

I'm not a graph theorist, though I studied it once, but what
you're suggesting sounds quite reasonable.
One problem I see with using AR join tables for relationships
is that there seems to be a presumption that there is only a
single reason why a set of tables would be joined. So, "Rich
drives a Camry" and "Rich owns a Camry" live in cars_people.
My working plan is to use a "type" field (ala STI) in each
join table to disambiguate these cases. Not pretty, however.

Ouch... I thought you could do this properly using has_many_through?
That said, I haven't tried it.
Another problem is that (apparently) AR doesn't support join
tables with more than two columns. Given that I need hyper-
edges, this loses badly. So, it looks like I need some model
(e.g., acts as) code...

I'll post my relational meta-model for ORM 2 shortly, in
the Yahoo group. It can represent (almost?) everything that
NORMA is capable of representing, and it maps to the object
hierarchy I've defined in Ruby that can already load NORMA
diagrams (though not yet constraints). This relational model
should work with AR with minimal changes, so you can use it
for your purposes.
There's also the slight problem that NORMA requires Visual
Studio, C#, M$Windows, etc.

Well, I *did* say NORMA is an iceberg in hell :).
But both Terry and I are, with others, keen to define
a textual language for ORM, and I'll implement that in
Ruby, or even sooner, a Ruby DSL for ORM, which I've
started. So there'll be no need for a visual tool.
I'd be happy to be part of the discussion.

Great to have your interest, and thanks for joining
the Yahoo group. It only has a few people yet, but
it's only two weeks old too.

Clifford Heath.
 

Sam Smoot

One problem I see with using AR join tables for relationships
is that there seems to be a presumption that there is only a
single reason why a set of tables would be joined. So, "Rich
drives a Camry" and "Rich owns a Camry" live in cars_people.

My working plan is to use a "type" field (ala STI) in each
join table to disambiguate these cases. Not pretty, however.

Just a heads-up since feedback from this group would be appreciated:

I've implemented that in my DataMapper project.

RubyForge project (svn access): http://rubyforge.org/projects/datamapper
Blog (with related posts): http://substantiality.net/archives/tags/datamapper
 

Joel VanderWerf

Jochen said:
Austin Ziegler wrote:
[...]
SQL is, at best, an approximation of the relational model and it is
generally a failure at that, because SQL doesn't let you think in
logical terms more often than not.

Say, is there a better way to make queries? I mean one that is actually
used in one of the big databases and that isn't an SQL dialect.

Nansen said: `Yes, there is.'

`What is it?' asked the monk.

Nansen replied: `It is not mind, it is not Buddha, it is not things.'
 

Jochen Theodorou

Joel said:
Jochen said:
Austin Ziegler wrote:
[...]
SQL is, at best, an approximation of the relational model and it is
generally a failure at that, because SQL doesn't let you think in
logical terms more often than not.

Say, is there a better way to make queries? I mean one that is
actually used in one of the big databases and that isn't an SQL dialect.

Nansen said: `Yes, there is.'

`What is it?' asked the monk.

Nansen replied: `It is not mind, it is not Buddha, it is not things.'

hehe, ok.

bye blackdrag
 

Clifford Heath

Sam said:
Just a heads-up since feedback from this group would be appreciated:
I've implemented that in my DataMapper project.

(CC'ed into information_modeling - DataMapper is on
RubyForge for those folk).

It looks interesting, Sam. I've not got very far into
reading it, but I like what I see. It's not heading
in the same direction as ActiveFacts, but we still
have some goals in common. Perhaps there's some room
for collaboration?

I have a couple of plans for AF that go somewhat
against the tide of O/RMs, based on my experience
from building a very successful AR-style O/RM in C#:

* I don't believe in dynamic reflection from the DB.
If reflection is needed it should be a manual thing
that yields a schema file of some sort. DB schema
don't have enough scope for documenting the original
intent and conceptual structure - a simple list of
tables, columns, FK's and indexes isn't a sufficient
basis for writing good programs. Instead, the whole
database schema should be generated from a higher-
level schema that *is* properly annotated so that
effective code can be generated, and result sets
have an appropriate structure. (... and I certainly
don't believe you should have to do a relational
schema, an object hierarchy, *and* a set of XML
mapping files, like iBatis for example!)

* I don't believe in fetching whole records, or in
fetching from only a single relation (including a
view) in one operation. Applications too often don't
need that, and it encourages lazy programmers to do it
anyhow. Query results should be structured so they
can reflect the sum of all the data that must be
fetched in response to a single user action.

* I think that every row and value fetched should be
traceable back through the query to the schema.
IOW you can look at this value "42" and find out
what the question was :). This last goal is a bit
demanding, considering that with AF the query is made
against the fact-based schema, translated to one
or more SQL queries, potentially cached into a
stored procedure, processed with parameters into
one or more tabular result sets, then re-assembled
into a graph that reflects the structure of the
original query.
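A toy sketch of just the traceability goal, nothing like the full pipeline described above: each fetched value is wrapped with the query text and a schema fact supplied by the caller. `TracedValue`, `run_traced`, the Person/Age table and the fact readings are all hypothetical names for illustration.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class TracedValue:
    value: object
    fact: str   # schema fact type the column came from (caller-supplied)
    query: str  # the question that produced this value

def run_traced(conn, query, column_facts, params=()):
    """Run a query and wrap every cell with its provenance."""
    return [tuple(TracedValue(v, fact, query)
                  for v, fact in zip(row, column_facts))
            for row in conn.execute(query, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (Name TEXT PRIMARY KEY, Age INTEGER)")
conn.execute("INSERT INTO Person VALUES ('Alice', 42)")
q = "SELECT Name, Age FROM Person WHERE Age > ?"
(row,) = run_traced(conn, q, ["Person is known by Name", "Person has Age"], (40,))
answer = row[1]
print(answer.value, "<-", answer.fact, "<-", answer.query)
# 42 <- Person has Age <- SELECT Name, Age FROM Person WHERE Age > ?
```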

The idea here is that (for example in a Rails context)
you take the parameters from the user's context and
any submitted values, you run one query, and you get
the entire hierarchical result set for rendering the
next view. The result set is structured appropriately
too, so it can be mapped to a dynamic view that can
wrap itself around a number of similar result sets.

If that sounds interesting, let's talk.

Clifford Heath.
 

Rick DeNatale

I don't think that there's a fundamental tension, but that's just me.


This is not correct. Fabian Pascal would have a field day with this
statement. Databases come from a formalization of the need to store
data.

I think you misinterpret my point.
Hierarchical databases (and, to a degree, object databases) store
the data in the same way that the programs which manipulate the data
represent the data.

I don't believe that this is, in general, true, but let me get back to that.
Relational databases come from the realisation that more than one
program needs to work with a given set of data, you don't want to store
more than one copy of any given datum, and there should be a formal way
of modelling such things. Thus, you have the Relational Model of Data --
which, as I have mentioned several times, is a model that represents
tuples, attributes, and relations that can be queried using relational
algebra.

I'll refer you to Chris Date's 1975 book "An Introduction to Database
Systems", which was the bible on database systems to folks of my
generation. Actually I'm looking at the third edition from 1981. The
book covers the relational (using System R as the example),
hierarchical (using IMS/DB), and network (DBTG) approaches.

In the introduction, Date gives the driving reason for databases as
allowing enterprises to put their operational data under centralized
control. He gives a lengthy list of advantages of this: reduction in
redundancy, reduction in inconsistency, data can be shared, standards
(various levels of enterprise standards, industry, national and
international), easier data interchange/migration, ability to apply
security restrictions, integrity maintenance, and the ability to
balance conflicting requirements.

These requirements lead to what Date calls an important goal of
database systems, 'data independence.' This is the separation between
the database management system and the applications which I was talking
about. Data independence allows applications to have different views of
the data, and to allow a DBA to change the storage structure and/or
access strategy without affecting applications.

The abstract model for this presented by Date is that the architecture
of the system has three levels: an internal storage level, an external
level (which comprises the individual views of the different
applications), and a conceptual level which provides a 'level of
indirection' between the two. This conceptual level is what provides
the encapsulation/information hiding I alluded to. The use of SQL as
the realization of the conceptual level for the relational approach
was just an example.
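A small sqlite3 sketch of the data independence Date describes, with invented table and view names and made-up rows. For brevity the conceptual and external levels are collapsed into a single view: the application only ever queries the `staff` view, so the stored tables can be restructured underneath without its query changing.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Storage, version 1: one wide table, plus the view the app is written against.
db.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
    INSERT INTO emp VALUES (1, 'Ada', 'R&D'), (2, 'Bob', 'Sales');
    CREATE VIEW staff AS SELECT name, dept FROM emp;
""")
app_query = "SELECT name, dept FROM staff ORDER BY name"
before = db.execute(app_query).fetchall()

# The DBA moves departments into their own table and redefines the view.
db.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, title TEXT UNIQUE);
    INSERT INTO dept (title) SELECT DISTINCT dept FROM emp;
    CREATE TABLE emp2 (id INTEGER PRIMARY KEY, name TEXT,
                       dept_id INTEGER REFERENCES dept(id));
    INSERT INTO emp2
        SELECT e.id, e.name, d.id FROM emp e JOIN dept d ON d.title = e.dept;
    DROP VIEW staff;
    DROP TABLE emp;
    CREATE VIEW staff AS
        SELECT e.name, d.title AS dept
        FROM emp2 e JOIN dept d ON d.id = e.dept_id;
""")
after = db.execute(app_query).fetchall()
assert before == after  # the application's query never changed
```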

Maybe I'm just an old fuddy duddy, but this is what I learned as the
defining quality of a database system when I first encountered the
idea in the 1970s.

Object oriented databases back off just a bit from data independence
because they turn the conceptual level into a framework to be fleshed
out by the clients rather than sticking with a standard, and that's
the tension I'm talking about.

Now getting back to your statement:
Hierarchical databases (and, to a degree, object databases) store
the data in the same way that the programs which manipulate the data
represent the data.

This depends on the programs. Some programs are written to look at
data hierarchically, or as a network. Others aren't. In fact the
vast majority of applications written when database systems were
introduced tended to work on sequential (usually sorted) data. Back
then the ideas introduced by systems like IMS and CODASYL DBTG were
actually rather foreign and might be one reason why the relational
approach was widely adopted, the conceptual model looked more like an
extension and formalization of what application developers were
accustomed to back in those days.

It might be hard to see this, but for someone who lived through some
of the revolutions in information technology since the early 1970s,
it's evident. I spent a good part of my career evangelizing
object-oriented technologies to enterprise programmers, so I know
how foreign what we take for granted these days was for the older
folks.
 
