Object/Relational Mapping is the Vietnam of Computer Science

  • Thread starter Demetrius Gallitzin

Joel VanderWerf

Austin Ziegler wrote:
...
Thank ghu I don't have to do business with you, because I wouldn't trust
your programs to work with my most important assets. I assure you that
my data is far more important than the applications which do something
with the data. The applications increase value, but they NEVER provide
value. It's the data.

* What's the most valuable thing that Amazon has? It isn't the programs;
those are constantly updated and occasionally replaced. It's the
customer DATA that they've amassed.
* What's the biggest worry intelligent people have about Google? It
isn't the programs, it's the amount of DATA that Google contains about
people.

What is Google's most valuable asset? Not data. They recreate their data
constantly.
 

Joel VanderWerf

James said:
Data is a dead fish. Applications are knowing how to fish.

Data is king crab! Applications are prawns! Don't you get it???
Crustaceans rulez!
 

Clifford Heath

Sam said:
Eh? I'm not talking about ACIDity.

I'm sorry, but the possibility of a power fail is an infinitely
greater risk than that I won't be able to use my existing software
and hardware to extract data from the proprietary (or not) storage
format that software might use. You're the one that said "the data
is safe". I beg to differ.
I'm talking about the horse people
love to beat about loosing your data due to vendor lock-in

Ok. Perhaps you can explain just how vendor lockin would cause me
to loose (sic) data? I still have the files, and the software, and
the hardware, and backups or redundancy for all. Where's the chance
of loss that's mitigated by having the source code as well?
On the subject of ACIDity though, here's a developer press-release on
an older version, along with benchmarks and tests for crash
simulations: http://developer.db4o.com/blogs/product_news/archive/2006/06/02/25420.aspx

Now, could be that you just don't trust them.

No, I trust them. I don't, however, trust them as much as tests
that I know have been conducted, using thread-scheduling hooks
to explore very many of the infinite combinatoric paths of such
things, and in the process, do the same "stop the world" recovery
tests. Such exploration takes years, thousands of clients, and
trillions of transactions, before real trust is deserved.

But in any case, it's not my data that's at risk, and it's not me
who needs to be convinced. It's my dozens of customers who are
backing up tens of gigabytes of transaction log every day, from
machines costing hundreds of $K, and who are using software that's
doing the same for tens of thousands of other customers for years,
without the vendor being sued out of existence - as happened to
inferior players during the 80's - who need to be convinced.

For better or for worse, and even though they now seem to have
risen above their ignorance, the authors of MySQL, who apparently
didn't know what a transaction is, have unfortunately tarred most
of the open source database world with the same brush. Unfair, but
life is.

Clifford Heath.
 

Robert Klemme

Ok, if you say so. Let's call it a describing language, but operations
like AUTO INCREMENT seem an awful lot like programming. I guess we have
to say Ruby is not a programming language either. It is a scripting
language.
hmm...
many sources do describe (no pun intended) SQL as a declarative
programming language. It isn't 'Turing complete' because it can't create
an infinite loop. Big deal.
That's academic nitpicking.

Actually it's not, because this fact has indeed practical consequences.
For example, try to retrieve a tree structure of undefined depth from a
table in standard SQL.
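Robert's example can be made concrete with a small Ruby sketch (an Array of Hashes standing in for an adjacency-list table): without recursion in the query language, the unbounded-depth walk has to live as a loop in application code, one query per tree level.

```ruby
# Toy adjacency-list "table"; each element stands in for a row (id, parent_id).
ROWS = [
  { id: 1, parent_id: nil },
  { id: 2, parent_id: 1 },
  { id: 3, parent_id: 1 },
  { id: 4, parent_id: 2 },
].freeze

# Collect all descendants of root_id. Each loop iteration stands in for one
# round trip: SELECT id FROM t WHERE parent_id IN (frontier). Because the
# depth is unknown, the loop cannot be unrolled into a fixed number of
# joins -- which is Robert's point about standard SQL.
def descendants(root_id)
  result = []
  frontier = [root_id]
  until frontier.empty?
    children = ROWS.select { |r| frontier.include?(r[:parent_id]) }
                   .map { |r| r[:id] }
    result.concat(children)
    frontier = children
  end
  result
end

puts descendants(1).inspect  # => [2, 3, 4]
```

(For what it's worth, SQL:1999 added WITH RECURSIVE for exactly this case, but support across databases was far from universal at the time of this thread.)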

Regards

robert
 

Robert Klemme

Robert Klemme schrieb:
[...]
How does it do schema migration? Do you have experience with that?

look at
http://developer.db4o.com/ProjectSpaces/view.aspx/Db4o-Out-Of-The-Box_Presentation

the part named "Refactoring and Schema Evolution"
and at http://developer.db4o.com/forums/thread/26997.aspx

I think that covers most cases. There are other ways too, like
translators and reflectors, but I am not fit in that part.

Thanks for the pointer! It seems at least not too big a pain to do
although the "simple use the following code to resave all objects with
UUIDs and VersionNumbers enabled" made me a little nervous. :) But if
UUIDs and VersionNumbers are switched on then that should not be a big
issue. Still "ALTER TABLE ADD ( foo VARCHAR2(100) DEFAULT '-' )" feels
a bit simpler...

Kind regards

robert
 

Peña, Botp

From: Austin Ziegler [mailto:[email protected]] :
# Well, it is (the ultimate starting point), although you may need a
# process to collect it in the first place. However, it is, as
# you said, a cyclical loop.

data is it.
data can be King (as you said) and can be a pauper.
data has been around even before we existed. The raw data from the stone
age is basically the same as today

...but

the processing of data has

1 made civilization, consider usa, europe, china, dubai, etc
2 made industrialization, commerce, agri, you name it
3 made leisure, sports, arts, etc
4 made each one of us unique, consider AZiegler, Ara, YMatz, DBlack, etc..

data is _just_ data until one makes *use of it. So until someone makes
use of it, it is still useless.. Isn't it the reason why there are
rubyists all around? Is the ruby language data? ;)

kind regards -botp
 

Austin Ziegler

What is Google's most valuable asset? Not data. They recreate their data
constantly.

I would (mostly) disagree. Obviously, Google provides value to its
customers/users because of the algorithms it applies to the data it
collects. However, the data Google has is of immense intrinsic value.
Part of the value is that Google continually refreshes the data, but
saying that it's "recreated" constantly isn't quite true; it's
partially refreshed constantly. If they lost 20% of their data, it
would take them a *long* time to recreate that 20% because of the
sheer volume -- and some portion of that 20% would be forever lost.

Data matters immensely to Google.

-austin
 

Jochen Theodorou

Robert Klemme schrieb:
[...]
Thanks for the pointer! It seems at least not too big a pain to do
although the "simple use the following code to resave all objects with
UUIDs and VersionNumbers enabled" made me a little nervous. :) But if
UUIDs and VersionNumbers are switched on then that should not be a big
issue. Still "ALTER TABLE ADD ( foo VARCHAR2(100) DEFAULT '-' )" feels
a bit simpler...

db4o allows you to add fields to objects without UUIDs and
VersionNumbers. You just change the class and it works. The only two
things that work better in an rdbms are updating a large amount of
data using a single sql command and removing a large number of rows
using a single sql command.

That is because db4o does not have a query mechanism that allows you to
update or remove without creating the objects first. And creating a
huge number of needless objects means losing processing power. But I
also think that if enough customers say they want to have this, then it
can be put into db4o. It is not a big problem to design it and I think
adding it to the database is also not too much of a pain. So, I won't
say it is a general disadvantage of oodbms, it is only one for db4o that
could be overcome. Ah well, maybe there is already a mechanism for this
and I just missed it.
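For illustration only (these class and method names are invented, not db4o's API), here is a Ruby sketch of the cost difference Jochen describes: an object-at-a-time store must materialize every candidate object just to evaluate the predicate, whereas a set-based `DELETE FROM t WHERE status = 'stale'` pushes the predicate into the store and never builds objects at all.

```ruby
# Hypothetical in-memory store; counts how many objects get built
# solely so that a delete predicate can be applied to them.
Record = Struct.new(:id, :status)

class ToyObjectStore
  attr_reader :materialized

  def initialize(rows)
    @rows = rows
    @materialized = 0
  end

  # Object-at-a-time deletion, as Jochen describes db4o requiring:
  # every row becomes a full object before the predicate can run.
  def delete_where
    @rows.reject! do |row|
      @materialized += 1
      yield Record.new(row[:id], row[:status])
    end
  end

  def size
    @rows.size
  end
end

rows = (1..100).map { |i| { id: i, status: i.even? ? "stale" : "fresh" } }
store = ToyObjectStore.new(rows)
store.delete_where { |obj| obj.status == "stale" }

puts store.size          # => 50 rows remain
puts store.materialized  # => 100 objects built, half of them needlessly
```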

bye blackdrag
 

Trans

Ziegler's Rule of Data is just corollary to the General Law of
Zieglerity:

I am King, you is pwn3d.

But it can't be right. It contradicts the Special Law:

I is where IT is at!


Eek! That was supposed to be My Special Law, _MY_ special law, I tell
you!

T/
 

Eleanor McHugh

here at the national geophysical data center

http://ngdc.noaa.gov/

we say that data is useless, only the combination of applications and
human reasoning can turn it into __information__. so, with that in mind,
i'd say that data and applications are useless and that it's only by
combining the two using logic (aka business rules) that anything
meaningful arises.

case in point : we've 260tb of 'data' sitting in our mass storage
device. less than 0.01% ever comes back out. that small percentage is
massaged into meaningful __information__ via complex application and
human logic though and it's those kernels we're interested in.

Having trained as a physicist before entering IT I'd say that's a
different usage of the word 'data' than in this conversation. I'd
worry about any data entering one of my relational databases that
didn't pass the basic criterion of being potentially useful
information - of course it's not provably useful until someone
performs a query to extract it, so perhaps we should class it as
virtual information in its stored form... some kind of superposition
waiting for human intervention ;p

Anyway, if I have the choice between saving my data and saving my
code - data will win every time.


Ellie

Eleanor McHugh
Games With Brains
 

Chad Perrin

data is _just_ data until one makes *use of it. So until someone makes use of it, it is still useless.. Isn't it the reason why there are rubyists all around? Is the ruby language data? ;)

Of course Ruby is just data. That's true of every programming language.
Lisp isn't special in this regard -- it's special, in that it doesn't
try to pretend there's a difference between source code and data,
whereas (most) other languages do.
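Chad's point is easy to demonstrate in Ruby itself: a program is just a String, ordinary data, until some process evaluates it.

```ruby
# Ruby source code held as plain data -- here, a String...
source = "[1, 2, 3].map { |n| n * 2 }"

# ...which sits inert, like any other data, until a process (eval)
# turns it into behavior.
result = eval(source)
puts result.inspect  # => [2, 4, 6]
```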
 

Chad Perrin

Ok. Perhaps you can explain just how vendor lockin would cause me
to loose (sic) data? I still have the files, and the software, and
the hardware, and backups or redundancy for all. Where's the chance
of loss that's mitigated by having the source code as well?

One doesn't lose data because of vendor lock-in. One loses (easy)
access to data because of vendor lock-in (coupled with some form of
vendor lock-out, of course -- data locked into a given format, user
locked out of the software one uses to access it).
 

Phrogz

On Thu, Mar 22, 2007 at 07:25:09PM +0900, Clifford Heath wrote:
One doesn't lose data because of vendor lock-in. One loses (easy)
access to data because of vendor lock-in (coupled with some form of
vendor lock-out, of course -- data locked into a given format, user
locked out of the software one uses to access it).

That raises a little bit of an existential question/argument:

If my data is changed to gibberish (random byte overwrites), it's
certainly lost. There is no way to recover it.

If my data is strongly encrypted and the original key is lost, is the
data lost? What if it could be recovered by 100 years of parallel
processing by all the computers on the planet brute forcing the key?

If my data is 'encrypted' by a vendor and I lose access to the
software needed to decrypt it, is it lost? It could be recovered if I
somehow force the vendor to give me access to the software, or I break
DMCA and reverse-engineer the format, or...


It appears that being lost is not a binary condition, but instead a
gradient whose value is inversely proportional to the ability to
recover/find the data.
 

Peña, Botp

From: Chad Perrin [mailto:p[email protected]]
# On Thu, Mar 22, 2007 at 08:10:59PM +0900, Peña, Botp wrote:
# > data is _just_ data until one makes *use of it. So until
# someone makes use of it, it is still useless.. Isn't it the
# reason why there are rubyists all around? Is the ruby
# language data? ;)
# >
#
# Of course Ruby is just data. That's true of every
# programming language.

if language is data, then from whence did it come?
Surely, it must have been, like any other data,

1 found, thru the process of searching or
2 created, or
3 copied, or
(4 a combi of the 3 above)

all those 3 are processes. they require intelligence. data is useless
without them.

Can processing create its own pristine data?

# Lisp isn't special in this regard -- it's special, in that it doesn't
# try to pretend there's a difference between source code and data,
# whereas (most) other languages do.

(thanks. lisp is very cool indeed, too.)

btw, one last question: let's take the google example, if you're to
decide the fate of google and were given only one choice, which would
you prefer,
1 the total loss of data, and start gathering again
2 the total loss of code and coders, and restart hiring and coding again
?

cheers :)
kind regards -botp
 

Clifford Heath

Chad said:
...vendor lock-in (coupled with some form of
vendor lock-out, of course -- data locked into a given format, user
locked out of the software one uses to access it).

Ok, I can see why you wouldn't want to use software like that.
I certainly wouldn't, and haven't. In fact I don't know any
software like that, except what Microsoft tried with XP, but
even that failed. So it's a bit of a strawman you're knocking
down, aren't you?
 

Chad Perrin

From: Chad Perrin [mailto:p[email protected]]
# On Thu, Mar 22, 2007 at 08:10:59PM +0900, Peña, Botp wrote:
# > data is _just_ data until one makes *use of it. So until
# someone makes use of it, it is still useless.. Isn't it the
# reason why there are rubyists all around? Is the ruby
# language data? ;)
# >
#
# Of course Ruby is just data. That's true of every
# programming language.

if language is data, then from whence did it come?

Language is an abstracted, reformulated (or refactored) form of
preexisting data. The only argument for the spontaneous generation of
data from processes that really strikes me as particularly believable is
based on the idea of human creativity.

On the other hand, the Taoist in me wants to say that these divisions
between process and data are illusory, and both are one and the same.
Still, when programming it's generally far more useful to program to the
data, rather than imagine that you're using data to support the program.

1 found, thru the process of searching or
2 created, or
3 copied, or
(4 a combi of the 3 above)

all those 3 are processes. they require intelligence. data is useless without them.

Data exists. One finds uses for it -- thus, it becomes useful. The
means by which you put it to use is the process. The point of the
process is to make use of data. At least, from a limited empirical
perspective, that's surely how it looks to me.

Can processing create its own pristine data?

Processing needs something to process. You don't get output without
input -- even if that input is built into the code (hard-coded data,
like the string "Goodbye, crule[sic] world!" in a particularly maudlin
Hello World program). There's no such thing as immaculate conception in
programming.

# Lisp isn't special in this regard -- it's special, in that it doesn't
# try to pretend there's a difference between source code and data,
# whereas (most) other languages do.

(thanks. lisp is very cool indeed, too.)

One of these days I'll even get around to learning a dialect of Lisp
that doesn't get me patted on the head and condescended to.

btw, one last question: let's take the google example, if you're to decide the fate of google and were given only one choice, which would you prefer,
1 the total loss of data, and start gathering again
2 the total loss of code and coders, and restart hiring and coding again
?

I'll answer that question in part by reformulating it:

Pick what you'd keep, if you could only keep one . . .

1. data
2. code
3. programmers

I'd choose the programmers who made it all happen in the first place.
The data isn't really lost -- it's just no longer collected in neat
little piles. The code can be recreated, and maybe this time it'll be
even better because we'll have learned from our experience the first
time around. Talent is invaluable, however. With talent, we can
recreate code to collect the data and transform it into information.
 

Chad Perrin

Ok, I can see why you wouldn't want to use software like that.
I certainly wouldn't, and haven't. In fact I don't know any
software like that, except what Microsoft tried with XP, but
even that failed. So it's a bit of a strawman you're knocking
down, aren't you?

Only if I was using that statement as a point of argument. I wasn't,
really -- I just defined vendor lock-in. I suspect you're making
assumptions about my meaning and intent.

In any case, my usual approach to dealing with the specter of vendor
lock-in is to avoid the vendors' proprietary formats whenever at all
practical. Sure, I can always use some outside tool to recover that
from which I've been locked out, but I'd rather not have to.
 

Clifford Heath

Chad said:
In any case, my usual approach to dealing with the specter of vendor
lock-in is to avoid the vendors' proprietary formats whenever at all
practical.

In that case I think we're in violent agreement :)
 

Eleanor McHugh

Language is an abstracted, reformulated (or refactored) form of
preexisting data. The only argument for the spontaneous generation of
data from processes that really strikes me as particularly believable is
based on the idea of human creativity.

On the other hand, the Taoist in me wants to say that these divisions
between process and data are illusory, and both are one and the same.
Still, when programming it's generally far more useful to program to the
data, rather than imagine that you're using data to support the program.

The physicist in me would tend to agree. Being and Doing are merely
useful abstractions for the 'time'-dependent asymmetries of phase space.


Ellie

Eleanor McHugh
Games With Brains
 

Chad Perrin

The physicist in me would tend to agree. Being and Doing are merely
useful abstractions for the 'time'-dependent asymmetries of phase space.

Having read and enjoyed The Tao of Physics several times, I find that I
rather like that response.
 
