Database usage best practices

Rainer Weikusat · Sep 26, 2012

I just came accross the following statement in the DBI::Class
documentation,

,----
| Class:

BI provides a convenient abstraction layer to a database.
|
| It not only provides a simple database to object mapping layer, but
| can be used to implement several higher order database functions
| (triggers, referential integrity, cascading delete etc.), at the
| application level, rather than at the database.
|
| This is particularly useful when using a database which doesn't
| support these (such as MySQL), or when you would like your code to be
| portable across multiple databases which might implement these things
| in different ways.
|
| In short, Class:

BI aims to make it simple to introduce 'best
| practice' when dealing with data stored in a relational database.
`----

I strongly disagree with the opinion that it would be 'best practice'
to treat a RDBMS as 'dumb' system for storing structured data in
binary files and reimplement all the features it already has in
application code on top of it. This rather strikes me as 'back to the
1960!' idea which likely comes from the fact that RDBMSes, originally
supposed to enable people to perform operations on datasets without
having to learn programming in some imperative language first, were so
successful that people who already know how to program in an imperative
language are more or less forced to use them but - of course - they
don't have the slightest interest in actually learning how to so do
efficiently, especially if this means 'learning a second' (or even
more than 'the second') other programming language.

Ironically, this step backwards in the last century markets itself as
modern ...

Charlton Wilbur · Sep 27, 2012

RW> I strongly disagree with the opinion that it would be 'best
RW> practice' to treat a RDBMS as 'dumb' system for storing
RW> structured data in binary files and reimplement all the features
RW> it already has in application code on top of it.

Point the 1st: it depends very much on the DB. If you're using MySQL,
the only sane path is to treat it as a dumb system and reimplement the
features you need on top of it.

Point the 2nd: you're treating this as a black and white issue. It's
not whether an ORM layer is saintly or wicked, it's what costs you pay
in terms of performance and maintainability versus what benefits you get
in terms of expressivity, error-resistance, and risk of change when you
include an ORM layer.

In my experience, if you have an application large enough to consider in
terms of MVC that uses a database as a data store in any but the most
trivial of manners, you are a moron if you *don't* use an ORM layer. In
particular, this architecture allows new features in the application to
be developed entirely in software, and then once the desired behavior is
understood and stable, the logic can be pushed down into the database.
The risk and cost of change at levels higher than the ORM layer is much
lower than the risk and cost of change at levels below the ORM layer,
and this architecture offers a way to mitigate that risk.

RW> This rather strikes me as 'back to the 1960!' idea which likely
RW> comes from the fact that RDBMSes, originally supposed to enable
RW> people to perform operations on datasets without having to learn
RW> programming in some imperative language first, were so
RW> successful that people who already know how to program in an
RW> imperative language are more or less forced to use them but - of
RW> course - they don't have the slightest interest in actually
RW> learning how to so do efficiently, especially if this means
RW> 'learning a second' (or even more than 'the second') other
RW> programming language.

I'm sure waving your dick around like this is very pleasant, but please
tuck it back in your pants; people are pointing and laughing.

Charlton

Rainer Weikusat · Sep 28, 2012

[...]

I'm sure waving your dick around like this is very pleasant, but please
tuck it back in your pants; people are pointing and laughing.

I would like to add an additional metapoint: The opinions of worthless
people are worthless.

Rainer Weikusat · Sep 28, 2012

Mladen Gogala said:
How do you measure expressivity and error resistance? How would you even
define expressivity? I am not sure what is the connection between risk
and ORM, so I will not comment on this one.

I don't know what 'expressivity' is supposed to be but I have some
ideas of 'error resistance' for program interacting with databases:
Usually, this means catching database errors and deciding on a
sensible recovery strategy. This is especially important for programs
which are supposed to run continously without supervision. In Perl,
this relatively easy: DBI supports defining a database error
handler. This handler can throw an exception (meaning, an exception
object containing information about the error, not something like
"Can't database!"). This exception can be caught at a suitable place
in the application code. Provided the problem seems transient, the
usually sensible way to deal with that it to disconnect from the
database, reconnect and re-execute the query at the front of the query
queue. This is not anyhow 'magically' tied to 'ORM layers' and not
really difficult to implement.

In simpler cases, it is sufficient to fork the 'worker process' (this
job can also be performed by a special supervisor program) and do the
actual work in the child. Should the child hit a transient problem, it
exits with a non-zero exit code and after seeing that, the parent
simply forks again and the procedure starts over (I know this is
difficult to imagine but 'fork' is actually a useful primitive, not
just an encumbrance on the way to 'execute another program' ...).

[...]

Oh yes, ORM layer is a good thing, it enables to program very quickly and
generates classes for you. ORM is usually accompanied by an application
generator, like Groovy on Grails (Hibernate) which means that the
projects will be done much more quickly.

This is simply not true: Over the lifetime of a program, the amortized
impplementation cost of every infrastructure facility contained in it
is zero provided that it actually works and works as it would need to
work. As a practical example, the largest Perl program I'm currently
working with is just about to break through the 10,000 LOC barrier
(I've done a couple of large ones in the past). This includes a
database interface layer designed to be suitable for this particular
program and writing that has maybe costed me something like a week
back in 2010. Presently, it amounts to 6% of the code (678 lines) and
something like this is by far too trivial to warrant downloading
seriously large amounts of unknown code 'from the internet'. In
exchange for this minor effort (and some assorted fixes/ enhancements
to the DBD:

g module), this has been working 24x7 in dozens of
installations world-wide (which is not much in absolute terms but a
fair lot for a single person to support) without ever causing the
slightest headache to me or the people who use this code. In contrast
to that, 'the Java team' (larger than 1 person) uses Hibernate and
they've certainly meanwhile spend as much time as I spent writing this
code with debugging, investigating and working-around bugs/ quirks in
the ORM. These chores are still ongoing and in exchange for that, they
got a technically inferior solution both from an 'error resilience'
and a user experience standpoint (of course, it can be argued that the
solution to any problem really ought to be 'download more stuff from
the internet'

.

Rainer Weikusat · Sep 28, 2012

[Hibernate]

in exchange for that, they got a technically inferior solution both
from an 'error resilience' and a user experience standpoint (of
course, it can be argued that the solution to any problem really
ought to be 'download more stuff from the internet' .

Some details on this:

By default, the ORM which happens to be used here does not support
reconnecting to the database after an existing connection was severed
for whatever reason. Should this happen, the UI code becomes dead in
water until the corresponding process is restarted. Meanwhile, this
default policy has been changed and the program survives TCP-related
malaises. But this didn't happen until it caused problems for
customers and understandingly so: The theory behind using third-party
developed infrastructure code is that the people who did that are 'the
experts' in this domain and that a poor application programmer should
be happy that he can concentrate on his problems instead of having to
learn how to tame 'wild beasts' like a RDBMS. Because of this, people
who use this code naturally tend to leaving 'the defaults' alone.

Also by default (or at least when being used in the 'obvious,
straight-forward way') the ORM performs database operations
synchronously and 'the user session' becomes unresponsive until they
are completed. In contrast to this, the Perl program I mentioned in
the previous posting has been designed as 'traditional'
single-threaded, event-driven UNIX(*) process whose 'processing core'
is an I/O multiplexing loop (since it has lots of other stuff to do
than interact with the database). Because of this, it uses the
PostgreSQL asynchronous query processing interface. At the application
level, this means that a query is 'started' based on a query object
and some parameters for this particular invocation and once it has
been completed, a suitable 'continuation' will be invoked (while I
still think the way 'continuations' are supposed to work in scheme is
completely mad, the concept/ term itself is IMHO very useful). Since
Perl supports closures, this is an easy thing to do. The same can very
likely be accomplished with 'the ORM' as well, at the very least by
employing threading in a suitable way, but it wasn't done: The
difference is really that I started with designing how 'the program',
considering its purpose, was generally supposed to operate and then
started looking for already existing code which could be helpful for
accomplishing this instead of starting with "download what everybody
else uses" and then code according the properties it happens to have.

The morale of this is that 'complex abstraction layers' are only
really useful when the people using them already have the knowledge
and skills they are not supposed to need because of the abstraction
layer.

A note the 'dicksizing' auto-responder: I understand that you are
usually motivated by showing off, however, this is not necessarily
true for everyone else. I try to avoid using my own experiences as
an example as hard as possible exactly to avoid soliciting these kind
of useless retorts. But this isn't always possible and I'm not really
convinced that it is sensible.

Kaz Kylheku · Sep 28, 2012

How do you measure expressivity and error resistance? How would you even
define expressivity?

Expressivity is the extent to which you can express the solution to a problem
in terms of only the symbols and language of the problem domain. Let us
call those relevant symbols.

So for instance if we're solving some problem in linguistics, but stuff like
"malloc(sizeof *node * n)" occurs in the solution, then that takes away
from the expressivity. The entity "node" perhaps represents something in the
problem domain, and perhaps n also (being the number of such things), but
malloc, sizeof, and the * dereferencing operator are three irrelevant symbols,
and so is the * multiplicative operator, because it's involved in a memory
management calculation which is irrelevant to the problem domain.

To obtain a measure of expressivity, we could count the total number of symbols
in the code, excluding any punctuation. That yields a denominator by which we
can divide the number of just the relevant symbols to obtain a fraction.

Expressivity is, intuitively, a very real, concrete concept which we can
readily recognize, and that tells us we can put a number on it, just like we
can put a number on how funny a joke is, or how moving is a symphony.

Rainer Weikusat · Sep 28, 2012

Kaz Kylheku said:
Expressivity is the extent to which you can express the solution to a problem
in terms of only the symbols and language of the problem domain. Let us
call those relevant symbols.

So for instance if we're solving some problem in linguistics, but stuff like
"malloc(sizeof *node * n)" occurs in the solution, then that takes away
from the expressivity. The entity "node" perhaps represents something in the
problem domain, and perhaps n also (being the number of such things), but
malloc, sizeof, and the * dereferencing operator are three irrelevant symbols,
and so is the * multiplicative operator, because it's involved in a memory
management calculation which is irrelevant to the problem domain.

It is irrelevant 'to the problem domain' because 'the problem' is
different from the 'the solution'. Because of this, a problem can be
stated abstractly while any technical solution to the problem will
need to make use of existing tools suitable for solving problems. Eg,
let's assume the problem is "I want a coffee!". Neither the ground
coffee beans nor the filter nor the water kettle, the water itself,
the desk all of this is placed on, the material which was used to
construct this desk nor even the coffeepot I employ in order to drink
the result has any 'problem domain' relation to the problem
itself. They are just necessary (or helpful) parts of a technical
solution to it.

Rainer Weikusat · Sep 28, 2012

[...]

let's assume the problem is "I want a coffee!". Neither the ground
coffee beans nor

I was actually being to conservative: As demonstrated by
' !Iaaceeffnotw', stating the problem already requires a lot of things
with absolutely no inherent relation to it. Not even the morphemes or
the sounds the spoken sentence would be composed have any: All just
perfectly abitrary conventions forced onto the poor users of language
because of 'technicalities' of the solution domain.

Kaz Kylheku · Sep 28, 2012

It is irrelevant 'to the problem domain' because 'the problem' is
different from the 'the solution'. Because of this, a problem can be
stated abstractly while any technical solution to the problem will
need to make use of existing tools suitable for solving problems. Eg,
let's assume the problem is "I want a coffee!". Neither the ground
coffee beans nor the filter nor the water kettle, the water itself,
the desk all of this is placed on, the material which was used to
construct this desk nor even the coffeepot I employ in order to drink
the result has any 'problem domain' relation to the problem
itself. They are just necessary (or helpful) parts of a technical
solution to it.

Those things may be necessary and helpful, but they are not expressive.
(Well, they are "espressive", haha.)

The details of the process for brewing coffee is irrelevant to the problem
domain of "I would like a coffee". The relevant concepts are: are what kind of
coffee I want: latte, cappucino, black. How dark a roast, how much sugar, if
any and so on.

The barista at the coffee shop speaks more or less the relevant language of
this problem domain, as does any well-designed coffee dispensing automaton.
They hide the irrelevant details from the consumer, giving us an abstract
cofee-making interface whereby we can "program" the coffee that we want
in "coffee shop lingo".

In computing, we can similarly hide the necessary and helpful, but irrelevant
concepts. Those are not always bits, bytes, pointers and memory management.

For instance "Low pass filter these samples with a cutoff frequency of
4000 Hz, rolling off at 18 db per octave" is more expressive than constructing
a matrix full of coefficients and then explicitly munging the data
trhough it.

Kaz Kylheku · Sep 28, 2012

Thanks for the nice explanation. I haven't heard about the expressivity
before, my spelling checker is still flagging it out. What does
expressivity show me and what can I use it for? May I conclude anything
about the program unit being studied, based on its expressivity?

Yes. For example, if you have sufficient expressivity, there isn't much
difference between the specification of a problem and the solution. And, of
course, correctness means that the solution implements the specification of the
problem, so if they are not so different, correctness is easier to verify.

So certain conclusions are easier to obtain from expressive code. But not
necessarily others. If you require conclusions about the details which
are hidden, that may be difficult to impossible.

For instance, suppose you want reach a conclusion about how much memory the
solution requires, but there is no explicit memory management. Then you have to
infer it somehow by knowing how it works "under the hood", or rely on
profiling tools.

Kaz Kylheku · Sep 28, 2012

[...]

let's assume the problem is "I want a coffee!". Neither the ground
coffee beans nor

Click to expand...

I was actually being to conservative: As demonstrated by
' !Iaaceeffnotw', stating the problem already requires a lot of things
with absolutely no inherent relation to it.

Well, you need a language. You need symbols, and the representation of
symbols and how they map to concepts is arbitrary.

But a solution using one symbols is isomorphic to another one using
different symbols for the same thing. (It can be symbol-for-symbol isomorphic.)

If you don't understand the conventions of the language, then an utterance may
not look like gibberish to you, but that's not the same thing as lacking
inherent expressivity.

Expressivity *demands* reliance on material that has been internalized between
the originator of the message and the recipient: some common language,
common understanding of domain abstractions and so on.

Expressivity means putting less information in, and relying on "exformation".
(I didn't make up that word and I'm using it consistently with:
http://en.wikipedia.org/wiki/Exformation)

Not even the morphemes or
the sounds the spoken sentence would be composed have any: All just

For the purpose of expressivity, we don't look at morphemes unless they act as
independent symbols. Morphemes which are just fragments of the representation
of a symbol are uninteresting, because symbols are atoms, and could be
replaced by other atoms in a way that preserves meaning, if the replacement
is consistent.

The pattern of morphemes is valuable only to the extent that when it occurs
twice in the utterance (and in the same context), it refers to the same thing.

Rainer Weikusat · Sep 30, 2012

Mladen Gogala said:
Rainer, I agree with you about the DB usage, I disagree about the
ORM,

I didn't write anything about my opinion of object-relational
mapping developer support software, I just stated the fact that
'database interaction' is a relatively tiny and ancilliary part of
almost all programs which include database interactions I've either
written or encountered so far. The same happens to be true for all
other kinds of problem-independent infrastructure code: It's always
there and it is always just an insignificant amount compared with the
problem specific code and further, it is not only the smaller part but
also the part which is a lot easier to implement and get
right (I'm actually fairly convinced that such an ungodly amount of
'generalized infrastructure code' exists on the internet because
this is code dealing with inherently simple problems. Simple enough,
to be precise, that horrendously complicated ways of dealing with them
don't just collapse under their own weight, as they would if the
problem was complicated, meaning, getting half of the corner cases
which shouldn't have been lumped together to begin with more or less
subtly wrong would render the thing itself completely unusuable).

In Perl, I've so far been able to get by with using a slowly expanding
subset of the DBI functionality and the reason why I employ relational
database management systems is because the nature of a given problem
is such that doing so provides an advantage. Consequently, I don't
quite understand why I would want to use some middlewhere whose sole
purpose is to hide the RDBMS behind a simpler (and thus, less
flexible) abstraction: If 'simpler but less flexible' seems a viable
option, I can simply not use a database.

In any case, I'm convinced that learning how to employ a database
engine is well worth the effort. Eg, something Hibernate is very fond
of are n:m mapping tables linking different kinds of 'objects'
together: Since I'm not fond of dealing with this explicitly at all,
I'm always accessing the data with the help of suitably defined views
so that the database pulls the different parts together for me. I
would go so far as to say that using joins in application code is
usually a mistake: The database should be worrying about that, not
the guy who tries to make sense of the code.

Extending Template Classes - Best Practices	2	Nov 2, 2011
XML Database Best Practices	9	May 30, 2007
C++ VS C# for specific app	1	Jul 5, 2019
Internationalization: properties best practices	0	May 9, 2007
best practices	9	Apr 10, 2006
Best Practices - User Data & Sessions?	3	Sep 13, 2007
Best practices for forward declaring a template function	4	Nov 29, 2009
Adding successive input fields - best practices?	3	Mar 28, 2008

Database usage best practices

Rainer Weikusat

Charlton Wilbur

Rainer Weikusat

Rainer Weikusat

Rainer Weikusat

Kaz Kylheku

Rainer Weikusat

Rainer Weikusat

Kaz Kylheku

Kaz Kylheku

Kaz Kylheku

Rainer Weikusat

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads