Database usage best practices

Discussion in 'Perl Misc' started by Rainer Weikusat, Sep 26, 2012.

  1. I just came accross the following statement in the DBI::Class
    documentation,

    ,----
    | Class::DBI provides a convenient abstraction layer to a database.
    |
    | It not only provides a simple database to object mapping layer, but
    | can be used to implement several higher order database functions
    | (triggers, referential integrity, cascading delete etc.), at the
    | application level, rather than at the database.
    |
    | This is particularly useful when using a database which doesn't
    | support these (such as MySQL), or when you would like your code to be
    | portable across multiple databases which might implement these things
    | in different ways.
    |
    | In short, Class::DBI aims to make it simple to introduce 'best
    | practice' when dealing with data stored in a relational database.
    `----

    I strongly disagree with the opinion that it would be 'best practice'
    to treat a RDBMS as 'dumb' system for storing structured data in
    binary files and reimplement all the features it already has in
    application code on top of it. This rather strikes me as 'back to the
    1960!' idea which likely comes from the fact that RDBMSes, originally
    supposed to enable people to perform operations on datasets without
    having to learn programming in some imperative language first, were so
    successful that people who already know how to program in an imperative
    language are more or less forced to use them but - of course - they
    don't have the slightest interest in actually learning how to so do
    efficiently, especially if this means 'learning a second' (or even
    more than 'the second') other programming language.

    Ironically, this step backwards in the last century markets itself as
    modern ...
     
    Rainer Weikusat, Sep 26, 2012
    #1
    1. Advertising

  2. >>>>> "RW" == Rainer Weikusat <> writes:

    RW> I strongly disagree with the opinion that it would be 'best
    RW> practice' to treat a RDBMS as 'dumb' system for storing
    RW> structured data in binary files and reimplement all the features
    RW> it already has in application code on top of it.

    Point the 1st: it depends very much on the DB. If you're using MySQL,
    the only sane path is to treat it as a dumb system and reimplement the
    features you need on top of it.

    Point the 2nd: you're treating this as a black and white issue. It's
    not whether an ORM layer is saintly or wicked, it's what costs you pay
    in terms of performance and maintainability versus what benefits you get
    in terms of expressivity, error-resistance, and risk of change when you
    include an ORM layer.

    In my experience, if you have an application large enough to consider in
    terms of MVC that uses a database as a data store in any but the most
    trivial of manners, you are a moron if you *don't* use an ORM layer. In
    particular, this architecture allows new features in the application to
    be developed entirely in software, and then once the desired behavior is
    understood and stable, the logic can be pushed down into the database.
    The risk and cost of change at levels higher than the ORM layer is much
    lower than the risk and cost of change at levels below the ORM layer,
    and this architecture offers a way to mitigate that risk.

    RW> This rather strikes me as 'back to the 1960!' idea which likely
    RW> comes from the fact that RDBMSes, originally supposed to enable
    RW> people to perform operations on datasets without having to learn
    RW> programming in some imperative language first, were so
    RW> successful that people who already know how to program in an
    RW> imperative language are more or less forced to use them but - of
    RW> course - they don't have the slightest interest in actually
    RW> learning how to so do efficiently, especially if this means
    RW> 'learning a second' (or even more than 'the second') other
    RW> programming language.

    I'm sure waving your dick around like this is very pleasant, but please
    tuck it back in your pants; people are pointing and laughing.

    Charlton


    --
    Charlton Wilbur
     
    Charlton Wilbur, Sep 27, 2012
    #2
    1. Advertising

  3. Charlton Wilbur <> writes:

    [...]

    > I'm sure waving your dick around like this is very pleasant, but please
    > tuck it back in your pants; people are pointing and laughing.


    I would like to add an additional metapoint: The opinions of worthless
    people are worthless.
     
    Rainer Weikusat, Sep 28, 2012
    #3
  4. Mladen Gogala <> writes:
    > On Thu, 27 Sep 2012 14:45:52 -0400, Charlton Wilbur wrote:
    >> Point the 2nd: you're treating this as a black and white issue. It's
    >> not whether an ORM layer is saintly or wicked, it's what costs you pay
    >> in terms of performance and maintainability versus what benefits you get
    >> in terms of expressivity, error-resistance, and risk of change when you
    >> include an ORM layer.

    >
    > How do you measure expressivity and error resistance? How would you even
    > define expressivity? I am not sure what is the connection between risk
    > and ORM, so I will not comment on this one.


    I don't know what 'expressivity' is supposed to be but I have some
    ideas of 'error resistance' for program interacting with databases:
    Usually, this means catching database errors and deciding on a
    sensible recovery strategy. This is especially important for programs
    which are supposed to run continously without supervision. In Perl,
    this relatively easy: DBI supports defining a database error
    handler. This handler can throw an exception (meaning, an exception
    object containing information about the error, not something like
    "Can't database!"). This exception can be caught at a suitable place
    in the application code. Provided the problem seems transient, the
    usually sensible way to deal with that it to disconnect from the
    database, reconnect and re-execute the query at the front of the query
    queue. This is not anyhow 'magically' tied to 'ORM layers' and not
    really difficult to implement.

    In simpler cases, it is sufficient to fork the 'worker process' (this
    job can also be performed by a special supervisor program) and do the
    actual work in the child. Should the child hit a transient problem, it
    exits with a non-zero exit code and after seeing that, the parent
    simply forks again and the procedure starts over (I know this is
    difficult to imagine but 'fork' is actually a useful primitive, not
    just an encumbrance on the way to 'execute another program' ...).

    [...]

    >> In my experience, if you have an application large enough to consider in
    >> terms of MVC that uses a database as a data store in any but the most
    >> trivial of manners, you are a moron if you *don't* use an ORM layer. In

    >
    > Oh yes, ORM layer is a good thing, it enables to program very quickly and
    > generates classes for you. ORM is usually accompanied by an application
    > generator, like Groovy on Grails (Hibernate) which means that the
    > projects will be done much more quickly.


    This is simply not true: Over the lifetime of a program, the amortized
    impplementation cost of every infrastructure facility contained in it
    is zero provided that it actually works and works as it would need to
    work. As a practical example, the largest Perl program I'm currently
    working with is just about to break through the 10,000 LOC barrier
    (I've done a couple of large ones in the past). This includes a
    database interface layer designed to be suitable for this particular
    program and writing that has maybe costed me something like a week
    back in 2010. Presently, it amounts to 6% of the code (678 lines) and
    something like this is by far too trivial to warrant downloading
    seriously large amounts of unknown code 'from the internet'. In
    exchange for this minor effort (and some assorted fixes/ enhancements
    to the DBD::pg module), this has been working 24x7 in dozens of
    installations world-wide (which is not much in absolute terms but a
    fair lot for a single person to support) without ever causing the
    slightest headache to me or the people who use this code. In contrast
    to that, 'the Java team' (larger than 1 person) uses Hibernate and
    they've certainly meanwhile spend as much time as I spent writing this
    code with debugging, investigating and working-around bugs/ quirks in
    the ORM. These chores are still ongoing and in exchange for that, they
    got a technically inferior solution both from an 'error resilience'
    and a user experience standpoint (of course, it can be argued that the
    solution to any problem really ought to be 'download more stuff from
    the internet' :).
     
    Rainer Weikusat, Sep 28, 2012
    #4
  5. Rainer Weikusat <> writes:

    [Hibernate]

    > in exchange for that, they got a technically inferior solution both
    > from an 'error resilience' and a user experience standpoint (of
    > course, it can be argued that the solution to any problem really
    > ought to be 'download more stuff from the internet' :).


    Some details on this:

    By default, the ORM which happens to be used here does not support
    reconnecting to the database after an existing connection was severed
    for whatever reason. Should this happen, the UI code becomes dead in
    water until the corresponding process is restarted. Meanwhile, this
    default policy has been changed and the program survives TCP-related
    malaises. But this didn't happen until it caused problems for
    customers and understandingly so: The theory behind using third-party
    developed infrastructure code is that the people who did that are 'the
    experts' in this domain and that a poor application programmer should
    be happy that he can concentrate on his problems instead of having to
    learn how to tame 'wild beasts' like a RDBMS. Because of this, people
    who use this code naturally tend to leaving 'the defaults' alone.

    Also by default (or at least when being used in the 'obvious,
    straight-forward way') the ORM performs database operations
    synchronously and 'the user session' becomes unresponsive until they
    are completed. In contrast to this, the Perl program I mentioned in
    the previous posting has been designed as 'traditional'
    single-threaded, event-driven UNIX(*) process whose 'processing core'
    is an I/O multiplexing loop (since it has lots of other stuff to do
    than interact with the database). Because of this, it uses the
    PostgreSQL asynchronous query processing interface. At the application
    level, this means that a query is 'started' based on a query object
    and some parameters for this particular invocation and once it has
    been completed, a suitable 'continuation' will be invoked (while I
    still think the way 'continuations' are supposed to work in scheme is
    completely mad, the concept/ term itself is IMHO very useful). Since
    Perl supports closures, this is an easy thing to do. The same can very
    likely be accomplished with 'the ORM' as well, at the very least by
    employing threading in a suitable way, but it wasn't done: The
    difference is really that I started with designing how 'the program',
    considering its purpose, was generally supposed to operate and then
    started looking for already existing code which could be helpful for
    accomplishing this instead of starting with "download what everybody
    else uses" and then code according the properties it happens to have.

    The morale of this is that 'complex abstraction layers' are only
    really useful when the people using them already have the knowledge
    and skills they are not supposed to need because of the abstraction
    layer.

    A note the 'dicksizing' auto-responder: I understand that you are
    usually motivated by showing off, however, this is not necessarily
    true for everyone else. I try to avoid using my own experiences as
    an example as hard as possible exactly to avoid soliciting these kind
    of useless retorts. But this isn't always possible and I'm not really
    convinced that it is sensible.
     
    Rainer Weikusat, Sep 28, 2012
    #5
  6. Rainer Weikusat

    Kaz Kylheku Guest

    On 2012-09-27, Mladen Gogala <> wrote:
    > How do you measure expressivity and error resistance? How would you even
    > define expressivity?


    Expressivity is the extent to which you can express the solution to a problem
    in terms of only the symbols and language of the problem domain. Let us
    call those relevant symbols.

    So for instance if we're solving some problem in linguistics, but stuff like
    "malloc(sizeof *node * n)" occurs in the solution, then that takes away
    from the expressivity. The entity "node" perhaps represents something in the
    problem domain, and perhaps n also (being the number of such things), but
    malloc, sizeof, and the * dereferencing operator are three irrelevant symbols,
    and so is the * multiplicative operator, because it's involved in a memory
    management calculation which is irrelevant to the problem domain.

    To obtain a measure of expressivity, we could count the total number of symbols
    in the code, excluding any punctuation. That yields a denominator by which we
    can divide the number of just the relevant symbols to obtain a fraction.

    Expressivity is, intuitively, a very real, concrete concept which we can
    readily recognize, and that tells us we can put a number on it, just like we
    can put a number on how funny a joke is, or how moving is a symphony.
     
    Kaz Kylheku, Sep 28, 2012
    #6
  7. Kaz Kylheku <> writes:
    > On 2012-09-27, Mladen Gogala <> wrote:
    >> How do you measure expressivity and error resistance? How would you even
    >> define expressivity?

    >
    > Expressivity is the extent to which you can express the solution to a problem
    > in terms of only the symbols and language of the problem domain. Let us
    > call those relevant symbols.
    >
    > So for instance if we're solving some problem in linguistics, but stuff like
    > "malloc(sizeof *node * n)" occurs in the solution, then that takes away
    > from the expressivity. The entity "node" perhaps represents something in the
    > problem domain, and perhaps n also (being the number of such things), but
    > malloc, sizeof, and the * dereferencing operator are three irrelevant symbols,
    > and so is the * multiplicative operator, because it's involved in a memory
    > management calculation which is irrelevant to the problem domain.


    It is irrelevant 'to the problem domain' because 'the problem' is
    different from the 'the solution'. Because of this, a problem can be
    stated abstractly while any technical solution to the problem will
    need to make use of existing tools suitable for solving problems. Eg,
    let's assume the problem is "I want a coffee!". Neither the ground
    coffee beans nor the filter nor the water kettle, the water itself,
    the desk all of this is placed on, the material which was used to
    construct this desk nor even the coffeepot I employ in order to drink
    the result has any 'problem domain' relation to the problem
    itself. They are just necessary (or helpful) parts of a technical
    solution to it.
     
    Rainer Weikusat, Sep 28, 2012
    #7
  8. Rainer Weikusat <> writes:

    [...]

    > let's assume the problem is "I want a coffee!". Neither the ground
    > coffee beans nor


    I was actually being to conservative: As demonstrated by
    ' !Iaaceeffnotw', stating the problem already requires a lot of things
    with absolutely no inherent relation to it. Not even the morphemes or
    the sounds the spoken sentence would be composed have any: All just
    perfectly abitrary conventions forced onto the poor users of language
    because of 'technicalities' of the solution domain.
     
    Rainer Weikusat, Sep 28, 2012
    #8
  9. Rainer Weikusat

    Kaz Kylheku Guest

    On 2012-09-28, Rainer Weikusat <> wrote:
    > Kaz Kylheku <> writes:
    >> On 2012-09-27, Mladen Gogala <> wrote:
    >>> How do you measure expressivity and error resistance? How would you even
    >>> define expressivity?

    >>
    >> Expressivity is the extent to which you can express the solution to a problem
    >> in terms of only the symbols and language of the problem domain. Let us
    >> call those relevant symbols.
    >>
    >> So for instance if we're solving some problem in linguistics, but stuff like
    >> "malloc(sizeof *node * n)" occurs in the solution, then that takes away
    >> from the expressivity. The entity "node" perhaps represents something in the
    >> problem domain, and perhaps n also (being the number of such things), but
    >> malloc, sizeof, and the * dereferencing operator are three irrelevant symbols,
    >> and so is the * multiplicative operator, because it's involved in a memory
    >> management calculation which is irrelevant to the problem domain.

    >
    > It is irrelevant 'to the problem domain' because 'the problem' is
    > different from the 'the solution'. Because of this, a problem can be
    > stated abstractly while any technical solution to the problem will
    > need to make use of existing tools suitable for solving problems. Eg,
    > let's assume the problem is "I want a coffee!". Neither the ground
    > coffee beans nor the filter nor the water kettle, the water itself,
    > the desk all of this is placed on, the material which was used to
    > construct this desk nor even the coffeepot I employ in order to drink
    > the result has any 'problem domain' relation to the problem
    > itself. They are just necessary (or helpful) parts of a technical
    > solution to it.


    Those things may be necessary and helpful, but they are not expressive.
    (Well, they are "espressive", haha.)

    The details of the process for brewing coffee is irrelevant to the problem
    domain of "I would like a coffee". The relevant concepts are: are what kind of
    coffee I want: latte, cappucino, black. How dark a roast, how much sugar, if
    any and so on.

    The barista at the coffee shop speaks more or less the relevant language of
    this problem domain, as does any well-designed coffee dispensing automaton.
    They hide the irrelevant details from the consumer, giving us an abstract
    cofee-making interface whereby we can "program" the coffee that we want
    in "coffee shop lingo".

    In computing, we can similarly hide the necessary and helpful, but irrelevant
    concepts. Those are not always bits, bytes, pointers and memory management.

    For instance "Low pass filter these samples with a cutoff frequency of
    4000 Hz, rolling off at 18 db per octave" is more expressive than constructing
    a matrix full of coefficients and then explicitly munging the data
    trhough it.
     
    Kaz Kylheku, Sep 28, 2012
    #9
  10. Rainer Weikusat

    Kaz Kylheku Guest

    On 2012-09-28, Mladen Gogala <> wrote:
    > On Fri, 28 Sep 2012 18:12:30 +0000, Kaz Kylheku wrote:
    >
    >> Expressivity is the extent to which you can express the solution to a
    >> problem in terms of only the symbols and language of the problem domain.
    >> Let us call those relevant symbols.

    >
    > Thanks for the nice explanation. I haven't heard about the expressivity
    > before, my spelling checker is still flagging it out. What does
    > expressivity show me and what can I use it for? May I conclude anything
    > about the program unit being studied, based on its expressivity?


    Yes. For example, if you have sufficient expressivity, there isn't much
    difference between the specification of a problem and the solution. And, of
    course, correctness means that the solution implements the specification of the
    problem, so if they are not so different, correctness is easier to verify.

    So certain conclusions are easier to obtain from expressive code. But not
    necessarily others. If you require conclusions about the details which
    are hidden, that may be difficult to impossible.

    For instance, suppose you want reach a conclusion about how much memory the
    solution requires, but there is no explicit memory management. Then you have to
    infer it somehow by knowing how it works "under the hood", or rely on
    profiling tools.
     
    Kaz Kylheku, Sep 28, 2012
    #10
  11. Rainer Weikusat

    Kaz Kylheku Guest

    On 2012-09-28, Rainer Weikusat <> wrote:
    > Rainer Weikusat <> writes:
    >
    > [...]
    >
    >> let's assume the problem is "I want a coffee!". Neither the ground
    >> coffee beans nor

    >
    > I was actually being to conservative: As demonstrated by
    > ' !Iaaceeffnotw', stating the problem already requires a lot of things
    > with absolutely no inherent relation to it.


    Well, you need a language. You need symbols, and the representation of
    symbols and how they map to concepts is arbitrary.

    But a solution using one symbols is isomorphic to another one using
    different symbols for the same thing. (It can be symbol-for-symbol isomorphic.)

    If you don't understand the conventions of the language, then an utterance may
    not look like gibberish to you, but that's not the same thing as lacking
    inherent expressivity.

    Expressivity *demands* reliance on material that has been internalized between
    the originator of the message and the recipient: some common language,
    common understanding of domain abstractions and so on.

    Expressivity means putting less information in, and relying on "exformation".
    (I didn't make up that word and I'm using it consistently with:
    http://en.wikipedia.org/wiki/Exformation)

    > Not even the morphemes or
    > the sounds the spoken sentence would be composed have any: All just


    For the purpose of expressivity, we don't look at morphemes unless they act as
    independent symbols. Morphemes which are just fragments of the representation
    of a symbol are uninteresting, because symbols are atoms, and could be
    replaced by other atoms in a way that preserves meaning, if the replacement
    is consistent.

    The pattern of morphemes is valuable only to the extent that when it occurs
    twice in the utterance (and in the same context), it refers to the same thing.
     
    Kaz Kylheku, Sep 28, 2012
    #11
  12. Mladen Gogala <> writes:
    > On Fri, 28 Sep 2012 11:50:37 +0100, Rainer Weikusat wrote:
    >
    >> This is simply not true: Over the lifetime of a program, the amortized
    >> impplementation cost of every infrastructure facility contained in it is
    >> zero provided that it actually works and works as it would need to work.

    >
    > Rainer, I agree with you about the DB usage, I disagree about the
    > ORM,


    I didn't write anything about my opinion of object-relational
    mapping developer support software, I just stated the fact that
    'database interaction' is a relatively tiny and ancilliary part of
    almost all programs which include database interactions I've either
    written or encountered so far. The same happens to be true for all
    other kinds of problem-independent infrastructure code: It's always
    there and it is always just an insignificant amount compared with the
    problem specific code and further, it is not only the smaller part but
    also the part which is a lot easier to implement and get
    right (I'm actually fairly convinced that such an ungodly amount of
    'generalized infrastructure code' exists on the internet because
    this is code dealing with inherently simple problems. Simple enough,
    to be precise, that horrendously complicated ways of dealing with them
    don't just collapse under their own weight, as they would if the
    problem was complicated, meaning, getting half of the corner cases
    which shouldn't have been lumped together to begin with more or less
    subtly wrong would render the thing itself completely unusuable).

    In Perl, I've so far been able to get by with using a slowly expanding
    subset of the DBI functionality and the reason why I employ relational
    database management systems is because the nature of a given problem
    is such that doing so provides an advantage. Consequently, I don't
    quite understand why I would want to use some middlewhere whose sole
    purpose is to hide the RDBMS behind a simpler (and thus, less
    flexible) abstraction: If 'simpler but less flexible' seems a viable
    option, I can simply not use a database.

    In any case, I'm convinced that learning how to employ a database
    engine is well worth the effort. Eg, something Hibernate is very fond
    of are n:m mapping tables linking different kinds of 'objects'
    together: Since I'm not fond of dealing with this explicitly at all,
    I'm always accessing the data with the help of suitably defined views
    so that the database pulls the different parts together for me. I
    would go so far as to say that using joins in application code is
    usually a mistake: The database should be worrying about that, not
    the guy who tries to make sense of the code.
     
    Rainer Weikusat, Sep 30, 2012
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. karim
    Replies:
    0
    Views:
    474
    karim
    Jul 13, 2003
  2. John Dalberg
    Replies:
    3
    Views:
    583
    samuelhon
    Nov 16, 2006
  3. Yehia A.Salam

    XML Database Best Practices

    Yehia A.Salam, May 30, 2007, in forum: ASP .Net
    Replies:
    9
    Views:
    354
    sloan
    May 31, 2007
  4. MartyN
    Replies:
    2
    Views:
    152
    Egbert Nierop \(MVP for IIS\)
    May 30, 2007
  5. Chicken McNuggets

    Best book on C gotchas and best practices?

    Chicken McNuggets, Jul 31, 2013, in forum: C Programming
    Replies:
    9
    Views:
    275
    Fred J. Tydeman
    Aug 5, 2013
Loading...

Share This Page