Database development

Tom Anderson · Apr 26, 2010

Thats what additional indexes are for. For example, to support the
following example you'd want indexes on doc.keyword and folder.name,
since they'll probably be used often for data selection and sorting.
We'll assume that the DB designer was sensible and put indexes on both
table's prime keys. Your example requirement could be satisfied with a
single query, which would involve a prime key join and use the additional
indexes to select the rows to be included in the dataset. Something like
this:

SELECT required fields
from folder f, document d
where f.key = d.folderkey
and d.keyword = 'FOO'
and f.name = 'Reports';

Any decent RDBMS should be able to optimise that type of query and would
return just the required result set.

I suspect documents may have more than one keyword, in which case your
query might look like:

SELECT required fields
from folder f, document d
where f.key = d.folderkey
and d.keyword LIKE '%FOO%' -- if keywords are packed space-separated into one column (bad idea)
and f.name = 'Reports';

Or:

SELECT required fields
from folder f, document d, keyword k
where f.key = d.folderkey and d.key = k.documentkey
and k.word = 'FOO' -- if keyword is a relation (documentkey, word)
and f.name = 'Reports';

Or even:

SELECT required fields
from folder f, document d, document_keywords dk, keyword k
where f.key = d.folderkey and d.key = dk.documentkey and dk.keywordkey = k.key
and k.word = 'FOO' -- if keywords are first-class and there is a join table (fully normalised but overcomplicated)
and f.name = 'Reports';

Why would you need to do that? A correctly written query will only
return the data you need, present it in the order you want and not slow
database updates down by requiring additional reference tables which
must be maintained.

I think the OP is saying he would have to navigate in that way if he was
using ORM rather than SQL. That still isn't true, though, because in JPA
you have JP-QL, which is more or less isomorphic to SQL, and in fact
simpler, because the mapping of relationships to properties makes joins
easier to express:

select doc from Document doc
where
'FOO' in doc.keywords
and doc.folder.name = 'Reports'

(i think)

tom

Lew · Apr 26, 2010

Tom said:
I think the OP is saying he would have to navigate in that way if he was
using ORM rather than SQL. That still isn't true, though, because in JPA
you have JP-QL, which is more or less isomorphic to SQL, and in fact
simpler, because the mapping of relationships to properties makes joins
easier to express:

select doc from Document doc
where
'FOO' in doc.keywords
and doc.folder.name = 'Reports'

(i [sic] think)

What you wrote looks correct, but I am fairly certain it requires
'keywords' to be expressed as a collection (probably a 'Set'), which
is best done if the keywords are in their own table, which they should
be anyway. I don't think it works with the space-separated list of
keywords as in your first of three examples.

Your second example, as you hint, is probably optimal and the third
overkill.

Robert Klemme · Apr 26, 2010

I don't think the big benefits of ORM (Hibernate or JPA or one of the
alternatives) are in the writing of the code. It still requires
somebody that knows both the ORM framework and the database well
to write really efficient code.

The big benefits are for reading the code. Everyone can read the
the code using ORM and immediately understand what it does without
looking at tons of code that uses JDBC and SQL. It is maintenance
friendly.

So you are saying that I need skilled people to write the initial code
and can give maintenance to less skilled people because the ORM using
code is easy to read? I am not sure that is a good strategy. Over time
software tends to decay because more and more bug fixes are applied and
features added. If only the people knew internals of ORM that wrote the
initial code I see a good chance that maintainers wreck havoc on the
performance and potentially the whole application if they change /
extend the easy readable code without knowing the tool they are using.
Even a change as seemingly simple as that of a field type from "int" to
"String" might have dramatic consequences. And just think of the woes
of schema migration: if you have an installed base you urgently need
someone who understands the DB underneath and the ORM tool to come up
with a feasible migration strategy that.

Btw, did I mention that I believe database independence is a myth?

Kind regards

robert

Tom Anderson · Apr 26, 2010

Tom said:
Tom said:

I think the OP is saying he would have to navigate in that way if he
was using ORM rather than SQL. That still isn't true, though, because
in JPA you have JP-QL, which is more or less isomorphic to SQL, and in
fact simpler, because the mapping of relationships to properties makes
joins easier to express:

select doc from Document doc
where
'FOO' in doc.keywords
and doc.folder.name = 'Reports'

(i [sic] think)

Click to expand...

What you wrote looks correct, but I am fairly certain it requires
'keywords' to be expressed as a collection (probably a 'Set'), which is
best done if the keywords are in their own table, which they should be
anyway. I don't think it works with the space-separated list of
keywords as in your first of three examples.

I believe you're absolutely right - sorry, i should have been clearer on
what context that query required.

Your second example, as you hint, is probably optimal and the third
overkill.

Probably. The advantage of the fully normal design is that queries like:

"find me all the documents with keywords with 'brew' in them"

are fast, because doing that in any schema involves a LIKE over the
keyword text, which more or less means a table scan, and the keyword table
in that design is smaller than in the second one (proportional to the
number of distinct keywords, not the sum of the number of keywords over
all documents), and much smaller than in the first one, where it's the
whole table.

But then, if you have full-text indexing, you can search the packed string
without doing a table scan. Although that's not necessarily fast enough:

http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html

But then, you can full-text index the more normal schemas too:

http://www.opensymphony.com/compass/content/about.html

Er, so, yeah, anyway, glad to have entirely eliminated the OP's confusion,
i'm sure.

tom

Robert Klemme · Apr 26, 2010

(e-mail address removed) says...

ORM doesn't force business rules into a separate layer and raw JDBC
calls don't force them into the same layer as persistence.

Exactly. I believe there is a fundamental dilemma that I haven't seen a
satisfying solution to: with a relational database and an object
oriented (or not) application which implements the business logic you
automatically have a distribution of business rules between several tiers.

If you manage to place all business consistency rules into the database
(which is often impossible because either of limitations of the DB or
complexity of the model) you leave very little for the application layer
(mostly presentation) so you might wonder why not directly implement all
the business logic in PL/SQL or T-SQL (just to name two well known
brands). Advantage is that you cannot break the model if you need to do
changes in the DB (this can happen for migration, repair or other one
off tasks).

If you place all the rules in the application consequently you would
have to even get rid of foreign keys. Downside is of course that you
now have zero consistency enforcement for the data model in the DB (e.g.
during all those tasks mentioned above) and you are only using 10% of a
potentially expensive installation (in the case of a commercial RDBMS).

In reality I have often seen a mix between the two approaches: some
consistency checking (FK, PK, CHECK constraints, triggers) is done in
the database and the "rest" of the business logic lives in the
application tier. This may actually be the worst approach: it's not
only that you don't have a single place where the business model is
consistently defined and enforced - that is merely a violation of some
form of purity rule (which _does_ have its advantages). But this might
also make people feel safe when they change the database while breaking
business rules that live elsewhere...

We probably should get rid of persistence altogether - maybe Alzheimer
DB or WORN is the future.

Pitch, you can have multi tier with JDBC and ORM - you can even have
multi tier without persistence altogether.

Kind regards

robert

Martin Gregorie · Apr 26, 2010

I suspect documents may have more than one keyword, in which case your
query might look like:

I suspect you're right, unless the keywords are pulled out to form a
unique list in its own table with a M:M relationship to Document, which
be useful in some circumstances. However, I didn't persue that since what
I really wanted to illustrate was the benefits of indexing heavily
searched non-key columns in the right circumstances and of generating
result sets that don't contain irrelevant data. Both seemed to be points
that hadn't occurred to the OP judging by his comment about searching out
required rows from the result set.

Zlatko ÄuriÄ‡ · Apr 26, 2010

I suspect you're right, unless the keywords are pulled out to form a
unique list in its own table with a M:M relationship to Document, which
be useful in some circumstances. However, I didn't persue that since what
I really wanted to illustrate was the benefits of indexing heavily
searched non-key columns in the right circumstances and of generating
result sets that don't contain irrelevant data. Both seemed to be points
that hadn't occurred to the OP judging by his comment about searching out
required rows from the result set.

Hmm. My docs and folders are all "nodes". Stored in the node table.
Nodes have metadata - such as authors, keywords, document id's,
departments, this and that. Those are stored in the metadata table.

Now, to get all the info about one doc, you have to get it's node ID,
and the go to the metadata table, and find out the metadata that means
"parent node", get it's id, then get this parents' children (to get
other documents in this set), then get all those documents and search
for some keywords in those documents.

Lew · Apr 26, 2010

Hmm. My docs and folders are all "nodes". Stored in the node table.
Nodes have metadata - such as authors, keywords, document id's [sic],
departments, this and that. Those are stored in the metadata table.

Now, to get all the info about one doc, you have to get it's [sic] node ID,
and the go to the metadata table, and find out the metadata that means
"parent node", get it's [sic] id, then get this parents' children (to get
other documents in this set), then get all those documents and search
for some keywords in those documents.

You don't have to do it that way. You could do it with a correctly set up
query involving JOINs, which is the point Martin and others are making.

Think set intersection instead of procedure.

Zlatko ÄuriÄ‡ · Apr 26, 2010

You don't have to do it that way. You could do it with a correctly set
up query involving JOINs, which is the point Martin and others are making.

I'm not, the application is.

Think set intersection instead of procedure.

I will, thanks.

Arne Vajhøj · Apr 27, 2010

@y6g2000prk.googlegroups.com>, (e-mail address removed) says...

I always believed that ORM systems are forcing you to write your own
business-rules layer apart from the persistence layer.

It is not enforcing anything. Hibernate or any other ORM does not
refuse to persist classes with business logic in.

There is a certain correlation, because when people start to
use ORM then they have also learned about PL-BLL-DAL (let us
ignore the fact that they probably should have learned about
PL-CL-BLL-DAL).

That way database
access is kept simple and easy mantainable.

Also, this multi-tier architecture allows for easier load-balancing,
architecture changes, integration with other systems, development..

layers != tiers

Arne

Arne Vajhøj · Apr 27, 2010

Exactly. I believe there is a fundamental dilemma that I haven't seen a
satisfying solution to: with a relational database and an object
oriented (or not) application which implements the business logic you
automatically have a distribution of business rules between several tiers.

If you manage to place all business consistency rules into the database
(which is often impossible because either of limitations of the DB or
complexity of the model) you leave very little for the application layer
(mostly presentation) so you might wonder why not directly implement all
the business logic in PL/SQL or T-SQL (just to name two well known
brands). Advantage is that you cannot break the model if you need to do
changes in the DB (this can happen for migration, repair or other one
off tasks).

If you place all the rules in the application consequently you would
have to even get rid of foreign keys. Downside is of course that you now
have zero consistency enforcement for the data model in the DB (e.g.
during all those tasks mentioned above) and you are only using 10% of a
potentially expensive installation (in the case of a commercial RDBMS).

In reality I have often seen a mix between the two approaches: some
consistency checking (FK, PK, CHECK constraints, triggers) is done in
the database and the "rest" of the business logic lives in the
application tier. This may actually be the worst approach: it's not only
that you don't have a single place where the business model is
consistently defined and enforced - that is merely a violation of some
form of purity rule (which _does_ have its advantages). But this might
also make people feel safe when they change the database while breaking
business rules that live elsewhere...

That is a classic dilemma.

My preference is: if database need to be accessed by apps in different
technology, then it makes sense to put the business logic in SP's -
otherwise I would keep the business logic in the Java code, because
that makes it a lot cheaper to work with a different database - I would
keep basic integrity check in the database though.

Arne

Arne Vajhøj · Apr 27, 2010

To use JDBC and SQL, I need to handle trial things, like commit,
rollback etc. How about hibernate/spring?

You still need to handle your own transactions. That is a very
common feature of persistence frameworks - they don't try to
guess how you want your transactions.

Arne

Arne Vajhøj · Apr 27, 2010

So you are saying that I need skilled people to write the initial code
and can give maintenance to less skilled people because the ORM using
code is easy to read? I am not sure that is a good strategy. Over time
software tends to decay because more and more bug fixes are applied and
features added. If only the people knew internals of ORM that wrote the
initial code I see a good chance that maintainers wreck havoc on the
performance and potentially the whole application if they change /
extend the easy readable code without knowing the tool they are using.
Even a change as seemingly simple as that of a field type from "int" to
"String" might have dramatic consequences. And just think of the woes of
schema migration: if you have an installed base you urgently need
someone who understands the DB underneath and the ORM tool to come up
with a feasible migration strategy that.

Not quite. I am saying that you may want people that have a clue
about persistence to write and modify the persistence code, but
that developers that does not know about the used ORM framework
or the database will be able to easily read and understand the code
(while working on something else - like the business logic).

Btw, did I mention that I believe database independence is a myth?

It is a myth that is seen working everyday in the Java world, that
many Java apps using a good ORM (like Hibernate or one
of the JPA implementation) use the same Java code with different
databases. It is not always that easy. But for all the simple
stuff it works well.

Arne

Pitch · Apr 27, 2010

My preference is: if database need to be accessed by apps in different
technology,

If you use ORM you don't want this, so it's not relevant. Right?

Graham Cox · Apr 27, 2010

Lew said:
What you wrote looks correct, but I am fairly certain it requires
'keywords' to be expressed as a collection (probably a 'Set'), which
is best done if the keywords are in their own table, which they should
be anyway. I don't think it works with the space-separated list of
keywords as in your first of three examples.

Arguably the keywords should actually be in a full text index in the
RDBMS, and let the proper full text indexing support handle it
instead. That will be more efficient than scanning a table of keywords
joined back to the table of documents, and - potentially - gives support
for near matches as well...

On the other hand, I've no idea if JPA supports full text indexes and if
it does how you'd go about using them...

Pitch · Apr 27, 2010

Pitch, you can have multi tier with JDBC and ORM - you can even have
multi tier without persistence altogether.

Yes, but I sad _this_ multi-tier architecture. If you use only JDBC you
have one tier less.

Pitch · Apr 27, 2010

(e-mail address removed) says...

ORM doesn't force business rules into a separate layer and raw JDBC calls
don't force them into the same layer as persistence.

I disagree.

Lew · Apr 27, 2010

Pitch said:
I disagree.

The evidence speaks for itself.

Robert Klemme · Apr 27, 2010

Not quite. I am saying that you may want people that have a clue
about persistence to write and modify the persistence code, but
that developers that does not know about the used ORM framework
or the database will be able to easily read and understand the code
(while working on something else - like the business logic).

Thank you for the clarification!

It is a myth that is seen working everyday in the Java world, that
many Java apps using a good ORM (like Hibernate or one
of the JPA implementation) use the same Java code with different
databases. It is not always that easy. But for all the simple
stuff it works well.

Hm... The question is: how simple is "simple" and where does
"complicated" begin? Just an example, which I would rather place in
the "simple" bucket: assume you want to query for items that have a
field set to null. Considering that Oracle does not index NULL values
and another RDBMS that you want to use does so, you'll likely end up
with a clutch: either you use a different value for NULL which will
allow for uniform ORM code at the cost of a bad design; or you need to
make the ORM tool create different queries (and I do not mean the
difference between "VARCHAR2" and "VARCHAR") for Oracle and the other
RDBMS. You could, of course, also live with the FTS in Oracle but I
doubt you'll have much fun doing this on any reasonably large
database.

In any case I believe the complicated stuff is where the fun begins -
so for _me_ DB independence is definitively a myth. ;-)

Kind regards

robert

Robert Klemme · Apr 27, 2010

If you use ORM you don't want this, so it's not relevant. Right?

Maybe you do not want it - but you still may need to be able to do it
(DB repair, large volume changes etc.). Unfortunately reality is
rarely in line with what we like or want - and we still have to deal
with it.

Kind regards

robert

Spring/hibernate and JDBC	35	Jul 9, 2011
Spring Boot Request Mapping: How to Handle Multiple Request Paths in a Controller	1	Oct 12, 2023
Web development on 2022	0	Jul 17, 2022
3 Immediate JAVA Developer Roles !!!	1	May 17, 2012
Website with Database. use C#	1	Mar 25, 2023
IntelliJ Plugin Wizard Development	1	May 30, 2023
Any discuss group about Hibernate, Spring?	4	Mar 12, 2008
Webinar Invitation - Cross Platform Development with Spring	0	Sep 2, 2013

Database development

Tom Anderson

Lew

Robert Klemme

Tom Anderson

Robert Klemme

Martin Gregorie

Zlatko ÄuriÄ‡

Lew

Zlatko ÄuriÄ‡

Arne Vajhøj

Arne Vajhøj

Arne Vajhøj

Arne Vajhøj

Pitch

Graham Cox

Pitch

Pitch

Lew

Robert Klemme

Robert Klemme

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads