Database development

T

Tom Anderson

Thats what additional indexes are for. For example, to support the
following example you'd want indexes on doc.keyword and folder.name,
since they'll probably be used often for data selection and sorting.
We'll assume that the DB designer was sensible and put indexes on both
table's prime keys. Your example requirement could be satisfied with a
single query, which would involve a prime key join and use the additional
indexes to select the rows to be included in the dataset. Something like
this:

SELECT required fields
from folder f, document d
where f.key = d.folderkey
and d.keyword = 'FOO'
and f.name = 'Reports';

Any decent RDBMS should be able to optimise that type of query and would
return just the required result set.

I suspect documents may have more than one keyword, in which case your
query might look like:

SELECT required fields
from folder f, document d
where f.key = d.folderkey
and d.keyword LIKE '%FOO%' -- if keywords are packed space-separated into one column (bad idea)
and f.name = 'Reports';

Or:

SELECT required fields
from folder f, document d, keyword k
where f.key = d.folderkey and d.key = k.documentkey
and k.word = 'FOO' -- if keyword is a relation (documentkey, word)
and f.name = 'Reports';

Or even:

SELECT required fields
from folder f, document d, document_keywords dk, keyword k
where f.key = d.folderkey and d.key = dk.documentkey and dk.keywordkey = k.key
and k.word = 'FOO' -- if keywords are first-class and there is a join table (fully normalised but overcomplicated)
and f.name = 'Reports';
Why would you need to do that? A correctly written query will only
return the data you need, present it in the order you want and not slow
database updates down by requiring additional reference tables which
must be maintained.

I think the OP is saying he would have to navigate in that way if he was
using ORM rather than SQL. That still isn't true, though, because in JPA
you have JP-QL, which is more or less isomorphic to SQL, and in fact
simpler, because the mapping of relationships to properties makes joins
easier to express:

select doc from Document doc
where
'FOO' in doc.keywords
and doc.folder.name = 'Reports'

(i think)

tom
 
L

Lew

Tom said:
I think the OP is saying he would have to navigate in that way if he was
using ORM rather than SQL. That still isn't true, though, because in JPA
you have JP-QL, which is more or less isomorphic to SQL, and in fact
simpler, because the mapping of relationships to properties makes joins
easier to express:

select doc from Document doc
where
  'FOO' in doc.keywords
  and doc.folder.name = 'Reports'

(i [sic] think)

What you wrote looks correct, but I am fairly certain it requires
'keywords' to be expressed as a collection (probably a 'Set'), which
is best done if the keywords are in their own table, which they should
be anyway. I don't think it works with the space-separated list of
keywords as in your first of three examples.

Your second example, as you hint, is probably optimal and the third
overkill.
 
R

Robert Klemme

I don't think the big benefits of ORM (Hibernate or JPA or one of the
alternatives) are in the writing of the code. It still requires
somebody that knows both the ORM framework and the database well
to write really efficient code.

The big benefits are for reading the code. Everyone can read the
the code using ORM and immediately understand what it does without
looking at tons of code that uses JDBC and SQL. It is maintenance
friendly.

So you are saying that I need skilled people to write the initial code
and can give maintenance to less skilled people because the ORM using
code is easy to read? I am not sure that is a good strategy. Over time
software tends to decay because more and more bug fixes are applied and
features added. If only the people knew internals of ORM that wrote the
initial code I see a good chance that maintainers wreck havoc on the
performance and potentially the whole application if they change /
extend the easy readable code without knowing the tool they are using.
Even a change as seemingly simple as that of a field type from "int" to
"String" might have dramatic consequences. And just think of the woes
of schema migration: if you have an installed base you urgently need
someone who understands the DB underneath and the ORM tool to come up
with a feasible migration strategy that.

Btw, did I mention that I believe database independence is a myth? :)

Kind regards

robert
 
T

Tom Anderson

Tom said:
I think the OP is saying he would have to navigate in that way if he
was using ORM rather than SQL. That still isn't true, though, because
in JPA you have JP-QL, which is more or less isomorphic to SQL, and in
fact simpler, because the mapping of relationships to properties makes
joins easier to express:

select doc from Document doc
where
  'FOO' in doc.keywords
  and doc.folder.name = 'Reports'

(i [sic] think)

What you wrote looks correct, but I am fairly certain it requires
'keywords' to be expressed as a collection (probably a 'Set'), which is
best done if the keywords are in their own table, which they should be
anyway. I don't think it works with the space-separated list of
keywords as in your first of three examples.

I believe you're absolutely right - sorry, i should have been clearer on
what context that query required.
Your second example, as you hint, is probably optimal and the third
overkill.

Probably. The advantage of the fully normal design is that queries like:

"find me all the documents with keywords with 'brew' in them"

are fast, because doing that in any schema involves a LIKE over the
keyword text, which more or less means a table scan, and the keyword table
in that design is smaller than in the second one (proportional to the
number of distinct keywords, not the sum of the number of keywords over
all documents), and much smaller than in the first one, where it's the
whole table.

But then, if you have full-text indexing, you can search the packed string
without doing a table scan. Although that's not necessarily fast enough:

http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html

But then, you can full-text index the more normal schemas too:

http://www.opensymphony.com/compass/content/about.html

Er, so, yeah, anyway, glad to have entirely eliminated the OP's confusion,
i'm sure.

tom
 
R

Robert Klemme

(e-mail address removed) says...


ORM doesn't force business rules into a separate layer and raw JDBC
calls don't force them into the same layer as persistence.

Exactly. I believe there is a fundamental dilemma that I haven't seen a
satisfying solution to: with a relational database and an object
oriented (or not) application which implements the business logic you
automatically have a distribution of business rules between several tiers.

If you manage to place all business consistency rules into the database
(which is often impossible because either of limitations of the DB or
complexity of the model) you leave very little for the application layer
(mostly presentation) so you might wonder why not directly implement all
the business logic in PL/SQL or T-SQL (just to name two well known
brands). Advantage is that you cannot break the model if you need to do
changes in the DB (this can happen for migration, repair or other one
off tasks).

If you place all the rules in the application consequently you would
have to even get rid of foreign keys. Downside is of course that you
now have zero consistency enforcement for the data model in the DB (e.g.
during all those tasks mentioned above) and you are only using 10% of a
potentially expensive installation (in the case of a commercial RDBMS).

In reality I have often seen a mix between the two approaches: some
consistency checking (FK, PK, CHECK constraints, triggers) is done in
the database and the "rest" of the business logic lives in the
application tier. This may actually be the worst approach: it's not
only that you don't have a single place where the business model is
consistently defined and enforced - that is merely a violation of some
form of purity rule (which _does_ have its advantages). But this might
also make people feel safe when they change the database while breaking
business rules that live elsewhere...

We probably should get rid of persistence altogether - maybe Alzheimer
DB or WORN is the future. :)

Pitch, you can have multi tier with JDBC and ORM - you can even have
multi tier without persistence altogether.

Kind regards

robert
 
M

Martin Gregorie

I suspect documents may have more than one keyword, in which case your
query might look like:
I suspect you're right, unless the keywords are pulled out to form a
unique list in its own table with a M:M relationship to Document, which
be useful in some circumstances. However, I didn't persue that since what
I really wanted to illustrate was the benefits of indexing heavily
searched non-key columns in the right circumstances and of generating
result sets that don't contain irrelevant data. Both seemed to be points
that hadn't occurred to the OP judging by his comment about searching out
required rows from the result set.
 
Z

Zlatko Äurić

I suspect you're right, unless the keywords are pulled out to form a
unique list in its own table with a M:M relationship to Document, which
be useful in some circumstances. However, I didn't persue that since what
I really wanted to illustrate was the benefits of indexing heavily
searched non-key columns in the right circumstances and of generating
result sets that don't contain irrelevant data. Both seemed to be points
that hadn't occurred to the OP judging by his comment about searching out
required rows from the result set.

Hmm. My docs and folders are all "nodes". Stored in the node table.
Nodes have metadata - such as authors, keywords, document id's,
departments, this and that. Those are stored in the metadata table.

Now, to get all the info about one doc, you have to get it's node ID,
and the go to the metadata table, and find out the metadata that means
"parent node", get it's id, then get this parents' children (to get
other documents in this set), then get all those documents and search
for some keywords in those documents.
 
L

Lew

Hmm. My docs and folders are all "nodes". Stored in the node table.
Nodes have metadata - such as authors, keywords, document id's [sic],
departments, this and that. Those are stored in the metadata table.

Now, to get all the info about one doc, you have to get it's [sic] node ID,
and the go to the metadata table, and find out the metadata that means
"parent node", get it's [sic] id, then get this parents' children (to get
other documents in this set), then get all those documents and search
for some keywords in those documents.

You don't have to do it that way. You could do it with a correctly set up
query involving JOINs, which is the point Martin and others are making.

Think set intersection instead of procedure.
 
Z

Zlatko Äurić

You don't have to do it that way. You could do it with a correctly set
up query involving JOINs, which is the point Martin and others are making.

I'm not, the application is.
Think set intersection instead of procedure.

I will, thanks.
 
A

Arne Vajhøj

@y6g2000prk.googlegroups.com>, (e-mail address removed) says...

I always believed that ORM systems are forcing you to write your own
business-rules layer apart from the persistence layer.

It is not enforcing anything. Hibernate or any other ORM does not
refuse to persist classes with business logic in.

There is a certain correlation, because when people start to
use ORM then they have also learned about PL-BLL-DAL (let us
ignore the fact that they probably should have learned about
PL-CL-BLL-DAL).
That way database
access is kept simple and easy mantainable.

Also, this multi-tier architecture allows for easier load-balancing,
architecture changes, integration with other systems, development..

layers != tiers

Arne
 
A

Arne Vajhøj

Exactly. I believe there is a fundamental dilemma that I haven't seen a
satisfying solution to: with a relational database and an object
oriented (or not) application which implements the business logic you
automatically have a distribution of business rules between several tiers.

If you manage to place all business consistency rules into the database
(which is often impossible because either of limitations of the DB or
complexity of the model) you leave very little for the application layer
(mostly presentation) so you might wonder why not directly implement all
the business logic in PL/SQL or T-SQL (just to name two well known
brands). Advantage is that you cannot break the model if you need to do
changes in the DB (this can happen for migration, repair or other one
off tasks).

If you place all the rules in the application consequently you would
have to even get rid of foreign keys. Downside is of course that you now
have zero consistency enforcement for the data model in the DB (e.g.
during all those tasks mentioned above) and you are only using 10% of a
potentially expensive installation (in the case of a commercial RDBMS).

In reality I have often seen a mix between the two approaches: some
consistency checking (FK, PK, CHECK constraints, triggers) is done in
the database and the "rest" of the business logic lives in the
application tier. This may actually be the worst approach: it's not only
that you don't have a single place where the business model is
consistently defined and enforced - that is merely a violation of some
form of purity rule (which _does_ have its advantages). But this might
also make people feel safe when they change the database while breaking
business rules that live elsewhere...

That is a classic dilemma.

My preference is: if database need to be accessed by apps in different
technology, then it makes sense to put the business logic in SP's -
otherwise I would keep the business logic in the Java code, because
that makes it a lot cheaper to work with a different database - I would
keep basic integrity check in the database though.

Arne
 
A

Arne Vajhøj

To use JDBC and SQL, I need to handle trial things, like commit,
rollback etc. How about hibernate/spring?

You still need to handle your own transactions. That is a very
common feature of persistence frameworks - they don't try to
guess how you want your transactions.

Arne
 
A

Arne Vajhøj

So you are saying that I need skilled people to write the initial code
and can give maintenance to less skilled people because the ORM using
code is easy to read? I am not sure that is a good strategy. Over time
software tends to decay because more and more bug fixes are applied and
features added. If only the people knew internals of ORM that wrote the
initial code I see a good chance that maintainers wreck havoc on the
performance and potentially the whole application if they change /
extend the easy readable code without knowing the tool they are using.
Even a change as seemingly simple as that of a field type from "int" to
"String" might have dramatic consequences. And just think of the woes of
schema migration: if you have an installed base you urgently need
someone who understands the DB underneath and the ORM tool to come up
with a feasible migration strategy that.

Not quite. I am saying that you may want people that have a clue
about persistence to write and modify the persistence code, but
that developers that does not know about the used ORM framework
or the database will be able to easily read and understand the code
(while working on something else - like the business logic).
Btw, did I mention that I believe database independence is a myth? :)

It is a myth that is seen working everyday in the Java world, that
many Java apps using a good ORM (like Hibernate or one
of the JPA implementation) use the same Java code with different
databases. It is not always that easy. But for all the simple
stuff it works well.

Arne
 
G

Graham Cox

Lew said:
What you wrote looks correct, but I am fairly certain it requires
'keywords' to be expressed as a collection (probably a 'Set'), which
is best done if the keywords are in their own table, which they should
be anyway. I don't think it works with the space-separated list of
keywords as in your first of three examples.
Arguably the keywords should actually be in a full text index in the
RDBMS, and let the proper full text indexing support handle it
instead. That will be more efficient than scanning a table of keywords
joined back to the table of documents, and - potentially - gives support
for near matches as well...

On the other hand, I've no idea if JPA supports full text indexes and if
it does how you'd go about using them...
 
P

Pitch

Pitch, you can have multi tier with JDBC and ORM - you can even have
multi tier without persistence altogether.

Yes, but I sad _this_ multi-tier architecture. If you use only JDBC you
have one tier less.
 
P

Pitch

(e-mail address removed) says...


ORM doesn't force business rules into a separate layer and raw JDBC calls
don't force them into the same layer as persistence.

I disagree.
 
R

Robert Klemme

Not quite. I am saying that you may want people that have a clue
about persistence to write and modify the persistence code, but
that developers that does not know about the used ORM framework
or the database will be able to easily read and understand the code
(while working on something else - like the business logic).

Thank you for the clarification!
It is a myth that is seen working everyday in the Java world, that
many Java apps using a good ORM (like Hibernate or one
of the JPA implementation) use the same Java code with different
databases. It is not always that easy. But for all the simple
stuff it works well.

Hm... The question is: how simple is "simple" and where does
"complicated" begin? Just an example, which I would rather place in
the "simple" bucket: assume you want to query for items that have a
field set to null. Considering that Oracle does not index NULL values
and another RDBMS that you want to use does so, you'll likely end up
with a clutch: either you use a different value for NULL which will
allow for uniform ORM code at the cost of a bad design; or you need to
make the ORM tool create different queries (and I do not mean the
difference between "VARCHAR2" and "VARCHAR") for Oracle and the other
RDBMS. You could, of course, also live with the FTS in Oracle but I
doubt you'll have much fun doing this on any reasonably large
database. :)

In any case I believe the complicated stuff is where the fun begins -
so for _me_ DB independence is definitively a myth. ;-)

Kind regards

robert
 
R

Robert Klemme

If you use ORM you don't want this, so it's not relevant. Right?

Maybe you do not want it - but you still may need to be able to do it
(DB repair, large volume changes etc.). Unfortunately reality is
rarely in line with what we like or want - and we still have to deal
with it.

Kind regards

robert
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,772
Messages
2,569,593
Members
45,104
Latest member
LesliVqm09
Top