Database development

T

Tom Anderson

Arguably the keywords should actually be in a full text index in the
RDBMS, and let the proper full text indexing support handle it
instead. That will be more efficient than scanning a table of keywords
joined back to the table of documents

Maybe, maybe not:

http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html

Depends on the data.
On the other hand, I've no idea if JPA supports full text indexes and if
it does how you'd go about using them...

JPA itself doesn't, AFAIK. However, you can take a JPA-mapped database,
add a fulltext index on the side, and then access it using native queries.
It's a leak in the abstraction, and a point of nonportability, but it's
easily confined to the data access layer, and probably not too big a deal
in practice.

There are also fulltext search options for JPA that don't use the
database's own indexing - notably Compass, which indexes a JPA repository
with Lucene:

http://www.compass-project.org/overview.html

Hibernate also has Hibernate Search, which does much the same thing. I
understand that its existence is pushing towards the next major version of
JPA including fulltext search. Whether there will be implementations on
top of database fulltext search as well as Lucene and similar, i don't
know.

tom
 
T

Tom Anderson

Hmm. My docs and folders are all "nodes". Stored in the node table.
Nodes have metadata - such as authors, keywords, document id's,
departments, this and that. Those are stored in the metadata table.

That sounds like a pretty bad idea. It's not playing to an RDBMS's
strengths at all. I take it you can't change this?
Now, to get all the info about one doc, you have to get it's node ID,
and the go to the metadata table, and find out the metadata that means
"parent node", get it's id, then get this parents' children (to get
other documents in this set), then get all those documents and search
for some keywords in those documents.

You can still do this with joins in one big query:

SELECT document.node_id
FROM nodes AS document, metadata AS parentage, metadata AS folder_data, metadata AS content
WHERE
document.node_id = parentage.node_id
AND parentage.key = 'parentNode'
AND parentage.value = folder_data.node_id
AND folder_data.key = 'name'
AND folder_data.value = ?
AND document.node_id = content.node_id
AND content.key = 'content'
AND content.value LIKE ?

Remember to tip your query optimiser generously beforehand, though.

tom
 
T

Tom Anderson

My preference is: if database need to be accessed by apps in different
technology, then it makes sense to put the business logic in SP's -
otherwise I would keep the business logic in the Java code, because that
makes it a lot cheaper to work with a different database -

It's worth mentioning that the modern alternative approach here is to put
hide the database completely behind the java, and expose the functionality
through web services. Rather than having other apps talk to the database
directly, they make calls to the java layer. That lets you raise the level
of abstraction in the other apps, reuse functionality in the java, and
maintain the invariants enforced by the business logic in the java.

The downside of this is that whatever it is the other app wants to do has
to be supported by the java app, which will invariably mean that
developing any app will involve some work on the java app to add the
needed capabilities. Mind you, if the DB-centric alternative involves
putting logic in stored procedures, then this is no different - external
apps will have to wrangle java code instead of PL/SQL or some other such
monstrosity.

Over time, the java app evolves into more of a service layer - at some
point, it makes sense to formally split it into the original app and a
service layer. That point might even be right at the start.

Also worth mentioning that if you're not afflicted by XMLitis, you can
also expose the services through CORBA or one of a myriad of other RPCish
mechanisms. Thrift! Protocol buffers! JSON-REST! DCE RPC! CSV-UUCP!
I would keep basic integrity check in the database though.

Same here. It should never be necessary, but it's nice to have a backstop.
If you can generate the constraints in the database automatically from the
code, then it's a no-brainer - would i be right in thinking that DDL
generated by popular JPA implementations puts in constraints wherever it
can? For example, FOREIGN KEY should be pretty easy. If you've got the new
javax.validation annotations on your entities, more constraints could be
generated from those too - is anyone doing that?

tom
 
J

Jim Janney

Zlatko Duric said:
I inherited something that uses Hibernate, and I'm thinking about
speeding up a few things. I was just thinking about how it would be
difficult to try to speed all the slow stuff up by replacing all the
hibernate stuff with all JDBC queries, and with my experience there's
no chance I'll be doing this. But this approach (combination of ORM
and JDBC) sounds very interesting to me.

Now, my data is all objects - that suits me perfectly. But there is
some information about all those objects I'd like to store in a single
table or maybe two of them, that'd be super-fast to reach, without
having to look for all those parent/children/node/parameters/other
links and without having other issues to think about. I believe that
part of the features would benefit from it a lot in terms of
performance.

Now, how common is this approach (combination)? Is there something
really important I should read about this, before starting with the
implementation?

http://docs.jboss.org/hibernate/stable/core/reference/en/html/queryhql.html#queryhql-select

You can query for just the columns (or expressions) you need, without
pulling in complete objects. They used to call these projected
queries, but I don't see that phrase in the docs now. You can also
escape into native SQL.
 
L

Lew

Tom said:
You can still do this with joins in one big query:

SELECT document.node_id
FROM nodes AS document, metadata AS parentage, metadata AS folder_data,
metadata AS content
WHERE
document.node_id = parentage.node_id
AND parentage.key = 'parentNode'
AND parentage.value = folder_data.node_id
AND folder_data.key = 'name'
AND folder_data.value = ?
AND document.node_id = content.node_id
AND content.key = 'content'
AND content.value LIKE ?

Remember to tip your query optimiser generously beforehand, though.

Not only was that a superb pun, it was demmed good advice.

The better DBMSes have ways to gather statistics that help them plan queries
and other operations. Their efficiency is strongly influenced by the tips
these statistics provide. If you don't set these processes up correctly or
keep up with the DBMS's usage, your queries tend to run slower.
 
A

Arne Vajhøj

If you use ORM you don't want this, so it's not relevant. Right?

In the unlikely case that it is a decision that can be made, then
it is still a restriction and therefore relevant.

In the more likely case that it is a given, then it is obviously
relevant.

Arne
 
A

Arne Vajhøj

Maybe you do not want it - but you still may need to be able to do it
(DB repair, large volume changes etc.). Unfortunately reality is
rarely in line with what we like or want - and we still have to deal
with it.

DBA operations is one thing.

But it is not that uncommon that databases used by Java is
also used by non-Java apps: .NET, C, Cobol, PL/I etc..

Arne
 
A

Arne Vajhøj

It's worth mentioning that the modern alternative approach here is to
put hide the database completely behind the java, and expose the
functionality through web services. Rather than having other apps talk
to the database directly, they make calls to the java layer. That lets
you raise the level of abstraction in the other apps, reuse
functionality in the java, and maintain the invariants enforced by the
business logic in the java.

Modern alternative????

I thought it was a classic anti-pattern to expose DAL as
services instead of BLL as services.

After all - this is entity beans remote interface just with a
10 times more inefficient transport format.

Arne
 
A

Arne Vajhøj

Yes, but I sad _this_ multi-tier architecture. If you use only JDBC you
have one tier less.

No.

You have the same tiers with ORM and JDBC and in 99% of cases
the same layers as well.

Arne
 
A

Arne Vajhøj

Hm... The question is: how simple is "simple" and where does
"complicated" begin? Just an example, which I would rather place in
the "simple" bucket: assume you want to query for items that have a
field set to null. Considering that Oracle does not index NULL values
and another RDBMS that you want to use does so, you'll likely end up
with a clutch: either you use a different value for NULL which will
allow for uniform ORM code at the cost of a bad design; or you need to
make the ORM tool create different queries (and I do not mean the
difference between "VARCHAR2" and "VARCHAR") for Oracle and the other
RDBMS. You could, of course, also live with the FTS in Oracle but I
doubt you'll have much fun doing this on any reasonably large
database. :)

How does you solve it by using JDBC?
In any case I believe the complicated stuff is where the fun begins -
so for _me_ DB independence is definitively a myth. ;-)

Well - out goal is to make everything as simple as possible.

Arne
 
R

Robert Klemme

How does you solve it by using JDBC?

Well, you would either wrap the query in a stored procedure or use
custom SQL per database engine type. For example, with Oracle you
could create a FBI and query with its functional expression.
Well - out goal is to make everything as simple as possible.

.... but not simpler. :)

Cheers

robert
 
G

Gunter Herrmann

Hi!

Robert said:
Btw, did I mention that I believe database independence is a myth? :)

If you want to write code that runs on "all" databases, you cannot use
any of the special features of any database.
Your code will run (rather crawl) on every database.

If you want a fast application let the database do what it can do best,
and create an API on top of it.
In case your back end runs on Oracle, use packaged stored procedures,
table functions, pipelined functions.
This way you make sure that different programs (written in different
programming languages) are all controlled by the same database code.
If you did this from the start you can easily have
- Oracle forms
- Cold fusion
- a Java desktop application
- an interface for processing remote requests (written in C)
- another interface for processing remote requests (written in Pascal)
- ...
do the same thing without messing up the data.

You can encapsulate the tables behind this by only giving execute privs
on the API procedures to all the clients. No insert/update/delete privs
for any client software.

This way you can easily change the client without rewriting the logic.

brgds

Gunter, Orlando, Fla.
 
T

Tom Anderson

Modern alternative????

I thought it was a classic anti-pattern to expose DAL as services
instead of BLL as services.

Yes. Sorry if i didn't make it clear enough, but i was talking about
exposing domain-model operations, not raw database operations. Hence the
bit about "That lets you raise the level of abstraction in the other apps,
reuse functionality in the java, and maintain the invariants enforced by
the business logic in the java.". And the paragraph discussing the fact
that developing other apps will involve adding business logic to the java.

tom
 
P

Pitch

Well - ORM implementations does not carry experience AI into the
app.

And your experience is not part of the ORM.

:)

Well, arent' you a nitpicker. Are you and Lew the same person?
 
L

Lew

(e-mail address removed) says...
Pitch said:
 :)

Well, arent' you a nitpicker. Are you and Lew the same person?

We aren't, but the honor would be mine were it so.

It isn't really nitpicking to point out that you completely and
utterly contradicted your earlier point.
 
P

Pitch

(e-mail address removed) says...



We aren't, but the honor would be mine were it so.

It isn't really nitpicking to point out that you completely and
utterly contradicted your earlier point.

Aparently, you are not able to follow my thoughts so I won't take up any
more of your time.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,774
Messages
2,569,600
Members
45,181
Latest member
RexGreenwa
Top