Database development

J

Jack

When I work on database development projects, I use JDBC and SQL. Many
people use hibernate/spring. Can somebody explain the pros and cons of
using JDBC and SQL vs using hibernate/spring on database
developments?

Thanks.

Jack
 
A

Arne Vajhøj

When I work on database development projects, I use JDBC and SQL. Many
people use hibernate/spring. Can somebody explain the pros and cons of
using JDBC and SQL vs using hibernate/spring on database
developments?

That is a rather big discussion.

The ultra short version is:
- ORM (Hibernate or other) is best when the problem to
be solved is CRUD of objects
- pure SQL (JDBC) is best when you want to do something
more unusual

Arne
 
L

Lew

Jack said:
When I work on database development projects, I use JDBC and SQL. Many
people use hibernate [sic]/spring [sic]. Can somebody explain the pros and cons of
using JDBC and SQL vs using hibernate [sic]/spring [sic] on database
developments?
That is a rather big discussion.

The ultra short version is:
- ORM (Hibernate or other) is best when the problem to
be solved is CRUD of objects
- pure SQL (JDBC) is best when you want to do something
more unusual

Hibernate, as Arne pointed out, is not the only ORM (Object-Relational
Mapping) tool. The biggies include also OpenJPA and EclipseLink.

They all support and implement the Java Persistence API (JPA), the Java
standard for ORM.

JPA is useful for more than mere CRUD, but that is the most common use case.

JPA does not require Spring at all. "Hibernate/Spring" is not one thing nor
one word.

Strictly speaking, JPA is not for database development. It is for persistent
object development. The point of view of JPA is object orientation, quite
different from the set logic of SQL.

SQL is best when you need a relational view of data. JPA is best when you
need an object view of objects.

Obviously JPA sits atop JDBC, so there is no real dichotomy between the two.
The question is roughly equivalent to the decision between a low-level library
and a higher-level one.
 
T

Tom Anderson

That is a rather big discussion.

The ultra short version is:
- ORM (Hibernate or other) is best when the problem to
be solved is CRUD of objects
- pure SQL (JDBC) is best when you want to do something
more unusual

I'd rephrase that slightly to say that ORM is best when you want to deal
with your data as objects - when you need to be able to call methods,
traverse object graphs, and generally think of your data as objects.

If your data is something that isn't usefully thought of as objects
(perhaps a big boring spew of temperature measurements over time or
something) then there isn't much benefit to ORM. There's probably no real
harm either, so if you prefer ORM, you can still use it.

And as Arne said, when you're trying to do something unusual, you may be
outside the limits of what ORM can comfortably do, and you'll be better
off using straight JDBC. Or perhaps a combination of ORM for any CRUDdy /
domain logicky bits, and JDBC for complex queries.

tom
 
Z

Zlatko Duric

And as Arne said, when you're trying to do something unusual, you may be
outside the limits of what ORM can comfortably do, and you'll be better
off using straight JDBC. Or perhaps a combination of ORM for any CRUDdy
/ domain logicky bits, and JDBC for complex queries.

I inherited something that uses Hibernate, and I'm thinking about
speeding up a few things. I was just thinking about how it would be
difficult to try to speed all the slow stuff up by replacing all the
hibernate stuff with all JDBC queries, and with my experience there's no
chance I'll be doing this. But this approach (combination of ORM and
JDBC) sounds very interesting to me.

Now, my data is all objects - that suits me perfectly. But there is some
information about all those objects I'd like to store in a single table
or maybe two of them, that'd be super-fast to reach, without having to
look for all those parent/children/node/parameters/other links and
without having other issues to think about. I believe that part of the
features would benefit from it a lot in terms of performance.

Now, how common is this approach (combination)? Is there something
really important I should read about this, before starting with the
implementation?

TIA
 
A

Arne Vajhøj

I inherited something that uses Hibernate, and I'm thinking about
speeding up a few things. I was just thinking about how it would be
difficult to try to speed all the slow stuff up by replacing all the
hibernate stuff with all JDBC queries, and with my experience there's no
chance I'll be doing this. But this approach (combination of ORM and
JDBC) sounds very interesting to me.

Now, my data is all objects - that suits me perfectly. But there is some
information about all those objects I'd like to store in a single table
or maybe two of them, that'd be super-fast to reach, without having to
look for all those parent/children/node/parameters/other links and
without having other issues to think about. I believe that part of the
features would benefit from it a lot in terms of performance.

Now, how common is this approach (combination)? Is there something
really important I should read about this, before starting with the
implementation?

If you are accessing the data as objects, then I don't think
that switching from Hibernate to raw JDBC is the right direction
to optimize the code.

Instead you should focus on tuning Hibernate and the databases
itself.

Hibernate can be slow and Hibernate can be fast. It all depends
on the guy writing the code.

Arne
 
A

Arne Vajhøj

I'd rephrase that slightly to say that ORM is best when you want to deal
with your data as objects - when you need to be able to call methods,
traverse object graphs, and generally think of your data as objects.

If your data is something that isn't usefully thought of as objects
(perhaps a big boring spew of temperature measurements over time or
something) then there isn't much benefit to ORM. There's probably no
real harm either, so if you prefer ORM, you can still use it.

And as Arne said, when you're trying to do something unusual, you may be
outside the limits of what ORM can comfortably do, and you'll be better
off using straight JDBC. Or perhaps a combination of ORM for any CRUDdy
/ domain logicky bits, and JDBC for complex queries.

It is possible to mix different persistence technologies, but it raises
lots of potential consistency issues. I would avoid it if possible.

Arne
 
M

markspace

Zlatko said:
But there is some
information about all those objects I'd like to store in a single table
or maybe two of them, that'd be super-fast to reach, without having to
look for all those parent/children/node/parameters/other links ...
Now, how common is this approach (combination)? Is there something
really important I should read about this, before starting with the
implementation?


As far as I know, de-normalizing a database for faster access is very
common, as long as you started with a good normalized design, and you
document carefully what you denormalize, and you measure carefully the
performance boost and can justify the extra maintenance.

I don' have any links handy, but if you Google for "database
denormalization optimization" there seems to be plenty of info. I'd try
some standard techniques for denormalization first, rather than try to
improvise something.
 
A

Arne Vajhøj

As far as I know, de-normalizing a database for faster access is very
common, as long as you started with a good normalized design, and you
document carefully what you denormalize, and you measure carefully the
performance boost and can justify the extra maintenance.

I don' have any links handy, but if you Google for "database
denormalization optimization" there seems to be plenty of info. I'd try
some standard techniques for denormalization first, rather than try to
improvise something.

It is very common to denormalise databases for "performance".

I have a strong suspicion that in more than 90% of cases it
is unwarranted.

Databases are extremely optimized to do joins efficiently.

If the logical and physical design is good then joins is
usually not the problem.

Even if it is a problem, then the specific databases may
offer the possibility of materialized views to solve the
problem.

Arne
 
L

Lew

Wrong approach.

Common doesn't mean correct.

Yes. Read about why normalization is important in the first place. Read
about the things others have mentioned, like (materialized) views. Read about
why "premature optimization is the root of all evil."

Whatever you do to "optimize" won't, at least not unless you actually
*measure* performance before and after your so-called "optimizations" under
realistic loads and field conditions.

Don't forget to take into account the cost of the increased code complexity
for denormalized structures, and compare that to the cost of keeping data
normalized. Don't forget to take into account the actuarial cost of the risk
to your data from the denormalization.

Better yet, stick with best practices.
It is very common to denormalise databases for "performance".

OP: Notice how some of us put "performance" in quotation marks? There's a
good reason for that.
I have a strong suspicion that in more than 90% of cases it
is unwarranted.

You are being kind.
Databases are extremely optimized to do joins efficiently.

If the logical and physical design is good then joins is
usually not the problem.

Even if it is a problem, then the specific databases may
offer the possibility of materialized views to solve the
problem.

When one denormalizes a database for "performance", one usually winds up with
none of the expected performance gains and all of the expected increase in
risk to the data.

At least, one ought to expect that risk. The purpose of normalizing a
database is to prevent data anomalies and enforce data constraints.
Denormalize and you screw that up.

As for ORM efficiency, as Arne pointed out:
Hibernate [or any other ORM framework] can be slow
and Hibernate can be fast.
It all depends on the guy writing the code.

Properly written, JPA code is no slower than raw SQL coding, takes less time
to develop and maintain (part of the cost-benefit equation, folks!), and is a
much more natural fit to the object model of the application.

Furthermore, JPA frameworks offload much of the management effort for
persistent storage connections and O-R mapping. This is similar to how
managed-memory environments like the JVM and .Net offload the effort, expense
and risk of memory management from the programmer. Don't give that up lightly.

Beyond that, Hibernate and other JPA frameworks lend themselves well to
inbuilt and outboard cache approaches. Out of the box, JPA gives you a "level
one" cache (a.k.a. a "session") that will help optimize interaction with the
database.

If you're looking to optimize database access, it is by far much more
productive to pay attention to things like client- and server-side statement
preparation, scope and lifecycle of 'EntityManager' instances, database tuning
parameters (such as work memory or whatever the DBMS calls it), connection
pooling, disk speed (use high-rotational-speed drives in a RAID array with a
battery-backed RAID controller), scalability, concurrency, indexes,
partitioning (database, not disk), and other adjustments that will improve
performance WITHOUT TOTALLY HOSING YOUR DATA MODEL.

A good object model that caches well without concurrency bottlenecks will
scale well to additional hardware and provide much more performance than a
messed-up data model, without the risks of the latter.

I've been part of performance optimization efforts for databases a number of
times. Denormalization usually has hurt, not helped. In one case that I
recall fondly, the denormalized structure caused a quadratic increase in
processing time with data quantity, rather than the linear increase a
normalized database would have provided (and did, when they finally accepted
my recommendation to normalize, but not before causing a major problem with
their customer that got the project manager replaced and nearly cost the
contract).

Program run time is rarely the dominant cost in a project.

Code correctly and well first. That will almost always give sufficient
performance. If not, MEASURE and optimize and MEASURE again, focusing first
and foremost on things that don't mess you up by harming data integrity,
maintainability or scalability.
 
R

Robert Klemme

That is a rather big discussion.

The ultra short version is:
- ORM (Hibernate or other) is best when the problem to
be solved is CRUD of objects
- pure SQL (JDBC) is best when you want to do something
more unusual

Notably manipulating large volumes of data. While there are some
features in JPA and likes working efficiently with large volumes of data
often requires exploiting features of the particular RDBMS at hand.
That soon becomes awkward if you want to do it through a ORM.

And one downside of using Hibernate, JPA or any other ORM tool: these
tools hide the often quoted impedance mismatch between the object world
and the database - which is good because that is precisely what they
were invented for. The danger here is to use a persistent store
mindlessly just like objects in memory which has a good chance of
leading to awful performance in certain situations. In a way you could
say the hiding works but not for every use case and sometimes you're
crossing from "works" to "major nuisance" without noticing it. Bottom
line, with Hibernate as well as with any other tool, you should know it
and its weaknesses (which to a certain extent makes the "hiding" moot).

Kind regards

robert
 
A

Arne Vajhøj

Notably manipulating large volumes of data. While there are some
features in JPA and likes working efficiently with large volumes of data
often requires exploiting features of the particular RDBMS at hand. That
soon becomes awkward if you want to do it through a ORM.

And one downside of using Hibernate, JPA or any other ORM tool: these
tools hide the often quoted impedance mismatch between the object world
and the database - which is good because that is precisely what they
were invented for. The danger here is to use a persistent store
mindlessly just like objects in memory which has a good chance of
leading to awful performance in certain situations. In a way you could
say the hiding works but not for every use case and sometimes you're
crossing from "works" to "major nuisance" without noticing it. Bottom
line, with Hibernate as well as with any other tool, you should know it
and its weaknesses (which to a certain extent makes the "hiding" moot).

I don't think the big benefits of ORM (Hibernate or JPA or one of the
alternatives) are in the writing of the code. It still requires
somebody that knows both the ORM framework and the database well
to write really efficient code.

The big benefits are for reading the code. Everyone can read the
the code using ORM and immediately understand what it does without
looking at tons of code that uses JDBC and SQL. It is maintenance
friendly.

Arne
 
J

Jack

If you are accessing the data as objects, then I don't think
that switching from Hibernate to raw JDBC is the right direction
to optimize the code.

Instead you should focus on tuning Hibernate and the databases
itself.

Hibernate can be slow and Hibernate can be fast. It all depends
on the guy writing the code.

Arne

To use JDBC and SQL, I need to handle trial things, like commit,
rollback etc. How about hibernate/spring?
 
L

Lew

Jack said:
To use JDBC and SQL, I need to handle trial things, like commit,
rollback etc. How about hibernate [sic]/spring [sic]?

Hibernate and Spring are two completely different products.

In Hibernate and other JPA products you do have to handle transactions. I
don't know about Spring.
 
Z

Zlatko Duric

Wrong approach.

Well, I don't mean to change the model.

I want to keep and use all this stuff that's already in there.
But there are few things I think can benefit from an approach like this:

- reports
there are a few "reports" being produced from the system. For some
10000 objects, there are 200k mysql queries - I am not much into big
systems (yet) but this seems waaaaaaaaaaay too much to me. If I had the
data from this in an additional table (I know this means duplicating
some data), I could filter out what I need in one single query.
- lists
when accessing the data, users only fetch a certain small amount of
objects - 15, 20, maybe 100 at a time. So to avoid most of the node
links and stuff, I could get a quick filter of the data needed, without
having to join this and that.

Now, I know there are some disadvantages too, like having to worry
whether I update the object and the table at the same time or so.
I also have to make sure that all the data is in sync, since it's now
duplicated. But I still can't resist wondering if this would be a good
option.

Luckily I still have a couple of months work already planned, so I get
to think about it :p

Thanks anyway
 
P

Pitch

When I work on database development projects, I use JDBC and SQL. Many
people use hibernate/spring. Can somebody explain the pros and cons of
using JDBC and SQL vs using hibernate/spring on database
developments?

I always believed that ORM systems are forcing you to write your own
business-rules layer apart from the persistence layer. That way database
access is kept simple and easy mantainable.

Also, this multi-tier architecture allows for easier load-balancing,
architecture changes, integration with other systems, development..
 
L

Lew

Zlatko said:
Well, I don't mean to change the model.

I want to keep and use all this stuff that's already in there.
But there are few things I think can benefit from an approach like this:

- reports
there are a few "reports" being produced from the system. For some
10000 objects, there are 200k mysql queries - I am not much into big
systems (yet) but this seems waaaaaaaaaaay too much to me. If I had the
data from this in an additional table (I know this means duplicating
some data), I could filter out what I need in one single query.

You can do that with views or join queries without duplicating the data.
- lists
when accessing the data, users only fetch a certain small amount of
objects - 15, 20, maybe 100 at a time. So to avoid most of the node
links and stuff, I could get a quick filter of the data needed, without
having to join this and that.

What do you mean by "most of the node links and stuff"?

Duplicated data will require more accesses, code complication and risk than
"to join this and that".
Now, I know there are some disadvantages too, like having to worry
whether I update the object and the table at the same time or so.
I also have to make sure that all the data is in sync, since it's now
duplicated. But I still can't resist wondering if this would be a good
option.
No.

Luckily I still have a couple of months work already planned, so I get
to think about it :p

Time isn't necessary, correct thinking is.
 
L

Lew

(e-mail address removed) says...
I always believed that ORM systems are forcing you to write your own
business-rules layer apart from the persistence layer. That way database
access is kept simple and easy mantainable.

ORM doesn't force business rules into a separate layer and raw JDBC calls
don't force them into the same layer as persistence.
 
Z

Zlatko Duric

You can do that with views or join queries without duplicating the data.

Maybe I could, if I knew what am I looking for in the first place :)
What do you mean by "most of the node links and stuff"?

Well, my objects are documents - those are "nodes". But so are the
"folders" holding the documents. And so are their parents. And I want,
for example, all the docs that have the keyword FOO and their parent is
"Reports".

That's messy - iterate through all the folders to find the stuff I need.

Duplicated data will require more accesses, code complication and risk
than "to join this and that".


Time isn't necessary, correct thinking is.

That's why I'm posting here, so you can do my thinking :)
 
M

Martin Gregorie

Well, my objects are documents - those are "nodes". But so are the
"folders" holding the documents. And so are their parents. And I want,
for example, all the docs that have the keyword FOO and their parent is
"Reports".
Thats what additional indexes are for. For example, to support the
following example you'd want indexes on doc.keyword and folder.name,
since they'll probably be used often for data selection and sorting.
We'll assume that the DB designer was sensible and put indexes on both
table's prime keys. Your example requirement could be satisfied with a
single query, which would involve a prime key join and use the additional
indexes to select the rows to be included in the dataset. Something like
this:

SELECT required fields
from folder f, document d
where f.key = d.folderkey
and d.keyword = 'FOO'
and f.name = 'Reports';

Any decent RDBMS should be able to optimise that type of query and would
return just the required result set.
That's messy - iterate through all the folders to find the stuff I need.
Why would you need to do that? A correctly written query will only return
the data you need, present it in the order you want and not slow database
updates down by requiring additional reference tables which must be
maintained.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,042
Latest member
icassiem

Latest Threads

Top