ORMs comparisons/complaints.

Arne Vajhøj

If you can't manage relational databases, you shouldn't be messing with
big data.

Why not?

Hint: relational databases are not that common for big data.

Arne
 
Arne Vajhøj

It certainly wasn't the real Arved.

Faked headers too, from the look of it.

It did not even show up at the NNTP server I use.

I can only see it via Google.

Arne
 
Arne Vajhøj

ORMs are good at what they were invented for: serializing an object and
resurrecting it at a later point in time.

Storing objects in a relational database via ORM is very different
from serialization (for non-trivial usage).

A serialization stores everything in a sequential stream of data.

Storing objects in a relational database via an ORM stores only the
data not already stored, and spreads it across different tables.

Using a document store has some similarities with serialization.
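
To make the "sequential stream" point concrete, here is a minimal sketch using plain Java serialization. The Order and OrderLine classes are hypothetical and only there for illustration. The whole object graph ends up in one byte stream; an ORM would instead put the same data into separate rows of, say, an order table and an order line table.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes, used only for illustration.
class OrderLine implements Serializable {
    private static final long serialVersionUID = 1L;
    final String product;
    final int quantity;

    OrderLine(String product, int quantity) {
        this.product = product;
        this.quantity = quantity;
    }
}

class Order implements Serializable {
    private static final long serialVersionUID = 1L;
    final long id;
    final List<OrderLine> lines = new ArrayList<>();

    Order(long id) {
        this.id = id;
    }
}

class SerializationSketch {
    // Writes the order and all of its lines into a single sequential byte stream.
    static byte[] serialize(Order order) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(order);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds a separate copy of the whole graph from that stream.
    static Order deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Order) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Order order = new Order(42L);
        order.lines.add(new OrderLine("widget", 3));
        order.lines.add(new OrderLine("gadget", 1));
        Order copy = deserialize(serialize(order));
        System.out.println(copy.id + " with " + copy.lines.size() + " lines");
    }
}
```
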
That means you have to design
your system and its underlying data as a collection of objects with
(encapsulated) member data. Using that approach the lifetime of an
object instance must be able to extend the actual running span of the
program. That requires serialization/resurrection by definition.

No.

It requires the ability to save and load data.
Apart
from the fact that I think that this is a bad approach on its own, to
constrain memory usage objects need to be put to sleep by default and be
resurrected only when they are accessed.

That is how persistence works. The data are on disk and when a program
needs them they are read from disk to memory.
This makes the approach even
more blurred and needlessly complex, bringing stuff like caching and
managing/synchronizing duplicate object instances across concurrently
running program instances into the picture.

But that is not in any way ORM specific.

Plain JDBC will have the same potential issues with caching.
To make things worse almost no system only needs single object
instances. Almost any practical system needs counts, averages etc. which
could be done with a query on an RDBMS or by traversing object instances
IF THEY WERE REAL INSTANCES. Since doing the latter with an ORM would
require resurrecting enormous amounts of instances for practical reasons
you have to pour water into the wine and do atypical stuff like joins
and aggregate queries through the ORM.

????

Joins are a core feature of an ORM.
I know they CAN do this but that
is no more than a wart on such systems since they contradict the primary
goal of an ORM.

Aggregate functions are not very ORM'ish.

But if they are used much in the transactional work that ORM
is intended for, then something is wrong in the first place.
This is also the area where ORMs failed in the projects
I talked about. It's not that the ORM can not do it, it just can not do
it sufficiently well even with help from the most experienced experts we
could find.

????

Joins are a core feature of an ORM.

And common Java ORM's like the JPA implementations and Hibernate do
aggregate functions exactly like SQL, in JPQL and HQL respectively.

I wonder what kind of "experts" you got.

Arne
 
Arne Vajhøj

On 12/30/13 5:38 AM, Silvio wrote:
[snip]
To make things worse almost no system only needs single object
instances. Almost any practical system needs counts, averages etc. which
could be done with a query on an RDBMS or by traversing object instances
IF THEY WERE REAL INSTANCES. Since doing the latter with an ORM would
require resurrecting enormous amounts of instances for practical reasons
you have to pour water into the wine and do atypical stuff like joins
and aggregate queries through the ORM. I know they CAN do this but that
is no more than a wart on such systems since they contradict the primary
goal of an ORM. This is also the area where ORMs failed in the projects
I talked about. It's not that the ORM can not do it, it just can not do
it sufficiently well even with help from the most experienced experts we
could find.
This is a good point, and it was something niggling in my subconscious.
This is where I've always struggled with ORMs, but I never consciously
acknowledged that the difficulty was in utilizing the power of the "R"
in ORM.
I still think the best approach for most systems is to design a separate
and independent data store that covers the problem domain which is
completely isolated from the systems that implement data extractions,
processes and data storage. I do not manually write code to serialize
object instances since I do not serialize them in the first place. Such
a data store can be an RDBMS but if so desired a NoSQL thingy or even a
file system could do well. Using an RDBMS gives the additional advantage
that the data is readily accessible for standard reporting and ETL tools.
This is one approach. I think one of the major "features" of most ORM
implementations is that they attempt to abstract away the actual RDBMS
layer to the point where you feel "dirty" trying to access it in any
meaningful way. This does provide some value in portability, but many
applications rarely need this portability of RDBMS, and more often
benefit from special features of the particular RDBMS chosen.

Plain JDBC code can be written rather portably as well *within* the
area where ORM actually makes sense, so I agree that portability
is not the big argument for ORM.

To me the advantage of using an ORM is simply all the code you
don't have to write.

Regarding getting away from the relational database, note
that common Java ORM's like the JPA implementations and Hibernate
actually allow you to use SQL queries. It is not strongly typed,
but the capability is there.

Arne
 
Arved Sandstrom

If I use ORM I expect that it can reasonably deal with deep structures.
Normalizing the data model at object level is exactly not what I want.

Not entirely sure what you mean here, Marcel. In all seriousness I don't
think I've ever seen anyone normalize or selectively denormalize a data
model at "object level". Apart from that, I'm not quite sure how one
would conveniently normalize data at the object level (presumably
through the ORM) if the data showed up denormalized.
Hmm, you see ORM not as a function extension but simply as a more
convenient API to RDBMS. That's OK. But maybe _O_RM does not hit the
nail on the head in this case.

Marcel, I've never thought that ORM (_O_RM as you emphasize it)
precisely hits the nail on the head. You did hit the nail on the head
when you described how I view ORMs: I see them as providing more
convenient access to databases. I also believe that they are usually
better than other models.
Well, maybe it fits because we mostly think in solutions rather than in
requirements. This applies already to the customer requests. It is often
difficult to distinguish between real requirements and creative ideas of
the customer, how to solve his issues.

Marcel, if you come up with an answer to us developers mostly thinking
about solutions, I applaud you. :) I've been programming since my late
teens, back in the '70's, and in my experience IT people almost
invariably choose what they know: there are financial reasons for doing
that, career reasons for doing that, and "I don't know about the other
possibilities" reasons for doing that. Speaking to that last point,
Marcel, if you can do better than the rest of us at figuring out how
many "solutions" there are, I commend you.

[ SNIP ]
Of course, that's always true.

But from my experience success is not mainly a question whether you have
gone one way or the other. It is more a question if you have done this
well and if you have enough experience with the tools you are using.

In a way you are proving my point, per your last sentence. Good or great
coders make things work; they do even better if they have experience
with the technology. There aren't that many superlative IT technologies
out there, although there are many that are OK, and even more that are
bad. A competent experienced developer can work with bad stuff and still
produce solutions; a poor developer can use excellent software and fail.

By the way, before you tackle a project involving the combination of
IBM, Oracle, and SAP software, consider suicide. I don't think any
amount of experience helps with that.

[ SNIP ]
Well, an XML is a document too, and most of the time it is well structured.

May I act as the child who called out the emperor for being naked? :)

Tim Bray, who we all know, has made comments like this:

"Whether you like XML or not, we’re stuck with it for a long time. These
days, the only new XML-based projects being started up are
document-centric and publishing-oriented. Thank goodness, because that’s
a much better fit than all the WS-* and Java EE config puke and so on
that has given those three letters a bad name among so many programmers.
XML for your document database is actually pretty hard to improve on."

"When XML was invented, it was the world’s only useful cross-platform
cross-language cross-character-set cross-database data format. Where by
“useful” I mean, “came with a pretty good suite of free open-source
tools to do the basic things you needed.”
That’s why it ended up being used for all sorts of wildly-inappropriate
things."

"That I gave up working on Lark, the first ever production-ready XML
parser, and still one of the fastest. It was maybe the best piece of
software I ever wrote; but I couldn’t see the point when there were two
other pretty good Java-language XML parsers out there in the wild. Oh well."

And Tim made a bit of a masturbatory blog post here:
http://www.tbray.org/ongoing/When/200x/2006/04/18/XML-Grammar.

Point being, one of the authors of XML is a bit confused about XML. He's
changed his views about it frequently. I've met the man; I even had the
privilege of him telling me back in 1999 or so that indexing and
searching XML was stupid.

XML is about as well-structured as the thoughts of the author, Marcel.
It's also not really a wonderful means for describing data or documents,
although Bray is in love with it. A structured flat file is better.

As to that first paragraph I quoted, it's a good thing that Tim is 6
years older than me, and will soon retire. "WS-* and Java EE config
puke"? The man has not ever seriously engaged in SOA, or J2EE/Java EE,
but he's happy to deliver opinions like that. He's an idiot.
I do not really have experience with NoSQL databases. But I used non
relational data models and in memory computing now for about 6 years in
different projects. None of the projects failed, all are still live.
Also we did not really save resources because of the decisions made. But
from the code maintenance and from the performance point of view it was
successful. Some change requests to the first of these projects were
implemented by an apprentice in half an hour. This would have taken a
few days by a qualified programmer, if we had chosen a relational data
model.

In the last project - a larger one - we have significant performance
benefits. There is an adjacent third party application with a very
similar data model from the customer's point of view. (They are better
with respect to inheritance, we are better with respect to deep
structures but both can deal with polymorphism, table properties and so on.)
They ended up with solution d). We use XML documents. Measurements show
about three orders of magnitude performance difference, measured in time
per object access. We have two CPU cores with <30% load if users make
traffic. They have 16 CPU at 80-100% load if users make traffic. The
number of objects and attributes is comparable. In fact the data is
partially synchronized by interfaces. Both are Web applications. The
number of users is comparable.
OK, they have chosen PHP (used in an object oriented way), we have .NET
3.5. But this will not explain all the 3 orders of magnitude. Their
system creates heavy load on the large attribute values table.


Marcel

Silvio made some similar points, related to in-memory, the importance of
language objects, and performance. You've added some extra observations.

Call me dubious about most of the points, however. Most people have
recognized the limitations of NoSQL: CouchDB and Mongo were sexy, but
they are limited. Amazon recognizes that NoSQL is limited, maybe the
rest of us should.

I'm wondering about what exactly we are arguing about, though, Marcel.
Referring to a paragraph of yours, I have myself seen people program
relational-type logic on top of NoSQL datastores: this almost invariably
happens. I've also heard about programmers who can't make things work
with an RDBMS - you usually fire them.

With all due respect, I'll call bullshit about "Some change requests to
the first of these projects were implemented by an apprentice in half an
hour. This would have taken a few days by a qualified programmer, if we
had chosen a relational data model." That's highly implausible.

AHS
 
Arved Sandstrom

Why not?

Hint: relational databases are not that common for big data.

Arne

Arne, only because we are going through a phase of people trying to
establish reputations and do some NIH. It's been sexy to refer to shards
and documents and map-reduce and semi-structured data: fact is that for
a number of years we've had quite a few people wasting their time
describing structured data as non-structured data.

By the way, my man, my youngest uncle was also called Arne. Happy NY.

AHS
 
Arved Sandstrom

Much cleaner syntax.

But I am not convinced that the syntax is sufficiently important
to translate that into "light years ahead".

Arne
We can agree to disagree, Arne. I think ideas like C# LINQ and Scala
Squeryl are far in advance of Java.

I was able to write a Scala DSL a few years ago that would not have been
possible in Java. Similarly, C# - I think you'd have to admit - is quite
far ahead of Java.

AHS
 
Gordon Levi

Arne Vajhøj said:
Let us say that you need to add a field.

With an ORM you only need to update:
* one dataclass
* one mapping of data

With plain JDBC you need to change:
* one data class
* N SQL statements
* N places in the Java code

I don't understand this so I fear I must be doing something wrong in
my Java programs. If someone wants to add a field in a database why do
I have to alter anything in my program other than adding, for example,
getString(String columnLabel) if I want to actually use the new field
at that point in the program.
 
Silvio

Storing objects in a relational database via ORM is very different
from serialization (for non-trivial usage).

A serialization stores everything in a sequential stream of data.

Storing objects in a relational database via an ORM stores only the
data not already stored, and spreads it across different tables.

Using a document store has some similarities with serialization.

I meant serialization in the more general sense. I am not talking about
Object(In/Out)putStream but about saving the exact state of an instance
to some addressable storage with the main purpose of restoring its state
later.
No.

It requires the ability to save and load data.

No, not data but instances. My point is that these are fundamentally
different.
That is how persistence works. The data are on disk and when a program
needs them they are read from disk to memory.


But that is not in any way ORM specific.

Plain JDBC will have the same potential issues with caching.

In theory the ORM approach does not need more caching than a RDBMS
driven approach. In practice this is not the case. The number of cases
where caching outside of the RDBMS is actually needed is very limited
and I rarely use any form of data caching.
Without exception all the ORM systems I worked on relied heavily on
caching or they would not be practically usable.
????

Joins are a core feature of an ORM.

You can do joins easily but that is not the big problem. The core
problem is that joins create new views on the underlying data that do
not match with the entity classes that match its underlying tables. To
represent the data from a join properly you would need a new class and
then you get into one of the biggest culprits with ORM: aliased data and
cache redundancy.
Aggregate functions are not very ORM'ish.

But if they are used much in the transactional work that ORM
is intended for, then something is wrong in the first place.

That is just my point. I never encounter any purely transactional
system. That is almost always a part but it is always the easy part.
Also, it's not that ORM tool providers shout from the roofs that you
should only use them for transactional stuff...
????

Joins are a core feature of an ORM.

And common Java ORM's like the JPA implementations and Hibernate do
aggregate functions exactly like SQL, in JPQL and HQL respectively.

I wonder what kind of "experts" you got.

Arne

Yes, they got all the stuff working the first time with joins and
aggregates like you say. But after some time they got into trouble
because aliased data caused the systems to show faulty counts and totals
etc. and from there on it went down-hill. On more than one system they
had to forcefully flush the cache at specific moments to get the right
results for subsequent reporting. Which then did not work very promptly,
to put it mildly.
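
The failure mode described above can be shown with a toy model. This is only a sketch: one map stands in for the database and another for an entity cache, and nothing here models how any particular ORM behaves. All names are hypothetical. A total computed from cached copies goes stale once a row is changed behind the cache, and only matches the database again after a flush.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: one map plays the database table, the other an entity cache.
class StaleCacheSketch {
    final Map<Long, Integer> table = new HashMap<>(); // id -> amount, the "database"
    final Map<Long, Integer> cache = new HashMap<>(); // cached copies of rows

    // Reads through the cache, the way a cached entity lookup would.
    int load(long id) {
        return cache.computeIfAbsent(id, table::get);
    }

    // A write that bypasses the cache, e.g. another program instance or a bulk update.
    void updateBehindCache(long id, int amount) {
        table.put(id, amount);
    }

    // Total computed by walking cached entities.
    int cachedTotal() {
        int sum = 0;
        for (long id : table.keySet()) {
            sum += load(id);
        }
        return sum;
    }

    // Total computed by the "database" itself, like SELECT SUM(amount).
    int databaseTotal() {
        return table.values().stream().mapToInt(Integer::intValue).sum();
    }

    void flushCache() {
        cache.clear();
    }

    public static void main(String[] args) {
        StaleCacheSketch s = new StaleCacheSketch();
        s.table.put(1L, 10);
        s.table.put(2L, 20);
        System.out.println(s.cachedTotal());   // 30, both rows are now cached
        s.updateBehindCache(2L, 50);
        System.out.println(s.cachedTotal());   // still 30, the cached copy is stale
        System.out.println(s.databaseTotal()); // 60
        s.flushCache();
        System.out.println(s.cachedTotal());   // 60 after the forced flush
    }
}
```
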

Most systems I work(ed) on are primarily analytical systems working on
data that comes from surveys, measure devices, order tracking systems
etc. and fetching a record and storing it after manually changing some
attributes is not the common use case. The primary goal is to properly
manage such data at the record level behind the scenes while at the same
time allow thorough analysis of the overall process. Much of this
requires aggregates on joins that have to be built dynamically because
the user can specify exactly what he needs via a UI, akin to an OLAP
system.

I admit that this is not a good match for ORM but in these cases that
was hindsight wisdom. When they started they thought ORM was the right
tool. Which does not surprise me since I encounter many people who never
doubt ORM is the way to go with any system.
 
Arved Sandstrom

20 years, over 30 including Hobbyist
[ SNIP ]

Most of the guys that design and implement ORMs actually - oddly enough
- also understand performance. Just about all the time if you're having
a performance problem with an ORM it's because you didn't know what you
were doing. Don't feel bad: I've screwed up a few times too. I've not
met many programmers who had 20 or so years of professional experience,
that didn't have the chops to make them work, unless they simply hadn't
ever used ORMs much...which is OK.

Hassles when trying to relate data between tables? You mean more so than
trying to do that with JDBC or SQL?

AHS
 
Arne Vajhøj

By the way, my man, my youngest uncle was also called Arne. Happy NY.

The name is not unusual in Scandinavia. Just did a lookup: there are
13805 of us named Arne in Denmark alone.

Happy NY to you too.

Arne
 
Arne Vajhøj

I don't understand this so I fear I must be doing something wrong in
my Java programs. If someone wants to add a field in a database why do
I have to alter anything in my program other than adding, for example,
getString(String columnLabel) if I want to actually use the new field
at that point in the program.

The context is that the class is persisted in a relational
database.

If you are not using an ORM you need to write code to do the
database interface.
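
As a rough sketch of what that hand-written database interface looks like, assume a hypothetical Customer class persisted to a customer table (all names here are made up for illustration). Adding the email field touches the data class, every SQL statement, and every place that binds parameters or maps rows, which is the "N SQL statements, N places in the Java code" from the earlier list.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical data class and table; names are illustrative only.
class Customer {
    long id;
    String name;
    String email; // (1) the new persisted field starts in the data class
}

class CustomerDao {
    // (2) every SQL statement touching the table has to list the new column
    static final String INSERT_SQL =
        "INSERT INTO customer (id, name, email) VALUES (?, ?, ?)";
    static final String SELECT_SQL =
        "SELECT id, name, email FROM customer WHERE id = ?";
    static final String UPDATE_SQL =
        "UPDATE customer SET name = ?, email = ? WHERE id = ?";

    void insert(Connection con, Customer c) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(INSERT_SQL)) {
            ps.setLong(1, c.id);
            ps.setString(2, c.name);
            ps.setString(3, c.email); // (3) every parameter-binding site
            ps.executeUpdate();
        }
    }

    void update(Connection con, Customer c) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(UPDATE_SQL)) {
            ps.setString(1, c.name);
            ps.setString(2, c.email); // (3) again
            ps.setLong(3, c.id);
            ps.executeUpdate();
        }
    }

    Customer findById(Connection con, long id) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(SELECT_SQL)) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;
                }
                Customer c = new Customer();
                c.id = rs.getLong("id");
                c.name = rs.getString("name");
                c.email = rs.getString("email"); // (3) every row-mapping site
                return c;
            }
        }
    }

    public static void main(String[] args) {
        // No database is needed to see how many statements mention the new column.
        System.out.println(INSERT_SQL);
        System.out.println(SELECT_SQL);
        System.out.println(UPDATE_SQL);
    }
}
```

With an ORM the same change would, per the list quoted earlier, be limited to the field in the data class and its mapping.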

Arne
 
Arne Vajhøj

We can agree to disagree, Arne. I think ideas like C# LINQ and Scala
Squeryl are far in advance of Java.

I was able to write a Scala DSL a few years ago that would not have been
possible in Java. Similarly, C# - I think you'd have to admit - is quite
far ahead of Java.

I think we almost agree on the grading of the syntaxes.

But we may disagree on the weight given to syntax in an ORM
evaluation.

I just don't see syntax for queries as being important enough
to cause "light years ahead".

Everything else being equal, a nice syntax obviously creates
a winner.

Arne
 
Arne Vajhøj

I meant serialization in the more general sense. I am not talking about
Object(In/Out)putStream but about saving the exact state of an instance
to some addressable storage with the main purpose of restoring its state
later.

That is not the way the term serialization is normally used.

But it is a common requirement for both ORM and plain JDBC.
No, not data but instances. My point is that these are fundamentally
different.

Not really.

A data class as typically used by ORM's does contain data as the term
"data class" implies.

Arne
 
Arne Vajhøj

In theory the ORM approach does not need more caching than a RDBMS
driven approach. In practice this is not the case.

????

Some ORM's come without what at least in the Java world is known
as level 2 cache (cache of data outside of ongoing transaction).

But for those with level 2 cache, then I have never seen one
where it could not be disabled.

If you don't want it, then just turn it off.
The number of cases
where caching outside of the RDBMS is actually needed is very limited
and I rarely use any form of data caching.

????

Use of cache is essential for achieving good performance for many many
types of application no matter language, database or database access
technology.
Without exception all the ORM systems I worked on relied heavily on
caching or they would not be practically usable.

I don't know what ORM's you have worked with.

But the ORM's create the same SQL as handwritten SQL, so there is
no more and no less need for caching.

Furthermore among some of the most popular ORM's then level 2 cache is
either disabled by default (Hibernate) or not existing (EF).
You can do joins easily but that is not the big problem. The core
problem is that joins create new views on the underlying data that do
not match with the entity classes that match its underlying tables. To
represent the data from a join properly you would need a new class and
then you get into one of the biggest culprits with ORM: aliased data and
cache redundancy.

????

Any decent ORM just does the joins and loads the data, with no aliased data
(beyond where required due to data actually being the same also in the
database).

Arne
 
Arne Vajhøj

Yes, they got all the stuff working the first time with joins and
aggregates like you say. But after some time they got into trouble
because aliased data caused the systems to show faulty counts and totals
etc. and from there on it went down-hill. On more than one system they
had to forcefully flush the cache at specific moments to get the right
results for subsequent reporting. Which then did not work very promptly,
to put it mildly.

It has either been a horrible ORM or some horrible "experts".
Most systems I work(ed) on are primarily analytical systems working on
data that comes from surveys, measure devices, order tracking systems
etc. and fetching a record and storing it after manually changing some
attributes is not the common use case. The primary goal is to properly
manage such data at the record level behind the scenes while at the same
time allow thorough analysis of the overall process. Much of this
requires aggregates on joins that have to be built dynamically because
the user can specify exactly what he needs via a UI, akin to an OLAP
system.

I admit that this is not a good match for ORM but in these cases that
was hindsight wisdom. When they started they thought ORM was the right
tool. Which does not surprise me since I encounter many people who never
doubt ORM is the way to go with any system.

Ah.

ORM's are often good for transactional data (OLTP).

Not as obvious a choice for analytical data (DWH).

Arne
 
Arved Sandstrom

Well I'm not sure what kind of Java applications you write but in the
world I inhabit the client request "we just want to add one more field
to this form" is guaranteed to send a shiver down the spine as it's
usually followed by "How much?, for adding a single field? you gotta be
kidding".

Most BA, end user, customer, and PM requests send shivers down
developers' spines. :)
To add a single persisted field to an input form requires the
modification of most if not all of the layers of an application.

If it's actually a persisted field in an input form (various types of
data input mechanisms related to or from HTML, MS, Oracle, Adobe, IBM,
etc), I agree. That by definition affects _every_ layer.
Adding a single field to a product line for example would probably
require the modification of all the CRUD methods for that line, not to
mention the inclusion of the field in the search algorithms, modifying
the validation layer, incorporating the field in a logical place in the
HCI, modifying and refactoring the relevant code etc etc. Anything that
can reduce the amount of work required to implement such an apparently
trivial change has got to be a good idea.

From the implementation standpoint a persisted field addition, deletion
or modification, in a complete stack scenario, will consume at least a
man-day, and possibly several. It really won't make much of a difference
as to whether a Java ORM or JDBC is involved...not on average.

The actual problem with changes of this sort is that someone a bit
higher up than the grunt coder altered the requirements, or the design.
Proponents of agile have bent over backwards to accommodate these
inefficiencies: this is just letting non-technological people dictate
implementation.
Having said that, I don't use ORMs myself, I mean I've tried, I've
really tried but I start reading the documentation and my eyes glaze
over and I discover that I need to learn 'another' (query) language and
I decide that it's just not worth the effort. Stick to SQL, accept the
overhead and I don't think you can go far wrong.

Just my 2 euros worth

The JPA specs are some of the best specs I've seen. The associated specs
for the actual implementations are also quite good. If your eyes are
glazing over when you try to read these, I'm pretty sure you've got a
hell of a time with almost every other spec out there.

Feel free to stick to SQL and waste substantial time doing NIH. It could
just be me, but I like it when I get a substantial assist creating
objects out of relational data. You just now above said "accept the
overhead": guess what, there's a lot of overhead.

AHS
 
Arved Sandstrom

Does he mean near-exclusive use of 3NF?

On the occasions when I've stopped at 2NF where the database or 4GL
supports that and/or thrown a few derived fields such as account totals
etc. into the data model. I've usually regretted it.

I seldom find myself mapping data model entities 1:1 onto objects for two
reasons: firstly, because after spending quite a bit of time as a DB
designer/tuner/developer and writing a fair bit of SQL in the process I
tend to think in terms of using JDBC rather than a JPA and partly because
about the only times I've needed to handle arrays of objects that
represent entities they've ended up as components of an object
representing an ordered collection of the components, e.g. the collection
of GPS fixes representing a flight path. In these cases, an individual
component is of relatively little interest because you want to display or
analyze the collection as a whole and/or it's an intermediate data store
used as part of transforming the collection from one external
representation to another.
One of the sweetest things to me about JPA is constructor expressions. I
actually have met a few of the guys who came up with this: if there is
one concept in JPA that recognizes practical needs, and doesn't get too
theoretical, it's this one. It's much better, IMO, than the plethora of
other JPA techniques for producing objects.
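
For anyone who hasn't seen one, a constructor expression looks roughly like the sketch below. The entity name (PurchaseOrder), its fields and the result class are all assumptions made up for illustration. The JPQL is only held in a string here, so the class compiles without a JPA provider. In a real project the class after NEW would be written with its fully qualified name.

```java
// Hypothetical result class for one report row; it is not an entity.
class CustomerOrderSummary {
    final String customerName;
    final Long orderCount;      // JPQL COUNT yields a Long
    final Double averageTotal;  // JPQL AVG yields a Double

    CustomerOrderSummary(String customerName, Long orderCount, Double averageTotal) {
        this.customerName = customerName;
        this.orderCount = orderCount;
        this.averageTotal = averageTotal;
    }
}

class ConstructorExpressionSketch {
    // Each result row is handed to the constructor named after NEW.
    // PurchaseOrder, customer, name and total are assumed entity and field names.
    static final String JPQL =
        "SELECT NEW CustomerOrderSummary(c.name, COUNT(o), AVG(o.total)) "
      + "FROM PurchaseOrder o JOIN o.customer c "
      + "GROUP BY c.name";

    // With a JPA provider on the classpath this would be run roughly as:
    //   List<CustomerOrderSummary> rows =
    //       em.createQuery(JPQL, CustomerOrderSummary.class).getResultList();
    // where em is an EntityManager.

    public static void main(String[] args) {
        System.out.println(JPQL);
    }
}
```

The point is that the join and the aggregates run in the database, and what comes back is a list of small purpose-built objects rather than full entities.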

You're not the first person in this thread who referred to object
collections or aggregate values. I've not myself found straight SQL or
JDBC any better than JPA for doing this stuff: if your app is Java then
you ultimately need to translate into objects, and often object
collections. I don't care to continually write that boilerplate myself.

AHS
 
