Arne Vajhøj
If you can't manage relational databases, you shouldn't be messing with
big data.
Why not?
Hint: relational databases are not that common for big data.
Arne
If you can't manage relational databases, you shouldn't be messing with
big data.
It certainly wasn't the real Arved.
Faked headers too, from the look of it.
ORMs are good at what they were invented for: serializing an object and
resurrecting it at a later point in time.
That means you have to design
your system and its underlying data as a collection of objects with
(encapsulated) member data. Using that approach, the lifetime of an
object instance must be able to extend beyond the actual running span of the
program. That requires serialization/resurrection by definition.
Apart
from the fact that I think that this is a bad approach on its own, to
constrain memory usage, objects need to be put to sleep by default and be
resurrected only when they are accessed.
This makes the approach even
more blurred and needlessly complex, bringing stuff like caching and
managing/synchronizing duplicate object instances across concurrently
running program instances into the picture.
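A minimal sketch of the "put to sleep, resurrect on access" idea (lazy loading) in Java, assuming a hypothetical `Lazy` holder whose loader lambda stands in for a database read. Real ORMs typically use generated proxy classes rather than an explicit wrapper like this; the sketch only shows where the load actually happens.

```java
import java.util.function.Supplier;

// Hypothetical helper: holds a value that is only loaded ("resurrected") on first access.
final class Lazy<T> {
    private Supplier<T> loader; // stands in for e.g. reading a row from the database
    private T value;
    private boolean loaded;

    Lazy(Supplier<T> loader) {
        this.loader = loader;
    }

    // synchronized only coordinates threads inside this JVM; it does nothing
    // about copies of the same object held by other running program instances.
    synchronized T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
            loader = null; // release the loader once the value is resident
        }
        return value;
    }

    synchronized boolean isLoaded() {
        return loaded;
    }
}
```

Even in this toy form the extra state (loaded flag, loader, locking) is visible, and keeping two program instances' copies consistent is still left entirely unsolved.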
To make things worse, almost no system needs only single object
instances. Almost any practical system needs counts, averages etc. which
could be done with a query on an RDBMS or by traversing object instances
IF THEY WERE REAL INSTANCES. Since doing the latter with an ORM would
require resurrecting enormous numbers of instances, for practical reasons
you have to pour water into the wine and do atypical stuff like joins
and aggregate queries through the ORM.
I know they CAN do this but that
is no more than a wart on such systems, since it contradicts the primary
goal of an ORM.
This is also the area where ORMs failed in the projects
I talked about. It's not that the ORM cannot do it, it just cannot do
it sufficiently well even with help from the most experienced experts we
could find.
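To make the two options concrete, a sketch assuming a hypothetical `Order` record and an `orders` table: the traversal version needs every instance in memory before it can average anything, while the aggregate query lets the database return a single number.

```java
import java.util.List;

// Hypothetical entity, used only for illustration.
record Order(long id, String customer, double amount) {}

final class OrderStats {
    // Traversal: every Order must already have been loaded ("resurrected") into memory.
    static double averageByTraversal(List<Order> loadedOrders) {
        return loadedOrders.stream()
                .mapToDouble(Order::amount)
                .average()
                .orElse(0.0); // empty input: no orders, report 0
    }

    // Query: the database computes the aggregate and returns one row.
    // Table and column names are assumptions.
    static final String AVERAGE_SQL = "SELECT AVG(amount) FROM orders";
}
```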
On 12/30/13 5:38 AM, Silvio wrote:
[snip]
This is a good point, and it was something niggling in my subconscious.
This is where I've always struggled with ORMs, but I never consciously
acknowledged that the difficulty was in utilizing the power of the "R"
in the ORM.
I still think the best approach for most systems is to design a separate
and independent data store that covers the problem domain which is
completely isolated from the systems that implement data extractions,
processes and data storage. I do not manually write code to serialize
object instances since I do not serialize them in the first place. Such
a data store can be an RDBMS but if so desired a NoSQL thingy or even a
file system could do well. Using an RDBMS gives the additional advantage
that the data is readily accessible for standard reporting and ETL tools.
This is one approach. I think one of the major "features" of most ORM
implementations is that they attempt to abstract away the actual RDBMS
layer to the point where you feel "dirty" trying to access it in any
meaningful way. This does provide some value in portability, but many
applications rarely need this portability of RDBMS, and more often
benefit from special features of the particular RDBMS chosen.
If I use an ORM I expect that it can reasonably deal with deep structures.
Normalizing the data model at the object level is exactly what I do not want.
Hmm, you see ORM not as a function extension but simply as a more
convenient API to an RDBMS. That's OK. But maybe _O_RM does not hit the
nail on the head in this case.
Well, maybe it fits because we mostly think in solutions rather than in
requirements. This already applies to customer requests. It is often
difficult to distinguish between real requirements and the customer's
creative ideas about how to solve his issues.
Of course, that's always true.
But in my experience, success is not mainly a question of whether you have
gone one way or the other. It is more a question of whether you have done it
well and whether you have enough experience with the tools you are using.
Well, XML is a document too, and most of the time it is well structured.
I do not really have experience with NoSQL databases. But I have used non-relational
data models and in-memory computing for about 6 years now in
different projects. None of the projects failed; all are still live.
Also, we did not really save resources because of the decisions made. But
from the code maintenance and performance points of view it was
successful. Some change requests to the first of these projects were
implemented by an apprentice in half an hour. That would have taken a
qualified programmer a few days if we had chosen a relational data
model.
In the last project, a larger one, we have significant performance
benefits. There is an adjacent third-party application with a very
similar data model from the customer's point of view. (They are better
with respect to inheritance, we are better with respect to deep
structures, but both can deal with polymorphism, table properties and so on.)
They ended up with solution d). We use XML documents. Measurements show
a performance difference of about three orders of magnitude, measured in time
per object access. We have two CPU cores at <30% load when users generate
traffic. They have 16 CPUs at 80-100% load when users generate traffic. The
number of objects and attributes is comparable. In fact the data is
partially synchronized by interfaces. Both are Web applications. The
number of users is comparable.
OK, they have chosen PHP (used in an object-oriented way), we have .NET
3.5. But that does not explain all three orders of magnitude. Their
system creates heavy load on the large attribute values table.
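The thread does not spell out the third-party design, but a "large attribute values table" usually suggests an entity-attribute-value layout. Assuming that, and using hypothetical names throughout, this sketch contrasts rebuilding one object from one row per attribute with reading it from a single XML document. It says nothing about the measured numbers above; it only shows where the extra row lookups per object access come from.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// One row of a generic attribute-values table (entity-attribute-value layout, assumed).
record AttributeValue(long objectId, String attribute, String value) {}

final class ObjectLoading {
    // EAV: one object is reassembled from one row per attribute.
    static Map<String, String> fromAttributeRows(List<AttributeValue> table, long objectId) {
        Map<String, String> object = new HashMap<>();
        for (AttributeValue row : table) { // a database would use an index on objectId here
            if (row.objectId() == objectId) {
                object.put(row.attribute(), row.value());
            }
        }
        return object;
    }

    // Document: the whole object is one stored value, parsed in one step.
    static Map<String, String> fromXmlDocument(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Map<String, String> object = new HashMap<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    object.put(child.getNodeName(), child.getTextContent());
                }
            }
            return object;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parsable document", e);
        }
    }
}
```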
Marcel
Why not?
Hint: relational databases are not that common for big data.
Arne
We can agree to disagree, Arne. I think ideas like C# LINQ and Scala
Squeryl are far in advance of Java.
Much cleaner syntax.
But I am not convinced that the syntax is sufficiently important
to translate that into "light years ahead".
Arne
Arne Vajhøj said:
Let us say that you need to add a field.
With an ORM you only need to update:
* one data class
* one mapping of data
With plain JDBC you need to change:
* one data class
* N SQL statements
* N places in the Java code
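A hypothetical sketch of what the "one mapping" buys, done by hand with plain JDBC: if every SQL statement is derived from a single column list, adding a field on the SQL side means adding one name to that list. The table and column names are assumptions, and this is not how any particular ORM is implemented.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical single column mapping for a "product" table.
final class ProductMapping {
    static final String TABLE = "product";
    static final String ID = "id";
    // The single list of mapped columns; adding a field means adding one name here.
    static final List<String> COLUMNS = List.of("name", "price");

    private static List<String> allColumns() {
        return Stream.concat(Stream.of(ID), COLUMNS.stream()).collect(Collectors.toList());
    }

    static String insertSql() {
        List<String> all = allColumns();
        String placeholders = all.stream().map(c -> "?").collect(Collectors.joining(", "));
        return "INSERT INTO " + TABLE + " (" + String.join(", ", all) + ") VALUES (" + placeholders + ")";
    }

    static String selectByIdSql() {
        return "SELECT " + String.join(", ", allColumns()) + " FROM " + TABLE + " WHERE " + ID + " = ?";
    }

    static String updateSql() {
        String assignments = COLUMNS.stream().map(c -> c + " = ?").collect(Collectors.joining(", "));
        return "UPDATE " + TABLE + " SET " + assignments + " WHERE " + ID + " = ?";
    }
}
```

The value-binding code (the `PreparedStatement` setters and `ResultSet` getters) would still have to change per field; that remaining "N places in the Java code" is the part an ORM's mapping takes over.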
Storing objects in a relational database via an ORM is very different
from serialization (for non-trivial usage).
Serialization stores everything in a sequential stream of data.
Storing objects in a relational database via an ORM stores the parts
that are not already stored, spread across different tables.
Using a document store has some similarities with serialization.
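For comparison, this is serialization in the narrow `java.io` sense: the whole object graph goes into one sequential byte stream. `Invoice` and `InvoiceLine` are hypothetical classes; `ObjectOutputStream` and `ObjectInputStream` are the standard JDK classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain classes; Serializable is the standard java.io marker interface.
class InvoiceLine implements Serializable {
    private static final long serialVersionUID = 1L;
    final String product;
    final int quantity;

    InvoiceLine(String product, int quantity) {
        this.product = product;
        this.quantity = quantity;
    }
}

class Invoice implements Serializable {
    private static final long serialVersionUID = 1L;
    final String customer;
    final List<InvoiceLine> lines = new ArrayList<>();

    Invoice(String customer) {
        this.customer = customer;
    }
}

final class SerializationDemo {
    // Writes the whole graph (invoice plus all its lines) into ONE sequential byte stream.
    static byte[] serialize(Invoice invoice) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(invoice);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the graph back; the result is a new instance with the same state.
    static Invoice deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Invoice) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An ORM would instead typically map `Invoice` and `InvoiceLine` to two tables, with each line stored as its own row.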
No.
It requires the ability to save and load data.
That is how persistence works. The data are on disk and when a program
needs them they are read from disk to memory.
But that is not in any way ORM specific.
Plain JDBC will have the same potential issues with caching.
????
Joins are a core feature of an ORM.
Aggregate functions are not very ORM'ish.
But if they are used a lot in the transactional work that ORM
is intended for, then something is wrong in the first place.
????
Joins are a core feature of ORM.
And common Java ORMs like JPA implementations and Hibernate do
aggregate functions exactly like SQL, in JPQL and HQL respectively.
I wonder what kind of "experts" you got.
Arne
I meant serialization in the more general sense. I am not talking about
Object(In/Out)putStream but about saving the exact state of an instance
to some addressable storage with the main purpose of restoring its state
later.
Serialization literally means to put an object into a serial form. I
think you're trying to use it to mean something close to marshalling.
http://en.wikipedia.org/wiki/Marshalling_(computer_science)
Just a thought.
[ SNIP ]
20 years, over 30 including Hobbyist
By the way, my man, my youngest uncle was also called Arne. Happy New Year.
I don't understand this, so I fear I must be doing something wrong in
my Java programs. If someone adds a field to a database table, why do
I have to alter anything in my program other than adding, for example,
a getString(String columnLabel) call at the point in the program where
I actually want to use the new field?
We can agree to disagree, Arne. I think ideas like C# LINQ and Scala
Squeryl are far in advance of Java.
I was able to write a Scala DSL a few years ago that would not have been
possible in Java. Similarly, C#, I think you'd have to admit, is quite
far ahead of Java.
I meant serialization in the more general sense. I am not talking about
Object(In/Out)putStream but about saving the exact state of an instance
to some addressable storage with the main purpose of restoring its state
later.
No, not data but instances. My point is that these are fundamentally
different.
In theory the ORM approach does not need more caching than an
RDBMS-driven approach. In practice this is not the case.
The number of cases
where caching outside of the RDBMS is actually needed is very limited
and I rarely use any form of data caching.
Without exception, all the ORM systems I worked on relied heavily on
caching or they would not be practically usable.
You can do joins easily but that is not the big problem. The core
problem is that joins create new views on the underlying data that do
not match the entity classes that map to their underlying tables. To
represent the data from a join properly you would need a new class, and
then you get into one of the biggest culprits with ORM: aliased data and
cache redundancy.
Yes, they got all the stuff working the first time with joins and
aggregates like you say. But after some time they got into trouble
because aliased data caused the systems to show faulty counts and totals
etc. and from there on it went down-hill. On more than one system they
had to forcefully flush the cache at specific moments to get the right
results for subsequent reporting. Which then did not work very promptly,
to put it mildly.
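A hypothetical, deliberately simplified illustration of the aliasing problem: the same order amount is held both in an entity cache and, pre-aggregated, in a cached join result. Updating the entity leaves the cached total stale until the projection cache is flushed. None of these names correspond to a real ORM's cache API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical caches: the same fact (an order amount) is stored twice.
final class AliasedCacheDemo {
    final Map<Long, Double> orderAmountCache = new HashMap<>();     // entity cache: orderId -> amount
    final Map<Long, String> orderCustomer = new HashMap<>();        // entity cache: orderId -> customer
    final Map<String, Double> customerTotalCache = new HashMap<>(); // cached join/aggregate result

    void loadOrder(long id, String customer, double amount) {
        orderAmountCache.put(id, amount);
        orderCustomer.put(id, customer);
    }

    // Computes the projection from the entity cache once, then serves it from its own cache.
    double customerTotal(String customer) {
        return customerTotalCache.computeIfAbsent(customer, c -> orderCustomer.entrySet().stream()
                .filter(e -> e.getValue().equals(c))
                .mapToDouble(e -> orderAmountCache.get(e.getKey()))
                .sum());
    }

    // Updates the entity but, like the failure described above, not the cached projection.
    void updateAmount(long id, double newAmount) {
        orderAmountCache.put(id, newAmount);
    }

    // The "forcefully flush the cache" workaround.
    void flushProjections() {
        customerTotalCache.clear();
    }
}
```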
Most systems I work(ed) on are primarily analytical systems working on
data that comes from surveys, measuring devices, order tracking systems
etc., and fetching a record and storing it after manually changing some
attributes is not the common use case. The primary goal is to properly
manage such data at the record level behind the scenes while at the same
time allowing thorough analysis of the overall process. Much of this
requires aggregates on joins that have to be built dynamically because
the user can specify exactly what he needs via a UI, akin to an OLAP
system.
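A sketch of building such a query dynamically from the dimensions and measure a user picks in the UI, assuming hypothetical `orders`, `customer` and `product` tables. Identifiers are checked against a whitelist because JDBC parameters can only bind values, not column names, so user input must never be concatenated into the SQL directly.

```java
import java.util.List;
import java.util.Set;

// Hypothetical OLAP-style query builder; all table, column and alias names are assumptions.
final class DynamicAggregateQuery {
    private static final Set<String> DIMENSIONS = Set.of("c.region", "p.category", "o.order_month");
    private static final Set<String> MEASURES = Set.of("COUNT(*)", "SUM(o.amount)", "AVG(o.amount)");

    static String build(List<String> dimensions, String measure) {
        if (dimensions.isEmpty()) {
            throw new IllegalArgumentException("at least one dimension required");
        }
        for (String d : dimensions) {
            if (!DIMENSIONS.contains(d)) {
                throw new IllegalArgumentException("unknown dimension: " + d);
            }
        }
        if (!MEASURES.contains(measure)) {
            throw new IllegalArgumentException("unknown measure: " + measure);
        }
        String dims = String.join(", ", dimensions);
        return "SELECT " + dims + ", " + measure + " AS measure_value"
                + " FROM orders o"
                + " JOIN customer c ON c.id = o.customer_id"
                + " JOIN product p ON p.id = o.product_id"
                + " GROUP BY " + dims;
    }
}
```

Each such result has its own shape, which is exactly the kind of ad hoc view that has no matching entity class.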
I admit that this is not a good match for ORM but in these cases that
was hindsight wisdom. When they started they thought ORM was the right
tool. Which does not surprise me since I encounter many people who never
doubt ORM is the way to go with any system.
Well, I'm not sure what kind of Java applications you write, but in the
world I inhabit the client request "we just want to add one more field
to this form" is guaranteed to send a shiver down the spine, as it's
usually followed by "How much? For adding a single field? You gotta be
kidding".
To add a single persisted field to an input form requires the
modification of most if not all of the layers of an application.
Adding a single field to a product line, for example, would probably
require the modification of all the CRUD methods for that line, not to
mention the inclusion of the field in the search algorithms, modifying
the validation layer, incorporating the field in a logical place in the
HCI, modifying and refactoring the relevant code, etc., etc. Anything that
can reduce the amount of work required to implement such an apparently
trivial change has got to be a good idea.
Having said that, I don't use ORMs myself. I mean, I've tried, I've
really tried, but I start reading the documentation and my eyes glaze
over and I discover that I need to learn 'another' (query) language and
I decide that it's just not worth the effort. Stick to SQL, accept the
overhead and I don't think you can go far wrong.
Just my 2 euros worth
One of the sweetest things to me about JPA is constructor expressions. I
[snip]
Does he mean near-exclusive use of 3NF?
On the occasions when I've stopped at 2NF, where the database or 4GL
supports that, and/or thrown a few derived fields such as account totals
etc. into the data model, I've usually regretted it.
I seldom find myself mapping data model entities 1:1 onto objects, for two
reasons: firstly, because after spending quite a bit of time as a DB
designer/tuner/developer and writing a fair bit of SQL in the process, I
tend to think in terms of using JDBC rather than JPA; and secondly, because
about the only times I've needed to handle arrays of objects that
represent entities, they've ended up as components of an object
representing an ordered collection of the components, e.g. the collection
of GPS fixes representing a flight path. In these cases an individual
component is of relatively little interest, because you want to display or
analyze the collection as a whole and/or it's an intermediate data store
used as part of transforming the collection from one external
representation to another.
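A sketch of the flight-path case, assuming a hypothetical `Fix` record and a spherical Earth with a mean radius of 6371 km: the fixes exist only as an ordered component list, and the interesting operation (total path length via the haversine formula) works on the collection as a whole rather than on any single fix.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical GPS fix in decimal degrees.
record Fix(double latDeg, double lonDeg) {}

// Hypothetical container: fixes are components of one ordered collection.
final class FlightPath {
    private static final double EARTH_RADIUS_KM = 6371.0; // mean radius, spherical-Earth assumption
    private final List<Fix> fixes = new ArrayList<>();

    void add(Fix fix) {
        fixes.add(fix);
    }

    List<Fix> fixes() {
        return Collections.unmodifiableList(fixes);
    }

    // Total great-circle distance along the path, summed segment by segment.
    double lengthKm() {
        double total = 0.0;
        for (int i = 1; i < fixes.size(); i++) {
            total += haversineKm(fixes.get(i - 1), fixes.get(i));
        }
        return total;
    }

    private static double haversineKm(Fix a, Fix b) {
        double lat1 = Math.toRadians(a.latDeg());
        double lat2 = Math.toRadians(b.latDeg());
        double dLat = lat2 - lat1;
        double dLon = Math.toRadians(b.lonDeg() - a.lonDeg());
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(lat1) * Math.cos(lat2) * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
    }
}
```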