ORMs comparisons/complaints.

Arved Sandstrom

2014-01-03 14:10, Arved Sandstrom wrote:

The more usual Swedish variant would be Arvid (Sandström).
I did not actually know that about my first name, thank you. My surname
does actually have the umlaut: my parents, myself and my sisters rarely
spell it that way, so we don't have to carefully pronounce it to
Anglo-Saxons or French. Very few folks in North America can actually
correctly pronounce Estonian names. :)

AHS
 
Arved Sandstrom

Marcel said:
[...quite a bit of interesting stuff...]

Choosing this as a more-or-less arbitrary point to jump in, I have a question.

This is an elegant and diplomatic way of saying, there are so many
frigging long posts already, I can't keep track. :)

You should work for the UN perhaps.
Given an existing relational database (or set of databases) that:

1) Is/are central to the business (i.e. not there just to "support" some
specific activity).

2) Is a fairly good relational expression of the underlying business model (but
see (3) and (7)).

3) Is old, old, old...

4) Is used by more than one application, which are not all written in the same
language

5) Has severe performance problems (caused by the nature of the business) even
when accessed via hand-written SQL, such that there is a fairly large amount of
explicit denormalisation, and the write-measure-rewrite loop is a central part
of normal, everyday, application development/maintenance. And, even so, many of
the most important processes take long enough to cause very real operational
difficulties for customers.

6) Is not simple, in the sense that most of the "entities" that a user would
think in terms of do not correspond to rows of data in single tables.

7) Has historical quirks (see 3). Such as: no primary keys (the data has
primary keys at the logical level, but -- due to history -- the physical DBMS
itself doesn't know what they are, it just knows about a bunch of unique
indexes). No FK constraints (again they are there in the logical model, but
not expressed physically).

Is my feeling that even attempting to use an ORM for developing brand new
applications against the database would be impossible and/or massively
counter-productive, actually defensible ? I.e: am I right to dismiss ORM /out
of hand/ ?

(Incidentally, of the relatively few databases I've come across in 30 years of
programming, many fit that pattern).

-- chris
You're not wrong in general, Chris. I just this past year saw a database
that had no primary keys, and about 4 or 5 FKs, and about 50 tables. As
you may surmise, it was about as denormalized and non-validated as you
can imagine.

About 6 or 7 years ago an Adabas database that I had to work with was
automatically translated to an RDBMS: this was not very successful - the
subsequent attempt to use TopLink on it was painful.

And I've seen plenty of Oracle, Sybase, MS SQL Server, MySQL, Postgres
etc setups which prove your point #5...as in, any means of accessing the
DB would be difficult. Having said that, I still believe that an ORM is
usually better than a lower-level API for accessing good *or* bad databases.

AHS
 
Arne Vajhøj

You are right. We are in an intermediate situation where relational DBMS
are integrated into backup, service and management processes by far
better than the new OO or document based DBMS. And of course, the
administrators have experience with them.
Because of this I did the last larger project still with MSSQL (in
snapshot isolation mode). But the data model did not depend on the DB. I
simply store XML documents with revision information. In fact the DBMS
backend was activated about 4 months after the project started. So when
the time comes we could easily move to Mongo or whatever.

XML in either XML or CLOB columns is one way to combine the operational
advantages of the RDBMS with a NoSQL approach to data.

But you will still lack tool support. No support from reporting
software. And no tools for adhoc work - you will need to code everything.
What I miss far more is a standard for how to use NoSQL DBMSs. So everybody
writes their own framework with its own advantages and disadvantages.

It is rather chaotic now. Many incompatible libraries. And some of the
libraries are of bad quality.

There are some attempts to remedy that in the Java world. Many
JPA providers are starting to support NoSQL databases.

Even though JPA is intended as an ORM then it can be made to work
with NoSQL databases as well.

And it is a standardized API.

MS could do the same with EF if they wanted to.
COBOL, dark chapter - from the present point of view. And SAP extended
the COBOL style by ABAP.

Legacy stuff is a reality whether we like it or not.
[performance]
It should be mandatory for developers writing code using ORM to do some
testing with the logging of actual SQL statements executed turned on. It
reveals what is going on without having to know so much about the inside
of the ORM.

Well, true, but from the programmer's point of view, I do not always want
to go that deep into the framework. This is like digging down to the
assembler level in former times.
However, at least when doing functional programming (without a functional
language) performance checks have to be done by some means or other. As
a rule of thumb, the number of queries placed on the backend should be
bounded for each user interaction, i.e. no queries in a loop. If you
follow this rule, reasonable indices and the optimizer can usually do
their job.
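
A minimal Java sketch of the "no queries in a loop" rule of thumb, for illustration only. `FakeDb` is a hypothetical in-memory stand-in that merely counts round trips; no real driver or ORM API is involved, and the SQL in the comments is only what each call is meant to model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a database; it only counts round trips.
class FakeDb {
    private final Map<Integer, String> customers = new HashMap<>();
    int roundTrips = 0;

    FakeDb() {
        customers.put(1, "Ann");
        customers.put(2, "Bob");
        customers.put(3, "Cid");
    }

    // Models: SELECT name FROM customer WHERE id = ?
    String findName(int id) {
        roundTrips++;
        return customers.get(id);
    }

    // Models: SELECT id, name FROM customer WHERE id IN (...)
    Map<Integer, String> findNames(List<Integer> ids) {
        roundTrips++;
        Map<Integer, String> found = new HashMap<>();
        for (Integer id : ids) {
            String name = customers.get(id);
            if (name != null) {
                found.put(id, name);
            }
        }
        return found;
    }
}

class QueryInLoopDemo {
    // Query inside a loop: round trips grow with the number of ids.
    static int roundTripsLooped(FakeDb db, List<Integer> ids) {
        List<String> names = new ArrayList<>();
        for (Integer id : ids) {
            names.add(db.findName(id));
        }
        return db.roundTrips;
    }

    // One batched query: a single round trip regardless of the number of ids.
    static int roundTripsBatched(FakeDb db, List<Integer> ids) {
        Map<Integer, String> names = db.findNames(ids);
        return db.roundTrips;
    }
}
```

With three ids the looped variant makes three round trips and the batched variant makes one; the gap widens linearly with the input size.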

It is like many other things.

One writes some readable code.

If a performance test shows a real performance problem then one
investigates.

If one is using an ORM, then checking what SQL it generates is
an obvious first step.
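
As a sketch of what "checking the generated SQL" can look like, assuming Hibernate is the JPA provider (other providers use different property names): the three keys below are standard Hibernate settings, while the class and method names are made up for the example.

```java
import java.util.Properties;

// Illustrative helper; the class and method names are hypothetical.
class SqlLoggingConfig {
    static Properties hibernateSqlLogging() {
        Properties props = new Properties();
        // Print every SQL statement Hibernate executes.
        props.setProperty("hibernate.show_sql", "true");
        // Pretty-print the SQL so long joins stay readable.
        props.setProperty("hibernate.format_sql", "true");
        // Add comments saying which query or entity produced the SQL.
        props.setProperty("hibernate.use_sql_comments", "true");
        return props;
    }
}
```

Such a map could be passed as the second argument of `Persistence.createEntityManagerFactory(unitName, props)`, where the persistence unit name is whatever the application defines.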
There is always a way (with unlimited resources :) ). But the question
is about maintainability and TCO. And the answer is not always clear.
Even small bugs or missing features in the ORM layer may result in
fragile workarounds that collapse on small code changes.

I would prefer investing in standards around NoSQL over
driving ORM to the limits. The first has a future, the latter is an
intermediate solution. Of course, both have their places for now.

I doubt that relational databases will go away.

Arne
 
Arne Vajhøj

Well, an XML is a document too, and most of the time it is well structured.

It can be but does not need to be.

If there is a schema defining the format and data get validated against
it then it is well structured.

That also carries some of the same restrictions as relational databases
even though XML schemas allow for much more flexible structures
than database tables.

If there is no schema or validation then no structure is
enforced.
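
A small Java sketch of the "schema plus validation" case, using the JDK's `javax.xml.validation` API. The schema and the `order`/`quantity` element names are invented for the example and are not taken from any project discussed in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XmlSchemaCheck {
    // Hypothetical schema: an <order> holding exactly one integer <quantity>.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='order'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='quantity' type='xs:int'/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns true only if the document is well-formed and matches the schema.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Validation (or parsing) failed: the structure is not what the schema demands.
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Without the validation step both a numeric and a non-numeric `<quantity>` would be stored happily, which is the "no structure enforced" situation.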
I do not really have experience with NoSQL databases. But I used non
relational data models and in memory computing now for about 6 years in
different projects. None of the projects failed, all are still live.
Also we did not really save resources because of the decisions made. But
from the code maintenance and from the performance point of view it was
successful. Some change requests to the first of these projects were
implemented by an apprentice in half an hour. This would have taken a
few days by a qualified programmer, if we had chosen a relational data
model.

In the last project - a larger one - we have significant performance
benefits. There is an adjacent third party application with a very
similar data model from the customer's point of view. (They are better
with respect to inheritance, we are better with respect to deep
structures but both can deal with polymorphism, table properties and so on.)
They ended up with solution d). We use XML documents. Measurements show
about three orders of magnitude performance difference, measured in time
per object access. We have two CPU cores with <30% load if users make
traffic. They have 16 CPU at 80-100% load if users make traffic. The
number of objects and attributes is comparable. In fact the data is
partially synchronized by interfaces. Both are Web applications. The
number of users is comparable.
OK, they have chosen PHP (used in an object oriented way), we have .NET
3.5. But this will not explain all the 3 orders of magnitude. Their
system creates heavy load on the large attribute values table.

There are plenty of cases where relational databases are not the best
solution.

Google, Facebook, Yahoo etc. did not go NoSQL just for fun.

But I would still consider a relational database to be the default
for persistence - what you use unless you have specific reasons
not to.

Arne
 
Arved Sandstrom

On 01/03/2014 10:56 PM, Arne Vajhøj wrote:
[ SNIP ]
There are plenty of cases where relational databases are not the best
solution.

Google, Facebook, Yahoo etc. did not go NoSQL just for fun.

But I would still consider a relational database to be the default
for persistence - what you use unless you have specific reasons
not to.

Arne

Quite a few important companies (so probably many more minor companies)
played with NoSQL for a bit, and then have backed off in various
degrees. Nobody went for NoSQL just for fun...most went for it because
they were sucked in by hype or misguided opinion. I'd estimate that the
majority of people have returned to RDBMSs. The fact that a number of
large companies have systems that are not relational is because you
can't throw away a few tens or hundreds of millions of $$ in a few
months or years.

NoSQL is based on relaxing ACID, particularly consistency. You do that
by cheating your end users through increasing UI speed, while at the
same time decreasing the validity of your data.

AHS
 
Lars Enderin

2014-01-03 23:34, Arved Sandstrom wrote:
I did not actually know that about my first name, thank you. My surname
does actually have the umlaut: my parents, myself and my sisters rarely
spell it that way, so we don't have to carefully pronounce it to
Anglo-Saxons or French. Very few folks in North America can actually
correctly pronounce Estonian names. :)

There were 20751 Arvids vs 25 Arveds in Sweden last year.
 
Joerg Meier

I did not actually know that about my first name, thank you. My surname
does actually have the umlaut: my parents, myself and my sisters rarely
spell it that way, so we don't have to carefully pronounce it to
Anglo-Saxons or French.

Also having an Umlaut in my Name (Jörg), I learned that the appropriate way
to spell ä ö ü is ae, oe, ue, when you don't want or can't use the Umlaut,
although I can see how "Sandstroem" wouldn't lend itself to easy
pronunciation :)

Computer programs also still have a surprising amount of trouble with
simple European characters, my mum had both an ö and a ß in her name, and
we ended up with a surprising amount of folders in C:\Users, because
apparently, a lot of programs not only access that location wrong, but also
have different wrong ways to express those characters. I think the record
was held by my sister, who had 3 Umlaute in her name, and ended up with an
impressive 17 different "C:\Users\First Last" combinations (although that
was the last generation of Windows, I just don't remember off hand where
Windows XP had those folders).

Liebe Gruesse,
Joerg
 
Arne Vajhøj

I've used MySQL in the past to throw together a prototype/proof of
concept. I can download, install, configure and populate a simple
database, knock up a quick Servlet/HTML interface and have something
basic working in a few hours on my laptop that I can carry around with
me, call it a throwaway prototype if you like. I had a look at the
Oracle download, 2.5GB, MySQL is 290MB, that's lightweight :)

You can get some big Oracle installers.

But for this purpose Oracle XE seems sufficient.

And that is a lot less (only 327 MB for 11.2 for Win32).

Arne
 
Arne Vajhøj

Also having an Umlaut in my Name (Jörg), I learned that the appropriate way
to spell ä ö ü is ae, oe, ue, when you don't want or can't use the Umlaut,
although I can see how "Sandstroem" wouldn't lend itself to easy
pronunciation :)

It is the official way in many contexts.

http://denmark.usembassy.gov/non-immigrant_visas/ds-160-online-application-instructions.html

<quote>
If your name, address or other information needed to complete the visa
application form is spelled with the Danish letters æ, ø or å, you
should write ae, oe or aa.
</quote>

http://finland.usembassy.gov/faq_ds-160.html

<quote>
Finnish/Swedish characters ä, ö, and å should be typed as ae, oe and aa
respectively.
</quote>

Arne
 
Arne Vajhøj

Well that's an(other) interesting point.
How many times in your career have you actually changed databases
mid-stream? I'm not talking about prototyping, where you may use a
simple lightweight database almost as a stub for prototyping purposes.
I'm talking about the situation where you have a working application
that's in the wild and making money, suddenly it is decided to change
the database, say move from Oracle to a Microsoft product. The
investment in the existing technology will be immense, switching vendors
will require a huge additional investment in training, licensing blah
blah blah ... yet this seems to be an oft quoted reason for using ORMs
... I have never experienced this myself. How often does it really happen?

It happens occasionally. A company changes strategy. The company is
acquired by another company or part of the company is spun off, which
results in a new set of strategies. It is not that frequent though.

But it is not so important for Arved's point, because there are
other cases where developers need to cope with different SQL
dialects:
* the company uses more than one database (external customers using
different databases, different internal departments using different
databases) requiring developers to use different databases for
different products/projects
* developers changing jobs, where the new company uses a different database
than the old company. For consultants/freelancers that happens
very frequently.

The majority of Java developers would use several different databases
during a 10 year period.

Arne
 
Arne Vajhøj

On 01/03/2014 10:56 PM, Arne Vajhøj wrote:
[ SNIP ]
There are plenty of cases where relational databases are not the best
solution.

Google, Facebook, Yahoo etc. did not go NoSQL just for fun.

But I would still consider a relational database to be the default
for persistence - what you use unless you have specific reasons
not to.

Quite a few important companies (so probably many more minor companies)
played with NoSQL for a bit, and then have backed off in various
degrees. Nobody went for NoSQL just for fun...most went for it because
they were sucked in by hype or misguided opinion. I'd estimate that the
majority of people have returned to RDBMSs. The fact that a number of
large companies have systems that are not relational is because you
can't throw away a few tens or hundreds of millions of $$ in a few
months or years.

NoSQL is based on relaxing ACID, particularly consistency. You do that
by cheating your end users through increasing UI speed, while at the
same time decreasing the validity of your data.

There is a lot more to it than that.

NoSQL is really many different things.

There are some large distributed database systems. There are some
which I would call modern replacements for traditional ISAM files.

Some are AP. Some are CP. CouchDB and Neo4J are ACID.

The big internet companies did not go to NoSQL due to the hype - as
the hype was created by them going to NoSQL and they are not staying
with NoSQL to avoid a large write off. They went to and are
staying with NoSQL, because the relational databases can not do
the job. Relational databases and PB just do not work performance
and cost wise.

I am sure that there is a large portion of smaller companies that went
NoSQL (typically something "ISAM like") that did so due to the hype. And
I am sure that some of them regretted that they did so.

But persistence is not one size fits all. Different requirements can
lead to different solutions.

C is critical in many contexts (like money transactions). But there are
other contexts where it does not matter (does it matter that Facebook
users in Mexico see an update 5 minutes later than the Facebook users
in Finland? No!).

There are quite a few of those C & ACID relational databases where
for performance reasons the web apps in front actually cache and
use data for N seconds even though the data may have been updated. In
that case they are paying for C & ACID for no reason.

All that said then I still consider relational the default choice.
Especially for small startups. One gets a lot of functionality and
features that may be useful. If later a need for something with
less functionality & features but higher performance for config
shows up, then it can be valid to change.

Or to put it another way: to me choosing NoSQL is an optimization
and general rules about premature optimization apply.

Arne
 
Arved Sandstrom

Also having an Umlaut in my Name (Jörg), I learned that the appropriate way
to spell ä ö ü is ae, oe, ue, when you don't want or can't use the Umlaut,
although I can see how "Sandstroem" wouldn't lend itself to easy
pronunciation :)

Much of Estonian doesn't. :) There are even some Estonians in
particular dialects who can't correctly pronounce o~, for example.

Estonian not only ranks very highly as having words with very many
consecutive vowels, it may possibly also have a few words with the most
consecutive umlauts.

AHS
 
Arne Vajhøj

Interesting: that is one I'd specifically avoid because it, along with
Oracle, has/had some of the most nonstandard SQL syntax of any RDBMS.

If you are good with plain SQL-92 then any recent MySQL using default
engine type should be OK.

Some stuff from newer SQL versions like CTE, XML data type etc. are
missing.

You should not change the engine type from default (InnoDB) to MyISAM as
that would break a lot of standard features.

If you insist on using very funky names then start MySQL with --ansi
to use standard quoted identifiers.

Arne
 
Arved Sandstrom

On 01/03/2014 10:56 PM, Arne Vajhøj wrote:
[ SNIP ]
There are plenty of cases where relational databases are not the best
solution.

Google, Facebook, Yahoo etc. did not go NoSQL just for fun.

But I would still consider a relational database to be the default
for persistence - what you use unless you have specific reasons
not to.

Quite a few important companies (so probably many more minor companies)
played with NoSQL for a bit, and then have backed off in various
degrees. Nobody went for NoSQL just for fun...most went for it because
they were sucked in by hype or misguided opinion. I'd estimate that the
majority of people have returned to RDBMSs. The fact that a number of
large companies have systems that are not relational is because you
can't throw away a few tens or hundreds of millions of $$ in a few
months or years.

NoSQL is based on relaxing ACID, particularly consistency. You do that
by cheating your end users through increasing UI speed, while at the
same time decreasing the validity of your data.

There is a lot more to it than that.

NoSQL is really many different things.
There are some large distributed database systems. There are some
which I would call modern replacements for traditional ISAM files.

Some are AP. Some are CP. CouchDB and Neo4J are ACID.

Well, we know at least one thing, NoSQL is not SQL...by definition. :)

What we should really do is get rid of the term NoSQL, since it
literally means "no SQL". We also have the problem that originally
"NoSQL" did refer to non-relational data stores that relaxed ACID
guarantees. Many, perhaps most, still do have relaxed guarantees. Some
NoSQL systems are ACID, I agree with you, but sometimes you have to make
the extra effort to obtain that functionality.

I may date myself here. When people started producing "NoSQL" databases,
it was a large part of the definition that ACID guarantees were relaxed.
This is just historical fact. That some people started restoring ACID to
NoSQL doesn't mean that relaxation of consistency isn't still often a
key concept.
The big internet companies did not go to NoSQL due to the hype - as
the hype was created by them going to NoSQL and they are not staying
with NoSQL to avoid a large write off. They went to and are
staying with NoSQL, because the relational databases can not do
the job. Relational databases and PB just do not work performance
and cost wise.

I think we should definitely drop the "NoSQL" terminology here. SQL per
se is irrelevant.

I find it somewhat astonishing that in 2014, with literally dozens or
hundreds or thousands of fast large-memory servers available to a
company, that a performant solution can't be accomplished with relational.

Eric Brewer's CAP theorem is a nice academic notion. What I've often
wondered is why people jumped on ideas like that, most of which is
certainly sound, and tossed the baby out with the dishwater. A whack of
folks went wild on document databases - note that a document is
frequently self-contained and quite relational. Other people went wild
on re-inventing wheels differently: they totally abandoned relational
DBs and figured they wouldn't work at all in highly distributed systems.

Oddly enough, some of the "big internet companies" you allude to do in
fact use relational DBs that are highly distributed. The access latency
of such highly distributed RDBMSs is only marginally worse than the "Big
Data" systems we now usually think of. It also happens that when
companies do this, it's because they do in fact value relational features.
I am sure that there is a large portion of smaller companies that went
NoSQL (typically something "ISAM like") that did so due to the hype. And
I am sure that some of them regretted that they did so.

A lot did, yes. I'm sure I mentioned before that about 15 years ago I
worked for a company that proposed to do very innovative stuff with WML
and HDML on cellphones. Probably the worst decision we ever made was to
switch from Allaire Cold Fusion to J2EE. All due to hype.
But persistence is not one size fits all. Different requirements can
lead to different solutions.

C is critical in many contexts (like money transactions). But there are
other contexts where it does not matter (does it matter that Facebook
users in Mexico see an update 5 minutes later than the Facebook users
in Finland? No!).

That's not a great consistency or transactional example, Arne. No
offense. It might be better to consider the problem of a *single
individual* doing something.
There are quite a few of those C & ACID relational databases where
for performance reasons the web apps in front actually cache and
use data for N seconds even though the data may have been updated. In
that case they are paying for C & ACID for no reason.

This is true. But let's be real: an ACID consistency guarantee means
that your data may be a few seconds old but accurate. A non-ACID system
often means your data is not accurate all the time. If you've got an ORM
or web framework or HTTP server doing caching in front of an ACID
database, you've still got the C in ACID: the information might just be
a bit stale.
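
To make the "stale but accurate" point concrete, here is a rough Java sketch of a time-to-live cache sitting in front of a consistent store. `TtlCache` and its constructor parameters are hypothetical names for this illustration; the clock is injected only so the behaviour can be shown without sleeping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical read-through cache: values are reused until they are ttlMillis old.
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAt;

        Entry(V value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> loader;   // reads from the authoritative store
    private final long ttlMillis;
    private final LongSupplier clock;      // current time in milliseconds

    TtlCache(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.loadedAt >= ttlMillis) {
            // Missing or expired: go back to the store for the committed value.
            entry = new Entry<>(loader.apply(key), now);
            entries.put(key, entry);
        }
        return entry.value;
    }
}
```

A reader within the TTL window sees a value that was once committed but may have been superseded; after the window the cache converges on the store again. The store itself never held an inconsistent value.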
All that said then I still consider relational the default choice.
Especially for small startups. One gets a lot of functionality and
features that may be useful. If later a need for something with
less functionality & features but higher performance for config
shows up, then it can be valid to change.

Or to put it another way: to me choosing NoSQL is an optimization
and general rules about premature optimization apply.

Arne
I agree with the above 2 paras.

AHS
 
Arne Vajhøj

On 01/03/2014 10:56 PM, Arne Vajhøj wrote:
[ SNIP ]

There are plenty of cases where relational databases are not the best
solution.

Google, Facebook, Yahoo etc. did not go NoSQL just for fun.

But I would still consider a relational database to be the default
for persistence - what you use unless you have specific reasons
not to.

Quite a few important companies (so probably many more minor companies)
played with NoSQL for a bit, and then have backed off in various
degrees. Nobody went for NoSQL just for fun...most went for it because
they were sucked in by hype or misguided opinion. I'd estimate that the
majority of people have returned to RDBMSs. The fact that a number of
large companies have systems that are not relational is because you
can't throw away a few tens or hundreds of millions of $$ in a few
months or years.

NoSQL is based on relaxing ACID, particularly consistency. You do that
by cheating your end users through increasing UI speed, while at the
same time decreasing the validity of your data.

There is a lot more to it than that.

NoSQL is really many different things.
There are some large distributed database systems. There are some
which I would call modern replacements for traditional ISAM files.

Some are AP. Some are CP. CouchDB and Neo4J are ACID.

Well, we know at least one thing, NoSQL is not SQL...by definition. :)

What we should really do is get rid of the term NoSQL, since it
literally means "no SQL". We also have the problem that originally
"NoSQL" did refer to non-relational data stores that relaxed ACID
guarantees. Many, perhaps most, still do have relaxed guarantees. Some
NoSQL systems are ACID, I agree with you, but sometimes you have to make
the extra effort to obtain that functionality.

I may date myself here. When people started producing "NoSQL" databases,
it was a large part of the definition that ACID guarantees were relaxed.
This is just historical fact. That some people started restoring ACID to
NoSQL doesn't mean that relaxation of consistency isn't still often a
key concept.
The big internet companies did not go to NoSQL due to the hype - as
the hype was created by them going to NoSQL and they are not staying
with NoSQL to avoid a large write off. They went to and are
staying with NoSQL, because the relational databases can not do
the job. Relational databases and PB just do not work performance
and cost wise.

I think we should definitely drop the "NoSQL" terminology here. SQL per
se is irrelevant.

I agree the terminology is a bit funky.

SQL is really traditional relational.

NoSQL is really non-relational based on new technology.

NewSQL is really relational based on the same technologies as NoSQL.

And no surprise that non-X databases are more diverse than X databases.
I find it somewhat astonishing that in 2014, with literally dozens or
hundreds or thousands of fast large-memory servers available to a
company, that a performant solution can't be accomplished with relational.

Software is usually good at what it was designed for, but not good at
what it was not designed for.

Oracle, DB2, ASE, SQLServer, PostgreSQL, MySQL etc. were not designed
to run in massively parallel configurations (most of them can not even
run 2 node active-active).

Typically the ability to do more with less code results in some overhead.
Relational databases come with a lot of stuff: SQL interface,
transaction support etc. that makes life a lot easier for developers.
But it does come with a cost. With MB and GB data, that is typically
not a problem. With TB data it may or may not be a problem. With PB
data it is a problem.

Different problems lead to different solutions.

There are attempts to make a new kind of relational database in the
so-called NewSQL databases (like Google Spanner) to bring back
relational in big data.

I think it is too early to say whether that will be a success or not.
Eric Brewer's CAP theorem is a nice academic notion. What I've often
wondered is why people jumped on ideas like that, most of which is
certainly sound, and tossed the baby out with the dishwater. A whack of
folks went wild on document databases - note that a document is
frequently self-contained and quite relational. Other people went wild
on re-inventing wheels differently: they totally abandoned relational
DBs and figured they wouldn't work at all in highly distributed systems.

I actually like the focus on the CAP theorem.

It has made it very clear to everybody that picking X may mean
not picking Y.

It is impossible to get everything and one needs to prioritize what
is most important.
Oddly enough, some of the "big internet companies" you allude to do in
fact use relational DBs that are highly distributed. The access latency
of such highly distributed RDBMSs is only marginally worse than the "Big
Data" systems we now usually think of. It also happens that when
companies do this, it's because they do in fact value relational features.

Sure. Facebook and Yahoo are still big MySQL users (Google seems to be
dropping MySQL).

But again - it is different tools for different tasks.
A lot did, yes. I'm sure I mentioned before that about 15 years ago I
worked for a company that proposed to do very innovative stuff with WML
and HDML on cellphones. Probably the worst decision we ever made was to
switch from Allaire Cold Fusion to J2EE. All due to hype.

CF was probably a lot more mature back then.

But Java EE and the rest of Java evolved while CF is almost gone.

So if the criterion was that the company wanted to be on a successful
platform that would evolve and be supported for a long time, then it
was the right decision.

If the criterion was to make web development faster and cheaper from day
one then I believe that it was a disaster.

Side note: CF has merged into Java and Java EE later, but it has not
resurrected CF.
That's not a great consistency or transactional example, Arne. No
offense. It might be better to consider the problem of a *single
individual* doing something.

It is the equivalent of the traditional transaction example of two
attempts to withdraw money from the same bank account in two different
cities.

And it is also where the consistency problems in AP systems are
seen in real life. Users served by different data centers see different
information.
This is true. But let's be real: an ACID consistency guarantee means
that your data may be a few seconds old but accurate. A non-ACID system
often means your data is not accurate all the time. If you've got an ORM
or web framework or HTTP server doing caching in front of an ACID
database, you've still got the C in ACID: the information might just be
a bit stale.

The data are stored C somewhere, but they may be shown in a non-C way.

Arne
 
Arne Vajhøj

But, if the API is standards-compliant SQL and it is (preferably) ACID
then do we give a stuff what the underlying storage mechanism is? I'd be
happy to regard anything that fits this slot as an SQL DBMS.

But some RDBMS are designed to do this. Remember the Red Brick Data
Warehouse which appeared around 15 years ago? That could make a pretty
good stab at this. It offered the ability to partition large tables and,
when required, would run a query per partition in parallel before
combining the results into a single result set (we partitioned the data
by date so we could hold old and seldom accessed partitions on a nearline
tape library).

All its tables, indexes, and table partitions were mapped onto standard
UNIX files, so does this count as NewSQL in your proposed terminology?

It is not terms that I have invented.
Bear in mind that it was ACID compliant.

I only saw it do this on a single server with a large (for the time)
filestore but it's an approach that could easily be generalised to support
a distributed DB.

When you consider how long ago the RedBrick Data Warehouse did this sort
of thing, is Spanner really cutting edge?

I know nothing about Red Brick.

But it may have been some of the same as NewSQL today.

It is not that unusual that "new" technology is very similar to
some earlier technology that just did not succeed commercially.

But I do think that a little bit more than partitioning across multiple
servers is necessary to qualify.

Some of the traditional relational databases offer that.

And you can do the same with any database using a library that
can do it client side.

In those cases it is often called sharding instead of
partitioning.

If you are using Hibernate then it is just a matter
of using Hibernate Shards.
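
For readers unfamiliar with the idea, a hand-rolled Java sketch of client-side sharding follows. This is not the Hibernate Shards API; `ShardRouter` is a hypothetical class, each "shard" is just an in-memory map standing in for a separate database, and the modulo routing rule is an assumption chosen for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side router; each map stands in for one database server.
class ShardRouter {
    private final List<Map<Long, String>> shards = new ArrayList<>();

    ShardRouter(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    // Routing rule: key modulo shard count (floorMod keeps negative keys in range).
    int shardFor(long key) {
        return (int) Math.floorMod(key, (long) shards.size());
    }

    // Single-key operations touch exactly one shard.
    void put(long key, String row) {
        shards.get(shardFor(key)).put(key, row);
    }

    String get(long key) {
        return shards.get(shardFor(key)).get(key);
    }

    // A query that is not keyed must fan out to every shard and combine the results.
    int countAll() {
        int total = 0;
        for (Map<Long, String> shard : shards) {
            total += shard.size();
        }
        return total;
    }
}
```

The fan-out in `countAll` is the same "query per partition, then combine" pattern described for Red Brick above, only done in the client rather than inside the DBMS.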

Arne
 
Arved Sandstrom

I'd assumed that the ORM would be generating SQL, so it's good to know
that what it generates can be inspected. Not that I'm casting nasturtiums
at ORMs, but I've seen some truly dreadful automatically generated SQL in
the past (Sybase PowerBuilder, I'm looking at you), so I tend to be
suspicious of auto-generated SQL.

However, I was wondering whether any performance problems could
come from the way an ORM uses its generated SQL and what tools might be
available to analyse them.
[ SNIP ]

Martin, I'd be generally inclined to tackle the possible performance
problem by using ORM logging first, RDBMS tracing/profiling second to
further narrow down issues, and only then backtracking to the ORM to
code review (or worst case dive into open source ORM code and make
changes).

AHS
 
Marcel Müller

XML in either XML or CLOB columns is one way to combine the operational
advantages of the RDBMS with a NoSQL approach to data.

But you will still lack tool support. No support from reporting
software. And no tools for adhoc work - you will need to code everything.

It's not that bad anymore. E.g. MSSQL supports xpath in queries (AFAIK
since 2008) and you can also create indices based on xpath expressions.
But there are restrictions. In fact I did not use these features in
production so far.
It is rather chaotic now. Many incompatible libraries. And some of the
libraries are of bad quality.
Ack.

There are some attempts to remedy that in the Java world. Many
JPA providers are starting to support NoSQL databases.

Even though JPA is intended as an ORM then it can be made to work
with NoSQL databases as well.

And it is a standardized API.

Maybe this is a start.
MS could do the same with EF if they wanted to.

Uh, I have not recovered from EF 3.5 so far. I have never touched EF since then.
For my taste it is bound too tightly to MSSQL.

[performance]
It is like many other things.

One writes some readable code.

If a performance test shows a real performance problem then one
investigates.

Unfortunately this always tends to end up in the situation that the
program consumes all available resources regardless of the complexity of
the requirement. And even slight changes to the requirements, maybe only
small configuration changes, make the time bomb explode. Not that pretty.

I just had such a case today. 1.5E5 rows input, 2E4 rows out, now 5E4
rows out, n full table scans => O(n²), reduced locality, reduced cache
efficiency, boom! 1 hour -> 24 hours. The optimization has not been done
so far, but I would be surprised if it takes more than a few minutes after
completion. Programs like this mainly produce CO2.
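
A toy Java sketch of why "one full scan per input row" blows up, compared with building an index once. The arrays stand in for tables, the counters count elementary steps rather than real I/O, and `ScanVsIndex` is a hypothetical name; it is not a reconstruction of the actual job described above.

```java
import java.util.HashSet;
import java.util.Set;

class ScanVsIndex {
    // One full scan of the table per input row: inputs.length * table.length comparisons.
    static long stepsByRepeatedScan(int[] inputs, int[] table) {
        long steps = 0;
        for (int input : inputs) {
            for (int row : table) {
                steps++;                  // one comparison of row against input
            }
        }
        return steps;
    }

    // Build a hash index once, then probe it: table.length + inputs.length steps.
    static long stepsByIndex(int[] inputs, int[] table) {
        long steps = 0;
        Set<Integer> index = new HashSet<>();
        for (int row : table) {
            index.add(row);
            steps++;                      // one insert per table row
        }
        for (int input : inputs) {
            index.contains(input);
            steps++;                      // one probe per input row
        }
        return steps;
    }
}
```

When both sides grow together the first variant grows quadratically and the second linearly, which is the kind of gap that turns an hour into a day.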

If one is using an ORM, then checking what SQL it generates is
an obvious first step.

Yes. I think you can't get around this for now.
I doubt that relational databases will go away.

I agree. They will at least live as long as the large code base that
relies on them. But the question is what are the future developments of SQL?

And maybe at some point we will find SQL emulators based on NoSQL
backends. So to speak the reciprocal of ORM. ;-)


Marcel
 
Jukka Lahtinen

Arne Vajhøj said:
<quote>
Finnish/Swedish characters ä, ö, and å should be typed as ae, oe and aa
respectively.
</quote>

Actually more often a less wrong way for Finnish would be to use
a for ä and o for ö.
 
