On 01/03/2014 10:56 PM, Arne Vajhøj wrote:
[ SNIP ]
There are plenty of cases where relational databases are not the best
solution.
Google, Facebook, Yahoo etc. did not go NoSQL just for fun.
But I would still consider a relational database to be the default
for persistence - what you use unless you have specific reasons
not to.
Quite a few important companies (so probably many more minor companies)
played with NoSQL for a bit, and then backed off to various
degrees. Nobody went for NoSQL just for fun...most went for it because
they were sucked in by hype or misguided opinion. I'd estimate that the
majority of people have returned to RDBMSs. The fact that a number of
large companies have systems that are not relational is because you
can't throw away a few tens or hundreds of millions of $$ in a few
months or years.
NoSQL is based on relaxing ACID, particularly consistency. You do that
by cheating your end users through increasing UI speed, while at the
same time decreasing the validity of your data.
There is a lot more to it than that.
NoSQL is really many different things.
There are some large distributed database systems. There are some
which I would call modern replacements for traditional ISAM files.
Some are AP. Some are CP. CouchDB and Neo4J are ACID.
Well, we know at least one thing, NoSQL is not SQL...by definition.
What we should really do is get rid of the term NoSQL, since it
literally means "no SQL". We also have the problem that originally
"NoSQL" did refer to non-relational data stores that relaxed ACID
guarantees. Many, perhaps most, still do have relaxed guarantees. Some
NoSQL systems are ACID, I agree with you, but sometimes you have to make
the extra effort to obtain that functionality.
I may date myself here. When people started producing "NoSQL" databases,
it was a large part of the definition that ACID guarantees were relaxed.
This is just historical fact. That some people started restoring ACID to
NoSQL doesn't mean that relaxation of consistency isn't still often a
key concept.
The big internet companies did not go to NoSQL due to the hype - as
the hype was created by them going to NoSQL and they are not staying
with NoSQL to avoid a large write off. They went to and are
staying with NoSQL, because the relational databases can not do
the job. Relational databases and PB-scale data just do not work
performance- and cost-wise.
I think we should definitely drop the "NoSQL" terminology here. SQL per
se is irrelevant.
I find it somewhat astonishing that in 2014, with literally dozens or
hundreds or thousands of fast large-memory servers available to a
company, a performant solution can't be accomplished with relational.
Eric Brewer's CAP theorem is a nice academic notion. What I've often
wondered is why people jumped on ideas like that, most of which are
certainly sound, and tossed the baby out with the bathwater. A whack of
folks went wild on document databases - note that a document is
frequently self-contained and quite relational. Other people went wild
on re-inventing wheels differently: they totally abandoned relational
DBs and figured they wouldn't work at all in highly distributed systems.
Oddly enough, some of the "big internet companies" you allude to do in
fact use relational DBs that are highly distributed. The access latency
of such highly distributed RDBMSs is only marginally worse than the "Big
Data" systems we now usually think of. It also happens that when
companies do this, it's because they do in fact value relational features.
I am sure that there is a large portion of smaller companies that went
NoSQL (typically something "ISAM like") that did so due to the hype. And
I am sure that some of them regretted that they did so.
A lot did, yes. I'm sure I mentioned before that about 15 years ago I
worked for a company that proposed to do very innovative stuff with WML
and HDML on cellphones. Probably the worst decision we ever made was to
switch from Allaire Cold Fusion to J2EE. All due to hype.
But persistence is not one size fits all. Different requirements can
lead to different solutions.
C is critical in many contexts (like money transactions). But there are
other contexts where it does not matter (does it matter that Facebook
users in Mexico see an update 5 minutes later than the Facebook users
in Finland? No!).
That's not a great consistency or transactional example, Arne. No
offense. It might be better to consider the problem of a *single
individual* doing something.
There are quite a few of those C & ACID relational databases where
for performance reasons the web apps in front actually cache and
use data for N seconds even though the data may have been updated. In
that case they are paying for C & ACID for no reason.
This is true. But let's be real: an ACID consistency guarantee means
that your data may be a few seconds old but accurate. A non-ACID system
often means your data is not accurate all the time. If you've got an ORM
or web framework or HTTP server doing caching in front of an ACID
database, you've still got the C in ACID: the information might just be
a bit stale.
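
To make the stale-versus-wrong distinction concrete, here is a minimal
Java sketch of the kind of "cache for N seconds" layer being discussed.
Everything in it is hypothetical and only for illustration: the TtlCache
class, the loader function that stands in for a committed read from the
RDBMS, and the injected clock. It is not how any particular ORM or web
framework implements its cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical read-through cache sitting in front of an ACID database. */
class TtlCache<K, V> {
    /** A cached value plus the time at which it was loaded. */
    private static final class Entry<V> {
        final V value;
        final long loadedAtMillis;

        Entry(V value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> loader; // assumption: returns committed data only
    private final LongSupplier clock;    // injectable so the demo needs no sleeping
    private final long ttlMillis;

    TtlCache(Function<K, V> loader, LongSupplier clock, long ttlMillis) {
        this.loader = loader;
        this.clock = clock;
        this.ttlMillis = ttlMillis;
    }

    /** Returns the cached value, reloading it once it is ttlMillis old or older. */
    synchronized V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
            entry = new Entry<>(loader.apply(key), now);
            entries.put(key, entry);
        }
        return entry.value;
    }
}

class TtlCacheDemo {
    public static void main(String[] args) {
        // A plain map stands in for the database in this sketch.
        Map<String, Integer> database = new HashMap<>();
        database.put("balance", 100);

        long[] now = {0L}; // fake clock, in milliseconds
        TtlCache<String, Integer> cache =
                new TtlCache<>(database::get, () -> now[0], 5_000L);

        System.out.println(cache.get("balance")); // first read loads from the database

        database.put("balance", 80);              // a later committed update
        now[0] = 2_000L;
        System.out.println(cache.get("balance")); // inside the TTL: the older committed value

        now[0] = 6_000L;
        System.out.println(cache.get("balance")); // TTL expired: reloaded from the database
    }
}
```

The point of the sketch is that the second read returns a value that was
committed at some moment, just not the latest one. Whether that staleness
is acceptable is a per-use-case decision, and the sketch says nothing
about what a non-ACID store underneath would return.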
All that said then I still consider relational the default choice.
Especially for small startups. One gets a lot of functionality and
features that may be useful. If later a need for something with
less functionality & features but higher performance for config
shows up, then it can be valid to change.
Or to put it another way: to me choosing NoSQL is an optimization
and general rules about premature optimization apply.
Arne
I agree with the above 2 paras.
AHS