caching design patterns

Timasmith

It seems to me that to reach a truly performant enterprise-level
application with a lot of thick client functionality, you really
need to use caching to the nines.

There are a few open source toolkits for caching; they seem to work
fine on either the server side or the client.

In the past I have always cached reference data - the fairly static
configuration options, settings and lookup data that aren't too harmful
if stale. Works fine, works well.

BUT to truly turn the fat client into an awesomely snappy user interface
that stuns the audience... I need to cache the fluid activity data -
the orders, the results, the critical data that should be right and
recent when you display it.

Perhaps it is dangerous to even consider caching it, but the reality is
that in most cases the data doesn't change fast enough for staleness to
matter - and if you do cache it, you have a blistering killer app.

So how do you do it? I wouldn't even consider it with a legacy
database, but starting from scratch with 100% control over the
architecture and every line of code interacting with the database - it
is tempting.

What if I have medium-sized DTO business objects that can be retrieved
with a version number from the database? No data can be written
without updating the version number.

When it comes to pulling data in, I simply send out my primary key +
version number and either a new object or the locally cached one is
returned.
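
Something like this, perhaps - a rough sketch only, where Order,
OrderService and fetchIfNewer are names I am making up for illustration
(fetchIfNewer stands for the server call that compares my version
against version_nbr and returns null when nothing changed):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class VersionedOrderCache {

    private final Map<Long, Order> cache =
            new ConcurrentHashMap<Long, Order>();
    private final OrderService remote; // server-side facade (assumed)

    public VersionedOrderCache(OrderService remote) {
        this.remote = remote;
    }

    public Order getOrder(long orderId) {
        Order cached = cache.get(orderId);
        long haveVersion = (cached == null) ? -1 : cached.getVersion();

        // The server checks version_nbr in the database and returns
        // null when the cached copy is still current.
        Order fresh = remote.fetchIfNewer(orderId, haveVersion);
        if (fresh == null) {
            return cached;          // still fresh - nothing shipped
        }
        cache.put(orderId, fresh);  // first fetch or stale - replace
        return fresh;
    }
}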

Could it work? Would you do it?

So many questions, never enough years left to answer them.
 
Daniel Pitts

Timasmith said:
What if I have medium-sized DTO business objects that can be retrieved
with a version number from the database? No data can be written
without updating the version number.

Could it work? Would you do it?

Actually, I've found that the ability to add caching is important for
scaling, but worrying about caching when it isn't a concern
can cause a lot of problems. It's along the lines of premature
optimization.

Write it without caching. If things run slowly, profile it and find
out where (don't assume it is a caching issue). If it does appear to
require caching, refactor it in. You do know how to refactor, don't
you? :)

Hope that helps.
 
Timasmith

Daniel said:
Actually, I've found that the ability to add caching is important for
scaling, but worrying about caching when it isn't a concern
can cause a lot of problems. It's along the lines of premature
optimization.

Write it without caching. If things run slowly, profile it and find
out where (don't assume it is a caching issue). If it does appear to
require caching, refactor it in. You do know how to refactor, don't
you? :)

Hope that helps.

Yes, I know how to refactor. You agree caching is important for scaling,
but you suggest ignoring it until the problem strikes... I think it
will be too late then.
 
Karl Uppiano

Timasmith said:
Yes, I know how to refactor. You agree caching is important for scaling,
but you suggest ignoring it until the problem strikes... I think it
will be too late then.

I don't think he means ignore it until the problem strikes, but test and find
out where the problems are, and fix them. Speculative optimization is the
cause of some of the most unreliable, over-designed, overly complicated code
around. Build the simplest thing that will possibly work -- then -- make it
work, make it right, make it fast.
 
Mark Jeffcoat

Timasmith said:
Yes, I know how to refactor. You agree caching is important for scaling,
but you suggest ignoring it until the problem strikes... I think it
will be too late then.

I am currently maintaining and extending an application that
was designed with the philosophy you suggest. It's not the
first time, but the story never changes:

I've had to remove all the "caching" that was originally
written. And that's the stuff that mostly worked -- the vast
majority of the caching effort just didn't work at all. (It
turns out that deciding when to expire the cache can be a
tricky problem, unless you just ignore the problem completely.
Most of the results will be wrong, but you have a decent chance
of being fired before you have to "debug" it.)

The server now runs about an order of magnitude faster than
it used to, and it actually works -- I run a profiler every
3 months or so. Mostly, it's just to amuse myself; 90% of the
benefit came from the very first run. I'm not saying you're
completely, entirely, 100% wrong.... but that's the way I'll
bet every time.



[ Sorry. The bitterness slips out sometimes. Do your
own thing, ignore the conventional wisdom, go nuts.
Revolutionize the world. But please do it on your own
project. ]
 
Arne Vajhøj

Timasmith said:
Yes, I know how to refactor. You agree caching is important for scaling,
but you suggest ignoring it until the problem strikes... I think it
will be too late then.

"If it ain't broke don't fix it."

"Keep It Simple Stupid."

It very often makes sense to wait to solve a problem until
you know you actually have one.

If you know the domain very well, you may know before
you have created the first UML diagram.

If you don't know the domain, then you will need
to have a prototype working and run some tests
before you know whether a cache is necessary and whether it will help.

Arne
 
Lew

Karl said:
I don't think he means ignore it until the problem strikes, but test and find
out where the problems are, and fix them. Speculative optimization is the
cause of some of the most unreliable, over-designed, overly complicated code
around. Build the simplest thing that will possibly work -- then -- make it
work, make it right, make it fast.

I have found that version numbers in a database record greatly complicate the
db - is a version number *really* part of the data model?

Also, databases have caches, at least all the products I've considered using
for real applications do. Are you likely to outperform the database cache? By
enough to matter?

Caching is useful for things like result sets, where you don't want to go back
to the database to retrieve data the *client* knows hasn't changed.

The problem of information latency is subtle and not addressed easily by
simple bandages. Consider for each query how much latency it can tolerate -
must it have absolutely the latest available data incorporating all other
clients' changes, or can it work for a while with data from a few
(milli)seconds ago?

Stay away from version information in a database record. How will you know if
a client's version is stale without doing a database query anyway? Is it
really worth great complications in update, insert and delete logic? I
predict it won't solve your (perceived) problem anyway.

There might be some value somewhere to version (or other history) data in a
transaction table's row, but I haven't seen it. (History tables are a whole
separate subject, but even there you don't have version data, just temporal
information.) I wonder what Mr. Celko thinks of the idea.

- Lew
 
T

Timasmith

Lew said:
I have found that version numbers in a database record greatly complicate the
db - is a version number *really* part of the data model?

Yes it can be - it is the number of times the record was updated.

Lew said:
Also, databases have caches, at least all the products I've considered using
for real applications do. Are you likely to outperform the database cache? By
enough to matter?

It is not the goal to outperform the database cache. It is the goal to
outperform the total of query + object creation + mapping from resultset to
object + network latency to transfer the object. When you have 1000 users
pulling objects, the load taken off the database, application server and
network can be considerable.

Lew said:
Caching is useful for things like result sets, where you don't want to go back
to the database to retrieve data the *client* knows hasn't changed.

The problem of information latency is subtle and not addressed easily by
simple bandages. Consider for each query how much latency it can tolerate -
must it have absolutely the latest available data incorporating all other
clients' changes, or can it work for a while with data from a few
(milli)seconds ago?

No, it must have the latest updates.

Lew said:
Stay away from version information in a database record. How will you know if
a client's version is stale without doing a database query anyway? Is it
really worth great complications in update, insert and delete logic? I
predict it won't solve your (perceived) problem anyway.

It would be very fast to issue

select version_nbr from your_table where primary_key = ?

To further illustrate my point, take this real-world example:

table patient (patient_id, last_name, first_name, sex, dob, ... etc. for
40 fields).
table appointment (id, appt_date, ... etc. for 30 fields).
table orders (order_id, order_name, ... etc. for 40 fields).
table results (result_id, order_id, result_name, ... etc. for 20 fields).

I pull up a patient record and it pulls in:

1 patient row
8 appointment rows
80 order rows
800 result rows

All row data are mapped to an object which adds significant business
methods to the data.

To pull in that much data may take non-trivial resources and at least
2-5 seconds. The user can then browse, filter, sort, group and do many
operations on the local workstation. All operations are fast since the
object data is held in memory.

However the user is waiting for a new set of results, so every 10 minutes
they click a refresh button. To pull in the same data again is a wasteful
use of resources. If you have 100 users simultaneously pressing
refresh then you have a significant performance bottleneck. You would
need a very fast database server and many application servers to scale
and keep the same 2-5 seconds. At some point you would probably need
to cluster the database. This becomes very expensive and pushes the
cost of the hardware to where many organizations cannot afford it.

But take a step back and look at the problem. For 98 out of 100
patients the data did not change. Let's suppose that each order object
has a version stamp which is updated every time anything on the order
or the underlying results tables changes. A simple, single query of

select version_nbr from orders where order_id in (...)

would identify whether the local PC objects are still fresh or whether
they need to be re-extracted.
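
In JDBC terms the version sweep on refresh could look something like
this (a sketch - the table and column names are from the example above,
and cachedVersions maps each locally held order_id to the version_nbr it
was cached at):

import java.sql.*;
import java.util.*;

public class OrderVersionChecker {

    // Returns the ids whose version_nbr in the database no longer
    // matches the local copy - only those objects need re-fetching.
    public static List<Long> staleOrderIds(Connection con,
            Map<Long, Long> cachedVersions) throws SQLException {
        StringBuilder in = new StringBuilder();
        for (int i = 0; i < cachedVersions.size(); i++) {
            in.append(i == 0 ? "?" : ",?");
        }
        PreparedStatement ps = con.prepareStatement(
                "select order_id, version_nbr from orders"
                + " where order_id in (" + in + ")");
        try {
            int idx = 1;
            for (Long id : cachedVersions.keySet()) {
                ps.setLong(idx++, id);
            }
            List<Long> stale = new ArrayList<Long>();
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                long id = rs.getLong(1);
                if (rs.getLong(2) != cachedVersions.get(id)) {
                    stale.add(id); // changed on the server
                }
            }
            rs.close();
            return stale;
        } finally {
            ps.close();
        }
    }
}

For the 98 out of 100 patients whose data did not change, the refresh
then costs one small query instead of re-shipping nearly 900 rows.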

Now this would *never* work with a legacy system - there are just too
many possibilities that someone has written a query, batch job or
something that will update the database without updating the version
number. But with a new system you can enforce it from the get-go.
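
With every write funnelled through a single data access layer, the
enforcement might look like this (again just a sketch - Order and
StaleOrderException are made-up names; checking the old version_nbr in
the where clause doubles as an optimistic lock):

import java.sql.*;

public class OrderDao {

    // Every write path goes through here, so version_nbr can never be
    // skipped. Zero rows updated means another writer got in first.
    public void updateOrder(Connection con, Order order)
            throws SQLException, StaleOrderException {
        PreparedStatement ps = con.prepareStatement(
                "update orders set order_name = ?,"
                + " version_nbr = version_nbr + 1"
                + " where order_id = ? and version_nbr = ?");
        try {
            ps.setString(1, order.getName());
            ps.setLong(2, order.getId());
            ps.setLong(3, order.getVersion());
            if (ps.executeUpdate() == 0) {
                throw new StaleOrderException(
                        "order " + order.getId() + " changed underneath us");
            }
            // keep the in-memory copy in step with the database
            order.setVersion(order.getVersion() + 1);
        } finally {
            ps.close();
        }
    }
}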

I agree - don't tune till you need to, don't fix it until you have to -
but I know this problem is coming.
This is undoubtedly less applicable to simple applications which use
resultsets directly, but that is not where I am coming from. The
complexity of the domain is significant and needs the DTO architecture,
fat objects etc. There is a performance cost to that, of course, which
is what I am looking to offset.
 
Chris Uppal

Timasmith said:
But take a step back and look at the problem. For 98 out of 100
patients the data did not change. Let's suppose that each order object
has a version stamp which is updated every time anything on the order
or the underlying results tables changes. A simple, single query of

select version_nbr from orders where order_id in (...)

would identify whether the local PC objects are still fresh or whether
they need to be re-extracted.

I can't comment on your intuition about how much caching is necessary, nor on
how you support/implement that at the data level (you may well be right about
both, I have no means to judge). But it seems that you are considering moving
this out of the implementation domain and into the data-modelling domain, i.e.
your data model is specifically designed to allow applications to understand
the history of changes to values. If so, then I suggest dropping the use of
the word "cache" altogether. That suggests a low-level implementation detail;
one that you'd expect to be handled automatically (if tune-ably) in the bowels
of the data access layer (not unlike the caching that happens inside each
result-set). But since versioning is part of your data model, it should be
visible at the application level -- just as much as, say, patient names are --
and your code for handling it should reside at that level too.

Your applications may /use/ the versioning information to implement local
caches, but that is no business of the database, nor of any (putative) O-R
tools you may use. Similarly, it would be orthogonal to any data-access
Patterns you might happen to fancy using.

So you are looking for patterns (or Patterns) for handling versioned data. If
you think of it like that, and phrase it like that, then you may find more
information than you would if you just talk about "caching" (which is bound to
attract just the kind of discussion we have just had here).

I hope that makes some sort of sense.

-- chris
 
