Not using exceptions only for exceptional conditions

V

villo

Hi everybody

I always have agreed that exception shouldn't be used for flow control,
nevertheless I have incurred in a situation where breaking this
principle is probably good, for other reasons than design.

The situation is inserting a row in a table, extremely simple stuff. It
happens that the application is basically only doing this and the table
has ~10^7 records, so I have rewritten the application logic, that
would be correctly described as

ARecordObject r=...

if(aDao.find(r)!=null)
aDao.insert(r);
else
aDao.merge(r,anotherRecordObject);

in the form of

if(!aDao.insert(r))
aDao.merge(r,anotherRecordObject);

where insert catches DataIntegrityViolationException and returns true
iff an insert was made.

Now, I see that the first form is better for many reasons, but if the
latter performs faster should I really
bother?

Cheers
Francesco
 
L

lordy

Hi everybody

The situation is inserting a row in a table, extremely simple stuff. It
happens that the application is basically only doing this and the table
has ~10^7 records, so I have rewritten the application logic, that
would be correctly described as

ARecordObject r=...

if(aDao.find(r)!=null)
aDao.insert(r);
else
aDao.merge(r,anotherRecordObject);

in the form of

if(!aDao.insert(r))
aDao.merge(r,anotherRecordObject);

where insert catches DataIntegrityViolationException and returns true
iff an insert was made.

Now, I see that the first form is better for many reasons, but if the
latter performs faster should I really
bother?

For maximum performance you should push that logic into a stored
procedure on the database.

Failing that the latter method is good and relevant bits of the program
should document the 'flow'.
Have you also considered relative occurence of inserts() to merges() in
your application.
Depeneding on incomming data it may be quicker to try a merge() first.
(Presumable this would be an update that returns zero rows updated
rather than an exception)

Lordy
 
R

Rastislav Komara

villo said:
Hi everybody

I always have agreed that exception shouldn't be used for flow control,
nevertheless I have incurred in a situation where breaking this
principle is probably good, for other reasons than design.

The situation is inserting a row in a table, extremely simple stuff. It
happens that the application is basically only doing this and the table
has ~10^7 records, so I have rewritten the application logic, that
would be correctly described as

ARecordObject r=...

if(aDao.find(r)!=null)
aDao.insert(r);
else
aDao.merge(r,anotherRecordObject);

in the form of

if(!aDao.insert(r))
aDao.merge(r,anotherRecordObject);

where insert catches DataIntegrityViolationException and returns true
iff an insert was made.

Now, I see that the first form is better for many reasons, but if the
latter performs faster should I really
bother?

Cheers
Francesco


It is absolutly bad solution. You cannot achieve reasonable performance
using second way. Exceptions are realy realy slow for building stack
trace. If you do not want to use SQL test over DB use other way to
determine which operation is required. I cant recomend DB stored
porcedures. This is hard to write and bound application to special
Database infrastructure lovering portability and cacheing abilities.
 
C

Chris Uppal

villo said:
if(aDao.find(r)!=null)
aDao.insert(r);
else
aDao.merge(r,anotherRecordObject);

Now, I see that the first form is better for many reasons,

It isn't better unless you are running your database at such a high isolation
level that the failed find() will prevent anyone else adding a record which
/would/ have matched the conditions in the SELECT. Similarly, the find()
could succeed, but the row be removed before your merge() executed; but
forbidding that doesn't require such a high isolation level[*].

OTOH, the second form is not a lot better, since once the error has been
triggered (and /assuming/ you've managed to find a -- database dependent -- way
of recognising the scenario accurately), there is /still/ no guarantee that the
data won't change under you before your alternative code is executed, so you
need some sort of loop somewhere.

Speaking as a non-expert, I prefer the second form, despite that problem.

I'd much prefer a third form with an UPDATE_OR_INSERT atomic action -- but
SQL doesn't provide that[**] :-(

-- chris


[*] I gather from the PostgreSQL documentation, that that DB may be quite
likely to use an isolation level high enough for this, since the level needed
for the second to work (read repeatability) is apparently just a synonym for
the highest level "serializable".

[**] Unless, perhaps, one could cobble something together with select for
update -- I'm no expert, as I mentioned.
 
C

Chris Uppal

Rastislav said:
It is absolutly bad solution. You cannot achieve reasonable performance
using second way. Exceptions are realy realy slow for building stack
trace.

I find it /really/ difficult to believe that the cost of building, throwing,
and catching, an exception is even remotely comparable with the cost of a round
trip to a database. (And that's assuming that a select followed by an insert
runs in approximately the same time as a single insert -- which is probably
true when taking DB caching into account, but only at a far coarser granularity
than the time taken to throw an exception.)

-- chris
 
V

villo

Thanks for the feedback. Just a couple of remarks:
-I know that on the average the cost of building the stacktrace and
throwing the exception
is not comparable with a round trip to the db. I have to admit that I
haven't done any serious
benchmark in my case, which I'll probably do soon. But I have the
feeling that in my case
it can be slightly different (see below)
-Chris made a good point, actually my rdbms (hsql) supports only read
uncommitted, so it is possible that I have a race condition here.
Nevertheless this doesn't represent a problem at the application level:
in case a record has been inserted between find() and merge() I simply
catch the Duplicate...Exception again.
-Select for update could be a way but I'm not sure wheter hsqldb
supports it. And I wonder if I can mimic the merge() method with bare
sql...

The reason for the first point is that due to requirements I'm forced
to use the embedded file storage hsqldb provides. Now Inserting the
data is not a problem, but search time when the table size is in the
order of 10^7 tend to be unacceptable for the current scenario, where
the objective is to write as fast as I can.
Just to give an overview of what I'm doing: I have a thread pool making
axfr requests and writing some information to a single table. Each
record contains basically an IP and the ccTld it belongs to. Laterwards
the collected data will be merged to a separate database to make some
data analysis. This will be done in batches.

So to summarize I guess I will have to make some real benchmarks to see
which approach performs better; I expect that as long as the db is
small the first will be faster, while the second should catch up as the
size increases...Which is the pointcut it is totally unclear right
now:(

cheers
Francesco
 
C

Chris Uppal

villo said:
The reason for the first point is that due to requirements I'm forced
to use the embedded file storage hsqldb provides. Now Inserting the
data is not a problem, but search time when the table size is in the
order of 10^7 tend to be unacceptable for the current scenario, where
the objective is to write as fast as I can.

Note that inserting into a table requires the logical equivalent of a search
anyway, since rows are unique. If you are seeing a significant difference
between the time taken to do an insert (of a record you know is not already
present), and the time take to search for that record (and not find it), then
it sounds as if something is very wrong -- and presumably fixable ;-)

You might get better performance by indexing the table (only) on an otherwise
meaningless incrementing integer and/or timestamp. The idea is to make writing
this table as near as possible like writing an ordinary sequential file.

-- chris
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,048
Latest member
verona

Latest Threads

Top