Where's my Derby?

J

John B. Matthews

Paul Cager said:
My guess is that it is more likely to be a mistake than malice. The
sun-java6-jdk package has a "Suggests" dependency for "sun-java6-demo,
sun-java6-doc, sun-java6-source". Maybe they simply forgot to add a
dependency for javadb.

(I'm looking at Debian rather than Ubuntu, by the way).

Interesting. Checking Ubuntu 9.10, Karmic Koala:

$ apt-cache search javadb
sun-java6-javadb - Java(TM) DB, Sun Microsystems' distribution of Apache Derby
sun-javadb-client - Java DB client
sun-javadb-common - Java DB common files
sun-javadb-core - Java DB core
sun-javadb-demo - Java DB demo
sun-javadb-doc - Java DB documentation
sun-javadb-javadoc - Java DB javadoc
 
A

Arne Vajhøj

Strongly agreed. I want to use Java, not DebianJava. I don't want to
have to go and start a thread on a newsgroup to double-check the fact
that something that should work doesn't. Now i've wasted you guys' time
as well as my own!


I doubt that - neither Debian nor Ubuntu are that stupid.

If Java DB is considered part of Java, then shipping a "Java"
without it would be a violation of license/trademark/whatever.

MS paid 20M$ for that.

SUN/Oracle would probably not be agressive against Debian/Ubuntu. And
Debian/Ubuntu would probbaly fix it right away if notified.

Arne
 
A

Arne Vajhøj

H2 supports the rather useful MERGE INTO command, which apart from the
first word looks identical to INSERT INTO, and means something like
"INSERT INTO unless a row with that primary key already exists, in which
case UPDATE". This is useful for implementing something with semantics
like Map.put, where a put can either create a new record or replace an
old one.

I thought this was actually standard, since SQL:2003 defines a MERGE
INTO command, but on looking more closely, i see that SQL:2003's version
is different, being more complicated and less useful. H2's version isn't
even a minimal subset of the SQL:2003 one, AFAICT. In H2, i say:

MERGE INTO table (key, value) VALUES (?, ?)

Whereas in SQL:2003 (as realised by Oracle, at least), i'd have to say:

MERGE INTO table USING table
ON (key = ?)
WHEN MATCHED THEN UPDATE SET value = ?
WHEN NOT MATCHED THEN INSERT (key, value) VALUES (?, ?)

And i'd have to give my key and value twice in the parameters, unless i
wanted to use some kind of nonstandard (?) variable mechanism inside the
query. The SQL:2003 version is much more flexible, of course, and there
are all sorts of extra conditions you can add to it.

MySQL has yet another syntax for this, which looks like:

INSERT INTO table (key, value) VALUES (?, ?)
ON DUPLICATE KEY UPDATE value = VALUES(value)

As far as i can see, Derby has no way to do insert-or-update in a single
query.

Anyway, argh, looks like i'm going to have to add a facility for
database-specific SQL for the insert operation if i want this to be
portable. I'd really like to avoid having to fall back to running
multiple queries to probe the database for the existence of the key and
then do UPDATE or INSERT accordingly.

Given the maintenance work of adding a specific SQL statement for
every new database wouldn't it just be easier to make two
statements?

Especially with a high update ratio then an UPDATE with a WHERE
and only an INSERT if no rows were updated could be worth
considering.

Anyway the cost should not be so bad, because the first of
the two statements should get the relevant pages loaded
in memory so the second should be fast.

Arne
 
A

Andreas Leitgeb

Arne Vajhøj said:
Especially with a high update ratio then an UPDATE with a WHERE
and only an INSERT if no rows were updated could be worth
considering.

Would it be safe? Or could one end up with two entries for
that key, if the attempted updates happen at the same time?

(Ok, two equal keys could be prevented by unique-key-constraints,
but then it would mean that the "last" one doesn't always win, but
may possibly run into an DB-error, instead...)
 
M

Martin Gregorie

Would it be safe? Or could one end up with two entries for that key,
if the attempted updates happen at the same time?
Should be OK provided you use explicit commit units rather than default
automatic commits.
(Ok, two equal keys could be prevented by unique-key-constraints, but
then it would mean that the "last" one doesn't always win, but may
possibly run into an DB-error, instead...)
If both statements are in the same commit unit and you're using row or
page locking and a sensible lock timeout, the second update attempt
should wait until the first update commits and then report that the row
exists.
 
A

Andreas Leitgeb

Martin Gregorie said:
Should be OK provided you use explicit commit units rather than default
automatic commits.

"Should be ..." doesn't really sound too convincing in the context of
possible concurrency problems ;-)
If both statements are in the same commit unit and you're using row or
page locking ...

Which row or page would be locked, if the where-clause
didn't match anything? The whole table?
 
L

Lew

Andreas said:
"Should be ..." doesn't really sound too convincing in the context of
possible concurrency problems ;-)

If the database engine supports transactions and you remember to use them, it
will be safe.


Andreas said:
Which row or page would be locked, if the where-clause
didn't match anything? The whole table?

That depends on the database engine you use and whether the database is
partitioned. RTFM for your particular environment.

It also depends on whether there are sequenced keys in the table. I worked on
a project a couple of years back that had many tables sharing the same
sequence for their keys. It caused huge contention because every table's
INSERT had to wait for the sequence to unlock, and there were a lot of INSERTs.

Another trick to consider is judicious use of SELECT ... FOR UPDATE.
 
T

Tom Anderson

If the database engine supports transactions and you remember to use
them, it will be safe.

You mean, if it has them, you remember to use them, and you're happy to
live with the consequences of using the serializable isolation level,
you'll be safe.

Being able to do an 'upsert' in a single atomic operation makes it
possible to be safe much faster than having to do it with two queries.
Another trick to consider is judicious use of SELECT ... FOR UPDATE.

True. I don't think it's applicable here, though.

tom
 
T

Tom Anderson

On 04-04-2010 20:12, Lew wrote:
I hear good things about H2.

;ALLOW_LITERALS=NONE in the connection URL certainly is nice !

H2 supports the rather useful MERGE INTO command [...] Anyway, argh,
looks like i'm going to have to add a facility for database-specific
SQL for the insert operation if i want this to be portable. I'd really
like to avoid having to fall back to running multiple queries to probe
the database for the existence of the key and then do UPDATE or INSERT
accordingly.

Given the maintenance work of adding a specific SQL statement for every
new database wouldn't it just be easier to make two statements?

Easier, of course. Significantly easier? Not really. It's just one SQL
statement, and not a very complicated one.
Especially with a high update ratio then an UPDATE with a WHERE and only
an INSERT if no rows were updated could be worth considering.

Yes, absolutely. And easier to do than the INSERT-first case, because you
have to detect failure of the primary key constraint to trigger the
UPDATE, and i'm not confident about being able to do that portably.
Anyway the cost should not be so bad, because the first of the two
statements should get the relevant pages loaded in memory so the second
should be fast.

As discussed downstream, i worry that it will clobber concurrency if done
safely.

tom
 
M

Martin Gregorie

"Should be ..." doesn't really sound too convincing in the context of
possible concurrency problems ;-)
That was me being cautious since we don't know what DBMS the OP has in
mind. For the following code to work the DBMS *must* support explicit
transaction control, serialised transaction mode and (probably) the
ability to set the lock timeout period. If we have all that the code is
straight forward:

set_session_conditions
{
set auto-commit off;
set transaction serialisation;
set lock timeout = 5 seconds; // needs to be at least 2x the max
// transaction execution time.
}

add_entry
{
start transaction // may be implied
select count(*) from table where key='?'
if (count == zero)
insert into table (key, value) values ('?', '?');
commit
}

If there are no explicit transactions, then all bets are off: it can't
be done without serialising the database access, i.e. you'll need to
queue updates for execution by a single worker thread - or something like
that.

If you can't serialise transactions, then you *must* make the key unique
and be prepared for inserts to fail with a duplicate key exception. You
must also be prepared to rollback and retry if the transaction times out.
Which row or page would be locked, if the where-clause didn't match
anything? The whole table?
Depends what locks you're using - table, page or row. A decent DBMS will
let you specify at least table or row locking and some may allow page
locking as well. It doesn't actually matter in this case since there's
only a single row accessed during a transaction. However, the bigger the
scope of the lock the more chance that a concurrent transaction will fail
due to a lock conflict and need to be retried

Ideally you'd use row locking: then the select will lock zero or 1 rows
and the insert, if run, will lock one row. Page locking extends this to
the page containing the row and table locking prevents any concurrency
for the duration of the transaction.
 
A

Andreas Leitgeb

Martin Gregorie said:
Depends what locks you're using - table, page or row. ...
Ideally you'd use row locking: then the select will lock zero or 1 rows
and the insert, if run, will lock one row.

That explained it. Thanks.

So second thread waits on its insert-line, until first thread commits,
and then second thread gets some unique constraint error. It then just
repeats the whole transaction, which (assuming no further concurrencies)
succeeds right at the update, so second thread's value wins, as expected
for put-semantics.

I don't know, why I was so stuck at expecting threads to already block on
the empty update...
 
T

Tom Anderson

That was me being cautious since we don't know what DBMS the OP has in
mind. For the following code to work the DBMS *must* support explicit
transaction control, serialised transaction mode and (probably) the
ability to set the lock timeout period. If we have all that the code is
straight forward:

set_session_conditions
{
set auto-commit off;
set transaction serialisation;
set lock timeout = 5 seconds; // needs to be at least 2x the max
// transaction execution time.
}

add_entry
{
start transaction // may be implied
select count(*) from table where key='?'
if (count == zero)
insert into table (key, value) values ('?', '?');
commit
}

I'd be keen to see an update in there somewhere too!

tom
 
A

Arne Vajhøj

It is not. It is all in the "org.apache.derby" namespace, and is not
part of what a JRE must provide to be entitled to use the name "Java".

OK. Then there are no potential legal problem.
There are such a few utility components which are provided with
Sun's JRE while not being part of "Java" per se; another example is
the HTTP server code, in com.sun.net.httpserver.

Java DB is documented & supported. Is the HTTP server that?

Arne
 
T

Tom Anderson

Yes. From:
http://java.sun.com/javase/6/docs/technotes/guides/net/index.html
click on the fourth link in the "API Specification" section. The
"Networking features and enhancements" page:
http://java.sun.com/javase/6/docs/technotes/guides/net/enhancements-6.0.html
talks about that server. Apparently it is meant to stay.

We had a back-and-forth over this a while ago. It's unarguably documented
- but it's not clear that it's part of the API (to put it mildly), and not
clear that, because of that or otherwise, it's supported.

But then, support, schmupport. A decompiler and -Xbootclasspath is worth
ten thousand support requests anyway.

tom
 
M

Martin Gregorie

I'd be keen to see an update in there somewhere too!
I was tired when I wrote that, but the logic is the same. Just substitute
UPDATE for SELECT and the update count for the SELECT count(*) result in
my pseudocode.
 
A

Arne Vajhøj

Would it be safe? Or could one end up with two entries for
that key, if the attempted updates happen at the same time?

A sufficient high transaction isolation level should fix that.
(Ok, two equal keys could be prevented by unique-key-constraints,
but then it would mean that the "last" one doesn't always win, but
may possibly run into an DB-error, instead...)

A lower transaction isolation level and such a constraint combined
with a retry in the code may perform better than a high
transaction isolation level depending on how the database
handles this stuff.

Arne
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,756
Messages
2,569,540
Members
45,025
Latest member
KetoRushACVFitness

Latest Threads

Top