What I don't understand is why I should access the database (or
whatever repository where I can find my information) twice, instead of
just building the bean and returning it in the find method.

There are multiple cases to consider.
The container uses the ejbFindByPrimaryKey method to test if the
database has a record with that primary key. If so, ejbFindByPrimaryKey
returns the primary key; if not it throws an ObjectNotFoundException.
If more than one record exists with that primary key (not usually
possible), the finder throws a FinderException. This check doesn't
necessarily require a database access, but in practice it almost
always does.
The container uses other Single-Object Finders to translate the lookup
criteria into a primary key for a record in the database. If there is
no such record, the finder must throw an ObjectNotFoundException. If
more than one primary key is found, the finder must throw a
FinderException. This lookup might not involve reading the actual
record from the database; e.g., the primary key might be obtained by
querying a different table using the search criteria.
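
The two single-object finder cases above can be sketched roughly as
follows. This is only an illustration, not container or vendor code:
the exception classes are local stand-ins for javax.ejb.FinderException
and javax.ejb.ObjectNotFoundException so the sketch compiles without an
EJB jar, and FinderSketch, emailIndex, addIndexRow and ejbFindByEmail
are hypothetical names, with an in-memory map playing the role of the
"different table".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FinderSketch {
    // Local stand-ins for the javax.ejb exceptions (assumption: no EJB jar here).
    static class FinderException extends Exception {
        FinderException(String message) { super(message); }
    }

    static class ObjectNotFoundException extends FinderException {
        ObjectNotFoundException(String message) { super(message); }
    }

    // Hypothetical index table (customer id -> email), separate from the
    // table that would hold the full customer record.
    private final Map<Integer, String> emailIndex = new LinkedHashMap<>();

    void addIndexRow(int customerId, String email) {
        emailIndex.put(customerId, email);
    }

    // Only confirms that the key exists; no bean state is read.
    Integer ejbFindByPrimaryKey(Integer customerId) throws FinderException {
        if (!emailIndex.containsKey(customerId)) {
            throw new ObjectNotFoundException("No customer with id " + customerId);
        }
        return customerId;
    }

    // Translates the lookup criteria into exactly one primary key.
    Integer ejbFindByEmail(String email) throws FinderException {
        List<Integer> matches = new ArrayList<>();
        for (Map.Entry<Integer, String> row : emailIndex.entrySet()) {
            if (row.getValue().equals(email)) {
                matches.add(row.getKey());
            }
        }
        if (matches.isEmpty()) {
            throw new ObjectNotFoundException("No customer with email " + email);
        }
        if (matches.size() > 1) {
            throw new FinderException("More than one customer with email " + email);
        }
        return matches.get(0);
    }

    public static void main(String[] args) throws FinderException {
        FinderSketch finder = new FinderSketch();
        finder.addIndexRow(1, "ann@example.com");
        System.out.println(finder.ejbFindByEmail("ann@example.com"));
    }
}
```

Note that neither finder builds a bean; each hands back only a key,
which is why the container still has work to do afterwards.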
In both of the above cases, if the container receives a primary key
back from the finder, it then looks in its bean pool to see if it
already has a bean instance for that key. It returns that instance if
so, or allocates a new instance if not (or sets up lazy allocation for
it). Thus, it's possible that the bean instance returned is one from
the pool, not the one that executed the finder. It could also be just a
stub for lazy allocation.
The container uses Multi-Object Finders to perform queries based on the
lookup criteria. The finder returns a Collection of primary keys
(possibly empty) associated with the result set of the query. This
might not involve reading the actual records from the database; e.g.,
the primary keys might be obtained by querying a different table using
the search criteria.
In this case, the container looks in its bean pool to see if it has
bean instances for any of the returned keys. It allocates new instances
(or sets up lazy allocation) for any keys that didn't have pooled
instances, and then returns the lot. Multiple bean instances are
returned, and possibly none of them is the one that executed the finder.
Some might just be stubs for lazy allocation.
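
What the container does with the keys it gets back, for both the
single-object and multi-object cases, might look something like the
sketch below. It is a simplified model under stated assumptions, not
how any particular container is implemented: ContainerSketch,
CustomerBean, instancesByKey, resolve and resolveAll are all
hypothetical names, and a plain map stands in for the bean pool.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ContainerSketch {
    // Placeholder entity bean; its state stays unloaded until first use.
    static class CustomerBean {
        final int primaryKey;
        boolean stateLoaded = false;

        CustomerBean(int primaryKey) {
            this.primaryKey = primaryKey;
        }
    }

    // Instances the container already holds, keyed by primary key
    // (the "bean pool" in the description above).
    private final Map<Integer, CustomerBean> instancesByKey = new HashMap<>();

    // Single-object finder result: reuse an existing instance or allocate one.
    CustomerBean resolve(int primaryKey) {
        return instancesByKey.computeIfAbsent(primaryKey, CustomerBean::new);
    }

    // Multi-object finder result: resolve every key, keeping the finder's order.
    List<CustomerBean> resolveAll(Collection<Integer> primaryKeys) {
        List<CustomerBean> beans = new ArrayList<>();
        for (int key : primaryKeys) {
            beans.add(resolve(key));
        }
        return beans;
    }

    public static void main(String[] args) {
        ContainerSketch container = new ContainerSketch();
        CustomerBean existing = container.resolve(7);
        List<CustomerBean> beans = container.resolveAll(List.of(7, 8));
        System.out.println(beans.get(0) == existing);
    }
}
```

The point of the sketch is that the instance handed to the client is
chosen by key, so it need not be the instance that ran the finder, and
a freshly allocated one carries no loaded state yet.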
The client might then use the bean instance(s) that it received from
the container. The first access to each bean instance will trigger a
call to its ejbLoad method, which typically will issue another database
read.
So, if you call a finder method that returns 'n' bean instances, and
then access each of those instances, you'll typically end up with 'n+1'
database accesses: one for the finder itself, plus one ejbLoad read for
each of the 'n' instances.
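
To make the 'n+1' count concrete, here is a small simulation that
simply counts reads. It assumes an in-memory map in place of the
database and a private ejbLoad stand-in; NPlusOneSketch, customerTable,
databaseReads, insertRow and findAll are hypothetical names chosen for
the illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NPlusOneSketch {
    // In-memory stand-in for the customer table (assumption for the sketch).
    private final Map<Integer, String> customerTable = new LinkedHashMap<>();
    int databaseReads = 0;

    void insertRow(int id, String name) {
        customerTable.put(id, name);
    }

    // Bean whose state is read on first access only.
    class CustomerBean {
        private final int primaryKey;
        private String name;
        private boolean loaded = false;

        CustomerBean(int primaryKey) {
            this.primaryKey = primaryKey;
        }

        // Stand-in for ejbLoad: one database read for this bean's row.
        private void ejbLoad() {
            databaseReads++;
            name = customerTable.get(primaryKey);
            loaded = true;
        }

        String getName() {
            if (!loaded) {
                ejbLoad();
            }
            return name;
        }
    }

    // Multi-object finder: one read that yields keys, wrapped in unloaded beans.
    List<CustomerBean> findAll() {
        databaseReads++;
        List<CustomerBean> beans = new ArrayList<>();
        for (int key : customerTable.keySet()) {
            beans.add(new CustomerBean(key));
        }
        return beans;
    }

    public static void main(String[] args) {
        NPlusOneSketch db = new NPlusOneSketch();
        db.insertRow(1, "Ann");
        db.insertRow(2, "Bob");
        db.insertRow(3, "Cy");
        for (CustomerBean bean : db.findAll()) {
            bean.getName();
        }
        System.out.println("reads: " + db.databaseReads);
    }
}
```

With three rows, the finder accounts for one read and the three first
accesses account for three more; touching a bean a second time adds
nothing in this model.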
Some containers (at least WebLogic and JBoss) can be instructed to
preload all of the returned beans if they're CMP beans. In a few cases
this might be wasteful; for example, if the result set holds 1000 beans
and you only want to look at the first 5. It can also push your
database into lock escalation.
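
The trade-off can be shown with a rough count of queries versus rows
fetched. This is a toy model only: it does not use any WebLogic or
JBoss setting, and PreloadSketch, lazyFirstN, preloadFirstN, queries
and rowsFetched are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class PreloadSketch {
    private final List<String> table = new ArrayList<>();
    int queries = 0;
    int rowsFetched = 0;

    PreloadSketch(int rowCount) {
        for (int i = 0; i < rowCount; i++) {
            table.add("row-" + i);
        }
    }

    // Lazy loading: one finder query, then one query per bean actually used.
    List<String> lazyFirstN(int n) {
        queries++; // the finder, which returns keys only
        List<String> used = new ArrayList<>();
        for (int i = 0; i < n && i < table.size(); i++) {
            queries++; // stand-in for one ejbLoad
            rowsFetched++;
            used.add(table.get(i));
        }
        return used;
    }

    // Preloading: one query, but every row in the result set is fetched.
    List<String> preloadFirstN(int n) {
        queries++;
        rowsFetched += table.size();
        List<String> all = new ArrayList<>(table);
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    public static void main(String[] args) {
        PreloadSketch lazy = new PreloadSketch(1000);
        lazy.lazyFirstN(5);
        PreloadSketch eager = new PreloadSketch(1000);
        eager.preloadFirstN(5);
        System.out.println(lazy.queries + " queries vs " + eager.queries);
        System.out.println(lazy.rowsFetched + " rows vs " + eager.rowsFetched);
    }
}
```

In this model preloading wins on query count but loses badly on rows
read when only a handful of the results are ever touched.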
Entity beans are designed to behave correctly under all conditions.
They are inherently low-performance and should be approached carefully
in any system that is expected to be under heavy load. In addition to
the 'n+1' problem (which can sometimes be circumvented with some
containers), you face scalability challenges because only one client
can access an entity bean at a time. Entity beans must be accessed
inside transactions, which is generally inappropriate for OLAP
applications. And depending on your database, you might end up with
unnecessary lock escalation, which can further damage scalability.