Monday, February 17, 2014
Monday, February 10, 2014
Peeking Behind Hibernate Batching
This article peeks behind how Hibernate deals with batching, and how it fails to do optimistic locking with the standard Oracle JDBC driver and what to do with it.
What is batching
First, let us have an understanding what batching is and is
not.
Image a PreparedStatement is an expensive pathway to the database.
The pathway is very expensive (i.e. memory consumption, performance overhead
etc), so in order to be cost-effective, the pathway will be reused. For
example, to create five rows in the Child table, one PreparedStatement will be
created instead of five, and the five sets of parameters for creating the five
children will pass through the pathway.
insert into CP_CHILD (version, name, id) values (?, ?, ?)
What batching is not is batching 5 insert sqls to the
database, such as the following:
Hibernate batching
Now let us get a little more technical with Hibernate
batching.
(The following test program was tested with Hibernate 3.2.5/Oracle ojdbc6.jar for 11.1.0.6, and also with Hibernate 4.3.1/Oracle ojdbc6.jar for 11.2.0.4)
(The following test program was tested with Hibernate 3.2.5/Oracle ojdbc6.jar for 11.1.0.6, and also with Hibernate 4.3.1/Oracle ojdbc6.jar for 11.2.0.4)
Here is a very simple code snippet to insert 1 parent and 5
children, Parent and Child tables, as you can image, have primary/foreign key
relationship:
Parent p = new Parent();p.setName("p1");// insert 5 children, seq: 0,1,2,3,4List<Child> children = new ArrayList<Child>();for (int i = 0; i < 5; i++) {Child c = new Child();c.setName("c" + i);p.addChild(c);}session.saveOrUpdate(p);
The real actions happen within session.flush(), before
session.flush(), all actions are happening inside hibernate first level
cache session. Using Perfspy to capture it and dive into the batch specific
methods:
6 AbstractBatcher.prepareBatchStatement(..) are invoked to
prepare 6 insert sqls, whose operation ids are 1928, 2053, 2186, 2294, 2402,
2510. The first one is to insert into CP_PARENT, the remaining five are to
insert into CP_CHILD.
What is interesting is the second one 2053. Hibernate
assembles statements into a batcher first (operation 1987, 2120, 2228, 2336,
2444, 2552):
When
it tries to put an insert child PreparedStatement into the queue, it finds out
there is already a different PreparedStatement in the queue, so it executes
that statement into the DB. This can be confirmed by looking into the detailed parameter information
of operation 2059 BatchingBatcher.doExecuteBatch(..) , which shows the sql
being executed is “insert into CP_PARENT”:
Although Hibernate logging shows 5 insert into CP_CHILD sqls
are issued:
Hibernate: insert into CP_CHILD (version, name, id) values (?, ?, ?)Hibernate: insert into CP_CHILD (version, name, id) values (?, ?, ?)Hibernate: insert into CP_CHILD (version, name, id) values (?, ?, ?)Hibernate: insert into CP_CHILD (version, name, id) values (?, ?, ?)Hibernate: insert into CP_CHILD (version, name, id) values (?, ?, ?)
It actually executes one insert sql, and uses batching
mechanism to create five children.
Check out the following diagram, the five CP_CHILD AbstractBatcher.prepareBatchStatement(..)
reuses the same statement (the return values of OraclePreparedStatementWrapper
objects have the same
hashcode 1d58e2af):
Optimistic locking trouble with Oracle JDBC Driver
Hibernate’s optimistic locking falls apart when dealing with
Oracle. Check operation 2060 BatchingBatcher.checkRowCounts(..), which is to
check if the return value of the PreparedStatement “insert into CP_PARENT” matches
with the expectation. The PreparedStatement “insert into CP_PARENT” is expected
to insert 1 row and thus its return value should be 1. However, Oracle JDBC
driver, when handling batching, always returns -2, meaning, it has executed the
sql, but it doesn’t know how many rows it has impacted.
Looking into how Hibernate checks the result of the “insert
into CHILD” PreparedStatement will be more illustrative:
Instead of returning an array of [1,1,1,1,1], the “insert
into CHILD” PreparedStatement returns an array of [-2,-2,-2,-2,-2], which
throws Hibernate off guard and resigns to the fact that it can’t safeguard
optimistic locking.
Dirty data
This trouble with Oracle JDBC driver can easily lead to
dirty data. The following test case proves it:
public void testDuplicateChild() throws InterruptedException {// prepare the data, one parent, 5 childrenSession session = sf.getCurrentSession();Transaction ta = session.beginTransaction();final Parent p = new Parent();p.setName("p1");resetChildren(p, "c");session.saveOrUpdate(p);ta.commit();final CountDownLatch cdWaitForTheSeondThreadToStartBeforeTheFirstThreadfCommit = new CountDownLatch(1);final CountDownLatch cdWaitAfterTheFirstThreadFinishCommit = new CountDownLatch(1);final CountDownLatch cdWaitForThreadsToFinish = new CountDownLatch(2);Runnable r1 = new Runnable() {public void run() {Session session = sf.getCurrentSession();Transaction ta = session.beginTransaction();Parent parent = (Parent) session.load(Parent.class, p.getId());//change children nameresetChildren(parent, "c-new1");try {cdWaitForTheSeondThreadToStartBeforeTheFirstThreadfCommit.await();} catch (InterruptedException e) {e.printStackTrace();}ta.commit();cdWaitAfterTheFirstThreadFinishCommit.countDown();cdWaitForThreadsToFinish.countDown();}};Runnable r2 = new Runnable() {public void run() {Session session = sf.getCurrentSession();Transaction ta = session.beginTransaction();Parent parent = (Parent) session.load(Parent.class, p.getId());//change children nameresetChildren(parent, "c-new2");cdWaitForTheSeondThreadToStartBeforeTheFirstThreadfCommit.countDown();try {cdWaitAfterTheFirstThreadFinishCommit.await();} catch (InterruptedException e) {e.printStackTrace();}ta.commit();cdWaitForThreadsToFinish.countDown();}};new Thread(r1).start();new Thread(r2).start();cdWaitForThreadsToFinish.await();}private void resetChildren(Parent p, String childName) {p.getChildren().clear();List<Child> children = new ArrayList<Child>();for (int i = 0; i < 5; i++) {Child c = new Child();c.setName(childName + i);children.add(c);}
This simple program first creates one parent and 5 children,
and then launches two threads to change the child names. The two threads are
orchestrated so that they are operating concurrently. If optimistic locking is
working, the second transaction should fail, and there should be only 5
children for the parent. However, since the optimistic locking falls apart,
both threads finish successfully, and 10 children are created for the parent.
Solution
Someone already posted a solution into Hibernate forum:
The gist of the solution is that it uses Oracle specific
JDBC API instead of standard JDBC API. The standard JDBC API for executing a
batch is:
PreparedStatement.executeBatch()
which returns an array of -2 for batching statement, the
solution uses:
Statement.sendBatch()
The ugly thing is that in order to invoke “sendBatch”
method, you have to first get hold of the Oracle specific statement. If your
application is deployed into some application server, it can become quite ugly
to get hold the underlying Oracle specific statement, usually you will need to
use some reflections to get it, which makes the code hard to maintain and port.
Subscribe to:
Posts (Atom)