Monday, February 10, 2014

Peeking Behind Hibernate Batching

This article peeks behind how Hibernate deals with batching, and how it fails to do optimistic locking with the standard Oracle JDBC driver and what to do with it. 

What is batching

First, let us have an understanding what batching is and is not.
Image a PreparedStatement is an expensive pathway to the database. The pathway is very expensive (i.e. memory consumption, performance overhead etc), so in order to be cost-effective, the pathway will be reused. For example, to create five rows in the Child table, one PreparedStatement will be created instead of five, and the five sets of parameters for creating the five children will pass through the pathway. 
 insert into CP_CHILD (version, name, id) values (?, ?, ?)

What batching is not is batching 5 insert sqls to the database, such as the following:  

Hibernate batching

Now let us get a little more technical with Hibernate batching.

(The following test program was tested with Hibernate 3.2.5/Oracle ojdbc6.jar for 11.1.0.6, and also with Hibernate 4.3.1/Oracle ojdbc6.jar for 11.2.0.4)
Here is a very simple code snippet to insert 1 parent and 5 children, Parent and Child tables, as you can image, have primary/foreign key relationship:

Parent p = new Parent();
             p.setName("p1");

             // insert 5 children, seq: 0,1,2,3,4
             List<Child> children = new ArrayList<Child>();
             for (int i = 0; i < 5; i++) {
                    Child c = new Child();
                    c.setName("c" + i);
                    p.addChild(c);
                   
             }

             session.saveOrUpdate(p);
The real actions happen within session.flush(), before session.flush(), all actions are happening inside hibernate first level cache session. Using Perfspy to capture it and dive into the batch specific methods:


6 AbstractBatcher.prepareBatchStatement(..) are invoked to prepare 6 insert sqls, whose operation ids are 1928, 2053, 2186, 2294, 2402, 2510. The first one is to insert into CP_PARENT, the remaining five are to insert into CP_CHILD. 

What is interesting is the second one 2053. Hibernate assembles statements into a batcher first (operation 1987, 2120, 2228, 2336, 2444, 2552):

When it tries to put an insert child PreparedStatement into the queue, it finds out there is already a different PreparedStatement in the queue, so it executes that statement into the DB. This can be confirmed by looking into the detailed parameter information of operation 2059 BatchingBatcher.doExecuteBatch(..) , which shows the sql being executed is “insert into CP_PARENT”:

Although Hibernate logging shows 5 insert into CP_CHILD sqls are issued:
Hibernate: insert into CP_CHILD (version, name, id) values (?, ?, ?)
Hibernate: insert into CP_CHILD (version, name, id) values (?, ?, ?)
Hibernate: insert into CP_CHILD (version, name, id) values (?, ?, ?)
Hibernate: insert into CP_CHILD (version, name, id) values (?, ?, ?)
Hibernate: insert into CP_CHILD (version, name, id) values (?, ?, ?)
 
It actually executes one insert sql, and uses batching mechanism to create five children.

Check out the following diagram, the five CP_CHILD AbstractBatcher.prepareBatchStatement(..) reuses the same statement (the return values of OraclePreparedStatementWrapper objects have the same hashcode 1d58e2af):
   



Optimistic locking trouble with Oracle JDBC Driver




Hibernate’s optimistic locking falls apart when dealing with Oracle. Check operation 2060 BatchingBatcher.checkRowCounts(..), which is to check if the return value of the PreparedStatement “insert into CP_PARENT” matches with the expectation. The PreparedStatement “insert into CP_PARENT” is expected to insert 1 row and thus its return value should be 1. However, Oracle JDBC driver, when handling batching, always returns -2, meaning, it has executed the sql, but it doesn’t know how many rows it has impacted.


Looking into how Hibernate checks the result of the “insert into CHILD” PreparedStatement will be more illustrative:
Instead of returning an array of [1,1,1,1,1], the “insert into CHILD” PreparedStatement returns an array of [-2,-2,-2,-2,-2], which throws Hibernate off guard and resigns to the fact that it can’t safeguard optimistic locking.

Dirty data


This trouble with Oracle JDBC driver can easily lead to dirty data. The following test case proves it:
 
public void testDuplicateChild() throws InterruptedException {
             // prepare the data, one parent, 5 children
             Session session = sf.getCurrentSession();
             Transaction ta = session.beginTransaction();
             final Parent p = new Parent();
             p.setName("p1");
             resetChildren(p, "c");
             session.saveOrUpdate(p);
             ta.commit();

             final CountDownLatch cdWaitForTheSeondThreadToStartBeforeTheFirstThreadfCommit = new CountDownLatch(
                          1);
             final CountDownLatch cdWaitAfterTheFirstThreadFinishCommit = new CountDownLatch(1);
             final CountDownLatch cdWaitForThreadsToFinish = new CountDownLatch(2);

             Runnable r1 = new Runnable() {
                    public void run() {
                          Session session = sf.getCurrentSession();
                          Transaction ta = session.beginTransaction();
                          Parent parent = (Parent) session.load(Parent.class, p.getId());
                          //change children name
                          resetChildren(parent, "c-new1");

                          try {
                            cdWaitForTheSeondThreadToStartBeforeTheFirstThreadfCommit.await();
                          } catch (InterruptedException e) {
                                 e.printStackTrace();
                          }

                          ta.commit();
                           cdWaitAfterTheFirstThreadFinishCommit.countDown();
                          cdWaitForThreadsToFinish.countDown();
                    }
             };

             Runnable r2 = new Runnable() {
                    public void run() {
                          Session session = sf.getCurrentSession();
                          Transaction ta = session.beginTransaction();
                          Parent parent = (Parent) session.load(Parent.class, p.getId());

                          //change children name
                          resetChildren(parent, "c-new2");

                    cdWaitForTheSeondThreadToStartBeforeTheFirstThreadfCommit.countDown();
                          try {
                                 cdWaitAfterTheFirstThreadFinishCommit.await();
                          } catch (InterruptedException e) {
                                 e.printStackTrace();
                          }

                          ta.commit();
                          cdWaitForThreadsToFinish.countDown();
                    }
             };

             new Thread(r1).start();
             new Thread(r2).start();

             cdWaitForThreadsToFinish.await();

       }

       private void resetChildren(Parent p, String childName) {
             p.getChildren().clear();
             List<Child> children = new ArrayList<Child>();
             for (int i = 0; i < 5; i++) {
                    Child c = new Child();
                    c.setName(childName + i);
                    children.add(c);
             }


This simple program first creates one parent and 5 children, and then launches two threads to change the child names. The two threads are orchestrated so that they are operating concurrently. If optimistic locking is working, the second transaction should fail, and there should be only 5 children for the parent. However, since the optimistic locking falls apart, both threads finish successfully, and 10 children are created for the parent.

Solution 



Someone already posted a solution into Hibernate forum:

The gist of the solution is that it uses Oracle specific JDBC API instead of standard JDBC API. The standard JDBC API for executing a batch is:

PreparedStatement.executeBatch()


which returns an array of -2 for batching statement, the solution uses: 

Statement.sendBatch()


The ugly thing is that in order to invoke “sendBatch” method, you have to first get hold of the Oracle specific statement. If your application is deployed into some application server, it can become quite ugly to get hold the underlying Oracle specific statement, usually you will need to use some reflections to get it, which makes the code hard to maintain and port.