Monday, September 16, 2013

trca gotchas – explain plan != execution plan



trca generates a nicely formatted, visually assisted explain plan, which weights all the trouble to get trca installed.

However, sometimes the explain plan is not what actually happens. For lack of better terms, I will re-define the following terms (note, this is not actually how Oracle documentations define them, it is just I find this interpretation helps clarification): 
  • Explain plan: this is the estimated plan, not the actual plan.
  • Execution plan: this is the actual plan executed by oracle for your sql, in tkprof or trca, this is equal to “row source operation”.
Among other things, these two will affect the execution plan:
  • Bind variable peeking
  • Timing

Bind variable peeking



--column t.id has two values:1 and 99, only one row is 1
create table t
    as
    select case when rownum =1 then 1 else 99 end id, a.*
      from all_objects a

create index t_idx on t(id);


Use traceSql.sh to analyze these sqls:


variable id number;
exec :id := 99;
select * from t where id = :id;
The resulting report shows:



Execution plan (a.k.a row source operation) is not equal to explain plan:
  • Execution plan uses table full scan, since id=99 is the 99% of table t, this obviously is the best choice
  • Explain plan uses index scan, which is not optimal
The reason for this discrepancy is that explain plan doesn’t do “bind variable peeking”. At runtime, what happens is: optimizer looks at the sql, and says to itself “there is a bind variable, I’d better first peek at its value, before selecting an execution plan”, so it peeks at “:id” variable, and finds out it is 99, “I should use a table scan, as 99% of the table is id=99”.  Explain plan, on the other hand, is blind to bind variables. 

If there are no distracting bind variables, explain plan can be equal to the execution plan (can be, but not will be, as there are other impacting factors). If we change the sql to use literals, explain plan is equal to execution plan:

select * from t where id = 99;



At this point, you might be tempted to ask, “table t is not analyzed, how come optimizer knows 99 is the 99% of the table?” This is thanks to dynamic sampling.


Dynamic sampling



If a table is not analyzed, optimizer will sample a certain percentage of the data, get an understanding of the data distribution, and use this understanding to pick an optimal plan. There are 11 levels (0 through 10) of dynamic sampling, in Oracle9i r2, the default dynamic sampling level value is 1, whereas in 10g r1 and above, it defaults to 2.

SQL> set autotrace off;
SQL> select count(*) from t;

  COUNT(*)
----------
     81084

SQL> set autotrace traceonly explain;
SQL> select * from t where id =99;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100K|    16M|   328   (2)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    |   100K|    16M|   328   (2)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ID"=99)

Note
-----
   - dynamic sampling used for this statement (level=2)

SQL> select /*+ dynamic_sampling(t 0) */* from t where id = 99;

Execution Plan
----------------------------------------------------------
Plan hash value: 470836197

--------------------------------------------------------------------------------

| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time|

-------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |       |   971 |   162K|     5   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| T     |   971 |   162K|     5   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | T_IDX |   388 |       |     1   (0)| 00:00:01 |

-------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=99)


With default dynamic sampling, optimizer’s cardinality estimation is quite close (100K) and it is able to arrive at the optimal execution plan; if turning dynamic sampling off by using level 0 (hint dynamic_sampling(t 0)), optimizer immediately flies blind and picks up an inefficient plan.

Timing



There is another factor that might cause different explain plan and execution plan: timing. 

Explain plan always does a hard parse, while execution plan will reuse the existing plan if there is one, i.e. it does a soft parse. 

Continue the demo, first we run these sqls:

variable id number;
exec :id := 99;
select * from t where id = :id;


Report shows the execution plan uses table scan, explain plan gets it wrong and uses index scan. 

Now let us change the bind variable value:

exec :id := 1;
select * from t where id = :id;


Rerun traceSql.sh, but this time do not flush shared pool. The resulting report shows:
 


Execution plan continues to use the existing plan, which is now not suitable for “id=1”; execution plan appears to get it right, but it is just a coincidence.


Recommendation



The well formatted, visually assisted explain plan is too important for sql performance analysis. In order to get a correct explain plan, the sql should use literals, not bind variables. 

References