Dynamic Sampling and Its Impact on the Optimizer

What is dynamic sampling, how does it affect the optimizer, and how does it help generate better execution plans?

The Oracle optimizer makes decisions based on the statistics stored in the data dictionary for the objects referenced by a given query. Without valid statistics, the optimizer often makes poor choices when picking an execution plan.
The main objective of dynamic sampling is to produce more accurate selectivity and cardinality estimates, which in turn helps the cost-based optimizer (CBO) generate better execution plans.
The optimizer falls back on default statistic values when a table has no statistics and dynamic sampling is disabled. Dynamic sampling has 11 levels, from 0 to 10, and the default setting is 2. The default ensures that the optimizer has a reasonable estimate whenever it optimizes a query that accesses a table with no statistics. The database performs dynamic sampling when the existing statistics are missing, unusable or insufficient (unless dynamic sampling has been turned off entirely).


NOTE: The optimizer will not perform dynamic sampling on remote tables or external tables.

How Dynamic Sampling Works
There are two ways to use dynamic sampling:
  • The OPTIMIZER_DYNAMIC_SAMPLING parameter can be set at the instance level and overridden at the session level with the ALTER SESSION command.
  • The DYNAMIC_SAMPLING hint can be added to a specific query to invoke dynamic sampling for that statement alone; see the sketch after this list.
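
For illustration, here is what the two approaches might look like. Level 4 is an arbitrary example, and MyBigTable and status are placeholder names, not objects from this article:

-- Session level: affects every statement parsed in this session.
ALTER SESSION SET optimizer_dynamic_sampling = 4;

-- Statement level: the hint overrides the parameter for this query only.
SELECT /*+ DYNAMIC_SAMPLING(MyBigTable 4) */ *
FROM MyBigTable
WHERE status = 'OPEN';
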
Here are the criteria and sample sizes for each OPTIMIZER_DYNAMIC_SAMPLING level:
  • Level 0: Do not use dynamic sampling. (Sample size: N/A)
  • Level 1: Use dynamic sampling for all tables that have not been analysed, but only if the unanalysed table has more than 32 blocks, has no indexes and is not partitioned. (Sample size: 32 blocks)
  • Level 2 (default): Use dynamic sampling if at least one table in the statement has no statistics. (Sample size: 64 blocks)
  • Level 3: The statement meets level 2 criteria, or has one or more expressions in its WHERE clause predicates, for example a predicate on SUBSTR(CUSTLASTNAME,1,3), or WHERE a + b = 5. (Sample size: 64 blocks)
  • Level 4: The statement meets level 3 criteria, or uses complex predicates (an OR or AND operator between multiple predicates on the same table). (Sample size: 64 blocks)
  • Levels 5 to 9: The statement meets level 4 criteria; only the sample size grows with the level: 128, 256, 512, 1024 and 4096 blocks respectively.
  • Level 10: Use dynamic sampling of all blocks for all statements.
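
To see which level is currently in effect, you can check the initialization parameter. A quick sketch:

-- From SQL*Plus:
SHOW PARAMETER optimizer_dynamic_sampling

-- From any client, via the dynamic performance view:
SELECT name, value
FROM v$parameter
WHERE name = 'optimizer_dynamic_sampling';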


Consider the following table:
SQL> CREATE Table MyTemp AS
SELECT OWNER, OBJECT_TYPE
FROM ALL_OBJECTS;
Table created.

SQL> SELECT Count(*) From MyTemp;
  COUNT(*)
----------
     71879
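
Note that no statistics have been gathered on MyTemp here. If you want to confirm that from the data dictionary (an optional check, not part of the original session), LAST_ANALYZED is NULL for an unanalysed table:

SELECT num_rows, last_analyzed
FROM user_tables
WHERE table_name = 'MYTEMP';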



Let’s disable dynamic sampling with the DYNAMIC_SAMPLING hint at level 0 and look at the execution plan.

Disabling dynamic sampling to see default cardinalities
SQL> SET AUTOTRACE TRACEONLY EXP
SQL> SELECT /*+ Dynamic_Sampling(MyTemp,0) */ *
  2  From MyTemp;
Execution Plan
----------------------------------------------------------
Plan hash value: 2234051319
----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        | 16908 |   462K|    58   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| MYTEMP | 16908 |   462K|    58   (0)| 00:00:01 |
----------------------------------------------------------------------------


As you can see, the estimated cardinality is 16,908, which is very far from the actual cardinality of 71,879. If I permit dynamic sampling, I get a much more realistic estimate.

More-realistic cardinalities with dynamic sampling enabled
SQL> SELECT * From MyTemp;
Execution Plan
----------------------------------------------------------
Plan hash value: 2234051319

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        | 82504 |  2255K|    59   (2)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| MYTEMP | 82504 |  2255K|    59   (2)| 00:00:01 |
----------------------------------------------------------------------------
Note
-----
 - dynamic sampling used for this statement (level=2)


Now, 82,504 is much closer to 71,879. An inaccurate cardinality estimate can swing either way: we have just seen the optimizer radically underestimate the cardinality, but it can overestimate as well.


Overestimating cardinalities
SQL> DELETE From MyTemp;
71879 rows deleted.
SQL> COMMIT;
Commit complete.

SQL> SET AUTOTRACE TRACEONLY EXP
SQL> SELECT /*+ Dynamic_Sampling(MyTemp,0) */ *
  2  From MyTemp;
Execution Plan
----------------------------------------------------------
Plan hash value: 2234051319
----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        | 16908 |   462K|    58   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| MYTEMP | 16908 |   462K|    58   (0)| 00:00:01 |
----------------------------------------------------------------------------
SQL> SELECT * From MyTemp;

Execution Plan
----------------------------------------------------------
Plan hash value: 2234051319
----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |     1 |    28 |    58   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| MYTEMP |     1 |    28 |    58   (0)| 00:00:01 |
----------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement (level=2)




Think about what would happen if the optimizer guessed 16,908 rows instead of 1 row in this case. For queries accessing MyTemp, it would grossly overestimate the number of rows returned and generate incorrect plans as a result.
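
In a real system, the longer-term fix after such a drastic change in a table's contents is to gather fresh statistics, so the optimizer does not need to sample at parse time. A minimal sketch:

BEGIN
  -- Refresh the optimizer statistics on MYTEMP after the bulk delete.
  DBMS_STATS.GATHER_TABLE_STATS(USER, 'MYTEMP');
END;
/

Dynamic sampling is a safety net for when statistics are missing or stale, not a replacement for gathering them.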


Poor Cardinality Estimates 


Creating a table with correlated “flag” columns and gathering statistics

SQL> CREATE Table MyTable
  2  AS SELECT
  3  DECODE(MOD(ROWNUM,2),0,'N','Y') Flag1,
  4  DECODE(MOD(ROWNUM,2),0,'Y','N') Flag2, AO.*
  5  FROM ALL_OBJECTS AO;
Table created.

SQL> SELECT Count(*) From MyTable;
  COUNT(*)
----------
     71880

SQL> CREATE INDEX T_IND ON MyTable(Flag1,Flag2);
Index created.

SQL> BEGIN
  2  DBMS_STATS.GATHER_TABLE_STATS(USER,'MYTABLE',
  3  METHOD_OPT=>'For All Indexed Columns Size 254');
  4  END;
  5  /
PL/SQL procedure successfully completed.
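
At this point, MyTable has table statistics and histograms on the indexed columns. To verify what the optimizer can see (an optional check, not part of the original session), you can query the dictionary:

SELECT column_name, num_distinct, histogram
FROM user_tab_col_statistics
WHERE table_name = 'MYTABLE';

The point of this example is that even with accurate per-column statistics, the optimizer has no idea that Flag1 and Flag2 are correlated.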

SQL> SET AUTOTRACE TRACEONLY EXP
SQL> SELECT * From MyTable Where Flag1='N' And Flag2='N';

Execution Plan
----------------------------------------------------------
Plan hash value: 1015944200
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         | 17969 |  1772K|   299   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| MYTABLE | 17969 |  1772K|   299   (1)| 00:00:04 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG1"='N' AND "FLAG2"='N')
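
By construction, the DECODE expressions always set Flag1 and Flag2 to opposite values, so no row can satisfy Flag1 = 'N' AND Flag2 = 'N'. A quick check (not part of the original session) would return zero:

SELECT COUNT(*)
FROM MyTable
WHERE Flag1 = 'N' AND Flag2 = 'N';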



The 17,969-row estimate above comes from the optimizer's independence assumption: roughly half the rows have Flag1 = 'N' and roughly half have Flag2 = 'N', so it multiplies the two selectivities (71,880 × 1/2 × 1/2 ≈ 17,970) even though the true answer is zero. Let’s use dynamic sampling and see what happens:
SQL> SELECT /*+ Dynamic_Sampling(MyTable 3) */  * From MyTable Where Flag1='N' And Flag2='N';
Execution Plan
----------------------------------------------------------
Plan hash value: 270124127
---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     7 |   707 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MYTABLE |     7 |   707 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IND   |     7 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FLAG1"='N' AND "FLAG2"='N')
Note
-----
- dynamic sampling used for this statement (level=2)




Note the much better row estimate (7 instead of 17,969), and note the very different, and now optimal, plan. Instead of a full table scan, the optimizer has decided to use the index. This plan is definitely superior to the original full table scan because, in reality, no data will be returned, and the index range scan discovers that instantly.
That said, not all queries benefit from dynamic sampling. Queries with very fast execution times, or ad-hoc queries that will not be repeated, are not good candidates. The situations that benefit most from dynamic sampling are:
  • Queries with complex predicates that are getting poor execution plans.
  • Queries whose execution times are significantly longer than the time required to perform dynamic sampling.
  • Queries that will be executed repeatedly, so the sampling cost is amortised over many runs.
  • Queries for which the available statistics are insufficient to generate a good execution plan.





Source: AskTom
