Inside the Oracle Optimizer - Removing the black magic: 2008

Tuesday, October 14, 2008

Oracle Open World follow up

We were delighted to see so many people turn up for our Open World session - Inside the 11g Optimizer - so early on Tuesday morning! A lot of people have been asking where they can find more information on the topics covered, especially the demos that were shown. You can find similar worked examples for most of the new 11g Optimizer features on the Oracle By Example website. You can also get more information on SQL Plan Management in the following white paper. Our blog entry from December 2007 has more information on Adaptive Cursor Sharing, while the January 2008 entry gives more details on the enhancements made to statistics. We hope you enjoyed Oracle Open World as much as we did!

Thursday, July 24, 2008

Will the Optimizer development team be at Oracle Open World?

The largest gathering of Oracle customers, partners, developers, and technology enthusiasts will happen in September, when Oracle hosts its annual user conference, Open World, in San Francisco, and the Optimizer development group will be there! You will have two opportunities to meet the team: attend the technical presentation "Inside the 11g Optimizer - Removing the mystery" on Tuesday morning at 9am, or stop by the Oracle demo grounds (in Moscone West) to see all of the demos for the 11g new features and ask the development team any burning questions you may have!

Maria Colgan

Thursday, June 26, 2008

Why are some of the tables in my query missing from the plan?

We apologize for our brief hiatus from blogging. We've been busy working on improvements to the optimizer.

In 10gR2, we introduced a new transformation, table elimination (alternately called "join elimination"), which removes redundant tables from a query. A table is redundant if its columns are only referenced in join predicates, and it is guaranteed that those joins neither filter nor expand the resulting rows. There are several cases where Oracle will eliminate a redundant table. We will discuss each case in turn.

Primary Key-Foreign Key Table Elimination

Starting in 10gR2, the optimizer eliminates tables that are redundant due to primary key-foreign key constraints. Consider the following example tables:

create table jobs
(
job_id NUMBER PRIMARY KEY,
job_title VARCHAR2(35) NOT NULL,
min_salary NUMBER,
max_salary NUMBER
);
create table departments
(
department_id NUMBER PRIMARY KEY,
department_name VARCHAR2(50)
);
create table employees
(
employee_id NUMBER PRIMARY KEY,
employee_name VARCHAR2(50),
department_id NUMBER REFERENCES departments(department_id),
job_id NUMBER REFERENCES jobs(job_id)
);

and the query:

select e.employee_name
from employees e, departments d
where e.department_id = d.department_id;

In this query, the join to departments is redundant. The only column of departments referenced in the query appears in the join predicate, and the primary key-foreign key constraint guarantees that there is exactly one match in departments for each row in employees that has a non-null department_id. Hence, the query is equivalent to:

select e.employee_name
from employees e
where e.department_id is not null;

The optimizer will generate this plan for the query:

-------------------------------------------
 Id   Operation             Name      
-------------------------------------------
   0  SELECT STATEMENT                
*  1   TABLE ACCESS FULL    EMPLOYEES 
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 1 - filter("E"."DEPARTMENT_ID" IS NOT NULL)

Note that the IS NOT NULL predicate is not necessary if the column has a NOT NULL constraint on it.
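
The note above can be tried out with a minimal sketch along these lines. It assumes the example tables from this post and that no existing row in employees has a null department_id (otherwise the ALTER fails). Per the note, the plan should then no longer need the filter on department_id, although the exact output depends on your release and settings:

```sql
-- Assumption: every existing row in employees already has a department_id.
alter table employees modify department_id not null;

explain plan for
select e.employee_name
from employees e, departments d
where e.department_id = d.department_id;

-- 'basic +predicate' shows the operations plus the predicate section, if any.
select plan_table_output
from table(dbms_xplan.display('plan_table', null, 'basic +predicate'));
```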

Starting in 11gR1, the optimizer will also eliminate tables that are semi-joined or anti-joined. Consider the following query:


select e.employee_id, e.employee_name
from employees e
where not exists (select 1
                  from jobs j
                  where j.job_id = e.job_id);

Since employees.job_id is a foreign key to jobs.job_id, any non-null value in employees.job_id must have a match in jobs. So only employees with null values for employees.job_id will appear in the result. Hence, this query is equivalent to:

select e.employee_id, e.employee_name
from employees e
where job_id is null;

and the optimizer can choose this plan:


-------------------------------------------
 Id   Operation             Name      
-------------------------------------------
   0  SELECT STATEMENT                
*  1   TABLE ACCESS FULL    EMPLOYEES 
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("E"."JOB_ID" IS NULL)

Suppose employees.job_id has a NOT NULL constraint:

alter table employees modify job_id not null;

In this case, there could not possibly be any rows in the result, and the optimizer could choose this plan:

-------------------------------------------
 Id   Operation             Name      
-------------------------------------------
   0  SELECT STATEMENT                
*  1   FILTER                         
   2    TABLE ACCESS FULL   EMPLOYEES 
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 1 - filter(NULL IS NOT NULL)

The "NULL IS NOT NULL" filter is a false constant predicate that will prevent the table scan from even taking place.

Also in 11gR1, the optimization became available for ANSI-compliant joins. For this query:

select employee_name
from employees e inner join jobs j
on e.job_id = j.job_id;

the optimizer can eliminate JOBS and produce this plan:

-------------------------------------------
 Id   Operation             Name      
-------------------------------------------
   0  SELECT STATEMENT                
   1   TABLE ACCESS FULL    EMPLOYEES 
-------------------------------------------

Outer Join Table Elimination

In 11gR1, a new form of table elimination was introduced for outer joins, which does not require PK-FK constraints. For the example, we require a new table and an addition to EMPLOYEES:

create table projects
(
project_id NUMBER UNIQUE,
deadline DATE,
priority NUMBER
);

alter table employees add project_id number;

Now consider a query that outer joins employees and projects:

select e.employee_name, e.project_id
from employees e, projects p
where e.project_id = p.project_id (+);

The outer join guarantees that every row in employees will appear at least once in the result. The unique constraint on projects.project_id guarantees that every row in employees will match at most one row in projects. Together, these two properties guarantee that every row in employees will appear in the result exactly once. Since no other columns from projects are referenced, projects can be eliminated, and the optimizer can choose this plan:

-------------------------------------------
 Id   Operation             Name      
-------------------------------------------
   0  SELECT STATEMENT                
   1   TABLE ACCESS FULL    EMPLOYEES 
-------------------------------------------

Why Would I Ever Write Such a Query?

All of the example queries in this post are very simple, and one would be unlikely to write a query where the join is so obviously unnecessary. However, there are many real-world scenarios where table elimination may be helpful, including machine-generated queries and elimination of tables in views. For example, a set of tables might be exposed as a view, which contains a join. The join may be necessary to retrieve all of the columns exposed by the view. But some users of the view may only access a subset of the columns, and in this case, the joined table can be eliminated.

For example, consider the view:

create view employee_directory_v as
select e.employee_name, d.department_name, j.job_title
from employees e, departments d, jobs j
where e.department_id = d.department_id
and e.job_id = j.job_id;

This view might be exposed to a simple employee directory application. To look up employee names by department, the application issues a query:

select employee_name
from employee_directory_v
where department_name = 'ACCOUNTING';

Since the job_title column is not referenced, jobs can be eliminated from the query, and the optimizer can choose this plan:

--------------------------------------------
 Id   Operation             Name       
--------------------------------------------
   0  SELECT STATEMENT                 
*  1   HASH JOIN                       
   2    TABLE ACCESS FULL   EMPLOYEES  
*  3    TABLE ACCESS FULL   DEPARTMENTS
--------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 1 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
 3 - filter("D"."DEPARTMENT_NAME"='ACCOUNTING')

Known Limitations

There are currently a few limitations of table elimination:

Multi-column primary key-foreign key constraints are not supported.
Referring to the join key elsewhere in the query will prevent table elimination. For an inner join, the join keys on each side of the join are equivalent, but if the query contains other references to the join key from the table that could otherwise be eliminated, this prevents elimination. A workaround is to rewrite the query to refer to the join key from the other table (we realize this is not always possible).
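
The second limitation and its workaround can be sketched with the example tables from this post. The two queries below return the same rows; they differ only in which table supplies the join key in the select list:

```sql
-- d.department_id is referenced outside the join predicate, so,
-- per the limitation above, departments is not eliminated.
select e.employee_name, d.department_id
from employees e, departments d
where e.department_id = d.department_id;

-- Workaround: take the equivalent key from employees instead, leaving
-- departments referenced only in the join predicate.
select e.employee_name, e.department_id
from employees e, departments d
where e.department_id = d.department_id;
```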

Friday, March 14, 2008

Oracle keeps closing my TAR because I cannot provide a testcase, can you help?

The answer to this question is yes, as Oracle Database 11g provides a new diagnostic tool called SQL Test Case Builder. In this article, we explain what SQL Test Case Builder is, and how to use it with examples.

Why SQL Test Case Builder?

For most SQL problems, the single most important factor for a speedy bug resolution is to obtain a reproducible test case. However, this is normally the longest and most painful step for customers. The goal of the SQL Test Case Builder (TCB) is to automatically gather as much information as possible related to a SQL incident (problem) and package it in a way that allows a developer or a support engineer to reproduce the problem on his or her own machine quickly.

At a very high level, SQL Test Case Builder can be seen as a way to export a SQL statement. Currently, Oracle export (expdp) takes a schema or a set of tables and exports all the dependent objects. SQL Test Case Builder provides the same service but takes a SQL statement as input.

What's Inside Test Case Builder?

The main input of SQL Test Case Builder is a SQL object. A SQL object is defined as the SQL text plus all the information required to compile it on a particular database instance (this contains the parsing user name, for example).

Logically, a SQL test case appears as a script containing all the necessary commands to recreate the objects, the user, the statistics, and the environment.

Within the Oracle Diagnosability infrastructure, TCB compiles the problem SQL in a special capture mode to obtain the set of objects to export. A test case captures two types of information:

Permanent information

SQL text

PL/SQL functions, procedures, packages

Statistics

Bind variables

Compilation environment

User information (like privileges)

SQL profiles, stored outlines, or other SQL Management Objects

Meta data on all the objects involved

Optimizer statistics

The execution plan information

The table content (sample or full). This is optional.

Transient information

How do I use the SQL Test Case Builder?

The task of creating a SQL test case can be performed in two ways:

From EM (Enterprise Manager), where TCB is invoked on demand via IPS (Incident Packaging Service) after a SQL incident has occurred. The user can also manually create an incident for a problem query in order to build a test case.

From SQL*Plus, where you can directly invoke one of the PL/SQL API functions in the SQL Diagnostic package. We will give examples of using the APIs below.

All the new PL/SQL procedures supporting SQL Test Case Builder are part of a new PL/SQL package called dbms_sqldiag (see dbmsdiag.sql for details). The two main features related to TCB in this package are exporting and importing test cases.

Procedure dbms_sqldiag.export_sql_testcase exports a SQL test case for a given SQL statement to a given directory.

Procedure dbms_sqldiag.import_sql_testcase imports a test case from a given directory.

To build (or export) a test case, the simplest form would be something like:

     dbms_sqldiag.export_sql_testcase(
       directory  => 'TCB_DIR_EXP',
       sql_text   => 'select count(*) from sales',
       testcase   => tco)

Here directory and sql_text are inputs which specify where the test case will be stored, and the problem query statement, respectively. Testcase specifies the test case metadata as output.

For security reasons, user data is not exported by default. You have the option to set exportData to TRUE to include the data. You can also set samplingPercent if you are exporting with data. To protect users' proprietary code, TCB does not export PL/SQL package bodies by default.
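
As a minimal sketch of these options, the simple export call shown earlier could be extended as follows. The parameter names exportData and samplingPercent are the ones mentioned above; the value 10 is purely illustrative, and dbmsdiag.sql remains the reference for the exact signature in your release:

```sql
declare
  tco clob;  -- receives the test case metadata
begin
  dbms_sqldiag.export_sql_testcase(
    directory       => 'TCB_DIR_EXP',
    sql_text        => 'select count(*) from sales',
    exportData      => TRUE,  -- include table data (not exported by default)
    samplingPercent => 10,    -- illustrative value: export a sample of the data
    testcase        => tco);
end;
/
```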

Once the test case has been built, you can copy all the files under the export directory to your test environment. Note there is a file called xxxxxxxxmain.xml, for example, oratcb1_03C600800001main.xml, which contains the metadata of the test case.

Now importing the test case can be as simple as:

     dbms_sqldiag.import_sql_testcase(
       directory => 'TEST_DIR',
       filename => 'oratcb1_03C600800001main.xml')

To verify that the test case has been successfully rebuilt, you can just issue an explain command for the problem query. However, if you want to actually run the query, then you need to have the data available.
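
A quick check of this kind might look like the sketch below. It assumes the problem query is the one from the simple export call above and that you are connected as the user into which the test case was imported:

```sql
-- Compile the problem query without executing it.
explain plan for
select count(*) from sales;

-- Display the plan that was just explained.
select plan_table_output
from table(dbms_xplan.display('plan_table', null, 'basic'));
```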

You can refer to dbmsdiag.sql for more information about other options available for these procedures.

Example - We now show the typical steps of using TCB with a sample query against a materialized view. In this example, we set the exportData option to TRUE, so we can re-run the same query after the TCB task is completed.

Setup

SQL> connect / as sysdba
Connected.
SQL>
SQL> create or replace directory TCB_DIR_EXP as
  2  '/net/tiger/apps/tcb_exp';
Directory created.
SQL>
SQL> grant dba to apps;
Grant succeeded.
SQL>
SQL> connect apps/apps
Connected.
SQL>
SQL> create materialized view scp_mvu
  2  parallel 2
  3  as
  4  select          p.prod_name, c.cust_gender,
  5                  max(s.amount_sold) max_amount_sold
  6  from            sales s, products p, customers c
  7  where           s.prod_id = p.prod_id
  8  and             s.cust_id = c.cust_id
  9  group by        p.prod_name, c.cust_gender;

Materialized view created.

SQL>
SQL> desc scp_mvu;
 Name                                      Null?    Type
 ----------------------------------------- -------- ------------
 PROD_NAME                                 NOT NULL VARCHAR2(50)
 CUST_GENDER                                        CHAR(1)
 MAX_AMOUNT_SOLD                                    NUMBER

SQL>
SQL> select * from scp_mvu where max_amount_sold > 7000 order by 3;

PROD_NAME                                          C MAX_AMOUNT_SOLD
-------------------------------------------------- - ---------------
Joseph Sportcoat                                   F          7400.8
Kenny Cool Leather Skirt                           M            7708
Leather Boot-Cut Trousers                          M            8184

3 rows selected.

Export as user APPS

SQL> connect apps/apps
Connected.

SQL>
SQL> Rem define the problem SQL statement
SQL> create or replace package define_vars is
  2    sql_stmt1     varchar2(2000) := q'# select * from scp_mvu
  3                                        where max_amount_sold > 7000
  4                                        order by 3
  5                                      #';
  6  end;
  7  /

Package created.
SQL> 
SQL> set serveroutput on
SQL>
SQL> declare
  2    tco           clob;
  3  begin
  4    -- Export test case
  5    dbms_sqldiag.export_sql_testcase
  6    (
  7      directory           => 'TCB_DIR_EXP',
  8      sql_text            => define_vars.sql_stmt1,
  9      user_name           => 'APPS',
 10      exportData          => TRUE,
 11      testcase            => tco
 12    );
 13 
 14  end;
 15  /

PL/SQL procedure successfully completed.
SQL>
SQL> Rem Drop MV before importing
SQL> drop materialized view scp_mvu;

Materialized view dropped.

SQL> conn / as sysdba
Connected.
SQL> create or replace directory TCB_DIR_IMP
  2  as '/net/lion/test/tcb_imp';
Directory created.
SQL>
SQL> grant dba to test;
Grant succeeded.

Import as user TEST

SQL> connect test/test
Connected.
SQL>
SQL> set serveroutput on
SQL>
SQL> begin
  2    -- Import test case
  3    dbms_sqldiag.import_sql_testcase
  4    (
  5      directory           => 'TCB_DIR_IMP',
  6      filename            => 'oratcb3_05e803500001main.xml',
  7      importData          => TRUE
  8    );
  9 
 10  end;
 11  /

PL/SQL procedure successfully completed.

Verification. This is to check that now all relevant objects were imported successfully.

SQL> desc scp_mvu;
 Name                                      Null?    Type
 ----------------------------------------- -------- ------------
 PROD_NAME                                 NOT NULL VARCHAR2(50)
 CUST_GENDER                                        CHAR(1)
 MAX_AMOUNT_SOLD                                    NUMBER
SQL>
SQL> select * from scp_mvu where max_amount_sold > 7000 order by 3;

PROD_NAME                                          C MAX_AMOUNT_SOLD
-------------------------------------------------- - ---------------
Joseph Sportcoat                                   F          7400.8
Kenny Cool Leather Skirt                           M            7708
Leather Boot-Cut Trousers                          M            8184

3 rows selected.

Finally, we also have good news for 10g users: SQL Test Case Builder has been backported to 10.2.0.4!

Tuesday, February 26, 2008

Upgrading from Oracle Database 9i to 10g: What to expect from the Optimizer

One of the most daunting activities a DBA can undertake is upgrading the database to a new version. Having to comprehend all of the new features and to deal with potential plan changes can be overwhelming. In order to help DBAs upgrade from Oracle Database 9i to 10g, a new white paper called "Upgrading from Oracle Database 9i to 10g: What to expect from the Optimizer" has recently been posted on Oracle Technology Network (OTN). This paper aims to explain in detail what to expect from the CBO when you upgrade from Oracle Database 9i to 10g and describes what steps you should take before and after the upgrade to minimize any potential SQL regressions. This is a must-read for any DBA planning on upgrading from 9i to 10g in the near future!

Thursday, February 7, 2008

Displaying and reading the execution plans for a SQL statement

Generating and displaying the execution plan of a SQL statement is a common task for most DBAs, SQL developers, and performance experts, as it provides them with information on the performance characteristics of a SQL statement. An execution plan shows the detailed steps necessary to execute a SQL statement. These steps are expressed as a set of database operators that consume and produce rows. The order of the operators and their implementation is decided by the query optimizer using a combination of query transformations and physical optimization techniques.

While the display is commonly shown in a tabular format, the plan is in fact tree-shaped. For example, consider the following query based on the SH schema (Sales History):


select prod_category, avg(amount_sold)
from sales s, products p
where p.prod_id = s.prod_id
group by prod_category;

The tabular representation of this query's plan is:


------------------------------------------
 Id   Operation              Name   
------------------------------------------
   0  SELECT STATEMENT              
   1   HASH GROUP BY                
   2    HASH JOIN                   
   3     TABLE ACCESS FULL   PRODUCTS
   4     PARTITION RANGE ALL        
   5      TABLE ACCESS FULL  SALES  
------------------------------------------

While the tree-shaped representation of the plan is:


   GROUP BY
      |
     JOIN
 _____|_______
 |            |
ACCESS     ACCESS
(PRODUCTS) (SALES)

When you read a plan tree you should start from the bottom up. In the above example, begin by looking at the access operators (the leaves of the tree). In this case the access operators are implemented using full table scans. The rows produced by these table scans will be consumed by the join operator. Here the join operator is a hash join (other alternatives include nested-loop or sort-merge join). Finally, the group-by operator, implemented here using hash (the alternative would be sort), consumes the rows produced by the join operator.

The execution plan generated for a SQL statement is just one of the many alternative execution plans considered by the query optimizer. The query optimizer selects the execution plan with the lowest cost. Cost is a proxy for performance: the lower the cost, the better the expected performance. The cost model used by the query optimizer accounts for the IO, CPU, and network usage in the query.

There are two different methods you can use to look at the execution plan of a SQL statement:

EXPLAIN PLAN command - This displays an execution plan for a SQL statement without actually executing the statement.

V$SQL_PLAN - A dictionary view introduced in Oracle 9i that shows the execution plan for a SQL statement that has been compiled into a cursor in the cursor cache.
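
For the second method, the view can also be queried directly. The following is only a sketch: the SQL ID is a placeholder to be replaced with your own, child number 0 is assumed, and the session needs select privilege on V$SQL_PLAN:

```sql
-- List the plan operations stored in the cursor cache for one cursor.
select id, operation, options, object_name
from v$sql_plan
where sql_id = 'fnrtqw9c233tt'   -- placeholder SQL ID
and child_number = 0             -- assumed child cursor
order by id;
```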

Under certain conditions the plan shown when using EXPLAIN PLAN can be different from the plan shown using V$SQL_PLAN. For example, when the SQL statement contains bind variables the plan shown from using EXPLAIN PLAN ignores the bind variable values while the plan shown in V$SQL_PLAN takes the bind variable values into account in the plan generation process.

Displaying an execution plan has been made easier by the introduction of the dbms_xplan package in Oracle 9i and by the enhancements made to it in subsequent releases. This package provides several PL/SQL functions to display the plan from different sources:

EXPLAIN PLAN command

V$SQL_PLAN

Automatic Workload Repository (AWR)

SQL Tuning Set (STS)

SQL Plan Baseline (SPM)

The following examples illustrate how to generate and display an execution plan for our original SQL statement using the different functions provided in the dbms_xplan package.

Example 1 Uses the EXPLAIN PLAN command and the dbms_xplan.display function.


SQL> EXPLAIN PLAN FOR
 2   select prod_category, avg(amount_sold)
 3   from sales s, products p
 4   where p.prod_id = s.prod_id
 5   group by prod_category;

Explained.


SQL> select plan_table_output
 2    from table(dbms_xplan.display('plan_table',null,'basic'));

------------------------------------------
 Id   Operation              Name   
------------------------------------------
   0  SELECT STATEMENT              
   1   HASH GROUP BY                
   2    HASH JOIN                   
   3     TABLE ACCESS FULL   PRODUCTS
   4     PARTITION RANGE ALL        
   5      TABLE ACCESS FULL  SALES  
------------------------------------------

The arguments for dbms_xplan.display are:

plan table name (default 'PLAN_TABLE'),

statement_id (default null),

format (default 'TYPICAL')

More details can be found in $ORACLE_HOME/rdbms/admin/dbmsxpln.sql.

Example 2 Generating and displaying the execution plan for the last SQL statement executed in a session:


SQL> select prod_category, avg(amount_sold)
 2   from sales s, products p
 3   where p.prod_id = s.prod_id
 4   group by prod_category;

no rows selected


SQL> select plan_table_output
 2    from table(dbms_xplan.display_cursor(null,null,'basic'));

------------------------------------------
 Id   Operation              Name   
------------------------------------------
   0  SELECT STATEMENT              
   1   HASH GROUP BY                
   2    HASH JOIN                   
   3     TABLE ACCESS FULL   PRODUCTS
   4     PARTITION RANGE ALL        
   5      TABLE ACCESS FULL  SALES  
------------------------------------------

The arguments used by dbms_xplan.display_cursor are:

SQL ID (default null, null means the last SQL statement executed in this session),

child number (default 0),

format (default 'TYPICAL')

The details are in $ORACLE_HOME/rdbms/admin/dbmsxpln.sql.

Example 3 Displaying the execution plan for any other statement requires the SQL ID to be provided, either directly or indirectly:

Directly:


SQL> select plan_table_output from
 2   table(dbms_xplan.display_cursor('fnrtqw9c233tt',null,'basic'));

Indirectly:


SQL> select plan_table_output
 2   from v$sql s,
 3   table(dbms_xplan.display_cursor(s.sql_id,
 4                                  s.child_number, 'basic')) t
 5   where s.sql_text like 'select PROD_CATEGORY%';

Example 4 - Displaying an execution plan corresponding to a SQL Plan Baseline. SQL Plan Baselines have been introduced in Oracle 11g to support the SQL Plan Management feature (SPM). In order to illustrate such a case we need to create a SQL Plan Baseline first.


SQL> alter session set optimizer_capture_sql_plan_baselines=true;

Session altered.

SQL> select prod_category, avg(amount_sold)
 2   from sales s, products p
 3   where p.prod_id = s.prod_id
 4   group by prod_category;

no rows selected

If the above statement has been executed more than once, a SQL Plan Baseline will be created for it, and you can verify this using the following query:


SQL> select SQL_HANDLE, PLAN_NAME, ACCEPTED
 2   from dba_sql_plan_baselines
 3   where sql_text like 'select prod_category%';

SQL_HANDLE                     PLAN_NAME                      ACC
------------------------------ ------------------------------ ---
SYS_SQL_1899bb9331ed7772       SYS_SQL_PLAN_31ed7772f2c7a4c2  YES

The execution plan for the SQL Plan Baseline created above can be displayed either directly or indirectly:

Directly

select t.* from
table(dbms_xplan.display_sql_plan_baseline('SYS_SQL_1899bb9331ed7772',
                                           format => 'basic')) t;

Indirectly

select t.*
    from (select distinct sql_handle
          from dba_sql_plan_baselines
          where sql_text like 'select prod_category%') pb,
         table(dbms_xplan.display_sql_plan_baseline(pb.sql_handle,
                                                    null,'basic')) t;

The output of either of these two statements is:


----------------------------------------------------------------------------
SQL handle: SYS_SQL_1899bb9331ed7772
SQL text: select prod_category, avg(amount_sold) from sales s, products p
          where p.prod_id = s.prod_id group by prod_category
----------------------------------------------------------------------------

----------------------------------------------------------------------------
Plan name: SYS_SQL_PLAN_31ed7772f2c7a4c2
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
----------------------------------------------------------------------------

Plan hash value: 4073170114

---------------------------------------------------------
 Id   Operation                 Name               
---------------------------------------------------------
   0  SELECT STATEMENT                             
   1   HASH GROUP BY                               
   2    HASH JOIN                                  
   3     VIEW                   index$_join$_002   
   4      HASH JOIN                                
   5       INDEX FAST FULL SCAN PRODUCTS_PK        
   6       INDEX FAST FULL SCAN PRODUCTS_PROD_CAT_IX
   7     PARTITION RANGE ALL                       
   8      TABLE ACCESS FULL     SALES              
---------------------------------------------------------

Formatting

The format argument is highly customizable and allows you to see as little (high-level) or as much (low-level) detail as you need in the plan output. The high-level options are:

Basic
The plan includes the operation, options, and the object name (table, index, MV, etc)
Typical
It includes the information shown in BASIC plus additional optimizer-related internal information such as cost, size, cardinality, etc. This information is shown for every operation in the plan and represents what the optimizer thinks is the operation cost, the number of rows produced, etc. It also shows the predicates evaluated by the operation. There are two types of predicates: ACCESS and FILTER. The ACCESS predicates for an index are used to fetch the relevant blocks because they apply to the search columns. The FILTER predicates are evaluated after the blocks have been fetched.
All
It includes the information shown in TYPICAL plus the lists of expressions (columns) produced by every operation, the hint alias and query block names where the operation belongs. The last two pieces of information can be used as arguments to add hints to the statement.
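
As a small sketch of the high-level options above, the same explained statement can be displayed at two further levels of detail. This assumes an EXPLAIN PLAN has already been run for the statement, as in Example 1:

```sql
-- TYPICAL: adds cost, size, cardinality, and predicate information.
select plan_table_output
from table(dbms_xplan.display('plan_table', null, 'typical'));

-- ALL: adds column projection, hint alias, and query block names.
select plan_table_output
from table(dbms_xplan.display('plan_table', null, 'all'));
```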

The low-level options allow the inclusion or exclusion of fine details, such as predicates and cost.
For example,

select plan_table_output
from table(dbms_xplan.display('plan_table',null,'basic +predicate +cost'));

-------------------------------------------------------
 Id   Operation              Name      Cost (%CPU)
-------------------------------------------------------
   0  SELECT STATEMENT                    17  (18)
   1   HASH GROUP BY                      17  (18)
*  2    HASH JOIN                         15   (7)
   3     TABLE ACCESS FULL   PRODUCTS      9   (0)
   4     PARTITION RANGE ALL               5   (0)
   5      TABLE ACCESS FULL  SALES         5   (0)
-------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("P"."PROD_ID"="S"."PROD_ID")

select plan_table_output from
table(dbms_xplan.display('plan_table',null,'typical -cost -bytes'));

----------------------------------------------------------------------------
 Id   Operation              Name      Rows  Time      Pstart Pstop
----------------------------------------------------------------------------
   0  SELECT STATEMENT                    4  00:00:01             
   1   HASH GROUP BY                      4  00:00:01             
*  2    HASH JOIN                       960  00:00:01             
   3     TABLE ACCESS FULL   PRODUCTS   766  00:00:01             
   4     PARTITION RANGE ALL            960  00:00:01      1     16
   5      TABLE ACCESS FULL  SALES      960  00:00:01      1     16
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("P"."PROD_ID"="S"."PROD_ID")

Note Section

In addition to the plan, the package displays notes in the NOTE section, such as that dynamic sampling was used during query optimization or that star transformation was applied to the query.
For example, if the table SALES does not have statistics, then the optimizer will use dynamic sampling and the plan display will report it as follows (see '+note' detail in the query):

select plan_table_output
from table(dbms_xplan.display('plan_table',null,'basic +note'));

------------------------------------------
 Id   Operation              Name   
------------------------------------------
   0  SELECT STATEMENT              
   1   HASH GROUP BY                
   2    HASH JOIN                   
   3     TABLE ACCESS FULL   PRODUCTS
   4     PARTITION RANGE ALL        
   5      TABLE ACCESS FULL  SALES  
------------------------------------------

Note
-----
- dynamic sampling used for this statement

Bind peeking

The query optimizer takes the values of bind variables into account when generating an execution plan. This is generally called bind peeking. See the first post in this blog for the concept of bind peeking and its impact on the plans and performance of SQL statements.
As stated earlier, the plan shown in V$SQL_PLAN takes into account the values of bind variables, while the one produced by EXPLAIN PLAN does not. Starting with 10gR2, the dbms_xplan package can display the bind variable values used to generate a particular cursor/plan. To do so, add '+peeked_binds' to the format argument of display_cursor().
This is illustrated with the following example:


variable pcat varchar2(50)
exec :pcat := 'Women'

select PROD_CATEGORY, avg(amount_sold)
from sales s, products p
where p.PROD_ID = s.PROD_ID
and prod_category != :pcat
group by PROD_CATEGORY;

select plan_table_output
from table(dbms_xplan.display_cursor(null,null,'basic +PEEKED_BINDS'));

------------------------------------------
 Id   Operation              Name   
------------------------------------------
   0  SELECT STATEMENT              
   1   HASH GROUP BY                
   2    HASH JOIN                   
   3     TABLE ACCESS FULL   PRODUCTS
   4     PARTITION RANGE ALL        
   5      TABLE ACCESS FULL  SALES  
------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

1 - :PCAT (VARCHAR2(30), CSID=2): 'Women'

Tuesday, January 22, 2008

Improvement of AUTO sampling statistics gathering feature in Oracle 11g

Optimizer statistics in Oracle are managed via a PL/SQL package, dbms_stats. It provides several PL/SQL procedures to gather statistics for a table, a schema, or a database. For example, gather_table_stats gathers statistics on a table. This procedure has an estimate_percent parameter, which specifies the percentage of rows sampled when gathering statistics. Users can specify any number between 0 and 100 for this parameter. For example, given a table BIGT, you can specify a 1% sampling percentage as follows:

exec dbms_stats.gather_table_stats(null, 'BIGT', 
                       estimate_percent => 1);

It is not always easy for users to pick the appropriate sampling percentage. If the specified sampling percentage is too high, gathering statistics can take longer than necessary. Conversely, if the data is skewed and the specified sampling percentage is too low, the resulting statistics can be inaccurate.
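To see why a small sample can distort a statistic such as the number of distinct values, here is a toy simulation in Python. It is only a sketch under stated assumptions: the column contents, the `sample_ndv` helper, and the naive linear scale-up are all hypothetical and do not represent how dbms_stats estimates anything.

```python
import random

def sample_ndv(values, percent, seed=0):
    """Count the distinct values seen in a simple random sample of the rows."""
    rng = random.Random(seed)
    sample_size = max(1, len(values) * percent // 100)
    return len(set(rng.sample(values, sample_size)))

# Hypothetical column: 100,000 rows, every value stored exactly twice.
column = [i // 2 for i in range(100_000)]
actual_ndv = len(set(column))            # 50,000 distinct values

seen = sample_ndv(column, percent=1)     # distinct values in a 1,000-row sample
scaled = seen * 100                      # naive linear scale-up to 100% (an assumption)

print(actual_ndv, seen, scaled)
```

The raw count from the sample can never exceed the 1,000 sampled rows, so it is far below the true 50,000, while the linear scale-up can reach as high as 100,000. Neither simple reading of a 1% sample recovers the true value in this setup, which is the kind of inaccuracy a poorly chosen sampling percentage can introduce.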

For this reason, Oracle introduced the AUTO value for the estimate_percent parameter. For example, you can gather statistics on BIGT as follows:

exec dbms_stats.gather_table_stats(null, 'BIGT', 
                       estimate_percent => dbms_stats.auto_sample_size);

The advantage of using AUTO sample size over a fixed number is twofold. First, when AUTO sample size is specified, the system automatically determines the appropriate sampling percentage. Second, AUTO sample size is more flexible than a fixed sampling percentage. A fixed sampling percentage that was good at some point in time may no longer be appropriate after the data distribution in the table has changed. When the AUTO value is used, on the other hand, Oracle adjusts the sample size as the data distribution changes.

When AUTO is used, Oracle picks a sample size at which the statistics quality is good enough. However, this does not work very well in cases of extreme skew in the data. In Oracle 11g, we improved the behavior of the AUTO value. First, AUTO sampling now generates deterministic statistics. Second, and more importantly, AUTO sampling generates statistics that are almost as accurate as those from 100% sampling but takes much less time. To illustrate these merits, we compare a fixed sampling percentage, AUTO sample size in Oracle 10g, and AUTO sample size in Oracle 11g.

We used the standard TPC-D data generator to generate a Lineitem table. The Lineitem table is about 230 GB and contains 1.8 billion rows with 16 columns. The schema of the Lineitem table is as follows:

column name	column type
l_shipdate	date
l_orderkey	number
l_discount	number
l_extendedprice	number
l_suppkey	number
l_quantity	number
l_returnflag	varchar2
l_partkey	number
l_linestatus	varchar2
l_tax	number
l_commitdate	date
l_receiptdate	date
l_shipmode	varchar2
l_linenumber	number
l_shipinstruct	varchar2
l_comment	varchar2

Table 1 gives the elapsed time of gathering statistics on the Lineitem table by different sampling percentages.

Sampling Percentage	Elapsed Time (sec)
1% sampling	797
100% sampling (Compute)	18772
Auto sampling in Oracle 10g	2935
Auto sampling in Oracle 11g	1908

Table 1: Statistics Gathering Time on the 230 GB TPC-D Lineitem Table Using Different Estimate Percentages

We also compare the quality of the statistics gathered using different estimate percentages. Among all the statistics of a column, number of distinct values (NDV) is the one whose accuracy used to be an issue. We define the accuracy rate of NDV of a column as follows:


accuracy rate = 1 - (|estimated NDV - actual NDV|)/actual NDV.

The accuracy rate is at most 100%, and it stays at or above 0% as long as the estimated NDV is no more than twice the actual NDV. The higher the accuracy rate, the more accurate the gathered statistics. Since 100% sampling always leads to an accuracy rate of 100%, we do not report it. We focus on the columns for which at least one of the estimate percentages yields an accuracy rate below 99.9%. Table 2 shows the accuracy rates for these columns.
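The accuracy rate formula above can be written as a small Python function. This is a minimal sketch: the function name `accuracy_rate` and the sample numbers are hypothetical and are not taken from the experiment.

```python
def accuracy_rate(estimated_ndv, actual_ndv):
    """accuracy rate = 1 - |estimated NDV - actual NDV| / actual NDV"""
    if actual_ndv <= 0:
        raise ValueError("actual_ndv must be positive")
    return 1 - abs(estimated_ndv - actual_ndv) / actual_ndv

# Hypothetical estimates for a column whose actual NDV is 1,000.
for estimate in (1000, 980, 1020, 500):
    print(f"estimate {estimate:5d}: accuracy rate {accuracy_rate(estimate, 1000):.1%}")
```

Because the formula uses the absolute difference, an estimate that is 2% too low and one that is 2% too high receive the same accuracy rate.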

Column Name	Actual NDV	Auto Sampling in Oracle 11g	1% Sampling
orderkey	450,000,000	98.0%	50%
comment	181,122,127	98.60%	4.60%
partkey	60,000,000	99.20%	98.20%
suppkey	3,000,000	99.60%	99.90%
extendedprice	3,791,320	99.60%	94.30%

Table 2: Accuracy Rate of NDV Estimates on the Lineitem Table Using Different Estimate Percentages

In short, AUTO sampling in Oracle 11g is nearly 10 times faster than 100% sampling (1,908 versus 18,772 seconds) yet yields statistics of similar quality (accuracy rates of 98% or higher for the columns in Table 2).
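The speedup can be checked directly from the elapsed times in Table 1. The short Python sketch below only repeats that arithmetic; the `speedup` helper is a hypothetical name introduced for illustration.

```python
# Elapsed times in seconds, copied from Table 1.
elapsed = {
    "1% sampling": 797,
    "100% sampling (Compute)": 18772,
    "Auto sampling in Oracle 10g": 2935,
    "Auto sampling in Oracle 11g": 1908,
}

baseline = elapsed["100% sampling (Compute)"]

def speedup(method):
    """How many times faster `method` is than 100% sampling."""
    return baseline / elapsed[method]

for method in elapsed:
    print(f"{method:30s} {speedup(method):5.1f}x")
```

By this measure AUTO sampling in Oracle 11g is about 9.8 times faster than 100% sampling, compared with about 6.4 times for AUTO sampling in Oracle 10g.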
