Thursday, June 26, 2008
Why are some of the tables in my query missing from the plan?
In 10gR2, we introduced a new transformation, table elimination (also called "join elimination"), which removes redundant tables from a query. A table is redundant if its columns are referenced only in join predicates, and those joins are guaranteed neither to filter nor to expand the resulting rows. There are several cases where Oracle will eliminate a redundant table; we discuss each in turn.
Primary Key-Foreign Key Table Elimination
Starting in 10gR2, the optimizer eliminates tables that are redundant due to primary key-foreign key constraints. Consider the following example tables:
create table jobs
(
job_id NUMBER PRIMARY KEY,
job_title VARCHAR2(35) NOT NULL,
min_salary NUMBER,
max_salary NUMBER
);
create table departments
(
department_id NUMBER PRIMARY KEY,
department_name VARCHAR2(50)
);
create table employees
(
employee_id NUMBER PRIMARY KEY,
employee_name VARCHAR2(50),
department_id NUMBER REFERENCES departments(department_id),
job_id NUMBER REFERENCES jobs(job_id)
);
and the query:
select e.employee_name
from employees e, departments d
where e.department_id = d.department_id;
In this query, the join to departments is redundant. The only column referenced in the query appears in the join predicate, and the primary key-foreign key constraint guarantees that there is at most one match in departments for each row in employees. Hence, the query is equivalent to:
select e.employee_name
from employees e
where e.department_id is not null;
The optimizer will generate this plan for the query:
-------------------------------------------
Id Operation Name
-------------------------------------------
0 SELECT STATEMENT
* 1 TABLE ACCESS FULL EMPLOYEES
-------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("E"."DEPARTMENT_ID" IS NOT NULL)
Note that the IS NOT NULL predicate is not necessary if the column has a NOT NULL constraint on it.
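For example, a sketch (the constraint here is our own addition, not part of the schema above):
alter table employees modify department_id not null;
With the constraint in place, no compensating predicate is needed, and the original query is equivalent to simply:
select e.employee_name
from employees e;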
Starting in 11gR1, the optimizer will also eliminate tables that are semi-joined or anti-joined. Consider the following query:
select e.employee_id, e.employee_name
from employees e
where not exists (select 1
from jobs j
where j.job_id = e.job_id);
Since employees.job_id is a foreign key to jobs.job_id, any non-null value in employees.job_id must have a match in jobs. So only employees with null values for employees.job_id will appear in the result. Hence, this query is equivalent to:
select e.employee_id, e.employee_name
from employees e
where job_id is null;
and the optimizer can choose this plan:
-------------------------------------------
Id Operation Name
-------------------------------------------
0 SELECT STATEMENT
* 1 TABLE ACCESS FULL EMPLOYEES
-------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("E"."JOB_ID" IS NULL)
Suppose employees.job_id has a NOT NULL constraint:
alter table employees modify job_id not null;
In this case, the query cannot possibly return any rows, and the optimizer could choose this plan:
-------------------------------------------
Id Operation Name
-------------------------------------------
0 SELECT STATEMENT
* 1 FILTER
2 TABLE ACCESS FULL EMPLOYEES
-------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(NULL IS NOT NULL)
The "NULL IS NOT NULL" filter is a false constant predicate that prevents the table scan from ever taking place.
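The semi-join case is analogous. A sketch of our own (assuming job_id is still nullable, i.e. without the NOT NULL constraint added above): since every non-null value of employees.job_id must have a match in jobs, the EXISTS subquery reduces to a null check. The query:
select e.employee_id, e.employee_name
from employees e
where exists (select 1
              from jobs j
              where j.job_id = e.job_id);
is equivalent to:
select e.employee_id, e.employee_name
from employees e
where e.job_id is not null;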
Also in 11gR1, the optimization became available for ANSI-compliant joins. For this query:
select employee_name
from employees e inner join jobs j
on e.job_id = j.job_id;
the optimizer can eliminate JOBS and produce this plan:
-------------------------------------------
Id Operation Name
-------------------------------------------
0 SELECT STATEMENT
1 TABLE ACCESS FULL EMPLOYEES
-------------------------------------------
Outer Join Table Elimination
In 11gR1, a new form of table elimination was introduced for outer joins, which does not require PK-FK constraints. For the example, we require a new table and an addition to EMPLOYEES:
create table projects
(
project_id NUMBER UNIQUE,
deadline DATE,
priority NUMBER
);
alter table employees add project_id number;
Now consider a query that outer joins employees and projects:
select e.employee_name, e.project_id
from employees e, projects p
where e.project_id = p.project_id (+);
The outer join guarantees that every row in employees will appear at least once in the result. The unique constraint on projects.project_id guarantees that every row in employees will match at most one row in projects. Together, these two properties guarantee that every row in employees will appear in the result exactly once. Since no other columns from projects are referenced, projects can be eliminated, and the optimizer can choose this plan:
-------------------------------------------
Id Operation Name
-------------------------------------------
0 SELECT STATEMENT
1 TABLE ACCESS FULL EMPLOYEES
-------------------------------------------
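Note that elimination is possible only because no other projects column is needed. For contrast, a sketch of our own variant: referencing any other column of projects blocks the transformation:
select e.employee_name, e.project_id, p.deadline
from employees e, projects p
where e.project_id = p.project_id (+);
Here p.deadline must be fetched from projects, so the join has to be performed.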
Why Would I Ever Write Such a Query?
All of the example queries in this post are very simple, and one would be unlikely to write a query where the join is so obviously unnecessary. There are many real-world scenarios where table elimination can help, including machine-generated queries and elimination of tables in views. For example, a set of tables might be exposed as a view containing a join. The join may be necessary to retrieve all of the columns exposed by the view, but some users of the view may access only a subset of the columns; in that case, the joined table can be eliminated.
For example, consider the view:
create view employee_directory_v as
select e.employee_name, d.department_name, j.job_title
from employees e, departments d, jobs j
where e.department_id = d.department_id
and e.job_id = j.job_id;
This view might be exposed to a simple employee directory application. To look up the names of employees in a given department, the application issues a query:
select employee_name
from employee_directory_v
where department_name = 'ACCOUNTING';
Since the job_title column is not referenced, jobs can be eliminated from the query, and the optimizer can choose this plan:
--------------------------------------------
Id Operation Name
--------------------------------------------
0 SELECT STATEMENT
* 1 HASH JOIN
2 TABLE ACCESS FULL EMPLOYEES
* 3 TABLE ACCESS FULL DEPARTMENTS
--------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
3 - filter("D"."DEPARTMENT_NAME"='ACCOUNTING')
Known Limitations
There are currently a few limitations of table elimination:
- Multi-column primary key-foreign key constraints are not supported.
- Referring to the join key elsewhere in the query will prevent table elimination. For an inner join, the join keys on each side of the join are equivalent, but if the query contains other references to the join key from the table that could otherwise be eliminated, elimination is prevented. A workaround is to rewrite the query to refer to the join key from the other table, as the sketch below illustrates (we realize this is not always possible).
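A sketch of the workaround, using the schema above (this example is ours). In the first query, the join key is selected from DEPARTMENTS, the table that would otherwise be eliminated, so the join must be retained:
select e.employee_name, d.department_id
from employees e, departments d
where e.department_id = d.department_id;
Rewritten to take the equivalent key from EMPLOYEES, DEPARTMENTS can be eliminated:
select e.employee_name, e.department_id
from employees e, departments d
where e.department_id = d.department_id;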
Friday, March 14, 2008
Oracle keeps closing my TAR because I cannot provide a testcase, can you help?
Why SQL Test Case Builder?
For most SQL problems, the single most important factor in a speedy bug resolution is obtaining a reproducible test case. However, this is normally the longest and most painful step for customers. The goal of the SQL Test Case Builder (TCB) is to automatically gather as much information as possible about a SQL incident (problem) and package it so that a developer or a support engineer can reproduce the problem on his or her own machine quickly.
At a very high level, SQL Test Case Builder can be seen as a way to export a SQL statement. Currently, Oracle export (expdp) takes a schema or a set of tables and exports all the dependent objects. SQL Test Case Builder provides the same service, but takes a SQL statement as input.
What's Inside Test Case Builder?
The main input of SQL Test Case Builder is a SQL object. A SQL object is defined as the SQL text plus all the information required to compile it on a particular database instance (the parsing user name, for example).
Logically, a SQL test case appears as a script containing all the necessary commands to recreate the objects, the user, the statistics, and the environment.
Within the Oracle Diagnosability infrastructure, TCB compiles the problem SQL in a special capture mode to obtain the set of objects to export. A test case captures two types of information:
- Permanent information
  - SQL text
  - PL/SQL functions, procedures, packages
  - Statistics
  - Bind variables
  - Compilation environment
  - User information (such as privileges)
  - SQL profiles, stored outlines, or other SQL management objects
  - Metadata on all the objects involved
  - Optimizer statistics
  - The execution plan information
  - The table content (sample or full; optional)
- Transient information
For most SQL test cases, the permanent information above is enough to reproduce a problem. There are, however, cases where it is not enough and additional information about the context in which the SQL was compiled is required. Therefore, in addition to the permanent information, SQL Test Case Builder captures transient information, i.e. information that is only available as part of the compilation of the SQL statement. This includes dynamic sampling results, cached information, and some runtime information, such as the actual degree of parallelism used.
As part of creating a SQL test case, the SQL object is reloaded and all the available diagnostic information is generated and gathered automatically. This information is made available to Oracle support and developers.
How do I use the SQL Test Case Builder?
The task of creating a SQL test case can be performed in two ways:
- From EM (Enterprise Manager), where TCB is invoked on demand via IPS (Incident Packaging Service) after a SQL incident has occurred. The user can also manually create an incident for a problem query for the purpose of building a test case.
- From SQL*Plus, where you can directly invoke one of the PL/SQL API functions in the SQL diagnostic package. We give examples of using the APIs below.
All the new PL/SQL procedures supporting SQL Test Case Builder are part of a new PL/SQL package called dbms_sqldiag (see dbmsdiag.sql for details). The two main TCB features in this package are exporting and importing test cases.
- Procedure dbms_sqldiag.export_sql_testcase exports a SQL test case for a given SQL statement to a given directory.
- Procedure dbms_sqldiag.import_sql_testcase imports a test case from a given directory.
To build (or export) a test case, the simplest form would be something like:
dbms_sqldiag.export_sql_testcase(
directory => 'TCB_DIR_EXP',
sql_text => 'select count(*) from sales',
testcase => tco);
Here directory and sql_text are inputs that specify where the test case will be stored and the problem query statement, respectively. testcase is an output parameter containing the test case metadata.
For security reasons, user data is not exported by default. You have the option of setting exportData to TRUE to include the data. You can also set samplingPercent if you are exporting with data. To protect users' proprietary code, TCB will not export PL/SQL package bodies by default.
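For example, a sketch of an export call that includes a sample of the data. The parameter names mirror the full example later in this post; samplingPercent is shown here as an assumption based on the option just described:
declare
  tco clob;
begin
  dbms_sqldiag.export_sql_testcase(
    directory       => 'TCB_DIR_EXP',
    sql_text        => 'select count(*) from sales',
    exportData      => TRUE,   -- include the user data
    samplingPercent => 50,     -- export only a 50% sample (assumed option)
    testcase        => tco);
end;
/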
Once the test case has been built, you can copy all the files under the export directory to your test environment. Note there is a file called xxxxxxxxmain.xml, for example, oratcb1_03C600800001main.xml, which contains the metadata of the test case.
Now importing the test case can be as simple as:
dbms_sqldiag.import_sql_testcase(
directory => 'TEST_DIR',
filename => 'oratcb1_03C600800001main.xml');
To verify that the test case was successfully rebuilt, you can simply issue an EXPLAIN PLAN command for the problem query. However, if you want to actually run the query, then you need to have the data available.
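For example, a minimal verification sketch, reusing the query from the export example above:
explain plan for select count(*) from sales;

select plan_table_output
from table(dbms_xplan.display);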
You can refer to dbmsdiag.sql for more information about other options available for these procedures.
Example - We now show the typical steps of using TCB with a sample query involving a materialized view. In this example, we set the exportData option to TRUE so that we can re-run the same query after the TCB task is completed.
- Setup
- Export as user APPS
- Import as user TEST
- Verification: check that all relevant objects were imported successfully.
SQL> connect / as sysdba
Connected.
SQL>
SQL> create or replace directory TCB_DIR_EXP as
2 '/net/tiger/apps/tcb_exp';
Directory created.
SQL>
SQL> grant dba to apps;
Grant succeeded.
SQL>
SQL> connect apps/apps
Connected.
SQL>
SQL> create materialized view scp_mvu
2 parallel 2
3 as
4 select p.prod_name, c.cust_gender,
5 max(s.amount_sold) max_amount_sold
6 from sales s, products p, customers c
7 where s.prod_id = p.prod_id
8 and s.cust_id = c.cust_id
9 group by p.prod_name, c.cust_gender;
Materialized view created.
SQL>
SQL> desc scp_mvu;
Name Null? Type
----------------------------------------- -------- ------------
PROD_NAME NOT NULL VARCHAR2(50)
CUST_GENDER CHAR(1)
MAX_AMOUNT_SOLD NUMBER
SQL>
SQL> select * from scp_mvu where max_amount_sold > 7000 order by 3;
PROD_NAME C MAX_AMOUNT_SOLD
-------------------------------------------------- - ---------------
Joseph Sportcoat F 7400.8
Kenny Cool Leather Skirt M 7708
Leather Boot-Cut Trousers M 8184
3 rows selected.
SQL> connect apps/apps
Connected.
SQL>
SQL> Rem define the problem SQL statement
SQL> create or replace package define_vars is
2 sql_stmt1 varchar2(2000) := q'# select * from scp_mvu
3 where max_amount_sold > 7000
4 order by 3
5 #';
6 end;
7 /
Package created.
SQL>
SQL> set serveroutput on
SQL>
SQL> declare
2 tco clob;
3 begin
4 -- Export test case
5 dbms_sqldiag.export_sql_testcase
6 (
7 directory => 'TCB_DIR_EXP',
8 sql_text => define_vars.sql_stmt1,
9 user_name => 'APPS',
10 exportData => TRUE,
11 testcase => tco
12 );
13
14 end;
15 /
PL/SQL procedure successfully completed.
SQL>
SQL> Rem Drop MV before importing
SQL> drop materialized view scp_mvu;
Materialized view dropped.
At this stage, the export procedure has completed successfully. The next commands prepare a directory for the import; the directory could be on a different machine.
SQL> conn / as sysdba
Connected.
SQL> create or replace directory TCB_DIR_IMP
2 as '/net/lion/test/tcb_imp';
Directory created.
SQL>
SQL> grant dba to test;
Grant succeeded.
As the export has finished successfully, you can now transfer all the files under TCB_DIR_EXP to a directory in the test environment, for example the TCB_DIR_IMP created above. Again, look up and make a note of the TCB metadata file xxxxxxxxmain.xml, which is used below.
SQL> connect test/test
Connected.
SQL>
SQL> set serveroutput on
SQL>
SQL> begin
2 -- Import test case
3 dbms_sqldiag.import_sql_testcase
4 (
5 directory => 'TCB_DIR_IMP',
6 filename => 'oratcb3_05e803500001main.xml',
7 importData => TRUE
8 );
9
10 end;
11 /
PL/SQL procedure successfully completed.
SQL> desc scp_mvu;
Name Null? Type
----------------------------------------- -------- ------------
PROD_NAME NOT NULL VARCHAR2(50)
CUST_GENDER CHAR(1)
MAX_AMOUNT_SOLD NUMBER
SQL>
SQL> select * from scp_mvu where max_amount_sold > 7000 order by 3;
PROD_NAME C MAX_AMOUNT_SOLD
-------------------------------------------------- - ---------------
Joseph Sportcoat F 7400.8
Kenny Cool Leather Skirt M 7708
Leather Boot-Cut Trousers M 8184
3 rows selected.
Finally, we also have good news for 10g users: SQL Test Case Builder has been backported to 10.2.0.4!
Thursday, February 7, 2008
Displaying and reading the execution plans for a SQL statement
While the display is commonly shown in a tabular format, an execution plan is in fact tree-shaped. For example, consider the following query based on the SH (Sales History) schema:
select prod_category, avg(amount_sold)
from sales s, products p
where p.prod_id = s.prod_id
group by prod_category;
The tabular representation of this query's plan is:
------------------------------------------
Id Operation Name
------------------------------------------
0 SELECT STATEMENT
1 HASH GROUP BY
2 HASH JOIN
3 TABLE ACCESS FULL PRODUCTS
4 PARTITION RANGE ALL
5 TABLE ACCESS FULL SALES
------------------------------------------
While the tree-shaped representation of the plan is:
GROUP BY
|
JOIN
_____|_______
| |
ACCESS ACCESS
(PRODUCTS) (SALES)
When you read a plan tree you should start from the bottom up. In the above example, begin by looking at the access operators (the leaves of the tree). Here the access operators are implemented as full table scans. The rows produced by these table scans are consumed by the join operator; in this case the join operator is a hash join (other alternatives include the nested-loops and sort-merge joins). Finally, the group-by operator, implemented here using hash (the alternative would be sort), consumes the rows produced by the join operator.
The execution plan generated for a SQL statement is just one of the many alternative execution plans considered by the query optimizer. The query optimizer selects the execution plan with the lowest cost. Cost is a proxy for performance: the lower the cost, the better the performance. The cost model used by the query optimizer accounts for the I/O, CPU, and network usage of the query.
There are two different methods you can use to look at the execution plan of a SQL statement:
- EXPLAIN PLAN command - This displays an execution plan for a SQL statement without actually executing the statement.
- V$SQL_PLAN - A dictionary view introduced in Oracle 9i that shows the execution plan for a SQL statement that has been compiled into a cursor in the cursor cache.
Under certain conditions the plan shown by EXPLAIN PLAN can differ from the plan shown in V$SQL_PLAN. For example, when the SQL statement contains bind variables, the plan shown by EXPLAIN PLAN ignores the bind variable values, while the plan shown in V$SQL_PLAN takes them into account during plan generation.
Displaying an execution plan has been made easier since the introduction of the dbms_xplan package in Oracle 9i and the enhancements made to it in subsequent releases. This package provides several PL/SQL procedures to display the plan from different sources:
- EXPLAIN PLAN command
- V$SQL_PLAN
- Automatic Workload Repository (AWR)
- SQL Tuning Set (STS)
- SQL Plan Baseline (SPM)
The following examples illustrate how to generate and display an execution plan for our original SQL statement using the different functions provided in the dbms_xplan package.
Example 1: Using the EXPLAIN PLAN command and the dbms_xplan.display function.
SQL> EXPLAIN PLAN FOR
2 select prod_category, avg(amount_sold)
3 from sales s, products p
4 where p.prod_id = s.prod_id
5 group by prod_category;
Explained.
SQL> select plan_table_output
2 from table(dbms_xplan.display('plan_table',null,'basic'));
------------------------------------------
Id Operation Name
------------------------------------------
0 SELECT STATEMENT
1 HASH GROUP BY
2 HASH JOIN
3 TABLE ACCESS FULL PRODUCTS
4 PARTITION RANGE ALL
5 TABLE ACCESS FULL SALES
------------------------------------------
The arguments for dbms_xplan.display are:
- plan table name (default 'PLAN_TABLE'),
- statement_id (default null),
- format (default 'TYPICAL')
More details can be found in $ORACLE_HOME/rdbms/admin/dbmsxpln.sql.
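For instance, a sketch using a non-default statement_id (the identifier 'my_stmt' is our own):
explain plan set statement_id = 'my_stmt' for
select prod_category, avg(amount_sold)
from sales s, products p
where p.prod_id = s.prod_id
group by prod_category;

select plan_table_output
from table(dbms_xplan.display('plan_table','my_stmt','basic'));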
Example 2: Generating and displaying the execution plan for the last SQL statement executed in a session:
SQL> select prod_category, avg(amount_sold)
2 from sales s, products p
3 where p.prod_id = s.prod_id
4 group by prod_category;
no rows selected
SQL> select plan_table_output
2 from table(dbms_xplan.display_cursor(null,null,'basic'));
------------------------------------------
Id Operation Name
------------------------------------------
0 SELECT STATEMENT
1 HASH GROUP BY
2 HASH JOIN
3 TABLE ACCESS FULL PRODUCTS
4 PARTITION RANGE ALL
5 TABLE ACCESS FULL SALES
------------------------------------------
The arguments used by dbms_xplan.display_cursor are:
- SQL ID (default null, null means the last SQL statement executed in this session),
- child number (default 0),
- format (default 'TYPICAL')
The details are in $ORACLE_HOME/rdbms/admin/dbmsxpln.sql.
Example 3: Displaying the execution plan for any other statement requires the SQL ID to be provided, either directly or indirectly:
- Directly:
SQL> select plan_table_output from
2 table(dbms_xplan.display_cursor('fnrtqw9c233tt',null,'basic'));
- Indirectly:
SQL> select plan_table_output
2 from v$sql s,
3 table(dbms_xplan.display_cursor(s.sql_id,
4 s.child_number, 'basic')) t
5 where s.sql_text like 'select PROD_CATEGORY%';
Example 4: Displaying an execution plan corresponding to a SQL Plan Baseline. SQL Plan Baselines were introduced in Oracle 11g to support the SQL Plan Management (SPM) feature. To illustrate such a case, we first need to create a SQL Plan Baseline.
SQL> alter session set optimizer_capture_sql_plan_baselines=true;
Session altered.
SQL> select prod_category, avg(amount_sold)
2 from sales s, products p
3 where p.prod_id = s.prod_id
4 group by prod_category;
no rows selected
If the above statement has been executed more than once, a SQL Plan Baseline will be created for it, and you can verify this using the following query:
SQL> select SQL_HANDLE, PLAN_NAME, ACCEPTED
2 from dba_sql_plan_baselines
3 where sql_text like 'select prod_category%';
SQL_HANDLE PLAN_NAME ACC
------------------------------ ------------------------------ ---
SYS_SQL_1899bb9331ed7772 SYS_SQL_PLAN_31ed7772f2c7a4c2 YES
The execution plan for the SQL Plan Baseline created above can be displayed either directly or indirectly:
- Directly
select t.* from
table(dbms_xplan.display_sql_plan_baseline('SYS_SQL_1899bb9331ed7772',
format => 'basic')) t;
- Indirectly
select t.*
from (select distinct sql_handle
from dba_sql_plan_baselines
where sql_text like 'select prod_category%') pb,
table(dbms_xplan.display_sql_plan_baseline(pb.sql_handle,
null,'basic')) t;
The output of either of these two statements is:
----------------------------------------------------------------------------
SQL handle: SYS_SQL_1899bb9331ed7772
SQL text: select prod_category, avg(amount_sold) from sales s, products p
where p.prod_id = s.prod_id group by prod_category
----------------------------------------------------------------------------
----------------------------------------------------------------------------
Plan name: SYS_SQL_PLAN_31ed7772f2c7a4c2
Enabled: YES Fixed: NO Accepted: YES Origin: AUTO-CAPTURE
----------------------------------------------------------------------------
Plan hash value: 4073170114
---------------------------------------------------------
Id Operation Name
---------------------------------------------------------
0 SELECT STATEMENT
1 HASH GROUP BY
2 HASH JOIN
3 VIEW index$_join$_002
4 HASH JOIN
5 INDEX FAST FULL SCAN PRODUCTS_PK
6 INDEX FAST FULL SCAN PRODUCTS_PROD_CAT_IX
7 PARTITION RANGE ALL
8 TABLE ACCESS FULL SALES
---------------------------------------------------------
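Plans captured in AWR can be displayed in a similar fashion using dbms_xplan.display_awr. A sketch, assuming the statement from Example 3 has been captured in an AWR snapshot:
select plan_table_output
from table(dbms_xplan.display_awr('fnrtqw9c233tt'));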
Formatting
The format argument is highly customizable and allows you to see as little (high-level) or as much (low-level) detail as you need in the plan output. The high-level options are:
- Basic
The plan includes the operation, options, and the object name (table, index, MV, etc.)
- Typical
It includes the information shown in BASIC plus additional optimizer-related internal information such as cost, size, cardinality, etc. This information is shown for every operation in the plan and represents what the optimizer estimates for the cost of the operation, the number of rows produced, etc. It also shows the predicates evaluated by each operation. There are two types of predicates: ACCESS and FILTER. The ACCESS predicates for an index are used to fetch the relevant blocks, because they apply to the search columns. The FILTER predicates are evaluated after the blocks have been fetched.
- All
It includes the information shown in TYPICAL plus the lists of expressions (columns) produced by every operation, the hint alias, and the query block names the operation belongs to. The last two pieces of information can be used as arguments to add hints to the statement.
For example,
select plan_table_output
from table(dbms_xplan.display('plan_table',null,'basic +predicate +cost'));
-------------------------------------------------------
Id Operation Name Cost (%CPU)
-------------------------------------------------------
0 SELECT STATEMENT 17 (18)
1 HASH GROUP BY 17 (18)
* 2 HASH JOIN 15 (7)
3 TABLE ACCESS FULL PRODUCTS 9 (0)
4 PARTITION RANGE ALL 5 (0)
5 TABLE ACCESS FULL SALES 5 (0)
-------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("P"."PROD_ID"="S"."PROD_ID")
select plan_table_output from
table(dbms_xplan.display('plan_table',null,'typical -cost -bytes'));
----------------------------------------------------------------------------
Id Operation Name Rows Time Pstart Pstop
----------------------------------------------------------------------------
0 SELECT STATEMENT 4 00:00:01
1 HASH GROUP BY 4 00:00:01
* 2 HASH JOIN 960 00:00:01
3 TABLE ACCESS FULL PRODUCTS 766 00:00:01
4 PARTITION RANGE ALL 960 00:00:01 1 16
5 TABLE ACCESS FULL SALES 960 00:00:01 1 16
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("P"."PROD_ID"="S"."PROD_ID")
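The query block names and aliases mentioned under ALL can also be requested on their own; a sketch using the +alias modifier with the same plan table:
select plan_table_output
from table(dbms_xplan.display('plan_table',null,'basic +alias'));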
Note Section
In addition to the plan, the package displays notes in the NOTE section, such as whether dynamic sampling was used during query optimization or whether star transformation was applied to the query.
For example, if the table SALES did not have statistics, the optimizer would use dynamic sampling and the plan display would report it as follows (note the '+note' detail in the query):
select plan_table_output
from table(dbms_xplan.display('plan_table',null,'basic +note'));
------------------------------------------
Id Operation Name
------------------------------------------
0 SELECT STATEMENT
1 HASH GROUP BY
2 HASH JOIN
3 TABLE ACCESS FULL PRODUCTS
4 PARTITION RANGE ALL
5 TABLE ACCESS FULL SALES
------------------------------------------
Note
-----
- dynamic sampling used for this statement
Bind peeking
The query optimizer takes into account the values of bind variables when generating an execution plan. This is generally called bind peeking. See the first post in this blog about the concept of bind peeking and its impact on plans and on the performance of SQL statements.
As stated earlier, the plan shown in V$SQL_PLAN takes into account the values of bind variables, while the one shown by EXPLAIN PLAN does not. Starting with 10gR2, the dbms_xplan package can display the bind variable values used to generate a particular cursor/plan. This is done by adding '+peeked_binds' to the format argument when using display_cursor().
This is illustrated with the following example:
variable pcat varchar2(50)
exec :pcat := 'Women'
select PROD_CATEGORY, avg(amount_sold)
from sales s, products p
where p.PROD_ID = s.PROD_ID
and prod_category != :pcat
group by PROD_CATEGORY;
select plan_table_output
from table(dbms_xplan.display_cursor(null,null,'basic +PEEKED_BINDS'));
------------------------------------------
Id Operation Name
------------------------------------------
0 SELECT STATEMENT
1 HASH GROUP BY
2 HASH JOIN
3 TABLE ACCESS FULL PRODUCTS
4 PARTITION RANGE ALL
5 TABLE ACCESS FULL SALES
------------------------------------------
Peeked Binds (identified by position):
--------------------------------------
1 - :PCAT (VARCHAR2(30), CSID=2): 'Women'
Tuesday, January 22, 2008
Improvement of AUTO sampling statistics gathering feature in Oracle 11g
When gathering statistics with the dbms_stats package, you can specify a fixed sampling percentage through the estimate_percent parameter. For example, to gather statistics on a table BIGT using a 1% sample:
exec dbms_stats.gather_table_stats(null, 'BIGT',
estimate_percent => 1);
It is not always easy for users to pick the appropriate sampling percentage. If the specified sampling percentage is too high, it can take longer to gather statistics. Conversely, if the data is skewed and the specified sampling percentage is too low, the resulting statistics can be inaccurate.
For this reason, Oracle introduced the AUTO value for the estimate_percent parameter. For example, you can gather statistics on BIGT as follows:
exec dbms_stats.gather_table_stats(null, 'BIGT',
estimate_percent => dbms_stats.auto_sample_size);
When AUTO is used, Oracle picks a sample size for which the quality of the statistics is good enough. However, it does not work very well in cases of extreme skew in the data. In Oracle 11g, we improved the behavior of the AUTO value. First, AUTO sampling now generates deterministic statistics. Second, and more importantly, AUTO sampling generates statistics that are almost as accurate as 100% sampling but takes much less time. To illustrate these merits, we compare the performance of a fixed sampling percentage, AUTO sample size in Oracle 10g, and AUTO sample size in Oracle 11g.
We used the standard TPC-D data generator to generate a Lineitem table. The Lineitem table is about 230GB and contains 1.8 billion rows with 16 columns. The schema of the Lineitem table is as follows:
column name     | column type
----------------+------------
l_shipdate      | date
l_orderkey      | number
l_discount      | number
l_extendedprice | number
l_suppkey       | number
l_quantity      | number
l_returnflag    | varchar2
l_partkey       | number
l_linestatus    | varchar2
l_tax           | number
l_commitdate    | date
l_receiptdate   | date
l_shipmode      | varchar2
l_linenumber    | number
l_shipinstruct  | varchar2
l_comment       | varchar2
Table 1 gives the elapsed time of gathering statistics on the Lineitem table by different sampling percentages.
Sampling Percentage         | Elapsed Time (sec)
----------------------------+-------------------
1% sampling                 | 797
100% sampling (Compute)     | 18772
Auto sampling in Oracle 10g | 2935
Auto sampling in Oracle 11g | 1908
Table 1: Statistics gathering time on 230G TPC-D Lineitem Table Using Different Estimate Percentages
We also compared the quality of the statistics gathered using different estimate percentages. Among all the statistics of a column, the number of distinct values (NDV) is the one whose accuracy used to be an issue. We define the accuracy rate of the NDV of a column as follows:
accuracy rate = 1 - |estimated NDV - actual NDV| / actual NDV
The accuracy rate ranges from 0% to 100%; the higher the accuracy rate, the more accurate the gathered statistics. For example, an actual NDV of 1,000,000 estimated as 900,000 gives an accuracy rate of 1 - 100,000/1,000,000 = 90%. Since 100% sampling always leads to an accuracy rate of 100%, we do not report it. We focus on the columns that had an accuracy rate below 99.9% for at least one of the estimate percentages. Table 2 shows the accuracy rates for these columns.
Column Name   | Actual NDV  | Auto Sampling in Oracle 11g | 1% Sampling
--------------+-------------+-----------------------------+------------
orderkey      | 450,000,000 | 98.0%                       | 50%
comment       | 181,122,127 | 98.60%                      | 4.60%
partkey       | 60,000,000  | 99.20%                      | 98.20%
suppkey       | 3,000,000   | 99.60%                      | 99.90%
extendedprice | 3,791,320   | 99.60%                      | 94.30%
Table 2: Accuracy Rate of Gathering NDV LineItem Using Different Estimate Percentages
In short, AUTO sampling in Oracle 11g is about 10 times faster than 100% sampling but yields statistics of similar quality (accuracy rate close to 100%).