Got a question for the Optimizer team?

If you have an idea for a post you would like to see on our blog please email us


Thursday, June 26, 2008

Why are some of the tables in my query missing from the plan?

We apologize for our brief hiatus from blogging. We've been busy working on improvements to the optimizer.

In 10gR2, we introduced a new transformation, table elimination (alternately called "join elimination"), which removes redundant tables from a query. A table is redundant if its columns are only referenced to in join predicates, and it is guaranteed that those joins neither filter nor expand the resulting rows. There are several cases where Oracle will eliminate a redundant table. We will discuss each case in turn.

Primary Key-Foreign Key Table Elimination

Starting in 10gR2, the optimizer eliminates tables that are redundant due to primary key-foreign key constraints. Consider the following example tables:

create table jobs
(
job_id NUMBER PRIMARY KEY,

job_title VARCHAR2(35) NOT NULL,
min_salary NUMBER,
max_salary NUMBER
);

create table departments
(
department_id NUMBER PRIMARY KEY,
department_name VARCHAR2(50)
);
create table employees
(
employee_id NUMBER PRIMARY KEY,
employee_name VARCHAR2(50),
department_id NUMBER REFERENCES departments(department_id),
job_id NUMBER REFERENCES jobs(job_id)
);

and the query:

select e.employee_name
from employees e, departments d
where e.department_id = d.department_id;

In this query, the join to departments is redundant. The only column referenced in the query appears in the join predicate, and the primary key-foreign key constraint guarantees that there is at most one match in departments for each row in employees. Hence, the query is equivalent to:

select e.employee_name
from employees e
where e.department_id is not null;


The optimizer will generate this plan for the query:

-------------------------------------------
Id Operation Name
-------------------------------------------
0 SELECT STATEMENT
* 1 TABLE ACCESS FULL EMPLOYEES
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("E"."DEPARTMENT_ID" IS NOT NULL)

Note that the IS NOT NULL predicate is not necessary if the column has a NOT NULL constraint on it.

Starting in 11gR1, the optimizer will also eliminate tables that are semi-joined or anti-joined. Consider the following query:

select e.employee_id, e.employee_name
from employees e
where not exists (select 1
from jobs j
where j.job_id = e.job_id);


Since employees.job_id is a foreign key to jobs.job_id, any non-null value in employees.job_id must have a match in jobs. So only employees with null values for employees.job_id will appear in the result. Hence, this query is equivalent to:

select e.employee_id, e.employee_name
from employees e
where job_id is null;

and the optimizer can choose this plan:

-------------------------------------------
Id Operation Name
-------------------------------------------
0 SELECT STATEMENT
* 1 TABLE ACCESS FULL EMPLOYEES
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("E"."JOB_ID" IS NULL)

Suppose employees.job_id has a NOT NULL constraint:

alter table employees modify job_id not null;

In this case, there could not possibly be any rows in EMPLOYEES, and the optimizer could choose this plan:

-------------------------------------------
Id Operation Name
-------------------------------------------
0 SELECT STATEMENT
* 1 FILTER
2 TABLE ACCESS FULL EMPLOYEES
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(NULL IS NOT NULL)

The "NULL IS NOT NULL" filter is a false constant predicate, that will prevent the table scan from even taking place.

Also in 11gR1, the optimization became available for ANSI compliant joins. For this query:

select employee_name
from employees e inner join jobs j
on e.job_id = j.job_id;

the optimizer can eliminate JOBS and produce this plan:

-------------------------------------------
Id Operation Name
-------------------------------------------
0 SELECT STATEMENT
1 TABLE ACCESS FULL EMPLOYEES
-------------------------------------------

Outer Join Table Elimination

In 11gR1, a new form of table elimination was introduced for outer joins, which does not require PK-FK constraints. For the example, we require a new table and an addition to EMPLOYEES:

create table projects
(
project_id NUMBER UNIQUE,
deadline DATE,
priority NUMBER
);

alter table employees add project_id number;

Now consider a query that outer joins employees and projects:

select e.employee_name, e.project_id
from employees e, projects p
where e.project_id = p.project_id (+);

The outer join guarantees that every row in employees will appear at least once in the result. The unique constraint on projects.project_id guarantees that every row in employees will match at most one row in projects. Together, these two properties guarantee that every row in employees will appear in the result exactly once. Since no other columns from projects are referenced, projects can be eliminated, and the optimizer can choose this plan:

-------------------------------------------
Id Operation Name
-------------------------------------------
0 SELECT STATEMENT
1 TABLE ACCESS FULL EMPLOYEES
-------------------------------------------

Why Would I Ever Write Such a Query?

All of the example queries in this post are very simple, and one would be unlikely to write a query where the join is so obviously unnecessary. There are many real world scenarios where table elimination may be helpful, including machine-generated queries and elimination of tables in views. For example, a set of tables might be exposed as a view, which contains a join. The join may be necessary to retrieve all of the columns exposed by the view. But some users of the view may only access a subset of the columns, and in this case, the joined table can be eliminated.

For example, consider the view:

create view employee_directory_v as
select e.employee_name, d.department_name, j.job_title
from employees e, departments d, jobs j
where e.department_id = d.department_id
and e.job_id = j.job_id;

This view might be exposed to a simple employee directory application. To lookup employee names by job title, the application issues a query:

select employee_name
from employee_directory_v
where department = 'ACCOUNTING';

Since the job_title column is not referenced, jobs can be eliminated from the query, and the optimizer can choose this plan:

--------------------------------------------
Id Operation Name
--------------------------------------------
0 SELECT STATEMENT
* 1 HASH JOIN
2 TABLE ACCESS FULL EMPLOYEES
* 3 TABLE ACCESS FULL DEPARTMENTS
--------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
3 - filter("D"."DEPARTMENT_NAME"='ACCOUNTING')

Known Limitations

There are currently a few limitations of table elimination:
  • Multi-column primary key-foreign key constraints are not supported.
  • Referring to the join key elsewhere in the query will prevent table elimination. For an inner join, the join keys on each side of the join are equivalent, but if the query contains other references to the join key from the table that could otherwise be eliminated, this prevents elimination. A workaround is to rewrite the query to refer to the join key from the other table (we realize this is not always possible).

No comments: