Got a question for the Optimizer team?

If you have an idea for a post you would like to see on our blog please email us


Tuesday, December 11, 2007

Outerjoins in Oracle

Since release 6, Oracle has supported a restricted form of left outerjoin, which uses Oracle-specific syntax. In 9i, Oracle introduced support for ANSI SQL 92/99 syntax for inner joins and various types of outerjoin. The Oracle syntax for left outerjoin and that of ANSI SQL 92/99 are not equivalent, as the latter is more expressive.

There appears to be some confusion about equivalence between ANSI outer join and Oracle outer join syntaxes. The following examples explain the equivalences and inequivalences of these two syntaxes.

Oracle-Specific Syntax

Consider query A, which expresses a left outerjoin in the Oracle syntax. Here T1 is the left table whose non-joining rows will be retained and all non-joining rows of T2 will be null appended.


A.
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x (+);


ANSI Left Outerjoin

In the ANSI outer join syntax, query A can be expressed as query B.


B.
SELECT T1.d, T2.c
FROM T1 LEFT OUTER JOIN T2
ON (T1.x = T2.x);


Equivalence

Consider the following queries. In the Oracle semantics, the presence of (+) on the filter predicate (e.g., T2.y (+) > 5 in query C) indicates that this filter must be applied to the table T2 before the outer join takes place.


C.
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x (+) and T2.y (+) > 5;


The ANSI left outer join query D is equivalent to C. Applying the filter on the right table in the left outer join is the same as combining the filter with the join condition.


D.
SELECT T1.d, T2.c
FROM T1 LEFT OUTER JOIN T2
ON (T1.x = T2.x and T2.y > 5);


Similarly, the presence of (+) on the filter predicate, T2.y (+) IS NULL, in query E indicates that this filter must be applied to the table T2 before the outer join takes place.


E.
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x (+) and T2.y (+) IS NULL;


The ANSI left outer join query F is equivalent to E.


F.
SELECT T1.d, T2.c
FROM T1 LEFT OUTER JOIN T2
ON (T1.x = T2.x and T2.y IS NULL);


Consider query G. Oracle will apply the filter, T2.y IS NULL, in query G after the outer join has taken place. G will return only those rows of T2 that failed to join with T1 or those whose T2.y values happen to be null.


G.
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x (+) and T2.y IS NULL;


The ANSI left outer join query H is equivalent to G, as the WHERE clause in H is applied after the left outer join is performed based on the condition specified in the ON clause.


H.
SELECT T1.d, T2.c
FROM T1 LEFT OUTER JOIN T2
ON (T1.x = T2.x)
WHERE T2.y IS NULL;


Consider query I, where the filter on the left table is applied before or after the outer join takes place.


I.
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x (+) and T1.Z > 4;


The ANSI left outer join query J is equivalent to query I.


J.
SELECT T1.d, T2.c
FROM T1 LEFT OUTER JOIN T2
ON (T1.x = T2.x)
WHERE T1.Z > 4;


Lateral Views

In Oracle, ANSI left and right outerjoins are internally expressed in terms of left outerjoined lateral views. In many cases, a left outerjoined lateral view can be merged and the ANSI left (or right) outerjoin can be expressed entirely in terms of Oracle native left outerjoin operator. (A lateral view is an inline view that contains correlation referring to other tables that precede it in the FROM clause.)

Consider the ANSI left outer join query K, which is first represented internally as L.


K.
SELECT T1.d, T2.c
FROM T1 LEFT OUTER JOIN T2
ON (T1.x = T2.x and T2.k = 5);

L.
SELECT T1.d, LV.c
FROM T1,
LATERAL (SELECT T2.C
FROM T2
WHERE T1.x = T2.x and T2.k = 5)(+) LV;


The lateral view in query L is merged to yield query M.


M.
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x (+) and T2.k (+)= 5;


Consider query N, which expresses a left outerjoin in the ANSI join syntax. Currently query N cannot be expressed using the Oracle native left outer join operator.


N.
SELECT T1.m, T2.n
FROM T1 LEFT OUTER JOIN T2
ON (T1.h = 11 and T1.y = T2.y)
WHERE T1.q > 3;


The query N is converted into query O with a left outer-joined lateral view. The lateral view in O cannot be merged, since the filter on the left table specified in the ON clause must be part of the left outerjoin condition.


O.
SELECT T1.m, LV.n
FROM T1,
LATERAL(SELECT T2.n
FROM T2
WHERE T1.h = 11 and T1.y = T2.y)(+) LV
WHERE T1.q > 3;


Consider query P with a disjunction in the outer join condition. Currently N cannot be expressed using the Oracle native left outer join operator.


P.
SELECT T1.A, T2.B
FROM T1 LEFT OUTER JOIN T2
ON (T1.x = T2.x OR T1.Z = T2.Z);


The query P will be converted into Q with a left outer-joined lateral view containing the disjunctive join condition.


Q.
SELECT T1.A, LV.B
FROM T1,
LATERAL (SELECT T2.B
FROM T2
WHERE T1.x = T2.x OR T1.Z = T2.Z) (+) LV;


ANSI Full Outerjoin

Before Oracle 11gR1 all ANSI full outerjoins were converted into a UNION ALL query with two branches, where one branch contained a left outerjoined lateral view and the other branch contained a NOT EXISTS subquery. A native support for hash full outerjoin was introduced in 11gR1 to overcome this problem. When the native full outerjoin, cannot be used, Oracle reverts to the pre-11gR1 strategy.

Consider query R, which specifies an ANSI full outerjoin.


R.
SELECT T1.c, T2.d
FROM T1 FULL OUTER JOIN T2
ON T1.x = T2.y;


Before 11gR1, Oracle would internally convert query R into S.


S.
SELECT T1.c, T2.d
FROM T1, T2
WHERE T1.x = T2.y (+)
UNION ALL
SELECT NULL, T2.d
FROM T2
WHERE NOT EXISTS
(SELECT 1 FROM T1 WHERE T1.x = T2.y);


With the native support of hash full outerjoin, R will be simply expressed as query T, where the view, VFOJ, is considered unmergeable.


T.
SELECT VFOJ.c, VFOJ.d
FROM (SELECT T1.c, T2.d
FROM T1, T2
WHERE T1.x F=F T2.y) VFOJ;


Conversion of Outerjoin into Inner Join

Consider query U. Here the filter predicate on the outer-joined table T2 does not contain the outerjoin operator (+); thus it will be applied after the left outerjoin has taken place. This will result in the elimination of all null appended rows of T2. Hence, Oracle converts the outer join into an inner join.


U.
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x (+) and T2.y > 5;


The ANSI left outer join query V is equivalent to query U, as the WHERE clause in V is applied after the left outer join is performed based on the condition specified in the ON clause.


V.
SELECT T1.d, T2.c
FROM T1 LEFT OUTER JOIN T2
ON (T1.x = T2.x)
WHERE T2.y > 5;


Oracle converts the queries U and V into query W with an inner join.


W.
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x and T2.y > 5;



Q&A

Q1: I do not understand the queries N and O. What is the difference between
a filter appearing in the ON Clause or a filter appearing in the WHERE
clause?

A1: Consider two tables T11 and T22.

T11:
A | B
1 | 2
2 | 3
3 | 5

T22:
X | Y
7 | 2
8 | 4
9 | 4

The following ANSI left outer join query N’ involving
the tables T11 and T22 will return three rows, since
the filter, which always fails, is part of the join
condition. Although this join condition, which
comprises both the predicates in the ON clause,
always evaluates to FALSE, all the rows of the left
table T11 are retained in the result.

N’.
SELECT *
FROM T11 LEFT OUTER JOIN T22
ON (T11.A > 9 and T11.B = T22.Y);

A B X Y
------ ---------- ---------- ---------
1 2
2 3
3 5

However, if the filter, T11.A > 9, is moved to the WHERE clause,
the query will return zero rows.


Q2: Is the outer to inner join conversion a new feature?

A2: No. This feature has been avaliable since Release 7.


Q3: Has native full outer join been made available in
versions prior to 11gR1?

A3: Yes. It is available in 10.2.0.3 and 10.2.0.4, but not by
default.

10 comments:

Stew Ashton said...

Thanks so much for this blog! Like many others, I will be reading and re-reading every word.
One question: your last section shows the optimizer "silently" converting an outer join to an inner join when a filter predicate would eliminate the null appended rows. Is this new 11G behavior, or do previous versions do this? My experience with 9i leads me to believe that the needless outer joins stay and may degrade performance.
You could call this optimization the "Hibernate corrector", since Hibernate defaults to left outer joins everywhere :(

The Human Fly said...

Nice explanation.
However, is there any behavior change with the Optimizer when used ANSI standard syntax over the other one?

Regards

Jaffar

Anonymous said...

Hi

As you pointed out "Before Oracle 11gR1 all ANSI full outerjoins were converted into a UNION ALL query with two branches". Now, according to the following test it seams that it is already available as of 10.2.0.3, but not activated per default.

SQL> SELECT * FROM v$version WHERE rownum = 1;

BANNER
------------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bi

SQL> EXPLAIN PLAN FOR
2 SELECT /*+ NATIVE_FULL_OUTER_JOIN */ *
3 FROM t1 FULL OUTER JOIN t2 ON t1.id = t2.id;

Explained.

SQL> SELECT * FROM table(dbms_xplan.display(NULL, NULL, 'basic'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------

Plan hash value: 53297166

------------------------------------------
| Id | Operation | Name |
------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | VIEW | VW_FOJ_0 |
| 2 | HASH JOIN FULL OUTER| |
| 3 | TABLE ACCESS FULL | T1 |
| 4 | TABLE ACCESS FULL | T2 |
------------------------------------------

SQL> ALTER SESSION SET "_optimizer_native_full_outer_join" = force;

Session altered.

SQL> EXPLAIN PLAN FOR
2 SELECT *
3 FROM t1 FULL OUTER JOIN t2 ON t1.id = t2.id;

Explained.

SQL> SELECT * FROM table(dbms_xplan.display(NULL, NULL, 'basic'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------

Plan hash value: 53297166

------------------------------------------
| Id | Operation | Name |
------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | VIEW | VW_FOJ_0 |
| 2 | HASH JOIN FULL OUTER| |
| 3 | TABLE ACCESS FULL | T1 |
| 4 | TABLE ACCESS FULL | T2 |
------------------------------------------

So, my hope is that you are planning to activate it in 10.2.0.4 as well :-)

Cheers,
Chris

Flado said...

Regarding queries N and O:
Why is it that the predicate T1.h = 11 cannot be moved out of the lateral view? I understand that by definition, the ON clause predicates must be applied before the WHERE clause, but in this case (predicates on the LEFT table only) doing so wouldn't change the result set, would it?
I must be missing something...

Thanks for the blog!

Keep these posts coming :-)

Flado

Unknown said...

Thanks a lot for this superb analysis, it brings down my doubts about the ANSI syntax at much more accettable levels ;-)

robert said...

I do not understand your comment about join N. Since all rows from T1 are removed from the result where T1.h is not 11 this should be equivalent:

SELECT T1.m, T2.n
FROM T1, T2
WHERE T1.h = 11
and T1.y = T2.y(+)
and T1.q > 3;

What am I missing?

The6Campbells said...

can you clarify the general conditions where the new native FOJ cannot be applied.

Charles Schultz said...

This is awesome! The Q&A is priceless. Seriously, I hope you guys keep this up. It is a great learning experience for those of wishing to understand "how things work"; not only the original post, but the interactions you have with the visitors as well. Your commitment to "setting the record straight" is ... stellar.

Gints Plivna said...

Speaking about "Q1: I do not understand the queries N and O. What is the difference between a filter appearing in the ON Clause or a filter appearing in the WHERE clause?"
it is also nicely explained In Jonathan Gennick's 5 years old article What's in a Condition?

robert said...

Thanks Gints! I applied the same adjustments as Jonathan to my mental model and now all is well again. :-)

Cheers