9/2/12

SEMI/ANTI JOINS


SEMI JOIN

A “semi-join” between two tables returns rows from the first table where one or more matches
are found in the second table.

Semi-joins are written using the EXISTS or IN constructs.

a semi-join can be performed using the nested loops, hash join, or merge join algorithms

Oracle provides the NL_SJ, HASH_SJ, and MERGE_SJ hints in order for you to manipulate the semi-join process if you need to. The hint is applied to the subquery of the EXISTS or IN clause, not the main body of the query itself.

if the MAIN query is more selective i.e returns less number of rows then its better to use EXIST
if the subquery is more selective then its better to use "IN" operator.

ANTI JOIN:

An “anti-join” between two tables returns rows from the first table where no matches are found in the second table. Anti-joins are written using the NOT EXISTS or NOT IN constructs.

if the subquery of "NOT IN" results at least one NULL value then entire NOT IN will be false
and we will not get any results, But "NOT EXIST" will consider NULL as value and returns the value.



Restrictions on SEMI/ANTI joins:

1.      If there is a DISTINCT clause in the query then oracle can’t use semi joins (there is one alternative for DISTINCT claue, see below example i.e using NO_MERGE hint)
2.      The EXIST/IN is part of OR operation  then oracle can’t use semi joins
3.       If the query contains UNION set operator then oracle can’t use semi joins


If there is a DISTINCT clause in the query or the EXIST/IN is part of OR operation then
we can't use SEMI/ANTI joins.

SQL> select DISTINCT username from tab1 where exist 
    (select 1 from tab2 where tabb2.col1 = tab1.col1 );
 
-- The above query will not use SEMI JOIN  because we have the DISTINCT operator in the query

Workaround:

SQL> select /*+ no_merge(tab) */
            DISTINCT username
       from (
              select username from tab1 where exist 
              (select 1 from tab2 where tabb2.col1 = tab1.col1 )
            ) tab;

Example:


Create CUSTOMER table:

SQL> create table test_table_cust_123
     (name varchar2(20), state varchar2(20), cust_id number);
Table created.


begin
for i in 1 .. 100
loop
insert into test_table_cust_123 values
(dbms_random.string('U', 10), dbms_random.string('U', 2),
 round(dbms_random.value(0, 100)) );
end loop
commit;
end;
/

delete duplicate cust_id's:

SQL> delete from test_table_cust_123 where cust_id in
     (select cust_id from (select cust_id, count(*) from test_table_cust_123
     group by cust_id having count(*) > 2));

Create ORDER table:

SQL> create table test_table_order_123
     (item varchar2(20), price number, order_date date, cust_id number);
Table created.

begin
for i in 1 .. 100
loop
insert into test_table_order_123 values (
dbms_random.string('U', 10), round(dbms_random.value(1000, 100000)),
sysdate - round(dbms_random.value(0, 100)), round(dbms_random.value(0, 100)) );
end loop
commit;
end;
/


query:

SELECT  C.name,
        C.cust_id
  FROM  test_table_cust_123 C
 WHERE  C.state = 'CA'
AND EXISTS
          (
            SELECT 1
              FROM test_table_order_123 O
             WHERE O.cust_id = C.cust_id
               AND O.order_date > SYSDATE - 3
          )
ORDER BY C.name;

The above query is using the HASH SEMI JOIN, the execution plan was shown below:

---------------------------------------------------------------------------------
| Id  | Operation           | Name                 | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                      |     1 |    59 |     6  (34)|
|   1 |  SORT ORDER BY      |                      |     1 |    59 |     6  (34)|
|*  2 |   HASH JOIN SEMI    |                      |     1 |    59 |     5  (20)|
|*  3 |    TABLE ACCESS FULL| TEST_TABLE_CUST_123  |     2 |    74 |     2   (0)|
|*  4 |    TABLE ACCESS FULL| TEST_TABLE_ORDER_123 |     1 |    22 |     2   (0)|
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("O"."CUST_ID"="C"."CUST_ID")
   3 - filter("C"."STATE"='CA')
   4 - filter("O"."ORDER_DATE">SYSDATE@!-3)

We can force the query to use Nested Loop SEMI Join  by using the hint  NL_SJ  in the EXIST subquery block, The execution plan was shown below:
 

---------------------------------------------------------------------------------
| Id  | Operation           | Name                 | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                      |     1 |    59 |     5  (20)|
|   1 |  SORT ORDER BY      |                      |     1 |    59 |     5  (20)|
|   2 |   NESTED LOOPS SEMI |                      |     1 |    59 |     4   (0)|
|*  3 |    TABLE ACCESS FULL| TEST_TABLE_CUST_123  |     2 |    74 |     2   (0)|
|*  4 |    TABLE ACCESS FULL| TEST_TABLE_ORDER_123 |     1 |    22 |     1   (0)|
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("C"."STATE"='CA')
   4 - filter("O"."ORDER_DATE">SYSDATE@!-3 AND "O"."CUST_ID"="C"."CUST_ID")


We can force the query to not use the SEMI join and to use FILTER operation followd by a SORT, This can be achived with NO_UNNEST hint in the EXIT subquery. Execution plan was shown below:

---------------------------------------------------------------------------------
| Id  | Operation           | Name                 | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                      |     1 |    37 |     5  (20)|
|   1 |  SORT ORDER BY      |                      |     1 |    37 |     5  (20)|
|*  2 |   FILTER            |                      |       |       |            |        
|*  3 |    TABLE ACCESS FULL| TEST_TABLE_CUST_123  |     2 |    74 |     2   (0)|
|*  4 |    TABLE ACCESS FULL| TEST_TABLE_ORDER_123 |     1 |    22 |     2   (0)|
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "TEST_TABLE_ORDER_123" "O"
              WHERE "O"."CUST_ID"=:B1 AND "O"."ORDER_DATE">SYSDATE@!-3))
   3 - filter("C"."STATE"='CA')
   4 - filter("O"."CUST_ID"=:B1 AND "O"."ORDER_DATE">SYSDATE@!-3