Sunday, June 27, 2010

Subquery And Null

Last week, I completed a four-day Teradata Advanced Sql course offered by Teradata Australia Pty Ltd and sponsored by my employer. In this series, I will document some of the insights that I gained and the tricks of the trade that I learnt. Today, I will write about some gotchas associated with subqueries.

The following is a garden variety subquery that I write all the time:
SELECT *
FROM t1
WHERE some_col NOT IN (SELECT a_col FROM t2);

The above works as long as the subquery does not return a NULL in what is effectively a comma separated list of values within the pair of parenthesis. However, things go pear-shaped if the subquery returns one or more NULLs in the list. When that happens, the outer query returns nothing.

The outer query compares some_col with each value returned by the subquery.  Say, the subquery returns val1, val2 and NULL. For the outer query to return a row, the condition that needs to be true is this: 
some_col not equal to val1
AND some_col not equal to val2
AND some_col not equal to NULL;

Comparison with a NULL results in NULL, and so does ANDing a value with a NULL. Therefore, the preceding condition never becomes true, and the outer query does not return anything.

NULL is not an issue in the following query (it has 'IN' instead of 'NOT IN' in the WHERE clause):

SELECT *
FROM t1
WHERE some_col IN (SELECT a_col FROM t2);

In the above query, some_col needs to be equal to only one of the values returned by the subquery for the outer query to return a row. In other words, some_col gets ORed with the values returned by the subquery. Therefore, NULL is not an issue.

There are ways to circumvent the problems caused by NULLs in a subquery, and they will be discussed in the next installment of this series.

No comments:

Post a Comment