Rao, J., Pirahesh, H., & Zuzarte, C. (2004). Canonical abstraction for outerjoin optimization. SIGMOD, 671. https://doi.org/10.1145/1007568.1007643

This paper presents the CBA (compensation-based approach) to solve the join reordering problem. It introduces two compensation operators, through which outer join queries could be converted into a canonical representation, just like inner join queries.

The authors are inspired by the fact that an inner join query can be represented by:

Then, the paper purposes a similar representation for outer join queries:

The deep reason why the authors try to pursue such a canonical abstraction is: with this nice form, we can achieve full commutativity and transitivity among all relations in the query. In this paper, a outer join query is defined to be one query with any number of left outerjoins and inner joins. Such a query can be represented by a binary tree, in which the leaf nodes are base relations and internal nodes are join predicates.

Then, we define 3 operators used in the canonical abstraction:

Then, we summarize that an outer join predicate nullifies the null-producing relation, while an inner join predicate nullifies both relations. The paper then points out that nullification operators themselves are not commutative in general. Two nullification operators are commutative if they nullify the same set of attributes. This is because, when the outer nullification is interchanged to be the inner one, the nulls generated by the original inner nullification (current outer nullification) will no longer be seen by the original outer nullification (current inner nullification). Let’s revisit the definition of nullification operator: it says the predicate p does not evaluate to true. In other words, it will nullify the attributes when p is either false or null. Therefore, if the nulls generated are not seen, the result will be different.

Now, the last step before achieving a canonical abstraction for outer join queries is that nullification operators are not interchangeable. Under the condition that all predicates are null-intolerant, the paper tries to make nullification operators interchangable. The addition of these implied nullification operations will make nullification operators interchangable. The reason is that these implied nullification will ensure that all nulls will be generated regardless of the order of nullification operators. An additional benefit of applying these implied nullification is that transitivity will be applied across two outer joins.

Then, the paper introduces the concept of nullification sets. Given a relation R, its nullification set NSr is a set of all predicates that can nullify R if not true. Then an algorithm is presented to populate all nullification sets by iterating the query tree from bottom-up in the post-order.

It is guaranteed that all nullification sets in the operator tree are complete upon one bottom-up post-order traversal. The nullification sets generated have 2 properties:

Since the nullification operators are interchangable now, we have fixed the last gap before achieving the canonical abstraction for an outer join query. Now, we can enjoy the benefits of better join reorderability for outer join queries. With the nullification sets introduced, we have an additional benefit that predicates within the same ON clause can be split and do not have to be applied together.

Then, the authors try to organize these nullification sets into a directed graph. Each node represent a unique nullification set (if the nullification sets of two relations are the same, they belong to the same node). There is an edge from node a to node b if NSa is a subset of NSb. This graph is referred to as DAGns. The DAG has the following properties:

Following up, the paper tries to purpose a DP-style bottom-up plan enumeration algorithm. To distinguish among plans corresponding to the same logical expression, plan properties are used and stored along the way. To extend the optimizer for outer join support, we need to add a plan property to store the nullification set after each join node. During the enumeration process, the generation of nullification sets is tracked and any missing nullification operation will be added in the end.

To implement a best-match operator, the paper purposes to do a sorting to obtain a favorable ordering (in which nulls are sorted last). Then, we can filter out all spurious tuples via a one-pass iteration. To get access to the previous tuple, the authors purpose to use the OVER ... WINDOW clause in SQL:1999 standard. This can also help us deal with the special case where the first tuple does not have a previous tuple.

Questions