Category: MySQLDevelopment

MySQL Internals Transformations

← Back to MySQL Internals overview page

Contents

[edit] How MySQL Transforms Subqueries

The Item_subselect virtual method select_transformer is used to rewrite subqueries. It is called from Item_subselect::init (which is called just after the call to the fix_fields() method for all items in JOIN::prepare).

[edit] Item_in_subselect::select_transformer

Item_in_subselect::select_transformer is divided into two parts, one for the scalar left part and one for the row left part.

[edit] Scalar IN Subquery

To rewrite a scalar IN subquery, the Item_in_subselect::single_value_transformer method is used. The scalar IN subquery will be replaced with an Item_in_optimizer item.

An Item_in_optimizer item is a special boolean function. On a value request (one of val, val_int, or val_str methods) it evaluates the left expression of the IN by storing its value in a cache item (one of Item_cache* items), then it tests the cache to see whether it is NULL. If left expression (cache) is NULL, then Item_in_optimizer returns NULL, else it evaluates Item_in_subselect.

Example queries.

a) SELECT * from t1 where t1.a in (SELECT t2.a FROM t2);
b) SELECT * from t1 where t1.a in (SELECT t2.a FROM t2 GROUP BY t2.a);
(in WHERE and HAVING in case 'a' and in HAVING in case 'b')

<left_expression> IN (SELECT <item> ...) will be represented as follows:

                        +-----------------+
                         |Item_in_optimizer|
                         +-----------------+
                                  |
            +---------------------+------------+
            |                                  |
 +-----------------------+             +-----------------+
 |   <left_expression>   |             |Item_in_subselect|
 |                       |             +-----------------+
 +-----------------------+                      |
 |<left_expression cache>|          +-----------+-----------+
 |                       |          |                       |
 +-----------------------+          |                       |
            ^                 +----------+        +--------------------+
            +<<<<<<<<<<<<<<<<<| Item_ref |    +<<<|Item_ref_null_helper|
                              +----------+    V   +--------------------+
                                              V   +--------------------+
                                              +>>>|       <item>       |
                                                  +--------------------+
 

where <<<<<<<<< is reference in meaning of Item_ref.

Item_ref is used to point to <left_expression cache>, because at the time of transformation we know only the address of the variable where the cache pointer will be stored.

If the select statement has an ORDER BY clause, it will be wiped out, because there is no sense in ORDER BY without LIMIT here.

If IN subquery union, the condition of every select in the UNION will be changed individually.

If a condition needs to be added to the WHERE clause, it will be presented as (item OR item IS NULL) and Item_is_not_null_test(item) will be added to the HAVING clause. Item_is_not_null_test registers a NULL value the way Item_ref_null_helper does it, and returns FALSE if the argument is NULL. With the above trick, we will register NULL value of Item even for the case of index optimization of a WHERE clause (case 'a' in the following example).

The following are examples of IN transformations:

<left_expression> IN (SELECT <item> FROM t WHERE <where_exp>)

If returning NULL correctly would make sense, the above will become:

(SELECT 1 FROM t
  WHERE
    <where_exp> and
    (Item_ref(<cached_left_expression>)=<item> or
     <Item> is null)
  HAVING Item_is_not_null_test(<item>))

When a subquery is marked as the top item of the WHERE clause, it will become:

(SELECT 1 FROM t
  WHERE
     <where_exp> and
     Item_ref(<cached_left_expression>)=<item>)
<left_expression> IN (SELECT <item> FROM t
                           HAVING <having_expr>
                           ORDER BY 1)

will be represented as

(SELECT <item> as ref_null_helper FROM t
   HAVING <having_exp> AND
     Item_ref(<cached_left_expression>) = Item_ref_null_helper(item))
<left_expression> IN (SELECT <item> UNION ...)

will become

(SELECT 1
   HAVING Item_ref(<cached_left_expression>)=
          <Item_null_helper(<Item>)>
 UNION ...)

(HAVING without FROM is a syntax error, but a HAVING condition is checked even for subquery without FROM)

<left_expression> IN (select <item>)

will be completely replaced with <left_expression> = <item>

Now conditions (WHERE (a) or HAVING (b)) will be changed, depending on the select, in the following way:

If subquery contains a HAVING clause, SUM() function or GROUP BY (example 1), then the item list will be unchanged and an Item_ref_null_helper reference will be created on item list element. A condition will be added to the HAVING.

If the subquery does not contain HAVING, SUM() function, or GROUP BY (example 2), then:

A single select without a FROM clause will be reduced to just <left_expression> = <item> without use of Item_in_optimizer.

[edit] Row IN Subquery

To rewrite a row IN subquery, the method used is Item_in_subselect::row_value_transformer. It works in almost the same way as the scalar analog, but works with Item_cache_row for caching left expression and uses references for elements of Item_cache_row. To refer to the item list, it uses Item_ref_null_helper(ref_array+i).

A subquery with HAVING, SUM() function, or GROUP BY will transformed in the following way:

ROW(l1, l2, ... lN) IN (SELECT i1, i2, ... iN FROM t HAVING <having_expr>)

will become:

(SELECT i1, i2, ... iN FROM t
   HAVING <having_expr> and
     <cache_l0> = <Item_ref_null_helper(ref_array[0]> AND
     <cache_l1> = <Item_ref_null_helper(ref_array[1])> AND
     ...
     <cache_lN-1> = <Item_ref_null_helper(ref_array[N-1]>)

SELECT without FROM will be transformed in this way, too.

It will be the same for other subqueries, except for the WHERE clause.

[edit] Item_allany_subselect

Item_allany_subselect is inherited from Item_in_subselect. ALL/ANY/SOME use the same algorithm (and the same method of Item_in_subselect) as scalar IN, but use a different function instead of =.

ANY/SOME use the same function that was listed after the left expression.

ALL uses an inverted function, and all subqueries passed as arguments to Item_func_not_all (Item_func_not_all is a special NOT function used in optimization, see following).

But before above transformation ability of independent ALL/ANY/SOME optimization will be checked (query is independent, operation is one of <, =<, >, >=, returning correct NULL have no sense (top level of WHERE clause) and it is not row subquery).

For such queries, the following transformation can be done:

val >  ALL (SELECT...) -> val >  MAX (SELECT...)
val <  ALL (SELECT...) -> val <  MIN (SELECT...)
val >  ANY (SELECT...) -> val >  MIN (SELECT...)
val <  ANY (SELECT...) -> val <  MAX (SELECT...)
val >= ALL (SELECT...) -> val >= MAX (SELECT...)
val <= ALL (SELECT...) -> val <= MIN (SELECT...)
val >= ANY (SELECT...) -> val >= MIN (SELECT...)
val <= ANY (SELECT...) -> val <= MAX (SELECT...)

ALL subqueries already have NOT before them. This problem can be solved with help of special NOT, which can bring 'top' tag to its argument and correctly process NULL if it is 'top' item (return TRUE if argument is NULL if it is 'top' item). Let's call this operation _NOT_. Then we will have following table of transformation:

val >  ALL (SELECT...) -> _NOT_ val >= MAX (SELECT...)
val <  ALL (SELECT...) -> _NOT_ val <= MIN (SELECT...)
val >  ANY (SELECT...) ->       val <  MIN (SELECT...)
val <  ANY (SELECT...) ->       val >  MAX (SELECT...)
val >= ALL (SELECT...) -> _NOT_ val >  MAX (SELECT...)
val <= ALL (SELECT...) -> _NOT_ val <  MIN (SELECT...)
val >= ANY (SELECT...) ->       val <= MIN (SELECT...)
val <= ANY (SELECT...) ->       val >= MAX (SELECT...)

If subquery does not contain grouping and aggregate function, above subquery can be rewritten with MAX()/MIN() aggregate function, for example:

val > ANY (SELECT item ...) -> val < (SELECT MIN(item)...)

For queries with aggregate function and/or grouping, special Item_maxmin_subselect will be used. This subquery will return the maximum (minimum) value of the result set.

[edit] Item_singlerow_subselect

Item_singlerow_subselect will be rewritten only if it contains no FROM clause, and it is not part of UNION, and it is a scalar subquery. For now, there will be no conversion of subqueries with field or reference on top of item list (on the one hand, we can't change the name of such items, but on the other hand, we should assign to it the name of the whole subquery which will be reduced).

The following will not be reduced:

SELECT a;
SELECT 1 UNION SELECT 2;
SELECT 1 FROM t1;

The following select will be reduced:

SELECT 1;
SELECT a+2;

Such a subquery will be completely replaced by its expression from item list and its SELECT_LEX and SELECT_LEX_UNIT will be removed from SELECT_LEX's tree.

But every Item_field and Item_ref of that expression will be marked for processing by a special fix_fields() procedure. The fix_fields() procedures for such Items will be performed in the same way as for items of an inner subquery. Also, if this expression is Item_fields or Item_ref, then the name of this new item will be the same as the name of this item (but not (SELECT ...)). This is done to prevent broken references on such items from more inner subqueries.

Retrieved from "http://forge.mysql.com/wiki/MySQL_Internals_Transformations"

This page has been accessed 10,908 times. This page was last modified 19:54, 12 July 2007.

Find

Browse
MySQLForge
Main Page
Current events
Recent changes
Random page
Help
Edit
Edit this page
Editing help
This page
Discuss this page
Post a comment
Printable version
Context
Page history
What links here
Related changes
My pages
Special pages
New pages
File list
Statistics
Bug reports
More...