Query Rewriter

DBIx::MyParse allows a SQL query to be parsed and then rewritten into a more efficient form. This allows query optimizations that are not included in the default MySQL optimizer to be performed.

Query rewriting can either happen at the client side, or in the middle between the client and the server, via a proxy module.

Contents

[edit] Example

A query that does not use an index:

SELECT a FROM b WHERE LEFT(c,4) = 'abcd'

can be rewritten to become:

SELECT a FROM b WHERE c LIKE 'abcd%'

[edit] Transformation

We start with the original condition:

LEFT(c, x) = s

[edit] How to arrive at an optimized expresson

MySQL will optimize this expression if one of the sides of the equation is a stand-alone indexed column, while the other side is a fixed expression that can be computed at query compilation time.

If s is not a fixed expression (e.g. a standalone string like 'abc'), our intended rewrite will replace this side with a function containing s which can then not be computed at query complation time. Therefore, in this case, this side of the condition will be "damaged" and no index will be used. So, if s is not a literal, we should not attempt a rewrite. Same applies for x -- if it changes for every record, no matter how we permute the operators, we will not be able to arrive at an optimized expression.

If s is a literal, then the other thing we need to achieve is have a stand-alone column on the left side of the expression. Therefore, we can do:

c LIKE F(s,x)

Where F() is a function that can be calculated only once at query compilation time while maintaining all the properties of the original expression.

[edit] NULL values

The first property to check is the behavoir with respect to NULL values. The original expression will return NULL if any of its three components is null, so we will need to preserve that. LIKE returns null if ether of its arguments is null, so what we need to ensure is that F(s,x) is null if either s or x is null. Since we require both of those to be literal values, we can check them using ISNULL(s) || ISNULL(x), however if they participate in any other function, then MySQL will check their nullability for us.


[edit] Trailing space

It is intuitive to think that we should write:

c LIKE CONCAT(s, '%')

except that the original equality operator behaves so that

'a ' = 'a               '

is also evaluated to 1. We do need to preserve that property, meaning that we should only compare the initial non-space characters of the two strings. <t>LIKE</tt> does compare initial characters, so we only need to research if presence of space characters does not influence the result. Space characters can be on both sides of the LEFT expression. So, we test:

'a ' LIKE 'a%'

works, however

'a' LIKE 'a %'

does not work. Therefore, we should not allow spaces in the second argument to LIKE. This argument is a CONCAT() function, so any spaces can only come from s, which we can pass through a RTRIM() to obtain the desired behavoir.

[edit] Character sets

The manual specifies that LIKE and = have different behavoir with respect to special collations and multibyte character sets. Therefore, it will only be safe to transform the query if its input arguments are from a safe collation.

Unfortunately it would be unsafe to assume equality between the two operators for the default MySQL collation, due to different characters and character pairs being sorted together in Nordic and Germanic languages. However, the ascii_general_ci should be safe enough.

We can use COLLATION() to check the collation of c, however this function works on individual records and is not optimized, so any index speedup will be lost.

So the particular optimization in this writeup may only be applied if we are a-priori certain that the column in question does not contain data that will trigger different behavoir between = and LIKE.

To check the behavoir of s, we can compare it against BINARY s. If those two strings are equal, then s does not contain any characters with special collation needs.

[edit] String Lengths

LIKE can return true even if the lengths are different, e.g. LEFT('abc', 2) = 'abc' will be false, however 'abc' LIKE 'abc%' evaluates to true. Simularily, LEFT('abcd', 3) = 'ab' will be false, however 'abcd' LIKE 'ab%' will be true.

To avoid that, we will only apply the optimization in case the lengths of the strings being compared are identical. We also examine each record that matches the optimization using the original LEFT() function, in order to remove any false positives.

If x is 0, the expression must be true only if s is an empty string, so we only apply the optimization if x is greater than 0.

[edit] Special characters

The final thing that can trip us off is the presence of _ or % characters in s, so we need to check for those explicitly and not optimize if they are present.

[edit] Final query

We combine all our suitability tests together to compose:

IF(
    INSTR(s,'_') = 0 AND
    INSTR(s,'%') = 0 AND
    x > 0 AND
    LENGTH(s) = x AND
    s = BINARY s
    c LIKE CONCAT(RTRIM(LEFT(s,x)), '%'),
  1
) AND LEFT(c, x) = s

If the expression is not suitable for optimization, then 1 is returned, which allows the original expression to be evaluated.


[edit] Code

Using DBIx::MyParse, the query rewrite being discussed can be implemented as follows:

Retrieved from "http://forge.mysql.com/wiki/Query_Rewriter"

This page has been accessed 7,500 times. This page was last modified 10:31, 14 September 2007.

Find

Browse
MySQLForge
Main Page
Current events
Recent changes
Random page
Help
Edit
Edit this page
Editing help
This page
Discuss this page
Post a comment
Printable version
Context
Page history
What links here
Related changes
My pages
Special pages
New pages
File list
Statistics
Bug reports
More...