Query Rewriter
DBIx::MyParse allows a SQL query to be parsed and then rewritten into a more efficient form. This allows query optimizations that are not included in the default MySQL optimizer to be performed.
Query rewriting can either happen at the client side, or in the middle between the client and the server, via a proxy module.
Contents |
[edit] Example
A query that does not use an index:
SELECT a FROM b WHERE LEFT(c,4) = 'abcd'
can be rewritten to become:
SELECT a FROM b WHERE c LIKE 'abcd%'
[edit] Transformation
We start with the original condition:
LEFT(c, x) = s
[edit] How to arrive at an optimized expresson
MySQL will optimize this expression if one of the sides of the equation is a stand-alone indexed column, while the other side is a fixed expression that can be computed at query compilation time.
If s is not a fixed expression (e.g. a standalone string like 'abc'), our intended rewrite will replace this side with a function containing s which can then not be computed at query complation time. Therefore, in this case, this side of the condition will be "damaged" and no index will be used. So, if s is not a literal, we should not attempt a rewrite. Same applies for x -- if it changes for every record, no matter how we permute the operators, we will not be able to arrive at an optimized expression.
If s is a literal, then the other thing we need to achieve is have a stand-alone column on the left side of the expression. Therefore, we can do:
c LIKE F(s,x)
Where F() is a function that can be calculated only once at query compilation time while maintaining all the properties of the original expression.
[edit] NULL values
The first property to check is the behavoir with respect to NULL values. The original expression will return NULL if any of its three components is null, so we will need to preserve that. LIKE returns null if ether of its arguments is null, so what we need to ensure is that F(s,x) is null if either s or x is null. Since we require both of those to be literal values, we can check them using ISNULL(s) || ISNULL(x), however if they participate in any other function, then MySQL will check their nullability for us.
[edit] Trailing space
It is intuitive to think that we should write:
c LIKE CONCAT(s, '%')
except that the original equality operator behaves so that
'a ' = 'a '
is also evaluated to 1. We do need to preserve that property, meaning that we should only compare the initial non-space characters of the two strings. <t>LIKE</tt> does compare initial characters, so we only need to research if presence of space characters does not influence the result. Space characters can be on both sides of the LEFT expression. So, we test:
'a ' LIKE 'a%'
works, however
'a' LIKE 'a %'
does not work. Therefore, we should not allow spaces in the second argument to LIKE. This argument is a CONCAT() function, so any spaces can only come from s, which we can pass through a RTRIM() to obtain the desired behavoir.
[edit] Character sets
The manual specifies that LIKE and = have different behavoir with respect to special collations and multibyte character sets. Therefore, it will only be safe to transform the query if its input arguments are from a safe collation.
Unfortunately it would be unsafe to assume equality between the two operators for the default MySQL collation, due to different characters and character pairs being sorted together in Nordic and Germanic languages. However, the ascii_general_ci should be safe enough.
We can use COLLATION() to check the collation of c, however this function works on individual records and is not optimized, so any index speedup will be lost.
So the particular optimization in this writeup may only be applied if we are a-priori certain that the column in question does not contain data that will trigger different behavoir between = and LIKE.
To check the behavoir of s, we can compare it against BINARY s. If those two strings are equal, then s does not contain any characters with special collation needs.
[edit] String Lengths
LIKE can return true even if the lengths are different, e.g. LEFT('abc', 2) = 'abc' will be false, however 'abc' LIKE 'abc%' evaluates to true. Simularily, LEFT('abcd', 3) = 'ab' will be false, however 'abcd' LIKE 'ab%' will be true.
To avoid that, we will only apply the optimization in case the lengths of the strings being compared are identical. We also examine each record that matches the optimization using the original LEFT() function, in order to remove any false positives.
If x is 0, the expression must be true only if s is an empty string, so we only apply the optimization if x is greater than 0.
[edit] Special characters
The final thing that can trip us off is the presence of _ or % characters in s, so we need to check for those explicitly and not optimize if they are present.
[edit] Final query
We combine all our suitability tests together to compose:
IF(
INSTR(s,'_') = 0 AND
INSTR(s,'%') = 0 AND
x > 0 AND
LENGTH(s) = x AND
s = BINARY s
c LIKE CONCAT(RTRIM(LEFT(s,x)), '%'),
1
) AND LEFT(c, x) = s
If the expression is not suitable for optimization, then 1 is returned, which allows the original expression to be evaluated.
[edit] Code
Using DBIx::MyParse, the query rewrite being discussed can be implemented as follows: