6.0 Subquery Optimization Benchmarks
Contents |
SergeyP has analyzed the performance of new subquery optimizations that are already pushed into MySQL 6.0. For short description of these optimizations, see 6.0 Subquery Optimization Cheatsheet. Ongoing work and future plans are listed at Subquery_Works.
[edit] Summary statistics
Analyzed 25 subquery performance bugs, which is a random sample of all subquery performance bugs we had (TODO: Ranger: expand the sampling)
- 7 cases are not handled
- of which 2 will be handled when WL#3830 is complete
- 17 cases (17/25=68%) are already handled by new subquery optimizations.
- 3 bugs (4 queries) of those that are handled by the new optimizations are easy to try out (have both query and dataset, no guessing required), I've ran them and for all of them observed 10x to 9300x speedup, which is 3350x speedup on average.
[edit] Preliminary conclusions
These conclusions were made after analyzing bugs db cases:
- New 6.0 optimizations cover roughly 2/3 of subquery cases we have.
- New optimizations make MySQL faster in most cases, 1000x speedup is normal
- (TODO: pending more accurate benchmarks from Ranger)
- WL#3985 "Cost-based choice between semi-join and materialization" is relevant and should be done before the release. (see how much better we could do in Bug#28257)
- WL#3830 "Subquery optimization: Materialization: Partial matching of tuples with NULL components" is relevant and should be done before the release.
- The next low hanging fruit seems to be WL#3485 "
FROM (view-mergeable-select)subquery optimization" (Hi Evgen!:)
- Analysis of what subquery user cases are there is available here: Blog: Observations about subquery use cases.
[edit] Detailed bug list
I've done two searches, one for "subquery" and one for "subselect". Unfortunately I didn't notice I've left the Status: select in "Active" state. My intent was to find completely everything.
| BUG# | Summary | Subquery type | Testcase available? | Handled by 6.0 optimizations? | Speedup |
|---|---|---|---|---|---|
| Bug#4132 | MySQL hang up when run a subquery. | select ... where oe not in (/*uncorrelated*/ select min(x) ... group by y) | Query only | Pending: materialization (waiting for WL#3830) | |
| Bug#8139 | subselect index not used for "in" but used for "=" | select ... where oe in (uncorrelated semi-join-ok select) | Query only | Yes, both | |
| Bug#9021 | Query with subquery does not use index in outer query | select ... where oe in (uncorrelated semi-join-ok select) | Query only | Yes, both | |
| Bug#9090 | Optimization Problem with a subquery in an IN Operation | select ... where oe in (uncorrelated select w/ group by) | Query only | Yes, materialization | |
| Bug#10312 | subquery cant be optimized correctly | select ... where oe in (uncorrelated semi-join-ok select) | Query only | Yes, both | |
| Bug#10815 | Sub-queries failure when using GROUP BY | select ... where oe in (uncorrelated group-by select) | Query only | Yes, materialization | |
| Bug#11254 | Subquery IN Crashes MySQL (Service Pegs CPU) | select ... where oe in (uncorrellated semi-join-ok select) and (another uncorrellated semi-join-ok select) | Query and data | Yes, both | Yes, 9300 times |
| Bug#14253 | Optimizer uses partial indexes for subqueries, where it should not | select ... where exists (correlated semi-join ok select) | Query and data | No, it's EXISTS | |
| Bug#15135 | Dependent subquery performs very badly | select ... where oe in (uncorrelated select w/ group by) | Query and data | Yes, materialization | TODO |
| Bug#17560 | select with subselect does not use indexed key | select ... where oe in (uncorrellated semi-join-ok select) | Query only | Yes, both | |
| Bug#17952 | nested query super slow..should not be | select ... where oe in (uncorrelated select) | Query only | Yes, both | |
| Bug#18465 | Optimizer/execution plan bug with dependent subqueries in ON clauses of JOINs | select ... inner join on ((select min(...) where ie=oe1) = oe2) ... | Query and data | No | |
| Bug#18826 | Using subquery in "in (...)" clause forces full table scan | select ... where oe in (uncorrelated semi-join-ok select) | Query only | Yes, both | |
| Bug#19895 | No optimization of where clause subqueries rewritable as joins | select ... where oe in (semi-join-ok select), where exists (...) | Query only | Yes, both | |
| Bug#22765 | subselect uses all partitions instead of the actual needed ones. | select ... where oe in (semi-join-ok uncorrelated select) | Query only | Yes, both | |
| Bug#24770 | update with subquery slow | update ... where oe in (uncorrelated semi-join-ok select) | Query and data | Yes, materialization (todo: try it) Pending: handling UPDATEs with w/ semi-join |
TODO |
| Bug#25796 | Query plan is not using expected index | select ... where oe in (uncorrelated semi-join-ok select) | Query only | Yes, both | |
| Bug#25926 | dependent suqbuery takes too long | select ... where oe in (uncorrelated group-by select) | Query and data | Yes, materialization | Yes, 950 times |
| Bug#27199 | Performance problem when query contains a left join against a subquery | select ... left join (mergeable select ...) ... | Query and data | No | |
| Bug#27452 | Correlated subquery produces inefficient optimizer plan for INFORMATION_SCHEMA | select ... left join (mergeable select ...) ... | Query only | No | |
| Bug#28257 | MySQL hangs on "Copying to tmp table" or "Sending data" - Part 2 of Bug #14070 | select ... where oe in (uncorrelated group-by select) select ... where oe in (uncorrelated semi-join-ok select with distinct) |
Query and data | Materialization - yes+yes, semi-join - yes+no | Yes. 10 times and 3140 times |
| Bug#30659 | 300% Performance Regression 5.0.45 -> 5.2.6 on Correlated Subquery | select ... from (uncorrelated group-by select) select ... where oe=(select max(ie) ... where correlated_cond) |
Query and data | No, its FROM or correlated group-by. | |
| Bug#30883 | Dependent Sub Queries Very Slow | select ... where oe not in (uncorrelated semi-join-ok select) | Query only | Pending: materialization (waiting for WL#3830) | |
| Bug#31468 | EXPLAIN slow when joining big table to a subquery | select ... from (select ...) | Query and data | No | |
| Bug#32341 | Subquery does not use index with session variable, works with string literal | select ... where oe = (select ... where .. @var .. order by limit 1) | Query and data | No | |
| Bug#32665 | Query with dependent subquery is too slow | select .. where oe in (uncorrelated semi-join-ok select) | Query only | Yes, both |
[edit] Results of search in DBT-x benchmark suites
- DBT1 - No subqueries
- DBT2 - No subqueries
- DBT3 - Subquery cases:
| query no | query pattern | Handled by 6.0 optimizations? |
|---|---|---|
| 2 | select ... where oe=(uncorrelated select min(..) from 4 tables) | No |
| 4 | select ... where exists (select * from tbl where oe=ie and uncorr_cond) | No, EXISTs |
| 7 | select ... from (view-mergeable select) group by x order by y | No |
| 8 | select ... from (view-mergeable select) /*no where*/ group by x order by y |
No |
| 9 | select ... from (view-mergeable select) /*no where*/ group by x order by y |
No |
| 11 | select ... having sum(..) > (uncorrelated select(sum) ...) | No |
| 13 | select ... from (group-by select) /*no where*/ group by x order by y | No |
| 15 | select ... from tbl, group_by_view_1 where group_by_view_1.group_col=tbl.col and oe = (select max(col) from group_by_view1) |
No |
| 16 | select ... where oe not in (uncorrelated semi-join-ok select) | Yes, materialization |
| 17 | select ... where oe < (select avg( ...) ... where correlated cond) | No |
| 18 | select ... where oe in (uncorrelated group-by select) | Yes, materialization |
| 20 | select ... where oe in (semi-join-ok select where ... and oe in > (select sum(x) from tbl where corr_cond)) | No |
| 21 | select ... where exists (select ... where oe1=ie1 and oe2!=ie2) and not exists (select ... where oe1=ie1 and oe2!=ie2) |
No |
| 22 | select .. from (select ... where oe > (uncorrelated select avg(x)) and not exists (select * from tbl where oe=ie)) /*no where*/ order by x group by y |
No |
- DBT4 - No subqueries
[edit] Results of search in MySQL support cases
The search found 47 cases related to subqueries. The below table has private info removed, the full table is at 6.0_Subquery_Optimization_Benchmarks page on the intranet)
(note: in query abbreviations, i='inner'(i.e. defined in the subquery) o='outer', e='expression'. 'oe'='an expression that depends on outer tables'.)
| Query type | Handled by new optimization? | Notes |
| WHERE oe IN (uncorrelated semi-join-ok select) | Yes, both | |
| WHERE oe= ANY(uncorrelated semi-join-ok select) | Yes, both | subquery has dummy ORDER BY; Testcase available |
| WHERE oe IN (uncorrelated semi-join-ok select) | Yes, both | Testcase available |
| WHERE oe IN (uncorrelated semi-join-ok select) | Yes, both | |
| WHERE oe IN (uncorrelated semi-join-ok select) | Yes, both | |
| WHERE .. IN (uncorrelated semi-join-ok select) | Yes, both | |
| WHERE oe IN (correlated semi-join-ok select) | Yes, semi-join | |
| WHERE oe IN (uncorrelated semi-join-ok select w/ selective clauses) | Yes, both | Inside-out order needed |
| WHERE oe IN (uncorrelated semi-join-ok select w/ selective clauses) | Yes, both | Inside-out order seems to be needed |
| WHERE oe IN (uncorrelated semi-join-ok select w/ selective clauses) | Yes, both | |
| equivalent to WHERE oe IN (uncorrelated semi-join-ok select w/ selective clauses) | Yes, both | |
| WHERE oe IN (uncorrelated semi-join-ok select w/ selective clauses) | Yes, both | Subquery is a 5-way inner join |
| WHERE oe IN (uncorrelated semi-join-ok select) | Yes, both | Inside-out order seems to be needed |
| WHERE oe IN (uncorrelated semi-join-ok select distinct(x) ... ) | Yes, both | |
| WHERE oe IN (SELECT x FROM t1, tw WHERE uncorrelated_cond GROUP BY x) | No? | Will be handleable by both if we make the optimizer detect and ignore such redundant GROUP BY clauses |
| UPDATE ... WHERE oe NOT IN (uncorrelated semi-join-ok select, no access to updated table) | Yes, materialization | Could use semi-join if we make update code convert such queries to multi-table UPDATEs |
| select ... from outer_table where oe in (uncorrelated semi-join-ok select w/ LIMIT 1) | Yes, both | Why LIMIT 1? |
| HAVING oe IN (SELECT sum() ... FROM (SELECT ... ) no_where GROUP BY ...) | Yes, materialization | |
| WHERE oe IN (uncorreated semi-join-ok select distinct) | Yes, both | |
| WHERE oe IN (uncorrelated semi-join-ok select) | Yes, both | |
| WHERE outer_tbl.col1 = (SELECT max(col) FROM inner_tbl WHERE inner_tbl.col1=outer_tbl.col1) AND outer_tbl.col1 IN (1,2,3) | No | Rewriteable as FROM. Is this a case for equality propagation to span subquery bounds? |
| DELETE .. WHERE NOT EXISTS (semi-join-ok correlated select) SELECT ... WHERE NOT EXISTS (semi-join-ok correlated select) | No, its EXISTS | Testcase available |
| WHERE oe IN (SELECT column FROM table) | Yes, both | |
| WHERE oe IN (SELECT col FROM tbl1 WHERE NOT EXISTS(SELECT ... WHERE ... AND tbl1.col= inner_tbl.col) | IN-subqery - Yes, both EXISTS - No. | Another case of equality-correlated EXISTS |
| select ... inner joins ... where oe = (uncorrelated semi-join-ok select) | No | Subqueries are simple 'lookups' so the case itself is not interesting. In general, this hints on necessity to position subquery predicates based on their costs. |
| WHERE .. IN (uncorrelated group-by select) | Yes, Materialization | |
| Several correlated LIMIT 1 subqueries in select select list | No | This could probably use subquery argument caching. |
| WHERE oe IN (uncorrelated semi-join-ok select) | Yes, both | |
| UPDATE tbl ... WHERE tbl.col IN (uncorrelated semi-join-ok select) | No | Manual conversion to multi-table join update helped |
| WHERE oe IN (uncorrelated semi-join-ok select) | Yes, both | Manual conversion to join provided speedup |
| WHERE oe IN (uncorrelated semi-join-ok select) | Yes, both | Insideout order neede |
| WHERE oe IN (uncorrelated semi-join-ok select distinct) | Yes, both | Insideout order needed |
| WHERE oe IN (uncorrelated semi-join-ok select) | Yes, both | Insideout order needed |
| WHERE oe IN (uncorrelated semi-join-ok select) | Yes, both | |
| WHERE oe IN (uncorrelated semi-join-ok select) | Yes, both | |
| WHERE NOT EXISTS (SELECT 1 FROM inner_tbl WHERE inner_tbl.col=outer_tbl.col AND another-similar-cond) | No, EXISTS | |
| WHERE oe IN (uncorrelated group by select) | Yes, Materialization | 2-level nesting |
| WHERE oe IN (SELECT col FROM tbl1, tbl2 WHERE uncorrelated_where GROUP BY col) | Yes, Materialization (TODO: check if we can convert this to semi-join) | Another redundant GROUP BY case |
| WHERE oe IN (uncorrelated semi-join-ok select), and its EXISTS form | Variant 1: Yes, both Variant 2: No, it's EXISTS | |
| SELECT .. FROM ... LEFT OUTER JOIN (SELECT ...) | No, it's FROM. | |
| WHERE .. IN (uncorrelated semi-join-ok select) | Yes, both | |
| WHERE oe NOT IN (uncorrelated select w/ aggregates and group by) | Yes, materialization | |
| WHERE oe =(SELECT 3-way join WHERE correlated_cond ORDER BY x LIMIT 1) | No | Subq value cache might help here. |
| WHERE oe IN (uncorrelated semi-join-ok select, 2-way join) | Yes, both | |
| WHERE [NOT] EXISTS (SELECT 1 FROM it1 STRAIGHT JOIN it2 WHERE it1.col=ot.col AND so forth LIMIT 1 clause) | No, it's EXISTS | |
| WHERE oe NOT IN (uncorrelated semi-join-ok select) | Yes, Materialization |
[edit] TODO
- Run testcases from support cases
- Look at Bug#34364 "LEFT JOIN with subquery takes very long time" - it's a new customer bug with a test dataset.
.