MySQL Internals Optimizer/Storage engine Interface
This page is work in progress. Information may be misleading or incorrect
This is intended to serve as a guide of what information a storage engine needs to provide to MySQL Query Optimizer.
Contents |
[edit] Number of rows in the table
Number of records in the table is reported by the table handler by filling handler->stats.records in
handler::info(HA_STATUS_VARIABLE & maybe_other_bits);
call. The optimizer interprets handler->stats.records value as follows: 0 means the table has exactly zero rows. Interpretation of other values depend on whether handler->table_flags() returns HA_STATS_RECORDS_IS_EXACT bit. If the bit is set, the value is assumed to be the exact number of rows in the table (and query "SELECT COUNT(*) FROM tbl" can be answered immediately). If the bit is not set, the value is assumed to be an estimate (the optimizer will treat it as expected value of the number of records).
[edit] Upper bound of number of rows
The call
ha_rows handler::estimate_rows_upper_bound()
returns the upper bound of rows in the table. If the storage engine cannot provide the upper bound, it must return HA_POS_ERROR to indicate that. The optimizer uses the upper bound to allocate memory for sorting, returning more records then was reported as upper bound will cause ORDER BY queries to fail.
[edit] Indexes
[edit] Index statistics
Index statistics is reported in
handler::info(HA_STATUS_CONST & maybe_other_bits);
call. The call retrieves statistics for all indexes, statistics for i-th index is stored in the
handler->table->key_info[i].rec_per_key
array. The array has #keyparts elements. The optimizer interpets its elements as
rec_per_key[k] = AVG(#rows such that
keypart1=c1 AND keypart2=c2 AND ... AND keypart{k} = c_k,
where c_i are arbitrary constants))
rec_per_key[k] = 0 means the estimate is not known.
Note: the above doesn't apply to fulltext or R-TREE indexes. Optimizer will not use rec_per_key values for those types of indexes
[edit] NULL values
SQL's rules for NULL values are that NULL!=NULL. How should NULL value population affect rec_per_key statistics is a gray area. See Bug#9622 for an attempt to deal with this in MyISAM.
[edit] records_in_range() call
TODO
(returning 0 means you guarantee the range has 0 rows)
[edit] Index properties
[edit] Cost functions
[edit] Cost Units
[edit] read_time()
[edit] scan_time()
[edit] Condition pushdown
[edit] Index Condition pushdown
[edit] Multi Range Read interface
[edit] Default cost functions implementation
what assumptions they make about the engine
[edit] Implementations in known storage engines
Production of statistics and records_in_range() implementation:
| Engine | rec_per_key | records_in_range |
|---|---|---|
| MyISAM | Static values, updateable on ANALYZE/OPTIMIZE and some DDL or bulk ops | Calculated by doing index dives to range endpoints and counting fraction of index pages that is between the endpoints |
| InnoDB | Dynamically calculated by doing several index dives (TODO: when is it recalculated?). That's why running ANALYZE several times may produce different, non-converging results. | Similar to MyISAM |
| NDB | MySQL 5.0 - trivial function: { return unique_key? 1 : 10 ;} MySQL 5.1: For BTREE indexes, ha_ndbcluster can obtain a reasonably accurate estimate by asking the backend. Communication with backend has a cost, so ha_ndbcluster maintains an LRU range estimate cache of 32 elements. The backend is accessed on every 20th records_in_range() call. For other calls the result is calculated from the information in the cache. See NdbIndexStat.h|cpp. |
|
| HEAP | BTREE: a variant of index dives HASH: Based on # of hash buckets |
|
| Falcon | TODO | Heuristic formula based on index cardinality value, number of keyparts used by the left interval bound, and type of the interval (disinguishes between equality and all other intervals) |
| PBXT | ||
| Solid | (something nontrivial) |
ANALYZE/OPTIMIZE table actions:
| Engine | ANALYZE | OPTIMIZE |
|---|---|---|
| MyISAM | for each index, run through the entire index and update rec_per_key values | |
| InnoDB | ||
| NDB | ||
| HEAP | unsupported | unsupported |
| Falcon | ||
| PBXT | ||
| Solid |