Category: MySQLDevelopment

MySQL Internals Optimizer tracing

← Back to MySQL Internals overview page

Contents

[edit] Tracing the Optimizer

Since milestone version 5.6.3, a new tracing capability has been added to the MySQL optimizer. The interface is provided by a set of optimizer_trace_xxx system variables and the INFORMATION_SCHEMA.OPTIMIZER_TRACE table, but is subject to change.

[edit] Typical Usage

# Turn tracing on (it's off by default):
SET optimizer_trace="enabled=on";
SELECT ...; # your query here
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
# possibly more queries...
# When done with tracing, disable it:
SET optimizer_trace="enabled=off";

A session can trace only statements which it executes; it cannot see a trace of another session.

[edit] System Variables Controlling the Trace

A brief overview from "mysqld --verbose --help":

 --optimizer-trace=name 
                     Controls tracing of the Optimizer:
 --optimizer-trace-features=name 
                     Enables/disables tracing of selected features of the
                     Optimizer:
                     optimizer_trace_features=option=val[,option=val...],
                     where option is one of {greedy_search, range_optimizer,
                     dynamic_range, repeated_subselect} and val is one of {on,
                     off, default}
 --optimizer-trace-limit=# 
                     Maximum number of shown optimizer traces
 --optimizer-trace-max-mem-size=# 
                     Maximum allowed cumulated size of stored optimizer traces
 --optimizer-trace-offset=# 
                     Offset of first optimizer trace to show; see manual
 --end-markers-in-json=#
                     In JSON output ("EXPLAIN FORMAT=JSON" and optimizer
                     trace), if set to 1, repeats the structure's key (if it
                     has one) near the closing bracket

Those will be described in more detail below.

[edit] The INFORMATION_SCHEMA.OPTIMIZER_TRACE Table

The OPTIMIZER_TRACE table contains information about traced statements. The table has these columns:

[edit] Traceable Queries

They are: SELECT; INSERT or REPLACE (with VALUES or SELECT); UPDATE/DELETE and their multi-table variants; all the previous ones prefixed by EXPLAIN; SET (unless it manipulates the optimizer_trace system variable); DO; DECLARE/CASE/IF/RETURN (stored routines language elements); CALL. If one of those statements is prepared and executed in separate steps, preparation and execution are separately traced.

[edit] Automatic Trace Purging

By default, each new trace overwrites the previous trace. Thus, if a statement contains substatements (example: invokes stored procedures, stored functions, triggers), the top statement and substatements each generate one trace, but at the execution's end only the last substatement's trace is visible. A user who wants to see the trace of another substatement, can enable/disable tracing around the desired substatement, but this requires editing the routine's code, which may not be possible. Another solution is to tune trace purging.

[edit] Tuning Trace Purging

This is done with

 SET optimizer_trace_offset=<OFFSET>, optimizer_trace_limit=<LIMIT>

where OFFSET is a signed integer, and LIMIT is a positive integer. The default for optimizer_trace_offset is -1; the default for optimizer_trace_limit is 1. The SET statement has the following effects:

traces of the (-OFFSET) newest remembered traces (if OFFSET < 0).

For example, a combination of OFFSET=-1 and LIMIT=1 will make the last trace be shown (as is default), OFFSET=-2 and LIMIT=1 will make the next-to-last be shown, OFFSET=-5 and LIMIT=5 will make the last five traces be shown. Such negative OFFSET can be useful when one knows that the interesting substatements are the few last ones of a stored routine, like this:

SET optimizer_trace_offset=-5, optimizer_trace_limit=5;
CALL stored_routine(); # more than 5 substatements in this routine
SELECT * FROM information_schema.OPTIMIZER_TRACE; # see only last 5 traces

On the opposite, a positive OFFSET can be useful when one knows that the interesting substatements are the few first ones of a stored routine.

The more accurately those two variables are adjusted, the less memory is used. For example, OFFSET=0 and LIMIT=5 will use memory to remember 5 traces, so if only the three first are needed, OFFSET=0 and LIMIT=3 is better (tracing stops after LIMIT traces, so the 4th and 5th trace are not created and take up no memory). A stored routine may have a loop which executes many substatements and thus generates many traces, which would use a lot of memory; proper OFFSET and LIMIT can restrict tracing to one iteration of the loop for example. This also gains speed, as tracing a substatement impacts performance.

If OFFSET>=0, only LIMIT traces are kept in memory. If OFFSET<0, that is not true: instead, (-OFFSET) traces are kept in memory; indeed even if LIMIT is smaller than (-OFFSET), so excludes the last statement, the last statement must still be traced because it will be inside LIMIT after executing one more statement (remember than OFFSET<0 is counted from the end: the "window" slides as more statements execute).

Such memory and speed gains are the reason why optimizer_trace_offset and optimizer_trace_limit, which are restrictions at the trace producer level, are offered. They are better than using

 SELECT * FROM OPTIMIZER_TRACE LIMIT <LIMIT> OFFSET <OFFSET>;

which is a restriction on the trace consumer level and saves almost nothing.

[edit] Tracing Memory Usage

Each remembered trace is a String. It is extended (with realloc()) as optimization progresses and appends data to it. The optimizer_trace_max_mem_size variable sets a limit on the total amount of memory used by all currently remembered traces: If this limit is reached, the current trace isn't extended (so it will be incomplete), and the MISSING_BYTES_BEYOND_MAX_MEM_SIZE column will show the number of bytes missing from this trace. The optimizer_trace_max_mem_size variable has a small default (16kB) to protect the innocent, it's often necessary to grow it for real-life traces.

[edit] Privilege Checking

In complex scenarios where the query uses SQL SECURITY DEFINER views or stored routines, it may be that a user is denied from seeing the trace of its query because it lacks some extra privileges on those objects. In that case, the trace will be shown as empty and the MISSING_PRIVILEGE column will show "1".

[edit] Interaction with the --debug Option

Anything written to the trace is automatically writte to the --debug file.

[edit] The optimizer_trace System Variable

The optimizer_trace system variable has these on/off switches:

[edit] The end_markers_in_json System Variable

If a JSON structure is large, it's difficult to pair its closing bracket with its opening bracket; to help the reader (but this is not JSON-compliant), setting @@end_markers_in_json=on repeats the structure's key (if it has one) near the closing bracket. It also affects 'EXPLAIN FORMAT=JSON' in the same way. Note that before MySQL 5.6.6, this variable didn't exist and was rather a switch in the @@optimizer_trace variable (it was set with 'set optimizer_trace="end_marker=on";').

[edit] Selecting Optimizer Features to Trace

Some features in the optimizer can be invoked many times during statement optimization and execution, and thus can make the trace grow beyond reason. They are:

Those features can be excluded from tracing using the optimizer_trace_features system variable, which has these on/off switches:

[edit] General Trace Structure

A trace follows closely the actual execution path: there is a join-preparation object, a join-optimization object, a join-execution object, for each JOIN. Query transformations (IN->EXISTS, outer join to inner join...), simplifications (elimination of clauses), equality propagation are shown in subobjects. Calls to the range optimizer, cost evaluations, reasons why an access path is chosen over another one, or why a sorting method is chosen over another one, are shown too. It is far from showing everything happening in the optimizer, but we plan to show more information in the future.

[edit] Example

Here we take an example from the test suite.

 #
 # Tracing of ORDER BY & GROUP BY simplification.
 #
 SET OPTIMIZER_TRACE="enabled=on",END_MARKERS_IN_JSON=on; # be readable
 SET OPTIMIZER_TRACE_MAX_MEM_SIZE=1000000; # avoid small default
 CREATE TABLE t1 (
 pk INT, col_int_key INT,
 col_varchar_key VARCHAR(1), col_varchar_nokey VARCHAR(1)
 );
 INSERT INTO t1 VALUES
 (10,7,'v','v'),(11,0,'s','s'),(12,9,'l','l'),(13,3,'y','y'),(14,4,'c','c'),
 (15,2,'i','i'),(16,5,'h','h'),(17,3,'q','q'),(18,1,'a','a'),(19,3,'v','v'),
 (20,6,'u','u'),(21,7,'s','s'),(22,5,'y','y'),(23,1,'z','z'),(24,204,'h','h'),
 (25,224,'p','p'),(26,9,'e','e'),(27,5,'i','i'),(28,0,'y','y'),(29,3,'w','w');
 CREATE TABLE t2 (
 pk INT, col_int_key INT,
 col_varchar_key VARCHAR(1), col_varchar_nokey VARCHAR(1),
 PRIMARY KEY (pk)
 );
 INSERT INTO t2 VALUES
 (1,4,'b','b'),(2,8,'y','y'),(3,0,'p','p'),(4,0,'f','f'),(5,0,'p','p'),
 (6,7,'d','d'),(7,7,'f','f'),(8,5,'j','j'),(9,3,'e','e'),(10,188,'u','u'),
 (11,4,'v','v'),(12,9,'u','u'),(13,6,'i','i'),(14,1,'x','x'),(15,5,'l','l'),
 (16,6,'q','q'),(17,2,'n','n'),(18,4,'r','r'),(19,231,'c','c'),(20,4,'h','h'),
 (21,3,'k','k'),(22,3,'t','t'),(23,7,'t','t'),(24,6,'k','k'),(25,7,'g','g'),
 (26,9,'z','z'),(27,4,'n','n'),(28,4,'j','j'),(29,2,'l','l'),(30,1,'d','d'),
 (31,2,'t','t'),(32,194,'y','y'),(33,2,'i','i'),(34,3,'j','j'),(35,8,'r','r'),
 (36,4,'b','b'),(37,9,'o','o'),(38,4,'k','k'),(39,5,'a','a'),(40,5,'f','f'),
 (41,9,'t','t'),(42,3,'c','c'),(43,8,'c','c'),(44,0,'r','r'),(45,98,'k','k'),
 (46,3,'l','l'),(47,1,'o','o'),(48,0,'t','t'),(49,189,'v','v'),(50,8,'x','x'),
 (51,3,'j','j'),(52,3,'x','x'),(53,9,'k','k'),(54,6,'o','o'),(55,8,'z','z'),
 (56,3,'n','n'),(57,9,'c','c'),(58,5,'d','d'),(59,9,'s','s'),(60,2,'j','j'),
 (61,2,'w','w'),(62,5,'f','f'),(63,8,'p','p'),(64,6,'o','o'),(65,9,'f','f'),
 (66,0,'x','x'),(67,3,'q','q'),(68,6,'g','g'),(69,5,'x','x'),(70,8,'p','p'),
 (71,2,'q','q'),(72,120,'q','q'),(73,25,'v','v'),(74,1,'g','g'),(75,3,'l','l'),
 (76,1,'w','w'),(77,3,'h','h'),(78,153,'c','c'),(79,5,'o','o'),(80,9,'o','o'),
 (81,1,'v','v'),(82,8,'y','y'),(83,7,'d','d'),(84,6,'p','p'),(85,2,'z','z'),
 (86,4,'t','t'),(87,7,'b','b'),(88,3,'y','y'),(89,8,'k','k'),(90,4,'c','c'),
 (91,6,'z','z'),(92,1,'t','t'),(93,7,'o','o'),(94,1,'u','u'),(95,0,'t','t'),
 (96,2,'k','k'),(97,7,'u','u'),(98,2,'b','b'),(99,1,'m','m'),(100,5,'o','o');
 SELECT SUM(alias2.col_varchar_nokey) , alias2.pk AS field2 FROM t1 AS alias1
 STRAIGHT_JOIN t2 AS alias2 ON alias2.pk = alias1.col_int_key WHERE alias1.pk
 GROUP BY field2 ORDER BY alias1.col_int_key,alias2.pk ;
 SUM(alias2.col_varchar_nokey)	field2
 0	1
 0	2
 0	3
 0	4
 0	5
 0	6
 0	7
 0	9

Now we look at the trace:

 SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
 QUERY	TRACE	MISSING_BYTES_BEYOND_MAX_MEM_SIZE	INSUFFICIENT_PRIVILEGES
 SELECT SUM(alias2.col_varchar_nokey) , alias2.pk AS field2 FROM t1 AS alias1
 STRAIGHT_JOIN t2 AS alias2 ON alias2.pk = alias1.col_int_key WHERE alias1.pk
 GROUP BY field2 ORDER BY alias1.col_int_key,alias2.pk	{

This was the first column: it repeats the query (this is a useful mark when several traces are remembered thanks to optimizer_trace_offset and optimizer_trace_limit). Now the trace. The statement's execution is naturally made of "steps":

   "steps": [
     {
       "join_preparation": {

This is a join's preparation

         "select#": 1,

for the first SELECT of the statement (which has only one, here). Here are steps of the join's preparation:

         "steps": [
           {
             "expanded_query": "/* select#1 */ select sum(`test`.`alias2`.`col_varchar_nokey`) AS `SUM(alias2.col_varchar_nokey)`,`test`.`alias2`.`pk` AS `field2` from (`test`.`t1` `alias1` straight_join `test`.`t2` `alias2` on((`test`.`alias2`.`pk` = `test`.`alias1`.`col_int_key`))) where `test`.`alias1`.`pk` group by `test`.`alias2`.`pk` order by `test`.`alias1`.`col_int_key`,`test`.`alias2`.`pk`"
           }

Above is the query as it is in the join's preparation: fields have been resolved to their database and table, and each SELECT is annotated with its number (useful with subqueries).

         ] /* steps */
       } /* join_preparation */
     },
     {

Off to optimization:

       "join_optimization": {
         "select#": 1,
         "steps": [
           {
             "condition_processing": {
               "condition": "WHERE",
               "original_condition": "(`test`.`alias1`.`pk` and (`test`.`alias2`.`pk` = `test`.`alias1`.`col_int_key`))",
               "steps": [
                 {
                   "transformation": "equality_propagation",
                   "resulting_condition": "(`test`.`alias1`.`pk` and multiple equal(`test`.`alias2`.`pk`, `test`.`alias1`.`col_int_key`))"
                 },
                 {
                   "transformation": "constant_propagation",
                   "resulting_condition": "(`test`.`alias1`.`pk` and multiple equal(`test`.`alias2`.`pk`, `test`.`alias1`.`col_int_key`))"
                 },
                 {
                   "transformation": "trivial_condition_removal",
                   "resulting_condition": "(`test`.`alias1`.`pk` and multiple equal(`test`.`alias2`.`pk`, `test`.`alias1`.`col_int_key`))"

Not much happened in equality propagation above.

                 }
               ] /* steps */
             } /* condition_processing */
           },
           {
             "ref_optimizer_key_uses": [
               {
                 "database": "test",
                 "table": "alias2",
                 "field": "pk",
                 "equals": "`test`.`alias1`.`col_int_key`",
                 "null_rejecting": true

A possible ref access has been identified, and it is NULL-rejecting: any NULL value in `test`.`alias1`.`col_int_key` cannot have a match (it could have a match if the operator were <=>).

               }
             ] /* ref_optimizer_key_uses */
           },
           {

Now for every table in the query we estimate the cost of, and number of records returned by, a table scan, a range access,

             "records_estimation": [
               {
                 "database": "test",
                 "table": "alias1",
                 "const_keys_added": {
                   "keys": [
                   ] /* keys */,
                   "cause": "group_by"
                 } /* const_keys_added */,
                 "range_analysis": {
                   "table_scan": {
                     "records": 20,
                     "cost": 8.1977
                   } /* table_scan */
                 } /* range_analysis */
               },
               {
                 "database": "test",
                 "table": "alias2",
                 "const_keys_added": {
                   "keys": [
                     "PRIMARY"
                   ] /* keys */,
                   "cause": "group_by"
                 } /* const_keys_added */,
                 "range_analysis": {
                   "table_scan": {
                     "records": 100,
                     "cost": 24.588
                   } /* table_scan */,
                   "potential_range_indices": [
                     {
                       "index": "PRIMARY",
                       "usable": true,
                       "key_parts": [
                         "pk"
                       ] /* key_parts */
                     }
                   ] /* potential_range_indices */,
                   "setup_range_conditions": [
                   ] /* setup_range_conditions */,
                   "group_index_range": {
                     "chosen": false,

Not possible to use GROUP_MIN_MAX because it can handle only one table, and we have two in the join:

                     "cause": "not_single_table"
                   } /* group_index_range */

No range access is possible.

                 } /* range_analysis */
               }
             ] /* records_estimation */
           },
           {

Finding an optimal order for tables (greedy search); actually as this is STRAIGHT_JOIN only the requested order is explored, and access methods are selected:

             "considered_execution_plans": [
               {
                 "database": "test",
                 "table": "alias1",
                 "best_access_path": {
                   "considered_access_paths": [
                     {
                       "access_type": "scan",
                       "records": 20,
                       "cost": 2.0977,
                       "chosen": true

table scan chosen!

                     }
                   ] /* considered_access_paths */
                 } /* best_access_path */,
                 "cost_for_plan": 6.0977,
                 "records_for_plan": 20,

We estimate that reading the first table, and applying any conditions to it, will yield 20 rows.

                 "rest_of_plan": [
                   {
                     "database": "test",
                     "table": "alias2",
                     "best_access_path": {
                       "considered_access_paths": [
                         {
                           "access_type": "ref",
                           "index": "PRIMARY",
                           "records": 1,
                           "cost": 20.2,
                           "chosen": true

We choose ref access on the primary key of alias2.

                         },
                         {
                           "access_type": "scan",
                           "using_join_cache": true,
                           "records": 75,
                           "cost": 7.4917,
                           "chosen": false

but not table scan, because its amount of records (75) is far greater than that of ref access (1).

                         }
                       ] /* considered_access_paths */
                     } /* best_access_path */,
                     "cost_for_plan": 30.098,
                     "records_for_plan": 20,
                     "chosen": true
                   }
                 ] /* rest_of_plan */
               }
             ] /* considered_execution_plans */
           },
           {

Now that the order of tables is fixed, we can split the WHERE condition in chunks which can be tested early ("push down of conditions down the join tree"):

             "attaching_conditions_to_tables": {
               "original_condition": "((`test`.`alias2`.`pk` = `test`.`alias1`.`col_int_key`) and `test`.`alias1`.`pk`)",
               "attached_conditions_computation": [
               ] /* attached_conditions_computation */,
               "attached_conditions_summary": [
                 {
                   "database": "test",
                   "table": "alias1",
                   "attached": "(`test`.`alias1`.`pk` and (`test`.`alias1`.`col_int_key` is not null))"

this condition above can be tested on rows of alias1 without even reading rows of alias2.

                 },
                 {
                   "database": "test",
                   "table": "alias2",
                   "attached": null
                 }
               ] /* attached_conditions_summary */
             } /* attaching_conditions_to_tables */
           },
           {

Now we try to simplify ORDER BY:

             "clause_processing": {
               "clause": "ORDER BY",
               "original_clause": "`test`.`alias1`.`col_int_key`,`test`.`alias2`.`pk`",
               "items": [
                 {
                   "item": "`test`.`alias1`.`col_int_key`"
                 },
                 {
                   "item": "`test`.`alias2`.`pk`",
                   "eq_ref_to_preceding_items": true

Because the WHERE clause contains `alias2`.`pk`=`alias1`.`col_int_key`, ordering by both columns is a waste: can just order by the first column, the second will always be equal to it.

                 }
               ] /* items */,
               "resulting_clause_is_simple": true,
               "resulting_clause": "`test`.`alias1`.`col_int_key`"

So we get a shorter ORDER BY clause - and this is not visible in EXPLAIN or EXPLAIN EXTENDED!! This simplification can be worth it: this shorter clause, being single-column and single-table, could be implemented by an index scan...

             } /* clause_processing */
           },
           {
             "clause_processing": {
               "clause": "GROUP BY",
               "original_clause": "`test`.`alias2`.`pk`",
               "items": [
                 {
                   "item": "`test`.`alias2`.`pk`"
                 }
               ] /* items */,
               "resulting_clause_is_simple": false,
               "resulting_clause": "`test`.`alias2`.`pk`"
             } /* clause_processing */
           },
           {
             "refine_plan": [
               {
                 "database": "test",
                 "table": "alias1",
                 "scan_type": "table"
               },
               {
                 "database": "test",
                 "table": "alias2"
               }
             ] /* refine_plan */
           }
         ] /* steps */
       } /* join_optimization */
     },
     {

Now we execute the join, and nothing interesting happens here:

       "join_execution": {
         "select#": 1,
         "steps": [
         ] /* steps */
       } /* join_execution */
     }
   ] /* steps */
 }	0	0

This was just an example. All traces have the same basic structure, but if a statement uses subqueries, there are several join preparations/optimizations/executions, subquery-specific transformations not shown here...

[edit] Nicely Displaying a Trace

Looking at a trace in the "mysql" command-line client can be cumbersome (though the command "pager less" makes it better). An alternative can be to send the trace to a file:

SELECT TRACE INTO DUMPFILE <filename> FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

and then passing this file to some JSON viewer, for example the JsonView Firefox add-on (shows objects in colours, allows object collapsing/expanding). Make sure to use INTO DUMPFILE and not INTO OUTFILE because the latter escapes newlines. Also, have the @@end_marker_in_json variable off: when it's on, the trace is more human-readable but not JSON-compliant.

[edit] Preventing use of optimizer trace

If for some reason, as DBA of a MySQL Server, you wish to prevent all users from seeing traces of their queries, start the server with

--maximum-optimizer-trace-max-mem-size=0 --optimizer-trace-max-mem-size=0

This will set the maximum size to 0, and prevent users from changing this limit, thus all traces will be truncated to 0 bytes.

[edit] Testing

This feature is tested in mysql-test/suite/opt_trace and unittest/gunit/opt_trace-t.

[edit] Implementation

See files sql/opt_trace* , starting from sql/opt_trace.h. A trace is started by creating an instance of Opt_trace_start; information is added to this trace by creating instances of Opt_trace_object and Opt_trace_array, and by using the add() functions of those classes...

Retrieved from "http://forge.mysql.com/wiki/MySQL_Internals_Optimizer_tracing"

This page has been accessed 2,723 times. This page was last modified 09:34, 21 March 2012.

Find

Browse
MySQLForge
Main Page
Current events
Recent changes
Random page
Help
Edit
Edit this page
Editing help
This page
Discuss this page
Post a comment
Printable version
Context
Page history
What links here
Related changes
My pages
Special pages
New pages
File list
Statistics
Bug reports
More...