Following a Query Back and Forth in the Server
Contents |
← Back to MySQL University main page
[edit] Following a Query Back and Forth in the Server
- Date: 2007-09-21 and 2007-10-18
- Presenter: Sergei Golubchik
- Scribe: Paul DuBois
- Attendees (please register by filling in your name below, and read the Instructions for Attendees):
- Martin Hansson
- Sven Sandberg
- Mattias Jonsson
[edit] Presentation
[edit] Big Picture
[edit] Moment 1: Arrival
[edit] 1. Protocol
- Vio (virtual io) layer to abstract socket/ssl/tcp from the upper level
- net_serv.cc — the upper level of the on-the-wire protocol
- Non-blocking io and blocking with a separate alarm thread to handle timeouts
- Handles zlib compression of packet payload
644 /*
645 Read one command from connection and execute it (query or simple command).
646 This function is called in loop from thread function.
652 */
654 bool do_command(THD *thd)
655 {
660 DBUG_ENTER("do_command");
679 if ((packet_length=my_net_read(net)) == packet_error)
680 {
685 if (net->error != 3) // can we continue without closing the connection ?
688 DBUG_RETURN(TRUE); // We have to close it.
690 net_send_error(thd, net->last_errno, NullS);
692 DBUG_RETURN(FALSE);
693 }
694 else
695 {
696 packet=(char*) net->read_pos;
697 command = (enum enum_server_command) (uchar) packet[0];
703 }
717 DBUG_RETURN(dispatch_command(command,thd, packet+1, (uint) packet_length));
718 }
[edit] 2. COM_ switch in dispatch_command()
770 switch (command) {
785 case COM_REGISTER_SLAVE:
823 case COM_CHANGE_USER:
913 case COM_STMT_EXECUTE:
918 case COM_STMT_FETCH:
923 case COM_STMT_SEND_LONG_DATA:
928 case COM_STMT_PREPARE:
933 case COM_STMT_CLOSE:
1063 case COM_QUIT:
1118 case COM_BINLOG_DUMP:
1147 case COM_REFRESH:
1160 case COM_SHUTDOWN:
1249 case COM_PING:
1290 case COM_DEBUG:
...
1306 }
[edit] 2a. COM_QUERY
943 case COM_QUERY:
944 {
945 if (alloc_query(thd, packet, packet_length))
946 break;
952 general_log_print(thd, command, format, thd->query_length, thd->query);
954
955 if (!(specialflag & SPECIAL_NO_PRIOR))
956 my_pthread_setprio(pthread_self(),QUERY_PRIOR);
957
958 mysql_parse(thd, thd->query, thd->query_length, & found_semicolon);
959
990 if (!(specialflag & SPECIAL_NO_PRIOR))
991 my_pthread_setprio(pthread_self(),WAIT_PRIOR);
992 DBUG_PRINT("info",("query ready"));
993 break;
994 }
[edit] 3. Query Cache
- Caches the query as it came from the client
- And the result of a query — as network packets that were sent to the client
- Thus it adds almost no overhead to regular query processing
- And it's fast — if the query is in the cache it is found before any work on the query is done
- If the query is in the cache, the answer is sent at once — no processing required
5391 void mysql_parse(THD *thd, const char *inBuf, uint length, const char **)
5393 {
5417 if (query_cache_send_result_to_client(thd, (char*) inBuf, length) <= 0)
5418 {
5426 bool err= parse_sql(thd, &lip, NULL);
5429 if (!err)
5430 {
5432 if (mqh_used && thd->user_connect && check_mqh(thd, lex->sql_command))
5435 thd->net.error = 0;
5437 else
5441 { /* Actually execute the query */
5457 mysql_execute_command(thd);
5458 query_cache_end_of_result(thd);
5459 }
5461 }
5462 else
5468 query_cache_abort(&thd->net);
5478 thd->cleanup_after_query();
5480 }
5488 }
[edit] Moment 2: Parsing
[edit] 4. Parser
- LARL(1) parser built by bison
- Hand written lexer — flex is not used
- The query is parsed into memory data structures — table lists, select trees, expression trees. No bytecode.
- Can be used standalone — DBIx::MyParse
[edit] 4a. class Item
- Base class for expressions
- fields, constants, functions, variables, placeholders, subqueries — everything is an Item
- properties: name, type, maybe_null, null_value, charset, collation, used_tables, const_item, args[ ], ...
- methods: val_int(), val_str(), val_real(), val_decimal(); reset(), add(); eq(), print(), fix_fields(), ...
- Evaluating an expression of any complexity is as simple as
expr->val_int() // or val_str(), val_real(), val_decimal()
[edit] Moment 2a: After parsing
[edit] 5. Preparing for execution
- Checking privileges
1878 res= check_table_access(thd, lex->exchange ? SELECT_ACL | FILE_ACL : SELECT_ACL, all_tables, 0);
- Opening and locking tables
- deadlock-free locking: all tables are locked at once and always in the same order
948 enum enum_thr_lock_result
949 thr_multi_lock(THR_LOCK_DATA **data, uint count, THR_LOCK_OWNER *owner)
950 {
955 sort_locks(data,count);
957 for (pos=data,end=data+count; pos < end ; pos++)
958 {
959 enum enum_thr_lock_result result= thr_lock(*pos, owner, (*pos)->type);
960 if (result != THR_LOCK_SUCCESS)
961 { /* Aborted */
962 thr_multi_unlock(data,(uint) (pos-data));
963 DBUG_RETURN(result);
964 }
969 }
1017 DBUG_RETURN(THR_LOCK_SUCCESS);
1018 }
- this is true even in complex statements with routines, subqueries, views, triggers
- Calling item->fix_fields()
- fields find their tables
- items define basic properties like maybe_null, const_item, used_tables
[edit] Moment 3: Optimizing
[edit] 6. Query transformations
- Equality propagation: WHERE f1 = f2 AND f2 = f3 → f1 = f2 = f3
8719 optimize_cond(JOIN *join, COND *conds, List<TABLE_LIST> *join_list, Item::cond_result *cond_value)
8721 {
8729 /* Build all multiple equality predicates */
8738 conds= build_equal_items(join->thd, conds, NULL, join_list, &join->cond_equal);
8742 /* change field1 = field2 to field2 = const for each found field1 = const */
8743 propagate_cond_constants(thd, (I_List<COND_CMP> *) 0, conds, conds);
8744 /* Remove all instances of item == item */
8749 conds= remove_eq_conds(thd, conds, cond_value) ;
8753 }
- NOT elimination: WHERE NOT f1 > 5 → f1 <= 5
7018 Item *negate_expression(THD *thd, Item *expr)
7019 {
7021 if (expr->type() == Item::FUNC_ITEM && ((Item_func *) expr)->functype() == Item_func::NOT_FUNC)
7023 { /* it is NOT(NOT( ... )) */
7027 if (arg->is_bool_func() || place == IN_WHERE || place == IN_HAVING)
7028 return arg;
7029 /* if it is not boolean function then we emulate value of not(not(a)), it will be a != 0 */
7033 return new Item_func_ne(arg, new Item_int((char*) "0", 0, 1));
7034 }
7036 if ((negated= expr->neg_transformer(thd)) != 0)
7037 return negated;
7038 return new Item_func_not(expr);
7039 }
[edit] 6a. More query transformations
- "const" tables
- tables with no more than one row
- tables, a unique key of which is compared to a constant
WHERE t1.pk = 12 AND t2.a = t1.b
- MIN()/MAX()/COUNT(*)
SELECT MAX(a) FROM t1; SELECT MIN(b) FROM t1 WHERE a=5;
- Converting outer joins to inner joins:
FROM t1 LEFT JOIN t2 USING (id) WHERE t2.name LIKE 'a%'
8304 simplify_joins(JOIN *join, List<TABLE_LIST> *join_list, COND *conds, bool top)
8305 {
8370 if (table->table->map & conds->not_null_tables())
8371 {
8376 table->outer_join= 0;
8377 if (table->on_expr)
8378 {
8382 conds= and_conds(conds, table->on_expr);
8390 table->prep_on_expr= table->on_expr= 0;
8391 }
8392 }
8459 }
[edit] 7. Range Optimizer
- Supports >, >=, <, <=, =, <>, IS NULL, <=>, BETWEEN, IN, LIKE, and any combination of the above combined with AND/OR
- Supports only comparison with a constant
- Builds a tree of ranges (see next slide)
- Does not use index cardinality but asks storage engine for estimates for every key range from the tree — does not suffer from skewed distributions
7262 static ha_rows
7263 check_quick_keys(PARAM *param, uint idx, SEL_ARG *key_tree,
7264 uchar *min_key, uint min_key_flag, int min_keypart,
7265 uchar *max_key, uint max_key_flag, int max_keypart)
7266 {
....
7413 tmp=param->table->file->records_in_range(keynr,
7414 (min_key_length ? &min_range :
7415 (key_range*) 0),
7416 (max_key_length ? &max_range :
7417 (key_range*) 0));
[edit] 7a. Range Optimizer (example)
CREATE TABLE ( ... INDEX (a,b) ... )
SELECT ... WHERE a IN (1,2) AND b IN (3,4) OR
a=7 AND b BETWEEN 3 AND 8 OR
a > 9 and b < 5
[edit] 8. Choosing an execution plan
- Getting statistics from the storage engine
- index cardinality, table scan and index lookup costs
- Creating a list of possible indexes
- Performing search for the best execution plan
- best plan is the one that has the lowest cost
- exhaustive search (up to MySQL 4.1), or greedy search (starting from 5.0)
2350 make_join_statistics(JOIN *join, TABLE_LIST *tables, COND *conds, DYNAMIC_ARRAY *keyuse_array)
2352 {
2379 for (; tables; tables= tables->next_leaf)
2391 error= tables->table->file->info(HA_STATUS_VARIABLE | HA_STATUS_NO_LOCK);
2497 if (conds || outer_join)
2498 if (update_ref_and_keys(join->thd, keyuse_array, stat, join->tables,
2499 conds, join->cond_equal, ~outer_join, join->select_lex, &sargables))
2501 DBUG_RETURN(1);
2506 for (POSITION *p_pos=join->positions, *p_end=p_pos+const_count; p_pos < p_end ; p_pos++)
2509 { /* Read tables with 0 or 1 rows (system tables) */
2514 if (!(tmp=join_read_const_table(s, p_pos)))
2520 found_const_table_map|= s->table->map;
2521 }
2525 do /* loop until no more const tables are found */
2526 {
....
2658 } while (join->const_table_map & found_ref && ref_changed);
2682 for (s=stat ; s < stat_end ; s++) /* Calc how many (possible) matched records in each table */
2683 {
2692 s->read_time=(ha_rows) s->table->file->scan_time();
2710 if (!s->const_keys.is_clear_all() && !s->table->pos_in_table_list->embedding)
2721 records= get_quick_record_count(join->thd, select, s->table, &s->const_keys, join->row_limit);
2753 }
2761 /* Find an optimal join order of the non-constant tables. */
2765 choose_plan(join, all_table_map & ~join->const_table_map);
2776 }
[edit] Moment 4: Executing
[edit] 9. Join
- nested-loop joins:
- for every matching row in the first table
- for every matched row in the second table
...- for every matched row in the last table
- (usually) send results to the client
- for every matched row in the second table
- for every matching row in the first table
- "for every matched row" means
- getting a next row
- checking the part of WHERE/ON condition that is applicable
[edit] 10. Sending data
- technically, this is not a separate step — result rows are sent as soon as they are found in the nested-join loop, row by row
- also, it does not nesessarily involve sending
- class select_result:
select_send select_export SELECT ... INTO OUTFILE select_insert INSERT ... SELECT select_exists_subselect multi_delete ...
- also it could do aggregation for GROUP BY
- or write rows to temporary table, instead of sending them out
[edit] 9a. Temporary tables
- if optimizer decides it needs a temporary table, the nested-join loop populates this table, instead of sending the data to the client
- then the second run is performed fetching the data from temporary table
- temporary table can be used for aggregation (GROUP BY) or uniqueness (DISTINCT, UNION) — it has unique index created as necessary.
[edit] That's all
[edit] Questions
[edit] Questions asked during the session
At what time is the query saved to the binlog?
- Currently, at the end of individual command handlers. that is, at the end of mysql_insert(), mysql_update(), etc. This could be changed in 6.0
Why is every table opened before executing? (Asking because of working with partitioning)
- Because of lock-free locking protocol. We must prevent the following situation:
thread1> lock table A. thread2> lock table B thread1> try to lock table B - wait for thread2 to unlock it thread2> try to lock table A - wait for thread1 to unlock it -- deadlock --
- Thus, MySQL locks *all tables* that a query may need *at once* in the beginning of the query, and *always in the same order*.
Is it possible in the future to do opening av partitions later? (in the same way as you describe, but only for the used partitions?) Right now it opens every partition at open...
- Yes. But partitions must be locked in the *same* order, always. Assuming the order is 1-2-3-... one cannot lock a partition 2 and later find that he needs to access a partition 1, too. So, possible solutions:
- Open all partitions that MySQL may possibly need in the beginning. It doesn't have to be all partitions, in some cases it's possible to optimize to a subset of partitions, e.g. in: SELECT * FROM part_table where part_key=6
- Open partitions on first access, but when the situation above happens (one needs to access partition 1 having partition 2 locked) unlock partition 2 before locking partition 1 (one can lock partition 2 again then, if necessary)
So it would be OK, to not open every partition, but when opening, do it in the same order?
- Yes, see above
Are there any plans for deadlock detections?
- No immediate plans.
So it must be implemented in the storage engine, like in InnoDB?
- Yes
[edit] Voice recording and other links
- Voice Recording: Ogg Audio (8MB)
- IRC log: IRC Text (4KB)


