Overview of query execution and use of temp tables
← Back to MySQL University main page- Presenter: Monty Widenius
- Time: Thursday, June 14, 2007, at 13:00 UTC = 6:00am PST = 9:00am EST = 15:00 CEST = 16:00 EET = 17:00 MoscowTime
- Estimated length of Session: 1:30 to 2:00 h
- Scribe: Paul DuBois
- Attendees (please fill in your name):
- Dmitri Lenev
- Ingo Struewing
- Mark Leith
- Antony Curtis
- Tom Hanlon
- Martin Hansson
- Sergey Petrunia
- Victoria Reznichenko
- Chad Miller
- Horst Hunger
- Georgi Kodinov
- Timour Katchaounov
- Jess Balint
- Alexey Kopytov
- Susanne Ebrecht
- Martin Hansson
- Eric Herman
Contents |
[edit] Overview of query (SELECT) execution and use of temp tables
Note that when we in the following text mention temporary table, we are referring to internal temporary tables, not tables created with the CREATE TEMPORARY TABLE command.
[edit] SELECT execution
All queries that are to be executed are going through sql_select.cc:mysql_parse(). It's here you should set a breakpoint if you want to understand how MySQL executes the query.
The flow of execution for a SELECT is roughly:
- Check if query is in query cache
- If yes, send cached result to client and abort rest of execution.
- Parse query (with sql_lex.cc and sql_yacc.yy)
- Check table access rights (sql_select.cc::check_table_access())
- Open all tables (directly and indirectly) used by query (sql_base.cc::open_and_lock_tables())
- Fill derived tables with data (tables in FROM clause) (sql_derived.cc::mysql_handle_derived())
- Create object for sending result to client (new select_send) (sql_parse.cc::execute_sqlcom_select)
- Check and resolve item trees (item->fix_fields), order by lists etc. (sql_select.cc::JOIN::prepare()). This also does some simple safe rewrites like flattening AND/OR arguments.
- Optimize query (sql_select.cc::JOIN::optimize())
- Convert outer joins to inner joins (if possible)
- Optimize WHERE and HAVING clauses
- Build all multiple equality predicates and eliminate equality predicates that can be inferred from these multiple equalities. (sql_select.cc::build_equal_items())
- Propagate constants (sql_select.cc::propagate_cond_constants())
- Remove trivial comparisons (A=A) and check for impossible constant usage. (sql_select.cc::remove_eq_conds())
- Prune partition tables.
- Optimize away MIN()/MAX() and COUNT() if possible. (opt_sum.cc::opt_sum_query())
- Calculate optimal join order (sql_select.cc::make_join_statistics())
- This uses opt_range.cc::test_quick_select() to check if it makes sense to read the table trough an index and to estimate how many rows we have to read from the table.
- The optimal join order is calculated through a greedy algorithm. (sql_select.cc::choose_plan())
- Choose the best fields from the generated multiple equality predicates. (sql_select.cc::substitute_for_best_equal_field())
- Create 'optimal' WHERE conditions to test for read table. (sql_select.cc::make_join_select()). These can also be used as pushdown conditions to storage engines.
- Check if we can optimize away distinct or group by. (sql_select.cc::JOIN::optimize())
- Setup join structure with accessors functions according to how rows should be read (sql_select.cc:make_join_readinfo()).
- For simple IN sub queries, setup to use the best sub query engine.
- Check if we can skip ORDER BY
- Create primary temporary table if needed (to ensure it's only created once even if we call JOIN::exec() multiple times, like for sub queries).
- If needed, sort rows or create pointers to retrieve rows in sorted order. (sql_select.cc::create_sort_index() and filesort.cc::filesort())
- Check if we can do some shortcut optimizations for DISTINCT. (Similar to 'left join optimization).
- Execute query (sql_select.cc::JOIN::exec())
- In case of recognized impossible queries, return 0 rows. (like WHERE 1=2 or WHERE a=b AND a=1 AND b=2)
- Populate used schema tables. (sql_show.cc::get_schema_tables_result())
- If we needed temporary table (see temporary table usage above)
- Execute query and fill temporary table with result. (sql_select.cc::do_select())
- Swap to use temporary table instead of other tables for the rest of the query.
- If we need another temporary table, group data to this table.
- If distinct, remove duplicates (if not removed earlier as a by process of GROUP BY).
- Add HAVING as a condition for the temporary table.
- If needed, create a sorted index for the temporary table. (sql_select.cc::create_sort_index())
- Do join and send data to the result. (sql_select.cc::do_select())
[edit] Description of join execution (sql_select.cc::do_select())
The purpose of this function is to join all tables together and to either store the result into a temporary table or send it to it's destination (through the select_result class).
The JOIN structure (one for each table in the join) is setup so that JOIN->next_select points to the next to-be-used sub_select() function. The JOIN->next_select for the last table points to an appropriate end_select() function.
The overall logic of do_select() is:
- Setup the virtual 'end_select' function that will be called when we have collected a full row combination from all tables. The chosen 'end_select' function depends on:
- If we have are going to write things into a temporary table
- If we group things into a temporary table
- If we are going to send the result to 'select_result' class
- If all tables are const tables, check HAVING, call 'end_select()' else call sub_select(), the driver of the join.
- Cleanup
- Return ok or error
The are several versions of 'sub_select()', the driver of the join. They have all the following basic structure:
If we have gone through all row combinations call JOIN->next_select() return while there are more rows for the current table read one record If record combination matches WHERE criteria Call JOIN->next_select()
The 'end_select()' functions has the following basic structure
If we have gone through all row combinations do end-of-group-check && end-of-group-writes (if applicable) return If row combination matches HAVING criteria Send row combination to next stage (This can be write row to temporary table, update existing row in temporary table or send it to 'select_result' class)
The different 'end_select' functions are:
- end_send() ; Send row combination to select_result class.
- end_send_group() ; When group changes, send summary row to select_result class.
- end_write() ; Write row combination to temporary table
- end_write_group() ; When group changes, write summary row to temp table.
- end_update() ; Update row to summary table (for group by)
- end_unique_update() ; Like end_update() but for MyISAM temp tables.
[edit] Temporary tables
[edit] Temporary table creation
Temporary tables are created by sql_select.cc:create_tmp_table() This creates a HEAP or MyISAM table from a list of 'Item'. (Item is a container for constants, fields, expressions etc).
Function comment for sql_select.cc:create_tmp_table():
SYNOPSIS
create_tmp_table()
thd thread handle
param a description used as input to create the table
fields list of items that will be used to define
column types of the table (also see NOTES)
group TODO document
distinct should table rows be distinct
save_sum_fields see NOTES
select_options
rows_limit
table_alias possible name of the temporary table that can be used
for name resolving; can be "".
DESCRIPTION Given field pointers are changed to point at tmp_table for send_fields. The table object is self contained: it's allocated in its own memory root, as well as Field objects created for table columns. This function will replace Item_sum items in 'fields' list with corresponding Item_field items, pointing at the fields in the temporary table, unless this was prohibited by TRUE value of argument save_sum_fields. The Item_field objects are created in THD memory root.
When creating a temporary table one can specify:
- If the temporary table should have a unique key (for group by optimization)
- If all rows should be unique (for distinct optimization)
- The number of rows in temporary table (for LIMIT optimization)
- If the content of GROUP BY functions (MIN, MAX, SUM etc) should be saved or not. (Other option is that we create a field to hold the result for incremental updates to the group by function result.
Normally a temporary table is created in memory (HEAP). The exceptions are:
- If there is blobs in the table (HEAP can't handle blobs)
- If the row should be unique (distinct optimization)
- If the user has specified that the result set will require big temporary tables (SELECT SQL_BIG_TABLES) and SELECT_SMALL_RESULT is not used.
- If table is forced to be MyISAM with the internal TMP_TABLE_FORCE_MYISAM option.
- If we need to do MATCH ... AGAINST .. in BOOLEAN MODE on the temporary table.
[edit] Temporary table sizes
The size of the in memory table is roughly limited by the users variables min(tmp_table_size, max_heap_table_size). (Roughly because internally we change the size to number of rows based on the record length). In addition it's limited to the 'rows_limit' argument to create_tmp_table().
If the in memory table reaches the size of max_heap_table_size then it's automaticly converted to a MyISAM table. When the MyISAM data file reaches the size of max_tmp_table_size the query will be aborted with a ER_RECORD_FILE_FULL (Table 'xxxx' is full) error message.
[edit] When are temporary tables created
During the optimizer phase we create a temporary table to hold a temporary result set if:
- If we have an ORDER/GROUP BY or DISTINCT that uses columns from several tables or the 'to be sorted' table is not the first non constant table in the query plan.
- If SELECT_BIG_RESULT was not used and the query uses a GROUP BY/DISTINCT that can not be resolved by an INDEX.
- If we have different GROUP BY and ORDER BY lists.
- If we have an ORDER or GROUP BY and one of the sort/group arguments is an 'expensive' operation (like stored procedure, sub query or UDF function).
- If SELECT SQL_BUFFER_RESULT is used.
During the execution phase we create a second temporary table if:
- If we had a complex GROUP BY operation and a different ORDER BY. In this case we first create a temporary table to hold the result set of all rows that will be part of the GROUP BY result before we create another table with final result set. An item can mark a GROUP BY to be complex by resetting the 'param.group_by' flag. This happens when using SUM(DISTINCT), GROUP_CONCAT() and WITH ROLLUP.
- We are doing DISTINCT on something that was calculated during GROUP BY resolved by the 'incremental group by calculation' method. For example SELECT DISTINCT SEC_TO_TIME(SUM(B)) FROM t1 group by A;
- UNION uses a temporary table to store the result of the union
- For COUNT(DISTINCT A[,B,C,...]) we create a temporary table with an unique key over all the arguments. This allows us to resolve the DISTINCT by just checking the number of rows in the temporary table. (If the temporary table is an in memory table, we will use the Unique() class to resolve DISTINCT instead of the heap table. (sql_class.h::class Unique).
- For GROUP_CONCAT(A) we use create_tmp_table() to figure out the fields needed to store the arguments. We do however not use the table to store information, we use a binary tree instead.
- When using the information schemas or SHOW commands, we create a temporary table for each result set.
- When using multi-table updates, we create for each changed table a separate temporary table to store the primary key and all changed fields.
Temporary tables can also be created as part of calling 'mysql_select()'. This happens:
- To resolve derived tables (sub queries in the FROM clause). See sql_derived.cc
- When executing multi-table updates (The query is converted to a SELECT and executed as such). See sql_update.cc::mysql_multi_update()
- When executing sub queries. Some sub queries are instantiated as a temporary table (ie, they are only executed once). See item_subselect.cc.
You can tell whether a query requires a temporary table by using EXPLAIN and checking the Extra column to see whether it says Using temporary.
[edit] How are temporary table used
In most cases the temporary table is just created as a container for a total result set which is finally 'looped' over and sent to it's end destination. In a few cases create_tmp_table() is used to find out what kind of fields and storage is needed to store a set of items (GROUP_CONCAT()). When resolving GROUP BY, the temporary table can however be used in a different way.
- Create a temporary table with an unique key over all the group by fields. For each 'group function' we create a container field to hold the value for the group function. For most functions it's of the same type as the argument for the group functions, but there are exceptions: For example for AVG() we store the sum of all rows and the number of rows that had the same group-by key. This can be then used to calculate the correct AVG() value when all rows has been looped trough.
- Loop over all row combinations matching the query.
- For each row, check if a row with all same group fields exists in the temporary table.
- If no, then add a new row to the temporary table with all group fields set. All group by functions are initialized with the information from the current row combination via the 'item_sum->reset()' function.
- If yes, read the old row and incrementally update all sum functions with the new information from the current row combination via the 'item_sum->add()' function.
- When we have looped over all row combination from the original tables, the temporary table now contains the result set. Now we only have to sort and/or remove duplicates (if distinct) before sending the result to it's destination.
[edit] Useful links
For some basic information about query execution, take a look at: http://forge.mysql.com/wiki/MySQL_Internals_Selects
For temporary tables take a look at: http://dev.mysql.com/doc/refman/5.1/en/internal-temporary-tables.html (Not 100 % accurate)
[edit] Session notes
Voice Recording: OGG (mostly silence and keyboard noise)
IRC Log:
[7:47:37 AM:] ***: sqljb (n=joerg@p57A0C1B3.dip0.t-ipconnect.de) has joined the channel [7:48:06 AM:] Leithal: what's the stream URL? [7:48:08 AM:] ***: sqljb is now known as joerg_b [7:48:28 AM:] miracee: http://mysql:university@myuniv.mysql.com:8000/voip.ogg [7:48:55 AM:] ***: Shrews (n=shrews@boa.lssi.net) has joined the channel [7:49:16 AM:] pdubois: monty: can you say somthing? [7:49:43 AM:] pdubois: monty: I hear you. [7:49:47 AM:] monty_: Am I the only one that can create sound ? [7:49:51 AM:] pdubois: Yes. [7:50:17 AM:] monty_: so, how about getting the slides available somewhere? [7:50:57 AM:] pdubois: Let me see... [7:54:25 AM:] ***: EricHerman (n=eric@s5590d530.adsl.wanadoo.nl) has joined the channel [7:55:09 AM:] monty_: While waiting, please read the document and ask questions [7:55:29 AM:] * Leithal has posted one question [7:55:44 AM:] monty_: I have tried to write all possible needed background information, so I don't have much to say except answering questions ;) [7:55:47 AM:] miracee: Leithal: the link was for you :) [7:55:58 AM:] Leithal: miracee: thanks :) [7:56:22 AM:] ***: mhansson has quit IRC (Remote closed the connection) [7:57:06 AM:] ***: joro (n=kgeorge@213.91.243.4) has joined the channel [7:57:40 AM:] ***: mhansson (n=martin@upp-off-vpn-1.mysql.com) has joined the channel [7:57:46 AM:] miracee: but I need the link for the slides ... maybe this question is silly, it's my first real university use today [7:58:44 AM:] pdubois: Okay, I pasted in the slides at http://forge.mysql.com/wiki/Query_Execution - please see if you can access it. [7:59:06 AM:] Leithal: I get them pdubois [7:59:54 AM:] pdubois: Okay, good. We'll start soon. [7:59:56 AM:] ***: horst (n=hhunger@dslb-088-073-062-105.pools.arcor-ip.net) has joined the channel [7:59:57 AM:] ***: guilhemb (n=guilhem@ABordeaux-152-1-21-185.w82-125.abo.wanadoo.fr) has joined the channel [8:00:22 AM:] ***: ingo__ (n=istruewi@tc17-n68-201.de.inter.net) has joined the channel [8:00:48 AM:] ***: #mysql-university You need to be a channel operator to do that [8:01:11 AM:] monty_: Hope everyone has access to the session pages [8:01:19 AM:] pdubois: I guess I'm not a channel operator - cannot set the topic. [8:01:21 AM:] joerg_b: No sound yet for me - still trying. [8:01:33 AM:] * joro has access to the pages [8:01:36 AM:] Leithal: I'm getting typing :) [8:01:43 AM:] pdubois: Can anyone but me hear sound? [8:01:45 AM:] EricHerman: we hear you. [8:01:48 AM:] weigon: monty_: we hear you [8:01:54 AM:] joerg_b: http://mysql:university@myuniv.mysql.com:8000/voip.ogg - is that right ? [8:01:58 AM:] pdubois: Right. [8:02:06 AM:] * horst hear you [8:02:07 AM:] andrey__: no sound in xmms [8:02:39 AM:] monty_: I hope you can all quickly scan the papers and suggest what to talk about [8:02:40 AM:] monty_: to paul: [8:02:44 AM:] pdubois: Session pages can be accessed here: [8:02:45 AM:] pdubois: https://inside.mysql.com/wiki/Overview_of_query_execution_and_use_of_temp_tables [8:02:46 AM:] monty_: The manual page is not complete about WHEN we create temporary tables or [8:02:46 AM:] monty_: when MyISAM/HEAP is used. [8:02:53 AM:] pdubois: http://forge.mysql.com/wiki/Query_Execution [8:02:57 AM:] monty_: To Leithal [8:03:10 AM:] monty_: This is about the problem of using temporary tables twice in a statement: [8:03:23 AM:] monty_: Problem of reopening temporary tables: [8:03:23 AM:] monty_: - The problem is not only UNION, but also [8:03:23 AM:] monty_: CREATE TEMPORARY TABLE t1 ...; [8:03:23 AM:] monty_: SELECT * from t1,t1 as t2; [8:03:23 AM:] monty_: In this case we need to have two cursors on the temporary table t1. [8:03:24 AM:] monty_: Now we have in thd->temporary_tables a list of one handler instance [8:03:26 AM:] monty_: for each temporary table. We can't reuse this for both usage of t1 [8:03:28 AM:] monty_: above as one instance can only refer to one record at a time. [8:03:36 AM:] pdubois: Okay. Monty are you ready to begin? I will introduce the session and hand it off to you. [8:03:48 AM:] monty_: A not trivial problem to solve [8:03:50 AM:] ***: timour (n=tkatchao@212.36.1.202) has joined the channel [8:03:50 AM:] monty_: pdubois: ok [8:04:19 AM:] Leithal: what kind of timelines can we look at to fix this though? [8:04:32 AM:] pdubois: Welcome, everyone. Our session today is by Monty Widenius, titled [8:04:32 AM:] pdubois: Overview of query execution and use of temp tables [8:04:32 AM:] pdubois: Can everyone who is here say that they are here? [8:04:39 AM:] * pdubois is here [8:04:40 AM:] * timour is here [8:04:43 AM:] * Leithal is here [8:04:44 AM:] * kaamos is here [8:04:45 AM:] * joerg_b is here - without sound [8:04:46 AM:] * horst is here [8:04:47 AM:] * joro is here [8:04:48 AM:] * mhansson is here [8:04:54 AM:] * EricHerman is here [8:04:58 AM:] miracee: is here [8:05:03 AM:] * andrey__ is here [8:05:03 AM:] monty_: Leithal: There is two solutions to the problem: [8:05:07 AM:] * ingo__ no sound [8:05:10 AM:] guilhemb: here [8:05:16 AM:] monty_: There is two ways to solve this when we need another instance of the [8:05:16 AM:] monty_: temporary table: [8:05:16 AM:] monty_: The 'open_temporary_table() twice' method: [8:05:16 AM:] monty_: - Open a new instance of the temporary table with 'open_temporary_table()) [8:05:16 AM:] monty_: - Lock it with thr_lock() (should always succeeds as no conflicts with [8:05:18 AM:] monty_: other threads) [8:05:19 AM:] monty_: - Use the table for the duration of the statement [8:05:21 AM:] monty_: - Unlock and free the extra instance (instead of putting it back in [8:05:24 AM:] monty_: the temporary table list) at end of statement [8:05:26 AM:] monty_: The problem with the above is that this is not totally trivial (and thus [8:05:28 AM:] monty_: error prone) and the handler may be confuses by us opening a temporary [8:05:30 AM:] monty_: table twice (not tested). [8:05:32 AM:] monty_: The 'clone()' method: [8:05:33 AM:] monty_: - Add a new option to 'clone()' to inform the handler how the table should [8:05:36 AM:] monty_: be opened. [8:05:37 AM:] monty_: - Open a new table instance with 'clone(... keep-locks)' [8:05:41 AM:] monty_: - Use the table for the duration of the statement [8:05:44 AM:] monty_: - close the extra instance (instead of putting it back in [8:05:46 AM:] monty_: the temporary table list) at end of statement [8:05:47 AM:] monty_: The later method is faster (calls to external_lock, thr_lock etc) and [8:05:49 AM:] monty_: it allows us to do more optimization for this case. It would however [8:05:52 AM:] monty_: require us to add an optimized clone method() to all relevant [8:05:54 AM:] monty_: handlers. (Note that this may need to be done anyway for us to [8:05:56 AM:] monty_: implement foreign keys). [8:05:58 AM:] monty_: I would prefer the clone() method as it's likely that we will need [8:06:00 AM:] monty_: clone() for other things in the future. [8:06:02 AM:] monty_: On the other hand, it's very unlikely we can add this to the default [8:06:04 AM:] monty_: 5.1 branch. Adding it to 5.2 and port it back to the 5.1 community branch [8:06:06 AM:] monty_: may however be a possibility. [8:06:09 AM:] pdubois: Monty is now answering Leith's question at the end of the session page about temporary tables. [8:06:25 AM:] monty_: pdubois: which is the last line you saw ? [8:06:35 AM:] monty_: (I am worried about the IRC missing lines) [8:06:42 AM:] pdubois: monty: may however be a possibility [8:06:43 AM:] Leithal: I got all of that [8:07:04 AM:] monty_: good, then Leithal should have got a complete answer, I hope [8:07:12 AM:] pdubois: monty_: Will now you go through your session slides? [8:07:28 AM:] monty_: I don't see the point, as they stand alone [8:07:44 AM:] Leithal: monty_: yes - thank you! I would very much like to see this in 5.2, so I will try to get it on to the list of things for that with product management etc. [8:07:46 AM:] monty_: I assume everyone can read [8:08:01 AM:] ***: vangelis_GSOC (n=p01053@labs.cs.unipi.gr) has joined the channel [8:08:17 AM:] monty_: But I can easily go deaper into any issue that anyone wants to hear more about [8:08:36 AM:] pdubois: Okay. To everyone here: It would be good if you have questions; please post them. [8:09:18 AM:] monty_: Or do you want me to go through the session slides and give a quick overall note what they contain and why ? [8:09:32 AM:] * joro wants to here more about logging the queries executed to the binlog [8:09:38 AM:] joro: s/here/hear/ [8:10:11 AM:] joro: when are they logged ? [8:10:21 AM:] joro: and who decides how much is logged ? [8:10:35 AM:] guilhemb: depends on if row-based or statement-based [8:10:38 AM:] monty_: everything is logged so that it's repeatable on the slave [8:10:58 AM:] monty_: and of course, anyone that can answer a question, feel free to join in! [8:11:07 AM:] joro: what about the case when the slave is in a different autocommit mode ? [8:11:23 AM:] monty_: We always log the whole transaction [8:11:32 AM:] monty_: So the slave will always exceute the whole transaction too [8:11:50 AM:] monty_: ie, all transactions logged are implictely in a begin ... commit block [8:12:29 AM:] joro: how does that relate to the autocommiting statements (like drop table) [8:12:54 AM:] Leithal: to create temporary tables you need the create temporary table privilege, however you can not run any DML on the table once you have done this, which means you have to open up privileges on the database level to be able to use them effectively, do we plan on extending the 'temporary table' privielges so that you can perform any action on them without db level privs? [8:12:56 AM:] monty_: all DLL (ie drop ,create) are in it's own transaction and commits any previous transaction [8:13:42 AM:] joro: monty_: even "create temporary table" ? [8:13:43 AM:] monty_: Hm; You should be able to do anything on a temporary table without DML privileges [8:14:07 AM:] monty_: If I remember correctly, you can define if a create temporary is logged or not [8:14:17 AM:] monty_: (or at least, the plan has always been to allow this) [8:14:34 AM:] ***: peterD_L (n=peter@bas13-toronto12-1167982836.dsl.bell.ca) has joined the channel [8:14:36 AM:] ***: stefan__ is now known as stefan-mysql [8:14:40 AM:] Leithal: 'should be able to' is perhaps correct :) we have had a customer raise this as a concern - because you can not [8:14:43 AM:] * joro is going to verify that [8:14:48 AM:] joro: thanks [8:15:26 AM:] ***: alik (n=alik@81.94.152.42) has joined the channel [8:15:36 AM:] monty_: for statement based we should log the create temporary [8:15:45 AM:] monty_: (at least the code looks like it should do it..) [8:15:55 AM:] monty_: because we do support one to do [8:16:09 AM:] monty_: INSERT INTO t1 select * from temp_table; [8:16:09 AM:] ***: ccharles (n=byte@fedora/byte) has joined the channel [8:16:15 AM:] monty_: and this only works if create temporary is logged [8:16:19 AM:] ***: ChanServ has opped stefan-mysql [8:16:47 AM:] monty_: joro: why do you think create temporary is not logged ? [8:16:53 AM:] joro: ok. but I was interested more about the create/drop commands themselves [8:16:55 AM:] monty_: (it's not logged in row based logging) [8:17:08 AM:] monty_: they should all be logged [8:17:19 AM:] joro: monty_: I think it is. I was just trying to understand the transaction boundries. [8:17:50 AM:] ***: stefan-mysql has changed the topic to “TOPIC: Overview of Query Execution and Use of Temp Tables (Monty Widenius)†[8:17:51 AM:] monty_: It's the master that defines the transaction boundaries; The slave only exceuts them [8:18:00 AM:] joro: I see. [8:18:18 AM:] ***: ingo__ has left the channel ("Leaving") [8:18:41 AM:] monty_: According to how the master tells it; Generally the slave configuration (like auto commit) doesn't have any effect on how the statements are executed [8:19:28 AM:] monty_: but there are exceptions; Slave can have a different default engine, which is used for all tables created without a specfic engine (or if the master's engine is not loaded) [8:19:57 AM:] joro: aha. ok. that makes sense. [8:20:42 AM:] joro: hello, jani :-) [8:21:05 AM:] Leithal: this is the problem I refer to http://pastebin.ca/566442 [8:21:10 AM:] monty_: I will this week update the document with any questions added to the university page or asked here [8:21:40 AM:] Leithal: to be able to allow somebody to actually insert into a temporary table, without knowing the names of them up front, and without them first existing, you need a database wide DML level privilege [8:21:58 AM:] Leithal: can we extend the create temporary table privielge to 'do anything on temporary tables'? [8:22:07 AM:] monty_: The main problem with the access checks is that we do them BEFORE we open a table [8:22:21 AM:] monty_: so we don't yet know if it's a temporary table that is refered to [8:22:28 AM:] monty_: but this is someting that should be chanegd [8:23:10 AM:] monty_: DROP t1,t1; [8:23:40 AM:] monty_: The access checking of the above is not trivial as the first may refer to a temporary table, but in this case the other doesn't [8:24:12 AM:] Leithal: OK I see [8:24:55 AM:] monty_: Anything you want me to go into greater detail in the document ? [8:25:08 AM:] monty_: either written (later) or verbally (now)? [8:25:15 AM:] pdubois: Are we getting sound dropouts? [8:25:23 AM:] * joro hears just time [8:25:44 AM:] vangelis_GSOC: I get [8:26:24 AM:] pdubois: If there are no more questions, I think we can conclude. [8:26:26 AM:] pdubois: Anyone? [8:26:58 AM:] * joro has one [8:27:12 AM:] pdubois: joro: Please ask [8:27:20 AM:] joro: are there any plans to preserve temp tables ? [8:27:29 AM:] monty_: do you mean over crashes ? [8:27:32 AM:] joro: like statement cache of a sort ? [8:27:45 AM:] timour: joro: do you mean across prepared statement re-execution? [8:27:46 AM:] Leithal: like global temporary tables? [8:27:52 AM:] monty_: do you mean in the quert cache? [8:27:59 AM:] joro: both global and across prepared statements [8:28:07 AM:] joro: yes, the query cache [8:28:14 AM:] monty_: global should be doable, but not planned [8:28:20 AM:] ***: jani (n=jamppa@a88-112-100-96.elisa-laajakaista.fi) has joined the channel [8:28:23 AM:] ***: ingo__ (n=istruewi@tc17-n68-201.de.inter.net) has joined the channel [8:28:24 AM:] Leithal: I know we have a worklog for global temporary tables [8:28:30 AM:] monty_: for query cache, it should be trivial to add support (if it's not there already) [8:28:44 AM:] monty_: actually, it isn't as this is thread specific [8:29:04 AM:] monty_: joerg_b: on the other hand, you do have a kind of temporary tables with HEAP tables [8:29:06 AM:] Leithal: we would have to do global first right? [8:29:39 AM:] monty_: The only difference between global and heap tables is that they would be automaticly droped on crash [8:29:44 AM:] monty_: (more or less) [8:30:21 AM:] Leithal: that is worklog 934 [8:30:27 AM:] monty_: Leithal: can you dig up the worklog item so I can include it into the documentation ? [8:30:29 AM:] monty_: thanks! [8:30:49 AM:] monty_: so, any other questions ? [8:31:08 AM:] Leithal: nope! [8:32:22 AM:] monty_: does anyone want us to create similar overview workflows for other statements than SELECT ? [8:32:40 AM:] timour: monty_, absolutely, and [8:32:45 AM:] joro: hmm, insert would be interesting : especially the DELAYED kind. [8:32:56 AM:] Leithal: I think it would be helpful to have something like this for SP execution as well [8:33:05 AM:] timour: it would be very beneficial to explain what are the *non*-ovelapping parts [8:33:10 AM:] joro: and prepared statements. [8:33:15 AM:] timour: of the code that handles other statements. [8:33:32 AM:] monty_: timour: isn't the over-lapping parts as intersting ? [8:34:19 AM:] miracee: I think update would be interesting too [8:34:27 AM:] timour: monty_, all are interesting, but we have some code duplication, so it would help to know where to look (well, I know some parts already). [8:34:42 AM:] vangelis_GSOC: is this documentation you refer to for internal purposes? [8:35:11 AM:] monty_: it's for internal.text, ie for everyone (but it's descrbing the interrnals) [8:35:49 AM:] vangelis_GSOC: yes. I though it might be for inter-cooprorate use [8:35:57 AM:] timour: monty_, an outline of how INSERT/UPDATE are processed, with notes where we take shortcuts instead of executing via the "normal" SELECT call path. [8:36:00 AM:] guilhemb: Leithal: SP execution flow is already documented, let me find it [8:36:02 AM:] weigon: stream is quiet ? [8:36:18 AM:] pdubois: me too [8:36:26 AM:] monty_: update/insert are actually almost never exceuted through the SELECT part [8:36:44 AM:] ***: alik has left the channel ("Leaving") [8:36:48 AM:] guilhemb: Leithal: http://forge.mysql.com/wiki/MySQL_Internals_Stored_Programs [8:36:57 AM:] pdubois: or, sound is choppy, not gone [8:36:58 AM:] Leithal: guilhemb: Thanks! [8:37:08 AM:] monty_: vangelis_GSOC: all documentation are now on forge and this should be there too [8:37:10 AM:] joro: please consider also documenting prepared statements execution [8:37:42 AM:] monty_: joro: yes, we should have kostja doing a session on this and document it at the same time [8:38:08 AM:] monty_: pdubois: are you taking notes about future sessions ? [8:38:22 AM:] monty_: and future documentation issues/ideas? [8:38:24 AM:] pdubois: monty_: I have the IRC session. :-) [8:38:36 AM:] monty_: I can add everything that is related to select and temporary tables to my document [8:38:51 AM:] pdubois: monty_: sounds good [8:39:40 AM:] pdubois: monty_: this is about the usual size crowd [8:39:56 AM:] monty_: if you have any comments on the format of the document, please add it is a comment [8:40:02 AM:] vangelis_GSOC: I am from Google Summer of Code, Hi! [8:40:04 AM:] miracee: pdubois: fyi: my private client here always log irc [8:40:30 AM:] joro: the form seems fine [8:40:32 AM:] timour: vangelis_GSOC, if you have any questions - ask, now is the time. [8:40:36 AM:] pdubois: miracee: mine as well [8:40:46 AM:] miracee: :) [8:41:22 AM:] timour: monty_, one very useful thing would be to add a summary of the main data structures in the server, and what concepts they represent. [8:41:36 AM:] joro: this document will be very valuable as a base to maintain when adding new ways into the query execution. [8:41:43 AM:] monty_: timour: that would be a very usefull session! [8:41:46 AM:] timour: monty_, for example, in your outline you write: "Setup join structure with accessors functions". [8:41:59 AM:] vangelis_GSOC: Yes structures and important classes, apart from the ones in 'Internals [8:42:20 AM:] timour: So what are "accessor functions", what data structures constitute a query execution plan? [8:43:06 AM:] monty_: This is the query exceutioner; It only relalizes the plan the optimizer have given it [8:43:42 AM:] monty_: having all the data structures used for optimizer documented in a way to show all connections would be very nice [8:43:44 AM:] timour: monty_, so how does a plan look like? what constitutes a complete, executable plan? [8:43:46 AM:] Shrews: I would like to see the Doxygen output put on the wiki so that other articles could link to it more easily. :-) [8:44:07 AM:] monty_: Shrews: I would prefer to link to a live doxygen tree [8:44:21 AM:] monty_: Didn't know how to do that (and didn't have time to find out :( ) [8:45:15 AM:] monty_: timour: don't know how to answer you shortly [8:45:48 AM:] timour: monty_, yes, I understand, so this is more a suggestion for another session I guess :-) [8:46:17 AM:] monty_: timour: the flow charts in: Description of join execution (sql_select.cc::do_select()) should give you most of the information [8:46:25 AM:] monty_: these are exactly the structures we set up [8:46:42 AM:] monty_: ie, join->next_seect [8:46:49 AM:] timour: monty_, at this level of detail I know things for few years now. [8:47:15 AM:] monty_: timour: yes, but we are a lot of new people here that don't know the optimizer [8:47:21 AM:] timour: monty_, but many newcomers would find it difficult to figure. [8:47:24 AM:] monty_: You want to have the exact, in detail information [8:47:26 AM:] timour: monty_, that was my point [8:47:33 AM:] vangelis_GSOC: monty_: that would be really interesting [8:47:52 AM:] monty_: timour: is this a session that you would like to hold? [8:48:02 AM:] monty_: I can help you with describing all the structures! [8:48:13 AM:] timour: monty_, I don't know a lot of details, so that's why I asked you :-0 [8:48:15 AM:] timour: :-) [8:48:29 AM:] monty_: I can do the hard work and getting most of the information [8:48:51 AM:] monty_: then you can help fill in those things you and igor know and write it in a way that you can understand [8:48:58 AM:] timour: monty_, Ok, let's discuss this offline. [8:49:15 AM:] monty_: ie: As I understand the code and the purpose of it, it's easy to write it in too technical way [8:49:33 AM:] monty_: (ie, so that you can't still understand it without extra knowledge) [8:50:35 AM:] * timour agrees [8:50:58 AM:] pdubois: More questions, anyone? [8:51:18 AM:] vangelis_GSOC: monty_: as I understand there is a plan to further document the optimizer/ SELECT execution flow? [8:51:23 AM:] timour: vangelis_GSOC, do you have questions related to "ref" optimization? [8:51:59 AM:] monty_: vangelis_GSOC: yes, we plan to document everything and these university sessions is one way to know how to prioritize the documentation [8:52:49 AM:] vangelis_GSOC: timour: udpate_ref_and_keys is still a bit unclear to me as a flow, so at the moment I have no specific questions [8:53:19 AM:] vangelis_GSOC: monty_: I can certainly help at spotting frustruting part :-) [8:54:07 AM:] monty_: update_refs_and_keys bascily creates an array of all possible key parts that can be used for every table [8:54:38 AM:] monty_: then later we skip all key parts that are not a prefix of the key [8:54:58 AM:] vangelis_GSOC: monty_: are they deleted form the key arrays of simply pushed in the back? [8:55:22 AM:] monty_: we only delete duplicates (as far as I can remmeber) [8:55:43 AM:] monty_: the idea from the start was to allow the optimzer to also work with handlers that have keys that can be read without a prefix [8:56:00 AM:] monty_: duplicate parts are deleted [8:56:21 AM:] vangelis_GSOC: monty_: with handlers you mean table/file handlers structures? [8:56:31 AM:] monty_: table handlers [8:57:03 AM:] monty_: we very recently changed the index_read() functions so that one can read a key based on any combination off key segments [8:57:08 AM:] monty_: ie, not only a key prefix [8:57:29 AM:] monty_: this allows use to better use handlers that can optimize: [8:57:41 AM:] monty_: SELECT * from t1 where key_part2=2 and key_part3=3 [8:58:24 AM:] ***: jbalint (i=jbalint@unaffiliated/jbalint) has joined the channel [8:58:32 AM:] monty_: vangelis_GSOC: is it clear? [8:58:55 AM:] vangelis_GSOC: monty_: it makes things less blurry :-) [8:59:00 AM:] timour: monty_, but we still don't have support in the optimizer to make use of this, right? [8:59:21 AM:] timour: At least I am not aware of such functionality. [8:59:25 AM:] monty_: no, this is something that we need to add [8:59:36 AM:] monty_: and we have a customer request about doing this [8:59:54 AM:] Leithal: so the left most prefix rule on indexes would not apply on this once added to the optimizer? [9:00:09 AM:] monty_: yes [9:00:14 AM:] monty_: for that particular engine [9:00:23 AM:] Leithal: nice [9:00:36 AM:] timour: Leithal, yes, but currently the optimizer would pick table scan even if the storage egine can do "index skip scans". [9:01:08 AM:] Leithal: timour: that's what I meant by 'once added' :) [9:01:19 AM:] timour: Leithal, so we have to change the optimizer to consider such query plans. [9:01:20 AM:] monty_: timour: this is better than index skip scan; Ie, we need a storage engine flag that tell that any key part combination is as good as a key [9:01:45 AM:] timour: monty_, I agree, this is more general. [9:01:46 AM:] monty_: ie, we can threat this as a normal key [9:02:10 AM:] Leithal: that would save a lot of duplicated indexes [9:02:13 AM:] Leithal: which would be great [9:02:14 AM:] timour: Index-skip-scan assumes a sequence of key parts. [9:02:15 AM:] monty_: I don't kow how to fix opt_range for this kind of indexes [9:02:37 AM:] monty_: but the normal key read (ref) should work perfectly [9:02:46 AM:] timour: Leithal, not exactly, as you can't expect similar performance. [9:03:07 AM:] timour: Leithal, if you have an index prefix, you can hardly beat that. [9:03:19 AM:] Leithal: yea I guess [9:04:12 AM:] Shrews: so it would basically be the same as a full index scan? [9:04:17 AM:] monty_: no [9:04:24 AM:] timour: monty_, this can be true for column-based storage engines. [9:04:42 AM:] monty_: Shrews: now, performance is similar to using a key search [9:04:52 AM:] timour: monty_, so you are right, for those this should be the case. [9:05:33 AM:] monty_: timour: it's of course also depending in how the keys are stored [9:06:18 AM:] timour: monty_, a question I always wanted to ask - [9:06:23 AM:] monty_: then please do [9:06:37 AM:] weigon: timour: can you explain "index skip scans" in 2 sentences ? [9:06:52 AM:] weigon: is it jumping ahead then checking each node ? [9:06:53 AM:] monty_: ie, you do a 'read_key_next' for every key read [9:07:10 AM:] monty_: ie, if you are doing a two key part scan, you say that for next read you want [9:07:14 AM:] timour: weigon, if you look at the index as a tree, and you don't have a prefix of the tree, [9:07:18 AM:] monty_: key > first_part [9:07:23 AM:] timour: but you have some infix, [9:07:36 AM:] monty_: SELECT * from t1 where key_part2=2 [9:07:36 AM:] timour: then you can jump over the subtrees. [9:07:53 AM:] weigon: ah, sure [9:07:56 AM:] monty_: Assume first row is (1,2) [9:08:10 AM:] timour: weigon, and when we do that, we can skip whole subtrees altogether. [9:08:13 AM:] monty_: then you can do a new search for key_part1 > 1 [9:08:33 AM:] monty_: this works very good if you have many equal values for key_part1 [9:08:56 AM:] monty_: weigon: is it clear? [9:09:01 AM:] monty_: (or clearer) [9:09:12 AM:] weigon: currently we build a compound index by (CONCAT(RFILL(key1, max_length), key2) (in MyISAM,...) ? [9:09:37 AM:] monty_: why rfill()? [9:10:10 AM:] weigon: VARCHAR(32) results in a 32-char entry in the index, right ? [9:10:15 AM:] monty_: no [9:10:39 AM:] monty_: varchar is stored (at least in myisam) as <LENGHT>DATA [9:10:48 AM:] weigon: f1 varchar(32), f2 varchar(32), index(f1, f2) [9:10:52 AM:] monty_: and the data is prefix compressed to previous (if first key part) [9:11:08 AM:] monty_: so in this case, you have prefix compression of f1 [9:11:14 AM:] weigon: ok, prefix-compression [9:11:20 AM:] ***: LenzGr (n=lgrimmer@c184198.adsl.hansenet.de) has joined the channel [9:11:24 AM:] monty_: (and prefix compression for f2 for all data with same f1) [9:11:25 AM:] weigon: how about the first key ? [9:11:35 AM:] monty_: It's stored as [9:11:48 AM:] monty_: <prefix-length><left-length>data [9:12:17 AM:] monty_: Ie, the <lenght> byte is not part of compression [9:13:38 AM:] weigon: so, this is a worst-case check: [9:13:39 AM:] weigon: create table t5 ( f1 varchar(124), f2 varchar(1024), index(f1, f2)); [9:13:39 AM:] weigon: ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes [9:13:43 AM:] monty_: So if you have data (AAAA, BBBB) and (AAAB, BBBB) they first key is <4>AAAA<4>BBBB and second is <128+3><1>B<4>BBBB [9:14:12 AM:] monty_: use index(f1,f2(500)) [9:15:09 AM:] timour: monty_, so my question was: are you aware of other DBMS that do "constant table optimizations" or something similar? [9:15:09 AM:] monty_: index(f1(20),f2(20)) is in many cases preferable (assuming it's likely that the first 20 characters differs) [9:15:20 AM:] monty_: no [9:16:48 AM:] monty_: but i don't know; others may do it too. I just thought it was a good common optimization to add (especially when I had a lot of tables with 1 row as system tables when I created MySQL) [9:16:54 AM:] timour: monty_, yes, in addition I mean other constant optimizations such as replacing MAX(), MIN() with a constant. [9:17:35 AM:] timour: monty_, well, but this optimization makes it much harder to implement proper prepared statements, so I guess that's why others don't have it. [9:17:38 AM:] monty_: according to benchmarks, almost no one else optimizes MIN(), MAX(), cOUNT() the way we do [9:17:38 AM:] ***: stefan-mysql is now known as stefan-mysql-awa [9:17:58 AM:] monty_: why is it harder ? [9:18:00 AM:] ***: stefan-mysql-awa is now known as stefan-mysql-afk [9:18:17 AM:] timour: monty_, because the query plan becomes very dependent on data [9:18:32 AM:] monty_: that's true in normal life [9:18:45 AM:] monty_: the range optmizer we have gives us very different plans depending on data [9:18:46 AM:] timour: so the optimizer has re-do much of the optimization even if a statement is prepared. [9:18:52 AM:] monty_: and this is a very good thing [9:19:21 AM:] monty_: better to get good plans than inferior plans once in a while [9:19:43 AM:] Leithal: is this something that the likes of histograms would help solve? [9:19:49 AM:] monty_: no [9:19:59 AM:] monty_: histograms also changes depending on data ranges [9:20:04 AM:] timour: Leithal, that was what I was going to say :-) [9:20:07 AM:] monty_: so histograms only makes it workse [9:20:38 AM:] monty_: ie, in all cases you have the problem that different constants may give you different join orders or different methods how to execute the join [9:20:55 AM:] monty_: and if that's the case, there is no obvious solution [9:21:16 AM:] monty_: one could store a default query plan and just verify it it's 'acceptable' and if not, generate a new one [9:22:00 AM:] Leithal: so more like oracle's stored outlines? [9:22:06 AM:] timour: monty_, one could store multiple plans (or sub-plans), that was suggested over 10 years ago. [9:22:08 AM:] monty_: possible; Don't know about them [9:22:28 AM:] monty_: timour: and with infivite work hours and developers would have this done ;) [9:22:50 AM:] Leithal: same concept really, 'always execute using this execution plan' - but it's an '*always*' rather than 'if not generate a new one' [9:23:19 AM:] jbalint: ah, and they are saved in the db too, right? and you can tweak them offline [9:23:20 AM:] monty_: I don't like the 'always' case, as it's very bad in a dynamic word [9:23:41 AM:] Leithal: jbalint: yea [9:23:54 AM:] Leithal: monty_: I would tend to agree yea, 'things change' [9:23:54 AM:] monty_: for static, well defined and well distributed data things works ok [9:23:57 AM:] Leithal: you have to notice that [9:24:20 AM:] * timour will have to leave in 5 min [9:24:54 AM:] joro: one should consider the lifespan of a prepared statement. if it's shorter than the time to change the ratio of the data. [9:25:47 AM:] monty_: joro: SELECT * from t1 where key=5; [9:25:51 AM:] monty_: joro: SELECT * from t1 where key=?; [9:26:08 AM:] pdubois: monty: timing note: We'll be at 90 minutes in a few minutes. Sessions allow up to 2 h, most have been about 90 mins. [9:26:12 AM:] monty_: if input and all values are 5, you want to do a table scan [9:26:19 AM:] monty_: otherwise you want to at once give an error [9:26:29 AM:] monty_: This you can't do by pre-pearing the query [9:26:41 AM:] ***: peterD_L is now known as peterD__L [9:26:48 AM:] monty_: I think we are fine now, as we are discussing things outside of the scope of the session [9:27:02 AM:] monty_: does anyone has anything more to ask that is related to the session topic? [9:27:06 AM:] monty_: one last question.... [9:27:11 AM:] * joro has nothing [9:27:28 AM:] vangelis_GSOC: monty_: (and timour before you leave) (back to the documentation issue) half of my GSOC project is documenting make_join_statistics(). I could help there either on producing documentation parts or on the structure of the document (as a neebie I could spot unclear/difficult parts) [9:28:11 AM:] monty_: vangelis_GSOC: who are you working with on this? [9:28:15 AM:] timour: vangelis_GSOC, I am aware of that, we will discuss that soon. [9:28:22 AM:] timour: monty_, with me [9:28:28 AM:] ***: horst has quit IRC ("Chatzilla 0.9.73 [Firefox 1.5.0.3/2006042500]") [9:28:30 AM:] monty_: I can also help with anything that you find unclear [9:28:53 AM:] vangelis_GSOC: timour: I brought this up since the talk was about documentation/commenting [9:28:59 AM:] monty_: timour: if you come up with questions, ping me anytime on IRC and I will try to document those parts that you find you need help with [9:29:10 AM:] monty_: vangelis_GSOC: yes, thanks! [9:29:47 AM:] timour: monty_, thanks! [9:29:56 AM:] vangelis_GSOC: monty_: :-) [9:30:35 AM:] * timour leaving [9:30:44 AM:] ***: timour is now known as timour|away [9:30:57 AM:] monty_: vangelis_GSOC: especially the structures used by make_join_statistics() and then later filled up by make_join_read_info() would be nice [9:31:00 AM:] LenzGr: monty_: So make sure that you hang out on the #mysql-dev channel on freenode often :) [9:31:13 AM:] monty_: LenzGr: I plan to do [9:31:20 AM:] ***: guilhemb is now known as guilhemb|bbl [9:31:32 AM:] ***: Shrews has left the channel () [9:32:08 AM:] vangelis_GSOC: monty_: I think that a document should have an introduction for the general structure/structs used and then get into more details [9:32:32 AM:] LenzGr: vangelis_GSOC: Will you add this document to the "internals" documentation on the MySQL Forge Wiki? [9:32:35 AM:] monty_: vangelis_GSOC: agree [9:32:38 AM:] vangelis_GSOC: monty_: get into details in 2 maybe levels of detail? [9:33:04 AM:] vangelis_GSOC: LenzGr: I have talked with timour about it. Finally it will go there [9:33:06 AM:] monty_: vangelis_GSOC: don't know exactly what you mean with the levels [9:33:31 AM:] vangelis_GSOC: monty: I mean using different levels of detial [9:33:31 AM:] LenzGr: vangelis_GSOC: Excellent! Let me know if you need any assistance with the Forge part [9:33:44 AM:] vangelis_GSOC: Digging into more technical issues [9:33:45 AM:] LenzGr: vangelis_GSOC: Or if you need monty on freenode IRC :) [9:34:08 AM:] vangelis_GSOC: monty_: but prograssivelly [9:34:20 AM:] monty_: ok [9:34:24 AM:] vangelis_GSOC: LenzGr: thanks! [9:34:50 AM:] vangelis_GSOC: monty_: I have nothing else for the moment. [9:35:01 AM:] monty_: ok, have a good day everyone! [9:35:17 AM:] jbalint: thanks! [9:35:20 AM:] monty_: Will update the documentation with the new stuff before next week [9:35:20 AM:] LenzGr: Thanks, monty! Don't forget to stop the audio. [9:35:20 AM:] vangelis_GSOC: monty: thanks :- [9:35:30 AM:] weigon: monty_: thanks [9:35:32 AM:] joro: thanks monty
[edit] Questions posted for the Session
Area where attendees can post questions in advance to the Session
- We have had a request from a customer (this week in fact) that we fix the restriction that temporary tables can only be used once within any single statement. I understand this is due to the fact that all tables are locked up front, and we had to put this restriction in to place for UNION statements (to stop deadlocking). This is referenced within Bug #7742 (http://bugs.mysql.com/bug.php?id=7742). On the bug it mentions that we would fix this in 5.1 - however that does not seem to have been done. How hard is this to fix, and when might we be able to fix it? (Mark Leith)
- In what ways is the internal-temporary-tables page in the manual inaccurate? (Useful to know so that I can fix it.) (Paul)
