INFORMATION SCHEMA storage engine
Contents |
[edit] INFORMATION_SCHEMA storage engine
This is a page for a Google Summer of Code 2009 project of implementing a storage engine for INFORMATION_SCHEMA tables.
Student: Haihao Tang
Mentor: Sergei Golubchik
[edit] Task description
[edit] Current I_S internal
There is an enumeration named enum_schema_tables in file hander.h, which contains all INFORMATION_SCHEMA tables’ names. Another important thing is ST_SCHEMA_TABLE array schema_tables in sql_show.cc. Make sure that the order of schema_tables and enum_schema_tables are the same. Each structure ST_SCHEMA_TABLE represents one INFORMATION_SCHEMA table, like ‘ENGINES’ and ‘TABLES’, and the structure contains the table’s name, fields definition, and function pointers of creating, filling and processing the table. The definition of ST_SCHEMA_TABLE looks as follow:
typedef struct st_schema_table
{
const char* table_name;
ST_FIELD_INFO *fields_info;
/* Create information_schema table */
TABLE *(*create_table) (THD *thd, TABLE_LIST *table_list);
/* Fill table with data */
int (*fill_table) (THD *thd, TABLE_LIST *tables, COND *cond);
/* Handle fileds for old SHOW */
int (*old_format) (THD *thd, struct st_schema_table *schema_table);
int (*process_table) (THD *thd, TABLE_LIST *tables, TABLE *table,
bool res, LEX_STRING *db_name, LEX_STRING *table_name);
int idx_field1, idx_field2;
bool hidden;
uint i_s_requested_object; /* the object we need to open(TABLE | VIEW) */
} ST_SCHEMA_TABLE;
For example, the 'table' element in array schema_tables is I_S table ‘TABLES’. As we can see in source code:
{"TABLES", tables_fields_info, create_schema_table,
get_all_tables, make_old_format, get_schema_tables_record, 1, 2, 0,
OPTIMIZE_I_S_TABLE}
The table’s name is TABLES;
The table’s fields definition is described by structure charsets_fields_info;
The table’s creation is executed by function create_schema_table;
Function get_all_tables is for all based file system tables. Based File system tables mean the tables show tables' meta-data designated by database name and table name, like db.t1. Function get_schema_tables_record is especially for TABLES table, where other based file system tables have different function. Table VIEWS has function get_schema_views_record and table PARTITIONS has funtion get_schema_partitions_record.
Function make_old_format is used for "SHOW" statments. If users run:
SHOW TABLES FROM db;
make_old_format will handle it.
1 and 2 represents field 1 and field 2 will be the indexes of table, which are TABLE_SCHEMA and TABLE_NAME.
0 means whether the table should be hidden.
OPTIMIZE_I_S_TABLE is flag of optimization.
If a query is, for example,
SELECT * from INFORMATION_SCHEMA.TABLES;
The table creation is:
create_schema_table (Forms the fields by tables_fields_info) -> create_tmp_table -> get_new_handler (reach the storage engine level, if table is small, use a heap, or use MyISAM or Maria).
After creation, filling the table by function get_all_tables and get_schema_tables_record.
[edit] How I_S work now
Metadata are stored in directories and files. One .frm file per table, one directory per schema. The list of all tables in a schema is generated by scanning the directory and collecting file names.
Currently, I_S tables are implemented using temporary tables. When a client sends, say,
SELECT * FROM INFORMATION_SCHEMA.TABLES;
MySQL scans all directories under the datadir, creates a list of all .frm files there - it is a list of all tables - and then opens each table in turn to get the table metadata. It is done early in the SELECT execution, and all these data are stored in the temporary table. Then SELECT reads from this temporary table.
If a query is, for example,
SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE SCHEMA_NAME='db1' AND TABLE_NAME='t1';
MySQL doesn't need to scan all directories, it can open the file 'db1/t1.frm' directly. To support this optimization sql_show.cc contains a code to analyze WHERE condition, duplicating similar functionality in the optimizer.
[edit] Problems
Preloading the data in a temporary table hurts performance. Especially when this table is created on disk. LIMIT isn't honored. Functionality of the optimizer is duplicated. Only simple WHERE expressions are optimized (optimizer is duplicated incompletely, of course), LIKE and inequalities aren't supported. PERFORMANCE_SCHEMA is implemented differently.
[edit] New implementation
I_S (and P_S) tables should be implemented using a storage engine. All I_S/P_S tables are pre-created, they always exist, they have a fixed structure. New engine will implement that. Actual tables will be implemented with a table descriptor structure, like the one we use now. This table descriptor will list all columns of a table, index, open flag and etc:
struct infos_table_share
{
/** Table name. */
LEX_STRING m_name;
/** Table fields. */
ST_INFOSCHEMA_FIELD *m_fields;
/** Table fields number. */
uint m_fields_count;
/** Open table function. */
infos_open_table_t m_open_table;
/** Length of the m_pos position structure. */
uint m_ref_length;
/** The lock, stored on behalf of the SQL layer. */
THR_LOCK *m_thr_lock_ptr;
/** index idx. */
int m_index_field1, m_index_field2;
/** Open flag. */
uint m_i_s_requested_object;
};
m_fields contains all definitions of fields, which is as follow:
typedef struct st_infoschema_field
{
/**
Column name.
*/
const char *field_name;
/**
For string-type columns, this is the maximum number of
charactor. Otherwise, it is the 'display-length' for the column.
*/
uint field_length;
/**
This denotes data type for the column.
*/
enum enum_field_types field_type;
int value;
/**
Field attributes.
*/
uint field_flags;
const char *old_name;
/**
This should be one of @c SKIP_OPEN_TABLE,
@c OPEN_FRM_ONLY or @c OPEN_FULL_TABLE.
*/
uint open_method;
} ST_INFOSCHEMA_FIELD;
Here take table TABLES as example to explain how the new I_S storage engine work. If a client sends, say,
SELECT * FROM INFOSCHEMA.TABLES;
MySQL server will find table definition from .frm file normally, but in the new I_S storage engine, .frm files are not allown to exist. Since .frm files don't really work for plugins. Any plugin may install I_S table any time, so we cannot pre-create them all. And a user can unload a plugin any time - the table have to dissapear automatically. Instead of .frm files, the storage engine use ha_discover to get filled TABLE_SHARE structure. Function infoschema_discover creates a TABLE_SHARE and fills it with columns defintion and index definiton. After returning the TABLE_SHARE, MySQL server could process the table as normal as opening table with .frm file. The next thing is fetching data for table TABLES. ha_infoschema::rnd_init will construct all database names and table names, and ha_infoschema::rnd_next will get right data according to database name and table name one by one. The above-mentioned is whole table scanning. How does index scanning work? Like,
SELECT * FROM INFOSCHEMA.TABLES WHERE TABLE_SCHEMA = "db";
The open table is the same as whole table scanning by using infoschema_discover. ha_infoschema::records_in_range will decide to use which index; ha_infoschema::index_read_map will make up all table names of database db and return one row data; ha_infoschema::index_next will return row data one by one; ha_infoschema::index_end will destroy all resources which are constructed in ha_infoschema::index_read_map.