ForeignKeySupport
This page describes the MySQL plan for Foreign Key Support, which is currently under
consideration. Internally it's known as WL#148 (WL stands for "Work Log"; conventionally
worklog entries are written like this). The worklog is updated periodically, and at regular
intervals we'll update this page.
The format of this page has been derived manually from the original entry, and may contain errors and slightly outdated information. Please quote from this page when sending feedback on the specification. Thank you.
[edit] High-level description
[edit] Foreign Keys
MySQL currently "supports" foreign keys in two ways:
- if storage engine = InnoDB then let InnoDB handle them,
- otherwise parse and ignore foreign-key clauses.
The goals of this task are:
- take the support out of the storage engine so that foreign keys work for all storage engines (this is an instruction from Brian Aker),
- support SQL standard syntax instead of InnoDB syntax.
For a question from an earlier version of this worklog entry, Brian said he would go along with the "majority". Please vote by commenting in the appropriate places for anything that you like/dislike.
I use the abbreviation FK when discussing the referencing table (the table with the foreign key), or its indexes and columns. I use the abbreviation PK when discussing the referenced table (which is often a table with a primary key), or its indexes and columns.
Note that this implementation is for immediately-checked constraints only. Deferred contraints -- i.e. constraints that are checked at transaction-end rather than statement-end are another task and will be implemented for all constraint types (not just foreign key constraints) when the time comes.
[edit] High-level specification
[edit] Brian's comment, do not delete without permission
A flag must be added to disallow a foreign key constraint against a table type that can not possible support it (aka think about Federated). If the flag is set then an error is tossed during the creation of the foreign key. A bit of syntax should be added to allow this to be forced.
How does the migration from our current system work? How does the upgrade work, how does the downgrade work?
This comment can only be removed by Brian's approval, and he will require this to be in the HLA and acknowledged to be done :)
-- end Brian's comment, do not delete without permission --
[ The above comment contains a redundancy, since the description below already mentions a flag for "Disallow foreign keys if non-transactional table" ]
[edit] Syntax (Table Constraint)
Table constraints are separate clauses in CREATE TABLE or ALTER TABLE.
[ CONSTRAINT constraint_name ] /* NOTE 1 */
FOREIGN KEY (fk_column list)
REFERENCES pk_table_name [(pk_column list)] /* NOTE 2 */
[ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ]
[ ON UPDATE { CASCADE | SET NULL | SET DEFAULT | RESTRICT | NO ACTION } ]
[ ON DELETE { CASCADE | SET NULL | SET DEFAULT | RESTRICT | NO ACTION } ]
/* NOTE 1 */
- The "CONSTRAINT constraint_name" clause is the subject of a different worklog entry, WL#2226 "Constraint Names". Searches for constraint names are case insensitive.
/* NOTE 2 */
- Currently we support "(pk_column list)" not "[(pk_column list)]". That is, we are changing so that the pk column list is optional.
It is legal to say either "ON UPDATE ... ON DELETE ..." or "ON DELETE ... ON UPDATE ...".
Examples:
CREATE TABLE t2 (s1 INT, FOREIGN KEY (s1) REFERENCES t1);
CREATE TABLE t2 (s1 INT, s2 INT,
FOREIGN KEY (s1,s2) REFERENCES t1 (s2,s1),
FOREIGN KEY (s2) REFERENCES t1 (s3) ON UPDATE CASCADE);
[edit] Syntax (Column Constraint)
Column constraints are part of a column definition.
[ CONSTRAINT constraint_name ]
REFERENCES referenced_table_name [(referenced_column list])
[ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ]
[ ON UPDATE { CASCADE | SET NULL | SET DEFAULT | RESTRICT | NO ACTION } ]
[ ON DELETE { CASCADE | SET NULL | SET DEFAULT | RESTRICT | NO ACTION } ]
These clauses are precisely the same as the clauses that follow "FOREIGN KEY (fk_column list)" in the Syntax (Table Constraint) described previously. So the column constraint "colx data-type REFERENCES ..." can be converted to "colx data-type, FOREIGN KEY (colx) REFERENCES ...". That is, column constraint is merely a shorthand.
InnoDB does not support column constraints.
Examples:
CREATE TABLE t2 (s1 INT REFERENCES t1);
CREATE TABLE t2 (s1 INT CONSTRAINT `x` REFERENCES t1 (s1));
The above statements should be changed to:
CREATE TABLE t2 (s1 INT, FOREIGN KEY (s1) REFERENCES t1);
CREATE TABLE t2 (s1 INT, CONSTRAINT `x` FOREIGN KEY REFERENCES t1 (s1));
[edit] Defaults
- If ON UPDATE is omitted, assume ON UPDATE NO ACTION.
- If ON DELETE is omitted, assume ON DELETE NO ACTION.
- Support only MATCH SIMPLE and MATCH FULL.
- Emit an error for MATCH PARTIAL ER_UNSUPPORTED_FEATURE.
- If MATCH clause is omitted, assume MATCH SIMPLE (this is standard).
- If pk_column list is omitted, then pk table must have a PRIMARY KEY, just copy the column names from there. Try to support foreign key constraints if there is no keys defined on the used tables.
[edit] Checks during CREATE TABLE or ALTER TABLE
MySQL must check table constraints at CREATE/ALTER TABLE time thus:
- The fk table must not be temporary.
- The pk table must not be temporary. (This restriction is not standard.)
- The pk table must not be a view.
- The fk and pk tables must both be made with the same storage engine.
- The number of columns in fk_column list must be between 1 and 16. (The number 16 is the InnoDB maximum, but Oracle allows up to 32.)
- The number of columns in fk_column list must = number of columns in pk_column list.
- No columns in fk_column list may be duplicates, e.g. you can't say FOREIGN KEY (s1,s1,s1).
For each column in fk_column list:
- Data type must not be TIMESTAMP WITH TIME ZONE (this restriction exists in other DBMSs, I left it in for safety).
- Data type must be comparable to corresponding pk_column data type. ("Comparable" means: if you can compare without needing to cast, okay. This is looser than the InnoDB requirement that data types be the same.)
- Column must be nullable if on_update_rule or on_delete_rule is SET NULL.
- If column is auto_increment then
{
If on_update_rule = SET NULL | SET DEFAULT | CASCADE, error
If on_delete_rule = SET NULL | SET DEFAULT, error )
For each column in pk_column list:
- If any column is nullable, error
- If user does not have REFERENCES privilege on column, error (InnoDB does not have a check for REFERENCES privilege.)
- If the set of pk columns is not the same as a primary-key or unique-key definition, error. (This is a limit that InnoDB does not have, InnoDB allows FKs to refer to anything.)
If an index covering the foreign key definition doesn't exist, it will be created automatically and a warning will be emitted. This index will have an auto-generated name and will be droppable. Once an index is dropped, the foreign key continues to function. If, on the contrary, a user-defined index is added that can serve foreign key implementation dropped, the autogenerated index is automatically dropped. This behavior in regard to autogenerated indexes is a decision of March 2006 Sorrento Architecture meeting.
It is legal for FK and PK to be the same table.
It is legal to have multiple referential constraints.
It is legal to have duplicate referential constraints. DB2 would give a warning in such a circumstance (SQLSTATE '01543'), MySQL should consider doing the same.
[edit] Checks during INSERT
The same rules apply for single-row insert, multiple-row insert, update (in which case we are concerned with the new value only), load data, replace (again we are concerned with the new value only). These checks also apply for REPLACE when the row is actually inserted. These checks also apply for CREATE TABLE at time foreign key is made. Error checks take place after triggers are activated.
If (it's INSERT and not INSERT IGNORE)
If (table has a FOREIGN KEY)
For each column in FK-column list:
If column value is NULL, it "passes" -- no need for further checks
(this is all that is required for MATCH SIMPLE)
SELECT * FROM PK WHERE pk-column(s) = 'fk-column value'(s)
If not found: error
The word "error" means "halt with an error message if sql_mode=TRADITIONAL, which causes rollback of statement if storage engine is transactional, but otherwise the statement is ".
If a table has both triggers and foreign keys, or if a table has multiple foreign keys, the order of processing will be the same as it is now for InnoDB tables.
[edit] Error checks during DELETE or TRUNCATE
For each FK that refers to table:
If (FK is ON DELETE CASCADE)
DELETE FROM FK WHERE fk-columns = pk-columns
If (FK is ON DELETE SET NULL)
UPDATE FK SET all fk-columns = NULL WHERE fk-column(s) = 'pk-column value'(s)
If (FK is ON DELETE SET DEFAULT)
UPDATE FK SET all fk-columns = DEFAULT WHERE fk-column(s) = 'pk-column value'(s)
If (FK is ON DELETE RESTRICT | NO ACTION)
SELECT * FROM FK WHERE fk-column(s) = 'pk-column value'(s) LIMIT 1
If anything is returned: Error
[edit] Error checks during UPDATE
The same rules apply for single-row update, multiple-row update, and REPLACE when value is actually updated. No checks if IGNORE.
For each fk-column in the table, referencing another table:
- [ Rule for the new column value is the same as for INSERT, above ]
For each FK that refers to table:
If (FK is ON UPDATE CASCADE)
DELETE FROM FK WHERE fk-column(s) = 'pk-column value'(s)
If (FK is ON UPDATE SET NULL)
UPDATE FK SET fk-column(s) = NULL WHERE fk-column(s) = 'pk-column value'(s)
If (FK is ON UPDATE SET DEFAULT)
UPDATE FK SET fk-column(s) = DEFAULT WHERE fk-column(s) = 'pk-column value'(s)
If (FK is ON UPDATE RESTRICT | NO ACTION)
SELECT * FROM FK WHERE fk-column(s) = 'pk-column value'(s) LIMIT 1
If anything is returned: Error
[edit] Error checks during DROP TABLE / DATABASE
If (table has a primary key): Error.
[edit] Error checks during REVOKE
It's possible to REVOKE REFERENCES ON PK FROM user, so user no longer has a right to have an FK on the PK. In other DBMSs, this would cause restrict or cascade. But we'll just allow it -- no error check.
[edit] Error checks during ALTER
If table has an FK referencing it
If ALTER TABLE ... CHANGE and column is in PK
If column data type, size, or collation changes: Error
(It would be possible to allow if columns are
still comparable, but one would have to re-check
every row.)
If ALTER TABLE ... DROP [COLUMN] column_name and column is in PK: Error
If ALTER TABLE ... DISABLE KEYS: Error
If ALTER TABLE ... ENABLE KEYS: Error
If ALTER TABLE ... RENAME [TO] new_table_name: Error
If ALTER TABLE ... CONVERT TO CHARACTER SET: Error
If ALTER TABLE ... DROP PRIMARY KEY: Error
[edit] Error checks during RENAME TABLE
If table has an FK referencing it, Error.
[edit] Flags
Right now we're ignoring the foreign-key clauses. We will start to enforce them. This might affect people upgrading from earlier versions, who have the clause in the table definition, but who have data that violates the clause already.
So inevitably somebody will say "let's have a flag for 'parse and ignore' (as before) or 'enforce'.
Another possible flag might be:
- "Disallow foreign keys if non-transactional table"
It's difficult to see how foreign-key checks would work unless statement rollback is possible with MyISAM. It might someday be possible in theory in rare cases (see WL#1332), but when it's not there should be an error return for any database update that might affect either the primary-key table or the foreign-key table. Of course, this flag must be OFF by default, since it negates what we're asking for with this task.
We will continue to support "SET FOREIGN_KEY_CHECKS = 0". We will not check for database-integrity violations when the flag goes back on.
[edit] Warning in documentation and marketing material
At absolutely no time can we ever claim that we enforce "data integrity" for MyISAM or any other non-transactional table. For such tables, we stop after discovering an error, but that's too late, and could conceivably harm data integrity more than it helps.
[edit] Errors
These errors already exist:
ER_CANNOT_ADD_FOREIGN 1215 'HY000'
"Cannot add foreign key constraint"
ER_NO_REFERENCED_ROW 1216 '23000'
"Cannot add or update a child row: a foreign key constraint fails",
ER_ROW_IS_REFERENCED 1217 '23000'
"Cannot delete or update a parent row: a foreign key constraint fails",
ER_WRONG_FK_DEF 1239 '42000'
"Wrong foreign key definition for '%-.64s': %s",
ER_KEY_REF_DO_NOT_MATCH_TABLE_REF 1240 'HY000'
"Key reference and table reference doesn't match",
ER_DUP_KEYNAME 1061 '23000'
"Duplicate key name '%-.64s'",
ER_TOO_MANY_KEY_PARTS 1070 '42000'
"Too many key parts specified. Max %d parts allowed",
ER_TOO_LONG_KEY 1071 '42000'
"Specified key was too long; max key length is %d bytes",
This error is new:
ER_? '42830'
"Data type must be comparable to pk_column data type"
[edit] Recursion
A restriction of this worklog is lack of support for circular foreign key definitions. A check for a loop in the definition will be performed at creation. This is a decision of the architecture meeting in Sorrento, March 2006.
When this restriction is lifted, we may consider the following behavior of an UPDATE/DELETE cascade:
- When UPDATE/DELETE cascade eventually cycles back so that we're changing the original table again, if a row has already been changed in this statement, stop.
InnoDB has a limit to the number of recursions. This is not standard.
[edit] InnoDB
Foreign keys have worked with InnoDB since MySQL version 3.23, and are reasonably stable.
Possibly: if storage engine = InnoDB, we could continue to pass on foreign-key checking to InnoDB. But that is not what this proposal advocates.
But foreign keys with InnoDB, version 4.1.2, have problems:
- "FOREIGN KEY (column_name) REFERENCES table_name" is illegal
- "column_name REFERENCES table_name" doesn't work
- self-referential insertion doesn't always work
- data types of foreign key and primary key must be the same
- it's compulsory to have an index on the foreign key (well, this is relaxed now, an index is created)
- REFERENCES privilege is ignored
- default for ON UPDATE|DELETE is RESTRICT instead of NO ACTION
- referenced key needn't be primary/unique (at least for 'strict' mode it should be)
- one gets errors when referring to a table in another database.
Probably Heikki will fix these flaws for InnoDB eventually. Meantime, there's no reason to imitate these flaws for MyISAM.
[edit] References
We have a description of foreign keys in the MySQL User Manual:
See also WL#1438 Improving 'foreign keys' functionality for InnoDB
See also WL#1444 "ON UPDATE CASCADE should be recursive"
See also WL#1799 Support for Foreign Keys in MySQL Cluster (Responding to a question about WL#1799's relation to WL#148, Jonas Oreland said "... a FK implementation for ndb in mysqld is bound to be useless for a very large portion of mysql cluster users".)
See also https://intranet.mysql.com/secure/wiki/ServerDevelopmentRoadMap: "Add support to MyISAM for showing stored foreign key definitions, which is something a lot of GUI applications need (no specific WL entry, Brian thinks this is done)"
Feature requests:
- Bug#6085 DROP TABLE fails if table is referenced (from Georg Richter)
- Bug#17943 Inline foreign key constraint definition should give a warning
- Bug#19173 Foreign Key reference integrity
These bugs will be fixed:
- Bug#8878: foreigh keys in innodb
- Bug#10883: foreign key constraint table references are lowercase
- Bug#11715: naming rules for constraints lead to illstructured information_schema
- Bug#13301: FK definition requires definition outside of column definition
- Bug#11472: Triggers not executed following foreign key updates/deletes
Pages 418-431 in SQL-99 Complete, Really.
[edit] Low-level design
[edit] Table of contents
I. Introduction II. A brief insight into approaches to FK implementation taken by different databases. III. Proposed implementation. IV. Low level design. V. Open issues. VI. References. Appendix A. Triggers implementing foreign key constraints Appendix B. an example of the violation of the standard in row-by-row order of constraint checks. Appendix C. Visibility of updates.
[edit] Introduction
This document does not include the definition of the problem. Please refer to the contents of WL#148 for the high level speicfication.
[edit] A brief insight into approaches to FK implementation taken by different databases.
Below is a brief description of approaches that are used by various databases to implement foreign-key constraints.
InnoDB -- InnoDB checks foreign key constraints on row-by-row basis: right after inserting, updating or deleting a row from a table InnoDB checks if this operation violates a constraint or if some CASCADE action should be taken. It uses a simple routine written in C which performs these checks using InnoDB internal methods. Since InnoDB assumes that an index is always created for the FK these are mostly index lookups. As constraint checks are done in row-by-row fashion rather than at the end of statement execution, InnoDB behaviour deviates from the standard in case of a loop in the foreign key definition (a self-referencing table) (see appendix 1).
Firebird -- Firebird takes a similar approach to InnoDB and checks foreign keys in a row-by-row fashion, but the implementation of the check/CASCADE actions uses a row-level trigger (in bytecode) instead of a C routine. Ann Harrison notes: this description is wrong. Indeed it suffers from the same problems with self-referencing tables as InnoDB.
Ingres -- Foreign-key constraints are implemented with help of statement-level triggers written in SQL. Internal temporary tables are used to store/accumulate old and new versions of rows that are inserted/updated/ deleted by the statement until the statement end. The trigger is invoked in the end of statement processing.
Postgres -- Postgres uses row-level triggers, which execution is deferred until the end of statement (or the end of transaction in case of deferred constraints). Instead of keeping separate copies of old and new versions of the modified rows, it utilizes copies which are stored until the end of transaction by its storage engine (which implements MVCC). Triggers are written in C but use the internal prepared statement API to access/modify data.
Raima optimization technique -- Raima uses a special trick to optimize updates and deletes from the parent table. A hidden column to store the number of rows referencing this row is created in the parent table. In some cases, this allows to handle deletes and updates on the parent table without accessing the child table. OTOH it requires the parent table to be updated each time the child table is updated.
[edit] Proposed implementation
The defining decision for the architecture of foreign keys is the decision about the order of execution.
As described above, the standard requirement is that foreign key checks and cascade actions are applied in the end of statement execution, and the same requirement exists for primary key checks and the time of trigger activation. Currently MySQL does not follow the standard: triggers and primary key/unique constraints are checked and invoked in row-by-row fashion.
Let's assume for a moment that we try to follow the standard path for foreign key constraints. In a nutshell, in order to implement this approach we should:
- save the rows being inserted/updated/deleted (particularly, their FK/PK values) till the end of statement. This data is needed to locate the referencing/referenced row in the child/parent table (correspondingly). Note, that in future we may also want to exploit specifics of some storage engines (e.g. access old versions through a versioning mechanism) to solve this problem.
- we should change the triggers implementation in such a way that they also are executed at the end of statement or we should accept that we are not standard compatible in this respect (in other words, that triggers are called before foreign key checks). If the triggers implementation is changed, in addition to FK/PK values that are saved till the end of the statement, we need to save all fields that are accessed through OLD. pseudo row variable, available in triggers.
- the order of primary/unique key constraint checks shall also be changed: according to the standard, they should be checked at the end of statement as well. The alternative of partially standard-compliant behaviour introduces an inconsistency and the execution order that is hard to understand and document. Example: in case of a partially compliant order of execution, when shall the triggers that are fired by a CASCADE action be executed: before all constraints are checked for all rows, or in row-by-row fashion? In case of a complex scheme using triggers and constraints across many tables a partially standard order of execution is exceedingly difficult to document and understand.
Therefore, as it does not seem realistic to implement a fully standard compliant order of execution of primary/unique key checks, triggers, constraints, the proposal is to use the approach that is consistent with the existing MySQL features and is not standard-compliant.
The gist -- Foreign key constraints should be checked in row-by-row fashion. As to the particular method in which the checks are done, the suggestion is to use triggers that are written in C. These triggers shall use internal MySQL methods to access and update the data. Perhaps, a more general approach, such as triggers written in SQL or use the prepared statements API, is better from the standpoint of code reuse, but it may harm performance unacceptably.
The main restriction of this implementation is that it doesn't allow to handle loop-like dependecies. (Appendix B). Suggestion: we should emit a warning when one attempts to create a loop-like sequence of foreign-key constraints, instead of issuing an error on update or delete, as it happens now for primary or unique key checks.
[edit] Low level design.
Storage of the information about FK (Data Dictionary requirements) -- The proposal is to store the information about foreign key constraints in the .FRM files of referencing and referenced tables. This approach assumes that `name locks' are taken on both tables during creation or modification of a foreign key constraint. It is similar to the approach that we use for creation of triggers currently and therefore should be consistent with the current locking scheme.
Although it is possible to employ a table in `mysql' database to store constraints metadata, a correct implementation (which works consistently with statement based replication) will either require implementation of the entire data-dictionary-in-tables framework or will require us to acquire `name locks' on both tables participating in the foreign key constraint during its creation or modification of such constraint, as we do in store-in-.FRM approach. The former is not realistic to accomplish in the given timeframe and the latter approach gives us almost no benefits in comparison with the original one, while is more expensive perfomance-wise.
Two important notes about the suggested approach:
- When we DROP or RENAME the table which has a foreign key defined for it we should name-lock and update all .FRMs of referenced tables.
- It is not clear how we should perform locking if we have CREATE TABLE ... SELECT that additionally defines some foreign key constraints.
Provided that we store the data in .frm files, a decision should be made about the exact format in which we should store the information about a foreign key. The first option is to store the definition text (or a part of it) of ALTER TABLE ADD CONSTRAINT statement. It allows us to reuse code but adds extra overhead. This option is also easily extensible. The second option is to use a binary format. It is slighly harder to implement, but does not require parsing on table open, so is probably more efficient. Note, that if this second option is chosen, we should ensure that the binary format is easily extensible and thus future-proof.
In either case, we should store the list of foreign keys in which a given table participates as a referencing table and the list of foreign keys for which this tables serves as a referenced one.
Table locking. -- To perform foreign key constraints checks/actions for the table we have to open and lock referenced and referencing tables. The only way to do that in the current locking architecture is to follow the same "prelocking" approach which is used for tables used in stored functions and triggers. In other words, we will calculate a complete set of tables used by the statement, including the appropriate referenced/referencing tables, and then open and lock them at once at the beginning of statement execution.
The following rules can be used to determine which tables should be opened and which locks should be taken in order to perform foreign key checks/actions for a table:
- If the operation at hand modifies data, and the table participates in a foreign key constraint (either as a referencing table or as the referenced one), for each foreign key constraint in which the table participates do:
- If the table is a referencing table in this constraint, then the referenced table should be added to the list of prelocked tables with TL_READ lock request.
- If the table is the referenced table in this constraint, then the referencing table should be added to the list of prelocked tables. Lock type should be:
- If the constraint contains CASCADE, SET DEFAULT or SET NULL in ON UPDATE or ON DELETE clauses then TL_WRITE lock should be requested.
- Otherwise TL_READ lock is satisfactory.
The new tables added to the prelocking list by the above algorithm will be automatically explored by the current prelocking algorihtm and checked for triggers or new foreign key constraints. In other words, it's necessary to extend only the 'explore' step of the prelocking algorithm with new rules, the recursive nature of the algorithm will ensure that all the tables participating in foreign key constraints are added to the transitive closure.
Note, that when building such a transitive closure, we should not add referencing/referenced tables to the prelocking list more than once. This is necessary not only to exclude unnecessary duplicates of tables from the locking list, but also to avoid an infinite loop in case of a loop-like foreign key dependency.
Consequently, instead of implementing the above rules as a separate algorithm, which will specifically calculate the foreign-key constraint tables to be added to the prelocking set, the proposal is to rely on the existing implementation that is used for triggers. In other words, we simply need to ensure that the custom triggers that are used by the foreign key subsystem properly expose the information about the used tables and all of the above rules will be fulfilled automatically.
Handling of the the MATCH clause -- The HLS specifies that support for MATCH SIMPLE only is required. Since it is very easy to support MATCH FULL if we already have support for MATCH SIMPLE, MATCH FULL should perhaps be added to the list of requirements as an optional feature. These types of the MATCH clause differ only in case of checks that are done during insertion into a referencing table.
So they can be implemented by a simple check which is to be done at the beggining of the trigger which is responsible for processing of a foreign key check on insertion:
- if the match type is SIMPLE and one or more foreign key columns in the row being inserted is NULL, then regard the constraint as satisified. Otherwise continue with checking.
- if the match type for the foreign key constraint is FULL and all foreign key columns in the row being inserted are NULL, then treat the constraint as satisified. If at least one of columns in the foreign key is NULL, and at least one of them is not NULL, treat the foreign key constraint as failed. Otherwise (if all columns in the foreign key are not NULL), continue with checking.
Handling of foreign key constraints with MATCH PARTIAL is more complex and support for it is not required in the HLS.
Handling of checks and actions -- In order to perform checks and actions supporting foreign key constraints it is enough to create "on insert" and "on update" triggers for referencing tables and "on update" and "on delete" triggers for the referenced table.
Note that instead of creating persistent triggers which will be stored in database and loaded into memory once the table is opened (and inserted into table cache), it is probably better to create trigger objects from information about foreign keys in which a table participates at the moment when we load table into table cache (this will allow us to change triggers without creating problems for user during upgrade).
First version of these triggers can be implemented using SQL (see Appendix A for examples of such triggers). But since it is likely that such implementation will be inefficient, we may have to reimplement them in C. This implementation can use the same make_select()/init_read_record()/read_record() approach as used in mysql_delete() and mysql_update(). Implementation of some of triggers (e.g. of "on insert" trigger for referencing table) can be optimized even further by directly using handler methods once it is clear that even this approach is too expensive.
It is very important to note here that the triggers which update tables ("on delete" and "on update" triggers for referenced table in case then one of referential actions differs from RESTRICT/NO ACTION) should in their turn call triggers (and thus process foreign key constraints) associated with the referencing table (see bug #11472 "Triggers not executed following foreign key updates/deletes"). So their C implementation shall be integrated with processing of UPDATEs and DELETEs (server functions mysql_update() and mysql_delete()).
The other question is which timing should have triggers that implement foreign key checking? The proposal is to execute checks before actual modification of tables (in other words right after execution of all BEFORE triggers). Also on update/delete checks for foreign keys with RESTRICT/NO ACTION referential actions should be done before execution of triggers for SET NULL/SET DEFAULT/CASCADE actions. Although such approach is not fully consistent with standard it should miminimize effects of failing foreign key constraints for tables that don't support statement rollback (altough this implementation still leaves some holes which can be fixes only by using tables supporting statement rollback). It will try to leave no dangling rows in database.
See Appendix A/HLS for descriptions of particular triggers used.
What should happen if a constraint fails ? -- We should emit a proper error (ER_NO_REFERENCED_ROW or ER_ROW_IS_REFERENCED). Also such a failure shall initiate a rollback of the current statement for transactional storage engines and storage engines that support statement level rollback (like MyISAM should be once WL#1332 will be implemented).
In case when the storage engine at hand does not support transactions or doesn't have a statement level rollback, we should just abort execution of the current statement. This will leave all changes that were applied far by the aborted statement intact. The last row, which triggered the constraint failure should not be present in the table. In case when we reject update/delete operation on row becuase it violates effects of this operation should be visible too. But unlike the previous case it may be possible to observe some partial changes to referencing table which were done by processing of CASCADE referential action.
One of possible approaches to solving the problem with inability to rollback changes to non-transactional table is to perform all checks before executing the actual operation. It is however not a strategy
[edit] Limitations
Which kind of limitations are imposed by suggested implementation?
- a) It does not handle self-referencing tables and tables with loop-like dependencies well.
- b) Suggested implementation does not work very well for engines which don't support statement-level rollback. For example, one might be able to observe some left-overs of attempts to execute cascading deletes on referencing tables in case when deletion from the referenced table failed because of a foreign key constraint violation.
[edit] Q & A
Q. Which format we should choose for storing the infromation about foreign keys in .FRM - storage of full-blown foreign key definition or some extensible binary format?
A. Monty's requirement for foreign keys in 5.2: text format of frms files will be implemented.
Q. Should we handle in a special way the situation when foreign key constraint actions should affect a table which is used in statement causing these actions (we are not talking about self-referencing tables here, but about cases when we have delete from parent table which uses subquery with child table, or when one does multi-update which should update parent table but also uses child table for reading)? Or should we simply emit error like we do it for triggers? See appendix C.
A. This is a restriction of this worklog, we will simply emit an error.
Q. How we should handle INSERT IGNORE?
A. We should skip the row which does not satisfy the foreign key constraint and move on to the next row.
Q. Should we allow creation of foreign keys which are not supported by indexes on referencing tables?
A. We will create an index automatically if it doesn't exist, emit a warning that it was created automatically, and allow it to be dropped. If someone adds an index (in the same statement or later), the automatic index should be dropped.
Q. Should we check that the table satisfies to the constraint being added when we add it with ALTER TABLE? HLS says nothing about it.
A. A feture request from Monty: ALTER TABLE should have an option that says that you add a constraint without having to check it. Default mode: CHECK.
- We should also have ACTIVATE, ACTIVATE and CHECK commands.
- RENAME TABLE ... / DROP TABLE DEACTIVATE: doesn't drop dependent objects. Needed for quick restore in a data warehouse environment.
- This shall be addressed in a separate worklog item.
Q. How we should handle REFERENCES privilege? Should we really do nothing when it is revoked as HLS suggests?
A. Yes, this is not a security issue.
Q. What shall happen with a foreign key when an insert into a table happens that bypasses the SQL layer? Example: use of NDB API, direct insert into the source table in a federated setup.
A. Inserts through NDB API will bypass foreign key checks in the current architecture. We won't guarantee consistency of foreign keys if NDB API is used or a direct insert into the source federated table is used. The problem of NDB API should be dealed with in WL#1799 "Support for Foreign Keys in MySQL Cluster".
Q. Shall we support foreign keys across different storage engines?
A. The initial implementation will allow this possibility. A QA resource will be dedicated to explore possible consequences of this feature, and based on the results of the investigation this feature will be prohibited or allowed in the release.
[edit] References
- Innodb source code (as bundled with 5.0)
- (innobase/dict/dict0crea.c, dict0dict.c, dict0load.c, innobase/include/dict0mem.h, innobase/row/row0ins.c, row0upd.c, row0mysql.c)
- Postgres 8.0 source code
- (backend/utils/adt/ri_triggers.c, backend/commands/tablecmds.c, trigger.c, backend/executor/execMain.c)
- Ingres r3 source code
- (back/qef/qeq/qeaddl.c, psf/psl/pslcons.c)
- Ingres SQL Reference Guide
- Firebird2 source code
- (src/gpre/cmd.cpp, jrd/exe.cpp, ...)
- SQL-99 Complete, Really (pp. 403-442)
- ISO/IEC 9075-2:2003 (a.k.a. SQL 2003 standard)
- (Sections 4.17, 11.8, 11.20, 14.16 - 14.27)
[edit] Appendix A: Triggers implementing foreign key constraints
# Trigger on insertion into child table (MATCH SIMPLE)
create trigger child_on_insert before insert on child for each row
begin
if not ((new.fk1 is null) or (new.fk2 is null)) then
if not exists (select * from parent where pk1 = new.fk1 and pk2 = new.fk2) then
call my_error_fk_violation();
end if;
end if;
end;|
# The same trigger in case of MATCH FULL
create trigger child_on_insert before insert on child for each row
begin
if not ((new.fk1 is null) or (new.fk2 is null)) then
if not exists (select * from parent where pk1 = new.fk1 and pk2 = new.fk2) then
call my_error_fk_violation();
end if;
elseif not ((new.fk1 is null) and (new.fk2 is null)) then
call my_error_fk_violation();
end if;
end;|
# Trigger on deletion from parent table in case of ON DELETE RESTRICT
create trigger parent_on_delete before delete on parent for each row
begin
if exists(select * from child where fk1 = old.pk1 and fk2 = old.pk2) then
call my_error_fk_violation();
end if;
end;|
# Similar trigger for updating parent table (ON UPDATE RESTRICT)
create trigger parent_on_update before update on parent for each row
begin
if exists (select * from child where fk1 = old.pk1 and fk2 = old.pk2) then
call my_error_fk_violation();
end if;
end;|
# Trigger on deletion from parent table in case of ON DELETE SET DEFAULT create trigger parent_on_delete before delete on parent for each row begin update child set fk1= default(fk1), fk2= default(fk2) where fk1 = old.pk1 and fk2 = old.pk2; end;|
# Trigger for update for ON UPDATE CASCADE create trigger parent_on_update before update on parent for each row begin update child set fk1= new.pk1, fk2= new.pk2 where fk1 = old.pk1 and fk2 = old.pk2; end;|
[edit] Appendix B: an example of the violation of the standard in row-by-row order of constraint checks.
# Example 0: Batched-insert into a self-referencing table
CREATE TABLE MID1 (P_KEY DECIMAL(4) NOT NULL UNIQUE,
F_KEY DECIMAL(4) REFERENCES MID1(P_KEY));
# This will fail in case of row-by-row constraint checking INSERT INTO MID1 VALUES (1,1), (3,2), (2,1);
# Example 1: Update of a self-referencing table
CREATE TABLE MID1 (P_KEY DECIMAL(4) NOT NULL UNIQUE,
F_KEY DECIMAL(4) REFERENCES MID1(P_KEY));
INSERT INTO MID1 VALUES(1,1); INSERT INTO MID1 VALUES(2,1); INSERT INTO MID1 VALUES(3,2); INSERT INTO MID1 VALUES(4,3); INSERT INTO MID1 VALUES(5,1);
# This will also fail in case of row-by-row constraint checking # This case also exploits the shortocomng of row-by-row primary key checks. UPDATE MID1 SET P_KEY = P_KEY + 1, F_KEY = F_KEY + 1;
# Example 2: Deletion from a self-referencing table
CREATE TABLE STAFF_C (EMPNUM CHAR(3) NOT NULL,
EMPNAME CHAR(20),
GRADE DECIMAL(4),
CITY CHAR(15),
MGR CHAR(3),
UNIQUE (EMPNUM),
FOREIGN KEY (MGR)
REFERENCES STAFF_C(EMPNUM));
INSERT INTO STAFF_C VALUES('E1','Alice',12,'Deale',NULL);
INSERT INTO STAFF_C VALUES('E2','Betty',10,'Vienna','E1');
INSERT INTO STAFF_C VALUES('E3','Carmen',13,'Vienna','E2');
INSERT INTO STAFF_C VALUES('E4','Don',12,'Deale','E2');
INSERT INTO STAFF_C VALUES('E5','Don',12,'Deale','E1');
INSERT INTO STAFF_C VALUES('E6','Tom',14,'Gettysburg','E5');
INSERT INTO STAFF_C VALUES('E7','Kingdom',18,'Gettysburg','E7');
# This will give an error with the row-by-row approach DELETE FROM STAFF_C WHERE EMPNUM = 'E2' OR MGR = 'E2';
[edit] Appendix C. Visibility of updates
Consider the following test case:
drop table if exists t1,t2; create table t1 (id int); create table t2 (sum int);
create trigger t1_ai after insert on t1 for each row
insert into t2 values ((select sum(id) from t1));
create trigger t1_au after update on t1 for each row
insert into t2 values ((select sum(id) from t1));
insert into t1 (id) values (1), (2), (3);
Depending on the storage engine that is used for t1, this test produces different results.
- If the storage engine is MyISAM, t2 gets rows (1), (3), (6).
- If the storage engine is CSV, t2 gets rows (0), (0), (0).
- If the storage engine is NDB, t2 gets rows (NULL), (1), (3).
The results are also expected to differ in case of FEDERATED and ARCHIVE engines.
For self-referencing constraints, even if the check is performed at the end of a statement and not for each row, SQL layer needs to see the new state of the table, not the old one or an intermediate state in order to implement the standard-compliant semantics.