WL#5125: Refactory of Slave's master.info and relay_log.info

Affects: Server-Prototype Only
Status: Code-Review
Priority: Medium

CONTEXT
=======

We aim at having a slave that, after a crash, can resume normal operation without any human intervention. The idea is to make the information in master.info and relay-log.info, i.e. the replication positions or simply positions, transactional, persistent and reliable. In other words, the positions must be kept in sync with the execution of transactions on the slave: they advance when a transaction commits and are restored to their previous values when a transaction rolls back.

There are two different proposals to accomplish this goal, namely WL#2775 and WL#3970. The former proposes to exploit the transactional properties of the storage engines (e.g. InnoDB) and the latter to use a 2PC mechanism. Further details follow.

BACKGROUND
==========

Transactional Engines:
----------------------

"In a system using write ahead logging, all modifications are written to a log before they are applied. Usually both redo and undo information is stored in the log. The changes are applied in memory, and asynchronously flushed to disk."

2PC:
----

"The two phases of the algorithm are the prepare phase, in which a coordinator process attempts to prepare all the transaction's participating processes (named participants or cohorts) to take the necessary steps for either committing or aborting the transaction, and the commit phase, in which, based on voting (either "Yes," commit, or "No," abort) of the cohorts, the coordinator decides whether to commit (only if all vote "Yes") or abort the transaction, and notifies the result to the cohorts, which follow with the needed actions (commit or abort) with their transactional resources and their respective portions in the transaction's output."
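The protocol quoted above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not MySQL's implementation: the `Participant` class, its `can_commit` flag and the `two_phase_commit` function are hypothetical names, and durable writes are modeled only as appends to an in-memory log.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Participant:
    """Hypothetical 2PC participant (e.g. a storage engine or a positions file)."""
    name: str
    can_commit: bool = True      # the vote this participant will cast
    state: str = "active"        # active -> prepared -> committed / aborted
    log: List[str] = field(default_factory=list)  # stands in for a durable log

    def prepare(self) -> bool:
        # Phase 1: get ready to commit (a real participant would fsync here),
        # then vote.
        if self.can_commit:
            self.state = "prepared"
            self.log.append("prepare")
            return True          # vote "Yes"
        self.log.append("vote-no")
        return False             # vote "No"

    def commit(self) -> None:
        self.state = "committed"
        self.log.append("commit")

    def abort(self) -> None:
        self.state = "aborted"
        self.log.append("abort")


def two_phase_commit(participants: List[Participant]) -> bool:
    """Hypothetical coordinator: commit only if every participant votes "Yes"."""
    # Phase 1 (prepare): collect a vote from every cohort. For simplicity all
    # cohorts are asked, even after a "No" has already been received.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2 (commit): notify every cohort of the global decision.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

For example, with a storage engine and a positions file as the two participants, a "No" vote from either one aborts both, so the data and the positions cannot diverge.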
DESCRIPTION
===========

WL#2775
-------

It proposes using system tables to store the positions, taking advantage of the transactional properties of the storage engine.

Requirements:
1 - If the data and the positions are stored in different engines, all the engines involved must support 2PC in order to provide crash-safety.
2 - If the data and the positions are stored in the same engine, the engine must be transactional in order to provide crash-safety.

Advantages:
1 - It may be the fastest approach if the data and the positions are stored in the same engine.
2 - No special requirement is needed if the data and the positions are stored in the same engine, which means that all the current transactional engines can be used with this approach.

Disadvantages:
1 - Customers are used to managing files (i.e. master.info and relay-log.info), and this approach will eliminate those files. Since all position data is stored in database tables, it will not be possible to check the master.info and relay-log.info files offline. Administrators who are used to manipulating the files to "fix" replication will find this approach more complicated.

WL#3970
-------

It proposes to keep using the current files, i.e. master.info and relay-log.info, and to augment the current code base with a 2PC mechanism that makes the positions transactional, persistent and reliable.

Requirements:
1 - The engines must support 2PC in order to provide crash-safety.

Advantages:
1 - Customers are used to managing files (i.e. master.info and relay-log.info), and this approach keeps the same infrastructure. Thus administrators can still check the master.info and relay-log.info files offline and manipulate them to "fix" replication.

Disadvantages:
1 - It requires the engines to provide 2PC.
2 - It may harm performance due to extra fsyncs. See the analysis that follows.
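The core idea of WL#2775, when the data and the positions live in the same transactional engine, can be illustrated with Python's built-in `sqlite3` module. This is a hedged sketch, not the worklog's design: SQLite stands in for a transactional engine such as InnoDB, and the table `relay_log_positions`, its columns, and the helpers `open_slave_db`, `apply_event` and `current_position` are hypothetical. One transaction applies the replicated change and advances the position, so a commit moves both forward and a rollback restores both.

```python
import sqlite3
from typing import Tuple


def open_slave_db() -> sqlite3.Connection:
    # SQLite stands in for a transactional engine (assumption); the table
    # and column names below are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, val TEXT)")
    conn.execute(
        "CREATE TABLE relay_log_positions ("
        " id INTEGER PRIMARY KEY CHECK (id = 1),"
        " relay_log_name TEXT NOT NULL,"
        " relay_log_pos INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT INTO relay_log_positions VALUES (1, 'relay-bin.000001', 4)"
    )
    conn.commit()
    return conn


def apply_event(conn: sqlite3.Connection, row_id: int, val: str,
                new_pos: int, fail: bool = False) -> bool:
    """Apply one replicated transaction and advance the position atomically."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it
        # raises, so the data change and the position update move together.
        with conn:
            conn.execute("INSERT INTO t1 (id, val) VALUES (?, ?)", (row_id, val))
            conn.execute(
                "UPDATE relay_log_positions SET relay_log_pos = ? WHERE id = 1",
                (new_pos,),
            )
            if fail:
                # Simulate a failure after both writes but before the commit.
                raise RuntimeError("simulated failure before commit")
        return True
    except RuntimeError:
        return False


def current_position(conn: sqlite3.Connection) -> Tuple[str, int]:
    return conn.execute(
        "SELECT relay_log_name, relay_log_pos FROM relay_log_positions"
    ).fetchone()
```

Because both rows change inside a single engine transaction, no 2PC coordinator is involved, which is why the same-engine variant needs no extra fsyncs beyond the engine's own commit.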
FSYNC ANALYSIS
==============

In this analysis, we compare a vanilla MySQL server with possible implementations in order to determine the number of extra fsyncs required to make the solution crash-safe.

1 - Storing the positions along with the XID in the binlog.

BACKGROUND: If the binlog is enabled, the current 2PC implementation uses the XID stored in the binlog to decide, after a failure, whether a transaction should commit. In other words, in the second phase of 2PC, after all the participants have voted to commit a transaction, a failure while writing to the binlog causes the transaction to be rolled back when MySQL recovers, because its XID is not found in the binlog. Although the binlog is a participant in 2PC, it does nothing in the prepare phase and only needs an fsync in the commit phase. In this approach, we propose to store the positions along with the XID in the binlog file.

EXTRA-FSYNCS:
. One extra fsync per storage engine in the prepare phase of the protocol.
. An extra fsync while writing the positions along with the XID.
- Total 2 extra fsyncs.

2 - Storing the positions in a file separate from the binlog.

BACKGROUND: The new file is a participant in the 2PC protocol. In contrast to the approach described in (1), we propose here to store the positions in a new file specified by the user.

EXTRA-FSYNCS:
. One extra fsync per storage engine in the prepare phase of the protocol.
. Two extra fsyncs while writing the positions into the new file (prepare and commit phases).
- Total 3 extra fsyncs.

3 - Storing the positions in a separate file with the binlog enabled.

BACKGROUND: The new file is a participant in the 2PC protocol. This approach is similar to the one described in (2), but now the slave also acts as a master and therefore has the binlog enabled. Thus, regarding fsyncs, this approach combines (1) and (2), although the per-engine fsync in the prepare phase is counted only once.

EXTRA-FSYNCS:
. One extra fsync per storage engine in the prepare phase of the protocol.
. Two extra fsyncs while writing the positions into the new file (prepare and commit phases).
. One extra fsync while writing the XID.
- Total 4 extra fsyncs.

4 - Storing the positions in a system table that uses the same engine as the data.

BACKGROUND: The transactional mechanism of the storage engine is expected to hide any performance penalty. Note, however, that the implementation needs to be carefully designed to avoid creating unnecessary entries in the transactional log and to keep the data in memory.

EXTRA-FSYNCS:
- This is the best case: no extra fsyncs are needed.

5 - Storing the positions in a system table that uses a different storage engine from the data.

BACKGROUND: If the data is stored in a different storage engine from the positions, 2PC is required. This is equivalent to case (1).

EXTRA-FSYNCS:
. One extra fsync per storage engine in the prepare phase of the protocol.
. An extra fsync while writing the positions along with the XID.
- Total 2 extra fsyncs.

RELATED ISSUES
==============

There are other bugs and worklogs that also aim at making the slave crash-safe. A brief list follows:
1 - BUG#45292 aims at making the index file safe.
2 - WL#4621 handles the case in which master.info and relay-log.info are not in sync and the relay log is corrupted.
3 - There is no worklog or bug that handles the case in which the master's binary log is corrupted by a crash. The master has no positional information similar to what we have on the slave.

CONTEXT
=======

We propose the following design to the Rpl_info, Relay_log_info_file and