WL#5125: Refactory of Slave's master.info and relay_log.info

Affects: Server-Prototype Only — Status: Code-Review — Priority: Medium

CONTEXT
=======
We aim to have a slave that, after a crash, can resume normal operation without
any human intervention. The idea is to make master.info and relay-log.info,
i.e. the replication positions (or simply positions), transactional, persistent,
and reliable. In other words, the positions are kept in sync with the execution
of transactions on the slave: they are advanced when a transaction commits and
restored to their previous values when a transaction rolls back.
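
The commit/rollback behavior described above can be sketched as follows. This is
a minimal in-memory illustration only; Position and TransactionalPosition are
hypothetical names that do not exist in the server, and nothing here is made
durable.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified replication position (not the server's real type).
struct Position {
  std::string log_file;
  unsigned long long offset;
};

// Keeps the last committed position and a working copy for the open
// transaction.
class TransactionalPosition {
public:
  explicit TransactionalPosition(const Position &start)
      : committed_(start), working_(start), in_transaction_(false) {}

  void begin() {
    assert(!in_transaction_);
    working_ = committed_;
    in_transaction_ = true;
  }

  // Advance the working position as events of the transaction are applied.
  void advance(unsigned long long event_length) {
    assert(in_transaction_);
    working_.offset += event_length;
  }

  // Commit: the working position replaces the committed one. A real server
  // would make it durable together with the transaction's data at this point.
  void commit() {
    assert(in_transaction_);
    committed_ = working_;
    in_transaction_ = false;
  }

  // Rollback: discard the working copy, restoring the previous position.
  void rollback() {
    assert(in_transaction_);
    working_ = committed_;
    in_transaction_ = false;
  }

  const Position &committed() const { return committed_; }

private:
  Position committed_;
  Position working_;
  bool in_transaction_;
};
```

After begin(), advance(100) and rollback(), committed() still reports the
starting offset; after the same sequence ending in commit(), it reports the
starting offset plus 100.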

There are two different proposals to accomplish our goals: WL#2775 and WL#3970.
The former proposes to exploit the transactional properties of the storage
engines (e.g. InnoDB) and the latter to use a 2PC mechanism. Further details are
given in the sections that follow.

BACKGROUND
==========

Transactional Engines:
----------------------
"In a system using write ahead logging, all modifications are written to a log
before they are applied. Usually both redo and undo information is stored in the
log. The changes are applied in memory, and asynchronously flushed to disk."

2PC:
----
"The two phases of the algorithm are the prepare phase, in which a coordinator
process attempts to prepare all the transaction's participating processes (named
participants or cohorts) to take the necessary steps for either committing or
aborting the transaction, and the commit phase, in which, based on voting
(either "Yes," commit, or "No," abort) of the cohorts, the coordinator decides
whether to commit (only if all vote "Yes") or abort the transaction, and
notifies the result to the cohorts, which follow with the needed actions (commit
or abort) with their transactional resources and their respective portions in
the transaction's output."
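
The two phases quoted above can be sketched as a small coordinator loop. This is
an illustration under simplifying assumptions: Participant, two_phase_commit and
FixedVoteParticipant are hypothetical names, and the logging and fsync calls
that make a real 2PC recoverable are left out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical participant interface; the names are illustrative only.
class Participant {
public:
  virtual ~Participant() {}
  virtual bool prepare() = 0;  // vote: true means "Yes", false means "No"
  virtual void commit() = 0;
  virtual void abort() = 0;
};

// Coordinator: returns true if the transaction was committed.
inline bool two_phase_commit(const std::vector<Participant *> &participants) {
  // Prepare phase: ask every participant to vote.
  bool all_yes = true;
  for (std::size_t i = 0; i < participants.size(); ++i) {
    if (!participants[i]->prepare()) all_yes = false;
  }

  // Commit phase: notify every participant of the decision.
  for (std::size_t i = 0; i < participants.size(); ++i) {
    if (all_yes)
      participants[i]->commit();
    else
      participants[i]->abort();
  }
  return all_yes;
}

// Toy participant with a fixed vote, used only to exercise the sketch.
class FixedVoteParticipant : public Participant {
public:
  enum State { ACTIVE, COMMITTED, ABORTED };

  explicit FixedVoteParticipant(bool vote) : state(ACTIVE), vote_(vote) {}

  bool prepare() { return vote_; }
  void commit() { state = COMMITTED; }
  void abort() { state = ABORTED; }

  State state;

private:
  bool vote_;
};
```

With two participants voting "Yes", both end in COMMITTED; if either votes "No",
both end in ABORTED.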


DESCRIPTION
===========

WL#2775
-------
It proposes the use of system tables to store the positions and takes advantage
of the transactional properties of the engine.

Requirements:
1 - If the data and positions are stored in different engines, all the engines
involved must support 2PC in order to provide crash-safety.

2 - If the data and positions are stored in the same engine, the engine must be
transactional in order to provide crash-safety.

Advantages:
1 - It may be the fastest approach if data and positions are stored in the same
engine.

2 - No special requirement applies if data and positions are stored in the
same engine, which means that all the current transactional engines can be used
with this approach.

Disadvantages:
1 - Customers are used to managing files (i.e. master.info and relay-log.info)
and this approach will eliminate those files. Since all position data is stored
in database tables, it will not be possible to check the master.info and
relay-log.info files offline. Administrators who are used to manipulating the
files to "fix" replication will find this approach harder to work with.

WL#3970
-------
It proposes to keep using the current files, i.e. master.info and
relay-log.info, and to augment the current code base with a 2PC mechanism that
makes the positions transactional, persistent, and reliable.

Requirements:
1 - The engines must support 2PC in order to provide crash-safety.

Advantages:
1 - Customers are used to managing files (i.e. master.info and relay-log.info)
and this approach will keep the same infrastructure. Thus it remains possible to
check the master.info and relay-log.info files offline, and administrators who
are used to manipulating the files to "fix" replication can continue to do so.

Disadvantages:
1 - It will require the engines to provide 2PC.
2 - It may harm performance due to extra fsyncs. See the analysis in the next
section.


FSYNC ANALYSIS
==============

In this analysis, we compare a vanilla MySQL with possible implementations in
order to figure out the number of extra fsyncs required to make the solution
crash-safe.

1 - Storing positions along with the XID

  BACKGROUND:
  If the binlog is enabled, the current implementation of the 2-PC uses the
  XID stored in the binlog to decide whether a transaction should commit after
  a failure. In other words, if a failure occurs while writing to the binlog in
  the second phase of the 2-PC, after all the participants have voted to commit,
  the transaction is rolled back when MySQL recovers.

  Although the binlog is a participant in the 2-PC, it does nothing in the
  prepare phase and only requires an fsync in the commit phase.

  In this approach, we propose to store the positions along with XID in the
  binlog file.

  EXTRA-FSYNCS:
  . One extra fsync per storage engine in the prepare phase of the protocol.
  . An extra fsync while writing the positions along with the XID.
  - Total 2 extra fsyncs.

2 - Storing the positions in a different file from the binlog.

  BACKGROUND:
  The new file is a participant in the 2-PC protocol.

  In contrast to the approach described in (1), we propose here to store the
  positions in a new file to be specified by the user.

  EXTRA-FSYNCS:
  . One extra fsync per storage engine in the prepare phase of the protocol.
  . Two extra fsyncs while writing the positions into the new file (prepare and
  commit phases).
  - Total 3 extra fsyncs.

3 - Storing the positions in a different file with the binlog enabled.

  BACKGROUND:
  The new file is a participant in the 2-PC protocol.

  This approach is similar to the one described in (2), but now the slave also
  acts as a master and as such the binlog is enabled. Regarding fsyncs, this
  approach therefore combines (1) and (2), with the per-engine fsync in the
  prepare phase counted only once.

  EXTRA-FSYNCS:
  . One extra fsync per storage engine in the prepare phase of the protocol.
  . Two extra fsyncs while writing the positions into the new file (prepare and
  commit phases).
  . And an extra fsync while writing the XID.
  - Total 4 extra fsyncs.

4 - Storing the positions in a system table using the same engine as the data

  BACKGROUND:
  The transactional mechanism of the storage engine will hide any performance
  penalties. Note, however, that the implementation needs to be well designed
  to avoid creating unnecessary entries in the transactional log and keep the
  data in memory.

  EXTRA-FSYNCS:
  - This is the best case and there is no need for extra fsyncs.

5 - Storing the positions in a system table but using a different storage engine
    from the data.

  BACKGROUND:
  Note that if the data is stored in a different storage engine from the
  positions, a 2-PC is required. Regarding fsyncs, this is equivalent to case 1.

  EXTRA-FSYNCS:
  . One extra fsync per storage engine in the prepare phase of the protocol.
  . An extra fsync while writing the positions along with the XID.
  - Total 2 extra fsyncs.
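
The totals of the five cases above can be tallied with a few lines of code.
This is only a sketch of the arithmetic: ExtraFsyncs, total_extra_fsyncs and
kApproaches are hypothetical names, and multiplying the prepare-phase fsync by
the number of engines is an assumed reading of "per storage engine". The totals
stated in the analysis correspond to engines = 1.

```cpp
// Extra fsyncs of one approach, following the EXTRA-FSYNCS lists above.
struct ExtraFsyncs {
  unsigned engine_prepare;  // per storage engine, in the prepare phase
  unsigned positions_file;  // writes to a separate positions file
  unsigned xid_write;       // write of the XID to the binlog
};

// Total extra fsyncs per transaction for a given number of storage engines.
inline unsigned total_extra_fsyncs(const ExtraFsyncs &f, unsigned engines) {
  return f.engine_prepare * engines + f.positions_file + f.xid_write;
}

// Index i holds approach i + 1 of the analysis.
static const ExtraFsyncs kApproaches[5] = {
  {1, 0, 1},  // 1: positions stored along with the XID in the binlog
  {1, 2, 0},  // 2: separate positions file (prepare and commit phases)
  {1, 2, 1},  // 3: separate positions file with the binlog enabled
  {0, 0, 0},  // 4: system table in the same engine as the data
  {1, 0, 1},  // 5: system table in a different engine (equivalent to 1)
};
```

Calling total_extra_fsyncs(kApproaches[i], 1) for i = 0..4 reproduces the
totals listed in the analysis.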

RELATED ISSUES
==============

There are other bugs and worklogs that also have the goal of making the
slave safe. See a brief list below:

1 - BUG#45292 aims at making the index file safe.

2 - WL#4621 handles the case where master.info and relay-log.info are not in
sync and the relay log is corrupted.

3 - There is no worklog or bug to handle the case that the master gets its
binary log corrupted due to a crash. There is no positional information similar
to what we have on the slave.

CONTEXT
=======
The current code base is structured as follows:

         ------------------------------
         | Slave_reporting_capability |
         ------------------------------
           ^                        ^
           |                        |
----------------------    ----------------------
|    Master_info     |    |   Relay_log_info   |
----------------------    ----------------------

Slave_reporting_capability - provides reporting capabilities.

Master_info - handles information in the master.info file.

Relay_log_info - handles information in the relay_log.info.

Both the Master_info and Relay_log_info are designed to store data into files
and have several common internal structures such as thread pointers and mutexes
that are duplicated.

PROPOSAL
========
In this work, we propose to refactor the code and create the class Rpl_info,
which will be inherited by the Master_info and Relay_log_info classes. These
classes will provide methods for operations that are independent of the type of
persistence used (file, system tables, etc.) and will hide their associated
information behind a set of getter and setter methods.

Each type of persistence will have its own specialization of both Master_info
and Relay_log_info. In particular, the current implementation, which does not
provide crash-safety, will have the following classes: Master_info_file and
Relay_log_info_file. Furthermore, in order to ease the development of different
specializations (WL#2775 and WL#3970), we will design a common interface
specified in Rpl_info.

         ------------------------------
         | Slave_reporting_capability |
         ------------------------------
                        ^
                        |
             ----------------------
             |      Rpl_info      |
             ----------------------
                 ^            ^
                 |            |
----------------------    ----------------------
|    Master_info     |    |   Relay_log_info   |
----------------------    ----------------------
           ^                        ^
           |                        |
----------------------    ----------------------
|  Master_info_file  |    | Relay_log_info_file|
----------------------    ----------------------

Requirements
============
R1. It shall preserve the current behavior, which uses files to unreliably
store both the master.info and relay_log.info data.

R2. It shall make it easy to create other persistence mechanisms, as described
in WL#2775 and WL#3970.

R3. It shall make it easy to introduce options for choosing among the available
persistence mechanisms.

Rpl_info
========
This is an abstract class and provides the following interface:

check() verifies if the class is correctly initialized.

prepare_flush_info() prepares data to be flushed; in some cases, as in 2-PC
mechanisms, this also implies a flush.

flush_info() flushes data to a persistent storage system.

reset_info() cleans the storage system and internal data structures setting
the class to an initialization point.

end_info() flushes data and closes the storage system.

Note that there is no specific method to handle the initialization process as
this is delegated to the concrete classes described in what follows.
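
The interface above can be illustrated with a reduced stand-in. This is a
sketch only: Rpl_info_sketch and Memory_info_sketch are hypothetical names, the
mutexes, condition variables and thread state of the real class are omitted,
and "0 means success" is an assumed return convention. The public methods are
non-virtual and forward to private virtual do_*() hooks, so each persistence
type only overrides the hooks.

```cpp
// Reduced stand-in for the proposed Rpl_info interface.
class Rpl_info_sketch {
public:
  virtual ~Rpl_info_sketch() {}

  int check() { return do_check(); }
  int prepare_flush_info() { return do_prepare_flush_info(); }
  int flush_info() { return do_flush_info(); }
  int reset_info() { return do_reset_info(); }
  void end_info() { do_end_info(); }

protected:
  Rpl_info_sketch() {}

private:
  virtual int do_check() = 0;
  virtual int do_prepare_flush_info() = 0;
  virtual int do_flush_info() = 0;
  virtual int do_reset_info() = 0;
  virtual void do_end_info() = 0;

  // Declared but not defined, in order to disable copying.
  Rpl_info_sketch(const Rpl_info_sketch &);
  Rpl_info_sketch &operator=(const Rpl_info_sketch &);
};

// Hypothetical persistence type that keeps a single position in memory.
class Memory_info_sketch : public Rpl_info_sketch {
public:
  Memory_info_sketch() : pending_pos(0), stored_pos(0), open(true) {}

  unsigned long long pending_pos;  // value written by the next flush
  unsigned long long stored() const { return stored_pos; }

private:
  int do_check() { return open ? 0 : 1; }
  int do_prepare_flush_info() { return 0; }  // nothing to prepare in memory
  int do_flush_info() {
    if (!open) return 1;
    stored_pos = pending_pos;
    return 0;
  }
  int do_reset_info() {
    pending_pos = 0;
    stored_pos = 0;
    return 0;
  }
  void do_end_info() {
    do_flush_info();  // flush first, then close
    open = false;
  }

  unsigned long long stored_pos;
  bool open;
};
```

Callers hold the object through the base interface and never need to know
which persistence type is behind it, which is the property that R2 and R3 rely
on.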

Master_info
===========
. Master_info will handle the following information:
- Master_Log_File - The name of the master binary log currently being read
from the master.
- Read_Master_Log_Pos - The current position within the master binary log
that has been read from the master.
- Master_Host - The host name of the master.
- Master_User - The user name used to connect to the master.
- Password (not shown by SHOW SLAVE STATUS) - The password used to connect to
the master.
- Master_Port - The network port used to connect to the master.
- Connect_Retry - The period (in seconds) that the slave will wait before
trying to reconnect to the master.
- Master_SSL_Allowed - Indicates whether the server supports SSL connections.
- Master_SSL_CA_File - The file used for the Certificate Authority (CA)
certificate.
- Master_SSL_CA_Path - The path to the Certificate Authority (CA)
certificates.
- Master_SSL_Cert - The name of the SSL certificate file.
- Master_SSL_Cipher - The name of the cipher in use for the SSL connection.
- Master_SSL_Key - The name of the SSL key file.
- Master_SSL_Verify_Server_Cert - Whether to verify the server certificate.

Relay_log_info
==============
. Relay_log_info will handle the following information:
- Relay_Log_File - The name of the current relay log file.
- Relay_Log_Pos - The current position within the relay log file. Events up
to this position have been executed on the slave database.
- Relay_Master_Log_File - The name of the master binary log file from which
the events in the relay log file were read.
- Exec_Master_Log_Pos - The equivalent position within the master's binary log
file of events that have already been executed.

Master_info_file
================
. Master_info_file will implement the current behavior which unreliably stores
the information described in Master_info into a file.

Relay_log_info_file
===================
. Relay_log_info_file will implement the current behavior which unreliably
stores the information described in Relay_log_info into a file.

We propose the following design for Rpl_info, Relay_log_info_file and
Master_info_file:

. Explicitly disable copying (copy construction and copy assignment).
. Use a facade pattern.

class Rpl_info : public Slave_reporting_capability
==============
{
public:
  pthread_mutex_t data_lock, run_lock;
  pthread_cond_t data_cond, start_cond, stop_cond;

  THD *info_thd;

  volatile bool inited;
  volatile bool abort_slave;
  volatile uint slave_running;
  volatile ulong slave_run_id;

#ifndef DBUG_OFF
  int events_until_exit;
#endif

  /* Public non-virtual methods forward to the private virtual do_*() hooks. */
  inline int check()
  {
    return do_check();
  }

  inline int prepare_flush_info()
  {
    return do_prepare_flush_info();
  }

  inline int flush_info()
  {
    return do_flush_info();
  }

  inline int reset_info()
  {
    return do_reset_info();
  }

  inline void end_info()
  {
    do_end_info();
  }

  Rpl_info(const char* type);
  virtual ~Rpl_info();

private:
  virtual int do_check()= 0;
  virtual int do_init_info()= 0;
  virtual int do_prepare_flush_info()= 0;
  virtual int do_flush_info()= 0;
  virtual void do_end_info()= 0;
  virtual int do_reset_info()= 0;

  /* Declared private in order to disable copying. */
  Rpl_info& operator=(const Rpl_info& info);
  Rpl_info(const Rpl_info& info);
};


class Relay_log_info_file : public Relay_log_info
=========================
{
public:

  const char* info_fname;
  File info_fd;
  IO_CACHE info_file;
  uint sync_counter;

  Relay_log_info_file(bool is_slave_recovery, const char* info_name);

private:
  int do_check();
  int do_init_info();
  int do_prepare_flush_info();
  int do_flush_info();
  void do_end_info();
  int do_reset_info();

  Relay_log_info_file& operator=(const Relay_log_info_file& info);
  Relay_log_info_file(const Relay_log_info_file& info);
};


class Master_info_file : public Master_info
======================
{
public:

  const char* info_fname;
  File info_fd;
  IO_CACHE info_file;
  uint sync_counter;

  Master_info_file(const char* param_info_fname);

private:
  int do_check();
  int do_init_info();
  void do_end_info();
  int do_prepare_flush_info();
  int do_flush_info();
  int do_reset_info();

  Master_info_file& operator=(const Master_info_file& info);
  Master_info_file(const Master_info_file& info);
};

The following files should be added to accommodate the changes:
. rpl_info.h
. rpl_info.cc
. rpl_rli_file.h
. rpl_rli_file.cc
. rpl_mi_file.h
. rpl_mi_file.cc
