ReplicationFeatures/ReplicationInterface
(Draft document)
Introduction
(This is an early draft document.)
This project is run by the MySQL Replication and Backup team.
For the purpose of this description, we separate the architecture into a set of components, where each component is a cohesive unit of code that can (but does not have to) be distributed as a separate package.
When discussing interfaces, we have two kinds of roles that are associated with the interface: users that use the interface by calling functions in it and providers that implement the interface by associating functions with each function of the interface.
We consider two kinds of interfaces: callout interfaces and callin interfaces. A callout interface is used by the server, which calls certain functions at well-defined points in the code; plug-ins are the providers, in the sense that they supply an implementation for each of the functions. A callin interface is one where the server provides the implementation, and external components use the interface by calling its functions. The callin interfaces will be implemented as service APIs in the server, so in practice the notions of a (server) callin interface and a service API are interchangeable.
In this document, we will describe the interfaces as C++ namespaces. This does not mean that the actual implementation will be a C++ namespace.
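As a rough illustration of the two roles, the sketch below models a callin interface as a namespace of functions that the server implements, and a callout interface as a struct of function pointers that a plug-in fills in. All names in it (Server_service, write_log, Trans_callout, after_commit, server_commit) are hypothetical and chosen only for this example, and the convention that a callback returns true on error is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Callin interface: the server is the provider, components are the users.
// (Hypothetical service, not part of the proposed API.)
namespace Server_service {
  static std::vector<std::string> log_lines;   // stands in for server state
  inline int write_log(const std::string &msg) {
    log_lines.push_back(msg);
    return 0;                                  // 0 on success
  }
}

// Callout interface: the plug-in is the provider, the server is the user.
struct Trans_callout {
  bool (*after_commit)(int trans_id);          // may be null if not implemented
};

// Plug-in side: implements the callout and uses the callin.
static bool plugin_after_commit(int trans_id) {
  return Server_service::write_log("committed " + std::to_string(trans_id)) != 0;
}

// Server side: calls out at a well-defined point in the code.
// Returns true on error (assumed convention).
static bool server_commit(const Trans_callout *callout, int trans_id) {
  if (callout && callout->after_commit)
    return callout->after_commit(trans_id);
  return false;                                // no observer, nothing to report
}
```

The point of the split is that the plug-in never reaches into server internals directly: it is called through the callout struct and calls back only through the service namespace.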
Components
In order to describe the interfaces correctly, we have to introduce a set of (logical) components that the interfaces are associated with.
Master components
- Binary logging. This component is responsible for defining binlog transactions, logging events, and ensuring that events are handled by binlog observers.
- Binary log file. This component is responsible for storing logged events. This includes, but is not limited to, durability and serializability.
- Binlog Dump. Upon request, sends a sequence of events that are stored in a file to the client.
Slave components
- Binlog relay. Request events from the master, store them in the relay log file, and execute them to apply the modifications on the slave.
Observer components
- Binlog server observer. Observe binlog service start/stop.
- Binlog transaction observer. Observe binlog transactional activities.
- Binlog logging observer. Observe and handle events from the logger.
- Binlog dump observer. Observe master dumping events.
- Binlog relay IO/SQL observer. Observe slave relay log IO and SQL execution.
Auxiliary components
- Event handler. Can manipulate events.
MySQL Server Replication and logging APIs
Replication APIs are divided into the following categories:
- binlog logger functions
- binlog event packet functions
- filter functions for binlog and replication
- binlog file functions
- binlog dump service functions (master)
- binlog relay service functions (slave)
Binlogger API
- User: MySQL server
- Provider: MySQL server
- Files: log.h log.cc
Not used by semi-sync.
Binlogger is the manager class of the binary logging mechanism. One or more
Binlog_handlers can be attached to a Binlogger; they process the events
generated and propagated by the Binlogger.
This is used to implement a more universal replication interface that is suitable for many replication protocols; it is not required for semi-sync.
This separates binary event logging (generating) from handling (storing, processing), so that more than one binlog event processing mechanism can be attached.
/*
Binlogger is the class that does the logging of binary events.
It is used by the server to generate Statement/Row events for SQL
statements executed in the server and to propagate binary events to all
handlers for processing.
This class can also be used by components to log binary events for
SQL statements executed outside of the server or arbitrarily generated
events.
*/
class Binlogger {
public:
/* start/end of binlog transaction/statement
NOTE: can this be removed in favor of the start/end_trans in
Trans_handler?
*/
int start_log_trans(THD *thd);
int end_log_trans(THD *thd);
/* log stm/row events; filter rules are applied before generating
events, and events are propagated to all handlers by calling
log_event. These two functions are similar to the current
THD::binlog_query and THD::binlog_write/update/delete_row */
int log_query(THD *thd);
int log_row(THD *thd);
/* propagate the event to all handlers; this method is called by
log_query and log_row, and it can also be called to propagate an
arbitrary event */
int log_event(Log_event *event);
/* add/remove handlers to/from this logger */
int add_handler(Binlog_handler *handler);
int remove_handler(Binlog_handler *handler);
/* filter rules in this filter will be used before log_query and
log_row to determine if a given event should be generated and
propagated through all handlers */
Binlog_filter filter;
};
/* Logger used by MySQL server */
extern class Binlogger mysql_binlogger;
/*
Binlogger APIs
*/
/* Add/remove handlers to/from logger */
int binlog_add_handler(Binlogger *logger, Binlog_handler *handler);
int binlog_remove_handler(Binlogger *logger, Binlog_handler *handler);
/* log stm/row events */
int binlog_log_query(Binlogger *logger, THD *thd);
int binlog_log_row(Binlogger *logger, THD *thd);
/* Insert the given event into the logger; the event will be propagated
through all handlers of the logger */
int binlog_log_event(Binlogger *logger, Log_event *event);
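The separation between generating events and handling them can be sketched as a simple fan-out. The code below is only an illustration under several assumptions: Log_event is reduced to a stand-in struct, the Binlog_handler interface (which this document does not define) is given a hypothetical handle_event method, and log_event is assumed to stop at the first handler that reports an error.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the server's Log_event; only what the sketch needs.
struct Log_event {
  std::string data;
};

// Hypothetical handler interface; the document does not define it.
struct Binlog_handler {
  virtual ~Binlog_handler() {}
  virtual int handle_event(Log_event *event) = 0;   // 0 on success
};

class Binlogger {
public:
  int add_handler(Binlog_handler *handler) {
    if (!handler) return 1;
    handlers_.push_back(handler);
    return 0;
  }
  int remove_handler(Binlog_handler *handler) {
    auto it = std::find(handlers_.begin(), handlers_.end(), handler);
    if (it == handlers_.end()) return 1;            // was not attached
    handlers_.erase(it);
    return 0;
  }
  // Propagate one event to every attached handler, in attach order.
  // Stops at the first handler that reports an error (an assumption).
  int log_event(Log_event *event) {
    for (Binlog_handler *h : handlers_)
      if (int err = h->handle_event(event)) return err;
    return 0;
  }
private:
  std::vector<Binlog_handler *> handlers_;
};

// Example handler that stores events, standing in for the binlog file.
struct Storing_handler : Binlog_handler {
  std::vector<std::string> stored;
  int handle_event(Log_event *event) override {
    stored.push_back(event->data);
    return 0;
  }
};
```

With this shape, storing events in the binlog file becomes just one handler among several, which is what allows other processing mechanisms to be attached alongside it.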
Event handler API
- User: External components
- Provider: Component "Event handler" (to start with this will be provided by the MySQL server)
- Files: log_event.h log_event.cc
This is not used by semi-sync.
The interface is described with the following namespace:
namespace Event_handler {
int pack_event(Log_event *event, String *packet);
Log_event* unpack_event(const char *event_buf, ulong len);
const char* get_table(Log_event *event);
const char* get_database(Log_event *event);
bool is_ddl(Log_event *event);
bool is_dml(Log_event *event);
bool is_row(Log_event *event);
bool is_stm(Log_event *event);
};
Binlog filter API
- User: MySQL Server (or possibly external components)
- Provider: External component
- Depends on: Nothing, this is standalone
- Files: rpl_filter.h rpl_filter.cc
Not used by semi-sync.
A filter mechanism is provided on both the master side and the slave side. On the master side, filter rules determine whether a statement or transaction should be binlogged. On the slave side, filter rules determine whether an event should be applied.
Binlog filter functions
/*
Binlog_filter
Currently this is the same as Rpl_filter. It can be changed to support
more complicated filter mechanisms, for example, filter rules based
on some kind of pattern.
*/
typedef Rpl_filter Binlog_filter;
/* add filter rules to the filter object */
int binlog_filter_add_do_db(Binlog_filter *filter, const char *db);
int binlog_filter_add_ignore_db(Binlog_filter *filter, const char *db);
int binlog_filter_add_wild_do_db(Binlog_filter *filter, const char *db);
int binlog_filter_add_wild_ignore_db(Binlog_filter *filter, const char *db);
int binlog_filter_add_do_table(Binlog_filter *filter, const char *table);
int binlog_filter_add_ignore_table(Binlog_filter *filter, const char *table);
int binlog_filter_add_wild_do_table(Binlog_filter *filter, const char *table);
int binlog_filter_add_wild_ignore_table(Binlog_filter *filter, const char *table);
/* these three functions use the filter rules to do the filtering */
bool binlog_filter_db(Binlog_filter *filter, const char *db);
bool binlog_filter_table(Binlog_filter *filter, const char *table);
bool binlog_filter_event(Binlog_filter *filter, Log_event *event);
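To make the database-level rules concrete, here is a minimal sketch of how do/ignore rules could be evaluated. It assumes the same precedence as the existing Rpl_filter database check: if any do rule exists, only listed databases pass; otherwise any database not in the ignore list passes. Toy_filter and the toy_filter_* functions are hypothetical stand-ins, and since this document does not say whether binlog_filter_db returns true for "passes" or for "filtered out", the sketch uses an explicitly named toy_filter_db_ok instead.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for Binlog_filter; only database rules are modeled.
struct Toy_filter {
  std::set<std::string> do_db;
  std::set<std::string> ignore_db;
};

// Both add functions return 0 on success, 1 on bad arguments.
int toy_filter_add_do_db(Toy_filter *f, const char *db) {
  if (!f || !db) return 1;
  f->do_db.insert(db);
  return 0;
}

int toy_filter_add_ignore_db(Toy_filter *f, const char *db) {
  if (!f || !db) return 1;
  f->ignore_db.insert(db);
  return 0;
}

// Returns true if events for db should pass the filter.
bool toy_filter_db_ok(const Toy_filter *f, const char *db) {
  if (f->do_db.empty() && f->ignore_db.empty())
    return true;                          // no rules: everything passes
  if (!f->do_db.empty())
    return f->do_db.count(db) != 0;       // do rules take precedence
  return f->ignore_db.count(db) == 0;     // only ignore rules exist
}
```

Note that once a do rule is present the ignore list is no longer consulted, which is easy to overlook when both kinds of rule are configured.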
Binlog storage API
- User: MySQL Server
- Provider: External component
- Depends on: nothing
- Files: log.h log.cc
Not used by semi-sync.
TODO: Divide into writer and reader parts!
int binlog_file_open(Binlog_file *bf,
const char *log_name, const char *index_name);
int binlog_file_close(Binlog_file *bf);
/* binlog file is used by master for update */
int binlog_file_in_use(Binlog_file *bf);
int binlog_file_read_lock(Binlog_file *bf);
int binlog_file_write_lock(Binlog_file *bf);
int binlog_file_unlock(Binlog_file *bf);
/* seek from current position by events */
int binlog_file_seek_event(Binlog_file *bf, int count);
/* return current event number */
int binlog_file_tell_event(Binlog_file *bf);
/* seek from current position by bytes */
int binlog_file_seek_pos(Binlog_file *bf, my_off_t len);
/* return current position */
my_off_t binlog_file_tell_pos(Binlog_file *bf);
Log_event* binlog_file_read_event(Binlog_file *bf);
int binlog_file_write_event(Binlog_file *bf, Log_event *event);
int binlog_file_flush(Binlog_file *bf);
int binlog_file_purge(Binlog_file *bf);
int binlog_file_rotate(Binlog_file *bf);
/* return first/next/last binlog file */
Binlog_file* binlog_file_first(Binlog_file *bf);
Binlog_file* binlog_file_next(Binlog_file *bf);
Binlog_file* binlog_file_last(Binlog_file *bf);
/* generate binlog file name based on log_file if given or current
binlog filename if log_file is NULL */
const char* binlog_file_genname(const char *log_file);
/* Put the thread to sleep until updates have been flushed to the binlog file */
int binlog_file_wait_update(Binlog_file *bf, THD *thd, uint timeout);
typedef struct Binlog_file {
const char *log_file;
const char *index_file;
Binlog_index *bi;
IOCACHE *log;
pthread_mutex_t lock;
} Binlog_file;
Replication Transmitter/receiver API
- User: MySQL Server, external component
- Provider: MySQL Server
- Files:
Not used by semi-sync.
int rpl_send_event(THD *thd, Log_event *event);
Log_event* rpl_read_event(THD *thd, MYSQL *mysql);
rpl_send_event sends one event via the client connection. rpl_read_event reads one event.
Binlog dump service (master)
- User: external component
- Provider: external component
- Files: sql_repl.cc
Not used by semi-sync.
The binlog dump service is the service provided by the MySQL server dump thread, which reads events from the binlog file and sends them to the replication slave.
/* master dump */
int rpl_dump_binlog(THD *thd, const char *log_file,
my_off_t log_pos, int flags);
Binlog relay service (slave)
- User: external component
- Provider: external component
- Files: slave.cc
/* slave IO */
int rpl_connect_master(THD *thd, Master_info *mi);
int rpl_request_binlog(THD *thd, MYSQL *mysql, const char *log_file,
my_off_t log_pos, int flags);
/* slave SQL */
int rpl_apply_event(THD *thd, Relay_log_info *rli, Log_event *event);
int rpl_update_pos(THD *thd, Relay_log_info *rli, Log_event *event);
rpl_dump_binlog starts sending events from the requested log file and position via the client connection. rpl_connect_master connects to the given master. rpl_request_binlog requests the MySQL server to send events from the specified log file and position.
rpl_apply_event executes the event to update the database. rpl_update_pos advances the relay log position.
Replication plugin interface
A new kind of plugin has been added to extend replication, and a set of hooks for replication extensions is also provided.
Replication hooks are divided into several sub-categories according to the service role in replication:
- binlog server observer (Binlog_server_observer)
- binlog transaction observer (Binlog_trans_observer)
- binlog logging observer (Binlogging_observer)
- binlog file observer (Binlog_file_observer)
- binlog dump observer (replication master)
- binlog relay IO observer (replication slave IO)
- binlog relay execution observer (replication slave SQL)
The interface is designed to support multiple replication plugins and multiple instances of each observer, so that we can have one plugin for semi-sync, one for encrypted communication, and another for full-sync, and slaves can request different services according to their needs. Currently, however, there is no mechanism for expressing dependencies and conflicts between replication plugins.
Replicator
- User: MySQL server
- Provider: external component
struct Replicator is a combination of all these observers: it has one pointer per observer. Users can write a replication plugin by providing the callback functions in these observers to extend the replication mechanism of the MySQL server. It is not required to implement all callbacks in the observers; developers can selectively implement any callbacks in any observers. If no callbacks are provided for a channel, just set the corresponding observer pointer to NULL in the Replicator struct.
/**
@struct Replicator
@brief callback pointers for replication extension
Replicator is a set of binlog/master/slave observers defined by the user
to extend the binlog or replication mechanism.
Users do not need to provide all these observers; they can
selectively provide the channels they need.
*/
typedef struct Replicator {
uint32 len;
int interface_version;
/* Binlog server */
Binlog_server_observer *binlog_server;
/* Binlog transaction */
Binlog_trans_observer *binlog_trans;
/* Binlogging observer */
Binlogging_observer *binlogging;
/* Binlog storage observer */
Binlog_file_observer *binlog_storage;
/* Replication binlog dump observer */
Repl_dump_observer *binlog_dump;
/* Replication binlog relay IO observer */
Repl_relay_IO_observer *relay_io_observer;
/* Replication binlog relay SQL observer */
Repl_relay_SQL_observer *relay_sql_observer;
} Replicator;
/*
These two are convenience functions to register/unregister
binlog/master/slave channels whose corresponding pointer is not
null in the Replicator structure
*/
bool register_replicator(Replicator *replicator);
bool unregister_replicator(Replicator *replicator);
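The following sketch shows one way registration could treat NULL observer pointers as "channel not provided". It is an illustration only: the observer structs are cut down to a single field, the Replicator here has just two of the seven pointers, the per-kind registry vectors are hypothetical, and returning true on error is an assumed convention.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Minimal stand-ins for two of the observer structs.
struct Binlog_trans_observer { int interface_version; };
struct Repl_dump_observer    { int interface_version; };

// Reduced Replicator: a null pointer means the channel is not provided.
struct Replicator {
  Binlog_trans_observer *binlog_trans;
  Repl_dump_observer    *binlog_dump;
};

// Hypothetical server-side registry, one list per observer kind.
static std::vector<Binlog_trans_observer *> trans_observers;
static std::vector<Repl_dump_observer *>    dump_observers;

// Register every non-null observer. Returns true on error (assumed).
bool register_replicator(Replicator *r) {
  if (!r) return true;
  if (r->binlog_trans) trans_observers.push_back(r->binlog_trans);
  if (r->binlog_dump)  dump_observers.push_back(r->binlog_dump);
  return false;
}

// Remove every non-null observer that was registered earlier.
bool unregister_replicator(Replicator *r) {
  if (!r) return true;
  if (r->binlog_trans)
    trans_observers.erase(std::remove(trans_observers.begin(),
                                      trans_observers.end(),
                                      r->binlog_trans),
                          trans_observers.end());
  if (r->binlog_dump)
    dump_observers.erase(std::remove(dump_observers.begin(),
                                     dump_observers.end(),
                                     r->binlog_dump),
                         dump_observers.end());
  return false;
}
```

A plugin that only cares about transactions would fill in binlog_trans and leave every other pointer null, and the server would never call it on the other channels.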
Server/Transaction observer API
- User: MySQL server
- Provider: External "Transaction listener" component, e.g. the semi-synchronous replication component.
- Files: log.cc handler.cc sql_parse.cc mysqld.cc
Server observer is not used by semi-sync. Transaction observer is used by semi-sync.
This component includes callback pointers for binlog server start/stop and for binlog transactions.
- start is the point when the server is starting and the binlog service is initialized.
- stop is the point when the server is exiting and the binlog service is stopped.
- start_trans is when binary logging is started for a statement or a transaction (THD::binlog_start_trans_and_stmt).
- end_trans is when binary logging is ended for a statement or a transaction (THD::binlog_end_trans). This is after the binlog for the current transaction or statement has been written to the binlog file, and before the transaction is committed or rolled back.
- commit is when the transaction or statement has been committed in the storage engines. This is used by semi-sync to wait for a reply from the slave for the current transaction.
- rollback is when the transaction has been rolled back in the storage engines.
- recover is called to recover prepared transactions; the callback should do channel-specific recovery operations.
/*
Server observer
*/
typedef struct Binlog_server_observer {
uint32 len;
int interface_version;
THD *thd;
uint server_id;
/*
binlog service start/stop
*/
bool (*start)(Binlog_server_observer *observer);
bool (*stop)(Binlog_server_observer *observer);
} Binlog_server_observer;
/*
Binlog_transaction observer
*/
typedef struct Binlog_trans_observer {
uint32 len;
int interface_version;
THD *thd;
/*
binlog transaction
*/
bool (*start_trans)(Binlog_trans_observer *trans);
bool (*end_trans)(Binlog_trans_observer *trans);
bool (*prepare)(Binlog_trans_observer *trans);
bool (*commit)(Binlog_trans_observer *trans);
bool (*rollback)(Binlog_trans_observer *trans);
bool (*recover)(Binlog_trans_observer *trans, HASH xids);
} Binlog_trans_observer;
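The order in which the transaction callbacks fire for one successful transaction, as described in the list above, can be sketched as follows. This is a simplified model: the observer struct is reduced to four callbacks plus a hypothetical trace vector for recording calls, run_transaction is an invented stand-in for the server's commit path, and callbacks are assumed to return true on error.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Binlog_trans_observer {
  std::vector<std::string> *trace;   // hypothetical: records call order
  bool (*start_trans)(Binlog_trans_observer *trans);
  bool (*end_trans)(Binlog_trans_observer *trans);
  bool (*commit)(Binlog_trans_observer *trans);
  bool (*rollback)(Binlog_trans_observer *trans);
};

static bool on_start(Binlog_trans_observer *t) {
  t->trace->push_back("start_trans");
  return false;
}
static bool on_end(Binlog_trans_observer *t) {
  t->trace->push_back("end_trans");
  return false;
}
static bool on_commit(Binlog_trans_observer *t) {
  // A semi-sync style plugin would block here until the slave replies.
  t->trace->push_back("commit");
  return false;
}

// Invoke one callback if the observer provides it; true means error.
static bool call(Binlog_trans_observer *obs,
                 bool (*cb)(Binlog_trans_observer *)) {
  return cb ? cb(obs) : false;
}

// Hypothetical server-side flow for one successful transaction.
static bool run_transaction(Binlog_trans_observer *obs) {
  if (call(obs, obs->start_trans)) return true;
  // ... statements execute and are written to the binlog here ...
  if (call(obs, obs->end_trans)) return true;   // binlog written, not committed
  // ... storage engines commit here ...
  return call(obs, obs->commit);
}
```

Because unset callbacks are skipped, an observer that only implements commit still works with the same flow.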
Binlog logging observer
- User: MySQL server
- Provider: component
Not used by semi-sync.
Binlogging_observer is used to handle events from a Binlogger.
- emit_event is called when a binary log event is written to the transaction log or binlog file. In the callback, the event can be changed, and extra events can be inserted or appended.
/*
Binlogging_observer is used to handle events propagated
from a Binlogger.
Binary log events generated by a Binlogger are propagated to instances
of Binlog_handler for processing.
For example, the mechanism to store binlog events to binlog files
can be implemented as a Binlogging_observer. A Binlogging_observer can
also be used to do anything with the event propagated to it in the callback
function emit.
*/
typedef struct Binlogging_observer {
THD *thd;
uint server_id;
/* Reference to the logger of this observer; can be used to inject
arbitrary events into the logger */
Binlogger *logger;
/* filter for this observer; events propagated to this handler will
go through this filter before emit */
Binlog_filter filter;
/* actually process the event; the observer can store, ignore, change,
apply, or do anything else appropriate in this function */
bool (*handle_event)(struct Binlogging_observer *observer, Log_event *event);
} Binlogging_observer;
Storage observer API
- User: MySQL server
- Provider: External component
Used by semi-sync.
TODO: Consider if this is general logging and not only files.
This interface makes it possible to listen to the events that operate on the binary log. An agent wishing to observe the process needs to register an object implementing this interface with the server.
These are the callbacks for binlog file operations. This observer can be used to alter the behavior of the binlog file, for example, to do encryption or to update a backup binlog file.
Semi-sync uses only the report_update callback, to wait for the slave to catch up with the update on the master.
typedef struct Binlog_file_observer {
THD *thd;
Binlog_file *binlog_file;
bool (*open)(Binlog_file_observer *observer);
bool (*close)(Binlog_file_observer *observer);
bool (*seek_pos)(Binlog_file_observer *observer, my_off_t pos);
Log_event* (*read_event)(Binlog_file_observer *observer);
bool (*write_event)(Binlog_file_observer *observer, Log_event *event);
bool (*flush)(Binlog_file_observer *observer);
bool (*purge)(Binlog_file_observer *observer);
bool (*rotate)(Binlog_file_observer *observer);
/*
binlog file update
*/
bool (*report_update)(Binlog_file_observer *observer,
const char *log_file, my_off_t log_pos);
} Binlog_file_observer;
Replication transmitter observer API
- User: MySQL server
- Provider: External component
Used by semi-sync.
This API is used to control the behaviour of the replication master when dumping binlog events to slaves.
- send_event can be used to alter the event before it is sent to slaves, or to send extra events before/after the given event.
- reserve_header can be used to reserve extra space to store extra information. Used by semi-sync to reserve extra header bytes that indicate whether the slave should send a reply.
- write_header can be used to update this extra information before the event is sent out. Used by semi-sync to update the need-reply flag in the extra header bytes.
- reset can be used to do observer-specific cleanup on the master when the command RESET MASTER is issued. Used by semi-sync to reset the extra master status introduced by it.
/*
Replication binlog dump observer
callbacks for the replication master binlog dump mechanism
*/
typedef struct Repl_dump_observer {
int32 len;
int interface_version;
THD *thd;
/* Binlog dump flags set by slave when requesting dumping from
master, default is 0.
*/
int flags;
/*
replication binlog dump callbacks
*/
bool (*start)(Repl_dump_observer *observer,
const char *log_file, my_off_t log_pos);
bool (*stop)(Repl_dump_observer *observer);
bool (*send_event)(Repl_dump_observer *observer, String* packet,
const char *log_file, my_off_t log_pos);
int (*reserve_header)(Repl_dump_observer *observer, String *packet);
bool (*write_header)(Repl_dump_observer *observer, char *header);
bool (*reset)(Repl_dump_observer *observer);
} Repl_dump_observer;
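The interplay of reserve_header and write_header can be sketched like this. It is a toy under stated assumptions: std::string stands in for the server's String class, the header is assumed to be a single flag byte placed before the event body, the flag value 0x01 is invented for the example, and reserve_header is assumed to return the number of bytes it reserved. The real header layout is not specified in this document.

```cpp
#include <cassert>
#include <string>

typedef std::string Packet;                   // stand-in for the server's String

struct Toy_dump_observer {
  bool need_reply;                            // hypothetical per-event state
};

static const int  TOY_HEADER_LEN      = 1;    // assumed: one flag byte
static const char TOY_FLAG_NEED_REPLY = 0x01; // assumed value

// Reserve space at the current end of the packet, before the event body
// is appended. Returns the number of bytes reserved.
int toy_reserve_header(Toy_dump_observer *, Packet *packet) {
  packet->append(TOY_HEADER_LEN, '\0');
  return TOY_HEADER_LEN;
}

// Fill in the reserved bytes just before the packet is sent.
// Returns false when there is no error (assumed convention).
bool toy_write_header(Toy_dump_observer *obs, char *header) {
  header[0] = obs->need_reply ? TOY_FLAG_NEED_REPLY : 0;
  return false;
}
```

Splitting the work into two callbacks lets the dump thread lay out the packet first and decide the flag value late, once it knows whether a reply is wanted for this particular event.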
Replication relay IO service observer
- User: component
- Provider: component
Used by semi-sync.
This observer is used to alter the slave-side behaviour of binlog dumping (slave IO thread).
This observer is used by semi-sync to check whether the slave should provide a reply for an event.
- request_dump is called when issuing the command BINLOG_DUMP to the master to request binlog events. This callback can set the binlog flags or issue other commands, for example, SET commands to set some variables before dumping starts. Used by semi-sync to set the binlog flags to request semi-sync behaviour from the master.
- read_event is called when reading an event from the connection.
- queue_event is called when writing the event to the relay log.
- read_header can be used to read the extra information in the header reserved by reserve_header. Used by semi-sync to check whether this event needs a reply.
- remove_header removes the extra header reserved by reserve_header, so that the event can be processed by the default event operations. Used by semi-sync to remove the extra header from the event packet, so that the event can be processed by other parts of the server.
- reset is called to do observer-specific cleanup when the RESET SLAVE command is issued. Used by semi-sync to reset the extra slave status introduced by it.
/*
Replication binlog relay IO
*/
typedef struct Repl_relay_IO_observer {
int32 len;
int interface_version;
THD *thd;
Master_info *mi;
/*
replication relay IO callbacks
*/
bool (*start)(Repl_relay_IO_observer *param);
bool (*stop)(Repl_relay_IO_observer *param);
bool (*request_dump)(Repl_relay_IO_observer *param);
bool (*read_event)(Repl_relay_IO_observer *param,
const char *packet, ulong len,
const char **event_buf, ulong *event_len);
bool (*queue_event)(Repl_relay_IO_observer *param,
const char *event_buf, ulong event_len);
bool (*read_header)(Repl_relay_IO_observer *param, char *header);
bool (*remove_header)(Repl_relay_IO_observer *param, const char *header,
const char **event_buf, ulong *len);
bool (*reset)(Repl_relay_IO_observer *param);
} Repl_relay_IO_observer;
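On the slave side, read_header and remove_header undo what the master's header callbacks added. The sketch below assumes a one-byte header with an invented need-reply flag value of 0x01, and it treats the len argument of remove_header as an in/out parameter holding the packet length on entry and the event length on return; neither detail is stated in this document. The Toy_relay_io_observer struct and its need_reply field are hypothetical.

```cpp
#include <cassert>
#include <cstring>

struct Toy_relay_io_observer {
  bool need_reply;   // hypothetical: set by read_header for the current event
};

static const unsigned long TOY_HEADER_LEN      = 1;    // assumed: one flag byte
static const char          TOY_FLAG_NEED_REPLY = 0x01; // assumed value

// Inspect the extra header and remember whether a reply is wanted.
// Returns false when there is no error (assumed convention).
bool toy_read_header(Toy_relay_io_observer *obs, const char *header) {
  obs->need_reply = (header[0] & TOY_FLAG_NEED_REPLY) != 0;
  return false;
}

// Point event_buf past the extra header; no bytes are copied.
// On entry *len is the packet length, on return the event length.
bool toy_remove_header(Toy_relay_io_observer *, const char *header,
                       const char **event_buf, unsigned long *len) {
  if (*len < TOY_HEADER_LEN) return true;   // packet too short
  *event_buf = header + TOY_HEADER_LEN;
  *len -= TOY_HEADER_LEN;
  return false;
}
```

After remove_header the rest of the IO thread sees an ordinary event buffer, so the default event handling does not need to know that an extra header ever existed.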
Replication relay SQL service observer
- User: component
- Provider: component
Could be used by semi-sync.
This observer can be used to alter the behaviour of relay log execution.
This observer can be used to implement full-sync replication by sending a reply after the slave has executed the event.
- apply_event is called to execute the given event and apply the modification to the database. The callback function can alter the event or apply extra events in it.
- reset is called to do observer-specific cleanup when the RESET SLAVE command is issued.
/*
Replication relay SQL observer
callbacks for slave SQL thread
*/
typedef struct Repl_relay_SQL_observer {
uint32 len;
int interface_version;
THD *thd;
Relay_log_info *rli;
bool (*start)(Repl_relay_SQL_observer *param);
bool (*stop)(Repl_relay_SQL_observer *param);
bool (*apply_event)(Repl_relay_SQL_observer *param, Log_event *event);
bool (*reset)(Repl_relay_SQL_observer *param);
} Repl_relay_SQL_observer;