ReplicationFeatures/ReplicationInterface

(Draft document)


Introduction

(This is an early draft document.)

This project is run by the MySQL Replication and Backup team.

For the purpose of this description, we separate the architecture into a set of components, where each component is a cohesive unit of code that can (but does not have to) be distributed as a separate package.

When discussing interfaces, we distinguish two roles associated with an interface: users, which use the interface by calling its functions, and providers, which implement the interface by supplying an implementation for each of its functions.

We consider two kinds of interfaces: callout interfaces and callin interfaces. Callout interfaces are used by the server, which calls certain functions at well-defined points in the code; plug-ins are providers in the sense that they supply an implementation for each of these functions. Callin interfaces are interfaces where the server provides the implementation and external components use them by calling their functions. Callin interfaces will be implemented as service APIs in the server, so in practice the terms (server) callin interface and service API are interchangeable.

In this document, we will describe the interfaces as C++ namespaces. This does not mean that the actual implementation will be a C++ namespace.
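To make the two directions concrete, here is a minimal sketch. It assumes a plug-in exposes a table of function pointers that the server calls (callout) and the server exposes a service table that the plug-in calls (callin). All names (Server_service, Plugin_hooks, server_on_commit, plugin_after_commit) are hypothetical, and the convention that a true return signals an error is an assumption borrowed from common MySQL server style.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Callin interface (hypothetical): implemented by the server, called by
// plug-ins.
struct Server_service {
  std::vector<std::string> *log;  // where the server records messages
  void (*log_message)(Server_service *svc, const std::string &msg);
};

static void server_log_message(Server_service *svc, const std::string &msg) {
  assert(svc != nullptr && svc->log != nullptr);
  svc->log->push_back(msg);
}

// Callout interface (hypothetical): called by the server at a well-defined
// point; a NULL pointer means the plug-in does not provide the function.
struct Plugin_hooks {
  bool (*after_commit)(Server_service *svc, unsigned long long trx_id);
};

// Plug-in side: provides a callout and uses a callin.
static bool plugin_after_commit(Server_service *svc, unsigned long long trx_id) {
  svc->log_message(svc, "committed " + std::to_string(trx_id));
  return false;  // assumed convention: false = success, true = error
}

// Server side: the well-defined point that invokes the callout, if provided.
static bool server_on_commit(Plugin_hooks *hooks, Server_service *svc,
                             unsigned long long trx_id) {
  if (hooks != nullptr && hooks->after_commit != nullptr)
    return hooks->after_commit(svc, trx_id);
  return false;
}
```

A NULL callout pointer simply means the plug-in does not provide that function, which is how the observer structures in this document are meant to be filled in selectively.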

Components

In order to describe the interfaces correctly, we have to introduce a set of (logical) components that the interfaces are associated with.

Master components

  1. Binary logging. This component is responsible for defining binlog transactions, logging events, and ensuring that events are handled by binlog observers.
  2. Binary log file. This component is responsible for storing logged events. This includes, but is not limited to, durability and serializability.
  3. Binlog Dump. Upon request, sends a sequence of events stored in a file to the client.

Slave components

  1. Binlog relay. Requests events from the master, stores them in the relay log file, and executes them to apply the modifications on the slave.

Observer components

  1. Binlog server observer. Observes binlog service start/stop.
  2. Binlog transaction observer. Observes binlog transactional activities.
  3. Binlog logging observer. Observes and handles events from the logger.
  4. Binlog dump observer. Observes the master dumping events.
  5. Binlog relay IO/SQL observer. Observes slave relay log IO and SQL execution.

Auxiliary components

  1. Event handler. Can manipulate events.

MySQL Server Replication and logging APIs

The replication APIs are divided into the following categories:

  1. binlog logger functions
  2. binlog event packet functions
  3. filter functions for binlog and replication
  4. binlog file functions
  5. binlog dump service functions (master)
  6. binlog relay service functions (slave)

Binlogger API

Not used by semi-sync.


Binlogger is the manager class of the binary logging mechanism. One or more Binlog_handlers can be attached to a Binlogger; they process the events that the Binlogger generates and propagates.

It is intended to provide a more universal replication interface, suitable for many replication protocols; it is not needed for semi-sync.

This separates binary event logging (generating events) from event handling (storing and processing them), so that more than one binlog event processing mechanism can be attached.

   /*
     Binlogger is the class that does the logging of binary events.
   
     It is used by the server to generate Statement/Row events for SQL
     statements executed in the server and to propagate binary events to
     all handlers for processing.
   
     This class can also be used by components to log binary events for
     SQL statements executed outside of the server, or arbitrarily
     generated events.
    */
   class Binlogger {
   public:
     /* start/end of a binlog transaction/statement
   
        NOTE: could these be removed in favor of start/end_trans in
        Trans_handler?
      */
     int start_log_trans(THD *thd);
     int end_log_trans(THD *thd);
   
     /* log statement/row events. Filter rules are applied before
        generating events, and events are propagated to all handlers by
        calling log_event. These two functions are similar to the current
        THD::binlog_query and THD::binlog_write/update/delete_row */
     int log_query(THD *thd);
     int log_row(THD *thd);
   
     /* propagate the event to all handlers. This method is called by
        log_query and log_row; it can also be called to propagate an
        arbitrary event */
     int log_event(Log_event *event);
   
     /* add/remove handlers to/from this logger */
     int add_handler(Binlog_handler *handler);
     int remove_handler(Binlog_handler *handler);
   
     /* rules in this filter are used by log_query and log_row to
        determine whether a given event should be generated and
        propagated to all handlers */
     Binlog_filter filter;
   };
   
   /* Logger used by MySQL server */
   extern class Binlogger mysql_binlogger;
   
   /*
     Binlogger APIs
    */
   
   /* Add/remove handlers to/from logger */
   int binlog_add_handler(Binlogger *logger, Binlog_handler *handler);
   int binlog_remove_handler(Binlogger *logger, Binlog_handler *handler);
   
   /* log stm/row events */
   int binlog_log_query(Binlogger *logger, THD *thd);
   int binlog_log_row(Binlogger *logger, THD *thd);
   
   /* Insert the given event into the logger; the event will be propagated
      to all handlers of the logger */
   int binlog_log_event(Binlogger *logger, Log_event *event);
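
The fan-out from a Binlogger to its handlers can be sketched as follows. This is a simplified illustration, not the proposed implementation: Log_event is reduced to two strings, Binlog_handler is modelled as an abstract C++ class with a hypothetical handle method returning 0 on success, and filtering is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for Log_event; the real class carries much more.
struct Log_event {
  std::string db;
  std::string payload;
};

// Hypothetical handler interface: returns 0 on success, non-zero on error.
class Binlog_handler {
 public:
  virtual ~Binlog_handler() = default;
  virtual int handle(const Log_event &event) = 0;
};

class Binlogger {
 public:
  int add_handler(Binlog_handler *handler) {
    assert(handler != nullptr);
    if (std::find(handlers_.begin(), handlers_.end(), handler) != handlers_.end())
      return 1;  // already attached
    handlers_.push_back(handler);
    return 0;
  }
  int remove_handler(Binlog_handler *handler) {
    auto it = std::find(handlers_.begin(), handlers_.end(), handler);
    if (it == handlers_.end()) return 1;  // not attached
    handlers_.erase(it);
    return 0;
  }
  // Propagate one event to every attached handler, in attach order.
  // Stops at the first handler that reports an error.
  int log_event(const Log_event &event) {
    for (Binlog_handler *h : handlers_) {
      int err = h->handle(event);
      if (err != 0) return err;
    }
    return 0;
  }

 private:
  std::vector<Binlog_handler *> handlers_;
};

// Example handler that records payloads; a file writer would append to
// the binlog file here instead.
class Recording_handler : public Binlog_handler {
 public:
  int handle(const Log_event &event) override {
    seen.push_back(event.payload);
    return 0;
  }
  std::vector<std::string> seen;
};
```

Stopping at the first failing handler is one possible design choice; an implementation could instead notify every handler and report the first error.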


Event handler API

This is not used by semi-sync.

The interface is described with the following namespace:

 namespace Event_handler {
   int pack_event(Log_event *event, String *packet);
   Log_event* unpack_event(const char *event_buf, ulong len);
   const char* get_table(Log_event *event);
   const char* get_database(Log_event *event);
   bool is_ddl(Log_event *event);
   bool is_dml(Log_event *event);
   bool is_row(Log_event *event);
   bool is_stm(Log_event *event);
 };
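A round trip through pack_event and unpack_event can be sketched with a made-up wire format: one type byte followed by the database and table names, each prefixed by a one-byte length. The event struct, the type codes STMT_EVENT_SKETCH and ROW_EVENT_SKETCH, and the use of std::string and std::unique_ptr in place of MySQL's String and a raw Log_event* are all assumptions of this sketch; the real binary log event format is different.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Made-up type codes for this sketch only.
enum Event_type : unsigned char { STMT_EVENT_SKETCH = 1, ROW_EVENT_SKETCH = 2 };

// Heavily simplified stand-in for Log_event.
struct Log_event {
  Event_type type;
  std::string database;
  std::string table;
};

// Append a string prefixed by a one-byte length (max 255 bytes here).
static bool put_str(std::string *packet, const std::string &s) {
  if (s.size() > 255) return false;
  packet->push_back(static_cast<char>(s.size()));
  packet->append(s);
  return true;
}

// Returns 0 on success, 1 if a name is too long for this format.
int pack_event(const Log_event *event, std::string *packet) {
  assert(event != nullptr && packet != nullptr);
  packet->clear();
  packet->push_back(static_cast<char>(event->type));
  if (!put_str(packet, event->database) || !put_str(packet, event->table))
    return 1;
  return 0;
}

// Returns nullptr if the buffer is truncated or has an unknown type code.
std::unique_ptr<Log_event> unpack_event(const char *buf, unsigned long len) {
  unsigned long pos = 0;
  auto get_str = [&](std::string *out) {
    if (pos >= len) return false;
    unsigned long n = static_cast<unsigned char>(buf[pos++]);
    if (len - pos < n) return false;
    out->assign(buf + pos, n);
    pos += n;
    return true;
  };
  if (len == 0) return nullptr;
  unsigned char t = static_cast<unsigned char>(buf[pos++]);
  if (t != STMT_EVENT_SKETCH && t != ROW_EVENT_SKETCH) return nullptr;
  auto event = std::make_unique<Log_event>();
  event->type = static_cast<Event_type>(t);
  if (!get_str(&event->database) || !get_str(&event->table)) return nullptr;
  return event;
}

bool is_row(const Log_event *event) { return event->type == ROW_EVENT_SKETCH; }
bool is_stm(const Log_event *event) { return event->type == STMT_EVENT_SKETCH; }
```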

Binlog filter API

Not used by semi-sync.

A filter mechanism is provided on both the master side and the slave side. On the master side, filter rules determine whether a statement or transaction should be binlogged. On the slave side, filter rules determine whether an event should be applied.

Binlog filter functions

   /*
     Binlog_filter
   
     Currently this is the same as Rpl_filter. It can be changed to
     support more complicated filter mechanisms, for example, filter
     rules based on some kind of pattern.
    */
   typedef Rpl_filter Binlog_filter;
   
   /* add filter rules to the filter object */
   int binlog_filter_add_do_db(Binlog_filter *filter, const char *db);
   int binlog_filter_add_ignore_db(Binlog_filter *filter, const char *db);
   int binlog_filter_add_wild_do_db(Binlog_filter *filter, const char *db);
   int binlog_filter_add_wild_ignore_db(Binlog_filter *filter, const char *db);
   int binlog_filter_add_do_table(Binlog_filter *filter, const char *table);
   int binlog_filter_add_ignore_table(Binlog_filter *filter, const char *table);
   int binlog_filter_add_wild_do_table(Binlog_filter *filter, const char *table);
   int binlog_filter_add_wild_ignore_table(Binlog_filter *filter, const char *table);
   
   /* these three functions use the filter rules to do the filtering */
   bool binlog_filter_db(Binlog_filter *filter, const char *db);
   bool binlog_filter_table(Binlog_filter *filter, const char *table);
   bool binlog_filter_event(Binlog_filter *filter, Log_event *event);
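
A simplified model of the database-level rules, assuming that any do rule puts the filter in whitelist mode (only matching databases pass) and that otherwise the ignore rules act as a blacklist. Wild rules use SQL LIKE-style patterns (% and _, no escape character). The Binlog_filter struct here is a hypothetical stand-in for Rpl_filter, whose exact precedence rules may differ, and binlog_filter_db is assumed to return true when the database passes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// SQL LIKE-style match: '%' matches any sequence (including empty), '_'
// matches exactly one character. No escape character in this sketch.
static bool wild_match(const char *pat, const char *str) {
  if (*pat == '\0') return *str == '\0';
  if (*pat == '%')
    return wild_match(pat + 1, str) || (*str != '\0' && wild_match(pat, str + 1));
  if (*str != '\0' && (*pat == '_' || *pat == *str))
    return wild_match(pat + 1, str + 1);
  return false;
}

// Hypothetical stand-in for Rpl_filter, database rules only.
struct Binlog_filter {
  std::vector<std::string> do_db, ignore_db, wild_do_db, wild_ignore_db;
};

int binlog_filter_add_do_db(Binlog_filter *f, const char *db) { f->do_db.push_back(db); return 0; }
int binlog_filter_add_ignore_db(Binlog_filter *f, const char *db) { f->ignore_db.push_back(db); return 0; }
int binlog_filter_add_wild_do_db(Binlog_filter *f, const char *db) { f->wild_do_db.push_back(db); return 0; }
int binlog_filter_add_wild_ignore_db(Binlog_filter *f, const char *db) { f->wild_ignore_db.push_back(db); return 0; }

static bool matches(const std::vector<std::string> &exact,
                    const std::vector<std::string> &wild, const char *name) {
  for (const std::string &e : exact)
    if (e == name) return true;
  for (const std::string &w : wild)
    if (wild_match(w.c_str(), name)) return true;
  return false;
}

// Returns true if events for db should pass (be logged or applied).
bool binlog_filter_db(const Binlog_filter *f, const char *db) {
  assert(f != nullptr && db != nullptr);
  if (!f->do_db.empty() || !f->wild_do_db.empty())
    return matches(f->do_db, f->wild_do_db, db);         // whitelist mode
  return !matches(f->ignore_db, f->wild_ignore_db, db);  // blacklist mode
}
```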


Binlog storage API

Not used by semi-sync.

TODO: Divide into writer and reader parts!

   typedef struct Binlog_file {
       const char *log_file;
       const char *index_file;
       Binlog_index *bi;
       IO_CACHE *log;
       pthread_mutex_t lock;
   } Binlog_file;
   
   int binlog_file_open(Binlog_file *bf,
                        const char *log_name, const char *index_name);
   int binlog_file_close(Binlog_file *bf);
   /* check whether the binlog file is in use by the master for updates */
   int binlog_file_in_use(Binlog_file *bf);
   int binlog_file_read_lock(Binlog_file *bf);
   int binlog_file_write_lock(Binlog_file *bf);
   int binlog_file_unlock(Binlog_file *bf);
   /* seek relative to the current position, counted in events */
   int binlog_file_seek_event(Binlog_file *bf, int count);
   /* return the current event number */
   int binlog_file_tell_event(Binlog_file *bf);
   /* seek relative to the current position, counted in bytes */
   int binlog_file_seek_pos(Binlog_file *bf, my_off_t len);
   /* return the current position */
   my_off_t binlog_file_tell_pos(Binlog_file *bf);
   Log_event* binlog_file_read_event(Binlog_file *bf);
   int binlog_file_write_event(Binlog_file *bf, Log_event *event);
   int binlog_file_flush(Binlog_file *bf);
   int binlog_file_purge(Binlog_file *bf);
   int binlog_file_rotate(Binlog_file *bf);
   /* return the first/next/last binlog file */
   Binlog_file* binlog_file_first(Binlog_file *bf);
   Binlog_file* binlog_file_next(Binlog_file *bf);
   Binlog_file* binlog_file_last(Binlog_file *bf);
   /* generate a binlog file name based on log_file if given, or on the
      current binlog file name if log_file is NULL */
   const char* binlog_file_genname(const char *log_file);
   /* put the thread to sleep until updates are flushed to the binlog
      file, or until the timeout expires */
   int binlog_file_wait_update(Binlog_file *bf, THD *thd, uint timeout);
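
The difference between event-based and byte-based positioning can be sketched with an in-memory stand-in for Binlog_file. The struct layout, the std::string event payloads, and reading into an output parameter instead of returning Log_event* are assumptions for illustration only; locking, rotation and the index file are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

typedef std::uint64_t my_off_t;  // stand-in for MySQL's my_off_t

// Hypothetical in-memory Binlog_file: events are opaque byte strings, and a
// cursor tracks the next event to read.
struct Binlog_file {
  std::vector<std::string> events;
  std::vector<my_off_t> offsets;  // byte offset where each event starts
  my_off_t size = 0;              // total bytes written
  std::size_t cursor = 0;         // index of the next event to read
};

int binlog_file_write_event(Binlog_file *bf, const std::string &event) {
  bf->offsets.push_back(bf->size);
  bf->events.push_back(event);
  bf->size += event.size();
  return 0;
}

// Seek relative to the current position by count events (may be negative).
// Returns 0 on success, 1 if the target is out of range.
int binlog_file_seek_event(Binlog_file *bf, int count) {
  long long target = static_cast<long long>(bf->cursor) + count;
  if (target < 0 || target > static_cast<long long>(bf->events.size())) return 1;
  bf->cursor = static_cast<std::size_t>(target);
  return 0;
}

int binlog_file_tell_event(const Binlog_file *bf) {
  return static_cast<int>(bf->cursor);
}

// Byte position of the cursor: start of the next event, or end of file.
my_off_t binlog_file_tell_pos(const Binlog_file *bf) {
  return bf->cursor < bf->offsets.size() ? bf->offsets[bf->cursor] : bf->size;
}

// Returns false at end of file; otherwise copies the event and advances.
bool binlog_file_read_event(Binlog_file *bf, std::string *out) {
  if (bf->cursor >= bf->events.size()) return false;
  *out = bf->events[bf->cursor++];
  return true;
}
```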

Replication Transmitter/receiver API

Not used by semi-sync.

   int rpl_send_event(THD *thd, Log_event *event);
   Log_event* rpl_read_event(THD *thd, MYSQL *mysql);

rpl_send_event sends one event via the client connection. rpl_read_event reads one event from the given MYSQL connection.
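
A minimal sketch of the send/read pair over a byte stream, assuming a hypothetical Connection buffer in place of THD/MYSQL and a simple 4-byte little-endian length prefix per event. This framing is illustrative only and is not the MySQL client/server packet format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical connection: a byte buffer standing in for the socket.
struct Connection {
  std::string wire;
  std::size_t read_pos = 0;
};

// Frame: 4-byte little-endian length, then the event bytes.
int rpl_send_event(Connection *conn, const std::string &event) {
  if (event.size() > UINT32_MAX) return 1;
  std::uint32_t n = static_cast<std::uint32_t>(event.size());
  for (int i = 0; i < 4; ++i)
    conn->wire.push_back(static_cast<char>((n >> (8 * i)) & 0xff));
  conn->wire.append(event);
  return 0;
}

// Returns false if a complete frame is not yet available.
bool rpl_read_event(Connection *conn, std::string *event) {
  const std::string &w = conn->wire;
  if (w.size() - conn->read_pos < 4) return false;
  std::uint32_t n = 0;
  for (int i = 0; i < 4; ++i)
    n |= static_cast<std::uint32_t>(
             static_cast<unsigned char>(w[conn->read_pos + i])) << (8 * i);
  if (w.size() - conn->read_pos - 4 < n) return false;
  event->assign(w, conn->read_pos + 4, n);
  conn->read_pos += 4 + n;
  return true;
}
```

Because the length travels with each event, the reader never needs to interpret event contents to find event boundaries, and binary payloads containing zero bytes survive intact.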

Binlog dump service (master)

Not used by semi-sync.

The binlog dump service is provided by the MySQL server dump thread, which reads events from the binlog file and sends them to the replication slave.

   /* master dump */
   int rpl_dump_binlog(THD *thd, const char *log_file,
                       my_off_t log_pos, int flags);
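
The core loop of the dump service can be sketched as: skip to the requested position, then send every following event in order. Stored_event, dump_binlog, and the send callback (a stand-in for rpl_send_event) are hypothetical; the log file name and flags are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

typedef std::uint64_t my_off_t;  // stand-in for MySQL's my_off_t

// Hypothetical stored event: its start offset in the log and its bytes.
struct Stored_event {
  my_off_t pos;
  std::string bytes;
};

// Skip events before log_pos, then send the rest in order. send returns 0
// on success. Returns 0 on success, 1 if log_pos is not the start of an
// event, or the first non-zero value returned by send.
int dump_binlog(const std::vector<Stored_event> &log, my_off_t log_pos,
                const std::function<int(const Stored_event &)> &send) {
  std::size_t i = 0;
  while (i < log.size() && log[i].pos < log_pos) ++i;
  if (i < log.size() && log[i].pos != log_pos) return 1;  // mid-event position
  for (; i < log.size(); ++i) {
    int err = send(log[i]);
    if (err != 0) return err;
  }
  return 0;
}
```

A real dump thread would not return when it reaches the end of the log; it would wait for new updates (compare binlog_file_wait_update) and continue sending.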

Binlog relay service (slave)

   /* slave IO */
   int rpl_connect_master(THD *thd, Master_info *mi);
   int rpl_request_binlog(THD *thd, MYSQL *mysql, const char *log_file,
                          my_off_t log_pos, int flags);
   /* slave SQL */
   int rpl_apply_event(THD *thd, Relay_log_info *rli, Log_event *event);
   int rpl_update_pos(THD *thd, Relay_log_info *rli, Log_event *event);

rpl_dump_binlog starts sending events via the client connection, beginning at the requested log file and position. rpl_connect_master connects to the given master. rpl_request_binlog requests the MySQL server to send events starting from the specified log file and position.

rpl_apply_event executes an event to update the database. rpl_update_pos advances the relay log position.
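
The interplay of rpl_apply_event and rpl_update_pos on the slave SQL side can be sketched as a loop that advances the position only after an event has been applied successfully, so a failure leaves the position pointing at the failed event. The Log_event and Relay_log_info stand-ins, the end_pos field, and run_sql_thread are assumptions for this sketch, and the signatures drop the THD parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

typedef std::uint64_t my_off_t;  // stand-in for MySQL's my_off_t

// Hypothetical stand-ins for a relay-log event and Relay_log_info.
struct Log_event {
  std::string statement;
  my_off_t end_pos;  // relay-log offset just past this event
};

struct Relay_log_info {
  my_off_t pos = 0;                   // position of the next event to apply
  std::vector<std::string> executed;  // stand-in for changes to the database
};

// Stand-in for executing the event; an empty statement simulates a failure.
int rpl_apply_event(Relay_log_info *rli, const Log_event &event) {
  if (event.statement.empty()) return 1;
  rli->executed.push_back(event.statement);
  return 0;
}

int rpl_update_pos(Relay_log_info *rli, const Log_event &event) {
  assert(event.end_pos >= rli->pos);
  rli->pos = event.end_pos;
  return 0;
}

// SQL-thread loop: apply first, advance the position second. Events that end
// at or before the current position are treated as already applied.
int run_sql_thread(Relay_log_info *rli, const std::vector<Log_event> &events) {
  for (const Log_event &ev : events) {
    if (ev.end_pos <= rli->pos) continue;
    if (int err = rpl_apply_event(rli, ev)) return err;
    rpl_update_pos(rli, ev);
  }
  return 0;
}
```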

Replication plugin interface

A new kind of plugin for extending replication has been added, together with a set of hooks for replication extensions.

Replication hooks are divided into several subcategories according to the service role in replication:

* binlog server observer (Binlog_server_observer)
* binlog transaction observer (Binlog_trans_observer)
* binlog logging observer (Binlogging_observer)
* binlog file observer (Binlog_file_observer)
* binlog dump observer (Repl_dump_observer, replication master)
* binlog relay IO observer (Repl_relay_IO_observer, replication slave IO)
* binlog relay execution observer (Repl_relay_SQL_observer, replication slave SQL)

It is designed to support multiple replication plugins and multiple instances of each observer, so that, for example, one plugin can provide semi-sync, another encrypted communication, and a third full-sync, and slaves can request different services according to their needs. However, there is currently no mechanism for handling dependencies and conflicts between replication plugins.

Replicator

struct Replicator is a combination of all these observers: a replicator holds a pointer to each observer. Users can write a replication plugin by providing callback functions in these observers to extend the replication mechanism of the MySQL server. It is not required to implement all callbacks; developers can selectively implement any callbacks in any observers. If no callbacks are provided for a channel, set the corresponding observer pointer to NULL in the Replicator struct.


   /**
      @struct Replicator
      @brief callback pointers for replication extension
   
      Replicator is a set of binlog/master/slave observers defined by the
      user to extend the binlog or replication mechanism.
   
      Users do not need to provide all these observers; they can
      selectively provide the channels they need.
    */
   typedef struct Replicator {
       uint32 len;
       int interface_version;
       /* Binlog server */
       Binlog_server_observer *binlog_server;
       /* Binlog transaction */
       Binlog_trans_observer *binlog_trans;
       /* Binlogging observer */
       Binlogging_observer *binlogging;
       /* Binlog storage observer */
       Binlog_file_observer *binlog_storage;
       /* Replication binlog dump observer */
       Repl_dump_observer *binlog_dump;
 
       /* Replication binlog relay IO observer */
       Repl_relay_IO_observer *relay_io_observer;
       /* Replication binlog relay SQL observer */
       Repl_relay_SQL_observer *relay_sql_observer;
   } Replicator;
   /*
     These two are convenience functions to register/unregister the
     binlog/master/slave channels whose corresponding pointers are not
     null in the Replicator structure
    */
   bool register_replicator(Replicator *replicator);
   bool unregister_replicator(Replicator *replicator);
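
A minimal sketch of what register_replicator and unregister_replicator might do, assuming the server keeps one list per observer channel and registers only the non-NULL pointers. Only two channels are shown, the observer structs are trimmed to a single callback each, and the convention that true means an error is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Trimmed-down observer types: one callback each, for illustration only.
struct Binlog_trans_observer { bool (*commit)(Binlog_trans_observer *); };
struct Repl_dump_observer { bool (*reset)(Repl_dump_observer *); };

// Trimmed-down Replicator: only two channels are shown.
struct Replicator {
  Binlog_trans_observer *binlog_trans;  // NULL if the channel is not used
  Repl_dump_observer *binlog_dump;      // NULL if the channel is not used
};

// Hypothetical per-channel registries kept by the server.
static std::vector<Binlog_trans_observer *> trans_observers;
static std::vector<Repl_dump_observer *> dump_observers;

template <typename T>
static void remove_from(std::vector<T *> &v, T *p) {
  v.erase(std::remove(v.begin(), v.end(), p), v.end());
}

// Register every non-NULL observer. Returns true on error (here, only a
// NULL replicator), false on success.
bool register_replicator(Replicator *r) {
  if (r == nullptr) return true;
  if (r->binlog_trans) trans_observers.push_back(r->binlog_trans);
  if (r->binlog_dump) dump_observers.push_back(r->binlog_dump);
  return false;
}

bool unregister_replicator(Replicator *r) {
  if (r == nullptr) return true;
  if (r->binlog_trans) remove_from(trans_observers, r->binlog_trans);
  if (r->binlog_dump) remove_from(dump_observers, r->binlog_dump);
  return false;
}
```

Keeping one list per channel lets several plugins observe the same channel, which matches the stated goal of supporting multiple replication plugins at once.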

Server/Transaction observer API

Server observer is not used by semi-sync. Transaction observer is used by semi-sync.

This component includes callback pointers for binlog server start/stop and for binlog transactions.

The transaction observer is used by semi-sync to wait for a reply from the slave for the current transaction.


   /*
      Server observer
    */
   typedef struct Binlog_server_observer {
       uint32 len;
       int interface_version;
       THD *thd;
       uint server_id;
   
       /*
         binlog service start/stop
       */
       bool (*start)(Binlog_server_observer *observer);
       bool (*stop)(Binlog_server_observer *observer);
   } Binlog_server_observer;
   /*
      Binlog_transaction observer 
    */
   typedef struct Binlog_trans_observer {
       uint32 len;
       int interface_version;
       THD *thd;
       /*
         binlog transaction
       */
       bool (*start_trans)(Binlog_trans_observer *trans);
       bool (*end_trans)(Binlog_trans_observer *trans);
       bool (*prepare)(Binlog_trans_observer *trans);
       bool (*commit)(Binlog_trans_observer *trans);
       bool (*rollback)(Binlog_trans_observer *trans);
       bool (*recover)(Binlog_trans_observer *trans, HASH *xids);
   } Binlog_trans_observer;
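
On the server side, a commit point would walk the registered transaction observers and invoke the callbacks that are set. In this sketch the observer is trimmed to two callbacks plus a hypothetical calls counter, run_commit_hooks is a made-up name, and a true return is assumed to mean an error.

```cpp
#include <cassert>
#include <vector>

// Trimmed-down Binlog_trans_observer: only two of the callbacks are shown.
struct Binlog_trans_observer {
  bool (*prepare)(Binlog_trans_observer *trans);
  bool (*commit)(Binlog_trans_observer *trans);
  int calls;  // bookkeeping for the example only
};

// Server-side dispatch at the commit point. Callbacks left NULL are skipped;
// the first observer that reports an error (true) aborts the dispatch.
bool run_commit_hooks(std::vector<Binlog_trans_observer *> &observers) {
  for (Binlog_trans_observer *obs : observers) {
    if (obs->commit != nullptr && obs->commit(obs)) return true;
  }
  return false;
}

// Example plug-in callback. A semi-sync plug-in would block here until a
// slave acknowledges the transaction or a timeout expires.
static bool count_commit(Binlog_trans_observer *trans) {
  ++trans->calls;
  return false;
}
```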

Binlog logging observer

Not used by semi-sync.

Binlogging_observer is used to handle events from a Binlogger.

   /*
     Binlogging_observer is used to handle events propagated
     from a Binlogger.
   
     Binary log events generated by a Binlogger are propagated to
     instances of Binlog_handler for processing.
   
     For example, the mechanism that stores binlog events in binlog files
     can be implemented as a Binlogging_observer. A Binlogging_observer can
     also do anything else with the events propagated to it in its
     handle_event callback function.
    */
   typedef struct Binlogging_observer {
     THD *thd;
     uint server_id;
   
     /* Reference to the logger of this observer, can be used to inject an
        arbitrary events to the logger */
     Binlogger *logger;
   
     /* filter for this observer; events propagated to this observer
        go through this filter before handle_event is called */
     Binlog_filter filter;
   
     /* actually process the event; the observer can store, ignore, change,
        apply, or do anything appropriate with it in this function */
     bool (*handle_event)(struct Binlogging_observer *observer, Log_event *event);
   } Binlogging_observer;
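
The per-observer filter can be sketched as a check that runs before handle_event is called. The simplified Log_event, the ignore-list Binlog_filter, the deliver function and the stored vector are hypothetical stand-ins, and true is assumed to mean an error.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for Log_event.
struct Log_event {
  std::string db;
  std::string payload;
};

// Simplified per-observer filter: an ignore list of databases.
struct Binlog_filter {
  std::vector<std::string> ignore_db;
  bool pass(const Log_event &ev) const {
    for (const std::string &db : ignore_db)
      if (db == ev.db) return false;
    return true;
  }
};

struct Binlogging_observer {
  Binlog_filter filter;
  bool (*handle_event)(Binlogging_observer *observer, const Log_event *event);
  std::vector<std::string> stored;  // example-only state
};

// Called by the logger for each propagated event: apply this observer's
// filter first, then hand the event to the callback.
bool deliver(Binlogging_observer *obs, const Log_event &ev) {
  if (!obs->filter.pass(ev)) return false;  // filtered out: not an error
  return obs->handle_event(obs, &ev);
}

// Example callback that "stores" the event, as a binlog file writer would.
static bool store_event(Binlogging_observer *obs, const Log_event *ev) {
  obs->stored.push_back(ev->payload);
  return false;
}
```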


Storage observer API

Used by semi-sync.

TODO: Consider if this is general logging and not only files.

This interface is used to listen to the events that operate on the binary log. An agent wishing to observe the process needs to register an object implementing this interface with the server.

These are the callbacks for binlog file operations. This observer can be used to alter the behavior of the binlog file, for example, to encrypt it or to update a backup binlog file.

Semi-sync uses only the report_update callback, to wait for the slave to catch up with the update on the master.


   typedef struct Binlog_file_observer {
       THD *thd;
       Binlog_file *binlog_file;
       bool (*open)(Binlog_file_observer *observer);
       bool (*close)(Binlog_file_observer *observer);
       bool (*seek_pos)(Binlog_file_observer *observer, my_off_t pos);
       Log_event *(*read_event)(Binlog_file_observer *observer);
       bool (*write_event)(Binlog_file_observer *observer, Log_event *event);
       bool (*flush)(Binlog_file_observer *observer);
       bool (*purge)(Binlog_file_observer *observer);
       bool (*rotate)(Binlog_file_observer *observer);
 
       /*
          binlog file update
       */
       bool (*report_update)(Binlog_file_observer *observer,
                             const char *log_file, my_off_t log_pos);
   } Binlog_file_observer;
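
Semi-sync's use of report_update can be sketched with a mutex and a condition variable: report_update records the newest binlog coordinates written on the master, the slave acknowledgement path records the newest acknowledged coordinates, and the committing thread waits until the acknowledged position reaches the written one or a timeout expires. Semisync_waiter, slave_ack and wait_for_slave are hypothetical names, and comparing (file, position) pairs lexicographically assumes binlog file names sort in creation order.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <string>
#include <utility>

typedef std::uint64_t my_off_t;  // stand-in for MySQL's my_off_t

// Binlog coordinates, compared lexicographically (file name, then offset).
typedef std::pair<std::string, my_off_t> Binlog_coord;

class Semisync_waiter {
 public:
  // What report_update would do: record the newest position written.
  void report_update(const char *log_file, my_off_t log_pos) {
    std::lock_guard<std::mutex> lk(mu_);
    written_ = Binlog_coord(log_file, log_pos);
  }
  // Called when a slave acknowledges it has received up to (file, pos).
  void slave_ack(const char *log_file, my_off_t log_pos) {
    {
      std::lock_guard<std::mutex> lk(mu_);
      Binlog_coord c(log_file, log_pos);
      if (c > acked_) acked_ = c;
    }
    cv_.notify_all();
  }
  // Block until the slave has caught up with the last reported update.
  // Returns false if the timeout expired first.
  bool wait_for_slave(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(mu_);
    Binlog_coord target = written_;
    return cv_.wait_for(lk, timeout, [&] { return acked_ >= target; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  Binlog_coord written_{"", 0};
  Binlog_coord acked_{"", 0};
};
```

In a real server, slave_ack would run on a different thread from the committing one; the condition variable is what lets the committing thread sleep until the acknowledgement arrives.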

Replication transmitter observer API

Used by semi-sync.

This API is used to control the behaviour of the replication master when dumping binlog events to slaves.

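* send_event is called when an event is sent to the slave; it can be used
  for sending extra events before/after the given event.
* reserve_header is called to reserve extra header bytes in the event
  packet. Used by semi-sync to reserve extra header bytes that indicate
  whether the slave should send a reply.
* write_header is called to fill in the reserved header bytes of the
  event. Used by semi-sync to update the need-reply flag in the extra
  header bytes.
* reset is called to do observer-specific cleanup when the RESET MASTER
  command is issued. Used by semi-sync to reset the extra master status
  it introduces.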


   /*
      Replication binlog dump observer:
      callbacks for the replication master binlog dump mechanism
   */
   typedef struct Repl_dump_observer {
       uint32 len;
       int interface_version;
       THD *thd;
       /* Binlog dump flags set by slave when requesting dumping from
          master, default is 0.
       */
       int flags;
       /*
         replication binlog dump callbacks
       */
       bool (*start)(Repl_dump_observer *observer,
                     const char *log_file, my_off_t log_pos);
       bool (*stop)(Repl_dump_observer *observer);
       bool (*send_event)(Repl_dump_observer *observer, String* packet,
                          const char *log_file, my_off_t log_pos);
       int (*reserve_header)(Repl_dump_observer *observer, String *packet);
       bool (*write_header)(Repl_dump_observer *observer, char *header);
       bool (*reset)(Repl_dump_observer *observer);
   } Repl_dump_observer;
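
The reserve_header/write_header pair can be sketched as follows. The two-byte layout (a magic byte followed by a flag byte) and the values 0xef and 0x01 are illustrative assumptions for this sketch, the observer parameter is omitted, and write_header takes an extra hypothetical need_reply argument where the real callback would obtain that information from the observer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Illustrative extra-header layout: magic byte, then flag byte.
const unsigned char kMagic = 0xef;
const unsigned char kNeedReply = 0x01;
const std::size_t kExtraHeaderLen = 2;

// reserve_header: append placeholder bytes before the event is copied into
// the packet. Returns the number of bytes reserved.
int reserve_header(std::string *packet) {
  packet->push_back(static_cast<char>(kMagic));
  packet->push_back('\0');  // flag byte, filled in later by write_header
  return static_cast<int>(kExtraHeaderLen);
}

// write_header: set the need-reply flag in the reserved bytes.
// Returns true on error (reserved bytes not found).
bool write_header(char *header, bool need_reply) {
  if (static_cast<unsigned char>(header[0]) != kMagic) return true;
  header[1] = need_reply ? static_cast<char>(kNeedReply) : '\0';
  return false;
}
```

Reserving the bytes first and filling them in later lets the master decide on the flag after the event has already been placed in the packet.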

Replication relay IO service observer

Used by semi-sync.

This observer is used to alter the slave side behaviour of binlog dumping (slave IO thread).

This observer is used by semi-sync to check whether the slave should provide a reply for an event.

* request_dump is called when issuing the BINLOG_DUMP command to the
  master to request binlog events. This callback can set the binlog
  flags or issue other commands, for example, SET commands to set some
  variables before dumping starts. Used by semi-sync to set the binlog
  flags to request semi-sync behaviour from the master.
* read_event is called when reading an event from the connection.
* queue_event is called when writing the event to the relay log.
* read_header can be used to read extra information in the header
  reserved by reserve_header. Used by semi-sync to check whether this
  event needs a reply.
* remove_header removes the extra header reserved by reserve_header,
  so that the event can be processed by the default event operations.
  Used by semi-sync to remove the extra header from the event packet,
  so that the event can be processed by other parts of the server.
* reset is called to do observer-specific cleanup when the RESET SLAVE
  command is issued. Used by semi-sync to reset the extra slave status
  it introduces.

   /*
     Replication binlog relay IO
   */
   typedef struct Repl_relay_IO_observer {
       uint32 len;
       int interface_version;
       THD *thd;
       Master_info *mi;
       /*
          replication relay IO callbacks
        */
       bool (*start)(Repl_relay_IO_observer *param);
       bool (*stop)(Repl_relay_IO_observer *param);
       bool (*request_dump)(Repl_relay_IO_observer *param);
       bool (*read_event)(Repl_relay_IO_observer *param,
                          const char *packet, ulong len,
                          const char **event_buf, ulong *event_len);
       bool (*queue_event)(Repl_relay_IO_observer *param,
                           const char *event_buf, ulong event_len);
       bool (*read_header)(Repl_relay_IO_observer *param, char *header);
       bool (*remove_header)(Repl_relay_IO_observer *param, const char *header,
                             const char **event_buf, ulong *len);
       bool (*reset)(Repl_relay_IO_observer *param);
   } Repl_relay_IO_observer;
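
The slave-side header handling can be sketched as follows, assuming a hypothetical two-byte extra header that the master prepended to each event: a magic byte 0xef followed by a flag byte in which 0x01 means a reply is needed. These values are illustrative, and the free-function signatures omit the observer parameter. remove_header only adjusts the buffer pointer and length, so no bytes are copied.

```cpp
#include <cassert>

// Illustrative extra-header layout; must match what the master writes.
const unsigned char kMagic = 0xef;
const unsigned char kNeedReply = 0x01;
const unsigned long kExtraHeaderLen = 2;

// read_header: report whether the master asked for a reply to this event.
// Returns true on error (no extra header present).
bool read_header(const char *packet, unsigned long len, bool *need_reply) {
  if (len < kExtraHeaderLen || static_cast<unsigned char>(packet[0]) != kMagic)
    return true;
  *need_reply = (static_cast<unsigned char>(packet[1]) & kNeedReply) != 0;
  return false;
}

// remove_header: point event_buf/event_len past the extra header so the rest
// of the server sees an ordinary event.
bool remove_header(const char *packet, unsigned long len,
                   const char **event_buf, unsigned long *event_len) {
  if (len < kExtraHeaderLen) return true;
  *event_buf = packet + kExtraHeaderLen;
  *event_len = len - kExtraHeaderLen;
  return false;
}
```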


Replication relay SQL service observer

Could be used by semi-sync.

This observer can be used to alter the behaviour of relay log execution.

This observer can be used to implement full-sync replication, which sends a reply after the slave has executed the event.

* apply_event is called to execute the given event and apply the
  modification to the database; the callback function can alter the
  event or apply extra events in it.
* reset is called to do observer-specific cleanup when the RESET SLAVE
  command is issued.

   /*
     Replication relay SQL observer
     callbacks for slave SQL thread
    */
   typedef struct Repl_relay_SQL_observer {
       uint32 len;
       int interface_version;
       THD *thd;
       Relay_log_info *rli;
       bool (*start)(Repl_relay_SQL_observer *param);
       bool (*stop)(Repl_relay_SQL_observer *param);
       bool (*apply_event)(Repl_relay_SQL_observer *param, Log_event *event);
       bool (*reset)(Repl_relay_SQL_observer *param);
   } Repl_relay_SQL_observer;
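
A full-sync style apply_event callback can be sketched as: apply the event, and only if that succeeds and the event ends a transaction, send a reply to the master. Full_sync_observer, the ends_trx flag, and the std::function members standing in for the SQL thread and the reply channel are hypothetical, and true is assumed to mean an error.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Simplified stand-in for Log_event.
struct Log_event {
  std::string statement;
  bool ends_trx;  // true if this event completes a transaction
};

// Hypothetical observer state for a full-sync plug-in.
struct Full_sync_observer {
  std::function<bool(const Log_event &)> execute;  // stand-in for applying
  std::function<void()> send_reply;                // stand-in for the reply
};

// apply_event callback: reply only after the slave has executed the event,
// rather than after it has merely been received.
bool full_sync_apply_event(Full_sync_observer *obs, const Log_event *ev) {
  if (obs->execute(*ev)) return true;  // apply failed: do not acknowledge
  if (ev->ends_trx) obs->send_reply();
  return false;
}
```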

Retrieved from "http://forge.mysql.com/wiki/ReplicationFeatures/ReplicationInterface"

This page was last modified 09:52, 8 May 2008.
