WL#2387: Replication Master Filtering

Affects: WorkLog-3.4 — Status: Un-Assigned — Priority: Medium

SUMMARY
-------
Make the replication filters take effect on the master instead of
on the slave.  (Currently, data is sent to the slave even if the
filters on the slave then discard that data.)


MOTIVATION
----------
Replication uses much less network bandwidth, because tables and
databases that are to be filtered out are discarded at the master
and never sent to the slave.


REQUIREMENTS
------------
1. Filtering on originating server (or originating cluster if we implement that)
   could also be done on the master.

USER INTERFACE
--------------
The following options start to take effect on master instead
of slave:
`--replicate-do-db=DB_NAME'
`--replicate-do-table=DB_NAME.TBL_NAME'
`--replicate-ignore-db=DB_NAME'
`--replicate-ignore-table=DB_NAME.TBL_NAME'
`--replicate-wild-do-table=DB_NAME.TBL_NAME'
`--replicate-wild-ignore-table=DB_NAME.TBL_NAME'

The following options still take effect at the slave:
`--replicate-rewrite-db=FROM_NAME->TO_NAME'


OPEN ISSUE
----------
Two designs are possible.  Either the filtering is controlled by the
master (so that slaves only get what the master has defined), or
each slave connects to the master with its own filter definition.
The latter needs changes to the way the slave asks the master for
the binlog.


OPTIONAL EXTENSION
------------------
All of these options could be added to CHANGE MASTER in the 
following way:
CHANGE MASTER 'foo' TO MASTER_HOST='127.0.0.1', REPLICATE-DO-DB='mydb';


IMPLEMENTATION
--------------
All filtering code is refactored into a separate file,
rpl_filter.cc.

Part 1: When the slave registers on the master it forwards 
        information about all filters that should be applied.
        This requires an extension to the function
        slave.cc:register_slave_on_master().

Part 2: The master adds functionality in the dump thread 
        to filter events.  Much of the code in rpl_filter.cc
        can be used for this (functions like slave.cc:db_ok()).
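
A minimal sketch of the check that Part 2 describes, assuming the
master keeps one filter object per registered slave.  SlaveFilter,
Event and should_send() are hypothetical names used only for
illustration; they are not the rpl_filter.cc interface.  The
precedence shown (a non-empty do-list is consulted first, then the
ignore-list) is an assumption modelled on a db_ok()-style check, and
only database-level filtering is covered.

```cpp
#include <set>
#include <string>

// Hypothetical per-slave filter state held by the master after the
// slave has registered and forwarded its filter options.
struct SlaveFilter {
    std::set<std::string> do_db;      // from --replicate-do-db
    std::set<std::string> ignore_db;  // from --replicate-ignore-db

    // Database-level check in the spirit of db_ok().
    // Assumed precedence: a non-empty do-list decides alone; otherwise
    // the ignore-list is consulted; with no rules everything passes.
    bool db_ok(const std::string& db) const {
        if (!do_db.empty())
            return do_db.count(db) > 0;
        if (!ignore_db.empty())
            return ignore_db.count(db) == 0;
        return true;
    }
};

// Hypothetical binlog event, reduced to the one field this sketch needs.
struct Event {
    std::string db;
};

// What the dump thread would ask before sending an event to one slave.
bool should_send(const SlaveFilter& filter, const Event& ev) {
    return filter.db_ok(ev.db);
}
```

With this shape the dump thread never writes a filtered event to the
network, which is where the bandwidth saving comes from.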


BINLOG EXTENSIONS
-----------------
It is possible to divide the filtered binlog into separate
binlogs, i.e. one binlog for one database and another for
another database (Brian seems fond of this idea).

If we choose this path, we need to rename binlog files 
accordingly, for instance like this:
- <name>-bin.index
- <name>-bin.NNNNNN

Note, however, that this is not really needed for filtering 
on master.  One could just use one binlog and then apply 
the filtering in the dump thread instead.  There are, however,
benefits in dividing it into multiple binlogs (e.g. different
binlogs could be backed up at different times and purged
according to different policies).

It is not yet decided if this extension should be implemented.

Lars suggests that the naming of the binlogs be separate from 
the naming of the schemas, i.e. no automatic naming.  When 
you specify that you want this schema in that binlog, you 
provide the binlog name at the same time.  This removes problems
with renamed schemas etc.  It is also more flexible (e.g. we may
want to split binlogs on filters other than schemas).
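
A rough sketch of such explicit naming, assuming a registry on the
master that maps a schema to a user-supplied binlog name.
BinlogRouter, its methods and the fallback name are hypothetical;
only the <name>-bin.NNNNNN file pattern comes from the text above.

```cpp
#include <iomanip>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical registry: schema -> binlog name chosen by the user.
// The binlog name is never derived from the schema name.
class BinlogRouter {
public:
    // fallback_name is an assumption: the binlog used for schemas
    // that have no explicit assignment.
    explicit BinlogRouter(std::string fallback_name)
        : fallback_name_(std::move(fallback_name)) {}

    // "I want this schema in that binlog."
    void assign(const std::string& schema, const std::string& binlog_name) {
        routes_[schema] = binlog_name;
    }

    // Binlog name for a schema, or the fallback when none is assigned.
    std::string binlog_for(const std::string& schema) const {
        auto it = routes_.find(schema);
        return it == routes_.end() ? fallback_name_ : it->second;
    }

private:
    std::string fallback_name_;
    std::map<std::string, std::string> routes_;
};

// Builds a file name of the form <name>-bin.NNNNNN.
std::string binlog_file_name(const std::string& name, unsigned seq) {
    std::ostringstream out;
    out << name << "-bin." << std::setw(6) << std::setfill('0') << seq;
    return out.str();
}
```

Because the binlog name is supplied by the user, renaming a schema
only changes a key in the registry; the files on disk keep their
names.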

See also Guilhem's notes in WL#1401.

NOTES
-----
There are corresponding ideas for filtering the query log; 
see WL#3017.

Use rpl_filter for the actual logic behind the filtering
mechanisms (master binlog filtering, master replication filtering and
slave replication filtering), but cache the result in a variable on
the table object.  Add "uint32 table->s->flags" and the
following enum in table->s:
enum enum_flag
{
  FILTER_BINLOG_SEND_F   = (1U << 0),
  FILTER_BINLOG_WRITE_F  = (1U << 1),
  FILTER_SLAVE_EXECUTE_F = (1U << 2)
};
Whenever the table object is created, the corresponding rpl_filter
object should be asked for how to set each flag.
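
The caching idea could look roughly like this.  TableShare and
FilterDecision are hypothetical stand-ins for table->s and for the
answers obtained from the rpl_filter object.  The sketch assumes
that a set bit means "this table is filtered out at that stage";
the note above does not say which polarity is intended.

```cpp
#include <cstdint>
#include <string>

enum enum_flag {
    FILTER_BINLOG_SEND_F   = (1U << 0),
    FILTER_BINLOG_WRITE_F  = (1U << 1),
    FILTER_SLAVE_EXECUTE_F = (1U << 2)
};

// Hypothetical answers from the filter object, one per stage.
struct FilterDecision {
    bool skip_binlog_send;
    bool skip_binlog_write;
    bool skip_slave_execute;
};

// Hypothetical stand-in for table->s.
struct TableShare {
    std::string db;
    std::string table_name;
    std::uint32_t flags = 0;
};

// Translates the per-stage answers into the cached bit mask.
std::uint32_t compute_flags(const FilterDecision& d) {
    std::uint32_t flags = 0;
    if (d.skip_binlog_send)   flags |= FILTER_BINLOG_SEND_F;
    if (d.skip_binlog_write)  flags |= FILTER_BINLOG_WRITE_F;
    if (d.skip_slave_execute) flags |= FILTER_SLAVE_EXECUTE_F;
    return flags;
}

// Run once when the table object is created; later checks only
// test a bit instead of consulting the filter rules again.
void init_share_flags(TableShare& share, const FilterDecision& d) {
    share.flags = compute_flags(d);
}

bool is_filtered(const TableShare& share, enum_flag stage) {
    return (share.flags & stage) != 0;
}
```

A cached mask would have to be recomputed if the filter rules can
change while the table object is still alive.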


This is a feature that we would really like to have. It is troublesome to have to restart the slave when a new database is added to a server that needs to be replicated.

I tend to prefer filtering done on the slave, as I may have two slaves that apply binlogs for different databases from the same master. I would be satisfied with either, though, as there are also advantages if you only need to fetch the binlog that contains data for the database you are replicating.
