WL#2387: Replication Master Filtering
Affects: WorkLog-3.4
Status: Un-Assigned
Priority: Medium

SUMMARY
-------
Make the replication filters take effect on the master instead of
on the slave. (Currently, data is sent to the slave even if the
slave's filters then discard it.)
MOTIVATION
----------
Replication uses much less network bandwidth, because tables and
databases that should be filtered out are already dropped at the
master.
REQUIREMENTS
------------
1. Filtering on originating server (or originating cluster if we implement that)
could also be done on the master.
USER INTERFACE
--------------
The following options start to take effect on the master instead
of on the slave:
`--replicate-do-db=DB_NAME'
`--replicate-do-table=DB_NAME.TBL_NAME'
`--replicate-ignore-db=DB_NAME'
`--replicate-ignore-table=DB_NAME.TBL_NAME'
`--replicate-wild-do-table=DB_NAME.TBL_NAME'
`--replicate-wild-ignore-table=DB_NAME.TBL_NAME'
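To make the intended effect of these options concrete, here is a minimal C++ sketch of how a filter decision might be evaluated, wherever it ends up running. The names `DbRules`, `db_allowed` and `wild_match` are hypothetical and are not taken from the server source. The sketch assumes that a non-empty do list takes precedence over the ignore list, and that wild patterns use `%` and `_` and are matched against a combined "db.table" string. Escape characters and case folding are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for the database-level rules.
struct DbRules {
    std::vector<std::string> do_db;      // values of --replicate-do-db
    std::vector<std::string> ignore_db;  // values of --replicate-ignore-db
};

static bool contains(const std::vector<std::string>& v, const std::string& s) {
    return std::find(v.begin(), v.end(), s) != v.end();
}

// Returns true if events for `db` should be replicated.
// Assumption: if any do rule exists, only listed databases pass;
// otherwise everything passes except databases on the ignore list.
bool db_allowed(const DbRules& rules, const std::string& db) {
    if (!rules.do_db.empty()) return contains(rules.do_db, db);
    return !contains(rules.ignore_db, db);
}

// Wildcard match: '%' matches any run of characters, '_' exactly one.
// Uses simple backtracking to the most recent '%'.
bool wild_match(const std::string& pat, const std::string& text) {
    std::size_t p = 0, t = 0;
    std::size_t star = std::string::npos, mark = 0;
    while (t < text.size()) {
        if (p < pat.size() && pat[p] == '%') {
            star = p++;   // remember the '%' and try matching zero characters
            mark = t;
        } else if (p < pat.size() && (pat[p] == '_' || pat[p] == text[t])) {
            ++p;
            ++t;
        } else if (star != std::string::npos) {
            p = star + 1; // let the last '%' absorb one more character
            t = ++mark;
        } else {
            return false;
        }
    }
    while (p < pat.size() && pat[p] == '%') ++p;  // trailing '%' match nothing
    return p == pat.size();
}
```

For example, with these assumptions `wild_match("sales%.orders_", "sales2024.orders1")` holds, while the same pattern does not match "sales2024.orders".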
The following options still take effect at the slave:
`--replicate-rewrite-db=FROM_NAME->TO_NAME'
OPEN ISSUE
----------
Either the filtering is controlled by the master (so that slaves
only get what the master has defined), or each slave connects to
the master with its own filter definition. The latter variant
requires changes to the way the slave asks the master for the
binlog.
OPTIONAL EXTENSION
------------------
All of these options could be added to CHANGE MASTER in the
following way:
CHANGE MASTER 'foo' TO MASTER_HOST='127.0.0.1', REPLICATE-DO-DB='mydb';
IMPLEMENTATION
--------------
All filtering code is refactored into a separate file,
rpl_filter.cc.
Part 1: When the slave registers on the master it forwards
information about all filters that should be applied.
This requires an extension to the function
slave.cc:register_slave_on_master().
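As an illustration of Part 1, the sketch below shows one way the slave's filter lists could be packed into the registration message and unpacked on the master. The `FilterSpec` type, the function names and the byte layout (a one-byte count followed by length-prefixed names) are all assumptions made for this example. The worklog does not define a wire format, and the real register_slave_on_master() packet is not modelled here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical set of filters a slave would forward to the master.
struct FilterSpec {
    std::vector<std::string> do_db;
    std::vector<std::string> ignore_db;
};

// Appends a 1-byte count, then each name as a 1-byte length plus bytes.
// Returns false if a count or name does not fit in one byte.
static bool put_list(std::string& out, const std::vector<std::string>& names) {
    if (names.size() > 255) return false;
    out.push_back(static_cast<char>(names.size()));
    for (const std::string& n : names) {
        if (n.size() > 255) return false;
        out.push_back(static_cast<char>(n.size()));
        out += n;
    }
    return true;
}

// Reads one list written by put_list, checking bounds at every step.
static bool get_list(const std::string& in, std::size_t& pos,
                     std::vector<std::string>& names) {
    if (pos >= in.size()) return false;
    std::size_t count = static_cast<unsigned char>(in[pos++]);
    for (std::size_t i = 0; i < count; ++i) {
        if (pos >= in.size()) return false;
        std::size_t len = static_cast<unsigned char>(in[pos++]);
        if (len > in.size() - pos) return false;
        names.emplace_back(in, pos, len);
        pos += len;
    }
    return true;
}

// Slave side: serialize the filters for the registration message.
bool encode_filters(const FilterSpec& f, std::string& out) {
    out.clear();
    return put_list(out, f.do_db) && put_list(out, f.ignore_db);
}

// Master side: parse the filters; rejects truncated or oversized input.
bool decode_filters(const std::string& in, FilterSpec& f) {
    f = FilterSpec{};
    std::size_t pos = 0;
    return get_list(in, pos, f.do_db) && get_list(in, pos, f.ignore_db) &&
           pos == in.size();
}
```

Length prefixes keep the parsing independent of any character that may appear in a database name. A real extension would also need a version or capability check so that older masters can ignore the extra data.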
Part 2: The master adds functionality in the dump thread
to filter things. Much of the code in rpl_filter.cc
can be used for this (functions like slave.cc:db_ok())
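Part 2 could look roughly like the following sketch of a dump loop that consults a filter before sending each event. The `Event` struct, the `dump_filtered` function and the callback types are hypothetical simplifications and not the server's dump thread interface. The sketch also assumes that events not tied to any database (for instance, log rotation markers) are always forwarded, since the slave still needs them.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified binlog event.
struct Event {
    std::string db;       // empty for events not tied to a database
    std::string payload;  // opaque event body
};

using DbPredicate = std::function<bool(const std::string&)>;
using Sender = std::function<void(const Event&)>;

// Sends every event that passes the filter and returns how many were sent.
// `db_ok` stands in for the db_ok()-style check mentioned above.
std::size_t dump_filtered(const std::vector<Event>& binlog,
                          const DbPredicate& db_ok,
                          const Sender& send) {
    std::size_t sent = 0;
    for (const Event& ev : binlog) {
        // Assumption: database-less events always pass through.
        if (!ev.db.empty() && !db_ok(ev.db)) continue;
        send(ev);
        ++sent;
    }
    return sent;
}
```

Passing the predicate in as a parameter lets the same loop serve both variants of the open issue: the master can supply one global predicate, or one built from the filters a particular slave registered.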
BINLOG EXTENSIONS
-----------------
There is a possibility to divide the filtered binlog into
separate binlogs, i.e. one binlog for one database and another
binlog for another database (Brian seems fond of this idea).
If we choose this path, we need to rename binlog files
accordingly, for instance like this:
- <name>-bin.index
- <name>-bin.NNNNNN
Note, however, that this is not really needed for filtering
on master. One could just use one binlog and then apply
the filtering in the dump thread instead. There are, however,
benefits in dividing it into multiple binlogs (e.g. different
binlogs could be backed up at different times, and purging
could be handled differently for different binlogs).
It is not yet decided if this extension should be implemented.
Lars suggests that the naming of the binlogs is separate from
the naming of the schemas, i.e. no automatic naming. When
you specify that you want this schema in that binlog, you
can provide the binlog name then. This removes problems with
renamed schemas etc. Also it makes it more flexible (e.g.
perhaps we want binlogs on other filters than schemas)
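The explicit naming suggested here can be sketched as a small lookup table that maps a schema to a user-chosen binlog name, combined with the `<name>-bin.NNNNNN` file pattern shown earlier. The `BinlogRouter` class and `binlog_file_name` function are hypothetical names introduced for this example. Falling back to a default binlog for unassigned schemas is an assumption, not something the worklog specifies.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <utility>

// Builds "<name>-bin.NNNNNN" with a zero-padded six-digit sequence number.
std::string binlog_file_name(const std::string& name, unsigned seq) {
    char suffix[16];
    std::snprintf(suffix, sizeof suffix, "%06u", seq);
    return name + "-bin." + suffix;
}

// Hypothetical mapping from schema to an explicitly chosen binlog name.
class BinlogRouter {
public:
    explicit BinlogRouter(std::string default_name)
        : default_name_(std::move(default_name)) {}

    // The binlog name is given by the user, never derived from the schema,
    // so renaming a schema only requires updating this one entry.
    void assign(const std::string& schema, const std::string& binlog_name) {
        by_schema_[schema] = binlog_name;
    }

    // Unassigned schemas go to the default binlog (assumption).
    const std::string& binlog_for(const std::string& schema) const {
        auto it = by_schema_.find(schema);
        return it == by_schema_.end() ? default_name_ : it->second;
    }

private:
    std::map<std::string, std::string> by_schema_;
    std::string default_name_;
};
```

Because the key type is just a string, the same table could later be keyed on something other than a schema name, which matches the flexibility argument above.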
See also Guilhem's notes in WL#1401.
NOTES
-----
There are corresponding ideas for filtering the query log,
see WL#3017.
Use rpl_filter for the actual logic behind the filtering.
FEEDBACK
--------
This is a feature that we would really like to have. It is troublesome to have to restart the slave when a new database is added to a server that needs to be replicated.
I tend to prefer filtering on the slave, since I may have two slaves applying binlogs for different databases from the same master. I would be satisfied with either approach, though, as there are also advantages to fetching only the binlog that contains the database you are replicating.