WL#2540: Replication event checksums

Affects: Server-6.x — Status: In-Progress — Priority: Low

SITUATION
=========

Events are written to the binary log as they are executed. They are
then sent to the slave on an event-by-event basis, either through a
local socket or over a network.


RATIONALE
=========

This adds fundamental validity checking to verify that replication is
working correctly.

It would make it much clearer to our customers whether a replication
failure is due to network/disk/memory failures or to bugs in the
servers. See the amazingly long list below of customers who may have
been affected in the past.

WISHFUL REQUIREMENT
===================

After this is implemented, it should (preferably) never be possible
to crash the slave with corrupted events. To achieve this, one could
go through all values in all events and check that they are legal.
This could possibly be a second patch.

PROBLEM
=======

Some customers get very strange replication failures, and it is
impossible to know what causes them. Sinisa says that they could be
"network problems" (e.g. CSC#4792).

These failures cause the slave to corrupt the data instead of stopping,
which would be the appropriate action.

Some reported incidents:
- BUG#25737
- BUG#26123
- BUG#27048
- BUG#23619
- BUG#29309
- http://forums.mysql.com/read.php?26,148423,148423
- BUG#22889
- BUG#5116


SUMMARY
=======

Add checksum to binary log events.

SINISA WRITES (2005-04-13, CSC#4792):
This points to the missing reliability features that SHOULD be
implemented in 5.0:

* checksum stored in binary and relay log to check for RAM / disk
  corruption. 

* checksum sent to slave for each event to check for network
  corruption. 


TYPE OF CHECKSUM
================

Alternatives:

  1. 8-12 bit checksum
     PRO: Can use the flags part of common event header 
          and thus not need to change binary log format
     CON: Can fail to detect corrupt binary log with some probability,
          i.e. one chance in 256 or one chance in 16 * 256.
 
  2. CRC32
     PRO: Standard, low probability of undetected corruption
          (1 in 2^32, i.e. 256^4)
          Computationally inexpensive
     CON: Can't be used to sign events (not cryptographically secure)

  3. SHA or MD5
     PRO: Can also be used for authentication of events
     CON: Are very long (at least 128 bits)
          Computationally expensive.
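The detection odds quoted above follow from a simple idealized model: an n-bit checksum lets a random corruption through with probability 1 in 2^n. The helper below is a hypothetical illustration of that arithmetic only. It is not part of any server code.

```c
#include <stdint.h>

/* Idealized model (an assumption, not a property of any specific code):
 * an n-bit checksum misses a random corruption with probability 1 / 2^n.
 * Returns the X in "one chance in X". Valid for n_bits <= 63. */
static uint64_t undetected_one_in(unsigned n_bits)
{
    return UINT64_C(1) << n_bits;
}
```

This gives 256 and 4096 (16 * 256) for the 8- and 12-bit options, and 4294967296 for a 32-bit checksum. The random model understates CRC32. By construction it detects every single-bit error and every burst error of up to 32 bits.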

Mats and Lars are currently considering a 32-bit checksum
(the CRC-32 described in ISO 3309).
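A minimal bitwise sketch of the ISO 3309 CRC-32 follows. It uses the polynomial 0x04C11DB7, processed bit-reversed as 0xEDB88320, which is the same CRC that zlib's `crc32()` computes. The function name `crc32_update` is hypothetical. A production version would be table-driven, or would simply call zlib.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (ISO 3309 / zlib variant): initial value 0xFFFFFFFF,
 * reflected polynomial 0xEDB88320, final XOR 0xFFFFFFFF.
 * Pass crc = 0 to start; pass a previous result to continue over
 * more data (the ~ at entry undoes the final XOR of the last call). */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}
```

For the standard check string "123456789", this returns 0xCBF43926.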


REPAIRING EVENTS
================

- Should the checksum also be capable of repairing the event?
  PRO: That would be really nice, especially if the mysqlbinlog 
       client can be extended to do repairs of binary logs
  CON: Then we can't use the bits for authentication

Mats and Lars think that we can store the type of the checksum in the
format_description log event (or have it implied by the binary log
version). This would allow us to switch to an error-correcting
checksum in the future.
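One way to keep the algorithm replaceable is for readers to look up the checksum length from the algorithm identifier announced in the format_description event. A reader that sees an identifier it does not know can then stop cleanly instead of misparsing events. The enum names and numeric values below are hypothetical and chosen for illustration only.

```c
#include <stdint.h>

/* Hypothetical checksum algorithm identifiers that a format_description
 * event could carry; the values are illustrative only. */
enum checksum_alg {
    CHECKSUM_NONE  = 0,
    CHECKSUM_CRC32 = 1
    /* a future error-correcting code would get a new value here */
};

/* Number of checksum bytes appended to each event for an algorithm,
 * or -1 if this server does not know the algorithm. */
static int checksum_length(uint8_t alg)
{
    switch (alg) {
    case CHECKSUM_NONE:  return 0;
    case CHECKSUM_CRC32: return 4;
    default:             return -1; /* unknown: caller should stop replication */
    }
}
```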


SUGGESTED SOLUTION (By Mats)
============================

To ensure the integrity of each event arriving at the slave, a checksum should
be added to each event. This allows the slave to check that the event was
transmitted correctly and written correctly to the relay log. If it was not, the
slave can stop and report an error rather than trying to process the event.
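A minimal sketch of the slave-side check, under stated assumptions: the event starts with the 19-byte common header, and a 4-byte CRC-32 is appended after the payload in little-endian order, computed over header and payload. The names `verify_event` and `crc32_buf` and the result codes are hypothetical. On a nonzero result, the caller would stop the SQL thread with an error instead of applying the event.

```c
#include <stddef.h>
#include <stdint.h>

#define COMMON_HEADER_LEN 19
#define CHECKSUM_LEN 4 /* assumption: CRC-32 */

/* Hypothetical result codes for the verifier. */
enum { EVENT_OK = 0, EVENT_TOO_SHORT = 1, EVENT_CHECKSUM_MISMATCH = 2 };

/* ISO 3309 / zlib CRC-32, bitwise for brevity. */
static uint32_t crc32_buf(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}

/* Recomputes the CRC over header + payload and compares it with the
 * trailing little-endian checksum stored by the master. */
static int verify_event(const unsigned char *ev, size_t len)
{
    if (len < COMMON_HEADER_LEN + CHECKSUM_LEN)
        return EVENT_TOO_SHORT;
    size_t body = len - CHECKSUM_LEN;
    uint32_t stored = (uint32_t)ev[body]
                    | (uint32_t)ev[body + 1] << 8
                    | (uint32_t)ev[body + 2] << 16
                    | (uint32_t)ev[body + 3] << 24;
    return crc32_buf(ev, body) == stored ? EVENT_OK : EVENT_CHECKSUM_MISMATCH;
}
```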

This is particularly important when using row-based replication, since subtly
corrupted events can be applied without raising any error.

In my opinion, we should focus on the integrity of the events and ignore issues
that relate to authenticity, since methods for handling those are
computationally expensive and can be achieved through other means (e.g., using
SSL). Note that the suggestion below (using SSL) does not solve the issue of
corruption in the binary log or in the relay log: it only handles corruption
during transfer of the event.


INTERESTING WORK-AROUND (ANDREI'S TEXT)
=======================================

From reading about SSL's features [1], and from simulating corrupted packets
on the slave by changing the data just after libc's recv() function returns
the buffer, we can conclude that this idea (using SSL to create and verify
a checksum) should work.

Of course, I cannot check all possible situations; for that we would
need to study the algorithms that are used. Almost certainly, the
encryption algorithms in SSL include an integrity check (a checksum).

Reference: 

    [1] 5.1 Manual, 5.8.7.1. Basic SSL Concepts

    SSL is a protocol that uses different encryption algorithms to
    ensure that data received over a public network can be trusted. It
    has mechanisms to detect any data change, loss, or replay.

Since SSL is not used for every connection, we still need to consider per-event
checksums.


OPEN ISSUES
===========

1. How about only having checksums at commit time? How would that work?
   What about non-transactional tables? These would probably need a
   checksum in their own event.

   Probably the solution is that *whenever* something is committed, 
   the checksum is needed, be it a statement (due to autocommit), 
   a non-transactional table update, or a real transaction.
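To make the commit-time idea concrete, here is a hypothetical sketch. Each commit unit (a transaction, an autocommitted statement, or a non-transactional update) keeps a running CRC-32 over all of its events. The master would write the final value into the committing event, and the slave would compare at commit. The struct and function names are illustrative only. One trade-off follows directly: a corrupted event is detected only at commit, possibly after it has already been applied, which matters most for non-transactional tables that cannot be rolled back.

```c
#include <stddef.h>
#include <stdint.h>

/* Incremental ISO 3309 / zlib CRC-32: pass 0 to start, or a previous result. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *p, size_t n)
{
    crc = ~crc;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}

/* Hypothetical running checksum for one commit unit. */
struct commit_checksum {
    uint32_t crc;
};

static void cc_begin(struct commit_checksum *cc) { cc->crc = 0; }

/* Fold one more event of the commit unit into the running value. */
static void cc_add_event(struct commit_checksum *cc, const unsigned char *ev, size_t len)
{
    cc->crc = crc32_update(cc->crc, ev, len);
}

/* At commit: compare with the value the master wrote. */
static int cc_matches(const struct commit_checksum *cc, uint32_t from_master)
{
    return cc->crc == from_master;
}
```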
 
NOTES FROM DISCUSSION (Lars, Lus) - 26/02/2009
===============================================


header              checksum
(19 bytes)          (L bytes)
+------+-----------+------+
|  HD  |  PAYLOAD  |  CS  |  <- EVENT e
+------+-----------+------+

A. log_event.cc
1. Take two bits for stating the length of the checksum (L).
2. check_and_transfer(event e)
3. if fm(e) == fs(e-L), return e
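Below is a sketch of steps 1-3 under stated assumptions. We read fm(e) as the checksum the master stored at the end of e, and fs(e-L) as the checksum the slave recomputes over e without its last L bytes; this is our reading of the note. The two length bits are placed in the top of the v4 header's 2-byte flags field (offset 17), which is a hypothetical choice. Codes 00/01/10/11 map to 0/2/4/8 bytes, matching the option values in E. `xor_fold` is a placeholder checksum for illustration only, and all names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define COMMON_HEADER_LEN 19
#define FLAGS_OFFSET 17        /* 2-byte little-endian flags in the v4 header */
#define CHECKSUM_LEN_SHIFT 14  /* hypothetical: top two flag bits hold the code */

/* Step 1: decode the 2-bit length code into L (0, 2, 4 or 8 bytes). */
static size_t checksum_len_from_flags(const unsigned char *ev)
{
    static const size_t len_for_code[4] = {0, 2, 4, 8};
    unsigned flags = (unsigned)ev[FLAGS_OFFSET] | (unsigned)ev[FLAGS_OFFSET + 1] << 8;
    return len_for_code[(flags >> CHECKSUM_LEN_SHIFT) & 0x3u];
}

/* Slave-side checksum fs: writes L bytes computed over n bytes into out. */
typedef void (*checksum_fn)(const unsigned char *buf, size_t n, unsigned char *out, size_t L);

/* Placeholder only: XOR-folds the data into L bytes (L must be > 0).
 * A real implementation would use CRC-32 or similar. */
static void xor_fold(const unsigned char *buf, size_t n, unsigned char *out, size_t L)
{
    memset(out, 0, L);
    for (size_t i = 0; i < n; i++)
        out[i % L] ^= buf[i];
}

/* Steps 2-3: returns the length of e-L if fm(e) == fs(e-L), else 0. */
static size_t check_and_transfer(const unsigned char *ev, size_t len, checksum_fn fs)
{
    unsigned char computed[8];
    if (len < COMMON_HEADER_LEN)
        return 0;
    size_t L = checksum_len_from_flags(ev);
    if (len < COMMON_HEADER_LEN + L)
        return 0;
    if (L == 0)
        return len; /* no checksum on this event */
    fs(ev, len - L, computed, L);
    return memcmp(computed, ev + len - L, L) == 0 ? len - L : 0;
}
```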

B. mysqlbinlog
1. Upgrade the tools so that they handle the new checksum field.

C. Work with multiple slave versions
1. OM -> OS (ok)
2. OM -> NS (ok)
3. NM -> OS a) slave version check, b) strip checksum
4. NM -> NS (ok) - set 00 header
- change (e-L)

OM - Old Master, OS - Old Slave,
NM - New Master, NS - New Slave
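The notes for the NM -> OS case are terse. One possible reading of "strip checksum", "set 00 header" and "change (e-L)" is that the master clears the checksum length bits to 00 and drops the last L bytes, rewriting the header's event_length field (offset 9, 4 bytes, little-endian in the v4 header). The sketch below follows that reading. The bit position and the function name are hypothetical, and next_position is deliberately left unchanged on the assumption that it must keep referring to the master's binary log.

```c
#include <stddef.h>
#include <stdint.h>

#define EVENT_LEN_OFFSET 9     /* 4-byte little-endian event_length */
#define FLAGS_OFFSET 17        /* 2-byte little-endian flags */
#define CHECKSUM_LEN_SHIFT 14  /* hypothetical position of the 2-bit length code */

/* Strips an L-byte checksum in place before sending to an old slave.
 * Requires len >= 19 + L. Returns the new event length (e-L). */
static size_t strip_checksum(unsigned char *ev, size_t len, size_t L)
{
    /* "set 00 header": clear only the two length bits, keep other flags */
    unsigned flags = (unsigned)ev[FLAGS_OFFSET] | (unsigned)ev[FLAGS_OFFSET + 1] << 8;
    flags &= ~(0x3u << CHECKSUM_LEN_SHIFT);
    ev[FLAGS_OFFSET] = (unsigned char)(flags & 0xFFu);
    ev[FLAGS_OFFSET + 1] = (unsigned char)((flags >> 8) & 0xFFu);

    /* "change (e-L)": shrink event_length so the checksum is not sent */
    uint32_t n = (uint32_t)(len - L);
    ev[EVENT_LEN_OFFSET]     = (unsigned char)(n & 0xFFu);
    ev[EVENT_LEN_OFFSET + 1] = (unsigned char)((n >> 8) & 0xFFu);
    ev[EVENT_LEN_OFFSET + 2] = (unsigned char)((n >> 16) & 0xFFu);
    ev[EVENT_LEN_OFFSET + 3] = (unsigned char)((n >> 24) & 0xFFu);
    return len - L;
}
```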

NOTES:
a) We need to check whether the slave registers some information
about its version when connecting to the master.

D. Make the implementation modular
1. Make a function that can be reused in several places

E. Configuration option (to state which type of checksum to use)
1. --binlog-checksum-bytes=0,2,4,8
2. Defaults to 0
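A sketch of how the proposed option value could map onto the two header bits from item A.1: the four allowed byte counts fit exactly in a 2-bit code. The function name is hypothetical, and the option itself is only the proposal above, not an existing server flag.

```c
#include <string.h>

/* Maps a value of the proposed --binlog-checksum-bytes option to the
 * 2-bit length code (0 -> 00, 2 -> 01, 4 -> 10, 8 -> 11).
 * NULL means the option was not given (default 0).
 * Returns -1 for unsupported values. */
static int checksum_code_from_option(const char *value)
{
    static const char *const allowed[4] = {"0", "2", "4", "8"};
    if (value == NULL)
        return 0;
    for (int code = 0; code < 4; code++)
        if (strcmp(value, allowed[code]) == 0)
            return code;
    return -1;
}
```

Rejecting unsupported values at startup keeps an invalid length code from ever being written into an event header.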

F. Target 6.0 tree


If you are mentioning bug numbers, please include BUG#26489
