MySQL Internals MySQL Backup Stream API
MySQL Backup Stream API
This document describes the format of the output produced by the BACKUP DATABASE statement. The information here can be used to decipher the output of a backup operation.
Backup Stream Format (Overview)
A backup stream is a sequence of bytes that consists of a 10-byte prefix followed by a backup image. The prefix contains a magic number and an image format version. The backup image should be interpreted according to the version number.
The backup image structure can be thought of at two levels:
- The transport layer consists of blocks, all of which have the same size (except possibly the last). The size is given at the beginning of the first block.
- The image layer consists of chunks, which have varying size.
At the transport layer, blocks are read and decomposed to obtain fragments (one or more per block). Fragments are assembled to obtain chunks. There is no fixed size relationship between chunks and blocks: a chunk might be smaller than a block, or it might contain information from multiple blocks. That is, a chunk is not always constructed from an integer number of blocks.
The programming interface to the image layer returns a chunk at a time, calling the transport layer to read blocks as necessary for decomposition into fragments and assembly into the next chunk. There is a function (or method) interface that can be used for reading chunks.
The remainder of this discussion describes how to interpret the backup stream, both at the image layer and at the transport layer. Note that the 10-byte backup stream prefix is not part of the backup image, at either layer. It must be read separately.
Source Files Related to the Backup Stream API
The relevant source files pertaining to backup stream processing are in the sql/backup directory of a MySQL source tree. These files contain the library of functions that are used for reading a backup stream:
- stream_v1.h
- stream_v1.c
- stream_v1_transport.c
- stream_v1_services.h
Generating Browsable Stream Format Information
Source files in the sql/backup directory of a MySQL source tree or source distribution contain comments that can be processed with doxygen to generate HTML files that describe classes, files, and so forth. Those files can be viewed in your Web browser.
To generate the HTML files and view information about the classes related to backup stream format, do this:
- In your MySQL source tree, change location to the sql/backup directory:
    shell> cd sql/backup
- Run doxygen to generate the HTML files. These commands create a subdirectory named html containing the HTML output:
    shell> doxygen -g
    shell> doxygen
- To view the top-level index page, load the html/index.html file into your browser.
- To view information about the stream format, load the html/stream_format.html page (click on "Related Pages" and then "Backup Stream Format (v1)").
Notation and Conventions
Syntax Conventions
- [ ] indicates an optional syntax component
- ... indicates that the preceding syntax component can repeat
- | separates alternatives in a list of choices
Note: The backup stream API was originally described in WL#4063, and there are also notes in the comments of the source files. The notation used in this document is somewhat different from that used in the original documentation:
- The syntax is more like that used in the reference manual. In particular:
- [ ] means "optional"
- ... means "the preceding item can repeat"
- | separates alternatives in a list of choices
- The original syntax lists "snapshot descriptions" at the end of the header syntax. But a header chunk does not include these descriptions; they are separate chunks. Here they are listed that way, not as part of the header.
- The binlog_pos and binlog_group_pos items actually represent binary log coordinates (position *and* filename), not just position, so here they are denoted binlog_coords and binlog_group_coords.
- The initial parts of the original catalog syntax (the parts preceding the db_catalog items) comprise a single chunk and have been factored out as catalog_header.
- table_data in the original syntax stands for a sequence of table data chunks. Here table_data means a single chunk and the sequence of chunks is signified by table_data ... (note the ellipsis).
Backup Stream Syntax Overview
This section describes the high-level (chunk-layer) syntax of the major components of a backup stream. Later sections provide more detail about these components.
backup_stream: stream_prefix backup_image
backup_image: preamble table_data ... [summary]
preamble: header snapshot_description ... [summary] catalog metadata
catalog: catalog_header [db_catalog ...]
metadata: global_items [tables ... other_items]
The summary chunk can appear either following the table_data chunks at the end of the image, or as part of the preamble, following the snapshot descriptions in the preamble. Placement of the summary is indicated by a bit in the header flags.
Basic Data Type Storage
This section discusses storage for basic data types:
- Fixed-length and variable-length integers
- Strings
- Times
Integer Storage
Only unsigned integers are stored.
Fixed-length integers are stored as 1-byte, 2-byte, or 4-byte values, least significant byte first.
Variable-length integers are stored as a sequence of bytes, each byte storing 7 bits of the value, least significant 7-bit group first. The most significant bit in a byte is set if more bytes follow. There is no maximum number of bytes that a variable-length integer can contain. A program that reads them must be prepared to deal with arbitrarily large values, or to throw an error if it encounters a value larger than it can handle.
Example: The binary number 100101011101011010110100011110 will be split into 7-bit groups 10 0101011 1010110 1011010 0011110 and then stored in 5 bytes (least significant bytes first): 0x9E 0xDA 0xD6 0xAB 0x02
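The encoding just described can be sketched in a few lines of Python. This is only an illustration of the rules above, not the code from stream_v1.c; the function names encode_varint and decode_varint are hypothetical.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode an unsigned integer, 7 bits per byte, low-order group first."""
    if value < 0:
        raise ValueError("only unsigned integers are stored")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # high bit set: more bytes follow
        else:
            out.append(group)         # high bit clear: last byte
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next_pos) for the variable-length integer at buf[pos]."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated variable-length integer")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos
```

Python integers are unbounded, so this sketch accepts arbitrarily long encodings. A reader written in a language with fixed-width integers would need to check for overflow instead.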
String Storage
Strings are stored as counted strings: A byte count indicating the length of the string, followed by the bytes in the string. The length is stored as a variable-length integer using the encoding just described. An empty string (or "null string") is represented as a single 0x00 byte.
Example: "abcd" is encoded as 0x04 0x61 0x62 0x63 0x64
Example: "" (the empty string) is encoded as 0x00
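As a rough sketch of counted-string handling, the following Python functions write and read a string using the variable-length length prefix. The names encode_string and decode_string are hypothetical, and strings are kept as raw bytes because the character set is determined elsewhere in the image.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Variable-length integer: 7 bits per byte, low-order group first."""
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        out.append((group | 0x80) if value else group)
        if not value:
            return bytes(out)


def encode_string(data: bytes) -> bytes:
    """Byte count as a variable-length integer, followed by the bytes."""
    return encode_varint(len(data)) + data


def decode_string(buf: bytes, pos: int = 0) -> Tuple[bytes, int]:
    """Return (string_bytes, next_pos) for the counted string at buf[pos]."""
    length = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    end = pos + length
    if end > len(buf):
        raise ValueError("string extends past the end of the buffer")
    return buf[pos:end], end
```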
Time Storage
Times are stored in UTC. A time value takes six bytes:
- 2 bytes. Years since 1900 and month (0..11)
  - Bits 0-7, 12-15. Year:
    - High 8 bits from bits 0-7 of byte 0
    - Low 4 bits from bits 4-7 of byte 1
  - Bits 8-11. Month (bits 0-3 of byte 1)
- 1 byte. Day of month (1..31)
- 1 byte. Hour (0..23)
- 1 byte. Minute (0..59)
- 1 byte. Second (0..60)
Example: 2008-10-11 15:28:17 is encoded as 0x06 0xc9 0x0b 0x0f 0x1c 0x11
- Year = (0x06 << 4) + (0xc9 >> 4) = 0x60 + 0x0c = 0x6c = 108 (that is, 2008 - 1900)
- Month = 0xc9 & 0x0f = 9 (months are 0-based, so this is October)
- Day = 0x0b = 11
- Hour = 0x0f = 15
- Minute = 0x1c = 28
- Second = 0x11 = 17
A time value may consist entirely of 0x00 bytes, which means "no date."
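The field layout above can be exercised with a small Python sketch. The function names are hypothetical. The result is returned as a plain tuple with a 1-based month rather than a datetime object, because a second value of 60 is allowed here but not by Python's datetime type.

```python
from typing import Optional, Tuple

TimeFields = Tuple[int, int, int, int, int, int]


def decode_time(raw: bytes) -> Optional[TimeFields]:
    """Decode a 6-byte time into (year, month 1..12, day, hour, minute, second)."""
    if len(raw) != 6:
        raise ValueError("a time value takes exactly six bytes")
    if raw == bytes(6):
        return None  # all 0x00 bytes means "no date"
    year = 1900 + ((raw[0] << 4) | (raw[1] >> 4))
    month = (raw[1] & 0x0F) + 1  # stored as 0..11
    return (year, month, raw[2], raw[3], raw[4], raw[5])


def encode_time(fields: TimeFields) -> bytes:
    """Inverse of decode_time for a real (non-empty) time value."""
    year, month, day, hour, minute, second = fields
    years = year - 1900
    if not 0 <= years < 4096:
        raise ValueError("year does not fit in 12 bits")
    return bytes([
        years >> 4,                            # high 8 bits of the year
        ((years & 0x0F) << 4) | (month - 1),   # low 4 year bits, then month
        day, hour, minute, second,
    ])
```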
Item Type Encoding
Item type encoding is used for db_item values in db_catalog chunks, and for item types in metadata item lists.
Allowable item types are encoded as follows:
- 1 = character set
- 2 = user
- 3 = privilege
- 4 = database
- 5 = table
- 6 = view
- 7 = stored procedure
- 8 = stored function
- 9 = event
- 10 = trigger
- 11 = tablespace
Item types are encoded using two bytes. A value of 0 is not a valid item type, so a 0x00 0x00 sequence can be used to separate one item list from the next.
Backup Stream Prefix Format
A backup stream begins with a 10-byte prefix:
backup_stream: stream_prefix backup_image
stream_prefix = magic_number version
- 8 bytes. The magic number. The byte values are 0xe0 0xf8 0x7f 0x7e 0x7e 0x5f 0x0f 0x03.
- 2 bytes. Integer backup format version number.
The backup image that follows the prefix is stored in the format indicated by the version number. Currently, the only version is version 1.
The prefix containing the magic number and version number is not part of the backup image. It is stored in the backup stream to make two things possible:
- Detection of where the backup image starts inside a backup stream
- Determination of the format of the backup image that follows the prefix
For example, suppose that the backup image is not stored at the beginning of a stream, but is preceded by some other information. An application can scan the stream looking for the backup magic number. Once it is found, the application knows with high probability that the version number and a backup image follow.
In other scenarios, the 10-byte prefix might not be needed. Suppose that the backup image is sent from some kind of server to a client using a custom protocol. The client knows that what the server sends is a backup image. In the initial handshake, the server tells the client the version of the backup image format to follow. Thus, after opening the communication channel, the client can directly read the bytes of the backup image and there is no need to send the 10-byte prefix.
Because the prefix is not necessarily present, a backup image reader should not assume that it must read the prefix, whether the reader operates at the block, fragment, or chunk level. The decision about reading the prefix needs to be made separately.
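A minimal sketch of the scanning scenario follows. It assumes the whole stream is available in memory and that the 2-byte version number is stored least significant byte first, like the fixed-length integers inside the image; neither assumption is stated for the prefix itself. The function name find_backup_image is hypothetical.

```python
from typing import Optional, Tuple

# The 8-byte magic number given above.
MAGIC = bytes([0xE0, 0xF8, 0x7F, 0x7E, 0x7E, 0x5F, 0x0F, 0x03])


def find_backup_image(stream: bytes) -> Optional[Tuple[int, int]]:
    """Return (version, image_offset), or None if no complete prefix is found."""
    start = stream.find(MAGIC)
    if start < 0:
        return None
    version_pos = start + len(MAGIC)
    if version_pos + 2 > len(stream):
        return None  # magic number found, but the version field is truncated
    # Assumption: little-endian, as for fixed-length integers in the image.
    version = int.from_bytes(stream[version_pos:version_pos + 2], "little")
    return version, version_pos + 2
```

A caller that already knows the image format version (as in the custom-protocol scenario) would skip this step and hand the byte stream directly to the block reader.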
Image Layer (Chunk Level) Format
backup_image: preamble table_data ... [summary]
The preamble contains information about what objects the backup image contains.
The table_data chunks contain table contents.
The summary ties the backup image to the binary log. This enables point-in-time recovery operations that combine use of the backup image and binary log modifications made after the backup operation.
Information to be stored in the summary is known only after the backup image has been created. Depending on whether it is possible to rewind the output stream, the summary can be stored in the preamble or at the end of the image. The summary appears in the preamble rather than at the end of the image if the inline-summary header flag is set.
Preamble Format
The preamble contains information about what objects the backup image contains.
preamble: header snapshot_description ... [summary] catalog metadata
The preamble consists of several chunks, which contain the following information:
- Header information, such as global image flags, image creation time, and server version number
- Information about table data snapshots stored in the image
- Possibly the summary
- A catalog listing all items stored in the image
- Metadata for the items
Header Format
header: flags creation_time snapshot_count server_version extra_data
The header chunk contains the following fields:
- 2 bytes. Integer image flags.
- 6 bytes. Time of image creation.
- 1 byte. Integer number of snapshots
- Variable length. String server version.
- [TODO: extra_data: Contents? Size = bytes following version to end of chunk?]
[TODO: Does image creation time indicate when the backup operation started?]
Bits in the flags field have the following meanings:
- Bit 0 (BSTREAM_FLAG_INLINE_SUMMARY)
- If set, the backup image summary is stored inline within the image preamble.
- If clear, the summary appears at the end of the image, following the table data chunks.
- Bit 1 (BSTREAM_FLAG_BIG_ENDIAN)
- If set, the server that created the backup uses big-endian storage.
- If clear, the server uses little-endian storage.
- Bit 2 (BSTREAM_FLAG_BINLOG)
- If set, the binlog_coords and binlog_group_coords fields in the summary contain valid values.
- If clear, binlog_coords and binlog_group_coords should be ignored (they are present in the summary but the values are not useful).
- Bits 3-15: Reserved.
The big-endian flag has no bearing on storage of values in the backup image itself. It might be of use in the case where an image was written by some native driver but fails to restore on a machine other than the one on which it was created. If you check the big-endian flag and find that the backup host byte order differs from the restore host byte order, that might indicate that the native backup and restore drivers did not correctly deal with different host byte orders. A way to test this would be to attempt the restore on a host with the same byte order as the backup host.
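The flag bits can be tested with simple masks. The sketch below assumes that "bit 0" denotes the least significant bit of the 2-byte flags value; the numeric constants are derived from the bit positions listed above rather than copied from the source headers, and the function name is hypothetical.

```python
# Masks derived from the bit positions in the list above (bit 0 = lowest bit).
BSTREAM_FLAG_INLINE_SUMMARY = 1 << 0
BSTREAM_FLAG_BIG_ENDIAN = 1 << 1
BSTREAM_FLAG_BINLOG = 1 << 2
RESERVED_MASK = 0xFFF8  # bits 3-15


def describe_header_flags(flags: int) -> dict:
    """Break the 16-bit header flags value into named booleans."""
    return {
        "inline_summary": bool(flags & BSTREAM_FLAG_INLINE_SUMMARY),
        "big_endian": bool(flags & BSTREAM_FLAG_BIG_ENDIAN),
        "binlog": bool(flags & BSTREAM_FLAG_BINLOG),
        "reserved_bits": flags & RESERVED_MASK,
    }


# The flags field is a 2-byte fixed-length integer, least significant byte first.
example = describe_header_flags(int.from_bytes(b"\x05\x00", "little"))
```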
server_version is encoded as follows:
- 1 byte. Integer major number
- 1 byte. Integer minor number
- 1 byte. Integer release number
- Variable length. String representation of the version
For example, a server version of "6.0.8-alpha" is stored using this byte sequence:
0x06 0x00 0x08 0x0b 0x36 0x2e 0x30 0x2e 0x38 0x2d 0x61 0x6c 0x70 0x68 0x61
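The byte sequence above can be decoded as follows. This is an illustrative sketch with a hypothetical function name; the version string is returned as raw bytes.

```python
from typing import Tuple


def parse_server_version(buf: bytes, pos: int = 0) -> Tuple[Tuple[int, int, int, bytes], int]:
    """Return ((major, minor, release, version_string), next_pos)."""
    major, minor, release = buf[pos], buf[pos + 1], buf[pos + 2]
    pos += 3
    # The string length is a variable-length integer (7 bits per byte).
    length = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    text = buf[pos:pos + length]
    if len(text) != length:
        raise ValueError("truncated version string")
    return (major, minor, release, text), pos + length


raw = bytes([0x06, 0x00, 0x08, 0x0B,
             0x36, 0x2E, 0x30, 0x2E, 0x38, 0x2D, 0x61, 0x6C, 0x70, 0x68, 0x61])
version, next_pos = parse_server_version(raw)
```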
Snapshot Description Format
The preamble contains one or more snapshot_description chunks following the header chunk, each describing one of the table data snapshots in the image. The snapshot_description chunks implicitly form a numbered sequence beginning with 1. The number of snapshot_description chunks is given by the snapshot_count field in the header.
Each table_data chunk contains a snapshot_num field that indicates which snapshot_description it is associated with. A table data snapshot is the set of table_data chunks that contain the same snapshot_num value.
snapshot_description: image_type format_version global_options table_count [backup_engine_info] extra_data
backup_engine_info: engine_name major_version minor_version
- 1 byte. Integer image type, encoded as follows:
- 0 (BI_NATIVE): Snapshot was created by native backup driver.
- 1 (BI_DEFAULT): Snapshot was created by the built-in blocking driver.
- 2 (BI_CS): Snapshot was created by the built-in driver that uses a consistent read transaction.
- 2 bytes. Integer snapshot format version.
- 2 bytes. Integer global options. Reserved for future use.
- Variable length. Integer count of the tables stored in the table data.
- Variable length. Backup engine information (name and version). (Optional)
- Variable length. Extra data. Reserved for future use; currently empty.
The backup_engine_info field is empty for default and CS snapshots. For native snapshots, it has this format:
- Variable length. String engine name.
- 1 byte. Integer major version.
- 1 byte. Integer minor version.
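A sketch of reading one snapshot_description chunk is shown below. It assumes that backup_engine_info is present exactly when the image type is BI_NATIVE, as the text above suggests, and treats any remaining bytes as extra_data. The function name and the engine name used in the usage example are hypothetical.

```python
from typing import Tuple

BI_NATIVE, BI_DEFAULT, BI_CS = 0, 1, 2


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Variable-length integer: 7 bits per byte, low-order group first."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def parse_snapshot_description(chunk: bytes) -> dict:
    image_type = chunk[0]
    format_version = int.from_bytes(chunk[1:3], "little")
    global_options = int.from_bytes(chunk[3:5], "little")
    table_count, pos = read_varint(chunk, 5)
    engine = None
    if image_type == BI_NATIVE:
        # Assumption: engine information is stored only for native snapshots.
        name_len, pos = read_varint(chunk, pos)
        name = chunk[pos:pos + name_len]
        pos += name_len
        engine = (name, chunk[pos], chunk[pos + 1])  # (name, major, minor)
        pos += 2
    return {
        "image_type": image_type,
        "format_version": format_version,
        "global_options": global_options,
        "table_count": table_count,
        "engine": engine,
        "extra_data": chunk[pos:],
    }


default_snapshot = parse_snapshot_description(bytes([1, 1, 0, 0, 0, 3]))
native_snapshot = parse_snapshot_description(
    bytes([0, 1, 0, 0, 0, 2]) + b"\x08MyEngine" + bytes([1, 0]))
```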
[TODO: Verify meaning of format_version. Is this the version to use for interpreting the corresponding table data snapshot?]
Catalog Format
The catalog describes what items are stored in the backup image. It contains no metadata because that is stored in a separate section of the image. The catalog only lists the items and provides the information needed to identify and select them. It consists of a header chunk followed by zero or more chunks that each describe items in a single database:
catalog: catalog_header [db_catalog ...]
Catalog Header Format
catalog_header: charsets 0x00 [users] 0x00 [tablespaces] 0x00 databases
The charsets, users, and tablespaces sections of the catalog_header chunk each contain a list of strings and are terminated by a 0x00 byte (an empty string). The databases section contains database-information items and extends to the end of the catalog_header chunk.
The catalog_header starts with a list of character sets identified by name:
charsets: charset_name ...
Character set names are string values represented using ASCII characters. The first two charset_name entries have a special meaning and should always be present:
- The character set used to encode all strings stored in the preamble following the charsets list. This should be a universal character set capable of representing any string, such as UTF8.
- The default character set of the server on which the backup image was created. It can be the same as the first character set.
charset_name values following the first two, if present, are any character sets used by the items stored in the image and thus needed to restore those items.
References to character sets at other locations in the backup image are given as 0-based positions within the character set list. The number of character sets is limited to 256 so that one byte is sufficient to identify a character set by its number.
Note: Even if some objects (such as tables or databases) use character sets in their definition, these character sets will not be stored in this list. On restore, all entries in this list are ignored.
Collation information is not stored currently, but might be in the future.
Character set and collation numbers used within the image catalog are internal to the backup image and have nothing to do with the IDs used in the server. Backup image-processing code must translate between image IDs and internal server IDs as necessary.
After the character set names, a list of users follows:
users: user_name ...
The users list, if present, contains users for which any privileges are stored in the image. Each user_name value is a string.
Currently, the users list is always empty and is ignored for restore operations.
After users, a list of names of all tablespaces used by tables follows:
tablespaces: tablespace_name ...
Each tablespace_name value is a string.
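Because each of the charsets, users, and tablespaces lists is terminated by an empty string, one helper can read all three. The following sketch uses hypothetical function names and made-up list contents.

```python
from typing import List, Tuple


def read_string(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """Counted string: variable-length byte count, then the bytes."""
    length = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return buf[pos:pos + length], pos + length


def read_string_list(buf: bytes, pos: int) -> Tuple[List[bytes], int]:
    """Read strings until the terminating empty string (a single 0x00 byte)."""
    names = []
    while True:
        name, pos = read_string(buf, pos)
        if not name:
            return names, pos
        names.append(name)


# Two character sets, no users, one tablespace (illustrative data only).
header = b"\x04utf8\x06latin1\x00" + b"\x00" + b"\x03ts1\x00"
charsets, pos = read_string_list(header, 0)
users, pos = read_string_list(header, pos)
tablespaces, pos = read_string_list(header, pos)
```

After the third list, pos points at the start of the databases section.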
After tablespaces, a list of all databases follows. If the list is empty, it consists of a single empty string. Otherwise, it is a list of one or more db_info entries:
databases: 0x00 | db_info ...
db_info entries each contain fields that describe a single database:
db_info: db_name db_flags [extra_data]
- Variable length. String database name.
- 1 byte. Integer database flags.
- Variable length. Extra data. Optional; present only if indicated in the flags.
  - 2 bytes. Integer data length.
  - Variable length. Data bytes, as many as specified by the length.
Bits in the db_flags field are used as follows:
- Bits 0-6: Reserved
- Bit 7 (BSTREAM_FLAG_HAS_EXTRA_DATA): Set if the extra_data field is present in the db_info entry.
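A db_info entry could be read along these lines. The mask value 0x80 is derived from "bit 7" above, the 2-byte data length is read least significant byte first, and the function name and sample data are hypothetical.

```python
from typing import Tuple

BSTREAM_FLAG_HAS_EXTRA_DATA = 0x80  # bit 7 of db_flags


def read_db_info(buf: bytes, pos: int = 0) -> Tuple[dict, int]:
    """Return (db_info fields, next_pos)."""
    # db_name: counted string with a variable-length byte count.
    length = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    name = buf[pos:pos + length]
    pos += length
    flags = buf[pos]
    pos += 1
    extra = b""
    if flags & BSTREAM_FLAG_HAS_EXTRA_DATA:
        extra_len = int.from_bytes(buf[pos:pos + 2], "little")
        pos += 2
        extra = buf[pos:pos + extra_len]
        pos += extra_len
    return {"name": name, "flags": flags, "extra_data": extra}, pos


plain, plain_end = read_db_info(b"\x04test\x00")
with_extra, extra_end = read_db_info(b"\x02db\x80\x03\x00xyz")
```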
Database Catalog Format
If there are no databases in the image, the databases list in the catalog_header chunk is empty and the catalog contains no db_catalog chunks following the catalog_header chunk.
Each db_catalog chunk lists all tables and other per-database items belonging to a single database. If the database is empty, its db_catalog chunk consists of two 0x00 bytes. Otherwise, db_catalog contains lists of table_info and db_item_info entries:
db_catalog: 0x00 0x00 | db_tables db_other_items
db_tables: table_info ...
db_other_items: db_item_info ...
All tables are stored before other per-database items in db_catalog. This is important for addressing the per-database items because the coordinates for such items refer to the position in the db_other_items list. (For example, per-database item 0 is the first item in db_other_items.)
Grants belonging to a given database are listed in db_other_items. They have artificial names such as "user_joe 0000001".
table_info: type table_name flags snapshot_num table_pos [extra_data]
- 2 bytes. Integer item type (always 0x05 0x00 for a table)
- Variable length: String table name.
- 1 byte. Integer flags.
- 1 byte. Integer indicating which snapshot contains the table's data.
- Variable length. Integer position of the table within its snapshot.
- Variable length. Extra data. Optional; present only if indicated in the flags. (Currently not used.)
  - 2 bytes. Integer data length.
  - Variable length. Data bytes, as many as specified by the length.
Each db_item_info entry in the db_other_items list contains information that describes a single database item:
db_item_info: type name [item_data]
- 2 bytes. Integer item type.
- Variable length. String item name.
- Variable length. Item data. (Optional; currently not used)
Allowable item type values are given in #Item Type Encoding.
Bits in the flags field are used as follows:
- Bits 0-6: Reserved
- Bit 7 (BSTREAM_FLAG_HAS_EXTRA_DATA): Set if the extra_data field is present in item_data.
Metadata Format
The metadata section contains information for items that need to be created when restoring data. It has three main sections:
- Metadata for global items
- Metadata for tables
- Metadata for other items, subdivided into per-database and per-table items
metadata: global_items [tables ... other_items]
If there are no databases in the image, metadata consists of the global_items chunk only.
Each metadata section contains a list of metadata entries, each entry containing data required for restoring a single item. Often this is an SQL CREATE statement for that item, but it can also contain other data stored in binary format.
The order of entries is relevant. They are stored in such an order that items can be created while reading these entries without breaking any dependencies.
Currently the global_items chunk contains metadata only for tablespaces and databases, but it might be used to store other global objects in the future. Tablespace definitions precede database definitions.
The tables chunks are grouped by database, one chunk per database. (This is acceptable because foreign key constraints can be disabled when tables are created.) This grouping makes it easier to skip tables during selective restore of databases.
tables: db1_tables ... dbN_tables
The other_items chunk has two parts, one for all per-database items other than tables, and one for all per-table items. These parts are separated by an empty item (two 0x00 bytes).
other_items: per_db_items 0x00 0x00 per_table_items
The per-database items other than tables include stored routines, views, triggers, and events. These cannot be grouped by database because of potential inter-database dependencies. This is why they are stored separately in the other_items section. Grants are considered per-database items.
Metadata item lists can be empty or consist of several item entries. An empty item list consists of two 0x00 bytes, which cannot start any valid item_entry value. (The first field in an item_entry is the item type, which cannot be 0.) A list of item entries continues to the end of the chunk.
item_list: 0x00 0x00 | item_entry ...
Each item entry has the following format:
item_entry: type flags catalog_pos [extra_data] [create_statement]
- 2 bytes. Integer item type.
- 1 byte. Integer flags.
- Variable length. Integer catalog coordinates.
- Variable length. Extra data. Optional; present only if BSTREAM_FLAG_HAS_EXTRA_DATA is set in the flags. (Currently not used.)
  - 2 bytes. Integer data length.
  - Variable length. Data bytes, as many as specified by the length.
- Variable length. String CREATE statement. Optional; present only if BSTREAM_FLAG_HAS_CREATE_STMT is set in the flags.
Allowable item type values are given in #Item Type Encoding.
A metadata item entry contains a CREATE statement or other binary data or both. The flags field indicates which metadata elements are present in the entry:
- Bits 0-5: Reserved
- Bit 6 (BSTREAM_FLAG_HAS_CREATE_STMT): The create_statement field is present.
- Bit 7 (BSTREAM_FLAG_HAS_EXTRA_DATA): The extra_data field is present.
The format of the coordinates for each item's catalog position depends on the item type.
Catalog coordinates for global items
catalog_pos: pos_in_the_list
- Variable length. 0-based position in the corresponding list in catalog_header.
The catalog is capable of storing four types of global objects: databases, tablespaces, users, and character sets. For all these global objects, catalog_pos is a single number giving the position of the entry in the appropriate list within the catalog_header chunk of the catalog. The metadata sections contain entries only for databases and tablespaces, but each type of global object can be located in the catalog using its coordinates.
- Database: The coordinates value for a database is a number N indicating that the database is described by the N-th entry in the databases list within the catalog_header chunk.
- Tablespace: Similar, but for the tablespaces list within catalog_header.
- User: Similar, but for the users list within catalog_header.
- Character set: Similar, but for the character sets list within catalog_header.
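For a global item, the whole item_entry can be read with one pass because catalog_pos is a single variable-length integer. The sketch below follows the field order given in the item_entry syntax; the mask values are derived from the bit numbers listed for the flags field, and the function name and CREATE statement are illustrative.

```python
from typing import Tuple

BSTREAM_FLAG_HAS_CREATE_STMT = 0x40  # bit 6
BSTREAM_FLAG_HAS_EXTRA_DATA = 0x80   # bit 7


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def read_global_item_entry(buf: bytes, pos: int = 0) -> Tuple[dict, int]:
    """Read an item_entry whose catalog_pos is a single list position."""
    item_type = int.from_bytes(buf[pos:pos + 2], "little")
    if item_type == 0:
        raise ValueError("0x00 0x00 marks an empty item list, not an item")
    flags = buf[pos + 2]
    catalog_pos, pos = read_varint(buf, pos + 3)
    extra = b""
    if flags & BSTREAM_FLAG_HAS_EXTRA_DATA:
        extra_len = int.from_bytes(buf[pos:pos + 2], "little")
        pos += 2
        extra = buf[pos:pos + extra_len]
        pos += extra_len
    statement = None
    if flags & BSTREAM_FLAG_HAS_CREATE_STMT:
        stmt_len, pos = read_varint(buf, pos)
        statement = buf[pos:pos + stmt_len]
        pos += stmt_len
    return {"type": item_type, "flags": flags, "catalog_pos": catalog_pos,
            "extra_data": extra, "create_statement": statement}, pos


stmt = b"CREATE DATABASE db1"
entry = b"\x04\x00" + b"\x40" + b"\x00" + bytes([len(stmt)]) + stmt
item, end = read_global_item_entry(entry)
```

Here the item type 4 (database) and catalog_pos 0 refer to the first entry of the databases list in catalog_header.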
Catalog coordinates for tables
catalog_pos: table_pos snapshot_num
- Variable length. Position of the table within the given snapshot as given by the corresponding table_info entry in the db_catalog entry of the database containing this table.
- 1 byte. 0-based integer snapshot number.
To see which snapshot each table belongs to and what position within the snapshot it has, examine the corresponding table_info entry within the db_catalog entry of the database containing the table.
Table #(N, K): The table described by the table_info entry with snapshot_num = N and table_pos = K. This table_info entry can be found in the db_catalog entry of the database containing the table.
Catalog coordinates for other per-database items
catalog_pos: item_pos db_num
- Variable length. 0-based position in the db_other_items list of the db_catalog for the given database.
- Variable length. 0-based position of the database in the databases list in catalog_header.
The other per-database items represent views, stored procedures, stored functions, events, triggers, and grants.
Other #(N, K): The object described by the db_item_info entry in the db_other_items list with item_pos = N and db_num = K.
Catalog coordinates for per-table items
catalog_pos: item_pos db_num table_num
The meaning is not defined yet. All three fields are variable-length numbers.
Per-table objects are not really supported by the current code and there is no place for them in the catalog. The per_table_items part of other_items is reserved for future use and currently is always empty. This means that other_items always ends with 0x00 0x00.
Table Data Format
Table data is contained in a sequence of one or more table_data chunks.
table_data: snapshot_num sequence_num flags table_num data
- 1 byte. Integer snapshot number.
- 2 bytes. Integer sequence number.
- 1 byte. Integer flags.
- Variable length. Integer table number.
- Variable length. Data, to end of chunk.
All table_data chunks that have the same snapshot_num value form a table data snapshot. Snapshots are numbered beginning with 1, so the snapshot_num field has a value from 1 to 255 and no table_data chunk can begin with a 0x00 byte. This enables the end of the sequence of table_data chunks to be detected if the summary follows the table data because in this case the summary will begin with a 0x00 byte.
Data chunks of each snapshot are numbered by sequence_num values from 0 to 65535. These values can be used to detect discontinuities in a backup stream, although the values need not be strictly increasing due to wraparound. It is possible for a snapshot to contain more than 65536 table_data chunks, in which case the sequence_num values will overflow after 65535 and wrap around to begin with 0 again.
Bits in the flags field are used as follows:
- Bit 0 (BSTREAM_FLAG_LAST_CHUNK): Set if the chunk is the last data chunk for the table.
- Bits 1-7. Reserved.
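The fixed fields of a table_data chunk and the sequence number wraparound can be sketched as follows. The data part is returned uninterpreted, the mask 0x01 is derived from "bit 0" above, and the function names are hypothetical.

```python
from typing import Tuple

BSTREAM_FLAG_LAST_CHUNK = 0x01  # bit 0 of the table_data flags


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def parse_table_data(chunk: bytes) -> dict:
    snapshot_num = chunk[0]
    if snapshot_num == 0:
        # A leading 0x00 byte identifies the summary, not table data.
        raise ValueError("chunk does not start a table_data chunk")
    sequence_num = int.from_bytes(chunk[1:3], "little")
    flags = chunk[3]
    table_num, pos = read_varint(chunk, 4)
    return {
        "snapshot_num": snapshot_num,
        "sequence_num": sequence_num,
        "last_chunk": bool(flags & BSTREAM_FLAG_LAST_CHUNK),
        "table_num": table_num,
        "data": chunk[pos:],
    }


def is_next_sequence(previous: int, current: int) -> bool:
    """True if current follows previous, allowing wraparound after 65535."""
    return current == ((previous + 1) & 0xFFFF)


parsed = parse_table_data(bytes([1, 0x34, 0x12, 0x01, 0x05]) + b"rows")
```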
[TODO: What is the table number in reference to?]
[TODO: Format of data at end of chunk? Depends on the backup engine?]
[TODO: I observe table_data chunks that are the last in the snapshot, but without BSTREAM_FLAG_LAST_CHUNK set in the flags]
Summary Format
The summary ties the backup image to the binary log. This enables point-in-time recovery operations that combine use of the backup image and binary log modifications made after the backup operation.
The summary appears inline within the preamble if the inline-summary header flag is set. Otherwise, it appears at the end of the backup image, following the table_data chunks, and contains a leading 0x00 byte. The 0x00 byte serves to distinguish the summary chunk from a table_data chunk because table_data cannot begin with a 0x00 byte.
summary: [0x00] vp_time end_time binlog_coords binlog_group_coords
- 1 byte. 0x00 byte distinguishing the summary from preceding table_data chunks. Not present if summary appears in the preamble?
- 6 bytes: Time of validity point.
- 6 bytes: Time that backup ended. [TODO: Verify that]
- Variable length: Binary log coordinates, stored as:
  - 4 bytes: Integer binary log position.
  - Variable length: String binary log filename.
- Variable length: Binary log group coordinates. The format is the same as for binlog_coords. This field is reserved for future use.
If BSTREAM_FLAG_BINLOG is not set in the header flags, the binlog_coords and binlog_group_coords values in the summary should be ignored. (The values are present but not useful.)
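The summary layout can be read as sketched below. Whether the leading 0x00 byte is present is left to the caller through the has_marker parameter, since that depends on where the summary is stored. The two times are returned as raw 6-byte values. All names and the sample data are hypothetical.

```python
from typing import Optional, Tuple

Coords = Tuple[int, bytes]  # (position, filename)


def read_coords(buf: bytes, pos: int) -> Tuple[Coords, int]:
    """Binary log coordinates: 4-byte position, then a counted string."""
    position = int.from_bytes(buf[pos:pos + 4], "little")
    pos += 4
    length = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return (position, buf[pos:pos + length]), pos + length


def parse_summary(chunk: bytes, has_marker: bool, binlog_valid: bool) -> dict:
    pos = 0
    if has_marker:
        if chunk[0] != 0x00:
            raise ValueError("summary marker byte missing")
        pos = 1
    vp_time = chunk[pos:pos + 6]
    end_time = chunk[pos + 6:pos + 12]
    coords, pos = read_coords(chunk, pos + 12)
    group_coords, pos = read_coords(chunk, pos)
    binlog: Optional[Coords] = coords if binlog_valid else None
    group: Optional[Coords] = group_coords if binlog_valid else None
    return {"vp_time": vp_time, "end_time": end_time,
            "binlog_coords": binlog, "binlog_group_coords": group}


name = b"bin.000001"
vp = bytes([0x06, 0xC9, 0x0B, 0x0F, 0x1C, 0x11])
chunk = (b"\x00" + vp + bytes(6)
         + (107).to_bytes(4, "little") + bytes([len(name)]) + name
         + (0).to_bytes(4, "little") + b"\x00")
summary = parse_summary(chunk, has_marker=True, binlog_valid=True)
ignored = parse_summary(chunk, has_marker=True, binlog_valid=False)
```

The binlog_valid argument corresponds to BSTREAM_FLAG_BINLOG in the header flags: the coordinate fields are always consumed, but they are reported as None when the flag is clear.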
[TODO: Verify meaning of end_time. I observe a value of 6 0x00 bytes, what does that mean? (It's a bug)]
Transport Layer (Block Level) Format
The transport layer consists of fixed-size blocks. Blocks within a given backup image are the same size (except possibly the last), and the block size is encoded within the first few blocks of the image itself. Blocks are encoded such that backup image chunks can be recognized and extracted from the byte stream.
The backup stream prefix (magic number and image format version) is not part of the sequence of blocks comprising the backup image. If present, the prefix must be read separately.
backup_image: first_block [initial_block ...] [regular_block ...]
first_block: block_size initial_block_count block_data
initial_block: block_size block_data
regular_block: block_data
The first few blocks are special:
- The first block contains the block size for all following blocks, the number N of "initial" blocks to follow, and some data.
- The next N blocks following the first block (the "initial" blocks) contain the block size and some data.
The blocks following the first and initial blocks contain only data.
The first and initial blocks all contain the block size. The redundancy serves to enable detection of data corruption.
first_block fields:
- 4 bytes. block_size: An integer indicating how large blocks are.
- 1 byte. initial_block_count: An integer indicating how many initial blocks follow the first block.
- block_size-5 bytes. block_data: Stream data to the end of the block. The size is block_size-5 because the first 5 bytes of the block contain the block_size and initial_block_count fields.
initial_block fields:
- 4 bytes. block_size: An integer indicating how large blocks are.
- block_size-4 bytes. block_data: Stream data to the end of the block. The size is block_size-4 because the first 4 bytes of the block contain the block_size field.
The block_size value in all initial_block blocks must match the block_size in first_block.
regular_block fields:
- block_size bytes. block_data: Stream data (comprises entire block).
Because the block size is unknown initially, the block reader must begin by reading the first four first_block bytes to get the block size before it can read the rest of the block.
The last block in a backup image might not be the full block size, and there might not be as many initial blocks as indicated in the first block. For example, the first block might indicate a block_size of 16384 bytes and initial_block_count of 2. But if the entire backup image size is 3000 bytes, the first block will contain only 3000 bytes and there will be no initial blocks following it. The block-reading level must be prepared to deal with this and handle whatever block data bytes are actually present, interpreting them according to the rules governing block_data content described in the next section.
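The block rules can be illustrated with a sketch that splits an in-memory image into the block_data part of each block. It assumes the 4-byte block_size is stored least significant byte first, like other fixed-length integers, and that the prefix has already been removed. The function name is hypothetical.

```python
from typing import List, Tuple


def split_blocks(image: bytes) -> Tuple[int, List[bytes]]:
    """Return (block_size, [block_data of each block]) for a whole image."""
    if len(image) < 5:
        raise ValueError("image too short for block_size and initial_block_count")
    block_size = int.from_bytes(image[0:4], "little")
    initial_count = image[4]
    if block_size < 5:
        raise ValueError("implausible block size")
    # First block: data follows the 5 header bytes; it may be a short block.
    payloads = [image[5:block_size]]
    index = 1
    for start in range(block_size, len(image), block_size):
        block = image[start:start + block_size]
        if index <= initial_count:
            # Initial blocks repeat the block size as a corruption check.
            if len(block) < 4 or int.from_bytes(block[:4], "little") != block_size:
                raise ValueError("block_size mismatch in initial block %d" % index)
            payloads.append(block[4:])
        else:
            payloads.append(block)
        index += 1
    return block_size, payloads


# Tiny illustrative image: block size 8, one initial block, a short last block.
image = ((8).to_bytes(4, "little") + b"\x01" + b"abc"
         + (8).to_bytes(4, "little") + b"defg"
         + b"hijklmno"
         + b"pq")
size, payloads = split_blocks(image)

# The 3000-byte case from the text: one short first block, no initial blocks.
small = (16384).to_bytes(4, "little") + b"\x02" + bytes(2995)
small_size, small_payloads = split_blocks(small)
```

A streaming reader would instead read the first four bytes, then the remainder of each block in turn, but the slicing logic is the same.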
Transport Layer Block Data Format
When a block is read in the transport layer, its block_data bytes are extracted and then interpreted to find the fragments that it contains. (No fragment ever crosses a block boundary.) These fragments are assembled into chunks for use by the image layer, which operates on chunk units. There is no fixed relationship between chunks and blocks. A chunk can be constructed from fragments spanning several blocks, or a chunk might require fragments from only part of a block.
The block_data part of a given block consists of one or more fragments:
block_data_stream: fragment [fragment ...]
fragment: EOC | EOS | frag_header payload
EOC: 0x80 (end of chunk)
EOS: 0xc0 (end of stream)
An EOC marker ends a chunk even if the preceding fragment indicates that more fragments follow.
frag_header: frag_type frag_size
payload: data bytes (size depends on frag_header contents)
The frag_header is a single byte that indicates the type and size of the fragment:
- Bits 0-5: The fragment size. Interpretation of this value depends on the fragment type.
- Bits 6-7: The fragment type.
Thus:
- frag_type = (frag_header & 0xc0)
- frag_size = (frag_header & 0x3f)
Possible frag_type values:
- 0x00: Small fragment, more fragments in chunk to follow
- 0x40: Small fragment, last fragment in chunk
- 0x80: Big fragment, more fragments in chunk to follow
- 0xc0: Huge fragment, more fragments in chunk to follow
For a small fragment, the low bit of the frag_type bits (bit 6 of frag_header) determines whether there are more fragments in the chunk: it is clear if more fragments follow and set for the last fragment. A chunk cannot end with a big or huge fragment, so if the last part of a chunk requires exactly as many bytes as fit in a big or huge fragment, an EOC fragment follows to indicate the end of the chunk.
The frag_size value must be interpreted to determine the actual fragment size:
- Small fragment: size = rest of block if frag_size == 0; size = frag_size otherwise
- Big fragment: size = (frag_size << 6)
- Huge fragment: size = (frag_size << 12)
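Putting the fragment rules together, the following sketch computes fragment sizes and assembles chunks from a list of block_data byte strings. It is a simplified illustration, not the logic of stream_v1_transport.c. It assumes that "rest of block" means the bytes after the header byte, that no EOC marker follows a small fragment already marked as last, and that any unfinished chunk at EOS is discarded. The function names are hypothetical.

```python
from typing import Iterable, List

EOC = 0x80  # end of chunk
EOS = 0xC0  # end of stream


def fragment_size(header: int, remaining: int) -> int:
    """Payload size for a fragment header; remaining = bytes left in the block."""
    frag_type = header & 0xC0
    frag_size = header & 0x3F
    if frag_type in (0x00, 0x40):      # small fragment
        return frag_size if frag_size else remaining
    if frag_type == 0x80:              # big fragment
        return frag_size << 6
    return frag_size << 12             # huge fragment


def assemble_chunks(block_payloads: Iterable[bytes]) -> List[bytes]:
    chunks = []
    current = bytearray()
    for data in block_payloads:
        pos = 0
        while pos < len(data):
            header = data[pos]
            pos += 1
            if header == EOS:
                return chunks
            if header == EOC:
                chunks.append(bytes(current))
                current = bytearray()
                continue
            size = fragment_size(header, len(data) - pos)
            if pos + size > len(data):
                raise ValueError("fragment would cross a block boundary")
            current += data[pos:pos + size]
            pos += size
            if header & 0xC0 == 0x40:  # small fragment, last in chunk
                chunks.append(bytes(current))
                current = bytearray()
    return chunks


block1 = bytes([0x03]) + b"abc" + bytes([0x42]) + b"de" + bytes([0x00]) + b"xyz"
block2 = bytes([0x41]) + b"!" + bytes([0x81]) + bytes(64) + bytes([EOC, EOS])
chunks = assemble_chunks([block1, block2])
```

In this example the second chunk spans both blocks, and the third chunk ends with a big fragment, so an EOC marker closes it.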