MySQL Internals MyISAM
← Back to MySQL Internals overview page
Contents |
[edit] MyISAM Storage Engine
[edit] MyISAM Record Structure
[edit] Introduction
When you say:
CREATE TABLE Table1 ...
MySQL creates files named Table1.MYD ("MySQL Data"), Table1.MYI ("MySQL Index"), and Table1.frm ("Format"). These files will be in the directory:
/<datadir>/<database>/
For example, if you use Linux, you might find the files in the /usr/local/var/test directory (assuming your database name is test). if you use Windows, you might find the files in the \mysql\data\test\ directory.
Let's look at the .MYD Data file (MyISAM SQL Data file) more closely. There are three possible formats fixed, dynamic, and packed. First, let's discuss the fixed format.
- Page Size
Unlike most DBMSs, MySQL doesn't store on disk using pages. Therefore you will not see filler space between rows. (Reminder: This does not refer to BDB and InnoDB tables, which do use pages).
- Record Header
The minimal record header is a set of flags:
- "X bit" = 0 if row is deleted, = 1 if row is not deleted
- "Null Bits" = 1 if row contains any null fields, or = 0 otherwise.
- "Filler Bits" = 1
The length of the record header is thus:
(1 + number of NULL columns + 7) / 8 bytes
After the header, all columns are stored in the order that they were created, which is the same order that you would get from SHOW COLUMNS.
Here's an example. Suppose you say:
CREATE TABLE Table1 (column1 CHAR(1), column2 CHAR(1), column3 CHAR(1));
INSERT INTO Table1 VALUES ('a', 'b', 'c');
INSERT INTO Table1 VALUES ('d', NULL, 'e');
A CHAR(1) column takes precisely one byte (plus one bit of overhead that is assigned to every column I'll describe the details of column storage later). So the file Table1.MYD looks like this:
Hexadecimal Display of Table1.MYD file
F1 61 62 63 00 F5 64 20 65 00 ... .abc..d e.
Here's how to read this hexadecimal-dump display:
- The hexadecimal numbers
F1 61 62 63 00 F5 64 20 65 00are byte values and the column on the right is an attempt to show the same bytes in ASCII. - The
F1byte means that there are no null fields in the first row. - The
F5byte means that the second column of the second row isNULL.
(It's probably easier to understand the flag setting if you restate F5 as 11110101 binary, and (a) notice that the third flag bit from the right is on, and (b) remember that the first flag bit is the X bit.)
There are complications the record header is more complex if there are variable-length fields but the simple display shown in the example is exactly what you'd see if you looked at the MySQL Data file with a debugger or a hexadecimal file dumper.
So much for the fixed format. Now, let's discuss the dynamic format.
The dynamic file format is necessary if rows can vary in size. That will be the case if there are BLOB columns, or "true" VARCHAR columns. (Remember that MySQL may treat VARCHAR columns as if they're CHAR columns, in which case the fixed format is used.) A dynamic row has more fields in the header. The important ones are "the actual length", "the unused length", and "the overflow pointer". The actual length is the total number of bytes in all the columns. The unused length is the total number of bytes between one physical record and the next one. The overflow pointer is the location of the rest of the record if there are multiple parts.
For example, here is a dynamic row:
03 start of header - Block type, see mi_dynrec.c, _mi_get_block_info()
04, 00 actual length
0c block length
01, fc flags + overflow pointer
**** data in the row
************ unused bytes
<-- next row starts here)
In the example, the actual length and the unused length are short (one byte each) because the table definition says that the columns are short if the columns were potentially large, then the actual length and the unused length could be two bytes each, three bytes each, and so on. In this case, actual length plus unused length is 10 hexadecimal (sixteen decimal), which is a minimum.
In a dynamic row, there is no deleted bit. Instead, deleted rows are marked with a block of type 0.
As for the third format packed we will only say briefly that:
- Numeric values are stored in a form that depends on the range (start/end values) for the data type.
- All columns are packed using either Huffman or enum coding.
For details, see the source files /myisam/mi_statrec.c (for fixed format), /myisam/mi_dynrec.c (for dynamic format), and /myisam/mi_packrec.c (for packed format).
Note: Internally, MySQL uses a format much like the fixed format which it uses for disk storage. The main differences are:
-
BLOBvalues have a length and a memory pointer rather than being stored inline. - "True
VARCHAR" (a column storage which will be fully implemented in version 5.0) will have a 16-bit length plus the data. - All integer or floating-point numbers are stored with the low byte first. Point (3) does not apply for
ISAMstorage or internals.
[edit] Physical Attributes of Columns
Next I'll describe the physical attributes of each column in a row. The format depends entirely on the data type and the size of the column, so, for every data type, I'll give a description and an example.
All the types are defined within the include/mysql_com.h file within the enum_field_types enumerated structure. Here's a sample of the key values and corresponding numbers:
MYSQL_TYPE_BIT 16 MYSQL_TYPE_BLOB 252 MYSQL_TYPE_DATE 10 MYSQL_TYPE_DATETIME 12 MYSQL_TYPE_DECIMAL 0 MYSQL_TYPE_DOUBLE 5 MYSQL_TYPE_ENUM 247 MYSQL_TYPE_FLOAT 4 MYSQL_TYPE_GEOMETRY 255 MYSQL_TYPE_INT24 9 MYSQL_TYPE_LONG 3 MYSQL_TYPE_LONGLONG 8 MYSQL_TYPE_LONG_BLOB 251 MYSQL_TYPE_MEDIUM_BLOB 250 MYSQL_TYPE_NEWDATE 14 MYSQL_TYPE_NEWDECIMAL 246 MYSQL_TYPE_NULL 6 MYSQL_TYPE_SET 248 MYSQL_TYPE_SHORT 2 MYSQL_TYPE_STRING 254 MYSQL_TYPE_TIME 11 MYSQL_TYPE_TIMESTAMP 7 MYSQL_TYPE_TINY 1 MYSQL_TYPE_TINY_BLOB 249 MYSQL_TYPE_VARCHAR 15 MYSQL_TYPE_VAR_STRING 253 MYSQL_TYPE_YEAR 13
- The character data types
CHAR
- Storage: fixed-length string with space padding on the right.
- Example: a
CHAR(5)column containing the value'A'looks like:hexadecimal 41 20 20 20 20-- (length = A??'</code>)
VARCHAR
- Storage: variable-length string with a preceding length.
- Example: a
VARCHAR(7)column containing'A'looks like:hexadecimal 01 41-- (length = A'</code>) - In MySQL 4.1 the length is always 1 byte. In MySQL 5.0 the length may be either 1 byte (for up to 255) or 2 bytes (for 256 to 65535). Some further random notes about the new format: In old tables (from MySQL 4.1 and earlier),
VARCHARcolumns have typeMYSQL_TYPE_VAR_STRING, which works exactly like aCHARwith the exception that if you do anALTER TABLE, it's converted to a trueVARCHAR(MYSQL_TYPE_VARCHAR). (This means that old tables will work as before for users.) ... Apart from the above case, there are no longer any automatic changes fromCHARtoVARCHARor fromVARCHARtoCHAR. MySQL will remember the declared type and stick to it ...VARCHARis implemented infield.handfield.ccthrough the new classField_varstring...MyISAMimplementsVARCHARboth for dynamic-length and fixed-length rows (as signaled with theROW_FORMATflag) ...VARCHARnow stores trailing spaces. (If they don't fit, that's an error in strict mode.) Trailing spaces are not significant in comparisons ... Intable->record, the space is reserved for length (1 or 2 bytes) plus data ... The number of bytes used to store the length is in the fieldField_varchar->length_bytes. Note that internally this can be 2 even ifField_varchar->field_length< 256 (for example, for a shortened key to avarchar(256)) ... There is a new macro,HA_VARCHAR_PACKLENGTH(field_length), that can be used onfield->lengthin write_row / read_row to check how many length bytes are used. (In this context we can't have a field_length < 256 with a 2-byte pack length) ... When creating a key for the handler,HA_KEYTYPE_VARTEXT1andHA_KEYTYPE_BINARY1are used for a key on a column that has a 1-byte length prefix andHA_KEYTYPE_VARTEXT2andHA_KEYTYPE_BINARY2for a column that has a 2-byte length prefix. (In the future we will probably deleteHA_KEYTYPE_BINARY#, as this can be instead be done by just using thebinarycharacter set withHA_KEYTYPE_VARTEXT#.) ... When sending a key to the handler forindex_read()or records_in_range, we always use a 2-byte length for theVARCHARto make things simpler. (For version 5.1 we intend to changeCHARs to also use a 2-byte length for these functions, as this will speed up and simplify the key handling code on the handler side.) ... The test case filemysql-test/include/varchar.incshould be included in the code that tests the handler. Seet/myisam.testfor how to use this. You should verify the result against the one inmysql-test/t/myisam.resultto ensure that you get the correct results ... A client sees both the old and newVARCHARtype asMYSQL_TYPE_VAR_STRING. It will never (at least for 5.0) seeMYSQL_TYPE_VARCHAR. This ensures that old clients will work as before ... If you run MySQL 5.0 with the--newoption, MySQL will show oldVARCHARcolumns as'CHAR'inSHOW CREATE TABLE. (This is useful when testing whether a table is using the newVARCHARtype or not.)
- The numeric data types
Important: MySQL almost always stores multi-byte binary numbers with the low byte first. This is called "little-endian" numeric storage; it's normal on Intel x86 machines; MySQL uses it even for non-Intel machines so that databases will be portable.
TINYINT
- Storage: fixed-length binary, always one byte.
- Example: a
TINYINTcolumn containing65looks like:hexadecimal 41-- (length = 1, value = 65)
SMALLINT
- Storage: fixed-length binary, always two bytes.
- Example: a
SMALLINTcolumn containing65looks like:hexadecimal 41 00-- (length = 2, value = 65)
MEDIUMINT
- Storage: fixed-length binary, always three bytes.
- Example: a
MEDIUMINTcolumn containing65looks like:hexadecimal 41 00 00-- (length = 3, value = 65)
INT
- Storage: fixed-length binary, always four bytes.
- Example: an
INTcolumn containing65looks like:hexadecimal 41 00 00 00-- (length = 4, value = 65)
BIGINT
- Storage: fixed-length binary, always eight bytes.
- Example: a
BIGINTcolumn containing65looks like:hexadecimal 41 00 00 00 00 00 00 00-- (length = 8, value = 65)
FLOAT
- Storage: fixed-length binary, always four bytes.
- Example: a
FLOATcolumn containing approximately65looks like:hexadecimal 00 00 82 42-- (length = 4, value = 65)
DOUBLE PRECISION
- Storage: fixed-length binary, always eight bytes.
- Example: a
DOUBLE PRECISIONcolumn containing approximately65looks like:hexadecimal 00 00 00 00 00 40 50 40-- (length = 8, value = 65)
REAL
- Storage: same as
FLOAT, or same asDOUBLE PRECISION, depending on the setting of the--ansioption.
- Storage: same as
DECIMAL
- MySQL 4.1 Storage: fixed-length string, with a leading byte for the sign, if any.
- Example: a
DECIMAL(2)column containing65looks like:hexadecimal 20 36 35-- (length = 3, value =' 65') - Example: a
DECIMAL(2) UNSIGNEDcolumn containing65looks like:hexadecimal 36 35-- (length = 2, value ='65') - Example: a
DECIMAL(4,2) UNSIGNEDcolumn containing65looks like:hexadecimal 36 35 2E 30 30-- (length = 5, value ='65.00') - MySQL 5.0 Storage: high byte first, four-byte chunks. We call the four-byte chunks "*decimal* digits". Since 2**32 = There is an implied decimal point. Details are in /strings/decimal.c.
- Example: a MySQL 5.0
DECIMAL(21,9)column containing111222333444.555666777looks like:hexadecimal 80 6f 0d 40 8a 04 21 1e cd 59-- (flag + '111', '222333444', '555666777').
NUMERIC
- Storage: same as
DECIMAL.
- Storage: same as
BOOL
- Storage: same as
TINYINT.
- Storage: same as
- The temporal data types
DATE
- Storage: 3 byte integer, low byte first. Packed as: 'day + month*32 + year*16*32'
- Example: a
DATEcolumn containing'1962-01-02'looks like:hexadecimal 22 54 0F
DATETIME
- Storage: eight bytes.
- Part 1 is a 32-bit integer containing year*10000 + month*100 + day.
- Part 2 is a 32-bit integer containing hour*10000 + minute*100 + second.
- Example: a
DATETIMEcolumn for'0001-01-01 01:01:01'looks like:hexadecimal B5 2E 11 5A 02 00 00 00
TIME
- Storage: 3 bytes, low byte first. This is stored as seconds: days*24*3600+hours*3600+minutes*60+seconds
- Example: a
TIMEcolumn containing'1 02:03:04'(1 day 2 hour 3 minutes and 4 seconds) looks like:hexadecimal 58 6E 01
TIMESTAMP
- Storage: 4 bytes, low byte first. Stored as unix
time(), which is seconds since the Epoch (00:00:00 UTC, January 1, 1970). - Example: a
TIMESTAMPcolumn containing'2003-01-01 01:01:01'looks like:hexadecimal 4D AE 12 23
- Storage: 4 bytes, low byte first. Stored as unix
YEAR
- Storage: same as unsigned
TINYINTwith a base value of 0 = 1901.
- Storage: same as unsigned
- Others
SET
- Storage: one byte for each eight members in the set.
- Maximum length: eight bytes (for maximum 64 members).
- This is a bit list. The least significant bit corresponds to the first listed member of the set.
- Example: a
SET('A','B','C')column containing'A'looks like:01-- (length = A')
ENUM
- Storage: one byte if less than 256 alternatives, else two bytes.
- This is an index. The value 1 corresponds to the first listed alternative. (Note:
ENUMalways reserves the value 0 for an erroneous value. This explains why'A'is 1 instead of 0.) - Example: an
ENUM('A','B','C')column containing'A'looks like:01-- (length = A')
- The Large-Object data types
Warning: Because TINYBLOB's preceding length is one byte long (the size of a TINYINT) and MEDIUMBLOB's preceding length is three bytes long (the size of a MEDIUMINT), it's easy to think there's some sort of correspondence between the BLOB and INT types. There isn't a BLOB's preceding length is not four bytes long (the size of an INT).
TINYBLOB
- Storage: variable-length string with a preceding one-byte length.
- Example: a
TINYBLOBcolumn containing'A'looks like:hexadecimal 01 41-- (length = A')
TINYTEXT
- Storage: same as
TINYBLOB.
- Storage: same as
BLOB
- Storage: variable-length string with a preceding two-byte length.
- Example: a
BLOBcolumn containing'A'looks like:hexadecimal 01 00 41-- (length = A')
TEXT
- Storage: same as
BLOB.
- Storage: same as
MEDIUMBLOB
- Storage: variable-length string with a preceding three-byte length.
- Example: a
MEDIUMBLOBcolumn containing'A' looks like:hexadecimal 01 00 00 41-- (length = A')
MEDIUMTEXT
- Storage: same as
MEDIUMBLOB.
- Storage: same as
LONGBLOB
- Storage: variable-length string with a preceding four-byte length.
- Example: a
LONGBLOBcolumn containing'A'looks like:hexadecimal 01 00 00 00 41-- (length = A')
LONGTEXT
- Storage: same as
LONGBLOB.
- Storage: same as
[edit] Where to Look For More Information
References:
Most of the formatting work for MyISAM columns is visible in the program /sql/field.cc in the source code directory. And in the MyISAM directory, the files that do formatting work for different record formats are: /myisam/mi_statrec.c, /myisam/mi_dynrec.c, and /myisam/mi_packrec.c.
[edit] The .MYI file
A .MYI file for a MyISAM table contains the table's indexes.
The .MYI file has two parts: the header information and the key values. So the next sub-sections will be "The .MYI Header" and "The .MYI Key Values".
The .MYI Header
A .MYI file begins with a header, with information about options, about file sizes, and about the "keys". In MySQL terminology, a "key" is something that you create with CREATE [UNIQUE] INDEX.
Program files which read and write .MYI headers are in the ./myisam directory: mi_open.c has the routines that write each section of the header, mi_create.c has a routine that calls the mi_open.c routines in order, and myisamdef.h has structure definitions corresponding to what we're about to describe.
These are the main header sections:
Section Occurrences ------- ----------- state Occurs 1 time base Occurs 1 time keydef (including keysegs) Occurs once for each key recinfo Occurs once for each field
Now we will look at each of these sections, showing each field.
We are going to use an example table throughout the description. To make the example table, we executed these statements:
CREATE TABLE T (S1 CHAR(1), S2 CHAR(2), S3 CHAR(3));
CREATE UNIQUE INDEX I1 ON T (S1);
CREATE INDEX I2 ON T (S2,S3);
INSERT INTO T VALUES ('1', 'aa', 'b');
INSERT INTO T VALUES ('2', 'aa', 'bb');
INSERT INTO T VALUES ('3', 'aa', 'bbb');
DELETE FROM T WHERE S1 = '2';
We took a hexadecimal dump of the resulting file, T.MYI.
In all the individual descriptions below, the column labeled Dump From Example File has the exact bytes that are in T.MYI. You can verify that by executing the same statements and looking at a hexadecimal dump yourself. With Linux this is possible using od -h T.MYI; with Windows you can use the command-line debugger.
Along with the typical value, we may include a comment. The comment usually explains why the value is what it is. Sometimes the comment is derived from the comments in the source code.
state
This section is written by mi_open.c, mi_state_info_write().
Name Size Dump From Example File Comment
---- ---- ---------------------- -------
file_version 4 FE FE 07 01 from myisam_file_magic
options 2 00 02 HA_OPTION_COMPRESS_RECORD
etc.
header_length 2 01 A2 this header example has
0x01A2 bytes
state_info_length 2 00 B0 = MI_STATE_INFO_SIZE
defined in myisamdef.h
base_info_length 2 00 64 = MI_BASE_INFO_SIZE
defined in myisamdef.h
base_pos 2 00 D4 = where the base
section starts
key_parts 2 00 03 a key part is a column
within a key
unique_key_parts 2 00 00 key-parts+unique-parts
keys 1 02 here are 2 keys --
I1 and I2
uniques 1 00 number of hash unique
keys used internally
in temporary tables
(nothing to do with
'UNIQUE' definitions)
language 1 08 "language for indexes"
max_block_size 1 01
fulltext_keys 1 00 # of fulltext keys.
= 0 if version <= 4.0
not_used 1 00 to align to 8-byte
boundary
state->open_count 2 00 01
state->changed 1 39 set if table updated;
reset if shutdown (so
one can examine this
to see if there was an
update without proper
shutdown)
state->sortkey 1 FF "sorted by this key"
(not used)
state->state.records 8 00 00 00 00 00 00 00 02 number of actual,
un-deleted, records
state->state.del 8 00 00 00 00 00 00 00 01 # of deleted records
state->split 8 00 00 00 00 00 00 00 03 # of "chunks" (e.g.
records or spaces left
after record deletion)
state->dellink 8 00 00 00 00 00 00 00 07 "Link to next removed
"block". Initially =
HA_OFFSET_ERROR
state->state.key_file_length 8 00 00 00 00 00 00 0c 00 2048
state->state.data_file_length 8 00 00 00 00 00 00 00 15 = size of .MYD file
state->state.empty 8 00 00 00 00 00 00 00 00
state->state.key_empty 8 00 00 00 00 00 00 00 00
state->auto_increment 8 00 00 00 00 00 00 00 00
state->checksum 8 00 00 00 00 00 00 00 00
state->process 4 00 00 09 E6 from getpid(). process
of last update
state->unique 4 00 00 00 0B initially = 0
state->status 4 00 00 00 00
state->update_count 4 00 00 00 04 updated for each write
lock (there were 3
inserts + 1 delete,
total 4 operations)
state->key_root 8 00 00 00 00 00 00 04 00 offset in file where
I1 keys start, can be
= HA_OFFSET_ERROR
00 00 00 00 00 00 08 00 state->key_root occurs
twice because there
are two keys
state->key_del 8 FF FF FF FF FF FF FF FF delete links for keys
(occurs many times if
many delete links)
state->sec_index_changed 4 00 00 00 00 sec_index = secondary
index (presumably)
not currently used
state->sec_index_used 4 00 00 00 00 "which extra indexes
are in use"
not currently used
state->version 4 3F 3F EB F7 "timestamp of create"
state->key_map 8 00 00 00 03 "what keys are in use"
state->create_time 8 00 00 00 00 3F 3F EB F7 "time when database
created" (actually:
time when file made)
state->recover_time 8 00 00 00 00 00 00 00 00 "time of last recover"
state->check_time 8 00 00 00 00 3F 3F EB F7 "time of last check"
state->rec_per_key_rows 8 00 00 00 00 00 00 00 00
state->rec_per_key_parts 4 00 00 00 00 (key_parts = 3, so
00 00 00 00 rec_per_key_parts
00 00 00 00 occurs 3 times)
base
This section is written by mi_open.c, mi_base_info_write(). The corresponding structure in myisamdef.h is MI_BASE_INFO.
In our example T.MYI file, the first byte of the base section is at offset 0x00d4. That's where it's supposed to be, according to the header field base_pos (above).
Name Size Dump From Example File Comment
---- ---- ---------------------- -------
base->keystart 8 00 00 00 00 00 00 04 00 keys start at offset
1024 (0x0400)
base->max_data_file_length 8 00 00 00 00 00 00 00 00
base->max_key_file_length 8 00 00 00 00 00 00 00 00
base->records 8 00 00 00 00 00 00 00 00
base->reloc 8 00 00 00 00 00 00 00 00
base->mean_row_length 4 00 00 00 00
base->reclength 4 00 00 00 07 length(s1)+length(s2)
+length(s3)=7
base->pack_reclength 4 00 00 00 07
base->min_pack_length 4 00 00 00 07
base->max_pack_length 4 00 00 00 07
base->min_block_length 4 00 00 00 14
base->fields 4 00 00 00 04 4 fields: 3 defined,
plus 1 extra
base->pack_fields 4 00 00 00 00
base->rec_reflength 1 04
base->key_reflength 1 04
base->keys 1 02 was 0 at start
base->auto_key 1 00
base->pack_bits 2 00 00
base->blobs 2 00 00
base->max_key_block_length 2 04 00 length of block = 1024
bytes (0x0400)
base->max_key_length 2 00 10 including length of
pointer
base->extra_alloc_bytes 2 00 00
base->extra_alloc_procent 1 00
base->raid_type 1 00
base->raid_chunks 2 00 00
base->raid_chunksize 4 00 00 00 00
[extra] i.e. filler 6 00 00 00 00 00 00
keydef
This section is written by mi_open.c, mi_keydef_write(). The corresponding structure in myisamdef.h is MI_KEYDEF.
This is a multiple-occurrence structure. Since there are two indexes in our example (I1 and I2), we will see that keydef occurs two times below. There is a subordinate structure, keyseg, which also occurs multiple times (once within the keydef for I1 and two times within the keydef for I2).
Name Size Dump From Example File Comment
---- ---- ---------------------- -------
/* key definition for I1 */
keydef->keysegs 1 01 there is 1 keyseg (for
column S1).
keydef->key_alg 1 01 algorithm = Rtree or
Btree
keydef->flag 2 00 49 HA_NOSAME +
HA_SPACE_PACK_USED +
HA_NULL_PART_KEY
keydef->block_length 2 04 00 i.e. 1024
key def->keylength 2 00 06 field-count+sizeof(S1)
sizeof(ROWID)
keydef->minlength 2 00 06
keydef->maxlength 2 00 06
/* keyseg for S1 in I1 */
keyseg->type 1 01 /* I1(S1) size(S1)=1,
column = 1 */
= HA_KEYTYPE_TEXT
keyseg->language 1 08
keyseg->null_bit 1 02
keyseg->bit_start 1 00
keyseg->bit_end 1 00
[0] i.e. filler 1 00
keyseg->flag 2 00 14 HA_NULL_PART +
HA_PART_KEY
keyseg->length 2 00 01 length(S1) = 1
keyseg->start 4 00 00 00 01 offset in the row
keyseg->null_pos 4 00 00 00 00
/* key definition for I2 */
keydef->keysegs 1 02 keysegs=2, for columns
S2 and S3
keydef->key_alg 1 01 algorithm = Rtree or
Btree
keydef->flag 2 00 48 HA_SPACE_PACK_USED +
HA_NULL_PART_KEY
keydef->block_length 2 04 00 i.e. 1024
key def->keylength 2 00 0B field-count+ sizeof(all fields)+
sizeof(RID)
keydef->minlength 2 00 0B
keydef->maxlength 2 00 0B
/* keyseg for S2 in I2 */
keyseg->type 1 01 /* I2(S2) size(S2)=2,
column = 2 */
keyseg->language 1 08
keyseg->null_bit 1 04
keyseg->bit_start 1 00
keyseg->bit_end 1 00
[0] i.e. filler 1 00
keyseg->flag 2 00 14 HA_NULL_PART +
HA_PART_KEY
keyseg->length 2 00 02 length(S2) = 2
keyseg->start 4 00 00 00 02
keyseg->null_pos 4 00 00 00 00
/* keyseg for S3 in I2 */
keyseg->type 1 01 /* I2(S3) size(S3)=3,
column = 3 */
keyseg->language 1 08
keyseg->null_bit 1 08
keyseg->bit_start 1 00
keyseg->bit_end 1 00
[0] i.e. filler 1 00
keyseg->flag 2 00 14 HA_NULL_PART +
HA_PART_KEY
keyseg->length 2 00 03 length(S3) = 3
keyseg->start 4 00 00 00 04
keyseg->null_pos 4 00 00 00 00
recinfo
The recinfo section is written by mi_open.c, mi_recinfo_write(). The corresponding structure in myisamdef.h is MI_COLUMNDEF.
This is another multiple-occurrence structure. It appears once for each field that appears in a key, including an extra field that appears at the start and has flags (for deletion and for null fields).
Name Size Dump From Example File Comment ---- ---- ---------------------- ------- recinfo->type 2 00 00 extra recinfo->length 2 00 01 recinfo->null_bit 1 00 recinfo->null_pos 2 00 00 recinfo->type 2 00 00 I1 (S1) recinfo->length 2 00 01 recinfo->null_bit 1 02 recinfo->null_pos 2 00 00 recinfo->type 2 00 00 I2 (S2) recinfo->length 2 00 02 recinfo->null_bit 1 04 recinfo->null_pos 2 00 00 recinfo->type 2 00 00 I2 (S3) recinfo->length 2 00 03 recinfo->null_bit 1 08 recinfo->null_pos 2 00 00
We are now at offset 0xA2 within the file T.MYI. Notice that the value of the third field in the header, header_length, is 0xA2. Anything following this point, up till the first key value, is filler.
The .MYI Key Values
And now we look at the part which is not the information header: we look at the key values. The key values are in blocks (MySQL's term for pages). A block contains values from only one index. To continue our example: there is a block for the I1 key values, and a block for the I2 key values.
According to the header information (state->key_root above), the I1 block starts at offset 0x0400 in the file, and the I2 block starts at offset 0x0800 in the file.
At offset 0x0400 in the file, we have this:
Name Size Dump From Example File Comment
---- ---- ---------------------- -------
(block header) 2 00 0E = size (inclusive)
(first bit of word =
0 meaning this is a
B-Tree leaf, see the
mi_test_if_nod macro)
(first key value) 2 01 31 Value is "1" (0x31).
(first key pointer) 2-8 00 00 00 00 Pointer is to Record
#0000. pointer length
is = base->rec_reflength
(second key value) 2 01 33 Value is "3" (0x33).
(second key pointer) 2-8 00 00 00 02 Pointer is to Record
#0002.
(junk) 1010 .. .. .. .. .. .. .. rest of the 1024-byte
block is unused
At offset 0x0800 in the file, we have this:
Name Size Dump From Example File Comment
---- ---- ---------------------- -------
(block header) 2 00 18 = size (inclusive)
(first key value) 7 01 61 61 01 62 20 20 Value is "aa/b "
(first key pointer) 2-8 00 00 00 00 Pointer is to Record
#0000.
(second key value) 7 01 61 61 01 62 62 62 Value is "aa/bbb"
(second key pointer) 2-8 00 00 00 02 Pointer is to Record
#0002.
(junk) 1000 .. .. .. .. .. .. .. rest of the 1024-byte
block is unused
From the above illustrations, these facts should be clear:
- Each key contains the entire contents of all the columns, including trailing spaces in
CHARcolumns. There is no front truncation. There is no back truncation. (There can be space truncation ifkeyseg->flagHA_SPACE_PACKflag is on.) - For fixed-row tables: The pointer is a fixed-size (4-byte) number which contains an ordinal row number. The first row is Record #0000. This item is analogous to the ROWID, or RID (row identifier), which other DBMSs use. For dynamic-row tables: The pointer is an offset in the
.MYDfile. - The normal block length is 0x0400 (1024) bytes.
These facts are not illustrated, but are also clear:
- If a key value is
NULL, then the first byte is 0x00 (instead of 001 as in the preceding examples) and that's all. Even for a fixedCHAR(3)column, the size of the key value is only 1 byte. - Initially the junk at the end of a block is filler bytes, value = A5. If MySQL shifts key values up after a
DELETE, the end of the block is not overwritten. - A normal block is at least 65% full, and typically 80% full. (This is somewhat denser than the typical B-tree algorithm would cause, it is thus because
myisamchk -rqwill make blocks nearly 100% full.) - There is a pool of free blocks, which increases in size when deletions occur. If all blocks have the same normal block length (1024), then MySQL will always use the same pool.
- The maximum number of keys is 32 (
MI_MAX_KEY). The maximum number of segments in a key is 16 (MI_MAX_KEY_SEG). The maximum key length is 500 (MI_MAX_KEY_LENGTH). The maximum block length is 16384 (MI_MAX_KEY_BLOCK_LENGTH). All these MI_... constants are expressed by #defines in themyisamdef.hfile.
[edit] MyISAM Files
Some notes about MyISAM file handling:
- If a table is never updated, MySQL will never touch the table files, so it would never be marked as closed or corrupted.
- If a table is marked readonly by the OS, it will only be opened in readonly mode. Any updates to it will fail.
- When a normal table is opened for reading by a
SELECT, MySQL will open it in read/write mode, but will not write anything to it. - A table can be closed during one of the following events:
- Out of space in table cache
- Someone executed flush tables
- MySQL was shut down
- flush_time expired (which causes an automatic flush-tables to be executed)
- When MySQL opens a table, it checks if the table is clean. If it isn't and the server was started with the
--myisam-recoveroption, check the table and try to recover it if it's crashed. (The safest automatic recover option is probably--myisam-recover=BACKUP.)
[edit] MyISAM Dynamic Data File Layout
Variable length records are contained in "frames". A record can be put in one or more frames, also called the record "parts" or "blocks". The sense of the frames is to allow reusage of the space of deleted records. Starting with an empty data file, records are put in a single frame each, unless a record is bigger than the maximum frame size (16MB - 4). When a record is deleted, its frame is marked deleted. When a record is inserted after this, it reuses the old frame. If the new record is smaller, the frame is split. The unused part is marked deleted. If a new record is bigger than the old frame, the frame is filled with the record as much as fits. The rest is inserted in other old frames, or, if non is available, in a new frame at end of file.
[edit] Layout of the Record Storage Frame (Record Part, Record Block)
MI_MIN_BLOCK_LENGTH 20 /* 20 bytes are required for the biggest frame type: Deleted block. */ MI_MAX_BLOCK_LENGTH 16777212 /* 16MB - 4, max 3 bytes for length available, 4-byte aligned. */ MI_DYN_ALIGN_SIZE 4 /* Frames start a 4-byte boundaries. */
Part header[0] (decimal/hexadecimal, one byte):
0/00: Deleted block
block_len 3 bytes [1-3]
next_filepos 8 bytes [4-11]
prev_filepos 8 bytes [12-19]
=> header length 20
1/01: Full small record, full block
rec_len,data_len,block_len 2 bytes [1-2]
=> header length 3
2/02: Full big record, full block
rec_len,data_len,block_len 3 bytes [1-3]
=> header length 4
3/03: Full small record, unused space
rec_len,data_len 2 bytes [1-2]
unused_len 1 byte [3]
=> header length 4
4/04: Full big record, unused space
rec_len,data_len 3 bytes [1-3]
unused_len 1 byte [4]
=> header length 5
5/05: Start small record
rec_len 2 bytes [1-2]
data_len,block_len 2 bytes [3-4]
next_filepos 8 bytes [5-12]
=> header length 13
6/06: Start big record
rec_len 3 bytes [1-3]
data_len,block_len 3 bytes [4-6]
next_filepos 8 bytes [7-14]
=> header length 15
7/07: End small record, full block
data_len,block_len 2 bytes [1-2]
=> header length 3
8/08: End big record, full block
data_len,block_len 3 bytes [1-3]
=> header length 4
9/09: End small record, unused space
data_len 2 bytes [1-2]
unused_len 1 byte [3]
=> header length 4
10/0A: End big record, unused space
data_len 3 bytes [1-3]
unused_len 1 byte [4]
=> header length 5
11/0B: Continue small record
data_len,block_len 2 bytes [1-2]
next_filepos 8 bytes [3-10]
=> header length 11
12/0C: Continue big record
data_len,block_len 3 bytes [1-3]
next_filepos 8 bytes [4-11]
=> header length 12
13/0D: Start giant record
rec_len 4 bytes [1-4]
data_len,block_len 3 bytes [5-7]
next_filepos 8 bytes [8-15]
=> header length 16
block_len does not include the header length except of deleted blocks.
In deleted blocks block_len includes the header length.
data_len is the number of bytes within this block that are part of the record.
rec_len is the total record length including the data lengths of all belonging blocks.
In deleted blocks next_filepos and prev_filepos make a doubly linked list over all deleted blocks. At list start, prev_filepos is HA_OFFSET_ERROR (all bits set). At list end, next_filepos is HA_OFFSET_ERROR (all bits set).
In non-deleted blocks next_filepos points to the next part of the record.
The read function for the block header of dynamic records is mi_dynrec.c:_mi_get_block_info().
[edit] Record contents
[edit] Packed record layout
pack bits (!= NULL bits!): One bit per packable column:
FIELD_BLOB: Set if blob is empty. No blob data in record.
FIELD_SKIP_ZERO: Set if all bytes zero, no data in record.
FIELD_SKIP_ENDSPACE,
FIELD_SKIP_PRESPACE: Set if some blanks stripped in record.
=> The "pack bits" are rounded up to the next byte boundary.
Record contents from the in-memory representation. Each field is copied verbatim unless packed according to the "pack bits" paragraph.
[edit] In-memory record layout
null bits: One bit per column that can be NULL.
The "null bits" are rounded up to the next byte boundary.
the number of "null bytes" are also referred as data_offset.
fields: One field per column. No separators, length fields, etc.
Length (pack_length) and layout of the fields depend on the
field type:
-- to be added --
Note: The "in-memory record layout" is used by the SQL layer. It is independent from storage engines. All storage engines have to accept and produce this in-memory format at the handler interface.
[edit] MyISAM Compressed Data File Layout
This chapter describes the layout for the data file of compressed MyISAM tables.
[edit] Huffman compression
MyISAM compression is based on Huffman compression. In his article from 1952 Huffman proved that his algorithm uses the least possible number of bits to encode a sequence of messages. The number of bits assigned to each message depends on its probability to appear in the sequence.
Huffman did not specify exactly, what those "messages" are. One could take all possible values - say of a table column - as "messages". But if there are too many of them, the code tables could become bigger than the uncompressed table. One would need to specify every possible value once and the code tree with its indexes and offsets. Not to forget the effort to step through big binary trees for every value and - on the encoding side - the comparison of each value against the already collected distinct values.
The usual way to define "Huffman messages" is to take the possible 256 values, which a byte can express, as the "messages". That way the code trees are of limited size. On the other hand, the theoretical maximum compression is 1:8 (12.5%) in this case.
[edit] The myisampack Program
myisampack tries both ways to compress the column values. When starting to analyze the existing uncompressed data, it collects distinct column values up to a limit of 8KB. If there are more, it falls back to byte value compression for this column.
This means also that myisampack may use different algorithms for different columns. Besides a couple of other tricks, myisampack determines for every column if distinct column value compression or byte value compression is better. After that it tries to combine byte value compression trees of different columns into one or more code trees. This means that finally we may have less code trees than columns. Therefore the column information in the file header contains the number of the code tree used for each column. Some columns might not need a code tree at all. This happens for columns which have the same value in all records.
[edit] Record and Blob Length Encoding
Since the compressed data file should be usable for read-only purposes by the MySQL database management system, every record starts on a byte boundary. For easier handling by the system, every record begins with a length information for the compressed record and a length information for the total size of all uncompressed blobs of this record. Both lengths are encoded in 1 to 5 bytes, depending on its value.
A length from 1 to 253 bytes is represented in one byte. A length of 254 to 65535 bytes (64KB-1) is represented by three bytes. The first contains the value 254 and the next two bytes contain the plain length. The low order byte goes first. A length of 65536 to 4294967295 bytes (4GB-1) is represented by five bytes. The first contains the value 255 and the next four bytes contain the plain length. The low order byte goes first.
The encoded compressed record length does not include these length bytes. It tells the number of bytes which follow behind the length bytes for this record.
[edit] Code Tree Representation
The code trees are binary trees. Every node has exactly two children. The children can be leaves or branches. A leaf contains one original, uncompressed value. A branch contains a pointer to another node. The Huffman codes represent the navigation through the tree. Every left branch gets a 0 bit, every right branch gets a 1 bit.
The in-memory representation of the trees are two unsigned integers per node. Each describes either a leaf value or an offset (in unsigned integers relative from this node) to another node. To distinguish values from offsets, the 15th bit (decimal value 32768) is set together with offsets. This is safe as the size of the trees is limited by either having a maximum of 256 elements for byte value compression or 4096 elements for distinct column value compression.
The representation of the trees in the compressed data file is almost the same. But instead of writing all bits of the unsigned integers, only as many bits are written as are required to represent the highest value or offset respectively. One more bit per value/offset is written in advance, to distinguish both. The number of bits required per value and per offset is computed in advance and part of the code tree description.
[edit] Usage of the Index File
While the header of the compressed data file contains a lot of information, there are still some things which need to be taken from the index file. These are the number of columns of the table and the length of each column. The latter is required for columns with suppressed leading spaces or suppressed trailing spaces or zeros.
[edit] Tricks
As already mentioned, myisampack uses some tricks to decrease the amount of data to be encoded. These cope with leading and trailing spaces or zeros or with all blank or NULL fields.
I do not describe these in detail here. They do not materialize in the compressed data files other than the later mentioned field and pack types. They are however important to know for decoding the records.
[edit] Detailed Specification of the Decoding:
Below follows the detailed specification of the encoding:
Datafile fixed header (32 bytes):
4 byte magic number 4 byte total header length (fixed + column info + code trees) 4 byte minimum packed record length 4 byte maximum packed record length 4 byte total number of elements in all code trees 4 byte total number of bytes collected for distinct column values 2 byte number of code trees 1 byte maximum number of bytes required to represent record+blob lengths 1 byte record pointer length, number of bytes for compressed data file length 4 byte zeros
Column Information. For every column in the table:
5 bits field type
FIELD_NORMAL 0
FIELD_SKIP_ENDSPACE 1
FIELD_SKIP_PRESPACE 2
FIELD_SKIP_ZERO 3
FIELD_BLOB 4
FIELD_CONSTANT 5
FIELD_INTERVALL 6
FIELD_ZERO 7
FIELD_VARCHAR 8
FIELD_CHECK 9
6 bits pack type as a set of flags
PACK_TYPE_SELECTED 1
PACK_TYPE_SPACE_FIELDS 2
PACK_TYPE_ZERO_FILL 4
5 bits if pack type contains PACK_TYPE_ZERO_FILL
minimum number of trailing zero bytes in this column
else
number of bits to encode the number of
packed bytes in this column (length_bits)
x bits number of the code tree used to encode this column
x is the minimum number of bits required to represent the highest
tree number.
Alignment:
x bits alignment to the next byte border
Code Trees. For every tree:
1 bit compression type
0 = byte value compression
8 bits minimum byte value coded by this tree
9 bits number of byte values encoded by this tree
5 bits number of bits used to encode the byte values
5 bits number of bits used to encode offsets to next tree elements
1 = distinct column value compression
15 bits number of distinct column values encoded by this tree
16 bits length of the buffer with all column values
5 bits number of bits used to encode the index of the column value
5 bits number of bits used to encode offsets to next tree elements
For each code tree element:
1 bit IS_OFFSET
x bits the announced number of bits for either a value or an offset
x bits alignment to the next byte border
If compression by distinct column values:
The number of 8-bit values that make up the column value buffer
Compressed Records. For every record:
1-5 bytes length of the compressed record in bytes
1. byte 0..253 length
254 length encoded in the next two bytes little endian
255 length encoded in the next x bytes little endian
x = 3 for pack file version 1
x = 4 for pack file version > 1
1-5 bytes total length of all expanded blobs of this record
1. byte 0..253 length
254 length encoded in the next two bytes little endian
255 length encoded in the next x bytes little endian
x = 3 for pack file version 1
x = 4 for pack file version > 1
For every column:
If pack type includes PACK_TYPE_SPACE_FIELDS,
1 bit 1 = spaces only, 0 = not only spaces
In case the field type is of:
FIELD_SKIP_ZERO
1 bit 1 = zeros only, 0 = not only zeros
In the latter case
x bits the Huffman code for every byte
FIELD_NORMAL
x bits the Huffman code for every byte
FIELD_SKIP_ENDSPACE
If pack type includes PACK_TYPE_SELECTED,
1 bit 1 = more than min endspace, 0 = not more
In the former case
x bits nr of extra spaces, x = length_bits
else
x bits nr of extra spaces, x = length_bits
x bits the Huffman code for every byte
FIELD_SKIP_PRESPACE
If pack type includes PACK_TYPE_SELECTED,
1 bit 1 = more than min prespace, 0 = not more
In the former case
x bits nr of extra spaces, x = length_bits
else
x bits nr of extra spaces, x = length_bits
x bits the Huffman code for every byte
FIELD_CONSTANT or FIELD_ZERO or FIELD_CHECK
nothing for these
FIELD_INTERVALL
x bits the Huffman code for the buffer index of the column value
FIELD_BLOB
1 bit 1 = blob is empty, 0 = not empty
In the latter case
x bits blob length, x = length_bits
x bits the Huffman code for every byte
FIELD_VARCHAR
1 bit 1 = varchar is empty, 0 = not empty
In the latter case
x bits blob length, x = length_bits
x bits the Huffman code for every byte
x bits alignment to the next byte border