Falcon Feature Preview
Introduction
This Feature Preview is intended to allow the community to test some important performance improvements to the Falcon Database. The performance changes are listed below along with key bug fixes.
Version 6.0.5 - Feature Preview
This release contains several performance improvements (see Downloads).
Supernodes
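Supernodes are an array of 16 pointers (vectors) into each index page, each pointing to a key that is stored fully expanded, without prefix compression. Because a prefix-compressed key can only be reconstructed by scanning forward from an uncompressed key, these expanded keys are what make a binary search possible: a search first does a binary search over the supernode keys and then the normal sequential search from the chosen supernode. Previously, the whole page was searched sequentially. Supernodes tremendously reduced the time spent searching index pages.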
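The two-stage search can be sketched in Python. This is a simplified model, not Falcon's index code: it assumes a page is a sorted list of fully expanded keys (so it ignores prefix compression entirely) and places the supernodes at evenly spaced positions. The names `build_supernodes` and `find_key` are hypothetical.

```python
MAX_SUPERNODES = 16  # from the text: 16 supernodes per index page


def build_supernodes(keys):
    """Return up to MAX_SUPERNODES evenly spaced positions in a sorted page.

    In Falcon the keys at these positions are stored without prefix
    compression; in this model every key is already a full value.
    """
    if not keys:
        return []
    step = max(1, -(-len(keys) // MAX_SUPERNODES))  # ceiling division
    return list(range(0, len(keys), step))[:MAX_SUPERNODES]


def find_key(keys, supernodes, target):
    """Return an index of target in the sorted page keys, or None."""
    # Stage 1: binary search for the last supernode whose key <= target.
    lo, hi = 0, len(supernodes) - 1
    start = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if keys[supernodes[mid]] <= target:
            start = supernodes[mid]
            lo = mid + 1
        else:
            hi = mid - 1
    # Stage 2: the normal sequential search, starting at that supernode.
    for i in range(start, len(keys)):
        if keys[i] == target:
            return i
        if keys[i] > target:
            break  # keys are sorted, so target is not on this page
    return None
```

The sequential scan now covers at most one segment between two supernodes instead of the whole page.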
Thread Signaling
Bug#34890 documents a problem that occurred under very high concurrency, in which a thread waiting on a SyncObject could miss the signal and then wait until the next time that SyncObject was locked and unlocked. If the SyncObject was a Transaction::syncObject, the transaction might complete without the waiting thread ever being signaled, producing a false lock wait timeout. For other SyncObjects, the waiting thread would stall for up to 10 seconds before retrying the lock. These hangs and stalls occurred on machines with multiple CPUs under high concurrency, and they were very intermittent and timing dependent. Performance tests typically showed a high deviation between identical runs, and overall performance was lowered.
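The fix for Bug#34890 lives in Falcon's C++ SyncObject code and is not described here. As a general illustration of how a missed signal is avoided, the Python sketch below uses a hypothetical `Gate` class that records the signal as state under the same lock the waiter checks, so a signal sent just before a thread starts waiting is not lost. The timeout only stands in for the retry interval mentioned above.

```python
import threading


class Gate:
    """A one-shot signal that a late waiter cannot miss (hypothetical)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._signaled = False

    def signal(self):
        with self._cond:
            self._signaled = True    # record the event as state...
            self._cond.notify_all()  # ...then wake any current waiters

    def wait(self, timeout=None):
        """Return True once signaled, False if the timeout expires first."""
        with self._cond:
            # wait_for re-checks the predicate under the lock, so it returns
            # at once if the signal already happened and tolerates
            # spurious wakeups.
            return self._cond.wait_for(lambda: self._signaled, timeout)
```

A signal that arrives before `wait()` is called still makes `wait()` return `True` immediately; the waiter only sleeps if nothing has been signaled yet.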
Version 6.0.4 - Feature Preview
Performance Improvements
Previous alpha versions of Falcon showed performance problems in two areas, both addressed in the 6.0.4 release. First, forcing the pages written by a checkpoint operation to disk took longer than the interval between checkpoints. Second, the front half of Falcon, where transactions run, tended to get significantly ahead of the back half, which is responsible for integrating committed changes into the database. Fixing the first problem required a multi-step rework of Falcon's I/O architecture, which, to nobody's surprise, uncovered several surprises. Faster checkpoints helped the back end keep up with the front end, and adding more gopher threads got the two ends running together again.
All measurements and tests were done on Linux. The changes may also affect performance on Windows and other platforms, but at this point we do not know whether they help elsewhere, let alone by how much. We are more hopeful about *nix platforms. We will work on Windows performance in the future; we know lots of tricks there, too.
Pool of Asynchronous I/O Threads
Having reworked the I/O architecture, adding a group of threads writing in parallel seemed like a reasonable next step. That, of course, led to the question: how many threads are enough? And that led to the next parameter, falcon_io_threads, which defaults to 2. If and when we discover a reliable algorithm for picking the right number of I/O threads, we would like to abandon this parameter.
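A pool of parallel page writers might look like the following Python sketch. It assumes a 4096-byte page size and POSIX `os.pwrite`; `write_pages_in_parallel` is a hypothetical name, and the `io_threads` argument only plays the role of falcon_io_threads. Falcon's actual thread pool is not shown here.

```python
import os
import queue
import threading

PAGE_SIZE = 4096  # hypothetical page size for this sketch


def write_pages_in_parallel(fd, pages, io_threads=2):
    """Write {page_number: bytes} to an open file using io_threads writers.

    io_threads stands in for falcon_io_threads (default 2). Each page must be
    exactly PAGE_SIZE bytes. Returns the number of pages written.
    """
    work = queue.Queue()
    for page_no, data in pages.items():
        if len(data) != PAGE_SIZE:
            raise ValueError(f"page {page_no} is not {PAGE_SIZE} bytes")
        work.put((page_no, data))

    written = 0
    count_lock = threading.Lock()

    def writer():
        nonlocal written
        while True:
            try:
                page_no, data = work.get_nowait()
            except queue.Empty:
                return  # queue drained: this writer is finished
            # pwrite takes an explicit offset, so writers never share a
            # file position.
            os.pwrite(fd, data, page_no * PAGE_SIZE)
            with count_lock:
                written += 1

    threads = [threading.Thread(target=writer) for _ in range(io_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written
```

The queue is filled before the writers start, so an empty queue reliably means the batch is done.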
Direct IO
When fsync was eliminated in version 6.0.3, Falcon used O_SYNC to force each page through the file system cache to disk. That led to much discussion of the unnecessary cost of copying pages from the Falcon page cache to the system file cache, so O_SYNC was changed to O_DIRECT, which bypasses the file system cache. The two were then compared and tested on different file systems. RAM disks do not support O_DIRECT, and at least one system showed O_SYNC to be faster than O_DIRECT, so we compromised: Falcon uses O_DIRECT by default, falls back to O_SYNC where O_DIRECT is unavailable, and allows an override with the parameter falcon_direct_io.
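The O_DIRECT-with-fallback policy can be sketched in Python for a Linux system. The function name `open_page_file` is hypothetical, `direct_io` stands in for falcon_direct_io, and treating `EINVAL` from `open()` as "O_DIRECT not supported" is an assumption of this sketch (it is the error Linux typically returns on file systems that reject the flag). Actual writes through an O_DIRECT descriptor also need suitably aligned buffers and sizes, which this sketch does not handle.

```python
import errno
import os


def open_page_file(path, direct_io=True):
    """Open path for page I/O, preferring O_DIRECT and falling back to O_SYNC.

    direct_io mirrors the falcon_direct_io override. POSIX only; O_DIRECT is
    not defined on every platform. Returns (fd, mode_name).
    """
    base = os.O_RDWR | os.O_CREAT
    if direct_io and hasattr(os, "O_DIRECT"):
        try:
            return os.open(path, base | os.O_DIRECT, 0o644), "O_DIRECT"
        except OSError as exc:
            # Some file systems (for example RAM disks) reject O_DIRECT
            # with EINVAL; any other error is a real failure.
            if exc.errno != errno.EINVAL:
                raise
    # Fallback, or explicit override: every write goes synchronously to disk.
    return os.open(path, base | os.O_SYNC, 0o644), "O_SYNC"
```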
Pool of Gopher Threads
Falcon has a front end and a back end. The two run largely asynchronously. The front end handles transactions from start to durable commit. The front end uses the page cache to read data into the record cache. A running transaction makes data changes in the record cache and index changes in its deferred indexes. During a prepare or commit, the transaction's changes move to the serial log. Once the serial log is flushed to disk, the transaction is durable.
At that point, the back end of Falcon (the gopher thread) starts moving committed changes from the serial log into the database, where they are easier to find. Until the gopher gets them into the database, unmoved changes remain in memory and continue to be referenced there. If the system crashes before the gopher has emptied the serial log, the recovery process picks up where the gopher left off, and little else happens until all committed changes are on disk.
As the front end got faster, the back end lagged, causing bizarre performance problems. The strangest was that running a complex, memory-intensive query on the Information Schema made the system faster. Based on our tests, we believe the query tied up the front end, allowing the back end to finish pushing committed changes into the database and to release the transactions that had committed but were kept around until they became "write complete", meaning that their changes were in the database.
Part of the solution was to give the single gopher some friends to share the load. The number of gopher threads is governed by yet another parameter, falcon_gopher_threads, which defaults to 5.
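The division of labor between the serial log and several gophers can be modeled in Python. In this sketch the serial log is a list of committed `(txn_id, key, value)` changes and the database is a dictionary; routing each key to a fixed gopher so that changes to one key stay in commit order is an assumption of the sketch, not a description of Falcon's scheduling. A single lock guards the shared state for simplicity, so the example shows the bookkeeping (including "write complete" tracking) rather than real parallel speedup. `run_gophers` is a hypothetical name.

```python
import queue
import threading
from collections import defaultdict


def run_gophers(serial_log, gopher_threads=5):
    """Move committed changes from serial_log into a database dict.

    serial_log: list of (txn_id, key, value) records in commit order.
    gopher_threads plays the role of falcon_gopher_threads (default 5).
    Returns (database, write_complete), where write_complete is the set of
    transactions whose changes have all reached the database.
    """
    queues = [queue.Queue() for _ in range(gopher_threads)]
    pending = defaultdict(int)  # per transaction: changes not yet applied
    for txn_id, key, value in serial_log:
        pending[txn_id] += 1
        # Assumption: route by key so changes to one key keep commit order.
        queues[hash(key) % gopher_threads].put((txn_id, key, value))
    for q in queues:
        q.put(None)  # end-of-log marker, one per gopher

    database = {}
    write_complete = set()
    lock = threading.Lock()

    def gopher(q):
        while (item := q.get()) is not None:
            txn_id, key, value = item
            with lock:
                database[key] = value
                pending[txn_id] -= 1
                if pending[txn_id] == 0:
                    # Every change of this transaction is now in the database.
                    write_complete.add(txn_id)

    threads = [threading.Thread(target=gopher, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return database, write_complete
```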
Thread Scheduler
Switching to asynchronous I/O threads uncovered another interesting situation: both reads into the cache and writes to the serial log were starved while checkpoints ran along briskly. So Falcon acquired yet another characteristic of an operating system: an I/O thread scheduler that gives different priorities to different operations.
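The text does not say which operations Falcon favors or how, so the following Python sketch only illustrates the general idea of priority-based I/O scheduling. The three operation kinds, their relative priorities, and the `IoScheduler` class are hypothetical; requests with equal priority are served in arrival order, and the class is single-threaded (a real scheduler would guard the heap with a lock).

```python
import heapq
import itertools

# Hypothetical priority levels: lower numbers are served first.
PRIORITY = {"serial_log_write": 0, "page_read": 1, "checkpoint_write": 2}


class IoScheduler:
    """Hand out pending I/O requests by priority, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, kind, request):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, request))

    def next_request(self):
        """Return (kind, request) for the most urgent request, or None."""
        if not self._heap:
            return None
        _, _, kind, request = heapq.heappop(self._heap)
        return kind, request
```

A strict priority scheme like this one can starve the lowest class (here, checkpoint writes) if higher-priority work never stops arriving; a production scheduler would typically add some form of aging or quota.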
Version 6.0.3 - Alpha
Eliminate fsync
Prior to version 6.0.3, Falcon used buffered I/O to write the pages flushed in a checkpoint operation. When all pages were written, Falcon used an fsync to force them to disk. The frequency of checkpoints is determined by the parameter falcon_checkpoint_schedule, which defaults to once every 30 seconds, but the fsync often took longer than 30 seconds. The next checkpoint then had to wait for the previous one to complete. Delayed checkpoints tended to have more work than those that occurred on schedule, so the delays propagated. [Think air traffic control with a squall line going from Chicago to Atlanta.] Version 6.0.3 eliminated the fsync and instead used O_SYNC to force each page to disk as it was written.
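The cascading delay can be reproduced with a small simulation. The model is hypothetical: pages are dirtied at a constant rate, the fsync cost is proportional to the number of dirty pages, and a late checkpoint flushes everything dirtied since the previous checkpoint started. The 30-second interval matches the falcon_checkpoint_schedule default; the other numbers are made up for illustration.

```python
def simulate_checkpoints(dirty_rate, seconds_per_page, schedule=30.0, count=5):
    """Return how many seconds late each of `count` checkpoints starts.

    Hypothetical model: pages are dirtied at dirty_rate pages/second and the
    fsync costs seconds_per_page per dirty page. A checkpoint starts on
    schedule or, if the previous one is still running, as soon as it
    finishes, and flushes every page dirtied since the previous start.
    """
    delays = []
    last_start = 0.0  # an initial, empty checkpoint at t = 0
    busy_until = 0.0
    for n in range(1, count + 1):
        scheduled = n * schedule
        start = max(scheduled, busy_until)  # must wait for the previous flush
        dirty_pages = dirty_rate * (start - last_start)
        busy_until = start + dirty_pages * seconds_per_page
        delays.append(start - scheduled)
        last_start = start
    return delays
```

With `dirty_rate=100` and `seconds_per_page=0.012`, each flush takes 1.2 times as long as the interval it covers, so the start delays grow from 0 to 6, 19.2, 41.04, and 73.248 seconds (up to floating-point rounding). With `seconds_per_page=0.005`, every checkpoint starts on schedule.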
Page consolidation
The last of the I/O performance improvements in 6.0.3 is page consolidation. It is much faster to write a large contiguous block in a single operation than in a sequence of page-sized writes. Unfortunately, the page cache doesn't necessarily keep pages in storage order. Nor should it. During the first pass over the cache, Falcon determines which of the pages to be written are actually contiguous. Each run of contiguous pages is moved to a write buffer and written in a single operation.
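Finding contiguous runs amounts to sorting the dirty page numbers and grouping consecutive ones. The Python sketch below returns one `(offset, buffer)` pair per run, each of which could be issued as a single write (for example with `os.pwrite`). The name `consolidate` and the default page size are hypothetical.

```python
def consolidate(dirty_pages, page_size=4096):
    """Group dirty pages into runs of consecutive page numbers.

    dirty_pages: {page_number: bytes}, in arbitrary (cache) order, where each
    buffer is page_size bytes. Returns [(file_offset, buffer), ...], one
    entry per contiguous run, in storage order.
    """
    runs = []  # each run is (first_page_number, [page buffers])
    for page_no in sorted(dirty_pages):
        if runs and page_no == runs[-1][0] + len(runs[-1][1]):
            runs[-1][1].append(dirty_pages[page_no])  # extends the current run
        else:
            runs.append((page_no, [dirty_pages[page_no]]))  # starts a new run
    return [(first * page_size, b"".join(parts)) for first, parts in runs]
```

For example, dirty pages 3, 4, 5, 7, and 10 become three writes: pages 3 to 5 in one buffer, then page 7, then page 10.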
Feature Changes
New Settings
- FALCON_CONSISTENT_READ - Determines how Repeatable Read isolation treats changes committed by other transactions after a transaction starts. On by default.
- FALCON_DIRECT_IO - Allows the user to select between O_DIRECT and O_SYNC.
- FALCON_GOPHER_THREADS - Number of gopher threads. Default = 5.
- FALCON_IO_THREADS - Number of asynchronous I/O threads. Default = 2.
- FALCON_LARGE_BLOB_THRESHOLD - Blobs smaller than this threshold are stored in data pages instead of blob pages. This provides faster transaction durability, since only the serial log, not the blob pages, needs to be written at the end of the transaction.
- FALCON_LOCK_TIMEOUT - Specifies how long Falcon will make one transaction wait for another. Default = 0, which means wait indefinitely.
- FALCON_SERIAL_LOG_DIR - Allows the serial log to be placed on a separate disk.
- FALCON_SERIAL_LOG_PRIORITY - Allows writes to the serial log to be given a higher priority.
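The "0 means wait indefinitely" convention of FALCON_LOCK_TIMEOUT can be illustrated with Python's `threading.Lock`, whose `acquire()` uses `timeout=-1` for an unbounded wait. `wait_for_transaction` is a hypothetical helper, not Falcon code, and it assumes the timeout is given in seconds; the text does not state the unit.

```python
import threading


def wait_for_transaction(lock, falcon_lock_timeout=0):
    """Wait for a lock held by another transaction (hypothetical helper).

    Assumption: falcon_lock_timeout is in seconds. A value of 0 means wait
    indefinitely, which Lock.acquire expresses as timeout=-1.
    Returns True if the lock was obtained, False if the wait timed out.
    """
    timeout = -1 if falcon_lock_timeout == 0 else falcon_lock_timeout
    return lock.acquire(timeout=timeout)
```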
Falcon Repeatable Read
The ISO SQL standard defines four isolation modes for transactions: Serializable, Repeatable Read, Read Committed, and Read Uncommitted. The definitions of the modes are based on the behavior of systems that lock records and ranges.
The ISO SQL Standard describes Repeatable Read transactions as having the isolation level provided by read/write record locks without locks on ranges. Reading the same record twice will always get the same value for its fields, but a select with the same criteria may get more records each time it runs. Oddly, then, the standard's "Repeatable Read" is not fully repeatable.
However, Falcon and other engines that rely on Multi-Version Concurrency Control (MVCC) provide an isolation level that is completely repeatable but not serializable, because it allows some update anomalies. This is the mode a Falcon transaction gets when it chooses the Repeatable Read isolation level: each transaction sees a stable snapshot of the records that were committed when it started.
InnoDB's implementation of Repeatable Read includes an anomaly that causes a simple select statement to get different results from a select for update. A simple select sees the state of the database that was committed when the transaction began. A select for update sees all changes committed as of the instant it runs, which is effectively Read Committed mode. Update and delete statements also run in Read Committed mode.
This hybrid of the Repeatable Read and Read Committed isolation levels improves throughput in a highly concurrent environment full of database updates. With careful coding, it also gets consistent results. In Falcon's normal Repeatable Read mode, a transaction cannot update or delete a record if it cannot select the most recent committed version. When that situation arises, the transaction gets an update conflict error and must roll back before the operation can succeed. By allowing select for update, update, and delete statements to access the most recently committed version of records, InnoDB allows more transactions to succeed, at the cost of possibly inconsistent results for improperly coded applications.
Falcon originally provided Repeatable Read transactions that were consistent. In the 6.0.2 and 6.0.3 alpha releases, it emulated InnoDB. Now a setting lets you choose between the two modes. The parameter falcon_consistent_read is on by default and provides truly repeatable reads; turning it off makes Repeatable Read transactions behave like InnoDB's.
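The difference between the two modes can be shown with a toy multi-version store in Python. Everything here is hypothetical and heavily simplified: each update commits immediately, there are no record locks, and versions are stamped with a global counter. The `consistent_read` flag mirrors falcon_consistent_read: when it is on, updating a record that has a newer committed version than the transaction's snapshot raises an update conflict; when it is off, the update goes to the latest committed version, as in InnoDB.

```python
class UpdateConflict(Exception):
    """A consistent-read transaction tried to change a record it cannot see."""


class Store:
    """Toy multi-version store: key -> list of (commit_stamp, value), oldest first."""

    def __init__(self):
        self.versions = {}
        self.clock = 0  # global commit counter

    def commit_write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))


class Transaction:
    def __init__(self, store, consistent_read=True):
        self.store = store
        self.snapshot = store.clock  # sees only commits made before it started
        self.consistent_read = consistent_read  # mirrors falcon_consistent_read

    def select(self, key):
        """Return the newest version committed as of this transaction's start."""
        for stamp, value in reversed(self.store.versions.get(key, [])):
            if stamp <= self.snapshot:
                return value
        return None

    def update(self, key, value):
        history = self.store.versions.get(key, [])
        latest = history[-1][0] if history else 0
        if self.consistent_read and latest > self.snapshot:
            # Falcon's default: the newest committed version is not visible
            # to this snapshot, so the update is refused.
            raise UpdateConflict(key)
        # Consistent read off (InnoDB-like): write over the latest version.
        # Simplification: the update commits immediately.
        self.store.commit_write(key, value)
```

For example, if two transactions start and then another transaction commits a new version of a record, both still select the old value, but only the one with `consistent_read=False` can update that record; the other gets `UpdateConflict`.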
Bug Fixes
Bugs Fixed in Version 6.0.4 - Feature Preview
- Bug#22125 Falcon: Double precision searches fail if index exists
- Bug#22168 Inserting bad early dates
- Bug#22173 TRUNCATE does not reset auto_increment counter
- Bug#22564 auto_increment column gets automatically incremented
- Bug#27424 Falcon: crash if case sensitive database names
- Bug#27425 Falcon: case sensitive table names
- Bug#27426 Falcon: searches fail if datetime column and index exists
- Bug#29151 Falcon: running sysbench 0.4.8 leads to duplicate key errors
- Bug#29211 Falcon: information_schema has a falcon_tables view
- Bug#29452 Falcon: two-way deadlock with unique index and trigger
- Bug#29823 Falcon falcon_database_io table doesn't report stats for user DB's
- Bug#30281 Falcon: missing privilege check for dropping tablespace
- Bug#31005 Falcon: setting falcon_serial_log_dir has no effect
- Bug#31045 Error in compiling Falcon 6.0.0.2 alpha FreeBSD
- Bug#31110 Falcon: missing engine check while dropping tablespace
- Bug#31114 Falcon: creating tablespace with same name twice returns Unknown error -103
- Bug#31286 Falcon crashes when falcon_record_memory_max is exceeded
- Bug#31296 Falcon does not remove associated tablespace file.
- Bug#31490 The funcs_1 test "falcon_func_view" fails due to differences in a datetime col
- Bug#31671 Falcon engine does not support ROW or STATEMENT binlog_format
- Bug#31967 Falcon: hang changing falcon_record_memory_max
- Bug#32191 Memory overrun when using join buffering for falcon table with a blob
- Bug#32194 Falcon: incorrect count of changed rows
- Bug#32413 Memory usage not constrained by falcon_record_memory_max, assertion failure
Bugs Fixed in Version 6.0.3 - Alpha
- Bug#30826 Falcon - Crash if OPTIMIZE PARTITION of a file with no records
- Bug#29332 Falcon deadlocks when running falcon_bug_28026.test
Known Open Bugs
Open, Verified, Analyzing, and In Progress Falcon bugs can be found in the MySQL bug database.
Downloads
Binary packages
Preview builds for Linux x86-64 and Windows-32 are available for download from http://downloads.mysql.com/forge/falcon_feature_preview/
Sources
Source code is available from our public BitKeeper trees at http://mysql.bkbits.net/ - please consult the reference manual for more information on how to build a MySQL binary from a source tree.
The source tree for this feature preview is also found here. This source has been compiled on Linux 32-bit and 64-bit, FreeBSD 32-bit and 64-bit, Mac/Intel, Windows 32-bit and 64-bit, and Solaris/x86.