Falcon Feature Preview

Notice: This page is old and unmaintained.

Notice: Since early 2009, Falcon development has been discontinued. This page is only kept for historical reasons.

Introduction

This Feature Preview is intended to allow the community to test some important performance improvements to the Falcon Database. The performance changes are listed below along with key bug fixes.

Version 6.0.7

Online Add & Drop Index

Until this version, adding or dropping an index made the table unavailable for the duration of the operation. Adding an index meant that the table was unavailable while MySQL copied the data from the table to a temporary table, recreated the existing indexes, created the new index, dropped the old table, and finally renamed the temporary table to the original table name. Dropping an index also required copying all the data and recreating all the surviving indexes.

Starting with 6.0.7, Falcon creates indexes on the fly, without interrupting read/write access to the table. A request to drop an index blocks until any queries using that index have completed, then the space used by the index is released without affecting other access.

Page Checksum Protection

By default, Falcon now checksums pages before writing them to disk and validates the checksum after reading the page. To turn off checksumming, set the parameter falcon_checksums to 'OFF'.
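The write-then-validate cycle can be pictured with a minimal sketch. The page size, the choice of CRC32, the placement of the checksum in the last four bytes, and the names seal_page and open_page are all assumptions for illustration; this page does not say which algorithm Falcon uses or where the checksum is stored.

```python
import struct
import zlib

PAGE_SIZE = 4096          # hypothetical page size, for illustration only
CHECKSUM_BYTES = 4        # a CRC32 kept in the last four bytes (assumption)


def seal_page(payload: bytes) -> bytes:
    """Pad the payload to a full page body and append its CRC32."""
    body_size = PAGE_SIZE - CHECKSUM_BYTES
    if len(payload) > body_size:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(body_size, b"\x00")
    return body + struct.pack("<I", zlib.crc32(body))


def open_page(page: bytes) -> bytes:
    """Validate the stored CRC32 after a read and return the page body."""
    if len(page) != PAGE_SIZE:
        raise ValueError("short page")
    body, stored = page[:-CHECKSUM_BYTES], page[-CHECKSUM_BYTES:]
    if struct.unpack("<I", stored)[0] != zlib.crc32(body):
        raise IOError("page checksum mismatch")
    return body
```

A page that is altered between seal_page and open_page fails validation instead of being returned silently, which is the protection the checksum buys at the cost of one extra pass over each page.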

Serial Log File Truncation

Falcon writes its serial log to two files, alternately. While the Falcon front end writes to one file, the gopher threads move committed changes from the inactive file to the database. When all changes from the inactive file are written to the database, Falcon switches and begins writing to the formerly inactive file, overwriting the obsolete entries there. Under some circumstances, the switch may be delayed, causing the active file to become very large. A new parameter, falcon_serial_log_file_size, causes Falcon to truncate the inactive file immediately before switching to it. The default is 10 megabytes.
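The alternation between the two files can be modelled with a toy that tracks sizes only. The class and method names are hypothetical, and treating falcon_serial_log_file_size as the size the inactive file is cut back to is our reading of the description above, not a statement about the actual implementation.

```python
class SerialLogPair:
    """Toy model of two alternating serial log files (byte counts only)."""

    def __init__(self, truncate_to=10 * 1024 * 1024):
        self.truncate_to = truncate_to   # stands in for falcon_serial_log_file_size
        self.file_size = [0, 0]          # on-disk size of each file
        self.unapplied = [0, 0]          # bytes not yet moved into the database
        self.active = 0
        self.write_pos = 0               # write offset within the active file

    def append(self, nbytes):
        """Front end: write a log record to the active file."""
        self.write_pos += nbytes
        self.unapplied[self.active] += nbytes
        self.file_size[self.active] = max(self.file_size[self.active], self.write_pos)

    def gopher_drain(self):
        """Back end: move every change in the inactive file into the database."""
        self.unapplied[1 - self.active] = 0

    def try_switch(self):
        """Switch files only once the inactive file has been fully applied."""
        inactive = 1 - self.active
        if self.unapplied[inactive] > 0:
            return False                 # switch delayed; the active file keeps growing
        # Truncate the obsolete file immediately before reusing it.
        self.file_size[inactive] = min(self.file_size[inactive], self.truncate_to)
        self.active = inactive
        self.write_pos = 0
        return True
```

In this model a file that grew large during a delayed switch shrinks back to the configured size the next time it becomes the active file, rather than staying large indefinitely.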

Version 6.0.6

Optimize Limit Queries

Normally Falcon performs index look-ups in two stages: first it reads the index, setting bits in a sparse bitmap to indicate records that meet the indexed criteria, then it uses the bitmap to read the record data. That algorithm allows Falcon to make separate passes over the index and data, and ensures that data is read in storage order. However, it also means that the rows are returned in storage order, not index order.

For queries that use LIMIT and some queries that use GROUP BY or DISTINCT, the MySQL optimizer produces better plans if rows are returned in index order. Starting in 6.0.6, Falcon can return rows in index order if the optimizer indicates that the query will run faster.
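The difference between the two access paths can be sketched as follows. This is an illustration only: the index is modelled as a sorted list of (key, record number) pairs, a Python set stands in for the sparse bitmap, and both function names are hypothetical.

```python
def bitmap_lookup(index, rows, low, high):
    """Two-stage lookup: the index pass sets bits, the data pass reads in storage order."""
    bitmap = set()                              # stands in for a sparse bitmap
    for key, record_number in index:            # index is sorted by key
        if low <= key <= high:
            bitmap.add(record_number)
    return [rows[n] for n in sorted(bitmap)]    # rows come back in storage order


def index_order_lookup(index, rows, low, high, limit=None):
    """Walk the index and fetch each row as it is found; can stop early at LIMIT."""
    out = []
    for key, record_number in index:
        if limit is not None and len(out) >= limit:
            break
        if key > high:
            break
        if key >= low:
            out.append(rows[record_number])
    return out
```

With rows stored as ["carol", "alice", "bob"] and an index on the name, the bitmap path returns the rows in that storage order, while the index-order path with limit=2 returns "alice" and "bob" without visiting the rest of the index. That early exit is why index order helps LIMIT queries.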

Online Add & Drop Column

This feature allows Falcon to add columns to an existing table without copying the table. Unfortunately there is a bug in the implementation and the feature is temporarily unavailable.

Record Cache Backlogging

Although it uses Multi-Version Concurrency Control (MVCC), Falcon stores only the most recently committed version of each record in the database. Old versions of records and uncommitted records are kept in the Record Cache in memory. While a transaction is active, Falcon must keep older record versions even after new versions are committed, to support repeatable read. Prior to V6.0.6, long-running transactions could cause Falcon to preserve huge numbers of old record versions that could eventually fill the Record Cache completely. This feature allows chains of record versions, complete with their RecordVersion objects, to be written out to a special tablespace.

Tests of very high concurrency and tests containing longer transactions could get a "Record Memory Exhausted" error when the record cache filled with record versions.

The Record backlog is a tablespace designed to hold record versions from any and all tables. It contains a single table, indexed by the table identifier and record number of the chain of versions being stored. Both writing to the backlog and retrieving records from the backlog are expensive operations compared with accessing them from the Record Cache. The backlog allows Falcon to continue operation in a low memory situation, but it should not be used as an alternative to an adequately sized Record Cache. It is an emergency backup of records from the Record Cache, and its use should be considered a stopgap, like swapping or paging from system memory.
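A toy model may help picture the mechanism. The (table identifier, record number) key comes from the description above; the class name, the capacity measured in versions, and the policy of spilling the longest chain first are assumptions made for this sketch.

```python
class RecordCache:
    """Toy record cache that spills whole version chains to a backlog."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self.chains = {}     # (table_id, record_number) -> versions, newest first
        self.backlog = {}    # same key; stands in for the backlog tablespace

    def _cached_versions(self):
        return sum(len(chain) for chain in self.chains.values())

    def add_version(self, table_id, record_number, value):
        key = (table_id, record_number)
        if key in self.backlog:                  # bring the chain back first
            self.chains[key] = self.backlog.pop(key)
        self.chains.setdefault(key, []).insert(0, value)
        self._spill_if_full(keep=key)

    def _spill_if_full(self, keep):
        # Move the longest chains out until the cache fits again (assumed policy).
        while self._cached_versions() > self.max_versions:
            candidates = [k for k in self.chains if k != keep]
            if not candidates:
                break
            victim = max(candidates, key=lambda k: len(self.chains[k]))
            self.backlog[victim] = self.chains.pop(victim)

    def versions(self, table_id, record_number):
        key = (table_id, record_number)
        if key in self.chains:
            return self.chains[key]
        return self.backlog.get(key, [])         # the slow path in a real system
```

The sketch keeps working when the cache is too small, but every chain that lands in the backlog turns a memory lookup into the slow path, which is why the backlog is a stopgap and not a sizing strategy.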

Version 6.0.5

Several performance improvements were made in this release (see Downloads).

Supernodes

Supernodes are an array of 16 vectors into each index page that point to keys stored fully expanded, with no prefix compression. This allows the page to be searched more quickly, using a binary search of the supernode keys followed by the normal sequential search. Previously, the whole page was searched sequentially. The change greatly reduced the time spent searching key pages.
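The search can be sketched as follows. Only the figure of 16 supernodes per page comes from the text; the page layout (a list of shared-prefix length and suffix pairs), the even spacing of supernodes, and the function names are assumptions for illustration.

```python
import bisect

SUPERNODES_PER_PAGE = 16   # from the text; everything else here is illustrative


def build_page(sorted_keys, supernodes=SUPERNODES_PER_PAGE):
    """Prefix-compress keys, leaving evenly spaced supernode keys uncompressed."""
    step = max(1, -(-len(sorted_keys) // supernodes))    # ceiling division
    entries, super_positions, previous = [], [], ""
    for i, key in enumerate(sorted_keys):
        shared = 0
        if i % step == 0:
            super_positions.append(i)                    # stored fully expanded
        else:
            limit = min(len(key), len(previous))
            while shared < limit and key[shared] == previous[shared]:
                shared += 1
        entries.append((shared, key[shared:]))
        previous = key
    return entries, super_positions


def find(entries, super_positions, target):
    """Binary search the supernode keys, then scan sequentially from there."""
    super_keys = [entries[p][1] for p in super_positions]
    slot = bisect.bisect_right(super_keys, target) - 1
    if slot < 0:
        return None                                      # target sorts before the page
    key = ""
    for i in range(super_positions[slot], len(entries)):
        shared, suffix = entries[i]
        key = key[:shared] + suffix                      # expand the compressed key
        if key == target:
            return i
        if key > target:
            return None
    return None
```

Because a prefix-compressed key can only be expanded from the key before it, a plain binary search over the page is not possible. The uncompressed supernode keys give the search safe places to start, so the sequential part covers one stretch between supernodes and not the whole page.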

Thread Signaling

Bug#34890 documents a problem in very high concurrency situations in which a thread waiting on a SyncObject could miss the signal and then wait until the next time that SyncObject was locked and unlocked. If the SyncObject was a Transaction::syncObject, the transaction might complete and the waiting thread might never be signaled. In this case there would be a false lock wait timeout. For other SyncObjects, the waiting thread would stall up to 10 seconds before retrying the lock. These hangs and stalls occur on machines with multiple CPUs under high concurrency. They are very intermittent and timing dependent. Performance tests usually exhibit high deviation between identical runs, with the overall result of lower performance.

Version 6.0.4

Performance Improvements

Previous alpha versions of Falcon showed performance problems in two areas that are addressed in the 6.0.4 release. The first was that forcing pages written by a checkpoint operation to disk took longer than the period between checkpoint operations. The second was that the front half of Falcon, where transactions run, tended to get significantly ahead of the back half, which is responsible for integrating committed changes into the database. Fixing the first problem required a multi-step reworking of the I/O architecture of Falcon, which, to nobody's surprise, uncovered several surprises. Improving the speed of checkpoints helped the back end keep up with the front. Adding more gopher threads got the two ends running together again.

The measurements were made and the changes were tested on Linux. The solutions may change performance on Windows, and other platforms, but as of this moment we don't know whether the changes are beneficial elsewhere, let alone what the degree of change may be. We're more hopeful for *nix platforms. We will work on performance on Windows in the future. We know lots of tricks there, too.

Pool of Asynchronous I/O Threads

Having gone through all that effort, adding a group of threads writing in parallel seemed like a reasonable next step. That, of course, led to the question: how many threads are enough? And that led to the next parameter, falcon_io_threads, which defaults to 2. When and if we discover a reliable algorithm for picking the right number of I/O threads, we would like to abandon this parameter.

Direct IO

When fsync was eliminated in version 6.0.3, Falcon used O_SYNC to force each page through the file system cache to disk. That led to much discussion of the unnecessary cost of copying pages from the Falcon page cache to the system file cache. So O_SYNC was changed to O_DIRECT. Then the two were compared and tested on different file systems. RAM disk doesn't support O_DIRECT, and at least one system showed O_SYNC being faster than O_DIRECT, so we compromised, using O_DIRECT by default and falling back to O_SYNC, allowing an override with the parameter falcon_direct_io.
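The fallback can be sketched with the standard open flags. This is a Unix-only illustration, not Falcon's code: the function name is hypothetical, and the prefer_direct argument merely stands in for the falcon_direct_io parameter, whose exact semantics are not described here.

```python
import os


def open_for_page_writes(path, prefer_direct=True):
    """Open a file for unbuffered page writes, falling back to O_SYNC.

    Returns (fd, mode), where mode is "O_DIRECT" or "O_SYNC".
    """
    base = os.O_RDWR | os.O_CREAT
    direct = getattr(os, "O_DIRECT", None)     # not defined on every platform
    if prefer_direct and direct is not None:
        try:
            return os.open(path, base | direct, 0o600), "O_DIRECT"
        except OSError:
            pass                               # e.g. a file system that rejects O_DIRECT
    return os.open(path, base | os.O_SYNC, 0o600), "O_SYNC"
```

The sketch only opens the file. Real writes through an O_DIRECT descriptor typically also need buffers, offsets, and lengths aligned to the device block size, which is part of why a fallback path is useful.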

Pool of Gopher Threads

Falcon has a front end and a back end. The two run largely asynchronously. The front end handles transactions from start to durable commit. The front end uses the page cache to read data into the record cache. A running transaction makes data changes in the record cache and index changes in its deferred indexes. During a prepare or commit, the transaction's changes move to the serial log. Once the serial log is flushed to disk, the transaction is durable.

At that point, the back end of Falcon starts moving committed changes from the serial log into the database where they are easier to find. In fact, unmoved changes remain in memory and continue to be referenced there until the gopher gets them into the database. If the system crashes before the gopher has emptied the serial log, the recovery process picks up where the gopher left off and nothing much happens until all committed changes are on disk.

As the front end got faster, the back end lagged, causing bizarre performance problems. The strangest was that running a complex memory-intensive query on the Information Schema made the system faster. From our tests, we believe that the query tied up the front end, allowing the back end to finish pushing committed changes into the database and releasing the transactions that had committed but were kept around until they became "write complete" - meaning that their changes were in the database.

Part of the solution was to get the single gopher some friends to share the load. The number of gopher threads is governed by yet another parameter: falcon_gopher_threads which defaults to five.

Thread Scheduler

Changing to asynchronous I/O threads uncovered another interesting situation. Both reads into the cache and writes to the serial log starved while checkpoints ran along briskly. So Falcon developed yet another characteristic of an operating system: an I/O thread scheduler that gives different priority to different operations.
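One simple way to give different operations different priority is a priority queue in front of the I/O threads. The sketch below is an assumption-laden illustration: this page does not say which operations Falcon favors or how, so the ordering in PRIORITY and all names are hypothetical.

```python
import heapq
import itertools

# Lower number = served first. This ordering is an assumption for illustration.
PRIORITY = {"log_write": 0, "cache_read": 1, "checkpoint_write": 2}


class IoScheduler:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # keeps FIFO order within one priority

    def submit(self, kind, request):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, request))

    def next_request(self):
        """Return the most urgent pending (kind, request), or None if idle."""
        if not self._heap:
            return None
        _, _, kind, request = heapq.heappop(self._heap)
        return kind, request
```

With strict priorities like these, a queue full of checkpoint writes can no longer starve a serial log write. The opposite risk, starving the checkpoint, would need aging or quotas, which the sketch leaves out.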

Version 6.0.3 - Alpha

Eliminate fsync

Prior to Version 6.0.3, Falcon used buffered I/O to write the pages flushed in a checkpoint operation. When all pages were written, Falcon used an fsync to force them to disk. The frequency of checkpoints is determined by the parameter falcon_checkpoint_schedule which defaults to once every 30 seconds. The fsync often took longer than 30 seconds. The next checkpoint waited for the first to complete. Delayed checkpoints tended to have more work than those that occurred on schedule, so the delays propagated. [Think air traffic control with a squall line going from Chicago to Atlanta.]

Page consolidation

The last of the I/O performance improvements in 6.0.3 is page consolidation. It is much faster to write a large contiguous block in a single operation than in a sequence of page sized writes. Unfortunately, the page cache doesn't necessarily keep pages in storage order. Nor should it. During the first pass over the cache, Falcon determines which of the pages to be written are actually contiguous. Those pages are moved to a write buffer and written in a single operation.
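Finding the contiguous pages amounts to grouping sorted page numbers into runs. A minimal sketch, assuming pages are identified by integer page numbers (the function name is hypothetical):

```python
def contiguous_runs(dirty_page_numbers):
    """Group dirty page numbers into (first_page, page_count) runs."""
    runs = []
    for page in sorted(set(dirty_page_numbers)):
        if runs and page == runs[-1][0] + runs[-1][1]:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((page, 1))                      # start a new run
    return runs
```

For dirty pages 7, 3, 4, 12, 5, and 8, this yields three runs: pages 3 to 5, pages 7 to 8, and page 12 alone. Each run could then be copied into one write buffer and written with a single operation, three writes in place of six.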

Feature Changes

New Settings

Falcon Repeatable Read

The ISO SQL standard defines four isolation modes for transactions: Serializable, Repeatable Read, Read Committed, and Read Uncommitted. The definitions of the modes are based on the behavior of systems that lock records and ranges.

The ISO SQL Standard describes Repeatable Read transactions as having the isolation level provided by read/write record locks without locks on ranges. Reading the same record twice will always get the same value for its fields, but a select with the same criteria may get more records each time it runs. Oddly, the standard defines "Repeatable Read" as not repeatable.

However, Falcon and other engines that rely on Multi-Version Concurrency Control provide an isolation level that is completely repeatable but not serializable because it allows some update anomalies. This is the mode a Falcon Transaction gets when it chooses the Repeatable Read isolation level. Each transaction sees a stable snapshot of the records that were committed when the transaction started.

InnoDB's implementation of Repeatable Read includes an anomaly that causes a simple select statement to get different results from a select for update. A simple select sees the state of the database that was committed when the transaction began. A select for update sees all committed changes as of the instant it runs - effectively Read Committed mode. Update and delete statements also run in Read Committed mode.

This hybrid of Repeatable-Read and Read-Committed isolation levels improves throughput in a highly concurrent environment full of database updates. With careful coding, it also gets consistent results. In Falcon's normal Repeatable Read mode, a transaction cannot update or delete a record if it cannot select the most recent committed version. When the situation arises, the transaction gets an update conflict error and must roll back before the operation can succeed. By allowing the select for update, update, and delete statements to access the most recently committed version of records, InnoDB allows more transactions to succeed, at the cost of possibly inconsistent results for improperly coded applications.

Falcon originally provided Repeatable-Read transactions that were consistent. In the 6.0.2 and 6.0.3 alpha releases, it emulated InnoDB. Now there is a setting in which you can choose between the two modes. The parameter falcon_consistent_read is on by default and provides truly repeatable reads. Turning the parameter off makes Repeatable Read transactions behave like InnoDB.
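The two behaviors can be contrasted with a toy versioned store. Everything here is a simplification for illustration: the class names are hypothetical, commit identifiers are a plain counter, an update commits immediately, and the consistent_read flag only stands in for falcon_consistent_read.

```python
class UpdateConflict(Exception):
    pass


class Store:
    def __init__(self):
        self.commit_counter = 0
        self.versions = {}      # key -> list of (commit_id, value), oldest first

    def commit_write(self, key, value):
        self.commit_counter += 1
        self.versions.setdefault(key, []).append((self.commit_counter, value))


class Transaction:
    def __init__(self, store, consistent_read=True):
        self.store = store
        self.snapshot = store.commit_counter     # what was committed at start
        self.consistent_read = consistent_read   # stands in for falcon_consistent_read

    def select(self, key):
        """Plain select: newest version committed when the transaction started."""
        chain = self.store.versions.get(key, [])
        visible = [value for commit_id, value in chain if commit_id <= self.snapshot]
        return visible[-1] if visible else None

    def update(self, key, change):
        """Apply change(old_value); in this toy the update commits immediately."""
        chain = self.store.versions.get(key, [])
        newest_commit, newest_value = chain[-1] if chain else (0, None)
        if self.consistent_read and newest_commit > self.snapshot:
            raise UpdateConflict(key)            # cannot see the newest committed version
        new_value = change(newest_value)         # otherwise work from the latest commit
        self.store.commit_write(key, new_value)
        return new_value
```

If x is 1 when two transactions start and another session then commits x = 2, both transactions still select 1. An update in the consistent mode raises UpdateConflict, while the same update with the flag off is applied to the value 2 that the transaction never selected. That gap between what was read and what was updated is the inconsistency a carelessly coded application can trip over.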

Bug Fixes

Bugs Fixed in Version 6.0.7

Bugs Fixed in Version 6.0.6

Bugs Fixed in Version 6.0.5

Bugs Fixed in Version 6.0.4

Bugs Fixed in Version 6.0.3 - Alpha


Known Open Bugs

Try this link to access Open, Verified, Analyzing, and In Progress Falcon Bugs

Downloads

The most recent downloads of MySQL 6.0 including the Falcon storage engine are available at http://dev.mysql.com/downloads/mysql/6.0.html.

Binary packages

Older Preview builds for Linux x86-64 and Windows-32 are available for download from http://downloads.mysql.com/forge/falcon_feature_preview/

Sources

Source code is available from our public Bzr trees at [1] - please consult the reference manual for more information on how to build a MySQL binary from a source tree.

The source tree for this feature preview is also found here. This source has been compiled on Linux 32-bit and 64-bit, FreeBSD 32-bit and 64-bit, Mac/Intel and Mac/PPC, Windows 32-bit and 64-bit, and Solaris/x86 and Solaris/SPARC.

Retrieved from "http://forge.mysql.com/wiki/Falcon_Feature_Preview"

This page has been accessed 55,597 times. This page was last modified 13:35, 12 October 2010.
