Falcon
Overview
Falcon (a code name) is a transactional storage engine based on the Netfrastructure database engine, extended and integrated into MySQL.
The main goals of Falcon are to exploit large memory for more than just a bigger cache and to use threads and processors for data migration.
A video presentation on Falcon is available in the MySQL_Tutorials category.
Take a look
The last session of the MySQL UC was Jim Starkey's introduction to the new Falcon storage engine. Jim is an icon in the database field; he created MVCC and the BLOB data type.
What is Falcon
- transactional MySQL storage engine
- based on Netfrastructure database engine
- the engine has been used in mission-critical applications for more than 4 years
- extended and integrated into MySQL
Falcon is NOT
- an InnoDB clone
- Firebird
- a Firebird clone
- a standalone database management system
- Netfrastructure
Jim has been at this for a long time, and a lot has changed since he wrote his first database at DEC:
- Uni-processors to multi-core
- CPU performance from 1.7 to 42000 MPS
- Address space from 32-bit to 64-bit
- Memory speed from 1 microsecond to 7 nanoseconds
- Memory from 2 MB to 4 GB
- Disk access has gone from 20 ms to 7 ms
- Back then, 100 users was big; now 100,000 users is a small application
- RDBMS workloads moved from decision support to OLTP to data mining to the Web
- DBAs are smarter and they are harder to find
- Application programmers' skill levels are down; they use higher-level languages
- Expert design means appearance, not architecture
- Applications have much larger databases, more queries per human interaction, and fewer rows per result set; latency is more critical; BLOBs are much larger (and more important); and search happens without context.
What Jim has learned
- CPUs and memory are faster
- Disks are still slow
- MVCC (which was invented by Jim) works
- Record versions on disk are problematic
- Web applications are better and are the future (attention spans are short, so you have to build good ones and keep making them better)
- People have more important things to do than tune databases. Machines are now powerful enough to tune themselves; people shouldn't have to serve databases, it should be the other way around.
Falcon is designed for the next 20 years. Jim is confident that what he has learned about databases over the past 20 years, and has put into Falcon, will take database technology to new heights.
Goals of Falcon
- Exploit large memory for more than just a bigger cache
- Use threads and processors for data migration
- Design to eliminate tradeoffs, minimize tuning
- Scale gracefully to very heavy loads
Architectural Overview
Falcon is an incomplete in-memory database with backfill from disk, and it has two caches: a traditional LRU page cache for disk and a larger row cache with age-group scavenging. Falcon is multi-version in memory and single-version on disk. All transaction state is kept in memory, with automatic overflow to disk. Data and indexes live in a single file, plus log files. In the future, Jim would like to create BLOB repositories where the data is stored off to the side, and he hopes to provide multiple page spaces.
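To make the two-cache arrangement concrete, here is a minimal Python sketch. It is not Falcon's code: the class names `LRUPageCache` and `RowCache`, the `capacity`/`max_rows` parameters, and the generation counter are all hypothetical. It assumes "age-group scavenging" means rows are tagged with the age group (generation) of their last use and evicted a whole group at a time, oldest first.

```python
from collections import OrderedDict


class LRUPageCache:
    """Hypothetical fixed-capacity page cache that evicts the least recently used page."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page number -> page contents, oldest use first

    def get(self, page_no, read_from_disk):
        if page_no in self.pages:
            self.pages.move_to_end(page_no)  # mark as most recently used
            return self.pages[page_no]
        data = read_from_disk(page_no)  # cache miss: backfill from disk
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return data


class RowCache:
    """Hypothetical row cache that scavenges whole age groups, not single rows."""

    def __init__(self, max_rows):
        self.max_rows = max_rows
        self.generation = 0  # current age group
        self.rows = {}       # row id -> (age group of last use, row)

    def touch(self, row_id, row):
        # Insert or refresh a row in the current age group.
        self.rows[row_id] = (self.generation, row)

    def get(self, row_id):
        entry = self.rows.get(row_id)
        if entry is None:
            return None
        self.touch(row_id, entry[1])  # a use moves the row into the current group
        return entry[1]

    def next_generation(self):
        self.generation += 1

    def scavenge(self):
        """Drop the oldest age groups until the cache fits, never the current one."""
        while len(self.rows) > self.max_rows:
            oldest = min(gen for gen, _ in self.rows.values())
            if oldest == self.generation:
                break  # only current-group rows remain
            for row_id in [r for r, (gen, _) in self.rows.items() if gen == oldest]:
                del self.rows[row_id]
```

Evicting by age group avoids reordering a list on every row access, which is one plausible reason to prefer it over plain LRU for a large row cache; whether that matches Falcon's exact reasoning is an assumption.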
Falcon uses B-tree indexes with prefix compression. Index entries contain only the key; no other row data is stored in the index.
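A common way to implement prefix compression in B-tree pages is front coding: because keys on a page are sorted, each entry stores only the number of leading bytes it shares with the previous key, plus the remaining suffix. The Python below is a minimal sketch assuming that scheme; Falcon's actual node format isn't described here, and `compress_keys`/`decompress_keys` are hypothetical names.

```python
def compress_keys(sorted_keys):
    """Front-code sorted byte-string keys as (shared prefix length, suffix) pairs."""
    entries = []
    previous = b""
    for key in sorted_keys:
        shared = 0
        limit = min(len(previous), len(key))
        while shared < limit and previous[shared] == key[shared]:
            shared += 1
        entries.append((shared, key[shared:]))
        previous = key
    return entries


def decompress_keys(entries):
    """Rebuild full keys by reusing each entry's prefix from the previous key."""
    keys = []
    previous = b""
    for shared, suffix in entries:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys
```

For example, `[b"apple", b"applesauce", b"apply", b"banana"]` compresses to `[(0, b"apple"), (5, b"sauce"), (4, b"y"), (0, b"banana")]`. The trade-off is that a page must be scanned sequentially to rebuild keys, in exchange for packing more keys per page.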
Uncommitted row data is staged in memory (it can overflow to a scratch file), while indexes are updated immediately. On commit, row data is copied to the serial log and the log is written. A dedicated post-commit thread then copies row data from serial log pages to data pages. The page cache is flushed to disk periodically. BLOB data is scheduled for writing when the BLOB is created.
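The commit path described above (stage in memory, append to the serial log on commit, migrate to data pages on a separate thread) can be modeled in a few lines of Python. This is a simplified sketch, not Falcon's implementation: `CommitPipeline` and its methods are hypothetical, in-memory lists and dicts stand in for the serial log and data pages, and scratch-file overflow, index updates, page-cache flushing, and BLOB writes are omitted.

```python
import queue
import threading


class CommitPipeline:
    """Hypothetical model: stage rows, log them on commit, migrate them later."""

    def __init__(self):
        self.staged = {}            # transaction id -> {row id: row data}, uncommitted
        self.serial_log = []        # (transaction id, rows) in commit order
        self.data_pages = {}        # row id -> row data after migration
        self.pending = queue.Queue()  # serial log positions awaiting migration
        self.lock = threading.Lock()  # keeps log appends and hand-offs in order
        self.writer = threading.Thread(target=self._migrate, daemon=True)
        self.writer.start()

    def update(self, txn, row_id, data):
        # Uncommitted row data lives only in memory until commit.
        self.staged.setdefault(txn, {})[row_id] = data

    def commit(self, txn):
        rows = self.staged.pop(txn, {})
        with self.lock:
            position = len(self.serial_log)
            self.serial_log.append((txn, rows))  # stands in for writing the serial log
            self.pending.put(position)           # hand off to the post-commit thread

    def _migrate(self):
        # Dedicated post-commit thread: copy committed rows to data pages.
        while True:
            position = self.pending.get()
            if position is None:
                break  # shutdown signal
            _, rows = self.serial_log[position]
            self.data_pages.update(rows)
            self.pending.task_done()

    def close(self):
        self.pending.join()     # wait until every committed entry is migrated
        self.pending.put(None)  # then stop the writer thread
        self.writer.join()
```

The design point the sketch illustrates is that a committing transaction only waits for the sequential log append; the random-access work of updating data pages happens afterward, off the commit path.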
Data reliability is protected by "careful write": writes are sequenced to disk so that the on-disk state is always valid and consistent. There is a repair mechanism, but Jim hopes no one will ever have to use it.
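Falcon's specific write-ordering rules aren't given in these notes. As an illustration of the general careful-write idea, the Python below models a disk as a dictionary and applies one ordering rule: a page is written before any page that points to it. A crash between the two writes can then leave an unreferenced page, but never a pointer to a missing page. `Disk`, `SimulatedCrash`, `careful_append`, and `is_consistent` are hypothetical names.

```python
class SimulatedCrash(Exception):
    """Raised to simulate a crash before a page write reaches the disk."""


class Disk:
    """Hypothetical disk: page number -> contents, with an optional crash point."""

    def __init__(self, crash_after=None):
        self.pages = {}
        self.writes = 0
        self.crash_after = crash_after  # crash once this many writes have completed

    def write(self, page_no, contents):
        if self.crash_after is not None and self.writes >= self.crash_after:
            raise SimulatedCrash()
        self.pages[page_no] = contents
        self.writes += 1


def careful_append(disk, pointer_page, new_page_no, record):
    """Write the new data page first, then the pointer page that references it."""
    disk.write(new_page_no, {"records": [record]})  # 1. the target exists first
    children = list(disk.pages.get(pointer_page, {"children": []})["children"])
    children.append(new_page_no)
    disk.write(pointer_page, {"children": children})  # 2. then publish it


def is_consistent(disk, pointer_page):
    """Every page referenced from the pointer page must exist on disk."""
    children = disk.pages.get(pointer_page, {"children": []})["children"]
    return all(child in disk.pages for child in children)
```

Whatever point the crash hits, `is_consistent` holds afterward; the worst case is an orphaned page that wastes space until it is reclaimed.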
Falcon keeps a do/redo/undo log in the serial log.
Secret Agenda
Jim's got a secret agenda of things he'd like to do in the database world, starting with MySQL:
- Replace VARCHAR with string; VARCHAR is a throwback to punch-card technology
- Replace tiny, small, medium, and big integers with "number"
- Adopt a security model useful for the web
- Provide row-level security (filter sets)
- Teach the database world the merits of context-free search. Why do you have to use SELECT statements?
Q&A
- Foreign keys - supported, but not enforced
- Backups - Netfrastructure has two backup mechanisms
- When can we see it - a beta is expected in Q3 (from Robin) (update 4/2/07: currently in alpha, with a beta targeted for "late Q2 this year" [1])
- Filesystem storage - data is stored in one file, divided into fixed-length pages of 2 KB to 32 KB
- Memory - everything except BLOBs comes into the cache
- Performance - a fairly active 420 GB instance in Massachusetts uses only 30% of the CPU
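Because the database lives in one file of fixed-length pages, finding a page is simple arithmetic: its byte offset is the page number times the page size, so page 3 with 4 KB pages starts at byte 12,288. The sketch below assumes page numbering starts at 0 and that the allowed sizes are the powers of two from 2 KB to 32 KB; neither detail comes from the session, and `page_offset`/`read_page` are hypothetical.

```python
# Assumed set of allowed page sizes: powers of two from 2 KB to 32 KB.
VALID_PAGE_SIZES = (2048, 4096, 8192, 16384, 32768)


def page_offset(page_number, page_size):
    """Byte offset of a page in a single file of fixed-length pages."""
    if page_size not in VALID_PAGE_SIZES:
        raise ValueError(f"unsupported page size: {page_size}")
    if page_number < 0:
        raise ValueError("page numbers start at 0")
    return page_number * page_size


def read_page(f, page_number, page_size):
    """Read one whole page from an open binary file object."""
    f.seek(page_offset(page_number, page_size))
    data = f.read(page_size)
    if len(data) != page_size:
        raise EOFError(f"page {page_number} is past the end of the file")
    return data
```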