Implement XDB Indexes for Maria.
NOTE: We have a formal agreement signed with the inventors of the XDB indexes, and
this gives us the rights to use the XDB indexes freely in MySQL product
development, under the following condition:
(c) Licensee agrees to give credit to Licensor in all its documents either on
paper, electronic or Internet describing this block optimization feature (XDB).
XDB indexes will be clearly stated as invented by the Licensor, and a link to
Licensors website will be presented.
Below definition from Agreement with XDB inventors.
XDB is a Block Optimization feature of SafeDB
A table is partitioned in segments (called blocks in XDB) and min/max column
values are calculated for each block and kept in memory, enabling to skip the
reading of a block when no matching values can be found in it (because for
instance the matching value is out of the rang min-max).
In other words:
The database (datastore) is divided into segments (a set of values of which
size is defined at the databases design stage)
Each segment has an index entry holding the minimum and maximum values of
each block and for each specified column
Block optimization is lighter to use (min/max block values array is much smaller
than an index) and can be faster than indexing when many rows are retrieved
because the table is still read sequentially.
This feature gives an excellent and reduction at a very low cost, principally
when values are clustered and obviously sorted in table, so that the min/max
range is restricted for each block. However contrary to the lack of efficiency
of traditional DB indexing techniques in most actual data, many columns are
correlated together which extends the benefit of the XBD feature to several
columns by the same token. For example, in a personnel table, salaries, dates of
birth and positions are closely correlated variables. This extends the benefit
of the XBD feature to many more columns than one. Another significant advantage
over indexing techniques is linked to the slight cost of building the XDB
feature: there is almost no burden in building or rebuilding an XDB block
optimization. On a simple 1.6Mhz PC, block optimization for a single column is
built at the typical rate of 3 seconds for 1 million rows (4 block indexes being
built out of 40 columns in vector format).
Conversely, if all column values were scattered randomly in a table, then, the
min/max values for each block would be likely close to the global min/max values
of the column and most blocks would be eligible for reading, making such feature
useless.
However, as explained above, in the real world, many columns are correlated and
rarely random.
Finally, one should be aware that there is a tradeoff between block size and
expected optimization efficiency.
Usage:
Searches looking up the hierarchical min/max index pages (xdb pages), and
checking based on query constraints whether the hierarchically lower pages need
to be investigated, or in case of n=1, the segment needs to be table-scanned
When using partitioned tables, the top-level XDB pages can be kept in memory,
which avoids in many cases opening and scanning of one or several entire
partitions, and result in considerable performance gain.
Benefits:
- For some tablescans the block optimization can result in 100-1000 times
faster performance, for some tablescans there will be no advantage or drawback.
- The magnitude of the benefit comes from a) the dataset in the table and b)
the type of query executed
- It is easy to imagine use cases where the benefits are considerable: for
instance tables which automatically includes a timestamp column, which gets
filled almost chronologically - then searching for a particular time span could
be done much quicker than through ordinary table scans or even traditional
indexing techniques.
- If we implement this functionality into Maria, it should be simple to create
benchmark tests with suitable data, where we could demonstrate huge performance
benefits over the competitors (However: note that with the ame test and
different data set the result could be very different and - in the worst case
scenario - there might be no differences between MySQL and competitors).
You must be logged in to tag this worklog