Categories: Specifications | MySQLDevelopment

SE API QFPD

Contents

[edit] Storage Engine API Extensions For Query Fragment Pushdown

This document summarizes the ideas and plans for MySQL storage engine API extensions primarily focused on supporting query fragment pushdown.


[edit] Version History


Date         Author   Comments
04-Sep-2009  Thava    Initial version after discussions with Mikael Ronstrom,
                      Timour and Storage Engine API Team.
21-Jan-2009  Thava    serverapi.h updates
05-Feb-2009  Thava    serverapi.h updates
13-apr-2009  Thava    Renamed the updated api file to aqtdict.h  

[edit] Background

In many instances, it is much more efficient to push parts of query conditions to storage engines rather than doing it in server. Such conditions could be either trivial scan filters (e.g: "a = 30") or more complex (e.g. join conditions on different tables). Pushing down the constant scan filter was added recently for cluster storage engine. However, pushing down more complex conditions is required and not yet supported. This requirement is especially critical for cluster storage engine. Without this, all entire tables involved in the query are fetched to the server from different nodes before applying "joins", for example. This could be eliminated by pushing such conditions to storage engines to limit the amount of data shipped across nodes.

Current storage engine API exposes too many internal details, hence too fragile if anything changes internally. We need API to expose the data structure through abstract and stable interfaces. The new API should be very well documented so that it would be easier to understand and use.

[edit] Goals

The primary goals are mentioned below :

  1. Enable query fragment pushdown(QFPD) to storage engines. The immediate next step should support pushing down join conditions on tables belonging to same storage engine. More complex conditions would follow later. However the design would be easily extensible in future. The API Extensions would make use of easy-to-use and clearly documented interfaces. New AQT (Abstract Query Tree) data structure will be created that will be used as parameters for new API extensions.
  2. Extend the AQT interfaces that could be potentially be used by both server and/or storage engines. Such use is not only limited to QFPD. The benefit would be-- easier to use APIs across different server modules and storage engines. (This is long term goal).

[edit] Abstract Query Tree

The Abstract Query Tree is the core theme which enables easy-to-use and stable interface between server and storage engine. As a first step, this will be used to pass query fragments to storage engine for pushdown. In longer term, this is envisioned to be extensible and usable within server itself whenever appropriate.

[edit] Worklogs Reference


[edit] Main Tasks

The main tasks involved are :

[edit] WL#4535 scratchpad

[edit] Example applications of AQT designs to imaginary and real storage engines

[edit] Engine capabilities
[edit] Application of AQT designs to the engines above
my_se::pushdown_analysys(aqt *qf)
{
  aqt_cond cur_join_cond = qf->get_join_cond();
  if (cur_join_cond is not an atomic condition)
    qf->mark_non_pd()
    return;

  aqt_abstract_table *cur_join_operand;
  cur_join_operand = qf->get_left();
  if (cur_join_operand->type() != AQT_BASE_TABLE)
    qf->mark_non_pd()
    return;

  cur_join_operand = qf->get_right();
  if (cur_join_operand is not a base table)
    qf->mark_non_pd()
    return;
}

my_se::pushdown_analysys(aqt *qf)
{
  aqt_cond cur_join_cond = qf->get_join_cond();
  if (cur_join_cond is not an atomic condition)
    qf->mark_non_pd()
    return;
.................

}

TODO

[edit] How to expose Table as abstract Table (Serg's Draft)


struct TABLE
{
  TABLE_SHARE	*s;
  TABLE *share_next, **share_prev;
};

class Public_Table : private TABLE {
    char *get_name() { return s->table_name.str; }
    Public_Table *next_in_list() { return (Public_Table *)next; }
    bool GIS_Is_Supported () { return s->db_type->flags & HA_CAN_GEOMETRY; }
    Public_Table *next_InnoDB_table() {
      TABLE *n;
      for (n=next; n; n=n->next)
        if (n->s->db_type == innobase_hton_ptr)
          return n;
      return NULL;
    }
    void mark_as_pushdownable();
}

//*+++++++****************************************+

  ...
  TABLE *table;
  ...
  handler->AQT_push_down((Public_Table*)table);


Item_func()....


Public_Item_func() {
  next_child(iterator_state *state) {
    if (state->current == 0)
      return arg0;
    return arguments[state->current++ - 1];
  }
}

[edit] List of Objects and functionalities that need abstraction


 1)  Table Instance.   Currently: TABLE
 2)  Table Definition:  Currently: TABLE_SHARE
 3)  Column Definition/Instance.  Currently: Field
 4)  From  clause TableJoin Specification. Currently: TABLE_LIST
 5)  Index Information
 6)  Key Information
 7)  Table JOIN instance : Currently JOIN
 8)  Query Execution Plan: Currently list of JOIN_TAB's
 9)  Expression:  Currently: Item (it is fairly OK already)

[edit] Class Hierarchies of important classes

   
  Item Class Hierarchy :

    http://forge.mysql.com/w/images/1/16/ClassItem_inherit_graph.png

[edit] NDB Current pushdown implementation


We describe NDB implementation since this will be a good example 
input for AQT implementation. 

Current NDB pushdown is supported only for single table used as column
filters. There are certain restrictions on the type of filters.

Examples of supported pushdown where conditions:
 
     a > 2 and b > 3;      // i.e. select * from t where a>2 and b>3;
     a > 2 and b > 3 and c > 4;   // composite logical and/or OK
     a > 2  or b > 3;
     a > 2+3 and b < mod(10, 8);  // constant functions are OK
     str like '%hello%'  ;        
     mydate > '2008-02-25' ;   // date can be compared with const date string

     a in (2, 3, 4);     // rule is rewritten using OR
     a in (2, 3+5, 10);  // constant expressions are OK
     a  between  5 and 10;  // rule rewritten using AND

Column Types Supported : string,real,decimal,int, varbin, 
                         date, time, datetime, year

Other column types such as blob types are not supported.
Field Result Types Supported : String, real, int, decimal

 
The handler infrastructure for this push down condition is provided by handler
class COND *cond_push(const COND *cond) function which is implemented by NDB
engine. The COND structure is same as Item structure. 
   
The handler function ha_ndbcluster::cond_push(COND *cond) which implements the
current pushdown method is fairly complicated (850+ lines of code) though the
problem specification may sound simple. Here is a simplified *pseudo* code of
current NDB implementation (omitting much of lower level details) :

   // Return NULL on successful push else return rejected condition
   COND * ha_ndbcluster::cond_push(COND *cond)
   {
      context = Initialize a dynamic context object which specifies
                things like 
                (1) what Item Types are expected e.g. COND_ITEM, FUNC_ITEM
                (2) what result types are expected e.g. INT_RESULT,REAL_RESULT
                etc.
      item->traverse_cond(&ndb_serialize_cond, 
                 (void *) &context, Item::PREFIX);

      if (context.supported) return NULL  
      else  return  cond;

   }

  The Item class (and all it's subclasses) provides a traverse_cond() virtual 
function to traverse it's tree. For example, for Item_ident, this function will
be a trivial function to visit it's only node. For Item_func, it will invoke 
the given function on it's operands in the specified order.

     The ndb_serialize_cond() function is used to traverse the condition and 
generate filter condition for NDB in it's own list format. We will focus on the
logic of traversing and evaluating of the pushdown condition is supported or not. 

Pseudo code :

Step 0:

   At the top level, boolean expression is expected. 
   context :=  { Expected  Item types = FUNC_ITEM, COND_ITEM }
   Note: FUNC_ITEM used for <, >, >=, <=, =, NOT, IN, BETWEEN
         COND_ITEM used for AND, OR
     Internally Item_cond is a subclass of Item_func. 
     However Item_cond is used only for logical AND/OR operations.

Step 1: Call  ndb_serialize_cond() recursively on PREFIX order on item tree:
      item->traverse_cond(&ndb_serialize_cond, 
                 (void *) &context, Item::PREFIX);

   ndb_serialize_cond(Item *item, Context *ctxt)
   {

     if (  item type or item result type is not one of currently expected 
           types as per context)
                 mark context as "unsupported" and return;

   switch item type {
     case COND_ITEM :  Make sure it is AND/OR nothing else. 
                       Push this AND/OR condition into NDB condition list.
                       ctxt holds same item types to expect next:
                                {Expect: FUNC_ITEM, COND_ITEM type nodes} 
                       break;

     case FUNC_ITEM :  /* It is one of <, >, =, IN, BETWEEN, NOT etc */

                       [If the operator is one of IN, BETWEEN, 
                       equivalent AND/OR conditions are pushed into NDB list.
                       The context holds enough information to remember
                       the states across calls. Further details are omitted
                       in the interest of brevity regarding IN, BETWEEN ops.]

                       Push this condition node into NDB list.
                       Set Context to expect the type of operands next:
                       {ItemType:string,real,decimal,int,field,varbin;} 
                       {FieldResultType:- String, real, int, decimal}
                       break;

     case FIELD_ITEM : assert field->type() is one of supported types.
                       Note: blob types are not supported here.
                       Push this node into NDB list including column num info.
                       Mark Context to expect:
                             {No more Field Items}  // atmost 1 field per comp
                             {Expect compatible item types}
                       e.g. If field->type() is MYSQL_TYPE_LONG, 
                            then expect INT_ITEM
      
     case  INT_ITEM  :  If (no field item has been processed  before)
                        { set context to expect: {Only Field Item  next}}
                        else { /* both field and INT item processed now  
                                  Reset context to accept new conditions.
                                  e.g. a < 5 and b > 3 */
                           Context := {Expect FUNC_ITEM or COND_ITEM types}
                        }  
                        push this item into NDB serialized list.               

    case Item::DECIMAL_ITEM:  /* Similar */
    case Item::VARBIN_ITEM:   /* Similar */   
    case Item::REAL_ITEM:     /* Similar */
    case Item::STRING_ITEM:   /* Similar */

   }




/** @file  aqt_dict.h
    @mainpage   Abstract API Declarations for database objects.

    AQT dictionary API provides abstract dictionary interface for database 
objects such as Table, Column, Index, Database, etc. 
  
    Last updated: 2009 Apr 07

 */

/* Import standard portable declarations like int32 etc from my_global.h */

#include  "my_global.h"

namespace aqtdict {

/* Imported class declarations */

class String;
template<typename E> class RList;  /* Read only list */
class A_query_expr;
class A_object {
  /* place holder here. definition imported later. */
};


/* List of classes exported */

class  A_object;
class  A_database;
class  A_table;
class  A_base_table;
class  A_view_table;
class  A_column;
class  A_index;
class  A_fkey;
class  A_trigger;

class  A_table_constraint;
class  A_ukey;
class  A_pkey;
class  A_index_member;

/* Imported function declarations */

extern  A_table      *aqtdict_get_table(String *dbname, String *tablename);
extern  A_base_table *aqtdict_get_base_table(String *dbname, 
                                            String *tablename);
extern  A_column     *aqtdict_get_column(String *dbname, String *tablename, 
                                        String *colname);
extern  A_database   *aqtdict_get_database(String *dbname);
extern  A_view_table *aqtdict_get_view(String *dbname, String *viewname);
extern  A_trigger    *aqtdict_get_trigger(String *dbname, String *tablename,
                                         String  *triggername);

/* List of class definitions */

/**
   @class  A_database
   @brief  Abstract database class
 */

class  A_database : public A_object {
  virtual  int32  get_database_name();         /*< Get database name */
  virtual  RList<String> *get_table_names();   /*< Get the list of tables*/
  virtual  RList<String> *get_view_names();    /*< Get the list of views */
};

/**
   @class  A_table
   @brief  Abstract Table base class
 */

class  A_table : public A_object {

public :

  virtual A_database  *get_database();        /*< Get database */
  virtual int32  get_table_name(String *str); /*< Get table name */
  virtual int32  get_alias_name(String *str); /*< Get alias name */
  virtual RList<A_column>  *get_columns();    /*< Get column list */

  enum A_table_type {
          A_BASE_TABLE=1,                     /*< Base Table */
          A_DERIVED_TABLE,                    /*< Derived Table */
          A_VIEW_TABLE                        /*< View Table or View */
  } ;

  virtual A_table_type  get_table_type();      /*< Get table type */
  virtual int32  get_engine_name(String *str); /*< Get storage engine name*/
  virtual int32  get_charset_name(String *str);/*< Get charset name */
  virtual int32  get_collation_name(String *str);/*< Get collation name */

  enum A_row_format {
           A_ROW_FORMAT_DEFAULT, A_ROW_FORMAT_DYNAMIC, A_ROW_FORMAT_FIXED,
           A_ROW_FORMAT_COMPRESSED, A_ROW_FORMAT_REDUNDANT, 
           A_ROW_FORMAT_COMPACT };

  virtual A_row_format get_row_format();       /*< Get row format. */


};


/* This enum was copied from  mysql_com.h  */

enum base_field_type { MYSQL_TYPE_DECIMAL, MYSQL_TYPE_TINY,
                        MYSQL_TYPE_SHORT,  MYSQL_TYPE_LONG,
                        MYSQL_TYPE_FLOAT,  MYSQL_TYPE_DOUBLE,
                        MYSQL_TYPE_NULL,   MYSQL_TYPE_TIMESTAMP,
                        MYSQL_TYPE_LONGLONG,MYSQL_TYPE_INT24,
                        MYSQL_TYPE_DATE,   MYSQL_TYPE_TIME,
                        MYSQL_TYPE_DATETIME, MYSQL_TYPE_YEAR,
                        MYSQL_TYPE_NEWDATE, MYSQL_TYPE_VARCHAR,
                        MYSQL_TYPE_BIT,
                        MYSQL_TYPE_NEWDECIMAL=246,
                        MYSQL_TYPE_ENUM=247,
                        MYSQL_TYPE_SET=248,
                        MYSQL_TYPE_TINY_BLOB=249,
                        MYSQL_TYPE_MEDIUM_BLOB=250,
                        MYSQL_TYPE_LONG_BLOB=251,
                        MYSQL_TYPE_BLOB=252,
                        MYSQL_TYPE_VAR_STRING=253,
                        MYSQL_TYPE_STRING=254,
                        MYSQL_TYPE_GEOMETRY=255,
                        MAX_NO_FIELD_TYPES /* Should always be last */
};

class A_datatype : public  A_object  {
   virtual base_field_type get_base_type();
   /* Todo: Include type specific things here including :
         - length for bit, INT types, char types
         - (length, decimals) for float, decimal, "numeric" types
         - charset, collation info for char, text, enum, set types
         - value set for enum types
         - value set for set type
    */
};

/**
   @class  A_column
   @brief  Abstract column base class.
          This represents the column of an active table in database.
 */


class A_column : public A_object {

public:

  virtual  int  get_column_name(String *str); /*< Get column name */
  virtual  A_table *get_table();              /*< Get associated table */
  virtual  bool  is_nullable();               /*< Is column nullable? */
  /* Get default value for this column in String form */
  virtual int32  get_default_value_str(String *def);  
  /* Get column number. First column starts from 1. */
  virtual int32  get_column_number();    

  virtual  bool  is_unique();                /*< Is unique column? */
  virtual  bool  is_pkey();                  /*< Is primarykey column? */
  virtual  bool  is_sequence();              /*< Is auto increment col?*/

  virtual A_datatype  *get_datatype();       /*< Get SQL data type.  */

  /* Get Charset name. Applies to only char based types */
  virtual int32  get_charset_name(String *str);
  /* Get collation name. Applies to only char based types */
  virtual int32  get_collation_name(String *str);

};

class A_base_table : public A_table {

public :

   virtual bool  is_temp_table();               /*< Is this temp table?*/

   virtual RList<A_index>   *get_indexes();     /*< Get index list */
   virtual RList<A_trigger> *get_triggers();    /*< Get triggers list */

   virtual int32 get_table_comment(String *str);/*< Get table comment. */
   virtual int32 get_tablespace_name(String *str); /*< Get tablespace name*/

   /* 
      Get all kinds of constraints for this table:
      This includes: PrimaryKey, ForeignKey, Unique constraints.
   */
   virtual RList<A_table_constraint>  *get_constraints();

   virtual A_pkey  *get_primary_key();       /*< Get Primary Key. */
   
   virtual RList<A_fkey> *get_foreign_keys();/*< Get Foreign keys */

   /* Get the list of unique constraints. Includes pkey if exists */
   virtual RList<A_ukey> *get_unique_constraints();

   /* Get auto-increment starting value. */
   virtual uint64 get_autoincrement(); 

   /* 
     Get keyblock size in bytes. It is hint to storage engine.
     Value 0 indicates use of default. 
   */
   virtual int32 get_key_block_size();        

   /* Todo: Get partition info. */

};

class  A_view_table : public A_table {
public:
    virtual A_query_expr  *get_query_expr();    /*< Get query expr */
    virtual int32  get_sql(String *str);        /*< Get associated SQL */
};


/*
   @class A_index
   @brief Abstract class for Index.
      Note: An index may internally be created due to:
        1) "primary key" constraint specification 
        2) "unique" constraint specification 
        3) explicit index specification
   Non-unique indexes are always created as part of explicit specification.
   Explicit specification of index always creates new index and mysql 
   does not attempt to "share" internally generated indexes as part of 
   unique/pkey specification. 

   The index name and constraint name are same when the index is created
   for the constraint (either primary key or unique key).

   Currently foreign key specification does not auto generate index, hence
   an index is not associated with any foreign key in referencing table.

 */

class A_index : public A_object {

public:

    virtual  int32 get_index_name(String *str);/*< Get index name */
    virtual  A_table *get_table();             /*< Get associated table */
    virtual  RList<A_index_member> *get_members();/*< Get index members */
    virtual  bool  is_unique();                /*< Is unique index? */
    virtual  bool  is_clustered();             /*< Is clustered index? */
    virtual  bool  is_pkey_index();            /*< Is primary key index?*/
    virtual  bool  is_system_generated();      /*< System generated index?*/

    /*
      This returns associated source constraint for this index.
      Note: Single index can atmost be associated with single constraint:
            This could be primary or unique constraint.
            Non-unique index is not associated with any constraints
            and will return NULL.
     */
    virtual  A_table_constraint *get_index_constraint();

    /* Enum type for index types.  */
    enum enum_index_type  { A_INDEX_TYPE_PRIMARY, A_INDEX_TYPE_UNIQUE, 
                            A_INDEX_TYPE_MULTIPLE, A_INDEX_TYPE_FULLTEXT, 
                            A_INDEX_TYPE_SPATIAL,
                            A_INDEX_TYPE_FOREIGN_KEY};

    virtual  enum_index_type  get_index_type(); /*< Get index type. */

    enum enum_index_algo  { A_INDEX_ALGO_UNDEF=0,
                            A_INDEX_ALGO_BTREE, 
                            A_INDEX_ALGO_RTREE,
                            A_INDEX_ALGO_HASH,
                            A_INDEX_ALGO_FULLTEXT };

    virtual  enum_index_algo   get_index_algo();  /*< Get Index Algorithm*/
    /* 
      Get keyblock size in bytes. It is hint to storage engine.
      Value 0 indicates use of default. 
    */
    virtual  int32 get_key_block_size();
};

class A_index_member : public  A_object {

public :

     virtual  A_column   *get_column();        /* Get associated column */
     /* 
        Index may be created on leading part of column values.
        Return value of 0 indicates entire column is used for index.
     */
     virtual int32 get_index_member_partial_length(); 
     /* 
        Index member can be ascending or descending. 
        Currently it is always used as ascending in mysql.
      */
     virtual bool  is_ascending();
};

/*
   @class  A_table_constraint
   @brief  Abstract table constraint clause.

Note: Check SQL Standard 5WD-02-Foundation-2003-09 4.17 
      Integrity Constraints for more description about the constraints.
 
   Constraint  :=  Table Constraint | Domain Constraint | Assertion
    Table Constraint  :=  Involves one or more columns in a table
    Domain constraint :=  Restricted datatype e.g integer value > 3
                         can be used against any column
    Assertion         := Involves multiple tables

    MySQL currently supports table constraints only.

   MySQL assumes all constraints are non-deferrable and checked immediate.
   [ SQL standard supports the notion of constraint being either deferrable 
     or not. If deferrable, the current constraint mode can be
     "deferred" or "immediate". ]

   TableConstraint :- unique | referential(foreign_key) | check 

   unique-constraint(column-list) :- defined by:  "unique" | "primary key"
   referential-constraint(referencing-columns) :- 
      specifies referenced-table(referenced-columns);
      The <referenced-columns> should be of type "unique".
      The match type (full,partial,simple) defines semantics of match when 
      some columns can be null.
      The referential triggered action specifies what to do :
         i.e. ON delete|update  restrict|cascade|set-null|no-action 
      referenced table may be same as referencing table.

   check-constraint : constraint applied per row on a given table.
                      currently parsed and ignored by MySQL.

   unique key and primary key must be associated with an index.
   Foreign key constraint does not require an associated index in
   referencing table.

 */

class A_table_constraint : public  A_object {

public:

   /* Get associated base table. */
   virtual A_base_table  *get_base_table();
   virtual int get_constraint_name(String *name); /*<Get constraint name */
   /* 
      Currently all MySQL constraints are not deferrable.
      so this always returns false for now.
    */ 
   virtual bool is_deferrable();    
   /*
     For mysql, all constraints are not deferrable and 
     always checked immediate. So this always returns false for now.
    */
   virtual bool is_initially_deferred();

   enum  A_constraint_type {
       UNIQUE_KEY_CONSTRAINT,
       PRIMARY_KEY_CONSTRAINT,
       FOREIGN_KEY_CONSTRAINT
   };

   /* Get constraint type :  unique | primary_key | foreign_key */
   virtual A_constraint_type  get_constraint_type();

};

/*
  @class A_fkey
  @brief Abstract foreign key referential constraint 
 */
class A_fkey : public A_table_constraint {

public:

    /* Get associated referencing table. */
    virtual A_base_table *get_referencing_table();

    /* Get associated referenced table. */
    virtual A_base_table *get_referenced_table();

    /* Get referencing columns.  */
    virtual RList<A_column> *get_referencing_columns();

    /* Get referenced columns. */
    virtual RList<A_column> *get_referenced_columns();

    /*
       @enum  A_fkey_match_type
       @brief Match type for foreign key referential constraint.

       Match type provides semantics of match when some 
       referencing/referenced columns could be null.
     */
    enum  A_fkey_match_type { 
       /* 
          Either all referencing columns are null or non-nulls. Every 
          non-null referencing column must match with a referenced column.
        */
       FKEY_MATCH_FULL=1,
       /* Non-null values of referencing cols must match with referenced.*/
       FKEY_MATCH_PARTIAL, 
       /* If atleast one referencing column is NULL, no constraint imposed.
          Otherwise all non-null columns must match.
        */
       FKEY_MATCH_SIMPLE 
    };
    virtual A_fkey_match_type get_match_type();

    /*
       @enum  A_fkey_action_type
       @brief Referential action types for implementing foreign keys.
              The action specifies what to do after delete|update of the 
              referenced row. This could be one of: 
                   restrict|cascade|set-null|no-action|set-default
              The no-action is the default action type.
     */
   enum A_fkey_action_type {
        FKEY_NO_ACTION = 0,     /* No action. */
        FKEY_RESTRICT = 1,      /* Reject update/delete */
        FKEY_CASCADE = 2,       /* Cascade update/delete */
        FKEY_SET_NULL = 3,      /* Set referencing columns as null*/
        FKEY_SET_DEFAULT = 4    /* Set referencing cols to def value*/
   };

   /*  Returns action to take when a row in referenced table is deleted. */
   virtual A_fkey_action_type  get_on_delete_action();

   /*  Returns action to take when a row in referenced table is updated. */
   virtual A_fkey_action_type  get_on_update_action();

};

/*
  @class A_unique_constraint
  @brief Abstract unique key constraint
 */
class A_ukey  : public A_table_constraint {
public :
   virtual A_index  *get_unique_index();
};

/*
  @class A_pkey
  @brief Abstract primary key constraint
 */
class A_pkey : public A_ukey {
};


}  // End namespace aqtdict
 

Retrieved from "http://forge.mysql.com/wiki/SE_API_QFPD"

This page has been accessed 2,806 times. This page was last modified 06:23, 13 April 2009.

Find

Browse
MySQLForge
Main Page
Current events
Recent changes
Random page
Help
Edit
Edit this page
Editing help
This page
Discuss this page
Post a comment
Printable version
Context
Page history
What links here
Related changes
My pages
Special pages
New pages
File list
Statistics
Bug reports
More...