Categories: Contributing | Development | MySQLUniversity

MySQL Storage Handler Interface

← Back to MySQL University main page
 Presenter: Brian Aker
 Time: Thursday 15. March 2007, at 6am PST = 9am EST = 15 CET = 16 EET
 Estimated length of Session: presenter to fill in here - note max 3 hours
 Moderator: Patrik Backman
 Scribe: MC Brown
 Scribe (for the Heidelberg Developer's Conference): Stefan
 Attendees: To register for this Session - Please enter your name here:
 Alexander Nozdrin
 Andrey Hristov
 HakanKuecuekyilmaz
 Marc Alff
 Martin MC Brown
 Sergei Golubchik
 Ingo Strüwing
 Hartmut Holzgraefe
 Giuseppe Maxia
 Dmitri Lenev
 Eric Herman (will have to leave early)
 Axel Schwenke
 Lars Thalmann
 Chris Powers
 Omer BarNir
 IgnacioGalarza
 User:TimSmith
 ArjenLentz
 User:GuilhemBichot
 Oleksandr Byelkin
 User:DanielFischer
 User:UlfWendel
 User:JanKneschke
 GeorgiKodinov

Contents

[edit] MySQL_Storage_Handler_Interface

http://krow.net/talks/New%20Storage%20Engine%20Talk.pdf

[edit] First Let us Talk about a st_plugin

 mysql_declare_plugin(memcache)
 {
    MYSQL_STORAGE_ENGINE_PLUGIN,
    &memcache_storage_engine,
    "MEMCACHE",
    "Brian Aker, Tangent Org",
    "Simple Interface for working with memcache as a storage engine",
    PLUGIN_LICENSE_GPL,
    memcache_init_func, /* Plugin Init */
    memcache_done_func, /* Plugin Deinit */
    0x0005,
    NULL,                  /* status variables        */
    NULL,                  /* system variables        */
    NULL                   /* config options          */
 },


[edit] Simple Handlerton

 static int example_init_func(void *p)
 {
   DBUG_ENTER("example_init_func");
   if (!example_init)
   {
     example_hton= (handlerton *)p;
     example_init= 1;
     VOID(pthread_mutex_init(&example_mutex,MY_MUTEX_INIT_FAST));
     (void) hash_init(&example_open_tables,system_charset_info,32,0,0,
                 (hash_get_key) example_get_key,0,0);
     example_hton->state= SHOW_OPTION_YES;
     example_hton->db_type= DB_TYPE_EXAMPLE_DB;
     example_hton->create= example_create_handler;
     example_hton->flags= HTON_CAN_RECREATE;
   }
   DBUG_RETURN(0);
 }

[edit] A Little More Complex

 federated_hton->db_type= DB_TYPE_FEDERATED_DB;
 federated_hton->commit= federated_commit;
 federated_hton->rollback= federated_rollback;
 federated_hton->create= federated_create_handler;
 federated_hton->panic= federated_db_end;
 federated_hton->flags= HTON_ALTER_NOT_SUPPORTED;

(If you want more complex, look at ha_ndbcluster.cc)

[edit] The handler

(think of it as a cursor)

[edit] Table Object Methods

[edit] bas_ext()

 static const char *ha_myisam_exts[[[]] = {
   ".MYI",
   ".MYD",
   NullS
 };
 const char **ha_myisam::bas_ext() const
 {
   return ha_myisam_exts;
 }

[edit] create()

 int ha_federated::create(const char *name, TABLE *table_arg,
                     HA_CREATE_INFO *create_info)
 {
   int retval;
   FEDERATED_SHARE tmp_share; // Only a temporary share, to test the url
   DBUG_ENTER("ha_federated::create");
   if (!(retval= parse_url(&tmp_share, table_arg, 1)))
     retval= check_foreign_data_source(&tmp_share, 1);
   my_free((gptr) tmp_share.scheme, MYF(MY_ALLOW_ZERO_PTR));
   DBUG_RETURN(retval);
 }


[edit] delete_table()

(not everyone has one)

 int ha_myisam::delete_table(const char *name)
 {
   return mi_delete_table(name);
 }

[edit] rename_table()

 void ha_heap::drop_table(const char *name)
 {
   heap_drop_table(file);
   close();
 }

[edit] Insert!

 T@3 : | | | | | | | | | | >ha_archive::open
 T@3 : | | | | | | | | | | | >ha_archive::get_share
 T@3 : | | | | | | | | | | | <ha_archive::get_share
 T@3 : | | | | | | | | | | | >ha_archive::create_record_buffer
 T@3 : | | | | | | | | | | | <ha_archive::create_record_buffer
 T@3 : | | | | | | | | | | <ha_archive::open
 T@3 : | | | | >ha_archive::start_bulk_insert
 T@3 : | | | | <ha_archive::start_bulk_insert
 T@3 : | | | | | >ha_archive::write_row
 T@3 : | | | | | | | >ha_archive::pack_row
 T@3 : | | | | | | | <ha_archive::pack_row
 T@3 : | | | | | <ha_archive::write_row
 T@3 : | | | | >ha_archive::end_bulk_insert
 T@3 : | | | | <ha_archive::end_bulk_insert

[edit] write_row()

 int ha_tina::write_row(byte * buf)
 {
   int size;
   DBUG_ENTER("ha_tina::write_row");
   if (share->crashed)
       DBUG_RETURN(HA_ERR_CRASHED_ON_USAGE);
   ha_statistic_increment(&SSV::ha_write_count);
   if (table->timestamp_field_type & TIMESTAMP_AUTO_SET_ON_INSERT)
     table->timestamp_field->set_time();
   size= encode_quote(buf);
   stats.records++;
   DBUG_RETURN(0);
 }

[edit] Read (Scan)

 T@3 : | | | | | | | >ha_archive::info
 T@3 : | | | | | | | <ha_archive::info
 T@3 : | | | | | | | | >ha_archive::rnd_init
 T@3 : | | | | | | | | <ha_archive::rnd_init
 T@3 : | | | | | | | >ha_archive::rnd_next
 T@3 : | | | | | | | | | | >ha_archive::unpack_row
 T@3 : | | | | | | | | | | <ha_archive::unpack_row
 T@3 : | | | | | | | | <ha_archive::get_row
 T@3 : | | | | | | | <ha_archive::rnd_next
 T@3 : | | | | | | | >ha_archive::rnd_next
 T@3 : | | | | | | | <ha_archive::rnd_next

[edit] info()

 int ha_archive::info(uint flag)
 {
   DBUG_ENTER("ha_archive::info");
   stats.records= share->rows_recorded;
   stats.deleted= 0;
   /* Costs quite a bit more to get all information */
   if (flag & HA_STATUS_TIME)
   {
     MY_STAT file_stat; // Stat information for the data file
     VOID(my_stat(share->data_file_name, &file_stat, MYF(MY_WME)));
     stats.mean_rec_length= table->s->reclength + buffer.alloced_length();
     stats.data_file_length= file_stat.st_size;
     stats.create_time= file_stat.st_ctime;
     stats.update_time= file_stat.st_mtime;
     stats.max_data_file_length= share->rows_recorded * stats.mean_rec_length;
   }
   stats.delete_length= 0;
   stats.index_file_length=0;
   if (flag & HA_STATUS_AUTO)
   {
     azflush(&archive, Z_SYNC_FLUSH);
     stats.auto_increment_value= archive.auto_increment;
   }
   DBUG_RETURN(0);
 }

[edit] rnd_init()

 int
 ha_innobase::rnd_init(
 /*==================*/
                   /* out: 0 or error number */
      bool scan) /* in: ???????? */
 {
      int err;
      row_prebuilt_t* prebuilt = (row_prebuilt_t*) innobase_prebui
      if (prebuilt->clust_index_was_generated) {
             err = change_active_index(MAX_KEY);
      } else {
             err = change_active_index(primary_key);
      }
      start_of_scan = 1;
      return(err);
 }

[edit] rnd_next()

 int
 ha_innobase::rnd_next(
 /*==================*/
                    /* out: 0, HA_ERR_END_OF_FILE, or error number */
      mysql_byte* buf)/* in/out: returns the row in this buffer,
                    in MySQL format */
 {
      int error;
      DBUG_ENTER("rnd_next");
      statistic_increment(current_thd->status_var.ha_read_rnd_next_count,
             &LOCK_status);
      if (start_of_scan) {
             error = index_first(buf);
             if (error == HA_ERR_KEY_NOT_FOUND) {
                    error = HA_ERR_END_OF_FILE;
             }
             start_of_scan = 0;
      } else {
             error = general_fetch(buf, ROW_SEL_NEXT, 0);
      }
      DBUG_RETURN(error);
 }

[edit] update_row()

 T@3 : | | | | >ha_tina::info
 T@3 : | | | | <ha_tina::info
 T@3 : | | | | | >ha_tina::rnd_init
 T@3 : | | | | | <ha_tina::rnd_init
 T@3 : | | | | | >ha_tina::extra
 T@3 : | | | | | <ha_tina::extra
 T@3 : | | | | >ha_tina::rnd_next
 T@3 : | | | | <ha_tina::rnd_next
 T@3 : | | | | >ha_tina::rnd_next
 T@3 : | | | | <ha_tina::rnd_next
 T@3 : | | | | >ha_tina::update_row
                       <------------------Update occurs here
 T@3 : | | | | <ha_tina::update_row
 T@3 : | | | | >ha_tina::rnd_next
 T@3 : | | | | <ha_tina::rnd_next
 T@3 : | | | | >ha_tina::update_row
 T@3 : | | | | <ha_tina::update_row
 T@3 : | | | | >ha_tina::rnd_next
 T@3 : | | | | <ha_tina::rnd_next
 T@3 : | | | | >ha_tina::extra
 T@3 : | | | | <ha_tina::extra
 T@3 : | | | | >ha_tina::rnd_end
 T@3 : | | | | | >ha_tina::write_meta_file
 T@3 : | | | | | <ha_tina::write_meta_file
 T@3 : | | | | <ha_tina::rnd_end

[edit] update_row()

 int ha_tina::update_row(const byte * old_data, byte * new_data)
 {
   int size;
   DBUG_ENTER("ha_tina::update_row");
   ha_statistic_increment(&SSV::ha_read_rnd_next_count);
   if (table->timestamp_field_type & TIMESTAMP_AUTO_SET_ON_UPDATE)
     table->timestamp_field->set_time();
   size= encode_quote(new_data);
   if (chain_append())
     DBUG_RETURN(-1);
   DBUG_RETURN(0);
 }

[edit] delete_row()

 T@3 : | | | | >ha_tina::info
 T@3 : | | | | <ha_tina::info
 T@3 : | | | | | >ha_tina::rnd_init
 T@3 : | | | | | <ha_tina::rnd_init
 T@3 : | | | | | >ha_tina::extra
 T@3 : | | | | | <ha_tina::extra
 T@3 : | | | | >ha_tina::rnd_next
 T@3 : | | | | <ha_tina::rnd_next
 T@3 : | | | | >ha_tina::rnd_next
 T@3 : | | | | <ha_tina::rnd_next
 T@3 : | | | | >ha_tina::delete_row
                   <--------------------- delete occurs here
 T@3 : | | | | <ha_tina::delete_row
 T@3 : | | | | >ha_tina::rnd_next
 T@3 : | | | | <ha_tina::rnd_next
 T@3 : | | | | >ha_tina::rnd_next
 T@3 : | | | | <ha_tina::rnd_next
 T@3 : | | | | >ha_tina::extra
 T@3 : | | | | <ha_tina::extra
 T@3 : | | | | >ha_tina::rnd_end
 T@3 : | | | | | >ha_tina::write_meta_file
 T@3 : | | | | | <ha_tina::write_meta_file
 T@3 : | | | | <ha_tina::rnd_end

[edit] delete_row()

 int ha_tina::delete_row(const byte * buf)
 {
   DBUG_ENTER("ha_tina::delete_row");
   ha_statistic_increment(&SSV::ha_delete_count);
   if (chain_append())
     DBUG_RETURN(-1);
   stats.records--;
   /* DELETE should never happen on the log table */
   DBUG_ASSERT(!share->is_log_table);
   DBUG_RETURN(0);
 }


[edit] index_read()

 int ha_memcache::read_range_first(const key_range *start_key,
               const key_range *end_key,
               bool eq_range, bool sorted)
 {
   DBUG_ENTER("ha_memcache::read_range_first");
   int ret;
   ret= find_row(table->record[[[0]], start_key->key,
             start_key->length);
   DBUG_RETURN(ret);
 }

[edit] index_next()

 int index_next(byte * buf)
 {
   return HA_ERR_END_OF_FILE;
 }

[edit] What about deining features?

[edit] Locks anyone?

[edit] store_lock()

 THR_LOCK_DATA **ha_tina::store_lock(THD *thd,
                          THR_LOCK_DATA **to,
                          enum thr_lock_type lock_type)
 {
   if (lock_type != TL_IGNORE && lock.type == TL_UNLOCK)
     lock.type=lock_type;
   *to++= &lock;
   return to;
 }

[edit] external_lock()

(yes, its a historical, and now horrible name)

 int ha_federated::external_lock(THD *thd, int lock_type)
 {
   int error= 0;
   ha_federated *trx= (ha_federated *)thd->ha_data[[[ht->slot]];
   DBUG_ENTER("ha_federated::external_lock");
   if (lock_type != F_UNLCK)
   {
     if (!(thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN)))
     {
       error= connection_autocommit(TRUE);
        trans_register_ha(thd, FALSE, ht);
     }
     else
     {
       DBUG_PRINT("info",("not autocommit"));
       if (!trx)
       {
         error= connection_autocommit(FALSE);
           thd->ha_data[[[ht->slot]]= this;
        }
       else
       {
         ha_federated *ptr;
         for (ptr= trx; ptr; ptr= ptr->trx_next)
           if (ptr == this)
             break;
           else if (!ptr->trx_next)
             ptr->trx_next= this;
       }
     }
   }
   DBUG_RETURN(0);
 }

[edit] Optimizer


[edit] What is missing?



[edit] Session notes

[edit] Questions & Answers

  1. Hartmut, Slide 2: "Are there plans for a plugin dependency mechanism to determine init/deinit order? (like we have for extensions in PHP land?)''
    • (Serg): Yes
    • (Brian): Yes there is. We didn't do this in the first version, but there is a plan so that you can say 'we require this plugin before startup'.
  2. dlenev, Slide 5: 'How 'cheap' is the handler? I mean one may want to have several cursors for the same operation and this case 'cheapness' comes into play.'
    • (Monty): The handler is relatively cheap to open again, especially if is already open. One current exception is the partition handler. Currently we don't support several cursors for the same handler, although it can easily be fixed with either cursor store/restore methods or a relatively cheap clone call. When using a handler as a cursor, the handler becomes a very lightweight dispatcher over the native handler implementation.
  3. TheK, Slide 11: 'Where did they get the handler for the fields?'
    • (Brian): The fields themselves are members of tables, and tables are a variable inside of the handler. For example:
 int ha_tina::encode_quote(byte *buf)
 {
  char attribute_buffer[1024];
  String attribute(attribute_buffer, sizeof(attribute_buffer),
                   &my_charset_bin);
  my_bitmap_map *org_bitmap= dbug_tmp_use_all_columns(table, table-
  buffer.length(0);
  for (Field **field=table->field ; *field ; field++)
  {
    const char *ptr;
    const char *end_ptr;
    /*
      Write an empty string to the buffer in case of a NULL value.
      Basically this is a safety check, as no one ensures that the
      field content is cleaned up every time we use 
      in the code.
    */
    if ((*field)->is_null())
    {
      buffer.append(STRING_WITH_LEN("\"\","));
(cheaper then current_thd())
      continue;
    }
  1. TheK : 'When did we get the handle to the table?'
    • (Brian): When the handler is created, we populate the handler with a reference to the actual table itself.
  2. Hakan, Slide 14: 'Does rnd == random || rnd == read next data?'
    • (Serg): It stands for Random, although the same functions are used for linear table reads.
    • (Brian): Yeah, random.
  3. Timothy, Slide 15: 'Why are statistics incremented in each handler, not at the upper level? it seems error-prone, or different engines could report statistics differently, which could make it confusing.'
    • (Serg): Correct, should be fixed (moved to ha_write_row()/etc wrappers). and it should be done in backward-compatible way (by defining ha_statistic_increment to be no-op)
    • (Brian): The answer to that is an historical question. We didn't have all handler functions in all engines, so we implemented them in lower layers, rather than upper ones. But yes, this should be fixed so that they are in upper layers in the wrapper functions.
  4. dlenev, Slide 22: 'What is the difference between table_flags() and handlerton flags ? Are there any capabilities which are dynamical ?'
    • (Serg): table_flags() is for dynamic capabilities, handlerton flags for static. You need to create/open a handler to access to the table_flags. So every flag that you need to know about in advance (i.e. that needs to be known before opening the table) should be in the handlerton flags.
    • (Brian): index_flags are dynamic. I'm not aware of any engine at the moment that uses dynamic table_flags. Table flags should probably be merged with handlerton flags. It's cheap to access handlerton flags because they are global in nature.
  5. Marc Alff, Slide 25: 'Why providing the full thd->ha_data array to each engine, as opposed to provide only the engine slot and rely on the storage engine code to usethe proper ht->slot ?'
    • (Brian): We don't have a special transaction object, which is why we have to use this current method.
  6. Sanja, Slide 25: 'How do you think external_lock() should be named?'
    • (Brian): It should still exist, but I think it should be used only for people who interested ion something like the flock() call. But we should remove the transaction state from it. I think the transaction state should be handled by the handlerton for the storage engine. So external_locks() should exist, it just shouldn't be handling transactions.
  7. Kostja, Slide 25: 'We don't call store_lock/external_lock for temporary tables. Why?
    • (Brian): We do not call store_lock/external_lock on internal temporary tables which are created either as memory or MyISAM.
    • (Monty): We do call external_lock for all transactional temporary tables, it is only non-transational temporary tables where we don't call extgern_lock/unlock.
    • Brian-> The difference in the above two answers is that Monty is referring to tmp tables created by users, I am talking about tmp tables created internally for use in GROUP BY, etc...

Questions and answers at the Heidelberg Developers Meeting

  1. write_row(): Why hasn't the statistics info (on/off) been moved to the layer above?
    • For historical reasons; it should actually be moved above.
  2. update_row(): Does Falcon use the FIELD class?
    • Yes.

Retrieved from "http://forge.mysql.com/wiki/MySQL_Storage_Handler_Interface"

This page has been accessed 6,200 times. This page was last modified 22:09, 24 January 2011.

Find

Browse
MySQLForge
Main Page
Current events
Recent changes
Random page
Help
Edit
Edit this page
Editing help
This page
Discuss this page
Post a comment
Printable version
Context
Page history
What links here
Related changes
My pages
Special pages
New pages
File list
Statistics
Bug reports
More...