Category: MySQLDevelopment

MySQL Internals Guided Tour

← Back to MySQL Internals overview page

Contents

[edit] A Guided Tour Of The MySQL Source Code

What we're about to do in this section is pick up the latest copy of the MySQL source code off the Internet. Then we'll get a list of the directories and comment on why they're there. Next we'll open up some of the files that are vital to MySQL's working, and comment on specific lines in the source code. We'll close off with a few pictures of file formats.

[edit] Getting the source tree

We want to download the latest, the very latest, version of the MySQL server. So we won't click Downloads on the MySQL Developer Zone page that's usually a few weeks old. Instead we'll use Bazaar, which is a revision control package, vaguely like CVS or Perforce. This is what MySQL's developers use every day, so what we download with Bazaar is usually less than a day old. If you've ever submitted a bug report and gotten the response "thanks, we fixed the bug in the source code repository" that means you can get the fixed version with Bazaar.

The MySQL Bazaar Howto provides detailed instructions on how to install and use Bazaar. The following commands are just a quick walkthrough (assuming you have Bazaar up and running on your local system and can access the Internet).

The general syntax to create a local copy (branch) of a Bazaar source code repository is:

shell> bzr branch <remote repository> <local name>

To obtain a local copy of the MySQL Server 6.0 source tree, this would require the following commands:

shell> bzr init-repo $HOME/mysql-server
shell> cd $HOME/mysql-server
shell> bzr branch lp:mysql-server/5.1 mysql-5.1

(The $HOME directory is usually your personal area that you're allowed to write to. If that's not the case, replace $HOME with your personal choice whenever it appears.)

There is a lot of code, so the first time you do this the download will take over an hour. That's if you're lucky.

If you're glitch-prone, you'll need to read the manual: Installing from the Development Source Tree.

On later occasions, you'll be doing what's called a bzr pull instead of a bzr branch, and it will go faster (as it will only fetch the recent changes that have been commited to the remote repository since the last time you pulled).

Directories, Alphabetical Order

After bzr branch finished you'll have some 40 new sets of files on your computer, as you'll be able to see with ls or dir.

BUILD
Docs
NEW-RPMS
SSL
VC++Files
bdb
client
cmd-line-utils
config
dbug
extra
heap
include
innobase
libmysql
libmysql_r
libmysqld
man
myisam
myisammrg
mysql-test
mysys
ndb
netware
os2
pstack
regex
scripts
server-tools
sql
sql-bench
sql-common
strings
support-files
tests
tools
vio
zlib

These will all be installed as directories below the directory that was created by the bzr branch' command. At first all these directory names might intimidate you, and that's natural. After all, MySQL is a big project. But we're here to show you that there's order in this apparent chaos.

[edit] The Major Directories

  1. BUILD
  2. client
  3. Docs
  4. myisam
  5. mysys
  6. sql
  7. vio

The orderly approach is to look first at the most important directories, then we'll look at the whole list in our second pass. So, first, let's look at what you'll find in just seven of the directories: BUILD, client, Docs, myisam, mysys, sql, and vio.

[edit] Major Directories: BUILD

The first major directory we'll look at is BUILD. It actually has very little in it, but it's useful, because one of the first things you might want to do with the source code is: compile and link it.

The example command line that we could use is

shell> ./BUILD/compile-pentium-debug --prefix=$HOME/mysql-bin

It invokes a batch file in the BUILD directory. When it's done, you'll have an executable MySQL server and client.

Or, um, well, maybe you won't. Sometimes people have trouble with this step because there's something missing in their operating system version, or whatever. Don't worry, it really does work, and there are people around who might help you if you have trouble with this step. Search for "build" in the archives of lists.mysql.com.

We, when we're done building, tend to install it with the following sequence:

shell> make
shell> make install
shell> $HOME/mysql-bin/bin/mysql_install_db\
 --basedir=$HOME/mysql-bin\
 --datadir=$HOME/mysql-bin/var

This puts the new MySQL installation files on

shell> $HOME/mysql-bin/libexec      -- for the server
shell> $HOME/mysql-bin/bin          -- for the mysql client
shell> $HOME/mysql-bin/var          -- for the databases

[edit] GNU debugger

Once you've got something that runs, you can put a debugger on it. We recommend use of the GNU debugger

http://www.gnu.org/software/gdb/documentation/

And many developers use the graphical debugger tool DDD - Data Display Debugger

http://www.gnu.org/software/ddd/manual/

These are free and common, they're probably on your Linux system already.

There are debuggers for Windows and other operating systems, of course don't feel left out just because we're mentioning a Linux tool name! But it happens that we do a lot of things with Linux ourselves, so we happen to know what to say. To debug the mysqld server, say:

shell> ddd --gdb --args \
     $HOME/mysql-bin/libexec/mysqld \
     --basedir=$HOME/mysql-bin \
     --datadir=$HOME/mysql-bin/var\
     --skip-networking

From this point on, it may be tempting to follow along through the rest of the "guided tour" by setting breakpoints, displaying the contents of variables, and watching what happens after starting a client from another shell. That would be more fun. But it would require a detour, to discuss how to use the debugger. So we'll plow forward, the dull way, noting what's in the directories and using a text editor to note what's in the individual files.

[edit] Running a Test with the Debugger

To run a test named some.test with the debugger in embedded mode you could do this:

  1. Run libmysqld/examples/test_run --gdb some.test. This creates a libmysqld/examples/test-gdbinit file which contains the required parameters for mysqltest.
  2. Make a copy of the test-gdbinit file (call it, for example, some-gdbinit). The test-gdbinit file will be removed after test-run --gdb has finished.
  3. Load libmysqld/examples/mysqltest_embedded into your favorite debugger, for example: gdb mysqltest_embedded.
  4. In the debugger, for example in gdb, do: --sou some-gdbinit

Now some.test is running, and you can see if it's passing or not.

If you just want to debug some queries with the embedded server (not the test), it's easier to just run libmysqld/examples/mysql. It's the embedded server-based clone of the usual mysql tool, and works fine under gdb or whatever your favorite debugger is.

[edit] Major Directories: client

The next major directory is mysql-5.0/client.

size   name          comment
 ----   ----          -------
 100034 mysql.cc     "The MySQL command tool"
  36913 mysqladmin.c maintenance of MYSQL databases
  22829 mysqlshow.c  show databases, tables, or columns
 + 12 more .c and .cc programs
 

It has the source code of many of your familiar favorites, like mysql, which everybody has used to connect to the MySQL server at one time or another. There are other utilities too in fact, you'll find the source of most client-side programs here. There are also programs for checking the password, and for testing that basic functions such as threading or access via SSL are possible.

You'll notice, by the way, that we're concentrating on the files that have extension of ".c" or ".cc". By now it's obvious that C is our principal language although there are some utilities written in Perl as well.

[edit] Major Directories: Docs

The next major directory is mysql-5.0/Docs.

With the Bazaar downloads, /Docs is nearly empty. Binary and source distributions include some pre-formatted documentation files, such as the MySQL Reference manual in Info format (for Unix) or CHM format (for Windows). The mysqldoc documentation repository is available separately from http://dev.mysql.com/tech-resources/sources.html. If you have Subversion, you can check out a copy of the repository with this command:

svn checkout http://svn.mysql.com/svnpublic/mysqldoc/

Some important files in the mysqldoc repository are:

name                            comment
 ----                            -------
 .../sample-data/world/world.sql script to make 'world' database
 .../internals/internals.xml     internals manual
 .../refman/*.xml                reference manual
 + several more .xml files
 

Our documents are written in XML using DocBook. The DocBook format is becoming popular among publishers, and of course there's lots of general documentation for it, for example at http://www.docbook.org/. For our immediate purpose, the most interesting directory might be the internals directory, because it contains the source for the Internals Manual that you're reading now.

At this moment, the Internals Manual has more than 100 pages of information, including some details about the formats of MySQL files that you won't find anywhere else, and a complete description of the message formats that the client and server use to communicate. Although it's rough and may contain errors and is obsolete in parts, it is a document that you must read to truly understand the workings of MySQL.

[edit] Major Directories: myisam

The next major directory is labelled myisam. We will begin by mentioning that myisam is one of what we call the MySQL storage engine directories.

The MySQL storage engine directories:
heap           -- also known as 'memory'
innodb         -- maintained by Innobase Oy
myisam         -- see next section!
ndb            -- ndb cluster

For example the heap directory contains the source files for the heap storage engine and the ndb directory contains the source files for the ndb storage engine.

But the files in those directories are mostly analogues of what's in the myisam directory, and the myisam directory is sort of a 'template'.

On the myisam directory, you'll find the programs that do file I/O. Notice that the file names begin with the letters mi, by the way. That stands for MyISAM, and most of the important files in this directory start with mi.

File handling programs on mysql-5.0/myisam:

size   name           comment
 ----   ----           -------
  40301 mi_open.c      for opening
   3593 mi_close.c     for closing
   1951 mi_rename.c    for renaming
 + more mi_*.c programs
 

Row handling programs on mysql-5.0/myisam:

size   name           comment
 ----   ----           -------
  29064 mi_delete.c    for deleting
   2562 mi_delete_all.c for deleting all
   6797 mi_update.c    for updating
  32613 mi_write.c     for inserting
 + more mi_*.c programs
 

Drilling down a bit, you'll also find programs in the myisam directory that handle deleting, updating, and inserting of rows. The only one that's a little hard to find is the program for inserting rows, which we've called mi_write.c instead of mi_insert.c.

Key handling programs on mysql-5.0/myisam:

size   name           comment
 ----   ----           -------
   4668 mi_rkey.c      for random key searches
   3646 mi_rnext.c     for next-key searches
  15440 mi_key.c       for managing keys
 + more mi_*.c programs
 

The final notable group of files in the myisam directory is the group that handles keys in indexes.

To sum up: (1) The myisam directory is where you'll find programs for handling files, rows, and keys. You won't find programs for handling columns we'll get to them a bit later. (2) The myisam directory is just one of the handler directories. The programs in the other storage engine directories fulfill about the same functions.

[edit] Major Directories: mysys

The next major directory is labelled mysys, which stands for MySQL System Library. This is the toolbox directory, for example it has low level routines for file access. The .c files in mysys have procedures and functions that are handy for calling by main programs, for example by the programs in the myisam directory. There are 115 .c files in mysys, so we only can note a sampling.

Sampling of programs on mysql-5.0/mysys

size   name           comment
 ----   ----           -------
  17684 charset.c      character sets
   6165 mf_qsort.c     quicksort
   5609 mf__tempfile.c temporary files
 + 112 more *.c programs
 

Example one: with charset.c routines, you can change the character set.

Example two: mf_qsort.c contains our quicksort package.

Example three: mf_tempfile.c has what's needed for maintaining MySQL's temporary files.

You can see from these examples that mysys is a hodgepodge. That's why we went to the trouble of producing extra documentation in this document to help you analyze mysys's contents.

[edit] Major Directories: sql

The next major directory is mysql-5.0/sql. If you remember your manual, you know that you must pronounce this: ess queue ell.

The "parser" programs on mysql-5.0/sql:

size   name           comment
 ----   ----           -------
  51326 sql_lex.cc     lexer
 230026 sql_yacc.yy    parser
 + many more *.cc programs
 

This is where we keep the parser. In other words, programs like sql_lex.cc and sql_yacc.yy are responsible for figuring out what's in an SQL command, and deciding what to do about it.

The "handler" programs on mysql-5.0/sql:

size   name             comment
 ----   ----             -------
  79798 ha_berkeley.cc   bdb
  56687 ha_federated.cc  federated (sql/med)
  61033 ha_heap.cc       heap (memory)
 214046 ha_innodb.cc     innodb
  47361 ha_myisam.cc     myisam
  14727 ha_myisammrg.cc  merge
 215091 ha_ndbcluster.cc ndb
 

This is also where we keep the handler programs. Now, you'll recall that the storage engine itself, for example myisam, is a separate directory. But here in the sql directory, we have programs which are responsible for determining which handler to call, formatting appropriate arguments, and checking results. In other words, the programs that begin with the letters ha are the handler interface programs, and there's one for each storage engine.

The "statement" routines in mysql-5.0/sql:

size   name           comment
 ----   ----           -------
  24212 sql_delete.cc  'delete ...' statement
   1217 sql_do.cc      'do ...'
  22362 sql_help.cc    'help ...'
  75331 sql_insert.cc  'insert ...'
 430486 sql_select.cc  'select ...'
 130861 sql_show.cc    'show ...'
  42346 sql_update.cc  'update ...'
 + many more sql_*.cc programs
 

Also in the sql directory, you'll find individual programs for handling each of the syntactical components of an SQL statement. These programs tend to have names beginning with sql_. So for the SELECT statement, check out sql_select.cc.

Thus, there are "statement" routines like sql_delete.c, sql_load.c, and sql_help.c, which take care of the DELETE, LOAD DATA, and HELP statements. The file names are hints about the SQL statements involved.

The "statement function" routines in mysql-5.0/sql:

size   name           comment
 ----   ----           -------
  19906 sql_string.cc  strings
   6152 sql_olap.cc    olap (rollup)
  14241 sql_udf.cc     user-defined functions
  17669 sql_union.cc   unions
 

Then there are the routines for components of statements, such as strings, or online analytical processing which at this moment just means ROLLUP, or user-defined functions, or the UNION operator.

[edit] Major Directories: vio

The final major directory that we'll highlight is labelled vio, for "virtual I/O".

The vio routines are wrappers for the various network I/O calls that happen with different protocols. The idea is that in the main modules one won't have to write separate bits of code for each protocol. Thus vio's purpose is somewhat like the purpose of Microsoft's winsock library.

That wraps up our quick look at the seven major directories. Just one summary chart remains to do.

[edit] The Flow

This is a diagram of the flow.

 User enters "INSERT" statement     /* client */
 |
 |
 Message goes over TCP/IP line      /* vio, various */
 |
 |
 Server parses statement            /* sql */
 |
 |
 Server calls low-level functions   /* myisam */
 |
 |
 Handler stores in file             /* mysys */

The diagram is very simplified it's a caricature that distorts important things, but remember that we've only discussed seven major directories so far: Docs, BUILD, and the five that you see here.

The flow works like this:

First, the client routines get an SQL statement from a user, allowing edit, performing initial checks, and so on.

Then, via the vio routines, the somewhat-massaged statement goes off to the server.

Next, the sql routines handle the parsing and call what's necessary for each individual part of the statement. Along the way, the sql routines will be calling the low level mysys routines frequently.

Finally, one of the ha (handler) programs in the sql directory will dispatch to an appropriate handler for storage. In this case we've assumed, as before, that the handler is myisam so a myisam-directory program is involved. Specifically, that program is mi_write.c, as we mentioned earlier.

Simple, eh?

[edit] The Open-source Directories

We're now getting into the directories which aren't "major". Starting with:

dbug
pstack
regex
strings
zlib

Now it's time to reveal a startling fact, which is we didn't write all of the source code in all of the source code directories all by ourselves. This list is, in a sense, a tribute to the idea of open source.

There's dbug, which is Fred Fish's debug library.

There's pstack, which displays the process stack.

There's regex, which is what we use for our regular expressions function.

There's strings, the meaning of which is obvious.

There's zlib, which is for Lempel-Ziv compression.

All of the programs in these directories were supplied by others, as open source code. We didn't just take them, of course. MySQL has checked and changed what's in these directories. But we acknowledge with thanks that they're the products of other projects, and other people's labor, and we only regret that we won't have time to note all the contributed or publicly available components of MySQL, in this manual.

[edit] The Internal and External Storage Engine Directories

Continuing with our extract from the directory list ...

bdb                           /* external */
heap
innobase                      /* external */
myisam
myisammrg
ndb

Let's go through the idea of storage engines once more, this time with a list of all the storage engines, both the ones that we produce, and the ones that others produce. We've already mentioned the internal ones so now we'll remark on the directories of the two common external storage engines BDB and innobase.

The BDB, or Berkeley Database, handler, is strictly the product of Sleepycat software. Sleepycat has a web page at sleepycat.com, which contains, among other things, documentation for their product. So you can download Sleepycat's own documentation of the source code in the BDB directory.

As for the innobase handler, which many of you probably use, you'll be happy to know that the comments in the files are reasonably clear (the InnoBase Oy people are pretty strict about comments). There are two chapters about it in this document.

[edit] The "OS Specific" Directories

netware
NEW-RPMS
os2
VC++Files

A few words are in order about the directories that contain files which relate to a particular environment that MySQL can run in.

The netware directory contains a set of files for interfacing with netware, and anyone who has an involvement with NetWare knows that we're allied with them, and so this is one of the directories that represents the joint enterprise.

The NEW-RPMS directory (empty at time of writing) is for Linux, and the os2 directory is for OS/2.

Finally, the VC++Files directory is for Windows. We've found that the majority of Windows programmers who download and build MySQL from source use Microsoft Visual C. In the VC++Files directory you will find a nearly complete replication of what's in all the other directories that we've discussed, except that the .c files are modified to account for the quirks of Microsoft tools.

Without endorsing by particular names, we should note that other compilers from other manufacturers also work.

[edit] Odds and Ends

Finally, for the sake of completeness, we'll put up a list of the rest of the directories those that we haven't had occasion to mention till now.

Source Code Administration Directories:
SCCS

Common .h Files:
include

GNU Readline library and related:
cmd-line-utils

Stand-alone Utility & Test Programs:
extra
mysql-test
repl-tests
support-files
tests
tools

You don't have to worry about the administration directories since they're not part of what you build.

You probably won't have to worry about the stand-alone programs either, since you just use them, you don't need to remake them.

There's an include directory that you SHOULD have a look at, because the common header files for programs from several directories are in here.

Finally, there are stand-alone utility and test programs. Strictly speaking they're not part of the "source code". But it's probably reassuring to know that there's a test suite, for instance. Part of the quality-assurance process is to run the scripts in the test suite before releasing.

And those are the last. We've now traipsed through every significant directory created during your download of the MySQL source package.

[edit] A Chunk of Code in /sql/sql_update.cc

Now, having finished with our bird's eye view of the source code from the air, let's take the perspective of the worms on the ground (which is another name for MySQL's developer staff -- turn on laugh track here).

int mysql_update(THD *thd, ...)
{
   ...
   if ((lock_tables(thd, table_list)))
     DBUG_RETURN(1); ...
   ...
   init_read_record(&info,thd,table,select,0,1); ...
   while (!(error=info.read_record(&info)) && !thd->killed)
   {
      ...
     if (!(error=table->file->update_row((byte*) table->record[1]),
                                          (byte*) table->record[0])))
        updated++;
      ...
      if (table->triggers)
        table->triggers->process_triggers(thd, TRG_EVENT_UPDATE, TRG_ACTION_AFTER);
      ...
     }
   ...
   if (updated && (error <= 0 || !transactional_table))
   {
     mysql_bin_log.write(&qinfo) && transactional_table);
     ...
}

Here's a snippet of code from a .c file in the sql directory, specifically from sql_update.cc, which as we mentioned earlier -- is invoked when there's an UPDATE statement to process.

The entire routine has many error checks with handlers for improbable conditions, and showing multiple screens would be tedious, so we've truncated the code a lot. Where you see an ellipsis (three dots in a row), that means "and so on".

So, what do we learn from this snippet of code? In the first place, we see that it's fairly conventional C code. A brace causes an indentation, instructions tend to be compact with few unnecessary spaces, and comments are sparse.

Abbreviations are common, for example thd stands for thread, you just have to get used to them. Typically a structure will be defined in a separate .h file.

Routine names are sometimes long enough that they explain themselves. For example, you can probably guess that this routine is opening and locking, allocating memory in a cache, initializing a process for reading records, reading records in a loop until the thread is killed or there are no more to read, storing a modified record for the table, and (after the loop is through) possibly writing to the log. Incidentally, a transactional table is usually a BDB or an InnoDB table.

Obviously we've picked out what's easy to follow, and we're not pretending it's all smooth sailing. But this is actual code and you can check it out yourself.

[edit] The Skeleton Of The Server Code

And now we're going to walk through something harder, namely the server.

WARNING WARNING WARNING: code changes constantly, so names and parameters may have changed by the time you read this.

Important files we'll be walking through:

/sql/mysqld.cc
/sql/sql_parse.cc
/sql/sql_prepare.cc
/sql/sql_insert.cc
/sql/ha_myisam.cc
/myisam/mi_write.c

This is not as simple as what we've just done. In fact we'll need multiple pages to walk through this one, and that's despite our use of truncation and condensation again. But the server is important, and if you can grasp what we're doing with it, you'll have grasped the essence of what the MySQL source code is all about.

We'll mostly be looking at programs in the sql directory, which is where mysqld and most of the programs for the SQL engine code are stored.

Our objective is to follow the server from the time it starts up, through a single INSERT statement that it receives from a client, to the point where it finally performs the low level write in the MyISAM file.

Walking Through The Server Code: /sql/mysqld.cc

  int main(int argc, char **argv)
  {
    _cust_check_startup();
    (void) thr_setconcurrency(concurrency);
    init_ssl();
    server_init();                             // 'bind' + 'listen'
    init_server_components();
    start_signal_handler();
    acl_init((THD *)0, opt_noacl);
    init_slave();
    create_shutdown_thread();
    create_maintenance_thread();
    handle_connections_sockets(0);             // !
    DBUG_PRINT("quit",("Exiting main thread"));
    exit(0);
  }

Here is where it all starts, in the main function of mysqld.cc.

Notice that we show a directory name and program name just above this snippet. We will do the same for all the snippets in this series.

By glancing at this snippet for a few seconds, you will probably see that the main function is doing some initial checks on startup, is initializing some components, is calling a function named handle_connections_sockets, and then is exiting. It's possible that acl stands for "access control" and it's interesting that DBUG_PRINT is something from Fred Fish's debug library, which we've mentioned before. But we must not digress.

In fact there are 150 code lines in the main function, and we're only showing 13 code lines. That will give you an idea of how much we are shaving and pruning. We threw away the error checks, the side paths, the optional code, and the variables. But we did not change what was left. You will be able to find these lines if you take an editor to the mysqld.cc program, and the same applies for all the other routines in the snippets in this series.

The one thing you won't see in the actual source code is the little marker "// !". This marker will always be on the line of the function that will be the subject of the next snippet. In this case, it means that the next snippet will show the handle_connection_sockets function. To prove that, let's go to the next snippet.

Walking Through The Server Code: /sql/mysqld.cc

  handle_connections_sockets (arg __attribute__((unused))
  {
     if (ip_sock != INVALID_SOCKET)
     {
       FD_SET(ip_sock,&clientFDs);
       DBUG_PRINT("general",("Waiting for connections."));
       while (!abort_loop)
       {
         new_sock = accept(sock, my_reinterpret_cast(struct sockaddr*)
           (&cAddr),
            &length);
         thd= new THD;
         if (sock == unix_sock)
         thd->host=(char*) localhost;
         create_new_thread(thd);            // !
         }

Inside handle_connections_sockets you'll see the hallmarks of a classic client/server architecture. In a classic client/server, the server has a main thread which is always listening for incoming requests from new clients. Once it receives such a request, it assigns resources which will be exclusive to that client. In particular, the main thread will spawn a new thread just to handle the connection. Then the main server will loop and listen for new connections but we will leave it and follow the new thread.

As well as the sockets code that we chose to display here, there are several variants of this thread loop, because clients can choose to connect in other ways, for example with named pipes or with shared memory. But the important item to note from this section is that the server is spawning new threads.

Walking Through The Server Code: /sql/mysqld.cc

  create_new_thread(THD *thd)
  {
    pthread_mutex_lock(&LOCK_thread_count);
    pthread_create(&thd->real_id,&connection_attrib,
        handle_one_connection,                        // !
        (void*) thd));
    pthread_mutex_unlock(&LOCK_thread_count);
  }

Here is a close look at the routine that spawns the new thread. The noticeable detail is that, as you can see, it uses a mutex or mutual exclusion object. MySQL has a great variety of mutexes that it uses to keep actions of all the threads from conflicting with each other.

Walking Through The Server Code: /sql/sql_parse.cc

handle_one_connection(THD *thd)
  {
    init_sql_alloc(&thd->mem_root, MEM_ROOT_BLOCK_SIZE, MEM_ROOT_PREALLOC);
    while (!net->error && net->vio != 0 && !thd->killed)
    {
      if (do_command(thd))            // !
        break;
    }
    close_connection(net);
    end_thread(thd,1);
    packet=(char*) net->read_pos;

With this snippet, we've wandered out of mysqld.cc. Now, we're in the sql_parse file, still in the sql directory. This is where the session's big loop is.

The loop repeatedly gets and does commands. When it ends, the connection closes. At that point, the thread will end and the resources for it will be deallocated.

But we're more interested in what happens inside the loop, when we call the do_command function.

Graphic:

   client           <===== MESSAGE ====>       server
                    <======PACKETS ====>

   Example:
   INSERT INTO Table1 VALUES (1);

To put it graphically, at this point there is a long-lasting connection between the client and one server thread. Message packets will go back and forth between them through this connection. For today's tour, let's assume that the client passes the INSERT statement shown on the Graphic, for the server to process.

Walking Through The Server Code: /sql/sql_parse.cc

bool do_command(THD *thd)
{
  net_new_transaction(net);
  packet_length=my_net_read(net);
  packet=(char*) net->read_pos;
  command = (enum enum_server_command) (uchar) packet[0];
  dispatch_command(command,thd, packet+1, (uint) packet_length);
// !
}

You've probably noticed by now that whenever we call a lower-level routine, we pass an argument named thd, which is an abbreviation for the word thread (we think). This is the essential context which we must never lose.

The my_net_read function is in another file called net_serv.cc. The function gets a packet from the client, uncompresses it, and strips the header.

Once that's done, we've got a multi-byte variable named packet which contains what the client has sent. The first byte is important because it contains a code identifying the type of message.

We'll pass that and the rest of the packet on to the dispatch_command function.

Walking Through The Server Code: /sql/sql_parse.cc

bool dispatch_command(enum enum_server_command command, THD *thd,
       char* packet, uint packet_length)
{
  switch (command) {
    case COM_INIT_DB:          ...
    case COM_REGISTER_SLAVE:   ...
    case COM_TABLE_DUMP:       ...
    case COM_CHANGE_USER:      ...
    case COM_EXECUTE:
         mysql_stmt_execute(thd,packet);
    case COM_LONG_DATA:        ...
    case COM_PREPARE:
         mysql_stmt_prepare(thd, packet, packet_length);   // !
    /* and so on for 18 other cases */
    default:
     send_error(thd, ER_UNKNOWN_COM_ERROR);
     break;
    }

And here's just part of a very large switch statement in sql_parse.cc. The snippet doesn't have room to show the rest, but you'll see when you look at the dispatch_command function that there are more case statements after the ones that you see here.

There will be we're going into list mode now and just reciting the rest of the items in the switch statement code for prepare, close statement, query, quit, create database, drop database, dump binary log, refresh, statistics, get process info, kill process, sleep, connect, and several minor commands. This is the big junction.

We have cut out the code for all of the cases except for two, COM_EXECUTE and COM_PREPARE.

Walking Through The Server Code: /sql/sql_prepare.cc

We are not going to follow what happens with COM_PREPARE. Instead, we are going to follow the code after COM_EXECUTE. But we'll have to digress from our main line for a moment and explain what the prepare does.

"Prepare:
Parse the query
Allocate a new statement, keep it in 'thd->prepared statements' pool
Return to client the total number of parameters and result-set
metadata information (if any)"

The prepare is the step that must happen before execute happens. It consists of checking for syntax errors, looking up any tables and columns referenced in the statement, and setting up tables for the execute to use. Once a prepare is done, an execute can be done multiple times without having to go through the syntax checking and table lookups again.

Since we're not going to walk through the COM_PREPARE code, we decided not to show its code at this point. Instead, we have cut and pasted some code comments that describe prepare. All we're illustrating here is that there are comments in the code, so you will have aid when you look harder at the prepare code.

Walking Through The Server Code: /sql/sql_parse.cc

  bool dispatch_command(enum enum_server_command command, THD *thd,
       char* packet, uint packet_length)
  {
  switch (command) {
    case COM_INIT_DB:          ...
    case COM_REGISTER_SLAVE:   ...
    case COM_TABLE_DUMP:       ...
    case COM_CHANGE_USER:      ...
    case COM_EXECUTE:
         mysql_stmt_execute(thd,packet);                   // !
    case COM_LONG_DATA:        ...
    case COM_PREPARE:
         mysql_stmt_prepare(thd, packet, packet_length);
    /* and so on for 18 other cases */
    default:
     send_error(thd, ER_UNKNOWN_COM_ERROR);
     break;
    }

Let's return to the grand central junction again in sql_parse.cc for a moment. The thing to note on this snippet is that the line which we're really going to follow is what happens for COM_EXECUTE.

Walking Through The Server Code: /sql/sql_prepare.cc

  void mysql_stmt_execute(THD *thd, char *packet)
  {
    if (!(stmt=find_prepared_statement(thd, stmt_id, "execute")))
    {
      send_error(thd);
      DBUG_VOID_RETURN;
    }
    init_stmt_execute(stmt);
    mysql_execute_command(thd);           // !
  }

In this case, the line that we're following is the line that executes a statement.

Notice how we keep carrying the THD thread and the packet along with us, and notice that we expect to find a prepared statement waiting for us, since this is the execute phase. Notice as well that we continue to sprinkle error-related functions that begin with the letters DBUG, for use by the debug library. Finally, notice that the identifier "stmt" is the same name that ODBC uses for the equivalent object. We try to use standard names when they fit.

Walking Through The Server Code: /sql/sql_parse.cc

  void mysql_execute_command(THD *thd)
       switch (lex->sql_command) {
       case SQLCOM_SELECT: ...
       case SQLCOM_SHOW_ERRORS: ...
       case SQLCOM_CREATE_TABLE: ...
       case SQLCOM_UPDATE: ...
       case SQLCOM_INSERT: ...                   // !
       case SQLCOM_DELETE: ...
       case SQLCOM_DROP_TABLE: ...
       }

In the mysql_execute_command function. we encounter another junction. One of the items in the switch statement is named SQLCOM_INSERT.

Walking Through The Server Code: /sql/sql_parse.cc

case SQLCOM_INSERT:
{
  my_bool update=(lex->value_list.elements ? UPDATE_ACL : 0);
  ulong privilege= (lex->duplicates == DUP_REPLACE ?
                    INSERT_ACL | DELETE_ACL : INSERT_ACL | update);
  if (check_access(thd,privilege,tables->db,&tables->grant.privilege))
    goto error;
  if (grant_option && check_grant(thd,privilege,tables))
    goto error;
  if (select_lex->item_list.elements != lex->value_list.elements)
  {
    send_error(thd,ER_WRONG_VALUE_COUNT);
    DBUG_VOID_RETURN;
  }
  res = mysql_insert(thd,tables,lex->field_list,lex->many_values,
                     select_lex->item_list, lex->value_list,
                     (update ? DUP_UPDATE : lex->duplicates));
// !
  if (thd->net.report_error)
    res= -1;
  break;
}

For this snippet, we've blown up the code around the SQLCOM_INSERT case in the mysql_execute_command function. The first thing to do is check whether the user has the appropriate privileges for doing an INSERT into the table, and this is the place where the server checks for that, by calling the check_access and check_grant functions. It would be tempting to follow those functions, but those are side paths. Instead, we'll follow the path where the work is going on.

Walking Through The Server Code: Navigation Aid

Some program names in the /sql directory:

Program Name          SQL statement type
 ------------          ------------------
 sql_delete.cc         DELETE
 sql_do.cc             DO
 sql_handler.cc        HANDLER
 sql_help.cc           HELP
 sql_insert.cc         INSERT            // !
 sql_load.cc           LOAD
 sql_rename.cc         RENAME
 sql_select.cc         SELECT
 sql_show.cc           SHOW
 sql_update.cc         UPDATE
 

Question: Where will mysql_insert() be?

The line that we're following will take us next to a routine named mysql_insert. Sometimes it's difficult to guess what program a routine will be in, because MySQL has no consistent naming convention. However, here is one aid to navigation that works for some statement types. In the sql directory, the names of some programs correspond to statement types. This happens to be the case for INSERT, for instance. So the mysql_insert program will be in the program sql_insert.cc. But there's no reliable rule.

(Let's add here a few sentences about the tags 'ctags' program. When an editor supports ctags (and the list is long, but vi and emacs of course are there), the function definition is one key press away - no guessing involved. In the above case, a vim user could press ^] on mysql_insert name and vim would open sql_insert.cc and position the curson on the first line of the mysql_insert() function. The tags help can be indispensable in everyday work.)

Walking Through The Server Code: /sql/sql_insert.cc

  int mysql_insert(THD *thd,TABLE_LIST *table_list, List<Item> &fields,
        List<List_item> &values_list,enum_duplicates duplic)
  {
    table = open_ltable(thd,table_list,lock_type);
    if (check_insert_fields(thd,table,fields,*values,1) ||
      setup_tables(table_list) ||
      setup_fields(thd,table_list,*values,0,0,0))
      goto abort;
    fill_record(table->field,*values);
    error=write_record(table,&info);                 // !
    query_cache_invalidate3(thd, table_list, 1);
    if (transactional_table)
      error=ha_autocommit_or_rollback(thd,error);
    query_cache_invalidate3(thd, table_list, 1);
    mysql_unlock_tables(thd, thd->lock);
    }

For the mysql_insert routine, we're just going to read what's in the snippet. What we're trying to do here is highlight the fact that the function names and variable names are nearly English.

Okay, we start by opening a table. Then, if a check of the fields in the INSERT fails, or if an attempt to set up the tables fails, or if an attempt to set up the fields fails, we'll abort.

Next, we'll fill the record buffer with values. Then we'll write the record. Then we'll invalidate the query cache. Remember, by the way, that MySQL stores frequently-used select statements and result sets in memory as an optimization, but once the insert succeeds the stored sets are invalid. Finally, we'll unlock the tables.

Walking Through The Server Code: /sql/sql_insert.cc

  int write_record(TABLE *table,COPY_INFO *info)
  {
    table->file->write_row(table->record[0];           // !
  }

You can see from our marker that we're going to follow the line that contains the words 'write row'. But this is not an ordinary function call, so people who are just reading the code without the aid of a debugger can easily miss what the next point is in the line of execution here. The fact is, 'write_row' can take us to one of several different places.

Walking Through The Server Code: /sql/handler.h

  /* The handler for a table type.
     Will be included in the TABLE structure */

  handler(TABLE *table_arg) :
table(table_arg),active_index(MAX_REF_PARTS),
    ref(0),ref_length(sizeof(my_off_t)),
block_size(0),records(0),deleted(0),
    data_file_length(0), max_data_file_length(0),
index_file_length(0),
    delete_length(0), auto_increment_value(0), raid_type(0),
    key_used_on_scan(MAX_KEY),
    create_time(0), check_time(0), update_time(0), mean_rec_length(0),
    ft_handler(0)
    {}
  ...
  virtual int write_row(byte * buf)=0;

To see what the write_row statement is doing, we'll have to look at one of the include files. In handler.h on the sql directory, we find that write_row is associated with a handler for a table. This definition is telling us that the address in write_row will vary it gets filled in at run time. In fact, there are several possible addresses.

There is one address for each handler. In our case, since we're using the default values, the value at this point will be the address of write_row in the MyISAM handler program.

Walking Through The Server Code: /sql/ha_myisam.cc

int ha_myisam::write_row(byte * buf)
{
  statistic_increment(ha_write_count,&LOCK_status);
   /* If we have a timestamp column, update it to the current time */
   if (table->time_stamp)
    update_timestamp(buf+table->time_stamp-1);
   /*
  If we have an auto_increment column and we are writing a changed row
    or a new row, then update the auto_increment value in the record.
  */
  if (table->next_number_field && buf == table->record[0])
    update_auto_increment();
  return mi_write(file,buf);     // !
}

And that brings us to write_row in the ha_myisam.cc program. Remember we told you that these programs beginning with the letters ha are interfaces to handlers, and this one is the interface to the myisam handler. We have at last reached the point where we're ready to call something in the handler package.

Walking Through The Server Code: /myisam/mi_write.c

int mi_write(MI_INFO *info, byte *record)
{
  _mi_readinfo(info,F_WRLCK,1);
  _mi_mark_file_changed(info);
  /* Calculate and check all unique constraints */
  for (i=0 ; i < share->state.header.uniques ; i++)
  {
    mi_check_unique(info,share->uniqueinfo+i,record,
      mi_unique_hash(share->uniqueinfo+i,record),
      HA_OFFSET_ERROR);
  }

  ... to be continued in next snippet

Notice that at this point there is no more referencing of tables, the comments are about files and index keys. We have reached the bottom level at last. Notice as well that we are now in a C program, not a C++ program.

In this first half of the mi_write function, we see a call which is clearly commented. This is where checking happens for uniqueness (not the UNIQUE constraint, but an internal matter).

Walking Through The Server Code: /myisam/mi_write.c

 ... continued from previous snippet

  /* Write all keys to indextree */
  for (i=0 ; i < share->base.keys ; i++)
  {
    share->keyinfo[i].ck_insert(info,i,buff,
      _mi_make_key(info,i,buff,record,filepos)
  }
  (*share->write_record)(info,record);
  if (share->base.auto_key)
    update_auto_increment(info,record);
}

In this second half of the mi_write function, we see another clear comment, to the effect that this is where the new keys are made for any indexed columns. Then we see the culmination of all that the last 20 snippets have been preparing, the moment we've all been waiting for, the writing of the record.

And, since the object of the INSERT statement is ultimately to cause a write to a record in a file, that's that. The server has done the job.

Walking Through The Server Code: Stack Trace

main in /sql/mysqld.cc
handle_connections_sockets in /sql/mysqld.cc
create_new_thread in /sql/mysqld.cc
handle_one_connection in /sql/sql_parse.cc
do_command in /sql/sql_parse.cc
dispatch_command in /sql/sql_parse.cc
mysql_stmt_execute in /sql/sql_prepare.cc
mysql_execute_command in /sql/sql_parse.cc
mysql_insert in /sql/mysql_insert.cc
write_record in /sql/mysql_insert.cc
ha_myisam::write_row in /sql/ha_myisam.cc
mi_write in /myisam/mi_write.c

And now here's a look at what's above us on the stack, or at least an idea of how we got here. We started with the main program in mysqld.cc. We proceeded through the creation of a thread for the client, the several junction processes that determined where we're heading, the parsing and initial execution of an SQL statement, the decision to invoke the MyISAM handler, and the writing of the row. We ended in a low level place, where we're calling the routines that write to the file. That's about as low as we should go today.

The server program would, of course, continue by returning several times in a row, sending a packet to the client saying "Okay", and ending up back in the loop inside the handle_one_connection function.

We, instead, will pause for a moment in awe at the amount of code we've just flitted past. And that will end our walk through the server code.

Graphic: A Chunk of MyISAM File

CREATE TABLE Table1 (
   column1 CHAR(1),
   column2 CHAR(1),
   column3 CHAR(1));

INSERT INTO Table1 VALUES ('a', 'b', 'c');

INSERT INTO Table1 VALUES ('d', NULL, 'e');

F1 61 62 63 00 F5 64 00 66 00 ... .abc..d e.

Continuing with our worm's-eye view, let's glance at the structure of a record in a MyISAM file.

The SQL statements on this graphic show a table definition and some insert statements that we used to populate the table.

The final line on the graphic is a hexadecimal dump display of the two records that we ended up with, as taken from the MyISAM file for Table1.

The thing to notice here is that the records are stored compactly. There is one byte at the start of each record F1 for the first record and F5 for the second record which contains a bit list.

When a bit is on, that means its corresponding field is NULL. That's why the second row, which has a NULL in the second column, or field, has a different header byte from the first row.

Complications are possible, but a simple record really does look this simple.

Graphic: A Chunk of InnoDB File

19 17 15 13 0C 06 Field Start Offsets /* First Row */
00 00 78 0D 02 BF Extra Bytes
00 00 00 00 04 21 System Column #1
00 00 00 00 09 2A System Column #2
80 00 00 00 2D 00 84 System Column #3
50 50 Field1 'PP'
50 50 Field2 'PP'
50 50 Field3 'PP'

If, on the other hand, you look at an InnoDB file, you'll find that it's got more complexities in the storage. The details are elsewhere in this document. But here's an introductory look.

The header here begins with offsets unlike MyISAM, which has no offsets. So you'd have to go through column 1 before getting to column 2.

Then there is a fixed header the extra bytes.

Then comes the record proper. The first fields of a typical record contain information that the user won't see, such as a row ID, a transaction ID, and a rollback pointer. This part would look different if the user had defined a primary key during the CREATE TABLE statement.

And finally there are the column contents the string of Ps at the end of the snippet here. You can see that InnoDB does more administrating.

There's been a recent change for InnoDB; what you see above is from a database made before version 5.0.

Graphic: A Packet

Header
Number Of Rows
ID
Status
Length
Message Content

Our final worm's-eye look at a physical structure will be a look at packets.

By packet, we mean: what's the format of a message that the client sends over the tcp/ip line to the server and what does the server send back?

Here we're not displaying a dump. If you want to see hexadecimal dumps of the contents of packets, this document is full of them. We're just going to note that a typical message will have a header, an identifier, and a length, followed by the message contents.

Admittedly this isn't following a standard like ISO's RDA or IBM's DRDA, but it's documented so if you want to go out and write your own type 4 JDBC driver, you've got what you need here. (Subject to license restrictions, of course.) But a word of advice on that last point: it's already been done. Mark Matthews wrote it originally, it's all in "MySQL Connector/J".

[edit] Recap

Okay, let's back up and restate. In this walkthrough, we've told you four main things.

One: How to get the MySQL source.

Two: What's in each directory in the source.

Three: The main sequence, as one walks through the server code.

Four: What physical structures look like.

We worked hard to make a description of the MySQL source that is simple, without distorting. If you were able to follow all that we've said, then that's wonderful, congratulations. If you ended up thinking that MySQL is really simple, well that's not what we wanted to convey, but we think you'll be disabused of that notion when you have a look at the code yourself.

Retrieved from "http://forge.mysql.com/wiki/MySQL_Internals_Guided_Tour"

This page has been accessed 46,240 times. This page was last modified 21:51, 24 January 2011.

Find

Browse
MySQLForge
Main Page
Current events
Recent changes
Random page
Help
Edit
Edit this page
Editing help
This page
Discuss this page
Post a comment
Printable version
Context
Page history
What links here
Related changes
My pages
Special pages
New pages
File list
Statistics
Bug reports
More...