MySQL Internals Coding Guidelines
Coding Guidelines
This section shows the guidelines that MySQL's developers follow when writing new code. Consistent style is important for us, because everyone must know what to expect. For example, after we become accustomed to seeing that everything inside an if is indented two spaces, we can glance at a listing and understand what's nested within what. Writing non-conforming code can be bad. For example, if we want to find where assignments are made to variable mutex_count, we might search for mutex_count with an editor and miss assignments that look like mutex_count = with a space before the equal sign (which is non-conforming). Knowing our rules, you'll find it easier to read our code, and when you decide to contribute (which we hope you'll consider!) we'll find it easier to read and review your code.
General Development Guidelines
- We use Bazaar for source management. More information about how to use it can be found in the MySQL Bazaar Howto.
- You should use the TRUNK source tree (currently called "mysql-5.4") for all new developments. See the Development Cycle for a more detailed description of the development model. The public development branch can be downloaded with
shell> bzr branch lp:mysql-server/5.4 mysql-5.4
- If you have any questions about the MySQL source, you can post them to the Internals Mailing List and we will answer them.
- Before making big design decisions, please begin by posting a summary of what you want to do, why you want to do it, and how you plan to do it. This way we can easily provide you with feedback and also discuss it thoroughly. Perhaps another developer can assist you.
C/C++ Coding Guidelines of MySQL Server
This section covers guidelines for C/C++ code for the MySQL server. The guidelines do not necessarily apply for other projects such as MySQL Connector/J or Connector/ODBC.
How we maintain the server coding guidelines
We are committed to having a single coding style for the core MySQL server. Storage engines, however, may have their own coding style: the Falcon and NDB styles are documented later in this manual.
The server coding style is governed by a group of representatives from each technical team: Optimizer, Runtime, Replication, Backup, Engines and Maria.
Currently these representatives are:
- Sergey Petrunia - Optimizer
- Konstantin Osipov - Runtime
- Mats Kindahl - Replication
- Chuck Bell - Backup
- Sergey Vojtovich - Engines
- Guilhem Bichot - Maria
The group accepts and considers change proposals. Each proposal must include an implementation strategy, and is first published on Internals mailing list for a public discussion. When the discussion is over, the group of representatives holds a vote, and the change is accepted if it's approved by a simple majority of the ballots. The submitter of the change request then carries out its implementation.
Now to the coding style itself.
Indentation and Spacing
- For indentation use spaces; do not use the tab (\t) character. See the editor configuration tips at the end of this section for instructions on configuring a vim or emacs editor to use spaces instead of tabs.
- Use line feed (\n) for line breaks. Do not use carriage return + line feed (\r\n); that can cause problems for other users and for builds. This rule is particularly important if you use a Windows editor.
- To begin indenting, add two spaces. To end indenting, subtract two spaces. For example:
{
  code, code, code
  {
    code, code, code
  }
}
- The maximum line width is 80 characters. If you are writing a longer line, try to break it at a logical point and continue on the next line with the same indenting. Use of backslash is okay; however, multi-line literals might cause less confusion if they are defined before the function start.
- You may use empty lines (two line breaks in a row) wherever it seems helpful for readability. But never use two or more empty lines in a row. The only exception is after a function definition (see below).
- To separate two functions, use three line breaks (two empty lines). To separate a list of variable declarations from executable statements, use two line breaks (one empty line). For example:
int function_1()
{
  int i;
  int j;

  function0();
}


int function2()
{
  return 0;
}
- Matching '{}' (left and right braces) should be in the same column, that is, the closing '}' should be directly below the opening '{'. Do not put any non-space characters on the same line as a brace, not even a comment. Indent within braces. Exception: if there is nothing between two braces, i.e. '{}', they should appear together. For example:
if (code, code, code)
{
  code, code, code;
}
for (code, code, code)
{}
- Indent switch like this:
switch (condition)
{
case XXX:
  statements;
case YYY:
  {
    statements;
  }
}
- You may align variable declarations like this:
Type      value;
int       var2;
ulonglong var3;
- When assigning to a variable, put zero spaces after the target variable name, then the assignment operator ('=' '+=' etc.), then space(s). For single assignments, there should be only one space after the equal sign. For multiple assignments, add additional spaces so that the source values line up. For example:
a/= b;
return_value= my_function(arg1);
...
int x=       27;
int new_var= 18;
Align assignments from one structure to another, like this:
foo->member=      bar->member;
foo->name=        bar->name;
foo->name_length= bar->name_length;
- Put separate statements on separate lines. This applies for both variable declarations and executable statements. For example, this is wrong:
int x= 11; int y= 12;

z= x; y+= x;
This is right:
int x= 11;
int y= 12;

z= x;
y+= x;
- Put spaces both before and after binary comparison operators ('>', '==', '>=', etc.), binary arithmetic operators ('+' etc.), and binary Boolean operators ('||' etc.). Do not put spaces around unary operators like '!' or '++'. Do not put spaces around [de-]referencing operators like '->' or '[]'. Do not put space after '*' when '*' introduces a pointer. Do not put spaces after '('. Put one space after ')' if it ends a condition, but not if it ends a list of function arguments. For example:
int *var;

if ((x == y + 2) && !param->is_signed)
  function_call();
- When a function has multiple arguments separated by commas (','), put one space after each comma. For example:
ln= mysql_bin_log.generate_name(opt_bin_logname, "-bin", 1, buf);
- Put one space after a keyword which introduces a condition, such as if or for or while.
- After if or else or while, when there is only one instruction after the condition, braces are not necessary and the instruction goes on the next line, indented.
if (sig != MYSQL_KILL_SIGNAL && sig != 0)
  unireg_abort(1);
else
  unireg_end();
while (*val && my_isspace(mysqld_charset, *val))
  *val++;
- In function declarations and invocations: there is no space between function name and '('; there is no space or line break between '(' and the first argument; if the arguments do not fit on one line then align them. Examples:
Return_value_type *Class_name::method_name(const char *arg1,
                                           size_t arg2, Type *arg3)

return_value= function_name(argument1, argument2, long_argument3,
                            argument4,
                            function_name2(long_argument5,
                                           long_argument6));

return_value=
  long_long_function_name(long_long_argument1, long_long_argument2,
                          long_long_long_argument3,
                          long_long_argument4,
                          long_function_name2(long_long_argument5,
                                              long_long_argument6));

Long_long_return_value_type *
Long_long_class_name::
long_long_method_name(const char *long_long_arg1, size_t long_long_arg2,
                      Long_long_type *arg3)
(You may but don't have to split Class_name::method_name into two lines.)
When arguments do not fit on one line, consider renaming them.
- Format constructors in the following way:
Item::Item(int a_arg, int b_arg, int c_arg)
  :a(a_arg), b(b_arg), c(c_arg)
{}
But keep lines short to make them more readable:
Item::Item(int longer_arg, int more_longer_arg)
  :longer(longer_arg),
  more_longer(more_longer_arg)
{}
If a constructor can fit into one line:
Item::Item(int a_arg) :a(a_arg) {}
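As a minimal sketch, several of the indentation and spacing rules above can be seen together in one short function. The function name count_leading_spaces and its parameter are hypothetical and chosen only for illustration; they do not come from the MySQL source.

```cpp
#include <stddef.h>


/**
  Count the spaces at the start of a string.

  Hypothetical helper, used only to illustrate the layout rules.

  @param str  the string to inspect, may be NULL

  @return Number of leading space characters
*/

size_t count_leading_spaces(const char *str)
{
  size_t count= 0;                              /* no space before '=' */

  if (!str)                                     /* space after 'if' */
    return 0;
  while (*str == ' ')                           /* spaces around '==' */
  {
    count++;
    str++;
  }
  return count;
}
```

Note the two-space indents, the braces on their own lines, the empty line between declarations and statements, and the side comments that start in column 49.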
Naming Conventions
- For identifiers formed from multiple words, separate each component with underscore rather than capitalization. Thus, use my_var instead of myVar or MyVar.
- Avoid capitalization except for class names; class names should begin with a capital letter.
class Item;
class Query_arena;
class Log_event;
- Avoid function names, structure elements, or variables that begin or end with '_'.
- Use long function and variable names in English. This will make your code easier to read for all developers.
- Structure types are typedef'ed to an all-upper-case identifier.
- All #define declarations should be in upper case.
#define MY_CONSTANT 15
- Enumeration names should begin with enum_.
- Function declarations (forward declarations) have parameter names in addition to parameter types.
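The naming rules above could be combined as in the following sketch. Every identifier in it (MAX_RETRY_COUNT, enum_retry_state, RETRY_INFO, Retry_tracker) is invented for this example and is not taken from the server code.

```cpp
#define MAX_RETRY_COUNT 3                       /* #define in upper case */

enum enum_retry_state                           /* name begins with enum_ */
{
  RETRY_IDLE,
  RETRY_ACTIVE,
  RETRY_EXHAUSTED
};

typedef struct st_retry_info
{
  int attempt_count;
  enum enum_retry_state state;
} RETRY_INFO;                                   /* all-upper-case typedef */

class Retry_tracker                             /* class name is capitalized */
{
public:
  Retry_tracker();
  enum enum_retry_state register_attempt();

private:
  RETRY_INFO retry_info;
};


Retry_tracker::Retry_tracker()
{
  retry_info.attempt_count= 0;
  retry_info.state= RETRY_IDLE;
}


/* Record one more attempt and return the resulting state. */

enum enum_retry_state Retry_tracker::register_attempt()
{
  retry_info.attempt_count++;
  if (retry_info.attempt_count >= MAX_RETRY_COUNT)
    retry_info.state= RETRY_EXHAUSTED;
  else
    retry_info.state= RETRY_ACTIVE;
  return retry_info.state;
}
```

With MAX_RETRY_COUNT set to 3, the first two calls to register_attempt() report RETRY_ACTIVE and the third reports RETRY_EXHAUSTED.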
Commenting Code
- Comment your code when you do something that someone else may think is not trivial.
- As of 2007, MySQL uses Doxygen comments for file, section, class, structure, function and method comments. The source code documentation generated by Doxygen is available online. MySQL uses the Javadoc flavor of Doxygen commenting style, e.g., @tag rather than \tag. The basic tags in use are @param, @retval and (maybe) @return; occasionally @see, @warning, @note, @todo, @file, (rarely) @class, (rarely) @fn, and (only when necessary) @brief and @details.
- Put a comment in front of its subject. In particular, function and method comments should be placed in front of implementation rather than declaration. Class comments should be put in front of class declaration.
- When writing multi-line comments please put the '/*' and '*/' on their own lines, put the '*/' below the '/*', put a line break and a two-space indent after the '/*', do not use additional asterisks on the left of the comment.
/*
  This is how a multi-line comment in the middle of code
  should look.  Note it is not Doxygen-style if it's not at the
  beginning of a code enclosure (function or class).
*/

/* *********  This comment is bad. It's indented incorrectly, it has
 *            additional asterisks. Don't write this way.
 *  *********/
- When writing single-line comments, the '/*' and '*/' are on the same line. For example:
/* We must check if stack_size = 0 as Solaris 2.9 can return 0 here */
- For a short comment at the end of a line, you may use either /* ... */ or a // double slash. In C files or in header files used by C files, avoid // comments.
- Align short side // or /* ... */ comments by 48 column (start the comment in column 49).
{
  qc*= 2;                                       /* double the estimation */
}
- When commenting members of a structure or a class, align comments by 48th column. If a comment doesn't fit into one line, move it to a separate line. Do not create multiline comments aligned by 48th column.
struct st_mysql_stmt
{
  ...
  MYSQL_ROWS *data_cursor; /**< current row in cached result */
  /* copy of mysql->affected_rows after statement execution */
  my_ulonglong affected_rows;
  my_ulonglong insert_id; /**< copy of mysql->insert_id */
  /*
    mysql_stmt_fetch() calls this function to fetch one row (it's different
    for buffered, unbuffered and cursor fetch).
  */
  int (*read_row_func)(struct st_mysql_stmt *stmt,
  ...
};
- All comments should be in English.
- Each standalone comment must start with a capital letter.
- There is a '.' at the end of each statement in a comment paragraph (for the last one as well).
/*
  This is a standalone comment. The comment is aligned to fit 79
  characters per line. There is a dot at the end of each sentence.
  Including the last one.
*/
- Every structure, class, method or function should have a description unless it is very short and its purpose is obvious.
- Use the below example as a template for function or method comments.
- Please refer to Doxygen Manual and CommunityDoxygenProject for additional information.
- Note the IN and OUT parameters. IN is implicit, but can (but usually shouldn't) be specified with tag @param[in]. For OUT and INOUT parameters you should use tags @param[out] and @param[in,out] respectively.
- Parameter specifications in @param section start with lowercase and are not terminated with a full stop/period.
- Section headers are aligned at 2 spaces. This must be a sentence with a full stop/period at the end. Iff the sentence must express a subject that contains a full stop such that Doxygen would be fooled into stopping early, then use the @brief and @details to explicitly mark them.
- Align @retval specifications at 4 spaces if they follow a @return description. Else, align at two spaces.
- Separate sections with an empty line.
- All function comments should be no longer than 79 characters per line.
- Put two line breaks (one empty line) between a function comment and its description.
/**
  Initialize SHA1Context.

  Set initial values in preparation for computing a new SHA1 message digest.

  @param[in,out]  context  the context to reset

  @return Operation status
    @retval SHA_SUCCESS      OK
    @retval != SHA_SUCCESS   sha error Code
*/

int sha1_reset(SHA1_CONTEXT *context)
{
  ...
Additional suggestions
- Try to write code in a lot of black boxes that can be reused or at least use a clean, easy to change interface.
- Reuse code; there are already many algorithms in MySQL that can be reused for list handling, queues, dynamic and hashed arrays, sorting, etc.
- Use the my_* functions like my_read()/my_write()/my_malloc() that you can find in the mysys library, instead of the direct system calls; this will make your code easier to debug and more portable.
- Use libstring functions (in the strings directory) instead of standard libc string functions whenever possible. For example, use bfill() and bzero() instead of memset().
- Try to always write optimized code, so that you don't have to go back and rewrite it a couple of months later. It's better to spend 3 times as much time designing and writing an optimal function than having to do it all over again later on.
- Avoid CPU wasteful code, even when its use is trivial, to avoid developing sloppy coding habits.
- If you can do something in fewer lines, please do so (as long as the code will not be slower or much harder to read).
- Do not check the same pointer for NULL more than once.
- Never use a macro when an (inline) function would work as well.
- Do not make a function inline if you don't have a very good reason for it. In many cases, the extra code that is generated is more likely to slow down the resulting code than give a speed increase because the bigger code will cause more data fetches and instruction misses in the processor cache.
It is okay to use inline functions which satisfy most of the following requirements:
- The function is very short (just a few lines).
- The function is used in a speed critical place and is executed over and over again.
- The function is handling the normal case, not some extra functionality that most users will not use.
- The function is rarely called. (This restriction must be followed unless the function translates to fewer than 16 assembler instructions.)
- The compiler can do additional optimizations with inlining and the resulting function will be only a fraction of size of the original one.
- Think assembly - make it easier for the compiler to optimize your code.
- Avoid using malloc(), which is very slow. For memory allocations that only need to live for the lifetime of one thread, use sql_alloc() instead.
- Functions should return zero on success, and non-zero on error, so you can do:
if (a() || b() || c())
  error("something went wrong");
However, short-circuit evaluation like that above is not the best method for evaluating options.
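A minimal sketch of the zero-on-success convention follows. The helper names check_name, check_length and validate_column, and the limit of 64, are assumptions made for this illustration only.

```cpp
#include <stddef.h>


/* Hypothetical check: returns 0 if the name is usable, 1 otherwise. */

static int check_name(const char *name)
{
  return name == NULL || name[0] == '\0';
}


/* Hypothetical check: returns 0 if the length is accepted, 1 otherwise. */

static int check_length(size_t length)
{
  return length > 64;
}


/**
  Run all checks and stop at the first one that fails.

  @param name    column name to check
  @param length  column length to check

  @return Operation status
    @retval 0  all checks passed
    @retval 1  at least one check failed
*/

int validate_column(const char *name, size_t length)
{
  if (check_name(name) || check_length(length))
    return 1;
  return 0;
}
```

Because every helper returns non-zero on error, the '||' chain stops at the first failing check and the caller only has to test one result.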
- Using goto is okay if not abused.
- If you have an 'if' statement that ends with a 'goto' or 'return' you should NOT have an else statement:
if (a == b)
  return 5;
else
  return 6;
->
if (a == b)
  return 5;
return 6;
- Avoid default variable initializations. Use LINT_INIT() if the compiler complains after making sure that there is really no way the variable can be used uninitialized.
- Use TRUE and FALSE instead of true and false in C++ code. This makes the code more readable and makes it easier to use it later in a C library, if needed.
- bool exists only in C++. In C, you have to use my_bool (which is char); it has different cast rules than bool:
int c= 256*2;
bool a= c;            /* a gets 'true' */
my_bool b= c;         /* b gets zero, i.e. 'false': BAD */
my_bool b= test(c);   /* b gets 'true': GOOD */
In C++, use bool, unless the variable is used in C code (for example the variable is passed to a C function).
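The difference in cast rules can be reproduced outside the server with a small sketch. The typedef and the macro test_bool below are stand-ins written for this example (assumed to behave like my_bool and test() in the text above), and the truncation to zero assumes the usual 8-bit char with wrap-around conversion.

```cpp
/* Stand-in for the my_bool type described above (a plain char). */
typedef char my_bool;

/* Stand-in for the test() macro: maps any non-zero value to 1. */
#define test_bool(a) ((a) ? 1 : 0)


/* Plain narrowing: only the low byte of the int survives. */

my_bool narrow_to_my_bool(int value)
{
  return (my_bool) value;
}


/* Normalize to 0 or 1 first, then narrow. */

my_bool normalize_to_my_bool(int value)
{
  return (my_bool) test_bool(value);
}
```

For a value such as 256*2 the low byte is zero, so narrow_to_my_bool() yields 0 while normalize_to_my_bool() yields 1, which matches what a C++ bool would hold.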
- Do not instantiate a class if you do not have to.
- Use pointers rather than array indexing when operating on strings.
- Never pass parameters with the &variable_name construct in C++. Always use a pointer instead!
The reason is that the above makes it much harder for the one reading the caller function code to know what is happening and what kind of code the compiler is generating for the call.
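To make the readability argument concrete, here is a small sketch with two hypothetical functions that do the same work. Neither name comes from the MySQL source.

```cpp
/* Discouraged: a call such as increment_by_reference(count) gives no hint
   that count is modified. */

void increment_by_reference(int &count)
{
  count++;
}


/* Preferred: the caller has to write increment_by_pointer(&count), so the
   possible modification is visible at the call site. */

void increment_by_pointer(int *count)
{
  (*count)++;
}
```

Both functions have the same effect; only the pointer version shows in the calling code that the argument can change.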
- Do not use the %p marker of printf() (fprintf(), vprintf(), etc) because it leads to different outputs (for example on some Linux and Mac OS X the output starts with 0x while it does not on some Solaris). In MySQL 6.0 and later, use my_vsnprint, DBUG_PRINT with %p for pointer formatting consistent across different platforms. In earlier versions, use printf-family functions with 0x%lx, but beware it truncates pointers on 64-bit Windows. Being sure that there is always 0x enables us to quickly identify pointer values in the DBUG trace.
- Relying on loop counter variables being local to the loop body if declared in the for statement is not portable. Some compilers still don't implement this ANSI C++ specification. The symptom of such use is an error like this:
c-1101 CC: ERROR File = listener.cc, Line = 187
"i" has already been declared in the current scope.
for (int i= 0; i < num_sockets; i++)
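One way to stay clear of the loop counter problem is to declare the counter once, before the first loop, and reuse it. The sketch below assumes a hypothetical function count_max_occurrences; it is not taken from the server code.

```cpp
/**
  Count how often the largest value occurs in an array.

  @param values      array to scan
  @param num_values  number of elements in the array

  @return Number of elements equal to the maximum, 0 for an empty array
*/

int count_max_occurrences(const int *values, int num_values)
{
  int i;                                        /* shared by both loops */
  int max_value;
  int occurrences= 0;

  if (num_values <= 0)
    return 0;
  max_value= values[0];
  for (i= 1; i < num_values; i++)
  {
    if (values[i] > max_value)
      max_value= values[i];
  }
  for (i= 0; i < num_values; i++)
  {
    if (values[i] == max_value)
      occurrences++;
  }
  return occurrences;
}
```

Since i is declared at function scope, the second loop compiles the same way whether or not the compiler limits for-statement declarations to the loop body.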
Suggested mode in emacs
(require 'font-lock)
(require 'cc-mode)
(setq global-font-lock-mode t) ;;colors in all buffers that support it
(setq font-lock-maximum-decoration t) ;;maximum color
(c-add-style "MY"
 '("K&R"
   (c-basic-offset . 2)
   (c-comment-only-line-offset . 0)
   (c-offsets-alist . ((statement-block-intro . +)
                       (knr-argdecl-intro . 0)
                       (substatement-open . 0)
                       (label . -)
                       (statement-cont . +)
                       (arglist-intro . c-lineup-arglist-intro-after-paren)
                       (arglist-close . c-lineup-arglist)
                       (innamespace . 0)
                       (inline-open . 0)
                       (statement-case-open . +)
                       ))
   ))
(defun mysql-c-mode-hook ()
  (interactive)
  (require 'cc-mode)
  (c-set-style "MY")
  (setq indent-tabs-mode nil)
  (setq comment-column 48))
(add-hook 'c-mode-common-hook 'mysql-c-mode-hook)
Basic vim setup
set tabstop=8
set shiftwidth=2
set backspace=2
set softtabstop
set smartindent
set cindent
set cinoptions=g0:0t0c2C1(0f0l1
set expandtab
Another vim setup
set tabstop=8
set shiftwidth=2
set bs=2
set et
set sts=2
set tw=78
set formatoptions=cqroa1
set cinoptions=g0:0t0c2C1(0f0l1
set cindent
function InsertShiftTabWrapper()
  let num_spaces = 48 - virtcol('.')
  let line = ' '
  while (num_spaces > 0)
    let line = line . ' '
    let num_spaces = num_spaces - 1
  endwhile
  return line
endfunction
" jump to 48th column by Shift-Tab - to place a comment there
inoremap <S-tab> <c-r>=InsertShiftTabWrapper()<cr>
" highlight trailing spaces as errors
let c_space_errors=1
C++ Coding Guidelines of Falcon storage engine
Falcon uses CamelCase aka JavaStyleNames. Class names start with a capital letter, other names start with a lower case letter. Each class has its own .cpp and .h file, with the name ClassName.cpp and ClassName.h. The exception is ha_falcon.cpp, the top level module in the MySQL interface.
Header files contain no code except inline functions. Cross-referencing of header files is minimized. Individual methods are short. Individual classes are small and specialized.
Related classes start with the same words or initials. For example, all the classes that represent serial log entries start with SRL, while the classes that control the serial log start with SerialLog.
More specific rules about code.
Falcon uses an indent of 4, rather than two. Specifically, Falcon uses tabs for indentation and requires that the tab stop be set to four spaces.
Comments are normally in the C++ manner - // comment. The /* */ format is generally used to comment out blocks of code.
Brackets, except the opening bracket of a method, are indented to the level of the code they enclose.
void Table::methodOne()
{
    if (a == 1)
        {
        b = 2;
        c = 2;
        }
}
Leave blank lines before and after comments and conditional statements.
// Comment for the next for loop

for (x = 1; x < 10; x++)
    {
    y = myAverage (x, a);

    if (x == y)
        break;
    else
        a = x;
    }
For emacs users already using our custom "MY" c-mode, you can use this auto-mode-alist to trigger Falcon specific indentation rules whenever you are in 'storage/falcon' directory.
;; Falcon style
(defun falcon-c-mode ()
  "MySQL C mode with adjustments for Falcon"
  (c-mode)
  (c-set-style "MY")
  (setq c-basic-offset 4)
  (setq tab-width 4)
  (c-set-offset 'defun-open 0)
  (c-set-offset 'substatement-open '+)
  (c-set-offset 'statement-block-intro 0)
  )
(setq auto-mode-alist (cons '("/storage/falcon/" . falcon-c-mode) auto-mode-alist))
For emacs users, here's a URL that describes setting tab sizes... http://www.student.northpark.edu/pemente/emacs_tabs.htm
For emacs users who aren't interested in learning all about tabs, here's an excerpt from that site.
For this session, set the tab-width to 4 characters
M-x set-variable<RET> tab-width<RET> 4
Permanently set the default tab-width to 4 characters
(setq default-tab-width 4); # add this to your .emacs file
For this file, set the tab-width to 4 characters
-*- tab-width:4 -*- # put this on line #1 of the file
# more about using File Variables
The macro __WIN__ is not defined for Visual Studio 7 and should not be used for conditional compilation. The symbol _WIN32, however, is defined for all versions of Visual Studio and Windows, including 64 bit version.
For consistency, write Windows' conditionals as
#ifdef _WIN32
rather than
#if defined(_WIN32)
C++ Coding Guidelines of NDB storage engine
The mysqld handler part of NDB (ha_ndbcluster.cc, ha_ndbcluster_binlog.cc, etc.) uses the same coding style as the rest of the mysqld code.
The non-mysqld part of NDB code has a long history, and uses a multitude of coding styles. When modifying and extending existing source files or modules, the coding style already used in that code should be followed in terms of indentations, naming conventions, etc. For completely new code, the mysqld conventions (with exceptions below) should probably be followed.
Do not make any change to NDB code purely for the sake of changing from one formatting style to another. It just causes merge annoyances and makes patches harder to read, and we do not expect the style to ever become 100% consistent across all of the source code. It is however ok to fix inconsistent style in lines that are changed for other reasons.
One convention that should be followed for all new or modified code, in both mysqld and non-mysqld parts of the code, is that class member variables should be named with lowercase words separated by underscores '_', and prefixed with 'm_'. Like this:
const char *m_my_class_member;
Braces
if, while, etc. *must* always have braces.
E.g. good:
if (a == b)
{
  dosomething();
}
Braces should be on a separate line, as above.
E.g. bad:
if (a == b) {
  dosomething();
}
For inline methods inside a class (struct), it is ok to put the opening brace on the same line as the function declaration, like below:
struct A
{
  A() {
  }
};
Assignment
a = 3; // ok
a= 3;  // not ok
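The NDB conventions above (mandatory braces, spaces on both sides of '=', and the 'm_' prefix for members) might look like this when applied together. The class Retry_stats is hypothetical and exists only for this sketch.

```cpp
class Retry_stats
{
public:
  /* Inline method: opening brace may stay on the declaration line. */
  Retry_stats() {
    m_retry_count = 0;
  }

  int record_attempt(bool failed);

private:
  int m_retry_count;                            /* member prefixed with m_ */
};


/* Count a failed attempt and return the total number of failures so far. */

int Retry_stats::record_attempt(bool failed)
{
  if (failed)
  {
    m_retry_count = m_retry_count + 1;
  }
  return m_retry_count;
}
```

Note that the single-statement if still carries braces here, unlike in the server style described earlier.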
Use of ndbrequire
In the NDB kernel code, the ndbrequire() facility has historically been widely used. However, most of this is now considered misuse, and use of ndbrequire should generally be avoided. Over time, we want to remove most or all ndbrequires.
There are three different classes of ndbrequire() usage, with corresponding replacement as follows:
- Verification of code logic: hitting this is a real bug, and the error message should be worded accordingly. For this, one option is ndbassert() (only enabled in debug builds), or we might need to add ndbchecklogic() or similar.
- Hitting a configurable limit, which cannot be handled gracefully. For this one should use ndbrequireErr(). The error message should suggest config change to correct the problem, or refer to a section in the manual to read more.
- Hitting hardcoded limits; we should really try to avoid this, but if it is unavoidable, or if it is a limit we think we will never hit, use ndbrequireErr() and add appropriate error message.
DBUG Tags
The full documentation of the DBUG library is in files dbug/user.* in the MySQL source tree.
Here are some of the DBUG tags we now use:
- enter: Arguments to the function.
- exit: Results from the function.
- info: Something that may be interesting.
- warning: When something doesn't go the usual route or may be wrong.
- error: When something went wrong.
- loop: Write in a loop, that is probably only useful when debugging the loop. These should normally be deleted when you are satisfied with the code and it has been in real use for a while.
Some tags specific to mysqld, because we want to watch these carefully:
- trans: Starting/stopping transactions.
- quit: Info when mysqld is preparing to die.
- query: Print query.
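As a rough illustration of how the enter, error and exit tags are typically attached to trace calls, consider the sketch below. The macros defined at the top are simplified stand-ins written for this example so that it compiles outside the MySQL tree; they only mimic the call shape of the DBUG macros and print to standard output. The function safe_divide is hypothetical.

```cpp
#include <stdio.h>

/*
  Stand-in macros, assumed for this sketch only. Inside the server the
  definitions from the DBUG library would be used instead.
*/
#ifndef DBUG_ENTER
#define DBUG_ENTER(name) const char *dbug_function= (name)
#define DBUG_PRINT(keyword, arglist) \
  do \
  { \
    printf("%s: %s: ", dbug_function, (keyword)); \
    printf arglist; \
    printf("\n"); \
  } while (0)
#define DBUG_RETURN(value) return (value)
#endif


/**
  Divide two integers, tracing arguments, errors and the result.

  @param      dividend  value to divide
  @param      divisor   value to divide by
  @param[out] quotient  result of the division

  @return Operation status
    @retval 0  OK
    @retval 1  divisor was zero
*/

int safe_divide(int dividend, int divisor, int *quotient)
{
  DBUG_ENTER("safe_divide");
  DBUG_PRINT("enter", ("dividend: %d  divisor: %d", dividend, divisor));

  if (divisor == 0)
  {
    DBUG_PRINT("error", ("division by zero"));
    DBUG_RETURN(1);
  }
  *quotient= dividend / divisor;
  DBUG_PRINT("exit", ("quotient: %d", *quotient));
  DBUG_RETURN(0);
}
```

The tag is the first argument of each trace call, and the second argument is a parenthesized printf-style argument list.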