How to Create Good Tests
[edit] How to Create Good Test Cases
There are a lot of strict rules and rules of thumb which may increase the quality of tests written for the tool pair mysqltest/mysqltest-run.pl (MTR). The following presentation should give you some hints.
-
- This document should not replace reading our excellent manual about testing http://dev.mysql.com/doc/mysqltest/en/writing-tests.html.
- Some of the examples might not work
- with MTR1 or
- on some operating system.
- MTR2 means the second version of mysql-test-run.pl/mysqltest. MTR2 replaced MTR1 ~ Jan 2009 in MySQL 5.1 and 6.0.
- The term "protocol" means in most cases "output from a test case."
- October 2007 - Matthias Leich - Create this document for a MySQL University session
- March 2009 - Matthias Leich - Updates and corrections, add "Examples of suspicious scripts"
- March 2009 - Patrick Crews - Corrections, add "Notes on MTR"
- Miscellaneous Tips
- Stability checks for random timing problems
- Stability checks for hostname-related problems
- Stability checks for unexpected problems
- Row order within result sets
- Test case behavior should depend on the result of a SHOW
- Does a test execution hang?
- Your test is complicated, support debugging
- You need to know some syntax, option etc., but hate reading manuals
[edit] Formal stuff
Rules about formatting within scripts and similar stuff.
[edit] Coding style
Please have a look at the C/C++ coding guidelines and apply them when they make sense: Coding Guidelines
Some more or less strict rules:
- Try to have a good readable text flow. Lines with very different length are a pain.
- Avoid lines longer than ~ 100 characters unless there is no other choice.
- Think about a comparison of the new and the old version of a test script within a graphical diff tool.
- Having the difference frequent at the end of long lines is very uncomfortable.
- mysqltest only accepts comment lines starting with '#'.
- Use spaces, not tabs.
- Lines must have no trailing spaces.
- Write SQL statements in the style of the MySQL Reference Manual
- SQL keywords and reserved words: uppercase
- Identifiers (table names, column names, etc.): lowercase
- Ignore this guideline if your intent is to check the processing of mixed lettercases ;-)
- Please follow this guideline for new tests. Rewriting existing tests to use better formatting is good, but can be tedious - a rule of thumb: don't rewrite unless you intend to touch the entire subtest (not the entire .test file, but the statements that make up a complete testing 'unit')
- If an SQL statement is long, add line breaks to reformat it and make it easier to read.
[edit] SQL statement example
SELECT f1 AS "my_column", f10 ....
FROM mysqltest1.t5
WHERE (f2 BETWEEN 17 AND 25 OR f2 = 61)
AND f3 IN (SELECT ....
FROM mysqltest1.t4
WHERE .....)
ORDER BY ...
[edit] Please avoid too dense code
except you intend to check the parser or the performance of the reader of the code ;-).
Unfortunate example:
select 1+1,1-1,1+1*2,8/5,8%5,mod(8,5),mod(8,5)|0,-(1+1)*-2;
Improved example:
SELECT 1 + 1, 1 - 1, 1 + 1 * 2, 8 / 5, 8 % 5,
MOD(8,5), MOD(8,5) | 0, -(1+1) * -2;
[edit] Header of scripts
For each test or auxiliary script, create a header that contains the following information:
- Purpose of the test or script
- Corresponding WL task, if there is any
- Creator of the test and date of creation
- Author of last significant change + date of change + what was changed
- Dates should be in ISO format (ISO 8601): YYYY-MM-DD
- In case that the script assigns values to some variables and sources some master test script, please explain the purpose of these variables and why you use these values
Note: The header, like the rest of the test, should not mention confidential information. Remember, our tests are available publicly.
[edit] Header example
########### suite/funcs_1/t/processlist_val_no_prot.test ############# # # # Testing of values within INFORMATION_SCHEMA.PROCESSLIST # # # # The prepared statement variant of this test is # # suite/funcs_1/t/processlist_val_ps.test. # # # # There is important documentation within # # suite/funcs_1/datadict/processlist_val.inc # # # # # # Creation: # # 2007-08-09 mleich Implement this test as part of # # WL#3982 Test information_schema.processlist # # # ######################################################################
[edit] Comments within boxes
Please work 100% perfect when using boxes made of '#'. Such boxes "jump" into the eye of most readers. Bad example similar to what I found in some tests:
############################################### # # # Some text # # Some text # # # # Author : ..... # # # # Some text # # # ###############################################
Please note that this is caused by spaces, not tabs.
[edit] Comments Everywhere
Write comments, they save the time of others.
- Complicated test architecture or tricky code
- A subtest case is able to reveal a bug
- Subtests
- Make test protocols more understandable
- Better protocols example
[edit] Complicated test architecture or tricky code
If your test is very complicated, write a sufficient explanation of the test architecture. This helps avoid having someone else come along and unintentially change the test in ways that:
- Shift the goal of the check
- Destroy your test architecture
- etc.
Comment your code when you do or check something that someone else may think is not trivial. Please write what you intend to check if it is not obvious.
[edit] A subtest case is able to reveal a bug
Write a comment if the next subtest revealed a bug in history. Please mention number and title.
Please use exact the formatting "# Bug#nnnnn <Title>"
Example: # Bug#3671 Stored procedure crash if function has "set @variable=param"
because it does not look nice to see so many different variants like "BUG #nnnnn", "bug #nnnnn - Title" often even within the same test.
[edit] Subtests
At least in cases where your file contains many subtests
- Mark these subtests for better readability of the script
- Write also a message into the protocol.
And please explain what each subtest checks, unless it is obvious.
Example:
--echo #----------------- Testcase 3.3.1.52 ---------------------# ############################################################################### # Ensure that a view that is a subset of every column and some rows of a single # underlying table, contains the correct row-and-column data; such a view has # a definition that is semantically equivalent to CREATE VIEW <view_name> # AS SELECT * FROM <table_name> WHERE ... ###############################################################################
[edit] Make test protocols more understandable
Please have the test write comments into the protocol if this makes the surrounding protocol content much more understandable. This is especially true for tests that do the following:
- Suppress the printing of SQL statements and result sets
- Work with more than one connection
[edit] Better protocols example
Good script with message about switching the connection:
SET @aux = 1; SELECT @aux AS "content of @aux is"; --echo # Establish session con1 (user=root) connect (con1,localhost,root,,,,); SELECT @aux AS "content of @aux is";
Protocol of good script:
SET @aux = 1; SELECT @aux AS "content of @aux is"; content of @aux is 1 # Establish session con1 (user=root) SELECT @aux AS "content of @aux is"; content of @aux is NULL
This is what the protocol looks like without the comment. The output becomes less clear because there is no indication that the connection has been changed.
SET @aux = 1; SELECT @aux AS "content of @aux is"; content of @aux is 1 SELECT @aux AS "content of @aux is"; content of @aux is NULL
[edit] Some Rules Outside of the Formal Stuff
- Prerequisites checks
- Error masking
- Use of the option "--disable_abort_on_error"
- Perfect cleanup at the end of a test
- Use of OS-specific commands
[edit] Error masking
Use error names instead of error numbers whenever possible.
Example:
--error ER_BAD_DB_ERROR USE <not existing database>;
You can find the error names in the include/mysqld_error.h file of a MySQL source distribution, or in the Server Error Codes and Messages section of the MySQL Reference Manual]. NOTE: Don't do this with error numbers > 2000. Use of the error name does not seem to work well in these cases.
[edit] Use of the option "--disable_abort_on_error"
This option is very useful when starting to write a new test because the test will not abort if your script contains some failing statements or SQL syntax errors. By disabling the abort, you get to see more of the errors per test run and can fix more of them at a time.
But there are only rare situations where the final versions of a test should use "--disable_abort_on_error", either at all or during a sequence of several SQL statements.
[edit] Perfect cleanup at the end of a test
- Cleaning up becomes much more comfortable and less error prone if you create your "own" database and create all tables there. At the end of the test, you need to drop just this database.
- Do not forget to remove all users you created and all permissions you granted. Otherwise the next test might fail when checking grants in general.
- Close all connections which you have explicitly created and please ensure that the disconnect is finished = the sessions are no more visible within the processlist.
- Remove all auxiliary files created within your test.
MTR2 runs now by default with "check-testcases" enabled. It checks if there are additional objects like user or tables, modifed system table etc. "check-testcases" will be soon improved. However, proper cleanup should still be a responsibility of both the test author and reviewer.
[edit] Use of OS-specific commands
The exec and system commands enable tests to execute external commands. However, many of such commands are available only on certain platforms. (For example, rm is Unix-specific and not available on Windows.) Please avoid these commands if possible. They harm the portability and stability of tests.
Have a look at the mysqltest manual and the t/mysqltest.test test file. There are now several mysqltest built-in commands such as
"--write_file", "--cat_file", "--remove_file", ...
which are reliable on all operating systems. Sometimes OS commands could be also replaced by SQL statement sequences writing to and reading from files and some SQL string functions.
[edit] "Negative" Tests
A "negative" test is a test for which you expect to see a failure. If an error does not occur, that itself indicates a problem.
DO NOT FORGET "NEGATIVE" TESTS where we expect to see fine error messages from the server.
This section contains just a few examples of what to test. Please be creative and imagine what could go wrong in rough reality.
Column with numeric data type:
- NULL, 0
- Minimum - 1, Minimum, Minimum + 1
- Maximum - 1, Maximum, Maximum + 1
- Negative values if data type is unsigned
- Garbage like 'abc', '1a', 'a1'
Column with string data type:
- Null, <empty string>, "exotic" characters like 'รค', single quotes, ...
- String longer than column
Limited number of tables, columns, variables, ...
- Maximum - 1, Maximum, Maximum + 1
Assume an SQL operation that affects the filesystem (LOAD DATA, CREATE SCHEMA or TABLE, backup/restore, ...). What will happen if the following conditions occur for a file or directory to be read, written, or implicitly created or deleted?
- Exists/does not exist
- Is writable/not writable(missing permission)
- Is empty/not empty
- Contains the expected content (Example: text)/unexpected content like maybe a JPG)
- Is a regular file/a directory/a softlink pointing to .../a pipe
- Is assigned via an extremely long path
- Becomes victim of file system full
[edit] Tests with Several Variants
Please think twice before you create
- engine-specific variants of a test
or
- let runs with ps-protocol/sp-protocol/cursor-protocol/view-protocol
happen. They might be
- of low value and a permanent wasting of resources and/or
- fail (protocol variants only)
[edit] Variation of the storage engine
- Checks of the INFORMATION_SCHEMA
- The storage engines to be used for information_schema tables are hardcoded. Therefore tests focussed on permissions, optimizer strategies, column data types etc. when selecting on INFORMATION_SCHEMA tables should not run with storage engine variations.
- Please prepare the scripts at least for use with different storage engines.
- That means mostly:
- DO NOT use hardcoded storage engine assignments within CREATE TABLE statements.
- If you assume that there is no significant impact of storage engines on your testing object,
- create/run with storage engine variants and check this.
Solution 1:
Do not assign the storage engine within your CREATE TABLE statements at all. The default storage engine MyISAM will be used for your tables. Check your assumption with:
./myql-test-run.pl --mysqld="--default-storage-engine=<engine>" <test_case>
Solution 2:
Assign the storage engine to be used via $variable.
Top level script:
let $engine_type= MyISAM;
The same script or a sourced script:
eval CREATE TABLE .... ENGINE = $engine_type ...
Check your assumption by creating and running storage engine variants of the top level test.
--source include/have_falcon.inc let $engine_type= Falcon;
[edit] Variation of the protocol option
It is usual to run tests with and without the mysql-test-run.pl startup option "--ps-protocol". And there are also attempts of System QA to run the other protocols.
- Effect of the "--ps-protocol" option: mysqltest will run as many SQL statements as possible as prepared statements.
- Effect of the "--sp-protocol"/"--cursor-protocol"/"--view-protocol" options: As far as I know, mysqltest takes many statements and transforms them into a statement sequence checking the corresponding feature (stored procedures, cursors, or views).
We have a lot of tests running very similar and extremely simple SQL just for the creation of a situation to be tested, check of table content, etc. So it can be assumed that the n'th test running again simple statements does not improve the coverage.
Conclusion:
If your test contains
- Simple statements: Prevent the non valuable runs for "protocols".
- Storage engine variations: Prevent the protocol runs for all except one storage engine (my take would be MyISAM).
- "Unique" and complex statements:
- ps-protocol statements = all SQL
- sp-protocol statements = DML
- view/cursor-protocols statements = SELECTs
- Do not prevent the protocol runs.
Another problem around runs with such protocols is that there are cases where we get different protocol content.
Example:
Script:
SELECT * FROM processlist ...
Protocol content if running without any "--*-protocol":
ID USER ... COMMAND ... STATE INFO 1 root ... Query ... preparing SELECT * FROM processlist ...
Protocol content if running with "--ps-protocol":
ID USER ... COMMAND ... STATE INFO 1 root ... Execute ... preparing SELECT * FROM processlist ...
So please check whether every new test gives the same result with every protocol but at least with "--ps-protocol". If not, do something of the following:
- Exclude some protocol variants from execution.
- Disable the the use of the protocols for problematic statements.
- Write protocol variant specific tests.
Example solution (code within the top level scripts):
# The file with expected results fits only to a run without
if (`SELECT $PS_PROTOCOL + $SP_PROTOCOL
+ $CURSOR_PROTOCOL + $VIEW_PROTOCOL > 0`)
{
--skip Test requires: ps-protocol/sp-protocol/cursor-protocol/view-protocol disabled
}
--source include/<whatever>.inc
or
# The file with expected results fits only to a run with "--ps-protocol".
if (`SELECT $SP_PROTOCOL + $CURSOR_PROTOCOL + $VIEW_PROTOCOL > 0
OR $PS_PROTOCOL = 0`)
{
--skip Test requires: ps-protocol enabled, other protocols disabled
}
--source include/<whatever>.inc
[edit] Miscellaneous Tips
- Stability checks for random timing problems
- Stability checks for hostname-related problems
- Stability checks for unexpected problems
- Row order within result sets
- Test case behavior should depend on the result of a SHOW
- Does a test execution hang?
- Your test is complicated, support debugging
- You need to know some syntax, option etc., but hate reading manuals
[edit] Stability checks for random timing problems
- Recommended Several test runs (<number>), use an memory based (extreme fast) filesystem if available
./mysql-test-run.pl --mem .... --repeat=<number> <your test>
- Several test runs, use a "slow" disk based filesystem
rmdir var # "var" must not be a symlink pointing into a RAM based filesystem ./mysql-test-run.pl .... --repeat=<number> <your test>
- Recommended Generate parallel load on the CPUs and/or the disk where MTR2 reads and writes data (in most cases directory "var"). Examples:
- all OS: Compile a MySQL version from source
- Unix derivates (OpenSolaris,Linux,OSX,...): tar -chvf - <some path> > /dev/null
- Unix derivates: root: dd if=<area of disk containing the "var" directory> of=/dev/null
- Windows: Defragmentation
rmdir var # "var" must not be a symlink pointing into a RAM based filesystem ./mysql-test-run.pl .... --repeat=<number> <your test>
- all OS:
./mysql-test-run.pl .... --repeat=<number> --parallel=8 <your test>
[edit] Stability checks for hostname-related problems
sudo hostname <other hostname> ./mysql-test-run.pl --mem <name of test>
Good values for <other hostname> to check are "0", "bbb", "mmm", "zzz".
[edit] Stability checks for unexpected problems
Ask a colleague to run your test or try another testing box (preferably one with a different operating system).
[edit] Row order within result sets
Please keep in mind that the row order within a significant number of result sets depends on storage engine properties and in worst cases the current load (NDB!, partitioning?, parallel query?) on the testing box.
Either use
--sorted_result SELECT * FROM t1;
or, if possible, "decorate" your SELECT with ORDER BY to make the row order static.
[edit] Test case behavior should depend on the result of a SHOW
Please have a look at the t/mysqltest.test file where "query_get_value()" is checked.
[edit] Does a test execution hang?
Symptom: No text flooding through the command window where a test is running.
Run this command in a different window:
tail -f <var_dir>/mysqld.1/data/mysql/general_log.CSV
If you see some changes, your test is alive.
[edit] Your test is complicated, support debugging
Top level test script:
... ##### Option, for debugging support ##### let $debug= 0; ...
At various places within your scripts:
if ($debug)
{
--echo # var1: $var1 , var2: $var2, .....
SELECT ......
}
or
--disable_query_log
if ($debug)
{
--enable_query_log
}
[edit] You need to know some syntax, option etc., but hate reading manuals
Try this:
grep -i '<one keyword you know or guess>' t/* include/* | less
[edit] Notes on MTR2
MTR2 has brought some new utility that can improve testing, but also some elements that can be pitfalls. This section attempts to describe both.
- Changes
- $MYSQLTEST_VARDIR/master-data -> $MYSQLD_DATADIR
- This change was introduced to work with parallel. Since MTR2 can run several instances of itself, one path is needed for each instance.
- You must also use let $MYSQLD_DATADIR= `SELECT @@datadir`;
- New options
- max-test-fail: The number of test cases that can fail before the test run aborts. Default=10, set to 0 to continue the run regardless of failure count.
- NOTE: --force alone is no longer sufficient to guarantee a full test suite run. If the number of failing tests == max-test-fail, then the test run will abort.
- parallel: Set n number of workers to run tests in parallel. The current recommendation is 2x the number of processors on your machine.
- NOTE: You can no longer be sure of executing test cases in a particular order due to this and other changes brought on by MTR2. *Do not* try to create multi-part tests (test_pt1.test, test_pt2.test, etc).
- NOTE: Be cautious when using a high (>2) value for parallel and --mem in combination. It is possible to use vast amounts of resources on your system and cause problematic performance till a crash of the operating system while MTR2 is running.
- repeat: Run the input test(s) n times in succession. Very good for diagnosing random failures - set a high value for 'n', create a high load on your machine, and see if you can duplicate the failure.
[edit] Examples of "suspicious" tests and scripts
- Probably mismatch of focus of test and code sequence
- Too greedy test
- Risky handling around additional sessions
[edit] Probably mismatch of focus of test and code sequence
CREATE TABLE t1 ( id INT NOT NULL AUTO_INCREMENT, my_column VARCHAR(30), name LONGTEXT, PRIMARY KEY (id));
INSERT INTO t1(my_column,name) VALUES('2','two');
INSERT INTO t1(my_column,name) VALUES('1','one');
INSERT INTO t1(my_column,name) VALUES('4','four');
INSERT INTO t1(my_column,name) VALUES('2','two');
INSERT INTO t1(my_column,name) VALUES('3','three');
This test will
- fail if the feature AUTO_INCREMENT is temporary broken
- will (depending on existence of prerequisite checks) fail or get skipped if the default storage engine does not support AUTO_INCREMENT or LONGTEXT
- ==> no coverage for other features or properties checked within this test
Some questions with recommended action depending on the answer:
- Is the table t1 just an auxiliary table and not the test object?
- Yes: Please ensure that the test does not break or gets skipped if the default storage engine does not support AUTO_INCREMENT or LONGTEXT and you are done.
- No: no action
- Do we check AUTO_INCREMENT and the corresponding column is t1.id?
- Yes: no action
- No: Remove the use of AUTO_INCREMENT
- Do we check LONGTEXT and the corresponding column is t1.name?
- Yes: no action
- No: Remove the use of LONGTEXT
- Do we check AUTO_INCREMENT and LONGTEXT in combination?
- Yes: no action
- No: Split the test at least if it should be a test of basic functionality
[edit] Too greedy test
Let's assume we have to check that every new created table causes a row in information_schema.tables.
--replace_column 15 <CREATE_TIME> 16 <UPDATE_TIME> 17 <CHECK_TIME> SELECT * FROM information_schema.tables;
The SELECT above makes the test extreme "greedy" for changes in behaviour which is maybe good for general bug hunting but not for smart and frequent automatic tests. It is to be expected that such a test requires frequent maintenance like adjustment of expected results.
The problems with the select above: We will (sometimes only maybe) get a result set difference whenever
- we get a new system table
- Completeness of result sets without WHERE are outside of the focus.
- Refined statement:
--replace_column 15 <CREATE_TIME> 16 <UPDATE_TIME> 17 <CHECK_TIME> SELECT * FROM information_schema.tables WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't1';
- the number of columns within information_schema.tables changes
- the data type of a column within ... changes
- the content within one of the columns within ... changes
- Lets assume something intentional like VERSION is now 11 instead of 10.
- This means we should try to avoid "SELECT *".
Final statement:
Variant 1: SELECT TABLE_SCHEMA,TABLE_NAME FROM information_schema.tables WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't1';
- Are we really consequent? No, but we could
Variant 2: SELECT COUNT(*) FROM information_schema.tables WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't1';
It's on you which variant you choose. I decide depending on the situation.
[edit] Risky handling around additional sessions
Disconnects are performed asynchronous. This is most probably good for the throughput of usual application but has some annoying consequences for the development of tests.
If a
- disconnect <connection> or
- KILL <connection_id> or
- --exec <client program which connects to the server>
occurs within a script for mysqltest than the next statements will be executed even if the disconnect or kill is not complete finished.
I guess this also applies to the end (-> end of statements to be executed by mysqltest) of a test. It triggers some disconnect for any open session but this is also asynchronous.
Not complete finished means that the session is visible within the processlist and the entry about the 'quit' of the session is not already appended to the general log.
So it could happen that a succeeding test suffers by unexpected events like
- observing the session belonging to the previous test within the processlist
- being surprised by an entry (the 'quit' of the disconnected session) within the general log which was not caused by its own activity etc.
According to my experiences the likelihood that some test could harm a successing test by unfinished disconnects increases with
- additional parallel load on the testing box
- "disconnect <connection>" is missing within the test
- "shorter" distance between "disconnect <connection>", "KILL <connection_id>" or "--exec ..." and end of test
- the session runs a "heavy" SQL statement just before its disconnect
In my opinion the most robust solution is to add a
--source include/wait_until_disconnected.inc
just after the disconnect.
In case you don't believe me, run the sniplet following later
./mtr --skip-ndb --no-check-testcases --repeat=100 <sniplet>
and you will most probably observe something like
TEST RESULT TIME (ms) ------------------------------------------------------------ ... <sniplet> [ pass ] 4 <sniplet> [ fail ] ... CURRENT_TEST: <sniplet> --- <result> +++ <reject> @@ -1,4 +1,4 @@ SELECT COUNT(*) FROM information_schema.processlist WHERE id <> CONNECTION_ID(); COUNT(*) -0 +1 SELECT SLEEP(10); ... mysqltest: Result content mismatch
Sniplet for demonstration purposes:
SELECT COUNT(*) FROM information_schema.processlist WHERE id <> CONNECTION_ID(); # (1) connect (con1,localhost,root,,); send SELECT SLEEP(10); # (2) connection default; # Wait till the SQL statement of con1 is "in work" let $wait_condition= SELECT COUNT(*) = 1 FROM information_schema.processlist WHERE State = 'User sleep' AND Info = 'SELECT SLEEP(10)'; --source include/wait_condition.inc # (3) SELECT COUNT(*) = 1 FROM information_schema.processlist WHERE State = 'User sleep' AND Info = 'SELECT SLEEP(10)'; # (4) disconnect con1; (5) <end of script>
The sniplet is intended for demonstration purposes and contains code which is obvious "crap" but needed to enforce the intended effect. Some notes:
- (1) Such a statement at the begin of a test makes this test very sensitive to not finished disconnects.
- (2) The wait routine (3) prevents that our look on the process list (4) happens too early = before the server has started the processing of (2). Otherwise we get random result set differences for (4) under high parallel load.
- How to to prevent that this test harms the succeeding test?
(5) should be replaced by connection con1; disconnect con1; --source include/wait_until_disconnected.inc connection default;
- Where is the "crap"?
- There should be a 'reap' for every preceeding 'send <statement>'. This is not just for academic completeness it also ensures here that (2) is really finished.
- Do we really need 10 seconds within the SLEEP? According to my experiences sleep times <= 2 seconds are critical under high load. What about SLEEP(5)?