Category: RandomQueryGenerator

RandomQueryGeneratorGrammar


This page describes the SQL grammar format as understood by the RandomQueryGenerator

Contents

[edit] Grammar Elements

The following built-in grammar elements are understood by the Random Query Generator:

query:
   select ; update ;

select:
   SELECT _field[invariant] FROM _table[invariant] WHERE _table[invariant]._field[invariant] BETWEEN _digit[invariant] AND _digit[invariant] + _digit;

update:
   UPDATE _table[invariant] SET _field = _digit;

This will generate two queries, both operating on the same table. In the first SELECT, the field being selected will be the same field that is used in the WHERE. In the BETWEEN, the same digit will appear before and after the AND, so that the second argument to BETWEEN is always larger than the first one.

Please note that the value in

  • _<whatever>[invariant] is calculated once per query
  • _<whatever> is calculated per every time it occurs in a query
  • _<whatever>[invariant] does not contain the value of the last _<whatever> call.
UPDATE _table SET `char_5` = _quid WHERE _field = _digit ;
SELECT /* _quid */ _field FROM _table ;
CREATE TABLE _letter SELECT * FROM _table ;
    BACKUP DATABASE test TO _tmpnam ; RESTORE FROM _tmpnam ;
CREATE TEMPORARY TABLE _tmptable (`f1` INTEGER);
UPDATE _table SET _field_no_pk = _digit WHERE _field = _digit; 
INSERT INTO _table SELECT _field_list FROM _table;
SELECT * FROM _table ORDER BY _field_count - 1;
INSERT INTO _table ( _field ) VALUE ( _data ) ;

will be expanded to:

INSERT INTO t1 ( f1 ) VALUE ( LOAD_DATA ('/path/to/rqg/data/earth51K.jpeg'));

[edit] Comments

The following types of comments are recognized and processed:

query: SELECT /*! HIGH_PRIORITY */ * FROM t1
query: SELECT * FROM t1 ORDER BY f1 /*+JavaDB:Postgres: NULLS FIRST */


[edit] Embedded Perl Code

The grammar may contain embedded Perl code, delimited with { } . After the SQL statement has been built, the code is evaluated left to right. If any of the snippets return a value, it is substituted in-place. To avoid returning anything , use undef; as your last expression.

In addition, stand-alone variables such as $test will be replaced with their values.

Because the grammar is not parsed with a fully-featured lexer, the following constructs may not work:

Perl code is useful to generate stuff that is difficult to express in a grammar, e.g. when you want to generate or use the same thing twice. Such as:

CREATE DATABASE {$db++; "database".$db } ; USE $db ;

This increments a counter and generates a database name based on the counter. The same name is then used in the USE statement.

If you want to generate controlled pseudo-random values inside your Perl code, use the $prng object:

CREATE DATABASE { $db = $prng->int(0,9) } ;

see inside the GenTest::Random .pm file for more information on the values that can be generated.

To pick a random table and store its name in $table:

pick_table:
  { $table = $prng->arrayElement($executors->[0]->tables()) } ;

To order by all selected columns (assuming "fieldNN" aliases are used and kept track of by the grammar):

order_by_all:
  { join(', ', map { "field".$_ } (1..$fields) ) } ;

To generate a list of all tables and views (though with limited usefulness):

all_tables: 
  { join(', ', @{$executors->[0]->tables()}) }

[edit] Advanced Perl Usage

A stack mechanism that allows passing of an argument to sub-rules and returning a result is also supplied. Also, you may keep variables on a certain recursion level (just as in local variables in subroutines). This stack mechanism is available through the $stack variable. The following functions are supported:

push pushes a new frame to the stack. If the previous frame has a value "arg", this will be copied to the new frame.

pop(result) will pop the frame off the stack and place the argument as "result" in the next frame.

set(name,value) will set a value in the current frame

get(name) will get a value (if it exists) from the current frame.

All routines return undef with the exception of get.

Example:

query:
    { $stack->push(); $stack->set("arg","foo")} a { $stack->get("result") } ;

a:
    A { $stack->push(); $stack->pop($stack->get("arg")."bar"); return undef } ;

This grammar will generate

A  foobar

Conceptually, the example above can be visualized as follows:

Stack contents after op           op

+---------------------+
|                     |     <-- push()    (in rule "query")
+---------------------+

+---------------------+
| "arg"    = "foo"    |     <-- set("arg", "foo")
+---------------------+


+---------------------+
| "arg"    = "foo"    |     <-- push  (in rule "a")
+---------------------+
| "arg"    = "foo"    |    
+---------------------+

                                get("arg") = "foo"
                                get("arg")."bar" = "foobar"


+---------------------+
| "arg"    = "foo"    |     <-- pop(get("arg")."bar")
| "result" = "foobar" |   
+---------------------+

                                get("result") = "foobar"

There is also a global frame $global with a get and a set routine which will survive one generated query to another.

[edit] Objects available for embedded Perl

No other objects or methods should be referenced, since they are not part of the public interface and subject to change.

[edit] Various Tricks

[edit] Working with variable table structures

By default, the tables created by the RQG all have an identical structure. Therefore, a field returned by _field will be present in all tables. If you have a variable database structure, then you can use constructs such as

SELECT _database . _table . _field ;

This will first pick a database, then a random table from the database that was just picked and finally a random field from the table that was just picked.

_database is of course optional. In addition, the last database and table picked are stored in the $last_database and $last_table perl variables, which you can read and write to from your perl code embedded in the grammar.

Please note that

[edit] Joins

To use joins, give static alias names to your tables. This way even if your grammar picks random tables, you will be able to refer to them using their constant names.

select: SELECT select_item , select_item FROM _table AS A LEFT JOIN _table AS B USING ( _field );

select_item: X . _field | Y . _field; 

[edit] Multi-query statements and stored procedures

To generate a set of queries to be executed together, place them on a single line and separate them with a semicolon (;).

For example, to replicate a statement containing RAND() using statement-based replication:

replicate_rand:
  SET BINLOG_FORMAT = 'STATEMENT' ; INSERT INTO _table VALUES ( RAND() ) ; SET BINLOG_FORMAT = 'ROW' ;

To create a stored procedure that does something:

procedure:
 CREATE PROCEDURE _letter () BEGIN insert ; delete ; END ;

insert:
 INSERT INTO t1 VALUES ( _digit ) ;

delete:
 DELETE FROM t1 LIMIT 1 ;

_letter returns a random letter from A to Z, and _digit returns a random digit from 0 to 9.

[edit] Temporary file names

To facilitate the operation of multi-treaded tests, the framework provides a build-in _tmpfile rule which creates a temporary file. The file is destroyed before generating the next statement, so all references to _tmpfile must be contained in a a single statement. Within that line, all references to _tmpfile will return the same value. The file name being returned is located in the `pwd`/tmp directory, usually mysql-test-extra-6.0/mysql-test/gentest/tmp. For example:

dump_and_load:
 select_into_outfile ; load_data_infile ;
 
select_into_outfile:
 SELECT * FROM _table INTO OUTFILE _tmpfile ;

load_data_infile:
 LOAD DATA INFILE _tmpfile REPLACE INTO TABLE _table ;

[edit] Correlated subqueries

By defining alias X in the outer SELECT and an alias Y in the inner select, you can generate correlated subqueries:

select:
	SELECT _field
	FROM _table AS X
	WHERE _field IN (
		SELECT _field
		FROM _table AS Y
		WHERE X . _field = Y . _field
	);

[edit] Transactions

To create a transaction containing SELECTs and UPDATEs:

query:
	transaction | select | update;

transaction:
	BEGIN | COMMIT | ROLLBACK;

This will generate a stream of updates and selects with randomly dispersed transaction commits and rollbacks. When the query executor encounters a BEGIN, it will automatically execute SET AUTOCOMMIT=OFF for the connection.

To create fully-formed transactions with no orphan statements:

query:
  START TRANSACTION ; statement ; statement ; commit_rollback;

statement:
  insert | update | select ; 

commit_rollback:
  COMMIT | ROLLBACK ;

[edit] Variable-length lists

To create a variable-length list of items (e.g. fields, tables, etc.) use:

select: SELECT select_list FROM DUAL;

select_list: select_list , select_item | select_item;

select_item: integer | _field;

The select_list rule will either expand to another select_list with a select_item appended, thus prolonging the list, or will expand to just select_item, thus terminating the list.


[edit] Testing your grammar

To test the performance of your grammar, you can run

$ perl runall.pl \
  --basedir=/build/bzr/mysql-6.0 \
  --grammar=conf/subquery_semijoin.conf \
  --debug \
  --threads=1

This will generate and execute 1000 queries and will then report:

[edit] Sharing code (Rules) within some grammar

It is recommended to share code if possible because this can have significant advantages during grammar development and especially when using the grammar in production where

- required maintenance
- simplification of grammar for a given bad effect
- resource consumption on testing box

will cause expenses.

The generation of

- SELECT and INSERT ... SELECT can share rules for the select part
- SELECT, INSERT and DELETE can share rules for the where part
- joins in SELECTs can be shared with multi table UPDATE and DELETE
- the actions within PROCEDUREs, FUNCTIONs, TRIGGERS can borrow a lot from plain DML statement generation

[edit] Cause that some failure is reported in case an incorrect configured environment is used

Let's assume we have some test with the following architecture:

- thread1 runs DDL and creates and/or drops objects like tables, views, triggers etc.
- the other threads use these objects within DML statements

A RQG run with thread=1 is rather valueless as soon as

- the code for the activity of thread1 is sufficient perfect (no generation of wrong syntax etc.)
- RQG runs with this setting pass all time

Solution 1 - cause that some error is reported but do not abort the test immediate:

  thread1_init:
     SELECT /* RESULTSET_IS_SINGLE_INTEGER_ONE */ { $ENV{RQG_THREADS} } > 1 ;
  RQG output
  ...
  # 2011-05-06T18:27:50 GenTest::ErrorFilter(23587) started
  # 2011-05-06T18:27:50 Query: SELECT /* RESULTSET_IS_SINGLE_INTEGER_ONE */ 1 > 1 does not have the declared properties: RESULTSET_IS_SINGLE_INTEGER_ONE
  # 2011-05-06T18:27:50 Started periodic reporting process...
  ...
  # 2011-05-06T18:27:53 Child process completed with error code 33.
  ...
  # 2011-05-06T18:27:54 gentest.pl exited with exit status STATUS_CONTENT_MISMATCH (33)
  ...

Solution 2 - cause a specific error message and abort the test immediate:

  thread1_init:
     {if (not $ENV{RQG_THREADS} > 1) {print("!!! ERROR: The number of assigned threads must be > 1. Abort. !!! \n"); exit STATUS_ENVIRONMENT_FAILURE }; return undef};
  RQG output
  ...
  # 2011-05-06T18:48:57 GenTest::ErrorFilter(24621) started
  !!! ERROR: The number of assigned threads must be > 1. Abort. !!! 
  # 2011-05-06T18:48:57 Started periodic reporting process...
  ...'
  # 2011-05-06T18:48:58 Test completed with failure status STATUS_ENVIRONMENT_FAILURE (110)
  ...

Retrieved from "http://forge.mysql.com/wiki/RandomQueryGeneratorGrammar"

This page has been accessed 6,502 times. This page was last modified 03:34, 16 January 2012.

Find

Browse
MySQLForge
Main Page
Current events
Recent changes
Random page
Help
Edit
Edit this page
Editing help
This page
Discuss this page
Post a comment
Printable version
Context
Page history
What links here
Related changes
My pages
Special pages
New pages
File list
Statistics
Bug reports
More...