Category: RandomQueryGenerator

RandomQueryGeneratorCaveats

This page lists known RQG caveats and some workarounds for them

Contents

[edit] Generating semantically valid queries

The RQG is capable of generating queries containing variable numbers of columns and tables. This variability is a problem because the items generated from a FROM clause would then be used in a WHERE clause. Using some simple embedded Perl code, it is possible to keep track of things generated for one part of the query in order to properly reuse them in another part.

[edit] Variable number of tables in a FROM clause

Knowing what tables participate in the FROM clause is important when generating a comprehensive and meaningful WHERE clause. So, we give each table a unique alias and record the total number of aliases that have been assigned:

query:
 { $tables = 0 } SELECT * FROM table_list WHERE where_list

table_list:
 new_table_item |
 table_list , new_table_item ;

where_list:
 (where_item) |
 (where_list AND where_item);

where_item:
  existing_table_item . _field = _digit ;

new_table_item:
  _table AS { "table".++$tables };

existing_table_item:
 { "table".$prng->int(1, $tables) };

The new_table_item rule picks a random table, gives it an alias, and increments the number of aliases given so far. The existing_table_item rule picks one such alias. This way, the WHERE clause will only contain table aliases that have actually been generated for the FROM clause.

[edit] Variable number of SELECT columns

We can use the same trick to keep track of the SELECT items we generate - we will give each one a unique alias:

query:
  { $fields = 0} SELECT select_list FROM _table HAVING having_list ;
 
select_list:
  new_select_item |
  select_list , new_select_item ;

having_list:
  (having_item) |
  (having_item AND having_list) ;
 
having_item:
  existing_select_item = _digit ;

new_select_item:
  _field AS { "field".++$fields };

existing_select_item:
  { "field".$prng->int(1, $fields) };

[edit] Generating queries that can be compared between server versions

Certain queries can legitimately return different results between different server versions or storage engines. Such false positives can be avoided by careful grammar constructions.

[edit] LIMIT without full ORDER BY

Even though the ResultsetComparator Validator attempts to account for different ordering of result sets, if a LIMIT clause is specified without a full ORDER BY, two servers may return a different result set altogether. There are two ways to avoid this:

After your usual ORDER BY, append all columns that were generated so far, using the aliases that were given to them using the grammar above:

order_by:
 ORDER BY your_order_by , total_order_by;

total_order_by:
  { join(', ', map { "field".$_ } (1..$fields) ) };
total_order_by:
  { join(', ', map { "table".$_." . `pk`" } (1..$tables) ) };

[edit] Different fields in SELECT and GROUP BY

If the SELECT contains non-aggregate fields that are different from the ones used in the GROUP BY, then the result will be undefined. To avoid this:

Which will make all such queries invalid. The chance of generating a valid query however will not be improved and will remain very low.

query_init:
  { $fields = 0; @nonaggregates = () } SELECT select_list FROM _table group_by
   
select_list:
  select_item |
  select_list , select_item ;
 
select_item:
  aggregate_select_item | nonaggregate_select_item ;

aggregate_select_item:
  COUNT( _field ) | SUM( _field ) | MIN( _field ) | MAX( _field );

nonaggregate_select_item:
  _field AS { $field_name = "field".++$fields ; push @nonaggregates , $field_name ; $field_name }; 

group_by:
  { scalar(@nonaggregates) > 0 ? " GROUP BY ".join (', ' , @nonaggregates ) : "" };

[edit] GROUP BY with ORDER BY

If you use GROUP BY and ORDER BY on a column that is not listed in the SELECT, the result may vary, depending on whether sorting happened before the grouping or the other way around. To avoid this, always use a column alias in the ORDER BY clause.

[edit] Sufficent number of database items in face of DDL statements

If you run a lot of DDL statements, your DML queries may frequently try to access a database object that does not exist. The following solutions may help to partially alleviate the problem:

query_init:
   create ; create ; create ; create ; create ;
 dml:
   create | create | create | create | drop ;
select:
   SELECT * FROM pick_existing_table ;

create:
   CREATE TABLE IF NOT EXISTS pick_create_table (F1 INTEGER) ;

drop:
   DROP TABLE IF EXISTS pick_drop_table ;

pick_create_table:
   { if (scalar(@dropped_tables) > 0) { $created_table = shift @dropped_tables } else { $created_table = $prng->letter() } ; push @created_tables, $created_table ; $created_table } ;

pick_drop_table:
   { if (scalar(@created_tables) > 0) { $dropped_table = pop @created_tables } else { $dropped_table = $prng->letter() } ; push @dropped_tables, $dropped_table ; $dropped_table } ;

pick_existing_table:
   { if (scalar(@created_tables) > 0) { $prng->arrayElement(\@created_tables) } else { $prng->letter() } } ;

Retrieved from "http://forge.mysql.com/wiki/RandomQueryGeneratorCaveats"

This page has been accessed 2,386 times. This page was last modified 11:54, 14 October 2010.

Find

Browse
MySQLForge
Main Page
Current events
Recent changes
Random page
Help
Edit
Edit this page
Editing help
This page
Discuss this page
Post a comment
Printable version
Context
Page history
What links here
Related changes
My pages
Special pages
New pages
File list
Statistics
Bug reports
More...