Category: RandomQueryGenerator

RandomDataGenerator

Contents

[edit] The Random Data Generator

The Random Data Generator allows the creation of tables of arbitary size containing an arbitary combination of columns, filled with random data.

[edit] Installing the Data Generator

The Data Generator is available as part of the Random Query Generation framework, which is available on Launchpad. To create your own working copy of the RQG framework, execute:

$ bzr checkout lp:randgen/2.0 randgen

or

$ bzr branch lp:randgen

It will end up in a directory on your disk called randgen.

[edit] Invoking the Data Generator

The Data Generator can be invoked in three different ways:

$ perl gendata.pl \
  --dsn=dbi:mysql:host=127.0.0.1:port=9306:user=root:database=test\
  --spec=conf/examples/example.zz
$ perl gendata.pl \
  --dsn=dummy:print \
  --spec=conf/examples/example.zz

The DSN is in the standard format for Perl's DBI. If "dummy:print" is specified as DSN, the script prints the required SQL to create the tables to STDOUT. After trimming away the "Executing " text before each SQL statement, you can then pipe it from there into a mysql client for execution.

$ perl gentest.pl\
  --dsn=dbi:mysql:host=127.0.0.1:port=9306:user=root:database=test\
  --gendata=conf/example.zz\
  --grammar=conf/example.yy

This will run the Data Generator in order to create and populate the tables as specified in conf/example.zz and will then run the random query test as described in conf/select.yy

$ perl runall.pl \
  --basedir=/path/to/mysql/basedir \
  --gendata=conf/example.zz \
  --grammar=conf/example.yy

This will proceed to start a mysql server, generate the tables and the data, and run the random queries in a single command.

[edit] Default behavior

If the Random Query Generator is run without specifying a Data Generator configuration script (for example by using the --gendata option), a default set of tables will be generated.

See RandomQueryGenerator#The_Random_Data_Generator for more details.

The remainder of this document is about the more advanced, customizable data generator, and not the default one.


[edit] Configuring the Data Generator

The Data Generator is driven by a configuration file which may look as follows:

$tables = {
        rows => [0, 1, 10, 100],
        partitions => [ undef , 'KEY (pk) PARTITIONS 2' ],
        engines => ['Falcon', 'Innodb', 'Maria' ],
};

$fields = {
        types => [ 'int', 'char', 'enum', 'set', 'blob' ],
        indexes => [undef, 'unique', 'key' ],
        null => [undef, 'not null'],
        default => [undef, 'default null'],
        sign => [undef, 'unsigned'],
        charsets => ['utf8', 'latin1']
};

$data = {
        numbers => [ 'digit', 'null', undef ],
        strings => [ 'letter', 'english' ],
        blobs => [ 'data' ]
}

[edit] Configuring the Tables

The $tables section describes the sizes and the other attributes of the tables. The example above will create 24 tables, one for each combination of storage engine, partitions and table size. This section accepts the following parameters:

  names => ['A','B','C','D'],
 views => ['ALGORITHM=MERGE', 'ALGORITHM=TEMPTABLE'],
 merges => ['INSERT_METHOD=LAST','INSERT_METHOD=FIRST'],

Tables names are set according to the properties of the generated tables, and/or according to the names parameter if set. If the names parameter is not set, or if there are not enough names specified, the following approximate name structure is used:

table<rows>_[engine]_[charset]_[collation]_[partition]_<pk>_<row-format>

Resulting in table names like:

table0_int_autoinc
table1000_innodb_int_autoinc
table100_myisam_key_pk_parts_2_int_autoinc

If views is specified, views are named after the table upon which they are based, as v<tablename>_<sequence>, e.g.:

vtable1000_innodb_int_autoinc_0
vtable1000_innodb_int_autoinc_1
vA_0


[edit] Configuring the Fields

The $fields section describes the fields and the indexes to be created in each table. One field will be created for each combination of values. For example, the resulting table will have a field int_unsigned_unique and the corresponding key UNIQUE (int_unsigned_unique).

The order of the fields within each table is pseudo-random, which may help trigger bugs which are dependent on the physical placement of the fields. Each table also has a PRIMARY KEY named, appropriately, pk.

The following settings are available:

[edit] Configuring the Data

The $data section from the configuration file describes what data will be inserted into each field type: numbers, strings and blobs.

numbers => [ 'tinyint unsigned' , 'digit' ]

to create a mixture of 50% values from 0 to 9 and 50% values from 0 to 255.

To achieve a specific proportion of values, specify a value more than once, for example:

numbers => [ 'null', 'null', 'digit'] 

will create 66% nulls and 33% single digits .


strings => [ 'foo', 'bar' ]

Will cause random data to be generated containing an even proportion between the literal words 'foo' and 'bar'.


Retrieved from "http://forge.mysql.com/wiki/RandomDataGenerator"

This page has been accessed 7,917 times. This page was last modified 14:15, 13 December 2011.

Find

Browse
MySQLForge
Main Page
Current events
Recent changes
Random page
Help
Edit
Edit this page
Editing help
This page
Discuss this page
Post a comment
Printable version
Context
Page history
What links here
Related changes
My pages
Special pages
New pages
File list
Statistics
Bug reports
More...