WL#4813: PERFORMANCE_SCHEMA Instrumenting Stages

Affects: Benchmarks-3.0 — Status: Un-Assigned — Priority: Very High

WL#2360 "Performance Schema" mentions "stages". They're part of a
hierarchy Transaction -> Statement -> Stage -> Wait.

WL#2360 has a section "[ NESTING-EVENT SUMMARIES ]" which
describes the appropriate columns of a nesting event.
It's unnecessary to repeat them here. The main concern is;
"what is a stage". By definition a stage is a process
that takes place during statement processing (it doesn't
span a statement), and examples that we've talked about
include "parse" and "filesort". We should be more exact.

Stages in mysql.backup_progress, show profiles
----------------------------------------------

An example of 'stages' in mysql-6.0-backup:
> > SELECT * FROM mysql.backup_progress;
+---------+--------------+--------------+--------+----------+--------+---------+
| backup_ | start_time   | stop_time    | total_ | progress | error_ | notes   |
| id      |              |              | bytes  |          | num    |         |
+---------+--------------+--------------+--------+----------+--------+---------+
|     279 | ... 00:00:00 | ... 00:00:00 |      0 |        0 |      0 | starting|
|     279 | ... 00:00:00 | ... 00:00:00 |      0 |        0 |      0 | running |
|     279 | ... 00:00:00 | ... 00:00:00 |      0 |        0 |      0 | validity|
|     279 | ... 22:14:46 | ... 00:00:00 |      0 |        0 |      0 | vp time |
|     279 | ... 00:00:00 | ... 00:00:00 |      0 |        0 |      0 | running |
|     279 | ... 00:00:00 | ... 00:00:00 |      0 |        0 |      0 | complete|
+---------+--------------+--------------+--------+----------+--------+-- ------+
6 rows in set (2.16 sec)

An example of 'stages' in mysql-6.0 with the profiling patch:
mysql> show profile cpu for query 2;
+--------------------+----------+----------+------------+
| Status             | Duration | CPU_user | CPU_system |
+--------------------+----------+----------+------------+
| starting           | 0.000337 | 0.000000 |   0.000000 |
| Opening tables     | 0.000200 | 0.000000 |   0.000000 |
| System lock        | 0.000118 | 0.000000 |   0.000000 |
| Table lock         | 0.000089 | 0.000000 |   0.000000 |
| init               | 0.000193 | 0.000000 |   0.000000 |
| end                | 0.000076 | 0.000000 |   0.000000 |
| query end          | 0.000072 | 0.000000 |   0.000000 |
| freeing items      | 0.000221 | 0.000000 |   0.000000 |
| logging slow query | 0.000072 | 0.000000 |   0.000000 |
| cleaning up        | 0.000111 | 0.000000 |   0.000000 |
+--------------------+----------+----------+------------+
10 rows in set (0.01 sec)
The "Status" values are I guess as in the MySQL Reference Manual
"7.5.5.2. General Thread States"
"7.5.5.3. Delayed-Insert Thread States"
"7.5.5.4. Replication Master Thread States"
"7.5.5.5. Replication Slave I/O Thread States"
"7.5.5.6. Replication Slave SQL Thread States"
"7.5.5.7. Replication Slave Connection Thread States"
"7.5.5.8. Event Scheduler Thread States"
starting at http://mysql.cs.pu.edu.tw/doc/refman/6.0/en/general-thread-states.html
and SHOW PROCESSLIST is similar.

There's no decision yet whether we should keep the current categories.
If they appear to be the product of careful thought, okay.

Description of stages in another product can be seen by clicking on
'Progress' and looking at changes for 2009-04-13.

It's interesting that some of the above "stages" implementations
use getrusage() or equivalent. We considered getrusage to be of
limited use for short WAIT instrumenting, but for longer operations
we have to consider it. Peter Gulutzan thinks the overhead is too great and
would prefer to only have getrusage() statistics for statements not
stages. Sergei Golubchik suggested we should have whatever SHOW PROFILE has.
See dev-private email thread "Re: BUG#28288 Query Profiler not available in
MySQL 5.1 and 6.0",
[mysql internal address]/secure/mailarchive/mail.php?folder=4&mail=25820
That correspondence mentions that Peter would like to deprecate SHOW PROFILE,
perhaps as part of this task.

Requirements
------------

Stages "should" be sequential. That is:
* one stage should not be nested within another stage
* one stage should not overlap with another stage.
But we must accept that nesting and overlapping will happen.
Keeping a stack is possible, but not for all the thousands of aggregator bytes.
The best we can hope for is to have:
Row#1     First part of Event A
Row#2     All of Event B
Row#3     Second part of Event A
For this, we need to recover the fact that we are resuming Event A
when Event B ends. So, when starting Event B, we need to save that
we were in Event A. And when ending Event B, we recover that fact.
And in Row#3 there is an indication that "this is Part 2" or
"Continued".

Tables in PERFORMANCE_SCHEMA
----------------------------

The name "stages" is uncertain. Other possible names are STATE
(which MySQL uses already) or SEGMENT or STEP or DATABASE CALL,
or [ fill in your own favourite here ].

So the main tables are
EVENTS_STAGES_CURRENT equivalent to EVENTS_WAITS_CURRENT
EVENTS_STAGES_HISTORY equivalent to EVENTS_WAITS_HISTORY
and the column descriptions are the same as in EVENTS tables,
plus summary columns (see WL#4816 section "[ NESTING-EVENT SUMMARIES ]").
So why have a different set of tables for stages?
One thought was: their content is backed up by a different set
of internal buffers. Another thought was: they obviously aren't
WAITS. Indeed, they're obviously not EVENTS either, but we're
sticking to the word EVENTS for all the tables with instrumentation
data.

EVENTS_STAGES_SUMMARY_BY_THREAD_BY_GENUS is the summaries equivalent
for STAGES equivalent to EVENTS_WAITS_SUMMARY_BY_THREAD_BY_GENUS.
There are also of course the nesting-event summaries.
We should consider at least two alternative ways to fill these tables:
(1) At start / During Read / At End.
    When the stage begins, save all the summaries data from the WAITS
    equivalent table, for example "COUNT_IO = 555" Whenever somebody
    reads EVENTS_STAGES_SUMMARY_BY_THREAD_BY_GENUS, get the most recent
    summaries data from the WAITS equivalent table, for example "COUNT_IO =
    666", subtract what the value was at the start, and return the
    difference (666 - 555) "COUNT_IO = 111" to the reader. When the stage
    ends, do the same difference calculation, it's guaranteed to be the
    total number of IO waits done by the thread (though not by any sub-threads)
    while the stage was going on. It works because waits are atomic and
    nested within stages.
(2) Continuously.
    Whenever a WAIT event takes place, add to the counters for the
    NESTING_EVENT_ID. This method probably has higher overhead if there
    are many waits in a stage. But when we get to higher points in the
    nesting hierarchy, e.g. statements, the trick of alternative way (1)
    might become impossible so the code for it isn't reusable. (The problem
    is that statements can contain statements.)

References
----------

dev-private email thread "Tables for showing what stage a statement has reached"
starting with
[mysql internal address]/secure/mailarchive/mail.php?folder=4&mail=21981

WL#1023 Check if we can get CPU time used by thread (1023)
WL#4116 Online Backup: Record Online Backup Progress
WL#4170 Provide progress indication for long queries and operations

Blog article Show profile + Information_schema.profiling
http://blogs.mysql.com/peterg/2008/11/06/show-profile-information_schemaprofiling/

You must be logged in to tag this worklog

No Comments yet

Votes

Not yet rated.
You must be logged in to vote.

Watches

0 members are watching this worklog
You must be logged in to track this worklog.

Provide Feedback

Please note:
HTML will be purified, but we allow for a number of HTML tags so that you have the flexibility to decorate your comment text to some extent. The comments allow the following HTML tags:

strong, b, em, blockquote, a, code, pre

To put code into your comment, simply encapsulate your code with
[code language="XXX"][/code], where XXX is any common language, for instance "PHP", "SQL", "C", etc.



You must be logged in to comment