Category: Specifications

SimpleDataAuditing


This page describes the MySQL plan for Simple Data Auditing, which is currently under consideration. Internally it's known as WL#2878 (WL stands for "Work Log"; conventionally worklog entries are written like this). The worklog is updated periodically, and at regular intervals we'll update this page.

The format of this page has been derived manually from the original entry, and may contain errors and slightly outdated information. Please quote from this page when sending feedback on the specification. Thank you.

Contents

[edit] High-level description

MySQL will provide simple data auditing features.

This task is related to:

We mainly need the information that mysqld produces in log files, e.g. with "mysqld --log", "mysqld --log-error", "mysqld --log-warnings". Fields should include:

The results should be readable as CSV files.

If there is more than one server, logs must be merged. There are some scheduling requirements.

There are reporting requirements, but they are not server tasks.

We are waiting for Jimmy Guerrero or Robin Schumacher to fill in the description.

<<<<Added by Jimmy Guerrero Dec 6 2005 >>>>

In a nutshell what is required from the server team is the ability to guarantee that ALL DDL, DML, DCL, SELECTs, and Server Admin commands be logged to a log, a series of logs (as with Microsoft SQL Server) or to a table (as with Oracle) so that the Data Auditing Agent can scrape or select from.

At a minimum the complete SQL statement, IP or Host, Login, Application, Timestamp, error number if any, and success or failure of the command should be logged.

Truncating, selectively deleting or marking this log should not impact the MySQL Server in any way.

Logging must be independent of connector (JDBC, Perl DBI, C API, etc). -- end note

<<<<Added by Jimmy Guerrero Sept 12 2006 >>>>

Some more general descriptions concerning typical reporting needs:

-- end note

[edit] High-level specification

The audit log contains information about SQL statements, and is written at end-of-statement, and is not for replication. So its closest equivalent is "slow query log". The proposal is: add new options for slow query log, instead of adding a new kind of log. We can refer to the changed slow query log as "audit log" for marketing purposes, but it's really "slow query log with minimum time = zero and a few more fields".

[edit] Starting

The difference between the regular slow query and audit log is: long_query_time can be zero (or, perhaps, minus one). If it is, then everything is logged. So SET @@long_query_time = 0; enables auditing with the slow query log.

[edit] Fields

For existing fields, use the names given by WL#1019:

 name           contents  
 ----           --------  
 start_time     TIMESTAMP  
 user_host      MEDIUMTEXT  
 query_time     TIME  
                We don't need this for auditing.  
 lock_time      TIME  
                We don't need this for auditing.  
 rows_sent      INTEGER  
                We don't need this for auditing.  
 rows_examined  INTEGER  
                We don't need this for auditing.  
 db             VARCHAR  
                We don't need this for auditing. However, it's easier to 
                interpret the value of the sql_text column if one knows what
                the last "use database_name;" statement was. 
 last_insert_id INTEGER  
                We don't need this for auditing.  
 server_id      INTEGER  
 sql_text       MEDIUMTEXT  
                If this is not a logging of COM_QUERY, the value is the 
                internal name, e.g. 'COM_QUIT'. If it is a logging of 
                COM_QUERY, then it's the SQL statement.  
               

The new fields are:

 sql_text_cset  VARCHAR 
                It would be easiest to say that sql_text is UTF8, but logging 
                in the client's current character set should be possible. So,
                this is the name of the character set that sql_text is in. To 
                see all possible names, see information_schema.character_sets. 
 error          INTEGER  
                It's easy to see, from this, whether the statement succeeded
                or failed. We won't show SQLSTATE, and  we won't show the error
                message application (Appears in the request document, attached)
                If this means "application program" or "package" or "language"
                ("Java", "MySQL GUI", etc.), then the value is NULL in version
                5.2. If this means "SQL procedure", it's irrelevant because we
                won't show statements that are within procedures. This could, 
                however, be a good place to put "Event Name". 
                Requirement from product management is: 
                "Ideally the column should hold the name of the client 
                application that created the connection to MySQL Server. This 
                column should be populated with the values passed by the
                application rather than the displayed name of the program."
                Conclusion: the "application" column will be NULL in the 
                initial version unless the logged statement is from an event, 
                in which case the event name is shown.
 "OS login"     (Appears in the request document) 
                This is probably tied up with other security-related features
                that we're not certain about yet. In WL#1054 we say that 
                system_user() won't be operating-system user. 
 role_name      If we implement WL#988 Roles, the question may arise: 
                "by what authority did the user execute this?" 
                We don't want to search for the last SET ROLE statement. 

[edit] Events

The following events cause writing to the slow query log.

Any command that sql_parse.cc:dispatch_command() sees (COM_INIT_DB, COM_REGISTER_SLAVE, COM_TABLE_DUMP, COM_CHANGE_USER, COM_STMT_EXECUTE, COM_STMT_FETCH, COM_SEND_LONG_DATA, COM_STMT_PREPARE, COM_STMT_CLOSE, COM_STMT_RESET, COM_STATISTICS, COM_PING, COM_QUERY, COM_QUIT, COM_CREATE_DB, COM_DROP_DB, COM_REFRESH, COM_SHUTDOWN).

Currently the slow query log only has COM_QUERY statements (alter analyze backup call change check checksum commit create dellocate delete describe do drop execute flush grant handler help insert kill load lock optimize keycache preload prepare purge release rename repair replace reset restore revoke rollback savepoint select set show slave start truncate unlock update use xa).

Any error that causes the DBMS to write to 'error log'. In this case, the value of the user_host field is:

Connect. This can be tracked outside dispatch_command(), perhaps in sql_parse.cc:handle_one_connection(). Notice that a refused connection (which can happen becaues there's no space, because the packet is wrong, etc.) will not appear in the log.

(There was a paragraph here about 'Deadlock', it was removed due to objections made at a Sorrento meeting.)

Statements caused by CREATE EVENT will be in the log. The user name will be the name of the event's creator, since events run with definer privileges. To distinguish an event-caused statement from a user-caused statement, the "application" field will contain the event name.

[edit] Switch logs

The requirement is: people must be able to move/destroy/edit logs while the server is running. We make this possible by allowing log switching.

Currently, one specifies the slow query log as a mysqld switch and ensures writing with FLUSH LOGS. We need a bit more than that.

Syntax:

AUDIT filename;

The verb AUDIT also appears in Oracle.

The 'name' is the name of a log file. This must be a string but it doesn't have to be a literal. Example: 'C:\WINNT\LOG1'.

You need the RELOAD privilege to do AUDIT, because something vaguely akin to FLUSH LOGS occurs.

When you execute AUDIT, you cause closing of any existing slow log file, and opening of a new file with name = filename. So this makes it easy to switch to a new log file, and then you can edit the old log file. However, you can't edit the log file until the server closes it.

Since version 5.1.6 it's been possible to log to a table. But "AUDIT filename" has no effect if logging is to a table.

[edit] Yet Another Variable

The auditing people want to log every statement. But others want to log every statement, except statements that end in an error (the only exception is: they want to see any deadlock errors).

So we have to add 'long_query_errors_too', which is TRUE if we want to log statements that end with errors, and is FALSE otherwise. The default is TRUE.

[edit] Features

The log will be readable with the new WL#1019 feature (WL#1019 = "SELECT FROM slow_query logs", complete).

If one starts the server with

one can control whether the log is in a file, or in a table, or in nothing.

The log will be visible with the mysqldumpslow utility.

[edit] Limitations

There's no filtering. It's likely that the log will get big and unwieldy very quickly. Perhaps this is not an issue for the people who want audits, since they want to sell a program which does filtering after the logs are produced.

There's no encryption.

There's no compression.

LOAD is logged, but the contents of the file are not logged.

CALL is logged, but the statements within routines are not logged.

EXECUTE is logged, but the contents of its argument are not logged. (You have to search for an earlier PREPARE statement.)

INSERT/UPDATE/DELETE are logged, but statements that they trigger are not logged.

We use operating-system buffering. Therefore:

We don't show what switches were in place when mysqld started. So you can't tell the sql_mode, or whether autocommit is on.

We may strip comments.

There is no special statement for adding notes to the log.

Since the statement is logged when it ends (not when it starts):

There can only be one log at a time.

There is no linkage with the plans for monitoring. Although WL#2360 is also "scheduled" for version 5.2, it's stalled. So it seemed best to assume that this logging task will have to be completed before the monitoring task starts.

[edit] Outstanding Issues

These issues were raised during the review meeting of this task during the Sorrento DevCon, in March/2006:

- Replication: If replication errors should be included in the audit log, then we need a WL on that in particular. The rpl errors are not that structured right now, not all of them are listed in the error file, some are just printed out on screen. We need to fix that at some point in time.

- Name change: Do we change the name of the slow_query log to audit_log? If so, in which version? What sort of notification to users is necessary?

[edit] References

"Oracle audit"

"SQL Server 2000 auditing"

WL#1282 Add log file which would log error codes together with queries.

Retrieved from "http://forge.mysql.com/wiki/SimpleDataAuditing"

This page has been accessed 10,177 times. This page was last modified 21:50, 24 January 2011.

Find

Browse
MySQLForge
Main Page
Current events
Recent changes
Random page
Help
Edit
Edit this page
Editing help
This page
Discuss this page
Post a comment
Printable version
Context
Page history
What links here
Related changes
My pages
Special pages
New pages
File list
Statistics
Bug reports
More...