WL#2878: Simple data auditing

Affects: Server-7.0 — Status: On-Hold — Priority: Medium

MySQL will provide simple data auditing features. 
 
This task is related to:  
WL#1410 Audit/Security log separate from general query log 
WL#1019 SELECT FROM slow_query logs 
 
We mainly need the information that mysqld produces in log 
files, e.g. with "mysqld --log", "mysqld --log-error", 
"mysqld --log-warnings". Fields should include: 
- for SQL statements: the entire SQL statement 
  with no truncation, the start time to the nearest second, 
  the error number or "OK", IP address, user name 
- for system messages (i.e. the sort that mysqld prints), 
  the time to the nearest second, and the message text 
- any messages that are deliberately added to the log, 
  if it turns out that such things can be produced with WL#1034 
The results should be readable as CSV files. 
 
If there is more than one server, logs 
must be merged. There are some scheduling requirements. 
 
There are reporting requirements, but they are not server tasks. 
 
We are waiting for Jimmy Guerrero or Robin Schumacher to fill in 
the description. 

<<<<Added by Jimmy Guerrero Dec 6 2005 >>>>

In a nutshell what is required from the server team is the ability 
to guarantee that ALL DDL, DML, DCL, SELECTs, and Server Admin commands be 
logged to a log, a series of logs (as with Microsoft SQL Server) or to a table 
(as with Oracle) so that the Data Auditing Agent can scrape or select from. 

At a minimum the complete SQL statement, IP or Host, Login, Application, 
Timestamp, error number if any, and success or failure of the command should be 
logged.

Truncating, selectively deleting or marking this log should not impact the 
MySQL Server in any way.

Logging must be independent of connector (JDBC, Perl DBI, C API, etc).
-- end note

<<<<Added by Jimmy Guerrero Sept 12 2006 >>>>

Some more general descriptions concerning typical reporting needs:

* Application Audit Report 
** List all activity by application. Note: Not sure this is possible given that
we are unable to track back the application that created the connection to MySQL.

* Database Object Audit Report
** Bulk data movement activity (i.e. LOAD DATA INFILE, etc)
** Bulk data movement activity for last N days 
** Backup, restore activity 
** Backup, restore for last N days 
** Activity for specified objects (SELECT, DELETE, INSERT, TRUNCATE, EXEC,
ALTER, etc)

* DDL Audit Report
** Schema changes made to specified databases 
** Schema changes made to specified databases in last N days

* Host Audit Report:
** Activity for specified hosts 

* Policy Audit Report
** Changes to audit settings. Note: Not sure this is possible if we have no
upfront filtering of what gets logged.

* Security Audit Report
** Security changes for specified objects 
** Security changes for specified objects in last N days 
** Security changes performed by specified users 
** Activity for which permission was denied 
** Login activity for specified users 

* User Audit Report
** Login creation activity 
** Login creation activity in last N days 
** Login deletion activity 
** List all login deletion activity in last N days
** Activity for specified users 
-- end note
The audit log contains information about SQL statements,  
and is written at end-of-statement, and is not for
replication. So its closest equivalent is "slow query log".
The proposal is: add new options for slow query log,
instead of adding a new kind of log. We can refer to
the changed slow query log as "audit log" for marketing
purposes, but it's really "slow query log with minimum
time = zero and a few more fields".

References to "the request document" are references to the
file MySQL_Network_Auditing_Manager_MRD.doc; see the
"File Attachments" section of this worklog task.

===Starting===
The difference between the regular slow query and audit
log is: long_query_time can be zero (or, perhaps, minus one).
If it is, then everything is logged. So
SET @@long_query_time = 0;
enables auditing with the slow query log.

Update: Monty changed long_query_time behaviour, see BUG#6238.
So now we need to use long_query_time = -1, or a new flag.

===Fields===
For existing fields, use the names given by WL#1019:

name contents
---- --------
start_time TIMESTAMP
user_host MEDIUMTEXT
query_time TIME
We don't need this for auditing.
lock_time TIME
We don't need this for auditing.
rows_sent INTEGER
We don't need this for auditing.
rows_examined INTEGER
We don't need this for auditing.
db VARCHAR
We don't need this for auditing. However, it's easier to
interpret the value of the sql_text column if one knows what
the last "use database_name;" statement was.
last_insert_id INTEGER
We don't need this for auditing.
server_id INTEGER
sql_text MEDIUMTEXT
If this is not a logging of COM_QUERY, the value is the
internal name, e.g. 'COM_QUIT'. If it is a logging of
COM_QUERY, then it's the SQL statement.

The new fields are:

sql_text_cset VARCHAR
It would be easiest to say that sql_text is UTF8, but logging
in the client's current character set should be possible. So,
this is the name of the character set that sql_text is in. To
see all possible names, see information_schema.character_sets.
error INTEGER
It's easy to see, from this, whether the statement succeeded
or failed. We won't show SQLSTATE, and we won't show the error
message application (for explanation see request document).
If this means "application program" or "package" or "language"
("Java", "MySQL GUI", etc.), then the value is NULL in version
5.2. If this means "SQL procedure", it's irrelevant because we
won't show statements that are within procedures. This could,
however, be a good place to put "Event Name".
Requirement from product management is:
"Ideally the column should hold the name of the client
application that created the connection to MySQL Server. This
column should be populated with the values passed by the
application rather than the displayed name of the program."
Conclusion: the "application" column will be NULL in the
initial version unless the logged statement is from an event,
in which case the event name is shown.
"OS login" (Appears in the request document)
This is probably tied up with other security-related features
that we're not certain about yet. In WL#1054 we say that
system_user() won't be operating-system user.
role_name If we implement WL#988 Roles, the question may arise:
"by what authority did the user execute this?"
We don't want to search for the last SET ROLE statement.

===Events===
The following events cause writing to the slow query log.

Any command that sql_parse.cc:dispatch_command() sees
(COM_INIT_DB, COM_REGISTER_SLAVE, COM_TABLE_DUMP,
COM_CHANGE_USER, COM_STMT_EXECUTE, COM_STMT_FETCH,
COM_SEND_LONG_DATA, COM_STMT_PREPARE, COM_STMT_CLOSE,
COM_STMT_RESET, COM_STATISTICS, COM_PING, COM_QUERY,
COM_QUIT, COM_CREATE_DB, COM_DROP_DB, COM_REFRESH,
COM_SHUTDOWN).

Currently the slow query log only has COM_QUERY statements
(alter analyze backup call change check
checksum commit create dellocate delete describe do drop
execute flush grant handler help insert kill load lock
optimize keycache preload prepare purge release rename
repair replace reset restore revoke rollback savepoint
select set show slave start truncate unlock update use
xa).

Any error that causes the DBMS to write to 'error log'.
In this case, the value of the user_host field is:
* "Server Error".

Connect. This can be tracked outside dispatch_command(),
perhaps in sql_parse.cc:handle_one_connection().
Notice that a refused connection (which can happen becaues
there's no space, because the packet is wrong, etc.) will
not appear in the log. See also WL#3955 Auditing SUPER users access.

(There was a paragraph here about 'Deadlock', it was
removed due to objections made at a Sorrento meeting.)

Statements caused by CREATE EVENT will be in the log.
The user name will be the name of the event's creator,
since events run with definer privileges. To distinguish
an event-caused statement from a user-caused statement,
the "application" field will contain the event name.

===Switch logs===
The requirement is: people must be able to move/destroy/edit
logs while the server is running. We make this possible by
allowing log switching.

Currently, one specifies the slow query log as a mysqld switch
and ensures writing with FLUSH LOGS. We need a bit more than that.

Syntax:

AUDIT filename;

The verb AUDIT also appears in Oracle.

The 'name' is the name of a log file. This must be a string
but it doesn't have to be a literal. Example: 'C:\WINNT\LOG1'.

You need the RELOAD privilege to do AUDIT, because
something vaguely akin to FLUSH LOGS occurs.

When you execute AUDIT, you cause closing of any existing
slow log file, and opening of a new file with name = filename.
So this makes it easy to switch to a new log file, and then
you can edit the old log file. However, you can't edit the
log file until the server closes it.

Since version 5.1.6 it's been possible to log to a table.
But "AUDIT filename" has no effect if logging is to a table.

===Yet Another Variable===
The auditing people want to log every statement.
But others want to log every statement, except statements that
end in an error (the only exception is: they want to see any
deadlock errors).

So we have to add 'long_query_errors_too', which is TRUE
if we want to log statements that end with errors, and is
FALSE otherwise. The default is TRUE.

===Features===
The log will be readable with the new WL#1019 feature
(WL#1019 = "SELECT FROM slow_query logs", complete).

If one starts the server with
* --log-output = { TABLE | FILE | NONE }
one can control whether the log is in a file, or in a table, or in nothing.

The log will be visible with the mysqldumpslow utility.

===Limitations===
There's no filtering. It's likely that the log will get big and
unwieldy very quickly. Perhaps this is not an issue for
the people who want audits, since they want to sell a
program which does filtering after the logs are produced.

There's no encryption.

There's no compression.

LOAD is logged, but the contents of the file are not logged.

CALL is logged, but the statements within routines are not logged.

EXECUTE is logged, but the contents of its argument are not logged.
(You have to search for an earlier PREPARE statement.)

INSERT/UPDATE/DELETE are logged, but statements that they trigger
are not logged.

We use operating-system buffering. Therefore:
* (a) the most recent statements might be lost in a system crash;
* (b) operating-system "flush everything to disk" calls take longer.

We don't show what switches were in place when mysqld started.
So you can't tell the sql_mode, or whether autocommit is on.

We may strip comments.

There is no special statement for adding notes to the log.

Since the statement is logged when it ends (not when it starts):
* (a) entries will not be in order by start time;
* (b) statements which never end will never be logged.

There can only be one log at a time.

There is no linkage with the plans for monitoring.
Although WL#2360 is also "scheduled" for version 7.x, it's stalled.
So it seemed best to assume that this logging task will have
to be completed before the monitoring task starts.

===Outstanding Issues===
These issues were raised during the review meeting of
this task during the Sorrento DevCon, in March/2006:

- Replication: If replication errors should be included in
the audit log, then we need a WL on that in particular. The
rpl errors are not that structured right now, not all of them
are listed in the error file, some are just printed out on screen.
We need to fix that at some point in time.

- Name change: Do we change the name of the slow_query log to
audit_log? If so, in which version? What sort of notification
to users is necessary?

===References===

"Oracle audit"
* http://www.securityfocus.com/infocus/1689

"SQL Server 2000 auditing"
* http://www.microsoft.com/technet/security/prodtech/sqlserver/sql2kaud.mspx

WL#1282 Add log file which would log error codes together with queries.

Feature requests:
BUG#6238 long_query_time less then 1 (now closed by Michael Widenius)

BUG#17350 General log should indicate whether commands were successful
BUG#17572 Deadlock log

See also
WL#4261 Add OS uid to general query log for localhot connections

You must be logged in to tag this worklog

Awesome! I used this on my penny stock newsletter site to grab leads from the huge tables

Votes

  • Rated 5.00 out of 5
Rated 5.00 out of 5 with 3 votes cast.
You must be logged in to vote.

Watches

5 members are watching this worklog
You must be logged in to track this worklog.

Provide Feedback

Please note:
HTML will be purified, but we allow for a number of HTML tags so that you have the flexibility to decorate your comment text to some extent. The comments allow the following HTML tags:

strong, b, em, blockquote, a, code, pre

To put code into your comment, simply encapsulate your code with
[code language="XXX"][/code], where XXX is any common language, for instance "PHP", "SQL", "C", etc.



You must be logged in to comment