WL#3759: Optimize identifier conversion in client-server protocol
Affects: Server-5.5
Status: Complete
Priority: Low

Since 4.1, we use utf8 to store identifiers on disk,
in memory, for lookups, for comparison, and so on.
The move to utf8 was a consequence of introducing
support for multiple character sets under the same
server, in the same database, in the same table,
or even in the same SQL statement: the character set
of identifiers must be a superset of all supported
character sets.
Tests with the "valgrind --cachegrind" profiler detected
some performance degradation between mysqld versions 4.0 and 4.1.
The source of the slowdown is the latin1->utf8->latin1
identifier conversion.
A test client program using the latin1 client character set
sent "SELECT a FROM t1" 100,000 times against an
empty heap table:
CREATE TABLE t1 (a int NOT NULL) TYPE=HEAP;
Version 4.1 generated 1,813,494,123 work units on the mysqld side,
while version 4.0 produced only 1,393,268,776.
4.0 did about 23% less work (4.1 about 30% more), so 4.0 was
roughly a quarter faster for this kind of client application.
There were 420,225,347 extra work units, the most significant being:
74,502,664 sql_string.cc:String::copy()
43,901,866 ctype-utf8.c:my_utf8_uni
42,902,112 ctype-latin1.c:my_wc_mb_latin1
34,800,812 protocol.cc:Protocol::store_string_aux
21,600,676 charset.c:my_charset_same
3,000,060 ctype-utf8.c:my_ismbchar_utf8
6,000,000 protocol.cc:Protocol::send_fields
This is because the utf8->latin1 conversion is done
during Protocol::send_fields().
This is a very unpleasant performance degradation, especially
for users who want only a single character set (as in 4.0).
WL#1898 proposes compiling a "light" version of mysqld
with a single character set, which means that
no character set conversion is necessary at all, and performance
should return closer to that of 4.0.
However, even in the "full" version, we can improve performance
significantly. In many cases "full featured" conversion
is not really necessary. For example, the test program was
using pure ASCII identifiers, which are encoded identically in
utf8 and latin1.
We can optimize the code for cases like utf8->latin1,
and even for some multibyte character sets, for example utf8->gbk.
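
The idea of skipping "full featured" conversion for pure ASCII identifiers can be sketched as follows. This is only an illustration under stated assumptions, not the server's actual code: the function names (is_pure_ascii, utf8_to_latin1_full, identifier_to_latin1) are hypothetical, std::string stands in for the server's own string class, and the slow path maps only U+00A0..U+00FF to latin1, replacing everything else with '?'.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a MySQL function): true if every byte is 7-bit ASCII.
bool is_pure_ascii(const std::string &s) {
  for (unsigned char c : s)
    if (c & 0x80) return false;
  return true;
}

// Slow path: decode each utf8 sequence to a code point, then encode it as latin1.
// Simplified assumption: only U+00A0..U+00FF are mapped; anything else
// (longer sequences, malformed input, other code points) becomes '?'.
std::string utf8_to_latin1_full(const std::string &in) {
  std::string out;
  out.reserve(in.size());
  std::size_t i = 0;
  while (i < in.size()) {
    const unsigned char c = static_cast<unsigned char>(in[i]);
    if (c < 0x80) {
      // Single-byte ASCII character: identical in both character sets.
      out.push_back(static_cast<char>(c));
      i += 1;
      continue;
    }
    const bool two_byte_lead = (c & 0xE0) == 0xC0;
    if (two_byte_lead && i + 1 < in.size()) {
      const unsigned char c2 = static_cast<unsigned char>(in[i + 1]);
      if ((c2 & 0xC0) == 0x80) {
        const std::uint32_t wc = ((c & 0x1Fu) << 6) | (c2 & 0x3Fu);
        out.push_back(wc >= 0xA0 && wc <= 0xFF ? static_cast<char>(wc) : '?');
        i += 2;
        continue;
      }
    }
    // Longer or malformed sequence: skip the lead byte and any
    // continuation bytes, and emit a single replacement character.
    i += 1;
    while (i < in.size() &&
           (static_cast<unsigned char>(in[i]) & 0xC0) == 0x80)
      ++i;
    out.push_back('?');
  }
  return out;
}

// Fast path first: a pure ASCII identifier needs no conversion at all,
// because its bytes are the same in utf8 and latin1.
std::string identifier_to_latin1(const std::string &utf8_name) {
  if (is_pure_ascii(utf8_name)) return utf8_name;
  return utf8_to_latin1_full(utf8_name);
}
```

With this shape, an identifier such as "t1" costs only one scan over its bytes, and the per-character decode and encode steps run only when a non-ASCII byte is present. The same check would also apply to a utf8->gbk conversion, since ASCII bytes are encoded identically there.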
Typical conversion scenarios and the ways of their optimization

1. A new method for the "Protocol" class will be added: