WL#3764: Sinhala Collation

Affects: Server-5.5 — Status: Complete — Priority: Medium

Support a Sinhala (Sri Lanka) collation for the ucs2 and utf8 character sets.

The original patch came from Harshula Jayasuriya, for the
community edition. Decisions and discussions are in the thread
"Re: Patch to add Sinhala (Sri Lanka) collation to MySQL"
(recipients: Bar, Lenz, Peter, Sergei).

See also:
BUG#26474 Add Sinhala script (Sri Lanka) collation to MySQL

From Harshula: """

The Unicode codepage for Sinhala can be broken into 4 categories.
U+0D85 - U+0D96 = independent vowels
U+0D9A - U+0DC6 = consonants
U+0DCA - U+0DF3 = dependent vowels.
U+0D82 - U+0D83 = consonant modifiers.

The collation order of the groups is:
1) independent vowels
2) consonant modifiers
3) consonants
4) dependent vowels.
"""
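
The group ordering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the weight table that the server uses: the names GROUPS, group_rank, char_key and sort_key are hypothetical, ordering by code point inside a group is an assumption, and placing non-Sinhala characters after all Sinhala groups is also an assumption.

```python
# Code point ranges exactly as listed in the note above.
# The rank is the position of the group in the stated collation order.
GROUPS = [
    (0x0D85, 0x0D96, 0),  # independent vowels
    (0x0D82, 0x0D83, 1),  # consonant modifiers
    (0x0D9A, 0x0DC6, 2),  # consonants
    (0x0DCA, 0x0DF3, 3),  # dependent vowels
]


def group_rank(ch):
    """Return the rank of the group containing ch, or None if ch is in no group."""
    cp = ord(ch)
    for lo, hi, rank in GROUPS:
        if lo <= cp <= hi:
            return rank
    return None


def char_key(ch):
    """Key for one character: (group rank, code point)."""
    rank = group_rank(ch)
    if rank is None:
        # Assumption: characters outside the listed ranges sort after them.
        return (len(GROUPS), ord(ch))
    # Assumption: inside a group, characters sort by code point.
    return (rank, ord(ch))


def sort_key(s):
    """Key for a whole string: the list of per-character keys."""
    return [char_key(c) for c in s]


if __name__ == "__main__":
    letters = ["\u0D9A", "\u0D82", "\u0D85", "\u0DCF"]
    print(sorted(letters, key=sort_key))
```

Note that the consonant modifiers (U+0D82 - U+0D83) have lower code points than the independent vowels but sort after them, which is why a plain code point comparison does not give the requested order.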

The standard speaks rather obliquely: """

a) Conjunct letters (බ ඳ අකර) are decomposed into the equivalent
<pure consonant, consonant-with-vowel> sequence e.g. ක -> කක.
b) Touching letters are decomposed into the equivalent <pure consonant,
consonant-with-vowel> sequence e.g. සව -> සව, මම -> මම.
c) The yansaya and rakaransaya are decomposed into their equivalent forms
e.g. ක -> කය and ක -> කර.
d) The repaya is decomposed in its equivalent form e.g. ර -> රම.
e) The letter ඥ is decomposed as follows:
ඥ -> ජඤ.
Thus, ඥ න is collated as being equivalent to ජඤ න.

...

The algorithm for the Simple collation is the same as for the Dictionary
collation sequence, except that the decomposition in step d) of 4.1 is omitted.
Therefore, ඥ is not decomposed into ජඤ but treated as a single letter.

"""
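
The difference between the two sequences for the letter ඥ can be sketched as a pre-processing step applied before comparison. This is a minimal illustration under stated assumptions: the function names dictionary_form and simple_form are hypothetical, the code points U+0DA5 (ඥ), U+0DA2 (ජ) and U+0DA4 (ඤ) are taken from the Unicode Sinhala block rather than from the standard (which does not use code points), and the other decomposition steps are not modelled.

```python
JNYA = "\u0DA5"          # ඥ, treated as a single letter by the Simple sequence
JA_NYA = "\u0DA2\u0DA4"  # ජ followed by ඤ, the decomposed form


def dictionary_form(s):
    """Dictionary sequence: decompose ඥ into ජඤ before comparing."""
    return s.replace(JNYA, JA_NYA)


def simple_form(s):
    """Simple sequence: leave ඥ as it is."""
    return s


if __name__ == "__main__":
    a = "\u0DA5\u0DB1"        # ඥ + න
    b = "\u0DA2\u0DA4\u0DB1"  # ජ + ඤ + න
    print(dictionary_form(a) == dictionary_form(b))  # equal after decomposition
    print(simple_form(a) == simple_form(b))          # still distinct
```

With the Dictionary form the two strings become identical and so collate as equal, while the Simple form keeps them distinct.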

(Additional notes: The standard is not described using Unicode code-points.
The PDF specification can't be scraped via normal means to get literal data.)
