adelton

Czech support in MySQL

Czech collation support for MySQL 3.23

Current versions of MySQL (3.23+) contain support for a Czech collation that approximates the Czech standard. The support has to be compiled into the server.

To make the Czech collation the default, compile the server with the ./configure option --with-charset=czech. In this case, all sorting on character columns will use Czech rules. When the compile-time option ./configure --with-extra-charsets=all is used, the server will support multiple character sets and collations, and the actual variant can be selected at server startup with the run-time parameter --default-character-set=czech. The default is again given by the --with-charset parameter. When you change the character set used by the server, indexes have to be regenerated; see the chapter The Character Set Used for Data and Sorting in the MySQL documentation. Server parameters can also be specified in a configuration file, typically /etc/my.cnf, using

[mysqld]
default-character-set=czech

This Czech collation table implements a case-sensitive ordering of letters. The MySQL manual talks about case insensitivity, but that only holds for the default (Latin1) character set.

Character sets

Sorting uses the ISO-8859-2 character set. If your data on the client side is in the Windows-1250 character set, on-line translation of character sets between the server and the client can be set up. (People often realize they have this problem when words with the letters š and ž get sorted incorrectly: ISO-8859-2 and Windows-1250 are similar but not exactly the same.) The server has to have this feature compiled in; the easiest way is to uncomment the following line in the file sql/convert.cc

/* #define DEFINE_ALL_CHARACTER_SETS */

before compilation. Then, in the client, issue the command

SET CHARACTER SET cp1250_latin2

The client will then work with data in Windows-1250 and the server will store it in ISO-8859-2.
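To see why š and ž are the letters that give the problem away, the byte values in the two character sets can be compared directly. The following is only an illustrative sketch that uses Python's standard cp1250 and iso8859_2 codecs; it is not how the server's cp1250_latin2 mapping is implemented, and the function names are made up for this example.

```python
def differing_bytes(text):
    """Map each character whose encoding differs to (cp1250 hex, iso-8859-2 hex)."""
    result = {}
    for ch in text:
        win = ch.encode("cp1250")
        iso = ch.encode("iso8859_2")
        if win != iso:
            result[ch] = (win.hex(), iso.hex())
    return result


def recode_cp1250_to_latin2(data):
    """Client-to-server direction: Windows-1250 bytes to ISO-8859-2 bytes."""
    return data.decode("cp1250").encode("iso8859_2")


# Č and č have the same byte value in both sets, Š, š, Ž and ž do not.
diffs = differing_bytes("Šš Žž Čč")
for ch, (win, iso) in diffs.items():
    print(ch, "cp1250=0x" + win, "iso-8859-2=0x" + iso)
```

Data sent as Windows-1250 but treated as ISO-8859-2 therefore keeps most Czech letters intact, and only a handful of them end up on unexpected byte values and sort in the wrong place.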

Server messages

The MySQL distribution contains a message catalogue translated into Czech. The translated messages are turned on at server start-up with the parameter --language=czech.

Support for half-Czech collation in MySQL in Windows-1250

The distribution mysql-3.23.42-win1250ch-1.tar.gz contains the file strings/ctype-win1250ch.c, which implements a simpler two-pass sorting similar to the Czech one. In this collation, ch is sorted correctly, but the primary ordering is case insensitive and uses the Windows-1250 character set. The distribution also includes patches for Configure.in and sql/share/charsets/Index.
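The idea of a two-pass ordering in which ch counts as a single letter after h can be sketched in a few lines. This is a rough illustration only, not the algorithm from ctype-win1250ch.c: the alphabet table, the function names and the choice of the secondary key are assumptions made for this example, and accented letters outside the table are simply compared as their base letter in the first pass.

```python
import unicodedata

# Assumed primary alphabet for this sketch; "ch" is one letter that follows "h".
ALPHABET = ["a", "b", "c", "č", "d", "e", "f", "g", "h", "ch", "i", "j", "k",
            "l", "m", "n", "o", "p", "q", "r", "ř", "s", "š", "t", "u", "v",
            "w", "x", "y", "z", "ž"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def split_letters(word):
    """Split a lowercase word into letters, keeping the digraph 'ch' together."""
    letters = []
    i = 0
    while i < len(word):
        if word[i:i + 2] == "ch":
            letters.append("ch")
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


def primary_rank(letter):
    """Rank of a letter; letters not in the table fall back to their base letter."""
    if letter not in RANK:
        letter = unicodedata.normalize("NFD", letter)[0]
    return RANK.get(letter, len(ALPHABET))  # unknown characters sort last


def sort_key(word):
    # First pass: case-insensitive comparison of letter ranks.
    primary = tuple(primary_rank(l) for l in split_letters(word.lower()))
    # Second pass: the original spelling breaks ties (case, accents).
    return (primary, word)


words = ["cibule", "hrad", "chata", "Chleba", "ihned", "cesta"]
print(sorted(words, key=sort_key))
```

With a plain byte-wise sort, chata would land between cesta and cibule and Chleba would come first because of its capital letter; with the key above, both words follow hrad.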

Support for nearly complete UCA UTF-8 collation in MySQL

The distribution mysql-3.23.42-utf8adnocase.tar.gz contains support for ordering based on the Unicode Collation Algorithm (UCA). Two variants are included, case sensitive and case insensitive.

Both character sets (collations), win1250ch and utf8ad, should by now also be included in current versions of MySQL.

Stripping diacritics in MySQL (il2 to ascii)

I wrote functions that can be used in MySQL to convert text to plain ASCII and to convert between ISO-8859-2 and Windows-1250. The distribution is called udf_charsets-1.0.tar.gz and also contains a README with installation instructions.
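For readers who only want to see what stripping diacritics means in practice, here is a minimal sketch outside MySQL. It does not reproduce the functions from udf_charsets; it assumes the text is already available as a Unicode string and relies on Unicode decomposition from Python's standard library, and the function name strip_diacritics is invented for this example.

```python
import unicodedata


def strip_diacritics(text):
    """Drop combining marks after canonical decomposition (ř becomes r, ů becomes u)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Příliš žluťoučký kůň"))
```

This approach works for Czech because every accented Czech letter decomposes into a base letter plus a combining mark; letters without such a decomposition would pass through unchanged.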

Author

Copyright: (c) 1998--2001 Jan Pazdziora. All rights reserved. This package is free software; you can redistribute it and/or modify it under the terms of either GPL or Artistic Licence, whichever you like more.