Author: Adriano dos Santos Fernandes <adrianosf at uol.com.br>
Architecture
------------
Firebird allow you to specify character sets and collations in every field/variable declaration.
You can also specify the default character set at database create time and every CHAR/VARCHAR declaration that omit character set will use it.
At attachment time you can specify the character set that the client want to read all the strings.
If you don't specify one, NONE is assumed.
There are two specials character sets: NONE and OCTETS.
Both can be used in declarations but OCTETS can't be used in attachment.
They are very similar with the exception that space of NONE is ASCII 0x20 and space of OCTETS is 0x00.
They are specials because they don't follow the rule of others character sets regarding conversions.
With others character sets conversion is performed with CHARSET1->UNICODE->CHARSET2. With NONE/OCTETS the bytes is just copied: NONE/OCTETS->CHARSET2 and CHARSET1->NONE/OCTETS.
Enhancements
------------
Well-formedness checks
----------------------
Some character sets (specially multi-byte) don't accept everything.
Now, the engine verifies if strings are wellformed when assigning from NONE/OCTETS and strings sent by the client (the statement string and parameters).
When NONE is used as attachment character set, the sqlsubtype member of XSQLVAR has the character set number of the read field, instead of always 0 as in previous versions.
The UNICODE_FSS character set has a number of problems: it's an old version of UTF8, accepts malformed strings and doesn't enforce correct maximum string length. In FB 1.5.X UTF8 it's an alias to UNICODE_FSS.
New character sets and collations are implemented through dynamic libraries and installed in the server with a manifest file in intl subdirectory. For an example see fbintl.conf.
Not all implemented character sets and collations need to be listed in the manifest file. Only those listed are available and duplications are not loaded.