firebird-mirror/doc/README.connection_string_charset.txt

Author: Adriano dos Santos Fernandes <adrianosf@uol.com.br>
Date: 2008-12-15

Before FB 2.5, filenames used in the connection string are always passed from the client to the
server without any conversion. On the server, that filenames are used with OS API functions without
any conversion too. This creates the situation where filenames using non-ASCII characters do not
interoperate well when the client and the server are different OS or even same OS using different
codepages.

The problem is addressed in FB 2.5 in the following way:

The filename is considered, by default, to be on the OS codepage.

A new DPB is introduced, named isc_dpb_utf8_filename. It meaning is to change rule above, so FB
should consider the passed filename as being in UTF-8.

If a v2.5 (or superior) client is communicating with a remote server inferior than v2.5, and
isc_dpb_utf8_filename was used, the client converts the filename from UTF-8 to the client
codepage and pass that filename to the server. The client removes isc_dpb_utf8_filename DPB.
This guarantees backward compatibility where people are using the same codepage on the client and
server OS.

If a v2.5 (or superior) client is communicating with a v2.5 (or superior) server, and
isc_dpb_utf8_filename was not used, the client converts the filename from the OS codepage to
UTF-8 and inserts the isc_dpb_utf8_filename DPB. If isc_dpb_utf8_filename was used, the client just
pass the original filename withing the DPB to the server. So the client always pass to the server
UTF-8 filename and the isc_dpb_utf8_filename DPB.

The filename received on the server is subject to the same rules above. But note that v2.5 client
may automatically coverts the filename and insert the DPB. Client inferior than v2.5 do not,
so the received filenames are going to be considered as on the server codepage. We again guarantees
backward compatibility when client and server codepage are the same.

The OS codepage and UTF-8 may not be the better choice for filenames. For example, if you had a
ISQL (or some other tool) script and that script uses another connection charset. You could not
correctly edit a script (or any file) using multiple character sets (codepages). So you may now
encode any Unicode character as ASCII characters on the connection string filename. That's
accomplished using the symbol #. It is a prefix for an Unicode code point number (in hexadecimal
format, like U+XXXX notation). You should write it in this way: #XXXX with X being 0-9, a-f, A-F.
If you want to use the literal #, you could use ## or #0023 (the code point number of it).
That character is interpreted with this new semantics at the server even if the client is inferior
than v2.5.

The OS codepage used for conversions is:
- Windows: The Windows ANSI code page
- Others: UTF-8
Readme for filename encoding changes 2008-12-16 00:53:03 +01:00			`Author: Adriano dos Santos Fernandes <adrianosf@uol.com.br>`
			`Date: 2008-12-15`

			`Before FB 2.5, filenames used in the connection string are always passed from the client to the`
			`server without any conversion. On the server, that filenames are used with OS API functions without`
			`any conversion too. This creates the situation where filenames using non-ASCII characters do not`
			`interoperate well when the client and the server are different OS or even same OS using different`
			`codepages.`

			`The problem is addressed in FB 2.5 in the following way:`

			`The filename is considered, by default, to be on the OS codepage.`

			`A new DPB is introduced, named isc_dpb_utf8_filename. It meaning is to change rule above, so FB`
			`should consider the passed filename as being in UTF-8.`

			`If a v2.5 (or superior) client is communicating with a remote server inferior than v2.5, and`
			`isc_dpb_utf8_filename was used, the client converts the filename from UTF-8 to the client`
			`codepage and pass that filename to the server. The client removes isc_dpb_utf8_filename DPB.`
			`This guarantees backward compatibility where people are using the same codepage on the client and`
			`server OS.`

			`If a v2.5 (or superior) client is communicating with a v2.5 (or superior) server, and`
			`isc_dpb_utf8_filename was not used, the client converts the filename from the OS codepage to`
			`UTF-8 and inserts the isc_dpb_utf8_filename DPB. If isc_dpb_utf8_filename was used, the client just`
			`pass the original filename withing the DPB to the server. So the client always pass to the server`
			`UTF-8 filename and the isc_dpb_utf8_filename DPB.`

			`The filename received on the server is subject to the same rules above. But note that v2.5 client`
			`may automatically coverts the filename and insert the DPB. Client inferior than v2.5 do not,`
			`so the received filenames are going to be considered as on the server codepage. We again guarantees`
			`backward compatibility when client and server codepage are the same.`

			`The OS codepage and UTF-8 may not be the better choice for filenames. For example, if you had a`
			`ISQL (or some other tool) script and that script uses another connection charset. You could not`
			`correctly edit a script (or any file) using multiple character sets (codepages). So you may now`
			`encode any Unicode character as ASCII characters on the connection string filename. That's`
			`accomplished using the symbol #. It is a prefix for an Unicode code point number (in hexadecimal`
			`format, like U+XXXX notation). You should write it in this way: #XXXX with X being 0-9, a-f, A-F.`
			`If you want to use the literal #, you could use ## or #0023 (the code point number of it).`
			`That character is interpreted with this new semantics at the server even if the client is inferior`
			`than v2.5.`

			`The OS codepage used for conversions is:`
			`- Windows: The Windows ANSI code page`
			`- Others: UTF-8`