06-30-04 11:00 PM
SQL only supports little endian Unicode (x86 processor architecture is
little endian). Technically SQL understands little endian UCS-2, but
UCS-2 is equivalent to UTF-16 with the exception of surrogate pairs. You
can store and retrieve little endian UTF-16 data including surrogates in
a Unicode (nvarchar) column of SQL 2000 with a few caveats. SQL 2k is
what we term "surrogate safe", meaning:
- Surrogate characters can be entered and retrieved without data loss.
- Surrogate characters are considered two separate Unicode characters,
i.e. an nvarchar(1) cannot fit a surrogate character.
- String operations are not "surrogate aware". E.g.:
  - Substring(nvarchar(2),1,1) will result in half a surrogate character
    if the input is a 4-byte surrogate character.
  - In sorting & searching, all surrogate characters compare equal to all
    other surrogate characters.
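To make the caveats above concrete, here is a minimal sketch in plain Python (not SQL Server code; the helper names are made up for illustration). It shows that one supplementary character occupies two 16-bit code units in little endian UTF-16, that keeping only the first code unit leaves half a surrogate pair, and that big endian data can be byte-swapped into the little endian form before it is stored.

```python
# Illustration only: plain Python, not SQL Server code.
ch = "\U00020000"  # U+20000, a CJK ideograph outside the Basic Multilingual Plane

le = ch.encode("utf-16-le")  # little endian, the byte order described above
be = ch.encode("utf-16-be")  # big endian form of the same character

# Split the little endian bytes into 16-bit code units.
units = [int.from_bytes(le[i:i + 2], "little") for i in range(0, len(le), 2)]


def is_surrogate(unit):
    """True if a 16-bit code unit lies in the surrogate range."""
    return 0xD800 <= unit <= 0xDFFF


# Keep only the first code unit, roughly what a one-character substring
# of a surrogate pair would give you.
first_half = le[:2]


def decodes_cleanly(raw):
    """True if the bytes are well-formed little endian UTF-16."""
    try:
        raw.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


def be_to_le(raw):
    """Hypothetical helper: swap each byte pair (big endian -> little endian)."""
    swapped = bytearray(raw)
    swapped[0::2], swapped[1::2] = raw[1::2], raw[0::2]
    return bytes(swapped)


print(len(le), [hex(u) for u in units])  # 4 bytes, two surrogate code units
print(decodes_cleanly(first_half))       # half a pair is not valid on its own
print(be_to_le(be) == le)                # byte-swapped data matches the LE form
```

The byte swap is only one way to handle big endian input; most client libraries can transcode for you before the data reaches the server.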
SQL 2000 was written before Unicode 3.0 existed (it was written more in
the Unicode 2.0 timeframe). This means that SQL will not recognize
characters that were added to the standard fairly recently. They can
still be stored and retrieved, but SQL considers them
to be "undefined" UTF-16 Unicode code points. Undefined Unicode
characters are handled like surrogates, by which I mean that they are
considered to be equal to all other undefined code points.
I apologize that I am not very familiar with HKSCS, but what I can find
on this standard seems to indicate that it is a standard set of
characters that can be mapped into the user-defined regions of Big-5 or
Unicode/ISO10646. If by "hkscs" you mean characters encoded in the
user-defined range of code points in the Unicode standard, then these
could be stored in a Unicode column and would be handled just like
surrogates or undefined Unicode characters as described above. However,
if by "hkscs" you are referring to Hong Kong data encoded with the Big-5
character set, that cannot be stored in the same column as Unicode data
because the encoding schemes are completely different.
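As a rough illustration of why the two encodings cannot share a column, the sketch below uses Python's built-in "big5", "big5hkscs" and "utf-16-le" codecs as stand-ins (this is not SQL Server code, and the helper name is hypothetical). The same character produces different bytes under each scheme, so Big-5 data would need to be transcoded on the client before it goes into a Unicode column.

```python
# Illustration only: Python codecs stand in for the two encoding schemes.
text = "中"  # U+4E2D, a character present in both Big-5 and Unicode

as_big5 = text.encode("big5")        # Big-5 double-byte encoding
as_utf16 = text.encode("utf-16-le")  # little endian UTF-16

# Reading Big-5 bytes as if they were UTF-16 yields a different character.
misread = as_big5.decode("utf-16-le", errors="replace")


def big5_to_utf16le(raw, codec="big5hkscs"):
    """Hypothetical helper: transcode Big-5 (with HKSCS extensions) to UTF-16LE."""
    return raw.decode(codec).encode("utf-16-le")


print(as_big5.hex(), as_utf16.hex())         # different byte sequences
print(misread == text)                       # False: the bytes are not interchangeable
print(big5_to_utf16le(as_big5) == as_utf16)  # True once transcoded
```

Which Unicode code points the HKSCS characters end up at depends on the mapping table in use, so treat the codec choice here as an assumption.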
HTH,
Bart
------------
Bart Duncan
Microsoft SQL Server Support
Please reply to the newsgroup only - thanks.
This posting is provided "AS IS" with no warranties, and confers no
rights.
--------------------
Thread-Topic: Unicode and hkscs
From: examnotes <Fai@discussions.microsoft.com>
Subject: Unicode and hkscs
Date: Tue, 29 Jun 2004 02:08:01 -0700
Newsgroups: microsoft.public.sqlserver.server
Does SQL 2000 support Unicode UTF-16, Unicode big endian, Unicode 3.0 and
HKSCS format in a single column?