Archive: Unix Programming, September 2005
Question on Unicode

Hi all,
I wanted to know what datatype to use in a C/C++ program for
16-bit characters, i.e. Unicode.
Here is my code, which doesn't work:
unsigned short testdata[50]="\x400\x401\x402";
printf("Testdata %x\n",testdata[0]);
This code doesn't compile with the cc compiler on Unix. My flavor of UNIX
is SCO UnixWare 7.1.1.
Note that the data type used is unsigned short, and its size is 2 bytes (16 bits).
How do I assign data to a 16-bit array? Is there a datatype for this?
How should I go about it?
Thanks.

Fletcher Glenn 2005-09-23, 2:49 am

"SAM" <mshyamrao@gmail.com> wrote in message
news:1127451303.140695.235700@g47g2000cwa.googlegroups.com...
> Hi all,
>
> I wanted to know what datatype to use in a C/C++ program for
> 16-bit characters, i.e. Unicode.
>
Try looking up w_char.
--
Fletcher Glenn

Maxim Yegorushkin 2005-09-23, 7:50 am

SAM wrote:
> I wanted to know what datatype to use in a C/C++ program for
> 16-bit characters, i.e. Unicode.
It depends on which library you use for Unicode manipulation. If it is
glibc, then the type is wchar_t; see <wchar.h>.

Roger Leigh 2005-09-23, 7:50 am
On 2005-09-23, Fletcher Glenn <fandxxxmgiiBLOCKED@pacbell.net> wrote:
>
> "SAM" <mshyamrao@gmail.com> wrote in message
> news:1127451303.140695.235700@g47g2000cwa.googlegroups.com...
>
> Try looking up w_char.
ITYM wchar_t.
It might also be a good idea to use an OS other than SCO that actually
has UTF-8 locales and allows UTF-8 source code; that way, you can
simply write
const wchar_t testdata[] = L"Grüße";
That said, wchar_t isn't necessarily 16 bits. On GNU/Linux, it's 32.
Is there any particular reason you need 16 bits? UCS is a 32-bit code.

Bjorn Reese 2005-09-23, 5:55 pm
Maxim Yegorushkin wrote:
> It depends on which library you use for Unicode manipulation. If it is
> glibc, then the type is wchar_t; see <wchar.h>.
wchar_t has been a standard type since 1989 (C89 and XPG3), and
<wchar.h> a few years later (C94 and XPG4). So, it predates glibc
by a wide margin.
--
mail1dotstofanetdotdk

Mikko Rauhala 2005-09-23, 5:55 pm
On Fri, 23 Sep 2005 10:42:20 +0100, Roger Leigh
<${roger}@whinlatter.uklinux.net.invalid> wrote:
> That said, wchar_t isn't necessarily 16 bits. On GNU/Linux, it's 32.
> Is there any particular reason you need 16 bits? UCS is a 32-bit code.
wchar_t doesn't even necessarily use Unicode code points internally,
though it probably does on most relevant systems. One is advised to
check whether the __STDC_ISO_10646__ macro is defined before making
assumptions about what's inside wchar_t. (If it is defined, wchar_t
values are ISO 10646-1/Unicode code points.)
--
Mikko Rauhala - mjr@iki.fi - <URL:http://www.iki.fi/mjr/>

Maxim Yegorushkin 2005-09-23, 5:55 pm

Bjorn Reese wrote:
> Maxim Yegorushkin wrote:
>
>
> wchar_t has been a standard type since 1989 (C89 and XPG3), and
> <wchar.h> a few years later (C94 and XPG4). So, it predates glibc
> by a wide margin.
So what?
If you reread my posting carefully, you might notice that I don't speak
about what predates what, nor do I care.
My point is that if you use some library for handling Unicode, you may
want to stick to whichever type the library accepts for Unicode code
points: glibc accepts wchar_t, ICU accepts UChar32.

Bjorn Reese 2005-09-26, 6:02 pm
Maxim Yegorushkin wrote:
> So what?
>
> If you reread my posting carefully, you might notice that I don't speak
> about what predates what, nor do I care.
Your posting was phrased in a glibc context, which could make it easy to
misread your reply as saying that wchar_t is a glibc-only feature.
I intended to elaborate on your reply by making the original poster, who
is using UnixWare, aware that wchar_t was introduced to the standards
quite some time ago, which means that wchar_t is widespread and
therefore a good option.
I am a bit surprised at your dismissive reaction.
> My point is that if you use some library for handling Unicode, you may
> want to stick to whichever type the library accepts for Unicode code
> points: glibc accepts wchar_t, ICU accepts UChar32.
Fair point.

Maxim Yegorushkin 2005-09-27, 2:52 am

Bjorn Reese wrote:
[]
> I am a bit surprised at your dismissive reaction.
Sorry, didn't mean to offend you.