Unix Programming - Question on Unicode


SAM

2005-09-23, 2:49 am

Hi all,

I wanted to know what datatype should be used in a C/C++ program for
16-bit data, i.e. Unicode.

Here is my code that doesn't work.

unsigned short testdata[50]="\x400\x401\x402";
printf("Testdata %x\n",testdata[0]);

This code doesn't compile with the cc compiler on Unix. My flavor of UNIX
is SCO UnixWare 7.1.1.

Note that the data type used is unsigned short and its size is 2 bytes (16 bits).

How do I assign data to a 16-bit array? Is there a datatype for this?

How should I go about it?

Thanks.

Fletcher Glenn

2005-09-23, 2:49 am


"SAM" <mshyamrao@gmail.com> wrote in message
news:1127451303.140695.235700@g47g2000cwa.googlegroups.com...
> Hi all,
>
> I wanted to know what datatype should be used in a C/C++ program for
> 16-bit data, i.e. Unicode.
>
> Here is my code that doesn't work.
>
> unsigned short testdata[50]="\x400\x401\x402";
> printf("Testdata %x\n",testdata[0]);
>
> This code doesn't compile with the cc compiler on Unix. My flavor of UNIX
> is SCO UnixWare 7.1.1.
>
> Note that the data type used is unsigned short and its size is 2 bytes (16 bits).
>
> How do I assign data to a 16-bit array? Is there a datatype for this?
>
> How should I go about it?
>
> Thanks.
>


Try looking up w_char.

--

Fletcher Glenn


Maxim Yegorushkin

2005-09-23, 7:50 am


SAM wrote:

> I wanted to know what datatype should be used in a C/C++ program for
> 16-bit data, i.e. Unicode.


It depends on which library you use for Unicode manipulation. If it is
glibc, then the type is wchar_t; see <wchar.h>.

Roger Leigh

2005-09-23, 7:50 am

On 2005-09-23, Fletcher Glenn <fandxxxmgiiBLOCKED@pacbell.net> wrote:
>
> "SAM" <mshyamrao@gmail.com> wrote in message
> news:1127451303.140695.235700@g47g2000cwa.googlegroups.com...
>
> Try looking up w_char.


ITYM wchar_t.

It might also be a good idea to use an OS other than SCO that actually
has UTF-8 locales and allows UTF-8 source code, that way you can
simply write

const wchar_t testdata[] = L"Grüße";

That said, wchar_t isn't necessarily 16 bits. On GNU/Linux, it's 32.
Is there any particular reason you need 16 bits? UCS is a 32-bit code.
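
As a rough illustration of the point above, here is a minimal sketch of the original poster's data written as a wide string literal, plus a helper that reports how wide wchar_t is on the platform at hand. The function names (wide_char_bits, testdata_length, first_unit, print_first) are made up for this example. Whether the value 0x400 actually denotes the Unicode code point U+0400 depends on the implementation's wchar_t encoding, so treat that as an assumption.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* The L prefix makes this a wide string literal, so each \x escape
   becomes one wchar_t element. Without the prefix, "\x400" is a
   char-sized escape whose value does not fit in a char. */
static const wchar_t testdata[] = L"\x400\x401\x402";

/* Width of one wchar_t in bits on this platform (not fixed by the standard). */
size_t wide_char_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Number of wide characters in testdata, not counting the terminator. */
size_t testdata_length(void)
{
    return wcslen(testdata);
}

/* First element, converted to an unsigned type that is safe to print. */
unsigned long first_unit(void)
{
    return (unsigned long)testdata[0];
}

/* Print the first element in hex; %lx matches the unsigned long value. */
void print_first(void)
{
    printf("Testdata %lx\n", first_unit());
}
```

Calling wide_char_bits() is a quick way to find out whether a given compiler uses 16 or 32 bits before relying on either.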

Bjorn Reese

2005-09-23, 5:55 pm

Maxim Yegorushkin wrote:

> It depends on which library you use for unicode manipulation. If it is
> glibc then the type is wchar_t, see <wchar.h>


wchar_t has been a standard type since 1989 (C89 and XPG3), and
<wchar.h> a few years later (C94 and XPG4). So, it predates glibc
by a wide margin.

--
mail1dotstofanetdotdk

Mikko Rauhala

2005-09-23, 5:55 pm

On Fri, 23 Sep 2005 10:42:20 +0100, Roger Leigh
<${roger}@whinlatter.uklinux.net.invalid> wrote:
> That said, wchar_t isn't necessarily 16 bits. On GNU/Linux, it's 32.
> Is there any particular reason you need 16 bits? UCS is a 32-bit code.


wchar_t doesn't even necessarily use Unicode code points internally,
though it probably does on most relevant systems. One is advised to
check for the presence of the __STDC_ISO_10646__ macro before making
assumptions about what's inside wchar_t. (If present, wchar_t contains
ISO-10646-1/Unicode code points.)
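
A minimal sketch of that check, assuming a C99-style compiler; the two function names are invented for this example and are not part of any library. When the macro is defined, it expands to a yyyymmL value naming the ISO 10646 revision the implementation supports.

```c
#include <wchar.h>

/* Returns 1 if the implementation promises that wchar_t values are
   ISO 10646 (Unicode) code points, 0 if it makes no such promise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Returns the yyyymmL revision value of the macro, or 0 when the
   macro is not defined on this implementation. */
long iso10646_revision(void)
{
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0L;
#endif
}
```

A result of 0 does not prove that wchar_t holds something other than Unicode; it only means the implementation makes no promise either way.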

--
Mikko Rauhala - mjr@iki.fi - <URL:http://www.iki.fi/mjr/>
Transhumanist - WTA member - <URL:http://www.transhumanism.org/>
Singularitarian - SIAI supporter - <URL:http://www.singinst.org/>

Maxim Yegorushkin

2005-09-23, 5:55 pm


Bjorn Reese wrote:
> Maxim Yegorushkin wrote:
>
>
> wchar_t has been a standard type since 1989 (C89 and XPG3), and
> <wchar.h> a few years later (C94 and XPG4). So, it predates glibc
> by a wide margin.


So what?

If you reread my posting carefully, you might notice that I said nothing
about what predates what, nor do I care.

My point is that if you use some library for handling Unicode, you may
want to stick to whichever type the library accepts for Unicode code
points. glibc accepts wchar_t; ICU accepts UChar32.
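
For the case where an interface really does want 16-bit units, one possible sketch follows. It assumes a compiler that provides C99 <stdint.h>; on an older compiler, unsigned short as in the original post would play the same role. The names unit16, get_testdata and unit16_length are hypothetical; a real library would supply its own type and functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for a 16-bit unit; not taken from any library. */
typedef uint_least16_t unit16;

/* A plain "..." string literal cannot initialize an array of a 16-bit
   integer type, so the values are listed in braces instead. The
   remaining elements are zero-initialized, which also terminates
   the sequence. */
static const unit16 testdata[50] = { 0x0400, 0x0401, 0x0402 };

/* Accessor so callers do not depend on the array's storage. */
const unit16 *get_testdata(void)
{
    return testdata;
}

/* Count units up to (not including) the first zero unit. */
size_t unit16_length(const unit16 *s)
{
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}
```

The brace initializer sidesteps the compile error in the original code, at the cost of writing each value out by hand.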

Bjorn Reese

2005-09-26, 6:02 pm

Maxim Yegorushkin wrote:

> So what?
>
> If you reread my posting carefully, you might notice that I said nothing
> about what predates what, nor do I care.


Your posting was phrased in a glibc context, which could make it easy to
misread your reply as saying that wchar_t is a glibc-only feature.

I intended to elaborate on your reply by making the original poster, who
is using UnixWare, aware that wchar_t was introduced to the standards
quite some time ago, which means that wchar_t is widespread and
therefore a good option.

I am a bit surprised at your dismissive reaction.

> My point is that if you use some library for handling Unicode, you may
> want to stick to whichever type the library accepts for Unicode code
> points. glibc accepts wchar_t; ICU accepts UChar32.


Fair point.

--
mail1dotstofanetdotdk

Maxim Yegorushkin

2005-09-27, 2:52 am


Bjorn Reese wrote:

[]

> I am a bit surprised at your dismissive reaction.


Sorry, didn't mean to offend you.


Copyright 2003 - 2008 webservertalk.com