Validating multibyte strings

    Validating multibyte strings  
Simon Morgan


09-24-05 12:48 PM

Hi,

The following code is meant to validate a string of multibyte characters
by using mbcheck() to call mblen() on each character in the string passed
to it. The problem is that it isn't working the way I expect. I've included
in the comments what I think mbcheck() should return for each string,
given my understanding of how the multibyte system works.

#include <stdio.h>
#include <stdlib.h>

int mbcheck(const char *);

int main(void) {
    char *a[] = {
        "\x05\x87\x80\x36\xed\xaa", /*  0 */
        "\x20\xe4\x50\x88\x3f",     /* -1 */
        "\xde\xad\xbe\xef",         /* -1 */
        "\x8a\x60\x92\x74\x41"      /*  0 */
    };
    int i;

    for (i = 0; i < sizeof(a) / sizeof(a[0]); i++) {
        printf("%d\n", mbcheck(a[i]));
        puts("--");
    }

    return 0;
}

int mbcheck(const char *s) {
    int n;

    for (mblen(NULL, 0); ; s += n) {
        printf("checking %#.8x\n", *s);
        if ((n = mblen(s, MB_CUR_MAX)) <= 0)
            return n;
        printf("%d\n", n);
    }
}

Does mblen() rely on a locale being set? Reading the man page it doesn't
look like it. This code is for an exercise in the book "C Programming: A
Modern Approach". The strings are supposedly Shift-JIS encoded kanji and I
have no idea which locale that relates to if there is one.

Also could somebody please explain to me what's with all the hexadecimal
f's in the output? As you've probably realised I'm still learning C but
seeing as s points to a char shouldn't printf() only be reading 1 byte and
padding the output with 0?

Many thanks.

--
"Being a social outcast helps you stay concentrated on the really important
things, like thinking and hacking." - Eric S. Raymond
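
On the hexadecimal f's asked about above: *s is a char, and a char passed to printf() is promoted to int before the call, so printf() never reads just one byte. Where char is signed, a byte of 0x80 or above is a negative value, the promoted int is sign-extended, and %x shows it as ffffff87 or similar. A minimal sketch of the usual fix, which is to cast to unsigned char first (byte_to_hex is a hypothetical helper, not part of the original program):

```c
#include <stdio.h>

/* Hypothetical helper: write the two-digit hex form of one byte into out.
   The cast to unsigned char keeps the promoted value in 0..255, so a byte
   such as 0x87 is not sign-extended on platforms where char is signed. */
void byte_to_hex(char c, char out[3])
{
    snprintf(out, 3, "%02x", (unsigned)(unsigned char)c);
}
```

In mbcheck() the same cast could be applied directly in the printf() call, for example by passing (unsigned char)*s instead of *s.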






    Re: Validating multibyte strings  
Ulrich Eckhardt


09-24-05 12:48 PM

Simon Morgan wrote:
> Does mblen() rely on a locale being set? Reading the man page it doesn't
> look like it.

You need to update your manpages, mine (current Debian) explicitly mentions
locales.

> The strings are supposedly Shift-JIS encoded kanji and I
> have no idea which locale that relates to if there is one.

Just for your info, but how is mblen() supposed to know this encoding if not
via the locale?

Uli

--
http://www.erlenstar.demon.co.uk/unix/
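
To make the point about locales concrete: mblen() decodes according to the LC_CTYPE category of the current locale, and a C program starts in the "C" locale unless it calls setlocale(). What follows is a rough sketch of validating under an explicitly chosen locale. Both function names are hypothetical, and the locale name is left to the caller because the name of a Shift-JIS locale (ja_JP.SJIS is one common spelling) varies between systems and the locale may not be installed at all.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical variant of mbcheck(): returns 0 if s is a valid multibyte
   string in the current locale, -1 if an invalid sequence is found. */
int mbcheck_current_locale(const char *s)
{
    int n;

    mblen(NULL, 0);                 /* reset any shift state */
    while ((n = mblen(s, MB_CUR_MAX)) > 0)
        s += n;                     /* step over one whole character */
    return n;                       /* 0 at the terminator, -1 on error */
}

/* Hypothetical wrapper: select the LC_CTYPE locale by name, then validate.
   Returns -2 if the locale cannot be selected (an assumption made here to
   keep that case apart from mblen()'s own -1). */
int mbcheck_in_locale(const char *locale_name, const char *s)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -2;
    return mbcheck_current_locale(s);
}
```

Checking the return value of setlocale() matters here: if the requested locale is missing, the program silently stays in its previous locale and mblen() keeps using that encoding.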





    Re: Validating multibyte strings  
Simon Morgan


09-24-05 12:48 PM

On Sat, 24 Sep 2005 13:51:01 +0200, Ulrich Eckhardt wrote:

> You need to update your manpages, mine (current Debian) explicitly
> mentions locales.

I just spotted it in the NOTES section, which I didn't read. Sorry. 

> Just for your info, but how is mblen() supposed to know this encoding if
> not via the locale?

I thought that the same multibyte encoding rules might apply to all
locales, i.e. a function such as mblen won't need to know the locale to
validate a string but a function used for displaying it would. I'm still
learning C so please excuse my ignorance.

--
"Being a social outcast helps you stay concentrated on the really important
things, like thinking and hacking." - Eric S. Raymond
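
The assumption that the same multibyte rules apply in every locale can be checked by looking at MB_CUR_MAX, which is the maximum number of bytes per character in the current locale and changes with it. A small sketch (max_bytes_in_locale is a hypothetical helper; returning 0 when the requested locale is unavailable is a choice made here for illustration):

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: switch LC_CTYPE to the named locale and report the
   maximum number of bytes in one multibyte character for that locale.
   Returns 0 if the locale cannot be selected. */
size_t max_bytes_in_locale(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return 0;
    return MB_CUR_MAX;
}
```

In the "C" locale this reports 1, while a UTF-8 or Shift-JIS locale reports a larger value. The encodings also disagree about which byte sequences are legal, so the same bytes can be valid in one locale and invalid in another, which is why mblen() has to consult the locale even for validation alone.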





