11-18-05 10:54 PM
When coding a server, we always use the following code:
while(1) {
tmp_sd = accept(sd, (struct sockaddr*)&tmp_sin, &len);
//1
len = recv(tmp_sd, buf, MAX, 0); //2
send(tmp_sd, buf, len, 0); //3
close(tmp_sd);
}
1. When accepting a client, I want to check the client's IP address. How do I do
it?
2. When using UTF-8 for communication, should I translate it into ASCII
for normal use?
Can I deal with it directly, like ASCII?
Thanks a lot.
Re: How to socket and utf-8?
11-18-05 10:54 PM
yarco.w@gmail.com writes:
> When coding a server, we always use the following code:
>
> while(1) {
> tmp_sd = accept(sd, (struct sockaddr*)&tmp_sin, &len);
> //1
> len = recv(tmp_sd, buf, MAX, 0); //2
> send(tmp_sd, buf, len, 0); //3
> close(tmp_sd);
> }
>
> 1. When accepting a client, I want to check the client's IP address. How do I do
> it?
The address of the connecting client is stored in the second argument
to accept().
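A minimal sketch of this answer, assuming an IPv4 (AF_INET) listening socket: accept() fills in the struct passed as its second argument, and inet_ntop() turns the address into text. The helper names format_peer and accept_client are made up for illustration. Note that len has to be set to the size of the address struct before every accept() call.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: write the dotted-quad form of an IPv4 peer address
   into out. Returns 0 on success, -1 on failure. */
int format_peer(const struct sockaddr_in *sin, char *out, size_t outlen)
{
    if (sin->sin_family != AF_INET)
        return -1;
    return inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)outlen) ? 0 : -1;
}

/* Hypothetical helper: accept one client on listening socket sd and store
   its IP address as text in ip. Returns the new descriptor, or -1. */
int accept_client(int sd, char *ip, size_t iplen)
{
    struct sockaddr_in tmp_sin;
    socklen_t len = sizeof tmp_sin;   /* in: buffer size, out: address size */
    int tmp_sd = accept(sd, (struct sockaddr *)&tmp_sin, &len);

    if (tmp_sd >= 0 && format_peer(&tmp_sin, ip, iplen) != 0 && iplen > 0)
        ip[0] = '\0';                 /* not IPv4: leave the text empty */
    return tmp_sd;
}
```

A buffer of INET_ADDRSTRLEN bytes is large enough for any IPv4 address in text form.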
> 2. When using UTF-8 for communication, should I translate it into ASCII
> for normal use?
> Can I deal with it directly, like ASCII?
Nothing special needs to be done. As long as both ends expect the
same encoding, everything should work.
--
Måns Rullgård
mru@inprovide.com
Re: How to socket and utf-8?
11-18-05 10:54 PM
yarco.w@gmail.com writes:
> 2. When using UTF-8 for communication, should I translate it into ASCII
> for normal use?
http://en.wikipedia.org/wiki/Utf-8
In C, you don't have a notion of character.
The type char is merely a small integer, perhaps signed perhaps
unsigned, and the type unsigned char is merely a small unsigned
integer, and the type signed char is merely a small signed integer.
ASCII is an encoding, which is a direct mapping between some
_characters_ and some _integers_. Since the integers of the ASCII
encoding are between 0 and 127, they're small enough to be held in C
variables of type unsigned char. It's just a coincidence.
If you wanted to use the UNICODE encoding, which (to a first
approximation) is a direct mapping between some more _characters_ and
some _integers_, but bigger integers up to 0x10FFFF, you'd need to use
unsigned long int C variables.
Now, both ASCII and UNICODE have a 1-1 mapping between a set of
characters and a set of integers.
But there are other encodings, such as UTF-8, or UTF-16, or
ISO-2022-JP, etc, that map a character to a sequence of numbers of
variable length. However, despite this variable length of characters
encoded in UTF-8, this encoding has some nice properties:
- a character X encoded in ASCII has the same code as the same
character X encoded in UTF-8.
- no multi-byte sequence of UTF-8 contains a byte from the ASCII
range: all multi-byte sequences in UTF-8 use only numbers
between 128 and 255.
So when you use C variables of type unsigned char, you can safely handle
UTF-8 byte sequences as long as you're not interested in the actual
characters represented by the byte sequence, or as long as the only
characters in this UTF-8 byte sequence are ASCII characters.
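To illustrate the difference between bytes and characters, here is a small sketch that counts characters (code points) in a UTF-8 byte string. It assumes the input is valid UTF-8 and relies on one more property of the encoding: continuation bytes always have the bit pattern 10xxxxxx. The function names are made up for this example.

```c
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx (0x80..0xBF). */
int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Count code points in a NUL-terminated byte string that is assumed to be
   valid UTF-8: every byte that is not a continuation byte starts one. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != 0; p++)
        if (!utf8_is_continuation(*p))
            n++;
    return n;
}
```

For the bytes c3 a7 61 ("ça"), strlen() reports 3 bytes while utf8_count() reports 2 characters. For a pure ASCII string the two numbers are equal.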
By the way you cannot "translate utf-8 to ASCII", because most
characters encodable in utf-8 cannot be encoded in ASCII:
$ echo é|iconv -f utf-8 -t ascii
iconv: illegal input sequence at position 0
So you can easily process utf-8 data as a whole, without having to
translate it. What would be "normal use" for your strings?
> Can I deal with it directly, like ASCII?
Globally, yes.
If you want to process the characters, in general, no.
In some cases, yes.
For example: "C'est ça la vie" is encoded in UTF-8 as these bytes:
43 27 65 73 74 20 c3 a7 61 20 6c 61 20 76 69 65
If you want to split this string on spaces (bytes 20), you can do it
as if it was encoded in ASCII, because the space in UNICODE has the
same code as in ASCII, and because UTF-8 doesn't use this code for
anything else than a space. So you can get these four subsequences of
bytes:
43 27 65 73 74
c3 a7 61
6c 61
76 69 65
which, when decoded from UTF-8 give you back these four strings:
"C'est"
"ça"
"la"
"vie"
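The split described above could be sketched in C as follows. This is only an illustration: split_on_space is a made-up name, the buffer is modified in place by strtok(), and runs of several spaces are treated as one separator.

```c
#include <stddef.h>
#include <string.h>

/* Split buf in place on the space byte (0x20), storing up to max pointers
   to the pieces in parts. Returns the number of pieces stored. */
size_t split_on_space(char *buf, char *parts[], size_t max)
{
    size_t n = 0;
    for (char *tok = strtok(buf, "\x20"); tok != NULL && n < max;
         tok = strtok(NULL, "\x20"))
        parts[n++] = tok;
    return n;
}
```

The code never looks at what the bytes c3 a7 mean; it only relies on 0x20 never appearing inside a multi-byte UTF-8 sequence.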
If you keep in mind that in C you are not processing characters, but
bytes, and if you keep in mind the properties of the UTF-8 encoding,
then you can do a great deal without having to decode UTF-8 bytes to
characters. You may want to use:
typedef unsigned char byte;
const byte* bytes=(const byte*)"Hello World"; /* cast needed: a string literal is an array of char */
instead of char and string...
--
__Pascal Bourguignon__ http://www.informatimago.com/
I need a new toy.
Tail of black dog keeps good time.
Pounce! Good dog! Good dog!
Re: How to socket and utf-8?
11-18-05 10:54 PM
yarco.w@gmail.com wrote:
# 2. When using UTF-8 for communication, should I translate it into ASCII
# for normal use?
# Can I deal with it directly, like ASCII?
If you are using the 7-bit ASCII subset, UTF-8 and ASCII are identical.
Non-ASCII characters are encoded as two or more bytes in the range
0x80-0xFF. If you pass signed chars < 0 or unsigned chars >= 128
through unmodified, you will preserve the Unicode characters.
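A small sketch of what this means in code, with made-up function names. The cast to unsigned char matters because plain char may be signed, in which case the bytes of a multi-byte sequence look negative.

```c
#include <stddef.h>

/* Returns 1 if the byte belongs to a non-ASCII (multi-byte) UTF-8 sequence.
   The cast matters: where plain char is signed, such bytes are negative. */
int is_non_ascii(char c)
{
    return (unsigned char)c >= 0x80;
}

/* Copy len bytes unchanged and report how many of them were non-ASCII. */
size_t copy_bytes(char *dst, const char *src, size_t len)
{
    size_t non_ascii = 0;
    for (size_t i = 0; i < len; i++) {
        dst[i] = src[i];
        if (is_non_ascii(src[i]))
            non_ascii++;
    }
    return non_ascii;
}
```

As long as the copy does not mask, translate or drop bytes with the high bit set, the receiver gets the same UTF-8 text.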
--
SM Ryan http://www.rawbw.com/~wyrmwif/
Raining down sulphur is like an endurance trial, man. Genocide is the
most exhausting activity one can engage in. Next to soccer.
Re: How to socket and utf-8?
11-18-05 10:54 PM
yarco.w@gmail.com wrote:
> When coding a server, we always use the following code:
>
> while(1) {
> tmp_sd = accept(sd, (struct sockaddr*)&tmp_sin, &len);
> //1
> len = recv(tmp_sd, buf, MAX, 0); //2
> send(tmp_sd, buf, len, 0); //3
> close(tmp_sd);
> }
>
> 1. When accepting a client, I want to check the client's IP address. How do I do
> it?
Use the getpeername function.
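A hedged sketch of how getpeername() could be used on an already connected socket. It handles IPv4 only, and peer_ip is a made-up helper name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: look up the IPv4 address of the peer connected to fd
   and write it as text into out. Returns 0 on success, -1 otherwise. */
int peer_ip(int fd, char *out, size_t outlen)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof sin;

    if (getpeername(fd, (struct sockaddr *)&sin, &len) != 0)
        return -1;                    /* e.g. not a connected socket */
    if (sin.sin_family != AF_INET)
        return -1;                    /* this sketch handles IPv4 only */
    return inet_ntop(AF_INET, &sin.sin_addr, out, (socklen_t)outlen) ? 0 : -1;
}
```

This is useful when the descriptor was accepted elsewhere and the address from accept() was not kept.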
> 2. When using UTF-8 for communication, should I translate it into ASCII
> for normal use?
That depends on what you want to do with it. If you want to, e.g., display
it on something that doesn't understand UTF-8, you must convert it.
UTF-8 also doesn't provide direct access to the individual characters, if
you ever need that.
> Can I deal with it directly, like ASCII?
For most purposes, yes.
Re: How to socket and utf-8?
11-19-05 10:51 PM
Thanks for the replies.
I am doing this, for example:
A client sends a command, encoded in UTF-8, to the server.
Then the server parses the command and sends the response, also encoded in
UTF-8, to the client...
I don't know whether I can treat it as a normal string.
If I do:
char* msg = "GET apple";
send(sd, msg, strlen(msg), 0);
what's the difference between using ASCII and UTF-8 for transferring?
I get confused...
Why not:
utf8* msg = u"GET apple";
send(sd, (char*)msg, utflen(msg)*sizeof(utf8), 0);
mmm...
Do you mean I can only translate it into Unicode to check
whether it is a command?
Re: How to socket and utf-8?
11-19-05 10:51 PM
yarco.w@gmail.com writes:
> Thanks for the replies.
> I am doing this, for example:
> A client sends a command, encoded in UTF-8, to the server.
> Then the server parses the command and sends the response, also encoded in
> UTF-8, to the client...
> I don't know whether I can treat it as a normal string.
> If I do:
> char* msg = "GET apple";
> send(sd, msg, strlen(msg), 0);
> what's the difference between using ASCII and UTF-8 for transferring?
> I get confused...
> Why not:
> utf8* msg = u"GET apple";
> send(sd, (char*)msg, utflen(msg)*sizeof(utf8), 0);
>
> mmm...
> Do you mean I can only translate it into Unicode to check
> whether it is a command?
[240]> (ext:convert-string-to-bytes "GET apple" charset:ascii)
#(71 69 84 32 97 112 112 108 101)
[241]> (ext:convert-string-to-bytes "GET apple" charset:utf-8)
#(71 69 84 32 97 112 112 108 101)
[242]> (equalp (ext:convert-string-to-bytes "GET apple" charset:ascii)
(ext:convert-string-to-bytes "GET apple" charset:utf-8))
T
So for this specific string, "GET apple" it doesn't make a difference
whether you encode it in ASCII or in UTF-8: you obtain the same byte
sequence.
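The same check can be sketched in C. It assumes an ASCII-compatible execution character set (the usual case), and the names are made up for illustration.

```c
#include <string.h>

/* The byte values listed above for "GET apple". */
static const unsigned char get_apple_bytes[] =
    {71, 69, 84, 32, 97, 112, 112, 108, 101};

/* Returns 1 if the C string literal consists of exactly those bytes. */
int literal_matches_listed_bytes(void)
{
    const char *msg = "GET apple";
    return strlen(msg) == sizeof get_apple_bytes
        && memcmp(msg, get_apple_bytes, sizeof get_apple_bytes) == 0;
}
```

So send(sd, msg, strlen(msg), 0) transmits the same nine bytes whether you think of msg as ASCII or as UTF-8.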
Now, if your command was instead: "REÇOIT une pomme", it would matter:
[244]> (ext:convert-string-to-bytes "REÇOIT une pomme" charset:utf-8)
#(82 69 195 135 79 73 84 32 117 110 101 32 112 111 109 109 101)
[245]> (ext:convert-string-to-bytes "REÇOIT une pomme" charset:ASCII)
*** - Character #\u00C7 cannot be represented in the character set
CHARSET:ASCII
The following restarts are available:
ABORT :R1 ABORT
Break 1 [246]>
As you can see, there's no ASCII encoding for this string.
OK, so you're using UTF-8, and you send this byte sequence:
#(82 69 195 135 79 73 84 32 117 110 101 32 112 111 109 109 101)
What happens when you decode it as ASCII?
(ext:convert-string-from-bytes
(ext:convert-string-to-bytes "REÇOIT une pomme" charset:utf-8)
charset:ascii)
*** - invalid byte #xC3 in CHARSET:ASCII conversion
The following restarts are available:
ABORT :R1 ABORT
Break 1 [250]>
Well, you've got a problem, because ASCII bytes can only be between 0 and 127.
Let's try something else, let's try to decode it as an ISO-8859-1
(Latin-1) bytes:
[251]> (ext:convert-string-from-bytes
(ext:convert-string-to-bytes "REÇOIT une pomme" charset:utf-8)
charset:iso-8859-1)
"REÃ<87>OIT une pomme"
Well, the command is not REÇOIT any more (the two bytes C3 87 were decoded
as two separate characters; <87> above stands for the unprintable one), so
I don't know how your server will be able to understand the command...
(Note that in ISO-8859-1 the code 0x87 encodes no graphical character,
but a control character, "ESA".) http://en.wikipedia.org/wiki/Iso-8859-1
Now, you could define your protocol differently, and say that messages
are made of bytes, and that if the first four bytes are:
71 69 84 32
then it's a GET command and you will call: do_get(msg+4);
and let do_get do whatever it wants with the following bytes, which
can be specified to be UTF-8 bytes if you need.
Similarly, you could define your protocol to say that if the message
starts with these bytes:
82 69 195 135 79 73 84 32
then it's a GET command too, and you will call do_get(msg+8);
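A minimal sketch of such byte-prefix dispatch, with made-up names (after_prefix, get_argument). The French prefix is written as the UTF-8 bytes of "REÇOIT " followed by a space, with 84 for the letter T.

```c
#include <stddef.h>
#include <string.h>

/* If msg (len bytes) starts with prefix (plen bytes), return a pointer to
   the byte just after the prefix; otherwise return NULL. */
const unsigned char *after_prefix(const unsigned char *msg, size_t len,
                                  const unsigned char *prefix, size_t plen)
{
    if (len < plen || memcmp(msg, prefix, plen) != 0)
        return NULL;
    return msg + plen;
}

/* Return the argument bytes of a GET command in either spelling, or NULL
   if the message is not a GET command. */
const unsigned char *get_argument(const unsigned char *msg, size_t len)
{
    static const unsigned char get_en[] = {71, 69, 84, 32};
    static const unsigned char get_fr[] = {82, 69, 195, 135, 79, 73, 84, 32};
    const unsigned char *arg = after_prefix(msg, len, get_en, sizeof get_en);

    if (arg == NULL)
        arg = after_prefix(msg, len, get_fr, sizeof get_fr);
    return arg;
}
```

The comparison is done on bytes with an explicit length, so nothing is ever decoded into characters and the buffer does not need to be NUL-terminated.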
If you defined your protocol this way, you could even do as in HTTP,
let the command specify the encoding used for the data, so you could receive
these commands:
71 69 84 47 65 83 67 73 73 32 102 105 108 101
G E T / A S C I I SP <some ASCII bytes>
71 69 84 47 75 79 73 56 45 82 32 198 193 202 204
G E T / K O I 8 - R SP <some KOI8-R bytes>
71 69 84 47 85 84 70 45 56 32 209 132 208 176 208 185 208 187
G E T / U T F - 8 SP <some UTF-8 bytes>
You could parse them as:
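/* msg is assumed to be a writable, NUL-terminated char buffer */
const char get[]={71,69,84,0};                 /* "GET" */
char* slash=strchr(msg,47);                    /* '/' */
char* space=slash?strchr(slash,32):NULL;       /* ' ' */
if(slash!=NULL&&space!=NULL){
slash[0]=0;                                    /* terminate the command */
space[0]=0;                                    /* terminate the encoding name */
if(strcmp(msg,get)==0){
char* encoding=slash+1;
byte* encoded_bytes=(byte*)(space+1);
do_get(encoding,encoded_bytes);
}
}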
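Data returned by recv() is not NUL-terminated, so a variant that works on a buffer plus a length may be safer than strchr(). The following is only a sketch under that assumption. The struct and function names are invented, and the buffer is left unmodified.

```c
#include <stddef.h>
#include <string.h>

struct get_request {
    const char *encoding;      /* points into msg, not NUL-terminated */
    size_t encoding_len;
    const char *data;          /* encoded bytes after the space */
    size_t data_len;
};

/* Parse "GET/<encoding> <data>" from a buffer of len bytes without modifying
   it. Returns 0 and fills req on success, -1 if the format does not match. */
int parse_get(const char *msg, size_t len, struct get_request *req)
{
    static const char get_slash[] = {71, 69, 84, 47};   /* "GET/" */
    const char *enc, *space;

    if (len < sizeof get_slash || memcmp(msg, get_slash, sizeof get_slash) != 0)
        return -1;
    enc = msg + sizeof get_slash;
    space = memchr(enc, 32, len - sizeof get_slash);    /* first space */
    if (space == NULL)
        return -1;
    req->encoding = enc;
    req->encoding_len = (size_t)(space - enc);
    req->data = space + 1;
    req->data_len = len - (size_t)(req->data - msg);
    return 0;
}
```

The caller can then compare the encoding name with memcmp() and pass the data bytes on, still without decoding them.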
--
"Debugging? Klingons do not debug! Our software does not coddle the
weak."
Re: How to socket and utf-8?
11-20-05 10:51 PM
Thanks, Pascal Bourguignon.
I'm trying to write a DICT server as specified in RFC 2229.
Any suggestions for socket programming?
For example, I don't know how to test whether a client is still alive.
Someone said to use write()... Is there a function like
is_alive(socket_descriptor) to test it?
Thank you very much.
Re: How to socket and utf-8?
11-20-05 10:51 PM
yarco.w@gmail.com writes:
> Thanks, Pascal Bourguignon.
> I'm trying to write a DICT server as specified in RFC 2229.
> Any suggestions for socket programming?
> For example, I don't know how to test whether a client is still alive.
When the client dies, the socket gets closed automatically.
So next time you try to read or write to it, you get an EBADF error.
> Someone said to use write()... Is there a function like
> is_alive(socket_descriptor) to test it?
No, you just use read or write. It would be useless to have an
is_alive function, because the client could die between your call to
is_alive and your call to read or write!
> Thank you very much.
--
__Pascal Bourguignon__ http://www.informatimago.com/
Nobody can fix the economy. Nobody can be trusted with their finger
on the button. Nobody's perfect. VOTE FOR NOBODY.
Re: How to socket and utf-8?
11-20-05 10:51 PM
Pascal Bourguignon <spam@mouse-potato.com> writes:
> yarco.w@gmail.com writes:
>
>
> When the client dies, the socket gets closed automatically.
> So next time you try to read or write to it, you get an EBADF error.
Are you sure about that? The system will detect that the other end
has vanished, that's for sure. However, the file descriptor will
remain open (otherwise it could be reused, causing all sorts of
trouble). Writing to a socket where the other end has closed should
give an EPIPE error, reading should just indicate end of file. It is
also possible to get an ECONNRESET error, depending on how the link
was broken.
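The behaviour described here could be checked with something like the sketch below. The names (peer_status, try_recv, try_send) are made up, and MSG_NOSIGNAL is assumed to be available, as it is on Linux. Without it, a write to a broken connection raises SIGPIPE unless that signal is ignored. Other errors such as EINTR or EAGAIN are simply reported as PEER_ERROR here.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

enum peer_status { PEER_OK, PEER_CLOSED, PEER_ERROR };

/* Try to read up to len (> 0) bytes. A return value of 0 from recv() on a
   stream socket means the peer has closed its end (end of file). */
enum peer_status try_recv(int fd, char *buf, size_t len, ssize_t *got)
{
    ssize_t n = recv(fd, buf, len, 0);

    *got = n;
    if (n > 0)
        return PEER_OK;
    if (n == 0 || errno == ECONNRESET)
        return PEER_CLOSED;
    return PEER_ERROR;
}

/* Try to write len bytes. MSG_NOSIGNAL makes a broken connection show up
   as an EPIPE error instead of raising SIGPIPE. */
enum peer_status try_send(int fd, const char *buf, size_t len)
{
    if (send(fd, buf, len, MSG_NOSIGNAL) >= 0)
        return PEER_OK;
    if (errno == EPIPE || errno == ECONNRESET)
        return PEER_CLOSED;
    return PEER_ERROR;
}
```

In both cases the descriptor itself stays open, so the server still has to call close() on it once it sees PEER_CLOSED.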
--
Måns Rullgård
mru@inprovide.com