|
| Author |
Protocols to exchange messages via a socket
|
|
| the_edge123.nospam@club-internet.fr 2007-03-12, 7:22 am |
| Hello,
I'm not sure I'm at the right group for this question.
I want to send a binary message over a socket and I'm wondering which
protocol is most suitable:
1) <separator><msg in hexa printable characters><separator>
This solution works but doubles the message length :-(
2) <msg length><raw msg>
The issue is that we can't be sure we're at the beginning of a msg.
Can I correctly receive the next message if I close and re-open the
socket after a decoding error?
Thanks in advance,
Fabien
| |
| Robert Harris 2007-03-12, 7:22 am |
| the_edge123.nospam@club-internet.fr wrote:
> Hello,
>
> I'm not sure I'm at the right group for this question.
> I want to send a binary message over a socket and I'm wondering which
> protocols are best suitable:
>
> 1) <separator><msg in hexa printable characters><separator>
> This solution works but doubles the message length :-(
>
> 2) <msg length><raw msg>
> The issue is that we're not sure to be at the beginning of the msg.
> Can I receive correctly the next message if I close and re-open the
> socket in case of a decoding error ?
As long as your <msg length> is right, you don't have to close and
re-open the socket if the message is bad.
If your message length takes up more than one byte, be careful that the
reader and writer of the socket agree on its endianness, i.e. that they
agree which byte of the length is the most significant.
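To illustrate the byte-order point (a minimal sketch, not code from this thread): one common way to make both sides agree is to always send the length in network byte order, i.e. big-endian. The 4-byte width and the helper names `pack_frame` and `unpack_frames` are assumptions made for this example.

```python
import struct

# Assumed header: 4-byte unsigned length in network (big-endian) byte order.
HEADER = struct.Struct("!I")

def pack_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length."""
    return HEADER.pack(len(payload)) + payload

def unpack_frames(buffer: bytes):
    """Return (list of complete payloads, unconsumed tail) from received bytes."""
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        start = offset + HEADER.size
        if len(buffer) - start < length:
            break  # payload not fully received yet
        frames.append(buffer[start:start + length])
        offset = start + length
    return frames, buffer[offset:]
```

The `!` in the format string fixes the byte order regardless of the host CPU, so a little-endian writer and a big-endian reader still decode the same length.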
Robert
>
> Thanks in advance,
> Fabien
>
| |
| the_edge123.nospam@club-internet.fr 2007-03-12, 1:23 pm |
| On 12 mar, 12:09, Robert Harris <robert.f.har...@blueyonder.co.uk>
wrote:
> the_edge123.nos...@club-internet.fr wrote:
>
>
>
>
> As long as your <msg length> is right, you don't have to close and
> re-open the socket if the message is bad.
I think I will use <separator><msg length><raw msg><separator> to be
able to recover in burst-sending mode if msg length is corrupted.
>
> If your message length takes up more than one byte, be careful that the
> reader and writer of the socket agree on its endianness, i.e. that they
> agree which byte of the length is the most significant.
Noted
>
> Robert
>
Thanks,
Fabien
| |
| James Antill 2007-03-12, 1:23 pm |
| On Mon, 12 Mar 2007 03:20:32 -0700, the_edge123.nospam wrote:
> Hello,
>
> I'm not sure I'm at the right group for this question. I want to send a
> binary message over a socket and I'm wondering which protocols are best
> suitable:
>
> 1) <separator><msg in hexa printable characters><separator> This
> solution works but doubles the message length :-(
How does it double the message length?
> 2) <msg length><raw msg>
> The issue is that we're not sure to be at the beginning of the msg. Can
> I receive correctly the next message if I close and re-open the socket
> in case of a decoding error ?
I'd highly recommend net strings:
http://cr.yp.to/proto/netstrings.txt
....just because multiple applications have implemented them, and there
isn't going to be any gain in doing something different. They have both
a length and a "separator". I'd recommend dropping the connection on a
netstring parse error, although it would be possible to search for the
next netstring.
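A minimal sketch of the netstring format described at the URL above: the length in ASCII decimal, a colon, the raw bytes, and a trailing comma. The function names and the `MAX_DIGITS` cap are assumptions for this example, not part of the format.

```python
MAX_DIGITS = 9  # assumed cap on the length field, chosen for this sketch

def encode_netstring(payload: bytes) -> bytes:
    """Encode payload as <decimal length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","

def decode_netstring(buffer: bytes):
    """Return (payload, rest), or None if more bytes are needed.

    Raises ValueError on a malformed netstring; as suggested above,
    the caller would normally drop the connection in that case.
    """
    colon = buffer.find(b":")
    digits = buffer if colon == -1 else buffer[:colon]
    if len(digits) > MAX_DIGITS or (digits and not digits.isdigit()):
        raise ValueError("bad length field")
    if colon == -1:
        return None  # length field not complete yet
    if digits == b"" or (len(digits) > 1 and digits.startswith(b"0")):
        raise ValueError("bad length field")
    end = colon + 1 + int(digits)
    if len(buffer) < end + 1:
        return None  # payload or trailing comma not received yet
    if buffer[end:end + 1] != b",":
        raise ValueError("missing trailing comma")
    return buffer[colon + 1:end], buffer[end + 1:]
```

For example, `encode_netstring(b"hello world!")` produces `b"12:hello world!,"`.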
--
James Antill -- james@and.org
http://www.and.org/and-httpd/ -- $2,000 security guarantee
http://www.and.org/vstr/
| |
| Rainer Weikusat 2007-03-12, 1:23 pm |
| James Antill <james-netnews@and.org> writes:
> On Mon, 12 Mar 2007 03:20:32 -0700, the_edge123.nospam wrote:
>
> How does it double the message length?
If you transform the original binary values into hexadecimal
representation, each nibble becomes a hex digit.
| |
| Martin Vuille 2007-03-12, 7:20 pm |
| the_edge123.nospam@club-internet.fr wrote in
news:1173708146.840736.307610@n33g2000cwc.googlegroups.com:
> I think I will use <separator><msg length><raw msg><separator>
> to be able to recover in burst-sending mode if msg length is
> corrupted.
As long as <separator> cannot occur in <raw msg>, even if there is
corruption.
MV
--
Do not send e-mail to the above address. I do not read e-mail sent
there.
| |
| William Ahern 2007-03-12, 7:20 pm |
| On Mon, 12 Mar 2007 03:20:32 -0700, the_edge123.nospam wrote:
> Hello,
>
> I'm not sure I'm at the right group for this question.
> I want to send a binary message over a socket and I'm wondering which
> protocols are best suitable:
>
> 1) <separator><msg in hexa printable characters><separator>
> This solution works but doubles the message length :-(
>
> 2) <msg length><raw msg>
> The issue is that we're not sure to be at the beginning of the msg.
> Can I receive correctly the next message if I close and re-open the
> socket in case of a decoding error ?
>
If all you're doing is writing messages across the wire, then no.
You cannot be sure how many messages were lost on the wire behind any
corrupted messages, but which were assumed "sent" by the sender. Also,
assuming TCP, the atomicity of writes isn't guaranteed. You could have lost
1/2 a packet in transit.
You have to implement some sort of message accounting, for example
using windows and some retry protocol. To get an idea of how to approach
things (aside from reading the TCP RFCs), read up on this implementation:
http://airhook.ofb.net/
| |
| Barry Margolin 2007-03-13, 1:29 am |
| In article <1173694832.757185.244440@p10g2000cwp.googlegroups.com>,
the_edge123.nospam@club-internet.fr wrote:
> Hello,
>
> I'm not sure I'm at the right group for this question.
> I want to send a binary message over a socket and I'm wondering which
> protocols are best suitable:
Rather than designing your own, why not use an existing library like
XDR? And if you also use the RPC library it will handle lots of other
aspects of the protocol, such as automatic retransmission if you're
using UDP.
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
| |
| David Schwartz 2007-03-13, 1:29 am |
| On Mar 12, 3:20 am, the_edge123.nos...@club-internet.fr wrote:
> I'm not sure I'm at the right group for this question.
> I want to send a binary message over a socket and I'm wondering which
> protocols are best suitable:
> 1) <separator><msg in hexa printable characters><separator>
> This solution works but doubles the message length :-(
You can use base 64 instead of hex if you prefer. That will increase
the message length by much less (about a third).
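The size difference can be checked with a short sketch using Python's standard library. The 768-byte test payload is an arbitrary choice for this example.

```python
import base64
import binascii

payload = bytes(range(256)) * 3  # 768 arbitrary binary bytes

hex_encoded = binascii.hexlify(payload)  # two characters per input byte
b64_encoded = base64.b64encode(payload)  # four characters per three input bytes

def overhead(encoded: bytes, original: bytes) -> float:
    """Relative size increase of the encoded form over the original."""
    return len(encoded) / len(original) - 1.0

print(f"hex:    {len(hex_encoded)} bytes, +{overhead(hex_encoded, payload):.0%}")
print(f"base64: {len(b64_encoded)} bytes, +{overhead(b64_encoded, payload):.0%}")
```

For this payload, hex gives 1536 bytes and base64 gives 1024 bytes.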
Alternatively, you can pick a character that appears only rarely in
your message, and use it as a separator. For example, if 0x55 appears
only rarely in your message, you can use 0x55 followed by 0x00 as your
message separator and 0x55 followed by 0xff to indicate a real 0x55 in
your message. This will only double a message if it consists of all
0x55's. (If you want to get fancy, you can use 0x55 followed by the
number of consecutive 0x55's in the real message, so two or more
consecutive 0x55's don't double in size. Use 0x55 0x00 as a
separator.)
You only need to reserve one character as a separator.
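A minimal sketch of the basic escaping scheme described above (0x55 0x00 ends a message, 0x55 0xff stands for a real 0x55). The function names are assumptions for this example, and the run-length variant is not shown.

```python
ESC = 0x55      # the reserved byte
END = 0x00      # ESC END terminates a message
LITERAL = 0xFF  # ESC LITERAL stands for one real 0x55 in the message

def stuff(message: bytes) -> bytes:
    """Escape every 0x55 in the message and append the terminator."""
    out = bytearray()
    for byte in message:
        out.append(byte)
        if byte == ESC:
            out.append(LITERAL)
    out += bytes([ESC, END])
    return bytes(out)

def unstuff(stream: bytes):
    """Return (complete messages, bytes after the last complete message)."""
    messages = []
    current = bytearray()
    consumed = 0
    i = 0
    while i < len(stream):
        byte = stream[i]
        if byte != ESC:
            current.append(byte)
            i += 1
            continue
        if i + 1 >= len(stream):
            break  # escape pair split across reads; wait for more data
        follower = stream[i + 1]
        if follower == END:
            messages.append(bytes(current))
            current.clear()
            consumed = i + 2
        elif follower == LITERAL:
            current.append(ESC)
        else:
            raise ValueError("invalid escape sequence")
        i += 2
    return messages, stream[consumed:]
```

In a well-formed stream 0x55 only ever appears as the first byte of a pair, so a receiver that joins mid-stream can scan for 0x55 0x00 to find the next message boundary.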
> 2) <msg length><raw msg>
> The issue is that we're not sure to be at the beginning of the msg.
> Can I receive correctly the next message if I close and re-open the
> socket in case of a decoding error ?
What kind of socket is this? If it's TCP, this is not an issue. Just
always start a new connection with the length of a message. If
something goes wrong somehow, close and re-open the connection and
know you'll start at the beginning of a message.
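A minimal sketch of that approach over a stream socket: read exactly the header, then exactly the payload, and treat any failure as a reason to close and reconnect. The 4-byte big-endian length, the `max_length` sanity limit, and the function names are assumptions for this example.

```python
import socket
import struct

def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly count bytes; recv() may return fewer per call."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send a 4-byte big-endian length followed by the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_message(sock: socket.socket, max_length: int = 1 << 20) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    if length > max_length:
        # Implausible length: the caller should close and reconnect.
        raise ValueError("message length exceeds limit")
    return recv_exact(sock, length)
```

On either exception the caller closes the socket and opens a new connection, which by construction starts at a message boundary.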
There are very few real-world cases where resynchronization is a real
issue. Broadcast is one of them; is that what you're doing?
DS
| |
|
| On Mar 12, 6:20 am, the_edge123.nos...@club-internet.fr wrote:
> Hello,
>
> I'm not sure I'm at the right group for this question.
> I want to send a binary message over a socket ...
This seems to be something of an FAQ. See thread @
http://groups.google.com/group/comp...ad268645815f16/
> Fabien
| |
| Rainer Weikusat 2007-03-13, 7:24 am |
| William Ahern <william@25thandClement.com> writes:
> On Mon, 12 Mar 2007 03:20:32 -0700, the_edge123.nospam wrote:
>
> If all you're doing is writing messages across the wire, then no.
> You cannot be sure how many messages were lost on the wire behind any
> corrupted messages, but which were assumed "sent" by the sender. Also,
> assuming TCP the atomicity of writes isn't guaranteed. You could have lost
> 1/2 a packet in transit.
Since TCP ordinarily provides reliable, in-order delivery of a stream
of bytes, no message can ever be 'lost in transit'.
| |
| William Ahern 2007-03-13, 7:17 pm |
| On Tue, 13 Mar 2007 10:57:33 +0100, Rainer Weikusat wrote:
> William Ahern <william@25thandClement.com> writes:
>
> Since TCP ordinarily provides reliable, in-order delivery of a stream
> of bytes, no message can ever be 'lost in transit'.
It's a pretty weak guarantee, given all the horrifying network routers and
proxies in-place these days. Also, would you trust all of your data to
the reliability of the CRC16 checksum in TCP? SSH and SSL/TLS provide much
stronger guarantees (which is maybe the answer most useful to the OP.)
Your hard drive ordinarily correctly stores your data. Yet, at my job I
spend an inordinate amount of time writing self-healing software;
software that fixes itself when the data it expected to have been
properly written to disk comes back mangled. It's not enough to simply
recognize the problem. The software cannot always simply say, "Oops, I
can't continue." Sometimes it has to figure out how to recover all on its
own, especially when it's an appliance.
Maybe it's 1 in a billion occurrence, but when you have a hundred thousand
machines in the field, chugging away 24/7... it happens... often.
| |
|
| On Mar 13, 4:16 pm, William Ahern <will...@25thandClement.com> wrote:
> On Tue, 13 Mar 2007 10:57:33 +0100, Rainer Weikusat wrote:
>
>
>
>
>
>
> It's a pretty weak guarantee, given all the horrifying network routers and
> proxies in-place these days. Also, would you trust all of your data to
> the reliability of the CRC16 checksum in TCP? SSH and SSL/TLS provide much
> stronger guarantees (which is maybe the answer most useful to the OP.)
Good point.
>
> Your hard drive ordinarily correctly stores your data. Yet, at my job I
> spend an inordinate amount of time writing self-healing software;
> software that fixes itself when the data it expected to have been
> properly written to disk comes back mangled. It's not enough to simply
> recognize the problem. The software cannot always simply say, "Oops, I
> can't continue." Sometimes it has to figure out how to recover all on its
> own, especially when it's an appliance.
I guess you're no stranger to Sun's ZFS then? They also recognised
that data integrity should be guaranteed at a layer below
applications.
>
> Maybe it's 1 in a billion occurrence, but when you have a hundred thousand
> machines in the field, chugging away 24/7... it happens... often.
| |
| the_edge123.nospam@club-internet.fr 2007-03-14, 7:26 am |
| On 13 mar, 05:09, "David Schwartz" <dav...@webmaster.com> wrote:
> On Mar 12, 3:20 am, the_edge123.nos...@club-internet.fr wrote:
>
>
> You can use base 64 instead of hex if you prefer. That will increase
> the message length by much less (about a third).
>
> Alternatively, you can pick a character that appears only rarely in
> your message, and use it as a separator. For example, if 0x55 appears
> only rarely in your message, you can use 0x55 followed by 0x00 as your
> message separator and 0x55 followed by 0xff to indicate a real 0x55 in
> your message. This will only double a message if it consists of all
> 0x55's. (If you want to get fancy, you can use 0x55 followed by the
> number of consecutive 0x55's in the real message, so two or more
> consecutive 0x55's don't double in size. Use 0x55 0x00 as a
> separator.)
>
> You only need to reserve one character as a separator.
You gave me the idea to use <separator><msg length><raw msg><separator>
If necessary, I will add a CRC
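A minimal sketch of such a frame with a CRC added. The separator value 0x7e, the 2-byte big-endian length, the CRC-32 placed after the payload, and the `MAX_PAYLOAD` limit are all assumptions for this example. Since the raw message may itself contain the separator byte, the decoder treats a separator only as a candidate start and relies on the length, the CRC, and the closing separator to confirm it.

```python
import struct
import zlib

SEP = b"\x7e"                 # assumed separator byte
LENGTH = struct.Struct("!H")  # assumed 2-byte big-endian payload length
CRC = struct.Struct("!I")     # CRC-32 of the payload
MAX_PAYLOAD = 1024            # assumed limit; rejects implausible lengths

def build_frame(payload: bytes) -> bytes:
    """Return <sep><length><payload><crc><sep>."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large")
    return (SEP + LENGTH.pack(len(payload)) + payload
            + CRC.pack(zlib.crc32(payload)) + SEP)

def extract_frames(buffer: bytes):
    """Return (valid payloads, unconsumed tail), skipping damaged data."""
    frames = []
    pos = 0
    while True:
        start = buffer.find(SEP, pos)
        if start == -1:
            return frames, b""  # nothing that could start a frame
        header_end = start + 1 + LENGTH.size
        if len(buffer) < header_end:
            return frames, buffer[start:]  # wait for the length field
        (length,) = LENGTH.unpack_from(buffer, start + 1)
        if length > MAX_PAYLOAD:
            pos = start + 1  # false start: search again one byte later
            continue
        crc_at = header_end + length
        end = crc_at + CRC.size  # index of the closing separator
        if len(buffer) < end + 1:
            return frames, buffer[start:]  # wait for the rest of the frame
        payload = buffer[header_end:crc_at]
        (crc,) = CRC.unpack_from(buffer, crc_at)
        if buffer[end:end + 1] == SEP and crc == zlib.crc32(payload):
            frames.append(payload)
            pos = end + 1
        else:
            pos = start + 1  # damaged or false start: resynchronize
```

One limitation of this sketch: a false start with a plausible length makes the decoder wait for that many bytes before it can reject the candidate, so recovery is delayed until more data arrives.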
| |
| Rainer Weikusat 2007-03-14, 7:26 am |
| William Ahern <william@25thandClement.com> writes:
> On Tue, 13 Mar 2007 10:57:33 +0100, Rainer Weikusat wrote:
>
> It's a pretty weak guarantee, given all the horrifying network routers and
> proxies in-place these days.
This will still not result in 'messages being [silently] lost in
transit'. One of the things TCP is somewhat infamous for is called
'head-of-line blocking' and it means that a lost segment will cause
data delivery to an application to be stopped until it has been
received, no matter if later segments have been received already.
> Also, would you trust all of your data to the reliability of the
> CRC16 checksum in TCP?
There is no CRC16 checksum in TCP. Apart from that, considering
personal and general experience and knowledge of 'common lower level
protocols' (for instance, ethernet FCS) I am willing to believe that
the person who designed the protocol knew what he was doing unless I
have (statistically) sound evidence to the contrary.
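For reference, the TCP checksum is a 16-bit ones'-complement sum (the Internet checksum described in RFC 1071), not a CRC. The sketch below computes it over a plain byte string; it leaves out the pseudo-header that real TCP also covers, and the function name is an assumption for this example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

original = b"\x12\x34\xab\xcd"
swapped = b"\xab\xcd\x12\x34"  # same 16-bit words in a different order

print(hex(internet_checksum(original)), hex(internet_checksum(swapped)))
```

Because addition is commutative, both inputs give the same checksum, which shows one class of error a plain sum cannot detect.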
> SSH and SSL/TLS provide much stronger guarantees
If TCP was as bad as you appear to believe, you would have a really
hard time getting anything over SSH or TLS except a lot of
transmission errors.
[...]
> Your hard drive ordinarily correctly stores your data.
> Yet, at my job I spend an inordinate amount of time writing
> self-healing software; software that fixes itself when the data it
> expected to have been properly written to disk comes back mangled.
That would be an entirely different topic.
| |
| David Schwartz 2007-03-14, 1:24 pm |
| On Mar 13, 1:16 pm, William Ahern <will...@25thandClement.com> wrote:
> It's a pretty weak guarantee, given all the horrifying network routers and
> proxies in-place these days. Also, would you trust all of your data to
> the reliability of the CRC16 checksum in TCP? SSH and SSL/TLS provide much
> stronger guarantees (which is maybe the answer most useful to the OP.)
The TCP checksum is sufficient to protect non-critical traffic from
accidental corruption. It, of course, provides no protection at all
from intentional corruption. Take a look at a typical network
connection and you will actually see *very* little accidental
corruption.
> Maybe it's 1 in a billion occurrence, but when you have a hundred thousand
> machines in the field, chugging away 24/7... it happens... often.
And yet it all seems to work somehow.
I have seen four cases of verified corrupted data received over a TCP
connection, where the data that came out one end was not what went in
the other. In all four cases, the cause was tracked to be a device
that reconstructed an accurate checksum. (Most recently, a broken
piece of Windows proxy/firewall software helpfully neutralizing
suspicious looking bytes inside a perfectly innocent program I
downloaded.) So a stronger checksum would not really help, though a
cryptographic one that could not be faked by intermediaries would
have.
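A minimal sketch of a keyed check that an intermediary cannot recompute. The post only says "a cryptographic one"; HMAC-SHA256, the pre-shared key, and the function names here are assumptions for this example.

```python
import hashlib
import hmac

TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes

def seal(key: bytes, message: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed with the shared key."""
    return message + hmac.new(key, message, hashlib.sha256).digest()

def open_sealed(key: bytes, sealed: bytes) -> bytes:
    """Verify the tag and return the message, or raise ValueError."""
    if len(sealed) < TAG_SIZE:
        raise ValueError("too short to contain a tag")
    message, tag = sealed[:-TAG_SIZE], sealed[-TAG_SIZE:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return message
```

A device that rewrites bytes in transit can recompute a TCP checksum, but without the key it cannot produce a matching tag.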
DS
| |
| William Ahern 2007-03-14, 7:20 pm |
| On Wed, 14 Mar 2007 10:15:44 +0100, Rainer Weikusat wrote:
> William Ahern <william@25thandClement.com> writes:
>
> This will still not result in 'messages being [silently] lost in
> transit'. One of the things TCP is somewhat infamous for is called
> 'head-of-line blocking' and it means that a lost segment will cause
> data delivery to an application to be stopped until it has been
> received, no matter if later segments have been received already.
>
I thought that it was implicit in the OP's question that dropped TCP
connections were occurring, or at least a risk that must be accounted for.
In that case, indeed his protocol messages could have been lost
in-transit, because
Normally I would agree that TCP is good enough. But if it's critical that
state is maintained properly between two disparate systems, then not only
is TCP insufficient, neither is the OP's simple construction. And,
again, it seemed implicit in the OP's question that maintaining state is
a criterion.
| |
| the_edge123.nospam@club-internet.fr 2007-03-15, 7:23 am |
| On 13 mar, 02:11, Barry Margolin <bar...@alum.mit.edu> wrote:
> In article <1173694832.757185.244...@p10g2000cwp.googlegroups.com>,
>
> the_edge123.nos...@club-internet.fr wrote:
>
>
> Rather than designing your own, why not use an existing library like
> XDR?
I don't need XDR because my data are BER-encoded.
| |
| David Schwartz 2007-03-16, 1:27 am |
| On Mar 15, 2:09 am, the_edge123.nos...@club-internet.fr wrote:
> I don't need XDR because my data are BER-encoded.
Then what do you need the length for? Every BER object either contains
its length in the first few bytes or has an 'end of object' marker.
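A minimal sketch of reading that length from the start of a BER element: skip the identifier octets, then decode the short, long, or indefinite length form. The function name and return convention are assumptions for this example, and it does not decode the contents.

```python
def parse_ber_header(data: bytes):
    """Return (header_length, content_length) for the element at data[0].

    content_length is None for the indefinite form, where the contents
    run until the end-of-contents octets 00 00.
    Returns None if more bytes are needed to read the header.
    """
    if not data:
        return None
    pos = 1
    if data[0] & 0x1F == 0x1F:  # high-tag-number form: more tag octets follow
        while True:
            if pos >= len(data):
                return None
            pos += 1
            if not data[pos - 1] & 0x80:  # last tag octet has the top bit clear
                break
    if pos >= len(data):
        return None
    first = data[pos]
    pos += 1
    if first < 0x80:
        return pos, first  # short form: the octet is the length
    if first == 0x80:
        return pos, None  # indefinite form
    if first == 0xFF:
        raise ValueError("reserved length octet")
    count = first & 0x7F  # long form: number of length octets that follow
    if len(data) < pos + count:
        return None
    return pos + count, int.from_bytes(data[pos:pos + count], "big")
```

With a definite length, a receiver needs `header_length + content_length` bytes in total before handing the element to the decoder.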
DS
| |
| William Ahern 2007-03-16, 7:20 pm |
| On Thu, 15 Mar 2007 02:09:43 -0700, the_edge123.nospam wrote:
> On 13 mar, 02:11, Barry Margolin <bar...@alum.mit.edu> wrote:
> I don't need XDR because my data are BER-encoded.
You should have said that earlier. Have you tried:
http://lionet.info/asn1c/
This is what I use, in conjunction w/ an asynchronous I/O library. And the
generated code is intended to be used in a stream-oriented fashion, where
you can feed it any number of bytes from the network in any number of
calls, and it will eventually output an object.
| |
| Rainer Weikusat 2007-03-18, 7:16 pm |
| William Ahern <william@25thandClement.com> writes:
> On Wed, 14 Mar 2007 10:15:44 +0100, Rainer Weikusat wrote:
>
> I thought that it was implicit in the OP's question that dropped TCP
> connections were occurring, or at least a risk that must be accounted for.
> In that case, indeed his protocol messages could have been lost
> in-transit, because
No, in that case, the protocol messages do not 'just get lost in
transit', because the kernel will abort the TCP connection with some
kind of error notification communicated to the application. Or, more
precisely, they may fall onto the floor (see below) but will make a
loud noise when hitting it.
> Normally I would agree that TCP is good enough. But if it's critical that
> state is maintained properly between two disparate systems, then not only
> is TCP insufficient, neither is the OP's simple construction. And,
> again, it seemed implicit in the OP's question that maintaining state is
> a criterion.
If it is 'critical' that '[distributed] state is maintained properly',
then you are certainly not going to communicate over 'The Internet'
(or likely, any network) because this will not work and isn't supposed
to. Network communication can always fail in ways that make software
based recovery without data loss impossible. If there is an actual
reason to believe that a particular transport protocol is not reliable
enough, a different transport protocol could be used (like SCTP, for
instance, which has a CRC32c checksum) or a different network
protocol could be used (IPsec). Another option would be to use a
minimal transport protocol (like UDP) and implement some kind of
reliability and integrity at the application level (for instance, using
Kerberos/GSSAPI to integrity protect or encrypt data
message-per-message).
To reimplement a transport layer protocol in a library on top of
another transport protocol is usually (IMO) a bad idea. Within
comparatively short times (a couple of years), one will end up with
something very similar to NetBIOS-based protocols, where most of the
data going over the wire is actually noise caused by again wrapping
another layer around it, which again reimplements what used to be
already there (and still is), but is buried below enough other layers
that reimplement functionality that ... and so forth, that nobody still
remembers this (like using a byte stream protocol to implement a datagram
protocol to implement RPC to implement a byte stream protocol).
| |
| James Antill 2007-03-19, 7:24 pm |
| On Mon, 12 Mar 2007 18:53:52 +0100, Rainer Weikusat wrote:
> James Antill <james-netnews@and.org> writes:
>
> If you transform the original binary values into hexadecimal
> representation, each nibble becomes a hex digit.
Why would you _require_ each side send the full length of digits when
not using binary, and having a length/separator?
See:
http://www.and.org/texts/ascii_binary_data
....Compact HEX vs. binary 64 bit values are a 75% reduction, for values
0-15. Binary only _starts_ to win at 16,777,216 and beyond. Sure, if you
assume 32bit values then you _start_ to win at 4,096 with binary ... but:
1) That's still pretty high (it's very likely that a lot of numbers going
over the network are going to be _small_, IMNSHO). 2) Everything is well
on its way to having 64bit native numbers.
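A small sketch for checking these sizes. It assumes a number is sent as hex digits without leading zeros plus exactly one separator byte; the post does not spell out the per-number overhead, so that detail is an assumption.

```python
BINARY_64 = 8  # bytes in a fixed-width 64-bit integer
BINARY_32 = 4  # bytes in a fixed-width 32-bit integer

def compact_hex_size(value: int) -> int:
    """Bytes on the wire: hex digits plus one assumed separator byte."""
    return len(format(value, "x")) + 1

for value in (0, 15, 4095, 4096, 16_777_215, 16_777_216, 268_435_456):
    print(f"{value:>11}: compact hex {compact_hex_size(value)} bytes, "
          f"binary {BINARY_32} or {BINARY_64} bytes")
```

Under that assumption, values 0-15 take 2 bytes instead of 8 (the 75% reduction), 32-bit binary becomes smaller from 4,096, and 64-bit binary ties at 16,777,216 and is strictly smaller from 268,435,456.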
--
James Antill -- james@and.org
http://www.and.org/and-httpd/ -- $2,000 security guarantee
http://www.and.org/vstr/
| |
| Rainer Weikusat 2007-03-20, 7:27 am |
| James Antill <james-netnews@and.org> writes:
> On Mon, 12 Mar 2007 18:53:52 +0100, Rainer Weikusat wrote:
>
>
> Why would you _require_ each side send the full length of digits when
> not using binary, and having a length/separator?
I didn't 'specify' the message format.
|
|
|
|
|