Unix Programming - ordering of socket connections?


Author ordering of socket connections?
Henry Townsend

2006-04-27, 7:55 am

I have a classic client/server app; clients connect via sockets and the
server uses a select() loop to handle those connections. The only
complication is that each client must connect twice, once to deliver a
START token and then again later to deliver some data followed by an END
token. And of course the START *must* arrive before the END.

I've noticed that under heavy load, occasionally START and END seem to
arrive out of order. The client is structurally incapable of sending
them in the wrong order, although it is possible for very little time to
elapse between the two, so the problem has to be somewhere else. I have
a theory but no easy way to test it so I'd like to see what experts out
there think:

The select loop runs through all possible file descriptors in order,
something like:

    for (fd = 3; fd <= sockmax; fd++) {
        if (fd == listener)
            continue; // dealt with elsewhere

        if (FD_ISSET(fd, &read_fds)) {
            // read from fd
        }
    }

It seems to me that a START message could arrive while this code is busy
handling another connection on (say) file descriptor 12. So the START is
assigned to file descriptor 13. Then the server finishes processing the
open socket and closes it. Immediately thereafter the END connection is
made and assigned to the lowest available descriptor which is now 12.
Thus, the next time the select loop starts it will say data is available
on both 12 and 13 but because it services them in order we get END first.
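
The descriptor-reuse half of this theory can be checked in isolation. POSIX requires calls that allocate a descriptor to return the lowest-numbered one not currently open, so a number freed by close() is handed straight back out. A minimal sketch, assuming dup() on a pipe as a stand-in for accept() (the helper name is made up for illustration):

```c
#include <assert.h>
#include <unistd.h>

/* Returns 1 if a descriptor number freed by close() is reused by the
 * very next allocation, 0 if not, -1 on error.  dup() stands in for
 * accept(); both must return the lowest unused descriptor. */
int closed_fd_is_reused(void)
{
    int p[2], low, high, again, reused;

    if (pipe(p) != 0)
        return -1;
    low = dup(p[0]);            /* plays the role of "12" */
    high = dup(p[0]);           /* plays the role of "13" */
    if (low < 0 || high < 0) {
        if (low >= 0)
            close(low);
        close(p[0]);
        close(p[1]);
        return -1;
    }
    assert(low < high);
    close(low);                 /* server finishes with "12" */
    again = dup(p[0]);          /* the next connection arrives */
    reused = (again == low);
    if (again >= 0)
        close(again);
    close(high);
    close(p[0]);
    close(p[1]);
    return reused;
}
```

This only shows that a later connection can receive a lower number than an earlier one; whether that is what actually reorders START and END in the server is a separate question.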

Does this make sense? If so, is there a recommended way to deal with it?
I'm thinking of adding code to send a "try again later" message back to
the client when this happens, which should push END processing into the
subsequent select() cycle. But I'd rather not go to all that trouble if
this isn't likely to be the problem.

BTW, it's a structural requirement to make two connections. Using the
same socket to send both START and END would of course solve any
ordering problems but it's not an option here.

Thanks,
HT
moi

2006-04-27, 7:55 am

Henry Townsend wrote:
> I have a classic client/server app; clients connect via sockets and the
> server uses a select() loop to handle those connections. The only
> complication is that each client must connect twice, once to deliver a
> START token and then again later to deliver some data followed by an END
> token. And of course the START *must* arrive before the END.
>
> I've noticed that under heavy load, occasionally START and END seem to
> arrive out of order. The client is structurally incapable of sending
> them in the wrong order, although it is possible for very little time to
> elapse between the two, so the problem has to be somewhere else. I have
> a theory but no easy way to test it so I'd like to see what experts out
> there think:


The simplest way to impose order would be, IMHO, to make the server
acknowledge the messages (send a reply), and require the client not to
send the END before it has received the ACK.

But, probably your client is fire&forget?
Putting a session number or sequence number in the packets could
also be a way of matching START & END.

HTH,
AvK
Gordon Burditt

2006-04-27, 7:55 am

>I have a classic client/server app; clients connect via sockets and the
>server uses a select() loop to handle those connections. The only
>complication is that each client must connect twice, once to deliver a
>START token and then again later to deliver some data followed by an END
>token. And of course the START *must* arrive before the END.


>I've noticed that under heavy load, occasionally START and END seem to
>arrive out of order. The client is structurally incapable of sending
>them in the wrong order, although it is possible for very little time to
>elapse between the two, so the problem has to be somewhere else.


I don't believe there is any guarantee that packets (is this TCP or UDP?)
on different connections follow any particular relative ordering. If
it's TCP, one packet could get trashed and require resending, meanwhile
the other packet gets there first. If it's UDP, packets can get
re-ordered. This may not be what's happening in your case, but
it can happen. It's unlikely on localhost, but I think it still
can happen (running short on buffers).

>I have
>a theory but no easy way to test it so I'd like to see what experts out
>there think:
>
>The select loop runs through all possible file descriptors in order,
>something like:
>
> for (fd = 3; fd <= sockmax; fd++) {
> if (fd == listener)
> continue; // dealt with elsewhere
>
> if (FD_ISSET(fd, &read_fds)) {
> // read from fd
> }
> }


And the problem comes in here. select() doesn't tell you what order
things arrived in, but you assume an order. If, for example, your
process didn't get scheduled for a couple of hours, all of your file
descriptors could have a lot of data arriving on them.

>It seems to me that a START message could arrive while this code is busy
>handling another connection on (say) file descriptor 12. So the START is
>assigned to file descriptor 13. Then the server finishes processing the
>open socket and closes it. Immediately thereafter the END connection is
>made and assigned to the lowest available descriptor which is now 12.
>Thus, the next time the select loop starts it will say data is available
>on both 12 and 13 but because it services them in order we get END first.


It's quite possible, but it's hardly the worst scenario to deal with.

>Does this make sense? If so, is there a recommended way to deal with it?
>I'm thinking of adding code to send a "try again later" message back to
>the client when this happens, which should push END processing into the
>subsequent select() cycle. But I'd rather not go to all that trouble if
>this isn't likely to be the problem.


>BTW, it's a structural requirement to make two connections. Using the
>same socket to send both START and END would of course solve any
>ordering problems but it's not an option here.


What is the relative ordering of incoming connections vs. data
transmission? I'm thinking you could scan the file descriptors IN
ORDER OF THEIR CONNECTION, not numerically (this probably requires
maintaining an array or list of connection file descriptors in order
of connection). If it is possible both connections could come in
"simultaneously" while your server is handling something else, this
isn't a solution, as the order of socket connection isn't accurate
either.
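
A rough sketch of scanning in connection order rather than numeric order. The structure and function names are hypothetical, and the fixed-size array is only there to keep the example short. As noted above, this does not help if accept() itself returns the two connections in the wrong order.

```c
#include <assert.h>
#include <string.h>
#include <sys/select.h>

#define MAX_CONNS 64            /* arbitrary bound for the sketch */

struct conn_list {
    int fds[MAX_CONNS];         /* oldest connection first */
    int count;
};

/* Append a freshly accept()ed descriptor; returns 0, or -1 if full. */
int conn_add(struct conn_list *cl, int fd)
{
    assert(fd >= 0 && fd < FD_SETSIZE);   /* must be usable with select() */
    if (cl->count >= MAX_CONNS)
        return -1;
    cl->fds[cl->count++] = fd;
    return 0;
}

/* Forget a descriptor after close(), keeping the others in order. */
void conn_remove(struct conn_list *cl, int fd)
{
    int i;

    for (i = 0; i < cl->count; i++) {
        if (cl->fds[i] == fd) {
            memmove(&cl->fds[i], &cl->fds[i + 1],
                    (size_t)(cl->count - i - 1) * sizeof cl->fds[0]);
            cl->count--;
            return;
        }
    }
}

/* Copy the readable descriptors into out[] in accept() order.
 * Returns how many were stored. */
int conn_ready_in_order(const struct conn_list *cl, fd_set *read_fds,
                        int *out, int max_out)
{
    int i, n = 0;

    for (i = 0; i < cl->count && n < max_out; i++)
        if (FD_ISSET(cl->fds[i], read_fds))
            out[n++] = cl->fds[i];
    return n;
}
```

The server would call conn_add() right after accept(), conn_remove() right after close(), and walk the result of conn_ready_in_order() instead of looping from 3 to sockmax.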

Perhaps your protocol needs START_ACK and END_ACK tokens, which the
server sends and the client waits for before proceeding. Consider
the FTP protocol: I believe the server sends back an ack for the
command on port 21, which the client waits for, before the client
starts setting up a port 20 connection.
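
One way the START_ACK idea might look in C. The one-byte ack value and the function names are assumptions made for this sketch; the thread does not define a wire format.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define START_ACK 'A'           /* hypothetical one-byte acknowledgement */

/* Send all len bytes, retrying on EINTR and short writes. */
static int send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(fd, buf, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Server side: acknowledge a START that has been read and recorded. */
int send_start_ack(int fd)
{
    char ack = START_ACK;
    return send_all(fd, &ack, 1);
}

/* Client side: send the token, then block until the ack byte arrives.
 * Returns 0 on success, -1 on error or if the server closed without
 * acknowledging. */
int send_start_and_wait(int fd, const char *token)
{
    char reply;
    ssize_t n;

    assert(token != NULL);
    if (send_all(fd, token, strlen(token)) < 0)
        return -1;
    do {
        n = recv(fd, &reply, 1, 0);
    } while (n < 0 && errno == EINTR);
    return (n == 1 && reply == START_ACK) ? 0 : -1;
}
```

The client opens its second connection only after send_start_and_wait() returns 0, which is what forces START to be processed before END can be sent.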

Gordon L. Burditt
purple_stars

2006-04-27, 7:55 am

why do they have to connect twice, i mean, why did you do it that way ?

Henry Townsend wrote:
> I have a classic client/server app; clients connect via sockets and the
> server uses a select() loop to handle those connections. The only
> complication is that each client must connect twice, once to deliver a
> START token and then again later to deliver some data followed by an END
> token. And of course the START *must* arrive before the END.

[snip]

Henry Townsend

2006-04-27, 7:55 am

moi wrote:
> The simplest way to impose order would be, IMHO, to make the server
> acknowledge the messages (send a reply), and require the client not to
> send the END before it has received the ACK.


Yes, that's a good idea though it might be a bit slower if each client
had to wait for the server to get around to sending him an ack before
continuing.

> But, probably your client is fire&forget?


Well, here's where we start getting to where I don't know what I'm
talking about ... it wasn't designed to be "fire and forget" but I'm
guessing that since the START message is quite short it fits within
PIPE_BUF and thus the OS (Linux, Solaris) accepts it asynchronously,
i.e. without necessarily having made the connection to the server yet.

As I think about it I'm starting to like this accidental feature. To go
a little deeper, my "client" code isn't really a standalone program but
rather an instrumentation library which is linked into the real client.
Therefore I want to bend over backwards to keep the instrumented
behavior as close as possible to its vanilla version.

I think I'll try adding code to refuse the END message if it shows up
before its matching START and tell the client to "try again". I think
this will be faster and less intrusive in the 999 cases out of 1000
where START does in fact arrive first.

Thanks,
HT
Henry Townsend

2006-04-27, 7:55 am

Gordon Burditt wrote:
> What is the relative ordering of incoming connections vs. data
> transmission? I'm thinking you could scan the file descriptors IN
> ORDER OF THEIR CONNECTION, not numerically (this probably requires
> maintaining an array or list of connection file descriptors in order
> of connection).


This certainly sounds like a possible solution. Thanks.

> If it is possible both connections could come in "simultaneously" ...


My question is what does "come in" mean here? The START message is
absolutely sent first in that the client does (pseudo-code):

socket();
connect();
write("START MESSAGE");
shutdown();
close();
...
socket();
connect();
write("END MESSAGE");
shutdown();
close();

but clearly sent first doesn't mean arrived first.

> Perhaps your protocol needs START_ACK and END_ACK tokens, which the
> server sends and the client waits for before proceeding. Consider
> the FTP protocol: I believe the server sends back an ack for the
> command on port 21, which the client waits for, before the client
> starts setting up a port 20 connection.


Yes, clearly there needs to be some sort of ack protocol. Thanks.

HT
moi

2006-04-27, 7:55 am

Henry Townsend wrote:
> moi wrote:
>
>
>
> Yes, that's a good idea though it might be a bit slower if each client
> had to wait for the server to get around to sending him an ack before
> continuing.


I now understand you don't want the client to block on read (ACK),
since this would make the instrumentation change the behaviour of the
client.

> guessing that since the START message is quite short it fits within
> PIPE_BUF and thus the OS (Linux, Solaris) accepts it asynchronously,
> i.e. without necessarily having made the connection to the server yet.


If *all* the messages are that short (and you can be happy with
occasionally dropping one), UDP seems more natural to me
(syslog comes to mind). You get message boundaries for free, and you can
keep the socket open between writes. Within the same machine,
message ordering is preserved (given you use just one socket).
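
A minimal sketch of the UDP variant, assuming an IPv4 server address and hypothetical function names. Each send() on a datagram socket carries exactly one message, so the server never has to search for message boundaries; delivery itself is not guaranteed.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a UDP socket "connected" to ip:port so plain send() works.
 * connect() on a datagram socket only records the peer address; no
 * packets are exchanged.  Returns the descriptor, or -1 on error. */
int open_event_socket(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send one token as one datagram.  Returns 0 on success, -1 on error. */
int send_event(int fd, const char *token)
{
    size_t len;

    assert(token != NULL);
    len = strlen(token);
    return send(fd, token, len, 0) == (ssize_t)len ? 0 : -1;
}
```

The client would keep the one descriptor from open_event_socket() for its whole lifetime and call send_event() for both START and END.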

> As I think about it I'm starting to like this accidental feature. To go
> a little deeper, my "client" code isn't really a standalone program but
> rather an instrumentation library which is linked into the real client.
> Therefore I want to bend over backwards to keep the instrumented
> behavior as close as possible to its vanilla version.
>
> I think I'll try adding code to refuse the END message if it shows up
> before its matching START and tell the client to "try again". I think
> this will be faster and less intrusive in the 999 cases out of 1000
> where START does in fact arrive first.


If you already have sufficient ID in the packets to match the
request, you could just keep the mismatched ENDs in a small queue or
hashtable, or even a bitmap. The BEGIN-code should then first check for
a prematurely-pending-end-of-file (tm).
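
A sketch of the bitmap variant, assuming each message carries a small integer session ID below a fixed bound. The ID scheme, the bound, and all names here are assumptions for illustration only.

```c
#include <assert.h>
#include <limits.h>

#define MAX_IDS   1024          /* arbitrary bound on session IDs */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS    (MAX_IDS / WORD_BITS + 1)

struct session_table {
    unsigned long started[NWORDS];    /* START seen, END still expected */
    unsigned long early_end[NWORDS];  /* END arrived before its START */
};

static int bit_test(const unsigned long *map, unsigned id)
{
    return (map[id / WORD_BITS] >> (id % WORD_BITS)) & 1UL;
}

static void bit_put(unsigned long *map, unsigned id, int on)
{
    unsigned long mask = 1UL << (id % WORD_BITS);

    if (on)
        map[id / WORD_BITS] |= mask;
    else
        map[id / WORD_BITS] &= ~mask;
}

/* Returns 1 if this START completes a session whose END was parked
 * earlier, 0 if the session is now open and waiting for its END. */
int on_start(struct session_table *t, unsigned id)
{
    assert(id < MAX_IDS);
    if (bit_test(t->early_end, id)) {
        bit_put(t->early_end, id, 0);
        return 1;
    }
    bit_put(t->started, id, 1);
    return 0;
}

/* Returns 1 if this END closes an open session, 0 if it overtook its
 * START and has been parked until that START shows up. */
int on_end(struct session_table *t, unsigned id)
{
    assert(id < MAX_IDS);
    if (bit_test(t->started, id)) {
        bit_put(t->started, id, 0);
        return 1;
    }
    bit_put(t->early_end, id, 1);
    return 0;
}
```

A bitmap records only that an END is pending; if the END carries data that must be kept, a queue or hash table of the parked messages would be needed instead.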


HTH,
AvK
moi

2006-04-27, 7:55 am

Henry Townsend wrote:
> Gordon Burditt wrote:
>


> My question is what does "come in" mean here? The START message is
> absolutely sent first in that the client does (pseudo-code):
>
> socket();
> connect();
> write("START MESSAGE");
> shutdown();
> close();
> ...
> socket();
> connect();
> write("END MESSAGE");
> shutdown();
> close();
>
> but clearly sent first doesn't mean arrived first.


The shutdown (is it TCP?) may result in the connection being broken down
before any payload is transferred. But you don't want the close() to
block, I guess.

HTH,
AvK
Gordon Burditt

2006-04-27, 7:55 am

>> What is the relative ordering of incoming connections vs. data
>
>This certainly sounds like a possible solution. Thanks.


But it's NOT a foolproof one.

>
>My question is what does "come in" mean here? The START message is
>absolutely sent first in that the client does (pseudo-code):
>
> socket();

/******A*******/
> connect();
> write("START MESSAGE");
> shutdown();
> close();
> ...
> socket();
> connect();
> write("END MESSAGE");

/******B*******/

If all of the stuff between A and B can happen (with their effects
reaching the server) while the server is off processing stuff for
another client, or perhaps doesn't even get scheduled, you may not
detect the incoming connections in the correct order (that is, the
first accept() returns the connection from the SECOND connect()
above). Probably, for this to happen, a packet needs retransmission.
This leaves you with the possibility of END before START.

> shutdown();
> close();
>
>but clearly sent first doesn't mean arrived first.
>
>
>Yes, clearly there needs to be some sort of ack protocol. Thanks.


Gordon L. Burditt
Rick Jones

2006-04-27, 7:55 am

Henry Townsend <henry.townsend@not.here> wrote:
> socket();
> connect();
> write("START MESSAGE");
> shutdown();
> close();


Why are you bothering with shutdown() if you just go ahead and call
close()?

BTW, file descriptors are not assigned until after accept() is
called. So, in your example of walking the descriptors returned by
select(), 13 would only exist after accept() was called, which would
be after select said your listen endpoint was "readable".

You could I suppose, try an immediate, non-blocking recv() on the FD
newly returned by select().

Still, since your client is not waiting for anything from the server
(tsk tsk tsk), you are XXX-u-me-ing (a wise old engineer taught me to
spell it that way) that connections are queued to a listen endpoint in
the order in which they arrive. While that may be true for 99 out of
every 100 stacks out there, that 100th stack, well...

Precisely why is it that the client has to do this with two connections?

FWIW, the server's "ack" could be implicit in closing the connection.
The client would be roughly:

socket();
connect();
write("START MESSAGE");

shutdown(SHUT_WR); /* cause a read return of zero at the server,
but remain open for recv */
select()/poll() for socket to be readable:
recv() - get the read return of zero to show the server closed
close();

(BTW, that is an example of a time where you would call shutdown() and
then do some other stuff with the socket and _then_ call close() )
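
The outline above could be written in C roughly as follows. The function names and the poll() timeout are assumptions for this sketch; it takes an already connected socket so that the socket()/connect() steps stay with the caller.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send the token and half-close: the server's read will return the
 * token and then 0, while our side stays open for receiving. */
int send_token_half_close(int fd, const char *token)
{
    size_t len, off = 0;

    assert(token != NULL);
    len = strlen(token);
    while (off < len) {
        ssize_t n = send(fd, token + off, len - off, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        off += (size_t)n;
    }
    return shutdown(fd, SHUT_WR);
}

/* Wait up to timeout_ms per poll() for the server to close its end,
 * then close ours.  Returns 0 if the server closed (the implicit
 * ack), -1 on timeout or error. */
int wait_for_server_close(int fd, int timeout_ms)
{
    struct pollfd pfd;
    char scratch[64];
    int rc = -1;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    for (;;) {
        ssize_t n;
        int ready = poll(&pfd, 1, timeout_ms);

        if (ready < 0 && errno == EINTR)
            continue;
        if (ready <= 0)
            break;                      /* timeout or error */
        n = recv(fd, scratch, sizeof scratch, 0);
        if (n == 0) {
            rc = 0;                     /* read return of zero: server closed */
            break;
        }
        if (n < 0 && errno != EINTR)
            break;
        /* n > 0: unexpected data from the server; discard, keep waiting */
    }
    close(fd);
    return rc;
}
```

The client would call send_token_half_close() and then wait_for_server_close() on the first connection before creating the second one.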

Then you will know, without a doubt that the server has received your
first message and you can initiate the second one without reordering.

If you are concerned about the latency of the second connection setup,
you could still establish the second connection, but not send the
message until after the first was closed:

socket(1);
connect(1);
write(1, "START MESSAGE");

shutdown(1, SHUT_WR); /* cause a read return of zero at the server,
but remain open for recv */
socket(2);
connect(2);
select(1)/poll(1) for socket to be readable:
recv(1) - get the read return of zero to show the server closed
write(2);
close(1);
close(2);

while you will not _know_ that the second connection is established
after the server reads the first message, you do know that the second
message will not be received until after the server reads the first
message. You could even move socket(2)/connect(2) to just after the
write(1). This is XXX-u-me-ing your server is doing the right things
juggling multiple connections.

But I'd still wonder just exactly why it has to be two separate TCP
connections... and if the client isn't waiting for some kind of
response from the server, it has no idea that its messages were ever
received in the first place...

rick jones
--
a wide gulf separates "what if" from "if only"
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
Rick Jones

2006-04-27, 7:55 am

BTW, just to be paranoid - many people talk about TCP offering
"guaranteed delivery." That really isn't correct. What TCP offers is
"guaranteed notification of _probable_ non-delivery" - but then, only
when the application sticks-around long enough to hear. It is a
subtle, but very important distinction. It means you really cannot
XXX-u-me that fire and forget actually "worked" just because it was
over TCP.

rick jones
--
No need to believe in either side, or any side. There is no cause.
There's only yourself. The belief is in your own precision. - Jobert
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
Henry Townsend

2006-04-27, 7:56 am

moi wrote:
> The shutdown (is it TCP?) may result in the connection being broken down
> before any payload is transferred. But you don't want the close() to
> block, I guess.


Sorry, I meant to say before . Yes, it is TCP.
Henry Townsend

2006-04-27, 7:56 am

Rick Jones wrote:
> Why are you bothering with shutdown() if you just go ahead and call
> close()?


Because I'm a novice with sockets and had the impression from my
research that it's considered good form.

> FWIW, the server's "ack" could be implicit in closing the connection...


Thanks, I've implemented this and it works great (see below).

> But I'd still wonder just exactly why it has to be two separate TCP
> connections...


Briefly: clients are potentially very long lived and thousands can be
active at once (in fact that's how I ran into the problem posted about).
The server would be in danger of running out of file descriptors if it
kept an open socket for each active client. Not to mention that select()
is generally limited to 1024.

> and if the client isn't waiting for some kind of
> response from the server, it has no idea that its messages were ever
> received in the first place...


In this application it's not important for the *client* to know its
messages were received. The *server* will know if it doesn't get an END
message to match the START.

Thanks to the help I've gotten here, I think I now have a robust system.
The server does ack the START message and the client blocks till it sees
that. The client later sends the END and doesn't worry about whether it
got there, but the server is counting heads and will know.

Thanks again,
HT
Rick Jones

2006-04-27, 7:56 am

Henry Townsend <henry.townsend@not.here> wrote:
> Rick Jones wrote:
> Because I'm a novice with sockets and had the impression from my
> research that it's considered good form.


IMO, unless you are doing something like

shutdown(SHUT_WR)
recv() return of zero for the remote close
close()

which, btw, is good form, you might as well just call close()
directly.

> Thanks, I've implemented this and it works great (see below).


> Briefly: clients are potentially very long lived and thousands can
> be active at once (in fact that's how I ran into the problem posted
> about). The server would be in danger of running out of file
> descriptors if it kept an open socket for each active client. Not to
> mention that select() is generally limited to 1024.


It has been a very long time since I've seen select() limited to 1024.

However, the performance of select() or poll() with hundreds and/or
thousands of FD's is a very real issue - hence the likes of epoll or
eventports etc:

ftp://ftp.cup.hp.com/dist/netoworki..._eventports.txt
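
For comparison, a minimal epoll sketch. This is Linux-only, which is an assumption that does not fit a server that must also run elsewhere, and the wrapper names are made up. The kernel hands back only the descriptors that are ready, so the cost of each wakeup tracks activity rather than the total number of open connections.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 64           /* arbitrary batch size for the sketch */

/* Register fd for read-readiness on the epoll instance epfd. */
int watch_fd(int epfd, int fd)
{
    struct epoll_event ev;

    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait up to timeout_ms and store up to max ready descriptors in
 * out[].  Returns how many were stored, or -1 on error. */
int wait_ready(int epfd, int *out, int max, int timeout_ms)
{
    struct epoll_event evs[MAX_EVENTS];
    int i, n;

    assert(out != NULL && max > 0);
    if (max > MAX_EVENTS)
        max = MAX_EVENTS;
    n = epoll_wait(epfd, evs, max, timeout_ms);
    for (i = 0; i < n; i++)
        out[i] = evs[i].data.fd;
    return n;
}
```

The epoll instance itself comes from epoll_create1(0); the listening socket would be registered with watch_fd() like any other descriptor.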

Of course, "malicious" or otherwise unhappy clients could simply call
connect() and wait around before calling send/write which means your
server could be "stuck" with hundreds and/or thousands of connections
even now. Isn't socket programming fun?
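
One common defence against connect-and-wait clients is an idle limit. A sketch of the bookkeeping, assuming the server records the accept time and bytes received per connection; the structure and names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

struct conn {
    int fd;
    time_t accepted_at;         /* when accept() returned this fd */
    size_t bytes_seen;          /* payload bytes read so far */
};

/* Collect the descriptors of connections that have sent nothing more
 * than max_idle seconds after being accepted.  The caller close()s
 * them.  Returns how many descriptors were stored in out[]. */
int find_silent_conns(const struct conn *conns, int count, time_t now,
                      int max_idle, int *out, int max_out)
{
    int i, n = 0;

    assert(conns != NULL || count == 0);
    for (i = 0; i < count && n < max_out; i++) {
        if (conns[i].bytes_seen == 0 &&
            difftime(now, conns[i].accepted_at) > (double)max_idle)
            out[n++] = conns[i].fd;
    }
    return n;
}
```

Calling this once per pass through the select() loop, with a select() timeout no longer than max_idle, would keep silent connections from accumulating indefinitely.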

> In this application it's not important for the *client* to know its
> messages were received. The *server* will know if it doesn't get an
> END message to match the START.


> Thanks to the help I've gotten here, I think I now have a robust
> system. The server does ack the START message and the client blocks
> till it sees that. The client later sends the END and doesn't worry
> about whether it got there, but the server is counting heads and
> will know.


I presume that the server assumes that if it receives no END message
that its "ack" of the client's START message might have been lost? Or
does that not matter in this case?

rick jones
--
oxymoron n, commuter in a gas-guzzling luxury SUV with an American flag
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
Henry Townsend

2006-04-27, 7:56 am

Rick Jones wrote:
> It has been a very long time since I've seen select() limited to 1024


On Solaris 9 the default limit on both select() and the process overall
is still 1024 file descriptors. Each of these can be raised up to (I
think) 65536 in various ways but it takes some doing.
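
Both limits can be inspected at run time. A small sketch with hypothetical helper names: FD_SETSIZE is a compile-time constant that bounds what an fd_set can represent, while RLIMIT_NOFILE is the per-process descriptor limit, and the two are independent of each other.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/select.h>

/* 1 if fd can legally be passed to FD_SET()/FD_ISSET(), else 0.
 * Using a descriptor outside this range with an fd_set is undefined. */
int fd_fits_select(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* FD_SET() wrapper that refuses descriptors select() cannot handle. */
int safe_fd_set(int fd, fd_set *set)
{
    assert(set != NULL);
    if (!fd_fits_select(fd))
        return -1;
    FD_SET(fd, set);
    return 0;
}

/* Soft per-process descriptor limit, or -1 if unknown or unlimited. */
long soft_fd_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long)rl.rlim_cur;
}

/* Print both limits for the current build and process. */
void print_fd_limits(void)
{
    printf("FD_SETSIZE = %d, RLIMIT_NOFILE (soft) = %ld\n",
           (int)FD_SETSIZE, soft_fd_limit());
}
```

Raising the rlimit alone does not make select() safe for higher descriptor numbers; a check like safe_fd_set() on every accepted descriptor guards against that.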

> However, the performance of select() or poll() with hundreds and/or
> thousands of FD's is a very real issue - hence the likes of epoll or
> eventports etc:
>
> ftp://ftp.cup.hp.com/dist/netoworki..._eventports.txt


Problem here (which I haven't been stressing because this is c.u.p) is
that the server has to run on Windows too. And so far the lowest common
denominator I've found is select(). I've spent a little time looking at
libevent and liboop (thanks for the pointer to eventports, BTW) and may
move to libevent because it seems to fit Windows into its abstraction.

> I presume that the server assumes that if it receives no END message
> that its "ack" of the client's START message might have been lost? Or
> does that not matter in this case?


It's sufficient to conclude that "something went wrong" and abort. The
distinction between the two kinds of error cases above is not
interesting to the server. And it would be clear from inspection anyway;
if no ack was received the client will be blocked in a permanent read;
otherwise, if no END shows up it must be because the client aborted or
hung between start and end. In all these cases a few minutes with
ps/truss/strace/etc will clarify what happened.

HT
Maxim Yegorushkin

2006-04-27, 7:56 am


Henry Townsend wrote:
> Rick Jones wrote:
>
> On Solaris 9 the default limit on both select() and the process overall
> is still 1024 file descriptors. Each of these can be raised up to (I
> think) 65536 in various ways but it takes some doing.
>
>
> Problem here (which I haven't been stressing because this is c.u.p) is
> that the server has to run on Windows too. And so far the lowest common
> denominator I've found is select().


AFAIK, windoze is also capable of handling more than 1024 descriptors
in select().

davids@webmaster.com

2006-04-27, 7:56 am

You are asking for the impossible. You are saying "I want to start
doing X, and then without waiting for anything, even for X to complete,
I want to start doing Y. Then I want them to finish on their own, I
don't want to wait for anything. But I want to ensure that somewhere
else, X is seen as completed before Y is."

You can only do this if something imposes an order that X complete
before Y begins. TCP can do this in a single connection.

Another way is subtle, but works well. Imagine if Y was more like "END,
but if you didn't get a START, ignore my next START". And X was more
like "START, unless you already got an END without a matching START, in
which case, nothing".

You can accomplish this by including a token in the START and END
messages. An END without a corresponding START is treated as both a
START and an END, and the token is added to an ignore list so the START
can be ignored when received.
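
A sketch of this token scheme, assuming short string tokens and small fixed-size tables. All names, sizes, and the action codes are made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_TOKENS 256          /* arbitrary table size */
#define TOKEN_LEN  32           /* arbitrary maximum token length + 1 */

enum action { ACT_BEGIN, ACT_FINISH, ACT_BEGIN_AND_FINISH,
              ACT_IGNORE, ACT_ERROR };

struct token_set {
    char tok[MAX_TOKENS][TOKEN_LEN];
    int used[MAX_TOKENS];
};

struct tracker {
    struct token_set open;      /* START seen, END still expected */
    struct token_set ignore;    /* END came first; drop the late START */
};

static int set_find(const struct token_set *s, const char *t)
{
    int i;

    for (i = 0; i < MAX_TOKENS; i++)
        if (s->used[i] && strcmp(s->tok[i], t) == 0)
            return i;
    return -1;
}

static int set_add(struct token_set *s, const char *t)
{
    int i;

    for (i = 0; i < MAX_TOKENS; i++) {
        if (!s->used[i]) {
            strcpy(s->tok[i], t);   /* length checked by the callers */
            s->used[i] = 1;
            return 0;
        }
    }
    return -1;                      /* table full */
}

enum action handle_start(struct tracker *tr, const char *token)
{
    int i;

    assert(tr != NULL && token != NULL);
    if (strlen(token) >= TOKEN_LEN)
        return ACT_ERROR;
    i = set_find(&tr->ignore, token);
    if (i >= 0) {                   /* its END was already handled */
        tr->ignore.used[i] = 0;
        return ACT_IGNORE;
    }
    return set_add(&tr->open, token) == 0 ? ACT_BEGIN : ACT_ERROR;
}

enum action handle_end(struct tracker *tr, const char *token)
{
    int i;

    assert(tr != NULL && token != NULL);
    if (strlen(token) >= TOKEN_LEN)
        return ACT_ERROR;
    i = set_find(&tr->open, token);
    if (i >= 0) {                   /* normal order */
        tr->open.used[i] = 0;
        return ACT_FINISH;
    }
    /* END overtook START: treat it as both, remember to skip the START */
    return set_add(&tr->ignore, token) == 0 ? ACT_BEGIN_AND_FINISH
                                            : ACT_ERROR;
}
```

A tracker must start out zeroed (for example a static object). Entries in the ignore list for STARTs that never arrive would need some expiry policy, which this sketch leaves out.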

DS

Rick Jones

2006-04-27, 7:56 am

Henry Townsend <henry.townsend@not.here> wrote:
> Rick Jones wrote:
> On Solaris 9 the default limit on both select() and the process overall
> is still 1024 file descriptors. Each of these can be raised up to (I
> think) 65536 in various ways but it takes some doing.


> Problem here (which I haven't been stressing because this is c.u.p) is
> that the server has to run on Windows too.


That does toss some rather large wrenches into the works.

> And so far the lowest common denominator I've found is
> select(). I've spent a little time looking at libevent and liboop
> (thanks for the pointer to eventports, BTW) and may move to libevent
> because it seems to fit Windows into its abstraction.


In my case, for netperf4 I've been using glib and GIOChannels. I
went with glib to also get abstraction for dynamic library loading and
the wizzy parser - so wizzy in fact that there is no call one can make
to get the code to emit the stuff it does when given --help !-( But I
digress...

I've no idea what they come down to under the covers though - could be
select(), could be something else. I've not looked at the glib source
code.

rick jones
--
firebug n, the idiot who tosses a lit cigarette out his car window
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...