07-07-07 06:20 PM
Hello,
I'm in the process of reworking an existing TCP server to improve its speed.
The server has been ported to Windows, Linux, Solaris, Mac OS X and several
other Unix variants.
The server has until now been running with up to 50 clients at once, each of
which performs an action once in a while.
However due to new requirements the server should be capable of handling
several thousand clients.
Overall the design is pretty flexible so there should be no bottlenecks for
achieving this.
However after performing a bit of testing I found some issues which need to
be improved.
In my test I wrote a client application which simulates up to 500 clients by
spawning 500 threads. Each thread would then connect to the server and
perform a request.
When the server is empty the request takes about 250 ms. However with 500
idle clients connected each request takes about 890 ms.
This means that each request is slowed down by a factor of about 3.5.
On the server side I have 2 linked lists using std::list.
One is called unprocessed connections and the other is called in-process
connections.
Then I have a thread pool consisting of one thread per 15 connections, plus 3.
This means that for 500 clients I would have 500 / 15 + 3 = 36 threads
(using integer division).
I found by testing that this value gives the fastest results.
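For reference, the sizing rule above can be written down as a one-line function. This is only a sketch; `pool_size` is a made-up name and not something from the actual server.

```cpp
#include <cstddef>

// Hypothetical helper: one worker per 15 connections, plus 3 extra threads.
// Integer division truncates, so 500 / 15 is 33 here, not 33.33.
constexpr std::size_t pool_size(std::size_t connections) {
    return connections / 15 + 3;
}

static_assert(pool_size(500) == 36, "500 clients give 36 threads");
static_assert(pool_size(50) == 6, "50 clients give 6 threads");
```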
What each thread does is:
1. Remove a connection from the unprocessed list and insert it into the
in-process list.
2. Perform select() on each connection object and select for readability and
writability.
3. If something needs to be read or written the action is performed.
I should mention that all connections use non-blocking sockets already.
From my tests I know that out of 501 connections in total, 500 are idle
and do not need to be processed.
However select still returns that the sockets are readable. I did a small
optimization here by ensuring that recv() would only get called
when there was no pending data for sending back to the client. This gave a
big speed improvement.
So the question is how I can more effectively find out which sockets have
data pending for reading?
I found as a test that if I remove select for writability (by always pushing
data to be sent to the client even though it would return EWOULDBLOCK) it
does not improve the speed. So perhaps I should find readable and writable
connections in another manner?
Thanks.
-- Henrik
07-07-07 06:20 PM
On Sat, 07 Jul 2007 14:14:59 +0200, Henrik Goldman wrote:
> Hello,
>
> I'm in the process of improving an existing tcp server in order to improve
> the speed.
> The server is ported to Windows, linux, solaris, macosx and several other
> unix.
>
> The server has until now been running with up to 50 clients at once which
> performs an action once in a while.
> However due to new requirements the server should be capable of handling
> several thousand clients.
> Overall the design is pretty flexible so there should be no bottlenecks for
> achiving this.
> However after performing a bit of testing I found some issues which needs to
> be improved.
>
> In my test I wrote a client application which simulates up to 500 clients by
> spawning 500 threads. Each thread would then connect to the server and
> perform a request.
> When the server is empty the request takes about 250 ms. However with 500
> idle clients connected each request takes about 890 ms.
> This means that each request gets slowed down by several times.
>
Which is to be expected. More work==slower response.
Where does the server spend its time?
Can you profile it ?
> On the server side I have 2 linked lists using std::list.
Why ?
> One is called unprocessed connections and the other is called in-process
> connections.
Why ?
> Then I have a thread pool consisting of a thread per 15 connection + 3.
> This means that for 500 clients I would have 500 / 15 + 3 = 36 threads.
> I found by testing that this value gives the fastest results.
>
> What each thread does is:
> 1. Remove a connection from the unprocessed list and insert it into the
> in-process list.
Why ?
IMHO, for dispatching you need only *one* list/queue.
(this could even be implemented as a bitmap, eg an fd_set)
A thread can take a task from(the head of) the list and
execute it. If the task is finished, you are done, otherwise, the task can
be re-added to the work-list.
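A minimal sketch of the single work list described above, assuming a task is just a callable that reports whether it is finished. `Task`, `WorkQueue` and `run_one` are illustrative names, not part of the server being discussed.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// A task returns true when it is finished, false when it wants another turn.
using Task = std::function<bool()>;

class WorkQueue {
public:
    void push(Task t) {
        assert(static_cast<bool>(t));  // an empty std::function cannot be run
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(t));
    }

    // Moves the head task into `out`; returns false if the queue is empty.
    bool pop(Task& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.front());
        tasks_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<Task> tasks_;
};

// One pass of a worker: take the head task, run it, re-add it if unfinished.
// Returns false when there was nothing to do.
inline bool run_one(WorkQueue& q) {
    Task t;
    if (!q.pop(t)) return false;
    if (!t()) q.push(std::move(t));
    return true;
}
```

The lock is held only while the deque is touched, never while a task runs, so several worker threads can call `run_one` on the same queue.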
> 2. Perform select() on each connection object and select for readability
> and writability.
You call select() for every connection? That would take 2 system calls for
every read... Why not use one centralized select() that adds work to the
worklist?
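A rough sketch of what one centralized select() could look like on a POSIX system. `readable_fds` is a hypothetical helper; it only returns the descriptors that are ready, and putting them on the work list is left to the caller.

```cpp
#include <cassert>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>
#include <vector>

// Returns the subset of `fds` that select() reports as readable within
// `timeout_ms`. POSIX only; every fd must be >= 0 and below FD_SETSIZE.
inline std::vector<int> readable_fds(const std::vector<int>& fds, int timeout_ms) {
    fd_set readset;
    FD_ZERO(&readset);
    int maxfd = -1;
    for (int fd : fds) {
        assert(fd >= 0 && fd < FD_SETSIZE);
        FD_SET(fd, &readset);
        if (fd > maxfd) maxfd = fd;
    }

    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    std::vector<int> ready;
    // One system call covers all connections; idle ones cost nothing extra.
    if (select(maxfd + 1, &readset, nullptr, nullptr, &tv) <= 0)
        return ready;  // timeout or error: nothing to dispatch
    for (int fd : fds)
        if (FD_ISSET(fd, &readset)) ready.push_back(fd);
    return ready;
}
```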
> 3. If something needs to be read or written the action is performed.
If nothing needs to be done, the task should not *be* on the worklist in
the first place....
> I should mention that all connections use non-blocking sockets already.
>
Which is good (but it also frees you from calling select() before every
read(), since the read would "return" EWOULDBLOCK anyway).
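To illustrate the point, here is a small POSIX sketch of reading from a non-blocking descriptor without asking select() first. `ReadResult`, `set_nonblocking` and `try_read` are names invented for this example.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

enum class ReadResult { Data, WouldBlock, Closed, Error };

// Puts fd into non-blocking mode; returns false on failure.
inline bool set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Tries a single read. `got` receives the number of bytes read.
inline ReadResult try_read(int fd, char* buf, std::size_t cap, std::size_t& got) {
    assert(buf != nullptr && cap > 0);
    got = 0;
    ssize_t n = read(fd, buf, cap);
    if (n > 0) {
        got = static_cast<std::size_t>(n);
        return ReadResult::Data;
    }
    if (n == 0) return ReadResult::Closed;  // peer closed the connection
    // Nothing available right now (or interrupted): try again later.
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return ReadResult::WouldBlock;
    return ReadResult::Error;
}
```

This saves the extra select() per read, but something still has to decide which connections are worth trying, otherwise the server ends up busy-polling idle sockets.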
> From my tests I know that out of 501 connections in total the 500 are
> idle and does not need to be processed. However select still returns
> that the sockets are readable. I did a small optimization here by
> ensuring that recv() would only get called when there was no pending
> data for sending back to the client. This gave a big speed improvement.
>
> So the question is how I can more effectively find out which socket are
> pending for reading?
> I found as a test that if I remove select for writability (by always
> pushing data to be sent to the client even though it would return
> EWOULDBLOCK) it's not going to improve the speed. So perhaps I should
> change my strategy for finding readable and writable connections in
> another manner?
>
For writeability, opinions differ. Normally, you don't need to select()
for it. If the network bandwidth is saturated, (and the response time
creeps up) people will stop using your server anyway.
Also, if you do throttle the writing process, you would still have to
buffer the data in your application's memory, which will cost roughly the
same amount of memory. (the best way would probably be to stop accepting
new work until the output has been drained )
HTH,
AvK
07-07-07 06:20 PM
>
> Which is to be expected. More work==slower response.
> Where does the server spend it's time ?
> Can you profile it ?
So far I have not been able to profile it that much but perhaps I should
recompile it with gprof to see how it behaves.
However it's pretty straightforward what goes on. Since I know that most
clients are not doing anything the only thing that goes on is stuff related
to networking.
>
> Why ?
>
> Why ?
This is a business demand in order to do remote statistics and get a
complete picture of who is connected to the server.
It would not be needed otherwise. However it's needed to take a snapshot of
the current server status.
> IMHO, for dispatching you need only *one* list/queue.
> (this could even be implemented as a bitmap, eg an fd_set)
Very true. fd_set bitmap will likely not be sufficient since each client has
an amount of status information associated.
However everything is wrapped into a connection object on the server side
which also includes the socket.
> A thread can take a task from(the head of) the list and
> execute it. If the task is finished, you are done, otherwise, the task can
> be re-added to the work-list.
That's exactly what happens. There are just more threads doing that.
>
> You call select() for every connection ? That would take 2 systemcalls for
> every read... Why not let one centralized select() that adds work to the
> worklist ?
Because most OSes have a limit on how many fds can be polled. I know that
it's common for OSes to have 64 as a limit.
My idea to work around this is to poll up to 64 at a time and then add them
into the queue again.
So instead of just processing 1 connection it processes up to 64 at once.
>
> If nothing needs to be done, the task should not *be* on the worklist in
> the first place....
Well, that's the issue. You need to find out which connection needs to be
processed before you can add it.
>
> Which is good. (but it also frees you from calling select() before every
> read(), since the read would "return" EWOULDBLOCK anyway.
Good point. I think I'll try to do something about that.
>
> For writeability, opinions differ. Normally, you don't need to select()
> for it. If the network bandwidth is saturated, (and the response time
> creeps up) people will stop using your server anyway.
Heh, this is not an option. People are forced to use the service since it's
used within large corporations where they have no choice.
The only alternative is to let people install and use more than one server to
get more speed.
> Also, if you do throttle the writing process, you would still have to
> buffer the data in your application's memory, which will cost roughly the
> same amount of memory. (the best way would probably be to stop accepting
> new work until the output has been drained )
>
Yes everything is buffered until written.
As you pointed out I stop accepting new reads from the same client until all
data is written. This speeds up the process a lot since reading and writing
won't be happening in the same run.
Thanks for your input.
-- Henrik
07-07-07 06:20 PM
On Sat, 07 Jul 2007 17:07:25 +0200, Henrik Goldman wrote:
>
> So far I have not been able to profile it that much but perhaps I should
> recompile it with gprof to see how it behaves.
> However it's pretty straightforward what goes on. Since I know that most
> clients are not doing anything the only thing that goes on is stuff related
> to networking.
Profile.
[My hypothesis is that you waste too much time maintaining the linked
lists, or you are convoying on the semaphores that guard them. Or you
poll too much.]
>
>
> This is a business demand in order to do remote statistics and get a
> complete picture of who is connected to the server. It would not be
> needed otherwise. However it's needed to take a snapshot of the current
> server status.
Business demand does not dictate an implementation.
You could easily get your 'state report' by scanning an array and
counting the various states that the connections happen to be in.
>
>
> Very true. fd_set bitmap will likely not be sufficient since each client
> has an amount of status information associated. However everything is
> wrapped into a connection object on the server side which also includes
> the socket.
>
"normally" (eg without threads/objects), one would just use an array,
indexed by file descriptor (since a new fd is guaranteed to be the lowest
available, this can be a fixed-size array). Each entry would contain all
the {state, data, buffers} for this connection.
In your case, you could use an array of pointers to the "objects".
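A sketch of such a table, assuming a simplified `Connection` struct standing in for the real connection object. `ConnTable` and its members are invented for illustration. The state report becomes a linear scan instead of a second linked list.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

enum class ConnState { NeedInput, HaveWork };

// Hypothetical stand-in for the server's connection object.
struct Connection {
    int fd;
    ConnState state;
};

class ConnTable {
public:
    explicit ConnTable(std::size_t max_fds) : slots_(max_fds) {}

    void add(int fd) {
        assert(fd >= 0 && static_cast<std::size_t>(fd) < slots_.size());
        slots_[fd].reset(new Connection{fd, ConnState::NeedInput});
    }
    void remove(int fd) { slots_[fd].reset(); }

    // Direct lookup by descriptor; nullptr if the slot is unused.
    Connection* find(int fd) { return slots_[fd].get(); }

    // The "state report": count the connections currently in state `s`.
    std::size_t count(ConnState s) const {
        std::size_t n = 0;
        for (const auto& c : slots_)
            if (c && c->state == s) ++n;
        return n;
    }

private:
    std::vector<std::unique_ptr<Connection>> slots_;  // index == fd
};
```

With several threads touching the table, access would still need a lock or a per-slot ownership rule; that part is left out here.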
>
> Thats exactly what happens. There are just more threads doing that.
If you protect the list-operations by a semaphore, ("latch") there is
probably nothing wrong with this.
>
> Because most OS's has a limit of how many fd's can be polled. I know
> that it's common that OS's has 64 as a limit. My idea to work around
> this is to poll up to 64 at a time and then add them into the queue
> again.
> So instead of just processing 1 connection it processes up to 64 at
> once.
I don't know about other OSes. For UNIX, there will always be a way to
select() on all your available fds (the same goes for poll()). You may
have to do some tweaking, but it is possible.
If you insist on chopping up your fdset (which can only be done when
using poll(), BTW), there is one problem: you cannot afford to block inside
select/poll, so you are in fact busy-polling ( --> calling select/poll N
times, just to discover 1 readable fd).
>
>
> Well thats the issue. You need to find out which connection needs to be
> processed before you can add it.
>
IMHO, "processed" is ambiguous, here.
WRT processing, there are only two states:
NEED_INPUT: this is the 'idle' state for a connection. This connection's fd
  has to be included in the fd_set for input.
HAVE_WORK: in this state, enough input has been collected to perform some
  useful work. We don't need more input, since we still have work...
  ( -->> this fd does not have to be included in the read-fd_set)
Note that the transition between NEED_INPUT and HAVE_WORK can be subtle:
eg if your protocol is 'line based', you cannot process before a CR/LF is
seen. You might add a pointer or flag (or extra states) to handle this.
[ removing/adding a node to a linked list (+latching) is probably a bit
too expensive, just to check for sufficient input ]
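The two states could be sketched like this for a line-based protocol. `LineConn` and its members are invented for the example, and the transition rule (a complete line ends in LF, with an optional CR before it) is an assumption about the protocol.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class State { NeedInput, HaveWork };

struct LineConn {
    State state = State::NeedInput;
    std::string inbuf;  // bytes received so far

    // Append newly received bytes; switch to HaveWork once a line is complete.
    void feed(const std::string& bytes) {
        inbuf += bytes;
        if (inbuf.find('\n') != std::string::npos) state = State::HaveWork;
    }

    // Remove and return the first complete line, without its CR/LF.
    // Precondition: state == HaveWork.
    std::string take_line() {
        assert(state == State::HaveWork);
        std::size_t nl = inbuf.find('\n');
        std::string line = inbuf.substr(0, nl);
        inbuf.erase(0, nl + 1);
        if (!line.empty() && line.back() == '\r') line.pop_back();
        // Go back to idle unless another complete line is already buffered.
        if (inbuf.find('\n') == std::string::npos) state = State::NeedInput;
        return line;
    }

    // Only connections in NeedInput belong in the read fd_set.
    bool wants_read() const { return state == State::NeedInput; }
};
```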
HTH,
AvK
allthecoolkidshaveone@gmail.com |
07-08-07 06:22 AM
On Jul 7, 5:14 am, "Henrik Goldman" <henrik_gold...@mail.tele.dk>
wrote:
> Hello,
>
> I'm in the process of improving an existing tcp server in order to improve
> the speed.
> The server is ported to Windows, linux, solaris, macosx and several other
> unix.
>
One option:
Get rid of most if not all the threads. Rewrite it with a callback-
based model using libevent
(http://monkey.org/~provos/libevent/), which will use the highest
performance event polling mechanism a particular OS supports (kqueue
on BSDs (Including OS X), epoll on linux 2.6, /dev/poll on Solaris,
poll, and as a last resort, select).
07-09-07 12:18 AM
On Jul 7, 7:10 am, moi <r...@localhost.localdomain> wrote:
> For writeability, opinions differ. Normally, you don't need to select()
> for it. If the network bandwidth is saturated, (and the response time
> creeps up) people will stop using your server anyway.
WHAT?! That's as wrong as anything can be.
Suppose 500 people connect to your server and they each are
downloading a 1GB file over a 56 kbps connection. If you don't 'select'
for writability, your server will be completely crippled by this
regardless of how much network bandwidth the server has.
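As a sketch of what selecting for writability buys you: keep unsent bytes in a per-connection buffer, send what the socket accepts, and put the fd in the write set only while something is still pending. `OutBuffer` and `make_nonblocking` are made-up names, and the code assumes POSIX non-blocking sockets.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Puts fd into non-blocking mode; returns false on failure.
inline bool make_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Per-connection output buffer for a non-blocking socket.
struct OutBuffer {
    std::string pending;  // bytes not yet accepted by the kernel

    void queue(const std::string& data) { pending += data; }

    // Sends as much as the socket will take right now.
    // Returns false on a real error, true otherwise (including "would block").
    bool flush(int fd) {
        assert(fd >= 0);
        while (!pending.empty()) {
            ssize_t n = send(fd, pending.data(), pending.size(), 0);
            if (n > 0) {
                pending.erase(0, static_cast<std::size_t>(n));
                continue;
            }
            if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) return true;
            if (n < 0 && errno == EINTR) continue;
            return false;
        }
        return true;
    }

    // Put the fd in the write fd_set only while this is true.
    bool wants_write() const { return !pending.empty(); }
};
```

A slow client then costs one buffer and one bit in the write set, and its worker thread is free to serve other connections instead of spinning on EWOULDBLOCK.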
DS
07-09-07 12:18 AM
On Jul 7, 11:01 pm, allthecoolkidshave...@gmail.com wrote:
> One option:
>
> Get rid of most if not all the threads. Rewrite it with a callback-
> based model using libevent
> (http://monkey.org/~provos/libevent/), which will use the highest
> performance event polling mechanism a particular OS supports (kqueue
> on BSDs (Including OS X), epoll on linux 2.6, /dev/poll on Solaris,
> poll, and as a last resort, select).
That's a good idea. But you still need threads to block on disk I/O
and to handle extraordinary conditions like errors where the code may
need to fault in.
DS
07-09-07 06:26 PM
On Sun, 08 Jul 2007 17:16:56 -0700, David Schwartz wrote:
> On Jul 7, 7:10 am, moi <r...@localhost.localdomain> wrote:
>
>
>
> WHAT?! That's as wrong as anything can be.
>
I stand corrected. Opinions do not differ.
:-)
AvK
07-10-07 12:20 AM
Henrik Goldman <henrik_goldman@mail.tele.dk> wrote:
> In my test I wrote a client application which simulates up to 500
> clients by spawning 500 threads. Each thread would then connect to
> the server and perform a request. When the server is empty the
> request takes about 250 ms. However with 500 idle clients connected
> each request takes about 890 ms. This means that each request gets
> slowed down by several times.
How do you know some of that isn't in the client? If I were looking
to measure the scalability of a server I'd want to have several
clients, not just one client process with 500 threads. Or, I'd want
to check the client first by testing it against several servers at
once.
The suggestion to profile things is spot-on.
rick jones
--
web2.0 n, the dot.com reunion tour...
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
07-13-07 06:22 PM
Thanks to everyone who has answered this so far.
As per suggestions I did some profiling and with a few small patches I
managed to boost the speed quite a bit. These patches removed some
useless waits here and there and added a little bit of caching.
However now with those patches done I hit a new problem. Once in a
while I get errors on the client side and things just stop working.
This problem is *only* happening when I perform stress tests with
multiple socket connections.
I have identified it to be this piece of code on the client:
bool Csocket::SafeRecv()
{
    int nIndex = 0;
    int nLeft;
    int ret;
    HEADER H;

    // Peek for data until a complete header is available
    do
    {
        if ((ret = m_Socket.Recv(&H, sizeof(H), MSG_PEEK)) == SOCKET_ERROR)
            return false;
        // See if client is disconnected
        if (ret == 0) return false;
        Sleep(5);
    } while (ret < (int) sizeof(H));

    // The header has been received and tells you how much data is left to receive.
    nLeft = ENDIAN(H.lLength);
    ResizeMemory(nLeft);

    // Get the rest
    while (nLeft > 0)
    {
        ret = m_Socket.Recv(&m_pBuffer[nIndex], nLeft, 0);
        // Either the client disconnected or a socket error occurred.
        if (ret == SOCKET_ERROR) return false;
        if (ret == 0) return false;
        nLeft -= ret;
        nIndex += ret;
    }
    return true;
}
I should make it clear that SOCKET_ERROR is defined as -1 and that
m_Socket.Recv() is just a wrapper around recv().
In the above code I read the header before reading rest of the data.
This is required in order to read how much more data is missing and
perform some cryptography services which I left out.
I have identified that something weird is going on with MSG_PEEK since
"if (ret == 0) return false;" sometimes gets invoked. This means that
the server should have dropped the client connection but this is not
the case. I know that the server didn't do that just for the fun of
it.
Can anyone propose a better or safer way of achieving the same as
above?
What I want is to read the network header as 1 recv() before reading
rest of the data. This means that I'd like to wait on the client side
until I know that there is enough data to be read.
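For comparison, one common way to get "header first, then the rest" without MSG_PEEK is a helper that loops until exactly N bytes have arrived. This is only a sketch: it assumes a blocking POSIX socket and a made-up wire format (a 4-byte big-endian payload length followed by the payload), not my real HEADER struct, and `recv_exact` and `recv_message` are invented names.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <sys/socket.h>
#include <sys/types.h>
#include <vector>

// Reads exactly `len` bytes from a blocking socket.
// Returns false on error or if the peer closes before `len` bytes arrive.
inline bool recv_exact(int fd, void* dst, std::size_t len) {
    assert(dst != nullptr || len == 0);
    char* p = static_cast<char*>(dst);
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n == 0) return false;          // orderly shutdown by the peer
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted: retry
            return false;
        }
        p += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Assumed wire format (hypothetical): 4-byte big-endian payload length,
// then that many payload bytes. The header is consumed, not peeked.
inline bool recv_message(int fd, std::vector<char>& payload, std::uint32_t max_len) {
    std::uint32_t be_len = 0;
    if (!recv_exact(fd, &be_len, sizeof be_len)) return false;
    std::uint32_t len = ntohl(be_len);
    if (len > max_len) return false;       // refuse absurd lengths
    payload.resize(len);
    return len == 0 || recv_exact(fd, payload.data(), len);
}
```

Because the header is read with an ordinary recv(), there is no Sleep() polling loop, and the payload read starts right after the header instead of at the bytes that were only peeked.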
Thanks.
-- Henrik
All times are GMT.