|
Home > Archive > Unix Programming > May 2007 > scalable TCP server design
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
scalable TCP server design
|
|
| Sheth Raxit 2007-05-11, 1:17 pm |
| Scalable Network Server.
I am writing multithreaded web (actually something else, ) server.
1. one thread will accept connection, after connect success (accept
return) it will create a thread and pass connection id, and go back to
accept new connection
newly created thread
now due to high load more number of threads are created, I am thinking
of following approach
2. One Producer, Thread pool of Multiple Consumers
-->Producer :
-One thread (producer) will Accept and Read- Request on Socket
-After Reading it will put Request into Queue.
-->Consumer :
-Multiple Consumer Threads will Read from the queue, procee req, send
response, close connection and again Read from Queue.
-Synchronization done using Condition Var or Semaphor.(something like
Event Driver approach)
Problem : Single thread doing accept and read (actually recieve) if
client get connected but not Write request,one thread is block on
Read , Unwanted.
Solution : multiple producer instead of single
Disadvantages :
doing all this stuff from 1 to 2 to improve performance , but i think
in case of multiple producer and multiple consumer the Synchronizatoin
overhead will come to play. It should after all outperform Thread
Creation and Thread Destroy time.
note :
1. i am mostly implementing everything in C (no C++/Java)
2. I have also implemented Prethreaded server, Everythread block on
Accept,
3. in situation 2 , how to decide number of Consumer Thread ? ( I
think i need to do real performance Test of 2, collect some statistics
and then decide)
4. anyone is having/knowing any performance measurement of approach 1
or 2
5. Is Asynchronous IO is helpful here, I am knwoing very much less
about this , but heard that it is not portable, and Conversion from 1
(which i already implemented) is complex and costly.
Any Idea how to cop with this common problem and to very very good
performance to scale the server very well. ? or any other Scalable
Server Design. ?
any link to online resouce/statistics (or opensource library/
framework) available for this ?
anyother suggestion ?
Thanks,
--Raxit
| |
|
|
| David Schwartz 2007-05-11, 7:20 pm |
|
I've spent the past 12 years working on this issue. Which OS(es) are
you interested in? The techniques are very different.
Speaking generally, the approach that I've found works best works like
this:
1) You use the OS' most efficient means of discovering which sockets
are ready to do I/O.
2) For each I/O job that is ready, you queue a job to your I/O queue.
3) You have a pool of threads that pulls I/O jobs from the queue.
4) You keep a count of how many I/O jobs need to be done but have not
been done. As soon as the I/O is finished, each thread decrements that
count. (So, if you're doing a read job, you decrement the count as
soon as you have read all the data but before you fully process it.)
5) The first thread to finish an I/O job and find the count at zero
calls the OS' socket discovery function.
On some OSes, you need more than one thread in socket discovery
functions. In that case, each I/O thread works like this:
1) Is there an I/O job to do (read/write/shutdown/setup new connection/
whatever)? If so, do it and go to step 1.
2) Does the OS' socket discovery function need to be called? If so,
call it, queue any discovered jobs and go to step 1.
3) Wait until the queue is non-empty.
There are several different ways to decide how many threads to use.
One way is to add up the number of CPUs, the number of threads you
need for socket discovery, and the number of I/O jobs you can usefully
pend concurrently and use a "few more" than that. There are also
various dynamic techniques (add a thread if the queue starts to get
larger). It's tricky, you can get in to a situation where you think
you are thread starved and keep creating more and more threads even
though all the threads are really CPU starved.
DS
| |
| Rainer Weikusat 2007-05-13, 1:21 pm |
| David Schwartz <davids@webmaster.com> writes:
[...]
> Speaking generally, the approach that I've found works best works like
> this:
>
> 1) You use the OS' most efficient means of discovering which sockets
> are ready to do I/O.
>
> 2) For each I/O job that is ready, you queue a job to your I/O queue.
>
> 3) You have a pool of threads that pulls I/O jobs from the queue.
>
> 4) You keep a count of how many I/O jobs need to be done but have not
> been done. As soon as the I/O is finished, each thread decrements that
> count. (So, if you're doing a read job, you decrement the count as
> soon as you have read all the data but before you fully process it.)
>
> 5) The first thread to finish an I/O job and find the count at zero
> calls the OS' socket discovery function.
Assuming the 'socket discovery routine' returns exactly one job, a
single thread will process that to I/O completion and then call the
discovery routine again, while everything else sits around idling (or
I am missing something here).
Worth a look: <URL:http://pl.atyp.us/content/tech/servers.html>
Excerpt:
The simplest conceptual model of a multi-threaded
event-driven server has a queue at its center; requests are
read by one or more "listener" threads and put on queues,
from which one or more "worker" threads will remove and
process them. Conceptually, this is a good model, but all too
often people actually implement their code this way. Why is
this wrong? Because the #2 cause of context switches is
transferring work from one thread to another. Some people
even compound the error by requiring that the response to a
request be sent by the original thread - guaranteeing not one
but two context switches per request. It's very important to
use a "symmetric" approach in which a given thread can go
from being a listener to a worker to a listener again without
ever changing context. Whether this involves partitioning
connections between threads or having all threads take turns
being listener for the entire set of connections seems to
matter a lot less.
| |
| David Schwartz 2007-05-14, 7:24 am |
| On May 13, 11:16 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> Assuming the 'socket discovery routine' returns exactly one job, a
> single thread will process that to I/O completion and then call the
> discovery routine again, while everything else sits around idling (or
> I am missing something here).
The only socket discovery routine that returns exactly one job that I
know of is IOCP. In that case, every thread should block on
GetOverlappedResult. I don't know of any UNIX that has a socket
discovery routine that always returns one result. Linux has epoll,
which does not. FreeBSD has kqueue, which does not. Generic UNIX has
select or poll which do not.
> Worth a look: <URL:http://pl.atyp.us/content/tech/servers.html>
>
> Excerpt:
>
> Because the #2 cause of context switches is
> transferring work from one thread to another. Some people
> even compound the error by requiring that the response to a
> request be sent by the original thread - guaranteeing not one
> but two context switches per request. It's very important to
> use a "symmetric" approach in which a given thread can go
> from being a listener to a worker to a listener again without
> ever changing context. Whether this involves partitioning
> connections between threads or having all threads take turns
> being listener for the entire set of connections seems to
> matter a lot less.
In my example, context switches are never forced. The thread that
calls the socket discovery routine immediately starts pulling jobs
from the queue. The first thread that happens to discover an empty
queue calls the socket discovery routine. My threads become 'listener'
threads when there is no work to do and 'worker' threads as soon as
there is.
DS
| |
| Sheth Raxit 2007-05-14, 7:24 am |
| On May 13, 11:16 pm, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> David Schwartz <dav...@webmaster.com> writes:
>
> [...]
>
>
>
>
>
>
mostly select (or epoll etc., i need to study kegel.com )[vbcol=seagreen]
>
>
>
>
Intresting !
Any re-usable stuff (or opensource example) available for these ?[vbcol=seagreen]
>
> Assuming the 'socket discovery routine' returns exactly one job, a
> single thread will process that to I/O completion and then call the
> discovery routine again, while everything else sits around idling (or
> I am missing something here).
>
> Worth a look: <URL:http://pl.atyp.us/content/tech/servers.html>
V. Good link,
i blooged the link before a year or 2, but i think i need to do more
in these,
while reading the link i think User level Threading - Instead of
pthread (which i think most probably maps to 1:1 model) (Correct me if
I am wrong !) may helpful.
Currently most of my code is using PThread, anyone knowing any Posix
compatible User Level Thread library (for Linux,Sol 8,10) ? I think
user level thread libraries are doing simillar stuff like thread
pooling (but may not like Queue and worker thread) so that
1. Thread creation and destroy time is less
2. Context Switch time is less.
>
> Excerpt:
>
> The simplest conceptual model of a multi-threaded
> event-driven server has a queue at its center; requests are
> read by one or more "listener" threads and put on queues,
> from which one or more "worker" threads will remove and
> process them. Conceptually, this is a good model, but all too
> often people actually implement their code this way. Why is
> this wrong? Because the #2 cause of context switches is
> transferring work from one thread to another. Some people
> even compound the error by requiring that the response to a
> request be sent by the original thread - guaranteeing not one
> but two context switches per request. It's very important to
> use a "symmetric" approach in which a given thread can go
> from being a listener to a worker to a listener again without
> ever changing context. Whether this involves partitioning
> connections between threads or having all threads take turns
I think i can pass connection-fd to queue so the Workerthread can send
the response,
> being listener for the entire set of connections seems to
> matter a lot less.- Hide quoted text -
>
> - Show quoted text -
--Raxit
| |
| David Schwartz 2007-05-14, 7:24 am |
| On May 14, 1:48 am, Sheth Raxit <raxitsheth2...@yahoo.co.in> wrote:
> Currently most of my code is using PThread, anyone knowing any Posix
> compatible User Level Thread library (for Linux,Sol 8,10) ? I think
> user level thread libraries are doing simillar stuff like thread
> pooling (but may not like Queue and worker thread) so that
User level thread libraries are essentially useless for I/O and server
type applications. The two main reasons for using threads are to keep
multiple CPUs busy and to prevent a stall (page fault, kernel thread
hijack, etcetera) from stalling the entire process. User level thread
libraries can't do either of these two things.
> 1. Thread creation and destroy time is less
So what? Say you need 100 threads. You create 100 threads in the first
second and then you never have to create or destroy a thread ever
again. You create and destroy threads (other than those you initially
need) as an optimization. Don't do it if it costs more than it saves.
> 2. Context Switch time is less.
You have to do something really wrong to create an application where
the cost of context switches is significant. Just don't do any of the
obviously stupid things that cause this problem.
DS
| |
| Sheth Raxit 2007-05-14, 7:24 am |
| On May 14, 2:13 pm, David Schwartz <dav...@webmaster.com> wrote:
> On May 14, 1:48 am, Sheth Raxit <raxitsheth2...@yahoo.co.in> wrote:
>
>
> User level thread libraries are essentially useless for I/O and server
> type applications. The two main reasons for using threads are to keep
> multiple CPUs busy and to prevent a stall (page fault, kernel thread
> hijack, etcetera) from stalling the entire process. User level thread
> libraries can't do either of these two things.
>
>
> So what? Say you need 100 threads. You create 100 threads in the first
> second and then you never have to create or destroy a thread ever
> again. You create and destroy threads (other than those you initially
> need) as an optimization. Don't do it if it costs more than it saves.
>
>
> You have to do something really wrong to create an application where
> the cost of context switches is significant. Just don't do any of the
> obviously stupid things that cause this problem.
I think, you are prompting about over-optimization i am trying to
do, ?
--Raxit
>
> DS
| |
| David Schwartz 2007-05-14, 7:24 am |
| On May 14, 3:01 am, Sheth Raxit <raxitsheth2...@yahoo.co.in> wrote:
> I think, you are prompting about over-optimization i am trying to
> do, ?
Whenever someone says something like "high-load" or "high-
performance", I assume they mean somewhere near the limit. If "high
load" to you means a couple of hundred concurrent connections or
something, then almost anything will work. (Except one-thread-per-
client.)
DS
| |
| Sheth Raxit 2007-05-14, 7:24 am |
| On May 14, 3:11 pm, David Schwartz <dav...@webmaster.com> wrote:
> On May 14, 3:01 am, Sheth Raxit <raxitsheth2...@yahoo.co.in> wrote:
>
>
> Whenever someone says something like "high-load" or "high-
> performance", I assume they mean somewhere near the limit. If "high
> load" to you means a couple of hundred concurrent connections or
> something, then almost anything will work. (Except one-thread-per-
> client.)
I mean approx. 1000 concurrent user can do,
My TCP Server actually doing this per request (not per client or not
per connection. !) (It serves one Request per connection -- I think
this may be bottleneck, Socket open/close is more more costly then
thread create/destroy !)
1. accept connection
2. create a thread this thread will do following
3. read request
4. process it ( disk i/o, read write record to files)
5. write response
6. close connection
7. exit thread
in other approach i tried on Pre Threaded Server with block on Accept,
but don't help much.
Thanks for your insightful comment on this.
--Raxit
>
> DS
| |
| David Schwartz 2007-05-14, 7:18 pm |
| On May 14, 3:55 am, Sheth Raxit <raxitsheth2...@yahoo.co.in> wrote:
> I mean approx. 1000 concurrent user can do,
Then any design other than thread-per-connection will probably be
fine. You only have to get fancy at around 10,000 connections.
> My TCP Server actually doing this per request (not per client or not
> per connection. !) (It serves one Request per connection -- I think
> this may be bottleneck, Socket open/close is more more costly then
> thread create/destroy !)
> 1. accept connection
> 2. create a thread this thread will do following
> 3. read request
> 4. process it ( disk i/o, read write record to files)
> 5. write response
> 6. close connection
> 7. exit thread
This would mean that to serve 1,000 clients you need 1,000 threads. It
would also mean that to do a little bit of work for each client, you
would need 1,000 context switches on a single CPU machine. That's
obviously badly broken.
You tried the one approach that doesn't work. You cannot dedicate a
thread to a connection.
> in other approach i tried on Pre Threaded Server with block on Accept,
> but don't help much.
>
> Thanks for your insightful comment on this.
Thread-per-connection is not scalable. The number of threads should be
based on how much work you can usefully do at once, not on how much
work you have to do in total.
DS
| |
| Rainer Weikusat 2007-05-15, 7:18 am |
| David Schwartz <davids@webmaster.com> writes:
> On May 13, 11:16 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
Nobody who reads this message can possibly extract any information
from it regarding what my text below was refering to. Which means that
somebody will certainly post some random nonsense in response to it,
assuming some random context he or she sees fit. When replying to
post, please keep the content of text intact by not removing all of
the context.
>
> The only socket discovery routine that returns exactly one job that I
> know of is IOCP. In that case, every thread should block on
> GetOverlappedResult. I don't know of any UNIX that has a socket
> discovery routine that always returns one result. Linux has epoll,
> which does not. FreeBSD has kqueue, which does not. Generic UNIX has
> select or poll which do not.
All 'socket discovery routines' will return 'exactly one job' if
exactly one job is available when they are called. I was assuming this
would be obvious enough to an eventual reader that explicitly
mentioning it would be superflous.
>
>
> In my example, context switches are never forced. The thread that
> calls the socket discovery routine immediately starts pulling jobs
> from the queue.
But the thread idling on some other CPU has already pulled it.
| |
| David Schwartz 2007-05-15, 7:17 pm |
| On May 15, 2:31 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> David Schwartz <dav...@webmaster.com> writes:
[vbcol=seagreen]
> All 'socket discovery routines' will return 'exactly one job' if
> exactly one job is available when they are called. I was assuming this
> would be obvious enough to an eventual reader that explicitly
> mentioning it would be superflous.
Right, but who cares about that case? You don't optimize for the most
miniscule amount of load. In any event, I think I handle that the best
possible way, whatever thread first gets to the queue gets the job,
and the thread that discovered the job is not out of the running.
[vbcol=seagreen]
> But the thread idling on some other CPU has already pulled it.
In that case, you win. The job got started even earlier than if you
had avoided the context switch. Context switches are bad when they
delay things. If they let things start earlier, they are good.
DS
| |
| David Schwartz 2007-05-15, 7:17 pm |
| Replying again now that I know what you mean:
On May 13, 11:16 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> Assuming the 'socket discovery routine' returns exactly one job, a
> single thread will process that to I/O completion and then call the
> discovery routine again, while everything else sits around idling (or
> I am missing something here).
Since there's only one job to do, this is exactly what you want. This
is perfect because there are no forced context switches. One thread
discovers the job, does the job, checks for more jobs.
Everything else is idling because there's nothing else to do.
(Or am I still misunderstanding you?)
DS
| |
| Frank Cusack 2007-05-16, 1:18 am |
| On Sun, 13 May 2007 20:16:25 +0200 Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
> David Schwartz <davids@webmaster.com> writes:
>
> [...]
>
and queues the job? then what does it do?
[vbcol=seagreen]
> Assuming the 'socket discovery routine' returns exactly one job, a
> single thread will process that to I/O completion and then call the
> discovery routine again, while everything else sits around idling (or
> I am missing something here).
>
> Worth a look: <URL:http://pl.atyp.us/content/tech/servers.html>
Isn't there a paper somewhere that shows calling accept() in multiple
threads is more work (less throughput) than having a single accept()
thread (just in an accept() loop) that dispatches work? I guess the
idea is that if lots of connections are pending, a lot of context
switches occur between the accept()ing threads, whereas a single
accept() thread runs the accept() part much more efficiently.
I can't find hide nor hair of it though.
-frank
| |
| William Ahern 2007-05-16, 1:18 am |
| Frank Cusack <fcusack@fcusack.com> wrote:
<snip>
> Isn't there a paper somewhere that shows calling accept() in multiple
> threads is more work (less throughput) than having a single accept()
> thread (just in an accept() loop) that dispatches work? I guess the
> idea is that if lots of connections are pending, a lot of context
> switches occur between the accept()ing threads, whereas a single
> accept() thread runs the accept() part much more efficiently.
> I can't find hide nor hair of it though.
Are you referring to the stampeding herd problem?
In modern kernels this is solved, at least for the historical scenarios. In
Linux, for instance, the accept descriptor's wait queue is smart enough to
only wake one thread.
| |
| David Schwartz 2007-05-16, 1:18 am |
| On May 15, 7:03 pm, Frank Cusack <fcus...@fcusack.com> wrote:
> Isn't there a paper somewhere that shows calling accept() in multiple
> threads is more work (less throughput) than having a single accept()
> thread (just in an accept() loop) that dispatches work? I guess the
> idea is that if lots of connections are pending, a lot of context
> switches occur between the accept()ing threads, whereas a single
> accept() thread runs the accept() part much more efficiently.
>
> I can't find hide nor hair of it though.
The fundamental issue with 'select' and 'poll' is that if there is not
at least one job ready at the time you call, the thread must be put on
a large number of wait queues and then when it wakes up, it must be
removed from all of them. This effect can swamp the context switch
issue in many realistic cases. You want to rig things so either your
sets are sufficiently small or you only very rarely (under low load)
have to sleep in 'select' or 'poll'.
This is largely a legacy issue because 'select' is pretty close to
obsolete. There are a variety of solutions all with disadvantages and
advantages. One possibility is to just have one thread 'select' on all
sockets. Another possibility is to break the sockets up into large
enough chunks that the maximum number of context switches is still
very small.
The best method I ever tried was one that keeps multiple 'select' or
'poll' sets sorted by activity. This allows the thread handling the
most active sockets to stay running for a relatively long time. The
thread that handles the inactive sockets has to be put on a lot of
wait queues every time it enters and leaves the kernel, but because
the sockets are inactive, it does that only rarely.
Mechanisms like IOCP, epoll, kqueue, and the like avoid this issue.
DS
| |
| Rainer Weikusat 2007-05-16, 7:17 am |
| David Schwartz <davids@webmaster.com> writes:
> Replying again now that I know what you mean:
>
> On May 13, 11:16 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
>
>
> Since there's only one job to do, this is exactly what you want. This
> is perfect because there are no forced context switches. One thread
> discovers the job, does the job, checks for more jobs.
>
> Everything else is idling because there's nothing else to do.
>
> (Or am I still misunderstanding you?)
I'd rather suggest that you still try some nonsense-interpretation of
my original remark 'designed' to take place in a context-free
environment, ie without anybody having a realistic chance to
understand what is actually being discussed. But, oh well, here is your
original outline again:
,----
| >
| > 1) You use the OS' most efficient means of discovering which sockets
| > are ready to do I/O.
| >
| > 2) For each I/O job that is ready, you queue a job to your I/O queue.
| >
| > 3) You have a pool of threads that pulls I/O jobs from the queue.
| >
| > 4) You keep a count of how many I/O jobs need to be done but have not
| > been done. As soon as the I/O is finished, each thread decrements that
| > count. (So, if you're doing a read job, you decrement the count as
| > soon as you have read all the data but before you fully process it.)
| >
| > 5) The first thread to finish an I/O job and find the count at zero
| > calls the OS' socket discovery function.
`----
Using this algorithm, if the initial 'discovery call' returns exactly
one job, the discovery routine won't be called again until that one
job has been processed. This means that jobs arriving while the one
job is being processed will be ignored until then, despite there are
(supposedly) threads availabe doing nothing.
| |
| B. Augestad 2007-05-16, 1:19 pm |
| David Schwartz wrote:
[Interesting stuff snipped]
>
> Mechanisms like IOCP, epoll, kqueue, and the like avoid this issue.
>
How portable/standardized are these mechanisms nowadays?
Bjørn
--
Looking for an embeddable web server?
http://www.metasystems.no/products/...nder/index.html
| |
| David Schwartz 2007-05-16, 1:19 pm |
|
Rainer Weikusat wrote:
> | > 1) You use the OS' most efficient means of discovering which sockets
> | > are ready to do I/O.
> | >
> | > 2) For each I/O job that is ready, you queue a job to your I/O queue.
> | >
> | > 3) You have a pool of threads that pulls I/O jobs from the queue.
> | >
> | > 4) You keep a count of how many I/O jobs need to be done but have not
> | > been done. As soon as the I/O is finished, each thread decrements that
> | > count. (So, if you're doing a read job, you decrement the count as
> | > soon as you have read all the data but before you fully process it.)
> | >
> | > 5) The first thread to finish an I/O job and find the count at zero
> | > calls the OS' socket discovery function.
> Using this algorithm, if the initial 'discovery call' returns exactly
> one job, the discovery routine won't be called again until that one
> job has been processed.
Incorrect. I'm not sure how you come to this conclusion but it's not
true. The following sequence will occur:
1) The thread will discover the single socket.
2) The thread will call 'read'.
3) The thread will decrement the count to zero.
4) The thread will process the job, and the next thread to run will
see the count is zero and call the socket discovery function.
You cannot call 'poll' or 'select' again until you perform the 'read'
or it will just return immediately. You do not have to process the
job, but you do have to read the data.
> This means that jobs arriving while the one
> job is being processed will be ignored until then, despite there are
> (supposedly) threads availabe doing nothing.
I don't see how you figure. Did you read the parenthetical at the end
of step 4? The I/O code decrements the count as soon as it's finished
calling 'read' or 'write' and before any other processing is done.
DS
| |
| David Schwartz 2007-05-16, 1:19 pm |
| On May 16, 6:58 am, "B. Augestad" <b...@metasystems.no> wrote:
[vbcol=seagreen]
> How portable/standardized are these mechanisms nowadays?
Each of them is standardized and stable but not really all that
portable. IOCP works on many different versions of Windows. I believe
kqueue works on FreeBSD and OSX. Solaris has a mechanism that works
only on Solaris (/dev/poll, I think), same with Linux (epoll).
Some of these were unstable for a time when they were initially
introduced. They're all relatively stable now.
DS
| |
| B. Augestad 2007-05-16, 1:19 pm |
| David Schwartz wrote:
> On May 16, 6:58 am, "B. Augestad" <b...@metasystems.no> wrote:
>
>
>
> Each of them is standardized and stable but not really all that
> portable. IOCP works on many different versions of Windows. I believe
> kqueue works on FreeBSD and OSX. Solaris has a mechanism that works
> only on Solaris (/dev/poll, I think), same with Linux (epoll).
>
> Some of these were unstable for a time when they were initially
> introduced. They're all relatively stable now.
>
This means that select()/poll() *still* are the only portable options
for POSIX platforms, doesn't it?
Bjørn
--
Looking for an embeddable web server?
http://www.metasystems.no/products/...nder/index.html
| |
| Casper H.S. Dik 2007-05-16, 1:19 pm |
| David Schwartz <davids@webmaster.com> writes:
>On May 16, 6:58 am, "B. Augestad" <b...@metasystems.no> wrote:
[vbcol=seagreen]
[vbcol=seagreen]
>Each of them is standardized and stable but not really all that
>portable. IOCP works on many different versions of Windows. I believe
>kqueue works on FreeBSD and OSX. Solaris has a mechanism that works
>only on Solaris (/dev/poll, I think), same with Linux (epoll).
>Some of these were unstable for a time when they were initially
>introduced. They're all relatively stable now.
And they invent new mechanisms if the old are found to be wanting.
(Solaris 10 introduces "event ports" which give a queue of fd, AIO,
message queues and timer events)
Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
| |
| Rainer Weikusat 2007-05-16, 1:19 pm |
| David Schwartz <davids@webmaster.com> writes:
> Rainer Weikusat wrote:
>
>
>
> Incorrect. I'm not sure how you come to this conclusion but it's not
> true.
I am still talking about I/O processing, like it did before.
> The following sequence will occur:
>
> 1) The thread will discover the single socket.
>
> 2) The thread will call 'read'.
>
> You cannot call 'poll' or 'select' again until you perform the 'read'
> or it will just return immediately. You do not have to process the
> job, but you do have to read the data.
With poll, it is possible to deactivate polling for a certain
descriptor by either setting the fd member of the associated struct
pollfd to -1 or by clearing the events member, whatever is more
convenient. Since select destroys its arguments anyway, the
descriptor-known-to-be-ready could just be omitted when reconstructing
the fd sets.
| |
| David Schwartz 2007-05-16, 7:18 pm |
| On May 16, 9:08 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> With poll, it is possible to deactivate polling for a certain
> descriptor by either setting the fd member of the associated struct
> pollfd to -1 or by clearing the events member, whatever is more
> convenient. Since select destroys its arguments anyway, the
> descriptor-known-to-be-ready could just be omitted when reconstructing
> the fd sets.
On the platforms I tested, this approach was measurably inferior. One
reason is pretty obvious, what do you do when you finish the non-
blocking 'read' a split-second later? Find another thread to
'poll'/'select' on that descriptor? Suppose you find 800 sockets
ready. You can't find 800 threads to each wait for the one descriptor
as you finish doing I/O on each one.
You are trying to optimize for a case that really can't be optimized
because it is already so close to optimal. If you discover a lot of
descriptors, you have a lot of work to do and keep all your CPUs busy
anyway. Even if you discovered a new job, you wouldn't do it now. If
you discover only a few, you will be done in so little time, there is
no more performance to squeeze out.
In fact, the longer you wait to call 'poll' or 'select' again, the
higher the performance (so long as you have work to do before you
call). That is because a smaller number of calls that each discover
more descriptors is more efficient than a larger number of calls that
each discover fewer.
DS
| |
| David Schwartz 2007-05-16, 7:18 pm |
| On May 16, 9:02 am, "B. Augestad" <b...@metasystems.no> wrote:
> This means that select()/poll() *still* are the only portable options
> for POSIX platforms, doesn't it?
Correct. But for high-performance scalable TCP servers, the low-level
network code is not going to be portable. That's just a fact of life.
You have to do whatever works best on each platform. This is part of
the 10% of the code that makes 90% of the performance.
DS
| |
| Rainer Weikusat 2007-05-17, 7:16 am |
| David Schwartz <davids@webmaster.com> writes:
> On May 16, 9:08 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
>
>
> On the platforms I tested, this approach was measurably inferior.
Since didn't describe any 'approach' but made a remark regarding what
I would consider to be a limitation of your original 'algorithm' and
wrote the text above only as reply to your claim that poll/ select
cannot be called again after they returned some event once until that
event has been 'silenced', it is somewhat unclear what you are talking
about. Apart from that, that you did something somewhere which
resulted in something else you considered have a certain relation
to something else you did somehwere which resulted in yet something
different is a totally pointless declaration.
Science works by providing theoretically falsfiable results, not by
expressing unspecific opinions about unknown phenomenons.
> One reason is pretty obvious,
One reason for what?
> what do you do when you finish the non- blocking 'read' a
> split-second later?
Which non-blocking read?
> Find another thread to 'poll'/'select' on that descriptor? Suppose
> you find 800 sockets ready. You can't find 800 threads to each wait
> for the one descriptor as you finish doing I/O on each one.
> You are trying to optimize for a case that really can't be optimized
This appears to be your boilerplate comment to anything. Since I
didn't post an algorithm, but criticized one that you posted, it
doesn't fit here.
| |
| David Schwartz 2007-05-17, 1:18 pm |
| On May 17, 12:27 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
[vbcol=seagreen]
[vbcol=seagreen]
> Since didn't describe any 'approach' but made a remark regarding what
> I would consider to be a limitation of your original 'algorithm' and
> wrote the text above only as reply to your claim that poll/ select
> cannot be called again after they returned some event once until that
> event has been 'silenced', it is somewhat unclear what you are talking
> about.
As I explained, it's not a limitation, it's a feature. Calling them
again makes things worse.
> Apart from that, that you did something somewhere which
> resulted in something else you considered have a certain relation
> to something else you did somehwere which resulted in yet something
> different is a totally pointless declaration.
Huh? It is specifically trying to call 'poll' or 'select' quickly that
causes the problem.
It's possible that there's some flaw or error in my argument, but you
have to actually point it out. Complexity doesn't automatically mean
error.
> Science works by providing theoretically falsfiable results, not by
> expressing unspecific opinions about unknown phenomenons.
I expressed some pretty specific opinions and hypotheses to explain
them.
[vbcol=seagreen]
> One reason for what?
One reason why not calling 'poll' or 'select' until after you call
'read' is a feature rather than a limitation, that is, why it has
advantages that outweigh its limitation.
[vbcol=seagreen]
> Which non-blocking read?
Do you remember the example we were talking about?! Is it somehow
possible that you and I are talking about completely different things?
I was advocating this scenario:
1) You call 'poll' or 'select'. You discover one socket ready for I/O.
2) You do your non-blocking 'read' or 'write' (but don't actually
process the data in any other way).
3) You then allow another thread to 'poll' or 'select' on the set that
includes this socket.
You were talking about this scenario:
1) You call 'poll' or 'select'. You discover one socket ready for I/O.
2) You remove the socket from the set, another socket 'select's or
'poll's on the set.
3) You do the non-blocking 'read' or 'write'.
How can you ask "which non-blocking read"? Each scenario contains
exactly one such operation.
[vbcol=seagreen]
> This appears to be your boilerplate comment to anything. Since I
> didn't post an algorithm, but criticized one that you posted, it
> doesn't fit here.
So your criticism simply comes down to "there might be some way to do
it better". Sure, there might. Do you *know* of any?
DS
| |
| Rainer Weikusat 2007-05-18, 1:23 pm |
| David Schwartz <davids@webmaster.com> writes:
> On May 17, 12:27 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
>
>
>
> As I explained, it's not a limitation, it's a feature.
> Calling them again makes things worse.
You wrote that it is not possible to call them again:
,----
| You cannot call 'poll' or 'select' again until you perform the 'read'
| or it will just return immediately. You do not have to process the
| job, but you do have to read the data.
`----
<1179330163.789846.187900@u30g2000hsc.googlegroups.com>
So, which of your two texts is supposed to be true?
>
> Huh?
| On the platforms I tested, this approach was measurably inferior.
Nobody knows what you tested on which platforms, what 'approach' you
believed that to be, what the precise result was, what the other
precise result was you compared it to, how that came into being and
why precisely you considered one to be superior.
>
> I expressed some pretty specific opinions and hypotheses to explain
> them.
IOW, you claim to have expressed 'specific speculations' and
'speculations about those speculations'.
[...]
> I was advocating this scenario:
>
> 1) You call 'poll' or 'select'. You discover one socket ready for I/O.
>
> 2) You do your non-blocking 'read' or 'write' (but don't actually
> process the data in any other way).
>
> 3) You then allow another thread to 'poll' or 'select' on the set that
> includes this socket.
>
> You were talking about this scenario:
>
> 1) You call 'poll' or 'select'. You discover one socket ready for I/O.
>
> 2) You remove the socket from the set, another socket 'select's or
> 'poll's on the set.
>
> 3) You do the non-blocking 'read' or 'write'.
I wasn't talking about any scenario, I have pointed out that your
scheme implies that (on a multiprocessor) only one CPU will actually
be working (this application only) under certain cirumstances.
And I tried to explain to you how one can call eg poll again despite
it returned 'something ready' because you claimed that was impossible
(see quotation above).
>
>
> So your criticism simply comes down to "there might be some way to do
> it better".
No, it comes down to n CPUs idling, 1 CPU working, [m TCP events
unserviced]. Since you claimed that was a feature in this post, I'll
assume that the behaviour would be so, ie I have understood it
correctly. And I know a specific, other algorithm that would avoid
both this condition and the needless passing of the job-to-be-serviced
to a different CPU. Since I currently don't need a scalable
TCP-server, I am not going to elaborate further on that, because there
have been enough speculations already.
| |
| David Schwartz 2007-05-18, 7:17 pm |
| On May 18, 5:56 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> David Schwartz <dav...@webmaster.com> writes:
[vbcol=seagreen]
> You wrote that it is not possible to call them again:
Are you deliberately trying to waste my time? I don't get it. You're a
smart guy, why are you asking stupid questions?
> ,----
> | You cannot call 'poll' or 'select' again until you perform the 'read'
> | or it will just return immediately. You do not have to process the
> | job, but you do have to read the data.
> `----
> <1179330163.789846.187...@u30g2000hsc.googlegroups.com>
>
> So, which of your two texts is supposed to be true?
The two texts are clearly referring to different things. The one
directly above is referring to calling 'poll' or 'select' on the same
poll set, the one further above is referring to calling 'poll' or
'select' with the discovered descriptor removed from the set.
This was absolutely clear in context. Why are you deliberately
misunderstanding me? I don't get it.
>
>
> | On the platforms I tested, this approach was measurably inferior.
>
> Nobody knows what you tested on which platforms, what 'approach' you
> believed that to be, what the precise result was, what the other
> precise result was you compared it to, how that came into being and
> why precisely you considered one to be superior.
If you want more details, ask specific questions.
>
>
> IOW, you claim to have expressed 'specific speculations' and
> 'speculations about those speculations'.
Okay, you're just trying to waste my time. Have a nice day.
>
>
>
>
>
>
>
[vbcol=seagreen]
> I wasn't talking about any scenario, I have pointed out that your
> scheme implies that (on a multiprocessor) only one CPU will actually
> be working (this application only) under certain cirumstances.
Right, circumstances where that is so close to the most efficient
outcome that trying to achieve some other outcome costs more than it
gains. Remember, to get this scenario, we had to finish every single I/
O job we had discovered in so little time that only one new I/O job
became ready in that time.
> And I tried to explain to you how one can call eg poll again despite
> it returned 'something ready' because you claimed that was impossible
> (see quotation above).
It is impossible with that socket still in the set. And as I explained
in more detail later, removing the socket from the set costs more than
it gains.
>
>
[vbcol=seagreen]
> No, it comes down to n CPUs idling, 1 CPU working, [m TCP events
> unserviced]. Since you claimed that was a feature in this post, I'll
> assume that the behaviour would be so, ie I have understood it
> correctly. And I know a specific, other algorithm that would avoid
> both this condition and the needless passing of the job-to-be-serviced
> to a different CPU. Since I currently don't need a scalable
> TCP-server, I am not going to elaborate further on that, because there
> have been enough speculations already.
Your claim is that you know some way to improve this but won't tell
anyone. Thanks for making that clear.
DS
| |
| Rainer Weikusat 2007-05-20, 7:19 am |
| David Schwartz <davids@webmaster.com> writes:
> On May 18, 5:56 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
[...]
>
> If you want more details, ask specific questions.
If you want to convince me of something, provide details. Otherwise,
being a real simpleton, I will happily assume they don't exist :->.
[...]
>
> Right, circumstances where that is so close to the most efficient
> outcome that trying to achieve some other outcome costs more than it
> gains.
[...]
>
>
> Your claim is that you know some way to improve this but won't tell
> anyone.
Exactly. As long as I don't have an actual implementation of something
else, whose real (as opposed to 'what I believe it would be')
behaviour can be determined with at least some amount of objectivity,
its presumptive behaviour could be discussed forever to no avail.
It is, for instance, entirely conceivable that your claim regarding
the non-improveability is totally correct. I just don't know.
| |
| David Schwartz 2007-05-20, 7:21 pm |
| On May 20, 4:45 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> It is, for instance, entirely conceivable that your claim regarding
> the non-improveability is totally correct. I just don't know.
I am not making the claim that everything I'm advocating is the
absolute best possible way to do things. It probably isn't. As I said,
I've spent many, many years refining it and have no reason to think
I'm finished.
However, I doubt it has any possible trivial improvements or defects.
Simple questions like, "why don't you do it this way instead" where
the "this way" is obvious are most likely things I've already
considered.
Modifying the poll/select sets, for example, is one of those things.
I've tried it many ways, including the clever way of trying to sort
the descriptors by how heavily used they are. I've not yet found a way
where the advantages outweigh the disadvantages and I've explained why
the maximum potential improvement is extremely small.
DS
| |
| James Antill 2007-05-21, 7:26 pm |
| On Sun, 20 May 2007 12:56:50 -0700, David Schwartz wrote:
> However, I doubt it has any possible trivial improvements or defects.
> Simple questions like, "why don't you do it this way instead" where the
> "this way" is obvious are most likely things I've already considered.
>
> Modifying the poll/select sets, for example, is one of those things.
> I've tried it many ways, including the clever way of trying to sort the
> descriptors by how heavily used they are. I've not yet found a way where
> the advantages outweigh the disadvantages and I've explained why the
> maximum potential improvement is extremely small.
As usual I'm lurking and mostly agreeing with you (although I'd do s/
threads/tasks/, but whatever). I was wondering though, have you tried
using epoll with edge triggered event notification ... and if so what did
you find the difference to be over level triggered event notification?
--
James Antill -- james@and.org
http://www.and.org/and-httpd/ -- $2,000 security guarantee
http://www.and.org/vstr/
http://www.and.org/ustr/
| |
| David Schwartz 2007-05-21, 7:26 pm |
| On May 21, 3:19 pm, James Antill <james-netn...@and.org> wrote:
> As usual I'm lurking and mostly agreeing with you (although I'd do s/
> threads/tasks/, but whatever). I was wondering though, have you tried
> using epoll with edge triggered event notification ... and if so what did
> you find the difference to be over level triggered event notification?
I have not experimented with edge-triggered event notification. I
honestly don't see any significant benefit over level-triggered and
significantly increased risk. To me, it seems somewhat ill thought
out.
DS
| |
| William Ahern 2007-05-22, 1:18 am |
| David Schwartz <davids@webmaster.com> wrote:
> On May 21, 3:19 pm, James Antill <james-netn...@and.org> wrote:
[vbcol=seagreen]
> I have not experimented with edge-triggered event notification. I
> honestly don't see any significant benefit over level-triggered and
> significantly increased risk. To me, it seems somewhat ill thought
> out.
Particularly since more often than not applications tend to drain the I/O
buffers anyhow, so there's not much savings in syscalls there. And I find it
hard to imagine a scenario where edge-triggered readiness management is more
efficient for a kernel.
And, like you say, even if none of that held in a particular scenario, the
risk would put a quite a premium on the cost of the alternative.
Though it would be interesting just to see real-world numbers.
| |
| Andrei Voropaev 2007-05-22, 7:18 am |
| Newsgroups: comp.unix.programmer
X-Trace: chessie.cirr.com 1179831541 7069 192.94.73.35 (22 May 2007 10:59:01 GMT)
X-Complaints-To: abuse@cirr.com
NNTP-Posting-Date: Tue, 22 May 2007 10:59:01 +0000 (UTC)
User-Agent: slrn/0.9.8.1 (NetBSD)
Xref: number1.nntp.dca.giganews.com comp.unix.programmer:177758
On 2007-05-22, William Ahern <william@wilbur.25thandClement.com> wrote:
[...]
> Particularly since more often than not applications tend to drain the I/O
> buffers anyhow, so there's not much savings in syscalls there. And I find it
> hard to imagine a scenario where edge-triggered readiness management is more
> efficient for a kernel.
I also find it to be true. I once tried to process incoming buffers in
parts and the performance droped by 20%. So it's always the best to
process everything that is available. As to the design of the server, I
believe there's no universal answer. Everything shall depend on what the
server has to do. For example in HTTP server where each connection is
handled pretty much independent from others the multithreading is good.
But in a server where one connection has to exchange data with other
connections, its better to have them all in single thread that uses
epoll. The cost of synchronisation is too high.
--
Minds, like parachutes, function best when open
| |
| James Antill 2007-05-22, 1:25 pm |
| On Mon, 21 May 2007 16:35:46 -0700, David Schwartz wrote:
> On May 21, 3:19 pm, James Antill <james-netn...@and.org> wrote:
>
>
> I have not experimented with edge-triggered event notification. I
> honestly don't see any significant benefit over level-triggered and
> significantly increased risk. To me, it seems somewhat ill thought out.
Ahh, fair enough. I keep seeing hints from kernel developers that imply
edge triggered will be faster in some way, but also no real world user
space code using it.
I think there might be use cases for it where you are hitting the
outbound send rate of a connection, as you then won't have to keep
toggling POLLOUT via. epoll. But I haven't measured anything, and as you
say the risk of getting it wrong is significant ... I was just hoping I
could get some free data 
--
James Antill -- james@and.org
String APIs for C use too much memory? Micro string library: length, ref count,
size and read-only/fixed -- Ave. 55% overhead over strdup(), for 0-20B strings
http://www.and.org/ustr/
| |
| David Schwartz 2007-05-24, 7:20 am |
| On May 15, 7:03 pm, Frank Cusack <fcus...@fcusack.com> wrote:
[vbcol=seagreen]
> and queues the job? then what does it do?
Sorry, I never answered this. It calls the OS' socket discovery
function, queues any job it finds, and then starts over. If it did
queue any jobs, it will then start doing those jobs.
DS
|
|
|
|
|