|
Home > Archive > Unix Programming > June 2006 > Adding new non-blocking sockets during select-call
| Author |
Adding new non-blocking sockets during select-call
|
|
| Oliver Sudmann 2006-06-08, 7:26 am |
| Hi,
I am trying to write an application for Linux. The application opens an
unknown number of connections at unknown times and for unknown
durations. The general idea is to use non-blocking sockets to handle all
connections in one thread.
Now I have the following problem; maybe someone here has a good idea
for a solution:
Assume there are already some connections in an fd_set and the
application is in a select call observing this set. Now the application
makes a new connection. How can I add this new connection to the set so
that it is observed too?
The select call waits for an I/O event on all the other connections, but
it cannot know that there is a new connection that has to be added.
I see only three ways to solve the problem, but none of these solutions
is very nice:
- Timeout in select-call and updating fd_set after timeout.
- Using signals?.
- Using a dummy file descriptor producing an IO_Event when I need it.
I think I cannot be the only one with this problem, so maybe there is a
well known and elegant solution for it.
Thanks,
Oliver
| |
| Aaron Isotton 2006-06-08, 7:26 am |
| Oliver Sudmann wrote:
> Hi,
>
> I try to write an application for linux. The application opens an
> unknown number of connections to an unknown time and for an unknown
> time. The general idea is to use non-blocking sockets to handle all
> connections in one thread.
>
> Now I got the following problem, maybe here is someone with a good idea
> for a solution:
>
> Assume there are already some connections in an fd_set and the
> application is in a select-call observing this set. Now the application
> makes a new connection how can I add this new connection to the set to
> observe it?
>
> The select-call waits for an IO-Event of all the other connections, but
> it cannot know that there is a new connection that has to be added.
I'm not sure whether I understood your problem, but I'll try to sum up
your situation:
- some sockets are open and in an fd_set
- you are currently select()ing on this fd_set
In other words, your program is blocked and - since it is
single-threaded - doing nothing.
While your program is blocked, no new connection will just fall from the
sky, so there is no problem. If your application is making a new
connection, then it is obviously not blocked on select() - and thus you
can add the new connection to the fd_set.
Greetings,
Aaron
| |
| Andrei Voropaev 2006-06-08, 7:26 am |
| On 2006-06-08, Oliver Sudmann <nospam@nospam.com> wrote:
> I try to write an application for linux. The application opens an
> unknown number of connections to an unknown time and for an unknown
> time. The general idea is to use non-blocking sockets to handle all
> connections in one thread.
Does it mean that your application is multi-threaded?
> Assume there are already some connections in an fd_set and the
> application is in a select-call observing this set. Now the application
> makes a new connection how can I add this new connection to the set to
> observe it?
In other words, you created a connection in a separate thread and now want
to pass it to the thread that does the select calls. I think the best
approach in this situation would be to use a pipe for sending signals to
select. Even better, though, would be to redesign your application so that
no connections are created outside of the thread with select.
--
Minds, like parachutes, function best when open
| |
| Oliver Sudmann 2006-06-08, 7:26 am |
| Hi,
> I'm not sure whether I understood your problem, but I try summing up
> your situation:
>
> - some sockets are open and in an fd_set
> - you are currently select()ing on this fd_set
yes, that is what I wanted to say.
> In other words, your program is blocked and - since it is
> single-threaded - doing nothing.
>
> While your program is blocked, no new connection will just fall from the
> sky, so there is no problem. If your application is making a new
> connection, then it is obviously not blocked on select() - and thus you
> can add the new connection to the fd_set.
No, not the whole program is blocked, only the thread with the
select call; there may be other threads that create new connections.
(My fault, I did not mention that there may be other threads.) So I have
to do something to unblock the thread with the select call.
By the way, the reason why I want to use just one thread for all
connections in a multithreaded application is that I do not want to
clutter everything with thousands of threads for connections (imagine
e.g. a p2p application that creates 1000 connections, each with its
own thread).
Greetings,
Oliver
| |
| noogie.brown@gmail.com 2006-06-08, 7:26 am |
|
Oliver Sudmann wrote:
> Hi,
>
>
> yes, that is what I wanted to say.
>
>
> No not the whole program is blocked. Only the thread with the
> select-call, but there maybe other threads that create new connections.
> (My fault, did not mention that there maybe other threads.) So I have to
> do something to unblock the thread with the select-call.
>
> By the way the reason why I just want to use one thread for all
> connections in a multithreaded application is, because I do not want to
> mess up everythings with thousends of threads for connections (imagine
> e.g. a p2p-application that creates 1000 connections everyone with its
> own thread)
>
> Greetings,
> Oliver
You can add your listening socket to the fd_set. When a new connection
arrives, select will end with that socket in the rd_set...in which case
you accept() the new connection and do what you want with it.
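A minimal sketch of that suggestion in C, assuming a TCP listener on the loopback interface. The helper names make_listener and wait_and_accept are hypothetical, chosen only for illustration; the port is left to the kernel so the sketch does not collide with a real service.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Create a listening TCP socket on 127.0.0.1 (hypothetical helper).
 * Port 0 asks the kernel for any free port; the chosen address is
 * returned through addr_out. Returns the fd, or -1 on error. */
int make_listener(struct sockaddr_in *addr_out)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;

    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *addr_out = addr;
    return fd;
}

/* Put the listening socket into the read set and select() on it.
 * Returns the accepted fd, or -1 if nothing arrived within timeout_ms.
 * A real loop would add the connection sockets to the same set. */
int wait_and_accept(int listen_fd, int timeout_ms)
{
    assert(listen_fd >= 0 && listen_fd < FD_SETSIZE);

    fd_set rd_set;
    FD_ZERO(&rd_set);
    FD_SET(listen_fd, &rd_set);

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (select(listen_fd + 1, &rd_set, NULL, NULL, &tv) <= 0)
        return -1;
    if (!FD_ISSET(listen_fd, &rd_set))
        return -1;
    return accept(listen_fd, NULL, NULL);
}
```

Note that this only covers incoming connections. It does not help when another thread creates an outgoing connection, which is the case the original question is about.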
| |
| Aaron Isotton 2006-06-08, 7:26 am |
| Oliver Sudmann wrote:
> Hi,
>
>
>
> yes, that is what I wanted to say.
>
>
>
> No not the whole program is blocked. Only the thread with the
> select-call, but there maybe other threads that create new connections.
> (My fault, did not mention that there maybe other threads.) So I have to
> do something to unblock the thread with the select-call.
I think signals are the way to go. Use some data structure shared
between the threads (appropriately protected with a mutex) and a signal
as a notification telling the worker thread that "something has
happened". You can send signals to a specific thread using pthread_kill(3).
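A rough sketch of the signal approach in C. One substitution is made here and should be read as an assumption, not as what the post prescribes: the sketch uses pselect(2) instead of plain select, so that the wake-up signal stays blocked outside the wait and cannot be lost if it arrives just before the thread starts waiting. The function names are hypothetical. Another thread would deliver the wake-up with pthread_kill(worker_thread, signo) after updating the shared, mutex-protected data structure.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* The handler does nothing; its only job is to interrupt pselect(). */
static void on_wakeup(int signo)
{
    (void)signo;
}

/* Install the no-op handler and block the signal in the calling thread.
 * Calling this before creating threads lets them inherit the blocked
 * mask. Returns 0 on success. */
int wakeup_signal_init(int signo)
{
    struct sigaction sa;
    sa.sa_handler = on_wakeup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* no SA_RESTART: the wait must return EINTR */
    if (sigaction(signo, &sa, NULL) < 0)
        return -1;

    sigset_t block;
    sigemptyset(&block);
    sigaddset(&block, signo);
    return pthread_sigmask(SIG_BLOCK, &block, NULL);
}

/* Wait for I/O on rd_set. The wake-up signal is unblocked only for the
 * duration of pselect(). Returns 1 if woken by the signal, 0 on I/O
 * readiness or timeout, -1 on any other error. */
int wait_io_or_wakeup(int nfds, fd_set *rd_set, int signo,
                      const struct timespec *timeout)
{
    sigset_t during;
    pthread_sigmask(SIG_SETMASK, NULL, &during); /* fetch current mask */
    sigdelset(&during, signo);

    int n = pselect(nfds, rd_set, NULL, NULL, timeout, &during);
    if (n < 0)
        return errno == EINTR ? 1 : -1;
    return 0;
}
```

When the function returns 1, the worker would lock the mutex, merge the newly queued connections into its fd_set, and wait again.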
Greetings,
Aaron
| |
| Oliver Sudmann 2006-06-08, 1:25 pm |
| > I think signals are the way to go. Use some data structure shared
> between the threads (appropriately protected with a mutex) and a signal
> as a notification telling the worker thread that "something has
> happened". You can send signals to a specific thread using pthread_kill(3).
To summarize your suggestions and those of Andrei Voropaev: I
should use signals for my application. But I'm really getting the
impression that the whole thing with select is not a good solution
("even better would be to redesign your application", Andrei Voropaev).
So here is a completely different idea:
What do you think about the following alternative (completely without a
select call) using simple polling? (Better or worse than signals?)
Here is some pseudocode:
working thread:
while(!end)
{
    for(i=0;i<number_of_connections;i++)
    {
        ConnectionSocket cs=list_of_connections(i);
        cs.doNonBlockRecv();
        cs.doNonBlockSend();
    }
    updateConnectionList();
    Sleep();
}
A thread creating connections can add connections to another list, and
every time the for-loop has run, the working thread's connection list
is updated from it.
To cut a long story short: Is polling a typical no-no, or can it be used
in GOOD applications?
Thanks,
Oliver
| |
|
| Oliver Sudmann wrote:
>
>
> To summarize your suggestions and the suggestions of Andrei Voropaev: I
> should use signals for my application. But I'm really getting the
> impression, that the whole thing with select is not a good solution
> ("even better would be to redesign your application",Andrei Voropaev).
> So here is a completely different idea:
>
> What do you think about the following alternative (completely without an
> select-call) using simple polling. (Better or more worse than signals?):
>
> Here is some pseudocode:
>
> working thread:
> while(!end)
> {
> for(i=0;i<number_of_connections;i++)
> {
> ConnectionSocket cs=list_of_connections(i);
> cs.doNonBlockRecv();
> cs.doNonBlockSend();
> }
> updateConnectionList();
> Sleep();
> }
>
> A thread creating connections can add connections to another list and
> everytime the for-loop has been run the connection-list of the for-loop
> will be updated for the working thread.
>
> To cut a long story short: Is polling a typical "NoNo" or can it be used
> for GOOD applications?
>
> Thanks,
> Oliver
This is bad design by design, IMHO.
The reason you _want_ threads is that you *want* them to be allowed to
block. If you only have one thread that deals with accept() -ing new
connections, it only needs to allocate a new "handler" thread for the
new connection (or schedule one), and go on with its life: accepting
new connections.
Another reason for using threads would be using a multi processor
machine. This would only pay off if you have more than (about) four
processors: one for the app, one for the application, and two for
keeping the ethernet busy. (very coarse estimate, I know)
Yet another reason would be the inability to exploit poll/select.
(which can be hard, eg when recursive-descent style parsing is involved)
HTH,
AvK
| |
| Brian Raiter 2006-06-08, 7:23 pm |
| > In other words, you created connection in separate thread and now
> want to pass it to the thread that does select calls. I think the
> best in this situation would be to use some pipe for sending signals
> to select.
If the new connections are arriving via a listening socket, then I
would go further and advise that you NOT handle the listening socket
in a separate thread in the first place. Add the listening socket to
the set of read fds in your select call. In other words: Use threads
wisely.
b
| |
| davids@webmaster.com 2006-06-08, 7:23 pm |
|
moi wrote:
> This is bad design by design, IMHO.
I could not disagree more strongly.
> The reason you _want_ threads is that you *want* them to be allowed to
> block. If you only have one thread that deals with accept() -ing new
> connections, it only needs to allocate a new "handler" thread for the
> new connection (or schedule one) ,and go on with it's life: accepting
> new connections.
How do you know the reason he wants threads? A perfectly reasonable,
common, and extremely effective use of threads is as a slight variant
to the typical single-threaded select loop design. This design is
wonderful, except it has one flaw -- if any tiny piece of code ever
blocks for any reason, your whole server collapses. Using a select loop
design with multiple threads to work around this one problem is
eminently sensible.
Of course he doesn't want his threads to block, that forces a context
switch. That's almost always less efficient than having the work be
done by the thread that's already running anyway.
DS
| |
| davids@webmaster.com 2006-06-08, 7:23 pm |
|
Andrei Voropaev wrote:
> Though even better would be to redesign your application so that no
> connections are created outside of thread with select.
Why clutter an elegant design with silly restrictions? Having some
threads be special is something you'd prefer to avoid where possible.
That way, whatever thread happens to be running can do whatever work
happens to need to be done.
DS
| |
| davids@webmaster.com 2006-06-08, 7:23 pm |
|
Oliver Sudmann wrote:
> To summarize your suggestions and the suggestions of Andrei Voropaev: I
> should use signals for my application. But I'm really getting the
> impression, that the whole thing with select is not a good solution
> ("even better would be to redesign your application",Andrei Voropaev).
> So here is a completely different idea:
>
> What do you think about the following alternative (completely without an
> select-call) using simple polling. (Better or more worse than signals?):
Oh for the love of god, why are you letting these people talk you out
of a perfectly good design?! Polling is horrible.
The solution you want is to create a pipe with the 'pipe' system call
and add the read end of the 'pipe' to the 'select' fd_set. Make sure
both ends of the pipe are non-blocking.
You can make this scheme more efficient with a few tricks:
1) You can keep a flag to indicate whether or not a thread is currently
in select. Set it at the same time you copy the fd_set (protect it with
the same lock). Do not write to the pipe unless this flag is set (since
the thread can't be blocked in 'select' anyway).
2) You can clear that flag when you write to the pipe. That way another
connection becoming active won't cause another write to the 'pipe'.
3) Before copying the fd_sets and calling 'select', read from the pipe
to make sure there isn't a stale byte in there from before. If there
is, 'select' could just return immediately.
4) Make sure to read as many bytes from the pipe as there are, not just
one. Otherwise, again, you could wind up returning from 'select'
prematurely.
Note that you probably don't need all of these tricks, as some of them
overlap.
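The pipe scheme and tricks 3 and 4 above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: struct waker and the waker_* and select_with_waker names are hypothetical, and the "in select" flag from tricks 1 and 2 is left out to keep the sketch short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper around the two ends of the wake-up pipe. */
struct waker {
    int rd; /* read end, watched by select() */
    int wr; /* write end, used by the other threads */
};

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Create the pipe and make both ends non-blocking. */
int waker_init(struct waker *w)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    if (set_nonblocking(p[0]) < 0 || set_nonblocking(p[1]) < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    w->rd = p[0];
    w->wr = p[1];
    return 0;
}

/* Called by any thread after it has queued a new connection. A failed
 * write with EAGAIN means the pipe is full, so a wake-up is already
 * pending and nothing is lost. */
void waker_notify(const struct waker *w)
{
    char byte = 1;
    ssize_t n = write(w->wr, &byte, 1);
    (void)n;
}

/* Read everything that is in the pipe, not just one byte (trick 4).
 * Returns the number of bytes drained. */
int waker_drain(const struct waker *w)
{
    char buf[64];
    int total = 0;
    ssize_t n;
    while ((n = read(w->rd, buf, sizeof buf)) > 0)
        total += (int)n;
    return total;
}

/* select() on the caller's read set plus the pipe. On return the pipe
 * fd is removed from rd_set again; *woken says whether another thread
 * asked for a wake-up. Returns the number of ready connection fds. */
int select_with_waker(const struct waker *w, fd_set *rd_set, int max_fd,
                      struct timeval *timeout, int *woken)
{
    assert(w->rd >= 0 && w->rd < FD_SETSIZE);
    FD_SET(w->rd, rd_set);
    if (w->rd > max_fd)
        max_fd = w->rd;

    int n = select(max_fd + 1, rd_set, NULL, NULL, timeout);
    *woken = 0;
    if (n > 0 && FD_ISSET(w->rd, rd_set)) {
        waker_drain(w);
        FD_CLR(w->rd, rd_set);
        *woken = 1;
        n--;
    }
    return n;
}
```

When *woken is set, the select thread would take the lock, pick up the newly queued connections, rebuild its fd_set and call select_with_waker again.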
You are doing *exactly* the right thing. You are designing an
application that gets all of the advantages of a single-threaded
'select' loop application with none of the disadvantages. You won't
have lots of context switches as thread-per-connection or blocking
designs have. You won't have to write every line of server with the "I
must not ever block or I'm dead" fear.
Ignore the nay-sayers. They are just annoyed that you aren't
reinforcing their prejudices.
DS
| |
|
| davids@webmaster.com wrote:
> moi wrote:
>
>
>
>
> I could not disagree more strongly.
>
>
>
>
> How do you know the reason he wants threads? A perfectly reasonable,
> common, and extremely effective use of threads is as a slight variant
> to the typical single-threaded select loop design. This design is
> wonderful, except it has one flaw -- if any tiny piece of code ever
> blocks for any reason, your whole server collapses. Using a select loop
> design with multiple threads to work around this one problem is
> eminently sensible.
>
> Of course he doesn't want his threads to block, that forces a context
> switch. That's almost always less efficient than having the work be
> done by the thread that's already running anyway.
>
> DS
>
I rest my case.
AvK
| |
| davids@webmaster.com 2006-06-08, 7:23 pm |
|
Brian Raiter wrote:
> If the new connections are arriving via a listening socket, then I
> would go further and advise that you NOT handle the listening socket
> in a separate thread in the first place. Add the listening socket to
> the set of read fds in your select call. In other words: Use threads
> wisely.
Just because the thread that calls select discovers that the socket is
ready to be accepted from, that doesn't mean it has to be the one to
call accept on it. For example, a fairly common architecture works like
this:
1) First I/O thread that has no I/O jobs to do calls 'select' so long
as no other thread is in 'select'.
2) All discovered sockets are added to a queue.
3) I/O threads take jobs from the queue and remove them as soon as the
socket is no longer discoverable. (For example, if ready for 'accept',
they remove the job as soon as they call 'accept', but not after they
do other work. If ready for reading, they remove the job after they call
'read' but not after they process the data.)
4) When the queue is empty, the first finished I/O thread calls
'select', other threads then wait for it to put jobs on the queue.
This has several advantages. First, on a single-processor machine, the
thread that calls 'select' will probably just wind up processing all
the jobs anyway with no context switches. However, if a job blocks,
other threads can continue handling other I/O jobs. If a thread blocks
after the job is removed, another thread can even call 'select'.
Why would you call advising against this type of approach using threads
wisely? Forcing specific jobs to be done by specific threads is, in my
opinion, the quintessential example of not using threads wisely,
because it brings back the biggest problem that threads fix. The
whole point of threads is that unexpected blocking isn't fatal -- why
put in artificial restrictions to make it fatal again?!
DS
| |
|
|
| Oliver Sudmann 2006-06-09, 7:24 am |
| Andrei Voropaev wrote:
> I guess, it may come down to the matter of taste. Your only argument for
> having multiple threads do the socket handling is "what if handling has
> to block". Well, so far I din't encounter such situation, so it's hard
> for me to talk about it That is why I prefer the simplicity of
> having single select thread that does all the socket handling. Just
> don't like going into all the mutex, queues and synchronization hell 
>
> Oh well. Probably the "simplicity" of single thread select handling is
> also relative But I've spent so much time working with the library
> similar to libevent that this seems to be very simple At least
> tracing all the race conditions in multi-threaded application is much
> more complex.
First of all, thanks for all your opinions on this topic. That is exactly
what I needed! Because I do not have much experience in network
programming, it is really interesting to hear about your experiences.
The first thing I learned here (correct me if I'm wrong): There is not one
best solution, but there are several solutions that are good depending
on what I want to do with my application.
Because I'm writing a base library for different applications I want to
develop, I now have the problem that I need a solution that works for all
these different applications. So I have to decide on one solution.
- Because networking will only be one component of every application, I
do not want to spend thousands of threads on it (this is why I do not
want a thread for every connection. But the other components can have
threads, and here the problem starts).
- I do not want to think about how my network implementation works in
the other components. So the other components will run in different
threads and instruct the network thread to do something through a clear
interface.
Because polling is EVIL, I think I should use select (this is something I
learned here, too). So I have to solve the problem I mentioned in the
first posting. There are two solutions that seem to be the best from my
point of view at this moment:
- Using signals...
- Using a pipe...
...to inform the select call that something has happened.
I think I will give the pipes a try. If I get a better idea I will
let you know.
Thanks all for your help,
Oliver
P.S.: Does anyone know how the connections in aMule are managed? They
need to manage thousands of connections, so I cannot imagine that they
use thousands of threads. (I could read the source code, but this would
take me too much time.)
| |
| davids@webmaster.com 2006-06-09, 7:23 pm |
|
Andrei Voropaev wrote:
> I guess, it may come down to the matter of taste. Your only argument for
> having multiple threads do the socket handling is "what if handling has
> to block". Well, so far I din't encounter such situation, so it's hard
> for me to talk about it That is why I prefer the simplicity of
> having single select thread that does all the socket handling. Just
> don't like going into all the mutex, queues and synchronization hell 
Oh, you probably have. You just didn't recognize it.
Ever notice that for no apparent reason, your server just freezes for a
fraction of a second or so and then suddenly it continues? That was
probably because one of your threads blocked unexpectedly and whatever
you were waiting for could only be done by that thread.
The beauty of a multi-threaded server is that you don't have to worry
that if some chunk of your code blocks unexpectedly, your whole server
will freeze. This also means that you don't have to code every last bit
of your program and all the libraries it calls and so on to carefully
avoid any blocking under any circumstances.
By limiting yourself to a single I/O discovery thread, you must be
absolutely certain that no function that thread could ever possibly
call, or any function those could call, can ever block under any
circumstances. This includes page faults, file reads, and the like.
That's a lot of extra coding effort and extra risk for *no* benefit.
Worse, it entails a net performance loss on SMP systems unless the
'select' thread farms off the actual read/accept jobs to other threads,
in which case it means extra context switches. Yuck.
Hey, do it that way if it doesn't hurt your particular application. But
don't advocate it in general. It's definitely not the best way to go.
DS
| |
| davids@webmaster.com 2006-06-09, 7:23 pm |
|
davids@webmaster.com wrote:
> By limiting yourself to a single I/O discovery thread, you must be
> absolutely certain that no function that thread could ever possibly
> call, or any function those could call, can ever block under any
> circumstances. This includes page faults, file reads, and the like.
I forgot to add "or any function that might be called by another thread
that holds a lock that the I/O discovery thread might try to grab".
DS
| |
| Jeremy 2006-06-10, 1:25 am |
| Hi David,
Really interested in the 'fairly common architecture' you described,
but I have some troubles understanding your explanation:
davids@webmaster.com wrote:
> Just becase the thread that calls select discovers that the socket is
> ready to be accepted from, that doesn't mean it has to be the one to
> call accept on it. For example, a fairly common architecture works like
> this:
>
> 1) First I/O thread that has no I/O jobs to do calls 'select' so long
> as no other thread is in 'select'.
>
> 2) All discovered sockets are added to a queue.
>
so these are sockets with events?
> 3) I/O threads take jobs from the queue, and removed as soon as the
> socket is no longer discoverable. (For example, if ready for 'accept',
> they remove the job as soon as they call 'accept', but not after they
> do other work. If ready for reading, the remove the job after they call
> 'read' but not after they process the data.)
What jobs? are they just sockets with events? Don't really understand
what you mean by "socket is no longer discoverable" here since the info
is not known until another select is called. Could you also explain
what "remove" means here?
>
> 4) When the queue is empty, the first finished I/O thread calls
> 'select', other threads then wait for it to put jobs on the queue.
>
> This has several advantages. First, on a single-processor machine, the
> thread that calls 'select' will probably just wind up processing all
> the jobs anyway with no context switches. However, if a job blocks,
> other threads can continue handling other I/O jobs. if a thread blocks
> after the job is removed, another thread can even call 'select'.
>
Again, don't get how the single thread can do all the work, guess it's
related to how the job is removed.
Maybe a concrete example helps.
Thanks,
Jeremy
| |
| davids@webmaster.com 2006-06-10, 1:25 am |
|
Jeremy wrote:
> Hi David,
>
> Really interested in the 'fairly common architecture' you described,
> but I have some troubles understanding your explanation:
> davids@webmaster.com wrote:
> so these are sockets with events?
I don't understand what you mean. Generally, you are interested in
knowing if there is data to be read on most of the sockets you are
handling and you are interested if there may be write space on any
socket you need to write to. Determining this is often referred to as
"socket discovery". When you call 'select' or 'poll', you get a list of
discovered sockets and you then need to attempt some form of I/O on
them.
> What jobs? are they just sockets with events? Don't really understand
> what you mean by "socket is no longer discoverable" here since the info
> is not known until another select is called. Could you also explain
> what "remove" means here?
Suppose you learn that a socket may have data to read, so you call 'read'
for 4,096 bytes. You get 2,000, so you presume that the socket is no
longer discoverable. At this point, it is acceptable to let another
thread call 'select' even though you haven't processed the bytes you've
received yet. However, if you let another thread call 'select' before
you call 'read', it will just rediscover the same data, which would
hurt performance.
You generally do not let a thread call 'select'/'poll' unless two
things are true:
1) no thread is already calling 'select' (or 'poll') for these same
sockets.
2) there are no events you've already discovered that you have not made
undiscoverable. (That means, for example, that for a listening socket,
you called 'accept' but may not have finished processing the new
connection. As soon as you call 'accept', the same thing is no longer
discoverable.)
> Again, don't get how the single thread can do all the work, guess it's
> related to how the job is removed.
The thread returns from 'select' or 'poll', goes through the returned
set and queues jobs for every discovered event. It then begins
processing events. If there's only a single CPU, it will simply process
all the events, and then call 'select' or 'poll' again. If it blocks,
another thread will get switched in and it will process events. As soon
as all discovered events are no longer discoverable, the next thread to
complete processing an event will call 'select' or 'poll'.
> Maybe a concrete example helps.
By 'event', I mean that a listening socket has returned a read hit or a
TCP connection socket has returned a read or write hit. By "no longer
discoverable", I mean that the same event will no longer trigger a
'select' or 'poll' return. For a listening socket, that means you
called 'accept' (even if you haven't even looked at the return value).
For a TCP connection that gets a read hit, that means you called 'read'
as many times as you are going to for this discovery.
By 'socket discovery' I mean how you tell that a socket may be ready
for I/O, usually using 'select' or 'poll'.
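A small C sketch of what "no longer discoverable" means for a read event. The helper name is_readable is hypothetical; it simply asks select, with a zero timeout, whether the descriptor would currently be reported.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if select() would currently report fd as readable,
 * 0 otherwise. The zero timeout makes this a non-blocking check. */
int is_readable(int fd)
{
    assert(fd >= 0 && fd < FD_SETSIZE);

    fd_set rd_set;
    FD_ZERO(&rd_set);
    FD_SET(fd, &rd_set);

    struct timeval zero;
    zero.tv_sec = 0;
    zero.tv_usec = 0;

    return select(fd + 1, &rd_set, NULL, NULL, &zero) == 1;
}
```

With a pipe standing in for a socket: after 5 bytes are written to the write end, is_readable on the read end reports 1. Once a single read has consumed those bytes, it reports 0 again, even if the program has not yet processed the data. From that point on it is safe to let another thread call select.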
DS
| |
| Jeremy 2006-06-10, 7:25 am |
|
davids@webmaster.com wrote:
>
> I don't understand what you mean. Generally, you are interested in
> knowing if there is data to be read on most of the sockets you are
> handling and you are interested if there may be write space on any
> socket you need to write to. Determining this is often referred to as
> "socket discovery". When you call 'select' or 'poll', you get a list of
> discovered sockets and you then need to attempt some form of I/O on
> them.
>
I was guessing that you meant "sockets with events" when you said
"discovered sockets", looks like I was right.
>
>
> Suppose you get that a socket may have data to read, so you call 'read'
> for 4,096 bytes. Yet get 2,000, so you presume that the socket is no
> longer discoverable. At this point, it is acceptable to let another
> thread call 'select' even though you haven't processed the bytes you've
> received yet. However, if you let another thread call 'select' before
> you call 'read', it will just rediscover the same data, which would
> hurt performance.
>
Ok, it is much clearer to me now what 'discoverable' means. I can understand
that the I/O thread is doing read-ahead when data arrives, but how about
writes? Are you assuming the other arbitrary thread does a partial write
(likely to get an EWOULDBLOCK), with the rest done by the I/O thread
when it discovers space available, or are you assuming the whole write
operation is dispatched to the I/O thread? I had a stress test that
indicated the second approach would degrade performance by about 10-20%,
but I'm not sure if the first method is best (or elegant).
> You generally do not let a thread call 'select'/'poll' unless two
> things are true:
>
> 1) no thread is already calling 'select' (or 'poll') for these same
> sockets.
>
> 2) there are no events you've already discovered that you have not made
> undiscoverable. (That means, for example, that for a listening socket,
> you called 'accept' but may not have finished processing the new
> connection. As soon as you call 'accept', the same thing is no longer
> discoverable.)
>
>
>
> The thread returns from 'select' or 'poll', goes through the returned
> set and queues jobs for every discovered event. It then begins
> processing events. If there's only a single CPU, it will simply process
> all the events, and then call 'select' or 'poll' again. If it blocks,
> another thread will get switched in and it will process events. As soon
> as all discovered events are no longer discoverable, the next thread to
> complete processing an event will call 'select' or 'poll'.
>
Do you assume locks for the job queue and entering the 'poll' state?
What you described is more like how IOCP behaves, but I don't know how
it would work without a complicated design (or even thread scheduling
support).
| |
| davids@webmaster.com 2006-06-10, 7:21 pm |
|
Jeremy wrote:
> Ok, much clear to me as to waht 'discoverable' is. I can understand the
> io thread is doing read-ahead when data arrives, but how about writes?
> are you assuming the other arbitrary thread is doing half-write (likely
> to get a EWOULDBLOCK), then the other half been done by the io thread
> when it discovers space available, or you assuming the whole write
> operation is dispatched to the io thread? I had a stress test that
> indicated the second approach would degrade perf. by about 10-20%, but
> now sure if the first method is best (or elegant).
You can do it either way. I used to prefer having the other thread do
the write, falling back to the I/O thread only on EWOULDBLOCK. The
problem with that occurs in cases where a non-I/O thread might have to
write a small amount of data to a large number of sockets -- it bogs
down with all the user-space/kernel-space transitions. (If you have no
such cases, then it's fine, I think.)
On the flip side, it's also superfluous to call 'select' or 'poll'
before writing to a socket that one has not written to in a long time.
It is almost certain that a large amount of data will be able to be
written before blocking indications are returned.
I settled on a hybrid approach. When a non-I/O thread wants to send
data to a socket, it does the following: If we are already trying to
select/poll for write on the socket, just add the data to the
user-space send queue. If we are not, queue a 'speculative write' job
to the I/O pool. When an I/O thread dequeues a speculative write job,
it keeps calling 'write' until it either empties the user-space send
queue (in which case, it's done) or gets a blocking indication, in
which case it adds the socket to the write set.
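The 'speculative write' job could look roughly like this in C. It is a sketch, not the poster's implementation: struct send_queue is a hypothetical stand-in for the user-space send queue (a flat buffer with an offset), and locking of the queue is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical user-space send queue: data[off..len) is still unsent. */
struct send_queue {
    const char *data;
    size_t len;
    size_t off;
};

/* The scheme only works on non-blocking descriptors. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Keep calling write() until the queue is empty or the kernel gives a
 * blocking indication.
 * Returns  1 if the queue was emptied (the job is done),
 *          0 if write() would block (caller adds fd to the write set),
 *         -1 on any other error. */
int speculative_write(int fd, struct send_queue *q)
{
    while (q->off < q->len) {
        ssize_t n = write(fd, q->data + q->off, q->len - q->off);
        if (n > 0) {
            q->off += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue; /* interrupted, just retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;
        return -1;
    }
    return 1;
}
```

On a return value of 0 the I/O thread would add the socket to the write set and resume from q->off once select reports write space.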
> Do you assume locks for the job queue and entering the 'poll' state?
> what you described is more like how iocp behaves, but I dont know how
> it would work without a complicated design (or even thread scheduling
> support).
If you held a lock on the job queue while in 'poll', non-I/O threads
couldn't queue I/O jobs while you were selecting or polling, which
would not be good.
DS
| |
| Andrei Voropaev 2006-06-12, 7:26 am |
| On 2006-06-09, davids@webmaster.com <davids@webmaster.com> wrote:
>
> Andrei Voropaev wrote:
>
>
> Oh, you probably have. You just didn't recognize it.
[...]
> Hey, do it that way if it doesn't hurt your particular application. But
> don't advocate it in general. It's definitely not the best way to go.
Ok, I won't. As I said, this approach works well for *my* applications.
And really, I don't have to create super-fast web servers. But how
many people do? So I guess we have to strike a balance
here. It doesn't make sense to advocate either approach in general,
multi- or single-threaded. For each particular application there should
be a specific approach.
I hope the OP didn't get confused by all this, but instead saw
a few ways to implement the project and has chosen a suitable one.
--
Minds, like parachutes, function best when open
| |
| Andrei Voropaev 2006-06-12, 7:26 am |
| On 2006-06-10, Jeremy <fc2004@gmail.com> wrote:
>
> davids@webmaster.com wrote:
[...]
> Do you assume locks for the job queue and entering the 'poll' state?
> What you described is more like how IOCP behaves, but I don't know how
> it would work without a complicated design (or even thread scheduling
> support).
>
No, I guess a simple "semaphore" would be sufficient here. After it
finishes processing all queued jobs, a thread checks whether someone
else is already in the select call and, if not, makes the call itself,
setting the semaphore. If the call is already in progress, the thread
may continue with something else. It should probably also be mentioned
that a job that got blocked (and its thread) may find, after finishing,
that some other thread has already called select. In this case it may
need to use one of the ways of terminating the current select call,
which is what the OP asked about.
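The "semaphore" check could look something like the sketch below. It is only one possible reading of the description: `select_token` and `poll_if_nobody_else` are hypothetical names, and the non-blocking acquire stands in for "check whether someone else is already doing the select call".

```python
import select
import threading

select_token = threading.Semaphore(1)   # held by whichever thread is in select


def poll_if_nobody_else(read_fds, timeout):
    """Return the readable fds, or None if another thread is already selecting."""
    if not select_token.acquire(blocking=False):
        return None                      # someone else is polling; do other work
    try:
        readable, _, _ = select.select(read_fds, [], [], timeout)
        return readable
    finally:
        select_token.release()           # let the next idle thread take over
```

A thread that gets `None` back simply goes on to other work. Interrupting a select call that is already in progress is a separate step that this sketch does not cover.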
This whole thing can be wrapped in an interface similar to libevent,
so the caller just needs to provide "the callbacks" and doesn't need
to worry about threads. The library provides the thread for the
callback automatically. The callback does have to take the needed
synchronisation into account, though. Hm, I may extend my library to
support this approach as well. Just for fun.
--
Minds, like parachutes, function best when open
| |
| Andrei Voropaev 2006-06-13, 7:31 am |
| On 2006-06-10, davids@webmaster.com <davids@webmaster.com> wrote:
>
> Jeremy wrote:
[...]
>
> The thread returns from 'select' or 'poll', goes through the returned
> set and queues jobs for every discovered event. It then begins
> processing events. If there's only a single CPU, it will simply process
> all the events, and then call 'select' or 'poll' again. If it blocks,
> another thread will get switched in and it will process events. As soon
> as all discovered events are no longer discoverable, the next thread to
> complete processing an event will call 'select' or 'poll'.
[...]
After going through this a few more times, it looks like I am still
missing one very important point: how many "other" threads do we have
to pick up the scheduled jobs? Taking your scenario, a thread returns
from select, queues 100 jobs to be processed and starts processing. It
gets blocked on the first job. The next thread picks up the second job
and also gets blocked. Who arranges for more threads to pick up jobs 3
through 100 while the first 2 are blocked? Does this approach imply
that none of the jobs would really block for long, so that 5-6
preallocated threads would be enough?
--
Minds, like parachutes, function best when open
| |
| davids@webmaster.com 2006-06-13, 7:21 pm |
|
Andrei Voropaev wrote:
> After going through this a few more times, it looks like I am still
> missing one very important point: how many "other" threads do we have
> to pick up the scheduled jobs? Taking your scenario, a thread returns
> from select, queues 100 jobs to be processed and starts processing. It
> gets blocked on the first job. The next thread picks up the second job
> and also gets blocked. Who arranges for more threads to pick up jobs 3
> through 100 while the first 2 are blocked? Does this approach imply
> that none of the jobs would really block for long, so that 5-6
> preallocated threads would be enough?
It really depends upon the specifics of your application. If blocking
is going to be extremely infrequent, then you only need a few more
threads than you have CPUs. If blocking is going to be frequent, then
you may need many more.
Note that the idea that having a lot of ready-to-run threads hurts
performance is largely a myth. It's the context switches that hurt
performance or, to put it another way, the problem is when a lot of
threads each do a little work rather than each thread doing a lot of
work.
If you expect frequent blocking, you may need to dynamically manage
the size of your thread pool. Unfortunately, this is difficult, as no
UNIX I know of makes it easy to tell whether you're thread-starved or
CPU-starved. If you have, say, 2 CPUs, 8 threads, and 100 jobs in the
queue, are those 8 threads all blocked (and more threads would help)?
Or are they all burning full CPU working on jobs (and more threads will
not help at all)? You can't tell. (WIN32 has a solution to this
problem; UNIX doesn't.)
The approach I recommend is multi-pronged:
1) Make blocking infrequent.
2) Use a few more threads than CPUs.
3) Be able to dynamically tune the number of threads. If you want to be
really fancy, you can have an I/O thread mark itself as "likely to
block", "CPU limited", or the like, but that's probably more trouble
than it's worth.
It's not perfect. You can wind up creating more threads in a case where
more threads don't help at all.
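Points 2 and 3 of the list above could be sketched as a worker pool whose size can be changed at run time. This is an illustration only: `TunablePool` and its methods are hypothetical, the default of two extra threads is an arbitrary guess at "a few more than CPUs", and the sketch makes no attempt to decide when resizing is actually worthwhile.

```python
import os
import queue
import threading


class TunablePool:
    """Hypothetical worker pool whose thread count can be changed at run time."""

    _STOP = object()                      # sentinel telling one worker to exit

    def __init__(self, extra=2):
        self.jobs = queue.Queue()
        self.lock = threading.Lock()
        self.size = 0
        # "A few more threads than CPUs"; the value 2 is an arbitrary guess.
        self.resize((os.cpu_count() or 1) + extra)

    def submit(self, job):
        self.jobs.put(job)

    def resize(self, target):
        """Grow by starting threads, shrink by queueing stop sentinels."""
        with self.lock:
            while self.size < target:
                threading.Thread(target=self._worker, daemon=True).start()
                self.size += 1
            while self.size > target:
                self.jobs.put(self._STOP)
                self.size -= 1

    def _worker(self):
        while True:
            job = self.jobs.get()
            try:
                if job is self._STOP:
                    return                # this worker leaves the pool
                job()
            finally:
                self.jobs.task_done()
```

Shrinking takes effect only after the jobs already queued ahead of the sentinels have run, and a job that raises an exception ends its worker thread in this simplified version.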
DS