|
Home > Archive > Unix Programming > December 2007 > Restricting children processes
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Restricting children processes
|
|
| Kelvin Moss 2007-12-17, 1:29 pm |
| Hi all,
I want to fork a number of children processes but I also want to
restrict the number of forked children processes at any time to not
cross some value (say N). Parent should wait if the threshold is met.
I have written a sample code for FreeBSD 4.11 that seems to work fine.
Could you please point out any mistakes, or offer any suggestions
Thanks ..
P.S. - Ignore the error checking for now.
#include <stdio.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
int max_children = 50;
int num_children = 0;
void sig_handler(int sig)
{
num_children--;
}
int main()
{
int pid;
int status;
int i;
signal(SIGCHLD, sig_handler);
for (i = 0; i < 10; i++) {
if (num_children == max_children) {
wait(&status);
}
pid = fork();
if (pid < 0) {
exit(2);
} else if (pid == 0) {
sleep(10*i); /* Do any random work */
exit(0);
} else {
num_children++;
}
}
}
| |
| Rainer Weikusat 2007-12-17, 1:29 pm |
| Kelvin Moss <km_jr_usenet@yahoo.com> writes:
> I want to fork a number of children processes but I also want to
> restrict the number of forked children processes at any time to not
> cross some value (say N). Parent should wait if the threshold is met.
> I have written a sample code for FreeBSD 4.11 that seems to work fine.
> Could you please point out any mistakes, or offer any suggestions
>
> Thanks ..
> P.S. - Ignore the error checking for now.
>
> #include <stdio.h>
> #include <signal.h>
> #include <sys/types.h>
> #include <sys/wait.h>
>
> int max_children = 50;
> int num_children = 0;
>
> void sig_handler(int sig)
> {
> num_children--;
> }
>
> int main()
> {
> int pid;
> int status;
> int i;
> signal(SIGCHLD, sig_handler);
> for (i = 0; i < 10; i++) {
> if (num_children == max_children) {
> wait(&status);
> }
> pid = fork();
> if (pid < 0) {
> exit(2);
> } else if (pid == 0) {
> sleep(10*i); /* Do any random work */
> exit(0);
> } else {
> num_children++;
> }
> }
> }
Since the loop is only executed ten times, num_children can never
equal max_children. Theoretically, num_children should be declared
as
volatile sig_atomic_t num_children
In practice, this doesn't matter (now), because int accesses are
atomic and gcc (which I assume you are using) always reloads the
values of non-automatic variables. Nevertheless, according to the
semantics known to a C-compiler, it would (AFAIK) be ok to just always
use the same register for num_children and never actually load a value
from the corresponding memory location.
| |
| Eric Sosman 2007-12-17, 7:22 pm |
| Kelvin Moss wrote:
> Hi all,
>
> I want to fork a number of children processes but I also want to
> restrict the number of forked children processes at any time to not
> cross some value (say N). Parent should wait if the threshold is met.
> I have written a sample code for FreeBSD 4.11 that seems to work fine.
> Could you please point out any mistakes, or offer any suggestions
>
> Thanks ..
> P.S. - Ignore the error checking for now.
>
> #include <stdio.h>
> #include <signal.h>
> #include <sys/types.h>
> #include <sys/wait.h>
>
> int max_children = 50;
> int num_children = 0;
>
> void sig_handler(int sig)
> {
> num_children--;
> }
>
> int main()
> {
> int pid;
> int status;
> int i;
> signal(SIGCHLD, sig_handler);
> for (i = 0; i < 10; i++) {
> if (num_children == max_children) {
> wait(&status);
> }
> pid = fork();
> if (pid < 0) {
> exit(2);
> } else if (pid == 0) {
> sleep(10*i); /* Do any random work */
> exit(0);
> } else {
> num_children++;
> }
> }
> }
Observation #1: There's a race condition in the handling
of the num_children variable.
main(): sig_handler():
* Fetch num_children
* Add one
--- SIGCHLD --->
* Fetch num_children
* Subtract one
* Store num_children
<--- return ---
* Store num_children
This race is not curable by use of volatile and/or sig_atomic_t.
Observation #2: Nothing prevents a child from forking a
million grandchildren. Is that a problem?
Observation #3: If you'd explain why you want to enforce a
limit, someone might have further ideas.
--
Eric Sosman
esosman@ieee-dot-org.invalid
| |
| Martin Vuille 2007-12-17, 7:22 pm |
| Kelvin Moss <km_jr_usenet@yahoo.com> wrote in
news:4ca5d71d-0662-45ab-ab2c-59b2b1a0a229@d27g2000prf.googlegrou
ps.com:
> Hi all,
>
> I want to fork a number of children processes but I also want
> to restrict the number of forked children processes at any
> time to not cross some value (say N). Parent should wait if
> the threshold is met. I have written a sample code for
> FreeBSD 4.11 that seems to work fine. Could you please point
> out any mistakes, or offer any suggestions
>
> Thanks ..
> P.S. - Ignore the error checking for now.
>
> #include <stdio.h>
> #include <signal.h>
> #include <sys/types.h>
> #include <sys/wait.h>
>
> int max_children = 50;
> int num_children = 0;
>
> void sig_handler(int sig)
> {
> num_children--;
> }
A couple of things. Don't you actually need
to reap the children so that they don't become
zombies? That would require calling waitpid()
or similar.
Also, I believe not all implementations guarantee
that the signal handler is called for every child
that exits. So you might need a loop to ensure
all dead children are counted in the event that
several exit "simultaneously".
MV
| |
| Kelvin Moss 2007-12-17, 7:22 pm |
| On Dec 17, 12:24 pm, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> Observation #1: There's a race condition in the handling
> of the num_children variable.
>
> main(): sig_handler():
> * Fetch num_children
> * Add one
> --- SIGCHLD --->
> * Fetch num_children
> * Subtract one
> * Store num_children
> <--- return ---
> * Store num_children
>
> This race is not curable by use of volatile and/or sig_atomic_t.
That's true. There is no fool proof method to avoid this race
condition.
> Observation #2: Nothing prevents a child from forking a
> million grandchildren. Is that a problem?
> Observation #3: If you'd explain why you want to enforce a
> limit, someone might have further ideas.
We want to run some part of our processing logic in a child process
(basically we are analyzing if it makes it any scalable on a multicore
box). The child process might take a while, but we don't want to swamp
the system with lots of such children processes. Hence, the need to
restrict the number of children processes.
Thanks ..
| |
| Eric Sosman 2007-12-17, 7:22 pm |
| Kelvin Moss wrote:
> [...]
> We want to run some part of our processing logic in a child process
> (basically we are analyzing if it makes it any scalable on a multicore
> box). The child process might take a while, but we don't want to swamp
> the system with lots of such children processes. Hence, the need to
> restrict the number of children processes.
Have you considered a multithreaded solution?
Another possibility might be to run a fixed number of
child processes, each capable of being told to perform
several tasks, one after another. You could use a shared
memory segment (for example) to hold a queue of "job tickets"
for the child processes to work on; whenever a child process
has nothing else to do it peels the next ticket off the queue
and goes off to do the work the ticket describes.
Much depends on how the existing code is structured ...
--
Eric Sosman
esosman@ieee-dot-org.invalid
| |
| Chris Friesen 2007-12-17, 7:22 pm |
| Kelvin Moss wrote:
> On Dec 17, 12:24 pm, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
>
>
> That's true. There is no fool proof method to avoid this race
> condition.
Here are a couple ways to avoid it:
1) In your main loop you could mask SIGCHLD before doing the
increment-and-store, then unblock it after. This would prevent the
signal handler from running at the sensitive time.
2) You could ignore SIGCHLD and decrement num_children after the wait()
call in main().
Chris
| |
| Kelvin Moss 2007-12-17, 7:22 pm |
| On Dec 17, 3:18 pm, Chris Friesen <cbf...@mail.usask.ca> wrote:
> Here are a couple ways to avoid it:
>
> 1) In your main loop you could mask SIGCHLD before doing the
> increment-and-store, then unblock it after. This would prevent the
> signal handler from running at the sensitive time.
But am I guaranteed that the pending signal would be delivered after
unblocking it?
> 2) You could ignore SIGCHLD and decrement num_children after the wait()
> call in main().
Hmm..I think there might be scenarios where you under-fork with this
approach. You would never cross the maximum limit though.
Thanks!
| |
| fjblurt@yahoo.com 2007-12-18, 1:42 am |
| On Dec 17, 4:00 pm, Kelvin Moss <km_jr_use...@yahoo.com> wrote:
> On Dec 17, 3:18 pm, Chris Friesen <cbf...@mail.usask.ca> wrote:
>
> Hmm..I think there might be scenarios where you under-fork with this
> approach. You would never cross the maximum limit though.
I think what he means is to leave SIGCHLD set to SIG_DFL. wait() will
return exactly once for each child that exits, so that means you can
fork one more. If two children exited at almost the same time, wait()
will return immediately on the next iteration. I don't see how this
could result in having too few processes.
This would, however, get tricky if the main process has other work to
do. You can use waitpid() with WNOHANG, but depending how often you
call it there might be a delay before your new processes get started.
| |
| Chris Friesen 2007-12-18, 1:42 am |
| Kelvin Moss wrote:
> But am I guaranteed that the pending signal would be delivered after
> unblocking it?
Yes, you are. It wouldn't make sense any other way.
From the SUSv3 specification:
"If there are any pending unblocked signals after the call to
sigprocmask(), at least one of those signals shall be delivered before
the call to sigprocmask() returns."
"During the time between the generation of a signal and its delivery or
acceptance, the signal is said to be ``pending''....the signal shall
remain pending until it is unblocked..."
Chris
| |
| Rainer Weikusat 2007-12-18, 7:33 am |
| Eric Sosman <esosman@ieee-dot-org.invalid> writes:
> Kelvin Moss wrote:
[...]
[vbcol=seagreen]
>
> Observation #1: There's a race condition in the handling
> of the num_children variable.
>
> main(): sig_handler():
> * Fetch num_children
> * Add one
> --- SIGCHLD --->
> * Fetch num_children
> * Subtract one
> * Store num_children
> <--- return ---
> * Store num_children
This assumes either a load/store architecture or particular properties
of the generated machine code which would need to behave as if running
on a load/store architecture.
> This race is not curable by use of volatile and/or sig_atomic_t.
The volatile is required by C in any case, because reading an object
is not defined to be a side effect without it and because of this, the
compiler is allowed to eliminate read accesses it considers to be
redundant.
> Observation #2: Nothing prevents a child from forking a
> million grandchildren. Is that a problem?
'Forking a million grandchildren' is certainly 'some random work'.
| |
| Rainer Weikusat 2007-12-18, 7:33 am |
| Kelvin Moss <km_jr_usenet@yahoo.com> writes:
> On Dec 17, 12:24 pm, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
>
> That's true. There is no fool proof method to avoid this race
> condition.
There is: Use two counters, one for created and one for deceased
children, and the difference of these two counters to determine if
more children should be forked.
NB: This helps nothing agains all the other nascent problems of the
code, like the assumption that the signal-handler will actually
execute once for every child which is deceased (as somebody else
already wrote).
| |
| Eric Sosman 2007-12-18, 1:33 pm |
| Rainer Weikusat wrote:
> Eric Sosman <esosman@ieee-dot-org.invalid> writes:
>
> [...]
>
>
>
>
> This assumes either a load/store architecture or particular properties
> of the generated machine code which would need to behave as if running
> on a load/store architecture.
True. The race condition I illustrate simply does not
exist for machines that have (and use) atomic in-memory increment
and decrement operations. I can't recall ever meeting such a
machine, but there are a lot of machines I've never met.
>
> The volatile is required by C in any case, because reading an object
> is not defined to be a side effect without it and because of this, the
> compiler is allowed to eliminate read accesses it considers to be
> redundant.
Since volatile doesn't help, it makes little difference
whether it's required or not. My state of residence requires
that my car be registered and inspected, but the requirement
loses its teeth if my car is inoperable and sitting on cement
blocks in the back yard.
--
Eric Sosman
esosman@ieee-dot-org.invalid
| |
| fjblurt@yahoo.com 2007-12-18, 1:33 pm |
| On Dec 18, 6:16 am, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> Rainer Weikusat wrote:
>
> True. The race condition I illustrate simply does not
> exist for machines that have (and use) atomic in-memory increment
> and decrement operations. I can't recall ever meeting such a
> machine, but there are a lot of machines I've never met.
x86/amd64 has the INC and DEC instructions which do just that, though
of course you can't guarantee that the compiler will use them.
| |
| Kelvin Moss 2007-12-18, 1:33 pm |
| On Dec 17, 6:33 pm, fjbl...@yahoo.com wrote:
> On Dec 17, 4:00 pm, Kelvin Moss <km_jr_use...@yahoo.com> wrote:
>
>
>
> I think what he means is to leave SIGCHLD set to SIG_DFL. wait() will
> return exactly once for each child that exits, so that means you can
> fork one more. If two children exited at almost the same time, wait()
> will return immediately on the next iteration. I don't see how this
> could result in having too few processes.
Yes, I think the problem of underforking would surface with main doing
some processing.
Example - Let limit of forking be 5 processes and assume the case that
parent has forked 5 processes. Parent would then land up in the wait
state. Whenever a child terminates it would reduce the forked_counter
to 4, and go ahead and fork again making the forked_counter 5 again.
Here, if the parent does some processing, AND say if 3 child processes
terminate before it enters the loop again, then the forked_counter
would be reduced to 4, while it should actually have been 2. Am I
correct?
| |
| Kelvin Moss 2007-12-18, 1:33 pm |
| On Dec 17, 9:21 pm, Chris Friesen <cbf...@mail.usask.ca> wrote:
> Kelvin Moss wrote:
>
> Yes, you are. It wouldn't make sense any other way.
>
> From the SUSv3 specification:
>
> "If there are any pending unblocked signals after the call to
> sigprocmask(), at least one of those signals shall be delivered before
> the call to sigprocmask() returns."
At least one signal getting delivered should not be good enough,
right? Because you could then maintain erroneous counters for the
forked_count (underfork problem).
Thanks!
| |
| Kelvin Moss 2007-12-18, 1:33 pm |
| On Dec 18, 2:35 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> There is: Use two counters, one for created and one for deceased
> children, and the difference of these two counters to determine if
> more children should be forked.
>
> NB: This helps nothing agains all the other nascent problems of the
> code, like the assumption that the signal-handler will actually
> execute once for every child which is deceased (as somebody else
> already wrote
I guess this seems like a good enough solution, but with some
assumptions as you say.
Thanks!
| |
| Rainer Weikusat 2007-12-18, 1:33 pm |
| Kelvin Moss <km_jr_usenet@yahoo.com> writes:
> On Dec 17, 9:21 pm, Chris Friesen <cbf...@mail.usask.ca> wrote:
>
> At least one signal getting delivered should not be good enough,
> right?
For ordinary (not queued realtime) signals, only one signal will be
delivered per type of event to be signalled, no matter how many times
the event occured.
| |
| Gordon Burditt 2007-12-19, 1:39 am |
| >> True. The race condition I illustrate simply does not
>
>x86/amd64 has the INC and DEC instructions which do just that, though
>of course you can't guarantee that the compiler will use them.
Does that guarantee hold on a 64k-core amd64 machine? For that matter,
just dual-core might screw things up.
| |
| Logan Shaw 2007-12-19, 1:39 am |
| Kelvin Moss wrote:
> Example - Let limit of forking be 5 processes and assume the case that
> parent has forked 5 processes. Parent would then land up in the wait
> state. Whenever a child terminates it would reduce the forked_counter
> to 4, and go ahead and fork again making the forked_counter 5 again.
> Here, if the parent does some processing, AND say if 3 child processes
> terminate before it enters the loop again, then the forked_counter
> would be reduced to 4, while it should actually have been 2. Am I
> correct?
No, the kernel will take care of this for you. wait() will return
without blocking three times. Your value of 5 will get decremented
to 4 the first time, then wait() will return again, it will go to 3,
and then wait() will return again and it will go to 2.
In fact, this kernel guarantee is the same reason why you see "zombie"
processes: these are process table entries which the kernel preserves
(even though the child process has finished) until the parent does
wait() (or until the parent exits).
- Logan
| |
| fjblurt@yahoo.com 2007-12-19, 7:32 am |
| On Dec 18, 5:41 pm, gordonb.o5...@burditt.org (Gordon Burditt) wrote:
>
>
> Does that guarantee hold on a 64k-core amd64 machine? For that matter,
> just dual-core might screw things up.
In a sense. The instructions are atomic with respect to the core
which executes them, but AFAIK they don't lock the bus, so another
core could in principle see something different. However, we are
talking about a single process interrupting itself. The signal
handler isn't going to be run concurrently with the main program
(barring threads etc) so these instructions would be atomic as far as
the process is concerned.
| |
| Rainer Weikusat 2007-12-19, 7:32 am |
| gordonb.o58su@burditt.org (Gordon Burditt) writes:
>
> Does that guarantee hold on a 64k-core amd64 machine? For that matter,
> just dual-core might screw things up.
To phrase this in less bombastic terms: Do x86-read/modify/write
operations execute atomically with respect to other processors in a
multiprocessor?
And the answer is 'yes if bus-locking was requested' (LOCK-prefix).
BTW, and equally valid answer would have been 'RTFM'.
| |
| Rainer Weikusat 2007-12-19, 7:32 am |
| Eric Sosman <esosman@ieee-dot-org.invalid> writes:
> Rainer Weikusat wrote:
[...]
[vbcol=seagreen]
[...]
[vbcol=seagreen]
>
> Since volatile doesn't help, it makes little difference
> whether it's required or not.
Reading the value of an object is not considered to be a side effect
except if the object is declared as being volatile. Consequently, a
compiler may generate code which does not read the value of an object,
but uses a previously-read copy still located in some register
instead. For the code which was posted, this would be the incremented
value calculated in main, while the decremented values created by the
signal handler may not necessarily be ever visible to the code in
main. So, this makes the fairly important difference that the code is
incorrect except if it the object shared by main and the signal
handler is volatile-qualified.
There is no reason to believe that volatile-qualifications, use or
non-use of specific types or, fwiw, spitting over your should before
writing the code, could anyhow affect race conditions like the one you
described above.
Because this is a possible optimization gcc (all versions I know of)
is incapable of doing (if this is actually a technical deficiency or
something intentionally avoided because it would break code written by
the people who like to break existing code written by others would be
an interesting question), use or non-use of volatile for objects with
static storage duration has no effect when the code is compiled with
gcc.
| |
| Eric Sosman 2007-12-19, 1:23 pm |
| Rainer Weikusat wrote:
> Eric Sosman <esosman@ieee-dot-org.invalid> writes:
>
> [...]
>
>
> [...]
>
>
> Reading the value of an object is not considered to be a side effect
> except if the object is declared as being volatile. Consequently, a
> compiler may generate code which does not read the value of an object,
> but uses a previously-read copy still located in some register
> instead. For the code which was posted, this would be the incremented
> value calculated in main, while the decremented values created by the
> signal handler may not necessarily be ever visible to the code in
> main. So, this makes the fairly important difference that the code is
> incorrect except if it the object shared by main and the signal
> handler is volatile-qualified.
My point is that the code is incorrect in any case.
Adding volatile does not eliminate the race, and so does
not correct the incorrect code.
> There is no reason to believe that volatile-qualifications, use or
> non-use of specific types or, fwiw, spitting over your should before
> writing the code, could anyhow affect race conditions like the one you
> described above.
Exactly. volatile or non-volatile, static or automatic,
externally linked or internally linked, descriptive name or
foo -- none of these do anything at all about the problem.
That's why I think it's pointless to recommend volatile
here: it doesn't help. It's a rearrangement of the deck
chairs on the Titanic.
--
Eric.Sosman@sun.com
| |
| Rainer Weikusat 2007-12-19, 1:23 pm |
| Eric Sosman <Eric.Sosman@sun.com> writes:
> Rainer Weikusat wrote:
>
> My point is that the code is incorrect in any case.
> Adding volatile does not eliminate the race,
Since the code is incorrect in any case, fixing the race, which does
not fix the issue with the missing volatile is
"a rearrangement of the deck chairs on the Titanic."
| |
| Rainer Weikusat 2007-12-19, 1:23 pm |
| Rainer Weikusat <rweikusat@mssgmbh.com> writes:
[...]
> Since the code is incorrect in any case, fixing the race, which does
> not fix the issue with the missing volatile is
>
> "a rearrangement of the deck chairs on the Titanic."
To not forget a similar piece of reasoning:
1) Every program can be shortendend by at least one instruction.
2) Every program has at least one bug
=> Every program can be shortended to one instruction which
doesn't work.
(SCNR)
| |
| Kelvin Moss 2007-12-19, 7:27 pm |
| On Dec 18, 8:11 pm, Logan Shaw <lshaw-use...@austin.rr.com> wrote:
> Kelvin Moss wrote:
>
> No, the kernel will take care of this for you. wait() will return
> without blocking three times. Your value of 5 will get decremented
> to 4 the first time, then wait() will return again, it will go to 3,
> and then wait() will return again and it will go to 2.
>
> In fact, this kernel guarantee is the same reason why you see "zombie"
> processes: these are process table entries which the kernel preserves
> (even though the child process has finished) until the parent does
> wait() (or until the parent exits).
>
> - Logan
But the code to wait and decrement the counter gets invoked when I hit
the maximum count. Do you still see the code working properly? Or,
probably I am getting confused with this approach.
Thanks!
| |
| Eric Sosman 2007-12-19, 7:27 pm |
| Kelvin Moss wrote:
> On Dec 18, 8:11 pm, Logan Shaw <lshaw-use...@austin.rr.com> wrote:
>
> But the code to wait and decrement the counter gets invoked when I hit
> the maximum count. Do you still see the code working properly? Or,
> probably I am getting confused with this approach.
Let's drop back ten and punt, as they say in American
football. The simplest way I can think of to get your
original design to work is
1) Realize that the number of SIGCHLD signals received
may be less than the number of children that terminate. To
deal with this, the signal handler should loop calling
waitpid() with WNOHANG to reap all the terminated children,
one by one. When waitpid() says "no more terminations yet,"
the signal handler can return.
2) Use a race-immune method to maintain the number of
running children and to alert the forker when it's all
right to launch another. I suggest a semaphore initialized
to the desired limit on children. The main thread calls
sem_wait() after each successful fork(), and the signal
handler calls sem_post() after each termination.
This, I think, does the least violence to your original
design. It might not be the best solution for your overall
problem, but it should be simple to get working within your
existing framework.
--
Eric.Sosman@sun.com
| |
| Eric Sosman 2007-12-19, 7:27 pm |
| Eric Sosman wrote:
> [...] I suggest a semaphore initialized
> to the desired limit on children. The main thread calls
> sem_wait() after each successful fork(), [...]
Sorry; the initial value should be one less than the
limit. (I originally had sem_wait() before the fork(),
then decided to interchange them and forgot to adjust the
counter. Don't you just *hate* incomplete edits?)
--
Eric.Sosman@sun.com
| |
| Logan Shaw 2007-12-20, 7:35 am |
| Kelvin Moss wrote:
> On Dec 18, 8:11 pm, Logan Shaw <lshaw-use...@austin.rr.com> wrote:
[vbcol=seagreen]
> But the code to wait and decrement the counter gets invoked when I hit
> the maximum count. Do you still see the code working properly? Or,
> probably I am getting confused with this approach.
Here are the guarantees that Unix provides which make it easy to keep
an accurate count of child processes:
(1) When fork() returns and succeeds, the number of child processes
in existence will have increased by 1. Always by 1, exactly.
(2) When wait() returns and succeeds, the number of child processes
in existence will have decreased by 1. Always by 1, exactly.
The only catch to this explanation is that it requires that you define
zombie processes as "existing". But that's a fair definition, since
they show up in the process table, the output of "ps", and so on.
- Logan
|
|
|
|
|