Unix Programming - killing child processes


Irek

2005-06-18, 2:48 am

I am writing about killing child processes.

I am developing a program which spawns children. I will refer to it
as the main process. The main process creates a process group. The
children are regular programs like "tail," "python" or Xvfb. I want
to make sure that these children exit when the main process exits.

What I currently do in the main process is this:
- create my own process group,
- install my handler for SIGTERM, this handler just exits with code 0,
- and at the end of the main process I send SIGTERM to my process
group.

Every child process spawned by the main process will receive SIGTERM,
unless the child process leaves the process group of the main process.
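The three steps above might be sketched in C roughly as follows. This is only a minimal illustration assuming POSIX `setpgid()`, `sigaction()` and `kill()`; the helper names (`become_group_leader`, `install_term_handler`, `spawn_in_group`, `terminate_group`) are hypothetical and not taken from the original program.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* SIGTERM handler: just exit with status 0.
 * _exit() is used because exit() is not async-signal-safe. */
static void on_term(int sig)
{
    (void)sig;
    _exit(0);
}

/* Step 1: make the caller the leader of a new process group.
 * Returns -1 (EPERM) if the caller is already a session leader. */
static int become_group_leader(void)
{
    return setpgid(0, 0);
}

/* Step 2: install the SIGTERM handler. */
static int install_term_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_term;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Start a child program. It stays in our process group as long as it
 * does not call setpgid()/setsid() itself; execvp() resets the caught
 * SIGTERM to its default action in the new program. */
static pid_t spawn_in_group(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);               /* exec failed */
    }
    return pid;                   /* -1 if fork() failed */
}

/* Step 3: signal every process in our group, ourselves included.
 * kill(0, sig) is equivalent to kill(-getpgrp(), sig); our own
 * handler then exits with status 0. */
static void terminate_group(void)
{
    kill(0, SIGTERM);
}
```

A main process would call `become_group_leader()` and `install_term_handler()` at startup, start its helpers with `spawn_in_group()`, and call `terminate_group()` on its way out. None of this runs if the main process itself crashes or is killed with SIGKILL.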

However, I have two concerns about this solution.

1. When the main process crashes, the children keep running.

2. The system is free to kill any process. One reason might be that
the system memory is running low. If the system kills my main
process, the children keep running.

The result in both cases is child processes left running, while I want
to make sure no child process ever runs without the main process.

One solution that comes to my mind is that I fork a process which I
will stop right away with SIGSTOP. When the main process is gone (for
whatever reason) and the group becomes orphaned, the system sends
SIGHUP and SIGCONT to all processes in the orphaned group, because
there is this stopped process that I created. The default action for
SIGHUP is to terminate, which is what I want. However, some
programs do not exit on SIGHUP, like Xvfb.

Therefore, is there a reliable solution? I would appreciate your
advice.

Thanks for reading.


Best,
Irek

Michael Kerrisk

2005-06-20, 2:48 am

Irek,

I don't really have a solution to the problem you pose, in part
because it's not quite clear why you want to do what you propose. But
some comments and questions below may be useful.

>I am writing about killing child processes.
>
>I am developing a program which spawns children. I will refer to it
>as the main process. The main process creates a process group. The
>children are regular programs like "tail," "python" or Xvfb. I want
>to make sure that these children exit when the main process exits.


So, is your main program running arbitrary programs in its children?
If it is, then you probably have no definitive guarantees about being
able to kill them. You already identify one of them (leaving the
process group) that affects your solution. See some others below.

>What I currently do in the main process is this:
>- create my own process group,
>- install my handler for SIGTERM, this handler just exits with code 0,
>- and at the end of the main process I send SIGTERM to my process
> group.


I assume here that you mean that the signal handler does the
kill(-pgrp, SIGTERM)?

>Every child process spawned by the main process will receive SIGTERM,
>unless the child process leaves the process group of the main process.


And what do you plan to do in that case? What I really mean is:
you seem not to care too much about this case -- or do you? If you
don't really care about this case, why not?

>However, I have two concerns about this solution.
>
>1. When the main process crashes, the children keep running.
>
>2. The system is free to kill any process. One reason might be that
> the system memory is running low. If the system kills my main
> process, the children keep running.
>
>The result in both cases is child processes left running, while I want
>to make sure no child process ever runs without the main process.


And what if one of the programs you run is set-user-ID-root, and
changes its credentials in such a way that you can no longer kill it
(i.e., changes both its real and saved set-user-ID)?

And what if the child programs make themselves immune to (i.e., ignore
or catch) SIGTERM?
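One common way to handle children that catch or ignore SIGTERM (a technique not proposed in this thread) is to escalate: signal the group with SIGTERM, wait a grace period, then send SIGKILL, which cannot be caught or ignored. The following is a minimal sketch assuming POSIX `kill()` and `nanosleep()`; the function name `terminate_group_escalating` and its grace-period parameter are hypothetical. It does nothing for the set-user-ID case, because SIGKILL is subject to the same permission check as any other signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>

/* Ask every process in group pgid to terminate, give it grace_ms
 * milliseconds, then send SIGKILL. kill() on a group succeeds if at
 * least one member could be signalled, so members we lack permission
 * for may be skipped silently. The caller should not be in pgid.
 * Returns 0 on success, -1 on error (errno set). */
static int terminate_group_escalating(pid_t pgid, long grace_ms)
{
    struct timespec left = { grace_ms / 1000, (grace_ms % 1000) * 1000000L };

    if (kill(-pgid, SIGTERM) != 0)
        return errno == ESRCH ? 0 : -1;   /* ESRCH: nobody left to signal */

    /* Sleep for the grace period, resuming if a signal interrupts us. */
    while (nanosleep(&left, &left) != 0 && errno == EINTR)
        ;

    if (kill(-pgid, SIGKILL) != 0 && errno != ESRCH)
        return -1;
    return 0;
}
```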

>One solution that comes to my mind is that I fork a process which I
>will stop right away with SIGSTOP. When the main process is gone (for
>whatever reason) and the group becomes orphaned, the system sends
>SIGHUP and SIGCONT to all processes in the orphaned group, because
>there is this stopped process that I created. The default action for
>SIGHUP is to terminate, which is what I want. However, some
>programs do not exit on SIGHUP, like Xvfb.


I don't suppose that this solution is really any more reliable. What
if your STOPPED child is killed for some reason? And, as you note,
various programs do not react to SIGHUP by terminating. Also,
you'll have to do some manipulations of sessions which could confuse
(from the user's point of view) the operation of job control in the
shell.

>Therefore, is there a reliable solution? I would appreciate your
>advice.


Perhaps it would help if you paint a slightly bigger picture of what
you want to do, and why.

Cheers,

Michael

Irek

2005-06-20, 6:05 pm

Thanks for your reply, Michael.

> Perhaps it would help if you paint a slightly bigger picture of what
> you want to do, and why.


I am developing a client-server application. The client is written in
Java and connects to a remote computer with ssh. At the remote
computer the connection is accepted by the ssh daemon, which starts for
us an application that is the server.

The client issues some commands to the server, so it's a kind of a
login session. However, other clients can connect later to the same
server. The server is supposed to die when the last client
disconnects. The last client to disconnect doesn't have to be the
client which started the server.

The server creates some children to help perform the offered services.
When the server dies I want to make sure that the children die too. I
do not want the users to log in to the remote computer just to kill the
running processes.

> So, is your main program running arbitrary programs in its children?


They are not arbitrary. These programs are some commands, but I know
in advance what they are.

> If it is, then you probably have no definitive guarantees about being
> able to kill them. You already identify one of them (leaving the
> process group) that affects your solution.


I understand that programs are able to run away from my control. They
can fork so that PID changes: the parent exits, the child keeps running
under a different PID. They can change the process group, they can
change their session. I understand these actions, and so I make sure
that the children which the server spawns don't change their process
group.

> I assume here that you mean that the signal handler does the
> kill(-pgrp, SIGTERM)?


Yes, I do kill(0, SIGTERM).

>
> And what do you plan to do in that case? What I really mean is:
> you seem not to care too much about this case -- or do you? If you
> don't really care about this case, why not?


I know what the children do. Before I decide to incorporate some
command into the server, I make sure that this command will not change
its process group. Therefore I do not have to worry how to kill the
processes which changed their process group.

> And what if one of the programs you run is set-user-ID-root, and
> changes its credentials in such a way that you can no longer kill it
> (i.e., changes both its real and saved set-user-ID)?
>
> And what if the child programs make themselves immune to (i.e., ignore
> or catch) SIGTERM?


Yes, these are problems which I should consider if I ran arbitrary
commands. However, I know that the commands which I use will not pose
such problems.

> I don't suppose that this solution is really any more reliable. What
> if your STOPPED child is killed for some reason?


I agree that the STOPPED child may be killed and then my whole scheme
is for nothing. But this is a "safety system." If the main process
dies before it gets to killing the child processes, there would
still be some chance of killing the child processes. Indeed there is
this chance because this STOPPED process might not have been killed,
and it will take over the task of killing the children.

Thanks again, Michael, for your detailed e-mail. I truly appreciate
it.


Best,
Irek

David Schwartz

2005-06-20, 6:05 pm


"Irek" <ireneusz.szczesniak@wp.pl> wrote in message
news:1119288181.165293.327000@g14g2000cwa.googlegroups.com...

> The client issues some commands to the server, so it's a kind of a
> login session. However, other clients can connect later to the same
> server. The server is supposed to die when the last client
> disconnects. The last client to disconnect doesn't have to be the
> client which started the server.


The only way I can think of that's guaranteed to work is to have
a user whose sole purpose is to run the server and a way to kill all
processes owned by that user.

DS


Chuck Dillon

2005-06-20, 6:05 pm

Irek wrote:

> I am writing about killing child processes.
>
> I am developing a program which spawns children. I will refer to it
> as the main process. The main process creates a process group. The
> children are regular programs like "tail," "python" or Xvfb. I want
> to make sure that these children exit when the main process exits.
>
> What I currently do in the main process is this:
> - create my own process group,
> - install my handler for SIGTERM, this handler just exits with code 0,
> - and at the end of the main process I send SIGTERM to my process
> group.
>
> Every child process spawned by the main process will receive SIGTERM,
> unless the child process leaves the process group of the main process.
>
> However, I have two concerns about this solution.
>
> 1. When the main process crashes, the children keep running.
>
> 2. The system is free to kill any process. One reason might be that
> the system memory is running low. If the system kills my main
> process, the children keep running.


Neither of the above can be eliminated unless you don't rely on a
running process. By that I mean using a mechanism that runs
periodically to look for the unwanted conditions and when it sees them
it cleans things up. A cron job or something in the inittab, for example.

The way I dealt with a similar situation in the past was to create a
watchdog process for every spawned process. IOW, the "main" process
forks a watchdog that then forks the worker process. The watchdog
should be as simple and lean as possible so it's not likely to be buggy
or perceived to be a resource hog by the system. The watchdog monitors
the necessary conditions with minimal use of resources.

Possible approach: the "main" process creates a file and grabs a lock
on it. Each watchdog establishes a handler for SIGCHLD and blocks
trying to grab the lock on the file. If it gets SIGCHLD it reaps the
child and exits. If it gets the lock it kills the worker and then exits.
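The lock-based watchdog could look roughly like this in C. It is a sketch under several assumptions: POSIX `fcntl()` record locks, a lock file path chosen by the caller, and the hypothetical helper names `main_take_lock`, `spawn_watched` and `watchdog_run`. It relies on two properties of `fcntl()` locks: they are released when the holding process dies, however it dies, and they are not inherited by a child created with `fork()`, so the watchdog's `F_SETLKW` request blocks for as long as the main process is alive.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile pid_t g_worker;   /* the watchdog's worker, read by on_chld */

/* Watchdog's SIGCHLD handler: if the worker exited on its own, reap it
 * and quit. waitpid() and _exit() are async-signal-safe. */
static void on_chld(int sig)
{
    int saved_errno = errno;
    (void)sig;
    if (waitpid(g_worker, NULL, WNOHANG) == g_worker)
        _exit(0);
    errno = saved_errno;
}

/* Main process: open path and hold a write lock on the whole file for
 * as long as we live. The kernel drops fcntl() locks when the holder
 * dies, but also when it closes ANY descriptor for the file. */
static int main_take_lock(const char *path)
{
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    int fd = open(path, O_RDWR | O_CREAT | O_CLOEXEC, 0600);
    if (fd < 0)
        return -1;
    if (fcntl(fd, F_SETLK, &fl) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Watchdog: start the worker, then block on the main process's lock. */
static void watchdog_run(int lock_fd, char *const argv[])
{
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    struct sigaction sa;
    sigset_t chld, old;
    pid_t worker;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_chld;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, NULL);

    /* Keep SIGCHLD blocked until g_worker is set. */
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, &old);
    worker = fork();
    if (worker < 0)
        _exit(1);
    if (worker == 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);  /* don't pass the mask to exec */
        execvp(argv[0], argv);
        _exit(127);
    }
    g_worker = worker;
    sigprocmask(SIG_SETMASK, &old, NULL);      /* a pending SIGCHLD runs here */

    while (fcntl(lock_fd, F_SETLKW, &fl) != 0 && errno == EINTR)
        ;                                      /* interrupted: keep waiting */

    /* Lock acquired (main is gone) or unexpected error: stop the worker. */
    sigprocmask(SIG_BLOCK, &chld, NULL);
    kill(worker, SIGTERM);
    waitpid(worker, NULL, 0);
    _exit(0);
}

/* Main process: start one watched worker; returns the watchdog's pid. */
static pid_t spawn_watched(int lock_fd, char *const argv[])
{
    pid_t wd = fork();
    if (wd == 0)
        watchdog_run(lock_fd, argv);           /* never returns */
    return wd;
}
```

The main process calls `main_take_lock()` once and keeps the returned descriptor open, then calls `spawn_watched()` for every worker it wants guarded. Because the SIGCHLD handler itself reaps the worker, a worker that exits before the watchdog starts waiting on the lock is still noticed.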

-- ced





--
Chuck Dillon
Senior Software Engineer
NimbleGen Systems Inc.

Irek

2005-06-20, 6:05 pm

Thanks, David, for your e-mail.

> The only way I can think of that's guaranteed to work is to have a
> user whose sole purpose is to run the server and a way to kill all
> processes owned by that user.


That's a good idea, but unfortunately the application is meant for
regular users, who may login for a regular session (like telnet, ftp,
ssh) to the remote computer while they use our application. If we
follow your scheme and kill all the processes of a given user when the
server dies, we would also kill the sessions the user is running.

Moreover, the user might want to run several instances of our
application, i.e. of our server, and so exiting one session would kill
the rest.


Best,
irek

David Schwartz

2005-06-20, 6:05 pm


"Irek" <ireneusz.szczesniak@wp.pl> wrote in message
news:1119295917.153565.120630@g14g2000cwa.googlegroups.com...

> Thanks, David, for your e-mail.


> That's a good idea, but unfortunately the application is meant for
> regular users, who may login for a regular session (like telnet, ftp,
> ssh) to the remote computer while they use our application. If we
> follow your scheme and kill all the processes of a given user when the
> server dies, we would also kill the sessions the user is running.


> Moreover, the user might want to run several instances of our
> application, i.e. of our server, and so exiting one session would kill
> the rest.


Then I'm totally confused. Because you wrote this:

>The client issues some commands to the server, so it's a kind of a
>login session. However, other clients can connect later to the same
>server. The server is supposed to die when the last client
>disconnects. The last client to disconnect doesn't have to be the
>client which started the server.


So each user can start a server that other clients can later connect to?
Are these "clients" distinct from "users"? I guess I don't understand your
setup. If users are setting up their own servers, one for each user, why are
other clients connecting to the same server? Is the same user running more
than one client to the same server? Or what?

The problem you have is that you want one process to be a master over
another, able to terminate it and anything it might create or do. However,
the master has the same privileges as the thing it's supposed to master.
That makes it very hard.

DS


Irek

2005-06-20, 6:05 pm

Thanks for your interest, David.

> Then I'm totally confused. Because you wrote this (...)


My apologies if I was not clear. Let me add a couple more
details.

The application is of the client-server type. The client logs in with
ssh, and asks the ssh daemon to start a server. This server is our
application. The server is started by the user who logs in, and is
run under her privileges.

A server manages a session. Then other clients can connect to the
same session (i.e. a server), but they must have an account on the
remote computer. We just rely on the system accounts, and we don't
have our own accounts (like, for instance, Samba has).

One user can start many sessions. Other services, like ftp or ssh,
allow many sessions too. The important difference is that we allow
other users to connect to a session. These sessions facilitate
collaboration between users: exchange of data, pictures and movies,
and so different users have to connect to one server.

> Are these "clients" distinct from "users"?


A user is the system user, who can login with ssh. A client in the
client-server terminology is a program which asks for service from the
server. A client is the program run by the user. I realize that I
might have not used these terms consistently.

> The problem you have is that you want one process to be a master
> over another, able to terminate it and anything it might create or
> do. However, the master has the same privileges as the thing it's
> supposed to master. That makes it very hard.


Root privileges are the only kind of privileges that are better
than those of a regular user. On the remote computer that we use
(it's a large computational facility) we have no chance of getting
root privileges. We have to do our best with the resources at hand.


Best,
Irek

Irek

2005-06-21, 2:50 am

Thanks, Chuck, for the ideas.

> Neither of the above can be eliminated unless you don't rely on a
> running process.


I must admit it sounds disappointing. Does it really mean there are
no guarantees? But I must share your pessimism after having read
relevant chapters of "Advanced programming in the UNIX Environment" by
W. Richard Stevens and "Advanced UNIX Programming" by Marc
J. Rochkind.

> By that I mean using a mechanism that runs periodically to look for
> the unwanted conditions and when it sees them it clean things up. A
> cron job or something in the inittab for example.


Pulling off this trick would fix the problem well. Unfortunately, the
server that we are using is a large computational facility where we
are not allowed to have a user crontab. Asking the system
administrators to install a system wide crontab is unfeasible. For
those reasons this solution cannot be applied.

> The watchdog should be as simple and lean as possible so it's not
> likely to be buggy or perceived to be a resource hog by the system.
> The watchdog monitors the necessary conditions with minimal use of
> resources.


Right now I think that making the main process as simple and lean as
possible is the best solution. Therefore my main process has been
written with these rules in mind. The main process is very simple:

> #!/bin/bash
>
> # We don't want the script to exit with an error code, so we must
> # capture the TERM signal and exit with 0.
>
> trap 'exit 0' TERM
>
> # At the exit we want to send SIGTERM to all processes in our group,
> # which means to everything that we start.
>
> trap 'kill -s TERM 0' EXIT
>
> # Here we run the worker. The worker spawns the children which belong
> # to the same process group
>
> ./worker


The above running script takes only 96KB of memory on Linux, so it's
unlikely that the OS will kill it. The chance that this script
crashes is slim, possibly couldn't be slimmer, because Bash has been
around for many years and is reliable.


Best,
irek

loic-dev@gmx.net

2005-06-21, 2:50 am

Salut Irek,

>
> Right now I think that making the main process as simple and lean as
> possible is the best solution. Therefore my main process has been
> written with these rules in mind. The main process is very simple:


[snip]

> The above running script takes only 96KB of memory on Linux, so it's
> unlikely that the OS will kill it. The chance that this script
> crashes is slim, possibly couldn't be slimmer, because Bash has been
> around for many years and is reliable.


I am regularly faced with similar problems when developing Highly
Available / Fault Tolerant Systems.

You could have your main() fork a reaper process whose job is to
clean up the process group if, for any reason, the main process
vanishes. There are several ways to get notified asynchronously
that the parent has died, for instance using a lock as suggested by
Chuck (I personally use a pipe). When the reaper process gets notified,
it kills all the processes in the process group.
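A pipe-based reaper along these lines might be sketched as follows. The idea: the main process keeps the write end of a pipe open and never writes to it; the reaper blocks in `read()` on the read end, which returns 0 (end of file) only once every copy of the write end has been closed, which happens automatically when the main process exits or is killed. The name `start_pipe_reaper` is hypothetical, and the sketch assumes the caller already leads its own process group. `FD_CLOEXEC` keeps exec'd children from holding the write end; a child that is forked but never calls `exec` still holds a copy and would delay the notification.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Fork a reaper that sends SIGTERM to the caller's process group once
 * every holder of the pipe's write end (normally just the caller) has
 * gone. Returns the reaper's pid, or -1 on error. */
static pid_t start_pipe_reaper(void)
{
    int p[2];
    pid_t pid;

    if (pipe(p) != 0)
        return -1;
    /* exec'd children must not inherit the write end, or EOF never comes */
    fcntl(p[1], F_SETFD, FD_CLOEXEC);

    pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {                  /* the reaper */
        char c;
        ssize_t n;
        close(p[1]);                 /* drop our own copy of the write end */
        do
            n = read(p[0], &c, 1);   /* blocks; 0 means end of file */
        while (n > 0 || (n < 0 && errno == EINTR));
        signal(SIGTERM, SIG_IGN);    /* spare ourselves while signalling the rest */
        kill(0, SIGTERM);
        _exit(0);
    }
    close(p[0]);                     /* the caller keeps only the write end, */
    return pid;                      /* open for the rest of its life */
}
```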

Next, it might happen that the reaper process dies too (though that is
unlikely to be caused by a bug or by resource hogging, since such a
reaper is trivial). You might even go further up the paranoia scale,
and re-fork() the reaper from the main process if it dies (the main
process will receive SIGCHLD in that case).
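Re-forking the reaper when it dies could be sketched like this, again only as an illustration: the SIGCHLD handler just sets a flag, and the main loop checks whether the child that changed state was the reaper. The names `install_sigchld_flag` and `check_reaper` are hypothetical, and `start` stands for whatever function the program uses to launch its reaper.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Set by the SIGCHLD handler; the main loop checks it. */
static volatile sig_atomic_t g_child_event;

static void on_sigchld(int sig)
{
    (void)sig;
    g_child_event = 1;
}

/* Install the flag-setting SIGCHLD handler. Returns 0 on success. */
static int install_sigchld_flag(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Call from the main loop. If a child changed state and it was the
 * reaper that died, start a replacement. The flag is cleared before
 * waitpid(), so a reaper death that happens meanwhile is not lost.
 * Returns the current reaper pid. */
static pid_t check_reaper(pid_t reaper, pid_t (*start)(void))
{
    if (!g_child_event)
        return reaper;
    g_child_event = 0;
    if (waitpid(reaper, NULL, WNOHANG) == reaper)
        reaper = start();
    return reaper;
}
```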

This surely improves system's reliability, but this is strictly
speaking not 100% reliable. Indeed both main and reaper process might
get killed at about the same time.

There is no simple solution to your problem I am afraid. I guess, it
all depends how "far" you want to go.


HTH,
Loic.

Irek

2005-06-22, 2:50 am

Thanks, Loic, for your e-mail.

> You might even go further up the paranoia scale,
> and re-fork() the reaper from the main process if it dies (the main
> process will receive SIGCHLD in that case).


Thanks for this idea. It sounds both interesting and useful.

All in all, I think I will stick to my small Bash script which starts
the main worker and which kills the whole process group at the very end
of the script.


Best,
Irek

Rajan

2005-06-22, 2:50 am

Hi Irek,
what if we keep checking getppid() in the child process, and
store the child process pid by calling getpid(); then, when getppid()
returns 0, call exit(errno) in the child?
Do you think this would be a feasible solution?

loic-dev@gmx.net

2005-06-22, 7:54 am

Salut Rajan,

> what if we keep checking getppid() in the child process, and
> store the child process pid by calling getpid(); then, when getppid()
> returns 0, call exit(errno) in the child?
> Do you think this would be a feasible solution?


First, you would have to check periodically whether the parent pid is 1
(which corresponds to the init process), not 0.

Second, your method is nothing but polling. Chuck and I gave
better methods that notify the child asynchronously when the parent
terminates (lock, pipe...).

Third, if I understood Irek correctly, he has no control over the
children (otherwise, he would have set up a SIGHUP handler that suits
his needs).

Cheers,
Loic.

Irek

2005-06-23, 2:48 am

Rajan and Loic,

> Second, your method is nothing but polling. Chuck and I gave
> better methods that notify the child asynchronously when the parent
> terminates (lock, pipe...).


Yes, its disadvantage is polling, but notice how simple and reliable
this method is. The reaper can be just as simple as shown below.
Compile the code below and name the executable "reap".

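> #include <sys/types.h>
> #include <unistd.h>
> #include <signal.h>
>
> #define POLL_INTERVAL 60 /* seconds */
>
> int main(void)
> {
>     /* Remember our original parent. Comparing with it, rather than
>        with 1, also works where orphans are re-parented to a
>        "subreaper" instead of init. */
>     pid_t parent = getppid();
>
>     /* Continue until the parent process is gone. */
>     while (getppid() == parent)
>         sleep(POLL_INTERVAL);
>
>     /* Reap the whole process group (this signals us too). */
>     kill(0, SIGTERM);
>     return 0;
> }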


I tested this solution with this script:

> #!/bin/bash
>
> ./reap &
>
> sleep 1h &
> sleep 1h &
> sleep 1h &
> sleep 1h &
>
> sleep 1h


Run the script and press ctrl+c or kill it with SIGKILL. This reaper
works really well: orphaned processes are killed within at most 60
seconds. The disadvantage is that it polls, but its advantage is
that it uses only the most basic features and is very simple.

In one of my previous posts I mentioned a stopped reaper that is going
to be woken up when the main process is gone. This is the code of the
reaper:

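> #include <sys/types.h>
> #include <unistd.h>
> #include <signal.h>
>
> /* SIGHUP handler: terminate the whole process group. */
> void reap_group(int sig)
> {
>     (void)sig;
>     kill(0, SIGTERM);
> }
>
> int main(void)
> {
>     signal(SIGHUP, reap_group);
>
>     /* Stop ourselves. When the process group becomes orphaned, the
>        kernel sends SIGHUP followed by SIGCONT to its members; the
>        handler runs once we are continued. */
>     kill(getpid(), SIGSTOP);
>     return 0;
> }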


Again, compile this code to get the "reap" executable, and run the
script again. Kill the script, and the "sleep" commands are gone.
Advantage: asynchronous, no polling. Disadvantage: the mechanism is
more complicated.

I tested the code on Linux 2.6.10 and AIX 5.2.


Best,
Irek


Copyright 2003 - 2008 webservertalk.com