Can a user app hang a *IX system?
| I have a situation where our program(s) completely hang the system.
The OS is MontaVista Embedded Linux on a big-endian XScale CPU.
Our app starts and hangs, and it also hangs the system: I can't Ctrl-Break the
app, can't use the console, and can't telnet in (the connection isn't rejected,
it just hangs during the connect). I can, however, ping the system (thanks for
small favors).
I tried using gdb to attach to the process, and gdb also hangs.
Further, while building the app, if I had a large no. of objects, it would
also hang during the link for the shared library - by breaking it up into
multiple .so files, it succeeded.
It's obviously a resource problem, but I thought most *IX systems would
start grumbling about running out before just dying.
I have a smaller app that tests the no. of threads, semaphores, and files
and it behaves normally - running out with about 500 thrds and 127
semaphores.
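For reference, some of the per-process limits that such a test app runs into can be read directly rather than discovered by exhaustion. This is only a sketch, assuming Python 3 is available on the target; LIMITS and current_limits are illustrative names, and semaphore limits are not covered because getrlimit has no resource for them.

```python
import resource

# Illustrative selection of limits; the labels are made up for this sketch.
LIMITS = {
    "open files": resource.RLIMIT_NOFILE,
    "processes/threads per user": resource.RLIMIT_NPROC,
    "address space (bytes)": resource.RLIMIT_AS,
    "stack size (bytes)": resource.RLIMIT_STACK,
}

def current_limits():
    """Return {label: (soft, hard)}; resource.RLIM_INFINITY means 'no limit'."""
    return {label: resource.getrlimit(res) for label, res in LIMITS.items()}
```

Printing current_limits() from the same shell that launches the big app shows what it is allowed to ask for, which is not necessarily what a 64 MB box can actually deliver.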
This is a very large app and is obviously way too big for the system (64 MB),
but should it be able to hang the system totally? The app is fairly stable and
runs fine on AIX, Linux RH, HP, Solaris.
Any insight before I start flooding it with printfs???
Thanks much,
--
Jim
--
Gail
| |
| Fletcher Glenn 2005-01-26, 8:47 pm |
| JimW wrote:
> I have a situation where our program(s) completely hang the system.
> OS is MontaVista Embedded Linux on a xscale big-endian cpu.
> Our app starts and hangs, and also hangs the system, can't ctrl-brk the app;
> can't use console, can't telnet (not a rejected connection, just hangs
> during the connect). I can, however, ping the system (thanks for small
> favors).
>
> I tried using gdb to attach to the process, and gdb also hangs.
> Further, while building the app, if I had a large no. of objects, it would
> also hang during the link for the shared library - by breaking it up into
> multiple .so files, it succeeded.
>
> It's obviously a resource problem, but I thought most *IX systems would
> start grumbling about running out before just dying.
>
> I have a smaller app that tests the no. of threads, semaphores, and files
> and it behaves normally - running out with about 500 thrds and 127
> semaphores.
>
> This is a very large app and is obviously way too big for the system (64mb),
> but should it be able to hang totally. The apps is fairly stable and runs
> fine on AIX, Linux RH, HP, Solaris.
>
> Any insight before I start flooding it with printfs???
>
> Thanks much,
Normally, what you say is true: a user app cannot hang the system.
However, the application can consume so much real memory that the
system spends more time swapping than it does performing useful work.
This condition is called "thrashing", and in this condition the system
will appear to be very busy and almost totally non-responsive. You might
be able to get it to do something, but you will probably die of old age
before anything useful is done.
Before you start your application, you might try running vmstat with an
interval argument (for example, "vmstat 5") so that it prints the virtual
memory statistics every few seconds. This might give you some clue as to
whether the system is thrashing.
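If vmstat is not installed on the target, roughly the same picture can be had by sampling /proc/meminfo. The following is a minimal sketch, assuming Python 3 exists on the box (on a small embedded system the same idea may need porting to C or a shell loop). The function names are made up for illustration; MemFree, SwapTotal and SwapFree are the standard field names in Linux's /proc/meminfo.

```python
import time

def parse_meminfo(text):
    """Return {field: kilobytes} for the 'Name:  value kB' lines of /proc/meminfo."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only single 'value kB' pairs; skip summary lines in other layouts.
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields

def swap_used_kb(fields):
    """Swap currently in use, in kB (0 if no swap is configured)."""
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)

def watch(path="/proc/meminfo", interval=2.0, samples=5):
    """Print free memory and swap use every `interval` seconds."""
    for _ in range(samples):
        with open(path) as f:
            fields = parse_meminfo(f.read())
        print("MemFree=%d kB  SwapUsed=%d kB"
              % (fields.get("MemFree", 0), swap_used_kb(fields)))
        time.sleep(interval)
```

Calling watch() from a second session just before starting the app would show the trend: MemFree falling toward zero while SwapUsed climbs would be consistent with thrashing.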
--
Fletcher Glenn
| |
| Måns Rullgård 2005-01-26, 8:47 pm |
| "JimW" <j1mw_nospam_plz@bellsouth.net> writes:
> I have a situation where our program(s) completely hang the system.
> OS is MontaVista Embedded Linux on a xscale big-endian cpu.
> Our app starts and hangs, and also hangs the system, can't ctrl-brk the app;
> can't use console, can't telnet (not a rejected connection, just hangs
> during the connect). I can, however, ping the system (thanks for small
> favors).
>
> I tried using gdb to attach to the process, and gdb also hangs.
> Further, while building the app, if I had a large no. of objects, it would
> also hang during the link for the shared library - by breaking it up into
> multiple .so files, it succeeded.
>
> It's obviously a resource problem, but I thought most *IX systems would
> start grumbling about running out before just dying.
>
> I have a smaller app that tests the no. of threads, semaphores, and files
> and it behaves normally - running out with about 500 thrds and 127
> semaphores.
>
> This is a very large app and is obviously way too big for the system (64mb),
> but should it be able to hang totally. The apps is fairly stable and runs
> fine on AIX, Linux RH, HP, Solaris.
>
> Any insight before I start flooding it with printfs???
Are you by chance running out of memory so the system starts swapping
heavily? This can make the system appear to be hung, even if it is
making slow progress.
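One way to test the out-of-memory theory is to cap the app's address space, so that allocations fail outright instead of dragging the whole box into swap. A minimal sketch, assuming Python 3 on the target and a kernel that enforces RLIMIT_AS (bash's "ulimit -v" sets the same limit); run_capped is a made-up helper name:

```python
import resource
import subprocess

def run_capped(argv, max_bytes):
    """Run argv with its address space limited to max_bytes; return its exit status."""
    def apply_limit():
        # Runs in the child between fork() and exec(), so only the child is capped.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
    return subprocess.run(argv, preexec_fn=apply_limit).returncode
```

Something like run_capped(["./ourapp"], 48 * 1024 * 1024), where the path and the 48 MB figure are placeholders, should then make the app die with a failed malloc or mmap rather than take the system down with it, if memory really is the culprit.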
--
Måns Rullgård
mru@inprovide.com
| |
| Chuck Dillon 2005-01-27, 5:52 pm |
| JimW wrote:
> I have a situation where our program(s) completely hang the system.
> OS is MontaVista Embedded Linux on a xscale big-endian cpu.
> Our app starts and hangs, and also hangs the system, can't ctrl-brk the app;
> can't use console, can't telnet (not a rejected connection, just hangs
> during the connect). I can, however, ping the system (thanks for small
> favors).
>
> I tried using gdb to attach to the process, and gdb also hangs.
> Further, while building the app, if I had a large no. of objects, it would
> also hang during the link for the shared library - by breaking it up into
> multiple .so files, it succeeded.
>
> It's obviously a resource problem, but I thought most *IX systems would
> start grumbling about running out before just dying.
I suggest you ask the folks in comp.unix.admin for ideas on how to
diagnose what is going on. If the system is not properly configured,
such a situation can occur. I'm not an admin, but I have worked on
development systems configured such that /tmp was in the root file
system. A user program that generated a lot of data to /tmp, like a
compiler or linker for example, could fill the root file system which
would lock up the system.
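Checking for that condition is cheap. Here is a small sketch, assuming Python 3 is available; fs_usage and same_filesystem are illustrative names, and comparing device IDs is only a rough test of whether two paths share a filesystem:

```python
import os

def fs_usage(path):
    """Return (total_bytes, bytes_free_for_unprivileged_users) for path's filesystem."""
    st = os.statvfs(path)
    return st.f_frsize * st.f_blocks, st.f_frsize * st.f_bavail

def same_filesystem(a, b):
    """True if both paths report the same device ID, i.e. likely the same filesystem."""
    return os.stat(a).st_dev == os.stat(b).st_dev
```

If same_filesystem("/", "/tmp") is true and fs_usage("/tmp") shows free space shrinking toward zero during the link, the full-root-filesystem explanation becomes a lot more plausible.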
I would think it's also possible that you have a hardware problem, like
some bad memory for example.
Then there was that development system with the non-terminated SCSI
bus. It worked for the most part but occasionally got really confused
IIRC.
-- ced
--
Chuck Dillon
Senior Software Engineer
NimbleGen Systems Inc.
| |
| Norm Dresner 2005-01-27, 5:52 pm |
| "JimW" <j1mw_nospam_plz@bellsouth.net> wrote in message
news:d2WJd.97605$zy6.85228@bignews5.bellsouth.net...
> I have a situation where our program(s) completely hang the system.
> OS is MontaVista Embedded Linux on a xscale big-endian cpu.
> Our app starts and hangs, and also hangs the system, can't ctrl-brk the app;
> can't use console, can't telnet (not a rejected connection, just hangs
> during the connect). I can, however, ping the system (thanks for small
> favors).
>
> I tried using gdb to attach to the process, and gdb also hangs.
> Further, while building the app, if I had a large no. of objects, it would
> also hang during the link for the shared library - by breaking it up into
> multiple .so files, it succeeded.
>
> It's obviously a resource problem, but I thought most *IX systems would
> start grumbling about running out before just dying.
>
> I have a smaller app that tests the no. of threads, semaphores, and files
> and it behaves normally - running out with about 500 thrds and 127
> semaphores.
>
> This is a very large app and is obviously way too big for the system (64mb),
> but should it be able to hang totally. The apps is fairly stable and runs
> fine on AIX, Linux RH, HP, Solaris.
>
> Any insight before I start flooding it with printfs???
>
> Thanks much,
> --
>
> Jim
I've had some experience writing device drivers for various Linux systems
and have had the wonderful experience of having a bad address in a device
driver hang the motherboard (not the system but the hardware). There were
times during the development of these drivers that it seemed like the
address (which was computed in the driver) was at least partially a function
of data and requests that some user programs passed to it.
SO ... Yes, it is possible, I truly believe, that a user program in
conjunction with a device driver that's not adequately protected from bad
request data can cause a PCI-bus hang in some system configurations. Is
this what's happening in your case? Probably not, but it does answer the
question in the subject line and might indicate that your suspicions about
the hang being in part triggered by a specific user program are correct.
Norm