Tru64 w command reports an idle user which doesn't exist ? Is utmp corrupt ?
| James Blackmore 2005-03-17, 7:52 am |
| System is Tru64 5.1B running on a cluster.
We have monitoring scripts checking for idle users which picked up the
following:
[4]swips2:/usr/users/jamesb # w |egrep "[d]ays|User"
11:58 up 29 days, 2:44, 69 users, load average: 3.10, 3.09, 3.04
User tty from login@ idle JCPU PCPU what
YENQP1 pts/162 192.168.16.60 23:06 11days
This suggests there should be an idle process for this connection, but
there isn't:
[4]swips2:/usr/users/jamesb # ps -ef | grep [Y]ENQ
[4]swips2:/usr/users/jamesb #
I think this means that the utmp file is incorrect. I cannot reboot
this machine (it's a 24/7 service and this is only a minor irritation),
so is there a way to refresh/rebuild utmp or fix this another way ?
Thanks.
| |
| Doug Freyburger 2005-03-17, 5:55 pm |
| James Blackmore wrote:
>
> We have monitoring scripts checking for idle users which picked up the
> following:
>
> [4]swips2:/usr/users/jamesb # w |egrep "[d]ays|User"
> 11:58 up 29 days, 2:44, 69 users, load average: 3.10, 3.09, 3.04
> User tty from login@ idle JCPU PCPU what
> YENQP1 pts/162 192.168.16.60 23:06 11days
Not a particularly good method to use, as you have
discovered. What you have encountered is common to
all versions of Unix and Unix-alikes as far as I know.
> This suggests there should be an idle process for this connection, but
> there isn't:
>
> [4]swips2:/usr/users/jamesb # ps -ef | grep [Y]ENQ
> [4]swips2:/usr/users/jamesb #
>
> I think this means that the utmp file is incorrect
Define "correct". Login sessions log in utmp. Login
sessions that exit gracefully log in utmp. Login
sessions that are killed ungracefully do not log in
utmp.
The most typical way these ghosts are created is someone
exiting their windowing session without exiting their
login sessions first. It used to happen under various
X11 window managers but eventually they switched to
more graceful kill methods. It still happens when
folks exit their Windows login while a Unix window is
open.
> I cannot reboot
> this machine (it's a 24/7 service and this is only a minor irritation),
> so is there a way to refresh/rebuild utmp or fix this another way ?
First thing, you already know this is an issue so modify
your script. If there are no processes, move on to the
next user.
Next thing, if there are plenty of logins on the host
they eventually clean up on their own. Each login
session takes an unused pty and when there are enough
logins to reach the pty with the ghost the ghost goes
away.
Last thing, if you really want to clean-up utmp, there
are various programs on various freeware sites. Look
for "fix utmp" and so on on your favorite freeware
site.
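
As a rough sketch of the first suggestion (skip any utmp entry that has
no processes behind it), something along these lines could work. The
function names are made up for illustration. It assumes w prints the
idle time as a single field such as "11days", and that the local ps
accepts -t <tty> and -o pid= (both are in POSIX, but check ps(1) on your
Tru64 box before relying on it).

```shell
# Print "user tty" for every w(1) line that has an idle field like "11days".
# Reads w output on stdin, so it can also be tried on saved output.
idle_days_sessions() {
    awk '{
        for (i = 3; i <= NF; i++)
            if ($i ~ /^[0-9]+days$/) { print $1, $2; break }
    }'
}

# Succeed if at least one process is attached to the given tty.
# Assumption: ps supports -t <tty> and -o pid= (header suppressed).
tty_has_processes() {
    [ -n "$(ps -t "$1" -o pid= 2>/dev/null)" ]
}

# Report only the idle sessions that still own processes;
# entries with no processes are treated as utmp ghosts and skipped.
report_idle() {
    idle_days_sessions | while read -r user tty; do
        if tty_has_processes "$tty"; then
            echo "idle: $user on $tty"
        fi
    done
}
```

Used as "w | report_idle", a ghost entry like the YENQP1 one would be
dropped because nothing is attached to its pty any more.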
| |
| James Blackmore 2005-03-18, 2:47 am |
| Thanks Doug,
> Define "correct". Login sessions log in utmp. Login
> sessions that exit gracefully log in utmp. Login
> sessions that are killed ungracefully do not log in
> utmp.
Thanks, I didn't realise this. This script has been running for 2
years and this is the first time this has occurred, so I can only assume
Wintegrate/Powerterm (the common clients here) are reasonably good at
exiting cleanly even when the window is closed.
> First thing, you already know this is an issue so modify
> your script. If there are no processes, move on to the
> next user.
Thanks, I will do this, I might even drop the w | grep days altogether
and use STIME on a ps -ef, something like:
ps -ef | grep "`date +'%b'`" | egrep -v "`date +'%b %e'`"
> Next thing, if there are plenty of logins on the host
> they eventually clean up on their own. Each login
> session takes an unused pty and when there are enough
> logins to reach the pty with the ghost the ghost goes
> away.
I don't understand why this didn't happen then, as we have several
hundred logins a day, so surely it should have been re-used. From
midnight to 9am today we have already had 200 logins, and the busy
time starts at 9am, so in 11 days we should have had several thousand
logins ?
[4]swips2:/ # date
Fri Mar 18 08:47:16 GMT 2005
[4]swips2:/ # last | tail -1
wtmp begins Fri Mar 18 00:02
[4]swips2:/ # last | grep -v ftp | wc
197 1956 14481
I wonder if the pty is not properly returned to the 'free list' in this
'unclean exit' case; otherwise I think this would have been cleaned up
within 11 days.
> Last thing, if you really want to clean-up utmp, there
> are various programs on various freeware sites. Look
> for "fix utmp" and so on on your favorite freeware
> site.
Thanks, but user accounting information is not too critical, so once I
was sure this was just an 'incorrect' utmp file I flushed it with
logclean.
All users are kicked out for a nightly 2am backup anyway, so I can
easily check for any 'idle' sessions manually from before then which
stayed up, and the accounting info will be correct from now on (till
the next time).
Thanks for the response though, all very useful info !
James.
| |
| Doug Freyburger 2005-03-18, 5:56 pm |
| James Blackmore wrote:
> Doug Freyburger wrote:
>
>
> Thanks, I didn't realise this, and this script has been running for 2
> years and this is the first time this has occurred so I can only assume
> Wintegrate/Powerterm (the common clients here) are reasonably good at
> exiting cleanly even when window is closed.
Sounds like it. If you're using W2K, most of the
time folks will log out and it appears that is
being handled gracefully. It looks like someone
powered off without logging out or some sort of
application crash happened.
>
> I don't understand why this didn't happen then, as we have several
> hundred logins a day, so surely it should have been re-used. From
> midnight to 9am today we have already had 200 logins, and the busy
> time starts at 9am, so in 11 days we should have had several thousand
> logins ?
It isn't quite just the number of logins that
determines pty recycling. Each session tends
to use the lowest available numbered pty,
though occasionally a race condition will have
a session skip a couple. So what really counts
for reclaiming these ghosts is the peak number
of sessions not the raw number.
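
To illustrate why the peak matters, here is a toy model of the "lowest
free number wins" rule. It is not how Tru64 really allocates ptys, and
lowest_free_pty is a hypothetical helper: it reads the pty numbers
currently in use, one per line, and prints the number the next login
would get under that rule.

```shell
# Print the lowest pty number that does not appear on stdin.
# Input: one in-use pty number per line, in any order.
lowest_free_pty() {
    sort -n | awk 'BEGIN { n = 0 } $1 == n { n++ } END { print n }'
}

# Example: ptys 0, 1 and 3 are busy, so the next login gets 2.
printf '0\n1\n3\n' | lowest_free_pty    # prints 2
```

Under this model a ghost on pts/162 is only overwritten once ptys 0
through 161 are all busy at the same moment. Thousands of logins spread
over the day never get that far if only a few dozen are connected at
once.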
> I wonder if the pty is not properly returned to 'free list' in this
> 'unclean exit' case, or this would have been cleaned up in 11 days I
> think.
It's just a missing entry in utmp and ownerships
of the device pair in /dev. Not all that much
to the clean-up involved. A process no longer
has the device open so a scan will show it
available.
So what I think happened: Your app is usually good
about exiting gracefully so the exit gets logged in
utmp. On this occasion there was an application
crash, or kill -9 rather than -15, or a power off
without logout or similar. It happened to be a
session with a high pty number because the login
happened during a monthly peak.
>
> Thanks, but user accounting information is not too critical, so once I
> was sure this was just an 'incorrect' utmp file I flushed it with
> logclean.
Yup. Logclean is just fine for utmp clean-up.
As long as there aren't any processes you know
it is really available.