Red Hat Topics - Hard reset - toasted RH9


John B. Moore

2005-10-08, 5:49 pm

Need some advice from the wizards...

(Note that for a lot of reasons we need to stay with RH9 (kernel
2.4.20-30.9), so though it is good advice to upgrade and use the
latest, that is not feasible at this time.. Lots of machines.. a lot
of configurations.. etc.)

I had a situation* where I had to do a hard reset of a machine (newly
installed) and upon reboot the machine went into "kernel panic" after
the line "Freeing unused kernel memory..."

This was disconcerting because it so totally hosed the install.

This was a newly installed machine, on new hardware, and the hard reset
was done soon after the initial startup and the reset totally hosed
RHat.. (Tried the rescue disk and it reported there was no "linux
partition", really hosed..)

I'm considering this a "wakeup" call to investigating strategies to
recover from this type of thing, as well as investigate if there is a
way to make the system less fragile to getting hosed.. (since a major
power outage could simulate the same event, btw.. there are 45 min.
UPSs on every machine)

So.. any wisdom to be shared on this problem or pointers to resources
would be appreciated..

John Moore
SonicSpider LLC

*Switched monitors and the graphic display was scrambled.. could not get
it to shut down.. Obviously, I have since documented the key
commands that will shut down the machine.. another issue..
Jean-David Beyer

2005-10-08, 5:49 pm

John B. Moore wrote:
> Need some advice from the wizards...
>
> (Note that for a lot of reasons we need to stay with RH9 (kernel
> 2.4.20-30.9), so though it is good advice to upgrade and use the
> latest, that is not feasible at this time.. Lots of machines.. a lot
> of configurations.. etc.)
>
> I had a situation* where I had to do a hard reset of a machine (newly
> installed) and upon reboot the machine went into "kernel panic" after
> the line "Freeing unused kernel memory..."
>
> This was disconcerting because it so totally hosed the install.
>
> This was a newly installed machine, on new hardware, and the hard reset
> was done soon after the initial startup and the reset totally hosed
> RHat.. (Tried the rescue disk and it reported there was no "linux
> partition", really hosed..)
>
> I'm considering this a "wakeup" call to investigating strategies to
> recover from this type of thing, as well as investigate if there is a
> way to make the system less fragile to getting hosed.. (since a major
> power outage could simulate the same event, btw.. there are 45 min.
> UPSs on every machine)


.... so do the UPSs not notify the machine that power is lost and give the
machine almost 45 minutes to do a proper power-down? My systems use APC
Smart-UPS units, and the associated software polls the UPS about every 2
seconds to see if all is well. If it is, fine. If it reports line power
failed, it notifies the users that power will be going down after an
interval, and that they should get off before then. When the time comes, it
does a controlled shutdown which, on my system, terminates all the daemons,
sends sig 15 to all the remaining processes, and if they do not take the
hint, sends them a sig 9 about 5 seconds later.
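
The escalation described above (sig 15 first, sig 9 for whatever is still running about 5 seconds later) can be sketched in Python. This is only an illustration, not what the APC software actually does: it handles child processes started by the script itself rather than every process on the system, and the name stop_processes and the grace_seconds parameter are made up for the example.

```python
import subprocess
import time


def stop_processes(procs, grace_seconds=5.0):
    """Ask each child to exit with SIGTERM, then SIGKILL the stragglers.

    procs is a list of subprocess.Popen objects. Returns the list of
    processes that ignored SIGTERM and had to be killed.
    """
    # First pass: the polite request (sig 15 on POSIX).
    for proc in procs:
        if proc.poll() is None:
            proc.terminate()

    # Everyone shares one deadline, so the total wait is bounded by
    # grace_seconds no matter how many processes there are.
    deadline = time.monotonic() + grace_seconds
    killed = []
    for proc in procs:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            proc.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            proc.kill()   # sig 9 on POSIX; cannot be caught or ignored
            proc.wait()   # reap the child so it does not linger as a zombie
            killed.append(proc)
    return killed
```

A process that exits on SIGTERM ends with return code -15, and one that had to be killed ends with -9, so the caller can log which daemons did not take the hint.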
>
> So.. any wisdom to be shared on this problem or pointers to resources
> would be appreciated..


ext3 file systems would, I imagine, survive this kind of thing better than
ext2 ones, but if it scrozzled your partition table, boot block, or
something like that, too bad.

I run BRU and CRU for backups. I do full backups every day. Once a month I
do a backup with CRU that writes a boot floppy, a root floppy, and a backup
tape (my tape drives can take up to 160 GBytes). When disaster strikes, I
just boot with the boot disk in there, followed by the root disk. This reads
the backup tape and supposedly restores everything: partition table, and all.

It must be said that I have never gotten this to work with recent versions
of Red Hat. Part of the problem is that Red Hat likes to put labels in
/etc/fstab, and the restore part of the CRU software cannot deal with them.
I changed it. Here is an old one on another machine:

LABEL=/ / ext3 defaults 1 1
LABEL=/boinc /boinc ext3 defaults 1 2
LABEL=/boot /boot ext2 defaults 1 2
....

Here is how I fixed it on the new machine:
/dev/hda3 / ext3 defaults 1 1
/dev/hda11 /boinc ext3 defaults 1 2
/dev/hda1 /boot ext2 defaults 1 2
....

That still does not work, and I was too annoyed to figure out why. Surely an
incompatibility between what BRU|CRU think should be there and what is. The
CRU boot disk has their own kernel, so they know what to expect from it, but
they also think they know the file system structure, and apparently it is
not quite like that.
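
For anyone who wants to try the same LABEL-to-device rewrite of /etc/fstab across several machines, here is a rough Python sketch. The function name replace_labels and the label-to-device mapping are assumptions for illustration; the mapping has to come from the machine in question (for example from e2label or blkid), and the device names in the usage note are simply the ones from the listing above.

```python
def replace_labels(fstab_text, label_to_device):
    """Rewrite LABEL=... in the first field of fstab lines to a device path.

    fstab_text is the file content as a string; label_to_device maps a
    label such as "/boot" to a device such as "/dev/hda1". Lines whose
    label is not in the mapping, comments, and blank lines are left alone.
    Rewritten lines have their fields joined by single spaces.
    """
    prefix = "LABEL="
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith(prefix):
            device = label_to_device.get(fields[0][len(prefix):])
            if device is not None:
                fields[0] = device
                line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```

Called with the mapping {"/": "/dev/hda3", "/boinc": "/dev/hda11", "/boot": "/dev/hda1"}, it turns the first listing into the second. It only edits text: it does not check that the devices exist, so keep a copy of the original fstab before writing the result back.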

So I just reinstalled any old Linux, installed BRU, and did a restore from
that. That works.

I really should figure out what is the matter, but I do not really have a
machine I can take down and practice on.

Nice to always have full backups, none over 24 hours old.
>
> John Moore
> SonicSpider LLC
>
> *Switched monitors and the graphic display was scrambled.. could not get
> it to shut down.. Obviously, I have since documented the key
> commands that will shut down the machine.. another issue..



--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ PGP-Key: 9A2FC99A Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 16:35:00 up 8 days, 9:37, 5 users, load average: 4.19, 4.26, 4.37
John B. Moore

2005-10-08, 8:45 pm

Jean-David Beyer wrote:

>... so do the UPSs not notify the machine that power is lost and give the
>machine almost 45 minutes to do a proper power-down? My systems use APC
>Smart-UPS units, and the associated software polls the UPS about every 2 seconds to see if all is well.
>

I had hoped to do that as well but the APC units I have (Back-UPS 1000
and 1500) did not have any linux based software.. I admit I was really
bummed and have not had time to get into it further (I seem to remember
that there were some specs that might be used to write a driver..)

Obviously I need to revisit this, because it is definitely in the
solution matrix.. Just wish I could find someone that has done this
already.. (APC at the time did not have software for this unit..)

>ext3 file systems would, I imagine, survive this kind of thing better than
>ext2 ones, but if it scrozzled your partition table, boot block, or
>something like that, too bad.
>
>


I'm using ext3..

>I run BRU and CRU for backups. I do full backups every day.
>Nice to always have full backups, none over 24 hours old.
>
>

I have a good backup system (weekly full, daily incremental), so that is
not the issue.. This "event" got me thinking again that I'm still in a
position where it would be a real hassle to reinstall a dozen or more
machines.. restore backups and get everything up and running. Soo...
I'm out researching and searching again..

Thanks for the feedback...

John..
Lenard

2005-10-08, 8:45 pm

John B. Moore wrote:

> I had hoped to do that as well but the APC units I have (Back-UPS 1000
> and 1500) did not have any linux based software.. I admit I was
> really bummed and have not had time to get into it further (I seem to
> remember that there were some specs that might be used to write a
> driver..)



http://www.networkupstools.org/



--
Contained within the Microsoft EULA;
This Limited Warranty is void if failure of the Product has resulted
from accident, abuse, misapplication, abnormal use or a virus.
Jean-David Beyer

2005-10-09, 7:46 am

John B. Moore wrote:
> Jean-David Beyer wrote:
>
>
> I had hoped to do that as well but the APC units I have (Back-UPS 1000
> and 1500) did not have any linux based software.. I admit I was really
> bummed and have not had time to get into it further (I seem to remember
> that there were some specs that might be used to write a driver..)


I know. I wish APC would get with the program. I use the obsolete
PowerChutePlus-4.5.3-2_RedHat.i386.rpm
that you should be able to find on the web somewhere.
APC have a new thing for Linux, that I have downloaded. For some reason, I
am not using it. I forget why. I think incompatible libraries, but it may be
that the monitoring program works only on Windows.

Below you will see that my machine rebooted. The power here (almost all of
the towns of Shrewsbury and Tinton Falls and perhaps elsewhere) went off at
22:08:29 last night and did not come back until a little before 01:40:09
this morning. The power came back a bit sooner, but I have a several minute
delay (different for each machine so they do not all come back at once) and
a requirement that the battery recharge to 50% before the machine reboots.
It actually shut down at 23:03:59.
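
The restart rule described above (a delay that differs per machine, plus a battery that has recharged to 50%) comes down to two small checks. The Python below is only a sketch of that logic: the function names, host names, and the 120-second base with 60-second steps are hypothetical values, not settings taken from the APC software.

```python
def staggered_delays(hosts, base_seconds=120, step_seconds=60):
    """Give each host its own restart delay so they do not all boot at once."""
    return {host: base_seconds + i * step_seconds
            for i, host in enumerate(hosts)}


def may_restart(seconds_since_power_returned, battery_percent,
                delay_seconds, min_battery_percent=50.0):
    """True once both the per-machine delay and the battery floor are met."""
    return (seconds_since_power_returned >= delay_seconds
            and battery_percent >= min_battery_percent)
```

With hosts ["a", "b", "c"], the delays come out as 120, 180, and 240 seconds, and host "b" is allowed to boot only after 180 seconds of restored power with the battery at 50% or better. The battery floor matters because a second outage right after boot would otherwise leave too little runtime for another clean shutdown.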

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ PGP-Key: 9A2FC99A Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 05:05:00 up 3:26, 3 users, load average: 5.04, 4.44, 4.20
John B. Moore

2005-10-09, 5:46 pm

Cool, Thanks.. (that should keep me entertained for a while...<G> )

Lenard wrote:

>John B. Moore wrote:
>
>http://www.networkupstools.org/

Copyright 2003 - 2008 webservertalk.com