gellis 2004-05-27, 4:36 pm
To Any & All System Administrators:
Subject: Best Practice re: patching multiple Sun Servers connected to a
Hitachi SAN
I am a system administrator for several Sun servers and have encountered several
major problems with the installation of patches. I would like to know
from other System Administrators the best practices that they use or
recommend related to patching systems with a similar configuration.
The environment:
One Hitachi 9200 SAN with two controllers and four HBAs connected to two
Brocade 2408 Silkworm fibre switches.
Two Sun E4500s with SCSI-connected Sun Netra D130s as boot drives. The E4500s
use two JNI Sbus FC64-1063 fibre cards as the physical connections to the
Brocade switches.
Two Sun V480s with internal boot drives. The V480s use two JNI PCI FCE-6410
cards as the physical connections to the Brocade switches.
All Sun servers are running Solaris 8 with Veritas Volume Manager 3.2 to
manage file systems. The root partitions are encapsulated. All LUNs on the
Hitachi 9200 are under Veritas control and use the VxFS file system with large
files switched on. Persistent binding is used to bind each LUN, allowing
VxDMP to manage two paths to the SAN. The Hitachi uses LUN security and the
WWNN (world wide node name) and WWPN (world wide port name) of the intended
server to recognize the correct LUNs. The Brocades are soft zoned to allow
the same WWNN and WWPN to be recognized by each server. Each server has
JNI Ezfibre 2.2 installed, which is used to bind both paths. All servers are
behind a firewall and have a limited number of Unix user accounts set up. The
E4500s are used as Oracle database servers and the V480s are used as our
Oracle application tier. The systems have been running very well for more
than 2 years with no major issues.
The issue:
An outside audit firm is advising executive management of the best practices
of patching and security hardening Sun Servers. The delivery and content of
their best practices are reminiscent of a textbook. However, I thought the
task of patching these servers would be easy, and with that I proceeded.
Well, as you may have guessed, things did not go so well, or else I
would not be writing this letter. I ran patchcheck and determined that my
best course of action was to install a patch cluster, specifically
8_recommended patch cluster with a date stamp of 4/20/2004. Being of a
cautious nature, I decided to contact Sun Support for their recommendation;
in addition, I decided to patch only the development servers. Sun advised me
to install the patch cluster in single-user mode with my mirrors detached,
and also to install 113201-05 prior to rebooting, because the patch cluster
breaks parts of Veritas. I read the README and thought I was set to go. I
thought wrong! The patch cluster had worked fine on another system, an E420
that is not connected to any SAN and does not have Veritas Volume Manager
installed. The patch cluster ran on the E4500 and the V480 with only the
expected error codes of 2 and 8, so no problem so far. Yes, I should have
patched one system at a time, but that is hindsight, and I'm sure we have all
rushed into a burning room as well. I ran the Veritas patch 113201-05 on the
V480, and that is when the problems started to roll out. The patch choked on
the Hitachi and removed the LUNs managed by Veritas from view. I rebooted the
system several times and checked it with the following commands:
vxdisk list, vxprint -hrt, iostat -En, format (-e), etc. I checked the
sd.conf file and it was fine. The Sun O/S was still picking up the LUNs, but
Veritas was not. Time to call Sun Support. Several hours later, after I had
run Sun Explorer, they escalated and advised me to call Hitachi. Sun and I
agreed not to proceed with patching the E4500 until a solution had been
determined for the V480. On the line with Hitachi, they had me run a utility
and send them the output, and then advised me to call Veritas while they
worked on it. By this time, I had that sinking feeling, and all the fun had
evaporated along with my system's stability. Then it was Veritas' turn, and
the same deal: run Veritas Vxexplorer, punch some commands into the system,
and "we will get back to you when we have more information." After 10+ hours,
a flurry of commands from Sun, Hitachi, and Veritas managed to bring the LUNs
back; meanwhile, the E4500 began to panic. I had moved to another directory
because I was going to reinstall an older patch that I believed would recreate
the soft links, which may have been deleted. I had to recover the system (the
un-encapsulate deal, for those of you who have experienced it), so I was very
unhappy by this time and had been up 30+ hours straight. Bummer. It appears
(I am not completely certain yet) that the HDS9200s were set to active/active
and the patch requires active/passive, and that this was not included in the
patch README, not that I would necessarily have found it anyway. I am still
working with all vendors involved to clear this problem up.
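For anyone wanting to reproduce the check that showed the O/S still seeing the LUNs while Veritas did not, here is a minimal Python sketch of the comparison. It is only an illustration: the function name and the device names are hypothetical, the two lists are assumed to have been copied by hand from the output of format and vxdisk list, and the names are assumed to be already in the same form (no slice suffixes).

```python
def missing_from_veritas(os_devices, vx_devices):
    """Return the devices the OS reports that Veritas does not, sorted."""
    return sorted(set(os_devices) - set(vx_devices))


# Hypothetical device names, typed in by hand for illustration only.
os_seen = ["c1t0d0", "c2t0d0", "c2t0d1", "c3t0d0", "c3t0d1"]
vx_seen = ["c1t0d0"]

# Anything listed here is visible to Solaris but not to Volume Manager.
missing = missing_from_veritas(os_seen, vx_seen)
print(missing)
```

An empty result would mean both layers agree; a non-empty one gives a concrete list to hand to the vendor on the phone.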
The basic question:
Anyone with systems similar to this, do you patch? If yes, is your
architecture more vanilla than mine? What are your views, your company's
policy, and the resources allocated (I am the sole admin on these systems) to
this grand endeavor? Does anyone have to deal with these types of policies
(patching, security, and disaster recovery) from outside or internal auditors? Any
comments are welcome.
Sincerely,
GE