Unix administration - Software configuration management tool required


Author Software configuration management tool required
Vincent van Scherpenseel

2005-08-18, 7:50 am

Hello group,

Our company has more than 100 servers running all different kinds of
services which are currently all documented. The problem is: after a couple
of months the piece of paper will be worthless if it doesn't get updated by
the system administrators logging what they changed.

There are several administrators working on the servers and the problem is
that not everything which gets changed will be logged. People forget about
it, or just don't care to log.

Management has now issued a new policy requiring *everyone* to log the
changes. Unfortunately, checking all the servers if the administrators are
living up to the policy is a very time-consuming task.

Is there any software out there which is able to check remote servers on
their running services and their configuration? I need to know which
services are running, where their configuration lives, where they're
logging to, where their data is stored (if any), what their dependencies
are, which cronjobs are planned and when, ...

Unfortunately I can't use snmp, since that only lists services *currently*
running, no cronjobs and no configuration files etc.

I know there probably won't be any tool out there which is able to do all
the stuff we want, but if it only detects a little bit it would be of great
help to us.

The servers are running different versions of Linux and FreeBSD.

Please let me know if you know any software for this purpose.

With Regards,
Vincent van Scherpenseel.
Dave Hinz

2005-08-18, 5:54 pm

On Thu, 18 Aug 2005 10:53:25 +0200, Vincent van Scherpenseel <reply@newsgroup.invalid> wrote:
> Hello group,
>
> Our company has more than 100 servers running all different kinds of
> services which are currently all documented.


....that you know of...

> The problem is: after a couple
> of months the piece of paper will be worthless if it doesn't get updated by
> the system administrators logging what they changed.


Yup.

> There are several administrators working on the servers and the problem is
> that not everything which gets changed will be logged. People forget about
> it, or just don't care to log.


Normal and predictable behavior, yes.

> Management has now issued a new policy requiring *everyone* to log the
> changes. Unfortunately, checking all the servers if the administrators are
> living up to the policy is a very time-consuming task.
> Is there any software out there which is able to check remote servers on
> their running services and their configuration? I need to know which
> services are running, where their configuration lives, where they're
> logging to, where their data is stored (if any), what their dependencies
> are, which cronjobs are planned and when, ...


Couple of thoughts. On a basic level, you could get your logging by
instituting sudo on your servers - all work done as root is logged in
that manner. The logs aren't the most human readable but they're
complete.
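
To make those sudo logs a bit more readable, something along these lines could pull out who ran what. This is only a minimal sketch in Python: it assumes the usual syslog line layout that sudo writes (`user : TTY=... ; PWD=... ; USER=... ; COMMAND=...`), and the sample lines, host name, and function names are made up for illustration.

```python
import re

# Matches the command entries sudo writes to syslog. The layout is assumed to be
# "<prefix> sudo: <user> : TTY=... ; PWD=... ; USER=<runas> ; COMMAND=<command>".
SUDO_LINE = re.compile(
    r"sudo(?:\[\d+\])?:\s+(?P<user>\S+)\s+:\s+"
    r"(?:.*?;\s+)?USER=(?P<runas>\S+)\s+;\s+COMMAND=(?P<command>.*)$"
)


def parse_sudo_line(line):
    """Return a dict with user, runas and command, or None for unrelated lines."""
    match = SUDO_LINE.search(line)
    if match is None:
        return None
    return {key: value.strip() for key, value in match.groupdict().items()}


def summarize(lines):
    """Group the logged commands by the user who invoked sudo."""
    by_user = {}
    for line in lines:
        entry = parse_sudo_line(line)
        if entry:
            by_user.setdefault(entry["user"], []).append(entry["command"])
    return by_user


if __name__ == "__main__":
    # Illustrative sample lines, not taken from a real server.
    sample = [
        "Aug 18 10:53:25 web01 sudo:    alice : TTY=pts/0 ; PWD=/home/alice ; "
        "USER=root ; COMMAND=/usr/bin/vi /etc/httpd.conf",
        "Aug 18 10:55:02 web01 sshd[812]: Accepted publickey for bob",
    ]
    for user, commands in summarize(sample).items():
        print(user, commands)
```

Note that this only covers what went through sudo as a separate command; anything done inside a root shell started with sudo would show up as a single shell invocation.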

There's a commercial product called "BladeLogic" (oddly named, since it
has nothing specifically to do with blade servers, but there you go)
which we'll most likely be putting in place next year here, for our
100+ unix boxes. It has all the logging, rollback, things like "change
the encryption on all apache instances in the DMZ" type logic, and a ton
of other stuff. Scheduling as well. They'll come out & give you the
dog&pony show; we had the demo and it looks pretty good. A friend of
mine went to work for them and he's pretty cynical generally, but he's
very enthused about this; for a while after he went there he'd call and
tell me "Hey, you know that quarterly patching you guys do? I've got a
module that does it hands-off", and so on. Looks like a solid tool,
and not obscenely expensive.

> Unfortunately I can't use snmp, since that only lists services *currently*
> running, no cronjobs and no configuration files etc.


Yup. Same reasons we went looking for something else, and when budget
allows (next fiscal year) we'll most likely go with it.

> Please let me know if you know any software for this purpose.


Likewise; we prefer Open Source for several reasons, and I'd love to
hear about other options as well. But, sometimes, buying a commercial
package makes sense.

Dave Hinz

David Magda

2005-08-18, 8:50 pm

Dave Hinz <DaveHinz@spamcop.net> writes:

> On Thu, 18 Aug 2005 10:53:25 +0200, Vincent van Scherpenseel wrote:

[...]
>
> Likewise; we prefer Open Source for several reasons, and I'd love to
> hear about other options as well. But, sometimes, buying a commercial
> package makes sense.


You may be interested in some of the papers and the mailing list over
at:

http://www.infrastructures.org/

There's a similar mailing list for network people:

http://www.greatcircle.com/lists/network-automation/

--
David Magda <dmagda at ee.ryerson.ca>
Because the innovator has for enemies all those who have done well under
the old conditions, and lukewarm defenders in those who may do well
under the new. -- Niccolo Machiavelli, _The Prince_, Chapter VI
David Magda

2005-08-18, 8:50 pm

Vincent van Scherpenseel <reply@newsgroup.invalid> writes:

> Our company has more than 100 servers running all different kinds of
> services which are currently all documented. The problem is: after a
> couple of months the piece of paper will be worthless if it doesn't
> get updated by the system administrators logging what they changed.


For procedures a Wiki could be useful. A weblog where people can post
could also be useful for simple "heads up" posts about changes or
planned changes.

> There are several administrators working on the servers and the
> problem is that not everything which gets changed will be
> logged. People forget about it, or just don't care to log.


A combination of restricting root access, using sudo, and something
like RCS/CVS/Subversion may encourage people to 'follow procedures'.

> Management has now issued a new policy requiring *everyone* to log
> the changes. Unfortunately, checking all the servers if the
> administrators are living up to the policy is a very time-consuming
> task.


Discipline comes from inside, not from outside. (I think I got that
from a fortune cookie.)

> Unfortunately I can't use snmp, since that only lists services
> *currently* running, no cronjobs and no configuration files etc.


SNMP (or other monitoring system that uses SNMP) should be looked at to
help monitor how things are running. The system administrators should
be one of the first people to know when things aren't working
properly. Something like Nagios doesn't cost a penny, and isn't too
difficult to set up.

> The servers are running different versions of Linux and FreeBSD.


I would look at radmind:

http://rsug.itd.umich.edu/software/radmind/

Perhaps cfengine as well:

http://www.cfengine.org/

In another post I mention infrastructures.org; go through the mailing
list archives as this has been discussed a couple of times.

--
David Magda <dmagda at ee.ryerson.ca>
Because the innovator has for enemies all those who have done well under
the old conditions, and lukewarm defenders in those who may do well
under the new. -- Niccolo Machiavelli, _The Prince_, Chapter VI
Michael Vilain

2005-08-18, 8:50 pm

In article <m23bp6lrz2.fsf@gandalf.local>,
David Magda <dmagda+trace050401@ee.ryerson.ca> wrote:

> Vincent van Scherpenseel <reply@newsgroup.invalid> writes:
>
>
> For procedures a Wiki could be useful. Also, a weblog where people can
> post could also be useful for simple "heads up" posts about changes or
> planned changes.
>
>
> A combination of restricting root access, using sudo, and something
> like RCS/CVS/Subversion may encourage people to 'follow procedures'.
>
>
> Discipline comes from inside, not from outside. (I think I got that
> from a fortune cookie.)
>
>
> SNMP (or other monitoring system that uses SNMP) should be looked at to
> help monitor how things are running. The system administrators should
> be one of the first people to know when things aren't working
> properly. Something like Nagios doesn't cost a penny, and isn't too
> difficult to set up.
>
>
> I would look at radmind:
>
> http://rsug.itd.umich.edu/software/radmind/
>
> Perhaps cfengine as well:
>
> http://www.cfengine.org/
>
> In another post I mention infrastructures.org; go through the mailing
> list archives as this has been discussed a couple of times.


All the reporting systems in the world aren't going to help you at all
if the admins aren't held accountable for any unreported changes they
make. We had someone like this at my last contract but she was the
boss's "favorite" and could do no wrong. So she got away with it. The
rest of us had to do project plans, design review, and change control
for every change. At least we could test stuff on the "test" systems.
Development systems were essentially production environments where the
developers lived and played. They didn't like the fact that they had to
request changes that had to be reviewed but their boss just told them to
shut up and he took the heat.

One of the things they implemented was taking root away from everyone
but line managers. They had sealed envelopes with root passwords that
they only opened if there was an outage that precluded sudo or
Powerbroker from running (networked sudo with a non-shelling vi). Since
that was the only way to get root and it logged to a central system, we
had records of changes people made. Another thing that was a lifesaver
was always requiring a reboot after every change to ensure that (a) the
change didn't screw anything up and it would boot and (b) no flakey
hardware had cropped up. Otherwise, each system was usually up for 3-6
months.

Be prepared for staff turnover if you implement this type of
environment. Some admins just won't like it and will rebel, either
leaving or being fired. If you have older admins, they tend to like
controlled environments like this. It allows for fewer emergency calls.

--
DeeDee, don't press that button! DeeDee! NO! Dee...



Dave Hinz

2005-08-19, 5:54 pm

On Thu, 18 Aug 2005 17:42:51 -0700, Michael Vilain <vilain@spamcop.net> wrote:
>
> All the reporting systems in the world aren't going to help you at all
> if the admins aren't held accountable for any unreported changes they
> make.


Right. Which is why if you use a tool which _automates_ the
documentation, it's the best way to accomplish it. If it automates
documentation and makes the job easier (let the admin figure out "what
to do" and then have a "and now go do that 100 times" button) it's more
likely to be used and welcomed by your staff.

> One of the things they implemented was taking root away from everyone
> but line managers.


I'm not sure what "line manager" means in your world, but here's how we
do it - we _have_ the root passwords in an encrypted database, but the
only time I use them is when I'm doing something that involves logging
in to single-user mode (usually patch clusters). All the day-to-day
work is done with sudo which is logged (individually on the servers at
this time, so of limited value except for investigating if something
went wrong).

> Another thing that was a lifesaver
> was always requiring a reboot after every change to ensure that (a) the
> change didn't screw anything up and it would boot and (b) no flakey
> hardware had cropped up.


Every change? Ouch. Our policy is that if you do something that needs
to start at boot, you test it by running the rc?.d script that init will
run at boot, to start it up. Has been pretty successful, but we only
have 6 Unix admins to keep honest, so it's not too bad.

> Be prepared for staff turnover if you implement this type of
> environment. Some admins just won't like it and will rebel, either
> leaving or being fired. If you have older admins, they tend to like
> controlled environments like this. It allows for fewer emergency calls.


Well, that's why making it painless and pleasant is preferable to being
dictatorial. But yeah, if people don't want to be accountable for what
they do, that's a problem.

Dave Hinz
Michael Vilain

2005-08-19, 5:54 pm

In article <3mmce5F17l8afU11@individual.net>,
Dave Hinz <DaveHinz@spamcop.net> wrote:

> On Thu, 18 Aug 2005 17:42:51 -0700, Michael Vilain <vilain@spamcop.net> wrote:
>
> Right. Which is why if you use a tool which _automates_ the
> documentation, it's the best way to accomplish it. If it automates
> documentation and makes the job easier (let the admin figure out "what
> to do" and then have a "and now go do that 100 times" button) it's more
> likely to be used and welcomed by your staff.


Here's the meat of the problem we had: how do you document changes to a
system? Unless you have a snapshot before and after of every relevant
file, software configuration (e.g. oracle layout on raw filesystems) and
hardware configuration, how do you figure out what's changed via a
script or program? For some of the big systems, it would take quite a
long time to run an auditing program and a lot of storage to keep those
records.
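
For the plain-file part of that before/after snapshot, a rough sketch in Python follows. It keeps only a SHA-256 digest per file, so the stored record stays small, at the cost of showing that a file changed rather than what changed in it. The function names and the demo file names are made up for illustration, and it does nothing for raw Oracle volumes or hardware configuration.

```python
import hashlib
import os
import tempfile


def snapshot(root):
    """Map each regular file under root (as a relative path) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device nodes and the like
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def compare(before, after):
    """Return sorted lists of added, removed and modified relative paths."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return added, removed, modified


if __name__ == "__main__":
    # Demo on a throwaway directory rather than a real /etc.
    with tempfile.TemporaryDirectory() as root:
        conf = os.path.join(root, "httpd.conf")
        with open(conf, "w") as handle:
            handle.write("Listen 80\n")
        before = snapshot(root)
        with open(conf, "w") as handle:
            handle.write("Listen 8080\n")
        with open(os.path.join(root, "new.conf"), "w") as handle:
            handle.write("added later\n")
        print(compare(before, snapshot(root)))
```

Reading whole files into memory is fine for configuration files; for very large files the digest would have to be fed in chunks.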

I was a fan of a "site logbook" for each system that required the admin
to fill out a running dialog of what they were doing as they were doing
it during a change (aka a "devolution"). But this model breaks down
with a roomful of servers. We had a summer intern create an Access
database that we all had to fill out _daily_ of what each of us did. It
was mailed to managers and section heads daily. If there was a system
change or outage scheduled for that night, there'd better be an entry in
the database for it the next morning unless you're still working on it.
It also helped the various shifts communicate with each other so we knew
what happened last night just by reading the email (I also scanned log
files just to check--I caught a few problems by noticing differences in
"that's not what it's usually like").

The automation method only really works if you have each system audited
down to the serial number on each board in terms of hardware and total
software configuration. Many of the servers were "one-offs" running a
single application (e.g. finance or MRP or documentation or
trouble-tickets or email & calendaring or the 100+ company-private web
sites). Knowing what disks had what on them, their layout, memory, disk
controllers, tape drives, and even the crontabs that ran nightly was all
important and had to be tracked.

>
>
> I'm not sure what "line manager" means in your world, but here's how we
> do it - we _have_ the root passwords in an encrypted database, but the
> only time I use them is when I'm doing something that involves logging
> in to single-user mode (usually patch clusters). All the day-to-day
> work is done with sudo which is logged (individually on the servers at
> this time, so of limited value except for investigating if something
> went wrong).


Line managers at this place were those that managed the people that did
things. They didn't do things themselves except to assign tasks,
prioritize and go to endless meetings. They may have done the grunt
work some years ago, but have since become a manager of grunts.

>
>
> Every change? Ouch. Our policy is that if you do something that needs
> to start at boot, you test it by running the rc?.d script that init will
> run at boot, to start it up. Has been pretty successful, but we only
> have 6 Unix admins to keep honest, so it's not too bad.


Well, some developers are rather blithe about changing system parameters
because Oracle or some vendor tells them to do so. Some of those
parameters affect things like the SGA or the maximum open files. We had
no problem changing them on the development systems overnight with a
reboot to test if the change screwed up the startup of Oracle or the
backups or other stuff. It was really a life saver.

Stuff like adding printers or accounts or day-to-day stuff that's in
written procedures wasn't at issue. It was system changes that merited
the reboot verification. Developers changing the Oracle environment had
the wrath of their boss to deal with, and that wasn't really our problem.
We'd sort of get bent out of shape when they changed something that
caused Oracle to fail to restart after backups.

>
>
> Well, that's why making it painless and pleasant is preferable to being
> dictatorial. But yeah, if people don't want to be accountable for what
> they do, that's a problem.
>
> Dave Hinz


Well, the last contract _was_ rather dictatorial about such things.

--
DeeDee, don't press that button! DeeDee! NO! Dee...



Dave Hinz

2005-08-19, 5:54 pm

On Fri, 19 Aug 2005 13:19:35 -0700, Michael Vilain <vilain@spamcop.net> wrote:
> In article <3mmce5F17l8afU11@individual.net>,
> Dave Hinz <DaveHinz@spamcop.net> wrote:


>
> Here's the meat of the problem we had: how do you document changes to a
> system?


Document, or document _usably_? Sadly, not a lot of overlap.

> Unless you have a snapshot before and after of every relevant
> file, software configuration (e.g. oracle layout on raw filesystems) and
> hardware configuration, how do you figure out what's changed via a
> script or program?


Right. A centralized tool that allows you to make the changes, provides
snapshots, and easy backout and automation would be the ideal. From
their claims, BladeLogic is just that, and the fact that a trusted
friend who now works for them is still enthusiastic about it leads me to
believe that it's more true than "marketing fluff".


> For some of the big systems, it would take quite a
> long time to run an auditing program and a lot of storage to keep those
> records.


Well, if all changes are made by a mechanism that tracks, then you know
what changes are made, by definition. It's a different way of working,
though, and the best way to get something like that adopted is to have
using it be less work than doing it the normal way. If it's harder
_and_ a hassle, it'll get ignored.

> I was a fan of a "site logbook" for each system that required the admin
> to fill out a running dialog of what they were doing as they were doing
> it during a change (aka a "devolution"). But this model breaks down
> with a roomful of servers. We had a summer intern create an Access
> database that we all had to fill out _daily_ of what each of us did. It
> was mailed to managers and section heads daily. If there was a system
> change or outage scheduled for that night, there'd better be an entry in
> the database for it the next morning unless you're still working on it.


Or unless it's Monday and you completely forgot what you did Friday.

> It also helped the various shifts communicate with each other so we knew
> what happened last night just by reading the email (I also scanned log
> files just to check--I caught a few problems by noticing differences in
> "that's not what it's usually like").


I wish I had the time to know my logfiles personally, but with 6 guys
and 100-ish servers, it's just not going to happen. Hell, I can't even
remember all the sites we host anymore.

> The automation method only really works if you have each system audited
> down to the serial number on each board in terms of hardware and total
> software configuration. Many of the servers were "one-offs" running a
> single application (e.g. finance or MRP or documentation or
> trouble-tickets or email & calendaring or the 100+ company-private web
> sites).


Well, to some extent, I think. Again, we don't have it in yet so I'm
somewhat speculating, but...our webserver cluster is a series of
identical enough boxes. If, for instance, I want to turn off some
encryption method on all apache instances...let's see, that's close to
100 of 'em. Too many files to edit by hand for my comfort. In that
case, the boxes don't need to be identical, and the files are _not_
identical, but on all of 'em I need to change the line which says
"blah +blurgh"
....to just say
"blurgh"

Sure, I could do some foreach server in (list) type thing, but there's
no tracking. If I use the tool for it, it's tracked, the old version of
the file is saved, and I can revert if I need to. All of these things
are, of course, scriptable, this just puts a framework and a boatload of
sample scripts to start with.
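
The "tracked, old version saved, can revert" part can be sketched in a few lines of Python. This is not how BladeLogic does it (nothing above says how it works internally); it is just an illustration of the idea, run locally on one file. The function names, the backup naming scheme, the log format, and the `admin` parameter are all made up for the example.

```python
import os
import shutil
import tempfile
import time


def tracked_replace(path, old, new, log_path, admin):
    """Replace old with new in the file at path, keeping a backup and a log entry.

    Returns the backup path, or None if the file did not contain old.
    """
    with open(path, "r", encoding="utf-8") as handle:
        original = handle.read()
    if old not in original:
        return None  # nothing to change, so nothing to log
    backup = f"{path}.{time.strftime('%Y%m%d%H%M%S')}.bak"
    shutil.copy2(path, backup)  # keep the previous version for rollback
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(original.replace(old, new))
    with open(log_path, "a", encoding="utf-8") as log:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        log.write(f"{stamp} {admin} {path} backup={backup}\n")
    return backup


def revert(path, backup):
    """Restore the file from a backup made by tracked_replace."""
    shutil.copy2(backup, path)


if __name__ == "__main__":
    # Demo on a scratch file; the option text comes from the example above.
    workdir = tempfile.mkdtemp()
    conf = os.path.join(workdir, "ssl.conf")
    with open(conf, "w", encoding="utf-8") as handle:
        handle.write("blah +blurgh\n")
    saved = tracked_replace(conf, "blah +blurgh", "blurgh",
                            os.path.join(workdir, "changes.log"), "dave")
    print("backup written to", saved)
```

Looping this over a list of servers would still need some way to reach each box, which is the part a foreach over ssh or a central tool would have to supply.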

> Knowing what disks had what on them, their layout, memory, disk
> controllers, tape drives, and even the crontabs that ran nightly was all
> important and had to be tracked.


Yup.

> Line managers at this place were those that managed the people that did
> things. They didn't do things themselves except to assign tasks,
> prioritize and go to endless meetings. They may have done the grunt
> work some years ago, but have since become a manager of grunts.


Ah. The position I'm trying to avoid, at least for now. Got it.

> Well, some developers are rather blithe about changing system parameters
> because Oracle or some vendor tells them to do so. Some of those
> parameters affect things like the SGA or the maximum open files. We had
> no problem changing them on the development systems overnight with a
> reboot to test if the change screwed up the startup of Oracle or the
> backups or other stuff. It was really a life saver.


Something to consider, anyway, yes.

> Well, the last contract _was_ rather dictatorial about such things.


Customer gets to set the rules, after all. We've got some, well, let's
just say large financial institutions whose names probably appear "in
your wallet" that we deal with, and the demands of some of them are
pretty strict. It's doubly ironic when those same companies show up on
the front page of the WSJ for data security breaches, which, if they
followed what they force us to follow, couldn't happen.

Topic drift anyone? Sorry about that.

Ulrich Herbst

2005-08-25, 6:04 pm

Vincent van Scherpenseel <reply@newsgroup.invalid> writes:

> Is there any software out there which is able to check remote servers on
> their running services and their configuration? I need to know which
> services are running, where their configuration lives, where they're
> logging to, where their data is stored (if any), what their dependencies
> are, which cronjobs are planned and when, ...


Try ServDoc
http://servdoc.sourceforge.net/

It documents many "standard" services, configurations, and so on.
All you need to do is to run it (it's just one Perl script) on a
regular basis and collect the results centrally.

It's easy to add documentation for new services.
Or ask the maintainer :-)
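
As a small illustration of the kind of per-host collection involved (this is not ServDoc's code, which is Perl), here is a Python sketch that answers the "which cronjobs are planned and when" part. It assumes the text is in user-crontab format, as printed by `crontab -l`; system crontabs such as /etc/crontab carry an extra user field and would need one more column. The function name and the sample entries are made up.

```python
import re

# Lines such as MAILTO=root or PATH = /bin:/usr/bin set the environment for cron jobs.
ENV_ASSIGNMENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*\s*=")


def parse_crontab(text):
    """Return (schedule, command) pairs from user-crontab text."""
    jobs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if ENV_ASSIGNMENT.match(line):
            continue  # environment setting, not a job
        if line.startswith("@"):
            parts = line.split(None, 1)  # shortcuts such as @daily or @reboot
            if len(parts) == 2:
                jobs.append((parts[0], parts[1]))
            continue
        fields = line.split(None, 5)  # five time fields, then the command
        if len(fields) == 6:
            jobs.append((" ".join(fields[:5]), fields[5]))
    return jobs


if __name__ == "__main__":
    sample = """\
# nightly backup
MAILTO=root
0 3 * * * /usr/local/bin/backup.sh --full
@reboot /usr/local/bin/start-app
"""
    for schedule, command in parse_crontab(sample):
        print(schedule, "->", command)
```

Run on each server and shipped to one place, output like this can be compared between runs to spot jobs that were added or dropped without a log entry.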

Uli

--
'''
(0 0)
+------oOO----(_)--------------+
| |
| Ulrich Herbst |
| |
| Tel. ++49-7271-940775 |
| |
| Ulrich.Herbst@gmx.de |
+-------------------oOO--------+
|__|__|
|| ||
ooO Ooo
jpd

2005-09-20, 6:05 pm

Begin <m23bp6lrz2.fsf@gandalf.local>
On 2005-08-18, David Magda <dmagda+trace050401@ee.ryerson.ca> wrote:
[snip!]
>
> I would look at radmind:
>
> http://rsug.itd.umich.edu/software/radmind/
>
> Perhaps cfengine as well:
>
> http://www.cfengine.org/
>
> In another post I mention infrastructures.org; go through the mailing
> list archives as this has been discussed a couple of times.


Thanks for the links and sorry for re-awakening a rather old thread,
'twas interesting enough to drop the question here;

Are there experiences with the arusha project/ARK here? I had a shot
at it once, but then that test-server got usurped in one of the many
other projects and I kinda forgot about it. It is Python based which I
don't like too much, but if the benefits are big enough that's easily
overlooked, of course.

http://ark.sourceforge.net/


--
j p d (at) d s b (dot) t u d e l f t (dot) n l .
Copyright 2003 - 2008 webservertalk.com