
    Removing a RAID disk  
Iain Miller




 
11-13-05 10:50 PM

I am pretty new to Linux & at this stage I am just playing with it on a
spare PC with a view to setting this up as a simple file server. I am using
Mandrake 10.

The machine has 3 disks in it - one holds the OS and the other two are on a
separate controller (1 channel for each) and are configured as a RAID 1
array. So I have "hda" with the OS on it & "hde" & "hdg", which are
partitioned as Linux RAID (using diskdrake) & then added to md0 & formatted
as a single ext3 partition. All the disks are IDE.

These will be for data storage only. I have got Samba working and I can see
& map folders/drives from my Windows workstations - all good so far.
"cat /proc/mdstat" shows I have a working RAID 1 array. I've done most of
this using Mandrake's GUI tools & Webmin.

As it turns out one of the two Raid disks (which are brand new) is making an
irritating squeaking noise (this disk is hdg) & I need to remove it and RMA
it - an ideal opportunity to learn how to take a failed disk out & replace
it! Only it hasn't actually failed so Linux doesn't know there's a problem &
isn't running the array in degraded mode.

So the question is simply this - how do I stop one of the disks & reduce
the thing to running in degraded mode so I can power the machine down,
remove the disk & then power the machine back up & have it continue to run
for the next week or so on just one disk in the RAID array while I RMA the
other one? Once I get the new disk back I then need to power the thing down
again, install the new disk and get the RAID array to rebuild the mirror.

So far I tried just powering it down & unplugging the disk & it doesn't
like that when it reboots. Then I tried removing the disk from the RAID
array (using diskdrake) and it doesn't like that either (says you can't have
only 1 disk in the array).

I've tried unmounting the array (again using diskdrake) and it won't let me
because it says the device or resource is busy (can't figure out why, I
tried stopping Samba & logged off any Windows PCs that might be mapping
drives on it).

Spent two hours trawling Google for a simple "a,b,c" on how to do this & I
can't find one.

Any help much appreciated.

I.










    Re: Removing a RAID disk  
Aragorn




 
11-13-05 10:50 PM

On Sunday 13 November 2005 15:00, Iain Miller stood up and spoke the
following words to the masses in /alt.os.linux.mandrake...:/

> I am pretty new to Linux & at this stage I am just playing with it on
> a spare PC with a view to setting this up as a simple file server. I
> am using Mandrake 10
>
> The machine has 3 disks in it - one holds the OS and the other two are
> on a separate controller (1 channel for each) and are configured as a
> Raid 1 array. So I have "hda" with the OS in it & "hde" & "hdg" which
> are partitioned as Linux Raid (using diskdrake) & then added to Md0 &
> formatted as a single ext3 partition. All the disks are IDE

Okay, software RAID then... ;-)

> These will be for data storage only. I have got Samba working and I
> can see & map folders/drives from my Windows workstations - all good
> so far. Cat /proc/mdstat shows I have a working Raid 1 array. I've
> done most of this using Mandrake's GUI tools & Webmin.

GUI tools are handy, but they become a pain in the butt if that's all
you allow yourself to rely on... :-/

> As it turns out one of the two Raid disks (which are brand new) is
> making an irritating squeaking noise (this disk is hdg) & I need to
> remove it and RMA it - an ideal opportunity to learn how to take a
> failed disk out & replace it! Only it hasn't actually failed so Linux
> doesn't know there's a problem & isn't running the array in degraded
> mode.

You could try running /badblocks/ on the disk to make sure.  On the
other hand, the noise could also come from one of the balls in the
bearing of the disk's spindle.  That can be irritating, yes...

> So the question is simply this - how do I stop one of the disks  &
> reduce the thing to running in degraded mode so I can power the
> machine down, remove the disk & then power the machine back up & have
> it continue to run for the next week or so on just one disk in the
> Raid array while I RMA the other one? Once I get the new disk back I
> then need to power the thing down again, re-install the new disk and
> get the Raid array to rebuild the mirror.

As far as I know, you could simply stop the RAID and prevent it from
being initialized upon boot again, since it's a mirrorset and therefore
you still have all your data available.

> So far I tried just powering it down & just unplugging the disk &
> doesn't like that when it reboots.

Define "doesn't like that when it reboots"?  You would of course get to
see error messages or warnings that the array is running in degraded
mode, but it should be perfectly possible to boot up like that and to
have the system running as such for the time being.

> Then I tried removing the disk from the Raid array (using diskdrake)
> and it doesn't like that (says you can't have only 1 disk in the
> array).

True, but you are probably not following the correct procedure.  In
order to tell a software RAID array that you want to remove a disk from
the array, you need to stop the array first.

See the following information...:

man -k raid
man raidstart
man raidstop

;-)

> I've tried unmounting the array (again using diskdrake) and it won't
> let me because it says the device or resource is busy (can't figure
> out why, I tried stopping Samba & Logged off any Windows PCs that
> might be mapping drives on it)

Check whether you have /famd/ running, and verify what processes are
using the filesystem(s).  See the following information...:

man fuser
man lsof

;-)
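
For anyone following along, the fuser and lsof pointers above could be used roughly like this. It is only a sketch: the mount point /mnt/raid is an assumption (the thread never says where md0 is mounted) and the helper name show_holders is made up for illustration. By default it just prints the commands; set DRY_RUN=0 and run it as root to execute them.

```shell
#!/bin/sh
# Sketch only: list whatever is holding a filesystem open before unmounting it.
# MOUNT_POINT is an assumed path; substitute wherever /dev/md0 is mounted.
MOUNT_POINT="${MOUNT_POINT:-/mnt/raid}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: show processes and open files on the mount point.
show_holders() {
    run fuser -vm "$MOUNT_POINT"   # -m: processes using the mounted filesystem, -v: verbose
    run lsof "$MOUNT_POINT"        # open files on the filesystem mounted there
}

show_holders
```

Anything listed (a shell sitting in a directory on the array, famd, a leftover Samba process) would have to go before umount stops reporting the device as busy.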

> Spent two hours trawling google for a simple "a,b,c" on how to do this
> & I can't find one.
>
> Any help much appreciated

There should normally be at least one /HowTo/ installed on your system
in */usr/share/doc* (or below) somewhere regarding software RAID.  It
may be a bit outdated, but it should be usable.

If you can't find the document, check at the Linux Documentation Project
for the appropriate /HowTos./  You can find them here...:

http://www.tldp.org

Hope this helps... ;-)
--
With kind regards,

*Aragorn*
(Registered GNU/Linux user #223157)








    Re: Removing a RAID disk  
Iain Miller




 
11-13-05 10:50 PM


> Okay, software RAID then... ;-)

Indeed.....

> GUI tools are handy, but they become a pain in the butt if that's all
> you allow yourself to rely on... :-/

I am trying to learn some command line stuff but also balancing that off
against actually getting stuff to happen & not screwing the system up!

> You could try running /badblocks/ on the disk to make sure.  On the
> other hand, the noise could also come from one of the balls in the
> bearing of the disk's spindle.  That can be irritating, yes...

The disk is fine - just noisy & as it's brand new it's going back!

> As far as I know, you could simply stop the RAID and prevent it from
> being initialized upon boot again, since it's a mirrorset and therefore
> you still have all your data available.

That's kind of the issue: how do I stop it and just get it to run on one disk
for a week? All the help & how-tos I've seen seem to basically assume you
have a replacement disk to hand - and I don't.
 
>
> Define "doesn't like that when it reboots"?  You would of course get to
> see error messages or warnings that the array is running in degraded
> mode, but it should be perfectly possible to boot up like that and to
> have the system running as such for the time being.

The boot process stops when it tries to start /dev/md0 & offers me a reboot
or dropping to a command line to "repair the raid" (I think it says). I have
no clue what to do at that point!
 
> True, but you are probably not following the correct procedure.  In
> order to tell a software RAID array that you want to remove a disk from
> the array, you need to stop the array first.

Indeed, and I guess I need to unmount it first, which I can't seem to do at
the moment (running processes & all that).
 
>
> Check whether you have /famd/ running, and verify what processes are
> using the filesystem(s).  See the following information...:
>
> There should normally be at least one /HowTo/ installed on your system
> in */usr/share/doc* (or below) somewhere regarding software RAID.  It
> may be a bit outdated, but it should be usable.

As above, all the help I have seen assumes you have a spare disk to
hand.....

> If you can't find the document, check at the Linux Documentation Project
> for the appropriate /HowTos./  You can find them here...:

See above!

> Hope this helps... ;-)

It did but I'm still "in the woods"

thanks

I.










    Re: Removing a RAID disk  
Walter Mautner




 
11-14-05 12:48 PM

Iain Miller wrote:

> I am pretty new to Linux & at this stage I am just playing with it on a
> spare PC with a view to setting this up as a simple file server. I am
> using Mandrake 10
>
> The machine has 3 disks in it - one holds the OS and the other two are on
> a separate controller (1 channel for each) and are configured as a Raid 1
> array. So I have "hda" with the OS in it & "hde" & "hdg" which are
> partitioned as Linux Raid (using diskdrake) & then added to Md0 &
> formatted as a single ext3 partition. All the disks are IDE
>
> These will be for data storage only. I have got Samba working and I can
> see & map folders/drives from my Windows workstations - all good so far.
> Cat /proc/mdstat shows I have a working Raid 1 array. I've done most of
> this using Mandrake's GUI tools & Webmin.
>
> As it turns out one of the two Raid disks (which are brand new) is making
> an irritating squeaking noise (this disk is hdg) & I need to remove it and
> RMA it - an ideal opportunity to learn how to take a failed disk out &
> replace it! Only it hasn't actually failed so Linux doesn't know there's a
> problem & isn't running the array in degraded mode.
>
Guess you have /home on your md0? The output of "fdisk -l" would help here.
Probably you have to go to single-user mode (init 1), unmount the md0 and use
mdadm (you might first want to urpmi the current version) to first stop
and then "degrade" and restart the array.
Carefully read "mdadm --manage --help", in particular the "--fail" and
"--add" options.
A good backup, however, can never hurt.
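
As a rough illustration of the "--fail" route, taking hdg out of the mirror with mdadm might look like the sketch below. Assumptions: the member partition is /dev/hdg1 (the thread names the disk but not the partition number), and the helper name drop_member is hypothetical. As far as I know, "--fail" and "--remove" act on an array that is still running, so the sketch applies them before any shutdown. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: take one half of a two-disk mirror out of service with mdadm.
# ARRAY comes from the thread; MEMBER is an assumption (the thread names the
# disk hdg but not the partition number).
ARRAY="${ARRAY:-/dev/md0}"
MEMBER="${MEMBER:-/dev/hdg1}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: mark the member faulty, then detach it from the array.
drop_member() {
    run mdadm --manage "$ARRAY" --fail "$MEMBER"
    run mdadm --manage "$ARRAY" --remove "$MEMBER"
    run cat /proc/mdstat    # the mirror should now be listed with one member missing
}

drop_member
```

After that the machine can be powered down and the disk pulled; whether it then boots cleanly depends on how the distribution's startup scripts bring up md0.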









    Re: Removing a RAID disk  
Iain Miller




 
11-14-05 12:48 PM


> Guess you have /home on your md0? A "fdisk -l" output would help here.
> Probably you have to go single-user mode (init 1), unmount the md0 and use
> mdadm (you might first want to urpmi for the current version) to first
> stop and then "degrade" and restart the array.
> Carefully read "mdadm --manage --help", in particular the "--fail" and
> "--add" options.

Walter, thanks for that.

You are right about the home directories - I moved them back to the boot
disk & that allowed me to unmount the Raid.

I tried to install mdadm but couldn't - something to do with some
"dependency" - I guess some other package I need to find first. Not yet
familiar with urpmi & what it does but I'll go & look.

I tried the "setraidfaulty" command & cat /proc/mdstat reported the disk
with an [F] flag next to it. Even after doing that, when I removed the disk
from the machine it wouldn't boot & dropped to a "raidrepair" command line. As
per my other message to Aragorn, all the stuff I've seen about replacing
raid disks seems to assume you have a spare disk to hand & I don't - I need
to run the thing on one disk for a week or so while I RMA the other one.
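
A small check on /proc/mdstat can confirm whether the kernel really regards the mirror as degraded. The sketch below assumes the usual status field, where a healthy two-disk mirror shows [UU] and an underscore (for example [U_]) marks a slot with no working disk. The helper name degraded_arrays is made up for illustration.

```shell
#!/bin/sh
# Sketch only: report md arrays whose status field shows a missing member.
# In /proc/mdstat a healthy two-disk mirror shows [UU]; an underscore, as in
# [U_], marks a slot with no working disk.

# Hypothetical helper: reads mdstat-formatted text on stdin, prints degraded arrays.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }    # remember the array named on this line
        /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
    '
}

# Use the live file when it exists (Linux with the md driver loaded).
if [ -r /proc/mdstat ]; then
    degraded_arrays < /proc/mdstat
fi
```

If this prints md0 while the array is still mounted and usable, the array is running on one disk, which is the state wanted for the week the other disk is away.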

> A good backup, however, can never hurt.

Indeed so :-) - fortunately there is basically no data on the thing at this
time. However, this is all good learning stuff for what I'd need to do in
the event of a real failure with real data at stake.

Because of the above (re seeming to need a disk to hand) I'm leaning towards
having to copy everything off the raid, destroying the raid & repartitioning
the remaining disk as a simple file system so that I can physically remove
the bad disk - & then copying everything back to the remaining disk. Having
done that when I get a replacement I would have to copy everything off the
thing again, add the new disk & then recreate the Raid from scratch & then
copy everything back onto it. That said I am going to continue to try & get
the thing to work the way I want it to.

A bit dull, but the Raid will still be doing its job & preserving my data in
the event of a disk failure & I will still be able to get the data off the
disk(s).

On another (but related) subject - can you think of a way that I could get
the box to notify me in the event of a failed disk? Ideally by some kind of
pop-up message across the network rather than e-mail (mostly because I
haven't got as far as trying to configure anything like Postfix yet!)

regards

Iain










    Re: Removing a RAID disk  
BearItAll




 
11-14-05 10:53 PM

On Sun, 13 Nov 2005 22:51:17 +0000, Iain Miller wrote:

> Indeed.....
>
> I am trying to learn some command line stuff but also balancing that off
> against actually getting stuff to happen & not screwing the system up!
>
> The disk is fine - just noisy & as it's brand new it's going back!
>
> That's kind of the issue: how do I stop it and just get it to run on one
> disk for a week? All the help & how-tos I've seen seem to basically assume
> you have a replacement disk to hand - and I don't.
>
> The boot process stops when it tries to start /dev/md0 & offers me a
> reboot or dropping to a command line to "repair the raid" (I think it
> says). I have no clue what to do at that point!
>
> Indeed, and I guess I need to unmount it first, which I can't seem to do at
> the moment (running processes & all that).
>
> As above, all the help I have seen assumes you have a spare disk to
> hand.....
>
> See above!
>
> It did but I'm still "in the woods"
>
> thanks
>
> I.

I think you're getting yourself confused.

When you first set up your raid-1 you added (banked) two drives into an
array for this raid and probably named it. That became, in your case,
raid drive md0. From this point think of the raid drive (or rather
raid array) md0 as if that were a single physical drive. It would be nice
to have an interim word to fit between the physical drives and the
volumes, but we haven't got one so raid array or raid drive is the best we
can do.

physical drive 1\
physical drive 2/-......raid array...volumes


Then you created your partitions on the RAID array md0, maybe loaded
some data/software etc. At this point although your data was present on
the md0, it was not yet present on both physical drives of the array.

That mirroring will have taken some time, depending on how much data was
involved and how busy the server was. 100 GB would trickle across a RAID 1
array in, what, something like 2 hours maybe? Unless you told the array to
be aggressive in its mirroring, which is really only necessary to speed up
the mirroring of very large arrays.

Then you decided you needed to remove one of the drives from the array.

I'm afraid I have to be a bit sparse with detail here, because you're
using a raid controller I haven't used. But, you bring up your control,
you don't actually have to delete all knowledge of the drive from the
array, you only tell it that 'this physical drive' is 'disabled' or
'removed' or 'parked'.

The system then comes up (if yours only offers the raid array control on
startup). Then as far as you the user are concerned your volume is
unaffected, though obviously there is no mirroring taking place.

Then when your drive is ready to be fitted again, put it in, 'enable' or
'unpark' the drive and the mirror will start once the system has settled.

Hypothetically, what you could have done if you had had another drive, is
bank this other drive with the array controller, then when one of the pair
had to be removed this other drive would have been 'enabled' and become
the mirror.

Many that use mirrors will do just that, simply because it can be very
easy to miss one side of a mirror dying, even if it is only missed
because it so rarely fails that we become complacent. Regular swaps, after
ensuring the mirror is complete, mean that all drives in the array are
being tested.

raid-1 is only a two drive array, raid-5 allows a great deal more
flexibility. I don't think there is a direct conversion path from raid-1 to
raid-5. But as you said it's all meant as a training system for you, so
changing to raid-5 may be a good idea.










    Re: Removing a RAID disk  
Walter Mautner




 
11-14-05 10:53 PM

Iain Miller wrote:

> Walter, thanks for that.
>
> You are right about the home directories - I moved them back to the boot
> disk & that allowed me to unmount the Raid.
>
> I tried to install mdadm but couldn't - something to do with some
> "dependency" - I guess some other package I need to find first. Not yet
> familiar with urpmi & what it does but I'll go & look.
>
It's really simple. You may want to google for "easy urpmi" and follow the
advice; you will get a list of commands to paste into a konsole (KDE
terminal) window (do a "su -" beforehand) that will give you online access
to the main, update, contrib, Java and even the plf online sources.
A "urpmi.update -a" afterwards updates the source lists, and then you can
type "urpmi mdadm" and lean back. Your dependencies will be resolved
automagically.
After doing the magic easyurpmi stuff, you can even use Mandriva Control
Centre for regular source updates and browsing for / installing software.
Regarding Linux software raid, the old-fashioned way uses an /etc/raidtab
file with geometry data, while mdadm uses "persistent superblocks" and does
not rely upon the raidtab file anymore, but still keeps some information
in the /etc/mdadm.conf file.
Unfortunately Mandriva's partitioning tool will be unable to show the size of
raid arrays created with mdadm and without the raidtab file, making it act
a bit funny.

> I tried the "setraidfaulty" command & cat /proc/mdstat reported the disk
> with an [F] flag next to it. Even doing that when I removed it from the
> machine it wouldn't boot again & dropped to a "raidrepair" command line.
> As per my other message to Aragorn, all the stuff I've seen about
> replacing raid disks seems to assume you have a spare disk to hand & I
> don't - I need to run the thing on one disk for a week or so while I RMA
> the other one.
>
Hmmm. I have had my box running "degraded" raid1, and no boot problem.
Actually my box boots from a md device in raid1 mode.
If you moved your /home off the raid, however, you can temporarily even
remove the /home stanza from /etc/fstab before you shutdown and remove the
drive.
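
If the boot drops to a repair prompt, starting the mirror by hand on its one remaining disk might look like the sketch below. Assumptions, none of them confirmed in this thread: the array carries persistent superblocks so that mdadm can assemble it, the surviving partition is /dev/hde1, the replacement will appear as /dev/hdg1, and the helper names are invented. Nothing is executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: start a two-disk mirror on its one remaining member, and later
# add the replacement. Partition names are assumptions; the thread only names
# the disks hde and hdg.
ARRAY="${ARRAY:-/dev/md0}"
SURVIVOR="${SURVIVOR:-/dev/hde1}"
REPLACEMENT="${REPLACEMENT:-/dev/hdg1}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: assemble the array from the surviving member only.
# --run asks mdadm to start it even though a member is missing.
start_degraded() {
    run mdadm --assemble --run "$ARRAY" "$SURVIVOR"
}

# Hypothetical helper: once the new disk is partitioned as Linux RAID, add it
# back so the mirror resyncs; progress shows up in /proc/mdstat.
readd_member() {
    run mdadm --manage "$ARRAY" --add "$REPLACEMENT"
}

start_degraded
```

Calling readd_member is left for the day the replacement disk arrives.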
 
>
> Indeed so :-) - fortunately there is basically no data on the thing at
> this time. However, this is all good learning stuff for what I'd need to
> do in the event of a real failure with real data at stake.
>
Mdadm has some advantages compared to the good old raidtools.
Btw., do a "fdisk -l > /tmp/fdisk.txt" and paste the output into a reply.
The same for your fstab file would be nice.

> Because of the above (re seeming to need a disk to hand) I'm leaning
> towards having to copy everything off the raid, destroying the raid &
> repartitioning the remaining disk as a simple file system so that I can
> physically remove the bad disk - & then copying everything back to the
> remaining disk. Having done that when I get a replacement I would have to
> copy everything off the thing again, add the new disk & then recreate the
> Raid from scratch & then copy everything back onto it. That said I am
> going to continue to try & get the thing to work the way I want it to.
>
That should not be necessary. It would contradict the purpose of a raid 
...
> On another (but related) subject - can you think of a way that I could get
> the box to notify me in the event of a failed disk? Ideally by some kind
> of pop-up message across the network rather than e-mail (mostly because I
> havn't got as far as trying to configure anything like Postfix yet!)
>
Mail is the standard notification method, but "man mdadm.conf" will give a
good explanation on alternatives. You may have to use "PROGRAM" and a
script of your choice, maybe Xdialog or linpopup.
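
A handler for the "PROGRAM" route could be as small as the sketch below. It relies on mdadm's monitor mode passing the event name, the md device and, for some events, the component device as arguments. The function name format_md_event is invented, and syslog is used only as a stand-in for whatever pop-up tool ends up delivering the message.

```shell
#!/bin/sh
# Sketch only: a handler that mdadm's monitor mode could call via the PROGRAM
# line in mdadm.conf. mdadm passes the event name, the md device and, for some
# events, the component device.

# Hypothetical helper: turn the arguments into a one-line message.
format_md_event() {
    event="$1"
    array="$2"
    member="${3:-}"
    if [ -n "$member" ]; then
        echo "RAID event $event on $array (device $member)"
    else
        echo "RAID event $event on $array"
    fi
}

# Only act when at least an event name and an array are supplied.
if [ "$#" -ge 2 ]; then
    msg="$(format_md_event "$@")"
    # Delivery is left open: linpopup or Xdialog would slot in here.
    # Logging to syslog is used as a stand-in.
    logger -t mdadm-event "$msg"
fi
```

The script would be named on the PROGRAM line of mdadm.conf and picked up by a running "mdadm --monitor --scan"; where the script lives is up to you.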








    Re: Removing a RAID disk  
Iain Miller




 
11-15-05 01:48 AM

> I think you're getting yourself confused.

wouldn't be a first (!)

> When you first set up your raid-1 you added (banked) two drives into an
> array for this raid and probably named it. That became, in your case,
> raid drive md0. From this point think of the raid drive (or rather
> raid array) md0 as if that were a single physical drive. It would be nice
> to have an interim word to fit between the physical drives and the
> volumes, but we haven't got one so raid array or raid drive is the best we
> can do.

> physical drive 1\
> physical drive 2/-......raid array...volumes

Yep, understand all that

> Then you created your partitions on the RAID array md0, maybe loaded
> some data/software etc. At this point although your data was present on
> the md0, it was not yet present on both physical drives of the array.

> That mirroring will have taken some time, depending on how much data was
> involved and how busy the server was. 100 GB would trickle across a RAID 1
> array in, what, something like 2 hours maybe? Unless you told the array to
> be aggressive in its mirroring, which is really only necessary to speed up
> the mirroring of very large arrays.

OK - but not much data added yet

> Then you decided you needed to remove one of the drives from the array.

Yes

> I'm afraid I have to be a bit sparse with detail here, because you're
> using a raid controller I haven't used.

This is Linux software raid - I'm not sure the controller matters. FWIW it's
a Silicon Image UDMA raid controller - but there is no array set up on the
controller because Linux doesn't support it - there is no driver that I know
of. As such it's basically just acting as a UDMA IDE controller. All the RAID
stuff is done in software under Linux.

> But, you bring up your control, you don't actually have to delete all
> knowledge of the drive from the
> array, you only tell it that 'this physical drive' is 'disabled' or
> 'removed' or 'parked'.

I did that (or at least I thought I did) by trying "setraidfaulty" and I
also tried "raidhotremove". In both cases shutting the box down, removing
the disk & powering the thing back on caused an error that just dropped me
to a "Raidrepair" command prompt.

> The system then comes up (if yours only offers the raid array control on
> startup). Then as far as you the user are concerned your volume is
> unaffected, though obviously there is no mirroring taking place.

That's the theory.....but so far I am unable to reach that point in practice.

> Then when your drive is ready to be fitted again, put it in, 'enable' or
> 'unpark' the drive and the mirror will start once the system has settled.

See above!

> Hypothetically, what you could have done if you had had another drive, is
> bank this other drive with the array controller, then when one of the pair
> had to be removed this other drive would have been 'enabled' and become
> the mirror.

As I said, I don't have a 3rd drive

> Many that use mirrors will do just that, simply because it can be very
> easy to miss one side of a mirror dying, even if it is only missed
> because it so rarely fails that we become complacent. Regular swaps, after
> ensuring the mirror is complete, mean that all drives in the array are
> being tested.

Next step is to find some way to monitor the mirror from my Win XP
workstation - or rather have the Linux box notify me if there is a problem.

> raid-1 is only a two drive array, raid-5 allows a great deal more
> flexibility. I don't think there is a direct convert path from raid-1 to
> raid-5. But as you said its all meant as a training system for you, then
> changing to raid-5 may be a good idea.

Would require a whole lot of hardware (controllers, disks & probably a new
PC) that I don't have.

Raid 1 will do me fine for what I need. I'm just trying to sort out how I
can fix it if it does break. It's all well & good having the thing but not
much help if I can't do that!

thanks for your input

Iain









