Am I endangering my RAID 5 array?

    Am I endangering my RAID 5 array?  
Yeechang Lee


07-07-05 10:47 PM

At home I have a Linux desktop and a headless Linux fileserver with a
software RAID 5 array (see
<URL:http://groups.google.ca/group/comp....5254f5d>
for details).

A few days ago the UPS battery conked out. Besides the battery alarm
ringing every ten seconds and slowly driving me mad, the fileserver is
spontaneously rebooting several times a day; apparently momentary dips
and other irregularities in the power here in downtown San Francisco,
which the UPS had before filtered (and which likely prematurely aged
the battery after replacement only 19 months ago), are causing it to
reboot. (Interestingly, the Linux desktop hasn't hiccuped once;
apparently its power supply is less sensitive.)

The storage array is a pretty straightforward
JFS-on-LVM2-on-software-RAID 5 setup. Each time the server reboots it
usually causes the array to automatically rebuild. Sometimes the
reboots occur during the rebuilding process, causing it to restart.

My question: Am I risking data corruption through the repeated
rebuilds? Should I just shut the server down until the replacement UPS
battery arrives? Or, given that there haven't actually been any
hardware drive failures, are the RAID structure and filesystem robust
enough in the meanwhile?

--
<URL:http://www.pobox.com/~ylee/>			PERTH ----> *
Cpu(s): 48.6% us,  3.3% sy,  0.6% ni, 45.8% id,  1.5% wa,  0.1% hi,  0.0% si
Mem:    515416k total,   481280k used,    34136k free,     8212k buffers
Swap:  3052208k total,  1206032k used,  1846176k free,    56704k cached





    Re: Am I endangering my RAID 5 array?  
Rod Speed


07-07-05 10:47 PM

Yeechang Lee wrote

> At home I have a Linux desktop and a headless
> Linux fileserver with a software RAID 5 array (see
> <URL:http://groups.google.ca/group/comp....a5254f5d>
> for details).

> A few days ago the UPS battery conked out. Besides the battery alarm
> ringing every ten seconds and slowly driving me mad, the fileserver is
> spontaneously rebooting several times a day; apparently momentary
> dips and other irregularities in the power here in downtown San Francisco,
> which the UPS had before filtered (and which likely prematurely aged
> the battery after replacement only 19 months ago), are causing it to reboot.

It's much more likely the UPS itself is the cause of the reboots.

> (Interestingly, the Linux desktop hasn't hiccuped once;
> apparently its power supply is less sensitive.)

Yeah, that's not unusual.

> The storage array is a pretty straightforward
> JFS-on-LVM2-on-software-RAID 5 setup. Each time the server reboots it
> usually causes the array to automatically rebuild. Sometimes the
> reboots occur during the rebuilding process, causing it to restart.

> My question: Am I risking data corruption through the repeated rebuilds?

Yes.

> Should I just shut the server down until the replacement UPS battery arrives?

It would be better to just plug it into the mains without the UPS.

> Or, given that there haven't actually been any
> hardware drive failures, are the RAID structure
> and filesystem robust enough in the meanwhile?

You're risking a reboot in the middle of a write, and with some drives
that can leave significant garbage on the platters.







    Re: Am I endangering my RAID 5 array?  
Yeechang Lee


07-08-05 01:46 AM

Rod Speed wrote:
> It's much more likely the UPS itself is the cause of the reboots.

Makes sense. Or, to put it more accurately, likely there are
fluctuations in the power which are relatively harmless in real life
but which the UPS dutifully tries to fix up anyway, but of course
can't because the battery is out (it's a nice model, in which the
battery is always providing the power regardless of whether power is
actually available or not, thus eliminating downtime when the power
does go out).
 
> It would be better to just plug it into the mains without the UPS.

I'll make the switch when I get home.
 
> Yes.

On the other hand, the only thing that's writing on the drive at the
moment is BitTorrent downloads, and that is an inherently
self-correcting mechanism, so I'm not too worried.

Filesystemwise, I'll run a fsck of the entire RAID once I remove the
UPS from the equation. (I'm curious as to how long it'll take on a
2.8TB array!)

--
<URL:http://www.pobox.com/~ylee/>			PERTH ----> *
Cpu(s): 49.1% us,  3.3% sy,  0.6% ni, 45.3% id,  1.5% wa,  0.1% hi,  0.0% si
Mem:    515416k total,   477372k used,    38044k free,     3720k buffers
Swap:  3052208k total,  1244744k used,  1807464k free,    67164k cached





    Re: Am I endangering my RAID 5 array?  
Rod Speed


07-08-05 01:46 AM

Yeechang Lee wrote
> Rod Speed wrote
 
> Makes sense. Or, to put it more accurately, likely there
> are fluctuations in the power which are relatively harmless
> in real life but which the UPS dutifully tries to fix up anyway,
> but of course can't because the battery is out

It's much more likely that it isn't actually attempting to switch
to the battery due to sags in the mains at that high rate.

> (it's a nice model, in which the battery is always providing the
> power regardless of whether power is actually available or not,

And that's why it's likely that it's not sags in the mains, just
the UPS not being able to work properly with failing batteries.
It's likely got a shorted cell, and that means that the voltage
available from the battery isn't enough to provide a high enough
UPS output voltage to keep the server power supply happy now.

> thus eliminating downtime when the power does go out).

Yeah, always-on UPSs are by far the best approach.

Tho they do have that downside if the batteries have gone bad.

I bet the reason the server reboots and the desktop
doesn't is just because the server has a much higher
load on its power supply and so its internal caps can't
ride thru much of a sag in the mains it sees from the UPS.
 
> I'll make the switch when I get home.
 
> On the other hand, the only thing that's writing on
> the drive at the moment is BitTorrent downloads,

That's not correct while the rebuilds are running; they write to the drives too.

> and that is an inherently self-correcting
> mechanism, so I'm not too worried.

Sure, the main potential problem is that some drives
don't handle a power down while writing very well and
can produce bad sectors on the drive as a result of that.

> Filesystemwise, I'll run a fsck of the entire RAID
> once I remove the UPS from the equation. (I'm
> curious as to how long it'll take on a 2.8TB array!)

Yeah, it will be an interesting test.







    Re: Am I endangering my RAID 5 array?  
Bill Todd


07-08-05 07:46 AM

Yeechang Lee wrote:

...

> On the other hand, the only thing that's writing on the drive at the
> moment is BitTorrent downloads, and that is an inherently
> self-correcting mechanism, so I'm not too worried.

A more insidious problem could be corrupted parity data, which you might
never see until some other failure occurred and you suddenly had to
depend upon it.  Validating (or, if that's not an available option, just
forcing a complete rebuild of) the parity data after you've eliminated
the problem of frequent restarts might be prudent (if you're truly
paranoid, you'll back all the data up first).

- bill
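
To make the stale-parity concern concrete, here is a minimal Python sketch of single-parity (XOR) striping. It is a toy model, not the Linux md driver: the block contents, the three-data-block layout, and the helper names `xor_blocks` and `reconstruct` are all assumptions made for illustration. It simulates a data block reaching the disk while the matching parity update is lost, then shows what a later rebuild of a different block produces.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe, parity, lost_index):
    """Rebuild the block at lost_index from the surviving blocks plus parity."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors + [parity])


# A toy stripe: three data blocks plus one parity block.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)

# Healthy case: losing block 1 is recoverable from the rest of the stripe.
healthy_rebuild = reconstruct(stripe, parity, lost_index=1)

# Simulated interrupted write: block 0 reaches the disk, but power is lost
# before the matching parity update, so the old parity is now stale.
stripe[0] = b"ZZZZ"
stale_parity = parity

# Later, block 1 is lost and has to be rebuilt from the stale parity.
bad_rebuild = reconstruct(stripe, stale_parity, lost_index=1)

print(healthy_rebuild == b"BBBB")  # True: the rebuild matches the original
print(bad_rebuild == b"BBBB")      # False: the rebuild is silently wrong
```

Every data block still reads back correctly after the interrupted write, which is why this kind of damage can go unnoticed until a drive actually fails.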





    Re: Am I endangering my RAID 5 array?  
Yeechang Lee


07-08-05 07:46 AM

Bill Todd wrote:
> A more insidious problem could be corrupted parity data, which you
> might never see until some other failure occurred and you suddenly
> had to depend upon it.  Validating (or, if that's not an available
> option, just forcing a complete rebuild of) the parity data after
> you've eliminated the problem of frequent restarts might be prudent

Good point. However, I'm not aware of a way of forcing a parity
rebuild in Linux software RAID except for marking a drive as failed
then reinserting it into the array. In any case, the resulting resync
shouldn't be any different than the automatic postboot resyncing the
array is doing right now (after indeed having eliminated the
random-restarting problem by bypassing the faulty UPS), right?

> (if you're truly paranoid, you'll back all the data up first).

If you know of a cost- and time-effective way of backing up a 2.8TB
storage array being used for personal purposes, please let me
know. I'm not being flippant; if there is such a thing, I'd really
like to know! But I'm pretty sure there isn't one.

--
<URL:http://www.pobox.com/~ylee/>			PERTH ----> *
Cpu(s): 50.5% us,  3.4% sy,  0.7% ni, 43.8% id,  1.5% wa,  0.1% hi,  0.0% si
Mem:    515416k total,   479540k used,    35876k free,    11148k buffers
Swap:  3052208k total,  1405240k used,  1646968k free,    92024k cached





    Re: Am I endangering my RAID 5 array?  
Bill Todd


07-08-05 07:46 AM

Yeechang Lee wrote:
> Bill Todd wrote:
> Good point. However, I'm not aware of a way of forcing a parity
> rebuild in Linux software RAID except for marking a drive as failed
> then reinserting it into the array. In any case, the resulting resync
> shouldn't be any different than the automatic postboot resyncing the
> array is doing right now (after indeed having eliminated the
> random-restarting problem by bypassing the faulty UPS), right?

I'm not sufficiently familiar with the Linux design to say.  If it makes
no attempt to log what it's doing and simply does a brute-force complete
rebuild of *all* the parity information after an interruption, then yes.

> If you know of a cost- and time-effective way of backing up a 2.8TB
> storage array being used for personal purposes, please let me
> know. I'm not being flippant; if there is such a thing, I'd really
> like to know! But I'm pretty sure there isn't one.

What is cost- and time-effective really depends upon the relationship
between the value you place on your data and the value you place on
other things.  Or, to look at it another way, data you don't back up is
by definition not worth backing up (which means that any data that *is*
worth backing up must be placed on storage which it is feasible to back up).

A solid RAID implementation has its own built-in paranoia and shouldn't
make an already-bad situation worse during a rebuild.  For example, if it
finds a hard-to-read sector it will *really* try to read it rather than
immediately go to the rest of the stripe to rebuild it, just in case
whatever affected the original sector also left something else in the
stripe inconsistent (parity being the most obvious possibility, since it
would have been written at about the same time).  How solid the Linux
implementation is in this regard I don't know.

- bill
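
The brute-force resync mentioned above (recomputing all parity after an interruption, with no log of what was in flight) can be sketched roughly as follows. This is an illustrative Python model, not how any particular RAID implementation is written; the `resync` function and the dictionary layout of a stripe are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def resync(stripes):
    """Recompute parity for every stripe and return how many were stale.

    Each stripe is a dict with 'data' (a list of bytes blocks) and 'parity'.
    """
    repaired = 0
    for stripe in stripes:
        expected = xor_blocks(stripe["data"])
        if stripe["parity"] != expected:
            stripe["parity"] = expected  # overwrite the stale parity
            repaired += 1
    return repaired


# Hypothetical two-stripe array; the second stripe has stale parity.
array = [
    {"data": [b"\x01\x02", b"\x04\x08"], "parity": b"\x05\x0a"},
    {"data": [b"\x10\x20", b"\x40\x80"], "parity": b"\x00\x00"},
]

first_pass = resync(array)   # finds and fixes the stale stripe
second_pass = resync(array)  # nothing left to fix
print(first_pass, second_pass)
```

Note what the pass does and does not establish: parity ends up consistent with whatever the data blocks now contain, but XOR parity alone cannot tell whether a data block was itself only partially written.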





    Re: Am I endangering my RAID 5 array?  
Arno Wagner


07-08-05 10:48 PM

In comp.sys.ibm.pc.hardware.storage Yeechang Lee <ylee@pobox.com> wrote:
> At home I have a Linux desktop and a headless Linux fileserver with a
> software RAID 5 array (see
> <URL:http://groups.google.ca/group/comp....a5254f5d>
> for details).

> A few days ago the UPS battery conked out. Besides the battery alarm
> ringing every ten seconds and slowly driving me mad, the fileserver is
> spontaneously rebooting several times a day; apparently momentary dips
> and other irregularities in the power here in downtown San Francisco,
> which the UPS had before filtered (and which likely prematurely aged
> the battery after replacement only 19 months ago), are causing it to
> reboot. (Interestingly, the Linux desktop hasn't hiccuped once;
> apparently its power supply is less sensitive.)

> The storage array is a pretty straightforward
> JFS-on-LVM2-on-software-RAID 5 setup. Each time the server reboots it
> usually causes the array to automatically rebuild. Sometimes the
> reboots occur during the rebuilding process, causing it to restart.

> My question: Am I risking data corruption through the repeated
> rebuilds? Should I just shut the server down until the replacement UPS
> battery arrives? Or, given that there haven't actually been any
> hardware drive failures, are the RAID structure and filesystem robust
> enough in the meanwhile?


This is a bit strange. Is this still a 2.4.x or older 2.6.x kernel? The
newer ones only rebuild if the array was dirty.

On the question of risk: Yes, there is some risk, but more that the
JFS gets corrupted (unless you have switched write-buffering off on
the disks, which AFAIK cannot be done reliably at the moment) than that
the array itself dies. At least that is my intuition with Linux software
RAID. You also have a pretty high risk of the PSU in that system
dying, so I would take the machine offline.

Arno
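
The point about rebuilding only when the array was dirty can be pictured with a simplified clean/dirty flag model. This is a sketch in Python under stated assumptions: the `ToyArray` class and its methods are hypothetical, and real implementations track this state in on-disk metadata with more nuance.

```python
class ToyArray:
    """Toy model of a clean/dirty flag stored in an array's metadata."""

    def __init__(self):
        # In this model the flag survives a crash, as if it lived on disk.
        self.dirty = False

    def write(self):
        # The array is marked dirty before data and parity are updated.
        self.dirty = True

    def clean_shutdown(self):
        # All writes have finished, so data and parity are known to agree.
        self.dirty = False

    def assemble(self):
        """At boot, report whether a resync is needed."""
        return "resync" if self.dirty else "clean"

    def finish_resync(self):
        # Only a completed resync clears the flag.
        self.dirty = False


array = ToyArray()

array.write()
array.clean_shutdown()
after_clean_reboot = array.assemble()    # orderly reboot

array.write()                            # power drops, no clean shutdown
after_crash = array.assemble()
after_second_crash = array.assemble()    # crashed again before resync ended

array.finish_resync()
after_completed_resync = array.assemble()

print(after_clean_reboot, after_crash, after_second_crash, after_completed_resync)
```

In this model an unclean reboot always triggers a resync, and a reboot in the middle of that resync simply starts it over, which matches the behaviour described in the original post.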






All times are GMT.