| Author |
Estimating RAID 1 MTBF?
|
|
|
| Hi,
I was wondering if anyone could tell me how to calculate/estimate the
overall MTBF of a RAID 1 (mirrored) configuration? I'm just looking for
a simple, "rule-of-thumb" type of calculation, assuming ideal
conditions.
I've been looking around for this, and I've seen a number of different
"takes" on this, and some of them seem to be quite at odds with each
other (and sometimes with themselves), so I thought that I'd post here
in the hopes that someone might be able to help.
Thanks,
Jim
| |
| Ron Reaugh 2004-07-15, 2:45 am |
|
"ohaya" <ohaya@cox.net> wrote in message news:40F5F095.422CFEE5@cox.net...
> Hi,
>
> I was wondering if anyone could tell me how to calculate/estimate the
> overall MTBF of a RAID 1 (mirrored) configuration? I'm just looking for
> a simple, "rule-of-thumb" type of calculation, assuming ideal
> conditions.
>
> I've been looking around for this, and I've seen a number of different
> "takes" on this, and some of them seem to be quite at odds with each
> other (and sometimes with themselves), so I thought that I'd post here
> in the hopes that someone might be able to help.
The basis will of course start with the MTBF of the HD over its rated
service life. That figure is not published by HD mfgs. The published MTBF
is a projection: an educated guess plus empirical data from drives in early
life. The problem after that is that any assumption about whether failures
are purely random or clustered over time/usage defeats any attempt at
precise math.
The next issue is what kind of failures to take into consideration. Are SW,
OS, malice and external physical events like lightning, earthquakes, EMP,
PWS failure, other HW failure or overheating to be excluded? Excluding
those, my take is that if you replace a failing or potentially failing
(SMART) member of a RAID 1 set within 8 hours of the failure/warning
during the drive's rated service life, then it'll be a VERY cold day in hell
before you lose the RAID 1 set, provided the HD model/batch does not have a
pathological failure mode that is intensely clustered.
An actual calculation would require information that is not available and
even the mfgs may not know precisely that information until towards the end
of a model's service life if then.
What takes on this have you found? I'd like to see how anyone would take a
shot at this issue. The point is that, with the exclusions noted, a RAID 1
set is VASTLY more reliable than a single HD. My shot would be at least 5000
times more reliable.
5000 is a rough shot at the number of 8-hour periods in five years.
So suppose you took 10K users of a given model HD, ran them all for the rated
service life, and got 500 failures. Take that same 10K group but all using
2-drive RAID 1, and there would be only about a 1-in-10 chance of a single
failure in the whole group, or about 1 failure if the group were 100K.
The accumulated threat of all the noted exclusions is VASTLY greater than
this so this issue is really a non-issue as RAID 1 is as good as it needs to
be. Keep a good backup.
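For anyone who wants to check the arithmetic above, here is a rough Python sketch. It assumes failures are independent and spread evenly over a five-year service life, which is exactly the simplification the post warns about, and the function and variable names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365

def repair_windows(service_years: float, window_hours: float) -> float:
    """Number of repair windows that fit into the service life."""
    return service_years * HOURS_PER_YEAR / window_hours

def expected_pair_losses(pairs: int, drive_fail_prob: float, windows: float) -> float:
    """Expected number of mirrored pairs lost over the service life.

    A pair is lost only if its second drive fails inside the repair
    window that follows the first failure.
    """
    first_failures = 2 * pairs * drive_fail_prob        # either drive can go first
    partner_fails_in_window = drive_fail_prob / windows
    return first_failures * partner_fails_in_window

windows = repair_windows(service_years=5, window_hours=8)  # 5475, roughly 5000
fail_prob = 500 / 10_000                                   # 500 failures in 10K drives

# Shortcut as I read the post: single-drive failures divided by the window count.
shortcut = 500 / 5000
# Also counting how unlikely the partner's own failure is gives a smaller figure.
estimate = expected_pair_losses(10_000, fail_prob, windows)

print(f"8-hour windows in 5 years: {windows:.0f}")
print(f"shortcut, 10K pairs: {shortcut:.2f} expected losses")
print(f"independent-failure estimate, 10K pairs: {estimate:.4f} expected losses")
```

Either way the result is far below one expected loss for a 10K group, which is the point being made.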
| |
|
|
> The basis will of course start with the MTBF of the HD over its rated
> service life. That figure is not published by HD mfgs. The MTBF published
> is a projection, educated guess plus empirical data from drives in early
> life. The problem after that is any kind of assumptions about the pure
> randomness of a failure or whether a failure might be clustered over
> time/usage destroys any feasible precise math attempt.
>
> The next issue is what kind of failures to take into consideration. Are SW,
> OS, malice and external physical events like lightning, earthquakes, EMP,
> PWS failure, other HW failure or overheating to be excluded? Excluding
> such then my take is that if you replace a failing or potentially
> failing(SMART) member of a RAID 1 set within 8 hours of failure/warning
> during the drive's rated service life then it'll be a VERY cold day in hell
> before you lose the RAID 1 set IF the HD model/batch does not have a
> pathological failure mode that is intensely clustered.
>
> An actual calculation would require information that is not available and
> even the mfgs may not know precisely that information until towards the end
> of a model's service life if then.
>
> What takes on this have you found? I'd like to see how anyone would shoot
> at this issue. The point is that with the exclusions noted then a RAID 1
> set is VASTLY more reliable than a single HD. A shot would be at least 5000
> times more reliable.
> 5000 is a rough shot at the number of 8 hour periods in five years.
>
> So if you took 10K user of a given model HD and ran them all for the rated
> service life and got 500 failures. Then take that same 10K group but all
> using 2 drive RAID 1, there would only be a 1 chance in ten of a single
> failure in the whole group or 1 failure if the group were 100K.
>
> The accumulated threat of all the noted exclusions is VASTLY greater than
> this so this issue is really a non-issue as RAID 1 is as good as it needs to
> be. Keep a good backup.
Ron,
Thanks for your response. I've looked at so many different sources the
last couple of days, my eyes are blurring and my head is aching.
Before I begin, I was really looking for just a kind of "ballpark" kind
of "rule of thumb" for now, with as many assumptions/caveats as needed
to make it simple, i.e., something like assume drives are in their
"life" (the flat part of the Weibull/bathtub curve), ignore software,
etc.
Think of it like this: I just gave you two SCSI drives, I guarantee you
their MTBF is 1.2 Mhours, which won't vary over the time period that
they'll be in-service, no other hardware will ever fail (i.e., don't
worry about the processor board or raid controller), and it takes ~0
time to repair a failure.
Given something like that, and assuming I RAID1 these two drives, what
kind of MTBF would you expect over time?
- Is it the square of the individual drive MTBF?
See: http://www.phptr.com/articles/article.asp?p=28689
Or: http://tech-report.com/reviews/2001...id/index.x?pg=2 (this
one doesn't make sense if MTTR=0 ==> MTBF=infinity?)
Or: http://www.teradataforum.com/terada...0107_214543.htm (again,
don't know how MTTR=0 would work)
- Is it 150% the individual drive MTBF?
See:
http://www.zzyzx.com/products/white...lity_primer.pdf
- Is it double the individual drive MTBF? (I don't remember where I saw
this one.)
It's kind of funny, but when I first started looking, I thought that I'd
find something simple. That was this weekend ...
Jim
| |
| Ron Reaugh 2004-07-15, 2:45 am |
|
"ohaya" <ohaya@cox.net> wrote in message news:40F60365.73BF82D@cox.net...
> Ron,
>
> Thanks for your response. I've looked at so many different sources the
> last couple of days, my eyes are blurring and my head is aching .
>
> Before I begin, I was really looking for just a kind of "ballpark" kind
> of "rule of thumb" for now, with as many assumptions/caveats as needed
> to make it simple, i.e., something like assume drives are in their
> "life" (the flat part of the Weibull/bathtub curve), ignore software,
> etc.
>
> Think of it like this: I just gave you two SCSI drives, I guarantee you
> their MTBF is 1.2 Mhours,
1,200,000 hours ~= 137 years
Now do you think that means that 1/2 fail in 137 years?
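To make the question concrete, here is a small Python sketch under the constant-failure-rate (exponential) model. That model is an assumption, not something a spec sheet guarantees. Under it, an MTBF of 1.2 million hours does not mean half the drives die at 137 years; it means well under one percent of a large population fails in any one year.

```python
import math

HOURS_PER_YEAR = 24 * 365
mtbf_hours = 1_200_000

def fraction_failed(hours: float, mtbf: float) -> float:
    """Fraction of a large population failed after `hours` (exponential model)."""
    return -math.expm1(-hours / mtbf)   # same as 1 - exp(-hours / mtbf)

mtbf_years = mtbf_hours / HOURS_PER_YEAR                     # about 137
annual = fraction_failed(HOURS_PER_YEAR, mtbf_hours)         # about 0.7 % per year
five_year = fraction_failed(5 * HOURS_PER_YEAR, mtbf_hours)  # about 3.6 %
at_mtbf = fraction_failed(mtbf_hours, mtbf_hours)            # 1 - 1/e, not 1/2

print(f"MTBF in years: {mtbf_years:.0f}")
print(f"failed within 1 year: {annual:.2%}")
print(f"failed within 5 years: {five_year:.2%}")
print(f"failed by t = MTBF (model only): {at_mtbf:.1%}")
```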
> which won't vary over the time period that
> they'll be in-service,
That's known to be false.
> no other hardware will ever fail (i.e., don't
> worry about the processor board or raid controller), and it takes ~0
> time to repair a failure.
>
> Given something like that, and assuming I RAID1 these two drives, what
> kind of MTBF would you expect over time?
Zero repair time?
> - Is it the square of the individual drive MTBF?
> See: http://www.phptr.com/articles/article.asp?p=28689
All obvious and based on known inaccurate assumptions
> Or: http://tech-report.com/reviews/2001...id/index.x?pg=2 (this
> one doesn't make sense if MTTR=0 ==> MTBF=infinity?)
> Or: http://www.teradataforum.com/terada...0107_214543.htm (again,
> don't know how MTTR=0 would work)
>
> - Is it 150% the individual drive MTBF?
> See:
>
http://www.zzyzx.com/products/white...lity_primer.pdf
"Industry standards have determined that redundant components increase the
MTBF by 50%." No citation supplied.
"It should be noted that in the example above, if the downtime is reduced to
zero, availability changes to 1 or 100% regardless of the MTBF."
> - Is it double the individual drive MTBF? (I don't remember where I saw
> this one.)
>
>
> It's kind of funny, but when I first started looking, I thought that I'd
> find something simple. That was this weekend ...
As I said in my prior post, failure of a maintained RAID 1 set (for the
cases included) can be ignored, as it's swamped by other failures in the real
world. It's a great academic exercise with little practical application
here.
| |
| Bill Todd 2004-07-15, 2:45 am |
|
"ohaya" <ohaya@cox.net> wrote in message news:40F60365.73BF82D@cox.net...
....
> Before I begin, I was really looking for just a kind of "ballpark" kind
> of "rule of thumb" for now, with as many assumptions/caveats as needed
> to make it simple, i.e., something like assume drives are in their
> "life" (the flat part of the Weibull/bathtub curve), ignore software,
> etc.
The drives *have* to be in their nominal service life: once you go beyond
that, you won't get any meaningful numbers (because they have no
significance to the product, and thus the manufacturer won't have performed
any real testing in that life range).
>
> Think of it like this: I just gave you two SCSI drives, I guarantee you
> their MTBF is 1.2 Mhours, which won't vary over the time period that
> they'll be in-service, no other hardware will ever fail (i.e., don't
> worry about the processor board or raid controller), and it takes ~0
> time to repair a failure.
>
> Given something like that, and assuming I RAID1 these two drives, what
> kind of MTBF would you expect over time?
Infinite.
>
> - Is it the square of the individual drive MTBF?
> See: http://www.phptr.com/articles/article.asp?p=28689
No. This example applies to something like an unmanned spacecraft, where no
repairs or replacements can be made. Such a system has no meaningful MTBF
beyond its nominal service life (which will usually be much less than the
MTBF of even a single component, when that component is something as
reliable as a disk drive).
> Or: http://tech-report.com/reviews/2001...id/index.x?pg=2 (this
> one doesn't make sense if MTTR=0 ==> MTBF=infinity?)
That's how it works, and this is the applicable formula to use. For
completeness, you'd need to factor in the fact that drives have to be
replaced not only when they fail but when they reach the end of their
nominal service life, unless you reserved an extra slot to use to build the
new drive's contents (effectively, temporarily creating a double mirror)
before taking the old drive out.
> Or: http://www.teradataforum.com/terada...0107_214543.htm (again,
> don't know how MTTR=0 would work)
The same way: though the explanation for RAID-5 MTBF is not in the usual
form, it's equivalent.
>
> - Is it 150% the individual drive MTBF?
> See:
>
http://www.zzyzx.com/products/white...bility_primer.pdf
No: the comment you saw there is just some half-assed rule of thumb that
once again assumes no repairs are effected (and is still wrong even under
that assumption, though the later text that explains the value of repair is
qualitatively valid).
>
> - Is it double the individual drive MTBF? (I don't remember where I saw
> this one.)
No.
The second paper that you cited has a decent explanation of why the formula
is what it is. If you'd like a more detailed one, check out Transaction
Processing: Concepts and Techniques by Jim Gray and Andreas Reuter.
- bill
| |
|
|
Bill Todd wrote:
>
> "ohaya" <ohaya@cox.net> wrote in message news:40F60365.73BF82D@cox.net...
>
> ...
>
>
> The drives *have* to be in their nominal service life: once you go beyond
> that, you won't get any meaningful numbers (because they have no
> significance to the product, and thus the manufacturer won't have performed
> any real testing in that life range).
>
>
> Infinite.
>
>
> No. This example applies to something like an unmanned spacecraft, where no
> repairs or replacements can be made. Such a system has no meaningful MTBF
> beyond its nominal service life (which will usually be much less than the
> MTBF of even a single component, when that component is something as
> reliable as a disk drive).
>
>
> That's how it works, and this is the applicable formula to use. For
> completeness, you'd need to factor in the fact that drives have to be
> replaced not only when they fail but when they reach the end of their
> nominal service life, unless you reserved an extra slot to use to build the
> new drive's contents (effectively, temporarily creating a double mirror)
> before taking the old drive out.
>
>
> The same way: though the explanation for RAID-5 MTBF is not in the usual
> form, it's equivalent.
>
> http://www.zzyzx.com/products/white...bility_primer.pdf
>
> No: the comment you saw there is just some half-assed rule of thumb that
> once again assumes no repairs are effected (and is still wrong even under
> that assumption, though the later text that explains the value of repair is
> qualitatively valid).
>
>
> No.
>
> The second paper that you cited has a decent explanation of why the formula
> is what it is. If you'd like a more detailed one, check out Transaction
> Processing: Concepts and Techniques by Jim Gray and Andreas Reuter.
>
> - bill
Bill,
Thanks. This kind of goes along with some other info I've just been
looking at (something like "Product of Reliabilities" on a website).
If the above calculation is in fact a good estimate, and just so that
I'm clear, if:
- I had a RAID1 setup with two SCSI drives that really have an MTBF of
1.2Mhours, and
- The drives are within their "normal" lifetime (i.e., not in infant
mortality or end-of-life), and
- The processor board/hardware supported hot swap, so that if one of the
drives failed, it could be replaced without having to halt the system, and
- We estimated (for planning purposes) that, worst-case, it took someone
4 hours to detect the failure, get another identical drive, and replace
it (so MTTR ~4 hours).
Then a reasonable ballpark estimate for the "theoretical" MTTF (which is
~MTBF) would be:
(1.2Mhours)(1.2Mhours)
---------------------- = MTTF(RAID1)
2 x 4 hours
Is that correct?
Wow!!!
Somehow, this seems "counter-intuitive" (sorry) ....
Jim
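Plugging the numbers from this post into the formula, as a quick Python sketch (the function name is made up for illustration, and the formula is the rule of thumb under discussion, not a guarantee):

```python
HOURS_PER_YEAR = 24 * 365

def mirror_mttf_hours(drive_mtbf_hours: float, mttr_hours: float) -> float:
    """Rule-of-thumb MTTF of a two-drive mirror: MTBF^2 / (2 * MTTR)."""
    return drive_mtbf_hours ** 2 / (2 * mttr_hours)

pair_hours = mirror_mttf_hours(1_200_000, 4)   # 1.8e11 hours
pair_years = pair_hours / HOURS_PER_YEAR       # roughly 20 million years

print(f"MTTF(RAID1) = {pair_hours:.3g} hours")
print(f"            = {pair_years:,.0f} years")
```

Note that the result grows without bound as MTTR approaches zero (the function divides by zero at MTTR = 0), which is the "infinite" answer given earlier for the zero-repair-time case.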
| |
|
|
>
> As I said in my prior post. Maintained RAID 1 failure(of the cases
> included) can be ignored as it's swamped by other failures in the real
> world. It's a great academic exercise with little practical application
> here.
Ron,
Thanks again. I'm starting to understand your 2nd sentence above.
If I'm understanding what you're saying, with a RAID1 setup, with 2
drives with reasonable (i.e., 1.2Mhours) MTBF, from a design standpoint,
you wouldn't be worried about failures of the drives themselves, because
there are other failures/components (e.g., the processor board, etc.)
that would have an MTBF much lower than the raid'ed drives themselves.
Did I get that right?
BTW, re. the "0" MTTR, see my post back to Bill Todd. I had given 4
hours as an example in that post, but after posting and thinking about
it, given the scenario that I posed, it really seems like the MTTR would
be more like "0" than like 4 hours, since with my scenario, the "system"
never really fails (since the drives are hot-swappable).
Comments?
Jim
| |
| Bill Todd 2004-07-15, 2:45 am |
|
"ohaya" <ohaya@cox.net> wrote in message news:40F6162F.3C6681C6@cox.net...
....
> - I had a RAID1 setup with two SCSI drives that really have an MTBF of
> 1.2Mhours, and
> - The drives are within their "normal" lifetime (i.e., not in infant
> mortality or end-of-life), and
> - The processor board/hardware supported hot swap, so that if one of the
> drives failed, it could be replaced without having to halt the system, and
> - We estimated (for planning purposes) that, worst-case, it took someone
> 4 hours to detect the failure, get another identical drive, and replace
> it (so MTTR ~4 hours).
>
> Then a reasonable ballpark estimate for the "theoretical" MTTF (which is
> ~MTBF) would be:
>
> (1.2Mhours)(1.2Mhours)
> ---------------------- = MTTF(RAID1)
> 2 x 4 hours
>
>
> Is that correct?
>
> Wow!!!
>
> Somehow, this seems "counter-intuitive" (sorry) ....
Hey, *single* disks are pretty damn reliable in the kind of ideal service
conditions you postulate: mirrored disks are just (reliable) squared.
A 2,000,000-year RAID-1-pair MTBF sounds great, until you recognize that if
you have 2,000,000 installations, about one of them will fail each year. If
each site has 100 disk pairs rather than just one, then someone will lose
data every 3+ days (or you'll need only 20,000 sites for about one to lose
data every year).
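The fleet arithmetic in the paragraph above can be sketched in Python as follows. It uses the 2,000,000-year pair MTBF exactly as posted (a later post in this thread revises that figure upward by 10x) and assumes a constant failure rate across independent pairs; the names are illustrative.

```python
def fleet_losses_per_year(sites: int, pairs_per_site: int,
                          pair_mtbf_years: float) -> float:
    """Expected data-loss events per year across the whole fleet."""
    return sites * pairs_per_site / pair_mtbf_years

def days_between_losses(losses_per_year: float) -> float:
    """Average gap between loss events, in days."""
    return 365 / losses_per_year

PAIR_MTBF_YEARS = 2_000_000   # figure as posted

one_pair_sites = fleet_losses_per_year(2_000_000, 1, PAIR_MTBF_YEARS)    # 1 per year
big_sites = fleet_losses_per_year(2_000_000, 100, PAIR_MTBF_YEARS)       # 100 per year
fewer_sites = fleet_losses_per_year(20_000, 100, PAIR_MTBF_YEARS)        # 1 per year

print(f"2M sites x 1 pair:    {one_pair_sites:.0f} loss/year")
print(f"2M sites x 100 pairs: one loss every {days_between_losses(big_sites):.2f} days")
print(f"20K sites x 100 pairs: {fewer_sites:.0f} loss/year")
```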
That's still really good, but not so far beyond something you'd start
worrying about to be utterly ridiculous - at least if you're a manufacturer
(individual customers still have almost no chance of seeing a failure, but
even a single one that does is still very bad publicity). Start including
RAID-5 configurations, and system MTBF drops by roughly the square of the
number of drives in a set, which starts getting significant before long
(again, especially from the manufacturer's viewpoint, even if very few
individual customers actually experience data loss: some of the new
virtualization architectures have RAID-5-like failure characteristics - even
if they're not using parity but mirroring to protect data, they're
distributing it around the disk set in a manner that can cause data loss if
*any* two disks fail - which users should at least be aware of).
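As a rough illustration of the "square of the number of drives" remark, here is a Python sketch using the commonly quoted approximation for a group that survives exactly one drive failure: MTBF^2 / (N * (N - 1) * MTTR). For N = 2 it reduces to the mirror formula discussed earlier in the thread. The 4-hour MTTR is carried over from that example purely for comparison (a real RAID-5 rebuild may well take longer), and the function name is made up.

```python
def group_mttdl_hours(drive_mtbf_hours: float, drives: int,
                      mttr_hours: float) -> float:
    """Approximate mean time to data loss for a single-redundancy group."""
    if drives < 2:
        raise ValueError("need at least two drives for redundancy")
    return drive_mtbf_hours ** 2 / (drives * (drives - 1) * mttr_hours)

mirror = group_mttdl_hours(1_200_000, 2, 4)

for n in (2, 5, 10):
    mttdl = group_mttdl_hours(1_200_000, n, 4)
    print(f"{n:2d} drives: {mttdl:.3g} hours, {mirror / mttdl:.0f}x worse than a mirror")
```

The N * (N - 1) term is where the roughly quadratic drop comes from: any of N drives can fail first, and any of the remaining N - 1 can then take the set down during the repair window.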
- bill
| |
| Bill Todd 2004-07-15, 2:45 am |
|
"ohaya" <ohaya@cox.net> wrote in message news:40F6199E.2DEAE730@cox.net...
....
> BTW, re. the "0" MTTR, see my post back to Bill Todd. I had given 4
> hours as an example in that post, but after posting and thinking about
> it, given the scenario that I posed, it really seems like the MTTR would
> be more like "0" than like 4 hours, since with my scenario, the "system"
> never really fails (since the drives are hot-swappable).
>
> Comments?
If you've learned how to repopulate on the order of 100 GB of failed drive
in zero time, especially while not seriously degrading on-going processing
(so don't just assert that you can use anything like the full bandwidth of
its partner to restore it), I suspect that there are many people who would
be very interested in talking with you.
- bill
| |
|
|
>
> Hey, *single* disks are pretty damn reliable in the kind of ideal service
> conditions you postulate: mirrored disks are just (reliable) squared.
>
> A 2,000,000-year RAID-1-pair MTBF sounds great, until you recognize that if
> you have 2,000,000 installations, about one of them will fail each year. If
> each site has 100 disk pairs rather than just one, then someone will lose
> data every 3+ days (or you'll need only 20,000 sites for about one to lose
> data every year).
Bill,
Thanks for the perspective.
But, so that I'm clear, if the individual drives really have 1.2Mhours
MTBF (and I think the Atlas 15K II spec sheet actually claims
1.4Mhours), then the "squared" MTBF would indicate that RAID 1 pair
would be something like 1+ TRILLION hours MTBF, not 1+ MILLION hours.
Have I misinterpreted something?
Jim
| |
|
|
Bill Todd wrote:
>
> "ohaya" <ohaya@cox.net> wrote in message news:40F6199E.2DEAE730@cox.net...
>
> ...
>
>
> If you've learned how to repopulate on the order of 100 GB of failed drive
> in zero time, especially while not seriously degrading on-going processing
> (so don't just assert that you can use anything like the full bandwidth of
> its partner to restore it), I suspect that there are many people who would
> be very interested in talking with you.
Bill,
You're right, in my mind at least, I was ignoring any effect of
restoring to a replacement drive in the case of a failed drive. But, I
am looking mainly at FAILURE rates (MTBF), and assuming hot-swappable
drives, wouldn't the system continue to run (possibly with some
performance degradation because of the restore)?
Is the period of time where the new/replacement drive is being restored
normally considered "downtime", i.e., is it included in MTTR?
Jim
| |
|
|
>
> Bill,
>
> You're right, in my mind at least, I was ignoring any effect of
> restoring to a replacement drive in the case of a failed drive. But, I
> am looking mainly at FAILURE rates (MTBF), and assuming hot-swappable
> drives, wouldn't the system continue to run (possibly with some
> performance degradation because of the restore)?
>
> Is the period of time where the new/replacement drive is being restored
> normally considered "downtime", i.e., is it included in MTTR?
>
> Jim
Hi,
BTW, I wanted to mention that I really appreciate the patience you all
have shown with my questions, some of which might've admittedly appeared
stupid or naive, but this discussion has been VERY helpful to me, at
least. So again, thanks!!
Jim
| |
| Ron Reaugh 2004-07-15, 5:45 pm |
|
"ohaya" <ohaya@cox.net> wrote in message news:40F6199E.2DEAE730@cox.net...
> Ron,
>
> Thanks again. I'm starting to understand your 2nd sentence above .
>
> If I'm understanding what you're saying, with a RAID1 setup, with 2
> drives with reasonable (i.e., 1.2Mhours) MTBF, from a design standpoint,
> you wouldn't be worried about failures of the drives themselves, because
> there are other failures/components (e.g., the processor board, etc.)
> that would have an MTBF much lower than the raid'ed drives themselves.
>
> Did I get that right?
And many more failure sources, EXACTLY.
> BTW, re. the "0" MTTR, see my post back to Bill Todd. I had given 4
> hours as an example in that post, but after posting and thinking about
> it, given the scenario that I posed, it really seems like the MTTR would
> be more like "0" than like 4 hours, since with my scenario, the "system"
> never really fails (since the drives are hot-swappable).
>
> Comments?
Except for the possibility that the second drive fails before the first is
replaced. But in that 4 hours I'd be more concerned about a giant meteoroid
impact.
| |
| Peter da Silva 2004-07-15, 5:45 pm |
| In article <40F677BF.E3AF82B5@cox.net>, ohaya <ohaya@cox.net> wrote:
> Is the period of time where the new/replacement drive is being restored
> normally considered "downtime", i.e., is it included in MTTR?
Yes. Think of it this way: if a second drive failed in that period, would
the system as a whole fail? Yes. Therefore, that time has to be included in
the calculation, so they must be including it in MTTR.
--
I've seen things you people can't imagine. Chimneysweeps on fire over the roofs
of London. I've watched kite-strings glitter in the sun at Hyde Park Gate. All
these things will be lost in time, like chalk-paintings in the rain. `-_-'
Time for your nap. | Peter da Silva | Har du kramat din varg, idag? 'U`
| |
| _firstname_@lr_dot_los-gatos_dot_ca.us 2004-07-15, 5:45 pm |
| In article <40F6162F.3C6681C6@cox.net>, ohaya <ohaya@cox.net> wrote:
....
>If the above calculation is in fact a good estimate, and just so that
>I'm clear, if:
>
>- I had a RAID1 setup with two SCSI drives that really have an MTBF of
>1.2Mhours, and
>- The drives are within their "normal" lifetime (i.e., not in infant
>mortality or end-of-life), and
>- The processor board/hardware supported hot swap, so that if one of the
>drives failed, it could be replaced without having to halt the system, and
>- We estimated (for planning purposes) that, worst-case, it took someone
>4 hours to detect the failure, get another identical drive, and replace
>it (so MTTR ~4 hours).
>
>Then a reasonable ballpark estimate for the "theoretical" MTTF (which is
>~MTBF) would be:
>
>(1.2Mhours)(1.2Mhours)
>---------------------- = MTTF(RAID1)
> 2 x 4 hours
>
>
>Is that correct?
Yes. But irrelevant. And non-intuitive to boot.
First, the MTTR (repair time) has to be in there, because: While a
failed drive (1/2 the pair) is being repaired, the array is no longer
redundant. So the only failure mode considered in this formula is the
following: One drive fails; while that drive is being repaired, the
second drive also fails.
By "repair", we mean the time it takes to prepare another drive, and
copy the data from the surviving (good) drive onto it, so redundancy
is restored. By the way, you can immediately see why it is good to
have a hot spare drive ready to go: If you have to wait for a human to
remove the dead drive and add a new drive, the typical MTTR is at
least a few hours, often a day (the time it takes to alert the human
and get him into the room with the spare drive). If the spare is
powered up and ready to go, the typical MTTR is a few hours (can be as
short as 1 hour), to copy the data onto it.
Obviously, the simple formula (comes from the appendix of the original
Berkeley RAID paper, and already caused much hilarity back then)
ignores all real-world problems, only addressing uncorrelated
single-drive failure.
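Here is a minimal Python sketch of how the simple formula falls out of the single failure mode described above (one drive fails, then its partner fails during the repair window), and how strongly the result depends on MTTR. It assumes independent failures at a constant rate, which is the idealization the formula itself makes; the MTTR values and labels are illustrative, not measurements.

```python
import math

HOURS_PER_YEAR = 24 * 365

def mirror_mttdl_hours(mtbf: float, mttr: float, exact: bool = False) -> float:
    """Mean time to data loss for a two-drive mirror under the simple model."""
    first_failure_rate = 2.0 / mtbf              # either drive may fail first
    if exact:
        # Exponential model: chance the partner dies within the repair window.
        p_second = -math.expm1(-mttr / mtbf)
    else:
        # Linear approximation, which yields MTBF^2 / (2 * MTTR).
        p_second = mttr / mtbf
    return 1.0 / (first_failure_rate * p_second)

for label, mttr in (("hot spare, 1 h copy", 1),
                    ("4 h manual swap", 4),
                    ("next-day swap, 24 h", 24)):
    years = mirror_mttdl_hours(1_200_000, mttr) / HOURS_PER_YEAR
    print(f"{label:22s} {years:,.0f} years")

approx = mirror_mttdl_hours(1_200_000, 4)
exact = mirror_mttdl_hours(1_200_000, 4, exact=True)
print(f"exact / approximate: {exact / approx:.8f}")
```

The linear approximation is essentially indistinguishable from the exponential form at these numbers, so the formula's weakness is in its assumptions rather than its arithmetic.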
Second, as many other people have said, this reliability calculation
is completely irrelevant. Real storage systems based on RAID fail,
and they do so all the time. Some fail because of simultaneous
failure of two drives (some slang calls this a "RAID kill"). Some
fail because during reconstruction after a single drive failure, the
surviving drive is found to have bad sectors or be unreadable, or the
extra stress of the reconstruction causes the surviving drive to fail
(slang sometimes calls this a "strip kill" or "repair kill"). Many
more fail due to correlated failures (for example a faulty power
supply manages to kill all the drives simultaneously).
The real source of failures, which is much much higher than the above
academic calculation, is systems issues. Within a disk array,
firmware or hardware faults are commonly the source for data loss
(examples: The array forgot to write dirty data back from cache, or
the SCSI bus has a double-bit error that's not caught by parity
checking, or in a RAID-5 XOR engine, which is sometimes implemented in
hardware, the byte counter can be off). Even more realistic: The best
RAIDed array in the world doesn't help you if your filesystem or
database corrupts data for fun - except that the corrupt data is now
stored extremely reliably.
There is a story of a company that had a complete second computer
center, with all their data being continuously replicated between the
two computer centers. In the event of a disaster, the second computer
center could with a few seconds' notice take over for the first one, and
keep running nearly seamlessly. The second computer center was
located in the other tower of the World Trade Center. Oops.
If you really care about your data surviving for a long time, and
maybe being continuously accessible, and maybe even being continuously
accessible with good performance, you have to look at the overall
design, and have to study techniques such as logging, HSM, backup,
remote mirroring, transactional storage systems, data dispersion a la
Oceanstore ...
In the meantime, get yourself two disks, set them up as RAID-1, and
you have already made the largest single step towards a reliable
system.
--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr _dot_ los-gatos _dot_ ca.us
| |
| Bill Todd 2004-07-15, 5:45 pm |
|
"ohaya" <ohaya@cox.net> wrote in message news:40F66C45.325164D7@cox.net...
> Bill,
>
> Thanks for the perspective.
>
> But, so that I'm clear, if the individual drives really have 1.2Mhours
> MTBF (and I think the Atlas 15K II spec sheet actually claims
> 1.4Mhours), then the "squared" MTBF would indicate that RAID 1 pair
> would be something like 1+ TRILLION hours MTBF, not 1+ MILLION hours.
> Have I misinterpreted something?
Yes: the figures I gave above were in years, not hours.
Still, I dropped a 0 while doing the calcs in my head (I think I used 10^5
rather than 10^4 for approximating hours per year): they should all be 10x
as large.
Ralph made a very significant comment, by the way: at such probabilities,
you really have to take silent sector deterioration seriously, so the array
needs to 'scrub' its data in the background to detect such deterioration
while you still have a good copy left to fix it with. Otherwise, the
system's mean time to data loss drops precipitously.
- bill
| |
| Ron Reaugh 2004-07-15, 8:45 pm |
|
"Bill Todd" <billtodd@metrocast.net> wrote in message
news:JdadnQUPzO-5imrd4p2dnA@metrocast.net...
>
> "ohaya" <ohaya@cox.net> wrote in message news:40F66C45.325164D7@cox.net...
> Yes: the figures I gave above were in years, not hours.
>
> Still, I dropped a 0 while doing the calcs in my head (I think I used 10^5
> rather than 10^4 for approximating hours per year): they should all be 10x
> as large.
>
> Ralph made a very significant comment, by the way: at such probabilities,
> you really have to take silent sector deterioration seriously, so the array
> needs to 'scrub' its data in the background to detect such deterioration
> while you still have a good copy left to fix it with. Otherwise, the
> system's mean time to data loss drops precipitously.
A first stab at that process is called nightly backup and the second stab is
scheduled defrags. "Silent sector deterioration" can happen but is usually
an isolated sector here or there and is quite uncommon. Good RAID 1 will
fill the new/replacement drive in spite of such a sector read error, and then
one is left with an operable system with an isolated read error that may be
dealt with. Depending on the definition of "data loss", this issue may not
count and is relatively obscure. Modern HDs are quite good at being able to
read/recover their data.
| |
| Bill Todd 2004-07-15, 8:45 pm |
|
"Ron Reaugh" <ron-reaugh@worldnet.att.net> wrote in message
news:fUEJc.98971$OB3.60798@bgtnsc05-news.ops.worldnet.att.net...
>
> "Bill Todd" <billtodd@metrocast.net> wrote in message
> news:JdadnQUPzO-5imrd4p2dnA@metrocast.net...
....
> A first stab at that process is called nightly backup
Nope: this will read only one of the two copies of the data, and thus
decrease the probability that one is bad only by a factor of 2 (unless the
array is wise enough to choose a random copy for each read, or load
considerations encourage it to). Besides, the vast majority of the data
will usually be known to be unchanged and hence won't be backed up at all
frequently.
> and the second stab is
> scheduled defrags.
Better, but there'll still often be some data that doesn't need to be moved
(at least if the defrag algorithm has any brains).
> "silent sector deterioration" can happen but is usually
> an isolated sector here or there and is quite uncommon.
It doesn't have to be very common or at all extensive to decrease the mean
time to data loss of a RAID-1 pair from tens of millions of years to tens of
thousands of years. As I noted earlier, when the number of disk pairs gets
high, such a reduction becomes significant.
- bill
| |
| Ron Reaugh 2004-07-15, 8:45 pm |
|
"Bill Todd" <billtodd@metrocast.net> wrote in message
news:Yrmdnb2IcsvCtWrd4p2dnA@metrocast.net...
>
> "Ron Reaugh" <ron-reaugh@worldnet.att.net> wrote in message
> news:fUEJc.98971$OB3.60798@bgtnsc05-news.ops.worldnet.att.net...
>
> ...
>
> Nope: this will read only one of the two copies of the data,
Well, "stab" and which it will read is not necessarily always clear and may
change.
> and thus
> decrease the probability that one is bad only by a factor of 2 (unless the
> array is wise enough to choose a random copy for each read, or load
> considerations encourage it to). Besides, the vast majority of the data
> will usually be known to be unchanged and hence won't be backed up at all
> frequently.
That assumes incremental backups, but a two drive RAID 1 may very well get
imaged each night.
> and the second stab is
>
> Better, but there'll still often be some data that doesn't need to be moved
> (at least if the defrag algorithm has any brains).
Right, but this is all about probability reduction.
> "silent sector deterioration" can happen but is usually
>
> It doesn't have to be very common or at all extensive to decrease the mean
> time to data loss of a RAID-1 pair from tens of millions of years to tens of
> thousands of years. As I noted earlier, when the number of disk pairs gets
> high, such a reduction becomes significant.
Does a bad sector that happens to be detected during a RAID 1 HD failure and
replacement constitute any reflection on the efficacy of that recovery? I
say no.
Is undetected "silent sector deterioration" actually much of a threat to
real world current two drive RAID 1 reliability? I say no.
| |
| Bill Todd 2004-07-15, 8:45 pm |
|
"Ron Reaugh" <ron-reaugh@worldnet.att.net> wrote in message
news:H3GJc.261014$Gx4.45769@bgtnsc04-news.ops.worldnet.att.net...
....
> Does a bad sector that happens to be detected during a RAID 1 HD failure and
> replacement constitute any reflection on the efficacy of that recovery? I
> say no.
And you're wrong - utterly. When you have a disk failure in your RAID-1
pair, and only *then* discover that a data sector on the surviving disk is
also bad, you've lost data - i.e., 'failed'.
> Is undetected "silent sector deterioration" actually much of a threat to
> real world current two drive RAID 1 reliability? I say no.
Same degree of wrongness here as well. You really need to write less and
read more.
- bill
| |
| Ron Reaugh 2004-07-15, 8:45 pm |
|
"Bill Todd" <billtodd@metrocast.net> wrote in message
news:OuidnYjh65hzr2rdRVn-gg@metrocast.net...
>
> "Ron Reaugh" <ron-reaugh@worldnet.att.net> wrote in message
> news:H3GJc.261014$Gx4.45769@bgtnsc04-news.ops.worldnet.att.net...
>
> ...
>
> And you're wrong - utterly. When you have a disk failure in your RAID-1
> pair, and only *then* discover that a data sector on the surviving disk is
> also bad, you've lost data - i.e., 'failed'.
That's not my definition of failed.
> Same degree of wrongness here as well.
Nope.
| |
|
|
Bill Todd wrote:
>
> "ohaya" <ohaya@cox.net> wrote in message news:40F60365.73BF82D@cox.net...
>
> ...
>
>
> The drives *have* to be in their nominal service life: once you go beyond
> that, you won't get any meaningful numbers (because they have no
> significance to the product, and thus the manufacturer won't have performed
> any real testing in that life range).
>
>
> Infinite.
>
>
> No. This example applies to something like an unmanned spacecraft, where no
> repairs or replacements can be made. Such a system has no meaningful MTBF
> beyond its nominal service life (which will usually be much less than the
> MTBF of even a single component, when that component is something as
> reliable as a disk drive).
>
>
> That's how it works, and this is the applicable formula to use. For
> completeness, you'd need to factor in the fact that drives have to be
> replaced not only when they fail but when they reach the end of their
> nominal service life, unless you reserved an extra slot to use to build the
> new drive's contents (effectively, temporarily creating a double mirror)
> before taking the old drive out.
>
>
> The same way: though the explanation for RAID-5 MTBF is not in the usual
> form, it's equivalent.
>
> http://www.zzyzx.com/products/white...bility_primer.pdf
>
> No: the comment you saw there is just some half-assed rule of thumb that
> once again assumes no repairs are effected (and is still wrong even under
> that assumption, though the later text that explains the value of repair is
> qualitatively valid).
>
>
> No.
>
> The second paper that you cited has a decent explanation of why the formula
> is what it is. If you'd like a more detailed one, check out Transaction
> Processing: Concepts and Techniques by Jim Gray and Andreas Reuter.
>
Hi,
I'm back, and I'm bottom-posting to one of the earlier posts so that
everything is there, as this thread is getting a little long. I hope
that this is ok?
I'm still a little puzzled about your (and I think Ron's) earlier
comments about the article from phptr.com that I linked earlier (see
above), and I've been trying to "reconcile" that approach/methodology to
the ones from the tech-report.com and from the teradataforum.com.
If I try to run the equivalent (hypothetical) numbers through both, I
get vastly different results.
For example, if I:
- assume 100,000 hours for a single drive/device and
- have 3 drives in RAID 1, and
- assume 24 hours MTTR, and
- use the tech-report.com/teradataforum.com method, I get:
MTTF(RAID1) ~ 20 TRILLION hours+
And, if I follow the method from phptr.com, with the same data, I get:
AFR(1 drive) = 8760/100,000 = .0876
AFR(3 drives-RAID1) = (.0876)^3 ~ .0006722
MTBF(3 drives-RAID1) = 8760/AFR(3 drives-RAID1) ~ 13 MILLION hours+
Using the method from the phptr.com page, the MTBF results are WAY less
than the other method.
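For reference, the phptr.com-style arithmetic quoted just above can be reproduced with a few lines of Python. This is only a sketch of the calculation as written in this post; the function names are made up for illustration. Note that the 24-hour MTTR never enters this method at all, which is one visible difference from a repair-aware formula.

```python
HOURS_PER_YEAR = 8760

def afr_from_mtbf(mtbf_hours):
    """Annualized failure rate of a single drive, as used in the post."""
    return HOURS_PER_YEAR / mtbf_hours

def mirrored_mtbf_afr_method(mtbf_hours, n_drives):
    """MTBF of an n-way mirror by the AFR method: multiply the per-drive
    AFRs together and convert back to hours. No repair time is involved."""
    afr_array = afr_from_mtbf(mtbf_hours) ** n_drives
    return HOURS_PER_YEAR / afr_array

afr_one = afr_from_mtbf(100_000)                    # 0.0876
afr_three = afr_one ** 3                            # about 0.000672
mtbf_three = mirrored_mtbf_afr_method(100_000, 3)   # about 13 million hours
print(afr_one, afr_three, mtbf_three)
```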
Assuming that the tech-report.com/teradataforum.com method is more
correct, and if the method from the phptr.com page is so wrong for
calculating just a relatively simple RAID1 configuration, is ANY of the
rest of the methods described in the phptr.com page a valid approach?
The reason for my question is that the next thing that I wanted to look
at was to use the method described in the rest of the phptr.com page
(i.e., in the case study) to do some ballpark figuring for a more
extended system (with more than just the raided drives), similar to what
was in the case study, using MTBF numbers that I have for components.
If any of you might be able to shed some (more) light on this, I'd
really appreciate it.
Thanks again,
Jim
| |
| Robert Wessel 2004-07-16, 2:45 am |
| "Ron Reaugh" <ron-reaugh@worldnet.att.net> wrote in message news:<FuGJc.99265$OB3.92215@bgtnsc05-news.ops.worldnet.att.net>...
>
> That's not my definition of failed.
I wrote data to the disk. It didn't come back. Sounds like failure to me.
| |
| Ralph Becker-Szendy 2004-07-16, 5:45 pm |
| In article <fUEJc.98971$OB3.60798@bgtnsc05-news.ops.worldnet.att.net>,
Ron Reaugh <ron-reaugh@worldnet.att.net> wrote:
....
>A first stab at that process is called nightly backup and the second stab is
>scheduled defrags. "silent sector deterioration" can happen but is usually
>an isolated sector here or there and is quite uncommon.
Yes, good arrays all have scrubbing capabilities (or should have
them). But life isn't quite so easy. Many disk workloads show very
high locality: For long stretches, the actuator stays at or near the
same position. If you start scrubbing carelessly while a
low-intensity foreground workload is running, the response time for
real IOs can increase quite precipitously. So the trick with
implementing scrubbing is to forecast when the foreground workload
will be idle. Like all forecasting of the future, this is quite
difficult (if I knew how to do it, I would play the stock market, and
get out of the storage business).
Note that good scrubbing has to be done internally to the array,
because external scrubbing (for example a full backup, or just reading
the block device end to end) will not touch all sectors on all disks.
And depending on how the array is implemented, it may never touch some
sectors (for example, as long as no disk has failed, most arrays will
never read the parity block on a RAID-5 group). So this isn't
something the user of a disk array can take care of himself.
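As a rough illustration of what internal scrubbing means for a two-drive mirror, here is a toy Python model. Everything in it (the `Disk` class, `scrub_mirror`, using `None` to stand for an unreadable sector) is invented for this sketch and is not any vendor's implementation. It also ignores the scheduling problem described above and does not compare copies that are both readable but different.

```python
class Disk:
    """Toy disk: a list of sectors; None stands for an unreadable sector."""
    def __init__(self, sectors):
        self.sectors = list(sectors)

    def read(self, lba):
        data = self.sectors[lba]
        if data is None:
            raise IOError(f"unreadable sector {lba}")
        return data

    def write(self, lba, data):
        self.sectors[lba] = data


def scrub_mirror(a, b):
    """Read every sector on BOTH halves of the mirror and repair a bad
    copy from the good one. Returns (repaired, lost) lists of LBAs."""
    repaired, lost = [], []
    for lba in range(len(a.sectors)):
        copies = []
        for disk in (a, b):
            try:
                copies.append(disk.read(lba))
            except IOError:
                copies.append(None)
        if copies[0] is None and copies[1] is None:
            lost.append(lba)              # both copies gone: data loss
        elif copies[0] is None:
            a.write(lba, copies[1])       # rewrite from the surviving copy
            repaired.append(lba)
        elif copies[1] is None:
            b.write(lba, copies[0])
            repaired.append(lba)
    return repaired, lost


a = Disk(["s0", "s1", None, "s3"])    # latent fault on disk a, sector 2
b = Disk(["s0", None, "s2", "s3"])    # latent fault on disk b, sector 1
print(scrub_mirror(a, b))             # ([1, 2], [])
```

The point of the toy is that both latent faults get fixed while a good copy still exists. A backup or an end-to-end read of the block device would have seen only one copy of each sector.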
> Good RAID 1 will
>fill the new/replacement drive in spite of such a sector read error and then
>one is left with an operable system with an isolated read error that may be
>dealt with. Depending on the definition of "data loss" this issue may not
>count and is relatively obscure. Modern HDs are quite good at being able to
>read/recover their data.
Well, the promise of RAIDed disks is that there is NO data loss. I
personally think that as soon as I lose a sector, I have violated my
contract with the end user. Clearly losing one sector is better than
losing a whole LUN or a whole array. But if that sector is in an
allocated area (of the file system or the database that sits above),
the array has corrupted or invalidated data. That's why to many
customers the first bit error invalidates the whole LUN - as soon as
you lose a single sector, you'll have some explaining to do (this often
takes the form that a C-level executive has to call the customer and
apologize, followed by massive price cuts or rebates).
If you look at the introduction history and market penetration of the
big disk arrays (EMC Symmetrix, Hitachi Lightning, IBM Shark and so
on), you'll see that the "public perception" of data reliability has
been a big factor in selling and pricing; I don't want to go into
details, as they are sure to step on someone's foot. Whether the
"public perception" of data reliability is actually correlated with
the real incidence of data loss is an interesting study in mass
psychology and the power of marketing over engineering. But what is
clear is that there are many customers who are perfectly willing to pay
a lot of extra money (a factor of 2, 3 or 10 more than the lowest
bidder) and select a vendor that gives them a warm and fuzzy feeling
(and maybe also real technical advantages, or even contractual
guarantees) about the quality and reliability of the disk array.
--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr _dot_ los-gatos _dot_ ca.us
| |
| _firstname_@lr_dot_los-gatos_dot_ca.us 2004-07-16, 5:45 pm |
| In article <H3GJc.261014$Gx4.45769@bgtnsc04-news.ops.worldnet.att.net>,
Ron Reaugh <ron-reaugh@worldnet.att.net> wrote:
>Does a bad sector that happens to be detected during a RAID 1 HD failure and
>replacement constitute any reflection on the efficacy of that recovery? I
>say no.
For an enterprise-class disk array, this is catastrophic (see previous
message). It will cause an alert to field service personnel. Often,
the customer will have to be told officially (even if the customer has
not detected the read failure yet).
For a small RAID array (for example a RAID card on the PCI bus with 2
or 4 drives, and a single-system file system like NTFS or ext3 on
top): Most of the time nobody cares. The performability expectation
for such a system is sufficiently low that loss of sectors can often
be tolerated. In particular because in typical file system workloads
(excluding data bases), much of the data is written, read maybe for a
short period after being written (for example by the next nightly
backup), and never touched again.
>Is undetected "silent sector deterioration" actually much of a threat to
>real world current two drive RAID 1 reliability? I say no.
Sorry, but for large disk arrays (which typically have many hundreds or
a few thousand disks in them) this is right up there among the top
failure modes (excluding the ones that can't be dealt with anyhow,
like meteorite, fire, or software bugs), together with complete
failure of the 2nd disk, and failure of the 2nd disk that is induced
by the extra stress of RAID recovery.
With the very large disks today, the detected and undetected failure
of individual sectors is beginning to be a very significant worry, and
I can assure you that the large companies in this sector (their names
are typically 2- or 3-letter abbreviations, for example
[IEHS][BMPu][CMn], plus Hitachi and NetApp) are putting significant
research and development effort into new forms of redundant storage
that can survive such problems better.
By the way, I'm always saying "RAID-1" and "2nd disk", even though a
lot of the large arrays are actually formatted to RAID-5 or other
parity or erasure code based schemes. The examples are just easier
for RAID-1.
One particularly worrisome trend is "off-track writes", which are rumored
to be more common in consumer-grade disks (typically IDE disks): If
during writing mechanical vibration occurs, the head might wander off,
and write the new data slightly off the track, without completely
overwriting the data on the track. If you now seek away and come back
to read later, you can get lucky and by coincidence settle on the new
data, or you can get unlucky and hit the old track, and read old data
(which is still there, with perfectly valid ECCs, but maybe not for a
whole track and only for a few sectors). You can see how this can be
quite catastrophic, even in a non-redundant system. It gets really
juicy if this happens during a RAID-5 reconstruction, because now you
will take this old data, and XOR it with the other disks in the RAID
group, creating absolute gibberish, and then writing the gibberish
back to disk, thinking that it is valid. In a RAID-1 system the
off-track read at least returns data that used to be valid (small
consolation).
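The RAID-5 reconstruction problem described above can be shown with a tiny XOR example in Python. This is only an illustrative sketch with made-up 8-byte "sectors" and a hypothetical `xor_blocks` helper; real arrays work on full stripes and have their own layouts.

```python
def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Three data disks plus parity, one small "sector" each.
d0_old = b"OLD-DATA"          # what was on disk 0 before the write
d0_new = b"NEW-DATA"          # the write that landed off-track
d1 = b"disk-one"
d2 = b"disk-two"
parity = xor_blocks(d0_new, d1, d2)   # parity reflects the NEW data

# Disk 2 fails. Rebuild its sector from the survivors.
good_rebuild = xor_blocks(d0_new, d1, parity)   # head settles on new data
bad_rebuild = xor_blocks(d0_old, d1, parity)    # head settles on stale data

print(good_rebuild == d2)   # True
print(bad_rebuild == d2)    # False
print(bad_rebuild)
```

The stale read passes the drive's own ECC check, so nothing flags the second rebuild as wrong. The bytes where old and new data differ simply come out corrupted on the replacement disk.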
What you might detect here is a certain mindset. We all know that
individual disks are fallible, and we've learned to live with this
(operative word here is "backup"). For small RAID arrays (often based
on motherboards or PCI cards, or hidden in the back end of NAS
servers), we do a few simple steps that give you a huge improvement in
reliability, but are still considered somewhat unreliable. For most
personal and small business users, these small RAID systems give you a
huge bang for the buck. But once you enter the realm of the big
enterprise storage systems, things change, and you MUST NEVER EVER
LOSE DATA (in all upper case), because if you do, high-level
executives will have their busy schedules interrupted, and your
engineer's behind will be on the line or toast. The reason the
enterprise storage systems are so expensive (in terms of $/GB) is that
they are fantastically well built, and vendors go to extraordinary
lengths to stand behind them.
One of these days, if you buy me a few beers, I'll tell you the story
of the big array vendor who offered to truck pallets full of batteries
in every 24 hours to keep his disk array running through a multi-day
power outage (because shutting it down was considered to increase the
risk of data loss).
--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr _dot_ los-gatos _dot_ ca.us
| |
| Ron Reaugh 2004-07-16, 5:45 pm |
|
"Robert Wessel" <robertwessel2@yahoo.com> wrote in message
news:bea2590e.0407152246.44abd4d1@posting.google.com...
> "Ron Reaugh" <ron-reaugh@worldnet.att.net> wrote in message
> news:<FuGJc.99265$OB3.92215@bgtnsc05-news.ops.worldnet.att.net>...
>
> I wrote data to the disk. It didn't come back. Sounds like failure to me.
A single sector lost does not constitute RAID 1 failure. Does RAID 1
operate whereby each read is redundant and then the two read datasets are
compared in OS buffers? NO! There is a failure rate that such would catch
although obscure. Does that constitute a RAID 1 failure? Folks are
grasping into obscurity and very low probabilities.
| |
| Ron Reaugh 2004-07-16, 5:45 pm |
|
"Ralph Becker-Szendy" <lr@idiom.com> wrote in message
news:1089998384.642616@smirk...
> In article <fUEJc.98971$OB3.60798@bgtnsc05-news.ops.worldnet.att.net>,
> Ron Reaugh <ron-reaugh@worldnet.att.net> wrote:
> ...
>
>
> Yes, good arrays all have scrubbing capabilities (or should have
> them). But life isn't quite so easy. Many disk workloads show very
> high locality: For long stretches, the actuator stays at or near the
> same position. If you start scrubbing carelessly while a
> low-intensity foreground workload is running, the response time for
> real IOs can increase quite precipitously. So the trick with
> implementing scrubbing is to forecast when the foreground workload
> will be idle. Like all forecasting of the future, this is quite
> difficult (if I knew how to do it, I would play the stock market, and
> get out of the storage business).
>
> Note that good scrubbing has to be done internally to the array,
> because external scrubbing (for example a full backup, or just reading
> the block device end to end) will not touch all sectors on all disks.
> And depending on how the array is implemented, it may never touch some
> sectors (for example, as long as no disk has failed, most arrays will
> never read the parity block on a RAID-5 group). So this isn't
> something the user of a disk array can take care of himself.
>
>
> Well, the promise of RAIDed disks is that there is NO data loss.
Well, one has to define that very carefully. Firstly differentiating
"loss" and "error".
> I
> personally think that as soon as I lose a sector, I have violated my
> contract with the end user.
Remember that this discussion was about two drive RAID 1.
> Clearly losing one sector is better than
> losing a whole LUN or a whole array. But if that sector is in an
> allocated area (of the file system or the database that sits above),
> the array has corrupted or invalidated data. That's why to many
> customers the first bit error invalidates the whole LUN - as soon as
> you lose a single sector, you'll have some explaining to do (this often
> takes the form that a C-level executive has to call the customer and
> apologize, followed by massive price cuts or rebates).
And what percentage of "bit error" goes undetected overall system wise?
> If you look at the introduction history and market penetration of the
> big disk arrays (EMC Symmetrix, Hitachi Lightning, IBM Shark and so
> on), you'll see that the "public perception" of data reliability has
> been a big factor in selling and pricing; I don't want to go into
> details, as they are sure to step on someone's foot. Whether the
> "public perception" of data reliability is actually correlated with
> the real incidence of data loss is an interesting study in mass
> psychology and the power of marketing over engineering. But what is
> clear is that there are many customers who are perfectly willing to pay
> a lot of extra money (a factor of 2, 3 or 10 more than the lowest
> bidder) and select a vendor that gives them a warm and fuzzy feeling
> (and maybe also real technical advantages, or even contractual
> guarantees) about the quality and reliability of the disk array.
Two drive modest configuration RAID 1 arrays are the issue.
| |
| Ron Reaugh 2004-07-16, 5:45 pm |
|
<_firstname_@lr_dot_los-gatos_dot_ca.us> wrote in message
news:1089999993.239394@smirk...
> In article <H3GJc.261014$Gx4.45769@bgtnsc04-news.ops.worldnet.att.net>,
> Ron Reaugh <ron-reaugh@worldnet.att.net> wrote:
>
> For an enterprise-class disk array, this is catastrophic (see previous
> message).
See the thread title and the thread itself and what the issue is.
> It will cause an alert to field service personnel. Often,
> the customer will have to be told officially (even if the customer has
> not detected the read failure yet).
>
> For a small RAID array (for example a RAID card on the PCI bus with 2
> or 4 drives, and a single-system file system like NTFS or ext3 on
> top): Most of the time nobody cares.
Now we're back to our thread and my point.
> The performability expectation
> for such a system is sufficiently low that loss of sectors can often
> be tolerated. In particular because in typical file system workloads
> (excluding data bases), much of the data is written, read maybe for a
> short period after being written (for example by the next nightly
> backup), and never touched again.
>
>
> Sorry,
No, read it again: "Is undetected "silent sector deterioration" actually
much of a threat to real world current two drive RAID 1 reliability? I say
no."
| |
| Malcolm Weir 2004-07-16, 5:45 pm |
| On Fri, 16 Jul 2004 20:44:14 GMT, "Ron Reaugh"
<ron-reaugh@worldnet.att.net> wrote:
>A single sector lost does not constitute RAID 1 failure.
A single byte lost does.
> Does RAID 1
>operate whereby each read is redundant and then the two read datasets are
>compared in OS buffers? NO! There is a failure rate that such would catch
>although obscure. Does that constitute a RAID 1 failure? Folks are
>grasping into obscurity and very low probabilities.
Ron is determined to try to maintain the fiction that he has a clue.
He is failing.
Malc.
| |
| Bill Todd 2004-07-16, 5:45 pm |
|
<_firstname_@lr_dot_los-gatos_dot_ca.us> wrote in message
news:1089999993.239394@smirk...
....
> One particular worrisome trend is "off-track writes", which is rumored
> to be more common in consumer-grade disks (typically IDE disks): If
> during writing mechanical vibration occurs, the head might wander off,
> and write the new data slightly off the track, without completely
> overwriting the data on the track. If you now seek away and come back
> to read later, you can get lucky and by coincidence settle on the new
> data, or you can get unlucky and hit the old track, and read old data
> (which is still there, with perfectly valid ECCs, but maybe not for a
> whole track and only for a few sectors). You can see how this can be
> quite catastrophic, even in a non-redundant system.
Hmmm. This sounds similar (but not identical) to a couple of failure modes
(which I may be recalling from Jim Gray's book): silent failure to write at
all, or a 'wild write' that hits sectors other than those it was aimed at.
I'm still plugging away at a file system that can tolerate (and correct)
such errors without undue excess overhead. Think there's any market for it?
....
> What you might detect here is a certain mindset. We all know that
> individual disks are fallible, and we've learned to live with this
> (operative word here is "backup").
Though that won't necessarily save you from the types of errors mentioned
above.
> For small RAID arrays (often based
> on motherboards or PCI cards, or hidden in the back end of NAS
> servers), we do a few simple steps that give you a huge improvement in
> reliability, but are still considered somewhat unreliable.
I'd like to change that.
> For most
> personal and small business users, these small RAID systems give you a
> huge bang for the buck. But once you enter the realm of the big
> enterprise storage systems, things change, and you MUST NEVER EVER
> LOSE DATA (in all upper case), because if you do, high-level
> executives will have their busy schedules interrupted, and your
> engineer's behind will be on the line or toast.
Well, I've always taken a somewhat more idealistic view: you should never,
ever lose data because 1) it can be a major inconvenience even for the
small-system user and 2) the technology to ensure against such loss exists,
even at PC-level prices (though if you're running without ECC memory or with
a buggy - e.g., overclocked - processor there's really not too much we can
do at the storage end to help you out).
> The reason the
> enterprise storage systems are so expensive (in terms of $/GB) is that
> they are fantastically well built, and vendors go to extraordinary
> lengths to stand behind them.
OTOH, you can get away with far less sturdy (and thus far less expensive)
boxes if you wrap end-to-end redundancy checks around multiple instances of
them. And since anyone seriously interested in availability will be running
at least two geographically- (or at least slightly-)separated instances, the
only real additional overhead is in implementing those checks effectively.
- bill
| |
| Ron Reaugh 2004-07-16, 5:45 pm |
|
"Bill Todd" <billtodd@metrocast.net> wrote in message
news:OY-dnWnc7pWs2mXd4p2dnA@metrocast.net...
> Well, I've always taken a somewhat more idealistic view: you should never,
> ever lose data because 1) it can be a major inconvenience even for the
> small-system user and 2) the technology to ensure against such loss exists,
> even at PC-level prices (though if you're running without ECC memory or with
> a buggy - e.g., overclocked - processor there's really not too much we can
> do at the storage end to help you out).
Oh so you do realize the weakness. And it's only OCed systems without ECC
that are fallible is it now??
| |
| Bill Todd 2004-07-16, 8:45 pm |
|
"Ron Reaugh" <ron-reaugh@worldnet.att.net> wrote in message
news:39YJc.102162$OB3.16516@bgtnsc05-news.ops.worldnet.att.net...
>
> "Bill Todd" <billtodd@metrocast.net> wrote in message
> news:OY-dnWnc7pWs2mXd4p2dnA@metrocast.net...
>
> Oh so you do realize the weakness.
I realize far more than you're ever likely to be able to imagine, Ron.
Don't you ever get tired of being an idiot?
- bill
| |
| Peter da Silva 2004-07-16, 8:45 pm |
| In article <y6XJc.101955$OB3.70179@bgtnsc05-news.ops.worldnet.att.net>,
Ron Reaugh <ron-reaugh@worldnet.att.net> wrote:
> A single sector lost does not constitute RAID 1 failure. Does RAID 1
> operate whereby each read is redundant and then the two read datasets are
> compared in OS buffers? NO! There is a failure rate that such would catch
> although obscure. Does that constitute a RAID 1 failure? Folks are
> grasping into obscurity and very low probabilities.
If you weren't trying to avoid the loss of a single sector you could make
your RAID logic a lot simpler.
--
I've seen things you people can't imagine. Chimneysweeps on fire over the roofs
of London. I've watched kite-strings glitter in the sun at Hyde Park Gate. All
these things will be lost in time, like chalk-paintings in the rain. `-_-'
Time for your nap. | Peter da Silva | Har du kramat din varg, idag? 'U`
| |
| Ron Reaugh 2004-07-16, 8:45 pm |
|
"Peter da Silva" <peter@abbnm.com> wrote in message
news:cd9sk4$h0b$1@jeeves.eng.abbnm.com...
> In article <y6XJc.101955$OB3.70179@bgtnsc05-news.ops.worldnet.att.net>,
> Ron Reaugh <ron-reaugh@worldnet.att.net> wrote:
>
> If you weren't trying to avoid the loss of a single sector you could make
> your RAID logic a lot simpler.
No, that reverse logic doesn't follow. No one is trying to lose anything.
The question is whether the unlikely but theoretically possible loss (there are
other theoretically possible losses which seem to be easily ignored) of a
sector in a two drive RAID 1 configuration is necessarily catastrophic. The
answer is that most often it will not stop the show. The likelihood of this
loss scenario is dramatically reduced by normal regular activities like
regular use, BU and defrag. You take a low probability vulnerability and
diminish its likelihood further and then even if it happens it's most
likely to not be a show stopper...what are you left with in a two drive RAID
1 configuration......a non-issue.
| |
| Robert Wessel 2004-07-17, 2:45 am |
| "Ron Reaugh" <ron-reaugh@worldnet.att.net> wrote in message news:<H6XJc.266039$Gx4.239946@bgtnsc04-news.ops.worldnet.att.net>...
> (...)
> Remember that this discussion was about two drive RAID 1.
> (...)
> And what percentage of "bit error" goes undetected overall system wise?
> (...)
> Two drive modest configuration RAID 1 arrays are the issue.
Sector faults used to occur at about the same order of magnitude as
actual (whole) drive failures (several studies from the early/mid
nineties), although it seems to have gotten rather worse over the last
decade or so. This is somewhat anecdotal, but the actual total sector
error rates per drive have probably gone up a small amount (perhaps a
factor of two or three - which is remarkable given the much larger
increase in the number of sectors on a drive), but hardware failure
rates have gone down by an order of magnitude. So you lose a sector
twenty or thirty times more often than you lose a whole drive.
Without scrubbing, the MTTR is very high (since the error is never
detected, and thus never corrected), which seriously negatively
impacts the reliability of the array (at least for the sector in
question, and those in the general vicinity).
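To see why the repair time matters so much, here is a back-of-the-envelope Python sketch using the commonly cited approximation for a two-drive mirror, MTTDL ≈ MTBF² / (2 × MTTR). The formula, the 100,000-hour MTBF and the repair windows are assumptions chosen for illustration (the MTBF and the 24-hour MTTR are borrowed from Jim's earlier example). The approximation is only reasonable while MTTR is much smaller than MTBF, and the sketch simply stretches the repair window to mimic a fault that sits undetected.

```python
HOURS_PER_YEAR = 8760

def mirror_mttdl_hours(mtbf_hours, mttr_hours):
    """Approximate mean time to data loss for a two-drive mirror:
    either drive fails first (rate 2/MTBF), then the other must fail
    within the repair window (probability about MTTR/MTBF)."""
    return mtbf_hours ** 2 / (2 * mttr_hours)

mtbf = 100_000                       # hours, illustrative
for label, mttr in [("24 h repair", 24),
                    ("weekly scrub", 7 * 24),
                    ("undetected for a year", HOURS_PER_YEAR)]:
    years = mirror_mttdl_hours(mtbf, mttr) / HOURS_PER_YEAR
    print(f"{label:>22}: about {years:,.0f} years")
```

Whatever the exact inputs, the result scales inversely with the window during which only one good copy exists, which is the effect described above.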
This is a bit dated, but: "Latent Sector Faults and Reliability of
Disk Arrays," by HANNU H. KARI:
http://www.cs.hut.fi/~hhk/phd/phd.html
| |
| _firstname_@lr_dot_los-gatos_dot_ca.us 2004-07-17, 2:45 am |
| In article <39YJc.102162$OB3.16516@bgtnsc05-news.ops.worldnet.att.net>,
Ron Reaugh <ron-reaugh@worldnet.att.net> wrote:
>"Bill Todd" <billtodd@metrocast.net> wrote in message
>news:OY-dnWnc7pWs2mXd4p2dnA@metrocast.net...
>
>
>Oh so you do realize the weakness. And it's only OCed systems without ECC
>that are fallible is it now??
Malcolm and Bill: You know, Ron is right - within a certain class of
users and applications. For some other class of application (the
high-end ones that get all the attention), he is totally and
dangerously wrong.
If you are a desktop user who uses MS Windows, and MS Word, and
Outlook and IE, as a personal desktop, all the reliability that you
need is provided by a single disk (use a crappy IDE from Fry's), and
a real simple backup mechanism (for example once a week copy all the
user files, that is the content of "My Documents", to a writeable CD).
Even a full-blown disk failure (whether it is complete failure to spin
up, or read errors, or even data corruption) requires you to at worst
buy a new disk, reinstall the OS, and copy your documents back from the
most recent weekly CD. You need no RAID. One could argue that you
actually don't need a real computer, but that would be cynical.
If you run a minor server, with a white-box PC, maybe running Linux
and Apache, or Windows and SQLserver: The risks to this system are so
huge (for example from incompetent administration, bad power supplies)
that a single RAID card or RAID on the motherboard, with a pair of
RAID-1 IDE disks is more than adequate. Even with this pretty crappy
disk setup, the chance that a disk failure takes you out is small,
compared to the other risks (let's not even start on the risks that
SQLserver has a bug and corrupts the data, which is much more likely
than disk failures). In this realm, Ron is right: If there is a read
error during a RAID-1 rebuild, just mark the sector as bad, and pray
that it wasn't an inode, or the index of the database.
As a matter of fact, if the guy's disk is only 70% full, chances are 30%
that the victim is an unallocated sector, and the bad sector will get
remapped on the next write without anyone being any wiser.
If you need a storage system to support the trading desk for Morgan
Stanley, or the database for the social security administration, or an
e-business where failure of the computer would cause complete
disruption of the revenue flow (categoric example: pornographic web
site), then we have to use a disk system that is EXTREMELY HIGHLY
RIDICULOUSLY reliable, and costs the big $$$. This is where you get
yourself a big HP server, run IBM DB2 on it, and put the data on four
Hitachi Lightnings (two of them at the local site, each internally
RAIDed, mirrored across the two with a LVM or a SAN virtualization
box, and each then synchronously remote mirrored via rented dark fiber
to two offsite facilities). By the way, all brand names in the
preceding sentence are meant as humorous illustrations; the fact that
they might be my former, current or future employers is one of these
funny coincidences. This system will cost you several tens of M$, but it
is unlikely to go down or lose data. If it happens to lose data (for
example because the field service and support team of the vendor who
put it together and manages it for you f***ed up, which does happen in
real life), the CEO of the vendor will call your CIO, and offer to
kiss any bodypart the CIO wants to have kissed ... not to mention some
major financial apologies. In this environment, quietly marking a
sector bad would be tantamount to treason, and might even start a
lawsuit.
Actually, I've heard that the single largest cause of data loss on
high-end systems is wetware failure. Commonly, if a small emergency
happens (typically in the middle of the night, at 3AM, when human
reasoning is at its worst), the team of super-experts tries to repair
it, sometimes with disastrous consequences. The only saving grace is:
More often than not it is the customer's own employees who are at
fault, so from the perspective of being a vendor, you are usually OK.
By the way, what do I run at home (I have a minor server with Linux,
Apache, mySQL, but it isn't used for anything that involves money
making): A single (non-RAIDed) 10K RPM Quantum SCSI disk, with nightly
backups to a 200GB cheap IDE disk, and occasional backups to writeable
CD or DLT tape taken offsite (whenever I feel like it). I've been
thinking of getting a cheapo IDE RAID card (I could probably swipe a
used 3Ware 4-port card from the office, we used a few of them in a
test setup and they are now gathering dust), and put 4 reasonable IDE
disks on them (for example the 80GB 7200 RPM Seagates, which can be
had for about $80 on sale). With four disks in a RAID-10
configuration, I would have more space than my current SCSI disk, it
would probably be faster (4 spindles instead of one, at least for a
read-intensive workload like a web server), and extra reliability to
boot. Just haven't had the time and energy to deal with it. Oh, and
calling the 3Ware card "cheapo" is not meant as an insult: I really
like working with them; they are inexpensive, effective, and get the
job done (for a reasonable definition of job).
--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr _dot_ los-gatos _dot_ ca.us
| |
| Bill Todd 2004-07-17, 2:45 am |
|
<_firstname_@lr_dot_los-gatos_dot_ca.us> wrote in message
news:1090041101.742428@smirk...
....
> Malcolm and Bill: You know, Ron is right - within a certain class of
> users and applications.
I don't think you've been paying close enough attention. While Ron is
certainly right in stating that there are a great many situations in which
other potential risks far outweigh the risk of loss of RAID-1-protected data
(even relatively incompetently RAID-1-protected data, as would be the case
in a non-scrubbing array), that issue is not one of those being debated.
Where he went completely off the rails was in suggesting that fatal sector
deterioration is not 'failure', and is thus irrelevant (as scrubbing would
then also be) to the calculation of the MTBF of the RAID-1 pair -
independent of what other external (non-RAID-1-pair-related) risks may or
may not exist in the environment in question.
That RAID-1-specific MTBF is what the original poster expressed interest in.
And scrubbing (or lack thereof) is beyond any shadow of a doubt of major
importance in evaluating it.
- bill
| |
| _firstname_@lr_dot_los-gatos_dot_ca.us 2004-07-17, 2:45 am |
| In article <I7udnRHWU9DsXWXdRVn-hA@metrocast.net>,
Bill Todd <billtodd@metrocast.net> wrote:
>
><_firstname_@lr_dot_los-gatos_dot_ca.us> wrote in message
>news:1090041101.742428@smirk...
>
>...
>
>
>I don't think you've been paying close enough attention. While Ron is
>certainly right in stating that there are a great many situations in which
>other potential risks far outweigh the risk of loss of RAID-1-protected data
>(even relatively incompetently RAID-1-protected data, as would be the case
>in a non-scrubbing array), that issue is not one of those being debated.
>
>Where he went completely off the rails was in suggesting that fatal sector
>deterioration is not 'failure', and is thus irrelevant (as scrubbing would
>then also be) to the calculation of the MTBF of the RAID-1 pair -
>independent of what other external (non-RAID-1-pair-related) risks may or
>may not exist in the environment in question.
>
>That RAID-1-specific MTBF is what the original poster expressed interest in.
>And scrubbing (or lack thereof) is beyond any shadow of a doubt of major
>importance in evaluating it.
>
>- bill
OK, now I understand. You are completely correct. The MTBF of a RAID
array is the time until the first bit, byte or sector fails (or is
quietly corrupted). Not the time until the whole thing falls over
dead.
--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr _dot_ los-gatos _dot_ ca.us
| |
| Ron Reaugh 2004-07-17, 2:45 am |
|
"Robert Wessel" <robertwessel2@yahoo.com> wrote in message
news:bea2590e.0407162052.44d1e5c3@posting.google.com...
> "Ron Reaugh" <ron-reaugh@worldnet.att.net> wrote in message
news:<H6XJc.266039$Gx4.239946@bgtnsc04-news.ops.worldnet.att.net>...
>
>
> Sector faults used to occur at about the same order of magnitude as
> actual (whole) drive failures (several studies from the early/mid
> nineties), although it seems to have gotten rather worse over the last
> decade or so. This is somewhat anecdotal, but the actual total sector
> error rates per drive have probably gone up a small amount (perhaps a
> factor of two or three - which is remarkable given the much larger
> increase in the number of sectors on a drive), but hardware failures
> rates have gone down by an order of magnitude. So you lose a sector
> twenty or thirty times more often than you lose a whole drive.
That information does NOT jibe with what's being reported in the industry.
Most recent HDs are detecting and self flawing sectors before they become
unreadable. When was the last time you saw any kind of failure during a BU,
defrag, copy on say a workstation from a sector becoming unreadable and just
a single sector not associated with an overall drive failure? In 1995 and
before that would happen relatively often but now almost never.
> Without scrubbing, the MTTR is very high (since the error is never
> detected, and thus never corrected), which seriously negatively
> impacts the reliability of the array (at least for the sector in
> question, and those in the general vicinity).
Assuming what you say is true and it's NOT.
> This is a bit dated, but: "Latent Sector Faults and Reliability of
> Disk Arrays," by HANNU H. KARI:
>
> http://www.cs.hut.fi/~hhk/phd/phd.html
Quite dated.
| |
| Ron Reaugh 2004-07-17, 2:45 am |
|
<_firstname_@lr_dot_los-gatos_dot_ca.us> wrote in message
news:1090041101.742428@smirk...
>
> If you run a minor server, with a white-box PC, maybe running Linux
> and Apache, or Windows and SQLserver: The risks to this system are so
> huge (for example from incompetent administration, bad power supplies)
> that a single RAID card or RAID on the motherboard, with a pair of
> RAID-1 IDE disks is more than adequate. Even with this pretty crappy
> disk setup, the chance that a disk failure takes you out is small,
> compared to the other risks (let's not even start on the risks that
> SQLserver has a bug and corrupts the data, which is much more likely
> than disk failures). In this realm, Ron is right: If there is a read
> error during a RAID-1 rebuild, just mark the sector as bad, and pray
> that it wasn't an inode, or the index of the database.
> As a matter of fact, if the guy's disk is only 70% full, chances are 30%
> that the victim is an unallocated sector, and the bad sector will get
> remapped on the next write without anyone being any wiser.
Somebody here actually DOES have a clue.
> By the way, what do I run at home (I have a minor server with Linux,
> Apache, mySQL, but it isn't used for anything that involves money
> making): A single (non-RAIDed) 10K RPM Quantum SCSI disk, with nightly
> backups to a 200GB cheap IDE disk, and occasional backups to writeable
> CD or DLT tape taken offsite (whenever I feel like it).
KinWin makes some real nice ~$30 removable shock-mounted IDE HD trays. Get
one of those plus a nice little padded case, and a big inexpensive [S]ATA HD
makes for a great offsite BU option too. There are no viable tape options in
the small/modest server environment today.
> I've been
> thinking of getting a cheapo IDE RAID card (I could probably swipe a
> used 3Ware 4-port card from the office, we used a few of them in a
> test setup and they are now gathering dust), and put 4 reasonable IDE
> disks on them (for example the 80GB 7200 RPM Seagates, which can be
> had for about $80 on sale).
Use WDCs or Maxtors.
> With four disks in a RAID-10
With the 3Ware why not RAID 5?
| |
| Ron Reaugh 2004-07-17, 2:45 am |
|
"Bill Todd" <billtodd@metrocast.net> wrote in message
news:I7udnRHWU9DsXWXdRVn-hA@metrocast.net...
>
> <_firstname_@lr_dot_los-gatos_dot_ca.us> wrote in message
> news:1090041101.742428@smirk...
>
> ...
>
>
> I don't think you've been paying close enough attention. While Ron is
> certainly right in stating that there are a great many situations in which
> other potential risks far outweigh the risk of loss of RAID-1-protected data
> (even relatively incompetently RAID-1-protected data, as would be the case
> in a non-scrubbing array), that issue is not one of those being debated.
Wrong.
> Where he went completely off the rails was in suggesting that fatal sector
> deterioration is not 'failure', and is thus irrelevant (as scrubbing would
> then also be) to the calculation of the MTBF of the RAID-1 pair -
> independent of what other external (non-RAID-1-pair-related) risks may or
> may not exist in the environment in question.
Wrong. Read what he posted and what I posted rather than making up fairy
tales.
| |
| Paul Rubin 2004-07-17, 2:45 am |
| "Ron Reaugh" <ron-reaugh@worldnet.att.net> writes:
> That information does NOT jibe with what's being reported in the industry.
> Most recent HDs are detecting and self flawing sectors before they become
> unreadable. When was the last time you saw any kind of failure during a BU,
> defrag, copy on say a workstation from a sector becoming unreadable and just
> a single sector not associated with an overall drive failure? In 1995 and
> before that would happen relatively often but now almost never.
I've seen it on recent IBM Travelstar laptop drives. I have one with
a number of bad sectors that I took out of service when the problems
appeared, and another which started developing those problems and
shortly afterwards started failing completely.
| |
| Ron Reaugh 2004-07-17, 5:45 pm |
|
"Paul Rubin" <http://phr.cx@NOSPAM.invalid> wrote in message
news:7xoemfm5x4.fsf@ruckus.brouhaha.com...
> "Ron Reaugh" <ron-reaugh@worldnet.att.net> writes:
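> > That information does NOT jibe with what's being reported in the industry.
> > Most recent HDs are detecting and self flawing sectors before they become
> > unreadable. When was the last time you saw any kind of failure during a BU,
> > defrag, copy on say a workstation from a sector becoming unreadable and just
> > a single sector not associated with an overall drive failure? In 1995 and
> > before that would happen relatively often but now almost never.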
>
> I've seen it on recent IBM Travelstar laptop drives. I have one with
> a number of bad sectors that I took out of service when the problems
> appeared, and another which started developing those problems and
> shortly afterwards started failing completely.
Any 3.5" drives do that recently?
| |
|
|
ohaya wrote:
>
> Bill Todd wrote:
>
> Hi,
>
> I'm back, and I'm bottom-posting to one of the earlier posts so that
> everything is there, as this thread is getting a little long. I hope
> that this is ok?
>
> I'm still a little puzzled about your (and I think Ron's) earlier
> comments about the article from phptr.com that I linked earlier (see
> above), and I've been trying to "reconcile" that approach/methodology to
> the ones from the tech-report.com and from the teradataforum.com.
>
> If I try to run the equivalent (hypothetical) numbers through both, I
> get vastly different results.
>
> For example, if I:
>
> - assume 100,000 hours for a single drive/device and
> - have 3 drives in RAID 1, and
> - assume 24 hours MTTR, and
> - use the tech-report.com/teradataforum.com method, I get:
>
> MTTF(RAID1) ~ 20 TRILLION hours+
>
> And, if I follow the method from phptr.com, with the same data, I get:
>
> AFR(1 drive) = 8760/100,000 = .0876
> AFR(3 drives-RAID1) = (.0876)^3 ~ .0006722
> MTBF(3 drives-RAID1) = 8760/AFR(3 drives-RAID1) ~ 13 MILLION hours+
>
> Using the method from the phptr.com page, the MTBF results are WAY less
> than the other method.
>
> Assuming that the tech-report.com/teradataforum.com method is more
> correct, and if the method from the phptr.com page is so wrong for
> calculating just a relatively simple RAID1 configuration, is ANY of the
> rest of the methods described in the phptr.com page a valid approach?
>
> The reason for my question is that the next thing that I wanted to look
> at was to use the method described in the rest of the phptr.com page
> (i.e., in the case study) to do some ballpark figuring for a more
> extended system (with more than just the raided drives), similar to what
> was in the case study, using MTBF numbers that I have for components.
>
> If any of you might be able to shed some (more) light on this, I'd
> really appreciate it.
>
> Thanks again,
> Jim
Hi All,
Interesting thread, and I'm learning a lot.
But, getting back to the original subject matter, I ran across a
document on the web that I think may explain what I was puzzled about.
The article was written by Jeffrey S. Pattavina, and it's at:
http://www.commsdesign.com/printabl...icleID=18311631
In that article, the author describes several different models and
scenarios for redundant systems, and describes the MTBF calculations for
each, and among these descriptions, you can see where the "150%"
("unmaintained systems") vs. the "square" ("maintained systems")
estimates come from.
Jim
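
The AFR arithmetic quoted at the top of this post is easy to check with a few lines of Python. This is only a sketch of that calculation as quoted here; the function names are made up for illustration, and it assumes independent drive failures with no repair.

```python
HOURS_PER_YEAR = 8760


def afr_from_mtbf(mtbf_hours):
    """Annualized failure rate, approximated as hours-per-year / MTBF."""
    return HOURS_PER_YEAR / mtbf_hours


def mirrored_mtbf(mtbf_hours, n_drives):
    """MTBF of an n-way mirror by the 'multiply the AFRs' method.

    Assumption: the mirror only fails if every drive fails within the
    same year, and the drives fail independently of one another.
    """
    afr_mirror = afr_from_mtbf(mtbf_hours) ** n_drives
    return HOURS_PER_YEAR / afr_mirror


afr_single = afr_from_mtbf(100_000)
mtbf_three = mirrored_mtbf(100_000, 3)
print(f"AFR(1 drive)          = {afr_single:.4f}")
print(f"MTBF(3 drives, RAID1) = {mtbf_three:,.0f} hours")
```

With the 100,000-hour figure this lands at roughly 13 million hours, in line with the number quoted above. It says nothing about repair time, which is where the "maintained" estimates differ.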
| |
| Bill Todd 2004-07-17, 5:45 pm |
|
"ohaya" <ohaya@cox.net> wrote in message news:40F98091.FEDBC661@cox.net...
....
> But, getting back to the original subject matter, I ran across a
> document on the web that I think may explain what I was puzzled about.
> The article was written by Jeffrey S. Pattavina, and it's at:
>
> http://www.commsdesign.com/printabl...icleID=18311631
>
> In that article, the author describes several different models and
> scenarios for redundant systems, and describes the MTBF calculations for
> each, and among these descriptions, you can see where the "150%"
> ("unmaintained systems") vs. the "square" ("maintained systems")
> estimates come from.
Indeed - and I was too harsh in my comments about the article you referred
to, since on more careful reading it makes it clear that they were talking
about 1) systems that weren't repaired when a device failed and 2) the
theoretical MTBF in the same sense that it applies to individual units
(i.e., systems where it's assumed that you simply discard the system when
its nominal service life expires, at some very small fraction of the MTBF,
and the probability of a failure *within* that service life).
Since you were explicitly asking for information about a system that *would*
be repaired on device failure, that analysis did not apply. But it was not
off-base for the specific situation it described. In fact, the reference I
mentioned discusses such systems (your repetition above jogged my memory),
but only en route to discussing maintained systems (which any RAID typically
is: you don't usually find disks on unmaintained spacecraft, since the
heads won't 'fly' in a vacuum, and in any event improving the MTBF by only
50% - assuming you really aren't going to effect repairs - hardly justifies
the use of a second disk).
- bill
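
To make the "50%" versus "square" distinction concrete, here is a minimal Python sketch. It is not taken from either article: it assumes the usual textbook model of two independent drives with constant (exponential) failure rate 1/MTBF and, for the maintained case, a constant repair rate 1/MTTR. The function names are illustrative only.

```python
def pair_mttf_unmaintained(mtbf):
    """Two drives, no repair: expected time until both have failed.

    First failure arrives after MTBF/2 on average, the survivor then
    lasts another MTBF, giving 1.5 * MTBF.
    """
    return mtbf / 2 + mtbf


def pair_mttf_maintained(mtbf, mttr):
    """Two drives with repair, from the three-state Markov chain
    (both up -> one up -> data lost), assuming exponential repair."""
    lam = 1.0 / mtbf   # per-drive failure rate
    mu = 1.0 / mttr    # repair rate
    return (3 * lam + mu) / (2 * lam ** 2)


def pair_mttf_square_rule(mtbf, mttr):
    """Rule-of-thumb approximation, valid when MTTR << MTBF."""
    return mtbf ** 2 / (2 * mttr)


mtbf, mttr = 100_000, 24  # hours, same figures used earlier in the thread
print(f"unmaintained: {pair_mttf_unmaintained(mtbf):,.0f} hours")
print(f"maintained:   {pair_mttf_maintained(mtbf, mttr):,.0f} hours")
print(f"square rule:  {pair_mttf_square_rule(mtbf, mttr):,.0f} hours")
```

Under these assumptions the unmaintained pair gains only 50% over a single drive, while the maintained pair comes out around 200 million hours, and the square rule agrees with the Markov result to well under 1%.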
| |
|
|
> Indeed - and I was too harsh in my comments about the article you referred
> to, since on more careful reading it makes it clear that they were talking
> about 1) systems that weren't repaired when a device failed and 2) the
> theoretical MTBF in the same sense that it applies to individual units
> (i.e., systems where it's assumed that you simply discard the system when
> its nominal service life expires, at some very small fraction of the MTBF,
> and the probability of a failure *within* that service life).
>
> Since you were explicitly asking for information about a system that *would*
> be repaired on device failure, that analysis did not apply. But it was not
> off-base for the specific situation it described. In fact, the reference I
> mentioned discusses such systems (your repetition above jogged my memory),
> but only en route to discussing maintained systems (which any RAID typically
> is: you don't usually find disks on unmaintained spacecraft, since the
> heads won't 'fly' in a vacuum, and in any event improving the MTBF by only
> 50% - assuming you really aren't going to effect repairs - hardly justifies
> the use of a second disk).
>
Bill,
Thanks.
Now for the "kicker" (and I hope that I don't get flamed for this)...
I don't remember if this was one of the links that I posted:
http://www.sun.com/blueprints/0602/816-5132-10.pdf
In the above article, the author examines the reliability of several
different configurations, and from looking at the way that he calculates
AFR and MTBF for redundant drives, it looks like it's an approximation
somewhere between a "Hot-standby-maintained" and
"Cold-standby-maintained".
At the beginning of that article, he does a calculation of the AFR for a
redundant pair of drives assuming MTBF of 100,000 hours for the
individual drives. Then, he goes through MTBF calculations for 3
different configurations with drives with MTBF of 1,000,000 hours.
If I take the simple redundant pair (which, again, assumed 100,000-hour
drives), I get an MTBF(System) of about 13,031,836 hours.
Now, if you look at the calculations that he does for the 3
configurations in his Case study (again, he used 1,000,000 hours for the
drive MTBF here, rather than 100,000 hours), the best MTBF was for
Architecture 2, 1,752,000 hours.
So, we have:
MTBF(RAID1 pair of 100,000 hour drives) = 13Mhours
MTBF(Architecture 2 w/1,000,000 hour drives) = 1.7Mhours
My question now is:
Given the above MTBF estimates, and given that the MTBF of the simple RAID1
pair of drives (even with 100,000-hour drives) comes out so much higher,
then purely FROM A RELIABILITY standpoint, why would anyone ever consider a
SAN-type storage architecture over a RAID1 pair of drives (again, this
question is purely from the standpoint of reliability)?
I've done several spreadsheets following the model in the above article,
and it's very difficult (actually, it might be impossible) to get the
MTBF of any of the architectures in the Case study to come even remotely
close to the MTBF of the simple RAID1 pair (even using 100,000 hours for
the simple pair).
If you add the fact that when organizations go to SANs, they often
also have a goal of centralizing all storage for their entire
organization onto the SANs ("eggs in one basket"), I'm even more
puzzled by this...
The only rationale that I can come up with is that the non-reliability
benefits of going to a centralized SAN-type store must outweigh the loss
of reliability.
Comments??
Thanks again,
Jim
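
One way to look at the two figures above is to convert them back to AFRs and see what happens when anything at all is placed in series with the mirrored pair. The sketch below is an illustration only: the controller MTBF is a hypothetical number, not taken from the article, and adding AFRs for series components is an approximation that holds when the individual AFRs are small.

```python
HOURS_PER_YEAR = 8760


def afr(mtbf_hours):
    """Annualized failure rate approximated as hours-per-year / MTBF."""
    return HOURS_PER_YEAR / mtbf_hours


def series_mtbf(component_mtbfs):
    """MTBF of components that must ALL work: add their AFRs."""
    total_afr = sum(afr(m) for m in component_mtbfs)
    return HOURS_PER_YEAR / total_afr


afr_pair = afr(13_031_836)    # figure quoted above for the mirrored drives
afr_arch2 = afr(1_752_000)    # Architecture 2 figure quoted above

# Hypothetical: one controller (or cable, or power supply) in series.
HYPOTHETICAL_CONTROLLER_MTBF = 1_000_000
pair_plus_controller = series_mtbf([13_031_836, HYPOTHETICAL_CONTROLLER_MTBF])

print(f"AFR(mirrored drives) = {afr_pair:.6f}")
print(f"AFR(Architecture 2)  = {afr_arch2:.6f}")
print(f"MTBF(drives + one series part) = {pair_plus_controller:,.0f} hours")
```

In this sketch a single series component already pulls the combined figure below one million hours, which suggests the bare-drive number and the whole-architecture numbers may not be measuring the same thing.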
| |
| _firstname_@lr_dot_los-gatos_dot_ca.us 2004-07-18, 2:45 am |
| In article <sO3Kc.103841$OB3.80303@bgtnsc05-news.ops.worldnet.att.net>,
Ron Reaugh <ron-reaugh@worldnet.att.net> wrote:
....
>With the 3Ware why not RAID 5?
With a file system workload, I hate the small-write penalty (the
read-modify-write cycle necessary for parity update). If I was doing
exclusively or mostly large files, or using a professional-strength
file system (that allocates space in a sensible and RAID-5-friendly
manner), or running a database (excluding TPC-C style small updates),
or if the incremental cost of storage between RAID-1 and RAID-5 were
an issue for me, I would change my mind and be OK with RAID-5. But in
my case, whether 4 disks give me 2x or 3x the capacity of a single
disk is pretty irrelevant (I'm having a hard time even filling my
current disks), so there is no reason to risk the speed hit that comes
from RAID-5.
Once again, I'm not saying that everyone should pick RAID-1. But
people who don't care about the capacity difference between RAID-1 and
RAID-5 should always pick RAID-1. I bet that in most commercial
environments, this situation doesn't arise often.
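
For readers who have not met the small-write penalty, here is a minimal Python sketch of the parity update it refers to. It assumes plain XOR parity over one stripe; the helper names are illustrative and do not correspond to any particular RAID implementation.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def raid5_small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Compute the new parity for a single-block update.

    The array has to READ old_data and old_parity, then WRITE new_data
    and the returned parity: four disk I/Os for one logical write.
    A two-way mirror needs only the two writes.
    """
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)


# One stripe on a 4-disk RAID-5: three data blocks plus parity.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xaa\x55"
parity = xor_blocks(xor_blocks(d0, d1), d2)

# Overwrite only d1 without touching d0 or d2.
new_d1 = b"\xff\x00"
new_parity = raid5_small_write(d1, parity, new_d1)

# The shortcut gives the same parity as recomputing over the full stripe.
full_recompute = xor_blocks(xor_blocks(d0, new_d1), d2)
print(new_parity == full_recompute)
```

The capacity side of the trade-off is the one mentioned above: with four disks, RAID-10 yields two disks' worth of space and RAID-5 yields three.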
--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr _dot_ los-gatos _dot_ ca.us
| |
| _firstname_@lr_dot_los-gatos_dot_ca.us 2004-07-18, 2:45 am |
| In article <tM3Kc.268549$Gx4.56574@bgtnsc04-news.ops.worldnet.att.net>,
Ron Reaugh <ron-reaugh@worldnet.att.net> wrote:
....
>That information does NOT jibe with what's being reported in the industry.
>Most recent HDs are detecting and self flawing sectors before they become
>unreadable. When was the last time you saw any kind of failure during a BU,
>defrag, copy on say a workstation from a sector becoming unreadable and just
>a single sector not associated with an overall drive failure? In 1995 and
>before that would happen relatively often but now almost never.
True during write: If the sector is found unreadable (unable to sync
or unable to servo), it will be quietly remapped. That's why write
errors have become just about non-existent. The only time you get them
is if the drive is out of spare sectors (at which point things are
probably going to hell in a handbasket anyhow, and the drive will
usually completely fail soon thereafter).
Not true during read. You will get errors during read. What is true:
Drives will take sectors that can still be read but are marginal, and
proactively remap them. This does reduce (maybe even greatly reduce)
the rate of read errors. In particular, it gives scrubbing more
traction. But it doesn't always help.
The counter-balancing effect is that disks are getting larger rather
quickly, while the IO bandwidth is increasing more slowly. This means
that statistically, a smaller fraction of the disk is being read, so
there is more opportunity for sectors to rot away unnoticed.
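
A rough way to see this effect is to compute how long one full pass over a disk takes at its sequential bandwidth. The sketch below uses made-up capacity and bandwidth figures purely for illustration; they are not measurements of any particular drive.

```python
def full_read_hours(capacity_gb, bandwidth_mb_per_s):
    """Hours needed to read the whole disk once at the given bandwidth.

    Uses decimal units (1 GB = 1000 MB) and ignores seeks and other load.
    """
    seconds = capacity_gb * 1000 / bandwidth_mb_per_s
    return seconds / 3600


# Hypothetical older and newer drives: capacity grows much faster
# than bandwidth, so one full pass takes longer.
older = full_read_hours(capacity_gb=9, bandwidth_mb_per_s=10)
newer = full_read_hours(capacity_gb=200, bandwidth_mb_per_s=50)

print(f"older drive: {older:.2f} hours per full pass")
print(f"newer drive: {newer:.2f} hours per full pass")
```

If the normal workload only touches a small part of the disk, the remaining sectors are read only when a scrub or rebuild gets to them, so a longer full pass means latent errors can sit undetected for longer.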
If you have a single disk, and read it just a little bit (on a
single-user system or small server), you are statistically unlikely to
see read errors anyhow. If your disk is 100% busy for years, you will
see read errors. Try this for fun: Get a few hundred disks (I think I
used to have about 150 disks in my previous lab), and run them flat
out for a few months. You will see single-sector read errors. Not
one or two, but many.
Or go to a major disk vendor, and buy a few million disks. They will
probably give you failure statistics that are not available to normal
humans. And if you buy this many disks, you probably have your own QC
department, and you study disk reliability. So you probably know
quite well how often individual sectors fail. But you will not (and
legally cannot) release this information to the public.
--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr _dot_ los-gatos _dot_ ca.us
| |
| Bill Todd 2004-07-18, 7:45 am |
|
"ohaya" <ohaya@cox.net> wrote in message news:40F99961.9DEC4166@cox.net...
....
> I don't remember if this was one of the links that I posted:
>
> http://www.sun.com/blueprints/0602/816-5132-10.pdf
It is at least very similar to the first one you posted.
>
> In the above article, the author examines the reliability of several
> different configurations,
I'll start writing comments as I read the article, rather than organize them
more formally:
Right off the bat, there's the difference between 'reliability' and
'availability' (though the former term is sometimes not as well-defined as
the latter). Reliability, at least in my view, relates to providing
*correct* results (often in a timely manner), whereas availability relates
to continuing to provide results (with no explicit guarantee that they're
correct).
You simply can't achieve that definition of 'reliability' without redundant
hardware operating in lock-step with constant hardware comparisons of
outputs at one or more levels to make sure that both subsystems agree on the
result of every operatio | | |