Data Storage - Undetected Errors Rates


Author Undetected Errors Rates
Henry Newman

2006-02-08, 5:48 pm

Does anyone know the difference between SATA and FC drives in
undetected error rates? I am sure they are high, but what is the
difference, if any?

Thanks
robertwessel2@yahoo.com

2006-02-09, 2:46 am


Henry Newman wrote:
> Does anyone know the differences in rates between SATA and FC drives
> for undetected error rates. I am sure they are high but what is the
> difference if any.



I don't know if I've ever seen that stat published by a manufacturer.
In any event this will have little or nothing to do with the interface,
and everything with the drive. And I would expect that the undetected
read error rate is proportional to the uncorrected read error rate
(although much smaller), and that stat *is* published for almost every
drive.

For example, Seagate lists the non-recoverable read errors of the
PATA/SATA Barracuda 7200.9's as 1 per 10**14 bits read. And 1/10**15
for the SCSI/SAS/FC Cheetah 15K.4's. One would expect that undetected
read errors are at least several orders of magnitude less common.

_firstname_@lr_dot_los-gatos_dot_ca.us

2006-02-09, 2:46 am

In article <1139460039.584271.70780@g43g2000cwa.googlegroups.com>,
robertwessel2@yahoo.com <robertwessel2@yahoo.com> wrote:
>
>Henry Newman wrote:

Maybe some people know. If they do, they won't talk about it in
public. The manufacturers won't talk about it for obvious reasons.
The large users of disks (disk array makers) won't talk about it
either, to protect their trade secrets and to avoid scaring their
customers.
>In any event this will have little or nothing to do with the interface,
>and everything with the drive.


Absolutely - read Anderson, Dykes and Riedel's paper "More than just
an interface: SCSI versus IDE" (or some title like it, google for it).
It so happens that today the distinction between cheap, unreliable,
slow consumer drives and expensive, reliable, fast enterprise drives
is strongly (but not completely) correlated with (S)ATA versus
SCSI/FC/SAS, but the real effect is the underlying drive.

> And I would expect that the undetected
>read error rate is proportional to the uncorrected read error rate
>(although much smaller), and that stat *is* published for almost every
>drive.


DO NOT ASSUME THIS IF YOU WANT TO BUILD RELIABLE AND LARGE DISK
SYSTEMS (by large, I mean with more than a few disks). Instead,
determine the rate, either experimentally or by talking to the
manufacturer. Just take 500 or a thousand drives, put them into a
temperature chamber, and run them for a few months, under full load.

In particular, beware of effects like off-track writes, which can
cause false (old) data to be returned without error indication.

>For example, Seagate lists the non-recoverable read errors of the
>PATA/SATA Barracuda 7200.9's as 1 per 10**14 bits read. And 1/10**15
>for the SCSI/SAS/FC Cheetah 15K.4's.


20TB (which fits into three enclosures, it is just 40 drives, about
16" high if mounted in a rack, cost might be $40K, affordable for a
small business) is 1.6 * 10**14 bits. So statistically, without RAID,
you should expect to start seeing unrecoverable read errors on a system that
size IF YOU READ THE DATA JUST ONCE, which takes hours or days. If
you read the data all year long, you will have serious problems even
on considerably smaller systems, unless you have RAID.
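
The arithmetic above can be checked with a short sketch. It assumes
decimal terabytes (10**12 bytes), independent bit errors at the quoted
rates, and a Poisson approximation for "at least one error"; none of
these assumptions come from a drive data sheet, and the function names
are purely illustrative.

```python
import math

def bits_in(terabytes):
    """Convert decimal terabytes to bits (1 TB = 10**12 bytes, 8 bits each)."""
    return terabytes * 10**12 * 8

def expected_errors(bits_read, error_rate_per_bit):
    """Expected number of unrecoverable read errors for one full read."""
    return bits_read * error_rate_per_bit

def prob_at_least_one(bits_read, error_rate_per_bit):
    """Poisson approximation; assumes errors are independent (an assumption)."""
    return 1.0 - math.exp(-expected_errors(bits_read, error_rate_per_bit))

bits = bits_in(20)                          # the 20TB system discussed above
consumer = expected_errors(bits, 1e-14)     # 1 per 10**14 bits read
enterprise = expected_errors(bits, 1e-15)   # 1 per 10**15 bits read
p_consumer = prob_at_least_one(bits, 1e-14)

print(f"bits read once:          {bits:.2e}")
print(f"expected errors @1e-14:  {consumer:.2f}")
print(f"expected errors @1e-15:  {enterprise:.2f}")
print(f"P(>=1 error) @1e-14:     {p_consumer:.2f}")
```

Under these assumptions a single full read at the 10**-14 rate has an
expected error count above one, which is the point being made.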

--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr_dot_los-gatos_dot_ca.us
Bill Todd

2006-02-09, 2:46 am

robertwessel2@yahoo.com wrote:
> Henry Newman wrote:
>
>
> I don't know if I've ever seen that stat published by a manufacturer.


I have, but so long ago that I'm not certain I remember it correctly.

....

> For example, Seagate lists the non-recoverable read errors of the
> PATA/SATA Barracuda 7200.9's as 1 per 10**14 bits read. And 1/10**15
> for the SCSI/SAS/FC Cheetah 15K.4's. One would expect that undetected
> read errors are at least several orders of magnitude less common.


My dim recollection is that the number of orders was 3 (10**-15 vs.
10**-18, I think, for a high-end SCSI drive).

OTOH, I happened to be reading a paper just yesterday ("Reliability
Mechanisms for Very Large Storage Systems") which casually mentioned
that 10**-15 (only 1 order of magnitude lower than a common value for
the uncorrected error rate) is a common value for the undetected error
rate. It could have been a mistaken reference to the uncorrected error
rate, but several of the authors are people of some note so that would
kind of surprise me. And it's possible that the continuing push toward
higher densities has reduced the separation between the two values over
time.
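
To put the two candidate undetected error rates in perspective, here is
a small sketch that converts each rate into the amount of data read per
expected error. It assumes decimal terabytes and treats the rates as
exact "1 per 10**N bits" figures; the function name is made up for
illustration.

```python
def terabytes_per_error(exponent):
    """Data read (decimal TB) per expected error at a rate of 1 per 10**exponent bits."""
    bits_per_error = 10**exponent
    return bits_per_error / (8 * 10**12)

for exponent in (14, 15, 18):
    tb = terabytes_per_error(exponent)
    print(f"1 per 10**{exponent} bits: one expected error per {tb:,.1f} TB read")
```

The gap between 10**-15 and 10**-18 is the difference between one
expected undetected error per roughly a hundred terabytes read and one
per roughly a hundred thousand terabytes read.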

- bill
Bill Todd

2006-02-09, 2:46 am

_firstname_@lr_dot_los-gatos_dot_ca.us wrote:

....

> 20TB (which fits into three enclosures, it is just 40 drives, about
> 16" high if mounted in a rack, cost might be $40K, affordable for a
> small business) is 1.6 * 10**14 bits. So statistically, without RAID,
> you should expect to start seeing unrecoverable read errors on a system that
> size IF YOU READ THE DATA JUST ONCE, which takes hours or days. If
> you read the data all year long, you will have serious problems even
> on considerably smaller systems, unless you have RAID.


And possibly even if you *do* have RAID.

A 500 GB disk has 4*10**12 bits on it. Even a modest RAID-5 array of
such disks (say, 5 disks total) thus has 1.6*10**13 bits on the
surviving disks after one disk fails.

In other words, there appears to be something like a 16% chance that
you'll encounter an unreadable sector on one of the survivors while
attempting to rebuild the failed disk, and lose some data. Note that
this completely dwarfs the data-loss probability derived by a
conventional RAID analysis which only considers the chance of a second
whole-disk failure: that's why interest in RAID-6 (double-parity
protection) has increased markedly of late.
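
The rebuild estimate can be reproduced with a short sketch. It assumes
decimal gigabytes, that every bit on every surviving disk must be read,
that errors are independent at the quoted 1-per-10**14 rate, and a
Poisson approximation; the function name is illustrative only.

```python
import math

def rebuild_risk(disk_gb, total_disks, error_rate_per_bit):
    """Return (bits to read, expected errors, P(at least one error))
    for a RAID-5 rebuild after a single disk failure."""
    surviving_bits = (total_disks - 1) * disk_gb * 10**9 * 8
    expected = surviving_bits * error_rate_per_bit
    # Poisson approximation, assuming independent errors (an assumption)
    p_loss = 1.0 - math.exp(-expected)
    return surviving_bits, expected, p_loss

surviving_bits, expected, p_loss = rebuild_risk(500, 5, 1e-14)
print(f"bits on surviving disks: {surviving_bits:.1e}")
print(f"expected unreadable sectors during rebuild: {expected:.2f}")
print(f"probability of at least one: {p_loss:.1%}")
```

The expected count comes out at 0.16; the probability of hitting at
least one error is slightly lower (about 15%), since the two only
coincide when the expected count is very small.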

Now, the statements in the previous post suggest that uncorrectable
errors crop up randomly, but that doesn't seem consistent with the idea
that 'scrubbing' disks in the background to find - and fix - unreadable
sectors *before* they're needed is useful. If the manufacturer's value
for the incidence of uncorrectable reads does not assume such scrubbing,
the next question is how much scrubbing helps things (and whether too
much scrubbing can be harmful...).

- bill
_firstname_@lr_dot_los-gatos_dot_ca.us

2006-02-09, 5:48 pm

Hi Bill,

In article <K4mdnS-dlKVucXfeRVn-hQ@metrocastcablevision.com>,
Bill Todd <billtodd@metrocast.net> wrote:
>_firstname_@lr_dot_los-gatos_dot_ca.us wrote:
>Now, the statements in the previous post suggest that uncorrectable
>errors crop up randomly, but that doesn't seem consistent with the idea
>that 'scrubbing' disks in the background to find - and fix - unreadable
>sectors *before* they're needed is useful. If the manufacturer's value
>for the incidence of uncorrectable reads does not assume such scrubbing,
>the next question is how much scrubbing helps things (and whether too
>much scrubbing can be harmful...).


Apologies - I had no intention of suggesting such a thing. Certainly,
some classes of uncorrectable errors are not random. Certainly,
scrubbing (reading all the data regularly) is an excellent thing to
do. An even better thing to do is to scrub the data by copying it
around on the drive or among drives, as drives will check for write
errors, and fix marginal or bad sectors.

I don't know whether too much scrubbing can be harmful; one might
imagine that the extra actuator activity might wear something out;
hard to say. Scrubbing done badly can obviously kill performance;
disk arrays should only go into scrubbing mode if the foreground
workload will not feel any performance impact from the scrubbing
workload. Given the sad state of performance management in disk
drives, implementing scrubbing requires a gentle touch.
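
As a rough illustration of read scrubbing that yields to foreground
work, here is a minimal sketch. It is not how any particular disk array
does it: the `is_idle` callback is a hypothetical stand-in for whatever
load detection a real system would use, the chunk size and backoff are
arbitrary, and reads that go through the operating system's cache may
never touch the media at all.

```python
import os
import time

def scrub(path, chunk_size=1 << 20, is_idle=lambda: True, backoff_s=0.5):
    """Read every chunk of `path` once and return the offsets that failed.

    `is_idle` is a hypothetical hook: the scrub waits while it returns False.
    """
    bad_offsets = []
    with open(path, "rb", buffering=0) as f:
        size = f.seek(0, os.SEEK_END)   # works for files and block devices
        offset = 0
        while offset < size:
            while not is_idle():        # back off while foreground work runs
                time.sleep(backoff_s)
            try:
                f.seek(offset)
                data = f.read(chunk_size)
            except OSError:
                # Unreadable region: record it and move on to the next chunk.
                bad_offsets.append(offset)
                offset += chunk_size
                continue
            if not data:                # unexpected early end of data
                break
            offset += len(data)
    return bad_offsets
```

A real implementation would go on to repair each bad offset from
redundancy (for example by rewriting the sector from parity), which
this sketch does not attempt.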

--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr_dot_los-gatos_dot_ca.us