Data Storage - RAID1 rebuild time question


Author RAID1 rebuild time question
Frank de Groot

2005-03-10, 5:45 pm

I have a RAID1 with a Silicon Image SATA controller sil3114.

Often, my PC bluescreens (that caused data corruption on my previous HD so I
switched to RAID1).
The SATARAID utility reports in such cases always an "event" (exceeding
S.M.A.R.T status) and starts rebuilding.
Incidentally, the same happens when copying a few hundred thousand files,
after about half an hour there seems to be so much overheating that the
S.M.A.R.T status is exceeded.

Anyway. The rebuild rate has been set to "fastest" but it still takes 24
hours (or more!) to rebuild the mirror.
Is that normal?

And, more importantly, what happens when there is another crash that
corrupts the *other* HDD during the rebuilding process?
(This has already happened several times..)

Some more questions: Can I add more disks to that card (which has 4 SATA
connectors). Doesn't seem so.

And this rebuilding, why is always the entire HD rebuilt and not the sector
that is deemed corrupted?
Is rebuilding so slow to avoid overheating or taking away too many system
resources?
Can the rebuilding be sped up by a registry hack or something?

Could it be that mismatching RAM timing ratings are responsible for my many
bluescreens that cause HD corruption?

(Newly installed WinXP SP2, virus free machine).

TIA,
Frank



David A.Lethe

2005-03-11, 2:45 am

On Thu, 10 Mar 2005 18:07:49 +0100, "Frank de Groot"
<franciad@online.no> wrote:

>I have a RAID1 with a Silicon Image SATA controller sil3114.
>
>Often, my PC bluescreens (that caused data corruption on my previous HD so I
>switched to RAID1).
>The SATARAID utility reports in such cases always an "event" (exceeding
>S.M.A.R.T status) and starts rebuilding.
>Incidentally, the same happens when copying a few hundred thousand files,
>after about half an hour there seems to be so much overheating that the
>S.M.A.R.T status is exceeded.
>
>Anyway. The rebuild rate has been set to "fastest" but it still takes 24
>hours (or more!) to rebuild the mirror.
>Is that normal?
>
>And, more importantly, what happens when there is another crash that
>corrupts the *other* HDD during the rebuilding process?
>(This has already happened several times..)
>
>Some more questions: Can I add more disks to that card (which has 4 SATA
>connectors). Doesn't seem so.
>
>And this rebuilding, why is always the entire HD rebuilt and not the sector
>that is deemed corrupted?
>Is rebuilding so slow to avoid overheating or taking away too many system
>resources?
>Can the rebuilding be sped up by a registry hack or something?
>
>Could it be that mismatching RAM timing ratings are responsible for my many
>bluescreens that cause HD corruption?
>
>(Newly installed WinXP SP2, virus free machine).
>
>TIA,
>Frank
>
>

The event means your disk is dying; perhaps it is out of spare
sectors for reallocation. Replace the disk.

Frank de Groot

2005-03-11, 2:45 am

"David A.Lethe" <david@santools.com> wrote in message
news:19d231d1sreoe9ouk9pda6vvp9cda6g1f2@4ax.com...
>
> The event means your disk is dying; perhaps it is out of spare
> sectors for reallocation. Replace the disk.


I forgot to mention that all disks are brand new, almost the same serial #s
and verified OK.
When it happened the first time I traded the disk for another new one. Same
problem.

I have 5 new disks, any combination of two in a RAID1 shows the same
problem.

The same problem occurs with any other disk, not in a RAID1, but that causes
the disk to be corrupted so much that I finally lose data. So I gather it's
the MB, as replacing the OS (2000 to XP) did not do the trick either.

Still I wonder why the rebuild times are so slow with SATARAID (or the
sil3114).




Maxim S. Shatskih

2005-03-12, 2:45 am

> I have a RAID1 with a Silicon Image SATA controller sil3114.
>
> Often, my PC bluescreens (that caused data corruption on my previous HD so I


Try using Windows software RAID instead. Maybe this is a bug in the
controller's driver.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com


Frank de Groot

2005-03-12, 2:45 am

>> Often, my PC bluescreens (that caused data corruption on my previous HD
>
> Try using Windows software RAID instead. Maybe this is a bug in the
> controller's driver.


Nope.
It also happens on ordinary IDE HD's.
And it also happens on a different MOBO with the same RAM.
And it also happens with a different OS.



Frank de Groot

2005-03-12, 2:45 am

"Frank de Groot" <franciad@online.no> wrote in message
news:HCmYd.1683$SL4.30777@news4.e.nsc.no...

FYI, I now know what it is.
One of my disks just DIED on me (meaning Windows could not do a delayed
write any more and the disk was gone from the Admin tools).

Then came an AV with a message from MS saying that a certain HP driver needed
updating urgently or I could get a damaged system.
I went to the HP site and there they said that the buggy driver could
irreparably damage the boot sector and lead to loss of files, making it
inevitable that the OS needed to be reinstalled etc. Some nice mess..

The craziest part is that the app that wreaks all this havoc
(and apparently has been doing so for the past 2 years, since I bought a
scanner) is called "Memories to CD" or something.
Damage suffered: many thousands of USD, many weeks of delays and many
lost files over the years.
I wish people wouldn't force-install that crap with scanners and printers
nowadays.


Faeandar

2005-03-12, 2:45 am

On Thu, 10 Mar 2005 18:07:49 +0100, "Frank de Groot"
<franciad@online.no> wrote:

>I have a RAID1 with a Silicon Image SATA controller sil3114.
>
>Often, my PC bluescreens (that caused data corruption on my previous HD so I
>switched to RAID1).
>The SATARAID utility reports in such cases always an "event" (exceeding
>S.M.A.R.T status) and starts rebuilding.
>Incidentally, the same happens when copying a few hundred thousand files,
>after about half an hour there seems to be so much overheating that the
>S.M.A.R.T status is exceeded.


First off, mirroring will not protect against write corruption.
Whatever it writes to one it writes to the other. And a bluescreen
would likely be a write corruption since most of the OS runs out of
memory, bypassing reads. Unless you get a BSOD on boot or shortly
after login. Then maybe...
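
To illustrate the point about write corruption, here is a minimal Python sketch of a toy two-disk mirror. The Mirror class and flip_bit helper are hypothetical names made up for this example; they do not model the sil3114 or any real controller. The sketch only shows that data corrupted before the write reaches the mirror ends up identical, and identically wrong, on both disks.

```python
# Toy model of a two-disk mirror (RAID1). All names here are illustrative
# assumptions, not part of any real controller or driver API.
class Mirror:
    def __init__(self, n_blocks):
        # Two independent "disks", each a list of blocks.
        self.disks = [[b""] * n_blocks, [b""] * n_blocks]

    def write(self, block_no, data):
        # The mirror cannot tell whether `data` is what the application
        # intended; it simply duplicates whatever it is handed.
        for disk in self.disks:
            disk[block_no] = data

    def read(self, block_no, disk_no=0):
        return self.disks[disk_no][block_no]


def flip_bit(data, bit=0):
    # Stand-in for corruption that happens in RAM before the write is issued.
    corrupted = bytearray(data)
    corrupted[0] ^= 1 << bit
    return bytes(corrupted)


mirror = Mirror(n_blocks=4)
good = b"important data"
bad = flip_bit(good)  # what actually reaches the controller

mirror.write(0, bad)
copies_agree = mirror.read(0, 0) == mirror.read(0, 1)
matches_intent = mirror.read(0, 0) == good
print("copies agree:", copies_agree)
print("matches intended data:", matches_intent)
```

Both copies agree with each other, so the mirror sees nothing to repair, yet neither copy matches what was meant to be written.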

>
>Anyway. The rebuild rate has been set to "fastest" but it still takes 24
>hours (or more!) to rebuild the mirror.
>Is that normal?


You don't mention the size of the drives but in most cases yes. FC
drives can take less but most sata drives are 24 hours, some as long
as 36.

>
>And, more importantly, what happens when there is another crash that
>corrupts the *other* HDD during the rebuilding process?
>(This has already happened several times..)
>
>Some more questions: Can I add more disks to that card (which has 4 SATA
>connectors). Doesn't seem so.


If you only have 2 drives on there now then you can add 2 more.
Hopefully "connectors" really means channels. If not, you're only
going to halve your speed per drive since a channel can only handle so
much throughput.

>
>And this rebuilding, why is always the entire HD rebuilt and not the sector
>that is deemed corrupted?
>Is rebuilding so slow to avoid overheating or taking away too many system
>resources?
>Can the rebuilding be sped up by a registry hack or something?


There is what's called "sick disk recovery" where valid data on a
drive will be copied off to the spare, but I highly doubt your card
has that.
Rebuilding a mirror can seriously impact performance on a 2 drive
system. It may be purposeful, it may just be the limit of the drive.
Hardware raid rebuilds by blocks, not files. So if you only have 24gb
of data on the drive it's not rebuilding 24gb, it's rebuilding all
120/260/430gb (whatever) worth of blocks on the drive.
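
The block-versus-file distinction can be sketched in a few lines of Python. This is a toy model under stated assumptions: the block counts are made up, and rebuild_blocks is a hypothetical helper, not how any particular controller is implemented. It shows that a block-level copy walks the whole device, whatever fraction of it holds file data.

```python
def rebuild_blocks(source, target):
    """Copy every block from source to target; return how many were copied."""
    copied = 0
    for i, block in enumerate(source):
        # A block-level rebuild has no notion of files or free space,
        # so empty blocks are copied just like used ones.
        target[i] = block
        copied += 1
    return copied


TOTAL_BLOCKS = 1000   # assumed device size, in blocks
USED_BLOCKS = 200     # assumed number of blocks holding file data
EMPTY = b"\x00"

source = [b"\x01"] * USED_BLOCKS + [EMPTY] * (TOTAL_BLOCKS - USED_BLOCKS)
target = [None] * TOTAL_BLOCKS

copied = rebuild_blocks(source, target)
print("blocks holding data:", USED_BLOCKS)
print("blocks copied:      ", copied)
```

The work done scales with the size of the drive, which is why a nearly empty disk takes as long to rebuild as a full one.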

>
>Could it be that mismatching RAM timing ratings are responsible for my many
>bluescreens that cause HD corruption?


I would not think timing mismatch would be an issue, more likely bad
segments of memory if you suspect memory for some reason. It could
also be the card. Wouldn't be the first time a raid controller went
to crap slowly.

~F

Frank de Groot

2005-03-12, 7:45 am

> First off, mirroring will not protect against write corruption.

I was afraid of that.
Thanks for your answer BTW.

> You don't mention the size of the drives but in most cases yes. FC
> drives can take less but most sata drives are 24 hours, some as long
> as 36.


Indeed it takes up to 36 hours for 120 GB drives.
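
As a rough sanity check on those figures, the implied sustained copy rate can be worked out in a few lines of Python. The 120 GB size and the 24 and 36 hour durations come from the thread; the 40 MB/s sequential rate is an assumption chosen for illustration only, and decimal units (1 GB = 1000 MB) are assumed throughout.

```python
def implied_rate_mb_s(capacity_gb, hours):
    """Average copy rate needed to cover the whole drive in the given time."""
    return capacity_gb * 1000 / (hours * 3600)


def hours_at_rate(capacity_gb, rate_mb_s):
    """Time to copy the whole drive at a steady rate."""
    return capacity_gb * 1000 / rate_mb_s / 3600


CAPACITY_GB = 120        # drive size reported in the thread
ASSUMED_SEQ_MB_S = 40    # assumption: plausible sequential rate, not measured

for hours in (24, 36):
    rate = implied_rate_mb_s(CAPACITY_GB, hours)
    print(f"{hours} h rebuild -> about {rate:.2f} MB/s")

best_case = hours_at_rate(CAPACITY_GB, ASSUMED_SEQ_MB_S)
print(f"at {ASSUMED_SEQ_MB_S} MB/s -> about {best_case:.2f} h")
```

A 24 to 36 hour rebuild of 120 GB works out to roughly 0.9 to 1.4 MB/s, far below what a drive can stream sequentially, which suggests the rebuild is being throttled or is limited by something other than the disks.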

> If you only have 2 drives on there now then you can add 2 more.
> Hopefully "connectors" really means channels.


I meant to make a RAID1 that contains 3 drives + 1 hot spare instead of 2
drives + 1 hot spare.

> There is what's called "sick disk recovery" where valid data on a
> drive will be copied off to the spare, but I highly doubt your card
> has that.


No, this card (Silicon Image) doesn't even speed up reading from a RAID1.
It's software-based.

> I would not think timing mismatch would be an issue, more likely bad
> segments of memory if you suspect memory for some reason.


Will do a thorough test, thanks.

> It could
> also be the card. Wouldn't be the first time a raid controller went
> to crap slowly.


I am so serious about this ongoing issue that I bought two RAID controller
cards.
It's not the card, it happens on other drives as well, and other MOBO's as
well..
It is either a driver, or the RAM.
In a previous post I mentioned an error message after a disk crash I got (I
got this a few times before and ignored it..)
The message + HP explanation at their site, in my words, amounted to: "There
is a Hewlett-Packard program on your system "Sweet Memories To Disk", that
can totally corrupt your harddisk so that you would have to wipe it clean
and re-install everything - please download this update". Well I have done
it now..

No kidding.




Frank de Groot

2005-03-12, 7:45 am

(C) Copyright 2005 Frank A. de Groot - All rights reserved.

> Will do a thorough test, thanks.



I removed 2 of 3 DIMMS and now the disks run like a charm instead of
immediately reporting errors, even under a severe stresstest.
Looks like the cause was a faulty DIMM.


Brian Inglis

2005-03-12, 5:45 pm

On Sat, 12 Mar 2005 12:48:49 +0100 in comp.arch.storage, "Frank de
Groot" <franciad@online.no> wrote:

>I removed 2 of 3 DIMMS and now the disks run like a charm instead of
>immediately reporting errors, even under a severe stresstest.
>Looks like the cause was a faulty DIMM.


May not be a faulty DIMM, the fault may be installing 3 DIMMs: most
systems nowadays will only run with 2 normal spec DIMMs, unless all
DIMMs are registered or tested: check your motherboard manufacturer's
web site for DIMM specs, and what specific brands and models are
allowed when you have more than 2 installed.

--
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada

Brian.Inglis@CSi.com (Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
fake address use address above to reply

Frank de Groot

2005-03-12, 5:45 pm

"Brian Inglis" <Brian.Inglis@SystematicSW.Invalid> wrote in message
news:nk9631topovlrjfh5ghvsceiiqu7cehvmu@4ax.com...

> May not be a faulty DIMM, the fault may be installing 3 DIMMs: most
> systems nowadays will only run with 2 normal spec DIMMs, unless all
> DIMMs are registered or tested:


You have the answer!
I had 2 "paired" DIMMs and one "rogue" with slightly different timings.
I could never find anything really *wrong* with that "odd" one, but when I
leave that one out, the system works like a charm.


Joerg Lenneis

2005-03-13, 5:46 pm


Frank de Groot:

> "Brian Inglis" <Brian.Inglis@SystematicSW.Invalid> wrote in message
> news:nk9631topovlrjfh5ghvsceiiqu7cehvmu@4ax.com...


> You have the answer!
> I had 2 "paired" DIMMs and one "rogue" with slightly different timings.
> I could never find anything really *wrong* with that "odd" one, but when I
> leave that one out, the system works like a charm.


If you want to make sure that your memory is OK now, go to
http://www.memtest.org/ or to http://www.memtest86.com/ and use the
memory testing tools there for further verification. They catch a lot
of hardware problems.

--

Joerg Lenneis

email: lenneis@wu-wien.ac.at

Torbjorn Lindgren

2005-03-13, 5:46 pm

Frank de Groot <franciad@online.no> wrote:
>"Brian Inglis" <Brian.Inglis@SystematicSW.Invalid> wrote in message
> news:nk9631topovlrjfh5ghvsceiiqu7cehvmu@4ax.com...
>
>
>You have the answer!
>I had 2 "paired" DIMMs and one "rogue" with slightly different timings.
>I could never find anything really *wrong* with that "odd" one, but when I
>leave that one out, the system works like a charm.


There are a multitude of possible causes for this.

It could be the differences between the DIMMs that cause this; the board
might not handle that many "sides", or it might require slowing down memory
accesses with that many "sides" (standard DIMMs can have one or two
sides).

Personally I'd suspect that the most likely cause is the difference
between the DIMMs, and that either rearranging the DIMMs (so that the
BIOS sees the "slower" one first, unless they have *different* slow
parameters) or manually turning the memory speed down slightly (CAS,
speed or one of the other parameters) will fix it. Which way the BIOS
scans is totally undocumented, and it SHOULD query all of them, but in
reality rearranging does help surprisingly often.

The second most likely cause is that you need to reduce memory speed
due to too many sides. The reason I list this as less likely is that
this is something that the BIOS almost always gets right without help.
But check the manual and see what it says about memory.

I usually use Memtest86+ to verify that it works. It takes time (the
longer the better, let it run overnight), and isn't guaranteed to
always catch all errors, but it's fairly good.

It also shows what memory speed settings are used, so you can easily
see if the setting changes as you rotate the three dimms (three
tests).

(The original Memtest86 is also pretty good and they now trade
information back and forth, I find the + version still to be better).

http://www.memtest.org/

Copyright 2003 - 2008 webservertalk.com