there has to be a better way, or we are stuffed!

    there has to be a better way, or we are stuffed!  
efffemm@f-m.fm




 
05-24-07 06:14 AM

Company I work for has a SAN of about 50 TB.
It is configured as 4 logical disks. So when there is a
failure, that logical disk is out of action while
the RAID rebuilds itself.  "Hot swapping" doesn't
help much when 1/4 of the system is paralysed for
hours afterwards. It seems about once a month that
one hard drive shits itself, and has to be replaced,
triggering the fiasco again.
Then the system engineer says it would be a good idea to
run complete diagnostics. That means taking everything offline
for 172800 seconds (48 hours) = gazillions of dollars lost.









    Re: there has to be a better way, or we are stuffed!  
ajm163@yahoo.com




 
05-24-07 06:14 PM

On May 23, 6:55 pm, efff...@f-m.fm wrote:
> Company I work for has a SAN of about 50 TB.
> It is configured as 4 logical disks. So when there is a
> failure, that logical disk is out of action while
> the RAID rebuilds itself.  "Hot swapping" doesn't
> help much when 1/4 of the system is paralysed for
> hours afterwards. It seems about once a month that
> one hard drive shits itself, and has to be replaced,
> triggering the fiasco again.
> Then the system engineer says it would be a good idea to
> run complete diagnostics. That means taking everything offline
> for 172800 seconds (48 hours) = gazillions of dollars lost.



How many physical disks are we talking about? Even if there are a lot,
losing one a month seems like a really high failure rate to me. What
type of drives are they? Also, I don't know who your RAID vendor is,
but with RAID 5 you should be able to continue writing to the LUN even
with a failed disk. The LUN should be in a critical state but still
accessible at a slower speed (overhead of the rebuild process). I'd talk
to my RAID vendor about the massive failure rate. What you would also
need to do is make smaller LUNs. That way, when a drive fails, you don't
take as much of a hit: maybe 1/15 of your storage is offline instead of 1/4.

just some suggestions

AJ
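
To put numbers on the smaller-LUN suggestion, here is a rough sketch. It assumes the 50 TB from the original post is split into equally sized LUNs and that a single drive failure affects exactly one LUN; the function name `lun_impact` is made up for illustration and is not part of any vendor tool.

```python
from fractions import Fraction

def lun_impact(total_tb, lun_count):
    """Return (fraction_of_storage, terabytes) held by one LUN.

    Assumption: all LUNs are the same size; real layouts may differ.
    """
    share = Fraction(1, lun_count)
    return share, float(total_tb * share)

# Compare the current layout (4 LUNs) with the suggested one (15 LUNs).
for count in (4, 15):
    share, tb = lun_impact(50, count)
    print(f"{count} LUNs: {share} of storage ({tb:.1f} TB) affected per failure")
```

With these assumptions, moving from 4 to 15 LUNs shrinks the affected slice from 12.5 TB to roughly 3.3 TB per failure.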









    Re: there has to be a better way, or we are stuffed!  
Nik Simpson




 
05-25-07 12:14 AM

ajm163@yahoo.com wrote:
> On May 23, 6:55 pm, efff...@f-m.fm wrote: 
>
>
>
> How many physical disks are we talking about? Even if there are a lot,
> losing one a month seems like a really high failure rate to me. What
> type of drives are they? Also, I don't know who your RAID vendor is,
> but with RAID 5 you should be able to continue writing to the LUN even
> with a failed disk. The LUN should be in a critical state but still
> accessible at a slower speed (overhead of the rebuild process). I'd talk
> to my RAID vendor about the massive failure rate. What you would also
> need to do is make smaller LUNs. That way, when a drive fails, you don't
> take as much of a hit: maybe 1/15 of your storage is offline instead of 1/4.
>
> just some suggestions
>
> AJ
>

Have to agree with AJ,

1. Losing a drive shouldn't take the volumes offline; the whole point of
RAID is to prevent that.

2. Failure rates seem very high

3. If your vendor can't figure it out, it's time to look at a new vendor
--
Nik Simpson








    Re: there has to be a better way, or we are stuffed!  
Rob Turk




 
05-25-07 12:14 AM

"Nik Simpson" <n_simpson@bellsouth.net> wrote in message
news:1Dk5i.14316$KC4.4587@bignews6.bellsouth.net...
> ajm163@yahoo.com wrote:
>
> 2. Failure rates seem very high
>

It all depends on the number of drives involved. 50TB RAID capacity might be
about 60TB native. If that's made up of 146GB disks you'd be talking about
400+ drives. With 3% annual failure rate (which is not unusual) that would
be about 12 per year.

Rob
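
The estimate above can be checked with a few lines of arithmetic. This is only a sketch using the figures from the post (60 TB native, 146 GB disks, 3% annual failure rate). Decimal units (1 TB = 1000 GB), independent failures at a constant rate, and the function name `expected_failures` are assumptions made for illustration.

```python
import math

def expected_failures(native_tb, disk_gb, annual_failure_rate):
    """Return (drive_count, expected_failures_per_year).

    Assumptions: decimal units (1 TB = 1000 GB) and every drive
    failing independently at the same annual rate.
    """
    # Round up: a partial drive still means one more physical disk.
    drives = math.ceil(native_tb * 1000 / disk_gb)
    return drives, drives * annual_failure_rate

drives, per_year = expected_failures(native_tb=60, disk_gb=146,
                                     annual_failure_rate=0.03)
print(f"{drives} drives, about {per_year:.1f} failures per year")
```

Under these assumptions the result is 411 drives and about 12 failures a year, which matches the once-a-month rate the original poster describes.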










    Re: there has to be a better way, or we are stuffed!  
Nik Simpson




 
05-25-07 12:14 AM

Rob Turk wrote:
> "Nik Simpson" <n_simpson@bellsouth.net> wrote in message
> news:1Dk5i.14316$KC4.4587@bignews6.bellsouth.net... 
>
> It all depends on the number of drives involved. 50TB RAID capacity might be
> about 60TB native. If that's made up of 146GB disks you'd be talking about
> 400+ drives. With 3% annual failure rate (which is not unusual) that would
> be about 12 per year.
>
> Rob
>
>
True, I guess I was just assuming larger drives, but good point, OP
needs to tell us a little more about the configuration.

--
Nik Simpson








    Re: there has to be a better way, or we are stuffed!  
GraemeDods@gmail.com




 
05-25-07 06:18 AM

On May 24, 8:55 am, efff...@f-m.fm wrote:
> Company I work for has a SAN of about 50 TB.
> It is configured as 4 logical disks. So when there is a
> failure, that logical disk is out of action while
> the RAID rebuilds itself.  "Hot swapping" doesn't
> help much when 1/4 of the system is paralysed for
> hours afterwards. It seems about once a month that
> one hard drive shits itself, and has to be replaced,
> triggering the fiasco again.
> Then the system engineer says it would be a good idea to
> run complete diagnostics. That means taking everything offline
> for 172800 seconds (48 hours) = gazillions of dollars lost.

Either the storage system is configured very badly or it's a very poor
design. A single disk failure should not have such a significant
impact on performance. You should be able to replace the drive and let
the system rebuild it in the background and still allow user/
application access to the logical disks. For that quantity of storage
and for the cost of down-time (given your mention of lost revenue)
this storage should be a highly available enterprise level solution.
If that's what you've paid for, it certainly sounds like that's not
what you've got. Care to elaborate on what systems you're actually
running?

Graeme









    Re: there has to be a better way, or we are stuffed!  
Lon




 
06-05-07 06:13 AM

efffemm@f-m.fm proclaimed:

> Company I work for has a SAN of about 50 TB.
> It is configured as 4 logical disks. So when there is a
> failure, that logical disk is out of action while
> the RAID rebuilds itself.  "Hot swapping" doesn't
> help much when 1/4 of the system is paralysed for
> hours afterwards. It seems about once a month that
> one hard drive shits itself, and has to be replaced,
> triggering the fiasco again.
> Then the system engineer says it would be a good idea to
> run complete diagnostics. That means taking everything offline
> for 172800 seconds (48 hours) = gazillions of dollars lost.
>
One of the better ways is not to create such huge logical disks.
Common practice; wait till you see the fsck or chkdsk times on 40
terabytes.








All times are GMT.