Data Storage - How would you store 100TB data?


Author How would you store 100TB data?
hikerhauk@gmail.com

2006-02-25, 5:53 pm

Hi, I'm a student, and out of curiosity I want to know how data
centers usually store their data. Some mail providers like Gmail and
Hotmail must have a huge volume of data, even properly compressed. I just
can't imagine they could store it on hard drives. Are there any special
devices?

Thanks in advance.

Paul Rubin

2006-02-26, 11:31 am

hikerhauk@gmail.com writes:
> Hi, I'm a student and out of my curiousness I want to know how data
> centers usually store their data. Some mail provider liker Gmail and
> Hotmail must have huge volume of data, even properly compressed. I just
> can't imagine they could store it in hard drives. Is there any special
> devices?


Of course they use hard drives. Installed in umpteen thousand
computers. What else would they use?!

Robert Klute

2006-02-26, 11:31 am

On 25 Feb 2006 15:28:24 -0800, hikerhauk@gmail.com wrote:

>Hi, I'm a student and out of my curiousness I want to know how data
>centers usually store their data. Some mail provider liker Gmail and
>Hotmail must have huge volume of data, even properly compressed. I just
>can't imagine they could store it in hard drives. Is there any special
>devices?
>



If the data is needed quickly, then disk is it. Just check out some
vendors' websites - HP (whose products I will use as examples), IBM, EMC,
Fujitsu, Hitachi, 3Par, ... the list goes on.

One example would be the XP12000. Currently, it supports up to 290TB
usable storage per array (5 cabinets in width).

http://h18006.www1.hp.com/products/...2000/index.html

For Linux HPC there is the Scalable File Share (HP SFS), which can scale
up to 1024TB.

http://h20311.www2.hp.com/HPC/cache...-0-0-0-121.html


If the need is for nearline archival storage, then products such as the
StorageWorks 6000 Virtual Library System can be used. It currently can
store up to 70TB per system.

http://h18006.www1.hp.com/products/...0vls/index.html


hikerhauk@gmail.com

2006-02-26, 11:32 am

Thanks. You really broadened my vision.

_firstname_@lr_dot_los-gatos_dot_ca.us

2006-02-26, 11:32 am

In article <1140910104.553084.114510@v46g2000cwv.googlegroups.com>,
<hikerhauk@gmail.com> wrote:
>Hi, I'm a student and out of my curiousness I want to know how data
>centers usually store their data. Some mail provider liker Gmail and
>Hotmail must have huge volume of data, even properly compressed. I just
>can't imagine they could store it in hard drives. Is there any special
>devices?


In a nutshell, they are stored on disks. Typically, one takes several
or many disks, and puts them together in a "disk array" - a box
ranging in size from a toaster (3U rackmount) to several
refrigerators, containing anywhere between a half-dozen and several
thousand disks. They range in cost from $1K to $10M. Disk arrays are
typically connected to the computer by a variety of network
technologies, ranging from SCSI or SATA for low-end to fibre channel
and ESCON/FICON for high end; Ethernet (iSCSI and other IP-based
protocols) is making rapid inroads.

Another option is to build a box that contains many disks, and then
serves file system storage protocols such as NFS and CIFS. This is
known as NAS or network-attached-storage. Such NAS boxes again range
in size from toaster to multiple refrigerators, with similar cost
ranges.

The largest single storage system I know of is the disk farm used by
the Livermore supercomputer; I think it contains about 10000 disks,
and stores several petabytes. I'm sure you can google for the
details. There are many places that have more storage than this
system, but typically not in a single storage system.

Since disks are so unreliable (they are typically the least reliable
computer thing in a data center, excluding the air conditioning, which
tends to be even less reliable, and often leaks all over the place),
disk arrays and NAS boxes typically use multiple disks in a redundant
way. The easiest thing is to store every byte on two separate disks,
which in effect amounts to always using pairs of disks to build
mirrored virtual disks (known in the trade as RAID-1). More efficient
ways to achieve redundancy use erasure codes, beginning with simple
parity (typically known as RAID-5). For even higher reliability, one
can use more than two copies of the data. High-end disk arrays often have
the capability to automatically copy the data over wide-area network
links to remote sites (other buildings, other cities), thereby
achieving reliability in the case of site disasters.
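
As a rough illustration of the simple-parity idea above, here is a toy
Python sketch. It is not how any particular array implements RAID-5; the
helper name and the sample blocks are made up for the example. The
parity block is the XOR of the data blocks, so any single lost block can
be rebuilt from the survivors.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data "disks" holding one block each, plus one parity block.
data = [b"mail", b"logs", b"scan"]
parity = xor_blocks(data)

# Pretend the second disk died: rebuild its block from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

Mirroring spends one extra disk per data disk; parity of this kind
spends one extra disk per group, which is why it is the cheaper option.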

To store 100 TB today, you only need 200 disks (with 500GB SATA
disks); with redundancy you had better make that 250 to 400 disks. With
the most current technology, this will require about one to two racks'
worth of real estate (about the size of a really large refrigerator);
if you go for the lowest cost, it should be somewhere between a
quarter and half a million dollars. A really nice 100TB system (with
redundancy, remote mirroring, great management software, from a
supplier with a good reputation and excellent customer support) can
easily cost you 10x that much. Usually, you get what you pay for,
unless you manage to find a slightly discounted lunch.
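
The disk counts above can be reproduced with a few lines of Python. This
is only a back-of-the-envelope sketch: the function name is made up, and
the 4+1 parity group size is an assumption chosen because it happens to
give the 250 figure; mirroring gives the 400.

```python
import math

def disks_needed(usable_tb, disk_gb, data_disks, redundant_disks):
    """Disks required when every group of data_disks carries redundant_disks extra."""
    raw_disks = math.ceil(usable_tb * 1000 / disk_gb)  # decimal units: 1 TB = 1000 GB
    groups = math.ceil(raw_disks / data_disks)
    return groups * (data_disks + redundant_disks)

no_redundancy = disks_needed(100, 500, 1, 0)  # plain disks, no protection
parity_4_plus_1 = disks_needed(100, 500, 4, 1)  # assumed RAID-5 style groups of 4 data + 1 parity
mirrored = disks_needed(100, 500, 1, 1)  # RAID-1 style pairs

print(no_redundancy, parity_4_plus_1, mirrored)
```

Hot spares and file system overhead are ignored here, so real systems
would land somewhat higher.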

In today's data centers, 100TB systems are common; many corporations
store considerably more data than this. Mail providers and
web-related businesses (Microsoft, Google, Yahoo) are actually not
particularly large users of storage; a lot of storage is also used in
banks, insurance, manufacturing, research, medical, as archives, and
in government agencies.

--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr_dot_los-gatos_dot_ca.us

hozenshi@gmail.com

2006-02-27, 2:46 am

It seems hard disks are quite expensive. Why don't they use
optical discs? They're cheaper and can also hold a lot of data. I heard HP
has an optical drive with 10+ optical pickup heads, which can position
and read the data faster.

Ed Wilts

2006-02-27, 5:47 pm

hikerhauk@gmail.com wrote:
> Hi, I'm a student and out of my curiousness I want to know how data
> centers usually store their data. Some mail provider liker Gmail and
> Hotmail must have huge volume of data, even properly compressed. I just
> can't imagine they could store it in hard drives. Is there any special
> devices?


I help manage about 150TB in our data center. Yes, it's all on disks
and none of it is compressed. Check here
http://h18006.www1.hp.com/products/.../eva/index.html for some
products similar to what we have - we have the older generation with
the newer stuff coming in soon. We use a mix of 146GB, 250GB, and
300GB drives. A lot of them...

100TB isn't all that uncommon these days. Larger customers are talking
about petabytes (a thousand terabytes).

5 years ago we had 10TB on the floor. Now it's 150. We've got an
internal pool going for when we hit a petabyte :-)

.../Ed

Paul Rubin

2006-02-27, 8:46 pm

"Ed Wilts" <ewilts@ewilts.org> writes:
> 5 years ago we had 10TB on the floor. Now it's 150. We've got an
> internal pool going for when we hit a petabyte :-)


Out of curiosity what kind of data is it?

flux

2006-02-28, 7:46 am

In article <1140933623.229616@smirk>,
_firstname_@lr_dot_los-gatos_dot_ca.us wrote:

> Since disks are so unreliable (they are typically the least reliable


Oh really?

> computer thing in a data center, excluding the air conditioning, which


What about electronics, motherboards, CPUs, memory?

> In todays data centers, 100TB systems are common;


I wonder if there are ANY data centers that store 100 TB let alone...

>many corporations
> store considerable more data than this. Mail providers and

HVB

2006-02-28, 7:46 am

On Tue, 28 Feb 2006 00:14:40 -0500, flux <support@fluxsoft.com> wrote:

>I wonder if there are ANY data centers that store 100 TB let alone...


?!?!

I know one data centre that has 1PB (yes, you read that right) of
useable storage, with over half of it actually consumed.

I'm currently designing a new data centre which will require over
2.5PB of useable storage capacity. The amount of actual raw storage
required is very much higher than this.

So, yes, plenty of data centres actually store more than 100TB of
data.

HVB

Ed Wilts

2006-03-01, 7:49 am

Paul Rubin wrote:
> "Ed Wilts" <ewilts@ewilts.org> writes:
>
> Out of curiosity what kind of data is it?


Scanned images and database files probably make up the majority. Then
there's the usual miscellaneous home area files, Exchange, etc. Five
different operating systems access our SAN today.

/Ed

flux

2006-03-01, 7:49 am

In article <gq880255976je89bc9ie24f05genjqkh0r@4ax.com>,
HVB <hvbnntp@googlemail.com> wrote:

> On Tue, 28 Feb 2006 00:14:40 -0500, flux <support@fluxsoft.com> wrote:
>
>
> ?!?!
>
> I know one data centre that has 1PB (yes, you read that right) of
> useable storage, with over half of it actually consumed.
>
> I'm currently designing a new data centre which will require over
> 2.5PB of useable storage capacity. The amount of actual raw storage
> required is very much higher than this.


These sound like special exceptions.

> So, yes, plenty of data centres actually store more than 100TB of
> data.


You seem to be saying that 2 cases you just quoted count for plenty.

robertwessel2@yahoo.com

2006-03-01, 7:49 am


flux wrote:
> In article <gq880255976je89bc9ie24f05genjqkh0r@4ax.com>,
> HVB <hvbnntp@googlemail.com> wrote:
>
>
> These sound like special exceptions.
>
>
> You seem to be saying that 2 cases you just quoted count for plenty.



While 100TB shops are certainly at the larger end of the spectrum,
they're hardly uncommon. If they were, why would the big storage
vendors all sell single arrays that can store more than that? For
example, IBM's DS8000 can store 192TB, EMC's Symmetrix DMX2000 holds
118TB, the DMX3000 230TB, and the DMX-3 1052TB, and HP's StorageWorks
XP12000 supports 332TB.

bruce.clarke@yahoo.com

2006-03-01, 5:48 pm

Having consulted in several large data centers over the years including
Yahoo, TI, Intel, B-of-A, and Oracle it is clear that appetite for
storage continues to grow at a very high pace. Drive capacities
continue to grow, storage units from NetApp, EMC, HP, IBM, et al
continue to increase their handling capabilities, all in an attempt to
keep up.

New startups such as Pillar, 3Par, Agami, Isilon, and others show
there is still room for new players and innovation.

We're poised on the edge of a new period for storage architectures
where the ability to store, manage, and most importantly, access huge
amounts of storage spread over very large farms of drives/servers is
about to become real. Spinnaker and Isilon have shown the way and it
will truly revolutionize how data, corporate, personal, and www, will
be managed.

I already have right at a TB of storage in my home and I do not
consider myself unique. So if individuals have a TB, it will be common
for corporations to have PB. I'm not sure how close Google and Yahoo
are at closing in on an EB of storage, but suspect one or both will
reach that level soon.

How will all this be set up and managed? At this point, my crystal ball
starts to cloud. But some of the best minds in industry and academia
are working on it and the winners will make lots of $$$. You and I will
benefit by not having to have umpteen copies of the same file (how many
copies of notepad.exe are really required in the world after all?)
taking up our personal space.

100TB? As mentioned earlier, this is now only two standard 19" racks
of storage. How many corporate data centers have only two racks in
those beautiful computer rooms they built and manage? PB will become
the norm shortly. In 10 years, EB will be the norm.

My $.02/GB

Bruce Clarke

Faeandar

2006-03-01, 5:48 pm

On Tue, 28 Feb 2006 00:14:40 -0500, flux <support@fluxsoft.com> wrote:

>In article <1140933623.229616@smirk>,
> _firstname_@lr_dot_los-gatos_dot_ca.us wrote:
>
>
>Oh really?


Really. Rarely do I need to have a mb or power supply swapped. Same
goes for cpu although a few bugs have caused more ram swaps than I
would like.

But disks fail every day. I manage a decent sized NAS environment and
of the 400TB of usable storage I've only once had to have a
motherboard replaced, twice ram, and the occasional power
supply/cable/misc. But drives are replaced by the shipment every
week.

>
>
>What about electronics, motherboards, CPUs, memory?


Rarely any issues with these unless you are unlucky enough to run into
a bug.

>
>
>I wonder if there are ANY data centers that store 100 TB let alone...


I wonder if you have a clue.

~F

_firstname_@lr_dot_los-gatos_dot_ca.us

2006-03-03, 6:02 pm

In article <1141227562.479968.104220@e56g2000cwe.googlegroups.com>,
<bruce.clarke@yahoo.com> wrote:
>I already have right at a TB of storage in my home and I do not
>consider myself unique.


Oh - you have two disk drives, I see. :-)

That joke was a little glib and cruel; not that many 500GB drives have
shipped in the consumer channel yet. I'm still at 300 some GB in my
server (3 drives), but the disks are not full yet, so I haven't seen a
need to upgrade for a few years. I know several people who have
multi-TB systems at home. The easy way to need and fill that disk
space is to build your own PVR, or to rip all your DVDs onto disk,
which makes it easier for the kids to watch the movies they want to
watch (like Nemo or Toy Story) without risk of the DVDs getting
scratched.

Clearly, the way consumers use disk space at home, and the way
corporations use disk space, are very different. Interestingly,
digital movie production is a large consumer of disk space; supposedly
making a feature film today consumes many PB in temporary space.

>I'm not sure how close Google and Yahoo
>are at closing in on an EB of storage, but suspect one or both will
>reach that level soon.


There are no firm public numbers on their storage capacities;
those are closely guarded secrets. From usually reliable sources
(lots of people live in the bay area, and people talk), I hear that
Google had at the minimum several times 3PB in the Mountain View data
center alone about 2 years ago; if you include their remote data
centers, they are probably at dozens or hundreds of PB today.

>100TB? As mentioned earlier, this is now only two standard 19" racks
>of storage. How many corporate data centers have only two racks in
>those beautiful computer rooms they built and manage?


There are pictures of large data centers around the web, google for
them. They typically have hundreds of racks. A good fraction of that
is storage. It is not uncommon to see a dozen Sharks, Lightnings or
Symmetrix in one room; with up-to-date models that is a PB right
there. This is not even counting racks and racks of 1U- or 2U-servers
being used as storage devices.

The largest single file system I know of (and I probably missed a few)
is over 2PB (single file system means that you can mount it a single
mount point and access it as a single name space with a single data
space). Google for "ASCI Purple". Quite a few other customers have
storage plants that size, just not in a single file system.

In article <gvqb02pnk3e18bionmql0g5f8n3poggo10@4ax.com>, Faeandar
<mr_castalot@yahoo.com> wrote:

> On Tue, 28 Feb 2006 00:14:40 -0500, flux <support@fluxsoft.com> wrote:
>
> I wonder if you have a clue.


Possibly he doesn't. Which is OK: anyone in the storage industry who
claims that 100TB systems don't exist will be irrelevant in a short
period. Or possibly he is a troll with a clue. Either way is fine
with me.

To be honest, I've not built a 100TB system myself yet. Somewhere on the
public web is a picture of a 30TB system I built 2.5 years ago, with
another guy and me standing proudly in front of it; took 4 racks back
then (using SCSI disks). But then, I don't work with real customers
in real data centers.

--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr_dot_los-gatos_dot_ca.us

flux

2006-03-03, 6:02 pm

In article <gvqb02pnk3e18bionmql0g5f8n3poggo10@4ax.com>,
Faeandar <mr_castalot@yahoo.com> wrote:

>
> Really. Rarely do I need to have a mb or power supply swapped. Same
> goes for cpu although a few bugs have caused more ram swaps than I
> would like.


My experience is essentially the opposite.

> But disks fail every day. I manage a decent sized NAS environment and
> of the 400TB of usable storage I've only once had to have a


Well, 400 TB is an awful lot of storage. I really got to wonder what's on there.

> motherboard replaced, twice ram, and the occasional power
> supply/cable/misc. But drives are replaced by the shipment every
> week.


400 TB at 500 GB drives is 800 drives. So how many motherboards are there?

flux

2006-03-03, 6:02 pm

In article <1141227562.479968.104220@e56g2000cwe.googlegroups.com>,
bruce.clarke@yahoo.com wrote:

> 100TB? As mentioned earlier, this is now only two standard 19" racks
> of storage. How many corporate data centers have only two racks in
> those beautiful computer rooms they built and manage?


How does the availability of that capacity automatically mean it is
present and in use in these data centers? Certainly, a Lamborghini
doesn't take up a lot of space, but does that mean it is common?

flux

2006-03-03, 6:02 pm

In article <1141260823.733349@smirk>,
_firstname_@lr_dot_los-gatos_dot_ca.us wrote:

> There are pictures of large data centers around the web, google for
> them. They typically have hundreds of racks. A good fraction of that
> is storage. It is not uncommon to see a dozen Sharks, Lightnings or
> Symmetrix in one room; with up-to-date models that is a PB right
> there. This is not even counting racks and racks of 1U- or 2U-servers
> being used as storage devices.


I'm not seeing this as evidence supporting the claim that they are common.
They can still be very rare. It's just that because they are so large,
pictures of them are making their way to the web. Why take a picture of a data
center with just 1 TB?

> Possibly he doesn't. Which is OK: anyone in the storage industry who
> claims that 100TB systems don't exist will be irrelevant in a short


Oh, there are certain to be some, for sure. But common to me is another
matter entirely.

There has to be something on these disks that takes up 100 TB and it has
to be in a LOT of data centers for it to count as "common".

So you can reasonably have a TB in your home. Then I would say perhaps
you have more data in your home than in a corporate data center!

How does a data center in a corporate environment have a need for this
volume of storage? If you are an insurance company, for example, what
data do you have that takes up 100 TB? How many ASCII text filled in
insurance claim forms does it take to fill 100 TB? Even given Katrina,
I'm hard-pressed to believe this is taking up 100 TB across all the
insurance companies involved. Are you trying to suggest that insurance
agents going out to file a claim bring along a movie crew to film the
damaged goods in high-def?

robertwessel2@yahoo.com

2006-03-03, 6:02 pm


flux wrote:
> My experience is essentially the opposite.
>
>
> Well, 400 TB is an awful lot of storage. I really got to wonder what's on there.
>
>
> 400 TB at 500 GB drives is 800 drives. So how many motherboards are there?



He's almost certainly not using all 500GB drives. Assuming it's 150GB
drives, you'd expect a failed drive every 2.5 weeks based on the
(optimistic) published MTBFs (typically 1.2M hours for high end SCSI/FC
drives, divided by 2700 drives). If a chunk of his array is
performance critical, he may well be using 36 or 72GB drives in that
portion. While you start with the 2.5 weeks per "real" failure,
remember that all high end arrays do a considerable amount of
monitoring and tend to call for drive replacements when correctable
error counts start increasing (and whatever other events they're
monitoring). Hopefully *before* they actually fail. The typical
process is that the drive that's acting up is migrated to the hot
spare, the questionable drive is remarked as the hot spare, and its
replacement is scheduled. The high end arrays will all phone home to
let the support folks know to make sure a new drive gets shipped.
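
The failure-interval estimate above is easy to check. The sketch below
assumes the 2700 drives come from 400 TB divided by 150 GB (rounded up),
treats failures as independent, and ignores RAID overhead and spares;
the function name is made up for illustration.

```python
import math

HOURS_PER_WEEK = 24 * 7

def weeks_between_failures(mtbf_hours, drive_count):
    """Expected gap between failures across a population of independent drives."""
    return mtbf_hours / drive_count / HOURS_PER_WEEK

drives = math.ceil(400_000 / 150)  # 400 TB in 150 GB drives, rounded to 2700 in the text
gap = weeks_between_failures(1_200_000, 2700)  # roughly 2.6 weeks

print(drives, round(gap, 2))
```

That lands close to the "every 2.5 weeks" figure, and it is a best case,
since it takes the published MTBF at face value.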

Construction of the big arrays varies considerably, but you typically
have 7-15 drives plugging into a single backplane. The backplane isn't
usually too smart, but does have the power management and isolation
circuitry needed to isolate and hot swap the drives, plus various
indicators and whatnot (usually a few LEDs for each drive, sometimes
powered locks for each drive).

Those backplanes are typically plugged into controller boards, which
contain the actual smarts of the array. Controllers in big arrays
typically handle 4-16 backplanes each. Then you have some
interconnect, I/O cards for the host interface, and often a higher
level of management hardware. Again, actual implementations are all
over the place.

robertwessel2@yahoo.com

2006-03-03, 6:02 pm


flux wrote:
> How does a data center in a corporate environment have a need for this
> volume of storage? If you are an insurance company, for example, what
> data do you have that takes up 100 TB? How many ASCII text filled in
> insurance claim forms does it take to fill 100 TB? Even given Katrina,
> I'm hard-pressed to believe this is taking up 100 TB across all the
> insurance companies involved. Are you trying to suggest that insurance
> agents going to out to file a claim bring along a movie crew to film the
> damaged goods in high-def?



No, that would never fit. OTOH, someone rear-ended my car last summer,
and caused a minimal amount of damage. My insurance company took about
a dozen pictures with a 5MP camera. But even the text part of the
record bulks up considerably. There will be logs of all contacts,
events and whatnot that happen, plus all the database overhead.

Consider the storage requirements of someone who collects half a MB of
data on each customer every year, has 10 million customers, and keeps a
decade of history, and has a database overhead of 2x - that's 100TB
right there. There are probably a hundred credit card vendors on the
planet who meet those criteria.
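
Spelled out as arithmetic (a small sketch in Python, using decimal units
where 1 TB = 1,000,000 MB; the function name is just for illustration):

```python
def storage_tb(mb_per_customer_year, customers, years, overhead):
    """Total storage in decimal terabytes (1 TB = 1,000,000 MB)."""
    return mb_per_customer_year * customers * years * overhead / 1_000_000

total = storage_tb(0.5, 10_000_000, 10, 2)

print(total)
```

Half a megabyte per customer is 5 TB a year across 10 million customers,
50 TB over a decade, and 100 TB once the 2x database overhead is
applied.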

There was a reasonably credible report a couple of years ago that
Walmart's data warehouse had exceeded 500TB - and that's all database,
and little or no multimedia.

In some cases the storage used is smaller than the storage installed
for performance reasons (as I alluded to in my other post). Some big
database applications need to sustain some number of I/Os per second for
their workloads, which can require more drives than a straight storage
estimate would show. The smallest drive installed in large arrays was
18GB until a couple of years ago, now it's the 36GB drives. And in
some big database applications even those small drives are pretty
empty, just because you need to get enough actuators over the data.

HVB

2006-03-03, 6:02 pm

On Wed, 01 Mar 2006 00:27:08 -0500, flux <support@fluxsoft.com> wrote:

>In article <gq880255976je89bc9ie24f05genjqkh0r@4ax.com>,
> HVB <hvbnntp@googlemail.com> wrote:
>
>These sound like special exceptions.


They're at the upper end, but they are not 'special' in any way.

>
>You seem to be saying that 2 cases you just quoted count for plenty.


No I'm not. I quoted you two examples.

For every one data centre at this size (1PB+, 10 times what you were
asking about) there are hundreds more well in excess of 100TB.

It is rapidly becoming the norm for large organizations to require
more than 100TB of storage.

I deal with this kind of environment every day. If it's less than,
say 50TB, I wouldn't usually get involved.

HVB.

HVB

2006-03-03, 6:02 pm

On Thu, 02 Mar 2006 01:55:51 -0500, flux <support@fluxsoft.com> wrote:

>In article <1141260823.733349@smirk>,
> _firstname_@lr_dot_los-gatos_dot_ca.us wrote:
>
>
>I'm not seeing this as evidence as supporting the claim they are common.
>They can still be very rare. It's just that because they are so large,
>pictures of them are making their way to web. Why take a picture of data
>center with just 1 TB?


This is just the tip of the iceberg. I don't know of any organization
that would allow pictures to be taken of their data centre, not even
for publicity reasons.

>
>Oh, there are certain to be some, for sure. But common to me is another
>matter entirely.
>
>There has to be something on these disks that takes up 100 TB and it has
>to be in a LOT of data centers for it to count as "common".


Define "a lot".

>How does a data center in a corporate environment have a need for this
>volume of storage?


Surprisingly easily. If you are used to dealing with small amounts of
data, typically in a small organization, or home environment, it can
be difficult to see how anybody could ever need 100TB+.

As another poster suggested, insurance companies regularly deal in
image data these days, and they also tend to keep a scanned copy of
all paperwork. When you have lots of customers, that means a huge
amount of data. Most keep this for a very long time, if not forever.

It's rare to find organizations that purely deal in ASCII text data
these days. Rich media is very common, even in the most unexpected
places.

When you ring a call centre and they tell you that your call "may be
monitored", what they really mean is "your call *is* being recorded".
This data is kept for a long time, if not forever.

One manufacturing client of mine creates huge amounts of video data.
For quality control purposes they use video to check their production
runs. They keep this for a long time, in case they need to check for
faults. An outsider to the business would probably never consider
storing data like this. They used to use video tape, but that has
its own problems and for them the advantages of online storage
outweighed the costs.

Again, those are just two examples.

HVB

Paul Rubin

2006-03-03, 6:02 pm

HVB <hvbnntp@googlemail.com> writes:
> When you ring a call centre and they tell you that your call "may be
> monitored", what they really mean is "your call *is* being recorded".
> This data is kept for a long time, if not forever.
>
> One manufacturing client of mine creates huge amounts of video data.
> For quality control purposes they use video to check their production
> runs. They keep this for a long time, in case they need to check for
> faults. An outsider to the business would probably never consider
> storing data like this. They used to use video tape, but that has
> it's own problems and for them the advantages of online storage
> outweighed the costs.
>
> Again, those are just two examples.


Applications like that probably tend to mostly use the most recent
data. Is it really worth keeping so much older, rarely used data
spinning all the time, instead of having some big tape robots (or even
cabinets full of tape cartridges) like we used to see before disks got
so cheap?

HVB

2006-03-03, 6:02 pm

On 02 Mar 2006 03:41:33 -0800, Paul Rubin
<http://phr.cx@NOSPAM.invalid> wrote:

>HVB <hvbnntp@googlemail.com> writes:
>Applications like that probably tend to mostly use the most recent
>data. Is it really worth keeping so much older, rarely used data
>spinning all the time, instead of having some big tape robots (or even
>cabinets full of tape cartridges) like we used to see before disks got
>so cheap?


Recovery from tape takes too long - it isn't impossible, but it means
that the client keeps expensive staff waiting around while the
recovery takes place.

They keep the data on ATA drives, so they get relatively low cost
storage and practically instant visual access to any manufacturing
run.

HVB

Paul Rubin

2006-03-03, 6:02 pm

HVB <hvbnntp@googlemail.com> writes:
> They keep the data on ATA drives, so they get relatively low cost
> storage and practically instant visual access to any manufacturing run.


Do they keep those hundreds (thousands?) of ATA drives spinning all
the time in case of some rare access to any particular one, or do they
have some way of powering them up only when needed? The dozen or so
seconds of latency from that would probably be tolerable.

HVB

2006-03-03, 6:02 pm

On 02 Mar 2006 06:19:13 -0800, Paul Rubin
<http://phr.cx@NOSPAM.invalid> wrote:

>HVB <hvbnntp@googlemail.com> writes:
>
>Do they keep those hundreds (thousands?) of ATA drives spinning all
>the time in case of some rare access to any particular one, or do they
>have some way of powering them up only when needed? The dozen or so
>seconds of latency from that would probably be tolerable.


Yes, all drives are on and in use all the time. Failures are
therefore detected and spared out immediately. All failures are
covered by maintenance agreement.

This particular client is a manufacturer of a precision product. They
check everything following manufacture and again at various stages in
its life. This system has revolutionized their quality control
processes and costs less overall than a video tape based system.

HVB

Torbjorn Lindgren

2006-03-03, 6:02 pm

Paul Rubin <http://phr.cx@NOSPAM.invalid> wrote:
>HVB <hvbnntp@googlemail.com> writes:
>
>Do they keep those hundreds (thousands?) of ATA drives spinning all
>the time in case of some rare access to any particular one, or do they
>have some way of powering them up only when needed? The dozen or so
>seconds of latency from that would probably be tolerable.


Spinning down the disks would presumably have both benefits and
drawbacks (spin-up can cause failures...). But there are ready-made
products that seem to be designed for scenarios like this and will
automatically manage the disks.

I see that the densest storage box I know of (Nexsan ATABeast/
SATABeast, 42 disks in 4U!) is now available in a SATA version, which
seems to have added something they call AutoMAID(TM) (Massive Array of
Idle Disks) which seems tailored for this (no mention of this in the
older ATABeast, wonder if they have or potentially could add this on
it via new firmware)

They list 210 TB in a standard rack (40U, leaving 2U for two FC
switches), but that's the raw capacity before removing RAID overhead
or hot-spares (and using 500GB disks). Say 150 TB usable perhaps
(somewhere in the 100-170TB range depending on RAID array size and hot
spares).
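
For what it's worth, the rack numbers above work out as follows. This is
a rough Python sketch: the two hot spares per box and the RAID group
layouts are assumptions picked for illustration, not vendor figures, and
the function names are made up.

```python
def rack_raw_tb(rack_units, box_units, disks_per_box, disk_gb):
    """Raw capacity of a rack filled with identical disk boxes, in decimal TB."""
    boxes = rack_units // box_units
    return boxes * disks_per_box * disk_gb / 1000

def rack_usable_tb(boxes, disks_per_box, spares_per_box, data_disks, redundant_disks, disk_gb):
    """Usable capacity after setting aside hot spares and per-group redundancy."""
    group_size = data_disks + redundant_disks
    groups_per_box = (disks_per_box - spares_per_box) // group_size
    return boxes * groups_per_box * data_disks * disk_gb / 1000

raw = rack_raw_tb(40, 4, 42, 500)  # ten 4U boxes of 42 disks each
mirrored = rack_usable_tb(10, 42, 2, 1, 1, 500)  # assumed: 2 spares per box, RAID-1 pairs
parity = rack_usable_tb(10, 42, 2, 4, 1, 500)  # assumed: 2 spares per box, 4+1 parity groups

print(raw, mirrored, parity)
```

Under those assumptions the usable figure runs from 100 TB (mirrored) to
160 TB (4+1 parity), which sits inside the 100-170TB range quoted above.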

So if 100+ TB on multiple (FC/SAN) volumes is OK it can actually be
done in less than a rack (29-40U depending on degree of redundancy
required).

_firstname_@lr_dot_los-gatos_dot_ca.us

2006-03-03, 6:02 pm

In article <7xoe0pc4ke.fsf@ruckus.brouhaha.com>,
Paul Rubin <http://phr.cx@NOSPAM.invalid> wrote:
>HVB <hvbnntp@googlemail.com> writes:
>Do they keep those hundreds (thousands?) of ATA drives spinning all
>the time in case of some rare access to any particular one, or do they
>have some way of powering them up only when needed? The dozen or so
>seconds of latency from that would probably be tolerable.


Nearly all disk drives have been kept on and spinning.

Traditionally, there has been a lot of scepticism towards spinning
disks down, as it is not clear that they will ever spin back up. The
old "stiction" problems come to mind. There are also questions about
what happens to spindle lubricants if the spindle isn't rotating for
long periods.

In spite of these questions, systems are now being built in which the
bulk of all (SATA) disks are kept spun down; this technology has even
acquired a new acronym, namely MAID (and I don't remember what it
stands for exactly). Please google for Copan systems.

To my knowledge (which is guaranteed to be only partial), the Copan
system has the highest density of storage in TB/sqft of floor space,
or TB/cuft of data center volume, or TB/kW of power used, or some
metric like that (I don't remember the details). Probably Copan's
website will have such information. I would bet that a Copan system
is several hundred disks in a rack. There may be other vendors that
are providing disk systems that spin down or have similar densities.

--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr_dot_los-gatos_dot_ca.us
_firstname_@lr_dot_los-gatos_dot_ca.us

2006-03-03, 6:02 pm

In article <support-C42079.01314102032006@optonline.svc.highwinds-media.com>,
flux <support@fluxsoft.com> wrote:

>400 TB at 500 GB drives is 800 drives. So how many motherboards are there?


400 TB is not typically today done with motherboards. A system that
size typically has drawers of disks (often called JBODs), with simple
passive backplanes or minimal switching. Traditionally, that used to
be 14-16 disks per 3U rackmount; for the newest packaging you can
double or triple that.
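
As a rough illustration of the packaging described above, the following Python sketch counts drives, shelves and racks for a 400 TB system. The 500 GB drive size is from the quoted question, and 14 to 16 disks per 3U is from the text. Modelling "double or triple" as 32 and 48 disks per 3U is an assumption, as is the 42U rack height, and the rack count ignores controllers, switches and power.

```python
import math

RACK_UNITS = 42   # assumed rack height in U
SHELF_UNITS = 3   # 3U shelves, as described in the post


def size_system(total_tb, drive_gb, disks_per_shelf):
    """Return (drives, shelves, racks) for a raw capacity target.

    Uses decimal units (1 TB = 1000 GB) and counts only disk shelves.
    """
    drives = math.ceil(total_tb * 1000 / drive_gb)
    shelves = math.ceil(drives / disks_per_shelf)
    racks = math.ceil(shelves * SHELF_UNITS / RACK_UNITS)
    return drives, shelves, racks


for disks_per_shelf in (14, 16, 32, 48):
    drives, shelves, racks = size_system(400, 500, disks_per_shelf)
    print(f"{disks_per_shelf} disks/shelf: {drives} drives, "
          f"{shelves} shelves, {racks} racks")
```

At the traditional 14 to 16 disks per shelf this comes to 50 or more shelves, which is why such systems are built from drawers of disks rather than around individual motherboards.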

Those JBODs (typically connected via FC, traditionally also via SCSI,
going more towards SATA/SAS, maybe also via Ethernet/iSCSI) are then
connected to array controllers (or NAS heads), which again typically
don't use motherboards in the traditional sense (you won't find an ATX
board with a Pentium in a Netapp or an EMC Symmetrix).

There is a trend towards building massive storage systems out of
commodity-style hardware, connected via commodity networks. In this
model, there are identifiable motherboards; but those are likely to be
either modified motherboards, or stock mass-market motherboards with
somewhat unusual disk controllers. It is easy with stock hardware for
one MoBo to control 16 or 32 SATA or SCSI disks; with ATA cabling,
anything above 12 disks is annoying, unless you use custom-built ATA
backplanes.

So in summary, in 400TB systems, motherboards are not yet really
relevant. I would suggest that the poster go examine a multi-dozen-TB
system to see how it is put together. Having taken a Hitachi
Lightning apart once, it is really an eye-opener on the difference in
construction between enterprise systems and personal computers.

--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr_dot_los-gatos_dot_ca.us
robertwessel2@yahoo.com

2006-03-03, 6:02 pm


_firstname_@lr_dot_los-gatos_dot_ca.us wrote:
> In article <7xoe0pc4ke.fsf@ruckus.brouhaha.com>,
> Paul Rubin <http://phr.cx@NOSPAM.invalid> wrote:
>
> Nearly all disk drives have been kept on and spinning.
>
> Traditionally, there has been a lot of scepticism towards spinning
> disks down, as it is not clear that they will ever spin back up. The
> old "stiction" problems come to mind. There are also questions about
> what happens to spindle lubricants if the spindle isn't rotating for
> long periods.
>
> In spite of these questions, systems are now being built in which the
> bulk of all (SATA) disks are kept spun down; this technology has even
> acquired a new acronym, namely MAID (and I don't remember what it
> stands for exactly). Please google for Copan systems.
>
> To my knowledge (which is guaranteed to be only partial), the Copan
> system has the highest density of storage in TB/sqft of floor space,
> or TB/cuft of data center volume, or TB/kW of power used, or some
> metric like that (I don't remember the details). Probably Copan's
> website will have such information. I would bet that a Copan system
> is several hundred disks in a rack. There may be other vendors that
> are providing disk systems that spin down or have similar densities.



This is targeting what's often called the near-line storage market.
Optical disks have been the biggest player in that niche for the last
couple of decades, but you certainly can't beat the price of SATA disks
- certainly far lower in $/MB than just the MO media. And SATA disks
have modest connects, that ought to be trivial to switch (I'm sure only
a small number of drives can be online at any one time), and the spinup
times are probably going to be better than picking an MO disk out of a
library.

flux

2006-03-03, 6:02 pm

In article <vkgd029bkgunimndvtumnr9dku5d2qv41q@4ax.com>,
HVB <hvbnntp@googlemail.com> wrote:

>
> Define "a lot".


If you are claiming they are common, then it's really up to you to define it.

> As another poster suggested, insurance companies regularly deal in
> image data these days, and they also tend to keep a scanned copy of
> all paperwork. When you have lots of customers, that means a huge
> amount of data. Most keep this for a very long time, if not forever.


I'm not so convinced. Even scanned copies stored as jpegs wouldn't take up that much data.

Plus, why would anyone store this data for a "long time" on live disks?

> It's rare to find organizations that purely deal in ASCII text data
> these days. Rich media is very common, even in the most unexpected
> places.


This is just an assertion analogous to the claim that 100TB data centers are common.

> When you ring a call centre and they tell you that your call "may be
> monitored", what they really mean is "your call *is* being recorded".
> This data is kept for a long time, if not forever.


Also another thing that could be compressed and not take up a lot of space.

Plus, why would anyone store this data for a "long time" on live disks?

>
> One manufacturing client of mine creates huge amounts of video data.


> Again, those are just two examples.


The claim that 100 TB data centers are common seems to cry out for several hundred, if not thousand, examples.
flux

2006-03-03, 6:02 pm

In article <1141285352.047692.318000@e56g2000cwe.googlegroups.com>,
"robertwessel2@yahoo.com" <robertwessel2@yahoo.com> wrote:

> No, that would never fit. OTOH, someone rear-ended my car last summer,
> and caused a minimal amount of damage. My insurance company took about
> a dozen pictures with a 5MP camera. But even the text part of the
> record bulks up considerably. There will be logs of all contacts,
> events and whatnot that happen, plus all the database overhead.


I'm not so convinced the text part bulks up enough to need such a level
of storage.

> Consider the storage requirements of someone who collects half a MB of
> data on each customer every year, has 10 million customers,


That is a HUGE number of customers. That sounds like a lot of data to
collect EVERY year for EACH and EVERY customer.

>and keeps a
> decade of history,


And it's ENTIRELY online.

Each element seems to make it more and more unbelievable.

> right there. There are probably a hundred credit card vendors on the
> planet who meet those criteria.


Do the individual banks keep this info or is it ALL stored by Mastercard
and Visa?

If it's the latter, then it's really only two vendors and they certainly
would have a lot. But if it were the separate banks, then it would be just
a few large banks like Citibank and Chase. And still it would be a very
TINY amount of data (unless they are storing TIFFs of every product that
everyone purchases every time) and again it would be stored actively
online for longer than anyone could ever need.

> There was a reasonably credible report a couple of years ago that
> Walmart's data warehouse had exceeded 500TB - and that's all database,
> and little or no multimedia.


To store what? They keep a photo of each item of every product they ever
sell? I just don't see how they generate even ONE/TENTH this volume.
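
For reference, the customer-history estimate quoted earlier in this post can be worked through directly. This is only a back-of-envelope sketch in Python using the quoted inputs (half a MB per customer per year, 10 million customers, a decade of history). It uses decimal units and counts raw data only, so indexes, RAID overhead, replicas and backups are not included.

```python
def history_tb(mb_per_customer_per_year, customers, years):
    """Raw data volume in decimal TB (1 TB = 1,000,000 MB)."""
    total_mb = mb_per_customer_per_year * customers * years
    return total_mb / 1_000_000


# Inputs as quoted in the post.
estimate = history_tb(0.5, 10_000_000, 10)
print(f"{estimate:.0f} TB of raw customer history")
```

Changing any one input scales the result linearly, so the disagreement in this exchange is really about whether those three inputs are realistic.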
flux

2006-03-03, 6:02 pm

In article <1141337877.132056@smirk>,
_firstname_@lr_dot_los-gatos_dot_ca.us wrote:

> In article <support-C42079.01314102032006@optonline.svc.highwinds-media.com>,
> flux <support@fluxsoft.com> wrote:
>
>
> 400 TB is not typically today done with motherboards. A system that


I think you misunderstood what I meant here. I am referring to the
failure rate of those drives compared to other electronic components. To
be fair, criteria must be set to decide what is equivalent to a single
drive. A motherboard has a lot of components, so it's probably not fair
to compare the failure rate of a single drive to a single motherboard,
but it certainly seems that there are a lot more drives in a data center
than there are motherboards, so it may appear that drives fail very
often, but in reality, there are just so many of them, there are bound
to be some failures.
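
The point about sheer numbers can be illustrated with a small Python sketch. The 800 drives come from the earlier 400 TB example. The 3% annual failure rate and the figure of 50 other components are purely hypothetical numbers chosen for illustration, not measured data, and the model assumes failures are independent and equally likely for every unit.

```python
def expected_failures(count, annual_failure_rate):
    """Expected number of failures per year across `count` identical units."""
    return count * annual_failure_rate


def prob_at_least_one(count, annual_failure_rate):
    """Probability that at least one of `count` units fails within a year,
    assuming independent failures."""
    return 1.0 - (1.0 - annual_failure_rate) ** count


ASSUMED_RATE = 0.03   # hypothetical: same per-unit rate for both populations

for label, count in (("drives", 800), ("other components", 50)):
    print(f"{count} {label}: "
          f"{expected_failures(count, ASSUMED_RATE):.1f} expected failures/year, "
          f"P(at least one) = {prob_at_least_one(count, ASSUMED_RATE):.3f}")
```

Even with an identical per-unit rate, the larger population produces many more failures per year, which is the effect described above.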
flux

2006-03-03, 6:02 pm

In article <1141283451.849531.21120@u72g2000cwu.googlegroups.com>,
"robertwessel2@yahoo.com" <robertwessel2@yahoo.com> wrote:

> flux wrote:
>
>
> He's almost certainly not using all 500GB drives. Assuming it's 150GB


Which means way more drives, and that many more ways for a failure
to occur, so it could appear that drives are "very unreliable" as the
original poster claimed.
Torbjorn Lindgren

2006-03-03, 8:45 pm

<_firstname_@lr_dot_los-gatos_dot_ca.us> wrote:
>In spite of these questions, systems are now being built in which the
>bulk of all (SATA) disks are kept spun down; this technology has even
>acquired a new acronym, namely MAID (and I don't remember what it
>stands for exactly). Please google for Copan systems.


You'd need to actively check the disks now and then. Yep, I see that
Copan call this DISK AEROBICS(TM). Others probably have similar
systems.

>To my knowledge (which is guaranteed to be only partial), the Copan
>system has the highest density of storage in TB/sqft of floor space,
>or TB/cuft of data center volume, or TB/kW of power used, or some
>metric like that (I don't remember the details). Probably Copan's
>website will have such information. I would bet that a Copan system
>is several hundred disks in a rack. There may be other vendors that
>are providing disk systems that spin down or have similar densities.


896 disks in a rack! Yeah, that's impressive, though the 250GB disks
keep the total storage down to 224TB (raw). If they certify 500 GB
disks it could house 448TB raw. Up to 4 GigE ports (supporting NFS!,
TCP/IP?, COPAN?)

With 112 disks per shelf it's probably safe to assume that a lot of
the Copan systems have more than 100 disks.

I haven't studied the area, so the densest I knew of was 420 disks per
rack, but since they have 500GB disks for it, it ends up at 210TB raw;
it's still much more dense than what several others here had mentioned.
Nexsan SATABeast, 4U units so 10 in a 42U rack. These are FC/SAN
units, with 2 or 4 2Gb FC (one or two controllers).
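
The per-rack raw figures mentioned in this post can be checked with a few lines of Python. All disk counts and sizes are taken from the text above; the only assumption is decimal units (1 TB = 1000 GB), and nothing here accounts for RAID overhead or spares.

```python
def raw_tb(disks, disk_gb):
    """Raw capacity in decimal TB, before RAID overhead or hot spares."""
    return disks * disk_gb / 1000


racks = [
    ("Copan, 250GB disks", 896, 250),
    ("Copan, 500GB disks (if certified)", 896, 500),
    ("Nexsan SATABeast, 10 units of 42 disks, 500GB", 10 * 42, 500),
]

for label, disks, disk_gb in racks:
    print(f"{label}: {disks} disks, {raw_tb(disks, disk_gb):.0f} TB raw")

# 112 disks per Copan shelf implies this many shelves in an 896-disk rack.
copan_shelves = 896 // 112
print(f"Copan shelves per rack: {copan_shelves}")
```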

It does have something they call AutoMAID and the aggregate BTU/h is
slightly lower (but not by much, considering that Copan has more than
twice as many disks), though the official peak power is higher.

Both of these are probably because they seem to push it a bit as a
high-performance system, which may result in more disks spinning at the
same time (and the older ATA model is very similar but doesn't list
AutoMAID).

Copyright 2003 - 2008 webservertalk.com