Very Large Filesystems

    Very Large Filesystems  
Aknin




 
04-28-07 12:12 PM

Following some research I've been doing on the matter across
newsgroups and mailing lists, I'd be glad if people could share
numbers about real-life large filesystems and their experience with
them. I'm slowly coming to a realization that regardless of
theoretical filesystem capabilities (1TB, 32TB, 256TB or more), more
or less across the enterprise filesystem arena people are recommending
to keep practical filesystems up to 1TB in size, for manageability and
recoverability.

What's the maximum filesystem size you've used in a production
environment? How did the experience work out?

Thanks,
-Yaniv









    Re: Very Large Filesystems  
Faeandar




 
04-30-07 12:12 AM

On 28 Apr 2007 02:21:30 -0700, Aknin <the.aknin@gmail.com> wrote:

>Following some research I've been doing on the matter across
>newsgroups and mailing lists, I'd be glad if people could share
>numbers about real-life large filesystems and their experience with
>them. I'm slowly coming to a realization that regardless of
>theoretical filesystem capabilities (1TB, 32TB, 256TB or more), more
>or less across the enterprise filesystem arena people are recommending
>to keep practical filesystems up to 1TB in size, for manageability and
>recoverability.
>
>What's the maximum filesystem size you've used in a production
>environment? How did the experience work out?
>
>Thanks,
> -Yaniv

The true constraint, as you've pointed out, is recoverability.  If you
need to recover an entire file system in any sane amount of time, 16TB
and bigger is out of the question.

I think 3-4TB is fine with today's tape drive speeds but there may be
limitations from your backup software.  I recall hearing a limit of
4TB per NDMP stream for NBU.

You could go higher I think if you have a directory structure that
allows for recovery prioritization.  If you have a 36TB file system
but you know that these 9 directories are the priority, then you
really only have a recovery limit of those 9 directories.  The rest can
be done as time permits.
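
The restore-time argument above can be put into rough numbers. This is a minimal sketch, not a measurement: the 80 MB/s per-stream throughput is an assumption chosen for illustration (it does not come from this thread), and `restore_hours` is a hypothetical helper name.

```python
def restore_hours(size_tb: float, throughput_mb_s: float, streams: int = 1) -> float:
    """Hours needed to stream size_tb terabytes back at a sustained rate."""
    size_mb = size_tb * 1_000_000              # decimal units: 1 TB = 1,000,000 MB
    seconds = size_mb / (throughput_mb_s * streams)
    return seconds / 3600


ASSUMED_MB_S = 80.0  # assumption: sustained per-stream restore rate

# Whole-filesystem restores at the sizes mentioned in the thread.
for size_tb in (4, 16, 36):
    hours = restore_hours(size_tb, ASSUMED_MB_S)
    print(f"{size_tb:>3} TB on one stream: {hours:6.1f} h")
```

Under that assumed rate, 4TB comes back in roughly 14 hours and 36TB in about 125 hours on a single stream. If the priority directories of a 36TB file system total only a few TB, the time until service resumes is governed by the smaller figure.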

~F








    Re: Very Large Filesystems  
Bill Todd




 
04-30-07 12:12 AM

Faeandar wrote:
> On 28 Apr 2007 02:21:30 -0700, Aknin <the.aknin@gmail.com> wrote:
> 
>
> The true constraint, as you've pointed out, is recoverability.  If you
> need to recover an entire file system in any sane amount of time, 16TB
> and bigger is out of the question.

That really depends on how you're recovering it, which in turn depends
on what kind of problem you need to recover it from.

If you're talking about restoring from backup tapes, fine.  If you're
talking about recovery from backup disks (plus a few recent
incrementals, whether on disk or on tape, that can be applied directly
to them to recreate the running system), you can probably go
larger.  If you're talking about recovery using a synchronous
replication site, no size limit exists at all (though you need a)
snapshot or CDP facilities to ensure that common corruption at both
sites can be quickly backed out and b) *real* confidence in the software
not to have introduced system-level corruption at both sites, though the
latter can in part be addressed by using logical inter-site mirroring
with different software implementations at the two sites).

As the required software matures, CDP in combination with inter-site
synchronous replication (or low-delay asynchronous replication plus
local logging to cover the gap for anything save complete primary site
destruction) should help make backups as obsolete as paper tape:
decreasing hardware costs for such services should make the management
costs of backups (let alone their effect on recovery-time objectives)
increasingly untenable.

- bill








    Re: Very Large Filesystems  
Aknin




 
04-30-07 12:12 PM

On Apr 30, 12:48 am, Bill Todd <billt...@metrocast.net> wrote:
> Faeandar wrote:
>
> That really depends on how you're recovering it, which in turn depends
> on what kind of problem you need to recover it from.
>
> If you're talking about restoring from backup tapes, fine.  If you're
> talking about recovery from backup disks (plus a few recent
> incrementals, whether on disk or on tape, that can be applied directly
> to them to recreate the running system), you can probably go
> larger.  If you're talking about recovery using a synchronous
> replication site, no size limit exists at all (though you need a)
> snapshot or CDP facilities to ensure that common corruption at both
> sites can be quickly backed out and b) *real* confidence in the software
> not to have introduced system-level corruption at both sites, though the
> latter can in part be addressed by using logical inter-site mirroring
> with different software implementations at the two sites).
>
> As the required software matures, CDP in combination with inter-site
> synchronous replication (or low-delay asynchronous replication plus
> local logging to cover the gap for anything save complete primary site
> destruction) should help make backups as obsolete as paper tape:
> decreasing hardware costs for such services should make the management
> costs of backups (let alone their effect on recovery-time objectives)
> increasingly untenable.
>
> - bill

The system in question is made of millions (sometimes more) of small
files. Corruption in any particular file isn't troublesome, nor even
in hundreds of files. The block device is mirrored and is stored on
expensive SAN arrays that are trusted not to choke and die, and
snapshots can be taken at regular intervals.

As you can probably understand, the number of files combined with the
capacity (tens of TBs and growing...) makes backups quite irrelevant, and
what we're counting on (maybe unjustly) is the mirroring and the
snapshotting. We trust the system in the sense that it's too stupid to
do something wrong: it works at the file level and is exceedingly
unlikely to corrupt more than a file (or two, or a hundred, but no
more) at a time.

What /is/ worrying to me is silent filesystem corruption that will at
some point jump and bite my arse. Filesystem corruption will cause
prompt snapshot rollback and incremental recovery*, but I'm worried
about rolling back only to discover the filesystem was already
corrupted at the time of the snap. I don't have room for much more
than one or two snaps.

So you see the most complex part of my scenario is the filesystem,
rather than the system, and tape backup is totally impractical even
for sizes much smaller than 4TB.

Does that change your advice?









    Re: Very Large Filesystems  
Ernst S Blofeld




 
04-30-07 12:12 PM

Aknin wrote:
> What /is/ worrying to me is silent filesystem corruption that will at
> some point jump and bite my arse.

Which is leading you to suggest that your files are best kept across a
number of independent filesystem 'domains' so as to contain the possible
effects of any corruption. This would seem a reasonable suggestion with
the proviso that the 'domains' are genuinely independent and not sitting
on the same SAN, fileserver etc. etc. You also need to be confident of
detecting the corruption as soon as possible, for the reasons that you
outline.

It seems that the only logical solution is automatic checksumming
coupled with redundancy, in the manner that ZFS does. No doubt this
feature will be found in other filesystems in the future.
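
To make "checksumming coupled with redundancy" concrete, here is a toy sketch of the general idea. It is not how ZFS is implemented: `MirroredStore` is a hypothetical in-memory class, and the choice of two mirrors and SHA-256 is an assumption made for illustration.

```python
import hashlib


class MirroredStore:
    """Toy block store: two copies of every block plus a recorded checksum."""

    def __init__(self):
        self.copies = [{}, {}]   # two independent "mirrors"
        self.checksums = {}      # block id -> expected SHA-256 digest

    def write(self, block_id, data: bytes):
        self.checksums[block_id] = hashlib.sha256(data).hexdigest()
        for mirror in self.copies:
            mirror[block_id] = data

    def read(self, block_id) -> bytes:
        expected = self.checksums[block_id]
        for mirror in self.copies:
            data = mirror[block_id]
            if hashlib.sha256(data).hexdigest() == expected:
                # Found a good copy: rewrite any mirror that disagrees with it.
                for other in self.copies:
                    if other[block_id] != data:
                        other[block_id] = data
                return data
        # Detection without repair: every copy fails its checksum.
        raise IOError(f"block {block_id!r}: all copies fail checksum")


store = MirroredStore()
store.write("b1", b"hello")
store.copies[0]["b1"] = b"hellp"          # simulate silent corruption on one mirror
assert store.read("b1") == b"hello"       # the read is served from the good copy
assert store.copies[0]["b1"] == b"hello"  # and the bad copy has been repaired
```

The checksum is what turns silent corruption into detected corruption; the second copy is what turns detection into repair. If both copies are damaged, the read fails loudly instead of returning bad data.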

ESB








    Re: Very Large Filesystems  
Jan-Frode Myklebust




 
04-30-07 06:13 PM

On 2007-04-30, Ernst S Blofeld <E.Blofeld@new-spectre-base.com> wrote:
>
> It seems that the only logical solution is automatic checksumming
> coupled with redundancy, in the manner that ZFS does.

This blind trust in ZFS amazes me.. ZFS will have bugs, and get corrupted
like any other file system, and then you'll need your backups. Also, when
the automatic checksumming finds a corruption, you'll need your backups.

So the answer is backup (online, nearline, offline, whatever), and spread
your files over many small'ish fs's to reduce the time to recover from a
fs corruption.
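
One way to picture spreading files over many small filesystems is a hash-based placement. This is a sketch under assumptions: `pick_filesystem` and `restore_share` are hypothetical helpers, the count of 32 filesystems is arbitrary, and it only models the loss of a single filesystem, not a failure that hits all of them at once.

```python
import hashlib
from collections import Counter


def pick_filesystem(path: str, n_filesystems: int) -> int:
    """Map a file path to one of n_filesystems by hashing its name."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_filesystems


def restore_share(n_filesystems: int) -> float:
    """Expected fraction of files to restore if exactly one filesystem is lost."""
    return 1 / n_filesystems


N_FS = 32  # assumption: number of small filesystems
counts = Counter(pick_filesystem(f"file{i:07d}", N_FS) for i in range(100_000))
largest = max(counts.values()) / 100_000
print(f"expected share per fs: {restore_share(N_FS):.3%}, largest observed: {largest:.3%}")
```

With an even spread, losing one of N filesystems means restoring about 1/N of the files, which is where the shorter recovery time comes from.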


-jf








    Re: Very Large Filesystems  
Bill Todd




 
04-30-07 06:13 PM

Jan-Frode Myklebust wrote:
> On 2007-04-30, Ernst S Blofeld <E.Blofeld@new-spectre-base.com> wrote: 
>
> This blind trust in ZFS amazes me..

Perhaps as much as blind ignorance like yours amazes me.  The main
difference between us being that no one talking about ZFS in any way
suggested trusting it blindly:  ESB above suggested a mechanism *like*
ZFS's (in case you're unaware of the fact, other more mature systems
provide features of this nature), and I suggested that ZFS, while by no
means mature, *might* still satisfy the expressed needs..

> ZFS will have bugs, and get corrupted
> like any other file system, and then you'll need your backups. Also, when
> the automatic checksumming finds a corruption, you'll need your backups.

Well, no:  the redundancy is used to correct it.  And in the unlikely
event that the corruption was system-caused and hence loyally replicated
by lower-level functions, that's what no-overwrite snapshotting is for:
it would take a particularly pathological bug to subvert both the
main-line data and the separate snapshots.

>
> So the answer is backup (online, nearline, offline, whatever)

Since the original poster just told us that this is *not* a suitable
answer, one can only assume that you're listening to him as poorly as
you've apparently listened to others.

- bill








    Re: Very Large Filesystems  
Jan-Frode Myklebust




 
05-01-07 12:13 AM

On 2007-04-30, Bill Todd <billtodd@metrocast.net> wrote:
>
> Perhaps as much as blind ignorance like yours amazes me.  The main
> difference between us being that no one talking about ZFS in any way
> suggested trusting it blindly:  ESB above suggested a mechanism *like*
> ZFS's (in case you're unaware of the fact, other more mature systems
> provide features of this nature), and I suggested that ZFS, while by no
> means mature, *might* still satisfy the expressed needs..

To quote the OP:

"What /is/ worrying to me is silent filesystem corruption that will at
some point jump and bite my arse. Filesystem corruption will cause
prompt snapshot rollback and incremental recovery*, but I'm worried
about rolling back only to discover the filesystem was already
corrupted at the time of the snap. I don't have room for much more
than one or two snaps."

Is there any other solution than backups, if neither the fs nor the two
snaps can be trusted ? I would argue that making your fs's as small as
possible, to confine the damage, and keeping good backups is the best
option. Why would tape backup be "totally impractical even for sizes
much smaller than 4TB." ?


And then quoting you from another recent thread:

"Though (as I already noted) I don't have any direct experience with it,
my impression is that people are using it in production systems
successfully "

"My impression is that *some* customers have workloads that have found
ZFS to be very stable already, while others push corner cases that are
still uncovering bugs."

So you agree it's a fairly new fs where people are still uncovering bugs,
and you have no direct experience with it, yet you still think it's the
solution to the OP's worry about file system corruption ?

> 
>
> Since the original poster just told us that this is *not* a suitable
> answer, one can only assume that you're listening to him as poorly as
> you've apparently listened to others.

He doesn't say much about why backups would be "totally impractical", so
I'm suggesting that the best option (when you have fs corruption and the 2
snaps aren't good enough) is to spread the files over as many fs's as
possible, to confine the damage and the number of files that need to be
restored from backup.


-jf








    Re: Very Large Filesystems  
Ernst S Blofeld




 
05-01-07 06:13 AM

Jan-Frode Myklebust wrote:
> Is there any other solution than backups, if neither the fs nor the two
> snaps can be trusted ? I would argue that making your fs's as small as
> possible, to confine the damage, and keeping good backups is the best
> option. Why would tape backup be "totally impractical even for sizes
> much smaller than 4TB." ?

Who said don't make backups ? ZFS is not a backup solution but a
filesystem with checksumming and redundancy features. I've never heard
anyone seriously suggest that ZFS obviated the need for backups, not in
this thread or anywhere else. Rant about non-issues elsewhere please.

As already pointed out, increasing the number of filesystems does not
increase the protection because you still have all the common modes of
failure (including the software bugs that you are so apparently keen
on). How much better off are a million files on a single filesystem
against the same files on a thousand filesystems if everything else
remains equal? There is no meaningful difference at all.

Moreover backups do not address the OP's point - silent corruption. If
you aren't checking your files how can you have any confidence in your
backups? A backup is as problematic in terms of integrity as the
filesystem it is read from. Backing-up a corrupt file doesn't fix it.

You cannot avoid the need for checksumming to detect errors and
redundancy to fix them. Putting these features directly in your
filesystem is a good idea - integrity is maintained and there is fast
recovery. The fact that there will be teething problems in ZFS or an
equivalent filesystem is not a sound basis for rejecting these features.
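
Where the filesystem itself offers no checksumming, the "checking your files" step can be approximated from user space. The following is a sketch, assuming files are written once and not modified afterwards (a manifest cannot tell a legitimate edit from corruption); `build_manifest` and `scrub` are hypothetical helper names.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files are not loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Record a digest for every file under root, keyed by relative path."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def scrub(root, manifest):
    """Return relative paths that are missing or no longer match the manifest."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_digest(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

Running `scrub` before each backup or snapshot gives some confidence that the copy being taken is of good data; anything it reports still has to be repaired from a redundant copy, which is the half of the problem a manifest alone does not solve.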

There will still be backups in the future too.

ESB








    Re: Very Large Filesystems  
Aknin




 
05-01-07 12:14 PM

On May 1, 3:15 am, Ernst S Blofeld <E.Blof...@new-spectre-base.com>
wrote:
> Jan-Frode Myklebust wrote: 
>
> Who said don't make backups ? ZFS is not a backup solution but a
> filesystem with checksumming and redundancy features. I've never heard
> anyone seriously suggest that ZFS obviated the need for backups, not in
> this thread or anywhere else. Rant about non-issues elsewhere please.
>
> As already pointed out, increasing the number of filesystems does not
> increase the protection because you still have all the common modes of
> failure (including the software bugs that you are so apparently keen
> on). How much better off are a million files on a single filesystem
> against the same files on a thousand filesystems if everything else
> remains equal? There is no meaningful difference at all.
>
> Moreover backups do not address the OP's point - silent corruption. If
> you aren't checking your files how can you have any confidence in your
> backups? A backup is as problematic in terms of integrity as the
> filesystem it is read from. Backing-up a corrupt file doesn't fix it.
>
> You cannot avoid the need for checksumming to detect errors and
> redundancy to fix them. Putting these features directly in your
> filesystem is a good idea - integrity is maintained and there is fast
> recovery. The fact that there will be teething problems in ZFS or an
> equivalent filesystem is not a sound basis for rejecting these features.
>
> There will still be backups in the future too.
>
> ESB

I've cross-posted this question in several places, and practically all
the answers switched immediately to backup/restore issues. It seems that
no one puts any kind of trust in filesystems. Even if you have an
expensive mirrored SAN, the system (the software managing the data) is
too stupid to cause corruption (more about that in my previous post),
and small amounts of data /may/ be lost without too much pain, people
here (and on the VxFS ML, and on ZFS-discuss) recommend either backing
up the filesystem (i.e., copying all its data to something with a
different data structure than the filesystem itself, implicitly because
the FS /will/ get corrupted at some point) or splitting it into smaller
FSs (implicitly because then, if one of them gets corrupted, we can
contain the damage and restore from backups).

So it seems like 'we' always assume an FS will get corrupted, and that no
amount of sophistication will prevent it, or at least prevent the
corruption from being a total loss. Would anyone here trust a filesystem
(any filesystem, name your pick) enough to make a few (say 3 or 4) 32TB
monsters holding the above-mentioned kind of data and backed up solely
by snaps? If you feel that it's not safe, what good are those
gigantic-interconnected/grid-multi-TB-super-expensive SANs if you can't
mkfs more than a few TBs without fear because of filesystem
limitations?

Thanks for your replies, they've been very interesting and useful so
far!

- Yaniv









All times are GMT.