Question: Reliability of physical snapshots on SANs
| Erik H. 2004-10-28, 5:45 pm |
| Hello!
As far as I know, some SAN solutions are able to snapshot
a 'partition' at the hardware level.
So after performing the snapshot (which takes a second or so)
directly on the SAN, I see partition P and the snapshot P'.
Changes to P are not visible in the snapshot P' and vice versa.
Now my problem is:
Because the SAN doesn't know what kind of file system is
stored on the partition it cannot know if the file
system on partition P is consistent when the snapshot is done.
So why does everybody tell me that the file system on P' will ALWAYS be OK?
If I don't make sure (on the computer using partition P) that no
(sensitive) 'control data' is sent while the snapshot starts on the SAN,
I might get a corrupted file system!
E.g. I imagine the following scenario:
time
1 File system on P is fine
2 control data of P is updated (e.g. sent in two SCSI blocks)
block O1 and O2 need to be replaced with N1 and N2.
3a First block N1 of control data is sent to SAN
4a SAN starts snapshot of partition P
3b Second block N2 of control data is sent to SAN
4b SAN sets up P'
4c SAN provides P and snapshot P'
Because the snapshot 'freeze' was done between 3a and 3b the
partition P' will see N1 and O2 - the file system is broken!
I don't see how the SAN will detect such a situation to make sure
this NEVER happens!
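The timeline above can be replayed in a few lines of Python. This is only a toy model, assuming a volume is a dictionary of blocks and a snapshot is an instantaneous copy. The `Volume` class and the `torn_update` function are made up for illustration and do not correspond to any real SAN interface.

```python
class Volume:
    """Toy block device with a point-in-time snapshot (hypothetical model)."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)

    def write(self, lba, data):
        self.blocks[lba] = data

    def snapshot(self):
        # The array copies whatever blocks it holds at this instant.
        # It has no idea which blocks belong together.
        return dict(self.blocks)


def torn_update():
    p = Volume({1: "O1", 2: "O2"})   # step 1: file system on P is fine
    p.write(1, "N1")                 # step 3a: first control block arrives
    p_snap = p.snapshot()            # step 4a: snapshot taken between the writes
    p.write(2, "N2")                 # step 3b: second control block arrives
    return p.blocks, p_snap


live, snap = torn_update()
print("P :", live)   # P : {1: 'N1', 2: 'N2'}
print("P':", snap)   # P': {1: 'N1', 2: 'O2'}
```

In this model P ends up with the new pair, while P' holds the mixed pair N1 and O2, which is exactly the broken state described in the scenario.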
Comments:
* I know this might look like an academic question - but if
it can fail in theory it can fail in real life.
[E.g. a running program creates thousands of new directories]
* It might be 100% failsafe if the underlying protocol used
transactions, but SCSI doesn't. Or is there some other
SCSI trick I don't know about (e.g. some 'drive will be removed, flush
your (control) data' SCSI message)?
* Modern (journaling) file systems are not corrupted that easily, but
even if only some journaling data is inconsistent on P', it might
cause trouble if P' is a read-only file system, because then the
journaling data cannot be repaired.
* I think the only reliable way is to make sure no
control data is written to the partition while the snapshot is done.
That is, you have to unmount the file system first!
* What happens if you are copying a 1 GB file and only
512 MB have been copied when the snapshot is done? I guess you
get a truncated file on P'.
But what if the (badly designed?) file system control data
already says it should be 1 GB? Then the file system is 'broken'.
* Some people say it is enough to sync the partition on the
computer it is mounted on a few seconds before starting the snapshot.
That way, any control data changed since the sync is supposedly still only in the cache.
NO! What if someone else starts a second sync?
Or what if caching is disabled, or the partition is accessed directly (databases)?
* Some people have the idea that the SAN waits until it has received no write requests for
that partition for 'some time', so that no 'half-done' transactions are
seen by the SAN. Very unreliable! And a snapshot would not be possible
while a big file is being written.
* I understand that it works if on your computer you can
'block' control data for some time and you can trigger the snapshot
on the SAN - but I have been told the computer doesn't have to know
and you don't have to do anything on your computer!
* Everyone I talked to agrees that copying a big file during
the snapshot procedure will result in a truncated file, but they
deny that this happens to 'control data'. I don't see the difference:
if a file that is one million blocks in size is truncated by the snapshot,
a two-block file will be too. And because the SAN doesn't know the
difference between a 'harmless' two-block file and two blocks of
important control data that belong together, the problem is still there.
* To me a SAN hardware snapshot is equivalent to splitting a SCSI mirror
(RAID-1) by pulling one disk out of the box while the write LED is off.
When mounting the disk on another computer it might work, but there
is no guarantee it does!
Thank You,
Erik
| |
| Arne Joris 2004-10-28, 5:45 pm |
| Erik H. <sn192he@uni-duisburg.de> wrote:
> Hello!
>
> As far as I know, some SAN solutions are able to snapshot
> a 'partition' at the hardware level.
> So after performing the snapshot (which takes a second or so)
> directly on the SAN, I see partition P and the snapshot P'.
> Changes to P are not visible in the snapshot P' and vice versa.
>
> Now my problem is:
> Because the SAN doesn't know what kind of file system is
> stored on the partition it cannot know if the file
> system on partition P is consistent when the snapshot is done.
Indeed, either your snapshot solution knows about the filesystem and is
integrated with it, or you need to unmount the filesystem (or at least
stop all apps and cause a host cache flush before starting the snap).
Sure you could rely on filesystem journals to recover from whatever
inconsistencies you caused by snapshotting while the filesystem was
still running, but that doesn't count as a point-in-time snapshot then.
Arne Joris
| |
| Erik H. 2004-10-29, 7:45 am |
| Arne Joris <nospam@orf.org> wrote in message news:<Wmbgd.51049$Pl.44100@pd7tw1no>...
> Erik H. <sn192he@uni-duisburg.de> wrote:
>
> Indeed, either your snapshot solution knows about the filesystem and is
> integrated with it, or you need to unmount the filesystem (or at least
> stop all apps and cause a host cache flush before starting the snap).
The problem is that some people say the 'flush' solution works, but
I say it doesn't (in a multi-user environment). Reason:
If user A (root) does a host cache flush and starts the snapshot a few
seconds later, some other user/application might have changed
the data again in the meantime (e.g. by creating many directories). The changes might
still be in the cache, but I don't think I can rely on that, e.g.
another user/application might have started another host cache flush,
and user A starts the snapshot while that second flush is ongoing,
which might result in a broken file system again.
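The race between a flush and the snapshot can be made concrete with a small simulation. This is a sketch under simplifying assumptions: the host cache is a dictionary of dirty blocks, a flush pushes them to the array one block at a time, and the snapshot is an instantaneous copy of the array's blocks. The `Host` class and its methods are invented for this illustration.

```python
class Host:
    """Toy host with a write-back cache in front of a SAN volume (hypothetical model)."""

    def __init__(self, volume):
        self.volume = volume      # blocks already on the array
        self.dirty = {}           # blocks that exist only in the host cache

    def write(self, lba, data):
        self.dirty[lba] = data

    def flush_one(self):
        # Push a single dirty block to the array (lowest block number first).
        lba = min(self.dirty)
        self.volume[lba] = self.dirty.pop(lba)

    def flush(self):
        while self.dirty:
            self.flush_one()


volume = {1: "O1", 2: "O2"}
host = Host(volume)

host.flush()                 # user A: cache flush, the volume is consistent
host.write(1, "N1")          # another user updates two related blocks
host.write(2, "N2")
host.flush_one()             # a second flush starts: N1 reaches the array
snap = dict(volume)          # user A triggers the snapshot at this moment
host.flush()                 # the second flush finishes

print("P :", volume)         # P : {1: 'N1', 2: 'N2'}
print("P':", snap)           # P': {1: 'N1', 2: 'O2'}
```

The first flush did complete before the snapshot, yet P' still holds a mixed state, because nothing stopped new writes or a second flush from starting in between.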
>
> Sure you could rely on filesystem journals to recover from whatever
> inconsistencies you caused by snapshotting while the filesystem was
> still running, but that doesn't count as a point-in-time snapshot then.
>
> Arne Joris
Thank You,
Erik
| |
| Arne Joris 2004-10-29, 5:45 pm |
| Erik H. <sn192he@uni-duisburg.de> wrote:
> The problem is that some people say the 'flush' solution works, but
> I say it doesn't (in a multi-user environment). Reason:
> If user A (root) does a host cache flush and starts the snapshot a few
> seconds later, some other user/application might have changed
> the data again in the meantime (e.g. by creating many directories). The changes might
> still be in the cache, but I don't think I can rely on that, e.g.
> another user/application might have started another host cache flush,
> and user A starts the snapshot while that second flush is ongoing,
> which might result in a broken file system again.
Yes, you need to ensure the filesystem is not doing any metadata or data
operations while the snapshot is being taken. Most filesystems do not
give any guarantees about changes staying in the host cache for any
length of time, so you can't rely on that.
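One way to get the "no operations while the snapshot is taken" guarantee without unmounting is to freeze the filesystem from the host. A minimal sketch, assuming a Linux host with the util-linux `fsfreeze` tool, a filesystem that supports freezing, and root privileges. `snapshot_cmd` stands in for whatever vendor command triggers the array snapshot; it is a placeholder, not a real CLI. The `run` parameter exists only so the sequence can be exercised without touching a real mount.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def frozen(mountpoint, run=subprocess.run):
    """Suspend writes to a mounted filesystem for the duration of the block."""
    # fsfreeze flushes dirty data and blocks new writes until unfrozen.
    run(["fsfreeze", "--freeze", mountpoint], check=True)
    try:
        yield
    finally:
        # Always thaw, even if the snapshot command fails.
        run(["fsfreeze", "--unfreeze", mountpoint], check=True)


def snapshot_quiesced(mountpoint, snapshot_cmd, run=subprocess.run):
    """Freeze the filesystem, trigger the array snapshot, then thaw."""
    with frozen(mountpoint, run):
        # snapshot_cmd is a hypothetical vendor command, passed in by the caller.
        run(snapshot_cmd, check=True)
```

A call might look like `snapshot_quiesced("/mnt/p", ["vendor-snap-tool", "P"])`, where both the mount point and the tool name are made up. The freeze only covers the filesystem; applications that need their own data to be consistent still have to be quiesced separately.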
It all depends on the purpose of your snapshot. If all you want is a
"workable" filesystem in your snapshot, you might be able to live with
the proposed solution: the metadata journal should hold a couple hundred
metadata operations, and it may be unlikely (depending on what your
filesystem is being used for) that users cause that many metadata changes
(i.e. create a file, delete a file, append to a file, ...) while the
snapshot is being taken. So even though the filesystem on your snapshot will
be slightly incoherent, the journal allows it to become coherent again.
If, on the other hand, you require data consistency (for example, you rely
on data in lock files and the files being locked to be in sync, or you rely on
a file's header being used for locking purposes), this isn't good enough.
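The difference between a "workable" filesystem and consistent data can be shown with a toy write-ahead journal. This is a simplified model, not how any particular filesystem lays out its journal: it assumes a metadata transaction is logged and marked committed before the blocks are written in place, and that file data is not journaled at all. The `replay` and `take_snapshot` functions and the dictionary layout are invented for the example.

```python
def replay(snapshot):
    """Apply committed journal transactions to a snapshot image and return the blocks."""
    blocks = dict(snapshot["blocks"])
    for txn in snapshot["journal"]:
        if txn["committed"]:
            blocks.update(txn["writes"])
        # Uncommitted transactions are ignored, which leaves the old state.
    return blocks


def take_snapshot(disk):
    # Copy the image as the array would see it at this instant.
    return {
        "blocks": dict(disk["blocks"]),
        "journal": [dict(t, writes=dict(t["writes"])) for t in disk["journal"]],
    }


disk = {"blocks": {1: "O1", 2: "O2", 3: "old data"}, "journal": []}

# The metadata update N1/N2 is logged and committed before it is written in place.
disk["journal"].append({"writes": {1: "N1", 2: "N2"}, "committed": True})
disk["blocks"][1] = "N1"            # first in-place write reaches the array
snap = take_snapshot(disk)          # the snapshot lands between the two writes
disk["blocks"][2] = "N2"
disk["blocks"][3] = "new data"      # file data is not journaled in this model

print(snap["blocks"])   # {1: 'N1', 2: 'O2', 3: 'old data'}
print(replay(snap))     # {1: 'N1', 2: 'N2', 3: 'old data'}
```

After replay the metadata blocks are coherent again, but block 3 still holds the old file data, so anything that depends on data and metadata matching is not covered. Replay also has to write somewhere, which is why a strictly read-only P' is a problem for this approach.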
Arne Joris