|
| Author |
shared disk, two hosts (*without clustering* ) in linux
|
|
| javier_cobas 2006-01-30, 5:48 pm |
| Hello,
i have been searching for a solution to have a shared disk (f.ex. SAN )
visible across two (or perhaps more) hosts, both of them able to read
or write on the disk [better if it is a filesystem in it, though raw
disk is also good]. The data must be perfectly synchronized between
both hosts. (If no f.s. this should be managed by my application, but
then i think that perhaps an option for no-cacheing data is needed from
the O.S.).
I have 'googled' for it but i am unable to find anything useful. O.S.
is linux.
i think "lustre", "cluster file system (CFS)" or any other clustering
scheme is not an option here, since i would need the two
(LAN-connected) hosts sharing the disk to be independent of each other and
running different processes, because of different functionality.
any thoughts about it?.
TIA, any help appreciated.
--
Fco. Javier Cobas-Seoane
Madrid (Spain)
| |
| The Natural Philosopher 2006-01-30, 5:48 pm |
| javier_cobas wrote:
> Hello,
>
> i have been searching for a solution to have a shared disk (f.ex. SAN )
> visible across two (or perhaps more) hosts, both of them able to read
> or write on the disk [better if it is a filesystem in it, though raw
> disk is also good]. The data must be perfectly synchronized between
> both hosts. (If no f.s. this should be managed by my application, but
> then i think that perhaps an option for no-cacheing data is needed from
> the O.S.).
> I have 'googled' for it but i am unable to find anything useful. O.S.
> is linux.
>
> i think "lustre", "cluster file system (CFS)" or other clustering
> schema, is not an option here, since i would need the two
> (LAN-connected ) hosts sharing the disk be one apart the other and
> running different processes, because of different functionality.
>
> any thoughts about it?.
> TIA, any help appreciated.
>
> --
> Fco. Javier Cobas-Seoane
> Madrid (Spain)
>
This is a fairly common problem; it occurs every day in
any shared database application. It is solved by breaking data writes into
atomic units, locking access whilst each atomic write is carried
out, and keeping a log of all atomic transactions on a separate disk, so
you can roll back to a backup and roll forward the transactions.
Without knowing what is going on in the shared data area, it's hard to
say exactly what approach to take, but implementing a database server
and accessing it through normal APIs from your applications would be one
way.
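A minimal sketch of the "lock, log, then apply" idea, in Python. The class and method names are hypothetical, and the in-process `threading.Lock` only stands in for whatever cross-host lock a real setup would need; this is an illustration, not a database.

```python
import json
import os
import threading


class LoggedStore:
    """Toy key/value store: each update is logged before it is applied."""

    def __init__(self, log_path):
        self.log_path = log_path  # ideally on a separate disk from the data
        self.data = {}
        self._lock = threading.Lock()  # placeholder for a cross-host lock

    def put(self, key, value):
        record = json.dumps({"key": key, "value": value})
        with self._lock:
            # 1. Append the transaction to the log and force it to disk.
            with open(self.log_path, "a", encoding="utf-8") as log:
                log.write(record + "\n")
                log.flush()
                os.fsync(log.fileno())
            # 2. Only then apply the change to the live data.
            self.data[key] = value

    @classmethod
    def recover(cls, log_path, backup=None):
        """Start from a backup copy and roll the logged transactions forward."""
        store = cls(log_path)
        store.data = dict(backup or {})
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    entry = json.loads(line)
                    store.data[entry["key"]] = entry["value"]
        return store
```

After a crash, `LoggedStore.recover(log_path, backup)` rebuilds the state from the last backup plus every transaction that reached the log.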
| |
| javier_cobas 2006-01-30, 5:48 pm |
| Thanks.
The shared data area can be modified by some processes running on both
hosts.
They will be "normal" files, though somewhat big.
The idea of using a shared disk (via SAN [with HBA, controllers, ...
etc every path redundant]) is for having a means of quick and reliable
data communication between processes in both hosts, instead of "shared
memory". I have done it with "raw" shared disk on an hp's EVA3000 under
"Tru64 Unix" O.S. The SAN put the data quickly on the other host. It
was fast and efficient for our needs.
Now i wonder if that can be done under Linux also, *better* if it is
under a f.s. structure instead of raw disk. Linux distribution will be
(perhaps) Suse L. Ent. Server or Redhat E. S.
Javier.
--
F. Javier Cobas-Seoane
Madrid (Spain)
| |
| The Natural Philosopher 2006-01-30, 5:48 pm |
| javier_cobas wrote:
> Thanks.
> The shared data area can be modified by some processes running on both
> hosts.
> They will be "normal" files, though somewhat big.
> The idea of using a shared disk (via SAN [with HBA, controllers, ...
> etc every path redundant]) is for having a means of quick and reliable
> data communication between processes in both hosts, instead of "shared
> memory". I have done it with "raw" shared disk on an hp's EVA3000 under
> "Tru64 Unix" O.S. The SAN put the data quickly on the other host. It
> was fast and efficient for our needs.
>
> Now i wonder if that can be done under Linux also, *better* if it is
> under a f.s. structure instead of raw disk. Linux distribution will be
> (perhaps) Suse L. Ent. Server or Redhat E. S.
>
> Javier.
> --
> F. Javier Cobas-Seoane
> Madrid (Spain)
>
My only point is that you should not write to a shared disk and expect
to make it bombproof. Use of a decent protocol with file locking may
make this work, but it's better to build a shim over the file system that
handles access to it, and dream up a protocol to 'write' through it via
networked connections, so you can ensure file system integrity by having
just one system actually write to the file.
| |
| Michael Heiming 2006-01-30, 5:48 pm |
| In comp.os.linux.misc javier_cobas <f.javier.cobas@gmail.com>:
> Thanks.
> The shared data area can be modified by some processes running on both
> hosts.
> They will be "normal" files, though somewhat big.
> The idea of using a shared disk (via SAN [with HBA, controllers, ...
> etc every path redundant]) is for having a means of quick and reliable
> data communication between processes in both hosts, instead of "shared
> memory". I have done it with "raw" shared disk on an hp's EVA3000 under
> "Tru64 Unix" O.S. The SAN put the data quickly on the other host. It
> was fast and efficient for our needs.
It's called TruCluster. IMHO the most advanced *nix cluster
technology, we'll see what HP makes out of it?
> Now i wonder if that can be done under Linux also, *better* if it is
> under a f.s. structure instead of raw disk. Linux distribution will be
> (perhaps) Suse L. Ent. Server or Redhat E. S.
Check out GFS, sounds like what you want. IIRC it has been
completely GPL'ed by rh, so one could probably roll out one's own.
But if you need a certified solution you should get it from rh.
Good luck
--
Michael Heiming (X-PGP-Sig > GPG-Key ID: EDD27B94)
mail: echo zvpunry@urvzvat.qr | PERL -pe 'y/a-z/n-za-m/'
#bofh excuse 128: Power Company having EMP problems with
their reactor
| |
| Faeandar 2006-01-30, 8:46 pm |
| On 30 Jan 2006 09:56:47 -0800, "javier_cobas"
<f.javier.cobas@gmail.com> wrote:
>Hello,
>
>i have been searching for a solution to have a shared disk (f.ex. SAN )
>visible across two (or perhaps more) hosts, both of them able to read
>or write on the disk [better if it is a filesystem in it, though raw
>disk is also good]. The data must be perfectly synchronized between
>both hosts. (If no f.s. this should be managed by my application, but
>then i think that perhaps an option for no-cacheing data is needed from
>the O.S.).
>I have 'googled' for it but i am unable to find anything useful. O.S.
>is linux.
>
>i think "lustre", "cluster file system (CFS)" or other clustering
>schema, is not an option here, since i would need the two
>(LAN-connected ) hosts sharing the disk be one apart the other and
>running different processes, because of different functionality.
>
>any thoughts about it?.
>TIA, any help appreciated.
I may not be understanding exactly what you're asking but it seems to
me a clustered file system is absolutely required. In the case of a
single node not controlling data access (like NAS for instance) the
hosts involved in writes have to be able to share lock information.
I'm not aware of anything other than a filesystem that will do this,
and a clustered filesystem at that.
Even in NFS the locks are advisory only.
Running different processes should not be an issue for any clustered
file system out there today, and as long as they're LAN connected they
can easily share lock and meta data.
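To illustrate what "advisory" means in practice, here is a small Python sketch using `flock` on a local file. The helper name is hypothetical, and the behaviour shown is for a local Linux filesystem only; how locks behave over NFS or a cluster file system depends on that file system.

```python
import fcntl
import os


def try_exclusive(fd):
    """Try to take an exclusive advisory lock without blocking."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        # Another open file description already holds the lock.
        return False
```

If two descriptors are opened on the same file and the first one takes the lock, `try_exclusive` on the second returns `False`, yet an `os.write` on that second descriptor still succeeds. The lock only protects writers that agree to check it, which is why every host involved has to share and honour the same lock information.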
~F
| |
| CBFalconer 2006-01-31, 2:46 am |
| The Natural Philosopher wrote:
> javier_cobas wrote:
>
> this is a fairly common problem, and occurs absolutely and every day in
> any shared database application, solved by breaking data writes into
> atomic units, locking access whilst these atomic unit writes are carried
> out, and keeping a log of all atomic transactions on a separate disk, so
> you can roll back to a backup and roll forward the transactions.
>
> Without knowing what is going ON to the shared data area, its hard to
> say exactly what approach to take, but implementing a database server
> add accessing it through normal APIs from your applications would be one
> way.
This is known as the readers and writers problem. See:
http://www.acm.org/classics/feb96/
and you will get an idea of how long ago it was solved.
--
"The power of the Executive to cast a man into prison without
formulating any charge known to the law, and particularly to
deny him the judgement of his peers, is in the highest degree
odious and is the foundation of all totalitarian government
whether Nazi or Communist." -- W. Churchill, Nov 21, 1943
| |
| Marten Kemp 2006-01-31, 2:46 am |
| javier_cobas wrote:
> Hello,
>
> i have been searching for a solution to have a shared disk (f.ex. SAN )
> visible across two (or perhaps more) hosts, both of them able to read
> or write on the disk [better if it is a filesystem in it, though raw
> disk is also good]. The data must be perfectly synchronized between
> both hosts. (If no f.s. this should be managed by my application, but
> then i think that perhaps an option for no-cacheing data is needed from
> the O.S.).
> I have 'googled' for it but i am unable to find anything useful. O.S.
> is linux.
>
> i think "lustre", "cluster file system (CFS)" or other clustering
> schema, is not an option here, since i would need the two
> (LAN-connected ) hosts sharing the disk be one apart the other and
> running different processes, because of different functionality.
>
> any thoughts about it?.
> TIA, any help appreciated.
>
> --
> Fco. Javier Cobas-Seoane
> Madrid (Spain)
I think that the SCSI command set includes Reserve and Release,
which can be used as a volume-level lock mechanism. The mainframe
OS MVS and its successors used this in the past. I think you'll
have to do without disk caching, though.
--
-- Marten Kemp
(Fix name and ISP to reply)
-=-=-
.... "There are no problems that cannot be solved by the judicious use of
high explosives" -- British Commando quote, circa WWII.
* TagZilla 0.059 * http://tagzilla.mozdev.org
| |
| Jan-Frode Myklebust 2006-01-31, 7:55 am |
| I don't understand why you don't consider a cluster file system an
option here ? It sounds like a cfs is just what you need. On linux
I would consider the major options:
o Oracle's ocfs2 http://oss.oracle.com/projects/ocfs2/ which
was recently added to the standard linux kernel, should soon
be certified for Oracle 10g, and probably will be a major player
in this area.
o IBM GPFS http://www-03.ibm.com/servers/eserv...tware/gpfs.html
which I'm sure is the most mature option on linux.. (but I
guess I might be biased).
o GFS from Red Hat / Sistina, which I haven't yet tested ...
mostly because it looks too complicated.
Lustre I think is more of a file system for larger clusters / high
performance computing, and not for a 2-node server cluster.
-jf
| |
| Javier Cobas 2006-01-31, 6:40 pm |
| Thanks, and also to all other responses so far.
Michael Heiming wrote
..../...
> It's called TruCluster. IMHO the most advanced *nix cluster
> technology, we'll see what HP makes out of it?
..../...
That's right, but for that particular case it was done without
TruCluster. At that time all that was needed was transferring "big" data
structures between hosts (as an alternative to IPC messaging).
There were two possibilities then, one using a filesystem (UFS, AdvFS)
and the other using "raw" disk: write the big data structures to disk
and then signal the data transfer completion to the other host (via
Sys V or POSIX IPC messages stating the size and checksum of
the data transferred).
Since there was less overhead without a f.s., "raw" disk was selected then.
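A rough Python sketch of that "write, then signal size and checksum" handshake. The function names and the JSON notice format are assumptions made for illustration (the original used IPC messages on Tru64), and a real reader on another host would also have to bypass its own cache before trusting what it reads.

```python
import hashlib
import json
import os


def write_payload(path, payload, offset=0):
    """Write payload to an existing file or device, return a completion notice."""
    with open(path, "r+b") as dev:
        dev.seek(offset)
        dev.write(payload)
        dev.flush()
        os.fsync(dev.fileno())  # make sure the data left this host
    return json.dumps({
        "offset": offset,
        "size": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
    })


def read_payload(path, notice):
    """Read the region described by the notice and verify size and checksum."""
    info = json.loads(notice)
    with open(path, "rb") as dev:
        dev.seek(info["offset"])
        payload = dev.read(info["size"])
    if len(payload) != info["size"]:
        raise ValueError("short read")
    if hashlib.sha256(payload).hexdigest() != info["sha256"]:
        raise ValueError("checksum mismatch")
    return payload
```

The writer sends the returned notice to the other host over whatever messaging channel is available; the reader only uses the data if `read_payload` returns without raising.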
This time however there are more issues, i think f.s. will be needed
eventually, but each host should maintain its own processes for itself,
apart from the other hosts. Suse Linux or Red Hat L. surely will be the
OS.
>Checkout GFS, .../...
I am doing it now. I was told that GFS could also be used without
clustering...
| |
| Javier Cobas 2006-01-31, 6:40 pm |
| Faeandar wrote on comp.arch.storage:
..../...
>Running different processes should not be an issue for any clustered
file system out there today
..../...
I have to follow that clue, because if i found one cluster that can
achieve it under Linux, then perhaps i could use that cluster (if i
cannot find a solution with independent hosts under Linux).
I would need the hosts in the cluster to be able to run their own
processes independently, while sharing the storage.
But if the cluster were not able to achieve process independence then i
would need sharing *only* the SAN storage *without* clustering the
hosts, either with a sort of "clustered" f.s. (preferably) or raw disk
without f.s. and without cacheing data on disks (as i have done in the
past with hp's EVA3000 SAN and Tru64 Unix). [see below the discussion].
Thanks
Javier
| |
| Michael Heiming 2006-01-31, 6:40 pm |
| In comp.os.linux.misc Javier Cobas <f.javier.cobas@gmail.com>:
> Michael Heiming wrote
[..]
> .../...
> That's right, but for that particular case it was done without
> TruCluster. At that time it was only needed transferring "big" data
> structures between hosts (as an alternative to IPC messaging).
> Then there were two possibilities, one using filesystem (UFS, AdvFs)
> and the other using "raw" disk, write the big data structures to disk
> and then signaling the data transfer completion to the other host (via
> sys V or POSIX IPC messages with mention to the size and checksum of
> the data transferred).
> Since it was less overhead without f.s., "raw" disk was selected then.
Wouldn't really count that as shared disks, but this seems to be
just a question of definition.
[..]
> I am doing it now. I was said that GFS could also be used without
> clustering...
Perhaps you could take a look at enbd, unsure if this is what you
want, the author is a regular in this ng, perhaps Peter has a few
warm words?
--
Michael Heiming (X-PGP-Sig > GPG-Key ID: EDD27B94)
mail: echo zvpunry@urvzvat.qr | PERL -pe 'y/a-z/n-za-m/'
#bofh excuse 220: Someone thought The Big Red Button was a
light switch.
| |
| Peter T. Breuer 2006-02-01, 2:47 am |
| In comp.os.linux.misc Michael Heiming <michael+USENET@www.heiming.de> wrote:
> Perhaps you could take a look at enbd, unsure if this is what you
> want, the author is a regular in this ng, perhaps Peter has few
> warm words?
It's not a clustering technology, however!
But with O_DIRECT it can certainly handle multiple writes from different
clients. That will prevent the clients caching a false image too.
The trouble is that's not enough - if you put a file system on top then
the fs's have their own caches, which won't be synced. So only a raw
device will work properly - in O_DIRECT mode.
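For reference, a hedged Python sketch of an O_DIRECT write on Linux. The helper names and the 4096-byte block size are assumptions; the alignment a given device actually requires may differ, and some filesystems reject O_DIRECT altogether.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; check what the device really requires


def aligned_buffer(payload, block=BLOCK):
    """Copy payload into a page-aligned, zero-padded buffer of whole blocks."""
    size = -(-len(payload) // block) * block or block
    buf = mmap.mmap(-1, size)  # anonymous mapping, page-aligned and zero-filled
    buf.write(payload)
    return buf


def direct_write(path, payload, offset=0, block=BLOCK):
    """Write payload at a block-aligned offset, bypassing the page cache."""
    if offset % block:
        raise ValueError("offset must be a multiple of the block size")
    buf = aligned_buffer(payload, block)
    fd = os.open(path, os.O_WRONLY | os.O_DIRECT)  # Linux-specific flag
    try:
        return os.pwrite(fd, buf, offset)
    finally:
        os.close(fd)
        buf.close()
```

O_DIRECT needs the buffer address, the length and the file offset to be suitably aligned, which is why the payload is padded up to a whole number of blocks before it is written.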
Peter
| |
| Javier Cobas 2006-02-01, 5:50 pm |
| Thanks, and also to all other people responding.
Jan-Frode Myklebust wrote:
> I don't understand why you don't consider a cluster file system an
> option here ? It sounds like a cfs is just what you need.
..../...
Yes, i do consider any sort of Cluster File System, *but* only if it
gives me the option of *not* clustering the hosts as well, i.e.
clustering *only* the storage. (Independent hosts sharing only
storage in a SAN, preferably with a filesystem).
Host A has application functionality "X", host B has "Y". Then if A is
brought down, "X" functionality ceases, but "Y" functionality remains,
and host B still has the CFS available to R/W.
Then, when A is available again, it 'sees' the changes applied to the
shared cluster filesystem by B while A was knocked out. [Strange,
because with a standard cluster i know i would have both X and Y funcs.
all the time, but this is exactly what i was asked for].
If that were *not* possible with some available Cluster F.S. under
Linux, and i were forced to take a "whole cluster" solution, then, in
order to respect the specification, i would need to at least preserve
exclusive application functionality on each host, i.e. force "X"
application functionality only on host A (and "Y" only on B). [Earlier in
this thread Faeandar said that forcing independent processes should not
be a problem for most modern clusters. Now i am looking for one that is
able to achieve it].
Javier
| |
| Javier Cobas 2006-02-01, 5:50 pm |
| CBFalconer wrote:
>This is known as the readers and writers problem. See:
> http://www.acm.org/classics/feb96/ .../...
Thanks!. Interesting reading.
I was rather looking for an available f.s. under Linux able to do that
without clustering the hosts, only the storage, and avoiding re-inventing
the wheel. It is good to have this reference, to know exactly how it
can be done.
Javier
| |
| Jan-Frode Myklebust 2006-02-01, 5:50 pm |
| On 2006-02-01, Javier Cobas <f.javier.cobas@gmail.com> wrote:
>
> Yes, i do consider any sort of Cluster File System, *but* only if it
> lets me the option of *not* clusterize also the hosts. i.e.
> clusterizing *only* the storage. (Independent hosts sharing only
> storage in a SAN, and with filesystem preferably).
OK, then these clustering file systems should work for you. They're
general purpose unix file systems, and you can run most anything
on them as on any other local file system.
>
> Host A has application functionality "X", host B has "Y". Then if A is
> brought down, "X" funcionality ceases, but "Y" functionality remains,
> and host B still has the CFS available to R/W.
The only uncertainty I see here is that for a 2-node cluster you
might have problems maintaining quorum when you lose one node.
For GPFS, which I'm most familiar with, this is
resolved by using 2 nodes and 1 disk for quorum. As long as 2 of
{node1, node2, quorum-disk} are up, your file system cluster should
be OK.
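The 2-of-3 rule can be written down as a simple majority check. This is only a sketch of the arithmetic in Python, with hypothetical names; it is not how GPFS itself is configured or implemented.

```python
def has_quorum(up, members=("node1", "node2", "quorum-disk")):
    """Return True when a strict majority of the quorum members is up."""
    alive = set(up) & set(members)  # ignore anything that is not a member
    return len(alive) > len(members) // 2
```

With three members, losing either node still leaves two of three, so the surviving node keeps the file system; a node that sees neither the other node nor the quorum disk has to stop.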
Not sure how/if GFS/OCFS2 solves this, but I'd expect them to do
something similar.
-jf
|
|
|
|
|