Data Storage - Network mirroring


S

2007-12-24, 7:11 pm

I started thinking about this after a conversation with an
acquaintance who runs a large database driven website. Currently he
has only one data center, and all writes from all over the world go
there.

So he's planning 3 things:
0. Open up more DCs.
1. Some sort of geographic identification of an incoming IP request.
2. Redirect that request to a DC that's closest to the client.
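Steps 1 and 2 of the plan above can be sketched in Python as a longest-prefix lookup from the client's IP address to a data center. The prefix table, the DC names, and the fallback DC are all hypothetical (the addresses are RFC 5737 documentation ranges); in practice the mapping would come from a GeoIP database or a geo-aware DNS service rather than a hand-written dict.

```python
import ipaddress

# Hypothetical prefix -> data center table (RFC 5737 documentation ranges).
DC_BY_PREFIX = {
    ipaddress.ip_network("192.0.2.0/24"): "dc-us-west",
    ipaddress.ip_network("198.51.100.0/24"): "dc-eu",
    ipaddress.ip_network("203.0.113.0/24"): "dc-asia",
}
DEFAULT_DC = "dc-us-west"  # hypothetical: the existing single data center


def closest_dc(client_ip: str) -> str:
    """Return the DC for the longest matching prefix, else the default DC."""
    addr = ipaddress.ip_address(client_ip)
    matches = [net for net in DC_BY_PREFIX if addr in net]
    if not matches:
        return DEFAULT_DC
    best = max(matches, key=lambda net: net.prefixlen)
    return DC_BY_PREFIX[best]


print(closest_dc("198.51.100.7"))  # dc-eu
print(closest_dc("8.8.8.8"))       # dc-us-west (no match, falls back)
```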

The unresolved issue is how to keep his data in sync. Some more issues
to muddy the waters:
1. The storage is heterogeneous, so he can't just go with Snapmirror.
2. The updates across DCs must happen in near-real-time.
3. It can't cost an arm and a leg.

One big advantage is that since everything is DB-driven, the DB could
tell the mirroring s/w which files to transfer across, which
eliminates a huge issue.

So I was thinking... has anyone played around with rsync to make this
happen? I'd imagine you'd have to make some serious code changes to
the client, but it would be really interesting.

S

lahuman9

2007-12-26, 1:17 am



At first glance, with heterogeneous storage and a requirement that it be
cheap, I would say 'no way'.

But it depends on what you mean by "near-real-time" and how many files
there are. rsync has a delay while it checks which files have changed,
but if your software can tell which files to transfer, why not just use
scp? Assuming you'd set up rsync to run over ssh anyway, copying the
files with scp directly gets rid of rsync's overhead.

I would not say that scp is near-real-time...
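Since the database already knows which files changed, one way to combine the two ideas is to hand rsync that exact list through its `--files-from` option, which skips the directory scan mentioned above. A minimal Python sketch, assuming the changed paths are relative to `src_root`; the paths and the `backup@dc2:/data/` destination are hypothetical:

```python
import tempfile


def build_rsync_command(changed_paths, src_root, dest):
    """Write the changed-file list and return (rsync argv, list file path).

    changed_paths: paths relative to src_root, as reported by the
    application database (hypothetical source of truth).
    """
    # --files-from expects one path per line.
    with tempfile.NamedTemporaryFile("w", suffix=".lst", delete=False) as fh:
        for path in changed_paths:
            fh.write(path + "\n")
        list_file = fh.name
    cmd = [
        "rsync",
        "-a",                          # preserve permissions, times, etc.
        "-e", "ssh",                   # use ssh as the remote shell
        f"--files-from={list_file}",   # send only the listed files, no tree scan
        src_root,
        dest,
    ]
    return cmd, list_file


cmd, list_file = build_rsync_command(
    ["img/a.jpg", "img/b.jpg"], "/data/", "backup@dc2:/data/"
)
print(" ".join(cmd))
```

Running `subprocess.run(cmd, check=True)` would perform the transfer, after which the temporary list file can be deleted. With `--files-from`, rsync implies `--relative` and `-a` no longer implies `--recursive`, so only the listed files are sent.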
Alvin Andries

2007-12-26, 1:17 am


If your acquaintance is using big $$$ DBs like Oracle or DB2, they have
options for distributed DB sites.
If he's running less elaborate DBs, then I would consider looking into
replaying change log files; otherwise, the syncing will take longer and
longer as the DB grows. Still, without more details, I can only say that you
should be aware of the invalid states that can occur: e.g. you start with 10
items in stock and, in the same sync slot, people at locations 1 and 2 each
order 6 items and are both told they will be delivered in 3 days.
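Alvin's stock example can be made concrete in a few lines of Python. The two merge strategies below are illustrative assumptions, not what any particular replication product does; the point is that once both sites accept an order against their own local copy, neither way of merging the results is correct:

```python
STOCK = 10
ORDER_SITE_1 = 6
ORDER_SITE_2 = 6

# Within one sync slot each site checks only its own copy (10 >= 6),
# so both orders are accepted and both customers get a delivery date.
local_after_1 = STOCK - ORDER_SITE_1
local_after_2 = STOCK - ORDER_SITE_2


def merge_last_writer_wins(site_values):
    """Each site ships its new stock count; the last one to arrive wins."""
    return site_values[-1]


def merge_by_replaying_deltas(base, site_deltas):
    """Each site ships its change, which is replayed against the old value."""
    return base + sum(site_deltas)


lww = merge_last_writer_wins([local_after_1, local_after_2])
replayed = merge_by_replaying_deltas(STOCK, [-ORDER_SITE_1, -ORDER_SITE_2])

print(lww)       # 4: one sale disappears from the inventory count
print(replayed)  # -2: both sales recorded, but 12 items were promised from 10
```

Avoiding this requires either one site owning each item's stock count or coordination between sites before an order is confirmed.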

Regards,
Alvin.


Dieter Stumpner

2007-12-26, 7:15 am

S wrote:
> I started thinking about this after a conversation with an
> acquaintance who runs a large database driven website. Currently he
> has only one data center, and all writes from all over the world go
> there.


Hi!

It is complicated to replicate the database because of the locking
mechanism of a DB. As the previous poster mentioned, you can't sell one
item to two people.
I don't know your workload, but I would prefer another approach: use
reverse proxies to distribute your load. A huge example is
Wikipedia [1]: only "one" MySQL master DB and a lot of Apaches and Squids.

[1] http://meta.wikimedia.org/wiki/Wikimedia_servers
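The split Dieter describes (cached reads served near the user, all writes funneled to one master) can be sketched as a simple request router. The host names and site IDs are hypothetical, and treating only GET/HEAD as cacheable reads is an assumption about the application:

```python
# Hypothetical host names: one master site that takes all writes,
# and a caching reverse proxy (e.g. Squid) in each data center.
MASTER_SITE = "app.dc1.example"
LOCAL_PROXY = {
    "dc1": "cache.dc1.example",
    "dc2": "cache.dc2.example",
}
READ_METHODS = {"GET", "HEAD"}  # assumption: only these are safe to cache


def route(method: str, site: str) -> str:
    """Pick the backend for a request arriving at the given data center."""
    if method.upper() in READ_METHODS:
        # Reads are served by the nearest cache; unknown sites fall back.
        return LOCAL_PROXY.get(site, MASTER_SITE)
    # Every write goes to the single master, so only one copy is authoritative.
    return MASTER_SITE


print(route("GET", "dc2"))   # cache.dc2.example
print(route("POST", "dc2"))  # app.dc1.example
```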

with best regards
Dieter Stumpner

S

2007-12-26, 7:12 pm

Hi Dieter,
The Wikipedia example is awesome. Wikipedia's problem is kinda easy, though,
because presumably they have very few writes and a LOT of reads, for
which Apache/Squid works. This guy has a different problem,
because he has lots of reads and lots of writes. It's not a big $$$ DB,
so I'm thinking of replaying the change logs and using scp/rsync.
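Replaying change logs works cleanly when a single site is the writer and the others only apply its log in order. A minimal Python sketch of the replica side, assuming each log entry carries a monotonically increasing sequence number (the `LogEntry` and `Replica` names are hypothetical); shipping the log files themselves could still be done with scp or rsync:

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    seq: int      # monotonically increasing, assigned by the writing site
    key: str
    value: object


@dataclass
class Replica:
    data: dict = field(default_factory=dict)
    last_applied: int = 0

    def replay(self, log):
        """Apply, in sequence order, every entry newer than last_applied."""
        for entry in sorted(log, key=lambda e: e.seq):
            if entry.seq <= self.last_applied:
                continue  # already applied, so re-shipping a log is harmless
            if entry.seq != self.last_applied + 1:
                raise ValueError(f"missing log entries before seq {entry.seq}")
            self.data[entry.key] = entry.value
            self.last_applied = entry.seq


log = [LogEntry(1, "a", 1), LogEntry(2, "b", 2), LogEntry(3, "a", 3)]
replica = Replica()
replica.replay(log[:2])  # first shipment
replica.replay(log)      # later shipment overlaps the first one
print(replica.data, replica.last_applied)  # {'a': 3, 'b': 2} 3
```

Tracking `last_applied` makes re-shipping an old log harmless, and the gap check stops a replica from silently skipping a lost file. It does not solve the multi-writer conflict from Alvin's stock example: with writes at every DC, logs from different sites can still contradict each other.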

Thanks all, this was an interesting discussion.
S




JimK

2007-12-27, 7:13 pm

Any of the virtualization engines (NetApp gateways, IBM SVC, FalconStor,
etc.) will aggregate heterogeneous back-end disk arrays and do remote
mirroring of one sort or another. The LUNs are presented to the
virtualization engines and, through them, to the servers.

You use the example of Snapmirror, which is a NetApp feature, so I
assume they have NetApp devices, probably filers with their own disk.
The gateways use existing back-end disk. If they have NetApp boxes at
the primary and secondary sites, they CAN use Snapmirror with
heterogeneous back-end disk.



SnowCanada@gmail.com

2008-01-02, 1:12 pm

JimK is correct. I work with FalconStor, and it is possible to set up
synchronous or asynchronous remote mirrors between sites using
disparate hardware. The mirroring functions are done as a software
service through a gateway server/appliance, and the back-end hardware
has little or no bearing on the functionality. I can't comment on how
NetApp manages to keep the data in sync, but I can regarding FalconStor
for anyone interested. Once the first sync is done, everything after that
only syncs deltas based on changed sectors (not blocks), which is
very bandwidth efficient. A cache area is defined so that
applications do not see the lag between sites, and yet the sites
remain in sync. In the event of a complete communications loss, the
mirror is suspended; once the link is reestablished, it performs a
comparison and fixes the deltas rather than redoing the whole mirror.
For complete data integrity there are also application-aware snapshot
agents that properly quiesce databases for good restore points. Once
this is established, you could perform backups at the central site and
get out of backup handling and backup windows at the remote site(s).
Good DR is actually easier and more affordable than you may think.
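The "compare and fix the deltas" resync can be illustrated with per-block hashes. This is not FalconStor's implementation, just a generic Python sketch of the idea, assuming a hypothetical 4 KiB block size and SHA-256 as the comparison hash; in a real system each site would hash its own copy, and only the hashes plus the differing blocks would cross the WAN:

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical granularity


def split_blocks(data: bytes, size: int):
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_hashes(data: bytes, size: int):
    # Each site hashes its own copy; only these digests need to be exchanged.
    return [hashlib.sha256(b).digest() for b in split_blocks(data, size)]


def resync(primary: bytes, mirror: bytes, size: int = BLOCK_SIZE):
    """Bring mirror up to date; return (new_mirror, indices of blocks sent)."""
    p_hashes = block_hashes(primary, size)
    m_hashes = block_hashes(mirror, size)
    changed = [i for i, h in enumerate(p_hashes)
               if i >= len(m_hashes) or m_hashes[i] != h]
    p_blocks = split_blocks(primary, size)
    blocks = split_blocks(mirror, size)[:len(p_blocks)]  # drop surplus tail
    for i in changed:  # only these blocks cross the link
        if i < len(blocks):
            blocks[i] = p_blocks[i]
        else:
            blocks.append(p_blocks[i])
    return b"".join(blocks), changed


primary = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * 100
mirror = b"A" * BLOCK_SIZE + b"x" * BLOCK_SIZE + b"C" * 100
new_mirror, sent = resync(primary, mirror)
print(sent, new_mirror == primary)  # [1] True
```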




S

2008-01-03, 1:17 am

Wow neat!

Can you use FalconStor to mirror data between two NetApps or, say,
between a NetApp and a Linux box?

I'd be very curious to know how this works.

Thanks.
S


Cydrome Leader

2008-01-03, 1:17 am

S <enjoylife_95135@hotmail.com> wrote:
> I started thinking about this after a conversation with an
> acquaintance who runs a large database driven website. Currently he
> has only one data center, and all writes from all over the world go
> there.
>
> So he's planning 3 things:
> 0. Open up more DCs.
> 1. Some sort of geographic identification of an incoming IP request.
> 2. Redirect that request to a DC that's closest to the client.


Instead of inventing a CDN, just use a real one.

> The unresolved issue is how to keep his data in sync. Some more issues
> to muddy the waters:
> 1. The storage is heterogeneous, so he can't just go with Snapmirror.
> 2. The updates across DCs must happen in near-real-time.


Updates of what? Static content, or something trapped in a database?

> 3. It can't cost an arm and a leg.


Good luck with that part. It won't happen.

> One big advantage is that since everything is DB-driven, the DB could
> tell the mirroring s/w which files to transfer across, which
> eliminates a huge issue.


Or just have the database in one location. Why do you need to split the DB
up all over the place?

> So I was thinking...has anyone played around with rsync to make this
> happen? I'd imagine you'd have to make some serious code changes to
> the client, but it would be really interesting.


You can't rsync a live database. No matter what any hardware gadget vendor
tries to tell you about data replication, it doesn't work that way for
databases.
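For the same reason, a live database is normally copied through the database's own backup or replication interface, not through the file system. SQLite is used here purely as a self-contained stand-in (the thread never says which DB is involved): Python's `sqlite3.Connection.backup()` copies pages under SQLite's own locking, so the result is a consistent snapshot rather than a mix of old and new pages.

```python
import sqlite3


def online_copy(src: sqlite3.Connection, dest: sqlite3.Connection) -> None:
    # The backup API reads pages under SQLite's own locking, so dest ends up
    # with a consistent snapshot; copying the file on disk gives no such promise.
    src.backup(dest)


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, qty INTEGER)")
src.executemany("INSERT INTO orders (qty) VALUES (?)", [(6,), (6,)])
src.commit()

dest = sqlite3.connect(":memory:")
online_copy(src, dest)
print(dest.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 2
```

For MySQL, Oracle, and the like, the equivalents are their own replication and hot-backup tools.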

Just look up the problems people have with database clusters whose nodes
are just feet away from each other. Now add latency, drop the connection
between those machines every now and then, and see how things work out.
This applies to big-boy databases like Oracle, not just toys like
MySQL.

SnowCanada@gmail.com

2008-01-04, 1:14 pm

There may be ways of doing this. FalconStor needs to see iSCSI or FC
(or IB) storage behind it, but it can act as a storage router serving
out over any storage protocol, including acting as a NAS. Even without
FalconStor, if the NetApp box can provide a disk to the Linux box, you
could use the LVM tools provided to mirror a LUN locally. If you need
distant replication, you would need some go-between to efficiently
handle the communications. In that case you could use LVM to mirror to a
local IPStor appliance and replicate to a remote appliance, which may
point to the NetApp if the right protocols are available. (Note: not all
LVMs are created equal, and some may not be able to do this.)
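For the local LVM mirror, the command might look like the following Python helper, which only builds the argument list. The volume group, logical volume, and device names are hypothetical, and the `--type raid1 -m 1` form assumes a reasonably recent LVM2; check `lvconvert(8)` for the version in use:

```python
import shlex


def lvm_mirror_command(vg: str, lv: str, new_pv: str) -> list:
    """Argv that adds a second (RAID1) leg to an existing logical volume.

    new_pv is the physical volume that should hold the new leg, e.g. the
    LUN the NetApp presents to the Linux box (hypothetical device name).
    """
    return ["lvconvert", "--type", "raid1", "-m", "1", f"{vg}/{lv}", new_pv]


cmd = lvm_mirror_command("vg0", "data", "/dev/sdc")
print(shlex.join(cmd))  # lvconvert --type raid1 -m 1 vg0/data /dev/sdc
```

The new device would first have to be initialized with `pvcreate` and added to the volume group with `vgextend`.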

There is also a pretty cool tool called FileSafe that allows periodic
copying of a file/folder, but you may need your own open file manager
tool, as it does not include one for Linux (yet).





Copyright 2003 - 2008 webservertalk.com