Data Storage - NDMP backups I/O bottleneck?


Author NDMP backups I/O bottleneck?
Mike

2007-01-17, 1:18 am

Hello,

Scheme:

10x NetApp filer, 2x 1Tb volumes
ADIC library with 6x LTO-2 drives
Brocade Silkworm 3800 16 port FC Switch
Veritas Netbackup with shared storage option enabled

Backup Server - IBM machine with 2x Intel Xeon and 8Gb memory, running
fc3. There is an external storage with 14x 146Gb Raid5EE U320 10K RPM
drives attached.

Problem:
Volumes on the netapps have a lot of directories and small files. When
I start backup (over NDMP), netbackup mounts the tape and starts a NDMP
session between NetApp and tape drive. It takes forever to complete
that backup.

This is a log from the netapp:
Jan 16 22:59:47 EST [ndmpd:106]: Message NDMP_FH_ADD_DIR sent
Jan 16 22:59:47 EST [ndmpd:106]: Message Header:
Jan 16 22:59:47 EST [ndmpd:106]: Sequence 15997
Jan 16 22:59:47 EST [ndmpd:106]: Timestamp 1169006386
Jan 16 22:59:47 EST [ndmpd:106]: Msgtype 0
Jan 16 22:59:47 EST [ndmpd:106]: Method 1796
Jan 16 22:59:47 EST [ndmpd:106]: ReplySequence 0
Jan 16 22:59:47 EST [ndmpd:106]: Error NDMP_NO_ERR
Jan 16 22:59:47 EST [ndmpd:106]: Number of directories: 4096
Jan 16 22:59:47 EST [ndmpd:106]: Message NDMP_FH_ADD_DIR sent
Jan 16 22:59:47 EST [ndmpd:106]: Message Header:
Jan 16 22:59:47 EST [ndmpd:106]: Sequence 15998
Jan 16 22:59:47 EST [ndmpd:106]: Timestamp 1169006387
Jan 16 22:59:47 EST [ndmpd:106]: Msgtype 0
Jan 16 22:59:47 EST [ndmpd:106]: Method 1796
Jan 16 22:59:47 EST [ndmpd:106]: ReplySequence 0
Jan 16 22:59:47 EST [ndmpd:106]: Error NDMP_NO_ERR
Jan 16 22:59:47 EST [ndmpd:106]: Number of directories: 4096

I'm getting a lot of those messages...
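
To put a number on "a lot", the log lines can be tallied per second. The following is only a minimal sketch: the regular expressions assume that every line follows the same layout as the excerpt above, and the embedded sample is a shortened copy of that excerpt. The function name `tally` is made up for illustration.

```python
import re
from collections import Counter

# Shortened sample copied from the filer log excerpt above.
LOG = """\
Jan 16 22:59:47 EST [ndmpd:106]: Message NDMP_FH_ADD_DIR sent
Jan 16 22:59:47 EST [ndmpd:106]: Number of directories: 4096
Jan 16 22:59:47 EST [ndmpd:106]: Message NDMP_FH_ADD_DIR sent
Jan 16 22:59:47 EST [ndmpd:106]: Number of directories: 4096
"""

# Assumed line layout: "<Mon> <day> <HH:MM:SS> <TZ> [ndmpd:<n>]: <text>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+\[ndmpd:\d+\]:\s+(.*)$")
DIRS_RE = re.compile(r"Number of directories:\s*(\d+)")


def tally(log_text):
    """Return two Counters keyed by timestamp: messages sent, directory entries."""
    messages = Counter()
    entries = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not an ndmpd line in the assumed layout
        stamp, text = match.groups()
        if "NDMP_FH_ADD_DIR sent" in text:
            messages[stamp] += 1
        dirs = DIRS_RE.search(text)
        if dirs:
            entries[stamp] += int(dirs.group(1))
    return messages, entries


messages, entries = tally(LOG)
for stamp in messages:
    print(stamp, messages[stamp], "messages,", entries[stamp], "directory entries")
```

Run against the full log instead of the sample, this shows how many directory entries per second the filer is handing to the backup server for indexing.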

This is tape I/O log from the same netapp..

 CPU   NFS  CIFS  HTTP      Net kB/s      Disk kB/s     Tape kB/s Cache
                           in    out    read  write   read  write   age
 44%  7659     0     0   4381  11951   16155     16      0      0     4
 51%  6027     0     0   4357   9726   20571  12323      0      0     4
 46%  6137     0     0   3098  11679   20951  10438      0      0     4
 78%  5207     0     0   3369  11168   25557   9513      0   1843     4
 93%  6318     0     0   4424   9623   24429  12163      0   3270     4
 46%  5846     0     0   3410   8087   16164  14505      0    516     4
 31%  5225     0     0   1910   7638   11612   6680      0      0     4
 31%  5986     0     0   4416  11591   11960     24      0      0     4
 30%  5746     0     0   4670   9569   12627      0      0      0     4
 34%  7059     0     0   4049  12973    9670      8      0      0     4


As you can see here, I'm getting an average tape write rate of about
563 kB/s. LTO-2 can store data at 30 MB/s (I saw it!!)
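
The 563 figure is the mean of the ten tape-write samples in the filer statistics above. A small sketch of that arithmetic follows; the 30 MB/s target is the rate quoted in the post, and treating 1 MB as 1000 kB is an assumption.

```python
# Tape-write samples (kB/s) copied from the filer statistics above.
tape_write_kbs = [0, 0, 0, 1843, 3270, 516, 0, 0, 0, 0]

# 30 MB/s is the rate quoted in the post; 1 MB = 1000 kB is an assumption.
TARGET_KBS = 30 * 1000

average_kbs = sum(tape_write_kbs) / len(tape_write_kbs)
share = average_kbs / TARGET_KBS

print(f"average tape write: {average_kbs:.1f} kB/s")  # 562.9 kB/s
print(f"share of 30 MB/s:   {share:.1%}")             # 1.9%
```

Note that seven of the ten samples are zero, so the drive is idle most of the time rather than writing slowly all the time.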

Now, top output from the backup server:

[root@backup root]# top

23:00:23 up 7 days, 9:41, 2 users, load average: 7.56, 7.25, 7.15
214 processes: 213 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    4.6%    0.0%    6.5%   0.0%     0.0%   88.4%    0.1%
           cpu00    4.5%    0.0%    5.3%   0.0%     0.0%   90.0%    0.0%
           cpu01    8.9%    0.0%   13.5%   0.0%     0.0%   77.4%    0.0%
           cpu02    3.5%    0.0%    2.5%   0.0%     0.0%   93.0%    0.7%
           cpu03    1.3%    0.0%    4.5%   0.3%     0.1%   93.4%    0.0%

It's clearly an I/O bottleneck here. The NetApp sends an NDMP packet
with directories, and since the backup is running in indexed mode (with
indexes off I can't restore by file), netbackup parses it and stores the
information in the DB on that external storage. While that data is
being stored, netbackup doesn't send out any confirmation packets to
the netapp.

So, how do I get rid of that I/O bottleneck?


Thanks!

Faeandar

2007-01-18, 1:17 am

On 16 Jan 2007 20:08:35 -0800, "Mike" <mike.belov@gmail.com> wrote:

>Hello,
>
>Scheme:
>
>10x NetApp filer, 2x 1Tb volumes
>ADIC library with 6x LTO-2 drives
>Brocade Silkworm 3800 16 port FC Switch
>Veritas Netbackup with shared storage option enabled
>
>Backup Server - IBM machine with 2x Intel Xeon and 8Gb memory, running
>fc3. There is an external storage with 14x 146Gb Raid5EE U320 10K RPM
>drives attached.
>
>Problem:
>Volumes on the netapps have a lot of directories and small files. When
>I start backup (over NDMP), netbackup mounts the tape and starts a NDMP
>session between NetApp and tape drive. It takes forever to complete
>that backup.
>
>This is a log from the netapp:
>Jan 16 22:59:47 EST [ndmpd:106]: Message NDMP_FH_ADD_DIR sent
>Jan 16 22:59:47 EST [ndmpd:106]: Message Header:
>Jan 16 22:59:47 EST [ndmpd:106]: Sequence 15997
>Jan 16 22:59:47 EST [ndmpd:106]: Timestamp 1169006386
>Jan 16 22:59:47 EST [ndmpd:106]: Msgtype 0
>Jan 16 22:59:47 EST [ndmpd:106]: Method 1796
>Jan 16 22:59:47 EST [ndmpd:106]: ReplySequence 0
>Jan 16 22:59:47 EST [ndmpd:106]: Error NDMP_NO_ERR
>Jan 16 22:59:47 EST [ndmpd:106]: Number of directories: 4096
>Jan 16 22:59:47 EST [ndmpd:106]: Message NDMP_FH_ADD_DIR sent
>Jan 16 22:59:47 EST [ndmpd:106]: Message Header:
>Jan 16 22:59:47 EST [ndmpd:106]: Sequence 15998
>Jan 16 22:59:47 EST [ndmpd:106]: Timestamp 1169006387
>Jan 16 22:59:47 EST [ndmpd:106]: Msgtype 0
>Jan 16 22:59:47 EST [ndmpd:106]: Method 1796
>Jan 16 22:59:47 EST [ndmpd:106]: ReplySequence 0
>Jan 16 22:59:47 EST [ndmpd:106]: Error NDMP_NO_ERR
>Jan 16 22:59:47 EST [ndmpd:106]: Number of directories: 4096


How long does it take for the backup to complete? And what is the
average transfer rate once it gets to phase 4?

What I think you're seeing is the same issue every file system has
when you try to do a sequential dump of a large number of files and/or
directories. There is a lot of metadata that the server has to map
before it can start sending it full stream to tape.

My guess is once you get to phase 4 dump you will see very good tape
stream performance.

>
>I'm getting a lot of those messages...
>
>This is tape I/O log from the same netapp..
>
>  CPU   NFS  CIFS  HTTP      Net kB/s      Disk kB/s     Tape kB/s Cache
>                            in    out    read  write   read  write   age
>  44%  7659     0     0   4381  11951   16155     16      0      0     4
>  51%  6027     0     0   4357   9726   20571  12323      0      0     4
>  46%  6137     0     0   3098  11679   20951  10438      0      0     4
>  78%  5207     0     0   3369  11168   25557   9513      0   1843     4
>  93%  6318     0     0   4424   9623   24429  12163      0   3270     4
>  46%  5846     0     0   3410   8087   16164  14505      0    516     4
>  31%  5225     0     0   1910   7638   11612   6680      0      0     4
>  31%  5986     0     0   4416  11591   11960     24      0      0     4
>  30%  5746     0     0   4670   9569   12627      0      0      0     4
>  34%  7059     0     0   4049  12973    9670      8      0      0     4


Capture this information at phase 4.

>
>
>As you can see here, I'm getting an average tape write rate of about
>563 kB/s. LTO-2 can store data at 30 MB/s (I saw it!!)
>
>Now, top output from the backup server:
>
>[root@backup root]# top
>
> 23:00:23 up 7 days, 9:41, 2 users, load average: 7.56, 7.25, 7.15
>214 processes: 213 sleeping, 1 running, 0 zombie, 0 stopped
>CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
>           total    4.6%    0.0%    6.5%   0.0%     0.0%   88.4%    0.1%
>           cpu00    4.5%    0.0%    5.3%   0.0%     0.0%   90.0%    0.0%
>           cpu01    8.9%    0.0%   13.5%   0.0%     0.0%   77.4%    0.0%
>           cpu02    3.5%    0.0%    2.5%   0.0%     0.0%   93.0%    0.7%
>           cpu03    1.3%    0.0%    4.5%   0.3%     0.1%   93.4%    0.0%


All this is expected and normal when dealing with phases 1 through 3
of a dump.

>
>It's clearly an I/O bottleneck here. The NetApp sends an NDMP packet
>with directories, and since the backup is running in indexed mode (with
>indexes off I can't restore by file), netbackup parses it and stores the
>information in the DB on that external storage. While that data is
>being stored, netbackup doesn't send out any confirmation packets to
>the netapp.
>
>So, how do I get rid of that I/O bottleneck?


Have fewer files and directories per volume. Or use snapmirror to
tape. Or skip tape altogether and go with something like Avamar or
Data Domain.

This issue has plagued people for decades and is not specific to
NetApp and certainly not NDMP.

NDMP is simply the command protocol, not a transfer or backup
protocol. NDMP is simply keeping a connection open so the filer can
tell the NDMP client (backup server) when it's ready to start sending
data to tape via standard *nix dump command. Plus all the metadata
bits of course.

~F

Moojit

2007-01-25, 1:14 pm

If your server is running Windows, try using datamover to determine how well
the server-to-storage interface performs. You'll need a demo license to use
the advanced dialog features, which is probably what you want to do.
Please be careful: datamover can talk to logical or physical LUNs.
Physical device I/O will destroy your data, so please use logical only.

Download from www.moojit.net




Raju Mahala

2007-01-25, 1:14 pm


Have you tried a storage pool between the backup server and tape? I am
not sure it works in netbackup. We have the same issue: lots of files
on almost all netapp volumes, and the average file size is very small.
We use Tivoli Storage Manager, so we configured LAN-based backup, which
I feel is better when there are lots of small files. The backup server
first backs up the files to a disk storage pool and, once the backup has
completed, moves them from the storage pool to tape offline, so the
primary netapp isn't kept busy all the time.
Please check whether a disk storage pool is possible in netbackup.

Regards,
Raju



Curtis Preston

2007-01-25, 1:14 pm

A lot of things can slow down an NDMP backup.

First, make sure you're using a locally attached tape drive, not
three-way NDMP (where backups are sent via IP to another system).
Three-way will work, but not if you want any performance. ;)

Second, the layout of the volume can really affect your performance. If
your volume consists of only a few spindles, or if they're all from the
same disk shelf, and so on, you can end up with a volume that won't
supply enough data to make that LTO-2 tape drive happy.

Third, the content of the filesystem really affects performance as well.
Small files are your enemy. The more files you have per GB, the slower
your performance will be.

Finally, if your NetApp is sending only, say, 15 MB/s, that isn't going
to stream that LTO-2. The LTO-2 ends up spending all its time
backhitching and shoe-shining, and your 15 MB/s turns into only a few MB/s.
This means that the tape drive becomes the next bottleneck.

So, do what you can to lay out the volume well. As to the files, they
are what they are. Not much you can probably do there. Finally, if you
want tape performance to exactly match what's coming out of the NetApp,
consider a virtual tape library. NDMP and VTLs are a match made in
heaven. You can give every filer as many virtual tape drives as it
needs to do its local backup, without actually purchasing dozens of tape
drives.
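
The "files per GB" point above can be measured before a backup run. What follows is a rough sketch, not part of any backup product: it assumes the volume is mounted on a host (for example over NFS) where Python can walk it, the function name `files_per_gb` is made up for illustration, and the demo at the end uses a throwaway temporary directory instead of a real volume.

```python
import os
import tempfile


def files_per_gb(root):
    """Walk `root` and return (file_count, total_bytes, files_per_decimal_GB)."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total += size
            count += 1
    gigabytes = total / 1e9  # decimal gigabytes
    if gigabytes > 0:
        density = count / gigabytes
    else:
        density = float("inf") if count else 0.0
    return count, total, density


# Demo on a temporary directory holding three 1000-byte files.
with tempfile.TemporaryDirectory() as demo_root:
    for i in range(3):
        with open(os.path.join(demo_root, f"file{i}.dat"), "wb") as handle:
            handle.write(b"x" * 1000)
    count, total, density = files_per_gb(demo_root)
    print(count, "files,", total, "bytes,", round(density), "files per GB")
```

Walking a volume with millions of small files is itself slow, for the same metadata reasons discussed in this thread, so it may be worth sampling a few representative directories rather than the whole volume.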

---
W. Curtis Preston
Author of O'Reilly's Backup & Recovery and Using SANs and NAS
VP Data Protection
GlassHouse Technologies




Copyright 2003 - 2008 webservertalk.com