Red Hat Kernel - FC SCSI Tape Read Speed Issue







Author FC SCSI Tape Read Speed Issue
Jeff Mulliken

2004-01-23, 7:28 pm

Greetings Kernel Gurus,

We do a lot of performance testing of LTO tape drives on Linux (as well as
other flavors of UNIX), and we seem to have stumbled across a strange
behavior that someone in the group may be able to shed some light on.

For the purpose of this discussion, I can only document that the behavior
in question exists on RedHat 8.0, running kernel 2.4.18-14, with a
QLogic 23XX series HBA and v. 6.0.4 drivers (might be 6.0.1).

Write performance seems to be consistent, and reasonably fast, across a
variety of block sizes. It is the read performance that raises the old
eyebrow, so to speak. What we are seeing, through the use of a Finisar
FibreChannel Analyzer, is that if we use a blocksize that is off the 'power
of 2' boundary, e.g. 60K, then read commands happen at a rate of 5 per
second, sending throughput into the toilet.
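
A quick way to classify a block size before testing it, sketched in Python. This is only an illustration: the function names are made up for this example, and the 4096-byte page size is an assumption that should be checked on the machine under test.

```python
def is_power_of_two(n: int) -> bool:
    """True when n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def classify_block_size(size: int, page_size: int = 4096) -> dict:
    """Describe a block size; page_size=4096 is an assumed value."""
    return {
        "bytes": size,
        "power_of_two": is_power_of_two(size),
        "page_multiple": size % page_size == 0,
    }


if __name__ == "__main__":
    for size in (32768, 32772, 60 * 1024, 65536):
        print(classify_block_size(size))
```

With a 4K page, 60K is a whole number of pages but not a power of two, while 32772 is neither, so the two cases may be worth testing separately.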

Anyone have any thoughts on this? We'd sure be interested to hear them.
Feel free to email me with any response, at
'jeffreyDOTmullikenATadicDOTcom'.

Here's a simple test that you can try to see if your results are the
same as ours.

you@yourhost > dd if=/dev/zero of=/dev/st0 bs=32772 count=1000

you@yourhost > dd if=/dev/st0 of=/dev/null bs=32772 count=1000
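
To compare read rates across block sizes without an analyzer, the read half of the test could be timed with something like the Python sketch below. It is a rough illustration, not a replacement for dd: `time_reads` is a hypothetical helper, and it assumes the device can be read like an ordinary file with one read() call per block.

```python
import sys
import time


def time_reads(path: str, block_size: int, count: int) -> dict:
    """Read up to `count` blocks of `block_size` bytes and time them."""
    reads = 0
    total = 0
    # buffering=0 gives one read() system call per f.read() call.
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for _ in range(count):
            data = f.read(block_size)
            if not data:  # nothing more to read
                break
            reads += 1
            total += len(data)
        elapsed = time.perf_counter() - start
    return {
        "reads": reads,
        "bytes": total,
        "seconds": elapsed,
        "reads_per_sec": reads / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Usage: python time_reads.py /dev/st0 32772 1000
    path, bs, count = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])
    print(time_reads(path, bs, count))
```

Running it once with bs=32768 and once with bs=32772 against the same tape should show whether the reads-per-second figure drops the way the analyzer trace suggests.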



Thanks for your time and consideration.

Jeff Mulliken


J.O. Aho

2004-01-23, 7:28 pm

Jeff Mulliken wrote:
quote:

> Greetings Kernel Gurus,
>
> We do a lot of performance testing of LTO tape drives on Linux (as well as
> other flavors of UNIX), and we seem to have stumbled across a strange
> behavior that someone in the group may be able to shed some light on.
>
> For the purpose of this discussion, I can only document that the behavior
> in question exists on RedHat 8.0, running kernel 2.4.18-14, with a
> QLogic 23XX series HBA and v. 6.0.4 drivers (might be 6.0.1).
>
> Write performance seems to be consistent, and reasonably fast, across a
> variety of block sizes. It is the read performance that raises the old
> eyebrow, so to speak. What we are seeing, through the use of a Finisar
> FibreChannel Analyzer, is that if we use a blocksize that is off the 'power
> of 2' boundary, e.g. 60K, then read commands happen at a rate of 5 per
> second, sending throughput into the toilet.



No guru here, but RedHat has released some bad kernels for RH8/RH9, mostly
with buggy drivers. Which drivers, I don't know, but this could be the reason
the performance is so bad. I suggest you upgrade to the latest kernel released
by RedHat.

Another funny thing is that the RedHat kernel patches in some cases make
performance worse, so trying out a vanilla kernel could be good too.


//Aho

Scott Lurndal

2004-01-23, 7:28 pm

"Jeff Mulliken" <mullikenNOSPAM@attbi.com> writes:
quote:

>Greetings Kernel Gurus,


quote:

>
> Write performance seems to be consistent, and reasonably fast, across a
>variety of block sizes. It is the read performance that raises the old
>eyebrow, so to speak. What we are seeing, through the use of a Finisar
>FibreChannel Analyzer, is that if we use a blocksize that is off the 'power
>of 2' boundary, e.g. 60K, then read commands happen at a rate of 5 per
>second, sending throughput into the toilet.



One thing to be aware of is that with Linux, unlike traditional Unices,
there is no concept of a 'raw character device'. What that means is that
I/O operations to the tape drive go through the kernel filesystem cache
(resulting in evictions, etc). Also, physical transfers to a tape drive
must come from physical pages, and the buffer must start on a page boundary.
That means that the data you read has to be read into a temporary kernel
buffer (I believe the st.o module has a 64k-byte buffer per drive by default,
but the size of that buffer can be made larger), then moved into the buffer
(file) cache, and finally moved again into the program address space
(for dd(1) or whatever application is reading the tape).

With traditional Unix, when you access a raw character device, the data is read
and written directly into the program virtual address space without any
intermediate buffering by the kernel - this requires the application to be
aware of alignment requirements for buffer addresses and sizes (which
GNU dd wasn't, last I checked). I have patches somewhere that align the
buffers for GNU dd. (The kernel must take care to lock down the physical
pages which map the program virtual pages containing the buffer for the duration
of the I/O request, and all traditional Unices handle this well).
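
As a rough illustration of what "aligning the buffers" means on the application side, here is a Python sketch that reads into a page-aligned buffer. It is not the GNU dd patch mentioned above, and `aligned_buffer`, `buffer_address`, and `read_block` are hypothetical names. It relies on anonymous mmap regions starting on a page boundary.

```python
import ctypes
import mmap


def aligned_buffer(size: int) -> mmap.mmap:
    """Return an anonymous mapping of at least `size` bytes.

    Anonymous mmap regions start on a page boundary, which is the
    alignment a raw device would expect from the caller.
    """
    pages = -(-size // mmap.PAGESIZE)  # round up to whole pages
    return mmap.mmap(-1, pages * mmap.PAGESIZE)


def buffer_address(buf: mmap.mmap) -> int:
    """Address of the first byte of the mapping (for checking alignment)."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


def read_block(f, buf: mmap.mmap, block_size: int) -> int:
    """Read one block directly into the aligned buffer; returns bytes read."""
    view = memoryview(buf)[:block_size]
    try:
        return f.readinto(view)
    finally:
        view.release()
```

Here `f` is assumed to be a file opened with `open(path, "rb", buffering=0)`, so that each `read_block` call maps to a single read() into the aligned memory. Whether this changes anything on a given kernel depends on how the driver buffers the transfer.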

You may want to try the silly '/dev/raw' stuff that Stephen Tweedie put into
Linux - it does allow you to bypass the buffer/file cache, but you must
explicitly bind your physical device (/dev/st0) to a raw device using,
IIRC, the raw command. I think they still double-buffer, but avoid the
file/buffer cache (the double-buffering means the application does not
need to align buffers on page boundaries or lock down physical pages
itself).

Once upon a time, SGI made real Unix-style raw SCSI devices available as a
patch to the Linux kernel. Check <http://oss.sgi.com/>. They probably haven't
been ported to a 2.4 series kernel, however, much less 2.6.

scott
quote:

>
> Anyone have any thoughts on this? We'd sure be interested to hear them.
>Feel free to email me with any response, at
> 'jeffreyDOTmullikenATadicDOTcom'.
>
> Here's a simple test that you can try to see if your results are the
>same as ours.
>
>you@yourhost > dd if=/dev/zero of=/dev/st0 bs=32772 count=1000
>
>you@yourhost > dd if=/dev/st0 of=/dev/null bs=32772 count=1000
>
>
>
> Thanks for your time and consideration.... Jeff
>Mulliken
>
>

