Data Storage - Very High Rate Continuous Transfer


Author Very High Rate Continuous Transfer
jim_nospam_beasley@yahoo.com

2005-07-16, 5:46 pm

I am looking into what it will take to support continuous (not burst)
160 to 200 MBytes/sec transfer to disk. What types of drives and how
would they be configured as an array (or multiple arrays)?

What type of processor and bus architecture would be appropriate? Data
will be from a capture memory that is shared between the processor and
the capture electronics. So memory bandwidth will be at least 320 to
400 MBytes/sec.

Thanks in advance,
Jim

Bill Todd

2005-07-16, 5:46 pm

jim_nospam_beasley@yahoo.com wrote:
> I am looking into what it will take to support continuous (not burst)
> 160 to 200 MBytes/sec transfer to disk. What types of drives and how
> would they be configured as an array (or multiple arrays)?
>
> What type of processor and bus architecture would be appropriate? Data
> will be from a capture memory that is shared between the processor and
> the capture electronics. So memory bandwidth will be at least 320 to
> 400 MBytes/sec.


Modern x86 processor/memory configurations should be able to handle that
kind of memory bandwidth without even starting to break a sweat - if the
capture electronics can use standard DMA mechanisms to provide the data.

But in that case you'd need a double-speed, double-width (64/66) PCI bus
at a minimum to handle the bi-directional bandwidth unless you used a
chipset that bypassed the PCI for disk activity (I'm not even sure
they're available: if they only support IDE, you'd have to split the
load across two IDE ports - and where are you going to find an IDE RAID
box?) or worked around the problem by using the AGP port for the input
or output stream, in which case you'd still need double-width or
double-speed PCI but probably not both. In other words, PCI-X or
PCI-express might be a better option (and today probably much more
available than double-speed, double-width PCI).
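The bus arithmetic behind this can be sketched as below. It assumes theoretical peak rates (33.33/66.67 MHz clocks) and that, with everything on one PCI bus, each captured byte crosses it twice (capture card to memory, then memory to the disk adapter). Sustained PCI throughput is well below these peaks, so treat the percentages as optimistic.

```python
# Theoretical peak bandwidth of parallel PCI variants versus the load of a
# capture stream that crosses the same bus `crossings` times.
# Peak figures only; sustained PCI throughput is noticeably lower in practice.

def pci_peak_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak rate = bytes per transfer x transfers per microsecond (MB/s)."""
    return width_bits / 8 * clock_mhz


def shared_bus_load_mb_s(disk_rate_mb_s: float, crossings: int = 2) -> float:
    """Bus load when every byte crosses the bus `crossings` times."""
    return disk_rate_mb_s * crossings


BUSES = {
    "32-bit/33 MHz": (32, 33.33),
    "64-bit/33 MHz": (64, 33.33),
    "32-bit/66 MHz": (32, 66.67),
    "64-bit/66 MHz": (64, 66.67),
}

if __name__ == "__main__":
    for crossings in (2, 1):  # 1 = one stream moved off PCI (e.g. onto AGP)
        load = shared_bus_load_mb_s(200, crossings)
        for name, (width, clock) in BUSES.items():
            peak = pci_peak_mb_s(width, clock)
            print(f"{crossings} crossing(s), {name}: {load:.0f} of {peak:.0f} MB/s "
                  f"({load / peak:.0%})")
```

With both streams on the bus (400 MB/s at the 200 MB/s end of the requirement), only 64-bit/66 MHz stays under its peak. With one stream moved off PCI (200 MB/s), either 64-bit/33 MHz or 32-bit/66 MHz suffices on paper, which is the "double-width or double-speed but probably not both" case.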

RAID-3 sounds like what you want for disk storage (assuming that you
want some redundancy in it - though I'm not sure where you'd find a
non-redundant implementation comparable in performance to a good RAID-3
box even if redundancy weren't necessary). I saw a single 1 Gbit/s
fibre channel link deliver almost 90 MB/s of streaming write bandwidth
to an 8 + 1 RAID-3 array back in 1998, so 2 Gbit/s fibre channel with at
most 6 + 1 of today's disks - which should offer 30 - 40 MB/s/disk
streaming bandwidth at an absolute minimum - should do the job with a
single array (if not, perhaps the application can easily distribute the
streaming data across two arrays or use software RAID-0 to do so).
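The sizing argument, as a sketch: a 6 + 1 RAID-3 array at the 30 to 40 MB/s per-disk floor, fronted by a single 2 Gbit/s link. FC_LINK_MB_S = 200 is an assumed nominal payload rate for 2 Gbit/s Fibre Channel, not a measured one.

```python
# Sizing check for a 6 + 1 RAID-3 array behind one 2 Gbit/s FC link.
# FC_LINK_MB_S is an assumed nominal payload rate, not a measured figure.

FC_LINK_MB_S = 200.0


def raid3_write_mb_s(data_disks: int, per_disk_mb_s: float) -> float:
    """Streaming writes stripe across all data disks; the parity disk adds none."""
    return data_disks * per_disk_mb_s


def delivered_mb_s(data_disks: int, per_disk_mb_s: float,
                   link_mb_s: float = FC_LINK_MB_S) -> float:
    """The slower of the array and the link sets the deliverable rate."""
    return min(raid3_write_mb_s(data_disks, per_disk_mb_s), link_mb_s)


if __name__ == "__main__":
    for per_disk in (30.0, 40.0):
        print(f"{per_disk:.0f} MB/s per disk -> {delivered_mb_s(6, per_disk):.0f} MB/s")
```

At 30 MB/s per disk the array gives 180 MB/s, enough for the 160 MB/s end of the requirement but not for 200 MB/s, which is where a second array or software RAID-0 would come in. At 40 MB/s per disk the 240 MB/s array is capped by the link at about 200 MB/s.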

Probably someone with more recent and detailed experience can answer
this question better, but at least that's a start.

- bill
_firstname_@lr_dot_los-gatos_dot_ca.us

2005-07-17, 2:45 am

In article <6PidnW5L1Ywq_0TfRVn-2w@metrocastcablevision.com>,
Bill Todd <billtodd@metrocast.net> wrote:
>jim_nospam_beasley@yahoo.com wrote:
>
>Modern x86 processor/memory configurations should be able to handle that
>kind of memory bandwidth without even starting to break a sweat - if the
>capture electronics can use standard DMA mechanisms to provide the data.


Agree. Although you have to be careful with memory <-> CPU bandwidth.
If you need to do multiple passes over the data (for example, copy it
between buffers, or run CRCs or TCP checksums over it), the memory
bandwidth could become an issue.
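A rough way to see how extra passes add up, under stated assumptions: DMA in and DMA out each touch every byte once, a read-only CPU pass such as a CRC adds one more read, and a buffer-to-buffer copy adds a read and a write. Caches are ignored, so this is a back-of-envelope estimate, not a measurement.

```python
def memory_traffic_mb_s(stream_mb_s, cpu_passes=0, copies=0):
    """Rough DRAM traffic for a capture stream under simple assumptions.

    - capture DMA writes each byte once and disk DMA reads it once (2x);
    - each read-only CPU pass, e.g. a CRC, reads every byte again (+1x);
    - each buffer-to-buffer copy reads and writes every byte (+2x).
    Cache effects are ignored.
    """
    return stream_mb_s * (2 + cpu_passes + 2 * copies)


if __name__ == "__main__":
    for passes, copies in ((0, 0), (1, 0), (1, 1)):
        print(f"passes={passes} copies={copies}: "
              f"{memory_traffic_mb_s(200, passes, copies)} MB/s")
```

With no CPU passes, 160 to 200 MB/s gives the 320 to 400 MB/s from the original question; one copy plus one checksum pass at 200 MB/s raises that to 1000 MB/s.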

>But in that case you'd need a double-speed, double-width (64/66) PCI bus
>at a minimum to handle the bi-directional bandwidth unless you used a
>chipset that bypassed the PCI for disk activity (I'm not even sure
>they're available: if they only support IDE, you'd have to split the
>load across two IDE ports - and where are you going to find an IDE RAID
>box?) or worked around the problem by using the AGP port for the input
>or output stream, in which case you'd still need double-width or
>double-speed PCI but probably not both.


You can get motherboards with multiple PCI channels. Somewhere in my
lab, I have a few 2-U rackmounts with 3 open PCI busses (on 3 slots);
if I remember right the Ethernet and SCSI/RAID chips on the
motherboard are on a separate bus.

Don't know whether such motherboards are sold in the white-box market;
you may have to buy a server-class x86 box from one of the big vendors
to get a system like this.

Also, AFAIK the AGP bus is a superset of the PCI bus, just on a
different connector. If you are custom-building your electronics, you
might want to use the AGP connector.

> In other words, PCI-X or
>PCI-express might be a better option (and today probably much more
>available than double-speed, double-width PCI).


One other issue: For the outgoing link, I would split the bandwidth
over two separate fibre channel ports. With 2Gbit FC, each port can
theoretically handle >200 MB/sec, so each would be loaded <50%, which
will make the whole thing run much more smoothly. You could either
stripe the data yourself (if you control the data-writing software),
or, for example, use an LVM on the host to stripe the data over the
two FC ports. Whether to also stripe it over two disk arrays depends
on what disk array you buy. As 2-port 2Gbit FC cards are easily
available, this does not require extra PCI slots.
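A sketch of the split: chunks assigned round-robin to two ports (the port names "fc0"/"fc1" are hypothetical), plus the resulting per-port load. PORT_MB_S = 200 is an assumed nominal 2 Gbit/s payload rate; at exactly that figure, 200 MB/s total lands at 50% per port, and the ">200 MB/sec" per-port capacity is what puts it under half.

```python
from itertools import cycle

PORT_MB_S = 200.0  # assumed nominal payload rate of one 2 Gbit/s FC port


def stripe_round_robin(chunks, ports):
    """Assign chunks to ports in turn; return the byte count sent to each port."""
    sent = {port: 0 for port in ports}
    for port, chunk in zip(cycle(ports), chunks):
        sent[port] += len(chunk)
    return sent


def port_utilization(total_mb_s, n_ports, port_mb_s=PORT_MB_S):
    """Fraction of each port's nominal rate used when traffic is split evenly."""
    return total_mb_s / n_ports / port_mb_s


if __name__ == "__main__":
    print(stripe_round_robin([b"x" * 1024] * 10, ["fc0", "fc1"]))
    print(f"{port_utilization(160, 2):.0%} to {port_utilization(200, 2):.0%} per port")
```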

For an inexpensive solution (no redundancy, do-it-yourself), get a PCI
SCSI controller with two U320 ports. Connect a few (maybe a half
dozen) 10K RPM SCSI disks to each port. This might be quite easy: Buy
a rackmount JBOD (a.k.a. disk tray) with two SCSI ports; make sure you
get a model with a splittable SCSI backplane (two half-backplanes,
each with about a half dozen slots, instead of one long backplane with
two SCSI connectors and with a dozen SCSI slots). Then use custom
software or an LVM to stripe the data across the disks. Let's look at
the numbers: 10K RPM SCSI drives can write data at about 50 MB/sec
each; the reason I picked a half dozen drives per SCSI port is to
match disk bandwidth with SCSI bandwidth. This hardware configuration
can theoretically handle about 600 MB/sec, so it should have no
problem running day in, day out at 1/3 of that. Problem is: No
redundancy, and you have to roll your own striping.
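The same numbers in code, as a sketch: each U320 port delivers the lesser of its disks' combined rate and the 320 MB/s bus rate. The 50 MB/s per drive is the estimate from the post, not a datasheet value.

```python
U320_MB_S = 320.0   # Ultra320 SCSI: 320 MB/s per bus, theoretical
DISK_MB_S = 50.0    # the post's estimate for one 10K RPM SCSI drive


def jbod_mb_s(ports, disks_per_port, disk_mb_s=DISK_MB_S, bus_mb_s=U320_MB_S):
    """Each port delivers the lesser of its disks' total rate and the bus rate."""
    return ports * min(disks_per_port * disk_mb_s, bus_mb_s)


if __name__ == "__main__":
    capacity = jbod_mb_s(2, 6)
    print(f"{capacity:.0f} MB/s theoretical; 200 MB/s is {200 / capacity:.0%} of it")
```

Six drives per port (300 MB/s) sit just under the bus limit, giving about 600 MB/s for two ports, with 200 MB/s at a third of that. Adding a seventh drive per port buys little, because 350 MB/s of disk bandwidth would already exceed the 320 MB/s bus.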

>RAID-3 sounds like what you want for disk storage (assuming that you
>want some redundancy in it - though I'm not sure where you'd find a
>non-redundant implementation comparable in performance to a good RAID-3
>box even if redundancy weren't necessary). I saw a single 1 Gbit/s
>fibre channel link deliver almost 90 MB/s of streaming write bandwidth
>to an 8 + 1 RAID-3 array back in 1998, so 2 Gbit/s fibre channel with at
>most 6 + 1 of today's disks - which should offer 30 - 40 MB/s/disk
>streaming bandwidth at an absolute minimum - should do the job with a
>single array (if not, perhaps the application can easily distribute the
>streaming data across two arrays or use software RAID-0 to do so).


To some extent I agree. RAID-3 has traditionally been used for large
streaming IO (examples: multimedia, supercomputing). On the other
hand, the bandwidth required here is so small that any modern
mid-range disk array could do it, in any RAID-level. So if disk cost
is an issue and you don't need redundancy (meaning you are willing to
tolerate loss of data, and downtime), you might want to try RAID-0.
If disk costs are irrelevant, and you want the best possible
redundancy, you could try RAID-1 (in effect implemented as RAID-10).
Or RAID-5 might work better than RAID-3: few people use RAID-3
today, while RAID-5 is everywhere, so it is quite possible that RAID-5
implementations have been more carefully tested and tuned, and are quite fast.

Another warning: If you think RAID will give you redundancy, and at
the same time you are relying on maxing out the performance of the
disk array, you are cheating yourself. What I mean is this: If you
buy a disk array that can barely handle 200 MB/sec (meaning your
configuration is cost optimized), then it will not handle that
bandwidth in degraded mode (with a dead disk). RAID redundancy is in
some sense about preserving your data; while running with a dead disk,
the speed will be quite low. If your data is lost for good when you
can't write it to disk, you'll have to significantly overdesign your
RAID arrays to make sure they can handle the traffic even in degraded
mode. This should be less of an issue for RAID-1 (which is easier to
write to in degraded mode) than for the parity-based RAIDs.
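One way to put "overdesign" into numbers, as a hedged sketch: size the array so that its healthy throughput, scaled by a degraded-mode factor, still meets the requirement. degraded_fraction is a hypothetical parameter that has to be measured on the actual array, and the function assumes a single-parity layout.

```python
import math


def disks_for_degraded_mode(required_mb_s, per_disk_mb_s, degraded_fraction,
                            parity_disks=1):
    """Total disks for a single-parity array whose throughput, scaled by
    degraded_fraction, still meets required_mb_s.

    degraded_fraction is a hypothetical ratio (degraded / healthy throughput)
    that must be measured on the real array; it is not a known constant.
    """
    healthy_mb_s = required_mb_s / degraded_fraction   # overdesign target
    data_disks = math.ceil(healthy_mb_s / per_disk_mb_s)
    return data_disks + parity_disks


if __name__ == "__main__":
    for fraction in (1.0, 0.75, 0.5):
        print(fraction, disks_for_degraded_mode(200, 40, fraction))
```

With 40 MB/s drives and 200 MB/s required, a factor of 1.0 gives 5 + 1 disks, while an assumed factor of 0.5 pushes that to 10 + 1.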

Happy experimenting!

--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr_dot_los-gatos_dot_ca.us
Bill Todd

2005-07-17, 2:45 am

_firstname_@lr_dot_los-gatos_dot_ca.us wrote:

....

> Another warning: If you think RAID will give you redundancy, and at
> the same time you are relying on maxing out the performance of the
> disk array, you are cheating yourself. What I mean is this: If you
> buy a disk array that can barely handle 200 MB/sec (meaning your
> configuration is cost optimized), then it will not handle that
> bandwidth in degraded mode (with a dead disk). RAID redundancy is in
> some sense about preserving your data; while running with a dead disk,
> the speed will be quite low. If your data is lost for good when you
> can't write it to disk, you'll have to significantly overdesign your
> RAID arrays to make sure they can handle the traffic even in degraded
> mode.


While I generally agree with the other points you made, my distinct
impression is that good RAID-3 implementations (unlike, say, RAID-4, -5,
or -6) suffer no noticeable performance degradation (for either reads or
writes) while running with a dead disk. That was one of the reasons I
suggested it.

- bill
_firstname_@lr_dot_los-gatos_dot_ca.us

2005-07-18, 2:46 am

In article <OJWdnYuY47nSaUTfRVn-rA@metrocastcablevision.com>,
Bill Todd <billtodd@metrocast.net> wrote:
>_firstname_@lr_dot_los-gatos_dot_ca.us wrote:
>
>...
>
>
>While I generally agree with the other points you made, my distinct
>impression is that good RAID-3 implementations (unlike, say, RAID-4, -5,
>or -6) suffer no noticeable performance degradation (for either reads or
>writes) while running with a dead disk. That was one of the reasons I
>suggested it.


I agree, and correction gladly accepted. With one minor fly in the
ointment: While the disk is dead, things should run just fine. As
soon as you put a spare disk in, the array might try to rebuild onto
the new disk; that rebuild will compete with the foreground workload.
With good systems management, this could be circumvented: once a LUN
has a dead disk, slowly drain the data from it, then get the disk
array to accept a spare disk without rebuilding (for example, by
destroying the LUN and recreating it using the spare). Doing all this
on a live system without having to shut the software stack down might
be tricky.

Compared to all the other questions the original poster should think
about (SCSI or FC? Commercial disk arrays or JBODs? PCI/AGP and
memory bandwidth? Redundancy or not? Stripe by hand, by LVM, or not
at all? and many others) the selection of RAID level is actually a
minor point.

--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr_dot_los-gatos_dot_ca.us
Maxim S. Shatskih

2005-07-18, 2:46 am

> Also, AFAIK the AGP bus is a superset of the PCI bus, just on a
> different connector. If you are custom-building your electronics, you
> might want to use the AGP connector.


AGP is obsolete, being replaced by PCI-X in all newer mobos, even for home
gaming platforms.

AGP is strongly 3D-oriented, while PCI-X is general-purpose.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com


Zak

2005-07-18, 7:46 am

Maxim S. Shatskih wrote:

> AGP is obsolete, being replaced by PCI-X in all newer mobos, even for home
> gaming platforms.


PCI Express, that is. PCI-X is 100/133 MHz PCI in servers; PCI Express
is a serial version.


Thomas
jim_nospam_beasley@yahoo.com

2005-07-19, 2:46 am

I appreciate the advice. I am absorbing it as fast as I can.

I'll be using this as a real-time recording device, and data can be
transferred to more reliable long term memory after the recording
procedure is completed, which may be several hours later.

I am inclined to think that RAID-0 for performance, or RAID-5 for some
redundancy, is the way to go. I am also considering 2.5" vs 3.5"
drives, and a review at Tom's Hardware suggests an investment in 2.5"
will reduce my noise and power by a lot. Also, I expect the equivalent
array of 2.5" drives to be cooler, lighter and inherently more rugged,
which I need for this application.

I will need at least 600 MB of hard drive storage.

I may be willing to reduce the hard drive continuous bandwidth by half
(at the expense of losing a main feature). All other requirements
remain the same.

Now I will need to identify RAID controllers and a processor board. I
would prefer a 3U form factor.

I'll need about 10 GB of fast memory, and will DMA data to it from an
input device. I expect this to be pricey, so no need to beat me about
that.

I am going about this blindly at the moment, but any suggestions will
be greatly appreciated.

You are all very helpful, and I appreciate it very much.

Jim

Stephen Maudsley

2005-07-19, 7:46 am


<jim_nospam_beasley@yahoo.com> wrote in message
news:1121528669.995816.282650@o13g2000cwo.googlegroups.com...
> I am looking into what it will take to support continuous (not burst)
> 160 to 200 MBytes/sec transfer to disk. What types of drives and how
> would they be configured as an array (or multiple arrays)?
>
> What type of processor and bus architecture would be appropriate? Data
> will be from a capture memory that is shared between the processor and
> the capture electronics. So memory bandwidth will be at least 320 to
> 400 MBytes/sec.


Might be worth looking for articles and papers from CERN - they've been
doing this sort of stuff for years and used to publish papers on the
computing architectures.


carmelomcc

2005-07-19, 7:46 am

You need to just go get a CX700... It will handle the I/O you are
talking about even with a bad disk. Just make sure you have a global
hot spare for every shelf. As long as the data is not random you will
be fine with RAID 5 on that array.

_firstname_@lr_dot_los-gatos_dot_ca.us

2005-07-19, 5:47 pm

In article <ASSY525711F330@assayer.co.uk>,
Stephen Maudsley <news2@sjmaudsley.fsnet.co.uk> wrote:
>
><jim_nospam_beasley@yahoo.com> wrote in message
>news:1121528669.995816.282650@o13g2000cwo.googlegroups.com...
>
>Might be worth looking for articles and papers from CERN - they've been
>doing this sort of stuff for years and used to publish papers on the
>computing architectures.


Being a retired high-energy physicist (and former CERN collaborator)
myself ...

Yes, it would be a good idea to start there, and read their stuff. A
good starting point is to look for the web presence of the "CERN
OpenLab", and read what is posted there.

But the original poster's situation and CERN are in different leagues.
I've recently seen a ~1 GByte/sec test running at CERN, sustained for
a whole week. But it required O(100) computers, massive networking
gear, and many hundred disk drives, with some of the finer software
and hardware products from industry thrown into the mix. It also
consumed all told probably a dozen people (both from CERN and from
industry) for a year to set up, and the hardware cost should be
measured in units of M$.

The other thing to remember is that to CERN, the data storage problem
(even though it is massive) is a small part of their overall mission.
Anyone who spends ~10 billion $ on building an accelerator, about the
same on the physics experiments, and a few billion $ a year on
operation and support, has a strong incentive to build a reliable and
fast data storage system, because loss of data would have huge
economic costs.

I very much doubt that the original poster's system will reach this
scale; still, stealing some good ideas there is a good plan.

Another thing to remember from the CERN experience: Just because the
system can do a certain speed (say 400 MB/sec) once, doesn't mean at
all that it can do so sustained. Things go wrong all the time
(guaranteed to happen in a large system, which typically even involves
a few humans, who are about as unreliable as disk drives, and nobody
has invented RAID for sys admins yet). The real test is not to do 400
MB/sec for 10 seconds, but to do so sustained 24x7 for a month. This is
much, much harder.

--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy _firstname_@lr_dot_los-gatos_dot_ca.us
jim_nospam_beasley@yahoo.com

2005-07-23, 5:46 pm

Because of the nature of the data I am saving, I think I can simplify
quite a bit. I'll still need to figure out what hardware I need to
support this, but here is the direction I am heading:

Most of the time the data will be in blocks that are about 1 ms in
sample time duration. At 160 MBps, that's only 160KB per block. With a
set of 7 or 8 separate physical volumes, I will write data blocks into
separate files, sequentially writing in the next physical drive for
each block. I can do this under software control.
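The placement scheme described here amounts to application-level striping. A minimal sketch, assuming one pre-chosen file per volume (the paths are hypothetical) opened once for the whole recording, and POSIX os.pwrite for positioned writes: block i lands on volume i mod N, in the next free slot there. BLOCK_SIZE uses decimal units, matching the 160 KB figure.

```python
import os

# ~1 ms of samples at 160 MB/s, in decimal units as in the post.
BLOCK_SIZE = 160 * 1000


def place_block(index: int, n_volumes: int) -> tuple[int, int]:
    """Round-robin striping: block i goes to volume i mod N, next free slot there."""
    volume = index % n_volumes
    offset = (index // n_volumes) * BLOCK_SIZE
    return volume, offset


def write_blocks(paths, blocks):
    """Write blocks round-robin into one file per volume (POSIX: os.pwrite).

    Each file is opened once and written at computed offsets, so no
    per-block open/close is needed. `paths` are hypothetical per-volume files.
    """
    fds = [os.open(p, os.O_WRONLY | os.O_CREAT, 0o644) for p in paths]
    try:
        for i, block in enumerate(blocks):
            volume, offset = place_block(i, len(fds))
            os.pwrite(fds[volume], block, offset)
    finally:
        for fd in fds:
            os.close(fd)
```

Reading the data back is the same mapping in reverse: block i is at place_block(i, N) on the corresponding volume.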

(I am not worried about redundancy at the moment. This data will be
stored for only a few hours before it is transferred to a server, where
data security can be addressed.)

I read a report at Tom's Hardware that shows the worst-case write
bandwidth for a 2.5" Toshiba MK1032GAX 100 GB drive asymptotically goes
to about 27 MBps as the drive becomes full. (Can someone verify my
understanding of this? It's here:
http://www.tomshardware.com/storage..._transfer_graph
)
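Whether 27 MB/s is enough depends on how the total rate divides across drives. A small sketch of that check, taking 27 MB/s as the worst case from the cited review (not independently verified):

```python
def per_drive_requirement(total_mb_s, n_drives, worst_case_mb_s):
    """Return (MB/s each drive must sustain, worst-case rate / that requirement).

    A ratio above 1.0 means even the slowest zone (drive nearly full) keeps
    up; the ratio minus 1 is the headroom left for seeks and other hiccups.
    """
    need = total_mb_s / n_drives
    return need, worst_case_mb_s / need


if __name__ == "__main__":
    for total, drives in ((160, 8), (160, 7), (200, 8), (200, 7)):
        need, ratio = per_drive_requirement(total, drives, 27)
        print(f"{total} MB/s over {drives} drives: {need:.1f} MB/s each, ratio {ratio:.2f}")
```

At 160 MB/s over 8 drives each drive needs 20 MB/s, leaving about 35% headroom; over 7 drives it needs about 22.9 MB/s (roughly 18% headroom). At 200 MB/s over 7 drives the requirement, about 28.6 MB/s, exceeds the worst case.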

Overhead
--------
I am not sure what the processor overhead will be to open and close
files while doing this. The alternative is to stream the blocks into
larger files, which only changes the data read process.

With this process, I can probably rely on disk cache to absorb most
remaining delays (like seek time).

Drive reliability
-----------------
I am wondering if I can stream data into these blocks continuously
without buffering it in main memory. As I said in an earlier post, I
want to continuously capture into memory and decide when to offload the
last 10 GB of recorded data. But I am now thinking I can stream this
data directly to the disks (with a significant savings in main memory)
and overwrite that which I don't want to keep. How hard is this on the
drives, if I do this continuously for 12 hours straight, or for 24/7?

Drive Controller
----------------
With the solution outline given above, I will need a controller,
preferably in a 3U format (cPCI). Like I said, it will support 7 to 8
independent physical drives, at a minimum. Does anyone have a
suggestion?


Regards,
Jim

Bill Todd

2005-07-23, 5:46 pm

jim_nospam_beasley@yahoo.com wrote:
> Because of the nature of the data I am saving, I think I can simplify
> quite a bit. I'll still need to figure out what hardware I need to
> support this, but here is the direction I am heading:
>
> Most of the time the data will be in blocks that are about 1 ms in
> sample time duration. At 160 MBps, that's only 160KB per block. With a
> set of 7 or 8 separate physical volumes, I will write data blocks into
> separate files, sequentially writing in the next physical drive for
> each block. I can do this under software control.


It would be even easier using a single file spread across the disks
under RAID-0 software control (you're effectively talking above about
recreating RAID-0 in your application).

>
> (I am not worried about redundancy at the moment. This data will be
> stored for only a few hours before it is transferred to a server, where
> data security can be addressed.)


Hmmm. 3 hrs. x 3600 sec/hr. x 160 MB/sec = 1.728 TB - considerably more
space than you'll have using 7 or 8 100 GB drives even if you manage it
optimally.
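The capacity arithmetic, together with the inverse question of how long a given set of drives lasts (decimal units throughout):

```python
SECONDS_PER_HOUR = 3600


def capacity_needed_tb(rate_mb_s, hours):
    """Storage for `hours` of recording at `rate_mb_s` (1 TB = 1e6 MB)."""
    return rate_mb_s * hours * SECONDS_PER_HOUR / 1e6


def hours_that_fit(capacity_gb, rate_mb_s):
    """Recording time a given capacity holds (1 GB = 1000 MB)."""
    return capacity_gb * 1000 / rate_mb_s / SECONDS_PER_HOUR


if __name__ == "__main__":
    print(f"3 h at 160 MB/s: {capacity_needed_tb(160, 3):.3f} TB")
    for drives in (7, 8):
        print(f"{drives} x 100 GB: {hours_that_fit(drives * 100, 160):.2f} h")
```

Eight 100 GB drives hold about 1.4 hours at 160 MB/s, and seven hold about 1.2 hours.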

>
> I read a report at Tom's Hardware that shows the worst-case write
> bandwidth for a 2.5" Toshiba MK1032GAX 100 GB drive asymptotically goes
> to about 27 MBps as the drive becomes full. (Can someone verify my
> understanding of this? It's here:
> http://www.tomshardware.com/storage..._transfer_graph
> )


The number sounds reasonable, but you should still leave a bit of margin
just in case (especially using a non-RAID-3 array where the disks won't
be synchronized with each other, though your application may tend to be
self-synchronizing). Of course, you should check the manufacturer's
spec sheet too.

>
> Overhead
> --------
> I am not sure what the processor overhead will be to open and close
> files while doing this.


You almost certainly don't want to be opening and closing files at all
frequently: that could start to screw up your data rate to disk (even
if the relevant file data is usually cached, it often gets updated on
close). For that matter, you'll want to suppress any frequent on-disk
updates to things like the file's last-accessed and last-modified times,
reuse existing file space rather than allocate new space to avoid
on-disk allocation update activity and suppress end-of-file-mark
updates, etc.
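A minimal sketch of the "reuse existing space" idea, which also fits the plan of overwriting data that isn't kept: a fixed-size file allocated once by writing zeros, kept open, and then overwritten in place slot by slot as a ring. The class name, slot size and slot count are hypothetical; os.pwrite makes it POSIX-only, and suppressing timestamp updates is left to the file system or mount options, which this code does not touch.

```python
import os


class DiskRing:
    """Fixed-size on-disk ring of equal slots, overwritten in place (POSIX sketch).

    The file is fully written once with zeros so that later writes reuse
    already-allocated space instead of extending the file.
    """

    def __init__(self, path, slot_size, n_slots):
        self.slot_size = slot_size
        self.n_slots = n_slots
        self.next_slot = 0
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        zeros = bytes(slot_size)
        for i in range(n_slots):            # one-time preallocation
            os.pwrite(self.fd, zeros, i * slot_size)
        os.fsync(self.fd)

    def append(self, block):
        """Write one slot; once the ring is full this overwrites the oldest slot."""
        if len(block) != self.slot_size:
            raise ValueError("block must be exactly one slot long")
        os.pwrite(self.fd, block, self.next_slot * self.slot_size)
        self.next_slot = (self.next_slot + 1) % self.n_slots

    def close(self):
        os.close(self.fd)
```

Once recording stops, the slot at next_slot is the oldest surviving block, so the kept window can be read out in order starting there.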

> The alternative is to stream the blocks into
> larger files, which only changes the data read process.
>
> With this process, I can probably rely on disk cache to absorb most
> remaining delays (like seek time).


Quite possibly not at the data rates you're talking about.

>
> Drive reliability
> -----------------
> I am wondering if I can stream data into these blocks continuously
> without buffering it in main memory.


Probably not - see previous comment. Besides, if you don't go through
main memory you'd be completely bypassing the file system and writing
driver code. But using asynchronous multi-buffering you can stay within
the realm of normal application behavior without needing much memory.
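A minimal sketch of multi-buffering, using a writer thread and a bounded queue in place of operating-system asynchronous I/O: the capture side blocks only when all n_buffers are in flight, so a few buffers, rather than a large in-memory history, absorb disk latency hiccups. The function name and parameters are hypothetical, and errors raised in the writer thread are not propagated back to the caller.

```python
import queue
import threading


def start_multibuffered_writer(fileobj, n_buffers=4):
    """Start a writer thread fed through a bounded queue of n_buffers blocks.

    Returns (submit, finish): submit(block) hands a block to the writer and
    blocks only when n_buffers blocks are already queued; finish() flushes
    the queue and waits for the writer thread to exit.
    """
    q = queue.Queue(maxsize=n_buffers)

    def drain():
        while True:
            block = q.get()
            if block is None:        # sentinel: no more data
                break
            fileobj.write(block)

    writer = threading.Thread(target=drain)
    writer.start()

    def finish():
        q.put(None)
        writer.join()

    return q.put, finish
```

A single consumer reading a FIFO queue keeps blocks in submission order, so the file contents match the capture order.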

> As I said in an earlier post, I
> want to continuously capture into memory and decide when to offload the
> last 10 GB of recorded data. But I am now thinking I can stream this
> data directly to the disks (with a significant savings in main memory)
> and overwrite that which I don't want to keep. How hard is this on the
> drives, if I do this continuously for 12 hours straight, or for 24/7?
>
> Drive Controller
> ----------------
> With the solution outline given above, I will need a controller,
> preferably in a 3U format (cPCI). Like I said, it will support 7 to 8
> independent physical drives, at a minimum. Does anyone have a
> suggestion?


If you're as cost-conscious as you appear to be, consider 3.5" SATA
drives - which will give you the temporary storage space you need and
comparable or better bandwidth in numbers that should fit in a 3U
enclosure. 3Ware makes controllers which may handle the bandwidth when
used as a simple JBOD (I've heard varying reports of their capabilities
at the higher RAID levels).

- bill

Copyright 2003 - 2008 webservertalk.com