Storage compression (was: Cost of storage calculator?)

    Storage compression (was: Cost of storage calculator?)  
_firstname_@lr_dot_los-gatos_dot_ca.us




 
04-28-06 06:12 PM

In article <1146241178.602187.47380@u72g2000cwu.googlegroups.com>,
<lacka_dacka@yahoo.com> wrote:
>That's correct, Faeandar... it can compress existing data as well...
>
>I agree that its implementation doesn't fully match marketing yet...
>but I guess my question is... *if* the product were to match marketing
>and be able to introduce minimal latency and at the same time compress
>at least about 60%, would this be a type of technology to invest in?

Storage compression is fun.  It is quite easy if the data is only
written and never modified or read.  Except for the fact that it
requires quite a bit of CPU power, which has to come from somewhere:
either extra CPU boxes have to be introduced in the storage stack
(which cost money and add complexity and unreliability), or existing
CPUs in the disk arrays / NAS boxes have to be used (which slows down
the storage systems), or the compression runs like a device driver or
loadable file system on the application server (where compression uses
expensive CPU cycles on a machine that was bought to run the
customer's application, not to massage the customer's data).

Reading the compressed data sequentially from the beginning is
typically easy.  Reading it randomly can be hard if decompression is
implemented carelessly.  Reading little blocks in the middle can be
very hard if decompression relies on processing the whole stream
sequentially in large chunks.
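
A minimal sketch of one way to keep random reads cheap, assuming the data is cut into fixed-size chunks that are each compressed on their own and located through a small index.  The 64 KB chunk size and the names compress_chunked and read_range are illustrative assumptions, not taken from any product discussed in this thread.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # assumed chunk size; smaller chunks give cheaper random reads but worse ratios


def compress_chunked(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Compress each fixed-size chunk independently and return (blob, index).

    index[i] is the (offset, length) of compressed chunk i inside blob.
    """
    parts, index, offset = [], [], 0
    for start in range(0, len(data), chunk_size):
        comp = zlib.compress(data[start:start + chunk_size])
        index.append((offset, len(comp)))
        parts.append(comp)
        offset += len(comp)
    return b"".join(parts), index


def read_range(blob: bytes, index, pos: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Read `length` bytes at logical offset `pos`, decompressing only the chunks touched."""
    if length <= 0:
        return b""
    first = pos // chunk_size
    last = (pos + length - 1) // chunk_size
    out = []
    for i in range(first, min(last, len(index) - 1) + 1):
        off, clen = index[i]
        out.append(zlib.decompress(blob[off:off + clen]))
    joined = b"".join(out)
    skip = pos - first * chunk_size  # bytes to drop from the start of the first chunk
    return joined[skip:skip + length]


# Hypothetical test data: 100,000 fixed-width records, about 1.6 MB uncompressed.
data = b"".join(b"record %08d\n" % i for i in range(100_000))
blob, index = compress_chunked(data)
middle = read_range(blob, index, pos=700_000, length=32)
```

The price is a worse compression ratio than compressing the whole stream at once, because similarities that span chunk boundaries are no longer found.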

What can be catastrophic is modifying (overwriting) the data in place.
First off, many compression algorithms rely on finding similarities
within the data stream, and modifying the data disrupts them, so the
new data is typically larger (compresses less).  When that happens,
the storage system has to virtualize the new data and store it
somewhere other than the hole it left in the file.  If you do that for
a while, the original file layout becomes completely chaotic, both
reading and writing speeds go to hell, and the metadata overhead and
complexity of the stored files become a big mess.  Furthermore, it is
very difficult (but not impossible) to
implement a storage system that can move blocks of data around within
a file, and is correct and doesn't lose or corrupt data, even in the
face of system failures.  Example: What if the compression system is
in the middle of updating the file to indicate (typically in some
metadata) that one block had to be moved to the end because it is less
compressible, and then the power fails, and this complex multi-phase
update is only partially recorded on disk?  There are ways around this
(which typically involve logging, hardware NVRAM, and very careful
ordering of operations), but those require serious thought and great
care in the implementation.
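
The growth problem is easy to reproduce.  The sketch below is only an illustration under assumed sizes (a 64 KB block, a 4 KB overwrite) with made-up record contents: it compresses a highly repetitive block, overwrites a small region with less compressible bytes, and checks whether the recompressed block still fits in the space it originally occupied.

```python
import random
import zlib

BLOCK = 64 * 1024  # assumed uncompressed block size

# A highly repetitive block, standing in for well-compressible application data.
original = (b"customer=ACME;status=OK;" * (BLOCK // 24 + 1))[:BLOCK]
slot_size = len(zlib.compress(original))  # space the block occupies once compressed

# Overwrite 4 KB in the middle with pseudo-random (poorly compressible) bytes.
rng = random.Random(42)  # fixed seed so the run is repeatable
patch = bytes(rng.getrandbits(8) for _ in range(4096))
modified = original[:8192] + patch + original[8192 + 4096:]

new_size = len(zlib.compress(modified))
fits_in_place = new_size <= slot_size

print(f"uncompressed size unchanged: {len(modified) == len(original)}")
print(f"compressed before: {slot_size} bytes, after: {new_size} bytes")
print(f"fits in the original slot: {fits_in_place}")
```

The uncompressed size does not change, yet the compressed block no longer fits where it used to live, so it has to be relocated and the metadata updated, which is exactly where the crash-consistency problem described above starts.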

One more hair in the soup: Some data doesn't compress very well.
Examples include images (for example in JPEG format), documents (in
PDF format, which is often internally compressed), backups (which are
often compressed by the backup software), and archival data such as
mail archives (which are usually compressed by the archiving
software).  If you are running interestingly complex ILM software, you
probably already have more compression going on in the software stack,
and then adding one more compression layer won't help much.
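
A quick way to see the effect, sketched with Python's zlib on made-up text-like data (the word list and sizes are arbitrary assumptions): compress once, then compress the already-compressed output again and compare sizes.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
words = [b"storage", b"compression", b"backup", b"archive",
         b"filer", b"disk", b"tape", b"block"]
text = b" ".join(rng.choice(words) for _ in range(50_000))

once = zlib.compress(text)   # first layer, e.g. done by the archiving software
twice = zlib.compress(once)  # second layer, e.g. done again in the storage stack

print(f"raw: {len(text)}  compressed once: {len(once)}  compressed twice: {len(twice)}")
```

The first pass removes most of the redundancy, so the second pass has very little left to work with and mostly burns CPU.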

One technique that is closely related to compression is duplicate
elimination: Don't store copies of files (or blocks or mail messages)
if the content is identical.  This really helps with backups of
desktop workstations (because every machine has a copy of the MS Excel
DLLs, which are mostly identical), and sometimes helps with mail
archiving (because the same 5MB spreadsheet attachment is forwarded 100
times within the same mail system, meaning that 100 copies of it are
in the mail archive).  But again, be warned: some ILM software already
contains such duplicate elimination, so doing it again in the software
stack can be pointless and wasteful.
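
A toy sketch of duplicate elimination, assuming whole-object granularity and SHA-256 as the content fingerprint.  The DedupStore class and the mailbox names are hypothetical and only serve to illustrate the idea; real products may work at block level and handle hash collisions, deletion, and persistence, none of which is shown here.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self.blobs = {}  # digest -> content, stored once per distinct content
        self.names = {}  # logical name -> digest

    def put(self, name: str, content: bytes) -> None:
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)  # keep only the first copy
        self.names[name] = digest

    def get(self, name: str) -> bytes:
        return self.blobs[self.names[name]]

    def logical_bytes(self) -> int:
        """Size as the users see it: every name counts in full."""
        return sum(len(self.blobs[d]) for d in self.names.values())

    def physical_bytes(self) -> int:
        """Size actually stored: each distinct content counts once."""
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
attachment = b"x" * (5 * 1024 * 1024)  # stand-in for the 5MB spreadsheet
for i in range(100):
    store.put(f"mailbox{i}/budget.xls", attachment)

print(f"logical: {store.logical_bytes()} bytes, physical: {store.physical_bytes()} bytes")
```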

>I see all the arguments that I need to cut down on the cost of
>management of the data... and believe me, we are doing everything we
>can to do that using products from Archivus etc...

This is really the place where compression can shine: data that is
written once, never modified, and not read all that often.  Examples
include backup, reference data, and compliance archives.  But the
above warnings still apply: compression is not a panacea.

> But in addition, we
>do spend a lot of money on NetApp filers that these kind of products
>seem to be able to help with...  Not to mention, since I started this
>thread, in doing more research into our environment, I am amazed to see
>how much money we are paying on energy bills alone with all the storage
>equipment we have...

Absolutely true.  Here is my new rule of thumb: For every $1 you spend
on storage systems, you will spend another $1 on energy and
infrastructure costs (that includes air conditioning and floorspace
for it) over the lifetime of the hardware, and anywhere between $3 and
$15 on system administration and management (a good fraction of which
goes into avoiding, planning for, and managing failures of the storage
system).  And if you buy a tape drive for $1, you can easily spend
anywhere between $10 and $100 on the blank tapes required to operate
it.

Also remember that management overhead doesn't just scale with the
size of the storage system in GB, but also with the complexity of the
storage system.  A NetApp with 80 disks is only a little harder to
administer than a NetApp with 60 disks.  But a NetApp with a separate
compression system installed is a lot harder to administer than just a
NetApp.  It might be much cheaper to throw a few dozen disks at the
problem than to have another moving part in an already complex system.

From this point of view, an investment of $0.40 in compression
hardware/software that makes your storage 30% more space efficient,
but increases the management overhead (for example because it reduces
reliability by 20%), may be very foolish:

Before:
$1 for storage system
$1 for energy/cooling/floorspace
$10 for management
=> $12 lifetime cost

After:
$0.70 for storage system
$0.70 for energy/cooling/floorspace
$0.40 for compression system
$12 for management
=> $13.80 lifetime cost
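
The same arithmetic as a small sketch, so other numbers can be plugged in.  The function names are made up for illustration, and the model is just the rule of thumb above (energy and infrastructure cost roughly equal to the hardware cost), not a real cost calculator.

```python
def lifetime_cost(storage, management, extras=0.0, energy_ratio=1.0):
    """Lifetime cost per the rule of thumb: energy/cooling/floorspace scales with storage spend."""
    energy = storage * energy_ratio
    return storage + energy + extras + management


def break_even_management(baseline_total, storage, extras=0.0, energy_ratio=1.0):
    """Largest management cost at which the new setup still matches the baseline total."""
    return baseline_total - lifetime_cost(storage, 0.0, extras, energy_ratio)


before = lifetime_cost(storage=1.00, management=10.0)
after = lifetime_cost(storage=0.70, management=12.0, extras=0.40)
limit = break_even_management(before, storage=0.70, extras=0.40)

print(f"before: ${before:.2f}  after: ${after:.2f}  break-even management: ${limit:.2f}")
```

With these figures the break-even management cost is about $10.20, so compression only pays off if it raises management cost by less than roughly 2%.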

--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy      _firstname_@lr_dot_los-gatos_dot_ca.us








    Re: Storage compression (was: Cost of storage calculator?)  
lacka_dacka@yahoo.com




 
04-28-06 06:12 PM

Thanks Ralph... With the StoreWiz product they claim that management
overhead is nil (I doubt this, but let's just say it's true), then would
such a technology (even if not from StoreWiz) be appropriate to invest
in?

I can attest to the fact that thus far, I've had no issue with
StoreWiz's reliability... it actually isn't taking me too much time to
manage it in my dev lab... almost none at all...

/l









    Re: Storage compression (was: Cost of storage calculator?)  
_firstname_@lr_dot_los-gatos_dot_ca.us




 
04-29-06 12:12 AM

In article <1146248345.436036.252850@u72g2000cwu.googlegroups.com>,
<lacka_dacka@yahoo.com> wrote:
>Thanks Ralph... With the StoreWiz product they claim that management
>overhead is nil (I doubt this, but let's just say it's true), then would
>such a technology (even if not from StoreWiz) be appropriate to invest
>in?

So it is installed by elves, without any downtime by using Hermione's
time-turner, configuration updates are done by angels while you sleep
(my computer never sleeps, by the way, too many chocolate-covered
espresso-beans in its youth), it never fails, and if it fails anyhow,
the Storewiz field engineering people arrive an hour before the
failure to hot-swap a spare in (they have the second sight)?

Give me a break    <- intentional pun.

>I can attest to the fact that thus far, I've had no issue with
>StoreWiz's reliability... it actually isn't taking me too much time to
>manage it in my dev lab... almost none at all...

OK, now manage it in a production setting, say in the data center of a
bank or insurance company (think Chief Privacy Officer, think SEC
regulation, think Sarbanes-Oxley).  First step: Have your computer
security people do a full audit that the compression gadget is secure
enough.  That includes checking key distribution, TCP/IP ports,
password management, integration into the LDAP authentication
infrastructure, and auditing all software upgrades.

I'll stop right there, because one of your computer security experts
(who run $120K per year in salary alone) just spent a week on this new
gadget.  We haven't even started using the device yet, because the
computer security people won't allow you to uncrate it until they have
given it their blessing, and written the 30-page policy manual for how
you shall manage it from a security point of view.

I'm exaggerating here, but not very much.

--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy      _firstname_@lr_dot_los-gatos_dot_ca.us








    Re: Storage compression (was: Cost of storage calculator?)  
lacka_dacka@yahoo.com




 
04-29-06 12:12 AM

I understand the points you are raising... they are very valid in our
environment... But those are true of any product we introduce, Ralph...
All new products we introduce into production go through quite
similar rigorous scrutiny and yes, they do cost a lot (LOT) of time...
But it applies horizontally for all products... so we don't even factor
that in anymore when comparing different products...  But once set up, it
seems to be able to work silently...  I do have a lot of issues with
its ability to work on non-textual (non-office) types of data, but I
want to see where this industry will go... and if I should invest in
the technology now (assuming they fix all their nuances)...

/l






All times are GMT.