Data Storage - ILM and Full Text Search


ron.lindman@gmail.com

2007-01-30, 7:15 pm

Hello,

I'm looking into various ILM products such as those from Kazeon, EMC,
NeoPath, etc. One question that comes up is how these products behave
when a client does a full-text search against a volume that contains
data that's been migrated away.

From what I understand, a file access causes many of these products to
bring the file back from a secondary tier. I know that some ILM APIs
allow for redirection, which would seemingly avoid this issue.
However, others do not have redirection. Wouldn't this mean that a
full-text search causes the entire set of data to be brought back onto
the primary tier? Doesn't this cause capacity issues?

What am I missing? Your help is greatly appreciated.

Thanks,
Ron

Nik Simpson

2007-01-30, 7:15 pm

ron.lindman@gmail.com wrote:
> Hello,
>
> I'm looking into various ILM products such as those from Kazeon, EMC,
> NeoPath, etc. One question that comes up is how these products behave
> when a client does a full-text search against a volume that contains
> data that's been migrated away.
>
> From what I understand, a file access causes many of these products to
> bring the file back from a secondary tier. I know that some ILM API's
> allow for redirection, which would seemingly avoid this issue.
> However, others do not have redirection. Wouldn't this mean that a
> full-text search causes the entire set of data to be brought back onto
> the primary tier? Doesn't this cause capacity issues?
>
> What am I missing? Your help is greatly appreciated.


Typically, a content search is performed against a content index, not
against the original file, so the search doesn't touch the file at all.
The file is read during the indexing process; if that occurs before
migration, then the file will not be hit after migration.
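
To illustrate why an indexed search does not touch the files themselves, here is a minimal Python sketch of an inverted index. It is not how any of the products in this thread are implemented; the function names and sample paths are made up for illustration, and a real indexer would also handle tokenization, file formats, and updates.

```python
from collections import defaultdict


def build_index(documents):
    """Build an inverted index: word -> set of paths containing it.

    `documents` maps a file path to its text. The text is read once, at
    indexing time (ideally before the file is migrated to another tier).
    """
    index = defaultdict(set)
    for path, text in documents.items():
        for word in text.lower().split():
            index[word].add(path)
    return index


def search(index, word):
    """Answer a query from the index alone; no file is opened here."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical file contents captured at indexing time.
docs = {
    "/vol/projects/plan.txt": "storage migration plan",
    "/vol/projects/notes.txt": "meeting notes on storage tiers",
}
idx = build_index(docs)
print(search(idx, "storage"))
```

Once `idx` exists, queries are answered from memory, so it makes no difference to the search whether the underlying files still live on the primary tier.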


PS. If you are looking at this space, you should also take a look at Scentric
(FTR, I work for Scentric, well, at least for another ten days :-)

--
Nik Simpson

dvymiller@yahoo.com

2007-02-01, 1:13 pm

On Jan 30, 2:47 pm, Nik Simpson <n_simp...@bellsouth.net> wrote:
> ron.lind...@gmail.com wrote:
>
>
>
>
> Typically, a content search is performed against a content index, not
> against the original file, so the search doesn't touch the file at all.
> The file is read during the indexing process, if that occurs before
> migration then the file will not be hit after migration.
>
> PS. If you looking at this space you should also take a look at Scentric
> (FTR I work for Scentric, well at least for another ten days :-)
>
> --
> Nik Simpson


What happens when someone opens Windows file explorer and performs a
search through its search tool? Won't it try to read all the files
off the NAS and, to the OP's point, won't it cause all the files to be
moved from tier II to tier I again?

Dvy

Nik Simpson

2007-02-01, 7:13 pm

dvymiller@yahoo.com wrote:
>
>
> What happens when someone opens Windows file explorer and performs a
> search through it's search tool? Wont it try and read all the files
> off of the NAS and to the OPs point, wont it cause all the files to be
> moved from tier II to tier I again?
>


On XP, I think the answer would be yes, unless the ILM solution is smart
and recognizes the type of access as something it should not
migrate for. In Vista (or if you are using something like Google Desktop
Search, which maintains an index), this should not be such a big problem.
--
Nik Simpson

bcwalrus

2007-02-01, 7:13 pm

On Jan 30, 2:01 pm, "ron.lind...@gmail.com" <ron.lind...@gmail.com>
wrote:
> Hello,
>
> I'm looking into various ILM products such as those from Kazeon, EMC,
> NeoPath, etc. One question that comes up is how these products behave
> when a client does a full-text search against a volume that contains
> data that's been migrated away.
>
> From what I understand, a file access causes many of these products to
> bring the file back from a secondary tier. I know that some ILM API's
> allow for redirection, which would seemingly avoid this issue.
> However, others do not have redirection. Wouldn't this mean that a
> full-text search causes the entire set of data to be brought back onto
> the primary tier? Doesn't this cause capacity issues?
>
> What am I missing? Your help is greatly appreciated.
>
> Thanks,
> Ron


Not for the NeoPath FileDirector. They redirect access traffic to the
migration destination. If you access it frequently enough, then
depending on how you set up the placement policy, data may be migrated
back to the primary tier. Or you can set up your policy not to do
that. In other words, data access and data placement policy are
independent.

(I happen to be the NFS guy at NeoPath.)

Cheers,
bc

Faeandar

2007-02-01, 7:13 pm

On 30 Jan 2007 14:01:52 -0800, "ron.lindman@gmail.com"
<ron.lindman@gmail.com> wrote:

>Hello,
>
>I'm looking into various ILM products such as those from Kazeon, EMC,
>NeoPath, etc. One question that comes up is how these products behave
>when a client does a full-text search against a volume that contains
>data that's been migrated away.
>
>From what I understand, a file access causes many of these products to
>bring the file back from a secondary tier. I know that some ILM API's
>allow for redirection, which would seemingly avoid this issue.
>However, others do not have redirection. Wouldn't this mean that a
>full-text search causes the entire set of data to be brought back onto
>the primary tier? Doesn't this cause capacity issues?
>
>What am I missing? Your help is greatly appreciated.
>
>Thanks,
>Ron



So, since we have two people from companies in this space I'd like to
pose the competitive question:

What are your thoughts on Index Engines?

Thanks.

~F

Nik Simpson

2007-02-02, 1:16 am

Faeandar wrote:
>
> So, since we have two people from companies in this space I'd like to
> pose the competitive question:
>
> What are your thoughts on Index Engines?
>



First, right now I would not see Index Engines as a direct competitor;
they are purely a search application and don't offer much in the way of
classification or policy-based data management, which is needed for ILM.

Second, for enterprise-wide search, the problem is that when I'm looking
for document X, I'd rather find it on disk than buried on a backup tape.
If I can't find it online, then I'd go to backup tape. So other than as an
application for helping me keep better track of what I've backed up, I
don't see much of a future for it.

Interesting technology that I suspect will get embedded in things like
VTLs and D2D disk backup appliances. I don't see it as a standalone
technology. Good acquisition candidate for somebody in that space.

--
Nik Simpson

dvymiller@yahoo.com

2007-02-02, 1:16 am

On Feb 1, 5:54 pm, Nik Simpson <n_simp...@bellsouth.net> wrote:
> Faeandar wrote:
>
>
>
> First, right now I would not see Index Engines as a direct competitor,
> they are purely a search application and don't offer much in the way of
> classification or policy-based data management which is needed for ILM.
>
> Second for enterprise wide search the problem is that when I'm looking
> for document X, I'd rather find it on disk than buried on a backup tape.
> If I can't find it online, then I'd go backup tape. So other than as an
> application for helping me keep better track of what I've backed up I
> don't see much of a future for it.
>
> Interesting technology that I suspect will get embedded in things like
> VTLs and D2D disk backup appliances. I don't see it as a standalone
> technology. Good acquisition candidate for somebody in that space.
>
> --
> Nik Simpson


Where does the Google Search Appliance fit into this?

Dvy

Faeandar

2007-02-02, 1:16 am

On Thu, 01 Feb 2007 20:54:20 -0500, Nik Simpson
<n_simpson@bellsouth.net> wrote:

>Faeandar wrote:
>
>
>First, right now I would not see Index Engines as a direct competitor,
>they are purely a search application and don't offer much in the way of
>classification or policy-based data management which is needed for ILM.
>
>Second for enterprise wide search the problem is that when I'm looking
>for document X, I'd rather find it on disk than buried on a backup tape.
>If I can't find it online, then I'd go backup tape. So other than as an
>application for helping me keep better track of what I've backed up I
>don't see much of a future for it.
>
>Interesting technology that I suspect will get embedded in things like
>VTLs and D2D disk backup appliances. I don't see it as a standalone
>technology. Good acquisition candidate for somebody in that space.



They can get metadata directly from NDMP dumps. If someone figures
out how to flag the dump to only pass the metadata, then they will be
able to get an entire storage array's metadata in a matter of hours
instead of the days that file crawlers will take.
Even without the flag they still get data far faster than any file
crawler.

I may have been asking far too open-ended a question. My needs are
fairly simple: tell me what, where, how big, how frequently accessed,
what type of file, etc. I've no need for a deep dive of content.

I'm looking for typical SRM stats, but on a fair scale.

Hopefully this provides more to go on.
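
As a rough illustration of the metadata-only report described above (what, where, how big, what type), here is a small Python sketch that walks a directory tree and totals file counts and sizes per extension. It reads only directory entries and file attributes, never file contents. The function name and the grouping by extension are assumptions made for the example; it says nothing about how any product mentioned in this thread works, and a crawl like this over a network mount would still be bound by per-file round trips.

```python
import os
from collections import defaultdict


def summarize(root):
    """Return {extension: [file_count, total_bytes]} for everything under root."""
    totals = defaultdict(lambda: [0, 0])
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)  # descend later
                elif entry.is_file(follow_symlinks=False):
                    ext = os.path.splitext(entry.name)[1].lower() or "(none)"
                    size = entry.stat(follow_symlinks=False).st_size
                    totals[ext][0] += 1
                    totals[ext][1] += size
    return dict(totals)
```

Calling `summarize("/some/mount/point")` (a placeholder path) returns a dictionary that could be sorted by total bytes to see which file types dominate a volume. Access-frequency reporting would additionally need `st_atime`, which is only meaningful if the file system actually records access times.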

Thanks.

~F

Nik Simpson

2007-02-03, 7:13 am

Faeandar wrote:
>
>
> They can get metadata directly from NDMP dumps. If someone figures
> out how to flag the dump to only pass the metadata then they will be
> able to get an entire storage array's metadata in a matter of hours
> instead of days that file crawlers will take.
> Even without the flag they still get data far faster than any file
> crawler.
>


Yes, they could do that, but then so could every other competitor; NDMP
is available to anybody, not just Index Engines. EMC does something
similar, though probably proprietary, with its classification product,
which gets a "dump" of metadata from Celerra file servers rather than walking
the file system over the network.

> I may have been asking far too open ended a question. My needs are
> fairly simple; tell me what, where, how big, how frequently accessed,
> what type of file, etc. I've no need for a deep dive of content.


Index Engines wouldn't be a solution then, since to the best of my
knowledge it's all about content indexing & search. However, both
Scentric and Kazeon can do what you want without having to generate a
content index.

>
> I'm looking for typical SRM stats, but on a fair scale.


So you don't actually want to take any actions like migrating little-used
stuff to tier 2? Anyway, both Scentric and Kazeon offer extensive
SRM reporting, though if reporting is all you want, you might want to
take a look at Monosphere, which has a pure file SRM solution. How big is
a "fair scale" to you: 10s, 100s, 1000s of TB?

If you do want to take actions, then a policy engine is something you
want to look at. I can't speak for Kazeon's policy engine, but Scentric
lets you build policies with classification rules, for example "find all
OFFICE files larger than 50MB and not accessed in 30 days", which can be
combined with one or more actions that work on the results of the
filter. Actions include move, copy, delete, script, archive with
retention, etc.

You can schedule these policies on a calendar or event trigger (e.g.
once a week, or when the file system has less than 20% free); you can also
trigger them from external scripts.
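
The example rule above ("OFFICE files larger than 50MB and not accessed in 30 days") can be sketched as a simple metadata filter in Python. This is only an illustration of the idea of a classification rule, not Scentric's policy engine; the extension list standing in for "OFFICE files", the function names, and the thresholds-as-constants are all assumptions made for the example.

```python
import os
import time

# Assumed definition of "OFFICE files" for this sketch.
OFFICE_EXTENSIONS = {".doc", ".xls", ".ppt"}
MIN_SIZE_BYTES = 50 * 1024 * 1024        # larger than 50MB
MAX_IDLE_SECONDS = 30 * 24 * 60 * 60     # not accessed in 30 days


def matches_policy(path, size, atime, now):
    """Return True if a file's metadata satisfies the example rule."""
    ext = os.path.splitext(path)[1].lower()
    return (ext in OFFICE_EXTENSIONS
            and size > MIN_SIZE_BYTES
            and now - atime > MAX_IDLE_SECONDS)


def find_candidates(root, now=None):
    """Yield paths under `root` that match the rule, using metadata only."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if matches_policy(path, st.st_size, st.st_atime, now):
                yield path
```

An action (move, copy, delete, and so on) would then be applied to each path that `find_candidates` yields. Note that the rule depends on access times being recorded; on file systems mounted without access-time updates, `st_atime` will not reflect real usage.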



--
Nik Simpson

Faeandar

2007-02-04, 1:14 am

On Sat, 03 Feb 2007 08:06:42 -0500, Nik Simpson
<n_simpson@bellsouth.net> wrote:

>Faeandar wrote:


>
>Yes, they could do that, but then so could every other competitor, NDMP
>is available to anybody, not just Index Engines. EMC does something
>similar, though probably proprietary with it's classification product
>which gets a "dump" of metadata from Celerra file servers rather walking
>the file system over the network.


Any/every other product could but, so far as I've seen, do not. That
one bit is intriguing enough to me to look at them.

>
>
>Index Engines wouldn't be a solution then, since to the best of my
>knowledge it's all about content indexing & search. However, both
>Scentric and Kazeon can do what you want without having to generate a
>content index.


We have Kazeon on eval and so far I can't say I'm impressed. It's
quite slow. Getting data on an entire filer would take many weeks
based on performance tests. It took 4 days to run a single qtree on a
filer.

>
>
>So you don't actually want to take any actions like migrating little
>used stuff to tier2?


That is correct. No automated migrations or anything. I want
information that my staff and I can base decisions on, but our
needs are not simple enough for policy-based file migration.

>Anyway, both Scentric and Kazeon offer extensive
>SRM reporting, though if reporting is all you want, you might want to
>take a look at Monosphere which has a pure file SRM solution. How big is
>a "fair scale" to you, 10s, 100s, 1000s of TB?


I thought Monosphere was more of a trending and analysis tool? Not
file level reporting. We are slated to eval them for a different
purpose but I'll keep them in mind for this as well.
Fair scale would be 100's of TB.

~F

dvymiller@yahoo.com

2007-02-05, 7:13 pm

On Feb 3, 7:44 pm, Faeandar <mr_casta...@yahoo.com> wrote:
> On Sat, 03 Feb 2007 08:06:42 -0500, Nik Simpson
>
> <n_simp...@bellsouth.net> wrote:
>
>
> Any/every other product could but, so far as I've seen, do not. That
> one bit is intriguing enough to me to look at them.
>
>
>
>
>
> We have Kazeon on eval and so far I can't say I'm impressed. It's
> quite slow. Getting data on an entire filer would take many weeks
> based on performance tests. It took 4 days to run a single qtree on a
> filer.


Is this the time for Kazeon to crawl the filer? How much data is on that
filer? And how many files is that data in?
Is it crawling the filer via the FPolicy link or via an NFS link?

>
>
>
>
>
> That is correct. No automated migrations or anything. I want
> information that me and my staff can make decisions based on, but our
> needs are not simple enough for policy based file migration.
>
>
> I thought Monosphere was more of a trending and analysis tool? Not
> file level reporting. We are slated to eval them for a different
> purpose but I'll keep them in mind for this as well.
> Fair scale would be 100's of TB.
>
> ~F



bcwalrus

2007-02-05, 7:13 pm

On Feb 3, 7:44 pm, Faeandar <mr_casta...@yahoo.com> wrote:
<...>

> I thought Monosphere was more of a trending and analysis tool? Not
> file level reporting. We are slated to eval them for a different
> purpose but I'll keep them in mind for this as well.
> Fair scale would be 100's of TB.
>
> ~F


If the number of files is around tens of millions, then this
Fileyzer tool seems to do what you mentioned:
http://neopathnetworks.com/products/fileyzer.aspx
(Trial download)

It's purely for analysis; no data placement. It's light and
fast. The GUI is neat, too.

Cheers,
bc

Faeandar

2007-02-06, 1:14 am

On 5 Feb 2007 15:45:22 -0800, "bcwalrus" <bcwalrus@gmail.com> wrote:

>On Feb 3, 7:44 pm, Faeandar <mr_casta...@yahoo.com> wrote:
><...>
>
>
>If the number of files is around tens of millions, then this
>Fileyzer tool seems to do what you mentioned:
> http://neopathnetworks.com/products/fileyzer.aspx
> (Trial download)
>
>It's purely for analysis; no data placement. It's light and
>fast. The GUI is neat, too.
>
>Cheers,
>bc



I will check this out and post back my findings. Thanks.

~F

Faeandar

2007-02-06, 1:14 am

On 5 Feb 2007 15:45:22 -0800, "bcwalrus" <bcwalrus@gmail.com> wrote:

>On Feb 3, 7:44 pm, Faeandar <mr_casta...@yahoo.com> wrote:
><...>
>
>
>If the number of files is around tens of millions, then this
>Fileyzer tool seems to do what you mentioned:
> http://neopathnetworks.com/products/fileyzer.aspx
> (Trial download)
>
>It's purely for analysis; no data placement. It's light and
>fast. The GUI is neat, too.
>
>Cheers,
>bc


I downloaded it, and two things I noticed make it less than helpful.

1) the trial version won't analyze network drives (That's where the
problems are !!!)

2) it seems to only analyze the C drive on my Windows box and
disregards any selection criteria I give it. It's the same view no
matter what my criteria are.

I may talk to NeoPath and get a full-fledged eval because the concept
is interesting. But this download, though I appreciate the effort and
thought, proved to be useless.

~F

jmcgann@gmail.com

2007-02-06, 1:13 pm

On Feb 1, 8:54 pm, Nik Simpson <n_simp...@bellsouth.net> wrote:
> Faeandar wrote:
>
>
>
> First, right now I would not see Index Engines as a direct competitor,
> they are purely a search application and don't offer much in the way of
> classification or policy-based data management which is needed for ILM.
>
> Second for enterprise wide search the problem is that when I'm looking
> for document X, I'd rather find it on disk than buried on a backup tape.
> If I can't find it online, then I'd go backup tape. So other than as an
> application for helping me keep better track of what I've backed up I
> don't see much of a future for it.
>
> Interesting technology that I suspect will get embedded in things like
> VTLs and D2D disk backup appliances. I don't see it as a standalone
> technology. Good acquisition candidate for somebody in that space.
>
> --
> Nik Simpson


Nik:
I am with Index Engines and want to update your post above. We
initially entered the market with a search capability; however, we have
since added reporting and classification solutions. We are seeing
strong traction in the data classification space, as we are the only
vendor that can provide full knowledge of data at the scale required
for enterprise-wide engagements. Many of the ILM/classification
vendors have used open source indexing solutions, which do not scale
to millions/billions of files and email. We have architected a
purpose-built indexing solution that provides comprehensive insight
into all enterprise data assets. A logical fit for anyone looking
into ILM or classification solutions.

Additionally, we can ingest data from a SAN, LAN or directly from
tape. Our architecture is designed to understand storage protocols -
so plugging us into any of these environments will allow us to ingest
data.

Hope this clarifies how we fit in the market and differentiates us
from the others.

Jim McGann
www.indexengines.com

dvymiller@yahoo.com

2007-02-06, 1:13 pm

On Feb 5, 7:00 pm, Faeandar <mr_casta...@yahoo.com> wrote:
> On 5 Feb 2007 15:45:22 -0800, "bcwalrus" <bcwal...@gmail.com> wrote:
>
>
>
>
>
>
>
>
>
> I downloaded it and two things I notice make it less than helpful.
>
> 1) the trial version won't analyze network drives (That's where the
> problems are !!!)
>
> 2) it seems to only analyze the C drive on my windows box and
> disregards any selection criteria I give it. It's the same view no
> matter what my criteria are.
>
> I may talk to NeoPath and get a full fledged eval because the concept
> is interesting. But this download, though I appreciate the effort and
> thought, proved to be useless.
>
> ~F


Faeandar, just out of curiosity, how does Kazeon crawl the
filer? Does it do it via NFS or via NetApp's FPolicy API? And how
much data is on your filer that it took so many days to analyze (how
many TB and how many files)?


Dvy

Faeandar

2007-02-06, 1:13 pm

On 6 Feb 2007 09:58:53 -0800, dvymiller@yahoo.com wrote:


>
>Faeandar, just out of curiosity, how does the kazeon to crawl the
>filer... does it do it via NFS or via NetApp's FPolicy API? And how
>much data is on your filer that it took so many days to analyze (how
>many TB and how many files?)
>
>
>Dvy


It does not use FPolicy for crawls, though I hear it does do
migrations now, so I assume it talks to FPolicy in some fashion?

The amount of data on the filer does not seem to make a difference.
Some qtrees have 10's of millions of files, others have millions, and still
others have 100's of thousands. In all cases it traversed at about 16 objects
per sec. Not good.
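
To put the 16 objects per second figure in perspective, the short Python calculation below estimates crawl time for the qtree sizes mentioned above. The object counts are round numbers picked for illustration, not measurements from the filer in question, and the helper name is made up.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_days(num_objects, objects_per_second):
    """Estimated crawl duration in days at a constant traversal rate."""
    return num_objects / objects_per_second / SECONDS_PER_DAY


# Illustrative qtree sizes: hundreds of thousands up to tens of millions.
for n in (100_000, 1_000_000, 10_000_000):
    print(f"{n:>12,} objects: {crawl_days(n, 16):.1f} days")
```

At that rate, 10 million objects works out to roughly 7.2 days, and the four-day qtree crawl mentioned earlier in the thread would correspond to about 5.5 million objects, assuming the rate really was constant.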

~F

dvymiller@yahoo.com

2007-02-06, 7:15 pm

On Feb 6, 11:11 am, Faeandar <mr_casta...@yahoo.com> wrote:
> On 6 Feb 2007 09:58:53 -0800, dvymil...@yahoo.com wrote:
>
>
>
>
>
> It does not use FPolicy for crawls, though I hear it does do
> migrations now so I assume it talke to Fpolicy in some fashion?
>
> The amount of data on the filer does not seem to make a difference.
> Some qtrees have 10's of millions, others have millions, even others
> had 100's of thousands. In all cases it traversed at about 16 objects
> per sec. Not good.
>
> ~F


16 files per second via NFS seems very bad... One should easily be
able to findfirst/findnext via NFS to get metadata much faster than
16 files per sec... I wonder what is holding them up...

Nik Simpson

2007-02-06, 7:15 pm

jmcgann@gmail.com wrote:
>
> Hope this clarifies how we fit in the market and differentiates us
> from the others.
>
> Jim McGann
> www.indexengines.com
>


Thanks Jim, useful information, though I suspect some in the market
would disagree with your differentiators :-)

--
Nik Simpson

Copyright 2003 - 2008 webservertalk.com