IIS Index Server - Indexing service ignores most files in directory

cushlomokree

2006-10-31, 1:30 am

Hi,
I am running Windows Server 2003 SP1.
I'm developing a reference website with approximately 12000 .htm files in
the directory.
I created a new catalog in Computer Management under Indexing Service.
Added the directory on my local drive containing the development website and
excluded some subdirectories that contain unwanted material.
When I restart the service it only indexes 43 out of 12110 *.htm files.
I have "tuned the service", played with the folder's permissions until almost
anyone on earth has access, and deleted the catalog and retried...
still only 43 files get indexed (~1 MB).
Any help would be greatly appreciated.
Mike
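One way to get a baseline for the catalog's "Total docs" figure is to count the .htm files in the scope directly. The Python sketch below is illustrative only: the function name `count_htm_files` and the rule of excluding subdirectories by folder name are assumptions, not part of Indexing Service or of the original setup.

```python
import os


def count_htm_files(root, excluded_dirs=()):
    """Count *.htm files under root, skipping subdirectories whose names are in excluded_dirs."""
    excluded = {name.lower() for name in excluded_dirs}
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded subdirectories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d.lower() not in excluded]
        # Case-insensitive extension match; ".html" files are deliberately not counted.
        total += sum(1 for f in filenames if f.lower().endswith(".htm"))
    return total
```

For example, `count_htm_files(r"D:\MV\rh06\html\htmlhelp\htmlROBO", excluded_dirs=["_vti_cnf"])` would give a number to compare against the snap-in; `_vti_cnf` is only a hypothetical example of an excluded folder.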


WenJun Zhang[msft]

2006-10-31, 1:30 am

Hi Mike,

If a large number of files need to be indexed when the catalog is first
created, Indexing Service may take quite a long time to finish scanning all
of them. This is expected behavior. Sometimes all the file contents only
show up a couple of hours later.

Please open Computer Management and expand the Indexing Service snap-in.
Check the 'Docs to Index' number to see whether a large number of documents
still haven't been indexed. Also check whether the status is 'Indexing
Paused (User Active)'. Indexing Service generally scans at full speed only
when the interactive user and the system are idle. Therefore, to let it
finish indexing as soon as possible, you can stop and restart Indexing
Service and then leave the mouse and keyboard alone. Wait for the 'Docs to
Index' number to decrease to 0; after that, all the web pages should be
returned properly by queries.

Please feel free to let me know if the problem still persists.

Have a nice day.

Sincerely,

WenJun Zhang

Microsoft Online Community Support

==================================================

Get notification to my posts through email? Please refer to:
http://msdn.microsoft.com/subscript...ault.aspx#notifications.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at:

http://msdn.microsoft.com/subscript...t/default.aspx.

==================================================

This posting is provided "AS IS" with no warranties, and confers no rights.

cushlomokree

2006-10-31, 1:30 am

Hi WenJun,

Nothing has changed overnight.
Total docs=43; Docs to Index=0, not paused, Saved Indexes=1, Word Lists=0,
Size=1 MB, Status=Started
Please note that the folder in question is a web site in FrontPage 2003, as
well as in Visual Studio 2005, and is listed as a web site (not the default)
in the IIS snap-in in Computer Management.
On the IIS Properties dialog for this web site, on the 'Home Directory' tab,
the checkboxes 'Read' and 'Index this resource' are checked.
I think there may be some conflict that I am not aware of.
As an experiment, I just copied all the content from the web site folder
into a new temporary folder and successfully created a new, complete
catalog (docs=12111) there with Indexing Service.
The indexing will ultimately be used as a "search function" for this site,
so I need to get the indexer working for the web site itself.
Thanks for your help,
Mike



AC

2006-10-31, 1:20 pm

Could changing the archive attribute of the files force it to re-index the
files?

Regards


WenJun Zhang[msft]

2006-10-31, 1:20 pm

Hi Mike,

The question is why the total doc count is only 43. Please compare the
scope (directory) configuration of the problematic catalog with that of the
new test catalog, which works correctly. Is there any obvious difference
between them?

If you cannot figure out where the problem is, please export the following
registry key and send it to me at: wjzhang@online.microsoft.com
(remove.). I will help review it for clues.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs

I look forward to your message. Have a good day.

Sincerely,

WenJun Zhang

Microsoft Online Community Support


cushlomokree

2006-10-31, 7:25 pm

Hi WenJun,
Here are the registry entries. There are 3 catalogs: "rh06", "System", and
"test".
"rh06" is the website that is not indexing properly, and "test" is the copy
in another directory that is indexing correctly.

Key Name:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs
Class Name: <NO CLASS>
Last Write Time: 10/31/2006 - 11:21 AM

Key Name:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs\rh06
Class Name: <NO CLASS>
Last Write Time: 10/31/2006 - 11:21 AM
Value 0
Name: Location
Type: REG_SZ
Data: D:\MV\rh06\html\htmlhelp\searchcat

Value 1
Name: IsIndexingW3Svc
Type: REG_DWORD
Data: 0

Value 2
Name: IsIndexingNNTPSvc
Type: REG_DWORD
Data: 0


Key Name:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs\rh06\Scopes
Class Name: <NO CLASS>
Last Write Time: 10/31/2006 - 11:22 AM
Value 0
Name: D:\MV\rh06\html\htmlhelp\htmlROBO
Type: REG_SZ
Data: ,,5


Key Name:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs\System
Class Name: <NO CLASS>
Last Write Time: 12/7/2004 - 2:26 PM
Value 0
Name: Location
Type: REG_SZ
Data: C:\System Volume Information

Value 1
Name: IsIndexingW3Svc
Type: REG_DWORD
Data: 0

Value 2
Name: IsIndexingNNTPSvc
Type: REG_DWORD
Data: 0


Key Name:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs\System\Scopes
Class Name: <NO CLASS>
Last Write Time: 10/30/2006 - 9:20 AM
Value 0
Name: D:\Documents and Settings
Type: REG_SZ
Data: ,,4

Value 1
Name: C:\
Type: REG_SZ
Data: ,,4

Value 2
Name: D:\
Type: REG_SZ
Data: ,,4

Value 3
Name: D:\Documents and Settings\*\Application Data\*
Type: REG_SZ
Data: ,,4

Value 4
Name: D:\Documents and Settings\*\Local Settings\*
Type: REG_SZ
Data: ,,4

Value 5
Name: D:\MV\rh06\html\htmlhelp\htmlROBO
Type: REG_SZ
Data: ,,5


Key Name:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs\test
Class Name: <NO CLASS>
Last Write Time: 10/31/2006 - 11:18 AM
Value 0
Name: Location
Type: REG_SZ
Data: D:\MV\rh06\html\backup\searchcat

Value 1
Name: IsIndexingW3Svc
Type: REG_DWORD
Data: 0

Value 2
Name: IsIndexingNNTPSvc
Type: REG_DWORD
Data: 0


Key Name:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs\test\Scopes
Class Name: <NO CLASS>
Last Write Time: 10/31/2006 - 11:19 AM
Value 0
Name: D:\MV\rh06\html\backup\RH06
Type: REG_SZ
Data: ,,5

thanks,
Mike
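Rather than reading an export like the one above by eye, the Catalogs key can also be walked with Python's standard `winreg` module. This is a minimal sketch assuming Python 3 on the server; `read_catalog_scopes` and `catalogs_sharing_path` are hypothetical helper names. The sketch does not interpret the ",,5" / ",,4" scope data, since the meaning of those flags is not explained in this thread. The second helper lists paths that appear in more than one catalog; in the export above, that is the case for D:\MV\rh06\html\htmlhelp\htmlROBO, which is listed under both rh06 and System.

```python
# Registry path from the thread, with the wrapped "CurrentControl" / "Set" rejoined.
CATALOGS_KEY = r"SYSTEM\CurrentControlSet\Control\ContentIndex\Catalogs"


def read_catalog_scopes():
    """Return {catalog: {scope_path: data}} for every catalog (Windows only)."""
    import winreg  # Standard library, but only present on Windows.

    result = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CATALOGS_KEY) as catalogs:
        subkey_count = winreg.QueryInfoKey(catalogs)[0]
        for i in range(subkey_count):
            name = winreg.EnumKey(catalogs, i)
            scopes = {}
            try:
                with winreg.OpenKey(catalogs, name + r"\Scopes") as scope_key:
                    value_count = winreg.QueryInfoKey(scope_key)[1]
                    for j in range(value_count):
                        # Each value name is a directory; the data is the raw scope string, e.g. ",,5".
                        path, data, _value_type = winreg.EnumValue(scope_key, j)
                        scopes[path] = data
            except OSError:
                pass  # No Scopes subkey (or no access): record the catalog with no scopes.
            result[name] = scopes
    return result


def catalogs_sharing_path(catalog_scopes):
    """Return {lowercased path: sorted catalog names} for paths listed by more than one catalog."""
    owners = {}
    for catalog, scopes in catalog_scopes.items():
        for path in scopes:
            owners.setdefault(path.lower(), []).append(catalog)
    return {path: sorted(names) for path, names in owners.items() if len(names) > 1}
```

On the server, `print(catalogs_sharing_path(read_catalog_scopes()))` would print any such overlaps. Paths are compared case-insensitively because Windows paths are.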



WenJun Zhang[msft]

2006-11-01, 7:21 am

Hi Mike,

I see that the problematic catalog and your test catalog use different
paths as their scopes:

rh06 catalog:

D:\MV\rh06\html\htmlhelp\htmlROBO

test catalog:

D:\MV\rh06\html\backup\RH06

This looks like the cause of the problem. Please double-check their
directory settings. Also, does the \htmlhelp\htmlROBO directory contain
only 43 documents? If you also add the \backup\RH06 directory to the rh06
catalog and restart Indexing Service, is the doc count reasonable?

Have a nice day.

Sincerely,

WenJun Zhang

Microsoft Online Community Support


cushlomokree

2006-11-01, 1:17 pm

Hi WenJun,
D:\MV\rh06\html\htmlhelp\htmlROBO (rh06 Catalog) is the problem directory.
It contains 12113 files, but the indexer only "sees" 43 files.
D:\MV\rh06\html\backup\RH06 (test Catalog) is a backup copy of the above
directory. It also contains 12113 files, but the indexer "sees" all 12113
files.
Directory settings for both are the same.

--> If you also include \backup\RH06 directory to rh06 catalog
and restart IS service, will the doc number be reasonable?<--

Yes, if I add that directory to the rh06 catalog, the indexer "sees" all the
files.

However that just brings us back to the original problem:
Why does the indexer not see all the files in the original directory?

Thanks,
Mike



WenYuan Wang

2006-11-03, 7:20 am

Hi Mike

I'm Wen-Jun's backup.
Wen-Jun is on vacation at the moment.
He will be back next week and will reply here.
If you have any further concerns, please feel free to post here.

Sincerely,
WenYuan

WenJun Zhang[msft]

2006-11-07, 7:24 am

Hi Mike,

Since the current situation is a little strange, my suggestion is to
rebuild the old catalog and directory as a test. Please follow the steps
below:

1) Stop Indexing Service.

2) Rename the directory htmlROBO and create a new one under
D:\MV\rh06\html\htmlhelp\.

3) Copy all the files from the renamed directory to the new
D:\MV\rh06\html\htmlhelp\htmlROBO\

4) Select all files in the new directory and open
Properties->General->Advanced, make sure option 'File is ready for
archiving' isn't selected and 'For fast searching, allow Indexing Service
to index this file' is selected.

5) Delete the problematic catalog in Indexing Service snap-in. Also make
sure its corresponding hidden catalog.wci directory is deleted.

6) Recreate a new catalog with the same name and include the directory:
D:\MV\rh06\html\htmlhelp\htmlROBO\

7) Start Indexing Service so that it crawls the new data folder.
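Step 4 can also be checked from a script instead of the Properties dialog. As far as I know, the two checkboxes map to Win32 file attribute bits: 'File is ready for archiving' is FILE_ATTRIBUTE_ARCHIVE (0x20), and clearing 'For fast searching, allow Indexing Service to index this file' sets FILE_ATTRIBUTE_NOT_CONTENT_INDEXED (0x2000). The Python sketch below (function names hypothetical) reports the folder itself and any entries under it that have either bit set. It relies on `os.stat().st_file_attributes`, which Python provides only on Windows, so on other systems it reports nothing.

```python
import os

# Win32 attribute bits (values from the Windows SDK headers).
FILE_ATTRIBUTE_ARCHIVE = 0x20                # "File is ready for archiving"
FILE_ATTRIBUTE_NOT_CONTENT_INDEXED = 0x2000  # set when "allow Indexing Service to index" is cleared


def attribute_problems(attrs):
    """List which step-4 settings a file with these attribute bits does not satisfy."""
    problems = []
    if attrs & FILE_ATTRIBUTE_NOT_CONTENT_INDEXED:
        problems.append("not content indexed")
    if attrs & FILE_ATTRIBUTE_ARCHIVE:
        problems.append("archive")
    return problems


def scan(root):
    """Yield (path, problems) for root and every entry under it that fails step 4."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    for path in paths:
        # st_file_attributes only exists on Windows; treat it as 0 elsewhere.
        attrs = getattr(os.stat(path), "st_file_attributes", 0)
        problems = attribute_problems(attrs)
        if problems:
            yield path, problems
```

A typical run would be `for path, problems in scan(r"D:\MV\rh06\html\htmlhelp\htmlROBO"): print(path, ", ".join(problems))`. Checking the folder as well as its files matters because, later in this thread, the fix turned out to be the checkbox on the site's folder.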

Let's see if it works this time. If it still doesn't succeed, I will ping
our Indexing Service product group for further troubleshooting
suggestions.

Have a great day.

Sincerely,

WenJun Zhang

Microsoft Online Community Support


cccstar

2007-01-09, 7:26 pm

I am experiencing exactly the same symptoms with a smaller number of files.
The suggested copy into an empty directory appears to work correctly, but how
can I know that future additions to the directory will work? Is there a
patch or other workaround for this problem?
--
Romans 12:1-3


Alec MacLean

2007-01-17, 7:27 pm

I had been experiencing the same problem as Mike, but now have results
coming through after making this change...

WenJun Zhang's response of Nov 3rd contained the following comment:

> 4) Select all files in the new directory and open
> Properties->General->Advanced, make sure option 'File is ready for
> archiving' isn't selected and 'For fast searching, allow Indexing Service
> to index this file' is selected.


What really caught my eye was the "For fast searching, allow Indexing
Service to index this file".

I had already enabled "Index this folder" in IIS Manager, as well as
setting up the catalog in Index Server. The word list and saved indexes
had changed somewhat after I forced a rescan of the directory, but this
hadn't helped the search results.

Only after going to Windows Explorer and selecting the folder the site is
physically located in, e.g. D:\website (which is not the default of the
C:\InetPub due to use of RAID and separate drive volumes, etc.) and checking
that box did the catalog start to behave properly.
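The same checkbox can be cleared in bulk without Explorer. The following is a minimal sketch, assuming Python 3 on Windows. It calls the Win32 functions GetFileAttributesW and SetFileAttributesW through `ctypes`, and the helper names are hypothetical. It clears only the FILE_ATTRIBUTE_NOT_CONTENT_INDEXED bit (0x2000) on the folder and everything beneath it. All other attributes are written back unchanged, after masking to the bits SetFileAttributesW accepts.

```python
import ctypes
import os
import sys

FILE_ATTRIBUTE_NORMAL = 0x80
FILE_ATTRIBUTE_NOT_CONTENT_INDEXED = 0x2000
INVALID_FILE_ATTRIBUTES = 0xFFFFFFFF
# Bits SetFileAttributesW accepts: READONLY, HIDDEN, SYSTEM, ARCHIVE, NORMAL,
# TEMPORARY, OFFLINE and NOT_CONTENT_INDEXED. Others (e.g. DIRECTORY) are dropped.
SETTABLE = 0x1 | 0x2 | 0x4 | 0x20 | 0x80 | 0x100 | 0x1000 | 0x2000


def allow_indexing(attrs):
    """Return the mask to write back: same settable bits, minus NOT_CONTENT_INDEXED."""
    new = attrs & SETTABLE & ~FILE_ATTRIBUTE_NOT_CONTENT_INDEXED
    return new or FILE_ATTRIBUTE_NORMAL  # An empty mask is written as NORMAL.


def allow_indexing_tree(root):
    """Clear NOT_CONTENT_INDEXED on root and everything under it; return how many entries changed."""
    if sys.platform != "win32":
        raise OSError("allow_indexing_tree only works on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetFileAttributesW.argtypes = [ctypes.c_wchar_p]
    kernel32.GetFileAttributesW.restype = ctypes.c_uint32
    kernel32.SetFileAttributesW.argtypes = [ctypes.c_wchar_p, ctypes.c_uint32]
    kernel32.SetFileAttributesW.restype = ctypes.c_int

    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)

    changed = 0
    for path in paths:
        attrs = kernel32.GetFileAttributesW(path)
        if attrs == INVALID_FILE_ATTRIBUTES:
            raise ctypes.WinError(ctypes.get_last_error())
        if attrs & FILE_ATTRIBUTE_NOT_CONTENT_INDEXED:
            if not kernel32.SetFileAttributesW(path, allow_indexing(attrs)):
                raise ctypes.WinError(ctypes.get_last_error())
            changed += 1
    return changed
```

For the folder described here, the call would be `allow_indexing_tree(r"D:\website")`. The archive bit is deliberately left alone, so the 'File is ready for archiving' part of step 4 still has to be handled separately.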

Thanks to WenJun Zhang for this snippet of absolutely crucial info! But why
was it required at all? IMHO the other, more obvious settings should either
have controlled it or overridden it!

Al





Copyright 2003 - 2008 webservertalk.com