IIS Index Server - File(s) being missed by Indexing Service


Author File(s) being missed by Indexing Service
CJM

2004-07-09, 12:05 pm

I set up a simple search application (ASP) on our intranet a while ago to
provide a means to search a group of several hundred technical documents
(Word documents). It seemed to be working OK, in that the users could
enter a document number and it would be listed.

However, it appears that at least one document isn't 'searchable' - there may
be more but we haven't found them yet.

This document (RCL349) is wedged between RCL348 & RCL350, and appears to be
of a similar format - these are all technical specifications that appear to
be formatted the same. But it doesn't appear in the returned results when a
search is made.

I tried a test, where I copied rcl348/9 and renamed the copies rcl99348 &
rcl99349. I then re-built the indexes from scratch, and tried searching
again... This time rcl348 & rcl99348 were found, but rcl349 & rcl99349 were
not.

This seems to point to the fact that there is something 'different' about
the rcl349 specification, but what?

What exactly does the index server look at when indexing files? It seems to
be checking the document summary, but apparently not the filename, and I'm
unsure how much of the document body itself...

Thanks in advance...

Chris


WenJun Zhang[msft]

2004-07-09, 12:05 pm

Glad to see you in this group again Chris :-)
I believe assumptions and suggestions may not be very efficient or
helpful in troubleshooting this kind of specific issue. Could you do
me a favor and send me your ASP query pages and the problematic
RCL documents? I'd like to see whether the symptom can be reproduced on my
side, since you've rebuilt the catalog and the issue still persists.
You can reply to my mail to send the files or just attach them in your
newsgroup thread.

I'm looking forward to your response.
Best regards,

WenJun Zhang
Microsoft Online Support
This posting is provided "AS IS" with no warranties, and confers no
rights.
Get Secure! - www.microsoft.com/security

WenJun Zhang[msft]

2004-07-09, 12:05 pm

Chris,

There is something corrupt in the RCL349 doc. I created a new catalog
containing only these 2 documents. The 'Docs to index' and
'Deferred for Indexing' columns stayed stuck at 1, which means
the Office filter, offfilt.dll, hit errors and could not finish
processing at least 1 document, i.e. RCL349.

Fortunately, I resolved the problem simply by opening RCL349.doc
in Word. Make a dummy change and then save it as a new
doc file. Then delete the original and rename the new doc to
RCL349.doc. After that, stop the Indexing Service, delete catalog.wci
and restart the service to rebuild everything. Now both docs are
returned properly and 'Deferred for Indexing' drops to zero.
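
The rebuild steps above (stop the service, delete catalog.wci, restart) could be scripted roughly as follows. This is only a minimal sketch, assuming the Indexing Service runs under its usual service name `cisvc` and that the catalog.wci folder sits directly under a catalog root you supply. The function names and the example path `D:\SearchCatalog` are hypothetical, not taken from this thread.

```python
import shutil
import subprocess
from pathlib import PureWindowsPath

SERVICE = "cisvc"  # assumption: default service name of the Indexing Service


def rebuild_plan(catalog_root):
    """Return the ordered steps for a full catalog rebuild."""
    catalog_dir = str(PureWindowsPath(catalog_root) / "catalog.wci")
    return [
        ("run", ["net", "stop", SERVICE]),    # stop indexing first
        ("delete", catalog_dir),              # remove the old catalog data
        ("run", ["net", "start", SERVICE]),   # restart; the catalog is rebuilt
    ]


def rebuild_catalog(catalog_root, dry_run=True):
    """Carry out the plan; with dry_run=True the steps are only printed."""
    for action, target in rebuild_plan(catalog_root):
        if dry_run:
            print(action, target)
        elif action == "run":
            subprocess.run(target, check=True)
        else:
            shutil.rmtree(target)
```

Calling `rebuild_catalog(r"D:\SearchCatalog")` with the default `dry_run=True` just prints the three steps, which is a safe way to check the catalog.wci path before deleting anything.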

Many issues have been found in offfilt.dll. If you are
using Win2K, make sure SP4 has been applied, since it contains the
latest version of this dll.

Let me know whether this resolves the problem for you too, or
whether it still remains.

Best regards,

WenJun Zhang
Microsoft Online Support

CJM

2004-07-09, 12:05 pm

WenJun,

Once again you have saved the day. I'm still puzzled as to what is so
different about that particular file, but regardless, your solution worked
straight away.

Thanks

Chris


WenJun Zhang[msft]

2004-07-12, 2:49 am

Chris,

It's a pleasure to work with you. You always provide the most
detailed information you can, which helps me better
address the problem. Have a nice week.

Best regards,

WenJun Zhang
Microsoft Online Support

M6rk

2004-07-14, 5:54 pm

We have 1.5 million documents in our catalog. What is the best practice to
determine if all content is indexable?
Thank you!


WenJun Zhang[msft]

2004-07-14, 8:48 pm

You can check the 'Docs to Index' and 'Deferred for Indexing' columns
in the Indexing Service snap-in. 1.5 million is a bit high; generally we
recommend that a single catalog contain no more than 1 million
documents. If you have a complete list of filenames, you can also query
all the filenames from the catalog and compare the 2 lists to see
which documents are currently not indexed.
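
The list comparison described above might look something like this. It is a minimal sketch, assuming both lists are already available as plain path strings (how the indexed filenames are exported from the catalog is left open here), and the helper names are hypothetical. Paths are compared case-insensitively and with slashes normalized, since Windows filenames are not case-sensitive.

```python
from pathlib import PureWindowsPath


def normalize(path):
    """Lower-case a Windows path so comparison ignores case and slash style."""
    return str(PureWindowsPath(path)).lower()


def find_unindexed(all_files, indexed_files):
    """Return the paths in all_files that are missing from indexed_files."""
    indexed = {normalize(p) for p in indexed_files}
    return sorted(p for p in all_files if normalize(p) not in indexed)
```

With 1.5 million documents the set lookup keeps the comparison roughly linear in the number of files, so it should remain practical at that size.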

Best regards,

WenJun Zhang
Microsoft Online Support


Copyright 2003 - 2008 webservertalk.com