IIS Index Server - Make Index Server act more precisely

Christian

2004-09-02, 6:44 pm

Hello,

while testing my website I noticed that the results Index Server
produces are not very precise. When I enter the word "bluetooth", it also
returns all documents containing the part "blue", such as bluescreen,
bluenote or just blue. This behaviour also occurs in the IS-MMC, so I don't
think my ASP page is responsible for those results.
But just in case... This is the relevant code of my ASP:
var sSQL = "SELECT Characterization, DocTitle, FileName, vpath, rank FROM " +
    "Webcatalog..SCOPE() where CONTAINS('" + sWord + "') order by rank DESC";

How can I fix this behaviour?
Thank you.
Christian
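
For anyone reproducing the query, here is a minimal sketch of building the same statement from a user-supplied word. `buildQuery` is a hypothetical helper that is not part of the original page, and the escaping rule (doubling single quotes) is an assumption borrowed from standard SQL rather than something confirmed in this thread. It does not change how the wordbreaker treats the word.

```javascript
// Hypothetical helper, not part of the original ASP page.
// Assumption: single quotes inside the search word are escaped by doubling them.
function buildQuery(sWord) {
  var safeWord = String(sWord).replace(/'/g, "''");
  return "SELECT Characterization, DocTitle, FileName, vpath, rank FROM " +
    "Webcatalog..SCOPE() where CONTAINS('" + safeWord + "') order by rank DESC";
}

console.log(buildQuery("bluetooth"));
```

Keeping the statement in one function also avoids the broken string literal that results when the line is wrapped inside the quotes.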

Hilary Cotter

2004-09-02, 6:44 pm

This doesn't make sense. A CONTAINS query is strict: a query for bluebird
will only match bluebird.

What OS and SP are you running?

--
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html

Christian

2004-09-02, 6:44 pm

And it gets even more complicated. I also tested other compound words,
e.g. "headmaster". In this case everything works as it should: only
"headmaster" is found.
To test it: http://www.smul.sachsen.de/de/suche.asp (the effect shows up
when using the German word "blausieb" - it returns dozens of docs which only
contain the word "blau", whereas "hausbau" only returns results for
"hausbau", not for "haus" or "bau").

Thank you.
Christian

PS: It's Win2003.

Ton Plooy

2004-09-02, 6:44 pm

That makes perfect sense, see
http://msdn.microsoft.com/library/d...enario_3qib.asp .
The German wordbreaker decomposes compound words. However, as some tests
showed, it doesn't break up all words, and it seems pretty arbitrary which
words get decomposed (try Sommersonnenwendepunkt, no breaking here).
You might want to 'fix' this decomposition behaviour altogether by having
your documents indexed using the neutral wordbreaker (although this may have
some unwanted side effects). I'm not aware of any setting with which you
can disable this wordbreaker feature.
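
To make the effect concrete, here is a toy model in JavaScript. It is not Index Server's actual algorithm: the `PARTS` table and both breaker functions are hypothetical stand-ins. The sketch simply assumes that a decomposing wordbreaker emits a compound plus its parts at both index and query time, while a neutral one emits the word unchanged.

```javascript
// Toy model only: PARTS and both breakers are hypothetical, not Index Server APIs.
// Assumption: the decomposing breaker knows "blausieb" but not "hausbau",
// mirroring the behaviour reported in this thread.
const PARTS = new Map([["blausieb", ["blau", "sieb"]]]);

function decomposingBreaker(word) {
  const w = word.toLowerCase();
  return [w, ...(PARTS.get(w) || [])];
}

function neutralBreaker(word) {
  return [word.toLowerCase()];
}

// Build an inverted index: token -> set of document ids.
function buildIndex(docs, breaker) {
  const index = new Map();
  for (const [id, text] of Object.entries(docs)) {
    for (const word of text.split(/\s+/).filter(Boolean)) {
      for (const token of breaker(word)) {
        if (!index.has(token)) index.set(token, new Set());
        index.get(token).add(id);
      }
    }
  }
  return index;
}

// A document matches if it contains any token the breaker emits for the query.
function search(index, query, breaker) {
  const hits = new Set();
  for (const token of breaker(query)) {
    for (const id of index.get(token) || []) hits.add(id);
  }
  return [...hits].sort();
}

const docs = {
  a: "ein blau gestrichenes Haus",
  b: "das Blausieb im Labor",
  c: "Hausbau in Sachsen",
};

const germanIndex = buildIndex(docs, decomposingBreaker);
const neutralIndex = buildIndex(docs, neutralBreaker);

console.log(search(germanIndex, "blausieb", decomposingBreaker)); // [ 'a', 'b' ]
console.log(search(germanIndex, "hausbau", decomposingBreaker));  // [ 'c' ]
console.log(search(neutralIndex, "blausieb", neutralBreaker));    // [ 'b' ]
```

In this model the query for "blausieb" also hits document a, which only contains "blau", while "hausbau" stays exact because the breaker does not decompose it. With the neutral breaker only the literal word matches.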

I do have another question regarding compound word breaking, in case anyone
is following this thread. With a small utility that I wrote, I tested the
German wordbreaker on the word 'Lebensversicherungsgesellschaft'. The
following result is returned:

WordSink PutAltWord: Lebensversicherungsgesellschaft
WordSink PutWord: Lebensversicherung
WordSink PutAltWord: 1
WordSink PutWord: Gesellschaft

Note that the result is not in line with what the documentation in the link
above specifies, since Lebensversicherung is not split up. Anyway, my
question is about the '1'. This is indeed the (Unicode) string '1' that is
returned. Does anyone know (or have a theory about) why this is returned and
what it means?
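
One way to read the output above is to group the calls by word position. The sketch below assumes the usual word sink convention that PutAltWord adds an alternative at the current position, and PutWord supplies the final word for that position and then advances. `groupByPosition` is a hypothetical helper written for illustration, not part of any Microsoft API.

```javascript
// Hypothetical helper: groups a recorded sequence of word sink calls by position.
// Assumption: PutAltWord adds an alternative at the current position, and
// PutWord adds the final word for that position, then moves to the next one.
function groupByPosition(calls) {
  const positions = [];
  let current = [];
  for (const { method, word } of calls) {
    current.push(word);
    if (method === "PutWord") {
      positions.push(current);
      current = [];
    }
  }
  // Trailing alternatives without a closing PutWord are kept as their own group.
  if (current.length > 0) positions.push(current);
  return positions;
}

const calls = [
  { method: "PutAltWord", word: "Lebensversicherungsgesellschaft" },
  { method: "PutWord", word: "Lebensversicherung" },
  { method: "PutAltWord", word: "1" },
  { method: "PutWord", word: "Gesellschaft" },
];

console.log(groupByPosition(calls));
```

Read this way, the full compound and 'Lebensversicherung' share the first position, and the '1' is an alternative to 'Gesellschaft' at the second. The sketch only shows where the '1' sits; it does not explain why it is emitted.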

For those interested, I posted a question about the Dutch wordbreaker
behaviour a week ago. As it turns out, strangely enough, this wordbreaker
does not seem to do any compound word decomposition at all. It would be nice
if Microsoft could publish a list of implemented wordbreaker features for
each supported language.

Ton




Christian

2004-09-02, 6:44 pm

Very interesting, I didn't know that. I've also found an article on how to
activate the neutral wordbreaker:
http://support.microsoft.com/defaul...kb;EN-US;271818

Greetings
Christian



Copyright 2003 - 2008 webservertalk.com