looking for efficient way to handle noise words in ASP
Kevin Blount, 2005-04-07, 6:05 pm
I'm using the following (example) query in my search script, and as
you'll probably spot, there's a noise/stop word in there, "to", which
basically causes an error.
--
SELECT DocTitle, vpath, path, filename, size, write, characterization,
rank, Authored, Product FROM SCOPE(' DEEP TRAVERSAL OF "/us" ') WHERE
(CONTAINS ('"software to go"') OR CONTAINS ('"software" NEAR "to" NEAR
"go"') > 0) ORDER BY rank DESC
--
What I'd like to do is to work through the entered keywords (i.e. what
the user wants to search for) and ignore any stop words when creating
my query. I thought about reading the noise.enu file into a variable,
then checking each word in the search string against the variable, and
skipping any found in the variable.
The problem with that idea is the word "go". While "go" doesn't exist
in the noise.enu file as a whole word, "got" does, so when the whole
noise.enu file is read into one variable, "go" does appear in the
variable's value (as part of "got").
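A minimal Python sketch of that pitfall (Python rather than ASP, purely for illustration), assuming the noise file's contents have been read into one string. `NOISE_TEXT` is a hypothetical excerpt, not the real noise.enu. A substring test finds "go" inside "got", while splitting the text into whole words first does not:

```python
# Hypothetical excerpt of noise-word text; the real noise.enu is much longer.
NOISE_TEXT = "about after all got to the"

def is_noise_substring(word: str, text: str) -> bool:
    # Naive check: true whenever `word` appears anywhere in the text,
    # including inside a longer word ("go" inside "got").
    return word.lower() in text.lower()

def is_noise_whole_word(word: str, text: str) -> bool:
    # Whole-word check: split the text on whitespace and compare tokens exactly.
    return word.lower() in set(text.lower().split())

for w in ["to", "go", "software"]:
    print(w, is_noise_substring(w, NOISE_TEXT), is_noise_whole_word(w, NOISE_TEXT))
```

In classic ASP the same whole-word idea could be expressed by loading each noise word into a `Scripting.Dictionary` and testing with its `Exists` method, though that is a suggestion rather than something from the thread.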
Does anyone have any better suggestions for handling noise words with
ASP (not .NET)? Ideally the end result would change the above query of:
CONTAINS ('"software" NEAR "to" NEAR "go"')
to be simply:
CONTAINS ('"software" NEAR "go"')
i.e. with the noise word removed.
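A minimal Python sketch of that rewrite, assuming a set of lower-case noise words has already been loaded. The three words in `noise` are a hypothetical subset, and `build_near_clause` is a hypothetical helper name. It rebuilds only the NEAR clause; the phrase clause and the rest of the SELECT statement stay as in the query above:

```python
from typing import Optional, Set

def build_near_clause(search: str, noise: Set[str]) -> Optional[str]:
    kept = []
    for word in search.split():
        # Remove quote characters so a stray quote in user input cannot end
        # the quoted strings early. Other query-language characters and
        # keywords are NOT handled by this sketch.
        cleaned = word.replace("'", "").replace('"', "")
        # Skip empty tokens and noise words (compared case-insensitively).
        if cleaned and cleaned.lower() not in noise:
            kept.append(cleaned)
    if not kept:
        # Every word was a noise word: there is nothing meaningful to search for.
        return None
    # Quote each remaining word and join them with NEAR, matching the thread's query form.
    return "CONTAINS ('" + " NEAR ".join('"' + w + '"' for w in kept) + "')"

noise = {"to", "got", "the"}  # hypothetical subset of noise.enu
print(build_near_clause("software to go", noise))
```

Returning `None` when every word is a noise word lets the calling page show a message instead of sending an empty clause to the server.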
Kevin Blount, 2005-04-08, 5:51 pm
I went for the slightly slower method of reading each line of the
noise.enu file and checking it against each word in the search string.
The performance of the script isn't hit as much as I expected, so I'm
happy with this solution.
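A minimal Python sketch of the line-by-line approach, under stated assumptions: the noise file is plain text with whitespace-separated entries, matching is case-insensitive, and `remove_noise_words` is a hypothetical helper. It accepts any iterable of lines, so an open file object works as well as the sample list:

```python
from typing import Iterable, List

def remove_noise_words(search: str, noise_lines: Iterable[str]) -> List[str]:
    """Return the search words that do not appear as whole entries in noise_lines."""
    words = search.split()
    # Lower-case the search words once so each noise entry is a single set lookup.
    wanted = {w.lower() for w in words}
    found = set()
    # Walk the noise list one line at a time instead of treating it as one big string.
    for line in noise_lines:
        # Compare whole entries only, so "got" never matches the search word "go".
        for entry in line.split():
            entry = entry.lower()
            if entry in wanted:
                found.add(entry)
    # Preserve the user's original word order and spelling for the kept words.
    return [w for w in words if w.lower() not in found]

# Hypothetical noise-file lines; the real noise.enu is much longer.
sample_lines = ["about\n", "after\n", "got\n", "to\n"]
print(remove_noise_words("software to go", sample_lines))  # ['software', 'go']
```

With a real file this might be called as `with open(path, encoding="latin-1") as f: remove_noise_words(query, f)`. The latin-1 encoding is an assumption about noise.enu, not something the thread states.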
Hilary Cotter, 2005-04-08, 8:48 pm
Check this out:
http://www.indexserverfaq.com/searchpage1.zip