Indexing/Searching Chinese
| Kirk Potter 2005-06-03, 6:01 pm |
| Hi,
I am having some trouble with indexing & searching HTML files which contain
a UTF-8 representation of Chinese.
I have done a load of reading of previous articles on this and most suggest
a variety of things to get this working, namely:
1. Adding <meta http-equiv="Content-Type" content="text/html;
charset=UTF-8"> to the head of the pages.
2. Adding <meta name="ms.locale" content="cn-zh">
3. Specifying the locale identifier when connecting via MSIDXS
4. Specifying a code page (when using ASP, which we are)
Unfortunately none of these things worked for me.
What I have done with some success is the following:
1. Installed the Chinese language packs on the Windows 2000 Server
concerned.
This stopped our initial error of "The query contained only ignored
words".
2. Made sure that files to be indexed are saved to disk in UTF-8 format.
With these two items I can get very simple Chinese indexing and searching to
work (e.g. successfully finding the Chinese for "tree", 树).
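For anyone wanting to confirm what "saved to disk in UTF-8 format" means for a given file, here is a minimal Python sketch. It only inspects the bytes (byte order mark present or not, whether they decode as UTF-8) and says nothing about how Indexing Service itself reads the file. The function name describe_encoding and the sample character are hypothetical, chosen for illustration.

```python
import codecs


def describe_encoding(data: bytes) -> dict:
    """Report basic encoding facts about the raw bytes of a file."""
    # A UTF-8 byte order mark is the three bytes EF BB BF at the start.
    has_bom = data.startswith(codecs.BOM_UTF8)
    payload = data[len(codecs.BOM_UTF8):] if has_bom else data

    # Strict decoding fails on byte sequences that are not valid UTF-8,
    # e.g. a page that was really saved in a legacy Chinese code page.
    try:
        payload.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    # Pure-ASCII files are valid UTF-8 but contain no Chinese text at all.
    has_non_ascii = any(byte >= 0x80 for byte in payload)

    return {
        "has_bom": has_bom,
        "valid_utf8": valid_utf8,
        "has_non_ascii": has_non_ascii,
    }


if __name__ == "__main__":
    # Assumed sample: a simplified Chinese character for "tree".
    sample = "<p>树</p>".encode("utf-8")
    print(describe_encoding(sample))
```

To check a real page, pass it the result of open(path, "rb").read().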
The problem I have is that if my page to be indexed and searched contains
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> then I
get no results returned. I need this to be there as web browsers will
require this to display the pages correctly (I know in IIS I can add this
header but I don't want to if I don't need to).
This seems counter-intuitive to me and due to the nature of how we are
creating these pages I don't really want to have to remove the meta tag. Can
anyone explain why Indexing Service will not return any results if this tag
is present and if there is any way around this?
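To make the comparison easy to reproduce, a pair of test pages that differ only in the Content-Type meta tag can be generated along these lines. This is a minimal Python sketch under stated assumptions, not how our pages are actually produced: build_page, write_test_pair, the file names, and the sample character are all hypothetical.

```python
import tempfile
from pathlib import Path

# The meta tag under suspicion, exactly as it appears in the pages.
META_CHARSET = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'


def build_page(body_text: str, include_meta: bool) -> bytes:
    """Return a small HTML page encoded as UTF-8 without a byte order mark."""
    meta = META_CHARSET if include_meta else ""
    html = (
        "<html><head><title>test</title>" + meta + "</head>"
        "<body><p>" + body_text + "</p></body></html>"
    )
    return html.encode("utf-8")


def write_test_pair(directory: Path, body_text: str):
    """Write two pages that differ only in the presence of the meta tag."""
    with_meta = directory / "with_meta.htm"        # hypothetical file name
    without_meta = directory / "without_meta.htm"  # hypothetical file name
    # write_bytes avoids any newline or encoding translation on disk.
    with_meta.write_bytes(build_page(body_text, include_meta=True))
    without_meta.write_bytes(build_page(body_text, include_meta=False))
    return [with_meta, without_meta]


if __name__ == "__main__":
    # Assumed sample text: a simplified Chinese character for "tree".
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_test_pair(Path(tmp), "树"):
            print(path.name, len(path.read_bytes()), "bytes")
```

Pointing the catalog at the output directory and searching for the same character against both files isolates the meta tag as the only variable.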
Many thanks in advance,
Kirk
| Hilary Cotter 2005-06-04, 7:47 am |
| This should be working. Please post sample docs here or send them to me
offline.
--
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com