10-16-04 02:25 AM
Your server locale has to be Chinese for the characterization to show up
correctly.
"Dan Meineck" <DanMeineck@discussions.microsoft.com> wrote in message
news:B415FA8C-A79F-4D5B-88A4-26D1B3A47418@microsoft.com...
> Hi there, I wonder if anyone can help me. I am creating an Index Server
> based search plug-in for a .NET site, using Cisso as my method of
> accessing Index Server. It's all working nicely and I have now started
> looking at the server indexing pages with Unicode characters,
> specifically, in this example, Chinese.
>
> I am indexing flat HTML pages in a publish directory which have a meta
> element of MS.LOCALE set to the locale of the correct language, in my case
> 'zh-CN'.
>
> Setting the codepage of the Cisso wrapper allows the foreign characters to
> render correctly, and setting the localeid to Chinese allows Chinese
> characters to be accepted as search terms.
>
> My problem is that when I have conducted a search and am getting the
> results back, the value retrieved directly from the dataset returned by
> Cisso in the characterisation column, for the Chinese content result, is
> corrupt:
>
> "my keywords. 锘?html>. Latest News鏅寸鍦板尯鏀垮簻 鈥撴湇鍔
ぇ浼?
> 鏅寸鍦板尯鏀垮簻
> 鈥撴湇鍔″ぇ浼楁櫞绌哄湴鍖
斂搴滅幇鏈?5浣嶅鍛樸備
粬
> 潵鑷拰 h〃鐫28涓夊尯浜烘皯缇や紬
屽苟鍦ㄤ换鏈熺
殑鍥涘勾閲岋紝璐熻矗鏅寸鍦
尯鐨勫畯瑙傛斂绛
> 笌瑙勫垝锛屾彁渚涘叕鍏辨湇
鍔″拰鍐冲畾鍚_鏈嶅姟
勬敹璐广?
> 閫氳繃鏈綉绔欙紝鎮ㄥ彲 ラ_笎娣卞叆浜嗚В鏅寸鏀垮
簻鍚勯」鏂逛究甯傛皯鐨勬湇
′互鍙婃斂搴滃姛鑳斤紝
勯儴闂ㄧ殑鑱旂郴鏂瑰紡鍜
斂搴滃勾搴︽姤鍛娿俉hat's
> NewTwo Column Lorem ipsum dolor sit amet, consete"
>
> - Notice the '?html>' and the missing 'W' of 'What's NewTwo Column'. I
> will add the HTML source of the indexed page below to clarify:
>
> <html><head><title>dan</title>
> <meta name="MS.LOCALE" content="zh-cn">
> <meta name="keywords" content="my keywords">
> <meta name="comments" content="">
> <meta name="author" content="Admin">
> <meta name="accessrights" content=",1,2,">
> <meta name="immediacyurl" content="http://localhost/immsample501">
> <meta name="lastsavedtm" content="08/10/2004 10:31:53">
> <meta name="categories" content=",">
> <meta name="language" content="--">
> </head><body>Latest News晴空地区政府 –服务大众
>
> 晴空地区政府
> –服务大众晴空地区政府现有5
5位委员。他们来自和代表
着28个选区人民群众,并在任
的四年里,负责晴空地区
宏观政_与规划,提供公共服
务和决定各种服务的收费。
>
> 通过本网站,您可以逐渐深入
解晴空政府各项方便市
的服务以及政府功能,各部门
的联系方式和政府年度报告
。What''s
> NewTwo Column
> Lorem ipsum dolor sit
> amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor
> invidunt ut labore et dolore magna aliquyam erat, sed diam
> voluptua. At vero eos et accusam et justo duo dolores et ea rebum.
> Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum
> dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing
> elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore
> magna aliquyam erat, sed diam voluptua. At vero eos et accusam et
> justo duo dolores et ea rebum. Stet clita kasd gubergren.
> dan dan dan dan dan dan my keywords my keywords my keywords my keywords my
> keywords
> </body>
> </html>
>
> - It looks as if the corruption comes straight out of Index Server. Can
> anyone shed any light on this? Another problem I found is that if the
> title is in Chinese text, it gets ignored, because for some reason the
> metadata below is corrupted.
>
> Thanks,
>
> Dan
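
The corrupted characterisation quoted above has the shape you get when UTF-8 bytes are decoded with the Chinese ANSI code page (GBK, code page 936): the leading '锘' is what a UTF-8 byte order mark typically turns into, and '鏅寸' matches the first bytes of '晴空'. The following is only a diagnostic sketch in Python under that assumption, not a statement of what Index Server does internally. The function names are hypothetical, and the repair only works if no bytes were lost along the way (the quoted sample has already lost some, such as the 'W').

```python
# Diagnostic sketch: reproduce and reverse "UTF-8 bytes read as GBK" mojibake.
# Assumption (not confirmed by the thread): the characterisation text is
# UTF-8 that was decoded with code page 936 (GBK) somewhere in the pipeline.

SAMPLE = "晴空地区政府"  # phrase taken from the indexed page in the post


def utf8_read_as_gbk(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as GBK (wrong on purpose)."""
    return text.encode("utf-8").decode("gbk", errors="replace")


def repair_gbk_mojibake(garbled: str) -> str:
    """Reverse the mistake: recover the raw bytes via GBK, decode as UTF-8.

    This is lossless only if every byte pair survived the wrong decode.
    Anything already replaced by '?' or dropped cannot be recovered.
    """
    return garbled.encode("gbk", errors="replace").decode("utf-8", errors="replace")
```

Calling `utf8_read_as_gbk(SAMPLE)` gives a garbled string of the same kind as the one in the post, and `repair_gbk_mojibake` applied to that result returns the original phrase. If the real data round-trips the same way, that points at a code page mismatch between the stored text and the locale used to read it, rather than at damage in the index itself.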