07-28-04 11:09 PM
We have a problem: the content in our CMS 2002 SP1 web site(s) appears to
have ended up with varying encodings. This becomes a problem for content with
special characters and/or multilingual content. One posting might be in UTF-8,
while a second posting might be in ASCII.
Why this happens is not exactly known yet (we will investigate further), but
the input is the usual authoring via vanilla HtmlPlaceholders or the Authoring
Connector. Shouldn't it all end up as UTF-8...?
This problem went unnoticed because there are generally no problems when
viewing the content. However, the search engine we use does not recognize
some encodings of special characters (even though the content should be valid
in its given encoding), and so it misses some of the search hits for a search
phrase containing special characters.
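To make the search problem concrete, here is a small illustration (hypothetical, not CMS-specific) of how one word containing "ä" ends up as different bytes depending on how a posting was stored. It assumes an indexer that matches on the UTF-8 form of the query, and that an ASCII posting carries "ä" as a numeric character reference such as `&#228;`; neither assumption is confirmed by the CMS itself. The function name `representations` and the sample word are made up for the example.

```python
def representations(text):
    """Return the byte form of `text` under three encodings a posting might use."""
    return {
        "utf-8": text.encode("utf-8"),
        "iso-8859-1": text.encode("iso-8859-1"),
        # Plain ASCII cannot hold "ä", so assume it is escaped as a numeric
        # character reference such as &#228;.
        "ascii": text.encode("ascii", errors="xmlcharrefreplace"),
    }


# Hypothetical query as a UTF-8-only indexer would see it.
query = "Käyttäjä".encode("utf-8")

for name, stored in representations("Käyttäjä").items():
    # Only the UTF-8 copy contains the query bytes; the others are "missed hits".
    print(f"{name:>10}: {stored!r} match={query in stored}")
```

All three copies render the same in a browser that honours the meta tag, but a byte-level match only succeeds for the UTF-8 one.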
I guess we need to make sure that all content we get via HtmlPlaceholder
controls and the Word Authoring Connector is encoded uniformly. It does not
seem to be sufficient just to carry the correct meta tags for each encoding,
since the search engine can't deal with them all. How do we do that? (We
can't upgrade the search engine quickly right now.)
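One possible normalisation step, sketched in Python only to show the logic (a real fix would live in the .NET code around the placeholders, which is not shown here): decode each posting with its declared charset, turn character references for non-ASCII characters into real characters, and re-encode everything as UTF-8. It assumes the declared charset of each posting is known (for example from its meta tag). `normalize_to_utf8` is a hypothetical helper, not a CMS 2002 API. ASCII-range entities such as `&lt;` and `&amp;` are deliberately left alone because they are markup-significant.

```python
import html.entities
import re

# Matches numeric (&#228;), hex (&#xE4;) and named (&auml;) character references.
_ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def _decode_non_ascii_entity(match):
    ref = match.group(1)
    if ref[0] == "#":
        code = int(ref[2:], 16) if ref[1] in "xX" else int(ref[1:])
    else:
        code = html.entities.name2codepoint.get(ref)
    # Keep ASCII entities (&lt; &amp; ...), unknown names, and invalid code
    # points exactly as they were.
    if code is None or code < 128 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
        return match.group(0)
    return chr(code)


def normalize_to_utf8(raw: bytes, declared_charset: str) -> bytes:
    """Hypothetical helper: re-encode one posting's HTML as UTF-8."""
    # Raises UnicodeDecodeError if the declared charset is wrong, which is
    # itself a useful signal that the posting is mislabelled.
    text = raw.decode(declared_charset)
    text = _ENTITY.sub(_decode_non_ascii_entity, text)
    return text.encode("utf-8")


# Usage: an "ASCII" posting and an ISO-8859-1 posting both become UTF-8.
print(normalize_to_utf8(b"K&#228;ytt&auml;j&#xE4; &lt;b&gt;", "ascii"))
print(normalize_to_utf8("Käyttäjä".encode("iso-8859-1"), "iso-8859-1"))
```

After this step the page's meta tag should of course also say UTF-8, so the search engine only ever sees one encoding.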
Any suggestions/better ideas ;)
-Arto