| Hilary Cotter 2006-05-01, 7:20 am |
| The Chinese word breaker appears to examine each character, detect its
radicals and subcharacters, and then parse the token looking for compound
characters. There is a patent filed for the actual process that they use.
Unfortunately I can't find it right now, but I did find it through Google
some time ago.
Once upon a time Oracle, Sybase, Microsoft, and IBM all used a word breaker
from the same company, InfoSoft. I am not sure who uses what now.
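To make the difference in the original question concrete, here is a minimal Python sketch contrasting per-character breaking with dictionary-based segmentation (greedy forward maximum matching). This is a generic textbook technique, not the algorithm used by Index Server or by any vendor's word breaker. The function names, the tiny lexicon, and the `max_len` parameter are all made up for illustration.

```python
def unigram_break(text):
    """Treat every non-space character as its own token."""
    return [ch for ch in text if not ch.isspace()]


def max_match_break(text, lexicon, max_len=4):
    """Greedy forward maximum matching against a word list.

    At each position, take the longest substring (up to max_len characters)
    found in the lexicon; fall back to a single character if nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical four-word lexicon; a real segmenter needs a large dictionary.
LEXICON = {"中文", "分词", "搜索", "引擎"}
SAMPLE = "中文分词搜索引擎"

print(unigram_break(SAMPLE))
print(max_match_break(SAMPLE, LEXICON))
```

The first call yields one token per character, while the second groups characters into the two-character words found in the lexicon. Greedy matching mis-segments ambiguous strings, which is one reason commercial word breakers rely on more elaborate methods.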
--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.
This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"Martin" <bartekma@gmail.com> wrote in message
news:1146168594.774392.124940@y43g2000cwc.googlegroups.com...
> I am looking for a commercially available Chinese word breaker for Index
> Server. I am looking for a word breaker that can perform actual Chinese
> word segmentation instead of treating each Chinese character as a word,
> as the current Index Server Chinese word breaker does.
>
> Does anyone know where I could find it?
>
|