TextBreak is not yet ready for end users, but is probably of interest to software developers.
TextBreak is a language-independent text-breaking module: a program that segments text into smaller units (words, for instance) for all languages (e.g. English and Chinese) with a single engine.
Project pages
Design
Overview
Suite
Result
Implementation strategy
This diagram shows the development strategy of TextBreak. Three sub-projects run simultaneously. Because implementing TextBreak in C is fairly difficult, a prototype was built in Python before the full implementation in C. However, some modules have already been written in C, for instance Dict, a dictionary stored in a trie structure. Python bindings are built to integrate these C modules with the prototype. In the last phase, the prototype will be ported to C.
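To illustrate how a trie-based dictionary can support text breaking, here is a minimal Python sketch. It is not TextBreak's code: the names `TrieDict`, `prefixes`, and `segment` are hypothetical, and greedy longest-match is only one simple strategy, assumed here for illustration. The actual Dict module and its Python binding may look quite different.

```python
class TrieDict:
    """Toy dictionary stored as a trie of nested dicts.

    Hypothetical illustration only; this is not the API of TextBreak's Dict.
    """

    _END = object()  # sentinel key marking that a word ends at this node

    def __init__(self, words=()):
        self._root = {}
        for word in words:
            self.add(word)

    def add(self, word):
        """Insert a word, creating one trie node per character."""
        node = self._root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def __contains__(self, word):
        """Return True only if the exact word was added."""
        node = self._root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return False
        return self._END in node

    def prefixes(self, text, start=0):
        """Return the end offsets of all dictionary words that begin at text[start]."""
        ends = []
        node = self._root
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if self._END in node:
                ends.append(i + 1)
        return ends


def segment(text, dictionary):
    """Greedy longest-match segmentation (an assumed strategy).

    Characters that start no dictionary word become single-character units.
    """
    units = []
    pos = 0
    while pos < len(text):
        ends = dictionary.prefixes(text, pos)
        end = ends[-1] if ends else pos + 1  # longest match, or one character
        units.append(text[pos:end])
        pos = end
    return units


if __name__ == "__main__":
    d = TrieDict(["the", "cat", "sat"])
    print(segment("thecatxsat", d))  # ['the', 'cat', 'x', 'sat']
```

A single walk down the trie finds every dictionary word that starts at a given position, which is why this structure suits text without spaces between words. Greedy matching can pick the wrong break when words overlap, so a real engine would likely need a more careful strategy.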
Status
It is not usable yet.