TextBreak is not yet ready for end users, but it is probably of interest to software developers.
TextBreak is a language-independent text-breaking module: a program that segments text into smaller units (words, for instance) for all languages (e.g. English and Chinese) using a single engine.
Project pages
https://gna.org/projects/textbreak/
Design
Overview

TextBreak overview
Suite

Breaking suites
Result

Breaking result
Implementation strategy
The diagram below shows the development strategy of TextBreak. Three sub-projects run simultaneously. Because implementing TextBreak in C is fairly difficult, a prototype was built in Python before the full implementation in C. However, some modules have already been written in C, for instance Dict, a dictionary stored in a trie structure. To integrate these C modules with the prototype, a Python binding is built. In the last phase, the prototype will be ported to C.

Implementation strategy of TextBreak
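To illustrate why a trie is a natural fit for a breaking dictionary, here is a minimal Python sketch of a trie-backed dictionary with a greedy longest-match segmenter. It is not TextBreak's code: the names `TrieDict`, `longest_prefix`, and `segment` are hypothetical, and greedy longest matching is only one simple breaking technique, assumed here for illustration.

```python
# Illustrative sketch only; TrieDict, longest_prefix and segment are
# hypothetical names, not the API of TextBreak or its Dict module.

_END = ""  # marker key; never collides with a one-character key


class TrieDict:
    """A dictionary of words stored as a trie of nested dicts."""

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.add(word)

    def add(self, word):
        """Insert a word, creating one trie node per character."""
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True

    def longest_prefix(self, text, start=0):
        """Return the length of the longest word starting at text[start],
        or 0 if no dictionary word starts there."""
        node = self.root
        best = 0
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if _END in node:
                best = i - start + 1
        return best


def segment(text, dictionary):
    """Greedy longest-match breaking. Characters that start no known
    word are emitted as single-character units."""
    units = []
    pos = 0
    while pos < len(text):
        length = dictionary.longest_prefix(text, pos) or 1
        units.append(text[pos:pos + length])
        pos += length
    return units


if __name__ == "__main__":
    english = TrieDict(["the", "cat", "cats", "sat", "at"])
    print(segment("thecatsat", english))   # ['the', 'cats', 'at']

    chinese = TrieDict(["我", "喜欢", "北京"])
    print(segment("我喜欢北京", chinese))    # ['我', '喜欢', '北京']
```

The same engine handles both inputs because it operates only on characters and dictionary entries, with no language-specific rules. The first example also shows the weakness of the greedy approach: it picks "cats" and then "at" rather than "cat" and "sat".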
Status
It is not usable yet.