This software project is a work in progress.

It is not yet ready for end users, but it is probably of interest to software developers.

TextBreak is a language-independent text-breaking module: a program that segments text into smaller units (words, for instance) for all languages, such as English and Chinese, using a single engine.
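To make the idea of breaking text into units concrete, here is a minimal Python sketch of greedy longest-match segmentation against a word list. This is one common dictionary-based technique and is only an assumption for illustration; the page does not state which algorithm TextBreak uses. The function name `segment` and the sample word list are hypothetical.

```python
def segment(text, dictionary):
    """Greedy longest-match segmentation (illustrative; not TextBreak's own algorithm)."""
    # Longest dictionary entry bounds how far ahead we need to look.
    max_len = max((len(w) for w in dictionary), default=1)
    units = []
    i = 0
    while i < len(text):
        # Whitespace separates units but is not a unit itself.
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                units.append(candidate)
                i += size
                break
    return units


# Hypothetical sample dictionary mixing English and Chinese entries.
words = {"text", "break", "北京", "大学", "北京大学"}
print(segment("textbreak", words))      # ['text', 'break']
print(segment("我爱北京大学", words))    # ['我', '爱', '北京大学']
```

The same function handles both scripts because it relies only on the dictionary, not on spaces between words, which is the sense in which a single engine can serve several languages.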

Project pages

Design

Overview

TextBreak overview

Suite

Breaking suites

Result

Breaking result

Implementation strategy

This diagram shows the development strategy of TextBreak. There are three sub-projects running simultaneously. Because implementing TextBreak in C is difficult, a prototype was built in Python before the full C implementation. However, some modules have already been written in C, for instance Dict, a dictionary stored in a trie structure. To integrate these C modules with the prototype, a Python binding is built. In the final phase, the prototype will be ported to C.

Implementation strategy of TextBreak
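For readers unfamiliar with the trie structure mentioned for Dict, the following pure-Python sketch shows how such a dictionary can store words and report every stored word that starts at a given position in a text. It is a stand-in for illustration only: the class name `Trie` and its methods are hypothetical and do not reflect the actual API of the C Dict module or its Python binding.

```python
class Trie:
    """Minimal prefix tree; an illustration, not the API of the C Dict module."""

    def __init__(self):
        self.children = {}    # maps one character to the next Trie node
        self.is_word = False  # True if the path from the root spells a stored word

    def insert(self, word):
        """Store a word by walking or creating one node per character."""
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def prefixes(self, text, start=0):
        """Return every stored word that begins at text[start], shortest first."""
        found = []
        node = self
        for end in range(start, len(text)):
            node = node.children.get(text[end])
            if node is None:
                break  # no stored word continues along this path
            if node.is_word:
                found.append(text[start:end + 1])
        return found


trie = Trie()
for w in ("text", "textual", "break"):
    trie.insert(w)

print(trie.prefixes("textualbreak"))            # ['text', 'textual']
print(trie.prefixes("textualbreak", start=7))   # ['break']
```

A trie suits this kind of lookup because all candidate words at a position are found in a single left-to-right walk, without testing each substring separately.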

Status

It is not usable yet.
