pyaiml21.utils.text_preprocessors.normalize_cjk_user_input
- pyaiml21.utils.text_preprocessors.normalize_cjk_user_input(s: str) → List[List[str]]
Perform CJK normalisation, then split the input into sentences and each sentence into words.
CJK (Chinese, Japanese, Korean) normalisation is equivalent to applying <explode> to each word, so every character becomes a word of its own. Unicode normalisation and conversion to uppercase are also applied.
- Parameters
s – user input to normalize
- Returns
list of sentences, each sentence is a list of words
- Example:
>>> text = u"こんにちは。この企画を気に入っていただけたでしょうか?"
>>> expected = [list("こんにちは"),
...             list("この企画を気に入っていただけたでしょうか")]
>>> normalize_cjk_user_input(text) == expected
True
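The following is a minimal sketch of the steps described above, not the library's actual implementation. The name sketch_normalize_cjk, the choice of NFKC as the Unicode normalisation form, and the set of sentence terminators are all assumptions made for illustration; pyaiml21 may differ on each of them.

```python
import re
import unicodedata
from typing import List

# Hypothetical set of sentence terminators (an assumption, not taken from pyaiml21).
_SENTENCE_END = re.compile(r"[。.!?]+")


def sketch_normalize_cjk(s: str) -> List[List[str]]:
    """Illustrative stand-in for normalize_cjk_user_input."""
    # Assumed normalisation form: NFKC folds fullwidth letters and
    # punctuation to their ASCII counterparts before uppercasing.
    s = unicodedata.normalize("NFKC", s).upper()

    sentences: List[List[str]] = []
    for chunk in _SENTENCE_END.split(s):
        # "Explode": every non-whitespace character becomes its own word.
        chars = [ch for ch in chunk if not ch.isspace()]
        if chars:  # skip empty pieces left by trailing terminators
            sentences.append(chars)
    return sentences
```

Under these assumptions, the sketch splits the example input into the same two character lists, and an input such as "abc!" yields a single sentence of uppercase characters.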