Skip to content

seanghay/khmercut

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

khmercut

A (fast) Khmer word segmentation toolkit.

  • A single python file
  • Using pycrfsuite only
pip install khmercut

Python

from khmercut import tokenize

tokenize("ឃាត់ខ្លួនជនសង្ស័យ០៤នាក់ ករណីលួចខ្សែភ្លើង នៅស្រុកព្រៃនប់")
# => ['ឃាត់ខ្លួន', 'ជនសង្ស័យ', '០៤', 'នាក់', ' ', 'ករណី', 'លួច', 'ខ្សែភ្លើង', ' ', 'នៅ', 'ស្រុក', 'ព្រៃនប់']

Reference

About

A (fast) Khmer word segmentation toolkit.

Topics

Resources

Stars

Watchers

Forks

Languages