Taiwanese Hokkien Transliterator and Tokeniser (Python, updated Aug 31, 2024)
A fast, simple, multilingual tokenizer
A lightweight WordPiece tokenizer
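WordPiece tokenizers segment a word by greedy longest-match-first lookup against a subword vocabulary. The sketch below illustrates that matching step only; the vocabulary is a hand-written toy (a real one is learned from a corpus), and the `##` continuation prefix follows the common BERT convention, not necessarily this repository's.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]", prefix="##"):
    # Greedy longest-match-first: repeatedly take the longest vocabulary
    # entry that matches a prefix of the remaining characters.
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate  # mark continuation pieces
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no vocabulary entry covers this span
        tokens.append(piece)
        start = end
    return tokens

# Toy vocabulary for illustration only.
vocab = {"token", "##iser", "##izer", "multi", "##lingual"}
print(wordpiece_tokenize("tokeniser", vocab))  # ['token', '##iser']
print(wordpiece_tokenize("gibberish", vocab))  # ['[UNK]']
```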
Conditional unigram tokenization: an extension of unigram tokenization that conditions target-token probabilities on source-language tokens from parallel data
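The idea as described can be illustrated with a small sketch: a standard unigram-style Viterbi segmenter whose per-piece score interpolates a conditional table p(piece | source token), hypothetically estimated from parallel data, with an unconditional unigram probability as fallback. The tables, values, and interpolation scheme below are illustrative assumptions, not the repository's actual method.

```python
import math

# Hypothetical tables (illustrative values, not learned from data):
# P_COND[src][piece] would come from parallel data; P_UNCOND is the fallback.
P_UNCOND = {"token": 0.02, "iser": 0.005, "tok": 0.01, "eniser": 0.001}
P_COND = {"Tokenisierer": {"tokeniser": 0.05, "token": 0.03, "iser": 0.02}}

def logprob(piece, source_tokens, alpha=0.5):
    # Interpolate the best conditional probability with the unconditional one.
    cond = max((P_COND.get(s, {}).get(piece, 0.0) for s in source_tokens),
               default=0.0)
    p = alpha * cond + (1 - alpha) * P_UNCOND.get(piece, 0.0)
    return math.log(p) if p > 0 else None

def segment(text, source_tokens):
    # Viterbi search over all segmentations, scoring each candidate piece
    # with the source-conditioned log-probability.
    n = len(text)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(end):
            lp = logprob(text[start:end], source_tokens)
            if lp is not None and best[start] + lp > best[end]:
                best[end], back[end] = best[start] + lp, start
    if best[n] == -math.inf:
        return [text]  # no segmentation scored; return the whole string
    pieces, end = [], n
    while end > 0:
        pieces.append(text[back[end]:end])
        end = back[end]
    return pieces[::-1]

# With the German source token present, the single piece "tokeniser"
# outscores "token" + "iser".
print(segment("tokeniser", ["Tokenisierer"]))  # ['tokeniser']
```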