Skip to content

Latest commit

 

History

History
42 lines (26 loc) · 1005 Bytes

README.md

File metadata and controls

42 lines (26 loc) · 1005 Bytes

DeepSeek Tokenizer

English | 中文

Introduction

DeepSeek Tokenizer is an efficient and lightweight tokenization libraries which doesn't require heavy dependencies like the transformers library, DeepSeek Tokenizer solely relies on the tokenizers library, making it a more streamlined and efficient choice for tokenization tasks.

Installation

To install DeepSeek Tokenizer, use the following command:

pip install deepseek_tokenizer

Basic Usage

Below is a simple example demonstrating how to use DeepSeek Tokenizer to encode text:

from deepseek_tokenizer import ds_token

# Sample text
text = "Hello! 毕老师!1 + 1 = 2 ĠÑĤвÑĬÑĢ"

# Encode text
result = ds_token.encode(text)

# Print result
print(result)

Output

[19923, 3, 223, 5464, 5008, 1175, 19, 940, 223, 19, 438, 223, 20, 6113, 257, 76589, 131, 100, 76032, 1628, 76589, 131, 108, 76589, 131, 98]

License

This project is licensed under the MIT License.