forked from zkoch/CEDICT_Parser
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.txt
21 lines (12 loc) · 1.16 KB
/
README.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
CC-CEDICT is an English Chinese dictionary freely available for use in applications and other such things. It can be downloaded here: http://www.mdbg.net/chindict/chindict.php?page=cc-cedict
The dictionary contains around 100k lines, and they all follow the same order:
TRADITIONAL_CHARS SIMPLIFIED_CHARS [PINYIN] /DEF 1/DEF 2
The biggest issue with this dictionary, however, is that the pinyin comes in the form: zhong1 guo2. That's not all that helpful for people, so we need to convert those into letters with actual tones marks: Zhōngguó
This code was originally writhed for use in converting the dictionary into a JS object for use in another project, but you should be able to extract what you need for your own purposes.
File Overview:
pinyin.py -> Within is a method that takes a string of pinyin and tone marks (e.g. zhong1 guo2) and converts to actual tone marks
parser.py -> Reads the dictionary, parses out the simplified chars, pinyin, and definitions, uses pinyin.py to convert the pinyin, and then puts the resulting dictionaries into an array.
GOALS:
- write cleaner code
- genericize it so people can use it easily
- do some command line wizardry maybe