AUX dictionary is a mechanism to add word entries to
data/dictionary_oss/dictionary0x.txt
without running the internal training
pipeline.
Note
We recommend to update TSV files in
data/dictionary_manual as an
easier way. Please consider using data/dictionary_manual/
at first.
key | value | base_key | base_value | cost_offset |
---|---|---|---|---|
あるぱか | アルパカ | かぴばら | カピバラ | -1 |
- aux_dictionary.tsv is a list of additional words to be added to Mozc's word dictionary.
- key and value (i.e. アルパカ) are fields for the new word.
- base_key and base_value (i.e. カピバラ) are fields of the reference word already in the dictionary.
- cost_offset is the difference from the base cost. (i.e. -1 makes the new word prioritized than the base word).
- other fields of the new word (e.g. lid, rid , cost) are copied from the reference word.
The entries in this file are used to filter out the entries in the dictionary.
key | value |
---|---|
あるぱか | 或るパカ |
key
andvalue
are the regexp patterns of words to be removed.- Each line is extended to the following regexp pattern:
'^{key}\t\\d+\t\\d+\t\\d+\t{value}(\t.*)?\n$'