A console app I wrote to aid in my study of Japanese. It automatically generates Anki decks from a Japanese text. For example, if you have a song that you enjoy but don't really understand, you can use the app to make two decks: one with all the words from the song, and one with all the kanji. Then, once you have learned all the cards, you will (hopefully) understand the whole song.
The app takes text as input, extracts words from it, and creates an Anki card for each of them. It also extracts all the kanji from the text and makes cards for them. The word and kanji definitions are fetched from Jisho. You can choose either tiny-segmenter or nagisa as the text segmentation engine. Of the two, nagisa tends to produce fewer garbage cards and generally does a better job, but it is also harder to set up.
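To give a feel for the kanji half of that pipeline, here is a minimal sketch of pulling the distinct kanji out of a text. It is an illustration, not the app's actual code: the function name extractKanji and the regex-based approach are my assumptions for the example.

```typescript
// Illustrative only: extractKanji and the regex approach are assumptions,
// not necessarily how the app itself finds kanji.

// Matches any character in the Unicode Han script. Built with the RegExp
// constructor so the "u" flag does not depend on the compile target.
const KANJI = new RegExp("\\p{Script=Han}", "gu");

/** Returns the distinct kanji in `text`, in order of first appearance. */
function extractKanji(text: string): string[] {
  const matches: string[] = text.match(KANJI) || [];
  // Keep only the first occurrence of each character.
  return matches.filter((ch, i) => matches.indexOf(ch) === i);
}

const sample = "日本語の歌を聞いて、日本語を学ぶ。";
console.log(extractKanji(sample).join(" ")); // 日 本 語 歌 聞 学
```

Kana and punctuation are skipped, and repeated kanji appear only once, which is roughly what you want for a one-card-per-kanji deck.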
First, you need to download the repo and install the required node modules. I'm running Windows 11 and Node v16.15.1; other Node versions should work as well.
git clone https://github.com/Equbuxu/JPTextToAnki.git
cd .\JPTextToAnki\
npm install
Then, put a .txt file with your text into files/inputs. Next, open files/config.json and find the line that says:
"input":"/files/inputs/..."
Change the path to point to your file. Also, change input-type to text-tiny-segmenter, since we don't want to bother with setting up Nagisa just yet. Now build the app and run it:
npx tsc
node build/index.js
You should end up with two files in files/output: kanji-NameOfYourFile.txt and words-NameOfYourFile.txt.
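If it helps to see the two config edits and the resulting file names in one place, here is a small sketch. The keys input and input-type and the /files/inputs/ prefix come from the steps above; the file name my-song.txt and both helper functions are hypothetical, and I'm assuming the output names simply reuse the input file's base name.

```typescript
// Sketch only. "my-song.txt", withInput and expectedOutputs are hypothetical;
// the real config.json may contain other keys, which are left untouched.

type Config = { [key: string]: unknown };
type InputType = "text-tiny-segmenter" | "text-nagisa";

/** Returns a copy of `config` that points at `fileName` and uses `inputType`. */
function withInput(config: Config, fileName: string, inputType: InputType): Config {
  return { ...config, "input": "/files/inputs/" + fileName, "input-type": inputType };
}

/** Output names to look for, assuming they reuse the input's base name. */
function expectedOutputs(fileName: string): string[] {
  const base = fileName.replace(/\.txt$/i, "");
  return ["kanji-" + base + ".txt", "words-" + base + ".txt"];
}

const updated = withInput({}, "my-song.txt", "text-tiny-segmenter");
console.log(JSON.stringify(updated, null, 2));
console.log(expectedOutputs("my-song.txt").join(", "));
```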
You'll need two note types in Anki: one for kanji notes and one for word notes. To create the note type for words, go to Tools -> Manage Note Types -> Add -> Add: Basic and name the new type JPTextToAnki-Word. Select the newly created type and click Fields. There, rename the Front field to Word. Then, exit the fields dialog and click Cards. There, choose Styling and replace the default styles with these:
.card {
  font-size: 24pt;
  text-align: left;
  color: black;
  background-color: #FFFAF0;
  font-family: yuumichou;
  line-height: 1.2em;
}
.weak {
  font-size: 16pt;
  color: gray;
  display: block;
  margin-top: 10px;
  line-height: 0.9;
}
.inline-weak {
  font-size: 16pt;
  color: gray;
}
.card-front {
  font-size: 35pt;
}
.title {
  font-weight: bold;
  margin-top: 5px;
  margin-left: 30px;
  height: 35px;
}
.info {
  font-size: 16pt;
}
Then, choose Front Template and replace the default template with this:
<span class="card-front">
{{Word}}
</span>
Now, to create the note type for kanji cards, return to the note types dialog. Press Add and choose Clone: JPTextToAnki-Word. Name the new type JPTextToAnki-Kanji. Then, open the fields dialog and rename the Word field to Kanji. Likewise, open the Cards dialog and, in the front template, change {{Word}} to {{Kanji}}.
This only needs to be done once. You can now import the files generated by the app.
Press Import File on the main Anki screen and choose one of the generated files, e.g. words-NameOfYourFile.txt. Choose the previously created note type instead of Basic (use JPTextToAnki-Word when importing words, and JPTextToAnki-Kanji when importing kanji). Pick a deck for the cards or create a new one. In the dropdown, choose Ignore lines where first field matches existing note (unless you want to keep duplicate cards). Make sure Allow HTML in fields is ticked. The rest of the options should be filled out correctly by default (Fields separated by: Tab, Field 1 of file maps to Word/Kanji, Field 2 maps to Back). Press Import. That's it!
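For reference, the import settings above imply a simple file layout: one note per line, two fields separated by a tab, with HTML allowed in the second field. The sketch below shows one way such a line could be built. The helper names and the exact HTML are hypothetical (I only borrowed the info class from the stylesheet above), so the files the app really produces may look different.

```typescript
// Hypothetical helpers: the app's real output format may differ in detail.

/** Escapes characters that would otherwise be read as HTML markup. */
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

/** Makes a value safe for one field of a tab-separated import file. */
function toField(value: string): string {
  return value
    .replace(/\t/g, " ")        // a tab would start a new field
    .replace(/\r?\n/g, "<br>"); // a newline would start a new note
}

/** Builds one import line: field 1 is the word, field 2 is the HTML back. */
function toImportLine(word: string, meanings: string[]): string {
  const back = meanings
    .map((m, i) => `<div class="info">${i + 1}. ${escapeHtml(m)}</div>`)
    .join("");
  return toField(word) + "\t" + toField(back);
}

console.log(toImportLine("歌", ["song", "singing"]));
```

Because the back field carries HTML, this is also why Allow HTML in fields has to be ticked during import.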
Nagisa is a library for text segmentation. It's a Python library, but the rest of the project is written in TypeScript. Because of that, I'm using Nagisa in a small Python server that provides a text segmentation API. You'll need to run that server alongside the main app. First, install Python 3.8 if you don't already have it. Then, from the project root, run:
cd split-words-server
py -3.8 -m pip install nagisa
py -3.8 split-words.py
This will launch the server on localhost:8000. After that, you can follow the instructions above for running the app with tiny-segmenter, but change input-type to text-nagisa in the config file.