A while ago, I had a 4-page school assignment that I was not very motivated to do. Since I figured the teacher would not be reading it anyway, I decided to generate the text with a Markov chain. I had one problem, though: in English, tokenizing words is easy, since they are separated by spaces. But in Japanese, words are written without any separation, which makes splitting them much harder. I ended up using an open source Japanese lexical analyzer: Wakame.

Although my school assignment got submitted, I had a few more ambitions: I wanted to port this to the web. I first looked into compiling Wakame to WebAssembly, but since it relies heavily on Linux syscalls, I would have ended up porting much of Linux to the web, which is inefficient to say the least. The next option I came up with was to build something similar to Wakame from the ground up.

Fortunately for me, Wakame is open source and well documented, so I was able to learn a few things about its inner workings. First, it uses specialized dictionaries that contain all possible conjugations and permutations as separate entries. Second, it uses a bi-gram Markov model, which means it estimates the likely type of the next word from the previous word and its type. I realized I could roughly replicate this behavior using a connectivity graph of parts of speech, which I have constructed right here. (insert an image here)
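To make the idea concrete, here is a minimal sketch in TypeScript of what such a connectivity graph could look like. The part-of-speech categories and connection rules below are made up for illustration; they are not Wakame's actual tables, and real Japanese analyzers use far more fine-grained categories.

```typescript
// A minimal sketch of the connectivity-graph idea (hypothetical categories and rules).

type PartOfSpeech = "noun" | "particle" | "verb" | "auxiliary" | "punctuation";

interface DictionaryEntry {
  surface: string;   // literal token text, with the conjugated form as its own entry
  pos: PartOfSpeech; // part of speech of this entry
}

// Connectivity graph: for each part of speech, which parts of speech may follow it.
const connectivity: Record<PartOfSpeech, PartOfSpeech[]> = {
  noun:        ["particle", "auxiliary", "punctuation"],
  particle:    ["noun", "verb"],
  verb:        ["auxiliary", "punctuation", "noun"],
  auxiliary:   ["punctuation", "particle"],
  punctuation: ["noun", "verb"],
};

// Bi-gram style check: given the previous token's part of speech,
// keep only the dictionary candidates that are allowed to follow it.
function allowedNext(prev: PartOfSpeech, candidates: DictionaryEntry[]): DictionaryEntry[] {
  const allowed = new Set(connectivity[prev]);
  return candidates.filter((entry) => allowed.has(entry.pos));
}

// Example: after a noun, the particle "は" is a valid continuation,
// while the bare verb stem "食べ" is filtered out under these toy rules.
const candidates: DictionaryEntry[] = [
  { surface: "は", pos: "particle" },
  { surface: "食べ", pos: "verb" },
];
console.log(allowedNext("noun", candidates).map((e) => e.surface)); // ["は"]
```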