Wiktionary dump in accessible JSON format
Just get the files: Wiktionary dump in JSON format
The English Wiktionary graciously provides regular dumps of all its content, so anyone can parse that content for their own use. Unfortunately, the parsing can be daunting enough to quickly turn an interested developer off, because:
- The dump file is very big (5.5 GB uncompressed).
- The dump file contains many pages besides the word entries.
- The English Wiktionary is a dictionary written in English, not a dictionary of English, so its entries include a lot of non-English words.
- The entries aren't in alphabetical order.
In other words, to get to the content you want, you first have to filter through millions of pages you aren't remotely interested in. Or you can spam the system with lots of API calls.
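Because the dump is too large to load at once, a streaming parser that discards each page after inspecting it is the natural way to do that filtering. The sketch below is not taken from Jsonbook; it only illustrates the idea with Python's standard-library `xml.etree.ElementTree.iterparse`, assuming the dump is a standard MediaWiki XML export in which ordinary dictionary entries are the pages in namespace 0. The function names `iter_entries` and `local_name` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace-uri}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_entries(source):
    """Yield (title, wikitext) for every page in namespace 0 (assumed: word entries).

    `source` is a path or binary file object holding a MediaWiki XML export.
    Pages are handled one at a time and discarded, so the whole dump is
    never held in memory.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the opening <mediawiki> tag
    for event, elem in context:
        if event != "end" or local_name(elem.tag) != "page":
            continue
        title = ns = text = None
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "ns":
                ns = child.text
            elif name == "text":
                text = child.text or ""
        if ns == "0":  # skip templates, categories, appendices, talk pages, etc.
            yield title, text
        root.clear()  # release the finished <page> subtree


if __name__ == "__main__":
    sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>gratis</title><ns>0</ns><revision><text>==English==</text></revision></page>
  <page><title>Template:en-noun</title><ns>10</ns><revision><text>t</text></revision></page>
</mediawiki>"""
    for title, text in iter_entries(io.BytesIO(sample)):
        print(title, repr(text))
```

Clearing the root after every page, rather than only the page element, keeps the already-processed `<page>` elements from accumulating under `<mediawiki>` over millions of iterations.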
Jsonbook helps remove the first roadblock to making awesome use of the Wiktionary dump. It does the following:
- Retrieve only the word articles
- Organize all the articles by language
- Convert the text to a hierarchical tree
- Save all the content to individual JSON files.
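The last three steps in the list above can be illustrated for a single page of wikitext. This is a minimal sketch, not Jsonbook's implementation: it assumes that level-2 headings such as `==English==` mark languages and that deeper headings (`===Etymology===`, `====Noun====`) nest beneath them. The function names, the record schema, and the `<language>/<word>.json` file layout are all hypothetical.

```python
import json
import re
from pathlib import Path

# A heading line such as "==English==" or "====Noun====" (levels 2 to 6).
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def build_tree(wikitext):
    """Nest the sections of one page by heading level.

    Every node is {"title", "level", "text", "children"}; the root has level 1
    and holds any text that appears before the first heading.
    """
    root = {"title": None, "level": 1, "text": "", "children": []}
    stack = [root]
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match is None:
            stack[-1]["text"] += line + "\n"  # body text belongs to the open section
            continue
        node = {"title": match.group(2), "level": len(match.group(1)),
                "text": "", "children": []}
        # Close every open section at the same or a deeper level.
        while stack[-1]["level"] >= node["level"]:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


def split_by_language(wikitext):
    """Map each language heading (level 2) to its section subtree."""
    tree = build_tree(wikitext)
    return {node["title"]: node for node in tree["children"] if node["level"] == 2}


def to_record(word, language, subtree):
    """Shape one (word, language) pair as a JSON-ready dict (hypothetical schema)."""
    return {"word": word, "language": language,
            "text": subtree["text"], "sections": subtree["children"]}


def save_entry(out_dir, word, language, subtree):
    """Write out_dir/<language>/<word>.json; '/' in titles is escaped as %2F."""
    path = Path(out_dir) / language / (word.replace("/", "%2F") + ".json")
    path.parent.mkdir(parents=True, exist_ok=True)
    record = to_record(word, language, subtree)
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

For each `(title, wikitext)` pair pulled from the dump, calling `split_by_language` and then `save_entry` once per language gives one JSON file per word and language. The stack keeps the tree construction to a single pass over the lines of the page.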
See a sample output of the entry for "gratis".
Currently it takes about 58 minutes to parse the entire English Wiktionary dump on a 2013 MacBook Pro.