Dictionary builder is a demonstration of advanced JAXB techniques for unmarshalling very large XML documents with a very low memory footprint. This project allows you to build dictionaries based on Wiktionary entries.
dictionary-builder is an EDLA project.
The purpose of edla.org is to promote the state of the art in various domains.
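
To illustrate the low-memory idea, here is a minimal sketch of streaming a Wiktionary-style document with StAX, which ships with the JDK. It is not the project's actual code: the class name StreamingTitles and the method forEachTitle are hypothetical, and the sample XML only mimics the page/title layout of a MediaWiki dump. Only the current parsing event is held in memory, so the size of the input file does not matter much.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch, not part of dictionary-builder.
public class StreamingTitles {

    // Hands the text of every <title> element to the callback, one at a time,
    // without ever building the whole document in memory.
    public static void forEachTitle(Reader xml, Consumer<String> onTitle) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "title".equals(reader.getLocalName())) {
                        // getElementText() consumes the element up to its end tag
                        onTitle.accept(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML input", e);
        }
    }

    public static void main(String[] args) {
        // Tiny stand-in for a real dump file
        String sample = "<mediawiki>"
                + "<page><title>chat</title></page>"
                + "<page><title>chien</title></page>"
                + "</mediawiki>";
        forEachTitle(new StringReader(sample), System.out::println);
    }
}
```

With JAXB on the classpath, the same loop could stop at each page element and call Unmarshaller.unmarshal(reader, Page.class) to bind one entry at a time, where Page would be a JAXB-annotated class (hypothetical here).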
-
Get a fresh Wiktionary backup
Choose your language and download the dump containing the current versions of the article content from the Wikimedia dumps site.
Example for the French dump:
http://dumps.wikimedia.org/frwiktionary/latest/frwiktionary-latest-pages-articles.xml.bz2
-
Uncompress the freshly downloaded dump somewhere
-
Edit dico.properties to set the language you chose, the location of the dump and, last but not least, the directory where the dictionary should be generated. (Make sure you have enough free disk space to store your dictionary.)
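
As a rough illustration of what such a file might contain and how it could be read, here is a small sketch using java.util.Properties. The key names (language, dump.path, output.dir), the sample paths and the class DicoConfigSketch are all assumptions made for this example; check the dico.properties shipped with the project for the real keys.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch, not part of dictionary-builder.
public final class DicoConfigSketch {

    // Parses properties-file text into a Properties object.
    public static Properties parse(String text) {
        Properties props = new Properties();
        try (StringReader reader = new StringReader(text)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the trimmed value for a key, failing fast when it is missing or blank.
    public static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required property: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // Assumed key names and paths, for illustration only
        String sample = String.join("\n",
                "language=fr",
                "dump.path=/data/frwiktionary-latest-pages-articles.xml",
                "output.dir=/data/dictionary");
        Properties props = parse(sample);
        System.out.println("Language:   " + require(props, "language"));
        System.out.println("Dump file:  " + require(props, "dump.path"));
        System.out.println("Output dir: " + require(props, "output.dir"));
    }
}
```

Failing fast on a missing key is a design choice for this sketch: a long run over a multi-gigabyte dump is a bad moment to discover that the output directory was never set.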
-
Build the project: mvn install
-
Launch the program: java -jar dictionary-builder.jar
-
For the French dictionary, 1167195 entries are generated in less than 15 minutes, and the dictionary requires 5 GB of disk space.
That's it.