Real-world example demonstrating advanced JAXB techniques to unmarshal very large XML documents with a very low memory footprint.

kikoaumond/dictionary-builder

 
 

Dictionary builder

About

Dictionary builder is a demonstration of advanced JAXB techniques to unmarshal very large XML documents with a very low memory footprint. This project allows you to build dictionaries based on Wiktionary entries.

dictionary-builder is an EDLA project.

The purpose of edla.org is to promote the state of the art in various domains.
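To illustrate the low-memory idea, here is a minimal sketch that reads a dump one element at a time with StAX, the streaming parser shipped with the JDK, instead of loading the whole document. It is not taken from this project's sources: the class name `PageStreamSketch`, the method `forEachTitle`, and the element names `page` and `title` (assumed to match the MediaWiki export layout) are illustrative assumptions. In a JAXB setup, the `XMLStreamReader` positioned on a `page` start tag could be handed to `Unmarshaller.unmarshal(reader, Page.class)`, where `Page` would be a hypothetical annotated class, so that only one page is held in memory at a time.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative sketch only; names are assumptions, not this project's API.
public class PageStreamSketch {

    /**
     * Streams the document and passes the text of each <title> found
     * inside a <page> element to the handler, one page at a time.
     */
    public static void forEachTitle(Reader xml, Consumer<String> handler)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // DTDs are not needed for this kind of input, so disable them.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        boolean inPage = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("page".equals(name)) {
                        inPage = true;
                    } else if (inPage && "title".equals(name)) {
                        // Reads the text and leaves the reader on </title>.
                        handler.accept(reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "page".equals(reader.getLocalName())) {
                    inPage = false;
                }
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String sample = "<mediawiki>"
                + "<page><title>chat</title></page>"
                + "<page><title>chien</title></page>"
                + "</mediawiki>";
        forEachTitle(new StringReader(sample), System.out::println);
    }
}
```

Passing each title to a handler rather than collecting everything in a list keeps memory use roughly constant regardless of how many pages the dump contains.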

How to use it

  1. Get a fresh Wiktionary backup
    Choose your favorite language and download the dump containing the current versions of article content from dumps.wikimedia.org.
    Example for the French dump:
    http://dumps.wikimedia.org/frwiktionary/latest/frwiktionary-latest-pages-articles.xml.bz2

  2. Uncompress the freshly downloaded dump somewhere

  3. Edit dico.properties to set the language you chose, the location of the dump, and, last but not least, where the dictionary should be generated. (Make sure you have enough free disk space to store the dictionary.)

  4. Build the project: mvn install

  5. Launch the program: java -jar dictionary-builder.jar

  6. For the French dictionary, 1167195 entries are generated in less than 15 minutes, and 5 GB of disk space is required.

That's it.
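As an illustration of step 3, the sketch below shows one way a program could read the three settings from a properties file using `java.util.Properties`. The key names `language`, `dump.path` and `dictionary.dir`, as well as the class `DicoConfigSketch`, are hypothetical; check the dico.properties shipped with the project for the actual keys.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Illustrative sketch only; the key names below are hypothetical.
public class DicoConfigSketch {

    /** Loads key=value pairs from any character source. */
    public static Properties load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        return props;
    }

    /** Returns the value for a key, failing fast when it is missing or blank. */
    public static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file content; forward slashes avoid backslash escaping.
        String sample = "language=fr\n"
                + "dump.path=/data/frwiktionary-latest-pages-articles.xml\n"
                + "dictionary.dir=/data/dictionary\n";
        Properties props = load(new StringReader(sample));
        System.out.println(require(props, "language"));
        System.out.println(require(props, "dump.path"));
        System.out.println(require(props, "dictionary.dir"));
    }
}
```

Failing fast on a missing key surfaces a misconfigured file before a long-running dump is processed.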
