Semester project for the course "Industrial Strength Multilingual Language Analysis" at Universität Tübingen, Winter Semester 17-18. This application analyzes Japanese text and displays information that helps English-speaking learners of Japanese, such as word segmentation, translation, pronunciation, and inflection information.
You can import it as a Maven project in Eclipse or IntelliJ IDEA and, after installing the GWT plugin, launch it as a GWT application. Check whether the resources folder is excluded from the Build Path; if it is, include it before building and launching the project.
For detailed documentation (information about the project, its use cases, implementation, linguistic background, plus screenshots), please refer to the PDF report.
- Add the (uncompressed) files to `src/main/webapp/WEB-INF/tokenize`. Make sure Eclipse notices the added files.
- If you want the tokenized content to be separated by something other than a blank space, change the `SEPARATOR` constant in `LookupServiceImpl.java`.
- Run the project in SuperDev mode, open the website, and press the "Tokenize files" button. The console logs which file is currently being read. Once all files have been processed, a pop-up appears on the website.
- The tokenized files are saved in the `target/JapaneseHelper-1.0-SNAPSHOT` directory. Note that this directory is rebuilt, and the files in it are deleted, whenever the project is run again!
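To illustrate what the `SEPARATOR` constant controls, here is a minimal sketch that joins already-segmented tokens into one output line. It is not the project's actual code: the class name `SeparatorSketch`, the method `joinTokens`, and the hand-segmented example sentence are hypothetical, and the real `LookupServiceImpl.java` may build its output differently.

```java
import java.util.List;

/**
 * Hypothetical sketch (not the project's LookupServiceImpl): shows how a
 * SEPARATOR constant could determine the delimiter between tokens.
 */
public class SeparatorSketch {

    // Assumed default: a single blank space, matching the README's description.
    static final String SEPARATOR = " ";

    // Joins tokens produced by some segmenter into a single output line.
    static String joinTokens(List<String> tokens) {
        return String.join(SEPARATOR, tokens);
    }

    public static void main(String[] args) {
        // Sentence segmented by hand for illustration; no tokenizer is called here.
        List<String> tokens = List.of("私", "は", "学生", "です");
        System.out.println(joinTokens(tokens)); // prints "私 は 学生 です"
    }
}
```

Under this sketch, setting `SEPARATOR` to `"\t"` would produce tab-separated tokens instead, which can be easier to process if tokens themselves might contain spaces.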
The Wikimedia Foundation licenses its texts on Wikipedia and Wiktionary under the Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license (here is the full text of the license). This applies to the files in our resource folders `/src/main/webapp/WEB-INF/dictionary`, `/src/main/webapp/WEB-INF/difficulty-rating`, and `/src/main/webapp/WEB-INF/inflection-templates`, which are based on Wikipedia/Wiktionary articles. These files, including any modifications we made, are licensed under the same license.