- I adapted the detection tool from https://github.com/platisd/duplicate-code-detection-tool.
The following Python packages must be installed:
- nltk: pip3 install --user nltk
- gensim: pip3 install --user gensim
- astor: pip3 install --user astor
- punkt (NLTK tokenizer data, downloaded through NLTK rather than pip): python3 -m nltk.downloader punkt
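Before running the detector, it can help to confirm that everything listed above is in place. The sketch below uses only the standard library (`importlib.util.find_spec`) plus NLTK's own `nltk.data.find`; the helper names `missing_packages` and `punkt_available` are hypothetical and not part of the tool.

```python
"""Check that the required packages and NLTK data are available (a sketch)."""
import importlib.util

# Python packages from the install list; "punkt" is NLTK data, not a package,
# so it is checked separately in punkt_available().
REQUIRED_PACKAGES = ["nltk", "gensim", "astor"]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names of packages that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def punkt_available():
    """Return True if NLTK is installed and its punkt tokenizer data is found."""
    if importlib.util.find_spec("nltk") is None:
        return False
    import nltk

    try:
        # nltk.data.find raises LookupError when the resource is not installed.
        nltk.data.find("tokenizers/punkt")
        return True
    except LookupError:
        return False


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    if not punkt_available():
        print("punkt not found; run: python3 -m nltk.downloader punkt")
```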
- Clone the repository with git (it contains 20 repositories collected from GitHub; you can add more).
- Put the duplicate_detect_java.py script, the Reference folder, and all of the group assignments together in the same directory, as shown in the following screenshot:
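A quick layout check can catch a misplaced folder before a long run. This is a minimal sketch that assumes a flat layout: `duplicate_detect_java.py` and `Reference/` sit directly in one directory, and every other (non-hidden) subdirectory holds one group assignment. The function `check_layout` and that folder rule are hypothetical; adjust them to match your own setup.

```python
"""Sanity-check the working directory layout (a sketch; the layout is assumed)."""
from pathlib import Path

SCRIPT_NAME = "duplicate_detect_java.py"
REFERENCE_DIR = "Reference"


def check_layout(root):
    """Return (problems, assignment_dirs) for the directory `root`.

    Assumption: the script and the Reference folder sit directly in `root`,
    and every other non-hidden subdirectory is one group assignment.
    """
    root = Path(root)
    problems = []
    if not (root / SCRIPT_NAME).is_file():
        problems.append(f"missing {SCRIPT_NAME}")
    if not (root / REFERENCE_DIR).is_dir():
        problems.append(f"missing {REFERENCE_DIR}/ folder")
    # Hidden folders such as .git are skipped.
    assignments = sorted(
        p.name
        for p in root.iterdir()
        if p.is_dir() and p.name != REFERENCE_DIR and not p.name.startswith(".")
    )
    if not assignments:
        problems.append("no assignment folders found")
    return problems, assignments


if __name__ == "__main__":
    problems, assignments = check_layout(".")
    print("Assignments:", assignments)
    for problem in problems:
        print("Problem:", problem)
```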
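The similarity analysis is based on TF-IDF (term frequency-inverse document frequency) weighting, which compares which tokens appear in two files and how often, not what the code actually does, so expect many false positives.
The reported results are not a substitute for human analysis.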
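To see where the false positives come from, here is a stdlib-only sketch of TF-IDF weighting plus cosine similarity. It is not the tool's code (the tool builds on gensim); it is a plain re-implementation using the smoothed IDF variant `ln((1 + N) / (1 + df)) + 1`, chosen so tokens shared by every file keep a nonzero weight. Because TF-IDF treats each file as a bag of tokens, two snippets with the same tokens in a different order score as identical even when their logic is reversed.

```python
"""Why TF-IDF similarity yields false positives: a stdlib-only sketch."""
import math
import re
from collections import Counter


def tokenize(source):
    """Split source code into identifier/keyword tokens (simplified)."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)


def tfidf_vectors(documents):
    """Return one sparse {token: weight} dict per document (smoothed IDF)."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for toks in token_lists for tok in set(toks))
    vectors = []
    for toks in token_lists:
        tf = Counter(toks)
        vectors.append({
            tok: count * (math.log((1 + n_docs) / (1 + df[tok])) + 1)
            for tok, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    original = "if (balance > limit) { alert(); } else { save(); }"
    reordered = "if (limit > balance) { save(); } else { alert(); }"
    unrelated = "for (int i = 0; i < n; i++) { total += values[i]; }"
    vecs = tfidf_vectors([original, reordered, unrelated])
    print(round(cosine(vecs[0], vecs[1]), 3))  # prints 1.0: same tokens, opposite logic
    print(round(cosine(vecs[0], vecs[2]), 3))  # prints 0.0: no shared tokens
```

The first pair behaves differently at runtime yet gets a perfect score, which is exactly the kind of result that needs a human to review it.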
Typically, a duplicated file is highly similar to only one other file, and a duplicated project contains 20 or more such files.
You can change the similarity threshold to adjust the sensitivity: lowering it reports more candidate pairs, raising it reports fewer (I set it to 60 in my case).
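The two heuristics above (one strong match per copied file, 20+ copied files per copied project) can be combined with the threshold when triaging results. This sketch assumes hypothetical `(assignment_file, reference_file, score)` triples with scores in percent, and assumes the first path component of `assignment_file` names the group project; it does not parse the tool's actual output format. `best_matches` and `flag_projects` are illustrative names only.

```python
"""Apply a similarity threshold and aggregate hits per project (a sketch)."""
from collections import defaultdict
from pathlib import PurePosixPath


def best_matches(pairs):
    """Keep only the highest-scoring reference file for each assignment file."""
    best = {}
    for file, ref, score in pairs:
        if file not in best or score > best[file][1]:
            best[file] = (ref, score)
    return best


def flag_projects(pairs, threshold=60, min_files=20):
    """Return {project: sorted flagged files} for projects with >= min_files hits.

    Assumption: the first path component of each assignment file is its project.
    """
    flagged = defaultdict(list)
    for file, (_ref, score) in best_matches(pairs).items():
        if score >= threshold:
            project = PurePosixPath(file).parts[0]
            flagged[project].append(file)
    return {p: sorted(files) for p, files in flagged.items() if len(files) >= min_files}


if __name__ == "__main__":
    pairs = [
        ("group1/A.java", "Reference/repo1/A.java", 85.0),
        ("group1/A.java", "Reference/repo2/X.java", 40.0),
        ("group2/B.java", "Reference/repo3/B.java", 30.0),
    ]
    # min_files=1 only so that this tiny example flags something.
    print(flag_projects(pairs, threshold=60, min_files=1))  # prints {'group1': ['group1/A.java']}
```

Keeping just the best match per file reflects the observation that a real copy usually resembles one reference file, while the `min_files` cut-off separates a copied project from a handful of coincidental matches.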