- I adapted the detection tool from https://github.com/platisd/duplicate-code-detection-tool
The following Python packages have to be installed:
- nltk
pip3 install --user nltk
- gensim
pip3 install --user gensim
- astor
pip3 install --user astor
- punkt (NLTK tokenizer data, downloaded through nltk rather than pip)
python -m nltk.downloader punkt
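
To confirm the installs above succeeded before running the script, a small check like the following may help. It is only a sketch: `missing_packages` and `REQUIRED_PACKAGES` are hypothetical names, not part of the tool, and the check covers only the three pip packages, not the punkt data.

```python
import importlib.util

# Packages listed in the install steps above (punkt is data, not a package).
REQUIRED_PACKAGES = ["nltk", "gensim", "astor"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```
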
- Clone the repository with git. It contains 20 repositories from GitHub, and you can add more.
- Put the duplicate_detect_java.py script, the Reference folder, and all of your group assignments together in one directory, as shown in the following screenshot:
- The similarity analysis is based on a TF-IDF model, which compares weighted token frequencies rather than program structure, so expect many false positives.
- The reported results cannot substitute for human analysis.
- Typically, a duplicated file is similar to only one reference file, and a duplicated project contains 20 or more duplicated files.
- You can change the threshold to adjust the sensitivity (I set it to 60 in my case).
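
To make the notes above concrete, here is a minimal sketch of TF-IDF similarity with a threshold. It is not the code in duplicate_detect_java.py: it uses only the standard library instead of gensim, all function names are hypothetical, and it assumes that scores are percentages on a 0 to 100 scale (so a threshold of 60 means 60%) and that submission names look like `project/File.java`.

```python
import math
import re
from collections import Counter


def tokenize(source):
    # Lowercase identifier-like tokens; a real tool may tokenize differently.
    return re.findall(r"[A-Za-z_]\w*", source.lower())


def tfidf_vectors(docs):
    # docs maps a key to source text; returns key -> {token: tf-idf weight}.
    tokens = {key: tokenize(text) for key, text in docs.items()}
    n_docs = len(tokens)
    doc_freq = Counter()
    for toks in tokens.values():
        doc_freq.update(set(toks))
    vectors = {}
    for key, toks in tokens.items():
        term_freq = Counter(toks)
        # Smoothed IDF: tokens found in every document keep a small weight.
        vectors[key] = {
            tok: count * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, count in term_freq.items()
        }
    return vectors


def cosine_percent(a, b):
    # Cosine similarity of two sparse vectors, scaled to 0..100.
    dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return 100.0 * dot / (norm_a * norm_b)


def flag_duplicates(submissions, references, threshold=60.0):
    # Report each submission's best-matching reference if it reaches the threshold.
    docs = {("sub", name): text for name, text in submissions.items()}
    docs.update({("ref", name): text for name, text in references.items()})
    vectors = tfidf_vectors(docs)
    flagged = []
    for sub in submissions:
        best_ref, best_score = None, 0.0
        for ref in references:
            score = cosine_percent(vectors[("sub", sub)], vectors[("ref", ref)])
            if score > best_score:
                best_ref, best_score = ref, score
        if best_ref is not None and best_score >= threshold:
            flagged.append((sub, best_ref, round(best_score, 1)))
    return flagged


def hits_per_project(flagged):
    # Assumption: the project is the first path component of the submission name.
    return Counter(name.split("/", 1)[0] for name, _, _ in flagged)
```

Raising `threshold` reports fewer, stronger matches, and lowering it reports more candidates along with more false positives. A project with a high count in `hits_per_project` is a candidate for manual review, not a verdict.
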