LightChaser666/CE2002_Duplication_Check

Detection Script Usage

Dependencies

The following Python packages, plus the NLTK punkt data, must be installed:

  • nltk
    • pip3 install --user nltk
  • gensim
    • pip3 install --user gensim
  • astor
    • pip3 install --user astor
  • punkt (NLTK tokenizer data, downloaded through nltk rather than pip)
    • python -m nltk.downloader punkt
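
Before running the script, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the repository: the helper name missing_packages and the REQUIRED_PACKAGES list are hypothetical, and the check covers only the three pip packages, not the punkt data.

```python
import importlib.util

# Packages named in the Dependencies list above (punkt is NLTK data, not a module).
REQUIRED_PACKAGES = ["nltk", "gensim", "astor"]


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported missing, install it with the matching pip3 command from the list above.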

Usage

  • Clone this repository. It includes 20 reference repositories from GitHub, and you can add more.
  • Put the duplicate_detect_java.py script, the Reference folder, and all of your group assignments in the same directory, as in the following screenshot:

[Screenshot: tool_usage]

  • Open your IDE and modify the script to match your group names:

[Screenshot: tool_usage2]

  • Run the script to see the results.

Important note

  • The similarity analysis is based on TF-IDF term weighting, which compares token frequencies rather than program structure, so expect many false positives.

  • The reported results are not a substitute for human review.

  • Typically, a duplicated file is similar to only one file, and a duplicated project contains 20 or more duplicated files.

  • You can change the threshold to adjust the sensitivity (I set it to 60 in my case).
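
To illustrate how a TF-IDF similarity score and a threshold interact, here is a minimal standard-library sketch. It is not the code in duplicate_detect_java.py, which relies on gensim and nltk. The function names, the regex tokenizer, the smoothed IDF formula, and the reading of the threshold as a percentage on a 0 to 100 scale are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(source):
    """Split source text into lowercase identifier-like and numeric tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+", source.lower())


def tfidf_vectors(documents):
    """Return one L2-normalised {token: weight} dict per document."""
    counts_per_doc = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(counts_per_doc)
    doc_freq = Counter()
    for counts in counts_per_doc:
        doc_freq.update(counts.keys())
    vectors = []
    for counts in counts_per_doc:
        # Smoothed IDF: tokens present in every document keep a small weight.
        vec = {
            tok: tf * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, tf in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def flag_similar(submissions, references, threshold=60.0):
    """Return (submission, reference, score) triples with score >= threshold.

    Scores are percentages; treating the threshold as a percentage is an
    assumption of this sketch.
    """
    sub_names, ref_names = list(submissions), list(references)
    texts = [submissions[n] for n in sub_names] + [references[n] for n in ref_names]
    vectors = tfidf_vectors(texts)
    sub_vecs = vectors[: len(sub_names)]
    ref_vecs = vectors[len(sub_names):]
    hits = []
    for s_name, s_vec in zip(sub_names, sub_vecs):
        for r_name, r_vec in zip(ref_names, ref_vecs):
            score = 100.0 * cosine(s_vec, r_vec)
            if score >= threshold:
                hits.append((s_name, r_name, score))
    return hits


submissions = {"A.java": "public class Foo { int bar() { return 1; } }"}
references = {
    "R1.java": "public class Foo { int bar() { return 1; } }",
    "R2.java": "import java.util.List; class Zed { void quux(List x) { x.clear(); } }",
}
result = flag_similar(submissions, references, threshold=60.0)
```

In this sketch, lowering the threshold reports more pairs (and more false positives), while raising it reports only near-identical files. Because the score depends only on shared tokens, unrelated files that use the same common identifiers can still score high, which is why manual review remains necessary.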

About

Checking tool for CE2002 Final Group Assignment
