In this project, we use deep learning for document summarization and plagiarism detection

abdoulfataoh/doc-summary-and-plagiarism-detection

Description

In this project, we use deep learning models for plagiarism detection and document summarization.

Config

  • Clone the project
  git clone https://github.com/abdoulfataoh/doc-summary-and-plagiarism-detection.git
  cd doc-summary-and-plagiarism-detection
  • Install poetry for virtual environment management
  sudo apt-get update
  sudo apt-get install curl
  curl -sSL https://install.python-poetry.org | python3 -
  • Install dependencies with poetry and activate the virtual environment
  poetry install --dev
  poetry shell
  • Install spaCy and NLTK language models
  python -m spacy download en_core_web_sm
  python -m spacy download fr_core_news_sm
  python -c "import nltk;nltk.download('punkt')"
  python -c "import nltk;nltk.download('stopwords')"
  • (Optional) Use test configuration and file
  echo -n 'TEST=True' > .env
  make flake8
  make test
  • Set configuration variables

The system is configured through environment variables.

Use the export command to set a variable's value.

The complete list of settings variables can be found in app/settings.py

For example:

  export OPENAI_API_KEY='API KEY'  # Your OpenAI API key, used to interact with the ChatGPT model
  export TEST=True  # Enables test mode
  export WORKDIR='path/to/workdir'  # default value is 'static'
  export PLAGIARISM_TRAIN_DATASET_FOLDER='path/to/dataset'  # PDFs dataset folder path
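
To illustrate how variables like these are typically consumed, here is a minimal sketch that reads them from the environment. It is not the content of app/settings.py: the function name load_settings and the parsing of TEST are assumptions made for illustration. Only the variable names and the 'static' default for WORKDIR come from the examples above.

```python
import os


def load_settings(env=None):
    """Read the example variables from a mapping (defaults to os.environ).

    Hypothetical helper: the real settings live in app/settings.py and
    may be parsed differently.
    """
    env = os.environ if env is None else env
    return {
        "OPENAI_API_KEY": env.get("OPENAI_API_KEY"),
        # Assumption: TEST is treated as enabled for "True"/"true"/"1".
        "TEST": env.get("TEST", "False").strip().lower() in ("true", "1"),
        # The README states that WORKDIR defaults to 'static'.
        "WORKDIR": env.get("WORKDIR", "static"),
        "PLAGIARISM_TRAIN_DATASET_FOLDER": env.get("PLAGIARISM_TRAIN_DATASET_FOLDER"),
    }


if __name__ == "__main__":
    settings = load_settings()
    print(settings["WORKDIR"], settings["TEST"])
```

Passing a plain dict instead of os.environ makes the helper easy to exercise in tests without touching the real environment.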

Workflows

Plagiarism detection

  • Train

  • Create embeddings

  • Predict
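
This README does not describe how predictions are computed. As a general illustration of comparing a document against stored embeddings, the sketch below ranks candidates by cosine similarity. The function names, the toy vectors, and the choice of cosine similarity are all assumptions, not the project's actual method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def most_similar(query, corpus):
    """Return (name, score) of the corpus vector closest to the query.

    corpus is a hypothetical dict mapping a document name to its embedding.
    """
    scored = ((name, cosine_similarity(query, vec)) for name, vec in corpus.items())
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    corpus = {"doc_a": [0.0, 1.0], "doc_b": [2.0, 0.0]}  # toy embeddings
    print(most_similar([1.0, 0.0], corpus))
```

A high score only signals that two texts are close in the embedding space; deciding what counts as plagiarism would still require a threshold chosen for the dataset.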

Summarize

  • Predict predict

Usage with Streamlit

  1. If TEST is set to True, disable test mode
  rm .env
  2. Train models

The dataset must consist of PDF files stored in assets/dataset/plagiarism/train/

  make train
  3. Create embeddings
  make embeddings
  4. Run the Streamlit server to use the app
  make streamlit-server
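
Since training expects PDF files in the dataset folder, it can help to check the folder before running make train. The sketch below is a hypothetical pre-flight check, not part of the project; the helper name list_pdf_files is an assumption, and the folder path is the one given in the steps above.

```python
from pathlib import Path


def list_pdf_files(folder):
    """Return the PDF files directly inside folder, sorted by path.

    Returns an empty list if the folder does not exist.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    pdfs = list_pdf_files("assets/dataset/plagiarism/train/")
    if not pdfs:
        print("No PDF files found; add some before running 'make train'.")
    else:
        print(f"Found {len(pdfs)} PDF file(s).")
```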
