GitHub - abdoulfataoh/doc-summary-and-plagiarism-detection: In this project, we use deep learning for document summarization and plagiarism detection

Description

In this project, we use deep learning models for plagiarism detection and document summarization.

Config

Clone the project

  git clone https://github.com/abdoulfataoh/doc-summary-and-plagiarism-detection.git
  cd doc-summary-and-plagiarism-detection

Install Poetry for virtual environment management

  sudo apt-get update
  sudo apt-get install curl
  curl -sSL https://install.python-poetry.org | python3 -

Install dependencies with Poetry and activate the virtual environment

  poetry install --dev
  poetry shell

Install spaCy and NLTK language models

  python -m spacy download en_core_web_sm
  python -m spacy download fr_core_news_sm
  python -c "import nltk;nltk.download('punkt')"
  python -c "import nltk;nltk.download('stopwords')"
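Pipelines installed with `spacy download` are ordinary Python packages, so their presence can be checked before running the app. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of this repository, and the NLTK data (punkt, stopwords) is not covered by this check.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found as importable top-level packages."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The two spaCy models installed by the commands above.
    required = ["en_core_web_sm", "fr_core_news_sm"]
    missing = missing_packages(required)
    if missing:
        print("Missing spaCy models:", ", ".join(missing))
    else:
        print("All spaCy models are installed.")
```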

(Optional) Use the test configuration, then run the linter and the tests

  echo -n 'TEST=True' > .env
  make flake8
  make test
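The `.env` file written above holds plain KEY=VALUE lines. How the project actually loads this file is not shown here, so the sketch below only illustrates the format with a small hand-written parser; `parse_env_text` is a hypothetical helper, not a function from this repository.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes, if any.
        values[key.strip()] = value.strip().strip("'\"")
    return values


if __name__ == "__main__":
    print(parse_env_text("TEST=True"))
```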

Settings variables

The system is configured through environment variables.

Use the export command to set a variable.

The complete list of settings variables can be found in app/settings.py.

For example:

  export OPENAI_API_KEY='API KEY'  # Your OpenAI API key, used to interact with the ChatGPT model
  export TEST=True  # Enable test mode
  export WORKDIR='path/to/workdir'  # Default value is 'static'
  export PLAGIARISM_TRAIN_DATASET_FOLDER='path/to/dataset'  # Path to the folder containing the PDF dataset
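For readers who want to see how exported variables like these can be consumed from Python, here is a minimal sketch. It is an assumption for illustration only and does not reproduce app/settings.py; the helpers `env_bool` and `load_settings` are hypothetical, and the only default taken from this README is 'static' for WORKDIR.

```python
import os


def env_bool(name, default=False, environ=None):
    """Interpret an environment variable as a boolean flag."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(environ=None):
    """Collect the variables shown above into a dict (illustrative only)."""
    environ = os.environ if environ is None else environ
    return {
        "OPENAI_API_KEY": environ.get("OPENAI_API_KEY"),
        "TEST": env_bool("TEST", environ=environ),
        "WORKDIR": environ.get("WORKDIR", "static"),
        "PLAGIARISM_TRAIN_DATASET_FOLDER": environ.get(
            "PLAGIARISM_TRAIN_DATASET_FOLDER"
        ),
    }


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dict as `environ` makes the function easy to try without touching the real environment, for example `load_settings({"TEST": "True"})`.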

Workflows

Plagiarism detection

Train
Create embeddings
Predict

Summarize

Predict
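This README does not describe how the predict step of the plagiarism workflow compares a document against the stored embeddings. One common approach, assumed here purely for illustration, is to rank stored vectors by cosine similarity to the query vector. The function names and the toy vectors below are hypothetical and are not taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, corpus):
    """Return (name, score) for the corpus vector closest to `query`.

    `corpus` maps a document name to its embedding vector.
    """
    scored = ((name, cosine_similarity(query, vec)) for name, vec in corpus.items())
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real embeddings have many more dimensions.
    corpus = {"doc_a": [1.0, 0.1], "doc_b": [0.0, 1.0]}
    print(most_similar([1.0, 0.0], corpus))
```

A score close to 1 means the two vectors point in nearly the same direction; the threshold above which a passage counts as plagiarized would be a project-specific choice.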

Usage with streamlit

(If TEST is set to True in .env) Disable test mode by removing the file

  rm .env

Train models

The dataset must consist of PDF files stored in assets/dataset/plagiarism/train/

  make train
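Before training, it can help to confirm that the dataset folder actually contains PDF files. The sketch below is a hypothetical pre-flight check, not part of the repository; it only looks at the top level of the folder, since this README does not say whether the training step also reads subfolders.

```python
from pathlib import Path


def list_pdfs(folder):
    """Return the sorted PDF files found directly inside `folder`."""
    path = Path(folder)
    if not path.is_dir():
        return []
    return sorted(
        p for p in path.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    # Dataset location stated in this README.
    pdfs = list_pdfs("assets/dataset/plagiarism/train/")
    print(f"Found {len(pdfs)} PDF file(s)")
```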

Create embeddings

  make embeddings

Run the streamlit server to use the app

  make streamlit-server
