RishabD/Doctor-Assistant

Doctor-Assistant

Setup

To set up this project, you need Python 3.12 or higher. You can install the dependencies either from requirements.txt or with Poetry, using the provided poetry.lock file.

You will also need NVIDIA CUDA 12.1 installed on the system.

Server

Create a .env file following the structure shown in .env.example. FAISS_STORAGE_DIR is the directory where the index files for the FAISS vector store will be saved. KB_DIR is the directory where you place the text files you want indexed. For this project, the textbooks provided in this repository were used.
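Both directories have to exist before the index is built. The snippet below is a minimal sketch of one way to create them, assuming FAISS_STORAGE_DIR and KB_DIR have already been loaded into the process environment (loading the .env file itself is not shown). The helper name ensure_dirs is hypothetical and is not part of this repository.

```python
import os
from pathlib import Path


def ensure_dirs(env=None):
    """Create the directories named by FAISS_STORAGE_DIR and KB_DIR.

    `env` is any mapping of variable names to values; it defaults to
    os.environ. This helper is illustrative only, not the project's code.
    """
    env = os.environ if env is None else env
    paths = {}
    for key in ("FAISS_STORAGE_DIR", "KB_DIR"):
        value = env.get(key)
        if not value:
            raise KeyError(f"{key} is not set; check your .env file")
        path = Path(value)
        # parents=True creates intermediate directories,
        # exist_ok=True makes the call safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
        paths[key] = path
    return paths
```

Calling ensure_dirs() with no arguments reads the two variables from the current environment and returns the resulting paths.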

After creating the directories and updating the .env file, run python -m scripts.create_faiss_index (make sure your Python environment is activated first).

Lastly, start the server using fastapi dev .\server.py --no-reload --port 8000. There is another repository with a React-based web UI that can be used to interface with this server.

Scripts

The scripts directory contains several scripts for setting up and evaluating the various RAG implementations. To use them, create and configure a .env.scripts file. All required fields are documented in the .env.scripts.example file.

Scripts should be run in module mode (python -m ...). The scripts are:

  • scripts.create_faiss_index.py creates a FAISS index from the files in the KB_DIR and stores the data in the FAISS_STORAGE_DIR. The index is created using the CHUNK_SIZE and CHUNK_OVERLAP parameters in the .env.scripts file.
  • scripts.create_test_output.py runs all RAG methods over the question answer test set (subset of the MedQuAD dataset) from the TEST_INPUT_DIR. The generated outputs are stored in the TEST_OUTPUT_DIR.
  • scripts.create_test_score.py calculates BERTScore and cosine similarity between the generated response of each RAG method and the expected response. This script reads the generated output from the TEST_OUTPUT_DIR and stores the scores in the TEST_OUTPUT_DIR.
  • scripts.create_text_mult_output.py reads the multiple-choice test set (subset of the MedQA dataset) from the TEST_INPUT_DIR and runs all RAG methods over it. Results are stored in the TEST_OUTPUT_DIR.
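To illustrate two ideas named in the list above, here is a rough sketch of how CHUNK_SIZE and CHUNK_OVERLAP could interact when splitting text, and of how cosine similarity between two embedding vectors is computed. This is an assumption-laden illustration, not the repository's code: the actual scripts may split by tokens or sentences rather than characters and may compute similarity through a library. The function names chunk_text and cosine_similarity are hypothetical.

```python
import math


def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into character windows of chunk_size characters,
    where consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

For example, chunk_text("abcdefghij", 4, 2) returns ["abcd", "cdef", "efgh", "ghij"]: each chunk repeats the last two characters of the previous one. A larger overlap keeps more context across chunk boundaries at the cost of a larger index.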
