Merge pull request #194 from alan-turing-institute/refactor
Refactor
rchan26 authored Jun 13, 2024
2 parents 9c32d82 + 2da4a62 commit 66e1ae2
Showing 42 changed files with 2,001 additions and 3,367 deletions.
20 changes: 10 additions & 10 deletions ENVIRONMENT_VARIABLES.md
@@ -1,9 +1,9 @@
## Environment variables for running Reginald

- To set up the Reginald app (which consists of _both_ the full response engine along with the Slack bot), you can use the `reginald_run` on the terminal. To see the CLI arguments, you can simply run:
+ To set up the Reginald app (which consists of _both_ the full response engine along with the Slack bot), you can use the `reginald run_all` command on the terminal. To see the CLI arguments, you can simply run:

```bash
- reginald_run --help
+ reginald run_all --help
```

**Note**: specifying CLI arguments will override any environment variables set.
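
For instance, a minimal sketch of this precedence rule (model names as listed in [MODELS.md](MODELS.md)):

```bash
# REGINALD_MODEL is set in the environment, but the CLI flag
# takes precedence, so the llama-index-llama-cpp model is used
export REGINALD_MODEL="hello"
reginald run_all --model llama-index-llama-cpp
```
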
@@ -33,7 +33,7 @@ For creating a data index, you must set the GitHub token environment variable `G

### Model environment variables

- Lastly, to avoid using CLI variables and be able to simply use `reginald_run`, you can also set the following variables too:
+ Lastly, to avoid using CLI variables and be able to simply use `reginald run_all`, you can also set the following variables:

- `REGINALD_MODEL`: name of the model to use (see the [models README](MODELS.md) for the list of models available)
- `REGINALD_MODEL_NAME`: name of the sub-model to use with the model requested, if not using the `hello` model.
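
As an illustrative sketch, a `.env` file along these lines could stand in for the CLI flags (the values are placeholders borrowed from the Huggingface example in [MODELS.md](MODELS.md)):

```bash
# .env -- source this before running `reginald run_all`
export REGINALD_MODEL="llama-index-hf"
export REGINALD_MODEL_NAME="microsoft/phi-1_5"
```
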
@@ -59,24 +59,24 @@ source .env

## Environment variables for running _only_ the response engine

- To set up the Reginald response engine (without the Slack bot), you can use the `reginald_run_engine` on the terminal. To see the CLI arguments, you can simply run:
+ To set up the Reginald response engine (without the Slack bot), you can use the `reginald run_all_api_llm` command on the terminal. To see the CLI arguments, you can simply run:

```bash
- reginald_run_api_llm --help
+ reginald run_all_api_llm --help
```

- The CLI arguments are largely the same as `reginald_run` except that the Slack bot tokens are not required (as they will be used to set up the Slack bot which will call the response engine via an API that is set up using `reginald_run_api_llm`). You can also use the same environment variables as `reginald_run` except for the Slack bot tokens.
+ The CLI arguments are largely the same as for `reginald run_all`, except that the Slack bot tokens are not required (the tokens are only needed by the Slack bot, which calls the response engine via the API set up by `reginald run_all_api_llm`). You can also use the same environment variables as for `reginald run_all`, minus the Slack bot tokens.

- You can still use the same `.env` file that you used for `reginald_run` to set up the environment variables or choose to have a separate `.response_engine_env` file to store the environment variables required for the response engine set up.
+ You can still use the same `.env` file that you used for `reginald run_all` to set up the environment variables, or choose to have a separate `.response_engine_env` file to store the environment variables required for the response engine setup.
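
For example, a minimal sketch of the separate-file approach (assuming `.response_engine_env` contains the relevant variables):

```bash
# load the response engine variables, then start the engine API
source .response_engine_env
reginald run_all_api_llm
```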

## Environment variables for running _only_ the Slack bot

- To set up the Reginald Slack bot (without the response engine), you can use the `reginald_run_api_bot` on the terminal. To see the CLI arguments, you can simply run:
+ To set up the Reginald Slack bot (without the response engine), you can use the `reginald run_all_api_bot` command on the terminal. To see the CLI arguments, you can simply run:

```bash
- reginald_run_api_bot --help
+ reginald run_all_api_bot --help
```

This command takes in an emoji to respond with and will set up a Slack bot that responds with the specified emoji (by default, the :rocket: emoji).

- You can use the same `.env` file that you used for `reginald_run` to set up the environment variables or choose to have a separate `.slack_bot_env` file to store the environment variables required for the Slack bot set up. This must include the Slack bot tokens.
+ You can use the same `.env` file that you used for `reginald run_all` to set up the environment variables, or choose to have a separate `.slack_bot_env` file to store the environment variables required for the Slack bot setup. This must include the Slack bot tokens.
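
For example, a minimal sketch (the `wave` emoji here is just a placeholder):

```bash
# respond with :wave: instead of the default :rocket:
export REGINALD_EMOJI="wave"
reginald run_all_api_bot
```
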
16 changes: 8 additions & 8 deletions MODELS.md
@@ -32,7 +32,7 @@ When running the Reginald Slack bot, you can specify which data index to use usi
- `public`: builds an index with all the public data listed above
- `all_data`: builds an index with all the data listed above, including data from our private repo

- Once a data index has been built, it will be saved in the `data` directory specified in the `reginald_run` (or `reginald_run_api_llm`) CLI arguments or the `LLAMA_INDEX_DATA_DIR` environment variable. If you want to force a new index to be built, you can use the `--force-new-index` or `-f` flag, or you can set the `LLAMA_INDEX_FORCE_NEW_INDEX` environment variable to `True`.
+ Once a data index has been built, it will be saved in the `data` directory specified in the `reginald run_all` (or `reginald run_all_api_llm`) CLI arguments or the `LLAMA_INDEX_DATA_DIR` environment variable. If you want to force a new index to be built, you can use the `--force-new-index` or `-f` flag, or you can set the `LLAMA_INDEX_FORCE_NEW_INDEX` environment variable to `True`.
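
For instance, a minimal sketch of forcing a rebuild, via either the flag or the environment variable (other model arguments, such as `--model-name`, omitted for brevity):

```bash
# two equivalent ways to force the index to be rebuilt
reginald run_all --model llama-index-llama-cpp --force-new-index
LLAMA_INDEX_FORCE_NEW_INDEX=True reginald run_all --model llama-index-llama-cpp
```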

There are several options for the LLM to use with the `llama-index` models, some of which we have implemented in this library and discuss below.

@@ -45,7 +45,7 @@ We have two models which involve hosting the LLM ourselves and using the `llama-
This model uses the [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) library to host a quantised LLM. In our case, we have been using quantised versions of Meta's Llama-2 model uploaded by [TheBloke](https://huggingface.co/TheBloke) on Huggingface's model hub. An example of running this model locally is:

```bash
- reginald_run \
+ reginald run_all \
--model llama-index-llama-cpp \
--model-name https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf \
--mode chat \
@@ -64,7 +64,7 @@ Running this command requires about 7GB of RAM. We were able to run this on our
If you wish to download the quantised model (as a `.gguf` file) and host it yourself, you can do so by passing the file name to the `--model-name` argument and using the `--is-path` flag (alternatively, you can re-run the above but first set the environment variable `LLAMA_INDEX_IS_PATH` to `True`):

```bash
- reginald_run \
+ reginald run_all \
--model llama-index-llama-cpp \
--model-name gguf_models/llama-2-7b-chat.Q4_K_M.gguf \
--is-path \
@@ -82,7 +82,7 @@ given that the `llama-2-7b-chat.Q4_K_M.gguf` file is in a `gguf_models` director
This model uses an LLM from [Huggingface](https://huggingface.co/models) to generate a response. An example of running this model locally is:

```bash
- reginald_run \
+ reginald run_all \
--model llama-index-hf \
--model-name microsoft/phi-1_5 \
--mode chat \
@@ -107,7 +107,7 @@ To use this model, you must set the following environment variables:
An example of running this model locally is:

```bash
- reginald_run \
+ reginald run_all \
--model llama-index-gpt-azure \
--model-name "reginald-gpt35-turbo" \
--mode chat \
@@ -124,7 +124,7 @@ To use this model, you must set the `OPENAI_API_KEY` environment variable and se
An example of running this model locally is:

```bash
- reginald_run \
+ reginald run_all \
--model llama-index-gpt-openai \
--model-name "gpt-3.5-turbo" \
--mode chat \
@@ -145,7 +145,7 @@ To use this model, you must set the following environment variables:
An example of running this model locally is:

```bash
- reginald_run \
+ reginald run_all \
--model chat-completion-azure \
--model-name "reginald-curie"
```
@@ -160,7 +160,7 @@ To use this model, you must set the `OPENAI_API_KEY` environment variable and se
An example of running this model locally is:

```bash
- reginald_run \
+ reginald run_all \
--model chat-completion-openai \
--model-name "gpt-3.5-turbo"
```
4 changes: 2 additions & 2 deletions azure/README.md
@@ -48,6 +48,6 @@ You will need to source this file before deploying in the next step.
4. Deploy with Pulumi

```bash
- > source .secrets (if this exists)
- > AZURE_KEYVAULT_AUTH_VIA_CLI=true pulumi up
+ source .secrets
+ AZURE_KEYVAULT_AUTH_VIA_CLI=true pulumi up
```


2 changes: 1 addition & 1 deletion data/llama_index_indices/handbook/docstore.json


1 change: 1 addition & 0 deletions data/llama_index_indices/handbook/image__vector_store.json
@@ -0,0 +1 @@
{"embedding_dict": {}, "text_id_to_ref_doc_id": {}, "metadata_dict": {}}
2 changes: 1 addition & 1 deletion data/llama_index_indices/handbook/index_store.json


