
Commit

Merge pull request #8 from defog-ai/rishabh/hf-testing
Added support for HuggingFace models, added NER embeddings, and refactored code
rishsriv committed Aug 16, 2023
2 parents b4f4989 + 488e6ea commit 4af0195
Showing 17 changed files with 1,083 additions and 206 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,4 +1,6 @@
data/postgres
+ data/ner_metadata.pkl
+ data/embeddings.pkl

# pycache
**/__pycache__/
23 changes: 20 additions & 3 deletions README.md
@@ -54,25 +54,42 @@ The data for importing is already in the exported sql dumps in the `data/export`

### Query Generator

- To test your own query generator with our framework, you would need to extend `QueryGenerator` and implement the `generate_query` method returning the query of interest. We create a new class for each question/query pair to isolate each pair's runtime state against the others when running concurrently. You can see a sample `OpenAIChatQueryGenerator` in `query_generators/openai.py` implementing it and using a simple prompt to send a message over to openai's chat api. Feel free to extend it for your own use.
+ To test your own query generator with our framework, you would need to extend `QueryGenerator` and implement the `generate_query` method returning the query of interest. We create a new class for each question/query pair to isolate each pair's runtime state against the others when running concurrently. You can see a sample `OpenAIQueryGenerator` in `query_generators/openai.py` implementing it and using a simple prompt to send a message over to openai's api. Feel free to extend it for your own use.
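For illustration, a minimal subclass might look like the sketch below. The import path, constructor signature, and plain-string return type are assumptions rather than this repository's actual interface, and it uses the pre-1.0 `openai` SDK style that matches this commit's vintage.

```python
# Hedged sketch of a custom generator -- not the repository's actual code.
# Assumptions: the base class is importable as shown, needs no required
# constructor arguments, and generate_query may return a plain string.
import openai  # pre-1.0 SDK style (openai.ChatCompletion)

from query_generators.query_generator import QueryGenerator


class MyQueryGenerator(QueryGenerator):
    def __init__(self, model: str = "gpt-3.5-turbo-0613", **kwargs):
        self.model = model

    def generate_query(self, question: str) -> str:
        # Build a simple prompt and ask the chat API for a single SQL query.
        prompt = f"Return a SQL query that answers this question: {question}"
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return response["choices"][0]["message"]["content"]
```

You would then wire the new class up to its own `-g` value in `main.py`; how that dispatch works depends on the repository's runner code.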

### Running the test

#### OpenAI
To test it out with just 1 question (instead of all 175):

```bash
mkdir results # create directory for storing results
python main.py \
-q data/questions_gen.csv \
-o results/my_query_generator.csv \
- -g oa_chat \
- -f query_generators/prompts/sample_chat_prompt.yaml \
+ -g oa \
+ -f query_generators/prompts/prompt.md \
-m gpt-3.5-turbo-0613 \
-n 1 \
-p 1 \
-v
```
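Going by the values above, the flags appear to map to the questions file (`-q`), output path (`-o`), generator type (`-g`), prompt file (`-f`), model name (`-m`), and number of questions to run (`-n`), with `-p` and `-v` most likely controlling parallelism and verbose output; check `main.py` for the authoritative definitions.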

#### HuggingFace
To test it out with just 10 questions (instead of all 175):

```bash
mkdir results # create directory for storing results

# use the -W option to ignore warnings about sequential use of transformers pipeline
python -W ignore main.py \
-q data/questions_gen.csv \
-o results/results.csv \
-g hf \
-f query_generators/prompts/prompt.md \
-m defog/starcoder-finetune-v3 \
-n 10
```
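For context, the warning that `-W ignore` suppresses is the one `transformers` emits when a pipeline is called one prompt at a time (typically on GPU) instead of over a dataset. The sketch below shows that sequential pattern in isolation; it is not the repository's HuggingFace generator, and the question text is made up.

```python
# Rough sketch of sequential generation with the transformers pipeline;
# illustrative only -- not the repository's actual HuggingFace generator.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="defog/starcoder-finetune-v3",  # model name taken from the command above
    device_map="auto",  # assumption: place the model on whatever GPU(s) are available
)

questions = ["How many orders were placed in 2022?"]  # hypothetical question
for q in questions:
    prompt = f"Return a SQL query that answers this question: {q}"
    # Calling the pipeline prompt-by-prompt (rather than over a Dataset)
    # is what triggers the sequential-use warning mentioned above.
    out = pipe(prompt, max_new_tokens=256, do_sample=False)
    print(out[0]["generated_text"])
```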

You can explore the generated results and aggregate the various metrics that you care about to understand your query generator's performance. Happy iterating!
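As a quick example of that aggregation, pandas works well; the `correct` column below is an assumption, so adjust it to whatever columns your results file actually contains.

```python
# Quick sketch for exploring the results CSV produced by a run.
import pandas as pd

df = pd.read_csv("results/my_query_generator.csv")
print(df.head())

# Overall accuracy, assuming the file has a boolean or 0/1 "correct" column.
if "correct" in df.columns:
    print(f"Accuracy: {df['correct'].mean():.1%}")
```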

## Misc