
Commit

Merge pull request #8 from defog-ai/rishabh/hf-testing
Added support for HuggingFace models, added NER embeddings, and refactored code
rishsriv committed Aug 16, 2023
2 parents b4f4989 + 488e6ea commit 4af0195
Showing 17 changed files with 1,083 additions and 206 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,4 +1,6 @@
data/postgres
+ data/ner_metadata.pkl
+ data/embeddings.pkl

# pycache
**/__pycache__/
23 changes: 20 additions & 3 deletions README.md
@@ -54,25 +54,42 @@ The data for importing is already in the exported sql dumps in the `data/export`

### Query Generator

- To test your own query generator with our framework, you would need to extend `QueryGenerator` and implement the `generate_query` method returning the query of interest. We create a new class for each question/query pair to isolate each pair's runtime state against the others when running concurrently. You can see a sample `OpenAIChatQueryGenerator` in `query_generators/openai.py` implementing it and using a simple prompt to send a message over to openai's chat api. Feel free to extend it for your own use.
+ To test your own query generator with our framework, you would need to extend `QueryGenerator` and implement the `generate_query` method returning the query of interest. We create a new class for each question/query pair to isolate each pair's runtime state against the others when running concurrently. You can see a sample `OpenAIQueryGenerator` in `query_generators/openai.py` implementing it and using a simple prompt to send a message over to openai's api. Feel free to extend it for your own use.
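For illustration, a minimal subclass might look like the sketch below. The import path, constructor signature, and plain-string return type are assumptions rather than this repository's actual interface, and it uses the pre-1.0 `openai` SDK style that matches this commit's vintage.

```python
# Hedged sketch of a custom generator -- not the repository's actual code.
# Assumptions: the base class is importable as shown, needs no required
# constructor arguments, and generate_query may return a plain string.
import openai  # pre-1.0 SDK style (openai.ChatCompletion)

from query_generators.query_generator import QueryGenerator


class MyQueryGenerator(QueryGenerator):
    def __init__(self, model: str = "gpt-3.5-turbo-0613", **kwargs):
        self.model = model

    def generate_query(self, question: str) -> str:
        # Build a simple prompt and ask the chat API for a single SQL query.
        prompt = f"Return a SQL query that answers this question: {question}"
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return response["choices"][0]["message"]["content"]
```

You would then wire the new class up to its own `-g` value in `main.py`; how that dispatch works depends on the repository's runner code.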

### Running the test

#### OpenAI
To test it out with just 1 question (instead of all 175):

```bash
mkdir results # create directory for storing results
python main.py \
-q data/questions_gen.csv \
-o results/my_query_generator.csv \
- -g oa_chat \
- -f query_generators/prompts/sample_chat_prompt.yaml \
+ -g oa \
+ -f query_generators/prompts/prompt.md \
-m gpt-3.5-turbo-0613 \
-n 1 \
-p 1 \
-v
```
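Going by the values above, the flags appear to map to the questions file (`-q`), output path (`-o`), generator type (`-g`), prompt file (`-f`), model name (`-m`), and number of questions to run (`-n`), with `-p` and `-v` most likely controlling parallelism and verbose output; check `main.py` for the authoritative definitions.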

#### HuggingFace
To test it out with just 10 questions (instead of all 175):

```bash
mkdir results # create directory for storing results

# use the -W option to ignore warnings about sequential use of transformers pipeline
python -W ignore main.py \
-q data/questions_gen.csv \
-o results/results.csv \
-g hf \
-f query_generators/prompts/prompt.md \
-m defog/starcoder-finetune-v3 \
-n 10
```
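For context, the warning that `-W ignore` suppresses is the one `transformers` emits when a pipeline is called one prompt at a time (typically on GPU) instead of over a dataset. The sketch below shows that sequential pattern in isolation; it is not the repository's HuggingFace generator, and the question text is made up.

```python
# Rough sketch of sequential generation with the transformers pipeline;
# illustrative only -- not the repository's actual HuggingFace generator.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="defog/starcoder-finetune-v3",  # model name taken from the command above
    device_map="auto",  # assumption: place the model on whatever GPU(s) are available
)

questions = ["How many orders were placed in 2022?"]  # hypothetical question
for q in questions:
    prompt = f"Return a SQL query that answers this question: {q}"
    # Calling the pipeline prompt-by-prompt (rather than over a Dataset)
    # is what triggers the sequential-use warning mentioned above.
    out = pipe(prompt, max_new_tokens=256, do_sample=False)
    print(out[0]["generated_text"])
```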

You can explore the generated results and aggregate the various metrics that you care about to understand your query generator's performance. Happy iterating!
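As a quick example of that aggregation, pandas works well; the `correct` column below is an assumption, so adjust it to whatever columns your results file actually contains.

```python
# Quick sketch for exploring the results CSV produced by a run.
import pandas as pd

df = pd.read_csv("results/my_query_generator.csv")
print(df.head())

# Overall accuracy, assuming the file has a boolean or 0/1 "correct" column.
if "correct" in df.columns:
    print(f"Accuracy: {df['correct'].mean():.1%}")
```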

## Misc