The clustering folder contains code for pre-processing, computing embeddings, and clustering.
Finetuning code is in the finetuning folder.
Evaluation code is in the evaluating folder.
PII removal code is in the bluesky_persona_pii submodule.
The scratch folder contains a placeholder dataset that demonstrates how the pipeline works. It also holds the intermediate files created while the pipeline runs.
MIT License