ComplexData-MILA/BluePrint

BluePrint: A Social Media User Dataset for LLM Persona Evaluation and Training

[Figure: BluePrint dataset creation process]

Clustering

The clustering folder contains code for pre-processing, computing embeddings, and clustering.
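To make the three stages concrete, here is a minimal, standard-library-only sketch of a pre-process, embed, and cluster flow. It is not the code in the clustering folder: the function names (`preprocess`, `embed`, `kmeans`), the bag-of-words embedding, the plain k-means loop, and the example posts are all assumptions chosen for illustration. The repository's own pipeline may use different models and algorithms.

```python
import math
import random
import re
from collections import Counter


def preprocess(text):
    """Hypothetical cleaning step: lowercase, drop URLs, keep word tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z0-9']+", text)


def embed(tokens, vocab):
    """Toy embedding: L2-normalised bag-of-words over a fixed vocabulary."""
    counts = Counter(tokens)
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means; returns one cluster label per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid for every vector.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up example posts, standing in for real user data.
posts = [
    "My cat sleeps all day https://example.com/cat",
    "The cat chased the dog",
    "Great football match tonight",
    "Football season starts soon",
]
tokenised = [preprocess(p) for p in posts]
vocab = sorted({w for toks in tokenised for w in toks})
vectors = [embed(toks, vocab) for toks in tokenised]
labels = kmeans(vectors, k=2)
```

Each entry of `labels` is the cluster index assigned to the post at the same position. In a real pipeline, the toy embedding would typically be replaced by a learned sentence-embedding model.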

Finetuning

Finetuning code is in the finetuning folder.

Evaluating

Evaluation code is in the evaluating folder.

PII Removal

PII removal code is in the bluesky_persona_pii submodule.

Scratch

The scratch folder contains a placeholder dataset that demonstrates how the pipeline works. It also holds files created while the pipeline runs.

License

MIT License
