Notebooks to create an instruction-following version of Microsoft's Phi 2 LLM with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO)
Updated Sep 5, 2024 · Jupyter Notebook
Notebooks to create an instruction-following version of Microsoft's Phi 1.5 LLM with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO)
EPFLLaMA: A lightweight language model fine-tuned on EPFL curriculum content. Specialized for STEM education and multiple-choice question answering. Implements advanced techniques like SFT, DPO, and quantization.
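The notebooks above all build on Direct Preference Optimization. As a point of reference only (not code taken from these repositories), below is a minimal sketch of the DPO objective in PyTorch, assuming per-sequence log-probabilities of the chosen and rejected responses have already been computed under both the policy being trained and a frozen reference model.

```python
# Minimal DPO loss sketch (Rafailov et al., 2023); names and values here are
# illustrative assumptions, not taken from the repositories listed above.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss from per-sequence log-probabilities of preference pairs."""
    # Implicit rewards: how much more the policy prefers each response
    # than the frozen reference model does.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Maximize log-sigmoid of the scaled reward margin, pushing the chosen
    # response's implicit reward above the rejected one's.
    logits = beta * (chosen_rewards - rejected_rewards)
    return -F.logsigmoid(logits).mean()

# Usage with dummy log-probabilities for a batch of two preference pairs.
if __name__ == "__main__":
    policy_chosen = torch.tensor([-12.0, -15.0])
    policy_rejected = torch.tensor([-14.0, -15.5])
    ref_chosen = torch.tensor([-13.0, -15.2])
    ref_rejected = torch.tensor([-13.5, -15.1])
    print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```

In practice the notebooks would typically compute these log-probabilities with the fine-tuned model and its SFT checkpoint as the reference, or delegate the whole loop to a trainer library; the sketch only isolates the loss itself.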