trl

Here are 15 public repositories matching this topic...

jasonvanf / llama-trl

LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA

adapter transformer llama gpt lora ppo peft trl gpt-4 chatgpt rlhf

Updated May 23, 2023
Python

argilla-io / notus

Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach

zephyr fine-tuning dpo trl lm-alignment preference-data alignment-handbook

Updated Jan 15, 2024
Python

sugarandgugu / Simple-Trl-Training

Fine-tune large language models with the DPO algorithm; simple and easy to get started.

simple dpo trl llm rlhf

Updated Jul 3, 2024
Python

RobinSmits / Dutch-LLMs

Training, inference, and validation code and results for open LLMs that were pretrained (fully or partially) on the Dutch language.

transformers pytorch alpaca peft dpo trl large-language-models open-llama polylm qwen2

Updated Apr 9, 2024
Jupyter Notebook

ssbuild / llm_rlhf

Reinforcement learning training for LLMs such as GPT-2, LLaMA, and BLOOM.

lora reward trl llm rlhf trlx llm-rlhf

Updated Sep 19, 2023
Python

LegendLeoChen / llm-finetune

Fine-tuning of models hosted on Hugging Face using libraries such as trl, peft, and transformers.

reinforcement-learning transformers rl lora peft sft huggingface trl llm rlhf qwen grpo

Updated Mar 21, 2025
Python

SharathHebbar / sft_mathgpt2

Supervised fine-tuning using the TRL library

decoder transformers text-generation sft gpt2 trl llm mathgpt

Updated Jan 24, 2024
Jupyter Notebook

rasyosef / phi-2-sft-and-dpo

Notebooks to create an instruction-following version of Microsoft's Phi 2 LLM with supervised fine-tuning (SFT) and Direct Preference Optimization (DPO)

transformers pytorch huggingface trl llm supervised-finetuning direct-preference-optimization

Updated Nov 27, 2024
Jupyter Notebook

pberlandier / irl-to-bal

ODM: automated translation of TRL rules to BAL

translation odm ruleset irl operational-decision-manager verbalization bal-rule technical-rule trl

Updated Dec 6, 2019
Java

WCoetser / Trl.TermDataRepresentation

The overall aim of this project is to create a term rewriting system that could be useful in everyday programming, and to represent data in a way that roughly corresponds to the definition of a term in formal logic. Terms should be familiar to any programmer because they are built from constants, variables, and function symbols.

syntax-tree term-rewriting trl term-database

Updated Dec 16, 2020
C#

SharathHebbar / dpo_chatgpt2

Direct Preference Optimization of ChatGPT2 using the TRL library

decoder transformers text-generation dpo gpt2 trl llm rlhf chatgpt2

Updated Jan 24, 2024
Jupyter Notebook

rasyosef / phi-1_5-instruct

Notebooks to create an instruction-following version of Microsoft's Phi 1.5 LLM with supervised fine-tuning (SFT) and Direct Preference Optimization (DPO)

transformers pytorch trl llm supervised-finetuning direct-preference-optimization

Updated Aug 17, 2024

Akshint0407 / Nano-R1

This project demonstrates fine-tuning the Qwen2.5-3B-Instruct model using GRPO (Group Relative Policy Optimization) on the GSM8K dataset.

python transformer adapters huggingface trl safetensors text-generation-inference unsloth qwen2-5 grpo

Updated Apr 7, 2025
Jupyter Notebook

SofiaKhutsieva / LLM_experiments

Experiments with LLMs (inference, RAG, fine-tuning)

mistral peft rag trl llm langchain llamacpp

Updated Mar 23, 2024
Jupyter Notebook

Mikesterner87 / Nano-R1

This project demonstrates fine-tuning the Qwen2.5-3B-Instruct model using GRPO (Group Relative Policy Optimization) on the GSM8K dataset.

python build openwrt transformer adapters nanopi huggingface trl nanopi-r1s nanopi-r1 safetensors text-generation-inference unsloth grpo

Updated Apr 23, 2025
Jupyter Notebook
