rlhf

Star

Here are 147 public repositories matching this topic...

hiyouga / LLaMA-Factory

Star

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Updated Apr 3, 2025
Python

LAION-AI / Open-Assistant

Star

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

python machine-learning ai nextjs discord-bot assistant language-model chatgpt rlhf

Updated Aug 17, 2024
Python

RUCAIBox / LLMSurvey

Star

The official GitHub page for the survey paper "A Survey of Large Language Models".

natural-language-processing pre-training pre-trained-language-models in-context-learning large-language-models llm llms chain-of-thought chatgpt rlhf instruction-tuning

Updated Mar 11, 2025
Python

ymcui / Chinese-LLaMA-Alpaca-2

Star

中文LLaMA-2 & Alpaca-2大模型二期项目 + 64K超长上下文模型 (Chinese LLaMA-2 & Alpaca-2 LLMs with 64K long context models)

nlp yarn llama alpaca 64k large-language-models llm rlhf flash-attention llama2 llama-2 alpaca-2 alpaca2

Updated Sep 23, 2024
Python

InternLM / InternLM

Star

Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).

chatbot chinese gpt pretrained-models llm long-context rlhf large-language-model flash-attention fine-tuning-llm

Updated Feb 7, 2025
Python

huggingface / alignment-handbook

Star

Robust recipes to align language models with human and AI preferences

transformers llm rlhf

Updated Nov 21, 2024
Python

argilla-io / argilla

Star

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

nlp machine-learning natural-language-processing ai weak-supervision developer-tools active-learning annotation-tool text-annotation weakly-supervised-learning human-in-the-loop mlops text-labeling gpt-4 llm langchain rlhf

Updated Mar 24, 2025
Python

hiyouga / ChatGLM-Efficient-Tuning

Star

Fine-tuning ChatGLM-6B with PEFT | 基于 PEFT 的高效 ChatGLM 微调

transformers pytorch lora language-model alpaca fine-tuning peft huggingface chatgpt rlhf chatglm qlora chatglm2

Updated Oct 12, 2023
Python

Kiln-AI / Kiln

Star

The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.

python windows macos machine-learning ai evaluation prompt ml collaboration openai dataset-generation synthetic-data fine-tuning prompt-engineering chain-of-thought rlhf evals ollama

Updated Apr 4, 2025
Python

Docta-ai / docta

Star

A Doctor for your data

data language-model data-curation data-centric-ai data-diagnosis data-centric-machine-learning rlhf

Updated Jan 14, 2025
Python

argilla-io / distilabel

Star

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

python ai openai synthetic-data synthetic-dataset-generation huggingface llms rlhf rlaif

Updated Mar 24, 2025
Python

THUDM / WebGLM

Star

WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)

llm chatgpt rlhf webglm

Updated Mar 25, 2025
Python

PKU-Alignment / safe-rlhf

Star

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

Updated Jun 13, 2024
Python

THUDM / ImageReward

Star

[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation

generative-model diffusion-models human-preferences rlhf

Updated Jan 24, 2025
Python

OpenLMLab / MOSS-RLHF

Star

Secrets of RLHF in Large Language Models Part I: PPO

alignment ai-safety rlhf

Updated Mar 3, 2024
Python

RLHFlow / RLHF-Reward-Modeling

Star

Recipes to train reward model for RLHF.

llm rlhf reward-models llama3

Updated Feb 9, 2025
Python

princeton-nlp / SimPO

Star

[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward

alignment large-language-models rlhf preference-alignment

Updated Feb 16, 2025
Python

ContextualAI / HALOs

Star

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

alignment ppo halos dpo kto rlhf

Updated Mar 24, 2025
Python

jerry1993-tech / Cornucopia-LLaMA-Fin-Chinese

Star

聚宝盆(Cornucopia): 中文金融系列开源可商用大模型，并提供一套高效轻量化的垂直领域LLM训练框架(Pretraining、SFT、RLHF、Quantize等)

nlp finance qa transformers text-generation chinese llama sft large-language-models rlhf

Updated Jun 30, 2023
Python

jianzhnie / LLamaTuner

Star

Easy and Efficient Finetuning LLMs. (Supported LLama, LLama2, LLama3, Qwen, Baichuan, GLM , Falcon) 大模型高效量化训练+部署.

llama ppo dpo chatgpt rlhf qlora qwen mixtral llama3

Updated Jan 24, 2025
Python

Improve this page

Add a description, image, and links to the rlhf topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the rlhf topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rlhf

Here are 147 public repositories matching this topic...

hiyouga / LLaMA-Factory

LAION-AI / Open-Assistant

RUCAIBox / LLMSurvey

ymcui / Chinese-LLaMA-Alpaca-2

InternLM / InternLM

huggingface / alignment-handbook

argilla-io / argilla

hiyouga / ChatGLM-Efficient-Tuning

Kiln-AI / Kiln

Docta-ai / docta

argilla-io / distilabel

THUDM / WebGLM

PKU-Alignment / safe-rlhf

THUDM / ImageReward

OpenLMLab / MOSS-RLHF

RLHFlow / RLHF-Reward-Modeling

princeton-nlp / SimPO

ContextualAI / HALOs

jerry1993-tech / Cornucopia-LLaMA-Fin-Chinese

jianzhnie / LLamaTuner

Improve this page

Add this topic to your repo