Chain-of-Hindsight, A Scalable RLHF Method
Updated Sep 30, 2023 (Python)
ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment
Code for the KDD-2023 paper: Neural-Hidden-CRF: A Robust Weakly-Supervised Sequence Labeler
A curated list of awesome Weak-Supervision-Sequence-Labeling (WSSL) papers, methods & resources.
Learning Behaviors with Uncertain Human Feedback using Speech Recognition