llm-rlhf

Here are 3 public repositories matching this topic...

ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates

Reinforcement learning training for LLMs such as GPT-2, LLaMA, and BLOOM

lora reward trl llm rlhf trlx llm-rlhf

ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
