LLaMA-O1: Open Large Reasoning Model Frameworks For Training, Inference and Evaluation With PyTorch and HuggingFace
Towards Open-Source Large Reasoning Models
The first version of LLaMA-O1 is now available on Hugging Face. Here we come!
Supervised:
https://huggingface.co/SimpleBerry/LLaMA-O1-Supervised-1129
Base (Pretrain):
https://huggingface.co/SimpleBerry/LLaMA-O1-Base-1127
Supervised Fine-tuning Dataset:
https://huggingface.co/datasets/SimpleBerry/OpenLongCoT-SFT
Pretraining Dataset:
https://huggingface.co/datasets/SimpleBerry/OpenLongCoT-Pretrain-1202
RLHF is on the way! View our GitHub Repo:
https://github.com/SimpleBerry/LLaMA-O1
Our ongoing related research:
https://huggingface.co/papers/2406.07394
https://huggingface.co/papers/2410.02884
https://huggingface.co/papers/2411.18203
GGUF: https://huggingface.co/Lyte/LLaMA-O1-Supervised-1129-Q4_K_M-GGUF
Online Demo (CPU-only): https://huggingface.co/spaces/SimpleBerry/LLaMA-O1-Supervised-1129-Demo
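
The checkpoints linked above are hosted as regular Hugging Face model repositories, so they can presumably be loaded with the generic `transformers` auto classes. The following is a minimal sketch under that assumption, not an official usage recipe: the helper names `load_model` and `generate` are hypothetical, and the prompt is passed as plain text because the expected prompt format is not described here. Check the model card for the exact template before relying on the output.

```python
# Repository name taken from the "Supervised" link above.
MODEL_ID = "SimpleBerry/LLaMA-O1-Supervised-1129"


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model from the Hugging Face Hub.

    Hypothetical helper: the first call downloads the weights.
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a continuation for a plain-text prompt (assumed format)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Special tokens are kept, since the model may emit markup that is
    # useful to inspect; this is an assumption, not documented behavior.
    return tokenizer.decode(output_ids[0], skip_special_tokens=False)
```

Calling `generate("How many prime numbers are there below 20?")` would download the checkpoint and print the raw generation. Swapping `MODEL_ID` for `SimpleBerry/LLaMA-O1-Base-1127` should load the base checkpoint the same way.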
- Marked Language of Long CoT (Done)
- Pretrain Dataset (Done)
- Supervised Dataset (Done)
- PRM token rectification Dataset (Done)
- Reinforcement Learning With Self-Play (Code done, training in progress)
- Inference-time Reasoning Enhancement Frameworks (Code done, temporarily postponed)