Generative Artificial Intelligence Research Lab (GAIR)

29 repositories

O1-Journey
Public
O1 Replication Journey
57 • 1.9k • 14 • 0 • Updated Jan 14, 2025
PC-Agent
Public
PC Agent: While You Sleep, AI Works - A Cognitive Journey into Digital World
Python • MIT License • 12 • 157 • 0 • 0 • Updated Dec 25, 2024
ReasonEval
Public
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
Python • 2 • 44 • 1 • 0 • Updated Dec 15, 2024
OlympicArena
Public
This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"
JavaScript • 4 • 90 • 0 • 0 • Updated Dec 15, 2024
SimulateBench
Public
GPT as Human
Python • 2 • 18 • 0 • 0 • Updated Dec 11, 2024
MathPile
Public
[NeurIPS D&B 2024] Generative AI for Math: MathPile
math corpus language-model pre-training large-language-models
Python • Apache License 2.0 • 21 • 401 • 1 • 0 • Updated Oct 27, 2024
ProX
Public
Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"
llama data-quality mistral pre-training continual neural-symbolic data-centric-ai llm continual-pre-training
Python • Apache License 2.0 • 15 • 208 • 2 • 0 • Updated Oct 16, 2024
walnut-plan
Public
The Walnut Plan
0 • 11 • 0 • 0 • Updated Oct 10, 2024
OpenResearcher
Public
OpenResearcher, an advanced Scientific Research Assistant
HTML • Apache License 2.0 • 32 • 417 • 1 • 2 • Updated Oct 10, 2024
math-evaluation-harness
Public
A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨
Python • MIT License • 12 • 2 • 0 • 0 • Updated Oct 6, 2024
ReAlign
Public
Reformatted Alignment
nlp natural-language-processing alignment large-language-models llms generative-ai
JavaScript • 7 • 113 • 0 • 0 • Updated Sep 23, 2024
weak-to-strong-reasoning
Public
Python • 3 • 56 • 1 • 0 • Updated Sep 2, 2024
factool
Public
FacTool: Factuality Detection in Generative AI
python natural-language-processing fact-checking large-language-models generative-ai chatgpt
Python • Apache License 2.0 • 64 • 847 • 19 • 3 • Updated Aug 19, 2024
BeHonest
Public
BeHonest: Benchmarking Honesty in Large Language Models
nlp benchmark evaluation alignment honesty llm
JavaScript • 0 • 31 • 0 • 0 • Updated Aug 15, 2024
anole
Public
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
Python • 36 • 708 • 28 • 1 • Updated Aug 5, 2024
Safety-J
Public
Safety-J: Evaluating Safety with Critique
JavaScript • 1 • 16 • 0 • 0 • Updated Jul 28, 2024
MoPS
Public
[ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation"
Jupyter Notebook • 1 • 33 • 0 • 0 • Updated Jul 19, 2024
self-improvement-reversal
Public
JavaScript • 0 • 13 • 0 • 0 • Updated Jul 14, 2024
MetaCritique
Public
Evaluate the Quality of Critique
Python • Apache License 2.0 • 0 • 35 • 0 • 0 • Updated Jun 1, 2024
alignment-for-honesty
Public
Python • 2 • 71 • 0 • 0 • Updated May 22, 2024
benbench
Public
Benchmarking Benchmark Leakage in Large Language Models
dataset benchmarks leakage-detection large-language-models
JavaScript • 3 • 47 • 4 • 0 • Updated May 20, 2024
Preference-Dissection
Public
Python • 2 • 24 • 0 • 0 • Updated May 16, 2024
cs2916
Public
Python • 8 • 21 • 0 • 0 • Updated May 12, 2024
OPO
Public
Python • 6 • 48 • 2 • 0 • Updated Mar 2, 2024
scaleeval
Public
Scalable Meta-Evaluation of LLMs as Evaluators
nlp evaluation-framework llm generative-ai
Python • 3 • 42 • 1 • 0 • Updated Feb 15, 2024
Entropy-ABF
Public
Official implementation for 'Extending LLMs’ Context Window with 100 Samples'
Python • 3 • 76 • 2 • 0 • Updated Jan 18, 2024
auto-j
Public
Generative Judge for Evaluating Alignment
Python • 15 • 223 • 6 • 0 • Updated Jan 18, 2024
abel
Public
SOTA Math Opensource LLM
math llm generative-ai
Python • 19 • 329 • 10 • 0 • Updated Dec 12, 2023
ChineseFactEval
Public
JavaScript • 0 • 2 • 0 • 0 • Updated Sep 13, 2023