A benchmark that challenges language models to code solutions for scientific problems
Updated Sep 29, 2025 · Python
AGI-Elo: How Far Are We From Mastering A Task?
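The name suggests Elo-style ratings applied to models and tasks: each model-task attempt is scored like a chess match, so a task's rating rises when it defeats strong models. A minimal sketch of the standard Elo update (the K-factor, starting ratings, and the model-vs-task framing here are illustrative assumptions, not the project's exact formulation):

```python
def elo_update(r_model: float, r_task: float, model_won: bool,
               k: float = 32.0) -> tuple[float, float]:
    """Standard Elo update for one model-vs-task 'match'.

    model_won=True means the model solved the task. A task that
    defeats highly rated models gains rating, i.e. it is 'harder'.
    """
    # Model's expected score against this task.
    expected = 1.0 / (1.0 + 10 ** ((r_task - r_model) / 400.0))
    score = 1.0 if model_won else 0.0
    new_model = r_model + k * (score - expected)
    # The task gets the complementary score and expectation.
    new_task = r_task + k * ((1.0 - score) - (1.0 - expected))
    return new_model, new_task

# Example: a 1500-rated model fails a 1500-rated task;
# the model's rating drops and the task's rises symmetrically.
print(elo_update(1500.0, 1500.0, model_won=False))
```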
A production-grade benchmarking suite that evaluates vector databases (Qdrant, Milvus, Weaviate, ChromaDB, Pinecone, SQLite, TopK) for music semantic search applications. Features automated performance testing, statistical analysis across 15-20 iterations, a real-time web UI for database comparison, and comprehensive reporting for production use.
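The iteration-based measurement described above typically reduces to timing each engine's query path repeatedly and aggregating the samples. A minimal, database-agnostic sketch of that loop (the `query_fn` callables, the 20-iteration default, and the engine names in the usage comment are assumptions, not this repo's actual API):

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark_query(query_fn: Callable[[], object],
                    iterations: int = 20) -> Dict[str, float]:
    """Time a single search callable over repeated iterations.

    Returns latency statistics in milliseconds, mirroring the
    suite's 15-20 iteration analysis.
    """
    latencies: List[float] = []
    for _ in range(iterations):
        start = time.perf_counter()
        query_fn()  # the engine-specific search call under test
        latencies.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(latencies),
        "stdev_ms": statistics.stdev(latencies),
        "min_ms": min(latencies),
        "max_ms": max(latencies),
    }

# Hypothetical usage: run the same music-search query against two engines.
# results = {
#     "qdrant": benchmark_query(lambda: qdrant_search("moody jazz piano")),
#     "chroma": benchmark_query(lambda: chroma_search("moody jazz piano")),
# }
```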
LlamaEval is a rapid prototype developed during a hackathon to provide a user-friendly dashboard for evaluating and comparing Llama models using the TogetherAI API.
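Comparing models through the TogetherAI API amounts to sending the same prompt to each model ID and collecting the replies for side-by-side display. A minimal sketch assuming the `together` Python SDK's OpenAI-style chat interface; the model IDs and prompt are placeholders, not LlamaEval's actual configuration:

```python
from together import Together  # pip install together; reads TOGETHER_API_KEY from the env

client = Together()

MODELS = [  # placeholder Llama model IDs on TogetherAI
    "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
]


def compare(prompt: str) -> dict[str, str]:
    """Send one prompt to each model and return {model_id: reply} for the dashboard."""
    replies = {}
    for model in MODELS:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        replies[model] = resp.choices[0].message.content
    return replies


print(compare("Explain Elo ratings in one sentence."))
```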