An MNIST-like fashion product database. Benchmark 👇
OpenMMLab Pose Estimation Toolbox and Benchmark.
Benchmarks of approximate nearest neighbor libraries in Python
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
A series of large language models developed by Baichuan Intelligent Technology
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus, and leaderboard
Python package for the evaluation of odometry and SLAM
A 13B large language model developed by Baichuan Intelligent Technology
A unified evaluation framework for large language models
[ICLR 2024] SWE-bench: Can Language Models Resolve Real-world Github Issues?
A heterogeneous benchmark for information retrieval. Easy to use; evaluate your models across 15+ diverse IR datasets.
A machine learning toolkit for log parsing [ICSE'19, DSN'16]
Reference implementations of MLPerf™ training benchmarks
⚡FlashRAG: A Python Toolkit for Efficient RAG Research
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Efficient Retrieval Augmentation and Generation Framework
📊 Benchmark multiple object trackers (MOT) in Python
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024