llm-as-evaluator

Here are 13 public repositories matching this topic...

prometheus-eval / prometheus-eval

Evaluate your LLM's response with Prometheus and GPT4 💯

python evaluation gpt4 llm llmops vllm litellm llm-as-a-judge llm-as-evaluator

Updated Apr 25, 2025
Python

Pacific-AI-Corp / langtest

Deliver safe & effective language models

nlp artificial-intelligence benchmarks benchmark-framework model-assessment ai-safety mlops responsible-ai ml-safety trustworthy-ai ethics-in-ai ml-testing large-language-models llm ai-testing llm-test llm-evaluation-toolkit llm-as-evaluator llm-testing

Updated Sep 27, 2025
Python

IAAR-Shanghai / xFinder

[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation

Updated Feb 26, 2025
Python

KID-22 / LLM-IR-Bias-Fairness-Survey

This is the repo for the survey of Bias and Fairness in IR with LLMs.

information-retrieval recommender-systems bias ir fairness large-language-models llm chatgpt llm4rec llm4rs llm-as-a-judge llm-as-evaluator llm4ir

Updated Sep 4, 2025

zhaochen0110 / Timo

Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)

temporal-reasoning sota-model llms rlhf rlaif llm-as-a-judge llm-as-evaluator self-critic-framework colm2024

Updated Oct 23, 2024
Python

minnesotanlp / cobbler

Code and data for Koo et al.'s ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"

nlp evaluation bias bias-detection llm llms llm-evaluation llms-benchmarking llm-as-judge llm-as-a-judge llm-as-evaluator

Updated Feb 16, 2024
Jupyter Notebook

HillPhelmuth / LlmAsJudgeEvalPlugins

LLM-as-judge evals as Semantic Kernel Plugins

semantickernel llm-evaluation llm-as-a-judge llm-as-evaluator

Updated Aug 22, 2025
C#

djokester / groqeval

Use Groq for evaluations

groq llm generative-ai mixtral llm-as-a-judge llm-as-evaluator llama3

Updated Jul 11, 2024
Python

Kakz / prometheus-llm

PrometheusLLM is a unique transformer architecture inspired by dignity and recursion. This project aims to explore new frontiers in AI research and welcomes contributions from the community. 🐙🌟

deep-learning mcp evaluation pipelines tracing language-model self-organization cognitive-architecture hermeneutics philosophy-of-mind gpt4 llm llmops ollama litellm llm-as-evaluator autopoietic-systems prompt-logging

Updated Oct 2, 2025
Python

trustyai-explainability / vllm_judge

A tiny, lightweight library for LLM-as-a-Judge evaluations on vLLM-hosted models.

evaluation-metrics llmops llm-evaluation llm-as-a-judge llm-as-evaluator

Updated Sep 29, 2025
Python

Non-NeutralZero / LLM-EvalSys

Automated evaluation of LLM-generated responses on AWS

aws llmops llm-as-a-judge llm-as-evaluator

Updated Apr 11, 2025
Python

rafaelsandroni / antibodies

Antibodies for LLM hallucinations (grouping LLM-as-a-judge, NLI, and reward models)

python nli hallucinations llms hallucination-detection llm-as-a-judge llm-as-evaluator

Updated Jun 13, 2024
Python

LaurentVeyssier / Agentic-workflow-for-project-management

Project from Udacity's second module, "agentic workflow", part of the Agentic AI Nanodegree

python agentic llm-as-evaluator agentic-workflow

Updated Aug 16, 2025
Python
