N-Compariw: End-to-End Workflow for Neural Networks Comparison
Updated Oct 2, 2021
A hybrid search engine based on the BM25 and VSM retrieval models (a minimal hybrid-scoring sketch appears after this list).
Large language model evaluation framework for logic and open-ended Q&A with a variety of RAG and other contextual information sources.
A simple template module for safely evaluating user-supplied or runtime-unknown value expressions using Python's 'eval' (see the restricted-eval sketch after this list).
A program to automate testing open source LLMs for their political compass scores
Frontier papers on evaluation methodologies for language models.
This is the accompanying repo of the NeurIPS '24 D&B Spotlight paper, PertEval, including code, data, and main results.
An experimental information retrieval framework and a workbench for innovation in entity-oriented search.
Web interface for evaluating the different GDSC entries.
Homebrew tap for vivaria, METR's AI evaluation tool
Evaluate open-source language models on agent, formatted-output, instruction-following, long-text, multilingual, coding, and custom-task capabilities.
ETUDE (Evaluation Tool for Unstructured Data and Extractions) is a Python-based tool that provides consistent evaluation options across a range of annotation schemata and corpus formats
LLM evaluation framework
Official implementation of the ACL 2024 paper "Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs" (https://arxiv.org/abs/2402.11199).
A tool to perform functional testing and performance testing of the Dhruva Platform
MODELAR: MODular and EvaLuative framework to improve surgical Augmented Reality visualization
A Visual Dashboard for Fundamental Benchmarking of LLMs
The AndroTest24 Study is the first comprehensive statistical study of existing Android GUI testing metrics. This repository provides the corresponding ① AndroTest24 App Benchmark, ② Study Data, and ③ SATE (Statistical Android Testing Evaluation) Framework.
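For the hybrid BM25/VSM search engine listed above, the sketch below shows one way such a blend can be scored. The toy corpus, the `alpha` blend weight, and all function names are illustrative assumptions, not that repository's API.

```python
# Minimal sketch of hybrid BM25 + VSM (term-frequency cosine) ranking over an
# in-memory corpus of tokenized documents. Parameters and weights are illustrative.
import math
from collections import Counter

def bm25_score(query, doc, corpus, k1=1.5, b=0.75):
    """Score one document against a query with the classic BM25 formula."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        f = tf[term]
        score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

def vsm_score(query, doc):
    """Cosine similarity between raw term-frequency vectors (a simple VSM)."""
    q, d = Counter(query), Counter(doc)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def hybrid_score(query, doc, corpus, alpha=0.7):
    """Blend the two models with a tunable weight alpha (hypothetical choice)."""
    return alpha * bm25_score(query, doc, corpus) + (1 - alpha) * vsm_score(query, doc)

corpus = [["neural", "network", "evaluation"], ["retrieval", "models", "bm25"], ["vector", "space", "model"]]
query = ["bm25", "retrieval"]
ranked = sorted(corpus, key=lambda d: hybrid_score(query, d, corpus), reverse=True)
print(ranked[0])  # the BM25/VSM document ranks first
```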
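For the 'eval'-based template module listed above, the sketch below shows a common pattern for restricting Python's eval to a whitelist of names. SAFE_NAMES, safe_eval, and the whitelist contents are hypothetical, and a restricted eval of this kind should not be treated as a complete sandbox.

```python
# Minimal sketch of evaluating a runtime-supplied expression with a restricted
# environment; the whitelist below is illustrative, not the module's actual API.
import math

SAFE_NAMES = {"abs": abs, "min": min, "max": max, "round": round, "sqrt": math.sqrt}

def safe_eval(expression, variables=None):
    """Evaluate an expression string with builtins disabled and only
    whitelisted names plus caller-supplied variables visible."""
    env = {"__builtins__": {}}        # block access to builtins
    env.update(SAFE_NAMES)            # expose a small whitelist of functions
    env.update(variables or {})       # bind user/runtime-unknown values
    return eval(expression, env, {})  # deliberate, restricted eval

print(safe_eval("sqrt(max(a, b)) + 1", {"a": 4, "b": 9}))  # -> 4.0
```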