🐢 Open-Source Evaluation & Testing library for LLM Agents
Deliver safe & effective language models
MIT-licensed framework for testing LLMs, RAG pipelines, and chatbots. Configurable via YAML and integrable into CI pipelines for automated testing.
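As a rough illustration of the YAML-driven pattern (not this framework's actual configuration schema), a test suite can be described declaratively and executed in CI; the config keys and the `fake_llm` stand-in below are hypothetical:

```python
# Hypothetical sketch only: the YAML schema and check logic are illustrative,
# not this framework's real configuration format.
import yaml

CONFIG = """
suite: smoke-tests
cases:
  - prompt: "What is the capital of France?"
    must_contain: "Paris"
  - prompt: "Summarize: The quick brown fox jumps over the lazy dog."
    max_words: 20
"""

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "Paris is the capital of France."

def run_suite(config_text: str) -> list[tuple[str, bool]]:
    config = yaml.safe_load(config_text)
    results = []
    for case in config["cases"]:
        answer = fake_llm(case["prompt"])
        ok = True
        if "must_contain" in case:
            ok = case["must_contain"] in answer
        if "max_words" in case:
            ok = ok and len(answer.split()) <= case["max_words"]
        results.append((case["prompt"], ok))
    return results

if __name__ == "__main__":
    for prompt, passed in run_suite(CONFIG):
        print(f"{'PASS' if passed else 'FAIL'}: {prompt}")
```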
A Python library for verifying code properties using natural language assertions.
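A minimal sketch of the general idea, using the OpenAI client as the judge; the `check_property` helper and prompt format are illustrative assumptions, not this library's API:

```python
# Illustrative sketch: an LLM judging whether code satisfies a
# natural-language property. Not the library's actual interface.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def check_property(code: str, assertion: str, model: str = "gpt-4o-mini") -> bool:
    """Ask the model to answer YES/NO on whether `code` satisfies `assertion`."""
    prompt = (
        "Does the following Python code satisfy this property?\n"
        f"Property: {assertion}\n\nCode:\n{code}\n\n"
        "Answer with exactly YES or NO."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = response.choices[0].message.content or ""
    return answer.strip().upper().startswith("YES")

snippet = "def add(a, b):\n    return a + b\n"
print(check_property(snippet, "The function returns the sum of its two arguments."))
```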
Open-source framework for stress-testing LLMs and conversational AI. Identify hallucinations, policy violations, and edge cases with scalable, realistic simulations. Join the Discord: https://discord.gg/ssd4S37WNW
Prompture is an API-first library for requesting structured output (JSON or any other structure) from LLMs, validating it against a schema, and running comparative tests between models.
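The underlying pattern might look like the following sketch, which uses the OpenAI client and `jsonschema` directly rather than Prompture's own API; the schema, model names, and prompt are placeholders:

```python
# Generic sketch of the pattern (structured JSON from an LLM, validated
# against a schema, compared across models); not Prompture's actual API.
import json
from jsonschema import ValidationError, validate
from openai import OpenAI

client = OpenAI()

SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "year_founded": {"type": "integer"},
    },
    "required": ["name", "year_founded"],
}

def extract(model: str, text: str) -> dict:
    """Request JSON output from the model and parse it."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Return JSON with keys 'name' and 'year_founded' for: {text}",
        }],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

# Comparative test: run the same extraction against several models.
for model in ["gpt-4o-mini", "gpt-4o"]:
    data = extract(model, "Mozilla was founded in 1998.")
    try:
        validate(instance=data, schema=SCHEMA)
        print(model, "valid:", data)
    except ValidationError as err:
        print(model, "schema violation:", err.message)
```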
Integration of OpenAI with Pytest to automate API test generation.
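A hedged sketch of one way such an integration could work, asking the model to draft pytest cases from an endpoint description; the `generate_tests` helper and output file name are hypothetical, not the repository's actual code:

```python
# Minimal sketch: have an LLM draft pytest cases from an API description.
from openai import OpenAI

client = OpenAI()

def generate_tests(endpoint_description: str, model: str = "gpt-4o-mini") -> str:
    """Return pytest source code drafted by the model for the described endpoint."""
    prompt = (
        "Write pytest test functions (using the `requests` library) for this API "
        f"endpoint. Output only Python code.\n\n{endpoint_description}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content or ""

if __name__ == "__main__":
    description = "GET /users/{id} returns 200 with JSON {'id': int, 'name': str}, or 404 if missing."
    with open("test_users_generated.py", "w") as fh:
        fh.write(generate_tests(description))
```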
🚀 ARM64 browser automation for Claude Code: SaaS testing on an 80 Raspberry Pi budget. Works on ARM64 where Playwright/Puppeteer fail; autonomous testing without human debugging.
An automated approach for exploring and testing conversational agents using large language models. TRACER discovers chatbot functionalities, generates user profiles, and creates comprehensive test suites for conversational AI systems.
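A rough sketch of the simulated-user exploration pattern such a tool relies on; the persona, the `chatbot_under_test` stub, and the turn loop below are illustrative assumptions, not TRACER's implementation:

```python
# Rough sketch: an LLM acting as a simulated user to probe a chatbot.
from openai import OpenAI

client = OpenAI()

def simulated_user_turn(history: list[dict], persona: str) -> str:
    """Generate the next probing question for a given user persona."""
    messages = [{
        "role": "system",
        "content": f"You are a {persona} exploring a customer-support chatbot. "
                   "Ask one question per turn to discover what it can do.",
    }] + history
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return response.choices[0].message.content

def chatbot_under_test(user_message: str) -> str:
    """Stand-in for the conversational system being explored."""
    return "I can help you track orders and process refunds."

transcript: list[dict] = []
for _ in range(3):  # a few exploration turns
    question = simulated_user_turn(transcript, persona="impatient first-time customer")
    answer = chatbot_under_test(question)
    # From the simulated user's perspective, its own questions are "assistant"
    # messages and the chatbot's replies are "user" inputs.
    transcript += [{"role": "assistant", "content": question},
                   {"role": "user", "content": answer}]

print(transcript)  # transcripts like this can be mined into regression test cases
```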
Open-source tools, SDKs, and resources for AetherLab AI quality control platform
A lightweight dashboard to view and analyze test automation results. Built with Streamlit + PostgreSQL, and powered by AI (Gemini) to help debug test failures faster.
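A minimal sketch of such a dashboard, assuming a hypothetical `test_results` table and connection string; it is not the project's actual code:

```python
# Minimal Streamlit + PostgreSQL sketch; table name, columns, and credentials
# are placeholders.
import pandas as pd
import streamlit as st
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/testdb")

st.title("Test Automation Results")

# Load recent runs from PostgreSQL into a DataFrame.
results = pd.read_sql(
    "SELECT suite, test_name, status, duration_s, run_at "
    "FROM test_results ORDER BY run_at DESC LIMIT 500",
    engine,
)

# Summary metric plus a filterable table of failures for the chosen suite.
st.metric("Pass rate", f"{(results['status'] == 'passed').mean():.0%}")
suite = st.selectbox("Suite", sorted(results["suite"].unique()))
st.dataframe(results[(results["suite"] == suite) & (results["status"] == "failed")])
```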
Agentic Workflow Evaluation: Text Summarization Agent. An AI agent evaluation workflow built around a text summarization model using the OpenAI API and the Transformers library. It follows an iterative approach: generate summaries, analyze metrics, adjust parameters, and retest to refine the agent for accuracy, readability, and performance.
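The iterative loop could be sketched as follows, using a Hugging Face summarization pipeline and a simple compression-ratio metric in place of the project's real metrics and parameters:

```python
# Illustrative sketch of the generate / measure / adjust / retest loop.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

ARTICLE = (
    "Large language models are increasingly used in production systems, "
    "which makes systematic evaluation of their outputs essential. "
    "Summarization agents in particular must balance brevity, faithfulness, "
    "and readability across many document types."
)

target_ratio = 0.4   # desired summary length relative to the source
max_len = 60         # generation parameter adjusted between iterations

for iteration in range(3):
    summary = summarizer(
        ARTICLE, max_length=max_len, min_length=10, do_sample=False
    )[0]["summary_text"]
    ratio = len(summary.split()) / len(ARTICLE.split())
    print(f"iter {iteration}: ratio={ratio:.2f} summary={summary!r}")
    if ratio > target_ratio:
        max_len = max(15, int(max_len * 0.7))  # too long: tighten the budget and retest
    else:
        break
```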
This repository contains a study comparing the web search capabilities of four AI assistants: Gemini 2.0 Flash, ChatGPT-4 Turbo, DeepSeek-R1, and Grok 3.
Modular and extensible Python framework for applying synthetic inference and controlled perturbations to AI model inputs, labels, and hyperparameters. Evaluate robustness, sensitivity, and stability of algorithms under realistic variations and adverse scenarios.
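A generic sketch of the perturbation idea, using Gaussian input noise against a scikit-learn classifier; the framework's own API and perturbation types will differ:

```python
# Generic robustness sketch: measure how predictions change under controlled
# input perturbations. Model, data, and noise scales are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

def perturb(X: np.ndarray, noise_scale: float) -> np.ndarray:
    """Add Gaussian noise scaled to each feature's standard deviation."""
    return X + rng.normal(0.0, noise_scale, X.shape) * X.std(axis=0)

baseline = model.predict(X)
for scale in [0.01, 0.1, 0.5]:
    flipped = (model.predict(perturb(X, scale)) != baseline).mean()
    print(f"noise scale {scale}: {flipped:.1%} of predictions changed")
```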