A library for running two-agent debates in response to a question, and judging the answer. This library will be used to identify the limits of debate's scalability: the maximum Elo gap between debaters and judge before the debate mechanism begins to fail. This could be helpful when assessing far more capable models in the future: if scaling speeds up such that the capability delta between successive model generations exceeds the Elo gap found here, debate ceases to be a viable method of scalable oversight.
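A minimal sketch of the intended flow, using the `llm` Python API mentioned in the install notes below. The model choices, prompts, and round structure here are illustrative assumptions, not the library's actual interface:

```python
# Sketch of one two-agent debate plus judging, via the `llm` Python API.
# Model names, prompts, and the two-round structure are illustrative only.
import llm

QUESTION = "Is the Great Wall of China visible to the naked eye from low Earth orbit?"

debater_a = llm.get_model("claude-3-opus")   # argues "yes"
debater_b = llm.get_model("claude-3-haiku")  # argues "no" (weaker model)
judge = llm.get_model("claude-3-haiku")      # weaker judge picks the winner

transcript = []
for _ in range(2):  # two rounds of alternating arguments
    for name, model, stance in [("A", debater_a, "yes"), ("B", debater_b, "no")]:
        prompt = (
            f"Question: {QUESTION}\n"
            f"You are debater {name}, arguing '{stance}'.\n"
            "Transcript so far:\n" + "\n".join(transcript) +
            "\nGive your next argument in under 100 words."
        )
        reply = model.prompt(prompt).text()
        transcript.append(f"Debater {name}: {reply}")

verdict = judge.prompt(
    f"Question: {QUESTION}\n" + "\n".join(transcript) +
    "\nWhich debater argued more convincingly, A or B? Answer with a single letter."
).text()
print(verdict)
```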
to do:
- prompt engineering to run the debate reliably
- find a question bank (or make one) for debate; read the papers below to see what they use
- use Replicate/Ollama to query models at different Elo ratings, and see how often the higher-Elo chatbot wins depending on how subjective the position is
- get a graph of win % vs Elo gap (a plotting sketch follows this list)
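A sketch of what the win % vs Elo plot could look like. The win-rate values below are placeholders to be replaced by actual debate outcomes; the standard Elo expected-score curve is included only as a reference:

```python
# Sketch: plot debate win rate against the Elo gap between the two debaters.
# win_rate values are placeholders; replace them with experimental results.
import numpy as np
import matplotlib.pyplot as plt

elo_gap = np.array([0, 100, 200, 400, 800])           # stronger minus weaker debater
win_rate = np.array([0.50, 0.58, 0.68, 0.80, 0.88])   # placeholder data

# Standard Elo expected score for a head-to-head game, shown for comparison.
expected = 1 / (1 + 10 ** (-elo_gap / 400))

plt.plot(elo_gap, win_rate, "o-", label="observed debate win %")
plt.plot(elo_gap, expected, "--", label="Elo expected score")
plt.xlabel("Elo gap (stronger debater - weaker debater)")
plt.ylabel("win rate of stronger debater")
plt.legend()
plt.savefig("win_rate_vs_elo.png")
```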
Project submitted as part of the AI Alignment course run by BlueDot Impact, June 2024 cohort.
Relevant reading:
- Irving et al. (2018): "AI Safety via Debate" - Foundational work on using debate for AI oversight.
- Saunders et al. (2022): "Self-critiquing models for assisting human evaluators" - Recent work on using language models to assist human evaluation.
- Michael et al. (2023): "Debate Helps Supervise Unreliable Experts"
- Cohen et al. (2023): "LM vs LM: Detecting Factual Errors via Cross Examination"
- Khan et al. (2024): "Debating with More Persuasive LLMs Leads to More Truthful Answers"
Git clone, then `pip install .` from the repo root. Then run `llm install llm-claude-3` to install the Claude set of models, and provide keys with `llm keys set claude`.
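Once keys are set, a quick way to sanity-check the setup (a sketch using the `llm` Python API; the model ID is an assumption, so substitute whichever Claude model you plan to use):

```python
# Quick check that the Claude models are reachable through the llm Python API.
import llm

model = llm.get_model("claude-3-haiku")  # any model ID registered by llm-claude-3
response = model.prompt("Reply with the single word: ready")
print(response.text())
```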