METR

Model Evaluation and Threat Research (METR)

METR is a research nonprofit that works on assessing whether cutting-edge AI systems could pose catastrophic risks to society.

We build the science of accurately assessing risks, so that humanity is informed before developing transformative AI systems.

Read more about our work here.

Our Software

Popular repositories

  1. task-standard Public

    METR Task Standard

    TypeScript, 142 stars, 32 forks

  2. public-tasks Public

    TeX, 84 stars, 7 forks

  3. vivaria Public

    Vivaria is METR's tool for running evaluations and conducting agent elicitation research.

    TypeScript, 79 stars, 29 forks

  4. RE-Bench Public

    Python, 60 stars, 6 forks

  5. task-template Public template

    TypeScript, 9 stars, 6 forks

  6. autonomy-evals-guide Public

    SCSS, 3 stars, 4 forks

Repositories

Showing 10 of 25 repositories
  • vivaria Public

    Vivaria is METR's tool for running evaluations and conducting agent elicitation research.

    TypeScript, 79 stars, MIT license, 29 forks, 224 issues (3 need help), 16 pull requests, updated Feb 28, 2025
  • eval-analysis-public Public

    Public repository containing METR's DVC pipeline for eval data analysis

    Jupyter Notebook, 2 stars, 4 forks, 5 issues, 1 pull request, updated Feb 27, 2025
  • task-assets Public
    Python, 0 stars, 0 forks, 1 issue, 1 pull request, updated Feb 26, 2025
  • autonomy-evals-guide Public
    SCSS, 3 stars, MIT license, 4 forks, 0 issues, 2 pull requests, updated Feb 21, 2025
  • agent-prs-on-vivaria Public Forked from METR/vivaria

    Vivaria is METR's tool for running evaluations and conducting agent elicitation research.

    TypeScript, 0 stars, MIT license, 30 forks, 0 issues, 32 pull requests, updated Feb 19, 2025
  • inspect_k8s_sandbox Public Forked from UKGovernmentBEIS/inspect_k8s_sandbox

    A Kubernetes sandbox environment for use with inspect_ai

    Python, 0 stars, MIT license, 3 forks, 0 issues, 0 pull requests, updated Feb 14, 2025
  • SWE-bench-fork Public Forked from SWE-bench/SWE-bench

    [ICLR 2024] SWE-bench: Can Language Models Resolve Real-world Github Issues?

    Python, 0 stars, MIT license, 435 forks, 0 issues, 0 pull requests, updated Feb 11, 2025
  • KernelBenchFiltered
    Python, 3 stars, 1 fork, 0 issues, 0 pull requests, updated Feb 11, 2025
  • viv-task-dev Public
    Shell, 0 stars, 1 fork, 7 issues, 0 pull requests, updated Feb 6, 2025
  • inspect_ai Public Forked from UKGovernmentBEIS/inspect_ai

    Inspect: A framework for large language model evaluations

    Python, 0 stars, MIT license, 190 forks, 0 issues, 0 pull requests, updated Feb 4, 2025
