Attack to induce hallucinations in LLMs
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
Restore safety in fine-tuned language models through task arithmetic (see the sketch after this list)
Code and dataset for the paper: "Can Editing LLMs Inject Harm?"
NeurIPS'24 - LLM Safety Landscape
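The task-arithmetic entry above operates on model weights directly. Below is a minimal, hypothetical sketch of the general idea (not that repository's exact method): compute a "safety vector" as the parameter-wise difference between a safety-aligned checkpoint and an unaligned one, then add a scaled copy of that vector to the fine-tuned model's weights. The checkpoint names and the `scale` coefficient are illustrative assumptions.

```python
# Minimal sketch of task arithmetic for safety restoration, assuming three
# checkpoints are available: a safety-aligned base, an unaligned base, and a
# fine-tuned model whose safety we want to restore. All names are illustrative.
import torch


def safety_vector(aligned: dict, unaligned: dict) -> dict:
    """Parameter-wise difference capturing the 'safety direction'."""
    return {k: aligned[k] - unaligned[k] for k in aligned}


def restore_safety(finetuned: dict, vector: dict, scale: float = 1.0) -> dict:
    """Add a scaled safety vector back into the fine-tuned weights."""
    return {k: finetuned[k] + scale * vector[k] for k in finetuned}


if __name__ == "__main__":
    # Toy stand-ins for real state_dicts (e.g. from model.state_dict()).
    shape = (4, 4)
    aligned = {"w": torch.randn(shape)}
    unaligned = {"w": torch.randn(shape)}
    finetuned = {"w": torch.randn(shape)}

    v = safety_vector(aligned, unaligned)
    restored = restore_safety(finetuned, v, scale=0.5)
    print(restored["w"].shape)
```

In practice the same arithmetic would be applied to full `model.state_dict()` tensors, with `scale` tuned to trade off safety restoration against downstream task performance.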