A survey on harmful fine-tuning attack for large language model
-
Updated
Dec 26, 2024
A survey on harmful fine-tuning attack for large language model
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)
This is the official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS2024)
This is the official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation".
Sanitizer is a server-side method that ensures client-embedded backdoors can only be used for contribution demonstration in federated learning but not be triggered on natural queries in harmful ways.
94种病毒的源代码
Educational Ransomware Simulation
The research mainly aims to identify through classification algorithms if one day, based on its climatic features and concentrations of harmful elements in the air, it turns out to be harmful (or not) to the health of citizens in the Milan metropolis. A second prediction model was adopted to predict daily mean PM2.5 values.
your device may get warm while visiting this site
Add a description, image, and links to the harmful topic page so that developers can more easily learn about it.
To associate your repository with the harmful topic, visit your repo's landing page and select "manage topics."