
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning (accepted by AAAI 2025)


xinykou/NLSR


NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning


Conda Environment

  • pip install -r requirements.txt

Step 1: Construction of a safety reference model

bash ./scripts/expo-sft_to_dpo-lora.sh

Step 2: Recognition of Safety-Critical Neurons

bash ./scripts/low_rank_prune.sh
bash ./scripts/low_rank_sparsity.sh

Step 3: Restoration of Safety-Broken Neurons

bash ./scripts/expo-adaptive_mask_replace-realign.sh
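
Steps 1 to 3 above can be chained in a small wrapper. The following is a minimal sketch, not part of the repository: the `STEPS` array, the `run_pipeline` function, and the `DRY_RUN` variable are hypothetical names introduced here for illustration, and the sketch assumes the scripts are run from the repository root in the order listed above. By default it only prints the order; set `DRY_RUN=0` to actually execute each script.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not shipped with NLSR): runs Steps 1-3 in order.
set -euo pipefail

# Script paths exactly as listed in the steps above.
STEPS=(
  "./scripts/expo-sft_to_dpo-lora.sh"                # Step 1: safety reference model
  "./scripts/low_rank_prune.sh"                      # Step 2: safety-critical neurons
  "./scripts/low_rank_sparsity.sh"                   # Step 2 (continued)
  "./scripts/expo-adaptive_mask_replace-realign.sh"  # Step 3: realignment
)

# run_pipeline DRY_RUN
#   DRY_RUN=1 prints each step without running it; any other value runs it.
run_pipeline() {
  local dry_run="${1:-1}"
  local step
  for step in "${STEPS[@]}"; do
    echo "==> ${step}"
    if [ "${dry_run}" != "1" ]; then
      # With `set -e`, a failing step stops the pipeline here.
      bash "${step}"
    fi
  done
}

# Dry run unless the caller exports DRY_RUN=0.
run_pipeline "${DRY_RUN:-1}"
```

Because of `set -e`, a failing step halts the run, so later steps never start from incomplete outputs of an earlier one.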

Others

To evaluate the trade-off between safety and utility, run the downstream-task and safety evaluation scripts:

bash ./scripts/expo-adaptive_mask_replace-eval_downstream.sh
bash ./scripts/expo-adaptive_mask_replace-eval_safety.sh

Citation

@article{yi2024nlsr,
  title={NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning},
  author={Yi, Xin and Zheng, Shunfan and Wang, Linlin and de Melo, Gerard and Wang, Xiaoling and He, Liang},
  journal={arXiv preprint arXiv:2412.12497},
  year={2024}
}
