Hi there 👋 I am Tiansheng Huang

  • I am a third-year PhD candidate at Georgia Tech.
  • I work on safety alignment for large language models, with a particular interest in red-teaming attacks and defenses for LLMs.

Selected Publications

I push myself to publish a high-quality paper roughly every three months.

Here are the papers I wrote in 2025.

  • [2025/3/01] Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable arXiv [paper] [code]
  • [2025/1/30] Virus: Harmful Fine-tuning Attack for Large Language Models bypassing Guardrail Moderation arXiv [paper] [code]

Here are the papers I wrote in 2024.

  • [2024/9/26] Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey arXiv [paper] [repo]
  • [2024/9/3] Booster: Tackling harmful fine-tuning for large language models via attenuating harmful perturbation ICLR2025 [paper] [code] [Openreview]
  • [2024/8/18] Antidote: Post-fine-tuning safety alignment for large language models against harmful fine-tuning arXiv [paper]
  • [2024/5/28] Lazy safety alignment for large language models against harmful fine-tuning NeurIPS2024 [paper] [code]
  • [2024/2/2] Vaccine: Perturbation-aware alignment for large language models against harmful fine-tuning NeurIPS2024 [paper] [code]

Pinned Repositories

  1. git-disl/Safety-Tax (Python): Official code for the paper "Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable".

  2. git-disl/awesome_LLM-harmful-fine-tuning-papers: A survey on harmful fine-tuning attacks for large language models.

  3. git-disl/Virus (Python): Official code for the paper "Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation".

  4. git-disl/Booster (Shell): Official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation" (ICLR2025).

  5. git-disl/Vaccine (Shell): Official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024).

  6. git-disl/Lisa (Python): Official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS2024).