[P1] Tutorial of Inference-time Intervention #68

Closed
frankaging opened this issue Jan 18, 2024 · 1 comment
@frankaging
Collaborator

Description:

Inference-time interventions on activations, which steer model behavior, are a good application of this library and fit its ultimate goal well. Ideally, people should be able to easily share their steering mount points, along with the vectors they inject, with others.

Original GitHub:
https://github.com/likenneth/honest_llama

@frankaging frankaging self-assigned this Jan 18, 2024
@frankaging
Collaborator Author

Update: it's hard to find the raw activation additions, so I will probably compute a model weight diff by loading https://huggingface.co/likenneth/honest_llama2_chat_7B and the original model, extract the per-head diff, and then apply it.
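
The weight-diff idea can be sketched roughly as follows. This is a minimal illustration, not the honest_llama code or this library's API: parameters are modelled as plain Python lists keyed by name, the key names are made up, and it assumes the edited checkpoint differs from the base only by additive terms (a parameter missing from the base is treated as all zeros). With real checkpoints the same loop would run over the tensors in each model's state dict.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # hypothetical stand-in for a model state dict


def weight_diff(base: Params, edited: Params, tol: float = 1e-8) -> Params:
    """Return edited - base for every parameter that actually changed."""
    diff: Params = {}
    for name, new_values in edited.items():
        # Assumption: a parameter absent from the base (e.g. a newly added
        # bias) is equivalent to zeros of the same length.
        old_values = base.get(name, [0.0] * len(new_values))
        if len(old_values) != len(new_values):
            raise ValueError(f"shape mismatch for {name}")
        delta = [n - o for n, o in zip(new_values, old_values)]
        # Keep only parameters whose values moved by more than the tolerance.
        if any(abs(d) > tol for d in delta):
            diff[name] = delta
    return diff


# Toy data; the key names are illustrative only.
base = {"layers.0.attn.out.weight": [0.5, -0.25]}
edited = {
    "layers.0.attn.out.weight": [0.5, -0.25],  # unchanged by the edit
    "layers.0.attn.out.bias": [0.25, -0.5],    # added by the edit
}
diff = weight_diff(base, edited)
print(diff)
```

Only the entries that survive the tolerance check would need to be stored, which keeps the shared artifact small compared with a full checkpoint.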

The original implementation uses BauKit to do the intervention. I am hoping to show that we can save the weight diff along with the intervention config, so people can apply the activation diff directly.
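
One possible shape for such a shareable artifact is sketched below. The `SteeringAddition` record, its field names, and the JSON layout are all hypothetical; they are not this library's API or BauKit's. The sketch only illustrates that a mount point plus a vector is enough to reapply a stored diff to an activation.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class SteeringAddition:
    """Hypothetical record: where to intervene and what to add."""
    layer: int
    component: str       # e.g. "attention_output"; the naming is illustrative
    vector: List[float]

    def apply(self, activation: List[float]) -> List[float]:
        # Activation addition: elementwise sum with the stored vector.
        if len(activation) != len(self.vector):
            raise ValueError("activation and vector sizes differ")
        return [a + v for a, v in zip(activation, self.vector)]


def dumps(additions: List[SteeringAddition]) -> str:
    """Serialize the additions to a JSON string that can be shared."""
    return json.dumps({"additions": [asdict(a) for a in additions]})


def loads(text: str) -> List[SteeringAddition]:
    """Rebuild the additions from the JSON produced by dumps()."""
    return [SteeringAddition(**item) for item in json.loads(text)["additions"]]


saved = dumps([SteeringAddition(layer=12, component="attention_output",
                                vector=[0.5, -1.0])])
restored = loads(saved)
steered = restored[0].apply([1.0, 1.0])
```

In a real model the `apply` step would run inside whatever hook mechanism performs the intervention; here it operates on a plain list so the round trip can be checked in isolation.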

frankaging added a commit that referenced this issue Jan 27, 2024
[Minor] Support ITI Paper Results (#68)