
[Bug-Fix] fix attention head intervention for multiple models #159

Merged (3 commits) on Jun 12, 2024

Conversation

@Bakser (Contributor) commented May 25, 2024

Description

Fix the bug reported in #158 by modifying the modeling scripts of gemma, gpt_neo, gpt_neox, llama, llava, and mistral. The fix involves three points:

  • add the missing split_head_and_permute operation that splits the hidden representations by attention head
  • fix support for models with an LM head (_lm_type_to_module_mapping), which was missing the elements in v[1:]
  • add support for the grouped-query attention modules used by Llama-like models
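For context, the tensor manipulations at the heart of the first and third points can be sketched as follows (a minimal illustration with assumed shapes and helper names, not pyvene's actual implementation):

```python
import torch

def split_head_and_permute(x: torch.Tensor, n_heads: int) -> torch.Tensor:
    # (batch, seq, n_heads * head_dim) -> (batch, n_heads, seq, head_dim),
    # so interventions can index individual attention heads.
    b, s, h = x.shape
    return x.view(b, s, n_heads, h // n_heads).permute(0, 2, 1, 3)

def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    # Grouped-query attention: each key/value head serves n_rep query
    # heads, so expand the key/value heads along the head dimension.
    b, kv_heads, s, d = x.shape
    return x[:, :, None].expand(b, kv_heads, n_rep, s, d).reshape(b, kv_heads * n_rep, s, d)

q_proj = torch.randn(2, 5, 64)     # 8 query heads x head_dim 8
kv_proj = torch.randn(2, 5, 16)    # 2 key/value heads x head_dim 8 (GQA)
q = split_head_and_permute(q_proj, 8)
kv = repeat_kv(split_head_and_permute(kv_proj, 2), 4)
print(q.shape, kv.shape)  # both (2, 8, 5, 8)
```

Without the split step, a head-level intervention would index into the flat hidden dimension instead of a per-head axis, which is the failure mode described in #158.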

Testing Done

All tests passed.

'pyvene' is not installed.
PASS: pyvene is not installed. Testing local dev code.
=== Test Suite: VanillaInterventionWithTransformerTestCase ===
loaded model
./shared/nas2/wangxz/pyvene/pyvene/models/intervenable_base.py:55: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
  logging.warn(
WARNING:root:Detected use_fast=True means the intervention location will be static within a batch.

In case multiple location tags are passed only the first one will be considered
.WARNING:root:Detected use_fast=True means the intervention location will be static within a batch.

In case multiple location tags are passed only the first one will be considered
.WARNING:root:Detected use_fast=True means the intervention location will be static within a batch.

In case multiple location tags are passed only the first one will be considered
./shared/nas2/wangxz/miniconda3/envs/llm/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
loaded model
.loaded model
model.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 548M/548M [00:02<00:00, 241MB/s]
loaded model
.loaded model
.loaded model
.loaded model
.loaded model
IntervenableConfig
{
    "model_type": "None",
    "representations": [
        {
            "layer": 0,
            "component": "mlp_output",
            "unit": "pos",
            "max_number_of_units": 1,
            "low_rank_dimension": null,
            "intervention_type": null,
            "intervention": null,
            "subspace_partition": null,
            "group_key": null,
            "intervention_link_key": null,
            "moe_key": null,
            "source_representation": "PLACEHOLDER",
            "hidden_source_representation": null
        },
        {
            "layer": 1,
            "component": "mlp_output",
            "unit": "pos",
            "max_number_of_units": 1,
            "low_rank_dimension": null,
            "intervention_type": null,
            "intervention": null,
            "subspace_partition": null,
            "group_key": null,
            "intervention_link_key": null,
            "moe_key": null,
            "source_representation": "PLACEHOLDER",
            "hidden_source_representation": null
        },
        {
            "layer": 2,
            "component": "mlp_output",
            "unit": "pos",
            "max_number_of_units": 1,
            "low_rank_dimension": null,
            "intervention_type": null,
            "intervention": null,
            "subspace_partition": null,
            "group_key": null,
            "intervention_link_key": null,
            "moe_key": null,
            "source_representation": "PLACEHOLDER",
            "hidden_source_representation": null
        },
        {
            "layer": 3,
            "component": "mlp_output",
            "unit": "pos",
            "max_number_of_units": 1,
            "low_rank_dimension": null,
            "intervention_type": null,
            "intervention": null,
            "subspace_partition": null,
            "group_key": null,
            "intervention_link_key": null,
            "moe_key": null,
            "source_representation": "PLACEHOLDER",
            "hidden_source_representation": null
        }
    ],
    "intervention_types": "<class 'pyvene.models.interventions.VanillaIntervention'>",
    "mode": "parallel",
    "sorted_keys": "None",
    "intervention_dimensions": "None"
}
.loaded model
.loaded model
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Once upon a time there was a little girl named Lucy. She was three years old and loved to explore. One day, Lucy was walking in the park when
.loaded model
loaded model
.loaded model
loaded model
.loaded model
.sentencepiece is not installed. skipping
.loaded model
Directory './tmp/' already exists.
/shared/nas2/wangxz/pyvene/pyvene/models/intervenable_base.py:165: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
  logging.warn(
WARNING:root:The key is provided in the config. Assuming this is loaded from a pretrained module.
.loaded model
.loaded model
.loaded model
.loaded model
.loaded model
Directory './test_output_dir_prefix-542c1b' already exists.
WARNING:root:The key is provided in the config. Assuming this is loaded from a pretrained module.
.loaded model
.loaded model
.loaded model
.Removing testing dir ./test_output_dir_prefix-542c1b
=== Test Suite: InterventionWithGPT2TestCase ===
loaded model
testing stream: head_attention_value_output with multiple heads positions
testing stream: head_query_output with multiple heads positions
testing stream: head_key_output with multiple heads positions
testing stream: head_value_output with multiple heads positions
.=== Test Suite: InterventionWithMLPTestCase ===
loaded model
......=== Test Suite: CausalModelTestCase ===
......=== Test Suite: IntervenableConfigUnitTestCase ===
loaded model
.=== Test Suite: InterventionUtilsTestCase ===
loaded model
.....Directory './test_output_dir_prefix-dbeab8' created successfully.
WARNING:root:The key is provided in the config. Assuming this is loaded from a pretrained module.
Directory './test_output_dir_prefix-5f226d' created successfully.
WARNING:root:The key is provided in the config. Assuming this is loaded from a pretrained module.
Directory './test_output_dir_prefix-da0f35' created successfully.
WARNING:root:The key is provided in the config. Assuming this is loaded from a pretrained module.
.Directory './test_output_dir_prefix-3a4b1e' created successfully.
.Directory './test_output_dir_prefix-23496a' created successfully.
.tensor([[1.9266e-05, 1.0024e+00, 1.4001e+01, 1.5001e+01, 1.6000e+01, 1.7000e+01],
        [6.0000e+00, 7.0024e+00, 2.0001e+01, 2.1001e+01, 2.2000e+01, 2.3000e+01]],
       grad_fn=<AddBackward0>)
./shared/nas2/wangxz/pyvene/pyvene/models/interventions.py:422: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  mask_sigmoid = torch.sigmoid(self.mask / torch.tensor(self.temperature))
.........Removing testing dir ./test_output_dir_prefix-dbeab8
Removing testing dir ./test_output_dir_prefix-5f226d
Removing testing dir ./test_output_dir_prefix-da0f35
Removing testing dir ./test_output_dir_prefix-3a4b1e
Removing testing dir ./test_output_dir_prefix-23496a
.............
----------------------------------------------------------------------
Ran 71 tests in 33.548s

OK

Checklist:

  • My PR title strictly follows the format: [Your Priority] Your Title
  • I have attached the testing log above
  • I provide enough comments to my code
  • I have changed documentations
  • I have added tests for my changes

@frankaging (Collaborator) left a comment

LGTM! Thanks for the change!

@frankaging (Collaborator) commented

@Bakser would you provide a unit test for one of the model types you changed? e.g., llama

thanks. i can add in a new unit test based on your script.

@Bakser (Contributor) commented May 25, 2024

Sure, I can help with that.

I'm not quite familiar with the unit tests of this repo. Do you mean something like the tests/integration_tests/InterventionWithGPT2TestCase.py but for llama?

@frankaging (Collaborator) commented

yes! that would be great! and you can initialize a much smaller llama for the test, e.g. just a single-layer llama, since we want the unit test to be quick. thanks
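A single-layer Llama small enough for a quick test could be constructed roughly like this (the config values here are illustrative assumptions, not the settings the repo's test ultimately used):

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=1000,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=1,      # a single layer keeps the test fast
    num_attention_heads=4,
    num_key_value_heads=2,    # < num_attention_heads exercises the GQA path
    max_position_embeddings=64,
)
model = LlamaForCausalLM(config)

input_ids = torch.randint(0, 1000, (1, 8))
logits = model(input_ids).logits
print(logits.shape)           # (1, 8, 1000)
```

Setting num_key_value_heads below num_attention_heads is what forces the grouped-query attention code path that this PR fixes.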

@frankaging (Collaborator) commented

@Bakser hey! any updates on the progress? thanks!

@Bakser (Contributor) commented May 27, 2024

I didn't work on it over the weekend, but I think I can finish it today. Sorry to make you worry.

@Bakser (Contributor) commented May 28, 2024

I found that I underestimated the workload. I will try to finish it in a couple of days.

@Bakser (Contributor) commented Jun 11, 2024

@frankaging I've finished the test for Llama. Sorry for the delay; I was traveling.

Basically, I copied the tests in InterventionWithGPT2TestCase.py into InterventionWithLlamaTestCase.py and added the implementation of the Llama forward process to tests/utils.py, as was done for GPT2 (though I do think that if we want tests for more models, we should split this file).

It can be run with python -m unittest tests.integration_tests.InterventionWithLlamaTestCase, and the output should look like:

'pyvene' is not installed.
PASS: pyvene is not installed. Testing local dev code.
=== Test Suite: InterventionWithLlamaTestCase ===
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
loaded model
testing stream: head_attention_value_output with multiple heads positions
testing stream: head_query_output with multiple heads positions
testing stream: head_key_output with multiple heads positions
testing stream: head_value_output with multiple heads positions
.
----------------------------------------------------------------------
Ran 1 test in 37.738s

OK
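The core check behind each of those head streams boils down to replacing one head's slice of a per-head activation tensor and verifying the effect stays localized; a stripped-down sketch of that idea (illustrative only, not the repo's actual test code):

```python
import torch

torch.manual_seed(0)
base = torch.randn(1, 4, 5, 8)       # (batch, heads, seq, head_dim)
source = torch.randn(1, 4, 5, 8)

head = 2
intervened = base.clone()
intervened[:, head] = source[:, head]  # vanilla-style swap of one head

# Only the targeted head should differ from the base activations.
assert torch.equal(intervened[:, head], source[:, head])
for h in range(base.shape[1]):
    if h != head:
        assert torch.equal(intervened[:, h], base[:, h])
print("intervention localized to head", head)
```

The real integration test goes further, comparing pyvene's intervened model output against a manual re-implementation of the forward pass, but the per-head indexing above is the part the missing split_head_and_permute broke.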

@frankaging (Collaborator) commented

Thanks! @Bakser

@frankaging frankaging merged commit 3da8474 into stanfordnlp:main Jun 12, 2024
1 check passed