
[Bug-Fix] fix attention head intervention for multiple models #159

Merged (3 commits) on Jun 12, 2024

Conversation

@Bakser (Contributor) commented May 25, 2024

Description

Fix the bug reported in #158 by modifying the modeling scripts of gemma, gpt_neo, gpt_neox, llama, llava, and mistral. The fix involves three points:

  • add the missing split_head_and_permute operation that splits the hidden representations by attention head
  • fix support for models with an LM head (_lm_type_to_module_mapping), which was missing the elements in v[1:]
  • add support for the grouped-query attention modules used by Llama-like models
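For context, the tensor manipulations at the heart of the first and third points can be sketched as follows (a minimal illustration with assumed shapes and helper names, not pyvene's actual implementation):

```python
import torch

def split_head_and_permute(x: torch.Tensor, n_heads: int) -> torch.Tensor:
    # (batch, seq, n_heads * head_dim) -> (batch, n_heads, seq, head_dim),
    # so interventions can index individual attention heads.
    b, s, h = x.shape
    return x.view(b, s, n_heads, h // n_heads).permute(0, 2, 1, 3)

def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    # Grouped-query attention: each key/value head serves n_rep query
    # heads, so expand the key/value heads along the head dimension.
    b, kv_heads, s, d = x.shape
    return x[:, :, None].expand(b, kv_heads, n_rep, s, d).reshape(b, kv_heads * n_rep, s, d)

q_proj = torch.randn(2, 5, 64)     # 8 query heads x head_dim 8
kv_proj = torch.randn(2, 5, 16)    # 2 key/value heads x head_dim 8 (GQA)
q = split_head_and_permute(q_proj, 8)
kv = repeat_kv(split_head_and_permute(kv_proj, 2), 4)
print(q.shape, kv.shape)  # both (2, 8, 5, 8)
```

Without the split step, a head-level intervention would index into the flat hidden dimension instead of a per-head axis, which is the failure mode described in #158.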

Testing Done

All tests passed.

'pyvene' is not installed.
PASS: pyvene is not installed. Testing local dev code.
=== Test Suite: VanillaInterventionWithTransformerTestCase ===
loaded model
./shared/nas2/wangxz/pyvene/pyvene/models/intervenable_base.py:55: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
  logging.warn(
WARNING:root:Detected use_fast=True means the intervention location will be static within a batch.

In case multiple location tags are passed only the first one will be considered
.WARNING:root:Detected use_fast=True means the intervention location will be static within a batch.

In case multiple location tags are passed only the first one will be considered
.WARNING:root:Detected use_fast=True means the intervention location will be static within a batch.

In case multiple location tags are passed only the first one will be considered
./shared/nas2/wangxz/miniconda3/envs/llm/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
loaded model
.loaded model
model.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 548M/548M [00:02<00:00, 241MB/s]
loaded model
.loaded model
.loaded model
.loaded model
.loaded model
IntervenableConfig
{
    "model_type": "None",
    "representations": [
        {
            "layer": 0,
            "component": "mlp_output",
            "unit": "pos",
            "max_number_of_units": 1,
            "low_rank_dimension": null,
            "intervention_type": null,
            "intervention": null,
            "subspace_partition": null,
            "group_key": null,
            "intervention_link_key": null,
            "moe_key": null,
            "source_representation": "PLACEHOLDER",
            "hidden_source_representation": null
        },
        {
            "layer": 1,
            "component": "mlp_output",
            "unit": "pos",
            "max_number_of_units": 1,
            "low_rank_dimension": null,
            "intervention_type": null,
            "intervention": null,
            "subspace_partition": null,
            "group_key": null,
            "intervention_link_key": null,
            "moe_key": null,
            "source_representation": "PLACEHOLDER",
            "hidden_source_representation": null
        },
        {
            "layer": 2,
            "component": "mlp_output",
            "unit": "pos",
            "max_number_of_units": 1,
            "low_rank_dimension": null,
            "intervention_type": null,
            "intervention": null,
            "subspace_partition": null,
            "group_key": null,
            "intervention_link_key": null,
            "moe_key": null,
            "source_representation": "PLACEHOLDER",
            "hidden_source_representation": null
        },
        {
            "layer": 3,
            "component": "mlp_output",
            "unit": "pos",
            "max_number_of_units": 1,
            "low_rank_dimension": null,
            "intervention_type": null,
            "intervention": null,
            "subspace_partition": null,
            "group_key": null,
            "intervention_link_key": null,
            "moe_key": null,
            "source_representation": "PLACEHOLDER",
            "hidden_source_representation": null
        }
    ],
    "intervention_types": "<class 'pyvene.models.interventions.VanillaIntervention'>",
    "mode": "parallel",
    "sorted_keys": "None",
    "intervention_dimensions": "None"
}
.loaded model
.loaded model
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Once upon a time there was a little girl named Lucy. She was three years old and loved to explore. One day, Lucy was walking in the park when
.loaded model
loaded model
.loaded model
loaded model
.loaded model
.sentencepiece is not installed. skipping
.loaded model
Directory './tmp/' already exists.
/shared/nas2/wangxz/pyvene/pyvene/models/intervenable_base.py:165: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
  logging.warn(
WARNING:root:The key is provided in the config. Assuming this is loaded from a pretrained module.
.loaded model
.loaded model
.loaded model
.loaded model
.loaded model
Directory './test_output_dir_prefix-542c1b' already exists.
WARNING:root:The key is provided in the config. Assuming this is loaded from a pretrained module.
.loaded model
.loaded model
.loaded model
.Removing testing dir ./test_output_dir_prefix-542c1b
=== Test Suite: InterventionWithGPT2TestCase ===
loaded model
testing stream: head_attention_value_output with multiple heads positions
testing stream: head_query_output with multiple heads positions
testing stream: head_key_output with multiple heads positions
testing stream: head_value_output with multiple heads positions
.=== Test Suite: InterventionWithMLPTestCase ===
loaded model
......=== Test Suite: CausalModelTestCase ===
......=== Test Suite: IntervenableConfigUnitTestCase ===
loaded model
.=== Test Suite: InterventionUtilsTestCase ===
loaded model
.....Directory './test_output_dir_prefix-dbeab8' created successfully.
WARNING:root:The key is provided in the config. Assuming this is loaded from a pretrained module.
Directory './test_output_dir_prefix-5f226d' created successfully.
WARNING:root:The key is provided in the config. Assuming this is loaded from a pretrained module.
Directory './test_output_dir_prefix-da0f35' created successfully.
WARNING:root:The key is provided in the config. Assuming this is loaded from a pretrained module.
.Directory './test_output_dir_prefix-3a4b1e' created successfully.
.Directory './test_output_dir_prefix-23496a' created successfully.
.tensor([[1.9266e-05, 1.0024e+00, 1.4001e+01, 1.5001e+01, 1.6000e+01, 1.7000e+01],
        [6.0000e+00, 7.0024e+00, 2.0001e+01, 2.1001e+01, 2.2000e+01, 2.3000e+01]],
       grad_fn=<AddBackward0>)
./shared/nas2/wangxz/pyvene/pyvene/models/interventions.py:422: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  mask_sigmoid = torch.sigmoid(self.mask / torch.tensor(self.temperature))
.........Removing testing dir ./test_output_dir_prefix-dbeab8
Removing testing dir ./test_output_dir_prefix-5f226d
Removing testing dir ./test_output_dir_prefix-da0f35
Removing testing dir ./test_output_dir_prefix-3a4b1e
Removing testing dir ./test_output_dir_prefix-23496a
.............
----------------------------------------------------------------------
Ran 71 tests in 33.548s

OK

Checklist:

  • My PR title strictly follows the format: [Your Priority] Your Title
  • I have attached the testing log above
  • I provide enough comments to my code
  • I have changed documentations
  • I have added tests for my changes

@frankaging (Collaborator) left a comment

LGTM! Thanks for the change!

@frankaging (Collaborator) commented

@Bakser would you provide a unit test for one of the model types you changed? e.g., llama

thanks. i can add in a new unit test based on your script.

@Bakser (Contributor) commented May 25, 2024

Sure, I can help with that.

I'm not quite familiar with the unit tests of this repo. Do you mean something like the tests/integration_tests/InterventionWithGPT2TestCase.py but for llama?

@frankaging (Collaborator) commented

yes! that would be great! and you can initialize a much smaller llama for the test, e.g. just a single-layer llama, since we want the unit test to be quick. thanks
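A single-layer Llama small enough for a quick test could be constructed roughly like this (the config values here are illustrative assumptions, not the settings the repo's test ultimately used):

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=1000,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=1,      # a single layer keeps the test fast
    num_attention_heads=4,
    num_key_value_heads=2,    # < num_attention_heads exercises the GQA path
    max_position_embeddings=64,
)
model = LlamaForCausalLM(config)

input_ids = torch.randint(0, 1000, (1, 8))
logits = model(input_ids).logits
print(logits.shape)           # (1, 8, 1000)
```

Setting num_key_value_heads below num_attention_heads is what forces the grouped-query attention code path that this PR fixes.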

@frankaging (Collaborator) commented

@Bakser hey! any updates on the progress? thanks!

@Bakser (Contributor) commented May 27, 2024

I didn't work on it over the weekend, but I think I can finish it today. Sorry to make you worry.

@Bakser (Contributor) commented May 28, 2024

I found that I underestimated the workload. I will try to finish it in a couple of days.

@Bakser (Contributor) commented Jun 11, 2024

@frankaging I've finished the test for Llama. Sorry for the delay; I was traveling.

Basically, I copied the tests in InterventionWithGPT2TestCase.py into InterventionWithLlamaTestCase.py and added the implementation of the Llama forward process to tests/utils.py, as was done for GPT2 (though I do think that if we want tests for more models, we should split this file).

It can be run with python -m unittest tests.integration_tests.InterventionWithLlamaTestCase, and the output should look like:

'pyvene' is not installed.
PASS: pyvene is not installed. Testing local dev code.
=== Test Suite: InterventionWithLlamaTestCase ===
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
loaded model
testing stream: head_attention_value_output with multiple heads positions
testing stream: head_query_output with multiple heads positions
testing stream: head_key_output with multiple heads positions
testing stream: head_value_output with multiple heads positions
.
----------------------------------------------------------------------
Ran 1 test in 37.738s

OK
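The core check behind each of those head streams boils down to replacing one head's slice of a per-head activation tensor and verifying the effect stays localized; a stripped-down sketch of that idea (illustrative only, not the repo's actual test code):

```python
import torch

torch.manual_seed(0)
base = torch.randn(1, 4, 5, 8)       # (batch, heads, seq, head_dim)
source = torch.randn(1, 4, 5, 8)

head = 2
intervened = base.clone()
intervened[:, head] = source[:, head]  # vanilla-style swap of one head

# Only the targeted head should differ from the base activations.
assert torch.equal(intervened[:, head], source[:, head])
for h in range(base.shape[1]):
    if h != head:
        assert torch.equal(intervened[:, h], base[:, h])
print("intervention localized to head", head)
```

The real integration test goes further, comparing pyvene's intervened model output against a manual re-implementation of the forward pass, but the per-head indexing above is the part the missing split_head_and_permute broke.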

@frankaging (Collaborator) commented

Thanks! @Bakser

@frankaging frankaging merged commit 3da8474 into stanfordnlp:main Jun 12, 2024
1 check passed