transformers v4.36 implemented LlamaSdpaAttention (huggingface/transformers#26572), which calls FlashAttention by default.
But running LOAD_LADE=1 USE_LADE=1 python minimal.py leads to:
Traceback (most recent call last):
  File "/home/LookaheadDecoding/minimal.py", line 31, in <module>
    greedy_output = model.generate(**model_inputs, max_new_tokens=256, do_sample=False)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1718, in generate
    return self.greedy_search(
  File "/home/LookaheadDecoding/lade/decoding.py", line 23, in greedy_search_proxy
    return jacobi_greedy_search_multilevel(self, chat=False, *args, **kwargs)
  File "/home/LookaheadDecoding/lade/decoding.py", line 278, in jacobi_greedy_search_multilevel
    outputs = self.jforward_multilevel(
  File "/home/LookaheadDecoding/lade/models/llama.py", line 383, in jforward_multilevel
    outputs = self.model.LlamaModeljforward(
  File "/home/LookaheadDecoding/lade/models/llama.py", line 235, in LlamaModeljforward
    layer_outputs = decoder_layer.forward(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 796, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: LlamaSdpaAttention.forward() got an unexpected keyword argument 'padding_mask'
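Until a fix lands in LookaheadDecoding, one possible stopgap is to keep transformers from selecting LlamaSdpaAttention in the first place. The sketch below is not from the repo's minimal.py; the checkpoint name is a placeholder, and it assumes that the eager LlamaAttention path in v4.36 still accepts the extra padding_mask kwarg through **kwargs (emitting only a deprecation warning) instead of raising.

```python
# Workaround sketch (assumption, not the official fix): load the model with the
# eager attention implementation so transformers >= 4.36 does not route the
# call through LlamaSdpaAttention, which rejects the `padding_mask` kwarg
# that lade/models/llama.py still passes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # placeholder; use whatever minimal.py loads

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    attn_implementation="eager",  # bypass the SDPA attention class added in v4.36
)
```

Alternatively, pinning transformers below 4.36 (e.g. pip install "transformers<4.36") should sidestep the new attention classes entirely.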
This PR can fix the problem