
Is there workaround for 3090 #344

Open
momo1986 opened this issue Sep 23, 2024 · 4 comments

Comments

@momo1986

The machine is an RTX 3090 platform.

It looks like the usable PyTorch version is limited on it.

Is there any workaround for this setup?

Thanks & Regards!

@dabensongbing

A100 GPU detected, using flash attention if input tensor is on cuda
D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py:88: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
out = F.scaled_dot_product_attention(
D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py:88: UserWarning: Memory efficient kernel not used because: (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:415.)
out = F.scaled_dot_product_attention(
D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py:88: UserWarning: Memory Efficient attention has been runtime disabled. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen/native/transformers/sdp_utils_cpp.h:456.)
out = F.scaled_dot_product_attention(
D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py:88: UserWarning: Flash attention kernel not used because: (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:417.)
out = F.scaled_dot_product_attention(
0%| | 0/700000 [02:43<?, ?it/s]
Traceback (most recent call last):
File "D:\denoising-diffusion-pytorch-main\test.py", line 32, in
trainer.train()
File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\denoising_diffusion_pytorch.py", line 1058, in train
loss = self.model(data)
File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "D:\condaa312\envs\ddp\lib\site-packages\accelerate\utils\operations.py", line 820, in forward
return model_forward(*args, **kwargs)
File "D:\condaa312\envs\ddp\lib\site-packages\accelerate\utils\operations.py", line 808, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "D:\condaa312\envs\ddp\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\denoising_diffusion_pytorch.py", line 841, in forward
return self.p_losses(img, t, *args, **kwargs)
File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\denoising_diffusion_pytorch.py", line 817, in p_losses
model_out = self.model(x, t, x_self_cond)
File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\denoising_diffusion_pytorch.py", line 411, in forward
x = attn(x) + x
File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\denoising_diffusion_pytorch.py", line 269, in forward
out = self.attend(q, k, v)
File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py", line 107, in forward
return self.flash_attn(q, k, v)
File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py", line 88, in flash_attn
out = F.scaled_dot_product_attention(
RuntimeError: No available kernel. Aborting execution.
I ran into this problem and don't know how to solve it.
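
The warnings above show that both the flash and memory-efficient SDPA kernels were rejected at runtime, so the final scaled_dot_product_attention call is left with no backend and aborts with "No available kernel". A minimal diagnostic sketch, not part of the original report and using only standard torch.backends.cuda queries, can show which backends the installed build will actually allow:

import torch
import torch.nn.functional as F

# Report the build, the GPU, and which SDPA backends are currently enabled.
print(torch.__version__, torch.cuda.get_device_name(0))
print("flash sdp enabled:        ", torch.backends.cuda.flash_sdp_enabled())
print("mem efficient sdp enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math sdp enabled:         ", torch.backends.cuda.math_sdp_enabled())

# Tiny half-precision attention call, mirroring what attend.py attempts under autocast.
q = k = v = torch.randn(1, 4, 128, 64, device="cuda", dtype=torch.float16)
try:
    F.scaled_dot_product_attention(q, k, v)
    print("scaled_dot_product_attention ran")
except RuntimeError as err:
    print("scaled_dot_product_attention failed:", err)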

@MADAO-King

(Quotes the warning and traceback output from the comment above.) I ran into this problem and don't know how to solve it.

@MADAO-King

I have encountered the same problem, have you solved it?

@dabensongbing

I have encountered the same problem, have you solved it?

I haven't solved it. I found that something isn't supported on the 3090, some PyTorch attention mechanism... maybe you need an A100.
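
If the installed Windows build of PyTorch lacks the fused flash and memory-efficient kernels, one possible workaround, offered here as a sketch rather than a confirmed fix from this thread, is to fall back to the plain math kernel so scaled_dot_product_attention always has an implementation to run, and to build the model without flash attention. The flash_attn keyword below is an assumption about the Unet constructor in this version of denoising-diffusion-pytorch; if it is not accepted, the global torch.backends toggles still apply.

import torch
from denoising_diffusion_pytorch import Unet, GaussianDiffusion

# Keep only the math backend; slower, but it avoids the "No available kernel" abort.
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_math_sdp(True)

# flash_attn = False is an assumed keyword that asks the repo's Attend module to
# use its plain (non-fused) attention path instead of F.scaled_dot_product_attention.
model = Unet(dim = 64, dim_mults = (1, 2, 4, 8), flash_attn = False)
diffusion = GaussianDiffusion(model, image_size = 128, timesteps = 1000)

The math path is the reference implementation, so it works on any CUDA device; the cost is extra memory and time compared to the fused kernels.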
