Windows doesn't support cudaMemPrefetchAsync() #453
The solution linked above works for me: check the device's capabilities for that feature before making the call, as hinted at in https://stackoverflow.com/a/43430831/950131 . The check is fast, so there may be no need to cache the answer.
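The guard from the Stack Overflow answer can be sketched as follows. This is a sketch only (the wrapper name `maybe_prefetch` is hypothetical, not part of bitsandbytes): query `cudaDevAttrConcurrentManagedAccess` and skip the prefetch when the attribute is 0, which is the case on Windows and on pre-Pascal GPUs.

```cuda
#include <cuda_runtime.h>

// Hypothetical wrapper: prefetch managed memory only if the device
// supports concurrent managed access. On Windows (and pre-Pascal GPUs)
// cudaDevAttrConcurrentManagedAccess is 0 and cudaMemPrefetchAsync()
// would fail, so the prefetch is simply skipped there.
void maybe_prefetch(const void* ptr, size_t bytes, int device,
                    cudaStream_t stream) {
    int concurrent = 0;
    cudaDeviceGetAttribute(&concurrent,
                           cudaDevAttrConcurrentManagedAccess, device);
    if (concurrent) {
        cudaMemPrefetchAsync(ptr, bytes, device, stream);
    }
    // else: no prefetch; the Unified Memory driver pages data on demand.
}
```

Skipping the prefetch is purely a performance concession; correctness is unaffected, since managed memory is still migrated on first touch.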
Does this issue only apply to QLoRA training?
Since I am unable to rebuild bitsandbytes because of the Maxwell architecture's incompatibility with the synchronization primitives, I am trying a different solution: trapping the SIGSEGV signal. It has not dumped core yet, but I am not sure what it is doing. Python seems to be running, but I haven't seen any activity on the GPUs for about an hour either. I am running on Ubuntu 20.04, not Windows, so this may be a wider issue than just the OS.
…Fixes artidoro/qlora#73 and bitsandbytes-foundation#453 (cherry picked from commit e02f078)
This is a duplicate of #477; please redirect all discussion there. TL;DR: I need to think about whether I will support Maxwell or not. There might be a workaround for Maxwell support by excluding Paged Optimizers.
This has been fixed and pushed to pip. Memory problems might remain, but these are Windows-specific and there is nothing I can do about that. Thank you for the fix @stoperro, this was an important bugfix.
…d to run on Windows (#13957) [Windows doesn't support cudaMemPrefetchAsync()](bitsandbytes-foundation/bitsandbytes#453) which is used in the call to `prefetch` in the test. [urEnqueueUSMPrefetch](https://github.com/oneapi-src/unified-runtime/blob/c0c607c3a88933b4c5c20a0aca4539781c678411/source/adapters/cuda/enqueue.cpp#L1629) is also commented with a note for not having the support for CUDA on Windows.
Also, memory oversubscription is not supported on Windows (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#system-requirements), which I presume means the paged optimizer that absorbs memory spikes won't work on Windows.
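The oversubscription limitation can be probed with a small sketch like the one below (an illustration only, not part of bitsandbytes): request more managed memory than the GPU physically has. On Linux with a Pascal-or-newer GPU the Unified Memory driver can page the allocation between host and device, so it is expected to succeed; on Windows, where oversubscription is unsupported, the allocation is expected to fail.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);

    // Ask for more managed memory than the device physically has
    // (total device memory plus 1 GiB). On Linux the Unified Memory
    // driver can oversubscribe by paging to host RAM; on Windows this
    // is expected to fail, since oversubscription is unsupported there.
    void* p = nullptr;
    cudaError_t err = cudaMallocManaged(&p, total_b + (1ull << 30));
    printf("cudaMallocManaged: %s\n", cudaGetErrorString(err));
    if (err == cudaSuccess) cudaFree(p);
    return 0;
}
```

This is why a paged optimizer, which relies on spilling optimizer state to host memory during spikes, cannot work on Windows even when the rest of the library does.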
This results in the below error in QLoRA training:
(Note: the above was not caused by an old transformers version.)