Problems with DeviceVector allocation heuristic #3251
Thanks for the careful report. @wickedfoo would you mind taking a look?
wickedfoo pushed a commit to wickedfoo/faiss that referenced this issue on Feb 15, 2024:
Summary: Per facebookresearch#3251 there are two problems with DeviceVector resizing and capacity growth. The first is that if you resize a vector with enough capacity available for the new size, it will go ahead and re-allocate memory anyways. The second is that the calculation that was supposed to produce x + 0.25 * x was actually producing x + 4 * x for determining the new size of the allocated memory for a vector. This is also fixed. Differential Revision: D53813207
Thanks for noticing this! #3256 should fix.
wickedfoo pushed a commit to wickedfoo/faiss that referenced this issue on Feb 20, 2024:
Summary: Per facebookresearch#3251 there are two problems with DeviceVector resizing and capacity growth. The first is that if you resize a vector with enough capacity available for the new size, it will go ahead and re-allocate memory anyways. The second is that the calculation that was supposed to produce x + 0.25 * x was actually producing x + 4 * x for determining the new size of the allocated memory for a vector. This is also fixed. Reviewed By: mdouze Differential Revision: D53813207
facebook-github-bot pushed a commit that referenced this issue on Feb 21, 2024:
Summary: Pull Request resolved: #3256 Per #3251 there are two problems with DeviceVector resizing and capacity growth. The first is that if you resize a vector with enough capacity available for the new size, it will go ahead and re-allocate memory anyways. The second is that the calculation that was supposed to produce x + 0.25 * x was actually producing x + 4 * x for determining the new size of the allocated memory for a vector. This is also fixed. Reviewed By: mdouze Differential Revision: D53813207 fbshipit-source-id: 5aa67bc0a87c171a070645bdcc6bc5d22ba6b36b
Should be fixed in the source tree.
Awesome, thanks!
abhinavdangeti pushed a commit to blevesearch/faiss that referenced this issue on Jul 12, 2024:
Summary: Pull Request resolved: facebookresearch#3256 Per facebookresearch#3251 there are two problems with DeviceVector resizing and capacity growth. The first is that if you resize a vector with enough capacity available for the new size, it will go ahead and re-allocate memory anyways. The second is that the calculation that was supposed to produce x + 0.25 * x was actually producing x + 4 * x for determining the new size of the allocated memory for a vector. This is also fixed. Reviewed By: mdouze Differential Revision: D53813207 fbshipit-source-id: 5aa67bc0a87c171a070645bdcc6bc5d22ba6b36b
Summary
I ran into some performance anomalies with IVFFlat indexes on GPU.
We are evaluating a scenario where we need to minimize index training and adding costs, so we were looking at clustering with a small number of centroids.
To my surprise, this started slowing things down at some point, and also caused a significant increase in memory (> 24GiB for 30M vectors of dim 40).
GPU profiling revealed significant cudaMalloc activity, and I could narrow it down to the recently introduced heuristic for DeviceVector allocation (#2691) in the regime where individual IVF lists are supposed to grow by a factor of 1.25.
I think I managed to isolate two separate but related issues:
1. In DeviceVector::resize(), getNewCapacity_() is always called when the new size is bigger than the current number of elements. This works fine in the regime where the size is always rounded up to the next power of 2, as that yields the same capacity for all requested sizes between two powers of 2. But in the recently added regime targeting 1.25x allocated space, two requested sizes that differ by just 1 byte will always get different new capacities, effectively triggering a reallocation for every resize() call that grows the vector, even if the new size could have fit into the existing capacity. As far as I can see, this can be fixed by only calculating the new capacity if the new size is larger than the current capacity (not the current size), in faiss/faiss/gpu/utils/DeviceVector.cuh, lines 135 to 137 at f262011. Since reserve() is a no-op for sizes that are not bigger than the current capacity, this should not be a problem.
2. The bit shift in the capacity calculation at faiss/faiss/gpu/utils/DeviceVector.cuh, line 252 at f262011, should be >> instead of <<, as at the moment it actually returns 5x the requested allocation size.
Both changes are sketched below. I compiled with the mentioned changes, and no longer see the mentioned performance & memory problems.
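To make the two fixes concrete, here is a minimal sketch on a toy container; the names (ToyDeviceVector, num_, capacity_, getNewCapacity) are simplified stand-ins, not the actual faiss DeviceVector code. resize() re-allocates only when the requested size exceeds the current capacity, and the growth heuristic uses preferredSize >> 2 (x + 0.25x) rather than preferredSize << 2 (x + 4x):

```cpp
#include <cstddef>

// Toy stand-in for a GPU-backed vector that tracks a size and a capacity.
struct ToyDeviceVector {
    size_t num_ = 0;      // current number of elements
    size_t capacity_ = 0; // number of elements the allocation can hold

    // Growth heuristic: request 25% headroom on top of the preferred size.
    // Using (preferredSize << 2) here would add 4x instead of 0.25x,
    // i.e. 5x the requested allocation size in total.
    static size_t getNewCapacity(size_t preferredSize) {
        return preferredSize + (preferredSize >> 2);
    }

    void resize(size_t newSize) {
        // Re-allocate only if the existing allocation cannot hold newSize.
        // Checking against num_ instead of capacity_ would trigger a new
        // allocation for nearly every growing resize in the 1.25x regime.
        if (newSize > capacity_) {
            reserve(getNewCapacity(newSize));
        }
        num_ = newSize;
    }

    void reserve(size_t newCapacity) {
        if (newCapacity <= capacity_) {
            return; // no-op when the current allocation is already big enough
        }
        // ... allocate newCapacity elements on the device, copy num_ elements,
        // free the old allocation ...
        capacity_ = newCapacity;
    }
};
```

With the capacity check and the >> shift, growing an IVF list batch by batch re-allocates only when the 25% headroom is exhausted, instead of issuing a cudaMalloc on essentially every growing resize.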
Hope that helps, and thanks for the speedy library! 🚀
Platform
CPU: x86-64 (Intel)
GPU: RTX 4090
OS: Ubuntu 20.04 LTS
Faiss version: 1.7.4
Installed from: anaconda, but also built myself
Running on: GPU
Interface:
Reproduction instructions
1. Create a GpuIndexIVFFlat index with the L2 metric, d=40, and 100 centroids.
2. Train it with 10M vectors and add 10M vectors to it. Observe the time for adding and the memory use.
3. Repeat the same with 5000 centroids. The time for adding and the memory use should be much lower (see the sketch below).
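A hedged sketch of these steps using the faiss C++ GPU API follows; the reporter's exact code is not included in the issue, so the data generation, timing, and constants here are illustrative. It uses d = 40, the L2 metric, and nlist = 100; change nlist to 5000 for the comparison run, and scale N down if 10M vectors do not fit in host memory.

```cpp
#include <faiss/MetricType.h>
#include <faiss/gpu/GpuIndexIVFFlat.h>
#include <faiss/gpu/StandardGpuResources.h>

#include <chrono>
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const int d = 40;             // vector dimensionality
    const int64_t N = 10'000'000; // vectors used for both train and add
    const int nlist = 100;        // slow/high-memory case; try 5000 to compare

    // Random vectors standing in for the real data.
    std::vector<float> xb(static_cast<size_t>(N) * d);
    std::mt19937 rng(123);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    for (auto& v : xb) {
        v = dist(rng);
    }

    faiss::gpu::StandardGpuResources res;
    faiss::gpu::GpuIndexIVFFlat index(&res, d, nlist, faiss::METRIC_L2);

    index.train(N, xb.data());

    auto t0 = std::chrono::steady_clock::now();
    index.add(N, xb.data());
    auto t1 = std::chrono::steady_clock::now();

    std::printf("nlist=%d add time: %.1f s\n",
                nlist,
                std::chrono::duration<double>(t1 - t0).count());
    return 0;
}
```

GPU memory use during add() can be watched from a second terminal with nvidia-smi; per the report, before the fix the nlist=100 run takes much longer to add and uses far more memory than the nlist=5000 run.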