
Problems with DeviceVector allocation heuristic #3251

Closed
gabuzi opened this issue Feb 13, 2024 · 4 comments
gabuzi commented Feb 13, 2024

Summary

I ran into some performance anomalies with IVFFlat indexes on GPU. We are evaluating a scenario where we need to minimize index training and adding costs, so we were looking at clustering with small numbers of centroids.

To my surprise, this slowed things down at some point, and also caused a significant increase in memory use (> 24 GiB for 30M vectors of dim 40).

GPU profiling revealed significant cudaMalloc activity, which I narrowed down to the recently introduced heuristic for DeviceVector allocation (#2691), in the regime where individual IVF lists are supposed to grow by 1.25x.

I think I managed to isolate two separate but related issues:

  1. In DeviceVector::resize(), getNewCapacity_() is called whenever the new size is larger than the current number of elements. This works fine in the regime where the size is always rounded up to the next power of 2, since every size between two powers of 2 maps to the same capacity. But in the recently added regime targeting 1.25x allocated space, two sizes that differ by just one element always get different new capacities, effectively triggering a reallocation for every resize call that grows the vector, even when the new size would have fit into the existing capacity. As far as I can see, this can be fixed by computing the new capacity only when the new size is larger than the current capacity (not the current size), i.e. changing the comparison in

     if (num_ < newSize) {
         mem = reserve(getNewCapacity_(newSize), stream);
     }

     to compare against the current capacity instead of num_. Since the call to reserve() is a no-op for sizes that are not larger than the current capacity, this should not be a problem.
  2. There's a typo in the calculation of 1.25x the allocation size:

     return preferredSize + (preferredSize << 2);

     The bit-shift should be >> instead of <<; as written, it actually returns 5x the requested allocation size.

I compiled with both changes applied and no longer see the performance and memory problems described above.

Hope that helps, and thanks for the speedy library! 🚀

Platform

x86-64, intel
RTX 4090

OS: Ubuntu 20.04 LTS

Faiss version: 1.7.4
Installed from: anaconda, but also built myself

Running on:

  • CPU
  • GPU

Interface:

  • C++
  • Python

Reproduction instructions

Create a GpuIndexIVFFlat index with the L2 metric, d=40, and 100 centroids. Train it with 10M vectors, then add 10M vectors to it. Observe the time for adding and the memory use.

Repeat the same with 5000 centroids: the time for adding and the memory use should be much lower.

mdouze commented Feb 14, 2024

Thanks for the careful report. @wickedfoo would you mind taking a look?

wickedfoo pushed a commit to wickedfoo/faiss that referenced this issue Feb 15, 2024
Summary:
Per facebookresearch#3251 there are two problems with DeviceVector resizing and capacity growth. The first is that if you resize a vector with enough capacity available for the new size, it will go ahead and re-allocate memory anyways.

The second is that the calculation that was supposed to produce x + 0.25 * x was actually producing x + 4 * x for determining the new size of the allocated memory for a vector. This is also fixed.

Differential Revision: D53813207
@wickedfoo

Thanks for noticing this!

#3256 should fix.

wickedfoo pushed a commit to wickedfoo/faiss that referenced this issue Feb 20, 2024
facebook-github-bot pushed a commit that referenced this issue Feb 21, 2024
Summary:
Pull Request resolved: #3256


Reviewed By: mdouze

Differential Revision: D53813207

fbshipit-source-id: 5aa67bc0a87c171a070645bdcc6bc5d22ba6b36b
@wickedfoo

Should be fixed in the source tree. Updated conda (and other) packages will have to wait until we refresh those.

@gabuzi

gabuzi commented Feb 22, 2024

Awesome, thanks!

abhinavdangeti pushed a commit to blevesearch/faiss that referenced this issue Jul 12, 2024