
Problems with DeviceVector allocation heuristic #3251

Closed
gabuzi opened this issue Feb 13, 2024 · 4 comments
gabuzi commented Feb 13, 2024

Summary

I ran into some performance anomalies with IVFFlat indexes on GPU. We are evaluating a scenario where we need to minimize index training and adding costs, so we were looking at clustering with small numbers of centroids.

To my surprise, this slowed things down at some point, and also caused a significant increase in memory use (> 24 GiB for 30M vectors of dim 40).

GPU profiling revealed significant cudaMalloc activity, which I narrowed down to the recently introduced heuristic for DeviceVector allocation (#2691), in the regime where individual IVF lists are supposed to grow by 1.25x.

I think I managed to isolate two separate but related issues:

  1. In DeviceVector::resize(), getNewCapacity_() is called whenever the new size is larger than the current number of elements. This works fine in the regime where the size is always rounded up to the next power of 2, since every size between two powers of 2 maps to the same capacity. But in the recently added regime targeting 1.25x allocated space, two sizes that differ by just one element always get different new capacities, effectively triggering a reallocation for every resize call that grows the vector, even when the new size would have fit into the existing capacity. As far as I can see, this can be fixed by computing the new capacity only when the new size is larger than the current capacity (not the current size), i.e. changing the comparison in

     if (num_ < newSize) {
         mem = reserve(getNewCapacity_(newSize), stream);
     }

     to compare against the current capacity instead of num_. Since the call to reserve() is a no-op for sizes that are not larger than the current capacity, this should not be a problem.
  2. There's a typo in the calculation of 1.25x the allocation size:

     return preferredSize + (preferredSize << 2);

     The bit-shift should be >> instead of <<; as written, it actually returns 5x the requested allocation size.

I compiled with both changes applied and no longer see the performance and memory problems described above.

Hope that helps, and thanks for the speedy library! 🚀

Platform

x86-64, intel
RTX 4090

OS: Ubuntu 20.04 LTS

Faiss version: 1.7.4
Installed from: anaconda, but also built myself

Running on:

  • CPU
  • GPU

Interface:

  • C++
  • Python

Reproduction instructions

Create a GpuIndexIVFFlat index with the L2 metric, d=40, and 100 centroids. Train it with 10M vectors, then add 10M vectors to it. Observe the time for adding and the memory use.

Repeat the same with 5000 centroids: the time for adding and the memory use should be much lower.

mdouze commented Feb 14, 2024

Thanks for the careful report. @wickedfoo would you mind taking a look?

wickedfoo pushed a commit to wickedfoo/faiss that referenced this issue Feb 15, 2024
Summary:
Per facebookresearch#3251 there are two problems with DeviceVector resizing and capacity growth. The first is that if you resize a vector with enough capacity available for the new size, it will go ahead and re-allocate memory anyways.

The second is that the calculation that was supposed to produce x + 0.25 * x was actually producing x + 4 * x for determining the new size of the allocated memory for a vector. This is also fixed.

Differential Revision: D53813207
@wickedfoo

Thanks for noticing this!

#3256 should fix.

wickedfoo pushed a commit to wickedfoo/faiss that referenced this issue Feb 20, 2024
facebook-github-bot pushed a commit that referenced this issue Feb 21, 2024
Summary:
Pull Request resolved: #3256


Reviewed By: mdouze

Differential Revision: D53813207

fbshipit-source-id: 5aa67bc0a87c171a070645bdcc6bc5d22ba6b36b
@wickedfoo

Should be fixed in the source tree. Updated conda (and other) packages will have to wait until we refresh those.

@gabuzi

gabuzi commented Feb 22, 2024

Awesome, thanks!

abhinavdangeti pushed a commit to blevesearch/faiss that referenced this issue Jul 12, 2024