Could Lucene's default Directory (FSDirectory.open) somehow preload .vec files? #13551

Open
mikemccand opened this issue Jul 8, 2024 · 6 comments


@mikemccand
Member

Description

This is really a "discussion" issue. I'm not sure at all that the idea is feasible:

I've been testing luceneutil with heavy KNN indexing (Cohere wikipedia en 768 dimension vectors) and one dismal part of the experience is the swap storm caused by HNSW's random access to the raw vectors stored in the .vec files for each segment on "cold start" searching.

Even when the box has plenty of RAM to hold all .vec files, the swap storm takes minutes on the nightly benchy box, even with a fast NVMe SSD holding the index. Even if the index is freshly built, the OS doesn't seem to cache the .vec files (they look "write once" to it), so when searching starts up, the swap storm begins. This was with FLOAT32 vectors ... I suspect the problem is less severe with int4 or int8 compressed vectors (haven't tested).

At Amazon (customer facing product search) we also see this swap storm when cold starting the searching process even after NRT segment replication has just copied the files locally: they don't stay "hot" in the OS in that case either (looks like "write once" to the OS).

Lucene already has an awesome feature to "fix" this: MMapDirectory.setPreload. It will pre-touch all pages associated with that file on open, so the OS caches them "once" on startup, much more efficiently than HNSW's random access. But this only makes sense for applications/users that know they have enough RAM (we will test this at Amazon to see if it helps our cold start problems). For my luceneutil tests, simply running cat /path/to/index/*.vec >> /dev/null (basically the same as .setPreload, I think) fixed/sidestepped the swap storm.

(The Linux kernel's less aggressive default readahead for memory-mapped IO vs traditional NIO is likely not helping either -- is that really a thing? I have not tested NIOFSDirectory to see whether the swap storm is lessened.)

Longer term we have discussed using AKNN algorithms that are more disk-friendly (e.g. DiskANN), but shorter term, I'm wondering if we could help users with a default Directory (from FSDirectory.open) that somehow/sometimes preloads .vec files? It's not easy to do -- you wouldn't know up front that the application will do KNN searching at all. And, maybe only certain vectors in the .vec will ever be accessed and so you need to pay that long random access price to find and cache just those ones.
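For concreteness, here is a minimal application-side sketch of the setPreload workaround as it exists today (the helper name and index path are just placeholders); a .vec-aware default would essentially have to do this automatically inside FSDirectory.open:

  import java.io.IOException;
  import java.nio.file.Path;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.store.MMapDirectory;

  // Sketch: preload only the raw vector files when FSDirectory.open hands back
  // an MMapDirectory (the default on 64-bit platforms). Other files keep the
  // default behaviour (no preload).
  static FSDirectory openWithVecPreload(Path indexPath) throws IOException {
    FSDirectory dir = FSDirectory.open(indexPath);
    if (dir instanceof MMapDirectory) {
      // Pre-touch every page of *.vec at open time, before searching starts.
      ((MMapDirectory) dir).setPreload((fileName, ioContext) -> fileName.endsWith(".vec"));
    }
    return dir;
  }

This only pays off when you know the whole .vec working set fits in RAM, which is exactly the part a library default can't know.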

@rmuir
Member

rmuir commented Jul 8, 2024

It's not easy to do -- you wouldn't know up front that the application will do KNN searching at all. And, maybe only certain vectors in the .vec will ever be accessed and so you need to pay that long random access price to find and cache just those ones.

I'm trying to understand this sentence:

  • it should be a one-liner using setPreload to preload "*.vec" if we wanted to do it either from FSDirectory.open or by default in MMapDirectory
  • if the application isn't doing KNN searching then they won't have .vec? I struggle to imagine someone indexing a ton of "extra" vectors that isn't "using" them and hasn't noticed a big performance impact

@jpountz
Contributor

jpountz commented Jul 8, 2024

It wouldn't solve the issue, only mitigate it, but hopefully cold start performance gets better when we start leveraging IndexInput#prefetch to load multiple vectors from disk concurrently (#13179).
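To make the idea concrete, here is a rough illustrative sketch (the helper name and parameters are made up, not the actual codec code) of hinting the byte ranges of candidate vectors before reading them, assuming the IndexInput#prefetch(offset, length) API from #13179:

  import java.io.IOException;
  import org.apache.lucene.store.IndexInput;

  // Illustrative sketch: tell the Directory which byte ranges we are about to
  // read so the I/O can overlap, instead of paying one page fault per vector.
  // prefetch() is advisory and may be a no-op for some Directory implementations.
  static void prefetchVectors(IndexInput vecData, long bytesPerVector, int[] candidateOrds)
      throws IOException {
    for (int ord : candidateOrds) {
      vecData.prefetch((long) ord * bytesPerVector, bytesPerVector);
    }
    // ... then seek to and score each candidate; many pages may already be cached.
  }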

The Linux kernel's less aggressive default readahead for memory-mapped IO vs traditional NIO

FWIW, it's not that the read-ahead is less aggressive for mmap than traditional I/O, it's that MMapDirectory recently started explicitly telling the OS not to perform readahead for vectors by passing MADV_RANDOM to madvise.

@mikemccand
Member Author

Oh sorry, I used the wrong term (thank you @rmuir for clarifying!): it's not a swap storm I'm seeing, it's a page storm. The OS has plenty of free RAM (reported by free), and that goes down while buff/cache goes up as the OS pulls in and caches pages for the .vec files. I don't think I'm running too many RAM-hogging crapplications ;)

  • it should be a one-liner using setPreload to preload "*.vec" if we wanted to do it either from FSDirectory.open or by default in MMapDirectory

+1 -- it would be a simple change. But I worry it could do more harm than good in some cases, e.g. truly cold HNSW cases where the application plans to suffer through paging to identify the subset of hot vectors? I don't know if that is even a thing -- a bunch of dark matter vectors that never get visited? I guess we do know that HNSW can and does sometimes produce disconnected graphs, but I think those dark matter islands are "typically" smallish.

  • if the application isn't doing KNN searching then they won't have .vec? I struggle to imagine someone indexing a ton of "extra" vectors that isn't "using" them and hasn't noticed a big performance impact

+1! Especially because indexing them is quite costly!

It wouldn't solve the issue, only mitigate it, but hopefully cold start performance gets better when we start leveraging IndexInput#prefetch to load multiple vectors from disk concurrently (#13179).

+1 -- that would be a big help especially when paging in from fast SSDs since these devices have high IO concurrency.

The Linux kernel's less aggressive default readahead for memory-mapped IO vs traditional NIO

FWIW, it's not that the read-ahead is less aggressive for mmap than traditional I/O, it's that MMapDirectory recently started explicitly telling the OS not to perform readahead for vectors by passing MADV_RANDOM to madvise.

Ahhh, that makes sense! And I think it is still correct to madvise this way for our .vec files: HNSW's access pattern really is most likely random (is there any locality to it at all?). It does make me wonder whether scalar compression will actually help or not. I guess it might still help if that compression means multiple vectors fit into a single IO page.
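To put rough numbers on that (assuming 4 KiB pages): a 768-dimension FLOAT32 vector is 768 × 4 = 3072 bytes, so essentially one vector per page, while int8 quantization shrinks it to 768 bytes (~5 vectors per page) and int4 to ~384 bytes (~10 per page), so each random page fault would at least bring in several vectors' worth of data.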

@uschindler
Contributor

I don't think we should change anything here in MMapDirectory. It is all available and easy to do for anyone who wants this. Elasticsearch is doing this for some files, but we have no default of preloading anything.

For preloading you can call MMapDirectory#setPreload and give it a lambda expression to select which file extensions you want preloaded: https://lucene.apache.org/core/9_11_0/core/org/apache/lucene/store/MMapDirectory.html#setPreload(java.util.function.BiPredicate)

The default is: https://lucene.apache.org/core/9_11_0/core/org/apache/lucene/store/MMapDirectory.html#NO_FILES

@uschindler
Contributor

uschindler commented Jul 8, 2024

  • it should be a one-liner using setPreload to preload "*.vec" if we wanted to do it either from FSDirectory.open or by default in MMapDirectory

It is trivial:

mmapDir.setPreload((name,ctx) -> name.endsWith(".vec"));

So no change needed in the API or behaviour of Lucene.

@gautamworah96
Contributor

@uschindler At Amazon, we implemented the mmapDir.setPreload((name,ctx) -> name.endsWith(".vec")); style fix you had suggested, but later realized that this would not work for compound files (e.g. .cfs) that have vector data in them.

One approach would be to prevent vector files from being folded into compound files by instructing Lucene not to do it. I was wondering if you have thoughts here / have thought about this approach before?

The slightly tricky bit here is we only want to preload the vector data from the compound files.
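(A coarse workaround, sketched below under the assumption that the application controls its IndexWriterConfig, would be to disable compound files entirely, though that applies to every file, not just the vector data:)

  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.index.TieredMergePolicy;

  // Coarse sketch: keep all segment files out of .cfs so .vec stays a standalone
  // file that the setPreload predicate can match. Costs more open file handles
  // and gives up .cfs benefits for every other file, not just vectors.
  static IndexWriterConfig noCompoundFilesConfig(Analyzer analyzer) {
    IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
    iwc.setUseCompoundFile(false);           // newly flushed segments: no .cfs
    TieredMergePolicy mergePolicy = new TieredMergePolicy();
    mergePolicy.setNoCFSRatio(0.0);          // merged segments: never build a .cfs
    iwc.setMergePolicy(mergePolicy);
    return iwc;
  }

That sidesteps the problem rather than solving it, which is why a way to preload just the vector ranges inside a .cfs, or to exclude only .vec from compound files, would be nicer.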
