[Searchable Remote Index] Add safeguards to ensure a cluster cannot be over-subscribed #7033

andrross · 2023-04-06T16:54:12Z

A searchable snapshot index consumes some resources in the cluster and also requires some amount of local disk to store cached data. Storage is therefore not infinite: there are limits beyond which the user experience will degrade to unacceptable levels. The goal of this task is to add safeguards to ensure a reasonable user experience is maintained.

We should implement limits to ensure the cluster stays healthy and cannot be unwittingly pushed into an unusable state. In particular, there are two areas to investigate here:

- (Broken out as separate issue) At search time, if there are enough concurrent searches that there is no capacity remaining in the reserved disk cache, then searches should be rejected with an appropriate error message.
- At searchable snapshot index creation time, we should implement limits and reject requests if the cluster capacity is exceeded. This can likely be implemented as a ratio of cluster-based disk cache size to total data size in the remote repository.

Note that there is future work to implement features that give more of an "infinite" storage experience (at the cost of higher search latencies), but searchable snapshots as implemented today keep some metadata loaded so that data is readily searchable, and are therefore subject to some limits.

kotwanikunal · 2023-05-11T17:12:03Z

Digging a bit into this issue -

> This can likely be implemented as a ratio of cluster-based disk cache size to total data size in the remote repository.

I think this will have to be a function of sum_of_all(restored_index_size) and sum_of_all(disk_cache_size).

When restoring an index, we will check whether sum_of_all(restored_index_size) + requested_restored_index_size stays within the bound given by the ratio applied to sum_of_all(disk_cache_size), i.e. total_disk_cache_size.

The size of the remote repository can be misleading: the user might want to restore only a single index, and there can be multiple repositories within a cluster.
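To make the proposed restore-time check concrete, here is a minimal sketch in Python. It is not the OpenSearch implementation; the function name `check_restore`, the `RestoreCheck` result type, and the `max_data_to_cache_ratio` parameter are hypothetical names chosen for illustration, and the sketch assumes the ratio is expressed as "bytes of restored remote data allowed per byte of disk cache".

```python
from dataclasses import dataclass
from typing import Iterable

GIB = 1024 ** 3


@dataclass(frozen=True)
class RestoreCheck:
    """Outcome of the (hypothetical) restore-time safeguard."""
    allowed: bool
    projected_bytes: int  # already restored data + requested index
    limit_bytes: int      # total disk cache * configured ratio


def check_restore(
    restored_index_sizes: Iterable[int],
    node_cache_sizes: Iterable[int],
    requested_index_size: int,
    max_data_to_cache_ratio: float,
) -> RestoreCheck:
    """Decide whether restoring one more index would over-subscribe the cache.

    Only indexes actually restored into the cluster are counted, not the
    full size of any remote repository.
    """
    if max_data_to_cache_ratio <= 0:
        raise ValueError("max_data_to_cache_ratio must be positive")

    # total_disk_cache_size: sum of the disk cache over all nodes.
    total_cache = sum(node_cache_sizes)
    # sum_of_all(restored_index_size) + requested_restored_index_size
    projected = sum(restored_index_sizes) + requested_index_size
    limit = int(total_cache * max_data_to_cache_ratio)
    return RestoreCheck(projected <= limit, projected, limit)


if __name__ == "__main__":
    caches = [100 * GIB, 100 * GIB]      # two nodes with 100 GiB cache each
    restored = [400 * GIB, 300 * GIB]    # indexes already restored
    for requested in (200 * GIB, 400 * GIB):
        result = check_restore(restored, caches, requested, 5.0)
        verdict = "accept" if result.allowed else "reject"
        print(f"restore of {requested // GIB} GiB: {verdict}")
```

With a ratio of 5 and 200 GiB of total cache, the limit is 1000 GiB of restored data, so the 200 GiB request fits (900 GiB projected) while the 400 GiB request would be rejected (1100 GiB projected).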

xiaoshi2013 · 2023-12-18T11:54:24Z

good

andrross added the bug and untriaged labels Apr 6, 2023

anasalkouz added distributed framework and removed untriaged labels Apr 6, 2023

andrross mentioned this issue Apr 11, 2023

[Searchable Snapshots] Fail searches if local disk cache is full #7095

Open

This was referenced Apr 28, 2023

[RFC] Add Search to Remote-backed Storage #6528

Closed

Store size APIs should be updated to reflect when all data is not local #7332

Open

anasalkouz changed the title ~~[Searchable Snapshots] Add safeguards to ensure a cluster cannot be over-subscribed~~ [Searchable Remote Index] Add safeguards to ensure a cluster cannot be over-subscribed May 4, 2023

kotwanikunal self-assigned this May 10, 2023

kotwanikunal mentioned this issue May 23, 2023

Add safeguards to prevent file cache over-subscription #7713

Closed

6 tasks

This was referenced Jul 7, 2023

Add safeguard limits for file cache during node level allocation #8208

Merged

Add support for a FileCacheDecider #8535

Open

kotwanikunal closed this as completed in #8208 Jul 10, 2023

kotwanikunal mentioned this issue Jul 11, 2023

Add restore level safeguards to prevent file cache oversubscription #8606

Merged

6 tasks
