Add expire-after-write to roles cache #30505

Open
tvernum opened this issue May 10, 2018 · 4 comments
Labels
>enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC Team:Security Meta label for security team

Comments

@tvernum
Contributor

tvernum commented May 10, 2018

The current role store cache has no automatic expiry; entries are only evicted when the cache is full (which will rarely happen).

However, that means that any inconsistencies that somehow find their way into the cache (through bugs, poorly timed node failures, etc.) will persist until either:

  1. the cache is manually cleared
  2. the node is restarted

It would be preferable to add an eviction time on this cache, even if it is measured in hours.
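
As a rough illustration of the expire-after-write behaviour being requested, here is a minimal, self-contained sketch. It is not the Elasticsearch role store cache: the class `ExpireAfterWriteCache`, its `get`/`invalidate` methods, and the injectable clock are all hypothetical names chosen for this example. An entry is reloaded once it is older than the configured TTL, regardless of how often it is read.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical minimal cache for illustration; not the Elasticsearch implementation. */
final class ExpireAfterWriteCache<K, V> {
    /** A cached value together with the time it was written. */
    private static final class Entry<T> {
        final T value;
        final long writtenAtMillis;

        Entry(T value, long writtenAtMillis) {
            this.value = value;
            this.writtenAtMillis = writtenAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be exercised without sleeping

    ExpireAfterWriteCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, reloading it if it is absent or older than the TTL. */
    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) ->
            (old == null || now - old.writtenAtMillis >= ttlMillis)
                ? new Entry<>(loader.apply(k), now)
                : old);
        return entry.value;
    }

    /** Manual invalidation, analogous to clearing the cache by hand. */
    void invalidate(K key) {
        entries.remove(key);
    }

    public static void main(String[] args) {
        ExpireAfterWriteCache<String, String> cache =
            new ExpireAfterWriteCache<>(TimeUnit.HOURS.toMillis(1), System::currentTimeMillis);
        System.out.println(cache.get("admin", name -> "definition of " + name));
    }
}
```

With a TTL measured in hours, the cost is one extra reload per role per TTL period, while any stale entry is bounded to at most one TTL of lifetime.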

@tvernum tvernum added >enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC labels May 10, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-security

@tvernum
Contributor Author

tvernum commented May 10, 2018

Regarding poorly timed node failures:

We execute an action to clear entries from the cache after they are updated in the native store (source), but if the update occurs and then the node fails, it's possible that the index will be updated but the cache will not be cleared on some/all other nodes.
When that failed node is restarted (or another node is brought online) it will see the new role definition from the index but other nodes will still have the cached definition, indefinitely.

The client that sent the failed update ought to retry since it never got a successful response, but we cannot guarantee that will happen (e.g. perhaps the client failed at the same time, due to the same power outage).
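
The failure scenario above can be sketched with a toy simulation. Everything here is hypothetical (the `StaleRoleSimulation` and `Node` classes, and a plain `Map` standing in for the security index); it only models the case where an update reaches the index but the clear-cache action never reaches a node that already has the old definition cached.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a missed cache clear; names and structure are assumptions for illustration. */
final class StaleRoleSimulation {
    /** One node's local role cache with an optional expire-after-write TTL. */
    static final class Node {
        private final Map<String, String> cache = new HashMap<>();
        private final Map<String, Long> cachedAt = new HashMap<>();
        private final long ttlMillis; // Long.MAX_VALUE models "never expires"

        Node(long ttlMillis) {
            this.ttlMillis = ttlMillis;
        }

        /** Resolves a role from the local cache, reloading from the index if absent or expired. */
        String resolve(String role, Map<String, String> index, long nowMillis) {
            Long at = cachedAt.get(role);
            if (at == null || nowMillis - at >= ttlMillis) {
                cache.put(role, index.get(role));
                cachedAt.put(role, nowMillis);
            }
            return cache.get(role);
        }
    }

    public static void main(String[] args) {
        Map<String, String> index = new HashMap<>();
        index.put("ops", "v1");

        Node noExpiry = new Node(Long.MAX_VALUE);
        Node oneHour = new Node(3_600_000L);
        noExpiry.resolve("ops", index, 0L);
        oneHour.resolve("ops", index, 0L);

        // The update lands in the index, but the clear-cache action never reaches these nodes.
        index.put("ops", "v2");

        long twoHoursLater = 7_200_000L;
        System.out.println("no expiry: " + noExpiry.resolve("ops", index, twoHoursLater));
        System.out.println("1h expiry: " + oneHour.resolve("ops", index, twoHoursLater));
    }
}
```

In this model the node without expiry keeps returning the old definition indefinitely, while the node with a one-hour TTL picks up the new definition on its first lookup after the hour has passed.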

@bizybot
Contributor

bizybot commented May 10, 2018

I like the idea of expire-after-write. Just for my understanding: when the index gets updated on the other nodes, do we get an event on a listener that would let us monitor security index changes and then invalidate the cache? Eventually the index on the other nodes will be updated. I do see IndexOutOfDateChange, but I'm not sure whether it is the same thing. Would this handle the scenario?

@tvernum
Contributor Author

tvernum commented May 10, 2018

@bizybot
The IndexOutOfDateChange (which is being refactored by #30466, but the concept will still exist) is only dealing with index metadata. It captures the internal version marker of the index, to assist in upgrading etc. It does not change when the data inside the index changes.

At the moment, because we theoretically (*) replicate the security index to every node, each node knows when the index gets updated, but we don't have listeners for such events.

And:

  1. It's not actually true that every node has a copy of the security index. If the cluster has multiple nodes per host, and is configured to prevent replicas being allocated to nodes on the same host (which it should be), then the security index will only be replicated to 1 node per host. See ".security with status yellow when cluster.routing.allocation.same_shard.host enabled" (#29933).
  2. We may change our replication strategy in the future to not replicate security to every node.
  3. All of this could change if we introduce and use an internal blob store instead of a standard index (which has been proposed, but I don't believe anyone has done any work on that).

@rjernst rjernst added the Team:Security Meta label for security team label May 4, 2020

4 participants