-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit user to single concurrent auth per realm #30794
Conversation
This commit reworks the way our realms perform caching in order to limit each principal to a single ongoing authentication per realm. In other words, this means that multiple requests made by the same user will not trigger more that one authentication attempt at a time if no entry has been stored in the cache. If an entry is present in our cache, there is no restriction on the number of concurrent authentications performed for this user. This change enables us to limit the load we place on an external system like an LDAP server and also preserve resources such as CPU on expensive operations such as BCrypt authentication. Closes elastic#30355
Pinging @elastic/es-security |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach, but I have a suggested change.
listeners.clear(); | ||
} | ||
|
||
private void notifyListener(Tuple<ActionListener<V>, ExecutorService> tuple) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels weird for this to take a tuple rather than 2 arguments (since sometimes we're constructing the Tuple just to pass as an argument and then be decomposed again)
private volatile boolean done = false; | ||
private final List<Tuple<ActionListener<V>, ExecutorService>> listeners = new ArrayList<>(); | ||
|
||
public void addListener(ActionListener<V> listener, ExecutorService executor) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs javadoc - if only to point to the explanation in the class javadoc.
cache.invalidate(token.principal(), future); | ||
authenticateWithCache(token, listener); | ||
} else if (authResult.isAuthenticated()) { | ||
if (createdAndStartedFuture) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of createdAndStartedFuture
here worries me.
We need to handle the two cases (directly authenticated vs loaded from cache) differently, but this variable is a couple of steps removed from that and I fear that something like a race condition somewhere in the cache could mean that this is true
, but the user was not actually authenticated with the credentials from token
.
Perhaps we could set an AtomicReference
to User
in the listener for doAuthenticate
and check whether the User
being pulled from the cache matches. Then were no longer testing "did we create the future" but "did we actually authenticate this User object".
e.g.
doAuthenticate(token, ActionListener.wrap(result -> {
if (result.isAuthenticated()) {
final User user = result.getUser();
authenticatedUser.set(user);
final boolean wasAuthenticatedWithThisToken = tuple.v2() == authenticatedUser.get();
handleResult(future, wasAuthenticatedWithThisToken, token, tuple, listener);
@tvernum thanks for the suggestion. I implemented the suggestion slightly differently. Let me know what you think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This commit reworks the way our realms perform caching in order to limit each principal to a single ongoing authentication per realm. In other words, this means that multiple requests made by the same user will not trigger more that one authentication attempt at a time if no entry has been stored in the cache. If an entry is present in our cache, there is no restriction on the number of concurrent authentications performed for this user. This change enables us to limit the load we place on an external system like an LDAP server and also preserve resources such as CPU on expensive operations such as BCrypt authentication. Closes #30355
* master: Update the version checks around ip_range bucket keys, now that the change was backported. Mute IndexMasterFailoverIT.testMasterFailoverDuringIndexingWithMappingChanges Use geohash cell instead of just a corner in geo_bounding_box (elastic#30698) Limit user to single concurrent auth per realm (elastic#30794) [Tests] Move templated _rank_eval tests (elastic#30679) Security: fix dynamic mapping updates with aliases (elastic#30787) Ensure that ip_range aggregations always return bucket keys. (elastic#30701) Use remote client in TransportFieldCapsAction (elastic#30838) Move Watcher versioning setting to meta field (elastic#30832) [Docs] Explain incomplete dates in range queries (elastic#30689) Move persistent task registrations to core (elastic#30755) Decouple ClusterStateTaskListener & ClusterApplier (elastic#30809) Send client headers from TransportClient (elastic#30803) Packaging: Ensure upgrade_is_oss flag file is always deleted (elastic#30732) Force stable file modes for built packages (elastic#30823) [DOCS] Fixes typos in security settings Fix GeoShapeQueryBuilder serialization after backport
* es/master: Move score script context from SearchScript to its own class (#30816) Fix bad version check writing Repository nodes (#30846) [docs] explainer for java packaging tests (#30825) Remove Throwable usage from transport modules (#30845) REST high-level client: add put ingest pipeline API (#30793) Update the version checks around ip_range bucket keys, now that the change was backported. Mute IndexMasterFailoverIT.testMasterFailoverDuringIndexingWithMappingChanges Use geohash cell instead of just a corner in geo_bounding_box (#30698) Limit user to single concurrent auth per realm (#30794) [Tests] Move templated _rank_eval tests (#30679) Security: fix dynamic mapping updates with aliases (#30787) Ensure that ip_range aggregations always return bucket keys. (#30701) Use remote client in TransportFieldCapsAction (#30838) Move Watcher versioning setting to meta field (#30832) [Docs] Explain incomplete dates in range queries (#30689) Move persistent task registrations to core (#30755) Decouple ClusterStateTaskListener & ClusterApplier (#30809) Send client headers from TransportClient (#30803) Packaging: Ensure upgrade_is_oss flag file is always deleted (#30732) Force stable file modes for built packages (#30823)
* es/6.x: Move score script context from SearchScript to its own class (#30816) Fix bad version check writing Repository nodes (#30846) Modify state of VerifyRepositoryResponse for bwc (#30762) QA: Fix tribe tests when running default zip Use remote client in TransportFieldCapsAction (#30838) Mute IndexMasterFailoverIT.testMasterFailoverDuringIndexingWithMappingChanges Ensure that ip_range aggregations always return bucket keys. (#30701) Limit user to single concurrent auth per realm (#30794) Security: fix dynamic mapping updates with aliases (#30787) [Tests] Move templated _rank_eval tests (#30679) Move Watcher versioning setting to meta field (#30832) Restore "Add more yaml tests for get alias API " (#30814) Send client headers from TransportClient (#30803) [Docs] Explain incomplete dates in range queries (#30689) Move persistent task registrations to core (#30755) Decouple ClusterStateTaskListener & ClusterApplier (#30809) Packaging: Ensure upgrade_is_oss flag file is always deleted (#30732) Force stable file modes for built packages (#30823)
I have pushed e888467 and 935c6c7 to fix a minor test bug:
if
|
* master: silence InstallPluginCommandTests, see #30900 Remove left-over comment Fix double semicolon in import statement [TEST] Fix minor random bug from #30794 Include size of snapshot in snapshot metadata #18543, bwc clean up (#30890) Enabling testing against an external cluster (#30885) Add public key header/footer (#30877) SQL: Remove the last remaining server dependencies from jdbc (#30771) Include size of snapshot in snapshot metadata (#29602) Do not serialize basic license exp in x-pack info (#30848) Change BWC version for VerifyRepositoryResponse (#30796) [DOCS] Document index name limitations (#30826) Harmonize include_defaults tests (#30700)
* 6.x: Fix double semicolon in import statement [TEST] Fix minor random bug from #30794 Enabling testing against an external cluster (#30885) SQL: Remove the last remaining server dependencies from jdbc (#30771) Add public key header/footer (#30877) Include size of snapshot in snapshot metadata (#29602) QA: Test template creation during rolling restart (#30850) REST high-level client: add put ingest pipeline API (#30793) Do not serialize basic license exp in x-pack info (#30848) [docs] explainer for java packaging tests (#30825) Verify signatures on official plugins (#30800) [DOCS] Document index name limitations (#30826) [Docs] Add reindex.remote.whitelist example (#30828)
thanks @albertzaharovits |
if (tuple != null) { | ||
final UserWithHash userWithHash = tuple.v2(); | ||
final boolean performedAuthentication = createdAndStartedFuture.get() && userWithHash != null && | ||
tuple.v2().user == authenticatedUser.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaymode I think tuple.v2().user == authenticatedUser.get()
is redundant (or there's a deeper reasoning that eludes me). Can you please clarify why this check is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was introduced in response to my feedback above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That link doesn't seem to work correctly.
It's the 3rd comment in my review above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tvernum . I understand, it's always better to test for the actual thing, if you can, instead of a surrogate.
But as it makes you feel surer on the whole it makes me feel less of. The whole seems too gnarly: handleResult
has both createdAndStartedFuture
and performedAuthentication
as params.
Then, I think we can do without createdAndStartedFuture
.
I will open a nimble refactor non-issue on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tvernum I am going to concede on this one, for now 😭 .
I could not find a satisfactory refactoring such that the current behavior is maintained over all branches. Specifically, I was looking to minimize authenticationResult.isAuthenticated()
checks, as well as to split the listener into two types, one for the thread going for the network call and the other for the stalled threads. In the end I got a huge branching monster no better than the present one.
😢
After elastic#30794, our caching realms limit each principal to a single auth attempt at a time. This prevents hammering of external servers but can cause a significant performance hit when requests need to go through a realm that takes a long time to attempt to authenticate in order to get to the realm that actually authenticates. In order to address this, this change will propagate failed results to listeners if they use the same set of credentials that the authentication attempt used. This does prevent these stalled requests from retrying the authentication attempt but the implementation does allow for new requests to retry the attempt.
After #30794, our caching realms limit each principal to a single auth attempt at a time. This prevents hammering of external servers but can cause a significant performance hit when requests need to go through a realm that takes a long time to attempt to authenticate in order to get to the realm that actually authenticates. In order to address this, this change will propagate failed results to listeners if they use the same set of credentials that the authentication attempt used. This does prevent these stalled requests from retrying the authentication attempt but the implementation does allow for new requests to retry the attempt.
After #30794, our caching realms limit each principal to a single auth attempt at a time. This prevents hammering of external servers but can cause a significant performance hit when requests need to go through a realm that takes a long time to attempt to authenticate in order to get to the realm that actually authenticates. In order to address this, this change will propagate failed results to listeners if they use the same set of credentials that the authentication attempt used. This does prevent these stalled requests from retrying the authentication attempt but the implementation does allow for new requests to retry the attempt.
This commit reworks the way our realms perform caching in order to
limit each principal to a single ongoing authentication per realm. In
other words, this means that multiple requests made by the same user
will not trigger more that one authentication attempt at a time if no
entry has been stored in the cache. If an entry is present in our
cache, there is no restriction on the number of concurrent
authentications performed for this user.
This change enables us to limit the load we place on an external system
like an LDAP server and also preserve resources such as CPU on
expensive operations such as BCrypt authentication.
Closes #30355