-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ILM: Skip rolling indexes that are already rolled #47324
Conversation
An index with an ILM policy that has a `rollover` action in one of the phases was rolled over when the ILM conditions dictated regardless if it was already rolled over (eg. manually after modifying an index template in order to force the creation of a new index that uses the new mappings). This commit adds the `index.lifecycle.rollover.skip_rolled` setting, which defaults to true, to change this behaviour and have ILM check if the index it's about to roll has not been rolled over in the meantime.
Pinging @elastic/es-core-features |
@@ -23,6 +23,17 @@ policy that contains a rollover action. When the index rolls over, the alias is | |||
updated to reflect that the index is no longer the write index. For more | |||
information about rollover, see <<using-policies-rollover>>. | |||
|
|||
`index.lifecycle.rollover.skip_rolled`:: (Expert Setting) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better suggestion for naming this setting? I also went ahead and made it true
by default. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allow_multiple_rollovers
maybe? I don't love that either.
skip_rollover_if_already_rolled
? More verbose, but that might be okay.
I'm honestly not sure why someone would want to change this setting in the first place. It's probably good to have a setting for it just in case, but I can't think of any actual use case for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this setting is a little strange, if a user were to set it to false they would essentially be rolling over an index that was no longer the write index, and in the case that ILM is configured like it usually is (the policy is applied to every index), it will lead to unintended behavior.
I think we should remove this setting, since to me this feels a little like having a setting to re-introduce buggy behavior.
IndexMetaData indexMetaData = IndexMetaData.builder(randomAlphaOfLength(10)) | ||
.putAlias(AliasMetaData.builder(rolloverAlias)) | ||
.settings(settings(Version.CURRENT).put(RolloverAction.LIFECYCLE_ROLLOVER_ALIAS, rolloverAlias)) | ||
.putRolloverInfo(new RolloverInfo(randomAlphaOfLength(5), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think this scenario makes sense? (ie. allowing ILM to attempt a rollover if the index was manually rolled under a different alias or should it skip rolling over a rolled index regardless of the alias?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is like, the corneriest of corner cases. I suspect the behavior in this case will not ever matter. That said, I think we should only pay attention to rollovers using the same alias, and if we're doing that intentionally, might as well test it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should check to see if rolling the index using a different alias prompts the same error that #44175 sees (manually is fine), and if so, we should fix that too. I think Gordon is right that it won't really matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a good start, thanks! Left some comments, I also think an integration test in TimeSeriesLifecycleActionsIT
would probably be good here to make sure we get through all the steps in rollover without erroring.
@@ -23,6 +23,17 @@ policy that contains a rollover action. When the index rolls over, the alias is | |||
updated to reflect that the index is no longer the write index. For more | |||
information about rollover, see <<using-policies-rollover>>. | |||
|
|||
`index.lifecycle.rollover.skip_rolled`:: (Expert Setting) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allow_multiple_rollovers
maybe? I don't love that either.
skip_rollover_if_already_rolled
? More verbose, but that might be okay.
I'm honestly not sure why someone would want to change this setting in the first place. It's probably good to have a setting for it just in case, but I can't think of any actual use case for it.
@@ -53,6 +55,12 @@ public void evaluateCondition(IndexMetaData indexMetaData, Listener listener) { | |||
return; | |||
} | |||
|
|||
if (indexMetaData.getSettings().getAsBoolean(LIFECYCLE_ROLLOVER_SKIP_ROLLED, true) && | |||
indexMetaData.getRolloverInfos().get(rolloverAlias) != null) { | |||
logger.trace("Index {} was already rolled, skipping", indexMetaData.getIndex().getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think this should be
info
(rather than `trace) - Use brackets around index name, and don't capitalize, to fit better with the typical log format
- Nitpicky, but I'd use a message more like
index [{}] was already rolled over for alias [{}], not attempting to roll over again
@@ -53,6 +55,12 @@ public void evaluateCondition(IndexMetaData indexMetaData, Listener listener) { | |||
return; | |||
} | |||
|
|||
if (indexMetaData.getSettings().getAsBoolean(LIFECYCLE_ROLLOVER_SKIP_ROLLED, true) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll also need this check in RolloverStep
, or it'll just try to issue the rollover request there and fail a step later. I think UpdateRolloverLifecycleDateStep
is fine not checking it, because we should probably update the reference date anyway, and the step to set indexing complete is probably good to do anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, otherwise we pass the check step but fail on the "actual" rollover
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I wasn't seeing this in the manual tests as we need to complete this step here (call the listener
callback). I'll add the check to the RolloverStep
too with the listener
callback as well. Will capture this in an integration test too as you guys suggested
IndexMetaData indexMetaData = IndexMetaData.builder(randomAlphaOfLength(10)) | ||
.putAlias(AliasMetaData.builder(rolloverAlias)) | ||
.settings(settings(Version.CURRENT).put(RolloverAction.LIFECYCLE_ROLLOVER_ALIAS, rolloverAlias)) | ||
.putRolloverInfo(new RolloverInfo(randomAlphaOfLength(5), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is like, the corneriest of corner cases. I suspect the behavior in this case will not ever matter. That said, I think we should only pay attention to rollovers using the same alias, and if we're doing that intentionally, might as well test it.
step.evaluateCondition(indexMetaData, new AsyncWaitStep.Listener() { | ||
|
||
@Override | ||
public void onResponse(boolean complete, ToXContentObject informationContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably check that complete
is true
.
step.evaluateCondition(indexMetaData, new AsyncWaitStep.Listener() { | ||
|
||
@Override | ||
public void onResponse(boolean complete, ToXContentObject informationContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably check that complete
is true
.
Mockito.verify(indicesClient, Mockito.only()).rolloverIndex(Mockito.any(), Mockito.any()); | ||
} | ||
|
||
private IndicesAdminClient indicesAdminClientMock() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Helper methods should come after all of the actual test methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this Andrei! I left some comments, in particular there is a bug where we don't actually advance the step. I think it's not caught because we don't have any integration tests for this (only unit tests). I think we should add some REST tests (should be something we can exercise from a .yaml test) for manually rolling over an index.
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/WaitForRolloverReadyStep.java
Show resolved
Hide resolved
@@ -53,6 +55,12 @@ public void evaluateCondition(IndexMetaData indexMetaData, Listener listener) { | |||
return; | |||
} | |||
|
|||
if (indexMetaData.getSettings().getAsBoolean(LIFECYCLE_ROLLOVER_SKIP_ROLLED, true) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, otherwise we pass the check step but fail on the "actual" rollover
IndexMetaData indexMetaData = IndexMetaData.builder(randomAlphaOfLength(10)) | ||
.putAlias(AliasMetaData.builder(rolloverAlias)) | ||
.settings(settings(Version.CURRENT).put(RolloverAction.LIFECYCLE_ROLLOVER_ALIAS, rolloverAlias)) | ||
.putRolloverInfo(new RolloverInfo(randomAlphaOfLength(5), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should check to see if rolling the index using a different alias prompts the same error that #44175 sees (manually is fine), and if so, we should fix that too. I think Gordon is right that it won't really matter.
} | ||
|
||
private IndicesAdminClient indicesAdminClientMock() { | ||
AdminClient adminClient = Mockito.mock(AdminClient.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would vastly prefer that we subclass NoOpClient
to test this rather than using mocks, but since this test already has precedence for using them, it's okay.
@@ -23,6 +23,17 @@ policy that contains a rollover action. When the index rolls over, the alias is | |||
updated to reflect that the index is no longer the write index. For more | |||
information about rollover, see <<using-policies-rollover>>. | |||
|
|||
`index.lifecycle.rollover.skip_rolled`:: (Expert Setting) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this setting is a little strange, if a user were to set it to false they would essentially be rolling over an index that was no longer the write index, and in the case that ILM is configured like it usually is (the policy is applied to every index), it will lead to unintended behavior.
I think we should remove this setting, since to me this feels a little like having a setting to re-introduce buggy behavior.
Thanks for your comments @dakrone and @gwbrown. I agree on the setting being odd (and offers little value). I'll drop it. @dakrone a manual rollover on a different alias between |
@elasticmachine update branch |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I did leave a comment about a minor change to the test, but it shouldn't need another review. Thanks for iterating on this Andrei!
|
||
@Override | ||
public void onResponse(boolean complete) { | ||
assertThat(complete, is(true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is asynchronous, it's possible that the onResponse never actually gets called (we don't verify that it was called), can you add a CountDownLatch
that in onResponse
is counted down, we can then .await()
on the latch and ensure that onResponse
did check the complete
boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes! Good shout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this would be true in an integration test but in this case we call evaluateStep
from the main test thread, making this effectively synchronous.
|
||
@Override | ||
public void onResponse(boolean complete, ToXContentObject informationContext) { | ||
assertThat(complete, is(true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here about the latch, we need to check that onResponse
is actually called in the test.
|
||
@Override | ||
public void onResponse(boolean complete, ToXContentObject informationContext) { | ||
assertThat(complete, is(true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here about the latch :)
An index with an ILM policy that has a rollover action in one of the phases was rolled over when the ILM conditions dictated regardless if it was already rolled over (eg. manually after modifying an index template in order to force the creation of a new index that uses the new mappings). This changes this behaviour and has ILM check if the index it's about to roll has not been rolled over in the meantime. (cherry picked from commit 37d6106) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
An index with an ILM policy that has a rollover action in one of the phases was rolled over when the ILM conditions dictated regardless if it was already rolled over (eg. manually after modifying an index template in order to force the creation of a new index that uses the new mappings). This changes this behaviour and has ILM check if the index it's about to roll has not been rolled over in the meantime. (cherry picked from commit 37d6106) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
An index with an ILM policy that has a
rollover
action in one of thephases was rolled over when the ILM conditions dictated regardless if
it was already rolled over (eg. manually after modifying an index
template in order to force the creation of a new index that uses the new
mappings).
This changes this behaviour and has ILM check if the index it's about to
roll has not been rolled over in the meantime.
Closes #44175