[ML] Continuous data frame should be more robust to new and deleted indices #43992
Comments
Pinging @elastic/ml-core
Full exception from "failure in update check"
Furthermore, if the setup also periodically deletes trailing indices that fall within the pattern, the checkpoint progress fails to move forward. For the most recent test run, the progress is stopped at high
Additional exception snippet. This occurs less frequently.
At a code level I think the problems observed are:
We need to find a way to make checkpoints robust to indices entering or leaving the set of source indices. When an index enters or leaves the set, it's reasonable to treat this as a change since the previous checkpoint. For the indices that still exist and are still open, it remains possible to calculate checkpoint stats.
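To make the idea concrete, here is a minimal sketch of a change check that tolerates indices appearing or disappearing. It is not the transform code itself: the `Checkpoint` shape (index name mapped to per-shard checkpoint values) and the function name `has_changed` are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical representation of a checkpoint:
# index name -> list of per-shard checkpoint values.
Checkpoint = Dict[str, List[int]]


def has_changed(previous: Checkpoint, current: Checkpoint) -> bool:
    """Return True if the source may have changed since the previous checkpoint."""
    # An index entering or leaving the source set counts as a change,
    # rather than as an error.
    if previous.keys() != current.keys():
        return True
    # Same set of indices: compare the per-shard values of each one.
    return any(previous[name] != current[name] for name in current)


# A newly created index is reported as a change, not as a failure.
before = {"temp-1001": [5, 5, 5]}
after = {"temp-1001": [5, 5, 5], "temp-1002": [0, 0, 0]}
print(has_changed(before, after))
```

The point of the sketch is only that a differing index set short-circuits to "changed" instead of raising, while indices present in both checkpoints are still compared shard by shard.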
I discussed this with @hendrikmuhs. For 7.3 some simple bug fixes we could do are:
However, this also interacts quite heavily with solving the 65000 terms problem. So the timeline and mechanism for fixing that affects the decision of what to do about this problem.
make checkpointing more robust:
- do not let checkpointing fail if indexes got deleted
- treat missing seqNoStats as just created indices (checkpoint 0)
- loglevel: do not treat failed updated checks as error

fixes #43992
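The first two points of that commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual change: the input shape (a list of per-shard dicts with `index`, `shard`, and an optional `seq_no_stats` holding `global_checkpoint`) and the function name `build_checkpoint` are hypothetical.

```python
from typing import Dict, List, Set


def build_checkpoint(shard_stats: List[dict], existing_indices: Set[str]) -> Dict[str, List[int]]:
    """Build index -> per-shard checkpoints, tolerating deleted and brand-new indices."""
    per_index: Dict[str, Dict[int, int]] = {}
    for stat in shard_stats:
        name = stat["index"]
        # Skip indices that were deleted in the meantime instead of failing.
        if name not in existing_indices:
            continue
        seq_no_stats = stat.get("seq_no_stats")
        # Missing stats are treated as a just-created index (checkpoint 0).
        value = seq_no_stats["global_checkpoint"] if seq_no_stats is not None else 0
        shards = per_index.setdefault(name, {})
        # Primary and replica may both report a shard; keep the highest value.
        shards[stat["shard"]] = max(shards.get(stat["shard"], value), value)
    return {name: [shards[s] for s in sorted(shards)] for name, shards in per_index.items()}


stats = [
    {"index": "temp-1001", "shard": 0, "seq_no_stats": {"global_checkpoint": 41}},
    {"index": "temp-1001", "shard": 1, "seq_no_stats": {"global_checkpoint": 39}},
    {"index": "temp-1002", "shard": 0, "seq_no_stats": None},  # just created
    {"index": "temp-0990", "shard": 0, "seq_no_stats": {"global_checkpoint": 7}},  # deleted
]
print(build_checkpoint(stats, {"temp-1001", "temp-1002"}))
```

Keeping the highest value per shard is a choice made for this sketch; the real code may reconcile primary and replica values differently.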
Found in 7.3.0
"build_hash" : "f8fd432", "build_date" : "2019-07-03T15:05:06.452272Z",
3 node cluster.
Index template for `temp-*` has 3 shards and 1 replica.

New index `temp-100?` is being created every 12 seconds with a bulk upload of 4000 documents.

When polling `GET _data_frame/transforms/blah*/_stats`, periodic checkpoint exceptions occur. These are displayed in the UI transform list as generic server error 500 toast messages, provided the page refresh cycle coincides.

Index `temp_1013` has just been created. There is a small window when this index's health is yellow. I think it might also be possible that the replica is not yet ready (not sure if health is considered yellow in this case).

The elasticsearch logs contained repeated messages
Expected behavior
New source index creation is likely for continuous data frames. Continuous data frames should be tolerant of this.