[CI] CorruptedFileIT.testCorruptFileThenSnapshotAndRestore failure #30577
Pinging @elastic/es-distributed |
This test creates an index with 0 replicas, merges disabled, and a very high value for the flush translog size setting. Then it corrupts a file in a random primary shard (in this failure, it is When creating the snapshot, the test expects that the shard snapshot process loads the store metadata and then fails because one of the Lucene files is corrupted, failing the shard snapshot and marking the snapshot as PARTIAL. In this test failure the snapshot completed as SUCCESS. I looked closely at it and I can't figure out why it didn't fail when loading the store metadata. Of course, it does not reproduce locally. The test correctly corrupted the file:
I suspect a test bug, or maybe that the file was not part of the snapshot but it should have been. Merges are disabled and flushes are manually executed before corrupting the file. I pushed 7915b5f to add more debug information, hopefully this error will appear again and we'll be able to grab the shard files and the snapshotted files. |
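For readers less familiar with how a corrupted file is normally caught, here is a minimal, self-contained sketch of checksum-footer verification. It is a simplified model, not Lucene's or Elasticsearch's actual code: the class `ChecksumSketch`, its method names, and the 8-byte footer layout are assumptions made for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical helper, for illustration only.
final class ChecksumSketch {
    // Append an 8-byte CRC32 footer to the payload (simplified stand-in for a real file footer).
    static byte[] withFooter(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + Long.BYTES)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    // Recompute the checksum over the body and compare it with the stored footer.
    static boolean isIntact(byte[] file) {
        int bodyLength = file.length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(file, 0, bodyLength);
        long stored = ByteBuffer.wrap(file, bodyLength, Long.BYTES).getLong();
        return stored == crc.getValue();
    }

    // Return a copy of the file with one bit flipped at the given offset.
    static byte[] corrupt(byte[] file, int offset) {
        byte[] copy = Arrays.copyOf(file, file.length);
        copy[offset] ^= 0x01;
        return copy;
    }

    public static void main(String[] args) {
        byte[] file = withFooter("segment data".getBytes(StandardCharsets.UTF_8));
        System.out.println(isIntact(file));             // true
        System.out.println(isIntact(corrupt(file, 3))); // false
    }
}
```

The point of the sketch is that the check only fires for files that are actually read and verified. A corrupted file that the snapshot never opens cannot fail the snapshot.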
This test failed but the cause is not obvious. This commit adds more debug logging traces so that if it reproduces we could gather more information. Related #30577
@dnhatn With the changes that were made to |
@ywelsch I have looked at the test. I will follow-up this test with @tlrx. |
awesome! |
@tlrx You're right. My assumption is not correct in this case. |
@tlrx I will be taking care of this. |
@tlrx and @ywelsch I've reproduced and have an explanation for this. This is possibly caused by LUCENE-8253
Previously, a fully deleted segment was kept around until the next commit; since LUCENE-8253, however, it is dropped immediately. The problem is that its files, which are not referenced by any commit point, are not released but retained in IndexFileDeleter's lastFiles.
I opened LUCENE-8324. If @s1monw agrees to fix this in Lucene, we are good; otherwise we need to update the test. |
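One plausible reading of this explanation is that the corrupted file was still on disk but no longer referenced by the commit point, so the snapshot never read it. The toy model below illustrates that reading only. It does not use Lucene's `IndexFileDeleter`, and the class `UnreferencedFileSketch`, its methods, and the file names are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model, for illustration only.
final class UnreferencedFileSketch {
    // Files on disk that the commit point does not reference,
    // e.g. leftovers of a dropped segment that have not been released yet.
    static Set<String> unreferenced(Set<String> onDisk, Set<String> commitFiles) {
        Set<String> result = new HashSet<>(onDisk);
        result.removeAll(commitFiles);
        return result;
    }

    // In this model a snapshot reads only commit-referenced files,
    // so it can notice corruption only in those files.
    static boolean snapshotDetectsCorruption(Set<String> commitFiles, String corruptedFile) {
        return commitFiles.contains(corruptedFile);
    }

    public static void main(String[] args) {
        // Hypothetical file names.
        Set<String> onDisk = new HashSet<>(Arrays.asList("_0.cfs", "_1.cfs", "segments_2"));
        Set<String> commitFiles = new HashSet<>(Arrays.asList("_1.cfs", "segments_2"));

        System.out.println(unreferenced(onDisk, commitFiles));                 // [_0.cfs]
        System.out.println(snapshotDetectsCorruption(commitFiles, "_0.cfs"));  // false
        System.out.println(snapshotDetectsCorruption(commitFiles, "_1.cfs"));  // true
    }
}
```

Under this model, a test that picks its victim from everything on disk can occasionally corrupt a file the snapshot ignores, which would match a snapshot finishing as SUCCESS instead of PARTIAL.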
Thanks @dnhatn. I suspected something like that but I did not look deeply enough. |
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=sles/2435/console
After a file is corrupted, the snapshot should throw an error, but it seems to succeed instead. It doesn't reproduce locally, though.
consoleText.txt