Add a NoopEngine implementation #31163
Conversation
This adds a new Engine implementation that does... nothing. Any operation throws an `UnsupportedOperationException` when attempted. This engine is intended as a building block for replicated closed indices in subsequent code changes. Relates to elastic#31141
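For readers skimming the thread, here is a minimal sketch of the pattern the PR describes. This is an illustration only: the real `NoopEngine` extends Elasticsearch's `Engine` base class, and the simplified interface below is invented for the example.

```java
// Illustration of the "no-op engine" pattern: every mutating operation is rejected.
// SimpleEngine is a stand-in here, not the real org.elasticsearch.index.engine.Engine.
interface SimpleEngine {
    void index(String id, String source);
    void delete(String id);
}

final class SimpleNoopEngine implements SimpleEngine {
    @Override
    public void index(String id, String source) {
        throw new UnsupportedOperationException("indexing is not supported on a no-op engine");
    }

    @Override
    public void delete(String id) {
        throw new UnsupportedOperationException("deletes are not supported on a no-op engine");
    }
}
```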
Pinging @elastic/es-distributed
I left some high-level comments. I have not done a careful review.
```java
/**
 * Returns true if the engine is a noop engine
 */
public abstract boolean isNoopEngine();
```
This feels wrong to me. It looks like it's only used for testing? Is this really needed? We did not feel the need to add something like this for the following engine in CCR.
Sorry, this snuck in when I extracted this from my other branch (where it is actually used). I'll remove it for now and re-add it in the next set of work.
```
@@ -24,4 +24,9 @@
    public Engine newReadWriteEngine(EngineConfig config) {
        return new InternalEngine(config);
    }

    @Override
    public Engine newNoopEngine(EngineConfig config) {
```
Can you take a look at how we handle the following engine in CCR and see if that extension point makes sense instead of adding this here? It feels conceptually wrong that the internal engine factory returns something other than an internal engine.
Do we want to make this a plugin? I was thinking built-in, but I guess we could move it to a module if we wanted to; I wasn't sure if it's a good idea to do engine implementations inside of modules.
Looking at this again, it appears that `EnginePlugin` is part of the CCR branch; are there plans to factor it out into a separate PR that goes into master?
I think it would be great if we could make it a module; I am always happy to see isolation when it makes sense, and here I think it can. Note that CCR has an engine implementation inside a module. 😇
We talked about exactly this possibility of reusing the engine plugin work in the context of closed indices many months ago, so we could bring that work out of the CCR branch and into 6.x/master.
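For reference, the pluggable extension point being discussed looks roughly like the sketch below; the setting name and the exact plugin wiring are assumptions for illustration, not the CCR branch's code.

```java
// Sketch of an engine plugin: a plugin can supply an EngineFactory for indices it
// cares about and fall back to the default engine otherwise. The setting name
// "index.noop_engine" is hypothetical.
import java.util.Optional;

import org.elasticsearch.index.IndexSettings;
import org.elasticsearch.index.engine.EngineFactory;
import org.elasticsearch.plugins.EnginePlugin;
import org.elasticsearch.plugins.Plugin;

public class NoopEnginePlugin extends Plugin implements EnginePlugin {
    @Override
    public Optional<EngineFactory> getEngineFactory(IndexSettings indexSettings) {
        if (indexSettings.getSettings().getAsBoolean("index.noop_engine", false)) {
            return Optional.of(NoopEngine::new); // NoopEngine(EngineConfig) from this PR
        }
        return Optional.empty(); // use the default (internal) engine
    }
}
```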
Okay, I've removed the changes to `EngineFactory`; I can use the pluggable one for the next PR (since this one doesn't hook the engine in yet).
left some comments.
```java
@Override
public void restoreLocalCheckpointFromTranslog() {

```
extra NL
```java
Translog translog = null;

// The deletion policy for the translog should not keep any translogs around, so the min age/size is set to -1
final TranslogDeletionPolicy translogDeletionPolicy = new TranslogDeletionPolicy(-1, -1);
```
can we move this into the try block please?
Certainly
```java
private final IndexCommit lastCommit;
private final LocalCheckpointTracker localCheckpointTracker;
private final String historyUUID;
private SegmentInfos lastCommittedSegmentInfos;
```
this can be final, no?
Yep
```java
 * Directory so that the last commit's user data can be read for the historyUUID
 * and last committed segment info.
 */
public class NoopEngine extends Engine {
```
can this class be final and maybe pkg private?
Sure, though I'll un-final it when I work on the second half of this, but it can be final for now :)
```java
@Override
public CommitId flush() throws EngineException {
    try {
        translog.rollGeneration();
```
This looks wrong. We don't write anything here, so why do we need to modify the translog? I think this should be read-only.
In order for flushing to clear existing translogs (because we don't want any translog operations to be replayed for peer or store recovery), we want the flush method to remove the translog. This was added so that flushing the new engine ensures we don't have any translog operations around that could cause UOEs during recovery.
I can remove this for now and re-introduce it later when adding the state transition part, if that makes it better, but we still need to be able to completely remove translog ops before doing recovery since we have no way to do operation-based recovery.
What do you think?
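Roughly, the flush behaviour being described would be a method on the new engine like the sketch below. This was the change later dropped from the PR; the field names come from the diff above, and the exact `Translog` calls are my assumption.

```java
// Sketch of the discussed (and later removed) flush: roll to a fresh translog
// generation and let the -1/-1 deletion policy drop every older generation,
// leaving no operations to replay during peer or store recovery.
@Override
public CommitId flush() throws EngineException {
    try {
        translog.rollGeneration();
        // Advance the generation required for recovery so the deletion policy
        // can release all previous translog files.
        translogDeletionPolicy.setMinTranslogGenerationForRecovery(translog.currentFileGeneration());
        translog.trimUnreferencedReaders();
    } catch (IOException e) {
        throw new EngineException(shardId, "failed to trim translog during flush", e);
    }
    return new CommitId(lastCommittedSegmentInfos.getId());
}
```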
Yeah, I'd like to remove it for now; I can't see in this change why it's needed.
Okay, I've removed that change from this PR
Thanks @dakrone. I left some comments and questions.
```java
translogDeletionPolicy.setTranslogGenerationOfLastCommit(lastGen);
translogDeletionPolicy.setMinTranslogGenerationForRecovery(lastGen);

localCheckpointTracker = createLocalCheckpointTracker();
```
We don't really need a local checkpoint tracker here. Instead, can we work on master to not expose the localCheckpointTracker out of the engine (similar to how we don't expose the translog)? Then we can avoid creating it. We should assert that maxSeqNo == localCheckpoint when opening Lucene and otherwise fail.
Do you mean something like this: #31213
yes. thanks.
```java
final TranslogDeletionPolicy translogDeletionPolicy = new TranslogDeletionPolicy(-1, -1);

lastCommittedSegmentInfos = store.readLastCommittedSegmentsInfo();
translog = openTranslog(engineConfig, translogDeletionPolicy, engineConfig.getGlobalCheckpointSupplier());
```
Do we really need a translog? All we want is to validate that the translog has the right UUID and that it's empty?
We use the translog all over the place in other methods in `Engine`. We can try to encapsulate all of these into different methods (similar to #31213), but I'm not sure what benefit we'd actually get from that. In the next iteration we'll need the translog because we'll need to sync and flush it on engine opening so that there are no operations to be replayed during peer recovery (which I think will still need a translog to retrieve stats about the number of ops).
> I'm not sure what benefit we'd actually get from that

I looked at these methods and I agree it's on the edge - there are quite a few. I would still prefer we replace them with unsupported operations or no-op returns (e.g. empty snapshots; see the sketch after this comment). This will lock things down and prevent things that aren't supposed to happen - I think that's good. An alternative is to implement a NoopTranslog, but that's another rabbit hole.

> we'll need to sync and flush it on engine opening

Why is that? I think it's good to only close indices that have no ongoing indexing (like our plan for the frozen index). Regardless - why can't we do the flush/trim when we close the open engine and convert it to a noop engine?

I can see one thing down the road because we may close an index on recovery where it has broken settings (TBD). In that case I would still prefer to make utility methods like `Store#associateIndexWithNewTranslog` that work on the folder. Note that this problem isn't solved even if we keep the translog open, as we can't index operations from it into the lucene index with the NoopEngine, nor ship them to a recovering shard with a NoopEngine in it. We assume those don't exist. I think we should discuss this separately.
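A sketch of the kind of no-op return mentioned above; the `Translog.Snapshot` shape here is an assumption based on how it is used elsewhere in this thread.

```java
// Sketch: rather than keeping a live translog around, the no-op engine can hand
// back an empty snapshot so callers that iterate translog operations see nothing.
@Override
public Translog.Snapshot newTranslogSnapshotFromMinSeqNo(long minSeqNo) {
    return new Translog.Snapshot() {
        @Override
        public int totalOperations() {
            return 0; // nothing to replay
        }

        @Override
        public Translog.Operation next() {
            return null; // empty: signals the end of the snapshot immediately
        }

        @Override
        public void close() {
            // nothing to release
        }
    };
}
```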
> I think it's good to only close indices that have no ongoing indexing (like our plan for the frozen index).

Yes, but even with no ongoing indexing, a translog still remains (due to the retention policy).

> Regardless - why can't we do the flush / trim when we close the open engine and convert it to a noop engine?

That would require setting a new retention policy on an existing engine (making a part of InternalEngine mutable, which makes me :( ). In the future, though, we could set the retention policy, sync, and flush/trim when the NoopEngine is opened, and then immediately close the translog.

In order to do this, though, we'll have to remove the `getTranslog` method from Engine; is that something you want me to do as a precursor to this?
```java
@Override
public boolean ensureTranslogSynced(Stream<Translog.Location> locations) {
    return false;
```
I think this can throw an unsupported operation exception too?
```java
}

@Override
public Translog getTranslog() {
```
This increases visibility.
@jasontedor @bleskes @s1monw I've removed almost all the pieces from this that were deemed unnecessary, can you take another look please?
Looks great, thanks for the extra iteration. I left some minor asks.
```java
lastCommit = indexCommits.get(indexCommits.size()-1);
historyUUID = lastCommit.getUserData().get(HISTORY_UUID_KEY);
// We don't want any translogs hanging around for recovery, so we need to set these accordingly
final long lastGen = Long.parseLong(lastCommit.getUserData().get(Translog.TRANSLOG_GENERATION_KEY));
```
This is not used?
Also - I think the point is to validate integrity - i.e. open the translog, see that its UUID matches, see that it's empty, and shut it down?
Okay, I pushed a commit removing the unused `lastGen` and open-closing the translog for validation purposes.
```java
@Override
public SeqNoStats getSeqNoStats(long globalCheckpoint) {
    return new SeqNoStats(0, 0, 0);
```
use the incoming globalCheckpoint?
Wouldn't that violate the assertion that global checkpoint is the minimum of all shards' local checkpoints?
That would, but we don't check this here. Also - I think we should load the stats from the last commit point, not just 0 (both the max seq no and the local checkpoint are stored in the commit).
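Loading the stats from the last commit point would look roughly like the sketch below; the key constants are the sequence-number keys Elasticsearch stores in the commit user data, and `lastCommit` is the field from this PR.

```java
// Sketch: recover the stats from the last commit's user data instead of hard-coding
// zeros. maxSeqNo and localCheckpoint would be final fields set in the constructor
// (inside the existing try block), e.g.:
//
//   final Map<String, String> userData = lastCommit.getUserData();
//   maxSeqNo = Long.parseLong(userData.get(SequenceNumbers.MAX_SEQ_NO));
//   localCheckpoint = Long.parseLong(userData.get(SequenceNumbers.LOCAL_CHECKPOINT_KEY));

@Override
public SeqNoStats getSeqNoStats(long globalCheckpoint) {
    // Report the commit's values together with the incoming global checkpoint.
    return new SeqNoStats(maxSeqNo, localCheckpoint, globalCheckpoint);
}
```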
I did another round. Sorry for not thinking about these in the first run.
```java
final TranslogDeletionPolicy translogDeletionPolicy = new TranslogDeletionPolicy(-1, -1);

// The translog is opened and closed to validate that the translog UUID from lucene is the same as the one in the translog
Translog translog = openTranslog(engineConfig, translogDeletionPolicy, engineConfig.getGlobalCheckpointSupplier());
```
```java
}

@Override
public void resetLocalCheckpoint(long localCheckpoint) {
```
can you assert this is equal to getLocalCheckpoint()?
Yep added this
```java
}

@Override
public long getLocalCheckpoint() {
```
can you return the local checkpoint stored in the commit?
Sure, added this
```java
public class NoopEngineTests extends EngineTestCase {
    private static final IndexSettings INDEX_SETTINGS = IndexSettingsModule.newIndexSettings("index", Settings.EMPTY);
```
Can you also add a test that creates a normal engine, adds some data, flushes, closes, and then opens a noop engine, and verify that all the stats make sense and that getting a lucene snapshot returns the expected data (doc counts etc)?
> getting a lucene snapshot returns the expected data (doc counts etc)?

Do you mean from `newTranslogSnapshotFromMinSeqNo`? I've been returning an empty snapshot for this with a docCount of 0 (and no operations, because we have no way of returning operations), so it won't match the doc count that a snapshot from a regular engine would have.
I meant this. The standard engine ends up returning a SnapshotIndexCommit, which is why I used the term lucene snapshot. I hope this is clearer.
Yep, thanks for clarifying, this was added as a test
@bleskes thanks for taking another look, I think I addressed all your feedback
LGTM. Thanks for the extra work @dakrone. I left some questions on the test, but I'm happy either way.
```
@@ -98,6 +98,9 @@ public void close() {

    // The translog is opened and closed to validate that the translog UUID from lucene is the same as the one in the translog
    Translog translog = openTranslog(engineConfig, translogDeletionPolicy, engineConfig.getGlobalCheckpointSupplier());
```
wrap this in try/finally?
Yes, good catch. I thought about that last night but forgot to do it; I'll update that.
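With the resource properly scoped, the validation might look like this sketch; `openTranslog` is this PR's private helper, and the exception choice is an assumption.

```java
// Sketch: open the translog only long enough to check that it belongs to this
// index (openTranslog fails on a UUID mismatch) and that it holds no operations,
// then release it. try-with-resources gives the same guarantee as try/finally.
try (Translog translog = openTranslog(engineConfig, translogDeletionPolicy,
        engineConfig.getGlobalCheckpointSupplier())) {
    assert translog.totalOperations() == 0 : "translog should be empty for a no-op engine";
} catch (IOException e) {
    throw new EngineCreationFailureException(shardId, "failed to open and validate translog", e);
}
```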
```
@@ -56,4 +67,38 @@ public void testTwoNoopEngines() throws IOException {
        engine2.close();
    }

    public void testNoopAfterRegularEngine() throws IOException {
        int docs = randomIntBetween(1, 10);
        ReplicationTracker tracker = (ReplicationTracker) engine.config().getGlobalCheckpointSupplier();
```
why is this needed? we typically just write to the engine?
This is a rabbit hole @jasontedor and I got into last night. In order to completely flush and remove the translog, we need a deletion policy where we can advance the generation needed for recovery; in order to advance that, we need the tracker working correctly (which I guess is normally managed at the IndexShard level). So this allows us to advance global checkpoints as needed so we can remove the translog and open a new noop engine (otherwise the noop engine would throw an exception about there being translog operations when it is opened).
We can pass the global checkpoint as a LongSupplier to an engine:

```java
final AtomicLong globalCheckpoint = new AtomicLong(SequenceNumbers.NO_OPS_PERFORMED);
try (Store store = createStore();
     InternalEngine engine = createEngine(store, createTempDir(), globalCheckpoint::get)) {
    engine.syncTranslog(); // Make sure that the global checkpoint is persisted to the translog's checkpoint
}
```
```java
List<IndexCommit> indexCommits = DirectoryReader.listCommits(store.directory());
lastCommit = indexCommits.get(indexCommits.size()-1);
historyUUID = lastCommit.getUserData().get(HISTORY_UUID_KEY);
localCheckpoint = Long.parseLong(lastCommit.getUserData().get(SequenceNumbers.LOCAL_CHECKPOINT_KEY));
```
This is not yet relevant to this PR, but these will have to be the same IMO.
I left some comments/questions.
```java
try {
    lastCommittedSegmentInfos = store.readLastCommittedSegmentsInfo();
    List<IndexCommit> indexCommits = DirectoryReader.listCommits(store.directory());
    lastCommit = indexCommits.get(indexCommits.size()-1);
```
Formatting nit: `indexCommits.size()-1` -> `indexCommits.size() - 1`
```java
// The deletion policy for the translog should not keep any translogs around, so the min age/size is set to -1
final TranslogDeletionPolicy translogDeletionPolicy = new TranslogDeletionPolicy(-1, -1);

// The translog is opened and closed to validate that the translog UUID from lucene is the same as the one in the translog
```
Can we have a test case for this?
Sure, added a test
```java
final NoopEngine engine = new NoopEngine(noopConfig(INDEX_SETTINGS, store, primaryTranslogDir));
expectThrows(UnsupportedOperationException.class, () -> engine.index(null));
expectThrows(UnsupportedOperationException.class, () -> engine.delete(null));
expectThrows(UnsupportedOperationException.class, () -> engine.noOp(null));
```
As a follow-up can we have:
- `Noop` -> `NoOp`
- `noop` -> `no-op` (when written in exception messages to the user, and code comments)
- `noopEngine` -> `noOpEngine`
ooookkaayy
```java
public void testTwoNoopEngines() throws IOException {
    engine.close();
    // It's so noop you can even open two engines for the same store without tripping anything
```
Is this something that we are going to rely on?
Probably not? I just thought it was a good test that we aren't acquiring locks or locking files accidentally or anything like that
Okay, maybe the test name should be clearer, or at least a comment of its purpose?
Sure I'll add a comment
@jasontedor I addressed your comments (I'll do the rename in a followup)
LGTM.
@elasticmachine run sample packaging tests please
In #31163 (comment) Jason requested renaming NoopEngine to NoOpEngine. This commit renames the class and relevant parts. Relates to #31141
This commit uses the NoOp engine introduced in elastic#31163 for closed indices. Instead of being removed from the routing table and essentially "dropped" (but not deleted), closed indices are now replicated the same way open indices are. Relates to elastic#31141
This adds a new Engine implementation that does... nothing. Any operation throws an `UnsupportedOperationException` when attempted. This engine is intended as a building block for replicated closed indices in subsequent code changes. Relates to #31141