Skip to content

Commit

Permalink
fix(resharding): Allow creating the flat storage multiple times for a…
Browse files Browse the repository at this point in the history
… shard. (#10696)

Removing the assertion and allowing flat storage to be created multiple
times for a shard. This is needed so fix an issue when node is restarted
in the middle of resharding. The flat storage may be created already for
a subset of shards but unless all are finished resharding will get
restarted. Becuase the flat storage was created, for those shards, it
will be created on node startup as well as after the second resharding
is finished.

This is not a perfect solution and not particularly clean. The best
alternative seems to be to implement resuming of resharding where we
don't restart resharding for shards that were finished. This is more a
comlex change and we want to get this PR in to the release so for now
I'm sticking to the simplest approach.

This seems to be safe because even though the flat storage for children
shards is created it's not used anywhere.

Sanity check - do we ever check the existance of flat storage for a
shard for anything?
  • Loading branch information
wacban authored and posvyatokum committed Mar 5, 2024
1 parent 9e98b54 commit f3597b8
Showing 1 changed file with 14 additions and 8 deletions.
22 changes: 14 additions & 8 deletions core/store/src/flat/manager.rs
Original file line number Diff line number Diff line change
Expand Up @@ -64,19 +64,25 @@ impl FlatStorageManager {
);
}

/// Creates flat storage instance for shard `shard_uid`. The function also checks that
/// the shard's flat storage state hasn't been set before, otherwise it panics.
/// Creates flat storage instance for shard `shard_uid`. This function
/// allows creating flat storage if it already exists even though it's not
/// desired. It needs to allow that to cover a resharding restart case.
/// TODO (#7327): this behavior may change when we implement support for state sync
/// and resharding.
pub fn create_flat_storage_for_shard(&self, shard_uid: ShardUId) -> Result<(), StorageError> {
tracing::debug!(target: "store", ?shard_uid, "Creating flat storage for shard");
let mut flat_storages = self.0.flat_storages.lock().expect(POISONED_LOCK_ERR);
let original_value =
flat_storages.insert(shard_uid, FlatStorage::new(self.0.store.clone(), shard_uid)?);
// TODO (#7327): maybe we should propagate the error instead of assert here
// assert is fine now because this function is only called at construction time, but we
// will need to be more careful when we want to implement flat storage for resharding
assert!(original_value.is_none());
let flat_storage = FlatStorage::new(self.0.store.clone(), shard_uid)?;
let original_value = flat_storages.insert(shard_uid, flat_storage);
if original_value.is_some() {
// Generally speaking this shouldn't happen. It may only happen when
// the node is restarted in the middle of resharding.
//
// TODO(resharding) It would be better to detect when building state
// is finished for a shard and skip doing it again after restart. We
// can then assert that the flat storage is only created once.
tracing::warn!(target: "store", ?shard_uid, "Creating flat storage for shard that already has flat storage.");
}
Ok(())
}

Expand Down

0 comments on commit f3597b8

Please sign in to comment.