Skip to content

Commit

Permalink
[Bifrost] Provisioning nodes are not authoritative empty (#2625)
Browse files Browse the repository at this point in the history
This fixes a nasty correctness bug where provisioning node is considered empty(). empty nodes are implicitly authoritative but provisioning nodes are not authoritative since they can later join the write-set. I've also increased the checkSeal requirement to avoid mismarking a loglet as sealed as we don't want to violate the invariant: a sealed loglet must remain sealed, always.

```
// intentionally empty
```
  • Loading branch information
AhmedSoliman authored Feb 5, 2025
1 parent 4f362fa commit b48e6fa
Show file tree
Hide file tree
Showing 4 changed files with 28 additions and 14 deletions.
14 changes: 13 additions & 1 deletion crates/bifrost/src/providers/replicated_loglet/provider.rs
Original file line number Diff line number Diff line change
Expand Up @@ -326,7 +326,19 @@ pub fn logserver_candidate_filter(_node_id: PlainNodeId, config: &NodeConfig) ->
// the role was removed by mistake (although some protection should be added for this)
match config.log_server_config.storage_state {
StorageState::ReadWrite => true,
StorageState::Provisioning if config.has_role(Role::LogServer) => true,
// Why is this being commented out?
// Just being conservative to avoid polluting nodesets with nodes that might have started
// and crashed and never became log-servers. If enough nodes in this state were added to
// nodesets, we might never be able to seal those loglets unless we actually start those
// nodes.
// The origin of allowing those nodes to be in new nodesets came from the need to
// swap log-servers with new ones on rolling upgrades (N5 starts up to replace N1 for instance).
// This requires that we drain N1 (marking it read-only) and start up N5. New nodesets
// after this point should not include N1 and will include N5. For now, I prefer to
// constraint this until we have the full story on how those operations will be
// coordinated.
//
// StorageState::Provisioning if config.has_role(Role::LogServer) => true,
// explicit match to make it clear that we are excluding nodes with the following states,
// any new states added will force the compiler to fail
StorageState::Provisioning
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ use restate_types::replicated_loglet::{LogNodeSetExt, ReplicatedLogletParams};
use super::util::Disposition;
use crate::loglet::util::TailOffsetWatch;
use crate::providers::replicated_loglet::log_server_manager::RemoteLogServerManager;
use crate::providers::replicated_loglet::replication::{FMajorityResult, NodeSetChecker};
use crate::providers::replicated_loglet::replication::NodeSetChecker;
use crate::providers::replicated_loglet::tasks::util::RunOnSingleNode;

/// Attempts to detect if the loglet has been sealed or if there is a seal in progress by
Expand Down Expand Up @@ -163,7 +163,7 @@ impl CheckSealTask {
// loglet sealed even if some nodes lost data. The check-seal wouldn't determine the
// known_global_tail, only whether the loglet is sealed or not and it's safe to do so.
// non-authoritative nodes cannot accept normal writes, only repair writes.
if nodeset_checker.check_fmajority(|attr| *attr) >= FMajorityResult::BestEffort {
if nodeset_checker.check_fmajority(|attr| *attr).passed() {
// no need to detach, we don't need responses from the rest of the nodes
return Ok(CheckSealOutcome::FullySealed);
}
Expand Down Expand Up @@ -191,7 +191,7 @@ impl CheckSealTask {
return Ok(CheckSealOutcome::Sealing);
}

if nodeset_checker.check_fmajority(|attr| !(*attr)) >= FMajorityResult::BestEffort {
if nodeset_checker.check_fmajority(|attr| !(*attr)).passed() {
return Ok(CheckSealOutcome::Open);
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -136,7 +136,6 @@ impl SealTask {
nodeset_checker.set_attribute(node_id, true);
nodeset_status.insert(node_id, NodeSealStatus::Sealed);

// todo: consider allowing the seal to pass at best-effort f-majority.
if nodeset_checker.check_fmajority(|attr| *attr).passed() {
info!(
loglet_id = %my_params.loglet_id,
Expand Down
21 changes: 12 additions & 9 deletions crates/types/src/nodes_config.rs
Original file line number Diff line number Diff line change
Expand Up @@ -314,25 +314,28 @@ impl Versioned for NodesConfiguration {
#[strum(serialize_all = "kebab-case")]
pub enum StorageState {
/// The node is not expected to be a member in any write set and the node will self-provision
/// its log-store to `Disabled` once it's written its own storage marker on disk.
/// its log-store to `ReadWrite` once it's written its own storage marker on disk.
///
/// The node can never transition back to `Provisioning` once it has transitioned into
/// `Disabled`.
/// The node can never transition back to `Provisioning` once it has transitioned out of it.
///
/// should read from: no
/// can write to: no
#[default]
Provisioning,
/// Node's storage is not expected to be accessed in reads nor write. The node is not
/// considered as part of the replicated log cluster (yet). Node can be safely decommissioned.
/// considered as part of the replicated log cluster. Node can be safely decommissioned.
///
/// The node can never transition out of `Disabled` after it has transitioned into it.
///
/// should read from: no
/// can write to: no
Disabled,
/// Node is not picked in new write sets and it'll reject new writes on its own storage except for
/// critical metadata updates.
/// Node is not picked in new write sets, but it may still accept writes on existing nodeset
/// and it's included in critical metadata updates (seal, release, etc.)
/// should read from: yes
/// can write to: no
/// can write to: yes
/// **should write to: no**
/// **excluded from new nodesets**
ReadOnly,
/// Can be picked up in new write sets and accepts writes in existing write sets.
///
Expand Down Expand Up @@ -373,9 +376,9 @@ impl StorageState {
}
}

/// Empty nodes are automatically excluded from node sets.
/// Empty nodes are considered not part of the nodeset and they'll never join back.
pub fn empty(&self) -> bool {
matches!(self, StorageState::Provisioning | StorageState::Disabled)
matches!(self, StorageState::Disabled)
}
}

Expand Down

0 comments on commit b48e6fa

Please sign in to comment.