Fix[MQB]: use CSL to update state on QueueAssignmentAdvisory #584

Open
wants to merge 5 commits into
base: main
from

Conversation

emelialei88
Collaborator

Similar to PR #581, this PR invokes the CSL commit callback for QueueAssignmentAdvisories and for the queue assignment part of LeaderAdvisory, effectively making the CSL the source of truth.
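
To illustrate what "CSL as the source of truth" means here, the following is a minimal sketch, not the actual broker code. `ToyLedger`, `ToyClusterState`, and the simplified `QueueAssignmentAdvisory` struct are hypothetical stand-ins invented for this example. The point it shows is that cluster state is mutated only from the ledger's commit callback, never directly when an advisory is created.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the real advisory message.
struct QueueAssignmentAdvisory {
    std::string uri;
    int         partitionId;
};

// Hypothetical toy ledger: advisories are recorded first and only take
// effect when they are committed.
class ToyLedger {
  public:
    using CommitCb = std::function<void(const QueueAssignmentAdvisory&)>;

    explicit ToyLedger(CommitCb cb)
    : d_commitCb(std::move(cb))
    {
        assert(static_cast<bool>(d_commitCb));
    }

    // Record an advisory; cluster state is not touched yet.
    void apply(const QueueAssignmentAdvisory& advisory)
    {
        d_pending.push_back(advisory);
    }

    // Simulate reaching quorum: commit pending advisories in order.
    void commitAll()
    {
        for (const auto& advisory : d_pending) {
            d_commitCb(advisory);
        }
        d_pending.clear();
    }

  private:
    CommitCb                             d_commitCb;
    std::vector<QueueAssignmentAdvisory> d_pending;
};

// Hypothetical toy cluster state, updated only by the commit callback.
class ToyClusterState {
  public:
    void assignQueue(const QueueAssignmentAdvisory& advisory)
    {
        d_queues[advisory.uri] = advisory.partitionId;
    }

    bool isAssigned(const std::string& uri) const
    {
        return d_queues.count(uri) != 0;
    }

  private:
    std::map<std::string, int> d_queues;
};
```

In this sketch, a caller would construct `ToyLedger` with a lambda that forwards to `ToyClusterState::assignQueue`; a queue becomes visible in the state only after `commitAll()` runs.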

@emelialei88 emelialei88 force-pushed the csl-QAA branch 2 times, most recently from 852e459 to 8cdba6e Compare January 24, 2025 22:36
@emelialei88 emelialei88 marked this pull request as ready for review January 29, 2025 20:23
@emelialei88 emelialei88 requested a review from a team as a code owner January 29, 2025 20:23
@emelialei88 emelialei88 requested a review from kaikulimu January 29, 2025 20:23
@emelialei88 emelialei88 self-assigned this Jan 29, 2025
Collaborator

@kaikulimu kaikulimu left a comment

Some comments

"(Committed advisory).*queueAssignmentAdvisory", timeout
)
assert not member.outputs_regex(
"'QueueUnAssignmentAdvisory' will be applied to", timeout
Collaborator

Looks incorrect for two reasons:

  1. It should be QueueAssignmentAdvisory instead of QueueUnAssignmentAdvisory.
  2. This log line seems to be emitted at
    // Apply 'queueAssignmentAdvisory' to CSL
    BALL_LOG_INFO << clusterData->identity().description()
                  << ": 'QueueAssignmentAdvisory' will be applied to "
                  << " cluster state ledger: " << queueAdvisory;
    , but it should appear when we go through the CSL path.

<< ": Queue assigned: " << queueAdvisory;

// Broadcast 'queueAssignmentAdvisory' to all followers
clusterData->messageTransmitter().broadcastMessage(controlMsg);
Collaborator

Now, QueueAssignmentAdvisory should never be sent out. We should remove

    case MsgChoice::SELECTION_ID_QUEUE_ASSIGNMENT_ADVISORY: {
        dispatcher()->execute(
            bdlf::BindUtil::bind(
                &ClusterOrchestrator::processQueueAssignmentAdvisory,
                &d_clusterOrchestrator,
                message,
                source),
            this);
    } break;  // BREAK

and

    void ClusterOrchestrator::processQueueAssignmentAdvisory(

and

    virtual void
    processQueueAssignmentAdvisory(const bmqp_ctrlmsg::ControlMessage& message,
                                   mqbnet::ClusterNode* source,
                                   bool delayed = false) = 0;

Collaborator

Still, preserve enough code such that mqbblp::ClusterStateManager::processBufferedQueueAdvisories() and mqbblp::ClusterStateManager::processLeaderAdvisory() do not break.

emelialei88 and others added 4 commits February 6, 2025 13:50
Signed-off-by: Emelia Lei <wlei29@bloomberg.net>
…visory

Signed-off-by: Emelia Lei <wlei29@bloomberg.net>
Signed-off-by: Emelia Lei <wlei29@bloomberg.net>
* fix: always update CSL on QueueUpdateAdvisory

Signed-off-by: dorjesinpo <129227380+dorjesinpo@users.noreply.github.com>

* Updating IT

Signed-off-by: dorjesinpo <129227380+dorjesinpo@users.noreply.github.com>

---------

Signed-off-by: dorjesinpo <129227380+dorjesinpo@users.noreply.github.com>
@emelialei88 emelialei88 force-pushed the csl-QAA branch 2 times, most recently from 636cc18 to 022f710 Compare February 6, 2025 21:42
Signed-off-by: Emelia Lei <wlei29@bloomberg.net>
BALL_LOG_INFO << cluster->description()
<< ": Queue assigned: " << queueAdvisory;

// Broadcast 'queueAssignmentAdvisory' to all followers
Collaborator

Thinking about compatibility between broker versions, we should still broadcast this advisory for now, so that old brokers can still receive and process it.

Collaborator Author

@emelialei88 emelialei88 Feb 21, 2025

When the new leader broadcasts through applyRecordInternal, if the followers are on an old version, it seems that they can still receive this e_CLUSTER_TYPE event and process it correctly. Could you give an example of where there could be a mismatch?

Collaborator

What I meant is: old-version followers return immediately in the commit callback for e_CLUSTER_TYPE events, so they rely on the additionally broadcast 'queueAssignmentAdvisory' to process the assignment correctly. Thus, a new-version leader should keep sending this extra broadcast for now.
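
The compatibility argument above can be sketched with a toy model. Everything here is hypothetical and heavily simplified: `ToyFollower`, its `processesOnCommit` flag, and `leaderAssign` are names invented for illustration, and the only behavior modeled is the one described in this thread (old followers ignore the commit callback for this record type, new followers act on it).

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of a follower; the real brokers are far more involved.
class ToyFollower {
  public:
    // 'processesOnCommit' is true for a new-version follower and false
    // for an old-version one (an assumption made for this sketch).
    explicit ToyFollower(bool processesOnCommit)
    : d_processesOnCommit(processesOnCommit)
    , d_assignCount(0)
    {
    }

    // Called when a queue assignment record is committed in the ledger.
    void onCommit(const std::string& uri)
    {
        if (!d_processesOnCommit) {
            return;  // old version: returns immediately
        }
        assign(uri);
    }

    // Called when the separate advisory control message arrives.
    void onAdvisoryBroadcast(const std::string& uri)
    {
        if (d_processesOnCommit) {
            return;  // new version: already handled on commit
        }
        assign(uri);
    }

    bool isAssigned(const std::string& uri) const
    {
        return d_queues.count(uri) != 0;
    }

    int assignCount() const { return d_assignCount; }

  private:
    void assign(const std::string& uri)
    {
        d_queues.insert(uri);
        ++d_assignCount;
    }

    bool                  d_processesOnCommit;
    std::set<std::string> d_queues;
    int                   d_assignCount;
};

// A new-version leader in this model always commits through the ledger and
// optionally keeps the extra broadcast for the benefit of old followers.
inline void leaderAssign(ToyFollower&       follower,
                         const std::string& uri,
                         bool               alsoBroadcast)
{
    follower.onCommit(uri);
    if (alsoBroadcast) {
        follower.onAdvisoryBroadcast(uri);
    }
}
```

Under these assumptions, an old follower learns about the queue only when `alsoBroadcast` is true, while a new follower applies the assignment exactly once either way.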

@@ -3076,15 +3076,6 @@ void Cluster::processClusterControlMessage(
source),
this);
} break; // BREAK
case MsgChoice::SELECTION_ID_QUEUE_ASSIGNMENT_ADVISORY: {
Collaborator

Thinking about compatibility between different broker versions, we should still keep this case as a no-op, in case an old broker sends us this:

case MsgChoice::SELECTION_ID_QUEUE_ASSIGNMENT_ADVISORY: {
    // NO-OP
} break; // BREAK

Collaborator Author

@emelialei88 emelialei88 Feb 21, 2025

Yes, and on further thought, maybe we should keep this branch and not remove anything related to processQueueAssignmentAdvisory for now. Consider this: when the old leader calls assignQueue, it will broadcast SELECTION_ID_QUEUE_ASSIGNMENT_ADVISORY. If we delete this case for new followers, they won't do anything upon receiving the advisory, which is incorrect.

Collaborator

An old leader will also write the QAA to the CSL; old followers will return immediately, but new followers should process the QAA as part of the CSL commit callback. It should be safe to remove this branch.
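
The reasoning in this last reply can be sketched as follows. This is an illustrative toy, not the real dispatch code: `MsgKind`, `ToyNewFollower`, and its methods are hypothetical names, and the sketch assumes (as stated above) that an old leader both writes the assignment to the ledger and sends the separate advisory message.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical message kinds; the real control message has many more.
enum class MsgKind { QueueAssignmentAdvisory, Other };

// Hypothetical new-version follower that treats the ledger commit as the
// only path that mutates queue assignments.
class ToyNewFollower {
  public:
    // Ledger commit path.
    void onLedgerCommit(const std::string& uri) { d_queues.insert(uri); }

    // Control message path. Returns true if the message changed state.
    bool onControlMessage(MsgKind kind, const std::string& uri)
    {
        switch (kind) {
        case MsgKind::QueueAssignmentAdvisory: {
            // NO-OP: the same assignment arrives via onLedgerCommit().
            (void)uri;
            return false;
        }
        case MsgKind::Other: {
            return false;
        }
        }
        return false;
    }

    std::size_t numQueues() const { return d_queues.size(); }

  private:
    std::set<std::string> d_queues;
};
```

In this model, an old leader's assignment reaches the new follower twice (once per path), yet the state converges to a single assignment because only the commit path applies it.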

3 participants