-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce clustermap change listener to propagate remote replica addition/removal events #1368
Introduce clustermap change listener to propagate remote replica addition/removal events #1368
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1368 +/- ##
============================================
+ Coverage 72.08% 72.09% +<.01%
- Complexity 6792 6805 +13
============================================
Files 489 489
Lines 38530 38651 +121
Branches 4902 4920 +18
============================================
+ Hits 27775 27865 +90
- Misses 9417 9437 +20
- Partials 1338 1349 +11
Continue to review full report at Codecov.
|
@@ -153,6 +153,11 @@ public MetricRegistry getMetricRegistry() { | |||
return metricRegistry; | |||
} | |||
|
|||
@Override | |||
public void registerClusterMapListener(ClusterMapChangeListener clusterMapChangeListener) { | |||
// no op for static cluster map. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we dont really support, do you think throwing an exception might work here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, will make the change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this isnt addressed yet.
ambry-replication/src/main/java/com.github.ambry.replication/ReplicationManager.java
Outdated
Show resolved
Hide resolved
ambry-replication/src/main/java/com.github.ambry.replication/ReplicationManager.java
Outdated
Show resolved
Hide resolved
ambry-replication/src/main/java/com.github.ambry.replication/ReplicationManager.java
Outdated
Show resolved
Hide resolved
@Override | ||
public void onReplicaAddedOrRemoved(List<ReplicaId> addedReplicas, List<ReplicaId> removedReplicas) { | ||
// 1. wait for start() to complete | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it looks like we have some duplicate code between this method and addReplica/removeReplica methods. Can we try to refactor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addReplica()
, removeReplica()
are methods for changing replicas on local node. They create a new PartitionInfo along with all peer RemoteReplicaInfos. onReplicaAddedOrRemoved()
basically modifies existing PartitionInfo by adding/removing certain RemoteReplicaInfo. Refactoring will be considered in a separate PR after we have verified functionality of this change.
* concurrently update remote replica infos. | ||
*/ | ||
@Override | ||
public void onReplicaAddedOrRemoved(List<ReplicaId> addedReplicas, List<ReplicaId> removedReplicas) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest adding a private class to serve as the implementation of ClusterMapChangeListener, just like PartitionStateChangeListener.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any benefits we can get by doing so? One probably reason I come up with is: when unifying all existing replication managers into a general replication manager, it requires 2 clustermap change listeners taking actions against changes in VCR and regular Ambry cluster. Feel free to add more thoughts here.
* @return {@code true} if remote replica info is added. {@code false} if it is already present | ||
*/ | ||
boolean addReplicaInfoIfAbsent(RemoteReplicaInfo remoteReplicaInfo) { | ||
lock.lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please lock getRemoveReplicaInfos method as well, this lock is protecting remoteReplicas list, we should lock it everywhere we use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense, i will use read-write lock instead
if (partitionInfo.addReplicaInfoIfAbsent(remoteReplicaInfo)) { | ||
replicationMetrics.addMetricsForRemoteReplicaInfo(remoteReplicaInfo); | ||
} | ||
replicaInfosToAdd.add(remoteReplicaInfo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should only add remoteReplicaInfo to the "toAdd" list when it was successfully added to partitionInfo, so we should move this statement in the if-else statement above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, will fix this.
for (ReplicaId remoteReplica : removedPeerReplicas) { | ||
PartitionInfo partitionInfo = partitionToPartitionInfo.get(remoteReplica.getPartitionId()); | ||
List<RemoteReplicaInfo> remoteReplicaInfos = new ArrayList<>(partitionInfo.getRemoteReplicaInfos()); | ||
for (RemoteReplicaInfo remoteReplicaInfo : remoteReplicaInfos) { | ||
if (remoteReplicaInfo.getReplicaId().getDataNodeId() == remoteReplica.getDataNodeId()) { | ||
logger.info("Removing remote replica {} on {} from replica threads.", remoteReplica.getReplicaPath(), | ||
remoteReplica.getDataNodeId()); | ||
replicaInfosToRemove.add(remoteReplicaInfo); | ||
if (partitionInfo.removeRelicaInfoIfPresent(remoteReplicaInfo)) { | ||
replicationMetrics.removeMetricsForRemoteReplicaInfo(remoteReplicaInfo); | ||
} | ||
break; | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this? we can just change the PartitionInfo's remove method to take a replica id as parameter, and looping the list in PartitionInfo, it's more aligned with addReplicaInfo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, will change removeRelicaInfoIfPresent
if (remoteReplicaInfo.getReplicaId().getDataNodeId() == remoteReplica.getDataNodeId()) { | ||
logger.info("Removing remote replica {} on {} from replica threads.", remoteReplica.getReplicaPath(), | ||
remoteReplica.getDataNodeId()); | ||
replicaInfosToRemove.add(remoteReplicaInfo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we should put this statement within if-else statement as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix
events With "move replica", Helix cluster manager should be able to dynamically add/remove remote replica infos and therefore clustermap will be dynamically changed. External components like ReplicationManager needs to update thread assignment and some in-mem data structures in repsonse to such changes. This PR introduces clustermap change listener which allows ReplicationManager to listen to any change in clustermap and take actions accordingly. For now, there is only one method in clustermap change listener: onReplicaAddedOrRemoved(), we could add more methods like onReplicaLeadershipChange etc if needed.
b12d085
to
fa026e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me overall.
With "move replica", Helix cluster manager should be able to dynamically add/remove remote replica infos and therefore clustermap will be dynamically changed. External components like ReplicationManager needs to update thread assignment and some in-mem data structures in repsonse to such changes. This PR introduces clustermap change listener which allows ReplicationManager to listen to any change in
clustermap and take actions accordingly. For now, there is only one method in clustermap
change listener: onReplicaAddedOrRemoved(), we could add more methods like onReplicaLeadershipChange etc if needed.
Remote replica addition/removal in clustermap is currently supported by dynamic cluster change handler only.( #1355 )