-
Notifications
You must be signed in to change notification settings - Fork 9.1k
YARN-10942. Move AbstractCSQueue fields to separate objects that are tracking usage #3430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1909a78
to
cd2e557
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
I don't get what the Spotbugs issue is about for the "VO_VOLATILE_INCREMENT" bug, as I'm updating the volatile field from synchroinzed methods only, which is enough to keep it thread-safe.
Also, the unit test failure is a known flaky: |
cd2e557
to
9f10332
Compare
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @szilard-nemeth for the patch, LGTM (non-binding).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think the patch is fine, there are a few changes I've suggested, please take a look at them, one of them will fix the false positive SpotBugs warning as well.
...rc/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
Outdated
Show resolved
Hide resolved
...pache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueueUsageTracker.java
Outdated
Show resolved
Hide resolved
...pache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueueUsageTracker.java
Outdated
Show resolved
Hide resolved
...pache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueueUsageTracker.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @szilard-nemeth. I had one comment, but its looking good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think making these methods synchronized is not necessary. This will make the whole object locked, but the queue already has its own fine-grained RWLock, which is used for synchronization. I think an all-in or nothing methodology is better here: either make the whole class thread-safe, or let the queue handle the synchronization. In my opinion, since its only used internally within the queue, the latter would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comment @9uapaw and @shuzirra about the locking.
Let me share my opinion.
Originally, numContainers was a volatile field of AbstractCSQueue and I just moved here.
The thing is, it may have not been necessarily a volatile field in the first place as the getter of it was invoked by these methods:
CapacitySchedulerLeafQueueInfo.CapacitySchedulerLeafQueueInfo(CapacityScheduler, LeafQueue) (org.apache.hadoop.yarn.server.resourcemanager.webapp.dao)
ClusterMetricsInfo.ClusterMetricsInfo(ResourceScheduler) (org.apache.hadoop.yarn.server.resourcemanager.webapp.dao)
LeafQueue.toString() (org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity)
ParentQueue.toString() (org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity)
The question of whether marking this field with the volatile keyword made sense really depends on if these methods are called from multiple threads or not.
Internally, AbstractCSQueue increased and decreased the value with the ++/-- operators and IntelliJ showed a warning already, as it's not an atomic operation to use these operators on a volatile field.
Volatile prevents all the threads from reading a non up to date value.
Anyway, what you described makes sense for me so that we should use the queue-level lock, it's already happening as calls of increaseNumContainers / decreaseNumContainers as the respective callers are allocateResource / releaseResource that are making sure of guarding the usageTracker operations with the writeLock so I admit it doesn't really make sense to mark the methods synchronized.
One possible caveat is that the increaseNumContainers / decreaseNumContainers methods are now protected, so it's not guaranteed at all that the only callee would be originated from AbstractCSQueue's methods.
Alternatively, the writeLock could be passed to the CSQueueUsageTracker but I wouldn't prefer doing that.
All in all, I think we should keep the volatile and keep the locks as is in AbstractCSQueue, but we will have the warning in IntelliJ as a consequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any issue with the synchronization, if this object needs to be thread safe, then we need to guarantee it on the object level, not hoping for other objects to use it in a thread safe only way.
However keeping the field az volatile AND omitting the synchronization is actually a mistake. This class does NOT provide a proper way of incrementing the field, which means if it is used in a non-thread safe environment (volatile suggests that, since volatile does not make any sense in a thread safe environment), then the increment might fail. Hence the spotbug warning. So I'd strongly suggest to keep the synchronized it's not much overhead if we the methods are already invoked in threadsafe way, but a good guard if they are not. Also I'd omit volatile since value is only accessible via a getter, so outer methods won't benefit from the keyword, and internally we don't really use the variables (only to increase / decrease it in a volatile incompatible way).
💔 -1 overall
This message was automatically generated. |
8b5fff5
to
b356405
Compare
💔 -1 overall
This message was automatically generated. |
b356405
to
85b09ac
Compare
💔 -1 overall
This message was automatically generated. |
I suppose we can not submit a change that will introduce a spotbug error. One workaround for this issue would be to use an AtomicInteger, that is intended to solve all the issue mentioned beforehand. |
🎊 +1 overall
This message was automatically generated. |
db66cc6
to
2a29869
Compare
🎊 +1 overall
This message was automatically generated. |
Description of PR
How was this patch tested?
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?