I have participated in the "update scheduler" discussion (see the linked comment in discussion 1491), and I'd like to give some suggestions from my point of view:
Thanks.
I want to break compatibility and keep our definitions simple; we should not be afraid of breaking backward compatibility. Create new …
I have another proposal: Keep the Unbind
It has been one year since Chaos Mesh was open-sourced. Its `controller-manager` keeps growing and is turning into a huge, hard-to-manage project 😢. The `twophase` controller epitomises the difficulty; I have submitted many PRs trying to make it simpler.

In this discussion, I will offer a self-criticism of a bad design, the `duration` and `scheduler` fields of the current controller and CRDs, and explain the historical reason behind it. Several options for overcoming these issues are provided as references (some keep backward compatibility, some break it).

Through this discussion, **I hope we can agree on one acceptable plan to make this better, and that the chosen plan can be implemented as soon as possible.**
The `duration` and `scheduler` fields in every CRD are a really bad design. They mix scheduling with the "chaos implementation" itself. With these fields in the CRD, we have to manage much more status in a single CRD. What's more, many "high-level" features were later built on top of them, like `pause` and `dynamic scheduler`.

Now, nearly every new feature of the controller (e.g. "injecting chaos to newly created chaos", "dynamically changing configuration") is blocked by the complexity of the `twophase` controller. We (or at least I) don't have enough confidence to write a bug-free implementation without breaking the `twophase` controller.

Though I have tried to make the `twophase` controller simpler, this attempt has ultimately failed 😿: it didn't improve things enough.

**Historical reason**
The first CRD of Chaos Mesh was `PodChaos`, and the first chaos of Chaos Mesh was `pod-kill`, which is a "job" rather than a "status". Only with `duration` and `scheduler`, which make the pods get killed periodically, does it become a status. All the other chaos types then followed this pattern without anyone considering whether it was really a good design.

**Solution**
All these solutions will make the CRDs more complicated. Just like `Deployment -> ReplicaSet -> Pod`, splitting the functions into multiple standalone resources is nothing to fear and could be a good practice. The only concern is how to explain the resulting behavior to users.

1. … `CronWorkflow` and remove these two fields from the CRD.
   - Pros:
   - Cons:
2. … `duration` and `scheduler` as a standalone resource, which will create common chaos to fulfill its implementation. See Cronjob (twophase scheduler) through standalone resource #1481.
   - Pros:
   - Cons: … `XXXChaos` created by the machine.
3. … `XXXChaos` … `RawChaos`, and the twophase controller is implemented by creating and monitoring the `RawChaos`.
   - Pros: … `RawChaos` CRD will bring us a unified interface for chaos, which could be really helpful for `plugin`.
   - Cons: … `RawChaos` and common chaos are nearly the same.

@Yiyiyimu @YangKeao @STRRL @Hexilee You have all contributed to the `twophase` controller or discussed its design with me before. Do you have any suggestions on this problem?