Add initial lifecycle and hooks into kubelet. #1028
Conversation
// Lifecycle is an interface implemented by things that handle lifecycle events
type Lifecycle interface {
    // OnStart is called when a container is started. The management system blocks until it returns
Before or after container start?
after. (updated)
comments (mostly) addressed, ptal. thanks!
type Lifecycle interface {
    // OnStart is called after a container is started. The management system blocks until it returns
    OnStart(containerID string) (LifecycleOutput, error)
    // OnStop is called before a container is stopped. The management system blocks until it returns
I'd suggest the names PostStart and PreStop. Also, I'm concerned about blocking indefinitely. Should it have a timeout?
Names changed. I would like to leave them as blocking indefinitely. I think that these need to be true lifecycle events, not notifications. The call needs to return before management can proceed.
(I agree that practically we probably need a timeout, or other guarantees to "kill things that are not responding" but I'd like to start out of the gate with the right model, and then add the safety valve later)
I don't want development of lifecycle to proceed assuming that communication is one way from kubelet to container.
If you feel strongly about the safety side of things, I'll spend the time to expand this into something with timeouts, but I'd prefer to do that in a different PR.
(Seems to be missing a push.) I'm not sure I buy that indefinite wait is the right model. @thockin may have thoughts on this?
My thoughts about the wait are these:
Management should wait until some timeout. Once the timeout is reached, the management system should unequivocally kill the container as "not responding", the same way OS X or Android do.
I can implement that behavior in this PR, but I'd rather do it in a follow-up.
My suggestion is to add a timeout to the function calls, with the expectation that after that time the functions will return with an error. I'm being picky about this because you really need to pass the timeout down into the script you're calling as a parameter. I don't care if you implement it in another PR, but I think the timeout really deserves to make it into the interface for the first cut, because it's very important.
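A minimal sketch of what that could look like, assuming hypothetical signatures (nothing below is in the PR as written): the timeout is both enforced by the kubelet and handed to the hook, so the script knows its budget.

package kubelet

import (
    "fmt"
    "time"
)

// LifecycleOutput is the output from a lifecycle event (as in the PR).
type LifecycleOutput struct {
    Details string
}

// Hypothetical variant of the interface with the timeout made explicit,
// so implementations can pass it down to the script as a parameter.
type Lifecycle interface {
    // PostStart is called after a container starts; the kubelet waits at
    // most timeout before treating the container as not responding.
    PostStart(containerID string, timeout time.Duration) (LifecycleOutput, error)
    // PreStop is called before a container stops, with the same contract.
    PreStop(containerID string, timeout time.Duration) (LifecycleOutput, error)
}

// runWithTimeout is a sketch of caller-side enforcement: run the hook in a
// goroutine and give up once the budget is spent, after which the caller
// would kill the container as "not responding".
func runWithTimeout(hook func() (LifecycleOutput, error), timeout time.Duration) (LifecycleOutput, error) {
    type result struct {
        out LifecycleOutput
        err error
    }
    ch := make(chan result, 1)
    go func() {
        out, err := hook()
        ch <- result{out, err}
    }()
    select {
    case r := <-ch:
        return r.out, r.err
    case <-time.After(timeout):
        return LifecycleOutput{}, fmt.Errorf("lifecycle hook timed out after %v", timeout)
    }
}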
People internally use shutdown notification to do important cleanup, I think fairly commonly. We have a configurable time limit. I'm not saying you should make it configurable with this change, but I think it's very important that the scripts know what their time limit is.
I guess my point is that I think that model is broken. The management system shouldn't give you a quota of time and hope you can deal with it. It should stick around for as long as it possibly can, and then treat your container as broken (similar to failing health checks) if you fail to respond within the maximum time it can allot to lifecycle events. It seems subtle, but I think it's a really important difference.
In one model, the container and management system are partners; in the other, the management system is in charge and simply notifies containers "this is the way it is".
I too am being detailed here, because I think it's really important. The "notification" style approach leads to a style of lifecycle that is much more advisory than contract-driven, and that's precisely what I'm trying to avoid.
--brendan
I think the management system has to give you a quota of time (and if you can't deal with it, you get killed unceremoniously). Otherwise rogue tasks can eat up all our resources. And if there is a quota, it's imperative that we tell the user what it is, so they know what they reasonably have a chance of running. Otherwise you can get nondeterministic behavior with folks' scripts sometimes completing and sometimes not, with no obvious fix.
There's no question about who is in the driver's seat here. Timeout makes it very clear to the task what is about to happen to it, to let it best use its remaining cycles.
I agree with a lot of what @lavalamp says; however, I think there is certainly room to do something different from what we do in the internal system that @brendandburns dislikes.
We can certainly make a system that lets some pods take a whole day, or more, to respond to a lifecycle event. But if we have to assume that all pods might take a day, or whatever time limit you choose, it severely limits our ability to deliver both high utilization and rapid pod startup.
One option is to make the pod creator voluntarily set a time limit on the command, visible in the pod API object; then, later, we can expose the costs of picking a long limit to users.
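As a sketch of that option (the type and field names here are hypothetical, not part of this PR), the container spec could carry a user-chosen limit that the kubelet reads when running the hook:

// Hypothetical API addition: the pod author declares up front how long
// their lifecycle handling may take, so the system can account for it.
type Container struct {
    // ... existing fields elided ...

    // LifecycleGracePeriodSeconds is how long the kubelet will wait for a
    // lifecycle hook before killing the container. Zero means "use the
    // system default".
    LifecycleGracePeriodSeconds int64 `json:"lifecycleGracePeriodSeconds,omitempty" yaml:"lifecycleGracePeriodSeconds,omitempty"`
}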
There will need to be a timeout. As Brendan pointed out, even systems like Android have timeouts.
In practice, different applications need different timeout values. Data processing workers with fine-grained shards may be able to complete their work and terminate in seconds. A stateless, load-balanced server may need to wait two readiness-probe cycles to ensure the load balancer has redirected traffic. A database with an asynchronously updated backup may need a couple of minutes to fully sync its state. A sharded service may need tens of minutes to spin up a replacement. A storage server may need hours to offload its data.
One could allow the timeout to be directly configurable. However, this could create priority inversion in a system that supports multiple levels of quality of service. Observe in the examples above that the amount of time required to drain correlates strongly with possible quality-of-service levels. I recommend setting the timeout according to availability QoS level.
HOWEVER, users want different timeout values in different situations. For instance, they won't want to wait an hour if they just want to take down their service. We'll need all user-initiated operations that effectively stop containers to accept timeout values, which should override the default (assuming they're smaller than the default).
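A tiny sketch of that reconciliation (names hypothetical, not from the PR): the default comes from the pod's declared limit or QoS level, and a user-supplied value can only shorten it, never extend it.

package kubelet

import "time"

// effectiveTimeout picks the grace period actually enforced for a stop.
// A userOverride of zero or less means the user supplied no override; an
// override can shrink the default but never extend it.
func effectiveTimeout(podDefault, userOverride time.Duration) time.Duration {
    if userOverride > 0 && userOverride < podDefault {
        return userOverride
    }
    return podDefault
}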
comments (mostly) addressed. ptal thanks
now actually pushed the commit... sigh
@@ -320,12 +324,24 @@ func (kl *Kubelet) runContainer(pod *Pod, container *api.Container, podVolumes v
        Binds: binds,
        NetworkMode: netMode,
    })
    if kl.lifecycle != nil {
        if output, err := kl.lifecycle.PostStart(dockerContainer.ID); err != nil {
This is a different "err" than the one used to return the result of StartContainer? If so, that's confusing.
OTOH, it seems like we should convey failure of a hook to the user somehow. What use cases do we imagine for PostStart hooks? Registration with external discovery mechanisms (e.g., worker registration)? If this were a PreStart hook, which might be used to initialize data in the container, I'd abort starting the container on hook failure, or at least provide the option to do that. (Which I suppose implies we'd need the user to request the hook in the API.)
Have you thought about a mechanism to return the output to the user? Probably we need an event stream they can listen on. In this case, it would be nice to be able to get structured data back, so we could grow this into a real API. Could we require the output to be YAML?
Similarly, I'd like to pass structured information to the hook, so that we can grow it into a real API. In other threads, we've discussed environment variables, but YAML seems better.
Yes, err is shadowed. I assumed that was intentional?
There's chatter about making our api not take YAML; I'd prefer to be consistent and use JSON for structured info.
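To make the shadowing concrete, here is a sketch of the hunk above with the hook's error given its own name, and the output decoded as JSON per the preference just stated. The use of glog and the JSON requirement are assumptions for illustration, not part of the PR.

// Sketch only; assumes: import ( "encoding/json"; "github.com/golang/glog" )
if kl.lifecycle != nil {
    // hookErr is deliberately distinct from the err returned by StartContainer.
    output, hookErr := kl.lifecycle.PostStart(dockerContainer.ID)
    if hookErr != nil {
        glog.Errorf("PostStart hook failed for %s: %v", dockerContainer.ID, hookErr)
    } else {
        // If hook output were required to be JSON, it could be decoded into
        // structured data and grown into a real API later.
        var details map[string]interface{}
        if jsonErr := json.Unmarshal([]byte(output.Details), &details); jsonErr == nil {
            glog.Infof("PostStart hook output for %s: %v", dockerContainer.ID, details)
        }
    }
}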
I was going to suggest we have a unifying proposal for info that travels from the kubelet to the cluster, which includes start/stop events as well as results of lifecycle hooks, once Dawn's restart work was up.
@smarterclayton I think "kubelet -> pod" info path is a totally different story from the "kubelet -> rest of cell" path.
@lavalamp I was referring to hook output back to the user, I was assuming that as a client if I want to see hook results I'd need to be able to identify the pod that specified it, but that I probably want hook info to be historically available (it's more like log results than snapshot info). So I was assuming that this lived somewhere between the "this died and got restarted" restart info of Dawn's item and the "how do I store, ship, aggregate, and search/view logs for a particular pod" work. Didn't seem like it fits on either extreme.
@smarterclayton yeah, kubelet needs to log its actions per pod and be able to regurgitate that on request. I think that's larger than this PR though.
For sure - I think that's #285 at least in part
I'm going to try not to get stuck in the code, and instead talk about the design. I think this is one of those things that would really benefit from a design doc that gets committed, and that the discussion would be better on a doc than on the code. I'm 100% in favor of the idea, but this feels inside-out to me. Maybe my mental model is over-complex, but I'm going to write it down anyway. I expected to see something like the API expanding:
Specifically, that the user specifies the events they want delivered and how to deliver them (exec, signal, HTTP GET/POST, etc.). Hardcoding the exec paths makes this very specific to k8s; I think we want to push this as a semi-standard way of doing lifecycle, even on existing containers.
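A rough sketch of the API shape Tim describes (all type and field names hypothetical, for discussion only): the user names the event and picks the delivery mechanism in the API object.

// A Handler describes how to deliver a lifecycle event; exactly one of
// its fields should be set.
type ExecAction struct {
    // Command is run inside the container.
    Command []string `json:"command,omitempty" yaml:"command,omitempty"`
}

type HTTPGetAction struct {
    Path string `json:"path,omitempty" yaml:"path,omitempty"`
    Port int    `json:"port,omitempty" yaml:"port,omitempty"`
}

type Handler struct {
    Exec    *ExecAction    `json:"exec,omitempty" yaml:"exec,omitempty"`
    HTTPGet *HTTPGetAction `json:"httpGet,omitempty" yaml:"httpGet,omitempty"`
}

// Lifecycle on the container spec names which events the user wants
// delivered; absent fields mean no hook.
type Lifecycle struct {
    PostStart *Handler `json:"postStart,omitempty" yaml:"postStart,omitempty"`
    PreStop   *Handler `json:"preStop,omitempty" yaml:"preStop,omitempty"`
}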
+1 to sticking this in the api.
Ref. #140, just to connect the PR to the issue.
Ok, I'm trying this the other way. I opened a PR for the API types: #1041. I'll close this one for now, and we can resume discussion there.