-
Notifications
You must be signed in to change notification settings - Fork 159
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for adding this! 👍
Status BuildStatus `json:"status"` | ||
Spec BuildSpec `json:"spec"` | ||
Status BuildStatus `json:"status"` | ||
Timeout string `json:"timeout,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll want Timeout
to exist inside of BuildSpec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. 👍
pkg/builder/cluster/builder.go
Outdated
@@ -69,6 +69,14 @@ func (op *operation) Checkpoint(status *v1alpha1.BuildStatus) error { | |||
return nil | |||
} | |||
|
|||
func (op *operation) Terminate() error { | |||
err := op.builder.kubeclient.CoreV1().Pods(op.namespace).Delete(op.name, &metav1.DeleteOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: return op.builder.kubeclient.CoreV1().Pods(op.namespace).Delete(op.name, &metav1.DeleteOptions{})
if err := c.waitForOperationAsync(build, op); err != nil { | ||
return err | ||
|
||
// Check if build has timed out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will syncHandler
be called periodically? Or only when there are updates to the build object? If it's periodic, how often is it called?
I worry that if this code is only run when a build updates (i.e., the pod's status changes), then a build that specifies one hour-long sleep
step with a 10m timeout will not have any occasion to check for timeout until that step finishes, at which point the build has long exceeded its timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes syncHandler
is called periodically. I need to do some research to find how often its called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From logs it appears 30 sec is the sync interval
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we should document that somewhere, that we effectively only enforce timeouts to a ~30s resolution. Maybe we should also Watch
the pod, with a timeout? If we do that, do we need to worry about the controller job restarting while builds are ongoing, and losing state? (All questions for future work, not blocking this PR)
pkg/controller/build/controller.go
Outdated
//cleanup operation and update status | ||
timeoutMsg := fmt.Sprintf("Build %q failed to finish within %q", build.Name, build.Timeout) | ||
|
||
err = op.Terminate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In cases where the method only returns an err, I tend to prefer:
if err := op.Terminate(); err != nil {
c.logger.Error(...)
return err
}
- name: step1 | ||
image: ubuntu | ||
command: ["/bin/bash"] | ||
args: ["-c", "sleep 2000"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh okay, so this test indicates that syncHandler
must be called periodically, or at least not only when steps start/finish. Do we have any idea when or how often exactly syncHandler
is called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
30 seconds is the sync interval
metadata: | ||
name: test-with-low-timeout | ||
labels: | ||
expect: failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably another case where our integration tests can be improved, but it would be great if we could have expect: timeout
and have e2e-test.sh
assert not only that the test failed, but that it failed due to timeout specifically. If the build was changed to exit 1
the test would still pass. 😕
Anyway, not necessary for this PR, but something we should think about for future integration testing improvements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah failed
is very generic and asserting on specific error would be ideal. I added controller level test for timeout which triggers syncHandler
twice manually.
@@ -71,6 +71,10 @@ type BuildSpec struct { | |||
// More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/ | |||
// +optional | |||
NodeSelector map[string]string `json:"nodeSelector,omitempty"` | |||
|
|||
// Time after which the build times out. | |||
// Defaults to 10minutes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: 10 minutes
.
This should maybe also link to Go's ParseDuration
documentation to clarify the expected format: https://golang.org/pkg/time/#ParseDuration
Should we enforce a maximum timeout too? Today, GCB limits builds to 24h for our own implementation purposes, though I suppose there's no strong reason to carry that limitation over to Knative Build. OTOH, it's easier to remove a restriction than to add one later. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of maximum timeout. Let me incorporate this into validation
of build.
pkg/builder/interface.go
Outdated
@@ -70,6 +75,25 @@ func IsDone(status *v1alpha1.BuildStatus) bool { | |||
return false | |||
} | |||
|
|||
// IsTimedOut returns true if the build's status indicates that build is timedout. | |||
func IsTimeout(status *v1alpha1.BuildStatus, buildTimeout string) bool { | |||
defaultTimeout := "10m" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be simpler to have the default already stated in terms of a time.Duration
:
var timeout time.Duration
if buildTimeout == "" {
timeout = 10*time.Minute
} else {
timeout, err = time.ParseDuration(buildTimeout)
if err != nil {
return err
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I like this. Much simpler
/assign @pivotal-nader-ziada |
pkg/controller/build/controller.go
Outdated
Message: message, | ||
}} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you using the http package and error to indicate an timeout from an external server? Does the other status errors include an http error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Status error package includes client timeouts but in this case I wanted to throw custom timeout error with more detail about build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not actually sure what benefit using a StatusError
gives us, can we just return fmt.Errorf("Build timed out after %s", timeout)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I am good with that
pkg/builder/cluster/builder_test.go
Outdated
t.Errorf("Expected no error while terminating operation") | ||
} | ||
// Verify pod is not available | ||
_, err = podsclient.Get(op.Name(), metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This can be combined:
if _, err := podsclient.Get(...); err == nil {
t.Fatalf(...)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I seem to have an obsession with this pattern :P I will be more mindful of this going forward. Thanks for your patience
if err := c.waitForOperationAsync(build, op); err != nil { | ||
return err | ||
|
||
// Check if build has timed out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we should document that somewhere, that we effectively only enforce timeouts to a ~30s resolution. Maybe we should also Watch
the pod, with a timeout? If we do that, do we need to worry about the controller job restarting while builds are ongoing, and losing state? (All questions for future work, not blocking this PR)
pkg/controller/build/controller.go
Outdated
Message: message, | ||
}} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not actually sure what benefit using a StatusError
gives us, can we just return fmt.Errorf("Build timed out after %s", timeout)
?
The following is the coverage report on pkg/.
|
I will submit PR for docs.
I did consider the idea of |
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ImJasonH, shashwathi The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* Introduce build timeout * Update status for build objects * Update error msg format * Remove debug statements * Update to error msg * Address comments * Add test to check timeout error flow * Add more tests and simplify default timeout * Introduce maximum timeout * Fix typo * Address comments
* Introduce build timeout * Update status for build objects * Update error msg format * Remove debug statements * Update to error msg * Address comments * Add test to check timeout error flow * Add more tests and simplify default timeout * Introduce maximum timeout * Fix typo * Address comments
Fixes #254
Proposed Changes
timeout