Implement reference Unstructured store API to upload TaskRun logs GCS store #107
Comments
@tejal29 this is a stretch goal for the milestone, so I'm going to remove it from the required milestone tasks.
This PR implements a simple TaskRun controller that creates a knative/build Build and updates the TaskRun status to reflect the Build status. We delegate to the knative/build controller to do the work of actually fulfilling the Build itself, meaning we have a hard dependency on knative/build. The integration test doesn't actually assert on the logs output by the build step because the pods disappear immediately after completion, so we need a better solution here (e.g. writing to a PVC in the test); in the long run we need to implement better log support (#107). Remaining work for #59 is to improve unit test coverage and add some docs on running + debugging.
After feedback from the Build working group, we decided to go with the following approach.
This will be an HTTPS service running in our cluster or in any other cluster that your cluster can access. The Sink
The reason, we have
E.g.: I have implemented a GCSSink and installed it in my cluster. It is running at "104.198.205.71:8080".
I can define two GCS Sinks that point to two buckets, "cluster1" and "cluster2".
Along with that, the Sink interface needs to handle 4 URL requests: "upload/taskruns/", "download/taskruns/", "upload/pipelineruns/", "download/pipelineruns/".
The Task Reconciler will now make an HTTP request to this.
The design question here is how to define a Sink for a Pipeline or a TaskRun.
Should the Sink be defined per Pipeline or per Task?
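The four routes described above could be sketched roughly as follows. This is only an illustration, not the proposed implementation: the names `route`, `parseRoute`, and `memorySink` are hypothetical, an in-memory map stands in for GCS, and the fourth route is assumed to be "download/pipelineruns/".

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// route is one parsed sink request. All names here are hypothetical.
type route struct {
	Action string // "upload" or "download"
	Kind   string // "taskruns" or "pipelineruns"
	ID     string // run identifier taken from the URL
}

// parseRoute splits a path such as "/upload/taskruns/my-run" into its parts.
func parseRoute(path string) (route, error) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	if len(parts) != 3 {
		return route{}, fmt.Errorf("expected /<action>/<kind>/<id>, got %q", path)
	}
	r := route{Action: parts[0], Kind: parts[1], ID: parts[2]}
	if r.Action != "upload" && r.Action != "download" {
		return route{}, fmt.Errorf("unknown action %q", r.Action)
	}
	if r.Kind != "taskruns" && r.Kind != "pipelineruns" {
		return route{}, fmt.Errorf("unknown kind %q", r.Kind)
	}
	return r, nil
}

// memorySink stands in for a real store such as GCS; it keeps logs in memory.
type memorySink struct {
	mu   sync.Mutex
	logs map[string][]byte
}

// ServeHTTP stores the request body on upload and returns it on download.
func (s *memorySink) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	r, err := parseRoute(req.URL.Path)
	if err != nil {
		http.Error(w, err.Error(), http.StatusNotFound)
		return
	}
	key := r.Kind + "/" + r.ID
	switch r.Action {
	case "upload":
		body, err := io.ReadAll(req.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		s.mu.Lock()
		s.logs[key] = body
		s.mu.Unlock()
		w.WriteHeader(http.StatusCreated)
	case "download":
		s.mu.Lock()
		body, ok := s.logs[key]
		s.mu.Unlock()
		if !ok {
			http.NotFound(w, req)
			return
		}
		w.Write(body)
	}
}

func main() {
	sink := &memorySink{logs: map[string][]byte{}}

	// Upload a log for a TaskRun, then read it back through the download route.
	up := httptest.NewRequest(http.MethodPost, "/upload/taskruns/run-1",
		strings.NewReader("hello from step 1\n"))
	sink.ServeHTTP(httptest.NewRecorder(), up)

	down := httptest.NewRecorder()
	sink.ServeHTTP(down, httptest.NewRequest(http.MethodGet, "/download/taskruns/run-1", nil))
	fmt.Print(down.Body.String())
}
```

A real sink would replace the map with calls to the backing store and would be served with `http.ListenAndServe`; the in-process requests in `main` are only there to show the round trip.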
/cc @imjasonh and @bobcatfish and @aaron-prindle does this all make sense?
Nice! These are some initial thoughts/questions:
Yes, you would need something like a credentials file added to the sink.GCS definition, and then pass that along?
Not sure what would happen if we provide a default path. If we have taskruns with the same id running in separate clusters, they might end up writing to the same path. Maybe we could add some validation to make sure the path is always specified.
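As a rough illustration of that validation idea, a sink could refuse to build an object path unless a path prefix was given. The function name `logObjectPath` and the `<prefix>/taskruns/<id>` layout are assumptions made for this sketch, not part of the proposal.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// logObjectPath joins a user-supplied prefix with the run ID. It rejects an
// empty prefix so that two clusters with the same TaskRun ID cannot silently
// write to the same default location. The layout is hypothetical.
func logObjectPath(prefix, runID string) (string, error) {
	prefix = strings.Trim(prefix, "/")
	if prefix == "" {
		return "", errors.New("sink path must be specified")
	}
	if runID == "" {
		return "", errors.New("run ID must be specified")
	}
	return path.Join(prefix, "taskruns", runID), nil
}

func main() {
	// The same run ID in two clusters maps to two distinct objects.
	for _, prefix := range []string{"cluster1", "cluster2", ""} {
		p, err := logObjectPath(prefix, "build-123")
		fmt.Printf("prefix=%q path=%q err=%v\n", prefix, p, err)
	}
}
```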
Ahh! For GCS I was thinking
The endpoint will actually be the GCS Sink implementation HTTP service, like "10.x.x.x:8080", which will have all the code to upload and download content from GCS.
When a user kicks off a run, they will provide an endpoint to upload logs to (initial implementation will be in #107). The corresponding fields in `status` will indicate where the logs actually got uploaded to. Once we actually get to #107, and especially once we start supporting endpoints other than GCS, we may find this isn't useful and remove it. Fixes tektoncd#146
We noticed early on that logs from init containers are often cleaned up immediately by k8s, particularly if the containers are short running (e.g. just echoing "hello world"). We started down a path to correct that, which takes an approach based on Prow's entrypoint solution (https://github.com/kubernetes/test-infra/tree/master/prow/cmd/entrypoint) (even using the same image at the moment!) which wraps the user's provided command and streams logs to a volume, from which the logs can be uploaded/streamed by a sidecar.

Since we are using init containers for step execution, we can't yet use sidecars, but we are addressing that in tektoncd#224 (also an entrypoint re-writing based solution). Once we have that, we can add sidecar support, starting with GCS as a POC (#107) and moving into other types.

In the meantime, to enable us to get logs (particularly in tests), we had the taskrun controller create a PVC on the fly to hold logs. This has two problems:

* The PVCs are not cleaned up, so this is an unexpected side effect for users
* Combined with PVC based input + output linking, this causes scheduling problems for the resulting pods (tektoncd#375)

Now that we want to have an official release, this would be a bad state to release in, so we will remove this magical log PVC creation logic, which was never our intended end state anyway.

Since we _do_ need the entrypoint rewriting and log interception logic in the long run, this commit leaves most functionality intact, removing only the PVC creation and changing the volume being used to an `emptyDir`, which is what we will likely use for #107 (and this is how Prow handles this as well). This means the released functionality will be streaming logs to a location where nothing can read them, however I think it is better than completely removing the functionality b/c:

1. We need the functionality in the long run
2. Users should be prepared for this functionality (e.g. dealing with edge cases around the taskrun controller being able to fetch an image's entrypoint)

Fixes tektoncd#387
In tektoncd#549 @hrishin pointed out that it's hard to understand from the step status exactly which step did what. While looking at this I realized that we have included a field `logsURL` which we never populate - I thought this was copied over from Build but it was actually from our original prototype API and we have never used it. In #107 we should be revisiting making logs available and we may add in something like this, but since we're not using it and it's not clear if we ever will, let's remove it for now.
As @cmoulliard pointed out, it's not obvious how to get to the logs for a PipelineRun or a TaskRun. If you know how the underlying kubernetes resources work you can figure it out but it can be hard to know where to start. Plus, folks may not realize that we are working on better ways of accessing logs. And once we work on #107 we can build up these docs with more detail about how to upload logs too. Fixes #898
I'm closing this issue out as we have now circulated a design doc for logging in Tekton and the utility of information retained in this issue is limited due to its age. I've opened #1155 to encompass the work of validating and implementing the proposed design and encourage anyone looking to get involved on this topic to add commentary, use cases and counterpoints to the design doc or GitHub issue linked above. Cheers!
In #107 and related issues we decided to let tools dedicated to this (e.g. fluentd) take care of it!
Expected Behavior
The Pipeline TaskRun logs should be uploaded to an endpoint and available to download later.
In our initial reference implementation we should support uploading to GCS. In the long run we should support other kinds of stores, and provide a default that does not require GCS.
Actual Behavior
As of #167 the logs will be streamed to a PVC. This volume will continue to exist after the TaskRun has completed. Once this task is done, that PVC should no longer be needed.
(This functionality was removed in #443)
Since we moved from init containers to containers for steps in #564, logs are available via the pod logs through kube; however, there are still only limited guarantees about how long the logs will be available for.
Steps to Reproduce the Problem
Additional Info