This repository has been archived by the owner on Jun 28, 2023. It is now read-only.

Improve visibility into management and workload cluster creation #2730

Closed
stmcginnis opened this issue Dec 15, 2021 · 9 comments · Fixed by #2769
Labels
kind/feature A request for a new feature proposal/acccepted Change is accepted

Comments

@stmcginnis
Contributor

stmcginnis commented Dec 15, 2021

Abstract

Our current bootstrapping process emits a lot of output, but it's very hard to follow what is happening. Even if you understand the overall process of deployment, parsing the output can be confusing, and it's not very clear where in the deployment process we are.

When deployment takes a while, it's hard to tell whether things are locked up or whether something is happening under the covers that we are waiting on.

When the deployment does fail, it's not always clear what caused the failure. We only give generic troubleshooting steps that may not even be relevant to the failure the user is encountering.

Current Issues:

The current cluster bootstrap process is very noisy, with a lot of output that is not meaningful to the user and can be confusing:

image

There are a few problems with this output:

  1. There is no formatting (or an inconsistent mix of formatting)
  2. High-level steps are hard to distinguish from the low-level steps being taken
  3. It's hard to tell what is relevant to pay attention to, versus what can be ignored
  4. It's not clear which step is running or how far along in the process it is
  5. When things go wrong, it's not clear what caused the failure
  6. It's not clear what the user needs to do to resolve the problem
  7. It's not clear whether cleanup needs to be done before trying again
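
To make items 4 through 7 concrete, here is a minimal sketch of what more structured output could look like. It is illustrative Python only, not the project's implementation; the `StepFailure` type, the function names, and the output layout are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StepFailure:
    """Hypothetical record describing a failed bootstrap step."""
    step: str
    cause: str
    resolution: str
    cleanup: list = field(default_factory=list)


def format_step(index, total, name):
    """Show which high-level step is running and how far along we are."""
    return f"[{index}/{total}] {name}"


def format_failure(failure):
    """Report the cause, the suggested fix, and any cleanup needed before a retry."""
    lines = [
        f"FAILED: {failure.step}",
        f"  Cause:      {failure.cause}",
        f"  Resolution: {failure.resolution}",
    ]
    if failure.cleanup:
        lines.append("  Cleanup before retrying:")
        lines.extend(f"    - {item}" for item in failure.cleanup)
    return "\n".join(lines)
```

A caller would print `format_step(...)` as each high-level step starts and `format_failure(...)` once if a step fails, keeping low-level detail out of the default output.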

Issue Tracking

The following list tracks the issues that must be resolved to achieve this capability. Please see the next section to understand the larger proposal.

Tanzu Framework

Community Edition

  • TBD

Other

  • TBD

Proposal

@stmcginnis stmcginnis added triage/needs-triage Needs triage by TCE maintainers kind/feature A request for a new feature proposal/pending Capability has not yet been accepted by TCE project. Work should not start until accepted. labels Dec 15, 2021
@stmcginnis
Contributor Author

cc @garrying

@joshrosso joshrosso added proposal/needs-design-doc Design doc is required to move forward with decision and removed proposal/pending Capability has not yet been accepted by TCE project. Work should not start until accepted. labels Dec 16, 2021
@joshrosso
Contributor

Thanks for bootstrapping this @stmcginnis. Looking forward to the design doc. Here are some ideas that come to mind, worth considering:

  1. 💯 to your comment on capx-manager holding the key to failures. The logs from these would surely be too noisy to print at default verbosity; however, I think always writing those logs to a bootstrap log file, and telling the user during bootstrap where that file is, would be extremely high value.
    Creating management cluster in ${INFRASTRUCTURE_PROVIDER}
        View bootstrap logs at: ${HOME}/.config/tanzu/tkg/bootstrap-logs/${CLUSTER_NAME}.log
    1. At a higher verbosity, we just tail those logs to stdout as well.
  2. This proposal should break down the bootstrap visibility for management and workload clusters. I foresee these two looking quite different. In other words, I care about different things for each; for my workload cluster, I want to understand which TKR was selected, the CNI, etc., not too dissimilar from our new standalone model.
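
The first idea above (always write detailed logs to a file, and mirror them to the terminal only at higher verbosity) can be sketched roughly as follows. This is illustrative Python using the standard `logging` module, not the project's implementation; `setup_bootstrap_logging` and its parameters are hypothetical names, and the log directory is passed in rather than assumed.

```python
import logging
import sys
from pathlib import Path


def setup_bootstrap_logging(cluster_name, log_dir, verbosity=0, stream=None):
    """Hypothetical helper: log everything to a file, and show detailed
    (DEBUG) messages on the terminal only when verbosity is raised."""
    stream = stream if stream is not None else sys.stdout
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"{cluster_name}.log"

    logger = logging.getLogger(f"bootstrap.{cluster_name}")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()

    # The file handler always captures the full, noisy detail.
    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(file_handler)

    # The console shows high-level messages by default, everything when verbose.
    console = logging.StreamHandler(stream)
    console.setLevel(logging.DEBUG if verbosity > 0 else logging.INFO)
    console.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(console)

    return logger, log_path
```

After setup, the caller would log the returned `log_path` at INFO level (for example `logger.info("View bootstrap logs at: %s", log_path)`) so the user always sees where the full logs are, and route controller output through `logger.debug(...)`.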

@garrying
Contributor

This is great @stmcginnis! The high-level changes resonate with me. Happy to help start the design doc.

@joshrosso joshrosso removed the triage/needs-triage Needs triage by TCE maintainers label Jan 18, 2022
@joshrosso joshrosso added proposal/pending Capability has not yet been accepted by TCE project. Work should not start until accepted. and removed proposal/needs-design-doc Design doc is required to move forward with decision labels Jan 18, 2022
@joshrosso joshrosso changed the title Improve visibility in the cluster bootstrap process Improve visibility into management and workload cluster creation Jan 18, 2022
@joshrosso
Contributor

RFC open, initially targeting closure on 02/04/2022.

@DennisFaucher

What about something similar to what Linux distros use for installation? A simple screen with one line that describes the current activity and a progress speedometer for that activity. The activity can be expanded to show detail if needed. It does not need to be a GUI; it can be ASCII/curses/whatever-based.
fedora_progress

@stmcginnis
Contributor Author

Great idea @DennisFaucher. That's a slightly different approach from what is being proposed here, but I could see that as a great follow-on. If we make the updates proposed in the design doc, the UI could read those and update the output like the example you show. Then there could be a "Details" expander or something that would give the full output.

What do you think @miclettej ?

@joshrosso
Contributor

> Great idea @DennisFaucher. That's a slightly different approach from what is being proposed here, but I could see that as a great follow on. If we make the updates proposed in the design doc, the UI could read those and update the output like the example you show. Then there could be a "Details" expander or something that would give the full output.
>
> What do you think @miclettej ?

I agree that it's a great suggestion 🎉, but it is something we should consider for larger UI work; let's keep this proposal scoped to giving bootstrap log visibility.

For our future consideration, how does a progress UI like this differ from the kickstart UI?

image

@miclettej

To Josh's point, we have something like this in the UI but may need to adjust the granularity of steps or filtering of logs. What we know from customers is that they like to know what stage of the deployment they are on, and what is remaining to complete. The step progress on the left was added in response to customer feedback. We can adjust appearance or granularity of steps/messaging to the customer. I think it may be important to continue to show some indication of steps that are completed and not yet completed.
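
As a rough text-mode illustration of showing which steps are completed, in progress, and not yet started, here is a minimal Python sketch. It is not the UI's implementation; `render_steps`, the `Status` enum, and the marker characters are hypothetical choices.

```python
from enum import Enum


class Status(Enum):
    DONE = "done"
    RUNNING = "running"
    PENDING = "pending"


# Hypothetical markers for each state.
MARKERS = {Status.DONE: "[x]", Status.RUNNING: "[>]", Status.PENDING: "[ ]"}


def render_steps(steps, current):
    """Render a checklist of deployment steps.

    steps:   ordered list of step names
    current: index of the step now running; len(steps) means all are done
    """
    lines = []
    for i, name in enumerate(steps):
        if i < current:
            status = Status.DONE
        elif i == current:
            status = Status.RUNNING
        else:
            status = Status.PENDING
        lines.append(f"{MARKERS[status]} {name}")
    done = min(current, len(steps))
    lines.append(f"{done} of {len(steps)} steps complete")
    return "\n".join(lines)
```

The granularity question raised above then becomes a question of what goes into `steps`: a short list of high-level stages by default, with finer-grained entries available on request.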

@joshrosso joshrosso reopened this Mar 1, 2022
@joshrosso joshrosso added proposal/acccepted Change is accepted and removed proposal/pending Capability has not yet been accepted by TCE project. Work should not start until accepted. labels Mar 1, 2022
@stmcginnis stmcginnis removed their assignment Aug 4, 2022
@jdumars jdumars closed this as completed Oct 31, 2022
@DennisFaucher

So long team and thanks for all the fish. TCE was great.

6 participants