Make dcp and dashboard communication reliable and robust #2422
Comments
Everything I have seen so far in terms of inability to start DCP/dashboard fell into one of 3 buckets:
These are the things that IMO have/will make a difference--hope this helps and happy to hear more thoughts on the subject. |
Seems like we also have issues around the dashboard connecting to the app host. I've seen 2 people with complaints like #2539 (@drewnoakes @JamesNK), so there might be work to do on the dashboard side of things as well. |
Our validation team also reported the issue in #2539 earlier this week but they said that they didn't open a new issue because they weren't able to get a stable repro. I'll ask them to get more logs the next time they see it. |
@balachir I plan to work on some of these issues in preview 5 so a reliable repro or more data would be great. I'll try to add more logs so we can see what might be happening when we get into this state. |
@danegsta will take an initial look here. |
Just saw this one:
PC went to sleep and the background thread watching resources died. |
We should probably have a way to restart the watchers. It is part of the K8s contract that the API server might end them occasionally. We just need to be careful not to try to restart them in a tight loop. In other words, retry with exponential backoff, and ideally have a means to indicate in the dashboard UI that the data is stale. |
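To make the restart idea concrete, here is a minimal Python sketch of restarting a watch with exponential backoff and a staleness flag. It is not Aspire or DCP code: `WatchSupervisor`, `start_watch`, `backoff_delays`, and the delay values are all made up for illustration, and a real implementation would probably add jitter and logging.

```python
import time
from typing import Callable, Iterable, Iterator


def backoff_delays(base: float = 0.5, cap: float = 30.0, factor: float = 2.0) -> Iterator[float]:
    """Yield base, base*factor, base*factor^2, ... never exceeding `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


class WatchSupervisor:
    """Restarts a watch whenever it ends, waiting longer after each failure."""

    def __init__(self, start_watch: Callable[[], Iterable[str]],
                 sleep: Callable[[float], None] = time.sleep) -> None:
        # `start_watch` is a hypothetical callable that opens a watch and
        # returns an iterable of events; it may raise ConnectionError.
        self.start_watch = start_watch
        self.sleep = sleep
        self.stale = True          # no data yet, so a UI should show "stale"
        self.events: list[str] = []

    def run(self, max_restarts: int) -> None:
        delays = backoff_delays()
        for attempt in range(max_restarts + 1):
            try:
                for event in self.start_watch():
                    self.events.append(event)
                    self.stale = False
                    delays = backoff_delays()  # healthy stream: reset the backoff
            except ConnectionError:
                pass  # the stream broke; fall through and retry
            # The watch ended (normally or not), so the data may be out of date.
            self.stale = True
            if attempt < max_restarts:
                self.sleep(next(delays))  # never restart in a tight loop
```

Resetting the delay once events flow again means a watch that the API server closes occasionally restarts quickly, while a watch that keeps failing is retried less and less often.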
Hi, Aspire.Hosting Dashboard exception:
AppHost logs:
AppHost env variables (excluded what I thought it wasn't relevant):
Dashboard env variables (excluded what I thought it wasn't relevant):
I hope it can help a bit :) |
Hi guys! Same issue as Riff451. Aspire looks like a game-changing technology! |
@karolz-ms I just got bitten by mismatched binaries and it took a while to figure it out. I didn't think to check until I saw your comment above. I thought dependabot had already upgraded the packages for me :) Do you think it makes sense to detect the mismatch and throw an exception stating as much? |
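As a rough illustration of the "detect the mismatch and throw" suggestion, here is a small Python sketch. It assumes that "matching" means identical version strings, which the thread does not confirm; `ensure_matching_versions`, `VersionMismatchError`, and the component names and version strings in the usage note are hypothetical.

```python
class VersionMismatchError(RuntimeError):
    """Raised when components that must ship together report different versions."""


def ensure_matching_versions(components: dict[str, str]) -> None:
    """Raise if the given components do not all report the same version.

    `components` maps a component name to the version string it reports.
    Assumption: versions must be identical, not merely compatible.
    """
    distinct = {version.strip() for version in components.values()}
    if len(distinct) > 1:
        details = ", ".join(
            f"{name}={version}" for name, version in sorted(components.items())
        )
        raise VersionMismatchError(
            f"Mismatched component versions ({details}). "
            "Update all packages to the same version."
        )
```

Calling `ensure_matching_versions({"apphost": "1.0.0-p5", "dcp": "1.0.0-p4"})` at startup would fail fast with a message naming both versions instead of surfacing later as a connection error.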
These will be much less common once we ship GA. We'll gracefully degrade functionality or throw if it's not available. |
Just had an interesting one. Walked away from my machine for an hour (desktop, always on), and saw this exception:
|
Ah ... so this was the same as David's. |
Hrm I am wondering whether this is a long poll timing out? |
@mitchdenny see #2422 (comment). My recommendation would be not to rely on very long timeouts, but instead to retry as described in the comment above. |
Just got this:
|
Does it reproduce? Reliably? I'm wondering whether we have a race to read the file before DCP has finished writing it. |
No it's not reliable. |
This reliability issue should be partially addressed in this PR #3132 |
We are making 3 changes here to improve things in Aspire P5
|
Was just chatting with @mitchdenny about this. I'm hitting this constantly:
|
@adityamandaleeka what does it say when you run |
Looks like you are still using Aspire P4 version @adityamandaleeka |
We should print the output from the health check on failure |
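A minimal Python sketch of what "print the output on failure" could look like: run the check as a child process, capture stdout and stderr, and put both into the error. This is not how Aspire invokes DCP; `run_health_check`, `HealthCheckFailed`, and `FAKE_FAILING_CHECK` are illustrative names, and the fake command merely stands in for whatever the real health check is.

```python
import subprocess
import sys


class HealthCheckFailed(RuntimeError):
    """Carries the health check's own output so the failure is diagnosable."""


# Stand-in for a real health-check command: prints a message and exits with code 3.
FAKE_FAILING_CHECK = [
    sys.executable, "-c", "import sys; print('dcp not ready'); sys.exit(3)",
]


def run_health_check(command: list[str], timeout: float = 10.0) -> str:
    """Run `command`; return its stdout, or raise with stdout and stderr attached."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired as exc:
        raise HealthCheckFailed(
            f"{command!r} timed out after {timeout}s; partial output: {exc.output!r}"
        ) from exc
    if result.returncode != 0:
        raise HealthCheckFailed(
            f"{command!r} exited with code {result.returncode}\n"
            f"--- stdout ---\n{result.stdout}\n"
            f"--- stderr ---\n{result.stderr}"
        )
    return result.stdout
```

With the output attached to the exception, a version mismatch or a missing binary would show up directly in the AppHost logs rather than as a generic startup failure.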
Yes, updating fixed the error I was seeing, thanks @karolz-ms |
Made progress on this in P5, will likely still be more issues found so keeping alive for P6. |
Given that we are not aware of any remaining work here, we opted to close this. We can open a new issue if new work is discovered or planned. |
It seems like we have a set of issues that all have to do with launching and connecting to DCP and making sure that it's reliable. We should harden this code so that we are resilient when launching DCP (retrying if it fails to launch), verify that it's healthy, and have a good way to recover if it goes unhealthy.
HttpRequestException exception #881