Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes ui-side behavior in response to websocket job status updates on several lists #5968

Merged

Conversation

mabashian
Copy link
Member

@mabashian mabashian commented Feb 17, 2020

SUMMARY

Part of #5883

ISSUE TYPE
  • Feature Pull Request
COMPONENT NAME
  • UI
ADDITIONAL INFORMATION

It might be easiest for me to outline the expected behavior with these changes in order to illustrate what's changed and pose questions.

  1. UI gets a socket message with a status other than pending.

    • UI takes that message, loops across the available rows attempting to find a row that matches up with the unified_job_id provided in the socket message. If it finds a matching row, we update the status of the row to match the status in the message. (This same pattern will extend out to other fields (started/finished/etc) when provided by the api).
  2. UI gets a socket message with at status of pending.

    • This is where things get interesting
    • This code treats this message like: This is a new job and you don't have it in your list.
      • Although now that I think about it, maybe this isn't quite true. Could there be scenarios where we do have this job in the list already but it's status is new? If so, the code as it currently stands would potentially insert an extra job for this row (we don't check for this job in our existing list rows).
    • When we get a message like this, we have to go out and perform a GET in order to determine whether or not this particular user has sufficient privileges to see this job and whether or not it fits in with the current search filters.
      • We only do this if the user is on page 1 of the jobs list
      • We only do this if the user has the default -finished or similar -started order_by filter applied to the list. If they're sorting by something else we have no idea where this potential new row fits into the list.
      • We only do this if the user hasn't already filtered the list down by id__in.
        • There's a corner case here - if the user provided an id__in filter that matched a future job id. This probably isn't a realistic scenario though.
      • These GET requests are time-banded on a 5 second interval. For example, lets say a new job comes in over the websocket and we make a GET request for it: /api/v2/unified_jobs?id__in=9000&count_disabled=1. Over the next 5 seconds, we hold on to all the id's of new jobs that come in and bundle them up into a single request to be executed 5 seconds after the initial request (id=9000). That second request would look like /api/v2/unified_jobs?id=9001,9002,9003&count_disabled=1. The cycle continues until there are no more new jobs in the past 5 seconds.
      • If we get rows back, this code adds them to the top of the list (this is why the sort order is important). Page size is maintained by bumping the corresponding number of rows off the bottom.
      • Once the rows are added, future socket messages with updates to these rows fall into scenario 1 above.

A special note about the launch modal: while it's open we will still update rows underneath the modal (because this no longer requires making GET requests and re-rendering the entire row). We don't add any new rows to the list while this modal is open though. This is because the modal itself (it's element) lives within the row. If this row were to get bumped out of the list the modal would disappear. As such, we hold on to new job id's and fire off the corresponding request only after the launch modal is closed. This behavior is consistent with the way we handle messages while the launch modal is open currently.

Question(s)

Probably for @wenottingham but I'm curious what others think as well.

  1. Adding rows to the top of the list when the sort order is -finished (the default sort value) is not exactly "right". In reality, we're adding rows to the top of the list in the order which we receive the socket messages telling us that a new job has been created. This is probably OK until the job actually finishes. Once the jobs finish, they may be out of order based on the -finished order_by filter. Is that OK?

Alternatively we could try to do something like:

[new jobs ordered by whatever order they came in first (or something else?)]
[new jobs that we learned about from sockets but they've finished running, ordered by -finished (ordering would happen by the UI)]
[api jobs ordered by -finished]

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@wenottingham
Copy link
Contributor

we have to go out and perform a GET in order to determine whether or not this particular user has sufficient privileges to see this job

Shouldn't we be fixing this on the sending side?

@ryanpetrello
Copy link
Contributor

ryanpetrello commented Feb 18, 2020

Shouldn't we be fixing this on the sending side?

@wenottingham unlike "give me event data about Job XYZ", the "tell me about job status changes" websocket group isn't currently scoped in any way - it's literally just "tell me the IDs of jobs that encounter a status transition".

So if we wanted to change this (especially if we wanted to make it include additional data about the job in question), we'd probably have to have some sort of per-user channel, or do RBAC checks for every status change for every subscriber. However, even if we changed this, we'd still probably have to do a GET request if our end goal is to only show list views based on currently applied filters.

One of the things we're experimenting with here is the idea that doing a list view and limiting it to id__in=X,Y,Z is really fast and gives us all the context we need - hopefully with fairly minimal API and UI changes compared to supporting an entirely new set of websocket interactions.

@chrismeyersfsu
Copy link
Member

image
👍

@AlanCoding
Copy link
Member

Could there be scenarios where we do have this job in the list already but it's status is new?

No, we got rid of that. In this flow, you POST to JT jobs list or global jobs list to create a job. Then you POST to the start endpoint. This was removed, there is no way to do this anymore. It might still be the case that some tracebacks will put a job into the "new" state, as a vestigial thing. That means the server had an error, and is not a concern for us here.

@AlanCoding
Copy link
Member

Adding rows to the top of the list when the sort order is -finished (the default sort value) is not exactly "right". In reality, we're adding rows to the top of the list in the order which we receive the socket messages telling us that a new job has been created.

If you're filtering by -finished that leaves ambiguous how jobs with a null finished time are sorted amongst themselves. Probably the default API sorting. By the Django ORM, it's possible to provide a secondary order_by column, so you could sort by -finished, and then -created (or -id could proxy for reverse creation order). That could require some extra work on the filter logic, but I think that's what you're moving towards. I don't necessarily see an inconsistency here.

Anyway, I say it's a safe assumption that "pending" status messages are received in order of creation and that successful/fail/error status messages are received in order of finished time, and discrepancies would be so minuscule that there is no value in worrying about it.

+1 from me after reading your synopsis.

@jakemcdermott
Copy link
Contributor

jakemcdermott commented Feb 20, 2020

implementation and described behavior lgtm 👍

It's worth pointing out:

We only do this if the user has the default -finished or similar -started order_by filter applied to the list. If they're sorting by something else we have no idea where this potential new row fits into the list.

I think this means we don't add new rows to lists with search filters applied? My gut tells me that removing live updates for these is a reasonable trade-off but I'm only guessing. @wenottingham @ryanpetrello @AlanCoding / anyone else with a bit more insight to customer usage at scale, does this seem like an acceptable trade-off?

@wenottingham
Copy link
Contributor

Is the implication that right now if you have some filter applied to the list, new rows will appear that don't match the filter?

@mabashian
Copy link
Member Author

@wenottingham no, that shouldn't happen. We can apply those same filters when making the GET requests for the new rows. If they don't match the filters, we won't get them back. The order_by is the one exception. If it's not -finished or -started, this code won't try to fetch new rows because we wouldn't know where to put them.

@wenottingham
Copy link
Contributor

So the options are a) not show them b) do a 'full' list request in that case only?

@mabashian
Copy link
Member Author

@wenottingham yea, that's my understanding. Those would be the two options.

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@softwarefactory-project-zuul
Copy link
Contributor

Build failed.

@softwarefactory-project-zuul
Copy link
Contributor

Build failed.

@softwarefactory-project-zuul
Copy link
Contributor

Build failed.

@softwarefactory-project-zuul
Copy link
Contributor

Build failed.

rebeccahhh and others added 10 commits March 3, 2020 11:05
…e need to stick with a single spelling of the word
… to update the relevant row in the templates list.
…vant job row. Also adds logic to attempt to re-order the list when the sort order is -finished since we have enough information client-side to do that.
…t and populated in the object.

For anyone reading this later, know that AdHocCommands still have unified_job_template and unified_job_template_id fields, they are just nonetypes because they don't get used by the AdHocCommand objects. Which means you have to actually get the object, not just check that it's there, to use it the way I am in this change.
@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded.

@jakemcdermott
Copy link
Contributor

@unlikelyzero @mabashian @rebeccahhh 👋 Still LGTM - Where do we stand with this one? Is this thing ready to merge or does it need more review / testing?

@unlikelyzero
Copy link

Final stages of testing. Merge meeting is tomorrow

@softwarefactory-project-zuul
Copy link
Contributor

Build succeeded (gate pipeline).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants