UI Add Cluster Activity Page #31123

Merged: 18 commits, Jun 1, 2023

pierrejeambrun
Member

@pierrejeambrun pierrejeambrun commented May 7, 2023

closes: #16081

This adds a native dashboard page to the UI that gathers a few useful metrics for monitoring your Airflow cluster.

Until now, people had to build their own dashboards with third-party tools such as Grafana.

This PR is not finalized yet. It aims to gather feedback on the general structure, the UI, or any other improvement you might want to see on this page before I move forward with a more complete implementation.

  • Add a chart library to our front end, leveraging another Apache project, ECharts 🎉 (which is great btw, a very flexible and exhaustive library).
  • Two kinds of metrics: live metrics, which use current data, and historical metrics, which fetch older records from the db and have filters enabled
  • Everything is auto refreshed
  • Live metrics use different REST API calls to fetch information that we have available
  • Historical metrics use a single call to a custom private endpoint that fetches everything we need. (We could split that up and integrate it into the REST API, but we would need some standards for KPI endpoints and a generic way of fetching them that is consistent across resources, i.e. how to pass the columns of interest, the aggregation type, validation, a standard response payload, etc.) We could do that in a separate PR because it would require some effort on its own.
  • UI is responsive and wraps as needed. (Could definitely be improved)
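
To make the historical metrics concrete, here is a minimal sketch of the kind of aggregation described above: counting records per state inside a filtered date window and shaping the result as name/value pairs for a pie series. The `DagRunRecord` shape and the function names are assumptions for illustration, not the payload or code of the private endpoint in this PR.

```typescript
// Hypothetical record shape; the real endpoint's payload may differ.
interface DagRunRecord {
  state: string; // e.g. "success", "failed", "running"
  startDate: string; // ISO 8601 timestamp
}

// Count records per state within the [start, end) filter window.
function countByState(
  records: DagRunRecord[],
  start: Date,
  end: Date
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const record of records) {
    const startedAt = new Date(record.startDate).getTime();
    if (startedAt >= start.getTime() && startedAt < end.getTime()) {
      counts[record.state] = (counts[record.state] ?? 0) + 1;
    }
  }
  return counts;
}

// Shape the counts as the name/value pairs a pie series takes as data.
function toPieData(
  counts: Record<string, number>
): { name: string; value: number }[] {
  return Object.keys(counts).map((name) => ({
    name,
    value: counts[name] ?? 0,
  }));
}
```

Doing the filtering and counting in one place mirrors the single-call design: the chart components only receive ready-to-plot pairs.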

TODO:

  • Add tests
  • Fix colors by plugin in STATE_COLORS and Chakra theme / harmonize the palette
  • Factorize some front end components (maybe with compound components)
  • Memoize options if needed (no gain identified)

[Four screenshots of the new page]

@boring-cyborg boring-cyborg bot added the area:API, area:UI, and area:webserver labels May 7, 2023
@pierrejeambrun pierrejeambrun changed the title from "16081 dashboard" to "UI Add dashboard page" May 7, 2023
@pierrejeambrun pierrejeambrun added this to the Airflow 2.7.0 milestone May 7, 2023
@pierrejeambrun pierrejeambrun added the type:new-feature Changelog: New Features label May 7, 2023
@eladkal
Contributor

eladkal commented May 7, 2023

This looks great! Kudos!

Couple of ideas to consider:

  1. Viewing the dashboard only for a specific owner / specific tags
  2. STATE_COLORS is configurable. Can we support the colors users define? (Not sure if this is what you meant in the TODO)
  3. As part of the live metrics, it would be nice to also see the number of DAGs to be scheduled in the next hour
  4. Can we somehow show pool utilization across time, e.g. a graph of occupied slots vs. time? This can be useful to detect bottlenecks.
  5. I think the dashboard can be named: Cluster Activity?
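
As an illustration of idea 2, supporting user-defined colors could amount to overlaying the user's mapping on top of defaults. This is a minimal sketch under assumptions: the default values, the function names, and the way user colors reach the front end are all hypothetical, not how this PR or Airflow implements it.

```typescript
// Illustrative defaults only; not the full or authoritative Airflow palette.
const DEFAULT_STATE_COLORS: Record<string, string> = {
  success: "green",
  failed: "red",
  running: "lime",
  queued: "gray",
};

// User overrides win; states the user did not configure keep their defaults.
function resolveStateColors(
  userColors: Record<string, string> = {}
): Record<string, string> {
  return { ...DEFAULT_STATE_COLORS, ...userColors };
}

// Look up the color for one state, with a neutral fallback for unknown states.
function colorFor(state: string, colors: Record<string, string>): string {
  return colors[state] ?? "gray";
}
```

With this shape, every chart can call `colorFor` against the same resolved mapping, so the dashboard stays consistent with the rest of the UI.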

@potiuk
Member

potiuk commented May 7, 2023

Whoa. This is going to be incredibly useful!

I think especially if it could focus on Airflow-specific charts and graphs, surfacing some common problems people might have.

Also, a small idea sparked by those charts: how about pairing them (way in the future) with some guidance on how to approach seeing some of those values?

Think of the instructions for your appliance: "If you see 3 blinking dots, it likely means this problem, and you can possibly address it by cleaning your ..." (Half-joking of course, but I think there are a number of things that are pretty repeatably fixable problems in Airflow. If we could surface them in the form of similar charts, people could have instructions next to those charts.)

There is a long way from the charts to such instructions, but I think the charts are an absolute prerequisite and extremely useful on their own.

@bbovenzi
Contributor

bbovenzi commented May 8, 2023

Great work! Some quick usability comments:

Let's put some labels here:
Screenshot 2023-05-08 at 12 24 17 PM

Looks like we need to fix some overflow:
Screenshot 2023-05-08 at 12 24 05 PM

Let's use the same state colors as the rest of the UI. I think that's already in your todo though. I also wonder if we can order the legend by number of times that the task instance state is in the pie chart:
Screenshot 2023-05-08 at 12 23 16 PM

Review comments (outdated, resolved):
airflow/www/static/js/dashboard/live-metrics/Pools.tsx
airflow/www/static/js/api/useDagRuns.tsx (2 comments)
airflow/www/static/js/dashboard/nav/FilterBar.tsx
airflow/www/static/js/dashboard/useFilters.tsx
@pierrejeambrun
Member Author

Thank you for the feedback and early reviews. I updated the PR and implemented everything I could that didn't require too much effort, as the change is already considerably big. I would be glad to add more complex features in follow-up PRs.

Screenshots have been updated :)

@eladkal
Contributor

eladkal commented May 20, 2023

I gathered some feedback from the community (probably ideas for future work)

  1. Top 5 dags/tasks that constantly retry (motivation: if a task always succeeds after 3 retries, it's hard to notice, and ideally it should be investigated why it never succeeds on the 1st try)
  2. Anomaly: show if the latest 5 dagruns took significantly longer to run than the previous 5 dagruns.
  3. Bar graph that shows pool utilization across 24 hours.
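
Idea 2 could be sketched as a comparison of mean durations over two adjacent windows. This is only one possible reading of "significantly longer": the 1.5x threshold, the window size default, and the function names are assumptions, and nothing like this exists in the PR.

```typescript
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// durations: run durations in seconds, ordered oldest to newest.
// Returns true when the mean of the latest `windowSize` runs exceeds the mean
// of the previous `windowSize` runs by more than `threshold` (1.5 = 50% slower).
function isSlowdown(
  durations: number[],
  windowSize = 5,
  threshold = 1.5
): boolean {
  if (durations.length < 2 * windowSize) return false; // not enough history
  const latest = durations.slice(-windowSize);
  const previous = durations.slice(-2 * windowSize, -windowSize);
  const baseline = mean(previous);
  if (baseline === 0) return false; // avoid flagging against an empty baseline
  return mean(latest) > threshold * baseline;
}
```

A mean-based check is easy to explain next to a chart, but a single very long run can trigger it; a median would be a more robust alternative.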

@eladkal
Contributor

eladkal commented May 20, 2023

Since there is no room for all statuses in the legend, can we present first the statuses that appear in the graph?

For example, the graph doesn't show dagruns in Queued, yet Queued appears in the legend. On the other hand, Success appears in the graph but not in the legend.

Screenshot_20230520_123706_Chrome

@pierrejeambrun
Member Author

pierrejeambrun commented May 20, 2023

Thank you for the valuable feedback and ideas.

Since there is no room for all statuses in the legend, can we present first the statuses that appear in the graph?

Done for all the pie charts. For the Pool bar chart, ordering in the same way is not straightforward; it's a little trickier (success is missing from one pool but exists in another, etc.). But the overflow might never happen for the pools, as the graph is really big. I will still try to implement it, though. I'll let you know.
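
For illustration, the legend ordering discussed above can be sketched as a sort on per-state counts, and one possible way to extend it to the pool chart is to sum counts across pools first. The names below are hypothetical, and this is not necessarily how the PR implements it.

```typescript
type StateCounts = Record<string, number>;

// Order states by descending count and drop states that do not appear at all.
function orderLegend(counts: StateCounts): string[] {
  return Object.keys(counts)
    .filter((state) => (counts[state] ?? 0) > 0)
    .sort((a, b) => (counts[b] ?? 0) - (counts[a] ?? 0));
}

// For a chart with several series (one per pool), sum counts across pools
// so that a state missing from one pool still gets a global rank.
function sumAcrossPools(pools: StateCounts[]): StateCounts {
  const totals: StateCounts = {};
  for (const pool of pools) {
    for (const state of Object.keys(pool)) {
      totals[state] = (totals[state] ?? 0) + (pool[state] ?? 0);
    }
  }
  return totals;
}
```

Calling `orderLegend(sumAcrossPools(pools))` gives a single legend order shared by every pool's bar.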

I gathered some feedback from the community (probably ideas for future work)

Nice, I would be glad to work on those in follow up PRs.

@pierrejeambrun pierrejeambrun changed the title from "UI Add dashboard page" to "UI Add Cluster Activity Page" May 25, 2023
Contributor

@bbovenzi bbovenzi left a comment

Few minor comments but overall lgtm

I wonder if we could include triggerer status in the future too and avoid this:
Screenshot 2023-05-25 at 3 55 13 PM

@pierrejeambrun
Member Author

pierrejeambrun commented May 25, 2023

Thanks Brent, I will update soon. Yes I saw that @pankajkoti added the triggerer health in the API response #31529. I will add it to the dashboard as well. After that we can remove the banner I think

@pierrejeambrun
Member Author

@jedcunningham This is what it looks like now, thanks
image

@jedcunningham
Member

#protm

@pierrejeambrun pierrejeambrun merged commit d67b383 into apache:main Jun 1, 2023
@pierrejeambrun pierrejeambrun deleted the 16081-dashboard branch June 1, 2023 21:17
@marclamberti

@pierrejeambrun that's amazing 🤩🤩🤩🤩

@pierrejeambrun
Member Author

Thanks @marclamberti, hope users will like it :)

@kaxil
Member

kaxil commented Jun 29, 2023

Great PR, well done @pierrejeambrun 👏

@vikramkoka
Contributor

This looks amazing @pierrejeambrun, very well done!

@alex-astronomer
Contributor

Whoaaaaaa...

@jscheffl
Contributor

jscheffl commented Jul 8, 2023

@pierrejeambrun I really like the PR and the newly added page!

I just updated to the latest master and realized that the menu entry is added to the top menu as the second entry, so it is quite prominent.
As it is some kind of admin related, I think it is better located below the or menu - it is rather some kind of secondary thing, not a primary entry.

Was there a discussion about how to place it? Would it be okay if I open a PR to move the menu entry?

@eladkal
Contributor

eladkal commented Jul 8, 2023

Was there a discussion about how to place it? Would it be okay if I open a PR to move the menu entry?

I think the current location is OK.
This is going to be one of the most useful pages. I don't think this page will be solely for users with the Admin role.

I suggest waiting until after 2.7.0 is released, gathering user feedback, and then making changes if needed.

Labels
area:API, area:UI, area:webserver, type:new-feature
Development

Successfully merging this pull request may close these issues.

Add dashboard to see DAGs (system wide status) overview
10 participants