Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: cache dashboard bootstrap data #11234

Merged
merged 4 commits into from
Oct 14, 2020
Merged

Conversation

ktmud
Copy link
Member

@ktmud ktmud commented Oct 12, 2020

SUMMARY

Large and complex dashboards can be very slow to load because they need to read all the related slices and datasources and they all have go through SQL queries with no caching enabled.

A previous attempt to cache the rendered dashboard page had to be reverted because it did not consider user-specific data on that page. Why don't we cache the non-user-specific bootstrap data instead?

This PR extracts the data building logics from the /dashboard/:id_or_slug Flask view to the Dashboard model and uses Flack caching decorator to manage the cache. Cache will be cleaned up when any one of the dashboard, the associated slices, datasources, or table columns/metrics is updated.

When tested with a dashboard of 300+ slices in our staging environment, this change can save up to more than 3 seconds of page load time.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TEST PLAN

Make sure the dashboard page still work as expected.

  • You second visit to a very large dashboard page should be much more faster
  • Update a slice or datasource, refresh the Dashboard page, you should see the updates
  • User access etc should still work

ADDITIONAL INFORMATION

@codecov-io
Copy link

codecov-io commented Oct 12, 2020

Codecov Report

Merging #11234 into master will decrease coverage by 2.41%.
The diff coverage is 66.08%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #11234      +/-   ##
==========================================
- Coverage   61.47%   59.05%   -2.42%     
==========================================
  Files         832      797      -35     
  Lines       39445    38232    -1213     
  Branches     3598     3396     -202     
==========================================
- Hits        24248    22579    -1669     
- Misses      15015    15471     +456     
  Partials      182      182              
Flag Coverage Δ
#cypress 55.94% <ø> (?)
#javascript ?
#python 60.73% <66.08%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/models/dashboard.py 80.63% <49.20%> (-7.41%) ⬇️
superset/views/core.py 74.51% <61.11%> (-0.29%) ⬇️
superset/charts/commands/delete.py 91.42% <100.00%> (+0.51%) ⬆️
superset/config.py 90.11% <100.00%> (+0.03%) ⬆️
superset/dashboards/dao.py 94.38% <100.00%> (ø)
superset/models/core.py 89.10% <100.00%> (+0.70%) ⬆️
superset/utils/core.py 89.73% <100.00%> (+0.08%) ⬆️
superset/utils/decorators.py 65.82% <100.00%> (+10.82%) ⬆️
superset/views/dashboard/views.py 70.58% <100.00%> (ø)
...uperset-frontend/src/dashboard/util/dnd-reorder.js 0.00% <0.00%> (-100.00%) ⬇️
... and 343 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 16f7b2b...985fdb0. Read the comment docs.

superset/config.py Outdated Show resolved Hide resolved
if is_feature_enabled("DASHBOARD_CACHE"):
from superset.models.dashboard import Dashboard

Dashboard.clear_cache_for_datasource(datasource_id=self.id)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to add this to SQLAlchemy events, just like what we did with slices, but it was somehow triggered multiple times (4 or 5), which could have performance risks when a datasource is used in multiple dashboards.

So I moved it to this BaseDatasource API instead. The drawback is it may not trigger if the datasource is updated from FAB CRUD view (an unlikely use case).

@@ -131,7 +140,7 @@ class Dashboard( # pylint: disable=too-many-instance-attributes
css = Column(Text)
json_metadata = Column(Text)
slug = Column(String(255), unique=True)
slices = relationship("Slice", secondary=dashboard_slices, backref="dashboards")
slices = relationship(Slice, secondary=dashboard_slices, backref="dashboards")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove quotes for consistency.

@@ -145,7 +154,7 @@ class Dashboard( # pylint: disable=too-many-instance-attributes
]

def __repr__(self) -> str:
return self.dashboard_title or str(self.id)
return f"Dashboard<{self.slug or self.id}>"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__repr__ is used in cache key.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this change, when adding a dashboard schedule, dashboard names are seen as "Dashboard <1>" etc... with no name at all and hard to select a dashboard for a schedule from its ID.... Is not there a better option other than this?

image

superset/models/dashboard.py Outdated Show resolved Hide resolved
@@ -44,6 +45,7 @@ def __init__(self, user: User, model_id: int):
def run(self) -> Model:
self.validate()
try:
Dashboard.clear_cache_for_slice(slice_id=self._model_id)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't use before_delete event for Slice because the association record in the relationship table datasource_slice is deleted before the Slice object is deleted.

@@ -108,7 +108,7 @@ def set_dash_metadata(
and obj["meta"]["chartId"]
):
chart_id = obj["meta"]["chartId"]
obj["meta"]["uuid"] = uuid_map[chart_id]
obj["meta"]["uuid"] = uuid_map.get(chart_id)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will throw 500 errors when chart don't have uuid, which could happen when chart is deleted but the dashboard cache is not purged for some reason.

superset/models/dashboard.py Outdated Show resolved Hide resolved
@@ -50,6 +50,7 @@
SQL_MAX_ROW = 666
SQLLAB_CTAS_NO_LIMIT = True # SQL_MAX_ROW will not take affect for the CTA queries
FEATURE_FLAGS = {
**FEATURE_FLAGS,
Copy link
Member Author

@ktmud ktmud Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we can override the configs from SUPERSET_CONFIG_PATH (I've set it to ~/.superset/config.py).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this do anything? FEATURE_FLAGS is empty by default in the config

Copy link
Member Author

@ktmud ktmud Oct 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But SUPERSET_CONFIG_PATH (a config file) is loaded before SUPERSET_CONFIG (a Python module). This allows users to override the feature flags in SUPERSET_CONFIG_PATH (a file not tracked in Git) when starting superset like this:

SUPERSET_CONFIG=tests.superset_test_config superset run

@ktmud ktmud force-pushed the dashboard-cache branch 6 times, most recently from ebe41c8 to ee36591 Compare October 12, 2020 20:25
@@ -14,4 +14,3 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
from . import models, views
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned up to break circular imports, because I want to import DruidMetric and DruidColumn from dashboard.py.

@@ -719,8 +718,5 @@ class FavStar(Model): # pylint: disable=too-few-public-methods

# events for updating tags
if is_feature_enabled("TAGGING_SYSTEM"):
sqla.event.listen(Dashboard, "after_insert", DashboardUpdater.after_insert)
sqla.event.listen(Dashboard, "after_update", DashboardUpdater.after_update)
sqla.event.listen(Dashboard, "after_delete", DashboardUpdater.after_delete)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a duplicate of https://github.com/apache/incubator-superset/blob/93fdf1d64465b3378de65fb1b058e1a04742a70f/superset/models/dashboard.py#L573

cc @suddjian

Removed so that it's possible to import models.core from dashboard.py. It makes more sense to let Dashboard depends on core (if needed) than vice versa.

sqla.event.listen(Dashboard, "after_update", event_after_dashboard_changed)
update_thumbnail: OnDashboardChange = lambda _, __, target: target.update_thumbnail()
sqla.event.listen(Dashboard, "after_insert", update_thumbnail)
sqla.event.listen(Dashboard, "after_update", update_thumbnail)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some refactor for consistency.

@@ -1040,7 +1038,7 @@ def copy_dash( # pylint: disable=no-self-use
for value in data["positions"].values():
if isinstance(value, dict) and value.get("meta", {}).get("chartId"):
old_id = value["meta"]["chartId"]
new_id = old_to_new_slice_ids[old_id]
new_id = old_to_new_slice_ids.get(old_id)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes a 500 error when duplicating a dashboard with deleted slices.

@ktmud ktmud force-pushed the dashboard-cache branch 3 times, most recently from 96590c2 to 19c6395 Compare October 13, 2020 18:07
Copy link
Member

@etr2460 etr2460 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we add unit tests for the new decorators?

superset/models/dashboard.py Show resolved Hide resolved
Copy link

@graceguo-supercat graceguo-supercat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ktmud
Copy link
Member Author

ktmud commented Oct 13, 2020

could we add unit tests for the new decorators?

7939b0a

Added some basic tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 1.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants