Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chart List performance issue caused by sql with cartesian join generated by flask-app-builder #23092

Closed
DataScientistSamChan opened this issue Feb 16, 2023 · 5 comments
Assignees
Labels
#bug:performance Performance bugs fab

Comments

@DataScientistSamChan
Copy link

Version: I found this bug even on the latest version cloned today.

When user is accessing the chart list page, the get_list controller method of BaseSupersetModelRestApi is called which eventually call the get_list_headless method of the BaseApi class in the flask_appbuilder.api.__init__ package. The query generated as it turns out is as below with cartesian join between anon_1 and last_saved_by:

SELECT anon_1.last_saved_by_id             AS anon_1_last_saved_by_id,
       anon_1.last_saved_by_first_name     AS anon_1_last_saved_by_first_name,
       anon_1.last_saved_by_last_name      AS anon_1_last_saved_by_last_name,
       last_saved_by.username              AS last_saved_by_username,
       last_saved_by.password              AS last_saved_by_password,
       last_saved_by.active                AS last_saved_by_active,
       last_saved_by.email                 AS last_saved_by_email,
       last_saved_by.last_login            AS last_saved_by_last_login,
       last_saved_by.login_count           AS last_saved_by_login_count,
       last_saved_by.fail_login_count      AS last_saved_by_fail_login_count,
       last_saved_by.created_on            AS last_saved_by_created_on,
       last_saved_by.changed_on            AS last_saved_by_changed_on,
       last_saved_by.created_by_fk         AS last_saved_by_created_by_fk,
       last_saved_by.changed_by_fk         AS last_saved_by_changed_by_fk,
       anon_1.tables_id                    AS anon_1_tables_id,
       anon_1.tables_default_endpoint      AS anon_1_tables_default_endpoint,
       anon_1.tables_table_name            AS anon_1_tables_table_name,
       anon_1.created_by_id                AS anon_1_created_by_id,
       anon_1.created_by_first_name        AS anon_1_created_by_first_name,
       anon_1.created_by_last_name         AS anon_1_created_by_last_name,
       anon_1.changed_by_id                AS anon_1_changed_by_id,
       anon_1.changed_by_first_name        AS anon_1_changed_by_first_name,
       anon_1.changed_by_last_name         AS anon_1_changed_by_last_name,
       anon_1.slices_changed_on            AS anon_1_slices_changed_on,
       anon_1.slices_id                    AS anon_1_slices_id,
       anon_1.slices_slice_name            AS anon_1_slices_slice_name,
       anon_1.slices_datasource_id         AS anon_1_slices_datasource_id,
       anon_1.slices_datasource_type       AS anon_1_slices_datasource_type,
       anon_1.slices_viz_type              AS anon_1_slices_viz_type,
       anon_1.slices_params                AS anon_1_slices_params,
       anon_1.slices_description           AS anon_1_slices_description,
       anon_1.slices_cache_timeout         AS anon_1_slices_cache_timeout,
       anon_1.slices_last_saved_at         AS anon_1_slices_last_saved_at,
       anon_1.slices_last_saved_by_fk      AS anon_1_slices_last_saved_by_fk,
       anon_1.slices_certified_by          AS anon_1_slices_certified_by,
       anon_1.slices_certification_details AS anon_1_slices_certification_details,
       anon_1.slices_created_by_fk         AS anon_1_slices_created_by_fk,
       anon_1.slices_changed_by_fk         AS anon_1_slices_changed_by_fk
FROM   (
                       SELECT          slices.changed_on            AS slices_changed_on,
                                       slices.id                    AS slices_id,
                                       slices.slice_name            AS slices_slice_name,
                                       slices.datasource_id         AS slices_datasource_id,
                                       slices.datasource_type       AS slices_datasource_type,
                                       slices.viz_type              AS slices_viz_type,
                                       slices.params                AS slices_params,
                                       slices.description           AS slices_description,
                                       slices.cache_timeout         AS slices_cache_timeout,
                                       slices.last_saved_at         AS slices_last_saved_at,
                                       slices.last_saved_by_fk      AS slices_last_saved_by_fk,
                                       slices.certified_by          AS slices_certified_by,
                                       slices.certification_details AS slices_certification_details,
                                       slices.created_by_fk         AS slices_created_by_fk,
                                       slices.changed_by_fk         AS slices_changed_by_fk,
                                       changed_by.id                AS changed_by_id,
                                       changed_by.first_name        AS changed_by_first_name,
                                       changed_by.last_name         AS changed_by_last_name,
                                       created_by.id                AS created_by_id,
                                       created_by.first_name        AS created_by_first_name,
                                       created_by.last_name         AS created_by_last_name,
                                       last_saved_by.id             AS last_saved_by_id,
                                       last_saved_by.first_name     AS last_saved_by_first_name,
                                       last_saved_by.last_name      AS last_saved_by_last_name,
                                       `tables`.id                  AS tables_id,
                                       `tables`.default_endpoint    AS tables_default_endpoint,
                                       `tables`.table_name          AS tables_table_name
                       FROM            slices
                       LEFT OUTER JOIN ab_user AS changed_by
                       ON              slices.changed_by_fk = changed_by.id
                       LEFT OUTER JOIN ab_user AS created_by
                       ON              slices.created_by_fk = created_by.id
                       LEFT OUTER JOIN ab_user AS last_saved_by
                       ON              slices.last_saved_by_fk = last_saved_by.id
                       LEFT OUTER JOIN `tables`
                       ON              slices.datasource_id = `tables`.id
                       AND             slices.datasource_type = %(datasource_type_1)s
                       ORDER BY        slices.changed_on DESC limit %(param_1)s) AS anon_1,
       ab_user                                                                   AS last_saved_by

As we can see, there is no join condition on anon_1 and last_saved_by and the performance is poor once both the table gets large.

As I dig deeper, I found the cartesian join is generated by the from_self call in the apply_all method of the class flask_appbuilder.models.sqla.interface, I wonder if there is anything we could do to fix this?

@DataScientistSamChan
Copy link
Author

After debugging for 2 days and seeing no easy solution for this, I have decided to comment out the user relation info part in the list_column of the class ChartRestApi and the performance boost is major, going down from 30 seconds in my production to roughly 0.5 ms, the catch is we lose the user related info, but that 's fine, I will add the user info back in the pre_get_list hook. For now, it's my solution.

@D3nn3
Copy link

D3nn3 commented Apr 12, 2023

I just tested v2.1.0 on our staging environment and saw a significant increase in loading times on the dashboards and charts pages and found this issue.

Compared to v2.0.1, the pod cpu usage in our K8s cluster increased from ~185 millicores to ~520 millicores when continuously refreshing the dashboards and charts pages. I assume this issue is the root cause for this? Didn't want to create a separate issue in case it's a duplicate.

@rusackas
Copy link
Member

I think this is something @john-bodley took a stab at quite some time ago if memory serves. Is it still an issue in 3.0?

@john-bodley
Copy link
Member

@rusackas
Copy link
Member

rusackas commented Jul 9, 2024

I'm assuming/hoping that the above PRs mitigated the problem. It's been several months since the last comments here, so I'm closing this as stale. If anyone wants to chip in here with updated context/metrics/etc, we can certainly reopen this.

@rusackas rusackas closed this as not planned Won't fix, can't repro, duplicate, stale Jul 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
#bug:performance Performance bugs fab
Projects
None yet
Development

No branches or pull requests

6 participants
@rusackas @dpgaspar @john-bodley @DataScientistSamChan @D3nn3 and others