feat(profiler): dynamically combine queries #3572

Merged (28 commits, Nov 24, 2021)


hsheth2 (Collaborator) commented Nov 15, 2021

The GE profiler issues a large number of queries, each of which returns a single number or a single row. This PR adds support for dynamically combining these queries at runtime, which reduces the total number of queries issued.
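As a rough illustration of the general idea (not the PR's actual code), several queries that each return one value can be folded into a single SELECT of scalar subqueries, so the database is contacted once instead of N times. This sketch uses Python's built-in sqlite3 module and an in-memory table; `combine_scalar_queries` and `run_combined` are hypothetical helpers, and the real profiler issues its queries through Great Expectations against the source database.

```python
import sqlite3
from typing import Dict


def combine_scalar_queries(queries: Dict[str, str]) -> str:
    """Hypothetical helper: merge single-value queries into one SELECT.

    Each query becomes a scalar subquery aliased by its dict key.
    """
    parts = [f"({sql}) AS {alias}" for alias, sql in queries.items()]
    return "SELECT " + ", ".join(parts)


def run_combined(conn: sqlite3.Connection, queries: Dict[str, str]) -> Dict[str, object]:
    """Run all queries in one round trip and map each alias to its value."""
    row = conn.execute(combine_scalar_queries(queries)).fetchone()
    return dict(zip(queries.keys(), row))


# Toy table standing in for a profiled dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, None), (3, "c")])

# Three metrics that would otherwise be three separate queries.
queries = {
    "row_count": "SELECT COUNT(*) FROM t",
    "x_max": "SELECT MAX(x) FROM t",
    "y_nulls": "SELECT COUNT(*) FROM t WHERE y IS NULL",
}
print(run_combined(conn, queries))
```

This trick only applies to queries whose result is a single value, since each piece has to be usable as a scalar subquery.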

Performance

Using the same testing setup as #3369 and #3510.

Small table - 9 columns, 500 rows

  • previous: 8.8-10.2 seconds
  • with changes: 5.4-6.4 seconds

Medium table - 16 columns, 60k rows

  • previous: 17.4-18.5 seconds
  • with changes: 12.2-14 seconds

These changes help most when turn_off_expensive_profiling_metrics is enabled, yielding a 2-3x speedup.
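For comparison with the 2-3x figure, the speedup implied by the default-configuration timing ranges above can be worked out directly. This is only arithmetic on the reported numbers, not a new measurement; `speedup_range` is a hypothetical helper that pairs the fastest old run with the slowest new run (conservative bound) and the slowest old run with the fastest new run (optimistic bound).

```python
from typing import Tuple


def speedup_range(prev: Tuple[float, float], new: Tuple[float, float]) -> Tuple[float, float]:
    """Return (conservative, optimistic) speedup from (min, max) timings in seconds."""
    conservative = prev[0] / new[1]  # fastest old run vs slowest new run
    optimistic = prev[1] / new[0]    # slowest old run vs fastest new run
    return conservative, optimistic


# Timing ranges as reported in the PR description.
timings = {
    "small (9 columns, 500 rows)": ((8.8, 10.2), (5.4, 6.4)),
    "medium (16 columns, 60k rows)": ((17.4, 18.5), (12.2, 14.0)),
}

for name, (prev, new) in timings.items():
    low, high = speedup_range(prev, new)
    print(f"{name}: {low:.2f}x to {high:.2f}x")
```

On these numbers the default configuration sees roughly 1.4-1.9x on the small table and 1.2-1.5x on the medium table.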

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

github-actions bot commented Nov 15, 2021

Unit Test Results

  • 37 files (+1), 37 suites (+1), duration 29m 59s (+3m 27s)
  • 591 tests (+87): 539 passed (+87), 52 skipped (±0), 0 failed (±0)
  • 1,330 runs (+91): 1,262 passed (+91), 68 skipped (±0), 0 failed (±0)

Results for commit 6d2df18. ± Comparison against base commit 87478a1.

♻️ This comment has been updated with latest results.

hsheth2 marked this pull request as ready for review November 17, 2021 22:17

shirshanka (Contributor) left a comment

LGTM

rslanka (Contributor) left a comment

LGTM!

shirshanka (Contributor) left a comment

LGTM. Thanks Harshal!

shirshanka merged commit cb2a3e6 into datahub-project:master Nov 24, 2021
hsheth2 deleted the profiler-query-combine branch November 24, 2021 18:28
swaroopjagadish added a commit to swaroopjagadish/datahub that referenced this pull request Nov 28, 2021
4 participants