-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Broadcast plan over port 3238 #3756
base: main
Are you sure you want to change the base?
Conversation
CodSpeed Performance ReportMerging #3756 will improve performances by 10.31%Comparing Summary
Benchmarks breakdown
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3756 +/- ##
==========================================
+ Coverage 77.31% 77.66% +0.34%
==========================================
Files 734 735 +1
Lines 93737 95495 +1758
==========================================
+ Hits 72475 74167 +1692
- Misses 21262 21328 +66
|
daft/dataframe/dataframe.py
Outdated
@@ -177,6 +199,7 @@ def explain( | |||
file (Optional[io.IOBase]): Location to print the output to, or defaults to None which defaults to the default location for | |||
print (in Python, that should be sys.stdout) | |||
""" | |||
self.explain_broadcast() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? Why would be broadcast to the dashboard if we're not executing a query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I was thinking if a user just wanted to see what plan would be produced without executing a query, they could also use the dashboard UI to view the query plan.
In this case, the UI would show:
- the query plan
- the time it took to produce the plan
- "N/A" for the time it took to execute the query (since
.explain
doesn't execute queries)
daft/dataframe/dataframe.py
Outdated
@@ -2818,6 +2841,7 @@ def collect(self, num_preview_rows: Optional[int] = 8) -> "DataFrame": | |||
DataFrame: DataFrame with materialized results. | |||
""" | |||
self._materialize_results() | |||
self.explain_broadcast() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe consider making this a decorator for all "execution" methods (show
, collect
, write_parquet
etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. A decorator makes sense here. Will update in the coming commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use decorator instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will need to add this to more APIs. Do we have an exhaustive list somewhere of all the APIs we would want to add this decorator to?
Overview
This PR adds the ability of broadcasting the serialized Mermaid query plan over port 3238.
Receiving services can listen to this TCP broadcast over 3238, read the data, and process/present it.
This will be useful for the dashboard, which will (through another proxy service) be listening over 3238 for the plan, so that it can display it to the end-user.
Notes
The number 3238 is
daft
spelled over the telephone's alphanumeric keypad.