feat: Broadcast plan over port 3238 #3756

Open
wants to merge 18 commits into
base: main

@raunakab raunakab commented Feb 2, 2025

Overview

This PR adds the ability to broadcast the serialized Mermaid query plan over TCP port 3238.
Receiving services can listen on port 3238, read the data, and process or present it.

This will be useful for the dashboard, which will (through another proxy service) listen on port 3238 for the plan so that it can display it to the end user.
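
As a rough illustration of the send/listen flow described above, here is a minimal sketch using only Python's standard `socket` module. It is not the code in this PR: the function names `send_plan`, `receive_one_plan`, and `demo` are hypothetical, and it assumes the plan is sent as UTF-8 text over a single TCP connection that the sender closes when done. The demo binds to an OS-chosen free port rather than 3238 so it cannot clash with a real listener.

```python
import socket
import threading

DAFT_PORT = 3238  # the port named in this PR


def send_plan(plan: str, host: str = "127.0.0.1", port: int = DAFT_PORT) -> bool:
    """Send the plan text; return False if nothing is listening (assumed best-effort)."""
    try:
        with socket.create_connection((host, port), timeout=1.0) as conn:
            conn.sendall(plan.encode("utf-8"))
        return True
    except OSError:
        return False


def receive_one_plan(server: socket.socket) -> str:
    """Accept one connection and read until the sender closes it."""
    conn, _addr = server.accept()
    chunks = []
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:  # empty read means the sender closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def demo(plan: str) -> str:
    """Round-trip a plan through a local listener on an ephemeral port."""
    # Port 0 asks the OS for any free port (a demo-only choice, not the PR's behavior).
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        received = []
        listener = threading.Thread(
            target=lambda: received.append(receive_one_plan(server)),
            daemon=True,
        )
        listener.start()
        if not send_plan(plan, port=port):
            raise RuntimeError("could not reach the local listener")
        listener.join(timeout=5)
    return received[0]
```

A real receiver such as the dashboard proxy would presumably listen on 3238 in a loop rather than accept a single connection.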

Notes

The number 3238 spells "daft" on a telephone's alphanumeric keypad.
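
The keypad mapping can be checked with a few lines of Python. This is only a sketch; the helper name `to_keypad_digits` is made up for illustration, and the letter groups follow the standard telephone keypad layout.

```python
# Standard telephone keypad letter groups.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Invert the mapping: each letter points at its digit.
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def to_keypad_digits(word: str) -> str:
    """Translate a word into the digits that spell it on a phone keypad."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


print(to_keypad_digits("daft"))  # d=3, a=2, f=3, t=8
```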

@github-actions github-actions bot added the feat label Feb 2, 2025

codspeed-hq bot commented Feb 2, 2025

CodSpeed Performance Report

Merging #3756 will improve performance by 10.31%

Comparing broadcast-metrics (45a4459) with main (86adc44)

Summary

⚡ 1 improvements
✅ 26 untouched benchmarks

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| test_iter_rows_first_row[100 Small Files] | 342.5 ms | 310.5 ms | +10.31% |


codecov bot commented Feb 2, 2025

Codecov Report

Attention: Patch coverage is 60.00000% with 16 lines in your changes missing coverage. Please review.

Project coverage is 77.66%. Comparing base (6deb87e) to head (cc1d15d).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| daft/dataframe/dataframe.py | 66.66% | 10 Missing ⚠️ |
| daft/context.py | 40.00% | 6 Missing ⚠️ |
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3756      +/-   ##
==========================================
+ Coverage   77.31%   77.66%   +0.34%     
==========================================
  Files         734      735       +1     
  Lines       93737    95495    +1758     
==========================================
+ Hits        72475    74167    +1692     
- Misses      21262    21328      +66     
| Files with missing lines | Coverage Δ |
|---|---|
| daft/context.py | 84.88% <40.00%> (-2.78%) ⬇️ |
| daft/dataframe/dataframe.py | 84.97% <66.66%> (-0.55%) ⬇️ |

... and 38 files with indirect coverage changes

@raunakab raunakab marked this pull request as ready for review February 5, 2025 21:55
@@ -177,6 +199,7 @@ def explain(
file (Optional[io.IOBase]): Location to print the output to, or defaults to None which defaults to the default location for
print (in Python, that should be sys.stdout)
"""
self.explain_broadcast()
Contributor

Is this correct? Why would we broadcast to the dashboard if we're not executing a query?

Contributor Author

@raunakab raunakab Feb 5, 2025

Hmm, I was thinking that if a user just wanted to see what plan would be produced without executing a query, they could also use the dashboard UI to view it.

In this case, the UI would show:

  • the query plan
  • the time it took to produce the plan
  • "N/A" for the time it took to execute the query (since .explain doesn't execute queries)
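
To make the three items above concrete, here is one way the data behind that view could be modeled. This is purely a sketch: the class `PlanReport` and its field names are hypothetical and do not come from this PR. The only behavior taken from the discussion is that a missing execution time is shown as "N/A".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlanReport:
    """Hypothetical record of what the dashboard would display for one plan."""

    mermaid_plan: str                          # the serialized Mermaid query plan
    planning_seconds: float                    # time taken to produce the plan
    execution_seconds: Optional[float] = None  # None when nothing was executed (.explain)

    def execution_label(self) -> str:
        """Text to display for the execution time."""
        if self.execution_seconds is None:
            return "N/A"
        return f"{self.execution_seconds:.3f} s"
```

With this shape, an `.explain` call would produce a report with `execution_seconds` left unset, while an executing call would fill it in.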

@@ -2818,6 +2841,7 @@ def collect(self, num_preview_rows: Optional[int] = 8) -> "DataFrame":
DataFrame: DataFrame with materialized results.
"""
self._materialize_results()
self.explain_broadcast()
Contributor

Maybe consider making this a decorator for all "execution" methods (show, collect, write_parquet, etc.).
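
For illustration, a decorator along these lines might look like the sketch below. It is an assumption about the shape of the idea, not the implementation in this PR: the decorator name `broadcasts_plan` and the stand-in class `FakeDataFrame` are hypothetical, and the only name taken from the diff is `explain_broadcast`. The sketch broadcasts after the wrapped method returns, mirroring the order shown in the `collect` hunk.

```python
import functools


def broadcasts_plan(method):
    """Wrap an execution method so the plan is broadcast after it runs."""

    @functools.wraps(method)  # keep the wrapped method's name and docstring
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        self.explain_broadcast()  # method name as it appears in the diff
        return result

    return wrapper


class FakeDataFrame:
    """Stand-in for the real DataFrame, used only to exercise the decorator."""

    def __init__(self):
        self.broadcasts = []

    def explain_broadcast(self):
        # The real method would send the plan over the socket; here we just record the call.
        self.broadcasts.append("plan")

    @broadcasts_plan
    def collect(self):
        return self

    @broadcasts_plan
    def show(self, n=8):
        return f"showing {n} rows"
```

One thing such a decorator would need to handle in practice is execution methods that call each other internally, which would otherwise broadcast the same plan more than once.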

Contributor Author

Sounds good. A decorator makes sense here. Will update in the coming commit.

Contributor Author

Updated to use a decorator instead.

Contributor Author

I will need to add this to more APIs. Do we have an exhaustive list somewhere of all the APIs we would want to add this decorator to?

@raunakab raunakab requested a review from jaychia February 5, 2025 22:55