-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: tpch + tpcds GHA launcher #3619
base: main
Are you sure you want to change the base?
Conversation
CodSpeed Performance ReportMerging #3619 will improve performances by 35.68%Comparing Summary
Benchmarks breakdown
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3619 +/- ##
==========================================
- Coverage 77.88% 77.88% -0.01%
==========================================
Files 719 719
Lines 88410 88410
==========================================
- Hits 68861 68860 -1
- Misses 19549 19550 +1 |
@raunakab It doesn't look like the prompt properly gets printed unless you have a really wide terminal: |
@universalmind303 Oh that's strange. I can throw in an edit there soon. If you want to get by that for now, just type in a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one thing i'm worried about is discovery for this.
I know I'm not going to remember uv run tools/tpcds.py --scale-factor=100 --questions='1-10' --cluster-profile='medium-x86'
does uv have any built in discovery for scripts?
Hmm, that is a good point. This might be something that @samster25 might know about. I'll try to see if something can be fashioned to help with discoverability. |
@universalmind303 Yes, that is a point that I found annoying. I'm currently working on that right now. My current thought is to produce an output CSV file which can be downloaded and viewed. It would list the queries, how long each one took, and any failures observed. |
08879a4
to
5502a2c
Compare
@universalmind303, here is another PR which aims to make the outputs of runs nicer to visualize: #3625. The first run is still running right now, but you should be able to see an output.csv file uploaded to GitHub for you to download and view. The run is here: |
# /// script | ||
# requires-python = ">=3.12" | ||
# dependencies = [ | ||
# "ray[default]", | ||
# "getdaft", | ||
# ] | ||
# /// | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just to make it possible to run this via uv run benchmarking/tpch/ray_job_runner.py
.
WRT discoverability, once we have more concrete workflows we can start organizing things as https://docs.astral.sh/uv/guides/tools/#running-tools We can probably have a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM, some comments
"--questions", type=str, required=False, default="*", help="A comma separated list of questions to run" | ||
) | ||
parser.add_argument("--scale-factor", type=int, required=False, default=2, help="The scale factor to run on") | ||
parser.add_argument("--cluster-profile", type=str, required=False, help="The ray cluster configuration to run on") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Choice
parser.add_argument( | ||
"--questions", type=str, required=False, default="*", help="A comma separated list of questions to run" | ||
) | ||
parser.add_argument("--scale-factor", type=int, required=False, default=2, help="The scale factor to run on") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Choice
type=str, | ||
required=False, | ||
help="A comma separated list of environment variables to pass to ray job", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually CLIs take a list like so:
--env-var A=1 --env-var B=1
commit_hash = ( | ||
subprocess.check_output(["git", "rev-parse", branch_name], stderr=subprocess.STDOUT).strip().decode("utf-8") | ||
) | ||
return name, commit_hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename file into git_utils.py
?
except ValueError: | ||
raise ValueError(f"Invalid question item; expected a number or a range, instead got {item}") | ||
|
||
return nums |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize your question parsing logic was so complex until reading the code.
I think to keep things simple, your workflows can just:
- Take in comma separated list of questions
- Otherwise, runs all questions
This utility function should just be 4 lines:
if questions is None:
return list(range(total_number_of_questions))
else:
return [int(q) for q in questions.split(",")]
You can perform validation logic somewhere else with a regex if you want
... | ||
|
||
if "-" not in item: | ||
raise ValueError("...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this?
nums.append(str(num)) | ||
continue | ||
except ValueError: | ||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this?
Overview
This PR adds a "tpch" and "tpcds" launcher to the available tools. Allows you to easily scale up a ray-cluster and run queries against it.
Usage
In order to run tpcds, run the following:
uv run tools/tpch.py --scale-factor=2 --num-partitions=2 --questions='1-10'
In order to run tpcds, run the following:
uv run tools/tpcds.py --scale-factor=100 --questions='1-10'
As always, if you want help, run
uv run tools/tpch.py --help
oruv run tools/tpcds.py --help
.