
feat: tpch + tpcds GHA launcher #3619

Open · wants to merge 8 commits into main from tpcds-wrapper
Conversation

@raunakab (Contributor) commented Dec 19, 2024

Overview

This PR adds "tpch" and "tpcds" launchers to the available tools. They let you easily spin up a Ray cluster and run queries against it.

Usage

To run tpch, run the following:

uv run tools/tpch.py --scale-factor=2 --num-partitions=2 --questions='1-10'

To run tpcds, run the following:

uv run tools/tpcds.py --scale-factor=100 --questions='1-10'

As always, if you want help, run uv run tools/tpch.py --help or uv run tools/tpcds.py --help.

github-actions bot added the feat label Dec 19, 2024
raunakab marked this pull request as ready for review Dec 19, 2024 18:15
raunakab mentioned this pull request Dec 19, 2024
codspeed-hq bot commented Dec 19, 2024

CodSpeed Performance Report

Merging #3619 will improve performance by 35.68%

Comparing tpcds-wrapper (ff772e1) with main (f6002f9)

Summary

⚡ 1 improvement
✅ 26 untouched benchmarks

Benchmarks breakdown

| Benchmark | main | tpcds-wrapper | Change |
| --- | --- | --- | --- |
| test_iter_rows_first_row[100 Small Files] | 308.4 ms | 227.3 ms | +35.68% |

codecov bot commented Dec 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.88%. Comparing base (f6002f9) to head (ff772e1).

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #3619      +/-   ##
==========================================
- Coverage   77.88%   77.88%   -0.01%     
==========================================
  Files         719      719              
  Lines       88410    88410              
==========================================
- Hits        68861    68860       -1     
- Misses      19549    19550       +1     

see 1 file with indirect coverage changes

@universalmind303 (Contributor) commented:

@raunakab It doesn't look like the prompt properly gets printed unless you have a really wide terminal:

[screenshot: prompt output truncated in a narrow terminal]

@raunakab (Contributor, Author) commented Dec 19, 2024

> @raunakab It doesn't look like the prompt properly gets printed unless you have a really wide terminal:
>
> [screenshot: prompt output truncated in a narrow terminal]

@universalmind303 Oh, that's strange. I can push an edit for that soon. To get past it for now, just type y (yes) or n (no).
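A width-independent prompt could be as simple as this minimal sketch (the `confirm` helper and its wording are hypothetical, not the tool's actual code):

```python
# A minimal sketch of a confirmation prompt that does not depend on terminal
# width; the prompt wording below is hypothetical.
def confirm(message: str) -> bool:
    while True:
        answer = input(f"{message} [y/n] ").strip().lower()
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False

if not confirm("Launch the cluster and submit the job?"):
    raise SystemExit("Aborted.")
```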

@universalmind303 (Contributor) left a comment

One thing I'm worried about is discoverability for this.

I know I'm not going to remember uv run tools/tpcds.py --scale-factor=100 --questions='1-10' --cluster-profile='medium-x86'

Does uv have any built-in discovery for scripts?

@raunakab (Contributor, Author) commented:

> One thing I'm worried about is discoverability for this.
>
> I know I'm not going to remember uv run tools/tpcds.py --scale-factor=100 --questions='1-10' --cluster-profile='medium-x86'
>
> Does uv have any built-in discovery for scripts?

Hmm, that is a good point. This is something @samster25 might know about. I'll see if something can be fashioned to help with discoverability.

@universalmind303 (Contributor) commented:

One other improvement that could be made: when I run the command, there's no easy-to-use output, and I need to go dig through the logs to find out what even happened.

[screenshot: GitHub Actions build summary]

Is it possible to customize the "build summary" with basic information about the run?

@raunakab (Contributor, Author) commented Dec 19, 2024

> One other improvement that could be made: when I run the command, there's no easy-to-use output, and I need to go dig through the logs to find out what even happened.
>
> [screenshot: GitHub Actions build summary]
>
> Is it possible to customize the "build summary" with basic information about the run?

@universalmind303 Yes, I found that annoying as well. I'm working on it right now.

My current thought is to produce an output CSV file which can be downloaded and viewed. It would list the queries, how long each one took, and any failures observed.
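For illustration, writing such a CSV could look like this minimal sketch (the column names and the shape of `results` are assumptions, not the final format):

```python
import csv

# Hypothetical per-question results: (question number, seconds, error or None).
results = [(1, 12.4, None), (2, 8.9, None), (3, None, "worker ran out of memory")]

with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["question", "duration_s", "status", "error"])
    for question, duration, error in results:
        writer.writerow([question, duration, "failed" if error else "passed", error or ""])
```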

raunakab changed the title from "feat: tpcds GHA launcher" to "feat: tpch + tpcds GHA launcher" Dec 19, 2024
@raunakab (Contributor, Author) commented:

@universalmind303, here is another PR which aims to make the outputs of runs nicer to visualize: #3625.

The first run is still running right now, but you should be able to see an output.csv file uploaded to GitHub for you to download and view.

The run is here:
https://github.com/Eventual-Inc/Daft/actions/runs/12420945783

Comment on lines +1 to +8
# /// script
# requires-python = ">=3.12"
# dependencies = [
# "ray[default]",
# "getdaft",
# ]
# ///

raunakab (Contributor, Author):

This is just to make it possible to run this via uv run benchmarking/tpch/ray_job_runner.py.

@jaychia (Contributor) commented Dec 20, 2024

WRT discoverability, once we have more concrete workflows we can start organizing things as uv tools

https://docs.astral.sh/uv/guides/tools/#running-tools

We can probably have a daft-bench tool, which would be its own CLI invoked from uv. It could include things such as data generation, running benchmarks, etc.
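As a rough illustration, such a CLI could be sketched like this (the daft-bench name, subcommands, and flags are all hypothetical; the entry point would be registered as a console script so that `uv tool run daft-bench` / `uvx daft-bench` can discover it):

```python
# A hypothetical `daft-bench` entry point; names and flags are illustrative only.
import argparse

def main() -> None:
    parser = argparse.ArgumentParser(prog="daft-bench", description="Daft benchmarking tools")
    subparsers = parser.add_subparsers(dest="command", required=True)

    tpch = subparsers.add_parser("tpch", help="Run TPC-H benchmarks")
    tpch.add_argument("--scale-factor", type=int, default=2)
    tpch.add_argument("--num-partitions", type=int, default=2)
    tpch.add_argument("--questions", type=str, default="*")

    tpcds = subparsers.add_parser("tpcds", help="Run TPC-DS benchmarks")
    tpcds.add_argument("--scale-factor", type=int, default=100)
    tpcds.add_argument("--questions", type=str, default="*")

    args = parser.parse_args()
    print(f"would run {args.command} with {vars(args)}")  # dispatch to the actual runners here

if __name__ == "__main__":
    main()
```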

@jaychia (Contributor) left a comment

Mostly LGTM, some comments

"--questions", type=str, required=False, default="*", help="A comma separated list of questions to run"
)
parser.add_argument("--scale-factor", type=int, required=False, default=2, help="The scale factor to run on")
parser.add_argument("--cluster-profile", type=str, required=False, help="The ray cluster configuration to run on")
jaychia (Contributor):

Consider using argparse `choices` here to constrain the accepted values.
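If that's the suggestion, a minimal sketch (every profile name other than `medium-x86` is made up here):

```python
import argparse

parser = argparse.ArgumentParser()
# Constrain --cluster-profile to a fixed set of values; only "medium-x86"
# appears in this thread, the other names are hypothetical.
parser.add_argument(
    "--cluster-profile",
    type=str,
    required=False,
    choices=["small-x86", "medium-x86", "large-x86"],
    help="The ray cluster configuration to run on",
)
```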

parser.add_argument(
"--questions", type=str, required=False, default="*", help="A comma separated list of questions to run"
)
parser.add_argument("--scale-factor", type=int, required=False, default=2, help="The scale factor to run on")
jaychia (Contributor):

Same here: `choices` could constrain the scale factor, as in the sketch above.

type=str,
required=False,
help="A comma separated list of environment variables to pass to ray job",
)
jaychia (Contributor):

Usually CLIs take a list like so:

--env-var A=1 --env-var B=1
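A minimal sketch of that pattern with argparse's `action="append"` (the flag name follows the example above; the rest is illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
# Each occurrence of --env-var appends one KEY=VALUE entry to a list.
parser.add_argument(
    "--env-var",
    action="append",
    default=[],
    metavar="KEY=VALUE",
    help="An environment variable to pass to the ray job; repeat for multiple",
)

args = parser.parse_args(["--env-var", "A=1", "--env-var", "B=1"])
env = dict(pair.split("=", 1) for pair in args.env_var)  # {"A": "1", "B": "1"}
```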

commit_hash = (
subprocess.check_output(["git", "rev-parse", branch_name], stderr=subprocess.STDOUT).strip().decode("utf-8")
)
return name, commit_hash
jaychia (Contributor):

Rename this file to git_utils.py?

except ValueError:
raise ValueError(f"Invalid question item; expected a number or a range, instead got {item}")

return nums
jaychia (Contributor):

I didn't realize your question parsing logic was so complex until reading the code.

I think to keep things simple, your workflows can just:

  1. Take in a comma-separated list of questions
  2. Otherwise, run all questions

This utility function should just be 4 lines:

if questions is None:
    return list(range(1, total_number_of_questions + 1))
else:
    return [int(q) for q in questions.split(",")]

You can perform validation logic somewhere else with a regex if you want
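A minimal sketch of that regex check (assuming `questions` is the raw string from the CLI):

```python
import re

questions = "1,5,10"  # example input; in the tool this would come from the CLI

# Accept only a comma-separated list of question numbers.
if questions is not None and not re.fullmatch(r"\d+(,\d+)*", questions):
    raise ValueError(f"Expected a comma-separated list of question numbers, got {questions!r}")
```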

...

if "-" not in item:
raise ValueError("...")
jaychia (Contributor):

What's this?

nums.append(str(num))
continue
except ValueError:
...
jaychia (Contributor):

What's this?
