Update README.md #8

Open: wants to merge 5 commits into main

@pbalcer (Owner) commented May 20, 2024

This patch adds a script for running compute-benchmarks
(https://github.com/intel/compute-benchmarks/) and a corresponding
GitHub Actions workflow that runs those benchmarks when triggered
by a PR comment such as:

/benchmarks-level-zero --env UR_L0_IMMEDIATE_COMMANDLISTS_BATCH_EVENT_COMPLETIONS=1

Additional arguments can be appended to the end of the line. Once the build has finished,
the results are posted as a comment.

For now, this runs only a single scenario, the SubmitKernel test of api_overhead_benchmark_sycl,
but it will expand over time to cover more.
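As a rough illustration of the trigger syntax, the sketch below splits such a comment into environment overrides, the --save/--compare names used later in this thread, and any remaining arguments. It is a hypothetical parser (the name parse_benchmark_comment and its return shape are assumptions), not the workflow's actual code.

```python
import argparse
import shlex

TRIGGER = "/benchmarks-level-zero"


def parse_benchmark_comment(comment):
    """Parse a PR comment such as '/benchmarks-level-zero --env A=1 --save baseline'.

    Hypothetical parser: returns None when the comment is not a benchmark trigger.
    """
    tokens = shlex.split(comment.strip())
    if not tokens or tokens[0] != TRIGGER:
        return None
    parser = argparse.ArgumentParser(prog=TRIGGER, add_help=False)
    # --env may be given several times, each as NAME=VALUE.
    parser.add_argument("--env", action="append", default=[])
    parser.add_argument("--save", metavar="NAME")
    parser.add_argument("--compare", metavar="NAME")
    args, extra = parser.parse_known_args(tokens[1:])
    env = {}
    for item in args.env:
        name, _, value = item.partition("=")
        env[name] = value
    # Anything argparse did not recognise is kept for the benchmark script.
    return {"env": env, "save": args.save, "compare": args.compare, "extra": extra}
```

For example, the first trigger in this thread yields one environment override plus save="baseline", while an ordinary review comment yields None and is ignored.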

@pbalcer (Owner, Author) commented May 20, 2024

/benchmarks-level-zero --env UR_L0_IMMEDIATE_COMMANDLISTS_BATCH_EVENT_COMPLETIONS=1 --save baseline

Compute Benchmarks L0 run:
https://github.com/pbalcer/unified-runtime/actions/runs/9157096738
Job status: success. Test status: success.

Benchmark Results

Chart

xychart-beta
title "api_overhead_benchmark_sycl (lower is better)"
x-axis ["Batched In Order", "Batched Out Of Order", "Immediate In Order", "Immediate Out Of Order"]
y-axis "mean execution time per 10 kernels (in μs)" 0 --> 100.0
bar [52.319, 34.515, 53.157, 34.171]
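The chart is Mermaid xychart-beta source. A helper along the following lines could emit it from the four scenario means; render_xychart and its parameters are hypothetical, and the y-axis label and 0 --> 100.0 range are copied from this output rather than from the script. Passing baseline adds the "line" series that appears in the later comparison runs.

```python
DEFAULT_Y_LABEL = "mean execution time per 10 kernels (in μs)"


def render_xychart(title, labels, means, baseline=None, y_label=DEFAULT_Y_LABEL, y_max=100.0):
    """Build Mermaid xychart-beta source: bars for this run, optional line for a baseline."""
    lines = ["xychart-beta", f'title "{title}"']
    quoted = ", ".join(f'"{label}"' for label in labels)
    lines.append(f"x-axis [{quoted}]")
    lines.append(f'y-axis "{y_label}" 0 --> {y_max}')
    lines.append(f"bar [{', '.join(str(m) for m in means)}]")
    if baseline is not None:
        # Overlay the saved baseline so regressions stand out against the bars.
        lines.append(f"line [{', '.join(str(b) for b in baseline)}]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_xychart(
        "api_overhead_benchmark_sycl (lower is better)",
        ["Batched In Order", "Batched Out Of Order", "Immediate In Order", "Immediate Out Of Order"],
        [52.319, 34.515, 53.157, 34.171],
    ))
```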

Comparison

Comparison data not found. No comparison performed.
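One plausible way to support --save NAME and --compare NAME is to persist each run's mean times under the given name and reload them on a later run, falling back to the message above when nothing has been stored yet. The sketch assumes one JSON file per name in a results directory; that layout and the names save_results/load_results are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical storage layout: one JSON file per saved name, e.g. "baseline.json".


def save_results(name, results, directory):
    """Persist {scenario name: mean time in us} under the given name."""
    path = Path(directory) / f"{name}.json"
    path.write_text(json.dumps(results, indent=2))
    return path


def load_results(name, directory):
    """Return previously saved results, or None if nothing was saved under name."""
    path = Path(directory) / f"{name}.json"
    if not path.exists():
        print("Comparison data not found. No comparison performed.")
        return None
    return json.loads(path.read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        load_results("baseline", tmp)  # nothing saved yet: prints the fallback message
        save_results("baseline", {"Batched In Order": 52.319}, tmp)
        print(load_results("baseline", tmp))
```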

Details

Batched In Order

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0
UR_L0_IMMEDIATE_COMMANDLISTS_BATCH_EVENT_COMPLETIONS=1

Command:

/home/pmdk/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_l0 --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=l0 Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),52.319,52.192,1.53%,51.105,81.569,[CPU],[us]
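Turning this row back into numbers is simple, with two details visible in the output: the header lists seven columns but the row has eight fields, because the Type value [CPU],[us] contains a comma, and StdDev is printed as a percentage. A minimal sketch using Python's csv module (the function name parse_result is hypothetical):

```python
import csv
import io


def parse_result(line):
    """Parse one row printed by a compute-benchmarks binary with --csv --noHeaders."""
    row = next(csv.reader(io.StringIO(line.strip())))
    # Header: TestCase,Mean,Median,StdDev,Min,Max,Type. Type is printed as
    # "[CPU],[us]", so it occupies the last two fields of the row.
    test_case, mean, median, stddev, low, high, *type_fields = row
    return {
        "test_case": test_case,
        "mean": float(mean),
        "median": float(median),
        "stddev_pct": float(stddev.rstrip("%")),  # printed as a percentage, e.g. "1.53%"
        "min": float(low),
        "max": float(high),
        "type": type_fields,  # e.g. ["[CPU]", "[us]"]
    }
```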

Batched Out Of Order

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0
UR_L0_IMMEDIATE_COMMANDLISTS_BATCH_EVENT_COMPLETIONS=1

Command:

/home/pmdk/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_l0 --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=l0 Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),34.515,34.253,3.05%,32.832,55.529,[CPU],[us]

Immediate In Order

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1
UR_L0_IMMEDIATE_COMMANDLISTS_BATCH_EVENT_COMPLETIONS=1

Command:

/home/pmdk/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_l0 --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=l0 Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),53.157,52.438,3.31%,51.132,67.065,[CPU],[us]

Immediate Out Of Order

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1
UR_L0_IMMEDIATE_COMMANDLISTS_BATCH_EVENT_COMPLETIONS=1

Command:

/home/pmdk/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_l0 --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=l0 Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),34.171,34.074,3.92%,32.980,150.076,[CPU],[us]
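Across these four sections only two settings change: UR_L0_USE_IMMEDIATE_COMMANDLISTS (0 for "Batched", 1 for "Immediate") and --Ioq (1 for "In Order", 0 for "Out Of Order"). Every other flag is fixed, and the value passed via --env is added to all four scenarios. The sketch below builds that matrix; SCENARIOS, build_scenario and run_scenario are hypothetical names, and bin_dir stands in for the compute-benchmarks-build/bin directory on the runner.

```python
import os
import subprocess

# Scenario name -> (UR_L0_USE_IMMEDIATE_COMMANDLISTS, --Ioq), as in the sections above.
SCENARIOS = {
    "Batched In Order": (0, 1),
    "Batched Out Of Order": (0, 0),
    "Immediate In Order": (1, 1),
    "Immediate Out Of Order": (1, 0),
}


def build_scenario(name, bin_dir, extra_env=None, bench="api_overhead_benchmark_l0"):
    """Return (env overrides, argv) for one SubmitKernel scenario (hypothetical helper)."""
    immediate, ioq = SCENARIOS[name]
    env = {"UR_L0_USE_IMMEDIATE_COMMANDLISTS": str(immediate)}
    env.update(extra_env or {})  # e.g. values from --env in the trigger comment
    argv = [
        os.path.join(bin_dir, bench),
        "--test=SubmitKernel",
        f"--Ioq={ioq}",
        "--DiscardEvents=0",
        "--MeasureCompletion=0",
        "--iterations=10000",
        "--Profiling=0",
        "--NumKernels=10",
        "--KernelExecTime=1",
        "--csv",
        "--noHeaders",
    ]
    return env, argv


def run_scenario(name, bin_dir, extra_env=None, bench="api_overhead_benchmark_l0"):
    """Run one scenario and return its CSV output line."""
    env, argv = build_scenario(name, bin_dir, extra_env, bench)
    result = subprocess.run(
        argv, env={**os.environ, **env}, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

run_scenario merges the overrides into the current environment instead of replacing it, so variables such as PATH still reach the benchmark binary. The later runs in this thread would correspond to bench="api_overhead_benchmark_sycl".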

@pbalcer (Owner, Author) commented May 20, 2024

/benchmarks-level-zero --env UR_L0_IMMEDIATE_COMMANDLISTS_BATCH_EVENT_COMPLETIONS=1 --compare baseline

Compute Benchmarks L0 run:
https://github.com/pbalcer/unified-runtime/actions/runs/9157189347
Job status: success. Test status: success.

Benchmark Results

Chart

xychart-beta
title "api_overhead_benchmark_sycl (lower is better)"
x-axis ["Batched In Order", "Batched Out Of Order", "Immediate In Order", "Immediate Out Of Order"]
y-axis "mean execution time per 10 kernels (in μs)" 0 --> 100.0
bar [49.456, 33.692, 27.009, 26.39]
line [52.319, 34.515, 53.157, 34.171]

Comparison

Comparison with previous data:

  • Batched In Order: -5.47%
  • Batched Out Of Order: -2.38%
  • Immediate In Order: -49.19%
  • Immediate Out Of Order: -22.77%
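These percentages match a plain relative change against the saved baseline, (current - baseline) / baseline × 100; for Batched In Order, (49.456 - 52.319) / 52.319 ≈ -5.47%. Since lower is better, a negative value means the run got faster. A minimal sketch that reproduces this list (percent_change and format_comparison are hypothetical names):

```python
def percent_change(current, baseline):
    """Relative change of a mean time against its baseline, in percent."""
    return (current - baseline) / baseline * 100.0


def format_comparison(names, current, baseline):
    """Format one 'Name: +x.xx%' line per scenario; negative means faster."""
    return [
        f"{name}: {percent_change(cur, base):+.2f}%"
        for name, cur, base in zip(names, current, baseline)
    ]


NAMES = ["Batched In Order", "Batched Out Of Order", "Immediate In Order", "Immediate Out Of Order"]

if __name__ == "__main__":
    current = [49.456, 33.692, 27.009, 26.39]
    saved = [52.319, 34.515, 53.157, 34.171]
    for line in format_comparison(NAMES, current, saved):
        print(line)
```

Comparing a run against an identical baseline gives +0.00% for every scenario.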

Details

Batched In Order

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0
UR_L0_IMMEDIATE_COMMANDLISTS_BATCH_EVENT_COMPLETIONS=1

Command:

/home/pmdk/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_l0 --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=l0 Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),49.456,49.354,1.48%,48.032,61.412,[CPU],[us]

Batched Out Of Order

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0
UR_L0_IMMEDIATE_COMMANDLISTS_BATCH_EVENT_COMPLETIONS=1

Command:

/home/pmdk/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_l0 --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=l0 Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),33.692,33.370,3.30%,32.019,51.200,[CPU],[us]

Immediate In Order

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1
UR_L0_IMMEDIATE_COMMANDLISTS_BATCH_EVENT_COMPLETIONS=1

Command:

/home/pmdk/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_l0 --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=l0 Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),27.009,25.353,12.38%,23.875,46.167,[CPU],[us]

Immediate Out Of Order

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1
UR_L0_IMMEDIATE_COMMANDLISTS_BATCH_EVENT_COMPLETIONS=1

Command:

/home/pmdk/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_l0 --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=l0 Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),26.390,25.451,11.08%,22.776,38.362,[CPU],[us]

@pbalcer (Owner, Author) commented May 20, 2024

/benchmarks-level-zero --save baseline

Compute Benchmarks L0 run:
https://github.com/pbalcer/unified-runtime/actions/runs/9157272368
Job status: cancelled. Test status: skipped.

@pbalcer (Owner, Author) commented May 20, 2024

/benchmarks-level-zero --save baseline

Compute Benchmarks L0 run:
https://github.com/pbalcer/unified-runtime/actions/runs/9157294453
Job status: success. Test status: success.

Benchmark Results

Chart

xychart-beta
title "api_overhead_benchmark_sycl (lower is better)"
x-axis ["Batched In Order", "Batched Out Of Order", "Immediate In Order", "Immediate Out Of Order"]
y-axis "mean execution time per 10 kernels (in μs)" 0 --> 100.0
bar [26.188, 44.666, 30.889, 48.461]
line [26.188, 44.666, 30.889, 48.461]

Comparison

Comparison with previous data:

  • Batched In Order: +0.00%
  • Batched Out Of Order: +0.00%
  • Immediate In Order: +0.00%
  • Immediate Out Of Order: +0.00%

Details

Batched In Order

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0

Command:

/home/pmdk/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),26.188,24.938,13.79%,24.203,190.529,[CPU],[us]

Batched Out Of Order

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0

Command:

/home/pmdk/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),44.666,44.762,5.45%,27.310,215.884,[CPU],[us]

Immediate In Order

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1

Command:

/home/pmdk/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),30.889,28.060,16.98%,26.813,64.287,[CPU],[us]

Immediate Out Of Order

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1

Command:

/home/pmdk/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),48.461,48.397,1.26%,46.630,65.479,[CPU],[us]

@pbalcer (Owner, Author) commented May 20, 2024

/benchmarks-level-zero --env UR_L0_IMMEDIATE_COMMANDLISTS_BATCH_EVENT_COMPLETIONS=1

Compute Benchmarks L0 run:
https://github.com/pbalcer/unified-runtime/actions/runs/9157375928
Job status: success. Test status: success.

Benchmark Results

Chart

xychart-beta
title "api_overhead_benchmark_sycl (lower is better)"
x-axis ["Batched In Order", "Batched Out Of Order", "Immediate In Order", "Immediate Out Of Order"]
y-axis "mean execution time per 10 kernels (in μs)" 0 --> 100.0
bar [46.693, 44.905, 53.893, 49.958]
line [26.188, 44.666, 30.889, 48.461]

Comparison

Comparison with previous data:

  • Batched In Order: +78.30%
  • Batched Out Of Order: +0.54%
  • Immediate In Order: +74.47%
  • Immediate Out Of Order: +3.09%

Details

Batched In Order

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0
UR_L0_IMMEDIATE_COMMANDLISTS_BATCH_EVENT_COMPLETIONS=1

Command:

/home/pmdk/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),46.693,46.724,4.88%,28.176,210.150,[CPU],[us]

Batched Out Of Order

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0
UR_L0_IMMEDIATE_COMMANDLISTS_BATCH_EVENT_COMPLETIONS=1

Command:

/home/pmdk/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),44.905,44.325,4.82%,43.274,211.954,[CPU],[us]

Immediate In Order

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1
UR_L0_IMMEDIATE_COMMANDLISTS_BATCH_EVENT_COMPLETIONS=1

Command:

/home/pmdk/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),53.893,53.138,5.09%,51.210,112.757,[CPU],[us]

Immediate Out Of Order

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1
UR_L0_IMMEDIATE_COMMANDLISTS_BATCH_EVENT_COMPLETIONS=1

Command:

/home/pmdk/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),49.958,49.890,1.80%,48.124,101.896,[CPU],[us]

@pbalcer force-pushed the main branch 3 times, most recently from 6ca52a5 to 844c209 on July 29, 2024 11:21
Labels: None yet
Projects: None yet
1 participant