Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: OOM investigations of map-only pipelines #3558

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

jaychia
Copy link
Contributor

@jaychia jaychia commented Dec 12, 2024

Investigations on OOM issues for map-only pipelines.

Tracking actual memory usage

Adds memray tracking to report the actual memory usage vs requested memory usage on our Ray tracing tool

Mock workload

Adds benchmarking/ooms/big_task_heap_usage.py which is a script that runs 2 UDFs with big heap memory usage (but small inputs/outputs)

This can be run using:

uv run tools/gha_run_cluster_job.py benchmarking/ooms/big_task_heap_usage.py --daft-wheel-url  --branch jay/big-materialization-ooms -- --num-partitions=64 --inflation-factor=8000000 --deflation-factor=8000000

Unit tests for memory usage

Unit tests to check our requested vs actual memory usage for selected plans

@github-actions github-actions bot added the perf label Dec 12, 2024
Copy link

codspeed-hq bot commented Dec 12, 2024

CodSpeed Performance Report

Merging #3558 will degrade performances by 28.56%

Comparing jay/big-materialization-ooms (60eed9d) with main (e148248)

Summary

❌ 1 regressions
✅ 26 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main jay/big-materialization-ooms Change
test_iter_rows_first_row[100 Small Files] 142.9 ms 200.1 ms -28.56%

Copy link

codecov bot commented Dec 12, 2024

Codecov Report

Attention: Patch coverage is 42.10526% with 11 lines in your changes missing coverage. Please review.

Project coverage is 77.79%. Comparing base (35ed63c) to head (2c4246f).

Files with missing lines Patch % Lines
daft/runners/ray_tracing.py 9.09% 10 Missing ⚠️
daft/runners/ray_metrics.py 87.50% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3558      +/-   ##
==========================================
- Coverage   77.81%   77.79%   -0.02%     
==========================================
  Files         716      716              
  Lines       87987    88000      +13     
==========================================
- Hits        68463    68461       -2     
- Misses      19524    19539      +15     
Files with missing lines Coverage Δ
daft/runners/ray_metrics.py 58.24% <87.50%> (+2.94%) ⬆️
daft/runners/ray_tracing.py 61.29% <9.09%> (-1.89%) ⬇️

... and 4 files with indirect coverage changes

@jaychia jaychia force-pushed the jay/big-materialization-ooms branch from 7e46dac to 2c4246f Compare December 13, 2024 02:32
@jaychia jaychia force-pushed the jay/big-materialization-ooms branch from 2b08029 to 60eed9d Compare December 17, 2024 07:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant