[BugFix] adapt log-prob TD batch-size to advantage shape in PPO #2756

Merged (1 commit) on Feb 4, 2025


vmoens (Contributor) commented Feb 4, 2025

[ghstack-poisoned]
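The PR description carries only the ghstack marker, so the diff itself is not shown on this page. As a generic illustration of the failure mode the title points at (not the PR's code), a PPO-style loss multiplies a per-sample term derived from the log-prob (the importance ratio) by the advantage. Under NumPy/PyTorch broadcasting, a log-prob of shape [B] combined with an advantage of shape [B, 1] silently produces a [B, B] result instead of raising. The sketch below models this with plain shape tuples. `broadcast_shape` and `align_to_advantage` are hypothetical helpers, and appending trailing singleton dimensions is an assumed way of adapting the log-prob batch size, not necessarily what TorchRL does.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def broadcast_shape(a: Sequence[int], b: Sequence[int]) -> Shape:
    """Result shape of an elementwise op under NumPy/PyTorch broadcasting rules."""
    n = max(len(a), len(b))
    # Right-align both shapes by left-padding the shorter one with 1s.
    pa = (1,) * (n - len(a)) + tuple(a)
    pb = (1,) * (n - len(b)) + tuple(b)
    out = []
    for x, y in zip(pa, pb):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"shapes {tuple(a)} and {tuple(b)} are not broadcastable")
    return tuple(out)


def align_to_advantage(log_prob_shape: Sequence[int], advantage_shape: Sequence[int]) -> Shape:
    """Hypothetical fix: append trailing singleton dims so log-prob matches advantage."""
    lp, adv = tuple(log_prob_shape), tuple(advantage_shape)
    while len(lp) < len(adv) and adv[: len(lp)] == lp and adv[len(lp)] == 1:
        lp = lp + (1,)
    if lp != adv:
        raise ValueError(f"cannot align log-prob shape {lp} to advantage shape {adv}")
    return lp


B = 8
# Per-sample log-prob of shape [B] against an advantage of shape [B, 1]:
print(broadcast_shape((B,), (B, 1)))  # (8, 8): a silent B x B outer product
aligned = align_to_advantage((B,), (B, 1))
print(aligned, broadcast_shape(aligned, (B, 1)))  # (8, 1) (8, 1)
```

Because the mismatched case does not raise, the loss still reduces to a scalar and training continues, which is why this kind of shape bug is easy to miss without an explicit alignment step.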
vmoens added a commit that referenced this pull request Feb 4, 2025
ghstack-source-id: 8ccd12f65f4a74a42356a630e0e5a1f015337d4a
Pull Request resolved: #2756

pytorch-bot bot commented Feb 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2756

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 Cancelled Job, 9 Unrelated Failures

As of commit 29cf596 with merge base 2f8c118:

CANCELLED JOB - The following job was cancelled. Please retry:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label Feb 4, 2025
vmoens added the bug label Feb 4, 2025

github-actions bot commented Feb 4, 2025

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 149. Improved: 23. Worsened: 9.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | ---: | ---: | ---: | ---: | ---: |
| test_simple | 0.5703s | 0.4698s | 2.1285 Ops/s | 2.0893 Ops/s | +1.87% |
| test_transformed | 1.0618s | 0.9484s | 1.0544 Ops/s | 1.0403 Ops/s | +1.36% |
| test_serial | 1.5175s | 1.4091s | 0.7097 Ops/s | 0.6875 Ops/s | +3.23% |
| test_parallel | 1.3820s | 1.2478s | 0.8014 Ops/s | 0.7746 Ops/s | +3.46% |
| test_step_mdp_speed[True-True-True-True-True] | 0.1978ms | 30.0946μs | 33.2286 KOps/s | 33.6611 KOps/s | -1.28% |
| test_step_mdp_speed[True-True-True-True-False] | 76.6840μs | 17.7412μs | 56.3658 KOps/s | 56.1111 KOps/s | +0.45% |
| test_step_mdp_speed[True-True-True-False-True] | 72.8660μs | 16.9887μs | 58.8626 KOps/s | 59.4916 KOps/s | -1.06% |
| test_step_mdp_speed[True-True-True-False-False] | 46.3070μs | 10.0071μs | 99.9286 KOps/s | 99.4302 KOps/s | +0.50% |
| test_step_mdp_speed[True-True-False-True-True] | 88.8370μs | 32.0257μs | 31.2249 KOps/s | 31.2163 KOps/s | +0.03% |
| test_step_mdp_speed[True-True-False-True-False] | 65.1720μs | 19.6400μs | 50.9165 KOps/s | 50.6985 KOps/s | +0.43% |
| test_step_mdp_speed[True-True-False-False-True] | 58.1790μs | 18.9557μs | 52.7546 KOps/s | 52.9515 KOps/s | -0.37% |
| test_step_mdp_speed[True-True-False-False-False] | 52.6690μs | 11.8692μs | 84.2515 KOps/s | 86.0962 KOps/s | -2.14% |
| test_step_mdp_speed[True-False-True-True-True] | 74.1790μs | 33.7345μs | 29.6433 KOps/s | 29.1516 KOps/s | +1.69% |
| test_step_mdp_speed[True-False-True-True-False] | 0.5228ms | 21.5189μs | 46.4708 KOps/s | 46.6131 KOps/s | -0.31% |
| test_step_mdp_speed[True-False-True-False-True] | 61.3750μs | 18.7134μs | 53.4377 KOps/s | 53.5296 KOps/s | -0.17% |
| test_step_mdp_speed[True-False-True-False-False] | 67.6270μs | 11.6606μs | 85.7589 KOps/s | 84.4941 KOps/s | +1.50% |
| test_step_mdp_speed[True-False-False-True-True] | 0.1120ms | 35.2160μs | 28.3962 KOps/s | 27.9199 KOps/s | +1.71% |
| test_step_mdp_speed[True-False-False-True-False] | 73.8890μs | 22.9811μs | 43.5139 KOps/s | 43.1285 KOps/s | +0.89% |
| test_step_mdp_speed[True-False-False-False-True] | 61.7460μs | 20.4156μs | 48.9822 KOps/s | 48.8180 KOps/s | +0.34% |
| test_step_mdp_speed[True-False-False-False-False] | 54.3820μs | 13.5114μs | 74.0117 KOps/s | 73.3610 KOps/s | +0.89% |
| test_step_mdp_speed[False-True-True-True-True] | 76.7040μs | 33.4599μs | 29.8865 KOps/s | 29.4828 KOps/s | +1.37% |
| test_step_mdp_speed[False-True-True-True-False] | 67.5070μs | 21.4376μs | 46.6470 KOps/s | 46.4100 KOps/s | +0.51% |
| test_step_mdp_speed[False-True-True-False-True] | 85.1660μs | 21.3066μs | 46.9337 KOps/s | 47.1483 KOps/s | -0.46% |
| test_step_mdp_speed[False-True-True-False-False] | 51.0060μs | 13.0041μs | 76.8986 KOps/s | 75.0948 KOps/s | +2.40% |
| test_step_mdp_speed[False-True-False-True-True] | 82.2540μs | 35.2076μs | 28.4029 KOps/s | 27.9230 KOps/s | +1.72% |
| test_step_mdp_speed[False-True-False-True-False] | 94.4570μs | 23.0163μs | 43.4474 KOps/s | 42.9999 KOps/s | +1.04% |
| test_step_mdp_speed[False-True-False-False-True] | 2.8316ms | 23.1990μs | 43.1053 KOps/s | 42.2901 KOps/s | +1.93% |
| test_step_mdp_speed[False-True-False-False-False] | 76.9340μs | 14.7996μs | 67.5694 KOps/s | 65.3955 KOps/s | +3.32% |
| test_step_mdp_speed[False-False-True-True-True] | 81.2420μs | 36.8976μs | 27.1020 KOps/s | 26.7026 KOps/s | +1.50% |
| test_step_mdp_speed[False-False-True-True-False] | 67.8180μs | 24.8974μs | 40.1649 KOps/s | 39.4540 KOps/s | +1.80% |
| test_step_mdp_speed[False-False-True-False-True] | 59.2610μs | 22.9931μs | 43.4913 KOps/s | 43.1697 KOps/s | +0.75% |
| test_step_mdp_speed[False-False-True-False-False] | 0.6621ms | 14.7927μs | 67.6009 KOps/s | 66.6460 KOps/s | +1.43% |
| test_step_mdp_speed[False-False-False-True-True] | 98.1640μs | 38.5113μs | 25.9664 KOps/s | 25.2391 KOps/s | +2.88% |
| test_step_mdp_speed[False-False-False-True-False] | 69.3010μs | 26.5435μs | 37.6739 KOps/s | 35.5893 KOps/s | **+5.86%** |
| test_step_mdp_speed[False-False-False-False-True] | 69.5000μs | 24.6742μs | 40.5281 KOps/s | 40.5632 KOps/s | -0.09% |
| test_step_mdp_speed[False-False-False-False-False] | 65.3530μs | 16.4727μs | 60.7065 KOps/s | 60.6397 KOps/s | +0.11% |
| test_values[generalized_advantage_estimate-True-True] | 13.8481ms | 10.1741ms | 98.2888 Ops/s | 100.6914 Ops/s | -2.39% |
| test_values[vec_generalized_advantage_estimate-True-True] | 30.0901ms | 26.9198ms | 37.1473 Ops/s | 40.6780 Ops/s | **-8.68%** |
| test_values[td0_return_estimate-False-False] | 0.3261ms | 0.1913ms | 5.2267 KOps/s | 4.8987 KOps/s | **+6.69%** |
| test_values[td1_return_estimate-False-False] | 26.3179ms | 25.3991ms | 39.3715 Ops/s | 39.5291 Ops/s | -0.40% |
| test_values[vec_td1_return_estimate-False-False] | 29.6980ms | 27.2715ms | 36.6683 Ops/s | 40.2282 Ops/s | **-8.85%** |
| test_values[td_lambda_return_estimate-True-False] | 39.3702ms | 36.8322ms | 27.1502 Ops/s | 27.3178 Ops/s | -0.61% |
| test_values[vec_td_lambda_return_estimate-True-False] | 32.5617ms | 26.9642ms | 37.0862 Ops/s | 38.6348 Ops/s | -4.01% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 9.0337ms | 8.7960ms | 113.6881 Ops/s | 115.8604 Ops/s | -1.87% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 2.3332ms | 1.8751ms | 533.3056 Ops/s | 494.2849 Ops/s | **+7.89%** |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.4969ms | 0.3747ms | 2.6687 KOps/s | 2.6063 KOps/s | +2.40% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 48.4851ms | 46.1411ms | 21.6726 Ops/s | 23.9155 Ops/s | **-9.38%** |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 4.5021ms | 3.5390ms | 282.5667 Ops/s | 276.7779 Ops/s | +2.09% |
| test_dqn_speed[False-None] | 8.2284ms | 1.4078ms | 710.3386 Ops/s | 685.4056 Ops/s | +3.64% |
| test_dqn_speed[False-backward] | 2.2747ms | 1.9267ms | 519.0303 Ops/s | 517.5579 Ops/s | +0.28% |
| test_dqn_speed[True-None] | 0.6999ms | 0.4844ms | 2.0645 KOps/s | 2.0426 KOps/s | +1.07% |
| test_dqn_speed[True-backward] | 0.9799ms | 0.9101ms | 1.0988 KOps/s | 1.0425 KOps/s | **+5.40%** |
| test_dqn_speed[reduce-overhead-None] | 0.7660ms | 0.4908ms | 2.0376 KOps/s | 1.9252 KOps/s | **+5.84%** |
| test_dqn_speed[reduce-overhead-backward] | 1.1926ms | 0.9356ms | 1.0688 KOps/s | 1.0603 KOps/s | +0.80% |
| test_ddpg_speed[False-None] | 3.6166ms | 2.9909ms | 334.3500 Ops/s | 317.9629 Ops/s | **+5.15%** |
| test_ddpg_speed[False-backward] | 4.9683ms | 4.1543ms | 240.7161 Ops/s | 229.4340 Ops/s | +4.92% |
| test_ddpg_speed[True-None] | 1.8373ms | 1.2507ms | 799.5386 Ops/s | 783.2186 Ops/s | +2.08% |
| test_ddpg_speed[True-backward] | 2.4914ms | 2.1889ms | 456.8476 Ops/s | 392.4973 Ops/s | **+16.40%** |
| test_ddpg_speed[reduce-overhead-None] | 1.9331ms | 1.2346ms | 809.9879 Ops/s | 803.8207 Ops/s | +0.77% |
| test_ddpg_speed[reduce-overhead-backward] | 2.3927ms | 2.1223ms | 471.1881 Ops/s | 427.7554 Ops/s | **+10.15%** |
| test_sac_speed[False-None] | 10.4855ms | 8.7573ms | 114.1898 Ops/s | 113.6807 Ops/s | +0.45% |
| test_sac_speed[False-backward] | 13.4790ms | 11.8566ms | 84.3411 Ops/s | 79.8165 Ops/s | **+5.67%** |
| test_sac_speed[True-None] | 3.3462ms | 2.3474ms | 426.0081 Ops/s | 428.2777 Ops/s | -0.53% |
| test_sac_speed[True-backward] | 5.0547ms | 4.1680ms | 239.9232 Ops/s | 237.8375 Ops/s | +0.88% |
| test_sac_speed[reduce-overhead-None] | 3.5056ms | 2.4030ms | 416.1532 Ops/s | 401.5137 Ops/s | +3.65% |
| test_sac_speed[reduce-overhead-backward] | 6.8016ms | 4.6352ms | 215.7403 Ops/s | 222.3206 Ops/s | -2.96% |
| test_redq_speed[False-None] | 23.6190ms | 16.0711ms | 62.2235 Ops/s | 69.9547 Ops/s | **-11.05%** |
| test_redq_speed[False-backward] | 33.0419ms | 25.6029ms | 39.0581 Ops/s | 41.7078 Ops/s | **-6.35%** |
| test_redq_speed[True-None] | 9.4850ms | 6.5758ms | 152.0720 Ops/s | 153.6597 Ops/s | -1.03% |
| test_redq_speed[True-backward] | 15.2215ms | 13.4897ms | 74.1304 Ops/s | 72.4153 Ops/s | +2.37% |
| test_redq_speed[reduce-overhead-None] | 9.7052ms | 6.4552ms | 154.9149 Ops/s | 161.5615 Ops/s | -4.11% |
| test_redq_speed[reduce-overhead-backward] | 15.6704ms | 13.8018ms | 72.4541 Ops/s | 73.5342 Ops/s | -1.47% |
| test_redq_deprec_speed[False-None] | 17.1484ms | 14.4201ms | 69.3476 Ops/s | 67.8358 Ops/s | +2.23% |
| test_redq_deprec_speed[False-backward] | 22.2765ms | 20.3508ms | 49.1382 Ops/s | 48.0107 Ops/s | +2.35% |
| test_redq_deprec_speed[True-None] | 5.5868ms | 4.5614ms | 219.2323 Ops/s | 218.2325 Ops/s | +0.46% |
| test_redq_deprec_speed[True-backward] | 10.2026ms | 9.3539ms | 106.9070 Ops/s | 99.9319 Ops/s | **+6.98%** |
| test_redq_deprec_speed[reduce-overhead-None] | 7.6041ms | 4.8971ms | 204.2029 Ops/s | 223.0352 Ops/s | **-8.44%** |
| test_redq_deprec_speed[reduce-overhead-backward] | 13.4020ms | 9.6547ms | 103.5765 Ops/s | 102.0840 Ops/s | +1.46% |
| test_td3_speed[False-None] | 9.8895ms | 8.7657ms | 114.0809 Ops/s | 110.7607 Ops/s | +3.00% |
| test_td3_speed[False-backward] | 14.9060ms | 12.0845ms | 82.7508 Ops/s | 83.8375 Ops/s | -1.30% |
| test_td3_speed[True-None] | 3.0569ms | 2.1070ms | 474.6141 Ops/s | 525.0167 Ops/s | **-9.60%** |
| test_td3_speed[True-backward] | 5.2144ms | 4.0484ms | 247.0099 Ops/s | 242.6747 Ops/s | +1.79% |
| test_td3_speed[reduce-overhead-None] | 2.9390ms | 1.9541ms | 511.7403 Ops/s | 507.3676 Ops/s | +0.86% |
| test_td3_speed[reduce-overhead-backward] | 5.1035ms | 4.1132ms | 243.1193 Ops/s | 224.0345 Ops/s | **+8.52%** |
| test_cql_speed[False-None] | 41.3437ms | 37.3918ms | 26.7438 Ops/s | 24.9606 Ops/s | **+7.14%** |
| test_cql_speed[False-backward] | 55.5531ms | 49.1014ms | 20.3660 Ops/s | 19.9538 Ops/s | +2.07% |
| test_cql_speed[True-None] | 18.0654ms | 16.8812ms | 59.2373 Ops/s | 60.0086 Ops/s | -1.29% |
| test_cql_speed[True-backward] | 27.1233ms | 24.2794ms | 41.1872 Ops/s | 42.1325 Ops/s | -2.24% |
| test_cql_speed[reduce-overhead-None] | 18.7542ms | 16.8394ms | 59.3845 Ops/s | 60.5125 Ops/s | -1.86% |
| test_cql_speed[reduce-overhead-backward] | 25.3038ms | 23.9415ms | 41.7685 Ops/s | 41.9646 Ops/s | -0.47% |
| test_a2c_speed[False-None] | 8.8840ms | 7.7825ms | 128.4937 Ops/s | 127.9439 Ops/s | +0.43% |
| test_a2c_speed[False-backward] | 15.7425ms | 15.0248ms | 66.5565 Ops/s | 64.1874 Ops/s | +3.69% |
| test_a2c_speed[True-None] | 5.6739ms | 4.0679ms | 245.8287 Ops/s | 245.5812 Ops/s | +0.10% |
| test_a2c_speed[True-backward] | 11.8483ms | 10.8752ms | 91.9527 Ops/s | 91.4783 Ops/s | +0.52% |
| test_a2c_speed[reduce-overhead-None] | 5.3017ms | 3.9233ms | 254.8843 Ops/s | 251.3309 Ops/s | +1.41% |
| test_a2c_speed[reduce-overhead-backward] | 11.4552ms | 10.7590ms | 92.9455 Ops/s | 89.5417 Ops/s | +3.80% |
| test_ppo_speed[False-None] | 8.4370ms | 7.8945ms | 126.6702 Ops/s | 124.2781 Ops/s | +1.92% |
| test_ppo_speed[False-backward] | 16.7533ms | 15.4886ms | 64.5636 Ops/s | 61.6992 Ops/s | +4.64% |
| test_ppo_speed[True-None] | 5.8798ms | 4.5153ms | 221.4712 Ops/s | 216.4036 Ops/s | +2.34% |
| test_ppo_speed[True-backward] | 11.8399ms | 10.6464ms | 93.9284 Ops/s | 92.9543 Ops/s | +1.05% |
| test_ppo_speed[reduce-overhead-None] | 5.5523ms | 4.3178ms | 231.6001 Ops/s | 205.2255 Ops/s | **+12.85%** |
| test_ppo_speed[reduce-overhead-backward] | 11.1487ms | 10.3833ms | 96.3085 Ops/s | 91.1768 Ops/s | **+5.63%** |
| test_reinforce_speed[False-None] | 8.3181ms | 6.8948ms | 145.0361 Ops/s | 139.2009 Ops/s | +4.19% |
| test_reinforce_speed[False-backward] | 12.2025ms | 10.2108ms | 97.9359 Ops/s | 93.9099 Ops/s | +4.29% |
| test_reinforce_speed[True-None] | 4.1822ms | 3.2240ms | 310.1699 Ops/s | 269.2794 Ops/s | **+15.19%** |
| test_reinforce_speed[True-backward] | 10.1209ms | 9.2963ms | 107.5694 Ops/s | 100.9987 Ops/s | **+6.51%** |
| test_reinforce_speed[reduce-overhead-None] | 4.9086ms | 3.5090ms | 284.9796 Ops/s | 275.4993 Ops/s | +3.44% |
| test_reinforce_speed[reduce-overhead-backward] | 10.4644ms | 9.7165ms | 102.9179 Ops/s | 97.6525 Ops/s | **+5.39%** |
| test_iql_speed[False-None] | 36.9796ms | 34.0377ms | 29.3792 Ops/s | 28.9793 Ops/s | +1.38% |
| test_iql_speed[False-backward] | 50.5970ms | 46.8566ms | 21.3417 Ops/s | 20.7810 Ops/s | +2.70% |
| test_iql_speed[True-None] | 12.8754ms | 11.7394ms | 85.1830 Ops/s | 81.8473 Ops/s | +4.08% |
| test_iql_speed[True-backward] | 24.4119ms | 23.3061ms | 42.9072 Ops/s | 40.0196 Ops/s | **+7.22%** |
| test_iql_speed[reduce-overhead-None] | 13.8591ms | 11.7215ms | 85.3133 Ops/s | 78.2601 Ops/s | **+9.01%** |
| test_iql_speed[reduce-overhead-backward] | 26.4442ms | 23.9865ms | 41.6901 Ops/s | 41.7700 Ops/s | -0.19% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 7.8093ms | 5.4711ms | 182.7786 Ops/s | 178.1935 Ops/s | +2.57% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.0403ms | 0.5793ms | 1.7263 KOps/s | 1.7339 KOps/s | -0.44% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.8796ms | 0.5383ms | 1.8576 KOps/s | 1.8519 KOps/s | +0.30% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 7.5247ms | 4.9626ms | 201.5053 Ops/s | 196.9779 Ops/s | +2.30% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2.6432ms | 0.5365ms | 1.8640 KOps/s | 1.8034 KOps/s | +3.36% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.8831ms | 0.5194ms | 1.9254 KOps/s | 1.8487 KOps/s | +4.15% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 2.4832ms | 1.7326ms | 577.1532 Ops/s | 563.2932 Ops/s | +2.46% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 2.3040ms | 1.6378ms | 610.5724 Ops/s | 594.3253 Ops/s | +2.73% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 7.9201ms | 5.1840ms | 192.9007 Ops/s | 191.9527 Ops/s | +0.49% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.2520ms | 0.6850ms | 1.4599 KOps/s | 1.4243 KOps/s | +2.50% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 1.0005ms | 0.6614ms | 1.5119 KOps/s | 1.4753 KOps/s | +2.48% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.1980ms | 4.9263ms | 202.9933 Ops/s | 197.0091 Ops/s | +3.04% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2.3525ms | 0.5541ms | 1.8047 KOps/s | 1.7733 KOps/s | +1.77% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7559ms | 0.5430ms | 1.8415 KOps/s | 1.8545 KOps/s | -0.70% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.6314ms | 5.0153ms | 199.3912 Ops/s | 202.3227 Ops/s | -1.45% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2.0908ms | 0.5433ms | 1.8406 KOps/s | 1.8339 KOps/s | +0.37% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.9259ms | 0.5258ms | 1.9019 KOps/s | 1.8407 KOps/s | +3.32% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.0359ms | 5.1585ms | 193.8559 Ops/s | 192.6325 Ops/s | +0.64% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2.4303ms | 0.6907ms | 1.4478 KOps/s | 1.4248 KOps/s | +1.61% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.9219ms | 0.6521ms | 1.5335 KOps/s | 1.4591 KOps/s | **+5.10%** |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.8717ms | 4.4393ms | 225.2607 Ops/s | 235.4067 Ops/s | -4.31% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 3.6604ms | 2.3143ms | 432.1018 Ops/s | 421.4594 Ops/s | +2.53% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 6.7018ms | 1.4245ms | 701.9787 Ops/s | 761.2088 Ops/s | **-7.78%** |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.5335s | 14.9174ms | 67.0357 Ops/s | 229.8416 Ops/s | **-70.83%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 8.6987ms | 2.4206ms | 413.1234 Ops/s | 380.4487 Ops/s | **+8.59%** |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 5.7783ms | 1.4680ms | 681.1841 Ops/s | 671.3011 Ops/s | +1.47% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 7.0283ms | 4.6614ms | 214.5269 Ops/s | 31.4702 Ops/s | **+581.68%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 8.1327ms | 2.5939ms | 385.5196 Ops/s | 355.8693 Ops/s | **+8.33%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 5.1816ms | 1.6026ms | 623.9756 Ops/s | 615.1998 Ops/s | +1.43% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 13.7674ms | 11.8904ms | 84.1014 Ops/s | 83.2143 Ops/s | +1.07% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 16.9000ms | 14.9943ms | 66.6921 Ops/s | 66.5945 Ops/s | +0.15% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 22.2671ms | 20.8225ms | 48.0250 Ops/s | 47.8866 Ops/s | +0.29% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 16.8864ms | 15.0657ms | 66.3758 Ops/s | 65.6728 Ops/s | +1.07% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 22.2492ms | 20.7560ms | 48.1790 Ops/s | 48.0290 Ops/s | +0.31% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 18.0077ms | 16.3625ms | 61.1154 Ops/s | 60.6863 Ops/s | +0.71% |
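The bot does not state how its columns are derived, but the numbers in the CPU table are consistent with two simple rules: Ops is the reciprocal of Mean (up to rounding), and Change is the relative difference between Ops and Ops on Repo HEAD. The bold entries are exactly those beyond ±5%, which reproduces the 23 improved / 9 worsened tally in the summary line. The sketch below encodes these inferred rules; the helper names and the 5% threshold are assumptions, not the benchmark workflow's actual code.

```python
# Hypothetical helpers reproducing relationships inferred from the CPU table.


def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean per-call time (Ops ~= 1 / Mean, up to rounding)."""
    return 1.0 / mean_seconds


def relative_change_pct(ops: float, ops_head: float) -> float:
    """Percent change of this PR's throughput versus the repo HEAD throughput."""
    return (ops / ops_head - 1.0) * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Bucket a change the way the summary appears to (assumed +/-5% cutoff)."""
    if change_pct > threshold_pct:
        return "improved"
    if change_pct < -threshold_pct:
        return "worsened"
    return "unchanged"


# Rows copied from the CPU table: (name, Mean in seconds, Ops, Ops on Repo HEAD).
rows = [
    ("test_ppo_speed[reduce-overhead-None]", 4.3178e-3, 231.6001, 205.2255),
    ("test_ddpg_speed[False-backward]", 4.1543e-3, 240.7161, 229.4340),
    ("test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]",
     14.9174e-3, 67.0357, 229.8416),
]

for name, mean_s, ops, ops_head in rows:
    change = relative_change_pct(ops, ops_head)
    print(f"{name}: {ops_per_second(mean_s):.4f} Ops/s, {change:+.2f}% ({classify(change)})")
```

Because Change is computed on throughput, a positive value means the PR's code ran faster than HEAD for that benchmark.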


github-actions bot commented Feb 4, 2025

Result of GPU Benchmark Tests

| Name | Max | Mean | Ops |
| --- | ---: | ---: | ---: |
| test_simple | 0.8501s | 0.7459s | 1.3406 Ops/s |
| test_transformed | 1.3205s | 1.3185s | 0.7584 Ops/s |
| test_serial | 2.1332s | 2.1286s | 0.4698 Ops/s |
| test_parallel | 1.8466s | 1.8157s | 0.5507 Ops/s |
| test_step_mdp_speed[True-True-True-True-True] | 0.1893ms | 40.1282μs | 24.9201 KOps/s |
| test_step_mdp_speed[True-True-True-True-False] | 0.1265ms | 23.3798μs | 42.7719 KOps/s |
| test_step_mdp_speed[True-True-True-False-True] | 56.3210μs | 21.7572μs | 45.9618 KOps/s |
| test_step_mdp_speed[True-True-True-False-False] | 47.7910μs | 12.9808μs | 77.0369 KOps/s |
| test_step_mdp_speed[True-True-False-True-True] | 0.1135ms | 42.2349μs | 23.6771 KOps/s |
| test_step_mdp_speed[True-True-False-True-False] | 72.1810μs | 25.5595μs | 39.1244 KOps/s |
| test_step_mdp_speed[True-True-False-False-True] | 69.5020μs | 24.9127μs | 40.1402 KOps/s |
| test_step_mdp_speed[True-True-False-False-False] | 0.1190ms | 15.5630μs | 64.2551 KOps/s |
| test_step_mdp_speed[True-False-True-True-True] | 77.3610μs | 45.1875μs | 22.1300 KOps/s |
| test_step_mdp_speed[True-False-True-True-False] | 74.3410μs | 27.8214μs | 35.9435 KOps/s |
| test_step_mdp_speed[True-False-True-False-True] | 55.4210μs | 24.6564μs | 40.5574 KOps/s |
| test_step_mdp_speed[True-False-True-False-False] | 0.1811ms | 14.8451μs | 67.3623 KOps/s |
| test_step_mdp_speed[True-False-False-True-True] | 86.0420μs | 47.0684μs | 21.2457 KOps/s |
| test_step_mdp_speed[True-False-False-True-False] | 0.2170ms | 30.0339μs | 33.2958 KOps/s |
| test_step_mdp_speed[True-False-False-False-True] | 0.2207ms | 26.5456μs | 37.6711 KOps/s |
| test_step_mdp_speed[True-False-False-False-False] | 0.2148ms | 17.4849μs | 57.1921 KOps/s |
| test_step_mdp_speed[False-True-True-True-True] | 86.9410μs | 44.5538μs | 22.4448 KOps/s |
| test_step_mdp_speed[False-True-True-True-False] | 60.8210μs | 28.0218μs | 35.6865 KOps/s |
| test_step_mdp_speed[False-True-True-False-True] | 61.4810μs | 28.7213μs | 34.8173 KOps/s |
| test_step_mdp_speed[False-True-True-False-False] | 56.8810μs | 16.8952μs | 59.1883 KOps/s |
| test_step_mdp_speed[False-True-False-True-True] | 86.7420μs | 47.2579μs | 21.1605 KOps/s |
| test_step_mdp_speed[False-True-False-True-False] | 61.5610μs | 30.2641μs | 33.0425 KOps/s |
| test_step_mdp_speed[False-True-False-False-True] | 3.3551ms | 31.5658μs | 31.6798 KOps/s |
| test_step_mdp_speed[False-True-False-False-False] | 46.8710μs | 19.3959μs | 51.5574 KOps/s |
| test_step_mdp_speed[False-False-True-True-True] | 0.1080ms | 49.2992μs | 20.2843 KOps/s |
| test_step_mdp_speed[False-False-True-True-False] | 66.5310μs | 32.6322μs | 30.6446 KOps/s |
| test_step_mdp_speed[False-False-True-False-True] | 59.3310μs | 30.9683μs | 32.2911 KOps/s |
| test_step_mdp_speed[False-False-True-False-False] | 45.6600μs | 19.2429μs | 51.9673 KOps/s |
| test_step_mdp_speed[False-False-False-True-True] | 84.6220μs | 51.2933μs | 19.4957 KOps/s |
| test_step_mdp_speed[False-False-False-True-False] | 61.0610μs | 34.9589μs | 28.6050 KOps/s |
| test_step_mdp_speed[False-False-False-False-True] | 64.9110μs | 32.2624μs | 30.9958 KOps/s |
| test_step_mdp_speed[False-False-False-False-False] | 52.0510μs | 21.5295μs | 46.4478 KOps/s |
| test_values[generalized_advantage_estimate-True-True] | 26.3221ms | 25.1426ms | 39.7731 Ops/s |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1034s | 2.9696ms | 336.7440 Ops/s |
| test_values[td0_return_estimate-False-False] | 0.1076ms | 78.4166μs | 12.7524 KOps/s |
| test_values[td1_return_estimate-False-False] | 56.4885ms | 56.0488ms | 17.8416 Ops/s |
| test_values[vec_td1_return_estimate-False-False] | 1.4094ms | 1.0855ms | 921.2309 Ops/s |
| test_values[td_lambda_return_estimate-True-False] | 92.4275ms | 89.3381ms | 11.1934 Ops/s |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.3901ms | 1.0763ms | 929.1035 Ops/s |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 25.4323ms | 24.7401ms | 40.4202 Ops/s |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0520ms | 0.7595ms | 1.3166 KOps/s |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.8143ms | 0.6659ms | 1.5017 KOps/s |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.6254ms | 1.4838ms | 673.9326 Ops/s |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.8905ms | 0.7078ms | 1.4129 KOps/s |
| test_dqn_speed[False-None] | 1.7029ms | 1.5309ms | 653.2277 Ops/s |
| test_dqn_speed[False-backward] | 2.2858ms | 2.1541ms | 464.2314 Ops/s |
| test_dqn_speed[True-None] | 0.7048ms | 0.5519ms | 1.8119 KOps/s |
| test_dqn_speed[True-backward] | 1.1988ms | 1.1343ms | 881.6038 Ops/s |
| test_dqn_speed[reduce-overhead-None] | 0.7798ms | 0.5840ms | 1.7123 KOps/s |
| test_dqn_speed[reduce-overhead-backward] | 1.1337ms | 0.9665ms | 1.0347 KOps/s |
| test_ddpg_speed[False-None] | 3.1973ms | 2.8884ms | 346.2070 Ops/s |
| test_ddpg_speed[False-backward] | 4.2678ms | 4.1178ms | 242.8480 Ops/s |
| test_ddpg_speed[True-None] | 1.4898ms | 1.3206ms | 757.2523 Ops/s |
| test_ddpg_speed[True-backward] | 2.5737ms | 2.3885ms | 418.6675 Ops/s |
| test_ddpg_speed[reduce-overhead-None] | 1.5281ms | 1.3380ms | 747.3823 Ops/s |
| test_ddpg_speed[reduce-overhead-backward] | 1.9777ms | 1.8722ms | 534.1427 Ops/s |
| test_sac_speed[False-None] | 8.4925ms | 8.0363ms | 124.4360 Ops/s |
| test_sac_speed[False-backward] | 11.3521ms | 10.8631ms | 92.0543 Ops/s |
| test_sac_speed[True-None] | 2.0070ms | 1.8324ms | 545.7359 Ops/s |
| test_sac_speed[True-backward] | 3.6713ms | 3.5193ms | 284.1441 Ops/s |
| test_sac_speed[reduce-overhead-None] | 21.5051ms | 11.9928ms | 83.3836 Ops/s |
| test_sac_speed[reduce-overhead-backward] | 1.7269ms | 1.6204ms | 617.1362 Ops/s |
| test_redq_speed[False-None] | 7.8809ms | 7.4163ms | 134.8373 Ops/s |
| test_redq_speed[False-backward] | 11.4995ms | 11.1308ms | 89.8409 Ops/s |
| test_redq_speed[True-None] | 2.4678ms | 2.2622ms | 442.0444 Ops/s |
| test_redq_speed[True-backward] | 4.1482ms | 3.8834ms | 257.5031 Ops/s |
| test_redq_speed[reduce-overhead-None] | 2.4575ms | 2.2807ms | 438.4550 Ops/s |
| test_redq_speed[reduce-overhead-backward] | 4.0942ms | 3.9240ms | 254.8450 Ops/s |
| test_redq_deprec_speed[False-None] | 9.4380ms | 8.9919ms | 111.2113 Ops/s |
| test_redq_deprec_speed[False-backward] | 12.2816ms | 11.8836ms | 84.1497 Ops/s |
| test_redq_deprec_speed[True-None] | 2.8517ms | 2.6559ms | 376.5251 Ops/s |
| test_redq_deprec_speed[True-backward] | 4.4393ms | 4.2929ms | 232.9446 Ops/s |
| test_redq_deprec_speed[reduce-overhead-None] | 2.8867ms | 2.6587ms | 376.1275 Ops/s |
| test_redq_deprec_speed[reduce-overhead-backward] | 4.5613ms | 4.2344ms | 236.1634 Ops/s |
| test_td3_speed[False-None] | 8.0475ms | 7.9561ms | 125.6892 Ops/s |
| test_td3_speed[False-backward] | 10.9515ms | 10.2264ms | 97.7860 Ops/s |
| test_td3_speed[True-None] | 1.6462ms | 1.6168ms | 618.4985 Ops/s |
| test_td3_speed[True-backward] | 3.2996ms | 3.1435ms | 318.1137 Ops/s |
| test_td3_speed[reduce-overhead-None] | 57.4498ms | 26.0467ms | 38.3926 Ops/s |
| test_td3_speed[reduce-overhead-backward] | 1.4263ms | 1.3485ms | 741.5891 Ops/s |
| test_cql_speed[False-None] | 17.1049ms | 16.6822ms | 59.9443 Ops/s |
| test_cql_speed[False-backward] | 22.5149ms | 21.6639ms | 46.1598 Ops/s |
| test_cql_speed[True-None] | 3.5743ms | 3.2923ms | 303.7367 Ops/s |
| test_cql_speed[True-backward] | 5.7071ms | 5.3021ms | 188.6032 Ops/s |
| test_cql_speed[reduce-overhead-None] | 21.0185ms | 12.9029ms | 77.5020 Ops/s |
| test_cql_speed[reduce-overhead-backward] | 1.9165ms | 1.7952ms | 557.0366 Ops/s |
| test_a2c_speed[False-None] | 4.1639ms | 3.1929ms | 313.1929 Ops/s |
| test_a2c_speed[False-backward] | 7.1179ms | 5.9906ms | 166.9288 Ops/s |
| test_a2c_speed[True-None] | 1.5838ms | 1.3324ms | 750.5326 Ops/s |
| test_a2c_speed[True-backward] | 3.0072ms | 2.8521ms | 350.6143 Ops/s |
| test_a2c_speed[reduce-overhead-None] | 16.0714ms | 9.1324ms | 109.4999 Ops/s |
| test_a2c_speed[reduce-overhead-backward] | 1.6136ms | 1.4573ms | 686.2022 Ops/s |
| test_ppo_speed[False-None] | 4.0584ms | 3.6952ms | 270.6201 Ops/s |
| test_ppo_speed[False-backward] | 7.4965ms | 6.7085ms | 149.0641 Ops/s |
| test_ppo_speed[True-None] | 1.7718ms | 1.4153ms | 706.5483 Ops/s |
| test_ppo_speed[True-backward] | 3.3178ms | 3.0132ms | 331.8768 Ops/s |
| test_ppo_speed[reduce-overhead-None] | 1.2177ms | 0.9530ms | 1.0493 KOps/s |
| test_ppo_speed[reduce-overhead-backward] | 1.6107ms | 1.4068ms | 710.8533 Ops/s |
| test_reinforce_speed[False-None] | 2.6374ms | 2.3098ms | 432.9369 Ops/s |
| test_reinforce_speed[False-backward] | 4.0098ms | 3.3695ms | 296.7764 Ops/s |
| test_reinforce_speed[True-None] | 1.6250ms | 1.3070ms | 765.1394 Ops/s |
| test_reinforce_speed[True-backward] | 3.0055ms | 2.8888ms | 346.1617 Ops/s |
| test_reinforce_speed[reduce-overhead-None] | 17.8860ms | 9.9950ms | 100.0500 Ops/s |
| test_reinforce_speed[reduce-overhead-backward] | 1.6039ms | 1.4815ms | 674.9709 Ops/s |
| test_iql_speed[False-None] | 9.9647ms | 9.3984ms | 106.4010 Ops/s |
| test_iql_speed[False-backward] | 13.5929ms | 12.9706ms | 77.0972 Ops/s |
| test_iql_speed[True-None] | 2.5644ms | 2.2492ms | 444.6031 Ops/s |
| test_iql_speed[True-backward] | 4.8090ms | 4.6363ms | 215.6915 Ops/s |
| test_iql_speed[reduce-overhead-None] | 18.4617ms | 11.0586ms | 90.4274 Ops/s |
| test_iql_speed[reduce-overhead-backward] | 2.2320ms | 1.8811ms | 531.5914 Ops/s |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 8.4579ms | 6.4687ms | 154.5907 Ops/s |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.5750ms | 0.3346ms | 2.9886 KOps/s |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6089ms | 0.3023ms | 3.3079 KOps/s |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.7075ms | 6.2120ms | 160.9790 Ops/s |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.3077ms | 0.3223ms | 3.1029 KOps/s |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6171ms | 0.3153ms | 3.1717 KOps/s |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.7290ms | 1.4277ms | 700.4215 Ops/s |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.6074ms | 1.3264ms | 753.9171 Ops/s |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.8801ms | 6.3728ms | 156.9177 Ops/s |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.9521ms | 0.4327ms | 2.3113 KOps/s |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6610ms | 0.4099ms | 2.4394 KOps/s |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.5518ms | 6.1905ms | 161.5386 Ops/s |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.8302ms | 0.2959ms | 3.3800 KOps/s |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7572ms | 0.2987ms | 3.3474 KOps/s |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 10.2311ms | 6.5194ms | 153.3878 Ops/s |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7155ms | 0.2778ms | 3.5994 KOps/s |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5784ms | 0.2556ms | 3.9124 KOps/s |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.6668ms | 6.3175ms | 158.2917 Ops/s |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.9177ms | 0.5072ms | 1.9716 KOps/s |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7608ms | 0.4846ms | 2.0637 KOps/s |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 7.2799ms | 5.6763ms | 176.1723 Ops/s |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 9.5980ms | 2.1382ms | 467.6923 Ops/s |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 6.8001ms | 1.2144ms | 823.4623 Ops/s |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 7.1148ms | 5.6570ms | 176.7713 Ops/s |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 11.8320ms | 2.0314ms | 492.2696 Ops/s |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 2.1369ms | 1.1719ms | 853.3128 Ops/s |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.5663s | 16.9954ms | 58.8396 Ops/s |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 8.5906ms | 2.2062ms | 453.2647 Ops/s |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 8.8888ms | 1.4332ms | 697.7278 Ops/s |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 13.9908ms | 13.4067ms | 74.5894 Ops/s |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.6514ms | 17.1277ms | 58.3850 Ops/s |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 18.0828ms | 17.8558ms | 56.0042 Ops/s |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 18.6269ms | 17.0329ms | 58.7098 Ops/s |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 18.2580ms | 17.9242ms | 55.7904 Ops/s |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 0.4450s | 27.1531ms | 36.8282 Ops/s |

vmoens merged commit 29cf596 into gh/vmoens/85/base Feb 4, 2025
68 of 78 checks passed
vmoens added a commit that referenced this pull request Feb 4, 2025
ghstack-source-id: 8ccd12f65f4a74a42356a630e0e5a1f015337d4a
Pull Request resolved: #2756
vmoens deleted the gh/vmoens/85/head branch February 4, 2025 21:34
Labels: bug, CLA Signed
Projects: None yet
2 participants