[BugFix] Fix KLPENPPOLoss KL computation #1922

Merged
vmoens merged 5 commits into main from fix-kl-ppo on Feb 17, 2024

Conversation

vmoens
Contributor

@vmoens vmoens commented Feb 16, 2024

Fixes #1920


pytorch-bot bot commented Feb 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1922

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (25 Unrelated Failures)

As of commit 33abbad with merge base 67f659c:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label on Feb 16, 2024

github-actions bot commented Feb 16, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 89. Improved: $\large\color{#35bf28}4$. Worsened: $\large\color{#d91a1a}3$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_single 62.3462ms 61.9360ms 16.1457 Ops/s 16.4474 Ops/s $\color{#d91a1a}-1.83\%$
test_sync 40.8298ms 33.5033ms 29.8478 Ops/s 30.4009 Ops/s $\color{#d91a1a}-1.82\%$
test_async 74.1144ms 32.3004ms 30.9594 Ops/s 30.9322 Ops/s $\color{#35bf28}+0.09\%$
test_simple 0.5040s 0.4403s 2.2710 Ops/s 2.3188 Ops/s $\color{#d91a1a}-2.06\%$
test_transformed 0.6496s 0.5954s 1.6795 Ops/s 1.7131 Ops/s $\color{#d91a1a}-1.96\%$
test_serial 1.5047s 1.4569s 0.6864 Ops/s 0.7048 Ops/s $\color{#d91a1a}-2.61\%$
test_parallel 1.4898s 1.3995s 0.7145 Ops/s 0.7283 Ops/s $\color{#d91a1a}-1.90\%$
test_step_mdp_speed[True-True-True-True-True] 0.1009ms 20.6613μs 48.3997 KOps/s 46.9842 KOps/s $\color{#35bf28}+3.01\%$
test_step_mdp_speed[True-True-True-True-False] 42.4590μs 12.8265μs 77.9639 KOps/s 77.4479 KOps/s $\color{#35bf28}+0.67\%$
test_step_mdp_speed[True-True-True-False-True] 36.4680μs 12.3463μs 80.9961 KOps/s 80.6192 KOps/s $\color{#35bf28}+0.47\%$
test_step_mdp_speed[True-True-True-False-False] 41.0470μs 7.4513μs 134.2048 KOps/s 133.8685 KOps/s $\color{#35bf28}+0.25\%$
test_step_mdp_speed[True-True-False-True-True] 46.5070μs 22.3051μs 44.8328 KOps/s 44.0703 KOps/s $\color{#35bf28}+1.73\%$
test_step_mdp_speed[True-True-False-True-False] 41.7980μs 14.0209μs 71.3223 KOps/s 70.2532 KOps/s $\color{#35bf28}+1.52\%$
test_step_mdp_speed[True-True-False-False-True] 36.4680μs 13.6043μs 73.5060 KOps/s 73.3563 KOps/s $\color{#35bf28}+0.20\%$
test_step_mdp_speed[True-True-False-False-False] 38.9720μs 8.7118μs 114.7872 KOps/s 114.1941 KOps/s $\color{#35bf28}+0.52\%$
test_step_mdp_speed[True-False-True-True-True] 49.3830μs 23.7675μs 42.0743 KOps/s 41.9236 KOps/s $\color{#35bf28}+0.36\%$
test_step_mdp_speed[True-False-True-True-False] 56.4750μs 15.4874μs 64.5688 KOps/s 64.0067 KOps/s $\color{#35bf28}+0.88\%$
test_step_mdp_speed[True-False-True-False-True] 85.7500μs 13.4451μs 74.3765 KOps/s 73.1324 KOps/s $\color{#35bf28}+1.70\%$
test_step_mdp_speed[True-False-True-False-False] 35.6170μs 8.7185μs 114.6984 KOps/s 115.7210 KOps/s $\color{#d91a1a}-0.88\%$
test_step_mdp_speed[True-False-False-True-True] 61.0940μs 25.1290μs 39.7946 KOps/s 40.7843 KOps/s $\color{#d91a1a}-2.43\%$
test_step_mdp_speed[True-False-False-True-False] 41.6680μs 16.6824μs 59.9436 KOps/s 60.4378 KOps/s $\color{#d91a1a}-0.82\%$
test_step_mdp_speed[True-False-False-False-True] 41.1770μs 14.6965μs 68.0435 KOps/s 67.5303 KOps/s $\color{#35bf28}+0.76\%$
test_step_mdp_speed[True-False-False-False-False] 42.8000μs 9.9860μs 100.1398 KOps/s 100.2683 KOps/s $\color{#d91a1a}-0.13\%$
test_step_mdp_speed[False-True-True-True-True] 77.6120μs 23.7895μs 42.0353 KOps/s 41.8339 KOps/s $\color{#35bf28}+0.48\%$
test_step_mdp_speed[False-True-True-True-False] 39.1320μs 15.5435μs 64.3357 KOps/s 64.9648 KOps/s $\color{#d91a1a}-0.97\%$
test_step_mdp_speed[False-True-True-False-True] 50.1320μs 15.8114μs 63.2453 KOps/s 63.0400 KOps/s $\color{#35bf28}+0.33\%$
test_step_mdp_speed[False-True-True-False-False] 29.9250μs 10.0260μs 99.7411 KOps/s 100.3707 KOps/s $\color{#d91a1a}-0.63\%$
test_step_mdp_speed[False-True-False-True-True] 36.6270μs 25.1849μs 39.7064 KOps/s 39.6152 KOps/s $\color{#35bf28}+0.23\%$
test_step_mdp_speed[False-True-False-True-False] 49.7330μs 16.7450μs 59.7193 KOps/s 60.4346 KOps/s $\color{#d91a1a}-1.18\%$
test_step_mdp_speed[False-True-False-False-True] 71.1030μs 16.7395μs 59.7389 KOps/s 58.3651 KOps/s $\color{#35bf28}+2.35\%$
test_step_mdp_speed[False-True-False-False-False] 43.6310μs 11.0994μs 90.0949 KOps/s 89.4551 KOps/s $\color{#35bf28}+0.72\%$
test_step_mdp_speed[False-False-True-True-True] 62.4560μs 26.2615μs 38.0786 KOps/s 38.0038 KOps/s $\color{#35bf28}+0.20\%$
test_step_mdp_speed[False-False-True-True-False] 53.9200μs 18.1455μs 55.1101 KOps/s 55.6339 KOps/s $\color{#d91a1a}-0.94\%$
test_step_mdp_speed[False-False-True-False-True] 49.9240μs 16.9642μs 58.9475 KOps/s 58.6930 KOps/s $\color{#35bf28}+0.43\%$
test_step_mdp_speed[False-False-True-False-False] 33.2620μs 11.2375μs 88.9876 KOps/s 89.5627 KOps/s $\color{#d91a1a}-0.64\%$
test_step_mdp_speed[False-False-False-True-True] 63.3480μs 27.0533μs 36.9640 KOps/s 37.2315 KOps/s $\color{#d91a1a}-0.72\%$
test_step_mdp_speed[False-False-False-True-False] 50.3440μs 18.8679μs 53.0002 KOps/s 53.1109 KOps/s $\color{#d91a1a}-0.21\%$
test_step_mdp_speed[False-False-False-False-True] 51.7370μs 17.9337μs 55.7610 KOps/s 55.9292 KOps/s $\color{#d91a1a}-0.30\%$
test_step_mdp_speed[False-False-False-False-False] 33.6520μs 12.2049μs 81.9341 KOps/s 82.4772 KOps/s $\color{#d91a1a}-0.66\%$
test_values[generalized_advantage_estimate-True-True] 10.6565ms 9.1958ms 108.7449 Ops/s 107.4061 Ops/s $\color{#35bf28}+1.25\%$
test_values[vec_generalized_advantage_estimate-True-True] 36.7884ms 35.5049ms 28.1651 Ops/s 28.3415 Ops/s $\color{#d91a1a}-0.62\%$
test_values[td0_return_estimate-False-False] 0.2332ms 0.1671ms 5.9838 KOps/s 5.8761 KOps/s $\color{#35bf28}+1.83\%$
test_values[td1_return_estimate-False-False] 26.1931ms 23.3142ms 42.8922 Ops/s 42.7666 Ops/s $\color{#35bf28}+0.29\%$
test_values[vec_td1_return_estimate-False-False] 37.2376ms 35.4485ms 28.2099 Ops/s 28.2520 Ops/s $\color{#d91a1a}-0.15\%$
test_values[td_lambda_return_estimate-True-False] 36.3910ms 33.5547ms 29.8021 Ops/s 29.7056 Ops/s $\color{#35bf28}+0.33\%$
test_values[vec_td_lambda_return_estimate-True-False] 37.3673ms 35.2407ms 28.3763 Ops/s 28.3448 Ops/s $\color{#35bf28}+0.11\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.2407ms 8.1274ms 123.0399 Ops/s 120.6872 Ops/s $\color{#35bf28}+1.95\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.2287ms 1.9535ms 511.8892 Ops/s 501.5317 Ops/s $\color{#35bf28}+2.07\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4429ms 0.3548ms 2.8186 KOps/s 2.8635 KOps/s $\color{#d91a1a}-1.57\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 47.0600ms 45.5614ms 21.9484 Ops/s 23.0125 Ops/s $\color{#d91a1a}-4.62\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 4.6062ms 3.0492ms 327.9502 Ops/s 326.7621 Ops/s $\color{#35bf28}+0.36\%$
test_dqn_speed 6.9232ms 1.3760ms 726.7218 Ops/s 743.2669 Ops/s $\color{#d91a1a}-2.23\%$
test_ddpg_speed 76.0314ms 2.9212ms 342.3222 Ops/s 373.3936 Ops/s $\textbf{\color{#d91a1a}-8.32\%}$
test_sac_speed 11.4620ms 8.6259ms 115.9300 Ops/s 118.9795 Ops/s $\color{#d91a1a}-2.56\%$
test_redq_speed 14.4924ms 13.3937ms 74.6620 Ops/s 75.2422 Ops/s $\color{#d91a1a}-0.77\%$
test_redq_deprec_speed 14.9502ms 13.5097ms 74.0212 Ops/s 73.4174 Ops/s $\color{#35bf28}+0.82\%$
test_td3_speed 9.1961ms 8.6742ms 115.2846 Ops/s 117.7646 Ops/s $\color{#d91a1a}-2.11\%$
test_cql_speed 38.4894ms 36.7788ms 27.1896 Ops/s 27.5702 Ops/s $\color{#d91a1a}-1.38\%$
test_a2c_speed 8.5984ms 7.2601ms 137.7398 Ops/s 137.8853 Ops/s $\color{#d91a1a}-0.11\%$
test_ppo_speed 8.6482ms 7.5817ms 131.8964 Ops/s 130.2702 Ops/s $\color{#35bf28}+1.25\%$
test_reinforce_speed 7.7629ms 6.5367ms 152.9825 Ops/s 148.8322 Ops/s $\color{#35bf28}+2.79\%$
test_iql_speed 35.4595ms 33.8188ms 29.5694 Ops/s 30.3960 Ops/s $\color{#d91a1a}-2.72\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 4.0936ms 2.8033ms 356.7217 Ops/s 369.5628 Ops/s $\color{#d91a1a}-3.47\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7265ms 0.5131ms 1.9488 KOps/s 1.9334 KOps/s $\color{#35bf28}+0.80\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7354ms 0.4826ms 2.0723 KOps/s 2.0389 KOps/s $\color{#35bf28}+1.64\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 2.9639ms 2.5873ms 386.5040 Ops/s 361.1666 Ops/s $\textbf{\color{#35bf28}+7.02\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.1923ms 0.5106ms 1.9585 KOps/s 1.9853 KOps/s $\color{#d91a1a}-1.35\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5970ms 0.4819ms 2.0752 KOps/s 2.1095 KOps/s $\color{#d91a1a}-1.63\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.4537ms 2.9416ms 339.9464 Ops/s 343.7348 Ops/s $\color{#d91a1a}-1.10\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8495ms 0.6350ms 1.5747 KOps/s 1.5896 KOps/s $\color{#d91a1a}-0.94\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1.7688ms 0.6252ms 1.5995 KOps/s 1.6820 KOps/s $\color{#d91a1a}-4.90\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.3941ms 2.8033ms 356.7224 Ops/s 375.9183 Ops/s $\textbf{\color{#d91a1a}-5.11\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7698ms 0.5124ms 1.9515 KOps/s 1.3052 KOps/s $\textbf{\color{#35bf28}+49.52\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7651ms 0.4862ms 2.0567 KOps/s 1.8907 KOps/s $\textbf{\color{#35bf28}+8.78\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.0226ms 2.7497ms 363.6702 Ops/s 353.7156 Ops/s $\color{#35bf28}+2.81\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7714ms 0.5084ms 1.9670 KOps/s 1.9456 KOps/s $\color{#35bf28}+1.10\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6043ms 0.4797ms 2.0844 KOps/s 2.0853 KOps/s $\color{#d91a1a}-0.04\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 4.1597ms 2.8405ms 352.0506 Ops/s 360.1406 Ops/s $\color{#d91a1a}-2.25\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9462ms 0.6333ms 1.5791 KOps/s 1.5937 KOps/s $\color{#d91a1a}-0.92\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7022ms 0.6013ms 1.6632 KOps/s 1.6624 KOps/s $\color{#35bf28}+0.05\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1031s 7.9982ms 125.0275 Ops/s 98.4946 Ops/s $\textbf{\color{#35bf28}+26.94\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 16.0388ms 13.3915ms 74.6742 Ops/s 75.1223 Ops/s $\color{#d91a1a}-0.60\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 4.7826ms 2.5359ms 394.3338 Ops/s 397.9345 Ops/s $\color{#d91a1a}-0.90\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.1015s 9.8111ms 101.9250 Ops/s 100.7787 Ops/s $\color{#35bf28}+1.14\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 15.7892ms 13.3983ms 74.6362 Ops/s 75.0481 Ops/s $\color{#d91a1a}-0.55\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 4.6059ms 2.5035ms 399.4357 Ops/s 397.7475 Ops/s $\color{#35bf28}+0.42\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.1092s 10.3276ms 96.8277 Ops/s 121.2863 Ops/s $\textbf{\color{#d91a1a}-20.17\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 15.4244ms 13.5217ms 73.9550 Ops/s 71.2364 Ops/s $\color{#35bf28}+3.82\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 4.8537ms 2.7519ms 363.3884 Ops/s 358.1906 Ops/s $\color{#35bf28}+1.45\%$
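
For readers checking the numbers: the Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column. That formula is inferred from the rows above, not taken from the benchmark tooling, and the helper name `percent_change` is made up for this sketch.

```python
def percent_change(ops: float, ops_on_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent.

    Assumed formula, inferred from the table rows rather than from the
    benchmark action's source.
    """
    return (ops - ops_on_head) / ops_on_head * 100.0


# Values copied from the test_single and test_ddpg_speed rows of the CPU table.
print(f"test_single:     {percent_change(16.1457, 16.4474):+.2f}%")
print(f"test_ddpg_speed: {percent_change(342.3222, 373.3936):+.2f}%")
```

With these inputs the helper gives the same -1.83% and -8.32% that the table reports, so negative values mean the PR branch ran fewer operations per second than HEAD.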


github-actions bot commented Feb 16, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 92. Improved: $\large\color{#35bf28}1$. Worsened: $\large\color{#d91a1a}6$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_single 0.1157s 0.1142s 8.7576 Ops/s 8.8288 Ops/s $\color{#d91a1a}-0.81\%$
test_sync 0.1707s 0.1026s 9.7507 Ops/s 9.7441 Ops/s $\color{#35bf28}+0.07\%$
test_async 0.2528s 91.8171ms 10.8912 Ops/s 11.0597 Ops/s $\color{#d91a1a}-1.52\%$
test_single_pixels 0.1272s 0.1263s 7.9168 Ops/s 7.6460 Ops/s $\color{#35bf28}+3.54\%$
test_sync_pixels 87.2121ms 85.0504ms 11.7577 Ops/s 12.1852 Ops/s $\color{#d91a1a}-3.51\%$
test_async_pixels 0.1355s 70.9403ms 14.0964 Ops/s 13.9422 Ops/s $\color{#35bf28}+1.11\%$
test_simple 0.9043s 0.8355s 1.1969 Ops/s 1.2330 Ops/s $\color{#d91a1a}-2.93\%$
test_transformed 1.1290s 1.0631s 0.9406 Ops/s 0.9451 Ops/s $\color{#d91a1a}-0.48\%$
test_serial 2.5199s 2.4553s 0.4073 Ops/s 0.4088 Ops/s $\color{#d91a1a}-0.37\%$
test_parallel 2.2677s 2.1115s 0.4736 Ops/s 0.4751 Ops/s $\color{#d91a1a}-0.31\%$
test_step_mdp_speed[True-True-True-True-True] 87.0810μs 32.8434μs 30.4475 KOps/s 29.2596 KOps/s $\color{#35bf28}+4.06\%$
test_step_mdp_speed[True-True-True-True-False] 44.8410μs 19.7409μs 50.6564 KOps/s 49.9266 KOps/s $\color{#35bf28}+1.46\%$
test_step_mdp_speed[True-True-True-False-True] 48.4810μs 18.6369μs 53.6569 KOps/s 52.0673 KOps/s $\color{#35bf28}+3.05\%$
test_step_mdp_speed[True-True-True-False-False] 32.8610μs 11.2162μs 89.1564 KOps/s 87.5535 KOps/s $\color{#35bf28}+1.83\%$
test_step_mdp_speed[True-True-False-True-True] 65.2620μs 34.6790μs 28.8359 KOps/s 27.9348 KOps/s $\color{#35bf28}+3.23\%$
test_step_mdp_speed[True-True-False-True-False] 54.6100μs 21.6663μs 46.1547 KOps/s 45.2675 KOps/s $\color{#35bf28}+1.96\%$
test_step_mdp_speed[True-True-False-False-True] 80.3310μs 20.5448μs 48.6740 KOps/s 46.9767 KOps/s $\color{#35bf28}+3.61\%$
test_step_mdp_speed[True-True-False-False-False] 35.5300μs 13.0589μs 76.5763 KOps/s 74.8786 KOps/s $\color{#35bf28}+2.27\%$
test_step_mdp_speed[True-False-True-True-True] 0.1031ms 36.5943μs 27.3267 KOps/s 26.4052 KOps/s $\color{#35bf28}+3.49\%$
test_step_mdp_speed[True-False-True-True-False] 63.8310μs 23.5826μs 42.4042 KOps/s 41.5358 KOps/s $\color{#35bf28}+2.09\%$
test_step_mdp_speed[True-False-True-False-True] 54.2200μs 20.4328μs 48.9408 KOps/s 46.7513 KOps/s $\color{#35bf28}+4.68\%$
test_step_mdp_speed[True-False-True-False-False] 34.1810μs 13.0897μs 76.3960 KOps/s 75.1702 KOps/s $\color{#35bf28}+1.63\%$
test_step_mdp_speed[True-False-False-True-True] 70.6310μs 38.4757μs 25.9904 KOps/s 25.4093 KOps/s $\color{#35bf28}+2.29\%$
test_step_mdp_speed[True-False-False-True-False] 58.7210μs 25.3804μs 39.4005 KOps/s 38.5052 KOps/s $\color{#35bf28}+2.33\%$
test_step_mdp_speed[True-False-False-False-True] 48.1010μs 22.4847μs 44.4748 KOps/s 43.6019 KOps/s $\color{#35bf28}+2.00\%$
test_step_mdp_speed[True-False-False-False-False] 39.6410μs 14.9279μs 66.9885 KOps/s 65.4596 KOps/s $\color{#35bf28}+2.34\%$
test_step_mdp_speed[False-True-True-True-True] 82.9920μs 36.6687μs 27.2712 KOps/s 26.0368 KOps/s $\color{#35bf28}+4.74\%$
test_step_mdp_speed[False-True-True-True-False] 57.2710μs 23.5155μs 42.5252 KOps/s 41.6599 KOps/s $\color{#35bf28}+2.08\%$
test_step_mdp_speed[False-True-True-False-True] 49.6100μs 25.2399μs 39.6198 KOps/s 40.1748 KOps/s $\color{#d91a1a}-1.38\%$
test_step_mdp_speed[False-True-True-False-False] 42.4400μs 15.0460μs 66.4627 KOps/s 64.7539 KOps/s $\color{#35bf28}+2.64\%$
test_step_mdp_speed[False-True-False-True-True] 68.3210μs 38.1973μs 26.1799 KOps/s 25.0704 KOps/s $\color{#35bf28}+4.43\%$
test_step_mdp_speed[False-True-False-True-False] 49.7210μs 25.4467μs 39.2978 KOps/s 38.2967 KOps/s $\color{#35bf28}+2.61\%$
test_step_mdp_speed[False-True-False-False-True] 62.1010μs 26.8153μs 37.2922 KOps/s 37.0781 KOps/s $\color{#35bf28}+0.58\%$
test_step_mdp_speed[False-True-False-False-False] 39.9110μs 16.7456μs 59.7171 KOps/s 58.1409 KOps/s $\color{#35bf28}+2.71\%$
test_step_mdp_speed[False-False-True-True-True] 75.6410μs 40.7612μs 24.5331 KOps/s 23.7349 KOps/s $\color{#35bf28}+3.36\%$
test_step_mdp_speed[False-False-True-True-False] 96.7110μs 27.3474μs 36.5665 KOps/s 35.6779 KOps/s $\color{#35bf28}+2.49\%$
test_step_mdp_speed[False-False-True-False-True] 51.1810μs 26.6028μs 37.5900 KOps/s 37.2982 KOps/s $\color{#35bf28}+0.78\%$
test_step_mdp_speed[False-False-True-False-False] 48.4410μs 17.0160μs 58.7682 KOps/s 58.4837 KOps/s $\color{#35bf28}+0.49\%$
test_step_mdp_speed[False-False-False-True-True] 73.1410μs 41.9620μs 23.8311 KOps/s 23.2710 KOps/s $\color{#35bf28}+2.41\%$
test_step_mdp_speed[False-False-False-True-False] 62.1710μs 28.9660μs 34.5232 KOps/s 33.2729 KOps/s $\color{#35bf28}+3.76\%$
test_step_mdp_speed[False-False-False-False-True] 97.6820μs 27.8131μs 35.9543 KOps/s 35.1146 KOps/s $\color{#35bf28}+2.39\%$
test_step_mdp_speed[False-False-False-False-False] 44.9700μs 18.4982μs 54.0594 KOps/s 52.4089 KOps/s $\color{#35bf28}+3.15\%$
test_values[generalized_advantage_estimate-True-True] 25.0234ms 24.3749ms 41.0258 Ops/s 43.2840 Ops/s $\textbf{\color{#d91a1a}-5.22\%}$
test_values[vec_generalized_advantage_estimate-True-True] 80.8190ms 3.1708ms 315.3760 Ops/s 310.3869 Ops/s $\color{#35bf28}+1.61\%$
test_values[td0_return_estimate-False-False] 0.1040ms 62.9827μs 15.8774 KOps/s 17.2439 KOps/s $\textbf{\color{#d91a1a}-7.92\%}$
test_values[td1_return_estimate-False-False] 53.7380ms 52.8491ms 18.9218 Ops/s 20.1115 Ops/s $\textbf{\color{#d91a1a}-5.92\%}$
test_values[vec_td1_return_estimate-False-False] 2.0497ms 1.7700ms 564.9578 Ops/s 573.9436 Ops/s $\color{#d91a1a}-1.57\%$
test_values[td_lambda_return_estimate-True-False] 85.5226ms 84.8372ms 11.7873 Ops/s 12.5684 Ops/s $\textbf{\color{#d91a1a}-6.22\%}$
test_values[vec_td_lambda_return_estimate-True-False] 3.9902ms 1.7822ms 561.0896 Ops/s 565.3948 Ops/s $\color{#d91a1a}-0.76\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 24.3885ms 23.4005ms 42.7342 Ops/s 44.0351 Ops/s $\color{#d91a1a}-2.95\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 0.8559ms 0.6788ms 1.4733 KOps/s 1.4946 KOps/s $\color{#d91a1a}-1.43\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.6934ms 0.6336ms 1.5784 KOps/s 1.5847 KOps/s $\color{#d91a1a}-0.40\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5100ms 1.4422ms 693.3890 Ops/s 699.3882 Ops/s $\color{#d91a1a}-0.86\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.9182ms 0.6719ms 1.4884 KOps/s 1.5573 KOps/s $\color{#d91a1a}-4.42\%$
test_dqn_speed 7.7430ms 1.4732ms 678.7782 Ops/s 699.8349 Ops/s $\color{#d91a1a}-3.01\%$
test_ddpg_speed 3.3456ms 2.7790ms 359.8475 Ops/s 365.5831 Ops/s $\color{#d91a1a}-1.57\%$
test_sac_speed 8.9717ms 8.5602ms 116.8193 Ops/s 119.1779 Ops/s $\color{#d91a1a}-1.98\%$
test_redq_speed 11.2096ms 10.5481ms 94.8037 Ops/s 94.9232 Ops/s $\color{#d91a1a}-0.13\%$
test_redq_deprec_speed 12.2041ms 11.6439ms 85.8817 Ops/s 86.6681 Ops/s $\color{#d91a1a}-0.91\%$
test_td3_speed 8.8919ms 8.7223ms 114.6486 Ops/s 115.8122 Ops/s $\color{#d91a1a}-1.00\%$
test_cql_speed 26.4657ms 25.7658ms 38.8111 Ops/s 38.9932 Ops/s $\color{#d91a1a}-0.47\%$
test_a2c_speed 5.6820ms 5.4447ms 183.6664 Ops/s 184.5413 Ops/s $\color{#d91a1a}-0.47\%$
test_ppo_speed 6.2554ms 5.8233ms 171.7225 Ops/s 175.3342 Ops/s $\color{#d91a1a}-2.06\%$
test_reinforce_speed 4.7441ms 4.5485ms 219.8504 Ops/s 223.8080 Ops/s $\color{#d91a1a}-1.77\%$
test_iql_speed 0.1128s 22.2552ms 44.9332 Ops/s 50.5367 Ops/s $\textbf{\color{#d91a1a}-11.09\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.6981ms 3.5784ms 279.4540 Ops/s 279.3223 Ops/s $\color{#35bf28}+0.05\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7535ms 0.5659ms 1.7672 KOps/s 1.7856 KOps/s $\color{#d91a1a}-1.03\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6867ms 0.5361ms 1.8652 KOps/s 1.8827 KOps/s $\color{#d91a1a}-0.93\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 4.0217ms 3.5875ms 278.7480 Ops/s 279.1606 Ops/s $\color{#d91a1a}-0.15\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7741ms 0.5598ms 1.7864 KOps/s 1.8097 KOps/s $\color{#d91a1a}-1.29\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6948ms 0.5334ms 1.8749 KOps/s 1.8960 KOps/s $\color{#d91a1a}-1.11\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.8098ms 3.6993ms 270.3178 Ops/s 267.5877 Ops/s $\color{#35bf28}+1.02\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8221ms 0.6958ms 1.4372 KOps/s 1.4556 KOps/s $\color{#d91a1a}-1.27\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8234ms 0.6684ms 1.4961 KOps/s 1.5183 KOps/s $\color{#d91a1a}-1.46\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.6596ms 3.5777ms 279.5070 Ops/s 279.1142 Ops/s $\color{#35bf28}+0.14\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7041ms 0.5665ms 1.7653 KOps/s 1.7877 KOps/s $\color{#d91a1a}-1.25\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7236ms 0.5379ms 1.8592 KOps/s 1.8739 KOps/s $\color{#d91a1a}-0.78\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.7725ms 3.5847ms 278.9635 Ops/s 276.4390 Ops/s $\color{#35bf28}+0.91\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6796ms 0.5590ms 1.7888 KOps/s 1.7988 KOps/s $\color{#d91a1a}-0.55\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6741ms 0.5309ms 1.8837 KOps/s 1.8848 KOps/s $\color{#d91a1a}-0.06\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.8251ms 3.7075ms 269.7202 Ops/s 267.3695 Ops/s $\color{#35bf28}+0.88\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8790ms 0.6977ms 1.4333 KOps/s 1.4526 KOps/s $\color{#d91a1a}-1.32\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8542ms 0.6674ms 1.4984 KOps/s 1.5079 KOps/s $\color{#d91a1a}-0.63\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1405s 10.3663ms 96.4663 Ops/s 99.4814 Ops/s $\color{#d91a1a}-3.03\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 18.5166ms 16.0945ms 62.1332 Ops/s 55.2287 Ops/s $\textbf{\color{#35bf28}+12.50\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 8.1363ms 3.0801ms 324.6682 Ops/s 332.7289 Ops/s $\color{#d91a1a}-2.42\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.1176s 9.8935ms 101.0762 Ops/s 101.6335 Ops/s $\color{#d91a1a}-0.55\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 18.4196ms 16.0326ms 62.3731 Ops/s 62.1991 Ops/s $\color{#35bf28}+0.28\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 8.1837ms 3.1085ms 321.7010 Ops/s 328.0462 Ops/s $\color{#d91a1a}-1.93\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.1189s 10.2383ms 97.6725 Ops/s 97.3370 Ops/s $\color{#35bf28}+0.34\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 0.1246s 18.4504ms 54.1993 Ops/s 60.2175 Ops/s $\textbf{\color{#d91a1a}-9.99\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 5.9103ms 3.2949ms 303.4972 Ops/s 298.1531 Ops/s $\color{#35bf28}+1.79\%$

@vmoens
Contributor Author

vmoens commented Feb 17, 2024

@albertbou92 we should include the KL version of PPO in our SOTA benchmarks to make sure this is covered too!

@vmoens added the "bug" and "Suitable for minor" labels on Feb 17, 2024
@vmoens vmoens merged commit e538fdc into main Feb 17, 2024
18 of 27 checks passed
@vmoens vmoens deleted the fix-kl-ppo branch February 17, 2024 08:04
@skandermoalla
Contributor

skandermoalla commented Feb 17, 2024

@vmoens @albertbou92 you will very likely need to redesign the beta update. Updating it on every minibatch (every loss call), based on a minibatch KL that may be noisy, is very unstable. I would recommend taking it out of the forward pass and adding a method called something like loss.update_adaptive_parameters(kl) that takes a KL estimate (typically the mean KL over all the minibatches of the last epoch), and calling it outside, at the algorithm level, once per batch (i.e. per rollout).
This is what is done in the original implementation of KLPPO
https://github.com/joschu/modular_rl/blob/6970cde3da265cf2a98537250fea5e0c0d9a7639/modular_rl/ppo.py#L208C19-
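
A minimal sketch of the redesign suggested above, in plain Python so it does not depend on TorchRL. Everything here is hypothetical: the class `AdaptiveKLCoeff`, the function `train_on_rollout`, and the default values are illustrative names and numbers, not part of `KLPENPPOLoss`. The doubling/halving rule with a 1.5x band around the target is the commonly cited adaptive-KL heuristic and is an assumption; the linked implementation may use different constants.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class AdaptiveKLCoeff:
    """Hypothetical holder for the KL-penalty coefficient (beta)."""

    beta: float = 1.0
    kl_target: float = 0.01
    tolerance: float = 1.5  # assumed width of the band around kl_target
    factor: float = 2.0     # assumed multiplicative step for beta

    def update_adaptive_parameters(self, kl: float) -> float:
        """Adjust beta once, from an aggregated KL estimate."""
        if kl > self.kl_target * self.tolerance:
            self.beta *= self.factor   # policy moved too far: penalize more
        elif kl < self.kl_target / self.tolerance:
            self.beta /= self.factor   # policy barely moved: penalize less
        return self.beta


def train_on_rollout(coeff: AdaptiveKLCoeff, kls_per_epoch: list[list[float]]) -> float:
    """Stand-in for one rollout's worth of optimization epochs.

    kls_per_epoch[e][m] is the KL measured on minibatch m of epoch e.
    In a real loop the loss would be computed with a fixed coeff.beta
    inside the inner loop; here only the bookkeeping is shown.
    """
    last_epoch_kls: list[float] = []
    for epoch_kls in kls_per_epoch:
        last_epoch_kls = list(epoch_kls)  # beta is NOT touched per minibatch
    # Single update per rollout, using the mean KL of the last epoch.
    return coeff.update_adaptive_parameters(fmean(last_epoch_kls))


coeff = AdaptiveKLCoeff()
train_on_rollout(coeff, [[0.01, 0.02], [0.03, 0.05]])
print(coeff.beta)
```

The point of the design is that beta stays constant while the minibatches of a rollout are processed, and changes at most once per rollout from an averaged, less noisy KL estimate.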

@albertbou92
Contributor

albertbou92 commented Feb 17, 2024

It is also sometimes useful to be able to use ClipPPOLoss with the KL term (for example, against a prior). I think we should include that. Maybe the KL term should be a loss module by itself so it can be combined with any other loss? We could reproduce KLPPO as PPOLoss + KLLoss, for example, and at the same time enable other options.
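
To make the composition idea concrete, here is a rough sketch in plain Python on scalar inputs. None of these names exist in TorchRL: `clipped_surrogate`, `categorical_kl`, `KLPenalty`, and `total_loss` are hypothetical, and the KL direction (reference distribution first) is an assumption, since the comment above does not specify one.

```python
import math


def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Clipped PPO objective for one sample, returned as a loss (negated)."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return -min(ratio * advantage, clipped_ratio * advantage)


def categorical_kl(p: list[float], q: list[float]) -> float:
    """KL(p || q) for two categorical distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


class KLPenalty:
    """Hypothetical standalone KL term that can be added to any other loss."""

    def __init__(self, beta: float = 1.0) -> None:
        self.beta = beta

    def __call__(self, reference_probs: list[float], policy_probs: list[float]) -> float:
        # Assumed direction: KL(reference || policy), e.g. against a prior.
        return self.beta * categorical_kl(reference_probs, policy_probs)


def total_loss(ratio, advantage, reference_probs, policy_probs, kl_penalty, eps=0.2):
    """Clipped surrogate plus an independent KL term."""
    return clipped_surrogate(ratio, advantage, eps) + kl_penalty(reference_probs, policy_probs)


penalty = KLPenalty(beta=1.0)
print(total_loss(1.5, 1.0, [0.5, 0.5], [0.25, 0.75], penalty))
```

Because the KL term is its own object, the same `KLPenalty` could be paired with an unclipped surrogate to mimic a KL-penalized PPO, or dropped entirely, without touching the surrogate code.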

Development

Successfully merging this pull request may close these issues.

[BUG] kl divergence calculation in KLPENPPOLoss is always zero
4 participants