[BugFix] Fix KLPENPPOLoss KL computation #1922
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1922
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (25 Unrelated Failures)
As of commit 33abbad with merge base 67f659c:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_single | 62.3462ms | 61.9360ms | 16.1457 Ops/s | 16.4474 Ops/s | |
test_sync | 40.8298ms | 33.5033ms | 29.8478 Ops/s | 30.4009 Ops/s | |
test_async | 74.1144ms | 32.3004ms | 30.9594 Ops/s | 30.9322 Ops/s | |
test_simple | 0.5040s | 0.4403s | 2.2710 Ops/s | 2.3188 Ops/s | |
test_transformed | 0.6496s | 0.5954s | 1.6795 Ops/s | 1.7131 Ops/s | |
test_serial | 1.5047s | 1.4569s | 0.6864 Ops/s | 0.7048 Ops/s | |
test_parallel | 1.4898s | 1.3995s | 0.7145 Ops/s | 0.7283 Ops/s | |
test_step_mdp_speed[True-True-True-True-True] | 0.1009ms | 20.6613μs | 48.3997 KOps/s | 46.9842 KOps/s | |
test_step_mdp_speed[True-True-True-True-False] | 42.4590μs | 12.8265μs | 77.9639 KOps/s | 77.4479 KOps/s | |
test_step_mdp_speed[True-True-True-False-True] | 36.4680μs | 12.3463μs | 80.9961 KOps/s | 80.6192 KOps/s | |
test_step_mdp_speed[True-True-True-False-False] | 41.0470μs | 7.4513μs | 134.2048 KOps/s | 133.8685 KOps/s | |
test_step_mdp_speed[True-True-False-True-True] | 46.5070μs | 22.3051μs | 44.8328 KOps/s | 44.0703 KOps/s | |
test_step_mdp_speed[True-True-False-True-False] | 41.7980μs | 14.0209μs | 71.3223 KOps/s | 70.2532 KOps/s | |
test_step_mdp_speed[True-True-False-False-True] | 36.4680μs | 13.6043μs | 73.5060 KOps/s | 73.3563 KOps/s | |
test_step_mdp_speed[True-True-False-False-False] | 38.9720μs | 8.7118μs | 114.7872 KOps/s | 114.1941 KOps/s | |
test_step_mdp_speed[True-False-True-True-True] | 49.3830μs | 23.7675μs | 42.0743 KOps/s | 41.9236 KOps/s | |
test_step_mdp_speed[True-False-True-True-False] | 56.4750μs | 15.4874μs | 64.5688 KOps/s | 64.0067 KOps/s | |
test_step_mdp_speed[True-False-True-False-True] | 85.7500μs | 13.4451μs | 74.3765 KOps/s | 73.1324 KOps/s | |
test_step_mdp_speed[True-False-True-False-False] | 35.6170μs | 8.7185μs | 114.6984 KOps/s | 115.7210 KOps/s | |
test_step_mdp_speed[True-False-False-True-True] | 61.0940μs | 25.1290μs | 39.7946 KOps/s | 40.7843 KOps/s | |
test_step_mdp_speed[True-False-False-True-False] | 41.6680μs | 16.6824μs | 59.9436 KOps/s | 60.4378 KOps/s | |
test_step_mdp_speed[True-False-False-False-True] | 41.1770μs | 14.6965μs | 68.0435 KOps/s | 67.5303 KOps/s | |
test_step_mdp_speed[True-False-False-False-False] | 42.8000μs | 9.9860μs | 100.1398 KOps/s | 100.2683 KOps/s | |
test_step_mdp_speed[False-True-True-True-True] | 77.6120μs | 23.7895μs | 42.0353 KOps/s | 41.8339 KOps/s | |
test_step_mdp_speed[False-True-True-True-False] | 39.1320μs | 15.5435μs | 64.3357 KOps/s | 64.9648 KOps/s | |
test_step_mdp_speed[False-True-True-False-True] | 50.1320μs | 15.8114μs | 63.2453 KOps/s | 63.0400 KOps/s | |
test_step_mdp_speed[False-True-True-False-False] | 29.9250μs | 10.0260μs | 99.7411 KOps/s | 100.3707 KOps/s | |
test_step_mdp_speed[False-True-False-True-True] | 36.6270μs | 25.1849μs | 39.7064 KOps/s | 39.6152 KOps/s | |
test_step_mdp_speed[False-True-False-True-False] | 49.7330μs | 16.7450μs | 59.7193 KOps/s | 60.4346 KOps/s | |
test_step_mdp_speed[False-True-False-False-True] | 71.1030μs | 16.7395μs | 59.7389 KOps/s | 58.3651 KOps/s | |
test_step_mdp_speed[False-True-False-False-False] | 43.6310μs | 11.0994μs | 90.0949 KOps/s | 89.4551 KOps/s | |
test_step_mdp_speed[False-False-True-True-True] | 62.4560μs | 26.2615μs | 38.0786 KOps/s | 38.0038 KOps/s | |
test_step_mdp_speed[False-False-True-True-False] | 53.9200μs | 18.1455μs | 55.1101 KOps/s | 55.6339 KOps/s | |
test_step_mdp_speed[False-False-True-False-True] | 49.9240μs | 16.9642μs | 58.9475 KOps/s | 58.6930 KOps/s | |
test_step_mdp_speed[False-False-True-False-False] | 33.2620μs | 11.2375μs | 88.9876 KOps/s | 89.5627 KOps/s | |
test_step_mdp_speed[False-False-False-True-True] | 63.3480μs | 27.0533μs | 36.9640 KOps/s | 37.2315 KOps/s | |
test_step_mdp_speed[False-False-False-True-False] | 50.3440μs | 18.8679μs | 53.0002 KOps/s | 53.1109 KOps/s | |
test_step_mdp_speed[False-False-False-False-True] | 51.7370μs | 17.9337μs | 55.7610 KOps/s | 55.9292 KOps/s | |
test_step_mdp_speed[False-False-False-False-False] | 33.6520μs | 12.2049μs | 81.9341 KOps/s | 82.4772 KOps/s | |
test_values[generalized_advantage_estimate-True-True] | 10.6565ms | 9.1958ms | 108.7449 Ops/s | 107.4061 Ops/s | |
test_values[vec_generalized_advantage_estimate-True-True] | 36.7884ms | 35.5049ms | 28.1651 Ops/s | 28.3415 Ops/s | |
test_values[td0_return_estimate-False-False] | 0.2332ms | 0.1671ms | 5.9838 KOps/s | 5.8761 KOps/s | |
test_values[td1_return_estimate-False-False] | 26.1931ms | 23.3142ms | 42.8922 Ops/s | 42.7666 Ops/s | |
test_values[vec_td1_return_estimate-False-False] | 37.2376ms | 35.4485ms | 28.2099 Ops/s | 28.2520 Ops/s | |
test_values[td_lambda_return_estimate-True-False] | 36.3910ms | 33.5547ms | 29.8021 Ops/s | 29.7056 Ops/s | |
test_values[vec_td_lambda_return_estimate-True-False] | 37.3673ms | 35.2407ms | 28.3763 Ops/s | 28.3448 Ops/s | |
test_gae_speed[generalized_advantage_estimate-False-1-512] | 8.2407ms | 8.1274ms | 123.0399 Ops/s | 120.6872 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 2.2287ms | 1.9535ms | 511.8892 Ops/s | 501.5317 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.4429ms | 0.3548ms | 2.8186 KOps/s | 2.8635 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 47.0600ms | 45.5614ms | 21.9484 Ops/s | 23.0125 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 4.6062ms | 3.0492ms | 327.9502 Ops/s | 326.7621 Ops/s | |
test_dqn_speed | 6.9232ms | 1.3760ms | 726.7218 Ops/s | 743.2669 Ops/s | |
test_ddpg_speed | 76.0314ms | 2.9212ms | 342.3222 Ops/s | 373.3936 Ops/s | |
test_sac_speed | 11.4620ms | 8.6259ms | 115.9300 Ops/s | 118.9795 Ops/s | |
test_redq_speed | 14.4924ms | 13.3937ms | 74.6620 Ops/s | 75.2422 Ops/s | |
test_redq_deprec_speed | 14.9502ms | 13.5097ms | 74.0212 Ops/s | 73.4174 Ops/s | |
test_td3_speed | 9.1961ms | 8.6742ms | 115.2846 Ops/s | 117.7646 Ops/s | |
test_cql_speed | 38.4894ms | 36.7788ms | 27.1896 Ops/s | 27.5702 Ops/s | |
test_a2c_speed | 8.5984ms | 7.2601ms | 137.7398 Ops/s | 137.8853 Ops/s | |
test_ppo_speed | 8.6482ms | 7.5817ms | 131.8964 Ops/s | 130.2702 Ops/s | |
test_reinforce_speed | 7.7629ms | 6.5367ms | 152.9825 Ops/s | 148.8322 Ops/s | |
test_iql_speed | 35.4595ms | 33.8188ms | 29.5694 Ops/s | 30.3960 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 4.0936ms | 2.8033ms | 356.7217 Ops/s | 369.5628 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7265ms | 0.5131ms | 1.9488 KOps/s | 1.9334 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7354ms | 0.4826ms | 2.0723 KOps/s | 2.0389 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 2.9639ms | 2.5873ms | 386.5040 Ops/s | 361.1666 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.1923ms | 0.5106ms | 1.9585 KOps/s | 1.9853 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5970ms | 0.4819ms | 2.0752 KOps/s | 2.1095 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.4537ms | 2.9416ms | 339.9464 Ops/s | 343.7348 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8495ms | 0.6350ms | 1.5747 KOps/s | 1.5896 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 1.7688ms | 0.6252ms | 1.5995 KOps/s | 1.6820 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.3941ms | 2.8033ms | 356.7224 Ops/s | 375.9183 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7698ms | 0.5124ms | 1.9515 KOps/s | 1.3052 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7651ms | 0.4862ms | 2.0567 KOps/s | 1.8907 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.0226ms | 2.7497ms | 363.6702 Ops/s | 353.7156 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7714ms | 0.5084ms | 1.9670 KOps/s | 1.9456 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6043ms | 0.4797ms | 2.0844 KOps/s | 2.0853 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 4.1597ms | 2.8405ms | 352.0506 Ops/s | 360.1406 Ops/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.9462ms | 0.6333ms | 1.5791 KOps/s | 1.5937 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7022ms | 0.6013ms | 1.6632 KOps/s | 1.6624 KOps/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1031s | 7.9982ms | 125.0275 Ops/s | 98.4946 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 16.0388ms | 13.3915ms | 74.6742 Ops/s | 75.1223 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 4.7826ms | 2.5359ms | 394.3338 Ops/s | 397.9345 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1015s | 9.8111ms | 101.9250 Ops/s | 100.7787 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 15.7892ms | 13.3983ms | 74.6362 Ops/s | 75.0481 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 4.6059ms | 2.5035ms | 399.4357 Ops/s | 397.7475 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1092s | 10.3276ms | 96.8277 Ops/s | 121.2863 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 15.4244ms | 13.5217ms | 73.9550 Ops/s | 71.2364 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 4.8537ms | 2.7519ms | 363.3884 Ops/s | 358.1906 Ops/s |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_single | 0.1157s | 0.1142s | 8.7576 Ops/s | 8.8288 Ops/s | |
test_sync | 0.1707s | 0.1026s | 9.7507 Ops/s | 9.7441 Ops/s | |
test_async | 0.2528s | 91.8171ms | 10.8912 Ops/s | 11.0597 Ops/s | |
test_single_pixels | 0.1272s | 0.1263s | 7.9168 Ops/s | 7.6460 Ops/s | |
test_sync_pixels | 87.2121ms | 85.0504ms | 11.7577 Ops/s | 12.1852 Ops/s | |
test_async_pixels | 0.1355s | 70.9403ms | 14.0964 Ops/s | 13.9422 Ops/s | |
test_simple | 0.9043s | 0.8355s | 1.1969 Ops/s | 1.2330 Ops/s | |
test_transformed | 1.1290s | 1.0631s | 0.9406 Ops/s | 0.9451 Ops/s | |
test_serial | 2.5199s | 2.4553s | 0.4073 Ops/s | 0.4088 Ops/s | |
test_parallel | 2.2677s | 2.1115s | 0.4736 Ops/s | 0.4751 Ops/s | |
test_step_mdp_speed[True-True-True-True-True] | 87.0810μs | 32.8434μs | 30.4475 KOps/s | 29.2596 KOps/s | |
test_step_mdp_speed[True-True-True-True-False] | 44.8410μs | 19.7409μs | 50.6564 KOps/s | 49.9266 KOps/s | |
test_step_mdp_speed[True-True-True-False-True] | 48.4810μs | 18.6369μs | 53.6569 KOps/s | 52.0673 KOps/s | |
test_step_mdp_speed[True-True-True-False-False] | 32.8610μs | 11.2162μs | 89.1564 KOps/s | 87.5535 KOps/s | |
test_step_mdp_speed[True-True-False-True-True] | 65.2620μs | 34.6790μs | 28.8359 KOps/s | 27.9348 KOps/s | |
test_step_mdp_speed[True-True-False-True-False] | 54.6100μs | 21.6663μs | 46.1547 KOps/s | 45.2675 KOps/s | |
test_step_mdp_speed[True-True-False-False-True] | 80.3310μs | 20.5448μs | 48.6740 KOps/s | 46.9767 KOps/s | |
test_step_mdp_speed[True-True-False-False-False] | 35.5300μs | 13.0589μs | 76.5763 KOps/s | 74.8786 KOps/s | |
test_step_mdp_speed[True-False-True-True-True] | 0.1031ms | 36.5943μs | 27.3267 KOps/s | 26.4052 KOps/s | |
test_step_mdp_speed[True-False-True-True-False] | 63.8310μs | 23.5826μs | 42.4042 KOps/s | 41.5358 KOps/s | |
test_step_mdp_speed[True-False-True-False-True] | 54.2200μs | 20.4328μs | 48.9408 KOps/s | 46.7513 KOps/s | |
test_step_mdp_speed[True-False-True-False-False] | 34.1810μs | 13.0897μs | 76.3960 KOps/s | 75.1702 KOps/s | |
test_step_mdp_speed[True-False-False-True-True] | 70.6310μs | 38.4757μs | 25.9904 KOps/s | 25.4093 KOps/s | |
test_step_mdp_speed[True-False-False-True-False] | 58.7210μs | 25.3804μs | 39.4005 KOps/s | 38.5052 KOps/s | |
test_step_mdp_speed[True-False-False-False-True] | 48.1010μs | 22.4847μs | 44.4748 KOps/s | 43.6019 KOps/s | |
test_step_mdp_speed[True-False-False-False-False] | 39.6410μs | 14.9279μs | 66.9885 KOps/s | 65.4596 KOps/s | |
test_step_mdp_speed[False-True-True-True-True] | 82.9920μs | 36.6687μs | 27.2712 KOps/s | 26.0368 KOps/s | |
test_step_mdp_speed[False-True-True-True-False] | 57.2710μs | 23.5155μs | 42.5252 KOps/s | 41.6599 KOps/s | |
test_step_mdp_speed[False-True-True-False-True] | 49.6100μs | 25.2399μs | 39.6198 KOps/s | 40.1748 KOps/s | |
test_step_mdp_speed[False-True-True-False-False] | 42.4400μs | 15.0460μs | 66.4627 KOps/s | 64.7539 KOps/s | |
test_step_mdp_speed[False-True-False-True-True] | 68.3210μs | 38.1973μs | 26.1799 KOps/s | 25.0704 KOps/s | |
test_step_mdp_speed[False-True-False-True-False] | 49.7210μs | 25.4467μs | 39.2978 KOps/s | 38.2967 KOps/s | |
test_step_mdp_speed[False-True-False-False-True] | 62.1010μs | 26.8153μs | 37.2922 KOps/s | 37.0781 KOps/s | |
test_step_mdp_speed[False-True-False-False-False] | 39.9110μs | 16.7456μs | 59.7171 KOps/s | 58.1409 KOps/s | |
test_step_mdp_speed[False-False-True-True-True] | 75.6410μs | 40.7612μs | 24.5331 KOps/s | 23.7349 KOps/s | |
test_step_mdp_speed[False-False-True-True-False] | 96.7110μs | 27.3474μs | 36.5665 KOps/s | 35.6779 KOps/s | |
test_step_mdp_speed[False-False-True-False-True] | 51.1810μs | 26.6028μs | 37.5900 KOps/s | 37.2982 KOps/s | |
test_step_mdp_speed[False-False-True-False-False] | 48.4410μs | 17.0160μs | 58.7682 KOps/s | 58.4837 KOps/s | |
test_step_mdp_speed[False-False-False-True-True] | 73.1410μs | 41.9620μs | 23.8311 KOps/s | 23.2710 KOps/s | |
test_step_mdp_speed[False-False-False-True-False] | 62.1710μs | 28.9660μs | 34.5232 KOps/s | 33.2729 KOps/s | |
test_step_mdp_speed[False-False-False-False-True] | 97.6820μs | 27.8131μs | 35.9543 KOps/s | 35.1146 KOps/s | |
test_step_mdp_speed[False-False-False-False-False] | 44.9700μs | 18.4982μs | 54.0594 KOps/s | 52.4089 KOps/s | |
test_values[generalized_advantage_estimate-True-True] | 25.0234ms | 24.3749ms | 41.0258 Ops/s | 43.2840 Ops/s | |
test_values[vec_generalized_advantage_estimate-True-True] | 80.8190ms | 3.1708ms | 315.3760 Ops/s | 310.3869 Ops/s | |
test_values[td0_return_estimate-False-False] | 0.1040ms | 62.9827μs | 15.8774 KOps/s | 17.2439 KOps/s | |
test_values[td1_return_estimate-False-False] | 53.7380ms | 52.8491ms | 18.9218 Ops/s | 20.1115 Ops/s | |
test_values[vec_td1_return_estimate-False-False] | 2.0497ms | 1.7700ms | 564.9578 Ops/s | 573.9436 Ops/s | |
test_values[td_lambda_return_estimate-True-False] | 85.5226ms | 84.8372ms | 11.7873 Ops/s | 12.5684 Ops/s | |
test_values[vec_td_lambda_return_estimate-True-False] | 3.9902ms | 1.7822ms | 561.0896 Ops/s | 565.3948 Ops/s | |
test_gae_speed[generalized_advantage_estimate-False-1-512] | 24.3885ms | 23.4005ms | 42.7342 Ops/s | 44.0351 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 0.8559ms | 0.6788ms | 1.4733 KOps/s | 1.4946 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.6934ms | 0.6336ms | 1.5784 KOps/s | 1.5847 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5100ms | 1.4422ms | 693.3890 Ops/s | 699.3882 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.9182ms | 0.6719ms | 1.4884 KOps/s | 1.5573 KOps/s | |
test_dqn_speed | 7.7430ms | 1.4732ms | 678.7782 Ops/s | 699.8349 Ops/s | |
test_ddpg_speed | 3.3456ms | 2.7790ms | 359.8475 Ops/s | 365.5831 Ops/s | |
test_sac_speed | 8.9717ms | 8.5602ms | 116.8193 Ops/s | 119.1779 Ops/s | |
test_redq_speed | 11.2096ms | 10.5481ms | 94.8037 Ops/s | 94.9232 Ops/s | |
test_redq_deprec_speed | 12.2041ms | 11.6439ms | 85.8817 Ops/s | 86.6681 Ops/s | |
test_td3_speed | 8.8919ms | 8.7223ms | 114.6486 Ops/s | 115.8122 Ops/s | |
test_cql_speed | 26.4657ms | 25.7658ms | 38.8111 Ops/s | 38.9932 Ops/s | |
test_a2c_speed | 5.6820ms | 5.4447ms | 183.6664 Ops/s | 184.5413 Ops/s | |
test_ppo_speed | 6.2554ms | 5.8233ms | 171.7225 Ops/s | 175.3342 Ops/s | |
test_reinforce_speed | 4.7441ms | 4.5485ms | 219.8504 Ops/s | 223.8080 Ops/s | |
test_iql_speed | 0.1128s | 22.2552ms | 44.9332 Ops/s | 50.5367 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.6981ms | 3.5784ms | 279.4540 Ops/s | 279.3223 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7535ms | 0.5659ms | 1.7672 KOps/s | 1.7856 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6867ms | 0.5361ms | 1.8652 KOps/s | 1.8827 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 4.0217ms | 3.5875ms | 278.7480 Ops/s | 279.1606 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7741ms | 0.5598ms | 1.7864 KOps/s | 1.8097 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6948ms | 0.5334ms | 1.8749 KOps/s | 1.8960 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.8098ms | 3.6993ms | 270.3178 Ops/s | 267.5877 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8221ms | 0.6958ms | 1.4372 KOps/s | 1.4556 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8234ms | 0.6684ms | 1.4961 KOps/s | 1.5183 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.6596ms | 3.5777ms | 279.5070 Ops/s | 279.1142 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7041ms | 0.5665ms | 1.7653 KOps/s | 1.7877 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7236ms | 0.5379ms | 1.8592 KOps/s | 1.8739 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.7725ms | 3.5847ms | 278.9635 Ops/s | 276.4390 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.6796ms | 0.5590ms | 1.7888 KOps/s | 1.7988 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6741ms | 0.5309ms | 1.8837 KOps/s | 1.8848 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.8251ms | 3.7075ms | 269.7202 Ops/s | 267.3695 Ops/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8790ms | 0.6977ms | 1.4333 KOps/s | 1.4526 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8542ms | 0.6674ms | 1.4984 KOps/s | 1.5079 KOps/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1405s | 10.3663ms | 96.4663 Ops/s | 99.4814 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 18.5166ms | 16.0945ms | 62.1332 Ops/s | 55.2287 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 8.1363ms | 3.0801ms | 324.6682 Ops/s | 332.7289 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1176s | 9.8935ms | 101.0762 Ops/s | 101.6335 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 18.4196ms | 16.0326ms | 62.3731 Ops/s | 62.1991 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 8.1837ms | 3.1085ms | 321.7010 Ops/s | 328.0462 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1189s | 10.2383ms | 97.6725 Ops/s | 97.3370 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 0.1246s | 18.4504ms | 54.1993 Ops/s | 60.2175 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 5.9103ms | 3.2949ms | 303.4972 Ops/s | 298.1531 Ops/s |
@albertbou92 we should include the KL version of PPO in our SOTA benchmarks to make sure this is covered too!
@vmoens @albertbou92 you will very likely need to redesign the beta update. Updating it on every minibatch (every loss run), based on a minibatch KL that may be noisy, is very unstable. I would recommend taking it out of the forward, making a method called something like
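
To make the suggestion above concrete, here is a minimal sketch of a beta update that lives outside the loss forward. It is not the TorchRL implementation: the class `AdaptiveKLBeta`, its `record` and `update` methods, and the default values are all hypothetical names chosen for illustration, and the 1.5x thresholds follow the adaptive-KL rule described in the PPO paper. Each minibatch only records its KL estimate; beta changes once per epoch, using the mean of those estimates.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AdaptiveKLBeta:
    """Hypothetical helper that keeps the KL coefficient out of the loss forward."""

    beta: float = 1.0        # current KL penalty coefficient (assumed default)
    dtarg: float = 0.01      # target KL divergence (assumed default)
    increment: float = 2.0   # factor applied when the mean KL is too large
    decrement: float = 0.5   # factor applied when the mean KL is too small
    _kls: List[float] = field(default_factory=list)

    def record(self, minibatch_kl: float) -> None:
        # Called once per minibatch: accumulate only, never touch beta here.
        self._kls.append(float(minibatch_kl))

    def update(self) -> float:
        # Called once per epoch: adapt beta on the averaged (less noisy) KL.
        if not self._kls:
            return self.beta
        mean_kl = sum(self._kls) / len(self._kls)
        self._kls.clear()
        if mean_kl > 1.5 * self.dtarg:
            self.beta *= self.increment
        elif mean_kl < self.dtarg / 1.5:
            self.beta *= self.decrement
        return self.beta
```

In a training loop, the loss would read `controller.beta` as a constant during the epoch, call `controller.record(kl)` after each minibatch, and call `controller.update()` once the epoch is over.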
It is also sometimes useful to be able to use the ClipPPOLoss with the KL term (for example against a prior). I think we should include that. Maybe the KL term should be a loss module by itself so it can be combined with any other loss? We could reproduce KLPPO as PPOLoss + KLLoss, for example, and at the same time enable other options.
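
A rough illustration of that composition idea, in plain Python rather than TorchRL modules: each loss term is a callable on a batch, and the KL penalty is one more term that can be added to any objective. Everything here is an assumption made for the sketch, including the names `surrogate_term`, `kl_term`, and `combine`, the dictionary keys, and the choice of categorical policies with KL(reference || current).

```python
import math
from typing import Callable, Dict, Sequence

Batch = Dict[str, object]
LossTerm = Callable[[Batch], float]


def surrogate_term(batch: Batch) -> float:
    # Stand-in for any policy objective (clipped or not); it just reads a
    # precomputed scalar so the example stays dependency-free.
    return float(batch["surrogate_loss"])


def kl_term(beta: float) -> LossTerm:
    # Standalone penalty: beta * KL(reference || current) for categorical
    # policies. Assumes every current probability is strictly positive.
    def loss(batch: Batch) -> float:
        ref: Sequence[float] = batch["ref_probs"]
        cur: Sequence[float] = batch["probs"]
        kl = sum(p * math.log(p / q) for p, q in zip(ref, cur) if p > 0.0)
        return beta * kl

    return loss


def combine(*terms: LossTerm) -> LossTerm:
    # The total loss is the sum of independent terms.
    return lambda batch: sum(term(batch) for term in terms)


batch = {"surrogate_loss": -0.2, "ref_probs": [0.5, 0.5], "probs": [0.25, 0.75]}
total = combine(surrogate_term, kl_term(beta=1.0))(batch)
```

The reference distribution could be the previous policy (which would mimic a KL-penalized PPO) or a fixed prior, without changing the objective term it is paired with.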
Fixes #1920