[BugFix] Fix load_state_dict for TensorDictParams #689

Merged: 3 commits, Feb 23, 2024

@vmoens (Contributor) commented Feb 23, 2024

@facebook-github-bot added the CLA Signed label Feb 23, 2024
@vmoens added the bug label Feb 23, 2024
@vmoens marked this pull request as ready for review February 23, 2024 23:47
@vmoens merged commit 003be12 into main Feb 23, 2024
23 of 34 checks passed
@vmoens deleted the state-dict-params branch February 23, 2024 23:47

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 126. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}19$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change)
test_plain_set_nested 57.2780μs 17.2065μs 58.1176 KOps/s 57.4174 KOps/s $\color{#35bf28}+1.22\%$
test_plain_set_stack_nested 51.3460μs 17.5587μs 56.9518 KOps/s 56.6815 KOps/s $\color{#35bf28}+0.48\%$
test_plain_set_nested_inplace 79.7600μs 19.7538μs 50.6232 KOps/s 49.9103 KOps/s $\color{#35bf28}+1.43\%$
test_plain_set_stack_nested_inplace 61.7360μs 19.7436μs 50.6494 KOps/s 49.5618 KOps/s $\color{#35bf28}+2.19\%$
test_items 34.1440μs 2.3533μs 424.9436 KOps/s 406.5861 KOps/s $\color{#35bf28}+4.52\%$
test_items_nested 0.9292ms 0.2709ms 3.6907 KOps/s 3.6258 KOps/s $\color{#35bf28}+1.79\%$
test_items_nested_locked 0.5754ms 0.2695ms 3.7105 KOps/s 3.6295 KOps/s $\color{#35bf28}+2.23\%$
test_items_nested_leaf 0.4578ms 0.1666ms 6.0019 KOps/s 5.8148 KOps/s $\color{#35bf28}+3.22\%$
test_items_stack_nested 0.4572ms 0.2709ms 3.6918 KOps/s 3.5735 KOps/s $\color{#35bf28}+3.31\%$
test_items_stack_nested_leaf 0.3304ms 0.1685ms 5.9355 KOps/s 5.8696 KOps/s $\color{#35bf28}+1.12\%$
test_items_stack_nested_locked 0.8195ms 0.2717ms 3.6801 KOps/s 3.5387 KOps/s $\color{#35bf28}+4.00\%$
test_keys 52.3300μs 3.8848μs 257.4168 KOps/s 253.5633 KOps/s $\color{#35bf28}+1.52\%$
test_keys_nested 2.1061ms 0.1516ms 6.5959 KOps/s 6.6460 KOps/s $\color{#d91a1a}-0.75\%$
test_keys_nested_locked 0.2619ms 0.1555ms 6.4304 KOps/s 6.5313 KOps/s $\color{#d91a1a}-1.55\%$
test_keys_nested_leaf 39.0402ms 0.1379ms 7.2497 KOps/s 7.6137 KOps/s $\color{#d91a1a}-4.78\%$
test_keys_stack_nested 0.2622ms 0.1518ms 6.5875 KOps/s 6.6039 KOps/s $\color{#d91a1a}-0.25\%$
test_keys_stack_nested_leaf 0.2287ms 0.1314ms 7.6110 KOps/s 7.4693 KOps/s $\color{#35bf28}+1.90\%$
test_keys_stack_nested_locked 0.3256ms 0.1572ms 6.3594 KOps/s 6.3960 KOps/s $\color{#d91a1a}-0.57\%$
test_values 9.1652μs 1.1418μs 875.7866 KOps/s 860.0100 KOps/s $\color{#35bf28}+1.83\%$
test_values_nested 97.0520μs 52.4259μs 19.0746 KOps/s 18.2742 KOps/s $\color{#35bf28}+4.38\%$
test_values_nested_locked 93.0350μs 52.1776μs 19.1653 KOps/s 18.3294 KOps/s $\color{#35bf28}+4.56\%$
test_values_nested_leaf 94.3070μs 46.7717μs 21.3805 KOps/s 20.2478 KOps/s $\textbf{\color{#35bf28}+5.59\%}$
test_values_stack_nested 97.9740μs 53.0536μs 18.8489 KOps/s 18.1655 KOps/s $\color{#35bf28}+3.76\%$
test_values_stack_nested_leaf 88.7060μs 46.9686μs 21.2908 KOps/s 20.7738 KOps/s $\color{#35bf28}+2.49\%$
test_values_stack_nested_locked 0.1378ms 52.7949μs 18.9412 KOps/s 18.1594 KOps/s $\color{#35bf28}+4.31\%$
test_membership 36.9200μs 1.3572μs 736.8122 KOps/s 684.5140 KOps/s $\textbf{\color{#35bf28}+7.64\%}$
test_membership_nested 48.1820μs 3.4611μs 288.9227 KOps/s 282.9253 KOps/s $\color{#35bf28}+2.12\%$
test_membership_nested_leaf 31.2590μs 3.4826μs 287.1458 KOps/s 281.6471 KOps/s $\color{#35bf28}+1.95\%$
test_membership_stacked_nested 16.0900μs 3.4631μs 288.7554 KOps/s 288.3582 KOps/s $\color{#35bf28}+0.14\%$
test_membership_stacked_nested_leaf 21.9410μs 3.4940μs 286.2014 KOps/s 282.2340 KOps/s $\color{#35bf28}+1.41\%$
test_membership_nested_last 30.2060μs 6.8432μs 146.1296 KOps/s 143.8332 KOps/s $\color{#35bf28}+1.60\%$
test_membership_nested_leaf_last 30.8280μs 6.7799μs 147.4957 KOps/s 144.0723 KOps/s $\color{#35bf28}+2.38\%$
test_membership_stacked_nested_last 27.0810μs 6.7933μs 147.2042 KOps/s 62.5802 KOps/s $\textbf{\color{#35bf28}+135.22\%}$
test_membership_stacked_nested_leaf_last 38.9330μs 6.8573μs 145.8305 KOps/s 61.5953 KOps/s $\textbf{\color{#35bf28}+136.76\%}$
test_nested_getleaf 56.0840μs 10.5607μs 94.6910 KOps/s 92.0629 KOps/s $\color{#35bf28}+2.85\%$
test_nested_get 51.9870μs 9.9884μs 100.1161 KOps/s 97.1049 KOps/s $\color{#35bf28}+3.10\%$
test_stacked_getleaf 42.7310μs 10.4558μs 95.6405 KOps/s 92.2989 KOps/s $\color{#35bf28}+3.62\%$
test_stacked_get 42.8610μs 9.9463μs 100.5400 KOps/s 96.9634 KOps/s $\color{#35bf28}+3.69\%$
test_nested_getitemleaf 36.7590μs 12.2844μs 81.4042 KOps/s 82.0089 KOps/s $\color{#d91a1a}-0.74\%$
test_nested_getitem 47.4090μs 11.8047μs 84.7117 KOps/s 86.8038 KOps/s $\color{#d91a1a}-2.41\%$
test_stacked_getitemleaf 41.0370μs 12.0752μs 82.8143 KOps/s 83.8970 KOps/s $\color{#d91a1a}-1.29\%$
test_stacked_getitem 49.0610μs 11.6538μs 85.8091 KOps/s 87.3205 KOps/s $\color{#d91a1a}-1.73\%$
test_lock_nested 0.8780ms 0.3451ms 2.8975 KOps/s 3.0253 KOps/s $\color{#d91a1a}-4.22\%$
test_lock_stack_nested 0.5168ms 0.3099ms 3.2269 KOps/s 3.4980 KOps/s $\textbf{\color{#d91a1a}-7.75\%}$
test_unlock_nested 89.9095ms 0.4377ms 2.2845 KOps/s 2.3998 KOps/s $\color{#d91a1a}-4.80\%$
test_unlock_stack_nested 0.4664ms 0.3185ms 3.1398 KOps/s 3.3963 KOps/s $\textbf{\color{#d91a1a}-7.55\%}$
test_flatten_speed 2.5714ms 0.3674ms 2.7215 KOps/s 2.7465 KOps/s $\color{#d91a1a}-0.91\%$
test_unflatten_speed 0.6638ms 0.4599ms 2.1743 KOps/s 2.1569 KOps/s $\color{#35bf28}+0.81\%$
test_common_ops 4.9585ms 0.7311ms 1.3678 KOps/s 1.4440 KOps/s $\textbf{\color{#d91a1a}-5.27\%}$
test_creation 35.1060μs 1.8463μs 541.6128 KOps/s 550.9462 KOps/s $\color{#d91a1a}-1.69\%$
test_creation_empty 38.3210μs 11.0661μs 90.3662 KOps/s 94.2167 KOps/s $\color{#d91a1a}-4.09\%$
test_creation_nested_1 40.6160μs 13.8811μs 72.0406 KOps/s 75.2717 KOps/s $\color{#d91a1a}-4.29\%$
test_creation_nested_2 52.7990μs 17.0146μs 58.7731 KOps/s 61.1268 KOps/s $\color{#d91a1a}-3.85\%$
test_clone 88.9170μs 13.3579μs 74.8620 KOps/s 77.2471 KOps/s $\color{#d91a1a}-3.09\%$
test_getitem[int] 43.6920μs 11.7652μs 84.9965 KOps/s 90.2082 KOps/s $\textbf{\color{#d91a1a}-5.78\%}$
test_getitem[slice_int] 0.1126ms 23.2839μs 42.9481 KOps/s 45.1111 KOps/s $\color{#d91a1a}-4.79\%$
test_getitem[range] 0.2657ms 42.7495μs 23.3921 KOps/s 24.3627 KOps/s $\color{#d91a1a}-3.98\%$
test_getitem[tuple] 60.6440μs 19.2542μs 51.9367 KOps/s 55.2798 KOps/s $\textbf{\color{#d91a1a}-6.05\%}$
test_getitem[list] 0.3095ms 37.4988μs 26.6675 KOps/s 26.7901 KOps/s $\color{#d91a1a}-0.46\%$
test_setitem_dim[int] 59.8820μs 31.9742μs 31.2752 KOps/s 32.1535 KOps/s $\color{#d91a1a}-2.73\%$
test_setitem_dim[slice_int] 0.1061ms 58.0133μs 17.2374 KOps/s 18.0437 KOps/s $\color{#d91a1a}-4.47\%$
test_setitem_dim[range] 0.1452ms 85.0691μs 11.7552 KOps/s 13.2966 KOps/s $\textbf{\color{#d91a1a}-11.59\%}$
test_setitem_dim[tuple] 78.9180μs 47.1824μs 21.1943 KOps/s 22.0299 KOps/s $\color{#d91a1a}-3.79\%$
test_setitem 0.1292ms 20.7991μs 48.0791 KOps/s 51.1585 KOps/s $\textbf{\color{#d91a1a}-6.02\%}$
test_set 0.1247ms 20.3047μs 49.2496 KOps/s 52.6875 KOps/s $\textbf{\color{#d91a1a}-6.53\%}$
test_set_shared 2.0579ms 0.1465ms 6.8275 KOps/s 7.2425 KOps/s $\textbf{\color{#d91a1a}-5.73\%}$
test_update 0.1280ms 23.3953μs 42.7437 KOps/s 44.5357 KOps/s $\color{#d91a1a}-4.02\%$
test_update_nested 0.1663ms 31.5126μs 31.7333 KOps/s 33.3783 KOps/s $\color{#d91a1a}-4.93\%$
test_set_nested 0.1225ms 22.3423μs 44.7582 KOps/s 47.2722 KOps/s $\textbf{\color{#d91a1a}-5.32\%}$
test_set_nested_new 0.1035ms 26.3507μs 37.9497 KOps/s 40.1796 KOps/s $\textbf{\color{#d91a1a}-5.55\%}$
test_select 0.1394ms 39.9835μs 25.0103 KOps/s 26.7276 KOps/s $\textbf{\color{#d91a1a}-6.43\%}$
test_select_nested 0.1183ms 59.9581μs 16.6783 KOps/s 17.4766 KOps/s $\color{#d91a1a}-4.57\%$
test_exclude_nested 0.2729ms 0.1195ms 8.3700 KOps/s 8.6279 KOps/s $\color{#d91a1a}-2.99\%$
test_empty[True] 0.7091ms 0.4224ms 2.3675 KOps/s 2.4565 KOps/s $\color{#d91a1a}-3.62\%$
test_empty[False] 8.0592μs 1.0874μs 919.6647 KOps/s 968.6325 KOps/s $\textbf{\color{#d91a1a}-5.06\%}$
test_unbind_speed 0.3721ms 0.2700ms 3.7032 KOps/s 4.0528 KOps/s $\textbf{\color{#d91a1a}-8.63\%}$
test_unbind_speed_stack0 0.4378ms 0.2494ms 4.0103 KOps/s 4.2638 KOps/s $\textbf{\color{#d91a1a}-5.95\%}$
test_unbind_speed_stack1 0.1285s 0.6935ms 1.4421 KOps/s 1.5446 KOps/s $\textbf{\color{#d91a1a}-6.64\%}$
test_split 0.1152s 1.7104ms 584.6637 Ops/s 584.5641 Ops/s $\color{#35bf28}+0.02\%$
test_chunk 2.3227ms 1.5128ms 661.0178 Ops/s 679.5059 Ops/s $\color{#d91a1a}-2.72\%$
test_creation[device0] 3.5498ms 0.1067ms 9.3697 KOps/s 9.4691 KOps/s $\color{#d91a1a}-1.05\%$
test_creation_from_tensor 0.1972ms 82.9072μs 12.0617 KOps/s 11.8166 KOps/s $\color{#35bf28}+2.07\%$
test_add_one[memmap_tensor0] 0.1138ms 5.5111μs 181.4512 KOps/s 177.0421 KOps/s $\color{#35bf28}+2.49\%$
test_contiguous[memmap_tensor0] 20.7990μs 0.6432μs 1.5548 MOps/s 1.5273 MOps/s $\color{#35bf28}+1.80\%$
test_stack[memmap_tensor0] 28.4130μs 3.7623μs 265.7957 KOps/s 273.5168 KOps/s $\color{#d91a1a}-2.82\%$
test_memmaptd_index 1.1344ms 0.2390ms 4.1834 KOps/s 4.1988 KOps/s $\color{#d91a1a}-0.37\%$
test_memmaptd_index_astensor 0.6870ms 0.3033ms 3.2974 KOps/s 3.3066 KOps/s $\color{#d91a1a}-0.28\%$
test_memmaptd_index_op 0.9749ms 0.6174ms 1.6196 KOps/s 1.6275 KOps/s $\color{#d91a1a}-0.48\%$
test_serialize_model 0.2217s 0.1176s 8.5007 Ops/s 8.1044 Ops/s $\color{#35bf28}+4.89\%$
test_serialize_model_pickle 0.4519s 0.3811s 2.6238 Ops/s 2.6356 Ops/s $\color{#d91a1a}-0.45\%$
test_serialize_weights 0.1093s 98.8503ms 10.1163 Ops/s 9.7909 Ops/s $\color{#35bf28}+3.32\%$
test_serialize_weights_returnearly 0.1294s 0.1217s 8.2170 Ops/s 7.0822 Ops/s $\textbf{\color{#35bf28}+16.02\%}$
test_serialize_weights_pickle 0.7896s 0.5015s 1.9941 Ops/s 2.3449 Ops/s $\textbf{\color{#d91a1a}-14.96\%}$
test_serialize_weights_filesystem 97.5070ms 92.8876ms 10.7657 Ops/s 10.5093 Ops/s $\color{#35bf28}+2.44\%$
test_serialize_model_filesystem 98.9760ms 95.4773ms 10.4737 Ops/s 10.5327 Ops/s $\color{#d91a1a}-0.56\%$
test_reshape_pytree 49.4030μs 21.1076μs 47.3762 KOps/s 48.5393 KOps/s $\color{#d91a1a}-2.40\%$
test_reshape_td 68.8190μs 31.8604μs 31.3869 KOps/s 31.9661 KOps/s $\color{#d91a1a}-1.81\%$
test_view_pytree 55.2740μs 21.3338μs 46.8740 KOps/s 47.8245 KOps/s $\color{#d91a1a}-1.99\%$
test_view_td 0.1406s 66.1486μs 15.1175 KOps/s 16.5772 KOps/s $\textbf{\color{#d91a1a}-8.81\%}$
test_unbind_pytree 76.3330μs 24.5752μs 40.6915 KOps/s 41.0396 KOps/s $\color{#d91a1a}-0.85\%$
test_unbind_td 0.1011ms 37.2459μs 26.8486 KOps/s 28.2274 KOps/s $\color{#d91a1a}-4.88\%$
test_split_pytree 64.5110μs 24.1037μs 41.4874 KOps/s 42.3478 KOps/s $\color{#d91a1a}-2.03\%$
test_split_td 0.1262ms 40.8793μs 24.4622 KOps/s 25.2835 KOps/s $\color{#d91a1a}-3.25\%$
test_add_pytree 71.8050μs 30.8831μs 32.3801 KOps/s 32.8774 KOps/s $\color{#d91a1a}-1.51\%$
test_add_td 0.1637ms 57.9354μs 17.2606 KOps/s 18.5938 KOps/s $\textbf{\color{#d91a1a}-7.17\%}$
test_distributed 0.2291ms 0.1034ms 9.6715 KOps/s 9.8502 KOps/s $\color{#d91a1a}-1.81\%$
test_tdmodule 0.1897ms 23.2260μs 43.0552 KOps/s 45.1489 KOps/s $\color{#d91a1a}-4.64\%$
test_tdmodule_dispatch 0.2018ms 45.5276μs 21.9647 KOps/s 22.9983 KOps/s $\color{#d91a1a}-4.49\%$
test_tdseq 0.3882ms 27.0276μs 36.9993 KOps/s 38.4745 KOps/s $\color{#d91a1a}-3.83\%$
test_tdseq_dispatch 0.1467ms 49.4527μs 20.2214 KOps/s 20.8148 KOps/s $\color{#d91a1a}-2.85\%$
test_instantiation_functorch 2.3156ms 1.3510ms 740.2112 Ops/s 768.0013 Ops/s $\color{#d91a1a}-3.62\%$
test_instantiation_td 1.6299ms 1.0175ms 982.7815 Ops/s 999.1495 Ops/s $\color{#d91a1a}-1.64\%$
test_exec_functorch 0.3408ms 0.1610ms 6.2109 KOps/s 6.2871 KOps/s $\color{#d91a1a}-1.21\%$
test_exec_functional_call 0.2934ms 0.1484ms 6.7370 KOps/s 6.7741 KOps/s $\color{#d91a1a}-0.55\%$
test_exec_td 0.2368ms 0.1464ms 6.8302 KOps/s 6.8446 KOps/s $\color{#d91a1a}-0.21\%$
test_exec_td_decorator 1.0033ms 0.1972ms 5.0710 KOps/s 5.1231 KOps/s $\color{#d91a1a}-1.02\%$
test_vmap_mlp_speed[True-True] 0.7006ms 0.4893ms 2.0439 KOps/s 2.0818 KOps/s $\color{#d91a1a}-1.82\%$
test_vmap_mlp_speed[True-False] 0.7663ms 0.4831ms 2.0701 KOps/s 2.0647 KOps/s $\color{#35bf28}+0.26\%$
test_vmap_mlp_speed[False-True] 0.6585ms 0.3931ms 2.5442 KOps/s 2.5412 KOps/s $\color{#35bf28}+0.12\%$
test_vmap_mlp_speed[False-False] 0.7352ms 0.3967ms 2.5207 KOps/s 2.5365 KOps/s $\color{#d91a1a}-0.62\%$
test_vmap_mlp_speed_decorator[True-True] 1.2177ms 0.5355ms 1.8674 KOps/s 1.8838 KOps/s $\color{#d91a1a}-0.87\%$
test_vmap_mlp_speed_decorator[True-False] 1.0108ms 0.5372ms 1.8616 KOps/s 1.9180 KOps/s $\color{#d91a1a}-2.94\%$
test_vmap_mlp_speed_decorator[False-True] 0.5359ms 0.4082ms 2.4497 KOps/s 2.4424 KOps/s $\color{#35bf28}+0.30\%$
test_vmap_mlp_speed_decorator[False-False] 0.8103ms 0.4090ms 2.4450 KOps/s 2.4486 KOps/s $\color{#d91a1a}-0.15\%$
test_to_module_speed[True] 2.0051ms 1.3967ms 715.9518 Ops/s 734.0346 Ops/s $\color{#d91a1a}-2.46\%$
test_to_module_speed[False] 2.4549ms 1.3806ms 724.2969 Ops/s 737.6815 Ops/s $\color{#d91a1a}-1.81\%$
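
The Change column above appears consistent with the relative difference between Ops (this PR) and Ops on Repo HEAD, and the bold entries appear to be those whose magnitude is at least 5%, which matches the Improved and Worsened counts in the summary line. A minimal sketch under those assumptions follows; the function names and the 5% threshold are inferred from the table and are hypothetical, not taken from the benchmark bot's code.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change in throughput relative to Repo HEAD (assumed definition)."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption inferred from the bold rows.

    Whether the bot treats exactly 5% as inclusive is not known from the table.
    """
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# A few (name, Ops, Ops on Repo HEAD) rows copied from the CPU table, in KOps/s.
rows = [
    ("test_plain_set_nested", 58.1176, 57.4174),
    ("test_keys_nested", 6.5959, 6.6460),
    ("test_membership_stacked_nested_last", 147.2042, 62.5802),
    ("test_setitem_dim[range]", 11.7552, 13.2966),
]

for name, ops, ops_head in rows:
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Small changes of a percent or two are likely within run-to-run noise on shared CI machines, so only the rows past the threshold are counted in the summary.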


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 134. Improved: $\large\color{#35bf28}9$. Worsened: $\large\color{#d91a1a}7$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change)
test_plain_set_nested 0.5501ms 14.3718μs 69.5806 KOps/s 68.4269 KOps/s $\color{#35bf28}+1.69\%$
test_plain_set_stack_nested 35.4000μs 14.4309μs 69.2958 KOps/s 67.8907 KOps/s $\color{#35bf28}+2.07\%$
test_plain_set_nested_inplace 47.5700μs 15.6411μs 63.9340 KOps/s 62.4794 KOps/s $\color{#35bf28}+2.33\%$
test_plain_set_stack_nested_inplace 0.1255ms 15.8599μs 63.0519 KOps/s 62.1695 KOps/s $\color{#35bf28}+1.42\%$
test_items 20.6700μs 4.7017μs 212.6875 KOps/s 210.7077 KOps/s $\color{#35bf28}+0.94\%$
test_items_nested 0.4019ms 0.3414ms 2.9290 KOps/s 2.9397 KOps/s $\color{#d91a1a}-0.37\%$
test_items_nested_locked 0.3752ms 0.3438ms 2.9087 KOps/s 2.9144 KOps/s $\color{#d91a1a}-0.20\%$
test_items_nested_leaf 0.2628ms 0.2018ms 4.9542 KOps/s 4.9498 KOps/s $\color{#35bf28}+0.09\%$
test_items_stack_nested 0.3898ms 0.3433ms 2.9127 KOps/s 2.9317 KOps/s $\color{#d91a1a}-0.65\%$
test_items_stack_nested_leaf 0.2314ms 0.1986ms 5.0344 KOps/s 4.9363 KOps/s $\color{#35bf28}+1.99\%$
test_items_stack_nested_locked 0.3662ms 0.3462ms 2.8882 KOps/s 2.9016 KOps/s $\color{#d91a1a}-0.46\%$
test_keys 25.4500μs 4.5867μs 218.0195 KOps/s 218.9410 KOps/s $\color{#d91a1a}-0.42\%$
test_keys_nested 46.1289ms 0.1006ms 9.9423 KOps/s 10.5484 KOps/s $\textbf{\color{#d91a1a}-5.75\%}$
test_keys_nested_locked 0.1433ms 98.0745μs 10.1963 KOps/s 10.2285 KOps/s $\color{#d91a1a}-0.31\%$
test_keys_nested_leaf 0.1138ms 77.6902μs 12.8716 KOps/s 12.8118 KOps/s $\color{#35bf28}+0.47\%$
test_keys_stack_nested 0.1265ms 93.8219μs 10.6585 KOps/s 10.5791 KOps/s $\color{#35bf28}+0.75\%$
test_keys_stack_nested_leaf 0.1313ms 77.5062μs 12.9022 KOps/s 12.7675 KOps/s $\color{#35bf28}+1.06\%$
test_keys_stack_nested_locked 0.1317ms 98.4986μs 10.1524 KOps/s 9.9933 KOps/s $\color{#35bf28}+1.59\%$
test_values 9.5070μs 1.8736μs 533.7349 KOps/s 531.3785 KOps/s $\color{#35bf28}+0.44\%$
test_values_nested 70.1810μs 45.9317μs 21.7714 KOps/s 21.9410 KOps/s $\color{#d91a1a}-0.77\%$
test_values_nested_locked 66.4810μs 47.9251μs 20.8659 KOps/s 20.7893 KOps/s $\color{#35bf28}+0.37\%$
test_values_nested_leaf 63.2710μs 39.7597μs 25.1511 KOps/s 24.8744 KOps/s $\color{#35bf28}+1.11\%$
test_values_stack_nested 78.5810μs 46.3733μs 21.5641 KOps/s 21.5439 KOps/s $\color{#35bf28}+0.09\%$
test_values_stack_nested_leaf 62.4600μs 39.4736μs 25.3334 KOps/s 24.7871 KOps/s $\color{#35bf28}+2.20\%$
test_values_stack_nested_locked 82.9510μs 48.5594μs 20.5933 KOps/s 20.6567 KOps/s $\color{#d91a1a}-0.31\%$
test_membership 3.8760μs 0.9570μs 1.0450 MOps/s 1.0399 MOps/s $\color{#35bf28}+0.49\%$
test_membership_nested 20.6600μs 2.9185μs 342.6377 KOps/s 344.8314 KOps/s $\color{#d91a1a}-0.64\%$
test_membership_nested_leaf 35.5900μs 2.9370μs 340.4889 KOps/s 347.9247 KOps/s $\color{#d91a1a}-2.14\%$
test_membership_stacked_nested 22.4500μs 2.9187μs 342.6140 KOps/s 341.0859 KOps/s $\color{#35bf28}+0.45\%$
test_membership_stacked_nested_leaf 35.8100μs 2.9280μs 341.5353 KOps/s 346.4302 KOps/s $\color{#d91a1a}-1.41\%$
test_membership_nested_last 22.4510μs 5.3247μs 187.8055 KOps/s 189.5250 KOps/s $\color{#d91a1a}-0.91\%$
test_membership_nested_leaf_last 34.8400μs 5.3668μs 186.3312 KOps/s 188.9217 KOps/s $\color{#d91a1a}-1.37\%$
test_membership_stacked_nested_last 28.9400μs 12.6093μs 79.3066 KOps/s 187.9349 KOps/s $\textbf{\color{#d91a1a}-57.80\%}$
test_membership_stacked_nested_leaf_last 35.0410μs 12.6340μs 79.1513 KOps/s 187.6339 KOps/s $\textbf{\color{#d91a1a}-57.82\%}$
test_nested_getleaf 37.0510μs 8.4029μs 119.0067 KOps/s 118.0279 KOps/s $\color{#35bf28}+0.83\%$
test_nested_get 34.8410μs 7.9534μs 125.7332 KOps/s 125.2087 KOps/s $\color{#35bf28}+0.42\%$
test_stacked_getleaf 25.7100μs 8.4698μs 118.0664 KOps/s 118.8627 KOps/s $\color{#d91a1a}-0.67\%$
test_stacked_get 37.7210μs 8.0037μs 124.9416 KOps/s 124.7723 KOps/s $\color{#35bf28}+0.14\%$
test_nested_getitemleaf 26.1700μs 9.8099μs 101.9383 KOps/s 102.0859 KOps/s $\color{#d91a1a}-0.14\%$
test_nested_getitem 38.0000μs 9.3635μs 106.7980 KOps/s 107.1325 KOps/s $\color{#d91a1a}-0.31\%$
test_stacked_getitemleaf 37.0800μs 9.8160μs 101.8744 KOps/s 101.7385 KOps/s $\color{#35bf28}+0.13\%$
test_stacked_getitem 26.8110μs 9.3916μs 106.4783 KOps/s 106.5866 KOps/s $\color{#d91a1a}-0.10\%$
test_lock_nested 2.1012ms 0.3545ms 2.8205 KOps/s 2.7961 KOps/s $\color{#35bf28}+0.87\%$
test_lock_stack_nested 0.4221ms 0.3047ms 3.2816 KOps/s 3.2180 KOps/s $\color{#35bf28}+1.98\%$
test_unlock_nested 0.7395ms 0.3500ms 2.8571 KOps/s 2.8377 KOps/s $\color{#35bf28}+0.68\%$
test_unlock_stack_nested 0.3872ms 0.3138ms 3.1869 KOps/s 3.1190 KOps/s $\color{#35bf28}+2.18\%$
test_flatten_speed 0.4725ms 0.2610ms 3.8312 KOps/s 3.8196 KOps/s $\color{#35bf28}+0.30\%$
test_unflatten_speed 0.4012ms 0.3552ms 2.8150 KOps/s 2.8061 KOps/s $\color{#35bf28}+0.32\%$
test_common_ops 1.0831ms 0.6304ms 1.5863 KOps/s 1.5554 KOps/s $\color{#35bf28}+1.99\%$
test_creation 35.8500μs 1.5562μs 642.6083 KOps/s 635.1076 KOps/s $\color{#35bf28}+1.18\%$
test_creation_empty 27.5500μs 9.6314μs 103.8266 KOps/s 96.0593 KOps/s $\textbf{\color{#35bf28}+8.09\%}$
test_creation_nested_1 26.9400μs 11.4168μs 87.5903 KOps/s 82.2333 KOps/s $\textbf{\color{#35bf28}+6.51\%}$
test_creation_nested_2 44.0300μs 13.9081μs 71.9003 KOps/s 67.8359 KOps/s $\textbf{\color{#35bf28}+5.99\%}$
test_clone 61.8510μs 14.2258μs 70.2946 KOps/s 71.3439 KOps/s $\color{#d91a1a}-1.47\%$
test_getitem[int] 25.6010μs 11.2007μs 89.2800 KOps/s 89.8824 KOps/s $\color{#d91a1a}-0.67\%$
test_getitem[slice_int] 44.9500μs 22.1671μs 45.1119 KOps/s 45.7488 KOps/s $\color{#d91a1a}-1.39\%$
test_getitem[range] 67.8700μs 55.9164μs 17.8838 KOps/s 19.3003 KOps/s $\textbf{\color{#d91a1a}-7.34\%}$
test_getitem[tuple] 47.3310μs 19.4108μs 51.5178 KOps/s 52.0174 KOps/s $\color{#d91a1a}-0.96\%$
test_getitem[list] 0.1797ms 38.6364μs 25.8823 KOps/s 26.5184 KOps/s $\color{#d91a1a}-2.40\%$
test_setitem_dim[int] 45.7310μs 28.7892μs 34.7352 KOps/s 32.9806 KOps/s $\textbf{\color{#35bf28}+5.32\%}$
test_setitem_dim[slice_int] 69.0510μs 49.7960μs 20.0819 KOps/s 19.4010 KOps/s $\color{#35bf28}+3.51\%$
test_setitem_dim[range] 97.7020μs 71.1837μs 14.0482 KOps/s 13.5576 KOps/s $\color{#35bf28}+3.62\%$
test_setitem_dim[tuple] 64.6010μs 42.9815μs 23.2658 KOps/s 22.1469 KOps/s $\textbf{\color{#35bf28}+5.05\%}$
test_setitem 46.5600μs 19.9738μs 50.0655 KOps/s 50.6474 KOps/s $\color{#d91a1a}-1.15\%$
test_set 53.1310μs 19.2455μs 51.9602 KOps/s 51.0706 KOps/s $\color{#35bf28}+1.74\%$
test_set_shared 0.1275s 0.1293ms 7.7353 KOps/s 7.6581 KOps/s $\color{#35bf28}+1.01\%$
test_update 88.3220μs 21.8961μs 45.6702 KOps/s 44.3041 KOps/s $\color{#35bf28}+3.08\%$
test_update_nested 93.1610μs 29.0231μs 34.4553 KOps/s 33.5488 KOps/s $\color{#35bf28}+2.70\%$
test_set_nested 0.1045ms 20.5201μs 48.7327 KOps/s 48.9493 KOps/s $\color{#d91a1a}-0.44\%$
test_set_nested_new 65.3120μs 23.3250μs 42.8724 KOps/s 42.3698 KOps/s $\color{#35bf28}+1.19\%$
test_select 72.0910μs 36.7280μs 27.2272 KOps/s 27.4660 KOps/s $\color{#d91a1a}-0.87\%$
test_select_nested 89.7420μs 54.7631μs 18.2605 KOps/s 18.5131 KOps/s $\color{#d91a1a}-1.36\%$
test_exclude_nested 0.1640ms 0.1161ms 8.6125 KOps/s 8.5141 KOps/s $\color{#35bf28}+1.16\%$
test_empty[True] 0.4715ms 0.3923ms 2.5493 KOps/s 2.5448 KOps/s $\color{#35bf28}+0.18\%$
test_empty[False] 3.2800μs 0.8696μs 1.1500 MOps/s 1.1738 MOps/s $\color{#d91a1a}-2.03\%$
test_to 80.4010μs 56.2768μs 17.7693 KOps/s 18.2007 KOps/s $\color{#d91a1a}-2.37\%$
test_to_nonblocking 59.2700μs 35.9724μs 27.7991 KOps/s 27.4995 KOps/s $\color{#35bf28}+1.09\%$
test_unbind_speed 0.2993ms 0.2747ms 3.6398 KOps/s 3.7583 KOps/s $\color{#d91a1a}-3.15\%$
test_unbind_speed_stack0 0.3335ms 0.2675ms 3.7380 KOps/s 3.7412 KOps/s $\color{#d91a1a}-0.08\%$
test_unbind_speed_stack1 0.1280s 0.7637ms 1.3094 KOps/s 1.2711 KOps/s $\color{#35bf28}+3.02\%$
test_split 1.6506ms 1.5781ms 633.6841 Ops/s 660.4411 Ops/s $\color{#d91a1a}-4.05\%$
test_chunk 1.6608ms 1.5750ms 634.9025 Ops/s 658.5019 Ops/s $\color{#d91a1a}-3.58\%$
test_creation[device0] 0.1386ms 74.0583μs 13.5029 KOps/s 13.2183 KOps/s $\color{#35bf28}+2.15\%$
test_creation_from_tensor 0.1923ms 54.9097μs 18.2117 KOps/s 16.8470 KOps/s $\textbf{\color{#35bf28}+8.10\%}$
test_add_one[memmap_tensor0] 0.1530ms 7.3522μs 136.0146 KOps/s 141.6236 KOps/s $\color{#d91a1a}-3.96\%$
test_contiguous[memmap_tensor0] 26.1410μs 0.6422μs 1.5572 MOps/s 1.5529 MOps/s $\color{#35bf28}+0.28\%$
test_stack[memmap_tensor0] 30.9000μs 4.6123μs 216.8094 KOps/s 219.6467 KOps/s $\color{#d91a1a}-1.29\%$
test_memmaptd_index 1.0644ms 0.2730ms 3.6634 KOps/s 3.7324 KOps/s $\color{#d91a1a}-1.85\%$
test_memmaptd_index_astensor 0.6353ms 0.3293ms 3.0366 KOps/s 3.0870 KOps/s $\color{#d91a1a}-1.63\%$
test_memmaptd_index_op 0.9676ms 0.6506ms 1.5370 KOps/s 1.5349 KOps/s $\color{#35bf28}+0.14\%$
test_serialize_model 92.2240ms 89.6565ms 11.1537 Ops/s 9.0113 Ops/s $\textbf{\color{#35bf28}+23.77\%}$
test_serialize_model_pickle 1.3539s 1.2367s 0.8086 Ops/s 0.8060 Ops/s $\color{#35bf28}+0.31\%$
test_serialize_weights 91.1025ms 87.3359ms 11.4500 Ops/s 10.8448 Ops/s $\textbf{\color{#35bf28}+5.58\%}$
test_serialize_weights_returnearly 0.2570s 75.4997ms 13.2451 Ops/s 17.2226 Ops/s $\textbf{\color{#d91a1a}-23.09\%}$
test_serialize_weights_pickle 1.3529s 1.2361s 0.8090 Ops/s 0.8087 Ops/s $\color{#35bf28}+0.04\%$
test_reshape_pytree 52.9910μs 25.5285μs 39.1719 KOps/s 39.1975 KOps/s $\color{#d91a1a}-0.07\%$
test_reshape_td 0.1634ms 31.2241μs 32.0266 KOps/s 32.1198 KOps/s $\color{#d91a1a}-0.29\%$
test_view_pytree 92.6910μs 24.8049μs 40.3145 KOps/s 39.6810 KOps/s $\color{#35bf28}+1.60\%$
test_view_td 0.1334s 57.5894μs 17.3643 KOps/s 17.0370 KOps/s $\color{#35bf28}+1.92\%$
test_unbind_pytree 47.0010μs 30.5476μs 32.7358 KOps/s 32.8223 KOps/s $\color{#d91a1a}-0.26\%$
test_unbind_td 59.6110μs 40.4375μs 24.7295 KOps/s 25.0132 KOps/s $\color{#d91a1a}-1.13\%$
test_split_pytree 52.2400μs 28.8357μs 34.6793 KOps/s 34.5109 KOps/s $\color{#35bf28}+0.49\%$
test_split_td 0.1095ms 40.0704μs 24.9561 KOps/s 25.9915 KOps/s $\color{#d91a1a}-3.98\%$
test_add_pytree 0.2079ms 38.9761μs 25.6567 KOps/s 27.2145 KOps/s $\textbf{\color{#d91a1a}-5.72\%}$
test_add_td 0.1372ms 53.7290μs 18.6119 KOps/s 18.2081 KOps/s $\color{#35bf28}+2.22\%$
test_distributed 0.2126ms 70.8038μs 14.1235 KOps/s 13.9378 KOps/s $\color{#35bf28}+1.33\%$
test_tdmodule 33.9600μs 18.4136μs 54.3076 KOps/s 51.4369 KOps/s $\textbf{\color{#35bf28}+5.58\%}$
test_tdmodule_dispatch 0.1421ms 38.4164μs 26.0305 KOps/s 25.6067 KOps/s $\color{#35bf28}+1.66\%$
test_tdseq 38.1700μs 21.6991μs 46.0848 KOps/s 44.7135 KOps/s $\color{#35bf28}+3.07\%$
test_tdseq_dispatch 60.7200μs 40.8045μs 24.5071 KOps/s 23.7184 KOps/s $\color{#35bf28}+3.33\%$
test_instantiation_functorch 2.0665ms 1.6742ms 597.3026 Ops/s 577.9651 Ops/s $\color{#35bf28}+3.35\%$
test_instantiation_td 1.6919ms 1.1636ms 859.4172 Ops/s 867.0440 Ops/s $\color{#d91a1a}-0.88\%$
test_exec_functorch 0.2316ms 0.1618ms 6.1812 KOps/s 6.1418 KOps/s $\color{#35bf28}+0.64\%$
test_exec_functional_call 0.2230ms 0.1591ms 6.2834 KOps/s 6.3464 KOps/s $\color{#d91a1a}-0.99\%$
test_exec_td 0.2230ms 0.1531ms 6.5312 KOps/s 6.6974 KOps/s $\color{#d91a1a}-2.48\%$
test_exec_td_decorator 0.7321ms 0.2011ms 4.9715 KOps/s 5.0915 KOps/s $\color{#d91a1a}-2.36\%$
test_vmap_mlp_speed[True-True] 0.8096ms 0.6276ms 1.5934 KOps/s 1.6111 KOps/s $\color{#d91a1a}-1.10\%$
test_vmap_mlp_speed[True-False] 0.7756ms 0.6332ms 1.5793 KOps/s 1.6353 KOps/s $\color{#d91a1a}-3.42\%$
test_vmap_mlp_speed[False-True] 0.7332ms 0.5663ms 1.7657 KOps/s 1.8391 KOps/s $\color{#d91a1a}-3.99\%$
test_vmap_mlp_speed[False-False] 0.6226ms 0.5512ms 1.8142 KOps/s 1.7926 KOps/s $\color{#35bf28}+1.21\%$
test_vmap_mlp_speed_decorator[True-True] 1.2037ms 0.6672ms 1.4988 KOps/s 1.5218 KOps/s $\color{#d91a1a}-1.51\%$
test_vmap_mlp_speed_decorator[True-False] 0.7777ms 0.6644ms 1.5052 KOps/s 1.5186 KOps/s $\color{#d91a1a}-0.88\%$
test_vmap_mlp_speed_decorator[False-True] 0.8348ms 0.5600ms 1.7858 KOps/s 1.7899 KOps/s $\color{#d91a1a}-0.23\%$
test_vmap_mlp_speed_decorator[False-False] 0.7592ms 0.5679ms 1.7608 KOps/s 1.7879 KOps/s $\color{#d91a1a}-1.51\%$
test_vmap_transformer_speed[True-True] 8.6602ms 8.2516ms 121.1881 Ops/s 121.5280 Ops/s $\color{#d91a1a}-0.28\%$
test_vmap_transformer_speed[True-False] 8.7326ms 8.2572ms 121.1059 Ops/s 120.7353 Ops/s $\color{#35bf28}+0.31\%$
test_vmap_transformer_speed[False-True] 8.2419ms 8.1481ms 122.7278 Ops/s 123.1286 Ops/s $\color{#d91a1a}-0.33\%$
test_vmap_transformer_speed[False-False] 8.5416ms 8.1753ms 122.3190 Ops/s 122.7719 Ops/s $\color{#d91a1a}-0.37\%$
test_vmap_transformer_speed_decorator[True-True] 19.7146ms 19.5391ms 51.1794 Ops/s 51.3820 Ops/s $\color{#d91a1a}-0.39\%$
test_vmap_transformer_speed_decorator[True-False] 20.4489ms 19.6483ms 50.8951 Ops/s 51.1448 Ops/s $\color{#d91a1a}-0.49\%$
test_vmap_transformer_speed_decorator[False-True] 0.2024s 22.7651ms 43.9268 Ops/s 52.2444 Ops/s $\textbf{\color{#d91a1a}-15.92\%}$
test_vmap_transformer_speed_decorator[False-False] 20.2814ms 19.6668ms 50.8472 Ops/s 52.1415 Ops/s $\color{#d91a1a}-2.48\%$
test_to_module_speed[True] 1.3760ms 1.2545ms 797.1394 Ops/s 793.7452 Ops/s $\color{#35bf28}+0.43\%$
test_to_module_speed[False] 1.3745ms 1.2185ms 820.6565 Ops/s 817.4518 Ops/s $\color{#35bf28}+0.39\%$

Labels: bug (Something isn't working), CLA Signed (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
Projects: None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Restoring multiagent nets
2 participants