
[Feature] Best attempt to densely stack sub-tds when LazyStacked TDS are passed to maybe_dense_stack #799

Merged (6 commits), May 30, 2024


@vmoens (Contributor) commented May 30, 2024

The goal of this PR is that, when two lazy stacks with identical structure are stacked together, the resulting tensordict is a lazy stack of dense tensordicts rather than a lazy stack of lazy stacks.
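To make the intended behavior concrete, here is a toy model in plain Python. It is a sketch under stated assumptions, not the tensordict implementation: `Dense`, `Lazy`, `dense_stack` and this `maybe_dense_stack` are hypothetical stand-ins that only track batch sizes and leaf shapes, stacking always happens along a new dim 0, and the shifted `stack_dim` of the result is an assumption of the model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Shape = Tuple[int, ...]


@dataclass
class Dense:
    """Hypothetical stand-in for a dense TensorDict: batch size plus leaf shapes."""
    batch_size: Shape
    leaves: Dict[str, Shape]


@dataclass
class Lazy:
    """Hypothetical stand-in for a LazyStackedTensorDict."""
    tds: List["TD"]
    stack_dim: int = 0


TD = Union[Dense, Lazy]


def dense_stack(items: List[Dense]) -> Dense:
    """Stack along a new leading dim; only valid if all structures match."""
    first = items[0]
    for td in items[1:]:
        if td.batch_size != first.batch_size or td.leaves != first.leaves:
            raise ValueError("structures differ, cannot stack densely")
    n = len(items)
    return Dense((n, *first.batch_size), {k: (n, *s) for k, s in first.leaves.items()})


def maybe_dense_stack(items: List[TD]) -> TD:
    """Stack `items` along a new dim 0, densely wherever the structures allow it."""
    if not items:
        raise ValueError("cannot stack an empty list")
    # All dense: one dense result if compatible, otherwise a lazy stack of the inputs.
    if all(isinstance(td, Dense) for td in items):
        try:
            return dense_stack(items)
        except ValueError:
            return Lazy(list(items))
    # All lazy with the same length and stack dim (the case this PR targets):
    # stack the i-th sub-tensordicts together, recursively, and keep the outer
    # level lazy. The new dim 0 shifts the lazy dim by one (model assumption).
    if all(isinstance(td, Lazy) for td in items):
        layouts = {(len(td.tds), td.stack_dim) for td in items}
        if len(layouts) == 1:
            groups = zip(*(td.tds for td in items))
            return Lazy(
                [maybe_dense_stack(list(group)) for group in groups],
                stack_dim=items[0].stack_dim + 1,
            )
    # Anything else (mixed types, mismatched layouts): plain lazy stack.
    return Lazy(list(items))


# Two lazy stacks with the same heterogeneous structure: "x" has shape (3,) then (5,).
a = Lazy([Dense((), {"x": (3,)}), Dense((), {"x": (5,)})])
b = Lazy([Dense((), {"x": (3,)}), Dense((), {"x": (5,)})])
out = maybe_dense_stack([a, b])
# out is a lazy stack (stack_dim=1) of two dense entries with "x" of shape (2, 3) and (2, 5).
```

The point of the recursion is that density is pushed as deep as the structures allow: each pair of matching sub-tensordicts becomes one dense block, and only the level where shapes genuinely differ stays lazy.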

cc @matteobettini @dtsaras

@facebook-github-bot added the CLA Signed label May 30, 2024
@vmoens added the enhancement label May 30, 2024

github-actions bot commented May 30, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}10$. Worsened: $\large\color{#d91a1a}24$. (Only changes larger than 5% in either direction are counted; these are the rows shown in bold.)
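The Change column appears to be the relative throughput difference between this PR and repo HEAD, and the Improved/Worsened counts match the rows whose change exceeds 5% in magnitude. The snippet below is a minimal sketch of that arithmetic; the helper names and the 5% cutoff are inferred from the tables, not taken from the benchmark bot's documentation. Both throughput values must be in the same unit before dividing (a few rows mix KOps/s and MOps/s).

```python
def pct_change(ops: float, ops_head: float) -> float:
    """Throughput change of the PR relative to repo HEAD, in percent.

    Both arguments must be in the same unit (e.g. both in KOps/s).
    """
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Inferred rule: only changes beyond the threshold count (the bold rows)."""
    if change_pct > threshold_pct:
        return "improved"
    if change_pct < -threshold_pct:
        return "worsened"
    return "within noise"


# (Ops, Ops on Repo HEAD) in KOps/s, copied from the CPU table.
rows = {
    "test_plain_set_nested": (58.0050, 61.0208),
    "test_keys_nested_locked": (7.0687, 6.5193),
    "test_membership_stacked_nested_last": (146.2276, 188.9014),
}

for name, (ops, head) in rows.items():
    change = pct_change(ops, head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

For these three rows the loop reproduces the reported -4.94%, +8.43% and -22.59%.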

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 36.1380μs | 17.2399μs | 58.0050 KOps/s | 61.0208 KOps/s | $\color{#d91a1a}-4.94\%$ |
| test_plain_set_stack_nested | 52.4690μs | 17.3866μs | 57.5155 KOps/s | 59.6639 KOps/s | $\color{#d91a1a}-3.60\%$ |
| test_plain_set_nested_inplace | 65.5420μs | 19.5800μs | 51.0726 KOps/s | 52.4264 KOps/s | $\color{#d91a1a}-2.58\%$ |
| test_plain_set_stack_nested_inplace | 66.9550μs | 19.6111μs | 50.9916 KOps/s | 53.2368 KOps/s | $\color{#d91a1a}-4.22\%$ |
| test_items | 28.6830μs | 2.5261μs | 395.8700 KOps/s | 401.1831 KOps/s | $\color{#d91a1a}-1.32\%$ |
| test_items_nested | 0.5674ms | 0.2678ms | 3.7337 KOps/s | 3.7280 KOps/s | $\color{#35bf28}+0.15\%$ |
| test_items_nested_locked | 1.1837ms | 0.2716ms | 3.6825 KOps/s | 3.6418 KOps/s | $\color{#35bf28}+1.12\%$ |
| test_items_nested_leaf | 0.1551ms | 76.6907μs | 13.0394 KOps/s | 13.0154 KOps/s | $\color{#35bf28}+0.18\%$ |
| test_items_stack_nested | 0.8646ms | 0.2701ms | 3.7026 KOps/s | 3.7263 KOps/s | $\color{#d91a1a}-0.64\%$ |
| test_items_stack_nested_leaf | 0.1533ms | 77.4368μs | 12.9138 KOps/s | 12.4886 KOps/s | $\color{#35bf28}+3.40\%$ |
| test_items_stack_nested_locked | 0.5673ms | 0.2711ms | 3.6887 KOps/s | 3.6524 KOps/s | $\color{#35bf28}+0.99\%$ |
| test_keys | 22.4320μs | 3.8684μs | 258.5069 KOps/s | 251.2199 KOps/s | $\color{#35bf28}+2.90\%$ |
| test_keys_nested | 0.2638ms | 0.1377ms | 7.2626 KOps/s | 7.2879 KOps/s | $\color{#d91a1a}-0.35\%$ |
| test_keys_nested_locked | 0.7358ms | 0.1415ms | 7.0687 KOps/s | 6.5193 KOps/s | $\textbf{\color{#35bf28}+8.43\%}$ |
| test_keys_nested_leaf | 0.2262ms | 0.1164ms | 8.5889 KOps/s | 8.5176 KOps/s | $\color{#35bf28}+0.84\%$ |
| test_keys_stack_nested | 0.2333ms | 0.1351ms | 7.4040 KOps/s | 7.3045 KOps/s | $\color{#35bf28}+1.36\%$ |
| test_keys_stack_nested_leaf | 0.2303ms | 0.1151ms | 8.6844 KOps/s | 8.5434 KOps/s | $\color{#35bf28}+1.65\%$ |
| test_keys_stack_nested_locked | 0.2338ms | 0.1387ms | 7.2112 KOps/s | 7.0602 KOps/s | $\color{#35bf28}+2.14\%$ |
| test_values | 12.2955μs | 1.1657μs | 857.8806 KOps/s | 872.9479 KOps/s | $\color{#d91a1a}-1.73\%$ |
| test_values_nested | 88.9860μs | 50.2366μs | 19.9058 KOps/s | 19.6808 KOps/s | $\color{#35bf28}+1.14\%$ |
| test_values_nested_locked | 0.1163ms | 50.2829μs | 19.8875 KOps/s | 19.5508 KOps/s | $\color{#35bf28}+1.72\%$ |
| test_values_nested_leaf | 0.1011ms | 46.2508μs | 21.6213 KOps/s | 21.7506 KOps/s | $\color{#d91a1a}-0.59\%$ |
| test_values_stack_nested | 99.1960μs | 51.7724μs | 19.3153 KOps/s | 19.1464 KOps/s | $\color{#35bf28}+0.88\%$ |
| test_values_stack_nested_leaf | 83.4860μs | 44.8847μs | 22.2793 KOps/s | 21.6353 KOps/s | $\color{#35bf28}+2.98\%$ |
| test_values_stack_nested_locked | 0.1076ms | 51.5448μs | 19.4006 KOps/s | 19.3303 KOps/s | $\color{#35bf28}+0.36\%$ |
| test_membership | 33.6430μs | 1.3642μs | 733.0166 KOps/s | 740.0969 KOps/s | $\color{#d91a1a}-0.96\%$ |
| test_membership_nested | 47.0560μs | 3.4688μs | 288.2800 KOps/s | 297.7270 KOps/s | $\color{#d91a1a}-3.17\%$ |
| test_membership_nested_leaf | 21.3600μs | 3.5011μs | 285.6266 KOps/s | 272.4442 KOps/s | $\color{#35bf28}+4.84\%$ |
| test_membership_stacked_nested | 44.1020μs | 3.4467μs | 290.1344 KOps/s | 277.9604 KOps/s | $\color{#35bf28}+4.38\%$ |
| test_membership_stacked_nested_leaf | 25.2470μs | 3.4734μs | 287.8993 KOps/s | 292.1649 KOps/s | $\color{#d91a1a}-1.46\%$ |
| test_membership_nested_last | 27.1010μs | 4.2507μs | 235.2538 KOps/s | 237.6049 KOps/s | $\color{#d91a1a}-0.99\%$ |
| test_membership_nested_leaf_last | 49.9040μs | 4.2370μs | 236.0177 KOps/s | 239.1778 KOps/s | $\color{#d91a1a}-1.32\%$ |
| test_membership_stacked_nested_last | 20.6680μs | 6.8387μs | 146.2276 KOps/s | 188.9014 KOps/s | $\textbf{\color{#d91a1a}-22.59\%}$ |
| test_membership_stacked_nested_leaf_last | 54.5220μs | 6.7721μs | 147.6650 KOps/s | 188.6992 KOps/s | $\textbf{\color{#d91a1a}-21.75\%}$ |
| test_nested_getleaf | 54.1410μs | 10.4353μs | 95.8281 KOps/s | 95.3513 KOps/s | $\color{#35bf28}+0.50\%$ |
| test_nested_get | 44.9840μs | 9.8822μs | 101.1916 KOps/s | 100.4421 KOps/s | $\color{#35bf28}+0.75\%$ |
| test_stacked_getleaf | 56.3050μs | 10.3023μs | 97.0658 KOps/s | 96.6423 KOps/s | $\color{#35bf28}+0.44\%$ |
| test_stacked_get | 32.7620μs | 9.7578μs | 102.4817 KOps/s | 101.4904 KOps/s | $\color{#35bf28}+0.98\%$ |
| test_nested_getitemleaf | 42.2790μs | 11.1984μs | 89.2986 KOps/s | 90.4287 KOps/s | $\color{#d91a1a}-1.25\%$ |
| test_nested_getitem | 56.5860μs | 10.1837μs | 98.1960 KOps/s | 98.1151 KOps/s | $\color{#35bf28}+0.08\%$ |
| test_stacked_getitemleaf | 49.3820μs | 11.4506μs | 87.3318 KOps/s | 92.2719 KOps/s | $\textbf{\color{#d91a1a}-5.35\%}$ |
| test_stacked_getitem | 31.4990μs | 10.0558μs | 99.4456 KOps/s | 99.8725 KOps/s | $\color{#d91a1a}-0.43\%$ |
| test_lock_nested | 0.7851ms | 0.3477ms | 2.8757 KOps/s | 2.8953 KOps/s | $\color{#d91a1a}-0.67\%$ |
| test_lock_stack_nested | 0.5446ms | 0.3014ms | 3.3175 KOps/s | 3.2474 KOps/s | $\color{#35bf28}+2.16\%$ |
| test_unlock_nested | 0.7450ms | 0.3498ms | 2.8585 KOps/s | 2.5408 KOps/s | $\textbf{\color{#35bf28}+12.51\%}$ |
| test_unlock_stack_nested | 0.5041ms | 0.3097ms | 3.2288 KOps/s | 3.1614 KOps/s | $\color{#35bf28}+2.13\%$ |
| test_flatten_speed | 0.5669ms | 99.7113μs | 10.0290 KOps/s | 10.4799 KOps/s | $\color{#d91a1a}-4.30\%$ |
| test_unflatten_speed | 0.6228ms | 0.4092ms | 2.4435 KOps/s | 2.4216 KOps/s | $\color{#35bf28}+0.91\%$ |
| test_common_ops | 3.5146ms | 0.7194ms | 1.3901 KOps/s | 1.4446 KOps/s | $\color{#d91a1a}-3.77\%$ |
| test_creation | 12.5030μs | 1.9133μs | 522.6621 KOps/s | 520.7480 KOps/s | $\color{#35bf28}+0.37\%$ |
| test_creation_empty | 34.4040μs | 11.2396μs | 88.9713 KOps/s | 104.6206 KOps/s | $\textbf{\color{#d91a1a}-14.96\%}$ |
| test_creation_nested_1 | 41.5580μs | 14.0626μs | 71.1105 KOps/s | 81.1235 KOps/s | $\textbf{\color{#d91a1a}-12.34\%}$ |
| test_creation_nested_2 | 0.2511ms | 17.3453μs | 57.6525 KOps/s | 63.2576 KOps/s | $\textbf{\color{#d91a1a}-8.86\%}$ |
| test_clone | 41.8980μs | 13.4389μs | 74.4111 KOps/s | 73.5740 KOps/s | $\color{#35bf28}+1.14\%$ |
| test_getitem[int] | 0.1867ms | 13.1151μs | 76.2480 KOps/s | 88.7361 KOps/s | $\textbf{\color{#d91a1a}-14.07\%}$ |
| test_getitem[slice_int] | 70.5420μs | 22.5381μs | 44.3693 KOps/s | 43.3849 KOps/s | $\color{#35bf28}+2.27\%$ |
| test_getitem[range] | 80.0100μs | 61.0222μs | 16.3875 KOps/s | 17.5659 KOps/s | $\textbf{\color{#d91a1a}-6.71\%}$ |
| test_getitem[tuple] | 50.3840μs | 18.9148μs | 52.8685 KOps/s | 52.9558 KOps/s | $\color{#d91a1a}-0.16\%$ |
| test_getitem[list] | 0.1092ms | 40.8690μs | 24.4684 KOps/s | 24.8363 KOps/s | $\color{#d91a1a}-1.48\%$ |
| test_setitem_dim[int] | 60.7440μs | 35.5763μs | 28.1086 KOps/s | 30.9296 KOps/s | $\textbf{\color{#d91a1a}-9.12\%}$ |
| test_setitem_dim[slice_int] | 0.1798ms | 62.7414μs | 15.9384 KOps/s | 16.6233 KOps/s | $\color{#d91a1a}-4.12\%$ |
| test_setitem_dim[range] | 0.1196ms | 83.4135μs | 11.9885 KOps/s | 12.4352 KOps/s | $\color{#d91a1a}-3.59\%$ |
| test_setitem_dim[tuple] | 86.8630μs | 50.5220μs | 19.7934 KOps/s | 20.8188 KOps/s | $\color{#d91a1a}-4.93\%$ |
| test_setitem | 75.8220μs | 20.6278μs | 48.4783 KOps/s | 50.8417 KOps/s | $\color{#d91a1a}-4.65\%$ |
| test_set | 47.5590μs | 20.2371μs | 49.4142 KOps/s | 51.7978 KOps/s | $\color{#d91a1a}-4.60\%$ |
| test_set_shared | 2.9429ms | 0.1409ms | 7.0961 KOps/s | 7.2035 KOps/s | $\color{#d91a1a}-1.49\%$ |
| test_update | 0.1020ms | 22.4738μs | 44.4963 KOps/s | 48.4133 KOps/s | $\textbf{\color{#d91a1a}-8.09\%}$ |
| test_update_nested | 0.2784ms | 31.2964μs | 31.9525 KOps/s | 34.9567 KOps/s | $\textbf{\color{#d91a1a}-8.59\%}$ |
| test_update__nested | 58.6300μs | 25.2570μs | 39.5929 KOps/s | 40.2742 KOps/s | $\color{#d91a1a}-1.69\%$ |
| test_set_nested | 60.9940μs | 21.9529μs | 45.5520 KOps/s | 47.4747 KOps/s | $\color{#d91a1a}-4.05\%$ |
| test_set_nested_new | 83.0050μs | 26.9850μs | 37.0576 KOps/s | 39.6845 KOps/s | $\textbf{\color{#d91a1a}-6.62\%}$ |
| test_select | 83.8860μs | 41.8176μs | 23.9134 KOps/s | 24.3224 KOps/s | $\color{#d91a1a}-1.68\%$ |
| test_select_nested | 0.1360ms | 60.2073μs | 16.6093 KOps/s | 16.5563 KOps/s | $\color{#35bf28}+0.32\%$ |
| test_exclude_nested | 0.4944ms | 0.1225ms | 8.1648 KOps/s | 8.3622 KOps/s | $\color{#d91a1a}-2.36\%$ |
| test_empty[True] | 0.6129ms | 0.3940ms | 2.5381 KOps/s | 2.5534 KOps/s | $\color{#d91a1a}-0.60\%$ |
| test_empty[False] | 10.3674μs | 1.1713μs | 853.7853 KOps/s | 840.3650 KOps/s | $\color{#35bf28}+1.60\%$ |
| test_unbind_speed | 1.5590ms | 0.2606ms | 3.8369 KOps/s | 3.8365 KOps/s | $+0.01\%$ |
| test_unbind_speed_stack0 | 0.4972ms | 0.2512ms | 3.9810 KOps/s | 3.9707 KOps/s | $\color{#35bf28}+0.26\%$ |
| test_unbind_speed_stack1 | 67.7382ms | 0.7026ms | 1.4232 KOps/s | 1.2893 KOps/s | $\textbf{\color{#35bf28}+10.38\%}$ |
| test_split | 68.6355ms | 1.6226ms | 616.2856 Ops/s | 622.1792 Ops/s | $\color{#d91a1a}-0.95\%$ |
| test_chunk | 66.3152ms | 1.6167ms | 618.5620 Ops/s | 620.8810 Ops/s | $\color{#d91a1a}-0.37\%$ |
| test_creation[device0] | 0.1857ms | 84.0041μs | 11.9042 KOps/s | 11.9633 KOps/s | $\color{#d91a1a}-0.49\%$ |
| test_creation_from_tensor | 3.8386ms | 85.5313μs | 11.6916 KOps/s | 11.8731 KOps/s | $\color{#d91a1a}-1.53\%$ |
| test_add_one[memmap_tensor0] | 67.0450μs | 5.2829μs | 189.2901 KOps/s | 178.4011 KOps/s | $\textbf{\color{#35bf28}+6.10\%}$ |
| test_contiguous[memmap_tensor0] | 9.2280μs | 0.6304μs | 1.5864 MOps/s | 1.5932 MOps/s | $\color{#d91a1a}-0.43\%$ |
| test_stack[memmap_tensor0] | 51.6060μs | 3.4047μs | 293.7092 KOps/s | 280.8643 KOps/s | $\color{#35bf28}+4.57\%$ |
| test_memmaptd_index | 0.9924ms | 0.2559ms | 3.9074 KOps/s | 3.9731 KOps/s | $\color{#d91a1a}-1.65\%$ |
| test_memmaptd_index_astensor | 0.7765ms | 0.3327ms | 3.0060 KOps/s | 3.0728 KOps/s | $\color{#d91a1a}-2.17\%$ |
| test_memmaptd_index_op | 0.9745ms | 0.6188ms | 1.6161 KOps/s | 1.7096 KOps/s | $\textbf{\color{#d91a1a}-5.47\%}$ |
| test_serialize_model | 0.1774s | 0.1127s | 8.8721 Ops/s | 8.3531 Ops/s | $\textbf{\color{#35bf28}+6.21\%}$ |
| test_serialize_model_pickle | 0.4462s | 0.3740s | 2.6736 Ops/s | 2.6047 Ops/s | $\color{#35bf28}+2.64\%$ |
| test_serialize_weights | 0.1646s | 0.1102s | 9.0731 Ops/s | 8.7215 Ops/s | $\color{#35bf28}+4.03\%$ |
| test_serialize_weights_returnearly | 0.1410s | 0.1283s | 7.7918 Ops/s | 7.7932 Ops/s | $\color{#d91a1a}-0.02\%$ |
| test_serialize_weights_pickle | 0.7404s | 0.4808s | 2.0798 Ops/s | 2.3737 Ops/s | $\textbf{\color{#d91a1a}-12.38\%}$ |
| test_serialize_weights_filesystem | 99.1066ms | 92.3789ms | 10.8250 Ops/s | 9.7398 Ops/s | $\textbf{\color{#35bf28}+11.14\%}$ |
| test_serialize_model_filesystem | 0.1627s | 0.1001s | 9.9887 Ops/s | 10.6609 Ops/s | $\textbf{\color{#d91a1a}-6.31\%}$ |
| test_reshape_pytree | 51.0350μs | 25.2809μs | 39.5556 KOps/s | 40.0006 KOps/s | $\color{#d91a1a}-1.11\%$ |
| test_reshape_td | 84.7090μs | 34.2665μs | 29.1830 KOps/s | 29.0899 KOps/s | $\color{#35bf28}+0.32\%$ |
| test_view_pytree | 58.1390μs | 25.1049μs | 39.8328 KOps/s | 40.3423 KOps/s | $\color{#d91a1a}-1.26\%$ |
| test_view_td | 93.8250μs | 38.3650μs | 26.0655 KOps/s | 25.9877 KOps/s | $\color{#35bf28}+0.30\%$ |
| test_unbind_pytree | 72.4260μs | 29.0884μs | 34.3780 KOps/s | 34.5134 KOps/s | $\color{#d91a1a}-0.39\%$ |
| test_unbind_td | 0.4461ms | 38.5741μs | 25.9241 KOps/s | 26.6927 KOps/s | $\color{#d91a1a}-2.88\%$ |
| test_split_pytree | 63.4590μs | 28.9557μs | 34.5355 KOps/s | 34.9678 KOps/s | $\color{#d91a1a}-1.24\%$ |
| test_split_td | 0.1266ms | 41.2663μs | 24.2328 KOps/s | 24.4253 KOps/s | $\color{#d91a1a}-0.79\%$ |
| test_add_pytree | 79.5990μs | 34.3006μs | 29.1540 KOps/s | 29.5712 KOps/s | $\color{#d91a1a}-1.41\%$ |
| test_add_td | 0.1367ms | 55.7194μs | 17.9471 KOps/s | 19.2365 KOps/s | $\textbf{\color{#d91a1a}-6.70\%}$ |
| test_distributed | 0.1800ms | 0.1012ms | 9.8798 KOps/s | 9.8098 KOps/s | $\color{#35bf28}+0.71\%$ |
| test_tdmodule | 37.0090μs | 17.8689μs | 55.9630 KOps/s | 60.1068 KOps/s | $\textbf{\color{#d91a1a}-6.89\%}$ |
| test_tdmodule_dispatch | 65.4820μs | 35.6921μs | 28.0174 KOps/s | 29.7575 KOps/s | $\textbf{\color{#d91a1a}-5.85\%}$ |
| test_tdseq | 40.9160μs | 21.4418μs | 46.6378 KOps/s | 51.3158 KOps/s | $\textbf{\color{#d91a1a}-9.12\%}$ |
| test_tdseq_dispatch | 65.7830μs | 41.8305μs | 23.9060 KOps/s | 25.7080 KOps/s | $\textbf{\color{#d91a1a}-7.01\%}$ |
| test_instantiation_functorch | 3.0941ms | 1.3009ms | 768.6999 Ops/s | 760.8725 Ops/s | $\color{#35bf28}+1.03\%$ |
| test_instantiation_td | 1.7859ms | 1.0119ms | 988.1992 Ops/s | 995.4290 Ops/s | $\color{#d91a1a}-0.73\%$ |
| test_exec_functorch | 0.2888ms | 0.1618ms | 6.1817 KOps/s | 6.1066 KOps/s | $\color{#35bf28}+1.23\%$ |
| test_exec_functional_call | 0.3336ms | 0.1468ms | 6.8118 KOps/s | 6.3263 KOps/s | $\textbf{\color{#35bf28}+7.67\%}$ |
| test_exec_td | 0.2347ms | 0.1423ms | 7.0271 KOps/s | 6.7982 KOps/s | $\color{#35bf28}+3.37\%$ |
| test_exec_td_decorator | 0.9583ms | 0.2245ms | 4.4551 KOps/s | 4.0495 KOps/s | $\textbf{\color{#35bf28}+10.01\%}$ |
| test_vmap_mlp_speed[True-True] | 0.8023ms | 0.4980ms | 2.0081 KOps/s | 2.0718 KOps/s | $\color{#d91a1a}-3.08\%$ |
| test_vmap_mlp_speed[True-False] | 0.7726ms | 0.4951ms | 2.0199 KOps/s | 2.0788 KOps/s | $\color{#d91a1a}-2.84\%$ |
| test_vmap_mlp_speed[False-True] | 0.6185ms | 0.4006ms | 2.4963 KOps/s | 2.5346 KOps/s | $\color{#d91a1a}-1.51\%$ |
| test_vmap_mlp_speed[False-False] | 0.6470ms | 0.4012ms | 2.4926 KOps/s | 2.5416 KOps/s | $\color{#d91a1a}-1.93\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.1675ms | 0.5708ms | 1.7519 KOps/s | 1.8032 KOps/s | $\color{#d91a1a}-2.85\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 1.0300ms | 0.5686ms | 1.7586 KOps/s | 1.8107 KOps/s | $\color{#d91a1a}-2.88\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.6710ms | 0.4638ms | 2.1560 KOps/s | 2.1871 KOps/s | $\color{#d91a1a}-1.42\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.7282ms | 0.4652ms | 2.1498 KOps/s | 2.1827 KOps/s | $\color{#d91a1a}-1.51\%$ |
| test_to_module_speed[True] | 2.0250ms | 1.6922ms | 590.9514 Ops/s | 593.5919 Ops/s | $\color{#d91a1a}-0.44\%$ |
| test_to_module_speed[False] | 2.6754ms | 1.6702ms | 598.7483 Ops/s | 600.3164 Ops/s | $\color{#d91a1a}-0.26\%$ |
| test_tc_init | 57.4070μs | 30.1154μs | 33.2056 KOps/s | 38.5811 KOps/s | $\textbf{\color{#d91a1a}-13.93\%}$ |
| test_tc_init_nested | 0.1104ms | 61.9059μs | 16.1535 KOps/s | 18.3438 KOps/s | $\textbf{\color{#d91a1a}-11.94\%}$ |
| test_tc_first_layer_tensor | 4.7919μs | 0.6888μs | 1.4517 MOps/s | 1.4354 MOps/s | $\color{#35bf28}+1.14\%$ |
| test_tc_first_layer_nontensor | 3.8744μs | 0.6847μs | 1.4605 MOps/s | 1.4887 MOps/s | $\color{#d91a1a}-1.89\%$ |
| test_tc_second_layer_tensor | 31.4290μs | 1.8679μs | 535.3475 KOps/s | 540.5146 KOps/s | $\color{#d91a1a}-0.96\%$ |
| test_tc_second_layer_nontensor | 16.9480μs | 1.5455μs | 647.0388 KOps/s | 606.8925 KOps/s | $\textbf{\color{#35bf28}+6.62\%}$ |
| test_unbind | 95.3116ms | 6.7311ms | 148.5639 Ops/s | 136.3652 Ops/s | $\textbf{\color{#35bf28}+8.95\%}$ |
| test_full_like | 15.6392ms | 11.2311ms | 89.0387 Ops/s | 94.9418 Ops/s | $\textbf{\color{#d91a1a}-6.22\%}$ |
| test_zeros_like | 12.7691ms | 6.1638ms | 162.2374 Ops/s | 168.7050 Ops/s | $\color{#d91a1a}-3.83\%$ |
| test_ones_like | 11.9608ms | 6.5543ms | 152.5713 Ops/s | 160.6744 Ops/s | $\textbf{\color{#d91a1a}-5.04\%}$ |
| test_clone | 15.3112ms | 7.8692ms | 127.0778 Ops/s | 126.3658 Ops/s | $\color{#35bf28}+0.56\%$ |
| test_squeeze | 79.4690μs | 14.4905μs | 69.0107 KOps/s | 72.0751 KOps/s | $\color{#d91a1a}-4.25\%$ |
| test_unsqueeze | 0.1189ms | 60.5513μs | 16.5149 KOps/s | 16.6761 KOps/s | $\color{#d91a1a}-0.97\%$ |
| test_split | 0.2443ms | 0.1115ms | 8.9693 KOps/s | 8.7581 KOps/s | $\color{#35bf28}+2.41\%$ |
| test_permute | 0.1987ms | 0.1263ms | 7.9149 KOps/s | 7.9247 KOps/s | $\color{#d91a1a}-0.12\%$ |
| test_stack | 28.6346ms | 22.6927ms | 44.0671 Ops/s | 43.6584 Ops/s | $\color{#35bf28}+0.94\%$ |
| test_cat | 28.8995ms | 22.6409ms | 44.1678 Ops/s | 44.9788 Ops/s | $\color{#d91a1a}-1.80\%$ |


github-actions bot commented May 30, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: $\large\color{#35bf28}2$. Worsened: $\large\color{#d91a1a}12$. (Only changes larger than 5% in either direction are counted; these are the rows shown in bold.)

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 0.5618ms | 13.2933μs | 75.2260 KOps/s | 75.8778 KOps/s | $\color{#d91a1a}-0.86\%$ |
| test_plain_set_stack_nested | 48.0000μs | 13.4286μs | 74.4678 KOps/s | 73.9275 KOps/s | $\color{#35bf28}+0.73\%$ |
| test_plain_set_nested_inplace | 41.7400μs | 14.5325μs | 68.8112 KOps/s | 68.5903 KOps/s | $\color{#35bf28}+0.32\%$ |
| test_plain_set_stack_nested_inplace | 40.8200μs | 14.7289μs | 67.8935 KOps/s | 68.2655 KOps/s | $\color{#d91a1a}-0.54\%$ |
| test_items | 17.5900μs | 4.6637μs | 214.4238 KOps/s | 210.6500 KOps/s | $\color{#35bf28}+1.79\%$ |
| test_items_nested | 0.3752ms | 0.3436ms | 2.9102 KOps/s | 2.9476 KOps/s | $\color{#d91a1a}-1.27\%$ |
| test_items_nested_locked | 0.3868ms | 0.3508ms | 2.8508 KOps/s | 2.9165 KOps/s | $\color{#d91a1a}-2.25\%$ |
| test_items_nested_leaf | 0.1040ms | 83.4444μs | 11.9840 KOps/s | 12.0688 KOps/s | $\color{#d91a1a}-0.70\%$ |
| test_items_stack_nested | 0.3979ms | 0.3439ms | 2.9075 KOps/s | 2.9000 KOps/s | $\color{#35bf28}+0.26\%$ |
| test_items_stack_nested_leaf | 0.1045ms | 83.9151μs | 11.9168 KOps/s | 12.1251 KOps/s | $\color{#d91a1a}-1.72\%$ |
| test_items_stack_nested_locked | 0.3751ms | 0.3470ms | 2.8822 KOps/s | 2.9294 KOps/s | $\color{#d91a1a}-1.61\%$ |
| test_keys | 23.9600μs | 4.3548μs | 229.6331 KOps/s | 230.1944 KOps/s | $\color{#d91a1a}-0.24\%$ |
| test_keys_nested | 96.2410μs | 67.2960μs | 14.8597 KOps/s | 14.8830 KOps/s | $\color{#d91a1a}-0.16\%$ |
| test_keys_nested_locked | 2.0746ms | 72.5474μs | 13.7841 KOps/s | 13.8542 KOps/s | $\color{#d91a1a}-0.51\%$ |
| test_keys_nested_leaf | 92.0510μs | 57.9310μs | 17.2619 KOps/s | 17.2579 KOps/s | $\color{#35bf28}+0.02\%$ |
| test_keys_stack_nested | 87.0220μs | 67.6823μs | 14.7749 KOps/s | 14.9076 KOps/s | $\color{#d91a1a}-0.89\%$ |
| test_keys_stack_nested_leaf | 82.2510μs | 58.1125μs | 17.2080 KOps/s | 17.2727 KOps/s | $\color{#d91a1a}-0.37\%$ |
| test_keys_stack_nested_locked | 94.8610μs | 72.8622μs | 13.7245 KOps/s | 13.8318 KOps/s | $\color{#d91a1a}-0.78\%$ |
| test_values | 8.6367μs | 1.8088μs | 552.8502 KOps/s | 544.3453 KOps/s | $\color{#35bf28}+1.56\%$ |
| test_values_nested | 64.1110μs | 35.3900μs | 28.2566 KOps/s | 28.4554 KOps/s | $\color{#d91a1a}-0.70\%$ |
| test_values_nested_locked | 58.7510μs | 37.0591μs | 26.9839 KOps/s | 26.9848 KOps/s | $-0.00\%$ |
| test_values_nested_leaf | 53.8420μs | 31.7707μs | 31.4756 KOps/s | 32.1321 KOps/s | $\color{#d91a1a}-2.04\%$ |
| test_values_stack_nested | 63.0110μs | 36.2387μs | 27.5948 KOps/s | 28.1058 KOps/s | $\color{#d91a1a}-1.82\%$ |
| test_values_stack_nested_leaf | 63.8510μs | 32.3240μs | 30.9368 KOps/s | 31.5539 KOps/s | $\color{#d91a1a}-1.96\%$ |
| test_values_stack_nested_locked | 63.2310μs | 38.1092μs | 26.2404 KOps/s | 26.9679 KOps/s | $\color{#d91a1a}-2.70\%$ |
| test_membership | 13.0500μs | 0.8484μs | 1.1787 MOps/s | 1.3772 MOps/s | $\textbf{\color{#d91a1a}-14.42\%}$ |
| test_membership_nested | 31.5500μs | 2.5998μs | 384.6496 KOps/s | 380.7351 KOps/s | $\color{#35bf28}+1.03\%$ |
| test_membership_nested_leaf | 0.1304ms | 2.6014μs | 384.4049 KOps/s | 378.8372 KOps/s | $\color{#35bf28}+1.47\%$ |
| test_membership_stacked_nested | 34.0100μs | 2.5997μs | 384.6609 KOps/s | 382.1272 KOps/s | $\color{#35bf28}+0.66\%$ |
| test_membership_stacked_nested_leaf | 13.9200μs | 2.5878μs | 386.4230 KOps/s | 381.5184 KOps/s | $\color{#35bf28}+1.29\%$ |
| test_membership_nested_last | 34.5610μs | 3.1479μs | 317.6697 KOps/s | 317.3087 KOps/s | $\color{#35bf28}+0.11\%$ |
| test_membership_nested_leaf_last | 16.8100μs | 3.1538μs | 317.0773 KOps/s | 316.0923 KOps/s | $\color{#35bf28}+0.31\%$ |
| test_membership_stacked_nested_last | 20.9610μs | 3.9161μs | 255.3567 KOps/s | 314.1615 KOps/s | $\textbf{\color{#d91a1a}-18.72\%}$ |
| test_membership_stacked_nested_leaf_last | 35.6500μs | 3.9495μs | 253.1977 KOps/s | 318.0146 KOps/s | $\textbf{\color{#d91a1a}-20.38\%}$ |
| test_nested_getleaf | 46.1210μs | 8.4447μs | 118.4176 KOps/s | 119.5118 KOps/s | $\color{#d91a1a}-0.92\%$ |
| test_nested_get | 30.4310μs | 7.9043μs | 126.5141 KOps/s | 126.9747 KOps/s | $\color{#d91a1a}-0.36\%$ |
| test_stacked_getleaf | 25.2500μs | 8.4047μs | 118.9815 KOps/s | 119.0787 KOps/s | $\color{#d91a1a}-0.08\%$ |
| test_stacked_get | 25.6900μs | 7.9317μs | 126.0763 KOps/s | 127.0536 KOps/s | $\color{#d91a1a}-0.77\%$ |
| test_nested_getitemleaf | 39.7600μs | 8.5937μs | 116.3637 KOps/s | 117.0969 KOps/s | $\color{#d91a1a}-0.63\%$ |
| test_nested_getitem | 30.4700μs | 8.1058μs | 123.3682 KOps/s | 124.2327 KOps/s | $\color{#d91a1a}-0.70\%$ |
| test_stacked_getitemleaf | 25.6100μs | 8.6317μs | 115.8523 KOps/s | 116.0213 KOps/s | $\color{#d91a1a}-0.15\%$ |
| test_stacked_getitem | 33.7610μs | 8.1084μs | 123.3286 KOps/s | 123.6391 KOps/s | $\color{#d91a1a}-0.25\%$ |
| test_lock_nested | 60.2918ms | 0.4262ms | 2.3462 KOps/s | 2.3594 KOps/s | $\color{#d91a1a}-0.56\%$ |
| test_lock_stack_nested | 0.3401ms | 0.3157ms | 3.1681 KOps/s | 3.1756 KOps/s | $\color{#d91a1a}-0.24\%$ |
| test_unlock_nested | 62.6478ms | 0.4251ms | 2.3523 KOps/s | 2.3571 KOps/s | $\color{#d91a1a}-0.20\%$ |
| test_unlock_stack_nested | 0.3522ms | 0.3229ms | 3.0974 KOps/s | 3.0946 KOps/s | $\color{#35bf28}+0.09\%$ |
| test_flatten_speed | 0.1920ms | 0.1040ms | 9.6178 KOps/s | 9.8803 KOps/s | $\color{#d91a1a}-2.66\%$ |
| test_unflatten_speed | 0.3535ms | 0.2911ms | 3.4358 KOps/s | 3.4213 KOps/s | $\color{#35bf28}+0.42\%$ |
| test_common_ops | 1.2299ms | 0.6087ms | 1.6427 KOps/s | 1.6604 KOps/s | $\color{#d91a1a}-1.07\%$ |
| test_creation | 0.1840ms | 1.6911μs | 591.3255 KOps/s | 594.3230 KOps/s | $\color{#d91a1a}-0.50\%$ |
| test_creation_empty | 41.0310μs | 9.4778μs | 105.5096 KOps/s | 106.9315 KOps/s | $\color{#d91a1a}-1.33\%$ |
| test_creation_nested_1 | 30.4010μs | 11.3729μs | 87.9281 KOps/s | 89.3827 KOps/s | $\color{#d91a1a}-1.63\%$ |
| test_creation_nested_2 | 0.2006ms | 13.6046μs | 73.5046 KOps/s | 74.4282 KOps/s | $\color{#d91a1a}-1.24\%$ |
| test_clone | 64.1510μs | 12.8301μs | 77.9418 KOps/s | 83.0808 KOps/s | $\textbf{\color{#d91a1a}-6.19\%}$ |
| test_getitem[int] | 1.8991ms | 11.8653μs | 84.2796 KOps/s | 86.1587 KOps/s | $\color{#d91a1a}-2.18\%$ |
| test_getitem[slice_int] | 46.8620μs | 22.1503μs | 45.1461 KOps/s | 46.7261 KOps/s | $\color{#d91a1a}-3.38\%$ |
| test_getitem[range] | 68.0220μs | 49.3688μs | 20.2557 KOps/s | 20.0607 KOps/s | $\color{#35bf28}+0.97\%$ |
| test_getitem[tuple] | 53.6200μs | 19.4067μs | 51.5287 KOps/s | 51.4525 KOps/s | $\color{#35bf28}+0.15\%$ |
| test_getitem[list] | 0.2264ms | 34.4587μs | 29.0202 KOps/s | 27.7884 KOps/s | $\color{#35bf28}+4.43\%$ |
| test_setitem_dim[int] | 50.0310μs | 31.7414μs | 31.5046 KOps/s | 31.9577 KOps/s | $\color{#d91a1a}-1.42\%$ |
| test_setitem_dim[slice_int] | 68.4710μs | 50.9308μs | 19.6345 KOps/s | 19.4338 KOps/s | $\color{#35bf28}+1.03\%$ |
| test_setitem_dim[range] | 0.1028ms | 68.4006μs | 14.6198 KOps/s | 14.3683 KOps/s | $\color{#35bf28}+1.75\%$ |
| test_setitem_dim[tuple] | 64.8310μs | 44.8554μs | 22.2938 KOps/s | 21.8194 KOps/s | $\color{#35bf28}+2.17\%$ |
| test_setitem | 65.6910μs | 17.8886μs | 55.9015 KOps/s | 57.8835 KOps/s | $\color{#d91a1a}-3.42\%$ |
| test_set | 50.8510μs | 17.3173μs | 57.7458 KOps/s | 59.5304 KOps/s | $\color{#d91a1a}-3.00\%$ |
| test_set_shared | 1.4208ms | 0.1004ms | 9.9575 KOps/s | 10.0255 KOps/s | $\color{#d91a1a}-0.68\%$ |
| test_update | 85.8510μs | 20.0913μs | 49.7727 KOps/s | 51.2975 KOps/s | $\color{#d91a1a}-2.97\%$ |
| test_update_nested | 72.6010μs | 25.0267μs | 39.9573 KOps/s | 40.6934 KOps/s | $\color{#d91a1a}-1.81\%$ |
| test_update__nested | 56.1010μs | 23.6489μs | 42.2852 KOps/s | 43.9103 KOps/s | $\color{#d91a1a}-3.70\%$ |
| test_set_nested | 73.7610μs | 18.2869μs | 54.6841 KOps/s | 55.9392 KOps/s | $\color{#d91a1a}-2.24\%$ |
| test_set_nested_new | 68.3710μs | 20.9964μs | 47.6271 KOps/s | 48.0990 KOps/s | $\color{#d91a1a}-0.98\%$ |
| test_select | 74.1220μs | 34.4069μs | 29.0640 KOps/s | 29.6823 KOps/s | $\color{#d91a1a}-2.08\%$ |
| test_select_nested | 95.0420μs | 55.3386μs | 18.0706 KOps/s | 18.0692 KOps/s | $+0.01\%$ |
| test_exclude_nested | 0.1509ms | 0.1089ms | 9.1863 KOps/s | 9.0065 KOps/s | $\color{#35bf28}+2.00\%$ |
| test_empty[True] | 0.3873ms | 0.3432ms | 2.9134 KOps/s | 2.8285 KOps/s | $\color{#35bf28}+3.00\%$ |
| test_empty[False] | 2.6680μs | 0.9251μs | 1.0810 MOps/s | 1.0723 MOps/s | $\color{#35bf28}+0.81\%$ |
| test_to | 0.1047ms | 79.6150μs | 12.5604 KOps/s | 13.0503 KOps/s | $\color{#d91a1a}-3.75\%$ |
| test_to_nonblocking | 0.2211ms | 63.0587μs | 15.8582 KOps/s | 16.7110 KOps/s | $\textbf{\color{#d91a1a}-5.10\%}$ |
| test_unbind_speed | 1.5296ms | 0.2799ms | 3.5729 KOps/s | 3.5739 KOps/s | $\color{#d91a1a}-0.03\%$ |
| test_unbind_speed_stack0 | 0.3277ms | 0.2773ms | 3.6063 KOps/s | 3.5972 KOps/s | $\color{#35bf28}+0.25\%$ |
| test_unbind_speed_stack1 | 84.6137ms | 0.8354ms | 1.1970 KOps/s | 1.1864 KOps/s | $\color{#35bf28}+0.89\%$ |
| test_split | 78.4659ms | 1.7522ms | 570.7134 Ops/s | 579.9782 Ops/s | $\color{#d91a1a}-1.60\%$ |
| test_chunk | 78.2141ms | 1.7475ms | 572.2380 Ops/s | 581.2616 Ops/s | $\color{#d91a1a}-1.55\%$ |
| test_creation[device0] | 0.1996ms | 60.3693μs | 16.5647 KOps/s | 16.7086 KOps/s | $\color{#d91a1a}-0.86\%$ |
| test_creation_from_tensor | 0.1329ms | 56.6635μs | 17.6480 KOps/s | 17.7765 KOps/s | $\color{#d91a1a}-0.72\%$ |
| test_add_one[memmap_tensor0] | 81.0310μs | 7.9791μs | 125.3273 KOps/s | 136.1717 KOps/s | $\textbf{\color{#d91a1a}-7.96\%}$ |
| test_contiguous[memmap_tensor0] | 25.7910μs | 0.7058μs | 1.4168 MOps/s | 1.4041 MOps/s | $\color{#35bf28}+0.90\%$ |
| test_stack[memmap_tensor0] | 30.6310μs | 5.2474μs | 190.5719 KOps/s | 197.6072 KOps/s | $\color{#d91a1a}-3.56\%$ |
| test_memmaptd_index | 1.0830ms | 0.3024ms | 3.3071 KOps/s | 3.2831 KOps/s | $\color{#35bf28}+0.73\%$ |
| test_memmaptd_index_astensor | 0.7220ms | 0.3727ms | 2.6832 KOps/s | 2.6668 KOps/s | $\color{#35bf28}+0.61\%$ |
| test_memmaptd_index_op | 1.1528ms | 0.7095ms | 1.4095 KOps/s | 1.4276 KOps/s | $\color{#d91a1a}-1.27\%$ |
| test_serialize_model | 0.1864s | 0.1124s | 8.8957 Ops/s | 8.3584 Ops/s | $\textbf{\color{#35bf28}+6.43\%}$ |
| test_serialize_model_pickle | 1.3514s | 1.2366s | 0.8086 Ops/s | 0.8079 Ops/s | $\color{#35bf28}+0.09\%$ |
| test_serialize_weights | 0.1842s | 0.1109s | 9.0144 Ops/s | 9.4856 Ops/s | $\color{#d91a1a}-4.97\%$ |
| test_serialize_weights_returnearly | 0.2697s | 0.1019s | 9.8145 Ops/s | 12.3241 Ops/s | $\textbf{\color{#d91a1a}-20.36\%}$ |
| test_serialize_weights_pickle | 1.4017s | 1.2542s | 0.7973 Ops/s | 0.8059 Ops/s | $\color{#d91a1a}-1.07\%$ |
| test_reshape_pytree | 68.7910μs | 26.7017μs | 37.4508 KOps/s | 37.6060 KOps/s | $\color{#d91a1a}-0.41\%$ |
| test_reshape_td | 63.7510μs | 31.7934μs | 31.4531 KOps/s | 31.5874 KOps/s | $\color{#d91a1a}-0.43\%$ |
| test_view_pytree | 0.1753ms | 26.4501μs | 37.8070 KOps/s | 37.9446 KOps/s | $\color{#d91a1a}-0.36\%$ |
| test_view_td | 62.6810μs | 36.1544μs | 27.6592 KOps/s | 27.7834 KOps/s | $\color{#d91a1a}-0.45\%$ |
| test_unbind_pytree | 91.6220μs | 32.5139μs | 30.7561 KOps/s | 30.7796 KOps/s | $\color{#d91a1a}-0.08\%$ |
| test_unbind_td | 0.4559ms | 42.7393μs | 23.3977 KOps/s | 23.9906 KOps/s | $\color{#d91a1a}-2.47\%$ |
| test_split_pytree | 61.9920μs | 34.9991μs | 28.5722 KOps/s | 27.5547 KOps/s | $\color{#35bf28}+3.69\%$ |
| test_split_td | 0.1071ms | 41.7088μs | 23.9758 KOps/s | 24.5117 KOps/s | $\color{#d91a1a}-2.19\%$ |
| test_add_pytree | 70.3710μs | 39.8937μs | 25.0666 KOps/s | 25.6754 KOps/s | $\color{#d91a1a}-2.37\%$ |
| test_add_td | 86.0710μs | 51.6141μs | 19.3746 KOps/s | 19.6111 KOps/s | $\color{#d91a1a}-1.21\%$ |
| test_distributed | 0.2087ms | 66.1867μs | 15.1088 KOps/s | 13.8371 KOps/s | $\textbf{\color{#35bf28}+9.19\%}$ |
| test_tdmodule | 38.3400μs | 15.0494μs | 66.4478 KOps/s | 64.7325 KOps/s | $\color{#35bf28}+2.65\%$ |
| test_tdmodule_dispatch | 53.2110μs | 29.3514μs | 34.0700 KOps/s | 34.3024 KOps/s | $\color{#d91a1a}-0.68\%$ |
| test_tdseq | 32.8410μs | 17.0678μs | 58.5897 KOps/s | 59.2385 KOps/s | $\color{#d91a1a}-1.10\%$ |
| test_tdseq_dispatch | 50.3110μs | 32.6542μs | 30.6239 KOps/s | 29.9594 KOps/s | $\color{#35bf28}+2.22\%$ |
| test_instantiation_functorch | 1.6729ms | 1.5289ms | 654.0643 Ops/s | 643.1564 Ops/s | $\color{#35bf28}+1.70\%$ |
| test_instantiation_td | 1.5363ms | 1.0467ms | 955.3943 Ops/s | 942.6270 Ops/s | $\color{#35bf28}+1.35\%$ |
| test_exec_functorch | 0.1878ms | 0.1554ms | 6.4353 KOps/s | 6.6135 KOps/s | $\color{#d91a1a}-2.69\%$ |
| test_exec_functional_call | 0.1865ms | 0.1461ms | 6.8453 KOps/s | 7.1329 KOps/s | $\color{#d91a1a}-4.03\%$ |
| test_exec_td | 0.1740ms | 0.1434ms | 6.9756 KOps/s | 7.0724 KOps/s | $\color{#d91a1a}-1.37\%$ |
| test_exec_td_decorator | 0.6995ms | 0.2171ms | 4.6065 KOps/s | 4.6762 KOps/s | $\color{#d91a1a}-1.49\%$ |
| test_vmap_mlp_speed[True-True] | 0.7759ms | 0.6045ms | 1.6542 KOps/s | 1.6412 KOps/s | $\color{#35bf28}+0.79\%$ |
| test_vmap_mlp_speed[True-False] | 0.7405ms | 0.6018ms | 1.6617 KOps/s | 1.6458 KOps/s | $\color{#35bf28}+0.96\%$ |
| test_vmap_mlp_speed[False-True] | 0.6842ms | 0.5315ms | 1.8814 KOps/s | 1.8706 KOps/s | $\color{#35bf28}+0.58\%$ |
| test_vmap_mlp_speed[False-False] | 0.6710ms | 0.5312ms | 1.8826 KOps/s | 1.8750 KOps/s | $\color{#35bf28}+0.41\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.2659ms | 0.6629ms | 1.5085 KOps/s | 1.4925 KOps/s | $\color{#35bf28}+1.07\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.8069ms | 0.6600ms | 1.5152 KOps/s | 1.5054 KOps/s | $\color{#35bf28}+0.65\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.7364ms | 0.5863ms | 1.7057 KOps/s | 1.6949 KOps/s | $\color{#35bf28}+0.64\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.6972ms | 0.5862ms | 1.7060 KOps/s | 1.6952 KOps/s | $\color{#35bf28}+0.64\%$ |
| test_vmap_transformer_speed[True-True] | 8.1704ms | 8.0729ms | 123.8705 Ops/s | 123.7563 Ops/s | $\color{#35bf28}+0.09\%$ |
| test_vmap_transformer_speed[True-False] | 8.2426ms | 8.0712ms | 123.8968 Ops/s | 124.4629 Ops/s | $\color{#d91a1a}-0.45\%$ |
| test_vmap_transformer_speed[False-True] | 8.2453ms | 8.0182ms | 124.7164 Ops/s | 122.4563 Ops/s | $\color{#35bf28}+1.85\%$ |
| test_vmap_transformer_speed[False-False] | 8.1195ms | 8.0052ms | 124.9194 Ops/s | 121.6465 Ops/s | $\color{#35bf28}+2.69\%$ |
| test_vmap_transformer_speed_decorator[True-True] | 19.6718ms | 19.4866ms | 51.3173 Ops/s | 50.3080 Ops/s | $\color{#35bf28}+2.01\%$ |
| test_vmap_transformer_speed_decorator[True-False] | 19.5770ms | 19.4482ms | 51.4187 Ops/s | 50.0549 Ops/s | $\color{#35bf28}+2.72\%$ |
| test_vmap_transformer_speed_decorator[False-True] | 19.4967ms | 19.3738ms | 51.6160 Ops/s | 50.4682 Ops/s | $\color{#35bf28}+2.27\%$ |
| test_vmap_transformer_speed_decorator[False-False] | 20.0682ms | 19.3744ms | 51.6145 Ops/s | 50.3859 Ops/s | $\color{#35bf28}+2.44\%$ |
| test_to_module_speed[True] | 1.7035ms | 1.5109ms | 661.8533 Ops/s | 643.0599 Ops/s | $\color{#35bf28}+2.92\%$ |
| test_to_module_speed[False] | 1.6527ms | 1.4914ms | 670.5124 Ops/s | 655.2732 Ops/s | $\color{#35bf28}+2.33\%$ |
| test_tc_init | 74.7920μs | 26.0798μs | 38.3439 KOps/s | 38.1273 KOps/s | $\color{#35bf28}+0.57\%$ |
| test_tc_init_nested | 88.7210μs | 53.5705μs | 18.6670 KOps/s | 17.8107 KOps/s | $\color{#35bf28}+4.81\%$ |
| test_tc_first_layer_tensor | 1.1860μs | 0.3573μs | 2.7988 MOps/s | 2.7481 MOps/s | $\color{#35bf28}+1.84\%$ |
| test_tc_first_layer_nontensor | 10.3832μs | 0.3903μs | 2.5621 MOps/s | 2.5692 MOps/s | $\color{#d91a1a}-0.28\%$ |
| test_tc_second_layer_tensor | 16.9600μs | 1.0767μs | 928.7543 KOps/s | 1.0254 MOps/s | $\textbf{\color{#d91a1a}-9.42\%}$ |
| test_tc_second_layer_nontensor | 5.4118μs | 0.8265μs | 1.2100 MOps/s | 1.2487 MOps/s | $\color{#d91a1a}-3.10\%$ |
| test_unbind | 0.1046s | 8.3051ms | 120.4087 Ops/s | 184.8951 Ops/s | $\textbf{\color{#d91a1a}-34.88\%}$ |
| test_full_like | 14.5223ms | 13.5712ms | 73.6854 Ops/s | 102.1083 Ops/s | $\textbf{\color{#d91a1a}-27.84\%}$ |
| test_zeros_like | 8.0842ms | 7.8666ms | 127.1199 Ops/s | 140.6153 Ops/s | $\textbf{\color{#d91a1a}-9.60\%}$ |
| test_ones_like | 8.2602ms | 7.8993ms | 126.5935 Ops/s | 139.6547 Ops/s | $\textbf{\color{#d91a1a}-9.35\%}$ |
| test_clone | 10.3489ms | 9.7199ms | 102.8820 Ops/s | 100.4318 Ops/s | $\color{#35bf28}+2.44\%$ |
| test_squeeze | 66.5410μs | 11.2613μs | 88.7997 KOps/s | 91.8872 KOps/s | $\color{#d91a1a}-3.36\%$ |
| test_unsqueeze | 0.1875ms | 53.8190μs | 18.5808 KOps/s | 19.2897 KOps/s | $\color{#d91a1a}-3.68\%$ |
| test_split | 0.1794ms | 99.5630μs | 10.0439 KOps/s | 9.9587 KOps/s | $\color{#35bf28}+0.86\%$ |
| test_permute | 0.2024ms | 0.1129ms | 8.8569 KOps/s | 9.0306 KOps/s | $\color{#d91a1a}-1.92\%$ |
| test_stack | 30.6459ms | 29.9010ms | 33.4437 Ops/s | 33.7804 Ops/s | $\color{#d91a1a}-1.00\%$ |
| test_cat | 30.6095ms | 29.8012ms | 33.5557 Ops/s | 33.6517 Ops/s | $\color{#d91a1a}-0.29\%$ |

@vmoens merged commit 4abcf47 into main May 30, 2024
37 of 38 checks passed
@vmoens deleted the best-attempt-dense-stack branch May 30, 2024 15:43

2 participants