[BugFix] Fix lazy stack features (where and norm) #795

Merged
vmoens merged 4 commits into main from fix-lazy-stack-various on May 27, 2024

vmoens (Contributor) commented May 27, 2024

No description provided.

facebook-github-bot added the CLA Signed label May 27, 2024

github-actions bot commented May 27, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}14$. Worsened: $\large\color{#d91a1a}8$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 32.5110μs 16.4753μs 60.6969 KOps/s 59.1149 KOps/s $\color{#35bf28}+2.68\%$
test_plain_set_stack_nested 33.5330μs 16.7667μs 59.6420 KOps/s 59.0406 KOps/s $\color{#35bf28}+1.02\%$
test_plain_set_nested_inplace 83.6370μs 18.5374μs 53.9450 KOps/s 52.8077 KOps/s $\color{#35bf28}+2.15\%$
test_plain_set_stack_nested_inplace 47.2080μs 18.5799μs 53.8217 KOps/s 53.2443 KOps/s $\color{#35bf28}+1.08\%$
test_items 17.3630μs 2.5743μs 388.4596 KOps/s 369.2930 KOps/s $\textbf{\color{#35bf28}+5.19\%}$
test_items_nested 0.4507ms 0.2694ms 3.7121 KOps/s 3.7555 KOps/s $\color{#d91a1a}-1.16\%$
test_items_nested_locked 1.0538ms 0.2716ms 3.6813 KOps/s 3.6497 KOps/s $\color{#35bf28}+0.87\%$
test_items_nested_leaf 0.1787ms 76.7361μs 13.0317 KOps/s 13.0042 KOps/s $\color{#35bf28}+0.21\%$
test_items_stack_nested 0.5015ms 0.2702ms 3.7014 KOps/s 3.6879 KOps/s $\color{#35bf28}+0.36\%$
test_items_stack_nested_leaf 0.1378ms 78.0173μs 12.8177 KOps/s 13.0026 KOps/s $\color{#d91a1a}-1.42\%$
test_items_stack_nested_locked 0.5737ms 0.2696ms 3.7094 KOps/s 3.7018 KOps/s $\color{#35bf28}+0.21\%$
test_keys 18.6750μs 3.9043μs 256.1291 KOps/s 261.8254 KOps/s $\color{#d91a1a}-2.18\%$
test_keys_nested 0.2322ms 0.1366ms 7.3192 KOps/s 7.3772 KOps/s $\color{#d91a1a}-0.79\%$
test_keys_nested_locked 0.7996ms 0.1419ms 7.0469 KOps/s 7.0946 KOps/s $\color{#d91a1a}-0.67\%$
test_keys_nested_leaf 0.2070ms 0.1168ms 8.5647 KOps/s 8.6319 KOps/s $\color{#d91a1a}-0.78\%$
test_keys_stack_nested 0.2652ms 0.1371ms 7.2916 KOps/s 7.4708 KOps/s $\color{#d91a1a}-2.40\%$
test_keys_stack_nested_leaf 0.1961ms 0.1155ms 8.6589 KOps/s 8.7283 KOps/s $\color{#d91a1a}-0.79\%$
test_keys_stack_nested_locked 0.3013ms 0.1410ms 7.0928 KOps/s 7.2009 KOps/s $\color{#d91a1a}-1.50\%$
test_values 7.8673μs 1.2693μs 787.8661 KOps/s 851.3185 KOps/s $\textbf{\color{#d91a1a}-7.45\%}$
test_values_nested 0.1066ms 49.7647μs 20.0946 KOps/s 19.8498 KOps/s $\color{#35bf28}+1.23\%$
test_values_nested_locked 0.1250ms 49.5554μs 20.1794 KOps/s 19.8123 KOps/s $\color{#35bf28}+1.85\%$
test_values_nested_leaf 86.1620μs 45.3733μs 22.0394 KOps/s 21.8687 KOps/s $\color{#35bf28}+0.78\%$
test_values_stack_nested 0.1018ms 51.2029μs 19.5301 KOps/s 19.3530 KOps/s $\color{#35bf28}+0.92\%$
test_values_stack_nested_leaf 83.3960μs 45.3182μs 22.0662 KOps/s 22.2093 KOps/s $\color{#d91a1a}-0.64\%$
test_values_stack_nested_locked 0.1117ms 50.2956μs 19.8825 KOps/s 19.4123 KOps/s $\color{#35bf28}+2.42\%$
test_membership 24.8760μs 1.3320μs 750.7612 KOps/s 735.5225 KOps/s $\color{#35bf28}+2.07\%$
test_membership_nested 18.3740μs 3.4106μs 293.2011 KOps/s 292.8551 KOps/s $\color{#35bf28}+0.12\%$
test_membership_nested_leaf 25.3580μs 3.4239μs 292.0669 KOps/s 293.0918 KOps/s $\color{#d91a1a}-0.35\%$
test_membership_stacked_nested 21.5510μs 3.4291μs 291.6197 KOps/s 289.2081 KOps/s $\color{#35bf28}+0.83\%$
test_membership_stacked_nested_leaf 30.3770μs 3.4147μs 292.8520 KOps/s 294.1225 KOps/s $\color{#d91a1a}-0.43\%$
test_membership_nested_last 19.7170μs 4.0955μs 244.1702 KOps/s 237.5661 KOps/s $\color{#35bf28}+2.78\%$
test_membership_nested_leaf_last 22.9230μs 4.1927μs 238.5116 KOps/s 237.9096 KOps/s $\color{#35bf28}+0.25\%$
test_membership_stacked_nested_last 21.3000μs 4.1014μs 243.8212 KOps/s 160.8667 KOps/s $\textbf{\color{#35bf28}+51.57\%}$
test_membership_stacked_nested_leaf_last 28.2430μs 4.1681μs 239.9197 KOps/s 159.6156 KOps/s $\textbf{\color{#35bf28}+50.31\%}$
test_nested_getleaf 37.7510μs 10.6827μs 93.6092 KOps/s 95.2766 KOps/s $\color{#d91a1a}-1.75\%$
test_nested_get 48.4400μs 10.0003μs 99.9969 KOps/s 100.2458 KOps/s $\color{#d91a1a}-0.25\%$
test_stacked_getleaf 38.6830μs 10.6759μs 93.6685 KOps/s 95.4106 KOps/s $\color{#d91a1a}-1.83\%$
test_stacked_get 35.6960μs 10.0283μs 99.7182 KOps/s 101.1248 KOps/s $\color{#d91a1a}-1.39\%$
test_nested_getitemleaf 25.4770μs 11.2833μs 88.6267 KOps/s 90.7655 KOps/s $\color{#d91a1a}-2.36\%$
test_nested_getitem 30.7280μs 10.2209μs 97.8384 KOps/s 98.1395 KOps/s $\color{#d91a1a}-0.31\%$
test_stacked_getitemleaf 33.3730μs 11.4015μs 87.7074 KOps/s 90.4865 KOps/s $\color{#d91a1a}-3.07\%$
test_stacked_getitem 25.5080μs 10.2776μs 97.2993 KOps/s 98.7549 KOps/s $\color{#d91a1a}-1.47\%$
test_lock_nested 49.0281ms 0.3874ms 2.5814 KOps/s 2.9020 KOps/s $\textbf{\color{#d91a1a}-11.05\%}$
test_lock_stack_nested 0.4836ms 0.3040ms 3.2897 KOps/s 3.2997 KOps/s $\color{#d91a1a}-0.30\%$
test_unlock_nested 1.4767ms 0.3496ms 2.8604 KOps/s 2.5450 KOps/s $\textbf{\color{#35bf28}+12.39\%}$
test_unlock_stack_nested 0.6807ms 0.3116ms 3.2089 KOps/s 3.2095 KOps/s $\color{#d91a1a}-0.02\%$
test_flatten_speed 0.5241ms 95.7501μs 10.4439 KOps/s 10.5293 KOps/s $\color{#d91a1a}-0.81\%$
test_unflatten_speed 0.5992ms 0.4112ms 2.4316 KOps/s 2.4046 KOps/s $\color{#35bf28}+1.12\%$
test_common_ops 5.6873ms 0.7047ms 1.4191 KOps/s 1.4097 KOps/s $\color{#35bf28}+0.66\%$
test_creation 27.1210μs 1.8670μs 535.6052 KOps/s 521.9958 KOps/s $\color{#35bf28}+2.61\%$
test_creation_empty 35.1960μs 9.7490μs 102.5745 KOps/s 95.4073 KOps/s $\textbf{\color{#35bf28}+7.51\%}$
test_creation_nested_1 35.3860μs 12.4046μs 80.6151 KOps/s 75.3363 KOps/s $\textbf{\color{#35bf28}+7.01\%}$
test_creation_nested_2 63.5990μs 15.7214μs 63.6076 KOps/s 60.3370 KOps/s $\textbf{\color{#35bf28}+5.42\%}$
test_clone 75.8820μs 13.1255μs 76.1875 KOps/s 75.4294 KOps/s $\color{#35bf28}+1.01\%$
test_getitem[int] 51.9770μs 11.1650μs 89.5655 KOps/s 85.9222 KOps/s $\color{#35bf28}+4.24\%$
test_getitem[slice_int] 65.1120μs 22.5371μs 44.3712 KOps/s 42.5402 KOps/s $\color{#35bf28}+4.30\%$
test_getitem[range] 81.1120μs 61.8478μs 16.1687 KOps/s 14.6524 KOps/s $\textbf{\color{#35bf28}+10.35\%}$
test_getitem[tuple] 58.4900μs 18.8509μs 53.0478 KOps/s 52.2638 KOps/s $\color{#35bf28}+1.50\%$
test_getitem[list] 0.1209ms 43.5720μs 22.9505 KOps/s 23.5021 KOps/s $\color{#d91a1a}-2.35\%$
test_setitem_dim[int] 63.8190μs 33.7638μs 29.6175 KOps/s 28.8157 KOps/s $\color{#35bf28}+2.78\%$
test_setitem_dim[slice_int] 0.1027ms 60.1646μs 16.6211 KOps/s 16.5119 KOps/s $\color{#35bf28}+0.66\%$
test_setitem_dim[range] 0.1316ms 81.3005μs 12.3000 KOps/s 11.6744 KOps/s $\textbf{\color{#35bf28}+5.36\%}$
test_setitem_dim[tuple] 94.7170μs 48.7229μs 20.5242 KOps/s 20.1435 KOps/s $\color{#35bf28}+1.89\%$
test_setitem 63.9400μs 20.0191μs 49.9523 KOps/s 50.8299 KOps/s $\color{#d91a1a}-1.73\%$
test_set 52.9590μs 19.0377μs 52.5272 KOps/s 52.1053 KOps/s $\color{#35bf28}+0.81\%$
test_set_shared 0.9093ms 0.1399ms 7.1500 KOps/s 7.0564 KOps/s $\color{#35bf28}+1.33\%$
test_update 0.1330ms 22.1822μs 45.0812 KOps/s 47.6303 KOps/s $\textbf{\color{#d91a1a}-5.35\%}$
test_update_nested 79.3180μs 30.2077μs 33.1042 KOps/s 34.0953 KOps/s $\color{#d91a1a}-2.91\%$
test_update__nested 56.2050μs 25.1701μs 39.7297 KOps/s 40.6232 KOps/s $\color{#d91a1a}-2.20\%$
test_set_nested 58.2890μs 21.2851μs 46.9812 KOps/s 44.4966 KOps/s $\textbf{\color{#35bf28}+5.58\%}$
test_set_nested_new 71.7940μs 25.6730μs 38.9514 KOps/s 39.9760 KOps/s $\color{#d91a1a}-2.56\%$
test_select 0.1232ms 39.8591μs 25.0884 KOps/s 24.2024 KOps/s $\color{#35bf28}+3.66\%$
test_select_nested 0.1569ms 58.8633μs 16.9885 KOps/s 16.5838 KOps/s $\color{#35bf28}+2.44\%$
test_exclude_nested 0.2644ms 0.1181ms 8.4642 KOps/s 8.3087 KOps/s $\color{#35bf28}+1.87\%$
test_empty[True] 0.6123ms 0.3922ms 2.5496 KOps/s 2.5406 KOps/s $\color{#35bf28}+0.35\%$
test_empty[False] 5.7628μs 1.1402μs 877.0767 KOps/s 841.9907 KOps/s $\color{#35bf28}+4.17\%$
test_unbind_speed 1.4473ms 0.2532ms 3.9489 KOps/s 3.8241 KOps/s $\color{#35bf28}+3.26\%$
test_unbind_speed_stack0 0.4170ms 0.2514ms 3.9775 KOps/s 4.0221 KOps/s $\color{#d91a1a}-1.11\%$
test_unbind_speed_stack1 68.2983ms 0.7315ms 1.3671 KOps/s 1.3466 KOps/s $\color{#35bf28}+1.52\%$
test_split 65.1105ms 1.6012ms 624.5245 Ops/s 611.3114 Ops/s $\color{#35bf28}+2.16\%$
test_chunk 61.0315ms 1.6019ms 624.2600 Ops/s 609.2231 Ops/s $\color{#35bf28}+2.47\%$
test_creation[device0] 3.7769ms 87.1928μs 11.4688 KOps/s 11.9263 KOps/s $\color{#d91a1a}-3.84\%$
test_creation_from_tensor 0.1712ms 85.8647μs 11.6462 KOps/s 11.7341 KOps/s $\color{#d91a1a}-0.75\%$
test_add_one[memmap_tensor0] 58.5400μs 5.6511μs 176.9563 KOps/s 182.8364 KOps/s $\color{#d91a1a}-3.22\%$
test_contiguous[memmap_tensor0] 8.2050μs 0.6509μs 1.5362 MOps/s 1.5201 MOps/s $\color{#35bf28}+1.06\%$
test_stack[memmap_tensor0] 21.5000μs 3.5392μs 282.5518 KOps/s 271.8132 KOps/s $\color{#35bf28}+3.95\%$
test_memmaptd_index 0.9757ms 0.2572ms 3.8875 KOps/s 3.8217 KOps/s $\color{#35bf28}+1.72\%$
test_memmaptd_index_astensor 0.7692ms 0.3354ms 2.9818 KOps/s 2.9833 KOps/s $\color{#d91a1a}-0.05\%$
test_memmaptd_index_op 0.9872ms 0.6026ms 1.6596 KOps/s 1.6233 KOps/s $\color{#35bf28}+2.23\%$
test_serialize_model 0.1634s 0.1101s 9.0842 Ops/s 8.8004 Ops/s $\color{#35bf28}+3.23\%$
test_serialize_model_pickle 0.4488s 0.3776s 2.6482 Ops/s 2.5996 Ops/s $\color{#35bf28}+1.87\%$
test_serialize_weights 0.1649s 0.1093s 9.1480 Ops/s 9.1307 Ops/s $\color{#35bf28}+0.19\%$
test_serialize_weights_returnearly 0.1316s 0.1238s 8.0783 Ops/s 7.3695 Ops/s $\textbf{\color{#35bf28}+9.62\%}$
test_serialize_weights_pickle 0.4510s 0.3917s 2.5532 Ops/s 2.3359 Ops/s $\textbf{\color{#35bf28}+9.30\%}$
test_serialize_weights_filesystem 0.1603s 0.1003s 9.9694 Ops/s 10.7423 Ops/s $\textbf{\color{#d91a1a}-7.19\%}$
test_serialize_model_filesystem 97.1785ms 92.7006ms 10.7874 Ops/s 10.5045 Ops/s $\color{#35bf28}+2.69\%$
test_reshape_pytree 58.5700μs 25.3277μs 39.4824 KOps/s 40.0666 KOps/s $\color{#d91a1a}-1.46\%$
test_reshape_td 65.7930μs 33.6412μs 29.7255 KOps/s 29.7921 KOps/s $\color{#d91a1a}-0.22\%$
test_view_pytree 56.6260μs 25.0168μs 39.9731 KOps/s 40.2436 KOps/s $\color{#d91a1a}-0.67\%$
test_view_td 80.6000μs 37.8568μs 26.4153 KOps/s 26.7092 KOps/s $\color{#d91a1a}-1.10\%$
test_unbind_pytree 62.8080μs 28.5312μs 35.0493 KOps/s 34.2720 KOps/s $\color{#35bf28}+2.27\%$
test_unbind_td 0.4270ms 37.7013μs 26.5243 KOps/s 26.1913 KOps/s $\color{#35bf28}+1.27\%$
test_split_pytree 61.7450μs 28.9277μs 34.5689 KOps/s 34.7007 KOps/s $\color{#d91a1a}-0.38\%$
test_split_td 0.5112ms 39.9226μs 25.0485 KOps/s 24.1559 KOps/s $\color{#35bf28}+3.70\%$
test_add_pytree 83.7270μs 34.5010μs 28.9847 KOps/s 29.0142 KOps/s $\color{#d91a1a}-0.10\%$
test_add_td 0.1142ms 54.0898μs 18.4878 KOps/s 18.3283 KOps/s $\color{#35bf28}+0.87\%$
test_distributed 0.2058ms 0.1013ms 9.8724 KOps/s 9.8461 KOps/s $\color{#35bf28}+0.27\%$
test_tdmodule 33.6530μs 16.8534μs 59.3352 KOps/s 59.1954 KOps/s $\color{#35bf28}+0.24\%$
test_tdmodule_dispatch 51.6270μs 33.6487μs 29.7189 KOps/s 29.7752 KOps/s $\color{#d91a1a}-0.19\%$
test_tdseq 39.4140μs 19.2784μs 51.8715 KOps/s 49.9093 KOps/s $\color{#35bf28}+3.93\%$
test_tdseq_dispatch 63.7890μs 38.2924μs 26.1148 KOps/s 25.5545 KOps/s $\color{#35bf28}+2.19\%$
test_instantiation_functorch 1.9355ms 1.3247ms 754.8724 Ops/s 772.7657 Ops/s $\color{#d91a1a}-2.32\%$
test_instantiation_td 65.9535ms 1.0731ms 931.8640 Ops/s 1.0009 KOps/s $\textbf{\color{#d91a1a}-6.90\%}$
test_exec_functorch 0.2901ms 0.1596ms 6.2655 KOps/s 6.3244 KOps/s $\color{#d91a1a}-0.93\%$
test_exec_functional_call 0.2862ms 0.1519ms 6.5848 KOps/s 6.8505 KOps/s $\color{#d91a1a}-3.88\%$
test_exec_td 0.2700ms 0.1470ms 6.8043 KOps/s 6.8867 KOps/s $\color{#d91a1a}-1.20\%$
test_exec_td_decorator 0.8618ms 0.2239ms 4.4669 KOps/s 4.6228 KOps/s $\color{#d91a1a}-3.37\%$
test_vmap_mlp_speed[True-True] 0.6758ms 0.4840ms 2.0663 KOps/s 2.0637 KOps/s $\color{#35bf28}+0.12\%$
test_vmap_mlp_speed[True-False] 0.7562ms 0.4823ms 2.0736 KOps/s 2.0733 KOps/s $\color{#35bf28}+0.01\%$
test_vmap_mlp_speed[False-True] 0.6551ms 0.3958ms 2.5263 KOps/s 2.5501 KOps/s $\color{#d91a1a}-0.93\%$
test_vmap_mlp_speed[False-False] 0.6696ms 0.3961ms 2.5246 KOps/s 2.5659 KOps/s $\color{#d91a1a}-1.61\%$
test_vmap_mlp_speed_decorator[True-True] 1.1063ms 0.5649ms 1.7703 KOps/s 1.8051 KOps/s $\color{#d91a1a}-1.93\%$
test_vmap_mlp_speed_decorator[True-False] 0.6910ms 0.5474ms 1.8270 KOps/s 1.8065 KOps/s $\color{#35bf28}+1.13\%$
test_vmap_mlp_speed_decorator[False-True] 0.9051ms 0.4557ms 2.1945 KOps/s 2.2125 KOps/s $\color{#d91a1a}-0.81\%$
test_vmap_mlp_speed_decorator[False-False] 0.5710ms 0.4525ms 2.2099 KOps/s 2.2151 KOps/s $\color{#d91a1a}-0.24\%$
test_to_module_speed[True] 2.2224ms 1.6543ms 604.4762 Ops/s 598.0521 Ops/s $\color{#35bf28}+1.07\%$
test_to_module_speed[False] 2.2241ms 1.6349ms 611.6489 Ops/s 600.5241 Ops/s $\color{#35bf28}+1.85\%$
test_tc_init 56.2850μs 26.7007μs 37.4522 KOps/s 34.7707 KOps/s $\textbf{\color{#35bf28}+7.71\%}$
test_tc_init_nested 0.1026ms 51.6512μs 19.3606 KOps/s 16.9753 KOps/s $\textbf{\color{#35bf28}+14.05\%}$
test_tc_first_layer_tensor 3.4550μs 0.6753μs 1.4809 MOps/s 1.5049 MOps/s $\color{#d91a1a}-1.60\%$
test_tc_first_layer_nontensor 2.4336μs 0.6541μs 1.5287 MOps/s 1.4803 MOps/s $\color{#35bf28}+3.27\%$
test_tc_second_layer_tensor 15.4890μs 1.8301μs 546.4168 KOps/s 540.8885 KOps/s $\color{#35bf28}+1.02\%$
test_tc_second_layer_nontensor 9.2137μs 1.5197μs 658.0197 KOps/s 654.8949 KOps/s $\color{#35bf28}+0.48\%$
test_unbind 80.2740ms 7.4583ms 134.0786 Ops/s 189.9941 Ops/s $\textbf{\color{#d91a1a}-29.43\%}$
test_full_like 15.8167ms 11.0333ms 90.6350 Ops/s 100.9392 Ops/s $\textbf{\color{#d91a1a}-10.21\%}$
test_zeros_like 11.9512ms 5.4736ms 182.6965 Ops/s 179.2579 Ops/s $\color{#35bf28}+1.92\%$
test_ones_like 10.9852ms 6.0765ms 164.5673 Ops/s 166.2605 Ops/s $\color{#d91a1a}-1.02\%$
test_clone 11.3230ms 7.2807ms 137.3496 Ops/s 135.7431 Ops/s $\color{#35bf28}+1.18\%$
test_squeeze 74.2590μs 13.5525μs 73.7869 KOps/s 74.7391 KOps/s $\color{#d91a1a}-1.27\%$
test_unsqueeze 0.1162ms 65.4150μs 15.2870 KOps/s 15.0601 KOps/s $\color{#35bf28}+1.51\%$
test_split 0.2541ms 0.1118ms 8.9475 KOps/s 9.0011 KOps/s $\color{#d91a1a}-0.59\%$
test_permute 0.3499ms 0.1359ms 7.3576 KOps/s 7.4351 KOps/s $\color{#d91a1a}-1.04\%$
test_stack 24.4249ms 21.2711ms 47.0122 Ops/s 48.6721 Ops/s $\color{#d91a1a}-3.41\%$
test_cat 43.9749ms 22.5347ms 44.3760 Ops/s 48.9653 Ops/s $\textbf{\color{#d91a1a}-9.37\%}$


github-actions bot commented May 27, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: $\large\color{#35bf28}23$. Worsened: $\large\color{#d91a1a}5$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.4933ms 12.4093μs 80.5846 KOps/s 76.3646 KOps/s $\textbf{\color{#35bf28}+5.53\%}$
test_plain_set_stack_nested 25.6110μs 12.4985μs 80.0096 KOps/s 76.4620 KOps/s $\color{#35bf28}+4.64\%$
test_plain_set_nested_inplace 37.3220μs 13.7142μs 72.9173 KOps/s 69.7825 KOps/s $\color{#35bf28}+4.49\%$
test_plain_set_stack_nested_inplace 0.1348ms 13.7713μs 72.6149 KOps/s 69.0948 KOps/s $\textbf{\color{#35bf28}+5.09\%}$
test_items 18.2410μs 4.6779μs 213.7689 KOps/s 210.3029 KOps/s $\color{#35bf28}+1.65\%$
test_items_nested 0.3733ms 0.3386ms 2.9530 KOps/s 2.9380 KOps/s $\color{#35bf28}+0.51\%$
test_items_nested_locked 0.5517ms 0.3397ms 2.9434 KOps/s 2.9381 KOps/s $\color{#35bf28}+0.18\%$
test_items_nested_leaf 0.2729ms 82.2863μs 12.1527 KOps/s 12.0922 KOps/s $\color{#35bf28}+0.50\%$
test_items_stack_nested 0.5272ms 0.3451ms 2.8975 KOps/s 2.9423 KOps/s $\color{#d91a1a}-1.52\%$
test_items_stack_nested_leaf 0.1041ms 83.8028μs 11.9328 KOps/s 11.9515 KOps/s $\color{#d91a1a}-0.16\%$
test_items_stack_nested_locked 0.3881ms 0.3440ms 2.9073 KOps/s 2.8978 KOps/s $\color{#35bf28}+0.33\%$
test_keys 17.6510μs 4.3427μs 230.2733 KOps/s 230.7186 KOps/s $\color{#d91a1a}-0.19\%$
test_keys_nested 89.7840μs 66.9066μs 14.9462 KOps/s 14.8680 KOps/s $\color{#35bf28}+0.53\%$
test_keys_nested_locked 0.7432ms 71.4548μs 13.9949 KOps/s 13.8212 KOps/s $\color{#35bf28}+1.26\%$
test_keys_nested_leaf 77.7140μs 57.2494μs 17.4674 KOps/s 17.2604 KOps/s $\color{#35bf28}+1.20\%$
test_keys_stack_nested 0.1206ms 67.0212μs 14.9206 KOps/s 15.0144 KOps/s $\color{#d91a1a}-0.62\%$
test_keys_stack_nested_leaf 89.9140μs 58.2466μs 17.1684 KOps/s 17.2186 KOps/s $\color{#d91a1a}-0.29\%$
test_keys_stack_nested_locked 0.1064ms 72.1977μs 13.8509 KOps/s 14.0270 KOps/s $\color{#d91a1a}-1.26\%$
test_values 11.1307μs 1.8050μs 554.0224 KOps/s 551.7121 KOps/s $\color{#35bf28}+0.42\%$
test_values_nested 65.0030μs 35.2330μs 28.3825 KOps/s 28.1191 KOps/s $\color{#35bf28}+0.94\%$
test_values_nested_locked 86.0940μs 37.0810μs 26.9680 KOps/s 26.5568 KOps/s $\color{#35bf28}+1.55\%$
test_values_nested_leaf 56.5830μs 31.6088μs 31.6368 KOps/s 31.7191 KOps/s $\color{#d91a1a}-0.26\%$
test_values_stack_nested 0.2141ms 36.3548μs 27.5067 KOps/s 27.7861 KOps/s $\color{#d91a1a}-1.01\%$
test_values_stack_nested_leaf 0.2086ms 32.4891μs 30.7795 KOps/s 30.7406 KOps/s $\color{#35bf28}+0.13\%$
test_values_stack_nested_locked 61.5730μs 38.0014μs 26.3148 KOps/s 26.2372 KOps/s $\color{#35bf28}+0.30\%$
test_membership 3.9816μs 0.7320μs 1.3661 MOps/s 1.3982 MOps/s $\color{#d91a1a}-2.30\%$
test_membership_nested 0.1607ms 2.6204μs 381.6206 KOps/s 389.3968 KOps/s $\color{#d91a1a}-2.00\%$
test_membership_nested_leaf 18.6800μs 2.6225μs 381.3176 KOps/s 387.8696 KOps/s $\color{#d91a1a}-1.69\%$
test_membership_stacked_nested 23.5410μs 2.6012μs 384.4338 KOps/s 386.0127 KOps/s $\color{#d91a1a}-0.41\%$
test_membership_stacked_nested_leaf 33.4710μs 2.5848μs 386.8719 KOps/s 385.8627 KOps/s $\color{#35bf28}+0.26\%$
test_membership_nested_last 0.1881ms 3.0961μs 322.9843 KOps/s 325.3619 KOps/s $\color{#d91a1a}-0.73\%$
test_membership_nested_leaf_last 42.7420μs 3.1168μs 320.8438 KOps/s 325.1759 KOps/s $\color{#d91a1a}-1.33\%$
test_membership_stacked_nested_last 31.8220μs 3.5639μs 280.5885 KOps/s 257.7704 KOps/s $\textbf{\color{#35bf28}+8.85\%}$
test_membership_stacked_nested_leaf_last 21.3210μs 3.5547μs 281.3165 KOps/s 258.4316 KOps/s $\textbf{\color{#35bf28}+8.86\%}$
test_nested_getleaf 23.8920μs 8.3483μs 119.7851 KOps/s 118.0945 KOps/s $\color{#35bf28}+1.43\%$
test_nested_get 30.3710μs 7.8741μs 126.9991 KOps/s 126.2290 KOps/s $\color{#35bf28}+0.61\%$
test_stacked_getleaf 35.7220μs 8.4105μs 118.8997 KOps/s 117.4553 KOps/s $\color{#35bf28}+1.23\%$
test_stacked_get 56.8430μs 7.9023μs 126.5452 KOps/s 125.5465 KOps/s $\color{#35bf28}+0.80\%$
test_nested_getitemleaf 32.2010μs 8.5365μs 117.1444 KOps/s 115.8331 KOps/s $\color{#35bf28}+1.13\%$
test_nested_getitem 29.3410μs 8.0519μs 124.1938 KOps/s 122.9695 KOps/s $\color{#35bf28}+1.00\%$
test_stacked_getitemleaf 22.5610μs 8.5619μs 116.7961 KOps/s 115.3495 KOps/s $\color{#35bf28}+1.25\%$
test_stacked_getitem 37.0020μs 8.0548μs 124.1499 KOps/s 123.0407 KOps/s $\color{#35bf28}+0.90\%$
test_lock_nested 59.7635ms 0.4063ms 2.4613 KOps/s 2.4595 KOps/s $\color{#35bf28}+0.07\%$
test_lock_stack_nested 0.4110ms 0.3007ms 3.3259 KOps/s 3.2987 KOps/s $\color{#35bf28}+0.83\%$
test_unlock_nested 0.7239ms 0.3474ms 2.8788 KOps/s 2.8500 KOps/s $\color{#35bf28}+1.01\%$
test_unlock_stack_nested 0.4113ms 0.3095ms 3.2309 KOps/s 3.1905 KOps/s $\color{#35bf28}+1.26\%$
test_flatten_speed 0.1872ms 0.1035ms 9.6600 KOps/s 9.8445 KOps/s $\color{#d91a1a}-1.87\%$
test_unflatten_speed 0.4286ms 0.2921ms 3.4235 KOps/s 3.4708 KOps/s $\color{#d91a1a}-1.36\%$
test_common_ops 1.1066ms 0.5577ms 1.7931 KOps/s 1.7322 KOps/s $\color{#35bf28}+3.52\%$
test_creation 34.5220μs 1.6194μs 617.5174 KOps/s 620.0551 KOps/s $\color{#d91a1a}-0.41\%$
test_creation_empty 25.4210μs 7.8557μs 127.2956 KOps/s 107.5311 KOps/s $\textbf{\color{#35bf28}+18.38\%}$
test_creation_nested_1 27.2620μs 9.5556μs 104.6505 KOps/s 90.5997 KOps/s $\textbf{\color{#35bf28}+15.51\%}$
test_creation_nested_2 41.8020μs 11.7843μs 84.8587 KOps/s 75.0125 KOps/s $\textbf{\color{#35bf28}+13.13\%}$
test_clone 0.2057ms 11.4486μs 87.3465 KOps/s 87.7159 KOps/s $\color{#d91a1a}-0.42\%$
test_getitem[int] 30.6110μs 10.7323μs 93.1763 KOps/s 95.0563 KOps/s $\color{#d91a1a}-1.98\%$
test_getitem[slice_int] 50.6420μs 20.2518μs 49.3783 KOps/s 50.7703 KOps/s $\color{#d91a1a}-2.74\%$
test_getitem[range] 63.1630μs 44.7322μs 22.3553 KOps/s 21.9915 KOps/s $\color{#35bf28}+1.65\%$
test_getitem[tuple] 39.9820μs 18.0194μs 55.4959 KOps/s 56.0926 KOps/s $\color{#d91a1a}-1.06\%$
test_getitem[list] 0.1518ms 31.6603μs 31.5853 KOps/s 31.7330 KOps/s $\color{#d91a1a}-0.47\%$
test_setitem_dim[int] 45.0420μs 28.6127μs 34.9496 KOps/s 33.4903 KOps/s $\color{#35bf28}+4.36\%$
test_setitem_dim[slice_int] 83.8630μs 50.6383μs 19.7479 KOps/s 20.5910 KOps/s $\color{#d91a1a}-4.09\%$
test_setitem_dim[range] 94.5540μs 64.3174μs 15.5479 KOps/s 14.9980 KOps/s $\color{#35bf28}+3.67\%$
test_setitem_dim[tuple] 0.1479ms 42.9187μs 23.2999 KOps/s 22.9398 KOps/s $\color{#35bf28}+1.57\%$
test_setitem 68.5230μs 15.5277μs 64.4011 KOps/s 61.0391 KOps/s $\textbf{\color{#35bf28}+5.51\%}$
test_set 49.9830μs 15.2202μs 65.7023 KOps/s 62.8732 KOps/s $\color{#35bf28}+4.50\%$
test_set_shared 1.1462ms 94.5110μs 10.5808 KOps/s 10.5403 KOps/s $\color{#35bf28}+0.38\%$
test_update 0.1085ms 17.0096μs 58.7905 KOps/s 53.7541 KOps/s $\textbf{\color{#35bf28}+9.37\%}$
test_update_nested 87.9240μs 22.2227μs 44.9990 KOps/s 42.6199 KOps/s $\textbf{\color{#35bf28}+5.58\%}$
test_update__nested 86.9340μs 21.7987μs 45.8743 KOps/s 45.6262 KOps/s $\color{#35bf28}+0.54\%$
test_set_nested 63.0930μs 16.2837μs 61.4113 KOps/s 59.0093 KOps/s $\color{#35bf28}+4.07\%$
test_set_nested_new 58.8730μs 18.6396μs 53.6493 KOps/s 50.3140 KOps/s $\textbf{\color{#35bf28}+6.63\%}$
test_select 0.1368ms 32.0809μs 31.1712 KOps/s 30.8488 KOps/s $\color{#35bf28}+1.04\%$
test_select_nested 0.7607ms 54.5383μs 18.3357 KOps/s 18.4053 KOps/s $\color{#d91a1a}-0.38\%$
test_exclude_nested 0.2489ms 0.1099ms 9.0953 KOps/s 9.0050 KOps/s $\color{#35bf28}+1.00\%$
test_empty[True] 0.3814ms 0.3430ms 2.9153 KOps/s 2.8525 KOps/s $\color{#35bf28}+2.20\%$
test_empty[False] 17.7209μs 0.9387μs 1.0653 MOps/s 1.0850 MOps/s $\color{#d91a1a}-1.81\%$
test_to 0.1006ms 73.4259μs 13.6192 KOps/s 13.5282 KOps/s $\color{#35bf28}+0.67\%$
test_to_nonblocking 0.2094ms 61.5963μs 16.2348 KOps/s 16.6467 KOps/s $\color{#d91a1a}-2.47\%$
test_unbind_speed 0.3171ms 0.2645ms 3.7802 KOps/s 3.7869 KOps/s $\color{#d91a1a}-0.18\%$
test_unbind_speed_stack0 0.4396ms 0.2670ms 3.7456 KOps/s 3.7302 KOps/s $\color{#35bf28}+0.41\%$
test_unbind_speed_stack1 77.4090ms 0.8014ms 1.2477 KOps/s 1.2279 KOps/s $\color{#35bf28}+1.62\%$
test_split 76.7394ms 1.6245ms 615.5565 Ops/s 614.7972 Ops/s $\color{#35bf28}+0.12\%$
test_chunk 77.2103ms 1.6202ms 617.2063 Ops/s 614.7506 Ops/s $\color{#35bf28}+0.40\%$
test_creation[device0] 0.2031ms 56.3542μs 17.7449 KOps/s 17.5383 KOps/s $\color{#35bf28}+1.18\%$
test_creation_from_tensor 0.2107ms 53.4156μs 18.7211 KOps/s 18.3623 KOps/s $\color{#35bf28}+1.95\%$
test_add_one[memmap_tensor0] 0.1097ms 6.6156μs 151.1583 KOps/s 149.8920 KOps/s $\color{#35bf28}+0.84\%$
test_contiguous[memmap_tensor0] 13.6110μs 0.6714μs 1.4894 MOps/s 1.5191 MOps/s $\color{#d91a1a}-1.95\%$
test_stack[memmap_tensor0] 30.7810μs 4.4320μs 225.6307 KOps/s 226.0126 KOps/s $\color{#d91a1a}-0.17\%$
test_memmaptd_index 1.1647ms 0.2790ms 3.5842 KOps/s 3.5856 KOps/s $\color{#d91a1a}-0.04\%$
test_memmaptd_index_astensor 0.6596ms 0.3501ms 2.8561 KOps/s 2.8617 KOps/s $\color{#d91a1a}-0.19\%$
test_memmaptd_index_op 1.1424ms 0.6276ms 1.5933 KOps/s 1.5507 KOps/s $\color{#35bf28}+2.75\%$
test_serialize_model 0.1802s 0.1101s 9.0861 Ops/s 8.6961 Ops/s $\color{#35bf28}+4.49\%$
test_serialize_model_pickle 1.3652s 1.2383s 0.8076 Ops/s 0.8064 Ops/s $\color{#35bf28}+0.14\%$
test_serialize_weights 0.1784s 0.1078s 9.2797 Ops/s 8.7875 Ops/s $\textbf{\color{#35bf28}+5.60\%}$
test_serialize_weights_returnearly 0.2268s 96.3780ms 10.3758 Ops/s 10.8432 Ops/s $\color{#d91a1a}-4.31\%$
test_serialize_weights_pickle 1.3579s 1.2482s 0.8011 Ops/s 0.8011 Ops/s $+0.00\%$
test_reshape_pytree 50.4530μs 25.5003μs 39.2153 KOps/s 39.3801 KOps/s $\color{#d91a1a}-0.42\%$
test_reshape_td 0.1101ms 30.5062μs 32.7802 KOps/s 32.1407 KOps/s $\color{#35bf28}+1.99\%$
test_view_pytree 0.1399ms 25.3473μs 39.4520 KOps/s 39.5215 KOps/s $\color{#d91a1a}-0.18\%$
test_view_td 0.1339ms 34.7984μs 28.7370 KOps/s 26.9273 KOps/s $\textbf{\color{#35bf28}+6.72\%}$
test_unbind_pytree 0.1895ms 31.2076μs 32.0435 KOps/s 32.0596 KOps/s $\color{#d91a1a}-0.05\%$
test_unbind_td 0.4710ms 41.4711μs 24.1132 KOps/s 24.8210 KOps/s $\color{#d91a1a}-2.85\%$
test_split_pytree 0.3873ms 38.9177μs 25.6952 KOps/s 29.6908 KOps/s $\textbf{\color{#d91a1a}-13.46\%}$
test_split_td 0.5721ms 39.1479μs 25.5441 KOps/s 26.4058 KOps/s $\color{#d91a1a}-3.26\%$
test_add_pytree 0.1822ms 37.5023μs 26.6651 KOps/s 27.3117 KOps/s $\color{#d91a1a}-2.37\%$
test_add_td 0.1770ms 49.9626μs 20.0150 KOps/s 20.3138 KOps/s $\color{#d91a1a}-1.47\%$
test_distributed 0.2255ms 66.7298μs 14.9858 KOps/s 11.6997 KOps/s $\textbf{\color{#35bf28}+28.09\%}$
test_tdmodule 0.1455ms 14.4143μs 69.3757 KOps/s 63.8250 KOps/s $\textbf{\color{#35bf28}+8.70\%}$
test_tdmodule_dispatch 43.7920μs 27.8616μs 35.8917 KOps/s 32.8784 KOps/s $\textbf{\color{#35bf28}+9.17\%}$
test_tdseq 32.0420μs 16.1504μs 61.9181 KOps/s 58.6231 KOps/s $\textbf{\color{#35bf28}+5.62\%}$
test_tdseq_dispatch 0.1373ms 31.0856μs 32.1693 KOps/s 30.1497 KOps/s $\textbf{\color{#35bf28}+6.70\%}$
test_instantiation_functorch 80.7402ms 1.6518ms 605.3963 Ops/s 656.9609 Ops/s $\textbf{\color{#d91a1a}-7.85\%}$
test_instantiation_td 1.5563ms 1.0500ms 952.3976 Ops/s 884.7582 Ops/s $\textbf{\color{#35bf28}+7.64\%}$
test_exec_functorch 0.2383ms 0.1451ms 6.8916 KOps/s 6.9855 KOps/s $\color{#d91a1a}-1.34\%$
test_exec_functional_call 0.2295ms 0.1337ms 7.4771 KOps/s 7.5365 KOps/s $\color{#d91a1a}-0.79\%$
test_exec_td 0.1854ms 0.1311ms 7.6297 KOps/s 7.6856 KOps/s $\color{#d91a1a}-0.73\%$
test_exec_td_decorator 0.8023ms 0.2054ms 4.8676 KOps/s 4.9466 KOps/s $\color{#d91a1a}-1.60\%$
test_vmap_mlp_speed[True-True] 0.8212ms 0.5883ms 1.6999 KOps/s 1.7057 KOps/s $\color{#d91a1a}-0.34\%$
test_vmap_mlp_speed[True-False] 0.8993ms 0.5935ms 1.6849 KOps/s 1.6757 KOps/s $\color{#35bf28}+0.55\%$
test_vmap_mlp_speed[False-True] 0.6893ms 0.5290ms 1.8905 KOps/s 1.9083 KOps/s $\color{#d91a1a}-0.93\%$
test_vmap_mlp_speed[False-False] 0.6910ms 0.5191ms 1.9265 KOps/s 1.9003 KOps/s $\color{#35bf28}+1.38\%$
test_vmap_mlp_speed_decorator[True-True] 1.0025ms 0.6618ms 1.5110 KOps/s 1.5325 KOps/s $\color{#d91a1a}-1.40\%$
test_vmap_mlp_speed_decorator[True-False] 0.9293ms 0.6556ms 1.5253 KOps/s 1.5200 KOps/s $\color{#35bf28}+0.35\%$
test_vmap_mlp_speed_decorator[False-True] 0.8184ms 0.5687ms 1.7583 KOps/s 1.7161 KOps/s $\color{#35bf28}+2.46\%$
test_vmap_mlp_speed_decorator[False-False] 0.7586ms 0.5656ms 1.7680 KOps/s 1.7223 KOps/s $\color{#35bf28}+2.65\%$
test_vmap_transformer_speed[True-True] 7.9217ms 7.6793ms 130.2197 Ops/s 130.5505 Ops/s $\color{#d91a1a}-0.25\%$
test_vmap_transformer_speed[True-False] 7.8132ms 7.6458ms 130.7914 Ops/s 129.7299 Ops/s $\color{#35bf28}+0.82\%$
test_vmap_transformer_speed[False-True] 9.0051ms 8.0433ms 124.3275 Ops/s 131.4705 Ops/s $\textbf{\color{#d91a1a}-5.43\%}$
test_vmap_transformer_speed[False-False] 8.4899ms 7.8549ms 127.3093 Ops/s 131.2682 Ops/s $\color{#d91a1a}-3.02\%$
test_vmap_transformer_speed_decorator[True-True] 20.3330ms 19.3967ms 51.5551 Ops/s 53.5636 Ops/s $\color{#d91a1a}-3.75\%$
test_vmap_transformer_speed_decorator[True-False] 20.3669ms 19.6905ms 50.7859 Ops/s 53.5854 Ops/s $\textbf{\color{#d91a1a}-5.22\%}$
test_vmap_transformer_speed_decorator[False-True] 20.1219ms 19.4104ms 51.5187 Ops/s 53.7673 Ops/s $\color{#d91a1a}-4.18\%$
test_vmap_transformer_speed_decorator[False-False] 19.9952ms 19.2322ms 51.9961 Ops/s 53.7679 Ops/s $\color{#d91a1a}-3.30\%$
test_to_module_speed[True] 1.8027ms 1.5461ms 646.7781 Ops/s 594.5193 Ops/s $\textbf{\color{#35bf28}+8.79\%}$
test_to_module_speed[False] 1.7661ms 1.5290ms 654.0290 Ops/s 665.9482 Ops/s $\color{#d91a1a}-1.79\%$
test_tc_init 0.1526ms 22.8614μs 43.7418 KOps/s 38.6579 KOps/s $\textbf{\color{#35bf28}+13.15\%}$
test_tc_init_nested 0.1839ms 51.1931μs 19.5339 KOps/s 18.0414 KOps/s $\textbf{\color{#35bf28}+8.27\%}$
test_tc_first_layer_tensor 4.4964μs 0.3638μs 2.7485 MOps/s 2.7988 MOps/s $\color{#d91a1a}-1.80\%$
test_tc_first_layer_nontensor 9.4227μs 0.3966μs 2.5214 MOps/s 2.5884 MOps/s $\color{#d91a1a}-2.59\%$
test_tc_second_layer_tensor 24.0230μs 0.9918μs 1.0083 MOps/s 928.9901 KOps/s $\textbf{\color{#35bf28}+8.53\%}$
test_tc_second_layer_nontensor 34.6333μs 0.8427μs 1.1866 MOps/s 1.2575 MOps/s $\textbf{\color{#d91a1a}-5.64\%}$
test_unbind 0.1013s 6.7728ms 147.6487 Ops/s 144.2992 Ops/s $\color{#35bf28}+2.32\%$
test_full_like 14.4599ms 13.8243ms 72.3363 Ops/s 71.9612 Ops/s $\color{#35bf28}+0.52\%$
test_zeros_like 7.7296ms 7.0648ms 141.5468 Ops/s 140.5136 Ops/s $\color{#35bf28}+0.74\%$
test_ones_like 8.5813ms 7.9991ms 125.0134 Ops/s 125.3927 Ops/s $\color{#d91a1a}-0.30\%$
test_clone 9.8975ms 9.5309ms 104.9214 Ops/s 105.0241 Ops/s $\color{#d91a1a}-0.10\%$
test_squeeze 62.6930μs 11.1303μs 89.8447 KOps/s 90.0583 KOps/s $\color{#d91a1a}-0.24\%$
test_unsqueeze 0.2060ms 59.6988μs 16.7508 KOps/s 16.4910 KOps/s $\color{#35bf28}+1.58\%$
test_split 0.2424ms 96.6557μs 10.3460 KOps/s 10.4109 KOps/s $\color{#d91a1a}-0.62\%$
test_permute 0.2655ms 0.1236ms 8.0882 KOps/s 8.3160 KOps/s $\color{#d91a1a}-2.74\%$
test_stack 30.3348ms 27.7650ms 36.0165 Ops/s 36.2238 Ops/s $\color{#d91a1a}-0.57\%$
test_cat 28.1897ms 27.6509ms 36.1651 Ops/s 36.1333 Ops/s $\color{#35bf28}+0.09\%$
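
For readers checking the tables above, the Change column appears to be the relative difference between the Ops column (this PR) and Ops on Repo HEAD. The sketch below recomputes it for a few CPU rows. The 5% cutoff is an assumption inferred from the tables, where only bolded rows seem to count toward the Improved and Worsened totals; it is not taken from the benchmark bot's source. All function and constant names are hypothetical.

```python
# Scale factors for the throughput units that appear in the tables.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}

# Assumption: inferred from the tables (smallest bolded change is 5.09%,
# largest unbolded change is 4.64%); the exact boundary rule is unknown.
BOLD_THRESHOLD_PCT = 5.0


def to_ops_per_second(value, unit):
    """Convert a (value, unit) pair to plain operations per second."""
    return value * UNIT_SCALE[unit]


def percent_change(pr, head):
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    pr_ops = to_ops_per_second(*pr)
    head_ops = to_ops_per_second(*head)
    return (pr_ops - head_ops) / head_ops * 100.0


def classify(change_pct):
    """Label a change using the assumed 5% cutoff."""
    if change_pct >= BOLD_THRESHOLD_PCT:
        return "improved"
    if change_pct <= -BOLD_THRESHOLD_PCT:
        return "worsened"
    return "within noise"


# (Ops, Ops on Repo HEAD) copied from the CPU table.
cpu_rows = {
    "test_plain_set_nested": ((60.6969, "KOps/s"), (59.1149, "KOps/s")),
    "test_membership_stacked_nested_last": ((243.8212, "KOps/s"), (160.8667, "KOps/s")),
    "test_instantiation_td": ((931.8640, "Ops/s"), (1.0009, "KOps/s")),
    "test_unbind": ((134.0786, "Ops/s"), (189.9941, "Ops/s")),
}

for name, (pr, head) in cpu_rows.items():
    change = percent_change(pr, head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Note that some rows mix units between the two columns (for example `test_instantiation_td`, which reports Ops/s against KOps/s), so both values need converting to a common unit before they are compared.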

vmoens merged commit a01ffdd into main on May 27, 2024
37 of 38 checks passed
vmoens added the bug label May 27, 2024
vmoens deleted the fix-lazy-stack-various branch October 21, 2024 14:02
Labels: bug, CLA Signed
2 participants