[Performance] Faster tensorclass #791

Status: Merged (5 commits, May 25, 2024)

@vmoens (Contributor) commented on May 25, 2024

No description provided.

@facebook-github-bot added the CLA Signed label on May 25, 2024.

github-actions bot commented May 25, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total benchmarks: 144. Improved: 31. Worsened: 4.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 32.5010μs | 16.5895μs | 60.2791 KOps/s | 57.1999 KOps/s | **+5.38%** |
| test_plain_set_stack_nested | 51.7470μs | 17.0070μs | 58.7993 KOps/s | 56.3115 KOps/s | +4.42% |
| test_plain_set_nested_inplace | 56.9160μs | 19.0482μs | 52.4984 KOps/s | 49.8819 KOps/s | **+5.25%** |
| test_plain_set_stack_nested_inplace | 58.3090μs | 19.0146μs | 52.5910 KOps/s | 50.4222 KOps/s | +4.30% |
| test_items | 30.8980μs | 2.4635μs | 405.9201 KOps/s | 392.2748 KOps/s | +3.48% |
| test_items_nested | 1.3305ms | 0.2667ms | 3.7495 KOps/s | 3.7632 KOps/s | -0.36% |
| test_items_nested_locked | 0.4339ms | 0.2662ms | 3.7563 KOps/s | 3.6986 KOps/s | +1.56% |
| test_items_nested_leaf | 0.1285ms | 77.3562μs | 12.9272 KOps/s | 12.9785 KOps/s | -0.40% |
| test_items_stack_nested | 0.4554ms | 0.2668ms | 3.7476 KOps/s | 3.7218 KOps/s | +0.69% |
| test_items_stack_nested_leaf | 0.1522ms | 77.4172μs | 12.9170 KOps/s | 12.7323 KOps/s | +1.45% |
| test_items_stack_nested_locked | 0.3920ms | 0.2698ms | 3.7059 KOps/s | 3.7640 KOps/s | -1.54% |
| test_keys | 30.2970μs | 3.8844μs | 257.4377 KOps/s | 260.1484 KOps/s | -1.04% |
| test_keys_nested | 0.2955ms | 0.1394ms | 7.1714 KOps/s | 7.2840 KOps/s | -1.55% |
| test_keys_nested_locked | 2.2074ms | 0.1432ms | 6.9846 KOps/s | 7.0485 KOps/s | -0.91% |
| test_keys_nested_leaf | 0.2030ms | 0.1177ms | 8.4983 KOps/s | 8.4660 KOps/s | +0.38% |
| test_keys_stack_nested | 0.2092ms | 0.1383ms | 7.2294 KOps/s | 7.2559 KOps/s | -0.37% |
| test_keys_stack_nested_leaf | 0.1987ms | 0.1175ms | 8.5071 KOps/s | 8.4864 KOps/s | +0.24% |
| test_keys_stack_nested_locked | 0.2357ms | 0.1418ms | 7.0546 KOps/s | 6.9523 KOps/s | +1.47% |
| test_values | 10.3245μs | 1.1690μs | 855.4678 KOps/s | 836.6975 KOps/s | +2.24% |
| test_values_nested | 93.3250μs | 50.6421μs | 19.7464 KOps/s | 19.8304 KOps/s | -0.42% |
| test_values_nested_locked | 92.7230μs | 50.4406μs | 19.8253 KOps/s | 19.6974 KOps/s | +0.65% |
| test_values_nested_leaf | 90.4600μs | 46.2410μs | 21.6258 KOps/s | 22.0217 KOps/s | -1.80% |
| test_values_stack_nested | 0.1026ms | 51.1682μs | 19.5434 KOps/s | 19.6496 KOps/s | -0.54% |
| test_values_stack_nested_leaf | 91.3820μs | 46.3478μs | 21.5760 KOps/s | 22.0101 KOps/s | -1.97% |
| test_values_stack_nested_locked | 0.1035ms | 50.8055μs | 19.6829 KOps/s | 19.5380 KOps/s | +0.74% |
| test_membership | 12.8540μs | 1.3334μs | 749.9373 KOps/s | 714.7484 KOps/s | +4.92% |
| test_membership_nested | 51.2260μs | 3.4136μs | 292.9484 KOps/s | 289.0615 KOps/s | +1.34% |
| test_membership_nested_leaf | 57.6180μs | 3.4306μs | 291.4946 KOps/s | 290.2561 KOps/s | +0.43% |
| test_membership_stacked_nested | 42.2390μs | 3.4064μs | 293.5657 KOps/s | 294.4581 KOps/s | -0.30% |
| test_membership_stacked_nested_leaf | 30.5070μs | 3.4626μs | 288.8011 KOps/s | 290.1254 KOps/s | -0.46% |
| test_membership_nested_last | 30.7580μs | 4.1853μs | 238.9325 KOps/s | 236.4472 KOps/s | +1.05% |
| test_membership_nested_leaf_last | 39.0330μs | 4.2118μs | 237.4269 KOps/s | 240.4541 KOps/s | -1.26% |
| test_membership_stacked_nested_last | 31.7100μs | 4.1987μs | 238.1697 KOps/s | 234.7829 KOps/s | +1.44% |
| test_membership_stacked_nested_leaf_last | 31.8300μs | 4.2322μs | 236.2844 KOps/s | 235.8720 KOps/s | +0.17% |
| test_nested_getleaf | 79.8300μs | 10.6993μs | 93.4642 KOps/s | 94.8256 KOps/s | -1.44% |
| test_nested_get | 48.8110μs | 10.1641μs | 98.3852 KOps/s | 100.4440 KOps/s | -2.05% |
| test_stacked_getleaf | 82.4150μs | 10.7142μs | 93.3337 KOps/s | 94.9165 KOps/s | -1.67% |
| test_stacked_get | 36.5390μs | 10.0107μs | 99.8933 KOps/s | 99.4684 KOps/s | +0.43% |
| test_nested_getitemleaf | 3.6831ms | 11.3295μs | 88.2652 KOps/s | 89.6018 KOps/s | -1.49% |
| test_nested_getitem | 59.9210μs | 10.2522μs | 97.5400 KOps/s | 100.4627 KOps/s | -2.91% |
| test_stacked_getitemleaf | 91.8720μs | 11.1768μs | 89.4714 KOps/s | 89.4540 KOps/s | +0.02% |
| test_stacked_getitem | 45.8650μs | 10.2094μs | 97.9486 KOps/s | 98.4882 KOps/s | -0.55% |
| test_lock_nested | 60.9636ms | 0.4265ms | 2.3447 KOps/s | 2.7621 KOps/s | **-15.11%** |
| test_lock_stack_nested | 0.4849ms | 0.3144ms | 3.1808 KOps/s | 3.1753 KOps/s | +0.17% |
| test_unlock_nested | 1.6483ms | 0.3571ms | 2.8006 KOps/s | 2.3968 KOps/s | **+16.85%** |
| test_unlock_stack_nested | 0.4286ms | 0.3169ms | 3.1556 KOps/s | 3.0804 KOps/s | +2.44% |
| test_flatten_speed | 0.2671ms | 96.0424μs | 10.4121 KOps/s | 10.2913 KOps/s | +1.17% |
| test_unflatten_speed | 0.6054ms | 0.4089ms | 2.4458 KOps/s | 2.4240 KOps/s | +0.90% |
| test_common_ops | 1.6815ms | 0.7201ms | 1.3887 KOps/s | 1.3054 KOps/s | **+6.39%** |
| test_creation | 21.1090μs | 1.8823μs | 531.2566 KOps/s | 528.1358 KOps/s | +0.59% |
| test_creation_empty | 0.1074ms | 10.3090μs | 97.0028 KOps/s | 85.8574 KOps/s | **+12.98%** |
| test_creation_nested_1 | 54.2810μs | 13.1282μs | 76.1720 KOps/s | 69.0223 KOps/s | **+10.36%** |
| test_creation_nested_2 | 62.5670μs | 16.3863μs | 61.0265 KOps/s | 54.9944 KOps/s | **+10.97%** |
| test_clone | 0.1459ms | 13.6171μs | 73.4372 KOps/s | 72.6865 KOps/s | +1.03% |
| test_getitem[int] | 53.0090μs | 11.4041μs | 87.6879 KOps/s | 84.5881 KOps/s | +3.66% |
| test_getitem[slice_int] | 57.1670μs | 22.2646μs | 44.9144 KOps/s | 42.1508 KOps/s | **+6.56%** |
| test_getitem[range] | 79.6990μs | 58.8822μs | 16.9831 KOps/s | 16.5304 KOps/s | +2.74% |
| test_getitem[tuple] | 66.8450μs | 18.7990μs | 53.1943 KOps/s | 51.4087 KOps/s | +3.47% |
| test_getitem[list] | 0.1754ms | 41.8651μs | 23.8862 KOps/s | 22.8241 KOps/s | +4.65% |
| test_setitem_dim[int] | 75.2900μs | 36.0437μs | 27.7441 KOps/s | 27.8166 KOps/s | -0.26% |
| test_setitem_dim[slice_int] | 0.1244ms | 62.8275μs | 15.9166 KOps/s | 15.4934 KOps/s | +2.73% |
| test_setitem_dim[range] | 0.1787ms | 87.3861μs | 11.4435 KOps/s | 11.0522 KOps/s | +3.54% |
| test_setitem_dim[tuple] | 0.1532ms | 51.0133μs | 19.6027 KOps/s | 19.4713 KOps/s | +0.67% |
| test_setitem | 69.4500μs | 20.8401μs | 47.9844 KOps/s | 46.5433 KOps/s | +3.10% |
| test_set | 0.1016ms | 20.4525μs | 48.8938 KOps/s | 47.3024 KOps/s | +3.36% |
| test_set_shared | 3.6495ms | 0.1438ms | 6.9521 KOps/s | 6.8720 KOps/s | +1.17% |
| test_update | 0.1484ms | 22.2411μs | 44.9617 KOps/s | 42.1412 KOps/s | **+6.69%** |
| test_update_nested | 96.1400μs | 30.5301μs | 32.7546 KOps/s | 31.7422 KOps/s | +3.19% |
| test_update__nested | 64.2500μs | 24.9439μs | 40.0899 KOps/s | 38.5167 KOps/s | +4.08% |
| test_set_nested | 98.4640μs | 22.2937μs | 44.8557 KOps/s | 43.4141 KOps/s | +3.32% |
| test_set_nested_new | 0.1123ms | 26.2807μs | 38.0508 KOps/s | 37.0491 KOps/s | +2.70% |
| test_select | 0.1131ms | 41.7917μs | 23.9282 KOps/s | 23.2722 KOps/s | +2.82% |
| test_select_nested | 0.1458ms | 60.8132μs | 16.4438 KOps/s | 16.2010 KOps/s | +1.50% |
| test_exclude_nested | 0.2296ms | 0.1213ms | 8.2434 KOps/s | 8.1616 KOps/s | +1.00% |
| test_empty[True] | 0.5434ms | 0.3965ms | 2.5220 KOps/s | 2.4811 KOps/s | +1.65% |
| test_empty[False] | 7.3678μs | 1.0825μs | 923.7492 KOps/s | 922.3668 KOps/s | +0.15% |
| test_unbind_speed | 0.3709ms | 0.2650ms | 3.7742 KOps/s | 3.8146 KOps/s | -1.06% |
| test_unbind_speed_stack0 | 0.4095ms | 0.2545ms | 3.9289 KOps/s | 3.8856 KOps/s | +1.11% |
| test_unbind_speed_stack1 | 72.8653ms | 0.7317ms | 1.3666 KOps/s | 1.2480 KOps/s | **+9.50%** |
| test_split | 73.3713ms | 1.5946ms | 627.1275 Ops/s | 597.5835 Ops/s | +4.94% |
| test_chunk | 70.5256ms | 1.5921ms | 628.1126 Ops/s | 636.5812 Ops/s | -1.33% |
| test_creation[device0] | 3.6176ms | 87.2238μs | 11.4648 KOps/s | 11.5787 KOps/s | -0.98% |
| test_creation_from_tensor | 0.2341ms | 86.9931μs | 11.4952 KOps/s | 11.1252 KOps/s | +3.33% |
| test_add_one[memmap_tensor0] | 0.1111ms | 5.3906μs | 185.5072 KOps/s | 178.6133 KOps/s | +3.86% |
| test_contiguous[memmap_tensor0] | 22.5920μs | 0.6486μs | 1.5417 MOps/s | 1.5927 MOps/s | -3.20% |
| test_stack[memmap_tensor0] | 18.3940μs | 3.6371μs | 274.9417 KOps/s | 268.3840 KOps/s | +2.44% |
| test_memmaptd_index | 1.1589ms | 0.2557ms | 3.9110 KOps/s | 3.8679 KOps/s | +1.12% |
| test_memmaptd_index_astensor | 0.7838ms | 0.3334ms | 2.9991 KOps/s | 2.9740 KOps/s | +0.84% |
| test_memmaptd_index_op | 1.1775ms | 0.6296ms | 1.5884 KOps/s | 1.5124 KOps/s | **+5.03%** |
| test_serialize_model | 0.1848s | 0.1174s | 8.5212 Ops/s | 9.0403 Ops/s | **-5.74%** |
| test_serialize_model_pickle | 0.4484s | 0.3815s | 2.6212 Ops/s | 2.5609 Ops/s | +2.35% |
| test_serialize_weights | 0.1100s | 0.1039s | 9.6228 Ops/s | 8.0770 Ops/s | **+19.14%** |
| test_serialize_weights_returnearly | 0.2010s | 0.1384s | 7.2247 Ops/s | 6.8012 Ops/s | **+6.23%** |
| test_serialize_weights_pickle | 0.9629s | 0.6640s | 1.5060 Ops/s | 2.4230 Ops/s | **-37.85%** |
| test_serialize_weights_filesystem | 0.1724s | 0.1022s | 9.7826 Ops/s | 10.1794 Ops/s | -3.90% |
| test_serialize_model_filesystem | 0.1041s | 94.8076ms | 10.5477 Ops/s | 9.5574 Ops/s | **+10.36%** |
| test_reshape_pytree | 88.7260μs | 25.6763μs | 38.9465 KOps/s | 38.4575 KOps/s | +1.27% |
| test_reshape_td | 0.1198ms | 33.3170μs | 30.0147 KOps/s | 28.9781 KOps/s | +3.58% |
| test_view_pytree | 82.8350μs | 25.6826μs | 38.9368 KOps/s | 39.2178 KOps/s | -0.72% |
| test_view_td | 95.4290μs | 37.6174μs | 26.5834 KOps/s | 26.4313 KOps/s | +0.58% |
| test_unbind_pytree | 82.9060μs | 29.5934μs | 33.7913 KOps/s | 33.7250 KOps/s | +0.20% |
| test_unbind_td | 0.4029ms | 38.4692μs | 25.9948 KOps/s | 26.0164 KOps/s | -0.08% |
| test_split_pytree | 78.5270μs | 29.5466μs | 33.8449 KOps/s | 33.7769 KOps/s | +0.20% |
| test_split_td | 0.1138ms | 40.6200μs | 24.6184 KOps/s | 23.3832 KOps/s | **+5.28%** |
| test_add_pytree | 76.1530μs | 35.0054μs | 28.5671 KOps/s | 28.0725 KOps/s | +1.76% |
| test_add_td | 0.1731ms | 58.4169μs | 17.1183 KOps/s | 17.0051 KOps/s | +0.67% |
| test_distributed | 0.3130ms | 0.1055ms | 9.4803 KOps/s | 9.4008 KOps/s | +0.85% |
| test_tdmodule | 0.1104ms | 18.4043μs | 54.3351 KOps/s | 54.6497 KOps/s | -0.58% |
| test_tdmodule_dispatch | 69.7410μs | 35.8339μs | 27.9065 KOps/s | 26.7691 KOps/s | +4.25% |
| test_tdseq | 55.0340μs | 20.6727μs | 48.3730 KOps/s | 40.9574 KOps/s | **+18.11%** |
| test_tdseq_dispatch | 83.0360μs | 39.9117μs | 25.0553 KOps/s | 23.6854 KOps/s | **+5.78%** |
| test_instantiation_functorch | 1.9403ms | 1.3250ms | 754.7450 Ops/s | 749.7212 Ops/s | +0.67% |
| test_instantiation_td | 1.9188ms | 1.0368ms | 964.5192 Ops/s | 950.7808 Ops/s | +1.44% |
| test_exec_functorch | 0.3374ms | 0.1610ms | 6.2119 KOps/s | 6.1080 KOps/s | +1.70% |
| test_exec_functional_call | 0.2918ms | 0.1502ms | 6.6598 KOps/s | 6.5308 KOps/s | +1.98% |
| test_exec_td | 0.2875ms | 0.1469ms | 6.8082 KOps/s | 6.6771 KOps/s | +1.96% |
| test_exec_td_decorator | 1.6296ms | 0.2225ms | 4.4949 KOps/s | 4.4837 KOps/s | +0.25% |
| test_vmap_mlp_speed[True-True] | 0.9453ms | 0.4928ms | 2.0293 KOps/s | 2.0246 KOps/s | +0.23% |
| test_vmap_mlp_speed[True-False] | 0.8719ms | 0.4838ms | 2.0669 KOps/s | 2.0231 KOps/s | +2.17% |
| test_vmap_mlp_speed[False-True] | 1.1131ms | 0.4265ms | 2.3446 KOps/s | 2.4921 KOps/s | **-5.92%** |
| test_vmap_mlp_speed[False-False] | 0.7529ms | 0.4016ms | 2.4900 KOps/s | 2.4771 KOps/s | +0.52% |
| test_vmap_mlp_speed_decorator[True-True] | 1.3746ms | 0.5656ms | 1.7681 KOps/s | 1.7549 KOps/s | +0.75% |
| test_vmap_mlp_speed_decorator[True-False] | 0.9811ms | 0.5602ms | 1.7850 KOps/s | 1.7585 KOps/s | +1.51% |
| test_vmap_mlp_speed_decorator[False-True] | 0.8517ms | 0.4668ms | 2.1420 KOps/s | 2.1555 KOps/s | -0.63% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7977ms | 0.4668ms | 2.1425 KOps/s | 2.1495 KOps/s | -0.33% |
| test_to_module_speed[True] | 2.5973ms | 1.6996ms | 588.3607 Ops/s | 584.4152 Ops/s | +0.68% |
| test_to_module_speed[False] | 2.6243ms | 1.6827ms | 594.2974 Ops/s | 549.7325 Ops/s | **+8.11%** |
| test_tc_init | 61.2750μs | 28.5574μs | 35.0171 KOps/s | 14.5655 KOps/s | **+140.41%** |
| test_tc_init_nested | 0.1190ms | 58.6180μs | 17.0596 KOps/s | 7.1687 KOps/s | **+137.97%** |
| test_tc_first_layer_tensor | 4.9793μs | 0.6832μs | 1.4638 MOps/s | 160.8866 KOps/s | **+809.82%** |
| test_tc_first_layer_nontensor | 1.8971μs | 0.6752μs | 1.4811 MOps/s | 159.4670 KOps/s | **+828.79%** |
| test_tc_second_layer_tensor | 24.1450μs | 1.8234μs | 548.4121 KOps/s | 87.9681 KOps/s | **+523.42%** |
| test_tc_second_layer_nontensor | 8.9333μs | 1.4777μs | 676.7266 KOps/s | 86.1024 KOps/s | **+685.96%** |
| test_unbind | 94.1599ms | 7.3748ms | 135.5976 Ops/s | 74.7854 Ops/s | **+81.32%** |
| test_full_like | 19.4543ms | 12.1912ms | 82.0261 Ops/s | 84.8644 Ops/s | -3.34% |
| test_zeros_like | 12.2971ms | 6.5376ms | 152.9614 Ops/s | 152.1959 Ops/s | +0.50% |
| test_ones_like | 15.4200ms | 6.5841ms | 151.8822 Ops/s | 141.3483 Ops/s | **+7.45%** |
| test_clone | 14.3957ms | 8.6462ms | 115.6583 Ops/s | 117.9237 Ops/s | -1.92% |
| test_squeeze | 72.1440μs | 14.4483μs | 69.2124 KOps/s | 36.0786 KOps/s | **+91.84%** |
| test_unsqueeze | 0.1751ms | 70.9901μs | 14.0865 KOps/s | 10.0089 KOps/s | **+40.74%** |
| test_split | 0.1943ms | 0.1125ms | 8.8890 KOps/s | 5.8273 KOps/s | **+52.54%** |
| test_permute | 0.2396ms | 0.1375ms | 7.2749 KOps/s | 5.6667 KOps/s | **+28.38%** |
| test_stack | 30.7516ms | 23.7975ms | 42.0213 Ops/s | 39.9472 Ops/s | **+5.19%** |
| test_cat | 41.3268ms | 25.3404ms | 39.4627 Ops/s | 39.7305 Ops/s | -0.67% |
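
The Change column appears to be the relative throughput difference between this PR and the repository HEAD. The highlighted entries, which are the ones counted in the Improved/Worsened totals, appear to be those whose change exceeds 5% in either direction. Both rules are inferred from the numbers above, not taken from the benchmark workflow, and the helper names below (`percent_change`, `classify`, `THRESHOLD_PCT`) are hypothetical. A minimal Python sketch that reproduces a few CPU rows:

```python
# Hypothetical helpers reproducing the "Change" column of the benchmark table.
# Assumptions (inferred from the data, not from the benchmark tooling):
#   change = (ops_pr - ops_head) / ops_head * 100
#   an entry is highlighted when |change| > 5%

THRESHOLD_PCT = 5.0


def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = THRESHOLD_PCT) -> str:
    """Label a change as 'improved', 'worsened' or 'neutral'."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "neutral"


# (name, Ops with the PR, Ops on repo HEAD), both in KOps/s, from the CPU table.
rows = [
    ("test_tc_init", 35.0171, 14.5655),
    ("test_lock_nested", 2.3447, 2.7621),
    ("test_plain_set_stack_nested", 58.7993, 56.3115),
]

for name, ops_pr, ops_head in rows:
    change = percent_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

This reports +140.41% (improved) for `test_tc_init`, -15.11% (worsened) for `test_lock_nested` and +4.42% (neutral) for `test_plain_set_stack_nested`, in line with the table. Because the displayed Ops values are rounded, a recomputed change can occasionally differ from the table in the last digit.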

github-actions bot commented May 25, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total benchmarks: 152. Improved: 19. Worsened: 8.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 39.4810μs | 20.6723μs | 48.3738 KOps/s | 47.2634 KOps/s | +2.35% |
| test_plain_set_stack_nested | 40.4110μs | 20.7430μs | 48.2090 KOps/s | 47.4050 KOps/s | +1.70% |
| test_plain_set_nested_inplace | 44.3720μs | 23.1374μs | 43.2201 KOps/s | 42.1874 KOps/s | +2.45% |
| test_plain_set_stack_nested_inplace | 45.0900μs | 22.9479μs | 43.5771 KOps/s | 42.2361 KOps/s | +3.17% |
| test_items | 18.0410μs | 4.1893μs | 238.7027 KOps/s | 230.7624 KOps/s | +3.44% |
| test_items_nested | 0.3687ms | 0.3406ms | 2.9362 KOps/s | 2.8756 KOps/s | +2.11% |
| test_items_nested_locked | 0.3919ms | 0.3434ms | 2.9125 KOps/s | 2.8590 KOps/s | +1.87% |
| test_items_nested_leaf | 0.1212ms | 0.1012ms | 9.8819 KOps/s | 9.8736 KOps/s | +0.08% |
| test_items_stack_nested | 0.3698ms | 0.3449ms | 2.8995 KOps/s | 2.8963 KOps/s | +0.11% |
| test_items_stack_nested_leaf | 0.1244ms | 0.1005ms | 9.9459 KOps/s | 9.7130 KOps/s | +2.40% |
| test_items_stack_nested_locked | 0.3949ms | 0.3425ms | 2.9200 KOps/s | 2.8896 KOps/s | +1.05% |
| test_keys | 23.5000μs | 4.7207μs | 211.8339 KOps/s | 209.0952 KOps/s | +1.31% |
| test_keys_nested | 0.2064ms | 0.1662ms | 6.0186 KOps/s | 5.9232 KOps/s | +1.61% |
| test_keys_nested_locked | 0.7297ms | 0.1714ms | 5.8343 KOps/s | 5.8020 KOps/s | +0.56% |
| test_keys_nested_leaf | 0.1772ms | 0.1436ms | 6.9627 KOps/s | 6.9023 KOps/s | +0.87% |
| test_keys_stack_nested | 0.1830ms | 0.1627ms | 6.1445 KOps/s | 5.9120 KOps/s | +3.93% |
| test_keys_stack_nested_leaf | 0.2245ms | 0.1412ms | 7.0825 KOps/s | 6.8450 KOps/s | +3.47% |
| test_keys_stack_nested_locked | 0.2284ms | 0.1674ms | 5.9723 KOps/s | 5.8394 KOps/s | +2.27% |
| test_values | 8.8737μs | 2.0388μs | 490.4812 KOps/s | 489.8219 KOps/s | +0.13% |
| test_values_nested | 83.0610μs | 59.7844μs | 16.7268 KOps/s | 16.2391 KOps/s | +3.00% |
| test_values_nested_locked | 82.7020μs | 60.6482μs | 16.4885 KOps/s | 16.0648 KOps/s | +2.64% |
| test_values_nested_leaf | 89.1510μs | 54.6311μs | 18.3046 KOps/s | 17.9757 KOps/s | +1.83% |
| test_values_stack_nested | 98.0330μs | 60.7869μs | 16.4509 KOps/s | 15.9196 KOps/s | +3.34% |
| test_values_stack_nested_leaf | 85.6620μs | 53.5596μs | 18.6708 KOps/s | 17.4154 KOps/s | **+7.21%** |
| test_values_stack_nested_locked | 87.2620μs | 60.7130μs | 16.4709 KOps/s | 16.2881 KOps/s | +1.12% |
| test_membership | 37.0410μs | 1.5079μs | 663.1619 KOps/s | 650.5227 KOps/s | +1.94% |
| test_membership_nested | 19.1800μs | 3.7863μs | 264.1087 KOps/s | 259.2228 KOps/s | +1.88% |
| test_membership_nested_leaf | 20.7900μs | 3.7951μs | 263.4969 KOps/s | 257.4419 KOps/s | +2.35% |
| test_membership_stacked_nested | 24.1310μs | 3.7995μs | 263.1899 KOps/s | 256.6625 KOps/s | +2.54% |
| test_membership_stacked_nested_leaf | 34.4000μs | 3.7634μs | 265.7169 KOps/s | 258.0000 KOps/s | +2.99% |
| test_membership_nested_last | 20.3400μs | 4.6823μs | 213.5683 KOps/s | 212.8543 KOps/s | +0.34% |
| test_membership_nested_leaf_last | 29.8210μs | 4.7164μs | 212.0248 KOps/s | 211.9774 KOps/s | +0.02% |
| test_membership_stacked_nested_last | 25.1510μs | 8.2598μs | 121.0685 KOps/s | 210.7908 KOps/s | **-42.56%** |
| test_membership_stacked_nested_leaf_last | 38.7300μs | 8.2713μs | 120.8993 KOps/s | 212.9455 KOps/s | **-43.23%** |
| test_nested_getleaf | 34.8500μs | 12.9354μs | 77.3075 KOps/s | 74.7961 KOps/s | +3.36% |
| test_nested_get | 31.0810μs | 12.3489μs | 80.9788 KOps/s | 78.9636 KOps/s | +2.55% |
| test_stacked_getleaf | 35.7410μs | 12.9353μs | 77.3077 KOps/s | 74.6475 KOps/s | +3.56% |
| test_stacked_get | 39.7910μs | 12.2511μs | 81.6253 KOps/s | 79.2026 KOps/s | +3.06% |
| test_nested_getitemleaf | 37.5900μs | 13.3055μs | 75.1568 KOps/s | 72.6193 KOps/s | +3.49% |
| test_nested_getitem | 31.7000μs | 12.4591μs | 80.2626 KOps/s | 78.0123 KOps/s | +2.88% |
| test_stacked_getitemleaf | 42.9400μs | 13.3752μs | 74.7652 KOps/s | 72.4123 KOps/s | +3.25% |
| test_stacked_getitem | 41.0610μs | 12.5008μs | 79.9946 KOps/s | 77.7348 KOps/s | +2.91% |
| test_lock_nested | 0.7963ms | 0.3961ms | 2.5244 KOps/s | 2.1969 KOps/s | **+14.91%** |
| test_lock_stack_nested | 0.3996ms | 0.3463ms | 2.8878 KOps/s | 2.8304 KOps/s | +2.03% |
| test_unlock_nested | 0.7669ms | 0.4034ms | 2.4791 KOps/s | 2.1691 KOps/s | **+14.29%** |
| test_unlock_stack_nested | 0.4061ms | 0.3635ms | 2.7511 KOps/s | 2.7169 KOps/s | +1.26% |
| test_flatten_speed | 0.4183ms | 0.1226ms | 8.1559 KOps/s | 8.1931 KOps/s | -0.45% |
| test_unflatten_speed | 0.5943ms | 0.4785ms | 2.0901 KOps/s | 2.0547 KOps/s | +1.72% |
| test_common_ops | 1.1589ms | 0.7085ms | 1.4114 KOps/s | 1.4205 KOps/s | -0.64% |
| test_creation | 16.9490μs | 2.1148μs | 472.8668 KOps/s | 449.8448 KOps/s | **+5.12%** |
| test_creation_empty | 28.2910μs | 11.7833μs | 84.8656 KOps/s | 83.0648 KOps/s | +2.17% |
| test_creation_nested_1 | 40.8410μs | 14.6115μs | 68.4392 KOps/s | 67.0999 KOps/s | +2.00% |
| test_creation_nested_2 | 1.4980ms | 18.1927μs | 54.9672 KOps/s | 53.3774 KOps/s | +2.98% |
| test_clone | 87.0520μs | 14.4866μs | 69.0292 KOps/s | 65.5708 KOps/s | **+5.27%** |
| test_getitem[int] | 30.6000μs | 13.4554μs | 74.3194 KOps/s | 73.4834 KOps/s | +1.14% |
| test_getitem[slice_int] | 62.3210μs | 24.9784μs | 40.0346 KOps/s | 40.5632 KOps/s | -1.30% |
| test_getitem[range] | 69.1910μs | 50.3504μs | 19.8608 KOps/s | 20.0567 KOps/s | -0.98% |
| test_getitem[tuple] | 55.9820μs | 21.8557μs | 45.7547 KOps/s | 45.3945 KOps/s | +0.79% |
| test_getitem[list] | 98.1810μs | 38.7780μs | 25.7878 KOps/s | 25.6223 KOps/s | +0.65% |
| test_setitem_dim[int] | 55.4310μs | 37.8885μs | 26.3932 KOps/s | 27.8002 KOps/s | **-5.06%** |
| test_setitem_dim[slice_int] | 82.3220μs | 60.6254μs | 16.4947 KOps/s | 17.1168 KOps/s | -3.63% |
| test_setitem_dim[range] | 99.5720μs | 77.1779μs | 12.9571 KOps/s | 13.2608 KOps/s | -2.29% |
| test_setitem_dim[tuple] | 76.1510μs | 54.0082μs | 18.5157 KOps/s | 19.9682 KOps/s | **-7.27%** |
| test_setitem | 44.9300μs | 21.0212μs | 47.5710 KOps/s | 45.0457 KOps/s | **+5.61%** |
| test_set | 47.2400μs | 20.5274μs | 48.7153 KOps/s | 47.6001 KOps/s | +2.34% |
| test_set_shared | 1.7381ms | 0.1089ms | 9.1856 KOps/s | 9.0368 KOps/s | +1.65% |
| test_update | 73.2320μs | 23.3878μs | 42.7573 KOps/s | 42.0700 KOps/s | +1.63% |
| test_update_nested | 70.7310μs | 32.5061μs | 30.7634 KOps/s | 30.1063 KOps/s | +2.18% |
| test_update__nested | 65.8320μs | 27.8032μs | 35.9671 KOps/s | 35.3441 KOps/s | +1.76% |
| test_set_nested | 55.6710μs | 22.7012μs | 44.0505 KOps/s | 43.9190 KOps/s | +0.30% |
| test_set_nested_new | 66.9110μs | 27.5315μs | 36.3220 KOps/s | 35.2252 KOps/s | +3.11% |
| test_select | 73.2520μs | 44.7828μs | 22.3300 KOps/s | 21.9796 KOps/s | +1.59% |
| test_select_nested | 0.1010ms | 66.2765μs | 15.0883 KOps/s | 14.8549 KOps/s | +1.57% |
| test_exclude_nested | 0.1519ms | 0.1288ms | 7.7663 KOps/s | 7.5992 KOps/s | +2.20% |
| test_empty[True] | 0.4766ms | 0.4395ms | 2.2753 KOps/s | 2.2458 KOps/s | +1.31% |
| test_empty[False] | 6.8077μs | 1.2698μs | 787.5198 KOps/s | 781.7680 KOps/s | +0.74% |
| test_to | 0.1132ms | 91.5163μs | 10.9270 KOps/s | 10.7800 KOps/s | +1.36% |
| test_to_nonblocking | 0.1055ms | 71.9523μs | 13.8981 KOps/s | 13.4376 KOps/s | +3.43% |
| test_unbind_speed | 1.9963ms | 0.3111ms | 3.2146 KOps/s | 3.2136 KOps/s | +0.03% |
| test_unbind_speed_stack0 | 0.3501ms | 0.3049ms | 3.2793 KOps/s | 3.2579 KOps/s | +0.66% |
| test_unbind_speed_stack1 | 74.2666ms | 0.9047ms | 1.1053 KOps/s | 1.0927 KOps/s | +1.16% |
| test_split | 74.8599ms | 1.9471ms | 513.5892 Ops/s | 567.1944 Ops/s | **-9.45%** |
| test_chunk | 75.3187ms | 1.9468ms | 513.6670 Ops/s | 524.7081 Ops/s | -2.10% |
| test_creation[device0] | 0.1664ms | 72.0307μs | 13.8830 KOps/s | 14.0077 KOps/s | -0.89% |
| test_creation_from_tensor | 0.1465ms | 67.4860μs | 14.8179 KOps/s | 14.8461 KOps/s | -0.19% |
| test_add_one[memmap_tensor0] | 0.1098ms | 6.8540μs | 145.9006 KOps/s | 146.1106 KOps/s | -0.14% |
| test_contiguous[memmap_tensor0] | 16.6800μs | 0.6782μs | 1.4745 MOps/s | 1.4433 MOps/s | +2.16% |
| test_stack[memmap_tensor0] | 29.9500μs | 4.5254μs | 220.9730 KOps/s | 221.1852 KOps/s | -0.10% |
| test_memmaptd_index | 0.5358ms | 0.3183ms | 3.1416 KOps/s | 3.1215 KOps/s | +0.64% |
| test_memmaptd_index_astensor | 0.7949ms | 0.4062ms | 2.4621 KOps/s | 2.4330 KOps/s | +1.20% |
| test_memmaptd_index_op | 1.1528ms | 0.7372ms | 1.3565 KOps/s | 1.3497 KOps/s | +0.50% |
| test_serialize_model | 0.1914s | 0.1177s | 8.4985 Ops/s | 8.1702 Ops/s | +4.02% |
| test_serialize_model_pickle | 1.3504s | 1.2360s | 0.8091 Ops/s | 0.8063 Ops/s | +0.34% |
| test_serialize_weights | 0.1814s | 0.1144s | 8.7427 Ops/s | 8.3466 Ops/s | +4.75% |
| test_serialize_weights_returnearly | 0.2235s | 0.1067s | 9.3715 Ops/s | 9.9927 Ops/s | **-6.22%** |
| test_serialize_weights_pickle | 1.3590s | 1.2361s | 0.8090 Ops/s | 0.8087 Ops/s | +0.04% |
| test_reshape_pytree | 93.9320μs | 32.9180μs | 30.3785 KOps/s | 30.2836 KOps/s | +0.31% |
| test_reshape_td | 70.9910μs | 37.4194μs | 26.7241 KOps/s | 27.4278 KOps/s | -2.57% |
| test_view_pytree | 0.1727ms | 33.0872μs | 30.2232 KOps/s | 30.8700 KOps/s | -2.10% |
| test_view_td | 0.2102ms | 43.9772μs | 22.7390 KOps/s | 23.9065 KOps/s | -4.88% |
| test_unbind_pytree | 0.1573ms | 39.0095μs | 25.6348 KOps/s | 26.1242 KOps/s | -1.87% |
| test_unbind_td | 0.5347ms | 46.0431μs | 21.7188 KOps/s | 22.1455 KOps/s | -1.93% |
| test_split_pytree | 70.2210μs | 38.0094μs | 26.3093 KOps/s | 26.9761 KOps/s | -2.47% |
| test_split_td | 0.1130ms | 47.6832μs | 20.9717 KOps/s | 21.5764 KOps/s | -2.80% |
| test_add_pytree | 0.1585ms | 44.6631μs | 22.3898 KOps/s | 22.9519 KOps/s | -2.45% |
| test_add_td | 0.1035ms | 65.4389μs | 15.2814 KOps/s | 16.5803 KOps/s | **-7.83%** |
| test_distributed | 1.7716ms | 81.5147μs | 12.2677 KOps/s | 9.7081 KOps/s | **+26.37%** |
| test_tdmodule | 52.4110μs | 18.8387μs | 53.0822 KOps/s | 51.7435 KOps/s | +2.59% |
| test_tdmodule_dispatch | 55.6320μs | 36.5268μs | 27.3772 KOps/s | 27.3525 KOps/s | +0.09% |
| test_tdseq | 38.1910μs | 20.8016μs | 48.0731 KOps/s | 47.6737 KOps/s | +0.84% |
| test_tdseq_dispatch | 61.7920μs | 40.9975μs | 24.3917 KOps/s | 24.5056 KOps/s | -0.46% |
| test_instantiation_functorch | 1.5267ms | 1.4472ms | 690.9839 Ops/s | 685.4634 Ops/s | +0.81% |
| test_instantiation_td | 1.5667ms | 1.0749ms | 930.3436 Ops/s | 861.2229 Ops/s | **+8.03%** |
| test_exec_functorch | 0.2000ms | 0.1735ms | 5.7630 KOps/s | 5.7103 KOps/s | +0.92% |
| test_exec_functional_call | 0.1964ms | 0.1671ms | 5.9827 KOps/s | 6.0976 KOps/s | -1.88% |
| test_exec_td | 0.1973ms | 0.1607ms | 6.2241 KOps/s | 6.2883 KOps/s | -1.02% |
| test_exec_td_decorator | 0.9013ms | 0.2442ms | 4.0947 KOps/s | 4.0470 KOps/s | +1.18% |
| test_vmap_mlp_speed[True-True] | 0.7338ms | 0.6077ms | 1.6455 KOps/s | 1.5917 KOps/s | +3.38% |
| test_vmap_mlp_speed[True-False] | 0.6721ms | 0.6025ms | 1.6599 KOps/s | 1.5911 KOps/s | +4.32% |
| test_vmap_mlp_speed[False-True] | 0.5555ms | 0.5185ms | 1.9287 KOps/s | 1.8576 KOps/s | +3.82% |
| test_vmap_mlp_speed[False-False] | 0.5645ms | 0.5188ms | 1.9276 KOps/s | 1.8491 KOps/s | +4.24% |
| test_vmap_mlp_speed_decorator[True-True] | 1.3189ms | 0.6782ms | 1.4744 KOps/s | 1.4187 KOps/s | +3.93% |
| test_vmap_mlp_speed_decorator[True-False] | 0.7956ms | 0.6773ms | 1.4765 KOps/s | 1.4198 KOps/s | +3.99% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7064ms | 0.5867ms | 1.7043 KOps/s | 1.6376 KOps/s | +4.07% |
| test_vmap_mlp_speed_decorator[False-False] | 0.6943ms | 0.5885ms | 1.6993 KOps/s | 1.6393 KOps/s | +3.66% |
| test_vmap_transformer_speed[True-True] | 7.6697ms | 7.5885ms | 131.7785 Ops/s | 127.7062 Ops/s | +3.19% |
| test_vmap_transformer_speed[True-False] | 8.1786ms | 7.6458ms | 130.7902 Ops/s | 128.0872 Ops/s | +2.11% |
| test_vmap_transformer_speed[False-True] | 7.8400ms | 7.5160ms | 133.0499 Ops/s | 129.5190 Ops/s | +2.73% |
| test_vmap_transformer_speed[False-False] | 8.2688ms | 7.5973ms | 131.6251 Ops/s | 129.6764 Ops/s | +1.50% |
| test_vmap_transformer_speed_decorator[True-True] | 19.3865ms | 18.4535ms | 54.1901 Ops/s | 52.5880 Ops/s | +3.05% |
| test_vmap_transformer_speed_decorator[True-False] | 19.8040ms | 18.4079ms | 54.3246 Ops/s | 52.6077 Ops/s | +3.26% |
| test_vmap_transformer_speed_decorator[False-True] | 19.1810ms | 18.3043ms | 54.6320 Ops/s | 52.8019 Ops/s | +3.47% |
| test_vmap_transformer_speed_decorator[False-False] | 18.3274ms | 18.2481ms | 54.8001 Ops/s | 52.9231 Ops/s | +3.55% |
| test_to_module_speed[True] | 2.0031ms | 1.9056ms | 524.7560 Ops/s | 521.3636 Ops/s | +0.65% |
| test_to_module_speed[False] | 1.9797ms | 1.8807ms | 531.7107 Ops/s | 527.2584 Ops/s | +0.84% |
| test_tc_init | 60.6710μs | 32.6744μs | 30.6050 KOps/s | 13.9136 KOps/s | **+119.96%** |
| test_tc_init_nested | 86.4520μs | 63.2893μs | 15.8005 KOps/s | 6.5205 KOps/s | **+142.32%** |
| test_tc_first_layer_tensor | 1.4811μs | 0.6877μs | 1.4542 MOps/s | 151.7165 KOps/s | **+858.50%** |
| test_tc_first_layer_nontensor | 1.4425μs | 0.6863μs | 1.4571 MOps/s | 155.1044 KOps/s | **+839.46%** |
| test_tc_second_layer_tensor | 17.2100μs | 2.0246μs | 493.9214 KOps/s | 81.0368 KOps/s | **+509.50%** |
| test_tc_second_layer_nontensor | 8.9203μs | 1.5959μs | 626.6207 KOps/s | 80.4124 KOps/s | **+679.26%** |
| test_unbind | 95.8214ms | 8.9893ms | 111.2438 Ops/s | 76.2340 Ops/s | **+45.92%** |
| test_full_like | 13.7936ms | 13.4808ms | 74.1797 Ops/s | 84.7511 Ops/s | **-12.47%** |
| test_zeros_like | 8.1692ms | 7.9593ms | 125.6398 Ops/s | 124.8989 Ops/s | +0.59% |
| test_ones_like | 8.4119ms | 7.9619ms | 125.5988 Ops/s | 123.9740 Ops/s | +1.31% |
| test_clone | 9.8658ms | 9.5936ms | 104.2365 Ops/s | 101.0100 Ops/s | +3.19% |
| test_squeeze | 66.5420μs | 14.4214μs | 69.3416 KOps/s | 34.0236 KOps/s | **+103.80%** |
| test_unsqueeze | 0.1218ms | 70.6438μs | 14.1555 KOps/s | 9.9217 KOps/s | **+42.67%** |
| test_split | 0.1812ms | 0.1179ms | 8.4816 KOps/s | 5.8036 KOps/s | **+46.14%** |
| test_permute | 0.2030ms | 0.1302ms | 7.6795 KOps/s | 5.8328 KOps/s | **+31.66%** |
| test_stack | 28.1398ms | 27.7716ms | 36.0080 Ops/s | 34.9618 Ops/s | +2.99% |
| test_cat | 28.3083ms | 27.7355ms | 36.0549 Ops/s | 35.1082 Ops/s | +2.70% |
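
The Ops column appears to be the reciprocal of the Mean time per call, scaled to Ops/s, KOps/s or MOps/s. For example, `test_tc_init` on GPU has a mean of 32.6744μs, and 1 / 32.6744e-6 s is about 30,605 calls per second, which the table prints as 30.6050 KOps/s. This relationship is inferred from the data rather than from the benchmark tool, and the helpers below (`ops_per_second`, `format_ops`, `UNIT_SECONDS`) are hypothetical names used only for illustration:

```python
# Hypothetical helpers: convert a "Mean" duration from the tables into the
# throughput shown in the "Ops" column.
# Assumption (inferred from the data, not from the benchmark tool):
#   Ops = 1 / Mean, displayed with an SI prefix (K = 1e3, M = 1e6).

UNIT_SECONDS = {"s": 1.0, "ms": 1e-3, "μs": 1e-6, "us": 1e-6}


def ops_per_second(mean: float, unit: str) -> float:
    """Number of calls per second implied by a mean duration per call."""
    return 1.0 / (mean * UNIT_SECONDS[unit])


def format_ops(ops: float) -> str:
    """Render a throughput with the prefixes used in the tables."""
    if ops >= 1e6:
        return f"{ops / 1e6:.4f} MOps/s"
    if ops >= 1e3:
        return f"{ops / 1e3:.4f} KOps/s"
    return f"{ops:.4f} Ops/s"


# (name, mean, unit, Ops as printed in the GPU table)
rows = [
    ("test_tc_init", 32.6744, "μs", "30.6050 KOps/s"),
    ("test_tc_first_layer_tensor", 0.6877, "μs", "1.4542 MOps/s"),
    ("test_serialize_model", 0.1177, "s", "8.4985 Ops/s"),
]

for name, mean, unit, printed in rows:
    computed = format_ops(ops_per_second(mean, unit))
    print(f"{name}: {computed} (table: {printed})")
```

For `test_tc_init` the recomputed value matches the table exactly. For the other two rows it differs in the last digits, most likely because the Mean shown in the table is itself rounded.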

@vmoens merged commit 41002bd into main on May 25, 2024.
23 of 28 checks passed.
@vmoens deleted the faster-tc branch on May 25, 2024 at 13:03.
Labels: CLA Signed, Performance
Projects: None yet
Participants: 2