[Feature] auto_batch_size_ #630

Merged
merged 1 commit into from
Jan 18, 2024

Conversation

vmoens
Contributor

@vmoens vmoens commented Jan 18, 2024

No description provided.

@facebook-github-bot added the "CLA Signed" label Jan 18, 2024
@vmoens added the "enhancement" label Jan 18, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 124. Improved: $\large\color{#35bf28}11$. Worsened: $\large\color{#d91a1a}19$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 31.2580μs 16.6026μs 60.2317 KOps/s 60.3341 KOps/s $\color{#d91a1a}-0.17\%$
test_plain_set_stack_nested 0.1927ms 0.1400ms 7.1431 KOps/s 6.9356 KOps/s $\color{#35bf28}+2.99\%$
test_plain_set_nested_inplace 47.1080μs 18.8295μs 53.1080 KOps/s 53.2112 KOps/s $\color{#d91a1a}-0.19\%$
test_plain_set_stack_nested_inplace 0.3099ms 0.1763ms 5.6725 KOps/s 5.6763 KOps/s $\color{#d91a1a}-0.07\%$
test_items 15.3690μs 2.4017μs 416.3749 KOps/s 409.6448 KOps/s $\color{#35bf28}+1.64\%$
test_items_nested 0.4249ms 0.2687ms 3.7211 KOps/s 3.4970 KOps/s $\textbf{\color{#35bf28}+6.41\%}$
test_items_nested_locked 0.3979ms 0.2666ms 3.7503 KOps/s 3.4894 KOps/s $\textbf{\color{#35bf28}+7.48\%}$
test_items_nested_leaf 0.6892ms 0.1652ms 6.0515 KOps/s 5.6045 KOps/s $\textbf{\color{#35bf28}+7.98\%}$
test_items_stack_nested 1.5089ms 1.3127ms 761.7761 Ops/s 747.2534 Ops/s $\color{#35bf28}+1.94\%$
test_items_stack_nested_leaf 1.4659ms 1.1808ms 846.8561 Ops/s 825.2271 Ops/s $\color{#35bf28}+2.62\%$
test_items_stack_nested_locked 1.1798ms 0.8703ms 1.1491 KOps/s 1.1235 KOps/s $\color{#35bf28}+2.28\%$
test_keys 17.9140μs 3.8799μs 257.7401 KOps/s 248.3519 KOps/s $\color{#35bf28}+3.78\%$
test_keys_nested 47.6501ms 0.1539ms 6.4982 KOps/s 6.7903 KOps/s $\color{#d91a1a}-4.30\%$
test_keys_nested_locked 0.2572ms 0.1501ms 6.6626 KOps/s 6.5505 KOps/s $\color{#35bf28}+1.71\%$
test_keys_nested_leaf 0.2333ms 0.1275ms 7.8449 KOps/s 7.7060 KOps/s $\color{#35bf28}+1.80\%$
test_keys_stack_nested 1.9708ms 1.2651ms 790.4225 Ops/s 780.0356 Ops/s $\color{#35bf28}+1.33\%$
test_keys_stack_nested_leaf 1.5204ms 1.2560ms 796.1798 Ops/s 775.0904 Ops/s $\color{#35bf28}+2.72\%$
test_keys_stack_nested_locked 1.0437ms 0.8075ms 1.2383 KOps/s 1.2327 KOps/s $\color{#35bf28}+0.46\%$
test_values 8.6560μs 1.1525μs 867.6676 KOps/s 855.6045 KOps/s $\color{#35bf28}+1.41\%$
test_values_nested 97.7210μs 51.1690μs 19.5431 KOps/s 19.2839 KOps/s $\color{#35bf28}+1.34\%$
test_values_nested_locked 0.1012ms 51.1636μs 19.5452 KOps/s 19.1334 KOps/s $\color{#35bf28}+2.15\%$
test_values_nested_leaf 88.0540μs 45.9889μs 21.7444 KOps/s 21.1091 KOps/s $\color{#35bf28}+3.01\%$
test_values_stack_nested 1.6239ms 1.0356ms 965.6105 Ops/s 967.1996 Ops/s $\color{#d91a1a}-0.16\%$
test_values_stack_nested_leaf 1.2765ms 1.0247ms 975.9338 Ops/s 969.0949 Ops/s $\color{#35bf28}+0.71\%$
test_values_stack_nested_locked 1.0137ms 0.6079ms 1.6451 KOps/s 1.6438 KOps/s $\color{#35bf28}+0.08\%$
test_membership 10.5600μs 1.3271μs 753.5398 KOps/s 752.9106 KOps/s $\color{#35bf28}+0.08\%$
test_membership_nested 21.6500μs 3.3635μs 297.3053 KOps/s 278.0281 KOps/s $\textbf{\color{#35bf28}+6.93\%}$
test_membership_nested_leaf 22.6820μs 3.4152μs 292.8123 KOps/s 279.6292 KOps/s $\color{#35bf28}+4.71\%$
test_membership_stacked_nested 48.0790μs 11.8157μs 84.6332 KOps/s 80.9870 KOps/s $\color{#35bf28}+4.50\%$
test_membership_stacked_nested_leaf 34.8850μs 11.7919μs 84.8041 KOps/s 83.5822 KOps/s $\color{#35bf28}+1.46\%$
test_membership_nested_last 46.0560μs 6.5950μs 151.6305 KOps/s 146.7547 KOps/s $\color{#35bf28}+3.32\%$
test_membership_nested_leaf_last 30.9380μs 6.5914μs 151.7121 KOps/s 146.3615 KOps/s $\color{#35bf28}+3.66\%$
test_membership_stacked_nested_last 0.2897ms 0.1722ms 5.8079 KOps/s 5.5958 KOps/s $\color{#35bf28}+3.79\%$
test_membership_stacked_nested_leaf_last 48.6410μs 14.1206μs 70.8183 KOps/s 70.7892 KOps/s $\color{#35bf28}+0.04\%$
test_nested_getleaf 30.3170μs 10.6398μs 93.9869 KOps/s 91.1983 KOps/s $\color{#35bf28}+3.06\%$
test_nested_get 29.7860μs 10.1041μs 98.9694 KOps/s 95.5870 KOps/s $\color{#35bf28}+3.54\%$
test_stacked_getleaf 0.6253ms 0.3982ms 2.5111 KOps/s 2.4543 KOps/s $\color{#35bf28}+2.31\%$
test_stacked_get 0.5761ms 0.3618ms 2.7638 KOps/s 2.6395 KOps/s $\color{#35bf28}+4.71\%$
test_nested_getitemleaf 33.0720μs 10.7031μs 93.4311 KOps/s 90.4970 KOps/s $\color{#35bf28}+3.24\%$
test_nested_getitem 42.0580μs 10.1414μs 98.6057 KOps/s 95.8164 KOps/s $\color{#35bf28}+2.91\%$
test_stacked_getitemleaf 0.7091ms 0.4030ms 2.4812 KOps/s 2.4684 KOps/s $\color{#35bf28}+0.52\%$
test_stacked_getitem 0.5877ms 0.3653ms 2.7378 KOps/s 2.6739 KOps/s $\color{#35bf28}+2.39\%$
test_lock_nested 1.2715ms 0.3894ms 2.5678 KOps/s 2.9727 KOps/s $\textbf{\color{#d91a1a}-13.62\%}$
test_lock_stack_nested 71.5898ms 6.1385ms 162.9051 Ops/s 186.7083 Ops/s $\textbf{\color{#d91a1a}-12.75\%}$
test_unlock_nested 58.9624ms 0.4490ms 2.2272 KOps/s 2.5082 KOps/s $\textbf{\color{#d91a1a}-11.20\%}$
test_unlock_stack_nested 73.0555ms 5.8224ms 171.7496 Ops/s 182.0456 Ops/s $\textbf{\color{#d91a1a}-5.66\%}$
test_flatten_speed 4.9932ms 0.3723ms 2.6863 KOps/s 2.6953 KOps/s $\color{#d91a1a}-0.33\%$
test_unflatten_speed 0.6531ms 0.4491ms 2.2266 KOps/s 2.1097 KOps/s $\textbf{\color{#35bf28}+5.54\%}$
test_common_ops 4.5860ms 0.6682ms 1.4966 KOps/s 1.4403 KOps/s $\color{#35bf28}+3.91\%$
test_creation 23.8560μs 1.8332μs 545.5022 KOps/s 520.2139 KOps/s $\color{#35bf28}+4.86\%$
test_creation_empty 41.6270μs 10.0395μs 99.6061 KOps/s 110.8816 KOps/s $\textbf{\color{#d91a1a}-10.17\%}$
test_creation_nested_1 59.2400μs 12.5503μs 79.6793 KOps/s 85.8118 KOps/s $\textbf{\color{#d91a1a}-7.15\%}$
test_creation_nested_2 55.8150μs 15.8449μs 63.1118 KOps/s 68.4857 KOps/s $\textbf{\color{#d91a1a}-7.85\%}$
test_clone 97.6420μs 12.7123μs 78.6640 KOps/s 74.2815 KOps/s $\textbf{\color{#35bf28}+5.90\%}$
test_getitem[int] 37.8710μs 11.1187μs 89.9386 KOps/s 89.3080 KOps/s $\color{#35bf28}+0.71\%$
test_getitem[slice_int] 60.7630μs 22.5582μs 44.3297 KOps/s 44.0885 KOps/s $\color{#35bf28}+0.55\%$
test_getitem[range] 0.1013ms 41.4415μs 24.1304 KOps/s 24.3459 KOps/s $\color{#d91a1a}-0.89\%$
test_getitem[tuple] 49.6920μs 18.5161μs 54.0070 KOps/s 52.7340 KOps/s $\color{#35bf28}+2.41\%$
test_getitem[list] 0.2493ms 36.9432μs 27.0686 KOps/s 27.6244 KOps/s $\color{#d91a1a}-2.01\%$
test_setitem_dim[int] 56.0650μs 29.3252μs 34.1003 KOps/s 36.0515 KOps/s $\textbf{\color{#d91a1a}-5.41\%}$
test_setitem_dim[slice_int] 98.0620μs 55.9404μs 17.8762 KOps/s 18.6521 KOps/s $\color{#d91a1a}-4.16\%$
test_setitem_dim[range] 0.1437ms 74.2509μs 13.4679 KOps/s 13.8987 KOps/s $\color{#d91a1a}-3.10\%$
test_setitem_dim[tuple] 64.6800μs 43.7124μs 22.8768 KOps/s 23.5671 KOps/s $\color{#d91a1a}-2.93\%$
test_setitem 0.1043ms 18.9208μs 52.8518 KOps/s 51.1330 KOps/s $\color{#35bf28}+3.36\%$
test_set 0.1083ms 18.5284μs 53.9713 KOps/s 53.1772 KOps/s $\color{#35bf28}+1.49\%$
test_set_shared 3.1450ms 0.1405ms 7.1152 KOps/s 6.8529 KOps/s $\color{#35bf28}+3.83\%$
test_update 0.1332ms 20.9742μs 47.6776 KOps/s 47.6735 KOps/s $+0.01\%$
test_update_nested 0.1576ms 28.3461μs 35.2782 KOps/s 34.3466 KOps/s $\color{#35bf28}+2.71\%$
test_set_nested 92.3320μs 19.8596μs 50.3535 KOps/s 48.4324 KOps/s $\color{#35bf28}+3.97\%$
test_set_nested_new 0.1406ms 23.9670μs 41.7240 KOps/s 40.1643 KOps/s $\color{#35bf28}+3.88\%$
test_select 0.1163ms 36.6004μs 27.3221 KOps/s 26.5611 KOps/s $\color{#35bf28}+2.87\%$
test_select_nested 0.1121ms 58.6032μs 17.0639 KOps/s 16.8858 KOps/s $\color{#35bf28}+1.05\%$
test_exclude_nested 0.2045ms 0.1075ms 9.3026 KOps/s 9.0495 KOps/s $\color{#35bf28}+2.80\%$
test_empty[True] 0.5021ms 0.3248ms 3.0789 KOps/s 2.9929 KOps/s $\color{#35bf28}+2.88\%$
test_empty[False] 4.4702μs 1.0042μs 995.8577 KOps/s 941.4657 KOps/s $\textbf{\color{#35bf28}+5.78\%}$
test_unbind_speed 0.3888ms 0.3200ms 3.1251 KOps/s 4.0697 KOps/s $\textbf{\color{#d91a1a}-23.21\%}$
test_unbind_speed_stack0 71.5347ms 4.1854ms 238.9246 Ops/s 332.5318 Ops/s $\textbf{\color{#d91a1a}-28.15\%}$
test_unbind_speed_stack1 1.5489μs 0.6434μs 1.5542 MOps/s 511.9876 KOps/s $\textbf{\color{#35bf28}+203.56\%}$
test_split 1.8739ms 1.4742ms 678.3269 Ops/s 621.7456 Ops/s $\textbf{\color{#35bf28}+9.10\%}$
test_chunk 64.6699ms 1.5707ms 636.6498 Ops/s 635.9680 Ops/s $\color{#35bf28}+0.11\%$
test_creation[device0] 0.2100ms 99.7114μs 10.0289 KOps/s 10.0265 KOps/s $\color{#35bf28}+0.02\%$
test_creation_from_tensor 3.2893ms 80.6498μs 12.3993 KOps/s 12.4389 KOps/s $\color{#d91a1a}-0.32\%$
test_add_one[memmap_tensor0] 0.2342ms 5.2419μs 190.7720 KOps/s 190.2858 KOps/s $\color{#35bf28}+0.26\%$
test_contiguous[memmap_tensor0] 19.0050μs 0.6470μs 1.5455 MOps/s 1.5607 MOps/s $\color{#d91a1a}-0.98\%$
test_stack[memmap_tensor0] 52.5270μs 3.6772μs 271.9484 KOps/s 287.0070 KOps/s $\textbf{\color{#d91a1a}-5.25\%}$
test_memmaptd_index 1.0996ms 0.2223ms 4.4977 KOps/s 4.5471 KOps/s $\color{#d91a1a}-1.09\%$
test_memmaptd_index_astensor 0.5049ms 0.2833ms 3.5301 KOps/s 3.5472 KOps/s $\color{#d91a1a}-0.48\%$
test_memmaptd_index_op 0.9107ms 0.5737ms 1.7432 KOps/s 1.8279 KOps/s $\color{#d91a1a}-4.64\%$
test_serialize_model 0.1726s 0.1073s 9.3192 Ops/s 9.2437 Ops/s $\color{#35bf28}+0.82\%$
test_serialize_model_pickle 0.4685s 0.3728s 2.6822 Ops/s 2.6327 Ops/s $\color{#35bf28}+1.88\%$
test_serialize_weights 0.1103s 97.2941ms 10.2781 Ops/s 9.2717 Ops/s $\textbf{\color{#35bf28}+10.85\%}$
test_serialize_weights_returnearly 0.3073s 0.1517s 6.5909 Ops/s 7.3278 Ops/s $\textbf{\color{#d91a1a}-10.06\%}$
test_serialize_weights_pickle 0.6891s 0.5162s 1.9372 Ops/s 2.3536 Ops/s $\textbf{\color{#d91a1a}-17.69\%}$
test_serialize_weights_filesystem 0.1488s 96.6282ms 10.3489 Ops/s 10.5120 Ops/s $\color{#d91a1a}-1.55\%$
test_serialize_model_filesystem 0.1320s 0.1007s 9.9278 Ops/s 11.0800 Ops/s $\textbf{\color{#d91a1a}-10.40\%}$
test_reshape_pytree 52.5580μs 23.3379μs 42.8488 KOps/s 42.1412 KOps/s $\color{#35bf28}+1.68\%$
test_reshape_td 69.9600μs 30.0136μs 33.3183 KOps/s 32.6293 KOps/s $\color{#35bf28}+2.11\%$
test_view_pytree 55.6430μs 23.2292μs 43.0493 KOps/s 42.4914 KOps/s $\color{#35bf28}+1.31\%$
test_view_td 23.1830μs 4.9391μs 202.4666 KOps/s 196.3839 KOps/s $\color{#35bf28}+3.10\%$
test_unbind_pytree 63.8890μs 26.8697μs 37.2166 KOps/s 37.5425 KOps/s $\color{#d91a1a}-0.87\%$
test_unbind_td 0.1210ms 49.9485μs 20.0206 KOps/s 27.6483 KOps/s $\textbf{\color{#d91a1a}-27.59\%}$
test_split_pytree 61.4750μs 26.4697μs 37.7790 KOps/s 37.8909 KOps/s $\color{#d91a1a}-0.30\%$
test_split_td 0.5836ms 40.0574μs 24.9642 KOps/s 24.5685 KOps/s $\color{#35bf28}+1.61\%$
test_add_pytree 0.1004ms 31.7603μs 31.4859 KOps/s 31.1087 KOps/s $\color{#35bf28}+1.21\%$
test_add_td 0.1572ms 50.9289μs 19.6352 KOps/s 21.5653 KOps/s $\textbf{\color{#d91a1a}-8.95\%}$
test_distributed 0.1934ms 0.1015ms 9.8540 KOps/s 9.5084 KOps/s $\color{#35bf28}+3.64\%$
test_tdmodule 0.7961ms 22.9560μs 43.5616 KOps/s 45.8690 KOps/s $\textbf{\color{#d91a1a}-5.03\%}$
test_tdmodule_dispatch 0.1854ms 39.8753μs 25.0782 KOps/s 26.0702 KOps/s $\color{#d91a1a}-3.81\%$
test_tdseq 44.1620μs 24.8524μs 40.2375 KOps/s 40.0649 KOps/s $\color{#35bf28}+0.43\%$
test_tdseq_dispatch 0.1400ms 44.6697μs 22.3865 KOps/s 23.5103 KOps/s $\color{#d91a1a}-4.78\%$
test_instantiation_functorch 1.5243ms 1.2878ms 776.4888 Ops/s 755.6997 Ops/s $\color{#35bf28}+2.75\%$
test_instantiation_td 1.4926ms 1.0089ms 991.1936 Ops/s 985.0442 Ops/s $\color{#35bf28}+0.62\%$
test_exec_functorch 0.2949ms 0.1561ms 6.4056 KOps/s 6.2943 KOps/s $\color{#35bf28}+1.77\%$
test_exec_functional_call 0.2821ms 0.1424ms 7.0206 KOps/s 6.8268 KOps/s $\color{#35bf28}+2.84\%$
test_exec_td 0.2573ms 0.1366ms 7.3224 KOps/s 6.8713 KOps/s $\textbf{\color{#35bf28}+6.56\%}$
test_exec_td_decorator 70.0215ms 0.1893ms 5.2812 KOps/s 5.6169 KOps/s $\textbf{\color{#d91a1a}-5.98\%}$
test_vmap_mlp_speed[True-True] 1.2652ms 0.8822ms 1.1335 KOps/s 1.1304 KOps/s $\color{#35bf28}+0.28\%$
test_vmap_mlp_speed[True-False] 0.8996ms 0.4753ms 2.1037 KOps/s 2.1353 KOps/s $\color{#d91a1a}-1.48\%$
test_vmap_mlp_speed[False-True] 0.8964ms 0.7604ms 1.3151 KOps/s 1.2933 KOps/s $\color{#35bf28}+1.68\%$
test_vmap_mlp_speed[False-False] 0.6085ms 0.3820ms 2.6178 KOps/s 2.5849 KOps/s $\color{#35bf28}+1.27\%$
test_vmap_mlp_speed_decorator[True-True] 2.9126ms 2.3720ms 421.5774 Ops/s 424.5754 Ops/s $\color{#d91a1a}-0.71\%$
test_vmap_mlp_speed_decorator[True-False] 0.9706ms 0.5181ms 1.9302 KOps/s 1.9427 KOps/s $\color{#d91a1a}-0.64\%$
test_vmap_mlp_speed_decorator[False-True] 2.6052ms 1.9421ms 514.9058 Ops/s 542.0365 Ops/s $\textbf{\color{#d91a1a}-5.01\%}$
test_vmap_mlp_speed_decorator[False-False] 0.7484ms 0.3984ms 2.5102 KOps/s 2.4886 KOps/s $\color{#35bf28}+0.87\%$
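
A note on reading the Change column: the figures above appear consistent with Change = (Ops - Ops on Repo HEAD) / Ops on Repo HEAD, and the bolded rows, which seem to be the ones counted as Improved or Worsened, all sit at or beyond roughly ±5%. The Python sketch below reproduces that arithmetic. The function names and the 5% threshold are assumptions inferred from the table, not taken from the benchmark tooling.

```python
# Hypothetical helpers for re-deriving the "Change" column of the table.
# The 5% threshold is an assumption inferred from which rows are bolded.

UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a cell such as '60.2317 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops_pr: str, ops_head: str) -> float:
    """Percentage change of the PR throughput relative to repo HEAD."""
    pr = parse_ops(ops_pr)
    head = parse_ops(ops_head)
    return (pr - head) / head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change as improved, worsened, or within the noise band."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


if __name__ == "__main__":
    # test_unbind_speed_stack1 (table reports +203.56%); note the mixed units.
    change = relative_change("1.5542 MOps/s", "511.9876 KOps/s")
    print(f"{change:+.2f}% -> {classify(change)}")

    # test_lock_nested (table reports -13.62%).
    change = relative_change("2.5678 KOps/s", "2.9727 KOps/s")
    print(f"{change:+.2f}% -> {classify(change)}")
```

Normalizing the units first matters for rows such as test_unbind_speed_stack1, where the PR value is in MOps/s and the HEAD value is in KOps/s.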

@vmoens merged commit 89348f1 into main Jan 18, 2024
45 checks passed
@vmoens deleted the auto-batch-size branch January 18, 2024 15:48

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 132. Improved: $\large\color{#35bf28}10$. Worsened: $\large\color{#d91a1a}16$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 60.7058ms 18.0805μs 55.3081 KOps/s 70.3450 KOps/s $\textbf{\color{#d91a1a}-21.38\%}$
test_plain_set_stack_nested 0.1406ms 0.1172ms 8.5309 KOps/s 8.3079 KOps/s $\color{#35bf28}+2.68\%$
test_plain_set_nested_inplace 33.1510μs 15.2015μs 65.7828 KOps/s 64.4587 KOps/s $\color{#35bf28}+2.05\%$
test_plain_set_stack_nested_inplace 0.1695ms 0.1448ms 6.9053 KOps/s 6.7110 KOps/s $\color{#35bf28}+2.89\%$
test_items 0.1223ms 4.8709μs 205.3017 KOps/s 208.7179 KOps/s $\color{#d91a1a}-1.64\%$
test_items_nested 0.5050ms 0.3440ms 2.9071 KOps/s 2.9057 KOps/s $\color{#35bf28}+0.05\%$
test_items_nested_locked 0.3889ms 0.3465ms 2.8863 KOps/s 2.8915 KOps/s $\color{#d91a1a}-0.18\%$
test_items_nested_leaf 0.2416ms 0.2028ms 4.9316 KOps/s 4.9299 KOps/s $\color{#35bf28}+0.03\%$
test_items_stack_nested 1.5419ms 1.3107ms 762.9737 Ops/s 749.8383 Ops/s $\color{#35bf28}+1.75\%$
test_items_stack_nested_leaf 1.2797ms 1.1500ms 869.5347 Ops/s 856.4557 Ops/s $\color{#35bf28}+1.53\%$
test_items_stack_nested_locked 1.8847ms 0.9175ms 1.0899 KOps/s 1.0943 KOps/s $\color{#d91a1a}-0.41\%$
test_keys 20.9810μs 4.7781μs 209.2879 KOps/s 219.3003 KOps/s $\color{#d91a1a}-4.57\%$
test_keys_nested 0.4804ms 96.1995μs 10.3951 KOps/s 10.4481 KOps/s $\color{#d91a1a}-0.51\%$
test_keys_nested_locked 0.1310ms 98.4147μs 10.1611 KOps/s 10.1361 KOps/s $\color{#35bf28}+0.25\%$
test_keys_nested_leaf 0.1802ms 79.0861μs 12.6444 KOps/s 12.6843 KOps/s $\color{#d91a1a}-0.31\%$
test_keys_stack_nested 1.2933ms 1.1335ms 882.2111 Ops/s 860.0208 Ops/s $\color{#35bf28}+2.58\%$
test_keys_stack_nested_leaf 1.2475ms 1.1281ms 886.4463 Ops/s 856.2838 Ops/s $\color{#35bf28}+3.52\%$
test_keys_stack_nested_locked 0.8736ms 0.7469ms 1.3389 KOps/s 1.3531 KOps/s $\color{#d91a1a}-1.04\%$
test_values 7.4537μs 1.8869μs 529.9649 KOps/s 524.0581 KOps/s $\color{#35bf28}+1.13\%$
test_values_nested 66.3010μs 45.2089μs 22.1195 KOps/s 21.9510 KOps/s $\color{#35bf28}+0.77\%$
test_values_nested_locked 79.4810μs 47.5788μs 21.0177 KOps/s 20.8932 KOps/s $\color{#35bf28}+0.60\%$
test_values_nested_leaf 0.2272ms 39.7645μs 25.1481 KOps/s 25.0472 KOps/s $\color{#35bf28}+0.40\%$
test_values_stack_nested 1.1453ms 0.9670ms 1.0342 KOps/s 1.0363 KOps/s $\color{#d91a1a}-0.20\%$
test_values_stack_nested_leaf 1.1614ms 0.9546ms 1.0476 KOps/s 1.0369 KOps/s $\color{#35bf28}+1.03\%$
test_values_stack_nested_locked 1.0042ms 0.5917ms 1.6901 KOps/s 1.7386 KOps/s $\color{#d91a1a}-2.79\%$
test_membership 16.0600μs 1.0519μs 950.7047 KOps/s 1.0564 MOps/s $\textbf{\color{#d91a1a}-10.01\%}$
test_membership_nested 24.0310μs 2.9430μs 339.7870 KOps/s 344.4663 KOps/s $\color{#d91a1a}-1.36\%$
test_membership_nested_leaf 20.9700μs 2.9126μs 343.3320 KOps/s 346.3035 KOps/s $\color{#d91a1a}-0.86\%$
test_membership_stacked_nested 93.1920μs 11.0955μs 90.1268 KOps/s 88.7966 KOps/s $\color{#35bf28}+1.50\%$
test_membership_stacked_nested_leaf 30.0400μs 11.0023μs 90.8903 KOps/s 88.3462 KOps/s $\color{#35bf28}+2.88\%$
test_membership_nested_last 35.8300μs 5.3363μs 187.3940 KOps/s 184.8738 KOps/s $\color{#35bf28}+1.36\%$
test_membership_nested_leaf_last 0.6323ms 5.3569μs 186.6741 KOps/s 185.3889 KOps/s $\color{#35bf28}+0.69\%$
test_membership_stacked_nested_last 0.2600ms 0.1415ms 7.0674 KOps/s 6.3560 KOps/s $\textbf{\color{#35bf28}+11.19\%}$
test_membership_stacked_nested_leaf_last 97.0210μs 13.0746μs 76.4840 KOps/s 76.5849 KOps/s $\color{#d91a1a}-0.13\%$
test_nested_getleaf 0.1973ms 8.3621μs 119.5869 KOps/s 119.1670 KOps/s $\color{#35bf28}+0.35\%$
test_nested_get 34.9590μs 7.9451μs 125.8643 KOps/s 126.0226 KOps/s $\color{#d91a1a}-0.13\%$
test_stacked_getleaf 0.5040ms 0.3160ms 3.1641 KOps/s 3.0306 KOps/s $\color{#35bf28}+4.41\%$
test_stacked_get 0.3195ms 0.2813ms 3.5546 KOps/s 3.3557 KOps/s $\textbf{\color{#35bf28}+5.93\%}$
test_nested_getitemleaf 22.6810μs 8.4287μs 118.6416 KOps/s 117.9759 KOps/s $\color{#35bf28}+0.56\%$
test_nested_getitem 29.2210μs 7.9636μs 125.5706 KOps/s 124.5136 KOps/s $\color{#35bf28}+0.85\%$
test_stacked_getitemleaf 0.4347ms 0.3198ms 3.1272 KOps/s 3.0248 KOps/s $\color{#35bf28}+3.39\%$
test_stacked_getitem 0.3962ms 0.2826ms 3.5384 KOps/s 3.3578 KOps/s $\textbf{\color{#35bf28}+5.38\%}$
test_lock_nested 0.8307ms 0.4020ms 2.4874 KOps/s 2.8451 KOps/s $\textbf{\color{#d91a1a}-12.57\%}$
test_lock_stack_nested 84.5780ms 6.4662ms 154.6493 Ops/s 158.6667 Ops/s $\color{#d91a1a}-2.53\%$
test_unlock_nested 1.0757ms 0.4018ms 2.4885 KOps/s 2.8724 KOps/s $\textbf{\color{#d91a1a}-13.36\%}$
test_unlock_stack_nested 84.0978ms 6.8683ms 145.5968 Ops/s 158.9672 Ops/s $\textbf{\color{#d91a1a}-8.41\%}$
test_flatten_speed 0.5368ms 0.2627ms 3.8072 KOps/s 3.7688 KOps/s $\color{#35bf28}+1.02\%$
test_unflatten_speed 0.4089ms 0.3595ms 2.7817 KOps/s 2.7102 KOps/s $\color{#35bf28}+2.64\%$
test_common_ops 1.1866ms 0.6228ms 1.6056 KOps/s 1.4296 KOps/s $\textbf{\color{#35bf28}+12.31\%}$
test_creation 17.2300μs 1.5467μs 646.5313 KOps/s 657.0377 KOps/s $\color{#d91a1a}-1.60\%$
test_creation_empty 24.1910μs 8.7665μs 114.0700 KOps/s 106.8143 KOps/s $\textbf{\color{#35bf28}+6.79\%}$
test_creation_nested_1 28.1100μs 10.5412μs 94.8657 KOps/s 89.3020 KOps/s $\textbf{\color{#35bf28}+6.23\%}$
test_creation_nested_2 37.4310μs 12.9397μs 77.2814 KOps/s 73.2884 KOps/s $\textbf{\color{#35bf28}+5.45\%}$
test_clone 0.1985ms 13.6643μs 73.1835 KOps/s 73.0998 KOps/s $\color{#35bf28}+0.11\%$
test_getitem[int] 27.2600μs 10.7975μs 92.6139 KOps/s 93.4223 KOps/s $\color{#d91a1a}-0.87\%$
test_getitem[slice_int] 0.2087ms 21.1884μs 47.1956 KOps/s 46.9396 KOps/s $\color{#35bf28}+0.55\%$
test_getitem[range] 55.3520μs 35.2973μs 28.3308 KOps/s 28.1185 KOps/s $\color{#35bf28}+0.75\%$
test_getitem[tuple] 41.2800μs 18.5231μs 53.9866 KOps/s 53.2330 KOps/s $\color{#35bf28}+1.42\%$
test_getitem[list] 0.4387ms 36.6844μs 27.2596 KOps/s 28.7907 KOps/s $\textbf{\color{#d91a1a}-5.32\%}$
test_setitem_dim[int] 47.5500μs 30.7978μs 32.4698 KOps/s 36.1851 KOps/s $\textbf{\color{#d91a1a}-10.27\%}$
test_setitem_dim[slice_int] 67.6110μs 50.5535μs 19.7810 KOps/s 20.3604 KOps/s $\color{#d91a1a}-2.85\%$
test_setitem_dim[range] 81.1420μs 63.0095μs 15.8706 KOps/s 15.9952 KOps/s $\color{#d91a1a}-0.78\%$
test_setitem_dim[tuple] 0.2473ms 44.8937μs 22.2749 KOps/s 23.4786 KOps/s $\textbf{\color{#d91a1a}-5.13\%}$
test_setitem 0.1480ms 20.0030μs 49.9924 KOps/s 51.0329 KOps/s $\color{#d91a1a}-2.04\%$
test_set 0.1912ms 19.9625μs 50.0940 KOps/s 51.1205 KOps/s $\color{#d91a1a}-2.01\%$
test_set_shared 3.1734ms 0.1079ms 9.2716 KOps/s 9.7478 KOps/s $\color{#d91a1a}-4.89\%$
test_update 0.1969ms 22.9389μs 43.5940 KOps/s 45.5417 KOps/s $\color{#d91a1a}-4.28\%$
test_update_nested 0.1285ms 29.1664μs 34.2860 KOps/s 35.5976 KOps/s $\color{#d91a1a}-3.68\%$
test_set_nested 0.1167ms 19.5397μs 51.1778 KOps/s 49.7887 KOps/s $\color{#35bf28}+2.79\%$
test_set_nested_new 0.1323ms 23.0178μs 43.4447 KOps/s 42.4360 KOps/s $\color{#35bf28}+2.38\%$
test_select 0.1170ms 35.5955μs 28.0935 KOps/s 27.6440 KOps/s $\color{#35bf28}+1.63\%$
test_select_nested 85.0910μs 53.8080μs 18.5846 KOps/s 18.9440 KOps/s $\color{#d91a1a}-1.90\%$
test_exclude_nested 0.1399ms 0.1091ms 9.1671 KOps/s 9.3546 KOps/s $\color{#d91a1a}-2.00\%$
test_empty[True] 0.4199ms 0.3179ms 3.1452 KOps/s 3.0822 KOps/s $\color{#35bf28}+2.04\%$
test_empty[False] 2.3400μs 0.8744μs 1.1436 MOps/s 1.1815 MOps/s $\color{#d91a1a}-3.21\%$
test_to 70.0810μs 55.2068μs 18.1137 KOps/s 19.2467 KOps/s $\textbf{\color{#d91a1a}-5.89\%}$
test_to_nonblocking 0.1817ms 33.5948μs 29.7665 KOps/s 30.0318 KOps/s $\color{#d91a1a}-0.88\%$
test_unbind_speed 0.3867ms 0.3166ms 3.1583 KOps/s 3.7535 KOps/s $\textbf{\color{#d91a1a}-15.86\%}$
test_unbind_speed_stack0 80.7460ms 3.7167ms 269.0576 Ops/s 272.1622 Ops/s $\color{#d91a1a}-1.14\%$
test_unbind_speed_stack1 1.3050μs 0.5334μs 1.8748 MOps/s 568.6596 KOps/s $\textbf{\color{#35bf28}+229.70\%}$
test_split 75.3856ms 1.6877ms 592.5232 Ops/s 653.6257 Ops/s $\textbf{\color{#d91a1a}-9.35\%}$
test_chunk 76.7415ms 1.6717ms 598.1819 Ops/s 609.5747 Ops/s $\color{#d91a1a}-1.87\%$
test_creation[device0] 0.2095ms 71.6542μs 13.9559 KOps/s 14.2636 KOps/s $\color{#d91a1a}-2.16\%$
test_creation_from_tensor 0.2000ms 53.3242μs 18.7532 KOps/s 18.6891 KOps/s $\color{#35bf28}+0.34\%$
test_add_one[memmap_tensor0] 0.1458ms 6.9906μs 143.0493 KOps/s 144.9980 KOps/s $\color{#d91a1a}-1.34\%$
test_contiguous[memmap_tensor0] 23.5000μs 0.6319μs 1.5826 MOps/s 1.5582 MOps/s $\color{#35bf28}+1.57\%$
test_stack[memmap_tensor0] 19.0100μs 4.5910μs 217.8197 KOps/s 222.8749 KOps/s $\color{#d91a1a}-2.27\%$
test_memmaptd_index 0.4661ms 0.2522ms 3.9657 KOps/s 3.9222 KOps/s $\color{#35bf28}+1.11\%$
test_memmaptd_index_astensor 0.5637ms 0.3104ms 3.2212 KOps/s 3.1898 KOps/s $\color{#35bf28}+0.98\%$
test_memmaptd_index_op 0.9512ms 0.6175ms 1.6193 KOps/s 1.5922 KOps/s $\color{#35bf28}+1.71\%$
test_serialize_model 0.1733s 97.5548ms 10.2506 Ops/s 9.6184 Ops/s $\textbf{\color{#35bf28}+6.57\%}$
test_serialize_model_pickle 1.3775s 1.2396s 0.8067 Ops/s 0.8082 Ops/s $\color{#d91a1a}-0.18\%$
test_serialize_weights 0.1641s 95.1385ms 10.5110 Ops/s 10.0494 Ops/s $\color{#35bf28}+4.59\%$
test_serialize_weights_returnearly 0.2485s 69.6635ms 14.3547 Ops/s 12.7813 Ops/s $\textbf{\color{#35bf28}+12.31\%}$
test_serialize_weights_pickle 1.3538s 1.2377s 0.8080 Ops/s 0.8083 Ops/s $\color{#d91a1a}-0.05\%$
test_reshape_pytree 0.1441ms 25.0614μs 39.9021 KOps/s 40.4943 KOps/s $\color{#d91a1a}-1.46\%$
test_reshape_td 70.4010μs 29.3208μs 34.1055 KOps/s 34.4969 KOps/s $\color{#d91a1a}-1.13\%$
test_view_pytree 80.8220μs 24.9174μs 40.1326 KOps/s 40.2219 KOps/s $\color{#d91a1a}-0.22\%$
test_view_td 25.8800μs 4.2205μs 236.9379 KOps/s 227.3165 KOps/s $\color{#35bf28}+4.23\%$
test_unbind_pytree 0.1385ms 30.7499μs 32.5204 KOps/s 32.3050 KOps/s $\color{#35bf28}+0.67\%$
test_unbind_td 0.1852ms 50.7145μs 19.7182 KOps/s 24.2214 KOps/s $\textbf{\color{#d91a1a}-18.59\%}$
test_split_pytree 0.1176ms 28.0353μs 35.6693 KOps/s 34.8844 KOps/s $\color{#35bf28}+2.25\%$
test_split_td 0.7003ms 41.4874μs 24.1037 KOps/s 25.4920 KOps/s $\textbf{\color{#d91a1a}-5.45\%}$
test_add_pytree 0.1766ms 37.9271μs 26.3664 KOps/s 27.2935 KOps/s $\color{#d91a1a}-3.40\%$
test_add_td 82.7210μs 52.4996μs 19.0478 KOps/s 20.1723 KOps/s $\textbf{\color{#d91a1a}-5.57\%}$
test_distributed 0.2200ms 69.2428μs 14.4419 KOps/s 14.4968 KOps/s $\color{#d91a1a}-0.38\%$
test_tdmodule 0.1636ms 18.0229μs 55.4849 KOps/s 54.4247 KOps/s $\color{#35bf28}+1.95\%$
test_tdmodule_dispatch 0.1350ms 33.5745μs 29.7845 KOps/s 28.6045 KOps/s $\color{#35bf28}+4.13\%$
test_tdseq 43.5800μs 20.8349μs 47.9965 KOps/s 46.8284 KOps/s $\color{#35bf28}+2.49\%$
test_tdseq_dispatch 56.6420μs 36.9428μs 27.0689 KOps/s 26.4849 KOps/s $\color{#35bf28}+2.21\%$
test_instantiation_functorch 1.7598ms 1.6771ms 596.2824 Ops/s 594.8874 Ops/s $\color{#35bf28}+0.23\%$
test_instantiation_td 1.7048ms 1.1733ms 852.2991 Ops/s 847.4178 Ops/s $\color{#35bf28}+0.58\%$
test_exec_functorch 0.2991ms 0.1614ms 6.1975 KOps/s 6.1297 KOps/s $\color{#35bf28}+1.11\%$
test_exec_functional_call 0.2862ms 0.1611ms 6.2063 KOps/s 6.1662 KOps/s $\color{#35bf28}+0.65\%$
test_exec_td 0.3036ms 0.1528ms 6.5436 KOps/s 6.2349 KOps/s $\color{#35bf28}+4.95\%$
test_exec_td_decorator 0.6199ms 0.1913ms 5.2261 KOps/s 5.0836 KOps/s $\color{#35bf28}+2.80\%$
test_vmap_mlp_speed[True-True] 1.2740ms 1.1076ms 902.8400 Ops/s 894.1984 Ops/s $\color{#35bf28}+0.97\%$
test_vmap_mlp_speed[True-False] 0.8486ms 0.6640ms 1.5061 KOps/s 1.4929 KOps/s $\color{#35bf28}+0.89\%$
test_vmap_mlp_speed[False-True] 1.1732ms 1.0202ms 980.1984 Ops/s 974.3535 Ops/s $\color{#35bf28}+0.60\%$
test_vmap_mlp_speed[False-False] 0.7447ms 0.5941ms 1.6831 KOps/s 1.6730 KOps/s $\color{#35bf28}+0.61\%$
test_vmap_mlp_speed_decorator[True-True] 3.2299ms 2.5730ms 388.6517 Ops/s 412.7328 Ops/s $\textbf{\color{#d91a1a}-5.83\%}$
test_vmap_mlp_speed_decorator[True-False] 1.1651ms 0.7109ms 1.4067 KOps/s 1.3967 KOps/s $\color{#35bf28}+0.72\%$
test_vmap_mlp_speed_decorator[False-True] 2.4691ms 2.1011ms 475.9449 Ops/s 489.4464 Ops/s $\color{#d91a1a}-2.76\%$
test_vmap_mlp_speed_decorator[False-False] 1.0025ms 0.6146ms 1.6271 KOps/s 1.5957 KOps/s $\color{#35bf28}+1.97\%$
test_vmap_transformer_speed[True-True] 12.8401ms 12.4102ms 80.5791 Ops/s 78.2295 Ops/s $\color{#35bf28}+3.00\%$
test_vmap_transformer_speed[True-False] 8.5356ms 8.2672ms 120.9604 Ops/s 119.7492 Ops/s $\color{#35bf28}+1.01\%$
test_vmap_transformer_speed[False-True] 13.7821ms 12.8799ms 77.6403 Ops/s 81.6992 Ops/s $\color{#d91a1a}-4.97\%$
test_vmap_transformer_speed[False-False] 9.0272ms 8.3940ms 119.1331 Ops/s 122.8017 Ops/s $\color{#d91a1a}-2.99\%$
test_vmap_transformer_speed_decorator[True-True] 78.0867ms 77.0306ms 12.9819 Ops/s 12.5368 Ops/s $\color{#35bf28}+3.55\%$
test_vmap_transformer_speed_decorator[True-False] 22.0509ms 20.3478ms 49.1454 Ops/s 50.6837 Ops/s $\color{#d91a1a}-3.04\%$
test_vmap_transformer_speed_decorator[False-True] 70.5705ms 69.5856ms 14.3708 Ops/s 15.1433 Ops/s $\textbf{\color{#d91a1a}-5.10\%}$
test_vmap_transformer_speed_decorator[False-False] 21.4433ms 19.8034ms 50.4963 Ops/s 51.8218 Ops/s $\color{#d91a1a}-2.56\%$
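
For anyone cross-checking the columns: the Ops column appears to be the reciprocal of the Mean column (for example, a mean of 18.0805μs for test_plain_set_nested corresponds to about 55.31 KOps/s). A minimal Python sketch of that conversion follows. The helper names are hypothetical and the set of unit suffixes is an assumption based on the units seen in the table.

```python
# Hypothetical helpers relating the "Mean" column to the "Ops" column.
# Both the Greek mu (U+03BC) and the micro sign (U+00B5) are accepted,
# since either may appear in copied benchmark output.

UNIT_SECONDS = {
    "ns": 1e-9,
    "us": 1e-6,
    "\u03bcs": 1e-6,  # Greek small letter mu + "s"
    "\u00b5s": 1e-6,  # micro sign + "s"
    "ms": 1e-3,
    "s": 1.0,
}


def parse_seconds(text: str) -> float:
    """Convert a cell such as '0.1400ms' to seconds."""
    # Try two-character suffixes before the bare "s" so that "ms" is not
    # mistaken for seconds.
    for unit in sorted(UNIT_SECONDS, key=len, reverse=True):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * UNIT_SECONDS[unit]
    raise ValueError(f"unrecognized time unit in {text!r}")


def ops_per_second(mean: str) -> float:
    """Throughput implied by a mean runtime: one operation per mean time."""
    return 1.0 / parse_seconds(mean)


if __name__ == "__main__":
    # test_plain_set_nested: the table lists 55.3081 KOps/s for this mean.
    print(ops_per_second("18.0805\u03bcs") / 1e3, "KOps/s")

    # test_serialize_model_pickle: the table lists 0.8067 Ops/s.
    print(ops_per_second("1.2396s"), "Ops/s")
```

Small differences in the last digit are expected, because the table shows rounded means.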
