[Performance] Faster __setitem__ #985

Merged: 2 commits, Sep 11, 2024

vmoens (Contributor) commented Sep 11, 2024

No description provided.

facebook-github-bot added the CLA Signed label Sep 11, 2024

github-actions bot commented Sep 11, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 222. Improved: $\large\color{#35bf28}16$. Worsened: $\large\color{#d91a1a}10$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 43.7020μs 19.7683μs 50.5861 KOps/s 50.6269 KOps/s $\color{#d91a1a}-0.08\%$
test_plain_set_stack_nested 76.2530μs 19.6018μs 51.0158 KOps/s 49.5060 KOps/s $\color{#35bf28}+3.05\%$
test_plain_set_nested_inplace 72.2440μs 21.0585μs 47.4868 KOps/s 46.7433 KOps/s $\color{#35bf28}+1.59\%$
test_plain_set_stack_nested_inplace 54.9220μs 21.0145μs 47.5863 KOps/s 46.8539 KOps/s $\color{#35bf28}+1.56\%$
test_items 37.0490μs 4.1491μs 241.0170 KOps/s 239.4892 KOps/s $\color{#35bf28}+0.64\%$
test_items_nested 0.5584ms 0.3280ms 3.0491 KOps/s 3.0665 KOps/s $\color{#d91a1a}-0.57\%$
test_items_nested_locked 0.7369ms 0.3297ms 3.0329 KOps/s 3.0506 KOps/s $\color{#d91a1a}-0.58\%$
test_items_nested_leaf 0.1506ms 83.6913μs 11.9487 KOps/s 11.8463 KOps/s $\color{#35bf28}+0.86\%$
test_items_stack_nested 0.5518ms 0.3336ms 2.9974 KOps/s 3.0575 KOps/s $\color{#d91a1a}-1.96\%$
test_items_stack_nested_leaf 0.1621ms 84.9211μs 11.7756 KOps/s 12.2768 KOps/s $\color{#d91a1a}-4.08\%$
test_items_stack_nested_locked 0.7023ms 0.3349ms 2.9863 KOps/s 3.0266 KOps/s $\color{#d91a1a}-1.33\%$
test_keys 26.8600μs 3.5421μs 282.3166 KOps/s 285.9806 KOps/s $\color{#d91a1a}-1.28\%$
test_keys_nested 0.2006ms 95.4475μs 10.4770 KOps/s 10.3481 KOps/s $\color{#35bf28}+1.25\%$
test_keys_nested_locked 0.6447ms 0.1029ms 9.7190 KOps/s 9.8357 KOps/s $\color{#d91a1a}-1.19\%$
test_keys_nested_leaf 0.1448ms 82.2726μs 12.1547 KOps/s 12.6615 KOps/s $\color{#d91a1a}-4.00\%$
test_keys_stack_nested 0.1700ms 96.9058μs 10.3193 KOps/s 10.5638 KOps/s $\color{#d91a1a}-2.31\%$
test_keys_stack_nested_leaf 0.1592ms 80.1452μs 12.4773 KOps/s 12.6525 KOps/s $\color{#d91a1a}-1.38\%$
test_keys_stack_nested_locked 0.2175ms 0.1010ms 9.8963 KOps/s 10.1344 KOps/s $\color{#d91a1a}-2.35\%$
test_values 6.8248μs 1.0805μs 925.5293 KOps/s 930.5236 KOps/s $\color{#d91a1a}-0.54\%$
test_values_nested 92.0120μs 47.8231μs 20.9104 KOps/s 20.8680 KOps/s $\color{#35bf28}+0.20\%$
test_values_nested_locked 0.1085ms 47.2998μs 21.1418 KOps/s 20.7587 KOps/s $\color{#35bf28}+1.85\%$
test_values_nested_leaf 82.0930μs 42.5775μs 23.4866 KOps/s 23.4490 KOps/s $\color{#35bf28}+0.16\%$
test_values_stack_nested 93.2440μs 47.9519μs 20.8543 KOps/s 20.8655 KOps/s $\color{#d91a1a}-0.05\%$
test_values_stack_nested_leaf 83.4450μs 42.0633μs 23.7737 KOps/s 23.8307 KOps/s $\color{#d91a1a}-0.24\%$
test_values_stack_nested_locked 0.1006ms 48.0192μs 20.8250 KOps/s 20.6665 KOps/s $\color{#35bf28}+0.77\%$
test_membership 5.7607μs 0.6778μs 1.4753 MOps/s 1.1994 MOps/s $\textbf{\color{#35bf28}+23.00\%}$
test_membership_nested 27.9120μs 2.5064μs 398.9718 KOps/s 372.8840 KOps/s $\textbf{\color{#35bf28}+7.00\%}$
test_membership_nested_leaf 32.9020μs 2.5525μs 391.7796 KOps/s 372.0231 KOps/s $\textbf{\color{#35bf28}+5.31\%}$
test_membership_stacked_nested 32.5710μs 2.5301μs 395.2409 KOps/s 371.1607 KOps/s $\textbf{\color{#35bf28}+6.49\%}$
test_membership_stacked_nested_leaf 20.2480μs 2.5478μs 392.4901 KOps/s 370.5554 KOps/s $\textbf{\color{#35bf28}+5.92\%}$
test_membership_nested_last 32.5400μs 3.6900μs 271.0017 KOps/s 244.9960 KOps/s $\textbf{\color{#35bf28}+10.61\%}$
test_membership_nested_leaf_last 28.8840μs 3.7914μs 263.7570 KOps/s 259.2550 KOps/s $\color{#35bf28}+1.74\%$
test_membership_stacked_nested_last 36.1470μs 4.2987μs 232.6302 KOps/s 74.3801 KOps/s $\textbf{\color{#35bf28}+212.76\%}$
test_membership_stacked_nested_leaf_last 24.5760μs 4.3835μs 228.1285 KOps/s 76.8339 KOps/s $\textbf{\color{#35bf28}+196.91\%}$
test_nested_getleaf 57.0770μs 10.4257μs 95.9165 KOps/s 95.3602 KOps/s $\color{#35bf28}+0.58\%$
test_nested_get 37.5700μs 10.0169μs 99.8313 KOps/s 99.0250 KOps/s $\color{#35bf28}+0.81\%$
test_stacked_getleaf 34.0730μs 10.4747μs 95.4677 KOps/s 94.0921 KOps/s $\color{#35bf28}+1.46\%$
test_stacked_get 33.7230μs 10.1589μs 98.4359 KOps/s 99.8489 KOps/s $\color{#d91a1a}-1.42\%$
test_nested_getitemleaf 53.9540μs 10.8826μs 91.8897 KOps/s 89.4438 KOps/s $\color{#35bf28}+2.73\%$
test_nested_getitem 37.6700μs 10.1090μs 98.9220 KOps/s 98.0792 KOps/s $\color{#35bf28}+0.86\%$
test_stacked_getitemleaf 53.2500μs 10.7499μs 93.0241 KOps/s 92.8758 KOps/s $\color{#35bf28}+0.16\%$
test_stacked_getitem 39.9550μs 9.9541μs 100.4613 KOps/s 95.9691 KOps/s $\color{#35bf28}+4.68\%$
test_lock_nested 91.0414ms 0.5740ms 1.7422 KOps/s 2.1243 KOps/s $\textbf{\color{#d91a1a}-17.99\%}$
test_lock_stack_nested 0.5270ms 0.4421ms 2.2618 KOps/s 2.3285 KOps/s $\color{#d91a1a}-2.86\%$
test_unlock_nested 0.1008s 0.5022ms 1.9911 KOps/s 2.5056 KOps/s $\textbf{\color{#d91a1a}-20.53\%}$
test_unlock_stack_nested 0.4507ms 0.3652ms 2.7383 KOps/s 2.8395 KOps/s $\color{#d91a1a}-3.57\%$
test_flatten_speed 0.2557ms 0.1041ms 9.6058 KOps/s 9.6144 KOps/s $\color{#d91a1a}-0.09\%$
test_unflatten_speed 0.8302ms 0.4571ms 2.1878 KOps/s 2.1921 KOps/s $\color{#d91a1a}-0.20\%$
test_common_ops 1.8974ms 1.0786ms 927.1560 Ops/s 933.7538 Ops/s $\color{#d91a1a}-0.71\%$
test_creation 30.5260μs 2.0389μs 490.4577 KOps/s 481.7022 KOps/s $\color{#35bf28}+1.82\%$
test_creation_empty 55.2340μs 15.8990μs 62.8970 KOps/s 58.2018 KOps/s $\textbf{\color{#35bf28}+8.07\%}$
test_creation_nested_1 1.2741ms 19.9306μs 50.1740 KOps/s 48.7585 KOps/s $\color{#35bf28}+2.90\%$
test_creation_nested_2 66.2530μs 23.1977μs 43.1078 KOps/s 41.0878 KOps/s $\color{#35bf28}+4.92\%$
test_clone 0.1559ms 16.7753μs 59.6115 KOps/s 61.0164 KOps/s $\color{#d91a1a}-2.30\%$
test_getitem[int] 0.7285ms 16.2275μs 61.6239 KOps/s 60.2766 KOps/s $\color{#35bf28}+2.24\%$
test_getitem[slice_int] 0.1346ms 29.4544μs 33.9508 KOps/s 33.7445 KOps/s $\color{#35bf28}+0.61\%$
test_getitem[range] 0.1687ms 58.2448μs 17.1689 KOps/s 17.4578 KOps/s $\color{#d91a1a}-1.65\%$
test_getitem[tuple] 0.1387ms 24.8195μs 40.2909 KOps/s 40.9777 KOps/s $\color{#d91a1a}-1.68\%$
test_getitem[list] 0.7665ms 51.7614μs 19.3194 KOps/s 18.8465 KOps/s $\color{#35bf28}+2.51\%$
test_setitem_dim[int] 72.0640μs 33.0241μs 30.2809 KOps/s 25.1177 KOps/s $\textbf{\color{#35bf28}+20.56\%}$
test_setitem_dim[slice_int] 0.1124ms 60.2953μs 16.5850 KOps/s 14.7726 KOps/s $\textbf{\color{#35bf28}+12.27\%}$
test_setitem_dim[range] 0.1366ms 83.2132μs 12.0173 KOps/s 10.9030 KOps/s $\textbf{\color{#35bf28}+10.22\%}$
test_setitem_dim[tuple] 83.6660μs 48.9521μs 20.4281 KOps/s 18.0067 KOps/s $\textbf{\color{#35bf28}+13.45\%}$
test_setitem 77.9850μs 28.5089μs 35.0768 KOps/s 35.5112 KOps/s $\color{#d91a1a}-1.22\%$
test_set 0.2151ms 28.0594μs 35.6386 KOps/s 36.1285 KOps/s $\color{#d91a1a}-1.36\%$
test_set_shared 1.4122ms 0.2119ms 4.7192 KOps/s 4.7274 KOps/s $\color{#d91a1a}-0.17\%$
test_update 0.1726ms 34.4990μs 28.9864 KOps/s 29.1837 KOps/s $\color{#d91a1a}-0.68\%$
test_update_nested 1.0129ms 43.1853μs 23.1560 KOps/s 22.4208 KOps/s $\color{#35bf28}+3.28\%$
test_update__nested 0.2120ms 35.0952μs 28.4939 KOps/s 29.8335 KOps/s $\color{#d91a1a}-4.49\%$
test_set_nested 0.2822ms 31.0555μs 32.2004 KOps/s 33.3184 KOps/s $\color{#d91a1a}-3.36\%$
test_set_nested_new 0.3777ms 36.5346μs 27.3713 KOps/s 28.3331 KOps/s $\color{#d91a1a}-3.39\%$
test_select 0.3679ms 52.8382μs 18.9257 KOps/s 18.7745 KOps/s $\color{#35bf28}+0.81\%$
test_select_nested 0.1210ms 61.3582μs 16.2977 KOps/s 16.6508 KOps/s $\color{#d91a1a}-2.12\%$
test_exclude_nested 0.1613ms 76.8307μs 13.0156 KOps/s 13.1507 KOps/s $\color{#d91a1a}-1.03\%$
test_empty[True] 0.6906ms 0.3150ms 3.1748 KOps/s 3.1720 KOps/s $\color{#35bf28}+0.09\%$
test_empty[False] 14.8078μs 1.2220μs 818.2985 KOps/s 833.2730 KOps/s $\color{#d91a1a}-1.80\%$
test_unbind_speed 0.6202ms 0.2965ms 3.3724 KOps/s 3.4029 KOps/s $\color{#d91a1a}-0.90\%$
test_unbind_speed_stack0 0.4847ms 0.2913ms 3.4328 KOps/s 3.5675 KOps/s $\color{#d91a1a}-3.78\%$
test_unbind_speed_stack1 89.9919ms 0.7899ms 1.2660 KOps/s 1.4022 KOps/s $\textbf{\color{#d91a1a}-9.72\%}$
test_split 96.2916ms 2.2939ms 435.9463 Ops/s 457.6720 Ops/s $\color{#d91a1a}-4.75\%$
test_chunk 3.0589ms 2.0084ms 497.9130 Ops/s 460.5361 Ops/s $\textbf{\color{#35bf28}+8.12\%}$
test_creation[device0] 0.2819ms 0.1164ms 8.5875 KOps/s 8.5743 KOps/s $\color{#35bf28}+0.15\%$
test_creation_from_tensor 5.0547ms 0.1179ms 8.4838 KOps/s 8.5626 KOps/s $\color{#d91a1a}-0.92\%$
test_add_one[memmap_tensor0] 0.4798ms 7.6876μs 130.0798 KOps/s 138.1945 KOps/s $\textbf{\color{#d91a1a}-5.87\%}$
test_contiguous[memmap_tensor0] 24.6760μs 1.9042μs 525.1608 KOps/s 523.8325 KOps/s $\color{#35bf28}+0.25\%$
test_stack[memmap_tensor0] 0.1213ms 5.7894μs 172.7286 KOps/s 173.1243 KOps/s $\color{#d91a1a}-0.23\%$
test_memmaptd_index 1.1458ms 0.4001ms 2.4993 KOps/s 2.5228 KOps/s $\color{#d91a1a}-0.93\%$
test_memmaptd_index_astensor 0.9639ms 0.4786ms 2.0893 KOps/s 2.1116 KOps/s $\color{#d91a1a}-1.06\%$
test_memmaptd_index_op 1.6073ms 1.0002ms 999.7709 Ops/s 1.0081 KOps/s $\color{#d91a1a}-0.83\%$
test_serialize_model 0.1240s 0.1150s 8.6974 Ops/s 8.3434 Ops/s $\color{#35bf28}+4.24\%$
test_serialize_model_pickle 0.4555s 0.3919s 2.5517 Ops/s 2.5097 Ops/s $\color{#35bf28}+1.67\%$
test_serialize_weights 0.1235s 0.1160s 8.6181 Ops/s 8.6203 Ops/s $\color{#d91a1a}-0.03\%$
test_serialize_weights_returnearly 0.1729s 0.1607s 6.2246 Ops/s 6.3347 Ops/s $\color{#d91a1a}-1.74\%$
test_serialize_weights_pickle 0.6776s 0.4570s 2.1882 Ops/s 2.1882 Ops/s $+0.00\%$
test_serialize_weights_filesystem 0.2333s 0.1550s 6.4515 Ops/s 7.2057 Ops/s $\textbf{\color{#d91a1a}-10.47\%}$
test_serialize_model_filesystem 0.1537s 0.1452s 6.8864 Ops/s 6.6616 Ops/s $\color{#35bf28}+3.37\%$
test_reshape_pytree 0.1028ms 38.1067μs 26.2421 KOps/s 26.4418 KOps/s $\color{#d91a1a}-0.76\%$
test_reshape_td 0.1102ms 44.6893μs 22.3767 KOps/s 21.8034 KOps/s $\color{#35bf28}+2.63\%$
test_view_pytree 93.9950μs 37.8452μs 26.4234 KOps/s 26.3711 KOps/s $\color{#35bf28}+0.20\%$
test_view_td 0.1185ms 50.6842μs 19.7300 KOps/s 19.4391 KOps/s $\color{#35bf28}+1.50\%$
test_unbind_pytree 75.5810μs 35.2141μs 28.3977 KOps/s 28.4953 KOps/s $\color{#d91a1a}-0.34\%$
test_unbind_td 0.3120ms 44.2090μs 22.6198 KOps/s 22.9988 KOps/s $\color{#d91a1a}-1.65\%$
test_split_pytree 92.9230μs 37.4009μs 26.7373 KOps/s 26.7680 KOps/s $\color{#d91a1a}-0.11\%$
test_split_td 0.2405ms 56.8919μs 17.5772 KOps/s 17.7197 KOps/s $\color{#d91a1a}-0.80\%$
test_add_pytree 0.1169ms 45.6239μs 21.9184 KOps/s 22.9156 KOps/s $\color{#d91a1a}-4.35\%$
test_add_td 0.1760ms 81.2177μs 12.3126 KOps/s 12.5959 KOps/s $\color{#d91a1a}-2.25\%$
test_compile_add_one_nested[tensordict-compile] 0.1225ms 58.0385μs 17.2300 KOps/s 17.7245 KOps/s $\color{#d91a1a}-2.79\%$
test_compile_add_one_nested[tensordict-eager] 0.3325ms 0.1878ms 5.3247 KOps/s 5.3464 KOps/s $\color{#d91a1a}-0.41\%$
test_compile_add_one_nested[pytree-compile] 0.1060ms 55.5640μs 17.9973 KOps/s 17.7761 KOps/s $\color{#35bf28}+1.24\%$
test_compile_add_one_nested[pytree-eager] 0.3032ms 0.1421ms 7.0374 KOps/s 7.1743 KOps/s $\color{#d91a1a}-1.91\%$
test_compile_copy_nested[tensordict-compile] 60.7730μs 20.6559μs 48.4123 KOps/s 47.8863 KOps/s $\color{#35bf28}+1.10\%$
test_compile_copy_nested[tensordict-eager] 0.1251ms 66.0618μs 15.1373 KOps/s 14.9693 KOps/s $\color{#35bf28}+1.12\%$
test_compile_copy_nested[pytree-compile] 0.1316ms 74.2453μs 13.4689 KOps/s 13.3731 KOps/s $\color{#35bf28}+0.72\%$
test_compile_copy_nested[pytree-eager] 0.1376ms 66.9531μs 14.9358 KOps/s 14.6277 KOps/s $\color{#35bf28}+2.11\%$
test_compile_add_one_flat[tensordict-compile] 0.3532ms 0.1723ms 5.8031 KOps/s 5.8266 KOps/s $\color{#d91a1a}-0.40\%$
test_compile_add_one_flat[tensordict-eager] 0.3371ms 0.1871ms 5.3434 KOps/s 5.3584 KOps/s $\color{#d91a1a}-0.28\%$
test_compile_add_one_flat[tensorclass-compile] 0.1024ms 47.0718μs 21.2441 KOps/s 21.3388 KOps/s $\color{#d91a1a}-0.44\%$
test_compile_add_one_flat[tensorclass-eager] 0.5379ms 71.4407μs 13.9976 KOps/s 14.2984 KOps/s $\color{#d91a1a}-2.10\%$
test_compile_add_one_flat[pytree-compile] 0.3159ms 0.1779ms 5.6203 KOps/s 5.8571 KOps/s $\color{#d91a1a}-4.04\%$
test_compile_add_one_flat[pytree-eager] 0.5540ms 0.2896ms 3.4532 KOps/s 3.4723 KOps/s $\color{#d91a1a}-0.55\%$
test_compile_add_self_flat[tensordict-eager] 0.3287ms 0.2000ms 5.0004 KOps/s 5.0066 KOps/s $\color{#d91a1a}-0.12\%$
test_compile_add_self_flat[tensordict-compile] 0.5155ms 0.1778ms 5.6250 KOps/s 5.8474 KOps/s $\color{#d91a1a}-3.80\%$
test_compile_add_self_flat[tensorclass-eager] 0.1851ms 63.6767μs 15.7043 KOps/s 16.0319 KOps/s $\color{#d91a1a}-2.04\%$
test_compile_add_self_flat[tensorclass-compile] 0.1219ms 47.0242μs 21.2656 KOps/s 21.2468 KOps/s $\color{#35bf28}+0.09\%$
test_compile_add_self_flat[pytree-eager] 0.4792ms 0.2343ms 4.2675 KOps/s 4.3083 KOps/s $\color{#d91a1a}-0.95\%$
test_compile_add_self_flat[pytree-compile] 0.2750ms 0.1747ms 5.7239 KOps/s 5.6695 KOps/s $\color{#35bf28}+0.96\%$
test_compile_copy_flat[tensordict-compile] 0.2264ms 0.1021ms 9.7922 KOps/s 9.9412 KOps/s $\color{#d91a1a}-1.50\%$
test_compile_copy_flat[tensordict-eager] 0.1218ms 56.8835μs 17.5798 KOps/s 17.5285 KOps/s $\color{#35bf28}+0.29\%$
test_compile_copy_flat[pytree-compile] 0.1700ms 75.2417μs 13.2905 KOps/s 13.2194 KOps/s $\color{#35bf28}+0.54\%$
test_compile_copy_flat[pytree-eager] 0.1393ms 67.5849μs 14.7962 KOps/s 14.6240 KOps/s $\color{#35bf28}+1.18\%$
test_compile_assign_and_add[tensordict-compile] 0.2942ms 0.1951ms 5.1265 KOps/s 5.1406 KOps/s $\color{#d91a1a}-0.27\%$
test_compile_assign_and_add[tensordict-eager] 2.1643ms 1.6214ms 616.7544 Ops/s 618.6675 Ops/s $\color{#d91a1a}-0.31\%$
test_compile_assign_and_add[pytree-compile] 0.4128ms 0.1927ms 5.1885 KOps/s 5.1875 KOps/s $\color{#35bf28}+0.02\%$
test_compile_assign_and_add[pytree-eager] 1.3258ms 1.1007ms 908.4744 Ops/s 913.4392 Ops/s $\color{#d91a1a}-0.54\%$
test_compile_assign_and_add_stack[compile] 0.4973ms 0.4183ms 2.3908 KOps/s 2.4503 KOps/s $\color{#d91a1a}-2.43\%$
test_compile_assign_and_add_stack[eager] 3.9023ms 3.6946ms 270.6636 Ops/s 271.9167 Ops/s $\color{#d91a1a}-0.46\%$
test_compile_indexing[tensor-tensordict-compile] 95.2070μs 35.2333μs 28.3823 KOps/s 28.8158 KOps/s $\color{#d91a1a}-1.50\%$
test_compile_indexing[tensor-tensordict-eager] 1.2297ms 47.6594μs 20.9822 KOps/s 20.8870 KOps/s $\color{#35bf28}+0.46\%$
test_compile_indexing[tensor-tensorclass-compile] 92.3020μs 29.7358μs 33.6295 KOps/s 34.5760 KOps/s $\color{#d91a1a}-2.74\%$
test_compile_indexing[tensor-tensorclass-eager] 85.4990μs 28.3632μs 35.2570 KOps/s 33.7551 KOps/s $\color{#35bf28}+4.45\%$
test_compile_indexing[tensor-pytree-compile] 93.2640μs 29.5073μs 33.8899 KOps/s 34.0600 KOps/s $\color{#d91a1a}-0.50\%$
test_compile_indexing[tensor-pytree-eager] 90.1580μs 28.2352μs 35.4167 KOps/s 34.1744 KOps/s $\color{#35bf28}+3.64\%$
test_compile_indexing[slice-tensordict-compile] 0.1885ms 74.8908μs 13.3528 KOps/s 13.6885 KOps/s $\color{#d91a1a}-2.45\%$
test_compile_indexing[slice-tensordict-eager] 0.3281ms 26.8651μs 37.2230 KOps/s 36.5481 KOps/s $\color{#35bf28}+1.85\%$
test_compile_indexing[slice-tensorclass-compile] 0.1613ms 69.4304μs 14.4029 KOps/s 15.0892 KOps/s $\color{#d91a1a}-4.55\%$
test_compile_indexing[slice-tensorclass-eager] 67.7060μs 22.8573μs 43.7498 KOps/s 43.5615 KOps/s $\color{#35bf28}+0.43\%$
test_compile_indexing[slice-pytree-compile] 0.1351ms 68.3876μs 14.6225 KOps/s 14.8741 KOps/s $\color{#d91a1a}-1.69\%$
test_compile_indexing[slice-pytree-eager] 60.7230μs 22.4759μs 44.4920 KOps/s 44.4639 KOps/s $\color{#35bf28}+0.06\%$
test_compile_indexing[int-tensordict-compile] 0.1480ms 74.8348μs 13.3628 KOps/s 13.8673 KOps/s $\color{#d91a1a}-3.64\%$
test_compile_indexing[int-tensordict-eager] 0.9333ms 26.7319μs 37.4085 KOps/s 37.1529 KOps/s $\color{#35bf28}+0.69\%$
test_compile_indexing[int-tensorclass-compile] 0.1724ms 68.1718μs 14.6688 KOps/s 14.7858 KOps/s $\color{#d91a1a}-0.79\%$
test_compile_indexing[int-tensorclass-eager] 67.3550μs 22.6425μs 44.1647 KOps/s 44.0897 KOps/s $\color{#35bf28}+0.17\%$
test_compile_indexing[int-pytree-compile] 0.1483ms 68.1262μs 14.6786 KOps/s 15.2317 KOps/s $\color{#d91a1a}-3.63\%$
test_compile_indexing[int-pytree-eager] 69.9310μs 22.7739μs 43.9098 KOps/s 44.5802 KOps/s $\color{#d91a1a}-1.50\%$
test_mod_add[eager] 0.1392ms 23.5788μs 42.4110 KOps/s 42.9368 KOps/s $\color{#d91a1a}-1.22\%$
test_mod_add[compile] 0.1080ms 39.8781μs 25.0764 KOps/s 26.2093 KOps/s $\color{#d91a1a}-4.32\%$
test_mod_add[compile-overhead] 0.1126ms 39.6392μs 25.2275 KOps/s 26.5229 KOps/s $\color{#d91a1a}-4.88\%$
test_mod_wrap[eager] 0.3899ms 0.2097ms 4.7679 KOps/s 4.9419 KOps/s $\color{#d91a1a}-3.52\%$
test_mod_wrap[compile] 0.3578ms 0.2365ms 4.2287 KOps/s 4.4274 KOps/s $\color{#d91a1a}-4.49\%$
test_mod_wrap[compile-overhead] 0.3353ms 0.2303ms 4.3423 KOps/s 4.3930 KOps/s $\color{#d91a1a}-1.15\%$
test_mod_wrap_and_backward[eager] 17.0589ms 13.2452ms 75.4992 Ops/s 95.4156 Ops/s $\textbf{\color{#d91a1a}-20.87\%}$
test_mod_wrap_and_backward[compile] 14.6692ms 12.6984ms 78.7501 Ops/s 88.1438 Ops/s $\textbf{\color{#d91a1a}-10.66\%}$
test_mod_wrap_and_backward[compile-overhead] 16.2839ms 13.1753ms 75.8995 Ops/s 89.5743 Ops/s $\textbf{\color{#d91a1a}-15.27\%}$
test_seq_add[eager] 0.1699ms 86.7643μs 11.5255 KOps/s 11.8740 KOps/s $\color{#d91a1a}-2.93\%$
test_seq_add[compile] 0.4530ms 65.4491μs 15.2790 KOps/s 16.0141 KOps/s $\color{#d91a1a}-4.59\%$
test_seq_add[compile-overhead] 0.1509ms 63.9494μs 15.6374 KOps/s 16.2485 KOps/s $\color{#d91a1a}-3.76\%$
test_seq_wrap[eager] 0.7135ms 0.3885ms 2.5743 KOps/s 2.7509 KOps/s $\textbf{\color{#d91a1a}-6.42\%}$
test_seq_wrap[compile] 0.4061ms 0.2685ms 3.7246 KOps/s 3.7475 KOps/s $\color{#d91a1a}-0.61\%$
test_seq_wrap[compile-overhead] 0.5014ms 0.2703ms 3.6991 KOps/s 3.7787 KOps/s $\color{#d91a1a}-2.11\%$
test_func_call_runtime[False-eager] 0.7605ms 0.5357ms 1.8667 KOps/s 1.9159 KOps/s $\color{#d91a1a}-2.57\%$
test_func_call_runtime[False-compile] 1.0858ms 0.5054ms 1.9785 KOps/s 2.0283 KOps/s $\color{#d91a1a}-2.45\%$
test_func_call_runtime[False-compile-overhead] 0.6568ms 0.4971ms 2.0116 KOps/s 2.0248 KOps/s $\color{#d91a1a}-0.65\%$
test_func_call_runtime[True-eager] 0.9463ms 0.7511ms 1.3314 KOps/s 1.3481 KOps/s $\color{#d91a1a}-1.24\%$
test_func_call_runtime[True-compile] 0.8524ms 0.5082ms 1.9678 KOps/s 1.9989 KOps/s $\color{#d91a1a}-1.56\%$
test_func_call_runtime[True-compile-overhead] 0.9151ms 0.5139ms 1.9459 KOps/s 1.9624 KOps/s $\color{#d91a1a}-0.84\%$
test_func_call_cm_runtime[False-eager] 0.9241ms 0.5268ms 1.8983 KOps/s 1.9167 KOps/s $\color{#d91a1a}-0.96\%$
test_func_call_cm_runtime[False-compile] 1.0504ms 0.5004ms 1.9985 KOps/s 2.0258 KOps/s $\color{#d91a1a}-1.35\%$
test_func_call_cm_runtime[False-compile-overhead] 0.9132ms 0.5114ms 1.9555 KOps/s 2.0314 KOps/s $\color{#d91a1a}-3.74\%$
test_func_call_cm_runtime[True-eager] 1.1230ms 0.8860ms 1.1286 KOps/s 1.1512 KOps/s $\color{#d91a1a}-1.96\%$
test_func_call_cm_runtime[True-compile] 0.9970ms 0.7536ms 1.3269 KOps/s 1.3641 KOps/s $\color{#d91a1a}-2.72\%$
test_func_call_cm_runtime[True-compile-overhead] 1.0377ms 0.7467ms 1.3393 KOps/s 1.3536 KOps/s $\color{#d91a1a}-1.06\%$
test_vmap_func_call_cm_runtime[eager] 2.5112ms 1.8657ms 536.0051 Ops/s 542.5710 Ops/s $\color{#d91a1a}-1.21\%$
test_vmap_func_call_cm_runtime[compile] 3.0246ms 1.9200ms 520.8380 Ops/s 531.6222 Ops/s $\color{#d91a1a}-2.03\%$
test_vmap_func_call_cm_runtime[compile-overhead] 2.9175ms 1.9168ms 521.6929 Ops/s 532.6480 Ops/s $\color{#d91a1a}-2.06\%$
test_distributed 0.2698ms 0.1226ms 8.1566 KOps/s 7.9376 KOps/s $\color{#35bf28}+2.76\%$
test_tdmodule 48.6010μs 16.7812μs 59.5906 KOps/s 61.0870 KOps/s $\color{#d91a1a}-2.45\%$
test_tdmodule_dispatch 81.2820μs 34.8125μs 28.7253 KOps/s 29.3828 KOps/s $\color{#d91a1a}-2.24\%$
test_tdseq 50.5440μs 19.2436μs 51.9652 KOps/s 52.5417 KOps/s $\color{#d91a1a}-1.10\%$
test_tdseq_dispatch 71.2730μs 39.1104μs 25.5687 KOps/s 24.8141 KOps/s $\color{#35bf28}+3.04\%$
test_instantiation_functorch 2.3729ms 1.6161ms 618.7666 Ops/s 640.6210 Ops/s $\color{#d91a1a}-3.41\%$
test_instantiation_td 2.1783ms 1.2021ms 831.8704 Ops/s 871.9390 Ops/s $\color{#d91a1a}-4.60\%$
test_exec_functorch 0.3428ms 0.1882ms 5.3123 KOps/s 5.4767 KOps/s $\color{#d91a1a}-3.00\%$
test_exec_functional_call 0.3437ms 0.1782ms 5.6125 KOps/s 5.9174 KOps/s $\textbf{\color{#d91a1a}-5.15\%}$
test_exec_td 0.3008ms 0.1716ms 5.8272 KOps/s 5.9905 KOps/s $\color{#d91a1a}-2.72\%$
test_exec_td_decorator 1.0254ms 0.2259ms 4.4259 KOps/s 4.5637 KOps/s $\color{#d91a1a}-3.02\%$
test_vmap_mlp_speed[True-True] 0.8972ms 0.6387ms 1.5657 KOps/s 1.5878 KOps/s $\color{#d91a1a}-1.39\%$
test_vmap_mlp_speed[True-False] 0.9511ms 0.6391ms 1.5648 KOps/s 1.5931 KOps/s $\color{#d91a1a}-1.77\%$
test_vmap_mlp_speed[False-True] 0.7235ms 0.4976ms 2.0097 KOps/s 2.0594 KOps/s $\color{#d91a1a}-2.41\%$
test_vmap_mlp_speed[False-False] 0.6860ms 0.4949ms 2.0208 KOps/s 1.9782 KOps/s $\color{#35bf28}+2.15\%$
test_vmap_mlp_speed_decorator[True-True] 1.2535ms 0.6144ms 1.6275 KOps/s 1.6381 KOps/s $\color{#d91a1a}-0.64\%$
test_vmap_mlp_speed_decorator[True-False] 0.8317ms 0.6145ms 1.6273 KOps/s 1.6387 KOps/s $\color{#d91a1a}-0.70\%$
test_vmap_mlp_speed_decorator[False-True] 0.6535ms 0.5118ms 1.9540 KOps/s 1.9836 KOps/s $\color{#d91a1a}-1.49\%$
test_vmap_mlp_speed_decorator[False-False] 0.9030ms 0.5175ms 1.9324 KOps/s 1.9881 KOps/s $\color{#d91a1a}-2.80\%$
test_to_module_speed[True] 1.7244ms 1.2943ms 772.6091 Ops/s 781.0626 Ops/s $\color{#d91a1a}-1.08\%$
test_to_module_speed[False] 2.3073ms 1.2616ms 792.6664 Ops/s 800.3475 Ops/s $\color{#d91a1a}-0.96\%$
test_tc_init 0.1259ms 42.7392μs 23.3977 KOps/s 23.1744 KOps/s $\color{#35bf28}+0.96\%$
test_tc_init_nested 0.1434ms 85.6149μs 11.6802 KOps/s 11.6706 KOps/s $\color{#35bf28}+0.08\%$
test_tc_first_layer_tensor 26.8100μs 1.5271μs 654.8248 KOps/s 663.7510 KOps/s $\color{#d91a1a}-1.34\%$
test_tc_first_layer_nontensor 45.9760μs 4.6892μs 213.2567 KOps/s 214.8955 KOps/s $\color{#d91a1a}-0.76\%$
test_tc_second_layer_tensor 28.6730μs 2.8100μs 355.8693 KOps/s 357.3217 KOps/s $\color{#d91a1a}-0.41\%$
test_tc_second_layer_nontensor 44.2120μs 6.0245μs 165.9882 KOps/s 168.8504 KOps/s $\color{#d91a1a}-1.70\%$
test_unbind 0.4663s 12.8796ms 77.6423 Ops/s 75.6468 Ops/s $\color{#35bf28}+2.64\%$
test_full_like 8.1793ms 7.1215ms 140.4203 Ops/s 79.5894 Ops/s $\textbf{\color{#35bf28}+76.43\%}$
test_zeros_like 12.9668ms 6.1465ms 162.6942 Ops/s 140.3174 Ops/s $\textbf{\color{#35bf28}+15.95\%}$
test_ones_like 15.1183ms 7.5548ms 132.3658 Ops/s 136.7912 Ops/s $\color{#d91a1a}-3.24\%$
test_clone 16.9625ms 9.0780ms 110.1570 Ops/s 112.9248 Ops/s $\color{#d91a1a}-2.45\%$
test_squeeze 73.4180μs 12.1198μs 82.5094 KOps/s 79.3407 KOps/s $\color{#35bf28}+3.99\%$
test_unsqueeze 0.1742ms 90.4131μs 11.0603 KOps/s 10.7636 KOps/s $\color{#35bf28}+2.76\%$
test_split 0.6670ms 0.1921ms 5.2046 KOps/s 5.1297 KOps/s $\color{#35bf28}+1.46\%$
test_permute 0.3894ms 0.2174ms 4.6000 KOps/s 4.4904 KOps/s $\color{#35bf28}+2.44\%$
test_stack 32.9162ms 24.3187ms 41.1206 Ops/s 41.1479 Ops/s $\color{#d91a1a}-0.07\%$
test_cat 24.3639ms 23.8909ms 41.8570 Ops/s 41.2670 Ops/s $\color{#35bf28}+1.43\%$
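
For anyone cross-checking the table, the Change column appears to be the relative difference between Ops and Ops on Repo HEAD, and Ops is the reciprocal of Mean. The bolded entries are consistent with a cutoff of roughly 5%; that cutoff is inferred from the rows above and is not stated by the bot. Here is a minimal sketch of that arithmetic, where the helper names `percent_change` and `classify` and the 5% default are assumptions made for illustration:

```python
def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption, not the bot's documented rule."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "within noise"


# (Ops, Ops on Repo HEAD) in KOps/s, copied from three rows of the CPU table.
rows = {
    "test_setitem_dim[int]": (30.2809, 25.1177),
    "test_lock_nested": (1.7422, 2.1243),
    "test_setitem": (35.0768, 35.5112),
}

for name, (ops_pr, ops_head) in rows.items():
    change = percent_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")

# Ops is the reciprocal of the mean time: a 33.0241 microsecond mean
# corresponds to about 30.28 KOps/s.
mean_seconds = 33.0241e-6
kops_per_second = 1.0 / mean_seconds / 1e3
```

Read this way, the `test_setitem_dim[...]` rows are where the PR's `__setitem__` change shows up, while the larger regressions such as `test_lock_nested` come with very large Max values relative to their Mean, which may indicate outliers in the run rather than a real slowdown.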

github-actions bot commented Sep 11, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 228. Improved: $\large\color{#35bf28}11$. Worsened: $\large\color{#d91a1a}10$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.1200ms 14.8297μs 67.4322 KOps/s 68.0727 KOps/s $\color{#d91a1a}-0.94\%$
test_plain_set_stack_nested 43.7110μs 15.0159μs 66.5963 KOps/s 67.0584 KOps/s $\color{#d91a1a}-0.69\%$
test_plain_set_nested_inplace 43.5610μs 15.9875μs 62.5488 KOps/s 63.8392 KOps/s $\color{#d91a1a}-2.02\%$
test_plain_set_stack_nested_inplace 45.9110μs 15.8191μs 63.2147 KOps/s 63.6555 KOps/s $\color{#d91a1a}-0.69\%$
test_items 29.3700μs 2.8406μs 352.0380 KOps/s 354.0333 KOps/s $\color{#d91a1a}-0.56\%$
test_items_nested 0.3581ms 0.3104ms 3.2212 KOps/s 3.1928 KOps/s $\color{#35bf28}+0.89\%$
test_items_nested_locked 0.4019ms 0.3132ms 3.1929 KOps/s 3.1753 KOps/s $\color{#35bf28}+0.55\%$
test_items_nested_leaf 95.1420μs 63.0990μs 15.8481 KOps/s 15.8570 KOps/s $\color{#d91a1a}-0.06\%$
test_items_stack_nested 0.3665ms 0.3124ms 3.2013 KOps/s 3.0847 KOps/s $\color{#35bf28}+3.78\%$
test_items_stack_nested_leaf 0.1084ms 63.9036μs 15.6486 KOps/s 15.3273 KOps/s $\color{#35bf28}+2.10\%$
test_items_stack_nested_locked 0.3736ms 0.3164ms 3.1601 KOps/s 3.1526 KOps/s $\color{#35bf28}+0.24\%$
test_keys 31.6900μs 3.4071μs 293.5025 KOps/s 293.9711 KOps/s $\color{#d91a1a}-0.16\%$
test_keys_nested 83.3010μs 53.4223μs 18.7188 KOps/s 18.0417 KOps/s $\color{#35bf28}+3.75\%$
test_keys_nested_locked 2.8405ms 60.5622μs 16.5120 KOps/s 16.4814 KOps/s $\color{#35bf28}+0.19\%$
test_keys_nested_leaf 80.0120μs 45.3362μs 22.0574 KOps/s 21.3614 KOps/s $\color{#35bf28}+3.26\%$
test_keys_stack_nested 84.0710μs 55.5249μs 18.0099 KOps/s 18.0620 KOps/s $\color{#d91a1a}-0.29\%$
test_keys_stack_nested_leaf 73.8610μs 46.7891μs 21.3725 KOps/s 20.9663 KOps/s $\color{#35bf28}+1.94\%$
test_keys_stack_nested_locked 0.1037ms 59.9268μs 16.6870 KOps/s 16.6378 KOps/s $\color{#35bf28}+0.30\%$
test_values 5.4200μs 0.8199μs 1.2197 MOps/s 1.2171 MOps/s $\color{#35bf28}+0.21\%$
test_values_nested 57.1410μs 27.3287μs 36.5916 KOps/s 36.4285 KOps/s $\color{#35bf28}+0.45\%$
test_values_nested_locked 65.0910μs 29.3882μs 34.0273 KOps/s 34.0803 KOps/s $\color{#d91a1a}-0.16\%$
test_values_nested_leaf 58.8710μs 24.0374μs 41.6019 KOps/s 41.7791 KOps/s $\color{#d91a1a}-0.42\%$
test_values_stack_nested 64.6310μs 27.8314μs 35.9306 KOps/s 35.1986 KOps/s $\color{#35bf28}+2.08\%$
test_values_stack_nested_leaf 53.5210μs 24.5542μs 40.7262 KOps/s 39.9720 KOps/s $\color{#35bf28}+1.89\%$
test_values_stack_nested_locked 60.3410μs 29.7774μs 33.5825 KOps/s 32.9828 KOps/s $\color{#35bf28}+1.82\%$
test_membership 1.6580μs 0.4726μs 2.1158 MOps/s 2.1271 MOps/s $\color{#d91a1a}-0.53\%$
test_membership_nested 19.4950μs 1.7397μs 574.8278 KOps/s 568.0451 KOps/s $\color{#35bf28}+1.19\%$
test_membership_nested_leaf 11.3467μs 1.7152μs 583.0190 KOps/s 580.2504 KOps/s $\color{#35bf28}+0.48\%$
test_membership_stacked_nested 51.0810μs 1.7979μs 556.2179 KOps/s 555.7376 KOps/s $\color{#35bf28}+0.09\%$
test_membership_stacked_nested_leaf 31.0800μs 1.8131μs 551.5328 KOps/s 563.6648 KOps/s $\color{#d91a1a}-2.15\%$
test_membership_nested_last 33.4510μs 2.6618μs 375.6795 KOps/s 389.3696 KOps/s $\color{#d91a1a}-3.52\%$
test_membership_nested_leaf_last 33.3500μs 2.6124μs 382.7946 KOps/s 387.2158 KOps/s $\color{#d91a1a}-1.14\%$
test_membership_stacked_nested_last 31.1710μs 2.7343μs 365.7180 KOps/s 389.5338 KOps/s $\textbf{\color{#d91a1a}-6.11\%}$
test_membership_stacked_nested_leaf_last 41.3410μs 2.6101μs 383.1219 KOps/s 389.5223 KOps/s $\color{#d91a1a}-1.64\%$
test_nested_getleaf 35.7410μs 6.0888μs 164.2347 KOps/s 165.7582 KOps/s $\color{#d91a1a}-0.92\%$
test_nested_get 35.2400μs 5.7033μs 175.3363 KOps/s 174.4859 KOps/s $\color{#35bf28}+0.49\%$
test_stacked_getleaf 30.0110μs 6.0642μs 164.9027 KOps/s 165.5381 KOps/s $\color{#d91a1a}-0.38\%$
test_stacked_get 33.3210μs 5.6299μs 177.6237 KOps/s 177.8825 KOps/s $\color{#d91a1a}-0.15\%$
test_nested_getitemleaf 26.8200μs 6.1165μs 163.4929 KOps/s 163.2545 KOps/s $\color{#35bf28}+0.15\%$
test_nested_getitem 33.7510μs 5.7559μs 173.7337 KOps/s 174.1686 KOps/s $\color{#d91a1a}-0.25\%$
test_stacked_getitemleaf 28.8100μs 6.0445μs 165.4406 KOps/s 166.2338 KOps/s $\color{#d91a1a}-0.48\%$
test_stacked_getitem 37.3700μs 5.6866μs 175.8509 KOps/s 176.2546 KOps/s $\color{#d91a1a}-0.23\%$
test_lock_nested 7.2247ms 0.4212ms 2.3743 KOps/s 2.3945 KOps/s $\color{#d91a1a}-0.85\%$
test_lock_stack_nested 0.4751ms 0.3761ms 2.6592 KOps/s 2.6376 KOps/s $\color{#35bf28}+0.82\%$
test_unlock_nested 0.7917ms 0.3546ms 2.8200 KOps/s 2.7994 KOps/s $\color{#35bf28}+0.73\%$
test_unlock_stack_nested 0.3626ms 0.3152ms 3.1729 KOps/s 3.1329 KOps/s $\color{#35bf28}+1.28\%$
test_flatten_speed 0.3073ms 77.9079μs 12.8357 KOps/s 12.4444 KOps/s $\color{#35bf28}+3.14\%$
test_unflatten_speed 0.3642ms 0.2783ms 3.5928 KOps/s 3.5494 KOps/s $\color{#35bf28}+1.22\%$
test_common_ops 1.4881ms 1.2798ms 781.3582 Ops/s 779.4800 Ops/s $\color{#35bf28}+0.24\%$
test_creation 25.9900μs 1.4680μs 681.1945 KOps/s 671.9798 KOps/s $\color{#35bf28}+1.37\%$
test_creation_empty 45.5110μs 17.2858μs 57.8510 KOps/s 58.3379 KOps/s $\color{#d91a1a}-0.83\%$
test_creation_nested_1 53.9710μs 19.0512μs 52.4901 KOps/s 53.2756 KOps/s $\color{#d91a1a}-1.47\%$
test_creation_nested_2 45.0310μs 21.3403μs 46.8598 KOps/s 46.9098 KOps/s $\color{#d91a1a}-0.11\%$
test_clone 67.1010μs 28.3235μs 35.3064 KOps/s 34.2664 KOps/s $\color{#35bf28}+3.03\%$
test_getitem[int] 1.2549ms 15.7638μs 63.4365 KOps/s 63.1333 KOps/s $\color{#35bf28}+0.48\%$
test_getitem[slice_int] 0.1197ms 27.8877μs 35.8581 KOps/s 35.7906 KOps/s $\color{#35bf28}+0.19\%$
test_getitem[range] 0.2536ms 0.1099ms 9.0976 KOps/s 9.0866 KOps/s $\color{#35bf28}+0.12\%$
test_getitem[tuple] 0.1270ms 23.3232μs 42.8758 KOps/s 41.9368 KOps/s $\color{#35bf28}+2.24\%$
test_getitem[list] 0.1911ms 98.0441μs 10.1995 KOps/s 9.9929 KOps/s $\color{#35bf28}+2.07\%$
test_setitem_dim[int] 72.9410μs 45.5708μs 21.9439 KOps/s 18.4211 KOps/s $\textbf{\color{#35bf28}+19.12\%}$
test_setitem_dim[slice_int] 0.1086ms 68.1134μs 14.6814 KOps/s 12.6606 KOps/s $\textbf{\color{#35bf28}+15.96\%}$
test_setitem_dim[range] 0.1624ms 0.1281ms 7.8041 KOps/s 7.1268 KOps/s $\textbf{\color{#35bf28}+9.50\%}$
test_setitem_dim[tuple] 86.7620μs 61.1085μs 16.3643 KOps/s 14.0010 KOps/s $\textbf{\color{#35bf28}+16.88\%}$
test_setitem 77.7310μs 41.9919μs 23.8141 KOps/s 23.2708 KOps/s $\color{#35bf28}+2.33\%$
test_set 75.5020μs 41.0420μs 24.3653 KOps/s 24.2288 KOps/s $\color{#35bf28}+0.56\%$
test_set_shared 0.3714ms 50.3981μs 19.8420 KOps/s 19.7111 KOps/s $\color{#35bf28}+0.66\%$
test_update 0.1913ms 51.3668μs 19.4678 KOps/s 19.4173 KOps/s $\color{#35bf28}+0.26\%$
test_update_nested 98.6720μs 58.6334μs 17.0551 KOps/s 17.2599 KOps/s $\color{#d91a1a}-1.19\%$
test_update__nested 0.1038ms 59.9276μs 16.6868 KOps/s 16.4635 KOps/s $\color{#35bf28}+1.36\%$
test_set_nested 99.1620μs 44.3888μs 22.5282 KOps/s 22.3863 KOps/s $\color{#35bf28}+0.63\%$
test_set_nested_new 87.4320μs 47.5158μs 21.0456 KOps/s 20.9705 KOps/s $\color{#35bf28}+0.36\%$
test_select 0.1020ms 59.8102μs 16.7196 KOps/s 16.2310 KOps/s $\color{#35bf28}+3.01\%$
test_select_nested 71.2010μs 41.4697μs 24.1140 KOps/s 23.8855 KOps/s $\color{#35bf28}+0.96\%$
test_exclude_nested 89.7910μs 58.4211μs 17.1171 KOps/s 16.6145 KOps/s $\color{#35bf28}+3.03\%$
test_empty[True] 0.3040ms 0.2412ms 4.1461 KOps/s 4.0308 KOps/s $\color{#35bf28}+2.86\%$
test_empty[False] 3.2811μs 0.7493μs 1.3346 MOps/s 1.3510 MOps/s $\color{#d91a1a}-1.22\%$
test_to 52.7010μs 25.6785μs 38.9432 KOps/s 39.0194 KOps/s $\color{#d91a1a}-0.20\%$
test_to_nonblocking 53.7810μs 24.1654μs 41.3814 KOps/s 42.0600 KOps/s $\color{#d91a1a}-1.61\%$
test_unbind_speed 0.3144ms 0.2820ms 3.5467 KOps/s 3.5649 KOps/s $\color{#d91a1a}-0.51\%$
test_unbind_speed_stack0 0.3341ms 0.2776ms 3.6026 KOps/s 3.6076 KOps/s $\color{#d91a1a}-0.14\%$
test_unbind_speed_stack1 93.0509ms 0.7081ms 1.4123 KOps/s 1.5327 KOps/s $\textbf{\color{#d91a1a}-7.86\%}$
test_split 93.7704ms 2.1777ms 459.1906 Ops/s 453.1139 Ops/s $\color{#35bf28}+1.34\%$
test_chunk 96.0818ms 2.1877ms 457.0989 Ops/s 451.9758 Ops/s $\color{#35bf28}+1.13\%$
test_creation[device0] 0.4467ms 0.1262ms 7.9237 KOps/s 7.9181 KOps/s $\color{#35bf28}+0.07\%$
test_creation_from_tensor 0.3261ms 0.1280ms 7.8101 KOps/s 7.7855 KOps/s $\color{#35bf28}+0.32\%$
test_add_one[memmap_tensor0] 0.2022ms 8.8516μs 112.9741 KOps/s 110.8706 KOps/s $\color{#35bf28}+1.90\%$
test_contiguous[memmap_tensor0] 27.7310μs 2.1796μs 458.7959 KOps/s 448.1820 KOps/s $\color{#35bf28}+2.37\%$
test_stack[memmap_tensor0] 36.5100μs 6.8095μs 146.8534 KOps/s 140.9429 KOps/s $\color{#35bf28}+4.19\%$
test_memmaptd_index 1.0923ms 0.4159ms 2.4046 KOps/s 2.3245 KOps/s $\color{#35bf28}+3.45\%$
test_memmaptd_index_astensor 0.7132ms 0.4763ms 2.0995 KOps/s 2.0429 KOps/s $\color{#35bf28}+2.77\%$
test_memmaptd_index_op 1.4881ms 1.0554ms 947.5001 Ops/s 947.6156 Ops/s $\color{#d91a1a}-0.01\%$
test_serialize_model 0.1277s 0.1266s 7.9018 Ops/s 7.8597 Ops/s $\color{#35bf28}+0.54\%$
test_serialize_model_pickle 1.3503s 1.2126s 0.8246 Ops/s 0.8251 Ops/s $\color{#d91a1a}-0.05\%$
test_serialize_weights 0.2205s 0.1396s 7.1624 Ops/s 7.9129 Ops/s $\textbf{\color{#d91a1a}-9.48\%}$
test_serialize_weights_returnearly 0.2226s 56.0741ms 17.8335 Ops/s 21.8641 Ops/s $\textbf{\color{#d91a1a}-18.43\%}$
test_serialize_weights_pickle 1.4042s 1.2264s 0.8154 Ops/s 0.8240 Ops/s $\color{#d91a1a}-1.04\%$
test_reshape_pytree 71.8510μs 35.7140μs 28.0002 KOps/s 27.2454 KOps/s $\color{#35bf28}+2.77\%$
test_reshape_td 71.5110μs 41.9991μs 23.8100 KOps/s 23.7589 KOps/s $\color{#35bf28}+0.22\%$
test_view_pytree 66.6210μs 35.9774μs 27.7952 KOps/s 27.5350 KOps/s $\color{#35bf28}+0.95\%$
test_view_td 73.1410μs 45.8877μs 21.7923 KOps/s 20.5247 KOps/s $\textbf{\color{#35bf28}+6.18\%}$
test_unbind_pytree 60.8510μs 33.8813μs 29.5148 KOps/s 28.0654 KOps/s $\textbf{\color{#35bf28}+5.16\%}$
test_unbind_td 0.3632ms 42.2218μs 23.6844 KOps/s 23.1290 KOps/s $\color{#35bf28}+2.40\%$
test_split_pytree 76.2520μs 46.5610μs 21.4772 KOps/s 21.1238 KOps/s $\color{#35bf28}+1.67\%$
test_split_td 0.4642ms 54.9684μs 18.1923 KOps/s 18.2511 KOps/s $\color{#d91a1a}-0.32\%$
test_add_pytree 0.1001ms 56.3626μs 17.7422 KOps/s 17.2559 KOps/s $\color{#35bf28}+2.82\%$
test_add_td 0.2440ms 93.5768μs 10.6864 KOps/s 10.3785 KOps/s $\color{#35bf28}+2.97\%$
test_compile_add_one_nested[tensordict-compile] 0.4086ms 0.2083ms 4.8006 KOps/s 4.7066 KOps/s $\color{#35bf28}+2.00\%$
test_compile_add_one_nested[tensordict-eager] 0.2501ms 0.1565ms 6.3910 KOps/s 6.3037 KOps/s $\color{#35bf28}+1.39\%$
test_compile_add_one_nested[pytree-compile] 0.1991ms 0.1455ms 6.8749 KOps/s 6.9415 KOps/s $\color{#d91a1a}-0.96\%$
test_compile_add_one_nested[pytree-eager] 0.2600ms 0.1848ms 5.4115 KOps/s 5.3577 KOps/s $\color{#35bf28}+1.00\%$
test_compile_copy_nested[tensordict-compile] 59.6310μs 21.3802μs 46.7723 KOps/s 48.6566 KOps/s $\color{#d91a1a}-3.87\%$
test_compile_copy_nested[tensordict-eager] 87.7010μs 44.1272μs 22.6618 KOps/s 22.9463 KOps/s $\color{#d91a1a}-1.24\%$
test_compile_copy_nested[pytree-compile] 0.1079ms 64.9244μs 15.4025 KOps/s 15.4863 KOps/s $\color{#d91a1a}-0.54\%$
test_compile_copy_nested[pytree-eager] 93.6020μs 50.0805μs 19.9678 KOps/s 20.0146 KOps/s $\color{#d91a1a}-0.23\%$
test_compile_add_one_flat[tensordict-compile] 0.4141ms 0.3176ms 3.1484 KOps/s 3.1447 KOps/s $\color{#35bf28}+0.12\%$
test_compile_add_one_flat[tensordict-eager] 0.2485ms 0.2083ms 4.8015 KOps/s 4.7415 KOps/s $\color{#35bf28}+1.26\%$
test_compile_add_one_flat[tensorclass-compile] 0.1629ms 0.1280ms 7.8128 KOps/s 7.8246 KOps/s $\color{#d91a1a}-0.15\%$
test_compile_add_one_flat[tensorclass-eager] 0.1221ms 59.1493μs 16.9064 KOps/s 16.7395 KOps/s $\color{#35bf28}+1.00\%$
test_compile_add_one_flat[pytree-compile] 0.3545ms 0.3152ms 3.1724 KOps/s 3.1259 KOps/s $\color{#35bf28}+1.49\%$
test_compile_add_one_flat[pytree-eager] 0.6723ms 0.6249ms 1.6003 KOps/s 1.5870 KOps/s $\color{#35bf28}+0.84\%$
test_compile_add_self_flat[tensordict-eager] 0.2948ms 0.2469ms 4.0505 KOps/s 3.9497 KOps/s $\color{#35bf28}+2.55\%$
test_compile_add_self_flat[tensordict-compile] 0.4753ms 0.3185ms 3.1395 KOps/s 3.1136 KOps/s $\color{#35bf28}+0.83\%$
test_compile_add_self_flat[tensorclass-eager] 0.1450ms 69.1453μs 14.4623 KOps/s 14.1283 KOps/s $\color{#35bf28}+2.36\%$
test_compile_add_self_flat[tensorclass-compile] 0.1722ms 0.1293ms 7.7325 KOps/s 7.7454 KOps/s $\color{#d91a1a}-0.17\%$
test_compile_add_self_flat[pytree-eager] 0.7200ms 0.5311ms 1.8827 KOps/s 1.8227 KOps/s $\color{#35bf28}+3.29\%$
test_compile_add_self_flat[pytree-compile] 0.3840ms 0.3147ms 3.1777 KOps/s 3.1200 KOps/s $\color{#35bf28}+1.85\%$
test_compile_copy_flat[tensordict-compile] 51.1510μs 18.1007μs 55.2465 KOps/s 55.4944 KOps/s $\color{#d91a1a}-0.45\%$
test_compile_copy_flat[tensordict-eager] 53.9610μs 26.5225μs 37.7038 KOps/s 36.9267 KOps/s $\color{#35bf28}+2.10\%$
test_compile_copy_flat[pytree-compile] 0.1054ms 69.4761μs 14.3934 KOps/s 14.4874 KOps/s $\color{#d91a1a}-0.65\%$
test_compile_copy_flat[pytree-eager] 76.9920μs 52.2613μs 19.1346 KOps/s 19.3573 KOps/s $\color{#d91a1a}-1.15\%$
test_compile_assign_and_add[tensordict-compile] 2.3278ms 0.8027ms 1.2458 KOps/s 1.1465 KOps/s $\textbf{\color{#35bf28}+8.66\%}$
test_compile_assign_and_add[tensordict-eager] 3.4237ms 3.1859ms 313.8836 Ops/s 307.6631 Ops/s $\color{#35bf28}+2.02\%$
test_compile_assign_and_add[pytree-compile] 2.3834ms 0.8158ms 1.2258 KOps/s 1.1407 KOps/s $\textbf{\color{#35bf28}+7.46\%}$
test_compile_assign_and_add[pytree-eager] 3.6649ms 3.3238ms 300.8584 Ops/s 300.2477 Ops/s $\color{#35bf28}+0.20\%$
test_compile_indexing[tensor-tensordict-compile] 0.2715ms 0.1155ms 8.6556 KOps/s 9.2715 KOps/s $\textbf{\color{#d91a1a}-6.64\%}$
test_compile_indexing[tensor-tensordict-eager] 0.1963ms 63.8147μs 15.6704 KOps/s 16.3007 KOps/s $\color{#d91a1a}-3.87\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1708ms 0.1061ms 9.4241 KOps/s 9.6166 KOps/s $\color{#d91a1a}-2.00\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1325ms 43.6920μs 22.8875 KOps/s 22.4554 KOps/s $\color{#35bf28}+1.92\%$
test_compile_indexing[tensor-pytree-compile] 0.2551ms 0.1054ms 9.4916 KOps/s 9.5987 KOps/s $\color{#d91a1a}-1.12\%$
test_compile_indexing[tensor-pytree-eager] 0.1821ms 43.0657μs 23.2203 KOps/s 22.4559 KOps/s $\color{#35bf28}+3.40\%$
test_compile_indexing[slice-tensordict-compile] 0.2031ms 0.1390ms 7.1918 KOps/s 7.3276 KOps/s $\color{#d91a1a}-1.85\%$
test_compile_indexing[slice-tensordict-eager] 0.1603ms 25.5399μs 39.1544 KOps/s 38.6532 KOps/s $\color{#35bf28}+1.30\%$
test_compile_indexing[slice-tensorclass-compile] 0.1776ms 0.1331ms 7.5155 KOps/s 7.5635 KOps/s $\color{#d91a1a}-0.63\%$
test_compile_indexing[slice-tensorclass-eager] 86.6210μs 22.0221μs 45.4089 KOps/s 46.4546 KOps/s $\color{#d91a1a}-2.25\%$
test_compile_indexing[slice-pytree-compile] 0.1803ms 0.1342ms 7.4539 KOps/s 7.5753 KOps/s $\color{#d91a1a}-1.60\%$
test_compile_indexing[slice-pytree-eager] 56.9610μs 21.8017μs 45.8680 KOps/s 46.7900 KOps/s $\color{#d91a1a}-1.97\%$
test_compile_indexing[int-tensordict-compile] 0.2024ms 0.1403ms 7.1297 KOps/s 7.2605 KOps/s $\color{#d91a1a}-1.80\%$
test_compile_indexing[int-tensordict-eager] 0.4629ms 25.7291μs 38.8665 KOps/s 39.1629 KOps/s $\color{#d91a1a}-0.76\%$
test_compile_indexing[int-tensorclass-compile] 0.1751ms 0.1337ms 7.4812 KOps/s 7.5263 KOps/s $\color{#d91a1a}-0.60\%$
test_compile_indexing[int-tensorclass-eager] 0.1153ms 26.7475μs 37.3867 KOps/s 47.0430 KOps/s $\textbf{\color{#d91a1a}-20.53\%}$
test_compile_indexing[int-pytree-compile] 0.2608ms 0.1399ms 7.1461 KOps/s 7.5690 KOps/s $\textbf{\color{#d91a1a}-5.59\%}$
test_compile_indexing[int-pytree-eager] 68.7810μs 22.6031μs 44.2417 KOps/s 46.8959 KOps/s $\textbf{\color{#d91a1a}-5.66\%}$
test_mod_add[eager] 97.5420μs 33.6794μs 29.6917 KOps/s 30.7492 KOps/s $\color{#d91a1a}-3.44\%$
test_mod_add[compile] 0.1130ms 70.7877μs 14.1268 KOps/s 14.2177 KOps/s $\color{#d91a1a}-0.64\%$
test_mod_add[compile-overhead] 0.2601ms 0.1358ms 7.3660 KOps/s 7.0786 KOps/s $\color{#35bf28}+4.06\%$
test_mod_wrap[eager] 0.3277ms 0.2529ms 3.9546 KOps/s 4.0485 KOps/s $\color{#d91a1a}-2.32\%$
test_mod_wrap[compile] 0.4492ms 0.2858ms 3.4993 KOps/s 3.4290 KOps/s $\color{#35bf28}+2.05\%$
test_mod_wrap[compile-overhead] 7.8979ms 4.1234ms 242.5158 Ops/s 244.5996 Ops/s $\color{#d91a1a}-0.85\%$
test_mod_wrap_and_backward[eager] 1.5137ms 1.3581ms 736.3455 Ops/s 685.7966 Ops/s $\textbf{\color{#35bf28}+7.37\%}$
test_mod_wrap_and_backward[compile] 2.7048ms 1.3162ms 759.7467 Ops/s 695.9313 Ops/s $\textbf{\color{#35bf28}+9.17\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3092ms 0.8931ms 1.1197 KOps/s 1.0186 KOps/s $\textbf{\color{#35bf28}+9.92\%}$
test_seq_add[eager] 0.2085ms 96.5223μs 10.3603 KOps/s 10.0200 KOps/s $\color{#35bf28}+3.40\%$
test_seq_add[compile] 0.1218ms 80.4127μs 12.4358 KOps/s 12.3720 KOps/s $\color{#35bf28}+0.52\%$
test_seq_add[compile-overhead] 0.1666ms 0.1139ms 8.7797 KOps/s 8.7718 KOps/s $\color{#35bf28}+0.09\%$
test_seq_wrap[eager] 0.5510ms 0.3786ms 2.6412 KOps/s 2.5815 KOps/s $\color{#35bf28}+2.31\%$
test_seq_wrap[compile] 0.3833ms 0.3022ms 3.3094 KOps/s 3.2511 KOps/s $\color{#35bf28}+1.79\%$
test_seq_wrap[compile-overhead] 0.2783ms 0.2096ms 4.7705 KOps/s 4.7795 KOps/s $\color{#d91a1a}-0.19\%$
test_func_call_runtime[False-eager] 0.9397ms 0.7598ms 1.3161 KOps/s 1.3315 KOps/s $\color{#d91a1a}-1.16\%$
test_func_call_runtime[False-compile] 0.9082ms 0.7791ms 1.2835 KOps/s 1.2526 KOps/s $\color{#35bf28}+2.47\%$
test_func_call_runtime[False-compile-overhead] 0.4103ms 0.3523ms 2.8386 KOps/s 2.8469 KOps/s $\color{#d91a1a}-0.29\%$
test_func_call_runtime[True-eager] 0.9739ms 0.8807ms 1.1355 KOps/s 1.1040 KOps/s $\color{#35bf28}+2.85\%$
test_func_call_runtime[True-compile] 0.9724ms 0.8164ms 1.2248 KOps/s 1.2077 KOps/s $\color{#35bf28}+1.42\%$
test_func_call_runtime[True-compile-overhead] 0.4554ms 0.3840ms 2.6038 KOps/s 2.6163 KOps/s $\color{#d91a1a}-0.48\%$
test_func_call_cm_runtime[False-eager] 0.7776ms 0.7175ms 1.3938 KOps/s 1.3460 KOps/s $\color{#35bf28}+3.55\%$
test_func_call_cm_runtime[False-compile] 0.8474ms 0.7865ms 1.2714 KOps/s 1.2515 KOps/s $\color{#35bf28}+1.59\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4489ms 0.3515ms 2.8448 KOps/s 2.8350 KOps/s $\color{#35bf28}+0.35\%$
test_func_call_cm_runtime[True-eager] 1.0614ms 0.9815ms 1.0189 KOps/s 1.0014 KOps/s $\color{#35bf28}+1.75\%$
test_func_call_cm_runtime[True-compile] 0.9661ms 0.8446ms 1.1840 KOps/s 1.1692 KOps/s $\color{#35bf28}+1.26\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5303ms 0.4074ms 2.4548 KOps/s 2.4347 KOps/s $\color{#35bf28}+0.83\%$
test_vmap_func_call_cm_runtime[eager] 2.5700ms 2.0598ms 485.4781 Ops/s 474.2609 Ops/s $\color{#35bf28}+2.37\%$
test_vmap_func_call_cm_runtime[compile] 0.9110ms 0.8565ms 1.1676 KOps/s 1.1501 KOps/s $\color{#35bf28}+1.52\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5292ms 0.4140ms 2.4152 KOps/s 2.4089 KOps/s $\color{#35bf28}+0.26\%$
test_distributed 0.6137ms 0.1169ms 8.5565 KOps/s 8.8338 KOps/s $\color{#d91a1a}-3.14\%$
test_tdmodule 68.5310μs 15.5449μs 64.3298 KOps/s 65.7818 KOps/s $\color{#d91a1a}-2.21\%$
test_tdmodule_dispatch 68.6310μs 31.8279μs 31.4190 KOps/s 32.6448 KOps/s $\color{#d91a1a}-3.76\%$
test_tdseq 36.3810μs 16.0948μs 62.1317 KOps/s 62.5192 KOps/s $\color{#d91a1a}-0.62\%$
test_tdseq_dispatch 61.7010μs 34.3316μs 29.1277 KOps/s 30.0180 KOps/s $\color{#d91a1a}-2.97\%$
test_instantiation_functorch 1.9693ms 1.8638ms 536.5269 Ops/s 538.8685 Ops/s $\color{#d91a1a}-0.43\%$
test_instantiation_td 1.8129ms 1.1990ms 834.0047 Ops/s 837.0280 Ops/s $\color{#d91a1a}-0.36\%$
test_exec_functorch 0.2580ms 0.2047ms 4.8843 KOps/s 4.6539 KOps/s $\color{#35bf28}+4.95\%$
test_exec_functional_call 0.2832ms 0.2072ms 4.8257 KOps/s 4.6514 KOps/s $\color{#35bf28}+3.75\%$
test_exec_td 0.3626ms 0.2135ms 4.6844 KOps/s 4.6170 KOps/s $\color{#35bf28}+1.46\%$
test_exec_td_decorator 0.9643ms 0.2543ms 3.9316 KOps/s 3.8942 KOps/s $\color{#35bf28}+0.96\%$
test_vmap_mlp_speed[True-True] 0.7995ms 0.6945ms 1.4399 KOps/s 1.4491 KOps/s $\color{#d91a1a}-0.63\%$
test_vmap_mlp_speed[True-False] 0.8053ms 0.6873ms 1.4549 KOps/s 1.4511 KOps/s $\color{#35bf28}+0.26\%$
test_vmap_mlp_speed[False-True] 0.6888ms 0.5712ms 1.7508 KOps/s 1.6867 KOps/s $\color{#35bf28}+3.80\%$
test_vmap_mlp_speed[False-False] 0.6846ms 0.5734ms 1.7441 KOps/s 1.7385 KOps/s $\color{#35bf28}+0.32\%$
test_vmap_mlp_speed_decorator[True-True] 1.4529ms 0.6726ms 1.4867 KOps/s 1.4687 KOps/s $\color{#35bf28}+1.23\%$
test_vmap_mlp_speed_decorator[True-False] 0.7990ms 0.6733ms 1.4852 KOps/s 1.4831 KOps/s $\color{#35bf28}+0.14\%$
test_vmap_mlp_speed_decorator[False-True] 0.7174ms 0.5863ms 1.7056 KOps/s 1.6901 KOps/s $\color{#35bf28}+0.92\%$
test_vmap_mlp_speed_decorator[False-False] 0.6833ms 0.5863ms 1.7056 KOps/s 1.6968 KOps/s $\color{#35bf28}+0.52\%$
test_vmap_transformer_speed[True-True] 8.4045ms 8.3206ms 120.1842 Ops/s 118.4981 Ops/s $\color{#35bf28}+1.42\%$
test_vmap_transformer_speed[True-False] 8.4758ms 8.3267ms 120.0952 Ops/s 118.6619 Ops/s $\color{#35bf28}+1.21\%$
test_vmap_transformer_speed[False-True] 8.3386ms 8.1361ms 122.9085 Ops/s 121.1420 Ops/s $\color{#35bf28}+1.46\%$
test_vmap_transformer_speed[False-False] 8.1961ms 8.1112ms 123.2869 Ops/s 121.6334 Ops/s $\color{#35bf28}+1.36\%$
test_vmap_transformer_speed_decorator[True-True] 19.5978ms 19.4790ms 51.3374 Ops/s 50.8691 Ops/s $\color{#35bf28}+0.92\%$
test_vmap_transformer_speed_decorator[True-False] 19.8876ms 19.5089ms 51.2586 Ops/s 50.9203 Ops/s $\color{#35bf28}+0.66\%$
test_vmap_transformer_speed_decorator[False-True] 19.4790ms 19.3561ms 51.6632 Ops/s 51.4193 Ops/s $\color{#35bf28}+0.47\%$
test_vmap_transformer_speed_decorator[False-False] 19.4759ms 19.3681ms 51.6313 Ops/s 51.1614 Ops/s $\color{#35bf28}+0.92\%$
test_to_module_speed[True] 1.3960ms 0.9220ms 1.0846 KOps/s 1.0824 KOps/s $\color{#35bf28}+0.20\%$
test_to_module_speed[False] 1.2655ms 0.9226ms 1.0838 KOps/s 1.1019 KOps/s $\color{#d91a1a}-1.64\%$
test_tc_init 69.9410μs 35.5237μs 28.1502 KOps/s 28.9700 KOps/s $\color{#d91a1a}-2.83\%$
test_tc_init_nested 0.1096ms 75.3917μs 13.2641 KOps/s 14.2238 KOps/s $\textbf{\color{#d91a1a}-6.75\%}$
test_tc_first_layer_tensor 3.6959μs 0.6776μs 1.4757 MOps/s 1.4672 MOps/s $\color{#35bf28}+0.58\%$
test_tc_first_layer_nontensor 32.6400μs 2.2157μs 451.3147 KOps/s 446.9932 KOps/s $\color{#35bf28}+0.97\%$
test_tc_second_layer_tensor 9.6500μs 1.3662μs 731.9514 KOps/s 732.2887 KOps/s $\color{#d91a1a}-0.05\%$
test_tc_second_layer_nontensor 25.2400μs 2.9474μs 339.2806 KOps/s 342.0065 KOps/s $\color{#d91a1a}-0.80\%$
test_unbind 0.1867s 12.1442ms 82.3441 Ops/s 96.6823 Ops/s $\textbf{\color{#d91a1a}-14.83\%}$
test_full_like 0.6500ms 0.5738ms 1.7429 KOps/s 1.7426 KOps/s $\color{#35bf28}+0.02\%$
test_zeros_like 0.2896ms 0.1979ms 5.0542 KOps/s 5.0553 KOps/s $\color{#d91a1a}-0.02\%$
test_ones_like 0.2776ms 0.1976ms 5.0605 KOps/s 5.0585 KOps/s $\color{#35bf28}+0.04\%$
test_clone 0.4560ms 0.4141ms 2.4148 KOps/s 2.4134 KOps/s $\color{#35bf28}+0.06\%$
test_squeeze 62.1610μs 9.8100μs 101.9366 KOps/s 102.9250 KOps/s $\color{#d91a1a}-0.96\%$
test_unsqueeze 0.2462ms 73.4990μs 13.6056 KOps/s 13.2795 KOps/s $\color{#35bf28}+2.46\%$
test_split 0.3827ms 0.1568ms 6.3756 KOps/s 6.2518 KOps/s $\color{#35bf28}+1.98\%$
test_permute 0.2446ms 0.1808ms 5.5302 KOps/s 5.6376 KOps/s $\color{#d91a1a}-1.91\%$
test_stack 1.2534ms 0.8706ms 1.1486 KOps/s 1.1641 KOps/s $\color{#d91a1a}-1.33\%$
test_cat 1.2629ms 1.2316ms 811.9481 Ops/s 812.1283 Ops/s $\color{#d91a1a}-0.02\%$
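
For readers checking the table, the "Ops" and "Change" columns appear to be derived quantities: "Ops" matches the reciprocal of the mean runtime, and "Change" matches the relative difference between "Ops" and "Ops on Repo HEAD". The sketch below reproduces two rows under that assumption. The function names `ops_from_mean` and `relative_change` are hypothetical helpers for illustration, not part of the benchmark tooling.

```python
def ops_from_mean(mean_seconds: float) -> float:
    """Throughput in operations per second, assuming Ops = 1 / mean runtime."""
    return 1.0 / mean_seconds


def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to repo HEAD (assumed formula)."""
    return (ops_pr - ops_head) / ops_head * 100.0


# Values copied from the table rows (throughput in KOps/s and Ops/s respectively).
setitem_int_change = relative_change(21.9439, 18.4211)  # test_setitem_dim[int]
unbind_change = relative_change(82.3441, 96.6823)       # test_unbind

# Mean runtime of test_setitem_dim[int] is 45.5708 microseconds.
setitem_int_kops = ops_from_mean(45.5708e-6) / 1e3

print(f"test_setitem_dim[int]: {setitem_int_change:+.2f}% at {setitem_int_kops:.4f} KOps/s")
print(f"test_unbind: {unbind_change:+.2f}%")
```

Because the change is relative to HEAD throughput, rows with very small absolute runtimes can show large percentage swings from measurement noise alone.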

@vmoens vmoens merged commit da98f1e into main Sep 11, 2024
48 checks passed
@vmoens vmoens deleted the faster-setitem branch September 11, 2024 10:26
Labels: CLA Signed, Performance
2 participants