
[Feature] to for consolidated TDs #851

Merged
merged 58 commits into from
Jul 26, 2024


@vmoens (Contributor) commented Jul 3, 2024

Features:

  • copying a consolidated td to another device keeps it consolidated
  • a new `is_consolidated` function
  • support for non-indexed CUDA devices such as `"cuda"` (the index is assumed to be 0)
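The three features can be illustrated with a deliberately simplified, hypothetical model. It assumes that a consolidated td keeps every leaf as a view into one flat storage, so moving it to another device means copying that single buffer and reusing the recorded offsets. `ToyConsolidatedDict` and `normalize_device` are illustrative names, not tensordict's API; Python lists stand in for tensors and device strings are only labels. For context, in PyTorch `torch.device("cuda")` has `index` set to `None`, which is the case the third bullet resolves to index 0.

```python
from dataclasses import dataclass

# Hypothetical toy model, NOT tensordict's implementation: Python lists stand
# in for tensors and device strings are only labels.


def normalize_device(device: str) -> str:
    """Treat a non-indexed "cuda" device as "cuda:0" (index assumed to be 0)."""
    kind, _, index = device.partition(":")
    if kind == "cuda" and not index:
        return "cuda:0"
    return device


@dataclass
class ToyConsolidatedDict:
    buffer: list   # one flat storage shared by every leaf
    layout: dict   # key -> (offset, length) into `buffer`
    device: str = "cpu"

    @classmethod
    def consolidate(cls, leaves: dict, device: str = "cpu") -> "ToyConsolidatedDict":
        # Pack every leaf back to back into a single buffer, recording offsets.
        buffer, layout = [], {}
        for key, values in leaves.items():
            layout[key] = (len(buffer), len(values))
            buffer.extend(values)
        return cls(buffer, layout, normalize_device(device))

    def is_consolidated(self) -> bool:
        # True when the leaves tile the flat buffer contiguously, with no gaps.
        end = 0
        for offset, length in sorted(self.layout.values()):
            if offset != end:
                return False
            end += length
        return end == len(self.buffer)

    def __getitem__(self, key: str) -> list:
        offset, length = self.layout[key]
        return self.buffer[offset:offset + length]

    def to(self, device: str) -> "ToyConsolidatedDict":
        # One bulk copy of the flat buffer; reusing the layout means the
        # copy is still consolidated on the target device.
        return ToyConsolidatedDict(list(self.buffer), dict(self.layout),
                                   normalize_device(device))


td = ToyConsolidatedDict.consolidate({"obs": [1.0, 2.0], "reward": [0.5]})
moved = td.to("cuda")
print(moved.device, moved.is_consolidated(), moved["obs"])  # cuda:0 True [1.0, 2.0]
```

The design choice the sketch highlights is that a device transfer of a consolidated structure is a single bulk copy plus metadata reuse, rather than one copy per leaf followed by a fresh consolidation.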

@facebook-github-bot added the CLA Signed label Jul 3, 2024

github-actions bot commented Jul 3, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 219. Improved: 14. Worsened: 21.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 52.8980μs 21.6085μs 46.2782 KOps/s 46.0774 KOps/s $\color{#35bf28}+0.44\%$
test_plain_set_stack_nested 62.4060μs 21.6224μs 46.2483 KOps/s 46.3197 KOps/s $\color{#d91a1a}-0.15\%$
test_plain_set_nested_inplace 65.3620μs 23.5729μs 42.4217 KOps/s 42.5739 KOps/s $\color{#d91a1a}-0.36\%$
test_plain_set_stack_nested_inplace 66.1630μs 23.4442μs 42.6544 KOps/s 42.8212 KOps/s $\color{#d91a1a}-0.39\%$
test_items 19.9660μs 2.7939μs 357.9265 KOps/s 369.7970 KOps/s $\color{#d91a1a}-3.21\%$
test_items_nested 0.4680ms 0.3417ms 2.9263 KOps/s 2.9318 KOps/s $\color{#d91a1a}-0.19\%$
test_items_nested_locked 0.4890ms 0.3387ms 2.9521 KOps/s 2.9494 KOps/s $\color{#35bf28}+0.09\%$
test_items_nested_leaf 0.1547ms 85.1932μs 11.7380 KOps/s 11.0420 KOps/s $\textbf{\color{#35bf28}+6.30\%}$
test_items_stack_nested 0.5396ms 0.3422ms 2.9221 KOps/s 2.9529 KOps/s $\color{#d91a1a}-1.04\%$
test_items_stack_nested_leaf 0.1465ms 81.3683μs 12.2898 KOps/s 10.6561 KOps/s $\textbf{\color{#35bf28}+15.33\%}$
test_items_stack_nested_locked 0.4894ms 0.3444ms 2.9034 KOps/s 2.9334 KOps/s $\color{#d91a1a}-1.02\%$
test_keys 26.4290μs 3.8736μs 258.1606 KOps/s 258.3295 KOps/s $\color{#d91a1a}-0.07\%$
test_keys_nested 0.2551ms 0.1489ms 6.7173 KOps/s 7.0151 KOps/s $\color{#d91a1a}-4.24\%$
test_keys_nested_locked 0.7648ms 0.1538ms 6.5040 KOps/s 6.7536 KOps/s $\color{#d91a1a}-3.70\%$
test_keys_nested_leaf 0.2108ms 0.1279ms 7.8205 KOps/s 8.0279 KOps/s $\color{#d91a1a}-2.58\%$
test_keys_stack_nested 0.2351ms 0.1454ms 6.8763 KOps/s 6.9369 KOps/s $\color{#d91a1a}-0.87\%$
test_keys_stack_nested_leaf 0.2094ms 0.1251ms 7.9958 KOps/s 8.1830 KOps/s $\color{#d91a1a}-2.29\%$
test_keys_stack_nested_locked 0.2934ms 0.1513ms 6.6073 KOps/s 6.7121 KOps/s $\color{#d91a1a}-1.56\%$
test_values 7.0908μs 1.1897μs 840.5226 KOps/s 822.7462 KOps/s $\color{#35bf28}+2.16\%$
test_values_nested 0.1036ms 52.5910μs 19.0147 KOps/s 20.2883 KOps/s $\textbf{\color{#d91a1a}-6.28\%}$
test_values_nested_locked 0.1243ms 52.4783μs 19.0555 KOps/s 20.3097 KOps/s $\textbf{\color{#d91a1a}-6.18\%}$
test_values_nested_leaf 0.1211ms 46.9053μs 21.3196 KOps/s 22.5796 KOps/s $\textbf{\color{#d91a1a}-5.58\%}$
test_values_stack_nested 0.1318ms 52.8112μs 18.9354 KOps/s 20.2097 KOps/s $\textbf{\color{#d91a1a}-6.31\%}$
test_values_stack_nested_leaf 83.9160μs 45.7445μs 21.8606 KOps/s 22.5710 KOps/s $\color{#d91a1a}-3.15\%$
test_values_stack_nested_locked 0.1125ms 53.9553μs 18.5339 KOps/s 20.2373 KOps/s $\textbf{\color{#d91a1a}-8.42\%}$
test_membership 51.9360μs 0.9201μs 1.0868 MOps/s 1.2886 MOps/s $\textbf{\color{#d91a1a}-15.66\%}$
test_membership_nested 21.2200μs 2.6327μs 379.8386 KOps/s 371.1844 KOps/s $\color{#35bf28}+2.33\%$
test_membership_nested_leaf 25.9980μs 2.6719μs 374.2642 KOps/s 367.8550 KOps/s $\color{#35bf28}+1.74\%$
test_membership_stacked_nested 27.7120μs 2.6325μs 379.8660 KOps/s 373.3144 KOps/s $\color{#35bf28}+1.75\%$
test_membership_stacked_nested_leaf 23.5330μs 2.6542μs 376.7594 KOps/s 370.0492 KOps/s $\color{#35bf28}+1.81\%$
test_membership_nested_last 34.1130μs 3.9880μs 250.7520 KOps/s 253.5955 KOps/s $\color{#d91a1a}-1.12\%$
test_membership_nested_leaf_last 27.6810μs 4.0497μs 246.9302 KOps/s 253.8283 KOps/s $\color{#d91a1a}-2.72\%$
test_membership_stacked_nested_last 44.1120μs 12.6263μs 79.1996 KOps/s 254.7856 KOps/s $\textbf{\color{#d91a1a}-68.92\%}$
test_membership_stacked_nested_leaf_last 46.7370μs 12.7154μs 78.6451 KOps/s 252.9422 KOps/s $\textbf{\color{#d91a1a}-68.91\%}$
test_nested_getleaf 30.9670μs 10.6949μs 93.5025 KOps/s 96.8905 KOps/s $\color{#d91a1a}-3.50\%$
test_nested_get 30.9270μs 10.1390μs 98.6293 KOps/s 99.4694 KOps/s $\color{#d91a1a}-0.84\%$
test_stacked_getleaf 34.6350μs 10.6127μs 94.2264 KOps/s 97.5676 KOps/s $\color{#d91a1a}-3.42\%$
test_stacked_get 34.6540μs 9.8399μs 101.6273 KOps/s 102.5881 KOps/s $\color{#d91a1a}-0.94\%$
test_nested_getitemleaf 33.1320μs 11.1248μs 89.8894 KOps/s 92.3215 KOps/s $\color{#d91a1a}-2.63\%$
test_nested_getitem 63.2380μs 10.2755μs 97.3191 KOps/s 99.6119 KOps/s $\color{#d91a1a}-2.30\%$
test_stacked_getitemleaf 37.6400μs 11.0912μs 90.1613 KOps/s 94.0442 KOps/s $\color{#d91a1a}-4.13\%$
test_stacked_getitem 46.6570μs 10.1260μs 98.7558 KOps/s 100.5286 KOps/s $\color{#d91a1a}-1.76\%$
test_lock_nested 85.9254ms 0.5806ms 1.7224 KOps/s 2.0476 KOps/s $\textbf{\color{#d91a1a}-15.88\%}$
test_lock_stack_nested 0.6902ms 0.4484ms 2.2301 KOps/s 2.1491 KOps/s $\color{#35bf28}+3.77\%$
test_unlock_nested 80.9117ms 0.4936ms 2.0260 KOps/s 2.4628 KOps/s $\textbf{\color{#d91a1a}-17.73\%}$
test_unlock_stack_nested 0.7460ms 0.3647ms 2.7420 KOps/s 2.6134 KOps/s $\color{#35bf28}+4.92\%$
test_flatten_speed 0.5306ms 0.1034ms 9.6747 KOps/s 8.9023 KOps/s $\textbf{\color{#35bf28}+8.68\%}$
test_unflatten_speed 0.6326ms 0.4304ms 2.3232 KOps/s 2.3298 KOps/s $\color{#d91a1a}-0.29\%$
test_common_ops 1.7271ms 1.0743ms 930.8190 Ops/s 903.0308 Ops/s $\color{#35bf28}+3.08\%$
test_creation 21.5110μs 2.0815μs 480.4225 KOps/s 448.4523 KOps/s $\textbf{\color{#35bf28}+7.13\%}$
test_creation_empty 46.8470μs 17.8320μs 56.0791 KOps/s 54.4557 KOps/s $\color{#35bf28}+2.98\%$
test_creation_nested_1 50.6540μs 20.7687μs 48.1495 KOps/s 46.2297 KOps/s $\color{#35bf28}+4.15\%$
test_creation_nested_2 56.0240μs 24.6958μs 40.4927 KOps/s 40.0059 KOps/s $\color{#35bf28}+1.22\%$
test_clone 91.8600μs 16.5769μs 60.3248 KOps/s 61.3629 KOps/s $\color{#d91a1a}-1.69\%$
test_getitem[int] 1.2216ms 16.5588μs 60.3908 KOps/s 62.8990 KOps/s $\color{#d91a1a}-3.99\%$
test_getitem[slice_int] 0.1332ms 31.9382μs 31.3105 KOps/s 32.3713 KOps/s $\color{#d91a1a}-3.28\%$
test_getitem[range] 0.3255ms 57.0565μs 17.5265 KOps/s 17.8361 KOps/s $\color{#d91a1a}-1.74\%$
test_getitem[tuple] 0.1233ms 25.7001μs 38.9103 KOps/s 40.0276 KOps/s $\color{#d91a1a}-2.79\%$
test_getitem[list] 0.2062ms 52.3208μs 19.1129 KOps/s 19.3378 KOps/s $\color{#d91a1a}-1.16\%$
test_setitem_dim[int] 65.2910μs 39.4733μs 25.3336 KOps/s 24.2655 KOps/s $\color{#35bf28}+4.40\%$
test_setitem_dim[slice_int] 0.1229ms 69.7587μs 14.3351 KOps/s 13.8811 KOps/s $\color{#35bf28}+3.27\%$
test_setitem_dim[range] 0.1472ms 91.1710μs 10.9684 KOps/s 10.6755 KOps/s $\color{#35bf28}+2.74\%$
test_setitem_dim[tuple] 0.1131ms 56.5116μs 17.6955 KOps/s 16.7354 KOps/s $\textbf{\color{#35bf28}+5.74\%}$
test_setitem 0.1065ms 28.3784μs 35.2381 KOps/s 33.4328 KOps/s $\textbf{\color{#35bf28}+5.40\%}$
test_set 85.7690μs 27.5203μs 36.3368 KOps/s 34.2606 KOps/s $\textbf{\color{#35bf28}+6.06\%}$
test_set_shared 4.3922ms 0.2161ms 4.6266 KOps/s 4.6540 KOps/s $\color{#d91a1a}-0.59\%$
test_update 0.1878ms 34.5707μs 28.9262 KOps/s 26.8435 KOps/s $\textbf{\color{#35bf28}+7.76\%}$
test_update_nested 0.1242ms 44.5741μs 22.4346 KOps/s 21.3420 KOps/s $\textbf{\color{#35bf28}+5.12\%}$
test_update__nested 0.1233ms 34.2339μs 29.2108 KOps/s 29.0530 KOps/s $\color{#35bf28}+0.54\%$
test_set_nested 0.1044ms 30.0317μs 33.2982 KOps/s 31.6455 KOps/s $\textbf{\color{#35bf28}+5.22\%}$
test_set_nested_new 0.1489ms 34.8784μs 28.6710 KOps/s 27.3463 KOps/s $\color{#35bf28}+4.84\%$
test_select 0.1136ms 52.0179μs 19.2241 KOps/s 18.7157 KOps/s $\color{#35bf28}+2.72\%$
test_select_nested 0.1185ms 59.8656μs 16.7041 KOps/s 17.1900 KOps/s $\color{#d91a1a}-2.83\%$
test_exclude_nested 0.1439ms 77.9455μs 12.8295 KOps/s 13.1359 KOps/s $\color{#d91a1a}-2.33\%$
test_empty[True] 0.4644ms 0.3242ms 3.0848 KOps/s 3.1596 KOps/s $\color{#d91a1a}-2.37\%$
test_empty[False] 7.0830μs 1.1672μs 856.7260 KOps/s 879.8575 KOps/s $\color{#d91a1a}-2.63\%$
test_unbind_speed 0.4200ms 0.3123ms 3.2021 KOps/s 3.2970 KOps/s $\color{#d91a1a}-2.88\%$
test_unbind_speed_stack0 0.4527ms 0.2959ms 3.3798 KOps/s 3.3316 KOps/s $\color{#35bf28}+1.45\%$
test_unbind_speed_stack1 83.6185ms 0.7635ms 1.3097 KOps/s 1.3936 KOps/s $\textbf{\color{#d91a1a}-6.02\%}$
test_split 81.1058ms 2.1403ms 467.2290 Ops/s 474.2652 Ops/s $\color{#d91a1a}-1.48\%$
test_chunk 79.2577ms 2.1349ms 468.3988 Ops/s 472.3293 Ops/s $\color{#d91a1a}-0.83\%$
test_creation[device0] 0.2314ms 0.1175ms 8.5074 KOps/s 8.4281 KOps/s $\color{#35bf28}+0.94\%$
test_creation_from_tensor 4.3762ms 0.1197ms 8.3567 KOps/s 8.4580 KOps/s $\color{#d91a1a}-1.20\%$
test_add_one[memmap_tensor0] 0.1643ms 7.8699μs 127.0662 KOps/s 130.7332 KOps/s $\color{#d91a1a}-2.80\%$
test_contiguous[memmap_tensor0] 20.2870μs 1.9717μs 507.1672 KOps/s 505.8105 KOps/s $\color{#35bf28}+0.27\%$
test_stack[memmap_tensor0] 60.1320μs 5.8789μs 170.0985 KOps/s 172.5930 KOps/s $\color{#d91a1a}-1.45\%$
test_memmaptd_index 1.1456ms 0.4031ms 2.4810 KOps/s 2.5766 KOps/s $\color{#d91a1a}-3.71\%$
test_memmaptd_index_astensor 0.9852ms 0.4815ms 2.0768 KOps/s 2.1298 KOps/s $\color{#d91a1a}-2.49\%$
test_memmaptd_index_op 1.9908ms 1.0214ms 979.0035 Ops/s 983.6298 Ops/s $\color{#d91a1a}-0.47\%$
test_serialize_model 0.1256s 0.1184s 8.4457 Ops/s 7.3667 Ops/s $\textbf{\color{#35bf28}+14.65\%}$
test_serialize_model_pickle 0.4835s 0.4014s 2.4911 Ops/s 2.5053 Ops/s $\color{#d91a1a}-0.57\%$
test_serialize_weights 0.1220s 0.1173s 8.5245 Ops/s 8.1702 Ops/s $\color{#35bf28}+4.34\%$
test_serialize_weights_returnearly 0.1820s 0.1610s 6.2115 Ops/s 6.0155 Ops/s $\color{#35bf28}+3.26\%$
test_serialize_weights_pickle 0.4788s 0.4375s 2.2858 Ops/s 2.4933 Ops/s $\textbf{\color{#d91a1a}-8.32\%}$
test_serialize_weights_filesystem 0.1468s 0.1424s 7.0221 Ops/s 6.5266 Ops/s $\textbf{\color{#35bf28}+7.59\%}$
test_serialize_model_filesystem 0.2357s 0.1615s 6.1927 Ops/s 6.7493 Ops/s $\textbf{\color{#d91a1a}-8.25\%}$
test_reshape_pytree 88.4210μs 39.8302μs 25.1066 KOps/s 25.2135 KOps/s $\color{#d91a1a}-0.42\%$
test_reshape_td 0.1061ms 45.4555μs 21.9995 KOps/s 21.6553 KOps/s $\color{#35bf28}+1.59\%$
test_view_pytree 81.2610μs 39.4789μs 25.3300 KOps/s 25.3286 KOps/s $+0.01\%$
test_view_td 0.1005ms 51.0952μs 19.5713 KOps/s 18.7034 KOps/s $\color{#35bf28}+4.64\%$
test_unbind_pytree 82.3030μs 37.1947μs 26.8856 KOps/s 27.0530 KOps/s $\color{#d91a1a}-0.62\%$
test_unbind_td 0.3576ms 45.4845μs 21.9855 KOps/s 22.3086 KOps/s $\color{#d91a1a}-1.45\%$
test_split_pytree 95.0660μs 39.7613μs 25.1501 KOps/s 24.8517 KOps/s $\color{#35bf28}+1.20\%$
test_split_td 0.5362ms 58.1080μs 17.2093 KOps/s 17.3767 KOps/s $\color{#d91a1a}-0.96\%$
test_add_pytree 0.1223ms 46.5505μs 21.4821 KOps/s 20.7916 KOps/s $\color{#35bf28}+3.32\%$
test_add_td 0.1557ms 82.0411μs 12.1890 KOps/s 11.9981 KOps/s $\color{#35bf28}+1.59\%$
test_compile_add_one_nested[tensordict-compile] 0.1215ms 54.1227μs 18.4765 KOps/s 18.0192 KOps/s $\color{#35bf28}+2.54\%$
test_compile_add_one_nested[tensordict-eager] 5.5584ms 0.1883ms 5.3103 KOps/s 5.1872 KOps/s $\color{#35bf28}+2.37\%$
test_compile_add_one_nested[pytree-compile] 0.1910ms 54.9352μs 18.2033 KOps/s 18.4936 KOps/s $\color{#d91a1a}-1.57\%$
test_compile_add_one_nested[pytree-eager] 0.3049ms 0.1473ms 6.7900 KOps/s 7.0215 KOps/s $\color{#d91a1a}-3.30\%$
test_compile_copy_nested[tensordict-compile] 56.8950μs 20.7285μs 48.2428 KOps/s 48.7235 KOps/s $\color{#d91a1a}-0.99\%$
test_compile_copy_nested[tensordict-eager] 0.1307ms 65.0015μs 15.3843 KOps/s 15.6160 KOps/s $\color{#d91a1a}-1.48\%$
test_compile_copy_nested[pytree-compile] 0.1587ms 79.4102μs 12.5928 KOps/s 12.4762 KOps/s $\color{#35bf28}+0.93\%$
test_compile_copy_nested[pytree-eager] 0.1494ms 71.4733μs 13.9912 KOps/s 13.9099 KOps/s $\color{#35bf28}+0.58\%$
test_compile_add_one_flat[tensordict-compile] 0.2939ms 0.1751ms 5.7114 KOps/s 5.7583 KOps/s $\color{#d91a1a}-0.81\%$
test_compile_add_one_flat[tensordict-eager] 0.3358ms 0.1936ms 5.1646 KOps/s 5.1036 KOps/s $\color{#35bf28}+1.20\%$
test_compile_add_one_flat[tensorclass-compile] 0.1047ms 38.2203μs 26.1641 KOps/s 25.5137 KOps/s $\color{#35bf28}+2.55\%$
test_compile_add_one_flat[tensorclass-eager] 0.4794ms 69.4273μs 14.4036 KOps/s 14.4244 KOps/s $\color{#d91a1a}-0.14\%$
test_compile_add_one_flat[pytree-compile] 0.2870ms 0.1735ms 5.7623 KOps/s 5.6464 KOps/s $\color{#35bf28}+2.05\%$
test_compile_add_one_flat[pytree-eager] 0.5746ms 0.3053ms 3.2756 KOps/s 3.4467 KOps/s $\color{#d91a1a}-4.96\%$
test_compile_add_self_flat[tensordict-eager] 0.4202ms 0.2107ms 4.7467 KOps/s 4.7203 KOps/s $\color{#35bf28}+0.56\%$
test_compile_add_self_flat[tensordict-compile] 0.3696ms 0.1798ms 5.5630 KOps/s 5.7835 KOps/s $\color{#d91a1a}-3.81\%$
test_compile_add_self_flat[tensorclass-eager] 0.7473ms 63.8080μs 15.6720 KOps/s 15.2298 KOps/s $\color{#35bf28}+2.90\%$
test_compile_add_self_flat[tensorclass-compile] 0.1176ms 41.0402μs 24.3664 KOps/s 24.8409 KOps/s $\color{#d91a1a}-1.91\%$
test_compile_add_self_flat[pytree-eager] 0.4673ms 0.2544ms 3.9311 KOps/s 4.2263 KOps/s $\textbf{\color{#d91a1a}-6.98\%}$
test_compile_add_self_flat[pytree-compile] 0.2917ms 0.1767ms 5.6583 KOps/s 5.7838 KOps/s $\color{#d91a1a}-2.17\%$
test_compile_copy_flat[tensordict-compile] 0.2481ms 0.1108ms 9.0264 KOps/s 9.3947 KOps/s $\color{#d91a1a}-3.92\%$
test_compile_copy_flat[tensordict-eager] 0.1212ms 59.9184μs 16.6894 KOps/s 17.5468 KOps/s $\color{#d91a1a}-4.89\%$
test_compile_copy_flat[pytree-compile] 0.1635ms 80.5553μs 12.4138 KOps/s 12.1577 KOps/s $\color{#35bf28}+2.11\%$
test_compile_copy_flat[pytree-eager] 0.1851ms 76.3412μs 13.0991 KOps/s 13.6009 KOps/s $\color{#d91a1a}-3.69\%$
test_compile_assign_and_add[tensordict-compile] 0.2686ms 0.1926ms 5.1926 KOps/s 5.1723 KOps/s $\color{#35bf28}+0.39\%$
test_compile_assign_and_add[tensordict-eager] 3.1758ms 1.6363ms 611.1318 Ops/s 606.4360 Ops/s $\color{#35bf28}+0.77\%$
test_compile_assign_and_add[pytree-compile] 0.2754ms 0.1940ms 5.1557 KOps/s 5.2447 KOps/s $\color{#d91a1a}-1.70\%$
test_compile_assign_and_add[pytree-eager] 1.4049ms 1.1202ms 892.6635 Ops/s 917.4139 Ops/s $\color{#d91a1a}-2.70\%$
test_compile_assign_and_add_stack[compile] 0.5697ms 0.4310ms 2.3199 KOps/s 2.3452 KOps/s $\color{#d91a1a}-1.08\%$
test_compile_assign_and_add_stack[eager] 4.4177ms 3.8494ms 259.7789 Ops/s 260.6636 Ops/s $\color{#d91a1a}-0.34\%$
test_compile_indexing[tensor-tensordict-compile] 0.1150ms 32.1318μs 31.1218 KOps/s 30.7179 KOps/s $\color{#35bf28}+1.32\%$
test_compile_indexing[tensor-tensordict-eager] 1.4117ms 49.1570μs 20.3430 KOps/s 20.9095 KOps/s $\color{#d91a1a}-2.71\%$
test_compile_indexing[tensor-tensorclass-compile] 95.9910μs 28.3919μs 35.2213 KOps/s 34.0862 KOps/s $\color{#35bf28}+3.33\%$
test_compile_indexing[tensor-tensorclass-eager] 86.2600μs 31.6588μs 31.5868 KOps/s 32.9586 KOps/s $\color{#d91a1a}-4.16\%$
test_compile_indexing[tensor-pytree-compile] 72.0650μs 28.4990μs 35.0889 KOps/s 34.9835 KOps/s $\color{#35bf28}+0.30\%$
test_compile_indexing[tensor-pytree-eager] 0.1229ms 31.3771μs 31.8704 KOps/s 33.3286 KOps/s $\color{#d91a1a}-4.38\%$
test_compile_indexing[slice-tensordict-compile] 0.1389ms 71.4036μs 14.0049 KOps/s 13.6144 KOps/s $\color{#35bf28}+2.87\%$
test_compile_indexing[slice-tensordict-eager] 0.5205ms 27.8690μs 35.8822 KOps/s 36.2543 KOps/s $\color{#d91a1a}-1.03\%$
test_compile_indexing[slice-tensorclass-compile] 0.1341ms 67.1831μs 14.8847 KOps/s 14.6413 KOps/s $\color{#35bf28}+1.66\%$
test_compile_indexing[slice-tensorclass-eager] 82.1820μs 25.4710μs 39.2603 KOps/s 41.1571 KOps/s $\color{#d91a1a}-4.61\%$
test_compile_indexing[slice-pytree-compile] 0.1396ms 66.9494μs 14.9367 KOps/s 14.5977 KOps/s $\color{#35bf28}+2.32\%$
test_compile_indexing[slice-pytree-eager] 84.4070μs 24.9488μs 40.0822 KOps/s 41.0291 KOps/s $\color{#d91a1a}-2.31\%$
test_compile_indexing[int-tensordict-compile] 0.1627ms 72.7478μs 13.7461 KOps/s 13.7638 KOps/s $\color{#d91a1a}-0.13\%$
test_compile_indexing[int-tensordict-eager] 1.2161ms 27.9498μs 35.7784 KOps/s 36.7783 KOps/s $\color{#d91a1a}-2.72\%$
test_compile_indexing[int-tensorclass-compile] 0.1435ms 66.7395μs 14.9836 KOps/s 14.6336 KOps/s $\color{#35bf28}+2.39\%$
test_compile_indexing[int-tensorclass-eager] 64.3890μs 24.2954μs 41.1601 KOps/s 40.5824 KOps/s $\color{#35bf28}+1.42\%$
test_compile_indexing[int-pytree-compile] 0.1388ms 66.9091μs 14.9456 KOps/s 14.7385 KOps/s $\color{#35bf28}+1.41\%$
test_compile_indexing[int-pytree-eager] 73.1960μs 24.8246μs 40.2826 KOps/s 41.6137 KOps/s $\color{#d91a1a}-3.20\%$
test_mod_add[eager] 0.1170ms 25.0730μs 39.8836 KOps/s 39.7294 KOps/s $\color{#35bf28}+0.39\%$
test_mod_add[compile] 0.1157ms 37.6377μs 26.5691 KOps/s 25.4844 KOps/s $\color{#35bf28}+4.26\%$
test_mod_add[compile-overhead] 0.1096ms 38.6503μs 25.8730 KOps/s 25.9134 KOps/s $\color{#d91a1a}-0.16\%$
test_mod_wrap[eager] 0.4214ms 0.2085ms 4.7951 KOps/s 4.7396 KOps/s $\color{#35bf28}+1.17\%$
test_mod_wrap[compile] 1.6118ms 0.2278ms 4.3901 KOps/s 4.3789 KOps/s $\color{#35bf28}+0.26\%$
test_mod_wrap[compile-overhead] 0.3862ms 0.2353ms 4.2503 KOps/s 4.3808 KOps/s $\color{#d91a1a}-2.98\%$
test_mod_wrap_and_backward[eager] 15.5966ms 11.4411ms 87.4044 Ops/s 89.8491 Ops/s $\color{#d91a1a}-2.72\%$
test_mod_wrap_and_backward[compile] 13.8205ms 11.9448ms 83.7186 Ops/s 92.5024 Ops/s $\textbf{\color{#d91a1a}-9.50\%}$
test_mod_wrap_and_backward[compile-overhead] 14.7329ms 12.3089ms 81.2419 Ops/s 86.1772 Ops/s $\textbf{\color{#d91a1a}-5.73\%}$
test_seq_add[eager] 0.1523ms 85.0629μs 11.7560 KOps/s 11.3606 KOps/s $\color{#35bf28}+3.48\%$
test_seq_add[compile] 0.1495ms 60.2288μs 16.6033 KOps/s 16.1667 KOps/s $\color{#35bf28}+2.70\%$
test_seq_add[compile-overhead] 0.1597ms 59.5254μs 16.7996 KOps/s 16.6177 KOps/s $\color{#35bf28}+1.09\%$
test_seq_wrap[eager] 0.6263ms 0.3810ms 2.6248 KOps/s 2.6760 KOps/s $\color{#d91a1a}-1.91\%$
test_seq_wrap[compile] 0.7894ms 0.2611ms 3.8295 KOps/s 3.7899 KOps/s $\color{#35bf28}+1.04\%$
test_seq_wrap[compile-overhead] 0.6519ms 0.2615ms 3.8234 KOps/s 3.7641 KOps/s $\color{#35bf28}+1.58\%$
test_func_call_runtime[False-eager] 0.9249ms 0.5307ms 1.8842 KOps/s 1.9256 KOps/s $\color{#d91a1a}-2.15\%$
test_func_call_runtime[False-compile] 0.6473ms 0.4973ms 2.0110 KOps/s 1.9986 KOps/s $\color{#35bf28}+0.62\%$
test_func_call_runtime[False-compile-overhead] 0.8868ms 0.5020ms 1.9920 KOps/s 2.0094 KOps/s $\color{#d91a1a}-0.87\%$
test_func_call_runtime[True-eager] 0.8664ms 0.7563ms 1.3222 KOps/s 1.3321 KOps/s $\color{#d91a1a}-0.74\%$
test_func_call_runtime[True-compile] 0.8038ms 0.5139ms 1.9461 KOps/s 1.9413 KOps/s $\color{#35bf28}+0.24\%$
test_func_call_runtime[True-compile-overhead] 0.8916ms 0.5159ms 1.9385 KOps/s 1.9668 KOps/s $\color{#d91a1a}-1.44\%$
test_func_call_cm_runtime[False-eager] 0.7649ms 0.5229ms 1.9126 KOps/s 1.9101 KOps/s $\color{#35bf28}+0.13\%$
test_func_call_cm_runtime[False-compile] 0.6379ms 0.4961ms 2.0157 KOps/s 2.0249 KOps/s $\color{#d91a1a}-0.45\%$
test_func_call_cm_runtime[False-compile-overhead] 0.6832ms 0.5001ms 1.9995 KOps/s 2.0004 KOps/s $\color{#d91a1a}-0.04\%$
test_func_call_cm_runtime[True-eager] 1.2359ms 0.8981ms 1.1135 KOps/s 1.1304 KOps/s $\color{#d91a1a}-1.50\%$
test_func_call_cm_runtime[True-compile] 1.3844ms 0.8531ms 1.1722 KOps/s 1.1889 KOps/s $\color{#d91a1a}-1.40\%$
test_func_call_cm_runtime[True-compile-overhead] 1.2352ms 0.8544ms 1.1704 KOps/s 1.1843 KOps/s $\color{#d91a1a}-1.17\%$
test_distributed 0.4491ms 0.1292ms 7.7398 KOps/s 7.5864 KOps/s $\color{#35bf28}+2.02\%$
test_tdmodule 47.1670μs 17.2641μs 57.9237 KOps/s 56.8963 KOps/s $\color{#35bf28}+1.81\%$
test_tdmodule_dispatch 92.1210μs 36.0470μs 27.7415 KOps/s 27.2792 KOps/s $\color{#35bf28}+1.69\%$
test_tdseq 36.0260μs 19.2957μs 51.8251 KOps/s 50.3925 KOps/s $\color{#35bf28}+2.84\%$
test_tdseq_dispatch 81.1210μs 40.4171μs 24.7420 KOps/s 24.3429 KOps/s $\color{#35bf28}+1.64\%$
test_instantiation_functorch 1.8896ms 1.6476ms 606.9558 Ops/s 590.6905 Ops/s $\color{#35bf28}+2.75\%$
test_instantiation_td 1.8022ms 1.1801ms 847.3714 Ops/s 823.5177 Ops/s $\color{#35bf28}+2.90\%$
test_exec_functorch 0.3027ms 0.1770ms 5.6492 KOps/s 5.6152 KOps/s $\color{#35bf28}+0.61\%$
test_exec_functional_call 0.3358ms 0.1759ms 5.6862 KOps/s 6.0062 KOps/s $\textbf{\color{#d91a1a}-5.33\%}$
test_exec_td 0.3396ms 0.1750ms 5.7137 KOps/s 5.9554 KOps/s $\color{#d91a1a}-4.06\%$
test_exec_td_decorator 1.0978ms 0.2287ms 4.3725 KOps/s 4.4675 KOps/s $\color{#d91a1a}-2.13\%$
test_vmap_mlp_speed[True-True] 1.1103ms 0.6065ms 1.6487 KOps/s 1.6758 KOps/s $\color{#d91a1a}-1.61\%$
test_vmap_mlp_speed[True-False] 0.8119ms 0.5962ms 1.6774 KOps/s 1.6696 KOps/s $\color{#35bf28}+0.47\%$
test_vmap_mlp_speed[False-True] 0.8426ms 0.4975ms 2.0100 KOps/s 2.0276 KOps/s $\color{#d91a1a}-0.87\%$
test_vmap_mlp_speed[False-False] 0.7340ms 0.4980ms 2.0080 KOps/s 2.0389 KOps/s $\color{#d91a1a}-1.52\%$
test_vmap_mlp_speed_decorator[True-True] 1.4958ms 0.6539ms 1.5293 KOps/s 1.5360 KOps/s $\color{#d91a1a}-0.44\%$
test_vmap_mlp_speed_decorator[True-False] 0.9853ms 0.6605ms 1.5140 KOps/s 1.5335 KOps/s $\color{#d91a1a}-1.28\%$
test_vmap_mlp_speed_decorator[False-True] 0.8699ms 0.5449ms 1.8352 KOps/s 1.4761 KOps/s $\textbf{\color{#35bf28}+24.33\%}$
test_vmap_mlp_speed_decorator[False-False] 0.8649ms 0.5370ms 1.8621 KOps/s 1.7811 KOps/s $\color{#35bf28}+4.55\%$
test_to_module_speed[True] 2.1597ms 1.3462ms 742.8387 Ops/s 738.7396 Ops/s $\color{#35bf28}+0.55\%$
test_to_module_speed[False] 2.0309ms 1.3066ms 765.3294 Ops/s 768.2710 Ops/s $\color{#d91a1a}-0.38\%$
test_tc_init 78.8160μs 43.6989μs 22.8838 KOps/s 21.8011 KOps/s $\color{#35bf28}+4.97\%$
test_tc_init_nested 0.1676ms 85.9047μs 11.6408 KOps/s 10.6549 KOps/s $\textbf{\color{#35bf28}+9.25\%}$
test_tc_first_layer_tensor 54.2400μs 1.4186μs 704.9113 KOps/s 704.0925 KOps/s $\color{#35bf28}+0.12\%$
test_tc_first_layer_nontensor 22.1610μs 4.2042μs 237.8599 KOps/s 240.7968 KOps/s $\color{#d91a1a}-1.22\%$
test_tc_second_layer_tensor 42.9950μs 2.6009μs 384.4793 KOps/s 373.1618 KOps/s $\color{#35bf28}+3.03\%$
test_tc_second_layer_nontensor 35.1950μs 5.4574μs 183.2374 KOps/s 185.9626 KOps/s $\color{#d91a1a}-1.47\%$
test_unbind 0.4979s 15.5177ms 64.4427 Ops/s 74.1137 Ops/s $\textbf{\color{#d91a1a}-13.05\%}$
test_full_like 11.9672ms 8.8543ms 112.9389 Ops/s 125.0047 Ops/s $\textbf{\color{#d91a1a}-9.65\%}$
test_zeros_like 17.1276ms 7.1703ms 139.4633 Ops/s 132.8626 Ops/s $\color{#35bf28}+4.97\%$
test_ones_like 16.6673ms 7.8636ms 127.1677 Ops/s 131.6117 Ops/s $\color{#d91a1a}-3.38\%$
test_clone 17.2407ms 9.8841ms 101.1727 Ops/s 106.1067 Ops/s $\color{#d91a1a}-4.65\%$
test_squeeze 96.9710μs 13.1101μs 76.2768 KOps/s 75.1503 KOps/s $\color{#35bf28}+1.50\%$
test_unsqueeze 0.1659ms 92.0971μs 10.8581 KOps/s 10.8806 KOps/s $\color{#d91a1a}-0.21\%$
test_split 0.3547ms 0.2013ms 4.9674 KOps/s 4.9979 KOps/s $\color{#d91a1a}-0.61\%$
test_permute 0.4735ms 0.2232ms 4.4806 KOps/s 4.5719 KOps/s $\color{#d91a1a}-2.00\%$
test_stack 36.9432ms 27.3570ms 36.5537 Ops/s 40.0079 Ops/s $\textbf{\color{#d91a1a}-8.63\%}$
test_cat 31.9539ms 26.8705ms 37.2156 Ops/s 40.7503 Ops/s $\textbf{\color{#d91a1a}-8.67\%}$
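The derived columns of this table can be reproduced with a few lines of arithmetic. This is a sketch under assumptions inferred from the numbers themselves, not the benchmark bot's code. Ops appears to be the reciprocal of Mean (1 / 21.6085 μs ≈ 46.278 KOps/s for `test_plain_set_nested`). Change appears to be the relative difference between Ops and Ops on Repo HEAD. The Improved/Worsened totals match the bolded rows, all of which change by more than 5% in magnitude (14 improved and 21 worsened among the 219 CPU rows). The helper names are hypothetical.

```python
# Hypothetical helpers: they reproduce the derived columns of the CPU table
# under the assumptions stated above; they are not the benchmark bot's code.


def ops_per_second(mean_seconds: float) -> float:
    # Ops appears to be the reciprocal of the Mean run time.
    return 1.0 / mean_seconds


def change_percent(ops: float, ops_head: float) -> float:
    # Relative throughput change versus "Ops on Repo HEAD", in percent.
    return (ops - ops_head) / ops_head * 100.0


def summarize(changes: list, threshold: float = 5.0) -> tuple:
    # Count rows whose change exceeds the (inferred) 5% threshold either way.
    improved = sum(1 for c in changes if c > threshold)
    worsened = sum(1 for c in changes if c < -threshold)
    return len(changes), improved, worsened


# test_plain_set_nested: Mean 21.6085 us, Ops 46.2782 KOps/s, HEAD 46.0774 KOps/s
print(ops_per_second(21.6085e-6) / 1e3)              # about 46.278 KOps/s
print(round(change_percent(46.2782, 46.0774), 2))    # 0.44
# test_membership_stacked_nested_last: 79.1996 vs 254.7856 KOps/s
print(round(change_percent(79.1996, 254.7856), 2))   # -68.92
```

Because Mean is itself rounded in the table, recomputed Ops values can differ from the printed ones in the last digit.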


github-actions bot commented Jul 3, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 225. Improved: 9. Worsened: 39.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 37.3410μs 17.2293μs 58.0408 KOps/s 66.0454 KOps/s $\textbf{\color{#d91a1a}-12.12\%}$
test_plain_set_stack_nested 36.6820μs 17.2937μs 57.8245 KOps/s 66.0759 KOps/s $\textbf{\color{#d91a1a}-12.49\%}$
test_plain_set_nested_inplace 40.1310μs 18.1607μs 55.0639 KOps/s 61.1373 KOps/s $\textbf{\color{#d91a1a}-9.93\%}$
test_plain_set_stack_nested_inplace 46.3110μs 18.2652μs 54.7489 KOps/s 60.8481 KOps/s $\textbf{\color{#d91a1a}-10.02\%}$
test_items 21.9200μs 4.7159μs 212.0474 KOps/s 212.7638 KOps/s $\color{#d91a1a}-0.34\%$
test_items_nested 0.4562ms 0.3650ms 2.7395 KOps/s 2.7490 KOps/s $\color{#d91a1a}-0.35\%$
test_items_nested_locked 0.4783ms 0.3682ms 2.7161 KOps/s 2.7557 KOps/s $\color{#d91a1a}-1.44\%$
test_items_nested_leaf 0.1161ms 84.0247μs 11.9013 KOps/s 11.9017 KOps/s $-0.00\%$
test_items_stack_nested 0.4594ms 0.3634ms 2.7519 KOps/s 2.7262 KOps/s $\color{#35bf28}+0.94\%$
test_items_stack_nested_leaf 0.1234ms 83.7877μs 11.9349 KOps/s 11.8035 KOps/s $\color{#35bf28}+1.11\%$
test_items_stack_nested_locked 0.4521ms 0.3634ms 2.7522 KOps/s 2.7186 KOps/s $\color{#35bf28}+1.23\%$
test_keys 17.9600μs 4.3974μs 227.4090 KOps/s 227.3102 KOps/s $\color{#35bf28}+0.04\%$
test_keys_nested 92.5120μs 66.0744μs 15.1344 KOps/s 14.7419 KOps/s $\color{#35bf28}+2.66\%$
test_keys_nested_locked 0.9948ms 71.7431μs 13.9386 KOps/s 13.7149 KOps/s $\color{#35bf28}+1.63\%$
test_keys_nested_leaf 87.5220μs 57.4669μs 17.4013 KOps/s 17.6077 KOps/s $\color{#d91a1a}-1.17\%$
test_keys_stack_nested 0.1092ms 66.6047μs 15.0140 KOps/s 15.0550 KOps/s $\color{#d91a1a}-0.27\%$
test_keys_stack_nested_leaf 94.3220μs 56.9928μs 17.5461 KOps/s 17.1094 KOps/s $\color{#35bf28}+2.55\%$
test_keys_stack_nested_locked 0.1134ms 71.1416μs 14.0565 KOps/s 13.9130 KOps/s $\color{#35bf28}+1.03\%$
test_values 7.1207μs 1.7694μs 565.1715 KOps/s 567.3802 KOps/s $\color{#d91a1a}-0.39\%$
test_values_nested 57.2610μs 34.4357μs 29.0396 KOps/s 29.4844 KOps/s $\color{#d91a1a}-1.51\%$
test_values_nested_locked 57.0100μs 36.5421μs 27.3657 KOps/s 27.8290 KOps/s $\color{#d91a1a}-1.66\%$
test_values_nested_leaf 52.9710μs 30.7346μs 32.5366 KOps/s 33.0584 KOps/s $\color{#d91a1a}-1.58\%$
test_values_stack_nested 60.3600μs 34.8475μs 28.6965 KOps/s 28.9059 KOps/s $\color{#d91a1a}-0.72\%$
test_values_stack_nested_leaf 51.4210μs 31.3024μs 31.9464 KOps/s 32.7030 KOps/s $\color{#d91a1a}-2.31\%$
test_values_stack_nested_locked 60.9400μs 36.9165μs 27.0882 KOps/s 27.3064 KOps/s $\color{#d91a1a}-0.80\%$
test_membership 1.2340μs 0.5463μs 1.8304 MOps/s 1.8099 MOps/s $\color{#35bf28}+1.13\%$
test_membership_nested 10.3750μs 1.9190μs 521.0933 KOps/s 512.3199 KOps/s $\color{#35bf28}+1.71\%$
test_membership_nested_leaf 10.5505μs 1.8939μs 527.9995 KOps/s 522.1757 KOps/s $\color{#35bf28}+1.12\%$
test_membership_stacked_nested 21.6720μs 1.9608μs 510.0035 KOps/s 502.3955 KOps/s $\color{#35bf28}+1.51\%$
test_membership_stacked_nested_leaf 16.7420μs 1.9553μs 511.4327 KOps/s 519.9631 KOps/s $\color{#d91a1a}-1.64\%$
test_membership_nested_last 27.4710μs 2.9294μs 341.3615 KOps/s 350.4318 KOps/s $\color{#d91a1a}-2.59\%$
test_membership_nested_leaf_last 25.4500μs 2.9570μs 338.1775 KOps/s 347.5205 KOps/s $\color{#d91a1a}-2.69\%$
test_membership_stacked_nested_last 29.1200μs 9.1105μs 109.7638 KOps/s 146.3535 KOps/s $\textbf{\color{#d91a1a}-25.00\%}$
test_membership_stacked_nested_leaf_last 31.0300μs 9.1536μs 109.2461 KOps/s 146.6505 KOps/s $\textbf{\color{#d91a1a}-25.51\%}$
test_nested_getleaf 25.2000μs 7.9453μs 125.8601 KOps/s 126.3847 KOps/s $\color{#d91a1a}-0.42\%$
test_nested_get 22.9400μs 7.4171μs 134.8229 KOps/s 134.6438 KOps/s $\color{#35bf28}+0.13\%$
test_stacked_getleaf 25.2200μs 7.9364μs 126.0023 KOps/s 125.4164 KOps/s $\color{#35bf28}+0.47\%$
test_stacked_get 30.4710μs 7.4338μs 134.5208 KOps/s 133.6833 KOps/s $\color{#35bf28}+0.63\%$
test_nested_getitemleaf 23.5300μs 8.0880μs 123.6397 KOps/s 123.3278 KOps/s $\color{#35bf28}+0.25\%$
test_nested_getitem 23.2300μs 7.5774μs 131.9720 KOps/s 130.4745 KOps/s $\color{#35bf28}+1.15\%$
test_stacked_getitemleaf 26.4400μs 8.0994μs 123.4667 KOps/s 122.9794 KOps/s $\color{#35bf28}+0.40\%$
test_stacked_getitem 26.9610μs 7.6099μs 131.4074 KOps/s 130.4825 KOps/s $\color{#35bf28}+0.71\%$
test_lock_nested 9.4219ms 0.4775ms 2.0942 KOps/s 2.1264 KOps/s $\color{#d91a1a}-1.51\%$
test_lock_stack_nested 0.5324ms 0.4167ms 2.3999 KOps/s 2.3483 KOps/s $\color{#35bf28}+2.19\%$
test_unlock_nested 0.8916ms 0.3902ms 2.5627 KOps/s 2.5761 KOps/s $\color{#d91a1a}-0.52\%$
test_unlock_stack_nested 0.3706ms 0.3374ms 2.9641 KOps/s 2.9088 KOps/s $\color{#35bf28}+1.90\%$
test_flatten_speed 0.5084ms 0.1052ms 9.5038 KOps/s 9.6271 KOps/s $\color{#d91a1a}-1.28\%$
test_unflatten_speed 0.3809ms 0.2830ms 3.5334 KOps/s 3.4598 KOps/s $\color{#35bf28}+2.13\%$
test_common_ops 1.5796ms 1.3414ms 745.5031 Ops/s 786.0700 Ops/s $\textbf{\color{#d91a1a}-5.16\%}$
test_creation 18.8400μs 1.6417μs 609.1403 KOps/s 597.1993 KOps/s $\color{#35bf28}+2.00\%$
test_creation_empty 41.4510μs 17.6857μs 56.5429 KOps/s 73.3484 KOps/s $\textbf{\color{#d91a1a}-22.91\%}$
test_creation_nested_1 45.7420μs 19.8183μs 50.4583 KOps/s 64.8895 KOps/s $\textbf{\color{#d91a1a}-22.24\%}$
test_creation_nested_2 48.6320μs 22.3238μs 44.7952 KOps/s 55.2506 KOps/s $\textbf{\color{#d91a1a}-18.92\%}$
test_clone 54.5220μs 29.4700μs 33.9328 KOps/s 31.5543 KOps/s $\textbf{\color{#35bf28}+7.54\%}$
test_getitem[int] 1.1610ms 16.7104μs 59.8430 KOps/s 61.5340 KOps/s $\color{#d91a1a}-2.75\%$
test_getitem[slice_int] 0.1527ms 28.1315μs 35.5473 KOps/s 35.7639 KOps/s $\color{#d91a1a}-0.61\%$
test_getitem[range] 0.3226ms 0.1135ms 8.8101 KOps/s 8.9793 KOps/s $\color{#d91a1a}-1.88\%$
test_getitem[tuple] 90.3494ms 29.7514μs 33.6118 KOps/s 40.5008 KOps/s $\textbf{\color{#d91a1a}-17.01\%}$
test_getitem[list] 0.2211ms 0.1064ms 9.4022 KOps/s 9.3269 KOps/s $\color{#35bf28}+0.81\%$
test_setitem_dim[int] 80.8920μs 55.7810μs 17.9273 KOps/s 19.8623 KOps/s $\textbf{\color{#d91a1a}-9.74\%}$
test_setitem_dim[slice_int] 0.1057ms 77.7779μs 12.8571 KOps/s 13.1953 KOps/s $\color{#d91a1a}-2.56\%$
test_setitem_dim[range] 0.1732ms 0.1392ms 7.1822 KOps/s 7.1111 KOps/s $\color{#35bf28}+1.00\%$
test_setitem_dim[tuple] 92.2420μs 69.9602μs 14.2938 KOps/s 14.5277 KOps/s $\color{#d91a1a}-1.61\%$
test_setitem 69.0520μs 44.0344μs 22.7095 KOps/s 23.2975 KOps/s $\color{#d91a1a}-2.52\%$
test_set 88.2730μs 44.6510μs 22.3959 KOps/s 24.3574 KOps/s $\textbf{\color{#d91a1a}-8.05\%}$
test_set_shared 0.3852ms 52.7321μs 18.9638 KOps/s 17.9655 KOps/s $\textbf{\color{#35bf28}+5.56\%}$
test_update 89.3220μs 53.3186μs 18.7552 KOps/s 21.0964 KOps/s $\textbf{\color{#d91a1a}-11.10\%}$
test_update_nested 0.3809ms 62.3917μs 16.0278 KOps/s 18.1447 KOps/s $\textbf{\color{#d91a1a}-11.67\%}$
test_update__nested 0.1045ms 65.2972μs 15.3146 KOps/s 15.7129 KOps/s $\color{#d91a1a}-2.53\%$
test_set_nested 72.8620μs 47.3831μs 21.1046 KOps/s 22.5186 KOps/s $\textbf{\color{#d91a1a}-6.28\%}$
test_set_nested_new 96.5020μs 51.6461μs 19.3626 KOps/s 20.7035 KOps/s $\textbf{\color{#d91a1a}-6.48\%}$
test_select 0.1130ms 66.4702μs 15.0443 KOps/s 16.4771 KOps/s $\textbf{\color{#d91a1a}-8.70\%}$
test_select_nested 0.2897ms 51.1120μs 19.5649 KOps/s 19.6046 KOps/s $\color{#d91a1a}-0.20\%$
test_exclude_nested 0.1022ms 68.4077μs 14.6182 KOps/s 14.8166 KOps/s $\color{#d91a1a}-1.34\%$
test_empty[True] 0.3576ms 0.2845ms 3.5152 KOps/s 3.5189 KOps/s $\color{#d91a1a}-0.10\%$
test_empty[False] 2.5810μs 0.8782μs 1.1387 MOps/s 1.1424 MOps/s $\color{#d91a1a}-0.32\%$
test_to 63.9220μs 40.2811μs 24.8255 KOps/s 26.1348 KOps/s $\textbf{\color{#d91a1a}-5.01\%}$
test_to_nonblocking 52.1110μs 26.3583μs 37.9387 KOps/s 43.1469 KOps/s $\textbf{\color{#d91a1a}-12.07\%}$
test_unbind_speed 0.9521ms 0.3009ms 3.3236 KOps/s 3.3092 KOps/s $\color{#35bf28}+0.44\%$
test_unbind_speed_stack0 0.3869ms 0.2886ms 3.4650 KOps/s 3.3920 KOps/s $\color{#35bf28}+2.15\%$
test_unbind_speed_stack1 90.3938ms 0.7606ms 1.3148 KOps/s 1.4334 KOps/s $\textbf{\color{#d91a1a}-8.27\%}$
test_split 92.0382ms 2.2786ms 438.8728 Ops/s 439.6557 Ops/s $\color{#d91a1a}-0.18\%$
test_chunk 2.3100ms 2.0833ms 480.0153 Ops/s 430.0410 Ops/s $\textbf{\color{#35bf28}+11.62\%}$
test_creation[device0] 0.1599ms 0.1021ms 9.7943 KOps/s 9.3596 KOps/s $\color{#35bf28}+4.64\%$
test_creation_from_tensor 0.1641ms 0.1016ms 9.8388 KOps/s 9.6749 KOps/s $\color{#35bf28}+1.69\%$
test_add_one[memmap_tensor0] 70.7810μs 8.7851μs 113.8293 KOps/s 116.8690 KOps/s $\color{#d91a1a}-2.60\%$
test_contiguous[memmap_tensor0] 15.0010μs 2.1113μs 473.6391 KOps/s 475.3525 KOps/s $\color{#d91a1a}-0.36\%$
test_stack[memmap_tensor0] 33.9920μs 6.6535μs 150.2965 KOps/s 154.6838 KOps/s $\color{#d91a1a}-2.84\%$
test_memmaptd_index 93.0218ms 0.4843ms 2.0649 KOps/s 2.4168 KOps/s $\textbf{\color{#d91a1a}-14.56\%}$
test_memmaptd_index_astensor 0.7988ms 0.4884ms 2.0475 KOps/s 2.0547 KOps/s $\color{#d91a1a}-0.35\%$
test_memmaptd_index_op 1.4361ms 1.0578ms 945.3687 Ops/s 1.0350 KOps/s $\textbf{\color{#d91a1a}-8.66\%}$
test_serialize_model 93.6175ms 89.7713ms 11.1394 Ops/s 10.9655 Ops/s $\color{#35bf28}+1.59\%$
test_serialize_model_pickle 1.3522s 1.2368s 0.8085 Ops/s 0.8082 Ops/s $\color{#35bf28}+0.03\%$
test_serialize_weights 88.4518ms 85.7978ms 11.6553 Ops/s 9.6518 Ops/s $\textbf{\color{#35bf28}+20.76\%}$
test_serialize_weights_returnearly 0.2108s 64.2458ms 15.5652 Ops/s 15.1779 Ops/s $\color{#35bf28}+2.55\%$
test_serialize_weights_pickle 1.3492s 1.2369s 0.8085 Ops/s 0.8035 Ops/s $\color{#35bf28}+0.62\%$
test_reshape_pytree 61.3720μs 38.3398μs 26.0825 KOps/s 26.3201 KOps/s $\color{#d91a1a}-0.90\%$
test_reshape_td 75.0420μs 44.7549μs 22.3439 KOps/s 23.2696 KOps/s $\color{#d91a1a}-3.98\%$
test_view_pytree 62.3820μs 37.2749μs 26.8277 KOps/s 26.6351 KOps/s $\color{#35bf28}+0.72\%$
test_view_td 83.8020μs 50.5915μs 19.7661 KOps/s 20.2581 KOps/s $\color{#d91a1a}-2.43\%$
test_unbind_pytree 60.6020μs 37.2560μs 26.8413 KOps/s 27.5859 KOps/s $\color{#d91a1a}-2.70\%$
test_unbind_td 0.4286ms 45.6610μs 21.9005 KOps/s 22.1748 KOps/s $\color{#d91a1a}-1.24\%$
test_split_pytree 0.2045ms 52.5854μs 19.0167 KOps/s 20.2991 KOps/s $\textbf{\color{#d91a1a}-6.32\%}$
test_split_td 94.3602ms 68.4864μs 14.6014 KOps/s 17.3643 KOps/s $\textbf{\color{#d91a1a}-15.91\%}$
test_add_pytree 0.1021ms 63.1269μs 15.8411 KOps/s 17.3557 KOps/s $\textbf{\color{#d91a1a}-8.73\%}$
test_add_td 0.1471ms 0.1008ms 9.9251 KOps/s 11.7983 KOps/s $\textbf{\color{#d91a1a}-15.88\%}$
test_compile_add_one_nested[tensordict-compile] 0.4110ms 0.2072ms 4.8253 KOps/s 4.8102 KOps/s $\color{#35bf28}+0.31\%$
test_compile_add_one_nested[tensordict-eager] 0.2561ms 0.1712ms 5.8410 KOps/s 5.8971 KOps/s $\color{#d91a1a}-0.95\%$
test_compile_add_one_nested[pytree-compile] 0.1827ms 0.1445ms 6.9225 KOps/s 6.9686 KOps/s $\color{#d91a1a}-0.66\%$
test_compile_add_one_nested[pytree-eager] 0.2316ms 0.1925ms 5.1945 KOps/s 5.2455 KOps/s $\color{#d91a1a}-0.97\%$
test_compile_copy_nested[tensordict-compile] 59.6620μs 22.2738μs 44.8959 KOps/s 45.8327 KOps/s $\color{#d91a1a}-2.04\%$
test_compile_copy_nested[tensordict-eager] 76.4420μs 48.4686μs 20.6319 KOps/s 20.7354 KOps/s $\color{#d91a1a}-0.50\%$
test_compile_copy_nested[pytree-compile] 0.1129ms 73.5592μs 13.5945 KOps/s 13.7326 KOps/s $\color{#d91a1a}-1.01\%$
test_compile_copy_nested[pytree-eager] 88.3320μs 60.1875μs 16.6147 KOps/s 16.7849 KOps/s $\color{#d91a1a}-1.01\%$
test_compile_add_one_flat[tensordict-compile] 0.3786ms 0.3199ms 3.1258 KOps/s 3.1304 KOps/s $\color{#d91a1a}-0.15\%$
test_compile_add_one_flat[tensordict-eager] 0.2717ms 0.2199ms 4.5481 KOps/s 4.4866 KOps/s $\color{#35bf28}+1.37\%$
test_compile_add_one_flat[tensorclass-compile] 0.1698ms 0.1298ms 7.7045 KOps/s 7.7322 KOps/s $\color{#d91a1a}-0.36\%$
test_compile_add_one_flat[tensorclass-eager] 0.1222ms 61.2394μs 16.3294 KOps/s 16.1818 KOps/s $\color{#35bf28}+0.91\%$
test_compile_add_one_flat[pytree-compile] 0.3735ms 0.3183ms 3.1416 KOps/s 3.1292 KOps/s $\color{#35bf28}+0.40\%$
test_compile_add_one_flat[pytree-eager] 0.7079ms 0.6300ms 1.5874 KOps/s 1.5452 KOps/s $\color{#35bf28}+2.73\%$
test_compile_add_self_flat[tensordict-eager] 0.3131ms 0.2710ms 3.6894 KOps/s 3.6888 KOps/s $\color{#35bf28}+0.02\%$
test_compile_add_self_flat[tensordict-compile] 0.3635ms 0.3208ms 3.1172 KOps/s 3.1164 KOps/s $\color{#35bf28}+0.03\%$
test_compile_add_self_flat[tensorclass-eager] 0.1792ms 75.2381μs 13.2911 KOps/s 13.6276 KOps/s $\color{#d91a1a}-2.47\%$
test_compile_add_self_flat[tensorclass-compile] 0.2588ms 0.1296ms 7.7145 KOps/s 7.7175 KOps/s $\color{#d91a1a}-0.04\%$
test_compile_add_self_flat[pytree-eager] 0.5872ms 0.5386ms 1.8566 KOps/s 1.8641 KOps/s $\color{#d91a1a}-0.40\%$
test_compile_add_self_flat[pytree-compile] 0.4023ms 0.3175ms 3.1495 KOps/s 3.1530 KOps/s $\color{#d91a1a}-0.11\%$
test_compile_copy_flat[tensordict-compile] 37.6400μs 18.4328μs 54.2512 KOps/s 53.8687 KOps/s $\color{#35bf28}+0.71\%$
test_compile_copy_flat[tensordict-eager] 67.4720μs 33.2442μs 30.0804 KOps/s 30.1392 KOps/s $\color{#d91a1a}-0.20\%$
test_compile_copy_flat[pytree-compile] 0.1023ms 76.1461μs 13.1326 KOps/s 13.0042 KOps/s $\color{#35bf28}+0.99\%$
test_compile_copy_flat[pytree-eager] 91.8420μs 60.4398μs 16.5454 KOps/s 16.2575 KOps/s $\color{#35bf28}+1.77\%$
test_compile_assign_and_add[tensordict-compile] 2.5037ms 0.9194ms 1.0876 KOps/s 1.1046 KOps/s $\color{#d91a1a}-1.54\%$
test_compile_assign_and_add[tensordict-eager] 3.4765ms 3.3388ms 299.5111 Ops/s 304.4699 Ops/s $\color{#d91a1a}-1.63\%$
test_compile_assign_and_add[pytree-compile] 2.4568ms 0.9019ms 1.1088 KOps/s 1.1108 KOps/s $\color{#d91a1a}-0.18\%$
test_compile_assign_and_add[pytree-eager] 3.4591ms 3.2288ms 309.7080 Ops/s 316.2557 Ops/s $\color{#d91a1a}-2.07\%$
test_compile_indexing[tensor-tensordict-compile] 0.1831ms 0.1123ms 8.9057 KOps/s 9.1282 KOps/s $\color{#d91a1a}-2.44\%$
test_compile_indexing[tensor-tensordict-eager] 0.2693ms 65.4097μs 15.2882 KOps/s 16.9371 KOps/s $\textbf{\color{#d91a1a}-9.74\%}$
test_compile_indexing[tensor-tensorclass-compile] 0.1372ms 0.1042ms 9.6007 KOps/s 9.7003 KOps/s $\color{#d91a1a}-1.03\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2408ms 45.3048μs 22.0727 KOps/s 23.0748 KOps/s $\color{#d91a1a}-4.34\%$
test_compile_indexing[tensor-pytree-compile] 0.3152ms 0.1069ms 9.3577 KOps/s 9.5211 KOps/s $\color{#d91a1a}-1.72\%$
test_compile_indexing[tensor-pytree-eager] 0.2430ms 47.4902μs 21.0570 KOps/s 21.8333 KOps/s $\color{#d91a1a}-3.56\%$
test_compile_indexing[slice-tensordict-compile] 0.3436ms 0.1381ms 7.2392 KOps/s 7.1447 KOps/s $\color{#35bf28}+1.32\%$
test_compile_indexing[slice-tensordict-eager] 0.2195ms 26.7208μs 37.4241 KOps/s 38.7798 KOps/s $\color{#d91a1a}-3.50\%$
test_compile_indexing[slice-tensorclass-compile] 0.1775ms 0.1333ms 7.5000 KOps/s 7.6627 KOps/s $\color{#d91a1a}-2.12\%$
test_compile_indexing[slice-tensorclass-eager] 0.2163ms 22.8335μs 43.7954 KOps/s 44.3915 KOps/s $\color{#d91a1a}-1.34\%$
test_compile_indexing[slice-pytree-compile] 0.3200ms 0.1302ms 7.6832 KOps/s 7.4442 KOps/s $\color{#35bf28}+3.21\%$
test_compile_indexing[slice-pytree-eager] 0.2087ms 22.6740μs 44.1033 KOps/s 44.9122 KOps/s $\color{#d91a1a}-1.80\%$
test_compile_indexing[int-tensordict-compile] 0.3379ms 0.1378ms 7.2594 KOps/s 7.1701 KOps/s $\color{#35bf28}+1.25\%$
test_compile_indexing[int-tensordict-eager] 0.4968ms 26.2589μs 38.0823 KOps/s 39.3733 KOps/s $\color{#d91a1a}-3.28\%$
test_compile_indexing[int-tensorclass-compile] 0.3297ms 0.1305ms 7.6618 KOps/s 7.4368 KOps/s $\color{#35bf28}+3.02\%$
test_compile_indexing[int-tensorclass-eager] 52.3700μs 22.1560μs 45.1345 KOps/s 46.0093 KOps/s $\color{#d91a1a}-1.90\%$
test_compile_indexing[int-pytree-compile] 0.3205ms 0.1312ms 7.6232 KOps/s 7.5209 KOps/s $\color{#35bf28}+1.36\%$
test_compile_indexing[int-pytree-eager] 0.2094ms 22.4192μs 44.6046 KOps/s 45.3937 KOps/s $\color{#d91a1a}-1.74\%$
test_mod_add[eager] 0.2378ms 38.0003μs 26.3156 KOps/s 28.0250 KOps/s $\textbf{\color{#d91a1a}-6.10\%}$
test_mod_add[compile] 93.3620μs 67.2257μs 14.8753 KOps/s 14.1702 KOps/s $\color{#35bf28}+4.98\%$
test_mod_add[compile-overhead] 0.2536ms 0.1440ms 6.9454 KOps/s 6.9469 KOps/s $\color{#d91a1a}-0.02\%$
test_mod_wrap[eager] 0.4465ms 0.2522ms 3.9653 KOps/s 3.8552 KOps/s $\color{#35bf28}+2.86\%$
test_mod_wrap[compile] 0.4971ms 0.2862ms 3.4946 KOps/s 3.5311 KOps/s $\color{#d91a1a}-1.04\%$
test_mod_wrap[compile-overhead] 8.4082ms 4.4054ms 226.9959 Ops/s 227.8692 Ops/s $\color{#d91a1a}-0.38\%$
test_mod_wrap_and_backward[eager] 1.5661ms 1.4476ms 690.8026 Ops/s 693.6049 Ops/s $\color{#d91a1a}-0.40\%$
test_mod_wrap_and_backward[compile] 1.7228ms 1.4260ms 701.2463 Ops/s 748.8821 Ops/s $\textbf{\color{#d91a1a}-6.36\%}$
test_mod_wrap_and_backward[compile-overhead] 1.4636ms 0.9925ms 1.0075 KOps/s 1.1242 KOps/s $\textbf{\color{#d91a1a}-10.38\%}$
test_seq_add[eager] 0.3222ms 0.1111ms 9.0006 KOps/s 9.6142 KOps/s $\textbf{\color{#d91a1a}-6.38\%}$
test_seq_add[compile] 0.2994ms 85.5565μs 11.6882 KOps/s 11.3728 KOps/s $\color{#35bf28}+2.77\%$
test_seq_add[compile-overhead] 0.3355ms 0.1225ms 8.1613 KOps/s 8.2004 KOps/s $\color{#d91a1a}-0.48\%$
test_seq_wrap[eager] 0.6637ms 0.4319ms 2.3152 KOps/s 2.4430 KOps/s $\textbf{\color{#d91a1a}-5.23\%}$
test_seq_wrap[compile] 0.5473ms 0.3232ms 3.0941 KOps/s 3.1740 KOps/s $\color{#d91a1a}-2.52\%$
test_seq_wrap[compile-overhead] 0.3100s 0.1477s 6.7717 Ops/s 6.7386 Ops/s $\color{#35bf28}+0.49\%$
test_func_call_runtime[False-eager] 0.9912ms 0.7504ms 1.3327 KOps/s 1.3656 KOps/s $\color{#d91a1a}-2.41\%$
test_func_call_runtime[False-compile] 1.0074ms 0.7887ms 1.2680 KOps/s 1.2455 KOps/s $\color{#35bf28}+1.80\%$
test_func_call_runtime[False-compile-overhead] 0.5703ms 0.3568ms 2.8028 KOps/s 2.8132 KOps/s $\color{#d91a1a}-0.37\%$
test_func_call_runtime[True-eager] 1.1106ms 0.9272ms 1.0785 KOps/s 1.0726 KOps/s $\color{#35bf28}+0.55\%$
test_func_call_runtime[True-compile] 1.0488ms 0.8270ms 1.2092 KOps/s 1.2085 KOps/s $\color{#35bf28}+0.06\%$
test_func_call_runtime[True-compile-overhead] 0.6156ms 0.3988ms 2.5076 KOps/s 2.5114 KOps/s $\color{#d91a1a}-0.15\%$
test_func_call_cm_runtime[False-eager] 0.9495ms 0.7546ms 1.3252 KOps/s 1.2959 KOps/s $\color{#35bf28}+2.26\%$
test_func_call_cm_runtime[False-compile] 0.9926ms 0.7781ms 1.2852 KOps/s 1.2254 KOps/s $\color{#35bf28}+4.88\%$
test_func_call_cm_runtime[False-compile-overhead] 0.3971ms 0.3570ms 2.8008 KOps/s 2.8047 KOps/s $\color{#d91a1a}-0.14\%$
test_func_call_cm_runtime[True-eager] 1.2436ms 1.0408ms 960.7895 Ops/s 960.0858 Ops/s $\color{#35bf28}+0.07\%$
test_func_call_cm_runtime[True-compile] 1.2249ms 1.0021ms 997.8724 Ops/s 996.1178 Ops/s $\color{#35bf28}+0.18\%$
test_func_call_cm_runtime[True-compile-overhead] 1.2095ms 1.0020ms 997.9575 Ops/s 996.4554 Ops/s $\color{#35bf28}+0.15\%$
test_distributed 0.1797ms 68.3723μs 14.6258 KOps/s 12.4007 KOps/s $\textbf{\color{#35bf28}+17.94\%}$
test_tdmodule 32.5610μs 16.2370μs 61.5878 KOps/s 73.3772 KOps/s $\textbf{\color{#d91a1a}-16.07\%}$
test_tdmodule_dispatch 49.8710μs 33.0219μs 30.2829 KOps/s 35.4949 KOps/s $\textbf{\color{#d91a1a}-14.68\%}$
test_tdseq 24.5820μs 16.9443μs 59.0171 KOps/s 69.2010 KOps/s $\textbf{\color{#d91a1a}-14.72\%}$
test_tdseq_dispatch 64.3810μs 35.3215μs 28.3113 KOps/s 31.1621 KOps/s $\textbf{\color{#d91a1a}-9.15\%}$
test_instantiation_functorch 2.2048ms 1.9998ms 500.0406 Ops/s 502.5798 Ops/s $\color{#d91a1a}-0.51\%$
test_instantiation_td 2.0250ms 1.2975ms 770.7348 Ops/s 773.3981 Ops/s $\color{#d91a1a}-0.34\%$
test_exec_functorch 0.4348ms 0.2248ms 4.4482 KOps/s 4.4640 KOps/s $\color{#d91a1a}-0.35\%$
test_exec_functional_call 0.2648ms 0.2167ms 4.6150 KOps/s 4.5076 KOps/s $\color{#35bf28}+2.38\%$
test_exec_td 0.2533ms 0.2163ms 4.6237 KOps/s 4.3767 KOps/s $\textbf{\color{#35bf28}+5.64\%}$
test_exec_td_decorator 1.0920ms 0.2702ms 3.7005 KOps/s 3.4803 KOps/s $\textbf{\color{#35bf28}+6.33\%}$
test_vmap_mlp_speed[True-True] 0.8785ms 0.6641ms 1.5059 KOps/s 1.4566 KOps/s $\color{#35bf28}+3.38\%$
test_vmap_mlp_speed[True-False] 0.8758ms 0.6572ms 1.5217 KOps/s 1.4970 KOps/s $\color{#35bf28}+1.65\%$
test_vmap_mlp_speed[False-True] 0.7719ms 0.5741ms 1.7419 KOps/s 1.6418 KOps/s $\textbf{\color{#35bf28}+6.09\%}$
test_vmap_mlp_speed[False-False] 0.7891ms 0.5753ms 1.7382 KOps/s 1.6438 KOps/s $\textbf{\color{#35bf28}+5.74\%}$
test_vmap_mlp_speed_decorator[True-True] 0.9153ms 0.7093ms 1.4099 KOps/s 1.3940 KOps/s $\color{#35bf28}+1.14\%$
test_vmap_mlp_speed_decorator[True-False] 0.9401ms 0.7090ms 1.4104 KOps/s 1.4224 KOps/s $\color{#d91a1a}-0.85\%$
test_vmap_mlp_speed_decorator[False-True] 0.8197ms 0.6156ms 1.6245 KOps/s 1.6129 KOps/s $\color{#35bf28}+0.72\%$
test_vmap_mlp_speed_decorator[False-False] 0.8303ms 0.6170ms 1.6206 KOps/s 1.6161 KOps/s $\color{#35bf28}+0.28\%$
test_vmap_transformer_speed[True-True] 8.8970ms 8.6983ms 114.9653 Ops/s 115.0747 Ops/s $\color{#d91a1a}-0.10\%$
test_vmap_transformer_speed[True-False] 8.8877ms 8.6981ms 114.9681 Ops/s 115.5081 Ops/s $\color{#d91a1a}-0.47\%$
test_vmap_transformer_speed[False-True] 8.8024ms 8.5799ms 116.5511 Ops/s 115.9847 Ops/s $\color{#35bf28}+0.49\%$
test_vmap_transformer_speed[False-False] 8.7803ms 8.5744ms 116.6257 Ops/s 116.2602 Ops/s $\color{#35bf28}+0.31\%$
test_vmap_transformer_speed_decorator[True-True] 20.5212ms 20.3445ms 49.1533 Ops/s 49.1546 Ops/s $-0.00\%$
test_vmap_transformer_speed_decorator[True-False] 20.4985ms 20.3250ms 49.2005 Ops/s 49.1755 Ops/s $\color{#35bf28}+0.05\%$
test_vmap_transformer_speed_decorator[False-True] 20.3717ms 20.1578ms 49.6086 Ops/s 49.5577 Ops/s $\color{#35bf28}+0.10\%$
test_vmap_transformer_speed_decorator[False-False] 21.5381ms 20.2207ms 49.4542 Ops/s 49.3783 Ops/s $\color{#35bf28}+0.15\%$
test_to_module_speed[True] 1.6024ms 1.1409ms 876.4777 Ops/s 860.7927 Ops/s $\color{#35bf28}+1.82\%$
test_to_module_speed[False] 1.5538ms 1.1232ms 890.2929 Ops/s 877.9503 Ops/s $\color{#35bf28}+1.41\%$
test_tc_init 61.5220μs 39.8518μs 25.0930 KOps/s 28.9735 KOps/s $\textbf{\color{#d91a1a}-13.39\%}$
test_tc_init_nested 0.2704ms 79.5160μs 12.5761 KOps/s 13.9728 KOps/s $\textbf{\color{#d91a1a}-10.00\%}$
test_tc_first_layer_tensor 26.9207μs 0.7882μs 1.2688 MOps/s 1.2720 MOps/s $\color{#d91a1a}-0.25\%$
test_tc_first_layer_nontensor 11.6400μs 2.5839μs 387.0162 KOps/s 391.9811 KOps/s $\color{#d91a1a}-1.27\%$
test_tc_second_layer_tensor 38.2943μs 1.6182μs 617.9562 KOps/s 612.8019 KOps/s $\color{#35bf28}+0.84\%$
test_tc_second_layer_nontensor 26.8210μs 3.3956μs 294.5011 KOps/s 296.0895 KOps/s $\color{#d91a1a}-0.54\%$
test_unbind 0.3290s 12.3246ms 81.1383 Ops/s 80.0531 Ops/s $\color{#35bf28}+1.36\%$
test_full_like 0.7560ms 0.5786ms 1.7283 KOps/s 1.7338 KOps/s $\color{#d91a1a}-0.32\%$
test_zeros_like 0.3381ms 0.1978ms 5.0559 KOps/s 5.0560 KOps/s $-0.00\%$
test_ones_like 0.3464ms 0.1976ms 5.0604 KOps/s 5.0584 KOps/s $\color{#35bf28}+0.04\%$
test_clone 0.6031ms 0.4152ms 2.4085 KOps/s 2.4075 KOps/s $\color{#35bf28}+0.04\%$
test_squeeze 29.4500μs 10.8893μs 91.8331 KOps/s 92.9512 KOps/s $\color{#d91a1a}-1.20\%$
test_unsqueeze 0.2742ms 76.3456μs 13.0983 KOps/s 12.8932 KOps/s $\color{#35bf28}+1.59\%$
test_split 0.5208ms 0.1764ms 5.6703 KOps/s 5.6658 KOps/s $\color{#35bf28}+0.08\%$
test_permute 0.3837ms 0.1843ms 5.4263 KOps/s 5.2996 KOps/s $\color{#35bf28}+2.39\%$
test_stack 1.2661ms 0.9230ms 1.0835 KOps/s 1.1150 KOps/s $\color{#d91a1a}-2.83\%$
test_cat 1.3577ms 1.2319ms 811.7762 Ops/s 811.9211 Ops/s $\color{#d91a1a}-0.02\%$
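In this table, the Change column appears to be the relative difference in throughput between this branch (Ops) and the repository head (Ops on Repo HEAD). Ops appears to be the reciprocal of Mean. For example, for `test_setitem_dim[int]`, 1 / 55.7810μs ≈ 17.93 KOps/s, and 17.9273 / 19.8623 - 1 ≈ -9.74%. The sketch below reproduces that arithmetic. The bolding rule (a row is emphasized when |change| exceeds 5%) is an assumption inferred from the rows above. The benchmark bot does not document it, so the helper names and the threshold are illustrative only.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by the mean time per call (inferred: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of this branch's throughput relative to repo HEAD."""
    return (ops / ops_head - 1.0) * 100.0


def is_flagged(change_pct: float, threshold_pct: float = 5.0) -> bool:
    # Assumption: rows appear bolded when |change| exceeds ~5%.
    # This threshold is inferred from the table, not documented by the bot.
    return abs(change_pct) > threshold_pct


# (name, Ops, Ops on Repo HEAD) copied from rows of the table above
rows = [
    ("test_setitem_dim[int]", 17.9273e3, 19.8623e3),
    ("test_mod_add[compile]", 14.8753e3, 14.1702e3),
    ("test_to", 24.8255e3, 26.1348e3),
    ("test_serialize_weights", 11.6553, 9.6518),
]

for name, ops, ops_head in rows:
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% flagged={is_flagged(change)}")
```

Under this reading, `test_mod_add[compile]` (+4.98%) stays unemphasized while `test_to` (-5.01%) is bolded. This is consistent with a 5% cutoff, but the cutoff is not confirmed.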

@vmoens added the enhancement and Performance labels Jul 26, 2024
@vmoens merged commit 697a01f into main Jul 26, 2024
5 of 8 checks passed
@vmoens deleted the consolidate-to-cuda branch July 26, 2024 15:01