[Feature] flatten_keys and unflatten_keys as context managers #908

Merged (1 commit) on Jul 22, 2024

vmoens (Contributor) commented Jul 22, 2024

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Jul 22, 2024
ghstack-source-id: 1628c44a3b012a32c83dfc2f543ccb08ec6ee874
Pull Request resolved: #908
facebook-github-bot added the CLA Signed label Jul 22, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}11$. Worsened: $\large\color{#d91a1a}35$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 58.1780μs 23.7191μs 42.1601 KOps/s 46.1613 KOps/s $\textbf{\color{#d91a1a}-8.67\%}$
test_plain_set_stack_nested 89.1660μs 24.2420μs 41.2507 KOps/s 45.3245 KOps/s $\textbf{\color{#d91a1a}-8.99\%}$
test_plain_set_nested_inplace 58.1880μs 26.2622μs 38.0776 KOps/s 42.1562 KOps/s $\textbf{\color{#d91a1a}-9.67\%}$
test_plain_set_stack_nested_inplace 90.1870μs 26.3970μs 37.8830 KOps/s 42.6573 KOps/s $\textbf{\color{#d91a1a}-11.19\%}$
test_items 27.6120μs 2.6748μs 373.8531 KOps/s 384.0378 KOps/s $\color{#d91a1a}-2.65\%$
test_items_nested 0.5642ms 0.3604ms 2.7747 KOps/s 2.4724 KOps/s $\textbf{\color{#35bf28}+12.23\%}$
test_items_nested_locked 1.6194ms 0.3669ms 2.7257 KOps/s 2.4703 KOps/s $\textbf{\color{#35bf28}+10.34\%}$
test_items_nested_leaf 0.1697ms 88.5528μs 11.2927 KOps/s 11.4767 KOps/s $\color{#d91a1a}-1.60\%$
test_items_stack_nested 0.7509ms 0.3644ms 2.7445 KOps/s 2.4731 KOps/s $\textbf{\color{#35bf28}+10.98\%}$
test_items_stack_nested_leaf 0.1859ms 89.1378μs 11.2186 KOps/s 11.4825 KOps/s $\color{#d91a1a}-2.30\%$
test_items_stack_nested_locked 0.5347ms 0.3619ms 2.7633 KOps/s 2.4732 KOps/s $\textbf{\color{#35bf28}+11.73\%}$
test_keys 29.8560μs 3.8857μs 257.3547 KOps/s 252.7135 KOps/s $\color{#35bf28}+1.84\%$
test_keys_nested 0.2474ms 0.1463ms 6.8337 KOps/s 7.0264 KOps/s $\color{#d91a1a}-2.74\%$
test_keys_nested_locked 1.8953ms 0.1516ms 6.5980 KOps/s 6.7190 KOps/s $\color{#d91a1a}-1.80\%$
test_keys_nested_leaf 0.2160ms 0.1257ms 7.9559 KOps/s 8.1173 KOps/s $\color{#d91a1a}-1.99\%$
test_keys_stack_nested 0.3350ms 0.1476ms 6.7773 KOps/s 6.9496 KOps/s $\color{#d91a1a}-2.48\%$
test_keys_stack_nested_leaf 0.2283ms 0.1265ms 7.9060 KOps/s 8.1377 KOps/s $\color{#d91a1a}-2.85\%$
test_keys_stack_nested_locked 0.2950ms 0.1518ms 6.5862 KOps/s 6.6163 KOps/s $\color{#d91a1a}-0.46\%$
test_values 8.5907μs 1.1720μs 853.2660 KOps/s 858.1076 KOps/s $\color{#d91a1a}-0.56\%$
test_values_nested 0.1352ms 50.7656μs 19.6984 KOps/s 19.9607 KOps/s $\color{#d91a1a}-1.31\%$
test_values_nested_locked 0.1056ms 50.6496μs 19.7435 KOps/s 19.8773 KOps/s $\color{#d91a1a}-0.67\%$
test_values_nested_leaf 0.1622ms 46.0830μs 21.7000 KOps/s 21.9143 KOps/s $\color{#d91a1a}-0.98\%$
test_values_stack_nested 0.1321ms 52.2702μs 19.1314 KOps/s 19.8670 KOps/s $\color{#d91a1a}-3.70\%$
test_values_stack_nested_leaf 0.3990ms 47.0030μs 21.2752 KOps/s 21.5336 KOps/s $\color{#d91a1a}-1.20\%$
test_values_stack_nested_locked 1.3022ms 51.9997μs 19.2309 KOps/s 19.6109 KOps/s $\color{#d91a1a}-1.94\%$
test_membership 25.2470μs 0.9416μs 1.0620 MOps/s 1.0936 MOps/s $\color{#d91a1a}-2.89\%$
test_membership_nested 52.2970μs 2.7449μs 364.3176 KOps/s 365.2647 KOps/s $\color{#d91a1a}-0.26\%$
test_membership_nested_leaf 30.3670μs 2.7483μs 363.8570 KOps/s 365.1095 KOps/s $\color{#d91a1a}-0.34\%$
test_membership_stacked_nested 30.1660μs 2.7365μs 365.4291 KOps/s 358.2319 KOps/s $\color{#35bf28}+2.01\%$
test_membership_stacked_nested_leaf 32.9610μs 2.7731μs 360.6113 KOps/s 365.7619 KOps/s $\color{#d91a1a}-1.41\%$
test_membership_nested_last 31.2580μs 4.1455μs 241.2231 KOps/s 247.1385 KOps/s $\color{#d91a1a}-2.39\%$
test_membership_nested_leaf_last 23.4930μs 4.1421μs 241.4210 KOps/s 248.8583 KOps/s $\color{#d91a1a}-2.99\%$
test_membership_stacked_nested_last 60.0110μs 7.3628μs 135.8184 KOps/s 249.6028 KOps/s $\textbf{\color{#d91a1a}-45.59\%}$
test_membership_stacked_nested_leaf_last 43.8920μs 7.2809μs 137.3448 KOps/s 244.8912 KOps/s $\textbf{\color{#d91a1a}-43.92\%}$
test_nested_getleaf 67.7860μs 11.2822μs 88.6349 KOps/s 92.6691 KOps/s $\color{#d91a1a}-4.35\%$
test_nested_get 41.4870μs 10.8258μs 92.3722 KOps/s 94.9473 KOps/s $\color{#d91a1a}-2.71\%$
test_stacked_getleaf 66.1930μs 11.1384μs 89.7791 KOps/s 91.9112 KOps/s $\color{#d91a1a}-2.32\%$
test_stacked_get 0.1501ms 10.6685μs 93.7338 KOps/s 97.5022 KOps/s $\color{#d91a1a}-3.86\%$
test_nested_getitemleaf 35.1450μs 11.8108μs 84.6683 KOps/s 88.6310 KOps/s $\color{#d91a1a}-4.47\%$
test_nested_getitem 61.0540μs 10.9855μs 91.0291 KOps/s 95.5686 KOps/s $\color{#d91a1a}-4.75\%$
test_stacked_getitemleaf 54.5110μs 11.8157μs 84.6328 KOps/s 88.3895 KOps/s $\color{#d91a1a}-4.25\%$
test_stacked_getitem 32.4500μs 10.9989μs 90.9180 KOps/s 94.8218 KOps/s $\color{#d91a1a}-4.12\%$
test_lock_nested 3.0237ms 0.5204ms 1.9215 KOps/s 1.6307 KOps/s $\textbf{\color{#35bf28}+17.83\%}$
test_lock_stack_nested 0.9242ms 0.4717ms 2.1201 KOps/s 2.0329 KOps/s $\color{#35bf28}+4.29\%$
test_unlock_nested 0.8861ms 0.4405ms 2.2703 KOps/s 2.3226 KOps/s $\color{#d91a1a}-2.25\%$
test_unlock_stack_nested 0.4843ms 0.3867ms 2.5859 KOps/s 2.4825 KOps/s $\color{#35bf28}+4.17\%$
test_flatten_speed 0.2600ms 0.1065ms 9.3879 KOps/s 9.4682 KOps/s $\color{#d91a1a}-0.85\%$
test_unflatten_speed 0.9999ms 0.4557ms 2.1946 KOps/s 2.2194 KOps/s $\color{#d91a1a}-1.12\%$
test_common_ops 2.1616ms 1.2385ms 807.4026 Ops/s 860.4017 Ops/s $\textbf{\color{#d91a1a}-6.16\%}$
test_creation 27.8720μs 2.5529μs 391.7152 KOps/s 394.2538 KOps/s $\color{#d91a1a}-0.64\%$
test_creation_empty 50.6230μs 22.6964μs 44.0599 KOps/s 54.6708 KOps/s $\textbf{\color{#d91a1a}-19.41\%}$
test_creation_nested_1 73.4460μs 26.5846μs 37.6157 KOps/s 43.8137 KOps/s $\textbf{\color{#d91a1a}-14.15\%}$
test_creation_nested_2 89.1350μs 30.5361μs 32.7481 KOps/s 38.3054 KOps/s $\textbf{\color{#d91a1a}-14.51\%}$
test_clone 0.1884ms 18.7920μs 53.2142 KOps/s 57.3491 KOps/s $\textbf{\color{#d91a1a}-7.21\%}$
test_getitem[int] 0.8472ms 12.9166μs 77.4199 KOps/s 77.5472 KOps/s $\color{#d91a1a}-0.16\%$
test_getitem[slice_int] 0.1325ms 33.4538μs 29.8920 KOps/s 29.2700 KOps/s $\color{#35bf28}+2.12\%$
test_getitem[range] 0.3644ms 58.4607μs 17.1055 KOps/s 16.8611 KOps/s $\color{#35bf28}+1.45\%$
test_getitem[tuple] 0.1488ms 27.2523μs 36.6942 KOps/s 35.9902 KOps/s $\color{#35bf28}+1.96\%$
test_getitem[list] 0.4014ms 53.2128μs 18.7925 KOps/s 18.2722 KOps/s $\color{#35bf28}+2.85\%$
test_setitem_dim[int] 82.0120μs 38.6163μs 25.8958 KOps/s 29.4853 KOps/s $\textbf{\color{#d91a1a}-12.17\%}$
test_setitem_dim[slice_int] 0.2008ms 77.9955μs 12.8213 KOps/s 13.4741 KOps/s $\color{#d91a1a}-4.85\%$
test_setitem_dim[range] 0.1457ms 98.1801μs 10.1854 KOps/s 10.5783 KOps/s $\color{#d91a1a}-3.71\%$
test_setitem_dim[tuple] 0.1130ms 64.7763μs 15.4378 KOps/s 16.2156 KOps/s $\color{#d91a1a}-4.80\%$
test_setitem 0.1518ms 33.8906μs 29.5067 KOps/s 32.8850 KOps/s $\textbf{\color{#d91a1a}-10.27\%}$
test_set 0.1811ms 33.3063μs 30.0244 KOps/s 33.5873 KOps/s $\textbf{\color{#d91a1a}-10.61\%}$
test_set_shared 3.5316ms 0.2198ms 4.5496 KOps/s 4.4990 KOps/s $\color{#35bf28}+1.13\%$
test_update 0.1863ms 41.9528μs 23.8363 KOps/s 25.9556 KOps/s $\textbf{\color{#d91a1a}-8.16\%}$
test_update_nested 0.2400ms 53.0165μs 18.8621 KOps/s 21.3819 KOps/s $\textbf{\color{#d91a1a}-11.78\%}$
test_update__nested 0.1795ms 36.5670μs 27.3470 KOps/s 28.1344 KOps/s $\color{#d91a1a}-2.80\%$
test_set_nested 0.1444ms 35.3866μs 28.2593 KOps/s 30.9158 KOps/s $\textbf{\color{#d91a1a}-8.59\%}$
test_set_nested_new 0.1698ms 41.3699μs 24.1722 KOps/s 26.0935 KOps/s $\textbf{\color{#d91a1a}-7.36\%}$
test_select 0.2196ms 59.0218μs 16.9429 KOps/s 17.8995 KOps/s $\textbf{\color{#d91a1a}-5.34\%}$
test_select_nested 0.1543ms 60.5580μs 16.5131 KOps/s 16.3527 KOps/s $\color{#35bf28}+0.98\%$
test_exclude_nested 0.1549ms 80.0434μs 12.4932 KOps/s 12.1026 KOps/s $\color{#35bf28}+3.23\%$
test_empty[True] 0.6540ms 0.3484ms 2.8700 KOps/s 2.8428 KOps/s $\color{#35bf28}+0.96\%$
test_empty[False] 12.5608μs 1.3098μs 763.4465 KOps/s 765.9821 KOps/s $\color{#d91a1a}-0.33\%$
test_unbind_speed 0.5683ms 0.3246ms 3.0809 KOps/s 3.0742 KOps/s $\color{#35bf28}+0.22\%$
test_unbind_speed_stack0 0.6171ms 0.3117ms 3.2085 KOps/s 3.1002 KOps/s $\color{#35bf28}+3.50\%$
test_unbind_speed_stack1 83.9489ms 0.8068ms 1.2395 KOps/s 1.3740 KOps/s $\textbf{\color{#d91a1a}-9.79\%}$
test_split 76.9395ms 2.2894ms 436.8004 Ops/s 402.6555 Ops/s $\textbf{\color{#35bf28}+8.48\%}$
test_chunk 83.7467ms 2.3126ms 432.4090 Ops/s 465.4370 Ops/s $\textbf{\color{#d91a1a}-7.10\%}$
test_creation[device0] 0.2288ms 0.1257ms 7.9581 KOps/s 8.0705 KOps/s $\color{#d91a1a}-1.39\%$
test_creation_from_tensor 4.2419ms 0.1253ms 7.9830 KOps/s 8.1521 KOps/s $\color{#d91a1a}-2.07\%$
test_add_one[memmap_tensor0] 0.1941ms 7.6097μs 131.4115 KOps/s 125.5856 KOps/s $\color{#35bf28}+4.64\%$
test_contiguous[memmap_tensor0] 45.3750μs 2.2110μs 452.2750 KOps/s 445.9585 KOps/s $\color{#35bf28}+1.42\%$
test_stack[memmap_tensor0] 51.2850μs 6.0022μs 166.6042 KOps/s 170.3582 KOps/s $\color{#d91a1a}-2.20\%$
test_memmaptd_index 1.2056ms 0.4527ms 2.2090 KOps/s 2.2917 KOps/s $\color{#d91a1a}-3.61\%$
test_memmaptd_index_astensor 0.8194ms 0.5277ms 1.8949 KOps/s 1.7773 KOps/s $\textbf{\color{#35bf28}+6.62\%}$
test_memmaptd_index_op 1.5221ms 1.1468ms 872.0148 Ops/s 932.3414 Ops/s $\textbf{\color{#d91a1a}-6.47\%}$
test_serialize_model 0.2095s 0.1446s 6.9148 Ops/s 7.8287 Ops/s $\textbf{\color{#d91a1a}-11.67\%}$
test_serialize_model_pickle 0.4748s 0.3993s 2.5043 Ops/s 2.4851 Ops/s $\color{#35bf28}+0.78\%$
test_serialize_weights 0.1363s 0.1287s 7.7723 Ops/s 6.9375 Ops/s $\textbf{\color{#35bf28}+12.03\%}$
test_serialize_weights_returnearly 0.1729s 0.1658s 6.0319 Ops/s 6.0651 Ops/s $\color{#d91a1a}-0.55\%$
test_serialize_weights_pickle 1.2485s 0.9118s 1.0968 Ops/s 2.5457 Ops/s $\textbf{\color{#d91a1a}-56.92\%}$
test_serialize_weights_filesystem 0.1510s 0.1455s 6.8706 Ops/s 6.6925 Ops/s $\color{#35bf28}+2.66\%$
test_serialize_model_filesystem 0.1565s 0.1453s 6.8827 Ops/s 5.8762 Ops/s $\textbf{\color{#35bf28}+17.13\%}$
test_reshape_pytree 0.2322ms 41.8723μs 23.8822 KOps/s 24.4828 KOps/s $\color{#d91a1a}-2.45\%$
test_reshape_td 0.1031ms 50.8270μs 19.6746 KOps/s 19.7988 KOps/s $\color{#d91a1a}-0.63\%$
test_view_pytree 96.9690μs 39.6329μs 25.2316 KOps/s 25.2643 KOps/s $\color{#d91a1a}-0.13\%$
test_view_td 0.1151ms 56.9581μs 17.5568 KOps/s 17.7273 KOps/s $\color{#d91a1a}-0.96\%$
test_unbind_pytree 84.4460μs 36.5813μs 27.3364 KOps/s 28.0997 KOps/s $\color{#d91a1a}-2.72\%$
test_unbind_td 78.5903ms 56.8407μs 17.5930 KOps/s 20.5109 KOps/s $\textbf{\color{#d91a1a}-14.23\%}$
test_split_pytree 82.8440μs 39.4937μs 25.3205 KOps/s 25.9083 KOps/s $\color{#d91a1a}-2.27\%$
test_split_td 0.1960ms 62.6948μs 15.9503 KOps/s 16.3209 KOps/s $\color{#d91a1a}-2.27\%$
test_add_pytree 0.1713ms 44.1392μs 22.6556 KOps/s 22.2299 KOps/s $\color{#35bf28}+1.92\%$
test_add_td 0.1729ms 92.8599μs 10.7689 KOps/s 12.0421 KOps/s $\textbf{\color{#d91a1a}-10.57\%}$
test_distributed 0.7288ms 0.1314ms 7.6104 KOps/s 7.2453 KOps/s $\textbf{\color{#35bf28}+5.04\%}$
test_tdmodule 48.3500μs 18.8169μs 53.1436 KOps/s 59.1148 KOps/s $\textbf{\color{#d91a1a}-10.10\%}$
test_tdmodule_dispatch 71.4320μs 40.0347μs 24.9783 KOps/s 28.4952 KOps/s $\textbf{\color{#d91a1a}-12.34\%}$
test_tdseq 45.6350μs 20.6113μs 48.5171 KOps/s 53.3568 KOps/s $\textbf{\color{#d91a1a}-9.07\%}$
test_tdseq_dispatch 79.1170μs 44.3227μs 22.5618 KOps/s 25.7466 KOps/s $\textbf{\color{#d91a1a}-12.37\%}$
test_instantiation_functorch 1.9402ms 1.6524ms 605.1854 Ops/s 625.4436 Ops/s $\color{#d91a1a}-3.24\%$
test_instantiation_td 2.6710ms 1.2035ms 830.8939 Ops/s 865.8988 Ops/s $\color{#d91a1a}-4.04\%$
test_exec_functorch 0.3422ms 0.1883ms 5.3111 KOps/s 5.3426 KOps/s $\color{#d91a1a}-0.59\%$
test_exec_functional_call 0.3026ms 0.1746ms 5.7269 KOps/s 5.2064 KOps/s $\textbf{\color{#35bf28}+10.00\%}$
test_exec_td 0.3309ms 0.1759ms 5.6841 KOps/s 5.6538 KOps/s $\color{#35bf28}+0.54\%$
test_exec_td_decorator 0.8462ms 0.2671ms 3.7444 KOps/s 3.7769 KOps/s $\color{#d91a1a}-0.86\%$
test_vmap_mlp_speed[True-True] 0.8685ms 0.6103ms 1.6386 KOps/s 1.6544 KOps/s $\color{#d91a1a}-0.95\%$
test_vmap_mlp_speed[True-False] 0.8962ms 0.6088ms 1.6426 KOps/s 1.6547 KOps/s $\color{#d91a1a}-0.73\%$
test_vmap_mlp_speed[False-True] 0.8464ms 0.4963ms 2.0149 KOps/s 2.0091 KOps/s $\color{#35bf28}+0.29\%$
test_vmap_mlp_speed[False-False] 0.8695ms 0.4980ms 2.0080 KOps/s 2.0158 KOps/s $\color{#d91a1a}-0.39\%$
test_vmap_mlp_speed_decorator[True-True] 1.3637ms 0.7033ms 1.4218 KOps/s 1.4226 KOps/s $\color{#d91a1a}-0.06\%$
test_vmap_mlp_speed_decorator[True-False] 0.9590ms 0.7047ms 1.4190 KOps/s 1.4300 KOps/s $\color{#d91a1a}-0.77\%$
test_vmap_mlp_speed_decorator[False-True] 0.9094ms 0.5792ms 1.7264 KOps/s 1.7133 KOps/s $\color{#35bf28}+0.77\%$
test_vmap_mlp_speed_decorator[False-False] 0.9885ms 0.5789ms 1.7276 KOps/s 1.7247 KOps/s $\color{#35bf28}+0.16\%$
test_to_module_speed[True] 2.4630ms 1.8500ms 540.5429 Ops/s 538.8634 Ops/s $\color{#35bf28}+0.31\%$
test_to_module_speed[False] 2.9039ms 1.8352ms 544.9109 Ops/s 548.5593 Ops/s $\color{#d91a1a}-0.67\%$
test_tc_init 96.9100μs 46.0740μs 21.7042 KOps/s 23.1175 KOps/s $\textbf{\color{#d91a1a}-6.11\%}$
test_tc_init_nested 0.1533ms 93.2386μs 10.7252 KOps/s 11.2473 KOps/s $\color{#d91a1a}-4.64\%$
test_tc_first_layer_tensor 46.4060μs 9.3081μs 107.4337 KOps/s 106.1683 KOps/s $\color{#35bf28}+1.19\%$
test_tc_first_layer_nontensor 51.6060μs 9.0913μs 109.9957 KOps/s 107.3081 KOps/s $\color{#35bf28}+2.50\%$
test_tc_second_layer_tensor 44.5520μs 2.8137μs 355.4093 KOps/s 340.9049 KOps/s $\color{#35bf28}+4.25\%$
test_tc_second_layer_nontensor 44.8630μs 10.3923μs 96.2249 KOps/s 92.8632 KOps/s $\color{#35bf28}+3.62\%$
test_unbind 0.1044s 14.4627ms 69.1435 Ops/s 67.4599 Ops/s $\color{#35bf28}+2.50\%$
test_full_like 11.5202ms 8.5969ms 116.3215 Ops/s 126.6579 Ops/s $\textbf{\color{#d91a1a}-8.16\%}$
test_zeros_like 14.5601ms 6.7715ms 147.6776 Ops/s 153.5228 Ops/s $\color{#d91a1a}-3.81\%$
test_ones_like 13.4976ms 7.8255ms 127.7880 Ops/s 128.7501 Ops/s $\color{#d91a1a}-0.75\%$
test_clone 17.5014ms 9.9139ms 100.8685 Ops/s 109.6009 Ops/s $\textbf{\color{#d91a1a}-7.97\%}$
test_squeeze 60.6520μs 14.8642μs 67.2757 KOps/s 64.5823 KOps/s $\color{#35bf28}+4.17\%$
test_unsqueeze 0.2137ms 0.1001ms 9.9909 KOps/s 9.8931 KOps/s $\color{#35bf28}+0.99\%$
test_split 0.4498ms 0.2070ms 4.8305 KOps/s 4.6123 KOps/s $\color{#35bf28}+4.73\%$
test_permute 0.4171ms 0.2323ms 4.3040 KOps/s 4.3333 KOps/s $\color{#d91a1a}-0.68\%$
test_stack 33.8198ms 27.2829ms 36.6530 Ops/s 39.5636 Ops/s $\textbf{\color{#d91a1a}-7.36\%}$
test_cat 33.5557ms 27.2998ms 36.6302 Ops/s 40.4875 Ops/s $\textbf{\color{#d91a1a}-9.53\%}$
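
For readers checking the numbers, the Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column. The sketch below reproduces three rows from the CPU table under that reading. The function names and the 5% cutoff are assumptions: the cutoff is inferred from which rows are rendered in bold (and from the Improved/Worsened totals), not taken from the benchmark bot's source.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of this PR's throughput relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption inferred from the bold rows."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) in KOps/s, copied from the CPU table above.
rows = {
    "test_plain_set_nested": (42.1601, 46.1613),
    "test_items_nested": (2.7747, 2.4724),
    "test_items": (373.8531, 384.0378),
}

for name, (ops, head) in rows.items():
    pct = relative_change(ops, head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Under this reading, a row only counts toward the Improved or Worsened totals when it moves by more than the cutoff, which is why most red and green rows are not counted.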

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 219. Improved: $\large\color{#35bf28}10$. Worsened: $\large\color{#d91a1a}21$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 29.9400μs 17.1672μs 58.2508 KOps/s 59.5848 KOps/s $\color{#d91a1a}-2.24\%$
test_plain_set_stack_nested 36.7600μs 17.3032μs 57.7928 KOps/s 59.3534 KOps/s $\color{#d91a1a}-2.63\%$
test_plain_set_nested_inplace 43.5600μs 18.5522μs 53.9021 KOps/s 53.2050 KOps/s $\color{#35bf28}+1.31\%$
test_plain_set_stack_nested_inplace 44.2210μs 18.5265μs 53.9767 KOps/s 55.3320 KOps/s $\color{#d91a1a}-2.45\%$
test_items 17.3710μs 4.8390μs 206.6551 KOps/s 210.5973 KOps/s $\color{#d91a1a}-1.87\%$
test_items_nested 0.4928ms 0.3940ms 2.5380 KOps/s 2.4874 KOps/s $\color{#35bf28}+2.03\%$
test_items_nested_locked 0.4803ms 0.3910ms 2.5576 KOps/s 2.4830 KOps/s $\color{#35bf28}+3.00\%$
test_items_nested_leaf 0.1183ms 86.0946μs 11.6151 KOps/s 11.5808 KOps/s $\color{#35bf28}+0.30\%$
test_items_stack_nested 0.4881ms 0.3907ms 2.5595 KOps/s 2.4941 KOps/s $\color{#35bf28}+2.62\%$
test_items_stack_nested_leaf 0.1318ms 86.3261μs 11.5840 KOps/s 11.4835 KOps/s $\color{#35bf28}+0.88\%$
test_items_stack_nested_locked 0.4415ms 0.3928ms 2.5457 KOps/s 2.5179 KOps/s $\color{#35bf28}+1.10\%$
test_keys 18.0600μs 4.4067μs 226.9287 KOps/s 228.2812 KOps/s $\color{#d91a1a}-0.59\%$
test_keys_nested 0.1013ms 67.6305μs 14.7862 KOps/s 15.2294 KOps/s $\color{#d91a1a}-2.91\%$
test_keys_nested_locked 0.7333ms 73.1001μs 13.6799 KOps/s 13.6256 KOps/s $\color{#35bf28}+0.40\%$
test_keys_nested_leaf 91.2110μs 57.6563μs 17.3442 KOps/s 17.5727 KOps/s $\color{#d91a1a}-1.30\%$
test_keys_stack_nested 0.1089ms 66.3441μs 15.0729 KOps/s 15.1544 KOps/s $\color{#d91a1a}-0.54\%$
test_keys_stack_nested_leaf 93.6910μs 56.9444μs 17.5610 KOps/s 17.7404 KOps/s $\color{#d91a1a}-1.01\%$
test_keys_stack_nested_locked 0.1051ms 72.3696μs 13.8179 KOps/s 13.7371 KOps/s $\color{#35bf28}+0.59\%$
test_values 7.3570μs 1.7740μs 563.6872 KOps/s 566.9632 KOps/s $\color{#d91a1a}-0.58\%$
test_values_nested 46.4100μs 34.1181μs 29.3099 KOps/s 29.4578 KOps/s $\color{#d91a1a}-0.50\%$
test_values_nested_locked 86.0320μs 35.6650μs 28.0387 KOps/s 27.7520 KOps/s $\color{#35bf28}+1.03\%$
test_values_nested_leaf 61.5510μs 30.3009μs 33.0023 KOps/s 33.1318 KOps/s $\color{#d91a1a}-0.39\%$
test_values_stack_nested 61.4910μs 34.8897μs 28.6618 KOps/s 28.6794 KOps/s $\color{#d91a1a}-0.06\%$
test_values_stack_nested_leaf 59.3800μs 31.0267μs 32.2303 KOps/s 32.3490 KOps/s $\color{#d91a1a}-0.37\%$
test_values_stack_nested_locked 63.4010μs 36.3459μs 27.5134 KOps/s 27.4367 KOps/s $\color{#35bf28}+0.28\%$
test_membership 3.1836μs 0.5602μs 1.7850 MOps/s 1.8386 MOps/s $\color{#d91a1a}-2.92\%$
test_membership_nested 13.3900μs 2.0619μs 485.0011 KOps/s 477.9696 KOps/s $\color{#35bf28}+1.47\%$
test_membership_nested_leaf 11.2855μs 2.0730μs 482.3949 KOps/s 494.3265 KOps/s $\color{#d91a1a}-2.41\%$
test_membership_stacked_nested 33.8910μs 2.1538μs 464.2914 KOps/s 487.1459 KOps/s $\color{#d91a1a}-4.69\%$
test_membership_stacked_nested_leaf 15.7500μs 2.1281μs 469.8948 KOps/s 482.3425 KOps/s $\color{#d91a1a}-2.58\%$
test_membership_nested_last 16.7610μs 3.1051μs 322.0559 KOps/s 332.1734 KOps/s $\color{#d91a1a}-3.05\%$
test_membership_nested_leaf_last 55.7310μs 3.1185μs 320.6692 KOps/s 334.2651 KOps/s $\color{#d91a1a}-4.07\%$
test_membership_stacked_nested_last 15.9590μs 3.1094μs 321.6081 KOps/s 334.4267 KOps/s $\color{#d91a1a}-3.83\%$
test_membership_stacked_nested_leaf_last 19.9390μs 3.0666μs 326.0944 KOps/s 331.2035 KOps/s $\color{#d91a1a}-1.54\%$
test_nested_getleaf 30.2510μs 8.0069μs 124.8917 KOps/s 123.8436 KOps/s $\color{#35bf28}+0.85\%$
test_nested_get 22.3000μs 7.5595μs 132.2833 KOps/s 131.9271 KOps/s $\color{#35bf28}+0.27\%$
test_stacked_getleaf 24.4710μs 8.0506μs 124.2141 KOps/s 124.1707 KOps/s $\color{#35bf28}+0.03\%$
test_stacked_get 20.7400μs 7.5796μs 131.9323 KOps/s 131.7998 KOps/s $\color{#35bf28}+0.10\%$
test_nested_getitemleaf 67.8420μs 8.1919μs 122.0714 KOps/s 122.1611 KOps/s $\color{#d91a1a}-0.07\%$
test_nested_getitem 21.5510μs 7.7569μs 128.9174 KOps/s 129.0864 KOps/s $\color{#d91a1a}-0.13\%$
test_stacked_getitemleaf 31.0110μs 8.2170μs 121.6982 KOps/s 120.8845 KOps/s $\color{#35bf28}+0.67\%$
test_stacked_getitem 23.1310μs 7.7320μs 129.3319 KOps/s 129.5715 KOps/s $\color{#d91a1a}-0.18\%$
test_lock_nested 9.9110ms 0.4860ms 2.0575 KOps/s 2.0441 KOps/s $\color{#35bf28}+0.66\%$
test_lock_stack_nested 0.5309ms 0.4401ms 2.2724 KOps/s 2.2356 KOps/s $\color{#35bf28}+1.64\%$
test_unlock_nested 0.8833ms 0.4010ms 2.4937 KOps/s 2.4691 KOps/s $\color{#35bf28}+1.00\%$
test_unlock_stack_nested 0.3956ms 0.3610ms 2.7701 KOps/s 2.7302 KOps/s $\color{#35bf28}+1.46\%$
test_flatten_speed 0.5127ms 0.1054ms 9.4868 KOps/s 9.5619 KOps/s $\color{#d91a1a}-0.78\%$
test_unflatten_speed 0.3524ms 0.2981ms 3.3551 KOps/s 3.4187 KOps/s $\color{#d91a1a}-1.86\%$
test_common_ops 1.7397ms 1.4491ms 690.0722 Ops/s 740.5372 Ops/s $\textbf{\color{#d91a1a}-6.81\%}$
test_creation 20.4910μs 2.0504μs 487.7073 KOps/s 486.9862 KOps/s $\color{#35bf28}+0.15\%$
test_creation_empty 47.7500μs 19.4260μs 51.4774 KOps/s 57.7403 KOps/s $\textbf{\color{#d91a1a}-10.85\%}$
test_creation_nested_1 61.8610μs 21.6277μs 46.2370 KOps/s 50.9419 KOps/s $\textbf{\color{#d91a1a}-9.24\%}$
test_creation_nested_2 45.6400μs 25.6232μs 39.0272 KOps/s 44.7713 KOps/s $\textbf{\color{#d91a1a}-12.83\%}$
test_clone 70.4420μs 34.5209μs 28.9680 KOps/s 31.6106 KOps/s $\textbf{\color{#d91a1a}-8.36\%}$
test_getitem[int] 1.2943ms 18.7900μs 53.2199 KOps/s 56.1460 KOps/s $\textbf{\color{#d91a1a}-5.21\%}$
test_getitem[slice_int] 0.1559ms 33.4563μs 29.8897 KOps/s 32.0479 KOps/s $\textbf{\color{#d91a1a}-6.73\%}$
test_getitem[range] 0.2648ms 0.1217ms 8.2174 KOps/s 8.4357 KOps/s $\color{#d91a1a}-2.59\%$
test_getitem[tuple] 89.8689ms 32.2947μs 30.9648 KOps/s 37.8489 KOps/s $\textbf{\color{#d91a1a}-18.19\%}$
test_getitem[list] 0.2221ms 0.1109ms 9.0138 KOps/s 9.2520 KOps/s $\color{#d91a1a}-2.57\%$
test_setitem_dim[int] 77.8620μs 56.4469μs 17.7158 KOps/s 18.7489 KOps/s $\textbf{\color{#d91a1a}-5.51\%}$
test_setitem_dim[slice_int] 0.1194ms 82.2609μs 12.1564 KOps/s 12.7534 KOps/s $\color{#d91a1a}-4.68\%$
test_setitem_dim[range] 0.1950ms 0.1573ms 6.3586 KOps/s 7.0187 KOps/s $\textbf{\color{#d91a1a}-9.40\%}$
test_setitem_dim[tuple] 0.1157ms 80.2920μs 12.4545 KOps/s 13.9791 KOps/s $\textbf{\color{#d91a1a}-10.91\%}$
test_setitem 0.1015ms 50.2773μs 19.8897 KOps/s 20.7713 KOps/s $\color{#d91a1a}-4.24\%$
test_set 75.9720μs 48.9120μs 20.4449 KOps/s 20.2884 KOps/s $\color{#35bf28}+0.77\%$
test_set_shared 0.3907ms 56.7185μs 17.6309 KOps/s 17.9518 KOps/s $\color{#d91a1a}-1.79\%$
test_update 0.1061ms 54.4282μs 18.3728 KOps/s 18.5650 KOps/s $\color{#d91a1a}-1.03\%$
test_update_nested 98.3420μs 64.2952μs 15.5533 KOps/s 16.1121 KOps/s $\color{#d91a1a}-3.47\%$
test_update__nested 0.1090ms 70.7394μs 14.1364 KOps/s 15.5858 KOps/s $\textbf{\color{#d91a1a}-9.30\%}$
test_set_nested 82.8210μs 51.8144μs 19.2997 KOps/s 19.7579 KOps/s $\color{#d91a1a}-2.32\%$
test_set_nested_new 91.1420μs 56.3626μs 17.7423 KOps/s 17.7987 KOps/s $\color{#d91a1a}-0.32\%$
test_select 95.0010μs 71.1798μs 14.0489 KOps/s 13.8089 KOps/s $\color{#35bf28}+1.74\%$
test_select_nested 0.5327ms 53.9960μs 18.5199 KOps/s 18.6973 KOps/s $\color{#d91a1a}-0.95\%$
test_exclude_nested 0.1115ms 72.3466μs 13.8223 KOps/s 14.0105 KOps/s $\color{#d91a1a}-1.34\%$
test_empty[True] 0.3580ms 0.2937ms 3.4053 KOps/s 3.4103 KOps/s $\color{#d91a1a}-0.15\%$
test_empty[False] 3.2332μs 0.9422μs 1.0613 MOps/s 1.0857 MOps/s $\color{#d91a1a}-2.24\%$
test_to 64.6110μs 38.1394μs 26.2196 KOps/s 25.9447 KOps/s $\color{#35bf28}+1.06\%$
test_to_nonblocking 60.9210μs 24.7702μs 40.3711 KOps/s 41.5788 KOps/s $\color{#d91a1a}-2.90\%$
test_unbind_speed 0.4001ms 0.3098ms 3.2278 KOps/s 3.2110 KOps/s $\color{#35bf28}+0.52\%$
test_unbind_speed_stack0 0.4061ms 0.3043ms 3.2865 KOps/s 3.1908 KOps/s $\color{#35bf28}+3.00\%$
test_unbind_speed_stack1 89.6915ms 0.7714ms 1.2963 KOps/s 1.2597 KOps/s $\color{#35bf28}+2.91\%$
test_split 91.3963ms 2.4072ms 415.4244 Ops/s 418.8806 Ops/s $\color{#d91a1a}-0.83\%$
test_chunk 2.3313ms 2.1768ms 459.3999 Ops/s 455.3524 Ops/s $\color{#35bf28}+0.89\%$
test_creation[device0] 0.1639ms 0.1061ms 9.4228 KOps/s 8.6163 KOps/s $\textbf{\color{#35bf28}+9.36\%}$
test_creation_from_tensor 0.1843ms 0.1093ms 9.1460 KOps/s 8.8596 KOps/s $\color{#35bf28}+3.23\%$
test_add_one[memmap_tensor0] 24.7010μs 10.2545μs 97.5177 KOps/s 105.4630 KOps/s $\textbf{\color{#d91a1a}-7.53\%}$
test_contiguous[memmap_tensor0] 24.2820μs 2.2411μs 446.2120 KOps/s 428.8251 KOps/s $\color{#35bf28}+4.05\%$
test_stack[memmap_tensor0] 34.0310μs 7.0374μs 142.0981 KOps/s 147.3508 KOps/s $\color{#d91a1a}-3.56\%$
test_memmaptd_index 1.2117ms 0.4449ms 2.2478 KOps/s 1.9127 KOps/s $\textbf{\color{#35bf28}+17.52\%}$
test_memmaptd_index_astensor 0.7995ms 0.5068ms 1.9731 KOps/s 1.9564 KOps/s $\color{#35bf28}+0.85\%$
test_memmaptd_index_op 1.5554ms 1.1064ms 903.8498 Ops/s 912.4362 Ops/s $\color{#d91a1a}-0.94\%$
test_serialize_model 0.1008s 96.8150ms 10.3290 Ops/s 9.9274 Ops/s $\color{#35bf28}+4.04\%$
test_serialize_model_pickle 1.3492s 1.2362s 0.8089 Ops/s 0.8056 Ops/s $\color{#35bf28}+0.41\%$
test_serialize_weights 96.0737ms 92.8765ms 10.7670 Ops/s 9.0808 Ops/s $\textbf{\color{#35bf28}+18.57\%}$
test_serialize_weights_returnearly 86.2683ms 71.8502ms 13.9179 Ops/s 13.9211 Ops/s $\color{#d91a1a}-0.02\%$
test_serialize_weights_pickle 1.3523s 1.2368s 0.8086 Ops/s 0.8082 Ops/s $\color{#35bf28}+0.04\%$
test_reshape_pytree 0.1079ms 40.3371μs 24.7911 KOps/s 24.8256 KOps/s $\color{#d91a1a}-0.14\%$
test_reshape_td 75.0510μs 45.3240μs 22.0634 KOps/s 21.3020 KOps/s $\color{#35bf28}+3.57\%$
test_view_pytree 64.2010μs 39.5374μs 25.2925 KOps/s 25.1888 KOps/s $\color{#35bf28}+0.41\%$
test_view_td 74.5120μs 50.4119μs 19.8366 KOps/s 18.6545 KOps/s $\textbf{\color{#35bf28}+6.34\%}$
test_unbind_pytree 85.7910μs 38.0886μs 26.2546 KOps/s 26.2912 KOps/s $\color{#d91a1a}-0.14\%$
test_unbind_td 0.4411ms 46.9754μs 21.2877 KOps/s 21.0421 KOps/s $\color{#35bf28}+1.17\%$
test_split_pytree 81.0520μs 51.6913μs 19.3456 KOps/s 18.8092 KOps/s $\color{#35bf28}+2.85\%$
test_split_td 90.1962ms 71.1305μs 14.0587 KOps/s 13.6975 KOps/s $\color{#35bf28}+2.64\%$
test_add_pytree 0.1064ms 61.9452μs 16.1433 KOps/s 16.2481 KOps/s $\color{#d91a1a}-0.64\%$
test_add_td 0.1286ms 98.0347μs 10.2005 KOps/s 10.3511 KOps/s $\color{#d91a1a}-1.45\%$
test_compile_add_one_nested[tensordict-compile] 0.4158ms 0.2162ms 4.6249 KOps/s 4.7560 KOps/s $\color{#d91a1a}-2.76\%$
test_compile_add_one_nested[tensordict-eager] 0.2607ms 0.1763ms 5.6733 KOps/s 5.7079 KOps/s $\color{#d91a1a}-0.61\%$
test_compile_add_one_nested[pytree-compile] 0.2065ms 0.1469ms 6.8084 KOps/s 6.7786 KOps/s $\color{#35bf28}+0.44\%$
test_compile_add_one_nested[pytree-eager] 0.2962ms 0.2087ms 4.7915 KOps/s 4.9277 KOps/s $\color{#d91a1a}-2.76\%$
test_compile_copy_nested[tensordict-compile] 63.4210μs 22.0345μs 45.3834 KOps/s 44.3464 KOps/s $\color{#35bf28}+2.34\%$
test_compile_copy_nested[tensordict-eager] 83.8610μs 49.8410μs 20.0638 KOps/s 20.3177 KOps/s $\color{#d91a1a}-1.25\%$
test_compile_copy_nested[pytree-compile] 0.1317ms 71.6547μs 13.9558 KOps/s 13.8481 KOps/s $\color{#35bf28}+0.78\%$
test_compile_copy_nested[pytree-eager] 94.3010μs 59.5464μs 16.7936 KOps/s 16.7579 KOps/s $\color{#35bf28}+0.21\%$
test_compile_add_one_flat[tensordict-compile] 0.4265ms 0.3297ms 3.0329 KOps/s 2.9834 KOps/s $\color{#35bf28}+1.66\%$
test_compile_add_one_flat[tensordict-eager] 0.3163ms 0.2247ms 4.4504 KOps/s 4.4946 KOps/s $\color{#d91a1a}-0.98\%$
test_compile_add_one_flat[tensorclass-compile] 0.1842ms 0.1323ms 7.5613 KOps/s 7.5286 KOps/s $\color{#35bf28}+0.43\%$
test_compile_add_one_flat[tensorclass-eager] 0.1231ms 64.5316μs 15.4963 KOps/s 14.8986 KOps/s $\color{#35bf28}+4.01\%$
test_compile_add_one_flat[pytree-compile] 0.4412ms 0.3302ms 3.0283 KOps/s 3.0059 KOps/s $\color{#35bf28}+0.74\%$
test_compile_add_one_flat[pytree-eager] 0.7996ms 0.7143ms 1.4000 KOps/s 1.4867 KOps/s $\textbf{\color{#d91a1a}-5.83\%}$
test_compile_add_self_flat[tensordict-eager] 0.3668ms 0.2731ms 3.6622 KOps/s 3.6774 KOps/s $\color{#d91a1a}-0.41\%$
test_compile_add_self_flat[tensordict-compile] 0.3697ms 0.3339ms 2.9948 KOps/s 2.9484 KOps/s $\color{#35bf28}+1.58\%$
test_compile_add_self_flat[tensorclass-eager] 0.1606ms 81.4643μs 12.2753 KOps/s 12.3979 KOps/s $\color{#d91a1a}-0.99\%$
test_compile_add_self_flat[tensorclass-compile] 0.2375ms 0.1334ms 7.4966 KOps/s 7.3800 KOps/s $\color{#35bf28}+1.58\%$
test_compile_add_self_flat[pytree-eager] 0.6703ms 0.6034ms 1.6571 KOps/s 1.7706 KOps/s $\textbf{\color{#d91a1a}-6.41\%}$
test_compile_add_self_flat[pytree-compile] 0.3789ms 0.3302ms 3.0286 KOps/s 2.9747 KOps/s $\color{#35bf28}+1.81\%$
test_compile_copy_flat[tensordict-compile] 39.1300μs 18.0088μs 55.5284 KOps/s 53.2394 KOps/s $\color{#35bf28}+4.30\%$
test_compile_copy_flat[tensordict-eager] 55.5920μs 31.9031μs 31.3449 KOps/s 31.0879 KOps/s $\color{#35bf28}+0.83\%$
test_compile_copy_flat[pytree-compile] 0.1144ms 75.7340μs 13.2041 KOps/s 12.9987 KOps/s $\color{#35bf28}+1.58\%$
test_compile_copy_flat[pytree-eager] 0.1079ms 61.2577μs 16.3245 KOps/s 16.3116 KOps/s $\color{#35bf28}+0.08\%$
test_compile_assign_and_add[tensordict-compile] 2.5585ms 0.9449ms 1.0584 KOps/s 1.0302 KOps/s $\color{#35bf28}+2.73\%$
test_compile_assign_and_add[tensordict-eager] 3.7584ms 3.5013ms 285.6047 Ops/s 288.0124 Ops/s $\color{#d91a1a}-0.84\%$
test_compile_assign_and_add[pytree-compile] 2.5375ms 0.9284ms 1.0771 KOps/s 1.0537 KOps/s $\color{#35bf28}+2.22\%$
test_compile_assign_and_add[pytree-eager] 3.6325ms 3.5590ms 280.9814 Ops/s 266.6459 Ops/s $\textbf{\color{#35bf28}+5.38\%}$
test_compile_indexing[tensor-tensordict-compile] 0.1750ms 0.1162ms 8.6053 KOps/s 8.8592 KOps/s $\color{#d91a1a}-2.87\%$
test_compile_indexing[tensor-tensordict-eager] 0.2646ms 69.5397μs 14.3803 KOps/s 15.2643 KOps/s $\textbf{\color{#d91a1a}-5.79\%}$
test_compile_indexing[tensor-tensorclass-compile] 0.1481ms 0.1063ms 9.4087 KOps/s 9.2088 KOps/s $\color{#35bf28}+2.17\%$
test_compile_indexing[tensor-tensorclass-eager] 94.7020μs 51.3754μs 19.4646 KOps/s 20.0017 KOps/s $\color{#d91a1a}-2.69\%$
test_compile_indexing[tensor-pytree-compile] 0.1696ms 0.1095ms 9.1309 KOps/s 9.4028 KOps/s $\color{#d91a1a}-2.89\%$
test_compile_indexing[tensor-pytree-eager] 0.1024ms 51.0700μs 19.5810 KOps/s 21.2301 KOps/s $\textbf{\color{#d91a1a}-7.77\%}$
test_compile_indexing[slice-tensordict-compile] 0.1857ms 0.1423ms 7.0292 KOps/s 6.9265 KOps/s $\color{#35bf28}+1.48\%$
test_compile_indexing[slice-tensordict-eager] 0.1961ms 28.1407μs 35.5358 KOps/s 36.5568 KOps/s $\color{#d91a1a}-2.79\%$
test_compile_indexing[slice-tensorclass-compile] 0.1955ms 0.1399ms 7.1504 KOps/s 7.4161 KOps/s $\color{#d91a1a}-3.58\%$
test_compile_indexing[slice-tensorclass-eager] 53.8220μs 23.4505μs 42.6429 KOps/s 43.0689 KOps/s $\color{#d91a1a}-0.99\%$
test_compile_indexing[slice-pytree-compile] 0.2120ms 0.1391ms 7.1909 KOps/s 7.3987 KOps/s $\color{#d91a1a}-2.81\%$
test_compile_indexing[slice-pytree-eager] 68.0720μs 22.8886μs 43.6899 KOps/s 38.0757 KOps/s $\textbf{\color{#35bf28}+14.74\%}$
test_compile_indexing[int-tensordict-compile] 0.1966ms 0.1426ms 7.0140 KOps/s 6.9585 KOps/s $\color{#35bf28}+0.80\%$
test_compile_indexing[int-tensordict-eager] 0.4902ms 27.5081μs 36.3529 KOps/s 37.2961 KOps/s $\color{#d91a1a}-2.53\%$
test_compile_indexing[int-tensorclass-compile] 0.1813ms 0.1370ms 7.3009 KOps/s 7.4025 KOps/s $\color{#d91a1a}-1.37\%$
test_compile_indexing[int-tensorclass-eager] 54.3420μs 22.9528μs 43.5676 KOps/s 43.8891 KOps/s $\color{#d91a1a}-0.73\%$
test_compile_indexing[int-pytree-compile] 0.1785ms 0.1345ms 7.4368 KOps/s 7.4213 KOps/s $\color{#35bf28}+0.21\%$
test_compile_indexing[int-pytree-eager] 0.4249ms 23.1225μs 43.2479 KOps/s 43.4714 KOps/s $\color{#d91a1a}-0.51\%$
test_mod_add[eager] 75.8620μs 39.1542μs 25.5400 KOps/s 25.4189 KOps/s $\color{#35bf28}+0.48\%$
test_mod_add[compile] 0.1153ms 69.2739μs 14.4354 KOps/s 14.2549 KOps/s $\color{#35bf28}+1.27\%$
test_mod_add[compile-overhead] 0.2601ms 0.1470ms 6.8026 KOps/s 6.5826 KOps/s $\color{#35bf28}+3.34\%$
test_mod_wrap[eager] 0.3573ms 0.2641ms 3.7858 KOps/s 3.5741 KOps/s $\textbf{\color{#35bf28}+5.92\%}$
test_mod_wrap[compile] 1.1915ms 0.3019ms 3.3128 KOps/s 3.2846 KOps/s $\color{#35bf28}+0.86\%$
test_mod_wrap[compile-overhead] 7.9664ms 4.1814ms 239.1568 Ops/s 232.2283 Ops/s $\color{#35bf28}+2.98\%$
test_mod_wrap_and_backward[eager] 1.6531ms 1.4701ms 680.2049 Ops/s 719.2259 Ops/s $\textbf{\color{#d91a1a}-5.43\%}$
test_mod_wrap_and_backward[compile] 1.5685ms 1.4791ms 676.0729 Ops/s 719.2027 Ops/s $\textbf{\color{#d91a1a}-6.00\%}$
test_mod_wrap_and_backward[compile-overhead] 1.8434ms 1.0766ms 928.8570 Ops/s 1.0931 KOps/s $\textbf{\color{#d91a1a}-15.03\%}$
test_seq_add[eager] 0.1695ms 0.1187ms 8.4250 KOps/s 8.7126 KOps/s $\color{#d91a1a}-3.30\%$
test_seq_add[compile] 0.1424ms 85.4684μs 11.7002 KOps/s 11.0175 KOps/s $\textbf{\color{#35bf28}+6.20\%}$
test_seq_add[compile-overhead] 0.1743ms 0.1247ms 8.0180 KOps/s 8.0427 KOps/s $\color{#d91a1a}-0.31\%$
test_seq_wrap[eager] 0.5117ms 0.4547ms 2.1991 KOps/s 2.2946 KOps/s $\color{#d91a1a}-4.16\%$
test_seq_wrap[compile] 1.5351ms 0.3412ms 2.9310 KOps/s 2.9383 KOps/s $\color{#d91a1a}-0.25\%$
test_seq_wrap[compile-overhead] 0.3153s 0.1466s 6.8211 Ops/s 6.8498 Ops/s $\color{#d91a1a}-0.42\%$
test_func_call_runtime[False-eager] 0.8157ms 0.7672ms 1.3035 KOps/s 1.2927 KOps/s $\color{#35bf28}+0.83\%$
test_func_call_runtime[False-compile] 0.8686ms 0.8331ms 1.2003 KOps/s 1.1434 KOps/s $\color{#35bf28}+4.98\%$
test_func_call_runtime[False-compile-overhead] 0.4061ms 0.3672ms 2.7232 KOps/s 2.6106 KOps/s $\color{#35bf28}+4.32\%$
test_func_call_runtime[True-eager] 1.0710ms 1.0225ms 978.0136 Ops/s 935.5915 Ops/s $\color{#35bf28}+4.53\%$
test_func_call_runtime[True-compile] 0.9315ms 0.8733ms 1.1451 KOps/s 1.0745 KOps/s $\textbf{\color{#35bf28}+6.57\%}$
test_func_call_runtime[True-compile-overhead] 0.4771ms 0.4106ms 2.4355 KOps/s 2.4168 KOps/s $\color{#35bf28}+0.77\%$
test_distributed 1.3943ms 72.4989μs 13.7933 KOps/s 13.6576 KOps/s $\color{#35bf28}+0.99\%$
test_tdmodule 31.6810μs 16.3799μs 61.0504 KOps/s 59.0048 KOps/s $\color{#35bf28}+3.47\%$
test_tdmodule_dispatch 51.3800μs 35.0219μs 28.5536 KOps/s 29.3602 KOps/s $\color{#d91a1a}-2.75\%$
test_tdseq 34.0410μs 17.2777μs 57.8780 KOps/s 56.6158 KOps/s $\color{#35bf28}+2.23\%$
test_tdseq_dispatch 59.2210μs 36.8534μs 27.1346 KOps/s 27.3391 KOps/s $\color{#d91a1a}-0.75\%$
test_instantiation_functorch 2.2366ms 2.0805ms 480.6550 Ops/s 481.7805 Ops/s $\color{#d91a1a}-0.23\%$
test_instantiation_td 2.0650ms 1.3351ms 749.0341 Ops/s 753.1071 Ops/s $\color{#d91a1a}-0.54\%$
test_exec_functorch 0.2968ms 0.2440ms 4.0977 KOps/s 4.0644 KOps/s $\color{#35bf28}+0.82\%$
test_exec_functional_call 0.3257ms 0.2476ms 4.0387 KOps/s 4.1251 KOps/s $\color{#d91a1a}-2.09\%$
test_exec_td 0.3116ms 0.2456ms 4.0725 KOps/s 4.1153 KOps/s $\color{#d91a1a}-1.04\%$
test_exec_td_decorator 0.9922ms 0.3235ms 3.0910 KOps/s 3.1437 KOps/s $\color{#d91a1a}-1.67\%$
test_vmap_mlp_speed[True-True] 0.8876ms 0.7190ms 1.3909 KOps/s 1.3876 KOps/s $\color{#35bf28}+0.23\%$
test_vmap_mlp_speed[True-False] 0.8847ms 0.7249ms 1.3795 KOps/s 1.3925 KOps/s $\color{#d91a1a}-0.94\%$
test_vmap_mlp_speed[False-True] 0.7016ms 0.6401ms 1.5622 KOps/s 1.5819 KOps/s $\color{#d91a1a}-1.25\%$
test_vmap_mlp_speed[False-False] 0.7795ms 0.6410ms 1.5601 KOps/s 1.6435 KOps/s $\textbf{\color{#d91a1a}-5.07\%}$
test_vmap_mlp_speed_decorator[True-True] 0.9316ms 0.8002ms 1.2498 KOps/s 1.2596 KOps/s $\color{#d91a1a}-0.78\%$
test_vmap_mlp_speed_decorator[True-False] 1.2944ms 0.8009ms 1.2486 KOps/s 1.2576 KOps/s $\color{#d91a1a}-0.72\%$
test_vmap_mlp_speed_decorator[False-True] 0.8411ms 0.6955ms 1.4378 KOps/s 1.4344 KOps/s $\color{#35bf28}+0.24\%$
test_vmap_mlp_speed_decorator[False-False] 0.8612ms 0.6944ms 1.4400 KOps/s 1.4290 KOps/s $\color{#35bf28}+0.77\%$
test_vmap_transformer_speed[True-True] 9.5928ms 9.2634ms 107.9521 Ops/s 108.9365 Ops/s $\color{#d91a1a}-0.90\%$
test_vmap_transformer_speed[True-False] 9.5695ms 9.4334ms 106.0063 Ops/s 107.3416 Ops/s $\color{#d91a1a}-1.24\%$
test_vmap_transformer_speed[False-True] 9.5046ms 9.2689ms 107.8878 Ops/s 109.9350 Ops/s $\color{#d91a1a}-1.86\%$
test_vmap_transformer_speed[False-False] 9.5658ms 9.2677ms 107.9012 Ops/s 109.2632 Ops/s $\color{#d91a1a}-1.25\%$
test_vmap_transformer_speed_decorator[True-True] 22.6707ms 22.0302ms 45.3921 Ops/s 46.6724 Ops/s $\color{#d91a1a}-2.74\%$
test_vmap_transformer_speed_decorator[True-False] 22.9037ms 22.0268ms 45.3992 Ops/s 46.9301 Ops/s $\color{#d91a1a}-3.26\%$
test_vmap_transformer_speed_decorator[False-True] 22.3543ms 21.3331ms 46.8755 Ops/s 47.1282 Ops/s $\color{#d91a1a}-0.54\%$
test_vmap_transformer_speed_decorator[False-False] 22.4319ms 21.3375ms 46.8658 Ops/s 47.2086 Ops/s $\color{#d91a1a}-0.73\%$
test_to_module_speed[True] 3.0566ms 1.5128ms 661.0208 Ops/s 670.0223 Ops/s $\color{#d91a1a}-1.34\%$
test_to_module_speed[False] 2.0011ms 1.5071ms 663.5138 Ops/s 674.2669 Ops/s $\color{#d91a1a}-1.59\%$
test_tc_init 69.9520μs 37.8319μs 26.4327 KOps/s 27.0531 KOps/s $\color{#d91a1a}-2.29\%$
test_tc_init_nested 0.1091ms 78.7286μs 12.7019 KOps/s 13.2962 KOps/s $\color{#d91a1a}-4.47\%$
test_tc_first_layer_tensor 0.1007ms 4.0390μs 247.5886 KOps/s 252.7337 KOps/s $\color{#d91a1a}-2.04\%$
test_tc_first_layer_nontensor 29.6620μs 4.0408μs 247.4743 KOps/s 250.1938 KOps/s $\color{#d91a1a}-1.09\%$
test_tc_second_layer_tensor 7.6775μs 1.2929μs 773.4486 KOps/s 777.2255 KOps/s $\color{#d91a1a}-0.49\%$
test_tc_second_layer_nontensor 26.5310μs 4.5971μs 217.5303 KOps/s 218.2170 KOps/s $\color{#d91a1a}-0.31\%$
test_unbind 0.3152s 12.7995ms 78.1281 Ops/s 77.4408 Ops/s $\color{#35bf28}+0.89\%$
test_full_like 0.6567ms 0.5788ms 1.7277 KOps/s 1.7315 KOps/s $\color{#d91a1a}-0.22\%$
test_zeros_like 0.2652ms 0.1976ms 5.0615 KOps/s 5.0544 KOps/s $\color{#35bf28}+0.14\%$
test_ones_like 0.2299ms 0.1974ms 5.0650 KOps/s 5.0630 KOps/s $\color{#35bf28}+0.04\%$
test_clone 0.4477ms 0.4149ms 2.4099 KOps/s 2.4086 KOps/s $\color{#35bf28}+0.05\%$
test_squeeze 36.2910μs 12.0781μs 82.7947 KOps/s 82.1742 KOps/s $\color{#35bf28}+0.76\%$
test_unsqueeze 0.2648ms 83.2469μs 12.0125 KOps/s 11.4343 KOps/s $\textbf{\color{#35bf28}+5.06\%}$
test_split 0.4465ms 0.1853ms 5.3969 KOps/s 5.3988 KOps/s $\color{#d91a1a}-0.03\%$
test_permute 0.2453ms 0.1949ms 5.1309 KOps/s 4.9707 KOps/s $\color{#35bf28}+3.22\%$
test_stack 1.2499ms 0.9094ms 1.0996 KOps/s 1.1159 KOps/s $\color{#d91a1a}-1.46\%$
test_cat 1.2523ms 1.2317ms 811.8628 Ops/s 811.9302 Ops/s $-0.01\%$

@vmoens vmoens mentioned this pull request Jul 22, 2024
@vmoens vmoens added the enhancement New feature or request label Jul 22, 2024
@vmoens vmoens merged commit e4c13dc into gh/vmoens/8/base Jul 22, 2024
37 checks passed
@vmoens vmoens deleted the gh/vmoens/8/head branch July 22, 2024 17:22