[BE] single dim check helper #1192

Merged 2 commits on Feb 4, 2025

@vmoens (Contributor) commented Jan 26, 2025

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Jan 26, 2025
ghstack-source-id: 2a530b64cbc285544cf4abb7b2970d1f8ffee321
Pull Request resolved: #1192
@facebook-github-bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jan 26, 2025
@vmoens added the "BE" label (Better errors, logs, docs or test utils) on Feb 4, 2025
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Feb 4, 2025
ghstack-source-id: 6606e4b96061f73b98787b25129c29671a78dc1e
Pull Request resolved: #1192
github-actions bot commented Feb 4, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 217. Improved: $\large\color{#35bf28}22$. Worsened: $\large\color{#d91a1a}8$.

Detailed results:
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 55.3940μs 21.0232μs 47.5666 KOps/s 47.7521 KOps/s $\color{#d91a1a}-0.39\%$
test_plain_set_stack_nested 50.2240μs 21.1877μs 47.1972 KOps/s 46.7442 KOps/s $\color{#35bf28}+0.97\%$
test_plain_set_nested_inplace 0.2305ms 23.3646μs 42.7998 KOps/s 42.3122 KOps/s $\color{#35bf28}+1.15\%$
test_plain_set_stack_nested_inplace 60.8840μs 22.7876μs 43.8834 KOps/s 43.8717 KOps/s $\color{#35bf28}+0.03\%$
test_items 38.3410μs 4.1956μs 238.3456 KOps/s 241.8737 KOps/s $\color{#d91a1a}-1.46\%$
test_items_nested 0.7202ms 0.4012ms 2.4924 KOps/s 2.4269 KOps/s $\color{#35bf28}+2.70\%$
test_items_nested_locked 1.0868ms 0.4060ms 2.4631 KOps/s 2.4146 KOps/s $\color{#35bf28}+2.01\%$
test_items_nested_leaf 0.2341ms 76.1676μs 13.1289 KOps/s 12.9780 KOps/s $\color{#35bf28}+1.16\%$
test_items_stack_nested 0.7625ms 0.4036ms 2.4778 KOps/s 2.4105 KOps/s $\color{#35bf28}+2.79\%$
test_items_stack_nested_leaf 0.1510ms 77.5087μs 12.9018 KOps/s 12.6439 KOps/s $\color{#35bf28}+2.04\%$
test_items_stack_nested_locked 0.4954ms 0.4020ms 2.4878 KOps/s 2.4198 KOps/s $\color{#35bf28}+2.81\%$
test_keys 27.1810μs 3.5634μs 280.6330 KOps/s 280.6363 KOps/s $-0.00\%$
test_keys_nested 0.2894ms 0.1642ms 6.0891 KOps/s 6.0130 KOps/s $\color{#35bf28}+1.27\%$
test_keys_nested_locked 0.6334ms 0.1688ms 5.9241 KOps/s 5.8139 KOps/s $\color{#35bf28}+1.89\%$
test_keys_nested_leaf 0.1979ms 0.1411ms 7.0892 KOps/s 6.9455 KOps/s $\color{#35bf28}+2.07\%$
test_keys_stack_nested 0.2940ms 0.1624ms 6.1557 KOps/s 6.0318 KOps/s $\color{#35bf28}+2.06\%$
test_keys_stack_nested_leaf 0.1999ms 0.1422ms 7.0332 KOps/s 6.9487 KOps/s $\color{#35bf28}+1.22\%$
test_keys_stack_nested_locked 0.2948ms 0.1690ms 5.9186 KOps/s 5.8444 KOps/s $\color{#35bf28}+1.27\%$
test_values 7.7986μs 1.0978μs 910.9397 KOps/s 944.7889 KOps/s $\color{#d91a1a}-3.58\%$
test_values_nested 0.1474ms 62.9654μs 15.8817 KOps/s 15.7656 KOps/s $\color{#35bf28}+0.74\%$
test_values_nested_locked 0.1211ms 62.3900μs 16.0282 KOps/s 15.2336 KOps/s $\textbf{\color{#35bf28}+5.22\%}$
test_values_nested_leaf 0.1226ms 71.3467μs 14.0161 KOps/s 13.3743 KOps/s $\color{#35bf28}+4.80\%$
test_values_stack_nested 0.1114ms 62.7262μs 15.9423 KOps/s 15.8112 KOps/s $\color{#35bf28}+0.83\%$
test_values_stack_nested_leaf 0.1472ms 72.0882μs 13.8719 KOps/s 13.8173 KOps/s $\color{#35bf28}+0.40\%$
test_values_stack_nested_locked 0.1311ms 62.5139μs 15.9964 KOps/s 15.6216 KOps/s $\color{#35bf28}+2.40\%$
test_membership 22.4020μs 0.8575μs 1.1662 MOps/s 1.1808 MOps/s $\color{#d91a1a}-1.24\%$
test_membership_nested 28.5130μs 2.9023μs 344.5576 KOps/s 334.2481 KOps/s $\color{#35bf28}+3.08\%$
test_membership_nested_leaf 56.3650μs 2.9280μs 341.5319 KOps/s 331.1453 KOps/s $\color{#35bf28}+3.14\%$
test_membership_stacked_nested 23.1630μs 2.8834μs 346.8094 KOps/s 330.1801 KOps/s $\textbf{\color{#35bf28}+5.04\%}$
test_membership_stacked_nested_leaf 46.6870μs 2.9231μs 342.1008 KOps/s 311.7206 KOps/s $\textbf{\color{#35bf28}+9.75\%}$
test_membership_nested_last 27.4510μs 4.3772μs 228.4560 KOps/s 224.4534 KOps/s $\color{#35bf28}+1.78\%$
test_membership_nested_leaf_last 32.0490μs 4.3776μs 228.4372 KOps/s 226.5486 KOps/s $\color{#35bf28}+0.83\%$
test_membership_stacked_nested_last 28.9040μs 5.1106μs 195.6707 KOps/s 227.7240 KOps/s $\textbf{\color{#d91a1a}-14.08\%}$
test_membership_stacked_nested_leaf_last 24.2450μs 5.1910μs 192.6400 KOps/s 224.1463 KOps/s $\textbf{\color{#d91a1a}-14.06\%}$
test_nested_getleaf 37.7710μs 10.6481μs 93.9134 KOps/s 95.0714 KOps/s $\color{#d91a1a}-1.22\%$
test_nested_get 44.4030μs 10.1073μs 98.9388 KOps/s 99.7321 KOps/s $\color{#d91a1a}-0.80\%$
test_stacked_getleaf 38.5630μs 10.6001μs 94.3385 KOps/s 95.5766 KOps/s $\color{#d91a1a}-1.30\%$
test_stacked_get 33.2320μs 10.1168μs 98.8457 KOps/s 100.0133 KOps/s $\color{#d91a1a}-1.17\%$
test_nested_getitemleaf 30.0050μs 11.2103μs 89.2037 KOps/s 88.5064 KOps/s $\color{#35bf28}+0.79\%$
test_nested_getitem 64.2500μs 10.8035μs 92.5629 KOps/s 93.0639 KOps/s $\color{#d91a1a}-0.54\%$
test_stacked_getitemleaf 34.8050μs 11.0983μs 90.1040 KOps/s 88.9266 KOps/s $\color{#35bf28}+1.32\%$
test_stacked_getitem 34.1840μs 10.6237μs 94.1292 KOps/s 93.5148 KOps/s $\color{#35bf28}+0.66\%$
test_lock_nested 0.5397ms 0.4047ms 2.4712 KOps/s 2.4317 KOps/s $\color{#35bf28}+1.62\%$
test_lock_stack_nested 0.6483ms 0.4157ms 2.4055 KOps/s 2.3279 KOps/s $\color{#35bf28}+3.33\%$
test_unlock_nested 0.5278ms 0.3347ms 2.9878 KOps/s 2.9358 KOps/s $\color{#35bf28}+1.77\%$
test_unlock_stack_nested 0.5810ms 0.3392ms 2.9480 KOps/s 2.8956 KOps/s $\color{#35bf28}+1.81\%$
test_flatten_speed 0.1792ms 98.6519μs 10.1366 KOps/s 9.9916 KOps/s $\color{#35bf28}+1.45\%$
test_unflatten_speed 1.0653ms 0.5222ms 1.9151 KOps/s 1.8920 KOps/s $\color{#35bf28}+1.22\%$
test_common_ops 5.4736ms 0.8327ms 1.2010 KOps/s 1.1701 KOps/s $\color{#35bf28}+2.64\%$
test_creation 25.4780μs 2.5177μs 397.1937 KOps/s 398.9191 KOps/s $\color{#d91a1a}-0.43\%$
test_creation_empty 35.8270μs 12.9520μs 77.2084 KOps/s 79.5355 KOps/s $\color{#d91a1a}-2.93\%$
test_creation_nested_1 42.3790μs 15.9093μs 62.8565 KOps/s 64.6569 KOps/s $\color{#d91a1a}-2.78\%$
test_creation_nested_2 44.0620μs 20.5928μs 48.5608 KOps/s 49.6124 KOps/s $\color{#d91a1a}-2.12\%$
test_clone 0.1395ms 13.5349μs 73.8829 KOps/s 72.1528 KOps/s $\color{#35bf28}+2.40\%$
test_getitem[int] 0.8421ms 12.6102μs 79.3011 KOps/s 77.0860 KOps/s $\color{#35bf28}+2.87\%$
test_getitem[slice_int] 0.1452ms 24.8130μs 40.3014 KOps/s 39.9243 KOps/s $\color{#35bf28}+0.94\%$
test_getitem[range] 0.1743ms 50.6078μs 19.7598 KOps/s 20.0240 KOps/s $\color{#d91a1a}-1.32\%$
test_getitem[tuple] 0.1253ms 20.0512μs 49.8724 KOps/s 49.0721 KOps/s $\color{#35bf28}+1.63\%$
test_getitem[list] 0.1623ms 45.4184μs 22.0175 KOps/s 21.8117 KOps/s $\color{#35bf28}+0.94\%$
test_setitem_dim[int] 57.9280μs 26.2591μs 38.0820 KOps/s 37.4237 KOps/s $\color{#35bf28}+1.76\%$
test_setitem_dim[slice_int] 83.0950μs 50.6163μs 19.7565 KOps/s 18.8132 KOps/s $\textbf{\color{#35bf28}+5.01\%}$
test_setitem_dim[range] 0.1301ms 76.0775μs 13.1445 KOps/s 12.7630 KOps/s $\color{#35bf28}+2.99\%$
test_setitem_dim[tuple] 84.2270μs 41.0990μs 24.3315 KOps/s 23.3920 KOps/s $\color{#35bf28}+4.02\%$
test_setitem 0.1784ms 21.1192μs 47.3503 KOps/s 46.9634 KOps/s $\color{#35bf28}+0.82\%$
test_set 0.2297ms 20.6319μs 48.4686 KOps/s 48.4636 KOps/s $\color{#35bf28}+0.01\%$
test_set_shared 0.3907ms 0.1792ms 5.5801 KOps/s 5.4301 KOps/s $\color{#35bf28}+2.76\%$
test_update 0.1942ms 24.0813μs 41.5260 KOps/s 41.4814 KOps/s $\color{#35bf28}+0.11\%$
test_update_nested 0.2045ms 33.2469μs 30.0780 KOps/s 28.8998 KOps/s $\color{#35bf28}+4.08\%$
test_update__nested 0.5030ms 33.6445μs 29.7226 KOps/s 28.7421 KOps/s $\color{#35bf28}+3.41\%$
test_set_nested 62.8170μs 22.5939μs 44.2598 KOps/s 42.3589 KOps/s $\color{#35bf28}+4.49\%$
test_set_nested_new 0.2159ms 26.9331μs 37.1290 KOps/s 36.2472 KOps/s $\color{#35bf28}+2.43\%$
test_select 0.2277ms 44.3999μs 22.5226 KOps/s 21.8626 KOps/s $\color{#35bf28}+3.02\%$
test_select_nested 0.1198ms 63.4614μs 15.7576 KOps/s 15.4924 KOps/s $\color{#35bf28}+1.71\%$
test_exclude_nested 0.1880ms 81.5228μs 12.2665 KOps/s 12.1178 KOps/s $\color{#35bf28}+1.23\%$
test_empty[True] 0.7425ms 0.4143ms 2.4139 KOps/s 2.3905 KOps/s $\color{#35bf28}+0.98\%$
test_empty[False] 8.5383μs 1.3840μs 722.5237 KOps/s 722.0446 KOps/s $\color{#35bf28}+0.07\%$
test_unbind_speed 0.3677ms 0.2722ms 3.6733 KOps/s 3.6498 KOps/s $\color{#35bf28}+0.64\%$
test_unbind_speed_stack0 0.4897ms 0.2698ms 3.7069 KOps/s 3.6747 KOps/s $\color{#35bf28}+0.88\%$
test_unbind_speed_stack1 0.1095s 0.7353ms 1.3599 KOps/s 1.2258 KOps/s $\textbf{\color{#35bf28}+10.94\%}$
test_split 0.1104s 1.7520ms 570.7912 Ops/s 622.3769 Ops/s $\textbf{\color{#d91a1a}-8.29\%}$
test_chunk 0.1111s 1.7517ms 570.8762 Ops/s 564.6371 Ops/s $\color{#35bf28}+1.10\%$
test_consolidate_njt[False-None] 8.7233ms 8.2213ms 121.6356 Ops/s 109.7262 Ops/s $\textbf{\color{#35bf28}+10.85\%}$
test_creation[device0] 0.2930ms 93.9816μs 10.6404 KOps/s 10.5741 KOps/s $\color{#35bf28}+0.63\%$
test_creation_from_tensor 3.6181ms 98.6942μs 10.1323 KOps/s 10.3074 KOps/s $\color{#d91a1a}-1.70\%$
test_add_one[memmap_tensor0] 0.1447ms 4.9595μs 201.6350 KOps/s 196.6132 KOps/s $\color{#35bf28}+2.55\%$
test_contiguous[memmap_tensor0] 21.9510μs 0.5136μs 1.9469 MOps/s 1.9828 MOps/s $\color{#d91a1a}-1.81\%$
test_stack[memmap_tensor0] 31.6590μs 3.3965μs 294.4247 KOps/s 289.3672 KOps/s $\color{#35bf28}+1.75\%$
test_memmaptd_index 0.3445ms 0.2291ms 4.3643 KOps/s 4.4189 KOps/s $\color{#d91a1a}-1.24\%$
test_memmaptd_index_astensor 0.5295ms 0.3176ms 3.1488 KOps/s 3.1683 KOps/s $\color{#d91a1a}-0.62\%$
test_memmaptd_index_op 1.3744ms 0.5991ms 1.6691 KOps/s 1.6377 KOps/s $\color{#35bf28}+1.92\%$
test_serialize_model 0.2240s 0.1359s 7.3608 Ops/s 8.7265 Ops/s $\textbf{\color{#d91a1a}-15.65\%}$
test_serialize_model_pickle 0.4988s 0.3993s 2.5043 Ops/s 2.5014 Ops/s $\color{#35bf28}+0.12\%$
test_serialize_weights 0.1239s 0.1128s 8.8670 Ops/s 8.6534 Ops/s $\color{#35bf28}+2.47\%$
test_serialize_weights_returnearly 0.1676s 0.1583s 6.3180 Ops/s 6.4288 Ops/s $\color{#d91a1a}-1.72\%$
test_serialize_weights_pickle 0.6394s 0.4743s 2.1082 Ops/s 2.3264 Ops/s $\textbf{\color{#d91a1a}-9.38\%}$
test_serialize_weights_filesystem 0.2555s 0.1617s 6.1824 Ops/s 7.0406 Ops/s $\textbf{\color{#d91a1a}-12.19\%}$
test_serialize_model_filesystem 0.1606s 0.1488s 6.7195 Ops/s 6.5642 Ops/s $\color{#35bf28}+2.37\%$
test_reshape_pytree 57.6770μs 26.0092μs 38.4479 KOps/s 37.8690 KOps/s $\color{#35bf28}+1.53\%$
test_reshape_td 66.1830μs 31.9855μs 31.2641 KOps/s 29.9470 KOps/s $\color{#35bf28}+4.40\%$
test_view_pytree 61.9460μs 25.6658μs 38.9623 KOps/s 38.2859 KOps/s $\color{#35bf28}+1.77\%$
test_view_td 94.3860μs 38.1934μs 26.1826 KOps/s 25.2423 KOps/s $\color{#35bf28}+3.72\%$
test_unbind_pytree 67.6770μs 29.5825μs 33.8038 KOps/s 33.8301 KOps/s $\color{#d91a1a}-0.08\%$
test_unbind_td 0.3147ms 41.0377μs 24.3679 KOps/s 24.8524 KOps/s $\color{#d91a1a}-1.95\%$
test_split_pytree 0.1500ms 29.4327μs 33.9758 KOps/s 34.1927 KOps/s $\color{#d91a1a}-0.63\%$
test_split_td 0.9351ms 48.6750μs 20.5444 KOps/s 22.1601 KOps/s $\textbf{\color{#d91a1a}-7.29\%}$
test_add_pytree 99.8860μs 35.8855μs 27.8664 KOps/s 27.8949 KOps/s $\color{#d91a1a}-0.10\%$
test_add_td 0.1082ms 56.0799μs 17.8317 KOps/s 16.8923 KOps/s $\textbf{\color{#35bf28}+5.56\%}$
test_compile_add_one_nested[tensordict-compile] 0.1397ms 66.3026μs 15.0824 KOps/s 14.7621 KOps/s $\color{#35bf28}+2.17\%$
test_compile_add_one_nested[tensordict-eager] 1.3503ms 0.1703ms 5.8713 KOps/s 5.7658 KOps/s $\color{#35bf28}+1.83\%$
test_compile_add_one_nested[pytree-compile] 0.1094ms 46.2457μs 21.6236 KOps/s 20.6165 KOps/s $\color{#35bf28}+4.89\%$
test_compile_add_one_nested[pytree-eager] 0.2014ms 0.1177ms 8.4967 KOps/s 8.4591 KOps/s $\color{#35bf28}+0.44\%$
test_compile_copy_nested[tensordict-compile] 62.7970μs 28.3352μs 35.2918 KOps/s 35.6824 KOps/s $\color{#d91a1a}-1.09\%$
test_compile_copy_nested[tensordict-eager] 0.1365ms 58.1278μs 17.2035 KOps/s 16.9894 KOps/s $\color{#35bf28}+1.26\%$
test_compile_copy_nested[pytree-compile] 0.1481ms 78.1925μs 12.7890 KOps/s 12.5438 KOps/s $\color{#35bf28}+1.95\%$
test_compile_copy_nested[pytree-eager] 0.1532ms 65.9806μs 15.1560 KOps/s 14.7142 KOps/s $\color{#35bf28}+3.00\%$
test_compile_add_one_flat[tensordict-compile] 0.2356ms 0.1066ms 9.3803 KOps/s 9.1558 KOps/s $\color{#35bf28}+2.45\%$
test_compile_add_one_flat[tensordict-eager] 0.4269ms 0.2137ms 4.6789 KOps/s 4.5661 KOps/s $\color{#35bf28}+2.47\%$
test_compile_add_one_flat[tensorclass-compile] 0.1098ms 47.7283μs 20.9519 KOps/s 20.8819 KOps/s $\color{#35bf28}+0.34\%$
test_compile_add_one_flat[tensorclass-eager] 0.1408ms 65.8891μs 15.1770 KOps/s 14.3457 KOps/s $\textbf{\color{#35bf28}+5.80\%}$
test_compile_add_one_flat[pytree-compile] 0.4669ms 0.1034ms 9.6717 KOps/s 9.9403 KOps/s $\color{#d91a1a}-2.70\%$
test_compile_add_one_flat[pytree-eager] 0.3135ms 0.2029ms 4.9279 KOps/s 4.9236 KOps/s $\color{#35bf28}+0.09\%$
test_compile_add_self_flat[tensordict-eager] 0.4720ms 0.2310ms 4.3299 KOps/s 4.1899 KOps/s $\color{#35bf28}+3.34\%$
test_compile_add_self_flat[tensordict-compile] 0.3129ms 0.1097ms 9.1120 KOps/s 8.9539 KOps/s $\color{#35bf28}+1.77\%$
test_compile_add_self_flat[tensorclass-eager] 0.1319ms 63.6583μs 15.7089 KOps/s 15.2185 KOps/s $\color{#35bf28}+3.22\%$
test_compile_add_self_flat[tensorclass-compile] 97.9520μs 48.5044μs 20.6167 KOps/s 20.1425 KOps/s $\color{#35bf28}+2.35\%$
test_compile_add_self_flat[pytree-eager] 0.3316ms 0.1564ms 6.3928 KOps/s 6.3658 KOps/s $\color{#35bf28}+0.42\%$
test_compile_add_self_flat[pytree-compile] 0.2169ms 0.1014ms 9.8657 KOps/s 9.6746 KOps/s $\color{#35bf28}+1.98\%$
test_compile_copy_flat[tensordict-compile] 71.1730μs 21.1950μs 47.1810 KOps/s 44.5872 KOps/s $\textbf{\color{#35bf28}+5.82\%}$
test_compile_copy_flat[tensordict-eager] 0.1171ms 66.7033μs 14.9918 KOps/s 14.4092 KOps/s $\color{#35bf28}+4.04\%$
test_compile_copy_flat[pytree-compile] 0.1810ms 85.8327μs 11.6506 KOps/s 11.8533 KOps/s $\color{#d91a1a}-1.71\%$
test_compile_copy_flat[pytree-eager] 0.1576ms 68.8006μs 14.5348 KOps/s 14.4905 KOps/s $\color{#35bf28}+0.31\%$
test_compile_assign_and_add[tensordict-compile] 0.2872ms 0.2139ms 4.6748 KOps/s 4.5362 KOps/s $\color{#35bf28}+3.05\%$
test_compile_assign_and_add[tensordict-eager] 1.6076ms 1.3595ms 735.5546 Ops/s 704.1992 Ops/s $\color{#35bf28}+4.45\%$
test_compile_assign_and_add[pytree-compile] 0.3820ms 0.2081ms 4.8049 KOps/s 4.6361 KOps/s $\color{#35bf28}+3.64\%$
test_compile_assign_and_add[pytree-eager] 1.7770ms 0.8393ms 1.1914 KOps/s 1.2134 KOps/s $\color{#d91a1a}-1.81\%$
test_compile_assign_and_add_stack[compile] 0.7329ms 0.4521ms 2.2118 KOps/s 2.0935 KOps/s $\textbf{\color{#35bf28}+5.65\%}$
test_compile_assign_and_add_stack[eager] 4.4608ms 2.7689ms 361.1505 Ops/s 359.5036 Ops/s $\color{#35bf28}+0.46\%$
test_compile_indexing[tensor-tensordict-compile] 96.5310μs 37.6581μs 26.5547 KOps/s 24.3326 KOps/s $\textbf{\color{#35bf28}+9.13\%}$
test_compile_indexing[tensor-tensordict-eager] 0.5855ms 33.8054μs 29.5811 KOps/s 29.5943 KOps/s $\color{#d91a1a}-0.04\%$
test_compile_indexing[tensor-tensorclass-compile] 81.0310μs 31.4210μs 31.8259 KOps/s 30.8003 KOps/s $\color{#35bf28}+3.33\%$
test_compile_indexing[tensor-tensorclass-eager] 77.4340μs 23.1034μs 43.2837 KOps/s 43.6349 KOps/s $\color{#d91a1a}-0.80\%$
test_compile_indexing[tensor-pytree-compile] 77.7750μs 31.7505μs 31.4955 KOps/s 29.7550 KOps/s $\textbf{\color{#35bf28}+5.85\%}$
test_compile_indexing[tensor-pytree-eager] 0.2965ms 22.9682μs 43.5385 KOps/s 43.6374 KOps/s $\color{#d91a1a}-0.23\%$
test_compile_indexing[slice-tensordict-compile] 95.0280μs 53.4659μs 18.7035 KOps/s 18.0779 KOps/s $\color{#35bf28}+3.46\%$
test_compile_indexing[slice-tensordict-eager] 0.3854ms 20.1386μs 49.6559 KOps/s 48.6971 KOps/s $\color{#35bf28}+1.97\%$
test_compile_indexing[slice-tensorclass-compile] 0.1051ms 45.9548μs 21.7605 KOps/s 21.0601 KOps/s $\color{#35bf28}+3.33\%$
test_compile_indexing[slice-tensorclass-eager] 75.7010μs 18.7980μs 53.1971 KOps/s 53.2836 KOps/s $\color{#d91a1a}-0.16\%$
test_compile_indexing[slice-pytree-compile] 0.1434ms 46.1577μs 21.6649 KOps/s 20.9334 KOps/s $\color{#35bf28}+3.49\%$
test_compile_indexing[slice-pytree-eager] 66.3440μs 18.5920μs 53.7866 KOps/s 52.8480 KOps/s $\color{#35bf28}+1.78\%$
test_compile_indexing[int-tensordict-compile] 0.1385ms 54.9278μs 18.2057 KOps/s 17.9438 KOps/s $\color{#35bf28}+1.46\%$
test_compile_indexing[int-tensordict-eager] 1.0276ms 19.6252μs 50.9548 KOps/s 49.3633 KOps/s $\color{#35bf28}+3.22\%$
test_compile_indexing[int-tensorclass-compile] 95.2680μs 45.9803μs 21.7485 KOps/s 21.1340 KOps/s $\color{#35bf28}+2.91\%$
test_compile_indexing[int-tensorclass-eager] 54.2110μs 18.6819μs 53.5277 KOps/s 53.9221 KOps/s $\color{#d91a1a}-0.73\%$
test_compile_indexing[int-pytree-compile] 0.1195ms 46.5800μs 21.4685 KOps/s 21.1626 KOps/s $\color{#35bf28}+1.45\%$
test_compile_indexing[int-pytree-eager] 56.1250μs 18.6954μs 53.4892 KOps/s 53.3884 KOps/s $\color{#35bf28}+0.19\%$
test_mod_add[eager] 0.1320ms 36.5039μs 27.3944 KOps/s 27.5986 KOps/s $\color{#d91a1a}-0.74\%$
test_mod_add[compile] 0.1182ms 63.3197μs 15.7929 KOps/s 15.0821 KOps/s $\color{#35bf28}+4.71\%$
test_mod_add[compile-overhead] 0.1054ms 62.1797μs 16.0824 KOps/s 15.0924 KOps/s $\textbf{\color{#35bf28}+6.56\%}$
test_mod_wrap[eager] 0.3607ms 0.2196ms 4.5534 KOps/s 4.3443 KOps/s $\color{#35bf28}+4.81\%$
test_mod_wrap[compile] 2.3716ms 0.2284ms 4.3786 KOps/s 4.2995 KOps/s $\color{#35bf28}+1.84\%$
test_mod_wrap[compile-overhead] 0.3570ms 0.2264ms 4.4173 KOps/s 4.3499 KOps/s $\color{#35bf28}+1.55\%$
test_mod_wrap_and_backward[eager] 12.3961ms 11.1322ms 89.8298 Ops/s 72.8044 Ops/s $\textbf{\color{#35bf28}+23.39\%}$
test_mod_wrap_and_backward[compile] 11.6346ms 10.7608ms 92.9301 Ops/s 82.0639 Ops/s $\textbf{\color{#35bf28}+13.24\%}$
test_mod_wrap_and_backward[compile-overhead] 12.6310ms 10.8344ms 92.2986 Ops/s 84.8779 Ops/s $\textbf{\color{#35bf28}+8.74\%}$
test_seq_add[eager] 0.2147ms 0.1185ms 8.4372 KOps/s 8.2533 KOps/s $\color{#35bf28}+2.23\%$
test_seq_add[compile] 0.1338ms 76.1717μs 13.1282 KOps/s 12.9701 KOps/s $\color{#35bf28}+1.22\%$
test_seq_add[compile-overhead] 0.1335ms 76.2428μs 13.1160 KOps/s 13.0029 KOps/s $\color{#35bf28}+0.87\%$
test_seq_wrap[eager] 0.6768ms 0.4504ms 2.2204 KOps/s 2.2040 KOps/s $\color{#35bf28}+0.74\%$
test_seq_wrap[compile] 0.3377ms 0.2448ms 4.0849 KOps/s 4.1018 KOps/s $\color{#d91a1a}-0.41\%$
test_seq_wrap[compile-overhead] 0.3902ms 0.2433ms 4.1105 KOps/s 4.1153 KOps/s $\color{#d91a1a}-0.12\%$
test_func_call_runtime[False-eager] 0.7105ms 0.5450ms 1.8349 KOps/s 1.8430 KOps/s $\color{#d91a1a}-0.44\%$
test_func_call_runtime[False-compile] 0.6983ms 0.4496ms 2.2242 KOps/s 2.2661 KOps/s $\color{#d91a1a}-1.85\%$
test_func_call_runtime[False-compile-overhead] 0.5607ms 0.4467ms 2.2388 KOps/s 2.2813 KOps/s $\color{#d91a1a}-1.86\%$
test_func_call_runtime[True-eager] 0.9863ms 0.7423ms 1.3472 KOps/s 1.3095 KOps/s $\color{#35bf28}+2.88\%$
test_func_call_runtime[True-compile] 0.8482ms 0.4697ms 2.1290 KOps/s 2.1608 KOps/s $\color{#d91a1a}-1.47\%$
test_func_call_runtime[True-compile-overhead] 1.0125ms 0.4713ms 2.1219 KOps/s 2.1665 KOps/s $\color{#d91a1a}-2.06\%$
test_func_call_cm_runtime[False-eager] 0.9211ms 0.5377ms 1.8598 KOps/s 1.8793 KOps/s $\color{#d91a1a}-1.04\%$
test_func_call_cm_runtime[False-compile] 0.7808ms 0.4500ms 2.2221 KOps/s 2.2960 KOps/s $\color{#d91a1a}-3.22\%$
test_func_call_cm_runtime[False-compile-overhead] 1.4053ms 0.4566ms 2.1901 KOps/s 2.2661 KOps/s $\color{#d91a1a}-3.36\%$
test_func_call_cm_runtime[True-eager] 0.9925ms 0.8971ms 1.1147 KOps/s 1.1089 KOps/s $\color{#35bf28}+0.52\%$
test_func_call_cm_runtime[True-compile] 0.9220ms 0.8011ms 1.2484 KOps/s 1.2541 KOps/s $\color{#d91a1a}-0.46\%$
test_func_call_cm_runtime[True-compile-overhead] 0.9376ms 0.8078ms 1.2379 KOps/s 1.2471 KOps/s $\color{#d91a1a}-0.74\%$
test_vmap_func_call_cm_runtime[eager] 2.6093ms 1.9134ms 522.6396 Ops/s 519.8251 Ops/s $\color{#35bf28}+0.54\%$
test_vmap_func_call_cm_runtime[compile] 0.9393ms 0.5482ms 1.8240 KOps/s 1.8795 KOps/s $\color{#d91a1a}-2.95\%$
test_vmap_func_call_cm_runtime[compile-overhead] 1.0457ms 0.5523ms 1.8106 KOps/s 1.8845 KOps/s $\color{#d91a1a}-3.92\%$
test_distributed 0.2321ms 0.1256ms 7.9626 KOps/s 7.7598 KOps/s $\color{#35bf28}+2.61\%$
test_tdmodule 44.8040μs 27.4781μs 36.3927 KOps/s 36.0437 KOps/s $\color{#35bf28}+0.97\%$
test_tdmodule_dispatch 85.6200μs 49.9539μs 20.0185 KOps/s 20.0257 KOps/s $\color{#d91a1a}-0.04\%$
test_tdseq 89.5270μs 33.1119μs 30.2006 KOps/s 32.8252 KOps/s $\textbf{\color{#d91a1a}-8.00\%}$
test_tdseq_dispatch 88.1950μs 56.0515μs 17.8407 KOps/s 17.8366 KOps/s $\color{#35bf28}+0.02\%$
test_instantiation_functorch 1.7894ms 1.5063ms 663.8608 Ops/s 643.5744 Ops/s $\color{#35bf28}+3.15\%$
test_exec_functorch 0.3296ms 0.1770ms 5.6500 KOps/s 5.4358 KOps/s $\color{#35bf28}+3.94\%$
test_exec_functional_call 0.3366ms 0.1697ms 5.8933 KOps/s 5.7280 KOps/s $\color{#35bf28}+2.89\%$
test_exec_td_decorator 0.4519ms 0.2322ms 4.3073 KOps/s 4.2042 KOps/s $\color{#35bf28}+2.45\%$
test_vmap_mlp_speed_decorator[True-True] 1.1363ms 0.6671ms 1.4990 KOps/s 1.4592 KOps/s $\color{#35bf28}+2.73\%$
test_vmap_mlp_speed_decorator[True-False] 1.1778ms 0.6698ms 1.4930 KOps/s 1.4908 KOps/s $\color{#35bf28}+0.14\%$
test_vmap_mlp_speed_decorator[False-True] 0.7328ms 0.5290ms 1.8903 KOps/s 1.8410 KOps/s $\color{#35bf28}+2.68\%$
test_vmap_mlp_speed_decorator[False-False] 0.7501ms 0.5320ms 1.8797 KOps/s 1.8535 KOps/s $\color{#35bf28}+1.42\%$
test_to_module_speed[True] 2.1768ms 1.3128ms 761.7471 Ops/s 736.6011 Ops/s $\color{#35bf28}+3.41\%$
test_to_module_speed[False] 2.1182ms 1.2884ms 776.1305 Ops/s 758.9429 Ops/s $\color{#35bf28}+2.26\%$
test_tc_init 91.5210μs 48.5200μs 20.6101 KOps/s 21.5002 KOps/s $\color{#d91a1a}-4.14\%$
test_tc_init_nested 0.1876ms 97.0134μs 10.3079 KOps/s 10.7453 KOps/s $\color{#d91a1a}-4.07\%$
test_tc_first_layer_tensor 15.1780μs 1.5245μs 655.9513 KOps/s 657.1073 KOps/s $\color{#d91a1a}-0.18\%$
test_tc_first_layer_nontensor 29.7250μs 4.7539μs 210.3537 KOps/s 212.1530 KOps/s $\color{#d91a1a}-0.85\%$
test_tc_second_layer_tensor 28.2530μs 2.8011μs 357.0061 KOps/s 345.4772 KOps/s $\color{#35bf28}+3.34\%$
test_tc_second_layer_nontensor 32.7110μs 5.9419μs 168.2949 KOps/s 165.5416 KOps/s $\color{#35bf28}+1.66\%$
test_unbind 0.2194s 12.8985ms 77.5283 Ops/s 63.9222 Ops/s $\textbf{\color{#35bf28}+21.29\%}$
test_full_like 8.2801ms 7.0149ms 142.5527 Ops/s 120.4196 Ops/s $\textbf{\color{#35bf28}+18.38\%}$
test_zeros_like 4.4683ms 2.7184ms 367.8591 Ops/s 214.8217 Ops/s $\textbf{\color{#35bf28}+71.24\%}$
test_ones_like 3.6256ms 3.0356ms 329.4206 Ops/s 289.7849 Ops/s $\textbf{\color{#35bf28}+13.68\%}$
test_clone 4.9840ms 4.6649ms 214.3690 Ops/s 141.8689 Ops/s $\textbf{\color{#35bf28}+51.10\%}$
test_squeeze 56.1450μs 12.0020μs 83.3193 KOps/s 81.7545 KOps/s $\color{#35bf28}+1.91\%$
test_unsqueeze 0.2662ms 89.7582μs 11.1410 KOps/s 11.1802 KOps/s $\color{#d91a1a}-0.35\%$
test_split 0.3660ms 0.1911ms 5.2322 KOps/s 5.1209 KOps/s $\color{#35bf28}+2.17\%$
test_permute 0.3363ms 0.1943ms 5.1480 KOps/s 4.9653 KOps/s $\color{#35bf28}+3.68\%$
test_stack 27.5389ms 24.4998ms 40.8167 Ops/s 38.2520 Ops/s $\textbf{\color{#35bf28}+6.70\%}$
test_cat 28.3284ms 24.5552ms 40.7246 Ops/s 39.6024 Ops/s $\color{#35bf28}+2.83\%$
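
A note on reading the table above: the Change column appears to be the relative difference between Ops and Ops on Repo HEAD, and the bolded rows match the Improved (22) and Worsened (8) counts if a cutoff of roughly 5% is used. The sketch below reproduces that arithmetic for a few rows. The function names and the 5% threshold are assumptions inferred from the reported numbers, not the benchmark bot's actual code.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of this PR's throughput relative to repo HEAD.

    Both values must be in the same unit (e.g. both KOps/s).
    """
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption, not a documented value."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# A few rows copied from the table: (name, Ops, Ops on Repo HEAD)
rows = [
    ("test_plain_set_nested", 47.5666, 47.7521),
    ("test_membership_stacked_nested_last", 195.6707, 227.7240),
    ("test_zeros_like", 367.8591, 214.8217),
]

for name, ops, ops_head in rows:
    pct = relative_change(ops, ops_head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Under these assumptions, small swings of one or two percent fall in the "unchanged" bucket, which is consistent with only 30 of the 217 benchmarks being counted as improved or worsened.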

@vmoens merged commit 9197097 into gh/vmoens/45/base on Feb 4, 2025
46 of 54 checks passed
@vmoens deleted the gh/vmoens/45/head branch on February 4, 2025 16:45
Labels
BE: Better errors, logs, docs or test utils
CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
2 participants