[Performance] Faster tensorclass set #880

Merged
vmoens merged 3 commits into main from faster-tc-set
Jul 12, 2024

vmoens (Contributor) commented Jul 12, 2024

No description provided.

facebook-github-bot added the CLA Signed label Jul 12, 2024

github-actions bot commented Jul 12, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 133. Improved: $\large\color{#35bf28}29$. Worsened: $\large\color{#d91a1a}3$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 36.5680μs 16.4427μs 60.8174 KOps/s 57.0725 KOps/s $\textbf{\color{#35bf28}+6.56\%}$
test_plain_set_stack_nested 44.5230μs 16.7501μs 59.7013 KOps/s 57.5169 KOps/s $\color{#35bf28}+3.80\%$
test_plain_set_nested_inplace 50.2040μs 18.5411μs 53.9344 KOps/s 51.9336 KOps/s $\color{#35bf28}+3.85\%$
test_plain_set_stack_nested_inplace 52.3080μs 18.5047μs 54.0404 KOps/s 51.8639 KOps/s $\color{#35bf28}+4.20\%$
test_items 22.5820μs 2.6250μs 380.9463 KOps/s 375.6204 KOps/s $\color{#35bf28}+1.42\%$
test_items_nested 1.6499ms 0.3669ms 2.7258 KOps/s 2.7369 KOps/s $\color{#d91a1a}-0.40\%$
test_items_nested_locked 0.8026ms 0.3667ms 2.7271 KOps/s 2.7113 KOps/s $\color{#35bf28}+0.58\%$
test_items_nested_leaf 0.1482ms 85.9318μs 11.6371 KOps/s 11.6459 KOps/s $\color{#d91a1a}-0.08\%$
test_items_stack_nested 0.6719ms 0.3721ms 2.6872 KOps/s 2.7231 KOps/s $\color{#d91a1a}-1.32\%$
test_items_stack_nested_leaf 0.1562ms 85.3369μs 11.7183 KOps/s 11.9082 KOps/s $\color{#d91a1a}-1.60\%$
test_items_stack_nested_locked 0.6933ms 0.3710ms 2.6957 KOps/s 2.7353 KOps/s $\color{#d91a1a}-1.45\%$
test_keys 48.8110μs 3.9720μs 251.7620 KOps/s 254.0530 KOps/s $\color{#d91a1a}-0.90\%$
test_keys_nested 0.3092ms 0.1457ms 6.8645 KOps/s 6.8826 KOps/s $\color{#d91a1a}-0.26\%$
test_keys_nested_locked 2.1192ms 0.1518ms 6.5863 KOps/s 6.6413 KOps/s $\color{#d91a1a}-0.83\%$
test_keys_nested_leaf 0.2328ms 0.1241ms 8.0595 KOps/s 8.1027 KOps/s $\color{#d91a1a}-0.53\%$
test_keys_stack_nested 0.2506ms 0.1450ms 6.8958 KOps/s 6.9860 KOps/s $\color{#d91a1a}-1.29\%$
test_keys_stack_nested_leaf 0.2374ms 0.1238ms 8.0745 KOps/s 8.2219 KOps/s $\color{#d91a1a}-1.79\%$
test_keys_stack_nested_locked 0.2621ms 0.1504ms 6.6508 KOps/s 6.7671 KOps/s $\color{#d91a1a}-1.72\%$
test_values 7.1985μs 1.2333μs 810.8359 KOps/s 844.6142 KOps/s $\color{#d91a1a}-4.00\%$
test_values_nested 0.1124ms 49.2816μs 20.2916 KOps/s 20.1597 KOps/s $\color{#35bf28}+0.65\%$
test_values_nested_locked 0.1472ms 49.8826μs 20.0471 KOps/s 20.3494 KOps/s $\color{#d91a1a}-1.49\%$
test_values_nested_leaf 84.6480μs 45.1046μs 22.1707 KOps/s 22.4476 KOps/s $\color{#d91a1a}-1.23\%$
test_values_stack_nested 0.1427ms 49.5468μs 20.1829 KOps/s 19.5765 KOps/s $\color{#35bf28}+3.10\%$
test_values_stack_nested_leaf 0.1122ms 44.1921μs 22.6285 KOps/s 22.7514 KOps/s $\color{#d91a1a}-0.54\%$
test_values_stack_nested_locked 90.2580μs 49.5433μs 20.1843 KOps/s 19.7605 KOps/s $\color{#35bf28}+2.14\%$
test_membership 4.5256μs 0.7367μs 1.3573 MOps/s 1.0897 MOps/s $\textbf{\color{#35bf28}+24.56\%}$
test_membership_nested 45.0640μs 2.6547μs 376.6916 KOps/s 334.8930 KOps/s $\textbf{\color{#35bf28}+12.48\%}$
test_membership_nested_leaf 24.8170μs 2.7081μs 369.2583 KOps/s 368.0026 KOps/s $\color{#35bf28}+0.34\%$
test_membership_stacked_nested 31.0880μs 2.6676μs 374.8625 KOps/s 369.9455 KOps/s $\color{#35bf28}+1.33\%$
test_membership_stacked_nested_leaf 18.3040μs 2.7147μs 368.3654 KOps/s 370.8934 KOps/s $\color{#d91a1a}-0.68\%$
test_membership_nested_last 36.4080μs 4.0026μs 249.8363 KOps/s 250.2794 KOps/s $\color{#d91a1a}-0.18\%$
test_membership_nested_leaf_last 31.1180μs 4.0094μs 249.4111 KOps/s 244.5662 KOps/s $\color{#35bf28}+1.98\%$
test_membership_stacked_nested_last 30.7080μs 3.9566μs 252.7404 KOps/s 78.4059 KOps/s $\textbf{\color{#35bf28}+222.35\%}$
test_membership_stacked_nested_leaf_last 28.9640μs 4.0196μs 248.7803 KOps/s 77.5393 KOps/s $\textbf{\color{#35bf28}+220.84\%}$
test_nested_getleaf 35.8570μs 10.9676μs 91.1779 KOps/s 92.1690 KOps/s $\color{#d91a1a}-1.08\%$
test_nested_get 38.5820μs 10.3522μs 96.5981 KOps/s 98.4252 KOps/s $\color{#d91a1a}-1.86\%$
test_stacked_getleaf 48.4200μs 10.9302μs 91.4900 KOps/s 94.0127 KOps/s $\color{#d91a1a}-2.68\%$
test_stacked_get 36.7780μs 10.3663μs 96.4663 KOps/s 98.3106 KOps/s $\color{#d91a1a}-1.88\%$
test_nested_getitemleaf 37.7510μs 11.3697μs 87.9532 KOps/s 89.2116 KOps/s $\color{#d91a1a}-1.41\%$
test_nested_getitem 38.9130μs 10.6048μs 94.2967 KOps/s 96.7293 KOps/s $\color{#d91a1a}-2.51\%$
test_stacked_getitemleaf 50.5960μs 11.4027μs 87.6988 KOps/s 89.0341 KOps/s $\color{#d91a1a}-1.50\%$
test_stacked_getitem 32.4810μs 10.5830μs 94.4913 KOps/s 98.2178 KOps/s $\color{#d91a1a}-3.79\%$
test_lock_nested 3.4835ms 0.4419ms 2.2630 KOps/s 2.2845 KOps/s $\color{#d91a1a}-0.94\%$
test_lock_stack_nested 0.7341ms 0.4101ms 2.4383 KOps/s 2.5299 KOps/s $\color{#d91a1a}-3.62\%$
test_unlock_nested 0.7243ms 0.3593ms 2.7829 KOps/s 2.3652 KOps/s $\textbf{\color{#35bf28}+17.66\%}$
test_unlock_stack_nested 0.5388ms 0.3268ms 3.0603 KOps/s 3.2324 KOps/s $\textbf{\color{#d91a1a}-5.32\%}$
test_flatten_speed 0.2373ms 0.1050ms 9.5252 KOps/s 9.6057 KOps/s $\color{#d91a1a}-0.84\%$
test_unflatten_speed 0.7708ms 0.4425ms 2.2598 KOps/s 2.2702 KOps/s $\color{#d91a1a}-0.45\%$
test_common_ops 5.2313ms 0.7163ms 1.3961 KOps/s 1.2824 KOps/s $\textbf{\color{#35bf28}+8.86\%}$
test_creation 70.9820μs 2.2993μs 434.9102 KOps/s 434.9955 KOps/s $\color{#d91a1a}-0.02\%$
test_creation_empty 51.1950μs 9.3236μs 107.2547 KOps/s 87.1683 KOps/s $\textbf{\color{#35bf28}+23.04\%}$
test_creation_nested_1 38.9520μs 12.6059μs 79.3280 KOps/s 67.8378 KOps/s $\textbf{\color{#35bf28}+16.94\%}$
test_creation_nested_2 44.0820μs 16.0634μs 62.2533 KOps/s 55.4781 KOps/s $\textbf{\color{#35bf28}+12.21\%}$
test_clone 69.2090μs 13.2030μs 75.7401 KOps/s 78.0975 KOps/s $\color{#d91a1a}-3.02\%$
test_getitem[int] 37.8610μs 11.7856μs 84.8492 KOps/s 87.1455 KOps/s $\color{#d91a1a}-2.64\%$
test_getitem[slice_int] 76.0520μs 23.9848μs 41.6931 KOps/s 43.4937 KOps/s $\color{#d91a1a}-4.14\%$
test_getitem[range] 0.1975ms 44.3853μs 22.5300 KOps/s 21.5570 KOps/s $\color{#35bf28}+4.51\%$
test_getitem[tuple] 56.3750μs 19.4061μs 51.5303 KOps/s 52.7721 KOps/s $\color{#d91a1a}-2.35\%$
test_getitem[list] 0.1564ms 39.8530μs 25.0922 KOps/s 24.3392 KOps/s $\color{#35bf28}+3.09\%$
test_setitem_dim[int] 56.9660μs 29.4446μs 33.9621 KOps/s 30.5968 KOps/s $\textbf{\color{#35bf28}+11.00\%}$
test_setitem_dim[slice_int] 0.1120ms 56.7043μs 17.6354 KOps/s 16.0631 KOps/s $\textbf{\color{#35bf28}+9.79\%}$
test_setitem_dim[range] 0.1133ms 76.5954μs 13.0556 KOps/s 12.0372 KOps/s $\textbf{\color{#35bf28}+8.46\%}$
test_setitem_dim[tuple] 69.1790μs 45.4444μs 22.0049 KOps/s 20.0201 KOps/s $\textbf{\color{#35bf28}+9.91\%}$
test_setitem 82.3340μs 18.8865μs 52.9480 KOps/s 50.8290 KOps/s $\color{#35bf28}+4.17\%$
test_set 91.2800μs 18.3934μs 54.3675 KOps/s 51.0313 KOps/s $\textbf{\color{#35bf28}+6.54\%}$
test_set_shared 2.3313ms 0.1669ms 5.9934 KOps/s 6.0280 KOps/s $\color{#d91a1a}-0.57\%$
test_update 0.1320ms 20.2026μs 49.4985 KOps/s 44.3283 KOps/s $\textbf{\color{#35bf28}+11.66\%}$
test_update_nested 0.1454ms 30.0125μs 33.3195 KOps/s 31.1137 KOps/s $\textbf{\color{#35bf28}+7.09\%}$
test_update__nested 83.0550μs 25.0295μs 39.9529 KOps/s 40.4570 KOps/s $\color{#d91a1a}-1.25\%$
test_set_nested 0.1529ms 19.9493μs 50.1271 KOps/s 46.5141 KOps/s $\textbf{\color{#35bf28}+7.77\%}$
test_set_nested_new 0.1316ms 24.8988μs 40.1626 KOps/s 38.1614 KOps/s $\textbf{\color{#35bf28}+5.24\%}$
test_select 0.1658ms 40.7220μs 24.5568 KOps/s 23.9314 KOps/s $\color{#35bf28}+2.61\%$
test_select_nested 0.1306ms 61.9022μs 16.1545 KOps/s 17.0040 KOps/s $\color{#d91a1a}-5.00\%$
test_exclude_nested 0.1437ms 81.7474μs 12.2328 KOps/s 12.8058 KOps/s $\color{#d91a1a}-4.47\%$
test_empty[True] 0.4070ms 0.3412ms 2.9310 KOps/s 2.9905 KOps/s $\color{#d91a1a}-1.99\%$
test_empty[False] 11.6192μs 1.3142μs 760.9007 KOps/s 775.2453 KOps/s $\color{#d91a1a}-1.85\%$
test_unbind_speed 0.3349ms 0.2596ms 3.8523 KOps/s 3.8981 KOps/s $\color{#d91a1a}-1.18\%$
test_unbind_speed_stack0 0.3600ms 0.2582ms 3.8727 KOps/s 4.0297 KOps/s $\color{#d91a1a}-3.89\%$
test_unbind_speed_stack1 80.8449ms 0.7617ms 1.3129 KOps/s 1.5107 KOps/s $\textbf{\color{#d91a1a}-13.10\%}$
test_split 77.9265ms 1.6480ms 606.8031 Ops/s 620.4178 Ops/s $\color{#d91a1a}-2.19\%$
test_chunk 75.7063ms 1.6539ms 604.6136 Ops/s 617.7223 Ops/s $\color{#d91a1a}-2.12\%$
test_creation[device0] 4.3493ms 95.3848μs 10.4838 KOps/s 10.6288 KOps/s $\color{#d91a1a}-1.36\%$
test_creation_from_tensor 0.2532ms 96.7175μs 10.3394 KOps/s 10.4599 KOps/s $\color{#d91a1a}-1.15\%$
test_add_one[memmap_tensor0] 0.1986ms 5.5040μs 181.6852 KOps/s 182.0623 KOps/s $\color{#d91a1a}-0.21\%$
test_contiguous[memmap_tensor0] 18.2040μs 0.6353μs 1.5741 MOps/s 1.5394 MOps/s $\color{#35bf28}+2.25\%$
test_stack[memmap_tensor0] 43.3910μs 3.6931μs 270.7787 KOps/s 269.6970 KOps/s $\color{#35bf28}+0.40\%$
test_memmaptd_index 0.9684ms 0.2528ms 3.9551 KOps/s 3.8869 KOps/s $\color{#35bf28}+1.75\%$
test_memmaptd_index_astensor 0.8899ms 0.3269ms 3.0592 KOps/s 3.0229 KOps/s $\color{#35bf28}+1.20\%$
test_memmaptd_index_op 1.3610ms 0.5785ms 1.7286 KOps/s 1.6251 KOps/s $\textbf{\color{#35bf28}+6.36\%}$
test_serialize_model 0.1287s 0.1217s 8.2168 Ops/s 7.1286 Ops/s $\textbf{\color{#35bf28}+15.27\%}$
test_serialize_model_pickle 0.4600s 0.3912s 2.5562 Ops/s 2.5418 Ops/s $\color{#35bf28}+0.57\%$
test_serialize_weights 0.2079s 0.1333s 7.5027 Ops/s 8.0350 Ops/s $\textbf{\color{#d91a1a}-6.62\%}$
test_serialize_weights_returnearly 0.1721s 0.1610s 6.2113 Ops/s 6.0853 Ops/s $\color{#35bf28}+2.07\%$
test_serialize_weights_pickle 0.4617s 0.4129s 2.4217 Ops/s 2.5344 Ops/s $\color{#d91a1a}-4.45\%$
test_serialize_weights_filesystem 0.1522s 0.1420s 7.0414 Ops/s 6.9212 Ops/s $\color{#35bf28}+1.74\%$
test_serialize_model_filesystem 0.1609s 0.1523s 6.5649 Ops/s 6.5889 Ops/s $\color{#d91a1a}-0.36\%$
test_reshape_pytree 70.6410μs 25.8727μs 38.6508 KOps/s 38.9750 KOps/s $\color{#d91a1a}-0.83\%$
test_reshape_td 78.0560μs 34.2945μs 29.1592 KOps/s 30.0694 KOps/s $\color{#d91a1a}-3.03\%$
test_view_pytree 92.3890μs 25.6274μs 39.0207 KOps/s 38.5869 KOps/s $\color{#35bf28}+1.12\%$
test_view_td 0.1309ms 40.3641μs 24.7745 KOps/s 26.0369 KOps/s $\color{#d91a1a}-4.85\%$
test_unbind_pytree 80.2100μs 29.8295μs 33.5238 KOps/s 34.2462 KOps/s $\color{#d91a1a}-2.11\%$
test_unbind_td 0.3550ms 38.5228μs 25.9587 KOps/s 26.4676 KOps/s $\color{#d91a1a}-1.92\%$
test_split_pytree 61.3350μs 29.5111μs 33.8855 KOps/s 34.2265 KOps/s $\color{#d91a1a}-1.00\%$
test_split_td 0.4707ms 39.9424μs 25.0360 KOps/s 25.3058 KOps/s $\color{#d91a1a}-1.07\%$
test_add_pytree 83.4350μs 35.4789μs 28.1858 KOps/s 28.9087 KOps/s $\color{#d91a1a}-2.50\%$
test_add_td 0.1138ms 51.9229μs 19.2593 KOps/s 17.8895 KOps/s $\textbf{\color{#35bf28}+7.66\%}$
test_distributed 0.2678ms 0.1310ms 7.6360 KOps/s 7.5354 KOps/s $\color{#35bf28}+1.34\%$
test_tdmodule 28.5640μs 15.2074μs 65.7573 KOps/s 58.8963 KOps/s $\textbf{\color{#35bf28}+11.65\%}$
test_tdmodule_dispatch 62.7480μs 32.3615μs 30.9009 KOps/s 28.3202 KOps/s $\textbf{\color{#35bf28}+9.11\%}$
test_tdseq 36.6690μs 16.9253μs 59.0831 KOps/s 52.2886 KOps/s $\textbf{\color{#35bf28}+12.99\%}$
test_tdseq_dispatch 71.3330μs 35.7599μs 27.9643 KOps/s 24.9445 KOps/s $\textbf{\color{#35bf28}+12.11\%}$
test_instantiation_functorch 1.6107ms 1.3212ms 756.9161 Ops/s 749.0238 Ops/s $\color{#35bf28}+1.05\%$
test_instantiation_td 1.6382ms 1.0375ms 963.8401 Ops/s 890.2906 Ops/s $\textbf{\color{#35bf28}+8.26\%}$
test_exec_functorch 0.2864ms 0.1674ms 5.9735 KOps/s 6.2156 KOps/s $\color{#d91a1a}-3.89\%$
test_exec_functional_call 0.2824ms 0.1517ms 6.5918 KOps/s 6.6425 KOps/s $\color{#d91a1a}-0.76\%$
test_exec_td 0.2406ms 0.1510ms 6.6236 KOps/s 6.7799 KOps/s $\color{#d91a1a}-2.31\%$
test_exec_td_decorator 0.5210ms 0.2372ms 4.2157 KOps/s 4.3415 KOps/s $\color{#d91a1a}-2.90\%$
test_vmap_mlp_speed[True-True] 0.6865ms 0.4800ms 2.0832 KOps/s 2.0493 KOps/s $\color{#35bf28}+1.65\%$
test_vmap_mlp_speed[True-False] 0.8128ms 0.4787ms 2.0888 KOps/s 2.0600 KOps/s $\color{#35bf28}+1.40\%$
test_vmap_mlp_speed[False-True] 0.6767ms 0.3984ms 2.5099 KOps/s 2.5217 KOps/s $\color{#d91a1a}-0.47\%$
test_vmap_mlp_speed[False-False] 0.5969ms 0.3996ms 2.5027 KOps/s 2.5204 KOps/s $\color{#d91a1a}-0.70\%$
test_vmap_mlp_speed_decorator[True-True] 1.1672ms 0.5780ms 1.7301 KOps/s 1.7176 KOps/s $\color{#35bf28}+0.73\%$
test_vmap_mlp_speed_decorator[True-False] 0.8237ms 0.5807ms 1.7220 KOps/s 1.7394 KOps/s $\color{#d91a1a}-1.00\%$
test_vmap_mlp_speed_decorator[False-True] 0.7912ms 0.4751ms 2.1047 KOps/s 2.1320 KOps/s $\color{#d91a1a}-1.28\%$
test_vmap_mlp_speed_decorator[False-False] 0.7863ms 0.4816ms 2.0763 KOps/s 2.1226 KOps/s $\color{#d91a1a}-2.18\%$
test_to_module_speed[True] 2.4602ms 1.8351ms 544.9220 Ops/s 560.9202 Ops/s $\color{#d91a1a}-2.85\%$
test_to_module_speed[False] 2.8385ms 1.7848ms 560.2867 Ops/s 569.6100 Ops/s $\color{#d91a1a}-1.64\%$
test_tc_init 98.5840μs 33.4518μs 29.8937 KOps/s 17.8970 KOps/s $\textbf{\color{#35bf28}+67.03\%}$
test_tc_init_nested 0.1275ms 68.8372μs 14.5270 KOps/s 8.6041 KOps/s $\textbf{\color{#35bf28}+68.84\%}$
test_tc_first_layer_tensor 45.8960μs 8.0951μs 123.5322 KOps/s 120.8341 KOps/s $\color{#35bf28}+2.23\%$
test_tc_first_layer_nontensor 35.8160μs 8.0456μs 124.2913 KOps/s 120.8220 KOps/s $\color{#35bf28}+2.87\%$
test_tc_second_layer_tensor 40.3350μs 2.5023μs 399.6248 KOps/s 412.2442 KOps/s $\color{#d91a1a}-3.06\%$
test_tc_second_layer_nontensor 37.3490μs 9.0181μs 110.8879 KOps/s 109.6898 KOps/s $\color{#35bf28}+1.09\%$
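
The relationship between the columns above can be checked by hand. The following is a minimal sketch, assuming (from the numbers in the table, not from the benchmark bot's source) that Ops is the reciprocal of Mean and that Change is the relative difference between Ops and Ops on Repo HEAD. The function names are hypothetical and only serve this illustration.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime (assumed: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent.

    Assumed formula: (Ops / Ops on Repo HEAD - 1) * 100.
    Positive values mean the PR is faster than HEAD.
    """
    return (ops_pr / ops_head - 1.0) * 100.0


# test_plain_set_nested (CPU): Mean 16.4427 us, 60.8174 vs 57.0725 KOps/s
kops = ops_per_second(16.4427e-6) / 1e3
change = percent_change(60.8174, 57.0725)
print(f"{kops:.2f} KOps/s, {change:+.2f}%")

# test_membership_stacked_nested_last (CPU): 252.7404 vs 78.4059 KOps/s
print(f"{percent_change(252.7404, 78.4059):+.2f}%")
```

Both rows reproduce the Change values reported in the table to two decimal places, which suggests the assumed formula matches what the bot reports.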


github-actions bot commented Jul 12, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 141. Improved: $\large\color{#35bf28}10$. Worsened: $\large\color{#d91a1a}21$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 72.3483ms 16.2061μs 61.7052 KOps/s 84.9381 KOps/s $\textbf{\color{#d91a1a}-27.35\%}$
test_plain_set_stack_nested 29.6700μs 13.0026μs 76.9077 KOps/s 84.7436 KOps/s $\textbf{\color{#d91a1a}-9.25\%}$
test_plain_set_nested_inplace 29.8800μs 13.5403μs 73.8537 KOps/s 77.6066 KOps/s $\color{#d91a1a}-4.84\%$
test_plain_set_stack_nested_inplace 36.5900μs 13.5905μs 73.5805 KOps/s 77.8230 KOps/s $\textbf{\color{#d91a1a}-5.45\%}$
test_items 19.0500μs 4.7778μs 209.3008 KOps/s 210.7509 KOps/s $\color{#d91a1a}-0.69\%$
test_items_nested 0.4634ms 0.3935ms 2.5416 KOps/s 2.5398 KOps/s $\color{#35bf28}+0.07\%$
test_items_nested_locked 0.4772ms 0.3949ms 2.5323 KOps/s 2.5103 KOps/s $\color{#35bf28}+0.87\%$
test_items_nested_leaf 0.1144ms 87.0381μs 11.4892 KOps/s 11.6094 KOps/s $\color{#d91a1a}-1.04\%$
test_items_stack_nested 0.4924ms 0.3996ms 2.5026 KOps/s 2.5250 KOps/s $\color{#d91a1a}-0.89\%$
test_items_stack_nested_leaf 0.1129ms 85.9541μs 11.6341 KOps/s 11.5961 KOps/s $\color{#35bf28}+0.33\%$
test_items_stack_nested_locked 0.4824ms 0.3976ms 2.5154 KOps/s 2.5289 KOps/s $\color{#d91a1a}-0.53\%$
test_keys 19.7800μs 4.3852μs 228.0384 KOps/s 229.5498 KOps/s $\color{#d91a1a}-0.66\%$
test_keys_nested 93.6610μs 68.5627μs 14.5852 KOps/s 14.7949 KOps/s $\color{#d91a1a}-1.42\%$
test_keys_nested_locked 2.0834ms 75.5931μs 13.2287 KOps/s 13.5152 KOps/s $\color{#d91a1a}-2.12\%$
test_keys_nested_leaf 87.3510μs 57.7691μs 17.3103 KOps/s 17.1791 KOps/s $\color{#35bf28}+0.76\%$
test_keys_stack_nested 86.9810μs 67.3466μs 14.8486 KOps/s 14.6798 KOps/s $\color{#35bf28}+1.15\%$
test_keys_stack_nested_leaf 75.2210μs 57.1981μs 17.4831 KOps/s 17.0538 KOps/s $\color{#35bf28}+2.52\%$
test_keys_stack_nested_locked 99.6620μs 72.4353μs 13.8054 KOps/s 13.4693 KOps/s $\color{#35bf28}+2.50\%$
test_values 7.3633μs 1.7595μs 568.3308 KOps/s 572.6121 KOps/s $\color{#d91a1a}-0.75\%$
test_values_nested 47.6410μs 34.3500μs 29.1120 KOps/s 29.0164 KOps/s $\color{#35bf28}+0.33\%$
test_values_nested_locked 58.2110μs 36.1995μs 27.6247 KOps/s 27.3766 KOps/s $\color{#35bf28}+0.91\%$
test_values_nested_leaf 0.5280ms 30.5224μs 32.7628 KOps/s 32.5515 KOps/s $\color{#35bf28}+0.65\%$
test_values_stack_nested 54.3700μs 35.0870μs 28.5006 KOps/s 28.3969 KOps/s $\color{#35bf28}+0.37\%$
test_values_stack_nested_leaf 52.5810μs 31.1516μs 32.1010 KOps/s 31.7247 KOps/s $\color{#35bf28}+1.19\%$
test_values_stack_nested_locked 54.3710μs 36.7451μs 27.2145 KOps/s 27.1170 KOps/s $\color{#35bf28}+0.36\%$
test_membership 1.2340μs 0.5556μs 1.7998 MOps/s 1.8781 MOps/s $\color{#d91a1a}-4.17\%$
test_membership_nested 15.3300μs 2.0887μs 478.7612 KOps/s 479.2769 KOps/s $\color{#d91a1a}-0.11\%$
test_membership_nested_leaf 10.3800μs 2.0057μs 498.5733 KOps/s 494.4746 KOps/s $\color{#35bf28}+0.83\%$
test_membership_stacked_nested 15.5400μs 2.0857μs 479.4552 KOps/s 480.4516 KOps/s $\color{#d91a1a}-0.21\%$
test_membership_stacked_nested_leaf 24.2100μs 2.0744μs 482.0559 KOps/s 481.8150 KOps/s $\color{#35bf28}+0.05\%$
test_membership_nested_last 21.1800μs 2.9743μs 336.2175 KOps/s 337.9324 KOps/s $\color{#d91a1a}-0.51\%$
test_membership_nested_leaf_last 16.9700μs 3.0297μs 330.0707 KOps/s 329.3451 KOps/s $\color{#35bf28}+0.22\%$
test_membership_stacked_nested_last 28.2710μs 9.1980μs 108.7187 KOps/s 334.0949 KOps/s $\textbf{\color{#d91a1a}-67.46\%}$
test_membership_stacked_nested_leaf_last 36.6910μs 9.1629μs 109.1356 KOps/s 329.8273 KOps/s $\textbf{\color{#d91a1a}-66.91\%}$
test_nested_getleaf 26.5610μs 8.0147μs 124.7710 KOps/s 124.4205 KOps/s $\color{#35bf28}+0.28\%$
test_nested_get 22.8600μs 7.5312μs 132.7817 KOps/s 132.3651 KOps/s $\color{#35bf28}+0.31\%$
test_stacked_getleaf 29.6910μs 8.0245μs 124.6190 KOps/s 124.4203 KOps/s $\color{#35bf28}+0.16\%$
test_stacked_get 0.1928ms 7.5401μs 132.6241 KOps/s 132.7370 KOps/s $\color{#d91a1a}-0.09\%$
test_nested_getitemleaf 22.5800μs 8.2482μs 121.2382 KOps/s 122.4789 KOps/s $\color{#d91a1a}-1.01\%$
test_nested_getitem 23.0800μs 7.7018μs 129.8400 KOps/s 129.6281 KOps/s $\color{#35bf28}+0.16\%$
test_stacked_getitemleaf 31.8300μs 8.1955μs 122.0179 KOps/s 122.4285 KOps/s $\color{#d91a1a}-0.34\%$
test_stacked_getitem 23.3000μs 7.6928μs 129.9921 KOps/s 130.1241 KOps/s $\color{#d91a1a}-0.10\%$
test_lock_nested 4.0169ms 0.4225ms 2.3666 KOps/s 2.4132 KOps/s $\color{#d91a1a}-1.93\%$
test_lock_stack_nested 0.4011ms 0.3750ms 2.6669 KOps/s 2.6189 KOps/s $\color{#35bf28}+1.83\%$
test_unlock_nested 88.1824ms 0.4248ms 2.3543 KOps/s 3.0033 KOps/s $\textbf{\color{#d91a1a}-21.61\%}$
test_unlock_stack_nested 0.3205ms 0.2932ms 3.4105 KOps/s 3.3373 KOps/s $\color{#35bf28}+2.19\%$
test_flatten_speed 0.3496ms 0.1066ms 9.3836 KOps/s 9.4314 KOps/s $\color{#d91a1a}-0.51\%$
test_unflatten_speed 0.3183ms 0.2878ms 3.4746 KOps/s 3.4212 KOps/s $\color{#35bf28}+1.56\%$
test_common_ops 0.9630ms 0.5616ms 1.7807 KOps/s 1.6346 KOps/s $\textbf{\color{#35bf28}+8.94\%}$
test_creation 15.5400μs 1.8588μs 537.9711 KOps/s 533.0910 KOps/s $\color{#35bf28}+0.92\%$
test_creation_empty 23.5510μs 8.8128μs 113.4718 KOps/s 133.1204 KOps/s $\textbf{\color{#d91a1a}-14.76\%}$
test_creation_nested_1 29.6900μs 10.6035μs 94.3087 KOps/s 109.2056 KOps/s $\textbf{\color{#d91a1a}-13.64\%}$
test_creation_nested_2 28.9400μs 12.9554μs 77.1878 KOps/s 86.0788 KOps/s $\textbf{\color{#d91a1a}-10.33\%}$
test_clone 57.0610μs 10.9108μs 91.6520 KOps/s 91.6961 KOps/s $\color{#d91a1a}-0.05\%$
test_getitem[int] 28.6810μs 10.0680μs 99.3250 KOps/s 97.3141 KOps/s $\color{#35bf28}+2.07\%$
test_getitem[slice_int] 43.9510μs 19.5044μs 51.2705 KOps/s 51.2673 KOps/s $+0.01\%$
test_getitem[range] 0.1574ms 38.0440μs 26.2853 KOps/s 26.8312 KOps/s $\color{#d91a1a}-2.03\%$
test_getitem[tuple] 33.8600μs 17.4255μs 57.3872 KOps/s 56.9187 KOps/s $\color{#35bf28}+0.82\%$
test_getitem[list] 0.1619ms 31.2780μs 31.9714 KOps/s 31.4708 KOps/s $\color{#35bf28}+1.59\%$
test_setitem_dim[int] 59.4410μs 24.5652μs 40.7080 KOps/s 43.3845 KOps/s $\textbf{\color{#d91a1a}-6.17\%}$
test_setitem_dim[slice_int] 73.5510μs 44.8001μs 22.3214 KOps/s 22.1633 KOps/s $\color{#35bf28}+0.71\%$
test_setitem_dim[range] 94.5110μs 60.7692μs 16.4557 KOps/s 16.8510 KOps/s $\color{#d91a1a}-2.35\%$
test_setitem_dim[tuple] 65.3010μs 38.9330μs 25.6851 KOps/s 26.0204 KOps/s $\color{#d91a1a}-1.29\%$
test_setitem 65.1310μs 15.3387μs 65.1948 KOps/s 68.9567 KOps/s $\textbf{\color{#d91a1a}-5.46\%}$
test_set 67.0610μs 14.8874μs 67.1707 KOps/s 70.8883 KOps/s $\textbf{\color{#d91a1a}-5.24\%}$
test_set_shared 2.7831ms 95.4996μs 10.4712 KOps/s 10.1596 KOps/s $\color{#35bf28}+3.07\%$
test_update 84.7410μs 17.5662μs 56.9274 KOps/s 61.8422 KOps/s $\textbf{\color{#d91a1a}-7.95\%}$
test_update_nested 92.5210μs 23.1134μs 43.2649 KOps/s 47.0844 KOps/s $\textbf{\color{#d91a1a}-8.11\%}$
test_update__nested 79.2810μs 21.3643μs 46.8070 KOps/s 47.5971 KOps/s $\color{#d91a1a}-1.66\%$
test_set_nested 78.3410μs 15.8513μs 63.0865 KOps/s 66.9140 KOps/s $\textbf{\color{#d91a1a}-5.72\%}$
test_set_nested_new 0.1107ms 19.1289μs 52.2771 KOps/s 54.3748 KOps/s $\color{#d91a1a}-3.86\%$
test_select 0.1064ms 31.9616μs 31.2876 KOps/s 32.2158 KOps/s $\color{#d91a1a}-2.88\%$
test_select_nested 86.1310μs 52.7404μs 18.9608 KOps/s 18.2226 KOps/s $\color{#35bf28}+4.05\%$
test_exclude_nested 95.4710μs 71.0778μs 14.0691 KOps/s 13.8422 KOps/s $\color{#35bf28}+1.64\%$
test_empty[True] 0.3459ms 0.2980ms 3.3553 KOps/s 3.3729 KOps/s $\color{#d91a1a}-0.52\%$
test_empty[False] 2.1990μs 0.9153μs 1.0925 MOps/s 1.0268 MOps/s $\textbf{\color{#35bf28}+6.40\%}$
test_to 87.8310μs 58.2296μs 17.1734 KOps/s 16.7980 KOps/s $\color{#35bf28}+2.23\%$
test_to_nonblocking 61.3110μs 34.6040μs 28.8984 KOps/s 27.3899 KOps/s $\textbf{\color{#35bf28}+5.51\%}$
test_unbind_speed 0.2744ms 0.2544ms 3.9312 KOps/s 3.9417 KOps/s $\color{#d91a1a}-0.27\%$
test_unbind_speed_stack0 0.2960ms 0.2481ms 4.0305 KOps/s 3.9251 KOps/s $\color{#35bf28}+2.69\%$
test_unbind_speed_stack1 91.4297ms 0.7697ms 1.2991 KOps/s 1.3646 KOps/s $\color{#d91a1a}-4.80\%$
test_split 89.4906ms 1.5678ms 637.8270 Ops/s 624.6162 Ops/s $\color{#35bf28}+2.12\%$
test_chunk 1.4810ms 1.4286ms 700.0103 Ops/s 686.6939 Ops/s $\color{#35bf28}+1.94\%$
test_creation[device0] 0.1272ms 53.5222μs 18.6838 KOps/s 17.6267 KOps/s $\textbf{\color{#35bf28}+6.00\%}$
test_creation_from_tensor 0.1902ms 53.2773μs 18.7697 KOps/s 18.1860 KOps/s $\color{#35bf28}+3.21\%$
test_add_one[memmap_tensor0] 76.2610μs 6.4768μs 154.3972 KOps/s 157.7866 KOps/s $\color{#d91a1a}-2.15\%$
test_contiguous[memmap_tensor0] 23.9110μs 0.5792μs 1.7264 MOps/s 1.7006 MOps/s $\color{#35bf28}+1.52\%$
test_stack[memmap_tensor0] 30.8800μs 4.3154μs 231.7257 KOps/s 230.0374 KOps/s $\color{#35bf28}+0.73\%$
test_memmaptd_index 1.1580ms 0.2514ms 3.9780 KOps/s 4.0027 KOps/s $\color{#d91a1a}-0.62\%$
test_memmaptd_index_astensor 0.6472ms 0.3140ms 3.1845 KOps/s 3.1548 KOps/s $\color{#35bf28}+0.94\%$
test_memmaptd_index_op 0.8466ms 0.5796ms 1.7253 KOps/s 1.6254 KOps/s $\textbf{\color{#35bf28}+6.15\%}$
test_serialize_model 0.1867s 0.1014s 9.8609 Ops/s 10.5217 Ops/s $\textbf{\color{#d91a1a}-6.28\%}$
test_serialize_model_pickle 1.3507s 1.2355s 0.8094 Ops/s 0.8078 Ops/s $\color{#35bf28}+0.20\%$
test_serialize_weights 92.4095ms 88.1544ms 11.3437 Ops/s 9.6534 Ops/s $\textbf{\color{#35bf28}+17.51\%}$
test_serialize_weights_returnearly 0.1700s 71.0804ms 14.0686 Ops/s 12.7420 Ops/s $\textbf{\color{#35bf28}+10.41\%}$
test_serialize_weights_pickle 1.3506s 1.2434s 0.8042 Ops/s 0.8012 Ops/s $\color{#35bf28}+0.38\%$
test_reshape_pytree 57.3710μs 24.9392μs 40.0975 KOps/s 40.1308 KOps/s $\color{#d91a1a}-0.08\%$
test_reshape_td 54.1910μs 29.8241μs 33.5299 KOps/s 33.7264 KOps/s $\color{#d91a1a}-0.58\%$
test_view_pytree 54.7410μs 24.6256μs 40.6082 KOps/s 39.0081 KOps/s $\color{#35bf28}+4.10\%$
test_view_td 61.7010μs 36.6280μs 27.3015 KOps/s 27.2293 KOps/s $\color{#35bf28}+0.27\%$
test_unbind_pytree 0.1382ms 30.4208μs 32.8722 KOps/s 33.2645 KOps/s $\color{#d91a1a}-1.18\%$
test_unbind_td 0.4929ms 37.8550μs 26.4166 KOps/s 26.2325 KOps/s $\color{#35bf28}+0.70\%$
test_split_pytree 57.8910μs 32.5895μs 30.6847 KOps/s 29.9882 KOps/s $\color{#35bf28}+2.32\%$
test_split_td 0.1728ms 36.8942μs 27.1045 KOps/s 27.8412 KOps/s $\color{#d91a1a}-2.65\%$
test_add_pytree 0.1977ms 38.1657μs 26.2015 KOps/s 27.5814 KOps/s $\textbf{\color{#d91a1a}-5.00\%}$
test_add_td 79.9310μs 47.2661μs 21.1568 KOps/s 21.5317 KOps/s $\color{#d91a1a}-1.74\%$
test_distributed 1.8492ms 71.7807μs 13.9313 KOps/s 12.6709 KOps/s $\textbf{\color{#35bf28}+9.95\%}$
test_tdmodule 0.1382ms 14.2458μs 70.1959 KOps/s 76.8187 KOps/s $\textbf{\color{#d91a1a}-8.62\%}$
test_tdmodule_dispatch 45.8110μs 28.3875μs 35.2267 KOps/s 37.5329 KOps/s $\textbf{\color{#d91a1a}-6.14\%}$
test_tdseq 30.8200μs 15.1658μs 65.9380 KOps/s 69.2038 KOps/s $\color{#d91a1a}-4.72\%$
test_tdseq_dispatch 51.3200μs 30.8989μs 32.3637 KOps/s 34.1273 KOps/s $\textbf{\color{#d91a1a}-5.17\%}$
test_instantiation_functorch 1.5365ms 1.3667ms 731.6872 Ops/s 734.3950 Ops/s $\color{#d91a1a}-0.37\%$
test_instantiation_td 92.6883ms 1.0848ms 921.8122 Ops/s 1.0397 KOps/s $\textbf{\color{#d91a1a}-11.34\%}$
test_exec_functorch 0.1805ms 0.1421ms 7.0350 KOps/s 6.9509 KOps/s $\color{#35bf28}+1.21\%$
test_exec_functional_call 0.1693ms 0.1339ms 7.4692 KOps/s 7.6727 KOps/s $\color{#d91a1a}-2.65\%$
test_exec_td 0.1704ms 0.1334ms 7.4973 KOps/s 7.7864 KOps/s $\color{#d91a1a}-3.71\%$
test_exec_td_decorator 0.7454ms 0.2122ms 4.7117 KOps/s 4.8847 KOps/s $\color{#d91a1a}-3.54\%$
test_vmap_mlp_speed[True-True] 0.7677ms 0.5903ms 1.6941 KOps/s 1.7364 KOps/s $\color{#d91a1a}-2.44\%$
test_vmap_mlp_speed[True-False] 0.6456ms 0.5896ms 1.6960 KOps/s 1.7454 KOps/s $\color{#d91a1a}-2.83\%$
test_vmap_mlp_speed[False-True] 0.6996ms 0.5340ms 1.8726 KOps/s 1.9572 KOps/s $\color{#d91a1a}-4.32\%$
test_vmap_mlp_speed[False-False] 0.6971ms 0.5317ms 1.8807 KOps/s 1.9433 KOps/s $\color{#d91a1a}-3.22\%$
test_vmap_mlp_speed_decorator[True-True] 1.2255ms 0.6610ms 1.5128 KOps/s 1.5423 KOps/s $\color{#d91a1a}-1.91\%$
test_vmap_mlp_speed_decorator[True-False] 0.9377ms 0.6705ms 1.4913 KOps/s 1.5451 KOps/s $\color{#d91a1a}-3.48\%$
test_vmap_mlp_speed_decorator[False-True] 0.8116ms 0.5758ms 1.7368 KOps/s 1.7439 KOps/s $\color{#d91a1a}-0.41\%$
test_vmap_mlp_speed_decorator[False-False] 0.7733ms 0.5751ms 1.7389 KOps/s 1.7506 KOps/s $\color{#d91a1a}-0.67\%$
test_vmap_transformer_speed[True-True] 7.7946ms 7.6034ms 131.5196 Ops/s 126.4953 Ops/s $\color{#35bf28}+3.97\%$
test_vmap_transformer_speed[True-False] 7.9651ms 7.5830ms 131.8735 Ops/s 128.8308 Ops/s $\color{#35bf28}+2.36\%$
test_vmap_transformer_speed[False-True] 8.9126ms 7.5354ms 132.7065 Ops/s 129.4671 Ops/s $\color{#35bf28}+2.50\%$
test_vmap_transformer_speed[False-False] 8.4337ms 7.8352ms 127.6294 Ops/s 131.0116 Ops/s $\color{#d91a1a}-2.58\%$
test_vmap_transformer_speed_decorator[True-True] 19.4653ms 19.1562ms 52.2023 Ops/s 52.5511 Ops/s $\color{#d91a1a}-0.66\%$
test_vmap_transformer_speed_decorator[True-False] 19.6081ms 19.1634ms 52.1829 Ops/s 52.7703 Ops/s $\color{#d91a1a}-1.11\%$
test_vmap_transformer_speed_decorator[False-True] 19.6467ms 19.0140ms 52.5928 Ops/s 52.9564 Ops/s $\color{#d91a1a}-0.69\%$
test_vmap_transformer_speed_decorator[False-False] 20.0737ms 19.1902ms 52.1099 Ops/s 52.8977 Ops/s $\color{#d91a1a}-1.49\%$
test_to_module_speed[True] 2.9206ms 1.5976ms 625.9245 Ops/s 648.6298 Ops/s $\color{#d91a1a}-3.50\%$
test_to_module_speed[False] 2.0358ms 1.5688ms 637.4180 Ops/s 651.8659 Ops/s $\color{#d91a1a}-2.22\%$
test_tc_init 0.1689ms 34.2705μs 29.1796 KOps/s 19.8105 KOps/s $\textbf{\color{#35bf28}+47.29\%}$
test_tc_init_nested 0.3959ms 70.1147μs 14.2624 KOps/s 10.0658 KOps/s $\textbf{\color{#35bf28}+41.69\%}$
test_tc_first_layer_tensor 0.1355ms 3.5831μs 279.0878 KOps/s 285.2919 KOps/s $\color{#d91a1a}-2.17\%$
test_tc_first_layer_nontensor 0.1284ms 3.5945μs 278.2041 KOps/s 281.5400 KOps/s $\color{#d91a1a}-1.18\%$
test_tc_second_layer_tensor 27.5964μs 1.1356μs 880.5617 KOps/s 906.5597 KOps/s $\color{#d91a1a}-2.87\%$
test_tc_second_layer_nontensor 0.1168ms 4.1234μs 242.5176 KOps/s 247.0890 KOps/s $\color{#d91a1a}-1.85\%$
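
The "Improved" and "Worsened" totals in the summary line match the number of rows whose Change is rendered in bold, and those rows all have a change of roughly 5% or more in magnitude. The sketch below assumes a 5% threshold, which is inferred from the table rather than taken from the bot's configuration; rows sitting at exactly -5.00% are rendered inconsistently, so the exact comparison rule (and any rounding) is unknown. The names `THRESHOLD_PCT` and `classify` are hypothetical.

```python
from collections import Counter

# Assumption: a change counts as significant at about 5% in magnitude.
THRESHOLD_PCT = 5.0


def classify(change_pct: float, threshold: float = THRESHOLD_PCT) -> str:
    """Bucket a percentage change as improved, worsened, or unchanged."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# A few rows copied from the GPU table above (name, Change in percent).
sample_rows = [
    ("test_plain_set_nested", -27.35),
    ("test_items", -0.69),
    ("test_common_ops", 8.94),
    ("test_set", -5.24),
    ("test_getitem[int]", 2.07),
    ("test_tc_init", 47.29),
]

counts = Counter(classify(change) for _, change in sample_rows)
print(dict(counts))
```

Applied to every row of a table, a tally like this would give the kind of totals shown in the summary line, under the stated threshold assumption.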

vmoens merged commit a90cb02 into main on Jul 12, 2024
41 checks passed
vmoens deleted the faster-tc-set branch on July 12, 2024 at 09:29
Labels: CLA Signed, Performance