[BugFix] Avoid lazy stacks in stack if not asked explicitly #741

Merged
17 commits merged on Apr 23, 2024

vmoens (Contributor) commented Apr 22, 2024

No description provided.

facebook-github-bot added the CLA Signed label Apr 22, 2024

github-actions bot commented Apr 22, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 127. Improved: 4. Worsened: 9.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_plain_set_nested 37.9910μs 16.8653μs 59.2934 KOps/s 60.6679 KOps/s $\color{#d91a1a}-2.27\%$
test_plain_set_stack_nested 34.3350μs 16.9020μs 59.1645 KOps/s 61.0408 KOps/s $\color{#d91a1a}-3.07\%$
test_plain_set_nested_inplace 52.6390μs 19.2138μs 52.0459 KOps/s 53.2914 KOps/s $\color{#d91a1a}-2.34\%$
test_plain_set_stack_nested_inplace 63.8500μs 19.2798μs 51.8678 KOps/s 53.0946 KOps/s $\color{#d91a1a}-2.31\%$
test_items 26.6700μs 2.6370μs 379.2171 KOps/s 404.7363 KOps/s $\textbf{\color{#d91a1a}-6.31\%}$
test_items_nested 0.3988ms 0.2688ms 3.7200 KOps/s 3.7210 KOps/s $\color{#d91a1a}-0.03\%$
test_items_nested_locked 1.3369ms 0.2701ms 3.7027 KOps/s 3.6865 KOps/s $\color{#35bf28}+0.44\%$
test_items_nested_leaf 0.1458ms 77.3302μs 12.9316 KOps/s 12.9325 KOps/s $-0.01\%$
test_items_stack_nested 0.3383ms 0.2712ms 3.6871 KOps/s 3.6914 KOps/s $\color{#d91a1a}-0.12\%$
test_items_stack_nested_leaf 0.1559ms 77.1024μs 12.9698 KOps/s 12.1852 KOps/s $\textbf{\color{#35bf28}+6.44\%}$
test_items_stack_nested_locked 1.3708ms 0.2734ms 3.6578 KOps/s 3.6883 KOps/s $\color{#d91a1a}-0.83\%$
test_keys 18.1540μs 3.8836μs 257.4943 KOps/s 250.9740 KOps/s $\color{#35bf28}+2.60\%$
test_keys_nested 0.2409ms 0.1349ms 7.4139 KOps/s 7.4525 KOps/s $\color{#d91a1a}-0.52\%$
test_keys_nested_locked 0.7060ms 0.1405ms 7.1191 KOps/s 7.0935 KOps/s $\color{#35bf28}+0.36\%$
test_keys_nested_leaf 0.2100ms 0.1152ms 8.6781 KOps/s 8.7151 KOps/s $\color{#d91a1a}-0.42\%$
test_keys_stack_nested 0.2875ms 0.1359ms 7.3588 KOps/s 7.2093 KOps/s $\color{#35bf28}+2.07\%$
test_keys_stack_nested_leaf 0.2059ms 0.1148ms 8.7127 KOps/s 8.6006 KOps/s $\color{#35bf28}+1.30\%$
test_keys_stack_nested_locked 0.2618ms 0.1413ms 7.0765 KOps/s 7.1671 KOps/s $\color{#d91a1a}-1.27\%$
test_values 8.9390μs 1.1500μs 869.5673 KOps/s 869.2098 KOps/s $\color{#35bf28}+0.04\%$
test_values_nested 91.6020μs 50.5258μs 19.7919 KOps/s 19.6348 KOps/s $\color{#35bf28}+0.80\%$
test_values_nested_locked 0.1004ms 50.8488μs 19.6661 KOps/s 19.5522 KOps/s $\color{#35bf28}+0.58\%$
test_values_nested_leaf 90.5700μs 45.8766μs 21.7976 KOps/s 21.6091 KOps/s $\color{#35bf28}+0.87\%$
test_values_stack_nested 97.6330μs 51.0064μs 19.6054 KOps/s 19.6463 KOps/s $\color{#d91a1a}-0.21\%$
test_values_stack_nested_leaf 91.2710μs 46.1064μs 21.6889 KOps/s 21.5593 KOps/s $\color{#35bf28}+0.60\%$
test_values_stack_nested_locked 0.1009ms 51.0062μs 19.6055 KOps/s 19.3876 KOps/s $\color{#35bf28}+1.12\%$
test_membership 10.3290μs 1.3299μs 751.9475 KOps/s 726.9149 KOps/s $\color{#35bf28}+3.44\%$
test_membership_nested 40.4160μs 3.3955μs 294.5068 KOps/s 293.2729 KOps/s $\color{#35bf28}+0.42\%$
test_membership_nested_leaf 21.6310μs 3.4000μs 294.1210 KOps/s 292.1164 KOps/s $\color{#35bf28}+0.69\%$
test_membership_stacked_nested 41.3410μs 3.3950μs 294.5498 KOps/s 282.9978 KOps/s $\color{#35bf28}+4.08\%$
test_membership_stacked_nested_leaf 26.9810μs 3.4170μs 292.6516 KOps/s 284.2080 KOps/s $\color{#35bf28}+2.97\%$
test_membership_nested_last 25.5080μs 4.1883μs 238.7624 KOps/s 240.4607 KOps/s $\color{#d91a1a}-0.71\%$
test_membership_nested_leaf_last 20.6890μs 4.1988μs 238.1646 KOps/s 239.6757 KOps/s $\color{#d91a1a}-0.63\%$
test_membership_stacked_nested_last 21.9010μs 4.2034μs 237.9008 KOps/s 242.4831 KOps/s $\color{#d91a1a}-1.89\%$
test_membership_stacked_nested_leaf_last 20.3490μs 4.1606μs 240.3515 KOps/s 240.4869 KOps/s $\color{#d91a1a}-0.06\%$
test_nested_getleaf 53.1100μs 10.6676μs 93.7419 KOps/s 95.3893 KOps/s $\color{#d91a1a}-1.73\%$
test_nested_get 47.6490μs 10.2403μs 97.6529 KOps/s 101.4319 KOps/s $\color{#d91a1a}-3.73\%$
test_stacked_getleaf 30.1870μs 10.7013μs 93.4468 KOps/s 90.5809 KOps/s $\color{#35bf28}+3.16\%$
test_stacked_get 48.2210μs 10.0375μs 99.6261 KOps/s 100.6660 KOps/s $\color{#d91a1a}-1.03\%$
test_nested_getitemleaf 46.7080μs 11.3771μs 87.8961 KOps/s 89.8128 KOps/s $\color{#d91a1a}-2.13\%$
test_nested_getitem 31.5890μs 10.3815μs 96.3251 KOps/s 96.8545 KOps/s $\color{#d91a1a}-0.55\%$
test_stacked_getitemleaf 45.2750μs 11.3105μs 88.4134 KOps/s 89.7589 KOps/s $\color{#d91a1a}-1.50\%$
test_stacked_getitem 50.3640μs 10.2734μs 97.3386 KOps/s 97.7359 KOps/s $\color{#d91a1a}-0.41\%$
test_lock_nested 46.3967ms 0.3893ms 2.5687 KOps/s 2.9001 KOps/s $\textbf{\color{#d91a1a}-11.43\%}$
test_lock_stack_nested 0.4983ms 0.3161ms 3.1637 KOps/s 3.2064 KOps/s $\color{#d91a1a}-1.33\%$
test_unlock_nested 73.0626ms 0.4172ms 2.3968 KOps/s 2.3700 KOps/s $\color{#35bf28}+1.13\%$
test_unlock_stack_nested 0.6586ms 0.3208ms 3.1175 KOps/s 3.1263 KOps/s $\color{#d91a1a}-0.28\%$
test_flatten_speed 0.3444ms 93.7554μs 10.6661 KOps/s 10.7679 KOps/s $\color{#d91a1a}-0.95\%$
test_unflatten_speed 0.6631ms 0.4038ms 2.4765 KOps/s 2.4621 KOps/s $\color{#35bf28}+0.59\%$
test_common_ops 4.6027ms 0.7100ms 1.4084 KOps/s 1.4512 KOps/s $\color{#d91a1a}-2.95\%$
test_creation 19.9880μs 1.8662μs 535.8560 KOps/s 538.6131 KOps/s $\color{#d91a1a}-0.51\%$
test_creation_empty 48.2400μs 10.0343μs 99.6583 KOps/s 98.0101 KOps/s $\color{#35bf28}+1.68\%$
test_creation_nested_1 40.9370μs 12.7230μs 78.5977 KOps/s 79.9746 KOps/s $\color{#d91a1a}-1.72\%$
test_creation_nested_2 55.8140μs 15.9289μs 62.7791 KOps/s 62.3078 KOps/s $\color{#35bf28}+0.76\%$
test_clone 61.3850μs 13.6961μs 73.0137 KOps/s 76.2110 KOps/s $\color{#d91a1a}-4.20\%$
test_getitem[int] 36.6790μs 11.6859μs 85.5732 KOps/s 89.4407 KOps/s $\color{#d91a1a}-4.32\%$
test_getitem[slice_int] 62.8280μs 23.1488μs 43.1987 KOps/s 43.5609 KOps/s $\color{#d91a1a}-0.83\%$
test_getitem[range] 0.1496ms 42.5617μs 23.4953 KOps/s 24.0318 KOps/s $\color{#d91a1a}-2.23\%$
test_getitem[tuple] 59.5220μs 19.1110μs 52.3258 KOps/s 53.6353 KOps/s $\color{#d91a1a}-2.44\%$
test_getitem[list] 0.1634ms 38.2350μs 26.1541 KOps/s 26.2533 KOps/s $\color{#d91a1a}-0.38\%$
test_setitem_dim[int] 61.7660μs 35.0863μs 28.5012 KOps/s 29.6438 KOps/s $\color{#d91a1a}-3.85\%$
test_setitem_dim[slice_int] 88.5860μs 61.7142μs 16.2037 KOps/s 16.1458 KOps/s $\color{#35bf28}+0.36\%$
test_setitem_dim[range] 0.1417ms 80.9019μs 12.3606 KOps/s 12.7977 KOps/s $\color{#d91a1a}-3.41\%$
test_setitem_dim[tuple] 89.5980μs 51.3648μs 19.4686 KOps/s 20.4427 KOps/s $\color{#d91a1a}-4.76\%$
test_setitem 73.9390μs 20.3916μs 49.0399 KOps/s 51.5327 KOps/s $\color{#d91a1a}-4.84\%$
test_set 73.9390μs 19.6654μs 50.8508 KOps/s 52.1421 KOps/s $\color{#d91a1a}-2.48\%$
test_set_shared 4.5140ms 0.1453ms 6.8832 KOps/s 7.1454 KOps/s $\color{#d91a1a}-3.67\%$
test_update 87.5250μs 21.4967μs 46.5189 KOps/s 49.1197 KOps/s $\textbf{\color{#d91a1a}-5.29\%}$
test_update_nested 82.2040μs 29.8179μs 33.5369 KOps/s 35.0195 KOps/s $\color{#d91a1a}-4.23\%$
test_update__nested 79.2590μs 25.1926μs 39.6941 KOps/s 40.5822 KOps/s $\color{#d91a1a}-2.19\%$
test_set_nested 68.5580μs 21.8545μs 45.7572 KOps/s 48.2058 KOps/s $\textbf{\color{#d91a1a}-5.08\%}$
test_set_nested_new 70.5630μs 25.6161μs 39.0380 KOps/s 39.8565 KOps/s $\color{#d91a1a}-2.05\%$
test_select 94.1160μs 40.5600μs 24.6548 KOps/s 25.1059 KOps/s $\color{#d91a1a}-1.80\%$
test_select_nested 0.8625ms 60.2280μs 16.6036 KOps/s 16.8835 KOps/s $\color{#d91a1a}-1.66\%$
test_exclude_nested 0.2263ms 0.1184ms 8.4426 KOps/s 8.4431 KOps/s $-0.01\%$
test_empty[True] 0.4654ms 0.3884ms 2.5749 KOps/s 2.5495 KOps/s $\color{#35bf28}+1.00\%$
test_empty[False] 7.8488μs 1.0605μs 942.9478 KOps/s 948.3856 KOps/s $\color{#d91a1a}-0.57\%$
test_unbind_speed 1.5905ms 0.2546ms 3.9279 KOps/s 3.9386 KOps/s $\color{#d91a1a}-0.27\%$
test_unbind_speed_stack0 0.4635ms 0.2515ms 3.9755 KOps/s 3.9571 KOps/s $\color{#35bf28}+0.47\%$
test_unbind_speed_stack1 0.1138s 0.6912ms 1.4468 KOps/s 1.4353 KOps/s $\color{#35bf28}+0.80\%$
test_split 1.7293ms 1.5217ms 657.1680 Ops/s 598.8692 Ops/s $\textbf{\color{#35bf28}+9.73\%}$
test_chunk 0.1054s 1.6788ms 595.6663 Ops/s 661.5490 Ops/s $\textbf{\color{#d91a1a}-9.96\%}$
test_creation[device0] 5.6049ms 0.1060ms 9.4365 KOps/s 9.8724 KOps/s $\color{#d91a1a}-4.42\%$
test_creation_from_tensor 0.1797ms 82.7910μs 12.0786 KOps/s 12.3546 KOps/s $\color{#d91a1a}-2.23\%$
test_add_one[memmap_tensor0] 95.4700μs 5.6713μs 176.3265 KOps/s 185.8565 KOps/s $\textbf{\color{#d91a1a}-5.13\%}$
test_contiguous[memmap_tensor0] 13.7760μs 0.6277μs 1.5931 MOps/s 1.5957 MOps/s $\color{#d91a1a}-0.16\%$
test_stack[memmap_tensor0] 19.7370μs 3.7178μs 268.9772 KOps/s 289.3241 KOps/s $\textbf{\color{#d91a1a}-7.03\%}$
test_memmaptd_index 0.9085ms 0.2452ms 4.0776 KOps/s 4.1117 KOps/s $\color{#d91a1a}-0.83\%$
test_memmaptd_index_astensor 0.6869ms 0.3056ms 3.2727 KOps/s 3.2414 KOps/s $\color{#35bf28}+0.97\%$
test_memmaptd_index_op 0.9179ms 0.5972ms 1.6744 KOps/s 1.6742 KOps/s $\color{#35bf28}+0.02\%$
test_serialize_model 0.2261s 0.1167s 8.5659 Ops/s 8.6452 Ops/s $\color{#d91a1a}-0.92\%$
test_serialize_model_pickle 0.4584s 0.3767s 2.6549 Ops/s 2.6254 Ops/s $\color{#35bf28}+1.12\%$
test_serialize_weights 0.1126s 0.1004s 9.9641 Ops/s 9.9147 Ops/s $\color{#35bf28}+0.50\%$
test_serialize_weights_returnearly 0.2426s 0.1364s 7.3289 Ops/s 7.9228 Ops/s $\textbf{\color{#d91a1a}-7.50\%}$
test_serialize_weights_pickle 0.9951s 0.5646s 1.7712 Ops/s 2.3854 Ops/s $\textbf{\color{#d91a1a}-25.75\%}$
test_serialize_weights_filesystem 98.8387ms 90.9870ms 10.9906 Ops/s 9.2639 Ops/s $\textbf{\color{#35bf28}+18.64\%}$
test_serialize_model_filesystem 0.1009s 92.2321ms 10.8422 Ops/s 10.6292 Ops/s $\color{#35bf28}+2.00\%$
test_reshape_pytree 58.0900μs 20.9537μs 47.7243 KOps/s 48.1205 KOps/s $\color{#d91a1a}-0.82\%$
test_reshape_td 70.4930μs 32.2198μs 31.0368 KOps/s 31.6235 KOps/s $\color{#d91a1a}-1.86\%$
test_view_pytree 54.9330μs 20.7368μs 48.2234 KOps/s 47.6439 KOps/s $\color{#35bf28}+1.22\%$
test_view_td 0.1174s 58.3808μs 17.1289 KOps/s 16.2562 KOps/s $\textbf{\color{#35bf28}+5.37\%}$
test_unbind_pytree 60.6630μs 25.0117μs 39.9814 KOps/s 40.4244 KOps/s $\color{#d91a1a}-1.10\%$
test_unbind_td 0.1064ms 37.6417μs 26.5663 KOps/s 26.9859 KOps/s $\color{#d91a1a}-1.56\%$
test_split_pytree 52.1080μs 24.5072μs 40.8044 KOps/s 41.7278 KOps/s $\color{#d91a1a}-2.21\%$
test_split_td 0.1105ms 41.2610μs 24.2360 KOps/s 24.6831 KOps/s $\color{#d91a1a}-1.81\%$
test_add_pytree 73.0370μs 31.0638μs 32.1918 KOps/s 33.2804 KOps/s $\color{#d91a1a}-3.27\%$
test_add_td 95.7900μs 56.4743μs 17.7072 KOps/s 18.0278 KOps/s $\color{#d91a1a}-1.78\%$
test_distributed 0.2319ms 99.1956μs 10.0811 KOps/s 9.9516 KOps/s $\color{#35bf28}+1.30\%$
test_tdmodule 39.3940μs 17.2536μs 57.9589 KOps/s 57.4246 KOps/s $\color{#35bf28}+0.93\%$
test_tdmodule_dispatch 65.8030μs 34.6716μs 28.8420 KOps/s 29.1216 KOps/s $\color{#d91a1a}-0.96\%$
test_tdseq 42.3300μs 20.2441μs 49.3970 KOps/s 50.3398 KOps/s $\color{#d91a1a}-1.87\%$
test_tdseq_dispatch 70.9240μs 39.3434μs 25.4172 KOps/s 26.0131 KOps/s $\color{#d91a1a}-2.29\%$
test_instantiation_functorch 1.5600ms 1.3140ms 761.0140 Ops/s 770.7965 Ops/s $\color{#d91a1a}-1.27\%$
test_instantiation_td 1.5153ms 1.0145ms 985.7409 Ops/s 946.5618 Ops/s $\color{#35bf28}+4.14\%$
test_exec_functorch 0.3077ms 0.1617ms 6.1836 KOps/s 6.3425 KOps/s $\color{#d91a1a}-2.51\%$
test_exec_functional_call 0.3848ms 0.1518ms 6.5875 KOps/s 6.7629 KOps/s $\color{#d91a1a}-2.59\%$
test_exec_td 0.2212ms 0.1473ms 6.7894 KOps/s 6.8925 KOps/s $\color{#d91a1a}-1.50\%$
test_exec_td_decorator 0.8643ms 0.1992ms 5.0206 KOps/s 5.0973 KOps/s $\color{#d91a1a}-1.51\%$
test_vmap_mlp_speed[True-True] 0.7377ms 0.4742ms 2.1089 KOps/s 2.1054 KOps/s $\color{#35bf28}+0.16\%$
test_vmap_mlp_speed[True-False] 0.8242ms 0.4690ms 2.1323 KOps/s 2.1084 KOps/s $\color{#35bf28}+1.13\%$
test_vmap_mlp_speed[False-True] 0.6384ms 0.3980ms 2.5128 KOps/s 2.5884 KOps/s $\color{#d91a1a}-2.92\%$
test_vmap_mlp_speed[False-False] 0.6730ms 0.3839ms 2.6048 KOps/s 2.5949 KOps/s $\color{#35bf28}+0.38\%$
test_vmap_mlp_speed_decorator[True-True] 0.9807ms 0.4921ms 2.0320 KOps/s 2.0229 KOps/s $\color{#35bf28}+0.45\%$
test_vmap_mlp_speed_decorator[True-False] 0.6538ms 0.4907ms 2.0379 KOps/s 2.0204 KOps/s $\color{#35bf28}+0.86\%$
test_vmap_mlp_speed_decorator[False-True] 0.6475ms 0.3983ms 2.5104 KOps/s 2.4803 KOps/s $\color{#35bf28}+1.21\%$
test_vmap_mlp_speed_decorator[False-False] 0.7450ms 0.3997ms 2.5016 KOps/s 2.4886 KOps/s $\color{#35bf28}+0.52\%$
test_to_module_speed[True] 1.4692ms 1.3831ms 723.0249 Ops/s 701.8447 Ops/s $\color{#35bf28}+3.02\%$
test_to_module_speed[False] 1.4436ms 1.3590ms 735.8256 Ops/s 718.7213 Ops/s $\color{#35bf28}+2.38\%$
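
For readers checking the table, the Change column appears to be the relative difference between the Ops column and Ops on Repo HEAD. The sketch below recomputes it for a few rows copied from the table. The 5% cutoff for "improved" and "worsened" is an assumption inferred from which rows are bold and from the 4/9 summary counts. It is not taken from the benchmark bot's configuration, and the function names are illustrative only.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of this PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the table."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "within noise"


# (Ops, Ops on Repo HEAD) pairs copied from the table; each pair shares one unit.
rows = {
    "test_items": (379.2171, 404.7363),
    "test_lock_nested": (2.5687, 2.9001),
    "test_split": (657.1680, 598.8692),
    "test_setitem": (49.0399, 51.5327),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under that assumed threshold, `test_setitem` (-4.84%) falls just short of being counted as worsened, which is consistent with it not being bold in the table.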

vmoens added the bug label Apr 22, 2024
vmoens merged commit 5d3df5b into main Apr 23, 2024
44 of 48 checks passed
vmoens deleted the better-stack branch April 23, 2024 15:12
Labels: BC-breaking, bug, CLA Signed
3 participants