[BugFix] Limit number of threads in workers for .map() #638

Merged: 1 commit merged into main on Jan 24, 2024

@vmoens (Contributor) commented Jan 24, 2024

No description provided.
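
With no description, the title is the only statement of intent: cap the number of threads each worker process may use when work is fanned out by `.map()`. The sketch below is not taken from the tensordict source. It only illustrates the general pattern with a standard `multiprocessing` pool whose initializer applies the cap once per worker. The helper names (`_limit_threads`, `_report`, `run_demo`) are hypothetical, and the `OMP_NUM_THREADS` environment variable is a stand-in so the example runs without third-party packages; with PyTorch available, the call one would normally place in the initializer is `torch.set_num_threads`.

```python
import multiprocessing as mp
import os


def _limit_threads(num_threads: int) -> None:
    # Runs once in every worker process before it handles any task.
    # Stand-in for a real cap: with PyTorch installed, the usual call
    # here would be torch.set_num_threads(num_threads).
    os.environ["OMP_NUM_THREADS"] = str(num_threads)


def _report(_: int) -> str:
    # Each task returns the cap that is visible inside its worker.
    return os.environ["OMP_NUM_THREADS"]


def run_demo(num_workers: int = 2, num_threads: int = 1) -> list:
    # The "fork" start method keeps the sketch runnable as a plain
    # script on POSIX systems (assumption: not running on Windows).
    ctx = mp.get_context("fork")
    with ctx.Pool(
        num_workers,
        initializer=_limit_threads,
        initargs=(num_threads,),
    ) as pool:
        return pool.map(_report, range(4))
```

Applying the cap in the initializer rather than inside each task means it is set once per process, and it leaves the parent process's own thread settings untouched.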

@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jan 24, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 124. Improved: $\large\color{#35bf28}37$. Worsened: $\large\color{#d91a1a}4$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_plain_set_nested 48.2410μs 15.9998μs 62.5006 KOps/s 54.7986 KOps/s $\textbf{\color{#35bf28}+14.06\%}$
test_plain_set_stack_nested 0.1782ms 0.1419ms 7.0451 KOps/s 6.4607 KOps/s $\textbf{\color{#35bf28}+9.05\%}$
test_plain_set_nested_inplace 62.5370μs 18.0239μs 55.4820 KOps/s 50.6606 KOps/s $\textbf{\color{#35bf28}+9.52\%}$
test_plain_set_stack_nested_inplace 0.2961ms 0.1778ms 5.6242 KOps/s 5.1307 KOps/s $\textbf{\color{#35bf28}+9.62\%}$
test_items 12.9250μs 2.3710μs 421.7685 KOps/s 376.7549 KOps/s $\textbf{\color{#35bf28}+11.95\%}$
test_items_nested 0.4645ms 0.2716ms 3.6817 KOps/s 3.6405 KOps/s $\color{#35bf28}+1.13\%$
test_items_nested_locked 0.4906ms 0.2699ms 3.7047 KOps/s 3.6342 KOps/s $\color{#35bf28}+1.94\%$
test_items_nested_leaf 0.5221ms 0.1666ms 6.0022 KOps/s 5.9451 KOps/s $\color{#35bf28}+0.96\%$
test_items_stack_nested 2.0924ms 1.3196ms 757.8235 Ops/s 728.7630 Ops/s $\color{#35bf28}+3.99\%$
test_items_stack_nested_leaf 2.1966ms 1.2056ms 829.4341 Ops/s 812.4826 Ops/s $\color{#35bf28}+2.09\%$
test_items_stack_nested_locked 1.4740ms 0.8639ms 1.1576 KOps/s 1.1169 KOps/s $\color{#35bf28}+3.64\%$
test_keys 19.4970μs 3.9091μs 255.8163 KOps/s 258.3890 KOps/s $\color{#d91a1a}-1.00\%$
test_keys_nested 51.3228ms 0.1583ms 6.3175 KOps/s 6.7278 KOps/s $\textbf{\color{#d91a1a}-6.10\%}$
test_keys_nested_locked 0.2923ms 0.1525ms 6.5592 KOps/s 6.5653 KOps/s $\color{#d91a1a}-0.09\%$
test_keys_nested_leaf 0.2384ms 0.1316ms 7.5981 KOps/s 7.5870 KOps/s $\color{#35bf28}+0.15\%$
test_keys_stack_nested 1.9761ms 1.2599ms 793.7287 Ops/s 754.9971 Ops/s $\textbf{\color{#35bf28}+5.13\%}$
test_keys_stack_nested_leaf 2.1421ms 1.2516ms 798.9588 Ops/s 754.6222 Ops/s $\textbf{\color{#35bf28}+5.88\%}$
test_keys_stack_nested_locked 1.3308ms 0.7871ms 1.2704 KOps/s 1.1881 KOps/s $\textbf{\color{#35bf28}+6.93\%}$
test_values 9.7233μs 1.1605μs 861.7289 KOps/s 860.0241 KOps/s $\color{#35bf28}+0.20\%$
test_values_nested 92.5600μs 52.4736μs 19.0572 KOps/s 19.6201 KOps/s $\color{#d91a1a}-2.87\%$
test_values_nested_locked 0.1076ms 52.1911μs 19.1604 KOps/s 19.7165 KOps/s $\color{#d91a1a}-2.82\%$
test_values_nested_leaf 0.1137ms 46.6121μs 21.4537 KOps/s 21.9615 KOps/s $\color{#d91a1a}-2.31\%$
test_values_stack_nested 1.2691ms 1.0438ms 958.0660 Ops/s 937.8966 Ops/s $\color{#35bf28}+2.15\%$
test_values_stack_nested_leaf 1.1792ms 1.0286ms 972.1581 Ops/s 943.1555 Ops/s $\color{#35bf28}+3.08\%$
test_values_stack_nested_locked 0.7784ms 0.5968ms 1.6755 KOps/s 1.6475 KOps/s $\color{#35bf28}+1.70\%$
test_membership 11.1900μs 1.3365μs 748.2395 KOps/s 744.6730 KOps/s $\color{#35bf28}+0.48\%$
test_membership_nested 26.6800μs 3.4621μs 288.8385 KOps/s 282.2811 KOps/s $\color{#35bf28}+2.32\%$
test_membership_nested_leaf 20.1080μs 3.4619μs 288.8612 KOps/s 274.8820 KOps/s $\textbf{\color{#35bf28}+5.09\%}$
test_membership_stacked_nested 55.9940μs 11.7174μs 85.3432 KOps/s 78.1193 KOps/s $\textbf{\color{#35bf28}+9.25\%}$
test_membership_stacked_nested_leaf 34.0240μs 11.8067μs 84.6980 KOps/s 73.6338 KOps/s $\textbf{\color{#35bf28}+15.03\%}$
test_membership_nested_last 47.0380μs 6.6400μs 150.6033 KOps/s 149.9788 KOps/s $\color{#35bf28}+0.42\%$
test_membership_nested_leaf_last 49.9730μs 6.7281μs 148.6308 KOps/s 147.9757 KOps/s $\color{#35bf28}+0.44\%$
test_membership_stacked_nested_last 0.3139ms 0.1747ms 5.7254 KOps/s 5.5477 KOps/s $\color{#35bf28}+3.20\%$
test_membership_stacked_nested_leaf_last 0.2810ms 14.5718μs 68.6259 KOps/s 66.0890 KOps/s $\color{#35bf28}+3.84\%$
test_nested_getleaf 45.7050μs 10.4908μs 95.3213 KOps/s 93.1436 KOps/s $\color{#35bf28}+2.34\%$
test_nested_get 30.7270μs 10.0350μs 99.6515 KOps/s 98.1711 KOps/s $\color{#35bf28}+1.51\%$
test_stacked_getleaf 0.6153ms 0.3985ms 2.5092 KOps/s 2.4435 KOps/s $\color{#35bf28}+2.69\%$
test_stacked_get 0.5686ms 0.3607ms 2.7722 KOps/s 2.6842 KOps/s $\color{#35bf28}+3.28\%$
test_nested_getitemleaf 42.4600μs 11.8930μs 84.0834 KOps/s 83.4490 KOps/s $\color{#35bf28}+0.76\%$
test_nested_getitem 52.1470μs 11.4404μs 87.4097 KOps/s 86.8074 KOps/s $\color{#35bf28}+0.69\%$
test_stacked_getitemleaf 0.9166ms 0.4001ms 2.4995 KOps/s 2.4248 KOps/s $\color{#35bf28}+3.08\%$
test_stacked_getitem 0.5631ms 0.3703ms 2.7008 KOps/s 2.6843 KOps/s $\color{#35bf28}+0.62\%$
test_lock_nested 0.6990ms 0.3328ms 3.0045 KOps/s 2.9383 KOps/s $\color{#35bf28}+2.25\%$
test_lock_stack_nested 70.7185ms 5.2704ms 189.7403 Ops/s 186.1908 Ops/s $\color{#35bf28}+1.91\%$
test_unlock_nested 0.7455ms 0.3334ms 2.9991 KOps/s 2.4932 KOps/s $\textbf{\color{#35bf28}+20.29\%}$
test_unlock_stack_nested 70.8509ms 5.3876ms 185.6123 Ops/s 179.8277 Ops/s $\color{#35bf28}+3.22\%$
test_flatten_speed 0.7078ms 0.3667ms 2.7269 KOps/s 2.7104 KOps/s $\color{#35bf28}+0.61\%$
test_unflatten_speed 0.5337ms 0.4624ms 2.1628 KOps/s 2.1183 KOps/s $\color{#35bf28}+2.10\%$
test_common_ops 1.1489ms 0.6287ms 1.5906 KOps/s 1.4745 KOps/s $\textbf{\color{#35bf28}+7.87\%}$
test_creation 17.9440μs 1.9088μs 523.8922 KOps/s 531.2583 KOps/s $\color{#d91a1a}-1.39\%$
test_creation_empty 23.8750μs 7.7038μs 129.8056 KOps/s 111.0733 KOps/s $\textbf{\color{#35bf28}+16.86\%}$
test_creation_nested_1 31.8100μs 10.3712μs 96.4210 KOps/s 85.6024 KOps/s $\textbf{\color{#35bf28}+12.64\%}$
test_creation_nested_2 46.7370μs 13.5295μs 73.9128 KOps/s 66.8783 KOps/s $\textbf{\color{#35bf28}+10.52\%}$
test_clone 88.9960μs 12.8792μs 77.6445 KOps/s 76.4470 KOps/s $\color{#35bf28}+1.57\%$
test_getitem[int] 49.8630μs 11.0124μs 90.8067 KOps/s 90.4465 KOps/s $\color{#35bf28}+0.40\%$
test_getitem[slice_int] 59.6120μs 21.9455μs 45.5675 KOps/s 43.4093 KOps/s $\color{#35bf28}+4.97\%$
test_getitem[range] 90.2490μs 39.6951μs 25.1920 KOps/s 23.2619 KOps/s $\textbf{\color{#35bf28}+8.30\%}$
test_getitem[tuple] 51.5570μs 17.8465μs 56.0333 KOps/s 54.3051 KOps/s $\color{#35bf28}+3.18\%$
test_getitem[list] 0.3881ms 34.8161μs 28.7223 KOps/s 26.7472 KOps/s $\textbf{\color{#35bf28}+7.38\%}$
test_setitem_dim[int] 72.5050μs 26.9957μs 37.0429 KOps/s 33.3409 KOps/s $\textbf{\color{#35bf28}+11.10\%}$
test_setitem_dim[slice_int] 87.8150μs 51.5211μs 19.4095 KOps/s 17.3795 KOps/s $\textbf{\color{#35bf28}+11.68\%}$
test_setitem_dim[range] 0.1361ms 68.9732μs 14.4984 KOps/s 13.1685 KOps/s $\textbf{\color{#35bf28}+10.10\%}$
test_setitem_dim[tuple] 76.4430μs 41.4102μs 24.1486 KOps/s 21.6863 KOps/s $\textbf{\color{#35bf28}+11.35\%}$
test_setitem 0.1286ms 17.4908μs 57.1730 KOps/s 52.4268 KOps/s $\textbf{\color{#35bf28}+9.05\%}$
test_set 86.3620μs 16.8985μs 59.1769 KOps/s 54.0410 KOps/s $\textbf{\color{#35bf28}+9.50\%}$
test_set_shared 3.0789ms 0.1447ms 6.9128 KOps/s 6.9692 KOps/s $\color{#d91a1a}-0.81\%$
test_update 93.0540μs 18.7601μs 53.3047 KOps/s 47.4821 KOps/s $\textbf{\color{#35bf28}+12.26\%}$
test_update_nested 0.1004ms 25.8360μs 38.7057 KOps/s 34.6340 KOps/s $\textbf{\color{#35bf28}+11.76\%}$
test_set_nested 78.9680μs 18.7825μs 53.2410 KOps/s 49.0411 KOps/s $\textbf{\color{#35bf28}+8.56\%}$
test_set_nested_new 98.9550μs 22.9877μs 43.5015 KOps/s 40.9623 KOps/s $\textbf{\color{#35bf28}+6.20\%}$
test_select 0.1090ms 36.3046μs 27.5447 KOps/s 25.9725 KOps/s $\textbf{\color{#35bf28}+6.05\%}$
test_select_nested 0.1208ms 57.7477μs 17.3167 KOps/s 17.3298 KOps/s $\color{#d91a1a}-0.08\%$
test_exclude_nested 0.2564ms 0.1173ms 8.5234 KOps/s 8.3383 KOps/s $\color{#35bf28}+2.22\%$
test_empty[True] 0.4862ms 0.4096ms 2.4412 KOps/s 2.3703 KOps/s $\color{#35bf28}+2.99\%$
test_empty[False] 9.0610μs 1.0935μs 914.5257 KOps/s 883.2895 KOps/s $\color{#35bf28}+3.54\%$
test_unbind_speed 0.3558ms 0.2424ms 4.1246 KOps/s 3.9851 KOps/s $\color{#35bf28}+3.50\%$
test_unbind_speed_stack0 70.3489ms 3.2225ms 310.3138 Ops/s 330.2691 Ops/s $\textbf{\color{#d91a1a}-6.04\%}$
test_unbind_speed_stack1 35.6160μs 1.9100μs 523.5516 KOps/s 508.7441 KOps/s $\color{#35bf28}+2.91\%$
test_split 66.6641ms 1.6245ms 615.5661 Ops/s 629.2814 Ops/s $\color{#d91a1a}-2.18\%$
test_chunk 68.6627ms 1.5643ms 639.2590 Ops/s 639.1402 Ops/s $\color{#35bf28}+0.02\%$
test_creation[device0] 0.2117ms 0.1004ms 9.9579 KOps/s 10.0033 KOps/s $\color{#d91a1a}-0.45\%$
test_creation_from_tensor 3.5889ms 81.0833μs 12.3330 KOps/s 12.2684 KOps/s $\color{#35bf28}+0.53\%$
test_add_one[memmap_tensor0] 0.3647ms 5.3486μs 186.9634 KOps/s 180.4440 KOps/s $\color{#35bf28}+3.61\%$
test_contiguous[memmap_tensor0] 24.1160μs 0.6322μs 1.5817 MOps/s 1.5651 MOps/s $\color{#35bf28}+1.06\%$
test_stack[memmap_tensor0] 68.9790μs 3.4318μs 291.3906 KOps/s 277.5042 KOps/s $\textbf{\color{#35bf28}+5.00\%}$
test_memmaptd_index 0.8507ms 0.2198ms 4.5498 KOps/s 4.4305 KOps/s $\color{#35bf28}+2.69\%$
test_memmaptd_index_astensor 0.5201ms 0.2779ms 3.5984 KOps/s 3.4616 KOps/s $\color{#35bf28}+3.95\%$
test_memmaptd_index_op 1.1999ms 0.5317ms 1.8807 KOps/s 1.8054 KOps/s $\color{#35bf28}+4.17\%$
test_serialize_model 0.1692s 0.1043s 9.5841 Ops/s 9.0928 Ops/s $\textbf{\color{#35bf28}+5.40\%}$
test_serialize_model_pickle 0.4555s 0.3790s 2.6388 Ops/s 2.6165 Ops/s $\color{#35bf28}+0.85\%$
test_serialize_weights 0.1589s 0.1038s 9.6310 Ops/s 9.3916 Ops/s $\color{#35bf28}+2.55\%$
test_serialize_weights_returnearly 0.3174s 0.1472s 6.7915 Ops/s 7.5604 Ops/s $\textbf{\color{#d91a1a}-10.17\%}$
test_serialize_weights_pickle 0.8916s 0.5208s 1.9202 Ops/s 2.4169 Ops/s $\textbf{\color{#d91a1a}-20.55\%}$
test_serialize_weights_filesystem 0.1484s 95.4235ms 10.4796 Ops/s 10.9124 Ops/s $\color{#d91a1a}-3.97\%$
test_serialize_model_filesystem 99.9152ms 90.9125ms 10.9996 Ops/s 9.6846 Ops/s $\textbf{\color{#35bf28}+13.58\%}$
test_reshape_pytree 68.0200μs 22.9886μs 43.4999 KOps/s 43.2926 KOps/s $\color{#35bf28}+0.48\%$
test_reshape_td 71.2130μs 29.7901μs 33.5682 KOps/s 33.5645 KOps/s $\color{#35bf28}+0.01\%$
test_view_pytree 60.0230μs 22.9333μs 43.6048 KOps/s 42.6318 KOps/s $\color{#35bf28}+2.28\%$
test_view_td 38.6820μs 4.8373μs 206.7276 KOps/s 199.6755 KOps/s $\color{#35bf28}+3.53\%$
test_unbind_pytree 86.7630μs 26.1760μs 38.2030 KOps/s 37.5254 KOps/s $\color{#35bf28}+1.81\%$
test_unbind_td 0.4888ms 35.6288μs 28.0672 KOps/s 27.5371 KOps/s $\color{#35bf28}+1.92\%$
test_split_pytree 81.1710μs 26.2772μs 38.0558 KOps/s 37.9291 KOps/s $\color{#35bf28}+0.33\%$
test_split_td 0.1178ms 39.8682μs 25.0826 KOps/s 24.5838 KOps/s $\color{#35bf28}+2.03\%$
test_add_pytree 81.9740μs 32.2038μs 31.0522 KOps/s 30.2658 KOps/s $\color{#35bf28}+2.60\%$
test_add_td 0.1174ms 45.7201μs 21.8722 KOps/s 20.1138 KOps/s $\textbf{\color{#35bf28}+8.74\%}$
test_distributed 0.1808ms 97.8180μs 10.2231 KOps/s 9.9139 KOps/s $\color{#35bf28}+3.12\%$
test_tdmodule 0.1040ms 20.5742μs 48.6045 KOps/s 44.2514 KOps/s $\textbf{\color{#35bf28}+9.84\%}$
test_tdmodule_dispatch 0.2583ms 40.3205μs 24.8013 KOps/s 23.5297 KOps/s $\textbf{\color{#35bf28}+5.40\%}$
test_tdseq 46.4160μs 24.0803μs 41.5276 KOps/s 39.9060 KOps/s $\color{#35bf28}+4.06\%$
test_tdseq_dispatch 0.1386ms 44.2809μs 22.5831 KOps/s 21.0886 KOps/s $\textbf{\color{#35bf28}+7.09\%}$
test_instantiation_functorch 1.8414ms 1.3054ms 766.0606 Ops/s 758.9714 Ops/s $\color{#35bf28}+0.93\%$
test_instantiation_td 1.5301ms 1.0126ms 987.5982 Ops/s 988.0826 Ops/s $\color{#d91a1a}-0.05\%$
test_exec_functorch 0.2956ms 0.1565ms 6.3912 KOps/s 6.2830 KOps/s $\color{#35bf28}+1.72\%$
test_exec_functional_call 0.2893ms 0.1468ms 6.8101 KOps/s 6.6942 KOps/s $\color{#35bf28}+1.73\%$
test_exec_td 0.2829ms 0.1437ms 6.9583 KOps/s 6.7897 KOps/s $\color{#35bf28}+2.48\%$
test_exec_td_decorator 0.8709ms 0.1804ms 5.5436 KOps/s 5.5794 KOps/s $\color{#d91a1a}-0.64\%$
test_vmap_mlp_speed[True-True] 1.1845ms 0.8640ms 1.1573 KOps/s 1.1031 KOps/s $\color{#35bf28}+4.91\%$
test_vmap_mlp_speed[True-False] 0.6725ms 0.4597ms 2.1755 KOps/s 2.1138 KOps/s $\color{#35bf28}+2.92\%$
test_vmap_mlp_speed[False-True] 1.1628ms 0.7648ms 1.3075 KOps/s 1.2499 KOps/s $\color{#35bf28}+4.61\%$
test_vmap_mlp_speed[False-False] 0.5364ms 0.3797ms 2.6335 KOps/s 2.5584 KOps/s $\color{#35bf28}+2.93\%$
test_vmap_mlp_speed_decorator[True-True] 69.0421ms 2.3836ms 419.5325 Ops/s 420.6313 Ops/s $\color{#d91a1a}-0.26\%$
test_vmap_mlp_speed_decorator[True-False] 0.8169ms 0.5156ms 1.9393 KOps/s 1.9015 KOps/s $\color{#35bf28}+1.99\%$
test_vmap_mlp_speed_decorator[False-True] 2.3915ms 1.8185ms 549.8926 Ops/s 518.4389 Ops/s $\textbf{\color{#35bf28}+6.07\%}$
test_vmap_mlp_speed_decorator[False-False] 0.7704ms 0.3982ms 2.5114 KOps/s 2.4879 KOps/s $\color{#35bf28}+0.95\%$
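
The bot does not state how the Change column is derived. The figures are consistent with a relative difference between the PR's throughput and the throughput on repo HEAD, and with Ops being the reciprocal of Mean. The sketch below checks that reading against two CPU rows from the table; the function names are hypothetical and the formulas are an inference from the numbers, not the benchmark tool's documented definition.

```python
def percent_change(ops_pr: float, ops_head: float) -> float:
    # Relative throughput difference of the PR versus repo HEAD, in percent.
    return (ops_pr - ops_head) / ops_head * 100.0


def ops_per_second(mean_seconds: float) -> float:
    # Throughput implied by a mean time per call.
    return 1.0 / mean_seconds


# test_plain_set_nested (CPU): 62.5006 KOps/s vs 54.7986 KOps/s on HEAD
print(f"{percent_change(62.5006, 54.7986):+.2f}%")

# test_serialize_weights_pickle (CPU): 1.9202 Ops/s vs 2.4169 Ops/s on HEAD
print(f"{percent_change(1.9202, 2.4169):+.2f}%")

# test_plain_set_nested mean of 15.9998 microseconds, expressed in KOps/s
print(f"{ops_per_second(15.9998e-6) / 1e3:.2f} KOps/s")
```

Under this reading, a positive Change means the PR is faster than HEAD, so the bold red entries are the regressions worth a second look.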

@vmoens added the bug label (Something isn't working) on Jan 24, 2024
@vmoens merged commit 4c746fd into main on Jan 24, 2024
40 of 44 checks passed

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 132. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}18$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_plain_set_nested 0.6514ms 14.8121μs 67.5122 KOps/s 74.0097 KOps/s $\textbf{\color{#d91a1a}-8.78\%}$
test_plain_set_stack_nested 0.1645ms 0.1214ms 8.2361 KOps/s 8.3208 KOps/s $\color{#d91a1a}-1.02\%$
test_plain_set_nested_inplace 34.6320μs 16.1540μs 61.9041 KOps/s 67.2996 KOps/s $\textbf{\color{#d91a1a}-8.02\%}$
test_plain_set_stack_nested_inplace 0.2166ms 0.1506ms 6.6404 KOps/s 6.6764 KOps/s $\color{#d91a1a}-0.54\%$
test_items 20.9010μs 4.7669μs 209.7799 KOps/s 211.5981 KOps/s $\color{#d91a1a}-0.86\%$
test_items_nested 0.4128ms 0.3452ms 2.8966 KOps/s 2.8759 KOps/s $\color{#35bf28}+0.72\%$
test_items_nested_locked 0.4247ms 0.3465ms 2.8862 KOps/s 2.8583 KOps/s $\color{#35bf28}+0.98\%$
test_items_nested_leaf 0.2818ms 0.2037ms 4.9102 KOps/s 4.8894 KOps/s $\color{#35bf28}+0.43\%$
test_items_stack_nested 1.4481ms 1.3510ms 740.2167 Ops/s 739.3137 Ops/s $\color{#35bf28}+0.12\%$
test_items_stack_nested_leaf 1.2683ms 1.1761ms 850.2512 Ops/s 845.0361 Ops/s $\color{#35bf28}+0.62\%$
test_items_stack_nested_locked 1.9767ms 0.9283ms 1.0773 KOps/s 1.0825 KOps/s $\color{#d91a1a}-0.48\%$
test_keys 25.7300μs 4.7685μs 209.7111 KOps/s 217.7950 KOps/s $\color{#d91a1a}-3.71\%$
test_keys_nested 0.4810ms 95.0969μs 10.5156 KOps/s 10.5191 KOps/s $\color{#d91a1a}-0.03\%$
test_keys_nested_locked 0.1233ms 98.8030μs 10.1211 KOps/s 10.1250 KOps/s $\color{#d91a1a}-0.04\%$
test_keys_nested_leaf 0.1808ms 78.7316μs 12.7014 KOps/s 12.5779 KOps/s $\color{#35bf28}+0.98\%$
test_keys_stack_nested 1.2743ms 1.1784ms 848.6267 Ops/s 852.0999 Ops/s $\color{#d91a1a}-0.41\%$
test_keys_stack_nested_leaf 1.2687ms 1.1717ms 853.4668 Ops/s 858.0167 Ops/s $\color{#d91a1a}-0.53\%$
test_keys_stack_nested_locked 0.8479ms 0.7507ms 1.3320 KOps/s 1.3296 KOps/s $\color{#35bf28}+0.18\%$
test_values 8.5567μs 1.9076μs 524.2316 KOps/s 528.5651 KOps/s $\color{#d91a1a}-0.82\%$
test_values_nested 67.4410μs 45.8945μs 21.7891 KOps/s 22.0990 KOps/s $\color{#d91a1a}-1.40\%$
test_values_nested_locked 69.4010μs 47.6624μs 20.9809 KOps/s 21.0460 KOps/s $\color{#d91a1a}-0.31\%$
test_values_nested_leaf 62.5710μs 39.7297μs 25.1701 KOps/s 25.1507 KOps/s $\color{#35bf28}+0.08\%$
test_values_stack_nested 1.3992ms 0.9911ms 1.0090 KOps/s 996.8528 Ops/s $\color{#35bf28}+1.22\%$
test_values_stack_nested_leaf 1.1085ms 0.9947ms 1.0054 KOps/s 1.0112 KOps/s $\color{#d91a1a}-0.58\%$
test_values_stack_nested_locked 0.6824ms 0.6020ms 1.6610 KOps/s 1.6722 KOps/s $\color{#d91a1a}-0.67\%$
test_membership 4.3462μs 0.9366μs 1.0677 MOps/s 1.0668 MOps/s $\color{#35bf28}+0.08\%$
test_membership_nested 29.5500μs 2.8686μs 348.6017 KOps/s 350.8112 KOps/s $\color{#d91a1a}-0.63\%$
test_membership_nested_leaf 26.2700μs 2.8736μs 347.9977 KOps/s 346.8671 KOps/s $\color{#35bf28}+0.33\%$
test_membership_stacked_nested 54.5510μs 11.6325μs 85.9657 KOps/s 85.3086 KOps/s $\color{#35bf28}+0.77\%$
test_membership_stacked_nested_leaf 30.5210μs 11.6283μs 85.9969 KOps/s 86.0874 KOps/s $\color{#d91a1a}-0.11\%$
test_membership_nested_last 21.1110μs 5.2593μs 190.1397 KOps/s 191.7591 KOps/s $\color{#d91a1a}-0.84\%$
test_membership_nested_leaf_last 20.7310μs 5.2640μs 189.9711 KOps/s 191.2049 KOps/s $\color{#d91a1a}-0.65\%$
test_membership_stacked_nested_last 0.2423ms 0.1568ms 6.3775 KOps/s 6.4116 KOps/s $\color{#d91a1a}-0.53\%$
test_membership_stacked_nested_leaf_last 39.1100μs 13.4596μs 74.2963 KOps/s 74.7339 KOps/s $\color{#d91a1a}-0.59\%$
test_nested_getleaf 29.1500μs 8.4457μs 118.4029 KOps/s 119.0714 KOps/s $\color{#d91a1a}-0.56\%$
test_nested_get 28.4500μs 7.9616μs 125.6028 KOps/s 126.6091 KOps/s $\color{#d91a1a}-0.79\%$
test_stacked_getleaf 0.4052ms 0.3361ms 2.9756 KOps/s 2.9845 KOps/s $\color{#d91a1a}-0.30\%$
test_stacked_get 0.3564ms 0.2994ms 3.3398 KOps/s 3.3452 KOps/s $\color{#d91a1a}-0.16\%$
test_nested_getitemleaf 25.0000μs 9.8807μs 101.2076 KOps/s 101.3146 KOps/s $\color{#d91a1a}-0.11\%$
test_nested_getitem 31.2300μs 9.3963μs 106.4253 KOps/s 106.4302 KOps/s $-0.00\%$
test_stacked_getitemleaf 0.3932ms 0.3404ms 2.9381 KOps/s 2.9421 KOps/s $\color{#d91a1a}-0.14\%$
test_stacked_getitem 0.3555ms 0.3013ms 3.3188 KOps/s 3.3089 KOps/s $\color{#35bf28}+0.30\%$
test_lock_nested 0.7959ms 0.3601ms 2.7774 KOps/s 2.7667 KOps/s $\color{#35bf28}+0.38\%$
test_lock_stack_nested 84.0660ms 6.3094ms 158.4943 Ops/s 159.0229 Ops/s $\color{#d91a1a}-0.33\%$
test_unlock_nested 0.7883ms 0.3591ms 2.7848 KOps/s 2.3023 KOps/s $\textbf{\color{#35bf28}+20.96\%}$
test_unlock_stack_nested 82.8834ms 6.4013ms 156.2183 Ops/s 157.3844 Ops/s $\color{#d91a1a}-0.74\%$
test_flatten_speed 0.5255ms 0.2679ms 3.7326 KOps/s 3.7459 KOps/s $\color{#d91a1a}-0.36\%$
test_unflatten_speed 0.4263ms 0.3637ms 2.7497 KOps/s 2.7468 KOps/s $\color{#35bf28}+0.10\%$
test_common_ops 1.2675ms 0.6727ms 1.4865 KOps/s 1.6043 KOps/s $\textbf{\color{#d91a1a}-7.34\%}$
test_creation 42.5610μs 1.5518μs 644.4102 KOps/s 630.0611 KOps/s $\color{#35bf28}+2.28\%$
test_creation_empty 38.6800μs 10.3482μs 96.6352 KOps/s 126.6962 KOps/s $\textbf{\color{#d91a1a}-23.73\%}$
test_creation_nested_1 31.3900μs 12.1126μs 82.5584 KOps/s 103.9835 KOps/s $\textbf{\color{#d91a1a}-20.60\%}$
test_creation_nested_2 32.5500μs 14.5462μs 68.7464 KOps/s 82.8264 KOps/s $\textbf{\color{#d91a1a}-17.00\%}$
test_clone 0.1590ms 14.9495μs 66.8920 KOps/s 68.7666 KOps/s $\color{#d91a1a}-2.73\%$
test_getitem[int] 36.9310μs 11.5541μs 86.5492 KOps/s 86.5429 KOps/s $+0.01\%$
test_getitem[slice_int] 45.2310μs 21.9761μs 45.5039 KOps/s 44.2766 KOps/s $\color{#35bf28}+2.77\%$
test_getitem[range] 69.2410μs 38.4036μs 26.0392 KOps/s 25.8335 KOps/s $\color{#35bf28}+0.80\%$
test_getitem[tuple] 45.9900μs 19.5334μs 51.1943 KOps/s 50.4676 KOps/s $\color{#35bf28}+1.44\%$
test_getitem[list] 71.1010μs 34.2663μs 29.1832 KOps/s 26.9678 KOps/s $\textbf{\color{#35bf28}+8.22\%}$
test_setitem_dim[int] 55.8610μs 30.7432μs 32.5275 KOps/s 34.6015 KOps/s $\textbf{\color{#d91a1a}-5.99\%}$
test_setitem_dim[slice_int] 75.8510μs 52.4024μs 19.0831 KOps/s 18.9924 KOps/s $\color{#35bf28}+0.48\%$
test_setitem_dim[range] 0.1031ms 66.5795μs 15.0196 KOps/s 14.6027 KOps/s $\color{#35bf28}+2.86\%$
test_setitem_dim[tuple] 76.6810μs 45.3207μs 22.0650 KOps/s 22.4092 KOps/s $\color{#d91a1a}-1.54\%$
test_setitem 0.1594ms 20.7362μs 48.2247 KOps/s 47.6637 KOps/s $\color{#35bf28}+1.18\%$
test_set 0.1389ms 20.1837μs 49.5450 KOps/s 50.7756 KOps/s $\color{#d91a1a}-2.42\%$
test_set_shared 2.8093ms 0.1077ms 9.2840 KOps/s 9.2764 KOps/s $\color{#35bf28}+0.08\%$
test_update 0.1166ms 23.5378μs 42.4848 KOps/s 46.7591 KOps/s $\textbf{\color{#d91a1a}-9.14\%}$
test_update_nested 0.1745ms 30.4847μs 32.8034 KOps/s 34.5282 KOps/s $\color{#d91a1a}-5.00\%$
test_set_nested 66.3210μs 21.5678μs 46.3654 KOps/s 49.0645 KOps/s $\textbf{\color{#d91a1a}-5.50\%}$
test_set_nested_new 0.1340ms 24.5943μs 40.6598 KOps/s 42.0934 KOps/s $\color{#d91a1a}-3.41\%$
test_select 62.8110μs 37.3846μs 26.7490 KOps/s 26.5357 KOps/s $\color{#35bf28}+0.80\%$
test_select_nested 71.5920μs 53.0315μs 18.8567 KOps/s 18.8053 KOps/s $\color{#35bf28}+0.27\%$
test_exclude_nested 0.1472ms 0.1127ms 8.8751 KOps/s 8.8579 KOps/s $\color{#35bf28}+0.19\%$
test_empty[True] 0.4540ms 0.3895ms 2.5672 KOps/s 2.5907 KOps/s $\color{#d91a1a}-0.91\%$
test_empty[False] 2.8470μs 0.8675μs 1.1528 MOps/s 1.1571 MOps/s $\color{#d91a1a}-0.38\%$
test_to 73.8910μs 55.2708μs 18.0927 KOps/s 18.0021 KOps/s $\color{#35bf28}+0.50\%$
test_to_nonblocking 84.1110μs 35.6125μs 28.0800 KOps/s 28.1722 KOps/s $\color{#d91a1a}-0.33\%$
test_unbind_speed 0.3458ms 0.2775ms 3.6031 KOps/s 3.6748 KOps/s $\color{#d91a1a}-1.95\%$
test_unbind_speed_stack0 81.1868ms 3.7485ms 266.7717 Ops/s 263.7376 Ops/s $\color{#35bf28}+1.15\%$
test_unbind_speed_stack1 46.1120μs 1.8795μs 532.0616 KOps/s 541.8418 KOps/s $\color{#d91a1a}-1.80\%$
test_split 2.2329ms 1.6311ms 613.0752 Ops/s 557.4659 Ops/s $\textbf{\color{#35bf28}+9.98\%}$
test_chunk 76.6039ms 1.7542ms 570.0626 Ops/s 576.6036 Ops/s $\color{#d91a1a}-1.13\%$
test_creation[device0] 0.1399ms 74.2811μs 13.4624 KOps/s 13.5914 KOps/s $\color{#d91a1a}-0.95\%$
test_creation_from_tensor 0.1265ms 55.1898μs 18.1193 KOps/s 18.4123 KOps/s $\color{#d91a1a}-1.59\%$
test_add_one[memmap_tensor0] 0.1350ms 7.9930μs 125.1100 KOps/s 127.4107 KOps/s $\color{#d91a1a}-1.81\%$
test_contiguous[memmap_tensor0] 11.8000μs 0.6440μs 1.5528 MOps/s 1.5420 MOps/s $\color{#35bf28}+0.70\%$
test_stack[memmap_tensor0] 30.7900μs 4.8325μs 206.9306 KOps/s 208.4729 KOps/s $\color{#d91a1a}-0.74\%$
test_memmaptd_index 1.0458ms 0.2747ms 3.6402 KOps/s 3.7177 KOps/s $\color{#d91a1a}-2.08\%$
test_memmaptd_index_astensor 0.6353ms 0.3273ms 3.0555 KOps/s 3.0816 KOps/s $\color{#d91a1a}-0.85\%$
test_memmaptd_index_op 1.0995ms 0.6825ms 1.4653 KOps/s 1.5731 KOps/s $\textbf{\color{#d91a1a}-6.85\%}$
test_serialize_model 0.1713s 98.6042ms 10.1416 Ops/s 10.6871 Ops/s $\textbf{\color{#d91a1a}-5.11\%}$
test_serialize_model_pickle 1.3537s 1.2373s 0.8082 Ops/s 0.8055 Ops/s $\color{#35bf28}+0.33\%$
test_serialize_weights 0.1666s 96.3373ms 10.3802 Ops/s 10.0594 Ops/s $\color{#35bf28}+3.19\%$
test_serialize_weights_returnearly 0.2599s 81.7552ms 12.2316 Ops/s 14.2611 Ops/s $\textbf{\color{#d91a1a}-14.23\%}$
test_serialize_weights_pickle 1.3484s 1.2377s 0.8079 Ops/s 0.8083 Ops/s $\color{#d91a1a}-0.04\%$
test_reshape_pytree 58.6810μs 25.6503μs 38.9860 KOps/s 39.5798 KOps/s $\color{#d91a1a}-1.50\%$
test_reshape_td 0.2896ms 30.8126μs 32.4543 KOps/s 33.7042 KOps/s $\color{#d91a1a}-3.71\%$
test_view_pytree 54.6500μs 25.1815μs 39.7117 KOps/s 40.2753 KOps/s $\color{#d91a1a}-1.40\%$
test_view_td 27.1310μs 4.3147μs 231.7679 KOps/s 231.5113 KOps/s $\color{#35bf28}+0.11\%$
test_unbind_pytree 0.2238ms 31.8164μs 31.4303 KOps/s 32.2123 KOps/s $\color{#d91a1a}-2.43\%$
test_unbind_td 0.5065ms 42.2490μs 23.6692 KOps/s 24.3883 KOps/s $\color{#d91a1a}-2.95\%$
test_split_pytree 53.7210μs 30.1316μs 33.1878 KOps/s 33.6857 KOps/s $\color{#d91a1a}-1.48\%$
test_split_td 0.5422ms 40.8122μs 24.5025 KOps/s 24.5412 KOps/s $\color{#d91a1a}-0.16\%$
test_add_pytree 66.6710μs 38.9536μs 25.6715 KOps/s 26.4233 KOps/s $\color{#d91a1a}-2.84\%$
test_add_td 0.2785ms 56.3453μs 17.7477 KOps/s 20.0792 KOps/s $\textbf{\color{#d91a1a}-11.61\%}$
test_distributed 1.9386ms 72.7804μs 13.7400 KOps/s 13.3631 KOps/s $\color{#35bf28}+2.82\%$
test_tdmodule 40.5310μs 19.2959μs 51.8245 KOps/s 56.4623 KOps/s $\textbf{\color{#d91a1a}-8.21\%}$
test_tdmodule_dispatch 0.2391ms 40.6789μs 24.5828 KOps/s 27.7649 KOps/s $\textbf{\color{#d91a1a}-11.46\%}$
test_tdseq 38.0600μs 22.4111μs 44.6208 KOps/s 48.3888 KOps/s $\textbf{\color{#d91a1a}-7.79\%}$
test_tdseq_dispatch 71.9610μs 42.5842μs 23.4829 KOps/s 25.5123 KOps/s $\textbf{\color{#d91a1a}-7.95\%}$
test_instantiation_functorch 1.8622ms 1.7251ms 579.6725 Ops/s 579.2599 Ops/s $\color{#35bf28}+0.07\%$
test_instantiation_td 1.8350ms 1.1959ms 836.1831 Ops/s 842.3804 Ops/s $\color{#d91a1a}-0.74\%$
test_exec_functorch 0.2068ms 0.1676ms 5.9661 KOps/s 5.9259 KOps/s $\color{#35bf28}+0.68\%$
test_exec_functional_call 0.2328ms 0.1707ms 5.8584 KOps/s 5.9465 KOps/s $\color{#d91a1a}-1.48\%$
test_exec_td 0.2156ms 0.1652ms 6.0528 KOps/s 6.2574 KOps/s $\color{#d91a1a}-3.27\%$
test_exec_td_decorator 0.8846ms 0.1981ms 5.0475 KOps/s 5.0605 KOps/s $\color{#d91a1a}-0.26\%$
test_vmap_mlp_speed[True-True] 1.1771ms 1.0918ms 915.9018 Ops/s 933.9979 Ops/s $\color{#d91a1a}-1.94\%$
test_vmap_mlp_speed[True-False] 0.7217ms 0.6320ms 1.5823 KOps/s 1.6116 KOps/s $\color{#d91a1a}-1.82\%$
test_vmap_mlp_speed[False-True] 1.0831ms 0.9989ms 1.0011 KOps/s 1.0128 KOps/s $\color{#d91a1a}-1.15\%$
test_vmap_mlp_speed[False-False] 0.6352ms 0.5616ms 1.7807 KOps/s 1.8149 KOps/s $\color{#d91a1a}-1.88\%$
test_vmap_mlp_speed_decorator[True-True] 2.9707ms 2.4401ms 409.8271 Ops/s 422.3000 Ops/s $\color{#d91a1a}-2.95\%$
test_vmap_mlp_speed_decorator[True-False] 1.0849ms 0.6749ms 1.4818 KOps/s 1.4864 KOps/s $\color{#d91a1a}-0.31\%$
test_vmap_mlp_speed_decorator[False-True] 2.4839ms 2.0341ms 491.6083 Ops/s 497.7863 Ops/s $\color{#d91a1a}-1.24\%$
test_vmap_mlp_speed_decorator[False-False] 1.0333ms 0.5746ms 1.7403 KOps/s 1.7553 KOps/s $\color{#d91a1a}-0.86\%$
test_vmap_transformer_speed[True-True] 13.3241ms 12.8107ms 78.0599 Ops/s 78.6799 Ops/s $\color{#d91a1a}-0.79\%$
test_vmap_transformer_speed[True-False] 8.6482ms 8.3282ms 120.0746 Ops/s 120.3795 Ops/s $\color{#d91a1a}-0.25\%$
test_vmap_transformer_speed[False-True] 13.1566ms 12.7216ms 78.6062 Ops/s 79.7476 Ops/s $\color{#d91a1a}-1.43\%$
test_vmap_transformer_speed[False-False] 8.5945ms 8.2315ms 121.4846 Ops/s 122.0227 Ops/s $\color{#d91a1a}-0.44\%$
test_vmap_transformer_speed_decorator[True-True] 78.1087ms 75.4882ms 13.2471 Ops/s 13.4497 Ops/s $\color{#d91a1a}-1.51\%$
test_vmap_transformer_speed_decorator[True-False] 22.3113ms 20.2458ms 49.3930 Ops/s 49.9900 Ops/s $\color{#d91a1a}-1.19\%$
test_vmap_transformer_speed_decorator[False-True] 69.6092ms 68.7264ms 14.5504 Ops/s 14.8138 Ops/s $\color{#d91a1a}-1.78\%$
test_vmap_transformer_speed_decorator[False-False] 0.1144s 21.6380ms 46.2150 Ops/s 51.0099 Ops/s $\textbf{\color{#d91a1a}-9.40\%}$

@vmoens deleted the threads_workers_map branch on October 21, 2024, 14:02
2 participants