[Performance] Faster update_ #705

Merged
merged 7 commits into from
Mar 8, 2024

Conversation

vmoens
Contributor

@vmoens vmoens commented Mar 8, 2024

No description provided.

@facebook-github-bot added the CLA Signed label Mar 8, 2024

github-actions bot commented Mar 8, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}6$. Worsened: $\large\color{#d91a1a}28$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 36.6080μs | 16.7742μs | 59.6155 KOps/s | 65.3912 KOps/s | $\textbf{\color{#d91a1a}-8.83\%}$ |
| test_plain_set_stack_nested | 47.2080μs | 17.0917μs | 58.5079 KOps/s | 64.3961 KOps/s | $\textbf{\color{#d91a1a}-9.14\%}$ |
| test_plain_set_nested_inplace | 51.9970μs | 19.5930μs | 51.0386 KOps/s | 55.9707 KOps/s | $\textbf{\color{#d91a1a}-8.81\%}$ |
| test_plain_set_stack_nested_inplace | 51.0450μs | 19.6943μs | 50.7762 KOps/s | 55.3362 KOps/s | $\textbf{\color{#d91a1a}-8.24\%}$ |
| test_items | 26.1290μs | 2.4454μs | 408.9262 KOps/s | 417.6913 KOps/s | $\color{#d91a1a}-2.10\%$ |
| test_items_nested | 0.4140ms | 0.2752ms | 3.6331 KOps/s | 3.5859 KOps/s | $\color{#35bf28}+1.32\%$ |
| test_items_nested_locked | 1.7412ms | 0.2749ms | 3.6377 KOps/s | 3.7054 KOps/s | $\color{#d91a1a}-1.83\%$ |
| test_items_nested_leaf | 0.6529ms | 0.1692ms | 5.9107 KOps/s | 6.0357 KOps/s | $\color{#d91a1a}-2.07\%$ |
| test_items_stack_nested | 0.4051ms | 0.2740ms | 3.6501 KOps/s | 3.6980 KOps/s | $\color{#d91a1a}-1.29\%$ |
| test_items_stack_nested_leaf | 0.9413ms | 0.1698ms | 5.8899 KOps/s | 5.9458 KOps/s | $\color{#d91a1a}-0.94\%$ |
| test_items_stack_nested_locked | 0.4559ms | 0.2750ms | 3.6362 KOps/s | 3.6969 KOps/s | $\color{#d91a1a}-1.64\%$ |
| test_keys | 22.4310μs | 3.8692μs | 258.4516 KOps/s | 263.1094 KOps/s | $\color{#d91a1a}-1.77\%$ |
| test_keys_nested | 0.7718ms | 0.1473ms | 6.7898 KOps/s | 6.8504 KOps/s | $\color{#d91a1a}-0.88\%$ |
| test_keys_nested_locked | 0.2093ms | 0.1524ms | 6.5634 KOps/s | 6.6245 KOps/s | $\color{#d91a1a}-0.92\%$ |
| test_keys_nested_leaf | 35.5739ms | 0.1365ms | 7.3268 KOps/s | 7.8628 KOps/s | $\textbf{\color{#d91a1a}-6.82\%}$ |
| test_keys_stack_nested | 0.2474ms | 0.1501ms | 6.6643 KOps/s | 6.8496 KOps/s | $\color{#d91a1a}-2.70\%$ |
| test_keys_stack_nested_leaf | 0.2588ms | 0.1324ms | 7.5555 KOps/s | 7.8474 KOps/s | $\color{#d91a1a}-3.72\%$ |
| test_keys_stack_nested_locked | 0.2710ms | 0.1544ms | 6.4776 KOps/s | 6.5755 KOps/s | $\color{#d91a1a}-1.49\%$ |
| test_values | 12.6410μs | 1.1917μs | 839.1459 KOps/s | 809.4615 KOps/s | $\color{#35bf28}+3.67\%$ |
| test_values_nested | 0.1047ms | 50.8064μs | 19.6826 KOps/s | 19.4361 KOps/s | $\color{#35bf28}+1.27\%$ |
| test_values_nested_locked | 91.2590μs | 51.2797μs | 19.5009 KOps/s | 19.5511 KOps/s | $\color{#d91a1a}-0.26\%$ |
| test_values_nested_leaf | 0.1513ms | 45.6529μs | 21.9044 KOps/s | 21.6493 KOps/s | $\color{#35bf28}+1.18\%$ |
| test_values_stack_nested | 0.1201ms | 52.3516μs | 19.1016 KOps/s | 19.1898 KOps/s | $\color{#d91a1a}-0.46\%$ |
| test_values_stack_nested_leaf | 83.8660μs | 45.3760μs | 22.0381 KOps/s | 21.9390 KOps/s | $\color{#35bf28}+0.45\%$ |
| test_values_stack_nested_locked | 98.5630μs | 52.2151μs | 19.1515 KOps/s | 19.5515 KOps/s | $\color{#d91a1a}-2.05\%$ |
| test_membership | 19.4460μs | 1.3747μs | 727.4266 KOps/s | 736.2559 KOps/s | $\color{#d91a1a}-1.20\%$ |
| test_membership_nested | 33.4030μs | 3.5432μs | 282.2272 KOps/s | 286.6400 KOps/s | $\color{#d91a1a}-1.54\%$ |
| test_membership_nested_leaf | 27.4710μs | 3.5428μs | 282.2662 KOps/s | 279.4608 KOps/s | $\color{#35bf28}+1.00\%$ |
| test_membership_stacked_nested | 26.5990μs | 3.5375μs | 282.6875 KOps/s | 260.5286 KOps/s | $\textbf{\color{#35bf28}+8.51\%}$ |
| test_membership_stacked_nested_leaf | 23.1430μs | 3.5349μs | 282.8967 KOps/s | 265.4931 KOps/s | $\textbf{\color{#35bf28}+6.56\%}$ |
| test_membership_nested_last | 32.5100μs | 4.4272μs | 225.8759 KOps/s | 229.4920 KOps/s | $\color{#d91a1a}-1.58\%$ |
| test_membership_nested_leaf_last | 33.3220μs | 4.3917μs | 227.7031 KOps/s | 228.1332 KOps/s | $\color{#d91a1a}-0.19\%$ |
| test_membership_stacked_nested_last | 38.4110μs | 13.5856μs | 73.6075 KOps/s | 224.3840 KOps/s | $\textbf{\color{#d91a1a}-67.20\%}$ |
| test_membership_stacked_nested_leaf_last | 36.8180μs | 13.6310μs | 73.3620 KOps/s | 230.0190 KOps/s | $\textbf{\color{#d91a1a}-68.11\%}$ |
| test_nested_getleaf | 34.8250μs | 10.7934μs | 92.6491 KOps/s | 95.2148 KOps/s | $\color{#d91a1a}-2.69\%$ |
| test_nested_get | 38.9120μs | 10.1420μs | 98.6003 KOps/s | 99.3382 KOps/s | $\color{#d91a1a}-0.74\%$ |
| test_stacked_getleaf | 76.9630μs | 10.5736μs | 94.5751 KOps/s | 95.6506 KOps/s | $\color{#d91a1a}-1.12\%$ |
| test_stacked_get | 32.3400μs | 10.0509μs | 99.4936 KOps/s | 99.8975 KOps/s | $\color{#d91a1a}-0.40\%$ |
| test_nested_getitemleaf | 35.0760μs | 11.2139μs | 89.1752 KOps/s | 91.5527 KOps/s | $\color{#d91a1a}-2.60\%$ |
| test_nested_getitem | 31.2180μs | 10.6397μs | 93.9878 KOps/s | 96.7237 KOps/s | $\color{#d91a1a}-2.83\%$ |
| test_stacked_getitemleaf | 39.7140μs | 11.1004μs | 90.0870 KOps/s | 91.2608 KOps/s | $\color{#d91a1a}-1.29\%$ |
| test_stacked_getitem | 29.8160μs | 10.3939μs | 96.2104 KOps/s | 97.4283 KOps/s | $\color{#d91a1a}-1.25\%$ |
| test_lock_nested | 0.7008ms | 0.3357ms | 2.9792 KOps/s | 3.0155 KOps/s | $\color{#d91a1a}-1.21\%$ |
| test_lock_stack_nested | 0.3457ms | 0.2888ms | 3.4625 KOps/s | 3.3494 KOps/s | $\color{#35bf28}+3.38\%$ |
| test_unlock_nested | 92.4157ms | 0.4318ms | 2.3159 KOps/s | 2.3924 KOps/s | $\color{#d91a1a}-3.20\%$ |
| test_unlock_stack_nested | 0.3907ms | 0.2999ms | 3.3340 KOps/s | 3.2501 KOps/s | $\color{#35bf28}+2.58\%$ |
| test_flatten_speed | 0.6391ms | 0.2683ms | 3.7269 KOps/s | 3.6207 KOps/s | $\color{#35bf28}+2.93\%$ |
| test_unflatten_speed | 0.4760ms | 0.4029ms | 2.4819 KOps/s | 2.5003 KOps/s | $\color{#d91a1a}-0.74\%$ |
| test_common_ops | 1.1617ms | 0.6957ms | 1.4373 KOps/s | 1.5301 KOps/s | $\textbf{\color{#d91a1a}-6.06\%}$ |
| test_creation | 17.6420μs | 1.8988μs | 526.6521 KOps/s | 539.1511 KOps/s | $\color{#d91a1a}-2.32\%$ |
| test_creation_empty | 29.9860μs | 10.9832μs | 91.0484 KOps/s | 120.3333 KOps/s | $\textbf{\color{#d91a1a}-24.34\%}$ |
| test_creation_nested_1 | 33.6430μs | 13.6431μs | 73.2973 KOps/s | 91.4941 KOps/s | $\textbf{\color{#d91a1a}-19.89\%}$ |
| test_creation_nested_2 | 43.4600μs | 17.2192μs | 58.0747 KOps/s | 71.6715 KOps/s | $\textbf{\color{#d91a1a}-18.97\%}$ |
| test_clone | 56.0940μs | 13.4251μs | 74.4874 KOps/s | 75.7378 KOps/s | $\color{#d91a1a}-1.65\%$ |
| test_getitem[int] | 1.2785ms | 11.3559μs | 88.0598 KOps/s | 89.7148 KOps/s | $\color{#d91a1a}-1.84\%$ |
| test_getitem[slice_int] | 85.3820μs | 22.4455μs | 44.5524 KOps/s | 44.3274 KOps/s | $\color{#35bf28}+0.51\%$ |
| test_getitem[range] | 0.1984ms | 42.5682μs | 23.4917 KOps/s | 24.8693 KOps/s | $\textbf{\color{#d91a1a}-5.54\%}$ |
| test_getitem[tuple] | 59.7310μs | 18.6999μs | 53.4761 KOps/s | 55.5157 KOps/s | $\color{#d91a1a}-3.67\%$ |
| test_getitem[list] | 0.1403ms | 37.2895μs | 26.8172 KOps/s | 28.1163 KOps/s | $\color{#d91a1a}-4.62\%$ |
| test_setitem_dim[int] | 78.1850μs | 35.7171μs | 27.9978 KOps/s | 32.5572 KOps/s | $\textbf{\color{#d91a1a}-14.00\%}$ |
| test_setitem_dim[slice_int] | 0.1184ms | 62.6953μs | 15.9502 KOps/s | 17.6092 KOps/s | $\textbf{\color{#d91a1a}-9.42\%}$ |
| test_setitem_dim[range] | 0.1415ms | 81.2298μs | 12.3108 KOps/s | 13.3178 KOps/s | $\textbf{\color{#d91a1a}-7.56\%}$ |
| test_setitem_dim[tuple] | 73.2360μs | 50.6564μs | 19.7408 KOps/s | 21.6933 KOps/s | $\textbf{\color{#d91a1a}-9.00\%}$ |
| test_setitem | 61.5950μs | 20.2807μs | 49.3079 KOps/s | 53.2781 KOps/s | $\textbf{\color{#d91a1a}-7.45\%}$ |
| test_set | 61.5640μs | 20.0386μs | 49.9037 KOps/s | 56.0829 KOps/s | $\textbf{\color{#d91a1a}-11.02\%}$ |
| test_set_shared | 3.7279ms | 0.1398ms | 7.1526 KOps/s | 7.1525 KOps/s | $+0.00\%$ |
| test_update | 0.1305ms | 22.9260μs | 43.6185 KOps/s | 49.4064 KOps/s | $\textbf{\color{#d91a1a}-11.71\%}$ |
| test_update_nested | 90.2470μs | 30.2586μs | 33.0484 KOps/s | 36.5053 KOps/s | $\textbf{\color{#d91a1a}-9.47\%}$ |
| test_update__nested | 57.1770μs | 24.1074μs | 41.4810 KOps/s | 20.9955 KOps/s | $\textbf{\color{#35bf28}+97.57\%}$ |
| test_set_nested | 93.4840μs | 21.6617μs | 46.1644 KOps/s | 50.3663 KOps/s | $\textbf{\color{#d91a1a}-8.34\%}$ |
| test_set_nested_new | 0.1179ms | 25.4011μs | 39.3684 KOps/s | 41.3625 KOps/s | $\color{#d91a1a}-4.82\%$ |
| test_select | 0.1784ms | 40.6833μs | 24.5801 KOps/s | 25.8226 KOps/s | $\color{#d91a1a}-4.81\%$ |
| test_select_nested | 0.1257ms | 59.7363μs | 16.7402 KOps/s | 16.5139 KOps/s | $\color{#35bf28}+1.37\%$ |
| test_exclude_nested | 0.2475ms | 0.1192ms | 8.3919 KOps/s | 8.5300 KOps/s | $\color{#d91a1a}-1.62\%$ |
| test_empty[True] | 0.6357ms | 0.4155ms | 2.4068 KOps/s | 2.4155 KOps/s | $\color{#d91a1a}-0.36\%$ |
| test_empty[False] | 5.7868μs | 1.0413μs | 960.3614 KOps/s | 940.0215 KOps/s | $\color{#35bf28}+2.16\%$ |
| test_unbind_speed | 0.4469ms | 0.2470ms | 4.0483 KOps/s | 4.0688 KOps/s | $\color{#d91a1a}-0.50\%$ |
| test_unbind_speed_stack0 | 0.3655ms | 0.2343ms | 4.2684 KOps/s | 4.1318 KOps/s | $\color{#35bf28}+3.31\%$ |
| test_unbind_speed_stack1 | 0.1232s | 0.6541ms | 1.5289 KOps/s | 1.4371 KOps/s | $\textbf{\color{#35bf28}+6.39\%}$ |
| test_split | 0.1276s | 1.6887ms | 592.1848 Ops/s | 604.5364 Ops/s | $\color{#d91a1a}-2.04\%$ |
| test_chunk | 2.3566ms | 1.4746ms | 678.1602 Ops/s | 679.8037 Ops/s | $\color{#d91a1a}-0.24\%$ |
| test_creation[device0] | 0.2230ms | 0.1032ms | 9.6893 KOps/s | 9.9367 KOps/s | $\color{#d91a1a}-2.49\%$ |
| test_creation_from_tensor | 5.0387ms | 83.6333μs | 11.9570 KOps/s | 12.3383 KOps/s | $\color{#d91a1a}-3.09\%$ |
| test_add_one[memmap_tensor0] | 0.1013ms | 5.2147μs | 191.7674 KOps/s | 180.9024 KOps/s | $\textbf{\color{#35bf28}+6.01\%}$ |
| test_contiguous[memmap_tensor0] | 16.1300μs | 0.6846μs | 1.4606 MOps/s | 1.5290 MOps/s | $\color{#d91a1a}-4.47\%$ |
| test_stack[memmap_tensor0] | 25.7380μs | 3.5565μs | 281.1752 KOps/s | 274.5264 KOps/s | $\color{#35bf28}+2.42\%$ |
| test_memmaptd_index | 1.0109ms | 0.2413ms | 4.1443 KOps/s | 4.1259 KOps/s | $\color{#35bf28}+0.45\%$ |
| test_memmaptd_index_astensor | 5.9069ms | 0.3030ms | 3.3007 KOps/s | 3.3037 KOps/s | $\color{#d91a1a}-0.09\%$ |
| test_memmaptd_index_op | 1.3396ms | 0.6207ms | 1.6112 KOps/s | 1.7818 KOps/s | $\textbf{\color{#d91a1a}-9.58\%}$ |
| test_serialize_model | 0.2376s | 0.1194s | 8.3769 Ops/s | 8.4475 Ops/s | $\color{#d91a1a}-0.84\%$ |
| test_serialize_model_pickle | 0.4769s | 0.3761s | 2.6590 Ops/s | 2.6223 Ops/s | $\color{#35bf28}+1.40\%$ |
| test_serialize_weights | 0.1060s | 97.7062ms | 10.2348 Ops/s | 10.0345 Ops/s | $\color{#35bf28}+2.00\%$ |
| test_serialize_weights_returnearly | 0.2465s | 0.1334s | 7.4974 Ops/s | 6.9689 Ops/s | $\textbf{\color{#35bf28}+7.58\%}$ |
| test_serialize_weights_pickle | 1.0564s | 0.5621s | 1.7792 Ops/s | 2.4439 Ops/s | $\textbf{\color{#d91a1a}-27.20\%}$ |
| test_serialize_weights_filesystem | 95.5331ms | 91.2711ms | 10.9564 Ops/s | 10.5841 Ops/s | $\color{#35bf28}+3.52\%$ |
| test_serialize_model_filesystem | 0.1034s | 93.8612ms | 10.6540 Ops/s | 10.2761 Ops/s | $\color{#35bf28}+3.68\%$ |
| test_reshape_pytree | 54.0610μs | 20.9865μs | 47.6498 KOps/s | 47.6457 KOps/s | $+0.01\%$ |
| test_reshape_td | 71.7240μs | 32.1326μs | 31.1210 KOps/s | 32.1184 KOps/s | $\color{#d91a1a}-3.11\%$ |
| test_view_pytree | 49.1910μs | 20.9010μs | 47.8446 KOps/s | 48.1943 KOps/s | $\color{#d91a1a}-0.73\%$ |
| test_view_td | 0.1236s | 60.9269μs | 16.4131 KOps/s | 15.9129 KOps/s | $\color{#35bf28}+3.14\%$ |
| test_unbind_pytree | 70.0100μs | 25.0766μs | 39.8778 KOps/s | 41.1612 KOps/s | $\color{#d91a1a}-3.12\%$ |
| test_unbind_td | 0.4225ms | 35.7528μs | 27.9699 KOps/s | 27.6845 KOps/s | $\color{#35bf28}+1.03\%$ |
| test_split_pytree | 53.0590μs | 24.0299μs | 41.6148 KOps/s | 41.8565 KOps/s | $\color{#d91a1a}-0.58\%$ |
| test_split_td | 0.1073ms | 39.3643μs | 25.4037 KOps/s | 25.1911 KOps/s | $\color{#35bf28}+0.84\%$ |
| test_add_pytree | 65.5620μs | 29.4182μs | 33.9925 KOps/s | 34.1365 KOps/s | $\color{#d91a1a}-0.42\%$ |
| test_add_td | 0.1555ms | 54.2659μs | 18.4278 KOps/s | 20.2241 KOps/s | $\textbf{\color{#d91a1a}-8.88\%}$ |
| test_distributed | 0.1748ms | 99.8269μs | 10.0173 KOps/s | 9.8705 KOps/s | $\color{#35bf28}+1.49\%$ |
| test_tdmodule | 66.2740μs | 17.8206μs | 56.1149 KOps/s | 61.2206 KOps/s | $\textbf{\color{#d91a1a}-8.34\%}$ |
| test_tdmodule_dispatch | 52.0070μs | 33.7972μs | 29.5882 KOps/s | 32.7924 KOps/s | $\textbf{\color{#d91a1a}-9.77\%}$ |
| test_tdseq | 41.8780μs | 20.7147μs | 48.2748 KOps/s | 52.1299 KOps/s | $\textbf{\color{#d91a1a}-7.40\%}$ |
| test_tdseq_dispatch | 70.8110μs | 39.7952μs | 25.1286 KOps/s | 27.4682 KOps/s | $\textbf{\color{#d91a1a}-8.52\%}$ |
| test_instantiation_functorch | 1.4711ms | 1.2938ms | 772.9405 Ops/s | 776.5865 Ops/s | $\color{#d91a1a}-0.47\%$ |
| test_instantiation_td | 2.1660ms | 1.0269ms | 973.8241 Ops/s | 1.0003 KOps/s | $\color{#d91a1a}-2.64\%$ |
| test_exec_functorch | 0.2253ms | 0.1547ms | 6.4648 KOps/s | 6.4048 KOps/s | $\color{#35bf28}+0.94\%$ |
| test_exec_functional_call | 0.2971ms | 0.1440ms | 6.9440 KOps/s | 6.9657 KOps/s | $\color{#d91a1a}-0.31\%$ |
| test_exec_td | 0.2055ms | 0.1380ms | 7.2468 KOps/s | 7.0968 KOps/s | $\color{#35bf28}+2.11\%$ |
| test_exec_td_decorator | 0.3386ms | 0.1904ms | 5.2520 KOps/s | 5.1598 KOps/s | $\color{#35bf28}+1.79\%$ |
| test_vmap_mlp_speed[True-True] | 0.5528ms | 0.4627ms | 2.1613 KOps/s | 2.1541 KOps/s | $\color{#35bf28}+0.33\%$ |
| test_vmap_mlp_speed[True-False] | 0.8920ms | 0.4769ms | 2.0968 KOps/s | 2.1371 KOps/s | $\color{#d91a1a}-1.89\%$ |
| test_vmap_mlp_speed[False-True] | 0.5663ms | 0.3804ms | 2.6290 KOps/s | 2.6200 KOps/s | $\color{#35bf28}+0.34\%$ |
| test_vmap_mlp_speed[False-False] | 0.5853ms | 0.3800ms | 2.6317 KOps/s | 2.6084 KOps/s | $\color{#35bf28}+0.90\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.1575ms | 0.5060ms | 1.9763 KOps/s | 2.0732 KOps/s | $\color{#d91a1a}-4.67\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.7723ms | 0.4907ms | 2.0378 KOps/s | 2.0636 KOps/s | $\color{#d91a1a}-1.25\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.6143ms | 0.3996ms | 2.5027 KOps/s | 2.5040 KOps/s | $\color{#d91a1a}-0.05\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.8557ms | 0.3972ms | 2.5179 KOps/s | 2.5230 KOps/s | $\color{#d91a1a}-0.20\%$ |
| test_to_module_speed[True] | 2.1178ms | 1.3855ms | 721.7693 Ops/s | 726.0526 Ops/s | $\color{#d91a1a}-0.59\%$ |
| test_to_module_speed[False] | 2.2069ms | 1.3575ms | 736.6596 Ops/s | 744.0239 Ops/s | $\color{#d91a1a}-0.99\%$ |
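
For anyone reading the numbers by hand: the Change column looks like the relative difference between the Ops and Ops on Repo HEAD columns, i.e. `(ops - ops_head) / ops_head`. The sketch below reproduces two rows under that assumption. The helper names (`parse_ops`, `relative_change`, `is_flagged`) are hypothetical and not part of the benchmark workflow. The 5% cutoff for bold entries is also a guess: it is consistent with the rows above (-5.54% is bold, -4.82% is not) and with the Improved/Worsened counts matching the number of bold rows, but the bot's actual rule is not shown in this thread.

```python
# Multipliers for the throughput units that appear in the table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '59.6155 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops_pr: str, ops_head: str) -> float:
    """Percent change of the PR throughput relative to repo HEAD."""
    pr, head = parse_ops(ops_pr), parse_ops(ops_head)
    return (pr - head) / head * 100.0


def is_flagged(change_pct: float, threshold_pct: float = 5.0) -> bool:
    """Assumed rule: a row is shown in bold when |change| >= threshold."""
    return abs(change_pct) >= threshold_pct


# test_plain_set_nested
print(f"{relative_change('59.6155 KOps/s', '65.3912 KOps/s'):+.2f}%")  # -8.83%
# test_update__nested
print(f"{relative_change('41.4810 KOps/s', '20.9955 KOps/s'):+.2f}%")  # +97.57%
```

Because the table values are already rounded, rows whose two columns use different units (for example `test_instantiation_td`) may differ from the reported change in the last digit.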

@vmoens vmoens merged commit ed22554 into main Mar 8, 2024
45 of 48 checks passed
vmoens added a commit that referenced this pull request Mar 24, 2024
vmoens added a commit that referenced this pull request Mar 24, 2024
(cherry picked from commit ed22554)
vmoens added a commit that referenced this pull request Mar 25, 2024
(cherry picked from commit ed22554)
@vmoens vmoens deleted the update_refactor branch October 21, 2024 14:03
Labels
CLA Signed, Performance