[Feature] Expose call_on_nested to apply and named_apply #768

Merged
merged 1 commit into from
Apr 30, 2024

@vmoens (Contributor) commented on Apr 30, 2024

No description provided.
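
Since the PR carries no description, the following is a minimal sketch of what a `call_on_nested` flag could mean for an `apply`-style method, going by the PR title alone. It is a toy model over plain nested dicts, not the tensordict implementation: `toy_apply` and `scale_or_increment` are hypothetical names, and the semantics are an assumption (when the flag is off, the function reaches only leaves; when it is on, the function is called on first-level values, including nested containers, and decides itself how to handle them).

```python
from typing import Any, Callable


def toy_apply(tree: dict, fn: Callable[[Any], Any], call_on_nested: bool = False) -> dict:
    """Toy stand-in for an apply-style method on a nested container (assumed semantics)."""
    out = {}
    for key, value in tree.items():
        if isinstance(value, dict) and not call_on_nested:
            # Default: recurse so that fn only ever sees leaves.
            out[key] = toy_apply(value, fn, call_on_nested=False)
        else:
            # Leaf, or a nested container handed to fn as-is.
            out[key] = fn(value)
    return out


def scale_or_increment(value: Any) -> Any:
    """Hypothetical fn that treats nested containers differently from leaves."""
    if isinstance(value, dict):
        return {k: v * 10 for k, v in value.items()}
    return value + 1


data = {"a": 1, "nested": {"b": 2, "c": 3}}

leaves_only = toy_apply(data, lambda x: x + 1)
first_level = toy_apply(data, scale_or_increment, call_on_nested=True)

print(leaves_only)
print(first_level)
```

In this sketch the nested dict is incremented leaf by leaf in the first call, while in the second call `scale_or_increment` receives the whole nested dict and applies its own rule to it.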

@facebook-github-bot added the "CLA Signed" label on Apr 30, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}11$. Worsened: $\large\color{#d91a1a}5$.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | ---: | ---: | ---: | ---: | ---: |
| `test_plain_set_nested` | 45.8660μs | 17.3875μs | 57.5125 KOps/s | 57.5888 KOps/s | $\color{#d91a1a}-0.13\%$ |
| `test_plain_set_stack_nested` | 47.6190μs | 17.5687μs | 56.9195 KOps/s | 56.7803 KOps/s | $\color{#35bf28}+0.25\%$ |
| `test_plain_set_nested_inplace` | 72.5330μs | 19.6581μs | 50.8697 KOps/s | 50.6772 KOps/s | $\color{#35bf28}+0.38\%$ |
| `test_plain_set_stack_nested_inplace` | 59.3420μs | 19.7965μs | 50.5140 KOps/s | 50.6642 KOps/s | $\color{#d91a1a}-0.30\%$ |
| `test_items` | 40.9570μs | 2.5262μs | 395.8575 KOps/s | 390.5110 KOps/s | $\color{#35bf28}+1.37\%$ |
| `test_items_nested` | 0.4935ms | 0.2617ms | 3.8218 KOps/s | 3.7387 KOps/s | $\color{#35bf28}+2.22\%$ |
| `test_items_nested_locked` | 0.9443ms | 0.2697ms | 3.7085 KOps/s | 3.6543 KOps/s | $\color{#35bf28}+1.48\%$ |
| `test_items_nested_leaf` | 0.2476ms | 79.0658μs | 12.6477 KOps/s | 12.5264 KOps/s | $\color{#35bf28}+0.97\%$ |
| `test_items_stack_nested` | 0.4837ms | 0.2659ms | 3.7610 KOps/s | 3.6190 KOps/s | $\color{#35bf28}+3.92\%$ |
| `test_items_stack_nested_leaf` | 0.1508ms | 78.5630μs | 12.7286 KOps/s | 12.5356 KOps/s | $\color{#35bf28}+1.54\%$ |
| `test_items_stack_nested_locked` | 0.5493ms | 0.2696ms | 3.7092 KOps/s | 3.6652 KOps/s | $\color{#35bf28}+1.20\%$ |
| `test_keys` | 26.4500μs | 3.8372μs | 260.6047 KOps/s | 245.0752 KOps/s | $\textbf{\color{#35bf28}+6.34\%}$ |
| `test_keys_nested` | 0.2190ms | 0.1376ms | 7.2685 KOps/s | 7.1045 KOps/s | $\color{#35bf28}+2.31\%$ |
| `test_keys_nested_locked` | 2.4364ms | 0.1436ms | 6.9631 KOps/s | 6.9287 KOps/s | $\color{#35bf28}+0.50\%$ |
| `test_keys_nested_leaf` | 0.2747ms | 0.1173ms | 8.5251 KOps/s | 8.4477 KOps/s | $\color{#35bf28}+0.92\%$ |
| `test_keys_stack_nested` | 0.2498ms | 0.1393ms | 7.1788 KOps/s | 7.2864 KOps/s | $\color{#d91a1a}-1.48\%$ |
| `test_keys_stack_nested_leaf` | 0.1658ms | 0.1173ms | 8.5226 KOps/s | 8.5574 KOps/s | $\color{#d91a1a}-0.41\%$ |
| `test_keys_stack_nested_locked` | 0.2371ms | 0.1426ms | 7.0122 KOps/s | 6.9601 KOps/s | $\color{#35bf28}+0.75\%$ |
| `test_values` | 6.5272μs | 1.1813μs | 846.4943 KOps/s | 827.8443 KOps/s | $\color{#35bf28}+2.25\%$ |
| `test_values_nested` | 0.1001ms | 50.2434μs | 19.9031 KOps/s | 19.2980 KOps/s | $\color{#35bf28}+3.14\%$ |
| `test_values_nested_locked` | 0.1060ms | 50.4952μs | 19.8039 KOps/s | 19.4862 KOps/s | $\color{#35bf28}+1.63\%$ |
| `test_values_nested_leaf` | 85.6800μs | 45.3529μs | 22.0493 KOps/s | 21.4689 KOps/s | $\color{#35bf28}+2.70\%$ |
| `test_values_stack_nested` | 0.1024ms | 51.5478μs | 19.3995 KOps/s | 19.1460 KOps/s | $\color{#35bf28}+1.32\%$ |
| `test_values_stack_nested_leaf` | 92.7540μs | 45.3828μs | 22.0348 KOps/s | 21.9028 KOps/s | $\color{#35bf28}+0.60\%$ |
| `test_values_stack_nested_locked` | 0.1017ms | 51.2207μs | 19.5233 KOps/s | 19.2685 KOps/s | $\color{#35bf28}+1.32\%$ |
| `test_membership` | 20.3780μs | 1.3621μs | 734.1478 KOps/s | 718.0730 KOps/s | $\color{#35bf28}+2.24\%$ |
| `test_membership_nested` | 22.8120μs | 3.4446μs | 290.3107 KOps/s | 281.5604 KOps/s | $\color{#35bf28}+3.11\%$ |
| `test_membership_nested_leaf` | 22.7220μs | 3.4519μs | 289.6972 KOps/s | 283.1017 KOps/s | $\color{#35bf28}+2.33\%$ |
| `test_membership_stacked_nested` | 17.7530μs | 3.3901μs | 294.9747 KOps/s | 286.4634 KOps/s | $\color{#35bf28}+2.97\%$ |
| `test_membership_stacked_nested_leaf` | 21.0390μs | 3.4119μs | 293.0901 KOps/s | 284.5650 KOps/s | $\color{#35bf28}+3.00\%$ |
| `test_membership_nested_last` | 42.3190μs | 4.1843μs | 238.9909 KOps/s | 235.3756 KOps/s | $\color{#35bf28}+1.54\%$ |
| `test_membership_nested_leaf_last` | 36.4080μs | 4.1907μs | 238.6213 KOps/s | 235.0582 KOps/s | $\color{#35bf28}+1.52\%$ |
| `test_membership_stacked_nested_last` | 25.0470μs | 4.1155μs | 242.9849 KOps/s | 206.1749 KOps/s | $\textbf{\color{#35bf28}+17.85\%}$ |
| `test_membership_stacked_nested_leaf_last` | 19.7970μs | 4.1402μs | 241.5365 KOps/s | 205.1750 KOps/s | $\textbf{\color{#35bf28}+17.72\%}$ |
| `test_nested_getleaf` | 30.8480μs | 10.7287μs | 93.2077 KOps/s | 88.7953 KOps/s | $\color{#35bf28}+4.97\%$ |
| `test_nested_get` | 38.9730μs | 10.1402μs | 98.6172 KOps/s | 93.8811 KOps/s | $\textbf{\color{#35bf28}+5.04\%}$ |
| `test_stacked_getleaf` | 31.2180μs | 10.7018μs | 93.4418 KOps/s | 89.7894 KOps/s | $\color{#35bf28}+4.07\%$ |
| `test_stacked_get` | 23.8540μs | 10.0682μs | 99.3223 KOps/s | 93.3260 KOps/s | $\textbf{\color{#35bf28}+6.43\%}$ |
| `test_nested_getitemleaf` | 29.5260μs | 11.2404μs | 88.9647 KOps/s | 88.0485 KOps/s | $\color{#35bf28}+1.04\%$ |
| `test_nested_getitem` | 30.9180μs | 10.3981μs | 96.1715 KOps/s | 95.5263 KOps/s | $\color{#35bf28}+0.68\%$ |
| `test_stacked_getitemleaf` | 31.8000μs | 11.0967μs | 90.1166 KOps/s | 88.5942 KOps/s | $\color{#35bf28}+1.72\%$ |
| `test_stacked_getitem` | 33.2520μs | 10.2483μs | 97.5770 KOps/s | 95.5566 KOps/s | $\color{#35bf28}+2.11\%$ |
| `test_lock_nested` | 51.0654ms | 0.4074ms | 2.4546 KOps/s | 2.8786 KOps/s | $\textbf{\color{#d91a1a}-14.73\%}$ |
| `test_lock_stack_nested` | 0.3759ms | 0.3090ms | 3.2358 KOps/s | 3.2646 KOps/s | $\color{#d91a1a}-0.88\%$ |
| `test_unlock_nested` | 0.6797ms | 0.3483ms | 2.8715 KOps/s | 2.5636 KOps/s | $\textbf{\color{#35bf28}+12.01\%}$ |
| `test_unlock_stack_nested` | 0.4213ms | 0.3193ms | 3.1321 KOps/s | 3.1805 KOps/s | $\color{#d91a1a}-1.52\%$ |
| `test_flatten_speed` | 0.1833ms | 96.1903μs | 10.3961 KOps/s | 10.5025 KOps/s | $\color{#d91a1a}-1.01\%$ |
| `test_unflatten_speed` | 0.5686ms | 0.4112ms | 2.4321 KOps/s | 2.4165 KOps/s | $\color{#35bf28}+0.65\%$ |
| `test_common_ops` | 3.5315ms | 0.7185ms | 1.3918 KOps/s | 1.3715 KOps/s | $\color{#35bf28}+1.48\%$ |
| `test_creation` | 0.1440ms | 2.1488μs | 465.3654 KOps/s | 507.3845 KOps/s | $\textbf{\color{#d91a1a}-8.28\%}$ |
| `test_creation_empty` | 34.0230μs | 11.4151μs | 87.6034 KOps/s | 90.6859 KOps/s | $\color{#d91a1a}-3.40\%$ |
| `test_creation_nested_1` | 47.2890μs | 14.1664μs | 70.5895 KOps/s | 73.7223 KOps/s | $\color{#d91a1a}-4.25\%$ |
| `test_creation_nested_2` | 46.6870μs | 17.5857μs | 56.8645 KOps/s | 58.5364 KOps/s | $\color{#d91a1a}-2.86\%$ |
| `test_clone` | 71.9650μs | 13.8203μs | 72.3573 KOps/s | 73.1367 KOps/s | $\color{#d91a1a}-1.07\%$ |
| `test_getitem[int]` | 50.0640μs | 11.5200μs | 86.8054 KOps/s | 86.2233 KOps/s | $\color{#35bf28}+0.68\%$ |
| `test_getitem[slice_int]` | 85.4370μs | 22.5576μs | 44.3310 KOps/s | 43.7709 KOps/s | $\color{#35bf28}+1.28\%$ |
| `test_getitem[range]` | 76.4940μs | 57.1832μs | 17.4877 KOps/s | 16.9513 KOps/s | $\color{#35bf28}+3.16\%$ |
| `test_getitem[tuple]` | 50.2740μs | 19.0243μs | 52.5644 KOps/s | 52.9541 KOps/s | $\color{#d91a1a}-0.74\%$ |
| `test_getitem[list]` | 0.1090ms | 40.5095μs | 24.6856 KOps/s | 24.6837 KOps/s | $+0.01\%$ |
| `test_setitem_dim[int]` | 79.1090μs | 35.1029μs | 28.4876 KOps/s | 28.5005 KOps/s | $\color{#d91a1a}-0.05\%$ |
| `test_setitem_dim[slice_int]` | 0.1002ms | 61.7872μs | 16.1846 KOps/s | 15.9346 KOps/s | $\color{#35bf28}+1.57\%$ |
| `test_setitem_dim[range]` | 0.1578ms | 87.8483μs | 11.3833 KOps/s | 11.9955 KOps/s | $\textbf{\color{#d91a1a}-5.10\%}$ |
| `test_setitem_dim[tuple]` | 82.4440μs | 49.9441μs | 20.0224 KOps/s | 19.6160 KOps/s | $\color{#35bf28}+2.07\%$ |
| `test_setitem` | 0.1461ms | 21.4332μs | 46.6567 KOps/s | 48.7108 KOps/s | $\color{#d91a1a}-4.22\%$ |
| `test_set` | 0.1035ms | 20.6089μs | 48.5228 KOps/s | 50.0087 KOps/s | $\color{#d91a1a}-2.97\%$ |
| `test_set_shared` | 1.6672ms | 0.1366ms | 7.3222 KOps/s | 7.1524 KOps/s | $\color{#35bf28}+2.37\%$ |
| `test_update` | 83.3160μs | 22.4359μs | 44.5714 KOps/s | 45.4444 KOps/s | $\color{#d91a1a}-1.92\%$ |
| `test_update_nested` | 73.5980μs | 30.8887μs | 32.3743 KOps/s | 32.7977 KOps/s | $\color{#d91a1a}-1.29\%$ |
| `test_update__nested` | 64.0300μs | 25.4521μs | 39.2895 KOps/s | 38.7014 KOps/s | $\color{#35bf28}+1.52\%$ |
| `test_set_nested` | 68.5890μs | 22.5975μs | 44.2527 KOps/s | 46.1219 KOps/s | $\color{#d91a1a}-4.05\%$ |
| `test_set_nested_new` | 82.7050μs | 26.9709μs | 37.0770 KOps/s | 38.5794 KOps/s | $\color{#d91a1a}-3.89\%$ |
| `test_select` | 0.1006ms | 41.5870μs | 24.0460 KOps/s | 24.6339 KOps/s | $\color{#d91a1a}-2.39\%$ |
| `test_select_nested` | 0.1179ms | 60.8090μs | 16.4449 KOps/s | 15.8858 KOps/s | $\color{#35bf28}+3.52\%$ |
| `test_exclude_nested` | 0.2117ms | 0.1205ms | 8.2987 KOps/s | 8.1362 KOps/s | $\color{#35bf28}+2.00\%$ |
| `test_empty[True]` | 0.6822ms | 0.3998ms | 2.5011 KOps/s | 2.5055 KOps/s | $\color{#d91a1a}-0.17\%$ |
| `test_empty[False]` | 6.3980μs | 1.0651μs | 938.8601 KOps/s | 867.5088 KOps/s | $\textbf{\color{#35bf28}+8.22\%}$ |
| `test_unbind_speed` | 0.3107ms | 0.2575ms | 3.8836 KOps/s | 3.8214 KOps/s | $\color{#35bf28}+1.63\%$ |
| `test_unbind_speed_stack0` | 0.3396ms | 0.2593ms | 3.8564 KOps/s | 3.9715 KOps/s | $\color{#d91a1a}-2.90\%$ |
| `test_unbind_speed_stack1` | 68.8279ms | 0.7123ms | 1.4038 KOps/s | 1.3429 KOps/s | $\color{#35bf28}+4.53\%$ |
| `test_split` | 64.6207ms | 1.5767ms | 634.2433 Ops/s | 606.3393 Ops/s | $\color{#35bf28}+4.60\%$ |
| `test_chunk` | 61.8026ms | 1.5753ms | 634.7940 Ops/s | 628.9194 Ops/s | $\color{#35bf28}+0.93\%$ |
| `test_creation[device0]` | 3.3863ms | 0.1039ms | 9.6249 KOps/s | 9.8806 KOps/s | $\color{#d91a1a}-2.59\%$ |
| `test_creation_from_tensor` | 0.1618ms | 81.2795μs | 12.3032 KOps/s | 12.1100 KOps/s | $\color{#35bf28}+1.60\%$ |
| `test_add_one[memmap_tensor0]` | 57.9790μs | 5.3993μs | 185.2076 KOps/s | 178.0149 KOps/s | $\color{#35bf28}+4.04\%$ |
| `test_contiguous[memmap_tensor0]` | 15.6090μs | 0.6310μs | 1.5848 MOps/s | 1.5511 MOps/s | $\color{#35bf28}+2.17\%$ |
| `test_stack[memmap_tensor0]` | 20.7290μs | 3.4944μs | 286.1732 KOps/s | 277.1986 KOps/s | $\color{#35bf28}+3.24\%$ |
| `test_memmaptd_index` | 1.0908ms | 0.2377ms | 4.2077 KOps/s | 4.2135 KOps/s | $\color{#d91a1a}-0.14\%$ |
| `test_memmaptd_index_astensor` | 0.6543ms | 0.3117ms | 3.2082 KOps/s | 3.2022 KOps/s | $\color{#35bf28}+0.19\%$ |
| `test_memmaptd_index_op` | 1.1953ms | 0.6074ms | 1.6463 KOps/s | 1.6389 KOps/s | $\color{#35bf28}+0.45\%$ |
| `test_serialize_model` | 0.1024s | 97.4933ms | 10.2571 Ops/s | 9.2438 Ops/s | $\textbf{\color{#35bf28}+10.96\%}$ |
| `test_serialize_model_pickle` | 0.4470s | 0.3753s | 2.6648 Ops/s | 2.6250 Ops/s | $\color{#35bf28}+1.52\%$ |
| `test_serialize_weights` | 0.1636s | 0.1041s | 9.6051 Ops/s | 9.0921 Ops/s | $\textbf{\color{#35bf28}+5.64\%}$ |
| `test_serialize_weights_returnearly` | 0.1342s | 0.1254s | 7.9776 Ops/s | 8.0985 Ops/s | $\color{#d91a1a}-1.49\%$ |
| `test_serialize_weights_pickle` | 1.0906s | 0.6092s | 1.6415 Ops/s | 2.4194 Ops/s | $\textbf{\color{#d91a1a}-32.15\%}$ |
| `test_serialize_weights_filesystem` | 98.0904ms | 92.0753ms | 10.8607 Ops/s | 10.4497 Ops/s | $\color{#35bf28}+3.93\%$ |
| `test_serialize_model_filesystem` | 99.3249ms | 92.3716ms | 10.8258 Ops/s | 10.0926 Ops/s | $\textbf{\color{#35bf28}+7.26\%}$ |
| `test_reshape_pytree` | 61.5660μs | 25.3305μs | 39.4781 KOps/s | 38.9826 KOps/s | $\color{#35bf28}+1.27\%$ |
| `test_reshape_td` | 82.3840μs | 32.8849μs | 30.4091 KOps/s | 30.4717 KOps/s | $\color{#d91a1a}-0.21\%$ |
| `test_view_pytree` | 65.3320μs | 25.4689μs | 39.2636 KOps/s | 39.2613 KOps/s | $+0.01\%$ |
| `test_view_td` | 0.1001ms | 36.5237μs | 27.3795 KOps/s | 25.2249 KOps/s | $\textbf{\color{#35bf28}+8.54\%}$ |
| `test_unbind_pytree` | 66.4550μs | 29.4983μs | 33.9003 KOps/s | 33.6929 KOps/s | $\color{#35bf28}+0.62\%$ |
| `test_unbind_td` | 0.4193ms | 38.1850μs | 26.1883 KOps/s | 25.7922 KOps/s | $\color{#35bf28}+1.54\%$ |
| `test_split_pytree` | 59.4120μs | 29.2107μs | 34.2340 KOps/s | 33.5753 KOps/s | $\color{#35bf28}+1.96\%$ |
| `test_split_td` | 0.4781ms | 40.9331μs | 24.4301 KOps/s | 24.5906 KOps/s | $\color{#d91a1a}-0.65\%$ |
| `test_add_pytree` | 89.4380μs | 35.0470μs | 28.5331 KOps/s | 28.2593 KOps/s | $\color{#35bf28}+0.97\%$ |
| `test_add_td` | 0.1116ms | 55.3076μs | 18.0807 KOps/s | 17.6266 KOps/s | $\color{#35bf28}+2.58\%$ |
| `test_distributed` | 0.1717ms | 98.6346μs | 10.1384 KOps/s | 9.9500 KOps/s | $\color{#35bf28}+1.89\%$ |
| `test_tdmodule` | 29.0450μs | 17.7868μs | 56.2215 KOps/s | 56.4127 KOps/s | $\color{#d91a1a}-0.34\%$ |
| `test_tdmodule_dispatch` | 69.3490μs | 35.9274μs | 27.8339 KOps/s | 28.4554 KOps/s | $\color{#d91a1a}-2.18\%$ |
| `test_tdseq` | 37.5510μs | 21.0585μs | 47.4869 KOps/s | 48.8782 KOps/s | $\color{#d91a1a}-2.85\%$ |
| `test_tdseq_dispatch` | 84.0770μs | 41.4832μs | 24.1061 KOps/s | 25.1342 KOps/s | $\color{#d91a1a}-4.09\%$ |
| `test_instantiation_functorch` | 1.8883ms | 1.2896ms | 775.4148 Ops/s | 769.4230 Ops/s | $\color{#35bf28}+0.78\%$ |
| `test_instantiation_td` | 1.4825ms | 1.0002ms | 999.8069 Ops/s | 989.3972 Ops/s | $\color{#35bf28}+1.05\%$ |
| `test_exec_functorch` | 0.3091ms | 0.1572ms | 6.3611 KOps/s | 6.2072 KOps/s | $\color{#35bf28}+2.48\%$ |
| `test_exec_functional_call` | 0.2596ms | 0.1468ms | 6.8114 KOps/s | 6.6527 KOps/s | $\color{#35bf28}+2.39\%$ |
| `test_exec_td` | 0.2362ms | 0.1419ms | 7.0461 KOps/s | 6.7844 KOps/s | $\color{#35bf28}+3.86\%$ |
| `test_exec_td_decorator` | 0.5925ms | 0.2199ms | 4.5480 KOps/s | 4.4629 KOps/s | $\color{#35bf28}+1.91\%$ |
| `test_vmap_mlp_speed[True-True]` | 0.9148ms | 0.4806ms | 2.0809 KOps/s | 2.0646 KOps/s | $\color{#35bf28}+0.79\%$ |
| `test_vmap_mlp_speed[True-False]` | 0.8276ms | 0.4725ms | 2.1166 KOps/s | 2.0738 KOps/s | $\color{#35bf28}+2.06\%$ |
| `test_vmap_mlp_speed[False-True]` | 0.5878ms | 0.3833ms | 2.6092 KOps/s | 2.5560 KOps/s | $\color{#35bf28}+2.08\%$ |
| `test_vmap_mlp_speed[False-False]` | 0.5501ms | 0.3835ms | 2.6077 KOps/s | 2.5537 KOps/s | $\color{#35bf28}+2.11\%$ |
| `test_vmap_mlp_speed_decorator[True-True]` | 1.0979ms | 0.5426ms | 1.8429 KOps/s | 1.8128 KOps/s | $\color{#35bf28}+1.66\%$ |
| `test_vmap_mlp_speed_decorator[True-False]` | 0.6803ms | 0.5397ms | 1.8530 KOps/s | 1.8271 KOps/s | $\color{#35bf28}+1.42\%$ |
| `test_vmap_mlp_speed_decorator[False-True]` | 0.7449ms | 0.4438ms | 2.2531 KOps/s | 2.2332 KOps/s | $\color{#35bf28}+0.89\%$ |
| `test_vmap_mlp_speed_decorator[False-False]` | 68.7637ms | 0.4785ms | 2.0897 KOps/s | 2.2235 KOps/s | $\textbf{\color{#d91a1a}-6.01\%}$ |
| `test_to_module_speed[True]` | 1.9206ms | 1.6791ms | 595.5470 Ops/s | 583.4617 Ops/s | $\color{#35bf28}+2.07\%$ |
| `test_to_module_speed[False]` | 2.5697ms | 1.6515ms | 605.4985 Ops/s | 604.3617 Ops/s | $\color{#35bf28}+0.19\%$ |
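
The Change column appears to be the relative difference between the PR's throughput (Ops) and the throughput on repo HEAD. The sketch below recomputes it for three rows of the table as a sanity check. The formula and the 5% cutoff for bold entries are inferences from the reported numbers, not something the benchmark bot states, and `relative_change` and `is_flagged` are hypothetical helper names.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of the PR's throughput relative to repo HEAD (assumed formula)."""
    return (ops - ops_head) / ops_head * 100.0


def is_flagged(change_pct: float, threshold: float = 5.0) -> bool:
    """Assumed rule: rows whose absolute change reaches the threshold are shown in bold."""
    return abs(change_pct) >= threshold


# (Ops, Ops on Repo HEAD) copied from the table; both values share the same unit per row.
rows = {
    "test_keys": (260.6047, 245.0752),
    "test_lock_nested": (2.4546, 2.8786),
    "test_serialize_weights_pickle": (1.6415, 2.4194),
}

for name, (ops, ops_head) in rows.items():
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% flagged={is_flagged(change)}")
```

Under these assumptions the recomputed values agree with the reported +6.34%, -14.73% and -32.15% to two decimals.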

@vmoens added the "enhancement" label on Apr 30, 2024
@vmoens merged commit 1f78271 into main on Apr 30, 2024
35 of 38 checks passed
@vmoens deleted the expose-call-on-nested branch on April 30, 2024 17:04
Labels: CLA Signed, enhancement
2 participants