[BugFix] Unlock td during update in to_module #686

Merged
merged 1 commit into from
Feb 21, 2024
Conversation

vmoens
Contributor

@vmoens vmoens commented Feb 21, 2024

No description provided.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 21, 2024
@vmoens vmoens added the bug Something isn't working label Feb 21, 2024
@vmoens vmoens merged commit 87aaa6e into main Feb 21, 2024
18 of 33 checks passed
@vmoens vmoens deleted the unlock_module branch February 21, 2024 18:10

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 126. Improved: $\large\color{#35bf28}26$. Worsened: $\large\color{#d91a1a}6$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 55.3130μs 15.7528μs 63.4808 KOps/s 54.7373 KOps/s $\textbf{\color{#35bf28}+15.97\%}$
test_plain_set_stack_nested 41.9380μs 16.0227μs 62.4116 KOps/s 54.1133 KOps/s $\textbf{\color{#35bf28}+15.34\%}$
test_plain_set_nested_inplace 61.0430μs 18.6016μs 53.7589 KOps/s 48.5842 KOps/s $\textbf{\color{#35bf28}+10.65\%}$
test_plain_set_stack_nested_inplace 89.8870μs 18.5056μs 54.0378 KOps/s 47.9430 KOps/s $\textbf{\color{#35bf28}+12.71\%}$
test_items 17.9130μs 2.6922μs 371.4495 KOps/s 404.4758 KOps/s $\textbf{\color{#d91a1a}-8.17\%}$
test_items_nested 0.8537ms 0.2691ms 3.7167 KOps/s 3.6710 KOps/s $\color{#35bf28}+1.24\%$
test_items_nested_locked 0.4108ms 0.2713ms 3.6859 KOps/s 3.6787 KOps/s $\color{#35bf28}+0.20\%$
test_items_nested_leaf 0.6803ms 0.1671ms 5.9862 KOps/s 5.8088 KOps/s $\color{#35bf28}+3.05\%$
test_items_stack_nested 0.8601ms 0.2707ms 3.6939 KOps/s 3.6111 KOps/s $\color{#35bf28}+2.29\%$
test_items_stack_nested_leaf 0.3181ms 0.1661ms 6.0193 KOps/s 5.8291 KOps/s $\color{#35bf28}+3.26\%$
test_items_stack_nested_locked 0.4186ms 0.2708ms 3.6929 KOps/s 3.5806 KOps/s $\color{#35bf28}+3.14\%$
test_keys 21.5510μs 3.8057μs 262.7651 KOps/s 254.8664 KOps/s $\color{#35bf28}+3.10\%$
test_keys_nested 2.6038ms 0.1505ms 6.6430 KOps/s 6.6553 KOps/s $\color{#d91a1a}-0.18\%$
test_keys_nested_locked 0.3035ms 0.1565ms 6.3904 KOps/s 6.4734 KOps/s $\color{#d91a1a}-1.28\%$
test_keys_nested_leaf 51.6327ms 0.1407ms 7.1074 KOps/s 7.5941 KOps/s $\textbf{\color{#d91a1a}-6.41\%}$
test_keys_stack_nested 0.2589ms 0.1540ms 6.4936 KOps/s 6.6128 KOps/s $\color{#d91a1a}-1.80\%$
test_keys_stack_nested_leaf 0.2592ms 0.1357ms 7.3710 KOps/s 7.5640 KOps/s $\color{#d91a1a}-2.55\%$
test_keys_stack_nested_locked 0.2648ms 0.1589ms 6.2919 KOps/s 6.4587 KOps/s $\color{#d91a1a}-2.58\%$
test_values 12.4336μs 1.1854μs 843.6109 KOps/s 866.2304 KOps/s $\color{#d91a1a}-2.61\%$
test_values_nested 98.4430μs 52.7888μs 18.9434 KOps/s 19.2330 KOps/s $\color{#d91a1a}-1.51\%$
test_values_nested_locked 0.2140ms 53.4820μs 18.6979 KOps/s 19.1450 KOps/s $\color{#d91a1a}-2.34\%$
test_values_nested_leaf 95.8180μs 46.4945μs 21.5079 KOps/s 21.7130 KOps/s $\color{#d91a1a}-0.94\%$
test_values_stack_nested 0.1001ms 53.0274μs 18.8582 KOps/s 18.7802 KOps/s $\color{#35bf28}+0.42\%$
test_values_stack_nested_leaf 0.1285ms 46.4021μs 21.5508 KOps/s 21.4183 KOps/s $\color{#35bf28}+0.62\%$
test_values_stack_nested_locked 0.1215ms 53.0415μs 18.8532 KOps/s 18.8612 KOps/s $\color{#d91a1a}-0.04\%$
test_membership 15.6690μs 1.3507μs 740.3798 KOps/s 739.9371 KOps/s $\color{#35bf28}+0.06\%$
test_membership_nested 40.9970μs 3.4804μs 287.3199 KOps/s 284.2040 KOps/s $\color{#35bf28}+1.10\%$
test_membership_nested_leaf 48.5700μs 3.5019μs 285.5583 KOps/s 282.7355 KOps/s $\color{#35bf28}+1.00\%$
test_membership_stacked_nested 23.1530μs 3.4416μs 290.5593 KOps/s 282.7299 KOps/s $\color{#35bf28}+2.77\%$
test_membership_stacked_nested_leaf 22.1910μs 3.4720μs 288.0220 KOps/s 285.1181 KOps/s $\color{#35bf28}+1.02\%$
test_membership_nested_last 52.1570μs 6.6727μs 149.8645 KOps/s 147.6511 KOps/s $\color{#35bf28}+1.50\%$
test_membership_nested_leaf_last 54.7920μs 6.6638μs 150.0636 KOps/s 146.7674 KOps/s $\color{#35bf28}+2.25\%$
test_membership_stacked_nested_last 41.3970μs 8.1717μs 122.3735 KOps/s 148.5963 KOps/s $\textbf{\color{#d91a1a}-17.65\%}$
test_membership_stacked_nested_leaf_last 34.6440μs 8.0598μs 124.0719 KOps/s 147.0703 KOps/s $\textbf{\color{#d91a1a}-15.64\%}$
test_nested_getleaf 48.1390μs 10.6827μs 93.6090 KOps/s 91.8463 KOps/s $\color{#35bf28}+1.92\%$
test_nested_get 57.8870μs 10.2106μs 97.9374 KOps/s 97.3748 KOps/s $\color{#35bf28}+0.58\%$
test_stacked_getleaf 50.6340μs 10.5212μs 95.0465 KOps/s 92.5001 KOps/s $\color{#35bf28}+2.75\%$
test_stacked_get 36.5880μs 9.8960μs 101.0513 KOps/s 99.2822 KOps/s $\color{#35bf28}+1.78\%$
test_nested_getitemleaf 52.3070μs 12.0672μs 82.8693 KOps/s 81.5266 KOps/s $\color{#35bf28}+1.65\%$
test_nested_getitem 53.6200μs 11.4901μs 87.0315 KOps/s 84.1836 KOps/s $\color{#35bf28}+3.38\%$
test_stacked_getitemleaf 52.0570μs 11.8592μs 84.3229 KOps/s 82.8738 KOps/s $\color{#35bf28}+1.75\%$
test_stacked_getitem 49.8730μs 11.2841μs 88.6206 KOps/s 81.2035 KOps/s $\textbf{\color{#35bf28}+9.13\%}$
test_lock_nested 0.8439ms 0.3414ms 2.9291 KOps/s 2.8605 KOps/s $\color{#35bf28}+2.40\%$
test_lock_stack_nested 0.4176ms 0.2991ms 3.3438 KOps/s 3.2642 KOps/s $\color{#35bf28}+2.44\%$
test_unlock_nested 0.1045s 0.4511ms 2.2167 KOps/s 2.1265 KOps/s $\color{#35bf28}+4.24\%$
test_unlock_stack_nested 0.4586ms 0.3062ms 3.2655 KOps/s 3.1959 KOps/s $\color{#35bf28}+2.18\%$
test_flatten_speed 0.7813ms 0.3680ms 2.7174 KOps/s 2.7207 KOps/s $\color{#d91a1a}-0.12\%$
test_unflatten_speed 0.5379ms 0.4539ms 2.2033 KOps/s 2.1451 KOps/s $\color{#35bf28}+2.72\%$
test_common_ops 1.3333ms 0.6428ms 1.5558 KOps/s 1.3496 KOps/s $\textbf{\color{#35bf28}+15.28\%}$
test_creation 29.2240μs 1.8420μs 542.8838 KOps/s 542.0496 KOps/s $\color{#35bf28}+0.15\%$
test_creation_empty 30.0160μs 7.9146μs 126.3485 KOps/s 86.6102 KOps/s $\textbf{\color{#35bf28}+45.88\%}$
test_creation_nested_1 32.9210μs 10.6282μs 94.0893 KOps/s 69.3738 KOps/s $\textbf{\color{#35bf28}+35.63\%}$
test_creation_nested_2 46.0750μs 13.7213μs 72.8792 KOps/s 55.5576 KOps/s $\textbf{\color{#35bf28}+31.18\%}$
test_clone 0.1691ms 12.9850μs 77.0120 KOps/s 75.1542 KOps/s $\color{#35bf28}+2.47\%$
test_getitem[int] 37.4200μs 11.1165μs 89.9568 KOps/s 91.9881 KOps/s $\color{#d91a1a}-2.21\%$
test_getitem[slice_int] 56.8560μs 21.9009μs 45.6601 KOps/s 43.9553 KOps/s $\color{#35bf28}+3.88\%$
test_getitem[range] 0.3288ms 41.2476μs 24.2438 KOps/s 24.5013 KOps/s $\color{#d91a1a}-1.05\%$
test_getitem[tuple] 53.5000μs 18.0128μs 55.5162 KOps/s 54.1340 KOps/s $\color{#35bf28}+2.55\%$
test_getitem[list] 0.3655ms 36.4514μs 27.4338 KOps/s 27.4800 KOps/s $\color{#d91a1a}-0.17\%$
test_setitem_dim[int] 61.5940μs 27.1711μs 36.8038 KOps/s 29.9330 KOps/s $\textbf{\color{#35bf28}+22.95\%}$
test_setitem_dim[slice_int] 95.3580μs 54.3492μs 18.3995 KOps/s 17.2923 KOps/s $\textbf{\color{#35bf28}+6.40\%}$
test_setitem_dim[range] 0.1484ms 74.0610μs 13.5024 KOps/s 13.0350 KOps/s $\color{#35bf28}+3.59\%$
test_setitem_dim[tuple] 67.4560μs 42.1246μs 23.7391 KOps/s 21.1190 KOps/s $\textbf{\color{#35bf28}+12.41\%}$
test_setitem 0.1846ms 17.9972μs 55.5643 KOps/s 48.5316 KOps/s $\textbf{\color{#35bf28}+14.49\%}$
test_set 0.1975ms 17.1120μs 58.4384 KOps/s 50.0896 KOps/s $\textbf{\color{#35bf28}+16.67\%}$
test_set_shared 1.2122ms 0.1436ms 6.9616 KOps/s 6.9101 KOps/s $\color{#35bf28}+0.75\%$
test_update 0.1901ms 19.0555μs 52.4784 KOps/s 42.2652 KOps/s $\textbf{\color{#35bf28}+24.16\%}$
test_update_nested 0.1866ms 26.3298μs 37.9797 KOps/s 32.1056 KOps/s $\textbf{\color{#35bf28}+18.30\%}$
test_set_nested 0.2469ms 19.5287μs 51.2066 KOps/s 45.1797 KOps/s $\textbf{\color{#35bf28}+13.34\%}$
test_set_nested_new 0.1815ms 23.1774μs 43.1455 KOps/s 38.8862 KOps/s $\textbf{\color{#35bf28}+10.95\%}$
test_select 0.2086ms 36.5927μs 27.3278 KOps/s 25.1422 KOps/s $\textbf{\color{#35bf28}+8.69\%}$
test_select_nested 0.1524ms 57.0775μs 17.5200 KOps/s 17.1109 KOps/s $\color{#35bf28}+2.39\%$
test_exclude_nested 0.2838ms 0.1171ms 8.5408 KOps/s 8.4374 KOps/s $\color{#35bf28}+1.23\%$
test_empty[True] 1.3865ms 0.4039ms 2.4759 KOps/s 2.4247 KOps/s $\color{#35bf28}+2.11\%$
test_empty[False] 6.9010μs 1.0513μs 951.1949 KOps/s 948.7607 KOps/s $\color{#35bf28}+0.26\%$
test_unbind_speed 0.3241ms 0.2409ms 4.1517 KOps/s 3.9661 KOps/s $\color{#35bf28}+4.68\%$
test_unbind_speed_stack0 0.4743ms 0.2355ms 4.2469 KOps/s 4.1730 KOps/s $\color{#35bf28}+1.77\%$
test_unbind_speed_stack1 0.1580s 0.6840ms 1.4621 KOps/s 1.4272 KOps/s $\color{#35bf28}+2.45\%$
test_split 0.1519s 1.6777ms 596.0512 Ops/s 579.0229 Ops/s $\color{#35bf28}+2.94\%$
test_chunk 1.6793ms 1.4412ms 693.8479 Ops/s 672.0982 Ops/s $\color{#35bf28}+3.24\%$
test_creation[device0] 0.2900ms 0.1042ms 9.5973 KOps/s 9.5625 KOps/s $\color{#35bf28}+0.36\%$
test_creation_from_tensor 7.0291ms 83.5370μs 11.9708 KOps/s 11.8929 KOps/s $\color{#35bf28}+0.65\%$
test_add_one[memmap_tensor0] 0.1740ms 5.1583μs 193.8606 KOps/s 186.1398 KOps/s $\color{#35bf28}+4.15\%$
test_contiguous[memmap_tensor0] 17.1210μs 0.6554μs 1.5259 MOps/s 1.5836 MOps/s $\color{#d91a1a}-3.64\%$
test_stack[memmap_tensor0] 40.4150μs 3.5438μs 282.1825 KOps/s 272.2987 KOps/s $\color{#35bf28}+3.63\%$
test_memmaptd_index 1.0162ms 0.2362ms 4.2337 KOps/s 4.0780 KOps/s $\color{#35bf28}+3.82\%$
test_memmaptd_index_astensor 0.5598ms 0.2977ms 3.3591 KOps/s 3.2339 KOps/s $\color{#35bf28}+3.87\%$
test_memmaptd_index_op 1.1573ms 0.5442ms 1.8377 KOps/s 1.5976 KOps/s $\textbf{\color{#35bf28}+15.03\%}$
test_serialize_model 0.2604s 0.1276s 7.8396 Ops/s 7.9862 Ops/s $\color{#d91a1a}-1.84\%$
test_serialize_model_pickle 0.4611s 0.3797s 2.6337 Ops/s 2.6097 Ops/s $\color{#35bf28}+0.92\%$
test_serialize_weights 0.1133s 0.1049s 9.5307 Ops/s 9.7099 Ops/s $\color{#d91a1a}-1.85\%$
test_serialize_weights_returnearly 0.2641s 0.1403s 7.1282 Ops/s 7.5662 Ops/s $\textbf{\color{#d91a1a}-5.79\%}$
test_serialize_weights_pickle 0.6424s 0.4521s 2.2121 Ops/s 2.4162 Ops/s $\textbf{\color{#d91a1a}-8.45\%}$
test_serialize_weights_filesystem 0.1209s 98.6476ms 10.1371 Ops/s 8.8627 Ops/s $\textbf{\color{#35bf28}+14.38\%}$
test_serialize_model_filesystem 0.1047s 98.3148ms 10.1714 Ops/s 9.8691 Ops/s $\color{#35bf28}+3.06\%$
test_reshape_pytree 56.6550μs 21.3195μs 46.9055 KOps/s 46.8838 KOps/s $\color{#35bf28}+0.05\%$
test_reshape_td 80.8000μs 31.2043μs 32.0469 KOps/s 31.7130 KOps/s $\color{#35bf28}+1.05\%$
test_view_pytree 73.9480μs 21.0554μs 47.4937 KOps/s 46.7409 KOps/s $\color{#35bf28}+1.61\%$
test_view_td 0.1553s 66.4696μs 15.0445 KOps/s 15.1315 KOps/s $\color{#d91a1a}-0.58\%$
test_unbind_pytree 74.0080μs 24.3694μs 41.0350 KOps/s 39.6770 KOps/s $\color{#35bf28}+3.42\%$
test_unbind_td 0.5427ms 36.0114μs 27.7690 KOps/s 27.5920 KOps/s $\color{#35bf28}+0.64\%$
test_split_pytree 62.4560μs 23.6229μs 42.3318 KOps/s 41.4652 KOps/s $\color{#35bf28}+2.09\%$
test_split_td 0.1737ms 40.6731μs 24.5863 KOps/s 24.9277 KOps/s $\color{#d91a1a}-1.37\%$
test_add_pytree 83.5250μs 29.0266μs 34.4511 KOps/s 33.5736 KOps/s $\color{#35bf28}+2.61\%$
test_add_td 0.1082ms 48.3646μs 20.6763 KOps/s 17.9236 KOps/s $\textbf{\color{#35bf28}+15.36\%}$
test_distributed 0.2576ms 0.1044ms 9.5759 KOps/s 9.6342 KOps/s $\color{#d91a1a}-0.61\%$
test_tdmodule 0.4057ms 20.3011μs 49.2583 KOps/s 43.7164 KOps/s $\textbf{\color{#35bf28}+12.68\%}$
test_tdmodule_dispatch 0.1919ms 38.8289μs 25.7540 KOps/s 22.2174 KOps/s $\textbf{\color{#35bf28}+15.92\%}$
test_tdseq 0.1131ms 23.3928μs 42.7482 KOps/s 38.4703 KOps/s $\textbf{\color{#35bf28}+11.12\%}$
test_tdseq_dispatch 0.4864ms 44.5258μs 22.4589 KOps/s 20.3159 KOps/s $\textbf{\color{#35bf28}+10.55\%}$
test_instantiation_functorch 1.7893ms 1.3248ms 754.8127 Ops/s 754.8994 Ops/s $\color{#d91a1a}-0.01\%$
test_instantiation_td 2.3350ms 1.0546ms 948.2117 Ops/s 977.8102 Ops/s $\color{#d91a1a}-3.03\%$
test_exec_functorch 0.3159ms 0.1559ms 6.4131 KOps/s 6.1681 KOps/s $\color{#35bf28}+3.97\%$
test_exec_functional_call 0.3546ms 0.1471ms 6.7967 KOps/s 6.5835 KOps/s $\color{#35bf28}+3.24\%$
test_exec_td 0.2419ms 0.1455ms 6.8744 KOps/s 6.6610 KOps/s $\color{#35bf28}+3.20\%$
test_exec_td_decorator 0.9647ms 0.1951ms 5.1262 KOps/s 5.0108 KOps/s $\color{#35bf28}+2.30\%$
test_vmap_mlp_speed[True-True] 0.9262ms 0.4792ms 2.0866 KOps/s 2.1043 KOps/s $\color{#d91a1a}-0.84\%$
test_vmap_mlp_speed[True-False] 0.6700ms 0.4712ms 2.1220 KOps/s 2.1194 KOps/s $\color{#35bf28}+0.13\%$
test_vmap_mlp_speed[False-True] 0.6508ms 0.3966ms 2.5212 KOps/s 2.5054 KOps/s $\color{#35bf28}+0.63\%$
test_vmap_mlp_speed[False-False] 0.7288ms 0.3965ms 2.5223 KOps/s 2.6063 KOps/s $\color{#d91a1a}-3.22\%$
test_vmap_mlp_speed_decorator[True-True] 1.1322ms 0.5288ms 1.8911 KOps/s 1.9199 KOps/s $\color{#d91a1a}-1.50\%$
test_vmap_mlp_speed_decorator[True-False] 0.7697ms 0.5226ms 1.9137 KOps/s 1.9171 KOps/s $\color{#d91a1a}-0.18\%$
test_vmap_mlp_speed_decorator[False-True] 0.6618ms 0.4117ms 2.4292 KOps/s 2.4887 KOps/s $\color{#d91a1a}-2.39\%$
test_vmap_mlp_speed_decorator[False-False] 0.7775ms 0.4108ms 2.4344 KOps/s 2.4931 KOps/s $\color{#d91a1a}-2.36\%$
test_to_module_speed[True] 2.1332ms 1.3796ms 724.8585 Ops/s 718.9535 Ops/s $\color{#35bf28}+0.82\%$
test_to_module_speed[False] 2.2649ms 1.3714ms 729.1680 Ops/s 736.8825 Ops/s $\color{#d91a1a}-1.05\%$
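
For readers checking the table by hand, the sketch below reproduces the `Ops` and `Change` columns from one row. It assumes, based only on the numbers in the table and not on any statement from the benchmark bot, that `Ops` is the reciprocal of `Mean` and that `Change` is the relative difference between `Ops` and `Ops on Repo HEAD`. The function names are hypothetical.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime (assumed: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to repo HEAD (assumed formula)."""
    return (ops_pr - ops_head) / ops_head * 100.0


# Values copied from the CPU row for test_plain_set_nested:
# Mean 15.7528 μs, Ops 63.4808 KOps/s, Ops on Repo HEAD 54.7373 KOps/s.
kops = ops_per_second(15.7528e-6) / 1e3
change = relative_change(63.4808, 54.7373)

print(f"{kops:.4f} KOps/s, {change:+.2f}%")
```

Under these assumptions the row's reported throughput and its +15.97% change are both recovered to the precision shown in the table.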


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 134. Improved: $\large\color{#35bf28}26$. Worsened: $\large\color{#d91a1a}7$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.5259ms 13.3203μs 75.0732 KOps/s 71.5663 KOps/s $\color{#35bf28}+4.90\%$
test_plain_set_stack_nested 27.4100μs 13.2776μs 75.3150 KOps/s 71.2701 KOps/s $\textbf{\color{#35bf28}+5.68\%}$
test_plain_set_nested_inplace 30.2300μs 14.5964μs 68.5100 KOps/s 64.6988 KOps/s $\textbf{\color{#35bf28}+5.89\%}$
test_plain_set_stack_nested_inplace 38.1200μs 14.7242μs 67.9154 KOps/s 64.6361 KOps/s $\textbf{\color{#35bf28}+5.07\%}$
test_items 20.0600μs 4.7190μs 211.9092 KOps/s 210.7536 KOps/s $\color{#35bf28}+0.55\%$
test_items_nested 0.3898ms 0.3385ms 2.9543 KOps/s 2.9504 KOps/s $\color{#35bf28}+0.13\%$
test_items_nested_locked 0.3955ms 0.3436ms 2.9104 KOps/s 2.9064 KOps/s $\color{#35bf28}+0.14\%$
test_items_nested_leaf 0.2700ms 0.2027ms 4.9323 KOps/s 4.9698 KOps/s $\color{#d91a1a}-0.76\%$
test_items_stack_nested 0.3869ms 0.3413ms 2.9302 KOps/s 2.9301 KOps/s $+0.00\%$
test_items_stack_nested_leaf 0.2431ms 0.2024ms 4.9414 KOps/s 4.9841 KOps/s $\color{#d91a1a}-0.86\%$
test_items_stack_nested_locked 0.4441ms 0.3448ms 2.9000 KOps/s 2.9147 KOps/s $\color{#d91a1a}-0.50\%$
test_keys 21.2000μs 4.5887μs 217.9290 KOps/s 217.8753 KOps/s $\color{#35bf28}+0.02\%$
test_keys_nested 43.9452ms 0.1007ms 9.9319 KOps/s 10.5374 KOps/s $\textbf{\color{#d91a1a}-5.75\%}$
test_keys_nested_locked 0.1431ms 98.2134μs 10.1819 KOps/s 10.1228 KOps/s $\color{#35bf28}+0.58\%$
test_keys_nested_leaf 0.1141ms 77.4708μs 12.9081 KOps/s 12.8193 KOps/s $\color{#35bf28}+0.69\%$
test_keys_stack_nested 0.1111ms 94.7175μs 10.5577 KOps/s 10.6695 KOps/s $\color{#d91a1a}-1.05\%$
test_keys_stack_nested_leaf 95.0610μs 78.1930μs 12.7889 KOps/s 12.9214 KOps/s $\color{#d91a1a}-1.03\%$
test_keys_stack_nested_locked 0.1311ms 98.8186μs 10.1196 KOps/s 10.1103 KOps/s $\color{#35bf28}+0.09\%$
test_values 6.3033μs 1.8977μs 526.9652 KOps/s 521.7010 KOps/s $\color{#35bf28}+1.01\%$
test_values_nested 63.6010μs 45.1907μs 22.1285 KOps/s 21.9041 KOps/s $\color{#35bf28}+1.02\%$
test_values_nested_locked 61.0410μs 47.6331μs 20.9938 KOps/s 20.7306 KOps/s $\color{#35bf28}+1.27\%$
test_values_nested_leaf 61.1510μs 39.5438μs 25.2884 KOps/s 25.1098 KOps/s $\color{#35bf28}+0.71\%$
test_values_stack_nested 62.3610μs 45.9817μs 21.7478 KOps/s 21.7094 KOps/s $\color{#35bf28}+0.18\%$
test_values_stack_nested_leaf 56.2400μs 39.8360μs 25.1029 KOps/s 24.8566 KOps/s $\color{#35bf28}+0.99\%$
test_values_stack_nested_locked 68.1010μs 48.0896μs 20.7945 KOps/s 20.6540 KOps/s $\color{#35bf28}+0.68\%$
test_membership 4.2680μs 0.9324μs 1.0725 MOps/s 1.0588 MOps/s $\color{#35bf28}+1.29\%$
test_membership_nested 25.3300μs 2.9273μs 341.6076 KOps/s 341.7860 KOps/s $\color{#d91a1a}-0.05\%$
test_membership_nested_leaf 20.9400μs 2.9461μs 339.4359 KOps/s 340.0326 KOps/s $\color{#d91a1a}-0.18\%$
test_membership_stacked_nested 23.6000μs 2.9266μs 341.6888 KOps/s 344.9259 KOps/s $\color{#d91a1a}-0.94\%$
test_membership_stacked_nested_leaf 15.9300μs 2.8737μs 347.9810 KOps/s 341.8115 KOps/s $\color{#35bf28}+1.80\%$
test_membership_nested_last 25.0310μs 5.3205μs 187.9519 KOps/s 188.7165 KOps/s $\color{#d91a1a}-0.41\%$
test_membership_nested_leaf_last 21.0200μs 5.3542μs 186.7682 KOps/s 188.7114 KOps/s $\color{#d91a1a}-1.03\%$
test_membership_stacked_nested_last 24.0710μs 5.3354μs 187.4280 KOps/s 176.3457 KOps/s $\textbf{\color{#35bf28}+6.28\%}$
test_membership_stacked_nested_leaf_last 18.3400μs 5.3076μs 188.4090 KOps/s 176.0707 KOps/s $\textbf{\color{#35bf28}+7.01\%}$
test_nested_getleaf 23.8200μs 8.4993μs 117.6562 KOps/s 118.2989 KOps/s $\color{#d91a1a}-0.54\%$
test_nested_get 20.9700μs 7.9836μs 125.2573 KOps/s 125.3124 KOps/s $\color{#d91a1a}-0.04\%$
test_stacked_getleaf 25.9100μs 8.4617μs 118.1793 KOps/s 118.9939 KOps/s $\color{#d91a1a}-0.68\%$
test_stacked_get 28.7400μs 8.0142μs 124.7786 KOps/s 125.7247 KOps/s $\color{#d91a1a}-0.75\%$
test_nested_getitemleaf 24.7700μs 9.8755μs 101.2608 KOps/s 100.9560 KOps/s $\color{#35bf28}+0.30\%$
test_nested_getitem 31.2100μs 9.3390μs 107.0783 KOps/s 105.8773 KOps/s $\color{#35bf28}+1.13\%$
test_stacked_getitemleaf 25.9800μs 9.8382μs 101.6443 KOps/s 101.5713 KOps/s $\color{#35bf28}+0.07\%$
test_stacked_getitem 22.0200μs 9.3437μs 107.0241 KOps/s 105.8255 KOps/s $\color{#35bf28}+1.13\%$
test_lock_nested 2.0966ms 0.3575ms 2.7974 KOps/s 2.7621 KOps/s $\color{#35bf28}+1.28\%$
test_lock_stack_nested 0.3514ms 0.3107ms 3.2188 KOps/s 3.1866 KOps/s $\color{#35bf28}+1.01\%$
test_unlock_nested 0.7351ms 0.3572ms 2.7992 KOps/s 2.8163 KOps/s $\color{#d91a1a}-0.61\%$
test_unlock_stack_nested 0.3731ms 0.3215ms 3.1102 KOps/s 3.0827 KOps/s $\color{#35bf28}+0.89\%$
test_flatten_speed 0.4652ms 0.2623ms 3.8118 KOps/s 3.8345 KOps/s $\color{#d91a1a}-0.59\%$
test_unflatten_speed 0.4130ms 0.3628ms 2.7565 KOps/s 2.7821 KOps/s $\color{#d91a1a}-0.92\%$
test_common_ops 1.0439ms 0.5903ms 1.6941 KOps/s 1.5722 KOps/s $\textbf{\color{#35bf28}+7.75\%}$
test_creation 17.8600μs 1.5894μs 629.1529 KOps/s 631.9611 KOps/s $\color{#d91a1a}-0.44\%$
test_creation_empty 49.4610μs 7.7118μs 129.6709 KOps/s 109.0852 KOps/s $\textbf{\color{#35bf28}+18.87\%}$
test_creation_nested_1 25.0400μs 9.4732μs 105.5608 KOps/s 91.9735 KOps/s $\textbf{\color{#35bf28}+14.77\%}$
test_creation_nested_2 25.7010μs 12.0553μs 82.9510 KOps/s 74.8648 KOps/s $\textbf{\color{#35bf28}+10.80\%}$
test_clone 67.3910μs 13.6222μs 73.4096 KOps/s 72.1328 KOps/s $\color{#35bf28}+1.77\%$
test_getitem[int] 24.8700μs 10.8883μs 91.8415 KOps/s 90.5466 KOps/s $\color{#35bf28}+1.43\%$
test_getitem[slice_int] 38.7010μs 21.8390μs 45.7896 KOps/s 44.1760 KOps/s $\color{#35bf28}+3.65\%$
test_getitem[range] 68.7500μs 50.2584μs 19.8972 KOps/s 19.2915 KOps/s $\color{#35bf28}+3.14\%$
test_getitem[tuple] 41.9900μs 19.0570μs 52.4743 KOps/s 50.6581 KOps/s $\color{#35bf28}+3.59\%$
test_getitem[list] 0.1339ms 36.6186μs 27.3085 KOps/s 26.2913 KOps/s $\color{#35bf28}+3.87\%$
test_setitem_dim[int] 41.4200μs 26.1650μs 38.2189 KOps/s 34.7808 KOps/s $\textbf{\color{#35bf28}+9.89\%}$
test_setitem_dim[slice_int] 64.0210μs 47.3101μs 21.1371 KOps/s 19.8054 KOps/s $\textbf{\color{#35bf28}+6.72\%}$
test_setitem_dim[range] 87.8220μs 65.5734μs 15.2501 KOps/s 14.2602 KOps/s $\textbf{\color{#35bf28}+6.94\%}$
test_setitem_dim[tuple] 58.3110μs 40.0220μs 24.9862 KOps/s 22.7259 KOps/s $\textbf{\color{#35bf28}+9.95\%}$
test_setitem 46.3500μs 18.1698μs 55.0364 KOps/s 49.6912 KOps/s $\textbf{\color{#35bf28}+10.76\%}$
test_set 48.0700μs 17.8544μs 56.0085 KOps/s 51.1534 KOps/s $\textbf{\color{#35bf28}+9.49\%}$
test_set_shared 0.1303s 0.1318ms 7.5851 KOps/s 9.6982 KOps/s $\textbf{\color{#d91a1a}-21.79\%}$
test_update 62.3610μs 20.3045μs 49.2502 KOps/s 44.7726 KOps/s $\textbf{\color{#35bf28}+10.00\%}$
test_update_nested 63.0400μs 26.7247μs 37.4185 KOps/s 34.0664 KOps/s $\textbf{\color{#35bf28}+9.84\%}$
test_set_nested 53.8500μs 19.1168μs 52.3101 KOps/s 48.1215 KOps/s $\textbf{\color{#35bf28}+8.70\%}$
test_set_nested_new 55.1910μs 22.3923μs 44.6582 KOps/s 41.6611 KOps/s $\textbf{\color{#35bf28}+7.19\%}$
test_select 72.6800μs 34.6450μs 28.8642 KOps/s 26.5144 KOps/s $\textbf{\color{#35bf28}+8.86\%}$
test_select_nested 76.3810μs 53.4694μs 18.7023 KOps/s 18.8001 KOps/s $\color{#d91a1a}-0.52\%$
test_exclude_nested 0.6257ms 0.1148ms 8.7080 KOps/s 8.4052 KOps/s $\color{#35bf28}+3.60\%$
test_empty[True] 0.4471ms 0.3930ms 2.5447 KOps/s 2.5518 KOps/s $\color{#d91a1a}-0.28\%$
test_empty[False] 2.8420μs 0.8525μs 1.1731 MOps/s 1.1828 MOps/s $\color{#d91a1a}-0.82\%$
test_to 75.2800μs 60.0544μs 16.6516 KOps/s 17.6272 KOps/s $\textbf{\color{#d91a1a}-5.53\%}$
test_to_nonblocking 64.5000μs 35.2638μs 28.3577 KOps/s 27.7001 KOps/s $\color{#35bf28}+2.37\%$
test_unbind_speed 0.3289ms 0.2694ms 3.7126 KOps/s 3.6987 KOps/s $\color{#35bf28}+0.38\%$
test_unbind_speed_stack0 0.3304ms 0.2694ms 3.7118 KOps/s 3.7001 KOps/s $\color{#35bf28}+0.31\%$
test_unbind_speed_stack1 0.1298s 0.7770ms 1.2870 KOps/s 1.4649 KOps/s $\textbf{\color{#d91a1a}-12.15\%}$
test_split 1.5968ms 1.5354ms 651.2860 Ops/s 643.3519 Ops/s $\color{#35bf28}+1.23\%$
test_chunk 1.6949ms 1.5361ms 650.9933 Ops/s 564.5541 Ops/s $\textbf{\color{#35bf28}+15.31\%}$
test_creation[device0] 0.1552ms 76.4486μs 13.0807 KOps/s 13.5989 KOps/s $\color{#d91a1a}-3.81\%$
test_creation_from_tensor 0.1464ms 57.3171μs 17.4468 KOps/s 18.0703 KOps/s $\color{#d91a1a}-3.45\%$
test_add_one[memmap_tensor0] 0.1154ms 6.5482μs 152.7141 KOps/s 147.1858 KOps/s $\color{#35bf28}+3.76\%$
test_contiguous[memmap_tensor0] 11.4400μs 0.6522μs 1.5333 MOps/s 1.5051 MOps/s $\color{#35bf28}+1.88\%$
test_stack[memmap_tensor0] 31.7200μs 4.5027μs 222.0914 KOps/s 215.2751 KOps/s $\color{#35bf28}+3.17\%$
test_memmaptd_index 1.1680ms 0.2674ms 3.7402 KOps/s 3.7220 KOps/s $\color{#35bf28}+0.49\%$
test_memmaptd_index_astensor 0.5691ms 0.3246ms 3.0809 KOps/s 3.0717 KOps/s $\color{#35bf28}+0.30\%$
test_memmaptd_index_op 0.8677ms 0.6021ms 1.6608 KOps/s 1.5791 KOps/s $\textbf{\color{#35bf28}+5.17\%}$
test_serialize_model 0.2251s 0.1033s 9.6804 Ops/s 10.7327 Ops/s $\textbf{\color{#d91a1a}-9.80\%}$
test_serialize_model_pickle 1.3501s 1.2359s 0.8091 Ops/s 0.8083 Ops/s $\color{#35bf28}+0.10\%$
test_serialize_weights 90.7280ms 86.5325ms 11.5564 Ops/s 9.2478 Ops/s $\textbf{\color{#35bf28}+24.96\%}$
test_serialize_weights_returnearly 0.3894s 80.2505ms 12.4610 Ops/s 13.0297 Ops/s $\color{#d91a1a}-4.36\%$
test_serialize_weights_pickle 1.3470s 1.2359s 0.8091 Ops/s 0.8012 Ops/s $\color{#35bf28}+0.99\%$
test_reshape_pytree 57.4000μs 25.2352μs 39.6271 KOps/s 39.4146 KOps/s $\color{#35bf28}+0.54\%$
test_reshape_td 63.0110μs 31.2425μs 32.0077 KOps/s 32.6329 KOps/s $\color{#d91a1a}-1.92\%$
test_view_pytree 47.1710μs 25.0811μs 39.8706 KOps/s 40.4585 KOps/s $\color{#d91a1a}-1.45\%$
test_view_td 0.5554ms 47.7143μs 20.9581 KOps/s 17.2214 KOps/s $\textbf{\color{#35bf28}+21.70\%}$
test_unbind_pytree 72.0510μs 30.2243μs 33.0860 KOps/s 32.7668 KOps/s $\color{#35bf28}+0.97\%$
test_unbind_td 0.1063ms 40.0028μs 24.9983 KOps/s 24.7369 KOps/s $\color{#35bf28}+1.06\%$
test_split_pytree 45.3300μs 29.4644μs 33.9392 KOps/s 33.9597 KOps/s $\color{#d91a1a}-0.06\%$
test_split_td 0.3654ms 40.0654μs 24.9592 KOps/s 25.1943 KOps/s $\color{#d91a1a}-0.93\%$
test_add_pytree 55.0400μs 35.1718μs 28.4319 KOps/s 28.9262 KOps/s $\color{#d91a1a}-1.71\%$
test_add_td 79.8400μs 47.8470μs 20.9000 KOps/s 19.2539 KOps/s $\textbf{\color{#35bf28}+8.55\%}$
test_distributed 3.5428ms 92.9154μs 10.7625 KOps/s 14.0451 KOps/s $\textbf{\color{#d91a1a}-23.37\%}$
test_tdmodule 32.7010μs 17.6055μs 56.8003 KOps/s 55.1269 KOps/s $\color{#35bf28}+3.04\%$
test_tdmodule_dispatch 0.1387ms 35.6941μs 28.0158 KOps/s 26.4058 KOps/s $\textbf{\color{#35bf28}+6.10\%}$
test_tdseq 38.3010μs 20.6667μs 48.3871 KOps/s 46.5001 KOps/s $\color{#35bf28}+4.06\%$
test_tdseq_dispatch 55.3010μs 38.6797μs 25.8533 KOps/s 24.9266 KOps/s $\color{#35bf28}+3.72\%$
test_instantiation_functorch 1.7782ms 1.6709ms 598.4899 Ops/s 601.5294 Ops/s $\color{#d91a1a}-0.51\%$
test_instantiation_td 0.1751s 1.3691ms 730.3961 Ops/s 870.4976 Ops/s $\textbf{\color{#d91a1a}-16.09\%}$
test_exec_functorch 0.2207ms 0.1576ms 6.3451 KOps/s 6.4027 KOps/s $\color{#d91a1a}-0.90\%$
test_exec_functional_call 0.2487ms 0.1546ms 6.4682 KOps/s 6.5891 KOps/s $\color{#d91a1a}-1.83\%$
test_exec_td 0.2022ms 0.1435ms 6.9685 KOps/s 6.9354 KOps/s $\color{#35bf28}+0.48\%$
test_exec_td_decorator 0.2952ms 0.1926ms 5.1932 KOps/s 5.2780 KOps/s $\color{#d91a1a}-1.61\%$
test_vmap_mlp_speed[True-True] 0.7825ms 0.5873ms 1.7028 KOps/s 1.7130 KOps/s $\color{#d91a1a}-0.60\%$
test_vmap_mlp_speed[True-False] 0.6534ms 0.5841ms 1.7122 KOps/s 1.7169 KOps/s $\color{#d91a1a}-0.27\%$
test_vmap_mlp_speed[False-True] 0.5850ms 0.5148ms 1.9425 KOps/s 1.9657 KOps/s $\color{#d91a1a}-1.18\%$
test_vmap_mlp_speed[False-False] 0.6393ms 0.5146ms 1.9431 KOps/s 1.9596 KOps/s $\color{#d91a1a}-0.85\%$
test_vmap_mlp_speed_decorator[True-True] 0.7686ms 0.6261ms 1.5973 KOps/s 1.5580 KOps/s $\color{#35bf28}+2.52\%$
test_vmap_mlp_speed_decorator[True-False] 1.0901ms 0.6259ms 1.5978 KOps/s 1.5932 KOps/s $\color{#35bf28}+0.29\%$
test_vmap_mlp_speed_decorator[False-True] 0.6832ms 0.5306ms 1.8848 KOps/s 1.8917 KOps/s $\color{#d91a1a}-0.36\%$
test_vmap_mlp_speed_decorator[False-False] 0.6599ms 0.5303ms 1.8856 KOps/s 1.9010 KOps/s $\color{#d91a1a}-0.81\%$
test_vmap_transformer_speed[True-True] 8.0442ms 7.8551ms 127.3061 Ops/s 127.3077 Ops/s $-0.00\%$
test_vmap_transformer_speed[True-False] 7.9974ms 7.8232ms 127.8253 Ops/s 127.5350 Ops/s $\color{#35bf28}+0.23\%$
test_vmap_transformer_speed[False-True] 7.8943ms 7.7695ms 128.7089 Ops/s 128.8222 Ops/s $\color{#d91a1a}-0.09\%$
test_vmap_transformer_speed[False-False] 7.9386ms 7.7717ms 128.6723 Ops/s 129.1716 Ops/s $\color{#d91a1a}-0.39\%$
test_vmap_transformer_speed_decorator[True-True] 18.9983ms 18.6393ms 53.6500 Ops/s 53.7543 Ops/s $\color{#d91a1a}-0.19\%$
test_vmap_transformer_speed_decorator[True-False] 18.7384ms 18.5623ms 53.8727 Ops/s 53.9127 Ops/s $\color{#d91a1a}-0.07\%$
test_vmap_transformer_speed_decorator[False-True] 18.3426ms 18.1921ms 54.9688 Ops/s 54.9483 Ops/s $\color{#35bf28}+0.04\%$
test_vmap_transformer_speed_decorator[False-False] 18.3141ms 18.1567ms 55.0760 Ops/s 55.0033 Ops/s $\color{#35bf28}+0.13\%$
test_to_module_speed[True] 1.3699ms 1.2583ms 794.7322 Ops/s 793.6267 Ops/s $\color{#35bf28}+0.14\%$
test_to_module_speed[False] 2.2331ms 1.2295ms 813.3350 Ops/s 810.3513 Ops/s $\color{#35bf28}+0.37\%$
