[BugFix] Fix to_module batch-size mismatch #688

Merged
merged 1 commit into from
Feb 23, 2024

Conversation

@vmoens (Contributor) commented Feb 23, 2024

@facebook-github-bot added the CLA Signed label Feb 23, 2024
@vmoens added the bug label Feb 23, 2024
@vmoens marked this pull request as ready for review February 23, 2024 16:41

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 126. Improved: $\large\color{#35bf28}7$. Worsened: $\large\color{#d91a1a}10$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 40.0850μs 16.8353μs 59.3988 KOps/s 62.0741 KOps/s $\color{#d91a1a}-4.31\%$
test_plain_set_stack_nested 44.3730μs 17.2913μs 57.8326 KOps/s 61.4018 KOps/s $\textbf{\color{#d91a1a}-5.81\%}$
test_plain_set_nested_inplace 51.0650μs 19.5152μs 51.2422 KOps/s 53.7685 KOps/s $\color{#d91a1a}-4.70\%$
test_plain_set_stack_nested_inplace 45.8760μs 19.7619μs 50.6023 KOps/s 53.9498 KOps/s $\textbf{\color{#d91a1a}-6.20\%}$
test_items 11.5865μs 2.2598μs 442.5116 KOps/s 406.2506 KOps/s $\textbf{\color{#35bf28}+8.93\%}$
test_items_nested 0.8197ms 0.2718ms 3.6795 KOps/s 3.7453 KOps/s $\color{#d91a1a}-1.76\%$
test_items_nested_locked 0.5049ms 0.2703ms 3.6999 KOps/s 3.7275 KOps/s $\color{#d91a1a}-0.74\%$
test_items_nested_leaf 0.5248ms 0.1680ms 5.9531 KOps/s 6.0077 KOps/s $\color{#d91a1a}-0.91\%$
test_items_stack_nested 0.5875ms 0.2693ms 3.7134 KOps/s 3.7129 KOps/s $\color{#35bf28}+0.01\%$
test_items_stack_nested_leaf 0.2731ms 0.1660ms 6.0249 KOps/s 5.9877 KOps/s $\color{#35bf28}+0.62\%$
test_items_stack_nested_locked 0.9751ms 0.2737ms 3.6539 KOps/s 3.6710 KOps/s $\color{#d91a1a}-0.47\%$
test_keys 25.3470μs 3.8391μs 260.4756 KOps/s 258.5196 KOps/s $\color{#35bf28}+0.76\%$
test_keys_nested 2.0847ms 0.1500ms 6.6681 KOps/s 6.6412 KOps/s $\color{#35bf28}+0.41\%$
test_keys_nested_locked 0.3071ms 0.1522ms 6.5713 KOps/s 6.4884 KOps/s $\color{#35bf28}+1.28\%$
test_keys_nested_leaf 35.2031ms 0.1359ms 7.3610 KOps/s 7.6547 KOps/s $\color{#d91a1a}-3.84\%$
test_keys_stack_nested 0.2945ms 0.1499ms 6.6708 KOps/s 6.6385 KOps/s $\color{#35bf28}+0.49\%$
test_keys_stack_nested_leaf 0.2621ms 0.1315ms 7.6039 KOps/s 7.5615 KOps/s $\color{#35bf28}+0.56\%$
test_keys_stack_nested_locked 0.3032ms 0.1563ms 6.3987 KOps/s 6.4224 KOps/s $\color{#d91a1a}-0.37\%$
test_values 4.7170μs 1.1443μs 873.9143 KOps/s 808.6853 KOps/s $\textbf{\color{#35bf28}+8.07\%}$
test_values_nested 0.1054ms 51.8557μs 19.2843 KOps/s 19.1136 KOps/s $\color{#35bf28}+0.89\%$
test_values_nested_locked 0.1050ms 52.4997μs 19.0477 KOps/s 19.2461 KOps/s $\color{#d91a1a}-1.03\%$
test_values_nested_leaf 0.1003ms 46.7507μs 21.3900 KOps/s 21.6753 KOps/s $\color{#d91a1a}-1.32\%$
test_values_stack_nested 0.2250ms 54.8345μs 18.2367 KOps/s 19.0457 KOps/s $\color{#d91a1a}-4.25\%$
test_values_stack_nested_leaf 98.9250μs 46.6216μs 21.4493 KOps/s 21.6328 KOps/s $\color{#d91a1a}-0.85\%$
test_values_stack_nested_locked 0.1066ms 52.6420μs 18.9962 KOps/s 19.1338 KOps/s $\color{#d91a1a}-0.72\%$
test_membership 6.6975μs 1.1804μs 847.1984 KOps/s 836.7612 KOps/s $\color{#35bf28}+1.25\%$
test_membership_nested 23.0530μs 3.4203μs 292.3759 KOps/s 284.5847 KOps/s $\color{#35bf28}+2.74\%$
test_membership_nested_leaf 20.6980μs 3.4499μs 289.8611 KOps/s 278.6052 KOps/s $\color{#35bf28}+4.04\%$
test_membership_stacked_nested 21.6110μs 3.4176μs 292.6042 KOps/s 282.7364 KOps/s $\color{#35bf28}+3.49\%$
test_membership_stacked_nested_leaf 18.8960μs 3.4218μs 292.2451 KOps/s 282.9748 KOps/s $\color{#35bf28}+3.28\%$
test_membership_nested_last 0.1332ms 6.6633μs 150.0747 KOps/s 147.1455 KOps/s $\color{#35bf28}+1.99\%$
test_membership_nested_leaf_last 67.5860μs 6.6515μs 150.3411 KOps/s 138.8010 KOps/s $\textbf{\color{#35bf28}+8.31\%}$
test_membership_stacked_nested_last 41.0170μs 6.9994μs 142.8692 KOps/s 139.2988 KOps/s $\color{#35bf28}+2.56\%$
test_membership_stacked_nested_leaf_last 0.1281ms 7.3476μs 136.0980 KOps/s 140.2505 KOps/s $\color{#d91a1a}-2.96\%$
test_nested_getleaf 31.8490μs 10.4858μs 95.3674 KOps/s 94.8356 KOps/s $\color{#35bf28}+0.56\%$
test_nested_get 46.9380μs 9.9993μs 100.0074 KOps/s 101.2124 KOps/s $\color{#d91a1a}-1.19\%$
test_stacked_getleaf 38.0310μs 10.4808μs 95.4122 KOps/s 95.3061 KOps/s $\color{#35bf28}+0.11\%$
test_stacked_get 36.5680μs 9.9642μs 100.3591 KOps/s 101.2573 KOps/s $\color{#d91a1a}-0.89\%$
test_nested_getitemleaf 50.3340μs 12.1172μs 82.5275 KOps/s 80.0680 KOps/s $\color{#35bf28}+3.07\%$
test_nested_getitem 38.1710μs 11.5134μs 86.8551 KOps/s 85.4676 KOps/s $\color{#35bf28}+1.62\%$
test_stacked_getitemleaf 41.1970μs 12.0071μs 83.2839 KOps/s 79.7717 KOps/s $\color{#35bf28}+4.40\%$
test_stacked_getitem 0.1390ms 11.4862μs 87.0610 KOps/s 86.2876 KOps/s $\color{#35bf28}+0.90\%$
test_lock_nested 0.6755ms 0.3384ms 2.9547 KOps/s 2.9461 KOps/s $\color{#35bf28}+0.29\%$
test_lock_stack_nested 0.3607ms 0.3018ms 3.3134 KOps/s 3.2950 KOps/s $\color{#35bf28}+0.56\%$
test_unlock_nested 74.0440ms 0.4173ms 2.3961 KOps/s 2.3426 KOps/s $\color{#35bf28}+2.28\%$
test_unlock_stack_nested 0.5119ms 0.3106ms 3.2191 KOps/s 3.1863 KOps/s $\color{#35bf28}+1.03\%$
test_flatten_speed 0.6852ms 0.3644ms 2.7446 KOps/s 2.6975 KOps/s $\color{#35bf28}+1.74\%$
test_unflatten_speed 0.7617ms 0.4634ms 2.1579 KOps/s 2.1758 KOps/s $\color{#d91a1a}-0.82\%$
test_common_ops 1.1525ms 0.6819ms 1.4664 KOps/s 1.5031 KOps/s $\color{#d91a1a}-2.44\%$
test_creation 18.0040μs 1.7909μs 558.3883 KOps/s 542.7291 KOps/s $\color{#35bf28}+2.89\%$
test_creation_empty 39.6140μs 9.7341μs 102.7319 KOps/s 121.9265 KOps/s $\textbf{\color{#d91a1a}-15.74\%}$
test_creation_nested_1 39.2130μs 12.6773μs 78.8810 KOps/s 91.7036 KOps/s $\textbf{\color{#d91a1a}-13.98\%}$
test_creation_nested_2 39.6640μs 15.5497μs 64.3098 KOps/s 69.9697 KOps/s $\textbf{\color{#d91a1a}-8.09\%}$
test_clone 1.2088ms 12.9304μs 77.3372 KOps/s 76.9998 KOps/s $\color{#35bf28}+0.44\%$
test_getitem[int] 30.0160μs 11.0842μs 90.2189 KOps/s 89.8638 KOps/s $\color{#35bf28}+0.40\%$
test_getitem[slice_int] 65.1410μs 22.5745μs 44.2978 KOps/s 45.4478 KOps/s $\color{#d91a1a}-2.53\%$
test_getitem[range] 0.1521ms 42.0257μs 23.7950 KOps/s 24.5709 KOps/s $\color{#d91a1a}-3.16\%$
test_getitem[tuple] 50.7950μs 18.2963μs 54.6560 KOps/s 54.9299 KOps/s $\color{#d91a1a}-0.50\%$
test_getitem[list] 0.1697ms 36.9069μs 27.0952 KOps/s 26.2179 KOps/s $\color{#35bf28}+3.35\%$
test_setitem_dim[int] 55.3630μs 30.0271μs 33.3032 KOps/s 37.9805 KOps/s $\textbf{\color{#d91a1a}-12.31\%}$
test_setitem_dim[slice_int] 0.1054ms 54.9193μs 18.2085 KOps/s 19.2276 KOps/s $\textbf{\color{#d91a1a}-5.30\%}$
test_setitem_dim[range] 0.1160ms 74.0198μs 13.5099 KOps/s 14.1168 KOps/s $\color{#d91a1a}-4.30\%$
test_setitem_dim[tuple] 0.1055ms 43.8864μs 22.7861 KOps/s 24.0807 KOps/s $\textbf{\color{#d91a1a}-5.38\%}$
test_setitem 65.6430μs 19.4868μs 51.3167 KOps/s 52.9795 KOps/s $\color{#d91a1a}-3.14\%$
test_set 52.9690μs 18.8873μs 52.9456 KOps/s 54.8276 KOps/s $\color{#d91a1a}-3.43\%$
test_set_shared 3.8979ms 0.1407ms 7.1071 KOps/s 7.0761 KOps/s $\color{#35bf28}+0.44\%$
test_update 92.1520μs 21.4681μs 46.5808 KOps/s 48.5829 KOps/s $\color{#d91a1a}-4.12\%$
test_update_nested 0.5800ms 28.8955μs 34.6075 KOps/s 35.7681 KOps/s $\color{#d91a1a}-3.24\%$
test_set_nested 65.6030μs 21.0236μs 47.5655 KOps/s 49.6217 KOps/s $\color{#d91a1a}-4.14\%$
test_set_nested_new 80.3000μs 24.8481μs 40.2445 KOps/s 41.6209 KOps/s $\color{#d91a1a}-3.31\%$
test_select 0.1047ms 38.3061μs 26.1055 KOps/s 26.8597 KOps/s $\color{#d91a1a}-2.81\%$
test_select_nested 0.1328ms 58.8575μs 16.9902 KOps/s 16.9496 KOps/s $\color{#35bf28}+0.24\%$
test_exclude_nested 0.2411ms 0.1179ms 8.4791 KOps/s 8.4427 KOps/s $\color{#35bf28}+0.43\%$
test_empty[True] 0.6629ms 0.4161ms 2.4032 KOps/s 2.4100 KOps/s $\color{#d91a1a}-0.28\%$
test_empty[False] 5.2798μs 1.0482μs 954.0414 KOps/s 926.7447 KOps/s $\color{#35bf28}+2.95\%$
test_unbind_speed 0.3423ms 0.2620ms 3.8173 KOps/s 4.0662 KOps/s $\textbf{\color{#d91a1a}-6.12\%}$
test_unbind_speed_stack0 0.5692ms 0.2412ms 4.1459 KOps/s 4.1436 KOps/s $\color{#35bf28}+0.06\%$
test_unbind_speed_stack1 0.6681ms 0.5947ms 1.6815 KOps/s 1.4866 KOps/s $\textbf{\color{#35bf28}+13.11\%}$
test_split 0.1226s 1.6096ms 621.2570 Ops/s 609.6374 Ops/s $\color{#35bf28}+1.91\%$
test_chunk 3.0532ms 1.4446ms 692.2343 Ops/s 672.0261 Ops/s $\color{#35bf28}+3.01\%$
test_creation[device0] 4.8636ms 0.1019ms 9.8112 KOps/s 9.8839 KOps/s $\color{#d91a1a}-0.74\%$
test_creation_from_tensor 0.1903ms 80.7289μs 12.3871 KOps/s 11.9142 KOps/s $\color{#35bf28}+3.97\%$
test_add_one[memmap_tensor0] 91.0500μs 5.3347μs 187.4515 KOps/s 190.9979 KOps/s $\color{#d91a1a}-1.86\%$
test_contiguous[memmap_tensor0] 20.4280μs 0.6331μs 1.5796 MOps/s 1.5271 MOps/s $\color{#35bf28}+3.44\%$
test_stack[memmap_tensor0] 41.9990μs 3.5473μs 281.9030 KOps/s 281.8822 KOps/s $+0.01\%$
test_memmaptd_index 0.9923ms 0.2372ms 4.2159 KOps/s 4.1683 KOps/s $\color{#35bf28}+1.14\%$
test_memmaptd_index_astensor 0.6251ms 0.3020ms 3.3113 KOps/s 3.3396 KOps/s $\color{#d91a1a}-0.85\%$
test_memmaptd_index_op 0.8686ms 0.5778ms 1.7306 KOps/s 1.7847 KOps/s $\color{#d91a1a}-3.03\%$
test_serialize_model 0.2188s 0.1113s 8.9820 Ops/s 8.6219 Ops/s $\color{#35bf28}+4.18\%$
test_serialize_model_pickle 0.4490s 0.3740s 2.6738 Ops/s 2.6401 Ops/s $\color{#35bf28}+1.27\%$
test_serialize_weights 99.6800ms 97.1471ms 10.2937 Ops/s 10.1101 Ops/s $\color{#35bf28}+1.82\%$
test_serialize_weights_returnearly 0.1302s 0.1206s 8.2926 Ops/s 8.3943 Ops/s $\color{#d91a1a}-1.21\%$
test_serialize_weights_pickle 1.1054s 0.6132s 1.6307 Ops/s 1.3147 Ops/s $\textbf{\color{#35bf28}+24.04\%}$
test_serialize_weights_filesystem 96.5356ms 91.1105ms 10.9757 Ops/s 9.9106 Ops/s $\textbf{\color{#35bf28}+10.75\%}$
test_serialize_model_filesystem 95.9530ms 91.0174ms 10.9869 Ops/s 10.8437 Ops/s $\color{#35bf28}+1.32\%$
test_reshape_pytree 50.4540μs 20.9951μs 47.6302 KOps/s 46.7277 KOps/s $\color{#35bf28}+1.93\%$
test_reshape_td 84.9190μs 31.6302μs 31.6153 KOps/s 32.4467 KOps/s $\color{#d91a1a}-2.56\%$
test_view_pytree 61.4250μs 21.0191μs 47.5759 KOps/s 46.3912 KOps/s $\color{#35bf28}+2.55\%$
test_view_td 0.1113s 56.7012μs 17.6363 KOps/s 16.5746 KOps/s $\textbf{\color{#35bf28}+6.41\%}$
test_unbind_pytree 52.9680μs 24.7372μs 40.4250 KOps/s 40.8357 KOps/s $\color{#d91a1a}-1.01\%$
test_unbind_td 0.1166ms 36.1367μs 27.6727 KOps/s 27.6160 KOps/s $\color{#35bf28}+0.21\%$
test_split_pytree 56.4050μs 24.2598μs 41.2204 KOps/s 41.3569 KOps/s $\color{#d91a1a}-0.33\%$
test_split_td 0.1116ms 39.6445μs 25.2242 KOps/s 25.0610 KOps/s $\color{#35bf28}+0.65\%$
test_add_pytree 76.3730μs 29.1819μs 34.2678 KOps/s 32.9491 KOps/s $\color{#35bf28}+4.00\%$
test_add_td 0.1234ms 50.5477μs 19.7833 KOps/s 20.7347 KOps/s $\color{#d91a1a}-4.59\%$
test_distributed 0.1982ms 99.9685μs 10.0031 KOps/s 9.8989 KOps/s $\color{#35bf28}+1.05\%$
test_tdmodule 0.5277ms 21.8243μs 45.8204 KOps/s 47.0206 KOps/s $\color{#d91a1a}-2.55\%$
test_tdmodule_dispatch 0.1974ms 42.3447μs 23.6157 KOps/s 24.0146 KOps/s $\color{#d91a1a}-1.66\%$
test_tdseq 38.8820μs 24.4963μs 40.8225 KOps/s 41.8963 KOps/s $\color{#d91a1a}-2.56\%$
test_tdseq_dispatch 0.1355ms 46.3375μs 21.5808 KOps/s 22.8422 KOps/s $\textbf{\color{#d91a1a}-5.52\%}$
test_instantiation_functorch 2.0648ms 1.3322ms 750.6371 Ops/s 776.6096 Ops/s $\color{#d91a1a}-3.34\%$
test_instantiation_td 1.9650ms 1.0085ms 991.5229 Ops/s 983.8793 Ops/s $\color{#35bf28}+0.78\%$
test_exec_functorch 0.3025ms 0.1585ms 6.3076 KOps/s 6.4197 KOps/s $\color{#d91a1a}-1.74\%$
test_exec_functional_call 0.2646ms 0.1477ms 6.7724 KOps/s 6.8448 KOps/s $\color{#d91a1a}-1.06\%$
test_exec_td 0.3577ms 0.1461ms 6.8443 KOps/s 6.9942 KOps/s $\color{#d91a1a}-2.14\%$
test_exec_td_decorator 0.5808ms 0.1942ms 5.1501 KOps/s 5.0989 KOps/s $\color{#35bf28}+1.00\%$
test_vmap_mlp_speed[True-True] 0.5874ms 0.4642ms 2.1541 KOps/s 2.1814 KOps/s $\color{#d91a1a}-1.25\%$
test_vmap_mlp_speed[True-False] 0.7550ms 0.4630ms 2.1597 KOps/s 2.1907 KOps/s $\color{#d91a1a}-1.41\%$
test_vmap_mlp_speed[False-True] 0.5620ms 0.3782ms 2.6438 KOps/s 2.6442 KOps/s $\color{#d91a1a}-0.02\%$
test_vmap_mlp_speed[False-False] 0.6274ms 0.3797ms 2.6334 KOps/s 2.6261 KOps/s $\color{#35bf28}+0.28\%$
test_vmap_mlp_speed_decorator[True-True] 0.9849ms 0.5111ms 1.9566 KOps/s 1.9529 KOps/s $\color{#35bf28}+0.19\%$
test_vmap_mlp_speed_decorator[True-False] 0.6792ms 0.5113ms 1.9557 KOps/s 1.9619 KOps/s $\color{#d91a1a}-0.32\%$
test_vmap_mlp_speed_decorator[False-True] 0.6988ms 0.3956ms 2.5280 KOps/s 2.5319 KOps/s $\color{#d91a1a}-0.15\%$
test_vmap_mlp_speed_decorator[False-False] 0.6220ms 0.3942ms 2.5366 KOps/s 2.5303 KOps/s $\color{#35bf28}+0.25\%$
test_to_module_speed[True] 2.1088ms 1.4030ms 712.7677 Ops/s 706.8752 Ops/s $\color{#35bf28}+0.83\%$
test_to_module_speed[False] 1.9110ms 1.3750ms 727.2593 Ops/s 711.5672 Ops/s $\color{#35bf28}+2.21\%$
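
For readers checking the numbers, the Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column. The sketch below reproduces it for three CPU rows copied from the table above. The function names are hypothetical, and the 5% cutoff for counting a benchmark as improved or worsened is an assumption inferred from which rows are bold in the table, not something the benchmark bot states.

```python
# Sketch only: recompute the "Change" column from the two Ops columns.
# Assumption: change = (Ops - Ops on Repo HEAD) / Ops on Repo HEAD.
# Assumption: a row counts as improved/worsened when |change| > 5%
# (inferred from the bold rows, not documented in the PR).

def percent_change(ops: float, ops_head: float) -> float:
    """Relative change in throughput, in percent (same unit for both inputs)."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a benchmark using the assumed threshold."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# (name, Ops, Ops on Repo HEAD), all in KOps/s, taken from the CPU table.
rows = [
    ("test_plain_set_nested", 59.3988, 62.0741),
    ("test_plain_set_stack_nested", 57.8326, 61.4018),
    ("test_items", 442.5116, 406.2506),
]

results = [
    (name, round(percent_change(ops, head), 2), classify(percent_change(ops, head)))
    for name, ops, head in rows
]

for name, change, label in results:
    print(f"{name}: {change:+.2f}% ({label})")
```

Both Ops values in a row must be in the same unit (KOps/s here) before the ratio is taken; a few rows in the tables use Ops/s or MOps/s instead.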


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 134. Improved: $\large\color{#35bf28}16$. Worsened: $\large\color{#d91a1a}7$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.7811ms 13.1428μs 76.0874 KOps/s 71.9581 KOps/s $\textbf{\color{#35bf28}+5.74\%}$
test_plain_set_stack_nested 35.3910μs 13.1577μs 76.0013 KOps/s 71.3600 KOps/s $\textbf{\color{#35bf28}+6.50\%}$
test_plain_set_nested_inplace 47.0500μs 14.4642μs 69.1361 KOps/s 65.0728 KOps/s $\textbf{\color{#35bf28}+6.24\%}$
test_plain_set_stack_nested_inplace 33.0910μs 14.4881μs 69.0223 KOps/s 65.0192 KOps/s $\textbf{\color{#35bf28}+6.16\%}$
test_items 32.1800μs 4.7403μs 210.9551 KOps/s 210.1722 KOps/s $\color{#35bf28}+0.37\%$
test_items_nested 0.3602ms 0.3373ms 2.9648 KOps/s 2.9409 KOps/s $\color{#35bf28}+0.82\%$
test_items_nested_locked 0.3788ms 0.3409ms 2.9334 KOps/s 2.9304 KOps/s $\color{#35bf28}+0.10\%$
test_items_nested_leaf 0.2299ms 0.1997ms 5.0063 KOps/s 4.9653 KOps/s $\color{#35bf28}+0.83\%$
test_items_stack_nested 0.3610ms 0.3394ms 2.9460 KOps/s 2.9540 KOps/s $\color{#d91a1a}-0.27\%$
test_items_stack_nested_leaf 0.2218ms 0.1981ms 5.0488 KOps/s 4.9490 KOps/s $\color{#35bf28}+2.02\%$
test_items_stack_nested_locked 0.3745ms 0.3407ms 2.9354 KOps/s 2.9140 KOps/s $\color{#35bf28}+0.73\%$
test_keys 23.6510μs 4.5673μs 218.9473 KOps/s 217.9399 KOps/s $\color{#35bf28}+0.46\%$
test_keys_nested 43.4248ms 0.1003ms 9.9678 KOps/s 10.4880 KOps/s $\color{#d91a1a}-4.96\%$
test_keys_nested_locked 0.1415ms 97.6253μs 10.2433 KOps/s 10.1207 KOps/s $\color{#35bf28}+1.21\%$
test_keys_nested_leaf 0.1128ms 77.4231μs 12.9160 KOps/s 12.7071 KOps/s $\color{#35bf28}+1.64\%$
test_keys_stack_nested 0.1285ms 94.1670μs 10.6194 KOps/s 10.4828 KOps/s $\color{#35bf28}+1.30\%$
test_keys_stack_nested_leaf 98.7430μs 77.3194μs 12.9334 KOps/s 12.7729 KOps/s $\color{#35bf28}+1.26\%$
test_keys_stack_nested_locked 0.1261ms 98.6310μs 10.1388 KOps/s 10.0462 KOps/s $\color{#35bf28}+0.92\%$
test_values 7.8033μs 1.9177μs 521.4709 KOps/s 531.1192 KOps/s $\color{#d91a1a}-1.82\%$
test_values_nested 75.5020μs 45.4728μs 21.9911 KOps/s 22.0333 KOps/s $\color{#d91a1a}-0.19\%$
test_values_nested_locked 78.9220μs 47.7500μs 20.9424 KOps/s 20.9289 KOps/s $\color{#35bf28}+0.06\%$
test_values_nested_leaf 61.4020μs 39.9671μs 25.0206 KOps/s 25.1697 KOps/s $\color{#d91a1a}-0.59\%$
test_values_stack_nested 72.2720μs 46.4300μs 21.5378 KOps/s 21.5148 KOps/s $\color{#35bf28}+0.11\%$
test_values_stack_nested_leaf 69.0000μs 40.0286μs 24.9821 KOps/s 24.9683 KOps/s $\color{#35bf28}+0.06\%$
test_values_stack_nested_locked 69.7210μs 48.2353μs 20.7317 KOps/s 20.6300 KOps/s $\color{#35bf28}+0.49\%$
test_membership 4.2062μs 0.9538μs 1.0484 MOps/s 1.0523 MOps/s $\color{#d91a1a}-0.37\%$
test_membership_nested 18.7510μs 2.9117μs 343.4376 KOps/s 342.1440 KOps/s $\color{#35bf28}+0.38\%$
test_membership_nested_leaf 17.5405μs 2.8476μs 351.1783 KOps/s 342.4791 KOps/s $\color{#35bf28}+2.54\%$
test_membership_stacked_nested 22.4400μs 2.9583μs 338.0351 KOps/s 335.7125 KOps/s $\color{#35bf28}+0.69\%$
test_membership_stacked_nested_leaf 21.6700μs 2.9615μs 337.6674 KOps/s 337.9886 KOps/s $\color{#d91a1a}-0.10\%$
test_membership_nested_last 28.8610μs 5.3808μs 185.8458 KOps/s 184.6614 KOps/s $\color{#35bf28}+0.64\%$
test_membership_nested_leaf_last 34.3100μs 5.3883μs 185.5877 KOps/s 186.0563 KOps/s $\color{#d91a1a}-0.25\%$
test_membership_stacked_nested_last 25.4900μs 7.6388μs 130.9103 KOps/s 173.2759 KOps/s $\textbf{\color{#d91a1a}-24.45\%}$
test_membership_stacked_nested_leaf_last 39.0810μs 7.6085μs 131.4313 KOps/s 174.0655 KOps/s $\textbf{\color{#d91a1a}-24.49\%}$
test_nested_getleaf 32.7910μs 8.4127μs 118.8682 KOps/s 118.5372 KOps/s $\color{#35bf28}+0.28\%$
test_nested_get 38.1820μs 7.9504μs 125.7794 KOps/s 126.0932 KOps/s $\color{#d91a1a}-0.25\%$
test_stacked_getleaf 26.2900μs 8.4705μs 118.0573 KOps/s 118.1669 KOps/s $\color{#d91a1a}-0.09\%$
test_stacked_get 40.8100μs 7.9973μs 125.0415 KOps/s 125.4885 KOps/s $\color{#d91a1a}-0.36\%$
test_nested_getitemleaf 27.1620μs 9.8203μs 101.8301 KOps/s 102.1356 KOps/s $\color{#d91a1a}-0.30\%$
test_nested_getitem 34.8710μs 9.3411μs 107.0533 KOps/s 107.3100 KOps/s $\color{#d91a1a}-0.24\%$
test_stacked_getitemleaf 26.5700μs 9.8550μs 101.4715 KOps/s 101.2395 KOps/s $\color{#35bf28}+0.23\%$
test_stacked_getitem 39.6200μs 9.3753μs 106.6631 KOps/s 106.4841 KOps/s $\color{#35bf28}+0.17\%$
test_lock_nested 2.1571ms 0.3613ms 2.7680 KOps/s 2.7626 KOps/s $\color{#35bf28}+0.20\%$
test_lock_stack_nested 0.3502ms 0.3123ms 3.2025 KOps/s 3.2461 KOps/s $\color{#d91a1a}-1.34\%$
test_unlock_nested 0.7212ms 0.3607ms 2.7721 KOps/s 2.8359 KOps/s $\color{#d91a1a}-2.25\%$
test_unlock_stack_nested 0.3586ms 0.3219ms 3.1065 KOps/s 3.1560 KOps/s $\color{#d91a1a}-1.57\%$
test_flatten_speed 0.4610ms 0.2618ms 3.8196 KOps/s 3.8667 KOps/s $\color{#d91a1a}-1.22\%$
test_unflatten_speed 0.3879ms 0.3576ms 2.7967 KOps/s 2.7848 KOps/s $\color{#35bf28}+0.43\%$
test_common_ops 1.0289ms 0.5881ms 1.7003 KOps/s 1.6451 KOps/s $\color{#35bf28}+3.36\%$
test_creation 29.9010μs 1.5840μs 631.3171 KOps/s 639.7888 KOps/s $\color{#d91a1a}-1.32\%$
test_creation_empty 19.9700μs 6.9796μs 143.2737 KOps/s 112.9834 KOps/s $\textbf{\color{#35bf28}+26.81\%}$
test_creation_nested_1 75.5800μs 8.6740μs 115.2877 KOps/s 94.2819 KOps/s $\textbf{\color{#35bf28}+22.28\%}$
test_creation_nested_2 47.5310μs 11.2107μs 89.2005 KOps/s 75.5812 KOps/s $\textbf{\color{#35bf28}+18.02\%}$
test_clone 45.0420μs 14.5636μs 68.6642 KOps/s 71.3351 KOps/s $\color{#d91a1a}-3.74\%$
test_getitem[int] 49.1600μs 11.3269μs 88.2855 KOps/s 91.5376 KOps/s $\color{#d91a1a}-3.55\%$
test_getitem[slice_int] 47.0020μs 22.2943μs 44.8546 KOps/s 46.2563 KOps/s $\color{#d91a1a}-3.03\%$
test_getitem[range] 68.7020μs 51.9592μs 19.2459 KOps/s 19.7733 KOps/s $\color{#d91a1a}-2.67\%$
test_getitem[tuple] 42.5110μs 19.8778μs 50.3073 KOps/s 50.9391 KOps/s $\color{#d91a1a}-1.24\%$
test_getitem[list] 0.1449ms 39.0314μs 25.6204 KOps/s 26.5434 KOps/s $\color{#d91a1a}-3.48\%$
test_setitem_dim[int] 55.1210μs 26.6878μs 37.4704 KOps/s 36.9883 KOps/s $\color{#35bf28}+1.30\%$
test_setitem_dim[slice_int] 67.5220μs 47.4596μs 21.0705 KOps/s 20.2440 KOps/s $\color{#35bf28}+4.08\%$
test_setitem_dim[range] 90.6520μs 68.4500μs 14.6092 KOps/s 14.4793 KOps/s $\color{#35bf28}+0.90\%$
test_setitem_dim[tuple] 58.0510μs 41.4796μs 24.1082 KOps/s 23.0507 KOps/s $\color{#35bf28}+4.59\%$
test_setitem 56.4300μs 19.2759μs 51.8784 KOps/s 52.2132 KOps/s $\color{#d91a1a}-0.64\%$
test_set 58.3000μs 18.7014μs 53.4720 KOps/s 48.5072 KOps/s $\textbf{\color{#35bf28}+10.24\%}$
test_set_shared 0.1286s 0.1345ms 7.4342 KOps/s 9.7300 KOps/s $\textbf{\color{#d91a1a}-23.59\%}$
test_update 80.9710μs 20.1287μs 49.6804 KOps/s 47.7661 KOps/s $\color{#35bf28}+4.01\%$
test_update_nested 79.2910μs 26.7442μs 37.3913 KOps/s 34.9767 KOps/s $\textbf{\color{#35bf28}+6.90\%}$
test_set_nested 70.3120μs 19.5394μs 51.1787 KOps/s 48.4004 KOps/s $\textbf{\color{#35bf28}+5.74\%}$
test_set_nested_new 77.0910μs 22.6779μs 44.0958 KOps/s 41.6760 KOps/s $\textbf{\color{#35bf28}+5.81\%}$
test_select 83.4820μs 34.9247μs 28.6330 KOps/s 28.4631 KOps/s $\color{#35bf28}+0.60\%$
test_select_nested 77.2020μs 53.4087μs 18.7235 KOps/s 18.6333 KOps/s $\color{#35bf28}+0.48\%$
test_exclude_nested 0.6916ms 0.1165ms 8.5863 KOps/s 8.4763 KOps/s $\color{#35bf28}+1.30\%$
test_empty[True] 0.9586ms 0.3957ms 2.5271 KOps/s 2.5265 KOps/s $\color{#35bf28}+0.03\%$
test_empty[False] 3.2740μs 0.8442μs 1.1846 MOps/s 1.1697 MOps/s $\color{#35bf28}+1.27\%$
test_to 76.0120μs 55.9290μs 17.8798 KOps/s 18.1004 KOps/s $\color{#d91a1a}-1.22\%$
test_to_nonblocking 64.0510μs 36.3130μs 27.5383 KOps/s 28.0561 KOps/s $\color{#d91a1a}-1.85\%$
test_unbind_speed 0.3224ms 0.2745ms 3.6424 KOps/s 3.7392 KOps/s $\color{#d91a1a}-2.59\%$
test_unbind_speed_stack0 0.3144ms 0.2713ms 3.6854 KOps/s 3.7537 KOps/s $\color{#d91a1a}-1.82\%$
test_unbind_speed_stack1 0.1278s 0.7773ms 1.2865 KOps/s 1.2828 KOps/s $\color{#35bf28}+0.29\%$
test_split 1.6391ms 1.5733ms 635.5893 Ops/s 651.5471 Ops/s $\color{#d91a1a}-2.45\%$
test_chunk 1.6077ms 1.5657ms 638.6890 Ops/s 654.2541 Ops/s $\color{#d91a1a}-2.38\%$
test_creation[device0] 0.1229ms 73.2315μs 13.6553 KOps/s 13.6478 KOps/s $\color{#35bf28}+0.06\%$
test_creation_from_tensor 0.1320ms 55.1915μs 18.1187 KOps/s 18.4259 KOps/s $\color{#d91a1a}-1.67\%$
test_add_one[memmap_tensor0] 0.1236ms 7.9383μs 125.9713 KOps/s 140.9483 KOps/s $\textbf{\color{#d91a1a}-10.63\%}$
test_contiguous[memmap_tensor0] 9.7000μs 0.6434μs 1.5543 MOps/s 1.5314 MOps/s $\color{#35bf28}+1.49\%$
test_stack[memmap_tensor0] 31.5310μs 4.9060μs 203.8335 KOps/s 218.4097 KOps/s $\textbf{\color{#d91a1a}-6.67\%}$
test_memmaptd_index 1.1030ms 0.2669ms 3.7472 KOps/s 3.7174 KOps/s $\color{#35bf28}+0.80\%$
test_memmaptd_index_astensor 0.6392ms 0.3262ms 3.0653 KOps/s 3.0688 KOps/s $\color{#d91a1a}-0.11\%$
test_memmaptd_index_op 0.9411ms 0.6325ms 1.5811 KOps/s 1.5910 KOps/s $\color{#d91a1a}-0.62\%$
test_serialize_model 0.2268s 0.1034s 9.6668 Ops/s 9.0192 Ops/s $\textbf{\color{#35bf28}+7.18\%}$
test_serialize_model_pickle 1.3483s 1.2357s 0.8093 Ops/s 0.8085 Ops/s $\color{#35bf28}+0.10\%$
test_serialize_weights 88.4076ms 86.7960ms 11.5213 Ops/s 10.7601 Ops/s $\textbf{\color{#35bf28}+7.07\%}$
test_serialize_weights_returnearly 60.1617ms 54.9475ms 18.1992 Ops/s 11.6890 Ops/s $\textbf{\color{#35bf28}+55.70\%}$
test_serialize_weights_pickle 1.3484s 1.2483s 0.8011 Ops/s 0.8012 Ops/s $-0.01\%$
test_reshape_pytree 48.5210μs 25.6417μs 38.9989 KOps/s 39.6595 KOps/s $\color{#d91a1a}-1.67\%$
test_reshape_td 56.8410μs 31.8262μs 31.4206 KOps/s 31.8819 KOps/s $\color{#d91a1a}-1.45\%$
test_view_pytree 0.1177ms 25.3788μs 39.4030 KOps/s 40.3749 KOps/s $\color{#d91a1a}-2.41\%$
test_view_td 0.1396s 59.4095μs 16.8323 KOps/s 16.9639 KOps/s $\color{#d91a1a}-0.78\%$
test_unbind_pytree 76.0720μs 30.6255μs 32.6525 KOps/s 33.0770 KOps/s $\color{#d91a1a}-1.28\%$
test_unbind_td 72.7810μs 40.8570μs 24.4756 KOps/s 24.3857 KOps/s $\color{#35bf28}+0.37\%$
test_split_pytree 50.7500μs 29.9549μs 33.3835 KOps/s 34.8762 KOps/s $\color{#d91a1a}-4.28\%$
test_split_td 0.3630ms 41.2640μs 24.2342 KOps/s 25.3563 KOps/s $\color{#d91a1a}-4.43\%$
test_add_pytree 69.2510μs 39.3075μs 25.4405 KOps/s 27.8801 KOps/s $\textbf{\color{#d91a1a}-8.75\%}$
test_add_td 87.7820μs 51.7322μs 19.3303 KOps/s 20.0399 KOps/s $\color{#d91a1a}-3.54\%$
test_distributed 2.5793ms 91.4936μs 10.9297 KOps/s 13.9961 KOps/s $\textbf{\color{#d91a1a}-21.91\%}$
test_tdmodule 0.1077ms 17.9997μs 55.5564 KOps/s 54.8809 KOps/s $\color{#35bf28}+1.23\%$
test_tdmodule_dispatch 0.1324ms 35.6840μs 28.0238 KOps/s 26.7154 KOps/s $\color{#35bf28}+4.90\%$
test_tdseq 38.5400μs 20.3614μs 49.1125 KOps/s 46.7279 KOps/s $\textbf{\color{#35bf28}+5.10\%}$
test_tdseq_dispatch 66.5410μs 37.3994μs 26.7384 KOps/s 24.6441 KOps/s $\textbf{\color{#35bf28}+8.50\%}$
test_instantiation_functorch 1.8513ms 1.6753ms 596.9203 Ops/s 600.0377 Ops/s $\color{#d91a1a}-0.52\%$
test_instantiation_td 1.6861ms 1.1534ms 866.9857 Ops/s 860.3133 Ops/s $\color{#35bf28}+0.78\%$
test_exec_functorch 0.2086ms 0.1626ms 6.1517 KOps/s 6.2546 KOps/s $\color{#d91a1a}-1.65\%$
test_exec_functional_call 0.2171ms 0.1584ms 6.3133 KOps/s 6.3322 KOps/s $\color{#d91a1a}-0.30\%$
test_exec_td 0.1818ms 0.1523ms 6.5674 KOps/s 6.7156 KOps/s $\color{#d91a1a}-2.21\%$
test_exec_td_decorator 0.8046ms 0.1957ms 5.1097 KOps/s 5.1319 KOps/s $\color{#d91a1a}-0.43\%$
test_vmap_mlp_speed[True-True] 0.7381ms 0.6096ms 1.6404 KOps/s 1.6495 KOps/s $\color{#d91a1a}-0.55\%$
test_vmap_mlp_speed[True-False] 0.6681ms 0.6040ms 1.6556 KOps/s 1.6502 KOps/s $\color{#35bf28}+0.33\%$
test_vmap_mlp_speed[False-True] 0.5984ms 0.5372ms 1.8615 KOps/s 1.8772 KOps/s $\color{#d91a1a}-0.84\%$
test_vmap_mlp_speed[False-False] 0.7454ms 0.5590ms 1.7890 KOps/s 1.8787 KOps/s $\color{#d91a1a}-4.78\%$
test_vmap_mlp_speed_decorator[True-True] 0.7554ms 0.6630ms 1.5082 KOps/s 1.5549 KOps/s $\color{#d91a1a}-3.00\%$
test_vmap_mlp_speed_decorator[True-False] 1.1190ms 0.6468ms 1.5460 KOps/s 1.5475 KOps/s $\color{#d91a1a}-0.10\%$
test_vmap_mlp_speed_decorator[False-True] 0.6666ms 0.5524ms 1.8104 KOps/s 1.7266 KOps/s $\color{#35bf28}+4.85\%$
test_vmap_mlp_speed_decorator[False-False] 0.8185ms 0.5673ms 1.7627 KOps/s 1.7641 KOps/s $\color{#d91a1a}-0.08\%$
test_vmap_transformer_speed[True-True] 8.6136ms 8.2601ms 121.0637 Ops/s 119.8768 Ops/s $\color{#35bf28}+0.99\%$
test_vmap_transformer_speed[True-False] 8.2194ms 8.1454ms 122.7689 Ops/s 120.4290 Ops/s $\color{#35bf28}+1.94\%$
test_vmap_transformer_speed[False-True] 8.1470ms 8.0722ms 123.8815 Ops/s 121.1903 Ops/s $\color{#35bf28}+2.22\%$
test_vmap_transformer_speed[False-False] 8.1604ms 8.0711ms 123.8983 Ops/s 121.2816 Ops/s $\color{#35bf28}+2.16\%$
test_vmap_transformer_speed_decorator[True-True] 19.4070ms 19.3351ms 51.7193 Ops/s 50.5260 Ops/s $\color{#35bf28}+2.36\%$
test_vmap_transformer_speed_decorator[True-False] 19.4005ms 19.3018ms 51.8087 Ops/s 50.6138 Ops/s $\color{#35bf28}+2.36\%$
test_vmap_transformer_speed_decorator[False-True] 18.9996ms 18.8915ms 52.9337 Ops/s 51.5478 Ops/s $\color{#35bf28}+2.69\%$
test_vmap_transformer_speed_decorator[False-False] 18.9974ms 18.9193ms 52.8562 Ops/s 51.5689 Ops/s $\color{#35bf28}+2.50\%$
test_to_module_speed[True] 1.3315ms 1.2351ms 809.6470 Ops/s 773.7345 Ops/s $\color{#35bf28}+4.64\%$
test_to_module_speed[False] 1.7639ms 1.2078ms 827.9650 Ops/s 799.2731 Ops/s $\color{#35bf28}+3.59\%$

@vmoens merged commit 7fae5ae into main Feb 23, 2024
47 of 48 checks passed
@vmoens deleted the fix-to-module2 branch February 23, 2024 19:44
Development

Successfully merging this pull request may close these issues.

[BUG] Multiagent nets problems with SAC
2 participants