[BugFix] Avoid collapsing NonTensorStack when calling where #837

Merged: 2 commits, Jun 25, 2024

@vmoens (Contributor) commented Jun 25, 2024

Closes #831

@facebook-github-bot added the CLA Signed label Jun 25, 2024
@vmoens added the bug label Jun 25, 2024

github-actions bot commented Jun 25, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}7$. Worsened: $\large\color{#d91a1a}7$.

Expand to view detailed results
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 30.7570μs 16.8875μs 59.2155 KOps/s 60.2365 KOps/s $\color{#d91a1a}-1.70\%$
test_plain_set_stack_nested 33.4830μs 17.0873μs 58.5231 KOps/s 59.0635 KOps/s $\color{#d91a1a}-0.92\%$
test_plain_set_nested_inplace 58.0990μs 19.0294μs 52.5502 KOps/s 52.1865 KOps/s $\color{#35bf28}+0.70\%$
test_plain_set_stack_nested_inplace 0.2623ms 20.1109μs 49.7242 KOps/s 52.3293 KOps/s $\color{#d91a1a}-4.98\%$
test_items 14.2360μs 2.6245μs 381.0178 KOps/s 402.4475 KOps/s $\textbf{\color{#d91a1a}-5.32\%}$
test_items_nested 0.3516ms 0.2656ms 3.7644 KOps/s 3.7869 KOps/s $\color{#d91a1a}-0.59\%$
test_items_nested_locked 1.1305ms 0.2637ms 3.7925 KOps/s 3.6806 KOps/s $\color{#35bf28}+3.04\%$
test_items_nested_leaf 0.1286ms 76.0184μs 13.1547 KOps/s 12.7940 KOps/s $\color{#35bf28}+2.82\%$
test_items_stack_nested 1.9791ms 0.2702ms 3.7006 KOps/s 3.8056 KOps/s $\color{#d91a1a}-2.76\%$
test_items_stack_nested_leaf 0.1295ms 76.4910μs 13.0734 KOps/s 12.6605 KOps/s $\color{#35bf28}+3.26\%$
test_items_stack_nested_locked 0.4504ms 0.2643ms 3.7836 KOps/s 3.8175 KOps/s $\color{#d91a1a}-0.89\%$
test_keys 43.7710μs 3.8479μs 259.8801 KOps/s 250.2786 KOps/s $\color{#35bf28}+3.84\%$
test_keys_nested 0.6231ms 0.1369ms 7.3025 KOps/s 7.2228 KOps/s $\color{#35bf28}+1.10\%$
test_keys_nested_locked 0.9169ms 0.1418ms 7.0544 KOps/s 7.0287 KOps/s $\color{#35bf28}+0.37\%$
test_keys_nested_leaf 0.1980ms 0.1151ms 8.6870 KOps/s 8.5538 KOps/s $\color{#35bf28}+1.56\%$
test_keys_stack_nested 0.2886ms 0.1367ms 7.3126 KOps/s 7.2944 KOps/s $\color{#35bf28}+0.25\%$
test_keys_stack_nested_leaf 0.2064ms 0.1155ms 8.6614 KOps/s 8.4812 KOps/s $\color{#35bf28}+2.12\%$
test_keys_stack_nested_locked 0.2394ms 0.1407ms 7.1086 KOps/s 7.0338 KOps/s $\color{#35bf28}+1.06\%$
test_values 9.3956μs 1.1437μs 874.3307 KOps/s 843.4067 KOps/s $\color{#35bf28}+3.67\%$
test_values_nested 92.1320μs 50.6730μs 19.7344 KOps/s 19.8705 KOps/s $\color{#d91a1a}-0.68\%$
test_values_nested_locked 97.5630μs 50.7885μs 19.6895 KOps/s 20.0038 KOps/s $\color{#d91a1a}-1.57\%$
test_values_nested_leaf 89.4970μs 45.6319μs 21.9145 KOps/s 22.0323 KOps/s $\color{#d91a1a}-0.53\%$
test_values_stack_nested 0.1248ms 50.7253μs 19.7140 KOps/s 19.6385 KOps/s $\color{#35bf28}+0.38\%$
test_values_stack_nested_leaf 0.1014ms 45.6998μs 21.8819 KOps/s 22.0121 KOps/s $\color{#d91a1a}-0.59\%$
test_values_stack_nested_locked 0.1106ms 51.2007μs 19.5310 KOps/s 19.7343 KOps/s $\color{#d91a1a}-1.03\%$
test_membership 25.5980μs 1.3562μs 737.3317 KOps/s 759.7288 KOps/s $\color{#d91a1a}-2.95\%$
test_membership_nested 40.5660μs 3.4343μs 291.1842 KOps/s 293.9113 KOps/s $\color{#d91a1a}-0.93\%$
test_membership_nested_leaf 39.0830μs 3.4473μs 290.0850 KOps/s 290.6250 KOps/s $\color{#d91a1a}-0.19\%$
test_membership_stacked_nested 47.2280μs 3.4016μs 293.9808 KOps/s 290.0887 KOps/s $\color{#35bf28}+1.34\%$
test_membership_stacked_nested_leaf 39.9650μs 3.4649μs 288.6062 KOps/s 290.8490 KOps/s $\color{#d91a1a}-0.77\%$
test_membership_nested_last 45.2240μs 4.2194μs 237.0028 KOps/s 239.7264 KOps/s $\color{#d91a1a}-1.14\%$
test_membership_nested_leaf_last 43.2800μs 4.2923μs 232.9752 KOps/s 234.5426 KOps/s $\color{#d91a1a}-0.67\%$
test_membership_stacked_nested_last 44.0720μs 4.2249μs 236.6943 KOps/s 192.2604 KOps/s $\textbf{\color{#35bf28}+23.11\%}$
test_membership_stacked_nested_leaf_last 39.0930μs 4.2115μs 237.4429 KOps/s 210.8862 KOps/s $\textbf{\color{#35bf28}+12.59\%}$
test_nested_getleaf 54.5120μs 10.8315μs 92.3234 KOps/s 93.8361 KOps/s $\color{#d91a1a}-1.61\%$
test_nested_get 55.4430μs 10.0586μs 99.4179 KOps/s 98.4652 KOps/s $\color{#35bf28}+0.97\%$
test_stacked_getleaf 30.4470μs 10.6264μs 94.1054 KOps/s 94.6576 KOps/s $\color{#d91a1a}-0.58\%$
test_stacked_get 59.8490μs 10.0023μs 99.9770 KOps/s 99.8666 KOps/s $\color{#35bf28}+0.11\%$
test_nested_getitemleaf 40.9970μs 11.1764μs 89.4742 KOps/s 88.6811 KOps/s $\color{#35bf28}+0.89\%$
test_nested_getitem 31.2480μs 10.2789μs 97.2862 KOps/s 96.7477 KOps/s $\color{#35bf28}+0.56\%$
test_stacked_getitemleaf 54.1950μs 10.9541μs 91.2900 KOps/s 90.1703 KOps/s $\color{#35bf28}+1.24\%$
test_stacked_getitem 55.6240μs 10.1137μs 98.8759 KOps/s 98.8503 KOps/s $\color{#35bf28}+0.03\%$
test_lock_nested 55.0724ms 0.3941ms 2.5372 KOps/s 2.9516 KOps/s $\textbf{\color{#d91a1a}-14.04\%}$
test_lock_stack_nested 0.6596ms 0.3092ms 3.2342 KOps/s 3.2469 KOps/s $\color{#d91a1a}-0.39\%$
test_unlock_nested 0.7261ms 0.3448ms 2.9000 KOps/s 2.8650 KOps/s $\color{#35bf28}+1.22\%$
test_unlock_stack_nested 0.4182ms 0.3155ms 3.1693 KOps/s 3.1634 KOps/s $\color{#35bf28}+0.19\%$
test_flatten_speed 0.2369ms 93.9031μs 10.6493 KOps/s 10.3456 KOps/s $\color{#35bf28}+2.94\%$
test_unflatten_speed 0.6352ms 0.4074ms 2.4545 KOps/s 2.4206 KOps/s $\color{#35bf28}+1.40\%$
test_common_ops 4.6710ms 0.7150ms 1.3985 KOps/s 1.4255 KOps/s $\color{#d91a1a}-1.89\%$
test_creation 23.5240μs 1.9131μs 522.7187 KOps/s 515.2700 KOps/s $\color{#35bf28}+1.45\%$
test_creation_empty 37.7310μs 10.4719μs 95.4936 KOps/s 96.5683 KOps/s $\color{#d91a1a}-1.11\%$
test_creation_nested_1 53.7000μs 13.3036μs 75.1679 KOps/s 75.0726 KOps/s $\color{#35bf28}+0.13\%$
test_creation_nested_2 63.0080μs 16.1524μs 61.9104 KOps/s 61.1014 KOps/s $\color{#35bf28}+1.32\%$
test_clone 0.1466ms 13.3375μs 74.9765 KOps/s 74.0198 KOps/s $\color{#35bf28}+1.29\%$
test_getitem[int] 29.0140μs 11.4423μs 87.3948 KOps/s 87.0567 KOps/s $\color{#35bf28}+0.39\%$
test_getitem[slice_int] 62.1060μs 22.8124μs 43.8358 KOps/s 43.4420 KOps/s $\color{#35bf28}+0.91\%$
test_getitem[range] 81.5930μs 59.9966μs 16.6676 KOps/s 16.9456 KOps/s $\color{#d91a1a}-1.64\%$
test_getitem[tuple] 44.7440μs 19.2183μs 52.0338 KOps/s 52.5625 KOps/s $\color{#d91a1a}-1.01\%$
test_getitem[list] 88.1050μs 41.0449μs 24.3635 KOps/s 25.0830 KOps/s $\color{#d91a1a}-2.87\%$
test_setitem_dim[int] 76.5940μs 33.7510μs 29.6288 KOps/s 30.2910 KOps/s $\color{#d91a1a}-2.19\%$
test_setitem_dim[slice_int] 0.1133ms 60.9458μs 16.4080 KOps/s 16.4636 KOps/s $\color{#d91a1a}-0.34\%$
test_setitem_dim[range] 0.1445ms 83.6601μs 11.9531 KOps/s 12.0411 KOps/s $\color{#d91a1a}-0.73\%$
test_setitem_dim[tuple] 0.1067ms 49.5819μs 20.1687 KOps/s 20.6714 KOps/s $\color{#d91a1a}-2.43\%$
test_setitem 52.5380μs 20.0384μs 49.9042 KOps/s 49.8516 KOps/s $\color{#35bf28}+0.11\%$
test_set 0.1235ms 19.9155μs 50.2122 KOps/s 51.2343 KOps/s $\color{#d91a1a}-2.00\%$
test_set_shared 2.7686ms 0.1554ms 6.4360 KOps/s 6.9153 KOps/s $\textbf{\color{#d91a1a}-6.93\%}$
test_update 0.1407ms 22.2390μs 44.9661 KOps/s 45.6628 KOps/s $\color{#d91a1a}-1.53\%$
test_update_nested 0.1190ms 31.9306μs 31.3179 KOps/s 33.1668 KOps/s $\textbf{\color{#d91a1a}-5.57\%}$
test_update__nested 96.8680μs 25.8553μs 38.6767 KOps/s 39.1876 KOps/s $\color{#d91a1a}-1.30\%$
test_set_nested 79.9700μs 21.6469μs 46.1960 KOps/s 46.8608 KOps/s $\color{#d91a1a}-1.42\%$
test_set_nested_new 92.1820μs 25.8883μs 38.6275 KOps/s 39.1402 KOps/s $\color{#d91a1a}-1.31\%$
test_select 0.1239ms 40.9127μs 24.4423 KOps/s 24.7153 KOps/s $\color{#d91a1a}-1.10\%$
test_select_nested 0.1414ms 59.7401μs 16.7392 KOps/s 16.2267 KOps/s $\color{#35bf28}+3.16\%$
test_exclude_nested 0.2169ms 0.1194ms 8.3728 KOps/s 8.0884 KOps/s $\color{#35bf28}+3.52\%$
test_empty[True] 0.8907ms 0.3917ms 2.5530 KOps/s 2.4910 KOps/s $\color{#35bf28}+2.49\%$
test_empty[False] 25.6680μs 1.1580μs 863.5691 KOps/s 850.8801 KOps/s $\color{#35bf28}+1.49\%$
test_unbind_speed 1.6514ms 0.2594ms 3.8557 KOps/s 3.9532 KOps/s $\color{#d91a1a}-2.47\%$
test_unbind_speed_stack0 0.4275ms 0.2497ms 4.0049 KOps/s 3.9169 KOps/s $\color{#35bf28}+2.25\%$
test_unbind_speed_stack1 77.3615ms 0.7399ms 1.3516 KOps/s 1.3547 KOps/s $\color{#d91a1a}-0.23\%$
test_split 76.1851ms 1.6062ms 622.5700 Ops/s 620.8322 Ops/s $\color{#35bf28}+0.28\%$
test_chunk 75.9000ms 1.6120ms 620.3286 Ops/s 621.7894 Ops/s $\color{#d91a1a}-0.23\%$
test_creation[device0] 0.2362ms 83.2955μs 12.0054 KOps/s 11.6940 KOps/s $\color{#35bf28}+2.66\%$
test_creation_from_tensor 3.3454ms 87.4957μs 11.4291 KOps/s 11.7585 KOps/s $\color{#d91a1a}-2.80\%$
test_add_one[memmap_tensor0] 0.1728ms 5.4210μs 184.4682 KOps/s 194.5773 KOps/s $\textbf{\color{#d91a1a}-5.20\%}$
test_contiguous[memmap_tensor0] 14.9380μs 0.6262μs 1.5969 MOps/s 1.5794 MOps/s $\color{#35bf28}+1.10\%$
test_stack[memmap_tensor0] 36.6180μs 3.5652μs 280.4911 KOps/s 283.6566 KOps/s $\color{#d91a1a}-1.12\%$
test_memmaptd_index 1.1151ms 0.2564ms 3.8999 KOps/s 3.9382 KOps/s $\color{#d91a1a}-0.97\%$
test_memmaptd_index_astensor 0.7701ms 0.3304ms 3.0267 KOps/s 3.0460 KOps/s $\color{#d91a1a}-0.63\%$
test_memmaptd_index_op 1.0808ms 0.6099ms 1.6395 KOps/s 1.6846 KOps/s $\color{#d91a1a}-2.67\%$
test_serialize_model 0.1802s 0.1153s 8.6702 Ops/s 8.4529 Ops/s $\color{#35bf28}+2.57\%$
test_serialize_model_pickle 0.4514s 0.3765s 2.6559 Ops/s 2.6126 Ops/s $\color{#35bf28}+1.66\%$
test_serialize_weights 0.1888s 0.1144s 8.7443 Ops/s 8.6173 Ops/s $\color{#35bf28}+1.47\%$
test_serialize_weights_returnearly 0.2066s 0.1382s 7.2369 Ops/s 7.1441 Ops/s $\color{#35bf28}+1.30\%$
test_serialize_weights_pickle 0.7771s 0.5217s 1.9168 Ops/s 1.3681 Ops/s $\textbf{\color{#35bf28}+40.10\%}$
test_serialize_weights_filesystem 0.1071s 95.3344ms 10.4894 Ops/s 10.6615 Ops/s $\color{#d91a1a}-1.61\%$
test_serialize_model_filesystem 0.1011s 95.6638ms 10.4533 Ops/s 9.8202 Ops/s $\textbf{\color{#35bf28}+6.45\%}$
test_reshape_pytree 54.9020μs 25.2862μs 39.5473 KOps/s 39.1359 KOps/s $\color{#35bf28}+1.05\%$
test_reshape_td 94.4370μs 34.2396μs 29.2060 KOps/s 29.1142 KOps/s $\color{#35bf28}+0.32\%$
test_view_pytree 87.1620μs 25.2844μs 39.5501 KOps/s 39.5343 KOps/s $\color{#35bf28}+0.04\%$
test_view_td 87.7640μs 38.6103μs 25.8998 KOps/s 25.6162 KOps/s $\color{#35bf28}+1.11\%$
test_unbind_pytree 62.1860μs 29.2536μs 34.1838 KOps/s 33.8245 KOps/s $\color{#35bf28}+1.06\%$
test_unbind_td 0.4114ms 37.6967μs 26.5275 KOps/s 26.5266 KOps/s $+0.00\%$
test_split_pytree 73.3970μs 29.2964μs 34.1339 KOps/s 34.1016 KOps/s $\color{#35bf28}+0.09\%$
test_split_td 0.5954ms 40.6045μs 24.6278 KOps/s 24.5049 KOps/s $\color{#35bf28}+0.50\%$
test_add_pytree 0.1133ms 34.8521μs 28.6927 KOps/s 28.9943 KOps/s $\color{#d91a1a}-1.04\%$
test_add_td 0.1173ms 55.6236μs 17.9780 KOps/s 18.6972 KOps/s $\color{#d91a1a}-3.85\%$
test_distributed 0.2717ms 0.1013ms 9.8696 KOps/s 9.6104 KOps/s $\color{#35bf28}+2.70\%$
test_tdmodule 33.0220μs 17.3814μs 57.5328 KOps/s 55.4996 KOps/s $\color{#35bf28}+3.66\%$
test_tdmodule_dispatch 73.7580μs 35.7155μs 27.9991 KOps/s 28.5611 KOps/s $\color{#d91a1a}-1.97\%$
test_tdseq 35.4360μs 19.7111μs 50.7329 KOps/s 47.9734 KOps/s $\textbf{\color{#35bf28}+5.75\%}$
test_tdseq_dispatch 65.2220μs 39.0851μs 25.5852 KOps/s 25.1664 KOps/s $\color{#35bf28}+1.66\%$
test_instantiation_functorch 1.5146ms 1.2795ms 781.5780 Ops/s 775.7759 Ops/s $\color{#35bf28}+0.75\%$
test_instantiation_td 1.6185ms 1.0079ms 992.2016 Ops/s 989.8834 Ops/s $\color{#35bf28}+0.23\%$
test_exec_functorch 0.2875ms 0.1596ms 6.2668 KOps/s 6.4005 KOps/s $\color{#d91a1a}-2.09\%$
test_exec_functional_call 0.2934ms 0.1523ms 6.5671 KOps/s 6.7493 KOps/s $\color{#d91a1a}-2.70\%$
test_exec_td 0.2610ms 0.1430ms 6.9913 KOps/s 5.8508 KOps/s $\textbf{\color{#35bf28}+19.49\%}$
test_exec_td_decorator 0.9558ms 0.2180ms 4.5880 KOps/s 4.5790 KOps/s $\color{#35bf28}+0.20\%$
test_vmap_mlp_speed[True-True] 0.6091ms 0.4797ms 2.0848 KOps/s 2.0445 KOps/s $\color{#35bf28}+1.97\%$
test_vmap_mlp_speed[True-False] 0.8020ms 0.4795ms 2.0855 KOps/s 2.0686 KOps/s $\color{#35bf28}+0.82\%$
test_vmap_mlp_speed[False-True] 0.5050ms 0.3916ms 2.5538 KOps/s 2.5518 KOps/s $\color{#35bf28}+0.08\%$
test_vmap_mlp_speed[False-False] 0.6971ms 0.3931ms 2.5439 KOps/s 2.5406 KOps/s $\color{#35bf28}+0.13\%$
test_vmap_mlp_speed_decorator[True-True] 1.2460ms 0.5852ms 1.7087 KOps/s 1.8040 KOps/s $\textbf{\color{#d91a1a}-5.28\%}$
test_vmap_mlp_speed_decorator[True-False] 1.0416ms 0.5488ms 1.8223 KOps/s 1.7966 KOps/s $\color{#35bf28}+1.43\%$
test_vmap_mlp_speed_decorator[False-True] 0.6406ms 0.4507ms 2.2187 KOps/s 2.1890 KOps/s $\color{#35bf28}+1.36\%$
test_vmap_mlp_speed_decorator[False-False] 0.7181ms 0.4525ms 2.2102 KOps/s 2.1828 KOps/s $\color{#35bf28}+1.25\%$
test_to_module_speed[True] 1.9785ms 1.6682ms 599.4604 Ops/s 586.1356 Ops/s $\color{#35bf28}+2.27\%$
test_to_module_speed[False] 1.7531ms 1.6310ms 613.1354 Ops/s 593.5519 Ops/s $\color{#35bf28}+3.30\%$
test_tc_init 57.0860μs 28.8660μs 34.6429 KOps/s 34.9003 KOps/s $\color{#d91a1a}-0.74\%$
test_tc_init_nested 0.1142ms 59.1806μs 16.8974 KOps/s 16.6917 KOps/s $\color{#35bf28}+1.23\%$
test_tc_first_layer_tensor 7.6500μs 0.6731μs 1.4856 MOps/s 1.4200 MOps/s $\color{#35bf28}+4.62\%$
test_tc_first_layer_nontensor 3.4034μs 0.6439μs 1.5529 MOps/s 1.3979 MOps/s $\textbf{\color{#35bf28}+11.09\%}$
test_tc_second_layer_tensor 15.2990μs 1.8194μs 549.6313 KOps/s 532.3964 KOps/s $\color{#35bf28}+3.24\%$
test_tc_second_layer_nontensor 49.0110μs 1.6280μs 614.2539 KOps/s 593.0612 KOps/s $\color{#35bf28}+3.57\%$
test_unbind 93.2245ms 7.4849ms 133.6017 Ops/s 151.7979 Ops/s $\textbf{\color{#d91a1a}-11.99\%}$
test_full_like 18.5099ms 11.6748ms 85.6548 Ops/s 85.4120 Ops/s $\color{#35bf28}+0.28\%$
test_zeros_like 6.5770ms 5.8881ms 169.8345 Ops/s 166.4712 Ops/s $\color{#35bf28}+2.02\%$
test_ones_like 11.5893ms 6.4864ms 154.1688 Ops/s 153.1604 Ops/s $\color{#35bf28}+0.66\%$
test_clone 12.6774ms 8.5673ms 116.7233 Ops/s 118.6141 Ops/s $\color{#d91a1a}-1.59\%$
test_squeeze 71.4530μs 14.0695μs 71.0756 KOps/s 73.4186 KOps/s $\color{#d91a1a}-3.19\%$
test_unsqueeze 0.2261ms 62.0992μs 16.1033 KOps/s 16.6051 KOps/s $\color{#d91a1a}-3.02\%$
test_split 0.2220ms 0.1104ms 9.0546 KOps/s 8.9122 KOps/s $\color{#35bf28}+1.60\%$
test_permute 0.2784ms 0.1264ms 7.9101 KOps/s 7.9189 KOps/s $\color{#d91a1a}-0.11\%$
test_stack 31.6008ms 24.6816ms 40.5160 Ops/s 42.0982 Ops/s $\color{#d91a1a}-3.76\%$
test_cat 31.9772ms 24.3322ms 41.0978 Ops/s 42.9416 Ops/s $\color{#d91a1a}-4.29\%$
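
The Change column above looks consistent with the relative difference between this PR's throughput ("Ops") and the throughput on repo HEAD. A minimal sketch under that assumption (the helper name `pct_change` is hypothetical and not part of the benchmark tooling). Because the table prints rounded throughputs, recomputed values may differ from the printed ones in the last digit:

```python
def pct_change(ops: float, ops_head: float) -> float:
    """Relative throughput change versus HEAD, in percent (assumed definition)."""
    return (ops / ops_head - 1.0) * 100.0


# Throughputs copied from the CPU table (KOps/s).
items = pct_change(381.0178, 402.4475)  # test_items, reported as -5.32%
last = pct_change(236.6943, 192.2604)   # test_membership_stacked_nested_last, reported as +23.11%
print(f"test_items: {items:+.2f}%")
print(f"test_membership_stacked_nested_last: {last:+.2f}%")
```

Read this way, a negative value means the PR branch ran fewer operations per second than HEAD for that benchmark.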


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}35$.

Expand to view detailed results
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.5705ms 13.5567μs 73.7644 KOps/s 81.2824 KOps/s $\textbf{\color{#d91a1a}-9.25\%}$
test_plain_set_stack_nested 29.7010μs 13.7306μs 72.8300 KOps/s 79.8309 KOps/s $\textbf{\color{#d91a1a}-8.77\%}$
test_plain_set_nested_inplace 41.3630μs 14.8484μs 67.3472 KOps/s 73.0854 KOps/s $\textbf{\color{#d91a1a}-7.85\%}$
test_plain_set_stack_nested_inplace 37.4930μs 14.9974μs 66.6781 KOps/s 72.3076 KOps/s $\textbf{\color{#d91a1a}-7.79\%}$
test_items 19.4010μs 4.6740μs 213.9488 KOps/s 205.6599 KOps/s $\color{#35bf28}+4.03\%$
test_items_nested 0.4046ms 0.3438ms 2.9089 KOps/s 2.9724 KOps/s $\color{#d91a1a}-2.14\%$
test_items_nested_locked 0.3850ms 0.3409ms 2.9336 KOps/s 2.9180 KOps/s $\color{#35bf28}+0.53\%$
test_items_nested_leaf 0.1058ms 83.3521μs 11.9973 KOps/s 12.1036 KOps/s $\color{#d91a1a}-0.88\%$
test_items_stack_nested 0.3889ms 0.3495ms 2.8610 KOps/s 2.9002 KOps/s $\color{#d91a1a}-1.35\%$
test_items_stack_nested_leaf 0.1081ms 84.6794μs 11.8092 KOps/s 11.9429 KOps/s $\color{#d91a1a}-1.12\%$
test_items_stack_nested_locked 0.3843ms 0.3432ms 2.9134 KOps/s 2.9436 KOps/s $\color{#d91a1a}-1.03\%$
test_keys 16.5410μs 4.3323μs 230.8251 KOps/s 230.9984 KOps/s $\color{#d91a1a}-0.08\%$
test_keys_nested 92.3160μs 66.8836μs 14.9513 KOps/s 14.7253 KOps/s $\color{#35bf28}+1.53\%$
test_keys_nested_locked 2.0464ms 72.1846μs 13.8534 KOps/s 13.8195 KOps/s $\color{#35bf28}+0.25\%$
test_keys_nested_leaf 91.1750μs 57.6667μs 17.3410 KOps/s 17.2594 KOps/s $\color{#35bf28}+0.47\%$
test_keys_stack_nested 97.4750μs 67.0383μs 14.9169 KOps/s 14.8510 KOps/s $\color{#35bf28}+0.44\%$
test_keys_stack_nested_leaf 76.1140μs 57.5624μs 17.3725 KOps/s 17.3291 KOps/s $\color{#35bf28}+0.25\%$
test_keys_stack_nested_locked 96.2350μs 71.7720μs 13.9330 KOps/s 14.0136 KOps/s $\color{#d91a1a}-0.58\%$
test_values 6.9540μs 1.8022μs 554.8778 KOps/s 549.8048 KOps/s $\color{#35bf28}+0.92\%$
test_values_nested 60.6730μs 35.4111μs 28.2397 KOps/s 28.2772 KOps/s $\color{#d91a1a}-0.13\%$
test_values_nested_locked 59.0340μs 37.4149μs 26.7273 KOps/s 26.7455 KOps/s $\color{#d91a1a}-0.07\%$
test_values_nested_leaf 52.2230μs 31.3864μs 31.8609 KOps/s 31.9546 KOps/s $\color{#d91a1a}-0.29\%$
test_values_stack_nested 68.6040μs 36.0947μs 27.7049 KOps/s 27.9287 KOps/s $\color{#d91a1a}-0.80\%$
test_values_stack_nested_leaf 52.9720μs 32.0334μs 31.2174 KOps/s 31.3420 KOps/s $\color{#d91a1a}-0.40\%$
test_values_stack_nested_locked 56.6530μs 37.4645μs 26.6919 KOps/s 26.5420 KOps/s $\color{#35bf28}+0.57\%$
test_membership 3.6401μs 0.7234μs 1.3824 MOps/s 1.1599 MOps/s $\textbf{\color{#35bf28}+19.19\%}$
test_membership_nested 33.3320μs 2.5400μs 393.6983 KOps/s 390.4784 KOps/s $\color{#35bf28}+0.82\%$
test_membership_nested_leaf 16.0200μs 2.5508μs 392.0287 KOps/s 388.6521 KOps/s $\color{#35bf28}+0.87\%$
test_membership_stacked_nested 33.7620μs 2.5492μs 392.2766 KOps/s 382.3550 KOps/s $\color{#35bf28}+2.59\%$
test_membership_stacked_nested_leaf 18.1010μs 2.5708μs 388.9911 KOps/s 382.5621 KOps/s $\color{#35bf28}+1.68\%$
test_membership_nested_last 20.2910μs 3.1082μs 321.7256 KOps/s 324.7623 KOps/s $\color{#d91a1a}-0.94\%$
test_membership_nested_leaf_last 35.1320μs 3.1125μs 321.2855 KOps/s 323.7026 KOps/s $\color{#d91a1a}-0.75\%$
test_membership_stacked_nested_last 22.3310μs 3.1271μs 319.7827 KOps/s 279.9070 KOps/s $\textbf{\color{#35bf28}+14.25\%}$
test_membership_stacked_nested_leaf_last 33.3720μs 3.0978μs 322.8056 KOps/s 280.5659 KOps/s $\textbf{\color{#35bf28}+15.06\%}$
test_nested_getleaf 30.0220μs 8.3754μs 119.3977 KOps/s 119.5952 KOps/s $\color{#d91a1a}-0.17\%$
test_nested_get 31.5020μs 7.9008μs 126.5698 KOps/s 126.9219 KOps/s $\color{#d91a1a}-0.28\%$
test_stacked_getleaf 39.2520μs 8.4557μs 118.2640 KOps/s 120.0514 KOps/s $\color{#d91a1a}-1.49\%$
test_stacked_get 21.6410μs 7.9010μs 126.5658 KOps/s 125.9716 KOps/s $\color{#35bf28}+0.47\%$
test_nested_getitemleaf 41.7920μs 8.6189μs 116.0237 KOps/s 117.1164 KOps/s $\color{#d91a1a}-0.93\%$
test_nested_getitem 18.4610μs 8.0647μs 123.9969 KOps/s 123.3793 KOps/s $\color{#35bf28}+0.50\%$
test_stacked_getitemleaf 33.7320μs 8.6248μs 115.9448 KOps/s 116.4415 KOps/s $\color{#d91a1a}-0.43\%$
test_stacked_getitem 28.5710μs 8.0957μs 123.5217 KOps/s 123.9807 KOps/s $\color{#d91a1a}-0.37\%$
test_lock_nested 58.8239ms 0.4012ms 2.4925 KOps/s 2.4898 KOps/s $\color{#35bf28}+0.11\%$
test_lock_stack_nested 0.3606ms 0.3013ms 3.3195 KOps/s 3.3074 KOps/s $\color{#35bf28}+0.37\%$
test_unlock_nested 60.3212ms 0.4061ms 2.4627 KOps/s 2.4388 KOps/s $\color{#35bf28}+0.98\%$
test_unlock_stack_nested 0.3546ms 0.3097ms 3.2288 KOps/s 3.2060 KOps/s $\color{#35bf28}+0.71\%$
test_flatten_speed 0.2980ms 0.1013ms 9.8744 KOps/s 9.8180 KOps/s $\color{#35bf28}+0.57\%$
test_unflatten_speed 0.3290ms 0.2924ms 3.4203 KOps/s 3.4079 KOps/s $\color{#35bf28}+0.36\%$
test_common_ops 1.0846ms 0.6091ms 1.6418 KOps/s 1.7624 KOps/s $\textbf{\color{#d91a1a}-6.84\%}$
test_creation 22.7610μs 1.6245μs 615.5718 KOps/s 606.0704 KOps/s $\color{#35bf28}+1.57\%$
test_creation_empty 29.1920μs 9.9427μs 100.5761 KOps/s 135.2324 KOps/s $\textbf{\color{#d91a1a}-25.63\%}$
test_creation_nested_1 41.6130μs 11.7486μs 85.1165 KOps/s 108.6426 KOps/s $\textbf{\color{#d91a1a}-21.65\%}$
test_creation_nested_2 32.6920μs 13.9926μs 71.4661 KOps/s 87.9718 KOps/s $\textbf{\color{#d91a1a}-18.76\%}$
test_clone 73.0640μs 11.8346μs 84.4978 KOps/s 84.7491 KOps/s $\color{#d91a1a}-0.30\%$
test_getitem[int] 24.9820μs 10.9412μs 91.3974 KOps/s 93.3405 KOps/s $\color{#d91a1a}-2.08\%$
test_getitem[slice_int] 40.7520μs 20.7754μs 48.1338 KOps/s 48.1057 KOps/s $\color{#35bf28}+0.06\%$
test_getitem[range] 64.9340μs 46.8786μs 21.3317 KOps/s 21.4065 KOps/s $\color{#d91a1a}-0.35\%$
test_getitem[tuple] 45.0130μs 18.8659μs 53.0057 KOps/s 54.2761 KOps/s $\color{#d91a1a}-2.34\%$
test_getitem[list] 0.1149ms 33.5190μs 29.8338 KOps/s 30.9892 KOps/s $\color{#d91a1a}-3.73\%$
test_setitem_dim[int] 48.8330μs 32.9829μs 30.3187 KOps/s 34.7125 KOps/s $\textbf{\color{#d91a1a}-12.66\%}$
test_setitem_dim[slice_int] 75.1340μs 53.8945μs 18.5548 KOps/s 20.4106 KOps/s $\textbf{\color{#d91a1a}-9.09\%}$
test_setitem_dim[range] 0.1418ms 70.6432μs 14.1556 KOps/s 14.3985 KOps/s $\color{#d91a1a}-1.69\%$
test_setitem_dim[tuple] 65.7140μs 48.1527μs 20.7673 KOps/s 23.0324 KOps/s $\textbf{\color{#d91a1a}-9.83\%}$
test_setitem 54.6030μs 17.2841μs 57.8567 KOps/s 62.3977 KOps/s $\textbf{\color{#d91a1a}-7.28\%}$
test_set 60.5630μs 16.4931μs 60.6314 KOps/s 63.6268 KOps/s $\color{#d91a1a}-4.71\%$
test_set_shared 1.4187ms 0.1012ms 9.8824 KOps/s 9.9833 KOps/s $\color{#d91a1a}-1.01\%$
test_update 66.0940μs 20.0446μs 49.8888 KOps/s 55.8974 KOps/s $\textbf{\color{#d91a1a}-10.75\%}$
test_update_nested 79.4350μs 25.5424μs 39.1506 KOps/s 42.4778 KOps/s $\textbf{\color{#d91a1a}-7.83\%}$
test_update__nested 66.3240μs 22.0651μs 45.3204 KOps/s 43.3752 KOps/s $\color{#35bf28}+4.48\%$
test_set_nested 77.3540μs 17.5138μs 57.0978 KOps/s 59.4574 KOps/s $\color{#d91a1a}-3.97\%$
test_set_nested_new 83.4440μs 20.9033μs 47.8394 KOps/s 50.9255 KOps/s $\textbf{\color{#d91a1a}-6.06\%}$
test_select 77.3050μs 34.1027μs 29.3232 KOps/s 31.2886 KOps/s $\textbf{\color{#d91a1a}-6.28\%}$
test_select_nested 0.8224ms 54.6936μs 18.2837 KOps/s 18.3584 KOps/s $\color{#d91a1a}-0.41\%$
test_exclude_nested 0.1351ms 0.1106ms 9.0382 KOps/s 9.0106 KOps/s $\color{#35bf28}+0.31\%$
test_empty[True] 0.4049ms 0.3444ms 2.9040 KOps/s 2.8844 KOps/s $\color{#35bf28}+0.68\%$
test_empty[False] 2.7211μs 0.9363μs 1.0681 MOps/s 1.0800 MOps/s $\color{#d91a1a}-1.11\%$
test_to 0.1048ms 82.8121μs 12.0755 KOps/s 12.8287 KOps/s $\textbf{\color{#d91a1a}-5.87\%}$
test_to_nonblocking 95.2950μs 63.1405μs 15.8377 KOps/s 15.8703 KOps/s $\color{#d91a1a}-0.21\%$
test_unbind_speed 0.3104ms 0.2643ms 3.7834 KOps/s 3.8381 KOps/s $\color{#d91a1a}-1.42\%$
test_unbind_speed_stack0 0.3175ms 0.2655ms 3.7669 KOps/s 3.8014 KOps/s $\color{#d91a1a}-0.91\%$
test_unbind_speed_stack1 75.8295ms 0.8109ms 1.2333 KOps/s 1.2299 KOps/s $\color{#35bf28}+0.27\%$
test_split 76.1977ms 1.6667ms 599.9859 Ops/s 605.3296 Ops/s $\color{#d91a1a}-0.88\%$
test_chunk 75.9774ms 1.6659ms 600.2632 Ops/s 605.7826 Ops/s $\color{#d91a1a}-0.91\%$
test_creation[device0] 0.1276ms 58.4177μs 17.1181 KOps/s 17.1967 KOps/s $\color{#d91a1a}-0.46\%$
test_creation_from_tensor 0.1307ms 59.1386μs 16.9094 KOps/s 18.4647 KOps/s $\textbf{\color{#d91a1a}-8.42\%}$
test_add_one[memmap_tensor0] 77.7740μs 7.2941μs 137.0973 KOps/s 152.2298 KOps/s $\textbf{\color{#d91a1a}-9.94\%}$
test_contiguous[memmap_tensor0] 27.1010μs 0.7092μs 1.4101 MOps/s 1.4007 MOps/s $\color{#35bf28}+0.67\%$
test_stack[memmap_tensor0] 22.1420μs 4.6654μs 214.3446 KOps/s 214.1035 KOps/s $\color{#35bf28}+0.11\%$
test_memmaptd_index 1.0840ms 0.2897ms 3.4517 KOps/s 3.4732 KOps/s $\color{#d91a1a}-0.62\%$
test_memmaptd_index_astensor 0.6412ms 0.3593ms 2.7836 KOps/s 2.7802 KOps/s $\color{#35bf28}+0.12\%$
test_memmaptd_index_op 1.2519ms 0.6770ms 1.4771 KOps/s 1.5727 KOps/s $\textbf{\color{#d91a1a}-6.08\%}$
test_serialize_model 0.1837s 0.1110s 9.0118 Ops/s 8.7109 Ops/s $\color{#35bf28}+3.45\%$
test_serialize_model_pickle 1.3813s 1.2385s 0.8074 Ops/s 0.8070 Ops/s $\color{#35bf28}+0.05\%$
test_serialize_weights 0.1816s 0.1085s 9.2200 Ops/s 8.7832 Ops/s $\color{#35bf28}+4.97\%$
test_serialize_weights_returnearly 0.2749s 0.1023s 9.7734 Ops/s 9.8539 Ops/s $\color{#d91a1a}-0.82\%$
test_serialize_weights_pickle 1.3485s 1.2488s 0.8008 Ops/s 0.8009 Ops/s $\color{#d91a1a}-0.01\%$
test_reshape_pytree 58.6630μs 26.2853μs 38.0440 KOps/s 38.5537 KOps/s $\color{#d91a1a}-1.32\%$
test_reshape_td 57.7040μs 31.5964μs 31.6492 KOps/s 32.3502 KOps/s $\color{#d91a1a}-2.17\%$
test_view_pytree 60.5640μs 25.9270μs 38.5699 KOps/s 38.9282 KOps/s $\color{#d91a1a}-0.92\%$
test_view_td 64.2830μs 37.0148μs 27.0162 KOps/s 27.3627 KOps/s $\color{#d91a1a}-1.27\%$
test_unbind_pytree 0.1534ms 31.6536μs 31.5919 KOps/s 31.7181 KOps/s $\color{#d91a1a}-0.40\%$
test_unbind_td 0.4544ms 41.2596μs 24.2368 KOps/s 25.0011 KOps/s $\color{#d91a1a}-3.06\%$
test_split_pytree 56.4630μs 34.0612μs 29.3589 KOps/s 27.9256 KOps/s $\textbf{\color{#35bf28}+5.13\%}$
test_split_td 0.5154ms 39.4041μs 25.3781 KOps/s 25.5553 KOps/s $\color{#d91a1a}-0.69\%$
test_add_pytree 61.6730μs 37.4029μs 26.7359 KOps/s 27.8602 KOps/s $\color{#d91a1a}-4.04\%$
test_add_td 82.7040μs 55.2445μs 18.1013 KOps/s 20.8821 KOps/s $\textbf{\color{#d91a1a}-13.32\%}$
test_distributed 1.8839ms 69.5212μs 14.3841 KOps/s 14.9427 KOps/s $\color{#d91a1a}-3.74\%$
test_tdmodule 36.8420μs 15.5022μs 64.5069 KOps/s 65.8343 KOps/s $\color{#d91a1a}-2.02\%$
test_tdmodule_dispatch 46.9230μs 30.5130μs 32.7729 KOps/s 37.3938 KOps/s $\textbf{\color{#d91a1a}-12.36\%}$
test_tdseq 40.6920μs 17.1565μs 58.2871 KOps/s 61.5247 KOps/s $\textbf{\color{#d91a1a}-5.26\%}$
test_tdseq_dispatch 50.4530μs 33.9139μs 29.4864 KOps/s 31.9251 KOps/s $\textbf{\color{#d91a1a}-7.64\%}$
test_instantiation_functorch 1.6370ms 1.5443ms 647.5549 Ops/s 664.4724 Ops/s $\color{#d91a1a}-2.55\%$
test_instantiation_td 1.5740ms 1.0401ms 961.4897 Ops/s 967.8928 Ops/s $\color{#d91a1a}-0.66\%$
test_exec_functorch 0.1739ms 0.1482ms 6.7474 KOps/s 6.8694 KOps/s $\color{#d91a1a}-1.78\%$
test_exec_functional_call 0.1786ms 0.1386ms 7.2155 KOps/s 7.4762 KOps/s $\color{#d91a1a}-3.49\%$
test_exec_td 0.1816ms 0.1358ms 7.3664 KOps/s 7.6532 KOps/s $\color{#d91a1a}-3.75\%$
test_exec_td_decorator 0.3119ms 0.2071ms 4.8284 KOps/s 4.8710 KOps/s $\color{#d91a1a}-0.87\%$
test_vmap_mlp_speed[True-True] 0.6798ms 0.6122ms 1.6335 KOps/s 1.7568 KOps/s $\textbf{\color{#d91a1a}-7.02\%}$
test_vmap_mlp_speed[True-False] 0.6729ms 0.5915ms 1.6906 KOps/s 1.7530 KOps/s $\color{#d91a1a}-3.56\%$
test_vmap_mlp_speed[False-True] 0.6127ms 0.5184ms 1.9290 KOps/s 1.9976 KOps/s $\color{#d91a1a}-3.43\%$
test_vmap_mlp_speed[False-False] 0.6428ms 0.5373ms 1.8612 KOps/s 1.9989 KOps/s $\textbf{\color{#d91a1a}-6.89\%}$
test_vmap_mlp_speed_decorator[True-True] 1.1961ms 0.6813ms 1.4678 KOps/s 1.5745 KOps/s $\textbf{\color{#d91a1a}-6.78\%}$
test_vmap_mlp_speed_decorator[True-False] 0.7907ms 0.6491ms 1.5406 KOps/s 1.5467 KOps/s $\color{#d91a1a}-0.39\%$
test_vmap_mlp_speed_decorator[False-True] 0.7036ms 0.5730ms 1.7451 KOps/s 1.7923 KOps/s $\color{#d91a1a}-2.63\%$
test_vmap_mlp_speed_decorator[False-False] 0.7300ms 0.5719ms 1.7484 KOps/s 1.7959 KOps/s $\color{#d91a1a}-2.64\%$
test_vmap_transformer_speed[True-True] 8.0052ms 7.6462ms 130.7837 Ops/s 133.3522 Ops/s $\color{#d91a1a}-1.93\%$
test_vmap_transformer_speed[True-False] 8.3746ms 7.6683ms 130.4072 Ops/s 134.4266 Ops/s $\color{#d91a1a}-2.99\%$
test_vmap_transformer_speed[False-True] 8.4818ms 8.0593ms 124.0798 Ops/s 134.7966 Ops/s $\textbf{\color{#d91a1a}-7.95\%}$
test_vmap_transformer_speed[False-False] 8.1801ms 7.8834ms 126.8493 Ops/s 135.1431 Ops/s $\textbf{\color{#d91a1a}-6.14\%}$
test_vmap_transformer_speed_decorator[True-True] 19.7044ms 18.7509ms 53.3309 Ops/s 55.4204 Ops/s $\color{#d91a1a}-3.77\%$
test_vmap_transformer_speed_decorator[True-False] 19.3204ms 18.6898ms 53.5052 Ops/s 55.2409 Ops/s $\color{#d91a1a}-3.14\%$
test_vmap_transformer_speed_decorator[False-True] 18.6933ms 18.5198ms 53.9963 Ops/s 55.7272 Ops/s $\color{#d91a1a}-3.11\%$
test_vmap_transformer_speed_decorator[False-False] 18.6833ms 18.5204ms 53.9945 Ops/s 55.7191 Ops/s $\color{#d91a1a}-3.10\%$
test_to_module_speed[True] 1.6440ms 1.5274ms 654.7056 Ops/s 661.3874 Ops/s $\color{#d91a1a}-1.01\%$
test_to_module_speed[False] 0.1003s 1.7756ms 563.1938 Ops/s 668.9118 Ops/s $\textbf{\color{#d91a1a}-15.80\%}$
test_tc_init 49.2320μs 28.1543μs 35.5186 KOps/s 45.1803 KOps/s $\textbf{\color{#d91a1a}-21.38\%}$
test_tc_init_nested 76.8950μs 54.1797μs 18.4571 KOps/s 22.7019 KOps/s $\textbf{\color{#d91a1a}-18.70\%}$
test_tc_first_layer_tensor 0.9801μs 0.3617μs 2.7650 MOps/s 2.7716 MOps/s $\color{#d91a1a}-0.24\%$
test_tc_first_layer_nontensor 1.4001μs 0.3880μs 2.5772 MOps/s 2.5334 MOps/s $\color{#35bf28}+1.73\%$
test_tc_second_layer_tensor 4.3404μs 0.9642μs 1.0371 MOps/s 1.0351 MOps/s $\color{#35bf28}+0.19\%$
test_tc_second_layer_nontensor 2.3651μs 0.7954μs 1.2573 MOps/s 1.2527 MOps/s $\color{#35bf28}+0.36\%$
test_unbind 0.1075s 6.8125ms 146.7885 Ops/s 131.1520 Ops/s $\textbf{\color{#35bf28}+11.92\%}$
test_full_like 14.2492ms 13.3694ms 74.7976 Ops/s 75.9993 Ops/s $\color{#d91a1a}-1.58\%$
test_zeros_like 8.0100ms 7.8058ms 128.1105 Ops/s 127.6027 Ops/s $\color{#35bf28}+0.40\%$
test_ones_like 8.3493ms 7.8790ms 126.9197 Ops/s 127.2340 Ops/s $\color{#d91a1a}-0.25\%$
test_clone 9.4090ms 9.1910ms 108.8018 Ops/s 108.7480 Ops/s $\color{#35bf28}+0.05\%$
test_squeeze 53.0930μs 11.7113μs 85.3874 KOps/s 92.1829 KOps/s $\textbf{\color{#d91a1a}-7.37\%}$
test_unsqueeze 0.1006ms 55.5781μs 17.9927 KOps/s 19.8620 KOps/s $\textbf{\color{#d91a1a}-9.41\%}$
test_split 0.1403ms 0.1005ms 9.9461 KOps/s 10.5804 KOps/s $\textbf{\color{#d91a1a}-6.00\%}$
test_permute 0.1393ms 0.1099ms 9.1000 KOps/s 9.3122 KOps/s $\color{#d91a1a}-2.28\%$
test_stack 26.7257ms 26.5099ms 37.7217 Ops/s 37.5423 Ops/s $\color{#35bf28}+0.48\%$
test_cat 26.6338ms 26.4708ms 37.7775 Ops/s 37.6552 Ops/s $\color{#35bf28}+0.32\%$

@vmoens vmoens merged commit 6ae96dc into main Jun 25, 2024
41 of 43 checks passed
@vmoens vmoens deleted the fix-where-nontensor branch October 21, 2024 14:02
Successfully merging this pull request may close these issues.

[BUG] NonTensorData behavior with equal data is not transparent to the rest of the library
2 participants