[BugFix] Keep dim names in transpose #662

Merged: vmoens merged 7 commits into main from named-transpose on Feb 6, 2024

vmoens (Contributor) commented on Feb 6, 2024

No description provided.

facebook-github-bot added the CLA Signed label on Feb 6, 2024
vmoens added the bug label on Feb 6, 2024

github-actions bot commented Feb 6, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 124. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}4$.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 34.1140μs | 17.5665μs | 56.9265 KOps/s | 57.5851 KOps/s | $\color{#d91a1a}-1.14\%$ |
| test_plain_set_stack_nested | 0.2314ms | 0.1506ms | 6.6400 KOps/s | 6.8399 KOps/s | $\color{#d91a1a}-2.92\%$ |
| test_plain_set_nested_inplace | 70.8730μs | 19.7616μs | 50.6033 KOps/s | 49.9469 KOps/s | $\color{#35bf28}+1.31\%$ |
| test_plain_set_stack_nested_inplace | 0.3322ms | 0.1763ms | 5.6707 KOps/s | 5.5599 KOps/s | $\color{#35bf28}+1.99\%$ |
| test_items | 15.7800μs | 2.4284μs | 411.8015 KOps/s | 412.4764 KOps/s | $\color{#d91a1a}-0.16\%$ |
| test_items_nested | 0.3417ms | 0.2747ms | 3.6406 KOps/s | 3.6915 KOps/s | $\color{#d91a1a}-1.38\%$ |
| test_items_nested_locked | 0.4352ms | 0.2746ms | 3.6417 KOps/s | 3.6820 KOps/s | $\color{#d91a1a}-1.09\%$ |
| test_items_nested_leaf | 0.6457ms | 0.1731ms | 5.7763 KOps/s | 6.0039 KOps/s | $\color{#d91a1a}-3.79\%$ |
| test_items_stack_nested | 1.6752ms | 1.3220ms | 756.4289 Ops/s | 762.1379 Ops/s | $\color{#d91a1a}-0.75\%$ |
| test_items_stack_nested_leaf | 1.8412ms | 1.2244ms | 816.7479 Ops/s | 833.7976 Ops/s | $\color{#d91a1a}-2.04\%$ |
| test_items_stack_nested_locked | 1.6272ms | 0.8864ms | 1.1281 KOps/s | 1.1531 KOps/s | $\color{#d91a1a}-2.17\%$ |
| test_keys | 23.0320μs | 3.8437μs | 260.1640 KOps/s | 260.8565 KOps/s | $\color{#d91a1a}-0.27\%$ |
| test_keys_nested | 1.9216ms | 0.1487ms | 6.7234 KOps/s | 6.7491 KOps/s | $\color{#d91a1a}-0.38\%$ |
| test_keys_nested_locked | 0.2612ms | 0.1514ms | 6.6048 KOps/s | 6.5846 KOps/s | $\color{#35bf28}+0.31\%$ |
| test_keys_nested_leaf | 0.2396ms | 0.1303ms | 7.6727 KOps/s | 7.6909 KOps/s | $\color{#d91a1a}-0.24\%$ |
| test_keys_stack_nested | 1.5130ms | 1.2535ms | 797.7826 Ops/s | 797.4421 Ops/s | $\color{#35bf28}+0.04\%$ |
| test_keys_stack_nested_leaf | 1.5183ms | 1.2527ms | 798.2892 Ops/s | 793.5511 Ops/s | $\color{#35bf28}+0.60\%$ |
| test_keys_stack_nested_locked | 1.3833ms | 0.8063ms | 1.2403 KOps/s | 1.2467 KOps/s | $\color{#d91a1a}-0.52\%$ |
| test_values | 9.9510μs | 1.1595μs | 862.4526 KOps/s | 881.7046 KOps/s | $\color{#d91a1a}-2.18\%$ |
| test_values_nested | 0.1037ms | 51.4078μs | 19.4523 KOps/s | 19.6380 KOps/s | $\color{#d91a1a}-0.95\%$ |
| test_values_nested_locked | 98.9250μs | 51.8656μs | 19.2806 KOps/s | 19.6664 KOps/s | $\color{#d91a1a}-1.96\%$ |
| test_values_nested_leaf | 90.4290μs | 46.0041μs | 21.7372 KOps/s | 21.7510 KOps/s | $\color{#d91a1a}-0.06\%$ |
| test_values_stack_nested | 1.2175ms | 1.0194ms | 980.9939 Ops/s | 972.6403 Ops/s | $\color{#35bf28}+0.86\%$ |
| test_values_stack_nested_leaf | 1.2456ms | 1.0087ms | 991.3361 Ops/s | 979.3562 Ops/s | $\color{#35bf28}+1.22\%$ |
| test_values_stack_nested_locked | 1.2317ms | 0.6038ms | 1.6562 KOps/s | 1.6776 KOps/s | $\color{#d91a1a}-1.28\%$ |
| test_membership | 12.3530μs | 1.3207μs | 757.1474 KOps/s | 745.3210 KOps/s | $\color{#35bf28}+1.59\%$ |
| test_membership_nested | 28.2430μs | 3.3914μs | 294.8652 KOps/s | 290.8021 KOps/s | $\color{#35bf28}+1.40\%$ |
| test_membership_nested_leaf | 41.8980μs | 3.4260μs | 291.8893 KOps/s | 288.9336 KOps/s | $\color{#35bf28}+1.02\%$ |
| test_membership_stacked_nested | 48.8710μs | 11.6978μs | 85.4865 KOps/s | 85.7554 KOps/s | $\color{#d91a1a}-0.31\%$ |
| test_membership_stacked_nested_leaf | 31.9290μs | 11.7952μs | 84.7804 KOps/s | 86.1294 KOps/s | $\color{#d91a1a}-1.57\%$ |
| test_membership_nested_last | 46.1260μs | 6.5806μs | 151.9617 KOps/s | 150.1475 KOps/s | $\color{#35bf28}+1.21\%$ |
| test_membership_nested_leaf_last | 25.7380μs | 6.5932μs | 151.6711 KOps/s | 150.8430 KOps/s | $\color{#35bf28}+0.55\%$ |
| test_membership_stacked_nested_last | 0.3070ms | 0.1740ms | 5.7458 KOps/s | 5.7145 KOps/s | $\color{#35bf28}+0.55\%$ |
| test_membership_stacked_nested_leaf_last | 66.8240μs | 13.8586μs | 72.1573 KOps/s | 73.1864 KOps/s | $\color{#d91a1a}-1.41\%$ |
| test_nested_getleaf | 46.6370μs | 10.6156μs | 94.2006 KOps/s | 94.7549 KOps/s | $\color{#d91a1a}-0.58\%$ |
| test_nested_get | 56.8760μs | 10.0474μs | 99.5287 KOps/s | 101.2837 KOps/s | $\color{#d91a1a}-1.73\%$ |
| test_stacked_getleaf | 0.9064ms | 0.3962ms | 2.5237 KOps/s | 2.5433 KOps/s | $\color{#d91a1a}-0.77\%$ |
| test_stacked_get | 0.4250ms | 0.3626ms | 2.7579 KOps/s | 2.7605 KOps/s | $\color{#d91a1a}-0.09\%$ |
| test_nested_getitemleaf | 50.1630μs | 11.8413μs | 84.4503 KOps/s | 83.7278 KOps/s | $\color{#35bf28}+0.86\%$ |
| test_nested_getitem | 46.2870μs | 11.3546μs | 88.0697 KOps/s | 86.6068 KOps/s | $\color{#35bf28}+1.69\%$ |
| test_stacked_getitemleaf | 0.4686ms | 0.3982ms | 2.5114 KOps/s | 2.4841 KOps/s | $\color{#35bf28}+1.10\%$ |
| test_stacked_getitem | 0.8067ms | 0.3671ms | 2.7238 KOps/s | 2.7198 KOps/s | $\color{#35bf28}+0.15\%$ |
| test_lock_nested | 4.4009ms | 0.3299ms | 3.0309 KOps/s | 2.9778 KOps/s | $\color{#35bf28}+1.78\%$ |
| test_lock_stack_nested | 89.9697ms | 5.8189ms | 171.8531 Ops/s | 176.3530 Ops/s | $\color{#d91a1a}-2.55\%$ |
| test_unlock_nested | 79.2678ms | 0.4107ms | 2.4349 KOps/s | 2.9683 KOps/s | $\textbf{\color{#d91a1a}-17.97\%}$ |
| test_unlock_stack_nested | 81.4470ms | 5.9778ms | 167.2843 Ops/s | 172.0589 Ops/s | $\color{#d91a1a}-2.78\%$ |
| test_flatten_speed | 0.6630ms | 0.3693ms | 2.7076 KOps/s | 2.7364 KOps/s | $\color{#d91a1a}-1.05\%$ |
| test_unflatten_speed | 0.7571ms | 0.4504ms | 2.2200 KOps/s | 2.2041 KOps/s | $\color{#35bf28}+0.72\%$ |
| test_common_ops | 1.3132ms | 0.6881ms | 1.4534 KOps/s | 1.4303 KOps/s | $\color{#35bf28}+1.61\%$ |
| test_creation | 39.8440μs | 1.8522μs | 539.8873 KOps/s | 552.6177 KOps/s | $\color{#d91a1a}-2.30\%$ |
| test_creation_empty | 0.2123ms | 11.5812μs | 86.3470 KOps/s | 89.9788 KOps/s | $\color{#d91a1a}-4.04\%$ |
| test_creation_nested_1 | 75.7220μs | 13.9397μs | 71.7376 KOps/s | 73.2936 KOps/s | $\color{#d91a1a}-2.12\%$ |
| test_creation_nested_2 | 45.5650μs | 17.3337μs | 57.6910 KOps/s | 58.8264 KOps/s | $\color{#d91a1a}-1.93\%$ |
| test_clone | 67.9270μs | 12.9740μs | 77.0774 KOps/s | 76.3858 KOps/s | $\color{#35bf28}+0.91\%$ |
| test_getitem[int] | 29.6850μs | 10.8700μs | 91.9962 KOps/s | 90.4003 KOps/s | $\color{#35bf28}+1.77\%$ |
| test_getitem[slice_int] | 54.3610μs | 21.4319μs | 46.6594 KOps/s | 43.8308 KOps/s | $\textbf{\color{#35bf28}+6.45\%}$ |
| test_getitem[range] | 0.1660ms | 40.6894μs | 24.5764 KOps/s | 23.1789 KOps/s | $\textbf{\color{#35bf28}+6.03\%}$ |
| test_getitem[tuple] | 45.1950μs | 17.7723μs | 56.2674 KOps/s | 54.2088 KOps/s | $\color{#35bf28}+3.80\%$ |
| test_getitem[list] | 0.1473ms | 36.1601μs | 27.6548 KOps/s | 26.5875 KOps/s | $\color{#35bf28}+4.01\%$ |
| test_setitem_dim[int] | 75.0300μs | 30.5993μs | 32.6805 KOps/s | 32.0049 KOps/s | $\color{#35bf28}+2.11\%$ |
| test_setitem_dim[slice_int] | 83.0850μs | 55.3222μs | 18.0759 KOps/s | 17.2410 KOps/s | $\color{#35bf28}+4.84\%$ |
| test_setitem_dim[range] | 0.1360ms | 75.6324μs | 13.2219 KOps/s | 12.8649 KOps/s | $\color{#35bf28}+2.77\%$ |
| test_setitem_dim[tuple] | 0.1154ms | 46.2467μs | 21.6232 KOps/s | 21.2486 KOps/s | $\color{#35bf28}+1.76\%$ |
| test_setitem | 0.2077ms | 20.6613μs | 48.3997 KOps/s | 49.2881 KOps/s | $\color{#d91a1a}-1.80\%$ |
| test_set | 69.7500μs | 19.4019μs | 51.5414 KOps/s | 50.4419 KOps/s | $\color{#35bf28}+2.18\%$ |
| test_set_shared | 3.4001ms | 0.1347ms | 7.4264 KOps/s | 7.4133 KOps/s | $\color{#35bf28}+0.18\%$ |
| test_update | 97.0020μs | 22.9602μs | 43.5537 KOps/s | 42.7812 KOps/s | $\color{#35bf28}+1.81\%$ |
| test_update_nested | 0.7808ms | 31.1083μs | 32.1457 KOps/s | 31.7402 KOps/s | $\color{#35bf28}+1.28\%$ |
| test_set_nested | 78.2560μs | 21.5789μs | 46.3415 KOps/s | 45.5389 KOps/s | $\color{#35bf28}+1.76\%$ |
| test_set_nested_new | 70.4720μs | 25.2334μs | 39.6300 KOps/s | 39.1653 KOps/s | $\color{#35bf28}+1.19\%$ |
| test_select | 81.8730μs | 37.2478μs | 26.8473 KOps/s | 25.6587 KOps/s | $\color{#35bf28}+4.63\%$ |
| test_select_nested | 0.1240ms | 57.5278μs | 17.3829 KOps/s | 17.2644 KOps/s | $\color{#35bf28}+0.69\%$ |
| test_exclude_nested | 0.2597ms | 0.1158ms | 8.6329 KOps/s | 8.5945 KOps/s | $\color{#35bf28}+0.45\%$ |
| test_empty[True] | 0.8567ms | 0.4008ms | 2.4950 KOps/s | 2.4506 KOps/s | $\color{#35bf28}+1.81\%$ |
| test_empty[False] | 8.2956μs | 1.0415μs | 960.1517 KOps/s | 976.0671 KOps/s | $\color{#d91a1a}-1.63\%$ |
| test_unbind_speed | 0.3434ms | 0.2558ms | 3.9087 KOps/s | 4.0826 KOps/s | $\color{#d91a1a}-4.26\%$ |
| test_unbind_speed_stack0 | 75.9236ms | 3.5258ms | 283.6232 Ops/s | 330.5429 Ops/s | $\textbf{\color{#d91a1a}-14.19\%}$ |
| test_unbind_speed_stack1 | 18.7850μs | 2.0095μs | 497.6424 KOps/s | 507.2406 KOps/s | $\color{#d91a1a}-1.89\%$ |
| test_split | 2.1292ms | 1.4471ms | 691.0250 Ops/s | 618.2433 Ops/s | $\textbf{\color{#35bf28}+11.77\%}$ |
| test_chunk | 71.1536ms | 1.5446ms | 647.4149 Ops/s | 642.6907 Ops/s | $\color{#35bf28}+0.74\%$ |
| test_creation[device0] | 0.3143ms | 98.3862μs | 10.1640 KOps/s | 10.0354 KOps/s | $\color{#35bf28}+1.28\%$ |
| test_creation_from_tensor | 3.5841ms | 78.8693μs | 12.6792 KOps/s | 12.3614 KOps/s | $\color{#35bf28}+2.57\%$ |
| test_add_one[memmap_tensor0] | 0.3195ms | 5.3166μs | 188.0911 KOps/s | 186.5152 KOps/s | $\color{#35bf28}+0.84\%$ |
| test_contiguous[memmap_tensor0] | 11.6220μs | 0.6257μs | 1.5982 MOps/s | 1.5995 MOps/s | $\color{#d91a1a}-0.08\%$ |
| test_stack[memmap_tensor0] | 61.7550μs | 3.5616μs | 280.7693 KOps/s | 263.4022 KOps/s | $\textbf{\color{#35bf28}+6.59\%}$ |
| test_memmaptd_index | 0.9579ms | 0.2305ms | 4.3380 KOps/s | 4.2537 KOps/s | $\color{#35bf28}+1.98\%$ |
| test_memmaptd_index_astensor | 0.6276ms | 0.2889ms | 3.4618 KOps/s | 3.3734 KOps/s | $\color{#35bf28}+2.62\%$ |
| test_memmaptd_index_op | 0.9420ms | 0.5970ms | 1.6751 KOps/s | 1.6385 KOps/s | $\color{#35bf28}+2.23\%$ |
| test_serialize_model | 0.1694s | 0.1089s | 9.1809 Ops/s | 8.9787 Ops/s | $\color{#35bf28}+2.25\%$ |
| test_serialize_model_pickle | 0.4566s | 0.3804s | 2.6290 Ops/s | 2.6429 Ops/s | $\color{#d91a1a}-0.53\%$ |
| test_serialize_weights | 0.1745s | 0.1075s | 9.3027 Ops/s | 8.9899 Ops/s | $\color{#35bf28}+3.48\%$ |
| test_serialize_weights_returnearly | 0.2011s | 0.1280s | 7.8104 Ops/s | 8.0468 Ops/s | $\color{#d91a1a}-2.94\%$ |
| test_serialize_weights_pickle | 1.0898s | 0.5775s | 1.7316 Ops/s | 2.3785 Ops/s | $\textbf{\color{#d91a1a}-27.20\%}$ |
| test_serialize_weights_filesystem | 98.0102ms | 92.4212ms | 10.8200 Ops/s | 10.4109 Ops/s | $\color{#35bf28}+3.93\%$ |
| test_serialize_model_filesystem | 0.1619s | 96.4434ms | 10.3688 Ops/s | 10.0912 Ops/s | $\color{#35bf28}+2.75\%$ |
| test_reshape_pytree | 46.8370μs | 20.9490μs | 47.7349 KOps/s | 47.0267 KOps/s | $\color{#35bf28}+1.51\%$ |
| test_reshape_td | 0.1263ms | 31.1254μs | 32.1281 KOps/s | 32.7033 KOps/s | $\color{#d91a1a}-1.76\%$ |
| test_view_pytree | 55.4830μs | 20.9672μs | 47.6935 KOps/s | 48.2500 KOps/s | $\color{#d91a1a}-1.15\%$ |
| test_view_td | 73.8741ms | 10.8477μs | 92.1850 KOps/s | 92.3726 KOps/s | $\color{#d91a1a}-0.20\%$ |
| test_unbind_pytree | 56.2050μs | 24.0004μs | 41.6661 KOps/s | 41.1151 KOps/s | $\color{#35bf28}+1.34\%$ |
| test_unbind_td | 0.1259ms | 34.9057μs | 28.6486 KOps/s | 27.8549 KOps/s | $\color{#35bf28}+2.85\%$ |
| test_split_pytree | 57.2870μs | 23.9041μs | 41.8339 KOps/s | 41.4960 KOps/s | $\color{#35bf28}+0.81\%$ |
| test_split_td | 0.3901ms | 38.9531μs | 25.6719 KOps/s | 25.4204 KOps/s | $\color{#35bf28}+0.99\%$ |
| test_add_pytree | 66.9450μs | 29.9754μs | 33.3606 KOps/s | 33.4498 KOps/s | $\color{#d91a1a}-0.27\%$ |
| test_add_td | 0.1139ms | 52.9692μs | 18.8789 KOps/s | 18.7018 KOps/s | $\color{#35bf28}+0.95\%$ |
| test_distributed | 0.2092ms | 97.4339μs | 10.2634 KOps/s | 9.9784 KOps/s | $\color{#35bf28}+2.86\%$ |
| test_tdmodule | 0.1042ms | 22.7755μs | 43.9067 KOps/s | 41.7287 KOps/s | $\textbf{\color{#35bf28}+5.22\%}$ |
| test_tdmodule_dispatch | 0.1554ms | 45.5232μs | 21.9668 KOps/s | 22.0930 KOps/s | $\color{#d91a1a}-0.57\%$ |
| test_tdseq | 45.5140μs | 25.4556μs | 39.2841 KOps/s | 38.0362 KOps/s | $\color{#35bf28}+3.28\%$ |
| test_tdseq_dispatch | 0.1490ms | 48.6922μs | 20.5372 KOps/s | 20.3606 KOps/s | $\color{#35bf28}+0.87\%$ |
| test_instantiation_functorch | 1.6059ms | 1.3159ms | 759.9458 Ops/s | 758.6274 Ops/s | $\color{#35bf28}+0.17\%$ |
| test_instantiation_td | 2.1764ms | 1.0076ms | 992.4220 Ops/s | 997.8478 Ops/s | $\color{#d91a1a}-0.54\%$ |
| test_exec_functorch | 0.3623ms | 0.1590ms | 6.2883 KOps/s | 6.4063 KOps/s | $\color{#d91a1a}-1.84\%$ |
| test_exec_functional_call | 0.2302ms | 0.1468ms | 6.8131 KOps/s | 6.8892 KOps/s | $\color{#d91a1a}-1.10\%$ |
| test_exec_td | 0.2988ms | 0.1427ms | 7.0053 KOps/s | 7.1482 KOps/s | $\color{#d91a1a}-2.00\%$ |
| test_exec_td_decorator | 0.8136ms | 0.1949ms | 5.1304 KOps/s | 5.0083 KOps/s | $\color{#35bf28}+2.44\%$ |
| test_vmap_mlp_speed[True-True] | 1.2584ms | 0.8804ms | 1.1359 KOps/s | 1.1501 KOps/s | $\color{#d91a1a}-1.24\%$ |
| test_vmap_mlp_speed[True-False] | 0.6611ms | 0.4590ms | 2.1787 KOps/s | 2.1818 KOps/s | $\color{#d91a1a}-0.14\%$ |
| test_vmap_mlp_speed[False-True] | 1.3166ms | 0.7602ms | 1.3154 KOps/s | 1.3311 KOps/s | $\color{#d91a1a}-1.18\%$ |
| test_vmap_mlp_speed[False-False] | 0.7160ms | 0.3770ms | 2.6524 KOps/s | 2.6838 KOps/s | $\color{#d91a1a}-1.17\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 0.1059s | 2.5648ms | 389.8881 Ops/s | 436.8217 Ops/s | $\textbf{\color{#d91a1a}-10.74\%}$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.9714ms | 0.5297ms | 1.8878 KOps/s | 1.8703 KOps/s | $\color{#35bf28}+0.94\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 2.4343ms | 1.8915ms | 528.6752 Ops/s | 511.1260 Ops/s | $\color{#35bf28}+3.43\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.7443ms | 0.4112ms | 2.4317 KOps/s | 2.4626 KOps/s | $\color{#d91a1a}-1.25\%$ |
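
The Change column in the table above appears to be the relative difference between the Ops and Ops on Repo HEAD columns, and the rows shown in bold are consistent with a cutoff of roughly 5% in either direction (which also matches the Improved and Worsened counts in the summary). A minimal sketch of that arithmetic follows; the function names and the 5% threshold are assumptions inferred from the numbers, not the benchmark workflow's actual code.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change in throughput of the PR branch relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a benchmark; the 5% threshold is an assumption, not a documented value."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table, same unit within each pair.
rows = {
    "test_plain_set_nested": (56.9265, 57.5851),  # KOps/s
    "test_unlock_nested": (2.4349, 2.9683),       # KOps/s
    "test_split": (691.0250, 618.2433),           # Ops/s
}

for name, (ops, ops_head) in rows.items():
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under this reading, small swings such as the -1.14% on test_plain_set_nested fall below the cutoff and are not counted as improved or worsened.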

github-actions bot commented Feb 6, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 132. Improved: $\large\color{#35bf28}4$. Worsened: $\large\color{#d91a1a}10$.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.1307ms | 14.0400μs | 71.2251 KOps/s | 75.8528 KOps/s | $\textbf{\color{#d91a1a}-6.10\%}$ |
| test_plain_set_stack_nested | 0.1475ms | 0.1186ms | 8.4304 KOps/s | 8.4057 KOps/s | $\color{#35bf28}+0.29\%$ |
| test_plain_set_nested_inplace | 34.6210μs | 15.5069μs | 64.4875 KOps/s | 68.2943 KOps/s | $\textbf{\color{#d91a1a}-5.57\%}$ |
| test_plain_set_stack_nested_inplace | 0.2098ms | 0.1486ms | 6.7275 KOps/s | 6.7190 KOps/s | $\color{#35bf28}+0.13\%$ |
| test_items | 22.2410μs | 4.7767μs | 209.3480 KOps/s | 212.9158 KOps/s | $\color{#d91a1a}-1.68\%$ |
| test_items_nested | 0.3943ms | 0.3419ms | 2.9248 KOps/s | 2.8939 KOps/s | $\color{#35bf28}+1.07\%$ |
| test_items_nested_locked | 0.5325ms | 0.3480ms | 2.8737 KOps/s | 2.8886 KOps/s | $\color{#d91a1a}-0.52\%$ |
| test_items_nested_leaf | 0.2535ms | 0.2033ms | 4.9182 KOps/s | 4.9554 KOps/s | $\color{#d91a1a}-0.75\%$ |
| test_items_stack_nested | 1.3962ms | 1.3142ms | 760.9313 Ops/s | 756.9065 Ops/s | $\color{#35bf28}+0.53\%$ |
| test_items_stack_nested_leaf | 1.2086ms | 1.1510ms | 868.8128 Ops/s | 870.7839 Ops/s | $\color{#d91a1a}-0.23\%$ |
| test_items_stack_nested_locked | 0.9781ms | 0.8972ms | 1.1146 KOps/s | 1.0946 KOps/s | $\color{#35bf28}+1.82\%$ |
| test_keys | 18.0400μs | 4.5593μs | 219.3330 KOps/s | 219.0930 KOps/s | $\color{#35bf28}+0.11\%$ |
| test_keys_nested | 0.7250ms | 95.0125μs | 10.5249 KOps/s | 10.5275 KOps/s | $\color{#d91a1a}-0.02\%$ |
| test_keys_nested_locked | 0.1235ms | 98.4948μs | 10.1528 KOps/s | 10.0826 KOps/s | $\color{#35bf28}+0.70\%$ |
| test_keys_nested_leaf | 0.1923ms | 78.8711μs | 12.6789 KOps/s | 12.6991 KOps/s | $\color{#d91a1a}-0.16\%$ |
| test_keys_stack_nested | 1.2266ms | 1.1537ms | 866.7931 Ops/s | 876.2736 Ops/s | $\color{#d91a1a}-1.08\%$ |
| test_keys_stack_nested_leaf | 1.2061ms | 1.1308ms | 884.3317 Ops/s | 892.7090 Ops/s | $\color{#d91a1a}-0.94\%$ |
| test_keys_stack_nested_locked | 0.7993ms | 0.7116ms | 1.4053 KOps/s | 1.3936 KOps/s | $\color{#35bf28}+0.84\%$ |
| test_values | 8.8700μs | 1.8828μs | 531.1200 KOps/s | 524.5193 KOps/s | $\color{#35bf28}+1.26\%$ |
| test_values_nested | 71.8510μs | 45.4613μs | 21.9967 KOps/s | 21.9928 KOps/s | $\color{#35bf28}+0.02\%$ |
| test_values_nested_locked | 82.1510μs | 47.7898μs | 20.9250 KOps/s | 20.8496 KOps/s | $\color{#35bf28}+0.36\%$ |
| test_values_nested_leaf | 96.0210μs | 39.8549μs | 25.0910 KOps/s | 25.2990 KOps/s | $\color{#d91a1a}-0.82\%$ |
| test_values_stack_nested | 1.0284ms | 0.9586ms | 1.0432 KOps/s | 1.0427 KOps/s | $\color{#35bf28}+0.05\%$ |
| test_values_stack_nested_leaf | 1.0830ms | 0.9572ms | 1.0447 KOps/s | 1.0504 KOps/s | $\color{#d91a1a}-0.54\%$ |
| test_values_stack_nested_locked | 0.7141ms | 0.5678ms | 1.7611 KOps/s | 1.7443 KOps/s | $\color{#35bf28}+0.97\%$ |
| test_membership | 11.6582μs | 0.9387μs | 1.0653 MOps/s | 1.0557 MOps/s | $\color{#35bf28}+0.91\%$ |
| test_membership_nested | 24.2300μs | 2.8576μs | 349.9427 KOps/s | 344.8592 KOps/s | $\color{#35bf28}+1.47\%$ |
| test_membership_nested_leaf | 21.5700μs | 2.8777μs | 347.5037 KOps/s | 344.4591 KOps/s | $\color{#35bf28}+0.88\%$ |
| test_membership_stacked_nested | 35.1710μs | 11.2008μs | 89.2790 KOps/s | 87.7200 KOps/s | $\color{#35bf28}+1.78\%$ |
| test_membership_stacked_nested_leaf | 34.4810μs | 11.3241μs | 88.3070 KOps/s | 87.5117 KOps/s | $\color{#35bf28}+0.91\%$ |
| test_membership_nested_last | 18.2200μs | 5.3417μs | 187.2049 KOps/s | 186.6618 KOps/s | $\color{#35bf28}+0.29\%$ |
| test_membership_nested_leaf_last | 21.7710μs | 5.3884μs | 185.5832 KOps/s | 187.7937 KOps/s | $\color{#d91a1a}-1.18\%$ |
| test_membership_stacked_nested_last | 0.1979ms | 0.1583ms | 6.3171 KOps/s | 6.3449 KOps/s | $\color{#d91a1a}-0.44\%$ |
| test_membership_stacked_nested_leaf_last | 32.0300μs | 13.1853μs | 75.8420 KOps/s | 75.8304 KOps/s | $\color{#35bf28}+0.02\%$ |
| test_nested_getleaf | 29.8300μs | 8.5350μs | 117.1643 KOps/s | 118.5420 KOps/s | $\color{#d91a1a}-1.16\%$ |
| test_nested_get | 22.8310μs | 8.0277μs | 124.5684 KOps/s | 125.4691 KOps/s | $\color{#d91a1a}-0.72\%$ |
| test_stacked_getleaf | 0.3670ms | 0.3345ms | 2.9899 KOps/s | 3.0347 KOps/s | $\color{#d91a1a}-1.48\%$ |
| test_stacked_get | 0.3406ms | 0.3018ms | 3.3134 KOps/s | 3.3410 KOps/s | $\color{#d91a1a}-0.83\%$ |
| test_nested_getitemleaf | 32.7200μs | 9.9045μs | 100.9642 KOps/s | 100.9758 KOps/s | $\color{#d91a1a}-0.01\%$ |
| test_nested_getitem | 31.3000μs | 9.4135μs | 106.2300 KOps/s | 106.0291 KOps/s | $\color{#35bf28}+0.19\%$ |
| test_stacked_getitemleaf | 0.4428ms | 0.3355ms | 2.9810 KOps/s | 2.9985 KOps/s | $\color{#d91a1a}-0.58\%$ |
| test_stacked_getitem | 0.3584ms | 0.3040ms | 3.2896 KOps/s | 3.3177 KOps/s | $\color{#d91a1a}-0.85\%$ |
| test_lock_nested | 0.7989ms | 0.3515ms | 2.8450 KOps/s | 2.8232 KOps/s | $\color{#35bf28}+0.77\%$ |
| test_lock_stack_nested | 85.3150ms | 6.3683ms | 157.0278 Ops/s | 159.2956 Ops/s | $\color{#d91a1a}-1.42\%$ |
| test_unlock_nested | 79.4209ms | 0.4317ms | 2.3167 KOps/s | 2.8756 KOps/s | $\textbf{\color{#d91a1a}-19.44\%}$ |
| test_unlock_stack_nested | 85.9133ms | 6.4614ms | 154.7661 Ops/s | 155.1498 Ops/s | $\color{#d91a1a}-0.25\%$ |
| test_flatten_speed | 0.3178ms | 0.2635ms | 3.7945 KOps/s | 3.7666 KOps/s | $\color{#35bf28}+0.74\%$ |
| test_unflatten_speed | 0.4031ms | 0.3653ms | 2.7371 KOps/s | 2.7559 KOps/s | $\color{#d91a1a}-0.68\%$ |
| test_common_ops | 1.0939ms | 0.6209ms | 1.6107 KOps/s | 1.6904 KOps/s | $\color{#d91a1a}-4.72\%$ |
| test_creation | 15.0910μs | 1.5721μs | 636.1106 KOps/s | 633.7639 KOps/s | $\color{#35bf28}+0.37\%$ |
| test_creation_empty | 26.1100μs | 9.2148μs | 108.5206 KOps/s | 131.3296 KOps/s | $\textbf{\color{#d91a1a}-17.37\%}$ |
| test_creation_nested_1 | 25.5900μs | 11.0636μs | 90.3867 KOps/s | 106.6960 KOps/s | $\textbf{\color{#d91a1a}-15.29\%}$ |
| test_creation_nested_2 | 91.7110μs | 13.4081μs | 74.5818 KOps/s | 84.3840 KOps/s | $\textbf{\color{#d91a1a}-11.62\%}$ |
| test_clone | 62.5710μs | 14.1217μs | 70.8129 KOps/s | 71.5679 KOps/s | $\color{#d91a1a}-1.05\%$ |
| test_getitem[int] | 29.9310μs | 11.1892μs | 89.3721 KOps/s | 90.3616 KOps/s | $\color{#d91a1a}-1.10\%$ |
| test_getitem[slice_int] | 40.8910μs | 21.5356μs | 46.4348 KOps/s | 45.3703 KOps/s | $\color{#35bf28}+2.35\%$ |
| test_getitem[range] | 0.1724ms | 36.6383μs | 27.2939 KOps/s | 28.7029 KOps/s | $\color{#d91a1a}-4.91\%$ |
| test_getitem[tuple] | 40.1410μs | 19.2206μs | 52.0276 KOps/s | 51.2772 KOps/s | $\color{#35bf28}+1.46\%$ |
| test_getitem[list] | 0.1737ms | 33.4822μs | 29.8666 KOps/s | 28.8770 KOps/s | $\color{#35bf28}+3.43\%$ |
| test_setitem_dim[int] | 49.1910μs | 27.3606μs | 36.5490 KOps/s | 35.1748 KOps/s | $\color{#35bf28}+3.91\%$ |
| test_setitem_dim[slice_int] | 74.4110μs | 47.6430μs | 20.9894 KOps/s | 20.5273 KOps/s | $\color{#35bf28}+2.25\%$ |
| test_setitem_dim[range] | 82.3810μs | 60.4721μs | 16.5365 KOps/s | 15.8218 KOps/s | $\color{#35bf28}+4.52\%$ |
| test_setitem_dim[tuple] | 60.9720μs | 41.2185μs | 24.2610 KOps/s | 23.2678 KOps/s | $\color{#35bf28}+4.27\%$ |
| test_setitem | 73.8510μs | 19.1722μs | 52.1590 KOps/s | 53.9262 KOps/s | $\color{#d91a1a}-3.28\%$ |
| test_set | 81.0110μs | 18.6366μs | 53.6579 KOps/s | 55.6649 KOps/s | $\color{#d91a1a}-3.61\%$ |
| test_set_shared | 3.0213ms | 0.1046ms | 9.5611 KOps/s | 9.7128 KOps/s | $\color{#d91a1a}-1.56\%$ |
| test_update | 66.6810μs | 21.7959μs | 45.8801 KOps/s | 49.5353 KOps/s | $\textbf{\color{#d91a1a}-7.38\%}$ |
| test_update_nested | 66.1410μs | 29.0636μs | 34.4073 KOps/s | 37.2795 KOps/s | $\textbf{\color{#d91a1a}-7.70\%}$ |
| test_set_nested | 59.5910μs | 20.1535μs | 49.6192 KOps/s | 51.3029 KOps/s | $\color{#d91a1a}-3.28\%$ |
| test_set_nested_new | 75.8410μs | 22.7434μs | 43.9689 KOps/s | 45.5465 KOps/s | $\color{#d91a1a}-3.46\%$ |
| test_select | 86.0310μs | 35.8491μs | 27.8947 KOps/s | 28.1267 KOps/s | $\color{#d91a1a}-0.82\%$ |
| test_select_nested | 85.6810μs | 54.0908μs | 18.4874 KOps/s | 18.5703 KOps/s | $\color{#d91a1a}-0.45\%$ |
| test_exclude_nested | 0.1461ms | 0.1167ms | 8.5716 KOps/s | 8.7837 KOps/s | $\color{#d91a1a}-2.41\%$ |
| test_empty[True] | 0.5162ms | 0.3978ms | 2.5135 KOps/s | 2.5367 KOps/s | $\color{#d91a1a}-0.91\%$ |
| test_empty[False] | 2.2250μs | 0.8428μs | 1.1865 MOps/s | 1.1801 MOps/s | $\color{#35bf28}+0.54\%$ |
| test_to | 76.3810μs | 55.6408μs | 17.9724 KOps/s | 18.2672 KOps/s | $\color{#d91a1a}-1.61\%$ |
| test_to_nonblocking | 62.2910μs | 35.7128μs | 28.0012 KOps/s | 28.4865 KOps/s | $\color{#d91a1a}-1.70\%$ |
| test_unbind_speed | 0.3022ms | 0.2696ms | 3.7098 KOps/s | 3.7393 KOps/s | $\color{#d91a1a}-0.79\%$ |
| test_unbind_speed_stack0 | 88.1177ms | 4.1630ms | 240.2101 Ops/s | 301.6268 Ops/s | $\textbf{\color{#d91a1a}-20.36\%}$ |
| test_unbind_speed_stack1 | 25.1100μs | 1.7901μs | 558.6179 KOps/s | 558.3567 KOps/s | $\color{#35bf28}+0.05\%$ |
| test_split | 2.0509ms | 1.5662ms | 638.4978 Ops/s | 637.4485 Ops/s | $\color{#35bf28}+0.16\%$ |
| test_chunk | 80.7260ms | 1.6940ms | 590.3101 Ops/s | 589.3657 Ops/s | $\color{#35bf28}+0.16\%$ |
| test_creation[device0] | 0.1610ms | 70.6741μs | 14.1495 KOps/s | 14.2514 KOps/s | $\color{#d91a1a}-0.72\%$ |
| test_creation_from_tensor | 0.1306ms | 53.1750μs | 18.8058 KOps/s | 18.9813 KOps/s | $\color{#d91a1a}-0.92\%$ |
| test_add_one[memmap_tensor0] | 0.1980ms | 6.6917μs | 149.4396 KOps/s | 153.5433 KOps/s | $\color{#d91a1a}-2.67\%$ |
| test_contiguous[memmap_tensor0] | 23.4910μs | 0.6173μs | 1.6199 MOps/s | 1.6078 MOps/s | $\color{#35bf28}+0.75\%$ |
| test_stack[memmap_tensor0] | 44.0910μs | 4.5615μs | 219.2262 KOps/s | 221.4832 KOps/s | $\color{#d91a1a}-1.02\%$ |
| test_memmaptd_index | 1.0390ms | 0.2699ms | 3.7052 KOps/s | 3.8021 KOps/s | $\color{#d91a1a}-2.55\%$ |
| test_memmaptd_index_astensor | 0.5925ms | 0.3286ms | 3.0434 KOps/s | 3.0822 KOps/s | $\color{#d91a1a}-1.26\%$ |
| test_memmaptd_index_op | 0.9310ms | 0.6283ms | 1.5916 KOps/s | 1.6696 KOps/s | $\color{#d91a1a}-4.67\%$ |
| test_serialize_model | 0.1729s | 98.1831ms | 10.1851 Ops/s | 9.5208 Ops/s | $\textbf{\color{#35bf28}+6.98\%}$ |
| test_serialize_model_pickle | 1.3497s | 1.2361s | 0.8090 Ops/s | 0.8084 Ops/s | $\color{#35bf28}+0.06\%$ |
| test_serialize_weights | 0.1697s | 96.0105ms | 10.4155 Ops/s | 9.6703 Ops/s | $\textbf{\color{#35bf28}+7.71\%}$ |
| test_serialize_weights_returnearly | 0.1979s | 69.1188ms | 14.4678 Ops/s | 12.3858 Ops/s | $\textbf{\color{#35bf28}+16.81\%}$ |
| test_serialize_weights_pickle | 1.3566s | 1.2367s | 0.8086 Ops/s | 0.8030 Ops/s | $\color{#35bf28}+0.70\%$ |
| test_reshape_pytree | 56.7900μs | 25.2682μs | 39.5755 KOps/s | 39.8088 KOps/s | $\color{#d91a1a}-0.59\%$ |
| test_reshape_td | 59.4310μs | 31.7657μs | 31.4805 KOps/s | 31.7995 KOps/s | $\color{#d91a1a}-1.00\%$ |
| test_view_pytree | 48.3810μs | 24.8268μs | 40.2791 KOps/s | 40.9488 KOps/s | $\color{#d91a1a}-1.64\%$ |
| test_view_td | 0.3777ms | 6.7624μs | 147.8755 KOps/s | 145.5962 KOps/s | $\color{#35bf28}+1.57\%$ |
| test_unbind_pytree | 0.1669ms | 31.7003μs | 31.5455 KOps/s | 31.9415 KOps/s | $\color{#d91a1a}-1.24\%$ |
| test_unbind_td | 0.2231ms | 41.2624μs | 24.2351 KOps/s | 24.2185 KOps/s | $\color{#35bf28}+0.07\%$ |
| test_split_pytree | 55.8310μs | 28.6664μs | 34.8841 KOps/s | 34.6513 KOps/s | $\color{#35bf28}+0.67\%$ |
| test_split_td | 0.4976ms | 40.4741μs | 24.7072 KOps/s | 25.5704 KOps/s | $\color{#d91a1a}-3.38\%$ |
| test_add_pytree | 61.7010μs | 36.1640μs | 27.6518 KOps/s | 28.1350 KOps/s | $\color{#d91a1a}-1.72\%$ |
| test_add_td | 75.6710μs | 50.9085μs | 19.6431 KOps/s | 19.7558 KOps/s | $\color{#d91a1a}-0.57\%$ |
| test_distributed | 1.5972ms | 72.7831μs | 13.7395 KOps/s | 11.2567 KOps/s | $\textbf{\color{#35bf28}+22.06\%}$ |
| test_tdmodule | 92.8920μs | 18.3921μs | 54.3712 KOps/s | 57.1100 KOps/s | $\color{#d91a1a}-4.80\%$ |
| test_tdmodule_dispatch | 0.2234ms | 37.6965μs | 26.5277 KOps/s | 27.9038 KOps/s | $\color{#d91a1a}-4.93\%$ |
| test_tdseq | 36.0800μs | 20.8946μs | 47.8594 KOps/s | 50.2020 KOps/s | $\color{#d91a1a}-4.67\%$ |
| test_tdseq_dispatch | 56.3110μs | 39.6752μs | 25.2047 KOps/s | 26.2491 KOps/s | $\color{#d91a1a}-3.98\%$ |
| test_instantiation_functorch | 1.7467ms | 1.6848ms | 593.5597 Ops/s | 598.9056 Ops/s | $\color{#d91a1a}-0.89\%$ |
| test_instantiation_td | 1.7025ms | 1.1610ms | 861.3519 Ops/s | 865.7367 Ops/s | $\color{#d91a1a}-0.51\%$ |
| test_exec_functorch | 0.2188ms | 0.1581ms | 6.3236 KOps/s | 6.3792 KOps/s | $\color{#d91a1a}-0.87\%$ |
| test_exec_functional_call | 0.2195ms | 0.1565ms | 6.3901 KOps/s | 6.4824 KOps/s | $\color{#d91a1a}-1.42\%$ |
| test_exec_td | 0.1809ms | 0.1476ms | 6.7768 KOps/s | 6.7993 KOps/s | $\color{#d91a1a}-0.33\%$ |
| test_exec_td_decorator | 0.1135s | 0.2339ms | 4.2746 KOps/s | 4.9488 KOps/s | $\textbf{\color{#d91a1a}-13.62\%}$ |
| test_vmap_mlp_speed[True-True] | 1.1021ms | 1.0186ms | 981.7202 Ops/s | 981.8298 Ops/s | $\color{#d91a1a}-0.01\%$ |
| test_vmap_mlp_speed[True-False] | 0.6487ms | 0.5907ms | 1.6930 KOps/s | 1.7119 KOps/s | $\color{#d91a1a}-1.10\%$ |
| test_vmap_mlp_speed[False-True] | 1.0015ms | 0.9305ms | 1.0746 KOps/s | 1.0723 KOps/s | $\color{#35bf28}+0.22\%$ |
| test_vmap_mlp_speed[False-False] | 0.6096ms | 0.5164ms | 1.9365 KOps/s | 1.9361 KOps/s | $\color{#35bf28}+0.03\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 2.9475ms | 2.3405ms | 427.2659 Ops/s | 432.5453 Ops/s | $\color{#d91a1a}-1.22\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 1.1306ms | 0.6692ms | 1.4943 KOps/s | 1.5413 KOps/s | $\color{#d91a1a}-3.05\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 2.4087ms | 1.9627ms | 509.4921 Ops/s | 518.8903 Ops/s | $\color{#d91a1a}-1.81\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.8952ms | 0.5564ms | 1.7973 KOps/s | 1.8287 KOps/s | $\color{#d91a1a}-1.72\%$ |
| test_vmap_transformer_speed[True-True] | 13.5171ms | 12.5256ms | 79.8363 Ops/s | 80.9587 Ops/s | $\color{#d91a1a}-1.39\%$ |
| test_vmap_transformer_speed[True-False] | 8.5880ms | 8.0792ms | 123.7741 Ops/s | 126.5800 Ops/s | $\color{#d91a1a}-2.22\%$ |
| test_vmap_transformer_speed[False-True] | 12.6364ms | 12.1727ms | 82.1511 Ops/s | 83.2816 Ops/s | $\color{#d91a1a}-1.36\%$ |
| test_vmap_transformer_speed[False-False] | 8.5504ms | 8.0007ms | 124.9886 Ops/s | 127.0530 Ops/s | $\color{#d91a1a}-1.62\%$ |
| test_vmap_transformer_speed_decorator[True-True] | 75.1436ms | 74.1602ms | 13.4843 Ops/s | 13.8287 Ops/s | $\color{#d91a1a}-2.49\%$ |
| test_vmap_transformer_speed_decorator[True-False] | 21.4498ms | 19.7309ms | 50.6819 Ops/s | 51.0917 Ops/s | $\color{#d91a1a}-0.80\%$ |
| test_vmap_transformer_speed_decorator[False-True] | 67.9068ms | 66.6776ms | 14.9975 Ops/s | 15.4002 Ops/s | $\color{#d91a1a}-2.61\%$ |
| test_vmap_transformer_speed_decorator[False-False] | 21.0391ms | 19.2788ms | 51.8704 Ops/s | 53.1739 Ops/s | $\color{#d91a1a}-2.45\%$ |

vmoens merged commit 707fd2c into main on Feb 6, 2024
48 checks passed
vmoens deleted the named-transpose branch on February 6, 2024 at 12:17
vmoens added a commit that referenced this pull request on Feb 6, 2024
Labels: bug, CLA Signed
2 participants