[BugFix] Faster empty_like for MemoryMappedTensor (dup) #586

Merged: 3 commits merged on Nov 30, 2023

@vmoens (Contributor) commented Nov 30, 2023

No description provided.

@facebook-github-bot added the CLA Signed label Nov 30, 2023
@vmoens added the bug and Performance labels Nov 30, 2023
@vmoens marked this pull request as ready for review November 30, 2023 10:16
@vmoens merged commit ab4b630 into main Nov 30, 2023
30 of 33 checks passed
@vmoens deleted the fix_memmap_empty2 branch November 30, 2023 10:21

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 113. Improved: $\large\color{#35bf28}33$. Worsened: $\large\color{#d91a1a}4$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 32.3500μs 15.7147μs 63.6349 KOps/s 58.4044 KOps/s $\textbf{\color{#35bf28}+8.96\%}$
test_plain_set_stack_nested 0.1856ms 0.1413ms 7.0748 KOps/s 6.4709 KOps/s $\textbf{\color{#35bf28}+9.33\%}$
test_plain_set_nested_inplace 43.1100μs 18.9639μs 52.7317 KOps/s 48.2482 KOps/s $\textbf{\color{#35bf28}+9.29\%}$
test_plain_set_stack_nested_inplace 0.3256ms 0.1730ms 5.7798 KOps/s 5.3360 KOps/s $\textbf{\color{#35bf28}+8.32\%}$
test_items 29.9830μs 2.3980μs 417.0157 KOps/s 385.0612 KOps/s $\textbf{\color{#35bf28}+8.30\%}$
test_items_nested 0.3285ms 0.2695ms 3.7105 KOps/s 3.7163 KOps/s $\color{#d91a1a}-0.16\%$
test_items_nested_locked 1.2234ms 0.2700ms 3.7043 KOps/s 3.6897 KOps/s $\color{#35bf28}+0.39\%$
test_items_nested_leaf 0.3153ms 0.1645ms 6.0787 KOps/s 5.9750 KOps/s $\color{#35bf28}+1.74\%$
test_items_stack_nested 2.2205ms 1.4841ms 673.8048 Ops/s 646.4464 Ops/s $\color{#35bf28}+4.23\%$
test_items_stack_nested_leaf 2.0577ms 1.3457ms 743.1312 Ops/s 718.6469 Ops/s $\color{#35bf28}+3.41\%$
test_items_stack_nested_locked 1.2870ms 0.7642ms 1.3085 KOps/s 1.2843 KOps/s $\color{#35bf28}+1.89\%$
test_keys 20.6590μs 3.9152μs 255.4123 KOps/s 258.5095 KOps/s $\color{#d91a1a}-1.20\%$
test_keys_nested 3.3161ms 0.1403ms 7.1280 KOps/s 6.6527 KOps/s $\textbf{\color{#35bf28}+7.14\%}$
test_keys_nested_locked 0.2711ms 0.1381ms 7.2416 KOps/s 6.9381 KOps/s $\color{#35bf28}+4.37\%$
test_keys_nested_leaf 0.4106ms 0.1391ms 7.1901 KOps/s 6.6329 KOps/s $\textbf{\color{#35bf28}+8.40\%}$
test_keys_stack_nested 2.3084ms 1.4059ms 711.3059 Ops/s 683.0801 Ops/s $\color{#35bf28}+4.13\%$
test_keys_stack_nested_leaf 1.5775ms 1.4056ms 711.4519 Ops/s 680.4736 Ops/s $\color{#35bf28}+4.55\%$
test_keys_stack_nested_locked 0.8078ms 0.6744ms 1.4827 KOps/s 1.4229 KOps/s $\color{#35bf28}+4.20\%$
test_values 8.9770μs 1.2052μs 829.7110 KOps/s 835.5122 KOps/s $\color{#d91a1a}-0.69\%$
test_values_nested 98.2630μs 49.3377μs 20.2685 KOps/s 19.5174 KOps/s $\color{#35bf28}+3.85\%$
test_values_nested_locked 88.5250μs 49.7591μs 20.0968 KOps/s 19.7274 KOps/s $\color{#35bf28}+1.87\%$
test_values_nested_leaf 64.5500μs 43.9020μs 22.7780 KOps/s 21.3623 KOps/s $\textbf{\color{#35bf28}+6.63\%}$
test_values_stack_nested 1.9200ms 1.1926ms 838.5027 Ops/s 795.8124 Ops/s $\textbf{\color{#35bf28}+5.36\%}$
test_values_stack_nested_leaf 1.8629ms 1.2075ms 828.1355 Ops/s 809.8044 Ops/s $\color{#35bf28}+2.26\%$
test_values_stack_nested_locked 0.9592ms 0.5144ms 1.9441 KOps/s 1.8922 KOps/s $\color{#35bf28}+2.74\%$
test_membership 11.6320μs 1.3690μs 730.4756 KOps/s 747.7469 KOps/s $\color{#d91a1a}-2.31\%$
test_membership_nested 20.9290μs 2.8257μs 353.8975 KOps/s 351.6043 KOps/s $\color{#35bf28}+0.65\%$
test_membership_nested_leaf 20.6480μs 2.8437μs 351.6594 KOps/s 319.7146 KOps/s $\textbf{\color{#35bf28}+9.99\%}$
test_membership_stacked_nested 35.4760μs 11.7878μs 84.8338 KOps/s 81.4649 KOps/s $\color{#35bf28}+4.14\%$
test_membership_stacked_nested_leaf 46.2560μs 11.8537μs 84.3620 KOps/s 81.7635 KOps/s $\color{#35bf28}+3.18\%$
test_membership_nested_last 24.4850μs 5.9907μs 166.9242 KOps/s 163.7231 KOps/s $\color{#35bf28}+1.96\%$
test_membership_nested_leaf_last 25.2370μs 5.9866μs 167.0408 KOps/s 163.3438 KOps/s $\color{#35bf28}+2.26\%$
test_membership_stacked_nested_last 0.2359ms 0.1691ms 5.9133 KOps/s 5.7499 KOps/s $\color{#35bf28}+2.84\%$
test_membership_stacked_nested_leaf_last 79.5390μs 13.7308μs 72.8289 KOps/s 68.8470 KOps/s $\textbf{\color{#35bf28}+5.78\%}$
test_nested_getleaf 33.7320μs 10.9538μs 91.2929 KOps/s 90.3576 KOps/s $\color{#35bf28}+1.04\%$
test_nested_get 26.4800μs 10.3786μs 96.3525 KOps/s 94.8474 KOps/s $\color{#35bf28}+1.59\%$
test_stacked_getleaf 1.2122ms 0.6413ms 1.5592 KOps/s 1.5144 KOps/s $\color{#35bf28}+2.96\%$
test_stacked_get 5.0487ms 0.6182ms 1.6176 KOps/s 1.5885 KOps/s $\color{#35bf28}+1.83\%$
test_nested_getitemleaf 27.6510μs 10.7848μs 92.7229 KOps/s 90.4617 KOps/s $\color{#35bf28}+2.50\%$
test_nested_getitem 45.2740μs 10.1970μs 98.0683 KOps/s 94.4652 KOps/s $\color{#35bf28}+3.81\%$
test_stacked_getitemleaf 1.1055ms 0.6385ms 1.5662 KOps/s 1.4836 KOps/s $\textbf{\color{#35bf28}+5.57\%}$
test_stacked_getitem 0.9658ms 0.6084ms 1.6435 KOps/s 1.5674 KOps/s $\color{#35bf28}+4.86\%$
test_lock_nested 7.4964ms 0.5667ms 1.7647 KOps/s 1.7472 KOps/s $\color{#35bf28}+1.00\%$
test_lock_stack_nested 7.5857ms 5.0192ms 199.2364 Ops/s 193.9755 Ops/s $\color{#35bf28}+2.71\%$
test_unlock_nested 76.4495ms 0.5172ms 1.9334 KOps/s 2.2127 KOps/s $\textbf{\color{#d91a1a}-12.62\%}$
test_unlock_stack_nested 71.3180ms 6.9242ms 144.4217 Ops/s 139.8136 Ops/s $\color{#35bf28}+3.30\%$
test_flatten_speed 0.5899ms 0.2714ms 3.6841 KOps/s 3.5543 KOps/s $\color{#35bf28}+3.65\%$
test_unflatten_speed 0.7834ms 0.4660ms 2.1458 KOps/s 2.0972 KOps/s $\color{#35bf28}+2.32\%$
test_common_ops 1.2149ms 0.6642ms 1.5055 KOps/s 1.3979 KOps/s $\textbf{\color{#35bf28}+7.69\%}$
test_creation 59.7510μs 2.4620μs 406.1704 KOps/s 401.0375 KOps/s $\color{#35bf28}+1.28\%$
test_creation_empty 45.4050μs 8.0693μs 123.9262 KOps/s 111.9301 KOps/s $\textbf{\color{#35bf28}+10.72\%}$
test_creation_nested_1 31.3590μs 11.2872μs 88.5963 KOps/s 80.2395 KOps/s $\textbf{\color{#35bf28}+10.41\%}$
test_creation_nested_2 32.6710μs 14.8109μs 67.5179 KOps/s 62.3953 KOps/s $\textbf{\color{#35bf28}+8.21\%}$
test_clone 0.1046ms 14.0140μs 71.3571 KOps/s 70.8623 KOps/s $\color{#35bf28}+0.70\%$
test_getitem[int] 48.9010μs 13.3278μs 75.0310 KOps/s 75.5807 KOps/s $\color{#d91a1a}-0.73\%$
test_getitem[slice_int] 0.1003ms 26.1206μs 38.2840 KOps/s 39.0660 KOps/s $\color{#d91a1a}-2.00\%$
test_getitem[range] 0.1036ms 45.1239μs 22.1612 KOps/s 21.2577 KOps/s $\color{#35bf28}+4.25\%$
test_getitem[tuple] 60.2020μs 21.0192μs 47.5756 KOps/s 48.3943 KOps/s $\color{#d91a1a}-1.69\%$
test_getitem[list] 0.2094ms 39.4847μs 25.3263 KOps/s 23.8507 KOps/s $\textbf{\color{#35bf28}+6.19\%}$
test_setitem_dim[int] 54.0410μs 28.0697μs 35.6255 KOps/s 33.3169 KOps/s $\textbf{\color{#35bf28}+6.93\%}$
test_setitem_dim[slice_int] 85.7500μs 52.9914μs 18.8710 KOps/s 18.0037 KOps/s $\color{#35bf28}+4.82\%$
test_setitem_dim[range] 0.1077ms 73.9568μs 13.5214 KOps/s 13.1371 KOps/s $\color{#35bf28}+2.93\%$
test_setitem_dim[tuple] 68.4580μs 41.7295μs 23.9639 KOps/s 23.0422 KOps/s $\color{#35bf28}+4.00\%$
test_setitem 84.2370μs 18.6788μs 53.5366 KOps/s 49.0663 KOps/s $\textbf{\color{#35bf28}+9.11\%}$
test_set 83.0350μs 17.8829μs 55.9193 KOps/s 51.1588 KOps/s $\textbf{\color{#35bf28}+9.31\%}$
test_set_shared 3.7122ms 0.1402ms 7.1317 KOps/s 6.9349 KOps/s $\color{#35bf28}+2.84\%$
test_update 0.1398ms 19.0417μs 52.5164 KOps/s 46.6363 KOps/s $\textbf{\color{#35bf28}+12.61\%}$
test_update_nested 75.0800μs 26.5568μs 37.6552 KOps/s 33.6362 KOps/s $\textbf{\color{#35bf28}+11.95\%}$
test_set_nested 0.1407ms 19.5550μs 51.1377 KOps/s 45.7392 KOps/s $\textbf{\color{#35bf28}+11.80\%}$
test_set_nested_new 80.5510μs 24.6949μs 40.4942 KOps/s 35.4099 KOps/s $\textbf{\color{#35bf28}+14.36\%}$
test_select 0.1286ms 49.3752μs 20.2531 KOps/s 18.4452 KOps/s $\textbf{\color{#35bf28}+9.80\%}$
test_unbind_speed 0.7775ms 0.3784ms 2.6426 KOps/s 2.6513 KOps/s $\color{#d91a1a}-0.33\%$
test_unbind_speed_stack0 70.5046ms 4.7333ms 211.2703 Ops/s 231.8049 Ops/s $\textbf{\color{#d91a1a}-8.86\%}$
test_unbind_speed_stack1 2.1415μs 0.6221μs 1.6075 MOps/s 1.5725 MOps/s $\color{#35bf28}+2.22\%$
test_split 60.2952ms 1.7778ms 562.4863 Ops/s 596.9287 Ops/s $\textbf{\color{#d91a1a}-5.77\%}$
test_chunk 59.9495ms 1.7452ms 573.0077 Ops/s 594.5460 Ops/s $\color{#d91a1a}-3.62\%$
test_creation[device0] 0.4307ms 0.2968ms 3.3694 KOps/s 2.9400 KOps/s $\textbf{\color{#35bf28}+14.61\%}$
test_creation_from_tensor 4.4269ms 0.3333ms 3.0001 KOps/s 2.9992 KOps/s $\color{#35bf28}+0.03\%$
test_add_one[memmap_tensor0] 79.8390μs 25.5037μs 39.2100 KOps/s 38.6742 KOps/s $\color{#35bf28}+1.39\%$
test_contiguous[memmap_tensor0] 27.8720μs 5.9762μs 167.3306 KOps/s 172.3632 KOps/s $\color{#d91a1a}-2.92\%$
test_stack[memmap_tensor0] 86.5310μs 19.5312μs 51.2001 KOps/s 50.0867 KOps/s $\color{#35bf28}+2.22\%$
test_memmaptd_index 0.4740ms 0.1980ms 5.0498 KOps/s 4.9071 KOps/s $\color{#35bf28}+2.91\%$
test_memmaptd_index_astensor 0.5269ms 0.2564ms 3.9002 KOps/s 3.7649 KOps/s $\color{#35bf28}+3.59\%$
test_memmaptd_index_op 0.6012ms 0.5064ms 1.9746 KOps/s 1.8775 KOps/s $\textbf{\color{#35bf28}+5.17\%}$
test_reshape_pytree 59.5510μs 23.7644μs 42.0798 KOps/s 40.8690 KOps/s $\color{#35bf28}+2.96\%$
test_reshape_td 78.0960μs 32.8546μs 30.4371 KOps/s 30.2399 KOps/s $\color{#35bf28}+0.65\%$
test_view_pytree 71.6440μs 23.4450μs 42.6531 KOps/s 40.7374 KOps/s $\color{#35bf28}+4.70\%$
test_view_td 18.8850μs 4.9731μs 201.0806 KOps/s 202.9827 KOps/s $\color{#d91a1a}-0.94\%$
test_unbind_pytree 55.8340μs 26.4294μs 37.8367 KOps/s 36.3180 KOps/s $\color{#35bf28}+4.18\%$
test_unbind_td 0.1229ms 60.3295μs 16.5756 KOps/s 16.3909 KOps/s $\color{#35bf28}+1.13\%$
test_split_pytree 64.6010μs 26.3619μs 37.9335 KOps/s 36.2440 KOps/s $\color{#35bf28}+4.66\%$
test_split_td 0.1395ms 47.6734μs 20.9760 KOps/s 20.8094 KOps/s $\color{#35bf28}+0.80\%$
test_add_pytree 75.5910μs 32.1217μs 31.1316 KOps/s 29.5056 KOps/s $\textbf{\color{#35bf28}+5.51\%}$
test_add_td 0.1098ms 45.6047μs 21.9275 KOps/s 20.0098 KOps/s $\textbf{\color{#35bf28}+9.58\%}$
test_distributed 20.4680μs 6.0705μs 164.7318 KOps/s 162.0530 KOps/s $\color{#35bf28}+1.65\%$
test_tdmodule 0.1617ms 20.7210μs 48.2601 KOps/s 43.1558 KOps/s $\textbf{\color{#35bf28}+11.83\%}$
test_tdmodule_dispatch 0.1719ms 38.9960μs 25.6437 KOps/s 24.7318 KOps/s $\color{#35bf28}+3.69\%$
test_tdseq 50.1240μs 23.5306μs 42.4979 KOps/s 40.3764 KOps/s $\textbf{\color{#35bf28}+5.25\%}$
test_tdseq_dispatch 0.4339ms 41.9900μs 23.8152 KOps/s 22.1751 KOps/s $\textbf{\color{#35bf28}+7.40\%}$
test_instantiation_functorch 2.0506ms 1.3045ms 766.5915 Ops/s 738.7450 Ops/s $\color{#35bf28}+3.77\%$
test_instantiation_td 1.5820ms 1.0224ms 978.0639 Ops/s 949.6779 Ops/s $\color{#35bf28}+2.99\%$
test_exec_functorch 0.2504ms 0.1576ms 6.3470 KOps/s 6.1270 KOps/s $\color{#35bf28}+3.59\%$
test_exec_functional_call 0.3568ms 0.1482ms 6.7477 KOps/s 6.6817 KOps/s $\color{#35bf28}+0.99\%$
test_exec_td 0.2124ms 0.1439ms 6.9495 KOps/s 6.6377 KOps/s $\color{#35bf28}+4.70\%$
test_exec_td_decorator 0.9921ms 0.1762ms 5.6756 KOps/s 4.9413 KOps/s $\textbf{\color{#35bf28}+14.86\%}$
test_vmap_mlp_speed[True-True] 1.4355ms 0.9068ms 1.1028 KOps/s 1.0779 KOps/s $\color{#35bf28}+2.31\%$
test_vmap_mlp_speed[True-False] 0.6442ms 0.4677ms 2.1380 KOps/s 2.0481 KOps/s $\color{#35bf28}+4.39\%$
test_vmap_mlp_speed[False-True] 1.3129ms 0.7935ms 1.2602 KOps/s 1.2422 KOps/s $\color{#35bf28}+1.45\%$
test_vmap_mlp_speed[False-False] 0.5561ms 0.3853ms 2.5956 KOps/s 2.5112 KOps/s $\color{#35bf28}+3.36\%$
test_vmap_mlp_speed_decorator[True-True] 4.3440ms 1.9668ms 508.4402 Ops/s 545.3326 Ops/s $\textbf{\color{#d91a1a}-6.77\%}$
test_vmap_mlp_speed_decorator[True-False] 1.0076ms 0.5122ms 1.9525 KOps/s 1.8666 KOps/s $\color{#35bf28}+4.60\%$
test_vmap_mlp_speed_decorator[False-True] 1.9865ms 1.4775ms 676.8255 Ops/s 657.2215 Ops/s $\color{#35bf28}+2.98\%$
test_vmap_mlp_speed_decorator[False-False] 1.4797ms 0.4069ms 2.4576 KOps/s 2.4481 KOps/s $\color{#35bf28}+0.39\%$
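
The Change column above appears to be the relative difference between the PR's throughput (Ops) and the throughput on repo HEAD. The sketch below reproduces a few rows under that assumption. The function names and the 5% cut-off used to label a row as improved or worsened are hypothetical: the cut-off is only inferred from which rows the bot renders in bold, not taken from the benchmark tooling.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percentage change of PR throughput relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    # ASSUMPTION: the 5% threshold is inferred from the bolded rows in the
    # table; the bot's actual rule is not stated in this PR.
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) copied from the CPU table; both values in a pair
# share the same unit (KOps/s), so the ratio is unit-free.
rows = {
    "test_plain_set_nested": (63.6349, 58.4044),
    "test_unlock_nested": (1.9334, 2.2127),
    "test_items_nested": (3.7105, 3.7163),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Several of the largest regressions (for example `test_unlock_nested`) also show a Max far above the Mean, so a single slow run may account for part of the reported change.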


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}6$. Worsened: $\large\color{#d91a1a}4$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.4324ms 12.7048μs 78.7102 KOps/s 78.8013 KOps/s $\color{#d91a1a}-0.12\%$
test_plain_set_stack_nested 0.2956ms 0.1148ms 8.7077 KOps/s 8.6159 KOps/s $\color{#35bf28}+1.07\%$
test_plain_set_nested_inplace 37.8220μs 15.1438μs 66.0335 KOps/s 65.6276 KOps/s $\color{#35bf28}+0.62\%$
test_plain_set_stack_nested_inplace 0.3287ms 0.1408ms 7.1003 KOps/s 7.0893 KOps/s $\color{#35bf28}+0.15\%$
test_items 0.1871ms 4.6392μs 215.5540 KOps/s 214.4428 KOps/s $\color{#35bf28}+0.52\%$
test_items_nested 0.5173ms 0.3357ms 2.9791 KOps/s 2.9473 KOps/s $\color{#35bf28}+1.08\%$
test_items_nested_locked 0.5295ms 0.3390ms 2.9498 KOps/s 2.9171 KOps/s $\color{#35bf28}+1.12\%$
test_items_nested_leaf 0.2260ms 0.1980ms 5.0512 KOps/s 4.9794 KOps/s $\color{#35bf28}+1.44\%$
test_items_stack_nested 1.6867ms 1.4740ms 678.4414 Ops/s 669.8370 Ops/s $\color{#35bf28}+1.28\%$
test_items_stack_nested_leaf 1.5601ms 1.3091ms 763.8784 Ops/s 752.9782 Ops/s $\color{#35bf28}+1.45\%$
test_items_stack_nested_locked 0.9782ms 0.8110ms 1.2330 KOps/s 1.1987 KOps/s $\color{#35bf28}+2.87\%$
test_keys 0.2398ms 4.5531μs 219.6301 KOps/s 217.8906 KOps/s $\color{#35bf28}+0.80\%$
test_keys_nested 3.2905ms 90.3565μs 11.0673 KOps/s 11.0174 KOps/s $\color{#35bf28}+0.45\%$
test_keys_nested_locked 0.1144ms 90.0552μs 11.1043 KOps/s 11.0592 KOps/s $\color{#35bf28}+0.41\%$
test_keys_nested_leaf 41.2258ms 86.8247μs 11.5175 KOps/s 12.2659 KOps/s $\textbf{\color{#d91a1a}-6.10\%}$
test_keys_stack_nested 1.4788ms 1.2972ms 770.9083 Ops/s 758.1007 Ops/s $\color{#35bf28}+1.69\%$
test_keys_stack_nested_leaf 1.4805ms 1.2985ms 770.1440 Ops/s 764.0835 Ops/s $\color{#35bf28}+0.79\%$
test_keys_stack_nested_locked 0.7862ms 0.6282ms 1.5918 KOps/s 1.5507 KOps/s $\color{#35bf28}+2.65\%$
test_values 60.5503μs 1.8908μs 528.8835 KOps/s 526.8461 KOps/s $\color{#35bf28}+0.39\%$
test_values_nested 70.5810μs 42.7336μs 23.4008 KOps/s 23.3112 KOps/s $\color{#35bf28}+0.38\%$
test_values_nested_locked 0.2167ms 44.9050μs 22.2692 KOps/s 22.0385 KOps/s $\color{#35bf28}+1.05\%$
test_values_nested_leaf 0.2173ms 36.9636μs 27.0536 KOps/s 26.7961 KOps/s $\color{#35bf28}+0.96\%$
test_values_stack_nested 1.3269ms 1.1316ms 883.7411 Ops/s 876.3233 Ops/s $\color{#35bf28}+0.85\%$
test_values_stack_nested_leaf 1.3229ms 1.1076ms 902.8678 Ops/s 886.5659 Ops/s $\color{#35bf28}+1.84\%$
test_values_stack_nested_locked 0.7310ms 0.4971ms 2.0116 KOps/s 1.9572 KOps/s $\color{#35bf28}+2.78\%$
test_membership 36.7644μs 0.9432μs 1.0602 MOps/s 937.4679 KOps/s $\textbf{\color{#35bf28}+13.09\%}$
test_membership_nested 18.9500μs 2.1908μs 456.4499 KOps/s 456.5388 KOps/s $\color{#d91a1a}-0.02\%$
test_membership_nested_leaf 96.6210μs 2.0973μs 476.8023 KOps/s 473.1796 KOps/s $\color{#35bf28}+0.77\%$
test_membership_stacked_nested 0.2039ms 10.9728μs 91.1345 KOps/s 91.4223 KOps/s $\color{#d91a1a}-0.31\%$
test_membership_stacked_nested_leaf 49.0400μs 10.9167μs 91.6027 KOps/s 91.7035 KOps/s $\color{#d91a1a}-0.11\%$
test_membership_nested_last 0.2012ms 4.5734μs 218.6550 KOps/s 217.7921 KOps/s $\color{#35bf28}+0.40\%$
test_membership_nested_leaf_last 0.1689ms 4.5980μs 217.4882 KOps/s 217.7050 KOps/s $\color{#d91a1a}-0.10\%$
test_membership_stacked_nested_last 0.3328ms 0.1336ms 7.4862 KOps/s 7.4050 KOps/s $\color{#35bf28}+1.10\%$
test_membership_stacked_nested_leaf_last 43.6210μs 12.7825μs 78.2321 KOps/s 78.2056 KOps/s $\color{#35bf28}+0.03\%$
test_nested_getleaf 0.1946ms 8.4201μs 118.7637 KOps/s 119.1052 KOps/s $\color{#d91a1a}-0.29\%$
test_nested_get 0.2243ms 7.9975μs 125.0389 KOps/s 124.9539 KOps/s $\color{#35bf28}+0.07\%$
test_stacked_getleaf 0.8600ms 0.5664ms 1.7655 KOps/s 1.7442 KOps/s $\color{#35bf28}+1.22\%$
test_stacked_get 0.7265ms 0.5311ms 1.8828 KOps/s 1.8785 KOps/s $\color{#35bf28}+0.23\%$
test_nested_getitemleaf 34.1000μs 8.4085μs 118.9269 KOps/s 118.3651 KOps/s $\color{#35bf28}+0.47\%$
test_nested_getitem 0.1756ms 7.9372μs 125.9892 KOps/s 125.3536 KOps/s $\color{#35bf28}+0.51\%$
test_stacked_getitemleaf 0.7669ms 0.5669ms 1.7639 KOps/s 1.7489 KOps/s $\color{#35bf28}+0.86\%$
test_stacked_getitem 0.7172ms 0.5294ms 1.8890 KOps/s 1.8406 KOps/s $\color{#35bf28}+2.63\%$
test_lock_nested 3.2298ms 0.5578ms 1.7928 KOps/s 1.7674 KOps/s $\color{#35bf28}+1.43\%$
test_lock_stack_nested 81.9395ms 7.2192ms 138.5199 Ops/s 137.4667 Ops/s $\color{#35bf28}+0.77\%$
test_unlock_nested 2.4443ms 0.4353ms 2.2972 KOps/s 2.3309 KOps/s $\color{#d91a1a}-1.45\%$
test_unlock_stack_nested 66.8960ms 6.2636ms 159.6519 Ops/s 158.9672 Ops/s $\color{#35bf28}+0.43\%$
test_flatten_speed 0.3823ms 0.1877ms 5.3271 KOps/s 5.3588 KOps/s $\color{#d91a1a}-0.59\%$
test_unflatten_speed 0.5448ms 0.3663ms 2.7299 KOps/s 2.7555 KOps/s $\color{#d91a1a}-0.93\%$
test_common_ops 1.1481ms 0.6161ms 1.6232 KOps/s 1.6593 KOps/s $\color{#d91a1a}-2.18\%$
test_creation 0.2025ms 2.0641μs 484.4629 KOps/s 470.5455 KOps/s $\color{#35bf28}+2.96\%$
test_creation_empty 41.3910μs 7.1169μs 140.5112 KOps/s 141.8243 KOps/s $\color{#d91a1a}-0.93\%$
test_creation_nested_1 28.6500μs 9.5198μs 105.0448 KOps/s 106.8206 KOps/s $\color{#d91a1a}-1.66\%$
test_creation_nested_2 0.1749ms 12.0935μs 82.6893 KOps/s 83.2139 KOps/s $\color{#d91a1a}-0.63\%$
test_clone 0.1167ms 14.6695μs 68.1687 KOps/s 70.2588 KOps/s $\color{#d91a1a}-2.97\%$
test_getitem[int] 31.9700μs 12.3745μs 80.8116 KOps/s 81.0366 KOps/s $\color{#d91a1a}-0.28\%$
test_getitem[slice_int] 0.2193ms 24.2419μs 41.2510 KOps/s 40.8534 KOps/s $\color{#35bf28}+0.97\%$
test_getitem[range] 83.9710μs 42.2844μs 23.6494 KOps/s 23.9528 KOps/s $\color{#d91a1a}-1.27\%$
test_getitem[tuple] 59.1410μs 22.3563μs 44.7301 KOps/s 49.0861 KOps/s $\textbf{\color{#d91a1a}-8.87\%}$
test_getitem[list] 69.1810μs 35.9232μs 27.8372 KOps/s 26.7278 KOps/s $\color{#35bf28}+4.15\%$
test_setitem_dim[int] 41.8410μs 25.9023μs 38.6066 KOps/s 37.7186 KOps/s $\color{#35bf28}+2.35\%$
test_setitem_dim[slice_int] 72.5110μs 46.3888μs 21.5569 KOps/s 21.0737 KOps/s $\color{#35bf28}+2.29\%$
test_setitem_dim[range] 0.2641ms 63.6025μs 15.7226 KOps/s 15.5568 KOps/s $\color{#35bf28}+1.07\%$
test_setitem_dim[tuple] 66.9610μs 39.5039μs 25.3139 KOps/s 24.6110 KOps/s $\color{#35bf28}+2.86\%$
test_setitem 97.2310μs 18.4774μs 54.1201 KOps/s 53.5202 KOps/s $\color{#35bf28}+1.12\%$
test_set 0.2084ms 18.0783μs 55.3150 KOps/s 55.8734 KOps/s $\color{#d91a1a}-1.00\%$
test_set_shared 2.6649ms 0.1056ms 9.4692 KOps/s 8.4441 KOps/s $\textbf{\color{#35bf28}+12.14\%}$
test_update 0.1012ms 19.4826μs 51.3279 KOps/s 51.2423 KOps/s $\color{#35bf28}+0.17\%$
test_update_nested 93.6820μs 25.8805μs 38.6392 KOps/s 37.8714 KOps/s $\color{#35bf28}+2.03\%$
test_set_nested 90.4510μs 19.5247μs 51.2172 KOps/s 51.2014 KOps/s $\color{#35bf28}+0.03\%$
test_set_nested_new 94.2710μs 23.4235μs 42.6922 KOps/s 41.3084 KOps/s $\color{#35bf28}+3.35\%$
test_select 0.1067ms 45.8638μs 21.8037 KOps/s 20.6956 KOps/s $\textbf{\color{#35bf28}+5.35\%}$
test_to 75.0610μs 54.6005μs 18.3149 KOps/s 18.3784 KOps/s $\color{#d91a1a}-0.35\%$
test_to_nonblocking 65.4310μs 34.6090μs 28.8942 KOps/s 28.0389 KOps/s $\color{#35bf28}+3.05\%$
test_unbind_speed 0.3960ms 0.3632ms 2.7531 KOps/s 2.7718 KOps/s $\color{#d91a1a}-0.68\%$
test_unbind_speed_stack0 62.8789ms 4.3839ms 228.1079 Ops/s 245.8537 Ops/s $\textbf{\color{#d91a1a}-7.22\%}$
test_unbind_speed_stack1 1.3126μs 0.5265μs 1.8992 MOps/s 1.8955 MOps/s $\color{#35bf28}+0.20\%$
test_split 53.6680ms 1.8489ms 540.8683 Ops/s 543.5124 Ops/s $\color{#d91a1a}-0.49\%$
test_chunk 53.3020ms 1.8418ms 542.9454 Ops/s 548.1018 Ops/s $\color{#d91a1a}-0.94\%$
test_creation[device0] 0.5618ms 0.3124ms 3.2007 KOps/s 3.2494 KOps/s $\color{#d91a1a}-1.50\%$
test_creation[device1] 0.9407ms 0.3187ms 3.1380 KOps/s 3.2172 KOps/s $\color{#d91a1a}-2.46\%$
test_creation_from_tensor 57.9628ms 0.3626ms 2.7578 KOps/s 2.9650 KOps/s $\textbf{\color{#d91a1a}-6.99\%}$
test_add_one[memmap_tensor0] 70.6510μs 23.0537μs 43.3770 KOps/s 40.6979 KOps/s $\textbf{\color{#35bf28}+6.58\%}$
test_add_one[memmap_tensor1] 0.2057ms 72.5518μs 13.7833 KOps/s 13.5834 KOps/s $\color{#35bf28}+1.47\%$
test_contiguous[memmap_tensor0] 26.0910μs 5.7302μs 174.5125 KOps/s 178.3858 KOps/s $\color{#d91a1a}-2.17\%$
test_contiguous[memmap_tensor1] 44.2410μs 21.1118μs 47.3668 KOps/s 45.9570 KOps/s $\color{#35bf28}+3.07\%$
test_stack[memmap_tensor0] 39.9610μs 18.5539μs 53.8970 KOps/s 53.0594 KOps/s $\color{#35bf28}+1.58\%$
test_stack[memmap_tensor1] 0.1532ms 72.4787μs 13.7972 KOps/s 13.5325 KOps/s $\color{#35bf28}+1.96\%$
test_memmaptd_index 0.2971ms 0.2383ms 4.1968 KOps/s 4.1099 KOps/s $\color{#35bf28}+2.11\%$
test_memmaptd_index_astensor 0.3738ms 0.2946ms 3.3950 KOps/s 3.3380 KOps/s $\color{#35bf28}+1.71\%$
test_memmaptd_index_op 0.6300ms 0.5551ms 1.8014 KOps/s 1.7488 KOps/s $\color{#35bf28}+3.00\%$
test_reshape_pytree 37.4300μs 21.0231μs 47.5668 KOps/s 47.4455 KOps/s $\color{#35bf28}+0.26\%$
test_reshape_td 64.1300μs 30.3800μs 32.9164 KOps/s 32.7014 KOps/s $\color{#35bf28}+0.66\%$
test_view_pytree 40.3100μs 20.7903μs 48.0994 KOps/s 48.3743 KOps/s $\color{#d91a1a}-0.57\%$
test_view_td 15.6910μs 4.0240μs 248.5105 KOps/s 248.6434 KOps/s $\color{#d91a1a}-0.05\%$
test_unbind_pytree 0.5975ms 25.9899μs 38.4765 KOps/s 37.8777 KOps/s $\color{#35bf28}+1.58\%$
test_unbind_td 90.2910μs 56.7266μs 17.6284 KOps/s 17.4529 KOps/s $\color{#35bf28}+1.01\%$
test_split_pytree 39.8600μs 23.9518μs 41.7506 KOps/s 41.4543 KOps/s $\color{#35bf28}+0.71\%$
test_split_td 73.2510μs 43.9277μs 22.7647 KOps/s 22.0217 KOps/s $\color{#35bf28}+3.37\%$
test_add_pytree 57.1700μs 31.9360μs 31.3127 KOps/s 31.0576 KOps/s $\color{#35bf28}+0.82\%$
test_add_td 76.8710μs 44.5553μs 22.4440 KOps/s 21.1448 KOps/s $\textbf{\color{#35bf28}+6.14\%}$
test_distributed 19.4000μs 5.5425μs 180.4237 KOps/s 179.1373 KOps/s $\color{#35bf28}+0.72\%$
test_tdmodule 31.9610μs 16.8308μs 59.4149 KOps/s 58.1935 KOps/s $\color{#35bf28}+2.10\%$
test_tdmodule_dispatch 0.2202ms 33.5048μs 29.8465 KOps/s 29.3838 KOps/s $\color{#35bf28}+1.57\%$
test_tdseq 35.6810μs 19.7803μs 50.5553 KOps/s 49.4485 KOps/s $\color{#35bf28}+2.24\%$
test_tdseq_dispatch 52.2010μs 36.2912μs 27.5549 KOps/s 27.7366 KOps/s $\color{#d91a1a}-0.66\%$
test_instantiation_functorch 1.7709ms 1.6899ms 591.7516 Ops/s 596.2229 Ops/s $\color{#d91a1a}-0.75\%$
test_instantiation_td 1.6710ms 1.1840ms 844.5845 Ops/s 844.3364 Ops/s $\color{#35bf28}+0.03\%$
test_exec_functorch 0.2180ms 0.1588ms 6.2985 KOps/s 6.3201 KOps/s $\color{#d91a1a}-0.34\%$
test_exec_functional_call 0.2215ms 0.1580ms 6.3275 KOps/s 6.4764 KOps/s $\color{#d91a1a}-2.30\%$
test_exec_td 0.1852ms 0.1487ms 6.7249 KOps/s 6.7325 KOps/s $\color{#d91a1a}-0.11\%$
test_exec_td_decorator 0.7071ms 0.1844ms 5.4233 KOps/s 5.3756 KOps/s $\color{#35bf28}+0.89\%$
test_vmap_mlp_speed[True-True] 1.1765ms 1.0822ms 924.0014 Ops/s 923.8306 Ops/s $\color{#35bf28}+0.02\%$
test_vmap_mlp_speed[True-False] 0.7240ms 0.6252ms 1.5996 KOps/s 1.6000 KOps/s $\color{#d91a1a}-0.02\%$
test_vmap_mlp_speed[False-True] 1.0853ms 0.9985ms 1.0015 KOps/s 1.0036 KOps/s $\color{#d91a1a}-0.21\%$
test_vmap_mlp_speed[False-False] 0.6322ms 0.5485ms 1.8231 KOps/s 1.8166 KOps/s $\color{#35bf28}+0.36\%$
test_vmap_mlp_speed_decorator[True-True] 2.8878ms 2.0417ms 489.7875 Ops/s 486.5477 Ops/s $\color{#35bf28}+0.67\%$
test_vmap_mlp_speed_decorator[True-False] 1.0596ms 0.6629ms 1.5086 KOps/s 1.5037 KOps/s $\color{#35bf28}+0.32\%$
test_vmap_mlp_speed_decorator[False-True] 2.2081ms 1.7678ms 565.6695 Ops/s 560.0388 Ops/s $\color{#35bf28}+1.01\%$
test_vmap_mlp_speed_decorator[False-False] 0.9498ms 0.5643ms 1.7722 KOps/s 1.7775 KOps/s $\color{#d91a1a}-0.30\%$
test_vmap_transformer_speed[True-True] 12.7418ms 12.6500ms 79.0514 Ops/s 79.1770 Ops/s $\color{#d91a1a}-0.16\%$
test_vmap_transformer_speed[True-False] 8.3924ms 8.3126ms 120.2987 Ops/s 120.6385 Ops/s $\color{#d91a1a}-0.28\%$
test_vmap_transformer_speed[False-True] 12.6558ms 12.5405ms 79.7413 Ops/s 79.5146 Ops/s $\color{#35bf28}+0.29\%$
test_vmap_transformer_speed[False-False] 8.5368ms 8.2259ms 121.5666 Ops/s 121.0841 Ops/s $\color{#35bf28}+0.40\%$
test_vmap_transformer_speed_decorator[True-True] 66.2524ms 64.8493ms 15.4204 Ops/s 15.2701 Ops/s $\color{#35bf28}+0.98\%$
test_vmap_transformer_speed_decorator[True-False] 22.2305ms 20.0401ms 49.8999 Ops/s 49.8328 Ops/s $\color{#35bf28}+0.13\%$
test_vmap_transformer_speed_decorator[False-True] 60.1096ms 58.7430ms 17.0233 Ops/s 15.5847 Ops/s $\textbf{\color{#35bf28}+9.23\%}$
test_vmap_transformer_speed_decorator[False-False] 21.6214ms 19.6341ms 50.9317 Ops/s 50.9411 Ops/s $\color{#d91a1a}-0.02\%$
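
The Ops column mixes units (Ops/s, KOps/s, MOps/s), which makes rows hard to compare at a glance. The figures in the table are consistent with Ops being the reciprocal of the Mean time, so a common unit can be recovered from the Mean column. The helper below is a minimal sketch under that assumption; `ops_per_second` is a hypothetical name and not part of the benchmark tooling.

```python
# Seconds per unit for the time suffixes that appear in the Mean column.
UNIT_SECONDS = {"μs": 1e-6, "ms": 1e-3, "s": 1.0}


def ops_per_second(mean: str) -> float:
    """Convert a Mean entry such as '12.6500ms' into operations per second.

    ASSUMPTION: Ops = 1 / Mean, which matches the table values to within
    rounding.
    """
    # Check the longer suffixes first, since "ms" and "μs" both end in "s".
    for unit in ("μs", "ms", "s"):
        if mean.endswith(unit):
            value = float(mean[: -len(unit)])
            return 1.0 / (value * UNIT_SECONDS[unit])
    raise ValueError(f"unrecognised time unit in {mean!r}")


# Mean values copied from the GPU table.
for name, mean in [
    ("test_membership", "0.9432μs"),
    ("test_plain_set_nested", "12.7048μs"),
    ("test_vmap_transformer_speed[True-True]", "12.6500ms"),
]:
    print(f"{name}: {ops_per_second(mean):,.1f} Ops/s")
```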

Labels: bug, CLA Signed, Performance
2 participants