[Feature] non_blocking=None by default #748

Merged (2 commits) on Apr 24, 2024

Conversation

@vmoens (Contributor) commented Apr 24, 2024

No description provided.

@facebook-github-bot added the CLA Signed label Apr 24, 2024

github-actions bot commented Apr 24, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}26$. Worsened: $\large\color{#d91a1a}4$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 37.8410μs | 15.4090μs | 64.8971 KOps/s | 61.2630 KOps/s | $\textbf{\color{#35bf28}+5.93\%}$ |
| test_plain_set_stack_nested | 40.6360μs | 15.6072μs | 64.0729 KOps/s | 60.1935 KOps/s | $\textbf{\color{#35bf28}+6.44\%}$ |
| test_plain_set_nested_inplace | 44.0920μs | 17.8866μs | 55.9077 KOps/s | 52.8780 KOps/s | $\textbf{\color{#35bf28}+5.73\%}$ |
| test_plain_set_stack_nested_inplace | 43.9220μs | 17.8053μs | 56.1631 KOps/s | 53.0389 KOps/s | $\textbf{\color{#35bf28}+5.89\%}$ |
| test_items | 37.7700μs | 2.5325μs | 394.8708 KOps/s | 376.4448 KOps/s | $\color{#35bf28}+4.89\%$ |
| test_items_nested | 0.4391ms | 0.2685ms | 3.7247 KOps/s | 3.7407 KOps/s | $\color{#d91a1a}-0.43\%$ |
| test_items_nested_locked | 1.0531ms | 0.2706ms | 3.6952 KOps/s | 3.6442 KOps/s | $\color{#35bf28}+1.40\%$ |
| test_items_nested_leaf | 0.2007ms | 78.7697μs | 12.6952 KOps/s | 12.6362 KOps/s | $\color{#35bf28}+0.47\%$ |
| test_items_stack_nested | 1.1444ms | 0.2681ms | 3.7305 KOps/s | 3.6369 KOps/s | $\color{#35bf28}+2.58\%$ |
| test_items_stack_nested_leaf | 0.1576ms | 78.5772μs | 12.7263 KOps/s | 12.3075 KOps/s | $\color{#35bf28}+3.40\%$ |
| test_items_stack_nested_locked | 1.0467ms | 0.2680ms | 3.7318 KOps/s | 3.7520 KOps/s | $\color{#d91a1a}-0.54\%$ |
| test_keys | 35.7260μs | 3.8175μs | 261.9546 KOps/s | 255.9574 KOps/s | $\color{#35bf28}+2.34\%$ |
| test_keys_nested | 0.2469ms | 0.1400ms | 7.1427 KOps/s | 7.0423 KOps/s | $\color{#35bf28}+1.43\%$ |
| test_keys_nested_locked | 2.1806ms | 0.1442ms | 6.9350 KOps/s | 6.7958 KOps/s | $\color{#35bf28}+2.05\%$ |
| test_keys_nested_leaf | 0.1668ms | 0.1173ms | 8.5230 KOps/s | 8.2618 KOps/s | $\color{#35bf28}+3.16\%$ |
| test_keys_stack_nested | 0.2430ms | 0.1385ms | 7.2223 KOps/s | 6.9931 KOps/s | $\color{#35bf28}+3.28\%$ |
| test_keys_stack_nested_leaf | 0.2364ms | 0.1171ms | 8.5421 KOps/s | 8.2142 KOps/s | $\color{#35bf28}+3.99\%$ |
| test_keys_stack_nested_locked | 0.2664ms | 0.1425ms | 7.0166 KOps/s | 6.8074 KOps/s | $\color{#35bf28}+3.07\%$ |
| test_values | 14.6870μs | 1.3060μs | 765.6723 KOps/s | 862.1818 KOps/s | $\textbf{\color{#d91a1a}-11.19\%}$ |
| test_values_nested | 92.7330μs | 51.2501μs | 19.5122 KOps/s | 19.2172 KOps/s | $\color{#35bf28}+1.54\%$ |
| test_values_nested_locked | 0.1044ms | 51.4454μs | 19.4381 KOps/s | 19.0372 KOps/s | $\color{#35bf28}+2.11\%$ |
| test_values_nested_leaf | 93.7750μs | 45.8289μs | 21.8203 KOps/s | 21.3434 KOps/s | $\color{#35bf28}+2.23\%$ |
| test_values_stack_nested | 0.1060ms | 51.3223μs | 19.4847 KOps/s | 18.8340 KOps/s | $\color{#35bf28}+3.46\%$ |
| test_values_stack_nested_leaf | 0.1007ms | 45.9214μs | 21.7763 KOps/s | 21.3003 KOps/s | $\color{#35bf28}+2.23\%$ |
| test_values_stack_nested_locked | 0.1031ms | 51.2181μs | 19.5244 KOps/s | 18.9266 KOps/s | $\color{#35bf28}+3.16\%$ |
| test_membership | 29.6160μs | 1.3930μs | 717.8687 KOps/s | 716.5548 KOps/s | $\color{#35bf28}+0.18\%$ |
| test_membership_nested | 39.4330μs | 3.4335μs | 291.2506 KOps/s | 283.0345 KOps/s | $\color{#35bf28}+2.90\%$ |
| test_membership_nested_leaf | 36.1980μs | 3.3934μs | 294.6862 KOps/s | 262.6755 KOps/s | $\textbf{\color{#35bf28}+12.19\%}$ |
| test_membership_stacked_nested | 30.2860μs | 3.4068μs | 293.5342 KOps/s | 285.2924 KOps/s | $\color{#35bf28}+2.89\%$ |
| test_membership_stacked_nested_leaf | 19.1750μs | 3.4571μs | 289.2613 KOps/s | 283.4796 KOps/s | $\color{#35bf28}+2.04\%$ |
| test_membership_nested_last | 38.6420μs | 4.2384μs | 235.9367 KOps/s | 228.3225 KOps/s | $\color{#35bf28}+3.33\%$ |
| test_membership_nested_leaf_last | 22.3320μs | 4.2244μs | 236.7192 KOps/s | 230.4326 KOps/s | $\color{#35bf28}+2.73\%$ |
| test_membership_stacked_nested_last | 22.6830μs | 4.2524μs | 235.1600 KOps/s | 203.1858 KOps/s | $\textbf{\color{#35bf28}+15.74\%}$ |
| test_membership_stacked_nested_leaf_last | 23.4840μs | 4.2758μs | 233.8728 KOps/s | 201.6231 KOps/s | $\textbf{\color{#35bf28}+16.00\%}$ |
| test_nested_getleaf | 33.6530μs | 10.6956μs | 93.4961 KOps/s | 93.5596 KOps/s | $\color{#d91a1a}-0.07\%$ |
| test_nested_get | 29.2950μs | 10.2273μs | 97.7775 KOps/s | 97.4189 KOps/s | $\color{#35bf28}+0.37\%$ |
| test_stacked_getleaf | 28.1920μs | 10.7453μs | 93.0643 KOps/s | 93.2272 KOps/s | $\color{#d91a1a}-0.17\%$ |
| test_stacked_get | 41.0670μs | 10.1074μs | 98.9371 KOps/s | 99.1042 KOps/s | $\color{#d91a1a}-0.17\%$ |
| test_nested_getitemleaf | 33.4730μs | 11.2966μs | 88.5220 KOps/s | 87.6741 KOps/s | $\color{#35bf28}+0.97\%$ |
| test_nested_getitem | 30.6980μs | 10.4015μs | 96.1397 KOps/s | 95.8785 KOps/s | $\color{#35bf28}+0.27\%$ |
| test_stacked_getitemleaf | 31.3090μs | 11.0682μs | 90.3488 KOps/s | 87.8231 KOps/s | $\color{#35bf28}+2.88\%$ |
| test_stacked_getitem | 30.5870μs | 10.3482μs | 96.6355 KOps/s | 93.9835 KOps/s | $\color{#35bf28}+2.82\%$ |
| test_lock_nested | 45.3791ms | 0.3905ms | 2.5609 KOps/s | 2.5362 KOps/s | $\color{#35bf28}+0.97\%$ |
| test_lock_stack_nested | 0.3908ms | 0.3153ms | 3.1720 KOps/s | 3.1619 KOps/s | $\color{#35bf28}+0.32\%$ |
| test_unlock_nested | 76.8857ms | 0.4258ms | 2.3488 KOps/s | 2.3304 KOps/s | $\color{#35bf28}+0.79\%$ |
| test_unlock_stack_nested | 0.3981ms | 0.3261ms | 3.0669 KOps/s | 3.0517 KOps/s | $\color{#35bf28}+0.50\%$ |
| test_flatten_speed | 0.3945ms | 94.2304μs | 10.6123 KOps/s | 10.3921 KOps/s | $\color{#35bf28}+2.12\%$ |
| test_unflatten_speed | 0.5925ms | 0.4041ms | 2.4743 KOps/s | 2.4384 KOps/s | $\color{#35bf28}+1.47\%$ |
| test_common_ops | 4.0280ms | 0.6655ms | 1.5027 KOps/s | 1.4279 KOps/s | $\textbf{\color{#35bf28}+5.24\%}$ |
| test_creation | 29.7260μs | 1.9317μs | 517.6840 KOps/s | 511.2779 KOps/s | $\color{#35bf28}+1.25\%$ |
| test_creation_empty | 23.6640μs | 7.9898μs | 125.1603 KOps/s | 105.8729 KOps/s | $\textbf{\color{#35bf28}+18.22\%}$ |
| test_creation_nested_1 | 33.0720μs | 10.7196μs | 93.2869 KOps/s | 79.8102 KOps/s | $\textbf{\color{#35bf28}+16.89\%}$ |
| test_creation_nested_2 | 49.2120μs | 14.1860μs | 70.4922 KOps/s | 64.3961 KOps/s | $\textbf{\color{#35bf28}+9.47\%}$ |
| test_clone | 50.6150μs | 13.4408μs | 74.4001 KOps/s | 72.2235 KOps/s | $\color{#35bf28}+3.01\%$ |
| test_getitem[int] | 31.3280μs | 11.7225μs | 85.3059 KOps/s | 84.9400 KOps/s | $\color{#35bf28}+0.43\%$ |
| test_getitem[slice_int] | 70.4110μs | 23.6576μs | 42.2698 KOps/s | 42.2667 KOps/s | $+0.01\%$ |
| test_getitem[range] | 82.3940μs | 58.9842μs | 16.9537 KOps/s | 16.5868 KOps/s | $\color{#35bf28}+2.21\%$ |
| test_getitem[tuple] | 51.6660μs | 19.4688μs | 51.3643 KOps/s | 50.4192 KOps/s | $\color{#35bf28}+1.87\%$ |
| test_getitem[list] | 0.1076ms | 41.0346μs | 24.3697 KOps/s | 23.4904 KOps/s | $\color{#35bf28}+3.74\%$ |
| test_setitem_dim[int] | 69.8210μs | 33.5083μs | 29.8433 KOps/s | 29.0175 KOps/s | $\color{#35bf28}+2.85\%$ |
| test_setitem_dim[slice_int] | 0.1105ms | 62.4724μs | 16.0071 KOps/s | 16.0349 KOps/s | $\color{#d91a1a}-0.17\%$ |
| test_setitem_dim[range] | 0.1674ms | 82.2001μs | 12.1654 KOps/s | 11.9499 KOps/s | $\color{#35bf28}+1.80\%$ |
| test_setitem_dim[tuple] | 79.8900μs | 49.8637μs | 20.0547 KOps/s | 19.8119 KOps/s | $\color{#35bf28}+1.23\%$ |
| test_setitem | 61.3140μs | 19.0648μs | 52.4527 KOps/s | 47.7180 KOps/s | $\textbf{\color{#35bf28}+9.92\%}$ |
| test_set | 55.0130μs | 18.1128μs | 55.2095 KOps/s | 49.7395 KOps/s | $\textbf{\color{#35bf28}+11.00\%}$ |
| test_set_shared | 5.5998ms | 0.1409ms | 7.0983 KOps/s | 7.1045 KOps/s | $\color{#d91a1a}-0.09\%$ |
| test_update | 82.3140μs | 18.9450μs | 52.7845 KOps/s | 47.2460 KOps/s | $\textbf{\color{#35bf28}+11.72\%}$ |
| test_update_nested | 0.1008ms | 26.7694μs | 37.3560 KOps/s | 34.0781 KOps/s | $\textbf{\color{#35bf28}+9.62\%}$ |
| test_update__nested | 69.0790μs | 24.6568μs | 40.5568 KOps/s | 37.6919 KOps/s | $\textbf{\color{#35bf28}+7.60\%}$ |
| test_set_nested | 53.9310μs | 19.6765μs | 50.8220 KOps/s | 45.6593 KOps/s | $\textbf{\color{#35bf28}+11.31\%}$ |
| test_set_nested_new | 73.3470μs | 23.7511μs | 42.1033 KOps/s | 38.8897 KOps/s | $\textbf{\color{#35bf28}+8.26\%}$ |
| test_select | 0.1114ms | 38.1648μs | 26.2022 KOps/s | 24.8152 KOps/s | $\textbf{\color{#35bf28}+5.59\%}$ |
| test_select_nested | 0.1230ms | 61.7102μs | 16.2048 KOps/s | 16.1064 KOps/s | $\color{#35bf28}+0.61\%$ |
| test_exclude_nested | 0.2395ms | 0.1215ms | 8.2295 KOps/s | 8.2589 KOps/s | $\color{#d91a1a}-0.36\%$ |
| test_empty[True] | 0.5809ms | 0.3937ms | 2.5399 KOps/s | 2.5034 KOps/s | $\color{#35bf28}+1.46\%$ |
| test_empty[False] | 6.2878μs | 1.1050μs | 904.9682 KOps/s | 916.5768 KOps/s | $\color{#d91a1a}-1.27\%$ |
| test_unbind_speed | 0.4599ms | 0.2582ms | 3.8726 KOps/s | 3.8408 KOps/s | $\color{#35bf28}+0.83\%$ |
| test_unbind_speed_stack0 | 0.4305ms | 0.2571ms | 3.8890 KOps/s | 3.8951 KOps/s | $\color{#d91a1a}-0.16\%$ |
| test_unbind_speed_stack1 | 0.1224s | 0.7225ms | 1.3840 KOps/s | 1.3673 KOps/s | $\color{#35bf28}+1.22\%$ |
| test_split | 0.1228s | 1.7202ms | 581.3206 Ops/s | 578.4820 Ops/s | $\color{#35bf28}+0.49\%$ |
| test_chunk | 1.7393ms | 1.5133ms | 660.7864 Ops/s | 653.7917 Ops/s | $\color{#35bf28}+1.07\%$ |
| test_creation[device0] | 5.5169ms | 0.1051ms | 9.5158 KOps/s | 9.2301 KOps/s | $\color{#35bf28}+3.10\%$ |
| test_creation_from_tensor | 0.1565ms | 81.9511μs | 12.2024 KOps/s | 12.0451 KOps/s | $\color{#35bf28}+1.31\%$ |
| test_add_one[memmap_tensor0] | 68.3580μs | 5.5562μs | 179.9787 KOps/s | 174.5796 KOps/s | $\color{#35bf28}+3.09\%$ |
| test_contiguous[memmap_tensor0] | 18.5450μs | 0.6309μs | 1.5851 MOps/s | 1.5500 MOps/s | $\color{#35bf28}+2.26\%$ |
| test_stack[memmap_tensor0] | 25.8180μs | 3.5890μs | 278.6294 KOps/s | 271.0848 KOps/s | $\color{#35bf28}+2.78\%$ |
| test_memmaptd_index | 1.0136ms | 0.2359ms | 4.2399 KOps/s | 4.1028 KOps/s | $\color{#35bf28}+3.34\%$ |
| test_memmaptd_index_astensor | 0.5544ms | 0.3088ms | 3.2382 KOps/s | 3.1437 KOps/s | $\color{#35bf28}+3.01\%$ |
| test_memmaptd_index_op | 1.1288ms | 0.5549ms | 1.8021 KOps/s | 1.6816 KOps/s | $\textbf{\color{#35bf28}+7.16\%}$ |
| test_serialize_model | 0.1074s | 0.1002s | 9.9755 Ops/s | 8.6503 Ops/s | $\textbf{\color{#35bf28}+15.32\%}$ |
| test_serialize_model_pickle | 0.4547s | 0.3784s | 2.6425 Ops/s | 2.6075 Ops/s | $\color{#35bf28}+1.34\%$ |
| test_serialize_weights | 0.1008s | 93.6116ms | 10.6824 Ops/s | 10.1624 Ops/s | $\textbf{\color{#35bf28}+5.12\%}$ |
| test_serialize_weights_returnearly | 0.1280s | 0.1220s | 8.1965 Ops/s | 8.0973 Ops/s | $\color{#35bf28}+1.23\%$ |
| test_serialize_weights_pickle | 0.7551s | 0.4843s | 2.0650 Ops/s | 2.3293 Ops/s | $\textbf{\color{#d91a1a}-11.34\%}$ |
| test_serialize_weights_filesystem | 0.2114s | 0.1008s | 9.9246 Ops/s | 11.1992 Ops/s | $\textbf{\color{#d91a1a}-11.38\%}$ |
| test_serialize_model_filesystem | 98.3555ms | 90.3867ms | 11.0636 Ops/s | 10.6627 Ops/s | $\color{#35bf28}+3.76\%$ |
| test_reshape_pytree | 65.5720μs | 25.5871μs | 39.0822 KOps/s | 37.8031 KOps/s | $\color{#35bf28}+3.38\%$ |
| test_reshape_td | 70.7420μs | 34.1081μs | 29.3186 KOps/s | 29.5249 KOps/s | $\color{#d91a1a}-0.70\%$ |
| test_view_pytree | 60.1720μs | 25.4558μs | 39.2838 KOps/s | 38.2705 KOps/s | $\color{#35bf28}+2.65\%$ |
| test_view_td | 0.1356s | 65.7942μs | 15.1989 KOps/s | 15.6635 KOps/s | $\color{#d91a1a}-2.97\%$ |
| test_unbind_pytree | 69.4200μs | 28.7743μs | 34.7532 KOps/s | 34.1399 KOps/s | $\color{#35bf28}+1.80\%$ |
| test_unbind_td | 0.1068ms | 38.2562μs | 26.1396 KOps/s | 25.5142 KOps/s | $\color{#35bf28}+2.45\%$ |
| test_split_pytree | 60.7640μs | 29.7071μs | 33.6620 KOps/s | 33.8043 KOps/s | $\color{#d91a1a}-0.42\%$ |
| test_split_td | 0.1281ms | 41.2756μs | 24.2274 KOps/s | 23.7125 KOps/s | $\color{#35bf28}+2.17\%$ |
| test_add_pytree | 84.3980μs | 35.3981μs | 28.2501 KOps/s | 28.0484 KOps/s | $\color{#35bf28}+0.72\%$ |
| test_add_td | 0.1468ms | 54.2213μs | 18.4429 KOps/s | 18.0634 KOps/s | $\color{#35bf28}+2.10\%$ |
| test_distributed | 0.1823ms | 0.1020ms | 9.8055 KOps/s | 9.8134 KOps/s | $\color{#d91a1a}-0.08\%$ |
| test_tdmodule | 62.9380μs | 16.4236μs | 60.8879 KOps/s | 58.8209 KOps/s | $\color{#35bf28}+3.51\%$ |
| test_tdmodule_dispatch | 49.9630μs | 32.4584μs | 30.8086 KOps/s | 29.7857 KOps/s | $\color{#35bf28}+3.43\%$ |
| test_tdseq | 34.0240μs | 18.9269μs | 52.8347 KOps/s | 49.8420 KOps/s | $\textbf{\color{#35bf28}+6.00\%}$ |
| test_tdseq_dispatch | 64.0890μs | 36.7276μs | 27.2275 KOps/s | 24.3977 KOps/s | $\textbf{\color{#35bf28}+11.60\%}$ |
| test_instantiation_functorch | 1.8365ms | 1.3365ms | 748.2455 Ops/s | 757.6943 Ops/s | $\color{#d91a1a}-1.25\%$ |
| test_instantiation_td | 0.1625s | 1.1880ms | 841.7491 Ops/s | 975.4899 Ops/s | $\textbf{\color{#d91a1a}-13.71\%}$ |
| test_exec_functorch | 0.3020ms | 0.1576ms | 6.3446 KOps/s | 5.9861 KOps/s | $\textbf{\color{#35bf28}+5.99\%}$ |
| test_exec_functional_call | 0.2942ms | 0.1474ms | 6.7863 KOps/s | 6.4387 KOps/s | $\textbf{\color{#35bf28}+5.40\%}$ |
| test_exec_td | 0.3116ms | 0.1451ms | 6.8940 KOps/s | 6.5746 KOps/s | $\color{#35bf28}+4.86\%$ |
| test_exec_td_decorator | 0.6934ms | 0.2208ms | 4.5298 KOps/s | 4.3849 KOps/s | $\color{#35bf28}+3.31\%$ |
| test_vmap_mlp_speed[True-True] | 0.5822ms | 0.4794ms | 2.0857 KOps/s | 2.0638 KOps/s | $\color{#35bf28}+1.06\%$ |
| test_vmap_mlp_speed[True-False] | 0.7728ms | 0.4808ms | 2.0799 KOps/s | 2.0735 KOps/s | $\color{#35bf28}+0.31\%$ |
| test_vmap_mlp_speed[False-True] | 0.5562ms | 0.3940ms | 2.5379 KOps/s | 2.5299 KOps/s | $\color{#35bf28}+0.32\%$ |
| test_vmap_mlp_speed[False-False] | 0.6041ms | 0.3934ms | 2.5422 KOps/s | 2.5498 KOps/s | $\color{#d91a1a}-0.30\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.1594ms | 0.5502ms | 1.8174 KOps/s | 1.8064 KOps/s | $\color{#35bf28}+0.61\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.8440ms | 0.5535ms | 1.8065 KOps/s | 1.8061 KOps/s | $\color{#35bf28}+0.02\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.6188ms | 0.4573ms | 2.1866 KOps/s | 2.1843 KOps/s | $\color{#35bf28}+0.11\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.7191ms | 0.4589ms | 2.1792 KOps/s | 2.1969 KOps/s | $\color{#d91a1a}-0.80\%$ |
| test_to_module_speed[True] | 1.9902ms | 1.6953ms | 589.8765 Ops/s | 587.4991 Ops/s | $\color{#35bf28}+0.40\%$ |
| test_to_module_speed[False] | 1.8861ms | 1.6760ms | 596.6533 Ops/s | 595.5225 Ops/s | $\color{#35bf28}+0.19\%$ |
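
For readers checking the numbers, the Change column appears to be the relative difference between Ops (this PR) and Ops on Repo HEAD. The sketch below recomputes it for three rows taken from the table. The helper names `relative_change` and `classify` are hypothetical, and the 5% cutoff is an assumption inferred from which rows are bolded and from the Improved/Worsened totals; it is not taken from the benchmark workflow itself.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the table."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "within noise"


# (Ops, Ops on Repo HEAD) in KOps/s, copied from the table above.
rows = {
    "test_plain_set_nested": (64.8971, 61.2630),
    "test_values": (765.6723, 862.1818),
    "test_items": (394.8708, 376.4448),
}

for name, (ops_pr, ops_head) in rows.items():
    pct = relative_change(ops_pr, ops_head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Under these assumptions the three rows come out as +5.93% (improved), -11.19% (worsened), and +4.89% (within noise), matching the reported Change values.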

@vmoens added the enhancement label Apr 24, 2024
@vmoens merged commit 607698f into main Apr 24, 2024
44 of 48 checks passed
@vmoens deleted the non-blocking-none branch April 24, 2024 17:03