[BugFix] Fix map test with fork on cuda #765

Merged: 1 commit, merged on Apr 29, 2024

vmoens (Contributor) commented Apr 29, 2024

No description provided.

facebook-github-bot added the CLA Signed label Apr 29, 2024
vmoens added the bug label Apr 29, 2024

⚠ Warning: Result of CPU Benchmark Tests

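Total Benchmarks: 127. Improved: 12. Worsened: 8. These counts cover the 20 benchmarks whose throughput changed by 5% or more relative to the repo HEAD (marked in bold in the detailed results); the other 107 changed by less than 5%.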

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---:|---:|---:|---:|---:|
| test_plain_set_nested | 39.8750μs | 16.8405μs | 59.3805 KOps/s | 58.1149 KOps/s | +2.18% |
| test_plain_set_stack_nested | 39.3740μs | 16.2793μs | 61.4275 KOps/s | 57.8638 KOps/s | **+6.16%** |
| test_plain_set_nested_inplace | 55.5640μs | 18.8537μs | 53.0399 KOps/s | 51.7607 KOps/s | +2.47% |
| test_plain_set_stack_nested_inplace | 61.3040μs | 18.8106μs | 53.1615 KOps/s | 52.1840 KOps/s | +1.87% |
| test_items | 20.0980μs | 2.6534μs | 376.8736 KOps/s | 390.6741 KOps/s | -3.53% |
| test_items_nested | 1.1588ms | 0.2649ms | 3.7748 KOps/s | 3.7637 KOps/s | +0.29% |
| test_items_nested_locked | 0.4898ms | 0.2679ms | 3.7328 KOps/s | 3.6629 KOps/s | +1.91% |
| test_items_nested_leaf | 0.1649ms | 77.0393μs | 12.9804 KOps/s | 13.0034 KOps/s | -0.18% |
| test_items_stack_nested | 0.4956ms | 0.2632ms | 3.7990 KOps/s | 3.7277 KOps/s | +1.92% |
| test_items_stack_nested_leaf | 0.4491ms | 79.9859μs | 12.5022 KOps/s | 12.4837 KOps/s | +0.15% |
| test_items_stack_nested_locked | 0.4760ms | 0.2666ms | 3.7511 KOps/s | 3.6774 KOps/s | +2.00% |
| test_keys | 30.4370μs | 3.9911μs | 250.5546 KOps/s | 257.5265 KOps/s | -2.71% |
| test_keys_nested | 0.2314ms | 0.1400ms | 7.1452 KOps/s | 7.2510 KOps/s | -1.46% |
| test_keys_nested_locked | 1.8833ms | 0.1453ms | 6.8817 KOps/s | 6.9425 KOps/s | -0.88% |
| test_keys_nested_leaf | 0.5844ms | 0.1229ms | 8.1348 KOps/s | 8.5165 KOps/s | -4.48% |
| test_keys_stack_nested | 0.2344ms | 0.1410ms | 7.0917 KOps/s | 7.2481 KOps/s | -2.16% |
| test_keys_stack_nested_leaf | 0.2188ms | 0.1192ms | 8.3866 KOps/s | 8.5421 KOps/s | -1.82% |
| test_keys_stack_nested_locked | 0.2845ms | 0.1452ms | 6.8848 KOps/s | 6.8069 KOps/s | +1.14% |
| test_values | 4.3622μs | 1.1373μs | 879.2463 KOps/s | 860.6742 KOps/s | +2.16% |
| test_values_nested | 0.1014ms | 51.2823μs | 19.4999 KOps/s | 19.6212 KOps/s | -0.62% |
| test_values_nested_locked | 94.8070μs | 51.2432μs | 19.5148 KOps/s | 19.4623 KOps/s | +0.27% |
| test_values_nested_leaf | 0.2919ms | 46.6346μs | 21.4433 KOps/s | 21.5943 KOps/s | -0.70% |
| test_values_stack_nested | 0.1075ms | 51.2852μs | 19.4988 KOps/s | 19.4778 KOps/s | +0.11% |
| test_values_stack_nested_leaf | 87.7040μs | 46.4972μs | 21.5067 KOps/s | 21.7387 KOps/s | -1.07% |
| test_values_stack_nested_locked | 0.1014ms | 51.5733μs | 19.3899 KOps/s | 19.2191 KOps/s | +0.89% |
| test_membership | 30.8880μs | 1.3795μs | 724.9075 KOps/s | 754.1845 KOps/s | -3.88% |
| test_membership_nested | 47.0170μs | 3.5180μs | 284.2543 KOps/s | 289.7407 KOps/s | -1.89% |
| test_membership_nested_leaf | 28.9040μs | 3.5315μs | 283.1619 KOps/s | 286.8534 KOps/s | -1.29% |
| test_membership_stacked_nested | 29.1250μs | 3.4793μs | 287.4131 KOps/s | 275.8226 KOps/s | +4.20% |
| test_membership_stacked_nested_leaf | 21.9510μs | 3.5125μs | 284.6965 KOps/s | 288.6067 KOps/s | -1.35% |
| test_membership_nested_last | 19.7270μs | 4.2895μs | 233.1289 KOps/s | 235.8267 KOps/s | -1.14% |
| test_membership_nested_leaf_last | 23.4630μs | 4.3008μs | 232.5160 KOps/s | 234.5349 KOps/s | -0.86% |
| test_membership_stacked_nested_last | 29.5960μs | 4.2455μs | 235.5444 KOps/s | 236.8386 KOps/s | -0.55% |
| test_membership_stacked_nested_leaf_last | 28.3030μs | 4.2780μs | 233.7546 KOps/s | 235.5729 KOps/s | -0.77% |
| test_nested_getleaf | 44.6830μs | 10.6585μs | 93.8218 KOps/s | 92.1189 KOps/s | +1.85% |
| test_nested_get | 60.0720μs | 10.1566μs | 98.4577 KOps/s | 99.2430 KOps/s | -0.79% |
| test_stacked_getleaf | 34.8150μs | 10.4778μs | 95.4399 KOps/s | 94.1488 KOps/s | +1.37% |
| test_stacked_get | 34.5730μs | 10.0841μs | 99.1656 KOps/s | 97.6568 KOps/s | +1.55% |
| test_nested_getitemleaf | 39.3530μs | 11.2074μs | 89.2264 KOps/s | 87.1033 KOps/s | +2.44% |
| test_nested_getitem | 34.7040μs | 10.4056μs | 96.1018 KOps/s | 96.1418 KOps/s | -0.04% |
| test_stacked_getitemleaf | 35.0060μs | 11.2062μs | 89.2360 KOps/s | 89.0138 KOps/s | +0.25% |
| test_stacked_getitem | 32.8610μs | 10.3735μs | 96.3997 KOps/s | 97.2771 KOps/s | -0.90% |
| test_lock_nested | 48.7816ms | 0.4068ms | 2.4581 KOps/s | 2.8128 KOps/s | **-12.61%** |
| test_lock_stack_nested | 0.5219ms | 0.3194ms | 3.1312 KOps/s | 3.1525 KOps/s | -0.68% |
| test_unlock_nested | 0.7474ms | 0.3605ms | 2.7738 KOps/s | 2.4615 KOps/s | **+12.69%** |
| test_unlock_stack_nested | 0.5642ms | 0.3249ms | 3.0781 KOps/s | 3.0911 KOps/s | -0.42% |
| test_flatten_speed | 0.1860ms | 97.1267μs | 10.2958 KOps/s | 10.2418 KOps/s | +0.53% |
| test_unflatten_speed | 0.7274ms | 0.4118ms | 2.4284 KOps/s | 2.3994 KOps/s | +1.21% |
| test_common_ops | 4.1796ms | 0.6959ms | 1.4370 KOps/s | 1.4074 KOps/s | +2.10% |
| test_creation | 26.0890μs | 2.0019μs | 499.5252 KOps/s | 527.2850 KOps/s | **-5.26%** |
| test_creation_empty | 37.1090μs | 9.0775μs | 110.1621 KOps/s | 95.0774 KOps/s | **+15.87%** |
| test_creation_nested_1 | 35.5760μs | 11.8635μs | 84.2918 KOps/s | 75.1711 KOps/s | **+12.13%** |
| test_creation_nested_2 | 40.3550μs | 15.2727μs | 65.4764 KOps/s | 59.8508 KOps/s | **+9.40%** |
| test_clone | 71.8740μs | 13.8808μs | 72.0422 KOps/s | 71.8087 KOps/s | +0.33% |
| test_getitem[int] | 33.3120μs | 11.7725μs | 84.9439 KOps/s | 86.7141 KOps/s | -2.04% |
| test_getitem[slice_int] | 61.6750μs | 23.5285μs | 42.5016 KOps/s | 44.7651 KOps/s | **-5.06%** |
| test_getitem[range] | 96.5500μs | 65.2938μs | 15.3154 KOps/s | 16.5468 KOps/s | **-7.44%** |
| test_getitem[tuple] | 57.3770μs | 19.5594μs | 51.1262 KOps/s | 52.7220 KOps/s | -3.03% |
| test_getitem[list] | 0.1089ms | 41.5384μs | 24.0741 KOps/s | 24.1885 KOps/s | -0.47% |
| test_setitem_dim[int] | 63.2980μs | 34.5216μs | 28.9674 KOps/s | 28.6481 KOps/s | +1.11% |
| test_setitem_dim[slice_int] | 98.6040μs | 61.6244μs | 16.2273 KOps/s | 16.2200 KOps/s | +0.04% |
| test_setitem_dim[range] | 0.1433ms | 87.5073μs | 11.4276 KOps/s | 11.7116 KOps/s | -2.42% |
| test_setitem_dim[tuple] | 0.1040ms | 49.9223μs | 20.0311 KOps/s | 19.6242 KOps/s | +2.07% |
| test_setitem | 53.9600μs | 19.7135μs | 50.7267 KOps/s | 48.4367 KOps/s | +4.73% |
| test_set | 54.1910μs | 19.5018μs | 51.2772 KOps/s | 50.5136 KOps/s | +1.51% |
| test_set_shared | 3.9034ms | 0.1435ms | 6.9682 KOps/s | 6.9793 KOps/s | -0.16% |
| test_update | 80.9210μs | 20.2073μs | 49.4870 KOps/s | 45.5238 KOps/s | **+8.71%** |
| test_update_nested | 72.0680μs | 28.9866μs | 34.4986 KOps/s | 33.9837 KOps/s | +1.52% |
| test_update__nested | 93.0830μs | 25.5350μs | 39.1619 KOps/s | 37.7156 KOps/s | +3.83% |
| test_set_nested | 66.0230μs | 21.2288μs | 47.1058 KOps/s | 46.0145 KOps/s | +2.37% |
| test_set_nested_new | 63.1680μs | 25.5845μs | 39.0861 KOps/s | 38.1522 KOps/s | +2.45% |
| test_select | 0.1084ms | 41.6086μs | 24.0335 KOps/s | 24.4280 KOps/s | -1.61% |
| test_select_nested | 0.1401ms | 61.1798μs | 16.3453 KOps/s | 16.1845 KOps/s | +0.99% |
| test_exclude_nested | 0.2360ms | 0.1224ms | 8.1710 KOps/s | 8.3580 KOps/s | -2.24% |
| test_empty[True] | 0.4854ms | 0.4021ms | 2.4872 KOps/s | 2.4964 KOps/s | -0.37% |
| test_empty[False] | 6.0434μs | 1.1494μs | 870.0008 KOps/s | 867.7324 KOps/s | +0.26% |
| test_unbind_speed | 0.3589ms | 0.2610ms | 3.8314 KOps/s | 3.6860 KOps/s | +3.94% |
| test_unbind_speed_stack0 | 4.5916ms | 0.2721ms | 3.6758 KOps/s | 3.8284 KOps/s | -3.99% |
| test_unbind_speed_stack1 | 65.2879ms | 0.7410ms | 1.3496 KOps/s | 1.2525 KOps/s | **+7.75%** |
| test_split | 68.6510ms | 1.6292ms | 613.7831 Ops/s | 616.5034 Ops/s | -0.44% |
| test_chunk | 67.7464ms | 1.6340ms | 611.9977 Ops/s | 615.1264 Ops/s | -0.51% |
| test_creation[device0] | 0.1890ms | 0.1043ms | 9.5861 KOps/s | 9.5522 KOps/s | +0.35% |
| test_creation_from_tensor | 3.7267ms | 85.2445μs | 11.7310 KOps/s | 11.9841 KOps/s | -2.11% |
| test_add_one[memmap_tensor0] | 68.4070μs | 5.5893μs | 178.9130 KOps/s | 175.3892 KOps/s | +2.01% |
| test_contiguous[memmap_tensor0] | 10.7300μs | 0.6327μs | 1.5805 MOps/s | 1.5861 MOps/s | -0.35% |
| test_stack[memmap_tensor0] | 28.2530μs | 3.5887μs | 278.6524 KOps/s | 275.6703 KOps/s | +1.08% |
| test_memmaptd_index | 1.0091ms | 0.2466ms | 4.0558 KOps/s | 3.9624 KOps/s | +2.36% |
| test_memmaptd_index_astensor | 0.6565ms | 0.3216ms | 3.1095 KOps/s | 3.0927 KOps/s | +0.54% |
| test_memmaptd_index_op | 1.1904ms | 0.5880ms | 1.7006 KOps/s | 1.6196 KOps/s | **+5.00%** |
| test_serialize_model | 0.1705s | 0.1092s | 9.1546 Ops/s | 8.8828 Ops/s | +3.06% |
| test_serialize_model_pickle | 0.4641s | 0.3795s | 2.6348 Ops/s | 2.6127 Ops/s | +0.84% |
| test_serialize_weights | 0.1053s | 99.4160ms | 10.0587 Ops/s | 9.0459 Ops/s | **+11.20%** |
| test_serialize_weights_returnearly | 0.1889s | 0.1367s | 7.3149 Ops/s | 8.0903 Ops/s | **-9.58%** |
| test_serialize_weights_pickle | 1.0859s | 0.6212s | 1.6098 Ops/s | 1.5546 Ops/s | +3.55% |
| test_serialize_weights_filesystem | 0.1562s | 97.7867ms | 10.2263 Ops/s | 10.9581 Ops/s | **-6.68%** |
| test_serialize_model_filesystem | 0.1006s | 92.3799ms | 10.8249 Ops/s | 9.8307 Ops/s | **+10.11%** |
| test_reshape_pytree | 67.8570μs | 25.7005μs | 38.9097 KOps/s | 39.8228 KOps/s | -2.29% |
| test_reshape_td | 78.8370μs | 33.6383μs | 29.7281 KOps/s | 30.1104 KOps/s | -1.27% |
| test_view_pytree | 55.3530μs | 25.2757μs | 39.5637 KOps/s | 39.5767 KOps/s | -0.03% |
| test_view_td | 0.1360ms | 37.3261μs | 26.7909 KOps/s | 26.3191 KOps/s | +1.79% |
| test_unbind_pytree | 63.8900μs | 29.3536μs | 34.0674 KOps/s | 33.6351 KOps/s | +1.29% |
| test_unbind_td | 0.3689ms | 38.9699μs | 25.6609 KOps/s | 25.8813 KOps/s | -0.85% |
| test_split_pytree | 77.4150μs | 30.4977μs | 32.7894 KOps/s | 34.3517 KOps/s | -4.55% |
| test_split_td | 0.4540ms | 45.8036μs | 21.8323 KOps/s | 23.7529 KOps/s | **-8.09%** |
| test_add_pytree | 0.2394ms | 36.7156μs | 27.2364 KOps/s | 28.4809 KOps/s | -4.37% |
| test_add_td | 0.1320ms | 56.3951μs | 17.7320 KOps/s | 18.4410 KOps/s | -3.84% |
| test_distributed | 0.2144ms | 0.1002ms | 9.9803 KOps/s | 9.7952 KOps/s | +1.89% |
| test_tdmodule | 28.7240μs | 17.0914μs | 58.5089 KOps/s | 57.0842 KOps/s | +2.50% |
| test_tdmodule_dispatch | 76.4630μs | 33.9664μs | 29.4409 KOps/s | 28.0972 KOps/s | +4.78% |
| test_tdseq | 32.4610μs | 19.2606μs | 51.9193 KOps/s | 47.7509 KOps/s | **+8.73%** |
| test_tdseq_dispatch | 66.5240μs | 38.1280μs | 26.2274 KOps/s | 24.6302 KOps/s | **+6.49%** |
| test_instantiation_functorch | 1.9055ms | 1.3549ms | 738.0774 Ops/s | 748.4108 Ops/s | -1.38% |
| test_instantiation_td | 64.8389ms | 1.1092ms | 901.5708 Ops/s | 965.5844 Ops/s | **-6.63%** |
| test_exec_functorch | 0.2995ms | 0.1643ms | 6.0864 KOps/s | 6.0754 KOps/s | +0.18% |
| test_exec_functional_call | 0.2623ms | 0.1531ms | 6.5299 KOps/s | 6.6104 KOps/s | -1.22% |
| test_exec_td | 0.3592ms | 0.1478ms | 6.7670 KOps/s | 6.8019 KOps/s | -0.51% |
| test_exec_td_decorator | 0.5715ms | 0.2244ms | 4.4571 KOps/s | 4.5392 KOps/s | -1.81% |
| test_vmap_mlp_speed[True-True] | 0.7639ms | 0.4928ms | 2.0291 KOps/s | 2.0330 KOps/s | -0.19% |
| test_vmap_mlp_speed[True-False] | 1.0606ms | 0.4895ms | 2.0429 KOps/s | 2.0399 KOps/s | +0.15% |
| test_vmap_mlp_speed[False-True] | 0.7894ms | 0.4020ms | 2.4875 KOps/s | 2.4852 KOps/s | +0.09% |
| test_vmap_mlp_speed[False-False] | 0.9310ms | 0.4008ms | 2.4952 KOps/s | 2.4930 KOps/s | +0.09% |
| test_vmap_mlp_speed_decorator[True-True] | 1.5633ms | 0.5750ms | 1.7392 KOps/s | 1.7846 KOps/s | -2.55% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8374ms | 0.5576ms | 1.7934 KOps/s | 1.7929 KOps/s | +0.03% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7235ms | 0.4644ms | 2.1531 KOps/s | 2.1750 KOps/s | -1.01% |
| test_vmap_mlp_speed_decorator[False-False] | 1.0150ms | 0.4655ms | 2.1482 KOps/s | 2.1681 KOps/s | -0.92% |
| test_to_module_speed[True] | 2.0757ms | 1.7075ms | 585.6447 Ops/s | 592.3807 Ops/s | -1.14% |
| test_to_module_speed[False] | 2.5936ms | 1.6831ms | 594.1538 Ops/s | 601.4312 Ops/s | -1.21% |
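
The report does not state how the Ops and Change columns are derived. The following is a minimal Python check under two assumptions that the report does not document: Ops = 1 / Mean, and Change = Ops / (Ops on Repo HEAD) - 1. The 5% cut-off in `classify` is likewise an inference from the table, where every change of at least 5% in magnitude is the one shown in bold. The sample rows are copied from the table; the function names and `THRESHOLD_PCT` are hypothetical helpers, not part of the benchmark tooling.

```python
# Hypothetical helper script: re-derive the "Ops" and "Change" columns of the CPU benchmark report.
# ASSUMPTIONS (not documented in the report): Ops = 1 / Mean,
# Change = Ops / (Ops on Repo HEAD) - 1, and |Change| >= 5% counts as improved/worsened.
import math

THRESHOLD_PCT = 5.0  # assumed cut-off, inferred from which rows are bold

# Rows copied from the table:
# (name, mean time per call [s], Ops [ops/s], Ops on Repo HEAD [ops/s], reported Change [%])
SAMPLE_ROWS = [
    ("test_plain_set_nested", 16.8405e-6, 59.3805e3, 58.1149e3, 2.18),
    ("test_items", 2.6534e-6, 376.8736e3, 390.6741e3, -3.53),
    ("test_lock_nested", 0.4068e-3, 2.4581e3, 2.8128e3, -12.61),
    ("test_creation_empty", 9.0775e-6, 110.1621e3, 95.0774e3, 15.87),
    ("test_memmaptd_index_op", 0.5880e-3, 1.7006e3, 1.6196e3, 5.00),
]


def ops_per_second(mean_s: float) -> float:
    """Throughput implied by the mean duration of one call."""
    return 1.0 / mean_s


def ops_consistent(mean_s: float, ops: float, rel_tol: float = 1e-3) -> bool:
    """True if the reported Ops matches 1 / Mean, allowing for rounding of the Mean column."""
    return math.isclose(ops_per_second(mean_s), ops, rel_tol=rel_tol)


def change_pct(ops: float, ops_head: float) -> float:
    """Relative throughput change against the HEAD baseline, in percent."""
    return (ops / ops_head - 1.0) * 100.0


def classify(change: float, threshold: float = THRESHOLD_PCT) -> str:
    """Bucket a change the way the summary line appears to count it."""
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "worsened"
    return "below threshold"


if __name__ == "__main__":
    for name, mean_s, ops, ops_head, reported in SAMPLE_ROWS:
        change = change_pct(ops, ops_head)
        print(
            f"{name}: Ops matches 1/Mean: {ops_consistent(mean_s, ops)}, "
            f"Change {change:+.2f}% (reported {reported:+.2f}%), {classify(change)}"
        )
```

Applied to all 127 reported Change values, the same 5% rule yields 12 improved and 8 worsened benchmarks, which matches the summary counts at the top of the comment.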

vmoens merged commit ad35bfd into main Apr 29, 2024
34 of 38 checks passed
vmoens deleted the fix-map-cuda branch Apr 29, 2024 09:18
Labels: bug, CLA Signed
Projects: None yet
Participants: 2