[Test] Test FC of memmap save and load #838

Merged
merged 6 commits into from
Jun 26, 2024

Conversation

Contributor

@vmoens vmoens commented Jun 25, 2024

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 25, 2024
@vmoens vmoens added the Test label Jun 25, 2024

github-actions bot commented Jun 25, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}12$. Worsened: $\large\color{#d91a1a}11$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 43.4710μs 17.0594μs 58.6188 KOps/s 57.3411 KOps/s $\color{#35bf28}+2.23\%$
test_plain_set_stack_nested 34.8850μs 17.1263μs 58.3896 KOps/s 57.2677 KOps/s $\color{#35bf28}+1.96\%$
test_plain_set_nested_inplace 50.5150μs 19.3412μs 51.7030 KOps/s 51.2798 KOps/s $\color{#35bf28}+0.83\%$
test_plain_set_stack_nested_inplace 40.5960μs 19.2143μs 52.0446 KOps/s 51.2606 KOps/s $\color{#35bf28}+1.53\%$
test_items 31.6990μs 2.5205μs 396.7390 KOps/s 354.8811 KOps/s $\textbf{\color{#35bf28}+11.79\%}$
test_items_nested 0.3638ms 0.2665ms 3.7526 KOps/s 3.7641 KOps/s $\color{#d91a1a}-0.30\%$
test_items_nested_locked 1.3320ms 0.2663ms 3.7548 KOps/s 3.7621 KOps/s $\color{#d91a1a}-0.19\%$
test_items_nested_leaf 0.1207ms 77.1216μs 12.9665 KOps/s 13.0017 KOps/s $\color{#d91a1a}-0.27\%$
test_items_stack_nested 1.3198ms 0.2663ms 3.7546 KOps/s 3.6729 KOps/s $\color{#35bf28}+2.22\%$
test_items_stack_nested_leaf 0.1086ms 74.6851μs 13.3896 KOps/s 12.7250 KOps/s $\textbf{\color{#35bf28}+5.22\%}$
test_items_stack_nested_locked 0.3152ms 0.2678ms 3.7340 KOps/s 3.7427 KOps/s $\color{#d91a1a}-0.23\%$
test_keys 38.4620μs 3.9852μs 250.9270 KOps/s 258.1891 KOps/s $\color{#d91a1a}-2.81\%$
test_keys_nested 0.2054ms 0.1384ms 7.2258 KOps/s 7.2689 KOps/s $\color{#d91a1a}-0.59\%$
test_keys_nested_locked 0.6434ms 0.1444ms 6.9268 KOps/s 7.0292 KOps/s $\color{#d91a1a}-1.46\%$
test_keys_nested_leaf 0.2046ms 0.1169ms 8.5551 KOps/s 8.5420 KOps/s $\color{#35bf28}+0.15\%$
test_keys_stack_nested 0.2321ms 0.1347ms 7.4219 KOps/s 7.2468 KOps/s $\color{#35bf28}+2.42\%$
test_keys_stack_nested_leaf 0.2106ms 0.1148ms 8.7079 KOps/s 8.5272 KOps/s $\color{#35bf28}+2.12\%$
test_keys_stack_nested_locked 0.2430ms 0.1399ms 7.1468 KOps/s 7.0278 KOps/s $\color{#35bf28}+1.69\%$
test_values 6.6400μs 1.1989μs 834.1038 KOps/s 853.1381 KOps/s $\color{#d91a1a}-2.23\%$
test_values_nested 0.1075ms 50.4280μs 19.8303 KOps/s 19.8064 KOps/s $\color{#35bf28}+0.12\%$
test_values_nested_locked 0.1024ms 50.5779μs 19.7715 KOps/s 19.7934 KOps/s $\color{#d91a1a}-0.11\%$
test_values_nested_leaf 93.5550μs 45.0523μs 22.1964 KOps/s 21.7826 KOps/s $\color{#35bf28}+1.90\%$
test_values_stack_nested 95.5380μs 51.2788μs 19.5012 KOps/s 19.4369 KOps/s $\color{#35bf28}+0.33\%$
test_values_stack_nested_leaf 78.4770μs 44.4893μs 22.4773 KOps/s 21.6323 KOps/s $\color{#35bf28}+3.91\%$
test_values_stack_nested_locked 97.1220μs 51.1343μs 19.5563 KOps/s 19.4941 KOps/s $\color{#35bf28}+0.32\%$
test_membership 17.9030μs 1.3554μs 737.7849 KOps/s 726.6856 KOps/s $\color{#35bf28}+1.53\%$
test_membership_nested 27.0410μs 3.5125μs 284.6979 KOps/s 285.8424 KOps/s $\color{#d91a1a}-0.40\%$
test_membership_nested_leaf 27.6920μs 3.5204μs 284.0611 KOps/s 289.0514 KOps/s $\color{#d91a1a}-1.73\%$
test_membership_stacked_nested 28.5730μs 3.5072μs 285.1307 KOps/s 270.4604 KOps/s $\textbf{\color{#35bf28}+5.42\%}$
test_membership_stacked_nested_leaf 33.7230μs 3.5300μs 283.2897 KOps/s 288.3471 KOps/s $\color{#d91a1a}-1.75\%$
test_membership_nested_last 22.0710μs 4.3176μs 231.6101 KOps/s 240.4191 KOps/s $\color{#d91a1a}-3.66\%$
test_membership_nested_leaf_last 27.7210μs 4.3818μs 228.2158 KOps/s 240.0077 KOps/s $\color{#d91a1a}-4.91\%$
test_membership_stacked_nested_last 36.1570μs 13.2491μs 75.4770 KOps/s 176.9485 KOps/s $\textbf{\color{#d91a1a}-57.35\%}$
test_membership_stacked_nested_leaf_last 48.7720μs 13.3677μs 74.8069 KOps/s 176.9615 KOps/s $\textbf{\color{#d91a1a}-57.73\%}$
test_nested_getleaf 40.3660μs 10.4918μs 95.3124 KOps/s 95.9531 KOps/s $\color{#d91a1a}-0.67\%$
test_nested_get 33.1320μs 9.8278μs 101.7517 KOps/s 101.3346 KOps/s $\color{#35bf28}+0.41\%$
test_stacked_getleaf 45.1940μs 10.3020μs 97.0683 KOps/s 98.6953 KOps/s $\color{#d91a1a}-1.65\%$
test_stacked_get 42.2590μs 9.8757μs 101.2591 KOps/s 101.7492 KOps/s $\color{#d91a1a}-0.48\%$
test_nested_getitemleaf 32.7110μs 10.9638μs 91.2091 KOps/s 91.7966 KOps/s $\color{#d91a1a}-0.64\%$
test_nested_getitem 51.6460μs 10.1932μs 98.1044 KOps/s 99.1415 KOps/s $\color{#d91a1a}-1.05\%$
test_stacked_getitemleaf 41.5180μs 11.0541μs 90.4639 KOps/s 92.6848 KOps/s $\color{#d91a1a}-2.40\%$
test_stacked_getitem 40.6460μs 10.1703μs 98.3251 KOps/s 99.0096 KOps/s $\color{#d91a1a}-0.69\%$
test_lock_nested 48.1291ms 0.3905ms 2.5611 KOps/s 2.9680 KOps/s $\textbf{\color{#d91a1a}-13.71\%}$
test_lock_stack_nested 0.4399ms 0.3002ms 3.3311 KOps/s 3.2819 KOps/s $\color{#35bf28}+1.50\%$
test_unlock_nested 0.6835ms 0.3467ms 2.8847 KOps/s 2.9315 KOps/s $\color{#d91a1a}-1.60\%$
test_unlock_stack_nested 0.5127ms 0.3080ms 3.2465 KOps/s 3.1761 KOps/s $\color{#35bf28}+2.21\%$
test_flatten_speed 0.2000ms 95.4170μs 10.4803 KOps/s 10.5683 KOps/s $\color{#d91a1a}-0.83\%$
test_unflatten_speed 0.7375ms 0.4096ms 2.4416 KOps/s 2.4796 KOps/s $\color{#d91a1a}-1.53\%$
test_common_ops 1.3907ms 0.7248ms 1.3797 KOps/s 1.3393 KOps/s $\color{#35bf28}+3.02\%$
test_creation 27.4510μs 1.9211μs 520.5285 KOps/s 526.5301 KOps/s $\color{#d91a1a}-1.14\%$
test_creation_empty 43.7620μs 10.7706μs 92.8456 KOps/s 83.5730 KOps/s $\textbf{\color{#35bf28}+11.10\%}$
test_creation_nested_1 40.1850μs 14.4679μs 69.1184 KOps/s 67.7273 KOps/s $\color{#35bf28}+2.05\%$
test_creation_nested_2 43.5010μs 16.8503μs 59.3460 KOps/s 56.4475 KOps/s $\textbf{\color{#35bf28}+5.13\%}$
test_clone 0.1168ms 12.9785μs 77.0505 KOps/s 74.4293 KOps/s $\color{#35bf28}+3.52\%$
test_getitem[int] 37.6200μs 11.3736μs 87.9232 KOps/s 88.9313 KOps/s $\color{#d91a1a}-1.13\%$
test_getitem[slice_int] 62.2860μs 22.5054μs 44.4337 KOps/s 45.8788 KOps/s $\color{#d91a1a}-3.15\%$
test_getitem[range] 79.7590μs 58.2210μs 17.1759 KOps/s 17.0789 KOps/s $\color{#35bf28}+0.57\%$
test_getitem[tuple] 58.8590μs 18.8327μs 53.0992 KOps/s 54.2927 KOps/s $\color{#d91a1a}-2.20\%$
test_getitem[list] 98.0830μs 41.5202μs 24.0847 KOps/s 25.4038 KOps/s $\textbf{\color{#d91a1a}-5.19\%}$
test_setitem_dim[int] 76.1720μs 35.6438μs 28.0554 KOps/s 28.0215 KOps/s $\color{#35bf28}+0.12\%$
test_setitem_dim[slice_int] 0.1211ms 62.0052μs 16.1277 KOps/s 16.1331 KOps/s $\color{#d91a1a}-0.03\%$
test_setitem_dim[range] 0.1458ms 85.4114μs 11.7080 KOps/s 11.8244 KOps/s $\color{#d91a1a}-0.98\%$
test_setitem_dim[tuple] 93.5050μs 51.5975μs 19.3808 KOps/s 19.7581 KOps/s $\color{#d91a1a}-1.91\%$
test_setitem 59.8310μs 20.1622μs 49.5978 KOps/s 47.7055 KOps/s $\color{#35bf28}+3.97\%$
test_set 98.2840μs 20.0691μs 49.8278 KOps/s 49.3222 KOps/s $\color{#35bf28}+1.03\%$
test_set_shared 4.4174ms 0.1443ms 6.9319 KOps/s 6.9269 KOps/s $\color{#35bf28}+0.07\%$
test_update 0.1355ms 22.2057μs 45.0334 KOps/s 42.8739 KOps/s $\textbf{\color{#35bf28}+5.04\%}$
test_update_nested 84.8690μs 30.5375μs 32.7466 KOps/s 31.4022 KOps/s $\color{#35bf28}+4.28\%$
test_update__nested 71.0130μs 24.8446μs 40.2502 KOps/s 40.1205 KOps/s $\color{#35bf28}+0.32\%$
test_set_nested 80.3300μs 21.6326μs 46.2266 KOps/s 45.0977 KOps/s $\color{#35bf28}+2.50\%$
test_set_nested_new 62.1670μs 26.0023μs 38.4581 KOps/s 37.9121 KOps/s $\color{#35bf28}+1.44\%$
test_select 83.0750μs 40.9500μs 24.4200 KOps/s 23.3287 KOps/s $\color{#35bf28}+4.68\%$
test_select_nested 0.1200ms 60.0734μs 16.6463 KOps/s 16.6483 KOps/s $\color{#d91a1a}-0.01\%$
test_exclude_nested 0.2269ms 0.1205ms 8.3013 KOps/s 8.3725 KOps/s $\color{#d91a1a}-0.85\%$
test_empty[True] 0.4628ms 0.3966ms 2.5214 KOps/s 2.4816 KOps/s $\color{#35bf28}+1.61\%$
test_empty[False] 8.5835μs 1.1907μs 839.8118 KOps/s 854.1641 KOps/s $\color{#d91a1a}-1.68\%$
test_unbind_speed 0.6085ms 0.2595ms 3.8537 KOps/s 3.8909 KOps/s $\color{#d91a1a}-0.96\%$
test_unbind_speed_stack0 0.3744ms 0.2455ms 4.0731 KOps/s 4.0684 KOps/s $\color{#35bf28}+0.11\%$
test_unbind_speed_stack1 65.6205ms 0.7181ms 1.3926 KOps/s 1.3853 KOps/s $\color{#35bf28}+0.52\%$
test_split 66.0872ms 1.6066ms 622.4135 Ops/s 634.3904 Ops/s $\color{#d91a1a}-1.89\%$
test_chunk 65.9907ms 1.6144ms 619.4343 Ops/s 637.0369 Ops/s $\color{#d91a1a}-2.76\%$
test_creation[device0] 0.1817ms 85.0102μs 11.7633 KOps/s 12.0407 KOps/s $\color{#d91a1a}-2.30\%$
test_creation_from_tensor 0.1572ms 83.5652μs 11.9667 KOps/s 11.9337 KOps/s $\color{#35bf28}+0.28\%$
test_add_one[memmap_tensor0] 60.8640μs 5.4200μs 184.5007 KOps/s 181.1126 KOps/s $\color{#35bf28}+1.87\%$
test_contiguous[memmap_tensor0] 19.7170μs 0.6526μs 1.5324 MOps/s 1.5885 MOps/s $\color{#d91a1a}-3.53\%$
test_stack[memmap_tensor0] 30.8780μs 3.5704μs 280.0767 KOps/s 280.0839 KOps/s $-0.00\%$
test_memmaptd_index 0.4297ms 0.2537ms 3.9410 KOps/s 3.9601 KOps/s $\color{#d91a1a}-0.48\%$
test_memmaptd_index_astensor 0.7070ms 0.3275ms 3.0539 KOps/s 3.0722 KOps/s $\color{#d91a1a}-0.60\%$
test_memmaptd_index_op 0.8830ms 0.6113ms 1.6360 KOps/s 1.6176 KOps/s $\color{#35bf28}+1.13\%$
test_serialize_model 0.1719s 0.1155s 8.6607 Ops/s 9.6516 Ops/s $\textbf{\color{#d91a1a}-10.27\%}$
test_serialize_model_pickle 0.4581s 0.3771s 2.6519 Ops/s 2.6450 Ops/s $\color{#35bf28}+0.26\%$
test_serialize_weights 0.1612s 0.1087s 9.2016 Ops/s 8.7409 Ops/s $\textbf{\color{#35bf28}+5.27\%}$
test_serialize_weights_returnearly 0.1869s 0.1329s 7.5235 Ops/s 7.4734 Ops/s $\color{#35bf28}+0.67\%$
test_serialize_weights_pickle 1.1023s 0.5675s 1.7621 Ops/s 2.2882 Ops/s $\textbf{\color{#d91a1a}-23.00\%}$
test_serialize_weights_filesystem 99.9610ms 90.6937ms 11.0261 Ops/s 10.5845 Ops/s $\color{#35bf28}+4.17\%$
test_serialize_model_filesystem 0.1570s 99.9561ms 10.0044 Ops/s 10.6310 Ops/s $\textbf{\color{#d91a1a}-5.89\%}$
test_reshape_pytree 66.6540μs 25.5592μs 39.1248 KOps/s 39.2262 KOps/s $\color{#d91a1a}-0.26\%$
test_reshape_td 73.5180μs 34.1517μs 29.2812 KOps/s 29.2237 KOps/s $\color{#35bf28}+0.20\%$
test_view_pytree 61.0750μs 25.2073μs 39.6710 KOps/s 39.5893 KOps/s $\color{#35bf28}+0.21\%$
test_view_td 84.0370μs 39.3780μs 25.3949 KOps/s 25.5546 KOps/s $\color{#d91a1a}-0.63\%$
test_unbind_pytree 62.4160μs 29.5784μs 33.8084 KOps/s 34.2347 KOps/s $\color{#d91a1a}-1.25\%$
test_unbind_td 0.3614ms 38.2772μs 26.1252 KOps/s 26.4765 KOps/s $\color{#d91a1a}-1.33\%$
test_split_pytree 73.6680μs 29.7306μs 33.6354 KOps/s 34.6579 KOps/s $\color{#d91a1a}-2.95\%$
test_split_td 0.1257ms 40.7107μs 24.5636 KOps/s 25.0274 KOps/s $\color{#d91a1a}-1.85\%$
test_add_pytree 94.1670μs 34.2262μs 29.2174 KOps/s 28.6335 KOps/s $\color{#35bf28}+2.04\%$
test_add_td 0.1189ms 55.2772μs 18.0906 KOps/s 17.2919 KOps/s $\color{#35bf28}+4.62\%$
test_distributed 0.2137ms 0.1005ms 9.9508 KOps/s 9.7971 KOps/s $\color{#35bf28}+1.57\%$
test_tdmodule 75.1300μs 18.2240μs 54.8727 KOps/s 55.5942 KOps/s $\color{#d91a1a}-1.30\%$
test_tdmodule_dispatch 56.2460μs 35.5376μs 28.1392 KOps/s 28.1787 KOps/s $\color{#d91a1a}-0.14\%$
test_tdseq 37.9210μs 20.3673μs 49.0984 KOps/s 49.1017 KOps/s $-0.01\%$
test_tdseq_dispatch 65.2820μs 39.9554μs 25.0279 KOps/s 24.7913 KOps/s $\color{#35bf28}+0.95\%$
test_instantiation_functorch 2.9221ms 1.3101ms 763.2771 Ops/s 755.4591 Ops/s $\color{#35bf28}+1.03\%$
test_instantiation_td 1.5861ms 0.9931ms 1.0069 KOps/s 991.4953 Ops/s $\color{#35bf28}+1.56\%$
test_exec_functorch 0.2922ms 0.1609ms 6.2137 KOps/s 5.8080 KOps/s $\textbf{\color{#35bf28}+6.98\%}$
test_exec_functional_call 0.2386ms 0.1475ms 6.7805 KOps/s 6.7574 KOps/s $\color{#35bf28}+0.34\%$
test_exec_td 0.2617ms 0.1448ms 6.9074 KOps/s 6.8128 KOps/s $\color{#35bf28}+1.39\%$
test_exec_td_decorator 0.7862ms 0.2185ms 4.5761 KOps/s 4.5580 KOps/s $\color{#35bf28}+0.40\%$
test_vmap_mlp_speed[True-True] 0.6834ms 0.4795ms 2.0856 KOps/s 2.0651 KOps/s $\color{#35bf28}+1.00\%$
test_vmap_mlp_speed[True-False] 0.8384ms 0.4781ms 2.0918 KOps/s 2.0670 KOps/s $\color{#35bf28}+1.20\%$
test_vmap_mlp_speed[False-True] 0.5752ms 0.3901ms 2.5632 KOps/s 2.5330 KOps/s $\color{#35bf28}+1.19\%$
test_vmap_mlp_speed[False-False] 0.6606ms 0.3911ms 2.5572 KOps/s 2.5381 KOps/s $\color{#35bf28}+0.75\%$
test_vmap_mlp_speed_decorator[True-True] 1.2504ms 0.5519ms 1.8120 KOps/s 1.7992 KOps/s $\color{#35bf28}+0.71\%$
test_vmap_mlp_speed_decorator[True-False] 0.7127ms 0.5487ms 1.8224 KOps/s 1.8067 KOps/s $\color{#35bf28}+0.87\%$
test_vmap_mlp_speed_decorator[False-True] 0.7648ms 0.4530ms 2.2076 KOps/s 2.1712 KOps/s $\color{#35bf28}+1.68\%$
test_vmap_mlp_speed_decorator[False-False] 0.6340ms 0.4501ms 2.2219 KOps/s 2.1759 KOps/s $\color{#35bf28}+2.11\%$
test_to_module_speed[True] 2.3673ms 1.6986ms 588.7081 Ops/s 591.6907 Ops/s $\color{#d91a1a}-0.50\%$
test_to_module_speed[False] 2.6605ms 1.6708ms 598.5023 Ops/s 601.3196 Ops/s $\color{#d91a1a}-0.47\%$
test_tc_init 54.8430μs 29.6277μs 33.7522 KOps/s 31.3156 KOps/s $\textbf{\color{#35bf28}+7.78\%}$
test_tc_init_nested 94.8680μs 60.3847μs 16.5605 KOps/s 15.4642 KOps/s $\textbf{\color{#35bf28}+7.09\%}$
test_tc_first_layer_tensor 3.9574μs 0.7081μs 1.4122 MOps/s 1.5148 MOps/s $\textbf{\color{#d91a1a}-6.77\%}$
test_tc_first_layer_nontensor 2.0208μs 0.6936μs 1.4416 MOps/s 1.5258 MOps/s $\textbf{\color{#d91a1a}-5.51\%}$
test_tc_second_layer_tensor 26.0790μs 1.9129μs 522.7705 KOps/s 527.3316 KOps/s $\color{#d91a1a}-0.86\%$
test_tc_second_layer_nontensor 10.4197μs 1.5709μs 636.5798 KOps/s 592.9092 KOps/s $\textbf{\color{#35bf28}+7.37\%}$
test_unbind 83.2872ms 7.4674ms 133.9149 Ops/s 143.7652 Ops/s $\textbf{\color{#d91a1a}-6.85\%}$
test_full_like 16.4957ms 9.8400ms 101.6264 Ops/s 90.5487 Ops/s $\textbf{\color{#35bf28}+12.23\%}$
test_zeros_like 6.8495ms 5.7046ms 175.2959 Ops/s 185.6292 Ops/s $\textbf{\color{#d91a1a}-5.57\%}$
test_ones_like 6.8348ms 6.1186ms 163.4358 Ops/s 170.5415 Ops/s $\color{#d91a1a}-4.17\%$
test_clone 12.9777ms 7.4805ms 133.6811 Ops/s 135.6000 Ops/s $\color{#d91a1a}-1.42\%$
test_squeeze 58.0280μs 13.8725μs 72.0851 KOps/s 74.8288 KOps/s $\color{#d91a1a}-3.67\%$
test_unsqueeze 0.1148ms 58.6714μs 17.0441 KOps/s 16.7453 KOps/s $\color{#35bf28}+1.78\%$
test_split 0.2471ms 0.1110ms 9.0112 KOps/s 9.0333 KOps/s $\color{#d91a1a}-0.24\%$
test_permute 0.2830ms 0.1255ms 7.9651 KOps/s 8.0439 KOps/s $\color{#d91a1a}-0.98\%$
test_stack 27.4364ms 21.4038ms 46.7206 Ops/s 48.2336 Ops/s $\color{#d91a1a}-3.14\%$
test_cat 27.6381ms 21.2419ms 47.0767 Ops/s 48.2536 Ops/s $\color{#d91a1a}-2.44\%$
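
The Change column above appears to be the relative difference between the PR's throughput (Ops) and the throughput on repo HEAD, with larger swings shown in bold. Below is a minimal sketch of that arithmetic, checked against two rows of the table. The function names and the 5% bold threshold are assumptions inferred from the rows, not taken from the benchmark tooling.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD.

    Both arguments must be in the same unit (a few rows in the table mix
    KOps/s and Ops/s, so normalize before calling this).
    """
    return (ops_pr - ops_head) / ops_head * 100.0


def is_highlighted(change_pct: float, threshold: float = 5.0) -> bool:
    """Assumed rule: a row is shown in bold when |change| >= threshold."""
    return abs(change_pct) >= threshold


# test_items: 396.7390 KOps/s on the PR vs 354.8811 KOps/s on HEAD
items = relative_change(396.7390, 354.8811)

# test_membership_stacked_nested_last: 75.4770 vs 176.9485 KOps/s
membership = relative_change(75.4770, 176.9485)

print(f"{items:+.2f}%  bold={is_highlighted(items)}")
print(f"{membership:+.2f}%  bold={is_highlighted(membership)}")
```

Rounded to two decimals, these reproduce the +11.79% and -57.35% entries reported for those rows.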

github-actions bot commented Jun 25, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: $\large\color{#35bf28}2$. Worsened: $\large\color{#d91a1a}32$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 92.9650μs 13.0932μs 76.3758 KOps/s 86.3623 KOps/s $\textbf{\color{#d91a1a}-11.56\%}$
test_plain_set_stack_nested 27.4920μs 13.2341μs 75.5626 KOps/s 85.3203 KOps/s $\textbf{\color{#d91a1a}-11.44\%}$
test_plain_set_nested_inplace 42.8520μs 14.4596μs 69.1580 KOps/s 77.8146 KOps/s $\textbf{\color{#d91a1a}-11.12\%}$
test_plain_set_stack_nested_inplace 43.7320μs 14.4516μs 69.1966 KOps/s 77.0017 KOps/s $\textbf{\color{#d91a1a}-10.14\%}$
test_items 17.9010μs 4.7403μs 210.9550 KOps/s 210.2899 KOps/s $\color{#35bf28}+0.32\%$
test_items_nested 0.3861ms 0.3364ms 2.9725 KOps/s 2.9588 KOps/s $\color{#35bf28}+0.47\%$
test_items_nested_locked 0.3893ms 0.3381ms 2.9575 KOps/s 2.8685 KOps/s $\color{#35bf28}+3.10\%$
test_items_nested_leaf 0.1034ms 82.7954μs 12.0780 KOps/s 12.1639 KOps/s $\color{#d91a1a}-0.71\%$
test_items_stack_nested 0.3960ms 0.3394ms 2.9466 KOps/s 2.9453 KOps/s $\color{#35bf28}+0.04\%$
test_items_stack_nested_leaf 0.1124ms 83.4532μs 11.9828 KOps/s 12.0488 KOps/s $\color{#d91a1a}-0.55\%$
test_items_stack_nested_locked 0.4018ms 0.3384ms 2.9555 KOps/s 2.9342 KOps/s $\color{#35bf28}+0.72\%$
test_keys 24.0910μs 4.3448μs 230.1616 KOps/s 230.4462 KOps/s $\color{#d91a1a}-0.12\%$
test_keys_nested 0.1381ms 66.5911μs 15.0170 KOps/s 14.8638 KOps/s $\color{#35bf28}+1.03\%$
test_keys_nested_locked 2.1003ms 71.9051μs 13.9072 KOps/s 13.7240 KOps/s $\color{#35bf28}+1.34\%$
test_keys_nested_leaf 79.1840μs 57.0243μs 17.5364 KOps/s 17.2243 KOps/s $\color{#35bf28}+1.81\%$
test_keys_stack_nested 83.5040μs 66.3810μs 15.0645 KOps/s 14.9407 KOps/s $\color{#35bf28}+0.83\%$
test_keys_stack_nested_leaf 80.5930μs 57.0767μs 17.5203 KOps/s 17.2674 KOps/s $\color{#35bf28}+1.46\%$
test_keys_stack_nested_locked 95.7050μs 70.7104μs 14.1422 KOps/s 13.9336 KOps/s $\color{#35bf28}+1.50\%$
test_values 10.9703μs 1.8001μs 555.5261 KOps/s 551.7487 KOps/s $\color{#35bf28}+0.68\%$
test_values_nested 60.3130μs 34.9169μs 28.6394 KOps/s 28.5660 KOps/s $\color{#35bf28}+0.26\%$
test_values_nested_locked 59.1430μs 36.6780μs 27.2643 KOps/s 27.2087 KOps/s $\color{#35bf28}+0.20\%$
test_values_nested_leaf 50.2020μs 30.9976μs 32.2606 KOps/s 32.1670 KOps/s $\color{#35bf28}+0.29\%$
test_values_stack_nested 67.7830μs 35.7431μs 27.9774 KOps/s 28.4599 KOps/s $\color{#d91a1a}-1.70\%$
test_values_stack_nested_leaf 61.2530μs 31.7708μs 31.4754 KOps/s 31.8481 KOps/s $\color{#d91a1a}-1.17\%$
test_values_stack_nested_locked 68.9340μs 37.4985μs 26.6677 KOps/s 27.1659 KOps/s $\color{#d91a1a}-1.83\%$
test_membership 3.4216μs 0.7316μs 1.3669 MOps/s 1.3341 MOps/s $\color{#35bf28}+2.46\%$
test_membership_nested 31.0610μs 2.5830μs 387.1512 KOps/s 390.2969 KOps/s $\color{#d91a1a}-0.81\%$
test_membership_nested_leaf 24.9410μs 2.5568μs 391.1156 KOps/s 393.1903 KOps/s $\color{#d91a1a}-0.53\%$
test_membership_stacked_nested 23.5210μs 2.6302μs 380.2046 KOps/s 392.8956 KOps/s $\color{#d91a1a}-3.23\%$
test_membership_stacked_nested_leaf 32.3510μs 2.5847μs 386.8987 KOps/s 394.0558 KOps/s $\color{#d91a1a}-1.82\%$
test_membership_nested_last 21.7810μs 3.0958μs 323.0148 KOps/s 324.1556 KOps/s $\color{#d91a1a}-0.35\%$
test_membership_nested_leaf_last 33.5510μs 3.0849μs 324.1592 KOps/s 327.7962 KOps/s $\color{#d91a1a}-1.11\%$
test_membership_stacked_nested_last 20.8110μs 3.8639μs 258.8081 KOps/s 325.7642 KOps/s $\textbf{\color{#d91a1a}-20.55\%}$
test_membership_stacked_nested_leaf_last 34.2710μs 3.9054μs 256.0561 KOps/s 325.6503 KOps/s $\textbf{\color{#d91a1a}-21.37\%}$
test_nested_getleaf 25.4110μs 8.3679μs 119.5049 KOps/s 119.4621 KOps/s $\color{#35bf28}+0.04\%$
test_nested_get 35.3530μs 7.8858μs 126.8099 KOps/s 127.3604 KOps/s $\color{#d91a1a}-0.43\%$
test_stacked_getleaf 37.4720μs 8.3997μs 119.0516 KOps/s 119.5355 KOps/s $\color{#d91a1a}-0.40\%$
test_stacked_get 24.2410μs 7.9312μs 126.0840 KOps/s 126.7606 KOps/s $\color{#d91a1a}-0.53\%$
test_nested_getitemleaf 39.1420μs 8.5290μs 117.2472 KOps/s 117.1221 KOps/s $\color{#35bf28}+0.11\%$
test_nested_getitem 61.6130μs 8.0470μs 124.2701 KOps/s 124.6169 KOps/s $\color{#d91a1a}-0.28\%$
test_stacked_getitemleaf 31.8720μs 8.5596μs 116.8285 KOps/s 117.3783 KOps/s $\color{#d91a1a}-0.47\%$
test_stacked_getitem 36.4820μs 8.0433μs 124.3277 KOps/s 124.1710 KOps/s $\color{#35bf28}+0.13\%$
test_lock_nested 59.0627ms 0.4030ms 2.4815 KOps/s 2.4889 KOps/s $\color{#d91a1a}-0.30\%$
test_lock_stack_nested 0.3316ms 0.3001ms 3.3318 KOps/s 3.3317 KOps/s $+0.01\%$
test_unlock_nested 60.9244ms 0.4055ms 2.4662 KOps/s 2.4554 KOps/s $\color{#35bf28}+0.44\%$
test_unlock_stack_nested 0.3567ms 0.3078ms 3.2489 KOps/s 3.2422 KOps/s $\color{#35bf28}+0.21\%$
test_flatten_speed 0.3541ms 0.1009ms 9.9093 KOps/s 9.8782 KOps/s $\color{#35bf28}+0.32\%$
test_unflatten_speed 0.3329ms 0.2876ms 3.4770 KOps/s 3.4577 KOps/s $\color{#35bf28}+0.56\%$
test_common_ops 1.0761ms 0.5881ms 1.7003 KOps/s 1.8943 KOps/s $\textbf{\color{#d91a1a}-10.24\%}$
test_creation 13.6900μs 1.6386μs 610.2794 KOps/s 618.5483 KOps/s $\color{#d91a1a}-1.34\%$
test_creation_empty 24.5020μs 9.3211μs 107.2834 KOps/s 161.9405 KOps/s $\textbf{\color{#d91a1a}-33.75\%}$
test_creation_nested_1 41.7020μs 11.1521μs 89.6691 KOps/s 124.7634 KOps/s $\textbf{\color{#d91a1a}-28.13\%}$
test_creation_nested_2 32.6520μs 13.2558μs 75.4387 KOps/s 99.3411 KOps/s $\textbf{\color{#d91a1a}-24.06\%}$
test_clone 36.5520μs 11.9919μs 83.3898 KOps/s 84.6307 KOps/s $\color{#d91a1a}-1.47\%$
test_getitem[int] 56.1830μs 10.8433μs 92.2230 KOps/s 94.0944 KOps/s $\color{#d91a1a}-1.99\%$
test_getitem[slice_int] 49.1130μs 20.7275μs 48.2452 KOps/s 48.6733 KOps/s $\color{#d91a1a}-0.88\%$
test_getitem[range] 66.6630μs 51.6824μs 19.3489 KOps/s 21.4628 KOps/s $\textbf{\color{#d91a1a}-9.85\%}$
test_getitem[tuple] 47.8620μs 18.4220μs 54.2830 KOps/s 54.2612 KOps/s $\color{#35bf28}+0.04\%$
test_getitem[list] 0.1373ms 33.9020μs 29.4968 KOps/s 31.3258 KOps/s $\textbf{\color{#d91a1a}-5.84\%}$
test_setitem_dim[int] 48.8020μs 29.2027μs 34.2434 KOps/s 37.5194 KOps/s $\textbf{\color{#d91a1a}-8.73\%}$
test_setitem_dim[slice_int] 67.0430μs 49.2385μs 20.3093 KOps/s 21.3829 KOps/s $\textbf{\color{#d91a1a}-5.02\%}$
test_setitem_dim[range] 0.1043ms 66.3057μs 15.0817 KOps/s 15.9065 KOps/s $\textbf{\color{#d91a1a}-5.19\%}$
test_setitem_dim[tuple] 60.7730μs 43.1262μs 23.1877 KOps/s 24.2188 KOps/s $\color{#d91a1a}-4.26\%$
test_setitem 40.4820μs 16.8943μs 59.1914 KOps/s 67.2053 KOps/s $\textbf{\color{#d91a1a}-11.92\%}$
test_set 47.7830μs 16.1118μs 62.0663 KOps/s 68.8897 KOps/s $\textbf{\color{#d91a1a}-9.90\%}$
test_set_shared 1.6509ms 99.7247μs 10.0276 KOps/s 10.1543 KOps/s $\color{#d91a1a}-1.25\%$
test_update 69.6430μs 19.4223μs 51.4872 KOps/s 63.1994 KOps/s $\textbf{\color{#d91a1a}-18.53\%}$
test_update_nested 68.5630μs 24.4970μs 40.8213 KOps/s 47.7140 KOps/s $\textbf{\color{#d91a1a}-14.45\%}$
test_update__nested 56.3320μs 22.7084μs 44.0366 KOps/s 45.5721 KOps/s $\color{#d91a1a}-3.37\%$
test_set_nested 60.9230μs 17.4627μs 57.2649 KOps/s 64.6696 KOps/s $\textbf{\color{#d91a1a}-11.45\%}$
test_set_nested_new 58.0030μs 19.9904μs 50.0241 KOps/s 55.3885 KOps/s $\textbf{\color{#d91a1a}-9.68\%}$
test_select 65.2430μs 32.5956μs 30.6789 KOps/s 31.8084 KOps/s $\color{#d91a1a}-3.55\%$
test_select_nested 0.9430ms 55.0771μs 18.1564 KOps/s 18.4845 KOps/s $\color{#d91a1a}-1.78\%$
test_exclude_nested 0.1440ms 0.1102ms 9.0776 KOps/s 9.1086 KOps/s $\color{#d91a1a}-0.34\%$
test_empty[True] 0.3909ms 0.3433ms 2.9129 KOps/s 2.8803 KOps/s $\color{#35bf28}+1.13\%$
test_empty[False] 3.2211μs 0.9279μs 1.0777 MOps/s 1.0881 MOps/s $\color{#d91a1a}-0.95\%$
test_to 0.1042ms 78.3961μs 12.7557 KOps/s 12.8203 KOps/s $\color{#d91a1a}-0.50\%$
test_to_nonblocking 95.4040μs 62.1426μs 16.0920 KOps/s 16.2158 KOps/s $\color{#d91a1a}-0.76\%$
test_unbind_speed 0.2952ms 0.2633ms 3.7981 KOps/s 3.8514 KOps/s $\color{#d91a1a}-1.38\%$
test_unbind_speed_stack0 0.3229ms 0.2667ms 3.7495 KOps/s 3.8279 KOps/s $\color{#d91a1a}-2.05\%$
test_unbind_speed_stack1 75.7494ms 0.8108ms 1.2333 KOps/s 1.2300 KOps/s $\color{#35bf28}+0.27\%$
test_split 76.0605ms 1.6785ms 595.7728 Ops/s 605.0600 Ops/s $\color{#d91a1a}-1.53\%$
test_chunk 75.9622ms 1.6696ms 598.9603 Ops/s 606.3781 Ops/s $\color{#d91a1a}-1.22\%$
test_creation[device0] 0.1284ms 57.0999μs 17.5132 KOps/s 16.8370 KOps/s $\color{#35bf28}+4.02\%$
test_creation_from_tensor 0.1302ms 52.9706μs 18.8784 KOps/s 17.5763 KOps/s $\textbf{\color{#35bf28}+7.41\%}$
test_add_one[memmap_tensor0] 79.2040μs 6.8524μs 145.9340 KOps/s 144.7500 KOps/s $\color{#35bf28}+0.82\%$
test_contiguous[memmap_tensor0] 10.5010μs 0.6628μs 1.5087 MOps/s 1.4690 MOps/s $\color{#35bf28}+2.71\%$
test_stack[memmap_tensor0] 27.7110μs 4.7752μs 209.4167 KOps/s 214.1503 KOps/s $\color{#d91a1a}-2.21\%$
test_memmaptd_index 1.0556ms 0.2932ms 3.4102 KOps/s 3.4471 KOps/s $\color{#d91a1a}-1.07\%$
test_memmaptd_index_astensor 0.7159ms 0.3592ms 2.7841 KOps/s 2.7957 KOps/s $\color{#d91a1a}-0.41\%$
test_memmaptd_index_op 0.9458ms 0.6685ms 1.4959 KOps/s 1.6261 KOps/s $\textbf{\color{#d91a1a}-8.00\%}$
test_serialize_model 0.1823s 0.1100s 9.0908 Ops/s 9.6225 Ops/s $\textbf{\color{#d91a1a}-5.53\%}$
test_serialize_model_pickle 1.3497s 1.2349s 0.8098 Ops/s 0.8090 Ops/s $\color{#35bf28}+0.09\%$
test_serialize_weights 0.1798s 0.1078s 9.2781 Ops/s 8.8334 Ops/s $\textbf{\color{#35bf28}+5.04\%}$
test_serialize_weights_returnearly 0.2899s 0.1008s 9.9244 Ops/s 10.0721 Ops/s $\color{#d91a1a}-1.47\%$
test_serialize_weights_pickle 1.3535s 1.2483s 0.8011 Ops/s 0.8011 Ops/s $-0.00\%$
test_reshape_pytree 0.2248ms 26.0385μs 38.4047 KOps/s 38.5496 KOps/s $\color{#d91a1a}-0.38\%$
test_reshape_td 57.0230μs 31.0161μs 32.2414 KOps/s 32.3691 KOps/s $\color{#d91a1a}-0.39\%$
test_view_pytree 90.1840μs 25.8163μs 38.7353 KOps/s 38.8371 KOps/s $\color{#d91a1a}-0.26\%$
test_view_td 0.2534ms 36.7618μs 27.2022 KOps/s 26.6079 KOps/s $\color{#35bf28}+2.23\%$
test_unbind_pytree 66.9540μs 31.6038μs 31.6418 KOps/s 30.2366 KOps/s $\color{#35bf28}+4.65\%$
test_unbind_td 0.5003ms 42.6429μs 23.4506 KOps/s 24.6029 KOps/s $\color{#d91a1a}-4.68\%$
test_split_pytree 0.1673ms 35.3429μs 28.2942 KOps/s 28.8348 KOps/s $\color{#d91a1a}-1.87\%$
test_split_td 0.2526ms 40.9963μs 24.3925 KOps/s 25.9965 KOps/s $\textbf{\color{#d91a1a}-6.17\%}$
test_add_pytree 66.5030μs 37.3061μs 26.8052 KOps/s 26.6335 KOps/s $\color{#35bf28}+0.64\%$
test_add_td 0.2552ms 49.1382μs 20.3507 KOps/s 20.4660 KOps/s $\color{#d91a1a}-0.56\%$
test_distributed 2.4276ms 69.5977μs 14.3683 KOps/s 14.3977 KOps/s $\color{#d91a1a}-0.20\%$
test_tdmodule 30.6810μs 14.6701μs 68.1659 KOps/s 73.8308 KOps/s $\textbf{\color{#d91a1a}-7.67\%}$
test_tdmodule_dispatch 46.5320μs 29.3253μs 34.1003 KOps/s 37.9724 KOps/s $\textbf{\color{#d91a1a}-10.20\%}$
test_tdseq 26.6410μs 16.6049μs 60.2230 KOps/s 66.2328 KOps/s $\textbf{\color{#d91a1a}-9.07\%}$
test_tdseq_dispatch 56.7730μs 32.8087μs 30.4798 KOps/s 34.5744 KOps/s $\textbf{\color{#d91a1a}-11.84\%}$
test_instantiation_functorch 1.6297ms 1.5246ms 655.9267 Ops/s 662.1924 Ops/s $\color{#d91a1a}-0.95\%$
test_instantiation_td 1.5557ms 1.0404ms 961.1259 Ops/s 972.6590 Ops/s $\color{#d91a1a}-1.19\%$
test_exec_functorch 0.1947ms 0.1493ms 6.6991 KOps/s 6.5931 KOps/s $\color{#35bf28}+1.61\%$
test_exec_functional_call 0.2028ms 0.1374ms 7.2769 KOps/s 7.2071 KOps/s $\color{#35bf28}+0.97\%$
test_exec_td 0.2188ms 0.1367ms 7.3166 KOps/s 7.3457 KOps/s $\color{#d91a1a}-0.40\%$
test_exec_td_decorator 0.4574ms 0.2072ms 4.8254 KOps/s 4.7495 KOps/s $\color{#35bf28}+1.60\%$
test_vmap_mlp_speed[True-True] 0.6320ms 0.5814ms 1.7200 KOps/s 1.7428 KOps/s $\color{#d91a1a}-1.31\%$
test_vmap_mlp_speed[True-False] 0.6476ms 0.5815ms 1.7197 KOps/s 1.7111 KOps/s $\color{#35bf28}+0.50\%$
test_vmap_mlp_speed[False-True] 0.5623ms 0.5137ms 1.9468 KOps/s 1.9185 KOps/s $\color{#35bf28}+1.48\%$
test_vmap_mlp_speed[False-False] 0.5790ms 0.5132ms 1.9484 KOps/s 1.8714 KOps/s $\color{#35bf28}+4.11\%$
test_vmap_mlp_speed_decorator[True-True] 1.0465ms 0.6493ms 1.5401 KOps/s 1.5696 KOps/s $\color{#d91a1a}-1.88\%$
test_vmap_mlp_speed_decorator[True-False] 0.7818ms 0.6458ms 1.5485 KOps/s 1.5727 KOps/s $\color{#d91a1a}-1.54\%$
test_vmap_mlp_speed_decorator[False-True] 0.7136ms 0.5719ms 1.7486 KOps/s 1.7559 KOps/s $\color{#d91a1a}-0.42\%$
test_vmap_mlp_speed_decorator[False-False] 0.7448ms 0.5719ms 1.7486 KOps/s 1.7650 KOps/s $\color{#d91a1a}-0.93\%$
test_vmap_transformer_speed[True-True] 7.7443ms 7.6455ms 130.7951 Ops/s 130.9727 Ops/s $\color{#d91a1a}-0.14\%$
test_vmap_transformer_speed[True-False] 7.7265ms 7.6177ms 131.2738 Ops/s 126.0283 Ops/s $\color{#35bf28}+4.16\%$
test_vmap_transformer_speed[False-True] 7.9649ms 7.6196ms 131.2399 Ops/s 128.0850 Ops/s $\color{#35bf28}+2.46\%$
test_vmap_transformer_speed[False-False] 7.6445ms 7.5832ms 131.8701 Ops/s 130.0957 Ops/s $\color{#35bf28}+1.36\%$
test_vmap_transformer_speed_decorator[True-True] 18.7486ms 18.6345ms 53.6640 Ops/s 53.6545 Ops/s $\color{#35bf28}+0.02\%$
test_vmap_transformer_speed_decorator[True-False] 18.6834ms 18.6160ms 53.7173 Ops/s 53.8816 Ops/s $\color{#d91a1a}-0.30\%$
test_vmap_transformer_speed_decorator[False-True] 18.6612ms 18.5324ms 53.9596 Ops/s 54.0617 Ops/s $\color{#d91a1a}-0.19\%$
test_vmap_transformer_speed_decorator[False-False] 19.0313ms 18.5433ms 53.9278 Ops/s 54.2197 Ops/s $\color{#d91a1a}-0.54\%$
test_to_module_speed[True] 1.6560ms 1.5305ms 653.3724 Ops/s 651.4152 Ops/s $\color{#35bf28}+0.30\%$
test_to_module_speed[False] 1.6276ms 1.5033ms 665.2084 Ops/s 661.4320 Ops/s $\color{#35bf28}+0.57\%$
test_tc_init 46.9720μs 26.7016μs 37.4509 KOps/s 50.5623 KOps/s $\textbf{\color{#d91a1a}-25.93\%}$
test_tc_init_nested 80.2550μs 51.7701μs 19.3162 KOps/s 24.5863 KOps/s $\textbf{\color{#d91a1a}-21.44\%}$
test_tc_first_layer_tensor 0.7618μs 0.3638μs 2.7486 MOps/s 2.7470 MOps/s $\color{#35bf28}+0.06\%$
test_tc_first_layer_nontensor 2.8752μs 0.3945μs 2.5351 MOps/s 2.5429 MOps/s $\color{#d91a1a}-0.31\%$
test_tc_second_layer_tensor 12.3510μs 1.0679μs 936.3972 KOps/s 925.2412 KOps/s $\color{#35bf28}+1.21\%$
test_tc_second_layer_nontensor 1.6156μs 0.8043μs 1.2433 MOps/s 1.1973 MOps/s $\color{#35bf28}+3.85\%$
test_unbind 0.1117s 6.8091ms 146.8626 Ops/s 195.9429 Ops/s $\textbf{\color{#d91a1a}-25.05\%}$
test_full_like 13.5312ms 13.1517ms 76.0359 Ops/s 89.0079 Ops/s $\textbf{\color{#d91a1a}-14.57\%}$
test_zeros_like 8.3105ms 7.8772ms 126.9482 Ops/s 126.6457 Ops/s $\color{#35bf28}+0.24\%$
test_ones_like 8.3604ms 7.9084ms 126.4477 Ops/s 126.6837 Ops/s $\color{#d91a1a}-0.19\%$
test_clone 9.4950ms 9.2655ms 107.9270 Ops/s 108.1171 Ops/s $\color{#d91a1a}-0.18\%$
test_squeeze 64.5130μs 10.9768μs 91.1015 KOps/s 90.2684 KOps/s $\color{#35bf28}+0.92\%$
test_unsqueeze 0.1128ms 51.2741μs 19.5030 KOps/s 18.9019 KOps/s $\color{#35bf28}+3.18\%$
test_split 0.1376ms 95.7752μs 10.4411 KOps/s 10.2145 KOps/s $\color{#35bf28}+2.22\%$
test_permute 0.1405ms 0.1102ms 9.0754 KOps/s 8.6941 KOps/s $\color{#35bf28}+4.39\%$
test_stack 27.0567ms 26.7147ms 37.4326 Ops/s 37.4338 Ops/s $-0.00\%$
test_cat 27.0756ms 26.6812ms 37.4796 Ops/s 37.3677 Ops/s $\color{#35bf28}+0.30\%$

Contributor

@MateuszGuzek MateuszGuzek left a comment

I quickly checked, and it seems the memmap files have write access for the owner:

-rw-r--r--   test/artifacts/mmap_example/nested/bfloat16.memmap

I think it would be better to set them to read-only so that this functionality can be tested (this should work as long as the tests are not run by root).

Otherwise LGTM!
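
For illustration, here is a minimal sketch of the read-only check suggested above. It uses only the Python standard library (`mmap`, `os`, `stat`, `tempfile`) rather than the project's own memmap API, and the file name and helper names are hypothetical. It writes a small file, drops its write bits, maps it read-only, and reports whether the current user can still open it for writing (root typically can, which is why the caveat about root matters).

```python
import mmap
import os
import stat
import tempfile


def write_readonly_file(path: str, payload: bytes) -> None:
    """Write payload to path, then drop all write bits (mode 0o444)."""
    with open(path, "wb") as f:
        f.write(payload)
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)


def read_via_mmap(path: str) -> bytes:
    """Map the file read-only and return a copy of its contents."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return bytes(mm)


def can_open_for_write(path: str) -> bool:
    """Return True if the current user can still open the file for writing."""
    try:
        with open(path, "r+b"):
            return True
    except PermissionError:
        return False


def demo() -> tuple[bytes, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.memmap")  # hypothetical file name
        write_readonly_file(path, b"\x00\x01\x02\x03")
        data = read_via_mmap(path)
        writable = can_open_for_write(path)
        # Restore the owner's write bit so cleanup works on every platform.
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
        return data, writable


if __name__ == "__main__":
    data, writable = demo()
    print(data, "writable by current user:", writable)
```

Reading through the mapping succeeds regardless of the write bits, so a load test against read-only artifacts only exercises the intended path when the test user is not root.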

Contributor Author

vmoens commented Jun 26, 2024

Good point @MateuszGuzek thanks!

Contributor Author

vmoens commented Jun 26, 2024

@MateuszGuzek upon reflection, I wonder if that's a good idea. We want to avoid people cloning the repo and running into weird behaviour when they or someone else deletes the local copy of the repo...

@vmoens vmoens merged commit d89e5c0 into main Jun 26, 2024
37 of 42 checks passed
@vmoens vmoens deleted the test-mmap-load branch June 26, 2024 14:41
3 participants