[Feature] Best attempt to densely stack sub-tds when LazyStacked TDs are passed to maybe_dense_stack #799
Merged
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 36.1380μs | 17.2399μs | 58.0050 KOps/s | 61.0208 KOps/s | |
test_plain_set_stack_nested | 52.4690μs | 17.3866μs | 57.5155 KOps/s | 59.6639 KOps/s | |
test_plain_set_nested_inplace | 65.5420μs | 19.5800μs | 51.0726 KOps/s | 52.4264 KOps/s | |
test_plain_set_stack_nested_inplace | 66.9550μs | 19.6111μs | 50.9916 KOps/s | 53.2368 KOps/s | |
test_items | 28.6830μs | 2.5261μs | 395.8700 KOps/s | 401.1831 KOps/s | |
test_items_nested | 0.5674ms | 0.2678ms | 3.7337 KOps/s | 3.7280 KOps/s | |
test_items_nested_locked | 1.1837ms | 0.2716ms | 3.6825 KOps/s | 3.6418 KOps/s | |
test_items_nested_leaf | 0.1551ms | 76.6907μs | 13.0394 KOps/s | 13.0154 KOps/s | |
test_items_stack_nested | 0.8646ms | 0.2701ms | 3.7026 KOps/s | 3.7263 KOps/s | |
test_items_stack_nested_leaf | 0.1533ms | 77.4368μs | 12.9138 KOps/s | 12.4886 KOps/s | |
test_items_stack_nested_locked | 0.5673ms | 0.2711ms | 3.6887 KOps/s | 3.6524 KOps/s | |
test_keys | 22.4320μs | 3.8684μs | 258.5069 KOps/s | 251.2199 KOps/s | |
test_keys_nested | 0.2638ms | 0.1377ms | 7.2626 KOps/s | 7.2879 KOps/s | |
test_keys_nested_locked | 0.7358ms | 0.1415ms | 7.0687 KOps/s | 6.5193 KOps/s | |
test_keys_nested_leaf | 0.2262ms | 0.1164ms | 8.5889 KOps/s | 8.5176 KOps/s | |
test_keys_stack_nested | 0.2333ms | 0.1351ms | 7.4040 KOps/s | 7.3045 KOps/s | |
test_keys_stack_nested_leaf | 0.2303ms | 0.1151ms | 8.6844 KOps/s | 8.5434 KOps/s | |
test_keys_stack_nested_locked | 0.2338ms | 0.1387ms | 7.2112 KOps/s | 7.0602 KOps/s | |
test_values | 12.2955μs | 1.1657μs | 857.8806 KOps/s | 872.9479 KOps/s | |
test_values_nested | 88.9860μs | 50.2366μs | 19.9058 KOps/s | 19.6808 KOps/s | |
test_values_nested_locked | 0.1163ms | 50.2829μs | 19.8875 KOps/s | 19.5508 KOps/s | |
test_values_nested_leaf | 0.1011ms | 46.2508μs | 21.6213 KOps/s | 21.7506 KOps/s | |
test_values_stack_nested | 99.1960μs | 51.7724μs | 19.3153 KOps/s | 19.1464 KOps/s | |
test_values_stack_nested_leaf | 83.4860μs | 44.8847μs | 22.2793 KOps/s | 21.6353 KOps/s | |
test_values_stack_nested_locked | 0.1076ms | 51.5448μs | 19.4006 KOps/s | 19.3303 KOps/s | |
test_membership | 33.6430μs | 1.3642μs | 733.0166 KOps/s | 740.0969 KOps/s | |
test_membership_nested | 47.0560μs | 3.4688μs | 288.2800 KOps/s | 297.7270 KOps/s | |
test_membership_nested_leaf | 21.3600μs | 3.5011μs | 285.6266 KOps/s | 272.4442 KOps/s | |
test_membership_stacked_nested | 44.1020μs | 3.4467μs | 290.1344 KOps/s | 277.9604 KOps/s | |
test_membership_stacked_nested_leaf | 25.2470μs | 3.4734μs | 287.8993 KOps/s | 292.1649 KOps/s | |
test_membership_nested_last | 27.1010μs | 4.2507μs | 235.2538 KOps/s | 237.6049 KOps/s | |
test_membership_nested_leaf_last | 49.9040μs | 4.2370μs | 236.0177 KOps/s | 239.1778 KOps/s | |
test_membership_stacked_nested_last | 20.6680μs | 6.8387μs | 146.2276 KOps/s | 188.9014 KOps/s | |
test_membership_stacked_nested_leaf_last | 54.5220μs | 6.7721μs | 147.6650 KOps/s | 188.6992 KOps/s | |
test_nested_getleaf | 54.1410μs | 10.4353μs | 95.8281 KOps/s | 95.3513 KOps/s | |
test_nested_get | 44.9840μs | 9.8822μs | 101.1916 KOps/s | 100.4421 KOps/s | |
test_stacked_getleaf | 56.3050μs | 10.3023μs | 97.0658 KOps/s | 96.6423 KOps/s | |
test_stacked_get | 32.7620μs | 9.7578μs | 102.4817 KOps/s | 101.4904 KOps/s | |
test_nested_getitemleaf | 42.2790μs | 11.1984μs | 89.2986 KOps/s | 90.4287 KOps/s | |
test_nested_getitem | 56.5860μs | 10.1837μs | 98.1960 KOps/s | 98.1151 KOps/s | |
test_stacked_getitemleaf | 49.3820μs | 11.4506μs | 87.3318 KOps/s | 92.2719 KOps/s | |
test_stacked_getitem | 31.4990μs | 10.0558μs | 99.4456 KOps/s | 99.8725 KOps/s | |
test_lock_nested | 0.7851ms | 0.3477ms | 2.8757 KOps/s | 2.8953 KOps/s | |
test_lock_stack_nested | 0.5446ms | 0.3014ms | 3.3175 KOps/s | 3.2474 KOps/s | |
test_unlock_nested | 0.7450ms | 0.3498ms | 2.8585 KOps/s | 2.5408 KOps/s | |
test_unlock_stack_nested | 0.5041ms | 0.3097ms | 3.2288 KOps/s | 3.1614 KOps/s | |
test_flatten_speed | 0.5669ms | 99.7113μs | 10.0290 KOps/s | 10.4799 KOps/s | |
test_unflatten_speed | 0.6228ms | 0.4092ms | 2.4435 KOps/s | 2.4216 KOps/s | |
test_common_ops | 3.5146ms | 0.7194ms | 1.3901 KOps/s | 1.4446 KOps/s | |
test_creation | 12.5030μs | 1.9133μs | 522.6621 KOps/s | 520.7480 KOps/s | |
test_creation_empty | 34.4040μs | 11.2396μs | 88.9713 KOps/s | 104.6206 KOps/s | |
test_creation_nested_1 | 41.5580μs | 14.0626μs | 71.1105 KOps/s | 81.1235 KOps/s | |
test_creation_nested_2 | 0.2511ms | 17.3453μs | 57.6525 KOps/s | 63.2576 KOps/s | |
test_clone | 41.8980μs | 13.4389μs | 74.4111 KOps/s | 73.5740 KOps/s | |
test_getitem[int] | 0.1867ms | 13.1151μs | 76.2480 KOps/s | 88.7361 KOps/s | |
test_getitem[slice_int] | 70.5420μs | 22.5381μs | 44.3693 KOps/s | 43.3849 KOps/s | |
test_getitem[range] | 80.0100μs | 61.0222μs | 16.3875 KOps/s | 17.5659 KOps/s | |
test_getitem[tuple] | 50.3840μs | 18.9148μs | 52.8685 KOps/s | 52.9558 KOps/s | |
test_getitem[list] | 0.1092ms | 40.8690μs | 24.4684 KOps/s | 24.8363 KOps/s | |
test_setitem_dim[int] | 60.7440μs | 35.5763μs | 28.1086 KOps/s | 30.9296 KOps/s | |
test_setitem_dim[slice_int] | 0.1798ms | 62.7414μs | 15.9384 KOps/s | 16.6233 KOps/s | |
test_setitem_dim[range] | 0.1196ms | 83.4135μs | 11.9885 KOps/s | 12.4352 KOps/s | |
test_setitem_dim[tuple] | 86.8630μs | 50.5220μs | 19.7934 KOps/s | 20.8188 KOps/s | |
test_setitem | 75.8220μs | 20.6278μs | 48.4783 KOps/s | 50.8417 KOps/s | |
test_set | 47.5590μs | 20.2371μs | 49.4142 KOps/s | 51.7978 KOps/s | |
test_set_shared | 2.9429ms | 0.1409ms | 7.0961 KOps/s | 7.2035 KOps/s | |
test_update | 0.1020ms | 22.4738μs | 44.4963 KOps/s | 48.4133 KOps/s | |
test_update_nested | 0.2784ms | 31.2964μs | 31.9525 KOps/s | 34.9567 KOps/s | |
test_update__nested | 58.6300μs | 25.2570μs | 39.5929 KOps/s | 40.2742 KOps/s | |
test_set_nested | 60.9940μs | 21.9529μs | 45.5520 KOps/s | 47.4747 KOps/s | |
test_set_nested_new | 83.0050μs | 26.9850μs | 37.0576 KOps/s | 39.6845 KOps/s | |
test_select | 83.8860μs | 41.8176μs | 23.9134 KOps/s | 24.3224 KOps/s | |
test_select_nested | 0.1360ms | 60.2073μs | 16.6093 KOps/s | 16.5563 KOps/s | |
test_exclude_nested | 0.4944ms | 0.1225ms | 8.1648 KOps/s | 8.3622 KOps/s | |
test_empty[True] | 0.6129ms | 0.3940ms | 2.5381 KOps/s | 2.5534 KOps/s | |
test_empty[False] | 10.3674μs | 1.1713μs | 853.7853 KOps/s | 840.3650 KOps/s | |
test_unbind_speed | 1.5590ms | 0.2606ms | 3.8369 KOps/s | 3.8365 KOps/s | |
test_unbind_speed_stack0 | 0.4972ms | 0.2512ms | 3.9810 KOps/s | 3.9707 KOps/s | |
test_unbind_speed_stack1 | 67.7382ms | 0.7026ms | 1.4232 KOps/s | 1.2893 KOps/s | |
test_split | 68.6355ms | 1.6226ms | 616.2856 Ops/s | 622.1792 Ops/s | |
test_chunk | 66.3152ms | 1.6167ms | 618.5620 Ops/s | 620.8810 Ops/s | |
test_creation[device0] | 0.1857ms | 84.0041μs | 11.9042 KOps/s | 11.9633 KOps/s | |
test_creation_from_tensor | 3.8386ms | 85.5313μs | 11.6916 KOps/s | 11.8731 KOps/s | |
test_add_one[memmap_tensor0] | 67.0450μs | 5.2829μs | 189.2901 KOps/s | 178.4011 KOps/s | |
test_contiguous[memmap_tensor0] | 9.2280μs | 0.6304μs | 1.5864 MOps/s | 1.5932 MOps/s | |
test_stack[memmap_tensor0] | 51.6060μs | 3.4047μs | 293.7092 KOps/s | 280.8643 KOps/s | |
test_memmaptd_index | 0.9924ms | 0.2559ms | 3.9074 KOps/s | 3.9731 KOps/s | |
test_memmaptd_index_astensor | 0.7765ms | 0.3327ms | 3.0060 KOps/s | 3.0728 KOps/s | |
test_memmaptd_index_op | 0.9745ms | 0.6188ms | 1.6161 KOps/s | 1.7096 KOps/s | |
test_serialize_model | 0.1774s | 0.1127s | 8.8721 Ops/s | 8.3531 Ops/s | |
test_serialize_model_pickle | 0.4462s | 0.3740s | 2.6736 Ops/s | 2.6047 Ops/s | |
test_serialize_weights | 0.1646s | 0.1102s | 9.0731 Ops/s | 8.7215 Ops/s | |
test_serialize_weights_returnearly | 0.1410s | 0.1283s | 7.7918 Ops/s | 7.7932 Ops/s | |
test_serialize_weights_pickle | 0.7404s | 0.4808s | 2.0798 Ops/s | 2.3737 Ops/s | |
test_serialize_weights_filesystem | 99.1066ms | 92.3789ms | 10.8250 Ops/s | 9.7398 Ops/s | |
test_serialize_model_filesystem | 0.1627s | 0.1001s | 9.9887 Ops/s | 10.6609 Ops/s | |
test_reshape_pytree | 51.0350μs | 25.2809μs | 39.5556 KOps/s | 40.0006 KOps/s | |
test_reshape_td | 84.7090μs | 34.2665μs | 29.1830 KOps/s | 29.0899 KOps/s | |
test_view_pytree | 58.1390μs | 25.1049μs | 39.8328 KOps/s | 40.3423 KOps/s | |
test_view_td | 93.8250μs | 38.3650μs | 26.0655 KOps/s | 25.9877 KOps/s | |
test_unbind_pytree | 72.4260μs | 29.0884μs | 34.3780 KOps/s | 34.5134 KOps/s | |
test_unbind_td | 0.4461ms | 38.5741μs | 25.9241 KOps/s | 26.6927 KOps/s | |
test_split_pytree | 63.4590μs | 28.9557μs | 34.5355 KOps/s | 34.9678 KOps/s | |
test_split_td | 0.1266ms | 41.2663μs | 24.2328 KOps/s | 24.4253 KOps/s | |
test_add_pytree | 79.5990μs | 34.3006μs | 29.1540 KOps/s | 29.5712 KOps/s | |
test_add_td | 0.1367ms | 55.7194μs | 17.9471 KOps/s | 19.2365 KOps/s | |
test_distributed | 0.1800ms | 0.1012ms | 9.8798 KOps/s | 9.8098 KOps/s | |
test_tdmodule | 37.0090μs | 17.8689μs | 55.9630 KOps/s | 60.1068 KOps/s | |
test_tdmodule_dispatch | 65.4820μs | 35.6921μs | 28.0174 KOps/s | 29.7575 KOps/s | |
test_tdseq | 40.9160μs | 21.4418μs | 46.6378 KOps/s | 51.3158 KOps/s | |
test_tdseq_dispatch | 65.7830μs | 41.8305μs | 23.9060 KOps/s | 25.7080 KOps/s | |
test_instantiation_functorch | 3.0941ms | 1.3009ms | 768.6999 Ops/s | 760.8725 Ops/s | |
test_instantiation_td | 1.7859ms | 1.0119ms | 988.1992 Ops/s | 995.4290 Ops/s | |
test_exec_functorch | 0.2888ms | 0.1618ms | 6.1817 KOps/s | 6.1066 KOps/s | |
test_exec_functional_call | 0.3336ms | 0.1468ms | 6.8118 KOps/s | 6.3263 KOps/s | |
test_exec_td | 0.2347ms | 0.1423ms | 7.0271 KOps/s | 6.7982 KOps/s | |
test_exec_td_decorator | 0.9583ms | 0.2245ms | 4.4551 KOps/s | 4.0495 KOps/s | |
test_vmap_mlp_speed[True-True] | 0.8023ms | 0.4980ms | 2.0081 KOps/s | 2.0718 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.7726ms | 0.4951ms | 2.0199 KOps/s | 2.0788 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.6185ms | 0.4006ms | 2.4963 KOps/s | 2.5346 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.6470ms | 0.4012ms | 2.4926 KOps/s | 2.5416 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.1675ms | 0.5708ms | 1.7519 KOps/s | 1.8032 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 1.0300ms | 0.5686ms | 1.7586 KOps/s | 1.8107 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.6710ms | 0.4638ms | 2.1560 KOps/s | 2.1871 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7282ms | 0.4652ms | 2.1498 KOps/s | 2.1827 KOps/s | |
test_to_module_speed[True] | 2.0250ms | 1.6922ms | 590.9514 Ops/s | 593.5919 Ops/s | |
test_to_module_speed[False] | 2.6754ms | 1.6702ms | 598.7483 Ops/s | 600.3164 Ops/s | |
test_tc_init | 57.4070μs | 30.1154μs | 33.2056 KOps/s | 38.5811 KOps/s | |
test_tc_init_nested | 0.1104ms | 61.9059μs | 16.1535 KOps/s | 18.3438 KOps/s | |
test_tc_first_layer_tensor | 4.7919μs | 0.6888μs | 1.4517 MOps/s | 1.4354 MOps/s | |
test_tc_first_layer_nontensor | 3.8744μs | 0.6847μs | 1.4605 MOps/s | 1.4887 MOps/s | |
test_tc_second_layer_tensor | 31.4290μs | 1.8679μs | 535.3475 KOps/s | 540.5146 KOps/s | |
test_tc_second_layer_nontensor | 16.9480μs | 1.5455μs | 647.0388 KOps/s | 606.8925 KOps/s | |
test_unbind | 95.3116ms | 6.7311ms | 148.5639 Ops/s | 136.3652 Ops/s | |
test_full_like | 15.6392ms | 11.2311ms | 89.0387 Ops/s | 94.9418 Ops/s | |
test_zeros_like | 12.7691ms | 6.1638ms | 162.2374 Ops/s | 168.7050 Ops/s | |
test_ones_like | 11.9608ms | 6.5543ms | 152.5713 Ops/s | 160.6744 Ops/s | |
test_clone | 15.3112ms | 7.8692ms | 127.0778 Ops/s | 126.3658 Ops/s | |
test_squeeze | 79.4690μs | 14.4905μs | 69.0107 KOps/s | 72.0751 KOps/s | |
test_unsqueeze | 0.1189ms | 60.5513μs | 16.5149 KOps/s | 16.6761 KOps/s | |
test_split | 0.2443ms | 0.1115ms | 8.9693 KOps/s | 8.7581 KOps/s | |
test_permute | 0.1987ms | 0.1263ms | 7.9149 KOps/s | 7.9247 KOps/s | |
test_stack | 28.6346ms | 22.6927ms | 44.0671 Ops/s | 43.6584 Ops/s | |
test_cat | 28.8995ms | 22.6409ms | 44.1678 Ops/s | 44.9788 Ops/s |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 0.5618ms | 13.2933μs | 75.2260 KOps/s | 75.8778 KOps/s | |
test_plain_set_stack_nested | 48.0000μs | 13.4286μs | 74.4678 KOps/s | 73.9275 KOps/s | |
test_plain_set_nested_inplace | 41.7400μs | 14.5325μs | 68.8112 KOps/s | 68.5903 KOps/s | |
test_plain_set_stack_nested_inplace | 40.8200μs | 14.7289μs | 67.8935 KOps/s | 68.2655 KOps/s | |
test_items | 17.5900μs | 4.6637μs | 214.4238 KOps/s | 210.6500 KOps/s | |
test_items_nested | 0.3752ms | 0.3436ms | 2.9102 KOps/s | 2.9476 KOps/s | |
test_items_nested_locked | 0.3868ms | 0.3508ms | 2.8508 KOps/s | 2.9165 KOps/s | |
test_items_nested_leaf | 0.1040ms | 83.4444μs | 11.9840 KOps/s | 12.0688 KOps/s | |
test_items_stack_nested | 0.3979ms | 0.3439ms | 2.9075 KOps/s | 2.9000 KOps/s | |
test_items_stack_nested_leaf | 0.1045ms | 83.9151μs | 11.9168 KOps/s | 12.1251 KOps/s | |
test_items_stack_nested_locked | 0.3751ms | 0.3470ms | 2.8822 KOps/s | 2.9294 KOps/s | |
test_keys | 23.9600μs | 4.3548μs | 229.6331 KOps/s | 230.1944 KOps/s | |
test_keys_nested | 96.2410μs | 67.2960μs | 14.8597 KOps/s | 14.8830 KOps/s | |
test_keys_nested_locked | 2.0746ms | 72.5474μs | 13.7841 KOps/s | 13.8542 KOps/s | |
test_keys_nested_leaf | 92.0510μs | 57.9310μs | 17.2619 KOps/s | 17.2579 KOps/s | |
test_keys_stack_nested | 87.0220μs | 67.6823μs | 14.7749 KOps/s | 14.9076 KOps/s | |
test_keys_stack_nested_leaf | 82.2510μs | 58.1125μs | 17.2080 KOps/s | 17.2727 KOps/s | |
test_keys_stack_nested_locked | 94.8610μs | 72.8622μs | 13.7245 KOps/s | 13.8318 KOps/s | |
test_values | 8.6367μs | 1.8088μs | 552.8502 KOps/s | 544.3453 KOps/s | |
test_values_nested | 64.1110μs | 35.3900μs | 28.2566 KOps/s | 28.4554 KOps/s | |
test_values_nested_locked | 58.7510μs | 37.0591μs | 26.9839 KOps/s | 26.9848 KOps/s | |
test_values_nested_leaf | 53.8420μs | 31.7707μs | 31.4756 KOps/s | 32.1321 KOps/s | |
test_values_stack_nested | 63.0110μs | 36.2387μs | 27.5948 KOps/s | 28.1058 KOps/s | |
test_values_stack_nested_leaf | 63.8510μs | 32.3240μs | 30.9368 KOps/s | 31.5539 KOps/s | |
test_values_stack_nested_locked | 63.2310μs | 38.1092μs | 26.2404 KOps/s | 26.9679 KOps/s | |
test_membership | 13.0500μs | 0.8484μs | 1.1787 MOps/s | 1.3772 MOps/s | |
test_membership_nested | 31.5500μs | 2.5998μs | 384.6496 KOps/s | 380.7351 KOps/s | |
test_membership_nested_leaf | 0.1304ms | 2.6014μs | 384.4049 KOps/s | 378.8372 KOps/s | |
test_membership_stacked_nested | 34.0100μs | 2.5997μs | 384.6609 KOps/s | 382.1272 KOps/s | |
test_membership_stacked_nested_leaf | 13.9200μs | 2.5878μs | 386.4230 KOps/s | 381.5184 KOps/s | |
test_membership_nested_last | 34.5610μs | 3.1479μs | 317.6697 KOps/s | 317.3087 KOps/s | |
test_membership_nested_leaf_last | 16.8100μs | 3.1538μs | 317.0773 KOps/s | 316.0923 KOps/s | |
test_membership_stacked_nested_last | 20.9610μs | 3.9161μs | 255.3567 KOps/s | 314.1615 KOps/s | |
test_membership_stacked_nested_leaf_last | 35.6500μs | 3.9495μs | 253.1977 KOps/s | 318.0146 KOps/s | |
test_nested_getleaf | 46.1210μs | 8.4447μs | 118.4176 KOps/s | 119.5118 KOps/s | |
test_nested_get | 30.4310μs | 7.9043μs | 126.5141 KOps/s | 126.9747 KOps/s | |
test_stacked_getleaf | 25.2500μs | 8.4047μs | 118.9815 KOps/s | 119.0787 KOps/s | |
test_stacked_get | 25.6900μs | 7.9317μs | 126.0763 KOps/s | 127.0536 KOps/s | |
test_nested_getitemleaf | 39.7600μs | 8.5937μs | 116.3637 KOps/s | 117.0969 KOps/s | |
test_nested_getitem | 30.4700μs | 8.1058μs | 123.3682 KOps/s | 124.2327 KOps/s | |
test_stacked_getitemleaf | 25.6100μs | 8.6317μs | 115.8523 KOps/s | 116.0213 KOps/s | |
test_stacked_getitem | 33.7610μs | 8.1084μs | 123.3286 KOps/s | 123.6391 KOps/s | |
test_lock_nested | 60.2918ms | 0.4262ms | 2.3462 KOps/s | 2.3594 KOps/s | |
test_lock_stack_nested | 0.3401ms | 0.3157ms | 3.1681 KOps/s | 3.1756 KOps/s | |
test_unlock_nested | 62.6478ms | 0.4251ms | 2.3523 KOps/s | 2.3571 KOps/s | |
test_unlock_stack_nested | 0.3522ms | 0.3229ms | 3.0974 KOps/s | 3.0946 KOps/s | |
test_flatten_speed | 0.1920ms | 0.1040ms | 9.6178 KOps/s | 9.8803 KOps/s | |
test_unflatten_speed | 0.3535ms | 0.2911ms | 3.4358 KOps/s | 3.4213 KOps/s | |
test_common_ops | 1.2299ms | 0.6087ms | 1.6427 KOps/s | 1.6604 KOps/s | |
test_creation | 0.1840ms | 1.6911μs | 591.3255 KOps/s | 594.3230 KOps/s | |
test_creation_empty | 41.0310μs | 9.4778μs | 105.5096 KOps/s | 106.9315 KOps/s | |
test_creation_nested_1 | 30.4010μs | 11.3729μs | 87.9281 KOps/s | 89.3827 KOps/s | |
test_creation_nested_2 | 0.2006ms | 13.6046μs | 73.5046 KOps/s | 74.4282 KOps/s | |
test_clone | 64.1510μs | 12.8301μs | 77.9418 KOps/s | 83.0808 KOps/s | |
test_getitem[int] | 1.8991ms | 11.8653μs | 84.2796 KOps/s | 86.1587 KOps/s | |
test_getitem[slice_int] | 46.8620μs | 22.1503μs | 45.1461 KOps/s | 46.7261 KOps/s | |
test_getitem[range] | 68.0220μs | 49.3688μs | 20.2557 KOps/s | 20.0607 KOps/s | |
test_getitem[tuple] | 53.6200μs | 19.4067μs | 51.5287 KOps/s | 51.4525 KOps/s | |
test_getitem[list] | 0.2264ms | 34.4587μs | 29.0202 KOps/s | 27.7884 KOps/s | |
test_setitem_dim[int] | 50.0310μs | 31.7414μs | 31.5046 KOps/s | 31.9577 KOps/s | |
test_setitem_dim[slice_int] | 68.4710μs | 50.9308μs | 19.6345 KOps/s | 19.4338 KOps/s | |
test_setitem_dim[range] | 0.1028ms | 68.4006μs | 14.6198 KOps/s | 14.3683 KOps/s | |
test_setitem_dim[tuple] | 64.8310μs | 44.8554μs | 22.2938 KOps/s | 21.8194 KOps/s | |
test_setitem | 65.6910μs | 17.8886μs | 55.9015 KOps/s | 57.8835 KOps/s | |
test_set | 50.8510μs | 17.3173μs | 57.7458 KOps/s | 59.5304 KOps/s | |
test_set_shared | 1.4208ms | 0.1004ms | 9.9575 KOps/s | 10.0255 KOps/s | |
test_update | 85.8510μs | 20.0913μs | 49.7727 KOps/s | 51.2975 KOps/s | |
test_update_nested | 72.6010μs | 25.0267μs | 39.9573 KOps/s | 40.6934 KOps/s | |
test_update__nested | 56.1010μs | 23.6489μs | 42.2852 KOps/s | 43.9103 KOps/s | |
test_set_nested | 73.7610μs | 18.2869μs | 54.6841 KOps/s | 55.9392 KOps/s | |
test_set_nested_new | 68.3710μs | 20.9964μs | 47.6271 KOps/s | 48.0990 KOps/s | |
test_select | 74.1220μs | 34.4069μs | 29.0640 KOps/s | 29.6823 KOps/s | |
test_select_nested | 95.0420μs | 55.3386μs | 18.0706 KOps/s | 18.0692 KOps/s | |
test_exclude_nested | 0.1509ms | 0.1089ms | 9.1863 KOps/s | 9.0065 KOps/s | |
test_empty[True] | 0.3873ms | 0.3432ms | 2.9134 KOps/s | 2.8285 KOps/s | |
test_empty[False] | 2.6680μs | 0.9251μs | 1.0810 MOps/s | 1.0723 MOps/s | |
test_to | 0.1047ms | 79.6150μs | 12.5604 KOps/s | 13.0503 KOps/s | |
test_to_nonblocking | 0.2211ms | 63.0587μs | 15.8582 KOps/s | 16.7110 KOps/s | |
test_unbind_speed | 1.5296ms | 0.2799ms | 3.5729 KOps/s | 3.5739 KOps/s | |
test_unbind_speed_stack0 | 0.3277ms | 0.2773ms | 3.6063 KOps/s | 3.5972 KOps/s | |
test_unbind_speed_stack1 | 84.6137ms | 0.8354ms | 1.1970 KOps/s | 1.1864 KOps/s | |
test_split | 78.4659ms | 1.7522ms | 570.7134 Ops/s | 579.9782 Ops/s | |
test_chunk | 78.2141ms | 1.7475ms | 572.2380 Ops/s | 581.2616 Ops/s | |
test_creation[device0] | 0.1996ms | 60.3693μs | 16.5647 KOps/s | 16.7086 KOps/s | |
test_creation_from_tensor | 0.1329ms | 56.6635μs | 17.6480 KOps/s | 17.7765 KOps/s | |
test_add_one[memmap_tensor0] | 81.0310μs | 7.9791μs | 125.3273 KOps/s | 136.1717 KOps/s | |
test_contiguous[memmap_tensor0] | 25.7910μs | 0.7058μs | 1.4168 MOps/s | 1.4041 MOps/s | |
test_stack[memmap_tensor0] | 30.6310μs | 5.2474μs | 190.5719 KOps/s | 197.6072 KOps/s | |
test_memmaptd_index | 1.0830ms | 0.3024ms | 3.3071 KOps/s | 3.2831 KOps/s | |
test_memmaptd_index_astensor | 0.7220ms | 0.3727ms | 2.6832 KOps/s | 2.6668 KOps/s | |
test_memmaptd_index_op | 1.1528ms | 0.7095ms | 1.4095 KOps/s | 1.4276 KOps/s | |
test_serialize_model | 0.1864s | 0.1124s | 8.8957 Ops/s | 8.3584 Ops/s | |
test_serialize_model_pickle | 1.3514s | 1.2366s | 0.8086 Ops/s | 0.8079 Ops/s | |
test_serialize_weights | 0.1842s | 0.1109s | 9.0144 Ops/s | 9.4856 Ops/s | |
test_serialize_weights_returnearly | 0.2697s | 0.1019s | 9.8145 Ops/s | 12.3241 Ops/s | |
test_serialize_weights_pickle | 1.4017s | 1.2542s | 0.7973 Ops/s | 0.8059 Ops/s | |
test_reshape_pytree | 68.7910μs | 26.7017μs | 37.4508 KOps/s | 37.6060 KOps/s | |
test_reshape_td | 63.7510μs | 31.7934μs | 31.4531 KOps/s | 31.5874 KOps/s | |
test_view_pytree | 0.1753ms | 26.4501μs | 37.8070 KOps/s | 37.9446 KOps/s | |
test_view_td | 62.6810μs | 36.1544μs | 27.6592 KOps/s | 27.7834 KOps/s | |
test_unbind_pytree | 91.6220μs | 32.5139μs | 30.7561 KOps/s | 30.7796 KOps/s | |
test_unbind_td | 0.4559ms | 42.7393μs | 23.3977 KOps/s | 23.9906 KOps/s | |
test_split_pytree | 61.9920μs | 34.9991μs | 28.5722 KOps/s | 27.5547 KOps/s | |
test_split_td | 0.1071ms | 41.7088μs | 23.9758 KOps/s | 24.5117 KOps/s | |
test_add_pytree | 70.3710μs | 39.8937μs | 25.0666 KOps/s | 25.6754 KOps/s | |
test_add_td | 86.0710μs | 51.6141μs | 19.3746 KOps/s | 19.6111 KOps/s | |
test_distributed | 0.2087ms | 66.1867μs | 15.1088 KOps/s | 13.8371 KOps/s | |
test_tdmodule | 38.3400μs | 15.0494μs | 66.4478 KOps/s | 64.7325 KOps/s | |
test_tdmodule_dispatch | 53.2110μs | 29.3514μs | 34.0700 KOps/s | 34.3024 KOps/s | |
test_tdseq | 32.8410μs | 17.0678μs | 58.5897 KOps/s | 59.2385 KOps/s | |
test_tdseq_dispatch | 50.3110μs | 32.6542μs | 30.6239 KOps/s | 29.9594 KOps/s | |
test_instantiation_functorch | 1.6729ms | 1.5289ms | 654.0643 Ops/s | 643.1564 Ops/s | |
test_instantiation_td | 1.5363ms | 1.0467ms | 955.3943 Ops/s | 942.6270 Ops/s | |
test_exec_functorch | 0.1878ms | 0.1554ms | 6.4353 KOps/s | 6.6135 KOps/s | |
test_exec_functional_call | 0.1865ms | 0.1461ms | 6.8453 KOps/s | 7.1329 KOps/s | |
test_exec_td | 0.1740ms | 0.1434ms | 6.9756 KOps/s | 7.0724 KOps/s | |
test_exec_td_decorator | 0.6995ms | 0.2171ms | 4.6065 KOps/s | 4.6762 KOps/s | |
test_vmap_mlp_speed[True-True] | 0.7759ms | 0.6045ms | 1.6542 KOps/s | 1.6412 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.7405ms | 0.6018ms | 1.6617 KOps/s | 1.6458 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.6842ms | 0.5315ms | 1.8814 KOps/s | 1.8706 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.6710ms | 0.5312ms | 1.8826 KOps/s | 1.8750 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.2659ms | 0.6629ms | 1.5085 KOps/s | 1.4925 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.8069ms | 0.6600ms | 1.5152 KOps/s | 1.5054 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7364ms | 0.5863ms | 1.7057 KOps/s | 1.6949 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.6972ms | 0.5862ms | 1.7060 KOps/s | 1.6952 KOps/s | |
test_vmap_transformer_speed[True-True] | 8.1704ms | 8.0729ms | 123.8705 Ops/s | 123.7563 Ops/s | |
test_vmap_transformer_speed[True-False] | 8.2426ms | 8.0712ms | 123.8968 Ops/s | 124.4629 Ops/s | |
test_vmap_transformer_speed[False-True] | 8.2453ms | 8.0182ms | 124.7164 Ops/s | 122.4563 Ops/s | |
test_vmap_transformer_speed[False-False] | 8.1195ms | 8.0052ms | 124.9194 Ops/s | 121.6465 Ops/s | |
test_vmap_transformer_speed_decorator[True-True] | 19.6718ms | 19.4866ms | 51.3173 Ops/s | 50.3080 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 19.5770ms | 19.4482ms | 51.4187 Ops/s | 50.0549 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 19.4967ms | 19.3738ms | 51.6160 Ops/s | 50.4682 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 20.0682ms | 19.3744ms | 51.6145 Ops/s | 50.3859 Ops/s | |
test_to_module_speed[True] | 1.7035ms | 1.5109ms | 661.8533 Ops/s | 643.0599 Ops/s | |
test_to_module_speed[False] | 1.6527ms | 1.4914ms | 670.5124 Ops/s | 655.2732 Ops/s | |
test_tc_init | 74.7920μs | 26.0798μs | 38.3439 KOps/s | 38.1273 KOps/s | |
test_tc_init_nested | 88.7210μs | 53.5705μs | 18.6670 KOps/s | 17.8107 KOps/s | |
test_tc_first_layer_tensor | 1.1860μs | 0.3573μs | 2.7988 MOps/s | 2.7481 MOps/s | |
test_tc_first_layer_nontensor | 10.3832μs | 0.3903μs | 2.5621 MOps/s | 2.5692 MOps/s | |
test_tc_second_layer_tensor | 16.9600μs | 1.0767μs | 928.7543 KOps/s | 1.0254 MOps/s | |
test_tc_second_layer_nontensor | 5.4118μs | 0.8265μs | 1.2100 MOps/s | 1.2487 MOps/s | |
test_unbind | 0.1046s | 8.3051ms | 120.4087 Ops/s | 184.8951 Ops/s | |
test_full_like | 14.5223ms | 13.5712ms | 73.6854 Ops/s | 102.1083 Ops/s | |
test_zeros_like | 8.0842ms | 7.8666ms | 127.1199 Ops/s | 140.6153 Ops/s | |
test_ones_like | 8.2602ms | 7.8993ms | 126.5935 Ops/s | 139.6547 Ops/s | |
test_clone | 10.3489ms | 9.7199ms | 102.8820 Ops/s | 100.4318 Ops/s | |
test_squeeze | 66.5410μs | 11.2613μs | 88.7997 KOps/s | 91.8872 KOps/s | |
test_unsqueeze | 0.1875ms | 53.8190μs | 18.5808 KOps/s | 19.2897 KOps/s | |
test_split | 0.1794ms | 99.5630μs | 10.0439 KOps/s | 9.9587 KOps/s | |
test_permute | 0.2024ms | 0.1129ms | 8.8569 KOps/s | 9.0306 KOps/s | |
test_stack | 30.6459ms | 29.9010ms | 33.4437 Ops/s | 33.7804 Ops/s | |
test_cat | 30.6095ms | 29.8012ms | 33.5557 Ops/s | 33.6517 Ops/s |
Labels: CLA Signed, enhancement
The goal of this PR is that when two structurally identical lazy stacks are stacked, the resulting tensordict is a lazy stack containing dense tensordicts.
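
To illustrate the intended behaviour, here is a toy model in plain Python. It is not tensordict code: `ToyLazyStack`, `dense_stack` and `stack_lazy_stacks` are hypothetical names, each "tensordict" is reduced to a mapping from key to tensor shape, and the sketch simply raises when the structures differ, which is a choice made here rather than a statement about the library.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]
# Toy stand-in for a tensordict: key -> shape of the tensor stored under it.
ToyTD = Dict[str, Shape]


class ToyLazyStack:
    """Keeps its sub-tds separate because their shapes may differ."""

    def __init__(self, items: List[ToyTD]) -> None:
        self.items = items


def dense_stack(tds: List[ToyTD]) -> ToyTD:
    """Stack along a new leading dim; needs identical keys and shapes."""
    first = tds[0]
    for td in tds[1:]:
        if td != first:
            raise ValueError("keys or shapes differ; cannot stack densely")
    return {key: (len(tds), *shape) for key, shape in first.items()}


def stack_lazy_stacks(stacks: List[ToyLazyStack]) -> ToyLazyStack:
    """Stack lazy stacks of equal length by densely stacking the
    sub-tds found at the same position in each of them."""
    n = len(stacks[0].items)
    if any(len(s.items) != n for s in stacks):
        raise ValueError("lazy stacks differ in length")
    return ToyLazyStack(
        [dense_stack([s.items[i] for s in stacks]) for i in range(n)]
    )


# Two lazy stacks with the same structure: "x" is heterogeneous inside
# each stack (3 vs 5 elements) but matches position by position.
a = ToyLazyStack([{"x": (3,)}, {"x": (5,)}])
b = ToyLazyStack([{"x": (3,)}, {"x": (5,)}])
out = stack_lazy_stacks([a, b])
print(out.items)
```

The result is still a lazy stack, since the 3-element and 5-element entries cannot be merged, but each of its items is now a dense stack of the matching sub-tds.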
cc @matteobettini @dtsaras