[Performance] Faster tensorclass #791
Merged
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 32.5010μs | 16.5895μs | 60.2791 KOps/s | 57.1999 KOps/s | |
test_plain_set_stack_nested | 51.7470μs | 17.0070μs | 58.7993 KOps/s | 56.3115 KOps/s | |
test_plain_set_nested_inplace | 56.9160μs | 19.0482μs | 52.4984 KOps/s | 49.8819 KOps/s | |
test_plain_set_stack_nested_inplace | 58.3090μs | 19.0146μs | 52.5910 KOps/s | 50.4222 KOps/s | |
test_items | 30.8980μs | 2.4635μs | 405.9201 KOps/s | 392.2748 KOps/s | |
test_items_nested | 1.3305ms | 0.2667ms | 3.7495 KOps/s | 3.7632 KOps/s | |
test_items_nested_locked | 0.4339ms | 0.2662ms | 3.7563 KOps/s | 3.6986 KOps/s | |
test_items_nested_leaf | 0.1285ms | 77.3562μs | 12.9272 KOps/s | 12.9785 KOps/s | |
test_items_stack_nested | 0.4554ms | 0.2668ms | 3.7476 KOps/s | 3.7218 KOps/s | |
test_items_stack_nested_leaf | 0.1522ms | 77.4172μs | 12.9170 KOps/s | 12.7323 KOps/s | |
test_items_stack_nested_locked | 0.3920ms | 0.2698ms | 3.7059 KOps/s | 3.7640 KOps/s | |
test_keys | 30.2970μs | 3.8844μs | 257.4377 KOps/s | 260.1484 KOps/s | |
test_keys_nested | 0.2955ms | 0.1394ms | 7.1714 KOps/s | 7.2840 KOps/s | |
test_keys_nested_locked | 2.2074ms | 0.1432ms | 6.9846 KOps/s | 7.0485 KOps/s | |
test_keys_nested_leaf | 0.2030ms | 0.1177ms | 8.4983 KOps/s | 8.4660 KOps/s | |
test_keys_stack_nested | 0.2092ms | 0.1383ms | 7.2294 KOps/s | 7.2559 KOps/s | |
test_keys_stack_nested_leaf | 0.1987ms | 0.1175ms | 8.5071 KOps/s | 8.4864 KOps/s | |
test_keys_stack_nested_locked | 0.2357ms | 0.1418ms | 7.0546 KOps/s | 6.9523 KOps/s | |
test_values | 10.3245μs | 1.1690μs | 855.4678 KOps/s | 836.6975 KOps/s | |
test_values_nested | 93.3250μs | 50.6421μs | 19.7464 KOps/s | 19.8304 KOps/s | |
test_values_nested_locked | 92.7230μs | 50.4406μs | 19.8253 KOps/s | 19.6974 KOps/s | |
test_values_nested_leaf | 90.4600μs | 46.2410μs | 21.6258 KOps/s | 22.0217 KOps/s | |
test_values_stack_nested | 0.1026ms | 51.1682μs | 19.5434 KOps/s | 19.6496 KOps/s | |
test_values_stack_nested_leaf | 91.3820μs | 46.3478μs | 21.5760 KOps/s | 22.0101 KOps/s | |
test_values_stack_nested_locked | 0.1035ms | 50.8055μs | 19.6829 KOps/s | 19.5380 KOps/s | |
test_membership | 12.8540μs | 1.3334μs | 749.9373 KOps/s | 714.7484 KOps/s | |
test_membership_nested | 51.2260μs | 3.4136μs | 292.9484 KOps/s | 289.0615 KOps/s | |
test_membership_nested_leaf | 57.6180μs | 3.4306μs | 291.4946 KOps/s | 290.2561 KOps/s | |
test_membership_stacked_nested | 42.2390μs | 3.4064μs | 293.5657 KOps/s | 294.4581 KOps/s | |
test_membership_stacked_nested_leaf | 30.5070μs | 3.4626μs | 288.8011 KOps/s | 290.1254 KOps/s | |
test_membership_nested_last | 30.7580μs | 4.1853μs | 238.9325 KOps/s | 236.4472 KOps/s | |
test_membership_nested_leaf_last | 39.0330μs | 4.2118μs | 237.4269 KOps/s | 240.4541 KOps/s | |
test_membership_stacked_nested_last | 31.7100μs | 4.1987μs | 238.1697 KOps/s | 234.7829 KOps/s | |
test_membership_stacked_nested_leaf_last | 31.8300μs | 4.2322μs | 236.2844 KOps/s | 235.8720 KOps/s | |
test_nested_getleaf | 79.8300μs | 10.6993μs | 93.4642 KOps/s | 94.8256 KOps/s | |
test_nested_get | 48.8110μs | 10.1641μs | 98.3852 KOps/s | 100.4440 KOps/s | |
test_stacked_getleaf | 82.4150μs | 10.7142μs | 93.3337 KOps/s | 94.9165 KOps/s | |
test_stacked_get | 36.5390μs | 10.0107μs | 99.8933 KOps/s | 99.4684 KOps/s | |
test_nested_getitemleaf | 3.6831ms | 11.3295μs | 88.2652 KOps/s | 89.6018 KOps/s | |
test_nested_getitem | 59.9210μs | 10.2522μs | 97.5400 KOps/s | 100.4627 KOps/s | |
test_stacked_getitemleaf | 91.8720μs | 11.1768μs | 89.4714 KOps/s | 89.4540 KOps/s | |
test_stacked_getitem | 45.8650μs | 10.2094μs | 97.9486 KOps/s | 98.4882 KOps/s | |
test_lock_nested | 60.9636ms | 0.4265ms | 2.3447 KOps/s | 2.7621 KOps/s | |
test_lock_stack_nested | 0.4849ms | 0.3144ms | 3.1808 KOps/s | 3.1753 KOps/s | |
test_unlock_nested | 1.6483ms | 0.3571ms | 2.8006 KOps/s | 2.3968 KOps/s | |
test_unlock_stack_nested | 0.4286ms | 0.3169ms | 3.1556 KOps/s | 3.0804 KOps/s | |
test_flatten_speed | 0.2671ms | 96.0424μs | 10.4121 KOps/s | 10.2913 KOps/s | |
test_unflatten_speed | 0.6054ms | 0.4089ms | 2.4458 KOps/s | 2.4240 KOps/s | |
test_common_ops | 1.6815ms | 0.7201ms | 1.3887 KOps/s | 1.3054 KOps/s | |
test_creation | 21.1090μs | 1.8823μs | 531.2566 KOps/s | 528.1358 KOps/s | |
test_creation_empty | 0.1074ms | 10.3090μs | 97.0028 KOps/s | 85.8574 KOps/s | |
test_creation_nested_1 | 54.2810μs | 13.1282μs | 76.1720 KOps/s | 69.0223 KOps/s | |
test_creation_nested_2 | 62.5670μs | 16.3863μs | 61.0265 KOps/s | 54.9944 KOps/s | |
test_clone | 0.1459ms | 13.6171μs | 73.4372 KOps/s | 72.6865 KOps/s | |
test_getitem[int] | 53.0090μs | 11.4041μs | 87.6879 KOps/s | 84.5881 KOps/s | |
test_getitem[slice_int] | 57.1670μs | 22.2646μs | 44.9144 KOps/s | 42.1508 KOps/s | |
test_getitem[range] | 79.6990μs | 58.8822μs | 16.9831 KOps/s | 16.5304 KOps/s | |
test_getitem[tuple] | 66.8450μs | 18.7990μs | 53.1943 KOps/s | 51.4087 KOps/s | |
test_getitem[list] | 0.1754ms | 41.8651μs | 23.8862 KOps/s | 22.8241 KOps/s | |
test_setitem_dim[int] | 75.2900μs | 36.0437μs | 27.7441 KOps/s | 27.8166 KOps/s | |
test_setitem_dim[slice_int] | 0.1244ms | 62.8275μs | 15.9166 KOps/s | 15.4934 KOps/s | |
test_setitem_dim[range] | 0.1787ms | 87.3861μs | 11.4435 KOps/s | 11.0522 KOps/s | |
test_setitem_dim[tuple] | 0.1532ms | 51.0133μs | 19.6027 KOps/s | 19.4713 KOps/s | |
test_setitem | 69.4500μs | 20.8401μs | 47.9844 KOps/s | 46.5433 KOps/s | |
test_set | 0.1016ms | 20.4525μs | 48.8938 KOps/s | 47.3024 KOps/s | |
test_set_shared | 3.6495ms | 0.1438ms | 6.9521 KOps/s | 6.8720 KOps/s | |
test_update | 0.1484ms | 22.2411μs | 44.9617 KOps/s | 42.1412 KOps/s | |
test_update_nested | 96.1400μs | 30.5301μs | 32.7546 KOps/s | 31.7422 KOps/s | |
test_update__nested | 64.2500μs | 24.9439μs | 40.0899 KOps/s | 38.5167 KOps/s | |
test_set_nested | 98.4640μs | 22.2937μs | 44.8557 KOps/s | 43.4141 KOps/s | |
test_set_nested_new | 0.1123ms | 26.2807μs | 38.0508 KOps/s | 37.0491 KOps/s | |
test_select | 0.1131ms | 41.7917μs | 23.9282 KOps/s | 23.2722 KOps/s | |
test_select_nested | 0.1458ms | 60.8132μs | 16.4438 KOps/s | 16.2010 KOps/s | |
test_exclude_nested | 0.2296ms | 0.1213ms | 8.2434 KOps/s | 8.1616 KOps/s | |
test_empty[True] | 0.5434ms | 0.3965ms | 2.5220 KOps/s | 2.4811 KOps/s | |
test_empty[False] | 7.3678μs | 1.0825μs | 923.7492 KOps/s | 922.3668 KOps/s | |
test_unbind_speed | 0.3709ms | 0.2650ms | 3.7742 KOps/s | 3.8146 KOps/s | |
test_unbind_speed_stack0 | 0.4095ms | 0.2545ms | 3.9289 KOps/s | 3.8856 KOps/s | |
test_unbind_speed_stack1 | 72.8653ms | 0.7317ms | 1.3666 KOps/s | 1.2480 KOps/s | |
test_split | 73.3713ms | 1.5946ms | 627.1275 Ops/s | 597.5835 Ops/s | |
test_chunk | 70.5256ms | 1.5921ms | 628.1126 Ops/s | 636.5812 Ops/s | |
test_creation[device0] | 3.6176ms | 87.2238μs | 11.4648 KOps/s | 11.5787 KOps/s | |
test_creation_from_tensor | 0.2341ms | 86.9931μs | 11.4952 KOps/s | 11.1252 KOps/s | |
test_add_one[memmap_tensor0] | 0.1111ms | 5.3906μs | 185.5072 KOps/s | 178.6133 KOps/s | |
test_contiguous[memmap_tensor0] | 22.5920μs | 0.6486μs | 1.5417 MOps/s | 1.5927 MOps/s | |
test_stack[memmap_tensor0] | 18.3940μs | 3.6371μs | 274.9417 KOps/s | 268.3840 KOps/s | |
test_memmaptd_index | 1.1589ms | 0.2557ms | 3.9110 KOps/s | 3.8679 KOps/s | |
test_memmaptd_index_astensor | 0.7838ms | 0.3334ms | 2.9991 KOps/s | 2.9740 KOps/s | |
test_memmaptd_index_op | 1.1775ms | 0.6296ms | 1.5884 KOps/s | 1.5124 KOps/s | |
test_serialize_model | 0.1848s | 0.1174s | 8.5212 Ops/s | 9.0403 Ops/s | |
test_serialize_model_pickle | 0.4484s | 0.3815s | 2.6212 Ops/s | 2.5609 Ops/s | |
test_serialize_weights | 0.1100s | 0.1039s | 9.6228 Ops/s | 8.0770 Ops/s | |
test_serialize_weights_returnearly | 0.2010s | 0.1384s | 7.2247 Ops/s | 6.8012 Ops/s | |
test_serialize_weights_pickle | 0.9629s | 0.6640s | 1.5060 Ops/s | 2.4230 Ops/s | |
test_serialize_weights_filesystem | 0.1724s | 0.1022s | 9.7826 Ops/s | 10.1794 Ops/s | |
test_serialize_model_filesystem | 0.1041s | 94.8076ms | 10.5477 Ops/s | 9.5574 Ops/s | |
test_reshape_pytree | 88.7260μs | 25.6763μs | 38.9465 KOps/s | 38.4575 KOps/s | |
test_reshape_td | 0.1198ms | 33.3170μs | 30.0147 KOps/s | 28.9781 KOps/s | |
test_view_pytree | 82.8350μs | 25.6826μs | 38.9368 KOps/s | 39.2178 KOps/s | |
test_view_td | 95.4290μs | 37.6174μs | 26.5834 KOps/s | 26.4313 KOps/s | |
test_unbind_pytree | 82.9060μs | 29.5934μs | 33.7913 KOps/s | 33.7250 KOps/s | |
test_unbind_td | 0.4029ms | 38.4692μs | 25.9948 KOps/s | 26.0164 KOps/s | |
test_split_pytree | 78.5270μs | 29.5466μs | 33.8449 KOps/s | 33.7769 KOps/s | |
test_split_td | 0.1138ms | 40.6200μs | 24.6184 KOps/s | 23.3832 KOps/s | |
test_add_pytree | 76.1530μs | 35.0054μs | 28.5671 KOps/s | 28.0725 KOps/s | |
test_add_td | 0.1731ms | 58.4169μs | 17.1183 KOps/s | 17.0051 KOps/s | |
test_distributed | 0.3130ms | 0.1055ms | 9.4803 KOps/s | 9.4008 KOps/s | |
test_tdmodule | 0.1104ms | 18.4043μs | 54.3351 KOps/s | 54.6497 KOps/s | |
test_tdmodule_dispatch | 69.7410μs | 35.8339μs | 27.9065 KOps/s | 26.7691 KOps/s | |
test_tdseq | 55.0340μs | 20.6727μs | 48.3730 KOps/s | 40.9574 KOps/s | |
test_tdseq_dispatch | 83.0360μs | 39.9117μs | 25.0553 KOps/s | 23.6854 KOps/s | |
test_instantiation_functorch | 1.9403ms | 1.3250ms | 754.7450 Ops/s | 749.7212 Ops/s | |
test_instantiation_td | 1.9188ms | 1.0368ms | 964.5192 Ops/s | 950.7808 Ops/s | |
test_exec_functorch | 0.3374ms | 0.1610ms | 6.2119 KOps/s | 6.1080 KOps/s | |
test_exec_functional_call | 0.2918ms | 0.1502ms | 6.6598 KOps/s | 6.5308 KOps/s | |
test_exec_td | 0.2875ms | 0.1469ms | 6.8082 KOps/s | 6.6771 KOps/s | |
test_exec_td_decorator | 1.6296ms | 0.2225ms | 4.4949 KOps/s | 4.4837 KOps/s | |
test_vmap_mlp_speed[True-True] | 0.9453ms | 0.4928ms | 2.0293 KOps/s | 2.0246 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.8719ms | 0.4838ms | 2.0669 KOps/s | 2.0231 KOps/s | |
test_vmap_mlp_speed[False-True] | 1.1131ms | 0.4265ms | 2.3446 KOps/s | 2.4921 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.7529ms | 0.4016ms | 2.4900 KOps/s | 2.4771 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.3746ms | 0.5656ms | 1.7681 KOps/s | 1.7549 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.9811ms | 0.5602ms | 1.7850 KOps/s | 1.7585 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.8517ms | 0.4668ms | 2.1420 KOps/s | 2.1555 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7977ms | 0.4668ms | 2.1425 KOps/s | 2.1495 KOps/s | |
test_to_module_speed[True] | 2.5973ms | 1.6996ms | 588.3607 Ops/s | 584.4152 Ops/s | |
test_to_module_speed[False] | 2.6243ms | 1.6827ms | 594.2974 Ops/s | 549.7325 Ops/s | |
test_tc_init | 61.2750μs | 28.5574μs | 35.0171 KOps/s | 14.5655 KOps/s | |
test_tc_init_nested | 0.1190ms | 58.6180μs | 17.0596 KOps/s | 7.1687 KOps/s | |
test_tc_first_layer_tensor | 4.9793μs | 0.6832μs | 1.4638 MOps/s | 160.8866 KOps/s | |
test_tc_first_layer_nontensor | 1.8971μs | 0.6752μs | 1.4811 MOps/s | 159.4670 KOps/s | |
test_tc_second_layer_tensor | 24.1450μs | 1.8234μs | 548.4121 KOps/s | 87.9681 KOps/s | |
test_tc_second_layer_nontensor | 8.9333μs | 1.4777μs | 676.7266 KOps/s | 86.1024 KOps/s | |
test_unbind | 94.1599ms | 7.3748ms | 135.5976 Ops/s | 74.7854 Ops/s | |
test_full_like | 19.4543ms | 12.1912ms | 82.0261 Ops/s | 84.8644 Ops/s | |
test_zeros_like | 12.2971ms | 6.5376ms | 152.9614 Ops/s | 152.1959 Ops/s | |
test_ones_like | 15.4200ms | 6.5841ms | 151.8822 Ops/s | 141.3483 Ops/s | |
test_clone | 14.3957ms | 8.6462ms | 115.6583 Ops/s | 117.9237 Ops/s | |
test_squeeze | 72.1440μs | 14.4483μs | 69.2124 KOps/s | 36.0786 KOps/s | |
test_unsqueeze | 0.1751ms | 70.9901μs | 14.0865 KOps/s | 10.0089 KOps/s | |
test_split | 0.1943ms | 0.1125ms | 8.8890 KOps/s | 5.8273 KOps/s | |
test_permute | 0.2396ms | 0.1375ms | 7.2749 KOps/s | 5.6667 KOps/s | |
test_stack | 30.7516ms | 23.7975ms | 42.0213 Ops/s | 39.9472 Ops/s | |
test_cat | 41.3268ms | 25.3404ms | 39.4627 Ops/s | 39.7305 Ops/s |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 39.4810μs | 20.6723μs | 48.3738 KOps/s | 47.2634 KOps/s | |
test_plain_set_stack_nested | 40.4110μs | 20.7430μs | 48.2090 KOps/s | 47.4050 KOps/s | |
test_plain_set_nested_inplace | 44.3720μs | 23.1374μs | 43.2201 KOps/s | 42.1874 KOps/s | |
test_plain_set_stack_nested_inplace | 45.0900μs | 22.9479μs | 43.5771 KOps/s | 42.2361 KOps/s | |
test_items | 18.0410μs | 4.1893μs | 238.7027 KOps/s | 230.7624 KOps/s | |
test_items_nested | 0.3687ms | 0.3406ms | 2.9362 KOps/s | 2.8756 KOps/s | |
test_items_nested_locked | 0.3919ms | 0.3434ms | 2.9125 KOps/s | 2.8590 KOps/s | |
test_items_nested_leaf | 0.1212ms | 0.1012ms | 9.8819 KOps/s | 9.8736 KOps/s | |
test_items_stack_nested | 0.3698ms | 0.3449ms | 2.8995 KOps/s | 2.8963 KOps/s | |
test_items_stack_nested_leaf | 0.1244ms | 0.1005ms | 9.9459 KOps/s | 9.7130 KOps/s | |
test_items_stack_nested_locked | 0.3949ms | 0.3425ms | 2.9200 KOps/s | 2.8896 KOps/s | |
test_keys | 23.5000μs | 4.7207μs | 211.8339 KOps/s | 209.0952 KOps/s | |
test_keys_nested | 0.2064ms | 0.1662ms | 6.0186 KOps/s | 5.9232 KOps/s | |
test_keys_nested_locked | 0.7297ms | 0.1714ms | 5.8343 KOps/s | 5.8020 KOps/s | |
test_keys_nested_leaf | 0.1772ms | 0.1436ms | 6.9627 KOps/s | 6.9023 KOps/s | |
test_keys_stack_nested | 0.1830ms | 0.1627ms | 6.1445 KOps/s | 5.9120 KOps/s | |
test_keys_stack_nested_leaf | 0.2245ms | 0.1412ms | 7.0825 KOps/s | 6.8450 KOps/s | |
test_keys_stack_nested_locked | 0.2284ms | 0.1674ms | 5.9723 KOps/s | 5.8394 KOps/s | |
test_values | 8.8737μs | 2.0388μs | 490.4812 KOps/s | 489.8219 KOps/s | |
test_values_nested | 83.0610μs | 59.7844μs | 16.7268 KOps/s | 16.2391 KOps/s | |
test_values_nested_locked | 82.7020μs | 60.6482μs | 16.4885 KOps/s | 16.0648 KOps/s | |
test_values_nested_leaf | 89.1510μs | 54.6311μs | 18.3046 KOps/s | 17.9757 KOps/s | |
test_values_stack_nested | 98.0330μs | 60.7869μs | 16.4509 KOps/s | 15.9196 KOps/s | |
test_values_stack_nested_leaf | 85.6620μs | 53.5596μs | 18.6708 KOps/s | 17.4154 KOps/s | |
test_values_stack_nested_locked | 87.2620μs | 60.7130μs | 16.4709 KOps/s | 16.2881 KOps/s | |
test_membership | 37.0410μs | 1.5079μs | 663.1619 KOps/s | 650.5227 KOps/s | |
test_membership_nested | 19.1800μs | 3.7863μs | 264.1087 KOps/s | 259.2228 KOps/s | |
test_membership_nested_leaf | 20.7900μs | 3.7951μs | 263.4969 KOps/s | 257.4419 KOps/s | |
test_membership_stacked_nested | 24.1310μs | 3.7995μs | 263.1899 KOps/s | 256.6625 KOps/s | |
test_membership_stacked_nested_leaf | 34.4000μs | 3.7634μs | 265.7169 KOps/s | 258.0000 KOps/s | |
test_membership_nested_last | 20.3400μs | 4.6823μs | 213.5683 KOps/s | 212.8543 KOps/s | |
test_membership_nested_leaf_last | 29.8210μs | 4.7164μs | 212.0248 KOps/s | 211.9774 KOps/s | |
test_membership_stacked_nested_last | 25.1510μs | 8.2598μs | 121.0685 KOps/s | 210.7908 KOps/s | |
test_membership_stacked_nested_leaf_last | 38.7300μs | 8.2713μs | 120.8993 KOps/s | 212.9455 KOps/s | |
test_nested_getleaf | 34.8500μs | 12.9354μs | 77.3075 KOps/s | 74.7961 KOps/s | |
test_nested_get | 31.0810μs | 12.3489μs | 80.9788 KOps/s | 78.9636 KOps/s | |
test_stacked_getleaf | 35.7410μs | 12.9353μs | 77.3077 KOps/s | 74.6475 KOps/s | |
test_stacked_get | 39.7910μs | 12.2511μs | 81.6253 KOps/s | 79.2026 KOps/s | |
test_nested_getitemleaf | 37.5900μs | 13.3055μs | 75.1568 KOps/s | 72.6193 KOps/s | |
test_nested_getitem | 31.7000μs | 12.4591μs | 80.2626 KOps/s | 78.0123 KOps/s | |
test_stacked_getitemleaf | 42.9400μs | 13.3752μs | 74.7652 KOps/s | 72.4123 KOps/s | |
test_stacked_getitem | 41.0610μs | 12.5008μs | 79.9946 KOps/s | 77.7348 KOps/s | |
test_lock_nested | 0.7963ms | 0.3961ms | 2.5244 KOps/s | 2.1969 KOps/s | |
test_lock_stack_nested | 0.3996ms | 0.3463ms | 2.8878 KOps/s | 2.8304 KOps/s | |
test_unlock_nested | 0.7669ms | 0.4034ms | 2.4791 KOps/s | 2.1691 KOps/s | |
test_unlock_stack_nested | 0.4061ms | 0.3635ms | 2.7511 KOps/s | 2.7169 KOps/s | |
test_flatten_speed | 0.4183ms | 0.1226ms | 8.1559 KOps/s | 8.1931 KOps/s | |
test_unflatten_speed | 0.5943ms | 0.4785ms | 2.0901 KOps/s | 2.0547 KOps/s | |
test_common_ops | 1.1589ms | 0.7085ms | 1.4114 KOps/s | 1.4205 KOps/s | |
test_creation | 16.9490μs | 2.1148μs | 472.8668 KOps/s | 449.8448 KOps/s | |
test_creation_empty | 28.2910μs | 11.7833μs | 84.8656 KOps/s | 83.0648 KOps/s | |
test_creation_nested_1 | 40.8410μs | 14.6115μs | 68.4392 KOps/s | 67.0999 KOps/s | |
test_creation_nested_2 | 1.4980ms | 18.1927μs | 54.9672 KOps/s | 53.3774 KOps/s | |
test_clone | 87.0520μs | 14.4866μs | 69.0292 KOps/s | 65.5708 KOps/s | |
test_getitem[int] | 30.6000μs | 13.4554μs | 74.3194 KOps/s | 73.4834 KOps/s | |
test_getitem[slice_int] | 62.3210μs | 24.9784μs | 40.0346 KOps/s | 40.5632 KOps/s | |
test_getitem[range] | 69.1910μs | 50.3504μs | 19.8608 KOps/s | 20.0567 KOps/s | |
test_getitem[tuple] | 55.9820μs | 21.8557μs | 45.7547 KOps/s | 45.3945 KOps/s | |
test_getitem[list] | 98.1810μs | 38.7780μs | 25.7878 KOps/s | 25.6223 KOps/s | |
test_setitem_dim[int] | 55.4310μs | 37.8885μs | 26.3932 KOps/s | 27.8002 KOps/s | |
test_setitem_dim[slice_int] | 82.3220μs | 60.6254μs | 16.4947 KOps/s | 17.1168 KOps/s | |
test_setitem_dim[range] | 99.5720μs | 77.1779μs | 12.9571 KOps/s | 13.2608 KOps/s | |
test_setitem_dim[tuple] | 76.1510μs | 54.0082μs | 18.5157 KOps/s | 19.9682 KOps/s | |
test_setitem | 44.9300μs | 21.0212μs | 47.5710 KOps/s | 45.0457 KOps/s | |
test_set | 47.2400μs | 20.5274μs | 48.7153 KOps/s | 47.6001 KOps/s | |
test_set_shared | 1.7381ms | 0.1089ms | 9.1856 KOps/s | 9.0368 KOps/s | |
test_update | 73.2320μs | 23.3878μs | 42.7573 KOps/s | 42.0700 KOps/s | |
test_update_nested | 70.7310μs | 32.5061μs | 30.7634 KOps/s | 30.1063 KOps/s | |
test_update__nested | 65.8320μs | 27.8032μs | 35.9671 KOps/s | 35.3441 KOps/s | |
test_set_nested | 55.6710μs | 22.7012μs | 44.0505 KOps/s | 43.9190 KOps/s | |
test_set_nested_new | 66.9110μs | 27.5315μs | 36.3220 KOps/s | 35.2252 KOps/s | |
test_select | 73.2520μs | 44.7828μs | 22.3300 KOps/s | 21.9796 KOps/s | |
test_select_nested | 0.1010ms | 66.2765μs | 15.0883 KOps/s | 14.8549 KOps/s | |
test_exclude_nested | 0.1519ms | 0.1288ms | 7.7663 KOps/s | 7.5992 KOps/s | |
test_empty[True] | 0.4766ms | 0.4395ms | 2.2753 KOps/s | 2.2458 KOps/s | |
test_empty[False] | 6.8077μs | 1.2698μs | 787.5198 KOps/s | 781.7680 KOps/s | |
test_to | 0.1132ms | 91.5163μs | 10.9270 KOps/s | 10.7800 KOps/s | |
test_to_nonblocking | 0.1055ms | 71.9523μs | 13.8981 KOps/s | 13.4376 KOps/s | |
test_unbind_speed | 1.9963ms | 0.3111ms | 3.2146 KOps/s | 3.2136 KOps/s | |
test_unbind_speed_stack0 | 0.3501ms | 0.3049ms | 3.2793 KOps/s | 3.2579 KOps/s | |
test_unbind_speed_stack1 | 74.2666ms | 0.9047ms | 1.1053 KOps/s | 1.0927 KOps/s | |
test_split | 74.8599ms | 1.9471ms | 513.5892 Ops/s | 567.1944 Ops/s | |
test_chunk | 75.3187ms | 1.9468ms | 513.6670 Ops/s | 524.7081 Ops/s | |
test_creation[device0] | 0.1664ms | 72.0307μs | 13.8830 KOps/s | 14.0077 KOps/s | |
test_creation_from_tensor | 0.1465ms | 67.4860μs | 14.8179 KOps/s | 14.8461 KOps/s | |
test_add_one[memmap_tensor0] | 0.1098ms | 6.8540μs | 145.9006 KOps/s | 146.1106 KOps/s | |
test_contiguous[memmap_tensor0] | 16.6800μs | 0.6782μs | 1.4745 MOps/s | 1.4433 MOps/s | |
test_stack[memmap_tensor0] | 29.9500μs | 4.5254μs | 220.9730 KOps/s | 221.1852 KOps/s | |
test_memmaptd_index | 0.5358ms | 0.3183ms | 3.1416 KOps/s | 3.1215 KOps/s | |
test_memmaptd_index_astensor | 0.7949ms | 0.4062ms | 2.4621 KOps/s | 2.4330 KOps/s | |
test_memmaptd_index_op | 1.1528ms | 0.7372ms | 1.3565 KOps/s | 1.3497 KOps/s | |
test_serialize_model | 0.1914s | 0.1177s | 8.4985 Ops/s | 8.1702 Ops/s | |
test_serialize_model_pickle | 1.3504s | 1.2360s | 0.8091 Ops/s | 0.8063 Ops/s | |
test_serialize_weights | 0.1814s | 0.1144s | 8.7427 Ops/s | 8.3466 Ops/s | |
test_serialize_weights_returnearly | 0.2235s | 0.1067s | 9.3715 Ops/s | 9.9927 Ops/s | |
test_serialize_weights_pickle | 1.3590s | 1.2361s | 0.8090 Ops/s | 0.8087 Ops/s | |
test_reshape_pytree | 93.9320μs | 32.9180μs | 30.3785 KOps/s | 30.2836 KOps/s | |
test_reshape_td | 70.9910μs | 37.4194μs | 26.7241 KOps/s | 27.4278 KOps/s | |
test_view_pytree | 0.1727ms | 33.0872μs | 30.2232 KOps/s | 30.8700 KOps/s | |
test_view_td | 0.2102ms | 43.9772μs | 22.7390 KOps/s | 23.9065 KOps/s | |
test_unbind_pytree | 0.1573ms | 39.0095μs | 25.6348 KOps/s | 26.1242 KOps/s | |
test_unbind_td | 0.5347ms | 46.0431μs | 21.7188 KOps/s | 22.1455 KOps/s | |
test_split_pytree | 70.2210μs | 38.0094μs | 26.3093 KOps/s | 26.9761 KOps/s | |
test_split_td | 0.1130ms | 47.6832μs | 20.9717 KOps/s | 21.5764 KOps/s | |
test_add_pytree | 0.1585ms | 44.6631μs | 22.3898 KOps/s | 22.9519 KOps/s | |
test_add_td | 0.1035ms | 65.4389μs | 15.2814 KOps/s | 16.5803 KOps/s | |
test_distributed | 1.7716ms | 81.5147μs | 12.2677 KOps/s | 9.7081 KOps/s | |
test_tdmodule | 52.4110μs | 18.8387μs | 53.0822 KOps/s | 51.7435 KOps/s | |
test_tdmodule_dispatch | 55.6320μs | 36.5268μs | 27.3772 KOps/s | 27.3525 KOps/s | |
test_tdseq | 38.1910μs | 20.8016μs | 48.0731 KOps/s | 47.6737 KOps/s | |
test_tdseq_dispatch | 61.7920μs | 40.9975μs | 24.3917 KOps/s | 24.5056 KOps/s | |
test_instantiation_functorch | 1.5267ms | 1.4472ms | 690.9839 Ops/s | 685.4634 Ops/s | |
test_instantiation_td | 1.5667ms | 1.0749ms | 930.3436 Ops/s | 861.2229 Ops/s | |
test_exec_functorch | 0.2000ms | 0.1735ms | 5.7630 KOps/s | 5.7103 KOps/s | |
test_exec_functional_call | 0.1964ms | 0.1671ms | 5.9827 KOps/s | 6.0976 KOps/s | |
test_exec_td | 0.1973ms | 0.1607ms | 6.2241 KOps/s | 6.2883 KOps/s | |
test_exec_td_decorator | 0.9013ms | 0.2442ms | 4.0947 KOps/s | 4.0470 KOps/s | |
test_vmap_mlp_speed[True-True] | 0.7338ms | 0.6077ms | 1.6455 KOps/s | 1.5917 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.6721ms | 0.6025ms | 1.6599 KOps/s | 1.5911 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.5555ms | 0.5185ms | 1.9287 KOps/s | 1.8576 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.5645ms | 0.5188ms | 1.9276 KOps/s | 1.8491 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.3189ms | 0.6782ms | 1.4744 KOps/s | 1.4187 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.7956ms | 0.6773ms | 1.4765 KOps/s | 1.4198 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7064ms | 0.5867ms | 1.7043 KOps/s | 1.6376 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.6943ms | 0.5885ms | 1.6993 KOps/s | 1.6393 KOps/s | |
test_vmap_transformer_speed[True-True] | 7.6697ms | 7.5885ms | 131.7785 Ops/s | 127.7062 Ops/s | |
test_vmap_transformer_speed[True-False] | 8.1786ms | 7.6458ms | 130.7902 Ops/s | 128.0872 Ops/s | |
test_vmap_transformer_speed[False-True] | 7.8400ms | 7.5160ms | 133.0499 Ops/s | 129.5190 Ops/s | |
test_vmap_transformer_speed[False-False] | 8.2688ms | 7.5973ms | 131.6251 Ops/s | 129.6764 Ops/s | |
test_vmap_transformer_speed_decorator[True-True] | 19.3865ms | 18.4535ms | 54.1901 Ops/s | 52.5880 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 19.8040ms | 18.4079ms | 54.3246 Ops/s | 52.6077 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 19.1810ms | 18.3043ms | 54.6320 Ops/s | 52.8019 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 18.3274ms | 18.2481ms | 54.8001 Ops/s | 52.9231 Ops/s | |
test_to_module_speed[True] | 2.0031ms | 1.9056ms | 524.7560 Ops/s | 521.3636 Ops/s | |
test_to_module_speed[False] | 1.9797ms | 1.8807ms | 531.7107 Ops/s | 527.2584 Ops/s | |
test_tc_init | 60.6710μs | 32.6744μs | 30.6050 KOps/s | 13.9136 KOps/s | |
test_tc_init_nested | 86.4520μs | 63.2893μs | 15.8005 KOps/s | 6.5205 KOps/s | |
test_tc_first_layer_tensor | 1.4811μs | 0.6877μs | 1.4542 MOps/s | 151.7165 KOps/s | |
test_tc_first_layer_nontensor | 1.4425μs | 0.6863μs | 1.4571 MOps/s | 155.1044 KOps/s | |
test_tc_second_layer_tensor | 17.2100μs | 2.0246μs | 493.9214 KOps/s | 81.0368 KOps/s | |
test_tc_second_layer_nontensor | 8.9203μs | 1.5959μs | 626.6207 KOps/s | 80.4124 KOps/s | |
test_unbind | 95.8214ms | 8.9893ms | 111.2438 Ops/s | 76.2340 Ops/s | |
test_full_like | 13.7936ms | 13.4808ms | 74.1797 Ops/s | 84.7511 Ops/s | |
test_zeros_like | 8.1692ms | 7.9593ms | 125.6398 Ops/s | 124.8989 Ops/s | |
test_ones_like | 8.4119ms | 7.9619ms | 125.5988 Ops/s | 123.9740 Ops/s | |
test_clone | 9.8658ms | 9.5936ms | 104.2365 Ops/s | 101.0100 Ops/s | |
test_squeeze | 66.5420μs | 14.4214μs | 69.3416 KOps/s | 34.0236 KOps/s | |
test_unsqueeze | 0.1218ms | 70.6438μs | 14.1555 KOps/s | 9.9217 KOps/s | |
test_split | 0.1812ms | 0.1179ms | 8.4816 KOps/s | 5.8036 KOps/s | |
test_permute | 0.2030ms | 0.1302ms | 7.6795 KOps/s | 5.8328 KOps/s | |
test_stack | 28.1398ms | 27.7716ms | 36.0080 Ops/s | 34.9618 Ops/s | |
test_cat | 28.3083ms | 27.7355ms | 36.0549 Ops/s | 35.1082 Ops/s |
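
The `Change` column is empty in both tables as captured here, so the relative difference between `Ops` and `Ops on Repo HEAD` has to be worked out by hand. The following is a minimal sketch of one way to do that, assuming `Ops` is the throughput measured on this PR's branch and `Ops on Repo HEAD` is the baseline. The helper names `parse_ops` and `speedup` are hypothetical and are not part of tensordict or its benchmark tooling; the sample rows are copied from the first table.

```python
import re

# Multipliers for the throughput units that appear in the tables.
_UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '1.4638 MOps/s' to operations per second."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*(MOps/s|KOps/s|Ops/s)\s*", cell)
    if match is None:
        raise ValueError(f"unrecognized throughput cell: {cell!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SCALE[unit]


def speedup(ops_pr: str, ops_head: str) -> float:
    """Ratio of PR throughput to baseline throughput (above 1.0 means faster)."""
    return parse_ops(ops_pr) / parse_ops(ops_head)


# (Ops, Ops on Repo HEAD) pairs copied from the first table.
rows = {
    "test_tc_init": ("35.0171 KOps/s", "14.5655 KOps/s"),
    "test_tc_first_layer_tensor": ("1.4638 MOps/s", "160.8866 KOps/s"),
    "test_squeeze": ("69.2124 KOps/s", "36.0786 KOps/s"),
    "test_keys": ("257.4377 KOps/s", "260.1484 KOps/s"),
}

for name, (pr, head) in rows.items():
    print(f"{name}: {speedup(pr, head):.2f}x")
```

On these rows the tensorclass benchmarks (`test_tc_*`, `test_squeeze`) come out well above 1.0, while an unrelated benchmark such as `test_keys` stays close to 1.0, which is what one would expect if the PR's changes are confined to tensorclass. Mixed units (`MOps/s` against `KOps/s`) are the main reason to normalize before dividing.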
No description provided.