[BugFix] Fix vmap monkey patching #1009
Merged
vmoens added a commit that referenced this pull request on Sep 25, 2024
ghstack-source-id: 55194bcc1564a29121ea514fdb595c97d860d5ee
Pull Request resolved: #1009
The goal is to close pytorch/pytorch#134004 while waiting for pytorch/pytorch#135471 to be merged.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 52.2980μs | 21.7896μs | 45.8935 KOps/s | 48.7749 KOps/s | |
test_plain_set_stack_nested | 58.1790μs | 21.5553μs | 46.3923 KOps/s | 48.4428 KOps/s | |
test_plain_set_nested_inplace | 68.4580μs | 23.3214μs | 42.8791 KOps/s | 45.2207 KOps/s | |
test_plain_set_stack_nested_inplace | 68.9800μs | 23.4359μs | 42.6696 KOps/s | 44.7519 KOps/s | |
test_items | 25.6480μs | 4.1323μs | 241.9935 KOps/s | 245.3002 KOps/s | |
test_items_nested | 0.4805ms | 0.3652ms | 2.7381 KOps/s | 2.7794 KOps/s | |
test_items_nested_locked | 0.9930ms | 0.3783ms | 2.6437 KOps/s | 2.7720 KOps/s | |
test_items_nested_leaf | 0.1286ms | 67.8675μs | 14.7346 KOps/s | 14.7810 KOps/s | |
test_items_stack_nested | 0.6316ms | 0.3696ms | 2.7060 KOps/s | 2.7292 KOps/s | |
test_items_stack_nested_leaf | 0.1399ms | 71.5571μs | 13.9749 KOps/s | 14.3144 KOps/s | |
test_items_stack_nested_locked | 0.7094ms | 0.3671ms | 2.7242 KOps/s | 2.7707 KOps/s | |
test_keys | 25.7280μs | 3.4869μs | 286.7856 KOps/s | 282.2439 KOps/s | |
test_keys_nested | 0.1925ms | 99.6706μs | 10.0330 KOps/s | 9.8383 KOps/s | |
test_keys_nested_locked | 1.5537ms | 0.1042ms | 9.5927 KOps/s | 9.4033 KOps/s | |
test_keys_nested_leaf | 0.3674ms | 83.6682μs | 11.9520 KOps/s | 11.8708 KOps/s | |
test_keys_stack_nested | 0.3011ms | 0.1015ms | 9.8537 KOps/s | 10.1046 KOps/s | |
test_keys_stack_nested_leaf | 0.4868ms | 84.0192μs | 11.9020 KOps/s | 12.3312 KOps/s | |
test_keys_stack_nested_locked | 0.1862ms | 0.1051ms | 9.5128 KOps/s | 9.7035 KOps/s | |
test_values | 15.7796μs | 1.0379μs | 963.4605 KOps/s | 834.2970 KOps/s | |
test_values_nested | 0.1355ms | 75.4153μs | 13.2599 KOps/s | 13.7965 KOps/s | |
test_values_nested_locked | 0.4227ms | 78.5624μs | 12.7287 KOps/s | 13.8952 KOps/s | |
test_values_nested_leaf | 0.1115ms | 62.5477μs | 15.9878 KOps/s | 16.2343 KOps/s | |
test_values_stack_nested | 0.1366ms | 76.6555μs | 13.0454 KOps/s | 13.6255 KOps/s | |
test_values_stack_nested_leaf | 0.1074ms | 62.0800μs | 16.1082 KOps/s | 16.8759 KOps/s | |
test_values_stack_nested_locked | 0.4366ms | 76.6285μs | 13.0500 KOps/s | 13.7424 KOps/s | |
test_membership | 4.7833μs | 0.7518μs | 1.3301 MOps/s | 1.1254 MOps/s | |
test_membership_nested | 21.7100μs | 2.8130μs | 355.4885 KOps/s | 362.4918 KOps/s | |
test_membership_nested_leaf | 40.4070μs | 2.7899μs | 358.4337 KOps/s | 358.7731 KOps/s | |
test_membership_stacked_nested | 28.3930μs | 2.8634μs | 349.2363 KOps/s | 364.2244 KOps/s | |
test_membership_stacked_nested_leaf | 79.1280μs | 2.8196μs | 354.6604 KOps/s | 363.1711 KOps/s | |
test_membership_nested_last | 30.5370μs | 4.1660μs | 240.0402 KOps/s | 247.6243 KOps/s | |
test_membership_nested_leaf_last | 38.0210μs | 4.0904μs | 244.4750 KOps/s | 245.6277 KOps/s | |
test_membership_stacked_nested_last | 26.0590μs | 4.7040μs | 212.5837 KOps/s | 77.8645 KOps/s | |
test_membership_stacked_nested_leaf_last | 32.8120μs | 4.6964μs | 212.9303 KOps/s | 78.0929 KOps/s | |
test_nested_getleaf | 56.0740μs | 10.4346μs | 95.8349 KOps/s | 94.3466 KOps/s | |
test_nested_get | 39.2930μs | 10.1036μs | 98.9743 KOps/s | 101.3059 KOps/s | |
test_stacked_getleaf | 32.8910μs | 10.6130μs | 94.2239 KOps/s | 95.6416 KOps/s | |
test_stacked_get | 34.3240μs | 10.1152μs | 98.8608 KOps/s | 100.0685 KOps/s | |
test_nested_getitemleaf | 41.1970μs | 11.2685μs | 88.7426 KOps/s | 89.8133 KOps/s | |
test_nested_getitem | 29.5350μs | 10.5025μs | 95.2157 KOps/s | 97.1449 KOps/s | |
test_stacked_getitemleaf | 31.9600μs | 11.1691μs | 89.5330 KOps/s | 91.2733 KOps/s | |
test_stacked_getitem | 42.3490μs | 10.3184μs | 96.9141 KOps/s | 97.3932 KOps/s | |
test_lock_nested | 83.7402ms | 0.5838ms | 1.7130 KOps/s | 2.0717 KOps/s | |
test_lock_stack_nested | 0.7301ms | 0.4594ms | 2.1768 KOps/s | 2.2783 KOps/s | |
test_unlock_nested | 85.0735ms | 0.5025ms | 1.9899 KOps/s | 2.4766 KOps/s | |
test_unlock_stack_nested | 0.4752ms | 0.3744ms | 2.6712 KOps/s | 2.7862 KOps/s | |
test_flatten_speed | 0.1724ms | 87.9505μs | 11.3700 KOps/s | 11.5543 KOps/s | |
test_unflatten_speed | 0.5987ms | 0.4667ms | 2.1425 KOps/s | 2.2013 KOps/s | |
test_common_ops | 4.3499ms | 1.1850ms | 843.8790 Ops/s | 870.5472 Ops/s | |
test_creation | 19.7470μs | 2.0488μs | 488.0804 KOps/s | 474.6944 KOps/s | |
test_creation_empty | 47.6900μs | 20.5012μs | 48.7775 KOps/s | 53.8904 KOps/s | |
test_creation_nested_1 | 76.3540μs | 23.9944μs | 41.6763 KOps/s | 45.9136 KOps/s | |
test_creation_nested_2 | 63.0990μs | 28.9047μs | 34.5964 KOps/s | 38.1514 KOps/s | |
test_clone | 64.0600μs | 17.3105μs | 57.7685 KOps/s | 59.3395 KOps/s | |
test_getitem[int] | 1.3449ms | 16.9403μs | 59.0308 KOps/s | 60.1002 KOps/s | |
test_getitem[slice_int] | 0.1391ms | 30.2961μs | 33.0076 KOps/s | 32.7277 KOps/s | |
test_getitem[range] | 0.1685ms | 56.9686μs | 17.5535 KOps/s | 17.1206 KOps/s | |
test_getitem[tuple] | 0.1570ms | 25.5103μs | 39.1998 KOps/s | 40.0823 KOps/s | |
test_getitem[list] | 0.1846ms | 52.7222μs | 18.9674 KOps/s | 18.7168 KOps/s | |
test_setitem_dim[int] | 54.9430μs | 33.3224μs | 30.0098 KOps/s | 31.3605 KOps/s | |
test_setitem_dim[slice_int] | 0.1259ms | 63.0067μs | 15.8713 KOps/s | 16.3971 KOps/s | |
test_setitem_dim[range] | 0.1470ms | 83.8605μs | 11.9246 KOps/s | 11.8479 KOps/s | |
test_setitem_dim[tuple] | 78.8870μs | 50.6479μs | 19.7442 KOps/s | 20.8567 KOps/s | |
test_setitem | 78.1670μs | 31.0947μs | 32.1598 KOps/s | 33.5439 KOps/s | |
test_set | 97.6740μs | 30.7556μs | 32.5144 KOps/s | 34.6424 KOps/s | |
test_set_shared | 2.0354ms | 0.2173ms | 4.6021 KOps/s | 4.7130 KOps/s | |
test_update | 0.1404ms | 39.7323μs | 25.1684 KOps/s | 27.0372 KOps/s | |
test_update_nested | 0.1139ms | 49.4738μs | 20.2127 KOps/s | 21.0086 KOps/s | |
test_update__nested | 82.2840μs | 35.1012μs | 28.4890 KOps/s | 29.1290 KOps/s | |
test_set_nested | 86.7930μs | 33.2378μs | 30.0863 KOps/s | 31.9093 KOps/s | |
test_set_nested_new | 85.0800μs | 38.6232μs | 25.8912 KOps/s | 27.2813 KOps/s | |
test_select | 0.1127ms | 55.7381μs | 17.9410 KOps/s | 18.3787 KOps/s | |
test_select_nested | 0.9143ms | 61.8742μs | 16.1618 KOps/s | 16.7865 KOps/s | |
test_exclude_nested | 0.1424ms | 76.1496μs | 13.1320 KOps/s | 13.6080 KOps/s | |
test_empty[True] | 0.3700ms | 0.3161ms | 3.1635 KOps/s | 3.2102 KOps/s | |
test_empty[False] | 9.6580μs | 1.2163μs | 822.1964 KOps/s | 796.9680 KOps/s | |
test_unbind_speed | 0.6470ms | 0.3118ms | 3.2074 KOps/s | 3.3831 KOps/s | |
test_unbind_speed_stack0 | 0.4215ms | 0.3000ms | 3.3334 KOps/s | 3.4831 KOps/s | |
test_unbind_speed_stack1 | 87.9470ms | 0.8704ms | 1.1489 KOps/s | 1.4069 KOps/s | |
test_split | 86.6370ms | 2.1418ms | 466.9025 Ops/s | 469.7001 Ops/s | |
test_chunk | 2.3751ms | 1.9913ms | 502.1962 Ops/s | 464.3759 Ops/s | |
test_creation[device0] | 0.2365ms | 0.1179ms | 8.4822 KOps/s | 8.6678 KOps/s | |
test_creation_from_tensor | 3.3812ms | 0.1188ms | 8.4210 KOps/s | 8.5504 KOps/s | |
test_add_one[memmap_tensor0] | 0.2279ms | 7.1088μs | 140.6713 KOps/s | 137.8141 KOps/s | |
test_contiguous[memmap_tensor0] | 16.5110μs | 1.9241μs | 519.7237 KOps/s | 525.3771 KOps/s | |
test_stack[memmap_tensor0] | 51.2760μs | 5.7116μs | 175.0833 KOps/s | 180.0253 KOps/s | |
test_memmaptd_index | 1.1251ms | 0.3963ms | 2.5233 KOps/s | 2.5611 KOps/s | |
test_memmaptd_index_astensor | 0.9703ms | 0.4748ms | 2.1060 KOps/s | 2.1257 KOps/s | |
test_memmaptd_index_op | 86.3334ms | 1.1314ms | 883.8788 Ops/s | 981.0463 Ops/s | |
test_serialize_model | 0.1258s | 0.1170s | 8.5469 Ops/s | 8.4573 Ops/s | |
test_serialize_model_pickle | 0.4476s | 0.3960s | 2.5250 Ops/s | 2.4970 Ops/s | |
test_serialize_weights | 0.1248s | 0.1140s | 8.7681 Ops/s | 7.7194 Ops/s | |
test_serialize_weights_returnearly | 0.2491s | 0.1737s | 5.7562 Ops/s | 6.5200 Ops/s | |
test_serialize_weights_pickle | 0.5254s | 0.4135s | 2.4184 Ops/s | 2.5572 Ops/s | |
test_serialize_weights_filesystem | 0.1466s | 0.1387s | 7.2115 Ops/s | 7.1054 Ops/s | |
test_serialize_model_filesystem | 0.1592s | 0.1495s | 6.6901 Ops/s | 5.9280 Ops/s | |
test_reshape_pytree | 72.3160μs | 39.6347μs | 25.2304 KOps/s | 25.2799 KOps/s | |
test_reshape_td | 0.1061ms | 47.7679μs | 20.9346 KOps/s | 21.3047 KOps/s | |
test_view_pytree | 99.9540μs | 38.7653μs | 25.7963 KOps/s | 25.9234 KOps/s | |
test_view_td | 0.1134ms | 53.0904μs | 18.8358 KOps/s | 19.1047 KOps/s | |
test_unbind_pytree | 92.9570μs | 37.0322μs | 27.0035 KOps/s | 27.9476 KOps/s | |
test_unbind_td | 0.3003ms | 46.0888μs | 21.6972 KOps/s | 22.4558 KOps/s | |
test_split_pytree | 0.1048ms | 38.6126μs | 25.8983 KOps/s | 26.6138 KOps/s | |
test_split_td | 87.4255ms | 67.1036μs | 14.9023 KOps/s | 17.6721 KOps/s | |
test_add_pytree | 0.1105ms | 45.3439μs | 22.0537 KOps/s | 21.8766 KOps/s | |
test_add_td | 0.1663ms | 84.5650μs | 11.8252 KOps/s | 11.3427 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.1199ms | 56.6133μs | 17.6637 KOps/s | 17.7455 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.4225ms | 0.1814ms | 5.5118 KOps/s | 5.7209 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.1196ms | 56.0033μs | 17.8561 KOps/s | 17.6617 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.5563ms | 0.1399ms | 7.1496 KOps/s | 7.0578 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 66.3040μs | 20.7076μs | 48.2914 KOps/s | 45.7408 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 0.1412ms | 68.8566μs | 14.5229 KOps/s | 14.7009 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.1457ms | 74.0461μs | 13.5051 KOps/s | 13.5881 KOps/s | |
test_compile_copy_nested[pytree-eager] | 0.1620ms | 67.0779μs | 14.9080 KOps/s | 15.1320 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.2699ms | 0.1738ms | 5.7521 KOps/s | 5.7966 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.3887ms | 0.1923ms | 5.2003 KOps/s | 5.3294 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.1201ms | 45.8752μs | 21.7983 KOps/s | 21.9534 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.1622ms | 69.3193μs | 14.4260 KOps/s | 14.5270 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.2748ms | 0.1748ms | 5.7222 KOps/s | 5.7472 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.5202ms | 0.2851ms | 3.5074 KOps/s | 3.5386 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.3999ms | 0.2042ms | 4.8964 KOps/s | 4.8916 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.2814ms | 0.1737ms | 5.7576 KOps/s | 5.7865 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.2237ms | 63.8992μs | 15.6496 KOps/s | 16.0886 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1635ms | 46.3140μs | 21.5917 KOps/s | 21.6165 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.3947ms | 0.2336ms | 4.2799 KOps/s | 4.3620 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.2958ms | 0.1771ms | 5.6464 KOps/s | 5.6683 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 0.1956ms | 0.1056ms | 9.4683 KOps/s | 9.6928 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 0.1612ms | 59.8495μs | 16.7086 KOps/s | 17.5609 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1552ms | 78.5939μs | 12.7236 KOps/s | 13.0467 KOps/s | |
test_compile_copy_flat[pytree-eager] | 0.1306ms | 70.2863μs | 14.2275 KOps/s | 14.6734 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 0.3340ms | 0.1969ms | 5.0782 KOps/s | 5.0713 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 2.1445ms | 1.6526ms | 605.1061 Ops/s | 611.1594 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 0.3028ms | 0.1931ms | 5.1786 KOps/s | 5.1902 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 1.8310ms | 1.0836ms | 922.8499 Ops/s | 936.6354 Ops/s | |
test_compile_assign_and_add_stack[compile] | 0.5223ms | 0.4209ms | 2.3756 KOps/s | 2.3652 KOps/s | |
test_compile_assign_and_add_stack[eager] | 5.9997ms | 3.9313ms | 254.3658 Ops/s | 267.1683 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.1009ms | 35.1636μs | 28.4385 KOps/s | 29.1521 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 1.0655ms | 47.3337μs | 21.1266 KOps/s | 21.4957 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 0.1090ms | 29.5188μs | 33.8767 KOps/s | 33.5196 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 0.1021ms | 28.8401μs | 34.6740 KOps/s | 35.5792 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 0.1063ms | 29.4140μs | 33.9974 KOps/s | 32.9661 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 93.6360μs | 29.4356μs | 33.9724 KOps/s | 35.2825 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1538ms | 73.9409μs | 13.5243 KOps/s | 13.5873 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.3735ms | 28.0011μs | 35.7129 KOps/s | 36.9020 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.1534ms | 68.7749μs | 14.5402 KOps/s | 14.8349 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 73.8580μs | 23.2837μs | 42.9485 KOps/s | 43.8890 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.1856ms | 67.8846μs | 14.7309 KOps/s | 14.9396 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 63.2180μs | 23.4039μs | 42.7279 KOps/s | 44.0893 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.1805ms | 74.2538μs | 13.4673 KOps/s | 13.7046 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 1.0261ms | 27.9085μs | 35.8314 KOps/s | 37.0712 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.3095ms | 70.3797μs | 14.2086 KOps/s | 14.9343 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 69.5100μs | 23.1146μs | 43.2628 KOps/s | 44.1376 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.1453ms | 67.8278μs | 14.7432 KOps/s | 14.9297 KOps/s | |
test_compile_indexing[int-pytree-eager] | 73.2270μs | 23.0671μs | 43.3518 KOps/s | 44.1907 KOps/s | |
test_mod_add[eager] | 94.7580μs | 25.9495μs | 38.5363 KOps/s | 40.2539 KOps/s | |
test_mod_add[compile] | 99.1760μs | 39.1535μs | 25.5405 KOps/s | 25.7271 KOps/s | |
test_mod_add[compile-overhead] | 88.8360μs | 38.9173μs | 25.6955 KOps/s | 25.6494 KOps/s | |
test_mod_wrap[eager] | 0.3497ms | 0.2119ms | 4.7191 KOps/s | 4.7350 KOps/s | |
test_mod_wrap[compile] | 0.6170ms | 0.2456ms | 4.0719 KOps/s | 4.2422 KOps/s | |
test_mod_wrap[compile-overhead] | 0.7639ms | 0.2344ms | 4.2659 KOps/s | 4.2862 KOps/s | |
test_mod_wrap_and_backward[eager] | 13.4322ms | 11.3196ms | 88.3426 Ops/s | 88.7243 Ops/s | |
test_mod_wrap_and_backward[compile] | 16.1203ms | 12.0888ms | 82.7212 Ops/s | 78.9625 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 14.5347ms | 11.7905ms | 84.8142 Ops/s | 87.0312 Ops/s | |
test_seq_add[eager] | 0.1780ms | 93.7766μs | 10.6636 KOps/s | 11.0667 KOps/s | |
test_seq_add[compile] | 0.1636ms | 65.7102μs | 15.2183 KOps/s | 15.5979 KOps/s | |
test_seq_add[compile-overhead] | 0.1328ms | 62.9611μs | 15.8828 KOps/s | 15.8953 KOps/s | |
test_seq_wrap[eager] | 0.5603ms | 0.3925ms | 2.5479 KOps/s | 2.5218 KOps/s | |
test_seq_wrap[compile] | 1.1874ms | 0.2734ms | 3.6579 KOps/s | 3.6428 KOps/s | |
test_seq_wrap[compile-overhead] | 1.2524ms | 0.2788ms | 3.5870 KOps/s | 3.6404 KOps/s | |
test_func_call_runtime[False-eager] | 0.6870ms | 0.5261ms | 1.9009 KOps/s | 1.8614 KOps/s | |
test_func_call_runtime[False-compile] | 0.6084ms | 0.4989ms | 2.0044 KOps/s | 2.0005 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 1.0587ms | 0.5089ms | 1.9651 KOps/s | 1.9951 KOps/s | |
test_func_call_runtime[True-eager] | 1.0917ms | 0.7475ms | 1.3379 KOps/s | 1.3175 KOps/s | |
test_func_call_runtime[True-compile] | 0.9634ms | 0.5155ms | 1.9400 KOps/s | 1.9110 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.6151ms | 0.5109ms | 1.9574 KOps/s | 1.9100 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.7801ms | 0.5189ms | 1.9270 KOps/s | 1.8997 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.7687ms | 0.4957ms | 2.0172 KOps/s | 1.9778 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 1.0526ms | 0.4965ms | 2.0142 KOps/s | 1.9950 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.3950ms | 0.8789ms | 1.1377 KOps/s | 1.1280 KOps/s | |
test_func_call_cm_runtime[True-compile] | 1.1498ms | 0.7367ms | 1.3575 KOps/s | 1.3330 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 1.3415ms | 0.7409ms | 1.3498 KOps/s | 1.3267 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 2.3638ms | 1.8467ms | 541.5097 Ops/s | 537.2644 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 2.7057ms | 1.8990ms | 526.5870 Ops/s | 503.5937 Ops/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 2.6303ms | 1.9032ms | 525.4310 Ops/s | 523.4744 Ops/s | |
test_distributed | 0.2502ms | 0.1247ms | 8.0224 KOps/s | 7.9604 KOps/s | |
test_tdmodule | 0.1674ms | 20.0570μs | 49.8578 KOps/s | 54.4591 KOps/s | |
test_tdmodule_dispatch | 71.1830μs | 39.1946μs | 25.5137 KOps/s | 26.7488 KOps/s | |
test_tdseq | 40.9370μs | 21.9598μs | 45.5379 KOps/s | 46.5019 KOps/s | |
test_tdseq_dispatch | 65.1020μs | 44.0241μs | 22.7148 KOps/s | 22.6249 KOps/s | |
test_instantiation_functorch | 1.7041ms | 1.5693ms | 637.2401 Ops/s | 618.5027 Ops/s | |
test_instantiation_td | 4.1929ms | 1.1992ms | 833.9171 Ops/s | 862.6042 Ops/s | |
test_exec_functorch | 0.2724ms | 0.1844ms | 5.4220 KOps/s | 5.3810 KOps/s | |
test_exec_functional_call | 0.2842ms | 0.1749ms | 5.7165 KOps/s | 5.7982 KOps/s | |
test_exec_td | 0.2502ms | 0.1752ms | 5.7085 KOps/s | 5.8878 KOps/s | |
test_exec_td_decorator | 0.3170ms | 0.2254ms | 4.4369 KOps/s | 4.4754 KOps/s | |
test_vmap_mlp_speed[True-True] | 1.1029ms | 0.6626ms | 1.5092 KOps/s | 1.5478 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.8669ms | 0.6455ms | 1.5493 KOps/s | 1.5679 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.9780ms | 0.4955ms | 2.0180 KOps/s | 2.0442 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.7807ms | 0.4957ms | 2.0173 KOps/s | 2.0418 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.4455ms | 0.6414ms | 1.5590 KOps/s | 1.6042 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 1.1250ms | 0.6437ms | 1.5535 KOps/s | 1.6111 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.8266ms | 0.5187ms | 1.9279 KOps/s | 1.9604 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.8025ms | 0.5187ms | 1.9278 KOps/s | 1.9685 KOps/s | |
test_to_module_speed[True] | 1.9223ms | 1.3169ms | 759.3339 Ops/s | 777.0628 Ops/s | |
test_to_module_speed[False] | 1.5056ms | 1.2644ms | 790.8737 Ops/s | 799.3849 Ops/s | |
test_tc_init | 0.1204ms | 48.1498μs | 20.7685 KOps/s | 21.7712 KOps/s | |
test_tc_init_nested | 0.1747ms | 96.4526μs | 10.3678 KOps/s | 10.9840 KOps/s | |
test_tc_first_layer_tensor | 17.9140μs | 1.5321μs | 652.6787 KOps/s | 670.0815 KOps/s | |
test_tc_first_layer_nontensor | 27.8220μs | 4.6816μs | 213.6019 KOps/s | 216.1416 KOps/s | |
test_tc_second_layer_tensor | 36.2540μs | 2.8335μs | 352.9169 KOps/s | 362.1665 KOps/s | |
test_tc_second_layer_nontensor | 27.8620μs | 6.0408μs | 165.5400 KOps/s | 169.0408 KOps/s | |
test_unbind | 0.4662s | 13.0895ms | 76.3973 Ops/s | 75.8455 Ops/s | |
test_full_like | 8.6785ms | 7.3364ms | 136.3065 Ops/s | 149.3339 Ops/s | |
test_zeros_like | 3.4883ms | 2.8714ms | 348.2572 Ops/s | 381.1541 Ops/s | |
test_ones_like | 3.6852ms | 3.3257ms | 300.6894 Ops/s | 328.5795 Ops/s | |
test_clone | 5.8746ms | 5.1104ms | 195.6780 Ops/s | 210.3674 Ops/s | |
test_squeeze | 74.3490μs | 13.0779μs | 76.4649 KOps/s | 73.4051 KOps/s | |
test_unsqueeze | 0.1861ms | 93.0653μs | 10.7451 KOps/s | 10.8166 KOps/s | |
test_split | 0.5568ms | 0.1939ms | 5.1570 KOps/s | 5.1202 KOps/s | |
test_permute | 0.3089ms | 0.2191ms | 4.5651 KOps/s | 4.6143 KOps/s | |
test_stack | 30.7159ms | 26.4943ms | 37.7439 Ops/s | 40.4269 Ops/s | |
test_cat | 28.1366ms | 25.7979ms | 38.7629 Ops/s | 40.7365 Ops/s |
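
The Change column in the table above is empty, so regressions have to be read off by comparing the Ops and Ops on Repo HEAD cells by hand. A minimal sketch of that comparison follows, assuming that Ops is the throughput measured for this PR and Ops on Repo HEAD is the baseline. The helper names `parse_ops` and `relative_change` are hypothetical and are not part of the benchmark tooling.

```python
# Hypothetical helpers for reading the benchmark table; not part of the
# benchmark tooling that produced it.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '212.5837 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops_pr: str, ops_head: str) -> float:
    """Relative throughput change of the PR run versus repo HEAD.

    Assumes the first argument is the PR measurement and the second the
    baseline; a positive result means the PR run was faster.
    """
    pr, head = parse_ops(ops_pr), parse_ops(ops_head)
    return (pr - head) / head


# Values copied from the test_values row of the table above.
change = relative_change("963.4605 KOps/s", "834.2970 KOps/s")
print(f"test_values: {change:+.1%}")
```

Converting every cell to plain operations per second first keeps the comparison valid for rows where the two columns use different units.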

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 0.1473ms | 14.6117μs | 68.4385 KOps/s | 76.5902 KOps/s | |
test_plain_set_stack_nested | 48.0410μs | 14.6277μs | 68.3632 KOps/s | 75.2726 KOps/s | |
test_plain_set_nested_inplace | 45.6010μs | 15.7354μs | 63.5511 KOps/s | 70.8747 KOps/s | |
test_plain_set_stack_nested_inplace | 43.7610μs | 15.7963μs | 63.3060 KOps/s | 71.1738 KOps/s | |
test_items | 25.1500μs | 2.9266μs | 341.6961 KOps/s | 341.4362 KOps/s | |
test_items_nested | 0.3655ms | 0.3220ms | 3.1058 KOps/s | 3.0951 KOps/s | |
test_items_nested_locked | 0.3953ms | 0.3271ms | 3.0573 KOps/s | 3.0645 KOps/s | |
test_items_nested_leaf | 90.4410μs | 55.6422μs | 17.9720 KOps/s | 17.9633 KOps/s | |
test_items_stack_nested | 0.3863ms | 0.3267ms | 3.0608 KOps/s | 3.0584 KOps/s | |
test_items_stack_nested_leaf | 93.1810μs | 56.7889μs | 17.6091 KOps/s | 17.5020 KOps/s | |
test_items_stack_nested_locked | 0.3858ms | 0.3294ms | 3.0361 KOps/s | 3.0705 KOps/s | |
test_keys | 28.1300μs | 3.3957μs | 294.4900 KOps/s | 286.8991 KOps/s | |
test_keys_nested | 89.7610μs | 55.5731μs | 17.9943 KOps/s | 18.2205 KOps/s | |
test_keys_nested_locked | 2.2021ms | 61.9875μs | 16.1323 KOps/s | 16.0659 KOps/s | |
test_keys_nested_leaf | 76.2110μs | 47.3708μs | 21.1101 KOps/s | 21.0465 KOps/s | |
test_keys_stack_nested | 90.7010μs | 55.8950μs | 17.8907 KOps/s | 17.6864 KOps/s | |
test_keys_stack_nested_leaf | 77.9810μs | 47.7978μs | 20.9215 KOps/s | 20.7600 KOps/s | |
test_keys_stack_nested_locked | 96.1910μs | 60.9235μs | 16.4140 KOps/s | 15.9931 KOps/s | |
test_values | 4.8000μs | 0.8370μs | 1.1948 MOps/s | 1.1747 MOps/s | |
test_values_nested | 94.2010μs | 40.5860μs | 24.6390 KOps/s | 24.4954 KOps/s | |
test_values_nested_locked | 75.8400μs | 42.7900μs | 23.3700 KOps/s | 23.5153 KOps/s | |
test_values_nested_leaf | 64.4310μs | 35.1779μs | 28.4270 KOps/s | 28.4630 KOps/s | |
test_values_stack_nested | 74.3710μs | 41.5323μs | 24.0777 KOps/s | 23.9836 KOps/s | |
test_values_stack_nested_leaf | 66.9710μs | 35.5549μs | 28.1255 KOps/s | 27.9393 KOps/s | |
test_values_stack_nested_locked | 83.7710μs | 43.7535μs | 22.8553 KOps/s | 22.9430 KOps/s | |
test_membership | 1.8405μs | 0.5052μs | 1.9795 MOps/s | 1.9781 MOps/s | |
test_membership_nested | 18.6905μs | 1.8052μs | 553.9682 KOps/s | 547.4736 KOps/s | |
test_membership_nested_leaf | 10.5367μs | 1.7903μs | 558.5689 KOps/s | 552.7976 KOps/s | |
test_membership_stacked_nested | 46.1900μs | 1.8732μs | 533.8462 KOps/s | 528.2587 KOps/s | |
test_membership_stacked_nested_leaf | 32.9000μs | 1.8835μs | 530.9283 KOps/s | 533.5136 KOps/s | |
test_membership_nested_last | 35.8600μs | 2.7465μs | 364.0940 KOps/s | 364.9603 KOps/s | |
test_membership_nested_leaf_last | 27.9310μs | 2.7677μs | 361.3100 KOps/s | 363.7175 KOps/s | |
test_membership_stacked_nested_last | 38.4700μs | 7.8120μs | 128.0077 KOps/s | 316.5517 KOps/s | |
test_membership_stacked_nested_leaf_last | 45.0000μs | 7.8806μs | 126.8941 KOps/s | 318.4709 KOps/s | |
test_nested_getleaf | 33.1800μs | 6.0865μs | 164.2975 KOps/s | 163.0349 KOps/s | |
test_nested_get | 28.0810μs | 5.7203μs | 174.8171 KOps/s | 172.6997 KOps/s | |
test_stacked_getleaf | 38.8000μs | 6.0710μs | 164.7179 KOps/s | 164.5571 KOps/s | |
test_stacked_get | 34.6010μs | 5.7501μs | 173.9107 KOps/s | 176.2136 KOps/s | |
test_nested_getitemleaf | 35.8600μs | 6.1162μs | 163.4994 KOps/s | 163.3630 KOps/s | |
test_nested_getitem | 28.5200μs | 5.7728μs | 173.2271 KOps/s | 172.3702 KOps/s | |
test_stacked_getitemleaf | 36.1600μs | 6.1178μs | 163.4585 KOps/s | 163.8933 KOps/s | |
test_stacked_getitem | 25.9100μs | 5.7649μs | 173.4632 KOps/s | 172.9474 KOps/s | |
test_lock_nested | 4.7092ms | 0.4185ms | 2.3896 KOps/s | 2.4364 KOps/s | |
test_lock_stack_nested | 0.4191ms | 0.3681ms | 2.7166 KOps/s | 2.6785 KOps/s | |
test_unlock_nested | 0.7462ms | 0.3528ms | 2.8347 KOps/s | 2.8608 KOps/s | |
test_unlock_stack_nested | 0.3335ms | 0.3071ms | 3.2563 KOps/s | 3.1993 KOps/s | |
test_flatten_speed | 99.4210μs | 70.3966μs | 14.2052 KOps/s | 14.6724 KOps/s | |
test_unflatten_speed | 0.3442ms | 0.2825ms | 3.5393 KOps/s | 3.5291 KOps/s | |
test_common_ops | 1.5744ms | 1.2633ms | 791.5703 Ops/s | 845.6460 Ops/s | |
test_creation | 32.6700μs | 1.4668μs | 681.7752 KOps/s | 698.2502 KOps/s | |
test_creation_empty | 51.3010μs | 16.9207μs | 59.0994 KOps/s | 73.4737 KOps/s | |
test_creation_nested_1 | 48.0400μs | 18.8575μs | 53.0292 KOps/s | 65.2287 KOps/s | |
test_creation_nested_2 | 57.7700μs | 21.3673μs | 46.8005 KOps/s | 55.4558 KOps/s | |
test_clone | 70.7210μs | 28.4263μs | 35.1786 KOps/s | 35.7000 KOps/s | |
test_getitem[int] | 91.0624ms | 22.7164μs | 44.0210 KOps/s | 64.5042 KOps/s | |
test_getitem[slice_int] | 0.1182ms | 26.6866μs | 37.4720 KOps/s | 37.2882 KOps/s | |
test_getitem[range] | 0.2324ms | 0.1085ms | 9.2176 KOps/s | 9.6174 KOps/s | |
test_getitem[tuple] | 0.1216ms | 23.2436μs | 43.0226 KOps/s | 43.5181 KOps/s | |
test_getitem[list] | 0.1945ms | 97.5885μs | 10.2471 KOps/s | 10.2296 KOps/s | |
test_setitem_dim[int] | 66.7300μs | 44.5713μs | 22.4359 KOps/s | 21.8092 KOps/s | |
test_setitem_dim[slice_int] | 91.5910μs | 67.5034μs | 14.8141 KOps/s | 15.1261 KOps/s | |
test_setitem_dim[range] | 0.1605ms | 0.1259ms | 7.9453 KOps/s | 8.1035 KOps/s | |
test_setitem_dim[tuple] | 85.1310μs | 60.3367μs | 16.5737 KOps/s | 16.9289 KOps/s | |
test_setitem | 89.2210μs | 41.5768μs | 24.0519 KOps/s | 25.6618 KOps/s | |
test_set | 78.4110μs | 40.8605μs | 24.4735 KOps/s | 26.4357 KOps/s | |
test_set_shared | 0.3331ms | 49.9682μs | 20.0127 KOps/s | 20.2516 KOps/s | |
test_update | 87.1310μs | 49.5877μs | 20.1663 KOps/s | 21.9836 KOps/s | |
test_update_nested | 88.3200μs | 57.2713μs | 17.4608 KOps/s | 19.0265 KOps/s | |
test_update__nested | 0.1126ms | 57.5725μs | 17.3694 KOps/s | 17.4709 KOps/s | |
test_set_nested | 90.5810μs | 42.8461μs | 23.3394 KOps/s | 25.0268 KOps/s | |
test_set_nested_new | 82.4200μs | 46.4655μs | 21.5214 KOps/s | 22.4436 KOps/s | |
test_select | 97.5120μs | 59.6215μs | 16.7725 KOps/s | 17.4946 KOps/s | |
test_select_nested | 0.4913ms | 42.7345μs | 23.4003 KOps/s | 23.9539 KOps/s | |
test_exclude_nested | 92.3910μs | 58.5374μs | 17.0831 KOps/s | 17.3942 KOps/s | |
test_empty[True] | 0.2732ms | 0.2425ms | 4.1242 KOps/s | 4.1013 KOps/s | |
test_empty[False] | 3.7940μs | 0.7395μs | 1.3523 MOps/s | 1.3526 MOps/s | |
test_to | 53.9010μs | 24.5707μs | 40.6988 KOps/s | 39.8909 KOps/s | |
test_to_nonblocking | 62.8210μs | 23.6418μs | 42.2980 KOps/s | 41.4903 KOps/s | |
test_unbind_speed | 0.3365ms | 0.2753ms | 3.6328 KOps/s | 3.6998 KOps/s | |
test_unbind_speed_stack0 | 0.3165ms | 0.2656ms | 3.7651 KOps/s | 3.7310 KOps/s | |
test_unbind_speed_stack1 | 90.6388ms | 0.6965ms | 1.4358 KOps/s | 1.4419 KOps/s | |
test_split | 92.1653ms | 2.1486ms | 465.4150 Ops/s | 470.8769 Ops/s | |
test_chunk | 94.1542ms | 2.1559ms | 463.8447 Ops/s | 472.2673 Ops/s | |
test_creation[device0] | 0.3405ms | 0.1229ms | 8.1365 KOps/s | 8.0612 KOps/s | |
test_creation_from_tensor | 0.4876ms | 0.1257ms | 7.9540 KOps/s | 7.8767 KOps/s | |
test_add_one[memmap_tensor0] | 0.1735ms | 8.5236μs | 117.3213 KOps/s | 115.4556 KOps/s | |
test_contiguous[memmap_tensor0] | 30.0000μs | 2.1753μs | 459.6978 KOps/s | 445.3857 KOps/s | |
test_stack[memmap_tensor0] | 36.2300μs | 6.4397μs | 155.2879 KOps/s | 151.8354 KOps/s | |
test_memmaptd_index | 1.0033ms | 0.4120ms | 2.4270 KOps/s | 2.4633 KOps/s | |
test_memmaptd_index_astensor | 0.7162ms | 0.4621ms | 2.1643 KOps/s | 2.1640 KOps/s | |
test_memmaptd_index_op | 1.4035ms | 1.0038ms | 996.2011 Ops/s | 1.0370 KOps/s | |
test_serialize_model | 0.1307s | 0.1299s | 7.6992 Ops/s | 7.7409 Ops/s | |
test_serialize_model_pickle | 1.3464s | 1.2118s | 0.8252 Ops/s | 0.8207 Ops/s | |
test_serialize_weights | 0.1310s | 0.1292s | 7.7429 Ops/s | 7.0179 Ops/s | |
test_serialize_weights_returnearly | 0.2377s | 61.5034ms | 16.2593 Ops/s | 18.0950 Ops/s | |
test_serialize_weights_pickle | 1.7665s | 1.2603s | 0.7935 Ops/s | 0.6197 Ops/s | |
test_reshape_pytree | 64.2800μs | 34.9538μs | 28.6092 KOps/s | 28.9190 KOps/s | |
test_reshape_td | 77.1010μs | 41.4799μs | 24.1081 KOps/s | 24.3113 KOps/s | |
test_view_pytree | 73.9810μs | 34.9471μs | 28.6147 KOps/s | 29.1777 KOps/s | |
test_view_td | 82.1210μs | 45.1941μs | 22.1268 KOps/s | 21.3256 KOps/s | |
test_unbind_pytree | 64.0900μs | 34.0869μs | 29.3368 KOps/s | 29.9581 KOps/s | |
test_unbind_td | 0.7108ms | 42.5218μs | 23.5174 KOps/s | 23.8993 KOps/s | |
test_split_pytree | 78.1910μs | 47.0598μs | 21.2496 KOps/s | 22.6536 KOps/s | |
test_split_td | 0.1839ms | 55.1990μs | 18.1163 KOps/s | 15.8880 KOps/s | |
test_add_pytree | 0.1028ms | 55.2818μs | 18.0891 KOps/s | 18.4068 KOps/s | |
test_add_td | 0.1452ms | 90.8862μs | 11.0028 KOps/s | 11.7744 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.4054ms | 0.2088ms | 4.7886 KOps/s | 4.8204 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.1937ms | 0.1490ms | 6.7093 KOps/s | 6.8061 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.2073ms | 0.1469ms | 6.8062 KOps/s | 7.1389 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.2409ms | 0.1777ms | 5.6286 KOps/s | 5.8290 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 55.6610μs | 22.2324μs | 44.9793 KOps/s | 50.0286 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 87.4510μs | 43.3828μs | 23.0506 KOps/s | 23.7127 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.2420ms | 63.0894μs | 15.8505 KOps/s | 16.0430 KOps/s | |
test_compile_copy_nested[pytree-eager] | 0.1005ms | 49.0015μs | 20.4075 KOps/s | 20.4780 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.4081ms | 0.3094ms | 3.2321 KOps/s | 3.2487 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.2692ms | 0.2079ms | 4.8101 KOps/s | 4.8680 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.1720ms | 0.1269ms | 7.8776 KOps/s | 8.0869 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.1099ms | 61.6414μs | 16.2229 KOps/s | 17.2910 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.3685ms | 0.3107ms | 3.2184 KOps/s | 3.2147 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.7185ms | 0.6040ms | 1.6556 KOps/s | 1.7244 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.2946ms | 0.2464ms | 4.0578 KOps/s | 4.0648 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.3660ms | 0.3119ms | 3.2059 KOps/s | 3.2300 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.1065ms | 69.1628μs | 14.4586 KOps/s | 14.7985 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1793ms | 0.1259ms | 7.9401 KOps/s | 8.1215 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.5967ms | 0.5147ms | 1.9429 KOps/s | 1.9997 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.3557ms | 0.3094ms | 3.2319 KOps/s | 3.2330 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 43.3700μs | 17.8564μs | 56.0022 KOps/s | 60.4932 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 69.4100μs | 27.4253μs | 36.4627 KOps/s | 37.2380 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1070ms | 67.9420μs | 14.7184 KOps/s | 14.8054 KOps/s | |
test_compile_copy_flat[pytree-eager] | 91.8900μs | 51.8180μs | 19.2983 KOps/s | 19.6484 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 2.3056ms | 0.8075ms | 1.2384 KOps/s | 1.1388 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 3.2641ms | 3.0763ms | 325.0613 Ops/s | 327.9306 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 2.2796ms | 0.8018ms | 1.2473 KOps/s | 1.1353 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 3.2487ms | 3.1033ms | 322.2415 Ops/s | 325.7881 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.1591ms | 0.1080ms | 9.2561 KOps/s | 9.3371 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.1915ms | 60.1702μs | 16.6195 KOps/s | 16.0320 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 0.1409ms | 0.1009ms | 9.9061 KOps/s | 9.7728 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 0.1720ms | 42.5894μs | 23.4800 KOps/s | 23.3756 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 0.1496ms | 0.1052ms | 9.5035 KOps/s | 9.8887 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 87.9610μs | 42.5063μs | 23.5259 KOps/s | 22.6068 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.2059ms | 0.1386ms | 7.2144 KOps/s | 7.5614 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.1555ms | 24.0468μs | 41.5855 KOps/s | 41.6255 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.2071ms | 0.1276ms | 7.8378 KOps/s | 7.9461 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 70.9410μs | 20.1958μs | 49.5153 KOps/s | 49.5986 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.1850ms | 0.1281ms | 7.8090 KOps/s | 7.8376 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 57.3710μs | 20.3708μs | 49.0899 KOps/s | 50.5460 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.1826ms | 0.1344ms | 7.4409 KOps/s | 7.5187 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 0.5111ms | 23.8349μs | 41.9553 KOps/s | 41.2057 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.1634ms | 0.1280ms | 7.8110 KOps/s | 7.8787 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 0.1199ms | 20.8187μs | 48.0337 KOps/s | 49.8992 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.1906ms | 0.1280ms | 7.8128 KOps/s | 7.8503 KOps/s | |
test_compile_indexing[int-pytree-eager] | 52.1010μs | 20.2113μs | 49.4772 KOps/s | 49.7982 KOps/s | |
test_mod_add[eager] | 71.8010μs | 31.5597μs | 31.6860 KOps/s | 33.9726 KOps/s | |
test_mod_add[compile] | 0.1751ms | 68.4185μs | 14.6159 KOps/s | 14.2624 KOps/s | |
test_mod_add[compile-overhead] | 0.2616ms | 0.1330ms | 7.5173 KOps/s | 7.0118 KOps/s | |
test_mod_wrap[eager] | 0.3464ms | 0.2443ms | 4.0934 KOps/s | 4.1686 KOps/s | |
test_mod_wrap[compile] | 0.6390ms | 0.2841ms | 3.5197 KOps/s | 3.4206 KOps/s | |
test_mod_wrap[compile-overhead] | 7.6480ms | 4.1055ms | 243.5728 Ops/s | 246.0068 Ops/s | |
test_mod_wrap_and_backward[eager] | 1.4965ms | 1.3657ms | 732.2291 Ops/s | 679.5456 Ops/s | |
test_mod_wrap_and_backward[compile] | 1.5872ms | 1.3106ms | 763.0222 Ops/s | 701.3656 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 1.3376ms | 0.8990ms | 1.1123 KOps/s | 953.0049 Ops/s | |
test_seq_add[eager] | 0.1517ms | 93.6108μs | 10.6825 KOps/s | 10.7992 KOps/s | |
test_seq_add[compile] | 0.5246ms | 77.2442μs | 12.9460 KOps/s | 12.4963 KOps/s | |
test_seq_add[compile-overhead] | 0.1603ms | 0.1113ms | 8.9854 KOps/s | 9.0779 KOps/s | |
test_seq_wrap[eager] | 0.4418ms | 0.3799ms | 2.6323 KOps/s | 2.6760 KOps/s | |
test_seq_wrap[compile] | 0.3778ms | 0.3035ms | 3.2944 KOps/s | 3.2190 KOps/s | |
test_seq_wrap[compile-overhead] | 0.2592ms | 0.2137ms | 4.6786 KOps/s | 4.5779 KOps/s | |
test_func_call_runtime[False-eager] | 0.8184ms | 0.7347ms | 1.3611 KOps/s | 1.3064 KOps/s | |
test_func_call_runtime[False-compile] | 0.9276ms | 0.7561ms | 1.3226 KOps/s | 1.2997 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.4204ms | 0.3523ms | 2.8386 KOps/s | 2.8019 KOps/s | |
test_func_call_runtime[True-eager] | 0.9610ms | 0.8894ms | 1.1243 KOps/s | 1.1072 KOps/s | |
test_func_call_runtime[True-compile] | 0.9383ms | 0.7758ms | 1.2889 KOps/s | 1.2685 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.4892ms | 0.3740ms | 2.6739 KOps/s | 2.6759 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.7840ms | 0.7266ms | 1.3762 KOps/s | 1.3613 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.8891ms | 0.7570ms | 1.3209 KOps/s | 1.2953 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.4412ms | 0.3530ms | 2.8329 KOps/s | 2.8085 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.0686ms | 0.9783ms | 1.0222 KOps/s | 1.0051 KOps/s | |
test_func_call_cm_runtime[True-compile] | 1.0260ms | 0.8109ms | 1.2332 KOps/s | 1.2171 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 0.4414ms | 0.3970ms | 2.5189 KOps/s | 2.4920 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 2.4962ms | 2.0532ms | 487.0368 Ops/s | 483.0382 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 0.9428ms | 0.8182ms | 1.2221 KOps/s | 1.1962 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 0.4583ms | 0.4050ms | 2.4689 KOps/s | 2.4694 KOps/s | |
test_distributed | 2.7136ms | 0.1894ms | 5.2802 KOps/s | 8.8202 KOps/s | |
test_tdmodule | 0.2631ms | 15.9217μs | 62.8073 KOps/s | 69.4944 KOps/s | |
test_tdmodule_dispatch | 50.2400μs | 30.5473μs | 32.7361 KOps/s | 37.5434 KOps/s | |
test_tdseq | 25.3800μs | 16.3703μs | 61.0864 KOps/s | 68.2745 KOps/s | |
test_tdseq_dispatch | 53.9210μs | 33.4116μs | 29.9298 KOps/s | 34.3292 KOps/s | |
test_instantiation_functorch | 2.0047ms | 1.8308ms | 546.2184 Ops/s | 548.5747 Ops/s | |
test_instantiation_td | 1.7687ms | 1.1685ms | 855.8304 Ops/s | 848.6527 Ops/s | |
test_exec_functorch | 0.3496ms | 0.2110ms | 4.7391 KOps/s | 4.8194 KOps/s | |
test_exec_functional_call | 0.3115ms | 0.2234ms | 4.4761 KOps/s | 4.6685 KOps/s | |
test_exec_td | 0.2789ms | 0.2283ms | 4.3795 KOps/s | 4.3335 KOps/s | |
test_exec_td_decorator | 0.8523ms | 0.2701ms | 3.7022 KOps/s | 3.8768 KOps/s | |
test_vmap_mlp_speed[True-True] | 0.7666ms | 0.6830ms | 1.4642 KOps/s | 1.4668 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.7614ms | 0.6841ms | 1.4618 KOps/s | 1.4706 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.6612ms | 0.5774ms | 1.7320 KOps/s | 1.7333 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.6420ms | 0.5736ms | 1.7435 KOps/s | 1.7592 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.2335ms | 0.6727ms | 1.4864 KOps/s | 1.5092 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.7840ms | 0.6708ms | 1.4908 KOps/s | 1.5015 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7272ms | 0.5843ms | 1.7114 KOps/s | 1.6906 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.6886ms | 0.5854ms | 1.7082 KOps/s | 1.6720 KOps/s | |
test_vmap_transformer_speed[True-True] | 8.3700ms | 8.2960ms | 120.5400 Ops/s | 120.1404 Ops/s | |
test_vmap_transformer_speed[True-False] | 8.6670ms | 8.3971ms | 119.0891 Ops/s | 120.2400 Ops/s | |
test_vmap_transformer_speed[False-True] | 8.4092ms | 8.2957ms | 120.5448 Ops/s | 123.0963 Ops/s | |
test_vmap_transformer_speed[False-False] | 8.5334ms | 8.3108ms | 120.3253 Ops/s | 123.8357 Ops/s | |
test_vmap_transformer_speed_decorator[True-True] | 20.4039ms | 20.0035ms | 49.9912 Ops/s | 51.7147 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 20.1065ms | 19.5820ms | 51.0673 Ops/s | 51.6551 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 19.6443ms | 19.2568ms | 51.9298 Ops/s | 52.1132 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 19.3843ms | 19.2839ms | 51.8568 Ops/s | 52.0436 Ops/s | |
test_to_module_speed[True] | 1.3873ms | 0.9345ms | 1.0701 KOps/s | 1.0678 KOps/s | |
test_to_module_speed[False] | 1.2673ms | 0.8915ms | 1.1217 KOps/s | 1.0951 KOps/s | |
test_tc_init | 56.1710μs | 34.6229μs | 28.8826 KOps/s | 30.2922 KOps/s | |
test_tc_init_nested | 0.1158ms | 72.5173μs | 13.7898 KOps/s | 15.0748 KOps/s | |
test_tc_first_layer_tensor | 4.1316μs | 0.6696μs | 1.4933 MOps/s | 1.4954 MOps/s | |
test_tc_first_layer_nontensor | 26.3900μs | 2.2010μs | 454.3398 KOps/s | 453.0034 KOps/s | |
test_tc_second_layer_tensor | 9.1050μs | 1.3594μs | 735.6153 KOps/s | 741.9127 KOps/s | |
test_tc_second_layer_nontensor | 30.1900μs | 2.9209μs | 342.3587 KOps/s | 343.6494 KOps/s | |
test_unbind | 0.1966s | 12.3039ms | 81.2754 Ops/s | 93.5989 Ops/s | |
test_full_like | 0.6571ms | 0.5747ms | 1.7401 KOps/s | 1.7382 KOps/s | |
test_zeros_like | 0.2768ms | 0.1979ms | 5.0527 KOps/s | 5.0536 KOps/s | |
test_ones_like | 0.2333ms | 0.1978ms | 5.0568 KOps/s | 5.0579 KOps/s | |
test_clone | 0.4480ms | 0.4144ms | 2.4131 KOps/s | 2.4139 KOps/s | |
test_squeeze | 50.0610μs | 9.6076μs | 104.0845 KOps/s | 101.5683 KOps/s | |
test_unsqueeze | 0.2193ms | 73.4244μs | 13.6194 KOps/s | 14.0090 KOps/s | |
test_split | 0.4298ms | 0.1552ms | 6.4441 KOps/s | 6.4738 KOps/s | |
test_permute | 0.2267ms | 0.1730ms | 5.7817 KOps/s | 5.5896 KOps/s | |
test_stack | 1.2661ms | 0.8626ms | 1.1593 KOps/s | 1.1424 KOps/s | |
test_cat | 1.2566ms | 1.2314ms | 812.0774 Ops/s | 811.8455 Ops/s |
vmoens added a commit that referenced this pull request Sep 25, 2024
ghstack-source-id: 69f4795d5cb81db7b79d9c98626414c4cc5ce886 Pull Request resolved: #1009
Labels
bug: Something isn't working
CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Refactor: Refactoring code - not a new feature
Stack from ghstack (oldest at bottom):