[Feature] better sync and instantiation of cudagraphs #1013
Merged
ghstack-source-id: d12b596cce3db900ca584d0956cef03105db510f
Pull Request resolved: #1011
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 75.9510μs | 20.8001μs | 48.0768 KOps/s | 50.7321 KOps/s | |
test_plain_set_stack_nested | 57.4580μs | 20.5283μs | 48.7132 KOps/s | 50.0294 KOps/s | |
test_plain_set_nested_inplace | 81.2310μs | 22.1863μs | 45.0729 KOps/s | 45.7433 KOps/s | |
test_plain_set_stack_nested_inplace | 59.8820μs | 22.5394μs | 44.3667 KOps/s | 46.5999 KOps/s | |
test_items | 44.8940μs | 4.1201μs | 242.7117 KOps/s | 243.8054 KOps/s | |
test_items_nested | 0.6724ms | 0.3657ms | 2.7348 KOps/s | 2.7611 KOps/s | |
test_items_nested_locked | 0.5893ms | 0.3676ms | 2.7203 KOps/s | 2.7386 KOps/s | |
test_items_nested_leaf | 0.1263ms | 69.5355μs | 14.3811 KOps/s | 14.7118 KOps/s | |
test_items_stack_nested | 0.6441ms | 0.3704ms | 2.7001 KOps/s | 2.7382 KOps/s | |
test_items_stack_nested_leaf | 0.1741ms | 72.9856μs | 13.7013 KOps/s | 14.0434 KOps/s | |
test_items_stack_nested_locked | 0.6210ms | 0.3725ms | 2.6847 KOps/s | 2.7422 KOps/s | |
test_keys | 53.0910μs | 3.5419μs | 282.3346 KOps/s | 287.8711 KOps/s | |
test_keys_nested | 0.1546ms | 0.1009ms | 9.9067 KOps/s | 9.8705 KOps/s | |
test_keys_nested_locked | 0.6963ms | 0.1065ms | 9.3911 KOps/s | 9.3930 KOps/s | |
test_keys_nested_leaf | 0.1436ms | 83.8502μs | 11.9260 KOps/s | 11.7913 KOps/s | |
test_keys_stack_nested | 0.2021ms | 0.1013ms | 9.8691 KOps/s | 9.7950 KOps/s | |
test_keys_stack_nested_leaf | 0.1401ms | 82.3233μs | 12.1472 KOps/s | 11.6353 KOps/s | |
test_keys_stack_nested_locked | 0.1729ms | 0.1050ms | 9.5276 KOps/s | 9.3013 KOps/s | |
test_values | 6.9328μs | 1.0821μs | 924.0925 KOps/s | 980.4459 KOps/s | |
test_values_nested | 0.1243ms | 73.6632μs | 13.5753 KOps/s | 13.6180 KOps/s | |
test_values_nested_locked | 0.1260ms | 73.5766μs | 13.5913 KOps/s | 13.5671 KOps/s | |
test_values_nested_leaf | 0.1206ms | 62.7534μs | 15.9354 KOps/s | 16.1928 KOps/s | |
test_values_stack_nested | 0.1252ms | 74.1892μs | 13.4791 KOps/s | 13.5026 KOps/s | |
test_values_stack_nested_leaf | 0.1083ms | 60.2811μs | 16.5889 KOps/s | 15.8408 KOps/s | |
test_values_stack_nested_locked | 0.1280ms | 74.0579μs | 13.5030 KOps/s | 13.3820 KOps/s | |
test_membership | 5.6060μs | 0.7375μs | 1.3559 MOps/s | 1.3993 MOps/s | |
test_membership_nested | 29.6350μs | 2.7725μs | 360.6865 KOps/s | 360.4876 KOps/s | |
test_membership_nested_leaf | 25.9280μs | 2.8116μs | 355.6664 KOps/s | 361.8073 KOps/s | |
test_membership_stacked_nested | 25.2870μs | 2.7904μs | 358.3728 KOps/s | 364.9851 KOps/s | |
test_membership_stacked_nested_leaf | 21.0890μs | 2.8152μs | 355.2141 KOps/s | 364.6970 KOps/s | |
test_membership_nested_last | 29.1340μs | 4.0657μs | 245.9599 KOps/s | 250.1414 KOps/s | |
test_membership_nested_leaf_last | 34.2340μs | 4.0594μs | 246.3428 KOps/s | 251.2361 KOps/s | |
test_membership_stacked_nested_last | 57.6370μs | 13.1865μs | 75.8354 KOps/s | 253.2746 KOps/s | |
test_membership_stacked_nested_leaf_last | 37.8400μs | 13.2308μs | 75.5811 KOps/s | 252.0183 KOps/s | |
test_nested_getleaf | 50.3530μs | 10.7214μs | 93.2718 KOps/s | 95.3846 KOps/s | |
test_nested_get | 40.6050μs | 10.2226μs | 97.8227 KOps/s | 99.9076 KOps/s | |
test_stacked_getleaf | 39.4840μs | 10.9375μs | 91.4283 KOps/s | 94.3651 KOps/s | |
test_stacked_get | 45.6950μs | 10.0671μs | 99.3339 KOps/s | 98.0864 KOps/s | |
test_nested_getitemleaf | 52.7260μs | 11.0438μs | 90.5489 KOps/s | 91.6781 KOps/s | |
test_nested_getitem | 42.1580μs | 10.6054μs | 94.2916 KOps/s | 97.6586 KOps/s | |
test_stacked_getitemleaf | 56.5360μs | 10.9709μs | 91.1501 KOps/s | 91.4678 KOps/s | |
test_stacked_getitem | 60.0400μs | 10.6210μs | 94.1529 KOps/s | 95.7942 KOps/s | |
test_lock_nested | 97.3995ms | 0.5964ms | 1.6766 KOps/s | 1.9966 KOps/s | |
test_lock_stack_nested | 0.5428ms | 0.4455ms | 2.2449 KOps/s | 2.1554 KOps/s | |
test_unlock_nested | 98.2050ms | 0.5116ms | 1.9546 KOps/s | 2.3773 KOps/s | |
test_unlock_stack_nested | 0.4603ms | 0.3609ms | 2.7706 KOps/s | 2.6067 KOps/s | |
test_flatten_speed | 0.1754ms | 90.6780μs | 11.0280 KOps/s | 11.4360 KOps/s | |
test_unflatten_speed | 0.7547ms | 0.4791ms | 2.0874 KOps/s | 2.1573 KOps/s | |
test_common_ops | 5.1349ms | 1.1675ms | 856.5629 Ops/s | 881.1135 Ops/s | |
test_creation | 27.8420μs | 2.1098μs | 473.9783 KOps/s | 485.2424 KOps/s | |
test_creation_empty | 51.7960μs | 17.5536μs | 56.9685 KOps/s | 59.8439 KOps/s | |
test_creation_nested_1 | 61.0140μs | 20.4842μs | 48.8181 KOps/s | 49.8920 KOps/s | |
test_creation_nested_2 | 60.0520μs | 25.5412μs | 39.1524 KOps/s | 40.7313 KOps/s | |
test_clone | 0.3770ms | 17.2496μs | 57.9723 KOps/s | 57.6422 KOps/s | |
test_getitem[int] | 0.6604ms | 17.7159μs | 56.4464 KOps/s | 58.6095 KOps/s | |
test_getitem[slice_int] | 0.1353ms | 32.6457μs | 30.6319 KOps/s | 32.2090 KOps/s | |
test_getitem[range] | 0.1774ms | 61.0101μs | 16.3907 KOps/s | 16.4984 KOps/s | |
test_getitem[tuple] | 0.1402ms | 26.6561μs | 37.5148 KOps/s | 38.7592 KOps/s | |
test_getitem[list] | 0.5553ms | 56.8491μs | 17.5904 KOps/s | 17.8335 KOps/s | |
test_setitem_dim[int] | 80.8910μs | 34.8353μs | 28.7065 KOps/s | 29.8642 KOps/s | |
test_setitem_dim[slice_int] | 0.1158ms | 64.0303μs | 15.6176 KOps/s | 15.2459 KOps/s | |
test_setitem_dim[range] | 0.1710ms | 88.6989μs | 11.2741 KOps/s | 11.3927 KOps/s | |
test_setitem_dim[tuple] | 0.1171ms | 53.7709μs | 18.5974 KOps/s | 19.9277 KOps/s | |
test_setitem | 0.3872ms | 29.9121μs | 33.4313 KOps/s | 33.8729 KOps/s | |
test_set | 0.3678ms | 28.4233μs | 35.1824 KOps/s | 34.8321 KOps/s | |
test_set_shared | 3.3666ms | 0.2208ms | 4.5280 KOps/s | 4.5524 KOps/s | |
test_update | 0.3385ms | 35.8435μs | 27.8990 KOps/s | 27.7882 KOps/s | |
test_update_nested | 0.3925ms | 46.7282μs | 21.4004 KOps/s | 21.5400 KOps/s | |
test_update__nested | 0.3967ms | 35.1300μs | 28.4657 KOps/s | 28.5458 KOps/s | |
test_set_nested | 0.3414ms | 31.0999μs | 32.1544 KOps/s | 32.1246 KOps/s | |
test_set_nested_new | 0.3754ms | 38.0387μs | 26.2890 KOps/s | 27.4331 KOps/s | |
test_select | 0.4082ms | 55.2550μs | 18.0979 KOps/s | 18.3584 KOps/s | |
test_select_nested | 0.1408ms | 60.6033μs | 16.5008 KOps/s | 16.9204 KOps/s | |
test_exclude_nested | 0.1507ms | 76.0698μs | 13.1458 KOps/s | 13.4374 KOps/s | |
test_empty[True] | 1.0859ms | 0.3254ms | 3.0736 KOps/s | 3.1022 KOps/s | |
test_empty[False] | 8.3780μs | 1.2482μs | 801.1698 KOps/s | 829.3826 KOps/s | |
test_unbind_speed | 0.3818ms | 0.3041ms | 3.2886 KOps/s | 3.2625 KOps/s | |
test_unbind_speed_stack0 | 0.5818ms | 0.2893ms | 3.4566 KOps/s | 3.3388 KOps/s | |
test_unbind_speed_stack1 | 0.1022s | 0.7964ms | 1.2557 KOps/s | 1.4574 KOps/s | |
test_split | 3.1457ms | 2.0615ms | 485.0724 Ops/s | 456.7281 Ops/s | |
test_chunk | 0.1025s | 2.2731ms | 439.9201 Ops/s | 456.7179 Ops/s | |
test_creation[device0] | 0.2710ms | 0.1191ms | 8.3931 KOps/s | 8.3380 KOps/s | |
test_creation_from_tensor | 3.5683ms | 0.1200ms | 8.3330 KOps/s | 8.2538 KOps/s | |
test_add_one[memmap_tensor0] | 0.6545ms | 7.6524μs | 130.6785 KOps/s | 137.9640 KOps/s | |
test_contiguous[memmap_tensor0] | 24.0750μs | 1.8669μs | 535.6376 KOps/s | 513.5518 KOps/s | |
test_stack[memmap_tensor0] | 0.1258ms | 5.7612μs | 173.5746 KOps/s | 171.6360 KOps/s | |
test_memmaptd_index | 1.1863ms | 0.4146ms | 2.4121 KOps/s | 2.4731 KOps/s | |
test_memmaptd_index_astensor | 0.7556ms | 0.4900ms | 2.0406 KOps/s | 2.0532 KOps/s | |
test_memmaptd_index_op | 1.7610ms | 1.0450ms | 956.9662 Ops/s | 989.7771 Ops/s | |
test_serialize_model | 0.2209s | 0.1400s | 7.1404 Ops/s | 8.3259 Ops/s | |
test_serialize_model_pickle | 0.4747s | 0.3946s | 2.5343 Ops/s | 2.5269 Ops/s | |
test_serialize_weights | 0.1305s | 0.1194s | 8.3756 Ops/s | 8.3300 Ops/s | |
test_serialize_weights_returnearly | 0.2605s | 0.1729s | 5.7846 Ops/s | 6.2410 Ops/s | |
test_serialize_weights_pickle | 0.4502s | 0.3988s | 2.5077 Ops/s | 2.3279 Ops/s | |
test_serialize_weights_filesystem | 0.1509s | 0.1445s | 6.9181 Ops/s | 6.8625 Ops/s | |
test_serialize_model_filesystem | 0.1606s | 0.1528s | 6.5438 Ops/s | 5.9774 Ops/s | |
test_reshape_pytree | 83.2150μs | 39.7709μs | 25.1440 KOps/s | 25.5039 KOps/s | |
test_reshape_td | 0.1209ms | 46.2590μs | 21.6174 KOps/s | 22.2850 KOps/s | |
test_view_pytree | 79.5180μs | 39.7882μs | 25.1331 KOps/s | 25.6331 KOps/s | |
test_view_td | 0.1337ms | 53.3852μs | 18.7318 KOps/s | 18.8463 KOps/s | |
test_unbind_pytree | 81.1010μs | 36.7487μs | 27.2119 KOps/s | 27.5695 KOps/s | |
test_unbind_td | 0.2920ms | 45.2945μs | 22.0777 KOps/s | 22.0417 KOps/s | |
test_split_pytree | 0.1120ms | 39.8635μs | 25.0856 KOps/s | 26.4998 KOps/s | |
test_split_td | 0.4517ms | 59.0595μs | 16.9321 KOps/s | 17.7012 KOps/s | |
test_add_pytree | 0.1149ms | 46.6712μs | 21.4265 KOps/s | 21.9586 KOps/s | |
test_add_td | 0.1675ms | 83.6977μs | 11.9478 KOps/s | 12.2541 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.1097ms | 57.3176μs | 17.4466 KOps/s | 16.8204 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.3943ms | 0.1846ms | 5.4159 KOps/s | 5.5192 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.1256ms | 56.8667μs | 17.5850 KOps/s | 17.0718 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.2874ms | 0.1471ms | 6.7960 KOps/s | 7.0898 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 55.8840μs | 21.0877μs | 47.4209 KOps/s | 46.9453 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 0.1300ms | 67.5963μs | 14.7937 KOps/s | 14.9844 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.1580ms | 76.7841μs | 13.0235 KOps/s | 13.4418 KOps/s | |
test_compile_copy_nested[pytree-eager] | 0.1284ms | 69.7734μs | 14.3321 KOps/s | 14.9228 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.2703ms | 0.1748ms | 5.7216 KOps/s | 5.6349 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.3476ms | 0.1971ms | 5.0741 KOps/s | 5.1750 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.1160ms | 47.3711μs | 21.1099 KOps/s | 19.9415 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.1495ms | 72.1403μs | 13.8619 KOps/s | 14.0120 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.2951ms | 0.1779ms | 5.6209 KOps/s | 5.6541 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.5068ms | 0.3014ms | 3.3179 KOps/s | 3.4049 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.4241ms | 0.2122ms | 4.7115 KOps/s | 4.9723 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.3738ms | 0.1775ms | 5.6334 KOps/s | 5.6543 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.1400ms | 65.0401μs | 15.3751 KOps/s | 16.2785 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1129ms | 47.3405μs | 21.1236 KOps/s | 19.8788 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.4285ms | 0.2365ms | 4.2279 KOps/s | 4.2604 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.4086ms | 0.1804ms | 5.5431 KOps/s | 5.5809 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 0.1891ms | 0.1036ms | 9.6494 KOps/s | 9.4521 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 0.1178ms | 57.9392μs | 17.2595 KOps/s | 17.4645 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1529ms | 77.0701μs | 12.9752 KOps/s | 13.2221 KOps/s | |
test_compile_copy_flat[pytree-eager] | 0.1422ms | 69.4458μs | 14.3997 KOps/s | 14.6270 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 0.2638ms | 0.1929ms | 5.1853 KOps/s | 5.1322 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 2.6254ms | 1.7293ms | 578.2753 Ops/s | 590.7948 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 0.2887ms | 0.1924ms | 5.1986 KOps/s | 5.1396 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 1.3555ms | 1.1396ms | 877.5318 Ops/s | 896.1605 Ops/s | |
test_compile_assign_and_add_stack[compile] | 0.5337ms | 0.4183ms | 2.3904 KOps/s | 2.3091 KOps/s | |
test_compile_assign_and_add_stack[eager] | 5.7078ms | 3.9884ms | 250.7299 Ops/s | 257.0290 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 80.8810μs | 35.0470μs | 28.5331 KOps/s | 27.1591 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.6155ms | 49.7886μs | 20.0849 KOps/s | 20.0128 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 0.1404ms | 30.4633μs | 32.8264 KOps/s | 32.4168 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 0.1045ms | 29.8820μs | 33.4649 KOps/s | 34.9710 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 80.7400μs | 29.7354μs | 33.6299 KOps/s | 32.6941 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 0.1247ms | 29.3769μs | 34.0404 KOps/s | 34.3908 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1556ms | 74.3416μs | 13.4514 KOps/s | 13.1459 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.6151ms | 28.5626μs | 35.0108 KOps/s | 35.7613 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.1688ms | 68.8127μs | 14.5322 KOps/s | 14.2080 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 81.4810μs | 23.8567μs | 41.9169 KOps/s | 42.9265 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.1620ms | 68.5795μs | 14.5816 KOps/s | 14.3254 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 76.4620μs | 23.5948μs | 42.3822 KOps/s | 43.0981 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.1606ms | 74.8603μs | 13.3582 KOps/s | 13.1142 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 1.2940ms | 28.5833μs | 34.9855 KOps/s | 35.7834 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.1479ms | 68.6829μs | 14.5597 KOps/s | 14.1857 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 83.4050μs | 24.1839μs | 41.3498 KOps/s | 43.1998 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.1682ms | 69.0550μs | 14.4812 KOps/s | 14.3822 KOps/s | |
test_compile_indexing[int-pytree-eager] | 85.9000μs | 23.2966μs | 42.9247 KOps/s | 43.7718 KOps/s | |
test_mod_add[eager] | 79.9190μs | 26.2216μs | 38.1365 KOps/s | 40.6286 KOps/s | |
test_mod_add[compile] | 94.8970μs | 40.3340μs | 24.7930 KOps/s | 24.0370 KOps/s | |
test_mod_add[compile-overhead] | 0.1040ms | 40.7676μs | 24.5293 KOps/s | 23.8894 KOps/s | |
test_mod_wrap[eager] | 0.4197ms | 0.2170ms | 4.6077 KOps/s | 4.6565 KOps/s | |
test_mod_wrap[compile] | 0.4251ms | 0.2402ms | 4.1631 KOps/s | 4.1761 KOps/s | |
test_mod_wrap[compile-overhead] | 0.4374ms | 0.2377ms | 4.2064 KOps/s | 4.2113 KOps/s | |
test_mod_wrap_and_backward[eager] | 12.1570ms | 11.0382ms | 90.5941 Ops/s | 89.8722 Ops/s | |
test_mod_wrap_and_backward[compile] | 13.4672ms | 11.3170ms | 88.3626 Ops/s | 87.5500 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 15.2512ms | 11.8384ms | 84.4707 Ops/s | 83.4057 Ops/s | |
test_seq_add[eager] | 0.1799ms | 95.7162μs | 10.4476 KOps/s | 11.0292 KOps/s | |
test_seq_add[compile] | 0.1657ms | 66.1192μs | 15.1242 KOps/s | 15.4909 KOps/s | |
test_seq_add[compile-overhead] | 0.1296ms | 64.3328μs | 15.5442 KOps/s | 15.6692 KOps/s | |
test_seq_wrap[eager] | 0.6069ms | 0.3977ms | 2.5144 KOps/s | 2.5867 KOps/s | |
test_seq_wrap[compile] | 1.4902ms | 0.2763ms | 3.6196 KOps/s | 3.6484 KOps/s | |
test_seq_wrap[compile-overhead] | 1.5212ms | 0.2750ms | 3.6357 KOps/s | 3.6097 KOps/s | |
test_func_call_runtime[False-eager] | 0.8836ms | 0.5438ms | 1.8391 KOps/s | 1.8191 KOps/s | |
test_func_call_runtime[False-compile] | 0.6949ms | 0.5092ms | 1.9637 KOps/s | 1.9694 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.6372ms | 0.5120ms | 1.9533 KOps/s | 1.9493 KOps/s | |
test_func_call_runtime[True-eager] | 0.9934ms | 0.7791ms | 1.2836 KOps/s | 1.2984 KOps/s | |
test_func_call_runtime[True-compile] | 0.6513ms | 0.5221ms | 1.9152 KOps/s | 1.9144 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.6960ms | 0.5223ms | 1.9148 KOps/s | 1.9063 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.8022ms | 0.5388ms | 1.8560 KOps/s | 1.8364 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.9879ms | 0.5145ms | 1.9436 KOps/s | 1.9465 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.6742ms | 0.5119ms | 1.9535 KOps/s | 1.9404 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.4854ms | 0.9084ms | 1.1008 KOps/s | 1.1115 KOps/s | |
test_func_call_cm_runtime[True-compile] | 1.2101ms | 0.7659ms | 1.3057 KOps/s | 1.3006 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 0.9432ms | 0.7663ms | 1.3049 KOps/s | 1.3014 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 2.6818ms | 1.9425ms | 514.7889 Ops/s | 515.9657 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 2.9737ms | 1.9934ms | 501.6508 Ops/s | 501.4311 Ops/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 3.6924ms | 1.9965ms | 500.8657 Ops/s | 503.6745 Ops/s | |
test_distributed | 0.3298ms | 0.1278ms | 7.8241 KOps/s | 7.5702 KOps/s | |
test_tdmodule | 45.5350μs | 18.4943μs | 54.0708 KOps/s | 54.6830 KOps/s | |
test_tdmodule_dispatch | 63.3280μs | 36.5221μs | 27.3807 KOps/s | 27.9754 KOps/s | |
test_tdseq | 45.9260μs | 20.9951μs | 47.6301 KOps/s | 48.6815 KOps/s | |
test_tdseq_dispatch | 61.1540μs | 42.0245μs | 23.7957 KOps/s | 24.0189 KOps/s | |
test_instantiation_functorch | 2.4924ms | 1.6169ms | 618.4732 Ops/s | 617.7070 Ops/s | |
test_instantiation_td | 2.2349ms | 1.2073ms | 828.3049 Ops/s | 826.1920 Ops/s | |
test_exec_functorch | 0.4461ms | 0.1895ms | 5.2769 KOps/s | 5.2423 KOps/s | |
test_exec_functional_call | 0.2868ms | 0.1799ms | 5.5572 KOps/s | 5.6105 KOps/s | |
test_exec_td | 0.3201ms | 0.1690ms | 5.9159 KOps/s | 5.6911 KOps/s | |
test_exec_td_decorator | 0.4763ms | 0.2314ms | 4.3210 KOps/s | 4.3719 KOps/s | |
test_vmap_mlp_speed[True-True] | 0.9807ms | 0.6704ms | 1.4917 KOps/s | 1.4772 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.9746ms | 0.6656ms | 1.5024 KOps/s | 1.5038 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.9445ms | 0.5174ms | 1.9328 KOps/s | 1.9672 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.7555ms | 0.5185ms | 1.9285 KOps/s | 1.9810 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.6486ms | 0.6468ms | 1.5460 KOps/s | 1.5556 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.9718ms | 0.6485ms | 1.5419 KOps/s | 1.5688 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7387ms | 0.5333ms | 1.8751 KOps/s | 1.9181 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7449ms | 0.5342ms | 1.8718 KOps/s | 1.8834 KOps/s | |
test_to_module_speed[True] | 2.0900ms | 1.3442ms | 743.9316 Ops/s | 773.6573 Ops/s | |
test_to_module_speed[False] | 1.4437ms | 1.3083ms | 764.3363 Ops/s | 791.5185 Ops/s | |
test_tc_init | 89.5270μs | 44.4913μs | 22.4763 KOps/s | 23.5745 KOps/s | |
test_tc_init_nested | 0.1705ms | 90.5980μs | 11.0378 KOps/s | 11.8961 KOps/s | |
test_tc_first_layer_tensor | 19.8370μs | 1.5803μs | 632.8031 KOps/s | 665.8089 KOps/s | |
test_tc_first_layer_nontensor | 26.8900μs | 4.8566μs | 205.9059 KOps/s | 219.8656 KOps/s | |
test_tc_second_layer_tensor | 58.8710μs | 2.8494μs | 350.9451 KOps/s | 358.4539 KOps/s | |
test_tc_second_layer_nontensor | 37.9910μs | 6.1511μs | 162.5727 KOps/s | 168.1433 KOps/s | |
test_unbind | 0.4892s | 14.5045ms | 68.9440 Ops/s | 64.8040 Ops/s | |
test_full_like | 10.5659ms | 8.1645ms | 122.4811 Ops/s | 118.0358 Ops/s | |
test_zeros_like | 14.2730ms | 6.0616ms | 164.9719 Ops/s | 317.1927 Ops/s | |
test_ones_like | 15.2806ms | 7.5565ms | 132.3366 Ops/s | 152.6570 Ops/s | |
test_clone | 16.6054ms | 9.6694ms | 103.4186 Ops/s | 118.2681 Ops/s | |
test_squeeze | 71.8130μs | 12.2792μs | 81.4388 KOps/s | 78.6157 KOps/s | |
test_unsqueeze | 0.1686ms | 93.7685μs | 10.6646 KOps/s | 10.8212 KOps/s | |
test_split | 0.5205ms | 0.2007ms | 4.9816 KOps/s | 5.0249 KOps/s | |
test_permute | 0.4426ms | 0.2238ms | 4.4685 KOps/s | 4.4287 KOps/s | |
test_stack | 29.8026ms | 26.5032ms | 37.7313 Ops/s | 34.9054 Ops/s | |
test_cat | 29.2317ms | 26.3324ms | 37.9760 Ops/s | 36.2594 Ops/s |
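
The Change column is empty in this capture of the table above. As a minimal sketch, the relative difference between the Ops and Ops on Repo HEAD columns can be recomputed from the rows themselves. The helper names `parse_ops` and `relative_change` are hypothetical, and the formula (percentage difference relative to Repo HEAD, negative meaning slower) is an assumption; the benchmark tooling that produced the table may define Change differently.

```python
# Multipliers for the throughput units that appear in the table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '48.0768 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(row: str) -> tuple[str, float]:
    """Return (test name, percent change of Ops relative to Ops on Repo HEAD).

    Assumed column order: Name | Max | Mean | Ops | Ops on Repo HEAD | Change.
    """
    cells = [cell.strip() for cell in row.split("|")]
    name = cells[0]
    ops = parse_ops(cells[3])
    head = parse_ops(cells[4])
    return name, 100.0 * (ops - head) / head


# Two rows copied verbatim from the table above.
rows = [
    "test_membership_stacked_nested_last | 57.6370μs | 13.1865μs | 75.8354 KOps/s | 253.2746 KOps/s | |",
    "test_zeros_like | 14.2730ms | 6.0616ms | 164.9719 Ops/s | 317.1927 Ops/s | |",
]

for row in rows:
    name, change = relative_change(row)
    print(f"{name}: {change:+.2f}%")
```

Normalizing every cell to operations per second before comparing matters because a row's two throughput cells can use different units (for example KOps/s against Ops/s).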

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 0.1145ms | 13.1546μs | 76.0190 KOps/s | 71.4137 KOps/s | |
test_plain_set_stack_nested | 47.3100μs | 13.1903μs | 75.8132 KOps/s | 70.2214 KOps/s | |
test_plain_set_nested_inplace | 51.1010μs | 14.3970μs | 69.4588 KOps/s | 65.9397 KOps/s | |
test_plain_set_stack_nested_inplace | 49.9810μs | 14.2705μs | 70.0748 KOps/s | 66.6707 KOps/s | |
test_items | 41.2910μs | 2.8304μs | 353.3080 KOps/s | 347.9959 KOps/s | |
test_items_nested | 0.4225ms | 0.3252ms | 3.0747 KOps/s | 3.0622 KOps/s | |
test_items_nested_locked | 0.3851ms | 0.3277ms | 3.0517 KOps/s | 3.0460 KOps/s | |
test_items_nested_leaf | 90.6720μs | 55.7349μs | 17.9421 KOps/s | 18.0031 KOps/s | |
test_items_stack_nested | 0.3874ms | 0.3310ms | 3.0216 KOps/s | 3.0681 KOps/s | |
test_items_stack_nested_leaf | 89.7520μs | 57.0104μs | 17.5407 KOps/s | 17.4157 KOps/s | |
test_items_stack_nested_locked | 0.3939ms | 0.3309ms | 3.0220 KOps/s | 3.0571 KOps/s | |
test_keys | 22.8700μs | 3.4172μs | 292.6343 KOps/s | 290.6945 KOps/s | |
test_keys_nested | 98.2420μs | 54.9546μs | 18.1969 KOps/s | 18.1870 KOps/s | |
test_keys_nested_locked | 2.7468ms | 62.1644μs | 16.0864 KOps/s | 16.0486 KOps/s | |
test_keys_nested_leaf | 77.7310μs | 46.1247μs | 21.6804 KOps/s | 21.1811 KOps/s | |
test_keys_stack_nested | 86.3710μs | 56.8519μs | 17.5896 KOps/s | 17.8293 KOps/s | |
test_keys_stack_nested_leaf | 81.0210μs | 48.2556μs | 20.7230 KOps/s | 20.8535 KOps/s | |
test_keys_stack_nested_locked | 90.3610μs | 62.0325μs | 16.1206 KOps/s | 16.4276 KOps/s | |
test_values | 5.5700μs | 0.8432μs | 1.1859 MOps/s | 1.1878 MOps/s | |
test_values_nested | 70.1410μs | 40.8735μs | 24.4657 KOps/s | 24.5571 KOps/s | |
test_values_nested_locked | 69.7510μs | 42.8608μs | 23.3313 KOps/s | 23.4612 KOps/s | |
test_values_nested_leaf | 69.0010μs | 35.4304μs | 28.2244 KOps/s | 28.3247 KOps/s | |
test_values_stack_nested | 77.8510μs | 41.9282μs | 23.8503 KOps/s | 23.9592 KOps/s | |
test_values_stack_nested_leaf | 63.9510μs | 36.4313μs | 27.4489 KOps/s | 28.1475 KOps/s | |
test_values_stack_nested_locked | 75.5420μs | 43.7637μs | 22.8500 KOps/s | 23.2119 KOps/s | |
test_membership | 2.0890μs | 0.4996μs | 2.0017 MOps/s | 1.9757 MOps/s | |
test_membership_nested | 16.3155μs | 1.9040μs | 525.2064 KOps/s | 540.4452 KOps/s | |
test_membership_nested_leaf | 12.4037μs | 1.8857μs | 530.2970 KOps/s | 553.4693 KOps/s | |
test_membership_stacked_nested | 36.5310μs | 1.9749μs | 506.3464 KOps/s | 518.3583 KOps/s | |
test_membership_stacked_nested_leaf | 26.6210μs | 1.9602μs | 510.1465 KOps/s | 513.1429 KOps/s | |
test_membership_nested_last | 27.7900μs | 2.7690μs | 361.1423 KOps/s | 363.4025 KOps/s | |
test_membership_nested_leaf_last | 38.7310μs | 2.7541μs | 363.0896 KOps/s | 363.6181 KOps/s | |
test_membership_stacked_nested_last | 26.3610μs | 3.1835μs | 314.1219 KOps/s | 292.3764 KOps/s | |
test_membership_stacked_nested_leaf_last | 35.8110μs | 3.1464μs | 317.8228 KOps/s | 295.6571 KOps/s | |
test_nested_getleaf | 39.3300μs | 6.0314μs | 165.7980 KOps/s | 165.2576 KOps/s | |
test_nested_get | 33.8510μs | 5.7449μs | 174.0681 KOps/s | 174.6679 KOps/s | |
test_stacked_getleaf | 29.5000μs | 6.0021μs | 166.6081 KOps/s | 165.4278 KOps/s | |
test_stacked_get | 52.5510μs | 5.6198μs | 177.9423 KOps/s | 173.3376 KOps/s | |
test_nested_getitemleaf | 26.0400μs | 6.1200μs | 163.3986 KOps/s | 162.6570 KOps/s | |
test_nested_getitem | 40.7610μs | 5.5771μs | 179.3031 KOps/s | 174.1781 KOps/s | |
test_stacked_getitemleaf | 25.7600μs | 6.0649μs | 164.8839 KOps/s | 161.9825 KOps/s | |
test_stacked_getitem | 30.5600μs | 5.7092μs | 175.1559 KOps/s | 171.1262 KOps/s | |
test_lock_nested | 7.1320ms | 0.4162ms | 2.4026 KOps/s | 2.4070 KOps/s | |
test_lock_stack_nested | 0.4279ms | 0.3724ms | 2.6856 KOps/s | 2.7359 KOps/s | |
test_unlock_nested | 0.7601ms | 0.3507ms | 2.8517 KOps/s | 2.8458 KOps/s | |
test_unlock_stack_nested | 0.3703ms | 0.3112ms | 3.2138 KOps/s | 3.2678 KOps/s | |
test_flatten_speed | 0.1500ms | 69.3665μs | 14.4162 KOps/s | 14.3674 KOps/s | |
test_unflatten_speed | 0.3727ms | 0.2836ms | 3.5257 KOps/s | 3.5611 KOps/s | |
test_common_ops | 1.5793ms | 1.2223ms | 818.1553 Ops/s | 802.7999 Ops/s | |
test_creation | 21.3610μs | 1.4651μs | 682.5297 KOps/s | 663.9703 KOps/s | |
test_creation_empty | 47.6610μs | 14.3022μs | 69.9195 KOps/s | 63.5265 KOps/s | |
test_creation_nested_1 | 46.1310μs | 15.8340μs | 63.1552 KOps/s | 57.5233 KOps/s | |
test_creation_nested_2 | 35.9510μs | 19.2793μs | 51.8690 KOps/s | 50.0243 KOps/s | |
test_clone | 81.4420μs | 29.1067μs | 34.3564 KOps/s | 34.9490 KOps/s | |
test_getitem[int] | 1.1638ms | 15.3803μs | 65.0182 KOps/s | 65.0209 KOps/s | |
test_getitem[slice_int] | 0.1224ms | 26.8588μs | 37.2318 KOps/s | 36.7694 KOps/s | |
test_getitem[range] | 0.2476ms | 0.1146ms | 8.7274 KOps/s | 9.1287 KOps/s | |
test_getitem[tuple] | 0.1258ms | 22.7648μs | 43.9275 KOps/s | 43.6401 KOps/s | |
test_getitem[list] | 0.1942ms | 0.1019ms | 9.8117 KOps/s | 10.1879 KOps/s | |
test_setitem_dim[int] | 72.3010μs | 46.5574μs | 21.4789 KOps/s | 22.8000 KOps/s | |
test_setitem_dim[slice_int] | 0.1047ms | 67.4632μs | 14.8229 KOps/s | 15.0302 KOps/s | |
test_setitem_dim[range] | 0.1844ms | 0.1304ms | 7.6701 KOps/s | 7.8956 KOps/s | |
test_setitem_dim[tuple] | 98.6720μs | 61.7349μs | 16.1983 KOps/s | 16.6405 KOps/s | |
test_setitem | 87.1720μs | 42.8892μs | 23.3159 KOps/s | 24.2138 KOps/s | |
test_set | 83.1110μs | 39.9674μs | 25.0204 KOps/s | 24.9957 KOps/s | |
test_set_shared | 0.3526ms | 50.5925μs | 19.7658 KOps/s | 20.0423 KOps/s | |
test_update | 0.1154ms | 49.4000μs | 20.2429 KOps/s | 20.4959 KOps/s | |
test_update_nested | 0.1035ms | 60.6841μs | 16.4788 KOps/s | 17.8198 KOps/s | |
test_update__nested | 0.1182ms | 66.3291μs | 15.0763 KOps/s | 17.0503 KOps/s | |
test_set_nested | 98.5110μs | 46.4157μs | 21.5444 KOps/s | 23.2417 KOps/s | |
test_set_nested_new | 92.2810μs | 50.5330μs | 19.7890 KOps/s | 21.4959 KOps/s | |
test_select | 0.1096ms | 64.0390μs | 15.6155 KOps/s | 16.9081 KOps/s | |
test_select_nested | 78.1720μs | 42.2739μs | 23.6553 KOps/s | 23.9697 KOps/s | |
test_exclude_nested | 98.0820μs | 58.9630μs | 16.9598 KOps/s | 17.0628 KOps/s | |
test_empty[True] | 0.3139ms | 0.2449ms | 4.0827 KOps/s | 4.1626 KOps/s | |
test_empty[False] | 4.1950μs | 0.7459μs | 1.3406 MOps/s | 1.3590 MOps/s | |
test_to | 59.1610μs | 24.6283μs | 40.6037 KOps/s | 39.6879 KOps/s | |
test_to_nonblocking | 59.4210μs | 23.5039μs | 42.5461 KOps/s | 42.6129 KOps/s | |
test_unbind_speed | 1.3299ms | 0.2741ms | 3.6482 KOps/s | 3.7045 KOps/s | |
test_unbind_speed_stack0 | 0.3318ms | 0.2708ms | 3.6934 KOps/s | 3.7807 KOps/s | |
test_unbind_speed_stack1 | 93.0856ms | 0.7013ms | 1.4259 KOps/s | 1.4491 KOps/s | |
test_split | 94.3487ms | 2.1587ms | 463.2427 Ops/s | 467.4006 Ops/s | |
test_chunk | 96.3650ms | 2.1595ms | 463.0622 Ops/s | 465.1197 Ops/s | |
test_creation[device0] | 0.3718ms | 0.1252ms | 7.9884 KOps/s | 7.9294 KOps/s | |
test_creation_from_tensor | 0.3962ms | 0.1288ms | 7.7615 KOps/s | 7.8397 KOps/s | |
test_add_one[memmap_tensor0] | 0.1350ms | 8.7522μs | 114.2569 KOps/s | 119.7562 KOps/s | |
test_contiguous[memmap_tensor0] | 19.9500μs | 2.1373μs | 467.8725 KOps/s | 469.6891 KOps/s | |
test_stack[memmap_tensor0] | 35.3510μs | 6.4107μs | 155.9892 KOps/s | 153.3992 KOps/s | |
test_memmaptd_index | 1.2118ms | 0.4099ms | 2.4396 KOps/s | 2.4466 KOps/s | |
test_memmaptd_index_astensor | 0.7448ms | 0.4654ms | 2.1487 KOps/s | 2.1282 KOps/s | |
test_memmaptd_index_op | 1.4118ms | 0.9711ms | 1.0298 KOps/s | 985.2126 Ops/s | |
test_serialize_model | 0.1319s | 0.1308s | 7.6472 Ops/s | 7.7175 Ops/s | |
test_serialize_model_pickle | 1.3791s | 1.2191s | 0.8203 Ops/s | 0.8430 Ops/s | |
test_serialize_weights | 0.1312s | 0.1299s | 7.6997 Ops/s | 7.7108 Ops/s | |
test_serialize_weights_returnearly | 0.2315s | 56.0311ms | 17.8472 Ops/s | 16.0578 Ops/s | |
test_serialize_weights_pickle | 1.3733s | 1.2170s | 0.8217 Ops/s | 0.8149 Ops/s | |
test_reshape_pytree | 80.1120μs | 34.9406μs | 28.6200 KOps/s | 28.5097 KOps/s | |
test_reshape_td | 85.2520μs | 40.8070μs | 24.5056 KOps/s | 25.6015 KOps/s | |
test_view_pytree | 69.4610μs | 35.0405μs | 28.5384 KOps/s | 29.5658 KOps/s | |
test_view_td | 80.0010μs | 46.8296μs | 21.3540 KOps/s | 22.4203 KOps/s | |
test_unbind_pytree | 67.9510μs | 33.8494μs | 29.5426 KOps/s | 30.1146 KOps/s | |
test_unbind_td | 0.5721ms | 41.5681μs | 24.0569 KOps/s | 23.7373 KOps/s | |
test_split_pytree | 0.1702ms | 46.2471μs | 21.6230 KOps/s | 21.5242 KOps/s | |
test_split_td | 0.7411ms | 55.4193μs | 18.0442 KOps/s | 18.2720 KOps/s | |
test_add_pytree | 0.1035ms | 54.9913μs | 18.1847 KOps/s | 18.4986 KOps/s | |
test_add_td | 0.1534ms | 87.3919μs | 11.4427 KOps/s | 11.6258 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.4133ms | 0.2091ms | 4.7825 KOps/s | 4.7762 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.1919ms | 0.1526ms | 6.5510 KOps/s | 6.6965 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.1823ms | 0.1437ms | 6.9612 KOps/s | 6.8174 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.2587ms | 0.1803ms | 5.5461 KOps/s | 5.3160 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 0.1383ms | 21.4026μs | 46.7233 KOps/s | 48.7784 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 0.2427ms | 43.6008μs | 22.9353 KOps/s | 23.3232 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.2596ms | 64.3320μs | 15.5444 KOps/s | 15.7309 KOps/s | |
test_compile_copy_nested[pytree-eager] | 81.6110μs | 49.0732μs | 20.3777 KOps/s | 20.3146 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.4496ms | 0.3146ms | 3.1782 KOps/s | 3.1893 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.2779ms | 0.2096ms | 4.7714 KOps/s | 4.7979 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.1723ms | 0.1262ms | 7.9254 KOps/s | 7.9812 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.1907ms | 62.1175μs | 16.0985 KOps/s | 16.8227 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.4404ms | 0.3224ms | 3.1021 KOps/s | 3.2077 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.7765ms | 0.6282ms | 1.5919 KOps/s | 1.6540 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.3358ms | 0.2517ms | 3.9732 KOps/s | 4.0276 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.4262ms | 0.3217ms | 3.1087 KOps/s | 3.2172 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.1870ms | 72.4741μs | 13.7980 KOps/s | 14.6071 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.2279ms | 0.1314ms | 7.6110 KOps/s | 7.6118 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.6218ms | 0.5144ms | 1.9438 KOps/s | 1.9058 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.4723ms | 0.3229ms | 3.0967 KOps/s | 3.1806 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 0.1114ms | 17.7718μs | 56.2691 KOps/s | 61.8656 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 0.1463ms | 27.6772μs | 36.1309 KOps/s | 36.8348 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1569ms | 70.5522μs | 14.1739 KOps/s | 14.2416 KOps/s | |
test_compile_copy_flat[pytree-eager] | 0.1344ms | 51.9124μs | 19.2632 KOps/s | 19.5367 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 2.3147ms | 0.8204ms | 1.2190 KOps/s | 1.1197 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 3.3250ms | 3.1855ms | 313.9199 Ops/s | 306.1032 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 2.2800ms | 0.8036ms | 1.2444 KOps/s | 1.1525 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 3.2044ms | 3.0688ms | 325.8639 Ops/s | 321.0347 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.1648ms | 0.1083ms | 9.2360 KOps/s | 8.9934 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.1931ms | 57.4783μs | 17.3979 KOps/s | 16.4475 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 0.1365ms | 0.1020ms | 9.8062 KOps/s | 9.7525 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 81.3610μs | 41.3642μs | 24.1755 KOps/s | 23.5466 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 0.1406ms | 0.1024ms | 9.7688 KOps/s | 9.7116 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 75.3510μs | 41.4351μs | 24.1341 KOps/s | 23.9019 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1817ms | 0.1360ms | 7.3533 KOps/s | 7.3050 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.1613ms | 24.0799μs | 41.5285 KOps/s | 40.6030 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.1624ms | 0.1292ms | 7.7390 KOps/s | 7.6981 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 61.1310μs | 20.1705μs | 49.5772 KOps/s | 47.8002 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.1841ms | 0.1301ms | 7.6868 KOps/s | 7.6189 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 65.4410μs | 20.2421μs | 49.4019 KOps/s | 48.1993 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.1723ms | 0.1368ms | 7.3112 KOps/s | 7.2441 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 0.4660ms | 26.2413μs | 38.1079 KOps/s | 40.4239 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.1724ms | 0.1303ms | 7.6722 KOps/s | 7.5999 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 47.8810μs | 20.0550μs | 49.8628 KOps/s | 47.8845 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.1696ms | 0.1304ms | 7.6673 KOps/s | 7.5855 KOps/s | |
test_compile_indexing[int-pytree-eager] | 58.3110μs | 19.9466μs | 50.1339 KOps/s | 47.8637 KOps/s | |
test_mod_add[eager] | 72.2210μs | 29.8631μs | 33.4861 KOps/s | 31.8670 KOps/s | |
test_mod_add[compile] | 0.3695ms | 67.7558μs | 14.7589 KOps/s | 14.3162 KOps/s | |
test_mod_add[compile-overhead] | 0.2729ms | 0.1377ms | 7.2611 KOps/s | 6.8720 KOps/s | |
test_mod_wrap[eager] | 0.3168ms | 0.2366ms | 4.2271 KOps/s | 4.1091 KOps/s | |
test_mod_wrap[compile] | 1.6401ms | 0.2965ms | 3.3731 KOps/s | 3.4403 KOps/s | |
test_mod_wrap[compile-overhead] | 7.5070ms | 4.0424ms | 247.3793 Ops/s | 246.8375 Ops/s | |
test_mod_wrap_and_backward[eager] | 1.5488ms | 1.4356ms | 696.5648 Ops/s | 676.4913 Ops/s | |
test_mod_wrap_and_backward[compile] | 1.7403ms | 1.4148ms | 706.7993 Ops/s | 698.3815 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 1.5656ms | 1.0250ms | 975.6551 Ops/s | 999.3087 Ops/s | |
test_seq_add[eager] | 0.1458ms | 91.4855μs | 10.9307 KOps/s | 9.8934 KOps/s | |
test_seq_add[compile] | 0.1699ms | 77.4123μs | 12.9178 KOps/s | 12.0244 KOps/s | |
test_seq_add[compile-overhead] | 0.1661ms | 0.1141ms | 8.7644 KOps/s | 8.5466 KOps/s | |
test_seq_wrap[eager] | 0.4475ms | 0.3874ms | 2.5816 KOps/s | 2.5016 KOps/s | |
test_seq_wrap[compile] | 0.3449ms | 0.3059ms | 3.2692 KOps/s | 3.0756 KOps/s | |
test_seq_wrap[compile-overhead] | 0.2622ms | 0.2179ms | 4.5888 KOps/s | 4.4091 KOps/s | |
test_func_call_runtime[False-eager] | 0.8656ms | 0.7311ms | 1.3679 KOps/s | 1.2887 KOps/s | |
test_func_call_runtime[False-compile] | 0.8968ms | 0.7992ms | 1.2513 KOps/s | 1.2893 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.4152ms | 0.3631ms | 2.7543 KOps/s | 2.8113 KOps/s | |
test_func_call_runtime[True-eager] | 1.0676ms | 0.9451ms | 1.0581 KOps/s | 1.1276 KOps/s | |
test_func_call_runtime[True-compile] | 0.9626ms | 0.8240ms | 1.2136 KOps/s | 1.2321 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.4398ms | 0.3865ms | 2.5872 KOps/s | 2.6257 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.8964ms | 0.7714ms | 1.2963 KOps/s | 1.2916 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.9046ms | 0.8031ms | 1.2452 KOps/s | 1.2438 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.4693ms | 0.3596ms | 2.7806 KOps/s | 2.8172 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.1444ms | 1.0361ms | 965.1671 Ops/s | 962.0295 Ops/s | |
test_func_call_cm_runtime[True-compile] | 0.9689ms | 0.8517ms | 1.1741 KOps/s | 1.1651 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 0.4657ms | 0.4050ms | 2.4692 KOps/s | 2.4120 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 2.6801ms | 2.1010ms | 475.9556 Ops/s | 466.5701 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 0.9635ms | 0.8622ms | 1.1599 KOps/s | 1.1594 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 0.5443ms | 0.4072ms | 2.4556 KOps/s | 2.4717 KOps/s | |
test_distributed | 0.6176ms | 0.1182ms | 8.4638 KOps/s | 8.8873 KOps/s | |
test_tdmodule | 0.1257ms | 14.5803μs | 68.5855 KOps/s | 63.6446 KOps/s | |
test_tdmodule_dispatch | 67.9210μs | 27.7659μs | 36.0154 KOps/s | 32.9582 KOps/s | |
test_tdseq | 48.7910μs | 15.1907μs | 65.8296 KOps/s | 62.7382 KOps/s | |
test_tdseq_dispatch | 56.6610μs | 31.0979μs | 32.1565 KOps/s | 30.3163 KOps/s | |
test_instantiation_functorch | 2.0256ms | 1.8810ms | 531.6377 Ops/s | 526.3941 Ops/s | |
test_instantiation_td | 1.8021ms | 1.1979ms | 834.7938 Ops/s | 829.4163 Ops/s | |
test_exec_functorch | 0.2660ms | 0.2108ms | 4.7448 KOps/s | 4.9206 KOps/s | |
test_exec_functional_call | 0.3091ms | 0.2185ms | 4.5767 KOps/s | 4.8836 KOps/s | |
test_exec_td | 0.3575ms | 0.2203ms | 4.5386 KOps/s | 4.7982 KOps/s | |
test_exec_td_decorator | 0.9853ms | 0.2645ms | 3.7810 KOps/s | 3.9281 KOps/s | |
test_vmap_mlp_speed[True-True] | 0.8075ms | 0.7054ms | 1.4176 KOps/s | 1.4570 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.8261ms | 0.7047ms | 1.4190 KOps/s | 1.4739 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.6963ms | 0.5978ms | 1.6728 KOps/s | 1.7213 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.7206ms | 0.5900ms | 1.6950 KOps/s | 1.7567 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.3187ms | 0.6720ms | 1.4881 KOps/s | 1.5009 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.8132ms | 0.6724ms | 1.4872 KOps/s | 1.4947 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7228ms | 0.5838ms | 1.7129 KOps/s | 1.6875 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7271ms | 0.5986ms | 1.6706 KOps/s | 1.6407 KOps/s | |
test_vmap_transformer_speed[True-True] | 8.3885ms | 8.2537ms | 121.1581 Ops/s | 120.4660 Ops/s | |
test_vmap_transformer_speed[True-False] | 8.5415ms | 8.2492ms | 121.2237 Ops/s | 120.8372 Ops/s | |
test_vmap_transformer_speed[False-True] | 8.1747ms | 8.0494ms | 124.2323 Ops/s | 124.2361 Ops/s | |
test_vmap_transformer_speed[False-False] | 8.3312ms | 8.0460ms | 124.2848 Ops/s | 124.1249 Ops/s | |
test_vmap_transformer_speed_decorator[True-True] | 19.5177ms | 19.3611ms | 51.6500 Ops/s | 51.8932 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 20.0367ms | 19.4177ms | 51.4995 Ops/s | 51.9250 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 19.4459ms | 19.2239ms | 52.0187 Ops/s | 52.3489 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 20.4899ms | 19.2853ms | 51.8529 Ops/s | 52.2974 Ops/s | |
test_to_module_speed[True] | 1.3439ms | 0.9198ms | 1.0871 KOps/s | 1.0830 KOps/s | |
test_to_module_speed[False] | 1.3025ms | 0.8988ms | 1.1125 KOps/s | 1.1176 KOps/s | |
test_tc_init | 0.1198ms | 31.4656μs | 31.7807 KOps/s | 29.0668 KOps/s | |
test_tc_init_nested | 0.1734ms | 63.6310μs | 15.7156 KOps/s | 14.3309 KOps/s | |
test_tc_first_layer_tensor | 12.4431μs | 0.6677μs | 1.4977 MOps/s | 1.4909 MOps/s | |
test_tc_first_layer_nontensor | 99.6810μs | 2.2262μs | 449.2058 KOps/s | 453.2649 KOps/s | |
test_tc_second_layer_tensor | 21.4878μs | 1.3665μs | 731.7778 KOps/s | 734.5749 KOps/s | |
test_tc_second_layer_nontensor | 82.6210μs | 2.9445μs | 339.6146 KOps/s | 338.7629 KOps/s | |
test_unbind | 0.1981s | 12.2542ms | 81.6050 Ops/s | 93.1674 Ops/s | |
test_full_like | 0.6553ms | 0.5741ms | 1.7420 KOps/s | 1.7360 KOps/s | |
test_zeros_like | 0.2751ms | 0.1979ms | 5.0527 KOps/s | 5.0552 KOps/s | |
test_ones_like | 0.2810ms | 0.1977ms | 5.0582 KOps/s | 5.0533 KOps/s | |
test_clone | 1.2233ms | 0.4144ms | 2.4131 KOps/s | 2.4181 KOps/s | |
test_squeeze | 32.8110μs | 9.5760μs | 104.4273 KOps/s | 102.5455 KOps/s | |
test_unsqueeze | 0.2192ms | 71.4335μs | 13.9990 KOps/s | 13.2393 KOps/s | |
test_split | 0.4303ms | 0.1548ms | 6.4613 KOps/s | 6.3754 KOps/s | |
test_permute | 0.2084ms | 0.1732ms | 5.7732 KOps/s | 5.5829 KOps/s | |
test_stack | 1.2820ms | 0.8981ms | 1.1134 KOps/s | 1.1604 KOps/s | |
test_cat | 1.2572ms | 1.2312ms | 812.2005 Ops/s | 811.6273 Ops/s |
Labels
CLA Signed
This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
enhancement
New feature or request
Quality