[Feature] better sync and instantiation of cudagraphs #1013

Merged: 7 commits merged on Sep 30, 2024.

@vmoens (Contributor) commented Sep 30, 2024

No description provided.

ghstack-source-id: d12b596cce3db900ca584d0956cef03105db510f
Pull Request resolved: #1011
@facebook-github-bot added the CLA Signed label Sep 30, 2024
@vmoens added the enhancement and Quality labels Sep 30, 2024

github-actions bot commented Sep 30, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 222. Improved: $\large\color{#35bf28}9$. Worsened: $\large\color{#d91a1a}21$.
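
The Improved and Worsened counts appear to cover only the bolded rows, i.e. benchmarks whose Change exceeds 5% in magnitude. The CPU table contains exactly 9 such improvements and 21 such regressions. The following is a minimal Python sketch of that classification. It assumes a strict 5% cutoff, which is inferred from the table and not documented by the bot.

```python
# Minimal sketch (assumption): a benchmark counts as improved or worsened only
# when |Change| exceeds 5%. The threshold is inferred from the bold rows.
THRESHOLD_PCT = 5.0


def classify(change_pct: float, threshold: float = THRESHOLD_PCT) -> str:
    """Label a percent change as 'improved', 'worsened', or 'unchanged'."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


def summarize(changes: list[float]) -> dict[str, int]:
    """Count how many changes fall into each label."""
    counts = {"improved": 0, "worsened": 0, "unchanged": 0}
    for change in changes:
        counts[classify(change)] += 1
    return counts


# A few Change values copied from the CPU table below.
sample = [-5.23, -2.63, -70.06, 6.29, 4.72, -4.96, 9.48]
print(summarize(sample))  # {'improved': 2, 'worsened': 2, 'unchanged': 3}
```

No value of exactly ±5% occurs in these tables, so this data cannot show whether the real cutoff is strict or inclusive.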

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 75.9510μs | 20.8001μs | 48.0768 KOps/s | 50.7321 KOps/s | $\textbf{\color{#d91a1a}-5.23\%}$ |
| test_plain_set_stack_nested | 57.4580μs | 20.5283μs | 48.7132 KOps/s | 50.0294 KOps/s | $\color{#d91a1a}-2.63\%$ |
| test_plain_set_nested_inplace | 81.2310μs | 22.1863μs | 45.0729 KOps/s | 45.7433 KOps/s | $\color{#d91a1a}-1.47\%$ |
| test_plain_set_stack_nested_inplace | 59.8820μs | 22.5394μs | 44.3667 KOps/s | 46.5999 KOps/s | $\color{#d91a1a}-4.79\%$ |
| test_items | 44.8940μs | 4.1201μs | 242.7117 KOps/s | 243.8054 KOps/s | $\color{#d91a1a}-0.45\%$ |
| test_items_nested | 0.6724ms | 0.3657ms | 2.7348 KOps/s | 2.7611 KOps/s | $\color{#d91a1a}-0.95\%$ |
| test_items_nested_locked | 0.5893ms | 0.3676ms | 2.7203 KOps/s | 2.7386 KOps/s | $\color{#d91a1a}-0.67\%$ |
| test_items_nested_leaf | 0.1263ms | 69.5355μs | 14.3811 KOps/s | 14.7118 KOps/s | $\color{#d91a1a}-2.25\%$ |
| test_items_stack_nested | 0.6441ms | 0.3704ms | 2.7001 KOps/s | 2.7382 KOps/s | $\color{#d91a1a}-1.39\%$ |
| test_items_stack_nested_leaf | 0.1741ms | 72.9856μs | 13.7013 KOps/s | 14.0434 KOps/s | $\color{#d91a1a}-2.44\%$ |
| test_items_stack_nested_locked | 0.6210ms | 0.3725ms | 2.6847 KOps/s | 2.7422 KOps/s | $\color{#d91a1a}-2.10\%$ |
| test_keys | 53.0910μs | 3.5419μs | 282.3346 KOps/s | 287.8711 KOps/s | $\color{#d91a1a}-1.92\%$ |
| test_keys_nested | 0.1546ms | 0.1009ms | 9.9067 KOps/s | 9.8705 KOps/s | $\color{#35bf28}+0.37\%$ |
| test_keys_nested_locked | 0.6963ms | 0.1065ms | 9.3911 KOps/s | 9.3930 KOps/s | $\color{#d91a1a}-0.02\%$ |
| test_keys_nested_leaf | 0.1436ms | 83.8502μs | 11.9260 KOps/s | 11.7913 KOps/s | $\color{#35bf28}+1.14\%$ |
| test_keys_stack_nested | 0.2021ms | 0.1013ms | 9.8691 KOps/s | 9.7950 KOps/s | $\color{#35bf28}+0.76\%$ |
| test_keys_stack_nested_leaf | 0.1401ms | 82.3233μs | 12.1472 KOps/s | 11.6353 KOps/s | $\color{#35bf28}+4.40\%$ |
| test_keys_stack_nested_locked | 0.1729ms | 0.1050ms | 9.5276 KOps/s | 9.3013 KOps/s | $\color{#35bf28}+2.43\%$ |
| test_values | 6.9328μs | 1.0821μs | 924.0925 KOps/s | 980.4459 KOps/s | $\textbf{\color{#d91a1a}-5.75\%}$ |
| test_values_nested | 0.1243ms | 73.6632μs | 13.5753 KOps/s | 13.6180 KOps/s | $\color{#d91a1a}-0.31\%$ |
| test_values_nested_locked | 0.1260ms | 73.5766μs | 13.5913 KOps/s | 13.5671 KOps/s | $\color{#35bf28}+0.18\%$ |
| test_values_nested_leaf | 0.1206ms | 62.7534μs | 15.9354 KOps/s | 16.1928 KOps/s | $\color{#d91a1a}-1.59\%$ |
| test_values_stack_nested | 0.1252ms | 74.1892μs | 13.4791 KOps/s | 13.5026 KOps/s | $\color{#d91a1a}-0.17\%$ |
| test_values_stack_nested_leaf | 0.1083ms | 60.2811μs | 16.5889 KOps/s | 15.8408 KOps/s | $\color{#35bf28}+4.72\%$ |
| test_values_stack_nested_locked | 0.1280ms | 74.0579μs | 13.5030 KOps/s | 13.3820 KOps/s | $\color{#35bf28}+0.90\%$ |
| test_membership | 5.6060μs | 0.7375μs | 1.3559 MOps/s | 1.3993 MOps/s | $\color{#d91a1a}-3.10\%$ |
| test_membership_nested | 29.6350μs | 2.7725μs | 360.6865 KOps/s | 360.4876 KOps/s | $\color{#35bf28}+0.06\%$ |
| test_membership_nested_leaf | 25.9280μs | 2.8116μs | 355.6664 KOps/s | 361.8073 KOps/s | $\color{#d91a1a}-1.70\%$ |
| test_membership_stacked_nested | 25.2870μs | 2.7904μs | 358.3728 KOps/s | 364.9851 KOps/s | $\color{#d91a1a}-1.81\%$ |
| test_membership_stacked_nested_leaf | 21.0890μs | 2.8152μs | 355.2141 KOps/s | 364.6970 KOps/s | $\color{#d91a1a}-2.60\%$ |
| test_membership_nested_last | 29.1340μs | 4.0657μs | 245.9599 KOps/s | 250.1414 KOps/s | $\color{#d91a1a}-1.67\%$ |
| test_membership_nested_leaf_last | 34.2340μs | 4.0594μs | 246.3428 KOps/s | 251.2361 KOps/s | $\color{#d91a1a}-1.95\%$ |
| test_membership_stacked_nested_last | 57.6370μs | 13.1865μs | 75.8354 KOps/s | 253.2746 KOps/s | $\textbf{\color{#d91a1a}-70.06\%}$ |
| test_membership_stacked_nested_leaf_last | 37.8400μs | 13.2308μs | 75.5811 KOps/s | 252.0183 KOps/s | $\textbf{\color{#d91a1a}-70.01\%}$ |
| test_nested_getleaf | 50.3530μs | 10.7214μs | 93.2718 KOps/s | 95.3846 KOps/s | $\color{#d91a1a}-2.22\%$ |
| test_nested_get | 40.6050μs | 10.2226μs | 97.8227 KOps/s | 99.9076 KOps/s | $\color{#d91a1a}-2.09\%$ |
| test_stacked_getleaf | 39.4840μs | 10.9375μs | 91.4283 KOps/s | 94.3651 KOps/s | $\color{#d91a1a}-3.11\%$ |
| test_stacked_get | 45.6950μs | 10.0671μs | 99.3339 KOps/s | 98.0864 KOps/s | $\color{#35bf28}+1.27\%$ |
| test_nested_getitemleaf | 52.7260μs | 11.0438μs | 90.5489 KOps/s | 91.6781 KOps/s | $\color{#d91a1a}-1.23\%$ |
| test_nested_getitem | 42.1580μs | 10.6054μs | 94.2916 KOps/s | 97.6586 KOps/s | $\color{#d91a1a}-3.45\%$ |
| test_stacked_getitemleaf | 56.5360μs | 10.9709μs | 91.1501 KOps/s | 91.4678 KOps/s | $\color{#d91a1a}-0.35\%$ |
| test_stacked_getitem | 60.0400μs | 10.6210μs | 94.1529 KOps/s | 95.7942 KOps/s | $\color{#d91a1a}-1.71\%$ |
| test_lock_nested | 97.3995ms | 0.5964ms | 1.6766 KOps/s | 1.9966 KOps/s | $\textbf{\color{#d91a1a}-16.03\%}$ |
| test_lock_stack_nested | 0.5428ms | 0.4455ms | 2.2449 KOps/s | 2.1554 KOps/s | $\color{#35bf28}+4.16\%$ |
| test_unlock_nested | 98.2050ms | 0.5116ms | 1.9546 KOps/s | 2.3773 KOps/s | $\textbf{\color{#d91a1a}-17.78\%}$ |
| test_unlock_stack_nested | 0.4603ms | 0.3609ms | 2.7706 KOps/s | 2.6067 KOps/s | $\textbf{\color{#35bf28}+6.29\%}$ |
| test_flatten_speed | 0.1754ms | 90.6780μs | 11.0280 KOps/s | 11.4360 KOps/s | $\color{#d91a1a}-3.57\%$ |
| test_unflatten_speed | 0.7547ms | 0.4791ms | 2.0874 KOps/s | 2.1573 KOps/s | $\color{#d91a1a}-3.24\%$ |
| test_common_ops | 5.1349ms | 1.1675ms | 856.5629 Ops/s | 881.1135 Ops/s | $\color{#d91a1a}-2.79\%$ |
| test_creation | 27.8420μs | 2.1098μs | 473.9783 KOps/s | 485.2424 KOps/s | $\color{#d91a1a}-2.32\%$ |
| test_creation_empty | 51.7960μs | 17.5536μs | 56.9685 KOps/s | 59.8439 KOps/s | $\color{#d91a1a}-4.80\%$ |
| test_creation_nested_1 | 61.0140μs | 20.4842μs | 48.8181 KOps/s | 49.8920 KOps/s | $\color{#d91a1a}-2.15\%$ |
| test_creation_nested_2 | 60.0520μs | 25.5412μs | 39.1524 KOps/s | 40.7313 KOps/s | $\color{#d91a1a}-3.88\%$ |
| test_clone | 0.3770ms | 17.2496μs | 57.9723 KOps/s | 57.6422 KOps/s | $\color{#35bf28}+0.57\%$ |
| test_getitem[int] | 0.6604ms | 17.7159μs | 56.4464 KOps/s | 58.6095 KOps/s | $\color{#d91a1a}-3.69\%$ |
| test_getitem[slice_int] | 0.1353ms | 32.6457μs | 30.6319 KOps/s | 32.2090 KOps/s | $\color{#d91a1a}-4.90\%$ |
| test_getitem[range] | 0.1774ms | 61.0101μs | 16.3907 KOps/s | 16.4984 KOps/s | $\color{#d91a1a}-0.65\%$ |
| test_getitem[tuple] | 0.1402ms | 26.6561μs | 37.5148 KOps/s | 38.7592 KOps/s | $\color{#d91a1a}-3.21\%$ |
| test_getitem[list] | 0.5553ms | 56.8491μs | 17.5904 KOps/s | 17.8335 KOps/s | $\color{#d91a1a}-1.36\%$ |
| test_setitem_dim[int] | 80.8910μs | 34.8353μs | 28.7065 KOps/s | 29.8642 KOps/s | $\color{#d91a1a}-3.88\%$ |
| test_setitem_dim[slice_int] | 0.1158ms | 64.0303μs | 15.6176 KOps/s | 15.2459 KOps/s | $\color{#35bf28}+2.44\%$ |
| test_setitem_dim[range] | 0.1710ms | 88.6989μs | 11.2741 KOps/s | 11.3927 KOps/s | $\color{#d91a1a}-1.04\%$ |
| test_setitem_dim[tuple] | 0.1171ms | 53.7709μs | 18.5974 KOps/s | 19.9277 KOps/s | $\textbf{\color{#d91a1a}-6.68\%}$ |
| test_setitem | 0.3872ms | 29.9121μs | 33.4313 KOps/s | 33.8729 KOps/s | $\color{#d91a1a}-1.30\%$ |
| test_set | 0.3678ms | 28.4233μs | 35.1824 KOps/s | 34.8321 KOps/s | $\color{#35bf28}+1.01\%$ |
| test_set_shared | 3.3666ms | 0.2208ms | 4.5280 KOps/s | 4.5524 KOps/s | $\color{#d91a1a}-0.54\%$ |
| test_update | 0.3385ms | 35.8435μs | 27.8990 KOps/s | 27.7882 KOps/s | $\color{#35bf28}+0.40\%$ |
| test_update_nested | 0.3925ms | 46.7282μs | 21.4004 KOps/s | 21.5400 KOps/s | $\color{#d91a1a}-0.65\%$ |
| test_update__nested | 0.3967ms | 35.1300μs | 28.4657 KOps/s | 28.5458 KOps/s | $\color{#d91a1a}-0.28\%$ |
| test_set_nested | 0.3414ms | 31.0999μs | 32.1544 KOps/s | 32.1246 KOps/s | $\color{#35bf28}+0.09\%$ |
| test_set_nested_new | 0.3754ms | 38.0387μs | 26.2890 KOps/s | 27.4331 KOps/s | $\color{#d91a1a}-4.17\%$ |
| test_select | 0.4082ms | 55.2550μs | 18.0979 KOps/s | 18.3584 KOps/s | $\color{#d91a1a}-1.42\%$ |
| test_select_nested | 0.1408ms | 60.6033μs | 16.5008 KOps/s | 16.9204 KOps/s | $\color{#d91a1a}-2.48\%$ |
| test_exclude_nested | 0.1507ms | 76.0698μs | 13.1458 KOps/s | 13.4374 KOps/s | $\color{#d91a1a}-2.17\%$ |
| test_empty[True] | 1.0859ms | 0.3254ms | 3.0736 KOps/s | 3.1022 KOps/s | $\color{#d91a1a}-0.92\%$ |
| test_empty[False] | 8.3780μs | 1.2482μs | 801.1698 KOps/s | 829.3826 KOps/s | $\color{#d91a1a}-3.40\%$ |
| test_unbind_speed | 0.3818ms | 0.3041ms | 3.2886 KOps/s | 3.2625 KOps/s | $\color{#35bf28}+0.80\%$ |
| test_unbind_speed_stack0 | 0.5818ms | 0.2893ms | 3.4566 KOps/s | 3.3388 KOps/s | $\color{#35bf28}+3.53\%$ |
| test_unbind_speed_stack1 | 0.1022s | 0.7964ms | 1.2557 KOps/s | 1.4574 KOps/s | $\textbf{\color{#d91a1a}-13.84\%}$ |
| test_split | 3.1457ms | 2.0615ms | 485.0724 Ops/s | 456.7281 Ops/s | $\textbf{\color{#35bf28}+6.21\%}$ |
| test_chunk | 0.1025s | 2.2731ms | 439.9201 Ops/s | 456.7179 Ops/s | $\color{#d91a1a}-3.68\%$ |
| test_creation[device0] | 0.2710ms | 0.1191ms | 8.3931 KOps/s | 8.3380 KOps/s | $\color{#35bf28}+0.66\%$ |
| test_creation_from_tensor | 3.5683ms | 0.1200ms | 8.3330 KOps/s | 8.2538 KOps/s | $\color{#35bf28}+0.96\%$ |
| test_add_one[memmap_tensor0] | 0.6545ms | 7.6524μs | 130.6785 KOps/s | 137.9640 KOps/s | $\textbf{\color{#d91a1a}-5.28\%}$ |
| test_contiguous[memmap_tensor0] | 24.0750μs | 1.8669μs | 535.6376 KOps/s | 513.5518 KOps/s | $\color{#35bf28}+4.30\%$ |
| test_stack[memmap_tensor0] | 0.1258ms | 5.7612μs | 173.5746 KOps/s | 171.6360 KOps/s | $\color{#35bf28}+1.13\%$ |
| test_memmaptd_index | 1.1863ms | 0.4146ms | 2.4121 KOps/s | 2.4731 KOps/s | $\color{#d91a1a}-2.47\%$ |
| test_memmaptd_index_astensor | 0.7556ms | 0.4900ms | 2.0406 KOps/s | 2.0532 KOps/s | $\color{#d91a1a}-0.61\%$ |
| test_memmaptd_index_op | 1.7610ms | 1.0450ms | 956.9662 Ops/s | 989.7771 Ops/s | $\color{#d91a1a}-3.31\%$ |
| test_serialize_model | 0.2209s | 0.1400s | 7.1404 Ops/s | 8.3259 Ops/s | $\textbf{\color{#d91a1a}-14.24\%}$ |
| test_serialize_model_pickle | 0.4747s | 0.3946s | 2.5343 Ops/s | 2.5269 Ops/s | $\color{#35bf28}+0.29\%$ |
| test_serialize_weights | 0.1305s | 0.1194s | 8.3756 Ops/s | 8.3300 Ops/s | $\color{#35bf28}+0.55\%$ |
| test_serialize_weights_returnearly | 0.2605s | 0.1729s | 5.7846 Ops/s | 6.2410 Ops/s | $\textbf{\color{#d91a1a}-7.31\%}$ |
| test_serialize_weights_pickle | 0.4502s | 0.3988s | 2.5077 Ops/s | 2.3279 Ops/s | $\textbf{\color{#35bf28}+7.73\%}$ |
| test_serialize_weights_filesystem | 0.1509s | 0.1445s | 6.9181 Ops/s | 6.8625 Ops/s | $\color{#35bf28}+0.81\%$ |
| test_serialize_model_filesystem | 0.1606s | 0.1528s | 6.5438 Ops/s | 5.9774 Ops/s | $\textbf{\color{#35bf28}+9.48\%}$ |
| test_reshape_pytree | 83.2150μs | 39.7709μs | 25.1440 KOps/s | 25.5039 KOps/s | $\color{#d91a1a}-1.41\%$ |
| test_reshape_td | 0.1209ms | 46.2590μs | 21.6174 KOps/s | 22.2850 KOps/s | $\color{#d91a1a}-3.00\%$ |
| test_view_pytree | 79.5180μs | 39.7882μs | 25.1331 KOps/s | 25.6331 KOps/s | $\color{#d91a1a}-1.95\%$ |
| test_view_td | 0.1337ms | 53.3852μs | 18.7318 KOps/s | 18.8463 KOps/s | $\color{#d91a1a}-0.61\%$ |
| test_unbind_pytree | 81.1010μs | 36.7487μs | 27.2119 KOps/s | 27.5695 KOps/s | $\color{#d91a1a}-1.30\%$ |
| test_unbind_td | 0.2920ms | 45.2945μs | 22.0777 KOps/s | 22.0417 KOps/s | $\color{#35bf28}+0.16\%$ |
| test_split_pytree | 0.1120ms | 39.8635μs | 25.0856 KOps/s | 26.4998 KOps/s | $\textbf{\color{#d91a1a}-5.34\%}$ |
| test_split_td | 0.4517ms | 59.0595μs | 16.9321 KOps/s | 17.7012 KOps/s | $\color{#d91a1a}-4.35\%$ |
| test_add_pytree | 0.1149ms | 46.6712μs | 21.4265 KOps/s | 21.9586 KOps/s | $\color{#d91a1a}-2.42\%$ |
| test_add_td | 0.1675ms | 83.6977μs | 11.9478 KOps/s | 12.2541 KOps/s | $\color{#d91a1a}-2.50\%$ |
| test_compile_add_one_nested[tensordict-compile] | 0.1097ms | 57.3176μs | 17.4466 KOps/s | 16.8204 KOps/s | $\color{#35bf28}+3.72\%$ |
| test_compile_add_one_nested[tensordict-eager] | 0.3943ms | 0.1846ms | 5.4159 KOps/s | 5.5192 KOps/s | $\color{#d91a1a}-1.87\%$ |
| test_compile_add_one_nested[pytree-compile] | 0.1256ms | 56.8667μs | 17.5850 KOps/s | 17.0718 KOps/s | $\color{#35bf28}+3.01\%$ |
| test_compile_add_one_nested[pytree-eager] | 0.2874ms | 0.1471ms | 6.7960 KOps/s | 7.0898 KOps/s | $\color{#d91a1a}-4.15\%$ |
| test_compile_copy_nested[tensordict-compile] | 55.8840μs | 21.0877μs | 47.4209 KOps/s | 46.9453 KOps/s | $\color{#35bf28}+1.01\%$ |
| test_compile_copy_nested[tensordict-eager] | 0.1300ms | 67.5963μs | 14.7937 KOps/s | 14.9844 KOps/s | $\color{#d91a1a}-1.27\%$ |
| test_compile_copy_nested[pytree-compile] | 0.1580ms | 76.7841μs | 13.0235 KOps/s | 13.4418 KOps/s | $\color{#d91a1a}-3.11\%$ |
| test_compile_copy_nested[pytree-eager] | 0.1284ms | 69.7734μs | 14.3321 KOps/s | 14.9228 KOps/s | $\color{#d91a1a}-3.96\%$ |
| test_compile_add_one_flat[tensordict-compile] | 0.2703ms | 0.1748ms | 5.7216 KOps/s | 5.6349 KOps/s | $\color{#35bf28}+1.54\%$ |
| test_compile_add_one_flat[tensordict-eager] | 0.3476ms | 0.1971ms | 5.0741 KOps/s | 5.1750 KOps/s | $\color{#d91a1a}-1.95\%$ |
| test_compile_add_one_flat[tensorclass-compile] | 0.1160ms | 47.3711μs | 21.1099 KOps/s | 19.9415 KOps/s | $\textbf{\color{#35bf28}+5.86\%}$ |
| test_compile_add_one_flat[tensorclass-eager] | 0.1495ms | 72.1403μs | 13.8619 KOps/s | 14.0120 KOps/s | $\color{#d91a1a}-1.07\%$ |
| test_compile_add_one_flat[pytree-compile] | 0.2951ms | 0.1779ms | 5.6209 KOps/s | 5.6541 KOps/s | $\color{#d91a1a}-0.59\%$ |
| test_compile_add_one_flat[pytree-eager] | 0.5068ms | 0.3014ms | 3.3179 KOps/s | 3.4049 KOps/s | $\color{#d91a1a}-2.55\%$ |
| test_compile_add_self_flat[tensordict-eager] | 0.4241ms | 0.2122ms | 4.7115 KOps/s | 4.9723 KOps/s | $\textbf{\color{#d91a1a}-5.25\%}$ |
| test_compile_add_self_flat[tensordict-compile] | 0.3738ms | 0.1775ms | 5.6334 KOps/s | 5.6543 KOps/s | $\color{#d91a1a}-0.37\%$ |
| test_compile_add_self_flat[tensorclass-eager] | 0.1400ms | 65.0401μs | 15.3751 KOps/s | 16.2785 KOps/s | $\textbf{\color{#d91a1a}-5.55\%}$ |
| test_compile_add_self_flat[tensorclass-compile] | 0.1129ms | 47.3405μs | 21.1236 KOps/s | 19.8788 KOps/s | $\textbf{\color{#35bf28}+6.26\%}$ |
| test_compile_add_self_flat[pytree-eager] | 0.4285ms | 0.2365ms | 4.2279 KOps/s | 4.2604 KOps/s | $\color{#d91a1a}-0.76\%$ |
| test_compile_add_self_flat[pytree-compile] | 0.4086ms | 0.1804ms | 5.5431 KOps/s | 5.5809 KOps/s | $\color{#d91a1a}-0.68\%$ |
| test_compile_copy_flat[tensordict-compile] | 0.1891ms | 0.1036ms | 9.6494 KOps/s | 9.4521 KOps/s | $\color{#35bf28}+2.09\%$ |
| test_compile_copy_flat[tensordict-eager] | 0.1178ms | 57.9392μs | 17.2595 KOps/s | 17.4645 KOps/s | $\color{#d91a1a}-1.17\%$ |
| test_compile_copy_flat[pytree-compile] | 0.1529ms | 77.0701μs | 12.9752 KOps/s | 13.2221 KOps/s | $\color{#d91a1a}-1.87\%$ |
| test_compile_copy_flat[pytree-eager] | 0.1422ms | 69.4458μs | 14.3997 KOps/s | 14.6270 KOps/s | $\color{#d91a1a}-1.55\%$ |
| test_compile_assign_and_add[tensordict-compile] | 0.2638ms | 0.1929ms | 5.1853 KOps/s | 5.1322 KOps/s | $\color{#35bf28}+1.04\%$ |
| test_compile_assign_and_add[tensordict-eager] | 2.6254ms | 1.7293ms | 578.2753 Ops/s | 590.7948 Ops/s | $\color{#d91a1a}-2.12\%$ |
| test_compile_assign_and_add[pytree-compile] | 0.2887ms | 0.1924ms | 5.1986 KOps/s | 5.1396 KOps/s | $\color{#35bf28}+1.15\%$ |
| test_compile_assign_and_add[pytree-eager] | 1.3555ms | 1.1396ms | 877.5318 Ops/s | 896.1605 Ops/s | $\color{#d91a1a}-2.08\%$ |
| test_compile_assign_and_add_stack[compile] | 0.5337ms | 0.4183ms | 2.3904 KOps/s | 2.3091 KOps/s | $\color{#35bf28}+3.52\%$ |
| test_compile_assign_and_add_stack[eager] | 5.7078ms | 3.9884ms | 250.7299 Ops/s | 257.0290 Ops/s | $\color{#d91a1a}-2.45\%$ |
| test_compile_indexing[tensor-tensordict-compile] | 80.8810μs | 35.0470μs | 28.5331 KOps/s | 27.1591 KOps/s | $\textbf{\color{#35bf28}+5.06\%}$ |
| test_compile_indexing[tensor-tensordict-eager] | 0.6155ms | 49.7886μs | 20.0849 KOps/s | 20.0128 KOps/s | $\color{#35bf28}+0.36\%$ |
| test_compile_indexing[tensor-tensorclass-compile] | 0.1404ms | 30.4633μs | 32.8264 KOps/s | 32.4168 KOps/s | $\color{#35bf28}+1.26\%$ |
| test_compile_indexing[tensor-tensorclass-eager] | 0.1045ms | 29.8820μs | 33.4649 KOps/s | 34.9710 KOps/s | $\color{#d91a1a}-4.31\%$ |
| test_compile_indexing[tensor-pytree-compile] | 80.7400μs | 29.7354μs | 33.6299 KOps/s | 32.6941 KOps/s | $\color{#35bf28}+2.86\%$ |
| test_compile_indexing[tensor-pytree-eager] | 0.1247ms | 29.3769μs | 34.0404 KOps/s | 34.3908 KOps/s | $\color{#d91a1a}-1.02\%$ |
| test_compile_indexing[slice-tensordict-compile] | 0.1556ms | 74.3416μs | 13.4514 KOps/s | 13.1459 KOps/s | $\color{#35bf28}+2.32\%$ |
| test_compile_indexing[slice-tensordict-eager] | 0.6151ms | 28.5626μs | 35.0108 KOps/s | 35.7613 KOps/s | $\color{#d91a1a}-2.10\%$ |
| test_compile_indexing[slice-tensorclass-compile] | 0.1688ms | 68.8127μs | 14.5322 KOps/s | 14.2080 KOps/s | $\color{#35bf28}+2.28\%$ |
| test_compile_indexing[slice-tensorclass-eager] | 81.4810μs | 23.8567μs | 41.9169 KOps/s | 42.9265 KOps/s | $\color{#d91a1a}-2.35\%$ |
| test_compile_indexing[slice-pytree-compile] | 0.1620ms | 68.5795μs | 14.5816 KOps/s | 14.3254 KOps/s | $\color{#35bf28}+1.79\%$ |
| test_compile_indexing[slice-pytree-eager] | 76.4620μs | 23.5948μs | 42.3822 KOps/s | 43.0981 KOps/s | $\color{#d91a1a}-1.66\%$ |
| test_compile_indexing[int-tensordict-compile] | 0.1606ms | 74.8603μs | 13.3582 KOps/s | 13.1142 KOps/s | $\color{#35bf28}+1.86\%$ |
| test_compile_indexing[int-tensordict-eager] | 1.2940ms | 28.5833μs | 34.9855 KOps/s | 35.7834 KOps/s | $\color{#d91a1a}-2.23\%$ |
| test_compile_indexing[int-tensorclass-compile] | 0.1479ms | 68.6829μs | 14.5597 KOps/s | 14.1857 KOps/s | $\color{#35bf28}+2.64\%$ |
| test_compile_indexing[int-tensorclass-eager] | 83.4050μs | 24.1839μs | 41.3498 KOps/s | 43.1998 KOps/s | $\color{#d91a1a}-4.28\%$ |
| test_compile_indexing[int-pytree-compile] | 0.1682ms | 69.0550μs | 14.4812 KOps/s | 14.3822 KOps/s | $\color{#35bf28}+0.69\%$ |
| test_compile_indexing[int-pytree-eager] | 85.9000μs | 23.2966μs | 42.9247 KOps/s | 43.7718 KOps/s | $\color{#d91a1a}-1.94\%$ |
| test_mod_add[eager] | 79.9190μs | 26.2216μs | 38.1365 KOps/s | 40.6286 KOps/s | $\textbf{\color{#d91a1a}-6.13\%}$ |
| test_mod_add[compile] | 94.8970μs | 40.3340μs | 24.7930 KOps/s | 24.0370 KOps/s | $\color{#35bf28}+3.15\%$ |
| test_mod_add[compile-overhead] | 0.1040ms | 40.7676μs | 24.5293 KOps/s | 23.8894 KOps/s | $\color{#35bf28}+2.68\%$ |
| test_mod_wrap[eager] | 0.4197ms | 0.2170ms | 4.6077 KOps/s | 4.6565 KOps/s | $\color{#d91a1a}-1.05\%$ |
| test_mod_wrap[compile] | 0.4251ms | 0.2402ms | 4.1631 KOps/s | 4.1761 KOps/s | $\color{#d91a1a}-0.31\%$ |
| test_mod_wrap[compile-overhead] | 0.4374ms | 0.2377ms | 4.2064 KOps/s | 4.2113 KOps/s | $\color{#d91a1a}-0.11\%$ |
| test_mod_wrap_and_backward[eager] | 12.1570ms | 11.0382ms | 90.5941 Ops/s | 89.8722 Ops/s | $\color{#35bf28}+0.80\%$ |
| test_mod_wrap_and_backward[compile] | 13.4672ms | 11.3170ms | 88.3626 Ops/s | 87.5500 Ops/s | $\color{#35bf28}+0.93\%$ |
| test_mod_wrap_and_backward[compile-overhead] | 15.2512ms | 11.8384ms | 84.4707 Ops/s | 83.4057 Ops/s | $\color{#35bf28}+1.28\%$ |
| test_seq_add[eager] | 0.1799ms | 95.7162μs | 10.4476 KOps/s | 11.0292 KOps/s | $\textbf{\color{#d91a1a}-5.27\%}$ |
| test_seq_add[compile] | 0.1657ms | 66.1192μs | 15.1242 KOps/s | 15.4909 KOps/s | $\color{#d91a1a}-2.37\%$ |
| test_seq_add[compile-overhead] | 0.1296ms | 64.3328μs | 15.5442 KOps/s | 15.6692 KOps/s | $\color{#d91a1a}-0.80\%$ |
| test_seq_wrap[eager] | 0.6069ms | 0.3977ms | 2.5144 KOps/s | 2.5867 KOps/s | $\color{#d91a1a}-2.79\%$ |
| test_seq_wrap[compile] | 1.4902ms | 0.2763ms | 3.6196 KOps/s | 3.6484 KOps/s | $\color{#d91a1a}-0.79\%$ |
| test_seq_wrap[compile-overhead] | 1.5212ms | 0.2750ms | 3.6357 KOps/s | 3.6097 KOps/s | $\color{#35bf28}+0.72\%$ |
| test_func_call_runtime[False-eager] | 0.8836ms | 0.5438ms | 1.8391 KOps/s | 1.8191 KOps/s | $\color{#35bf28}+1.10\%$ |
| test_func_call_runtime[False-compile] | 0.6949ms | 0.5092ms | 1.9637 KOps/s | 1.9694 KOps/s | $\color{#d91a1a}-0.29\%$ |
| test_func_call_runtime[False-compile-overhead] | 0.6372ms | 0.5120ms | 1.9533 KOps/s | 1.9493 KOps/s | $\color{#35bf28}+0.20\%$ |
| test_func_call_runtime[True-eager] | 0.9934ms | 0.7791ms | 1.2836 KOps/s | 1.2984 KOps/s | $\color{#d91a1a}-1.14\%$ |
| test_func_call_runtime[True-compile] | 0.6513ms | 0.5221ms | 1.9152 KOps/s | 1.9144 KOps/s | $\color{#35bf28}+0.05\%$ |
| test_func_call_runtime[True-compile-overhead] | 0.6960ms | 0.5223ms | 1.9148 KOps/s | 1.9063 KOps/s | $\color{#35bf28}+0.45\%$ |
| test_func_call_cm_runtime[False-eager] | 0.8022ms | 0.5388ms | 1.8560 KOps/s | 1.8364 KOps/s | $\color{#35bf28}+1.07\%$ |
| test_func_call_cm_runtime[False-compile] | 0.9879ms | 0.5145ms | 1.9436 KOps/s | 1.9465 KOps/s | $\color{#d91a1a}-0.15\%$ |
| test_func_call_cm_runtime[False-compile-overhead] | 0.6742ms | 0.5119ms | 1.9535 KOps/s | 1.9404 KOps/s | $\color{#35bf28}+0.67\%$ |
| test_func_call_cm_runtime[True-eager] | 1.4854ms | 0.9084ms | 1.1008 KOps/s | 1.1115 KOps/s | $\color{#d91a1a}-0.96\%$ |
| test_func_call_cm_runtime[True-compile] | 1.2101ms | 0.7659ms | 1.3057 KOps/s | 1.3006 KOps/s | $\color{#35bf28}+0.39\%$ |
| test_func_call_cm_runtime[True-compile-overhead] | 0.9432ms | 0.7663ms | 1.3049 KOps/s | 1.3014 KOps/s | $\color{#35bf28}+0.27\%$ |
| test_vmap_func_call_cm_runtime[eager] | 2.6818ms | 1.9425ms | 514.7889 Ops/s | 515.9657 Ops/s | $\color{#d91a1a}-0.23\%$ |
| test_vmap_func_call_cm_runtime[compile] | 2.9737ms | 1.9934ms | 501.6508 Ops/s | 501.4311 Ops/s | $\color{#35bf28}+0.04\%$ |
| test_vmap_func_call_cm_runtime[compile-overhead] | 3.6924ms | 1.9965ms | 500.8657 Ops/s | 503.6745 Ops/s | $\color{#d91a1a}-0.56\%$ |
| test_distributed | 0.3298ms | 0.1278ms | 7.8241 KOps/s | 7.5702 KOps/s | $\color{#35bf28}+3.35\%$ |
| test_tdmodule | 45.5350μs | 18.4943μs | 54.0708 KOps/s | 54.6830 KOps/s | $\color{#d91a1a}-1.12\%$ |
| test_tdmodule_dispatch | 63.3280μs | 36.5221μs | 27.3807 KOps/s | 27.9754 KOps/s | $\color{#d91a1a}-2.13\%$ |
| test_tdseq | 45.9260μs | 20.9951μs | 47.6301 KOps/s | 48.6815 KOps/s | $\color{#d91a1a}-2.16\%$ |
| test_tdseq_dispatch | 61.1540μs | 42.0245μs | 23.7957 KOps/s | 24.0189 KOps/s | $\color{#d91a1a}-0.93\%$ |
| test_instantiation_functorch | 2.4924ms | 1.6169ms | 618.4732 Ops/s | 617.7070 Ops/s | $\color{#35bf28}+0.12\%$ |
| test_instantiation_td | 2.2349ms | 1.2073ms | 828.3049 Ops/s | 826.1920 Ops/s | $\color{#35bf28}+0.26\%$ |
| test_exec_functorch | 0.4461ms | 0.1895ms | 5.2769 KOps/s | 5.2423 KOps/s | $\color{#35bf28}+0.66\%$ |
| test_exec_functional_call | 0.2868ms | 0.1799ms | 5.5572 KOps/s | 5.6105 KOps/s | $\color{#d91a1a}-0.95\%$ |
| test_exec_td | 0.3201ms | 0.1690ms | 5.9159 KOps/s | 5.6911 KOps/s | $\color{#35bf28}+3.95\%$ |
| test_exec_td_decorator | 0.4763ms | 0.2314ms | 4.3210 KOps/s | 4.3719 KOps/s | $\color{#d91a1a}-1.16\%$ |
| test_vmap_mlp_speed[True-True] | 0.9807ms | 0.6704ms | 1.4917 KOps/s | 1.4772 KOps/s | $\color{#35bf28}+0.98\%$ |
| test_vmap_mlp_speed[True-False] | 0.9746ms | 0.6656ms | 1.5024 KOps/s | 1.5038 KOps/s | $\color{#d91a1a}-0.09\%$ |
| test_vmap_mlp_speed[False-True] | 0.9445ms | 0.5174ms | 1.9328 KOps/s | 1.9672 KOps/s | $\color{#d91a1a}-1.75\%$ |
| test_vmap_mlp_speed[False-False] | 0.7555ms | 0.5185ms | 1.9285 KOps/s | 1.9810 KOps/s | $\color{#d91a1a}-2.65\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.6486ms | 0.6468ms | 1.5460 KOps/s | 1.5556 KOps/s | $\color{#d91a1a}-0.62\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.9718ms | 0.6485ms | 1.5419 KOps/s | 1.5688 KOps/s | $\color{#d91a1a}-1.71\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.7387ms | 0.5333ms | 1.8751 KOps/s | 1.9181 KOps/s | $\color{#d91a1a}-2.24\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.7449ms | 0.5342ms | 1.8718 KOps/s | 1.8834 KOps/s | $\color{#d91a1a}-0.61\%$ |
| test_to_module_speed[True] | 2.0900ms | 1.3442ms | 743.9316 Ops/s | 773.6573 Ops/s | $\color{#d91a1a}-3.84\%$ |
| test_to_module_speed[False] | 1.4437ms | 1.3083ms | 764.3363 Ops/s | 791.5185 Ops/s | $\color{#d91a1a}-3.43\%$ |
| test_tc_init | 89.5270μs | 44.4913μs | 22.4763 KOps/s | 23.5745 KOps/s | $\color{#d91a1a}-4.66\%$ |
| test_tc_init_nested | 0.1705ms | 90.5980μs | 11.0378 KOps/s | 11.8961 KOps/s | $\textbf{\color{#d91a1a}-7.22\%}$ |
| test_tc_first_layer_tensor | 19.8370μs | 1.5803μs | 632.8031 KOps/s | 665.8089 KOps/s | $\color{#d91a1a}-4.96\%$ |
| test_tc_first_layer_nontensor | 26.8900μs | 4.8566μs | 205.9059 KOps/s | 219.8656 KOps/s | $\textbf{\color{#d91a1a}-6.35\%}$ |
| test_tc_second_layer_tensor | 58.8710μs | 2.8494μs | 350.9451 KOps/s | 358.4539 KOps/s | $\color{#d91a1a}-2.09\%$ |
| test_tc_second_layer_nontensor | 37.9910μs | 6.1511μs | 162.5727 KOps/s | 168.1433 KOps/s | $\color{#d91a1a}-3.31\%$ |
| test_unbind | 0.4892s | 14.5045ms | 68.9440 Ops/s | 64.8040 Ops/s | $\textbf{\color{#35bf28}+6.39\%}$ |
| test_full_like | 10.5659ms | 8.1645ms | 122.4811 Ops/s | 118.0358 Ops/s | $\color{#35bf28}+3.77\%$ |
| test_zeros_like | 14.2730ms | 6.0616ms | 164.9719 Ops/s | 317.1927 Ops/s | $\textbf{\color{#d91a1a}-47.99\%}$ |
| test_ones_like | 15.2806ms | 7.5565ms | 132.3366 Ops/s | 152.6570 Ops/s | $\textbf{\color{#d91a1a}-13.31\%}$ |
| test_clone | 16.6054ms | 9.6694ms | 103.4186 Ops/s | 118.2681 Ops/s | $\textbf{\color{#d91a1a}-12.56\%}$ |
| test_squeeze | 71.8130μs | 12.2792μs | 81.4388 KOps/s | 78.6157 KOps/s | $\color{#35bf28}+3.59\%$ |
| test_unsqueeze | 0.1686ms | 93.7685μs | 10.6646 KOps/s | 10.8212 KOps/s | $\color{#d91a1a}-1.45\%$ |
| test_split | 0.5205ms | 0.2007ms | 4.9816 KOps/s | 5.0249 KOps/s | $\color{#d91a1a}-0.86\%$ |
| test_permute | 0.4426ms | 0.2238ms | 4.4685 KOps/s | 4.4287 KOps/s | $\color{#35bf28}+0.90\%$ |
| test_stack | 29.8026ms | 26.5032ms | 37.7313 Ops/s | 34.9054 Ops/s | $\textbf{\color{#35bf28}+8.10\%}$ |
| test_cat | 29.2317ms | 26.3324ms | 37.9760 Ops/s | 36.2594 Ops/s | $\color{#35bf28}+4.73\%$ |


github-actions bot commented Sep 30, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 228. Improved: $\large\color{#35bf28}24$. Worsened: $\large\color{#d91a1a}13$.
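
The columns in these tables also appear to be related. Ops matches 1 / Mean: for the GPU test_plain_set_nested row, 1 / 13.1546 μs ≈ 76.019 KOps/s. Change appears to be the relative difference between Ops and Ops on Repo HEAD, expressed as a percentage. The sketch below checks both relations for that row. It assumes this interpretation of the columns, and the helper names (`parse_time`, `parse_rate`, `percent_change`) are hypothetical, not part of the benchmark bot.

```python
import unicodedata

# Sketch (assumption): Ops appears to be 1 / Mean, and Change appears to be
# (Ops - Ops on HEAD) / Ops on HEAD, expressed as a percentage.
MICRO_S = "\u03bcs"  # 'μs'; NFKC normalization maps the micro sign U+00B5 to U+03BC
TIME_SCALE = {MICRO_S: 1e-6, "ms": 1e-3, "s": 1.0}
RATE_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_time(text: str) -> float:
    """Convert a cell such as '13.1546μs' or '0.1400s' to seconds."""
    text = unicodedata.normalize("NFKC", text.strip())
    # Check longer suffixes first, so 'ms' and 'μs' are not read as plain 's'.
    for unit in (MICRO_S, "ms", "s"):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * TIME_SCALE[unit]
    raise ValueError(f"unrecognized time unit in {text!r}")


def parse_rate(text: str) -> float:
    """Convert a cell such as '76.0190 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * RATE_SCALE[unit]


def percent_change(ops: float, ops_head: float) -> float:
    """Relative throughput change versus the HEAD run, in percent."""
    return (ops - ops_head) / ops_head * 100.0


# Cells from the test_plain_set_nested row of the GPU table.
mean_s = parse_time("13.1546\u03bcs")
ops = parse_rate("76.0190 KOps/s")
ops_head = parse_rate("71.4137 KOps/s")

print(f"1 / Mean = {1.0 / mean_s:.1f} ops/s vs Ops = {ops:.1f} ops/s")
print(f"Change = {percent_change(ops, ops_head):+.2f}%")  # Change = +6.45%
```

Because Mean is printed with only a few significant digits, 1 / Mean and Ops agree closely but not exactly. Any check of this relation should therefore use a small relative tolerance.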

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 0.1145ms | 13.1546μs | 76.0190 KOps/s | 71.4137 KOps/s | $\textbf{\color{#35bf28}+6.45\%}$ |
| test_plain_set_stack_nested | 47.3100μs | 13.1903μs | 75.8132 KOps/s | 70.2214 KOps/s | $\textbf{\color{#35bf28}+7.96\%}$ |
| test_plain_set_nested_inplace | 51.1010μs | 14.3970μs | 69.4588 KOps/s | 65.9397 KOps/s | $\textbf{\color{#35bf28}+5.34\%}$ |
| test_plain_set_stack_nested_inplace | 49.9810μs | 14.2705μs | 70.0748 KOps/s | 66.6707 KOps/s | $\textbf{\color{#35bf28}+5.11\%}$ |
| test_items | 41.2910μs | 2.8304μs | 353.3080 KOps/s | 347.9959 KOps/s | $\color{#35bf28}+1.53\%$ |
| test_items_nested | 0.4225ms | 0.3252ms | 3.0747 KOps/s | 3.0622 KOps/s | $\color{#35bf28}+0.41\%$ |
| test_items_nested_locked | 0.3851ms | 0.3277ms | 3.0517 KOps/s | 3.0460 KOps/s | $\color{#35bf28}+0.19\%$ |
| test_items_nested_leaf | 90.6720μs | 55.7349μs | 17.9421 KOps/s | 18.0031 KOps/s | $\color{#d91a1a}-0.34\%$ |
| test_items_stack_nested | 0.3874ms | 0.3310ms | 3.0216 KOps/s | 3.0681 KOps/s | $\color{#d91a1a}-1.52\%$ |
| test_items_stack_nested_leaf | 89.7520μs | 57.0104μs | 17.5407 KOps/s | 17.4157 KOps/s | $\color{#35bf28}+0.72\%$ |
| test_items_stack_nested_locked | 0.3939ms | 0.3309ms | 3.0220 KOps/s | 3.0571 KOps/s | $\color{#d91a1a}-1.15\%$ |
| test_keys | 22.8700μs | 3.4172μs | 292.6343 KOps/s | 290.6945 KOps/s | $\color{#35bf28}+0.67\%$ |
| test_keys_nested | 98.2420μs | 54.9546μs | 18.1969 KOps/s | 18.1870 KOps/s | $\color{#35bf28}+0.05\%$ |
| test_keys_nested_locked | 2.7468ms | 62.1644μs | 16.0864 KOps/s | 16.0486 KOps/s | $\color{#35bf28}+0.24\%$ |
| test_keys_nested_leaf | 77.7310μs | 46.1247μs | 21.6804 KOps/s | 21.1811 KOps/s | $\color{#35bf28}+2.36\%$ |
| test_keys_stack_nested | 86.3710μs | 56.8519μs | 17.5896 KOps/s | 17.8293 KOps/s | $\color{#d91a1a}-1.34\%$ |
| test_keys_stack_nested_leaf | 81.0210μs | 48.2556μs | 20.7230 KOps/s | 20.8535 KOps/s | $\color{#d91a1a}-0.63\%$ |
| test_keys_stack_nested_locked | 90.3610μs | 62.0325μs | 16.1206 KOps/s | 16.4276 KOps/s | $\color{#d91a1a}-1.87\%$ |
| test_values | 5.5700μs | 0.8432μs | 1.1859 MOps/s | 1.1878 MOps/s | $\color{#d91a1a}-0.15\%$ |
| test_values_nested | 70.1410μs | 40.8735μs | 24.4657 KOps/s | 24.5571 KOps/s | $\color{#d91a1a}-0.37\%$ |
| test_values_nested_locked | 69.7510μs | 42.8608μs | 23.3313 KOps/s | 23.4612 KOps/s | $\color{#d91a1a}-0.55\%$ |
| test_values_nested_leaf | 69.0010μs | 35.4304μs | 28.2244 KOps/s | 28.3247 KOps/s | $\color{#d91a1a}-0.35\%$ |
| test_values_stack_nested | 77.8510μs | 41.9282μs | 23.8503 KOps/s | 23.9592 KOps/s | $\color{#d91a1a}-0.45\%$ |
| test_values_stack_nested_leaf | 63.9510μs | 36.4313μs | 27.4489 KOps/s | 28.1475 KOps/s | $\color{#d91a1a}-2.48\%$ |
| test_values_stack_nested_locked | 75.5420μs | 43.7637μs | 22.8500 KOps/s | 23.2119 KOps/s | $\color{#d91a1a}-1.56\%$ |
| test_membership | 2.0890μs | 0.4996μs | 2.0017 MOps/s | 1.9757 MOps/s | $\color{#35bf28}+1.32\%$ |
| test_membership_nested | 16.3155μs | 1.9040μs | 525.2064 KOps/s | 540.4452 KOps/s | $\color{#d91a1a}-2.82\%$ |
| test_membership_nested_leaf | 12.4037μs | 1.8857μs | 530.2970 KOps/s | 553.4693 KOps/s | $\color{#d91a1a}-4.19\%$ |
| test_membership_stacked_nested | 36.5310μs | 1.9749μs | 506.3464 KOps/s | 518.3583 KOps/s | $\color{#d91a1a}-2.32\%$ |
| test_membership_stacked_nested_leaf | 26.6210μs | 1.9602μs | 510.1465 KOps/s | 513.1429 KOps/s | $\color{#d91a1a}-0.58\%$ |
| test_membership_nested_last | 27.7900μs | 2.7690μs | 361.1423 KOps/s | 363.4025 KOps/s | $\color{#d91a1a}-0.62\%$ |
| test_membership_nested_leaf_last | 38.7310μs | 2.7541μs | 363.0896 KOps/s | 363.6181 KOps/s | $\color{#d91a1a}-0.15\%$ |
| test_membership_stacked_nested_last | 26.3610μs | 3.1835μs | 314.1219 KOps/s | 292.3764 KOps/s | $\textbf{\color{#35bf28}+7.44\%}$ |
| test_membership_stacked_nested_leaf_last | 35.8110μs | 3.1464μs | 317.8228 KOps/s | 295.6571 KOps/s | $\textbf{\color{#35bf28}+7.50\%}$ |
| test_nested_getleaf | 39.3300μs | 6.0314μs | 165.7980 KOps/s | 165.2576 KOps/s | $\color{#35bf28}+0.33\%$ |
| test_nested_get | 33.8510μs | 5.7449μs | 174.0681 KOps/s | 174.6679 KOps/s | $\color{#d91a1a}-0.34\%$ |
| test_stacked_getleaf | 29.5000μs | 6.0021μs | 166.6081 KOps/s | 165.4278 KOps/s | $\color{#35bf28}+0.71\%$ |
| test_stacked_get | 52.5510μs | 5.6198μs | 177.9423 KOps/s | 173.3376 KOps/s | $\color{#35bf28}+2.66\%$ |
| test_nested_getitemleaf | 26.0400μs | 6.1200μs | 163.3986 KOps/s | 162.6570 KOps/s | $\color{#35bf28}+0.46\%$ |
| test_nested_getitem | 40.7610μs | 5.5771μs | 179.3031 KOps/s | 174.1781 KOps/s | $\color{#35bf28}+2.94\%$ |
| test_stacked_getitemleaf | 25.7600μs | 6.0649μs | 164.8839 KOps/s | 161.9825 KOps/s | $\color{#35bf28}+1.79\%$ |
| test_stacked_getitem | 30.5600μs | 5.7092μs | 175.1559 KOps/s | 171.1262 KOps/s | $\color{#35bf28}+2.35\%$ |
| test_lock_nested | 7.1320ms | 0.4162ms | 2.4026 KOps/s | 2.4070 KOps/s | $\color{#d91a1a}-0.18\%$ |
| test_lock_stack_nested | 0.4279ms | 0.3724ms | 2.6856 KOps/s | 2.7359 KOps/s | $\color{#d91a1a}-1.84\%$ |
| test_unlock_nested | 0.7601ms | 0.3507ms | 2.8517 KOps/s | 2.8458 KOps/s | $\color{#35bf28}+0.21\%$ |
| test_unlock_stack_nested | 0.3703ms | 0.3112ms | 3.2138 KOps/s | 3.2678 KOps/s | $\color{#d91a1a}-1.65\%$ |
| test_flatten_speed | 0.1500ms | 69.3665μs | 14.4162 KOps/s | 14.3674 KOps/s | $\color{#35bf28}+0.34\%$ |
| test_unflatten_speed | 0.3727ms | 0.2836ms | 3.5257 KOps/s | 3.5611 KOps/s | $\color{#d91a1a}-0.99\%$ |
| test_common_ops | 1.5793ms | 1.2223ms | 818.1553 Ops/s | 802.7999 Ops/s | $\color{#35bf28}+1.91\%$ |
| test_creation | 21.3610μs | 1.4651μs | 682.5297 KOps/s | 663.9703 KOps/s | $\color{#35bf28}+2.80\%$ |
| test_creation_empty | 47.6610μs | 14.3022μs | 69.9195 KOps/s | 63.5265 KOps/s | $\textbf{\color{#35bf28}+10.06\%}$ |
| test_creation_nested_1 | 46.1310μs | 15.8340μs | 63.1552 KOps/s | 57.5233 KOps/s | $\textbf{\color{#35bf28}+9.79\%}$ |
| test_creation_nested_2 | 35.9510μs | 19.2793μs | 51.8690 KOps/s | 50.0243 KOps/s | $\color{#35bf28}+3.69\%$ |
| test_clone | 81.4420μs | 29.1067μs | 34.3564 KOps/s | 34.9490 KOps/s | $\color{#d91a1a}-1.70\%$ |
| test_getitem[int] | 1.1638ms | 15.3803μs | 65.0182 KOps/s | 65.0209 KOps/s | $-0.00\%$ |
| test_getitem[slice_int] | 0.1224ms | 26.8588μs | 37.2318 KOps/s | 36.7694 KOps/s | $\color{#35bf28}+1.26\%$ |
| test_getitem[range] | 0.2476ms | 0.1146ms | 8.7274 KOps/s | 9.1287 KOps/s | $\color{#d91a1a}-4.40\%$ |
| test_getitem[tuple] | 0.1258ms | 22.7648μs | 43.9275 KOps/s | 43.6401 KOps/s | $\color{#35bf28}+0.66\%$ |
| test_getitem[list] | 0.1942ms | 0.1019ms | 9.8117 KOps/s | 10.1879 KOps/s | $\color{#d91a1a}-3.69\%$ |
| test_setitem_dim[int] | 72.3010μs | 46.5574μs | 21.4789 KOps/s | 22.8000 KOps/s | $\textbf{\color{#d91a1a}-5.79\%}$ |
| test_setitem_dim[slice_int] | 0.1047ms | 67.4632μs | 14.8229 KOps/s | 15.0302 KOps/s | $\color{#d91a1a}-1.38\%$ |
| test_setitem_dim[range] | 0.1844ms | 0.1304ms | 7.6701 KOps/s | 7.8956 KOps/s | $\color{#d91a1a}-2.86\%$ |
| test_setitem_dim[tuple] | 98.6720μs | 61.7349μs | 16.1983 KOps/s | 16.6405 KOps/s | $\color{#d91a1a}-2.66\%$ |
| test_setitem | 87.1720μs | 42.8892μs | 23.3159 KOps/s | 24.2138 KOps/s | $\color{#d91a1a}-3.71\%$ |
| test_set | 83.1110μs | 39.9674μs | 25.0204 KOps/s | 24.9957 KOps/s | $\color{#35bf28}+0.10\%$ |
| test_set_shared | 0.3526ms | 50.5925μs | 19.7658 KOps/s | 20.0423 KOps/s | $\color{#d91a1a}-1.38\%$ |
| test_update | 0.1154ms | 49.4000μs | 20.2429 KOps/s | 20.4959 KOps/s | $\color{#d91a1a}-1.23\%$ |
| test_update_nested | 0.1035ms | 60.6841μs | 16.4788 KOps/s | 17.8198 KOps/s | $\textbf{\color{#d91a1a}-7.53\%}$ |
| test_update__nested | 0.1182ms | 66.3291μs | 15.0763 KOps/s | 17.0503 KOps/s | $\textbf{\color{#d91a1a}-11.58\%}$ |
| test_set_nested | 98.5110μs | 46.4157μs | 21.5444 KOps/s | 23.2417 KOps/s | $\textbf{\color{#d91a1a}-7.30\%}$ |
| test_set_nested_new | 92.2810μs | 50.5330μs | 19.7890 KOps/s | 21.4959 KOps/s | $\textbf{\color{#d91a1a}-7.94\%}$ |
| test_select | 0.1096ms | 64.0390μs | 15.6155 KOps/s | 16.9081 KOps/s | $\textbf{\color{#d91a1a}-7.64\%}$ |
| test_select_nested | 78.1720μs | 42.2739μs | 23.6553 KOps/s | 23.9697 KOps/s | $\color{#d91a1a}-1.31\%$ |
| test_exclude_nested | 98.0820μs | 58.9630μs | 16.9598 KOps/s | 17.0628 KOps/s | $\color{#d91a1a}-0.60\%$ |
| test_empty[True] | 0.3139ms | 0.2449ms | 4.0827 KOps/s | 4.1626 KOps/s | $\color{#d91a1a}-1.92\%$ |
| test_empty[False] | 4.1950μs | 0.7459μs | 1.3406 MOps/s | 1.3590 MOps/s | $\color{#d91a1a}-1.35\%$ |
| test_to | 59.1610μs | 24.6283μs | 40.6037 KOps/s | 39.6879 KOps/s | $\color{#35bf28}+2.31\%$ |
| test_to_nonblocking | 59.4210μs | 23.5039μs | 42.5461 KOps/s | 42.6129 KOps/s | $\color{#d91a1a}-0.16\%$ |
| test_unbind_speed | 1.3299ms | 0.2741ms | 3.6482 KOps/s | 3.7045 KOps/s | $\color{#d91a1a}-1.52\%$ |
| test_unbind_speed_stack0 | 0.3318ms | 0.2708ms | 3.6934 KOps/s | 3.7807 KOps/s | $\color{#d91a1a}-2.31\%$ |
| test_unbind_speed_stack1 | 93.0856ms | 0.7013ms | 1.4259 KOps/s | 1.4491 KOps/s | $\color{#d91a1a}-1.60\%$ |
| test_split | 94.3487ms | 2.1587ms | 463.2427 Ops/s | 467.4006 Ops/s | $\color{#d91a1a}-0.89\%$ |
| test_chunk | 96.3650ms | 2.1595ms | 463.0622 Ops/s | 465.1197 Ops/s | $\color{#d91a1a}-0.44\%$ |
| test_creation[device0] | 0.3718ms | 0.1252ms | 7.9884 KOps/s | 7.9294 KOps/s | $\color{#35bf28}+0.74\%$ |
| test_creation_from_tensor | 0.3962ms | 0.1288ms | 7.7615 KOps/s | 7.8397 KOps/s | $\color{#d91a1a}-1.00\%$ |
| test_add_one[memmap_tensor0] | 0.1350ms | 8.7522μs | 114.2569 KOps/s | 119.7562 KOps/s | $\color{#d91a1a}-4.59\%$ |
| test_contiguous[memmap_tensor0] | 19.9500μs | 2.1373μs | 467.8725 KOps/s | 469.6891 KOps/s | $\color{#d91a1a}-0.39\%$ |
| test_stack[memmap_tensor0] | 35.3510μs | 6.4107μs | 155.9892 KOps/s | 153.3992 KOps/s | $\color{#35bf28}+1.69\%$ |
| test_memmaptd_index | 1.2118ms | 0.4099ms | 2.4396 KOps/s | 2.4466 KOps/s | $\color{#d91a1a}-0.29\%$ |
| test_memmaptd_index_astensor | 0.7448ms | 0.4654ms | 2.1487 KOps/s | 2.1282 KOps/s | $\color{#35bf28}+0.96\%$ |
| test_memmaptd_index_op | 1.4118ms | 0.9711ms | 1.0298 KOps/s | 985.2126 Ops/s | $\color{#35bf28}+4.52\%$ |
| test_serialize_model | 0.1319s | 0.1308s | 7.6472 Ops/s | 7.7175 Ops/s | $\color{#d91a1a}-0.91\%$ |
| test_serialize_model_pickle | 1.3791s | 1.2191s | 0.8203 Ops/s | 0.8430 Ops/s | $\color{#d91a1a}-2.69\%$ |
| test_serialize_weights | 0.1312s | 0.1299s | 7.6997 Ops/s | 7.7108 Ops/s | $\color{#d91a1a}-0.14\%$ |
| test_serialize_weights_returnearly | 0.2315s | 56.0311ms | 17.8472 Ops/s | 16.0578 Ops/s | $\textbf{\color{#35bf28}+11.14\%}$ |
| test_serialize_weights_pickle | 1.3733s | 1.2170s | 0.8217 Ops/s | 0.8149 Ops/s | $\color{#35bf28}+0.84\%$ |
| test_reshape_pytree | 80.1120μs | 34.9406μs | 28.6200 KOps/s | 28.5097 KOps/s | $\color{#35bf28}+0.39\%$ |
| test_reshape_td | 85.2520μs | 40.8070μs | 24.5056 KOps/s | 25.6015 KOps/s | $\color{#d91a1a}-4.28\%$ |
| test_view_pytree | 69.4610μs | 35.0405μs | 28.5384 KOps/s | 29.5658 KOps/s | $\color{#d91a1a}-3.47\%$ |
test_view_td 80.0010μs 46.8296μs 21.3540 KOps/s 22.4203 KOps/s $\color{#d91a1a}-4.76\%$
test_unbind_pytree 67.9510μs 33.8494μs 29.5426 KOps/s 30.1146 KOps/s $\color{#d91a1a}-1.90\%$
test_unbind_td 0.5721ms 41.5681μs 24.0569 KOps/s 23.7373 KOps/s $\color{#35bf28}+1.35\%$
test_split_pytree 0.1702ms 46.2471μs 21.6230 KOps/s 21.5242 KOps/s $\color{#35bf28}+0.46\%$
test_split_td 0.7411ms 55.4193μs 18.0442 KOps/s 18.2720 KOps/s $\color{#d91a1a}-1.25\%$
test_add_pytree 0.1035ms 54.9913μs 18.1847 KOps/s 18.4986 KOps/s $\color{#d91a1a}-1.70\%$
test_add_td 0.1534ms 87.3919μs 11.4427 KOps/s 11.6258 KOps/s $\color{#d91a1a}-1.58\%$
test_compile_add_one_nested[tensordict-compile] 0.4133ms 0.2091ms 4.7825 KOps/s 4.7762 KOps/s $\color{#35bf28}+0.13\%$
test_compile_add_one_nested[tensordict-eager] 0.1919ms 0.1526ms 6.5510 KOps/s 6.6965 KOps/s $\color{#d91a1a}-2.17\%$
test_compile_add_one_nested[pytree-compile] 0.1823ms 0.1437ms 6.9612 KOps/s 6.8174 KOps/s $\color{#35bf28}+2.11\%$
test_compile_add_one_nested[pytree-eager] 0.2587ms 0.1803ms 5.5461 KOps/s 5.3160 KOps/s $\color{#35bf28}+4.33\%$
test_compile_copy_nested[tensordict-compile] 0.1383ms 21.4026μs 46.7233 KOps/s 48.7784 KOps/s $\color{#d91a1a}-4.21\%$
test_compile_copy_nested[tensordict-eager] 0.2427ms 43.6008μs 22.9353 KOps/s 23.3232 KOps/s $\color{#d91a1a}-1.66\%$
test_compile_copy_nested[pytree-compile] 0.2596ms 64.3320μs 15.5444 KOps/s 15.7309 KOps/s $\color{#d91a1a}-1.19\%$
test_compile_copy_nested[pytree-eager] 81.6110μs 49.0732μs 20.3777 KOps/s 20.3146 KOps/s $\color{#35bf28}+0.31\%$
test_compile_add_one_flat[tensordict-compile] 0.4496ms 0.3146ms 3.1782 KOps/s 3.1893 KOps/s $\color{#d91a1a}-0.35\%$
test_compile_add_one_flat[tensordict-eager] 0.2779ms 0.2096ms 4.7714 KOps/s 4.7979 KOps/s $\color{#d91a1a}-0.55\%$
test_compile_add_one_flat[tensorclass-compile] 0.1723ms 0.1262ms 7.9254 KOps/s 7.9812 KOps/s $\color{#d91a1a}-0.70\%$
test_compile_add_one_flat[tensorclass-eager] 0.1907ms 62.1175μs 16.0985 KOps/s 16.8227 KOps/s $\color{#d91a1a}-4.30\%$
test_compile_add_one_flat[pytree-compile] 0.4404ms 0.3224ms 3.1021 KOps/s 3.2077 KOps/s $\color{#d91a1a}-3.29\%$
test_compile_add_one_flat[pytree-eager] 0.7765ms 0.6282ms 1.5919 KOps/s 1.6540 KOps/s $\color{#d91a1a}-3.76\%$
test_compile_add_self_flat[tensordict-eager] 0.3358ms 0.2517ms 3.9732 KOps/s 4.0276 KOps/s $\color{#d91a1a}-1.35\%$
test_compile_add_self_flat[tensordict-compile] 0.4262ms 0.3217ms 3.1087 KOps/s 3.2172 KOps/s $\color{#d91a1a}-3.37\%$
test_compile_add_self_flat[tensorclass-eager] 0.1870ms 72.4741μs 13.7980 KOps/s 14.6071 KOps/s $\textbf{\color{#d91a1a}-5.54\%}$
test_compile_add_self_flat[tensorclass-compile] 0.2279ms 0.1314ms 7.6110 KOps/s 7.6118 KOps/s $\color{#d91a1a}-0.01\%$
test_compile_add_self_flat[pytree-eager] 0.6218ms 0.5144ms 1.9438 KOps/s 1.9058 KOps/s $\color{#35bf28}+1.99\%$
test_compile_add_self_flat[pytree-compile] 0.4723ms 0.3229ms 3.0967 KOps/s 3.1806 KOps/s $\color{#d91a1a}-2.64\%$
test_compile_copy_flat[tensordict-compile] 0.1114ms 17.7718μs 56.2691 KOps/s 61.8656 KOps/s $\textbf{\color{#d91a1a}-9.05\%}$
test_compile_copy_flat[tensordict-eager] 0.1463ms 27.6772μs 36.1309 KOps/s 36.8348 KOps/s $\color{#d91a1a}-1.91\%$
test_compile_copy_flat[pytree-compile] 0.1569ms 70.5522μs 14.1739 KOps/s 14.2416 KOps/s $\color{#d91a1a}-0.48\%$
test_compile_copy_flat[pytree-eager] 0.1344ms 51.9124μs 19.2632 KOps/s 19.5367 KOps/s $\color{#d91a1a}-1.40\%$
test_compile_assign_and_add[tensordict-compile] 2.3147ms 0.8204ms 1.2190 KOps/s 1.1197 KOps/s $\textbf{\color{#35bf28}+8.87\%}$
test_compile_assign_and_add[tensordict-eager] 3.3250ms 3.1855ms 313.9199 Ops/s 306.1032 Ops/s $\color{#35bf28}+2.55\%$
test_compile_assign_and_add[pytree-compile] 2.2800ms 0.8036ms 1.2444 KOps/s 1.1525 KOps/s $\textbf{\color{#35bf28}+7.97\%}$
test_compile_assign_and_add[pytree-eager] 3.2044ms 3.0688ms 325.8639 Ops/s 321.0347 Ops/s $\color{#35bf28}+1.50\%$
test_compile_indexing[tensor-tensordict-compile] 0.1648ms 0.1083ms 9.2360 KOps/s 8.9934 KOps/s $\color{#35bf28}+2.70\%$
test_compile_indexing[tensor-tensordict-eager] 0.1931ms 57.4783μs 17.3979 KOps/s 16.4475 KOps/s $\textbf{\color{#35bf28}+5.78\%}$
test_compile_indexing[tensor-tensorclass-compile] 0.1365ms 0.1020ms 9.8062 KOps/s 9.7525 KOps/s $\color{#35bf28}+0.55\%$
test_compile_indexing[tensor-tensorclass-eager] 81.3610μs 41.3642μs 24.1755 KOps/s 23.5466 KOps/s $\color{#35bf28}+2.67\%$
test_compile_indexing[tensor-pytree-compile] 0.1406ms 0.1024ms 9.7688 KOps/s 9.7116 KOps/s $\color{#35bf28}+0.59\%$
test_compile_indexing[tensor-pytree-eager] 75.3510μs 41.4351μs 24.1341 KOps/s 23.9019 KOps/s $\color{#35bf28}+0.97\%$
test_compile_indexing[slice-tensordict-compile] 0.1817ms 0.1360ms 7.3533 KOps/s 7.3050 KOps/s $\color{#35bf28}+0.66\%$
test_compile_indexing[slice-tensordict-eager] 0.1613ms 24.0799μs 41.5285 KOps/s 40.6030 KOps/s $\color{#35bf28}+2.28\%$
test_compile_indexing[slice-tensorclass-compile] 0.1624ms 0.1292ms 7.7390 KOps/s 7.6981 KOps/s $\color{#35bf28}+0.53\%$
test_compile_indexing[slice-tensorclass-eager] 61.1310μs 20.1705μs 49.5772 KOps/s 47.8002 KOps/s $\color{#35bf28}+3.72\%$
test_compile_indexing[slice-pytree-compile] 0.1841ms 0.1301ms 7.6868 KOps/s 7.6189 KOps/s $\color{#35bf28}+0.89\%$
test_compile_indexing[slice-pytree-eager] 65.4410μs 20.2421μs 49.4019 KOps/s 48.1993 KOps/s $\color{#35bf28}+2.50\%$
test_compile_indexing[int-tensordict-compile] 0.1723ms 0.1368ms 7.3112 KOps/s 7.2441 KOps/s $\color{#35bf28}+0.93\%$
test_compile_indexing[int-tensordict-eager] 0.4660ms 26.2413μs 38.1079 KOps/s 40.4239 KOps/s $\textbf{\color{#d91a1a}-5.73\%}$
test_compile_indexing[int-tensorclass-compile] 0.1724ms 0.1303ms 7.6722 KOps/s 7.5999 KOps/s $\color{#35bf28}+0.95\%$
test_compile_indexing[int-tensorclass-eager] 47.8810μs 20.0550μs 49.8628 KOps/s 47.8845 KOps/s $\color{#35bf28}+4.13\%$
test_compile_indexing[int-pytree-compile] 0.1696ms 0.1304ms 7.6673 KOps/s 7.5855 KOps/s $\color{#35bf28}+1.08\%$
test_compile_indexing[int-pytree-eager] 58.3110μs 19.9466μs 50.1339 KOps/s 47.8637 KOps/s $\color{#35bf28}+4.74\%$
test_mod_add[eager] 72.2210μs 29.8631μs 33.4861 KOps/s 31.8670 KOps/s $\textbf{\color{#35bf28}+5.08\%}$
test_mod_add[compile] 0.3695ms 67.7558μs 14.7589 KOps/s 14.3162 KOps/s $\color{#35bf28}+3.09\%$
test_mod_add[compile-overhead] 0.2729ms 0.1377ms 7.2611 KOps/s 6.8720 KOps/s $\textbf{\color{#35bf28}+5.66\%}$
test_mod_wrap[eager] 0.3168ms 0.2366ms 4.2271 KOps/s 4.1091 KOps/s $\color{#35bf28}+2.87\%$
test_mod_wrap[compile] 1.6401ms 0.2965ms 3.3731 KOps/s 3.4403 KOps/s $\color{#d91a1a}-1.95\%$
test_mod_wrap[compile-overhead] 7.5070ms 4.0424ms 247.3793 Ops/s 246.8375 Ops/s $\color{#35bf28}+0.22\%$
test_mod_wrap_and_backward[eager] 1.5488ms 1.4356ms 696.5648 Ops/s 676.4913 Ops/s $\color{#35bf28}+2.97\%$
test_mod_wrap_and_backward[compile] 1.7403ms 1.4148ms 706.7993 Ops/s 698.3815 Ops/s $\color{#35bf28}+1.21\%$
test_mod_wrap_and_backward[compile-overhead] 1.5656ms 1.0250ms 975.6551 Ops/s 999.3087 Ops/s $\color{#d91a1a}-2.37\%$
test_seq_add[eager] 0.1458ms 91.4855μs 10.9307 KOps/s 9.8934 KOps/s $\textbf{\color{#35bf28}+10.48\%}$
test_seq_add[compile] 0.1699ms 77.4123μs 12.9178 KOps/s 12.0244 KOps/s $\textbf{\color{#35bf28}+7.43\%}$
test_seq_add[compile-overhead] 0.1661ms 0.1141ms 8.7644 KOps/s 8.5466 KOps/s $\color{#35bf28}+2.55\%$
test_seq_wrap[eager] 0.4475ms 0.3874ms 2.5816 KOps/s 2.5016 KOps/s $\color{#35bf28}+3.19\%$
test_seq_wrap[compile] 0.3449ms 0.3059ms 3.2692 KOps/s 3.0756 KOps/s $\textbf{\color{#35bf28}+6.29\%}$
test_seq_wrap[compile-overhead] 0.2622ms 0.2179ms 4.5888 KOps/s 4.4091 KOps/s $\color{#35bf28}+4.08\%$
test_func_call_runtime[False-eager] 0.8656ms 0.7311ms 1.3679 KOps/s 1.2887 KOps/s $\textbf{\color{#35bf28}+6.14\%}$
test_func_call_runtime[False-compile] 0.8968ms 0.7992ms 1.2513 KOps/s 1.2893 KOps/s $\color{#d91a1a}-2.95\%$
test_func_call_runtime[False-compile-overhead] 0.4152ms 0.3631ms 2.7543 KOps/s 2.8113 KOps/s $\color{#d91a1a}-2.03\%$
test_func_call_runtime[True-eager] 1.0676ms 0.9451ms 1.0581 KOps/s 1.1276 KOps/s $\textbf{\color{#d91a1a}-6.17\%}$
test_func_call_runtime[True-compile] 0.9626ms 0.8240ms 1.2136 KOps/s 1.2321 KOps/s $\color{#d91a1a}-1.50\%$
test_func_call_runtime[True-compile-overhead] 0.4398ms 0.3865ms 2.5872 KOps/s 2.6257 KOps/s $\color{#d91a1a}-1.47\%$
test_func_call_cm_runtime[False-eager] 0.8964ms 0.7714ms 1.2963 KOps/s 1.2916 KOps/s $\color{#35bf28}+0.36\%$
test_func_call_cm_runtime[False-compile] 0.9046ms 0.8031ms 1.2452 KOps/s 1.2438 KOps/s $\color{#35bf28}+0.11\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4693ms 0.3596ms 2.7806 KOps/s 2.8172 KOps/s $\color{#d91a1a}-1.30\%$
test_func_call_cm_runtime[True-eager] 1.1444ms 1.0361ms 965.1671 Ops/s 962.0295 Ops/s $\color{#35bf28}+0.33\%$
test_func_call_cm_runtime[True-compile] 0.9689ms 0.8517ms 1.1741 KOps/s 1.1651 KOps/s $\color{#35bf28}+0.77\%$
test_func_call_cm_runtime[True-compile-overhead] 0.4657ms 0.4050ms 2.4692 KOps/s 2.4120 KOps/s $\color{#35bf28}+2.37\%$
test_vmap_func_call_cm_runtime[eager] 2.6801ms 2.1010ms 475.9556 Ops/s 466.5701 Ops/s $\color{#35bf28}+2.01\%$
test_vmap_func_call_cm_runtime[compile] 0.9635ms 0.8622ms 1.1599 KOps/s 1.1594 KOps/s $\color{#35bf28}+0.04\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5443ms 0.4072ms 2.4556 KOps/s 2.4717 KOps/s $\color{#d91a1a}-0.65\%$
test_distributed 0.6176ms 0.1182ms 8.4638 KOps/s 8.8873 KOps/s $\color{#d91a1a}-4.77\%$
test_tdmodule 0.1257ms 14.5803μs 68.5855 KOps/s 63.6446 KOps/s $\textbf{\color{#35bf28}+7.76\%}$
test_tdmodule_dispatch 67.9210μs 27.7659μs 36.0154 KOps/s 32.9582 KOps/s $\textbf{\color{#35bf28}+9.28\%}$
test_tdseq 48.7910μs 15.1907μs 65.8296 KOps/s 62.7382 KOps/s $\color{#35bf28}+4.93\%$
test_tdseq_dispatch 56.6610μs 31.0979μs 32.1565 KOps/s 30.3163 KOps/s $\textbf{\color{#35bf28}+6.07\%}$
test_instantiation_functorch 2.0256ms 1.8810ms 531.6377 Ops/s 526.3941 Ops/s $\color{#35bf28}+1.00\%$
test_instantiation_td 1.8021ms 1.1979ms 834.7938 Ops/s 829.4163 Ops/s $\color{#35bf28}+0.65\%$
test_exec_functorch 0.2660ms 0.2108ms 4.7448 KOps/s 4.9206 KOps/s $\color{#d91a1a}-3.57\%$
test_exec_functional_call 0.3091ms 0.2185ms 4.5767 KOps/s 4.8836 KOps/s $\textbf{\color{#d91a1a}-6.28\%}$
test_exec_td 0.3575ms 0.2203ms 4.5386 KOps/s 4.7982 KOps/s $\textbf{\color{#d91a1a}-5.41\%}$
test_exec_td_decorator 0.9853ms 0.2645ms 3.7810 KOps/s 3.9281 KOps/s $\color{#d91a1a}-3.74\%$
test_vmap_mlp_speed[True-True] 0.8075ms 0.7054ms 1.4176 KOps/s 1.4570 KOps/s $\color{#d91a1a}-2.70\%$
test_vmap_mlp_speed[True-False] 0.8261ms 0.7047ms 1.4190 KOps/s 1.4739 KOps/s $\color{#d91a1a}-3.73\%$
test_vmap_mlp_speed[False-True] 0.6963ms 0.5978ms 1.6728 KOps/s 1.7213 KOps/s $\color{#d91a1a}-2.82\%$
test_vmap_mlp_speed[False-False] 0.7206ms 0.5900ms 1.6950 KOps/s 1.7567 KOps/s $\color{#d91a1a}-3.51\%$
test_vmap_mlp_speed_decorator[True-True] 1.3187ms 0.6720ms 1.4881 KOps/s 1.5009 KOps/s $\color{#d91a1a}-0.85\%$
test_vmap_mlp_speed_decorator[True-False] 0.8132ms 0.6724ms 1.4872 KOps/s 1.4947 KOps/s $\color{#d91a1a}-0.50\%$
test_vmap_mlp_speed_decorator[False-True] 0.7228ms 0.5838ms 1.7129 KOps/s 1.6875 KOps/s $\color{#35bf28}+1.51\%$
test_vmap_mlp_speed_decorator[False-False] 0.7271ms 0.5986ms 1.6706 KOps/s 1.6407 KOps/s $\color{#35bf28}+1.83\%$
test_vmap_transformer_speed[True-True] 8.3885ms 8.2537ms 121.1581 Ops/s 120.4660 Ops/s $\color{#35bf28}+0.57\%$
test_vmap_transformer_speed[True-False] 8.5415ms 8.2492ms 121.2237 Ops/s 120.8372 Ops/s $\color{#35bf28}+0.32\%$
test_vmap_transformer_speed[False-True] 8.1747ms 8.0494ms 124.2323 Ops/s 124.2361 Ops/s $-0.00\%$
test_vmap_transformer_speed[False-False] 8.3312ms 8.0460ms 124.2848 Ops/s 124.1249 Ops/s $\color{#35bf28}+0.13\%$
test_vmap_transformer_speed_decorator[True-True] 19.5177ms 19.3611ms 51.6500 Ops/s 51.8932 Ops/s $\color{#d91a1a}-0.47\%$
test_vmap_transformer_speed_decorator[True-False] 20.0367ms 19.4177ms 51.4995 Ops/s 51.9250 Ops/s $\color{#d91a1a}-0.82\%$
test_vmap_transformer_speed_decorator[False-True] 19.4459ms 19.2239ms 52.0187 Ops/s 52.3489 Ops/s $\color{#d91a1a}-0.63\%$
test_vmap_transformer_speed_decorator[False-False] 20.4899ms 19.2853ms 51.8529 Ops/s 52.2974 Ops/s $\color{#d91a1a}-0.85\%$
test_to_module_speed[True] 1.3439ms 0.9198ms 1.0871 KOps/s 1.0830 KOps/s $\color{#35bf28}+0.39\%$
test_to_module_speed[False] 1.3025ms 0.8988ms 1.1125 KOps/s 1.1176 KOps/s $\color{#d91a1a}-0.45\%$
test_tc_init 0.1198ms 31.4656μs 31.7807 KOps/s 29.0668 KOps/s $\textbf{\color{#35bf28}+9.34\%}$
test_tc_init_nested 0.1734ms 63.6310μs 15.7156 KOps/s 14.3309 KOps/s $\textbf{\color{#35bf28}+9.66\%}$
test_tc_first_layer_tensor 12.4431μs 0.6677μs 1.4977 MOps/s 1.4909 MOps/s $\color{#35bf28}+0.46\%$
test_tc_first_layer_nontensor 99.6810μs 2.2262μs 449.2058 KOps/s 453.2649 KOps/s $\color{#d91a1a}-0.90\%$
test_tc_second_layer_tensor 21.4878μs 1.3665μs 731.7778 KOps/s 734.5749 KOps/s $\color{#d91a1a}-0.38\%$
test_tc_second_layer_nontensor 82.6210μs 2.9445μs 339.6146 KOps/s 338.7629 KOps/s $\color{#35bf28}+0.25\%$
test_unbind 0.1981s 12.2542ms 81.6050 Ops/s 93.1674 Ops/s $\textbf{\color{#d91a1a}-12.41\%}$
test_full_like 0.6553ms 0.5741ms 1.7420 KOps/s 1.7360 KOps/s $\color{#35bf28}+0.34\%$
test_zeros_like 0.2751ms 0.1979ms 5.0527 KOps/s 5.0552 KOps/s $\color{#d91a1a}-0.05\%$
test_ones_like 0.2810ms 0.1977ms 5.0582 KOps/s 5.0533 KOps/s $\color{#35bf28}+0.10\%$
test_clone 1.2233ms 0.4144ms 2.4131 KOps/s 2.4181 KOps/s $\color{#d91a1a}-0.20\%$
test_squeeze 32.8110μs 9.5760μs 104.4273 KOps/s 102.5455 KOps/s $\color{#35bf28}+1.84\%$
test_unsqueeze 0.2192ms 71.4335μs 13.9990 KOps/s 13.2393 KOps/s $\textbf{\color{#35bf28}+5.74\%}$
test_split 0.4303ms 0.1548ms 6.4613 KOps/s 6.3754 KOps/s $\color{#35bf28}+1.35\%$
test_permute 0.2084ms 0.1732ms 5.7732 KOps/s 5.5829 KOps/s $\color{#35bf28}+3.41\%$
test_stack 1.2820ms 0.8981ms 1.1134 KOps/s 1.1604 KOps/s $\color{#d91a1a}-4.05\%$
test_cat 1.2572ms 1.2312ms 812.2005 Ops/s 811.6273 Ops/s $\color{#35bf28}+0.07\%$
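
The numeric columns appear to be related. Ops looks like the reciprocal of Mean (for example, 1 / 0.1146 ms ≈ 8.73 KOps/s for `test_getitem[range]`). Change looks like the relative throughput difference against the repo HEAD run. The following is a minimal Python sketch that assumes the row layout shown above: whitespace-separated cells, with Change wrapped in GitHub's LaTeX colour markup. It parses one flattened row and recomputes both relationships. The helper names (`parse_time`, `parse_row`, `relative_change`) are hypothetical and are not part of the benchmark tooling used by this CI job.

```python
import re

# Time suffixes used in the Max/Mean columns. They are checked in order, so
# "ms"/"us" win over the bare "s". Both the micro sign (U+00B5) and Greek mu
# (U+03BC) are accepted, since scraped tables may use either character.
TIME_SUFFIXES = (("\u00b5s", 1e-6), ("\u03bcs", 1e-6), ("ms", 1e-3), ("s", 1.0))
# Throughput units used in the two Ops columns.
OPS_UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}
# Signed percentage inside the LaTeX cell, e.g. "$\color{#d91a1a}-4.40\%$".
CHANGE_RE = re.compile(r"([+-]\d+\.\d+)\\%")


def parse_time(text: str) -> float:
    """Convert a cell such as '0.1146ms' or '22.7648μs' to seconds."""
    for suffix, scale in TIME_SUFFIXES:
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * scale
    raise ValueError(f"unrecognized time cell: {text!r}")


def parse_row(row: str) -> dict:
    """Split one flattened table row into numeric fields (seconds, ops/s, %)."""
    name, max_t, mean_t, ops, ops_unit, head, head_unit, change = row.split()
    match = CHANGE_RE.search(change)
    if match is None:
        raise ValueError(f"unrecognized change cell: {change!r}")
    return {
        "name": name,
        "max_s": parse_time(max_t),
        "mean_s": parse_time(mean_t),
        "ops": float(ops) * OPS_UNITS[ops_unit],
        "ops_head": float(head) * OPS_UNITS[head_unit],
        "change_pct": float(match.group(1)),
    }


def relative_change(ops: float, ops_head: float) -> float:
    """Percent change in throughput of this PR relative to the repo HEAD run."""
    return (ops / ops_head - 1.0) * 100.0


row = r"test_getitem[range] 0.2476ms 0.1146ms 8.7274 KOps/s 9.1287 KOps/s $\color{#d91a1a}-4.40\%$"
fields = parse_row(row)
print(1.0 / fields["mean_s"])  # close to the reported 8.7274 KOps/s
print(round(relative_change(fields["ops"], fields["ops_head"]), 2))  # -4.4
```

Because the displayed Ops values are themselves rounded, recomputing Change from them can differ from the reported figure in the last decimal place. For example, `test_memmaptd_index_op` gives about +4.53% versus the reported +4.52%.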

@vmoens merged commit ae29423 into main on Sep 30, 2024
53 of 57 checks passed
@vmoens deleted the add-sync-graph2 branch on September 30, 2024 at 13:53