[Feature] tensordict.to_padded_tensor #723

Merged (2 commits) on Mar 28, 2024

Conversation


@vmoens vmoens commented Mar 27, 2024

cc @cpuhrsch

cc @ahmed-touati: once this lands, you will be able to speed up `split_trajectories` with

`split_trajectories(sample, as_nested=True).to_padded_tensor()`

which brings a 2.5x speed-up in my benchmarks.
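
For readers unfamiliar with the nested-to-padded conversion, here is a minimal pure-Python sketch of the idea: variable-length trajectories are padded to the length of the longest one, and a boolean mask records which steps are real. The helper name `pad_trajectories` and the sample data are hypothetical, and this is only an illustration of the concept, not how tensordict implements `to_padded_tensor` (which works on nested tensors).

```python
from typing import List, Sequence, Tuple


def pad_trajectories(
    trajectories: Sequence[Sequence[float]], padding: float = 0.0
) -> Tuple[List[List[float]], List[List[bool]]]:
    """Pad variable-length trajectories to a common length.

    Returns the padded batch and a mask that is True on real (non-padded) steps.
    Hypothetical helper, for illustration only.
    """
    # Length of the longest trajectory; 0 if there are no trajectories.
    max_len = max((len(traj) for traj in trajectories), default=0)
    padded: List[List[float]] = []
    mask: List[List[bool]] = []
    for traj in trajectories:
        n_pad = max_len - len(traj)
        # Append the padding value to the right of the real steps.
        padded.append(list(traj) + [padding] * n_pad)
        mask.append([True] * len(traj) + [False] * n_pad)
    return padded, mask


# Three trajectories of lengths 3, 1 and 2 (made-up data).
batch, mask = pad_trajectories([[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]])
```

After this call every row of `batch` has length 3, and `mask` marks the padded positions with `False`, so downstream code can ignore them.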

@facebook-github-bot added the CLA Signed label Mar 27, 2024

github-actions bot commented Mar 27, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}4$. Worsened: $\large\color{#d91a1a}18$.

Expand to view detailed results
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 38.5720μs | 17.2637μs | 57.9252 KOps/s | 59.1190 KOps/s | $\color{#d91a1a}-2.02\%$ |
| test_plain_set_stack_nested | 50.0230μs | 17.5170μs | 57.0872 KOps/s | 57.1900 KOps/s | $\color{#d91a1a}-0.18\%$ |
| test_plain_set_nested_inplace | 50.8150μs | 19.4855μs | 51.3203 KOps/s | 51.3656 KOps/s | $\color{#d91a1a}-0.09\%$ |
| test_plain_set_stack_nested_inplace | 47.2780μs | 19.6586μs | 50.8683 KOps/s | 51.2130 KOps/s | $\color{#d91a1a}-0.67\%$ |
| test_items | 79.3270μs | 2.4523μs | 407.7795 KOps/s | 409.3800 KOps/s | $\color{#d91a1a}-0.39\%$ |
| test_items_nested | 0.3288ms | 0.2690ms | 3.7168 KOps/s | 3.6604 KOps/s | $\color{#35bf28}+1.54\%$ |
| test_items_nested_locked | 0.4376ms | 0.2698ms | 3.7061 KOps/s | 3.6226 KOps/s | $\color{#35bf28}+2.30\%$ |
| test_items_nested_leaf | 0.5502ms | 0.1652ms | 6.0536 KOps/s | 5.9181 KOps/s | $\color{#35bf28}+2.29\%$ |
| test_items_stack_nested | 0.4628ms | 0.2695ms | 3.7110 KOps/s | 3.6255 KOps/s | $\color{#35bf28}+2.36\%$ |
| test_items_stack_nested_leaf | 0.2079ms | 0.1653ms | 6.0487 KOps/s | 5.9552 KOps/s | $\color{#35bf28}+1.57\%$ |
| test_items_stack_nested_locked | 0.5571ms | 0.2687ms | 3.7213 KOps/s | 3.5883 KOps/s | $\color{#35bf28}+3.70\%$ |
| test_keys | 28.7330μs | 3.8433μs | 260.1925 KOps/s | 256.9267 KOps/s | $\color{#35bf28}+1.27\%$ |
| test_keys_nested | 2.2310ms | 0.1443ms | 6.9285 KOps/s | 6.9018 KOps/s | $\color{#35bf28}+0.39\%$ |
| test_keys_nested_locked | 0.2397ms | 0.1468ms | 6.8138 KOps/s | 6.7206 KOps/s | $\color{#35bf28}+1.39\%$ |
| test_keys_nested_leaf | 33.5415ms | 0.1308ms | 7.6433 KOps/s | 7.9331 KOps/s | $\color{#d91a1a}-3.65\%$ |
| test_keys_stack_nested | 0.2695ms | 0.1425ms | 7.0188 KOps/s | 6.7301 KOps/s | $\color{#35bf28}+4.29\%$ |
| test_keys_stack_nested_leaf | 0.2235ms | 0.1247ms | 8.0224 KOps/s | 7.7064 KOps/s | $\color{#35bf28}+4.10\%$ |
| test_keys_stack_nested_locked | 0.2797ms | 0.1476ms | 6.7739 KOps/s | 6.5400 KOps/s | $\color{#35bf28}+3.58\%$ |
| test_values | 8.3205μs | 1.1526μs | 867.5717 KOps/s | 855.6009 KOps/s | $\color{#35bf28}+1.40\%$ |
| test_values_nested | 0.1041ms | 49.8010μs | 20.0799 KOps/s | 19.6885 KOps/s | $\color{#35bf28}+1.99\%$ |
| test_values_nested_locked | 0.1004ms | 49.6631μs | 20.1357 KOps/s | 19.6700 KOps/s | $\color{#35bf28}+2.37\%$ |
| test_values_nested_leaf | 2.4387ms | 45.3456μs | 22.0528 KOps/s | 21.6096 KOps/s | $\color{#35bf28}+2.05\%$ |
| test_values_stack_nested | 0.1013ms | 50.2227μs | 19.9113 KOps/s | 19.2236 KOps/s | $\color{#35bf28}+3.58\%$ |
| test_values_stack_nested_leaf | 86.1400μs | 45.0159μs | 22.2144 KOps/s | 22.0893 KOps/s | $\color{#35bf28}+0.57\%$ |
| test_values_stack_nested_locked | 0.1020ms | 50.1941μs | 19.9227 KOps/s | 19.5901 KOps/s | $\color{#35bf28}+1.70\%$ |
| test_membership | 26.2390μs | 1.3518μs | 739.7534 KOps/s | 730.6342 KOps/s | $\color{#35bf28}+1.25\%$ |
| test_membership_nested | 27.8620μs | 3.3916μs | 294.8476 KOps/s | 291.6372 KOps/s | $\color{#35bf28}+1.10\%$ |
| test_membership_nested_leaf | 24.5760μs | 3.4193μs | 292.4541 KOps/s | 267.2739 KOps/s | $\textbf{\color{#35bf28}+9.42\%}$ |
| test_membership_stacked_nested | 29.5950μs | 3.4187μs | 292.5111 KOps/s | 292.1608 KOps/s | $\color{#35bf28}+0.12\%$ |
| test_membership_stacked_nested_leaf | 30.1760μs | 3.4014μs | 294.0009 KOps/s | 288.5602 KOps/s | $\color{#35bf28}+1.89\%$ |
| test_membership_nested_last | 29.7450μs | 4.1188μs | 242.7865 KOps/s | 238.9674 KOps/s | $\color{#35bf28}+1.60\%$ |
| test_membership_nested_leaf_last | 24.2660μs | 4.1528μs | 240.8027 KOps/s | 234.6741 KOps/s | $\color{#35bf28}+2.61\%$ |
| test_membership_stacked_nested_last | 31.6790μs | 4.1613μs | 240.3081 KOps/s | 204.4082 KOps/s | $\textbf{\color{#35bf28}+17.56\%}$ |
| test_membership_stacked_nested_leaf_last | 37.2790μs | 4.1555μs | 240.6426 KOps/s | 205.3399 KOps/s | $\textbf{\color{#35bf28}+17.19\%}$ |
| test_nested_getleaf | 33.8630μs | 10.4930μs | 95.3015 KOps/s | 94.4317 KOps/s | $\color{#35bf28}+0.92\%$ |
| test_nested_get | 43.0600μs | 9.8827μs | 101.1871 KOps/s | 99.1776 KOps/s | $\color{#35bf28}+2.03\%$ |
| test_stacked_getleaf | 57.1360μs | 10.5177μs | 95.0774 KOps/s | 93.3709 KOps/s | $\color{#35bf28}+1.83\%$ |
| test_stacked_get | 63.3480μs | 9.8518μs | 101.5039 KOps/s | 99.1212 KOps/s | $\color{#35bf28}+2.40\%$ |
| test_nested_getitemleaf | 34.5240μs | 11.1655μs | 89.5618 KOps/s | 86.3184 KOps/s | $\color{#35bf28}+3.76\%$ |
| test_nested_getitem | 43.8320μs | 10.2270μs | 97.7806 KOps/s | 95.9036 KOps/s | $\color{#35bf28}+1.96\%$ |
| test_stacked_getitemleaf | 53.0580μs | 11.3972μs | 87.7406 KOps/s | 90.6544 KOps/s | $\color{#d91a1a}-3.21\%$ |
| test_stacked_getitem | 33.6620μs | 10.2240μs | 97.8095 KOps/s | 97.0259 KOps/s | $\color{#35bf28}+0.81\%$ |
| test_lock_nested | 0.7061ms | 0.3330ms | 3.0032 KOps/s | 2.9412 KOps/s | $\color{#35bf28}+2.11\%$ |
| test_lock_stack_nested | 0.3745ms | 0.3022ms | 3.3093 KOps/s | 3.3363 KOps/s | $\color{#d91a1a}-0.81\%$ |
| test_unlock_nested | 88.8970ms | 0.4255ms | 2.3499 KOps/s | 2.3392 KOps/s | $\color{#35bf28}+0.46\%$ |
| test_unlock_stack_nested | 0.4097ms | 0.3123ms | 3.2023 KOps/s | 3.2405 KOps/s | $\color{#d91a1a}-1.18\%$ |
| test_flatten_speed | 0.5515ms | 0.2638ms | 3.7912 KOps/s | 3.7574 KOps/s | $\color{#35bf28}+0.90\%$ |
| test_unflatten_speed | 0.5825ms | 0.4068ms | 2.4584 KOps/s | 2.4151 KOps/s | $\color{#35bf28}+1.80\%$ |
| test_common_ops | 5.3717ms | 0.7195ms | 1.3899 KOps/s | 1.4421 KOps/s | $\color{#d91a1a}-3.62\%$ |
| test_creation | 34.3950μs | 1.8251μs | 547.9083 KOps/s | 555.1214 KOps/s | $\color{#d91a1a}-1.30\%$ |
| test_creation_empty | 42.8800μs | 11.4405μs | 87.4088 KOps/s | 96.6454 KOps/s | $\textbf{\color{#d91a1a}-9.56\%}$ |
| test_creation_nested_1 | 43.5610μs | 14.0600μs | 71.1237 KOps/s | 77.4148 KOps/s | $\textbf{\color{#d91a1a}-8.13\%}$ |
| test_creation_nested_2 | 64.1390μs | 17.3737μs | 57.5583 KOps/s | 60.5992 KOps/s | $\textbf{\color{#d91a1a}-5.02\%}$ |
| test_clone | 0.1657ms | 13.4736μs | 74.2191 KOps/s | 74.1466 KOps/s | $\color{#35bf28}+0.10\%$ |
| test_getitem[int] | 37.0490μs | 11.3603μs | 88.0260 KOps/s | 87.6266 KOps/s | $\color{#35bf28}+0.46\%$ |
| test_getitem[slice_int] | 79.5380μs | 23.5527μs | 42.4580 KOps/s | 44.4475 KOps/s | $\color{#d91a1a}-4.48\%$ |
| test_getitem[range] | 0.2254ms | 43.4147μs | 23.0337 KOps/s | 23.8163 KOps/s | $\color{#d91a1a}-3.29\%$ |
| test_getitem[tuple] | 76.1120μs | 18.9411μs | 52.7953 KOps/s | 52.1264 KOps/s | $\color{#35bf28}+1.28\%$ |
| test_getitem[list] | 0.1073ms | 39.2828μs | 25.4564 KOps/s | 26.8191 KOps/s | $\textbf{\color{#d91a1a}-5.08\%}$ |
| test_setitem_dim[int] | 77.1330μs | 36.9447μs | 27.0675 KOps/s | 29.0504 KOps/s | $\textbf{\color{#d91a1a}-6.83\%}$ |
| test_setitem_dim[slice_int] | 0.1086ms | 63.1480μs | 15.8358 KOps/s | 16.4047 KOps/s | $\color{#d91a1a}-3.47\%$ |
| test_setitem_dim[range] | 0.1521ms | 82.3375μs | 12.1451 KOps/s | 12.3641 KOps/s | $\color{#d91a1a}-1.77\%$ |
| test_setitem_dim[tuple] | 0.1053ms | 52.7922μs | 18.9422 KOps/s | 20.0272 KOps/s | $\textbf{\color{#d91a1a}-5.42\%}$ |
| test_setitem | 0.1352ms | 20.9422μs | 47.7504 KOps/s | 48.8498 KOps/s | $\color{#d91a1a}-2.25\%$ |
| test_set | 96.7600μs | 20.3008μs | 49.2591 KOps/s | 50.8760 KOps/s | $\color{#d91a1a}-3.18\%$ |
| test_set_shared | 1.9286ms | 0.1452ms | 6.8856 KOps/s | 6.9639 KOps/s | $\color{#d91a1a}-1.12\%$ |
| test_update | 0.1942ms | 22.9520μs | 43.5691 KOps/s | 47.2999 KOps/s | $\textbf{\color{#d91a1a}-7.89\%}$ |
| test_update_nested | 0.8509ms | 30.9254μs | 32.3359 KOps/s | 34.5729 KOps/s | $\textbf{\color{#d91a1a}-6.47\%}$ |
| test_update__nested | 98.5430μs | 24.7431μs | 40.4152 KOps/s | 40.9014 KOps/s | $\color{#d91a1a}-1.19\%$ |
| test_set_nested | 97.4820μs | 22.2783μs | 44.8867 KOps/s | 47.2159 KOps/s | $\color{#d91a1a}-4.93\%$ |
| test_set_nested_new | 0.1970ms | 26.2150μs | 38.1462 KOps/s | 40.3136 KOps/s | $\textbf{\color{#d91a1a}-5.38\%}$ |
| test_select | 0.1653ms | 40.1784μs | 24.8890 KOps/s | 24.9031 KOps/s | $\color{#d91a1a}-0.06\%$ |
| test_select_nested | 0.1560ms | 58.9737μs | 16.9567 KOps/s | 17.0175 KOps/s | $\color{#d91a1a}-0.36\%$ |
| test_exclude_nested | 0.2045ms | 0.1164ms | 8.5907 KOps/s | 8.5281 KOps/s | $\color{#35bf28}+0.73\%$ |
| test_empty[True] | 0.5754ms | 0.4070ms | 2.4567 KOps/s | 2.4249 KOps/s | $\color{#35bf28}+1.31\%$ |
| test_empty[False] | 6.2536μs | 1.0493μs | 953.0135 KOps/s | 952.9854 KOps/s | $+0.00\%$ |
| test_unbind_speed | 0.3239ms | 0.2438ms | 4.1012 KOps/s | 4.0519 KOps/s | $\color{#35bf28}+1.22\%$ |
| test_unbind_speed_stack0 | 0.4105ms | 0.2444ms | 4.0921 KOps/s | 4.1595 KOps/s | $\color{#d91a1a}-1.62\%$ |
| test_unbind_speed_stack1 | 0.1310s | 0.6830ms | 1.4641 KOps/s | 1.4733 KOps/s | $\color{#d91a1a}-0.63\%$ |
| test_split | 0.1199s | 1.6623ms | 601.5585 Ops/s | 601.0026 Ops/s | $\color{#35bf28}+0.09\%$ |
| test_chunk | 1.7111ms | 1.4708ms | 679.8954 Ops/s | 686.1311 Ops/s | $\color{#d91a1a}-0.91\%$ |
| test_creation[device0] | 0.1969ms | 0.1039ms | 9.6228 KOps/s | 9.4265 KOps/s | $\color{#35bf28}+2.08\%$ |
| test_creation_from_tensor | 6.1756ms | 86.2082μs | 11.5998 KOps/s | 12.1401 KOps/s | $\color{#d91a1a}-4.45\%$ |
| test_add_one[memmap_tensor0] | 0.1883ms | 5.8435μs | 171.1291 KOps/s | 182.4048 KOps/s | $\textbf{\color{#d91a1a}-6.18\%}$ |
| test_contiguous[memmap_tensor0] | 12.1530μs | 0.6424μs | 1.5567 MOps/s | 1.5222 MOps/s | $\color{#35bf28}+2.27\%$ |
| test_stack[memmap_tensor0] | 34.0130μs | 3.8617μs | 258.9505 KOps/s | 277.1399 KOps/s | $\textbf{\color{#d91a1a}-6.56\%}$ |
| test_memmaptd_index | 0.8876ms | 0.2546ms | 3.9276 KOps/s | 4.1818 KOps/s | $\textbf{\color{#d91a1a}-6.08\%}$ |
| test_memmaptd_index_astensor | 0.5631ms | 0.3156ms | 3.1683 KOps/s | 3.3266 KOps/s | $\color{#d91a1a}-4.76\%$ |
| test_memmaptd_index_op | 0.9599ms | 0.6452ms | 1.5498 KOps/s | 1.6850 KOps/s | $\textbf{\color{#d91a1a}-8.02\%}$ |
| test_serialize_model | 0.2227s | 0.1160s | 8.6220 Ops/s | 8.5359 Ops/s | $\color{#35bf28}+1.01\%$ |
| test_serialize_model_pickle | 0.4498s | 0.3796s | 2.6345 Ops/s | 2.4765 Ops/s | $\textbf{\color{#35bf28}+6.38\%}$ |
| test_serialize_weights | 0.1044s | 99.7736ms | 10.0227 Ops/s | 10.1217 Ops/s | $\color{#d91a1a}-0.98\%$ |
| test_serialize_weights_returnearly | 0.2544s | 0.1522s | 6.5702 Ops/s | 8.1061 Ops/s | $\textbf{\color{#d91a1a}-18.95\%}$ |
| test_serialize_weights_pickle | 1.0423s | 0.5914s | 1.6909 Ops/s | 2.3556 Ops/s | $\textbf{\color{#d91a1a}-28.22\%}$ |
| test_serialize_weights_filesystem | 98.5383ms | 93.4796ms | 10.6975 Ops/s | 10.7860 Ops/s | $\color{#d91a1a}-0.82\%$ |
| test_serialize_model_filesystem | 0.1013s | 96.3928ms | 10.3742 Ops/s | 10.3099 Ops/s | $\color{#35bf28}+0.62\%$ |
| test_reshape_pytree | 71.6130μs | 21.0592μs | 47.4852 KOps/s | 47.3128 KOps/s | $\color{#35bf28}+0.36\%$ |
| test_reshape_td | 0.1001ms | 32.9726μs | 30.3282 KOps/s | 31.3683 KOps/s | $\color{#d91a1a}-3.32\%$ |
| test_view_pytree | 53.3290μs | 21.0573μs | 47.4896 KOps/s | 47.8559 KOps/s | $\color{#d91a1a}-0.77\%$ |
| test_view_td | 0.1311s | 61.9214μs | 16.1495 KOps/s | 16.5974 KOps/s | $\color{#d91a1a}-2.70\%$ |
| test_unbind_pytree | 73.5860μs | 24.2213μs | 41.2860 KOps/s | 41.6844 KOps/s | $\color{#d91a1a}-0.96\%$ |
| test_unbind_td | 0.1233ms | 35.9359μs | 27.8273 KOps/s | 26.9906 KOps/s | $\color{#35bf28}+3.10\%$ |
| test_split_pytree | 60.8530μs | 23.8229μs | 41.9764 KOps/s | 41.7413 KOps/s | $\color{#35bf28}+0.56\%$ |
| test_split_td | 0.1245ms | 39.9714μs | 25.0179 KOps/s | 24.7021 KOps/s | $\color{#35bf28}+1.28\%$ |
| test_add_pytree | 70.4010μs | 30.1807μs | 33.1337 KOps/s | 33.6337 KOps/s | $\color{#d91a1a}-1.49\%$ |
| test_add_td | 0.1187ms | 57.7690μs | 17.3103 KOps/s | 17.9440 KOps/s | $\color{#d91a1a}-3.53\%$ |
| test_distributed | 0.2900ms | 0.1023ms | 9.7742 KOps/s | 9.9186 KOps/s | $\color{#d91a1a}-1.46\%$ |
| test_tdmodule | 40.3250μs | 18.2969μs | 54.6541 KOps/s | 58.0911 KOps/s | $\textbf{\color{#d91a1a}-5.92\%}$ |
| test_tdmodule_dispatch | 61.6350μs | 35.7059μs | 28.0066 KOps/s | 29.4226 KOps/s | $\color{#d91a1a}-4.81\%$ |
| test_tdseq | 39.3040μs | 21.5148μs | 46.4796 KOps/s | 50.1802 KOps/s | $\textbf{\color{#d91a1a}-7.37\%}$ |
| test_tdseq_dispatch | 76.2620μs | 41.1699μs | 24.2896 KOps/s | 25.7067 KOps/s | $\textbf{\color{#d91a1a}-5.51\%}$ |
| test_instantiation_functorch | 1.9055ms | 1.3077ms | 764.7235 Ops/s | 757.0340 Ops/s | $\color{#35bf28}+1.02\%$ |
| test_instantiation_td | 1.6638ms | 1.0128ms | 987.3632 Ops/s | 985.6893 Ops/s | $\color{#35bf28}+0.17\%$ |
| test_exec_functorch | 0.3200ms | 0.1609ms | 6.2159 KOps/s | 6.4605 KOps/s | $\color{#d91a1a}-3.79\%$ |
| test_exec_functional_call | 0.3476ms | 0.1513ms | 6.6074 KOps/s | 6.9267 KOps/s | $\color{#d91a1a}-4.61\%$ |
| test_exec_td | 0.2699ms | 0.1477ms | 6.7696 KOps/s | 7.0088 KOps/s | $\color{#d91a1a}-3.41\%$ |
| test_exec_td_decorator | 0.3296ms | 0.1983ms | 5.0421 KOps/s | 5.0793 KOps/s | $\color{#d91a1a}-0.73\%$ |
| test_vmap_mlp_speed[True-True] | 0.7466ms | 0.4879ms | 2.0497 KOps/s | 2.1251 KOps/s | $\color{#d91a1a}-3.55\%$ |
| test_vmap_mlp_speed[True-False] | 0.7505ms | 0.4895ms | 2.0431 KOps/s | 2.1325 KOps/s | $\color{#d91a1a}-4.19\%$ |
| test_vmap_mlp_speed[False-True] | 0.6477ms | 0.3951ms | 2.5312 KOps/s | 2.5971 KOps/s | $\color{#d91a1a}-2.54\%$ |
| test_vmap_mlp_speed[False-False] | 0.5749ms | 0.3964ms | 2.5226 KOps/s | 2.6034 KOps/s | $\color{#d91a1a}-3.11\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 0.7537ms | 0.5086ms | 1.9661 KOps/s | 2.0232 KOps/s | $\color{#d91a1a}-2.82\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.8618ms | 0.5076ms | 1.9699 KOps/s | 2.0333 KOps/s | $\color{#d91a1a}-3.12\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.6934ms | 0.4155ms | 2.4069 KOps/s | 2.5010 KOps/s | $\color{#d91a1a}-3.76\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.6625ms | 0.4100ms | 2.4391 KOps/s | 2.5035 KOps/s | $\color{#d91a1a}-2.57\%$ |
| test_to_module_speed[True] | 2.2498ms | 1.3905ms | 719.1856 Ops/s | 712.8044 Ops/s | $\color{#35bf28}+0.90\%$ |
| test_to_module_speed[False] | 1.9289ms | 1.3755ms | 727.0136 Ops/s | 725.5271 Ops/s | $\color{#35bf28}+0.20\%$ |

@vmoens changed the title from "[WIP] tensordict.to_padded_tensor" to "[Feature] tensordict.to_padded_tensor" Mar 28, 2024
@vmoens added the enhancement label Mar 28, 2024
@vmoens merged commit 7304a1c into main Mar 28, 2024
45 of 48 checks passed
@vmoens deleted the to_padded_tensor branch October 21, 2024 14:03