[BugFix] Ensure dtype is preserved with autocast #773

Merged: 1 commit, May 13, 2024

Conversation

@vmoens (Contributor) commented May 13, 2024

No description provided.

@facebook-github-bot added the CLA Signed label May 13, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 127. Improved: 7. Worsened: 7.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | ---: | ---: | ---: | ---: | ---: |
| `test_plain_set_nested` | 29.7360μs | 17.0931μs | 58.5032 KOps/s | 57.4568 KOps/s | +1.82% |
| `test_plain_set_stack_nested` | 42.5690μs | 16.9518μs | 58.9907 KOps/s | 57.1631 KOps/s | +3.20% |
| `test_plain_set_nested_inplace` | 67.4960μs | 19.6464μs | 50.8998 KOps/s | 50.6200 KOps/s | +0.55% |
| `test_plain_set_stack_nested_inplace` | 69.0590μs | 19.3975μs | 51.5531 KOps/s | 50.8889 KOps/s | +1.31% |
| `test_items` | 24.1460μs | 2.6330μs | 379.8010 KOps/s | 402.9169 KOps/s | **-5.74%** |
| `test_items_nested` | 0.5768ms | 0.2753ms | 3.6325 KOps/s | 3.7220 KOps/s | -2.40% |
| `test_items_nested_locked` | 0.5767ms | 0.2712ms | 3.6868 KOps/s | 3.7579 KOps/s | -1.89% |
| `test_items_nested_leaf` | 0.1661ms | 77.6200μs | 12.8833 KOps/s | 13.0233 KOps/s | -1.07% |
| `test_items_stack_nested` | 0.5667ms | 0.2724ms | 3.6709 KOps/s | 3.7300 KOps/s | -1.58% |
| `test_items_stack_nested_leaf` | 0.1599ms | 77.5631μs | 12.8927 KOps/s | 12.5407 KOps/s | +2.81% |
| `test_items_stack_nested_locked` | 0.5704ms | 0.2760ms | 3.6229 KOps/s | 3.6737 KOps/s | -1.38% |
| `test_keys` | 52.8180μs | 3.9495μs | 253.1939 KOps/s | 237.2636 KOps/s | **+6.71%** |
| `test_keys_nested` | 0.2902ms | 0.1406ms | 7.1108 KOps/s | 7.2764 KOps/s | -2.28% |
| `test_keys_nested_locked` | 2.1830ms | 0.1432ms | 6.9850 KOps/s | 6.9949 KOps/s | -0.14% |
| `test_keys_nested_leaf` | 0.2391ms | 0.1167ms | 8.5684 KOps/s | 8.1573 KOps/s | **+5.04%** |
| `test_keys_stack_nested` | 0.3005ms | 0.1376ms | 7.2693 KOps/s | 7.3396 KOps/s | -0.96% |
| `test_keys_stack_nested_leaf` | 0.2342ms | 0.1189ms | 8.4116 KOps/s | 8.6097 KOps/s | -2.30% |
| `test_keys_stack_nested_locked` | 0.2493ms | 0.1403ms | 7.1272 KOps/s | 7.0544 KOps/s | +1.03% |
| `test_values` | 11.3362μs | 1.1534μs | 866.9754 KOps/s | 869.8048 KOps/s | -0.33% |
| `test_values_nested` | 0.1016ms | 50.8325μs | 19.6724 KOps/s | 19.8106 KOps/s | -0.70% |
| `test_values_nested_locked` | 91.6710μs | 50.8009μs | 19.6847 KOps/s | 19.8876 KOps/s | -1.02% |
| `test_values_nested_leaf` | 82.3340μs | 46.0153μs | 21.7319 KOps/s | 21.8074 KOps/s | -0.35% |
| `test_values_stack_nested` | 0.1206ms | 52.6224μs | 19.0033 KOps/s | 19.6172 KOps/s | -3.13% |
| `test_values_stack_nested_leaf` | 96.0690μs | 45.8094μs | 21.8296 KOps/s | 22.0375 KOps/s | -0.94% |
| `test_values_stack_nested_locked` | 0.1154ms | 52.2300μs | 19.1461 KOps/s | 19.7555 KOps/s | -3.08% |
| `test_membership` | 19.9880μs | 1.3346μs | 749.2718 KOps/s | 742.7704 KOps/s | +0.88% |
| `test_membership_nested` | 49.5620μs | 3.3518μs | 298.3429 KOps/s | 296.7284 KOps/s | +0.54% |
| `test_membership_nested_leaf` | 24.1250μs | 3.3751μs | 296.2916 KOps/s | 293.3240 KOps/s | +1.01% |
| `test_membership_stacked_nested` | 48.1100μs | 3.3250μs | 300.7550 KOps/s | 295.5873 KOps/s | +1.75% |
| `test_membership_stacked_nested_leaf` | 28.2720μs | 3.3350μs | 299.8540 KOps/s | 296.2329 KOps/s | +1.22% |
| `test_membership_nested_last` | 40.4060μs | 4.5530μs | 219.6352 KOps/s | 231.1296 KOps/s | -4.97% |
| `test_membership_nested_leaf_last` | 31.4490μs | 4.2462μs | 235.5038 KOps/s | 234.6714 KOps/s | +0.35% |
| `test_membership_stacked_nested_last` | 57.0660μs | 11.2337μs | 89.0178 KOps/s | 237.3767 KOps/s | **-62.50%** |
| `test_membership_stacked_nested_leaf_last` | 45.9560μs | 11.2627μs | 88.7887 KOps/s | 233.6286 KOps/s | **-62.00%** |
| `test_nested_getleaf` | 58.0780μs | 10.7450μs | 93.0665 KOps/s | 95.6449 KOps/s | -2.70% |
| `test_nested_get` | 55.6640μs | 10.1072μs | 98.9390 KOps/s | 98.2965 KOps/s | +0.65% |
| `test_stacked_getleaf` | 38.6830μs | 10.7021μs | 93.4397 KOps/s | 91.9944 KOps/s | +1.57% |
| `test_stacked_get` | 55.5530μs | 10.0025μs | 99.9752 KOps/s | 97.9202 KOps/s | +2.10% |
| `test_nested_getitemleaf` | 59.1500μs | 11.1738μs | 89.4952 KOps/s | 87.5068 KOps/s | +2.27% |
| `test_nested_getitem` | 32.9320μs | 10.3115μs | 96.9795 KOps/s | 95.4913 KOps/s | +1.56% |
| `test_stacked_getitemleaf` | 54.5050μs | 11.0787μs | 90.2633 KOps/s | 90.4780 KOps/s | -0.24% |
| `test_stacked_getitem` | 46.6140μs | 10.1521μs | 98.5021 KOps/s | 97.3179 KOps/s | +1.22% |
| `test_lock_nested` | 52.0627ms | 0.4031ms | 2.4810 KOps/s | 2.8274 KOps/s | **-12.25%** |
| `test_lock_stack_nested` | 0.5375ms | 0.2999ms | 3.3344 KOps/s | 3.2507 KOps/s | +2.57% |
| `test_unlock_nested` | 0.8736ms | 0.3539ms | 2.8255 KOps/s | 2.5124 KOps/s | **+12.46%** |
| `test_unlock_stack_nested` | 0.5045ms | 0.3050ms | 3.2787 KOps/s | 3.1726 KOps/s | +3.34% |
| `test_flatten_speed` | 0.2221ms | 95.8295μs | 10.4352 KOps/s | 10.4581 KOps/s | -0.22% |
| `test_unflatten_speed` | 0.7072ms | 0.4106ms | 2.4355 KOps/s | 2.4963 KOps/s | -2.44% |
| `test_common_ops` | 3.0153ms | 0.7152ms | 1.3983 KOps/s | 1.3906 KOps/s | +0.56% |
| `test_creation` | 0.1019ms | 1.9019μs | 525.7825 KOps/s | 537.0045 KOps/s | -2.09% |
| `test_creation_empty` | 35.1250μs | 10.0064μs | 99.9363 KOps/s | 92.9134 KOps/s | **+7.56%** |
| `test_creation_nested_1` | 42.9800μs | 12.6654μs | 78.9552 KOps/s | 72.2926 KOps/s | **+9.22%** |
| `test_creation_nested_2` | 40.2950μs | 15.8436μs | 63.1169 KOps/s | 58.4259 KOps/s | **+8.03%** |
| `test_clone` | 0.1276ms | 13.4659μs | 74.2617 KOps/s | 77.3186 KOps/s | -3.95% |
| `test_getitem[int]` | 44.5630μs | 11.2943μs | 88.5406 KOps/s | 85.0521 KOps/s | +4.10% |
| `test_getitem[slice_int]` | 66.4440μs | 22.5545μs | 44.3370 KOps/s | 44.1537 KOps/s | +0.42% |
| `test_getitem[range]` | 78.7170μs | 57.3488μs | 17.4372 KOps/s | 16.8279 KOps/s | +3.62% |
| `test_getitem[tuple]` | 53.3500μs | 19.2847μs | 51.8545 KOps/s | 54.0518 KOps/s | -4.07% |
| `test_getitem[list]` | 0.1234ms | 40.9826μs | 24.4006 KOps/s | 24.8048 KOps/s | -1.63% |
| `test_setitem_dim[int]` | 67.7770μs | 33.7984μs | 29.5872 KOps/s | 28.5263 KOps/s | +3.72% |
| `test_setitem_dim[slice_int]` | 0.1010ms | 60.3863μs | 16.5600 KOps/s | 16.2577 KOps/s | +1.86% |
| `test_setitem_dim[range]` | 0.1526ms | 82.6004μs | 12.1065 KOps/s | 12.0948 KOps/s | +0.10% |
| `test_setitem_dim[tuple]` | 94.3260μs | 50.0380μs | 19.9848 KOps/s | 20.4611 KOps/s | -2.33% |
| `test_setitem` | 78.1660μs | 19.7768μs | 50.5642 KOps/s | 49.8970 KOps/s | +1.34% |
| `test_set` | 0.3751ms | 19.2782μs | 51.8721 KOps/s | 51.9452 KOps/s | -0.14% |
| `test_set_shared` | 1.6121ms | 0.1425ms | 7.0159 KOps/s | 6.9684 KOps/s | +0.68% |
| `test_update` | 0.1194ms | 21.2469μs | 47.0657 KOps/s | 46.2939 KOps/s | +1.67% |
| `test_update_nested` | 81.6120μs | 29.3254μs | 34.1002 KOps/s | 34.0675 KOps/s | +0.10% |
| `test_update__nested` | 91.2830μs | 25.6011μs | 39.0609 KOps/s | 41.0463 KOps/s | -4.84% |
| `test_set_nested` | 68.0070μs | 21.4222μs | 46.6806 KOps/s | 47.0047 KOps/s | -0.69% |
| `test_set_nested_new` | 80.5070μs | 25.6190μs | 39.0335 KOps/s | 40.3618 KOps/s | -3.29% |
| `test_select` | 89.2760μs | 41.4914μs | 24.1014 KOps/s | 25.4002 KOps/s | **-5.11%** |
| `test_select_nested` | 0.1210ms | 60.4353μs | 16.5466 KOps/s | 16.6438 KOps/s | -0.58% |
| `test_exclude_nested` | 0.1949ms | 0.1215ms | 8.2317 KOps/s | 8.3813 KOps/s | -1.78% |
| `test_empty[True]` | 0.6591ms | 0.3982ms | 2.5114 KOps/s | 2.5314 KOps/s | -0.79% |
| `test_empty[False]` | 6.5442μs | 1.0394μs | 962.1365 KOps/s | 938.6201 KOps/s | +2.51% |
| `test_unbind_speed` | 0.3509ms | 0.2586ms | 3.8677 KOps/s | 3.9142 KOps/s | -1.19% |
| `test_unbind_speed_stack0` | 5.1181ms | 0.2485ms | 4.0247 KOps/s | 3.9275 KOps/s | +2.48% |
| `test_unbind_speed_stack1` | 67.0147ms | 0.7351ms | 1.3603 KOps/s | 1.3038 KOps/s | +4.34% |
| `test_split` | 67.2591ms | 1.6180ms | 618.0614 Ops/s | 621.0309 Ops/s | -0.48% |
| `test_chunk` | 68.1900ms | 1.6052ms | 622.9660 Ops/s | 625.5501 Ops/s | -0.41% |
| `test_creation[device0]` | 0.1831ms | 0.1056ms | 9.4687 KOps/s | 9.6312 KOps/s | -1.69% |
| `test_creation_from_tensor` | 3.6525ms | 85.2447μs | 11.7309 KOps/s | 11.8873 KOps/s | -1.32% |
| `test_add_one[memmap_tensor0]` | 72.8560μs | 5.1505μs | 194.1566 KOps/s | 187.5413 KOps/s | +3.53% |
| `test_contiguous[memmap_tensor0]` | 17.2330μs | 0.6283μs | 1.5916 MOps/s | 1.5715 MOps/s | +1.28% |
| `test_stack[memmap_tensor0]` | 19.2760μs | 3.6182μs | 276.3834 KOps/s | 279.5627 KOps/s | -1.14% |
| `test_memmaptd_index` | 1.0237ms | 0.2515ms | 3.9756 KOps/s | 4.0147 KOps/s | -0.97% |
| `test_memmaptd_index_astensor` | 0.5651ms | 0.3270ms | 3.0579 KOps/s | 3.0873 KOps/s | -0.95% |
| `test_memmaptd_index_op` | 0.9676ms | 0.5978ms | 1.6727 KOps/s | 1.6203 KOps/s | +3.24% |
| `test_serialize_model` | 0.1785s | 0.1105s | 9.0458 Ops/s | 8.6685 Ops/s | +4.35% |
| `test_serialize_model_pickle` | 0.4502s | 0.3776s | 2.6480 Ops/s | 2.6315 Ops/s | +0.63% |
| `test_serialize_weights` | 0.1705s | 0.1088s | 9.1884 Ops/s | 9.0475 Ops/s | +1.56% |
| `test_serialize_weights_returnearly` | 0.1905s | 0.1317s | 7.5919 Ops/s | 8.0789 Ops/s | **-6.03%** |
| `test_serialize_weights_pickle` | 0.9645s | 0.5548s | 1.8025 Ops/s | 2.4076 Ops/s | **-25.14%** |
| `test_serialize_weights_filesystem` | 95.5914ms | 91.9806ms | 10.8719 Ops/s | 10.4380 Ops/s | +4.16% |
| `test_serialize_model_filesystem` | 0.1671s | 99.9707ms | 10.0029 Ops/s | 9.8982 Ops/s | +1.06% |
| `test_reshape_pytree` | 54.7920μs | 25.6184μs | 39.0344 KOps/s | 39.4329 KOps/s | -1.01% |
| `test_reshape_td` | 80.4910μs | 33.5048μs | 29.8465 KOps/s | 30.6795 KOps/s | -2.72% |
| `test_view_pytree` | 70.7820μs | 25.0142μs | 39.9773 KOps/s | 40.2159 KOps/s | -0.59% |
| `test_view_td` | 0.1053ms | 37.1838μs | 26.8935 KOps/s | 27.4738 KOps/s | -2.11% |
| `test_unbind_pytree` | 76.2630μs | 28.4676μs | 35.1277 KOps/s | 34.9097 KOps/s | +0.62% |
| `test_unbind_td` | 0.3984ms | 38.0627μs | 26.2724 KOps/s | 26.5378 KOps/s | -1.00% |
| `test_split_pytree` | 71.4540μs | 29.0572μs | 34.4148 KOps/s | 35.6741 KOps/s | -3.53% |
| `test_split_td` | 0.1248ms | 40.8909μs | 24.4553 KOps/s | 24.6363 KOps/s | -0.73% |
| `test_add_pytree` | 96.0190μs | 34.6741μs | 28.8400 KOps/s | 29.5923 KOps/s | -2.54% |
| `test_add_td` | 0.1176ms | 54.0279μs | 18.5089 KOps/s | 18.6694 KOps/s | -0.86% |
| `test_distributed` | 0.2638ms | 0.1040ms | 9.6126 KOps/s | 9.5993 KOps/s | +0.14% |
| `test_tdmodule` | 67.3260μs | 16.7862μs | 59.5727 KOps/s | 56.7630 KOps/s | +4.95% |
| `test_tdmodule_dispatch` | 49.4720μs | 33.6382μs | 29.7281 KOps/s | 28.0122 KOps/s | **+6.13%** |
| `test_tdseq` | 40.8960μs | 19.7377μs | 50.6644 KOps/s | 49.3158 KOps/s | +2.73% |
| `test_tdseq_dispatch` | 66.9850μs | 38.7256μs | 25.8227 KOps/s | 24.8477 KOps/s | +3.92% |
| `test_instantiation_functorch` | 2.0379ms | 1.3515ms | 739.8948 Ops/s | 757.2964 Ops/s | -2.30% |
| `test_instantiation_td` | 1.6298ms | 1.0415ms | 960.1499 Ops/s | 992.2791 Ops/s | -3.24% |
| `test_exec_functorch` | 0.3088ms | 0.1629ms | 6.1396 KOps/s | 6.2734 KOps/s | -2.13% |
| `test_exec_functional_call` | 0.3102ms | 0.1510ms | 6.6224 KOps/s | 6.8031 KOps/s | -2.66% |
| `test_exec_td` | 0.2856ms | 0.1439ms | 6.9515 KOps/s | 7.0241 KOps/s | -1.03% |
| `test_exec_td_decorator` | 0.6801ms | 0.2253ms | 4.4378 KOps/s | 4.6075 KOps/s | -3.68% |
| `test_vmap_mlp_speed[True-True]` | 0.5960ms | 0.4793ms | 2.0866 KOps/s | 2.0437 KOps/s | +2.10% |
| `test_vmap_mlp_speed[True-False]` | 0.7279ms | 0.4775ms | 2.0941 KOps/s | 2.0600 KOps/s | +1.66% |
| `test_vmap_mlp_speed[False-True]` | 0.6144ms | 0.3903ms | 2.5621 KOps/s | 2.5665 KOps/s | -0.17% |
| `test_vmap_mlp_speed[False-False]` | 0.5575ms | 0.3889ms | 2.5715 KOps/s | 2.5382 KOps/s | +1.31% |
| `test_vmap_mlp_speed_decorator[True-True]` | 1.3677ms | 0.5490ms | 1.8215 KOps/s | 1.7915 KOps/s | +1.68% |
| `test_vmap_mlp_speed_decorator[True-False]` | 0.7260ms | 0.5450ms | 1.8347 KOps/s | 1.8111 KOps/s | +1.30% |
| `test_vmap_mlp_speed_decorator[False-True]` | 0.7261ms | 0.4508ms | 2.2182 KOps/s | 2.2034 KOps/s | +0.67% |
| `test_vmap_mlp_speed_decorator[False-False]` | 0.7193ms | 0.4517ms | 2.2136 KOps/s | 2.2003 KOps/s | +0.61% |
| `test_to_module_speed[True]` | 2.6624ms | 1.6832ms | 594.1066 Ops/s | 602.8526 Ops/s | -1.45% |
| `test_to_module_speed[False]` | 2.3898ms | 1.6232ms | 616.0744 Ops/s | 600.6259 Ops/s | +2.57% |
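
The Change column appears to be the relative difference between the PR's throughput (Ops) and the throughput on repo HEAD. The bot's actual formula is not shown in this thread, so the following is only a minimal sketch under that assumption. The function names are hypothetical, and the 5% threshold for emphasized entries is inferred from the rows above (4.97% is not emphasized, 5.04% is), not taken from the bot's configuration.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD.

    Both arguments must use the same unit (e.g. both KOps/s).
    """
    return (ops_pr - ops_head) / ops_head * 100.0


def is_flagged(change_pct: float, threshold_pct: float = 5.0) -> bool:
    """Assumed rule: a result is emphasized when |change| >= threshold."""
    return abs(change_pct) >= threshold_pct


# (Ops, Ops on Repo HEAD) pairs copied from the benchmark results.
rows = {
    "test_plain_set_nested": (58.5032, 57.4568),
    "test_keys": (253.1939, 237.2636),
    "test_membership_stacked_nested_last": (89.0178, 237.3767),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    marker = " (flagged)" if is_flagged(change) else ""
    print(f"{name}: {change:+.2f}%{marker}")
```

With these inputs the sketch reproduces the reported +1.82%, +6.71% and -62.50% entries, which supports (but does not confirm) the assumed formula.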

@vmoens added the bug label May 13, 2024
@vmoens merged commit 5c4cef7 into main May 13, 2024
32 of 38 checks passed
@vmoens deleted the autocast-dtype-fix branch May 13, 2024 08:36
Labels: bug, CLA Signed
2 participants