codegen/llvm: improve loadTruncate() to use a simple load when possible #18137

xxxbxxx · 2023-11-26T21:17:08Z

forcing llvm to truncate the padding bits on load prevents some optimizations.

fixes 370662c
closes #17768

xxxbxxx · 2023-11-26T21:36:47Z

The idea of this change is to assume that if there is a local variable in it's own alloca,
then the padding bits are clean.
Now,

I don't know if it's true (I expect it is possible to write garbage to the bits with some intToPtr / ptrCast. but maybe it can happen when coping from an other variable? )
I don't know if it the proper way to implement this...

however, it does get the job done:

Benchmark 1 (81 runs): ./benchmark_baseline 1 1000/10000 281483566841860
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           249ms ± 13.0ms     225ms …  298ms          4 ( 5%)        0%
  peak_rss            880KB ± 7.17KB     819KB …  881KB          2 ( 2%)        0%
  cpu_cycles          836M  ± 6.80M      828M  …  859M           2 ( 2%)        0%
  instructions       2.06G  ± 67.9      2.06G  … 2.06G          10 (12%)        0%
  cache_references    759K  ±  142K      521K  … 1.21M           1 ( 1%)        0%
  cache_misses       17.2K  ± 9.76K     1.56K  … 53.8K           1 ( 1%)        0%
  branch_misses      5.77M  ± 16.6K     5.73M  … 5.80M           0 ( 0%)        0%
Benchmark 2 (70 runs): ./benchmark_master 1 1000/10000 281483566841860
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           286ms ± 7.91ms     273ms …  306ms          0 ( 0%)        💩+ 15.0% ±  1.4%
  peak_rss            909KB ±    0       909KB …  909KB          0 ( 0%)        💩+  3.4% ±  0.2%
  cpu_cycles         1.00G  ± 8.34M      987M  … 1.02G           0 ( 0%)        💩+ 19.9% ±  0.3%
  instructions       2.39G  ± 69.2      2.39G  … 2.39G           5 ( 7%)        💩+ 16.0% ±  0.0%
  cache_references    943K  ±  697K      539K  … 6.51M           3 ( 4%)        💩+ 24.3% ± 20.5%
  cache_misses       28.9K  ± 11.8K     6.86K  … 65.4K           1 ( 1%)        💩+ 68.4% ± 20.0%
  branch_misses      5.89M  ± 13.8K     5.87M  … 5.92M           0 ( 0%)        💩+  2.1% ±  0.1%
Benchmark 3 (79 runs): ./benchmark_patched 1 1000/10000 281483566841860
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           253ms ± 8.16ms     237ms …  285ms          6 ( 8%)          +  1.7% ±  1.4%
  peak_rss            909KB ±    0       909KB …  909KB          0 ( 0%)        💩+  3.4% ±  0.2%
  cpu_cycles          888M  ± 8.69M      874M  …  918M           1 ( 1%)        💩+  6.2% ±  0.3%
  instructions       2.04G  ± 59.2      2.04G  … 2.04G          14 (18%)        ⚡-  1.1% ±  0.0%
  cache_references    978K  ± 1.56M      469K  … 14.5M          10 (13%)          + 29.0% ± 44.9%
  cache_misses       25.8K  ± 16.7K     1.10K  … 70.9K           6 ( 8%)        💩+ 50.0% ± 24.6%
  branch_misses      5.80M  ± 16.7K     5.76M  … 5.84M          13 (16%)          +  0.5% ±  0.1%

forcing llvm to truncate the padding bits prevents some optimizations. fixes 370662c see ziglang#17768

andrewrk · 2024-08-24T05:23:16Z

I don't know if it the proper way to implement this...

given this, combined with the fact that it's failing CI checks, needs a rebase, I'm going to close it. I don't know if it's the proper way to implement it either, unless I dive in and implement the whole thing myself, in which case I might as well close this and start over. sorry for your wasted efforts. feel free to reopen to start the discussion anew.

xxxbxxx · 2024-08-24T11:11:53Z

yeah I'm guessing it should be done by some generic "provenance" framework, maybe in relation to alias analysis / result location semantics..

for the record, to leave some breadcrumb for future implementer:

I've kept the commit up to date and active my local branch and the compiler works fine with it, never stumbled on a miscompilation since it's last year.
I've updated the branch xxxbxxx:load-truncate-optim with commit rebased on the current master branch (but I don't know if it's possible to update the closed pullrequest)

xxxbxxx mentioned this pull request Apr 24, 2024

design flaw: semantics of loading values with undefined bits #19634

Open

codegen/llvm: improve loadTruncate() to use a simple load when possible

5a99ca7

forcing llvm to truncate the padding bits prevents some optimizations. fixes 370662c see ziglang#17768

andrewrk force-pushed the load-truncate-optim branch from 319d1c2 to 5a99ca7 Compare May 9, 2024 23:13

andrewrk closed this Aug 24, 2024

xxxbxxx mentioned this pull request Aug 24, 2024

Suboptimal codegen for integer variables whose bit width isn't a multiple of 8 #21121

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

codegen/llvm: improve loadTruncate() to use a simple load when possible #18137

codegen/llvm: improve loadTruncate() to use a simple load when possible #18137

xxxbxxx commented Nov 26, 2023

xxxbxxx commented Nov 26, 2023

andrewrk commented Aug 24, 2024

xxxbxxx commented Aug 24, 2024 •

edited

Loading

codegen/llvm: improve loadTruncate() to use a simple load when possible #18137

codegen/llvm: improve loadTruncate() to use a simple load when possible #18137

Conversation

xxxbxxx commented Nov 26, 2023

xxxbxxx commented Nov 26, 2023

andrewrk commented Aug 24, 2024

xxxbxxx commented Aug 24, 2024 • edited Loading

xxxbxxx commented Aug 24, 2024 •

edited

Loading