-
DCE
I first implemented a simple dead code elimination pass. For global DCE, I generated a set of all used variables in a first pass and removed any instructions writing to destinations not in this set. For local DCE, I needed to detect situations where a write is followed by another write with no read in between, and then delete the first write. For this, I looped through the instructions in reverse, adding each destination to a "written but not read" set and removing it once it was read; encountering a write whose destination is already in that set means the write is dead.
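As a minimal sketch (not this poster's actual code), the reverse scan might look like the following in Python, assuming Bril's JSON instruction format with optional "dest" and "args" fields; the guard against deleting calls is an extra assumption to avoid removing side effects.

```python
def local_dce_block(block):
    """Drop writes that are overwritten later in the block with no read in between."""
    written_not_read = set()  # dests written later with no intervening read
    kept = []
    for instr in reversed(block):
        dest = instr.get("dest")
        if dest is not None and dest in written_not_read and instr.get("op") != "call":
            continue  # a later write shadows this one and nothing reads it: delete
        if dest is not None:
            written_not_read.add(dest)
        for arg in instr.get("args", []):
            written_not_read.discard(arg)  # the value is read here, so keep its writer
        kept.append(instr)
    kept.reverse()
    return kept

if __name__ == "__main__":
    block = [
        {"dest": "a", "op": "const", "type": "int", "value": 1},
        {"dest": "a", "op": "const", "type": "int", "value": 2},
        {"op": "print", "args": ["a"]},
    ]
    print(local_dce_block(block))  # the first write to "a" is dropped
```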
LVN
LVN was definitely the majority of the work for this lesson. I decided to represent my table as a number of maps between numerical IDs, variable names (representatives), and values. Values were stored in a few ways, with constants getting their own representation. I also had the time to implement canonicalization, which covers commutative operators like add and mul.

Summary
By far the biggest challenge was implementing LVN and dealing with all of its edge cases. My main issues were with unknown instructions that generate values. Initially, I entirely ignored values created by unknown instructions, which led to problems when variables were renamed; after fixing that, a few specific instructions still caused problems.
-
Testing
I wrote my own testing tool that takes a bril-optimizing unix-shell-pipe-command and a bunch of bril programs, and checks that the optimization doesn't change the behavior of the programs and doesn't make them any worse (for a configurable definition of worse). I thought this would be worth the effort because I didn't want to manually parse the output of brench to make sure every program was getting faster.

DCE
I implemented my optimizations as Python scripts, and by far the best decision I made during my DCE implementation was to create decorators for global and local optimizations as well as for iterating to convergence (see the sketch below). That way I can implement each optimization as a single pass over a single basic block or function and push the shared logic of disassembling and reassembling full program files into the decorators. Besides that, this was pretty straightforward. I implemented the global and local dce passes we discussed in class and used my testing tool to confirm that they did not break or slow down any programs in the dce testing directory or the core benchmarks.
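This poster's decorators aren't shown, but a minimal Python sketch of the idea might look like this, assuming Bril's JSON format; form_blocks, local_pass, and remove_nops are hypothetical names used only for illustration.

```python
import json
import sys

TERMINATORS = {"jmp", "br", "ret"}

def form_blocks(instrs):
    """Split a function's instruction list into basic blocks."""
    block = []
    for instr in instrs:
        if "op" in instr:
            block.append(instr)
            if instr["op"] in TERMINATORS:
                yield block
                block = []
        else:  # a label starts a new block
            if block:
                yield block
            block = [instr]
    if block:
        yield block

def local_pass(block_fn):
    """Lift a block -> block transformation to a whole-program pass."""
    def run(program):
        for func in program["functions"]:
            new_instrs = []
            for block in form_blocks(func["instrs"]):
                new_instrs.extend(block_fn(block))
            func["instrs"] = new_instrs
        return program
    return run

@local_pass
def remove_nops(block):
    # stand-in block-level optimization: drop "nop" instructions
    return [instr for instr in block if instr.get("op") != "nop"]

if __name__ == "__main__":
    json.dump(remove_nops(json.load(sys.stdin)), sys.stdout, indent=2)
```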
LVN
This one was much more of an adventure. One fun thing my early lvn implementations did was abuse the boolean representation, rewriting values across types whenever they happened to compare equal. I thought that was kind of clever and I was proud of it, but as it turns out that isn't allowed, so I had to add some type information to my value representation. Obviously there were lots of other bugs as well, but I thought this one was the most fun. I tested my lvn implementation, both alone and together with my global dce implementation, on the tdce and lvn test directories as well as the core bril benchmarks. They do not break any of those programs, nor do they make them slower. When I run both passes together I can remove 144 (or 11%) of the static instructions and 7235 (approximately 0%) of the dynamic instructions in the core bril benchmarks. But running both passes on the combined contents of the tdce and lvn test directories (which are presumably designed to be highly optimizable by these passes) saves 59 dynamic instructions (51%) and 93 static instructions (58%).
-
I started with trivial dead code elimination (TDCE) to get rid of instructions that define variables never used. Global TDCE collects all the variables actually used in a function and removes any assignments that don't contribute to the final result. Local TDCE eliminates cases where a variable gets overwritten before it's ever read, which I handled by tracking the last write to each variable in a block and removing earlier, unused definitions. This helped clean up a lot of unnecessary instructions, but I realized some of these optimizations would be redundant once I implemented LVN.

LVN (w/ Copy Propagation)
LVN took more effort. The basic version eliminates redundant computations by keeping track of a value table that maps expressions to numerical IDs and canonical variables, meaning repeated calculations get replaced with a single stored value. I also added canonicalization for commutative operations like add and mul so that equivalent expressions (like a + b and b + a) weren't recomputed unnecessarily. Then I added copy propagation, which turned out to be trickier than expected. At first, my implementation wasn't fully resolving id chains, meaning redundant assignments stuck around. The fix was resolving each variable's original value before inserting instructions, so print copy3 would correctly turn into print x. Once that was working, TDCE was able to clean up the leftover assignments entirely.

I tested everything using brench with Bril's core benchmarks to make sure the optimizations worked without breaking anything. For copy propagation, I ran programs with multiple redundant id assignments and checked that only the necessary ones remained. LVN consistently reduced instruction count by eliminating redundant computations, though the actual improvement varied across benchmarks. The hardest part was finding the right balance between optimization and correctness. Early versions of LVN were too aggressive, removing function arguments and messing up control flow. Debugging involved carefully stepping through intermediate program output and adjusting LVN to preserve necessary computations while still optimizing effectively. Making sure function parameters and branch conditions weren't accidentally eliminated was a key fix.
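Picking up the id-chain issue described above, here is a minimal, self-contained Python sketch of resolving copies through chains of id instructions (a generic illustration, not this poster's code; the Bril JSON field names are the only assumption).

```python
def resolve_copy(var, copy_of):
    """Follow id chains (copy3 -> copy2 -> ... -> x) back to the original variable."""
    seen = set()
    while var in copy_of and var not in seen:
        seen.add(var)
        var = copy_of[var]
    return var

def copy_propagate_block(block):
    """Rewrite arguments through chains of plain `id` copies within one block."""
    copy_of = {}  # dest -> source for `id` instructions seen so far
    out = []
    for instr in block:
        if "args" in instr:
            instr = dict(instr, args=[resolve_copy(a, copy_of) for a in instr["args"]])
        dest = instr.get("dest")
        if dest is not None:
            # a redefinition invalidates copies into or out of this name
            copy_of.pop(dest, None)
            copy_of = {d: s for d, s in copy_of.items() if s != dest}
            if instr.get("op") == "id":
                copy_of[dest] = instr["args"][0]
        out.append(instr)
    return out
```

With this in place, a chain like x -> copy1 -> copy2 -> copy3 means print copy3 becomes print x, and trivial DCE can then delete the intermediate id instructions.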
-
I implemented global and local trivial dead code elimination here and a local value numbering pass here, which I coupled with copy propagation and optimization of arguments for commutative operations. Finally, because I was initially having some issues with Brench, I wrote a tiny correctness script to check my passes.

TDCE Passes
For global TDCE, my implementation is a single function that looks through all of the functions in the program. For each function, I used a higher-order function to gather a set of "used" argument variables (which is just a set of strings), and then made another pass through the function to get rid of all instructions that write to variables that are not in the "used" set. If I got rid of an instruction, I tail-recursively called this global TDCE function again. For local TDCE, my implementation is centered around a single helper function.
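As a rough Python rendering of the global TDCE described above (the original is not shown here and is not Python; a repeat-until-fixed-point loop stands in for the tail recursion):

```python
def global_tdce(func):
    """Delete instructions whose destination is never read anywhere in the function."""
    while True:
        used = set()
        for instr in func["instrs"]:
            used.update(instr.get("args", []))
        kept = [
            instr for instr in func["instrs"]
            if "dest" not in instr          # effect instructions (print, br, ...) stay
            or instr["dest"] in used        # the result is read somewhere
            or instr.get("op") == "call"    # calls may have side effects
        ]
        if len(kept) == len(func["instrs"]):
            return func
        func["instrs"] = kept
```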
LVN Pass
My LVN is implemented by looking through a basic block and maintaining the value table.

Correctness + Performance Improvements
Before I got Brench working, I used the script I mentioned earlier to ensure correctness across all benchmarks. After I nailed down a few corner cases, my script indeed ended up showing no mismatches. While at least half the benchmarks didn't get any benefit from my passes (but also didn't get any worse!), I'm happy with the average improvement.

Challenges
The hardest part was deriving general rules for nailing down corner cases. For example, I ran into an issue where I would sub-expression match something I shouldn't have. I think my work deserves a Michelin star because I implemented and thoroughly evaluated what was asked, plus a couple extensions.
-
I've attached my implementations for trivial DCE and LVN in this folder https://github.com/dhan0779/cs6120-impl/tree/main/l3. I used Brench for comparing the outputs for correctness and created my own script to test for dynamic instruction count changes.

Trivial DCE:

LVN: Another issue I had was figuring out when to use fresh variable names. We mentioned in class that we could rename all variables, but then we would have to create another map for resolving the canonical name at the very end. I decided the easiest way to resolve this was looping through the instructions in a basic block in reverse and checking whether variables were overwritten. This is similar to the pass I did in TDCE for checking whether a variable was being used or not. Implementing this resolved all issues with my basic LVN implementation. As a small optimization, I added a very naive function for constant folding (including some of my own test cases to make sure that this worked). When I first created the Value tuple, I would check if this Value could be folded into a constant by looking up possible values in my table.
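A minimal sketch of that kind of naive constant folding (a generic illustration, not the code linked above) might check whether every argument's value is a known constant and, if so, compute the result up front; division is deliberately left out so a fold can never divide by zero:

```python
import operator

FOLDABLE = {
    "add": operator.add, "sub": operator.sub, "mul": operator.mul,
    "eq": operator.eq, "lt": operator.lt, "gt": operator.gt,
    "le": operator.le, "ge": operator.ge,
    "and": lambda a, b: a and b, "or": lambda a, b: a or b,
    "not": lambda a: not a,
}

def try_fold(op, arg_consts):
    """Return the folded constant, or None if any argument is unknown or op can't fold."""
    if op not in FOLDABLE or any(v is None for v in arg_consts):
        return None
    return FOLDABLE[op](*arg_consts)

print(try_fold("add", [2, 3]))     # 5
print(try_fold("mul", [2, None]))  # None: one operand's constant value is unknown
```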
-
My implementations of DCE and LVN are here: https://github.com/mt-xing/cs6120/tree/main/l3

The implementation is split into three passes. First, a variable renaming pass handles renaming all variables in a block whose values are overwritten, renaming the earlier occurrence. This renaming works by using a colon in the generated names, which ordinary Bril programs are unlikely to contain. My code is in TypeScript, using Deno, and can be invoked from the command line.

My implementation of LVN also includes constant folding, copy propagation, and commutative reordering. The constant folding and commutative operators were very easy; I simply needed to examine which operator was being processed, and either sort the arguments for the commutative ones (add, multiply, equals) or just do the math itself. Note that, since the JSON format itself does not support 64-bit integers, my implementation (being written in TypeScript) will also fail to parse ints larger than can be safely stored in a double. This matches spec-compliant JSON behavior.

Copy propagation gave me the most trouble, specifically in handling live-ins. Since my variable renaming pass only works within one block, it can't rename the inbound variable, and so a naive implementation of copy propagation (which had assumed all rewrites were renamed) ended up clobbering variable values. I ultimately resolved this by choosing not to copy propagate any live-ins for the first copy.

For testing, I built my own testing harness, with a single command to run all test cases. My test library prints out a final optimization report with some simple statistics. Running against all my hand-crafted tests (which consist primarily of examples from class) as well as the entire bril benchmark folder, my final optimization averaged a staggering reduction in instruction count.

I believe my work here definitely deserves a Michelin star (and I'm not sure what the criteria are for multiple stars). Not only did I implement DCE and LVN, as well as all the optional extensions, but I also built a custom extensible testing framework that I intend to use for all my future assignments in this class. The ability to encode how much optimization to expect from each file in the test itself, and to change it for each optimization, allows me to code with high confidence in the correctness of my final solution and the efficiency of my optimizations.
-
Repo for TDCE + LVN (Joint work with @emmanueljs1)

We implemented trivial DCE (removing instructions whose results are never used again, and deleting instructions whose result is unused before reassignment) and LVN in OCaml.

Testing
Out of all 51 benchmarks in the Bril repo, our implementation optimizes 21 of them. (We used a Python script to parse Brench's output and to generate a plot visualizing the % reduction in dynamic instruction count resulting from our optimization.)

Challenges
When debugging LVN, we also adopted a few additional strategies. Overall, we think our work deserves a Michelin star because our optimization preserves the same program behavior for all the benchmarks and makes a non-trivial subset of them faster.
-
DCE
LVN + Optimizations
Verification/Performance
Challenges Encountered
-
DCE
I implemented a one-pass, forward version of DCE. The indices of instructions that can potentially be deleted are tracked along the way. To make the DCE work in a global context, we conservatively assume that the last assignment to a variable in each basic block may be used somewhere in a descendant block.

LVN
I implemented LVN with copy propagation, commutativity exploitation, and constant folding. To avoid the re-assignment problem, before doing LVN we scan the entire basic block and randomly rename variables; variables that either come from an ancestor block or may be used in a descendant block keep their names untouched. The basic LVN framework is similar to the pseudocode shown in class. For each instruction, we first try to query every argument in the table. When constructing the canonical tuple, the arguments of commutative operations are put in a canonical order.
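A small Python sketch of that canonicalization step (an illustration of the general idea, not this implementation): arguments are replaced by their value numbers and sorted for commutative operators, so add a b and add b a hash to the same table entry.

```python
COMMUTATIVE = {"add", "mul", "eq", "and", "or"}

def canonical_value(instr, var2num):
    """Build a hashable value tuple for an instruction from its argument value numbers."""
    op = instr["op"]
    if op == "const":
        # keep the type so an int and a bool with "equal" payloads never collide
        return ("const", instr["type"], instr["value"])
    nums = tuple(var2num[a] for a in instr.get("args", []))
    if op in COMMUTATIVE:
        nums = tuple(sorted(nums))
    return (op,) + nums

var2num = {"a": 1, "b": 2}
assert canonical_value({"op": "add", "args": ["a", "b"]}, var2num) == \
       canonical_value({"op": "add", "args": ["b", "a"]}, var2num)
```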
Testing
I have tested my code both on simplified cases I constructed manually, including those from the slides, and on the more general benchmark programs.

Conclusion
One of the most challenging parts was balancing the aggressiveness of the optimization against correctness. I also tried to back-propagate liveness information from the leaf nodes of the CFG to do a more aggressive global DCE, but I failed to handle the cyclic CFG case and gave up. I am planning to incorporate what I learned today, the worklist algorithm for liveness detection, into my DCE optimizer in the next lesson to see how far I can go with that. I think I deserve a Michelin Star because I tried my best to implement a good local optimizer (with all the bonus features, though there is still a lot of room for improvement as a global DCE) and my code is thoroughly tested on both manually constructed cases and general benchmarks.
-
Code: DCE is here, LVN is here

DCE: I implemented our pseudocode for forward Dead Code Elimination. The algorithm itself was pretty standard, though I did struggle a bit with implementing the iteration to convergence, since I had set up my defined_not_used function (getting variables that had been defined but not yet used) to operate on an entire function, while my rewritten function (getting instructions that had been overwritten) operated on each individual block. I eventually figured it out, though, and threaded the defined_not_used and rewritten calls in with each other to iterate to convergence.

LVN: As predicted, LVN was much trickier than DCE. I originally had renaming variables in an instruction and replacing an instruction with an id of another as two separate parts of my code; I realized I had to thread them together, though, because an instruction might be a direct copy of another only AFTER its variables are renamed. Also, translating between blocks, and realizing that there were unknown values of variables that were passed in from other blocks, was a real pain, as I had to make sure I didn't overwrite these. At first I had hoped to get away with renaming everything except the first definition of a variable, but I had to rename every instance except the last instance of the variable definition, to keep it consistent when passing that variable's value on to other blocks. Also, full credit: when creating new variables for the LVN program, I borrowed Michael Xing's idea of starting the variable with a colon, as users would be unlikely to do this in their programs. To make sure I always had unique new variable names, I just kept a globally incremented counter so that, even after the block had cleared out the table, there would always be a new variable name when we needed one.

Testing: In order to ensure correctness, I wrote a python script to run LVN and DCE against all the core benchmarks in the bril repo, checking to make sure the desired output remained the same. This helped me catch a lot of bugs that I didn't get with my toy example; I struggled for a while with what to do when a variable passed into a function/block gets rewritten within that block -- we don't want to rename it because then the args don't match the variables being used, but we also want to make sure anything reading the variable after its rewrite isn't accidentally referring to the wrong value. It also helped me catch memory issues; renaming variables and alloc-ing said variables does NOT mix well. Eventually, I had to put in a stopgap: my lvn does not optimize blocks that deal with alloc-ing variables and handling pointers. Still, when I first ran my test suite on the benchmarks, I failed 9 tests, with unique reasons behind nearly all of them, so checking against the benchmarks for correctness was definitely necessary. I also of course checked that my LVN correctly substituted variables for the example toy programs we saw in class, and I checked my DCE against the toy programs from class as well. I hope to add more to this testing suite I built over time.

Overall, I thought this assignment, and LVN in particular, was much harder than last week's. I do think I deserve a Michelin star because I spent several 1AM nights working to make my test suite work, and debugging various problems that I hadn't previously considered for my LVN; it was like every time I thought I was done, a new issue popped up. But I'm now confident in correctness as well as the performance increases of my programs.
-
DCE
https://github.com/devv64/6120/blob/main/lesson3/tdce.py

I extended my trivial dead code elimination algorithm from last lesson's work. Here I simply deleted instructions that are never used before they are reassigned and iterated to convergence. The actual algorithm was very straightforward; I ended up spending more time on setting up brench (I made some silly mistakes that significantly delayed this step) and figuring out how to dump the data to stdout properly.

LVN
https://github.com/devv64/6120/blob/main/lesson3/lvn.py

This was significantly trickier. I spent some time trying to implement this prior to the actual lecture, so I probably spent a lot more time figuring out some small details and the approach than I needed to. I also started out using different data structures than what I ended up with, so some refactoring there did also take some time. I ended up opting for a mapping (Variable:ID) to represent the cloud and a list of tuples (Variable, Value) to represent the table, with the list index as the ID; because I never remove any element from the list, these indices are unique. I ran into a weird case where my program was preferring to replace duplicate usages of an "id" function rather than the actual original value from the first "id" usage (starting from a: int = const 1;). This was a quick fix but I thought it was interesting.

Apart from one case (an issue is happening with the way python handles 1's and 0's vs. True's and False's -- it should be a simple fix but I'm running out of time! I am hoping to come back and fix this soon!), my pass has been shown to provide correct results based on the brench tool. In many cases, it ends up lowering the line count of programs, which is a great sign. It results in at least the same number of line removals as TDCE, and in some cases it removes more! I have also left comments in place where I can easily extend the functionality to support copy propagation, constant propagation, and eventually constant folding. These are all features I hope to add when I get a chance.

Once again, I really enjoyed this assignment. I spent a lot of time working out the small details to finally get a working implementation. Making progress on this was always a great feeling; every time I saw slight improvements gave me satisfaction and motivation. I think I deserve a Michelin star for my work because I believe my work meets the expectations defined in the lesson tasks and I had a great learning experience.
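For reference, here is a minimal Python sketch of that table layout (a generic illustration, not the linked code): the "cloud" maps variable names to value numbers, and the table is an append-only list of (value, canonical variable) rows whose index serves as the value number.

```python
class LVNState:
    """Environment (var -> value number) plus an append-only value table."""
    def __init__(self):
        self.table = []  # list of (value_tuple, canonical_var); index == value number
        self.env = {}    # variable name -> value number

    def lookup(self, value):
        """Return the value number of an existing row, or None."""
        for num, (val, _var) in enumerate(self.table):
            if val == value:
                return num
        return None

    def add(self, value, var):
        """Append a new row and point `var` at it; return the new value number."""
        self.table.append((value, var))
        self.env[var] = len(self.table) - 1
        return self.env[var]

    def canonical_var(self, num):
        return self.table[num][1]

state = LVNState()
n = state.add(("const", "int", 1), "a")
assert state.lookup(("const", "int", 1)) == n and state.canonical_var(n) == "a"
```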
-
Summary
I implemented trivial dead code elimination and used local value numbering for optimizing common subexpressions, copy propagation, and constant folding. I used an approach that was quite verbose and relied on a lot of code duplication, but that allowed me to implement instructions incrementally and test that the optimizations were working in isolation (i.e. I started out only optimizing add instructions, and only when that was correct did I move on to optimize mul operations, and so on and so forth). Although I'm happy with this approach for ease of debugging and testing, I'm not too happy about the amount of code duplication, and I'm exploring other approaches that could reduce it.

How it works
DCE: I used a trivial algorithm for global optimization: it tracks all the variables that are used in a function and removes all declarations of variables that are unused. On top of variables, I also removed any unused branches in the function.

LVN: The same approach is used for all the other core instructions (add, mul, div, sub, eq, le, ge, lt, gt, and, not, or), with the appropriate adjustments for each instruction (swapping in the right operation and exploiting commutativity in those instructions that have the property, like add and mul). A benefit and drawback of this approach is that there are no "automagical" optimizations for any new instructions (i.e. float operations are not optimized at all with the current implementation): every new instruction needs to implement a way of converting that instruction into a LocalValue and back into a BrilInstruction, with all the supported techniques modifying the instruction (constant folding, common subexpression elimination, commutativity if appropriate). This means more work but also fewer surprises out of the box.

Tests and results
I tested most implementations in isolation and worked incrementally (I used manual tests, not automated tests). I saw impact in some benchmarks, but not enough change in dynamic instruction counts for it to be significant; the largest (absolute) difference I saw came from a single benchmark.

Hardest part
The hardest part was to build an intuition of when to keep track of an instruction as a local value and when not to do so. I decided to define a simple program and what its output should be with simple local value numbering and no DCE, and that gave me a place to start before implementing more fun techniques. The rest of the task felt more like repetition of the techniques that were already working (i.e. not only supporting add and mul but all the other instructions in bril's core).

Michelin ⭐?
Yeah, I implemented DCE and LVN (with support for CSE, copy propagation, and constant folding). I hope the Michelin inspectors would agree with me.
-
dce
I implemented the dead code elimination algorithms that we talked about in class. One pass over the entire CFG eliminates assignments to variables that aren't used anywhere. Then, another pass over each block in the CFG eliminates definitions that get overwritten before being read. This was pretty straightforward as expected, and I didn't have any major bugs. I wrote some simple tests by hand to make sure that the pass was doing what I expected, which it was.

lvn
This pass was a bit trickier, and took some time to get right. I started out by following the pseudocode we developed in class, and the pass's main "table" is a hashmap keyed on values. For me, probably the hardest part was figuring out what to do about renaming. In class we used a scheme that only renames variables when necessary, which is when a variable will be overwritten later in the block. Past experience with implementing compilers has made me very scared of name collisions, so I thought it would be a good idea to just rename everything. This added some complexity to the pass that wasn't strictly necessary, and it made it harder for me to understand my own code, because any time I wanted to reference a variable I had to look up what it was renamed to, then look that up in the table. It got confusing figuring out when I should use the renamed name, the source name, or the table index when reconstructing programs, so I decided to scrap this idea and just go with what we discussed in class. This made things more straightforward, and I guess I'll have to hope that programs don't use variable names that collide with my generated ones.

Also, I had to wrap my head around what to do with instructions like function calls, pointer allocations, and pointer frees. We shouldn't eliminate sub-expressions that use these instructions, since they have side effects. I decided that the most straightforward thing to do would be to add these to the table, but if we find a match in the table for a value that is a call, alloc, or free, we shouldn't use it. This was able to solve the bugs that I saw arising from these types of instructions.

testing
I used brench to test my pass on all the benchmarks in the bril repo, with a similar setup to what was demonstrated in class. It was really nice to be able to run everything with a single command. When things were incorrect, I examined the code that my pass was generating and compared it to the original program. Usually this made it clear what was going wrong, and I was able to pinpoint the issue in my code.
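A tiny Python sketch of the "add it, but never reuse it" rule described above for side-effecting operations (an illustration built on the list-of-(value, variable)-rows layout used elsewhere in this thread, not this poster's hashmap):

```python
SIDE_EFFECTS = {"call", "alloc", "free"}

def find_reusable(table, value):
    """Look a value up in the table, refusing to reuse rows from side-effecting ops."""
    if value[0] in SIDE_EFFECTS:
        return None  # two calls (or allocs) are never interchangeable
    for num, (val, var) in enumerate(table):
        if val == value:
            return num, var
    return None

table = [(("call", "rand"), "x"), (("add", 0, 0), "y")]
assert find_reusable(table, ("call", "rand")) is None   # never reuse a call's result
assert find_reusable(table, ("add", 0, 0)) == (1, "y")  # ordinary values are reused
```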
-
TDCE + LVN Repo

Trivial Dead Code Elimination

Local Value Numbering

Testing
We paired our LVN implementation with DCE as a post-processing step and used brench to confirm that all bril benchmarks still behave the same. Overall, I think this work deserves a Michelin star because of our implementation of several of the bonus features (copy propagation, commutativity, and an attempt at constant folding), and we tested our LVN code rigorously after each implementation to ensure that all benchmarks were passing (which they initially weren't after implementing each bonus feature).
-
TDCE. This optimization was fairly straightforward. To make it easier, and hopefully simplify future tasks too, I added some helper functions for traversing the whole code in particular ways, using the visitor pattern. This made it very easy to e.g. collect all uses -- it became a one-liner. The code is pretty declarative too.

LVN. This one was far trickier, mostly because of all the cases involved: does the instruction have a destination? is it a call or an operation? is its value already in the table? was there a previous write to this variable that needs saving? etc. One other tricky part was generating fresh variable names for saving temporary values. I decided on a scheme where I renamed all program variables up front to freshly generated names, so my own temporaries could never collide with them.

Correctness testing. I am preferring to write my own scripts for testing rather than using Turnt, since I already know how to write Bash so it's lower effort. I made a simple script which runs each benchmark with and without a given optimization and checks that the outputs match.

Performance testing. I modified the script to also compare dynamic instruction count, and check that the optimized version is indeed faster. However, at first, it was often slower! This came from my transformation to and from a CFG, which ends up shuffling the basic blocks around (I was storing them in a sorted map, by label) and inserting jumps all over the place. So all the extra dynamic instructions were jumps. To fix this, I implemented an algorithm that orders the basic blocks in a statically-optimal order; that is, one that minimizes static instruction count. After doing this, every program in the core benchmarks folder ran at least as fast with my optimizations.

Conclusion. In summary, I implemented TDCE and LVN, as well as some code traversal helper functions which I think are neat, a simple testing/benchmarking script, and a CFG linearization function that chooses a smart way to order the basic blocks. My optimizations don't change program behavior, sometimes make it faster, and never make it slower. I think this deserves a Michelin star.
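The helpers themselves aren't shown, but a Python sketch of the same visitor idea (hypothetical names, not this poster's code) makes the "collect all uses" one-liner concrete:

```python
def visit_instrs(program, fn):
    """Apply `fn` to every instruction in every function of a Bril program."""
    for func in program["functions"]:
        for instr in func["instrs"]:
            fn(instr)

def all_uses(program):
    """Every variable that is ever read, collected via the visitor in one line."""
    uses = set()
    visit_instrs(program, lambda instr: uses.update(instr.get("args", [])))
    return uses

prog = {"functions": [{"name": "main", "instrs": [
    {"dest": "a", "op": "const", "type": "int", "value": 1},
    {"op": "print", "args": ["a"]},
]}]}
assert all_uses(prog) == {"a"}
```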
-
LVN 🎉

Note: This post has images and embedded interactive code previews which may take a few seconds to load.

Source Code
I spent a few hours configuring CI for both of the above optimizations. The CI takes a bit to run because of some benchmarks that required extra configuration. As usual, all the passes are scripts that either take in Bril programs from standard input or a file and produce Bril's textual representation. I made some small tweaks to the tooling, and I wrote a wrapper script that you can pipe the output of bril2json into.
My total time spent was 10 hours according to Toggl.

Trivial Dead Code Elimination
I implemented both dead code elimination passes shown in the video and tested them by running them over the existing test programs.

Local Value Numbering
I implemented LVN with a basic value interner. It supports every single Bril instruction and program and works on every single benchmark. To make sure of that, I tested my implementation by running it over all of the benchmarks. I had two issues when implementing LVN.
Note: One thing to note is that my strategy for coming up with new names is not entirely robust. I have a counter that strictly increments whenever a new temporary is needed (that is, before the final assignment in a basic block) and gets appended to the generated name.

Extensions

Commutativity
$ bril2json <../bril/examples/test/lvn/commute.bril | ../target/debug/lvn | diff ../bril/examples/test/lvn/commute.bril -
1,2c1
< # (a + b) * (b + a)
< @main {
---
> @main() {
6,7c5,6
< sum2: int = add b a;
< prod: int = mul sum1 sum2;
---
> sum2: int = id sum1;
> prod: int = mul sum1 sum1;

Constant Folding
$ bril2json <simple_fold.bril | ../target/debug/lvn | diff simple_fold.bril -
1c1
< @main {
---
> @main() {
4c4
< c: int = add a b;
---
> c: int = const 3;

Performance
Here's a quick comparison of how the optimization passes affect performance (I didn't include the LVN extensions). I wrote a simple script to collect the numbers, so you can regenerate the data yourself. I graphed the data -- that seemed like low-hanging fruit. The massive range of orders of magnitude meant that even after I split the data, some small benchmark times were not visible, and the low resolution meant not all benchmark names appear on the x-axis. The important thing to notice is that the green bar is smaller and the yellow bar is usually smaller, so TDCE / LVN + TDCE is doing something. (You can make out the key at the top of the first chart.)
-
DCE: source
LVN: source

I also implemented canonicalization for known commutative operations, converting values into a standardized string format to compare equality. I included copy propagation by adding a special case for id instructions.

Testing and Evaluation: I tested my TDCE implementation by comparing the output with the Bril TDCE tests using Turnt, and I eventually familiarized myself with Brench to more easily observe the performance difference over all of the benchmarks. Similarly, I tested the correctness of LVN (and LVN combined with DCE) using Brench, and I manually inspected many of the output programs to ensure that the renaming and copy propagation behavior was as expected. I observed the performance difference (in terms of dynamic instruction count) of LVN by running Brench over the core benchmarks. My optimization reduces the total dynamic instruction count by 56% over the LVN test suite, and by 14% over the core benchmarks.

Conclusions: Implementing LVN was a fun activity, as it allowed me to explore the edge cases of renaming and live-ins, and it was satisfying to observe the instruction count difference after optimization. Overall, I think my work deserves a Michelin star because of my implementations with extensions and thorough testing and analysis across all Bril benchmarks.
-
I read somewhere recently that "Python is the second best language for every programming task", so I decided to use Python for this task's implementation instead of OCaml; I think they were right. I <3 Python, but I digress.

TDCE
For my tdce, I implemented a local one that operates on basic blocks and a global one that operates on functions.

Local TDCE
This implementation was similar to the one covered in class, where you iterate until convergence. Each pass consists of finding which variables in the block get re-defined, checking if there are any usages of those variables in between definitions, and removing them if not.

Global TDCE
For this one, I tried two implementations, but ultimately settled on the simpler one because it didn't lead to timeouts. The simpler implementation was simply checking which definitions never get used in an entire function, and removing those definitions. The other implementation I attempted took it a step further: once a definition gets removed, it re-evaluates all the arguments it used, since those arguments now have one less use. I created several more data structures to keep track of instructions where variables were used and defined, and continually tried removing instructions. I had a lot of timeouts with this implementation, though.

LVN
This part definitely was the trickiest, for several reasons. There were many edge cases to consider, especially with constants. For the most part, this implementation followed the one demoed in class. For unknown variables, though, I make the assumption that they existed before the basic block, so I just add an entry in the table as a placeholder. Although Python was nice for writing fewer lines of code, it really came back to bite me hard for one tricky case that took a while to catch: Python considers "1 == True" to be True. I did implement some kind of mechanism to reduce computation if there was a renaming of a variable that held a number in the LVN table, and I did this by replacing the variable in the table with another variable pointing to this entry. I attempted to rename things with fresh variables, but it felt quite wrong for a local optimization, since I felt like renaming those variables would have consequences for future blocks. Update: implemented copy prop, commutativity, constant prop, and constant folding!

Correctness
I ran brench on the core Bril benchmarks to check for correctness and reductions in instruction count. All my optimizations preserve correctness, which is relieving; but the decrease in instruction count was a little disappointing. I think this may be due to people writing benchmarks that don't involve a lot of dead code. I did run my optimizations through a couple more contrived programs to make sure the optimizations were actually working, though. Update: after the extensions were implemented, the most dynamic instructions I cut was 531458, for the delannoy benchmark! That was pretty cool to see. The median instructions cut for the core benchmarks was still 1, though.
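Two of the details mentioned above -- placeholder table entries for variables defined before the block, and keeping int and bool constants apart despite Python's 1 == True -- might look like this as a minimal Python sketch (a generic illustration, not this poster's code):

```python
def const_key(instr):
    # Python treats 1 == True (and 0 == False), so keep the Bril type in the key
    # to stop an int 1 and a bool true from landing in the same table row.
    return ("const", instr["type"], instr["value"])

def number_of(table, env, var):
    """Value number for a variable; live-ins get a fresh opaque placeholder row."""
    if var not in env:
        table.append((("live-in", var), var))  # value unknown: assume it predates the block
        env[var] = len(table) - 1
    return env[var]

assert const_key({"type": "int", "value": 1}) != const_key({"type": "bool", "value": True})
```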
-
Deadcode Elimination
The original deadcode elimination was pretty simple. I just had to look through the instructions for writes to variables that were never read, or two consecutive writes to the same variable with no read in between them. I did stray away from the in-class pseudocode a bit, though - instead of going through instructions in a block in order and keeping track of candidates for elimination, as the in-class pseudocode did, I went through the instructions in reverse, adding them to a new list. I found this easier to code up because it meant that when we find an instruction to eliminate, we're already at the location to eliminate it by simply not adding it to the new list, and we don't need to do an awkward shift to eliminate previous instructions.

Local Value Numbering
I had a bare-bones implementation of LVN up pretty quickly. However, I kept running into edge cases I didn't consider. For example, for a bit my LVN code didn't see the difference between ints and booleans, so long as Python thought they were equal. This means a const int 0 was being copied from a const boolean false. Additionally, my code originally didn't understand that function calls and alloc can't be copied, so it would originally "optimize" a second call or alloc into a copy of an earlier one that looked the same.
I fixed this by adding checks for alloc and function calls, and assigning unique values to them so they would never interfere with each other in the value table. I tested my code for correctness by creating a simple bash script. I copied the benchmarks from the bril repository and ran the original bril code for each one, as well as my optimized bril code, making sure they both printed out the same thing. I found most of the bugs in my program when this test failed, and I manually went through the bril inputs and outputs until I found the error. Once I had LVN working, I decided to implement a couple of the optimizations that it makes possible. I was originally a bit disappointed looking at the optimization of some of the code I generated with ts2bril, because it was clearly suboptimal. The biggest-impact optimization I did was adding copy propagation. The code for this was pretty simple: when creating the instructions, I just need to check whether the instruction is an id and, if so, substitute the original variable it copies.
With copy propagation complete, I also decided to implement constant propagation. To do that, I just went to the case where one variable has the same value as another. In that situation you would typically just emit an id copy, but if the known value is a constant, you can write out a const instead.
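A small Python sketch of that decision point (assuming the ("const", type, value) tuple shape used elsewhere in this thread; not this poster's exact code):

```python
def replace_with_known(instr, row_value, canonical_var):
    """Rewrite an instruction whose value is already in the table:
    emit a const when the value is a known constant (constant propagation),
    otherwise an id copy of the canonical variable (copy propagation)."""
    if row_value[0] == "const":
        return {"op": "const", "dest": instr["dest"],
                "type": instr["type"], "value": row_value[2]}
    return {"op": "id", "dest": instr["dest"],
            "type": instr["type"], "args": [canonical_var]}

new = replace_with_known({"op": "add", "dest": "x", "type": "int", "args": ["a", "b"]},
                         ("const", "int", 3), "c")
assert new == {"op": "const", "dest": "x", "type": "int", "value": 3}
```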
I also implemented commutativity exploitation by simply checking whether an instruction was an add or a mul and, if it was, sorting the arguments. I didn't end up implementing constant folding because it was getting late, but I see how to: when writing the result of an arithmetic expression, check if both arguments are consts, and if they are, just do the math then and there and write the result as a const instead of writing the arithmetic expression.

Conclusion
I do believe my work deserves a Michelin Star, because I implemented LVN, got a bit more practice handling edge cases in intermediate languages, and went further and implemented a couple of the optimizations that LVN allows.
-
Dead code elimination wasn't bad. My code makes a pass to remove unused variables, then another pass to remove unused writes. I couldn't use brench to test that my DCE implementation reduced the number of lines, so I wrote my own shell script to run all the benchmark files with and without optimization, then check that the optimized code produces the same results without taking up more lines. I also manually compared the outputs to the DCE examples to make sure they matched. LVN took much longer, so I didn't manage to implement any of the additional features, and I ran into a number of problems along the way.

Overall, I think the hardest parts of this assignment were getting the shell script to work and handle arguments (which took a lot of Googling and trial and error), as well as wrapping my head around LVN. I think that I deserve a Michelin star for implementing LVN, along with a new tool for parsing the bril scripts and comparing the number of instructions run.
-
Here is the code: link

Implementing the trivial dead code elimination was straightforward, and I faced no difficulties with that. It was way trickier to make LVN work correctly: I spent quite a bit of time fixing my code for all the corner cases. For example, there were cases where I was wrongfully reusing already-computed values. My implementation takes care of copy propagation and takes commutativity into account.

I tested my implementation on all of the benchmarks using Brench; I found it to be quite convenient :) I made plots for the three largest benchmark sets: core, float, and mem. Sadly, gains in performance are not significant on average, and quite often my optimizations slow down the code, judging by the ratio of (old performance) / (optimized performance).

I routinely use Copilot autocompletion when writing code. It helps me to avoid retyping things like 'import matplotlib.pyplot as plt' over and over again. I did not use any text prompts for the assignment.
-
TDCE
I made sure to iterate until convergence! In fact, I factored out a general function for this.
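That general helper isn't shown, but the idea can be sketched in a few lines of Python (a hypothetical stand-in, comparing JSON serializations to detect the fixed point):

```python
import json

def until_convergence(transform, program):
    """Apply `transform` repeatedly until the program stops changing."""
    while True:
        before = json.dumps(program, sort_keys=True)
        program = transform(program)
        if json.dumps(program, sort_keys=True) == before:
            return program
```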
LVN
This part was far trickier and took me quite a while!

Evaluation
I wrote a script that graphs and computes statistics about the speedup (the ratio of dynamic instruction counts before and after optimization):
speedup on core benchmarks
speedup on memory benchmarks Phew, thankfully neither optimization slows programs down (min speedup = 1.0 for both)! They might even speed up programs in the average case (for instance, avg LVN speedup >= 1.1 on core benchmarks). Notably, certain long green bars drastically deviate from all orange and blue ones: this show LVN must be making meaningful optimizations beyond simple TDCE. |
Beta Was this translation helpful? Give feedback.
-
DCE
I implemented trivial dead code elimination with no problems. After implementing LVN and running it on some benchmarks, I kept outputting incorrect programs. I quickly realized that my trivial dead code elimination was deleting assignment instructions I needed in other blocks. To fix this, I implemented a different version that only deletes assignments where the variable is overwritten before being used as an argument. However, when I ran Brench, I saw no reduction in instructions. I thought about it for a bit and couldn't figure it out, so I peeked at the structure in the examples file and realized that the dead code elimination done there retains the same 'used' set over all the blocks, so it was more of a global analysis than I thought we were doing. I reimplemented it with this in mind, and it worked well.

LVN
I thought this was fun. I had to do some thinking about the best way to represent values. I didn't like the feeling of chasing errors, though; it made me think I didn't have a good grasp of the fine details. I implemented copy propagation, constant propagation (although I may have messed one of these up because they merged in my mind while I was implementing them), commutativity, and constant folding (for half the operators; I will come back and do the rest at some point).

Testing
I used Brench (it took a bit to figure out how, though). My program reduces the number of dynamic instructions for plenty of the benchmarks, and for some of them it has no effect. I didn't do any fancy analysis; I just looked at the Brench output for mine compared to Professor Sampson's, and it looked similar. Sorry for it being late. I really ran into a lot of problems due to my incorrect understanding of local analysis, which led to errors.
-
Implementation link

LVN
I decided to have my dead code elimination pass iterate over the whole program in order to avoid issues with a variable being declared in one block but used in a later block. My dead code elimination pass is fairly basic -- it just directly implements the algorithm discussed in class.

Testing
Testing with Brench is where I ran into issues. Unfortunately, as of writing this post I'm still unable to get Brench to output anything to the result csv. I've tried basically every change to the brench.toml I can think of, so I decided to do my writeup first and add my Brench results once I get those sorted out.

Michelin star: I did a good job with my implementation, but since I haven't tested on the full Brench suite I'm not fully convinced it actually works (since honestly I haven't done many "real program length" test cases). I also wasn't able to finish my LVN implementation to my satisfaction by implementing constant folding, so honestly I don't think I really deserve a Michelin star since I didn't meet the standards I set for myself for this assignment.
-
I implemented a super simple lvn pass. It does not do copy propagation or anything particularly fancy. It is also fairly messy. I also implemented a super simple dce optimization pass. It is also somewhat messy.

Correctness
Both passes appear to be correct on all core benchmark programs (see my Brench test file for details). To be blunt: this does not convince me that my passes are correct with respect to all of Bril's extensions or even all possible (core) Bril programs, but you might be convinced by this evidence . . .

Performance
To test that my passes will actually improve performance in some programs, I have two sample programs, which appear to be optimized as intended. Brench output confirms that my optimizations, as implemented and run over the entire core benchmark suite, are not as impressive as one would expect a more semantics-aware lvn pass + dce pass to produce.
-
Overview
I apologize again for posting this so late. Nevertheless, this was again an incredibly fun and exciting assignment. I thought Lesson 3 as a whole was pretty interesting since we consider optimization without dataflow. I recall in CS 4120 that we started covering optimization immediately with control flow graphs and dataflow analysis, and it surprisingly was really fun to put that aside and consider optimizations solely globally across a whole function and within basic blocks without regard for control flow. I was impressed by how elegant and simple the code was while still being incredibly useful. Dataflow is obviously more powerful, but it was freeing to design my own algorithms for these optimizations from scratch without trying to fit a formal framework.

On the implementation side, I've never really programmed in Python. I really enjoy TypeScript, but I decided to wait another week before diving into TypeScript with Kabir in L4 and take the opportunity to get more comfortable hacking together algorithms in Python. I surprisingly gained a lot more experience, mostly from debugging interesting language quirks. It turns out that I didn't know the different operations for sets and lists, and I ended up creating some wacky bugs from not realizing how Python was attempting to help me out by converting between data representations behind the scenes. I'll touch on some of this below.

Dead Code Elimination
Global dead code elimination was pretty straightforward; I simply followed the algorithm from class. Local (block-based) dead code elimination was actually challenging because I decided to complicate it by iterating through the list of instructions in a block in reverse order, rather than following our forwards traversal from class. In my head, this was more intuitive because by the time we reach an instruction, we will know with confidence whether we should delete it, since we've already considered all instructions that will be executed after it, whereas the forward traversal requires us to remember instructions and decide to delete them later. In practice, this ended up being challenging. Perhaps I was just really overwhelmed and tired, but it took me a long time to work through the algorithm and decide what to store, when to check uses vs. defs, and when to delete. I ultimately landed on storing a set of variables that have been defined but not used (going backwards). These variables are candidates for rendering earlier-in-execution instructions dead, because a later instruction makes them redundant. Thus, at each instruction, we first check if it's assigning to a variable that is assigned to later in execution, and if so, we immediately delete it. If not, we consider it as potentially an instruction that subsumes/overwrites earlier instructions and add its destination to the candidates. We then consider the arguments, because if this instruction writes to the same variable it reads from, it cannot render earlier definitions dead, since it is using their definition. After testing and playing around with this, it seems like it's somewhat more efficient than a forwards iteration of local dead code elimination. Since we immediately decide to delete an instruction when we encounter it, we can skip considering its uses, allowing us to more aggressively delete instructions we see later (earlier in execution). I'm not entirely sure about this though, and I'd like to reason about it more if I had more time.
Implementation was quite a pain because I didn't know Python as well as I thought I did. It turns out the way I was adding to sets meant that every character in a string was put in the set, so a definition of "notdone" deleted earlier definitions of "n". In the end, I realized I should just write simple, silly little for loops rather than trying to do fancy things in one line whose semantics I wasn't entirely sure about.

Local Value Numbering
Local value numbering was, as expected, more complex than I thought. I spent way too much time on local trivial dead code elimination, so I had a lot less time for this. I tried to keep the data structures simple, mostly just using dictionaries and tuples. It probably would've been better to actually define a nice data structure for the table, since it was a bit awkward dealing with variables that were read before being defined. There were also weird bugs with calls, and with Python treating False as equivalent to 0!

Testing
Testing was again surprisingly nontrivial! I first tried to set up turnt as I did in L2. Initially I had my programs take in a file path, which required me to put the bril json in a temporary file. I then tried to use bash process substitution, but that didn't work with turnt, so I had to tell it to explicitly use bash. I also used brench to test across all benchmarks in the repo. I did find that dead-branch didn't seem to work with brilirs: both the baseline and opt results were missing.
Reviewing the brench CSV file shows that a lot of benchmarks actually showed some improvement! I was quite surprised to see this since I didn't implement the full bonus points for LVN, but it was fun to see that common subexpression elimination was actually still helpful. Of course, the best part was that it was correct!

⭐️
I know this was really late, but I hope that my work is still considered for a Michelin star. I believe I went quite above and beyond with dead code elimination by implementing my own algorithm rather than the one presented to us, and I still completed local value numbering successfully.
-
The tasks for this lesson include implementing basic dead-code elimination (DCE) and, as the main event, implementing local value numbering (LVN). I ❤️ LVN!