Replies: 18 comments 35 replies
-
Dead Code EliminationImplemented dead code elimination of both varieties (global use analysis and local reassignment analysis). The code for it is here. Tested the implementation using Local Value NumberingImplemented local value numbering based optimizations. I implemented the following optimizations:
The code for it is here. Tested the implementation using
I took the advice from @sampsyo's comment and added a Java property It can optimize all the examples shown in class. A good example is this test case which combines a lot of elements from all test cases.
CorrectnessTo verify the correctness of both DCE and LVN implementations, I created an additional test folder This test helped me catch a bug in my implementation. When going through the list of instructions I would maintain a map of assignments that have been renamed (because they will be clobbered in the future). However, I had forgotten to remove it from the map after encountering the instruction that clobbers it. Speed IncreaseAcross all the tests in |
Beta Was this translation helpful? Give feedback.
-
The code is here. The testcases can be easily runned following the README. Dead Code EliminationImplemented DCE with both trivial one (only global analysis) and the plus one (with local reassignment analysis). The correctness is verified, and by checking static code size Local Value NumberingFirst I implemented the basic LVN, and then extended it for three optimizations. They can be enabled using corresponding optional arguments.
Similar as DCE, I also tested the optimization result by checking static code size and dynamic instruction count on various testcases. Note that by constant folding, the dynamic instruction count would not decrease, but a const op is surely more efficient than the other op. DiscussionsI'd like to mention some difficulties & bugs during programming. The difficulties are partly from the difference between the pseudo code and real code implementation.
|
Beta Was this translation helpful? Give feedback.
-
My implementation of DCE and LVN is here. Dead Code EliminationI implemented both local and global DCE in Local Value NumberingI implemented constant propagation, common sub-expression elimination, and constant folding in LVN. I added tests from Bril and used Taking
DiscussionData structureI used a straightforward dictionary LVN: optimization on-the-flyInstead of building the table first and then generate the optimized code, we can loop over instructions only once and generate the transformed code on-the-fly. To do this, we need to carefully consider the following things:
|
Beta Was this translation helpful? Give feedback.
-
My implementation of DCE and LVN can be found here. Dead Code Elimination (DCE)DCE is relatively easy to implement, just following the pseudocode would be fine:
I also added basic control flow support for DCE. Basically I just follow the control flow ( The following shows the test status of the benchmarks from
Local Variable Numbering (LVN)LVN is much tricky somehow and requires careful considerations. I use a list to store the LVN table. Each element is a tuple containing the value and the canonical name. This takes O(1) to access a row by index but takes O(n) in the worst case to find a value. I am not sure if there is data structure that can enjoy both of the world -- searching by index and searching by key can both achieve constant time (if not storing redundant data). It would be helpful to reduce the searching overheads if the program has lots of instructions. Based on LVN, I implemented all three (1) common subexpression elimination, (2) copy propagation, and (3) constant propagation/folding. (1) only needs to follow the LVN algorithm and check if the value is computed before, and then replace it. (2) is also straightforward that needs to find the initial definition of the variable. (3) is a little bit tricky. What I do is write specific computation rules for each arithmetic operation, but it is very inconvenient and the if-else cases take almost 1/3 part of the code. Also, to make constants pass through the program from top to bottom, I iterate the algorithm several times until the program does not changed just like what we did in DCE. The attached pesudocode is actually a simplified version of the LVN algorithm. In my implementation, I pay more attention to the following things:
The following shows the test status of the benchmarks from
CorrectnessApart from the above tests given in the bril repository, I also wrote additional test cases to test the correctness of my algorithms. The result of LVN is passed to DCE to do further optimization. All the cases are tested using Here I give another example to show how my passes work, and it should cover all the cases in LVN and DCE.
Using this testbench, I also found a subtle mistake that I should use For control flow testing, I used the
|
Beta Was this translation helpful? Give feedback.
-
My code is here. My implementation of TDCE is straightforward. As suggested, I define "one pass" variants of the two strategies, compose those, and then iterate the composition to convergence. I do not support jumps and branches. Instead of throwing an error, I check for jumps and branches and, on finding any, correctly do nothing. I tested my implementation by running it on the suite of reference examples. There were a few programs that I did a "better" or comparable job on (modulo alpha renaming) so I used turnt's --save utility to clobber the expected values. My implementation of LVN is organized in a similar way; I compose LVN with TDCE and iterate the composed method until convergence. I used a DataFrame for my table, which was a bit clunky and may have been a mistake. Anyway I got through it with a few getters and setters. As an extension to the basic algorithm, I expose the semantics of arithmetic, logic, identity, copying, and const, and realize profit. Again, I tested my implementation against the suite of reference examples for TDCE and LVN. Where appropriate, I clobbered the expected .out values with my own. Introducing constant-folding and exposing the semantics of arithmetic collapses many of the previous examples into two-liners. A few problems I ran into:
|
Beta Was this translation helpful? Give feedback.
-
My implementation is here TDCE:For dead code elimination, i wrote a function that collects all of the "used" variables, and then removes any instruction that assigns into a variable that isn't used, and then looped it until there was no update. I didn't worry about dead code wrt control flow, since that will be dealt with by LVN CFG:I made a CFG of basic blocks so that I could work on those, and then reconstructed the program (also kept track of the initial order of the basic blocks, since there can be "fall through" when there is sth like
The reconstruction deals with removing dead code "between" blocks, but it isn't smart enough to get rid of blocks that are never jumped to. Oh well. After making a the cfg stuff (and a function that printed DOT code so i could check if it was right), i made a mapping function that took an "optimization" function as an argument, and then applied it to every basic block in the program, finally reconstructing them all in order. Nothing too fancy here; it would probably be nice to get rid of/merge labels when possible, so that we can have big basic blocks, but this didn't really seem to be an issue. LVNThis was when i implemented LVN (didn't realize till lecture today that I didn't really need the CFG, but oh well, head start for next week) Basically just did the algo from the lecture, and then added copy propagation by checking if the op was id, and then looking using the same value number. Constant folding was also pretty straight forward; in my translation from insn -> value, i checked if all the args were constant and then just returned the constant, which since i didn't get it quite right at first ended up with some funny programs like this:
being optimized to
but i fixed it pretty easily. This was probably the most tedious, as i had to write out a mini-interpreter for bril insns, but that was only like 15 lines or something. I also added a check for if an instruction had two of the same argument, there is sometimes special things to do, for instance x - x = 0, and x < x is always false. Leveraged divide by zero being undefined behavior a couple times here: 5/0 will be 0, but 0/0 might be 1 sometimes, depending on the order of things. I also dealt with commutative operations by sorting the values. Testing strategyOptimizations should preserve correctness, so I used turnt to check that the output of the program stayed the same after optimizing. This was pretty convenient, and only the divide by zero one changed, which again, I expect. Then, to see if I'm actually doing anything, I made a separate script that applies the opt, and then puts it through bril2txt, and i just looked at all the files, and didn't see anything that wasn't being optimized that should have, so i'll call that a win. One difficulty i ran into was that the turnt/bril workflow is very different from the LISP workflow, the latter of which is very interactive and you do pretty much everything in a repl. I ended up loading in a few bril programs to test with, but in the end i felt a little torn between working in the REPL and then running all the tests with turnt, but by the end i felt pretty efficient. |
Beta Was this translation helpful? Give feedback.
-
DCEMy DCE implementation is here. I pretty much just implemented the pseudocode we saw in class. I tested my implementation using on examples in the tdce directory using turnt. I had to update a few of the .out files, as some of them did not remove all dead code (for example, one .out program only had one pass of DCE performed). I also verified that the code from the dce output had the same behavior as the original code for each example by running the brill interpreter. LVNMy LVN implementation is here. I had to use some additional data structures than what was described in the pseudocode to get things working- in addition to a table mapping value tuples to canonical variables, I also kept a list called To test my implementation, I used turnt on many of the examples in the lvn directory. I set turnt to run lvn and then dce. I tested my implementation on the examples below, which all resulted in code with fewer instructions but the same behavior. I did not test my code on all examples in the lvn directory since I did not implement any additional optimizations.
DiscussionSome challenging parts of this assignment included figuring out how to tweak my algorithm to get copy propagation working as I mentioned above. Other than that, it took me a while to get turnt working since I didn't realize that comments in a bril program had an effect on turnt. I really would have liked to implement constant folding and common subexpressions with commutativity if I had more time. |
Beta Was this translation helpful? Give feedback.
-
LinksHere is my implementation of dce and here is my implementation of lvn. Tests are at dce-tests, lvn-tests and lvn-dce-tests. SummaryI implemented trivial DCE with both the local deletion of overwritten instruction destinations within a basic block, and also added global deletion of unused variables. To run these, you can use I also implemented local value numbering with constant propagation, copy propagation, some basic simplification as an attempt to do common subexpression elimination and constant folding. All these optimization attempts are run on the pass, and I did not add flags to individually enable these optimizations. For constant propagation, every time a generated LVN value was a constant, instead of returning the variable, I replaced it with a constant. For copy propagation, every time an identity expression is generated, I replace it with the previous variable it refers to. For common subexpression elimination, I implemented a simple approach to commutative expressions like addition or multiplication. This allows TestingFor testing, I first checked the generated code and any printed statements were correct manually. I then ran these tests using turnt, to ensure correctness as more changes were made. Some of the tests are smaller and attempt to test certain features, such as variables that are not used or variables that are immediately overwritten. Other tests combine multiple optimization opportunities. The lvn-dce-tests in particular leverage both lvn and dce opportunities, and run the lvn, and then the dce pass, in order to clean up the code. The results of these tests can be found in the directories dce-tests, lvn-tests, and lvn-dce-tests. OptimizationTo check for optimizxation, I ran tests in the lvn-dce-tests directory. I paid attention to whether the output was smaller than the input in terms of dynamic and static instructions. In general, I was pleased with the amount of redudancy that was eliminated. An example of this is the original program:
which can have constant propagation, copy propagation, constant folding and common subexpression elimination applied. There are also examples of dead code and overwritten variables.
in which many of the assignments are removed, and the constants are able to be propagated effectively. In particular, only 5 of the statements are non-print statements while the remainder are print statements and cannot be eliminated. DifficultiesSome of the difficulties I had included creating an unsound procedure for both dce and lvn, dealing with arguments in LVN, as well as not dealing with division by 0. For dce, when I split into basic blocks, I removed all variables as dead at the end of a basic block. I realized these variables could persist past the end of a basic block and fixed the problem appropriately. In the case of lvn, I failed to take into account that variables can be reassigned multiple times. I changed this by creating unique variable names, by adding on a numerical index to the variable. I also failed to deal with arguments in LVN correctly, initially. I later added each argument into the LVN table before running the LVN optimization, to overcome this problem. For division by 0, I initially allowed the lvn pass to crash when the divisor was 0. After the question raised in class, I changed this so that the compiler would not raise an error or crash, and would instead just not optimize that specific instruction where the division by 0 occurs. The division by 0 will occur at runtime. |
Beta Was this translation helpful? Give feedback.
-
TestingI extended the testing from the example tests by creating a script that runs all the bril programs in a directory, writes their outputs in a new directory, copies the programs to that directory, and creates a turnt config to show that transforming them with dce or lvn still works. After some light editing to give things arguments and such, this shows that on the example programs my dce and lvn implementations produce programs with the same outputs. I also started some work to store the profile output, but still have to write the python code that runs the transformed programs to show that they always decrease the number of dynamic instructions. Further testing is described in the relevant sections for DCE and LVN. DCEMy DCE implementation is here. It's very basic and only eliminates the lines that never get used, without doing anything to try to look at the CFG or remove code that way. It has two parts, one which eliminates unused variables, and another which eliminates redundant assigns within a basic block. The first is run repeatedly until there are no changes, and the second is run once at the end. I think that it is correct because it matches the example test code outputs, except for on double-pass.out and reassign.out, where my implementation removes a bit more code. LVNMy LVN implementation is here. I did copy propagation and common subexpression elimination from class, with a few extensions:
Getting basic LVN to work was pretty challenging until I realized that I was making horrible mistake in putting After that, constant propagation was pretty tricky, since it meant not just knowing that a value was already in the table, but knowing whether it was the result of an effectful operation, or if it was a constant. Another challenge was self-inflicted -- to try to future-proof against all of the issues where trying to evaluate code on the constants could cause an error (which honestly might only be division by zero...) I initially put a lot of my code for constant folding with a try/except Exception branch. This meant that a lot of things silently failed, until I restricted the scope of that try/catch to only one line that I knew worked. With that all in place, the commutativity checks, constant folding, and special-casing felt straightforward because they all can be implemented in the step where we construct the value of an instruction, since all the values that you need are in the table and you don't really need to mutate anything or deal with control flow. The constant folding shows up in many of the tests (basically everything that says fold) and produces reasonable outputs, and the special casing is best showcased in logical-operators.bril.
Overall, the changes between the original test code outputs and the result of my code can be found in this git commit: atucker/bril@88c504f |
Beta Was this translation helpful? Give feedback.
-
DCEI implemented DCE using a trivial global analysis that removed unused variables and a trivial local analysis that removed variables that were overwritten before they were used. TestingThere were two approaches used to gain confidence and understand the flaws of my solution. The first was to modify the The second way I tested this transformation was by modifying the LVNI implemented LVN which performed CSE and commutative CSE. In order to get commutative CSE working I sorted arguments for commutative operations using the python TestingI tested this transformation in a similar way to testing DCE. I first used
I also tested against the benchmark programs and passed many of those tests. I did suffer from issues like divide by zero and the situation where variables were defined in one basic block with a use in another. DiscussionSome difficulties I had were with division by zero and handling variables defined by other basic blocks that were not defined in the current block. For DCE, I removed instructions that divided by zero if they were unused throughout the entire function or if the variable that stored the result of division by zero was redefined before a use. A similar thing occurred in LVN. This is still a problem that is not addressed by my solution. With regard to division by zero, I wonder what production compilers must have to deal with as there are many more side effects than just division by zero. For example, any programs that call out to the OS seems like another class of side-effects. A compiler writer must be aware of these types of side-effects when generating optimizations. It's clear that a good baseline test sweet is beneficial to a compiler writer. In addition to this, I had some difficulty getting LVN to work with variables that were defined in another basic block and used in another basic block before another definition. This was not something I originally accounted for in the pseudo-code and I did not realize this was a problem until the very end. Next, testing was a bit difficult because the existing solutions for DCE and LVN had slightly different outputs as me. In the future I plan to look at the example outputs to see what the optimizations are doing so that I can conform to any standards so I can reuse those tests. Another thing that was difficult with regard to testing was understanding when my optimizations were stronger than the expected output and when the optimizations were weaker than the expected outputs (or wrong). |
Beta Was this translation helpful? Give feedback.
-
DCEMy DCE pass implements the trivial dce discussed in class, including both global and local trivial dce and iterating to convergence. The trivial dce was tested using a set of benchmarks and turnt that cover a number of difference global and local dce cases. I also spot checked these tests to ensure the results are as expected. LVNMy LVN pass performs commutative CSE and copy propagation. These options can be enabled with the While it is difficult to ensure that my LVN pass works in all circumstances, I spot checked a number of the benchmarks. For example, we can see the code below is optimized correctly after being passed through the tdce pass.
After:
DiscussionThe LVN implementation definitely took some time to implement in a real program, even when referencing the pseudo code. However, by far the most challenging aspect of this assignment for me was constant folding, because it requires actually accessing and manipulating constant literals. The current rust API is not very well suited to this task, as rust is strongly typed and will not allow you to pass a lambda that can act on multiple different literal types (ints and floats for example). I believe the proper way to implement this in rust is to extend the current API with trait implementations for each of the possible operators, allowing the user to just call the * operator on a Literal struct. I started working on this some, but it is not a small amount of work. I still don't have that much practice using rust in real programs, so this issue might just be from my lack of experience. |
Beta Was this translation helpful? Give feedback.
-
Both of my implementations for DCE and LVN are in this directory. Both implementations are of the basic algorithms for DCE and LVN. In order to test that my implementation was correct, I used the pre-existing benchmarks and compared the output of my DCE and LVN passes to the outputs expected from the benchmarks. I also used my existing tests from Lesson 2, which helped me find a bug in a portion of my code that generated the list of basic blocks. Due to some time constraints, I was not able to manually write further tests for my implementations, but in the future I would like to explore how to write both simple and effective tests for the various optimizations I am performing. |
Beta Was this translation helpful? Give feedback.
-
My implementations for both DCE and LVN can be found here. The tests can be using the Makefile in the DCEI implemented two versions of DCE, the trivial one, and one which also eliminates the reassignments. The implementations can be found here. TestIn order to test both correctness and performance, I used
However, not all the tests pass in this implementation. Specifically, the failing ones are LVNMy implementation of local value numbering can be found here. This is the basic implementation of LVN, but it also captures commutativity. However, it does not support branching yet, but I am planning to improve this. TestSimilar to the previous testing approach, I used turnt in order to check the output validity, and the performance improvement. The results are depicted in the following table, as the average
|
Beta Was this translation helpful? Give feedback.
-
Both the DCE and LVN implementations are here: https://github.com/barabanshek/bril/tree/assignments/assignment_2 . Please, use the test_* scripts to run tests. Some tests don't output numerical results (e.g. "division by zero" which outputs the exception), please, run these tests manually to see that the outputs match. Implementation status:
Issues in the current implementation:
Example of the LVN + DCE pipeline on the test
After passing through LVN and DCE:
After fixing a bug and allowing to run till convergence, I got the most optimal output:
Discussion:I'm wondering how we can make sure that optimizations cover as much as possible corner cases (not for correctness, for optimization)? For now, the only solution I have in mind is manual covering with very carefully crafted sets of tests, but what is the guarantee that these tests indeed cover everything? |
Beta Was this translation helpful? Give feedback.
-
My TDCE Implementation and LVN Implementation are here. DCEMy DCE implementation takes care of both Global unused instructions and local reassigned instructions (Both are done together in one pass). Testing on DCE is done by using turnt and the testing has covered all testcases under LVNMy LVN implementation includes value propagation, canonicalization, and constant folding. Testing on LVN is also done by using turnt. I had most of the tests passed except for Meanwhile, I have spot checked some test cases. For example with
After LVN (including value propagation, canonicalization, constant folding), and TDCE we get:
which looks pretty aggressive in terms of instructions removed! DiscussionOverall, I don't find TDCE and LVN conceptually hard. However, implementing LVN (and associated extensions) correctly requires a reasonable amount of attention on corner cases. I bet it must be much trickier to implement LVN and associated extensions in real compilers due to much more complicated typing and other factors. I was totally unaware of cases covered by |
Beta Was this translation helpful? Give feedback.
-
A bit (a lot 😬) late but here's my implementation of tdce and lvn. Usage
In the context of other bril tools, I use The most optimized program my implementation produces first uses my lvn implementation, then passes through local dce for each block and the whole function is ran through global dce. Dead Code EliminationTrivial dead code elimination is implemented from the pseudocode in class. The pseudocode discussed in class was very straight forward to follow, but while testing my lvn implementation I found this example to be pretty interesting
Mostly because I failed to realize the kinds of cases trivial dead code elimination does not capture. Local Value NumberingLocal value numbering is implemented with CSE exploiting commutativity and copy propagation. I originally started by trying to follow the pseudocode as closely as possible, but it wasn't that simple. The first problem I ran into was the control flow of when to assign a new number and when to lookup in the table due to the variety of operations. Variables that were declared before the block but used in the current block proved to be an issue. I took care of this by assigning a lvn number to these previously declared variables. Another major issue I ran into was handling re-defined variables. I used the following example discussed in class
I originally renamed the variable and wanted to rename all occurrences before the next assignment to the new name, but it was simpler to just add it to my I was originally a bit ambitious and added a type to my value tuples to support constant propagation and folding constants as they came by matching the type, but I didn't get a chance to implement this. TestingI tested my optimizations using the
and then modifying it to be
|
Beta Was this translation helpful? Give feedback.
-
Even later, but here is my implementation for dce and lvn! Dead Code EliminationI implemented both global and local reassignment here, but I Here is the buggy optimization I first wrote:
Because it would not recognize that the Local Value NumberingMy implementation deals with common subexpression elimination and copy propagation, but not constant folding. I followed the pseudocode in class as closely as possible, but there are a few cases that my implementation does not optimize. To avoid the issue of renaming subsequent uses of a variable when it gets clobbered, I simply delete the old entry in the value table when the variable is clobbered, so that future instructions with the same subexpression cannot replace it with the old variable name. This means that
does not get optimized. I also did not explicitly give each value its own number. Instead, I kept track of them using canonical variables. This is because I initially intended to perform a separate pass to rename each variable with a unique identifier, but I was unable to get this code working in time. TestingI tested these algorithms using turnt and the provided example tests, as well as added a few tests of my own. DiscussionI found dead code elimination and LVM conceptually difficult. I forgot how complex global analayses that utilize the structure of the control flow graph are, so I spent a while working on optimizations that I eventually abandoned because I decided they were a little outside the scope of this lesson. |
Beta Was this translation helpful? Give feedback.
-
FINALLY DONE! TWO WHOLE WEEKS! This was one of the hardest programming things I've ever worked on (or at least felt like it). I've written greater quantities of code, but in terms of rewriting and thinking and implementing and failing, this has to take the cake. I mean, other programming assignments/projects or whatever, I'm at least making visual progress, like the file size is going up and I'm making small tweaks here and there. This? Nah. A lot of it probably could've been contributed to the fact that I thought I could do LVN better and tried implementing something myself by renaming everything beforehand and then trying some really scuffed no-row indexed LVN, but just having it with 3 columns and following the pseudocode made it so much easier. Although doing it the "incorrect" way made me see the benefits of the current implementation. No constant folding and no copy prop and all of that, although I may revisit those later if I'm feeling good about it. After and before for clobber.bril:
DCELocalThis one was fairly straightforward, although I thought about it backwards. Because when I implemented this for a previous course, the algorithm I came up with was to go backwards and do the consumption backwards, instead of forwards In a sense, I think it's similar to the going forwards, but idk it makes more sense to me backwards. "Consumption" in this sense meaning to match uses with declarations, so every use has a preceding declaration to match (if the block ends that's also fine), but if there's excessive declaration, the first "use case of variable" would've been matched already so the declaration would get skipped and therefore optimized away. GlobalThis one was also straightforward, I think? It was like just have certain, really restrictive qualities for dead code, and they were for me:
I think this is enough to guarantee correctness in the bril language. LVNThis one was really, really annoying as stated. When I finally sat down and decided to actually implement the algorithm provided in class instead of trying to make my own, I still had a lot of difficulties resolving edge cases. I think the three most important ones were:
My solution was to instead of making the table a map, making it an array and then storing each object as a (tuple, string) tuple. The variables that came from outside the function and the variables that have a destination have just straight up no value in the tuple, it's empty. I don't know this is the best idea, but I took the idea from the lecture because the first row was empty when storing variable x. There's probably some better way, but I don't want to think more about this problem for some time tbh. (actually thinking about this, the side effect operation one is a combination of the first and last one, so I overcomplicated it, but agh whatever) TestingI ran testing by first confirming that on my test cases (I copied the ones from the lecture and made a few of my own involving jumps) and the ones in the lvn folder all ran fine and had the same outputs (or in the case of divide-by-zero, the code looked compilable), then I looked at the difference between each file and the optimized output. I didn't look at the dyn-ops but given that the code was much much shorter I'd imagine it'd be better. |
Beta Was this translation helpful? Give feedback.
-
The tasks for this lesson include implementing basic dead-code elimination (DCE) and, as the main event, implementing local value numbering (LVN). I ❤️ LVN!
Beta Was this translation helpful? Give feedback.
All reactions