Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove-comb-groups should be optional #911

Closed
1 task
rachitnigam opened this issue Feb 11, 2022 · 9 comments · Fixed by #1422
Closed
1 task

remove-comb-groups should be optional #911

rachitnigam opened this issue Feb 11, 2022 · 9 comments · Fixed by #1422
Labels
AMC Needed for Andrew's memory compiler C: Calyx Extension or change to the Calyx IL S: Available Can be worked upon

Comments

@rachitnigam
Copy link
Contributor

rachitnigam commented Feb 11, 2022

The current compilation pipeline requires remove-comb-groups to be run before any compilation passes are run (tdcc and static-timing). Instead of doing this, the compilation passes should be allowed to directly handle the compilation of combinational groups.

  • tdcc can enable the combinational assignments in the cycle when it is reading the conditional port.
  • As @mikeurbach mentions, static-timing can enable the combinational assignments in the cycle when the conditional port is read and in the case of a @bound while, eliminated altogether.

TODO

  • infer-static-timing should no longer infer timing for if to be max(tr, fl) + 1 since the 1 comes from 1 cycle needed to evaluate the condition group.
@rachitnigam rachitnigam added S: Discussion needed Issues blocked on discussion C: Calyx Extension or change to the Calyx IL labels Feb 11, 2022
@rachitnigam rachitnigam added S: Available Can be worked upon and removed S: Discussion needed Issues blocked on discussion labels Feb 18, 2022
@mikeurbach
Copy link
Collaborator

Logistically, how would this work? Would remove-comb-groups be removed from the pre-opt alias, and the tdcc and top-down-st passes updated accordingly?

@rachitnigam
Copy link
Contributor Author

Yup! Alternatively, if we only implement this in top-down-st, we can move remove-comb-groups to run after top-down-st. There might also be some complications with pre-opt optimizations needing comb groups removed but we can tackle those.

@andrewb1999
Copy link
Collaborator

andrewb1999 commented Nov 15, 2022

Just commenting an example of how this could be useful. The current implementation of remove-comb-groups ends up evaluating the comb group at the end of the loop body and then checking whether the while loop should run again in the next cycle. The check of whether the while loop should run again is a simple 1 bit equality check so it can be easily chained together with any reasonable comb group. It is hard to show exactly how this change will affect the FSM as the current version of tdst does not support non-bounded while loops yet, bet here is a visual idea of how the evaluation of one iteration of the loop would change:

========== means a clock cycle boundary

Current loop body implementation:

==========
loop body groups
...
==========
...
more loop body groups
==========
evaluate comb group
==========
run again check
==========

Without remove comb groups, with chained comb groups during tdst:

==========
loop body groups
...
==========
...
more loop body groups
==========
evaluate comb group
run again check
==========

This only saves one cycle per loop iteration, but that adds up quickly for loops that run for many iterations.

@mikeurbach
Copy link
Collaborator

I have been out of the loop (heh), but I do remember looking at that extra cycle for both statically timed loops as well as the normal latency insensitive loops, and wondering if it would be possible to save the extra cycle of latency.

@rachitnigam
Copy link
Contributor Author

While working on fixing this, I came across this program which cannot be compiled correctly:

    seq {
        if lt.out with C1 {
          A;
        }
        if lt.out with C2 {
          B;
        }
    }

In this case, the predecessors of A & B are computed to be the start of the seq. Normally, this allows the control program to directly transition to B if the condition for A is false, similar to how switch works in C-like languages.

However, when fixing this issue, we run the assignments in the comb group in every predecessor state, that is, every state that can transition into A or B. The result is this FSM:

======== main:tdcc =========
0:
  lt.left = base;
  lt.right = zero.out;
  lt.left = stored_base.out;
  lt.right = one.out;
1:
  rev_res_sign[go] = !rev_res_sign[done] ? 1'd1;
  lt.left = stored_base.out;
  lt.right = one.out;
2:
  set_res_reciprocal[go] = !set_res_reciprocal[done] ? 1'd1;
3:
  <end>
transitions:
  (0, 1): lt.out
  (0, 2): !lt.out & lt.out
  (0, 3): !lt.out
  (1, 2): rev_res_sign[done] & lt.out
  (1, 3): rev_res_sign[done] & !lt.out
  (2, 3): set_res_reciprocal[done]

The obvious problem is that 0 (start state) can transition to both A or B and therefore assignments corresponding fo their comb groups are activated causing a conflict.

Before implementing the fix, we would always transform the program to look like this:

seq {
  C1;
  if lt_reg.out  { A }
  C2;
  if lt_reg.out { B }
}

Which would ensure that conflicting assignments to lt are not produced.

This problem seems to reflect the fact that if <port> with <group> represents some computation being performed while if <port> does not

@sampsyo
Copy link
Contributor

sampsyo commented Nov 21, 2022

Sorry if this is naive, but wouldn't the correct predecessor set for B be both A and the top of the seq? That is, we don't know if control will come from the prior if or whether that will be skipped. (The only predecessor for A would indeed be the top of the seq.)

@rachitnigam
Copy link
Contributor Author

Okay, so I've realized that loops where we chain the condition check with the next group will also potentially suffer from the above problem.

Consider:

while lt.out with C1 {
  while lt.out with C2 {
    A
  }
}

If we exit the inner loop and want to start a new iteration of the outer loop, how many cycles should it take for A to execute again?

The current pass attempts to schedule A in the next cycle but that again is infeasible because of conflicting assignments. I think we might need to say something like "before the first iteration of a loop, we compute the combination group in a different cycle while backward edges are chained with the next iteration".

@sampsyo
Copy link
Contributor

sampsyo commented Nov 23, 2022

Wow, nice catch. I can't help but wonder if there's an underlying principle here that would cover if and while and all their possible nestings/sequencing… maybe there's some general way we could express the need to execute a series of (normal and comb) groups in a way that (a) preserves the illusion of sequential execution, and (b) allows parallelism when there are no conflicts. Maybe this brings us back to the idea of a generalized with statement…

Anyway, yeah, it seems like the thing to do for this particular example is to allow for an extra FSM state (and an extra cycle), at least when the loop conditions conflict.

@rachitnigam
Copy link
Contributor Author

After #1338, I've realized that there is no good way to eliminate this pass in general. Instead, I think we should change it so that it doesn't create unnecessary groups. For example, there is no reason for the pass to compile away invoke-with statements and should instead let compile-invoke handle their compilation.

The new static timing proposal (#1344) will enable us to write loops that can re-execute their bodies every cycle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AMC Needed for Andrew's memory compiler C: Calyx Extension or change to the Calyx IL S: Available Can be worked upon
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants