The Jacobi method and its corresponding code #20

YingHH1 · 2023-11-27T10:31:15Z

In the blog, the Jacobi method is directed to the wiki page on Jacobi iterative method for solving linear system. But here we are solving nonlinear system of equations, so what exactly is the the algorithm? Could it be the Gauss-Jacobi method described in https://www.researchgate.net/profile/Romulo-Chumacero/publication/265633573_Solving_Nonlinear_Equations/links/556e350708aec2268308c3d3/Solving-Nonlinear-Equations.pdf

Furthermore, if it is an iterative method for solving nonlinear equations, then wouldn't the solutions being floating point numbers rather than the integers in input_ids for LLM?

I would like to understand the overall structure of the code implementation, and find out more details of the whole Lookahead decoding algorithm (including the solving nonlinear equations using Jacobi method part). However, I couldn't locate the relevant code, so any direction on where to look for code for implementing the Jacobi method would be greatly appreciated.

Many thanks for the help!

Viol2000 · 2023-11-28T05:41:47Z

Hi, thanks for your interest!

I think Jacobi is similar in non-linear and linear systems. If every problem in the system is a linear function, it is a linear system, or it will be a non-linear system. Moreover, I think Gauss-Seidel and Jacobi are different on the parallelism side, and Jacobi is more parallelized. You can take a look at these two papers: https://proceedings.mlr.press/v139/song21a/song21a.pdf and https://arxiv.org/pdf/2305.10427.pdf.

The previous methods tried to apply Jacobi-iteration in accelerating the language model decoding process, but I think they do not have a very visible speedup. I think one reason is the floating point problem. The Jacobi method does not work well as the input and output are discrete integers rather than floating points in the LLM decoding process.

Our method, though motivated by Jacobi iteration. It is actually different from it. The lookahead branch is like a delayed Jacobi iteration (I do not know its exact name) with several inputs from a much earlier step rather than the last step to solve several non-linear systems (LLM) to obtain output for each window slot. And we capture the tokens from this process. Here is another model detailed discuss. When the N-grams's N=2, it becomes a Jacobi iteration. Here, you can simply think that LLM decoding one token solves its non-linear problem. Parallelly decoding several consecutive tokens solves a non-linear system in parallel or in a Jacobi style.

Suppose you want to dive into the code. I think you should shortly discard the Jacobi iteration and focus on our algorithm instead because it is not exactly the Jacobi you want. You can take a look at and track this variable past_tokens. It updates as a Jacobi iteration to obtain new n-grams.

YingHH1 · 2023-11-30T04:37:22Z

Thank you for the response. I think I understand the whole algorithm now.

YingHH1 closed this as completed Nov 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The Jacobi method and its corresponding code #20

The Jacobi method and its corresponding code #20

YingHH1 commented Nov 27, 2023

Viol2000 commented Nov 28, 2023

YingHH1 commented Nov 30, 2023 •

edited

Loading

The Jacobi method and its corresponding code #20

The Jacobi method and its corresponding code #20

Comments

YingHH1 commented Nov 27, 2023

Viol2000 commented Nov 28, 2023

YingHH1 commented Nov 30, 2023 • edited Loading

YingHH1 commented Nov 30, 2023 •

edited

Loading