Difei Gao, Lei Ji, Luowei Zhou, Kevin Qinghong Lin, Joya Chen, Zihan Fan, Mike Zheng Shou
AssistGPT can reason in an interleaved language and code format. Given a query input and visual inputs, AssistGPT plans the problem-solving path in language, using structured code to call upon various powerful tools. The Inspector, part of the system, can manage visual inputs and intermediate results, assisting the Planner to invoke tools. Meanwhile, the Learner can assess the reasoning process and collect in-context examples.
The code will be released soon.