Background
In Fluid's current design, the last part of a program is a set of optimization-related operators. For example, in the se_resnext example, the last operators are a list of SGD ops, as the timeline below shows:
After discussing with @chengduoZH and @panyx0718, we think these small ops currently waste a lot of time on kernel launches, so we may be able to fuse them into one big op.
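To make the idea concrete, below is a minimal CUDA sketch, assuming a hypothetical `sgd_group` op with a host-side helper `SGDGroup` (both names are illustrative, not the actual implementation): instead of the framework dispatching one SGD op per parameter, a single op invocation walks over all parameter/gradient pairs. Each parameter still gets its own kernel launch in this version, but the per-op dispatch overhead is gone.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Plain element-wise SGD update: param -= lr * grad.
__global__ void SGDKernel(float* param, const float* grad, float lr, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    param[i] -= lr * grad[i];
  }
}

// Hypothetical sgd_group helper: one op invocation updates every parameter,
// instead of the framework running one SGD op per parameter.
// Each parameter still gets its own kernel launch here.
void SGDGroup(const std::vector<float*>& params,
              const std::vector<const float*>& grads,
              const std::vector<int>& sizes, float lr, cudaStream_t stream) {
  const int threads = 256;  // illustrative block size
  for (size_t i = 0; i < params.size(); ++i) {
    int blocks = (sizes[i] + threads - 1) / threads;
    SGDKernel<<<blocks, threads, 0, stream>>>(params[i], grads[i], lr, sizes[i]);
  }
}
```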
There are two parts to this work:
Experiment results
Timeline after the fusion:
Running time for all SGD ops has been reduced by half. There is still room for improvement if we can merge all the CUDA kernels into a single one inside the sgd_group op.
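For reference, such a fully merged launch might look like the hedged sketch below. It assumes hypothetical device-side arrays of per-parameter pointers and sizes (filled by the host beforehand); each thread block picks a parameter and its threads update that parameter's elements, so every update happens inside one kernel launch.

```cuda
#include <cuda_runtime.h>

// Hypothetical fully fused kernel: one launch updates every parameter.
// params/grads/sizes are device-side arrays of per-parameter pointers and
// lengths; each block processes one parameter at a time.
__global__ void SGDGroupKernel(float** params, const float* const* grads,
                               const int* sizes, float lr, int num_params) {
  // Blocks stride over parameters; threads stride over elements.
  for (int p = blockIdx.x; p < num_params; p += gridDim.x) {
    float* param = params[p];
    const float* grad = grads[p];
    int n = sizes[p];
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
      param[i] -= lr * grad[i];
    }
  }
}
```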
Notice
This is only an experiment with one possible solution; there are other solutions, such as multi-threaded asynchronous execution, that still need to be discussed.