Background
In Fluid's current design, the last part of a program is a set of optimization-related operators. For example, in the se_resnext example, the last operators are a list of SGD ops, as the timeline below shows:
After discussing with @chengduoZH and @panyx0718, we think these small ops currently waste a lot of time on kernel launches, so we may be able to fuse them into one big op.
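To make the idea concrete, below is a minimal CUDA sketch, assuming a hypothetical `sgd_group` op with a host-side helper `SGDGroup` (both names are illustrative, not the actual implementation): instead of the framework dispatching one SGD op per parameter, a single op invocation walks over all parameter/gradient pairs. Each parameter still gets its own kernel launch in this version, but the per-op dispatch overhead is gone.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Plain element-wise SGD update: param -= lr * grad.
__global__ void SGDKernel(float* param, const float* grad, float lr, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    param[i] -= lr * grad[i];
  }
}

// Hypothetical sgd_group helper: one op invocation updates every parameter,
// instead of the framework running one SGD op per parameter.
// Each parameter still gets its own kernel launch here.
void SGDGroup(const std::vector<float*>& params,
              const std::vector<const float*>& grads,
              const std::vector<int>& sizes, float lr, cudaStream_t stream) {
  const int threads = 256;  // illustrative block size
  for (size_t i = 0; i < params.size(); ++i) {
    int blocks = (sizes[i] + threads - 1) / threads;
    SGDKernel<<<blocks, threads, 0, stream>>>(params[i], grads[i], lr, sizes[i]);
  }
}
```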
There are two parts to this work:
Experiment results
Timeline after the fusion:
Running time for all SGD ops has been reduced by half. There is still room for improvement if we can merge all the CUDA kernels into a single one inside the sgd_group op.
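For reference, such a fully merged launch might look like the hedged sketch below. It assumes hypothetical device-side arrays of per-parameter pointers and sizes (filled by the host beforehand); each thread block picks a parameter and its threads update that parameter's elements, so every update happens inside one kernel launch.

```cuda
#include <cuda_runtime.h>

// Hypothetical fully fused kernel: one launch updates every parameter.
// params/grads/sizes are device-side arrays of per-parameter pointers and
// lengths; each block processes one parameter at a time.
__global__ void SGDGroupKernel(float** params, const float* const* grads,
                               const int* sizes, float lr, int num_params) {
  // Blocks stride over parameters; threads stride over elements.
  for (int p = blockIdx.x; p < num_params; p += gridDim.x) {
    float* param = params[p];
    const float* grad = grads[p];
    int n = sizes[p];
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
      param[i] -= lr * grad[i];
    }
  }
}
```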
Notice
This is only an experiment with one possible solution; there are other solutions, such as multi-threaded asynchronous execution, that still need to be discussed.