
Refine cos-sim-op #6601

Merged
merged 10 commits on Jan 2, 2018



@chengduoZH chengduoZH commented Dec 14, 2017

Fix #6486

I found that the GPU version runs slightly slower than the CPU version. @typhoonzero's profiling results in this issue also show the GPU version being slightly slower.

@chengduoZH chengduoZH force-pushed the profiling/cosine_op branch 2 times, most recently from 784740d to 14d3271 on December 16, 2017 10:56

chengduoZH commented Dec 16, 2017

This way of writing it may not be elegant, but in terms of speed it is about 10 times faster than the previous Eigen-based implementation (compare the starred cos_sim_grad averages in the two logs below).
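For context, cosine similarity is computed row by row: z_i = (x_i · y_i) / (‖x_i‖ ‖y_i‖). As a rough illustration only (not the code merged in this PR), a hand-written per-row functor in the style of the one quoted in the review further down might look like the sketch below. It assumes row-major inputs x and y of the same shape, runs on the CPU with a plain loop, and omits HOSTDEVICE; the names CosSimRowFunctor and CosSimForward are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-row cosine-similarity functor (illustration only).
// x and y are row-major [rows, cols] matrices of the same shape;
// z, x_norm and y_norm each hold one value per row.
template <typename T>
struct CosSimRowFunctor {
  CosSimRowFunctor(const T* x, const T* y, T* x_norm, T* y_norm, T* z, int cols)
      : x_(x), y_(y), x_norm_(x_norm), y_norm_(y_norm), z_(z),
        cols_(static_cast<std::size_t>(cols)) {}

  // Computes one row; a GPU version would map each row_id to one thread.
  // No epsilon guard: an all-zero row divides by zero.
  void operator()(std::size_t row_id) const {
    const T* x_row = x_ + row_id * cols_;
    const T* y_row = y_ + row_id * cols_;
    T xx = 0, yy = 0, xy = 0;
    for (std::size_t col_id = 0; col_id < cols_; ++col_id) {
      xx += x_row[col_id] * x_row[col_id];
      yy += y_row[col_id] * y_row[col_id];
      xy += x_row[col_id] * y_row[col_id];
    }
    x_norm_[row_id] = std::sqrt(xx);
    y_norm_[row_id] = std::sqrt(yy);
    z_[row_id] = xy / (x_norm_[row_id] * y_norm_[row_id]);
  }

  const T* x_;
  const T* y_;
  T* x_norm_;
  T* y_norm_;
  T* z_;
  std::size_t cols_;
};

// CPU driver: a plain loop over rows, calling the functor once per row.
template <typename T>
void CosSimForward(const std::vector<T>& x, const std::vector<T>& y, int rows,
                   int cols, std::vector<T>* x_norm, std::vector<T>* y_norm,
                   std::vector<T>* z) {
  assert(x.size() == y.size());
  assert(x.size() == static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));
  x_norm->assign(static_cast<std::size_t>(rows), T(0));
  y_norm->assign(static_cast<std::size_t>(rows), T(0));
  z->assign(static_cast<std::size_t>(rows), T(0));
  CosSimRowFunctor<T> functor(x.data(), y.data(), x_norm->data(),
                              y_norm->data(), z->data(), cols);
  for (std::size_t row_id = 0; row_id < static_cast<std::size_t>(rows); ++row_id) {
    functor(row_id);
  }
}
```

Caching the row norms is a design choice in this sketch: the backward pass can reuse them instead of recomputing them.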

Config and Env

I1216 11:10:28.262279  4923 Stat.cc:102] ======= StatSet: [GlobalStatInfo] status ======
I1216 11:10:28.262320  4923 Stat.cc:105] Stat=sgd                            TID=4923   total=4661.3     avg=0.028      max=2.98       min=0.003      count=161782
I1216 11:10:28.262349  4923 Stat.cc:105] Stat=sequence_conv_grad             TID=4923   total=5843.71    avg=0.83       max=5.892      min=0.095      count=7034
I1216 11:10:28.262361  4923 Stat.cc:105] Stat=concat_grad                    TID=4923   total=613.533    avg=0.043      max=2.679      min=0.012      count=14068
I1216 11:10:28.262370  4923 Stat.cc:105] Stat=sequence_pool_grad             TID=4923   total=4908.71    avg=0.348      max=3.124      min=0.031      count=14068
I1216 11:10:28.262380  4923 Stat.cc:105] Stat=tanh_grad                      TID=4923   total=593.86     avg=0.028      max=2.743      min=0.006      count=21102
I1216 11:10:28.262389  4923 Stat.cc:105] Stat=elementwise_add                TID=4923   total=1965.21    avg=0.034      max=2.712      min=0.004      count=56272
I1216 11:10:28.262423  4923 Stat.cc:105] Stat=square_grad                    TID=4923   total=31.446     avg=0.004      max=0.019      min=0.003      count=7034
I1216 11:10:28.262436  4923 Stat.cc:105] Stat=elementwise_sub_grad           TID=4923   total=53.291     avg=0.007      max=0.047      min=0.006      count=7034
I1216 11:10:28.262447  4923 Stat.cc:105] Stat=cos_sim_grad                   TID=4923   total=13780.8   *avg=1.959      max=7.045      min=0.167      count=7034
I1216 11:10:28.262457  4923 Stat.cc:105] Stat=ExecutorRunTimer               TID=4923   total=54087.2    avg=7.688      max=40.391     min=2.004      count=7035
I1216 11:10:28.262466  4923 Stat.cc:105] Stat=lookup_table                   TID=4923   total=861.373    avg=0.017      max=2.677      min=0.004      count=49238
I1216 11:10:28.262476  4923 Stat.cc:105] Stat=uniform_random                 TID=4923   total=3.58       avg=0.238      max=1.563      min=0.002      count=15
I1216 11:10:28.262485  4923 Stat.cc:105] Stat=mul_grad                       TID=4923   total=2579.63    avg=0.052      max=6.61       min=0.009      count=49238
I1216 11:10:28.262495  4923 Stat.cc:105] Stat=mul                            TID=4923   total=1313.67    avg=0.026      max=21.773     min=0.006      count=49238
I1216 11:10:28.262503  4923 Stat.cc:105] Stat=sequence_conv                  TID=4923   total=2861.32    avg=0.406      max=4.465      min=0.051      count=7034
I1216 11:10:28.262511  4923 Stat.cc:105] Stat=CreateLocalScopeTimer          TID=4923   total=812.327    avg=0.115      max=3.91       min=0.028      count=7035
I1216 11:10:28.262521  4923 Stat.cc:105] Stat=elementwise_add_grad           TID=4923   total=2687.21    avg=0.047      max=2.822      min=0.006      count=56272
I1216 11:10:28.262529  4923 Stat.cc:105] Stat=elementwise_sub                TID=4923   total=50.991     avg=0.007      max=0.073      min=0.005      count=7034
I1216 11:10:28.262538  4923 Stat.cc:105] Stat=lookup_table_grad              TID=4923   total=817.553    avg=0.016      max=2.657      min=0.004      count=49238
I1216 11:10:28.262547  4923 Stat.cc:105] Stat=DeleteLocalScopeTimer          TID=4923   total=727.117    avg=0.103      max=2.836      min=0          count=7035
I1216 11:10:28.262560  4923 Stat.cc:105] Stat=feed                           TID=4923   total=155.101    avg=0.002      max=2.639      min=0.001      count=56272
I1216 11:10:28.262572  4923 Stat.cc:105] Stat=cos_sim                        TID=4923   total=1242.92   *avg=0.176      max=2.796      min=0.021      count=7034
I1216 11:10:28.262581  4923 Stat.cc:105] Stat=CreateOpTimer                  TID=4923   total=1470.28    avg=0.002      max=2.643      min=0          count=710480
I1216 11:10:28.262594  4923 Stat.cc:105] Stat=fill_constant                  TID=4923   total=23.298     avg=0.003      max=0.174      min=0.001      count=7065
I1216 11:10:28.262606  4923 Stat.cc:105] Stat=concat                         TID=4923   total=1081.62    avg=0.076      max=2.736      min=0.012      count=14068
I1216 11:10:28.262617  4923 Stat.cc:105] Stat=sequence_pool                  TID=4923   total=3199.65    avg=0.227      max=3.001      min=0.022      count=14068
I1216 11:10:28.262629  4923 Stat.cc:105] Stat=square                         TID=4923   total=24.624     avg=0.003      max=0.034      min=0.002      count=7034
I1216 11:10:28.262639  4923 Stat.cc:105] Stat=mean                           TID=4923   total=23.769     avg=0.003      max=0.042      min=0.002      count=7034
I1216 11:10:28.262652  4923 Stat.cc:105] Stat=fetch                          TID=4923   total=18.538     avg=0.002      max=0.018      min=0.002      count=7034
I1216 11:10:28.262663  4923 Stat.cc:105] Stat=tanh                           TID=4923   total=986.212    avg=0.046      max=2.662      min=0.007      count=21102
I1216 11:10:28.262676  4923 Stat.cc:105] Stat=mean_grad                      TID=4923   total=52.354     avg=0.007      max=0.046      min=0.004      count=7034

Config and Env

The experimental environment is the same as above; only the code differs (this run uses the previous Eigen-based implementation).

I1216 11:18:42.599000  5020 Stat.cc:102] ======= StatSet: [GlobalStatInfo] status ======
I1216 11:18:42.599068  5020 Stat.cc:105] Stat=sgd                            TID=5020   total=6178.64    avg=0.038      max=4.322      min=0.003      count=161782
I1216 11:18:42.599099  5020 Stat.cc:105] Stat=sequence_conv_grad             TID=5020   total=9347.53    avg=1.328      max=8.59       min=0.268      count=7034
I1216 11:18:42.599117  5020 Stat.cc:105] Stat=concat_grad                    TID=5020   total=809.502    avg=0.057      max=7.832      min=0.019      count=14068
I1216 11:18:42.599134  5020 Stat.cc:105] Stat=sequence_pool_grad             TID=5020   total=8229.83    avg=0.585      max=6.552      min=0.116      count=14068
I1216 11:18:42.599149  5020 Stat.cc:105] Stat=tanh_grad                      TID=5020   total=828.636    avg=0.039      max=3.286      min=0.009      count=21102
I1216 11:18:42.599169  5020 Stat.cc:105] Stat=elementwise_add                TID=5020   total=2711.11    avg=0.048      max=3.342      min=0.006      count=56272
I1216 11:18:42.599185  5020 Stat.cc:105] Stat=square_grad                    TID=5020   total=47.618     avg=0.006      max=2.215      min=0.004      count=7034
I1216 11:18:42.599200  5020 Stat.cc:105] Stat=elementwise_sub_grad           TID=5020   total=76.46      avg=0.01       max=0.043      min=0.007      count=7034
I1216 11:18:42.599213  5020 Stat.cc:105] Stat=cos_sim_grad                   TID=5020   total=136841    *avg=19.454     max=41.869     min=1.832      count=7034
I1216 11:18:42.599223  5020 Stat.cc:105] Stat=ExecutorRunTimer               TID=5020   total=263645     avg=37.476     max=925.089    min=4.034      count=7035
I1216 11:18:42.599241  5020 Stat.cc:105] Stat=lookup_table                   TID=5020   total=1837.72    avg=0.037      max=4.261      min=0.005      count=49238
I1216 11:18:42.599256  5020 Stat.cc:105] Stat=uniform_random                 TID=5020   total=3.532      avg=0.235      max=1.461      min=0.002      count=15
I1216 11:18:42.599272  5020 Stat.cc:105] Stat=mul_grad                       TID=5020   total=3195.79    avg=0.064      max=11.355     min=0.016      count=49238
I1216 11:18:42.599298  5020 Stat.cc:105] Stat=mul                            TID=5020   total=1779.07    avg=0.036      max=18.428     min=0.008      count=49238
I1216 11:18:42.599313  5020 Stat.cc:105] Stat=sequence_conv                  TID=5020   total=4855.38    avg=0.69       max=7.503      min=0.176      count=7034
I1216 11:18:42.599326  5020 Stat.cc:105] Stat=CreateLocalScopeTimer          TID=5020   total=1028.88    avg=0.146      max=4.527      min=0.044      count=7035
I1216 11:18:42.599341  5020 Stat.cc:105] Stat=elementwise_add_grad           TID=5020   total=3242.68    avg=0.057      max=4.629      min=0.011      count=56272
I1216 11:18:42.599355  5020 Stat.cc:105] Stat=elementwise_sub                TID=5020   total=72.778     avg=0.01       max=2.654      min=0.006      count=7034
I1216 11:18:42.599370  5020 Stat.cc:105] Stat=lookup_table_grad              TID=5020   total=42960.8    avg=0.872      max=877.8      min=0.028      count=49238
I1216 11:18:42.599386  5020 Stat.cc:105] Stat=DeleteLocalScopeTimer          TID=5020   total=25503.1    avg=3.625      max=37.766     min=0.001      count=7035
I1216 11:18:42.599401  5020 Stat.cc:105] Stat=feed                           TID=5020   total=488.886    avg=0.008      max=2.97       min=0.001      count=56272
I1216 11:18:42.599416  5020 Stat.cc:105] Stat=cos_sim                        TID=5020   total=1744.48   *avg=0.248      max=7.265      min=0.037      count=7034
I1216 11:18:42.599431  5020 Stat.cc:105] Stat=CreateOpTimer                  TID=5020   total=2127.62    avg=0.002      max=4.47       min=0          count=710480
I1216 11:18:42.599447  5020 Stat.cc:105] Stat=fill_constant                  TID=5020   total=33.835     avg=0.004      max=0.203      min=0.001      count=7065
I1216 11:18:42.599460  5020 Stat.cc:105] Stat=concat                         TID=5020   total=1239.54    avg=0.088      max=4.125      min=0.018      count=14068
I1216 11:18:42.599476  5020 Stat.cc:105] Stat=sequence_pool                  TID=5020   total=5816.23    avg=0.413      max=7.131      min=0.088      count=14068
I1216 11:18:42.599490  5020 Stat.cc:105] Stat=square                         TID=5020   total=34.847     avg=0.004      max=0.057      min=0.003      count=7034
I1216 11:18:42.599505  5020 Stat.cc:105] Stat=mean                           TID=5020   total=32.475     avg=0.004      max=0.047      min=0.003      count=7034
I1216 11:18:42.599520  5020 Stat.cc:105] Stat=fetch                          TID=5020   total=30.044     avg=0.004      max=0.027      min=0.002      count=7034
I1216 11:18:42.599535  5020 Stat.cc:105] Stat=tanh                           TID=5020   total=1511.34    avg=0.071      max=7.862      min=0.011      count=21102
I1216 11:18:42.599550  5020 Stat.cc:105] Stat=mean_grad                      TID=5020   total=72.27      avg=0.01       max=2.152      min=0.006      count=7034
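The headline figure can be checked against the starred per-call averages. The sketch below assumes the first log is the new code and the second is the previous Eigen-based code; the struct and function names are hypothetical helpers for the arithmetic.

```cpp
#include <cassert>

// Per-call averages copied from the starred lines of the two logs above.
struct OpAvg {
  const char* name;
  double new_avg;  // first log (assumed: this PR's hand-written code)
  double old_avg;  // second log (assumed: previous Eigen-based code)
};

// Speedup = old average / new average (> 1 means the new code is faster).
inline double Speedup(const OpAvg& op) {
  assert(op.new_avg > 0.0);
  return op.old_avg / op.new_avg;
}

const OpAvg kCosSimGrad = {"cos_sim_grad", 1.959, 19.454};
const OpAvg kCosSim = {"cos_sim", 0.176, 0.248};
```

Speedup(kCosSimGrad) is about 9.93 and Speedup(kCosSim) is about 1.41, so the "10 times" refers to the backward op cos_sim_grad; the forward op cos_sim improves by roughly 1.4x.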

z_(z),
cols_(static_cast<size_t>(cols)) {}

inline HOSTDEVICE void operator()(size_t offset) const {
Contributor

Would it be clearer to rename offset to i, and the loop variable of the for loop below to j? Or to row_id and col_id.

Contributor Author

Done. I have replaced offset with row_id.
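With the suggested naming, a row-parallel backward functor reads naturally as an outer row_id and an inner col_id loop. The sketch below is a hypothetical illustration, not the merged kernel: it computes only dX, assumes x and y have the same shape, and reuses per-row norms cached by a forward pass. It relies on the derivative ∂z_i/∂x_ij = y_ij / (‖x_i‖ ‖y_i‖) − z_i · x_ij / ‖x_i‖². The names CosSimDxFunctor and CosSimDx are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical backward functor for dX of row-wise cosine similarity.
// dx_ij = dz_i * (y_ij / (|x_i| |y_i|) - z_i * x_ij / |x_i|^2)
template <typename T>
struct CosSimDxFunctor {
  CosSimDxFunctor(const T* x, const T* y, const T* x_norm, const T* y_norm,
                  const T* z, const T* dz, T* dx, int cols)
      : x_(x), y_(y), x_norm_(x_norm), y_norm_(y_norm), z_(z), dz_(dz), dx_(dx),
        cols_(static_cast<std::size_t>(cols)) {}

  // One call per row; col_id walks the columns of that row.
  void operator()(std::size_t row_id) const {
    const T xy_norm_prod = x_norm_[row_id] * y_norm_[row_id];
    const T x_norm_square = x_norm_[row_id] * x_norm_[row_id];
    const T scale_xy = dz_[row_id] / xy_norm_prod;
    const T scale_xx = dz_[row_id] * z_[row_id] / x_norm_square;
    for (std::size_t col_id = 0; col_id < cols_; ++col_id) {
      const std::size_t idx = row_id * cols_ + col_id;
      dx_[idx] = scale_xy * y_[idx] - scale_xx * x_[idx];
    }
  }

  const T* x_;
  const T* y_;
  const T* x_norm_;
  const T* y_norm_;
  const T* z_;
  const T* dz_;
  T* dx_;
  std::size_t cols_;
};

// CPU driver: dx is resized to match x, then filled row by row.
template <typename T>
void CosSimDx(const std::vector<T>& x, const std::vector<T>& y,
              const std::vector<T>& x_norm, const std::vector<T>& y_norm,
              const std::vector<T>& z, const std::vector<T>& dz, int cols,
              std::vector<T>* dx) {
  assert(x.size() == y.size() && cols > 0);
  dx->assign(x.size(), T(0));
  CosSimDxFunctor<T> functor(x.data(), y.data(), x_norm.data(), y_norm.data(),
                             z.data(), dz.data(), dx->data(), cols);
  const std::size_t rows = x.size() / static_cast<std::size_t>(cols);
  for (std::size_t row_id = 0; row_id < rows; ++row_id) functor(row_id);
}
```

Hoisting the two per-row scale factors out of the col_id loop means the inner loop does only a multiply-subtract per element.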

@chengduoZH chengduoZH force-pushed the profiling/cosine_op branch 2 times, most recently from 72ce007 to c2577f4 on December 29, 2017 09:37
@typhoonzero typhoonzero left a comment

LGTM++

The kernel block_size could come from a global configuration, since it is rarely changed.

@chengduoZH
Contributor Author

Yes, I will change this in the next PR.
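One way to act on the block_size suggestion, sketched under assumptions: keep a single shared default block size and derive the grid size from the row count, so individual kernels stop hard-coding their own value. The names kDefaultBlockSize and GridSizeFor are hypothetical, and 512 is an arbitrary placeholder, not a value taken from Paddle.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical global launch configuration shared by row-parallel kernels
// (placeholder value).
constexpr int kDefaultBlockSize = 512;

// Number of blocks so that every row gets one thread: ceil(rows / block_size)
// in integer arithmetic. Returns 0 for rows == 0; callers should skip the
// launch in that case.
inline int GridSizeFor(std::size_t rows, int block_size = kDefaultBlockSize) {
  assert(block_size > 0);
  const std::size_t bs = static_cast<std::size_t>(block_size);
  return static_cast<int>((rows + bs - 1) / bs);
}
```

A CUDA launch would then use GridSizeFor(rows) blocks of kDefaultBlockSize threads; for example, 1000 rows gives 2 blocks.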

@chengduoZH chengduoZH merged commit f58fe6d into PaddlePaddle:develop Jan 2, 2018