Refine cos-sim-op #6601
This way of writing may not be good, but in terms of speed it is 10 times faster than before using Eigen.
Config and Env
The experimental environment is the same as that described above; only the code is different.
… profiling/cosine_op_debug
paddle/operators/cos_sim_op.h
      z_(z),
      cols_(static_cast<size_t>(cols)) {}

  inline HOSTDEVICE void operator()(size_t offset) const {
Changing offset to i, and the loop variable of the for below to j, may be clearer? Or row_id and col_id.
Done. I have replaced offset with row_id.
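For readers following the naming discussion, here is a minimal CPU-only sketch of a per-row cosine-similarity functor that uses row_id and col_id. The names CosSimRowFunctor and CosSimRows, the member layout, and the single output z are assumptions for illustration, not the code in this PR. HOSTDEVICE is left out so the sketch compiles as plain C++.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical functor (not the PR's code): computes the cosine similarity
// between row row_id of x and row row_id of y, writing the result to z.
template <typename T>
struct CosSimRowFunctor {
  CosSimRowFunctor(const T* x, const T* y, T* z, std::size_t cols)
      : x_(x), y_(y), z_(z), cols_(cols) {}

  // row_id selects the row; col_id walks the columns of that row.
  inline void operator()(std::size_t row_id) const {
    const T* x = x_ + row_id * cols_;
    const T* y = y_ + row_id * cols_;
    T xx = 0, yy = 0, xy = 0;
    for (std::size_t col_id = 0; col_id < cols_; ++col_id) {
      xx += x[col_id] * x[col_id];
      yy += y[col_id] * y[col_id];
      xy += x[col_id] * y[col_id];
    }
    z_[row_id] = xy / (std::sqrt(xx) * std::sqrt(yy));
  }

  const T* x_;
  const T* y_;
  T* z_;
  std::size_t cols_;
};

// Applies the functor to every row sequentially. In a GPU kernel, each
// row_id would instead be derived from the thread and block indices.
template <typename T>
std::vector<T> CosSimRows(const std::vector<T>& x, const std::vector<T>& y,
                          std::size_t rows, std::size_t cols) {
  std::vector<T> z(rows);
  CosSimRowFunctor<T> functor(x.data(), y.data(), z.data(), cols);
  for (std::size_t row_id = 0; row_id < rows; ++row_id) functor(row_id);
  return z;
}
```

With row_id as the functor argument and col_id as the inner loop variable, the row offset arithmetic (row_id * cols_) reads directly, which is the readability gain the review comment was after.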
LGTM++
The kernel block_size could use some global configuration, since it's rarely changed.
Yes, I will change this in the next PR.
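One possible shape for such a global setting, sketched under assumptions: the constant name kDefaultBlockSize, its value of 512, and the helper GridSizeFor are all hypothetical and do not come from this PR or from the Paddle code base.

```cpp
#include <cstddef>

// Hypothetical global default for the kernel block size; the name and the
// value 512 are assumptions chosen only for illustration.
constexpr std::size_t kDefaultBlockSize = 512;

// Number of blocks needed so that grid * block_size covers all rows
// (ceiling division). Callers can still override block_size per launch.
constexpr std::size_t GridSizeFor(std::size_t rows,
                                  std::size_t block_size = kDefaultBlockSize) {
  return (rows + block_size - 1) / block_size;
}
```

Keeping the default in one place means each kernel launch only computes its grid size from the row count, and the block size can be tuned once rather than per operator.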
Fix #6486
Experiments Env:
Code:
Total time of 1 Pass:
I found that the GPU runs a little slower than the CPU. @typhoonzero's statistics in this issue also show the GPU being a little slower.