SE layer fix when not using fused kernel #852
Conversation
ankan-ban
commented
May 16, 2019
- Only the SE layer needs transposed weights.
- We need to store the non-transposed weights too in case we have to fall back.
use bestmove_is_sent_ for Search::IsSearchActive() (LeelaChessZero#502)
get latest
- Replace all cudaMemcpyAsync calls used for loading weights with cudaMemcpy, as the source (in CPU memory) could be deleted before the async version of the function actually performs the copy.
- Minor naming/style changes.
- Add a comment explaining what the policy map layer does and how the layout conversion from CHW to HWC works.
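The CHW-to-HWC conversion mentioned in the commit message can be illustrated with a small host-side sketch. This is not the actual lc0 CUDA code; `ChwToHwc` is a hypothetical helper showing only the index mapping between the two layouts:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from lc0): converts a tensor stored in
// CHW layout (channel-major) to HWC layout (channel-minor).
// CHW index: c * H * W + h * W + w
// HWC index: h * W * C + w * C + c
std::vector<float> ChwToHwc(const std::vector<float>& chw,
                            int C, int H, int W) {
  std::vector<float> hwc(chw.size());
  for (int c = 0; c < C; ++c)
    for (int h = 0; h < H; ++h)
      for (int w = 0; w < W; ++w)
        hwc[h * W * C + w * C + c] = chw[c * H * W + h * W + w];
  return hwc;
}
```

The same element keeps its (c, h, w) coordinates; only the linear offset changes, which is why the conversion is a pure re-indexing copy.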
- Only the SE layer needs transposed weights. - We need to store the non-transposed weights too in case we have to fall back.
I'm confused: the PR title says this is a fix for when the fused kernel is not used, but I see zero logical changes outside of the kUseFusedSELayer paths.
The issue was that we were always transposing the FC-layer weights when kUseFusedSELayer is true (in the load-weights function). If a matching filter size isn't found, the fused implementation can return failure and we fall back to the non-fused path (even when kUseFusedSELayer is set), and the non-fused path requires non-transposed weights.
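A minimal sketch of the fix being described: keep the original row-major FC weights for the non-fused fallback and additionally store a transposed copy for the fused SE kernel. The names (`Transpose`, `FcWeights`, `LoadFcWeights`) are hypothetical and not the actual lc0 code:

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch (not the real lc0 types): transpose an
// rows x cols row-major matrix into a cols x rows row-major matrix.
std::vector<float> Transpose(const std::vector<float>& m, int rows, int cols) {
  std::vector<float> t(m.size());
  for (int r = 0; r < rows; ++r)
    for (int c = 0; c < cols; ++c)
      t[c * rows + r] = m[r * cols + c];
  return t;
}

// Store both layouts: the fused SE kernel wants transposed weights,
// while the non-fused fallback path needs the original layout.
struct FcWeights {
  std::vector<float> original;    // used by the non-fused fallback
  std::vector<float> transposed;  // used by the fused SE kernel
};

FcWeights LoadFcWeights(const std::vector<float>& w, int rows, int cols) {
  return FcWeights{w, Transpose(w, rows, cols)};
}
```

Keeping both copies costs only memory at load time, which matches the comment below that the flag has no runtime performance impact.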
Ahh, could kUseFusedSELayer alternatively not be a constant, but instead be calculated from whether we actually support fusing, so that there is no fallback?
There should be no performance impact (only one extra copy is done at load time). The kUseFusedSELayer flag was only ever intended as a compile-time switch for debugging.