Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SE layer fix when not using fused kernel #852

Merged
merged 14 commits into from
May 17, 2019

Conversation

ankan-ban
Copy link
Member

  • only SE layer needs transposed weights.
  • need to store non-transposed weights too in case we have to fall back.

use bestmove_is_sent_ for Search::IsSearchActive() (LeelaChessZero#502)
- replace all cudaMemcpyAsync used for loading weights with cudaMemcpy as  source (in CPU memory) could be deleted before the async version of the function actually does the copy.
- minor naming/style changes.
- add comment explaining what the policy map layer does and how the layout conversion from CHW to HWC works.
- only SE layer needs transposed weights.
- need to store non-transposed weights too in case we have to fall back.
@Tilps
Copy link
Contributor

Tilps commented May 17, 2019

I'm feeling confused - the PR title says this is a fix for when not using fused kernel - but I see 0 logical changes outside of the kUseFusedSELayer paths.

@ankan-ban
Copy link
Member Author

The issue was that we were always transposing weights for the FC layers when kUseFusedSELayer is true (in load weights function). If a matching filter size isn't found, the fused implementation can return failure and we fall back to non fused path (even when kUseFusedSELayer is set), and the non fused path requires non transposed weights.

@Tilps
Copy link
Contributor

Tilps commented May 17, 2019

Ahh - could alternatively kUseFusedSELayer not be a constant but instead calculated by whether we actually support fusing - so there is no fallback?
I assume there is negligable performance impact of this change as is? If so then maybe it doesn't matter.

@ankan-ban
Copy link
Member Author

ankan-ban commented May 17, 2019

There should be no performance impact (only one extra copy done at loading time).

kUseFusedSELayer flag was supposed to be only a compile time switch for debugging.

@ankan-ban ankan-ban merged commit 07babd1 into LeelaChessZero:master May 17, 2019
@ankan-ban ankan-ban deleted the misc branch May 17, 2019 07:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants