SE layer fix when not using fused kernel #852
Conversation
ankan-ban
commented
May 16, 2019
- Only the SE layer needs transposed weights.
- We need to store the non-transposed weights too in case we have to fall back.
use bestmove_is_sent_ for Search::IsSearchActive() (LeelaChessZero#502)
get latest
- Replace all cudaMemcpyAsync calls used for loading weights with cudaMemcpy, as the source (in CPU memory) could be deleted before the async version of the function actually performs the copy.
- Minor naming/style changes.
- Add a comment explaining what the policy map layer does and how the layout conversion from CHW to HWC works.
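The CHW-to-HWC conversion mentioned in the commit message can be illustrated with a small host-side sketch. This is not the actual lc0 CUDA code; `ChwToHwc` is a hypothetical helper showing only the index mapping between the two layouts:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from lc0): converts a tensor stored in
// CHW layout (channel-major) to HWC layout (channel-minor).
// CHW index: c * H * W + h * W + w
// HWC index: h * W * C + w * C + c
std::vector<float> ChwToHwc(const std::vector<float>& chw,
                            int C, int H, int W) {
  std::vector<float> hwc(chw.size());
  for (int c = 0; c < C; ++c)
    for (int h = 0; h < H; ++h)
      for (int w = 0; w < W; ++w)
        hwc[h * W * C + w * C + c] = chw[c * H * W + h * W + w];
  return hwc;
}
```

The same element keeps its (c, h, w) coordinates; only the linear offset changes, which is why the conversion is a pure re-indexing copy.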
- Only the SE layer needs transposed weights. - We need to store the non-transposed weights too in case we have to fall back.
I'm confused: the PR title says this is a fix for when the fused kernel is not used, but I see zero logical changes outside of the kUseFusedSELayer paths.
The issue was that we were always transposing the FC-layer weights when kUseFusedSELayer is true (in the load-weights function). If a matching filter size isn't found, the fused implementation can return failure and we fall back to the non-fused path (even when kUseFusedSELayer is set), and the non-fused path requires non-transposed weights.
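A minimal sketch of the fix being described: keep the original row-major FC weights for the non-fused fallback and additionally store a transposed copy for the fused SE kernel. The names (`Transpose`, `FcWeights`, `LoadFcWeights`) are hypothetical and not the actual lc0 code:

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch (not the real lc0 types): transpose an
// rows x cols row-major matrix into a cols x rows row-major matrix.
std::vector<float> Transpose(const std::vector<float>& m, int rows, int cols) {
  std::vector<float> t(m.size());
  for (int r = 0; r < rows; ++r)
    for (int c = 0; c < cols; ++c)
      t[c * rows + r] = m[r * cols + c];
  return t;
}

// Store both layouts: the fused SE kernel wants transposed weights,
// while the non-fused fallback path needs the original layout.
struct FcWeights {
  std::vector<float> original;    // used by the non-fused fallback
  std::vector<float> transposed;  // used by the fused SE kernel
};

FcWeights LoadFcWeights(const std::vector<float>& w, int rows, int cols) {
  return FcWeights{w, Transpose(w, rows, cols)};
}
```

Keeping both copies costs only memory at load time, which matches the comment below that the flag has no runtime performance impact.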
Ahh, could kUseFusedSELayer alternatively not be a constant, but instead be calculated from whether we actually support fusing, so that there is no fallback?
There should be no performance impact (only one extra copy is done at load time). The kUseFusedSELayer flag was only ever intended as a compile-time switch for debugging.