
fix calculation of max size needed to hold tensors #892

Merged: 21 commits into LeelaChessZero:master from the bugfix branch on Jun 16, 2019

Conversation

@ankan-ban (Member) commented Jun 16, 2019

  1. kNumFilters can be smaller than kNumInputPlanes. We need to take the max of the two when computing the sizes of intermediate buffers.
  2. For WDL, we need 3x the memory for holding the value head output (see the sketch below).
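A minimal sketch of the corrected sizing logic (illustrative only, not the actual lc0 CUDA backend code; kNumFilters and kNumInputPlanes are from the description above, while the board-size constant, batch handling, and function names are assumptions):

```cpp
#include <algorithm>
#include <cstddef>

constexpr size_t kBoardSquares = 64;  // 8x8 chess board

// Intermediate convolution buffers: before the fix the size was derived from
// the filter count alone; if the network has fewer filters than input planes,
// the input tensor would not fit in the buffer.
size_t ConvBufferElements(size_t num_input_planes, size_t num_filters,
                          size_t batch_size) {
  return batch_size * std::max(num_filters, num_input_planes) * kBoardSquares;
}

// Value head output: a WDL net emits three values (win/draw/loss) per
// position instead of a single scalar, so the allocation must be 3x as large.
size_t ValueHeadOutputElements(size_t batch_size, bool wdl) {
  return batch_size * (wdl ? 3 : 1);
}
```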

use bestmove_is_sent_ for Search::IsSearchActive() (LeelaChessZero#502)
- replace all cudaMemcpyAsync calls used for loading weights with cudaMemcpy, as the source (in CPU memory) could be freed before the async version of the function actually performs the copy (see the sketch after this list).
- minor naming/style changes.
- add a comment explaining what the policy map layer does and how the layout conversion from CHW to HWC works.
- just add template instantiations.
- verified that it works and provides a (very) slight speedup.
- the number of filters can be less than the number of input planes!
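The cudaMemcpy change follows the pattern below. This is a hedged sketch, not the actual weight-loading code; UploadWeights and the std::vector source are hypothetical stand-ins:

```cpp
#include <cuda_runtime.h>
#include <vector>

// cudaMemcpyAsync only enqueues the transfer; if the host-side source buffer
// is freed before the copy actually executes, the GPU may read garbage. The
// synchronous cudaMemcpy returns only after the source has been consumed, so
// a temporary host buffer can be released safely right afterwards.
void UploadWeights(float* device_dst, const std::vector<float>& host_weights) {
  cudaMemcpy(device_dst, host_weights.data(),
             host_weights.size() * sizeof(float), cudaMemcpyHostToDevice);
}
```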
@ankan-ban ankan-ban requested a review from Tilps June 16, 2019 06:51
@oscardssmith (Contributor)

Why does wdl change the policy head?

@ankan-ban (Member, Author)

It doesn't. I meant the value head. Updated description.

@ankan-ban ankan-ban merged commit 7615177 into LeelaChessZero:master Jun 16, 2019
@ankan-ban ankan-ban deleted the bugfix branch June 16, 2019 17:15
Tilps pushed a commit that referenced this pull request Jul 21, 2019
* fix maxSize for tensor buffers

  - the number of filters can be less than the number of input planes!

* fix calculation for value head allocation too

  - need 3x the size for WDL

* take max of all layers when computing size to avoid assumptions (see the sketch below)
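The final "take max of all layers" commit generalizes the idea: rather than assuming any particular layer is the largest, size the shared scratch buffer from the maximum over all of them. A rough sketch under assumed names (the per-layer size list is illustrative, not the real backend interface):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// layer_output_sizes: output element count of every layer in the network.
// The shared buffer must be able to hold the largest of them, whichever
// layer that happens to be (input planes, conv filters, or the value head).
size_t MaxScratchElements(const std::vector<size_t>& layer_output_sizes) {
  size_t max_size = 0;
  for (size_t s : layer_output_sizes) max_size = std::max(max_size, s);
  return max_size;
}
```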