Use parent Q as a default score instead of 0 for unvisited pv. #828

Mardak · 2019-04-03T21:57:53Z

r? @mooskagh or @Tilps. I just noticed from CCC 7 that lc0's reported eval would sometimes drop from winning to 0. This happens when there is only one TB-allowed move: search stops before finishing a visit to any edge, so the PV score falls back to 0. So just use the root eval as an estimate instead of 0.

E.g., 1418 lc0 vs wasp, with only one DTZ-minimizing move:

position fen 3Q4/8/8/4K3/Q7/8/5k2/8 w - - 1 110
go nodes 100

# before PR
info depth 1 seldepth 1 time 34 nodes 1 score cp 0 hashfull 0 nps 29 tbhits 2 pv d8d2 f2g1

# after PR
info depth 1 seldepth 1 time 120 nodes 1 score cp 6923 hashfull 0 nps 8 tbhits 2 pv d8d2 f2g1

and 1425 senpai vs lc0, with only one winning move:

position fen 8/7P/6K1/8/5P2/1b4k1/7r/8 b - - 2 91
go nodes 100

# before PR
info depth 1 seldepth 1 time 132 nodes 1 score cp 0 hashfull 0 nps 7 tbhits 2 pv b3c2 g6g7

# after PR
info depth 1 seldepth 1 time 48 nodes 1 score cp 3093 hashfull 0 nps 20 tbhits 2 pv b3c2 g6g7
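The fallback described above can be sketched roughly as follows. This is a minimal illustration, not lc0's actual code: `Node`, `edge_q`, and `reported_q` are hypothetical names, and it assumes all Q values are already expressed from the point of view of the side to move at the root, so no sign flips are shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical search node; Q is the mean value in [-1, 1]."""
    visits: int = 0
    value_sum: float = 0.0

    @property
    def q(self) -> float:
        # Mean of backed-up values; 0.0 only as a placeholder when unvisited.
        return self.value_sum / self.visits if self.visits else 0.0


def edge_q(child: Optional[Node], default_q: float) -> float:
    """Q of an edge, or default_q if its child has no completed visit."""
    if child is not None and child.visits > 0:
        return child.q
    return default_q


def reported_q(root: Node, best_child: Optional[Node], use_parent_q: bool) -> float:
    """Score to report for the PV: the old behavior defaults to 0, the new one to root Q."""
    default_q = root.q if use_parent_q else 0.0
    return edge_q(best_child, default_q)


# Root already has an eval, but the only allowed move has no finished visit yet.
root = Node(visits=1, value_sum=0.95)
unvisited = Node()
print(reported_q(root, unvisited, use_parent_q=False))  # old default
print(reported_q(root, unvisited, use_parent_q=True))   # parent Q as default
```

Once the child has a completed visit, both variants return the child's own Q, so the default only matters in the unvisited case.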

Tilps · 2019-04-03T22:30:48Z

While this change seems pretty straightforwardly an improvement, my understanding is that even with 1 legal edge, we should do 1 visit to that edge before aborting search unless another termination condition fires (like running out of time). Otherwise, in training you can end up with no policy distribution, which should crash. So maybe something else needs fixing too?
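To illustrate the concern about an empty policy distribution: if the training target is the normalized child visit counts, zero total visits leaves nothing to normalize. The sketch below is a hypothetical helper (`policy_from_visits` is not an lc0 function) and only assumes that the policy target is derived from visit counts.

```python
from typing import Dict


def policy_from_visits(child_visits: Dict[str, int]) -> Dict[str, float]:
    """Normalize per-move visit counts into a probability distribution."""
    total = sum(child_visits.values())
    if total == 0:
        # No edge finished a visit, so there is no distribution to emit.
        raise ValueError("no child visits: policy distribution is undefined")
    return {move: n / total for move, n in child_visits.items()}
```

With a single legal move that received its one visit, this yields a distribution with all mass on that move; with zero visits it raises instead of silently producing an invalid target.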

Mardak · 2019-04-03T23:25:30Z

The edge is being visited. One thread sets 1 move left, extends the only move, and is waiting for the NN; but another thread has nothing to do, so it jumps straight to updating counters and triggers stop after noticing there was 1 move left.

The node does come back soon after stop prints the UCI info, so it does end up loading the policy for the next move.
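The ordering described here can be reproduced in a small deterministic sketch. It is not lc0's threading code: `run_demo`, `OnlyEdge`, and the two events are hypothetical stand-ins, with `nn_done` playing the role of the NN eval returning. It assumes the stop path reads the edge before the in-flight visit lands.

```python
import threading


class OnlyEdge:
    """Hypothetical single edge with completed-visit statistics."""
    def __init__(self):
        self.visits = 0
        self.value_sum = 0.0
        self.lock = threading.Lock()


def run_demo(parent_q: float, nn_value: float) -> dict:
    edge = OnlyEdge()
    picked = threading.Event()   # a worker has extended the only move
    nn_done = threading.Event()  # stand-in for the NN eval completing
    report = {}

    def worker():
        picked.set()             # visit is now in flight
        nn_done.wait()           # blocked on the NN
        with edge.lock:
            edge.visits += 1
            edge.value_sum += nn_value

    def stopper():
        picked.wait()
        # Only one move left: stop and report without waiting for the visit.
        with edge.lock:
            report["visits_at_stop"] = edge.visits
            report["score_q"] = (
                edge.value_sum / edge.visits if edge.visits else parent_q
            )
        nn_done.set()            # the eval comes back after the report

    threads = [threading.Thread(target=worker), threading.Thread(target=stopper)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    report["visits_after_join"] = edge.visits
    return report
```

In this sketch the report is produced with zero completed visits, so the score comes from the parent Q default, and the visit still lands before the threads are joined.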

Tilps · 2019-04-03T23:39:09Z

Training is single-threaded, but it still has a watchdog which could fire the best move callback. However, training uses RunBlocking, so the visit is guaranteed to land before we exit and generate the policy snapshot. Seems a bit fragile, but it works I guess. (Until someone tries to fix the slow shutdown after the watchdog fires, which happens because we don't cancel in-flight NN evals.)

Mardak · 2019-04-07T12:03:12Z

CCC 7: Blitz Bonanza Final (5|2) game 76 had an interesting KNNvKP ending that started with 3 consecutive moves, each with only 1 DTZ-minimizing move. The result looked like a blunder from +128 to 0.00, especially with Leelenstein reporting only a +0.08 eval. Fortunately this wasn't adjudicated as a draw, but playing around with the position, if the black king moves in a specific pattern, there could be 10+ consecutive single DTZ-minimizing moves, all showing as 0.00.

mooskagh · 2019-04-13T20:14:18Z

Ensuring at least one child visit seems to be a separate issue, so I think I'll merge this one for now.

Use parent Q as a default score instead of 0 for unvisited pv.

b0361ff

mooskagh approved these changes Apr 13, 2019

mooskagh merged commit 7241fd4 into LeelaChessZero:master Apr 13, 2019

Mardak deleted the default-score branch April 13, 2019 20:54
