Use parent Q as a default score instead of 0 for unvisited pv. #828

Merged Apr 13, 2019 (1 commit)

Mardak (Contributor) commented Apr 3, 2019

r? @mooskagh or @Tilps. I just noticed from CCC 7 that lc0's reported eval would sometimes drop from winning to 0. This happens when there is only one TB-allowed move: search stops before finishing a visit to any edge, so the PV move has no visits and its score defaults to 0. This PR uses the root eval as the estimate instead of 0.

E.g., 1418 lc0 vs wasp, with one DTZ-minimizing move:

position fen 3Q4/8/8/4K3/Q7/8/5k2/8 w - - 1 110
go nodes 100

# before PR
info depth 1 seldepth 1 time 34 nodes 1 score cp 0 hashfull 0 nps 29 tbhits 2 pv d8d2 f2g1

# after PR
info depth 1 seldepth 1 time 120 nodes 1 score cp 6923 hashfull 0 nps 8 tbhits 2 pv d8d2 f2g1

and 1425 senpai vs lc0, with one winning move:

position fen 8/7P/6K1/8/5P2/1b4k1/7r/8 b - - 2 91
go nodes 100

# before PR
info depth 1 seldepth 1 time 132 nodes 1 score cp 0 hashfull 0 nps 7 tbhits 2 pv b3c2 g6g7

# after PR
info depth 1 seldepth 1 time 48 nodes 1 score cp 3093 hashfull 0 nps 20 tbhits 2 pv b3c2 g6g7
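
The fallback described above can be sketched in a few lines. This is only an illustration under stated assumptions, not lc0's actual code: `Edge`, `pv_score`, and the `use_parent_q` flag are hypothetical names, and all values are assumed to be expressed from the root side's perspective.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """One candidate move at the root (hypothetical stand-in for a search edge)."""
    move: str
    visits: int = 0          # completed visits only
    value_sum: float = 0.0   # sum of backed-up values, root side's perspective

    def q(self, default: float) -> float:
        # With no completed visits there is no average to report,
        # so fall back to the caller-supplied default.
        if self.visits == 0:
            return default
        return self.value_sum / self.visits


def pv_score(root_q: float, best: Edge, use_parent_q: bool) -> float:
    """Score reported for the first PV move."""
    default = root_q if use_parent_q else 0.0
    return best.q(default)


only_move = Edge("d8d2")                 # extended, but its eval has not landed yet
print(pv_score(0.99, only_move, False))  # old behaviour: falls back to 0
print(pv_score(0.99, only_move, True))   # new behaviour: falls back to the root eval
```

Once the edge has at least one completed visit, the default is never consulted, so the change only affects what is printed in the window before the first visit lands.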

Tilps (Contributor) commented Apr 3, 2019

While this change seems like a pretty straightforward improvement, my understanding is that even with 1 legal edge, we should do 1 visit to that edge before aborting search, unless another termination condition fires (like running out of time). Otherwise, in training you can end up with no policy distribution, which should crash. So maybe something else needs fixing too?
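
To make the training concern concrete, here is a minimal sketch, assuming the policy target is built by normalizing root child visit counts. `policy_from_visits` is a hypothetical helper, and the explicit exception merely stands in for whatever failure the real code would hit.

```python
def policy_from_visits(visits: dict) -> dict:
    """Normalize per-move visit counts into a probability distribution."""
    total = sum(visits.values())
    if total == 0:
        # Search stopped before any child visit completed:
        # there is nothing to normalize.
        raise ValueError("no child visits, so there is no policy distribution")
    return {move: n / total for move, n in visits.items()}


print(policy_from_visits({"b3c2": 1}))   # one completed visit is enough

try:
    policy_from_visits({"b3c2": 0})      # stop fired before the visit landed
except ValueError as err:
    print("error:", err)
```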

Mardak (Contributor, Author) commented Apr 3, 2019

The edge is being visited. One thread sets 1 move left, extends the only move, and waits for the NN; meanwhile another thread has nothing to do, jumps straight to updating counters, notices there is 1 move left, and triggers the stop.

The node does come back soon after the stop has printed the UCI info, so the policy does end up loaded for the next move.
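
The ordering described here can be reproduced with a small simulation. It is a sketch under assumptions, not the real search: `Search`, `worker_extend`, `worker_stop`, and `run` are hypothetical names, and two `threading.Event` objects force the interleaving so the result is deterministic.

```python
import threading


class Search:
    def __init__(self, root_q: float):
        self.root_q = root_q
        self.edge_visits = 0
        self.edge_value_sum = 0.0
        self.reported = None
        self.in_flight = threading.Event()  # set once the eval has been requested
        self.nn_done = threading.Event()    # set when the simulated NN eval returns

    def worker_extend(self):
        # First thread: extends the only move, then blocks waiting for the NN.
        self.in_flight.set()
        self.nn_done.wait()
        self.edge_visits += 1
        self.edge_value_sum += 0.95

    def worker_stop(self, use_parent_q: bool):
        # Second thread: nothing to extend, sees one move left, stops and reports.
        self.in_flight.wait()
        if self.edge_visits == 0:
            self.reported = self.root_q if use_parent_q else 0.0
        else:
            self.reported = self.edge_value_sum / self.edge_visits


def run(use_parent_q: bool):
    s = Search(root_q=0.9)
    a = threading.Thread(target=s.worker_extend)
    b = threading.Thread(target=s.worker_stop, args=(use_parent_q,))
    a.start()
    b.start()
    b.join()          # the stop and its info line come first
    s.nn_done.set()   # the eval returns shortly afterwards
    a.join()
    return s.reported, s.edge_visits


print(run(use_parent_q=False))
print(run(use_parent_q=True))
```

In both runs the edge ends with one visit, so the visit is not lost; only the score printed at stop time differs.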

Tilps (Contributor) commented Apr 3, 2019

Training is single-threaded, but it still has the watchdog, which could fire the best-move callback. However, training uses RunBlocking, so the visit is guaranteed to land before we exit and generate the policy snapshot. That seems a bit fragile, but it works, I guess. (Until someone tries to fix the slow shutdown after the watchdog fires, which happens because we don't cancel in-flight NN evals.)

Mardak (Contributor, Author) commented Apr 7, 2019

CCC 7: Blitz Bonanza Final (5|2) game 76 had an interesting KNNvKP ending that started with 3 consecutive moves, each with only 1 DTZ-minimizing move. The result looked like a blunder from +128 to 0.00, especially with Leelenstein reporting only a +0.08 eval. Fortunately this wasn't adjudicated as a draw, but playing around with the position, if the black king moves in a specific pattern, there could be 10+ consecutive single DTZ-minimizing moves, all showing as 0.00.

mooskagh (Member) commented

Ensuring at least one child visit seems to be a separate issue, so I think I'll merge this one for now.

@mooskagh mooskagh merged commit 7241fd4 into LeelaChessZero:master Apr 13, 2019
@Mardak Mardak deleted the default-score branch April 13, 2019 20:54