
Magic bitboards #628

Closed
wants to merge 10 commits into from

Conversation


@ddobbelaere ddobbelaere commented Dec 29, 2018

This PR adds "fancy magic bitboards" to move generation.

The following measurements all share the command line `lc0 benchmark --backend=random --smart-pruning-factor=0 --nodes=1000000 --nncache=1000000`. For each position, a 95% confidence interval for the mean of the last reported nps value over 25 runs is given. The positions are the starting position, the opening position https://lichess.org/editor/rn2kb1r/pp3ppp/2p1pnb1/8/3P1N1q/4B1N1/PPPQ1PPP/R3KB1R_b_KQkq_-_5_9 and the endgame position https://lichess.org/editor/8/2R2P2/1p4K1/6R1/1r6/5rP1/1k6/2q5_w_-_-_0_55 .

The measurements were performed on an Intel Core i5-5300U CPU @ 2.30GHz with 4 cores (L1d cache: 32K, L1i cache: 32K, L2 cache: 256K, L3 cache: 3072K).

For 1 search thread there is a substantial speedup for all positions:

| position | master 0af66eee (nps) | PR #628 (nps) | speedup |
| --- | --- | --- | --- |
| start | 186968 ± 313 | 189317 ± 229 | +1.3% |
| opening | 141347 ± 340 | 144938 ± 354 | +2.5% |
| endgame | 154753 ± 248 | 162247 ± 208 | +4.8% |

For 2 search threads the performance drops slightly on my system for all positions:

| position | master 0af66eee (nps) | PR #628 (nps) | speedup |
| --- | --- | --- | --- |
| start | 227397 ± 565 | 226473 ± 745 | -0.4% |
| opening | 180583 ± 442 | 178087 ± 569 | -1.4% |
| endgame | 191124 ± 530 | 188983 ± 463 | -1.1% |

I think this can be completely attributed to additional cache misses: the 3MB L3 cache is shared between all cores on my CPU (and is not that big, really). Note that the rook attack tables are ~800KB and the bishop attack tables are ~40KB.

The speedup will probably be positive for 2 search threads on higher-end CPUs. Feel free to test!

@mooskagh
Member

FYI there are pure movegen tests (without any MCTS) in board_test.cc.
They don't measure time, but that can be added (it can also be measured by an external program, like bash's time).


gsobala commented Dec 30, 2018

As there have been recent search improvements, I merged this PR with current master (6a639b6) and tested it against current master with 2 threads on an 8-core Xeon (L1 cache 32K/32K ×8, L2 cache 8MB, L3 cache 11MB) running Ubuntu, using the starting position as well as your opening FEN and ending FEN. I took the mean of 20 tests.

PR628 was 1.25% faster than master on the opening FEN and 2.3% faster than master on the ending FEN.


ddobbelaere commented Dec 30, 2018

The chessboard test is 25% faster with this PR.

Thanks for the tests @gsobala. I still obtain more or less the same speedup results as mentioned earlier with current master.

I should also mention sv's single-run measurements from Discord for the starting position (on his 2× Intel Xeon Silver 4108 @ 1.8 GHz, 11M L3 cache each).

| threads | master (nps) | PR628 (nps) |
| --- | --- | --- |
| 1 | 112908 | 112711 |
| 2 | 129790 | 116351 |
| 4 | 125918 | 129308 |
| 8 | 109047 | 98344.8 |
| 16 | 103957 | 90542.6 |
| 32 | 95132 | 82001.5 |

Before moving on, I'd like to pinpoint the root cause of the slowdown observed in some cases on my machine (cache misses?) and add possible optimizations.

@ddobbelaere ddobbelaere changed the title Magic bitboards [WIP] Magic bitboards Dec 30, 2018

ddobbelaere commented Dec 30, 2018

I verified that cache misses are indeed the cause of the slowdown in some cases.

I tried a lot of code optimizations to further reduce CPU cycles (e.g. using the faster PEXT instruction present on modern CPUs to index the attack tables, effectively eliminating the index calculation with magic numbers), to no avail, as this doesn't solve the cache misses.

I think that lc0 really differs from A/B engines in this respect. In A/B engines, nps are orders of magnitude higher (several Mnps) and cache misses are reduced by the "batched movegen" behavior. lc0 does a lot of other work in between generating positions (MCTS, node cache lookups, ...) that competes for the same cache.

This explains why chessboard_test runs 25% faster using magic bitboards (even 27% with the PEXT instruction): the caches stay hot while running movegen-only code. Using magic bitboards in lc0 itself doesn't have the hoped-for effect, due to the more costly memory accesses to the rook/bishop attack tables.

@ddobbelaere ddobbelaere changed the title [WIP] Magic bitboards Magic bitboards Dec 30, 2018
@ddobbelaere
Contributor Author

Turns out that with LTO enabled, I no longer experience any slowdown (there is a speedup in all considered cases). Magic bitboards are now implemented in #640, as an additional layer on top of #638, to avoid merge conflicts with this PR.
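For anyone reproducing the LTO result with lc0's Meson build: link-time optimization can be toggled through Meson's built-in `b_lto` option (the build directory path below is illustrative):

```shell
# Enable LTO for an existing build directory, then rebuild:
meson configure -Db_lto=true build/release
ninja -C build/release
```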
