Update NNUE architecture to SFNNv8: L1-2560 nn-ac1dbea57aa3.nnue #4795

Closed
wants to merge 1 commit into from

linrock (Contributor) commented Sep 21, 2023

Creating this net involved:
- a 6-stage training process from scratch. The datasets used in stages 1-5 were fully minimized.
- permuting L1 weights with official-stockfish/nnue-pytorch#254

A strong epoch after each training stage was chosen for the next. The 6 stages were:

```
1. 400 epochs, lambda 1.0, default LR and gamma
   UHOx2-wIsRight-multinet-dfrc-n5000 (135G)
     nodes5000pv2_UHO.binpack
     data_pv-2_diff-100_nodes-5000.binpack
     wrongIsRight_nodes5000pv2.binpack
     multinet_pv-2_diff-100_nodes-5000.binpack
     dfrc_n5000.binpack

2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12
   LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G)
     T60T70wIsRightFarseerT60T74T75T76.binpack
     test78-junjulaug2022-16tb7p.no-db.min.binpack
     test79-mar2022-16tb7p.no-db.min.binpack
     test80-dec2022-16tb7p.no-db.min.binpack

3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20
   leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack
     leela93-filt-v1.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test78-janfeb2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-apr2022-16tb7p.min.binpack
     test80-may2022-16tb7p.min.binpack

4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24
   leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack
     leela96-filt-v2.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test79-may2022-16tb7p.filter-v6-dd.min.binpack
     test80-jun2022-16tb7p.filter-v6-dd.min.binpack
     test80-sep2022-16tb7p.filter-v6-dd.min.binpack
     test80-nov2022-16tb7p.filter-v6-dd.min.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-mar2023-2tb7p.v6-sk16.min.binpack
     test60-novdec2021-16tb7p.min.binpack
     test77-dec2021-16tb7p.min.binpack
     test78-aprmay2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-may2023-2tb7p.min.binpack

5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 960 near the end of the first 800 epochs
   5af11540bbfe dataset: https://github.com/official-stockfish/Stockfish/pull/4635

6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 1000 near the end of the first 800 epochs
   1ee1aba5ed dataset: https://github.com/official-stockfish/Stockfish/pull/4782
```
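Each stage above is described by max-epoch, lambda, LR, gamma, and skip. The sketch below turns one stage into a per-epoch schedule. It rests on two assumptions about the trainer that this PR does not state: that lambda moves linearly from a start value (taken as 1.0) to end-lambda over max-epoch, and that the learning rate is multiplied by gamma once per epoch. `Stage` and `schedule` are hypothetical names, not nnue-pytorch code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Hyperparameters of one training stage (hypothetical representation)."""
    max_epoch: int
    end_lambda: float
    lr: float
    gamma: float
    start_lambda: float = 1.0  # assumption: lambda starts at 1.0 in every stage


def schedule(stage: Stage, epoch: int) -> tuple:
    """Return (lambda, learning rate) at `epoch` under the assumed schedule."""
    if not 0 <= epoch <= stage.max_epoch:
        raise ValueError("epoch must be in [0, max_epoch]")
    # Assumed: lambda moves linearly from start_lambda to end_lambda over max_epoch.
    lam = stage.start_lambda + (stage.end_lambda - stage.start_lambda) * epoch / stage.max_epoch
    # Assumed: the learning rate is multiplied by gamma once per epoch.
    lr = stage.lr * stage.gamma ** epoch
    return lam, lr


# Stage 2 above: 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995
stage2 = Stage(max_epoch=800, end_lambda=0.75, lr=4.375e-4, gamma=0.995)
for epoch in (0, 400, 800):
    lam, lr = schedule(stage2, epoch)
    print(f"epoch {epoch:4d}: lambda {lam:.4f}, lr {lr:.3e}")
```

Under these assumptions, stage 2 ends with lambda at 0.75 and the learning rate at roughly 7.9e-6, about 1.8% of its starting value. In this sketch the lambda ramp depends on max_epoch while the LR decay depends only on the epoch count, so raising max-epoch partway through (as in stages 5 and 6) would stretch the former but not the latter.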

L1 weights permuted with:
```bash
# $nnue: input net; $nnue_permuted: output path for the permuted net
python3 serialize.py "$nnue" "$nnue_permuted" \
  --features=HalfKAv2_hm \
  --ft_optimize \
  --ft_optimize_data=/data/fishpack32.binpack \
  --ft_optimize_count=10000
```
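Permuting the L1 weights reorders the feature-transformer neurons. As long as the same permutation is applied to the feature-transformer rows and biases and to the matching input columns of the L1 layer, the network computes exactly the same function; only the memory layout changes, which matters for how efficiently sparse activations can be processed. The toy sketch below checks that invariance on a made-up two-layer network in plain Python and counts all-zero groups of 4 activations, the kind of sparsity such a permutation is assumed here to target. It is not the ftperm.py algorithm (which, judging by the `--ft_optimize_data` and `--ft_optimize_count` flags, picks the permutation from data), and the sizes are not SFNNv8's.

```python
import random

random.seed(0)
N_FEATURES, N_FT, N_OUT = 32, 8, 4  # toy sizes, not SFNNv8's


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


ft_w = rand_matrix(N_FT, N_FEATURES)  # one row per FT neuron
ft_b = [random.gauss(0.0, 1.0) for _ in range(N_FT)]
l1_w = rand_matrix(N_OUT, N_FT)       # one column per FT neuron
l1_b = [random.gauss(0.0, 1.0) for _ in range(N_OUT)]


def forward(x, ft_w, ft_b, l1_w, l1_b):
    """Toy forward pass: FT affine + clipped ReLU, then an L1 affine layer."""
    acc = [min(max(sum(w * xi for w, xi in zip(row, x)) + b, 0.0), 1.0)
           for row, b in zip(ft_w, ft_b)]
    out = [sum(w * a for w, a in zip(row, acc)) + b for row, b in zip(l1_w, l1_b)]
    return out, acc


def zero_blocks(acc, block=4):
    """Count groups of `block` consecutive activations that are all zero."""
    return sum(all(a == 0.0 for a in acc[i:i + block]) for i in range(0, len(acc), block))


perm = list(range(N_FT))
random.shuffle(perm)
# Apply the same neuron permutation to FT rows/biases and to L1 input columns.
ft_w_p = [ft_w[j] for j in perm]
ft_b_p = [ft_b[j] for j in perm]
l1_w_p = [[row[j] for j in perm] for row in l1_w]

x = [1.0 if random.random() < 0.2 else 0.0 for _ in range(N_FEATURES)]  # sparse input
out, acc = forward(x, ft_w, ft_b, l1_w, l1_b)
out_p, acc_p = forward(x, ft_w_p, ft_b_p, l1_w_p, l1_b)

print("outputs match:", all(abs(a - b) < 1e-9 for a, b in zip(out, out_p)))
print("all-zero 4-blocks before/after:", zero_blocks(acc), zero_blocks(acc_p))
```

The outputs agree for any permutation, while the number of all-zero blocks depends on which neurons end up next to each other; that is the degree of freedom a permutation search can exploit.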

Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2:
```
sf_base =  1329051 +/-   2224 (95%)
sf_test =  1163344 +/-   2992 (95%)
diff    =  -165706 +/-   4913 (95%)
speedup = -12.46807% +/- 0.370% (95%)
```
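The speedup line is the difference of the two mean nps values relative to the base. A minimal sketch of that arithmetic, plus a 95% interval for a series of runs using a normal approximation (1.96 standard errors); the interval method is an assumption and may differ from what the benchmarking script actually does.

```python
import math
import statistics


def mean_ci95(samples):
    """Mean and 95% confidence half-width, using a normal approximation."""
    mean = statistics.mean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width


def speedup_percent(base_nps, test_nps):
    """Relative nps change of the test binary vs. the base binary, in percent."""
    return 100.0 * (test_nps - base_nps) / base_nps


# Mean nps values reported above (sf_base, sf_test)
print(f"speedup = {speedup_percent(1329051, 1163344):.5f}%")
```

Applied to the two reported means, this reproduces the -12.46807% figure; `mean_ci95` would be applied to the raw per-run nps values, which are not included here.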

Training data can be found at:
https://robotmoon.com/nnue-training-data/

Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue)
ep959 : 16.2 +/- 2.3
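An Elo difference such as the 16.2 above corresponds to an expected score through the standard logistic model, score = 1 / (1 + 10^(-Elo/400)). A small sketch of the conversion in both directions (the function names are illustrative):

```python
import math


def elo_to_score(elo):
    """Expected score for an Elo advantage, using the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def score_to_elo(score):
    """Inverse of elo_to_score; score must be strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


s = elo_to_score(16.2)
print(f"16.2 Elo -> expected score {s:.4f}")
print(f"round trip: {score_to_elo(s):.1f} Elo")
```

So a 16.2 Elo edge at 25k nodes per move means scoring about 52.3% against the previous net.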

Failed 10+0.1 STC:
https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21
LLR: -2.92 (-2.94,2.94) <0.00,2.00>
Total: 13184 W: 3285 L: 3535 D: 6364
Ptnml(0-2): 85, 1662, 3334, 1440, 71

Failed 180+1.8 VLTC:
https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 64248 W: 16224 L: 16374 D: 31650
Ptnml(0-2): 26, 6788, 18640, 6650, 20

Passed 60+0.6 th 8 VLTC SMP (STC bounds):
https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 90630 W: 23372 L: 23033 D: 44225
Ptnml(0-2): 13, 8490, 27968, 8833, 11

Passed 60+0.6 th 8 VLTC SMP:
https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 137804 W: 35764 L: 35276 D: 66764
Ptnml(0-2): 31, 13006, 42326, 13522, 17
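Each result reports the same games twice: W/L/D per game and a pentanomial distribution over game pairs (pair scores 0, 0.5, 1, 1.5, 2). The sketch below checks that the two views agree and derives a plain logistic-Elo point estimate; it does not reproduce fishtest's own Elo or LLR computations. It also derives the LLR bounds (-2.94, 2.94) as Wald's SPRT thresholds ln(β/(1−α)) and ln((1−β)/α) with α = β = 0.05, which are assumed here to be fishtest's error rates. The displayed LLR at termination can sit slightly inside the nominal bounds (the STC run stopped at -2.92), so fishtest evidently applies a correction this sketch does not model.

```python
import math


def check_result(w, l, d, ptnml):
    """Cross-check W/L/D against pentanomial pair counts; return (score, Elo)."""
    games = w + l + d
    assert games == 2 * sum(ptnml), "each pentanomial entry counts a pair of games"
    # ptnml[k] counts pairs in which the test side scored k/2 points (0, 0.5, ..., 2).
    points = sum(0.5 * k * n for k, n in enumerate(ptnml))
    assert points == w + 0.5 * d, "pair points must equal W + D/2"
    score = points / games
    elo = -400.0 * math.log10(1.0 / score - 1.0)  # logistic Elo point estimate
    return score, elo


def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald SPRT thresholds on the LLR: (fail below, pass above)."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


results = {
    "STC 10+0.1": (3285, 3535, 6364, [85, 1662, 3334, 1440, 71]),
    "VLTC 180+1.8": (16224, 16374, 31650, [26, 6788, 18640, 6650, 20]),
    "VLTC SMP (STC bounds)": (23372, 23033, 44225, [13, 8490, 27968, 8833, 11]),
    "VLTC SMP": (35764, 35276, 66764, [31, 13006, 42326, 13522, 17]),
}
for name, (w, l, d, ptnml) in results.items():
    score, elo = check_result(w, l, d, ptnml)
    print(f"{name}: score {score:.4f}, Elo {elo:+.2f}")

lower, upper = sprt_bounds()
print(f"LLR bounds: ({lower:.2f}, {upper:.2f})")
```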

bench 1246812

Sopel97 (Member) commented Sep 21, 2023

So I see this wasn't trained with official-stockfish/nnue-pytorch#259? I don't see the benefits at this point. The process stays complicated, net gets larger, but the gains are within noise. I'd be in favor of this only if it simplifies the training process, even if it's as little as getting rid of these large interleaved binpacks. Could we maybe simplify with the current arch?

linrock (Contributor, Author) commented Sep 21, 2023

> So I see this wasn't trained with official-stockfish/nnue-pytorch#259?

Correct. This net comes from a training run that started in May, about two months before that PR was opened.

> The process stays complicated, net gets larger, but the gains are within noise.

These are simplifications vs. the current L1-2048 master training run:

  • uses fully minimized binpacks for the first 5 stages
  • no longer uses 800GB+ shuffled binpacks
  • uses the new ftperm.py for permuting L1 weights
  • no longer collects activations from a custom stockfish binary for a 2-step weight permutation

It's possible this training would have gained more Elo if it had instead used larger/more-randomized binpacks and the previous, more complicated weight permutation process.

> I'd be in favor of this only if it simplifies the training process, even if it's as little as getting rid of these large interleaved binpacks. Could we maybe simplify with the current arch?

If a simpler training process can't pass SPRT, what are the criteria for accepting it?

There's a trade-off between simplifying training and maximizing Elo. We have to pick 2 of 3:

  • maximize elo (best for users)
  • simplify training (best for trainers)
  • time-efficient research

As long as gaining Elo is the top priority, training simplifications will naturally follow later, unless someone is willing to spend significant time optimizing both at once.

Sopel97 (Member) commented Sep 21, 2023

At this pace we will end up with an 8192-wide L1 before anyone else is able to reproduce the network.

vondele (Member) commented Sep 22, 2023

Let me post an additional measurement Sopel did, https://tests.stockfishchess.org/tests/view/650c77c6fb151d43ae6d51dd, showing that the master net is roughly 30 Elo stronger than an old master net trained with a simpler procedure. I believe that shows that significant progress has indeed been made: the training protocol is complex and the datasets are large, but the Elo results are quite impressive.

The larger network sizes have so far shown quite consistently good scaling with TC, i.e. a seemingly growing benefit at longer TC, which is consistent with intuition, and they are clearly strong at fixed nodes. This could be contributing to the good performance in some of the ongoing tournaments. Reducing nps is actually also a good thing when it comes to hash pressure, i.e. less hash is needed for the same analysis time.

Having said all these positive things about the evolution of the nets, picking up training is clearly pretty difficult for new contributors, or for people who have taken a break from training (like myself). It is essential that we keep the process reproducible, and simple enough that we can improve on it. While I think linrock does a great job of describing the process in words and providing the needed data, this really is a software engineering task. Ideally, the whole process could be reproduced starting from a single declarative file (e.g. a JSON file that documents all datasets and parameters). Our easy_train.py is a first step, and I know we have pending PRs on nnue-pytorch that make good steps in that direction (e.g. official-stockfish/nnue-pytorch#257). I can only encourage this effort, and I will pick up training again in a couple of months.
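As a rough illustration of a single declarative file, the first two stages from this PR could be written down as plain data and serialized to JSON. The schema (keys such as `end_lambda`, `skip`, `init_from_previous_stage`, `sources`) is hypothetical; it is not the format of easy_train.py or of the pending nnue-pytorch PRs. Only the values are copied from the PR description.

```python
import json

# Hypothetical schema: field names are illustrative; values are copied from the PR.
spec = {
    "features": "HalfKAv2_hm",
    "stages": [
        {
            "max_epoch": 400,
            "end_lambda": 1.0,  # "lambda 1.0" for the whole stage
            "lr": None,         # None = trainer default ("default LR and gamma")
            "gamma": None,
            "skip": None,       # not stated for stage 1
            "init_from_previous_stage": False,
            "binpack": "UHOx2-wIsRight-multinet-dfrc-n5000",
            "sources": [
                "nodes5000pv2_UHO.binpack",
                "data_pv-2_diff-100_nodes-5000.binpack",
                "wrongIsRight_nodes5000pv2.binpack",
                "multinet_pv-2_diff-100_nodes-5000.binpack",
                "dfrc_n5000.binpack",
            ],
        },
        {
            "max_epoch": 800,
            "end_lambda": 0.75,
            "lr": 4.375e-4,
            "gamma": 0.995,
            "skip": 12,
            "init_from_previous_stage": True,  # a strong epoch from stage 1
            "binpack": "LeelaFarseer-T78juntoaugT79marT80dec.binpack",
            "sources": [
                "T60T70wIsRightFarseerT60T74T75T76.binpack",
                "test78-junjulaug2022-16tb7p.no-db.min.binpack",
                "test79-mar2022-16tb7p.no-db.min.binpack",
                "test80-dec2022-16tb7p.no-db.min.binpack",
            ],
        },
        # ... stages 3-6 would follow the same shape
    ],
}

text = json.dumps(spec, indent=2)
assert json.loads(text) == spec  # the spec survives a JSON round trip unchanged
print(f"{len(spec['stages'])} stages, {len(text)} bytes of JSON")
```

Keeping the spec as data rather than prose means a trainer could be driven from it directly and two runs could be diffed field by field.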

Disservin added the bench-change and 🚀 gainer labels Sep 22, 2023
vondele added the "to be merged" label Sep 22, 2023
vondele closed this in 70ba9de Sep 22, 2023
mstembera (Contributor) commented:

I am probably not the first to have this idea, but we could have a second small/fast net to use for our simple eval when the material advantage already looks decisive.
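The idea amounts to a cheap material check that decides which network evaluates a position. A toy sketch of that dispatch follows; the piece values, the 1000-centipawn threshold, the piece-string "position", and both stand-in nets are hypothetical placeholders, not anything in Stockfish.

```python
from typing import Callable

# Hypothetical piece values in centipawns; kings are not counted.
PIECE_VALUES = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 900}


def material_balance(pieces: str) -> int:
    """Material from White's side: uppercase letters are White, lowercase Black."""
    total = 0
    for c in pieces:
        value = PIECE_VALUES.get(c.upper(), 0)
        total += value if c.isupper() else -value
    return total


def evaluate(pieces: str,
             big_net: Callable[[str], int],
             small_net: Callable[[str], int],
             threshold: int = 1000) -> int:
    """Use the small/fast net only when the material imbalance already looks decisive."""
    if abs(material_balance(pieces)) > threshold:
        return small_net(pieces)
    return big_net(pieces)


# Stand-in "nets" that only report which one was called.
def big_net(pieces: str) -> int:
    return 1


def small_net(pieces: str) -> int:
    return 2


print(evaluate("KQRkp", big_net, small_net))   # +1300 material: small net -> 2
print(evaluate("KRPkrp", big_net, small_net))  # balanced material: big net -> 1
```

The threshold trades speed for accuracy: the lower it is, the more positions go to the cheaper net.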

vondele (Member) commented Sep 23, 2023

Yes, the idea has been around, but nobody has implemented and tried it.

Sopel97 (Member) commented Sep 28, 2023

4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24
   leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack
     leela96-filt-v2.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test79-may2022-16tb7p.filter-v6-dd.min.binpack
     test80-jun2022-16tb7p.filter-v6-dd.min.binpack
     test80-sep2022-16tb7p.filter-v6-dd.min.binpack
     test80-nov2022-16tb7p.filter-v6-dd.min.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-mar2023-2tb7p.v6-sk16.min.binpack
     test60-novdec2021-16tb7p.min.binpack
     test77-dec2021-16tb7p.min.binpack
     test78-aprmay2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-may2023-2tb7p.min.binpack

Like, none of these files exist. How do I form this dataset?

linrock (Contributor, Author) commented Sep 28, 2023

I believe leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack is currently composed of subsets of these Kaggle datasets:

https://www.kaggle.com/datasets/linrock/leela96-filt-v2-min
  leela96-filt-v2.min.binpack

https://www.kaggle.com/datasets/linrock/dfrc99-16tb7p-filt-v2-min
  dfrc99-16tb7p-filt-v2.min.binpack

https://www.kaggle.com/datasets/linrock/t80augtooctt79aprt78aprtosep-v6-mar2023min
  test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack

https://www.kaggle.com/datasets/linrock/0dd1cebea57-misc-v6-dd
  test79-may2022-16tb7p.filter-v6-dd.min.binpack
  test80-jun2022-16tb7p.filter-v6-dd.min.binpack

https://www.kaggle.com/datasets/linrock/0dd1cebea57-test80-v6-dd/versions/2
  test80-sep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
  test80-nov2022-16tb7p.filter-v6-dd.min.binpack
  test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack

https://www.kaggle.com/datasets/linrock/test80-mar2023-2tb7p-v6-sk16
  test80-mar2023-2tb7p.v6-sk16.min.binpack

https://www.kaggle.com/datasets/linrock/nn-1e7ca356472e-t60-t79
  test60-novdec2021-16tb7p.min.binpack

https://www.kaggle.com/datasets/linrock/test77-dec2021-16tb7p-84p
  test77-dec2021-16tb7p.min.binpack

https://www.kaggle.com/datasets/linrock/test78-aprmayjunjul2022-16tb7p
  test78-aprmay2022-16tb7p.min.binpack

https://www.kaggle.com/datasets/linrock/test79-apr2022-16tb7p
  test79-apr2022-16tb7p.min.binpack

https://www.kaggle.com/datasets/linrock/1ee1aba5ed-test80-martojul2023-2tb7p
  test80-may2023-2tb7p.min.binpack

The filenames may vary a bit between this description and whatever was uploaded to kaggle. Aside from small differences in filenames, the main things to notice are:

  • the Leela test run and the month (e.g. test80-may2023)
  • whether or not the dataset was filtered (e.g. filter-v6, v6, v6-dd)
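Because uploaded filenames can differ from the names in the stage list, one way to locate each file is to match on the two properties called out above (Leela test run plus month, and filter version) rather than on the exact name. A minimal sketch, assuming names follow the `testNN-<month><year>-...` pattern seen in this thread; `DatasetKey`, `dataset_key`, and `match_files` are hypothetical helpers. It does not handle names such as `leela96-filt-v2.min.binpack`, and it deliberately ignores suffixes like `min-mar2023` or `sk16`, so each match still deserves a manual check.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class DatasetKey(NamedTuple):
    run: str             # Leela test run, e.g. "test80"
    month: str           # month or month range, e.g. "sep2022", "juntosep2022"
    filt: Optional[str]  # filter version, e.g. "v6-dd", or None if unfiltered


_RUN_MONTH = re.compile(r"(test\d+)-([a-z]+\d{4})")
_FILTER = re.compile(r"(?:filter-)?(v\d+(?:-dd)?)")


def dataset_key(filename: str) -> Optional[DatasetKey]:
    """Extract (run, month, filter) from a testNN binpack name; None if it doesn't fit."""
    m = _RUN_MONTH.search(filename)
    if m is None:
        return None
    f = _FILTER.search(filename, m.end())  # look for a filter tag after the month
    return DatasetKey(m.group(1), m.group(2), f.group(1) if f else None)


def match_files(wanted: List[str], available: List[str]) -> Dict[str, Optional[str]]:
    """Map each wanted name to the first available file with the same key, or None."""
    by_key: Dict[DatasetKey, str] = {}
    for name in available:
        key = dataset_key(name)
        if key is not None:
            by_key.setdefault(key, name)
    return {w: by_key.get(dataset_key(w)) for w in wanted}


wanted = ["test80-sep2022-16tb7p.filter-v6-dd.min.binpack"]
available = [
    "test80-sep2022-16tb7p-filter-v6-dd.min-mar2023.binpack",
    "test80-nov2022-16tb7p.filter-v6-dd.min.binpack",
]
print(match_files(wanted, available))
```

Here the stage-4 name `test80-sep2022-16tb7p.filter-v6-dd.min.binpack` resolves to the Kaggle upload `test80-sep2022-16tb7p-filter-v6-dd.min-mar2023.binpack`, since both reduce to the key (test80, sep2022, v6-dd).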

linrock (Contributor, Author) commented Sep 28, 2023

Also, I know the dataset situation is quite messy. It would be amazing if we could host public datasets by simply rsync'ing them onto a remote server. That would free up a lot of time for keeping the datasets tidy.

Unfortunately, having to manually manage data for uploading to Kaggle is kind of a grind. It's currently hard to prioritize keeping the datasets simple over Elo-gaining research, since I'm handling the datasets mostly manually and large portions of them are constantly changing.

linrock added a commit to linrock/Stockfish that referenced this pull request Sep 29, 2023
This is a later epoch from the same experiment that led to the previous
master net. In training stage 6, max-epoch was raised to 1,200 near the
end of the first 1,000 epochs.

For more details, see official-stockfish#4795

Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue)
ep1079 : 15.6 +/- 1.2

Passed STC:
https://tests.stockfishchess.org/tests/view/651503b3b3e74811c8af1e2a
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 29408 W: 7607 L: 7304 D: 14497
Ptnml(0-2): 97, 3277, 7650, 3586, 94

Passed LTC:
https://tests.stockfishchess.org/tests/view/651585ceb3e74811c8af2a5f
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 73164 W: 18828 L: 18440 D: 35896
Ptnml(0-2): 30, 7749, 20644, 8121, 38

bench 1306282
Disservin pushed a commit that referenced this pull request Sep 29, 2023

closes #4810

Bench: 1453057

Labels: bench-change, 🚀 gainer, to be merged

5 participants