FT optimization utility. Integrate with serialization. #254

Merged Jul 12, 2023 (2 commits)

Sopel97
Member

@Sopel97 Sopel97 commented Jun 28, 2023

This is a small rewrite of https://github.com/Ergodice/nnue-pytorch/tree/ftperm. Big thanks to @Ergodice for the great FT weight permutation optimization algorithm. I kept it as is, only reducing the batch size in one place. It needs revisiting later to clean up and potentially support CPU-only optimization.

ftperm.py can be used standalone to go through the optimization process step by step. The serialize.py integration can optionally run the full process (without intermediate files) during serialization. The optimization is NOT performed by default, as it takes a very long time and requires additional parameters.

Usage notes copied from notes in ftperm.py:


NOTE: This script uses CUDA and may require large amounts of VRAM. Decrease --count if encountering problems.

Example use:

1. Generate the activation matrix for some sample dataset.

python ftperm.py gather --data=data\fishpack32.binpack --net=networks\nn-5af11540bbfe.nnue --count=1000000 --features=HalfKAv2_hm --out ftact1m.npy

python ftperm.py gather --data=noob_master_leaf_static_d12_85M_0.binpack --net=nn-5af11540bbfe.nnue --count=10000 --features=HalfKAv2_hm --out ftact1m.npy

2. Find a permutation

python ftperm.py find_perm --data=ftact1m.npy --out=ftact.perm

3. Test the permutation against the baseline

python ftperm.py eval_perm --data=ftact1m.npy --perm=ftact.perm

4. Apply permutation and save

python serialize.py nn-5af11540bbfe.nnue permuted.nnue --features=HalfKAv2_hm --ft_perm=ftact.perm

----------------------------------------------------------------

OR do the whole process in one step

python serialize.py networks\nn-5af11540bbfe.nnue permuted.nnue --features=HalfKAv2_hm --ft_optimize --ft_optimize_data=data\fishpack32.binpack --ft_optimize_count=1000000

python serialize.py nn-5af11540bbfe.nnue permuted.nnue --features=HalfKAv2_hm --ft_optimize --ft_optimize_data=noob_master_leaf_static_d12_85M_0.binpack --ft_optimize_count=10000
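
The reason a permutation can be applied at serialization time without retraining is that reordering hidden neurons, together with the matching input columns of the next layer, leaves the network output unchanged. The following is a minimal sketch of that property on a toy two-layer network. It is not the code in ftperm.py or serialize.py; the layer sizes, the `forward` helper, and the clipped-ReLU activation are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 8, 16, 4  # toy sizes, not the real net dimensions

# Toy "feature transformer" (W1, b1) followed by one dense layer (W2).
W1 = rng.standard_normal((n_hidden, n_in))
b1 = rng.standard_normal(n_hidden)
W2 = rng.standard_normal((n_out, n_hidden))


def forward(x, W1, b1, W2):
    """Hypothetical forward pass: clipped ReLU on the hidden layer."""
    hidden = np.clip(W1 @ x + b1, 0.0, 1.0)
    return W2 @ hidden


# Reorder the hidden neurons: rows of W1/b1 and the matching columns of W2.
perm = rng.permutation(n_hidden)
W1_perm, b1_perm = W1[perm], b1[perm]
W2_perm = W2[:, perm]

x = rng.standard_normal(n_in)
out_base = forward(x, W1, b1, W2)
out_perm = forward(x, W1_perm, b1_perm, W2_perm)
print(np.allclose(out_base, out_perm))
```

The outputs agree up to floating-point summation order, so the ordering can be chosen freely to suit the inference code.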

After this is merged the following things should be looked at:

  1. How many data points we actually need. I got good results with 10000; 1M may be excessive.
  2. Find out the ideal batch size in get_score_change. It should be the smallest possible, as it affects VRAM usage. I reduced it to 10000 already. With a 1 GB VRAM limit I had to reduce it to 100 and didn't notice any issues.
  3. Figure out if make_swaps_3 can be optimized with cupy. It takes by far the longest.
  4. Add a cpu-only mode for VRAM limited systems.
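
To illustrate why the batch size in item 2 trades memory for loop overhead without changing the result, here is a small sketch that counts all-zero blocks over an activation mask in batches. It is not get_score_change itself; the function name `count_zero_blocks`, the block width of 4, and the random test data are assumptions made for the example.

```python
import numpy as np

BLOCK = 4  # assumed block width; the real value lives in ftperm.py


def count_zero_blocks(is_zero, batch_size):
    """Count all-zero blocks, touching only `batch_size` rows at a time."""
    n_rows, width = is_zero.shape
    assert width % BLOCK == 0
    total = 0
    for start in range(0, n_rows, batch_size):
        chunk = is_zero[start:start + batch_size]
        # Group columns into consecutive blocks; a block counts only if
        # every activation in it is zero.
        blocks = chunk.reshape(chunk.shape[0], width // BLOCK, BLOCK)
        total += int(blocks.all(axis=2).sum())
    return total


rng = np.random.default_rng(1)
is_zero = rng.random((1000, 32)) < 0.8  # True where an activation is zero

full = count_zero_blocks(is_zero, batch_size=1000)
small = count_zero_blocks(is_zero, batch_size=100)
print(full == small)
```

Peak temporary memory scales with `batch_size`, while the count is identical for any batch size, which matches the observation that reducing it caused no issues.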

@Sopel97
Member Author

Sopel97 commented Jun 30, 2023

I have improved the readability and maintainability of the algorithm code. I also made sure it works with both cupy and numpy. I was initially wrong: cupy was already being used for make_swaps_3; it was numpy that wasn't.

@Ergodice could you please check if the comments I added make sense? I wasn't sure in all places.

I also believe that make_swaps_3 could be made faster by reusing the old max_values and only updating it along the changed blocks (as this is quite sparse given the size of it). But this is for another time to explore.

@Ergodice

Very elegant refactor!

A couple things:
On lines 95 and 159 the shape of actmat is not (N, L1) but (N, L1 // 2).
The zeroing out on lines 265-268 is not done to prevent redundancy but because the entries are only computed correctly when they are all from different blocks.
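
One way to see where a width of L1 // 2 can come from: if the two halves of a perspective's feature-transformer output are clipped and multiplied pairwise, the resulting activation has half the width. The sketch below assumes that pairwise-product structure and uses a hypothetical helper `pairwise_zero_mask`; it is not taken from ftperm.py.

```python
import numpy as np


def pairwise_zero_mask(ft_out):
    """Return a (N, L1 // 2) boolean mask of zero pairwise products.

    Assumes ft_out has shape (N, L1) and that the first half is
    multiplied elementwise with the second half after clipping to [0, 1].
    """
    n_rows, l1 = ft_out.shape
    assert l1 % 2 == 0
    clipped = np.clip(ft_out, 0.0, 1.0)
    product = clipped[:, : l1 // 2] * clipped[:, l1 // 2:]
    return product == 0.0


rng = np.random.default_rng(2)
ft_out = rng.standard_normal((5, 16))  # toy N=5, L1=16
mask = pairwise_zero_mask(ft_out)
print(mask.shape)
```

Under this assumption a product is zero whenever either factor is clipped to zero, so a permutation found on the (N, L1 // 2) matrix has to be applied to both halves in the same way.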

vondele pushed a commit to vondele/Stockfish that referenced this pull request Jul 1, 2023
faster permutation of master net weights

Activation data taken from https://drive.google.com/drive/folders/1Ec9YuuRx4N03GPnVPoQOW70eucOKngQe?usp=sharing
Permutation found using https://github.com/Ergodice/nnue-pytorch/blob/836387a0e5e690431d404158c46648710f13904d/ftperm.py
See also official-stockfish/nnue-pytorch#254

The algorithm greedily selects 2- and 3-cycles that can be permuted to increase the number of runs of zeroes. The percent of zero runs from the master net increased from 68.46 to 70.11 from 2-cycles and only increased to 70.32 when considering 3-cycles. Interestingly, allowing both halves of L1 to intermix when creating zero runs can give another 0.5% zero-run density increase with this method.

Measured speedup:

```
CPU: 16 x AMD Ryzen 9 3950X 16-Core Processor
Result of 50 runs

base (./stockfish.master ) = 1561556 +/- 5439
test (./stockfish.patch ) = 1575788 +/- 5427
diff = +14231 +/- 2636

speedup = +0.0091
P(speedup > 0) = 1.0000
```

closes official-stockfish#4640

No functional change
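
The greedy 2-cycle idea from the commit message above can be sketched with a brute-force search: try every swap of two columns from different blocks, keep the one that most increases the number of all-zero blocks, and repeat until nothing improves. This is an illustrative toy, not the vectorised algorithm in ftperm.py; the block width of 4, the helper names, and the synthetic data are assumptions.

```python
import numpy as np

BLOCK = 4  # assumed block width


def zero_blocks(is_zero):
    """Number of (row, block) cells in which every activation is zero."""
    n_rows, width = is_zero.shape
    return int(is_zero.reshape(n_rows, width // BLOCK, BLOCK).all(axis=2).sum())


def greedy_swaps(is_zero, max_rounds=50):
    """Repeatedly apply the single best column swap (2-cycle)."""
    width = is_zero.shape[1]
    perm = np.arange(width)
    best = zero_blocks(is_zero)
    for _ in range(max_rounds):
        best_pair = None
        for i in range(width):
            for j in range(i + 1, width):
                if i // BLOCK == j // BLOCK:
                    continue  # a swap inside one block cannot change the count
                trial = perm.copy()
                trial[[i, j]] = trial[[j, i]]
                score = zero_blocks(is_zero[:, trial])
                if score > best:
                    best, best_pair = score, (i, j)
        if best_pair is None:
            break  # no swap improves the count any further
        i, j = best_pair
        perm[[i, j]] = perm[[j, i]]
    return perm, best


# Synthetic data: two groups of columns that are zero together,
# deliberately laid out so that one column of each group is misplaced.
rng = np.random.default_rng(3)
latent = rng.random((200, 2)) < 0.5
groups = np.array([0, 0, 0, 1, 1, 1, 1, 0])
is_zero = latent[:, groups]

base = zero_blocks(is_zero)
perm, best = greedy_swaps(is_zero)
print(base, best)
```

A greedy search like this only ever accepts strictly improving swaps, so it can stall when an improvement needs two simultaneous exchanges, which is one motivation for also considering 3-cycles.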
@vondele
Member

vondele commented Jul 6, 2023

this seems to have conflicts, and unaddressed review comments?

@Sopel97
Member Author

Sopel97 commented Jul 6, 2023

yes, I'll address them later

@Sopel97 Sopel97 force-pushed the ftperm_ergodice branch from f6d5274 to b341816 Compare July 7, 2023 13:26
@Sopel97
Member Author

Sopel97 commented Jul 7, 2023

Resolved conflicts and addressed the comments.

Zerbinati added a commit to Zerbinati/SugaR-XPrO that referenced this pull request Jul 9, 2023
@vondele vondele merged commit 5107928 into official-stockfish:master Jul 12, 2023
linrock added a commit to linrock/Stockfish that referenced this pull request Sep 21, 2023
Creating this net involved:
- a 6-stage training process from scratch
- permuting L1 weights with official-stockfish/nnue-pytorch#254

A strong epoch after each training stage was chosen for the next. The 6 stages were:

1. 400 epochs, lambda 1.0, default LR and gamma
   UHOx2-wIsRight-multinet-dfrc-n5000 (135G)
     nodes5000pv2_UHO.binpack
     data_pv-2_diff-100_nodes-5000.binpack
     wrongIsRight_nodes5000pv2.binpack
     multinet_pv-2_diff-100_nodes-5000.binpack
     dfrc_n5000.binpack

2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12
   LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G)
     T60T70wIsRightFarseerT60T74T75T76.binpack
     test78-junjulaug2022-16tb7p.no-db.min.binpack
     test79-mar2022-16tb7p.no-db.min.binpack
     test80-dec2022-16tb7p.no-db.min.binpack

3. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 20
   leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack
      leela93-filt-v1.min.binpack
      dfrc99-16tb7p-filt-v2.min.binpack
      test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.min.binpack
      test80-jan2023-16tb7p.v6-sk20.min.binpack
      test78-janfeb2022-16tb7p.min.binpack
      test79-apr2022-16tb7p.min.binpack
      test80-apr2022-16tb7p.min.binpack
      test80-may2022-16tb7p.min.binpack

4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24
   leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack
      leela96-filt-v2.min.binpack
      dfrc99-16tb7p-filt-v2.min.binpack
      test78-juntosep2022-16tb7p-filter-v6-dd.min.binpack
      test79-may2022-16tb7p.filter-v6-dd.min.binpack
      test80-jun2022-16tb7p.filter-v6-dd.min.binpack
      test80-sep2022-16tb7p.filter-v6-dd.min.binpack
      test80-nov2022-16tb7p.filter-v6-dd.min.binpack
      test80-jan2023-2tb7p.filter-v6-dd.min.binpack
      test80-mar2023-2tb7p.v6-sk16.min.binpack
      test60-novdec2021-16tb7p.min.binpack
      test77-dec2021-16tb7p.min.binpack
      test78-aprmay2022-16tb7p.min.binpack
      test79-apr2022-16tb7p.min.binpack
      test80-may2023-2tb7p.min.binpack

5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 960 near the end of the first 800 epochs
   5af11540bbfe dataset: official-stockfish#4635

6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 1000 near the end of the first 800 epochs
   1ee1aba5ed dataset: official-stockfish#4782

L1 weights permuted with:
```
python3 serialize.py $nnue $nnue_permuted \
  --features=HalfKAv2_hm \
  --ft_optimize \
  --ft_optimize_data=/data/fishpack32.binpack \
  --ft_optimize_count=10000
```

Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2:
```
sf_base =  1329051 +/-   2224 (95%)
sf_test =  1163344 +/-   2992 (95%)
diff    =  -165706 +/-   4913 (95%)
speedup = -12.46807% +/- 0.370% (95%)
```

Training data can be found at:
https://robotmoon.com/nnue-training-data/

Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue)
ep959 : 16.2 +/- 2.3

Failed 10+0.1 STC:
https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21
LLR: -2.92 (-2.94,2.94) <0.00,2.00>
Total: 13184 W: 3285 L: 3535 D: 6364
Ptnml(0-2): 85, 1662, 3334, 1440, 71

Failed 180+1.8 VLTC:
https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 64248 W: 16224 L: 16374 D: 31650
Ptnml(0-2): 26, 6788, 18640, 6650, 20

Passed 60+0.6 th 8 VLTC SMP (STC bounds):
https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 90630 W: 23372 L: 23033 D: 44225
Ptnml(0-2): 13, 8490, 27968, 8833, 11

Passed 60+0.6 th 8 VLTC SMP:
https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 137804 W: 35764 L: 35276 D: 66764
Ptnml(0-2): 31, 13006, 42326, 13522, 17

bench 1246812
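
The headline numbers in the commit message above can be reproduced with simple arithmetic. The sketch below recomputes the relative speed change from the two bench averages and estimates an Elo difference from the win/loss/draw totals of the last listed test using the standard logistic formula. The helper names are hypothetical, and the logistic estimate is only an approximation; the test framework's own reported figures may be computed differently.

```python
import math


def relative_speed_change(base_nps, test_nps):
    """Relative change in nodes per second (negative means slower)."""
    return (test_nps - base_nps) / base_nps


def logistic_elo(wins, losses, draws):
    """Elo difference implied by the average score under a logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return -400.0 * math.log10(1.0 / score - 1.0)


# Values copied from the speed measurement and the last passed test above.
speed = relative_speed_change(1329051, 1163344)
elo = logistic_elo(35764, 35276, 66764)
print(f"speed change: {speed:.4%}, elo estimate: {elo:.2f}")
```

Read together, the two figures say that the new net is noticeably slower per node yet still scores slightly better at the tested time control.
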
linrock added a commit to linrock/Stockfish that referenced this pull request Sep 21, 2023
Creating this net involved:
- a 6-stage training process from scratch
- permuting L1 weights with official-stockfish/nnue-pytorch#254

A strong epoch after each training stage was chosen for the next. The 6 stages were:

1. 400 epochs, lambda 1.0, default LR and gamma
   UHOx2-wIsRight-multinet-dfrc-n5000 (135G)
     nodes5000pv2_UHO.binpack
     data_pv-2_diff-100_nodes-5000.binpack
     wrongIsRight_nodes5000pv2.binpack
     multinet_pv-2_diff-100_nodes-5000.binpack
     dfrc_n5000.binpack

2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12
   LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G)
     T60T70wIsRightFarseerT60T74T75T76.binpack
     test78-junjulaug2022-16tb7p.no-db.min.binpack
     test79-mar2022-16tb7p.no-db.min.binpack
     test80-dec2022-16tb7p.no-db.min.binpack

3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20
   leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack
      leela93-filt-v1.min.binpack
      dfrc99-16tb7p-filt-v2.min.binpack
      test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.min.binpack
      test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
      test78-janfeb2022-16tb7p.min.binpack
      test79-apr2022-16tb7p.min.binpack
      test80-apr2022-16tb7p.min.binpack
      test80-may2022-16tb7p.min.binpack

4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24
   leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack
      leela96-filt-v2.min.binpack
      dfrc99-16tb7p-filt-v2.min.binpack
      test78-juntosep2022-16tb7p-filter-v6-dd.min.binpack
      test79-may2022-16tb7p.filter-v6-dd.min.binpack
      test80-jun2022-16tb7p.filter-v6-dd.min.binpack
      test80-sep2022-16tb7p.filter-v6-dd.min.binpack
      test80-nov2022-16tb7p.filter-v6-dd.min.binpack
      test80-jan2023-2tb7p.filter-v6-dd.min.binpack
      test80-mar2023-2tb7p.v6-sk16.min.binpack
      test60-novdec2021-16tb7p.min.binpack
      test77-dec2021-16tb7p.min.binpack
      test78-aprmay2022-16tb7p.min.binpack
      test79-apr2022-16tb7p.min.binpack
      test80-may2023-2tb7p.min.binpack

5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 960 near the end of the first 800 epochs
   5af11540bbfe dataset: official-stockfish#4635

6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 1000 near the end of the first 800 epochs
   1ee1aba5ed dataset: official-stockfish#4782

L1 weights permuted with:
```
python3 serialize.py $nnue $nnue_permuted \
  --features=HalfKAv2_hm \
  --ft_optimize \
  --ft_optimize_data=/data/fishpack32.binpack \
  --ft_optimize_count=10000
```

Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2:
```
sf_base =  1329051 +/-   2224 (95%)
sf_test =  1163344 +/-   2992 (95%)
diff    =  -165706 +/-   4913 (95%)
speedup = -12.46807% +/- 0.370% (95%)
```

Training data can be found at:
https://robotmoon.com/nnue-training-data/

Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue)
ep959 : 16.2 +/- 2.3

Failed 10+0.1 STC:
https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21
LLR: -2.92 (-2.94,2.94) <0.00,2.00>
Total: 13184 W: 3285 L: 3535 D: 6364
Ptnml(0-2): 85, 1662, 3334, 1440, 71

Failed 180+1.8 VLTC:
https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 64248 W: 16224 L: 16374 D: 31650
Ptnml(0-2): 26, 6788, 18640, 6650, 20

Passed 60+0.6 th 8 VLTC SMP (STC bounds):
https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 90630 W: 23372 L: 23033 D: 44225
Ptnml(0-2): 13, 8490, 27968, 8833, 11

Passed 60+0.6 th 8 VLTC SMP:
https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 137804 W: 35764 L: 35276 D: 66764
Ptnml(0-2): 31, 13006, 42326, 13522, 17

bench 1246812
linrock added a commit to linrock/Stockfish that referenced this pull request Sep 21, 2023
Creating this net involved:
- a 6-stage training process from scratch
- permuting L1 weights with official-stockfish/nnue-pytorch#254

A strong epoch after each training stage was chosen for the next. The 6 stages were:

```
1. 400 epochs, lambda 1.0, default LR and gamma
   UHOx2-wIsRight-multinet-dfrc-n5000 (135G)
     nodes5000pv2_UHO.binpack
     data_pv-2_diff-100_nodes-5000.binpack
     wrongIsRight_nodes5000pv2.binpack
     multinet_pv-2_diff-100_nodes-5000.binpack
     dfrc_n5000.binpack

2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12
   LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G)
     T60T70wIsRightFarseerT60T74T75T76.binpack
     test78-junjulaug2022-16tb7p.no-db.min.binpack
     test79-mar2022-16tb7p.no-db.min.binpack
     test80-dec2022-16tb7p.no-db.min.binpack

3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20
   leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack
     leela93-filt-v1.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.min.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test78-janfeb2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-apr2022-16tb7p.min.binpack
     test80-may2022-16tb7p.min.binpack

4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24
   leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack
     leela96-filt-v2.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min.binpack
     test79-may2022-16tb7p.filter-v6-dd.min.binpack
     test80-jun2022-16tb7p.filter-v6-dd.min.binpack
     test80-sep2022-16tb7p.filter-v6-dd.min.binpack
     test80-nov2022-16tb7p.filter-v6-dd.min.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-mar2023-2tb7p.v6-sk16.min.binpack
     test60-novdec2021-16tb7p.min.binpack
     test77-dec2021-16tb7p.min.binpack
     test78-aprmay2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-may2023-2tb7p.min.binpack

5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 960 near the end of the first 800 epochs
   5af11540bbfe dataset: official-stockfish#4635

6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 1000 near the end of the first 800 epochs
   1ee1aba5ed dataset: official-stockfish#4782
```

L1 weights permuted with:
```bash
python3 serialize.py $nnue $nnue_permuted \
  --features=HalfKAv2_hm \
  --ft_optimize \
  --ft_optimize_data=/data/fishpack32.binpack \
  --ft_optimize_count=10000
```

Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2:
```
sf_base =  1329051 +/-   2224 (95%)
sf_test =  1163344 +/-   2992 (95%)
diff    =  -165706 +/-   4913 (95%)
speedup = -12.46807% +/- 0.370% (95%)
```

Training data can be found at:
https://robotmoon.com/nnue-training-data/

Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue)
ep959 : 16.2 +/- 2.3

Failed 10+0.1 STC:
https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21
LLR: -2.92 (-2.94,2.94) <0.00,2.00>
Total: 13184 W: 3285 L: 3535 D: 6364
Ptnml(0-2): 85, 1662, 3334, 1440, 71

Failed 180+1.8 VLTC:
https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 64248 W: 16224 L: 16374 D: 31650
Ptnml(0-2): 26, 6788, 18640, 6650, 20

Passed 60+0.6 th 8 VLTC SMP (STC bounds):
https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 90630 W: 23372 L: 23033 D: 44225
Ptnml(0-2): 13, 8490, 27968, 8833, 11

Passed 60+0.6 th 8 VLTC SMP:
https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 137804 W: 35764 L: 35276 D: 66764
Ptnml(0-2): 31, 13006, 42326, 13522, 17

bench 1246812
linrock added a commit to linrock/Stockfish that referenced this pull request Sep 21, 2023
Creating this net involved:
- a 6-stage training process from scratch
- permuting L1 weights with official-stockfish/nnue-pytorch#254

A strong epoch after each training stage was chosen for the next. The 6 stages were:

```
1. 400 epochs, lambda 1.0, default LR and gamma
   UHOx2-wIsRight-multinet-dfrc-n5000 (135G)
     nodes5000pv2_UHO.binpack
     data_pv-2_diff-100_nodes-5000.binpack
     wrongIsRight_nodes5000pv2.binpack
     multinet_pv-2_diff-100_nodes-5000.binpack
     dfrc_n5000.binpack

2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12
   LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G)
     T60T70wIsRightFarseerT60T74T75T76.binpack
     test78-junjulaug2022-16tb7p.no-db.min.binpack
     test79-mar2022-16tb7p.no-db.min.binpack
     test80-dec2022-16tb7p.no-db.min.binpack

3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20
   leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack
     leela93-filt-v1.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test78-janfeb2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-apr2022-16tb7p.min.binpack
     test80-may2022-16tb7p.min.binpack

4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24
   leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack
     leela96-filt-v2.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test79-may2022-16tb7p.filter-v6-dd.min.binpack
     test80-jun2022-16tb7p.filter-v6-dd.min.binpack
     test80-sep2022-16tb7p.filter-v6-dd.min.binpack
     test80-nov2022-16tb7p.filter-v6-dd.min.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-mar2023-2tb7p.v6-sk16.min.binpack
     test60-novdec2021-16tb7p.min.binpack
     test77-dec2021-16tb7p.min.binpack
     test78-aprmay2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-may2023-2tb7p.min.binpack

5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 960 near the end of the first 800 epochs
   5af11540bbfe dataset: official-stockfish#4635

6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 1000 near the end of the first 800 epochs
   1ee1aba5ed dataset: official-stockfish#4782
```

L1 weights permuted with:
```bash
python3 serialize.py $nnue $nnue_permuted \
  --features=HalfKAv2_hm \
  --ft_optimize \
  --ft_optimize_data=/data/fishpack32.binpack \
  --ft_optimize_count=10000
```

Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2:
```
sf_base =  1329051 +/-   2224 (95%)
sf_test =  1163344 +/-   2992 (95%)
diff    =  -165706 +/-   4913 (95%)
speedup = -12.46807% +/- 0.370% (95%)
```

Training data can be found at:
https://robotmoon.com/nnue-training-data/

Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue)
ep959 : 16.2 +/- 2.3

Failed 10+0.1 STC:
https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21
LLR: -2.92 (-2.94,2.94) <0.00,2.00>
Total: 13184 W: 3285 L: 3535 D: 6364
Ptnml(0-2): 85, 1662, 3334, 1440, 71

Failed 180+1.8 VLTC:
https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 64248 W: 16224 L: 16374 D: 31650
Ptnml(0-2): 26, 6788, 18640, 6650, 20

Passed 60+0.6 th 8 VLTC SMP (STC bounds):
https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 90630 W: 23372 L: 23033 D: 44225
Ptnml(0-2): 13, 8490, 27968, 8833, 11

Passed 60+0.6 th 8 VLTC SMP:
https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 137804 W: 35764 L: 35276 D: 66764
Ptnml(0-2): 31, 13006, 42326, 13522, 17

bench 1246812
linrock added a commit to linrock/Stockfish that referenced this pull request Sep 21, 2023
Creating this net involved:
- a 6-stage training process from scratch
- permuting L1 weights with official-stockfish/nnue-pytorch#254

The datasets used in stages 1-5 were fully minimized. A strong epoch after
each training stage was chosen for the next. The 6 stages were:

```
1. 400 epochs, lambda 1.0, default LR and gamma
   UHOx2-wIsRight-multinet-dfrc-n5000 (135G)
     nodes5000pv2_UHO.binpack
     data_pv-2_diff-100_nodes-5000.binpack
     wrongIsRight_nodes5000pv2.binpack
     multinet_pv-2_diff-100_nodes-5000.binpack
     dfrc_n5000.binpack

2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12
   LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G)
     T60T70wIsRightFarseerT60T74T75T76.binpack
     test78-junjulaug2022-16tb7p.no-db.min.binpack
     test79-mar2022-16tb7p.no-db.min.binpack
     test80-dec2022-16tb7p.no-db.min.binpack

3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20
   leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack
     leela93-filt-v1.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test78-janfeb2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-apr2022-16tb7p.min.binpack
     test80-may2022-16tb7p.min.binpack

4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24
   leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack
     leela96-filt-v2.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test79-may2022-16tb7p.filter-v6-dd.min.binpack
     test80-jun2022-16tb7p.filter-v6-dd.min.binpack
     test80-sep2022-16tb7p.filter-v6-dd.min.binpack
     test80-nov2022-16tb7p.filter-v6-dd.min.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-mar2023-2tb7p.v6-sk16.min.binpack
     test60-novdec2021-16tb7p.min.binpack
     test77-dec2021-16tb7p.min.binpack
     test78-aprmay2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-may2023-2tb7p.min.binpack

5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 960 near the end of the first 800 epochs
   5af11540bbfe dataset: official-stockfish#4635

6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 1000 near the end of the first 800 epochs
   1ee1aba5ed dataset: official-stockfish#4782
```

L1 weights permuted with:
```bash
python3 serialize.py $nnue $nnue_permuted \
  --features=HalfKAv2_hm \
  --ft_optimize \
  --ft_optimize_data=/data/fishpack32.binpack \
  --ft_optimize_count=10000
```

Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2:
```
sf_base =  1329051 +/-   2224 (95%)
sf_test =  1163344 +/-   2992 (95%)
diff    =  -165706 +/-   4913 (95%)
speedup = -12.46807% +/- 0.370% (95%)
```

Training data can be found at:
https://robotmoon.com/nnue-training-data/

Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue)
ep959 : 16.2 +/- 2.3

Failed 10+0.1 STC:
https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21
LLR: -2.92 (-2.94,2.94) <0.00,2.00>
Total: 13184 W: 3285 L: 3535 D: 6364
Ptnml(0-2): 85, 1662, 3334, 1440, 71

Failed 180+1.8 VLTC:
https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 64248 W: 16224 L: 16374 D: 31650
Ptnml(0-2): 26, 6788, 18640, 6650, 20

Passed 60+0.6 th 8 VLTC SMP (STC bounds):
https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 90630 W: 23372 L: 23033 D: 44225
Ptnml(0-2): 13, 8490, 27968, 8833, 11

Passed 60+0.6 th 8 VLTC SMP:
https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 137804 W: 35764 L: 35276 D: 66764
Ptnml(0-2): 31, 13006, 42326, 13522, 17

bench 1246812
linrock added a commit to linrock/Stockfish that referenced this pull request Sep 21, 2023
Creating this net involved:
- a 6-stage training process from scratch. The datasets used in stages 1-5 were fully minimized.
- permuting L1 weights with official-stockfish/nnue-pytorch#254

A strong epoch after each training stage was chosen for the next. The 6 stages were:

```
1. 400 epochs, lambda 1.0, default LR and gamma
   UHOx2-wIsRight-multinet-dfrc-n5000 (135G)
     nodes5000pv2_UHO.binpack
     data_pv-2_diff-100_nodes-5000.binpack
     wrongIsRight_nodes5000pv2.binpack
     multinet_pv-2_diff-100_nodes-5000.binpack
     dfrc_n5000.binpack

2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12
   LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G)
     T60T70wIsRightFarseerT60T74T75T76.binpack
     test78-junjulaug2022-16tb7p.no-db.min.binpack
     test79-mar2022-16tb7p.no-db.min.binpack
     test80-dec2022-16tb7p.no-db.min.binpack

3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20
   leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack
     leela93-filt-v1.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test78-janfeb2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-apr2022-16tb7p.min.binpack
     test80-may2022-16tb7p.min.binpack

4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24
   leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack
     leela96-filt-v2.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test79-may2022-16tb7p.filter-v6-dd.min.binpack
     test80-jun2022-16tb7p.filter-v6-dd.min.binpack
     test80-sep2022-16tb7p.filter-v6-dd.min.binpack
     test80-nov2022-16tb7p.filter-v6-dd.min.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-mar2023-2tb7p.v6-sk16.min.binpack
     test60-novdec2021-16tb7p.min.binpack
     test77-dec2021-16tb7p.min.binpack
     test78-aprmay2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-may2023-2tb7p.min.binpack

5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 960 near the end of the first 800 epochs
   5af11540bbfe dataset: official-stockfish#4635

6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 1000 near the end of the first 800 epochs
   1ee1aba5ed dataset: official-stockfish#4782
```

L1 weights permuted with:
```bash
python3 serialize.py $nnue $nnue_permuted \
  --features=HalfKAv2_hm \
  --ft_optimize \
  --ft_optimize_data=/data/fishpack32.binpack \
  --ft_optimize_count=10000
```

Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2:
```
sf_base =  1329051 +/-   2224 (95%)
sf_test =  1163344 +/-   2992 (95%)
diff    =  -165706 +/-   4913 (95%)
speedup = -12.46807% +/- 0.370% (95%)
```

Training data can be found at:
https://robotmoon.com/nnue-training-data/

Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue)
ep959 : 16.2 +/- 2.3

Failed 10+0.1 STC:
https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21
LLR: -2.92 (-2.94,2.94) <0.00,2.00>
Total: 13184 W: 3285 L: 3535 D: 6364
Ptnml(0-2): 85, 1662, 3334, 1440, 71

Failed 180+1.8 VLTC:
https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 64248 W: 16224 L: 16374 D: 31650
Ptnml(0-2): 26, 6788, 18640, 6650, 20

Passed 60+0.6 th 8 VLTC SMP (STC bounds):
https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 90630 W: 23372 L: 23033 D: 44225
Ptnml(0-2): 13, 8490, 27968, 8833, 11

Passed 60+0.6 th 8 VLTC SMP:
https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 137804 W: 35764 L: 35276 D: 66764
Ptnml(0-2): 31, 13006, 42326, 13522, 17

bench 1246812
vondele pushed a commit to vondele/Stockfish that referenced this pull request Sep 22, 2023
Creating this net involved:
- a 6-stage training process from scratch. The datasets used in stages 1-5 were fully minimized.
- permuting L1 weights with official-stockfish/nnue-pytorch#254

A strong epoch after each training stage was chosen for the next. The 6 stages were:

```
1. 400 epochs, lambda 1.0, default LR and gamma
   UHOx2-wIsRight-multinet-dfrc-n5000 (135G)
     nodes5000pv2_UHO.binpack
     data_pv-2_diff-100_nodes-5000.binpack
     wrongIsRight_nodes5000pv2.binpack
     multinet_pv-2_diff-100_nodes-5000.binpack
     dfrc_n5000.binpack

2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12
   LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G)
     T60T70wIsRightFarseerT60T74T75T76.binpack
     test78-junjulaug2022-16tb7p.no-db.min.binpack
     test79-mar2022-16tb7p.no-db.min.binpack
     test80-dec2022-16tb7p.no-db.min.binpack

3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20
   leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack
     leela93-filt-v1.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test78-janfeb2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-apr2022-16tb7p.min.binpack
     test80-may2022-16tb7p.min.binpack

4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24
   leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack
     leela96-filt-v2.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test79-may2022-16tb7p.filter-v6-dd.min.binpack
     test80-jun2022-16tb7p.filter-v6-dd.min.binpack
     test80-sep2022-16tb7p.filter-v6-dd.min.binpack
     test80-nov2022-16tb7p.filter-v6-dd.min.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-mar2023-2tb7p.v6-sk16.min.binpack
     test60-novdec2021-16tb7p.min.binpack
     test77-dec2021-16tb7p.min.binpack
     test78-aprmay2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-may2023-2tb7p.min.binpack

5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 960 near the end of the first 800 epochs
   5af11540bbfe dataset: official-stockfish#4635

6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 1000 near the end of the first 800 epochs
   1ee1aba5ed dataset: official-stockfish#4782
```
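
To get a feel for what the `LR`, `gamma` and `end-lambda` settings imply over a long stage, here is a rough sketch. It assumes the learning rate is multiplied by `gamma` once per epoch and that lambda is interpolated linearly from a start value to `end-lambda` over the stage. Both are assumptions about the trainer's behavior rather than something stated in this message, and the start-lambda of 1.0 for stage 2 is likewise assumed.

```python
def lr_at_epoch(base_lr: float, gamma: float, epoch: int) -> float:
    """Learning rate after `epoch` per-epoch multiplicative decays (assumed schedule)."""
    return base_lr * gamma ** epoch


def lambda_at_epoch(start: float, end: float, epoch: int, max_epoch: int) -> float:
    """Linear interpolation of lambda across the stage (assumed schedule)."""
    return start + (end - start) * epoch / max_epoch


# Stage 2 settings from the list above; start-lambda 1.0 is an assumption.
for epoch in (0, 400, 800):
    lr = lr_at_epoch(4.375e-4, 0.995, epoch)
    lam = lambda_at_epoch(1.0, 0.75, epoch, 800)
    print(f"epoch {epoch:4d}: lr = {lr:.3e}, lambda = {lam:.3f}")
```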

L1 weights permuted with:
```bash
python3 serialize.py $nnue $nnue_permuted \
  --features=HalfKAv2_hm \
  --ft_optimize \
  --ft_optimize_data=/data/fishpack32.binpack \
  --ft_optimize_count=10000
```

Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2:
```
sf_base =  1329051 +/-   2224 (95%)
sf_test =  1163344 +/-   2992 (95%)
diff    =  -165706 +/-   4913 (95%)
speedup = -12.46807% +/- 0.370% (95%)
```

Training data can be found at:
https://robotmoon.com/nnue-training-data/

Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue)
ep959 : 16.2 +/- 2.3

Failed 10+0.1 STC:
https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21
LLR: -2.92 (-2.94,2.94) <0.00,2.00>
Total: 13184 W: 3285 L: 3535 D: 6364
Ptnml(0-2): 85, 1662, 3334, 1440, 71

Failed 180+1.8 VLTC:
https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 64248 W: 16224 L: 16374 D: 31650
Ptnml(0-2): 26, 6788, 18640, 6650, 20

Passed 60+0.6 th 8 VLTC SMP (STC bounds):
https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 90630 W: 23372 L: 23033 D: 44225
Ptnml(0-2): 13, 8490, 27968, 8833, 11

Passed 60+0.6 th 8 VLTC SMP:
https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 137804 W: 35764 L: 35276 D: 66764
Ptnml(0-2): 31, 13006, 42326, 13522, 17

closes official-stockfish#4795

bench 1246812
linrock added a commit to linrock/Stockfish that referenced this pull request Jan 7, 2024
Created by training an L1-128 net from scratch with a wider range of evals
in the training data and wld-fen-skipping disabled during training. The
differences in this training data compared to the first dual nnue PR are:

- removal of all positions with 3 pieces
- when piece count >= 16, keep positions with simple eval above 750
- when piece count < 16, remove positions with simple eval above 3000

The asymmetric data filtering was meant to flatten the training data piece
count distribution, which was previously heavily skewed towards positions with
low piece counts.
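
Read literally, the three rules above amount to a small predicate on piece count and simple eval. The sketch below is only that literal reading. It assumes the thresholds apply to the absolute value of simple eval and does not model any other criteria carried over from the earlier filter; the function name is hypothetical, and the actual logic lives in the `transform.cpp` code linked further down.

```python
def keep_position(piece_count: int, simple_eval: int) -> bool:
    """Literal reading of the three filtering rules (illustrative only)."""
    magnitude = abs(simple_eval)  # assumption: thresholds are on |simple eval|
    if piece_count == 3:
        return False  # all 3-piece positions are removed
    if piece_count >= 16:
        return magnitude > 750  # keep only high simple eval when many pieces remain
    return magnitude <= 3000  # with few pieces, drop the most lopsided positions


for pc, ev in [(3, 5000), (20, 800), (20, 700), (10, 3500), (10, 2000)]:
    print(pc, ev, keep_position(pc, ev))
```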

Additionally, the simple eval range where the smallnet is used was widened to
cover more positions previously evaluated by the big net and simple eval.

```yaml
experiment-name: 128--S1-hse-S7-v4-S3-v1-no-wld-skip

training-dataset:
  - /data/hse/S3/leela96-filt-v2.min.high-simple-eval-1k.binpack
  - /data/hse/S3/dfrc99-16tb7p-eval-filt-v2.min.high-simple-eval-1k.binpack
  - /data/hse/S3/test80-apr2022-16tb7p.min.high-simple-eval-1k.binpack

  - /data/hse/S7/test60-2020-2tb7p.v6-3072.high-simple-eval-v4.binpack
  - /data/hse/S7/test60-novdec2021-12tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack

  - /data/hse/S7/test77-nov2021-2tb7p.v6-3072.min.high-simple-eval-v4.binpack
  - /data/hse/S7/test77-dec2021-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack
  - /data/hse/S7/test77-jan2022-2tb7p.high-simple-eval-v4.binpack

  - /data/hse/S7/test78-jantomay2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack
  - /data/hse/S7/test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack

  - /data/hse/S7/test79-apr2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack
  - /data/hse/S7/test79-may2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack

  - /data/hse/S7/test80-may2022-16tb7p.high-simple-eval-v4.binpack
  - /data/hse/S7/test80-jun2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack
  - /data/hse/S7/test80-jul2022-16tb7p.v6-dd.min.high-simple-eval-v4.binpack
  - /data/hse/S7/test80-aug2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack
  - /data/hse/S7/test80-sep2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack
  - /data/hse/S7/test80-oct2022-16tb7p.v6-dd.high-simple-eval-v4.binpack
  - /data/hse/S7/test80-nov2022-16tb7p-v6-dd.min.high-simple-eval-v4.binpack

  - /data/hse/S7/test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack
  - /data/hse/S7/test80-feb2023-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack
  - /data/hse/S7/test80-mar2023-2tb7p.v6-sk16.min.high-simple-eval-v4.binpack
  - /data/hse/S7/test80-apr2023-2tb7p-filter-v6-sk16.min.high-simple-eval-v4.binpack
  - /data/hse/S7/test80-may2023-2tb7p.v6.min.high-simple-eval-v4.binpack
  - /data/hse/S7/test80-jun2023-2tb7p.v6-3072.min.high-simple-eval-v4.binpack
  - /data/hse/S7/test80-jul2023-2tb7p.v6-3072.min.high-simple-eval-v4.binpack
  - /data/hse/S7/test80-aug2023-2tb7p.v6.min.high-simple-eval-v4.binpack
  - /data/hse/S7/test80-sep2023-2tb7p.high-simple-eval-v4.binpack
  - /data/hse/S7/test80-oct2023-2tb7p.high-simple-eval-v4.binpack

wld-fen-skipping: False
start-from-engine-test-net: False

nnue-pytorch-branch: linrock/nnue-pytorch/L1-128
engine-test-branch: linrock/Stockfish/L1-128-nolazy
engine-base-branch: linrock/Stockfish/L1-128

num-epochs: 500
start-lambda: 1.0
end-lambda: 1.0
```

Experiment yaml configs converted to easy_train.sh commands with:
https://github.com/linrock/nnue-tools/blob/4339954/yaml_easy_train.py

Binpacks interleaved at training time with:
official-stockfish/nnue-pytorch#259

FT weights permuted with 10k positions from fishpack32.binpack with:
official-stockfish/nnue-pytorch#254
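
As a rough illustration of why the order of FT neurons can matter, the sketch below counts how many blocks of consecutive neurons are entirely zero under a given permutation. It assumes that inference can skip work for all-zero blocks of 4 neurons, which is my understanding of the motivation rather than something stated here. The toy data and function name are made up, and the real `ftperm.py` searches for a good permutation on recorded activations instead of merely scoring one.

```python
def count_zero_blocks(activations, perm, block=4):
    """Count blocks of `block` consecutive (permuted) neurons that are all zero."""
    zero_blocks = 0
    for row in activations:
        permuted = [row[i] for i in perm]
        for start in range(0, len(permuted), block):
            if not any(permuted[start:start + block]):
                zero_blocks += 1
    return zero_blocks


# Toy data: 8 neurons; even-indexed neurons fire, odd-indexed ones stay at zero.
activations = [
    [5, 0, 3, 0, 7, 0, 1, 0],
    [2, 0, 9, 0, 4, 0, 6, 0],
]
identity = list(range(8))
grouped = [0, 2, 4, 6, 1, 3, 5, 7]  # place the always-zero neurons next to each other

print(count_zero_blocks(activations, identity))
print(count_zero_blocks(activations, grouped))
```

With the original order every block contains a nonzero entry; after grouping, each position has one block that could be skipped.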

Data filtered for high simple eval positions (v4) with:
https://github.com/linrock/Stockfish/blob/b9c8440/src/tools/transform.cpp#L640-L675

Training data can be found at:
https://robotmoon.com/nnue-training-data/

Local elo at 25k nodes per move of
L1-128 smallnet (nnue-only eval) vs. L1-128 trained on standard S1 data:
nn-epoch319.nnue : -241.7 +/- 3.2

Passed STC vs. 36db936:
https://tests.stockfishchess.org/tests/view/6576b3484d789acf40aabbfe
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 21920 W: 5680 L: 5381 D: 10859
Ptnml(0-2): 82, 2488, 5520, 2789, 81

Passed LTC vs. DualNNUE official-stockfish#4915:
https://tests.stockfishchess.org/tests/view/65775c034d789acf40aac7e3
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 147606 W: 36619 L: 36063 D: 74924
Ptnml(0-2): 98, 16591, 39891, 17103, 120

Bench: 1438336
Disservin pushed a commit to official-stockfish/Stockfish that referenced this pull request Jan 7, 2024
closes #4919

Bench: 1438336
Disservin pushed a commit to Disservin/Stockfish that referenced this pull request Jan 8, 2024
Joachim26 pushed a commit to Joachim26/StockfishNPS that referenced this pull request Jan 14, 2024
rn5f107s2 pushed a commit to rn5f107s2/Stockfish that referenced this pull request Jan 14, 2024
windfishballad pushed a commit to windfishballad/Stockfish that referenced this pull request Jan 23, 2024