This repository is a branch of the original Mortal repository, transitioning from value-based methods to policy-based methods.
Initially developed in 2022 based on Mortal V2, it was migrated to Mortal V4 in 2024.
This branch features:
- More stable performance optimization process
- Enhanced final performance
Note:
The performance results are based on a comparison with the baseline model. The baseline used for testing has been uploaded to RiichLab (mjai.app) and has maintained a stable rank across multiple evaluation batches.
Consistent with the original repository. Read the Documentation.
Torch requirement: torch 2.5.1+cu124 (install via pip).
Mortal-Policy adopts an offline-to-online training approach:

- **Data Preparation**: Collect samples in `mjai` format (see the log-reading sketch after this list).
- **Configuration**: Rename `config.example.toml` to `config.toml` and set the hyperparameters (a loading sketch also follows the list).
- **Training Stages**:
  - **Offline Phase 1 (Advantage Weighted Regression)**: Run `train_offline_phase1.py` (the AWR objective is sketched after this list).
  - **Offline Phase 2 (Behavior Proximal Policy Optimization)**: Optional; the code is coming soon.
  - **Online Phase (Policy Gradient with Importance Sampling and PPO-style Clipping)**: Run `train_online.py` (the clipped surrogate is sketched after this list). While online-only training is possible, it is not recommended.
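As a rough illustration of the Data Preparation step: mjai logs are JSON-lines event streams, one event per line. The file name below is hypothetical, and the event fields (`type`, `actor`, `pai`) follow the public mjai protocol rather than anything specific to this repository.

```python
import json

# Count event types in an mjai log (one JSON event per line).
# "sample_game.json" is a hypothetical file name; the field names follow
# the public mjai protocol and are shown for illustration only.
counts = {}
with open("sample_game.json", encoding="utf-8") as f:
    for line in f:
        event = json.loads(line)
        counts[event["type"]] = counts.get(event["type"], 0) + 1

print(counts)  # e.g. {"start_game": 1, "tsumo": 70, "dahai": 69, ...}
```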
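For the Configuration step, the file is plain TOML, so it can be inspected programmatically with the `tomllib` module shipped with Python 3.11+. The section and key names accessed below are hypothetical placeholders, not the actual schema of `config.toml`.

```python
import tomllib

# Load the training configuration. The keys accessed here ("control.device",
# "train.batch_size") are hypothetical placeholders, not the real schema.
with open("config.toml", "rb") as f:
    cfg = tomllib.load(f)

print(cfg.get("control", {}).get("device", "cuda"))
print(cfg.get("train", {}).get("batch_size", 256))
```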
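Offline Phase 1 uses Advantage Weighted Regression. Below is a minimal PyTorch sketch of the standard AWR policy objective; the tensor names, temperature `beta`, and weight clipping are assumptions for illustration, not the repository's actual implementation.

```python
import torch

def awr_policy_loss(log_probs: torch.Tensor,
                    advantages: torch.Tensor,
                    beta: float = 1.0,
                    max_weight: float = 20.0) -> torch.Tensor:
    """Advantage Weighted Regression objective (sketch).

    log_probs:  log pi(a_t | s_t) of dataset actions under the current policy
    advantages: advantage estimates A(s_t, a_t) computed from the offline data
    beta:       temperature; smaller values favor high-advantage actions more sharply
    max_weight: clip the exponential weights for numerical stability
    """
    weights = torch.exp(advantages / beta).clamp(max=max_weight)
    # Weighted maximum likelihood: no gradient flows through the weights.
    return -(weights.detach() * log_probs).mean()

# Example usage with dummy data.
log_probs = torch.randn(32, requires_grad=True)
advantages = torch.randn(32)
loss = awr_policy_loss(log_probs, advantages)
loss.backward()
```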
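The Online Phase is described as policy gradient with importance sampling and PPO-style clipping; the standard clipped surrogate that combines the two is sketched below. Variable names and the clip range are illustrative, not taken from the repository's code.

```python
import torch

def clipped_surrogate_loss(new_log_probs: torch.Tensor,
                           old_log_probs: torch.Tensor,
                           advantages: torch.Tensor,
                           clip_eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipped policy-gradient objective (sketch).

    The importance ratio pi_new / pi_old corrects for the policy that actually
    generated the samples; clipping limits how far one update can move it.
    """
    ratio = torch.exp(new_log_probs - old_log_probs.detach())
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Example usage with dummy data.
new_lp = torch.randn(32, requires_grad=True)
old_lp = new_lp.detach() + 0.1 * torch.randn(32)
adv = torch.randn(32)
loss = clipped_surrogate_loss(new_lp, old_lp, adv)
loss.backward()
```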
- Maintained alignment with the original Mortal repository. For details, see this post.
- The weights, hyperparameters, and some online training features were removed from this branch when it was open-sourced.
Copyright (C) 2021-2022 Equim
Copyright (C) 2025 Nitasurin
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.