Install the package in editable mode using setup.py:
$ git clone https://github.com/webbtheosim/ml-ternary-phase.git
$ cd ml-ternary-phase
$ conda create --name phase python=3.8.16
$ conda activate phase
$ # or source activate phase
$ pip install -e .
Select a disk location for data storage and update the directory paths before running the program. Download the required data from Zenodo here.
- DATA_DIR: Stores data pickle files (approx. 400 MB).
# LOAD DATA
import os
import pickle

DATA_DIR = "your/custom/dir/"
filename = os.path.join(DATA_DIR, "data_clean.pickle")
with open(filename, "rb") as handle:
    (x, y_c, y_r, phase_idx, num_phase, max_phase) = pickle.load(handle)
- x: Input x = (χAB, χBC, χAC, vA, vB, vC, φA, φB) ∈ ℝ⁸.
- y_c: Output one-hot encoded classification vector yc ∈ ℝ³.
- y_r: Output equilibrium composition and abundance vector yr = (φAα, φBα, φAβ, φBβ, φAγ, φBγ, wα, wβ, wγ) ∈ ℝ⁹.
- phase_idx: A single integer indicating which unique phase system the input belongs to.
- num_phase: A single integer indicating the number of equilibrium phases the input splits into.
- max_phase: A single integer indicating the maximum number of equilibrium phases the system splits into.
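As a quick sanity check on the fields listed above, the following is a minimal sketch that verifies the documented dimensions and the one-hot property of y_c. It assumes the arrays are NumPy-compatible with one sample per row, which the description implies but does not state. The helper name check_dataset and the demo rows are hypothetical, not part of the repository, and the demo values are synthetic stand-ins rather than real data.

```python
import numpy as np

def check_dataset(x, y_c, y_r):
    """Return basic consistency checks for the arrays described above.

    Assumption: each array has one sample per row.
    """
    x, y_c, y_r = (np.asarray(a, dtype=float) for a in (x, y_c, y_r))
    n = x.shape[0]
    return {
        "same_length": y_c.shape[0] == n and y_r.shape[0] == n,
        "x_has_8_features": x.ndim == 2 and x.shape[1] == 8,
        "y_c_is_one_hot": (
            y_c.ndim == 2
            and y_c.shape[1] == 3
            and bool(np.all((y_c == 0) | (y_c == 1)))
            and bool(np.all(y_c.sum(axis=1) == 1))
        ),
        "y_r_has_9_outputs": y_r.ndim == 2 and y_r.shape[1] == 9,
    }

# Synthetic stand-in rows (not real data) with the documented shapes.
x_demo = np.full((2, 8), 0.3)
y_c_demo = np.array([[1, 0, 0], [0, 1, 0]])
y_r_demo = np.zeros((2, 9))

report = check_dataset(x_demo, y_c_demo, y_r_demo)
for name, passed in report.items():
    print(f"{name}: {passed}")
```

To run the same check on the real file, pass the x, y_c, and y_r objects returned by pickle.load in place of the demo arrays.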
To train from scratch, only the DATA_DIR is required.
Optional training results can be downloaded here.
- TRAIN_RESULT_DIR: Stores training results in pickle format (approx. 5 GB). These pickle files can be used to reproduce the post-ML optimization results. Download
- RESULT_DIR: Stores results in pickle format for analysis and plotting (approx. 229 MB). These are temporary files. In the notebooks, if reload=False, the plots are generated from these files. Download
- OPT_RESULT_DIR: Stores post-ML Newton-CG optimization results in pickle format (approx. 78 MB). Each file represents the optimization of a single phase system, identified by a unique index. Download
- PICKLE_INNER_PATH: Stores hyperparameter tuning results (approx. 16 MB). To start with only the outer loop of cross-validation, use these files directly and run run_outercv.py. To perform hyperparameter tuning yourself, run run_innercv.py. Download
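One way to keep the directory variables above in one place is sketched below. The helper make_dirs and the sub-folder names are hypothetical choices for illustration. The repository may expect these paths to be set elsewhere, so treat this as a convenience pattern rather than the project's own configuration mechanism.

```python
import tempfile
from pathlib import Path

def make_dirs(base):
    """Create one sub-directory per path variable and return them by name.

    The sub-folder names are illustrative assumptions, not repository defaults.
    """
    base = Path(base)
    names = {
        "DATA_DIR": "data",
        "TRAIN_RESULT_DIR": "train_result",
        "RESULT_DIR": "result",
        "OPT_RESULT_DIR": "opt_result",
        "PICKLE_INNER_PATH": "pickle_inner",
    }
    dirs = {key: base / sub for key, sub in names.items()}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs

# Demo in a throwaway directory; replace tmp with your own storage location.
with tempfile.TemporaryDirectory() as tmp:
    dirs = make_dirs(tmp)
    created = {key: path.is_dir() for key, path in dirs.items()}
    print(created)
```

Only DATA_DIR needs to contain downloaded files when training from scratch; the other directories can start empty.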
The notebook folder contains Jupyter notebooks for reproducing figures and tables:
- result.ipynb: Visualizes all result figures from the paper.
- optimization.ipynb: Generates coexistence curves using ML predictions and post-ML Newton-CG optimization.
The mlphase folder contains the core ML code:
- run_innercv.py: Performs hyperparameter tuning using 10% of the training data (inner loop of nested five-fold cross-validation).
- run_outercv.py: Conducts production run training using the best hyperparameter combinations (outer loop of nested five-fold cross-validation).
- run_opt.py: Executes post-ML Newton-CG optimization for coexistence curve predictions.
- analysis/: Scripts for calculating and evaluating accuracy metrics of model predictions.
- data/: Scripts for data splitting and loading.
- models/: Defines architecture and implementation of various machine learning models.
- plot/: Scripts for creating visual representations of data, including figures for analysis and publication.
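The nested cross-validation described above can be illustrated with a small index-splitting sketch. This is not the repository's implementation: it assumes splitting by sample index (the actual code may split differently, for example by phase system), and it assumes the inner loop also uses five folds on a random 10% subset of each outer training set. The function names kfold_indices and nested_cv are hypothetical.

```python
import random

def kfold_indices(indices, k, seed=0):
    """Shuffle indices and return k (train, test) splits."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, folds[i]))
    return splits

def nested_cv(n_samples, k=5, inner_fraction=0.1, seed=0):
    """Return (outer_train, outer_test, inner_splits) for each outer fold.

    The inner splits are built only from a random subset of the outer
    training indices, so outer test samples never reach the tuning step.
    """
    result = []
    outer = kfold_indices(range(n_samples), k, seed)
    for fold, (train, test) in enumerate(outer):
        rng = random.Random(seed + fold + 1)
        n_inner = max(k, int(len(train) * inner_fraction))
        subset = rng.sample(train, n_inner)
        result.append((train, test, kfold_indices(subset, k, seed)))
    return result

folds = nested_cv(1000)
for train, test, inner in folds:
    inner_train, inner_test = inner[0]
    print(len(train), len(test), len(inner_train), len(inner_test))
```

In this layout, hyperparameters would be chosen on the inner splits and the final model for each outer fold would be trained on the full outer training set.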
The hull_creation folder contains code for convex hull data generation.
- result_pickle/: Contains temporary files used for figure preparation.
- result_csv/: Contains temporary files used for table creation.
Note: To generate your own results, set reload=True in the notebooks.
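The reload flag follows a common cache-or-recompute pattern, sketched below. This is an assumption about the general idea, not a copy of the notebook code: the helper load_or_compute is hypothetical, and the fallback of computing when no cached file exists is a design choice of this sketch.

```python
import pickle
import tempfile
from pathlib import Path

def load_or_compute(path, compute, reload=False):
    """Return cached results unless reload is True or no cache exists."""
    path = Path(path)
    if not reload and path.exists():
        with open(path, "rb") as handle:
            return pickle.load(handle)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as handle:
        pickle.dump(result, handle)
    return result

calls = []

def expensive():
    """Placeholder for a slow analysis step; the value is arbitrary."""
    calls.append(1)
    return {"value": 42}

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "result_pickle" / "demo.pickle"
    first = load_or_compute(cache, expensive, reload=True)    # computes and saves
    second = load_or_compute(cache, expensive, reload=False)  # reads the saved file
    third = load_or_compute(cache, expensive, reload=True)    # recomputes
```

With reload=False the slow step is skipped whenever a saved file is present, which matches the use of the downloaded temporary files for plotting.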
The submit folder contains job submission scripts for High-Performance Computing environments:
- innercv.submit: Neural network hyperparameter tuning.
- outercv.submit: Neural network production run training.
- post_opt.submit: Post-ML optimization.