Faster training in reduced bases (#4)
* working on a more efficient reduced basis training

* updating the fast training code, still an issue in the end when re-stitching the networks

* the fast training is working currently for the as_dense architecture

* testing accuracy in l2

* updating reduced basis training codes

* fixing some last minute issues with the dino training

* more improvements

* updating fast training to accommodate more than as

* adding callbacks to the opt_parameters default dict to accommodate learning rate scheduling

* adding callbacks to the opt_parameters default dict to accommodate learning rate scheduling

* adding readme and drivers, working on finishing this pull request

* updating some code, getting drivers sorted out in the refactoring

* updating training drivers

* further refactoring

* more refactoring

* updating

* updating, let's see if the dipnet stuff still runs correctly

* massive commit incoming

* running hyperelasticity runs again

* updating

* updating again more rb_dense typos and such

* needed to move the save weights out of the train_dino which could be run in reduced setting

* hyperelasticity evaluation and post-processing working

* updating everything

* hyper and rdiff evaluation have all been checked

* a few debugging leftovers needed to be removed

* all examples are completely documented and functional in the refactoring

* adding INSTALL file

* updating README

* some updates regarding the evaluation suite
tomoleary authored Jul 29, 2023
1 parent 42cda9a commit 1ddc2fa
Showing 55 changed files with 4,303 additions and 1,183 deletions.
33 changes: 33 additions & 0 deletions INSTALL.md
@@ -0,0 +1,33 @@
# Derivative Informed Neural Operator

An Efficient Framework for High-Dimensional Parametric Derivative Learning


* PDE data generation is handled by `FEniCS`, `hIPPYlib`, and `hippyflow`. For this, [`hIPPYlib`](https://github.com/hippylib/hippylib) and [`hippyflow`](https://github.com/hippylib/hippyflow) must be installed.

These can be installed with conda:

* `conda create -n hippyflow -c uvilla -c conda-forge fenics==2019.1.0 tensorflow=2.7.0 matplotlib scipy`

The installation assumes that the environment variables `HIPPYLIB_PATH`, `HIPPYFLOW_PATH`, and `DINO_PATH` have been set:

* `export HIPPYLIB_PATH=path/to/hippylib`
* `export HIPPYFLOW_PATH=path/to/hippyflow`
* `export DINO_PATH=path/to/dino`


## Machine learning in TensorFlow (beware of version / eager execution)

Neural network training is handled by `keras` within `TensorFlow`. The way the Jacobians are currently extracted requires disabling some TensorFlow v2 behavior, which creates issues with eager execution in later versions of TensorFlow. This library works with TensorFlow `2.7.0`. In the future, `dino` may be reworked to handle the eager execution issue in later versions of TensorFlow.
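
The compatibility guard used throughout the repository's scripts looks like this (a minimal excerpt of the pattern, not a complete script):

```python
import tensorflow as tf

# Fall back to TF1-style graph mode so Jacobians can be extracted
# from the computational graph; eager execution is disabled.
if int(tf.__version__[0]) > 1:
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
```
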
25 changes: 25 additions & 0 deletions README.md
@@ -40,3 +40,28 @@



# DINO Publications and Manuscripts

- \[1\] O'Leary-Roseberry, T., Chen, P., Villa, U., Ghattas, O.,
[**Derivative-Informed Neural Operator: An Efficient Framework for High-Dimensional Parametric Derivative Learning**](https://arxiv.org/abs/2206.10745).
([Download](https://arxiv.org/abs/2206.10745))<details><summary>BibTeX</summary><pre>
@article{OLearyRoseberryChenVillaEtAl22,
  title={Derivative-informed neural operator: an efficient framework for high-dimensional parametric derivative learning},
  author={O'Leary-Roseberry, Thomas and Chen, Peng and Villa, Umberto and Ghattas, Omar},
  journal={arXiv preprint arXiv:2206.10745},
  year={2022}
}</pre></details>

The following works use DINO.

- \[2\] Luo, D., O'Leary-Roseberry, T., Chen, P., Ghattas, O.,
[**Efficient PDE-constrained optimization under high-dimensional uncertainty using derivative-informed neural operators**](https://arxiv.org/abs/2305.20053).
([Download](https://arxiv.org/abs/2305.20053))<details><summary>BibTeX</summary><pre>
@article{luo2023efficient,
  title={Efficient PDE-constrained optimization under high-dimensional uncertainty using derivative-informed neural operators},
  author={Luo, Dingcheng and O'Leary-Roseberry, Thomas and Chen, Peng and Ghattas, Omar},
  journal={arXiv preprint arXiv:2305.20053},
  year={2023}
}</pre></details>
39 changes: 39 additions & 0 deletions applications/confusion/README.md
@@ -0,0 +1,39 @@
# Instructions for convection-reaction-diffusion problem

## 1. Generate the training data


First, in order to generate the training data, run one of the following commands:

`python confusionProblemSetup.py`

or with several simultaneous MPI processes:

`python generateConfusion.py`

The command line arguments `-save_jacobian_data` and `-save_as` are set to `True` (`1`) by default. In order to generate a basis for PCANet (i.e., a KLE of the input parameter), additionally set the argument `-save_kle` to `True` (`1`). The data will initially be saved to `./data/`, in a subfolder named after the specifics of the problem. When the data become large, it may be preferable to save them to a different location (e.g., a dedicated storage location) by modifying the output path in `confusionProblemSetup.py`, or to simply move the data after the process is complete.
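
For example, to generate the KLE basis in addition to the defaults:

`python confusionProblemSetup.py -save_kle 1`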

## 2. Train the neural networks

The neural network scripts are all located in `dino_training/`. To run all neural network trainings used in the DINO paper, run

`python training_runs.py`

Note that these runs may take a long time; they were all run on a cluster with 1TB of RAM. The data are assumed to be loaded from a subfolder in `data/`. If the data were moved somewhere else, symbolic links are suggested (e.g., in bash: `ln -s /path/to/moved/data/ data/`).

When these runs finish, they output the trained weights (as pickled dictionaries) to a folder `trained_weights/` within the `dino_training/` directory. The neural networks are not directly [saved and loaded using tensorflow](https://www.tensorflow.org/tutorials/keras/save_and_load) because extracting the Jacobians from the network introduces significant computational graph overhead. A better way to separate DINO training from evaluation and deployment might be to instantiate an identical architecture (without the Jacobian computational graph overhead) at the end of training, copy the weights over from the trained DINO, and then save the copy using tensorflow.
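
A minimal sketch of that idea, assuming a trained Keras model `dino_model`, its settings dictionary `settings`, and a hypothetical constructor `build_observable_network` that rebuilds the same layers without the Jacobian graph:

```python
# Sketch only: build_observable_network is a hypothetical constructor.
clean_model = build_observable_network(settings)
# Transfer the trained weights onto the Jacobian-free copy.
clean_model.set_weights(dino_model.get_weights())
# Save the copy with standard tf.keras tooling.
clean_model.save('trained_model/')
```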

## 3. Evaluate the trained neural networks

Once the neural networks are trained and their weights have been saved, the networks can be evaluated using the scripts located in `dino_training/evaluation/`. From within `dino_training/evaluation/`, run:

`python evaluation_loop.py -weights_dir ../trained_weights/`


These scripts will output dictionaries of evaluated accuracies, gradient errors, and Jacobian errors to `dino_training/evaluation/postproc/`.
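
The pickled results can then be inspected directly; a minimal sketch (the file name here is illustrative):

```python
import pickle

# Each output file is a pickled dictionary of evaluated errors.
with open('postproc/accuracies/my_network_accuracies.pkl', 'rb') as f:
    results = pickle.load(f)
print(results.keys())
```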




4 changes: 2 additions & 2 deletions applications/confusion/confusionProblemSetup.py
@@ -51,7 +51,7 @@ def save_logger(logger,filename = 'error_data.pkl'):
 parser.add_argument('-nx',dest = 'nx',required= False,default = 64,help='targets for observable',type = int)
 parser.add_argument('-ny',dest = 'ny',required= False,default = 64,help='targets for observable',type = int)
 parser.add_argument('-gamma',dest = 'gamma',required=False,default = 0.1, help="gamma for matern prior",type=float)
-parser.add_argument('-delta',dest = 'delta',required=False,default = 0.5, help="delta for matern prior",type=float)
+parser.add_argument('-delta',dest = 'delta',required=False,default = 1.0, help="delta for matern prior",type=float)
 parser.add_argument('-formulation',dest = 'formulation',required=False,default = 'confusion', help="formulation name string",type=str)
 parser.add_argument('-save_data',dest = 'save_data',\
 	required= False,default = 0,help='boolean for saving of data',type = int)
@@ -85,7 +85,7 @@ def save_logger(logger,filename = 'error_data.pkl'):
 my_collective = MultipleSamePartitioningPDEsCollective(collective_comm)
 
 # Initialize directories for saving data
-output_directory = 'data/'+args.formulation+'_n_obs_'+str(args.nx_targets*args.ny_targets)+'_g'+str(args.gamma)+'_d'+str(args.delta)+'_nx'+str(args.nx)+'/'
+output_directory = 'data/'+args.formulation+'_nobs_'+str(args.nx_targets*args.ny_targets)+'_g'+str(args.gamma)+'_d'+str(args.delta)+'_nx'+str(args.nx)+'/'
 os.makedirs(output_directory,exist_ok = True)
 save_states_dir = output_directory+'save_states/'
 
33 changes: 20 additions & 13 deletions applications/confusion/dino_paper/dino_training.py
@@ -14,7 +14,6 @@
 #
 # Author: Tom O'Leary-Roseberry
 # Contact: tom.olearyroseberry@utexas.edu
-
 import os, sys
 os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
 os.environ['KMP_DUPLICATE_LIB_OK']='True'
@@ -28,8 +28,8 @@
 import time
 import pickle
 
-sys.path.append('../../../dino/')
-from surrogate_construction import *
+sys.path.append( os.environ.get('DINO_PATH'))
+from dino import *
 
 # Import CRD problem specifics
 sys.path.append('../')
@@ -45,12 +44,15 @@
 # Arguments to be parsed from the command line execution
 parser = ArgumentParser(add_help=True)
 # Architectural parameters
-parser.add_argument("-architecture", dest='architecture',required=False, default = 'as_dense', help="architecture type: as_dense or generic_dense",type=str)
-parser.add_argument("-fixed_input_rank", dest='fixed_input_rank',required=False, default = 50, help="rank for input of AS network",type=int)
+parser.add_argument("-architecture", dest='architecture',required=False, default = 'rb_dense', help="architecture type: as_dense or generic_dense",type=str)
+parser.add_argument("-input_basis", dest='input_basis',required=False, default = 'as', help="input basis: as or kle",type=str)
+parser.add_argument("-output_basis", dest='output_basis',required=False, default = 'pod', help="output basis: pod or jjt",type=str)
+parser.add_argument("-fixed_input_rank", dest='fixed_input_rank',required=False, default = 100, help="rank for input of AS network",type=int)
 parser.add_argument("-fixed_output_rank", dest='fixed_output_rank',required=False, default = 50, help="rank for output of AS network",type=int)
-parser.add_argument("-truncation_dimension", dest='truncation_dimension',required=False, default = 50, help="truncation dimension for low rank networks",type=int)
+parser.add_argument("-truncation_dimension", dest='truncation_dimension',required=False, default = 100, help="truncation dimension for low rank networks",type=int)
 parser.add_argument("-network_name", dest='network_name',required=True, help="out name for the saved weights",type=str)
 
 
+# Optimization parameters
 parser.add_argument("-total_epochs", dest='total_epochs',required=False, default = 1, help="total epochs for training",type=int)
 
@@ -64,22 +66,21 @@
 parser.add_argument("-train_full_jacobian", dest='train_full_jacobian',required=False, default = 1, help="full J",type=int)
 
 
-parser.add_argument("-train_data_size", dest='train_data_size',required=False, default = 15*1024, help="training data size",type=int)
+parser.add_argument("-train_data_size", dest='train_data_size',required=False, default = 1*1024, help="training data size",type=int)
 parser.add_argument("-test_data_size", dest='test_data_size',required=False, default = 1024, help="testing data size",type=int)
 
 args = parser.parse_args()
 
-# jacobian_network = None
 problem_settings = confusion_problem_settings()
 
 
 settings = jacobian_network_settings(problem_settings)
 
 n_obs = 50
 gamma = 0.1
-delta = 0.5
+delta = 1.0
 nx = 64
-settings['data_dir'] = '../data/confusion_n_obs_'+str(n_obs)+'_g'+str(gamma)+'_d'+str(delta)+'_nx'+str(nx)+'/'
+settings['data_dir'] = '../data/confusion_nobs_'+str(n_obs)+'_g'+str(gamma)+'_d'+str(delta)+'_nx'+str(nx)+'/'
 
 settings['target_rank'] = args.target_rank
 settings['batch_rank'] = args.batch_rank
@@ -93,9 +94,11 @@
 settings['fixed_input_rank'] = args.fixed_input_rank
 settings['fixed_output_rank'] = args.fixed_output_rank
 settings['truncation_dimension'] = args.truncation_dimension
 
+settings['input_basis'] = args.input_basis
+settings['output_basis'] = args.output_basis
 
-settings['train_full_jacobian'] = args.train_full_jacobian
+settings['opt_parameters']['train_full_jacobian'] = args.train_full_jacobian
 

if (settings['batch_rank'] == settings['target_rank']):
@@ -114,7 +117,11 @@
 if args.l2_weight != 1.0:
 	settings['network_name'] += 'l2_weight_'+str(args.l2_weight)
 
-jacobian_network = jacobian_training_driver(settings)
+if args.h1_weight == 0.0:
+	# There is no need for DINO training
+	observable_network = observable_training_driver(settings)
+else:
+	# There is a need for DINO training
+	jacobian_network = jacobian_training_driver(settings)
 
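A hypothetical invocation of this training script, using the command line arguments defined above (the network name is illustrative):

`python dino_training.py -network_name as_dense_test -architecture rb_dense -input_basis as -output_basis pod`
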
@@ -0,0 +1,136 @@
# This file is part of the dino package
#
# dino is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, either version 2 of the License, or any later version.
#
# dino is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# If not, see <http://www.gnu.org/licenses/>.
#
# Author: Tom O'Leary-Roseberry
# Contact: tom.olearyroseberry@utexas.edu

import os, sys
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
os.environ['KMP_DUPLICATE_LIB_OK']='True'
os.environ["KMP_WARNINGS"] = "FALSE"
import numpy as np
import tensorflow as tf
if int(tf.__version__[0]) > 1:
	import tensorflow.compat.v1 as tf
	tf.disable_v2_behavior()


import time
import pickle

sys.path.append(os.environ.get('HIPPYLIB_PATH'))
import hippylib

sys.path.append(os.environ.get('HIPPYFLOW_PATH'))
import hippyflow


# Import dino inference module
sys.path.append(os.environ.get('DINO_PATH'))
from dino import *

from dino.evaluation.surrogatePostProcessing import evaluateJacobianNetwork

# Import convection-reaction-diffusion problem specifics
sys.path.append('../../')
from confusionModelSettings import confusion_problem_settings

try:
	tf.random.set_seed(0)
except:
	tf.set_random_seed(0)

from argparse import ArgumentParser
# Arguments to be parsed from the command line execution
parser = ArgumentParser(add_help=True)
# Weights directory
parser.add_argument("-weights_dir", dest='weights_dir', required=True, help="Weights directory",type=str)
# parser.add_argument("-ndata", dest='ndata', required=True, help="ndata",type=str)
parser.add_argument("-input_dim", dest = 'input_dim',required=False,default = 4225, help = "input dim",type = int)
parser.add_argument("-logging_dir", dest = 'logging_dir',required=False,default = 'postproc/accuracies/', help = "input dim",type = str)
args = parser.parse_args()

problem_settings = confusion_problem_settings()

weights_dir = args.weights_dir+'/'


weights_files = os.listdir(weights_dir)

n_obs = 50
gamma = 0.1
delta = 1.0
nx = 64
data_dir = '../../data/confusion_nobs_'+str(n_obs)+'_g'+str(gamma)+'_d'+str(delta)+'_nx'+str(nx)+'/'

print(os.path.isdir(data_dir))


for weights_name in weights_files:
	print('weights_name = ',weights_name)
	t0 = time.time()
	####
	evaluate_network = False
	settings = jacobian_network_settings(problem_settings)
	settings['nullspace_constraints'] = False
	settings['opt_parameters']['loss_weights'] = [1.0,1.0]
	settings['depth'] = 6
	settings['fixed_input_rank'] = 50
	settings['full_jacobian'] = True
	settings['full_JVVT'] = False
	####

	if ('as_dense' in weights_name.lower()) or ('dipnet' in weights_name.lower()):
		settings['architecture'] = 'rb_dense'
		if ('10050' in weights_name) or ('100-50' in weights_name):
			print('100')
			settings['fixed_input_rank'] = 100

		evaluate_network = True

	elif 'generic_dense' in weights_name:
		settings['architecture'] = 'generic_dense'
		# What is a better way in general to set the input and output dimensions?
		settings['input_dim'] = args.input_dim
		settings['output_dim'] = 50
		evaluate_network = True
	else:
		print('Not implemented, passing for now')
		pass

	if evaluate_network:
		file_name = weights_dir+weights_name
		jacobian_network = observable_network_loader(settings, file_name = file_name)
		for i in range(2):
			print(80*'#')
		print('Running for :'.center(80))
		print(weights_name.center(80))
		for i in range(2):
			print(80*'#')
		results = evaluateJacobianNetwork(settings,jacobian_network = jacobian_network,data_dir = data_dir)
		logging_dir = args.logging_dir
		logger_name = weights_name.split(weights_dir)[-1].split('.pkl')[0]+'_accuracies.pkl'

		os.makedirs(logging_dir,exist_ok = True)

		with open(logging_dir+logger_name, 'wb+') as f:
			pickle.dump(results, f, pickle.HIGHEST_PROTOCOL)

		print(' Time = ',time.time() - t0,'s')




