Faster training in reduced bases (#4)
* working on a more efficient reduced basis training

* updating the fast training code, still an issue in the end when re-stitching the networks

* the fast training is working currently for the as_dense architecture

* testing accuracy in l2

* updating reduced basis training codes

* fixing some last minute issues with the dino training

* more improvements

* updating fast training to accommodate more than as

* adding callbacks to the opt_parameters default dict to accommodate learning rate scheduling

* adding callbacks to the opt_parameters default dict to accommodate learning rate scheduling

* adding readme and drivers, working on finishing this pull request

* updating some code, getting drivers sorted out in the refactoring

* updating training drivers

* further refactoring

* more refactoring

* updating

* updating, let's see if the dipnet stuff still runs correctly

* massive commit incoming

* running hyperelasticity runs again

* updating

* updating again more rb_dense typos and such

* needed to move the save weights out of the train_dino which could be run in reduced setting

* hyperelasticity evaluation and post-processing working

* updating everything

* hyper and rdiff evaluation have all been checked

* a few debugging leftovers needed to be removed

* all examples are completely documented and functional in the refactoring

* adding INSTALL file

* updating README

* some updates regarding the evaluation suite
tomoleary authored Jul 29, 2023
1 parent 42cda9a commit 1ddc2fa
Showing 55 changed files with 4,303 additions and 1,183 deletions.
33 changes: 33 additions & 0 deletions INSTALL.md
@@ -0,0 +1,33 @@
# Derivative Informed Neural Operator

An Efficient Framework for High-Dimensional Parametric Derivative Learning


* PDE data generation is handled by `FEniCS`, `hIPPYlib`, and `hippyflow`. For this, [`hIPPYlib`](https://github.com/hippylib/hippylib) and [`hippyflow`](https://github.com/hippylib/hippyflow) must be installed.

These can be installed with conda:

* `conda create -n hippyflow -c uvilla -c conda-forge fenics==2019.1.0 tensorflow=2.7.0 matplotlib scipy`

The installation assumes that the environment variables `HIPPYLIB_PATH`, `HIPPYFLOW_PATH`, and `DINO_PATH` have been set:

* `export HIPPYLIB_PATH=path/to/hippylib`
* `export HIPPYFLOW_PATH=path/to/hippyflow`
* `export DINO_PATH=path/to/dino`


## Machine learning in TensorFlow (beware of version / eager execution)

Neural network training is handled by `keras` within `TensorFlow`. The way the Jacobians are currently extracted requires disabling some TensorFlow v2 behavior, which creates issues with eager execution in later versions of TensorFlow. This library works with TensorFlow `2.7.0`. In the future, `dino` may be reworked to handle the eager execution issue in later versions of TensorFlow.
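
The compatibility guard used throughout the repository's scripts looks like this (a minimal excerpt of the pattern, not a complete script):

```python
import tensorflow as tf

# Fall back to TF1-style graph mode so Jacobians can be extracted
# from the computational graph; eager execution is disabled.
if int(tf.__version__[0]) > 1:
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
```
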
25 changes: 25 additions & 0 deletions README.md
@@ -40,3 +40,28 @@



# DINO Publications and Manuscripts

- \[1\] O'Leary-Roseberry, T., Chen, P., Villa, U., Ghattas, O.,
[**Derivative-Informed Neural Operator: An Efficient Framework for High-Dimensional Parametric Derivative Learning**](https://arxiv.org/abs/2206.10745).
([Download](https://arxiv.org/abs/2206.10745))<details><summary>BibTeX</summary><pre>
@article{OLearyRoseberryChenVillaEtAl22,
  title={Derivative-informed neural operator: an efficient framework for high-dimensional parametric derivative learning},
  author={O'Leary-Roseberry, Thomas and Chen, Peng and Villa, Umberto and Ghattas, Omar},
  journal={arXiv preprint arXiv:2206.10745},
  year={2022}
}</pre></details>

The following works use DINO.

- \[2\] Luo, D., O'Leary-Roseberry, T., Chen, P., Ghattas, O.,
[**Efficient PDE-constrained optimization under high-dimensional uncertainty using derivative-informed neural operators**](https://arxiv.org/abs/2305.20053).
([Download](https://arxiv.org/abs/2305.20053))<details><summary>BibTeX</summary><pre>
@article{luo2023efficient,
  title={Efficient PDE-constrained optimization under high-dimensional uncertainty using derivative-informed neural operators},
  author={Luo, Dingcheng and O'Leary-Roseberry, Thomas and Chen, Peng and Ghattas, Omar},
  journal={arXiv preprint arXiv:2305.20053},
  year={2023}
}</pre></details>
39 changes: 39 additions & 0 deletions applications/confusion/README.md
@@ -0,0 +1,39 @@
# Instructions for convection-reaction-diffusion problem

## 1. Generate the training data


First, in order to generate the training data, run one of the following commands:

`python confusionProblemSetup.py`

or with several simultaneous MPI processes:

`python generateConfusion.py`

The command line arguments `-save_jacobian_data` and `-save_as` are set to `True` (`1`) by default. In order to generate a basis for PCANet (i.e., a KLE of the input parameter), additionally set the argument `-save_kle` to `True` (`1`). The data will initially be saved to `./data/`, in a subfolder named after the specifics of the problem. When the data become large, it may be preferable to save them to a different location (e.g., a dedicated storage location) by modifying the output path in `confusionProblemSetup.py`, or to simply move the data after the process is complete.
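
For example, to generate the KLE basis in addition to the defaults:

`python confusionProblemSetup.py -save_kle 1`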

## 2. Train the neural networks

The neural network scripts are all located in `dino_training/`. To run all neural network trainings used in the DINO paper, run

`python training_runs.py`

Note that these runs may take a long time; they were all run on a cluster with 1TB of RAM. The data are assumed to be loaded from a subfolder in `data/`. If the data were moved somewhere else, symbolic links are suggested (e.g., in bash: `ln -s /path/to/moved/data/ data/`).

When these runs finish, they output the trained weights (as pickled dictionaries) to a folder `trained_weights/` within the `dino_training/` directory. The neural networks are not directly [saved and loaded using tensorflow](https://www.tensorflow.org/tutorials/keras/save_and_load) because extracting the Jacobians from the network introduces significant computational graph overhead. A better way to separate DINO training from evaluation and deployment might be to instantiate an identical architecture (without the Jacobian computational graph overhead) at the end of training, copy the weights over from the trained DINO, and then save the copy using tensorflow.
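
A minimal sketch of that idea, assuming a trained Keras model `dino_model`, its settings dictionary `settings`, and a hypothetical constructor `build_observable_network` that rebuilds the same layers without the Jacobian graph:

```python
# Sketch only: build_observable_network is a hypothetical constructor.
clean_model = build_observable_network(settings)
# Transfer the trained weights onto the Jacobian-free copy.
clean_model.set_weights(dino_model.get_weights())
# Save the copy with standard tf.keras tooling.
clean_model.save('trained_model/')
```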

## 3. Evaluate the trained neural networks

Once the neural networks are trained and their weights have been saved, the networks can be evaluated using the scripts located in `dino_training/evaluation/`. From within `dino_training/evaluation/`, run:

`python evaluation_loop.py -weights_dir ../trained_weights/`


These scripts will output dictionaries of evaluated accuracies, gradient errors, and Jacobian errors to `dino_training/evaluation/postproc/`.
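
The pickled results can then be inspected directly; a minimal sketch (the file name here is illustrative):

```python
import pickle

# Each output file is a pickled dictionary of evaluated errors.
with open('postproc/accuracies/my_network_accuracies.pkl', 'rb') as f:
    results = pickle.load(f)
print(results.keys())
```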




4 changes: 2 additions & 2 deletions applications/confusion/confusionProblemSetup.py
@@ -51,7 +51,7 @@ def save_logger(logger,filename = 'error_data.pkl'):
 parser.add_argument('-nx',dest = 'nx',required= False,default = 64,help='targets for observable',type = int)
 parser.add_argument('-ny',dest = 'ny',required= False,default = 64,help='targets for observable',type = int)
 parser.add_argument('-gamma',dest = 'gamma',required=False,default = 0.1, help="gamma for matern prior",type=float)
-parser.add_argument('-delta',dest = 'delta',required=False,default = 0.5, help="delta for matern prior",type=float)
+parser.add_argument('-delta',dest = 'delta',required=False,default = 1.0, help="delta for matern prior",type=float)
 parser.add_argument('-formulation',dest = 'formulation',required=False,default = 'confusion', help="formulation name string",type=str)
 parser.add_argument('-save_data',dest = 'save_data',\
 	required= False,default = 0,help='boolean for saving of data',type = int)
@@ -85,7 +85,7 @@ def save_logger(logger,filename = 'error_data.pkl'):
 my_collective = MultipleSamePartitioningPDEsCollective(collective_comm)
 
 # Initialize directories for saving data
-output_directory = 'data/'+args.formulation+'_n_obs_'+str(args.nx_targets*args.ny_targets)+'_g'+str(args.gamma)+'_d'+str(args.delta)+'_nx'+str(args.nx)+'/'
+output_directory = 'data/'+args.formulation+'_nobs_'+str(args.nx_targets*args.ny_targets)+'_g'+str(args.gamma)+'_d'+str(args.delta)+'_nx'+str(args.nx)+'/'
 os.makedirs(output_directory,exist_ok = True)
 save_states_dir = output_directory+'save_states/'
 
33 changes: 20 additions & 13 deletions applications/confusion/dino_paper/dino_training.py
@@ -14,7 +14,6 @@
 #
 # Author: Tom O'Leary-Roseberry
 # Contact: tom.olearyroseberry@utexas.edu
-
 import os, sys
 os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
 os.environ['KMP_DUPLICATE_LIB_OK']='True'
@@ -28,8 +28,8 @@
 import time
 import pickle
 
-sys.path.append('../../../dino/')
-from surrogate_construction import *
+sys.path.append( os.environ.get('DINO_PATH'))
+from dino import *
 
 # Import CRD problem specifics
 sys.path.append('../')
@@ -45,12 +44,15 @@
 # Arguments to be parsed from the command line execution
 parser = ArgumentParser(add_help=True)
 # Architectural parameters
-parser.add_argument("-architecture", dest='architecture',required=False, default = 'as_dense', help="architecture type: as_dense or generic_dense",type=str)
-parser.add_argument("-fixed_input_rank", dest='fixed_input_rank',required=False, default = 50, help="rank for input of AS network",type=int)
+parser.add_argument("-architecture", dest='architecture',required=False, default = 'rb_dense', help="architecture type: as_dense or generic_dense",type=str)
+parser.add_argument("-input_basis", dest='input_basis',required=False, default = 'as', help="input basis: as or kle",type=str)
+parser.add_argument("-output_basis", dest='output_basis',required=False, default = 'pod', help="output basis: pod or jjt",type=str)
+parser.add_argument("-fixed_input_rank", dest='fixed_input_rank',required=False, default = 100, help="rank for input of AS network",type=int)
 parser.add_argument("-fixed_output_rank", dest='fixed_output_rank',required=False, default = 50, help="rank for output of AS network",type=int)
-parser.add_argument("-truncation_dimension", dest='truncation_dimension',required=False, default = 50, help="truncation dimension for low rank networks",type=int)
+parser.add_argument("-truncation_dimension", dest='truncation_dimension',required=False, default = 100, help="truncation dimension for low rank networks",type=int)
 parser.add_argument("-network_name", dest='network_name',required=True, help="out name for the saved weights",type=str)
 
 
+# Optimization parameters
 parser.add_argument("-total_epochs", dest='total_epochs',required=False, default = 1, help="total epochs for training",type=int)
 
@@ -64,22 +66,21 @@
 parser.add_argument("-train_full_jacobian", dest='train_full_jacobian',required=False, default = 1, help="full J",type=int)
 
 
-parser.add_argument("-train_data_size", dest='train_data_size',required=False, default = 15*1024, help="training data size",type=int)
+parser.add_argument("-train_data_size", dest='train_data_size',required=False, default = 1*1024, help="training data size",type=int)
 parser.add_argument("-test_data_size", dest='test_data_size',required=False, default = 1024, help="testing data size",type=int)
 
 args = parser.parse_args()
 
-# jacobian_network = None
 problem_settings = confusion_problem_settings()
 
 
 settings = jacobian_network_settings(problem_settings)
 
 n_obs = 50
 gamma = 0.1
-delta = 0.5
+delta = 1.0
 nx = 64
-settings['data_dir'] = '../data/confusion_n_obs_'+str(n_obs)+'_g'+str(gamma)+'_d'+str(delta)+'_nx'+str(nx)+'/'
+settings['data_dir'] = '../data/confusion_nobs_'+str(n_obs)+'_g'+str(gamma)+'_d'+str(delta)+'_nx'+str(nx)+'/'
 
 settings['target_rank'] = args.target_rank
 settings['batch_rank'] = args.batch_rank
@@ -93,9 +94,11 @@
 settings['fixed_input_rank'] = args.fixed_input_rank
 settings['fixed_output_rank'] = args.fixed_output_rank
 settings['truncation_dimension'] = args.truncation_dimension
 
+settings['input_basis'] = args.input_basis
+settings['output_basis'] = args.output_basis
 
-settings['train_full_jacobian'] = args.train_full_jacobian
+settings['opt_parameters']['train_full_jacobian'] = args.train_full_jacobian
 

if (settings['batch_rank'] == settings['target_rank']):
@@ -114,7 +117,11 @@
 if args.l2_weight != 1.0:
 	settings['network_name'] += 'l2_weight_'+str(args.l2_weight)
 
-jacobian_network = jacobian_training_driver(settings)
+if args.h1_weight == 0.0:
+	# There is no need for DINO training
+	observable_network = observable_training_driver(settings)
+else:
+	# There is a need for DINO training
+	jacobian_network = jacobian_training_driver(settings)
 
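A hypothetical invocation of this training script, using the command line arguments defined above (the network name is illustrative):

`python dino_training.py -network_name as_dense_test -architecture rb_dense -input_basis as -output_basis pod`
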
@@ -0,0 +1,136 @@
# This file is part of the dino package
#
# dino is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, either version 2 of the License, or any later version.
#
# dino is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# If not, see <http://www.gnu.org/licenses/>.
#
# Author: Tom O'Leary-Roseberry
# Contact: tom.olearyroseberry@utexas.edu

import os, sys
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
os.environ['KMP_DUPLICATE_LIB_OK']='True'
os.environ["KMP_WARNINGS"] = "FALSE"
import numpy as np
import tensorflow as tf
if int(tf.__version__[0]) > 1:
	import tensorflow.compat.v1 as tf
	tf.disable_v2_behavior()


import time
import pickle

sys.path.append(os.environ.get('HIPPYLIB_PATH'))
import hippylib

sys.path.append(os.environ.get('HIPPYFLOW_PATH'))
import hippyflow


# Import dino inference module
sys.path.append(os.environ.get('DINO_PATH'))
from dino import *

from dino.evaluation.surrogatePostProcessing import evaluateJacobianNetwork

# Import convection-reaction-diffusion problem specifics
sys.path.append('../../')
from confusionModelSettings import confusion_problem_settings

try:
	tf.random.set_seed(0)
except:
	tf.set_random_seed(0)

from argparse import ArgumentParser
# Arguments to be parsed from the command line execution
parser = ArgumentParser(add_help=True)
# Weights directory
parser.add_argument("-weights_dir", dest='weights_dir', required=True, help="Weights directory",type=str)
# parser.add_argument("-ndata", dest='ndata', required=True, help="ndata",type=str)
parser.add_argument("-input_dim", dest = 'input_dim',required=False,default = 4225, help = "input dim",type = int)
parser.add_argument("-logging_dir", dest = 'logging_dir',required=False,default = 'postproc/accuracies/', help = "input dim",type = str)
args = parser.parse_args()

problem_settings = confusion_problem_settings()

weights_dir = args.weights_dir+'/'


weights_files = os.listdir(weights_dir)

n_obs = 50
gamma = 0.1
delta = 1.0
nx = 64
data_dir = '../../data/confusion_nobs_'+str(n_obs)+'_g'+str(gamma)+'_d'+str(delta)+'_nx'+str(nx)+'/'

print(os.path.isdir(data_dir))


for weights_name in weights_files:
	print('weights_name = ',weights_name)
	t0 = time.time()
	####
	evaluate_network = False
	settings = jacobian_network_settings(problem_settings)
	settings['nullspace_constraints'] = False
	settings['opt_parameters']['loss_weights'] = [1.0,1.0]
	settings['depth'] = 6
	settings['fixed_input_rank'] = 50
	settings['full_jacobian'] = True
	settings['full_JVVT'] = False
	####

	if ('as_dense' in weights_name.lower()) or ('dipnet' in weights_name.lower()):
		settings['architecture'] = 'rb_dense'
		if ('10050' in weights_name) or ('100-50' in weights_name):
			print('100')
			settings['fixed_input_rank'] = 100

		evaluate_network = True

	elif 'generic_dense' in weights_name:
		settings['architecture'] = 'generic_dense'
		# What is a better way in general to set the input and output dimensions?
		settings['input_dim'] = args.input_dim
		settings['output_dim'] = 50
		evaluate_network = True
	else:
		print('Not implemented, passing for now')
		pass

	if evaluate_network:
		file_name = weights_dir+weights_name
		jacobian_network = observable_network_loader(settings, file_name = file_name)
		for i in range(2):
			print(80*'#')
		print('Running for :'.center(80))
		print(weights_name.center(80))
		for i in range(2):
			print(80*'#')
		results = evaluateJacobianNetwork(settings,jacobian_network = jacobian_network,data_dir = data_dir)
		logging_dir = args.logging_dir
		logger_name = weights_name.split(weights_dir)[-1].split('.pkl')[0]+'_accuracies.pkl'

		os.makedirs(logging_dir,exist_ok = True)

		with open(logging_dir+logger_name, 'wb+') as f:
			pickle.dump(results, f, pickle.HIGHEST_PROTOCOL)

		print(' Time = ',time.time() - t0,'s')




