TROPHY (Trust Region Optimization with Precision HierarchY) is, as the name implies, a trust region method that uses varying precision to reduce storage, communication costs, and power consumption, working in lower precision when possible and increasing precision only when necessary. This repo contains code validating the algorithm's merit and providing numerical results for publication. A copy of our paper can be found here. Please cite
@inproceedings{clancy2022trophy,
title={TROPHY: Trust Region Optimization Using a Precision Hierarchy},
author={Clancy, Richard J and Menickelly, Matt and H{\"u}ckelheim, Jan and Hovland, Paul and Nalluri, Prani and Gjini, Rebecca},
booktitle={Computational Science--ICCS 2022: 22nd International Conference, London, UK, June 21--23, 2022, Proceedings, Part I},
pages={445--459},
year={2022},
organization={Springer}
}
To work with multi-precision in PyCUTEst, a problem cache must be built in multiple precisions. This was accomplished by tracking down the point in the PyCUTEst code after the source files have been generated via the SIF decoder but before they have been compiled. At this point, `preducer.py` is called, which runs a subroutine that casts the problem to single precision and then compiles. Although we don't expect any computational speed-ups, function evaluations are carried out in lower precision, allowing us to evaluate the deterioration of iterates/solutions when using single rather than double precision.
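Purely as an illustration of that idea (not code from this repo), the snippet below evaluates the same objective at the same point in single and double precision and compares the results; the Rosenbrock function is just a stand-in for a CUTEst objective.

```python
import numpy as np

def rosenbrock(x):
    """Simple test objective used only for illustration."""
    return np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2)

x = np.linspace(-1.0, 1.0, 50)

# Evaluate the same point in single and double precision and compare.
f_double = rosenbrock(x.astype(np.float64))
f_single = rosenbrock(x.astype(np.float32))
print("double:", f_double)
print("single:", f_single)
print("relative error:", abs(f_double - f_single) / abs(f_double))
```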
To build the multi-precision objectives and gradients for the CUTEst set via PyCUTEst, we do the following:
- The CUTEst set must be installed first and the environment set up correctly. Installation instructions can be found at the PyCUTEst site and the other links included. It is necessary to set several environment variables, for instance:

  ```
  export ARCHDEFS="~/Research/vp_trust/archdefs"
  export SIFDECODE="~/Research/vp_trust/sifdecode"
  export CUTEST="~/Research/vp_trust/cutest"
  export PREDUCER="~/repos/trophy/preducer"
  export GITREPOS="~/repos/
  ```

  Although the PyCUTEst package will be installed, a modified package must be used to build the problems in different precisions and cache them for later testing. The main difference is that the amended package has been modified to allow for a precision specification, i.e., single or double.
- Pip install the dependency for `preducer`: `fparser` (the other modules it uses, `sys`, `re`, and `textwrap`, ship with the Python standard library).
- Navigate to the folder containing `BuildMultiPrecision.py` and the `pycutest_for_trophy` package, then run `python BuildMultiPrecision.py`. When running on a new machine, I had difficulty finding the `cutest.h` header. I resolved the problem by setting the C++ include path with `export CPLUS_INCLUDE_PATH=$CUTEST/include`.
After doing this, the problems should build without trouble. In the event some of the builds fail, add the problem's name to the `bad_problems` list and build the rest.
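As a rough sketch of what the build step amounts to (hypothetical; the real logic lives in `BuildMultiPrecision.py`, and the import name and `precision` keyword below are assumptions about the modified package):

```python
import pycutest_for_trophy as pycutest    # modified PyCUTEst build (assumed import name)

problem_names = ["ROSENBR", "ARGLINA", "BROWNAL"]   # example SIF names to cache
bad_problems = []                                    # add names of failing builds here

for name in problem_names:
    if name in bad_problems:
        continue                                     # skip problems known to fail
    for precision in ("single", "double"):
        try:
            # Assumed interface: the amended import_problem takes a precision keyword.
            pycutest.import_problem(name, precision=precision)
        except Exception as exc:
            print(f"build failed for {name} ({precision}): {exc}")
            bad_problems.append(name)
```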
Once all the CUTEst problems have been built, we are able to benchmark the performance of TROPHY against single and double precision alone (and other algorithms if we choose). We ran `run_examples.py` to accomplish this.
Our output files can be found in the data folder; the graphics were generated by running `profiler.py`.
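For reference, the plots are standard performance profiles. A minimal, self-contained sketch of that type of plot (with made-up costs, not our data or the actual contents of `profiler.py`) looks like this:

```python
import numpy as np
import matplotlib.pyplot as plt

# costs[i, j]: cost (e.g., precision-adjusted evaluations) for problem i under
# solver j; np.inf marks a failure. Values below are made up for illustration.
costs = np.array([[10., 12., 30.],
                  [55., 40., np.inf],
                  [ 7.,  7.,  9.]])
solvers = ["TROPHY", "single", "double"]

ratios = costs / costs.min(axis=1, keepdims=True)      # performance ratios per problem
taus = np.linspace(1.0, np.max(ratios[np.isfinite(ratios)]), 200)

for j, name in enumerate(solvers):
    rho = [(ratios[:, j] <= t).mean() for t in taus]   # fraction of problems solved within factor t
    plt.step(taus, rho, where="post", label=name)

plt.xlabel("performance ratio")
plt.ylabel("fraction of problems")
plt.legend()
plt.show()
```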
From the `trophy/julia` folder, data exists and can be plotted using `profiler_for_julia_port.py`. We generated data by specifying the desired precision dictionary in `run_examples_for_julia_port.py` and then running it. If you are using Anaconda, there are issues with the Julia library used within Python and the script may not run. You can work around this problem by running `python-jl run_examples_for_julia_port.py` from the command line. It seems as though there may be a memory leak within the Julia code that executes Python scripts, so periodic restarting drastically increases the speed.
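The exact structure of that dictionary is defined in `run_examples_for_julia_port.py`; purely as a hypothetical illustration, it might map a run label to the precision levels the solver is allowed to use:

```python
# Hypothetical example only; see run_examples_for_julia_port.py for the real structure.
precision_dict = {
    "dynamic": ["single", "double"],   # TROPHY switches upward as needed
    "single":  ["single"],             # fixed single precision
    "double":  ["double"],             # fixed double precision
}
```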
The `PyDDA` package uses radar data and other components in a physics-informed objective to generate wind fields. We chose this problem for several reasons. First, a previous intern had worked on the package to implement the objective in JAX, an automatic differentiation tool written and maintained by Google that allows calculations to run on GPUs. In theory, this would allow us to accelerate computation and speed up convergence.
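For readers unfamiliar with JAX, the basic pattern is to differentiate a cost function automatically and JIT-compile it so it can run on a GPU when one is available. The toy example below shows only that pattern; it is not the PyDDA cost function:

```python
import jax
import jax.numpy as jnp

def toy_cost(winds, weight=1.0):
    """Stand-in for a physics-informed cost: data misfit plus a smoothness penalty."""
    misfit = jnp.sum((winds - 1.0) ** 2)
    smooth = weight * jnp.sum(jnp.diff(winds) ** 2)
    return misfit + smooth

grad_fn = jax.jit(jax.grad(toy_cost))   # compiled gradient, runs on GPU if present

w = jnp.zeros(8)
print(toy_cost(w), grad_fn(w))
```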
To get the package working with TROPHY, several adjustments needed to be made. Modifications were built on top of the PyDDA package as augmented by Rebecca's work, with function and gradient calls written in JAX. I have since added an option that determines whether or not the JAX version is used. I did not perform any tests to validate the accuracy of the JAX implementation. We see differences when solving the `huricane_florence` problem, but it is not clear where they come from and they remain uninvestigated. The JAX version appears to converge to a different point and then flatten out. Regardless, the Hurricane Florence example gives us trouble and is a difficult problem to solve in either case.
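One way to begin validating the JAX implementation (not something done here) would be to compare the two versions of the cost at random points; the functions below are toy stand-ins for the real NumPy and JAX objectives:

```python
import numpy as np
import jax
jax.config.update("jax_enable_x64", True)   # use 64-bit so the comparison isolates implementation differences
import jax.numpy as jnp

# Placeholder implementations standing in for the original and JAX versions of
# the PyDDA cost function; in practice these would be the two real objectives.
def cost_numpy(z):
    return np.sum((z - 1.0) ** 2)

def cost_jax(z):
    return jnp.sum((z - 1.0) ** 2)

rng = np.random.default_rng(0)
for _ in range(5):
    z = rng.standard_normal(100)
    f_np = cost_numpy(z)
    f_jx = float(cost_jax(z))
    # Relative discrepancy; should be near zero if the implementations agree.
    print(abs(f_np - f_jx) / abs(f_np))
```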
Beyond using the JAX-amended functions, we have augmented the solver to use TROPHY instead of `f_min_lbfgs_b` as was used previously. There is a flag `use_dynTR` that defaults to true; if set to false, the old L-BFGS algorithm is used. I haven't tested the L-BFGS path extensively after making my edits, so breaks could ensue for a variety of reasons. I also removed the restarting procedure for the solver. Previously, L-BFGS would restart after 10 iterations, with convergence checks conducted between restarts. We changed this to run continuously without restarts; the solver is called once and runs until a stopping criterion is satisfied.
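To make the calling-pattern change concrete, the toy example below contrasts the old restart-every-10-iterations scheme with a single uninterrupted call, using SciPy's `fmin_l_bfgs_b` on a simple quadratic; this is a schematic, not the PyDDA solver code:

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def f_and_g(x):
    """Toy quadratic objective returning (value, gradient)."""
    return np.sum((x - 1.0) ** 2), 2.0 * (x - 1.0)

x = np.zeros(10)

# Old pattern (schematic): restart L-BFGS every 10 iterations and check
# convergence between restarts.
for _ in range(20):
    x_new, f, info = fmin_l_bfgs_b(f_and_g, x, maxiter=10)
    if np.linalg.norm(x_new - x) < 1e-8:
        break
    x = x_new

# New pattern: a single call that runs until its own stopping criterion is met.
x_once, f_once, info_once = fmin_l_bfgs_b(f_and_g, np.zeros(10))
```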
To use multiple precisions, we needed an objective that accepts different precisions and a way to execute in multiple precisions. To do this, I wrote a function in `wind_retrieve.py` called `obj_fun` with inputs `z, params, precision, jax_on`, where `z` is the wind speed variable, `params` holds the problem parameters, `precision` takes the string values `"half"`, `"single"`, or `"double"`, and `jax_on` specifies whether or not to use JAX functions where available. From within `obj_fun`, calls are made to `J_function` and `grad_J`, located in `cost_functions.py`. Both of these functions then call `change_precision` in `cost_functions.py`, which accepts the variable and parameters and casts them to the desired precision. This is a poor implementation since the parameters are recast on every call; it is certainly worth revisiting in the future.
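The pattern is roughly the following simplified sketch (a toy quadratic stands in for the wind-retrieval cost; the real `J_function` and `grad_J` work on the full PyDDA parameters):

```python
import numpy as np

DTYPES = {"half": np.float16, "single": np.float32, "double": np.float64}

def change_precision(z, params, precision):
    """Cast the variable and the parameters to the requested precision."""
    dtype = DTYPES[precision]
    return z.astype(dtype), {k: np.asarray(v, dtype=dtype) for k, v in params.items()}

def obj_fun(z, params, precision="double", jax_on=False):
    # In the real code, J_function and grad_J (cost_functions.py) perform this
    # cast internally; here a toy quadratic stands in for the wind-retrieval cost.
    z_c, p_c = change_precision(z, params, precision)
    f = np.sum(p_c["weight"] * (z_c - p_c["target"]) ** 2)
    g = 2.0 * p_c["weight"] * (z_c - p_c["target"])
    return f, g

params = {"weight": 1.0, "target": np.ones(5)}
print(obj_fun(np.zeros(5), params, precision="single"))
```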
Additional code was added to print more details; iterates and gradients are not stored due to their size. Although there was a lot of experimenting, the main package used for testing is `dynTRpydda_edits`, which includes all relevant edits (calls to TROPHY, the JAX implementation, elimination of the restarting loop, etc.). Since the main focus was on generating solutions of high accuracy, I didn't spend any time plotting the results, so breaks could occur there. In addition, I turned off all after-the-fact smoothing, which could also raise issues.
The JAX package is having difficulty running on the Apple M1 chip. As a result, I am waiting to add more details on the implementation until I can walk through it again myself.