The convergence package is designed to help in automatic equilibration detection & run length control.

openkim/kim-convergence

kim-convergence utility module

How do you automatically estimate the length of the simulation required?

It is desirable to simulate the minimum amount of time necessary to reach an acceptable amount of uncertainty in the quantity of interest.

How do you automatically estimate the length of the warm-up period required?

Welcome to kim-convergence module!

The kim-convergence package is designed to help in automatic equilibration detection & run length control.

PLEASE NOTE:

The kim-convergence code is under active development and is still in beta (version 0.0.2). In general, changes to the patch version (the third number) indicate backward-compatible beta releases, but please be aware that file formats and APIs may change.

Bug reports are welcome in the GitHub issues!

Document

!WORK IN PROGRESS!

Installing kim-convergence

Requirements

You need Python 3.7 or later to run kim-convergence. You can have multiple Python versions (2.x and 3.x) installed on the same system without problems.

To install Python 3 for different Linux flavors, macOS and Windows, packages are available at
https://www.python.org/getit/

Using pip

pip is the most popular tool for installing Python packages, and the one included with modern versions of Python.

kim-convergence can be installed with pip:

pip install kim-convergence

NOTE:

Depending on your Python installation, you may need to use pip3 instead of pip.

pip3 install kim-convergence

Depending on your configuration, you may have to run pip like this:

python3 -m pip install kim-convergence

Using pip (GIT Support)

pip can also install the package directly from the Git repository:

pip install git+https://github.com/openkim/kim-convergence.git

For more information and examples, see the pip install reference.

Using conda

conda is the package management tool for Anaconda Python installations.

Installing kim-convergence from the conda-forge channel can be achieved by adding conda-forge to your channels with:

conda config --add channels conda-forge
conda config --set channel_priority strict

Once the conda-forge channel has been enabled, kim-convergence can be installed with:

conda install kim-convergence

It is possible to list all of the versions of kim-convergence available on your platform with:

conda search kim-convergence --channel conda-forge

Basic Usage

Basic usage involves importing kim-convergence and using the utility to control the length of the time series data coming from a simulation run, a sampling approach, or a dump file from a previously completed simulation.

The main requirement is a get_trajectory function. get_trajectory is a callback function with the signature

get_trajectory(nstep: int) -> 1darray

if there is only one variable, or

get_trajectory(nstep: int) -> 2darray

if there are several variables, in which case the returned array has the shape

(number_of_variables, nstep).

For example,

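import numpy as np

rng = np.random.RandomState(12345)
stop = 0  # number of steps generated so far

def get_trajectory(step: int) -> np.ndarray:
  global stop
  start = stop
  # Never generate more than 100000 steps in total.
  if 100000 < start + step:
    step = 100000 - start
  stop += step
  # Uniform noise in [-0.5, 0.5) around a constant value of 10.
  data = np.ones(step) * 10 + (rng.random_sample(step) - 0.5)
  return data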

NOTE:

To pass extra arguments to the get_trajectory function, use the alternative signature

get_trajectory(nstep: int, args: dict) -> 1darray

or

get_trajectory(nstep: int, args: dict) -> 2darray,

where args holds all the extra parameters that the function needs.

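import numpy as np

rng = np.random.RandomState(12345)
# 'stop' counts the steps generated so far; 'maximum_steps' caps the total.
args = {'stop': 0, 'maximum_steps': 100000}

def get_trajectory(step: int, args: dict) -> np.ndarray:
  start = args['stop']
  # Shorten the last request so the total never exceeds the cap.
  if args['maximum_steps'] < start + step:
    step = args['maximum_steps'] - start
  args['stop'] += step
  # Uniform noise in [-0.5, 0.5) around a constant value of 10.
  data = np.ones(step) * 10 + (rng.random_sample(step) - 0.5)
  return data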

Then call the run_length_control function as below,

import kim_convergence as cr

msg = cr.run_length_control(
  get_trajectory=get_trajectory,
  number_of_variables=1,
  initial_run_length=1000,
  maximum_run_length=100000,
  relative_accuracy=0.01,
  fp_format='json'
)

or

import kim_convergence as cr

msg = cr.run_length_control(
  get_trajectory=get_trajectory,
  get_trajectory_args=args,
  number_of_variables=1,
  initial_run_length=1000,
  maximum_run_length=100000,
  relative_accuracy=0.01,
  fp_format='json'
)

An estimate produced by a simulation typically has an accuracy requirement and is an input to the utility. This requirement means that the experimenter wishes to run the simulation only until an estimate meets this accuracy requirement. Running the simulation less than this length would not provide the information needed while running it longer would be a waste of computing time. In the above example, the accuracy requirement is specified as the relative accuracy.

If there is more than one variable, get_trajectory returns a 2D array with one row per variable,

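import numpy as np

rng = np.random.RandomState(12345)
stop = 0  # number of steps generated so far

def get_trajectory(step: int) -> np.ndarray:
  global stop
  start = stop
  # Never generate more than 100000 steps in total.
  if 100000 < start + step:
    step = 100000 - start
  stop += step
  # Three variables, returned with shape (number_of_variables, step).
  data = np.ones((3, step)) * 10 + (rng.random_sample(3 * step).reshape(3, step) - 0.5)
  return data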

Then call the run_length_control function as below,

import kim_convergence as cr

msg = cr.run_length_control(
  get_trajectory=get_trajectory,
  number_of_variables=3,
  initial_run_length=1000,
  maximum_run_length=100000,
  relative_accuracy=0.01,
  fp_format='json'
)
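The case mentioned earlier, where the time series comes from a previously completed simulation, could be handled with a callback like the one sketched below. This is only an illustration: the helper make_replay_args and the 'data' key are hypothetical names introduced here, and the in-memory array stands in for values that would be read from an actual dump file.

```python
import numpy as np

def make_replay_args(data: np.ndarray) -> dict:
    """Bundle previously recorded values with a read cursor (hypothetical helper)."""
    return {"data": np.asarray(data, dtype=float), "stop": 0}

def get_trajectory(step: int, args: dict) -> np.ndarray:
    """Return the next `step` recorded values, or fewer once the data runs out."""
    start = args["stop"]
    stop = min(start + step, args["data"].size)
    args["stop"] = stop
    return args["data"][start:stop]

# Stand-in for a time series loaded from a dump file (made-up values).
recorded = 10.0 + 0.5 * np.sin(np.arange(2500))
args = make_replay_args(recorded)
first = get_trajectory(1000, args)
```

The args dictionary would then be passed through get_trajectory_args, as in the call shown above.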

NOTE:

All values returned by the get_trajectory function must be finite; otherwise the code stops with an error message explaining the issue.

ERROR(@_get_trajectory): there is/are value/s in the input which is/are non-finite or not number.

Thus, one should remove infinite and Not a Number (NaN) values from the returned array inside the get_trajectory function.
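One possible way to do this filtering with NumPy is sketched below. The helper name drop_non_finite is hypothetical and not part of kim-convergence. For a 2D array the sketch drops every time step (column) in which any variable is non-finite, which is an assumption about what "removing" should mean in the multi-variable case.

```python
import numpy as np

def drop_non_finite(data: np.ndarray) -> np.ndarray:
    """Return a copy of `data` without NaN or +/-inf entries.

    1D input: non-finite values are dropped.
    2D input of shape (number_of_variables, nstep): any column holding a
    non-finite value is dropped, so all variables keep the same length.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        return data[np.isfinite(data)]
    keep = np.isfinite(data).all(axis=0)
    return data[:, keep]

raw = np.array([10.1, np.nan, 9.9, np.inf, 10.0, -np.inf])
clean = drop_non_finite(raw)
```

Calling such a helper on the array just before the return statement of get_trajectory keeps the filtering in one place. Note that the returned array can then be shorter than the requested number of steps.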


The run-length control procedure uses the initial_run_length parameter. It begins at time 0 and calls the get_trajectory function with the provided number of steps (e.g. initial_run_length=1000). At this point, and with no assumptions about the distribution of the observable of interest, it tries to estimate an equilibration time. If it fails to find the transition point, it requests more data by calling the get_trajectory function again, until it either finds the equilibration time or hits the maximum run length limit (e.g. maximum_run_length=100000).

Once an optimal equilibration time has been found, the confidence interval (CI) generation method is applied to the set of available data points. If the resulting confidence interval meets the provided accuracy value (e.g. relative_accuracy=0.01), the simulation is terminated. If not, the simulation is continued by requesting more data through repeated calls to the get_trajectory function. This procedure continues until the criterion is met or the maximum run length limit is reached.
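The extend-until-accurate loop described above can be illustrated with the simplified sketch below. It is not the kim-convergence implementation: the function naive_run_length_control is a hypothetical stand-in that skips equilibration detection entirely, treats the samples as independent, and uses a normal-quantile confidence interval. A tighter relative_accuracy of 0.001 is used so that the loop has to request more data at least once.

```python
import numpy as np
from statistics import NormalDist

def naive_run_length_control(get_trajectory, initial_run_length,
                             maximum_run_length, relative_accuracy,
                             confidence_coefficient=0.95):
    """Request data until the relative CI half-width is small enough.

    Assumptions: no warm-up removal, independent samples, normal quantile.
    Returns (converged, total_steps, mean, half_width).
    """
    z = NormalDist().inv_cdf(0.5 + confidence_coefficient / 2.0)
    data = np.empty(0)
    run_length = initial_run_length
    while True:
        chunk = np.asarray(get_trajectory(run_length), dtype=float)
        data = np.concatenate((data, chunk))
        mean = data.mean()
        half_width = z * data.std(ddof=1) / np.sqrt(data.size)
        converged = half_width / abs(mean) <= relative_accuracy
        # Stop on success, at the run length limit, or when no data is left.
        if converged or data.size >= maximum_run_length or chunk.size == 0:
            return converged, data.size, mean, half_width
        run_length = min(run_length, maximum_run_length - data.size)

rng = np.random.RandomState(12345)

def get_trajectory(step: int) -> np.ndarray:
    # Uniform noise in [-0.5, 0.5) around a constant value of 10.
    return np.ones(step) * 10 + (rng.random_sample(step) - 0.5)

converged, n, mean, half_width = naive_run_length_control(
    get_trajectory, initial_run_length=1000,
    maximum_run_length=100000, relative_accuracy=0.001)
```

Real simulation output is usually correlated in time, so a naive interval like this one would tend to be too narrow; it is shown only to make the control flow concrete.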

The relative_accuracy mentioned above is the relative precision. It is defined as the half-width of the estimator's confidence interval, or an approximated upper confidence limit (UCL), divided by the computed sample mean.

The UCL is calculated as a confidence_coefficient% confidence interval for the mean, using the portion of the time series data that lies in the stationary region. If the ratio of the UCL to the sample mean is bigger than relative_accuracy, the time series is deemed not long enough to estimate the mean with sufficient accuracy, which means the run should be extended.

The accuracy parameter relative_accuracy specifies the maximum relative error allowed in the mean value of the data series. In other words, it bounds the distance from the confidence limit(s) to the mean (also known as the precision, half-width, or margin of error), measured relative to the mean. A value of 0.01 is usually used to request two digits of accuracy, and so forth.
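As a worked illustration of this criterion, write the sample mean as $\bar{x}$ and the half-width (or UCL) as $h$. Both symbols are notation introduced here, the absolute value is an assumption made so that the ratio stays positive, and the numbers are made up:

```latex
\[
\frac{h}{\lvert \bar{x} \rvert} \le \texttt{relative\_accuracy},
\qquad \text{e.g. } \bar{x} = 10,\ \texttt{relative\_accuracy} = 0.01
\;\Longrightarrow\; h \le 0.01 \times 10 = 0.1 .
\]
```

With these numbers, the run would be extended until the mean can be reported as roughly 10 ± 0.1.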

The parameter confidence_coefficient is the confidence coefficient, and the value 0.95 is often used. It can be interpreted as follows: if thousands of samples of n items are drawn from a population using simple random sampling and a confidence interval is calculated for each sample, the proportion of those intervals that include the true population mean is confidence_coefficient.
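This interpretation can be checked numerically with the small experiment sketched below. It does not use kim-convergence: the function estimate_coverage, the normal population with mean 10, and the sample sizes are all assumptions chosen for illustration, and the interval uses a normal quantile.

```python
import numpy as np
from statistics import NormalDist

def estimate_coverage(confidence_coefficient=0.95, n_items=100,
                      n_samples=5000, seed=12345):
    """Fraction of per-sample confidence intervals that contain the true mean."""
    rng = np.random.RandomState(seed)
    true_mean = 10.0
    z = NormalDist().inv_cdf(0.5 + confidence_coefficient / 2.0)
    # Each row is one independent sample of n_items values.
    samples = rng.normal(loc=true_mean, scale=1.0, size=(n_samples, n_items))
    means = samples.mean(axis=1)
    half_widths = z * samples.std(axis=1, ddof=1) / np.sqrt(n_items)
    covered = np.abs(means - true_mean) <= half_widths
    return covered.mean()

coverage = estimate_coverage()
print(f"fraction of intervals covering the true mean: {coverage:.3f}")
```

The printed fraction should come out close to the requested confidence_coefficient, and it moves accordingly when a different coefficient such as 0.90 is passed.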

Contact us

If something is not working as you think it should, or as you would like it to, please get in touch with us! Further, if you have an algorithm or an idea that you would like to try with kim-convergence, please get in touch; we would be glad to help!

Gitter

Contributing

Contributions are very welcome.

Copyright

Copyright (c) 2021, Regents of the University of Minnesota.
All Rights Reserved

Contributors

Yaser Afshar
