Forward modeling for circumstellar debris disks in scattered light.
Method: Monte Carlo Markov Chain (MCMC) with one of the two disk modeling software
- the MCFOST disk modeling software, see here for the documentation of MCFOST.
- the Python-based disk modeling code using Henyey–Greenstein phase function by M. Millar-Blanchaer, see here for the code, and here for the original usage of the code.
DebrisDiskFM is first used in Ren, Choquet, Perrin et al. (2019) (ADS link, BibTeX) with MCFOST, and in Ren, Choquet, Perrin et al. (2021) (ADS link, BibTeX) with the M. Millar-Blanchaer code.
pip install --user -e git+https://github.com/seawander/DebrisDiskFM.git#egg=Package
The above command does not require administrator access, and can be run both on one's personal desktop and on a computer cluster.
For a given sytem, generate a disk template with the following command
sampleTemplate = mcfostParameterTemplate.generateMcfostTemplate(n_zone = n_zone, n_species = n_species, n_star = n_star)
The above command generate a template in the collections.OrderedDict()
structure, for this template, there are n_zone
zones; n_species
species (note: n_species
is a list. For example, when n_zone = 2
, we can let n_species = [3, 2]
, then in the 0th zone, i.e., zone0
, there are 3 species of grains, and in the 1st zone, i.e., zone1
, there are 2 species; and n_star
stars shining the whole system.
From the previous paragraph, we have a sample template, then we should modify the parameters in the template for our specific target.
The structure of the template is in this mind-map-structured PDF file: MCFOST Parameter OrderedDict.pdf. In this PDF file, the quoted parameters are what you can modify, and all of the parameters have the same names as the MCFOST parameter file. The only added ones are the row numbers in each block (named as 'rowW'
where W = 0 to the number of rows - 1 in that block), and 'zoneX'
where X = 0 to (n_zone - 1), 'speciesY'
where Y = 0 to (n_species[X] - 1), and 'starZ'
where Z = 0 to (n_star - 1).
For example, if you want to turn on the Stokes maps, use
sampleTemplate['#Wavelength']['row3']['stokes parameters?'] = 'T'
or if you want to change the input file for optical indices to 'ice_opct.dat' for the 2nd species in zone0, use
sampleTemplate['#Grain properties']['zone0']['species2']['row1']['Optical indices file'] = 'ice_opct.dat'
for the detailed parameters that can be changed, refer to the PDF file.
Just call
save_path = None
mcfostParameterTemplate.display_file(sampleTemplate, save_path)
and it will save the parameter structure in a proper MCFOST parameter file format to save_path
, if None
then it will display to the screen only; or save to the address if it is not None
.
2. Markov chain Monte Carlo (MCMC) debris disk radiative transfer modeling framework set-up
From the conditional probability equation,
P(A|B) = P(AB)/P(B) = P(B|A)P(A)/P(B),
we haveP(B|A) = P(A|B)P(B)/P(A).
Let A be the observed data, and B be the hidden parameters, then we can infer the distribution of B from the A data we have from the above equation. However, in most cases, we do not know the distribution of A, the above equation can be written asP(B|A) ∝ P(A|B) P(B),
which is the famous posterior probability relationship, i.e.,
Posterior probability ∝ Likelihood x Prior probability.
We can use MCMC for to extract the posterior probability distribution for the unknown parameters. In reality, we usually take a log function on both sides for the above equation, which then transforms multiplication to addition, i.e.,
log(Posterior probability) = log(Likelihood) + log(Prior probability) + constant.
Now we use the HD191089 debris disk system as an example to explain the DebrisDiskFM framework.
Define the variable names and prior values as in debrisdiskfm/lnprior.py. In the lnprior_hd191089
function, the default prior distribution is uniform distribution.
Please change or adjust the names/values/distribution for your own system. For a uniform prior, if a value is drawn outside the range, then negative infinity is returned.
Note: the variable names should match the names in the #### Section 2: Variables ####
section in the debrisdiskfm/mcfostRun.py run_hd191089
function. As long as the names are matching between the two files, you can assign the names as what you want.
The log-likelihood script is in debrisdiskfm/lnlike.py, where the lnlike_hd191089
function defines the log-likelihood of your system. The lnlike_hd191089
function has two major sections: the ### Observations
section, and the ### (Forwarded) Models
section.
In the ### Observations
section, the observed data and the corresponding uncertainties are loaded.
In the ### (Forwarded) Models
section, the simulated data are loaded (and convolved with a point source point-spread-function) to simulate the observation.
The observed data and models are then combined to calculate log-likelihood. Note: the chi2
function in debrisdiskfm/lnlike.py returns the log-likelihood by default; and more importantly, for normal distribution, which is our assumption of how the unknown parameters are distributed, the chi-squared value is the log-likelihood added with some constants.
The script for calculating the log posterior distribution is in debrisdiskfm/lnpost.py, and the lnpost_hd191089
function is used for the HD191089 system.
The lnpost_hd191089
function takes the input variable names and values, calculate the MCFOST models using mcfostRun.run_hd191089
, then calculate the log-likelihood, then add the prior with the log-likelihood to obtain the posterior.
As mentioned in 2.2.3, the mcfostRun.run_hd191089
function is where the MCFOST models are generated. The script is in debrisdiskfm/mcfostRun.py, however, it can be generalized to use any radiative transfer modeling software, as long as the input parameter files are also correctly modified.
In the run_hd191089
function of debrisdiskfm/mcfostRun.py, there are 4 major sections: Section 1: Fixed Parameters
, Section 2: Variables
, Section 3: Parameter File for HD191089
, and Section 4: Run
.
Section 1 sets the fixed parameters for the system;
Section 2 takes the input var_names
and var_values
the modify the corresponding parameters;
Section 3 save the MCFOST parameter files for different instruments to paraPath
(the path can be hashed for a cluster, and this will prevent multiple MCFOST runs accessing the same folder, this is by default set to True
);
Section 4 run the MCFOST radiative transfer modeling using the parameter files generated from Section 1 to 3.
The are two exampels in debrisdiskfm: main_laptop.py for running the codes on a laptop/desktop which does not involve multiple nodes but handles multiple cores on a single node, and main_cluster.py which handles the case for multiple nodes on a computer cluster.
In main_cluster.py, if you are limited by the run time for a single job, this script is able to handle that by saving the progress then load it again in the next run. This is enabled using the backend function in the 3.0rc1 version of emcee with h5py installed. To enable this, please install the latest 3.0rc1 version of the emcee software by
git clone https://github.com/dfm/emcee.git
cd emcee
python setup.py install
The default status is stored in filename
, you can change the variable in that line in main_cluster.py per your own discretion. If the file is not deleted, then for a new run, emcee will automatically load it and start calculation from there.
The other parameters to change are: n_walkers
which defines the number of walkers (10 times of the dimension or more is suggested), and step
which denotes how many MCMC steps do you want to perform in this run.
To extract the information from the backend file, such as the corner plot, please refer to the emcee webpage for details.
Example: Deploy the code to a Slurm cluster with mpi4py:
Create a file named sub_mcmc_cluster
, with the contents:
#!/bin/bash -l
#SBATCH --partition=parallel
#SBATCH --job-name=mcmc_cluster
#SBATCH --time=0:10:0
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=6
#SBATCH --cpus-per-task=4
#SBATCH --mem=10G
export MKL_NUM_THREADS=4
mpiexec -n 12 python3 -W ignore main_cluster.py
which used 2 nodes, and 6 tasks on each node, with each task using 4 cores. Now submit it in the command line using
sbatch sub_mcmc_cluster
and wait for the results!
For more detailed explanation of the script, please go to DebrisDiskFM/cluster_example_emcee/.
BibTex:
@misc{debrisdiskFM,
author = { {Ren}, Bin and {Perrin}, Marshall },
title = {DebrisDiskFM, v1.0, Zenodo,
doi: \href{https://zenodo.org/badge/latestdoi/141328805}{10.5281/zenodo.2398963}. },
version = {1.0},
publisher = {Zenodo},
month = Dec,
year = 2018,
doi = {10.5281/zenodo.2398963},
url = {https://doi.org/10.5281/zenodo.2398963}
}