Simple least-squares linear regression has long been a canonical method for characterizing the relation between two variables, but its utility is limited because it reduces full population statistics to three numbers: a slope, a normalization, and a variance (or standard deviation). With large empirical or simulated samples we can perform a more sensitive analysis using a localized linear regression method (see Farahi et al. 2018 and Anbajagane et al. 2020). The KLLR method generates estimates of conditional statistics in terms of the local slope, normalization, and covariance. Such a method provides a more nuanced description of population statistics, appropriate for very large samples with non-linear trends.
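To make the idea of a localized fit concrete, here is a minimal sketch of a Gaussian-kernel-weighted linear regression written with plain numpy. It is not the package's implementation: the function name `local_linear_fit`, the synthetic data, and the simple weighted-residual scatter estimate are all assumptions chosen for illustration.

```python
import numpy as np

def local_linear_fit(x, y, x_eval, kernel_width=0.2):
    """Illustrative kernel-weighted linear fit of y on x around each point in x_eval.

    Hypothetical helper for illustration only; not part of the kllr package.
    """
    design = np.column_stack([np.ones_like(x), x])  # columns: [1, x]
    intercepts, slopes, scatters = [], [], []
    for mu in x_eval:
        # Gaussian weights centred on the evaluation point
        w = np.exp(-0.5 * ((x - mu) / kernel_width) ** 2)
        # Weighted normal equations: (A^T W A) beta = A^T W y
        weighted = design * w[:, None]
        intercept, slope = np.linalg.solve(design.T @ weighted, weighted.T @ y)
        # Weighted RMS residual about the local line (a simplified scatter estimate)
        resid = y - (intercept + slope * x)
        scatter = np.sqrt(np.sum(w * resid**2) / np.sum(w))
        intercepts.append(intercept)
        slopes.append(slope)
        scatters.append(scatter)
    return np.array(intercepts), np.array(slopes), np.array(scatters)

# Synthetic non-linear relation: y = x^2 + noise, so the local slope is about 2x
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0, 5000)
y = x**2 + rng.normal(0.0, 0.1, x.size)

x_eval = np.linspace(0.5, 1.5, 5)
intercepts, slopes, scatters = local_linear_fit(x, y, x_eval)
```

A single global fit would return one slope for this data set, whereas the local slopes track the changing gradient of the relation across `x_eval`.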
This code is an implementation of the Kernel Localized Linear Regression (KLLR) method, which performs the localized linear regression described in Farahi et al. (2022). It employs a bootstrap re-sampling technique to estimate uncertainties. We also provide a set of visualization tools so practitioners can easily visualize the model parameters.
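The bootstrap step can be sketched in a few lines of numpy. The example below resamples the data with replacement and refits a kernel-weighted slope at a single point each time. The helper names `weighted_slope` and `bootstrap_slope`, the number of resamples, and the synthetic data are assumptions for illustration and are not taken from the package.

```python
import numpy as np

def weighted_slope(x, y, mu, kernel_width):
    """Kernel-weighted least-squares slope of y on x around mu (illustrative helper)."""
    w = np.exp(-0.5 * ((x - mu) / kernel_width) ** 2)
    x_mean = np.average(x, weights=w)
    y_mean = np.average(y, weights=w)
    return np.sum(w * (x - x_mean) * (y - y_mean)) / np.sum(w * (x - x_mean) ** 2)

def bootstrap_slope(x, y, mu, kernel_width=0.2, n_boot=200, seed=0):
    """Refit the local slope on n_boot resamples drawn with replacement."""
    rng = np.random.default_rng(seed)
    slopes = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, x.size, x.size)  # indices sampled with replacement
        slopes[i] = weighted_slope(x[idx], y[idx], mu, kernel_width)
    return slopes

# Synthetic linear relation with a true slope of 1.5
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 2.0, 2000)
y = 1.5 * x + rng.normal(0.0, 0.1, x.size)

boot = bootstrap_slope(x, y, mu=1.0)
lower, median, upper = np.percentile(boot, [16, 50, 84])
```

The spread between `lower` and `upper` gives an approximate 68% interval on the local slope at `mu`.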
If you use KLLR or derivatives based on it, please cite the following paper, which introduced the tool:
KLLR: A scale-dependent, multivariate model class for regression analysis
A list of other publications that employed KLLR in their data analysis:
Wu et al., Optical selection bias and projection effects in stacked galaxy cluster weak lensing, MNRAS (2022).
W. K. Black, A. E. Evrard, Red Dragon: a redshift-evolving Gaussian mixture model for galaxies, MNRAS (2022).
D. Anbajagane, A. Evrard, A. Farahi, Baryonic Imprints on DM Halos: Population Statistics from Dwarf Galaxies to Galaxy Clusters, MNRAS (2022).
D. Anbajagane et al., Galaxy Velocity Bias in Cosmological Simulations: Towards Percent-level Calibration, MNRAS (2022).
A. Nachmann, W. K. Black, Intra-cluster Summed Galaxy Colors, arXiv preprint (2021).
D. Anbajagane et al., Stellar Property Statistics of Massive Halos from Cosmological Hydrodynamics Simulations: Common Kernel Shapes, MNRAS (2020).
numpy, scipy, matplotlib, pandas, sklearn, tqdm
[1]. A. Farahi, et al. "KLLR: A scale-dependent, multivariate model class for regression analysis." (2022, arXiv: 2202.09903)
[2]. A. Farahi, et al. "Localized massive halo properties in BAHAMAS and MACSIS simulations: scalings, lognormality, and covariance." Monthly Notices of the Royal Astronomical Society 478.2 (2018): 2618-2632.
[3]. D. Anbajagane, et al. "Stellar Property Statistics of Massive Halos from Cosmological Hydrodynamics Simulations: Common Kernel Shapes." (2020, arXiv: 2001.02283)
A.F. is supported by the University of Texas at Austin. D.A. is supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE 1746045.
Run the following to install:
pip install kllr
To start using KLLR, simply use from kllr import kllr_model to access the primary functions and class. The exact requirements for the inputs are listed in the docstring of the kllr_model() class further below.
An example for using KLLR looks like this:
from kllr import kllr_model
lm = kllr_model(kernel_type = 'gaussian', kernel_width = 0.2)
xrange, yrange_mean, intercept, slope, scatter, skew, kurt = lm.fit(x, y, bins=11)
Illustration of the KLLR fit with varying kernel size.