Commit

Docs

Lucas Einig committed Oct 2, 2024
1 parent 63feb59 commit ad28de9
Showing 35 changed files with 1,855 additions and 692 deletions.
31 changes: 31 additions & 0 deletions .github/workflows/publish.yml
@@ -0,0 +1,31 @@
name: Publish package to PyPI when a new version tag is pushed

on:
  push:
    tags:
      - 'v[0-9]+.[0-9]+.[0-9]+'

jobs:
  pypi-publish:
    name: Publish release to PyPI
    runs-on: ubuntu-latest
    environment:
      name: pypi
      url: https://pypi.org/p/infovar
    permissions:
      id-token: write
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.x"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -q build
      - name: Build package
        run: |
          python -m build
      - name: Publish package distributions to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
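The tag filter in this workflow can be sanity-checked locally. The sketch below is a hypothetical helper, not part of the repository. It assumes GitHub's filter-pattern rules (`+` repeats the preceding character, `.` is a literal dot, the whole tag must match), under which `'v[0-9]+.[0-9]+.[0-9]+'` behaves like the anchored regular expression used here.

```python
import re

# Assumption: GitHub filter patterns treat "+" as "one or more of the preceding
# character" and "." as a literal dot, and match the whole tag name, so the
# workflow filter 'v[0-9]+.[0-9]+.[0-9]+' is equivalent to this anchored regex.
RELEASE_TAG = re.compile(r"v[0-9]+\.[0-9]+\.[0-9]+")


def triggers_publish(tag: str) -> bool:
    """Return True if `tag` matches the workflow's tag filter (hypothetical helper)."""
    return RELEASE_TAG.fullmatch(tag) is not None


for tag in ["v0.2.0", "v1.10.3", "0.2.0", "v0.2.0rc1", "v0.2"]:
    print(f"{tag:>10} -> {'publish' if triggers_publish(tag) else 'ignored'}")
```

Under these assumptions, tags without the `v` prefix and pre-release tags such as `v0.2.0rc1` do not trigger an upload to PyPI.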
4 changes: 2 additions & 2 deletions README.md
@@ -1,8 +1,8 @@
# InfoVar

[![PyPI version](https://badge.fury.io/py/infovar.svg)](https://badge.fury.io/py/infovar)
[![Documentation Status](https://readthedocs.org/projects/infovar/badge/?version=latest)](https://infovar.readthedocs.io/en/latest/?badge=latest)
![test coverage](./coverage.svg)
[![Documentation status](https://readthedocs.org/projects/infovar/badge/?version=latest)](https://infovar.readthedocs.io/en/latest/?badge=latest)
![](./coverage.svg)

The `infovar` Python package provides tools to efficiently study the informativity of variables on data of interest.

4 changes: 2 additions & 2 deletions coverage.svg
4 changes: 3 additions & 1 deletion docs/conf.py
@@ -14,13 +14,15 @@

sys.path.insert(0, os.path.abspath("../"))

import importlib.metadata

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = "InfoVar"
copyright = "2024, Lucas Einig"
author = "Lucas Einig"
release = "0.2.0"
release = importlib.metadata.version("infovar")

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
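`importlib.metadata.version("infovar")` only works when the distribution is installed in the environment that builds the docs (here `docs/requirements.txt` installs it with `-e .`); otherwise it raises `importlib.metadata.PackageNotFoundError`. A minimal sketch of a guarded variant, assuming a placeholder string is acceptable when the package is missing (`get_release` is a hypothetical name, not part of the repository):

```python
import importlib.metadata


def get_release(dist_name: str, fallback: str = "unknown") -> str:
    """Return the installed version of `dist_name`, or `fallback` if it is not installed."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return fallback


# In docs/conf.py this would replace the direct call
release = get_release("infovar")
print(release)
```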
21 changes: 0 additions & 21 deletions docs/coverage.svg

This file was deleted.

5 changes: 2 additions & 3 deletions docs/gallery-examples.rst
@@ -3,11 +3,10 @@ Gallery of examples

This gallery contains several application examples for the ``infovar`` package to illustrate diverse features.


**Discrete and continuous handler basic use:**

- ``discrete-handler.ipynb``: illustrate the basic features of `DiscreteHandler` on synthetic data
- ``continuous-handler.ipynb``: illustrate the basic features of `ContinuousHandler` on synthetic data
- ``discrete-handler.ipynb``: illustrates the basic features of ``DiscreteHandler`` on synthetic data
- ``continuous-handler.ipynb``: illustrates the basic features of ``ContinuousHandler`` on synthetic data

**California Housing real-world example:**

25 changes: 10 additions & 15 deletions docs/index.rst
@@ -1,20 +1,15 @@
.. InfoVar documentation master file, created by
sphinx-quickstart on Thu Sep 19 09:37:55 2024.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to InfoVar's documentation
==================================

The `infovar` Python package provides tools to efficiently study the informativity of variables on data of interest.
The ``infovar`` Python package provides tools to efficiently study the informativity of variables on data of interest.


Context
=======

The informativity of a variable or set of variables is defined here as the ability of these variables, if known, to reduce the uncertainty we have about a quantity of interest. This uncertainty can be defined in several ways, for example in the sense of Shannon's information theory.
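In Shannon's framework, this reduction of uncertainty is the mutual information between the variables X and the quantity of interest Y. For continuous data it can be written with differential entropies h (standard definitions, given here for reference; the notation is ours):

```latex
I(X;Y) = h(Y) - h(Y \mid X)
       = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,\mathrm{d}x\,\mathrm{d}y \;\ge\; 0
```

Equality holds if and only if X and Y are independent, and for a fixed target a lower conditional differential entropy h(Y | X) corresponds to a higher mutual information.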

This is a ubiquitous problem in science in general, with very concrete applications in climatology, economics, psychology, sociology, and astrophysics, to name a few. Consequently, `InfoVar` has been designed to be very general.
This is a ubiquitous problem in science in general, with very concrete applications in climatology, economics, psychology, sociology, and astrophysics, to name a few. Consequently, *InfoVar* has been designed to be very general.

This package provides tools for quantifying the statistical dependence between continuous numerical data (e.g., mutual information, though other metrics are available), for estimating the associated error, and for assessing how this error affects the ranking of variables by importance.
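One generic way to estimate the error on a dependence measure, and its effect on the ordering of variables, is the bootstrap. The sketch below is not necessarily the procedure InfoVar implements; to stay dependency-free it assumes a bivariate Gaussian approximation of mutual information, `-0.5 * log(1 - r**2)` with `r` the Pearson correlation, and all function names are hypothetical.

```python
import numpy as np


def gaussian_mi(x, y):
    """Mutual information (nats) under a bivariate Gaussian assumption."""
    r = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log1p(-r**2)


def bootstrap_mi(features, y, n_boot=200, seed=0):
    """Bootstrap mean, standard deviation and rank-1 frequency of MI for each feature."""
    rng = np.random.default_rng(seed)
    names = list(features)
    n = len(y)
    samples = np.empty((n_boot, len(names)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        samples[b] = [gaussian_mi(features[k][idx], y[idx]) for k in names]
    mean = dict(zip(names, samples.mean(axis=0)))
    std = dict(zip(names, samples.std(axis=0, ddof=1)))
    # How often each feature comes out as the most informative one
    first = np.bincount(samples.argmax(axis=1), minlength=len(names)) / n_boot
    return mean, std, dict(zip(names, first))


# Synthetic example: x_a and x_b are both informative and hard to separate, x_c is noise
rng = np.random.default_rng(1)
y = rng.standard_normal(300)
features = {
    "x_a": y + 1.0 * rng.standard_normal(300),
    "x_b": y + 1.1 * rng.standard_normal(300),
    "x_c": rng.standard_normal(300),
}
mean, std, p_first = bootstrap_mi(features, y)
for name in features:
    print(f"{name}: MI = {mean[name]:.3f} ± {std[name]:.3f}, ranked first in {p_first[name]:.0%} of resamples")
```

The rank-1 frequency is what makes the effect of the error on the ordering visible: two variables with overlapping error bars will share the first place across resamples.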

@@ -84,7 +79,7 @@ Statistics

In this project, we propose to measure the statistical dependence of variables using mutual information. Other metrics can also be used, such as the conditional differential entropy, which is closely related to mutual information, or the canonical correlation coefficient.

Mutual information and conditional differential entropy are estimated nonparametrically using [Greg Ver Steeg's implementation](http://www.isi.edu/~gregv/npeet.html). More details are given in the `assessment` directory, which evaluates the properties of each available statistics and provides further mathematical context and references.
Mutual information and conditional differential entropy are estimated nonparametrically using `Greg Ver Steeg's implementation <http://www.isi.edu/~gregv/npeet.html>`_. More details are given in the ``assessment`` directory, which evaluates the properties of each available statistic and provides further mathematical context and references.

If you're interested in other metrics, it's possible to add and use them.
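NPEET's mutual-information estimator is based on the k-nearest-neighbour method of Kraskov, Stögbauer and Grassberger (KSG). The following is an independent brute-force NumPy sketch of that estimator, for illustration only: it is not NPEET's or InfoVar's code, it returns nats (divide by log 2 for bits), and its O(n²) memory makes it unsuitable for large samples. The tiny jitter is an assumption that breaks ties between duplicated points, which would otherwise produce zero neighbour distances.

```python
import numpy as np


def _digamma_int(n):
    """Digamma function for positive integers: psi(n) = -gamma + sum_{j<n} 1/j."""
    n = np.asarray(n, dtype=int)
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n.max()))))
    return -np.euler_gamma + harmonic[n - 1]


def ksg_mutual_information(x, y, k=3, jitter=1e-10, seed=0):
    """KSG (algorithm 1) estimate of I(X;Y) in nats, brute force, O(n^2) memory."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    # Tiny noise breaks ties between duplicated points (zero distances)
    x = x + jitter * rng.standard_normal(x.shape)
    y = y + jitter * rng.standard_normal(y.shape)
    n = x.shape[0]
    # Max-norm distances in each marginal space and in the joint space
    dx = np.abs(x[:, None, :] - x[None, :, :]).max(axis=-1)
    dy = np.abs(y[:, None, :] - y[None, :, :]).max(axis=-1)
    dz = np.maximum(dx, dy)
    np.fill_diagonal(dz, np.inf)  # a point is not its own neighbour
    eps = np.sort(dz, axis=1)[:, k - 1]  # distance to the k-th joint neighbour
    # Marginal neighbours strictly closer than eps, excluding the point itself
    nx = (dx < eps[:, None]).sum(axis=1) - 1
    ny = (dy < eps[:, None]).sum(axis=1) - 1
    return (_digamma_int(k) + _digamma_int(n)
            - np.mean(_digamma_int(nx + 1) + _digamma_int(ny + 1)))


rng = np.random.default_rng(0)
x = rng.standard_normal(800)
y = 0.8 * x + 0.6 * rng.standard_normal(800)  # correlation 0.8
exact = -0.5 * np.log(1 - 0.8**2)             # closed form for a Gaussian pair
print(ksg_mutual_information(x, y), exact)
```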

@@ -100,9 +95,9 @@ To account for these uncertainties and to be able to compare different values pr
Estimation for different ranges of values
-----------------------------------------

The heart of `InfoVar` lies in the fact that the informativity of a variable on a quantity of interest can vary according to the selected range of value of this quantity.
The core idea of *InfoVar* is that the informativity of a variable about a quantity of interest can vary with the selected range of values of this quantity.

For example, if we're interested in house prices in California (see `examples/california-housing`), among a set of variables, geographical location (latitude, longitude) appears to be the most important pair of variables. However, if we restrict ourselves to the 10% most expensive homes, it appears that the number of rooms in the house becomes most useful. This type of observation is important, for example, from a data analysis point of view, but also in a variable selection context.
For example, if we're interested in house prices in California (see ``examples/california-housing``), among a set of variables, geographical location (latitude, longitude) appears to be the most important pair of variables. However, if we restrict ourselves to the 10% most expensive homes, the number of rooms becomes the most useful variable. Such range-dependent observations matter both for data analysis and for variable selection.

More generally, taking into account these variations as a function of ranges of values of the variable of interest enables more refined analysis of phenomena. To help you understand, here are a few examples of possible applications.

@@ -130,20 +125,20 @@ It is also possible to perform the same analysis, but according to the value ran
- *Data of interest:* number of medals won by each country in each of the last 10 editions of the games.
- *Variables:* amount invested by the national Olympic committee, population, per capita income, unemployment rate.

The `InfoVar` allows you to perform sensitivity analysis in two ways:
*InfoVar* allows you to perform sensitivity analysis in two ways:

1. Define rigid intervals for the data of interest (example: houses priced below $150k, between $150k and $350k, and above $350k).
2. Define a sliding window and calculate the evolution of the statistics almost continuously.

In case 1 (discrete case), the `DiscreteHandler` class provides all the important functions for calculating, storing and accessing results. In case 2 (continuous case), the `ContinuousHandler` class is used. The notebooks in `examples` give an example of the use of each of these two classes.
In case 1 (discrete case), the ``DiscreteHandler`` class provides all the functions needed to compute, store and access results. In case 2 (continuous case), the ``ContinuousHandler`` class is used instead. The notebooks in ``examples`` illustrate the use of each class.
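The two modes can be illustrated without the package. The sketch below uses hypothetical helper names (they are not the ``DiscreteHandler`` or ``ContinuousHandler`` API) and a bivariate Gaussian approximation of mutual information, `-0.5 * log(1 - r**2)`, instead of the nonparametric estimators. The synthetic data are built so that `x1` is informative only for low values of `y` and `x2` only for high values.

```python
import numpy as np


def gaussian_mi(x, y):
    """Mutual information (nats) under a bivariate Gaussian assumption."""
    r = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log1p(-r**2)


def mi_per_interval(features, y, edges):
    """Mode 1: rigid intervals [edges[i], edges[i+1]) of the quantity of interest y."""
    results = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (y >= lo) & (y < hi)
        results[(lo, hi)] = {name: gaussian_mi(x[mask], y[mask]) for name, x in features.items()}
    return results


def mi_sliding_window(features, y, width, n_windows):
    """Mode 2: a window of fixed width slid across the range of y."""
    starts = np.linspace(y.min(), y.max() - width, n_windows)
    curves = {name: np.empty(n_windows) for name in features}
    for i, lo in enumerate(starts):
        mask = (y >= lo) & (y <= lo + width)
        for name, x in features.items():
            curves[name][i] = gaussian_mi(x[mask], y[mask])
    return starts + width / 2, curves  # window centres and one MI curve per feature


# Synthetic data: x1 tracks y only below 5, x2 only above 5
rng = np.random.default_rng(0)
y = rng.uniform(0.0, 10.0, 5000)
features = {
    "x1": np.where(y < 5, y, 0.0) + 0.5 * rng.standard_normal(5000),
    "x2": np.where(y >= 5, y, 0.0) + 0.5 * rng.standard_normal(5000),
}
print(mi_per_interval(features, y, edges=[0.0, 5.0, 10.0]))
centres, curves = mi_sliding_window(features, y, width=2.0, n_windows=9)
for c, a, b in zip(centres, curves["x1"], curves["x2"]):
    print(f"y ≈ {c:.1f}: MI(x1) = {a:.2f}, MI(x2) = {b:.2f}")
```

On this synthetic data, `x1` dominates in the lower interval and `x2` in the upper one, the kind of range-dependent ranking described above; the sliding window shows where the switch happens.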


References
==========

[1] Einig, L., Palud, P., Roueff, A., Pety, J., Bron, E., Le Petit, F., Gerin, M., Chanussot, J., Chainais, P., Thouvenin, P.-A., Languignon, D., Bešlić, I., Coudé, S., Mazurek, H., Orkisz, J. H., Santa-Maria, M. G., Ségal, L., Zakardjian, A., Bardeau, S., Demyk, K., de Souza Magalhães, V., Goicoechea, J. R., Gratier, P., Guzmán, V. V., Hughes, A., Levrier, F., Le Bourlot, J., Lis, D. C., Liszt, H. S., Peretto, N., Roueff, E., & Sievers, A. (2024).
**Quantifying the informativity of emission lines to infer physical conditions in giant molecular clouds. I. Application to model predictions.** *Astronomy & Astrophysics.*
10.xxxx/xxxx-xxxx/xxxxxxxxx.
`10.1051/0004-6361/202451588 <https://doi.org/10.1051/0004-6361/202451588>`_.

[2] Einig, L et al (2024, in prep.).
[2] Einig, L., et al. (in prep.).
**Quantifying the informativity of emission lines to infer physical conditions in giant molecular clouds. II. Training robust models from selected observations.** *Astronomy & Astrophysics.*
10.xxxx/xxxx-xxxx/xxxxxxxxx.
`10.xxxx/xxxx-xxxx/xxxxxxxxx_ <todo.com>`.
1 change: 1 addition & 0 deletions docs/infovar.rst
@@ -8,6 +8,7 @@ Subpackages
   :maxdepth: 4

   infovar.handlers
   infovar.processing
   infovar.stats

Module contents
7 changes: 2 additions & 5 deletions docs/requirements.txt
@@ -15,11 +15,8 @@ sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.6
sphinxcontrib-serializinghtml==1.1.9
matplotlib-inline==0.1.6
matplotlib==3.8.0
myst_parser==2.0.0
numpy==1.26.1
scipy==1.13.1
scikit-learn==1.5.1
tqdm==4.66.1
ipykernel==6.26.0
jupyter==1.0.0

-e . # pip install -e .
5 changes: 4 additions & 1 deletion examples/.gitignore
@@ -1,4 +1,7 @@
# California housing inputs and outputs
# Handlers introduction outputs
handlers/data

# California housing outputs
california-housing/data-out
california-housing/data-out-continuous

21 changes: 18 additions & 3 deletions examples/README.md
@@ -1,6 +1,21 @@
# Example Jupyter notebooks

This directory contains example notebooks illustrating the use of the `InfoVar` package for two different use cases.
This directory contains example notebooks illustrating the use of the *InfoVar* package for two different use cases.

- `california-housing.ipynb`: seeks the most useful variables or set of variables to estimate accurately the price of houses in California.
- `multivariate-functions.ipynb`: seeks the most informative subset of variables in real multivariate analytic functions.
**Discrete and continuous handler basic use:**

- ``discrete-handler.ipynb``: illustrates the basic features of `DiscreteHandler` on synthetic data
- ``continuous-handler.ipynb``: illustrates the basic features of `ContinuousHandler` on synthetic data

**California Housing real-world example:**

- ``california-housing.ipynb``: most informative features for predicting house values in California

**Statistical details:**

- ``stat:bias-variance.ipynb``: controlling the bias and estimating the variance of canonical correlations and mutual information estimators
- ``stat:canonical-corr.ipynb``: canonical correlations and their relation to mutual information in detail
- ``stat:degeneracy.ipynb``: influence of duplicate data on estimators of conditional differential entropy and mutual information
- ``stat:distribution.ipynb``: influence of data distribution on conditional differential entropy, mutual information and canonical correlations
- ``stat:multidim.ipynb``: behavior and interpretation of mutual information for multidimensional variables or targets
- ``stat:ranking``: variable ranking in detail
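For jointly Gaussian variables, mutual information depends only on the canonical correlations ρ_i, through `I(X;Y) = -0.5 * sum_i log(1 - ρ_i**2)`, which is why the two statistics are compared in these notebooks. A minimal NumPy sketch, with hypothetical helper names and assuming both data matrices have full column rank and more rows than columns, computes canonical correlations from QR decompositions and an SVD:

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X (n, p) and Y (n, q)."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    # Orthonormal bases of the centred column spaces
    qx, _ = np.linalg.qr(X - X.mean(axis=0))
    qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    # Singular values of Qx^T Qy are the cosines of the principal angles,
    # i.e. the canonical correlations (min(p, q) values, in decreasing order)
    rho = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)


def gaussian_mi_from_cca(X, Y):
    """Mutual information (nats) assuming (X, Y) is jointly Gaussian."""
    rho = np.minimum(canonical_correlations(X, Y), 1.0 - 1e-12)
    return -0.5 * np.sum(np.log1p(-rho**2))


rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))
Y = np.column_stack([X[:, 0] + 0.5 * rng.standard_normal(1000), rng.standard_normal(1000)])
print(canonical_correlations(X, Y), gaussian_mi_from_cca(X, Y))
```

With one column on each side, the single canonical correlation reduces to the absolute Pearson correlation.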
201 changes: 95 additions & 106 deletions examples/california-housing.ipynb

Large diffs are not rendered by default.

File renamed without changes.