diff --git a/README.rst b/README.rst index 376278bd37..7c3d1e04c1 100755 --- a/README.rst +++ b/README.rst @@ -1,4 +1,4 @@ -|BuildStatus|_ |PyPiVersion|_ |PythonSupport|_ |Downloads|_ +|BuildStatus|_ |PyPiVersion|_ |PythonSupport|_ |Downloads|_ |discord|_ .. |PyPiVersion| image:: https://img.shields.io/pypi/v/dowhy.svg .. _PyPiVersion: https://pypi.org/project/dowhy/ @@ -6,96 +6,73 @@ .. |PythonSupport| image:: https://img.shields.io/pypi/pyversions/dowhy.svg .. _PythonSupport: https://pypi.org/project/dowhy/ -.. |BuildStatus| image:: https://github.com/microsoft/dowhy/workflows/Python%20package/badge.svg -.. _BuildStatus: https://github.com/microsoft/dowhy/actions +.. |BuildStatus| image:: https://github.com/py-why/dowhy/actions/workflows/ci.yml/badge.svg +.. _BuildStatus: https://github.com/py-why/dowhy/actions .. |Downloads| image:: https://pepy.tech/badge/dowhy .. _Downloads: https://pepy.tech/project/dowhy +.. |discord| image:: https://img.shields.io/discord/818456847551168542 +.. _discord: https://discord.gg/cSBGb3vsZb .. image:: dowhy-logo-large.png :width: 50% :align: center -\ -=============================== - - Introducing DoWhy and the 4 steps of causal inference | `Microsoft Research Blog `_ | `Video Tutorial `_ | `Arxiv Paper `_ | `Arxiv Paper (GCM-extension) `_ | `Slides `_ - - Read the `docs `_ | Try it online! |Binder|_ - -.. |Binder| image:: https://mybinder.org/badge_logo.svg -.. _Binder: https://mybinder.org/v2/gh/microsoft/dowhy/main?filepath=docs%2Fsource%2F - -**Case Studies using DoWhy**: `Hotel booking cancellations `_ | `Effect of customer loyalty programs `_ | `Optimizing article headlines `_ | `Effect of home visits on infant health (IHDP) `_ | `Causes of customer churn/attrition `_ - -.. 
image:: https://raw.githubusercontent.com/microsoft/dowhy/main/docs/images/dowhy-schematic.png -As computing systems are more frequently and more actively intervening in societally critical domains such as healthcare, education, and governance, it is critical to correctly predict and understand the causal effects of these interventions. Without an A/B test, conventional machine learning methods, built on pattern recognition and correlational analyses, are insufficient for decision-making. +`Check out the documentation `_ +=============================================================== -Much like machine learning libraries have done for prediction, **"DoWhy" is a Python library that aims to spark causal thinking and analysis**. DoWhy provides a principled four-step interface for causal inference that focuses on explicitly modeling causal assumptions and validating them as much as possible. The key feature of DoWhy is its state-of-the-art refutation API that can automatically test causal assumptions for any estimation method, thus making inference more robust and accessible to non-experts. DoWhy supports estimation of the average causal effect for backdoor, frontdoor, instrumental variable and other identification methods, and estimation of the conditional effect (CATE) through an integration with the EconML library. +- The documentation, user guide, sample notebooks and other information are available at + `https://py-why.github.io/dowhy `_ +- DoWhy is part of the `PyWhy Ecosystem `_. For more tools and libraries related to causality, check out the `PyWhy GitHub organization `_! 
+- For any questions, comments, or discussions about specific use cases, join our community on `Discord `_ (|discord|_) +- Jump right into some case studies: + - Effect estimation: `Hotel booking cancellations `_ | `Effect of customer loyalty programs `_ | `Optimizing article headlines `_ | `Effect of home visits on infant health (IHDP) `_ | `Causes of customer churn/attrition `_ + - Root cause analysis and explanations: `Root Cause Analysis with DoWhy, an Open Source Python Library for Causal Machine Learning `_ | `Finding the Root Cause of Elevated Latencies in a Microservice Architecture `_ | `Finding Root Causes of Changes in a Supply Chain `_ -For a quick introduction to causal inference, check out `amit-sharma/causal-inference-tutorial `_. We also gave a more comprehensive tutorial at the ACM Knowledge Discovery and Data Mining (`KDD 2018 `_) conference: `causalinference.gitlab.io/kdd-tutorial `_. For an introduction to the four steps of causal inference and its implications for machine learning, you can access this video tutorial from Microsoft Research: `DoWhy Webinar `_. +For more example notebooks, see `here! `_ -Documentation for DoWhy is available at `py-why.github.io/dowhy `_. +Introduction & Key Features +=========================== +Decision-making involves understanding how different variables affect each other and predicting the outcome when some of them are changed to new values. For instance, given an outcome variable, one may be interested in determining how a potential action(s) may affect it, understanding what led to its current value, or simulate what would happen if some variables are changed. Answering such questions requires causal reasoning. DoWhy is a Python library that guides you through the various steps of causal reasoning and provides a unified interface for answering causal questions. -.. i here comment toctree:: -.. i here comment :maxdepth: 4 -.. i here comment :caption: Contents: -.. 
contents:: **Contents** +DoWhy provides a wide variety of algorithms for effect estimation, prediction, quantification +of causal influences, diagnosis of causal structures, root cause analysis, interventions and +counterfactuals. A key feature of DoWhy is its refutation and falsification API that can test causal assumptions for any estimation method, +thus making inference more robust and accessible to non-experts. -News ------ -**2022.05.27**: +**Graphical Causal Models and Potential Outcomes: Best of both worlds** -* **DoWhy now part of PyWhy** - - We have moved DoWhy from microsoft/dowhy to py-why/dowhy. While GitHub will automatically - redirect your git command for cloning, pulling, etc., we recommend updating git remotes and bookmarks. Please note - that the **documentation** has now moved to https://py-why.github.io/dowhy with **no** redirect from the old URL. - -* **Support for GCM-based inference** - - We have started adding support for graphical causal model-based inference (or in short GCM-based). At the moment, - this includes support for interventions, counterfactuals, and attributing distribution changes. As part of this, - we also added features for Shapley value estimation and independence tests. We're still in the process of fleshing - everything out, including `documentation `_. Some of it is already on `main - `_, other parts are on feature branches (prefixed with ``gcm-``) with open - pull-requests, other parts will appear as new pull-requests in the next couple of weeks. Be sure to watch this space - here as we quickly expand functionality and documentation. - -The need for causal inference ----------------------------------- +DoWhy builds on two of the most powerful frameworks for causal inference: +graphical causal models and potential outcomes. For effect estimation, it uses graph-based criteria and do-calculus for +modeling assumptions and identifying a non-parametric causal effect. 
For estimation, it switches to methods based +primarily on potential outcomes. -Predictive models uncover patterns that connect the inputs and outcome in observed data. To intervene, however, we need to estimate the effect of changing an input from its current value, for which no data exists. Such questions, involving estimating a *counterfactual*, are common in decision-making scenarios. +For causal questions beyond effect estimation, it uses the power of graphical causal models by modeling the data +generation process via explicit causal mechanisms at each node, which, for instance, unlocks capabilities to attribute +observed effects to particular variables or estimate point-wise counterfactuals. -* Will it work? - * Does a proposed change to a system improve people's outcomes? -* Why did it work? - * What led to a change in a system's outcome? -* What should we do? - * What changes to a system are likely to improve outcomes for people? -* What are the overall effects? - * How does the system interact with human behavior? - * What is the effect of a system's recommendations on people's activity? +For a quick introduction to causal inference, check out `amit-sharma/causal-inference-tutorial `_ +We also gave a more comprehensive tutorial at the ACM Knowledge Discovery and Data Mining (`KDD 2018 `_) conference: `causalinference.gitlab.io/kdd-tutorial `_. +For an introduction to the four steps of causal inference and its implications for machine learning, you can access this video tutorial from Microsoft Research `DoWhy Webinar `_ and for an introduction to the graphical causal model API, see the `PyCon presentation on Root Cause Analysis with DoWhy `_. -Answering these questions requires causal reasoning. While many methods exist -for causal inference, it is hard to compare their assumptions and robustness of results. DoWhy makes three contributions, +Key Features +~~~~~~~~~~~~ -1. 
Provides a principled way of modeling a given problem as a causal graph so - that all assumptions are explicit. -2. Provides a unified interface for many popular causal inference methods, combining the two major frameworks of graphical models and potential outcomes. -3. Automatically tests for the validity of assumptions if possible and assesses - the robustness of the estimate to violations. +.. image:: https://raw.githubusercontent.com/py-why/dowhy/main/docs/images/dowhy-features.png -To see DoWhy in action, check out how it can be applied to estimate the effect -of a subscription or rewards program for customers [`Rewards notebook -`_] and for implementing and evaluating causal inference methods on benchmark datasets like the `Infant Health and Development Program (IHDP) `_ dataset, `Infant Mortality (Twins) `_ dataset, and the `Lalonde Jobs `_ dataset. +DoWhy supports the following causal tasks: +- Effect estimation (identification, average causal effect, conditional average causal effect, instrumental variables and more) +- Quantify causal influences (mediation analysis, direct arrow strength, intrinsic causal influence) +- What-if analysis (generate samples from interventional distribution, estimate counterfactuals) +- Root cause analysis and explanations (attribute anomalies to their causes, find causes for changes in distributions, estimate feature relevance and more) -Installation -------------- +For more details and how to use these methods in practice, checkout the documentation at `https://py-why.github.io/dowhy `_ +Quick Start +=========== DoWhy support Python 3.8+. To install, you can use pip, poetry, or conda. **Latest Release** @@ -115,9 +92,7 @@ Install the latest `release `__ using poetry. Install the latest `release `__ using conda. .. code:: shell - conda install -c conda-forge dowhy - If you face "Solving environment" problems with conda, then try :code:`conda update --all` and then install dowhy. 
If that does not work, then use :code:`conda config --set channel_priority false` and try to install again. If the problem persists, please `add your issue here `_. **Development Version** @@ -125,7 +100,6 @@ If you face "Solving environment" problems with conda, then try :code:`conda upd If you prefer to use the latest dev version, your dependency management tool will need to point at our GitHub repository. .. code:: shell - pip install git+https://github.com/py-why/dowhy@main **Requirements** @@ -141,7 +115,6 @@ If you face any problems, try installing dependencies manually. Optionally, if you wish to input graphs in the dot format, then install pydot (or pygraphviz). - For better-looking graphs, you can optionally install pygraphviz. To proceed, first install graphviz and then pygraphviz (on Ubuntu and Ubuntu WSL). @@ -152,11 +125,10 @@ first install graphviz and then pygraphviz (on Ubuntu and Ubuntu WSL). pip install pygraphviz --install-option="--include-path=/usr/include/graphviz" \ --install-option="--library-path=/usr/lib/graphviz/" -Sample causal inference analysis in DoWhy -------------------------------------------- -Most DoWhy -analyses for causal inference take 4 lines to write, assuming a -pandas dataframe df that contains the data: +Example: Effect identification and estimation +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Most causal tasks in DoWhy require only a few lines of code. As an example, we estimate the causal effect of +a treatment on an outcome variable: .. code:: python @@ -171,8 +143,9 @@ pandas dataframe df that contains the data: num_samples=10000, treatment_is_binary=True) -DoWhy supports two formats for providing the causal graph: `gml `_ (preferred) and `dot `_. After loading in the data, we use the four main operations in DoWhy: *model*, -*estimate*, *identify* and *refute*: +A causal graph can be defined in different ways, but the most common way is via `NetworkX `_. 
+After loading in the data, we use the four main operations for effect estimation in DoWhy: *model*, *identify*, +*estimate* and *refute*: .. code:: python @@ -181,7 +154,7 @@ DoWhy supports two formats for providing the causal graph: `gml `_ notebook. You can also use Conditional Average Treatment Effect (CATE) estimation methods from other libraries such as EconML and CausalML, as shown in the `Conditional Treatment Effects `_ notebook. For more examples of using DoWhy, check out the Jupyter notebooks in `docs/source/example_notebooks `_ or try them online at `Binder `_. +you can inspect the untested assumptions, identified estimands (if any), and the +estimate (if any). Here's a sample output of the linear regression estimator: +.. image:: https://raw.githubusercontent.com/py-why/dowhy/main/docs/images/regression_output.png + :width: 80% -GCM-based inference ----------------------------------- +For a full code example, check out the `Getting Started with DoWhy `_ notebook. -Graphical causal model-based inference, or GCM-based inference for short, is an addition to DoWhy. For -details, check out the `documentation for the gcm sub-package `_. The basic -recipe for this API works as follows: +You can also use Conditional Average Treatment Effect (CATE) estimation methods from `EconML `_, as shown in the `Conditional Treatment Effects `_ notebook. Here's a code snippet. .. code:: python - # 1. Modeling cause-effect relationships as a structural causal model - # (causal graph + functional causal models): - scm = gcm.StructuralCausalModel(nx.DiGraph([('X', 'Y'), ('Y', 'Z')])) # X -> Y -> Z - scm.set_causal_mechanism('X', gcm.EmpiricalDistribution()) - scm.set_causal_mechanism('Y', gcm.AdditiveNoiseModel(gcm.ml.create_linear_regressor())) - scm.set_causal_mechanism('Z', gcm.AdditiveNoiseModel(gcm.ml.create_linear_regressor())) - - # 2. Fitting the SCM to the data: - gcm.fit(scm, data) - - # 3. Answering a causal query based on the SCM: - results = gcm.(scm, ...) 
- -Where can be one of multiple functions explained in `Answering Causal Questions `_. + from sklearn.preprocessing import PolynomialFeatures + from sklearn.linear_model import LassoCV + from sklearn.ensemble import GradientBoostingRegressor + dml_estimate = model.estimate_effect(identified_estimand, method_name="backdoor.econml.dml.DML", + control_value = 0, + treatment_value = 1, + target_units = lambda df: df["X0"]>1, + confidence_intervals=False, + method_params={ + "init_params":{'model_y':GradientBoostingRegressor(), + 'model_t': GradientBoostingRegressor(), + 'model_final':LassoCV(), + 'featurizer':PolynomialFeatures(degree=1, include_bias=True)}, + "fit_params":{}}) -A high-level Pandas API ------------------------ +Example: Graphical causal model (GCM) based inference +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +DoWhy's graphical causal model framework offers powerful tools to address causal questions beyond effect estimation. +It is based on Pearl's graphical causal model framework and models the causal data generation process of each variable +explicitly via *causal mechanisms* to support a wide range of causal algorithms. For more details, see the book +`Elements of Causal Inference `_. -We've made an even simpler API for dowhy which is a light layer on top of the standard one. The goal is to make causal analysis much more like regular exploratory analysis. To use this API, simply -import :code:`dowhy.api`. This will magically add the :code:`causal` namespace to your -:code:`pandas.DataFrame` s. Then, -you can use the namespace as follows. +Complex causal queries, such as attributing observed anomalies to nodes in the system, can be performed with just a few +lines of code: .. 
code:: python - import dowhy.api - import dowhy.datasets - - data = dowhy.datasets.linear_dataset(beta=5, - num_common_causes=1, - num_instruments = 0, - num_samples=1000, - treatment_is_binary=True) - - # data['df'] is just a regular pandas.DataFrame - data['df'].causal.do(x='v0', # name of treatment variable - variable_types={'v0': 'b', 'y': 'c', 'W0': 'c'}, - outcome='y', - common_causes=['W0']).groupby('v0').mean().plot(y='y', kind='bar') - -.. image:: https://raw.githubusercontent.com/microsoft/dowhy/main/docs/images/do_barplot.png - -For some methods, the :code:`variable_types` field must be specified. It should be a :code:`dict`, where the keys are -variable names, and values are 'o' for ordered discrete, 'u' for un-ordered discrete, 'd' for discrete, or 'c' -for continuous. - -**Note:If the** :code:`variable_types` **is not specified we make use of the following implicit conversions:** -:: - - int -> 'c' - float -> 'c' - binary -> 'b' - category -> 'd' - -**Currently we have not added support for timestamps.** - -The :code:`do` method in the causal namespace generates a random sample from $P(outcome|do(X=x))$ of the -same length as your data set, and returns this outcome as a new :code:`DataFrame`. You can continue to perform -the usual :code:`DataFrame` operations with this sample, and so you can compute statistics and create plots -for causal outcomes! - -The :code:`do` method is built on top of the lower-level :code:`dowhy` objects, so can still take a graph and perform -identification automatically when you provide a graph instead of :code:`common_causes`. - -For more details, check out the `Pandas API -`_ notebook or the `Do Sampler `_ -notebook. - -Graphical Models and Potential Outcomes: Best of both worlds -============================================================ -DoWhy builds on two of the most powerful frameworks for causal inference: -graphical models and potential outcomes. 
It uses graph-based criteria and -do-calculus for modeling assumptions and identifying a non-parametric causal effect. -For estimation, it switches to methods based primarily on potential outcomes. - -A unifying language for causal inference ----------------------------------------- + import networkx as nx, numpy as np, pandas as pd + from dowhy import gcm -DoWhy is based on a simple unifying language for causal inference. Causal -inference may seem tricky, but almost all methods follow four key steps: + # Let's generate some "normal" data we assume we're given from our problem domain: + X = np.random.normal(loc=0, scale=1, size=1000) + Y = 2 * X + np.random.normal(loc=0, scale=1, size=1000) + Z = 3 * Y + np.random.normal(loc=0, scale=1, size=1000) + data = pd.DataFrame(dict(X=X, Y=Y, Z=Z)) -1. Model a causal inference problem using assumptions. -2. Identify an expression for the causal effect under these assumptions ("causal estimand"). -3. Estimate the expression using statistical methods such as matching or instrumental variables. -4. Finally, verify the validity of the estimate using a variety of robustness checks. - -This workflow can be captured by four key verbs in DoWhy: - -- model -- identify -- estimate -- refute - -Using these verbs, DoWhy implements a causal inference engine that can support -a variety of methods. *model* encodes prior knowledge as a formal causal graph, *identify* uses -graph-based methods to identify the causal effect, *estimate* uses -statistical methods for estimating the identified estimand, and finally *refute* -tries to refute the obtained estimate by testing robustness to assumptions. - -Key differences compared to available causal inference software ----------------------------------------------------------------- -DoWhy brings three key differences compared to available software for causal inference: - -**Explicit identifying assumptions** - Assumptions are first-class citizens in DoWhy. 
- - Each analysis starts with a - building a causal model. The assumptions can be viewed graphically or in terms - of conditional independence statements. Wherever possible, DoWhy can also - automatically test for stated assumptions using observed data. - -**Separation between identification and estimation** - Identification is the causal problem. Estimation is simply a statistical problem. - - DoWhy - respects this boundary and treats them separately. This focuses the causal - inference effort on identification, and frees up estimation using any - available statistical estimator for a target estimand. In addition, multiple - estimation methods can be used for a single identified_estimand and - vice-versa. - -**Automated robustness checks** - What happens when key identifying assumptions may not be satisfied? - - The most critical, and often skipped, part of causal analysis is checking the - robustness of an estimate to unverified assumptions. DoWhy makes it easy to - automatically run sensitivity and robustness checks on the obtained estimate. - -Finally, DoWhy is easily extensible, allowing other implementations of the -four verbs to co-exist (e.g., we support implementations of the *estimation* verb from -EconML and CausalML libraries). The four verbs are mutually independent, so their -implementations can be combined in any way. - - - -Below are more details about the current implementation of each of these verbs. - -Four steps of causal inference -=============================== - -I. Model a causal problem ------------------------------ - -DoWhy creates an underlying causal graphical model for each problem. This -serves to make each causal assumption explicit. This graph need not be -complete---you can provide a partial graph, representing prior -knowledge about some of the variables. DoWhy automatically considers the rest -of the variables as potential confounders. - -Currently, DoWhy supports two formats for graph input: `gml `_ (preferred) and -`dot `_. 
We strongly suggest to use gml as the input format, as it works well with networkx. You can provide the graph either as a .gml file or as a string. If you prefer to use dot format, you will need to install additional packages (pydot or pygraphviz, see the installation section above). Both .dot files and string format are supported. - -While not recommended, you can also specify common causes and/or instruments directly -instead of providing a graph. - -Supported formats for specifying causal assumptions -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -* **Graph**: Provide a causal graph in either gml or dot format. Can be a text file - or a string. -* **Named variable sets**: Instead of the graph, provide variable names that - correspond to relevant categories, such as common causes, instrumental variables, effect - modifiers, frontdoor variables, etc. - -Examples of how to instantiate a causal model are in the `Getting Started -`_ -notebook. - -.. i comment image:: causal_model.png - -II. Identify a target estimand under the model ----------------------------------------------- - -Based on the causal graph, DoWhy finds all possible ways of identifying a desired causal effect based on -the graphical model. It uses graph-based criteria and do-calculus to find -potential ways find expressions that can identify the causal effect. - -Supported identification criteria -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -* Back-door criterion -* Front-door criterion -* Instrumental Variables -* Mediation (Direct and indirect effect identification) - -Different notebooks illustrate how to use these identification criteria. Check -out the `Simple Backdoor `_ notebook for the back-door criterion, and the `Simple IV `_ notebook for the instrumental variable criterion. - -III. Estimate causal effect based on the identified estimand ------------------------------------------------------------- + # 1. 
Modeling cause-effect relationships as a structural causal model + # (causal graph + functional causal models): + causal_model = gcm.StructuralCausalModel(nx.DiGraph([('X', 'Y'), ('Y', 'Z')])) # X -> Y -> Z + gcm.auto.assign_causal_mechanisms(causal_model, data) -DoWhy supports methods based on both back-door criterion and instrumental -variables. It also provides a non-parametric confidence intervals and a permutation test for testing -the statistical significance of obtained estimate. + # 2. Fitting the SCM to the data: + gcm.fit(causal_model, data) -Supported estimation methods -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + # Optional: Evaluate causal model + print(gcm.evaluate_causal_model(causal_model, data)) -* Methods based on estimating the treatment assignment - * Propensity-based Stratification - * Propensity Score Matching - * Inverse Propensity Weighting + # 3. Perform a causal analysis: + # results = gcm.<causal_query>(causal_model, ...) + # For instance, root cause analysis: + anomalous_sample = pd.DataFrame(dict(X=[0.1], Y=[6.2], Z=[19])) # Here, Y is the root cause. -* Methods based on estimating the outcome model - * Linear Regression - * Generalized Linear Models + # "Which node is the root cause of the anomaly in Z?": + anomaly_attribution = gcm.attribute_anomalies(causal_model, "Z", anomalous_sample) -* Methods based on the instrumental variable equation - * Binary Instrument/Wald Estimator - * Two-stage least squares - * Regression discontinuity + # Or sampling from an interventional distribution. Here, under the intervention do(Y := 2). + samples = gcm.interventional_samples(causal_model, interventions={'Y': lambda y: 2}, num_samples_to_draw=100) -* Methods for front-door criterion and general mediation - * Two-stage linear regression +The GCM framework offers many more features beyond these examples. For a full code example, check out the `Online Shop example notebook `_. -Examples of using these methods are in the `Estimation methods -`_ -notebook. 
+For more functionalities, example applications of DoWhy and details about the outputs, see the `User Guide `_ or +checkout `Jupyter notebooks `_. -Using EconML and CausalML estimation methods in DoWhy -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -It is easy to call external estimation methods using DoWhy. Currently we -support integrations with the `EconML `_ and `CausalML `_ packages. Here's an example -of estimating conditional treatment effects using EconML's double machine -learning estimator. +More Information & Resources +============================ +`Microsoft Research Blog `_ | `Video Tutorial for Effect Estimation `_ | `Video Tutorial for Root Cause Analysis `_ | `Arxiv Paper `_ | `Arxiv Paper (Graphical Causal Model extension) `_ | `Slides `_ -.. code:: python - - from sklearn.preprocessing import PolynomialFeatures - from sklearn.linear_model import LassoCV - from sklearn.ensemble import GradientBoostingRegressor - dml_estimate = model.estimate_effect(identified_estimand, method_name="backdoor.econml.dml.DML", - control_value = 0, - treatment_value = 1, - target_units = lambda df: df["X0"]>1, - confidence_intervals=False, - method_params={ - "init_params":{'model_y':GradientBoostingRegressor(), - 'model_t': GradientBoostingRegressor(), - 'model_final':LassoCV(), - 'featurizer':PolynomialFeatures(degree=1, include_bias=True)}, - "fit_params":{}} - ) - - -More examples are in the `Conditional Treatment Effects with DoWhy -`_ notebook. - -IV. Refute the obtained estimate -------------------------------------- -Having access to multiple refutation methods to validate an effect estimate from a -causal estimator is -a key benefit of using DoWhy. - -Supported refutation methods -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -* **Add Random Common Cause**: Does the estimation method change its estimate after - we add an independent random variable as a common cause to the dataset? 
- (*Hint: It should not*) -* **Placebo Treatment**: What happens to the estimated causal effect when we - replace the true treatment variable with an independent random variable? - (*Hint: the effect should go to zero*) -* **Dummy Outcome**: What happens to the estimated causal effect when we replace - the true outcome variable with an independent random variable? (*Hint: The - effect should go to zero*) -* **Simulated Outcome**: What happens to the estimated causal effect when we - replace the dataset with a simulated dataset based on a known data-generating - process closest to the given dataset? (*Hint: It should match the effect parameter - from the data-generating process*) -* **Add Unobserved Common Causes**: How sensitive is the effect estimate when we - add an additional common cause (confounder) to the dataset that is correlated - with the treatment and the outcome? (*Hint: It should not be too sensitive*) -* **Data Subsets Validation**: Does the estimated effect change significantly when - we replace the given dataset with a randomly selected subset? (*Hint: It - should not*) -* **Bootstrap Validation**: Does the estimated effect change significantly when we - replace the given dataset with bootstrapped samples from the same dataset? (*Hint: It should not*) - -Examples of using refutation methods are in the `Refutations `_ notebook. For an advanced refutation that uses a simulated dataset based on user-provided or learnt data-generating processes, check out the `Dummy Outcome Refuter `_ notebook. -As a practical example, `this notebook `_ shows an application of refutation methods on evaluating effect estimators for the Infant Health and Development Program (IHDP) and Lalonde datasets. Citing this package -==================== +~~~~~~~~~~~~~~~~~~~ If you find DoWhy useful for your work, please cite **both** of the following two references: - Amit Sharma, Emre Kiciman. DoWhy: An End-to-End Library for Causal Inference. 2020. 
https://arxiv.org/abs/2011.04216 @@ -518,15 +272,12 @@ Bibtex:: year={2022} } -Roadmap -======= -The `projects `_ page lists the next steps for DoWhy. If you would like to contribute, have a look at the current projects. If you have a specific request for DoWhy, please `raise an issue `_. - -Contributing -============ -This project welcomes contributions and suggestions. For a guide to contributing and a list of all contributors, check out `CONTRIBUTING.md `_ and our `docs for contributing code `_. Our `contributor code of conduct is available here `_. You can also join the DoWhy development channel on Discord: |discord|_ +Issues +~~~~~~ +If you encounter an issue or have a specific request for DoWhy, please `raise an issue `_. -.. |discord| image:: https://img.shields.io/discord/818456847551168542 -.. _discord: https://discord.gg/cSBGb3vsZb +Contributing +~~~~~~~~~~~~ +This project welcomes contributions and suggestions. For a guide to contributing and a list of all contributors, check out `CONTRIBUTING.md `_ and our `docs for contributing code `_. Our `contributor code of conduct is available here `_. diff --git a/docs/images/dowhy-features.png b/docs/images/dowhy-features.png new file mode 100644 index 0000000000..f7bfa7091f Binary files /dev/null and b/docs/images/dowhy-features.png differ
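
The interventional query in the new GCM example, do(Y := 2) on the chain X -> Y -> Z, can be sanity-checked by hand. The following is a minimal plain-Python sketch and not DoWhy's implementation: it assumes the same linear mechanisms the README uses to generate its example data (Y = 2X + noise, Z = 3Y + noise), and it models the intervention by replacing Y's mechanism with a constant. The function name `simulate_chain` and its parameters are hypothetical and exist only for this illustration.

```python
import random
import statistics


def simulate_chain(num_samples, do_y=None, seed=0):
    """Draw Z samples from the chain X -> Y -> Z.

    Assumed mechanisms (mirroring the README's data generation):
    X ~ N(0, 1), Y = 2*X + N(0, 1), Z = 3*Y + N(0, 1).
    If do_y is given, Y's mechanism is replaced by that constant,
    which is what the intervention do(Y := do_y) means.
    """
    rng = random.Random(seed)
    z_samples = []
    for _ in range(num_samples):
        x = rng.gauss(0, 1)
        # Under the intervention, Y no longer depends on X.
        y = 2 * x + rng.gauss(0, 1) if do_y is None else do_y
        z = 3 * y + rng.gauss(0, 1)
        z_samples.append(z)
    return z_samples


observational = simulate_chain(10_000)
interventional = simulate_chain(10_000, do_y=2)

# Under do(Y := 2), Z = 6 + noise: the mean sits near 6 and the spread
# comes only from Z's own noise term. Observationally, Z also inherits
# the variance of X and Y, so its spread is much larger.
print("interventional:", statistics.mean(interventional), statistics.stdev(interventional))
print("observational: ", statistics.mean(observational), statistics.stdev(observational))
```

With a fitted model, samples returned by `gcm.interventional_samples` for the same intervention should concentrate around a similar mean for Z, up to estimation error in the fitted mechanisms.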