2SLS IV regression with Python
This module allows you 2SLS IV regression estimation on Python. It provides also a final summary report where you can check first stage results, second stage results, and weak identification test for instruments. The estimates are executed using my ols linear regression module that i also attach in this repository. You can also find it in a standalone repository in my github profile.
The 2SLS IV Regression model can be called by using the class two_sls in the main file. To call this class, you need the following inputs:
- dataset is the dataframe where you stored your data you want to use for the IV regression.
- dependent is name of the column of the dependent variable in your pandas dataset.
- regressors is the list of columns' name of exogenous regressors that you want to use for the regression. It is very important that you pass a list of strings even if there is just one variable for the category.
- endogenous is the column name of the endogenous variable in the datagrame. This is not a list but just a string.
- instruments is the list of columns' name of instruments that you want to use in addition of exogenous regressors.
- cons is True by default, but if you want to regress without intercept, just declare cons = False .
- fixed_eff by default is an empty list. However, you can pass a list with the string of all variables you want to use for fixed effects. For example, if you have a panel dataset of countries years, you can pass a list of string of the variable that stores countries and a string for the variable that stores years. BY NOW IT DOESN'T WORK SINCE IT IS NOT IMPLEMENTED AND TRIED. PLEASE DO NOT CHANGE THIS PARAMETER
-
To summarize the results, just call "your object name" . summary(). The report includes first stage table of results, second stage table of results, and weak identification test for the instrument, reporting Cragg-Donald Wald F statistic and Stock-Yogo critical values.
-
Use first_stage() to return a dictionary of usefull elements for the first stage regression like the ols_model ('model'), fitted values ('fitted'), and beta coefficients ('betas'*)
-
Use second_stage().get('std') to return a dictionary of usefull elements for the second stage regression. You can obtain beta coefficients ('beta') and variance covariance matrix ('var_matrix'). Other elements can be obtain. Check the code at the second stage function to see dictionary's keys.
-
Use weak_id_test() to get the Cragg-Donald Wald F statistic.
-
Use summary() if you want a table form summary of the estimation.
- For now, it is possible to use just one instrument for just one endogenous variable. Updates to complete the code are coming.
- To use the two stage module you will also need the ols linear regression module that you can find in this repository.
To import the model
import tsls_reg as tsls
Initialize the model
model = tsls.two_sls(dataset=df, dependent = 'wage', regressors= ['exper'],
endogenous= 'educ', instruments=['sibs'])
Print the results
print(model.summary())
And here is the output:
First stage regression
-------------------------------------------------------------------------------
educ coefficient se t p_value low 95 high 95
------ ------------- --------- ------ --------- --------- ---------
exper -0.221952 0.0142567 -15.57 0 -0.249895 -0.194009
sibs -0.200841 0.0270426 -7.43 0 -0.253845 -0.147838
cons 16.6257 0.188911 88.01 0 16.2555 16.996
-------------------------------------------------------------------------------
Second stage regression
-------------------------------------------------------------------------------
wage coefficient se t p_value low 95 high 95
------ ------------- --------- ----- --------- ---------- ---------
exper 32.1567 7.06488 4.55 0 18.3095 46.0038
educ 139.684 28.0369 4.98 0 84.7315 194.636
cons -1295.23 453.262 -2.86 0.004 -2183.62 -406.834
-------------------------------------------------------------------------------
Weak instruments identification test
----------------------------------------------------------------------------
Cragg-Donald Wald F statistic: 55.158218554314
Stock Yogo weak ID critical values: 10%:16; 15%:9; 20%:7; 25%:6
Reference: Stock-Yogo (2005)
----------------------------------------------------------------------------
Instrumented: ['educ']
Included instruments: ['exper']
Excluded instruments: ['sibs']
----------------------------------------------------------------------------
Process finished with exit code 0
- Stock J, Yogo M. Testing for Weak Instruments in Linear IV Regression. In: Andrews DWK Identification and Inference for Econometric Models. New York: Cambridge University Press ; 2005. pp. 80-108.
- Dataset from Wooldridge data sets: http://fmwww.bc.edu/ec-p/data/wooldridge/datasets.list.html
- Wooldridge, Jeffrey M. Econometric analysis of cross section and panel data. MIT press, 2010.