-
Notifications
You must be signed in to change notification settings - Fork 25
Linear Regression
arahuja edited this page Jan 9, 2014
·
4 revisions
##Statsmodels
Statsmodels is a relatively new package, but provides better utilities for investigating the results of a model. It use's Patsy to provide R formula syntax
A formula allows you to write a functional relationship between variables.
Example:
Y ~ X1 + X2 + X3
It automatically assumes there is an intercept term. You can make this explicit by using
Y ~ 1 + X1 + X2 + X3
As you can see +
is not acting as an addition operator but as a separator between other variables. There are other operators that lose their algebraic meaning in a formula. :
adds the interaction of two variables. *
adds the original terms as well as their interaction effect.
import statsmodels.formula.api as sm
import pd as pd
data = pd.read_csv("http://data.princeton.edu/wws509/datasets/salary.dat", sep='\s+')
model = sm.ols(formula="sl ~ yr", data=data).fit()
model.summary()
model = sm.ols(formula="sl ~ sx + yr + rk", data=data).fit()
model.summary()
model = sm.ols(formula="sl ~ sx + yr + rk", data=data).fit()
model.summary()
from patsy import dmatrices
y, X = dmatrices('sl ~ sx + yr + rk', data=data, return_type='dataframe')
###Sklearn
Scikits-learn also offer the same, but also provides regularization operations and more robust methods.
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model = model.fit(X,y)
model.score(X,y)
from sklearn import linear_model
model = linear_model.Ridge(alpha = .5)
model.fit(X,y)
print model.coef_
from sklearn import linear_model
model = linear_model.RidgeCV(alphas=[0.1, 1.0, 10.0])
model.fit(X,y)
print model.coef_
print model.alpha_