Questions #2
Hi! I am not sure I completely understand your question, but I will take a stab at it. I think the confusion might be because the term "degrees of freedom" is flawed, or at the very least often not formally defined.

It's easiest to motivate this using standard linear regression: Y = X\beta + \varepsilon, where X is your n×p predictor matrix, \beta (p×1) is your coefficient vector, \varepsilon is the noise and Y is the observed response. At least intuitively, the degrees of freedom in this model are p. We would like a more general definition, though, one that applies to a larger class of models. Note that if X has full rank, then the least squares estimate is \hat{\beta} = (X^T X)^{-1} X^T Y and your fitted (predicted) Ys are:

\hat{Y} = X\hat{\beta} = X (X^T X)^{-1} X^T Y

So if we call H = X (X^T X)^{-1} X^T (the so-called hat matrix), then \hat{Y} = HY. It turns out that one can show Trace(H) = p. Hence we take Trace(H) as the definition of degrees of freedom (there are even further generalizations, but this is enough for smoothing splines).

To see that this is more generally applicable, consider the following. Let λ > 0 and define (I being the identity matrix):

S = X (X^T X + λI)^{-1} X^T, and let \hat{Y} = S Y

By adding this diagonal to X^T X, we "damp down" our predictions, i.e. we regularize ("ridge/Tikhonov regularization"). Intuitively, this model should have fewer degrees of freedom than p (and in fact, the larger λ, the smaller the degrees of freedom). Indeed, with the definition df = Trace(S), it turns out that for λ = 0, df = p, but as λ goes to infinity, df goes to 0, so at least this definition makes some sense.

Now what is a smoothing spline? In contrast to a regression spline, here we place knots at every sample point x_i, thus resulting in a design matrix X of dimension n × n! Now imagine using standard linear regression of Y onto X; then (since X has full rank, it is invertible):

H = X (X^T X)^{-1} X^T = I, so \hat{Y} = I Y = Y

The degrees of freedom are equal to n. This is silly, since we just interpolated the data. But now if you do the ridge trick, i.e. add some positive definite matrix V (in this case it's actually not diagonal) to X^T X, i.e. take

S_λ = X (X^T X + λV)^{-1} X^T, with fitted values \hat{Y} = S_λ Y

then as λ grows, your degrees of freedom shrink. It turns out that penalizing the curvature of your fitted function (with regularization parameter λ) corresponds to exactly this type of shrinkage for a suitable matrix V. Hence, given λ, you get your smoother matrix S_λ, and thus the degrees of freedom df = Trace(S_λ). That's a 1-1 mapping, and given the degrees of freedom, you can figure out what λ was. Sometimes people prefer to think of it one way or the other (and degrees of freedom, even though flawed, at least provides one way of measuring the complexity of different models).

Does this make sense? The Elements of Statistical Learning is a great book to learn about these things.
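To make the trace definition above concrete, here is a minimal numerical sketch (plain NumPy; the matrix sizes and seed are arbitrary illustration choices, not anything from this package). It checks that Trace(H) = p for ordinary least squares and that Trace(S) shrinks from p toward 0 as λ grows in the ridge case:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))

# OLS hat matrix: H = X (X^T X)^{-1} X^T, with Trace(H) = p.
H = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.trace(H))  # ~5.0

# Ridge smoother: S = X (X^T X + lam*I)^{-1} X^T; Trace(S) shrinks as lam grows.
for lam in [0.0, 1.0, 10.0, 100.0, 1e6]:
    S = X @ np.linalg.inv(X.T @ X + lam * np.eye(p)) @ X.T
    print(lam, np.trace(S))  # df = p at lam = 0, -> 0 as lam -> infinity
```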
Thanks for your detailed answer! I guess it mostly boils down to my ignorance of statistics, which will stay that way due to time constraints... What I'm after is trying to understand the λ parameter and how to set it. Degrees of freedom are something I, a deterministic modeler, have a grasp on (although not in the statistical sense). So how do I calculate Trace(S_λ) from …? Maybe another way to ask my question would be: what is the smallest wavelength present in the smoothing spline? (I would expect that the smallest wavelength is somewhere around the smallest distance between the used knots.) For example, if I take a noisy sine curve …
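As a concrete version of that noisy-sine experiment, here is a sketch assuming SciPy's make_smoothing_spline (the package discussed in this issue may expose a different interface; lam is the regularization parameter λ from the answer above):

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(1)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)

# Larger lam penalizes curvature more heavily: the fit gets smoother
# and the residuals against the noisy data grow.
for lam in [1e-3, 1e-1, 10.0]:
    spl = make_smoothing_spline(x, y, lam=lam)
    resid = y - spl(x)
    print(lam, np.sqrt(np.mean(resid**2)))
```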
Trial and error gives me a transform …
Did you figure out how to extract the smoothing matrix S_λ?
This [link].
Thanks for the link. But I didn't mean the trace, I meant the actual smoothing matrix...
Found it, though it is not very efficient to compute:
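The extraction presumably looks something like the following sketch (again assuming SciPy's make_smoothing_spline, not this package's confirmed API): since the fit is linear in Y for fixed λ, i.e. \hat{Y} = S_λ Y, smoothing each unit vector e_i yields the i-th column of S_λ. That requires n separate spline fits, hence "not very efficient":

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

def smoother_matrix(x, lam):
    """Build S_λ column by column by smoothing unit vectors."""
    n = x.size
    S = np.empty((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1.0
        # Column i of S_λ is the smoother applied to e_i, evaluated at x.
        S[:, i] = make_smoothing_spline(x, e, lam=lam)(x)
    return S

x = np.linspace(0, 1, 20)
S = smoother_matrix(x, lam=1e-4)
print(np.trace(S))  # effective degrees of freedom, df = Trace(S_λ)
```

The trace of this matrix also answers the earlier Trace(S_λ) question directly.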
Sorry to abuse the issue-tracker for questions: You mention "conversion between regularization parameter λ and degrees of freedom". How would I do that? Is there a way to get the knots where the individual splines join? (or is that the wrong way to look at it entirely?)
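On the knot question: as the long answer above notes, a smoothing spline places its knots at the data sites x_i themselves, so there is nothing extra to choose. If the fitted spline is returned as a SciPy BSpline object (an assumption about this package, not a confirmed detail), the knot vector is available directly:

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

x = np.linspace(0, 1, 10)
y = x**2
spl = make_smoothing_spline(x, y, lam=1e-3)
print(spl.t)  # knot vector; the interior knots coincide with the x_i
```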