Model.fit() gives the wrong 'rsquared' when 'weights' is given #921

ydy1206 · 2023-11-07T14:08:58Z

First Time Issue Code

Yes, I read the instructions and I am sure this is a GitHub Issue.

Description

The ModelResult instance returned by Model.fit() has the wrong rsquared attribute when weights are given. rsquared is meant to be the floating point R^2 statistic, defined for data y and best-fit model f as R^2 = 1 - \sum_i (y_i - f_i)^2 / \sum_i (y_i - \bar{y})^2, where \bar{y} is the mean of y. In model.py, line 1482, it is computed as self.rsquared = 1.0 - (self.residual**2).sum()/max(tiny, sstot). However, in your code residual is not best_fit-data; it is the return value of the objective function evaluated at the best-fit parameter values, which is (best_fit-data)*weights. If weights=None in Model.fit(), the two are the same; otherwise they differ. This is a small bug, but I don't know whether residual is used as best_fit-data anywhere else. I hope you can check the code carefully. Thank you for all your contributions.

A Minimal, Complete, and Verifiable example

import lmfit as lt
import numpy as np

# Model function
def func(x, k=1, b=0):
    return k*x+b

# function to calculate R^2
def fit_R2(modelresult):
    y = modelresult.data
    f = modelresult.best_fit
    ym = sum(y)/len(y)
    return 1 - sum((y-f)**2)/sum((y-ym)**2)

# data
x = np.array([1, 2, 3, 4])
y = np.array([1.1, 1.9, 3.05, 3.95])
yerr = np.array([0.03, 0.04, 0.01, 0.02])

# fitting
mod = lt.Model(func)
params = mod.make_params()
result = mod.fit(y, params, x=x, weights=1/yerr)
print(result.fit_report())

# comparing
print('\n')
print('R^2 in ModelResult:  ', result.rsquared)
print('My R^2:  ', fit_R2(result))
print('best_fit-data:  ', result.best_fit - result.data)
print('(best_fit-data)*weights:  ', (result.best_fit - result.data) * result.weights)
print('residual:  ', result.residual)

Fit report:

[[Model]]
    Model(func)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 7
    # data points      = 4
    # variables        = 2
    chi-square         = 26.0483871
    reduced chi-square = 13.0241935
    Akaike info crit   = 11.4946460
    Bayesian info crit = 10.2672347
    R-squared          = -4.51288616
[[Variables]]
    k:  0.96370968 +/- 0.04150367 (4.31%) (init = 1)
    b:  0.13774194 +/- 0.12714875 (92.31%) (init = 0)
[[Correlations]] (unreported correlations are < 0.100)
    C(k, b) = -0.9713


R^2 in ModelResult:   -4.51288615804741
My R^2:   0.993748167968698
best_fit-data:   [ 0.00145161  0.16516129 -0.02112903  0.04258065]
(best_fit-data)*weights:   [ 0.0483871   4.12903226 -2.11290323  2.12903226]
residual:   [ 0.0483871   4.12903226 -2.11290323  2.12903226]

Error message:

There's no error message, but as you see, ModelResult gives the wrong rsquared.

Version information

Python: 3.12.0 (tags/v3.12.0:0fb18b0, Oct 2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)]

lmfit: 1.2.2, scipy: 1.11.3, numpy: 1.26.0, asteval: 0.9.31, uncertainties: 3.1.7

newville · 2023-11-07T15:35:42Z

@ydy1206 Thanks -- yes, it looks like the R^2 value is definitely very wrong when weights are used!
That should be an easy fix, almost certainly

lmfit-py/lmfit/model.py

Line 1482 in c415990

self.rsquared = 1.0 - (self.residual**2).sum()/max(tiny, sstot)

That line should replace self.residual (which, as you say, is weighted) with dat-self.best_fit (probably moving up the calculation of self.best_fit, which is just a few lines below).
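The difference between the two calculations can be checked with plain NumPy, without lmfit. The following is a minimal sketch that takes the data, the uncertainties, and the rounded best-fit values of k and b from the fit report in this issue. The variable names are chosen for illustration and are not lmfit internals.

```python
import numpy as np

# Data and uncertainties from the example in this issue
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.05, 3.95])
yerr = np.array([0.03, 0.04, 0.01, 0.02])
weights = 1 / yerr

# Best-fit values copied (rounded) from the fit report
k, b = 0.96370968, 0.13774194
best_fit = k * x + b

# Total sum of squares of the data about its mean
sstot = ((y - y.mean()) ** 2).sum()

# Weighted residual, i.e. what the objective function returns
weighted_resid = (best_fit - y) * weights
r2_weighted = 1.0 - (weighted_resid ** 2).sum() / sstot

# Unweighted residual, as the R^2 definition requires
r2_unweighted = 1.0 - ((y - best_fit) ** 2).sum() / sstot

print("R^2 from weighted residual:  ", r2_weighted)
print("R^2 from unweighted residual:", r2_unweighted)
```

With these inputs, the first value should be close to the -4.51 shown in the fit report, and the second close to the 0.9937 computed by fit_R2.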

If you are willing and able to make a Pull Request, that would be great. If not, let us know and we'll fix this very soon.

ydy1206 · 2023-11-07T16:40:41Z

Thank you for the reply, but I'm sorry, I don't know how to make a Pull Request. Actually, I just registered my GitHub account today, and I'm afraid I'll mess it up if I make a Pull Request.

newville · 2023-11-07T19:46:13Z

No worries, we'll do this. Thanks!

gyger · 2024-09-24T16:03:41Z

I have a small comment on this fix. I have slightly misused lmfit, in the sense that I made a child Model class that supports geometric objects in 2D using shapely and fits those models.
I then reimplemented the _residual function using shapely's distance function to get the distance of a set of points from a boundary. best_fit now only reports the vertices of the shape instead of a point for every data point, as that would be kind of tricky.

Now to my suggestion, instead of using best_fit, could we use the _residual function evaluated without weights?

newville · 2024-09-24T17:12:04Z

@gyger Is this related to the value of r-squared when using weights? For sure, responding to an issue that was closed many months ago will drive the conversation to "how is this comment/suggestion/report related to the original question".

Now to my suggestion, instead of using best_fit, could we use the _residual function evaluated without weights?

Well, if you want to use best_fit or your overwritten residual function to calculate some statistic, that seems like it shouldn't be that hard.

OTOH, if you changed the residual function so that it only uses a limited number of data and model values (not sure why you would need to do that instead of just limiting the data used, but okay), wouldn't that be the more meaningful statistic?

I think what we're going to focus on here is what the value calculated by default after each fit should be.

gyger · 2024-09-24T21:24:08Z

It is not directly related to r-squared when using weights, but to the fix for this problem in #923.
The relevant code change there is:

    self.best_fit = self.model.eval(params=_ret.params, **self.userkws)
    if (self.data is not None and len(self.data) > 1
       and isinstance(self.best_fit, np.ndarray)
       and len(self.best_fit) > 1):
        dat = coerce_arraylike(self.data)
        resid = ((dat - self.best_fit)**2).sum()
        sstot = ((dat - dat.mean())**2).sum()
        self.rsquared = 1.0 - resid/max(tiny, sstot)

This assumes that self.best_fit has a point for every data point. That assumption does not always hold when the fitting function is not parametrized with a dependent variable. For example, a model that fits a circle to noisy XY data points does not necessarily produce an evaluated value for every data point. In that case the fit function can no longer be called and just exits with a dimension error, because dat and self.best_fit do not have the same size.
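The dimension error described here can be illustrated with NumPy alone. This is only a sketch with hypothetical array sizes (10 data values, 4 model values); it does not reproduce the shapely-based model, and it does not show how lmfit itself handles such a model.

```python
import numpy as np

# Hypothetical sizes: 10 data values, but a model that returns only 4 values
dat = np.linspace(0.0, 1.0, 10)
best_fit = np.array([0.0, 0.3, 0.6, 0.9])

try:
    # Same expression as the residual sum of squares in the snippet above
    resid = ((dat - best_fit) ** 2).sum()
    mismatch_error = None
except ValueError as exc:
    # NumPy cannot broadcast shapes (10,) and (4,) against each other
    mismatch_error = exc

print("shape mismatch error:", mismatch_error)
```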

newville · 2024-09-24T21:57:38Z

@gyger

It does not directly have something to do with r-squared when using weights

But you chose to restart a conversation in this thread (closed for nearly a year) anyway, instead of asking a question about whatever you're having trouble with? That's sort of deliberately distracting.

Yes, that section of code was changed, referencing this issue. I don't see any reason to re-open this issue.

This assumes that self.best_fit has a point for every data point. That assumption does not always hold when the fitting function is not parametrized with a dependent variable. For example, a model that fits a circle to noisy XY data points does not necessarily produce an evaluated value for every data point. In that case the fit function can no longer be called and just exits with a dimension error, because dat and self.best_fit do not have the same size.

Um, what? The fit is evaluated for all the data points, and best_fit and the fitted y data must have the same size. This is not something that is assumed and sometimes untrue. It is enforced.

The independent data might be a different size, and you can set up a model function to generate "current model" from the independent data any way you want. But that model array has to match the input data array. And, if weights are provided, those also must match. The Model fit minimizes (data-model)*weights: data and model must be the same size.
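As a rough illustration of minimizing (data-model)*weights with matching array sizes, here is a NumPy-only sketch using the straight-line data from the original report. The helper weighted_residual is hypothetical and is not lmfit's implementation; this sketch simply requires identical shapes. Because the model is linear in k and b, the weighted least-squares problem is solved directly with np.linalg.lstsq rather than with lmfit's minimizer.

```python
import numpy as np

def weighted_residual(data, model, weights=None):
    """Hypothetical helper: return (data - model) * weights, requiring equal shapes."""
    data = np.asarray(data, dtype=float)
    model = np.asarray(model, dtype=float)
    if model.shape != data.shape:
        raise ValueError("model and data must have the same shape")
    resid = data - model
    if weights is not None:
        weights = np.asarray(weights, dtype=float)
        if weights.shape != data.shape:
            raise ValueError("weights and data must have the same shape")
        resid = resid * weights
    return resid

# Data and uncertainties from the example in this issue
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.05, 3.95])
weights = 1 / np.array([0.03, 0.04, 0.01, 0.02])

# For the line k*x + b, minimizing sum(((data - model) * weights)**2) is a
# linear least-squares problem: scale each row of the design matrix and each
# data value by the corresponding weight.
design = np.column_stack([x, np.ones_like(x)])
(k, b), *_ = np.linalg.lstsq(design * weights[:, None], y * weights, rcond=None)

chisqr = (weighted_residual(y, k * x + b, weights) ** 2).sum()
print("k =", k, " b =", b, " chi-square =", chisqr)
```

The resulting k, b, and chi-square should agree closely with the values in the fit report at the top of this issue.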

If you have a question about lmfit, I strongly recommend asking a question with Discussions or the mailing list, and including real code describing what you are doing.

gyger · 2024-09-24T22:35:00Z

Thanks for replying. I mostly intended to put the question in the context of the code change that altered the way lmfit behaves. I understand that this can be unwanted functionality. Happy to open a discussion instead.

newville mentioned this issue Nov 7, 2023

Rsquared with weights #923

Merged

12 tasks

newville closed this as completed in #923 Nov 8, 2023
