-
This is always the case when I use lmfit for batch fitting. It is clear that the shapes of the curves are similar, but the fitting results differ greatly. Is there any way to improve this? Here is my code:

```python
from lmfit import Minimizer, Model, Parameters, create_params, report_fit
import numpy as np
import pandas as pd
from scipy import interpolate
from sklearn.metrics import r2_score, mean_squared_error


def lm_fit(fit_df, model):
    fit_df_scale = fit_df / (fit_df.max() - fit_df.min())
    x_k = fit_df['Separation(nm)'].max() - fit_df['Separation(nm)'].min()
    y_k = fit_df['Cor_Force(nN)'].max() - fit_df['Cor_Force(nN)'].min()
    fit_model = Model(model, independent_vars=['x'], nan_policy='omit')
    params = fit_model.make_params(
        Z=dict(value=0.08 / y_k, min=0, max=0.1 / y_k),
        A_H=dict(value=0.05 / (x_k * y_k), min=1e-5 / (x_k * y_k), max=1 / (x_k * y_k)),
        κ=dict(value=0.1 * x_k, min=0.001 * x_k, max=10000 * x_k),
    )
    params.add('R', value=30 / x_k, vary=False)
    if model is func4:
        params.add('x0', value=0, min=-0.1, max=0.1)
    if model is func5:
        params.add('x0', value=0, min=-0.1, max=0.1)
        params.add('y0', value=0, min=-0.1, max=0.1)
    if model is func6:
        params.add('x0', value=0, min=-0.1, max=0.1)
        params.add('y0', value=0, min=-0.1, max=0.1)
        params.add('Z_H', value=2 * x_k / y_k, min=0, max=100 * x_k / y_k)
        params.add('D_H', value=5 / x_k, min=1e-10 / x_k, max=100 / x_k)
    x = np.array(fit_df_scale.iloc[:, 0])
    data = np.array(fit_df_scale.iloc[:, 1])
    # tol = 1e-13
    result = fit_model.fit(data, params=params, x=x, max_nfev=20000, method='nelder-mead')
    final = result.best_fit
    # params_dict = result.params.valuesdict()
    r2 = r2_score(data, final)
    return result, r2, final


def func6(x, Z, A_H, κ, Z_H, D_H, x0, y0, R=30):
    epsilon = 1e-10
    # Make sure no denominator can be exactly zero
    valid_x = np.where((x - x0) != 0, x - x0, epsilon)
    valid_x_R = np.where((x - x0 + 2 * R) != 0, x - x0 + 2 * R, epsilon)
    # Evaluate the model formula
    result = (
        Z * R * κ * np.exp(-valid_x * κ)
        - (2 * A_H * R ** 3) / (3 * valid_x ** 2 * valid_x_R ** 2)
        - Z_H * R * np.exp(-valid_x / D_H)
        + y0
    )
    return result


result, r2, y_pred = lm_fit(fit_df, func6)
# fit_df is my data, which contains two columns, each with approximately 100 rows
```
-
@xmuworker It's hard for us to give specific advice about any particular "bad fit", especially without seeing a fit report. I would make a few suggestions of what to look at:

a) The fit report is the main result of a fit. Importantly for your fits, it will tell you if a fit got stuck at a bound, if some parameter was not moved from its initial value or went to some crazy value, or if you hit the limit on the number of function evaluations. It will also give you the uncertainties in the parameters. For example, in your "bad fit", why did […]

b) Seeing bounds on parameter values set programmatically always worries me. I admit that I sometimes do this myself, but only when I feel like I understand the "physical/meaningful" values. The way you are setting bounds seems "mostly not too scary to me" (assuming that the dataframes are not causing […])

c) Your […]

If those don't guide you to better results, I suggest posting a more complete example of one of the "not very good" fits.
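To make point a) concrete, here is a minimal, library-free sketch of the stuck-at-a-bound check that the fit report lets you do by eye. The helper name `params_at_bounds`, the `(value, min, max)` dict format, and the `rel_tol` threshold are all illustrative choices, not lmfit's API:

```python
def params_at_bounds(params, rel_tol=1e-3):
    """Return names of parameters whose best-fit value sits within
    rel_tol of the bound span from either bound -- a common sign
    that the fit got stuck.

    params maps name -> (value, min, max); this format is
    illustrative, not lmfit's own.
    """
    stuck = []
    for name, (value, lo, hi) in params.items():
        span = hi - lo
        if span <= 0:
            continue  # fixed or degenerate bounds, nothing to check
        if (value - lo) < rel_tol * span or (hi - value) < rel_tol * span:
            stuck.append(name)
    return stuck


# Example: Z ended up essentially at its upper bound of 0.1/y_k
print(params_at_bounds({'Z': (0.09999, 0.0, 0.1)}))   # → ['Z']
print(params_at_bounds({'kappa': (5.0, 0.001, 1000.0)}))  # → []
```

With lmfit itself, the same information can be read directly from `result.fit_report()` or `report_fit(result)`, which also show parameter uncertainties and correlations.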
Yeah, if either `-valid_x * κ > 700` or `-valid_x / D_H > 700`, you'll have problems with `np.inf`. You do have a check that helps guard so that `valid_x**2 * valid_x_R**2` cannot be tiny. You might also check that `κ` and `D_H` are not so far off that the exponentials give you `np.inf`.

Again, I am generally very suspicious when I see bounds on variable Parameters being generated from the data. OTOH, clipping the arguments to exponentials, either at run time or by setting bounds appropriately, seems like a good thing to do.
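A sketch of the run-time clipping suggested above, applicable to the two exponentials in `func6`. The helper name `safe_exp` and the cap of 700 (just under the ~709 overflow threshold of `np.exp` for float64) are assumptions of this sketch:

```python
import numpy as np


def safe_exp(arg, cap=700.0):
    # np.exp overflows to inf for float64 arguments above roughly 709,
    # so clip the exponent into a safe range before evaluating.
    return np.exp(np.clip(arg, -cap, cap))


# Inside func6, the raw exponentials could then be replaced with:
#   Z * R * κ * safe_exp(-valid_x * κ)
#   Z_H * R * safe_exp(-valid_x / D_H)
```

Values clipped at `-cap` evaluate to a tiny but finite number rather than raising warnings, and values clipped at `+cap` stay finite, so the minimizer never sees `inf` or `nan` from these terms.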