
Multiple quantile regression with hist method can violate logical quantile inequalities #9848

Closed
david-cortes opened this issue Dec 4, 2023 · 3 comments

@david-cortes (Contributor)

When passing multiple quantiles to a quantile regression with the histogram tree method, there can be cases in which it produces a larger prediction for a lower quantile than for a higher quantile.

In this example, I am passing quantiles 0.05, 0.5, and 0.95. Logically speaking, given the same data, the prediction for quantile 0.05 should be lower than or equal to the prediction for quantile 0.5, and this does seem to be the case with the regular sorted-indices algorithm (tree_method=exact), but not with the histogram algorithm:

import numpy as np
import xgboost as xgb

# mtcars dataset: first column is mpg (the regression target)
mtcars = np.array([[21,6,160,110,3.9,2.62,16.46,0,1,4,4],
[21,6,160,110,3.9,2.875,17.02,0,1,4,4],
[22.8,4,108,93,3.85,2.32,18.61,1,1,4,1],
[21.4,6,258,110,3.08,3.215,19.44,1,0,3,1],
[18.7,8,360,175,3.15,3.44,17.02,0,0,3,2],
[18.1,6,225,105,2.76,3.46,20.22,1,0,3,1],
[14.3,8,360,245,3.21,3.57,15.84,0,0,3,4],
[24.4,4,146.7,62,3.69,3.19,20,1,0,4,2],
[22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2],
[19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4],
[17.8,6,167.6,123,3.92,3.44,18.9,1,0,4,4],
[16.4,8,275.8,180,3.07,4.07,17.4,0,0,3,3],
[17.3,8,275.8,180,3.07,3.73,17.6,0,0,3,3],
[15.2,8,275.8,180,3.07,3.78,18,0,0,3,3],
[10.4,8,472,205,2.93,5.25,17.98,0,0,3,4],
[10.4,8,460,215,3,5.424,17.82,0,0,3,4],
[14.7,8,440,230,3.23,5.345,17.42,0,0,3,4],
[32.4,4,78.7,66,4.08,2.2,19.47,1,1,4,1],
[30.4,4,75.7,52,4.93,1.615,18.52,1,1,4,2],
[33.9,4,71.1,65,4.22,1.835,19.9,1,1,4,1],
[21.5,4,120.1,97,3.7,2.465,20.01,1,0,3,1],
[15.5,8,318,150,2.76,3.52,16.87,0,0,3,2],
[15.2,8,304,150,3.15,3.435,17.3,0,0,3,2],
[13.3,8,350,245,3.73,3.84,15.41,0,0,3,4],
[19.2,8,400,175,3.08,3.845,17.05,0,0,3,2],
[27.3,4,79,66,4.08,1.935,18.9,1,1,4,1],
[26,4,120.3,91,4.43,2.14,16.7,0,1,5,2],
[30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2],
[15.8,8,351,264,4.22,3.17,14.5,0,1,5,4],
[19.7,6,145,175,3.62,2.77,15.5,0,1,5,6],
[15,8,301,335,3.54,3.57,14.6,0,1,5,8],
[21.4,4,121,109,4.11,2.78,18.6,1,1,4,2]])
y = mtcars[:, 0]
X = mtcars[:, 1:]

dm = xgb.DMatrix(data=X, label=y)
model = xgb.train(
    dtrain=dm,
    params={
        "tree_method": "hist",
        "objective" : "reg:quantileerror",
        "quantile_alpha" : [0.05, 0.5, 0.95]
    },
    num_boost_round=5,
)
model.inplace_predict(X)[0]  # predictions for the first row, one per quantile
array([20.959513, 20.943884, 24.354473], dtype=float32)

Note that the prediction for quantile 0.05 (20.96) is larger than the prediction for quantile 0.5 (20.94).
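A quick check over the full prediction matrix makes the violation easy to spot (a minimal sketch continuing the example above; it assumes the prediction columns follow the order given in quantile_alpha):

preds = model.inplace_predict(X)  # shape (n_samples, n_quantiles)

# Every row should be non-decreasing across the quantile columns;
# with tree_method="hist" some rows are not.
print((np.diff(preds, axis=1) >= 0).all())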
@trivialfis (Member) commented Dec 4, 2023

Indeed. I'm currently on mobile, but I think there's a note in the demo that quantile crossing can happen. I believe there are ways to mitigate it, but that would require us to give up the second-order gradient line search.

I'm not too concerned about it, since we are exploring better ways to obtain CIs. Many exciting things to do.
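For anyone looking for a workaround in the meantime: a common post-hoc mitigation (a minimal sketch, not an XGBoost feature) is monotone rearrangement, i.e. sorting each row's predicted quantiles into non-decreasing order, which restores the logical ordering without retraining:

preds = model.inplace_predict(X)  # shape (n_samples, n_quantiles)

# Sorting along the quantile axis enforces q_0.05 <= q_0.5 <= q_0.95
# for every sample; the model itself is left unchanged.
preds_monotone = np.sort(preds, axis=1)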

@trivialfis (Member)

We will close this for now, since it is a known limitation of the algorithm. We will have to explore other approaches.

@rodolphetilt

Hello,
I'm experiencing the same problem when using quantile loss.
Since you closed this issue, do you have any news on this subject?
Thank you very much.
