
Decouple confidence normalization from ranking_length parameter #9642

Closed
5 tasks
dakshvar22 opened this issue Sep 15, 2021 · 9 comments · Fixed by #9823
Assignees
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework area:rasa-oss/ml 👁 All issues related to machine learning effort:atom-squad/2 Label which is used by the Rasa Atom squad to do internal estimation of task sizes. type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR

Comments

@dakshvar22
Contributor

What problem are you trying to solve?

Currently, all ML components (DIETClassifier, ResponseSelector, TEDPolicy) have a parameter called ranking_length, which is meant for users who do not want to see the confidences of all possible candidate labels (intents / actions) and instead want to see only a smaller number of labels.

However, what happens by default is that the model's predicted confidences for the labels inside the ranking length are renormalized to sum up to 1. This is wrong because it can artificially boost the confidences of labels and hides the original confidences that were predicted by the model itself. The renormalization step should be decoupled from this parameter.

Removing the renormalization step entirely is also not an option: if a user has a lot of intents, say 200, all the predicted confidences tend to be very low, which is problematic for fallback. So renormalization should be applied only when needed, i.e. when the user sees a benefit from it. There is anecdotal evidence of this happening with customers.
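To make the boosting effect concrete, here is a minimal sketch (illustrative numpy, not Rasa's actual code) of how renormalizing only the top ranking_length confidences inflates them:

```python
import numpy as np

# Illustrative sketch, not Rasa's actual code: renormalizing the top
# `ranking_length` confidences inflates them relative to the model output.
confidences = np.array([0.30, 0.20, 0.15, 0.15, 0.10, 0.10])
ranking_length = 3

top_k = np.sort(confidences)[::-1][:ranking_length]  # [0.30, 0.20, 0.15]
renormalized = top_k / top_k.sum()                   # forced to sum to 1 again

print(renormalized[0])  # ~0.46: the top label jumps from 0.30 to 0.46
```

The user then sees 0.46 for the top intent even though the model itself was only 30% confident, which is exactly the misinterpretation described above.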

What's your suggested solution?

Have a separate parameter, something like renormalize_confidences, which applies the renormalization if set to True.

Note that the re-normalization step is currently only applied when loss_type is set to cross_entropy, and that condition should stay.
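A sketch of the suggested control flow (the helper name and renormalize_confidences parameter are hypothetical; only the loss_type == "cross_entropy" condition reflects existing behaviour):

```python
import numpy as np

# Hypothetical sketch of the proposed behaviour, not Rasa's actual code.
def maybe_renormalize(confidences: np.ndarray,
                      loss_type: str,
                      renormalize_confidences: bool) -> np.ndarray:
    # Renormalize only under cross entropy loss AND an explicit opt-in.
    if loss_type == "cross_entropy" and renormalize_confidences:
        total = confidences.sum()
        if total > 0:
            return confidences / total
    return confidences

raw = np.array([0.2, 0.1, 0.1])
print(maybe_renormalize(raw, "cross_entropy", True))   # [0.5, 0.25, 0.25]
print(maybe_renormalize(raw, "cross_entropy", False))  # unchanged
```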

Examples (if relevant)

No response

Is anything blocking this from being implemented? (if relevant)

No response

Definition of Done

  • Renormalization logic decoupled from ranking_length parameter.
  • New parameter added for renormalization (decide on the name if the above suggestion doesn't seem appropriate)
  • Changelog and migration guide entry
  • Docs
  • Tests
@dakshvar22 dakshvar22 added type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Sep 15, 2021
@dakshvar22
Contributor Author

dakshvar22 commented Sep 15, 2021

@TyDunn Just bringing this issue to your notice. This coupling between ranking_length and the renormalization step has come up quite often as a source of problems in how users (including some customers) interpret confidences.
I think we should decouple this logic in 3.0 itself: ideally the right thing to do is to not apply this renormalization at all, but for practical reasons we'll need to support it if some users find it useful. Doing this in a minor release would mean we can't avoid renormalization by default.

@TyDunn TyDunn added the area:rasa-oss/ml 👁 All issues related to machine learning label Sep 15, 2021
@TyDunn TyDunn added the effort:atom-squad/2 Label which is used by the Rasa Atom squad to do internal estimation of task sizes. label Oct 4, 2021
@ka-bu
Contributor

ka-bu commented Oct 7, 2021

@dakshvar22 ,

We have that:

  • TED only uses the ranking_length for the purpose of normalisation (i.e. there seems to be no usage of it for pruning the predicted actions, or am I missing something?)
  • DIETClassifier and ResponseSelector use it for two things:
    • the number of reported top confidences is min(ranking_length, LABEL_RANKING) if ranking_length is defined - otherwise it's fixed to LABEL_RANKING
    • the normalisation over the ranking_length many items
    • which leads to the funny situation that the reported confidences might not sum up to 1 if ranking_length > LABEL_RANKING (cf. "test_softmax_normalization" case commented with "higher than default ranking_length" which does not sum up to 1 because of this)
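The third bullet can be reproduced with a small sketch (a simplified stand-in for the described DIETClassifier logic, not the real code):

```python
import numpy as np

# Simplified stand-in for the interplay described above, not Rasa's code.
LABEL_RANKING = 10

def report(confidences: np.ndarray, ranking_length: int) -> np.ndarray:
    ranked = np.sort(confidences)[::-1]
    top = ranked[:ranking_length]
    normalized = top / top.sum()      # normalised over ranking_length items...
    # ...but at most LABEL_RANKING of them are actually reported:
    return normalized[:min(ranking_length, LABEL_RANKING)]

uniform = np.full(12, 1 / 12)         # 12 equally likely labels
reported = report(uniform, ranking_length=11)
print(reported.sum())                 # 10/11 ≈ 0.909, not 1
```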

So how about we ...

  • in DIETClassifier/ResponseSelector use
    • LABEL_RANKING as default for ranking_length
    • report ranking_length many top confidences if ranking_length>0, and otherwise all confidences
    • re-normalise all reported confidences if and only if the new renormalize_confidences flag is set
    • (this breaks what people expected - but that was supposed to happen anyway)
  • in TED we
    • if ranking_length > 0 (which is kept at its default of 10), then always set all confidences that are not among the top ranking_length many to 0 (this is the only possible analog to "reporting only x many", I think)
    • re-normalise all confidences if and only if the new renormalize_confidences flag is set (which is True by default)
    • (with the default of the new parameter being True, no old config should break)
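The TED part of the proposal could be sketched like this (illustrative only; a top-k mask via argsort is used here to sidestep ties, and the parameter names follow the suggestions above):

```python
import numpy as np

# Illustrative sketch of the TED proposal above, not Rasa's actual code.
def mask_and_maybe_renormalize(confidences: np.ndarray,
                               ranking_length: int,
                               renormalize_confidences: bool) -> np.ndarray:
    values = confidences.copy()
    if ranking_length > 0:
        # Zero out everything outside the top `ranking_length` entries,
        # keeping the array shape intact (a policy must score all actions).
        keep = np.argsort(values)[::-1][:ranking_length]
        mask = np.zeros(values.shape, dtype=bool)
        mask[keep] = True
        values[~mask] = 0.0
    if renormalize_confidences and values.sum() > 0:
        values = values / values.sum()
    return values

probs = np.array([0.5, 0.3, 0.1, 0.1])
print(mask_and_maybe_renormalize(probs, 2, True))  # [0.625, 0.375, 0.0, 0.0]
```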

wdyt?

@ka-bu
Contributor

ka-bu commented Oct 7, 2021

Oh, similarly, we could parameterize the sklearn intent classifier: that one always reports confidences for the top 10 intents (if there are that many)

@dakshvar22
Contributor Author

TED is only using the ranking_length for the purpose of normalisation

Probably a mistake, should be fixed.

So how about we ...

I agree with everything, but let's keep the behaviour universal for all 3 components, i.e.:

  1. LABEL_RANKING as default for ranking_length
  2. report ranking_length many top confidences if ranking_length>0, and otherwise all confidences
  3. re-normalise all reported confidences if and only if the new renormalize_confidences flag is set
  4. By default, renormalize_confidences is set to False.

If people want to get the old behaviour they can set renormalize_confidences=True.
Wdyt?

@ka-bu
Contributor

ka-bu commented Oct 7, 2021

  1. LABEL_RANKING as default for ranking_length

👍 I wrote 10 above because LABEL_RANKING is from nlu.classifiers; will move this parameter to some other constants.py then 🤔

  1. report ranking_length many top confidences if ranking_length>0, and otherwise all confidences

This isn't possible for TED. A policy prediction must contain values for all actions. I think the closest we can get to consistent behaviour is the setting-to-0 approach (without normalising).

  1. By default, renormalize_confidences is set to False.

I wasn't sure if it's ok to change that one; I'm fine with breaking things to unify the behaviour :D

@dakshvar22
Contributor Author

This isn't possible for TED. A policy prediction must contain values for all predictions. Think the closest we can get to consistent behaviour is the setting-to-0 approach (and not normalising).

Ahh right, forgot about it. So, in the setting-to-0 approach, the confidences that have been set to 0 don't contribute to renormalization (if renormalization was turned on), right?

Wasn't sure if it's ok to change that one, I'm fine with breaking things to unify the behaviour

This is exactly why this needs to be done in 3.0. Renormalization is completely wrong IMO, but if it floats the boat for some users we should support it, just not as the default behaviour 😅

@ka-bu
Contributor

ka-bu commented Oct 7, 2021

Ahh right, forgot about it. So, in the setting-to-0 approach, the confidences that have been set to 0 don't contribute to renormalization (if renormalization was turned on), right?

I'm not sure what you mean by they don't contribute to renormalisation. We normalise by the sum of the confidences. If we remove them from the list (in DIET), they don't contribute to the sum, just like they don't contribute to the sum if we set them to 0 (in TED).
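A quick sketch of that equivalence (illustrative numbers):

```python
import numpy as np

# Dropping the tail (DIET) vs. zeroing it (TED): the surviving labels
# end up with identical normalised confidences either way.
conf = np.array([0.4, 0.3, 0.2, 0.1])

dropped = conf[:2]       # DIET-style: remove the tail entirely
zeroed = conf.copy()
zeroed[2:] = 0.0         # TED-style: keep the shape, zero the tail

print(dropped / dropped.sum())  # [0.5714..., 0.4285...]
print(zeroed / zeroed.sum())    # same leading values, zeros after
```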

Btw, there's also a tiny bug in the normalisation new_values[new_values < ranked[ranking_length - 1]] = 0 that will probably almost never happen (only when multiple labels have the same probability as the item at rank ranking_length), but I'll still fix it along with the rest :)
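The tie case that bug refers to can be triggered like this (illustrative sketch built around the quoted line):

```python
import numpy as np

# Sketch of the tie bug in `new_values[new_values < ranked[ranking_length - 1]] = 0`:
# labels tied with the confidence at rank `ranking_length` all survive the cut.
new_values = np.array([0.4, 0.2, 0.2, 0.2])
ranking_length = 2

ranked = np.sort(new_values)[::-1]                       # [0.4, 0.2, 0.2, 0.2]
new_values[new_values < ranked[ranking_length - 1]] = 0  # nothing is < 0.2

print(np.count_nonzero(new_values))  # 4 non-zero values instead of 2
```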

This is exactly why this needs to be done in 3.0 . Renormalization is completely wrong IMO but if it floats the boat for some users, we should support it, but just not the default behaviour 😅

👍

@dakshvar22
Contributor Author

I'm not sure what you mean by they don't contribute to renormalisation. We normalise by the sum of the confidences. If we remove them from the list (in DIET), they don't contribute to the sum, just like they don't contribute to the sum if we set them to 0

Totally my bad, don't know what I was thinking

@ka-bu
Contributor

ka-bu commented Oct 7, 2021

No worries, happens to me all the time ^^. Btw, there's actually no reason to mask the probabilities in TED, is there? The ensemble won't care about the values we set to 0. So maybe we set the default to 0 instead of LABEL_RANKING_LENGTH?
