
Decouple confidence normalization from ranking_length parameter #9642

Closed
5 tasks
dakshvar22 opened this issue Sep 15, 2021 · 9 comments · Fixed by #9823
Assignees
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework area:rasa-oss/ml 👁 All issues related to machine learning effort:atom-squad/2 Label which is used by the Rasa Atom squad to do internal estimation of task sizes. type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR

Comments

@dakshvar22
Contributor

What problem are you trying to solve?

Currently, all ML components (DIETClassifier, ResponseSelector, TEDPolicy) have a parameter called ranking_length, which is meant for users who do not want to see the confidences of all possible candidate labels (intents / actions) and instead want to see only a smaller number of labels.

However, what happens by default is that the model's predicted confidences for the labels inside the ranking length are renormalized to sum up to 1. This is wrong because it can artificially boost the confidences of labels and hides the original confidences that were predicted by the model itself. The renormalization step should be decoupled from this parameter.

Removing the renormalization step entirely is also not an option: if a user has a lot of intents, say 200, all the predicted confidences tend to be very low, which is problematic for fallback. So renormalization should be applied only when needed, i.e. when the user sees a benefit from it. There is anecdotal evidence of this happening with customers.
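To make the boosting effect concrete, here is a minimal sketch (illustrative numpy, not Rasa's actual code) of how renormalizing only the top ranking_length confidences inflates them:

```python
import numpy as np

# Illustrative sketch, not Rasa's actual code: renormalizing the top
# `ranking_length` confidences inflates them relative to the model output.
confidences = np.array([0.30, 0.20, 0.15, 0.15, 0.10, 0.10])
ranking_length = 3

top_k = np.sort(confidences)[::-1][:ranking_length]  # [0.30, 0.20, 0.15]
renormalized = top_k / top_k.sum()                   # forced to sum to 1 again

print(renormalized[0])  # ~0.46: the top label jumps from 0.30 to 0.46
```

The user then sees 0.46 for the top intent even though the model itself was only 30% confident, which is exactly the misinterpretation described above.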

What's your suggested solution?

Have a separate parameter, something like renormalize_confidences, which applies the renormalization if set to True.

Note that the re-normalization step is currently only applied when loss_type is set to cross_entropy, and that condition should stay.
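A sketch of the suggested control flow (the helper name and renormalize_confidences parameter are hypothetical; only the loss_type == "cross_entropy" condition reflects existing behaviour):

```python
import numpy as np

# Hypothetical sketch of the proposed behaviour, not Rasa's actual code.
def maybe_renormalize(confidences: np.ndarray,
                      loss_type: str,
                      renormalize_confidences: bool) -> np.ndarray:
    # Renormalize only under cross entropy loss AND an explicit opt-in.
    if loss_type == "cross_entropy" and renormalize_confidences:
        total = confidences.sum()
        if total > 0:
            return confidences / total
    return confidences

raw = np.array([0.2, 0.1, 0.1])
print(maybe_renormalize(raw, "cross_entropy", True))   # [0.5, 0.25, 0.25]
print(maybe_renormalize(raw, "cross_entropy", False))  # unchanged
```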

Examples (if relevant)

No response

Is anything blocking this from being implemented? (if relevant)

No response

Definition of Done

  • Renormalization logic decoupled from ranking_length parameter.
  • New parameter added for renormalization (decide on the name if the above suggestion doesn't seem appropriate)
  • Changelog and migration guide entry
  • Docs
  • Tests
@dakshvar22 dakshvar22 added type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Sep 15, 2021
@dakshvar22
Contributor Author

dakshvar22 commented Sep 15, 2021

@TyDunn Just bringing this issue to your notice. This coupling between ranking_length and the renormalization step has come up quite often as a source of problems in how users (including some customers) interpret confidences.
I think we should decouple this logic in 3.0 itself: ideally the right thing to do is to not apply this renormalization at all, but for practical reasons we'll need to support it if some users find it useful. Doing this in a minor release would mean we can't avoid renormalization by default.

@TyDunn TyDunn added the area:rasa-oss/ml 👁 All issues related to machine learning label Sep 15, 2021
@TyDunn TyDunn added the effort:atom-squad/2 Label which is used by the Rasa Atom squad to do internal estimation of task sizes. label Oct 4, 2021
@ka-bu
Contributor

ka-bu commented Oct 7, 2021

@dakshvar22 ,

We have that:

  • TED only uses the ranking_length for the purpose of normalisation (i.e. there seems to be no usage of it for pruning the predicted actions, or am I missing something?)
  • DIETClassifier and ResponseSelector use it for two things:
    • the number of reported top confidences is min(ranking_length, LABEL_RANKING) if ranking_length is defined - otherwise it's fixed to LABEL_RANKING
    • the normalisation over the ranking_length many items
    • which leads to the funny situation that the reported confidences might not sum up to 1 if ranking_length > LABEL_RANKING (cf. "test_softmax_normalization" case commented with "higher than default ranking_length" which does not sum up to 1 because of this)
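The third bullet can be reproduced with a small sketch (a simplified stand-in for the described DIETClassifier logic, not the real code):

```python
import numpy as np

# Simplified stand-in for the interplay described above, not Rasa's code.
LABEL_RANKING = 10

def report(confidences: np.ndarray, ranking_length: int) -> np.ndarray:
    ranked = np.sort(confidences)[::-1]
    top = ranked[:ranking_length]
    normalized = top / top.sum()      # normalised over ranking_length items...
    # ...but at most LABEL_RANKING of them are actually reported:
    return normalized[:min(ranking_length, LABEL_RANKING)]

uniform = np.full(12, 1 / 12)         # 12 equally likely labels
reported = report(uniform, ranking_length=11)
print(reported.sum())                 # 10/11 ≈ 0.909, not 1
```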

So how about we ...

  • in DIETClassifier/ResponseSelector use
    • LABEL_RANKING as default for ranking_length
    • report ranking_length many top confidences if ranking_length>0, and otherwise all confidences
    • re-normalise all reported confidences if and only if the new renormalize_confidences flag is set
    • (this breaks what people expected - but that was supposed to happen anyway)
  • in TED we
    • if ranking_length > 0 (which is kept at its default of 10), then always set all confidences that are not among the top ranking_length many to 0 (this is the only possible analog to "reporting only x many", I think)
    • re-normalise all confidences if and only if the new renormalize_confidences flag is set (which is True by default)
    • (with the default of the new parameter being True, no old config should break)
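The TED part of the proposal could be sketched like this (illustrative only; a top-k mask via argsort is used here to sidestep ties, and the parameter names follow the suggestions above):

```python
import numpy as np

# Illustrative sketch of the TED proposal above, not Rasa's actual code.
def mask_and_maybe_renormalize(confidences: np.ndarray,
                               ranking_length: int,
                               renormalize_confidences: bool) -> np.ndarray:
    values = confidences.copy()
    if ranking_length > 0:
        # Zero out everything outside the top `ranking_length` entries,
        # keeping the array shape intact (a policy must score all actions).
        keep = np.argsort(values)[::-1][:ranking_length]
        mask = np.zeros(values.shape, dtype=bool)
        mask[keep] = True
        values[~mask] = 0.0
    if renormalize_confidences and values.sum() > 0:
        values = values / values.sum()
    return values

probs = np.array([0.5, 0.3, 0.1, 0.1])
print(mask_and_maybe_renormalize(probs, 2, True))  # [0.625, 0.375, 0.0, 0.0]
```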

wdyt?

@ka-bu
Contributor

ka-bu commented Oct 7, 2021

Oh, similarly, we could parameterize the sklearn intent classifier: that one always reports confidences for the top 10 intents (if there are that many)

@dakshvar22
Contributor Author

TED is only using the ranking_length for the purpose of normalisation

Probably a mistake, should be fixed.

So how about we ...

I agree with everything, but let's keep the behaviour universal for all 3 components, i.e.:

  1. LABEL_RANKING as default for ranking_length
  2. report ranking_length many top confidences if ranking_length>0, and otherwise all confidences
  3. re-normalise all reported confidences if and only if the new renormalize_confidences flag is set
  4. By default, renormalize_confidences is set to False.

If people want to get the old behaviour they can set renormalize_confidences=True.
Wdyt?

@ka-bu
Contributor

ka-bu commented Oct 7, 2021

  1. LABEL_RANKING as default for ranking_length

👍 I wrote 10 above because LABEL_RANKING is from nlu.classifiers; will move this parameter to some other constants.py then 🤔

  1. report ranking_length many top confidences if ranking_length>0, and otherwise all confidences

This isn't possible for TED. A policy prediction must contain values for all actions. I think the closest we can get to consistent behaviour is the setting-to-0 approach (without normalising).

  1. By default, renormalize_confidences is set to False.

I wasn't sure if it's ok to change that one; I'm fine with breaking things to unify the behaviour :D

@dakshvar22
Contributor Author

This isn't possible for TED. A policy prediction must contain values for all predictions. Think the closest we can get to consistent behaviour is the setting-to-0 approach (and not normalising).

Ahh right, forgot about it. So, in the setting-to-0 approach, the confidences that have been set to 0 don't contribute to renormalization (if renormalization was turned on), right?

Wasn't sure if it's ok to change that one, I'm fine with breaking things to unify the behaviour

This is exactly why this needs to be done in 3.0. Renormalization is completely wrong IMO, but if it floats the boat for some users we should support it, just not as the default behaviour 😅

@ka-bu
Contributor

ka-bu commented Oct 7, 2021

Ahh right, forgot about it. So, in the setting-to-0 approach, the confidences that have been set to 0 don't contribute to renormalization (if renormalization was turned on), right?

I'm not sure what you mean by they don't contribute to renormalisation. We normalise by the sum of the confidences. If we remove them from the list (in DIET), they don't contribute to the sum, just like they don't contribute to the sum if we set them to 0 (in TED).
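A quick sketch of that equivalence (illustrative numbers):

```python
import numpy as np

# Dropping the tail (DIET) vs. zeroing it (TED): the surviving labels
# end up with identical normalised confidences either way.
conf = np.array([0.4, 0.3, 0.2, 0.1])

dropped = conf[:2]       # DIET-style: remove the tail entirely
zeroed = conf.copy()
zeroed[2:] = 0.0         # TED-style: keep the shape, zero the tail

print(dropped / dropped.sum())  # [0.5714..., 0.4285...]
print(zeroed / zeroed.sum())    # same leading values, zeros after
```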

Btw, there's also a tiny bug in the normalisation new_values[new_values < ranked[ranking_length - 1]] = 0 that will probably almost never happen (only when multiple labels have the same probability as the item at rank ranking_length), but I'll still fix it along with the rest :)
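The tie case that bug refers to can be triggered like this (illustrative sketch built around the quoted line):

```python
import numpy as np

# Sketch of the tie bug in `new_values[new_values < ranked[ranking_length - 1]] = 0`:
# labels tied with the confidence at rank `ranking_length` all survive the cut.
new_values = np.array([0.4, 0.2, 0.2, 0.2])
ranking_length = 2

ranked = np.sort(new_values)[::-1]                       # [0.4, 0.2, 0.2, 0.2]
new_values[new_values < ranked[ranking_length - 1]] = 0  # nothing is < 0.2

print(np.count_nonzero(new_values))  # 4 non-zero values instead of 2
```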

This is exactly why this needs to be done in 3.0 . Renormalization is completely wrong IMO but if it floats the boat for some users, we should support it, but just not the default behaviour 😅

👍

@dakshvar22
Contributor Author

I'm not sure what you mean by they don't contribute to renormalisation. We normalise by the sum of the confidences. If we remove them from the list (in DIET), they don't contribute to the sum, just like they don't contribute to the sum if we set them to 0

Totally my bad, don't know what I was thinking

@ka-bu
Contributor

ka-bu commented Oct 7, 2021

No worries, happens to me all the time ^^. Btw, there's actually no reason to mask the probabilities in TED, is there? The ensemble won't care about the values we set to 0. So maybe we set the default to 0 instead of LABEL_RANKING_LENGTH?
