Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #9302: Featurised slots sometimes appear in general rules #9475

Merged
merged 10 commits into from
Sep 8, 2021

Conversation

samsucik
Copy link
Contributor

@samsucik samsucik commented Aug 30, 2021

Over in #9302, I identified a bug in the CategoricalSlot featurisation code: If the slot influences_conversation but is currently unset (i.e. set to None), it can still get featurised as if it was set to an actual value. This happens when one of the allowed values for the slot is "none" (or "None", or "nOnE", etc.) due to this line.

Proposed changes:
I'm proposing a fix: To treat unset slots in the abovementioned case the same way as we treat categorical slots for which featurising the slot fails. For those cases, we return an all-zero array as the slot's featurisation. Note that, in the future, we might want to distinguish between these cases and return different features for each.

In order to implement the fix, I need to disallow the Pythonic None as a valid value for categorical slots. Currently, one can list null in domain.yml among the possible values for such a slot. That null gets translated into "none" within Domain's Python code. Which, in turn, during the featurisation of the slot, matches the cases where the slot's value is the Pythonic None (i.e. the slot is unset). In this sense, we are allowing people to use the special unset value (represented by null/None) as a regular value for categorical slots. I think this isn't good.

I was considering returning an empty feature array ([]) but then decided not to do this. We return this value in Slot.as_feature for slots that don't influence the conversation, but here we're talking about conversation-influencing yet unset slots.

I also concluded that we need to talk in the docs more about unset slots. Or at least to collect all relevant info in one place, which I'm trying to do here (to be actually put into our docs in a separate issue):

  • a slot is originally unset (unless it has initial_value defined)
  • this means its value is the Pythonic None
  • where such a slot influences_conversation, models know that it's unset (i.e. its featurisation differs from when it's set to some value)
  • in Python (custom actions), we unset a slot by setting it to None; in stories/rules, we write slot_name: null
  • categorical slots can't have the Pythonic None as a possible legit value. None (in domain.yml written as null) is reserved for when the slot is unset

In addition to unset slots, we also need to make it clear in the docs that featurisation of categorical slots is case-insensitive and involves casting to str (and we might re-think these design decisions in the future).

One more related issue to be clearly communicated in the docs is slot-agnostic vs slot-dependent behaviour: If, for some given context, a rule prescribes certain next steps irrespective of slots, and another rule (or a story) for the same context specifies different behaviour while conditioning on slots, then these two rules/stories are in conflict. A good example of this is reported here and elaborated on here.

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@samsucik samsucik self-assigned this Aug 30, 2021
@kedz
Copy link
Contributor

kedz commented Aug 30, 2021

In addition to unset slots, we also need to make it clear in the docs that featurisation of categorical slots is case-insensitive (and we might re-think this design decision in the future).

Also possibly worth mentioning that the value is also cast to a str type before lower casing and testing for str equality.

@samsucik
Copy link
Contributor Author

Also possibly worth mentioning that the value is also cast to a str type before lower casing and testing for str equality.

Yeah, that makes sense. I mean, the way we do the value checking isn't exactly obvious, so making it explicit can't hurt.

@samsucik
Copy link
Contributor Author

I've now added tests and checked that all of them fail with the fix disabled and pass with the fix included. @kedz the last bit I'll add before a review is a changelog entry 🙂

@samsucik samsucik requested a review from kedz August 31, 2021 14:48
@samsucik samsucik marked this pull request as ready for review August 31, 2021 14:51
@samsucik samsucik requested a review from a team as a code owner August 31, 2021 14:51
@samsucik samsucik requested review from alwx and removed request for a team August 31, 2021 14:51
@samsucik
Copy link
Contributor Author

@alwx fyi I've been discussing bits of this fix with Joe over here.

Copy link
Contributor

@kedz kedz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Added a minor comment about wording of a warning message.

@samsucik
Copy link
Contributor Author

samsucik commented Sep 2, 2021

Thanks @kedz! I reworded the warning, more or less following your suggestion 😉 The "null" idea also made me extend our tests (+ I added a few cases that should've been added previously).

Copy link
Contributor

@alwx alwx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks good for me. Thanks!

@samsucik samsucik merged commit fc9e0ba into 2.8.x Sep 8, 2021
@samsucik samsucik deleted the fix-featurised-slots-in-rules branch September 8, 2021 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants