Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Featurised slots with initial_value get wrongly embedded in general rules #9483

Closed
2 tasks
samsucik opened this issue Aug 31, 2021 · 14 comments
Closed
2 tasks
Assignees
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework area:rasa-oss/ml/policies Issues focused around rasa's dialogue management policies effort:research/4 type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@samsucik
Copy link
Contributor

samsucik commented Aug 31, 2021

Rasa Open Source version

2.8.3

Rasa SDK version

No response

Rasa X version

No response

Python version

3.8

What operating system are you using?

OSX

What happened?

We encountered this issue a long time ago and thought we fixed it in #8161. But looks like we didn't. If a featurised slot has an initial_value set, the slot gets included in rules that don't mention this slot explicitly. This is incorrect as those rules are then not applicable in states where the slot has a value different from the initial one.

Findings from initial investigation

See bellow for an example setup to reproduce this issue.

In #8161, we introduced the omit_unset_slots flag, which, when training RulePolicy, is used to exclude slots that are unset (i.e. set to their initial values) from general rules (i.e. from rules that don't mention those slots and should be applicable regardless of those slots' values).

Unfortunately, the intended pattern with omit_unset_slots needs further work (and testing). What is going wrong is:

  1. During data generation from rules, the generator removes duplicated trackers – deduplication is done by creating tracker states (and caching them) and subsequently hashing them. In this step, omit_unset_slots is switched off.
  2. Subsequently, RulePolicy trains – creates its rule lookup, with omit_unset_slots switched on. In doing this, trackers need to be turned into states. And, since we have the states cached from the previous step, we re-use them. And this is where things break – the previously created states incorrectly contain initial values of featurised slots because they were created with omit_unset_slots=False.

I see a few options for possible fixes:

  • see if it makes sense to use omit_unset_slots=True also during tracker deduplication
  • don't use the caching
  • use caching but check that the cached states were created with the same value of omit_unset_slots
  • ...

Example setup to reproduce the issue Command:

rasa train --augmentation 0
 rasa test

The default config. Training data:

version: "2.0"
 rules:

*   rule: goodbye
     steps:
*   intent: goodbye
*   action: utter_goodbye
     nlu:
*   intent: greet
     examples: |
*   hey
*   hello
*   hi
*   hello there
*   good morning
*   good evening
*   moin
*   hey there
*   let's go
*   hey dude
*   goodmorning
*   goodevening
*   good afternoon
*   intent: goodbye
     examples: |
*   cu
*   good by
*   cee you later
*   good night
*   bye
*   goodbye
*   have a nice day
*   see you around
*   bye bye
*   see you later

Test stories:

version: "2.0"
 stories:

*   story: goodbye - num_appts slot
     steps:
*   slot_was_set:
*   num_appts: multiple
*   intent: goodbye
*   action: utter_goodbye

Domain:

version: "2.0"
 intents:

*   greet
*   goodbye
     slots:
     num_appts:
     type: categorical
     initial_value: single
     values:
*   none
*   single
     responses:
     utter_goodbye:
*   text: "Bye"

Definition of done

  • [x] Verify that the cause of the issue is as outlined above
  • [x] Write tests that catch the bug (turns out the tests in Rules states should not include the initial values of slots. #8161 weren't sufficient)
  • Decide which way to fix it (involve CSE and Product Management)
  • Create followup issue: "Implement the fix, adding more tests if necessary"
@samsucik samsucik added type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Aug 31, 2021
@samsucik
Copy link
Contributor Author

samsucik commented Aug 31, 2021

Exalate commented:

samsucik commented:

I'm tentatively putting this into Enable's inbox, though it might relate a fair bit to the (research-owned) RulePolicy.

Either way, I think this bug is a high-priority one, judging by the original one for this same issue.

@stale
Copy link

stale bot commented Jan 9, 2022

Exalate commented:

stale[bot] commented:

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 9, 2022
@samsucik
Copy link
Contributor Author

samsucik commented Jan 10, 2022

Exalate commented:

samsucik commented:

I just verified that this bug is still present in 3.0. To reproduce, use this adjusted 3.0 domain:

version: "3.0"
 intents:

*   greet
*   goodbye
     slots:
     num_appts:
     type: categorical
     initial_value: single
     values:
*   none
*   single
     mappings:
*   type: from_entity
     entity: num_appts
     responses:
     utter_goodbye:
*   text: "Bye"

@TyDunn bringing this to your attention so it doesn't get lost under layers of dust again.

@JEM-Mosig
Copy link

JEM-Mosig commented Jan 24, 2022

Exalate commented:

JEM-Mosig commented:

@JEM-Mosig is reviewer

@samsucik
Copy link
Contributor Author

samsucik commented Jan 28, 2022

Exalate commented:

samsucik commented:

Alright, I can confirm that the initial exploration was correct and it's indeed the caching mechanism of TrackerWithCachedStates that is to blame.

In short: During training data generation, for each tracker, its states are generated and cached in the tracker's _states_for_hashing. During policy training, RulePolicy asks for each rule tracker's states but tells the tracker to omit from the states any slots that are unset or set to their initial values. The tracker retrieves its cached states but ignores that they were created in a slightly different way and do contain the problematic slots. Hence, RulePolicy gets the wrong states and creates a flawed rules lookup, leading to wrong predictions at test time.

In both 3.0 and 2.8 (current states of the 3.0.x and 2.8.x branches), I was able to verify this by cleaning the rule trackers' cache just before training RulePolicy: by inserting at the very beginning of the _create_lookup_from_trackers method the simple:

for t in rule_trackers: t._states_for_hashing = None

This led to correct test-time predictions.

@samsucik
Copy link
Contributor Author

samsucik commented Jan 28, 2022

Exalate commented:

samsucik commented:

I've added a test case which catches the bug in 3.0. The test case passes when the simple patch from my previous comment is applied to RulePolicy._create_lookup_from_trackers.

When we're implementing an actual fix in the branch I started, we should move that test case to a better place (to one of the existing test files). I couldn't quickly decide where to put it because it's a test case that tests RulePolicy but feels more like an integration test – it tests the policy in the particular context that is the full train-test pipeline.

@samsucik
Copy link
Contributor Author

samsucik commented Jan 28, 2022

Exalate commented:

samsucik commented:

Let's move on to how to fix this bug. There are many ways to think about it. I think the bug fundamentally exists because we currently ignore the need for policy-specific versions of training data – we take our training trackers and pass them to each policy's train method. However, it's likely that we won't address this issue now as it's well beyond the scope here. Instead, I can see these simpler alternative fixes:

  1. Change TrackerWithCachedStates such that its cache is parametrised (i.e. its contents are associated with the arguments used upon the contents' creation). So that when you call TrackerWithCachedStates.past_states_for_hashing with omit_unset_slots=True, you won't get back cached states created in a historical call with omit_unset_slots=False.
  2. Inside RulePolicy.train, make a copy of the trackers and, before training on them, reset their cache (as in the simple hacky fix from two comments above).
  3. another alternative which I'm just not seeing right now

Naturally, all potential fixes should be compared in terms of memory concerns as creating policy-specific training data can take consume more memory – whether in the form of storing multiple parametrised versions of cached states in a TrackerWithCachedStates, or in the form of RulePolicy duplicating all the trackers for its own use.

Now, I feel like we're reaching the Enable/Engine area here, so for deciding on the concrete way of fixing this bug I'd like to bring in @ka-bu (& CC @TyDunn), with the future fixing work potentially belonging to a different squad (of course, RulePolicy is still research-owned and if we agree to fix things over there, I'm happy to own it).

@ka-bu
Copy link
Contributor

ka-bu commented Jan 28, 2022

Exalate commented:

ka-bu commented:

Regarding the other approaches above:

  1. Think this might break: Looks to me like this could differ between consumers? And if there is no agreement then the caching will definitely break
  2. Sounds not too bad: If we clear the cache, re-compute the states, and add them to the lookup one by one then we don't increase memory consumption - but just pay more time (re-compute the states again).

Adding one more option: 3. This is most costly in terms of refactoring: We could adapt the state creation so that a state is unambiguous - and create both needed versions from that. That is, currently, a state can take to forms based on that ‘omit_unset_slots’ flag, which breaks the caching idea (which will only cache one of those). If we had a more general representation then we could remove the slots that have not been set for the rules in the rule policy (right before hashing again for the lookup).

More questions for research:

  • Why do you think this flag needed and where is it needed / exploited? (I have some ideas but none sounds convincing yet) Ah, I guess, it's because of how rules are written and a workaround would be to insert them during data loading ... kind of what we do with the "from intent" global slot mappings, which is not so nice

    🤔
    ... Actually, does that make sense. Wouldn't during prediction that default value be filled in (wether the slot has been filled by the human or not)? But maybe I'm messing something up now.
  • Have you run some timings on the workaround version?

Some more random thoughts:

  • Might be good to figure out how much the caching helps. I don't think that we exploit e.g. that we could throw away events from previous sessions yet before doing anything, and that might be one of the problems why the caching helps.

@TyDunn this issue fits well with the domain + tracker featurizer disentangling work stream (Note that the domain disentangling alone does not cover this – but the prototype for the tracker featurizer rework does).

@samsucik
Copy link
Contributor Author

samsucik commented Jan 31, 2022

Exalate commented:

samsucik commented:

At a high level: I'm not aware of past benchmarks w.r.t. the usefulness of caching states in TrackerWithCachedStates (perhaps because it fell under Enable, not Research). And so we might want to start from there, instead of making assumptions such as "the caching is useful" or "caching all states 2x means too much memory overhead" or "computing all states 2x means too much runtime overhead". This is the main point I'd like to get across to you, @TyDunn. I also think the discussion we're having here definitely belongs to Enable/Engine, and they're better equipped to decide how to fix the bug. Wdyt?

Now to react to your more detailed points, @ka-bu (and thanks a lot for the input!):

  1. "[...] this could differ between consumers? And if there is no agreement then the caching will definitely break" – I think this should be okay IF we keep it standardised how the cache gets filled and accessed. Like, if there is only one public method that changes the cache, and only the arguments of that method parametrise the cache, then there shouldn't be conflicts, or am I missing something?
  2. Just to check my understanding: Here, RulePolicy would re-compute the states but not cache them, so as not to "spoil" the cached states for other policies?
  3. I'm definitely a fan – I guess this kind of proper refactoring is more about "when" than "if"

    😅

As for your questions around the omit_unset_slots flag, @ka-bu: It exists because of the tension between slots with initial values and rules. We want rules to ignore slots that aren't explicitly mentioned in the rules. But when turning a rule tracker into states, Domain sneaks in there also the featurisation of all slots with initial values

🤦
More broadly, this is about the way Rasa Open Source consumes a simple representation (like a YAML rule) but turns it into a more complex one internally – adding in lots of things like action_listens, RULE_SNIPPET_ACTION_NAME, which are opaque to a Rasa user and may sometimes lead to surprises.

@ka-bu
Copy link
Contributor

ka-bu commented Jan 31, 2022

Exalate commented:

ka-bu commented:

As for your questions around the omit_unset_slots flag, @ka-bu: It exists because of the tension between slots with initial values and rules. We want rules to ignore slots that aren't explicitly mentioned in the rules. But when turning a rule tracker into states, Domain sneaks in there also the featurisation of all slots with initial values

🤦
More broadly, this is about the way Rasa Open Source consumes a simple representation (like a YAML rule) but turns it into a more complex one internally – adding in lots of things like action_listens, RULE_SNIPPET_ACTION_NAME, which are opaque to a Rasa user and may sometimes lead to surprises.

I think the main reason for my confusions is because of these:

  1. If there is a need to distinguish between whether a slot was set by the user - or is filled with the default value - then that default value should not be used as default value. (Imho, the assumption of reasonable "defaults" is what justifies the use "omit_unset_slots=False" for other policies?)
  2. If there is a need to remove values of default slots during training , then doesn't that mean these default values are not part of the states during prediction ..... which sound unexpected.
  • But it is done because _prediction_states leads to state creation with the default value for omit_unset_slots with False. So that is consistent but still does not seem logical.

I thought one reason might be un-setting a slot. But that kind of slot setting will actually look like a lot is "set" (i.e. "has_been_set" will be true). (There is a reset() method but that is only used for resetting the whole tracker). .... So this can't be the reason and is another thing confusing me / looking inconsistent to me.

Sorry if I'm creating spam by mixing up things again. Thanks for the explanations so far and, nearly forgot to say, nice catch, @samsucik !

@ka-bu
Copy link
Contributor

ka-bu commented Jan 31, 2022

Exalate commented:

ka-bu commented:

"[...] this could differ between consumers? And if there is no agreement then the caching will definitely break" – I think this should be okay IF we keep it standardised how the cache gets filled and accessed. Like, if there is only one public method that changes the cache, and only the arguments of that method parametrise the cache, then there shouldn't be conflicts, or am I missing something?

Oh, not sure what I was thinking of here but I think I get what you mean now. So this could be a) having two caches per parameter or b) on-demand computation of states with the non-cached parameter, all behind the scenes in the from cache method. So similar to 2?

: Here, RulePolicy would re-compute the states but not cache them, so as not to "spoil" the cached states for other policies?

yep, but not sure why I thought "clear the cache" is a good idea (Friday evening

😬
)

@TyDunn
Copy link
Contributor

TyDunn commented Feb 2, 2022

Exalate commented:

TyDunn commented:

this issue fits well with the domain + tracker featurizer disentangling work stream (Note that the domain disentangling alone does not cover this – but the prototype for the tracker featurizer rework does).

I also think the discussion we're having here definitely belongs to Enable/Engine, and they're better equipped to decide how to fix the bug. Wdyt?

@ka-bu @samsucik Sounds good to me. Let's move this to Engine

@TyDunn TyDunn added the area:rasa-oss/ml/policies Issues focused around rasa's dialogue management policies label Feb 2, 2022
@rasabot-exalate rasabot-exalate added area:rasa-oss and removed type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Mar 15, 2022 — with Exalate Issue Sync
@rasabot rasabot added area:rasa-oss 🎡 Anything related to the open source Rasa framework and removed area:rasa-oss labels Mar 16, 2022
@rasabot-exalate rasabot-exalate added area:rasa-oss_:ferris_wheel: and removed area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Mar 17, 2022 — with Exalate Issue Sync
@rasabot-exalate rasabot-exalate added effort:research/4 area:rasa-oss 🎡 Anything related to the open source Rasa framework and removed area:rasa-oss_:ferris_wheel: labels Mar 17, 2022 — with Exalate Issue Sync
@m-vdb m-vdb added the type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. label Oct 7, 2022
@harshit-dxt
Copy link

harshit-dxt commented Nov 10, 2022

Hey @samsucik, what is the expected behaviour here for rules_loop_for_unhappy_path? Should the initial_values slot get embedded in those rules? I want to understand how the unhappy path rules work. It would be of great help?

@sync-by-unito
Copy link

sync-by-unito bot commented Dec 19, 2022

➤ Maxime Verger commented:

💡 Heads up! We're moving issues to Jira: https://rasa-open-source.atlassian.net/browse/OSS.

From now on, this Jira board is the place where you can browse (without an account) and create issues (you'll need a free Jira account for that). This GitHub issue has already been migrated to Jira and will be closed on January 9th, 2023. Do not forget to subscribe to the corresponding Jira issue!

➡️ More information in the forum: https://forum.rasa.com/t/migration-of-rasa-oss-issues-to-jira/56569.

@m-vdb m-vdb closed this as completed Jan 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework area:rasa-oss/ml/policies Issues focused around rasa's dialogue management policies effort:research/4 type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.
Projects
None yet
Development

No branches or pull requests

8 participants