Updated featurizers #4935

Merged
merged 278 commits into from
Dec 17, 2019
d496b43
Add changelog entry.
tabergma Oct 18, 2019
291a24e
move code from init to own file
tabergma Oct 18, 2019
5986a0d
update changelog entry.
tabergma Oct 18, 2019
54b5f3a
make use_cls_token a class variable of tokenizer
tabergma Oct 18, 2019
c939387
tokenizer inherits from component
tabergma Oct 18, 2019
944b716
remove not needed init methods
tabergma Oct 18, 2019
f1ed7d7
review comment
tabergma Oct 18, 2019
9112022
Add use_cls_token to default dict.
tabergma Oct 18, 2019
31dd425
throw key error if use_cls_token is not set as default value.
tabergma Oct 18, 2019
e652d84
Disable cls token use in default pipeline.
tabergma Oct 20, 2019
1d77554
correct type
tabergma Oct 20, 2019
3d8a2e4
fix tests
tabergma Oct 21, 2019
2985938
Merge branch 'combined-entity-intent-model' into adapt-featurizers
tabergma Oct 21, 2019
50f68b2
spacy featurizer returns sequence
tabergma Oct 21, 2019
603d065
fix tests for count vectors featurizer
tabergma Oct 21, 2019
d1a19dc
mitie featurizer returns sequence
tabergma Oct 21, 2019
bd2ceb3
regex featurizer returns sequence
tabergma Oct 21, 2019
7e46fe8
clean up
tabergma Oct 21, 2019
a4b8b0e
Add changelog entry
tabergma Oct 21, 2019
f02b9c2
helper method to convert seq features back
tabergma Oct 21, 2019
d3a5dd5
remove print statement
tabergma Oct 21, 2019
46ab485
fix imports
tabergma Oct 22, 2019
076f33d
remove ner_features from restaurantbot
tabergma Oct 22, 2019
905f2d6
change default value
tabergma Oct 22, 2019
1941a25
fix imports
tabergma Oct 22, 2019
6483379
handle cls token in featurizers
tabergma Oct 22, 2019
952e95a
Remove ngram featurizer from registry
tabergma Oct 22, 2019
6faa44b
review comments
tabergma Oct 23, 2019
20a92ca
count vectors featurizer requires tokens
tabergma Oct 23, 2019
810cae5
remove not needed vocab check
tabergma Oct 23, 2019
b6ad85c
Add cls token to whitespace tokenizer.
tabergma Oct 18, 2019
fb24e35
Add cls token to spacy tokenizer.
tabergma Oct 18, 2019
ad64e50
Add cls token to mitie tokenizer.
tabergma Oct 18, 2019
2ce36d9
Add cls token to jieba tokenizer.
tabergma Oct 18, 2019
3f85199
Add changelog entry.
tabergma Oct 18, 2019
88964a0
move code from init to own file
tabergma Oct 18, 2019
acb7503
update changelog entry.
tabergma Oct 18, 2019
3d89a66
make use_cls_token a class variable of tokenizer
tabergma Oct 18, 2019
7ed1f27
tokenizer inherits from component
tabergma Oct 18, 2019
b9e3188
remove not needed init methods
tabergma Oct 18, 2019
787e047
review comment
tabergma Oct 18, 2019
6fe28f0
Add use_cls_token to default dict.
tabergma Oct 18, 2019
172c0e5
throw key error if use_cls_token is not set as default value.
tabergma Oct 18, 2019
45a5868
Disable cls token use in default pipeline.
tabergma Oct 20, 2019
7c9c679
correct type
tabergma Oct 20, 2019
dfeca3e
fix tests
tabergma Oct 21, 2019
d031f14
Merge branch 'combined-entity-intent-model' into adapt-featurizers
tabergma Oct 23, 2019
ce91597
Update rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurize…
tabergma Oct 23, 2019
f69673a
Update rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurize…
tabergma Oct 23, 2019
78d4d51
review comments
tabergma Oct 23, 2019
411d328
test regex featurizer on response
tabergma Oct 23, 2019
884e2b3
review comments
tabergma Oct 23, 2019
2075338
Merge pull request #4648 from RasaHQ/adapt-featurizers
tabergma Oct 23, 2019
e857c35
switch from ner to sparse features
tabergma Oct 23, 2019
01b4de6
add seq to sentence embedding method
tabergma Oct 23, 2019
f21bbd7
update crf entity extractor
tabergma Oct 23, 2019
d08b2d7
use constants
tabergma Oct 23, 2019
a5e3382
fix imports
tabergma Oct 23, 2019
b469bd6
Fix crf entity extractor.
tabergma Oct 23, 2019
d97ce14
Remove empty file.
tabergma Oct 23, 2019
3275f5a
add changelog entry.
tabergma Oct 23, 2019
f85fbe2
Remove case_sensitive option from WhitespaceTokenizer
tabergma Oct 23, 2019
a4f5e8e
Update docstring.
tabergma Oct 23, 2019
a6d93fb
review comments
tabergma Oct 23, 2019
ae8faf6
undo removing case sensitive from whitespace tokenizer
tabergma Oct 24, 2019
8791d40
Adapt tests.
tabergma Oct 24, 2019
8d86968
rename word_embeddings to text_dense_features
tabergma Oct 24, 2019
a8a5abf
combine correct features in regex featurizer
tabergma Oct 24, 2019
b1d371b
keep sparse sparse
tabergma Oct 25, 2019
5ded8c9
fix changelog
tabergma Oct 25, 2019
bbf4d43
update sequence to sentence
tabergma Oct 25, 2019
ecbf157
update sequence to sentence
tabergma Oct 25, 2019
74ec4c4
Merge pull request #4663 from RasaHQ/adapt-extractors-classifiers
tabergma Oct 25, 2019
934ae5e
Merge branch 'master' into combined-entity-intent-model
tabergma Oct 25, 2019
a62445c
Merge branch 'master' into combined-entity-intent-model
tabergma Oct 29, 2019
3cb90b7
update session data
tabergma Oct 23, 2019
3faf171
use dict for session data.
tabergma Oct 28, 2019
834d265
adapt classifiers
tabergma Oct 28, 2019
3f0750b
fix classifier
tabergma Oct 28, 2019
4c8f811
add more tests
tabergma Oct 28, 2019
e05a9d6
use sparse in tests
tabergma Oct 28, 2019
b110720
fix shapes
tabergma Oct 29, 2019
219d9dd
fix tests.
tabergma Oct 29, 2019
bca0b85
review comments
tabergma Oct 29, 2019
adc84fe
use label_key
tabergma Oct 29, 2019
1dbaa5c
intent classifier makes use of sparse and dense features.
tabergma Oct 29, 2019
f2a8599
remove default value for label_key
tabergma Oct 29, 2019
9c25095
clean up
tabergma Oct 29, 2019
dafbaf9
review comments
tabergma Oct 30, 2019
3c78a86
add more tests
tabergma Oct 30, 2019
31196cf
add test for balance session data
tabergma Oct 30, 2019
9f4ed63
use given attribute in create session data
tabergma Oct 30, 2019
7545745
gen_batch can handle sequence
tabergma Oct 31, 2019
d3b48ea
session data is simple dict
tabergma Oct 31, 2019
a65f397
use sparse tensors
tabergma Nov 1, 2019
60bdec2
wrap tf.layers.dense with dense_layer function
tabergma Nov 4, 2019
c448fe7
get feature_dim from session data instead of sparse tensor
tabergma Nov 4, 2019
093024f
Update rasa/utils/train_utils.py
Ghostvv Nov 7, 2019
b600c26
pass last dim of sparse tensor into the SparseTensor directly, separa…
Ghostvv Nov 7, 2019
d53ffb9
rephrase todo
Ghostvv Nov 7, 2019
086ee13
rephrase todo
Ghostvv Nov 8, 2019
e3f8a63
keep _encoded_all_label_ids scipy.sparse.csr_matrix.
tabergma Nov 8, 2019
98829a9
session data values are list of np.ndarray
tabergma Nov 8, 2019
f97f6df
fix encoded all label ids
tabergma Nov 8, 2019
5d53eb1
fix train utils methods
tabergma Nov 8, 2019
f79ed36
convert encoded_all_labels into a list of sparse,dense
Ghostvv Nov 8, 2019
0a06ff6
Merge branch 'adapt-session-data' of https://github.com/RasaHQ/rasa i…
Ghostvv Nov 8, 2019
c2447f3
Merge branch 'master' into combined-entity-intent-model
tabergma Nov 8, 2019
2a3966b
Merge branch 'combined-entity-intent-model' into adapt-session-data
tabergma Nov 8, 2019
b208db7
create sparse matrices, if no intent features provided
Ghostvv Nov 8, 2019
ee37852
embedding intent classifier is training.
tabergma Nov 11, 2019
b9256bd
create session data during prediction.
tabergma Nov 11, 2019
112f065
prediction of embedding intent classifier works.
tabergma Nov 11, 2019
2635464
clean up code
tabergma Nov 11, 2019
4104574
convert encoded all labels the same way as session data
Ghostvv Nov 11, 2019
87bb01a
merge
Ghostvv Nov 11, 2019
9d66575
add mask
tabergma Nov 11, 2019
1c4591e
check if tokens are present
tabergma Nov 11, 2019
7889033
add TODO
Ghostvv Nov 11, 2019
6f20dbc
fix wrong embed layer
Ghostvv Nov 11, 2019
6cf4385
more consistent var naming
Ghostvv Nov 11, 2019
bcc52c1
fix balance session data
tabergma Nov 12, 2019
c161a43
add comments
tabergma Nov 12, 2019
c7e3251
extract dense_dim from dense features
tabergma Nov 12, 2019
b98bab6
Fix test_train test.
tabergma Nov 12, 2019
615bb62
_compute_default_label_features works as expected
tabergma Nov 12, 2019
2ef8744
fix len error'
Ghostvv Nov 12, 2019
13a3550
Merge branch 'adapt-session-data' of https://github.com/RasaHQ/rasa i…
Ghostvv Nov 12, 2019
c558e0d
Merge branch 'master' into combined-entity-intent-model
tabergma Nov 12, 2019
42ba88e
Merge branch 'combined-entity-intent-model' into adapt-session-data
tabergma Nov 12, 2019
718aff0
use default label features if not present
tabergma Nov 12, 2019
220d6d0
correct use of session data in policy
tabergma Nov 12, 2019
18fe94f
Use coo_matrix.
tabergma Nov 12, 2019
c56db96
Update Changelog
tabergma Nov 12, 2019
d22055c
clean up
tabergma Nov 12, 2019
ad8695a
Fix imports.
tabergma Nov 12, 2019
1c835c5
add masks, update prediction batch creation
Ghostvv Nov 12, 2019
ff0c707
fix types
Ghostvv Nov 12, 2019
597265b
merge helper methods
Ghostvv Nov 12, 2019
29a9c7f
some refactoring
tabergma Nov 13, 2019
4c03841
add test for get number of features
tabergma Nov 13, 2019
fa7b50c
set initial tuple size to zero
Ghostvv Nov 13, 2019
fb7a6ed
Merge branch 'adapt-session-data' of https://github.com/RasaHQ/rasa i…
Ghostvv Nov 13, 2019
35e2bdb
rename the variable
Ghostvv Nov 13, 2019
9bbe1c1
formatting
tabergma Nov 13, 2019
f383cf0
Update cli startup test
tabergma Nov 13, 2019
7ab1f97
fix test.
tabergma Nov 13, 2019
2a54286
fix different sequence lengths in sparse and dense features
Ghostvv Nov 13, 2019
307e064
cosmetic changes
Ghostvv Nov 13, 2019
7c46caa
black
Ghostvv Nov 13, 2019
468ef3c
fix default Y features
Ghostvv Nov 13, 2019
b2391cf
use f strings
tabergma Nov 13, 2019
ed2b72f
formatting
tabergma Nov 13, 2019
38d83c3
fix types
tabergma Nov 14, 2019
5fdd251
store tuple sizes correctly
tabergma Nov 14, 2019
75b0e69
use float32 everywhere
tabergma Nov 14, 2019
38e8b81
fix docstrings
Ghostvv Nov 14, 2019
6e472a9
fix label_ids in core
Ghostvv Nov 14, 2019
5fdf957
use helper method
Ghostvv Nov 14, 2019
6aaf3ce
fix dynamic seq in label_id
Ghostvv Nov 14, 2019
4f9ecf7
raise if unsupported label_id dims
Ghostvv Nov 14, 2019
015c4d9
black
Ghostvv Nov 14, 2019
6004d65
fix import
Ghostvv Nov 14, 2019
5c050da
fix split session data tests
Ghostvv Nov 14, 2019
c08fe55
Merge pull request #4686 from RasaHQ/adapt-session-data
tabergma Nov 14, 2019
6fc18e5
Merge branch 'master' into combined-entity-intent-model
tabergma Nov 14, 2019
b40d6f4
slightly cleaner sparse to indices code
Ghostvv Nov 15, 2019
af54fbc
use extend
Ghostvv Nov 15, 2019
5d435a3
remove else
Ghostvv Nov 15, 2019
8d41e5e
use numpy stack
Ghostvv Nov 15, 2019
f29415c
Merge pull request #4777 from RasaHQ/sparse-batch
Ghostvv Nov 15, 2019
308b487
fix split train val
tabergma Nov 15, 2019
62d9e60
mask combined input before averaging
Ghostvv Nov 20, 2019
b26cfac
Merge branch 'master' into updated-featurizers
tabergma Nov 21, 2019
931d5eb
fix oov token warning
Ghostvv Nov 25, 2019
afeeaf1
Merge branch 'master' into updated-featurizers
tabergma Nov 25, 2019
df104f3
Merge branch 'master' into updated-featurizers
tabergma Nov 27, 2019
232176c
move convert featurizer to dense featurizers
tabergma Nov 27, 2019
7c87d60
add future warning to ngram featurizer
tabergma Nov 27, 2019
a81d0a8
set default value of use_cls_token to false
tabergma Nov 27, 2019
b2d1ad2
Merge branch 'master' into updated-featurizers
tabergma Nov 28, 2019
4471dee
fix import (add root)
tabergma Nov 28, 2019
e48d7a5
add return_sequence flag
tabergma Nov 28, 2019
bcfb0ad
convert featurizer returns seq of 1
tabergma Nov 28, 2019
f2b9e4f
fix return_sequence not found in config
tabergma Nov 29, 2019
a9360e3
convert featurizer return seq of 1
tabergma Nov 29, 2019
2850813
add more tests
tabergma Nov 29, 2019
32586f3
add test for convert featurizer
tabergma Nov 29, 2019
5121edc
fix default pipeline test
tabergma Nov 29, 2019
3c20e33
refactor mitie featurizer
tabergma Nov 29, 2019
4d42a22
Merge branch 'master' into updated-featurizers
tabergma Nov 29, 2019
961b912
Merge branch 'updated-featurizers' into add-sequence-flag
tabergma Nov 29, 2019
aac64a8
fix import
tabergma Nov 29, 2019
a4b454b
Add warning to convert featurizer.
tabergma Dec 2, 2019
303ef4c
update warning in crf entity extractor
tabergma Dec 2, 2019
b4e1e04
Add empty documentation page.
tabergma Dec 2, 2019
2e32d7e
update documentation
tabergma Dec 2, 2019
24e92b6
raise value error if seq dimension does not match
tabergma Dec 3, 2019
388fb6e
take mean vec for cls token in mitie
tabergma Dec 4, 2019
d5579a3
fix bug in count vector featurizer
tabergma Dec 4, 2019
0a37a61
review comments
tabergma Dec 4, 2019
afec4a9
add comment to count vectors about input to vectorizer
tabergma Dec 9, 2019
f6507ca
throw error if return_seq is true for convert featurizer
tabergma Dec 9, 2019
de9a5ed
update warnings
tabergma Dec 9, 2019
c01673c
update warning
tabergma Dec 9, 2019
3ae7626
fix tests
tabergma Dec 9, 2019
8156a4e
Merge pull request #4880 from RasaHQ/add-sequence-flag
tabergma Dec 9, 2019
ea57e20
Merge branch 'master' into updated-featurizers
tabergma Dec 10, 2019
fcf0474
remove default values from example configs
tabergma Dec 10, 2019
b8b4c2c
Merge branch 'updated-featurizers' into nlu-featurizer-documentation
tabergma Dec 10, 2019
79e0ceb
fix import
tabergma Dec 10, 2019
e47176a
update documentation
tabergma Dec 10, 2019
d39c322
Merge branch 'master' into updated-featurizers
tabergma Dec 10, 2019
66bdd62
Merge branch 'updated-featurizers' into nlu-featurizer-documentation
tabergma Dec 10, 2019
e3ed14f
fix links
tabergma Dec 10, 2019
225f1e4
reduce complexity
tabergma Dec 10, 2019
d434f04
update featurization link
tabergma Dec 11, 2019
a422dbd
Merge branch 'master' into updated-featurizers
tabergma Dec 11, 2019
d270cba
Merge branch 'updated-featurizers' into nlu-featurizer-documentation
tabergma Dec 11, 2019
8fdb9cf
review comment
tabergma Dec 11, 2019
dc47c40
Merge pull request #4934 from RasaHQ/nlu-featurizer-documentation
tabergma Dec 11, 2019
4c631e6
remove MESSAGE_ from nlu constants
tabergma Dec 11, 2019
50d54e3
rename spacy_featurizable_attributes to dense_featurizable_attributes
tabergma Dec 11, 2019
80e483b
Merge pull request #4944 from RasaHQ/rename-nlu-constants
tabergma Dec 11, 2019
f9b4f82
update changelog entry
tabergma Dec 12, 2019
1c1d95e
Merge branch 'master' into updated-featurizers
tabergma Dec 12, 2019
832755e
update docs around convert featurizer
tabergma Dec 12, 2019
4e4cef6
add description to public methods in embedding intent classifier
tabergma Dec 12, 2019
bb231b1
update train utils
tabergma Dec 12, 2019
aa3bf9d
update changelog entry
tabergma Dec 12, 2019
1125e11
Update nlu component documentation.
tabergma Dec 12, 2019
9628eb2
fix spelling mistakes
tabergma Dec 12, 2019
56e7f86
Merge branch 'master' into updated-featurizers
tabergma Dec 12, 2019
e4529c0
refactoring count vectors featurizer
tabergma Dec 12, 2019
a366b77
compute default intent features as dense features
tabergma Dec 12, 2019
47095d1
use different dense dim default value for intents
tabergma Dec 12, 2019
b8b4bec
Merge branch 'master' into updated-featurizers
tabergma Dec 12, 2019
cd58a51
Merge branch 'master' into updated-featurizers
tabergma Dec 16, 2019
2df3b36
update model version
tabergma Dec 16, 2019
ec2cb58
update changelog
tabergma Dec 16, 2019
2f148f3
increase version to 1.6.0a2
tabergma Dec 16, 2019
8ba153a
update documentation
tabergma Dec 16, 2019
a79916c
review comments
tabergma Dec 16, 2019
5ef7b80
Update rasa/nlu/featurizers/sparse_featurizer/ngram_featurizer.py
tabergma Dec 16, 2019
bb44fd6
add missing types
tabergma Dec 16, 2019
3032fc4
Update rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurize…
tabergma Dec 16, 2019
e1eade1
Update rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurize…
tabergma Dec 16, 2019
ad30827
fix types
tabergma Dec 16, 2019
b83ee6f
Merge branch 'master' into updated-featurizers
tabergma Dec 16, 2019
2253200
Merge branch 'master' into updated-featurizers
tabergma Dec 17, 2019
15 changes: 15 additions & 0 deletions changelog/4935.feature.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
Preparation for an upcoming change in the ``EmbeddingIntentClassifier``:

Add option ``use_cls_token`` to all tokenizers. If it is set to ``True``, the token ``__CLS__`` will be added to
the end of the list of tokens. Default is set to ``False``. No need to change the default value for now.

Add option ``return_sequence`` to all featurizers. By default all featurizers return a matrix of size
(1 x feature-dimension). If the option ``return_sequence`` is set to ``True``, the corresponding featurizer will return
a matrix of size (token-length x feature-dimension). See https://rasa.com/docs/rasa/nlu/components/#featurizers.
Default value is set to ``False``. However, you might want to set it to ``True`` if you want to use custom features
in the ``CRFEntityExtractor``.
See https://rasa.com/docs/rasa/nlu/entity-extraction/#passing-custom-features-to-crfentityextractor.

.. warning::

These changes break model compatibility. You will need to retrain your old models!
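For illustration, the two new options could be enabled in a pipeline like this (a minimal sketch; values are examples only, both options default to ``False``):

```yaml
pipeline:
  - name: "WhitespaceTokenizer"
    use_cls_token: true        # append __CLS__ to the end of the token list
  - name: "CountVectorsFeaturizer"
    return_sequence: true      # (token-length x feature-dimension) instead of (1 x feature-dimension)
```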
12 changes: 12 additions & 0 deletions changelog/4935.removal.rst
@@ -0,0 +1,12 @@
Removed ``ner_features`` as a feature name from ``CRFEntityExtractor``; use ``text_dense_features`` instead. If
``text_dense_features`` are present in the feature set, ``CRFEntityExtractor`` will automatically make use of them.

The following settings match the previous ``NGramFeaturizer``:

.. code-block:: yaml

- name: 'CountVectorsFeaturizer'
analyzer: 'char_wb'
min_ngram: 3
max_ngram: 17
max_features: 10
min_df: 5
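The ``char_wb`` analyzer in the settings above comes from scikit-learn's ``CountVectorizer``, which the ``CountVectorsFeaturizer`` wraps: character n-grams are extracted only inside word boundaries, with each word padded by a space on both sides. A pure-Python sketch of that behavior for a single n (the function name is illustrative, not part of Rasa, and edge cases for words shorter than n are simplified):

```python
def char_wb_ngrams(text, n):
    """Character n-grams per word, padded with spaces so that
    n-grams never span word boundaries (sklearn's 'char_wb' idea)."""
    grams = []
    for word in text.split():
        padded = f" {word} "
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

print(char_wb_ngrams("hi you", 3))  # [' hi', 'hi ', ' yo', 'you', 'ou ']
```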
5 changes: 5 additions & 0 deletions changelog/4957.removal.rst
@@ -0,0 +1,5 @@
To use custom features in the ``CRFEntityExtractor`` use ``text_dense_features`` instead of ``ner_features``. If
``text_dense_features`` are present in the feature set, the ``CRFEntityExtractor`` will automatically make use of
them. Just make sure to add a dense featurizer in front of the ``CRFEntityExtractor`` in your pipeline and set the
flag ``return_sequence`` to ``True`` for that featurizer.
See https://rasa.com/docs/rasa/nlu/entity-extraction/#passing-custom-features-to-crfentityextractor.
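The shape requirement described above — one dense feature vector per token — can be pictured as follows (a hypothetical sketch, not the actual ``CRFEntityExtractor`` code; all names are illustrative):

```python
import numpy as np

def dense_features_usable(tokens, text_dense_features):
    """Custom dense features are only usable when there is exactly one
    feature vector per token; otherwise the extractor warns and trains
    without the additional custom features."""
    if text_dense_features is None:
        return False
    if len(text_dense_features) != len(tokens):
        return False  # this is the case the warning in the docs refers to
    return all(np.asarray(vec).ndim == 1 for vec in text_dense_features)

tokens = ["book", "a", "table"]
features = np.random.rand(len(tokens), 300)  # e.g. one 300-d word vector per token
print(dense_features_usable(tokens, features))  # True
```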
@@ -1,10 +1,10 @@
:desc: Find out how to apply machine learning algorithms to conversational AI
using vector representations of conversations with Rasa.

.. _featurization:
.. _featurization_conversations:

Featurization
==============
Featurization of Conversations
==============================

.. edit-link::

6 changes: 3 additions & 3 deletions docs/core/policies.rst
@@ -70,7 +70,7 @@ in the policy configuration yaml file.

Only the ``MaxHistoryTrackerFeaturizer`` uses a max history,
whereas the ``FullDialogueTrackerFeaturizer`` always looks at
the full conversation history. See :ref:`featurization` for details.
the full conversation history. See :ref:`featurization_conversations` for details.

As an example, let's say you have an ``out_of_scope`` intent which
describes off-topic user messages. If your bot sees this intent multiple
@@ -218,7 +218,7 @@ following steps:

It is recommended to use
``state_featurizer=LabelTokenizerSingleStateFeaturizer(...)``
(see :ref:`featurization` for details).
(see :ref:`featurization_conversations` for details).

**Configuration:**

@@ -308,7 +308,7 @@ It is recommended to use
Default ``max_history`` for this policy is ``None`` which means it'll use
the ``FullDialogueTrackerFeaturizer``. We recommend to set ``max_history`` to
some finite value in order to use ``MaxHistoryTrackerFeaturizer``
for **faster training**. See :ref:`featurization` for details.
for **faster training**. See :ref:`featurization_conversations` for details.
We recommend to increase ``batch_size`` for ``MaxHistoryTrackerFeaturizer``
(e.g. ``"batch_size": [32, 64]``)

2 changes: 1 addition & 1 deletion docs/index.rst
@@ -91,7 +91,7 @@ Understand messages, hold conversations, and connect to messaging channels and A
api/event-brokers
api/lock-stores
api/training-data-importers
api/featurization
api/core-featurization
migration-guide
changelog

2 changes: 1 addition & 1 deletion docs/migration-guide.rst
@@ -37,7 +37,7 @@ General
- Default ``max_history`` for ``EmbeddingPolicy`` is ``None`` which means it'll use
the ``FullDialogueTrackerFeaturizer``. We recommend to set ``max_history`` to
some finite value in order to use ``MaxHistoryTrackerFeaturizer``
for **faster training**. See :ref:`featurization` for details.
for **faster training**. See :ref:`featurization_conversations` for details.
We recommend to increase ``batch_size`` for ``MaxHistoryTrackerFeaturizer``
(e.g. ``"batch_size": [32, 64]``)
- **Compare** mode of ``rasa train core`` allows the whole core config comparison.
156 changes: 91 additions & 65 deletions docs/nlu/components.rst


15 changes: 11 additions & 4 deletions docs/nlu/entity-extraction.rst
@@ -151,10 +151,17 @@ If you just want to match regular expressions exactly, you can do this in your c
as a postprocessing step after receiving the response from Rasa NLU.


.. _entity-extraction-custom-features:

Passing Custom Features to ``CRFEntityExtractor``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you want to pass custom features to ``CRFEntityExtractor``, you can create a ``Featurizer`` that provides ``ner_features``.
If you do, ``ner_features`` should be an iterable of ``len(tokens)``, where each entry is a vector.
If ``CRFEntityExtractor`` finds ``"ner_features"`` in one of the arrays in ``features`` in the config, it will pass the ``ner_features`` vectors to ``sklearn_crfsuite``.
The simplest example of this is to pass word vectors as features, which you can do using :ref:``SpacyFeaturizer``.
If you want to pass custom features, such as pre-trained word embeddings, to ``CRFEntityExtractor``, you can
add any dense featurizer (except ``ConveRTFeaturizer``) to the pipeline before the ``CRFEntityExtractor``.
Make sure to set ``"return_sequence"`` to ``True`` for the corresponding dense featurizer.
``CRFEntityExtractor`` automatically finds the additional dense features and checks if the dense features are an
iterable of ``len(tokens)``, where each entry is a vector.
A warning will be shown in case the check fails.
However, ``CRFEntityExtractor`` will continue to train just without the additional custom features.
In case dense features are present, ``CRFEntityExtractor`` will pass the dense features to ``sklearn_crfsuite``
and use them for training.
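Putting the steps above together, a pipeline that passes word vectors to the CRF might look like this (a sketch; the CRF ``features`` list is abridged):

```yaml
pipeline:
  - name: "SpacyTokenizer"
  - name: "SpacyFeaturizer"
    return_sequence: true      # one dense vector per token, not one per message
  - name: "CRFEntityExtractor"
    features: [["low"], ["low", "text_dense_features"], ["low"]]
```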
31 changes: 16 additions & 15 deletions examples/restaurantbot/config.yml
@@ -6,23 +6,24 @@ pipeline:
- name: "SpacyFeaturizer"
- name: "SklearnIntentClassifier"
- name: "CRFEntityExtractor"
features: [ ["low", "title", "upper"],
features: [
["low", "title", "upper"],
[
"bias",
"low",
"prefix5",
"prefix2",
"suffix5",
"suffix3",
"suffix2",
"upper",
"title",
"digit",
"pattern",
"ner_features",
"bias",
"low",
"prefix5",
"prefix2",
"suffix5",
"suffix3",
"suffix2",
"upper",
"title",
"digit",
"pattern",
"text_dense_features"
],
["low", "title", "upper"]]

["low", "title", "upper"],
]
- name: "EntitySynonymMapper"

policies:
2 changes: 1 addition & 1 deletion rasa/constants.py
@@ -33,7 +33,7 @@
CONFIG_MANDATORY_KEYS_NLU = ["language", "pipeline"]
CONFIG_MANDATORY_KEYS = CONFIG_MANDATORY_KEYS_CORE + CONFIG_MANDATORY_KEYS_NLU

MINIMUM_COMPATIBLE_VERSION = "1.3.0a2"
MINIMUM_COMPATIBLE_VERSION = "1.6.0a2"

GLOBAL_USER_CONFIG_PATH = os.path.expanduser("~/.config/rasa/global.yml")

4 changes: 2 additions & 2 deletions rasa/core/actions/action.py
@@ -19,7 +19,7 @@
from rasa.nlu.constants import (
DEFAULT_OPEN_UTTERANCE_TYPE,
OPEN_UTTERANCE_PREDICTION_KEY,
MESSAGE_SELECTOR_PROPERTY_NAME,
RESPONSE_SELECTOR_PROPERTY_NAME,
)

from rasa.core.events import (
@@ -201,7 +201,7 @@ async def run(
"""Query the appropriate response and create a bot utterance with that."""

response_selector_properties = tracker.latest_message.parse_data[
MESSAGE_SELECTOR_PROPERTY_NAME
RESPONSE_SELECTOR_PROPERTY_NAME
]

if self.intent_name_from_action() in response_selector_properties:
51 changes: 31 additions & 20 deletions rasa/core/policies/embedding_policy.py
@@ -252,25 +252,25 @@ def _label_features_for_Y(self, label_ids: "np.ndarray") -> "np.ndarray":
# noinspection PyPep8Naming
def _create_session_data(
self, data_X: "np.ndarray", data_Y: Optional["np.ndarray"] = None
) -> "train_utils.SessionData":
"""Combine all tf session related data into a named tuple"""

) -> "train_utils.SessionDataType":
"""Combine all tf session related data into dict."""
if data_Y is not None:
# training time
label_ids = self._label_ids_for_Y(data_Y)
Y = self._label_features_for_Y(label_ids)

# idea taken from sklearn's stratify split
if label_ids.ndim == 2:
# for multi-label y, map each distinct row to a string repr
# using join because str(row) uses an ellipsis if len(row) > 1000
label_ids = np.array([" ".join(row.astype("str")) for row in label_ids])
# explicitly add last dimension to label_ids
# to track correctly dynamic sequences
label_ids = np.expand_dims(label_ids, -1)
else:
# prediction time
label_ids = None
Y = None

return train_utils.SessionData(X=data_X, Y=Y, label_ids=label_ids)
return {
"dialogue_features": [data_X],
"bot_features": [Y],
"action_ids": [label_ids],
}

def _create_tf_bot_embed(self, b_in: "tf.Tensor") -> "tf.Tensor":
"""Create embedding bot vector."""
@@ -331,9 +331,9 @@ def _create_tf_dial(self, a_in) -> Tuple["tf.Tensor", "tf.Tensor"]:

def _build_tf_train_graph(self) -> Tuple["tf.Tensor", "tf.Tensor"]:
"""Build train graph using iterator."""
# iterator returns a_in, b_in, action_ids
self.a_in, self.b_in, _ = self._iterator.get_next()

# session data are int counts but we need float tensors
self.a_in, self.b_in = self._iterator.get_next()
if isinstance(self.featurizer, MaxHistoryTrackerFeaturizer):
# add time dimension if max history featurizer is used
self.b_in = self.b_in[:, tf.newaxis, :]
@@ -364,23 +364,25 @@ def _build_tf_train_graph(self) -> Tuple["tf.Tensor", "tf.Tensor"]:
)

# prepare for prediction
def _create_tf_placeholders(self, session_data: "train_utils.SessionData") -> None:
def _create_tf_placeholders(
self, session_data: "train_utils.SessionDataType"
) -> None:
"""Create placeholders for prediction."""

dialogue_len = None # use dynamic time
self.a_in = tf.placeholder(
dtype=tf.float32,
shape=(None, dialogue_len, session_data.X.shape[-1]),
shape=(None, dialogue_len, session_data["dialogue_features"][0].shape[-1]),
name="a",
)
self.b_in = tf.placeholder(
dtype=tf.float32,
shape=(None, dialogue_len, None, session_data.Y.shape[-1]),
shape=(None, dialogue_len, None, session_data["bot_features"][0].shape[-1]),
name="b",
)

def _build_tf_pred_graph(
self, session_data: "train_utils.SessionData"
self, session_data: "train_utils.SessionDataType"
) -> "tf.Tensor":
"""Rebuild tf graph for prediction."""

@@ -440,7 +442,10 @@ def train(

if self.evaluate_on_num_examples:
session_data, eval_session_data = train_utils.train_val_split(
session_data, self.evaluate_on_num_examples, self.random_seed
session_data,
self.evaluate_on_num_examples,
self.random_seed,
label_key="action_ids",
)
else:
eval_session_data = None
@@ -458,7 +463,11 @@
train_init_op,
eval_init_op,
) = train_utils.create_iterator_init_datasets(
session_data, eval_session_data, batch_size_in, self.batch_strategy
session_data,
eval_session_data,
batch_size_in,
self.batch_strategy,
label_key="action_ids",
)

self._is_training = tf.placeholder_with_default(False, shape=())
@@ -512,7 +521,9 @@ def continue_training(
session_data = self._create_session_data(
training_data.X, training_data.y
)
train_dataset = train_utils.create_tf_dataset(session_data, batch_size)
train_dataset = train_utils.create_tf_dataset(
session_data, batch_size, label_key="action_ids"
)
train_init_op = self._iterator.make_initializer(train_dataset)
self.session.run(train_init_op)

@@ -535,7 +546,7 @@ def tf_feed_dict_for_prediction(
data_X = self.featurizer.create_X([tracker], domain)
session_data = self._create_session_data(data_X)

return {self.a_in: session_data.X}
return {self.a_in: session_data["dialogue_features"][0]}

def predict_action_probabilities(
self, tracker: "DialogueStateTracker", domain: "Domain"
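The embedding-policy diff above replaces the ``SessionData`` named tuple with a plain dict mapping feature names to lists of numpy arrays (``SessionDataType``), and explicitly adds a last dimension to ``label_ids`` so dynamic sequences are tracked correctly. A minimal sketch of the resulting structure (key names follow the diff; shapes are illustrative):

```python
import numpy as np

# dialogue features: (batch, dialogue_len, feature_dim)
data_X = np.random.rand(2, 5, 10)

label_ids = np.array([3, 7])
# explicitly add the last dimension, as in the diff
label_ids = np.expand_dims(label_ids, -1)

session_data = {
    "dialogue_features": [data_X],
    "action_ids": [label_ids],
}
print(label_ids.shape)       # (2, 1)
print(sorted(session_data))  # ['action_ids', 'dialogue_features']
```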