Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ConveRTTokenizer #4984

Closed
wants to merge 20 commits into from
Closed

Add ConveRTTokenizer #4984

wants to merge 20 commits into from

Conversation

tabergma
Copy link
Contributor

@tabergma tabergma commented Dec 17, 2019

Proposed changes:

  • Add ConveRTTokenizer
  • ConveRTFeaturizer can now be used with return_sequence set to True
  • Use ConveRTTokenizer as tokenizer in the pipeline pretrained_embeddings_convert

closes #4978

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@tabergma tabergma requested a review from dakshvar22 December 17, 2019 08:39
dakshvar22
dakshvar22 previously approved these changes Dec 17, 2019
Copy link
Contributor

@dakshvar22 dakshvar22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome implementation! Left a couple of minor comments. I would suggest adding one more test to check if the embedding of CLS token is correctly set to the sentence encoding.

@dakshvar22
Copy link
Contributor

@tabergma Can you also rename rasa/nlu/featurizers/featurzier.py to rasa/nlu/featurizers/featurizer.py :)

@dakshvar22 dakshvar22 self-requested a review December 17, 2019 10:07
@dakshvar22 dakshvar22 dismissed their stale review December 17, 2019 13:48

Discussed. There is a bit more to do.

@tabergma
Copy link
Contributor Author

We need to add more functionality. Will open a new PR once implementation is done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tokenizer for ConveRT
2 participants