-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix data split cli #6604
Fix data split cli #6604
Conversation
nlg_test_data.yml test_data.json | ||
nlg_training_data.yml training_data.json | ||
nlg_test_data.yml test_data.yml | ||
nlg_training_data.yml training_data.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, since the response selector training data format changed, i wasn't actually sure that the nlg stuff is still relevant. but @dakshvar22 PR isn't merged yet, so it may be up to him!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait so responses:
separately and responses:
under domain function differently? that seems error prone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll wait for sasha to answer that since he has been working on it.
c8b90ad
to
311b97f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for taking this over 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for taking this over! 🚀
I've tried this:
rasa data split nlu -u ./examples/responseselectorbot/data --out ~/Desktop/split
And it created 4 files with the same content. Then I've tried the same with a moodbot
, and it gave me 4 empty files :(
Also one file is named nlg_training_data..yml
, with two dots in the middle.
So I don't think that existing test is complete until it verifies the file's content.
@@ -19,6 +20,7 @@ | |||
|
|||
logger = logging.getLogger(__name__) | |||
|
|||
DEFAULT_FILE_FORMAT = "yml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we re-use YAML_FILE_EXTENSIONS
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well it's a set()
, so I don't know if we want to define the default file format there or not :/
587d787
to
86bde37
Compare
@@ -311,10 +311,10 @@ def persist_nlu(self, filename: Text = DEFAULT_TRAINING_DATA_OUTPUT_PATH) -> Non | |||
elif rasa.shared.data.is_likely_markdown_file(filename): | |||
rasa.shared.utils.io.write_text_file(self.nlu_as_markdown(), filename) | |||
elif rasa.shared.data.is_likely_yaml_file(filename): | |||
rasa.shared.utils.io.write_text_file(self.nlg_as_yaml(), filename) | |||
rasa.shared.utils.io.write_text_file(self.nlu_as_yaml(), filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the bad guy
# Add nlg_ as prefix and change extension to .md | ||
filename = ( | ||
Path(nlu_filename) | ||
.with_name("nlg_" + Path(nlu_filename).name) | ||
.with_suffix("." + extension) | ||
.with_suffix(extension) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is another bad guy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing that!
Existing tests didn't catch that, so I still think we need to extend them to be able to verify the content of the generated files.
@degiz have you seen the tests I wrote here? https://github.com/RasaHQ/rasa/pull/6604/files#diff-b30255e9158f657c1e469f01310a4106R23-R41 |
@m-vdb I have! I just think that it won't catch a situation when all four generated files have the same content. |
🤔 hum OK. I wasn't able to reproduce this one. I am not sure I'll be able to finish it before leaving. Can you take it over? |
3451cac
to
40155c7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed failing tests and accidentally fixed a bug with --persist-nlu-data
flag for rasa train
Proposed changes:
rasa data split nlu
no longer works #6588test_data_split_nlu
after YAML writer is ready #6363Status (please check what you already did):
black
(please check Readme for instructions)