Update Dockerfile for label data dir #200
Make sure we keep the label, column data dir for the existing pipelines.
The data path is now installed as part of the python/conda package. Does this PR need to be more about documentation/scripting updates?
@@ -146,6 +146,7 @@ COPY "./docker" "./docker"
COPY "./docs" "./docs"
COPY "./examples" "./examples"
COPY "./models" "./models"
COPY "./morpheus/data" "./morpheus/data"
This directory now gets installed with the morpheus python package. If you run:
$ python -c "import morpheus;print(morpheus.DATA_DIR)"
$HOME/Repos/morpheus/morpheus-dev/morpheus/data
Is this path used by scripts anywhere?
Let me double check things once we have a working release build. I'll validate that our external instructions are correct.
Release build should be working as of #164
So this may be a doc issue, but we now store the pipeline label/column files in the site-packages directory, which is not necessarily intuitive if a user wants to locate and examine those files.
Sure, the idea is that they can provide whatever path they want, and the doc help tells them where these default files are located. We do currently (22.04) document a local reference in the QSG and the internal platforms, which may need to be updated.
This is kind of a UX question I suppose.
@BartleyR
# python -c "import morpheus;print(morpheus.DATA_DIR)"
/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/data
(morpheus) root@12cd8b30e179:/opt/conda/envs/morpheus# ls /opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/data
bert-base-cased-hash.txt bert-base-uncased-hash.txt columns_ae.txt columns_fil.txt labels_ae.txt labels_nlp.txt labels_phishing.txt
In PR #130 we moved the `data` directory into `morpheus/data` and installed it with the python package. This required changing some of the default CLI arguments from relative paths like `data/labels_nlp.txt` to absolute paths like `morpheus.DATA_DIR/labels_nlp.txt`.

To make it easy for the user to see how to change the labels file, we respecified the default argument value in the documentation (i.e. `--labels_file=data/labels_nlp.txt`). Now that this needs to be an absolute path, the command in the documentation does not work. Adding absolute paths to the documentation is not feasible, since the paths would be very long and would change from machine to machine.

Instead, if the user specifies a data file with a relative path, we first check whether the file exists relative to the current working directory. If it doesn't exist, we then check for the file relative to the current morpheus install. This allows commands from the documentation like `morpheus run pipeline-nlp --labels_file=data/labels_nlp.txt` to find the correct path. We only choose the fallback value when no other file is found.

Related to PR #200

Authors:
- Michael Demoret (https://github.com/mdemoret-nv)

Approvers:
- David Gardner (https://github.com/dagardner-nv)

URL: #232
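The lookup order described in that commit message can be sketched roughly as follows. This is a minimal illustration, not the code from #232: the function name `resolve_data_path` and its `install_root` and `cwd` parameters are hypothetical, and returning the working-directory path when neither candidate exists is an assumption, since the message does not say what happens in that case.

```python
import os


def resolve_data_path(path, install_root, cwd=None):
    """Hypothetical helper: resolve a data file path with an install fallback.

    `install_root` is assumed to be the directory that contains the installed
    `data` folder (for example, the parent of `morpheus.DATA_DIR`).
    """
    # Absolute paths are taken as given.
    if os.path.isabs(path):
        return path

    # First choice: the file relative to the current working directory.
    cwd = os.getcwd() if cwd is None else cwd
    local = os.path.join(cwd, path)
    if os.path.exists(local):
        return local

    # Fallback: the same relative path under the installed package.
    fallback = os.path.join(install_root, path)
    if os.path.exists(fallback):
        return fallback

    # Assumption: if neither exists, hand back the local candidate so the
    # caller reports a missing file at the location the user asked for.
    return local
```

With a helper like this, `data/labels_nlp.txt` would resolve to a copy in the user's working directory when one exists, and otherwise to the copy installed under site-packages, which matches the behaviour the documentation commands rely on.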
Closing this in favor of #232.