Update Dockerfile for label data dir #200

Closed
pdmack wants to merge 1 commit

@pdmack pdmack commented Jun 27, 2022

Make sure we keep the label and column data directory for the existing pipelines.

@pdmack pdmack requested a review from a team as a code owner June 27, 2022 16:12
@pdmack pdmack added the non-breaking (Non-breaking change), improvement (Improvement to existing functionality), and 3 - Ready for Review labels Jun 27, 2022
@mdemoret-nv mdemoret-nv left a comment

The data path is now installed as part of the python/conda package. Should this PR be about documentation/scripting updates instead?

@@ -146,6 +146,7 @@ COPY "./docker" "./docker"
COPY "./docs" "./docs"
COPY "./examples" "./examples"
COPY "./models" "./models"
COPY "./morpheus/data" "./morpheus/data"

Contributor

This directory gets installed now with the morpheus python package. If you run:

$ python -c "import morpheus;print(morpheus.DATA_DIR)"
$HOME/Repos/morpheus/morpheus-dev/morpheus/data

Is this path used by scripts anywhere?

Contributor Author

Let me double check things once we have a working release build. I'll validate that our external instructions are correct.

Contributor

Release build should be working as of #164

Contributor Author

So this may be a doc issue, but we now store the pipeline label/column files in the site-packages directory, which is not necessarily intuitive if a user wants to locate and examine those files.

Sure, the idea is that they can provide whatever path they want, and the doc help tells them where these default files are located. As of 22.04 we document a local reference in the QSG and on the internal platforms, which may need to be updated.

This is kind of a UX question I suppose.
@BartleyR

# python -c "import morpheus;print(morpheus.DATA_DIR)"
/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/data
(morpheus) root@12cd8b30e179:/opt/conda/envs/morpheus# ls /opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/data
bert-base-cased-hash.txt  bert-base-uncased-hash.txt  columns_ae.txt  columns_fil.txt  labels_ae.txt  labels_nlp.txt  labels_phishing.txt
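
For a user who wants to locate and inspect those files without reading the help text, a minimal sketch along these lines may help. It assumes the data files live in a `data` subdirectory of the installed package, as the listing above suggests. `package_data_dir` and `list_data_files` are hypothetical helper names, not part of Morpheus; `morpheus.DATA_DIR` remains the direct way to get the path.

```python
import importlib.util
import os


def package_data_dir(package_name, subdir="data"):
    """Return <package install dir>/<subdir>, or None if unavailable.

    Hypothetical helper: only computes the path, it does not check
    that the subdirectory actually exists.
    """
    spec = importlib.util.find_spec(package_name)
    # find_spec returns None for a missing top-level package, and plain
    # modules (non-packages) have no submodule search locations.
    if spec is None or not spec.submodule_search_locations:
        return None
    package_root = list(spec.submodule_search_locations)[0]
    return os.path.join(package_root, subdir)


def list_data_files(data_dir):
    """Return the sorted file names in data_dir, or [] if it is not a directory."""
    if data_dir is None or not os.path.isdir(data_dir):
        return []
    return sorted(os.listdir(data_dir))
```

Inside the release container, `list_data_files(package_data_dir("morpheus"))` would be expected to show the label and column files listed above, assuming the package layout has not changed.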

ghost pushed a commit that referenced this pull request Jun 29, 2022
In PR #130 we moved the `data` directory into `morpheus/data` and installed it with the python package. This required changing some of the default CLI arguments from relative paths like `data/labels_nlp.txt` to absolute paths like `morpheus.DATA_DIR/labels_nlp.txt`.

To make it easy for the user to see how to change the labels file, we respecified the default argument value in documentation (i.e. `--labels_file=data/labels_nlp.txt`). Now that this needs to be an absolute path, the command in the documentation does not work. Adding absolute paths in the documentation is not feasible since this would require very long paths that would change from machine to machine.

Instead, if the user specifies a data file with a relative path, we first check whether a file exists relative to the current working directory. If it doesn't exist, we then check for a file relative to the current Morpheus install. This allows commands from the documentation, like `morpheus run pipeline-nlp --labels_file=data/labels_nlp.txt`, to find the correct path. We only choose the fallback value when no other file is found.
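
The lookup order described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in #232: `resolve_data_path` is a hypothetical name, `install_dir` stands in for the directory that contains the packaged `data` folder, and returning the path unchanged when nothing is found is an assumption made here so that the caller can report the missing file.

```python
import os


def resolve_data_path(path, install_dir, cwd=None):
    """Resolve a data file path, preferring the current working directory.

    Hypothetical helper illustrating the fallback order:
      1. absolute paths are returned as-is
      2. a file relative to the working directory wins if it exists
      3. otherwise a file relative to the install directory is used
      4. if neither exists, the original path is returned unchanged
         (assumption: the caller reports the missing file)
    """
    if os.path.isabs(path):
        return path

    cwd = os.getcwd() if cwd is None else cwd

    local_candidate = os.path.join(cwd, path)
    if os.path.exists(local_candidate):
        return local_candidate

    install_candidate = os.path.join(install_dir, path)
    if os.path.exists(install_candidate):
        return install_candidate

    return path
```

With `install_dir` set to the parent of `morpheus.DATA_DIR`, a documented argument such as `data/labels_nlp.txt` would resolve to the installed copy unless the user has a file of the same name under their working directory.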

Related to PR #200

Authors:
  - Michael Demoret (https://github.com/mdemoret-nv)

Approvers:
  - David Gardner (https://github.com/dagardner-nv)

URL: #232
@mdemoret-nv
Closing this in favor of #232.

Labels
improvement (Improvement to existing functionality), non-breaking (Non-breaking change)