Split data dir, moving large files into examples/data #130

Merged: 43 commits, Jun 7, 2022



@dagardner-nv commented May 27, 2022

Brings the `data` directory size down from 270MB to 1.6MB.

  • Move large files to examples/data
  • Migrate existing files in examples/data to git-lfs
  • Add the missing email_with_addresses.jsonlines file needed for the phishing detection developer guide
  • Update file paths for data files in the docs
  • Update import paths in the developer guide

Depends on changes in #62
Fixes #120

dagardner-nv and others added 30 commits April 27, 2022 11:11
…e build, since we only generate them when doing an inplace build
…unclear on this, but the internet seems to imply that either include_package_data+MANIFEST.in or package_data should be used but not both
Co-authored-by: Christopher Harris <xixonia@gmail.com>

@mdemoret-nv left a comment


Everything looks good. I'm not sure what the with_data_len.json and without_data_len.json files are used for. Can you verify that they aren't used anywhere and remove them? Otherwise, looks good.

@dagardner-nv (author) commented

Removed unused with_data_len.json and without_data_len.json files


@mdemoret-nv left a comment


Need to revert the pinned Neo version.

Review thread on docker/conda/environments/cuda11.4_dev.yml (outdated, resolved)

@mdemoret-nv left a comment


Looks good now.

@mdemoret-nv commented

@gpucibot merge

@ghost merged commit b94bd9b into nv-morpheus:branch-22.06 on Jun 7, 2022
@ghost pushed a commit that referenced this pull request on Jun 29, 2022:
In PR #130 we moved the `data` directory into `morpheus/data` and installed it with the python package. This required changing some of the default CLI arguments from relative paths like `data/labels_nlp.txt` to absolute paths like `morpheus.DATA_DIR/labels_nlp.txt`.

To make it easy for users to see how to change the labels file, we restated the default argument value in the documentation (i.e. `--labels_file=data/labels_nlp.txt`). Now that the default must be an absolute path, the command in the documentation no longer works. Putting absolute paths in the documentation is not feasible, since they would be very long and would change from machine to machine.

Instead, if the user specifies a data file with a relative path, we first check whether the file exists relative to the current working directory. If it doesn't, we then check for the file relative to the current Morpheus install. This allows commands from the documentation, such as `morpheus run pipeline-nlp --labels_file=data/labels_nlp.txt`, to find the correct path. The fallback location is used only when no file is found relative to the working directory.
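The lookup order described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code merged in #232: the function name `resolve_data_file` and its `package_dir` parameter are hypothetical, and `package_dir` is assumed to be the installed `morpheus` package directory, so that `data/labels_nlp.txt` resolves to the same file as `morpheus.DATA_DIR/labels_nlp.txt`. Passing absolute paths through unchanged is also an assumption of this sketch.

```python
import os


def resolve_data_file(path: str, package_dir: str) -> str:
    """Hypothetical helper sketching the relative-path fallback.

    `package_dir` is assumed to be the installed package directory that
    contains the `data/` folder (i.e. the parent of morpheus.DATA_DIR).
    """
    # Assumption: absolute paths are taken as-is, with no fallback.
    if os.path.isabs(path):
        return path

    # 1. Prefer a file relative to the current working directory.
    cwd_candidate = os.path.abspath(path)
    if os.path.exists(cwd_candidate):
        return cwd_candidate

    # 2. Fall back to a file relative to the installed package.
    install_candidate = os.path.join(package_dir, path)
    if os.path.exists(install_candidate):
        return install_candidate

    # 3. Nothing found: return the path unchanged so the caller can report
    #    a "file not found" error using the path the user actually typed.
    return path
```

Checking the working directory first means a user's own `data/labels_nlp.txt` always wins over the copy shipped with the package; the installed copy is only a fallback.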

Related to PR #200

Authors:
  - Michael Demoret (https://github.com/mdemoret-nv)

Approvers:
  - David Gardner (https://github.com/dagardner-nv)

URL: #232
@dagardner-nv deleted the david-split-data-dir branch on February 12, 2024 23:20
This pull request was closed.
Labels: doc (Improvements or additions to documentation), enhancement (Additional functionality added to an existing feature), non-breaking (Non-breaking change)
Projects: None yet
Development: successfully merging this pull request may close [FEA] Separate morpheus data dir
3 participants