Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove development target, install all packages in postcreatecommand #679

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

pstjohn
Copy link
Collaborator

@pstjohn pstjohn commented Feb 5, 2025

A fairly significant refactor of our release and dev targets in the Dockerfile, and removes the "development" target. (Along with some scripts that reference it, and the old uv.lock file we don't use).

Addresses some of the concerns posted in https://nvidia.slack.com/archives/C074Z808N05/p1738787505094379, specifically the error

../bionemo-noodles/src/bionemo/noodles/__init__.py:16: in <module>
    from bionemo.noodles_fasta_wrapper import (
E   ModuleNotFoundError: No module named 'bionemo.noodles_fasta_wrapper'

That arose from deleting bionemo modules in creating the dev container, but not installing the editable packages with their dependencies.

This changes the "release" target to run as the ubuntu user by default, so that files created by this image don't end up owned as root. It also changes the devcontainer to install + remove our packages by default, but rather just install everything we need in postCreateCommand, which should mimic the release container environment more closely.

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@9cec09f). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #679   +/-   ##
=======================================
  Coverage        ?   86.25%           
=======================================
  Files           ?      119           
  Lines           ?     7208           
  Branches        ?        0           
=======================================
  Hits            ?     6217           
  Misses          ?      991           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Copy link
Collaborator

@skothenhill-nv skothenhill-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deletes interna/scripts. If this is intentional, could you please:

  • update the README with alternative instructions for building images and developer environment.
  • update the PR description to describe the changes to developer workflow (e.g. instead of using "build_dev_image.sh, we prefer... blah"
  • describe the major changes to Dockerfile in a bullet list? I think it would be easier to look at the change and see something like, 1) removed development target, 2) release is built from dev 3)

Or something else if it makes more sense. Ideally both the PR description and the README would make clear both the changes from this script and how we expect workflows to change.

@cspades
Copy link
Member

cspades commented Feb 6, 2025

Discussed with @pstjohn, ideally we can preserve our docker build development workflow, instead of relying on a release container without editable installs, or devcontainers which requires... devcontainers.

Implementation is super flexible (e.g. a third build path in the Dockerfile called oss-dev), but the requirements if we can do this are:

  • To run a development setup with editable installs of all of our sub-packages using a simple docker run --rm -it -v <code> -v <data> -v <results> (or similar, etc.) and build the setup with only docker build --target oss-dev as in build_dev_image.sh.
    • This also provides a clear and simple recipe for our OSS developers to build containers.
  • To either setup the ENV correctly, or mount the relevant configs (.aws for PBSS, .ngc for NGC, .netrc for W&B, .cache for pip cache installs, .ssh for credentials) that allow us to rapidly get up to speed after running the container, which is identical to what devcontainer supports, except with only docker build.
  • Patching NeMo and Megatron should be supported and easy for development / editable containers, and that patch should be editable-installed.
  • ➕ on @skothenhill-nv 's point, documentation for it all!

Part of my (maybe illogical) fear is that devcontainers isn't supported everywhere, and I can imagine a bare-metal, air-gapped node on some private Slurm cluster needs to build their images locally or suffer moving images between different clusters (maybe they end up building the x86 image locally but they need the ARM image on the cluster) when they could just build the container and edit their local code mount on their cluster.

Originally, this PR was meant to fix an issue where bionemo-noodles had some of its installation dependencies deleted during the build: https://github.com/NVIDIA/bionemo-framework/blob/main/Dockerfile#L178

RUN <<EOF
  set -eo pipefail
  rm -rf /usr/local/lib/python3.12/dist-packages/bionemo*
  pip uninstall -y nemo_toolkit megatron_core
EOF

No rush! We have WAR for bionemo-noodles and a working setup, let's get it right and hopefully not worry about this for a while!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants