Thanks for the issue. We're working to come up with a good solution for S3 checkpointing and distributed checkpointing. Will update this issue when we have more information we can share.
-
Hi, I'm working with the latest NeMo Framework container (nvcr.io/nvidia/nemo:dev) and I'd like to use distributed S3 checkpoints with an internal S3-compatible storage system.
I'm following the guidance here: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/common/s3_checkpointing.html
I don't see any way to specify an endpoint_url, CA certificate, etc. There are also some glaring omissions from the NeMo container, like `tenacity`, which an error message reports as a missing dependency. I also found some guidance here about S3 support, but it seems to be outdated: #7832.
Can you please provide guidance on how to make this work?