EPath: Implement forking directly and fix serialization #62

Open
wants to merge 3 commits into
base: develop

Collaborator

@voegtlel voegtlel commented Feb 7, 2025

  • Remove the need to call EPath.prepare_forked_process after forking
  • Fix serialization of EPath so that it includes only the path, not the fs

Closes #48
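
As a rough illustration of the second bullet (not the code in this PR), path-only serialization could look like the sketch below. `FakeFS` and `PathOnly` are hypothetical stand-ins invented for this example; the real EPath internals may differ.

```python
import pickle


class FakeFS:
    """Hypothetical stand-in for a filesystem client that must not be pickled."""

    def __init__(self, profile: str) -> None:
        self.profile = profile

    def __reduce__(self):
        # Make any accidental attempt to pickle the fs fail loudly.
        raise TypeError("FakeFS must not be pickled")


class PathOnly:
    """Hypothetical path wrapper: pickles its path string, rebuilds the fs lazily."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._fs = None  # created on first use

    @property
    def fs(self) -> FakeFS:
        if self._fs is None:
            self._fs = FakeFS(profile="default")
        return self._fs

    def __getstate__(self) -> dict:
        # Only the path goes into the pickle; the fs handle is dropped.
        return {"path": self.path}

    def __setstate__(self, state: dict) -> None:
        self.path = state["path"]
        self._fs = None  # re-created lazily in the receiving process


p = PathOnly("bucket/data/shard_0.tar")
_ = p.fs  # force creation of the fs handle before pickling
restored = pickle.loads(pickle.dumps(p))
```

The restored object carries the path and creates a fresh fs handle on first access, so nothing process-specific travels through the pickle.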

…l functions. Also fix serialization of epath
    s3_args: Optional[Dict[str, Any]]

    def __init__(
        self,
        initial_path: Union[str, "EPath", PathlibPath],
        config_override: Optional[Dict[str, ConfigEntry]] = None,
        *,
        _test_config_override: Optional[Dict[str, ConfigEntry]] = None,

Collaborator

I suggest renaming this to config_override, since it is not only for tests: the config can also be provided by the user directly.

Collaborator Author

I would make that unsupported, because it would mean we serialize the config along with every EPath. I think every override should be made in the global config. Also, we'll transition to msc anyway.

Collaborator

But even for msc, I think it makes sense to have the option to provide a config live if you don't want to have the common file on disk. The config should then only be serialized if it was overridden.
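
A minimal sketch of "serialize the config only if it was overridden", assuming a process-wide default config. `GLOBAL_CONFIG` and `ConfiguredPath` are hypothetical names for this example and are not part of energon or msc.

```python
import pickle
from typing import Any, Dict, Optional

# Hypothetical process-wide config, standing in for the common file on disk.
GLOBAL_CONFIG: Dict[str, Any] = {"endpoint": "https://global.example"}


class ConfiguredPath:
    """Hypothetical path object with an optional per-object config override."""

    def __init__(
        self, path: str, config_override: Optional[Dict[str, Any]] = None
    ) -> None:
        self.path = path
        self.config_override = config_override

    @property
    def config(self) -> Dict[str, Any]:
        # Fall back to the global config unless the user supplied one.
        if self.config_override is None:
            return GLOBAL_CONFIG
        return self.config_override

    def __getstate__(self) -> Dict[str, Any]:
        state: Dict[str, Any] = {"path": self.path}
        # The config is written into the pickle only when it was overridden.
        if self.config_override is not None:
            state["config_override"] = self.config_override
        return state

    def __setstate__(self, state: Dict[str, Any]) -> None:
        self.path = state["path"]
        self.config_override = state.get("config_override")


plain = ConfiguredPath("bucket/a")
custom = ConfiguredPath("bucket/b", {"endpoint": "https://custom.example"})
plain_copy = pickle.loads(pickle.dumps(plain))
custom_copy = pickle.loads(pickle.dumps(custom))
```

With this shape, paths that rely on the global config stay small when pickled, and only paths created with an explicit override pay the extra serialization cost.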

src/megatron/energon/epathlib/epath.py
else:
    ignore_obj = None
EPath._after_fork_global()
for object in gc.get_objects():

Collaborator

Wow, I'm not sure if I like this loop. We didn't need the GC before. I prefer the fork guard that we had before, which is called on demand.
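
For readers unfamiliar with the pattern, an on-demand fork guard can be sketched roughly as follows. This is an assumption about the general technique (comparing the stored PID on each access), not the previous energon implementation; `GuardedClient` and its methods are hypothetical.

```python
import os


class GuardedClient:
    """Hypothetical holder of a per-process resource, re-created lazily after a fork."""

    def __init__(self) -> None:
        self._owner_pid = os.getpid()
        self._handle = self._connect()
        self.reconnects = 0

    def _connect(self) -> str:
        # Stand-in for opening a real connection or filesystem handle.
        return f"connection-from-pid-{os.getpid()}"

    def _fork_guard(self) -> None:
        # Called on demand: if the PID changed, we are in a forked child
        # and the inherited handle must be replaced.
        pid = os.getpid()
        if pid != self._owner_pid:
            self._handle = self._connect()
            self._owner_pid = pid
            self.reconnects += 1

    def read(self) -> str:
        self._fork_guard()
        return self._handle
```

The cost is one `os.getpid()` comparison per guarded call, and every method that touches the handle has to remember to call the guard.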

Collaborator Author

It's something you forget quickly. If a user parallelizes, for example, some method we offer, it will break. This is automatic, so there is nothing to think about.
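
The automatic variant discussed here can be sketched, under assumptions, with `os.register_at_fork` and a scan over `gc.get_objects()`. `ForkAwareHandle` and `reset_all_after_fork` are hypothetical names; the actual code in this PR may be structured differently.

```python
import gc
import os


class ForkAwareHandle:
    """Hypothetical object holding state that is invalid in a forked child."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.connection = f"open:{name}"

    def _after_fork(self) -> None:
        # Drop state inherited from the parent; it would be rebuilt on next use.
        self.connection = None


def reset_all_after_fork() -> None:
    # Walk every object tracked by the garbage collector and reset the
    # matching instances. type(obj) is used instead of isinstance() to
    # avoid triggering __class__ lookups on proxy objects.
    for obj in gc.get_objects():
        if issubclass(type(obj), ForkAwareHandle):
            obj._after_fork()


# os.register_at_fork is not available on every platform (e.g. Windows).
if hasattr(os, "register_at_fork"):
    os.register_at_fork(after_in_child=reset_all_after_fork)
```

The hook runs once in each forked child without any call from user code, at the price of a full scan of GC-tracked objects per fork.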

Collaborator Author

(and we also do that for the torch tensors)

Collaborator

The loop can be reverted without the user having to remember anything, right? This is about fork_guard(), not about prepare_forked_process().

Development

Successfully merging this pull request may close these issues.

Improve serialization of EPath objects
2 participants