Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to pathlib #537

Merged
merged 215 commits into from
Oct 11, 2022
Merged

Migrate to pathlib #537

merged 215 commits into from
Oct 11, 2022

Conversation

Rocamonde
Copy link
Member

@Rocamonde Rocamonde commented Aug 23, 2022

Description

Fixes #531.

Migrate all path manipulation operations to be done under the pathlib package, a safe and robust way of handling paths accross platforms.

  • remove os.path
  • search any objects working as paths that are str and convert them to pathlib.Path
  • search for osp

Design Principles

  • Pass around pathlib.Path objects only (except in some special places) accross internal layers of the API
  • Allow pathlib.Path, str, bytes, or path-like and convert with validation to pathlib.Path at the highest API level (user level)
  • Paths in config dictionaries to remain stored as strings (for now, until someone checks what these things actually do)
  • Force helper functions to receive absolute paths, construct relative paths explicitly using cwd only at the highest level that is viable.
  • Force all helper functions to either create a directory or check if it exists before operating on the path

The reason for enforcing these checks is:

  • [...]: For consistency. If we are going to use Path objects, we should use them everywhere that is viable, unless there is a good reason for it
  • [...]: Forcing a user to pass a pathlib.Path object can be annoying and seems like an unnecessary requirement. We can have a function that normalizes string paths and makes the necessary checks.
  • [...]: Not sure whether those config objects are used in other places that don't accept Path objects, and checking seems complicated, so external help would be valuable
  • [...]: Functions that are not managing the working environment should not be making assumptions about where folders should be stored. It is also easier to debug IO errors if paths are absolute, as one can add a checkpoint and cd into the folder directly.
  • [...] Explicitly checking if a folder exists or creating is self-documenting of the role of the function.

Testing

Current tests should suffice, if the migration is done properly nothing should break. No new features are being added.

  • Add tests where it is viable to check whether helper functions using paths are disallowing relative paths

@Rocamonde
Copy link
Member Author

Rocamonde commented Aug 28, 2022

imitation.scripts.common.demonstrations:guess_expert_dir() is trying to assert that the data dir is absolute, but this is not the case (it's always passed a relative dir). When looking up where this path is constructed and whether the dir is relative to the current working directory, I found something weird.

The dagger object passed below:

@train_imitation_ex.capture
def train_imitation(
_run,
bc_kwargs: Mapping[str, Any],
bc_train_kwargs: Mapping[str, Any],
dagger: Mapping[str, Any],
use_dagger: bool,
agent_path: Optional[str],
) -> Mapping[str, Mapping[str, float]]:
"""Runs DAgger (if `use_dagger`) or BC (otherwise) training.

contains an expert_policy_path key that points to data/expert_models/cartpole_0/policies/final, but this path never exists during the lifecycle of the tests (not relative to the cwd or to any other directory), nor is it used anywhere in the tests. The actual location of the policy is in tests/testadata/. This is injected through the TEST_DATA_PATH when the policy is actually accessed, but this strange path is still getting constructed.

  1. It seems valuable to ensure that paths are absolute wherever possible, as it allows spotting errors like these.
  2. I could force the path to be constructed relative to the cwd somewhere, so that the assertion passes, but this seems strange because I don't actually know what this path would be for or if it would be used in some circumstance/setting.

@Rocamonde
Copy link
Member Author

Something closely related to this is what happens below:

@demonstrations_ingredient.config_hook
def hook(config, command_name, logger):
"""If rollout_path not set explicitly, then guess it based on environment name."""
del command_name, logger
updates: Dict[str, str] = {}
if config["demonstrations"]["rollout_path"] is None:
data_dir = config["demonstrations"]["data_dir"]
env_name = config["common"]["env_name"].replace("/", "_")
updates["rollout_path"] = str(
guess_expert_dir(data_dir, env_name) / "rollouts" / "final.pkl"
)
return updates

A similar path is constructed when config["demonstrations"]["rollout_path"] is None, but this never happens because this is where the correct path gets injected from the TEST_DATA_PATH.

I presume this is some sort of fallback-setting feature that tries to guess where the files are, but cannot find the test corresponding to this feature.

Copy link
Member

@AdamGleave AdamGleave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this change, seems like a significant improvement to our path handling.

Most of my comments are nits. The one substantive one is to actually add some tests for types.parse_path -- shouldn't be too hard.

tests/data/test_types.py Outdated Show resolved Hide resolved
tests/data/test_types.py Show resolved Hide resolved
src/imitation/policies/serialize.py Outdated Show resolved Hide resolved
src/imitation/scripts/analyze.py Outdated Show resolved Hide resolved
src/imitation/scripts/common/common.py Outdated Show resolved Hide resolved
tests/scripts/test_scripts.py Show resolved Hide resolved
tests/scripts/test_scripts.py Show resolved Hide resolved
tests/scripts/test_scripts.py Show resolved Hide resolved
tests/scripts/test_scripts.py Show resolved Hide resolved
tests/scripts/test_scripts.py Show resolved Hide resolved
Base automatically changed from add-mypy to master October 8, 2022 17:30
Copy link
Member

@AdamGleave AdamGleave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some small suggestions

src/imitation/data/types.py Outdated Show resolved Hide resolved
src/imitation/data/types.py Outdated Show resolved Hide resolved
src/imitation/data/types.py Outdated Show resolved Hide resolved
src/imitation/scripts/common/common.py Outdated Show resolved Hide resolved
Rocamonde and others added 3 commits October 11, 2022 14:15
Co-authored-by: Adam Gleave <adam@gleave.me>
Co-authored-by: Adam Gleave <adam@gleave.me>
@Rocamonde Rocamonde merged commit 531fa06 into master Oct 11, 2022
@Rocamonde Rocamonde deleted the migrate-to-pathlib branch October 11, 2022 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch to using pathlib
2 participants