REP-003: Package and Context Localisation #680

Open
nerdvegas opened this issue Jul 27, 2019 · 6 comments
Labels
REP REPs are Rez enhancement proposals that need in-depth discussion

Comments

@nerdvegas
Contributor

nerdvegas commented Jul 27, 2019

REP-003: Package and Context Localisation

Rez contexts contain references to the variants in their resolve, in the form of variant “handles”. A handle is just a metadata dict that uniquely identifies a variant and defines where it is located. For example, a variant handle typically contains the package name, package version, path to the package repository containing it, and the index of the variant within that package. From its handle, the package definition can be retrieved, which is needed to configure the environment (for example, the commands attribute needs to be evaluated).
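For illustration, a handle might look something like the following dict (the field names here are approximate, not the exact Rez schema):

# an illustrative variant handle - field names are approximate,
# not the exact Rez schema
handle = {
    "repository_type": "filesystem",
    "location": "/svr/packages",
    "name": "foo",
    "version": "1.2.3",
    "variant_index": 0,
}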

When resolved into an environment, a variant will typically reference its installed payload (its “root”) in some way - for example, by adding a “bin” subdirectory to PATH. The installed payload will reside within the package repository directory.

These are two examples of cases where use of a context requires access to external resources: the package definitions (which would generally come from memcached in a production Rez install, but may also come from disk), and the package installation payloads.

There are two different reasons why this may not be desirable:

  • We wish for a context to act in a standalone manner, so it can be run in isolation. For example, we would like a VM or container running a Rez-based micro service to not have to access memcached, nor to require a mount to shared file storage.
  • We wish for package payloads to be locally available, rather than having to be fetched over NFS. This has the potential to significantly improve performance and decrease load on the filer.

Context Localisation

A “localised” or “deep” context would be one that contains entire copies of the package definitions its variants require. This would cause the following differences in behaviour (when compared with standard contexts):

  • The rxt file size would be larger (possibly significantly);
  • Changes to package definitions in their original package repositories would not take effect in the deep context;
  • The deep context would not require fetching of package definitions from memcached, and thus would source faster.

The following features are desirable, in order to give users as much control as possible, and to maintain backwards compatibility:

  • Use of deep contexts would be configurable, but would also be manually overridable in all cases where contexts are created. For example, rez-env would have a new --context-mode option (with “deep” and “shallow” choices), for use in combination with its --output option.
  • It should be possible to load a shallow context and convert it to a deep context, and vice versa (probably via the existing rez-context tool), as sketched below.
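For example, conversion might look like this (the flags shown are hypothetical; rez-context has no such options today):

# hypothetical usage - convert a shallow context to a deep one, and back
]$ rez-context --convert-mode deep foo.rxt --output foo-deep.rxt
]$ rez-context --convert-mode shallow foo-deep.rxt --output foo.rxt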

One thing that also needs to be considered is whether it’s ever desirable to store deep contexts in memcached. This would result in faster loading of cached resolves, but at the cost of far more cache space - each package definition would effectively be stored many times over (once for each deep context it appears in). It may be better not to support this, and to instead wait for a port to redis, which supports multiple key retrieval (via its mget operation).
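For reference, this is what multiple key retrieval looks like with the redis-py client; the key naming scheme shown is purely hypothetical:

import redis

r = redis.Redis(host="localhost", port=6379)

# hypothetical key scheme: one key per package definition in the resolve
keys = ["pkg-def:foo-1.2.3", "pkg-def:bah-12.0.5"]

# a single round trip retrieves every definition at once
definitions = r.mget(keys)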

Package Localisation

The basic idea behind this feature is that, when a context is created, the variant payloads that the context uses are lazily copied into a local disk cache. However, there are a few points to consider:

  • We don’t necessarily want to copy all variant payloads, otherwise a context might take too long to create;
  • Copying package payloads may cause technical problems in some cases;
  • The user may need control over where the package cache resides - it may be too limiting to assume that there is just one cache.
  • How do we clean up old localised packages, so the cache doesn’t grow forever?

These points are now addressed in turn.

Localisation Mode

There should be a mode that determines how localisation behaves. Potential modes are listed below, followed by a sketch of the relevant settings:

  • Full. Localise all packages in the context, regardless of how long it takes;
  • Limit. Localise up to N packages at any one time;
  • Time. Localise packages until a time limit is reached;
  • None. Don’t localise anything.
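Here is what those settings might look like in rezconfig - all setting names are hypothetical, for illustration only:

# hypothetical rezconfig settings - names are illustrative, not implemented
package_localisation_mode = "limit"   # one of: "full", "limit", "time", "none"
package_localisation_limit = 2        # "limit" mode: localise at most N variants per resolve
package_localisation_time_limit = 10  # "time" mode: stop localising after N seconds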

If rez-env uses any of the last three modes, then a context may only have had some of its packages localised. That is ok though - more and more packages will be cumulatively localised with every resolve the user performs, and this should fairly rapidly result in full localisation anyway. If maximum localisation is a priority, then “full” mode can be used, at the cost of context load time (and even then, this should rapidly improve anyway, as more and more packages are localised).

It’s worth noting the distinction between “none” mode localisation, and disabling localisation altogether. None mode would simply make use of any packages already localised; disabling localisation on the other hand would ignore the package cache completely.

Technical Problems

There are some instances where copying package payloads could be problematic:

  • The package is large, and could eat up too much home directory quota;
  • The package payload is not position independent (ie moving it will break it, because there are absolute references to itself within itself, or relative references to something outside of itself);
  • One package is not position independent relative to another (perhaps package A is rpathed to package B - moving B will break A’s symbol resolution).

Similar to the existing “non_relocatable” package attribute, there is also a need for a “non_localisable” attribute. It would make sense for this to default to the value of “non_relocatable”, as the concepts are very similar, and typically a non-relocatable package would also be non-localisable.

Describing the last case is a little more complex, however. It may require another package attribute - one that lists packages that become non-localisable in this package’s presence. For example, if the aforementioned A and B packages appear in a context, then B is not localisable, because of the presence of A. A sketch of these attributes follows.
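Here is how the attributes might look in a package.py - non_localisable is the attribute proposed above, while makes_non_localisable is a hypothetical name for the pairwise attribute:

# package.py for package A - a sketch; makes_non_localisable is a
# hypothetical attribute name

name = "A"
version = "1.0.0"

# proposed attribute; would default to the value of non_relocatable
non_localisable = False

# hypothetical pairwise attribute: A is rpathed to B, so B becomes
# non-localisable whenever A appears in the same context
makes_non_localisable = ["B"]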

Package Cache Location

It would make sense most of the time to use one configured package cache across the board, for any given user. However, there are cases where this doesn’t make sense. For example, to create a standalone context for use in a container, you would want to create a package cache specifically for that container.

Package cache location should be overridable in the Rez API and tools where appropriate. Furthermore, there should be a tool (say, rez-package-cache) that allows for manually wrangling package caches. It would also be useful to be able to associate a context with a specific cache.

Examples of tool use:

# populate a specific cache using a context
]$ rez-package-cache --populate foo.rxt ~/mycache

# list variants in a cache
]$ rez-package-cache --list-variants ~/mycache

# create a context and associate it with a specific cache:
]$ rez-env pkgA pkgB --package-cache ~/mycache --output bah.rxt

# create a copy of a context, associated with a different cache
]$ rez-package-cache --bake src.rxt dest.rxt ~/.othercache

# source a context and use a specific package cache
]$ rez-env --input foo.rxt --package-cache ~/mycache -- echo hello

If a context has a baked cache, perhaps it should fall back to the globally configured cache for any non-cached packages.

Cleaning up Old Cached Packages

There isn’t really a way to be 100% sure that any given cached package is not still in use. Perhaps we could drop lock files into the cache to indicate they’re being used; but a context that unexpectedly aborts would undoubtedly cause these locks to be left behind, and their associated cached packages would never be deleted. Realistically we probably just want to delete based on date of last use, or on a combination of that and maximum disk usage or package count.

To allow for this, the rez-package-cache tool should be able to perform deletion based on the parameters described above. Furthermore, cached package directories should be touched on use so we can reliably say when they were last used.

To trigger the deletion, either a cron could be set up on workstations, or perhaps a configured setting would cause Rez itself to perform the cache deletion (at most once every N configured days, hours etc) each time a context is created or sourced.
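A minimal sketch of such a cleanup, assuming the directory layout described in the next section, and that cached variant directories are touched on use:

import os
import shutil
import time

def clean_cache(root, max_age_days=30):
    # layout assumed: {root}/{name}/{version}/{handle_hash}/
    cutoff = time.time() - (max_age_days * 86400)
    for name in os.listdir(root):
        for version in os.listdir(os.path.join(root, name)):
            version_dir = os.path.join(root, name, version)
            for handle_hash in os.listdir(version_dir):
                variant_dir = os.path.join(version_dir, handle_hash)
                # the dir is touched on use, so mtime reflects last use
                if os.path.getmtime(variant_dir) < cutoff:
                    shutil.rmtree(variant_dir)

clean_cache(os.path.expanduser("~/mycache"))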

Package Cache Structure

The package cache would be structured on disk like so:

{root}/foo/1.2.3/ab58fca5283cfbbbc3ca5680/
{root}/foo/1.2.3/3cfbbbc3ca5680ab58fca528/
{root}/bah/12.0.5/a258fc3ca5680a5283cfbbbc/

Each variant would be stored under a hash of its handle. To help with debugging, the leading directories would be the package name and version respectively. Different hashes within the same package and version represent different variants - either different variants within the same package, or variants from packages with the same name and version, but in a different repository.

When a cached copy of a variant in a context is found, its root is simply changed to that cached variant directory.
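A sketch of the handle-to-path mapping (the hash algorithm and handle fields shown are illustrative):

import hashlib
import json
import os

def cached_variant_path(cache_root, handle):
    # hash the full handle, so that variants with the same name/version
    # but a different repository (or variant index) get distinct dirs
    data = json.dumps(handle, sort_keys=True).encode("utf-8")
    handle_hash = hashlib.sha1(data).hexdigest()
    return os.path.join(cache_root, handle["name"], handle["version"], handle_hash)

handle = {
    "location": "/svr/packages",
    "name": "foo",
    "version": "1.2.3",
    "variant_index": 0,
}
print(cached_variant_path(os.path.expanduser("~/mycache"), handle))
# -> ~/mycache/foo/1.2.3/<sha1 hexdigest of the handle>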

Package Immutability

One point to note is that caching is only useful if we can assume that the contents of a package do not change. Otherwise, the time spent verifying that the contents of a variant match those in the cache would negate the value of caching in the first place.

Fortunately, for the majority case in Rez - packages that have been released - immutability is already a property, and so these can be readily cached. The case where they cannot is local packages, which in practice get re-installed all the time. So, package caching would be disabled for local packages (which makes sense anyway, because local packages typically already reside on the local disk, so there’s no point caching them).

Standalone Contexts

It’s already been mentioned that we would like to be able to create contexts that are completely standalone, for the purposes of running a micro service for example. In this case, we would like to localise both the context and the packages it uses. To do that, it would make sense to bake the package cache into the context, as shown earlier. However, it would also be practical to store this as a relative reference in the context, so that we could copy both context and cache to a service VM or container together, rather than having to construct them in place.

Here’s what that might look like:

]$ mkdir svr
]$ rez-env pkgA pkgB --package-cache svr/pkg_cache --rel-package-cache --context-mode deep --output svr/svr.rxt
]$ rez-package-cache --populate svr/svr.rxt

This would:

  • Create a deep context (package definitions are embedded into the rxt);
  • Bake the relative path to the package cache into the context;
  • Create a package cache (if it didn’t already exist);
  • Cache all the variants used by the context, into the cache.

We would then have an svr/ directory that we could copy to a server and run the server binary from. The only prerequisites would be:

  • The server has a system (platform, os etc) that is compatible with the context that was created;
  • Rez is installed on the server.

Running the service would simply involve:

]$ rez-env --input svr/svr.rxt -- my-server-command
@nerdvegas nerdvegas added the REP REPs are Rez enhancement proposals that need in-depth discussion label Jul 27, 2019
@instinct-vfx
Contributor

I gave this a first thorough read. I am not done thinking through all of it, but I have a first set of remarks. Note that some of these are a bit playing devil's advocate, so take them with a grain of salt.

Package Cache Structure:

  • What is the advantage of a custom cache setup as opposed to "yet another repository, that is local and acts as cache"?

Regarding “either different variants within the same package, or variants from packages with the same name and version, but in a different repository”:

  • I am not sure how this is meant. Would the same package variant from different repositories create different cache entries? That would make packages mutable, no? I was wondering before whether there is a use-case for introducing a package hash (a hash that covers both the metadata in package.py AND the actual file and folder structure), to be able to ensure immutability.

Package Immutability

This brings up a case that I wanted to discuss anyway, and I think it fits fine here. (Happy to fork the discussion elsewhere if you disagree):

While variants are typically immutable, packages are often not (at least here). The main case that is giving me a headache here is packages that I release programmatically from installers. An example would be VRay. VRay ships as installers. These installers are specific to the Max version and the VRay version; hence every installer contains a single variant. I have a tool that I can drop an installer (or nightly release) on, and it will install to a local temp location, gather the files, apply some patches, wrap it all up as a package, and release it.
This works fine because Rez handles the merging of variants just fine. BUT that makes the package.py file mutable. The alternative would mean re-releasing all previous versions whenever a new variant is released. So if we support Max versions 2015-2018 and 2019 is released, I would be creating an additional copy of 4 variants that come at a whopping 1 GB per variant.

How would you handle such a case? It has been giving me a headache in all my planning around syncing packages between offices (= remote repositories), and that case seems to have a LOT of overlap.

Technical Problems:
In addition to the “non_relocatable” and “non_localisable” attributes, what about a “should_localize” or even “must_localize”? We have quite a few packages (e.g. maya or houdini) that cannot run from the server location at all. On a similar note: it would actually be beneficial for us if this very case could be treated as a special case. For these packages, we would like to store the payload zipped, and localization would be “copy local -> extract”, because copying these over the network is a LOT slower than copying the zip.

Great to see this being kicked off, this is a big one for us! 👏

@nerdvegas
Contributor Author

nerdvegas commented Jul 29, 2019 via email

@nerdvegas
Contributor Author

nerdvegas commented Jul 29, 2019 via email

@instinct-vfx
Contributor

Sorry for the delay, and thanks for the detailed answer! I think the REP makes sense as is. In regards to the payload issue: these are the cases where we have DCCs that are NOT installed locally. Take houdini or nuke for example. They are VERY big packages, and even though you can technically run them off the file share, the load times can be very bad (depending on the OS and storage used, obviously), especially since these consist of so many files. This also makes localizing a pain. The current, rather naive, localizing pattern I implemented (simply based on local repos) takes a VERY long time to localize just a single variant of said packages. With some of these (e.g. houdini) you also get a LOT of releases, because of daily builds.
So for these cases it would be nice to have the payload as a zip, copy that locally, and then extract, as that is a LOT faster than copying the raw files. To not over-complicate every other case, I guess we could live with this being an optional “cache_commands”, simply not adding the payload, and making the package fail in commands() if it is not resolved locally.
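A minimal sketch of this zip-based localization pattern (the paths shown are purely illustrative):

import os
import shutil
import tempfile
import zipfile

def localise_zipped_payload(remote_zip, dest_dir):
    # copying one large archive over the network is far faster than
    # copying thousands of small files individually
    local_zip = os.path.join(tempfile.gettempdir(), os.path.basename(remote_zip))
    shutil.copyfile(remote_zip, local_zip)
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(local_zip) as zf:
        zf.extractall(dest_dir)  # extraction is purely local I/O
    os.remove(local_zip)

# illustrative paths only
localise_zipped_payload("/svr/packages/houdini/17.5.321/payload.zip",
                        os.path.expanduser("~/mycache/houdini/17.5.321"))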

@bfloch
Contributor

bfloch commented Aug 19, 2019

Building on @instinct-vfx's comment, I believe the transport should be an abstract mechanism.
It should not matter whether the remote repository type is a filesystem, or S3 + db, etc.
It should not matter how we transfer (copy, rsync, zip + copy + unzip, download, gridftp etc.), and I am not even sure this knowledge belongs to the package so much as it belongs to the repository.
Obviously the default implementation should be a simple copy for filesystem repositories, but I would love it if we could use this as an entry point for other infrastructure-dependent localization types.
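As a sketch, such an abstraction might look like this (all names are hypothetical):

import shutil
from abc import ABC, abstractmethod

class LocalisationTransport(ABC):
    # hypothetical interface: how a payload travels from a repository
    # to a local cache, independent of the repository type
    @abstractmethod
    def localise(self, source, dest):
        pass

class FileCopyTransport(LocalisationTransport):
    # the default for filesystem repositories: a plain recursive copy
    def localise(self, source, dest):
        shutil.copytree(source, dest)

# other implementations might wrap rsync, an S3 download, gridftp etc.,
# selected by the repository rather than by the package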

I understand the reasons for having a dedicated structure, but I don't feel particularly strongly about them.
I can live with mimicking a cache repo for a remote repo.
As to variant indices - I am a little confused. Isn't the only use of those during development? I never saw that as something that has to be consistent. Especially since we have hashed variants now, this should be even simpler.

I can think of cases where it would make sense to keep the same structure for a cache as for a repo. One is drive remapping, to eliminate the mentioned reference problems (at least for local references).
We could easily install from CI to an R: drive that is mapped to the server, then synchronize from that server location (UNC) to a local R: drive, and many of the reference issues would be gone.

Maybe it is also something that has not been considered, but having caches act as repos could maybe be extended for load balancing or non-centralized localization in the future.
We are discussing a "2-step" synchronization here, and the problems are related - e.g.
S3 -> Server -> Local Machine. And in order to better balance the load on the server, I was flirting with the idea of peer-to-peer synchronization among the local machines.
In this case I don't really believe that any of these repositories deserves special treatment. Maybe I am over-idealizing the problem, but I wanted to get it out here for discussion.

As to cleaning up caches: most of us probably have some kind of toolset management in place that knows exactly which packages are currently deployed for production (for the common artist). So we should be able to provide a set of keep-alive packages, in order of usage/importance, coming from external sources, to help the cleaning mechanism out, as opposed to just relying on some heuristic that misses this domain knowledge.

Regardless, great proposal and thanks for all the details!

@nerdvegas
Contributor Author

nerdvegas commented Aug 19, 2019 via email
