REP-003: Package and Context Localisation #680

Open
nerdvegas opened this issue Jul 27, 2019 · 6 comments
Labels
REP REPs are Rez enhancement proposals that need in-depth discussion

Comments

@nerdvegas
Contributor

nerdvegas commented Jul 27, 2019

REP-003: Package and Context Localisation

Rez contexts contain references to the variants in their resolve, in the form of variant “handles”. A handle is just a metadata dict that uniquely identifies a variant and defines where it is located. For example, a variant handle typically contains the package name, package version, path to the package repository containing it, and the index of the variant within that package. From its handle, the package definition can be retrieved, which is needed to configure the environment (for example, the commands attribute needs to be evaluated).
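For illustration, a handle might look something like the following dict (the field names here are approximate, not the exact Rez schema):

# an illustrative variant handle - field names are approximate,
# not the exact Rez schema
handle = {
    "repository_type": "filesystem",
    "location": "/svr/packages",
    "name": "foo",
    "version": "1.2.3",
    "variant_index": 0,
}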

When resolved into an environment, a variant will typically reference its installed payload (its “root”) in some way - for example, by adding a “bin” subdirectory to PATH. The installed payload will reside within the package repository directory.

These are two examples of cases where use of a context requires access to external resources: the package definitions (which would generally come from memcached in a production Rez install, but may also come from disk), and the package installation payloads.

There are two different reasons why this may not be desirable:

  • We wish for a context to act in a standalone manner, so it can be run in isolation. For example, we would like a VM or container running a Rez-based micro service to not have to access memcached, nor to require a mount to shared file storage.
  • We wish for package payloads to be locally available, rather than having to be fetched over NFS. This has the potential to significantly improve performance and decrease load on the filer.

Context Localisation

A “localised” or “deep” context would be one that contains entire copies of the package definitions its variants require. This would cause the following differences in behaviour (when compared with standard contexts):

  • The rxt file size would be larger (possibly significantly);
  • Changes to package definitions in their original package repositories would not take effect in the deep context;
  • The deep context would not require fetching of package definitions from memcached, and thus would source faster.

The following features are desirable, in order to give users as much control as possible, and to maintain backwards compatibility:

  • Use of deep contexts would be configurable, but would also be manually overridable in all cases where contexts are created. For example, rez-env would have a new --context-mode option (with “deep” and “shallow” choices), for use in combination with its --output option.
  • It should be possible to load a shallow context and convert it to a deep context, and vice versa (probably via the existing rez-context tool), as sketched below.
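For example, conversion might look like this (the flags shown are hypothetical; rez-context has no such options today):

# hypothetical usage - convert a shallow context to a deep one, and back
]$ rez-context --convert-mode deep foo.rxt --output foo-deep.rxt
]$ rez-context --convert-mode shallow foo-deep.rxt --output foo.rxt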

One thing that also needs to be considered is whether it’s ever desirable to store deep contexts in memcached. This would result in faster loading of cached resolves, but at the cost of far more cache space - each package definition would effectively be stored many times over (once for each deep context it appears in). It may be better not to support this, and to instead wait for a port to redis, which supports multiple key retrieval (via its mget operation).
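For reference, this is what multiple key retrieval looks like with the redis-py client; the key naming scheme shown is purely hypothetical:

import redis

r = redis.Redis(host="localhost", port=6379)

# hypothetical key scheme: one key per package definition in the resolve
keys = ["pkg-def:foo-1.2.3", "pkg-def:bah-12.0.5"]

# a single round trip retrieves every definition at once
definitions = r.mget(keys)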

Package Localisation

The basic idea behind this feature is that, when a context is created, the variant payloads that the context uses are lazily copied into a local disk cache. However, there are a few points to consider:

  • We don’t necessarily want to copy all variant payloads, otherwise a context might take too long to create;
  • Copying package payloads may cause technical problems in some cases;
  • The user may need control over where the package cache resides - it may be too limiting to assume that there is just one cache.
  • How do we clean up old localised packages, so the cache doesn’t grow forever?

These points are now addressed in turn.

Localisation Mode

There should be a mode that determines how localisation behaves. Potential modes are listed below, followed by a sketch of the relevant settings:

  • Full. Localise all packages in the context, regardless of how long it takes;
  • Limit. Localise up to N packages at any one time;
  • Time. Localise packages until a time limit is reached;
  • None. Don’t localise anything.
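Here is what those settings might look like in rezconfig - all setting names are hypothetical, for illustration only:

# hypothetical rezconfig settings - names are illustrative, not implemented
package_localisation_mode = "limit"   # one of: "full", "limit", "time", "none"
package_localisation_limit = 2        # "limit" mode: localise at most N variants per resolve
package_localisation_time_limit = 10  # "time" mode: stop localising after N seconds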

If rez-env uses any of the last three modes, then a context may only have had some of its packages localised. That is ok though - more and more packages will be cumulatively localised with every resolve the user performs, and this should fairly rapidly result in full localisation anyway. If maximum localisation is a priority, then “full” mode can be used, at the cost of context load time (and even then, this should rapidly improve anyway, as more and more packages are localised).

It’s worth noting the distinction between “none” mode localisation, and disabling localisation altogether. None mode would simply make use of any packages already localised; disabling localisation on the other hand would ignore the package cache completely.

Technical Problems

There are some instances where copying package payloads could be problematic:

  • The package is large, and could eat up too much home directory quota;
  • The package payload is not position independent (ie moving it will break it, because there are absolute references to itself within itself, or relative references to something outside of itself);
  • One package is not position independent relative to another (perhaps package A is rpathed to package B - moving B will break A’s symbol resolution).

Similar to the existing “non_relocatable” package attribute, there is also a need for a “non_localisable” attribute. It would make sense for this to default to the value of “non_relocatable”, as the concepts are very similar, and typically a non-relocatable package would also be non-localisable.

Describing the last case is a little more complex, however. It may require another package attribute - one that lists packages that become non-localisable in this package’s presence. For example, if the aforementioned A and B packages appear in a context, then B is not localisable, because of the presence of A. A sketch of these attributes follows.
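Here is how the attributes might look in a package.py - non_localisable is the attribute proposed above, while makes_non_localisable is a hypothetical name for the pairwise attribute:

# package.py for package A - a sketch; makes_non_localisable is a
# hypothetical attribute name

name = "A"
version = "1.0.0"

# proposed attribute; would default to the value of non_relocatable
non_localisable = False

# hypothetical pairwise attribute: A is rpathed to B, so B becomes
# non-localisable whenever A appears in the same context
makes_non_localisable = ["B"]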

Package Cache Location

It would make sense most of the time to use one configured package cache across the board, for any given user. However, there are cases where this doesn’t make sense. For example, to create a standalone context for use in a container, you would want to create a package cache specifically for that container.

Package cache location should be overridable in the Rez API and tools where appropriate. Furthermore, there should be a tool (say, rez-package-cache) that allows for manually wrangling package caches. It would also be useful to be able to associate a context with a specific cache.

Examples of tool use:

# populate a specific cache using a context
]$ rez-package-cache --populate foo.rxt ~/mycache

# list variants in a cache
]$ rez-package-cache --list-variants ~/mycache

# create a context and associate it with a specific cache:
]$ rez-env pkgA pkgB --package-cache ~/mycache --output bah.rxt

# create a copy of a context, associated with a different cache
]$ rez-package-cache --bake src.rxt dest.rxt ~/.othercache

# source a context and use a specific package cache
]$ rez-env --input foo.rxt --package-cache ~/mycache -- echo hello

If a context has a baked cache, perhaps it should fall back to the globally configured cache for any non-cached packages.

Cleaning up Old Cached Packages

There isn’t really a way to be 100% sure that any given cached package is not still in use. Perhaps we could drop lock files into the cache to indicate they’re being used; but a context that unexpectedly aborts would undoubtedly cause these locks to be left behind, and their associated cached packages would never be deleted. Realistically we probably just want to delete based on date of last use, or on a combination of that and maximum disk usage or package count.

To allow for this, the rez-package-cache tool should be able to perform deletion based on the parameters described above. Furthermore, cached package directories should be touched on use so we can reliably say when they were last used.

To trigger the deletion, either a cron could be set up on workstations, or perhaps a configured setting would cause Rez itself to perform the cache deletion (at most once every N configured days, hours etc) each time a context is created or sourced.
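A minimal sketch of such a cleanup, assuming the directory layout described in the next section, and that cached variant directories are touched on use:

import os
import shutil
import time

def clean_cache(root, max_age_days=30):
    # layout assumed: {root}/{name}/{version}/{handle_hash}/
    cutoff = time.time() - (max_age_days * 86400)
    for name in os.listdir(root):
        for version in os.listdir(os.path.join(root, name)):
            version_dir = os.path.join(root, name, version)
            for handle_hash in os.listdir(version_dir):
                variant_dir = os.path.join(version_dir, handle_hash)
                # the dir is touched on use, so mtime reflects last use
                if os.path.getmtime(variant_dir) < cutoff:
                    shutil.rmtree(variant_dir)

clean_cache(os.path.expanduser("~/mycache"))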

Package Cache Structure

The package cache would be structured on disk like so:

{root}/foo/1.2.3/ab58fca5283cfbbbc3ca5680/
{root}/foo/1.2.3/3cfbbbc3ca5680ab58fca528/
{root}/bah/12.0.5/a258fc3ca5680a5283cfbbbc/

Each variant would be stored under a hash of its handle. To help with debugging, the leading directories would be the package name and version respectively. Different hashes within the same package and version represent different variants - either different variants within the same package, or variants from packages with the same name and version, but in a different repository.

When a cached copy of a variant in a context is found, its root is simply changed to that cached variant directory.
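A sketch of the handle-to-path mapping (the hash algorithm and handle fields shown are illustrative):

import hashlib
import json
import os

def cached_variant_path(cache_root, handle):
    # hash the full handle, so that variants with the same name/version
    # but a different repository (or variant index) get distinct dirs
    data = json.dumps(handle, sort_keys=True).encode("utf-8")
    handle_hash = hashlib.sha1(data).hexdigest()
    return os.path.join(cache_root, handle["name"], handle["version"], handle_hash)

handle = {
    "location": "/svr/packages",
    "name": "foo",
    "version": "1.2.3",
    "variant_index": 0,
}
print(cached_variant_path(os.path.expanduser("~/mycache"), handle))
# -> ~/mycache/foo/1.2.3/<sha1 hexdigest of the handle>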

Package Immutability

One point to note is that caching is only useful if we can assume that the contents of a package do not change. Otherwise, the time spent verifying that the contents of a variant match those in the cache would negate the value of caching in the first place.

Fortunately, for the majority case in Rez - packages that have been released - immutability is already a property, and so these can be readily cached. The case where they cannot is local packages, which in practice get re-installed all the time. So, package caching would be disabled for local packages (which makes sense anyway, because local packages typically already reside on the local disk, so there’s no point caching them).

Standalone Contexts

It’s already been mentioned that we would like to be able to create contexts that are completely standalone, for the purposes of running a micro service for example. In this case, we would like to localise both the context and the packages it uses. To do that, it would make sense to bake the package cache into the context, as shown earlier. However, it would also be practical to store this as a relative reference in the context, so that we could copy both context and cache to a service VM or container together, rather than having to construct them in place.

Here’s what that might look like:

]$ mkdir svr
]$ rez-env pkgA pkgB --package-cache svr/pkg_cache --rel-package-cache --context-mode deep --output svr/svr.rxt
]$ rez-package-cache --populate svr/svr.rxt

This would:

  • Create a deep context (package definitions are embedded into the rxt);
  • Bake the relative path to the package cache into the context;
  • Create a package cache (if it didn’t already exist);
  • Cache all the variants used by the context, into the cache.

We would then have an svr/ directory that we could copy to a server and run the server binary from. The only prerequisites would be:

  • The server has a system (platform, os etc) that is compatible with the context that was created;
  • Rez is installed on the server.

Running the service would simply involve:

]$ rez-env --input svr/svr.rxt -- my-server-command
@nerdvegas nerdvegas added the REP REPs are Rez enhancement proposals that need in-depth discussion label Jul 27, 2019
@instinct-vfx
Contributor

I gave this a first thorough read. I am not done thinking through all of it, but I have a first set of remarks. Note that some of these are a bit playing devil's advocate, so take them with a grain of salt.

Package Cache Structure:

  • What is the advantage of a custom cache setup as opposed to "yet another repository, that is local and acts as cache"?

Regarding “either different variants within the same package, or variants from packages with the same name and version, but in a different repository”:

  • I am not sure how this is meant. Would the same package variant from different repositories create different cache entries? That would make packages mutable, no? I was wondering before whether there is a use-case for introducing a package hash (a hash that covers both the metadata in package.py AND the actual file and folder structure), to be able to ensure immutability.

Package Immutability

This brings up a case that I wanted to discuss anyway, and I think it fits fine here. (Happy to fork the discussion elsewhere if you disagree):

While variants are typically immutable, packages are often not (at least here). The main case that is giving me a headache here is packages that I release programmatically from installers. An example would be VRay. VRay ships as installers. These installers are specific to the Max version and the VRay version; hence every installer contains a single variant. I have a tool that I can drop an installer (or nightly release) on, and it will install to a local temp location, gather the files, apply some patches, wrap it all up as a package, and release it.
This works fine because Rez handles the merging of variants just fine. BUT that makes the package.py file mutable. The alternative would mean re-releasing all previous versions whenever a new variant is released. So if we support Max versions 2015-2018 and 2019 is released, I would be creating an additional copy of 4 variants that come at a whopping 1 GB per variant.

How would you handle such a case? It has been giving me a headache in all my planning around syncing packages between offices (= remote repositories), and that case seems to have a LOT of overlap.

Technical Problems:
In addition to the “non_relocatable” and “non_localisable” attributes, what about a “should_localize” or even “must_localize”? We have quite a few packages (e.g. maya or houdini) that cannot run from the server location at all. On a similar note: it would actually be beneficial for us if this very case could be treated as a special case. For these packages, we would like to store the payload zipped, and localization would be “copy local -> extract”, because copying these over the network is a LOT slower than copying the zip.

Great to see this being kicked off, this is a big one for us! 👏

@nerdvegas
Contributor Author

nerdvegas commented Jul 29, 2019 via email

@nerdvegas
Contributor Author

nerdvegas commented Jul 29, 2019 via email

@instinct-vfx
Contributor

Sorry for the delay, and thanks for the detailed answer! I think the REP makes sense as is. In regards to the payload issue: these are the cases where we have DCCs that are NOT installed locally. Take houdini or nuke for example. They are VERY big packages, and even though you can technically run them off the file share, the load times can be very bad (depending on the OS and storage used, obviously), especially since these consist of so many files. This also makes localizing a pain. The current, rather naive, localizing pattern I implemented (simply based on local repos) takes a VERY long time to localize just a single variant of said packages. With some of these (e.g. houdini) you also get a LOT of releases, because of daily builds.
So for these cases it would be nice to have the payload as a zip, copy that locally, and then extract, as that is a LOT faster than copying the raw files. To not over-complicate every other case, I guess we could live with this being an optional “cache_commands”, simply not adding the payload, and making the package fail in commands() if it is not resolved locally.
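A minimal sketch of this zip-based localization pattern (the paths shown are purely illustrative):

import os
import shutil
import tempfile
import zipfile

def localise_zipped_payload(remote_zip, dest_dir):
    # copying one large archive over the network is far faster than
    # copying thousands of small files individually
    local_zip = os.path.join(tempfile.gettempdir(), os.path.basename(remote_zip))
    shutil.copyfile(remote_zip, local_zip)
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(local_zip) as zf:
        zf.extractall(dest_dir)  # extraction is purely local I/O
    os.remove(local_zip)

# illustrative paths only
localise_zipped_payload("/svr/packages/houdini/17.5.321/payload.zip",
                        os.path.expanduser("~/mycache/houdini/17.5.321"))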

@bfloch
Contributor

bfloch commented Aug 19, 2019

Building on @instinct-vfx's comment, I believe the transport should be an abstract mechanism.
It should not matter whether the remote repository type is a filesystem, or S3 + db, etc.
It should not matter how we transfer (copy, rsync, zip + copy + unzip, download, gridftp etc.), and I am not even sure this knowledge belongs to the package so much as it belongs to the repository.
Obviously the default implementation should be a simple copy for filesystem repositories, but I would love it if we could use this as an entry point for other infrastructure-dependent localization types.
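As a sketch, such an abstraction might look like this (all names are hypothetical):

import shutil
from abc import ABC, abstractmethod

class LocalisationTransport(ABC):
    # hypothetical interface: how a payload travels from a repository
    # to a local cache, independent of the repository type
    @abstractmethod
    def localise(self, source, dest):
        pass

class FileCopyTransport(LocalisationTransport):
    # the default for filesystem repositories: a plain recursive copy
    def localise(self, source, dest):
        shutil.copytree(source, dest)

# other implementations might wrap rsync, an S3 download, gridftp etc.,
# selected by the repository rather than by the package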

I understand the reasons for having a dedicated structure, but I don't feel particularly strongly about them.
I can live with mimicking a cache repo for a remote repo.
As to variant indices - I am a little confused. Isn't the only use of those during development? I never saw that as something that has to be consistent. Especially since we have hashed variants now, this should be even simpler.

I can think of cases where it would make sense to keep the same structure for a cache as for a repo. One is drive remapping, to eliminate the mentioned reference problems (at least for local references).
We could easily install from CI to an R: drive that is mapped to the server, then synchronize from that server location (UNC) to a local R: drive, and many of the reference issues would be gone.

Maybe it is also something that has not been considered, but having caches act as repos could maybe be extended for load balancing or non-centralized localization in the future.
We are discussing a "2-step" synchronization here, and the problems are related - e.g.
S3 -> Server -> Local Machine. And in order to better balance the load on the server, I was flirting with the idea of peer-to-peer synchronization among the local machines.
In this case I don't really believe that any of these repositories deserves special treatment. Maybe I am over-idealizing the problem, but I wanted to get it out here for discussion.

As to cleaning up caches: most of us probably have some kind of toolset management in place that knows exactly which packages are currently deployed for production (for the common artist). So we should be able to provide a set of keep-alive packages, in order of usage/importance, coming from external sources, to help the cleaning mechanism out, as opposed to just relying on some heuristic that misses this domain knowledge.

Regardless, great proposal and thanks for all the details!

@nerdvegas
Contributor Author

nerdvegas commented Aug 19, 2019 via email
