Lazy Archive [FEATURE REQUEST] #469

Open
gresavage opened this issue Jun 5, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@gresavage
Contributor

Description

Allow an archive to be "lazily initialized", with solution_dim and measure_dim inferred from the first call to add.

Use Case

In a complicated or highly dynamic program, the structure of the archive may not be known ahead of time, and lazy initialization helps in that case. For example, say I have a program which builds the ANN topology for a reinforcement learning algorithm. Since it is common in RL to use the same algorithm on a wide variety of environments, the shape of the network input is highly dynamic. I (the user) currently have to delay initialization of the archive until the size of the flattened network parameters is known.

Furthermore, it is common to use convolutional nets for measure encoding. The process of determining the output size of a CNN can be rather tedious. Allowing for "lazy" archive initialization (analogous to PyTorch's lazy modules, e.g. LazyLinear) eases these issues.

Snippet

Aptly noted by @btjanaka in #468 this requires initialization in a separate method. Below is an example of a class which naively implements this:

from inspect import signature

import numpy as np

from ribs.archives import ArchiveBase


class LazyArchive(ArchiveBase):
    def __init__(self, *, base_archive: type[ArchiveBase], **kwargs) -> None:
        self._base_archive = base_archive
        self._init_kwargs = kwargs
        self._has_init = False

        # make sure `self.empty` returns `True` for emitters
        self._num_occupied = 0

    def add(  # noqa: D102
        self,
        solution_batch: np.ndarray,
        objective_batch: np.ndarray,
        measures_batch: np.ndarray,
        metadata_batch: np.ndarray | None = None,
        **other_kwargs,
    ) -> tuple[np.ndarray, np.ndarray]:
        if not self._has_init:
            # Infer both dimensions from the first batch (axis 0 is the batch axis).
            solution_dim = solution_batch.shape[1]
            measure_dim = measures_batch.shape[1]
            kwargs = {**self._init_kwargs, "solution_dim": solution_dim, "measure_dim": measure_dim}
            # Forward only the keyword arguments that the base archive's __init__ accepts.
            params = signature(self._base_archive.__init__).parameters
            self._base_archive.__init__(self, **{k: v for k, v in kwargs.items() if k in params})
            self._has_init = True

        return self._base_archive.add(
            self,
            solution_batch=solution_batch,
            objective_batch=objective_batch,
            measures_batch=measures_batch,
            metadata_batch=metadata_batch,
            **other_kwargs,
        )

Additionally, a LazyEmitter and LazyScheduler would have to be implemented in a similar fashion in order to handle the fact that the __init__ methods of those classes rely on attributes like solution_dim.

@gresavage gresavage added the enhancement New feature or request label Jun 5, 2024
@btjanaka
Member

btjanaka commented Jun 6, 2024

Hi @gresavage, thanks for making this suggestion! Your use case seems valid. I think there may be a lot of small obstacles to overcome, but it may be manageable to have lazy classes for each category. One thing is that we would want the lazy archive to behave identically to the original archives after the first add(); a user should not have to change their entire code just to make the initialization a bit easier. As such, I wonder if it is possible to essentially set self = _base_archive after the initialization, as passing through the methods is rather clunky, and the methods in each archive also differ. Hopefully there are some Python tricks that could be useful here.
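One Python trick in this direction is reassigning the instance's `__class__` on the first `add`, so the object becomes an instance of the base archive from then on. The following is only a minimal sketch under stated assumptions: `ToyGridArchive` and `LazyToyArchive` are hypothetical toy classes rather than pyribs classes, and dimensions are inferred from a single solution instead of a batch.

```python
class ToyGridArchive:
    """Hypothetical stand-in for a concrete archive; not a pyribs class."""

    def __init__(self, *, solution_dim, measure_dim, cells=10):
        self.solution_dim = solution_dim
        self.measure_dim = measure_dim
        self.cells = cells
        self.store = []

    def add(self, solution, objective, measures):
        self.store.append((solution, objective, measures))
        return len(self.store)

    def best(self):
        # A method that only this archive type has; available after the swap.
        return max(self.store, key=lambda entry: entry[1])


class LazyToyArchive:
    """Becomes an instance of `base_archive` on the first call to `add`."""

    def __init__(self, *, base_archive, **kwargs):
        self._base_archive = base_archive
        self._init_kwargs = kwargs

    def add(self, solution, objective, measures):
        base, kwargs = self._base_archive, self._init_kwargs
        # Drop the lazy bookkeeping, then swap the instance's class in place.
        self.__dict__.clear()
        self.__class__ = base
        # Run the real initializer with the inferred dimensions.
        base.__init__(self, solution_dim=len(solution), measure_dim=len(measures), **kwargs)
        # `self.add` now resolves to the base archive's `add`.
        return self.add(solution, objective, measures)


archive = LazyToyArchive(base_archive=ToyGridArchive, cells=5)
archive.add([0.1, 0.2, 0.3], 1.0, [0.5, 0.5])
archive.add([0.3, 0.2, 0.1], 2.0, [0.1, 0.9])
```

After the first `add`, every method of the base class is available without pass-through code. The caveat is that Python only allows `__class__` assignment between classes with compatible instance layouts, so this may not work for classes that define `__slots__` or are implemented in C.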

Small nit-pick: metadata_batch is no longer in pyribs since we have extra_fields in archives. Also, we no longer use the _batch suffix on the names (this was a recent change from 0.6.0 to 0.7.0, where I refactored large parts of the archives and also tweaked the APIs a tiny bit).

@gresavage
Contributor Author

@btjanaka after thinking about this more, instead of self = _base_archive after initialization it might be better to override __new__, to make an ad-hoc class with the _base_archive in the MRO. Then change the add method here to also replace itself with the _base_archive add like:

if not self._has_init:
  ...
  # stuff to infer dimensions
  ...
  # bind the base class's `add` to this instance so that `self` is still passed
  self.add = self._base_archive.add.__get__(self)
  return self.add(...) # this should now point to the correct `add` routine

# this codepath should never be reached
raise RuntimeError("some informative message")

Sorry, I wrote this based on pyribs 0.6.4, the 0.7 release snuck in without me noticing!

I will have to think some more about what to do for the lazy classes for each category/init signature.
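To make the `__new__` idea concrete, here is a rough sketch under stated assumptions. `ToyArchive` and `Lazy` are hypothetical toy classes, not pyribs classes, and the dimensions are inferred from a single solution rather than a batch. `Lazy.__new__` builds an ad-hoc subclass with the base archive in its MRO, and the first `add` shadows itself on the instance with the base archive's `add`.

```python
class ToyArchive:
    """Hypothetical stand-in for a concrete archive; not a pyribs class."""

    def __init__(self, *, solution_dim, measure_dim):
        self.solution_dim = solution_dim
        self.measure_dim = measure_dim
        self.entries = []

    def add(self, solution, objective, measures):
        self.entries.append((solution, objective, measures))
        return len(self.entries)


class Lazy:
    """Mixin whose `__new__` returns an instance of an ad-hoc subclass."""

    def __new__(cls, *, base_archive, **kwargs):
        # Ad-hoc class with MRO: Lazy<Base> -> Lazy -> base_archive -> object.
        adhoc = type(f"Lazy{base_archive.__name__}", (cls, base_archive), {})
        return object.__new__(adhoc)

    def __init__(self, *, base_archive, **kwargs):
        # Only store what is needed; the base initializer runs later.
        self._base_archive = base_archive
        self._init_kwargs = kwargs

    def add(self, solution, objective, measures):
        # First call only: initialize the base archive with inferred dimensions.
        self._base_archive.__init__(
            self, solution_dim=len(solution), measure_dim=len(measures), **self._init_kwargs
        )
        # Shadow this method on the instance with the base class's bound `add`.
        self.add = self._base_archive.add.__get__(self)
        return self.add(solution, objective, measures)


archive = Lazy(base_archive=ToyArchive)
archive.add([1.0, 2.0, 3.0], 0.5, [0.1, 0.2])
```

With this layout, `isinstance(archive, ToyArchive)` holds even before the first `add`, and the other methods of the base class are reachable through the MRO, although calling them before initialization would still fail.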

@btjanaka
Member

The one issue I see is that there are a bunch of methods to replace for each archive, and not all are shared across all the archives. There must be some hacky thing we can do in Python, like "once we receive the call to add, transform this class's API to match that of the _base_archive". I guess it's not exactly self = _base_archive but more like self.adopt_api_of(base_archive).

What do you mean by replacing __new__? And what does MRO stand for?
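For reference, MRO stands for method resolution order: the ordered sequence of classes that Python searches when looking up an attribute on an instance. A small illustration with hypothetical class names, unrelated to pyribs, shows how the order of the bases decides which method wins:

```python
class Base:
    def describe(self):
        return "Base"


class Mixin:
    def describe(self):
        return "Mixin"


# Build a class at runtime; bases listed earlier come first in the MRO.
Combined = type("Combined", (Mixin, Base), {})

print([cls.__name__ for cls in Combined.__mro__])  # ['Combined', 'Mixin', 'Base', 'object']
print(Combined().describe())  # Mixin
```

Overriding `__new__` lets a class control which object is created when it is called, which is how an ad-hoc class like `Combined` could be built and instantiated on the fly.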
