Lazy Archive [FEATURE REQUEST] #469

gresavage · 2024-06-05T16:15:29Z

Description

Allow an archive to be "lazily initialized", with solution_dim and measure_dim inferred from the first call to add.

Use Case

In a complicated or highly dynamic program where the structure of the archive is not definitively known ahead of time, this can be useful. For example, say I have a program which builds the ANN topology for a reinforcement learning algorithm. Since it is common in RL to use the same algorithm on a wide variety of environments, the shape of the network input is highly dynamic. I (the user) currently have to delay initialization of the archive until the size of the flattened network parameters is known.

Furthermore, it is common to use convolutional nets for measure encoding. The process of determining the output size of a CNN can be rather tedious. Allowing for "lazy" archive initialization (analogous to PyTorch's lazy modules, e.g. LazyLinear) eases these issues.
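
To illustrate why working out a CNN's output size by hand is tedious, here is a minimal sketch that applies the standard per-axis convolution output formula (the one given in the PyTorch Conv2d documentation) layer by layer. The 84x84 input, the three-layer configuration, and the channel count are hypothetical values chosen for illustration; they are not taken from this issue.

```python
def conv_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis of a convolution layer."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Hypothetical encoder: three conv layers applied to a square 84x84 input.
layers = [
    {"kernel": 8, "stride": 4},
    {"kernel": 4, "stride": 2},
    {"kernel": 3, "stride": 1},
]

side = 84
for layer in layers:
    side = conv_output_size(side, **layer)

out_channels = 64  # assumed channel count of the last layer
flat_dim = out_channels * side * side  # size of the flattened encoder output
```

Every change to the input resolution or to a layer's kernel, stride, or padding changes flat_dim, which is the bookkeeping that lazy initialization would avoid.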

Snippet

As @btjanaka aptly noted in #468, this requires initialization in a separate method. Below is an example of a class which naively implements this:

from inspect import signature

import numpy as np

from ribs.archives import ArchiveBase


class LazyArchive(ArchiveBase):
    def __init__(self, *, base_archive: type[ArchiveBase], **kwargs) -> None:
        self._base_archive = base_archive
        self._init_kwargs = kwargs
        self._has_init = False

        # make sure `self.empty` returns `True` for emitters
        self._num_occupied = 0

    def add(  # noqa: D102
        self,
        solution_batch: np.ndarray,
        objective_batch: np.ndarray,
        measures_batch: np.ndarray,
        metadata_batch: np.ndarray | None = None,
        **other_kwargs,
    ) -> tuple[np.ndarray, np.ndarray]:
        if not self._has_init:
            # Infer both dimensions from the first batch (arrays are batch-major).
            solution_dim = solution_batch.shape[1]
            measure_dim = measures_batch.shape[1]
            kwargs = {**self._init_kwargs, "solution_dim": solution_dim, "measure_dim": measure_dim}
            # Pass only the arguments that the base archive's __init__ accepts.
            params = signature(self._base_archive.__init__).parameters
            self._base_archive.__init__(self, **{k: v for k, v in kwargs.items() if k in params})
            self._has_init = True

        return self._base_archive.add(
            self,
            solution_batch=solution_batch,
            objective_batch=objective_batch,
            measures_batch=measures_batch,
            metadata_batch=metadata_batch,
            **other_kwargs,
        )

Additionally, a LazyEmitter and LazyScheduler would have to be implemented in a similar fashion in order to handle the fact that the __init__ methods of those classes rely on attributes like solution_dim.


btjanaka · 2024-06-06T23:13:24Z

Hi @gresavage, thanks for making this suggestion! Your use case seems valid. I think there may be a lot of small obstacles to overcome, but it may be manageable to have lazy classes for each category. One thing is that we would want the lazy archive to behave identically to the original archives after the first add(); a user should not have to change their entire code just to make the initialization a bit easier. As such, I wonder if it is possible to essentially set self = _base_archive after the initialization, as passing through the methods is rather clunky, and the methods in each archive also differ. Hopefully there are some Python tricks that could be useful here.

Small nit-pick: metadata_batch is no longer in pyribs since we have extra_fields in archives. Also, we no longer use _batch in the names (this was a recent change from 0.6.0 to 0.7.0, where I refactored large parts of the archives and also tweaked the APIs a tiny bit).
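
One Python trick that comes close to "self = _base_archive" is reassigning the instance's __class__ on the first add(). The sketch below uses a hypothetical stand-in class (ToyArchive is not a pyribs class, and its add signature is simplified to single solutions) to show the mechanism only; whether this works with the real archives is an assumption that would need testing.

```python
class ToyArchive:
    """Stand-in for a concrete archive; NOT a pyribs class."""

    def __init__(self, *, solution_dim, measure_dim, cells=10):
        self.solution_dim = solution_dim
        self.measure_dim = measure_dim
        self.cells = cells
        self._store = []

    def add(self, solution, objective, measures):
        self._store.append((solution, objective, measures))
        return len(self._store)

    def best(self):
        """A method that the lazy wrapper itself never defines."""
        return max(self._store, key=lambda entry: entry[1])


class LazyToyArchive:
    def __init__(self, *, base_archive, **kwargs):
        self._base_archive = base_archive
        self._init_kwargs = kwargs

    def add(self, solution, objective, measures):
        base, kwargs = self._base_archive, self._init_kwargs
        self.__dict__.clear()  # drop the lazy bookkeeping attributes
        self.__class__ = base  # from here on, the object IS a `base` instance
        self.__init__(
            solution_dim=len(solution), measure_dim=len(measures), **kwargs
        )
        return self.add(solution, objective, measures)  # resolves to base.add


archive = LazyToyArchive(base_archive=ToyArchive, cells=5)
archive.add([0.1, 0.2, 0.3], 1.0, [0.5, 0.5])
```

After the first add(), every method of the base class is available without any pass-through code. Two caveats: __class__ assignment only works between classes with compatible object layouts (classes using __slots__ or C-level bases may refuse it), and isinstance(archive, LazyToyArchive) is False afterwards.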

gresavage · 2024-06-10T17:00:05Z

@btjanaka, after thinking about this more: instead of self = _base_archive after initialization, it might be better to override __new__ to create an ad-hoc class with the _base_archive in its MRO. The add method here would then also replace itself with the _base_archive's add, like:

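if not self._has_init:
    ...
    # stuff to infer dimensions
    ...
    # Bind the base class's add to this instance; a plain function stored on
    # the instance would not receive `self` automatically.
    self.add = self._base_archive.add.__get__(self)
    return self.add(...)  # this should now point to the correct `add` routine

# this codepath should never be reached
raise RuntimeError("some informative message")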

Sorry, I wrote this based on pyribs 0.6.4; the 0.7 release snuck in without me noticing!

I will have to think some more about what to do for the lazy classes for each category/init signature.

btjanaka · 2024-06-10T22:49:30Z

The one issue I see is that there are a bunch of methods to replace for each archive, and not all are shared across all the archives. There must be some hacky thing we can do in Python, like "once we receive the call to add, transform this class's API to match that of the _base_archive". I guess it's not exactly self = _base_archive but more like self.adopt_api_of(base_archive).

What do you mean by replacing __new__? And what does MRO stand for?
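
For context, MRO is Python's method resolution order: the ordered list of classes that attribute lookup walks through for an instance. The following is a rough sketch of one way the __new__ idea could look, using hypothetical stand-in classes (GridLike and Lazy are not pyribs classes, and the add signature is simplified to single solutions). It is an illustration under those assumptions, not a tested design for pyribs.

```python
class GridLike:
    """Stand-in for a concrete archive; NOT a pyribs class."""

    def __init__(self, *, solution_dim, measure_dim):
        self.solution_dim = solution_dim
        self.measure_dim = measure_dim
        self.entries = []

    def add(self, solution, objective, measures):
        self.entries.append((solution, objective, measures))
        return len(self.entries)

    def stats(self):
        return {"num_elites": len(self.entries)}


class Lazy:
    def __new__(cls, *, base_archive, **kwargs):
        # Build a one-off subclass whose MRO is
        # (LazyGridLike, Lazy, GridLike, object).
        adhoc = type(f"Lazy{base_archive.__name__}", (cls, base_archive), {})
        return object.__new__(adhoc)

    def __init__(self, *, base_archive, **kwargs):
        self._base_archive = base_archive
        self._init_kwargs = kwargs

    def add(self, solution, objective, measures):
        base = self._base_archive
        base.__init__(
            self,
            solution_dim=len(solution),
            measure_dim=len(measures),
            **self._init_kwargs,
        )
        # Shadow Lazy.add on this instance with the base class's bound add.
        self.add = base.add.__get__(self)
        return self.add(solution, objective, measures)


archive = Lazy(base_archive=GridLike)
mro_names = [c.__name__ for c in type(archive).__mro__]
archive.add([1.0, 2.0, 3.0], 0.5, [0.1, 0.2])
```

Because the base class sits in the MRO, methods such as stats() are inherited without being replaced one by one, and isinstance checks against both Lazy and the base class pass. Calling an inherited method before the first add() would still fail in this sketch, since the base __init__ has not run yet.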

gresavage added the "enhancement" (New feature or request) label on Jun 5, 2024
