progress in problem.md
juan43ramirez committed Sep 11, 2024
1 parent 3f22392 commit f3fabc2
Showing 7 changed files with 114 additions and 125 deletions.
2 changes: 2 additions & 0 deletions docs/source/constrained_optimizer.md
@@ -1,3 +1,5 @@
(optim)=

# Constrained Optimizer

```{eval-rst}
18 changes: 18 additions & 0 deletions docs/source/formulations.md
@@ -0,0 +1,18 @@

```{eval-rst}
.. currentmodule:: cooper.formulation
```

## Formulation

TODO: move somewhere else?

Formulations denote mathematical or algorithmic techniques aimed at solving a
specific (family of) CMP. **Cooper** is heavily (but not exclusively!) designed
for an easy integration of Lagrangian-based formulations. You can find more
details in {doc}`lagrangian_formulation`.

```{eval-rst}
.. autoclass:: Formulation
:members:
```
2 changes: 1 addition & 1 deletion docs/source/index.md
@@ -17,7 +17,7 @@ notebooks/index
:maxdepth: 2
problem
lagrangian_formulation
formulations
constrained_optimizer
optim
multipliers
209 changes: 89 additions & 120 deletions docs/source/problem.md
@@ -1,10 +1,6 @@
```{eval-rst}
.. currentmodule:: cooper.problem
```

(cmp)=

# Constrained Minimization Problem
# Constrained Minimization Problems

We consider constrained minimization problems (CMPs) expressed as:

$$
\min_{\mathbf{x} \in \Omega} & \,\, f(\mathbf{x}) \\
\textrm{s.t. } & \,\, \mathbf{g}(\mathbf{x}) \le \mathbf{0} \\ & \,\, \mathbf{h}(\mathbf{x}) = \mathbf{0}
$$

Here $\Omega$ represents the domain of definition of the functions
$f, \mathbf{g}$ and $\mathbf{h}$. Note that $f$ is a scalar-valued function, whereas
$\mathbf{g}$ and $\mathbf{h}$ are vector-valued functions. We group together all the
inequality constraints in $\mathbf{g}$ and all the equality constraints in $\mathbf{h}$.
Here $\Omega$ represents the domain of definition of the functions $f, \mathbf{g}$ and $\mathbf{h}$. Note that $f$ is a scalar-valued function, whereas $\mathbf{g}$ and $\mathbf{h}$ are vector-valued functions. We group together all the inequality constraints in $\mathbf{g}$ and all the equality constraints in $\mathbf{h}$.
In other words, a component function $h_i(\mathbf{x})$ corresponds to the scalar constraint
$h_i(\mathbf{x}) = 0$.

:::{admonition} Brief notes on conventions and terminology
:::{admonition} Conventions and terminology

- We refer to $f$ as the **loss** or **objective** to be minimized.
- We adopt the convention $g(\mathbf{x}) \le 0$ for inequality constraints and
$h(\mathbf{x}) = 0$ for equality constraints. If your constraints are different,
for example $g(\mathbf{x}) \ge \epsilon$, you should provide **Cooper** with
$\epsilon - g(\mathbf{x}) \le 0$.
- We use the term **constraint violation** to refer to $\mathbf{g}(\mathbf{x})$ and
$\mathbf{h}(\mathbf{x})$.
Note that equality constraints $h(x)$ are satisfied *only* when their
defect is zero. On the other hand, a *negative* defect for an inequality
constraint $g(x)$ means that the constraint is *strictly* satisfied;
while a *positive* defect means that the inequality constraint is being
violated.
- We adopt the convention $g(\mathbf{x}) \le 0$ for inequality constraints and $h(\mathbf{x}) = 0$ for equality constraints. If your constraints are different, for example $g(\mathbf{x}) \ge \epsilon$, you should provide **Cooper** with $\epsilon - g(\mathbf{x}) \le 0$.
- We use the term **constraint violation** to refer to $\mathbf{g}(\mathbf{x})$ and $\mathbf{h}(\mathbf{x})$. Equality constraints $h(x)$ are satisfied *only* when their defect is zero. On the other hand, a *negative* defect for an inequality constraint $g(x)$ means that the constraint is *strictly* satisfied; while a *positive* defect means that the inequality constraint is being violated.
:::
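As a concrete illustration of the sign convention, here is a minimal plain-Python sketch (the `ge_to_cooper` helper is hypothetical, not part of **Cooper**) of rewriting a $g(\mathbf{x}) \ge \epsilon$ constraint into the required $g(\mathbf{x}) \le 0$ form:

```python
def ge_to_cooper(g_value: float, eps: float) -> float:
    """Rewrite a constraint g(x) >= eps as eps - g(x) <= 0 (Cooper's convention)."""
    return eps - g_value

# A negative violation means the inequality constraint is strictly satisfied,
# while a positive violation means it is being violated.
satisfied = ge_to_cooper(g_value=0.7, eps=0.5)   # negative: strictly satisfied
violated = ge_to_cooper(g_value=0.3, eps=0.5)    # positive: violated
```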

## Constraints
TODO


## CMP State

We represent computationally the "state" of a CMP using a {py:class}`CMPState`
object. A `CMPState` is a {py:class}`dataclasses.dataclass` which contains the
information about the loss and equality/inequality violations at a given point
$x$. If a problem has no equality or inequality constraints, these
arguments can be omitted in the creation of the `CMPState`.

:::{admonition} Stochastic estimates in `CMPState`
:class: important
TODO: brief intro to Lagrangian formulations and multipliers. Needed for context below. Define primal and dual variables.

In problems for which computing the loss or constraints exactly is prohibitively
expensive, the {py:class}`CMPState` may contain stochastic estimates of the
loss/constraints. For example, this is the case when the loss corresponds to a
sum over a large number of terms, such as training examples. In this case, the
loss and constraints may be estimated using mini-batches of data.

Note that, just as in the unconstrained case, these approximations can
entail a compromise in the stability of the optimization process.
:::{warning}
**Cooper** is primarily oriented towards **non-convex** CMPs that arise
in many deep learning applications. That is, problems for which one of
the functions $f, \mathbf{g}, \mathbf{h}$ is non-convex. While the techniques
implemented in **Cooper** are applicable to convex problems as well, we
recommend using specialized solvers for convex optimization problems whenever
possible.
:::

```{eval-rst}
.. autoclass:: CMPState
:members: as_tuple
```
In order to express CMPs, we will define the following objects:
- {py:class}`~cooper.constraints.Constraint`: represents a group of constraints, either equality or inequality.
- {py:class}`~cooper.ConstrainedMinimizationProblem`: represents the constrained minimization problem itself. It must include a method `compute_cmp_state` that computes the loss and constraints at a given point.

For details on the use of proxy-constraints and the `proxy_ineq_defect` and
`proxy_eq_defect` attributes, please see {ref}`lagrangian_formulations`.

## Constrained Minimization Problem

```{eval-rst}
.. autoclass:: ConstrainedMinimizationProblem
:members:
```
Moreover, in order to package the values of the loss and constraints, we will define the following objects:
- {py:class}`~cooper.constraints.ConstraintState`: represents the state of a {py:class}`~cooper.constraints.Constraint` by packaging its violation.
- {py:class}`~cooper.CMPState`: represents the state of a CMP at a given point. It contains the values of the loss and {py:class}`~cooper.constraints.ConstraintState` objects for some or all of its associated constraints.

## Example

The example below illustrates the main steps that need to be carried out to
define a `ConstrainedMinimizationProblem` in **Cooper**.
define a {py:class}`~cooper.ConstrainedMinimizationProblem` class.

1. *\[Line 4\]* Define a custom class which inherits from {py:class}`ConstrainedMinimizationProblem`.
2. *\[Line 10\]* Write a closure function that computes the loss and constraints.
3. *\[Line 14\]* Note how the `misc` attribute can be used to store previous results.
4. *\[Line 18\]* Return the information about the loss and constraints packaged into a {py:class}`CMPState`.
5. *\[Line 18\]* (Optional) Modularize the code to allow for evaluating the constraints `only`.
1. *\[Line 4\]* Define a custom class which inherits from {py:class}`~cooper.ConstrainedMinimizationProblem`.
2. *\[Line 6\]* Define a multiplier object for the constraints.
3. *\[Line 8\]* Define the constraint object.
4. *\[Line 10\]* Implement the `compute_cmp_state` method that computes the loss and constraints.
5. *\[Line 12\]* Return the information about the loss and constraints packaged into a {py:class}`~cooper.CMPState`.
6. *\[Line 18\]* (Optional) Modularize the code to allow for evaluating the constraints **only**. This is useful for optimization algorithms that sometimes need to evaluate the constraints without computing the loss.

```{code-block} python
:emphasize-lines: 4,10,14,18,20
import torch
import cooper

class MyCustomCMP(cooper.ConstrainedMinimizationProblem):
    def __init__(self, problem_attributes, criterion):
        self.problem_attributes = problem_attributes
        self.criterion = criterion

class MyCMP(cooper.ConstrainedMinimizationProblem):
    def __init__(self):
        super().__init__()

    def closure(self, model, inputs, targets):
        cmp_state = self.defect_fn(model, inputs, targets)
        logits = cmp_state.misc["logits"]
        loss = self.criterion(logits, targets)

        multiplier = cooper.multipliers.DenseMultiplier(num_constraints=..., device=...)
        # By default, constraints are built using `formulation_type=cooper.LagrangianFormulation`
        self.constraint = cooper.Constraint(
            multiplier=multiplier, constraint_type=cooper.ConstraintType.INEQUALITY
        )

    def compute_cmp_state(self, model, inputs, targets):
        loss = ...
        cmp_state = self.compute_violations(model, inputs, targets)
        cmp_state.loss = loss
        return cmp_state

    def defect_fn(self, model, inputs, targets):
    def compute_violations(self, model, inputs, targets):
        # This method is optional. It allows for evaluating the constraints without computing the loss.
        violation = ...  # ensure that the constraint follows the convention "g <= 0"
        constraint_state = cooper.ConstraintState(violation=...)
        observed_constraints = {self.constraint: constraint_state}

        logits = model.forward(inputs)

        return cooper.CMPState(loss=None, observed_constraints=observed_constraints)
```
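The split between `compute_cmp_state` and `compute_violations` illustrated above can be sketched with plain-Python stand-ins; the `State` and `Problem` classes below are hypothetical placeholders, not part of **Cooper**'s API:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class State:
    """Hypothetical stand-in for a CMPState: a loss plus constraint violations."""
    loss: Optional[float] = None
    violations: dict = field(default_factory=dict)

class Problem:
    """Hypothetical stand-in for a ConstrainedMinimizationProblem subclass."""

    def compute_violations(self, x: float) -> State:
        # Constraint x - 1 <= 0, following the "g <= 0" convention.
        return State(loss=None, violations={"g": x - 1.0})

    def compute_cmp_state(self, x: float) -> State:
        # Reuse compute_violations and fill in the loss on top of it.
        state = self.compute_violations(x)
        state.loss = x ** 2
        return state
```

An algorithm that only needs constraint values (for example, a dual-variable update) could call `compute_violations` directly and skip the loss computation.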

const_level0, const_level1 = self.problem_attributes

# Remember to write the constraints using the convention "g <= 0"!
## Constraints

# (Greater than) Inequality that only depends on the model properties or parameters
# g_0 >= const_level0 --> const_level0 - g_0 <= 0
defect0 = const_level0 - ineq_const0(model)
{py:class}`~cooper.constraints.Constraint` objects are used to group similar constraints together. While it is possible to have multiple constraints represented by the same {py:class}`~cooper.constraints.Constraint` object, they must share the same type (i.e., all equality or all inequality constraints) and all must be handled through the same {py:class}`~cooper.formulation.Formulation` (for example, all with a Lagrangian formulation). For combining different types of constraints or formulations, you should use separate {py:class}`~cooper.constraints.Constraint` objects.

# (Less than) Inequality that depends on the model's predictions
# g_1 <= const_level1 --> g_1 - const_level1 <= 0
defect1 = ineq_const1(logits) - const_level1
```{eval-rst}
.. currentmodule:: cooper.constraints
```

# We recommend using torch.stack to ensure the dependencies in the computational
# graph are properly preserved.
ineq_defect = torch.stack([defect0, defect1])

return cooper.CMPState(ineq_defect=ineq_defect, eq_defect=None, misc={'logits': logits})
```{eval-rst}
.. autoclass:: Constraint
:members: as_tuple
```

:::{warning}
**Cooper** is primarily oriented towards **non-convex** CMPs that arise
in many machine/deep learning settings. That is, problems for which one of
the functions $f, g, h$ or the set $\Omega$ is non-convex.

Whenever possible, we provide references to appropriate literature
describing convergence results for our implemented methods (under suitable
assumptions). In general, however, the use of Lagrangian-based approaches
for solving non-convex CMPs does not come with guarantees regarding
optimality or feasibility.

Some theoretical results can be obtained when considering mixed strategies
(distributions over actions for the primal and dual players), or by relaxing
the game-theoretic solution concept (i.e. aiming for approximate/correlated
equilibria), even for problems which are non-convex on the primal (model)
parameters. For more details, see the work of {cite:t}`cotter2019JMLR` and
{cite:t}`lin2020gradient` and references therein. We plan to include some
of these techniques in future versions of **Cooper**.

If you are dealing with optimization problems under "nicely behaved" convex
constraints (e.g. cones or $L_p$-balls) we encourage you to check out
[CHOP](https://github.com/openopt/chop). If your problem involves "manifold"
constraints (e.g. orthogonal or PSD matrices), you might consider using
[GeoTorch](https://github.com/Lezcano/geotorch).
:::
In their simplest form, {py:class}`~cooper.constraints.ConstraintState` objects contain only the value of the constraint violation. However, they can be extended to enable extra functionality:
- **Sampled constraints**: if not all violations of a {py:class}`Constraint` are observed at every step, you can still use **Cooper** by providing the observed constraint violations in the {py:class}`~cooper.constraints.ConstraintState`. To do this, provide only the observed violations in `violation`, their corresponding indices in `constraint_features`, and make sure that you are using an {py:class}`~cooper.multipliers.IndexedMultiplier` as the multiplier associated with the constraint. **Cooper** will then know which entries to consider when computing contributions of the constraint to the Lagrangian, and which to ignore.
- **Implicit parameterization of the Lagrange multipliers** {cite:p}`narasimhan2020multiplier`: similar to the sampled constraints case, you can use an implicit parameterization for the Lagrange multipliers (a neural network, for example). In this case, the `constraint_features` must contain the input features to the Lagrange multiplier model associated with the evaluated constraints. Implicit multipliers are discussed in more detail in {doc}`multipliers`.
- **Proxy constraints** {cite:p}`cotter2019proxy`: in some settings, it is desirable to use different constraint violations for updating the primal and dual variables. This can be achieved by providing a `violation`, which will be used for updating the primal variables, and a `strict_violation`, which will be used for updating the dual variables. When following this approach, ensure that the `violation` is differentiable with respect to the primal variables. Note that proxy constraints can be used in conjunction with sampled constraints and implicit parameterization of the Lagrange multipliers, by providing both `constraint_features` and `strict_constraint_features`.
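The bookkeeping behind sampled constraints can be illustrated without **Cooper** itself. In this hypothetical plain-Python sketch, only the observed entries contribute to the Lagrangian term, mirroring what providing `constraint_features` together with an {py:class}`~cooper.multipliers.IndexedMultiplier` enables:

```python
# One Lagrange multiplier per constraint (three constraints in total).
multipliers = [1.0, 2.0, 3.0]

# At this step, only constraints 0 and 2 were evaluated.
observed_indices = [0, 2]
observed_violations = [0.5, -0.1]  # violations of g_0 and g_2; g_1 was not observed

# Only the observed entries contribute to the Lagrangian term;
# the unobserved constraint g_1 is simply ignored.
lagrangian_term = sum(
    multipliers[i] * v for i, v in zip(observed_indices, observed_violations)
)
# 1.0 * 0.5 + 3.0 * (-0.1) ≈ 0.2
```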

```{eval-rst}
.. currentmodule:: cooper.formulation
.. autoclass:: ConstraintState
:members: as_tuple
```

## Formulation

TODO: move somewhere else?
## CMP objects

Formulations denote mathematical or algorithmic techniques aimed at solving a
specific (family of) CMP. **Cooper** is heavily (but not exclusively!) designed
for an easy integration of Lagrangian-based formulations. You can find more
details in {doc}`lagrangian_formulation`.
```{eval-rst}
.. currentmodule:: cooper
```

{py:class}`ConstrainedMinimizationProblem` objects must be implemented by the user, as exemplified in the [example](#example) above.

```{eval-rst}
.. autoclass:: Formulation
.. autoclass:: ConstrainedMinimizationProblem
:members:
```

## CMPState

We represent the "state" of a CMP computationally using a {py:class}`CMPState`
object. A {py:class}`CMPState` is a dataclass containing information about the
loss and the equality/inequality violations at a given point $\mathbf{x}$. The constraints included in the `CMPState` must be passed as a dictionary, where the keys are {py:class}`Constraint` objects and the values are the associated {py:class}`ConstraintState` objects.

:::{admonition} Stochastic estimates in `CMPState`
:class: important

In problems for which computing the loss or constraints exactly is prohibitively
expensive, the {py:class}`CMPState` may contain stochastic estimates of the
loss/constraints. For example, this is the case when the loss corresponds to a
sum over a large number of terms, such as training examples. In this case, the
loss and constraints may be estimated using mini-batches of data.

Note that, just as in the unconstrained case, these approximations can
entail a compromise in the stability of the optimization process.
:::
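The mini-batch estimation described above can be sketched in plain Python with toy data (the function names and data are hypothetical):

```python
import random

# Toy dataset: e.g. per-example quantities entering the loss or a constraint.
data = list(range(100))

def full_average(w=0.0):
    # Exact value: an average over the full dataset (may be expensive).
    return sum((d - w) ** 2 for d in data) / len(data)

def minibatch_average(batch_size=10, w=0.0, seed=0):
    # Stochastic estimate: the same average computed over a random mini-batch.
    rng = random.Random(seed)
    batch = rng.sample(data, batch_size)
    return sum((d - w) ** 2 for d in batch) / batch_size
```

The mini-batch value is an unbiased estimate of the full average, but its variance is precisely what can compromise the stability of the optimization process.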

```{eval-rst}
.. autoclass:: CMPState
:members: as_tuple
```
2 changes: 1 addition & 1 deletion docs/source/references.bib
@@ -15,7 +15,7 @@ @book{bertsekas1999NonlinearProgramming
publisher = {{Athena scientific}},
address = {{Belmont, Mass}},
}
@article{cotter2019JMLR,
@article{cotter2019proxy,
author = {Andrew Cotter and Heinrich Jiang and Maya Gupta and Serena Wang and Taman Narayan and Seungil You and Karthik Sridharan},
title = {{Optimization with Non-Differentiable Constraints with Applications to Fairness, Recall, Churn, and Other Goals}},
journal = {Journal of Machine Learning Research},
4 changes: 2 additions & 2 deletions src/cooper/constraints/constraint.py
@@ -10,8 +10,8 @@ class Constraint:
    """This class is used to define a constraint in the optimization problem.

    Args:
        constraint_type: One of `cooper.ConstraintType.EQUALITY` or
            `cooper.ConstraintType.INEQUALITY`.
        constraint_type: One of :py:class:`cooper.ConstraintType.EQUALITY` or
            :py:class:`cooper.ConstraintType.INEQUALITY`.
        multiplier: The Lagrange multiplier associated with the constraint.
        formulation_type: The type of formulation for the constrained optimization
            problem. Must be a subclass of :py:class:`~cooper.formulations.Formulation`.
2 changes: 1 addition & 1 deletion tests/constraints/test_constraint_state.py
@@ -87,7 +87,7 @@ def test_constraint_state_initialization(

def test_constraint_state_initialization_failure(violation, strict_constraint_features):
    with pytest.raises(
        ValueError, match="strict_violation must be provided if strict_constraint_features is provided."
        ValueError, match="`strict_violation` must be provided if `strict_constraint_features` is provided."
    ):
        cooper.ConstraintState(violation=violation, strict_constraint_features=strict_constraint_features)

