diff --git a/docs/source/constrained_optimizer.md b/docs/source/constrained_optimizer.md
index 412d5f9b..d688581b 100644
--- a/docs/source/constrained_optimizer.md
+++ b/docs/source/constrained_optimizer.md
@@ -1,3 +1,5 @@
+(optim)=
+
 # Constrained Optimizer
 
 ```{eval-rst}
diff --git a/docs/source/formulations.md b/docs/source/formulations.md
new file mode 100644
index 00000000..a023cb6a
--- /dev/null
+++ b/docs/source/formulations.md
@@ -0,0 +1,18 @@
+
+```{eval-rst}
+.. currentmodule:: cooper.formulation
+```
+
+## Formulation
+
+TODO: move somewhere else?
+
+Formulations denote mathematical or algorithmic techniques aimed at solving a
+specific (family of) CMP. **Cooper** is heavily (but not exclusively!) designed
+for an easy integration of Lagrangian-based formulations. You can find more
+details in {doc}`lagrangian_formulation`.
+
+```{eval-rst}
+.. autoclass:: Formulation
+    :members:
+```
diff --git a/docs/source/index.md b/docs/source/index.md
index 9a766dc1..f1fb03c5 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -17,7 +17,7 @@ notebooks/index
 :maxdepth: 2
 
 problem
-lagrangian_formulation
+formulations
 constrained_optimizer
 optim
 multipliers
diff --git a/docs/source/problem.md b/docs/source/problem.md
index 4e2594b3..6fef2b86 100644
--- a/docs/source/problem.md
+++ b/docs/source/problem.md
@@ -1,10 +1,6 @@
-```{eval-rst}
-.. currentmodule:: cooper.problem
-```
-
 (cmp)=
 
-# Constrained Minimization Problem
+# Constrained Minimization Problems
 
 We consider constrained minimization problems (CMPs) expressed as:
 
@@ -13,79 +9,47 @@ $$
 & \,\, \mathbf{g}(\mathbf{x}) \le \mathbf{0} \\ & \,\, \mathbf{h}(\mathbf{x}) = \mathbf{0}
 $$
 
-Here $\Omega$ represents the domain of definition of the functions
-$f, \mathbf{g}$ and $\mathbf{h}$. Note that $f$ is a scalar-valued function, whereas
-$\mathbf{g}$ and $\mathbf{h}$ are vector-valued functions. We group together all the
-inequality constraints in $\mathbf{g}$ and all the equality constraints in $\mathbf{h}$.
+Here $\Omega$ represents the domain of definition of the functions $f, \mathbf{g}$ and $\mathbf{h}$. Note that $f$ is a scalar-valued function, whereas $\mathbf{g}$ and $\mathbf{h}$ are vector-valued functions. We group together all the inequality constraints in $\mathbf{g}$ and all the equality constraints in $\mathbf{h}$.
 In other words, a component function $h_i(x)$ corresponds to the scalar constraint
 $h_i(\mathbf{x}) = 0$.
 
-:::{admonition} Brief notes on conventions and terminology
+:::{admonition} Conventions and terminology
 
 - We refer to $f$ as the **loss** or **objective** to be minimized.
-- We adopt the convention $g(\mathbf{x}) \le 0$ for inequality constraints and
-  $h(\mathbf{x}) = 0$ for equality constraints. If your constraints are different,
-  for example $g(\mathbf{x}) \ge \epsilon$, you should provide **Cooper** with
-  $\epsilon - g(\mathbf{x}) \le 0$.
-- We use the term **constraint violation** to refer to $\mathbf{g}(\mathbf{x})$ and
-    $\mathbf{h}(\mathbf{x})$.
-  that equality constraints $h(x)$ are satisfied *only* when their
-  defect is zero. On the other hand, a *negative* defect for an inequality
-  constraint  $g(x)$ means that the constraint is *strictly* satisfied;
-  while a *positive* defect means that the inequality constraint is being
-  violated.
+- We adopt the convention $g(\mathbf{x}) \le 0$ for inequality constraints and $h(\mathbf{x}) = 0$ for equality constraints. If your constraints are different, for example $g(\mathbf{x}) \ge \epsilon$, you should provide **Cooper** with $\epsilon - g(\mathbf{x}) \le 0$.
+- We use the term **constraint violation** to refer to $\mathbf{g}(\mathbf{x})$ and $\mathbf{h}(\mathbf{x})$. An equality constraint $h(\mathbf{x})$ is satisfied *only* when its violation is zero. On the other hand, a *negative* violation for an inequality constraint $g(\mathbf{x})$ means that the constraint is *strictly* satisfied, while a *positive* violation means that the inequality constraint is being violated.
 :::
 
-## Constraints
-TODO
-
-
-## CMP State
-
-We represent computationally the "state" of a CMP using a {py:class}`CMPState`
-object. A `CMPState` is a {py:class}`dataclasses.dataclass` which contains the
-information about the loss and equality/inequality violations at a given point
-$x$. If a problem has no equality or inequality constraints, these
-arguments can be omitted in the creation of the `CMPState`.
-
-:::{admonition} Stochastic estimates in `CMPState`
-:class: important
+TODO: brief intro to Lagrangian formulations and multipliers. Needed for context below. Define primal and dual variables.
 
-In problems for which computing the loss or constraints exactly is prohibitively
-expensive, the {py:class}`CMPState` may contain stochastic estimates of the
-loss/constraints. For example, this is the case when the loss corresponds to a
-sum over a large number of terms, such as training examples. In this case, the
-loss and constraints may be estimated using mini-batches of data.
-
-Note that, just as in the unconstrained case, these approximations can
-entail a compromise in the stability of the optimization process.
+:::{warning}
+**Cooper** is primarily oriented towards **non-convex** CMPs that arise
+in many deep learning applications. That is, problems for which at least one of
+the functions $f, \mathbf{g}, \mathbf{h}$ is non-convex. While the techniques
+implemented in **Cooper** are applicable to convex problems as well, we
+recommend using specialized solvers for convex optimization problems whenever
+possible.
 :::
 
-```{eval-rst}
-.. autoclass:: CMPState
-    :members: as_tuple
-```
+In order to express CMPs, we will define the following objects:
+- {py:class}`~cooper.constraints.Constraint`: represents a group of constraints, either equality or inequality.
+- {py:class}`~cooper.ConstrainedMinimizationProblem`: represents the constrained minimization problem itself. It must include a method `compute_cmp_state` that computes the loss and constraints at a given point.
 
-For details on the use of proxy-constraints and the `proxy_ineq_defect` and
-`proxy_eq_defect` attributes, please see {ref}`lagrangian_formulations`.
-
-## Constrained Minimization Problem
-
-```{eval-rst}
-.. autoclass:: ConstrainedMinimizationProblem
-    :members:
-```
+Moreover, in order to package the values of the loss and constraints, we will define the following objects:
+- {py:class}`~cooper.constraints.ConstraintState`: represents the state of a {py:class}`~cooper.constraints.Constraint` by packaging its violation.
+- {py:class}`~cooper.CMPState`: represents the state of a CMP at a given point. It contains the values of the loss and {py:class}`~cooper.constraints.ConstraintState` objects for some or all of its associated constraints.
 
 ## Example
 
 The example below illustrates the main steps that need to be carried out to
-define a `ConstrainedMinimizationProblem` in **Cooper**.
+define a {py:class}`~cooper.ConstrainedMinimizationProblem` class.
 
-1. *\[Line 4\]* Define a custom class which inherits from {py:class}`ConstrainedMinimizationProblem`.
-2. *\[Line 10\]* Write a closure function that computes the loss and constraints.
-3. *\[Line 14\]* Note how the `misc` attribute can be use to store previous results.
-4. *\[Line 18\]* Return the information about the loss and constraints packaged into a {py:class}`CMPState`.
-5. *\[Line 18\]* (Optional) Modularize the code to allow for evaluating the constraints `only`.
+1. *\[Line 4\]* Define a custom class which inherits from {py:class}`~cooper.ConstrainedMinimizationProblem`.
+2. *\[Line 7\]* Define a multiplier object for the constraints.
+3. *\[Line 9\]* Define the constraint object.
+4. *\[Line 13\]* Implement the `compute_cmp_state` method that computes the loss and constraints.
+5. *\[Line 18\]* Return the information about the loss and constraints packaged into a {py:class}`~cooper.CMPState`.
+6. *\[Line 20\]* (Optional) Modularize the code to allow for evaluating the constraints **only**. This is useful for optimization algorithms that sometimes need to evaluate the constraints without computing the loss.
 
 ```{code-block} python
 :emphasize-lines: 4,10,14,18,20
@@ -94,85 +58,90 @@ define a `ConstrainedMinimizationProblem` in **Cooper**.
 import torch
 import cooper
 
-class MyCustomCMP(cooper.ConstrainedMinimizationProblem):
-    def __init__(self, problem_attributes, criterion):
-        self.problem_attributes = problem_attributes
-        self.criterion = criterion
+class MyCMP(cooper.ConstrainedMinimizationProblem):
+    def __init__(self):
         super().__init__()
-
-    def closure(self, model, inputs, targets):
-
-        cmp_state = self.defect_fn(model, inputs, targets)
-
-        logits = cmp_state.misc["logits"]
-        loss = self.criterion(logits, targets)
+        multiplier = cooper.multipliers.DenseMultiplier(num_constraints=..., device=...)
+        # By default, constraints are built using `formulation_type=cooper.LagrangianFormulation`
+        self.constraint = cooper.Constraint(
+            multiplier=multiplier, constraint_type=cooper.ConstraintType.INEQUALITY
+        )
+
+    def compute_cmp_state(self, model, inputs, targets):
+        loss = ...
+        cmp_state = self.compute_violations(model, inputs, targets)
         cmp_state.loss = loss
 
         return cmp_state
 
-    def defect_fn(self, model, inputs, targets):
+    def compute_violations(self, model, inputs, targets):
+        # This method is optional. It allows for evaluating the constraints without computing the loss.
+        violation = ...  # ensure that the constraint follows the convention "g <= 0"
+        constraint_state = cooper.ConstraintState(violation=violation)
+        observed_constraints = {self.constraint: constraint_state}
 
-        logits = model.forward(inputs)
+        return cooper.CMPState(loss=None, observed_constraints=observed_constraints)
+```
 
-        const_level0, const_level1 = self.problem_attributes
 
-        # Remember to write the constraints using the convention "g <= 0"!
+## Constraints
 
-        # (Greater than) Inequality that only depends on the model properties or parameters
-        # g_0 >= const_level0 --> const_level0 - g_0 <= 0
-        defect0 = const_level0 - ineq_const0(model)
+{py:class}`~cooper.constraints.Constraint` objects are used to group similar constraints together. While it is possible to have multiple constraints represented by the same {py:class}`~cooper.constraints.Constraint` object, they must share the same type (i.e., all equality or all inequality constraints) and all must be handled through the same {py:class}`~cooper.formulation.Formulation` (for example, all with a Lagrangian formulation). For combining different types of constraints or formulations, you should use separate {py:class}`~cooper.constraints.Constraint` objects.
 
-        # (Less than) Inequality that depends on the model's predictions
-        # g_1 <= const_level1 --> g_2  - const_level1 <= 0
-        defect1 = ineq_const1(logits) - const_level1
+```{eval-rst}
+.. currentmodule:: cooper.constraints
+```
 
-        # We recommend using torch.stack to ensure the dependencies in the computational
-        # graph are properly preserved.
-        ineq_defect = torch.stack([defect0, defect1])
 
-        return cooper.CMPState(ineq_defect=ineq_defect, eq_defect=None, misc={'logits': logits})
+```{eval-rst}
+.. autoclass:: Constraint
+    :members: as_tuple
 ```
 
-:::{warning}
-**Cooper** is primarily oriented towards **non-convex** CMPs that arise
-in many machine/deep learning settings. That is, problems for which one of
-the functions $f, g, h$ or the set $\Omega$ is non-convex.
-
-Whenever possible, we provide references to appropriate literature
-describing convergence results for our implemented (under suitable
-assumptions). In general, however, the use of Lagrangian-based approaches
-for solving non-convex CMPs does not come with guarantees regarding
-optimality or feasibility.
-
-Some theoretical results can be obtained when considering mixed strategies
-(distributions over actions for the primal and dual players), or by relaxing
-the game-theoretic solution concept (i.e. aiming for approximate/correlated
-equilibria), even for problems which are non-convex on the primal (model)
-parameters. For more details, see the work of {cite:t}`cotter2019JMLR` and
-{cite:t}`lin2020gradient` and references therein. We plan to include some
-of these techniques in future versions of **Cooper**.
-
-If you are dealing with optimization problems under "nicely behaved" convex
-constraints (e.g. cones or $L_p$-balls) we encourage you to check out
-[CHOP](https://github.com/openopt/chop). If your problems involves "manifold"
-constraints (e.g. orthogonal or PSD matrices), you might consider using
-[GeoTorch](https://github.com/Lezcano/geotorch).
-:::
+In their simplest form, {py:class}`~cooper.constraints.ConstraintState` objects contain only the value of the constraint violation. However, they can be extended to enable extra functionality:
+- **Sampled constraints**: if not all violations of a {py:class}`Constraint` are observed at every step, you can still use **Cooper** by providing the observed constraint violations in the {py:class}`~cooper.constraints.ConstraintState`. To do this, provide only the observed violations in `violation`, their corresponding indices in `constraint_features`, and make sure that you are using an {py:class}`~cooper.multipliers.IndexedMultiplier` as the multiplier associated with the constraint. **Cooper** will then know which entries to consider when computing contributions of the constraint to the Lagrangian, and which to ignore.
+- **Implicit parameterization of the Lagrange multipliers** {cite:p}`narasimhan2020multiplier`: similar to the sampled constraints case, you can use an implicit parameterization for the Lagrange multipliers (a neural network, for example). In this case, the `constraint_features` must contain the input features to the Lagrange multiplier model associated with the evaluated constraints. Implicit multipliers are discussed in more detail in {doc}`multipliers`.
+- **Proxy constraints** {cite:p}`cotter2019proxy`: in some settings, it is desirable to use different constraint violations for updating the primal and dual variables. This can be achieved by providing a `violation`, which will be used for updating the primal variables, and a `strict_violation`, which will be used for updating the dual variables. When following this approach, ensure that the `violation` is differentiable with respect to the primal variables. Note that proxy constraints can be used in conjunction with sampled constraints and implicit parameterization of the Lagrange multipliers, by providing both `constraint_features` and `strict_constraint_features`.
 
 ```{eval-rst}
-.. currentmodule:: cooper.formulation
+.. autoclass:: ConstraintState
+    :members: as_tuple
 ```
 
-## Formulation
 
-TODO: move somewhere else?
+## CMP objects
 
-Formulations denote mathematical or algorithmic techniques aimed at solving a
-specific (family of) CMP. **Cooper** is heavily (but not exclusively!) designed
-for an easy integration of Lagrangian-based formulations. You can find more
-details in {doc}`lagrangian_formulation`.
+```{eval-rst}
+.. currentmodule:: cooper
+```
+
+{py:class}`ConstrainedMinimizationProblem` objects must be implemented by the user, as illustrated in the [example](#example) above.
 
 ```{eval-rst}
-.. autoclass:: Formulation
+.. autoclass:: ConstrainedMinimizationProblem
     :members:
 ```
+
+## CMPState
+
+We represent the "state" of a CMP computationally using a {py:class}`CMPState`
+object. A {py:class}`CMPState` is a dataclass containing the information about the
+loss and the equality/inequality violations at a given point $\mathbf{x}$. The constraints included in the `CMPState` must be passed as a dictionary, where the keys are the {py:class}`Constraint` objects and the values are the associated {py:class}`ConstraintState` objects.
+
+:::{admonition} Stochastic estimates in `CMPState`
+:class: important
+
+In problems for which computing the loss or constraints exactly is prohibitively
+expensive, the {py:class}`CMPState` may contain stochastic estimates of the
+loss/constraints. For example, this is the case when the loss corresponds to a
+sum over a large number of terms, such as training examples. In this case, the
+loss and constraints may be estimated using mini-batches of data.
+
+Note that, just as in the unconstrained case, these approximations can
+entail a compromise in the stability of the optimization process.
+:::
+
+```{eval-rst}
+.. autoclass:: CMPState
+    :members: as_tuple
+```
diff --git a/docs/source/references.bib b/docs/source/references.bib
index aa85570d..db52cb17 100644
--- a/docs/source/references.bib
+++ b/docs/source/references.bib
@@ -15,7 +15,7 @@ @book{bertsekas1999NonlinearProgramming
   publisher = {{Athena scientific}},
   address = {{Belmont, Mass}},
 }
-@article{cotter2019JMLR,
+@article{cotter2019proxy,
   author  = {Andrew Cotter and Heinrich Jiang and Maya Gupta and Serena Wang and Taman Narayan and Seungil You and Karthik Sridharan},
   title   = {{Optimization with Non-Differentiable Constraints with Applications to Fairness, Recall, Churn, and Other Goals}},
   journal = {Journal of Machine Learning Research},
diff --git a/src/cooper/constraints/constraint.py b/src/cooper/constraints/constraint.py
index 5b986821..53534b42 100644
--- a/src/cooper/constraints/constraint.py
+++ b/src/cooper/constraints/constraint.py
@@ -10,8 +10,8 @@ class Constraint:
     """This class is used to define a constraint in the optimization problem.
 
     Args:
-        constraint_type: One of `cooper.ConstraintType.EQUALITY` or
-            `cooper.ConstraintType.INEQUALITY`.
+        constraint_type: One of :py:attr:`cooper.ConstraintType.EQUALITY` or
+            :py:attr:`cooper.ConstraintType.INEQUALITY`.
         multiplier: The Lagrange multiplier associated with the constraint.
         formulation_type: The type of formulation for the constrained optimization
             problem. Must be a subclass of :py:class:`~cooper.formulations.Formulation`.
diff --git a/tests/constraints/test_constraint_state.py b/tests/constraints/test_constraint_state.py
index fe224ba0..03f7956d 100644
--- a/tests/constraints/test_constraint_state.py
+++ b/tests/constraints/test_constraint_state.py
@@ -87,7 +87,7 @@ def test_constraint_state_initialization(
 
 def test_constraint_state_initialization_failure(violation, strict_constraint_features):
     with pytest.raises(
-        ValueError, match="strict_violation must be provided if strict_constraint_features is provided."
+        ValueError, match="`strict_violation` must be provided if `strict_constraint_features` is provided."
     ):
         cooper.ConstraintState(violation=violation, strict_constraint_features=strict_constraint_features)