diff --git a/docs/make.jl b/docs/make.jl
index feec4e056..599f21b66 100644
--- a/docs/make.jl
+++ b/docs/make.jl
@@ -90,7 +90,6 @@ makedocs(; sitename="Lux.jl Docs",
         repo="github.com/LuxDL/Lux.jl", devbranch="main", devurl="dev",
         deploy_url="https://lux.csail.mit.edu", deploy_decision),
     draft=false,
-    warnonly=:linkcheck, # Lately it has been failing quite a lot but those links are actually fine
     pages)
 
 deploydocs(; repo="github.com/LuxDL/Lux.jl.git",
diff --git a/docs/src/manual/autodiff.md b/docs/src/manual/autodiff.md
index a8e16ed53..8413fb070 100644
--- a/docs/src/manual/autodiff.md
+++ b/docs/src/manual/autodiff.md
@@ -14,7 +14,7 @@ Lux. Additionally, we provide some convenience functions for working with AD.
 | [`ForwardDiff.jl`](https://github.com/JuliaDiff/ForwardDiff.jl) | Forward | ✔️ | ✔️ | ✔️ | Tier I |
 | [`ReverseDiff.jl`](https://github.com/JuliaDiff/ReverseDiff.jl) | Reverse | ✔️ | ❌ | ❌ | Tier II |
 | [`Tracker.jl`](https://github.com/FluxML/Tracker.jl) | Reverse | ✔️ | ✔️ | ❌ | Tier II |
-| [`Tapir.jl`](https://github.com/withbayes/Tapir.jl) | Reverse | ❓[^q] | ❌ | ❌ | Tier III |
+| [`Tapir.jl`](https://github.com/compintell/Tapir.jl) | Reverse | ❓[^q] | ❌ | ❌ | Tier III |
 | [`Diffractor.jl`](https://github.com/JuliaDiff/Diffractor.jl) | Forward | ❓[^q] | ❓[^q] | ❓[^q] | Tier III |
 
 [^e]: Currently Enzyme outperforms other AD packages in terms of CPU performance. However,
diff --git a/docs/src/manual/nested_autodiff.md b/docs/src/manual/nested_autodiff.md
index 9ea529fdf..f373a8f95 100644
--- a/docs/src/manual/nested_autodiff.md
+++ b/docs/src/manual/nested_autodiff.md
@@ -192,9 +192,9 @@ nothing; # hide
 
 Hutchinson Trace Estimation often shows up in machine learning literature to provide a fast
 estimate of the trace of a Jacobian Matrix. This is based off of
-[Hutchinson 1990](https://www.researchgate.net/publication/243668757_A_Stochastic_Estimator_of_the_Trace_of_the_Influence_Matrix_for_Laplacian_Smoothing_Splines) which
-computes the estimated trace of a matrix ``A \in \mathbb{R}^{D \times D}`` using random
-vectors ``v \in \mathbb{R}^{D}`` s.t. ``\mathbb{E}\left[v v^T\right] = I``.
+[Hutchinson 1990](https://www.nowozin.net/sebastian/blog/thoughts-on-trace-estimation-in-deep-learning.html)
+which computes the estimated trace of a matrix ``A \in \mathbb{R}^{D \times D}`` using
+random vectors ``v \in \mathbb{R}^{D}`` s.t. ``\mathbb{E}\left[v v^T\right] = I``.
 
 ```math
 \text{Tr}(A) = \mathbb{E}\left[v^T A v\right] = \frac{1}{V} \sum_{i = 1}^V v_i^T A v_i
diff --git a/docs/src/manual/performance_pitfalls.md b/docs/src/manual/performance_pitfalls.md
index 24a17dc14..92be45e0b 100644
--- a/docs/src/manual/performance_pitfalls.md
+++ b/docs/src/manual/performance_pitfalls.md
@@ -67,4 +67,4 @@ GPUArraysCore.allowscalar(false)
 `Lux.jl` is integrated with `DispatchDoctor.jl` to catch type instabilities. You can easily
 enable it by setting the `instability_check` preference. This will help you catch type
 instabilities in your code. For more information on how to set preferences, check out
-[`set_dispatch_doctor_preferences`](@ref).
+[`Lux.set_dispatch_doctor_preferences!`](@ref).
diff --git a/docs/src/manual/preferences.md b/docs/src/manual/preferences.md
index 88117b2ad..eaea213ee 100644
--- a/docs/src/manual/preferences.md
+++ b/docs/src/manual/preferences.md
@@ -50,8 +50,8 @@ By default, both of these preferences are set to `false`.
 ## [Dispatch Doctor](@id dispatch-doctor-preference)
 
 1. `instability_check` - Preference controlling the dispatch doctor. See the documentation
-   on [`set_dispatch_doctor_preferences!`](@ref) for more details. The preferences need to
-   be set for `LuxCore` and `LuxLib` packages. Both of them default to `disable`.
+   on [`Lux.set_dispatch_doctor_preferences!`](@ref) for more details. The preferences need
+   to be set for `LuxCore` and `LuxLib` packages. Both of them default to `disable`.
   - Setting the `LuxCore` preference sets the check at the level of `LuxCore.apply`. This
     essentially activates the dispatch doctor for all Lux layers.
   - Setting the `LuxLib` preference sets the check at the level of functional layer of
diff --git a/examples/Basics/main.jl b/examples/Basics/main.jl
index ec2696365..2ce2fd9df 100644
--- a/examples/Basics/main.jl
+++ b/examples/Basics/main.jl
@@ -3,7 +3,7 @@
 # This is a quick intro to [Lux](https://github.com/LuxDL/Lux.jl) loosely based on:
 #
 # 1. [PyTorch's tutorial](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html).
-# 2. [Flux's tutorial](https://fluxml.ai/Flux.jl/stable/tutorials/2020-09-15-deep-learning-flux/).
+# 2. Flux's tutorial (the link for which has now been lost to abyss).
 # 3. [Jax's tutorial](https://jax.readthedocs.io/en/latest/jax-101/index.html).
 #
 # It introduces basic Julia programming, as well `Zygote`, a source-to-source automatic
diff --git a/examples/BayesianNN/main.jl b/examples/BayesianNN/main.jl
index 4e38e2ee0..31e1635de 100644
--- a/examples/BayesianNN/main.jl
+++ b/examples/BayesianNN/main.jl
@@ -1,11 +1,13 @@
 # # Bayesian Neural Network
 
 # We borrow this tutorial from the
-# [official Turing Docs](https://turinglang.org/stable/tutorials/03-bayesian-neural-network/). We
-# will show how the explicit parameterization of Lux enables first-class composability with
-# packages which expect flattened out parameter vectors.
+# [official Turing Docs](https://turinglang.org/docs/tutorials/03-bayesian-neural-network/index.html).
+# We will show how the explicit parameterization of Lux enables first-class composability
+# with packages which expect flattened out parameter vectors.
 
-# We will use [Turing.jl](https://turinglang.org/stable/) with [Lux.jl](https://lux.csail.mit.edu/)
+# Note: The tutorial in the official Turing docs is now using Lux instead of Flux.
+
+# We will use [Turing.jl](https://turinglang.org/) with [Lux.jl](https://lux.csail.mit.edu/)
 # to implement implementing a classification algorithm. Lets start by importing the relevant
 # libraries.
 
diff --git a/examples/SymbolicOptimalControl/main.jl b/examples/SymbolicOptimalControl/main.jl
index 4fd5dc07d..7a96eb1a0 100644
--- a/examples/SymbolicOptimalControl/main.jl
+++ b/examples/SymbolicOptimalControl/main.jl
@@ -2,8 +2,8 @@
 
 # This tutorial is based on [SciMLSensitivity.jl tutorial](https://docs.sciml.ai/SciMLSensitivity/stable/examples/optimal_control/optimal_control/).
 # Instead of using a classical NN architecture, here we will combine the NN with a symbolic
-# expression from [DynamicExpressions.jl](https://symbolicml.org/DynamicExpressions.jl) (the
-# symbolic engine behind [SymbolicRegression.jl](https://astroautomata.com/SymbolicRegression.jl)
+# expression from [DynamicExpressions.jl](https://symbolicml.org/DynamicExpressions.jl/) (the
+# symbolic engine behind [SymbolicRegression.jl](https://astroautomata.com/SymbolicRegression.jl/)
 # and [PySR](https://github.com/MilesCranmer/PySR/)).
 
 # Here we will solve a classic optimal control problem with a universal differential
diff --git a/src/helpers/losses.jl b/src/helpers/losses.jl
index ba56cc97c..2f1df05eb 100644
--- a/src/helpers/losses.jl
+++ b/src/helpers/losses.jl
@@ -595,7 +595,7 @@ true
 ## Special Note
 
 This function takes any of the
-[`LossFunctions.jl`](https://juliaml.github.io/LossFunctions.jl/stable) public functions
+[`LossFunctions.jl`](https://juliaml.github.io/LossFunctions.jl/stable/) public functions
 into the Lux Losses API with efficient aggregation.
 """
 @concrete struct GenericLossFunction <: AbstractLossFunction
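A few illustrative notes on the material touched by this patch. First, the Hutchinson estimator referenced in the `nested_autodiff.md` hunk: the sketch below is plain Julia on a dense matrix (not the nested-AD/Jacobian machinery the manual builds up to), using Rademacher probe vectors, which satisfy the ``\mathbb{E}[v v^T] = I`` requirement; the function name `hutchinson_trace` is ours, purely for illustration.

```julia
using LinearAlgebra, Random, Statistics

# Hutchinson estimator: Tr(A) = E[vᵀ A v] ≈ (1/V) Σᵢ vᵢᵀ A vᵢ, with E[v vᵀ] = I.
# Rademacher vectors (entries ±1) satisfy that requirement on v.
function hutchinson_trace(A::AbstractMatrix, V::Integer; rng=Random.default_rng())
    D = size(A, 1)
    return mean(1:V) do _
        v = rand(rng, [-1.0, 1.0], D)  # Rademacher probe vector
        dot(v, A, v)                   # vᵀ A v via the 3-argument dot
    end
end

A = randn(Xoshiro(0), 64, 64)
tr(A), hutchinson_trace(A, 10_000)  # the two values should be close for large V
```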
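Second, the `instability_check` preference from the `performance_pitfalls.md` and `preferences.md` hunks. A minimal sketch of setting it persistently via Preferences.jl, assuming `set_preferences!` accepts the package modules directly (pass the package UUIDs otherwise) and that `"error"` is an accepted mode alongside the documented default `"disable"`:

```julia
using Preferences, LuxCore, LuxLib

# Persist the dispatch-doctor preference for both packages named in preferences.md.
# "error" is an assumption here; the documented default is "disable".
set_preferences!(LuxCore, "instability_check" => "error")
set_preferences!(LuxLib, "instability_check" => "error")

# Preferences are read at compile time, so restart Julia for the change to take effect.
```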
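Finally, the `GenericLossFunction` special note in `src/helpers/losses.jl`: a small usage sketch, assuming the `GenericLossFunction(loss_fn)` constructor shown in that docstring and that LossFunctions.jl losses such as `L2DistLoss` apply element-wise:

```julia
using Lux, LossFunctions

# Wrap a LossFunctions.jl loss into the Lux losses API (aggregation defaults to `mean`).
l2loss = GenericLossFunction(LossFunctions.L2DistLoss())

ŷ = Float32[1.1, 1.9, 3.1]   # predictions
y = Float32[1.0, 2.0, 3.0]   # targets
l2loss(ŷ, y)                 # ≈ mean(abs2, ŷ .- y)
```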