From 772bf0803f871f2c3516af76315fb4ed3b0948e2 Mon Sep 17 00:00:00 2001
From: Pietro Monticone <38562595+pitmonticone@users.noreply.github.com>
Date: Tue, 31 Jan 2023 23:20:06 +0100
Subject: [PATCH] Fix a few typos

---
 _weave/lecture02/optimizing.jmd                | 2 +-
 _weave/lecture03/sciml.jmd                     | 2 +-
 _weave/lecture04/dynamical_systems.jmd         | 2 +-
 _weave/lecture05/parallelism_overview.jmd      | 6 +++---
 _weave/lecture06/styles_of_parallelism.jmd     | 2 +-
 _weave/lecture07/discretizing_odes.jmd         | 8 ++++----
 _weave/lecture08/automatic_differentiation.jmd | 2 +-
 _weave/lecture09/stiff_odes.jmd                | 2 +-
 _weave/lecture10/estimation_identification.jmd | 2 +-
 _weave/lecture11/adjoints.jmd                  | 2 +-
 _weave/lecture13/gpus.jmd                      | 4 ++--
 _weave/lecture14/pdes_and_convolutions.jmd     | 4 ++--
 _weave/lecture15/diffeq_machine_learning.jmd   | 2 +-
 course/index.md                                | 2 +-
 14 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/_weave/lecture02/optimizing.jmd b/_weave/lecture02/optimizing.jmd
index b5bf3b4b..1a792496 100644
--- a/_weave/lecture02/optimizing.jmd
+++ b/_weave/lecture02/optimizing.jmd
@@ -1118,7 +1118,7 @@ interrupting every single `+`. Fortunately these function calls disappear during
 the compilation process due to what's known as inlining. Essentially, if the
 function call is determined to be "cheap enough", the actual function call is
 removed and the code is basically pasted into the function caller. We can
-force a function call to occur by teling it to not inline:
+force a function call to occur by telling it to not inline:
 
 ```julia
 @noinline fnoinline(x,y) = x + y
diff --git a/_weave/lecture03/sciml.jmd b/_weave/lecture03/sciml.jmd
index dc590bc3..ae495a54 100644
--- a/_weave/lecture03/sciml.jmd
+++ b/_weave/lecture03/sciml.jmd
@@ -679,7 +679,7 @@ sol = solve(prob)
 plot(sol,label=["Velocity" "Position"])
 ```
 
-Don't worry if you don't understand this sytnax yet: we will go over differential
+Don't worry if you don't understand this syntax yet: we will go over differential
 equation solvers and DifferentialEquations.jl in a later lecture.
 
 Let's say we want to learn how to predict the force applied on the spring at
diff --git a/_weave/lecture04/dynamical_systems.jmd b/_weave/lecture04/dynamical_systems.jmd
index 7ac431a8..5eba0845 100644
--- a/_weave/lecture04/dynamical_systems.jmd
+++ b/_weave/lecture04/dynamical_systems.jmd
@@ -395,7 +395,7 @@ step through the function pointer.
 
 What will approximately be the value of this dynamical system after 1000 steps
 if you start at `1.0` with parameter `p=0.25`? Can you guess without solving the
-system? Think about steady states and stabiltiy.
+system? Think about steady states and stability.
 
 ```julia
 solve_system(f,1.0,0.25,1000)
diff --git a/_weave/lecture05/parallelism_overview.jmd b/_weave/lecture05/parallelism_overview.jmd
index aeac0c68..ed3337ac 100644
--- a/_weave/lecture05/parallelism_overview.jmd
+++ b/_weave/lecture05/parallelism_overview.jmd
@@ -47,7 +47,7 @@ Each process can have many compute threads. These threads are the unit of
 execution that needs to be done. On each task is its own stack and a virtual
 CPU (virtual CPU since it's not the true CPU, since that would require that the
 task is ON the CPU, which it might not be because the task can be temporarily
-haulted). The kernel of the operating systems then *schedules* tasks, which
+halted). The kernel of the operating systems then *schedules* tasks, which
 runs them. In order to keep the computer running smooth, *context switching*,
 i.e. changing the task that is actually running, happens all the time.
 This is independent of whether tasks are actually scheduled in parallel or not.
@@ -82,7 +82,7 @@ be polled, then it will execute the command, and give the result. There are two
 variants:
 
 - Non-Blocking vs Blocking: Whether the thread will periodically poll for whether that task is complete, or whether it should wait for the task to complete before doing anything else
-- Synchronous vs Asynchronus: Whether to execute the operation as initiated by the program or as a response to an event from the kernel.
+- Synchronous vs Asynchronous: Whether to execute the operation as initiated by the program or as a response to an event from the kernel.
 
 I/O operations cause a *privileged context switch*, allowing the task which is
 handling the I/O to directly be switched to in order to continue actions.
@@ -117,7 +117,7 @@ a higher level interrupt which Julia handles the moment the safety locks says
 it's okay (these locks occur during memory allocations to ensure that memory
 is not corrupted).
 
-#### Asynchronus Calling Example
+#### Asynchronous Calling Example
 
 This example will become more clear when we get to distributed computing, but
 for think of `remotecall_fetch` as a way to run a command on a different computer.
diff --git a/_weave/lecture06/styles_of_parallelism.jmd b/_weave/lecture06/styles_of_parallelism.jmd
index 8a2ac87e..c12d6d1f 100644
--- a/_weave/lecture06/styles_of_parallelism.jmd
+++ b/_weave/lecture06/styles_of_parallelism.jmd
@@ -351,7 +351,7 @@ using blocking structures, one needs to be careful about deadlock!
 ### Two Programming Models: Loop-Level Parallelism and Task-Based Parallelism
 
 As described in the previous lecture, one can also use `Threads.@spawn` to
-do multithreading in Julia v1.3+. The same factors all applay: how to do locks
+do multithreading in Julia v1.3+. The same factors all apply: how to do locks
 and Mutex etc. This is a case of a parallelism construct having two alternative
 **programming models**. `Threads.@spawn` represents task-based parallelism, while
 `Threads.@threads` represents Loop-Level Parallelism or a parallel iterator
diff --git a/_weave/lecture07/discretizing_odes.jmd b/_weave/lecture07/discretizing_odes.jmd
index 9f1a4581..fdeeecc5 100644
--- a/_weave/lecture07/discretizing_odes.jmd
+++ b/_weave/lecture07/discretizing_odes.jmd
@@ -151,7 +151,7 @@ system by describing the force between two particles as:
 
 $$F = G \frac{m_1m_2}{r^2}$$
 
-where $r^2$ is the Euclidian distance between the two particles. From here, we
+where $r^2$ is the Euclidean distance between the two particles. From here, we
 use the fact that
 
 $$F = ma$$
@@ -451,7 +451,7 @@ that
 $$u(t+\Delta t) = u(t) + \Delta t f(u,p,t) + \mathcal{O}(\Delta t^2)$$
 
 This is a first order approximation because the error in our step can be
-expresed as an error in the derivative, i.e.
+expressed as an error in the derivative, i.e.
 
 $$\frac{u(t + \Delta t) - u(t)}{\Delta t} = f(u,p,t) + \mathcal{O}(\Delta t)$$
 
@@ -542,7 +542,7 @@ be larger, even if it matches another one asymptotically.
 
 ## What Makes a Good Method?
 
-### Leading Truncation Coeffcients
+### Leading Truncation Coefficients
 
 For given orders of explicit Runge-Kutta methods, lower bounds for the number
 of `f` evaluations (stages) required to receive a given order are known:
@@ -743,7 +743,7 @@ Stiffness can thus be approximated in some sense by the condition number of the
 Jacobian. The condition number of a matrix is its maximal eigenvalue divided by
 its minimal eigenvalue and gives a rough measure of the local timescale separations.
 If this value is large and one wants to resolve the slow dynamics,
-then explict integrators, like the explicit Runge-Kutta methods described before,
+then explicit integrators, like the explicit Runge-Kutta methods described before,
 have issues with stability. In this case implicit integrators (or other forms
 of stabilized stepping) are required in order to efficiently reach the end
 time step.
diff --git a/_weave/lecture08/automatic_differentiation.jmd b/_weave/lecture08/automatic_differentiation.jmd
index 3b11fa74..5c0c9711 100644
--- a/_weave/lecture08/automatic_differentiation.jmd
+++ b/_weave/lecture08/automatic_differentiation.jmd
@@ -583,7 +583,7 @@ for $e_i$ the $i$th basis vector, then
 
 $f(d) = f(d_0) + Je_1 \epsilon_1 + \ldots + Je_n \epsilon_n$
 
-computes all columns of the Jacobian simultaniously.
+computes all columns of the Jacobian simultaneously.
 
 ### Array of Structs Representation
 
diff --git a/_weave/lecture09/stiff_odes.jmd b/_weave/lecture09/stiff_odes.jmd
index 8035d74a..d73df3e0 100644
--- a/_weave/lecture09/stiff_odes.jmd
+++ b/_weave/lecture09/stiff_odes.jmd
@@ -478,7 +478,7 @@ The most common method in the Krylov subspace family of methods is the GMRES
 method. Essentially, in step $i$ one computes $\mathcal{K}_i$, and finds the
 $x$ that is the closest to the Krylov subspace, i.e. finds the $x \in \mathcal{K}_i$
 such that $\Vert Jx-v \Vert$ is minimized. At each step, it adds the new vector
-to the Krylov subspace after orthgonalizing it against the other vectors via
+to the Krylov subspace after orthogonalizing it against the other vectors via
 Arnoldi iterations, leading to an orthogonal basis of $\mathcal{K}_i$ which
 makes it easy to express $x$.
 
diff --git a/_weave/lecture10/estimation_identification.jmd b/_weave/lecture10/estimation_identification.jmd
index c8f08829..df60e2fb 100644
--- a/_weave/lecture10/estimation_identification.jmd
+++ b/_weave/lecture10/estimation_identification.jmd
@@ -560,7 +560,7 @@ unscaled gradient.
 
 ### Multi-Seeding
 
-Similarly to forward-mode having a dual number with multiple simultanious
+Similarly to forward-mode having a dual number with multiple simultaneous
 derivatives through partials $d = x + v_1 \epsilon_1 + \ldots + v_m \epsilon_m$,
 one can see that multi-seeding is an option in reverse-mode AD by, instead of
 pulling back a matrix instead of a row vector, where each row is a direction.
diff --git a/_weave/lecture11/adjoints.jmd b/_weave/lecture11/adjoints.jmd
index aa6ce416..3145ced8 100644
--- a/_weave/lecture11/adjoints.jmd
+++ b/_weave/lecture11/adjoints.jmd
@@ -143,7 +143,7 @@ does not need to be re-calculated.
 
 Using this style, Tracker.jl moves forward, building up the value and closures
 for the backpass and then recursively pulls back the input `Δ` to receive the
-derivatve.
+derivative.
 
 ### Source-to-Source AD
 
diff --git a/_weave/lecture13/gpus.jmd b/_weave/lecture13/gpus.jmd
index 945b9367..8d5a6ed8 100644
--- a/_weave/lecture13/gpus.jmd
+++ b/_weave/lecture13/gpus.jmd
@@ -74,7 +74,7 @@ Loop: fld f0, 0(x1)
 i > 0 && @goto Loop # Cycle 8
 ```
 
-With our given latencies and issueing one operation per cycle,
+With our given latencies and issuing one operation per cycle,
 we can execute the loop in 8 cycles. By reordering we can
 execute it in 7 cycles. Can we do better?
 
@@ -91,7 +91,7 @@ execute it in 7 cycles. Can we do better?
 
 By reordering the decrement we can hide the load latency.
 
-- How many cylces are overhead: 2
+- How many cycles are overhead: 2
 - How many stall cycles: 2
 - How many cycles are actually work: 3
diff --git a/_weave/lecture14/pdes_and_convolutions.jmd b/_weave/lecture14/pdes_and_convolutions.jmd
index b78afa63..51df8fdf 100644
--- a/_weave/lecture14/pdes_and_convolutions.jmd
+++ b/_weave/lecture14/pdes_and_convolutions.jmd
@@ -13,7 +13,7 @@ weave_options:
 At this point we have identified how the worlds of machine learning and scientific
 computing collide by looking at the parameter estimation problem. Training neural
 networks is parameter estimation of a function `f` where `f` is a neural
-network. Backpropogation of a neural network is simply the adjoint problem
+network. Backpropagation of a neural network is simply the adjoint problem
 for `f`, and it falls under the class of methods used in reverse-mode automatic
 differentiation. But this story also extends to structure. Recurrent neural
 networks are the Euler discretization of a continuous recurrent neural network,
@@ -82,7 +82,7 @@ m = Chain(
 
 ## Discretizations of Partial Differential Equations
 
-Now let's investigate discertizations of partial differential equations. A
+Now let's investigate discretizations of partial differential equations. A
 canonical differential equation to start with is the Poisson equation. This is
 the equation:
 
diff --git a/_weave/lecture15/diffeq_machine_learning.jmd b/_weave/lecture15/diffeq_machine_learning.jmd
index 4b2d97fa..486b5949 100644
--- a/_weave/lecture15/diffeq_machine_learning.jmd
+++ b/_weave/lecture15/diffeq_machine_learning.jmd
@@ -77,7 +77,7 @@ example:
 - [Hybrid differential equations](http://diffeq.sciml.ai/latest/features/callback_functions/) (DEs with event handling)
 
 For each of these equations, one can come up with an adjoint definition in order
-to define a backpropogation, or perform direct automatic differentiation of the
+to define a backpropagation, or perform direct automatic differentiation of the
 solver code. One such paper in this area includes
 [neural stochastic differential equations](https://arxiv.org/abs/1905.09883)
diff --git a/course/index.md b/course/index.md
index 64d015b5..8ce37cea 100644
--- a/course/index.md
+++ b/course/index.md
@@ -61,7 +61,7 @@ Homework 2: Parameter estimation in dynamical systems and overhead of parallelis
 - Definition of inverse problems with applications to clinical pharmacology and smartgrid optimization
 - Adjoint methods for fast gradients
-  - Automated adjoints through reverse-mode automatic differentiation (backpropogation)
+  - Automated adjoints through reverse-mode automatic differentiation (backpropagation)
   - Adjoints of differential equations
   - Using neural ordinary differential equations as a memory-efficient RNN for deep learning