diff --git a/docs/src/reference.md b/docs/src/reference.md index 4f90db762..5edde7195 100644 --- a/docs/src/reference.md +++ b/docs/src/reference.md @@ -78,10 +78,11 @@ pad_zeros `NNlib.conv` supports complex datatypes on CPU and CUDA devices. -!!! AMDGPU MIOpen supports only cross-correlation (flipkernel=true). - Therefore for every regular convolution (flipkernel=false) +!!! note "AMDGPU MIOpen supports only cross-correlation (`flipkernel=true`)." + + Therefore for every regular convolution (`flipkernel=false`) kernel is flipped before calculation. - For better performance, use cross-correlation (flipkernel=true) + For better performance, use cross-correlation (`flipkernel=true`) and manually flip the kernel before `NNlib.conv` call. `Flux` handles this automatically, this is only required for direct calls. diff --git a/src/activations.jl b/src/activations.jl index 2df90cd2e..4ed58622a 100644 --- a/src/activations.jl +++ b/src/activations.jl @@ -31,7 +31,7 @@ The ascii name `sigmoid` is also exported. See also [`sigmoid_fast`](@ref). -``` +```julia-repl julia> using UnicodePlots julia> lineplot(sigmoid, -5, 5, height=7) @@ -63,7 +63,7 @@ const sigmoid = σ Piecewise linear approximation of [`sigmoid`](@ref). -``` +```julia-repl julia> lineplot(hardsigmoid, -5, 5, height=7) ┌────────────────────────────────────────┐ 1 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⢀⡠⠖⠋⠉⠉⠉⠉⠉⠉⠉⠉│ hardσ(x) @@ -102,7 +102,7 @@ const hardsigmoid = hardσ Return `log(σ(x))` which is computed in a numerically stable way. -``` +```julia-repl julia> lineplot(logsigmoid, -5, 5, height=7) ┌────────────────────────────────────────┐ 0 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡧⠤⠔⠒⠒⠒⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉│ logσ(x) @@ -128,7 +128,7 @@ Segment-wise linear approximation of `tanh`, much cheaper to compute. See ["Large Scale Machine Learning"](https://ronan.collobert.com/pub/matos/2004_phdthesis_lip6.pdf). See also [`tanh_fast`](@ref). 
-``` +```julia-repl julia> lineplot(hardtanh, -2, 2, height=7) ┌────────────────────────────────────────┐ 1 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⣀⠔⠋⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉│ hardtanh(x) @@ -164,7 +164,7 @@ hardtanh(x) = clamp(x, oftype(x, -1), oftype(x, 1)) # clamp(x, -1, 1) is type-s [Rectified Linear Unit](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) activation function. -``` +```julia-repl julia> lineplot(relu, -2, 2, height=7) ┌────────────────────────────────────────┐ 2 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠔⠋│ relu(x) @@ -188,7 +188,7 @@ Leaky [Rectified Linear Unit](https://en.wikipedia.org/wiki/Rectifier_(neural_ne activation function. You can also specify the coefficient explicitly, e.g. `leakyrelu(x, 0.01)`. -```julia +```julia-repl julia> lineplot(x -> leakyrelu(x, 0.5), -2, 2, height=7) ┌────────────────────────────────────────┐ 2 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠤⠒⠉│ #42(x) @@ -220,7 +220,7 @@ const leakyrelu_a = 0.01 # also used in gradient below activation function capped at 6. See ["Convolutional Deep Belief Networks"](https://www.cs.toronto.edu/~kriz/conv-cifar10-aug2010.pdf) from CIFAR-10. -``` +```julia-repl julia> lineplot(relu6, -10, 10, height=7) ┌────────────────────────────────────────┐ 6 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠎⠉⠉⠉⠉⠉⠉⠉⠉│ relu6(x) @@ -245,7 +245,7 @@ Randomized Leaky Rectified Linear Unit activation function. See ["Empirical Evaluation of Rectified Activations"](https://arxiv.org/abs/1505.00853) You can also specify the bound explicitly, e.g. `rrelu(x, 0.0, 1.0)`. -```julia +```julia-repl julia> lineplot(rrelu, -20, 10, height=7) ┌────────────────────────────────────────┐ 10 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤⠖⠋│ rrelu(x) @@ -275,7 +275,7 @@ Exponential Linear Unit activation function. See ["Fast and Accurate Deep Network Learning by Exponential Linear Units"](https://arxiv.org/abs/1511.07289). You can also specify the coefficient explicitly, e.g. `elu(x, 1)`. 
-``` +```julia-repl julia> lineplot(elu, -2, 2, height=7) ┌────────────────────────────────────────┐ 2 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠤⠒⠉│ elu(x) @@ -305,7 +305,7 @@ deriv_elu(Ω, α=1) = ifelse(Ω ≥ 0, one(Ω), Ω + oftype(Ω, α)) Activation function from ["Gaussian Error Linear Units"](https://arxiv.org/abs/1606.08415). -``` +```julia-repl julia> lineplot(gelu, -2, 2, height=7) ┌────────────────────────────────────────┐ 2 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠔⠊│ gelu(x) @@ -363,7 +363,7 @@ end Self-gated activation function. See ["Swish: a Self-Gated Activation Function"](https://arxiv.org/abs/1710.05941). -``` +```julia-repl julia> lineplot(swish, -2, 2, height=7) ┌────────────────────────────────────────┐ 2 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤│ swish(x) @@ -386,7 +386,7 @@ julia> lineplot(swish, -2, 2, height=7) Hard-Swish activation function. See ["Searching for MobileNetV3"](https://arxiv.org/abs/1905.02244). -``` +```julia-repl julia> lineplot(hardswish, -2, 5, height = 7) ┌────────────────────────────────────────┐ 5 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤⠔⠒⠉│ hardswish(x) @@ -430,7 +430,7 @@ deriv_hardswish(x) = ifelse(x < -3, oftf(x,0), ifelse(x > 3, oftf(x,1), x/3 + of Activation function from ["LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent ..."](https://arxiv.org/abs/1901.05894) -``` +```julia-repl julia> lineplot(lisht, -2, 2, height=7) ┌────────────────────────────────────────┐ 2 │⠢⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠔│ lisht(x) @@ -469,7 +469,7 @@ lisht(x) = x * tanh_fast(x) Scaled exponential linear units. See ["Self-Normalizing Neural Networks"](https://arxiv.org/abs/1706.02515). -``` +```julia-repl julia> lineplot(selu, -3, 2, height=7) ┌────────────────────────────────────────┐ 3 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ selu(x) @@ -507,7 +507,7 @@ end Activation function from ["Continuously Differentiable Exponential Linear Units"](https://arxiv.org/abs/1704.07483). 
-``` +```julia-repl julia> lineplot(celu, -2, 2, height=7) ┌────────────────────────────────────────┐ 2 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠤⠒⠉│ celu(x) @@ -535,7 +535,7 @@ deriv_celu(Ω, α=1) = ifelse(Ω > 0, oftf(Ω, 1), Ω / oftf(Ω, α) + 1) Threshold gated rectified linear activation function. See ["Zero-bias autoencoders and the benefits of co-adapting features"](https://arxiv.org/abs/1402.3337) -``` +```julia-repl julia> lineplot(trelu, -2, 4, height=7) ┌────────────────────────────────────────┐ 4 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⡤⠖⠋│ trelu(x) @@ -559,7 +559,7 @@ const thresholdrelu = trelu See ["Quadratic Polynomials Learn Better Image Features"](http://www.iro.umontreal.ca/~lisa/publications2/index.php/attachments/single/205) (2009). -``` +```julia-repl julia> lineplot(softsign, -5, 5, height=7) ┌────────────────────────────────────────┐ 1 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⣀⣀⣀⣀⠤⠤⠤⠤⠤│ softsign(x) @@ -602,7 +602,7 @@ deriv_softsign(x) = 1 / (1 + abs(x))^2 See ["Deep Sparse Rectifier Neural Networks"](http://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf), JMLR 2011. -``` +```julia-repl julia> lineplot(softplus, -3, 3, height=7) ┌────────────────────────────────────────┐ 4 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ softplus(x) @@ -640,7 +640,7 @@ softplus(x) = log1p(exp(-abs(x))) + relu(x) Return `log(cosh(x))` which is computed in a numerically stable way. -``` +```julia-repl julia> lineplot(logcosh, -5, 5, height=7) ┌────────────────────────────────────────┐ 5 │⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ logcosh(x) @@ -664,7 +664,7 @@ const log2 = log(2) Activation function from ["Mish: A Self Regularized Non-Monotonic Neural Activation Function"](https://arxiv.org/abs/1908.08681). 
-``` +```julia-repl julia> lineplot(mish, -5, 5, height=7) ┌────────────────────────────────────────┐ 5 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⡠⠖⠋│ mish(x) @@ -686,7 +686,7 @@ mish(x) = x * tanh(softplus(x)) See ["Tanhshrink Activation Function"](https://www.gabormelli.com/RKB/Tanhshrink_Activation_Function). -``` +```julia-repl julia> lineplot(tanhshrink, -3, 3, height=7) ┌────────────────────────────────────────┐ 3 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ tanhshrink(x) @@ -712,7 +712,7 @@ tanhshrink(x) = x - tanh_fast(x) See ["Softshrink Activation Function"](https://www.gabormelli.com/RKB/Softshrink_Activation_Function). -``` +```julia-repl julia> lineplot(softshrink, -2, 2, height=7) ┌────────────────────────────────────────┐ 2 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀│ softshrink(x) @@ -770,7 +770,7 @@ For any other number types, it just calls `tanh`. See also [`sigmoid_fast`](@ref). -``` +```julia-repl julia> tanh(0.5f0) 0.46211717f0 @@ -808,11 +808,11 @@ tanh_fast(x::Number) = Base.tanh(x) sigmoid_fast(x) This is a faster, and very slightly less accurate, version of `sigmoid`. -For `x::Float32, perhaps 3 times faster, and maximum errors 2 eps instead of 1. +For `x::Float32`, perhaps 3 times faster, and maximum errors 2 eps instead of 1. See also [`tanh_fast`](@ref). -``` +```julia-repl julia> sigmoid(0.2f0) 0.54983395f0 diff --git a/src/audio/mel.jl b/src/audio/mel.jl index 6fda9a091..4181c23eb 100644 --- a/src/audio/mel.jl +++ b/src/audio/mel.jl @@ -4,7 +4,7 @@ fmin::Float32 = 0f0, fmax::Float32 = Float32(sample_rate ÷ 2)) Create triangular Mel scale filter banks -(ref: https://en.wikipedia.org/wiki/Mel_scale). +(ref: [Mel scale - Wikipedia](https://en.wikipedia.org/wiki/Mel_scale)). Each column is a filterbank that highlights its own frequency. 
# Arguments: diff --git a/src/audio/stft.jl b/src/audio/stft.jl index 847fccad6..c90e7f49c 100644 --- a/src/audio/stft.jl +++ b/src/audio/stft.jl @@ -5,14 +5,14 @@ ) where T <: Real Hamming window function -(ref: https://en.wikipedia.org/wiki/Window_function#Hann_and_Hamming_windows). +(ref: [Window function § Hann and Hamming windows - Wikipedia](https://en.wikipedia.org/wiki/Window_function#Hann_and_Hamming_windows)). Generalized version of `hann_window`. -``w[n] = \\alpha - \\beta cos(\\frac{2 \\pi n}{N - 1})`` +``w[n] = \\alpha - \\beta \\cos(\\frac{2 \\pi n}{N - 1})`` Where ``N`` is the window length. -```julia +```julia-repl julia> lineplot(hamming_window(100); width=30, height=10) ┌──────────────────────────────┐ 1 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡠⠚⠉⠉⠉⠢⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ @@ -72,13 +72,13 @@ end ) where T <: Real Hann window function -(ref: https://en.wikipedia.org/wiki/Window_function#Hann_and_Hamming_windows). +(ref: [Window function § Hann and Hamming windows - Wikipedia](https://en.wikipedia.org/wiki/Window_function#Hann_and_Hamming_windows)). -``w[n] = \\frac{1}{2}[1 - cos(\\frac{2 \\pi n}{N - 1})]`` +``w[n] = \\frac{1}{2}[1 - \\cos(\\frac{2 \\pi n}{N - 1})]`` Where ``N`` is the window length. -```julia +```julia-repl julia> lineplot(hann_window(100); width=30, height=10) ┌──────────────────────────────┐ 1 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣠⠚⠉⠉⠉⠢⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ @@ -138,7 +138,7 @@ Short-time Fourier transform (STFT). The STFT computes the Fourier transform of short overlapping windows of the input, giving frequency components of the signal as they change over time. 
-``Y[\\omega, m] = \\sum_{k = 0}^{N - 1} \\text{window}[k] \\text{input}[m \\times \\text{hop length} + k] exp(-j \\frac{2 \\pi \\omega k}{\\text{n fft}})`` +``Y[\\omega, m] = \\sum_{k = 0}^{N - 1} \\text{window}[k] \\text{input}[m \\times \\text{hop length} + k] \\exp(-j \\frac{2 \\pi \\omega k}{\\text{n fft}})`` where ``N`` is the window length, ``\\omega`` is the frequency ``0 \\le \\omega < \\text{n fft}`` diff --git a/src/ctc.jl b/src/ctc.jl index 6202622c3..c5188768f 100644 --- a/src/ctc.jl +++ b/src/ctc.jl @@ -23,7 +23,8 @@ function logaddexp(a, b) end """ - add_blanks(z) + add_blanks(z, blank) + Adds blanks to the start and end of `z`, and between items in `z` """ function add_blanks(z, blank) diff --git a/src/dim_helpers/DepthwiseConvDims.jl b/src/dim_helpers/DepthwiseConvDims.jl index 8163a3def..fbfbcd718 100644 --- a/src/dim_helpers/DepthwiseConvDims.jl +++ b/src/dim_helpers/DepthwiseConvDims.jl @@ -2,7 +2,7 @@ DepthwiseConvDims Concrete subclass of `ConvDims` for a depthwise convolution. Differs primarily due to -characterization by C_in, C_mult, rather than C_in, C_out. Useful to be separate from +characterization by `C_in`, `C_mult`, rather than `C_in`, `C_out`. Useful to be separate from DenseConvDims primarily for channel calculation differences. """ struct DepthwiseConvDims{N, K, S, P, D} <: ConvDims{N} diff --git a/src/dim_helpers/PoolDims.jl b/src/dim_helpers/PoolDims.jl index 75d56b8cd..bfb39e1bc 100644 --- a/src/dim_helpers/PoolDims.jl +++ b/src/dim_helpers/PoolDims.jl @@ -1,6 +1,6 @@ """ PoolDims(x_size::NTuple{M}, k::Union{NTuple{L, Int}, Int}; - stride=k, padding=0, dilation=1) where {M, L} + stride=k, padding=0, dilation=1) where {M, L} Dimensions for a "pooling" operation that can have an arbitrary input size, kernel size, stride, dilation, and channel count.
Used to dispatch onto efficient implementations at diff --git a/src/dropout.jl b/src/dropout.jl index 02673cf03..44fc59c14 100644 --- a/src/dropout.jl +++ b/src/dropout.jl @@ -12,7 +12,7 @@ i.e. each row of a matrix is either zero or not. Optional first argument is the random number generator used. # Examples -``` +```julia-repl julia> dropout(ones(2, 10), 0.2) 2×10 Matrix{Float64}: 1.25 1.25 0.0 1.25 1.25 1.25 1.25 1.25 1.25 1.25 diff --git a/src/pooling.jl b/src/pooling.jl index 6caa1045d..59db9b465 100644 --- a/src/pooling.jl +++ b/src/pooling.jl @@ -162,7 +162,7 @@ Perform mean pool operation with window size `k` on input tensor `x`. Arguments: -* `x` and `k`: Expects `ndim(x) ∈ 3:5``, and always `length(k) == ndim(x) - 2` +* `x` and `k`: Expects `ndims(x) ∈ 3:5`, and always `length(k) == ndims(x) - 2` * `pad`: See [`pad_zeros`](@ref) for details. * `stride`: Either a tuple with the same length as `k`, or one integer for all directions. Default is `k`. """ @@ -182,7 +182,7 @@ This pooling operator from [Learned-Norm Pooling for Deep Feedforward and Recurr Arguments: -* `x` and `k`: Expects `ndim(x) ∈ 3:5``, and always `length(k) == ndim(x) - 2` +* `x` and `k`: Expects `ndims(x) ∈ 3:5`, and always `length(k) == ndims(x) - 2` * `p` is restricted to `0 < p < Inf`. * `pad`: See [`pad_zeros`](@ref) for details. * `stride`: Either a tuple with the same length as `k`, or one integer for all directions. Default is `k`. diff --git a/src/softmax.jl b/src/softmax.jl index 4ed1af957..709b828be 100644 --- a/src/softmax.jl +++ b/src/softmax.jl @@ -39,7 +39,7 @@ Note that, when used with Flux.jl, `softmax` must not be passed to layers like ` which accept an activation function. The activation is broadcasted over the result, thus applies to individual numbers. But `softmax` always needs to see the whole column.
-```julia +```julia-repl julia> using Flux julia> x = randn(Float32, 4, 4, 3, 13); diff --git a/src/utils.jl b/src/utils.jl index 3d23e7383..baf95c8da 100644 --- a/src/utils.jl +++ b/src/utils.jl @@ -10,7 +10,7 @@ pass it an array whose gradient is of interest. There is also an overload for ForwardDiff.jl's `Dual` types (and arrays of them). # Examples -``` +```julia-repl julia> using ForwardDiff, Zygote, NNlib julia> f_good(x) = if NNlib.within_gradient(x)