Below are details about some common activation functions. Choosing a suitable activation function can improve neural network performance.


Sigmoid Activation Function

Advantages:

  • Non-linear: Introduces non-linearity, allowing the network to capture non-linear relationships in the data.

  • Probability Output: Outputs values between 0 and 1, suitable for binary classification as probabilities.

  • Smooth Gradient: The function is smooth and differentiable, aiding gradient-based optimization.

  • Historical Use: Mimics the firing rate of neurons, making it a traditional choice in neural networks.

Disadvantages:

  • Saturating Function: For very large positive or negative inputs, the gradient becomes very small, causing the vanishing gradient problem (see the sketch after this list).

  • Not Zero-Centered: Outputs are always positive, leading to inefficient gradient updates and slower convergence.

  • Computational Cost: Involves exponential calculations, making it more computationally expensive than ReLU.
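
A minimal NumPy sketch of these points (the `sigmoid` and `sigmoid_grad` helpers below are illustrative, not from any particular library): outputs are squashed into (0, 1) and are never negative, and the gradient collapses toward zero for large-magnitude inputs.

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative: sigma(x) * (1 - sigma(x)), maximal (0.25) at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(x))       # values squashed into (0, 1), always positive (not zero-centered)
print(sigmoid_grad(x))  # gradients near zero at |x| = 10 -> saturation / vanishing gradients
```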


Tanh (Hyperbolic Tangent) Activation Function

Advantages:

  • Non-linear: Like sigmoid, it can capture non-linear relationships in the data.

  • Zero-Centered Output: Outputs range from -1 to 1, aiding in centering data and potentially faster convergence.

  • Smooth Gradient: The function is smooth and differentiable, and its gradient near zero is larger than sigmoid's, which reduces (but does not eliminate) the vanishing gradient problem.

Disadvantages:

  • Saturating Function: For very large positive or negative inputs, the gradient still becomes small, causing the vanishing gradient problem, though less severely than sigmoid (see the sketch after this list).

  • Computational Cost: Similar to sigmoid, it involves exponential calculations, making it computationally expensive.
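
A minimal NumPy sketch of the same comparison for tanh (using `numpy.tanh`; the `tanh_grad` helper is illustrative): outputs are zero-centered in (-1, 1), the gradient at zero is 1.0 (versus 0.25 for sigmoid), and the gradient still vanishes for large-magnitude inputs.

```python
import numpy as np

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2, maximal (1.0) at x = 0
    return 1.0 - np.tanh(x) ** 2

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(np.tanh(x))    # zero-centered output in (-1, 1)
print(tanh_grad(x))  # peak gradient of 1.0 at x = 0 (vs 0.25 for sigmoid),
                     # but still ~0 at |x| = 10 -> saturation remains
```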


ReLU (Rectified Linear Unit) Activation Function

Advantages:

  • Efficient Computation: Simple thresholding at zero makes it computationally efficient.

  • Non-Saturating Gradients: Does not saturate for positive inputs, maintaining large gradients and efficient learning. Negative inputs, however, receive a zero gradient, which underlies the dying ReLU problem described below.

  • Sparse Activation: Produces sparse outputs (many neurons output zero), improving model efficiency and generalization.

Disadvantages:

  • Not Zero-Centered: Similar to sigmoid, ReLU outputs are not zero-centered, which can make gradient updates less efficient (this can be mitigated with batch normalization).

  • Dying ReLU Problem: Neurons can "die" if they get stuck outputting zero for all inputs; they then receive no gradient and stop learning (see the sketch after this list).

  • Unbounded Output: Outputs can grow very large, potentially requiring careful weight initialization and regularization techniques.
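
A minimal NumPy sketch (the `relu` and `relu_grad` helpers are illustrative) showing the cheap thresholding, the sparse non-negative and unbounded output, and the zero gradient for negative inputs that underlies the dying ReLU problem.

```python
import numpy as np

def relu(x):
    # Simple thresholding at zero: max(0, x)
    return np.maximum(0.0, x)

def relu_grad(x):
    # Subgradient: 1 for positive inputs, 0 otherwise (0 at x = 0 by convention)
    return (x > 0).astype(float)

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(relu(x))       # sparse, non-negative, unbounded above
print(relu_grad(x))  # no saturation for x > 0, but exactly 0 for x < 0
                     # (a neuron stuck in the negative region stops learning)
```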