Sigmoid • Tanh • ReLU • Leaky ReLU • Softmax
Activation functions are a key part of neural networks. They decide whether a neuron should become "active" or remain "inactive," and by how much. This helps the network learn complex patterns such as shapes in images, sounds in speech, or meaning in text. Without activation functions, a stack of layers collapses into a single linear equation and cannot learn anything more complex than a straight-line relationship.
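The short NumPy sketch below illustrates that last point; the weight matrices and input are made-up example values. Two linear layers applied back to back give exactly the same result as one combined linear layer, so without a non-linear activation the extra layer adds nothing.

import numpy as np

# Two linear "layers" with no activation in between (example weights)
W1 = np.array([[1.0, 2.0], [0.5, -1.0]])
W2 = np.array([[0.3, 0.7], [2.0, 1.0]])
x = np.array([1.0, 3.0])

two_layers = W2 @ (W1 @ x)   # layer 1, then layer 2, no activation
one_layer = (W2 @ W1) @ x    # a single combined linear layer

print(two_layers)  # same values as the line below...
print(one_layer)   # ...so the two layers collapse into one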
This chapter explains the five most important activation functions in Deep Learning: Sigmoid, Tanh, ReLU, Leaky ReLU, and Softmax.
1. Sigmoid Activation Function
The Sigmoid function turns any value into a number between 0 and 1.
This makes it perfect for tasks where the model must output a probability.
Simple Explanation
Sigmoid behaves like a smooth "S-shaped" curve.
- If the output is close to 1 → the model is confident about "Yes"
- If the output is close to 0 → the model is confident about "No"
Where It Is Used
- Binary classification (e.g., "spam" or "not spam")
- Output layer in two-class problems
Real Example
Predicting whether an email is spam:
- Output 0.91 → 91% chance it is spam
- Output 0.12 → 12% chance it is spam
Code Example
import numpy as np
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

print(sigmoid(2.0))  # ~0.8808

2. Tanh Activation Function
Tanh (Hyperbolic Tangent) is similar to Sigmoid but outputs values between -1 and +1.
Why This Is Useful
Tanh outputs are centered around zero, so positive and negative signals stay balanced and the network often learns faster than with Sigmoid.
Where It Is Used
- Hidden layers in neural networks
- Situations where negative values matter
Real Example
Imagine predicting mood on a scale:
- -1 = very sad
- 0 = neutral
- +1 = very happy
Tanh is perfect for such balanced outputs.
Code Example
import numpy as np
def tanh(x):
    return np.tanh(x)

print(tanh(1.5))  # ~0.9051

3. ReLU Activation Function
ReLU (Rectified Linear Unit) is the most widely used activation function in modern deep learning.
Simple Behavior
- If input is positive → output is the same
- If input is negative → output becomes 0
Why It Is Powerful
- Very cheap to compute
- Avoids the vanishing-gradient slowdown that makes Sigmoid and Tanh learn slowly in deep networks
- Works extremely well in image models and other deep networks
Where It Is Used
- Almost all CNNs and DNNs
- Hidden layers of deep models
Real Example
Imagine detecting edges in an image:
- Positive values = useful features
- Negative values = noise removed by ReLU
Code Example
import numpy as np
def relu(x):
    return np.maximum(0, x)

print(relu([-2, 0, 3]))  # [0 0 3]

4. Leaky ReLU Activation Function
Leaky ReLU is an improvement over ReLU.
What Problem Does It Solve?
ReLU sets every negative value to exactly zero.
If a neuron keeps receiving negative inputs, it can get stuck at zero and stop learning entirely (the "dying ReLU" problem).
Leaky ReLU Fix
Instead of setting negative values to 0, it multiplies them by a small slope (for example 0.01).
Why It Helps
It keeps neurons active even for negative inputs.
Code Example
import numpy as np
def leaky_relu(x, alpha=0.01):
    x = np.asarray(x)  # convert lists to arrays so the comparison below works
    return np.where(x > 0, x, alpha * x)

print(leaky_relu([-3, 0, 4]))  # negative input shrinks to -0.03, positives pass through

5. Softmax Activation Function
Softmax is used to choose one class out of many.
It converts numbers into probabilities that add up to 1.
Where It Is Used
- Multi-class classification
- Examples:
- Digit recognition (0–9)
- Weather prediction (sunny, rainy, cloudy)
Simple Explanation
If a model predicts scores like:
[2.0, 1.0, 0.1]
Softmax turns them into probabilities like:
[0.66, 0.24, 0.10]
Highest probability = chosen class.
Code Example
import numpy as np
def softmax(x):
    x = np.asarray(x, dtype=float)
    exp_x = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return exp_x / exp_x.sum()

print(softmax([2.0, 1.0, 0.1]))  # ~[0.66 0.24 0.10]

6. Summary Table of Activation Functions
| Activation | Output Range | Common Use | Notes |
|---|---|---|---|
| Sigmoid | 0 to 1 | Binary classification (output layer) | Can saturate and train slowly |
| Tanh | -1 to 1 | Hidden layers | Zero-centered; often trains faster than Sigmoid |
| ReLU | 0 to ∞ | Hidden layers of deep networks | Most popular; cheap to compute |
| Leaky ReLU | −∞ to ∞ (negatives scaled by a small slope) | Prevents dead neurons | Improvement over ReLU |
| Softmax | 0 to 1 (sum=1) | Multi-class classification | Used in output layer |
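As a quick check of the ranges in the table, here is a small sketch that applies the same five definitions used in the code examples above to one set of arbitrary input values:

import numpy as np

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # example inputs

# The same definitions used earlier in this chapter, gathered in one place
sigmoid = lambda x: 1 / (1 + np.exp(-x))
tanh = np.tanh
relu = lambda x: np.maximum(0, x)
leaky_relu = lambda x, a=0.01: np.where(x > 0, x, a * x)
softmax = lambda x: np.exp(x - np.max(x)) / np.exp(x - np.max(x)).sum()

print("sigmoid:   ", sigmoid(x))      # all values squeezed into (0, 1)
print("tanh:      ", tanh(x))         # all values squeezed into (-1, 1)
print("relu:      ", relu(x))         # negatives become 0
print("leaky relu:", leaky_relu(x))   # negatives shrink instead of vanishing
print("softmax:   ", softmax(x))      # probabilities that sum to 1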
7. Real-World Example
Example: Self-Driving Car Sensors
A self-driving car receives many signals:
- Speed
- Distance from objects
- Road shape
- Traffic lights
Activation functions help the neural network decide:
- Should the car brake? → Sigmoid
- Which steering angle to choose? → Tanh
- Which object is in front (car, person, sign)? → Softmax
- Detect edges in camera images → ReLU
Without activation functions, the car could not make intelligent decisions.
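As a rough illustration of how those choices fit together, here is a toy sketch with invented weights and sensor values. It is not a real driving model, just the four functions placed where the list above suggests:

import numpy as np

rng = np.random.default_rng(0)

# Invented sensor features: speed, distance to object, road curvature, light state
features = np.array([0.8, 0.2, -0.1, 1.0])

# Hidden layer with ReLU (random illustrative weights)
W_hidden = rng.normal(size=(5, 4))
hidden = np.maximum(0, W_hidden @ features)   # ReLU keeps the positive signals

# Brake decision with Sigmoid (yes/no probability)
w_brake = rng.normal(size=5)
brake_prob = 1 / (1 + np.exp(-(w_brake @ hidden)))

# Steering with Tanh (-1 = full left, +1 = full right)
w_steer = rng.normal(size=5)
steering = np.tanh(w_steer @ hidden)

# Object in front with Softmax (car, person, sign)
W_object = rng.normal(size=(3, 5))
scores = W_object @ hidden
object_probs = np.exp(scores - scores.max()) / np.exp(scores - scores.max()).sum()

print("Brake probability:", brake_prob)
print("Steering angle:", steering)
print("Object probabilities (car, person, sign):", object_probs)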