Gradient Descent • Learning Rate • Underfitting & Overfitting • Bias-Variance Tradeoff • Train/Validation/Test Split
Training a neural network means teaching it to learn patterns from data. Just like students improve through practice and correction, neural networks improve through repeated learning steps. This chapter explains the key ideas behind how training works in Deep Learning: Gradient Descent, Learning Rate, Underfitting & Overfitting, Bias–Variance Tradeoff, and how we divide data into Train, Validation, and Test sets.
1. Gradient Descent
Gradient Descent is the main method used to train neural networks. It helps the model reduce errors and make better predictions.
Simple Explanation
Imagine standing on a hill in thick fog. You want to reach the lowest point (the valley).
You cannot see far, so you feel the ground and take small steps downward.
Each step gets you closer to the lowest point.
This is how Gradient Descent works:
- The "hill" = loss (error)
- The "lowest point" = best model performance
- Each "step downward" = updating weights and bias to reduce error
Why Gradient Descent Is Needed
- Neural networks make predictions.
- We compare predictions with correct answers to calculate loss.
- Gradient Descent changes weights and bias to reduce this loss.
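To make the update rule concrete, here is a minimal sketch of gradient descent on a toy loss, (w - 3)^2, whose lowest point is at w = 3. The starting weight, the learning rate, and the number of steps are illustrative choices for this sketch, not values taken from a real network.
# Toy loss: (w - 3)^2, minimized at w = 3. Its gradient is 2 * (w - 3).
weight = 0.0          # illustrative starting point on the "hill"
learning_rate = 0.1   # illustrative step size
for step in range(20):
    gradient = 2 * (weight - 3)                  # slope of the loss at the current weight
    weight = weight - learning_rate * gradient   # take a small step downhill
print("Final weight:", round(weight, 3))         # ends up close to 3, the lowest point
Each repeated step shrinks the distance to the minimum, which is exactly the "small steps down the hill" idea.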
2. Learning Rate
Learning Rate controls how big the steps are during Gradient Descent.
If Learning Rate is too high
- The model jumps around
- It may never reach the best point
- Training becomes unstable
If Learning Rate is too low
- Training becomes very slow
- It may get stuck and not reach the best solution
Ideal Learning Rate
- Not too high.
- Not too low.
- Just the right amount for smooth learning.
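The sketch below repeats the same kind of toy update with three illustrative learning rates (the values 1.1, 0.001, and 0.1 are arbitrary demonstration choices) to show the three behaviours described above.
# Toy loss (w - 3)^2 again; only the step size changes between runs.
def run_gradient_descent(learning_rate, steps=20):
    weight = 0.0
    for _ in range(steps):
        gradient = 2 * (weight - 3)
        weight -= learning_rate * gradient
    return weight
print("Too high (1.1):  ", run_gradient_descent(1.1))    # overshoots and blows up
print("Too low (0.001): ", run_gradient_descent(0.001))  # barely moves toward 3
print("Reasonable (0.1):", run_gradient_descent(0.1))    # settles close to 3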
3. Underfitting and Overfitting
Neural networks must generalize well, meaning they should work on new, unseen data—not just the data they were trained on.
A. Underfitting
The model is too simple and cannot learn the patterns properly.
Signs of Underfitting
- Poor accuracy on training data
- Poor accuracy on test data
Example
Trying to draw a curve using only a straight line.
The line is too simple to match the shape.
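A tiny numerical version of this example is sketched below: a straight line is forced onto data that actually follows a curve. The dataset and the use of NumPy's polyfit are illustrative choices.
import numpy as np
# The true relationship is a curve, but the model is restricted to a straight line.
x = np.linspace(-3, 3, 50)
y = x ** 2
line_coeffs = np.polyfit(x, y, deg=1)      # degree-1 fit = straight line
line_preds = np.polyval(line_coeffs, x)
print("Mean squared error of the line:", round(float(np.mean((y - line_preds) ** 2)), 3))
# The error stays large no matter how the line is placed: the model underfits.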
B. Overfitting
The model learns too much, including noise and unnecessary details.
Signs of Overfitting
- Very high accuracy on training data
- Low accuracy on test data
Example
A student memorizes past papers exactly.
But during the actual exam, they fail because the questions are slightly different.
This is overfitting.
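The sketch below shows the same behaviour with numbers: a very flexible curve is fit to a handful of noisy points whose true pattern is a straight line, and then both it and a plain line are checked on fresh points. The dataset, polynomial degrees, and random seed are illustrative choices.
import numpy as np
rng = np.random.default_rng(0)
# Small noisy training set whose true pattern is just y = x.
x_train = np.linspace(-1, 1, 10)
y_train = x_train + rng.normal(0, 0.2, size=10)
# Fresh, noise-free points standing in for the "exam questions".
x_new = np.linspace(-1, 1, 100)
y_new = x_new
flexible = np.polyfit(x_train, y_train, deg=9)   # wiggly curve through every noisy point
simple = np.polyfit(x_train, y_train, deg=1)     # plain straight line
for name, coeffs in [("degree 9", flexible), ("degree 1", simple)]:
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    new_err = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    print(f"{name}: training error {train_err:.4f}, new-data error {new_err:.4f}")
# The degree-9 fit matches the training points almost perfectly, but its error on the
# new points is typically much larger: it memorized the noise, which is overfitting.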
4. Bias–Variance Tradeoff
This concept explains the balance needed to avoid underfitting and overfitting.
Bias
Bias is error that comes from a model being too simple to capture the real pattern in the data.
High bias → underfitting.
Variance
Variance is error that comes from a model being too sensitive to its training data, which usually happens when the model is too complex.
High variance → overfitting.
Bias–Variance Tradeoff
A good model:
- Has low bias (it learns enough patterns)
- Has low variance (it does not memorize noise)
Deep Learning aims to find the perfect balance.
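A small sketch of this balance, under illustrative choices of data, noise, and polynomial degrees: part of the data is held out, and models of increasing complexity are compared on it. Low complexity tends to show high bias, very high complexity tends to show high variance, and something in between usually works best.
import numpy as np
rng = np.random.default_rng(1)
# Curved pattern plus noise; half the points are held out for checking.
x = np.linspace(-1, 1, 24)
y = 2 * x ** 2 + rng.normal(0, 0.1, size=24)
x_train, y_train = x[::2], y[::2]   # every other point trains the model
x_val, y_val = x[1::2], y[1::2]     # the rest estimate how well it generalizes
for degree in [1, 2, 9]:
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"degree {degree}: train {train_err:.4f}, held-out {val_err:.4f}")
# degree 1 is too simple (high bias), degree 9 tends to chase noise (high variance),
# and degree 2 usually gives the best held-out error here.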
5. Train / Validation / Test Split
To train a model properly, we divide the dataset into three parts:
A. Training Set
- Largest portion of the data
- Used to teach the model
- The model adjusts weights using this data
B. Validation Set
- Used during training to check how well the model generalizes
- Helps select:
- Best learning rate
- Best number of layers
- Best number of training epochs
- Prevents overfitting
C. Test Set
- Used after training is complete
- Checks final performance on unseen data
- Tells how well the model works in real life
6. Code Examples (Simple and Beginner-Friendly)
Below are easy examples showing key training concepts.
A. Train/Validation/Test Split Example (Python)
from sklearn.model_selection import train_test_split
import numpy as np
# Sample dataset
X = np.random.rand(1000, 4)
y = np.random.randint(0, 2, 1000)
# 60% train, 20% validation, 20% test
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5)
print("Train size:", len(X_train))
print("Validation size:", len(X_val))
print("Test size:", len(X_test))B. Simple Neural Network Training
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
# Basic neural network
model = Sequential([
    Dense(8, activation='relu', input_shape=(4,)),
    Dense(1, activation='sigmoid')
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    loss='binary_crossentropy',
    metrics=['accuracy']
)
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=10,
    batch_size=32
)

C. Gradient Descent Demonstration (Manual)
import numpy as np
# Example of updating weight using gradient descent
weight = 2.0 # initial value
learning_rate = 0.1
gradient = 0.6 # example gradient
# Update rule
new_weight = weight - learning_rate * gradient
print("Updated weight:", new_weight)D. Overfitting Detection (Using Training Curves)
import matplotlib.pyplot as plt
# Example training history (dummy values)
train_loss = [0.9, 0.6, 0.4, 0.3, 0.25]
val_loss = [1.0, 0.8, 0.7, 0.9, 1.2] # validation loss increases = overfitting
plt.plot(train_loss, label='Train Loss')
plt.plot(val_loss, label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

7. Real-World Example
Imagine Training a Model to Recognize Cats
- Training set: 10,000 cat photos
- Validation set: 2,000 cat photos
- Test set: 2,000 cat photos
If training accuracy is 99% but test accuracy is only 70%, overfitting has occurred.
The model memorized the training images instead of learning real patterns.
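A quick way to spot this in practice is to compare the two accuracies directly, as in the small sketch below (the 10-point gap used as a threshold is only an illustrative rule of thumb, not a fixed standard).
# Accuracies from the cat-recognition example above.
train_accuracy = 0.99
test_accuracy = 0.70
gap = train_accuracy - test_accuracy
if gap > 0.10:   # illustrative threshold for a "suspiciously large" gap
    print(f"Train/test accuracy gap of {gap:.2f} -> likely overfitting.")
else:
    print("Train and test accuracy are close -> the model generalizes reasonably well.")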