Gradient Descent • Learning Rate • Underfitting & Overfitting • Bias-Variance Tradeoff • Train/Validation/Test Split
Training a neural network means teaching it to learn patterns from data. Just like students improve through practice and correction, neural networks improve through repeated learning steps. This chapter explains the key ideas behind how training works in Deep Learning: Gradient Descent, Learning Rate, Underfitting & Overfitting, Bias–Variance Tradeoff, and how we divide data into Train, Validation, and Test sets.
1. Gradient Descent
Gradient Descent is the main method used to train neural networks. It helps the model reduce errors and make better predictions.
Simple Explanation
Imagine standing on a hill in thick fog. You want to reach the lowest point (the valley).
You cannot see far, so you feel the ground and take small steps downward.
Each step gets you closer to the lowest point.
This is how Gradient Descent works:
- The "hill" = loss (error)
- The "lowest point" = best model performance
- Each "step downward" = updating weights and bias to reduce error
Why Gradient Descent Is Needed
- Neural networks make predictions.
- We compare predictions with correct answers to calculate loss.
- Gradient Descent changes weights and bias to reduce this loss.
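To make the update rule concrete, here is a minimal sketch of gradient descent on a toy loss, (w - 3)^2, whose lowest point is at w = 3. The starting weight, the learning rate, and the number of steps are illustrative choices for this sketch, not values taken from a real network.
# Toy loss: (w - 3)^2, minimized at w = 3. Its gradient is 2 * (w - 3).
weight = 0.0          # illustrative starting point on the "hill"
learning_rate = 0.1   # illustrative step size
for step in range(20):
    gradient = 2 * (weight - 3)                  # slope of the loss at the current weight
    weight = weight - learning_rate * gradient   # take a small step downhill
print("Final weight:", round(weight, 3))         # ends up close to 3, the lowest point
Each repeated step shrinks the distance to the minimum, which is exactly the "small steps down the hill" idea.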
2. Learning Rate
Learning Rate controls how big the steps are during Gradient Descent.
If Learning Rate is too high
- The model jumps around
- It may never reach the best point
- Training becomes unstable
If Learning Rate is too low
- Training becomes very slow
- It may get stuck and not reach the best solution
Ideal Learning Rate
- Not too high.
- Not too low.
- Just the right amount for smooth learning.
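The sketch below repeats the same kind of toy update with three illustrative learning rates (the values 1.1, 0.001, and 0.1 are arbitrary demonstration choices) to show the three behaviours described above.
# Toy loss (w - 3)^2 again; only the step size changes between runs.
def run_gradient_descent(learning_rate, steps=20):
    weight = 0.0
    for _ in range(steps):
        gradient = 2 * (weight - 3)
        weight -= learning_rate * gradient
    return weight
print("Too high (1.1):  ", run_gradient_descent(1.1))    # overshoots and blows up
print("Too low (0.001): ", run_gradient_descent(0.001))  # barely moves toward 3
print("Reasonable (0.1):", run_gradient_descent(0.1))    # settles close to 3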
3. Underfitting and Overfitting
Neural networks must generalize well, meaning they should work on new, unseen data—not just the data they were trained on.
A. Underfitting
The model is too simple and cannot learn the patterns properly.
Signs of Underfitting
- Poor accuracy on training data
- Poor accuracy on test data
Example
Trying to draw a curve using only a straight line.
The line is too simple to match the shape.
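A tiny numerical version of this example is sketched below: a straight line is forced onto data that actually follows a curve. The dataset and the use of NumPy's polyfit are illustrative choices.
import numpy as np
# The true relationship is a curve, but the model is restricted to a straight line.
x = np.linspace(-3, 3, 50)
y = x ** 2
line_coeffs = np.polyfit(x, y, deg=1)      # degree-1 fit = straight line
line_preds = np.polyval(line_coeffs, x)
print("Mean squared error of the line:", round(float(np.mean((y - line_preds) ** 2)), 3))
# The error stays large no matter how the line is placed: the model underfits.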
B. Overfitting
The model learns too much, including noise and unnecessary details.
Signs of Overfitting
- Very high accuracy on training data
- Low accuracy on test data
Example
A student memorizes past papers exactly.
But during the actual exam, they fail because the questions are slightly different.
This is overfitting.
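The sketch below shows the same behaviour with numbers: a very flexible curve is fit to a handful of noisy points whose true pattern is a straight line, and then both it and a plain line are checked on fresh points. The dataset, polynomial degrees, and random seed are illustrative choices.
import numpy as np
rng = np.random.default_rng(0)
# Small noisy training set whose true pattern is just y = x.
x_train = np.linspace(-1, 1, 10)
y_train = x_train + rng.normal(0, 0.2, size=10)
# Fresh, noise-free points standing in for the "exam questions".
x_new = np.linspace(-1, 1, 100)
y_new = x_new
flexible = np.polyfit(x_train, y_train, deg=9)   # wiggly curve through every noisy point
simple = np.polyfit(x_train, y_train, deg=1)     # plain straight line
for name, coeffs in [("degree 9", flexible), ("degree 1", simple)]:
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    new_err = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    print(f"{name}: training error {train_err:.4f}, new-data error {new_err:.4f}")
# The degree-9 fit matches the training points almost perfectly, but its error on the
# new points is typically much larger: it memorized the noise, which is overfitting.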
4. Bias–Variance Tradeoff
This concept explains the balance needed to avoid underfitting and overfitting.
Bias
Bias is error that comes from a model being too simple to capture the real pattern in the data.
High bias → underfitting.
Variance
Variance is error that comes from a model being too sensitive to its training data, which usually happens when the model is too complex.
High variance → overfitting.
Bias–Variance Tradeoff
A good model:
- Has low bias (it learns enough patterns)
- Has low variance (it does not memorize noise)
Deep Learning aims to find the perfect balance.
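A small sketch of this balance, under illustrative choices of data, noise, and polynomial degrees: part of the data is held out, and models of increasing complexity are compared on it. Low complexity tends to show high bias, very high complexity tends to show high variance, and something in between usually works best.
import numpy as np
rng = np.random.default_rng(1)
# Curved pattern plus noise; half the points are held out for checking.
x = np.linspace(-1, 1, 24)
y = 2 * x ** 2 + rng.normal(0, 0.1, size=24)
x_train, y_train = x[::2], y[::2]   # every other point trains the model
x_val, y_val = x[1::2], y[1::2]     # the rest estimate how well it generalizes
for degree in [1, 2, 9]:
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"degree {degree}: train {train_err:.4f}, held-out {val_err:.4f}")
# degree 1 is too simple (high bias), degree 9 tends to chase noise (high variance),
# and degree 2 usually gives the best held-out error here.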
5. Train / Validation / Test Split
To train a model properly, we divide the dataset into three parts:
A. Training Set
- Largest portion of the data
- Used to teach the model
- The model adjusts weights using this data
B. Validation Set
- Used during training to check how well the model generalizes
- Helps select:
- Best learning rate
- Best number of layers
- Best number of training epochs
- Prevents overfitting
C. Test Set
- Used after training is complete
- Checks final performance on unseen data
- Tells how well the model works in real life
6. Code Examples (Simple and Beginner-Friendly)
Below are easy examples showing key training concepts.
A. Train/Validation/Test Split Example (Python)
from sklearn.model_selection import train_test_split
import numpy as np
# Sample dataset
X = np.random.rand(1000, 4)
y = np.random.randint(0, 2, 1000)
# 60% train, 20% validation, 20% test
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5)
print("Train size:", len(X_train))
print("Validation size:", len(X_val))
print("Test size:", len(X_test))B. Simple Neural Network Training
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
# Basic neural network
model = Sequential([
    Dense(8, activation='relu', input_shape=(4,)),
    Dense(1, activation='sigmoid')
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    loss='binary_crossentropy',
    metrics=['accuracy']
)
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=10,
    batch_size=32
)

C. Gradient Descent Demonstration (Manual)
import numpy as np
# Example of updating weight using gradient descent
weight = 2.0 # initial value
learning_rate = 0.1
gradient = 0.6 # example gradient
# Update rule
new_weight = weight - learning_rate * gradient
print("Updated weight:", new_weight)D. Overfitting Detection (Using Training Curves)
import matplotlib.pyplot as plt
# Example training history (dummy values)
train_loss = [0.9, 0.6, 0.4, 0.3, 0.25]
val_loss = [1.0, 0.8, 0.7, 0.9, 1.2] # validation loss increases = overfitting
plt.plot(train_loss, label='Train Loss')
plt.plot(val_loss, label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

7. Real-World Example
Imagine Training a Model to Recognize Cats
- Training set: 10,000 cat photos
- Validation set: 2,000 cat photos
- Test set: 2,000 cat photos
If training accuracy is 99% but test accuracy is only 70%, overfitting has occurred.
The model memorized the training images instead of learning real patterns.
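A quick way to spot this in practice is to compare the two accuracies directly, as in the small sketch below (the 10-point gap used as a threshold is only an illustrative rule of thumb, not a fixed standard).
# Accuracies from the cat-recognition example above.
train_accuracy = 0.99
test_accuracy = 0.70
gap = train_accuracy - test_accuracy
if gap > 0.10:   # illustrative threshold for a "suspiciously large" gap
    print(f"Train/test accuracy gap of {gap:.2f} -> likely overfitting.")
else:
    print("Train and test accuracy are close -> the model generalizes reasonably well.")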