Advancing Your ML Skills
1. Train/Test Split
When we train a machine learning model, we want to be sure it works well on new data, not only the data it has seen. To check this, we divide the dataset into two parts:
Training Set
Used to train (teach) the model.
Test Set
Used to evaluate (check) the model.
A common split:
- 70% training
- 30% testing
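To make the idea of a split concrete, here is a minimal sketch of what splitting amounts to: shuffle the row indices, then cut them into two groups. The helper name `manual_train_test_split` and its parameters are hypothetical and used only for illustration; this is not how scikit-learn implements its own splitter, and rounding rules may differ.

```python
import random

def manual_train_test_split(X, y, test_fraction=0.3, seed=42):
    """Hypothetical helper: shuffle row indices, then cut them into two parts."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is repeatable

    # Number of test rows (rounded here; a library may round differently).
    n_test = max(1, round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 1, 1, 1, 1]
X_train, X_test, y_train, y_test = manual_train_test_split(X, y)
print("Training rows:", len(X_train))
print("Testing rows:", len(X_test))
```

Shuffling before cutting matters: if the data is sorted by label, an unshuffled split could leave one class out of the training set entirely.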
Python Example
from sklearn.model_selection import train_test_split
X = [[1],[2],[3],[4],[5],[6]]
y = [0,0,1,1,1,1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print("Training Data:", X_train)
print("Testing Data:", X_test)
2. Evaluation Metrics (Accuracy, Precision, Recall, F1 Score)
We use these metrics to measure how well a model performs.
Accuracy
Fraction of all predictions that are correct.
Precision
Out of predicted positives, how many were correct?
Recall
Out of real positives, how many did we detect?
F1 Score
The harmonic mean of precision and recall; it is high only when both are high.
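The four definitions above can be worked out by hand from counts of true/false positives and negatives. The sketch below does that for a small made-up set of labels; it assumes class 1 is the "positive" class and that no denominator is zero (real code would need to guard against that case).

```python
y_true = [1, 0, 1, 1, 0]  # actual labels
y_pred = [1, 0, 0, 1, 0]  # labels predicted by some model

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(y_true)   # correct predictions / all predictions
precision = tp / (tp + fp)           # correct positives / predicted positives
recall = tp / (tp + fn)              # correct positives / real positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", round(recall, 3))
print("F1 Score:", round(f1, 3))
```

Here the model never raises a false alarm (precision is 1.0) but misses one real positive, which is why recall is lower than precision.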
Python Example
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
y_true = [1,0,1,1,0]
y_pred = [1,0,0,1,0]
print("Accuracy:", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1 Score:", f1_score(y_true, y_pred))
3. Classification Models
Below are some of the most common ML models used for classification.
3.1 Logistic Regression
Most often used for binary (two-class) classification, such as:
- Spam or Not Spam
- Pass or Fail
- Yes or No
Python Example
from sklearn.linear_model import LogisticRegression
import numpy as np
X = np.array([[1],[2],[3],[4],[5]])
y = np.array([0,0,1,1,1])
model = LogisticRegression()
model.fit(X, y)
print(model.predict([[3]]))
3.2 Decision Tree
A model that makes decisions by asking simple yes/no questions (like a flowchart).
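To see the flowchart-like questions a fitted tree actually asks, scikit-learn's `export_text` can print the learned rules as text. This is a small sketch on toy data; the feature name `"x"` is just a label chosen here for readability.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1], [2], [3], [4]]
y = [0, 0, 1, 1]

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Render the learned yes/no questions as an indented text flowchart.
rules = export_text(model, feature_names=["x"])
print(rules)
print("Tree depth:", model.get_depth())
```

With data this simple, a single question on `x` is enough to separate the two classes, so the tree has depth 1.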
Python Example
from sklearn.tree import DecisionTreeClassifier
X = [[1],[2],[3],[4]]
y = [0,0,1,1]
model = DecisionTreeClassifier()
model.fit(X, y)
print(model.predict([[3]]))
3.3 Random Forest
A Random Forest is a group of many decision trees. They "vote" to give the final prediction.
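The voting idea can be inspected directly, because a fitted scikit-learn forest exposes its individual trees in `estimators_`. The sketch below tallies what each tree predicts for one input. One caveat: scikit-learn combines trees by averaging their predicted probabilities rather than counting hard votes, so the tally is an illustration of the idea and may not always match the forest's final answer exactly.

```python
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

X = [[1], [2], [3], [4]]
y = [0, 0, 1, 1]

# random_state is fixed only so repeated runs give the same trees.
model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(X, y)

# Each fitted tree returns an index into model.classes_, so map it back to a label.
votes = [int(model.classes_[int(tree.predict([[3]])[0])])
         for tree in model.estimators_]
tally = Counter(votes)

print("Votes per class:", dict(tally))
print("Forest prediction:", model.predict([[3]])[0])
```

Trees can disagree because each one is trained on a different random sample of the rows.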
Python Example
from sklearn.ensemble import RandomForestClassifier
X = [[1],[2],[3],[4]]
y = [0,0,1,1]
model = RandomForestClassifier(n_estimators=10, random_state=42)  # fixed seed for repeatable results
model.fit(X, y)
print(model.predict([[3]]))
3.4 KNN (K-Nearest Neighbors)
KNN classifies a new point by looking at the k closest training points and taking the majority class among them.
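A short sketch of that neighbor lookup, using the `kneighbors` method of a fitted `KNeighborsClassifier` to show which training points are consulted for a query and how their labels decide the result. The toy data is made up for illustration.

```python
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier

X = [[1], [2], [3], [4], [5]]
y = [0, 0, 1, 1, 1]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# Find the 3 training points closest to the query value 3.
distances, indices = model.kneighbors([[3]])
neighbor_labels = [y[i] for i in indices[0]]
majority_label = Counter(neighbor_labels).most_common(1)[0][0]

print("Neighbor values:", [X[i][0] for i in indices[0]])
print("Neighbor labels:", neighbor_labels)
print("Majority label:", majority_label)
print("Model prediction:", model.predict([[3]])[0])
```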
Python Example
from sklearn.neighbors import KNeighborsClassifier
X = [[1],[2],[3],[4],[5]]
y = [0,0,1,1,1]
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)
print(model.predict([[3]]))
3.5 SVM (Support Vector Machine)
SVM looks for the boundary (a line in two dimensions, a hyperplane in general) that separates the classes with the widest possible margin.
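With a linear kernel, the learned boundary can be read off the fitted model through `coef_` and `intercept_`. The sketch below does this for one-dimensional toy data, where the "line" reduces to a single cut-off value. The setting `C=1000` is an assumption made here so that the fit allows almost no margin violations; it is not a recommended default.

```python
from sklearn import svm

X = [[1], [2], [3], [4]]
y = [0, 0, 1, 1]

# A large C (assumed here) leaves almost no room for margin violations.
model = svm.SVC(kernel="linear", C=1000)
model.fit(X, y)

w = model.coef_[0][0]      # weight of the single feature
b = model.intercept_[0]    # bias term
boundary = -b / w          # the x value where w * x + b = 0

print("Support vectors:", model.support_vectors_.ravel())
print("Boundary at x =", round(boundary, 2))
print("Predictions for 1.5 and 3.5:", model.predict([[1.5], [3.5]]))
```

Points on either side of the boundary get different classes; the support vectors are the training points that pin the boundary in place.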
Python Example
from sklearn import svm
X = [[1],[2],[3],[4]]
y = [0,0,1,1]
model = svm.SVC(kernel='linear')
model.fit(X, y)
print(model.predict([[3]]))
4. Working With Datasets
Datasets may come from:
- CSV files
- Excel files
- Online sources
- Built-in libraries
Common steps:
- Load data
- Clean missing values
- Explore data
- Split data
- Train the model
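The five steps above can be sketched end to end on a dataset that ships with scikit-learn, so no external file is needed. The choice of the Iris dataset and of logistic regression is an assumption made for illustration; any dataset and classifier from this guide would follow the same pattern.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1. Load data (built-in dataset, returned as a pandas DataFrame)
df = load_iris(as_frame=True).frame

# 2. Clean missing values (Iris has none, but the step is shown for completeness)
df = df.dropna()

# 3. Explore data
print(df.describe())

# 4. Split data into features/labels, then into training and test sets
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 5. Train the model, then check it on the held-out test set
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print("Test accuracy:", accuracy)
```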
Python Example
import pandas as pd
df = pd.read_csv("students.csv")
print(df.head())
df = df.dropna()
print(df.describe())
5. Introduction to Neural Networks
Neural networks are loosely inspired by the human brain. They are built from layers of connected units:
- Input Layer
- Hidden Layers
- Output Layer
Used in:
- Image recognition
- Speech recognition
- Chatbots
- Deep learning systems
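To show what the input, hidden, and output layers actually compute, here is a minimal forward pass written with NumPy only. The layer sizes (3 inputs, 4 hidden units, 1 output) and the random weights are assumptions chosen for illustration; a real network would learn its weights during training.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the random weights are repeatable

# Input layer: one sample with 3 features (made-up values).
x = np.array([[0.5, -1.0, 2.0]])

# Hidden layer: 4 units, each a weighted sum of the inputs followed by ReLU.
W1 = rng.normal(size=(3, 4))
b1 = np.zeros(4)
hidden = np.maximum(0, x @ W1 + b1)

# Output layer: 1 unit with a sigmoid, giving a value between 0 and 1.
W2 = rng.normal(size=(4, 1))
b2 = np.zeros(1)
output = 1 / (1 + np.exp(-(hidden @ W2 + b2)))

print("Hidden layer shape:", hidden.shape)
print("Output layer shape:", output.shape)
print("Output value:", output[0, 0])
```

Each layer is just a matrix multiplication plus a simple non-linear function; frameworks such as TensorFlow and PyTorch automate this and the training of the weights.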
Python Example
import tensorflow as tf
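model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),  # assumed: one input feature; lets Keras build the model before summary()
    tf.keras.layers.Dense(4, activation='relu'),    # hidden layer
    tf.keras.layers.Dense(1, activation='sigmoid')  # output layer (probability of class 1)
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()  # prints the layer table itself; wrapping it in print() would also print "None"
6. Introduction to TensorFlow and PyTorch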
TensorFlow and PyTorch are the two most popular deep learning frameworks.
TensorFlow
Created by Google; its high-level Keras API makes it approachable for beginners.
PyTorch
Created by Meta; popular for research.
TensorFlow Example
import tensorflow as tf
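model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),  # assumed: one input feature, matching the PyTorch example
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse')
model.summary()  # prints the layer table itself; wrapping it in print() would also print "None"
PyTorch Example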
import torch
import torch.nn as nn
model = nn.Sequential(
    nn.Linear(1, 8),
    nn.ReLU(),
    nn.Linear(8, 1)
)
print(model)