Getting Started with Pandas for Data Handling

Introduction

Imagine you have a spreadsheet with rows and columns—like a list of students, their grades, or a shop's sales. Pandas is a Python tool that lets you handle these "tables" easily. It helps you read, clean, explore, and change data with just a few lines of code. In this lesson, you'll learn the basics to get started quickly and confidently.

Audience and Goal

For Grade 8–9 students learning Python for real-world data tasks
Goal: Learn what pandas is, how to load and explore a table of data, make simple changes, and save your work

What You Need

Python installed (3.8+ is great)
pandas library (install using: pip install pandas)
A code editor or notebook (IDLE, VS Code, Jupyter, etc.)

Step-by-Step Guide

1) Meet Pandas

Pandas is a Python library for working with data in tables.

A DataFrame is pandas' word for "a table with rows and columns," like a spreadsheet.
A Series is one column of that table.

2) Install and Import Pandas

Open your terminal/command prompt and run:

pip install pandas

In your Python file or notebook, start with:

import pandas as pd

3) Make Your First Table (DataFrame)

You can create a table from a Python dictionary (key-value pairs).
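
As a minimal sketch of that idea, the snippet below builds a tiny table from a dictionary and then pulls out one column. The names pets, name, and legs are made up for illustration.

```python
import pandas as pd

# Each dictionary key becomes a column name;
# each list holds that column's values, one per row.
pets = pd.DataFrame({
    "name": ["Milo", "Kiwi", "Rex"],
    "legs": [4, 2, 4],
})
print(pets)

# Picking one column by name gives a Series
legs = pets["legs"]
print(type(pets).__name__)  # DataFrame
print(type(legs).__name__)  # Series
```

All the lists should have the same length, because each position in the lists describes one row.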

4) Read Data from a CSV File

CSV files are simple text files that store tables. Each row is a line, and commas separate the values. Pandas can load CSV files with one function call.
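
To see how the text in a CSV lines up with a table, here is a small sketch that reads CSV text from a string instead of a file. The city and temperature data are invented for this example; with a real file you would pass the file name to pd.read_csv instead.

```python
import pandas as pd
from io import StringIO  # lets a string stand in for a file

# The first line holds the column names; every later line is one row
csv_text = """city,temperature
Lima,19.5
Oslo,4.0
"""

weather = pd.read_csv(StringIO(csv_text))
print(weather)
print(list(weather.columns))  # ['city', 'temperature']
print(len(weather))           # 2
```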

5) Explore Your Data

head(): shows the first few rows
info(): shows columns and data types
describe(): gives simple stats (counts, min, max, average for numbers)
shape: tells you how many rows and columns

6) Select and Filter

Select a column by name (like picking one column from a spreadsheet).
Filter rows to keep only the ones you care about (like "only items with price > 10").
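
Filtering is easier to follow when you split it into two steps. The sketch below uses a made-up shop table (the items and prices are assumptions for illustration) and the "price > 10" rule from above.

```python
import pandas as pd

# Made-up shop data for illustration
shop = pd.DataFrame({
    "item": ["Pen", "Backpack", "Calculator"],
    "price": [1.5, 25.0, 12.0],
})

# Step 1: ask a yes/no question about every row
is_pricey = shop["price"] > 10
print(is_pricey.tolist())  # [False, True, True]

# Step 2: keep only the rows where the answer is True
pricey_items = shop[is_pricey]
print(pricey_items)
```

Writing shop[shop["price"] > 10] on one line does the same thing; the two-step version just makes the True/False answers visible.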

7) Add New Columns and Handle Missing Values

Create new columns from other columns (like "total price = price x quantity").
Missing values can be filled with a default value (like 0 or "Unknown") or removed.
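
Both ideas can be sketched in a few lines. The orders table below is invented for illustration; it has one missing color, which gets filled with the text "Unknown".

```python
import pandas as pd

# Made-up order data; None marks a missing color
orders = pd.DataFrame({
    "item": ["Pen", "Eraser", "Ruler"],
    "color": ["blue", None, "green"],
    "price": [1.5, 0.5, 2.0],
    "quantity": [4, 10, 1],
})

# New column built from two existing columns, row by row
orders["total_price"] = orders["price"] * orders["quantity"]

# Replace the missing color with a default text value
orders["color"] = orders["color"].fillna("Unknown")
print(orders)
```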

8) Save Your Work

Save your cleaned or updated table back to a CSV file using to_csv.
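
One way to preview what will be saved, without creating a file, is to call to_csv with no file name: it then returns the CSV text. The small scores table here is made up for illustration.

```python
import pandas as pd

scores = pd.DataFrame({"name": ["Ava", "Ben"], "score": [88, 92]})

# With no file name, to_csv returns the CSV text instead of writing a file
without_index = scores.to_csv(index=False)
print(without_index)

# Without index=False, the row labels (0, 1, ...) become an extra first column
with_index = scores.to_csv()
print(with_index)
```

Comparing the two printouts shows why index=False is usually what you want when saving a table for someone else to open.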

Python Code Examples

Example 1: Create and Explore a DataFrame

What it does: builds a small table in code and explores it.

import pandas as pd

# Create a small table (DataFrame) from a dictionary
data = {
    "name": ["Ava", "Ben", "Cara", "Don", "Eva"],
    "age": [14, 15, 14, 16, 15],
    "score": [88, 92, 79, 85, 90]
}

df = pd.DataFrame(data)

# Look at the first few rows
print("First rows:")
print(df.head())

# Check the size (rows, columns)
print("Shape (rows, columns):", df.shape)

# See basic info (column types, non-missing counts)
# info() prints its report itself and returns None, so don't wrap it in print()
print("\nInfo:")
df.info()

# Summary stats for number columns (count, mean, min, max, etc.)
print("\nDescribe:")
print(df.describe())

Example 2: Read, Select, Filter, Sort, and Add a Column

What it does: reads a CSV (we'll simulate a CSV in memory), selects columns, filters rows, adds a new column, and sorts by it.

import pandas as pd
from io import StringIO  # Lets us pretend a string is a file

# Simulated CSV content
csv_text = """item,category,price,quantity
Pencil,Stationery,0.99,10
Notebook,Stationery,2.49,5
Apple,Food,0.50,12
Granola Bar,Food,1.20,8
Water Bottle,Other,3.00,3
"""

# Read CSV from the string (in real life, use pd.read_csv("filename.csv"))
df = pd.read_csv(StringIO(csv_text))

print("Original data:")
print(df)

# Select a single column (a Series)
print("\nPrices column:")
print(df["price"])

# Select multiple columns (a new DataFrame)
print("\nItem and price columns:")
print(df[["item", "price"]])

# Filter rows: keep only items that cost at least $1.00
expensive = df[df["price"] >= 1.00]
print("\nItems costing at least $1.00:")
print(expensive)

# Add a new column: total value = price * quantity
df["total_value"] = df["price"] * df["quantity"]
print("\nAdded total_value column:")
print(df)

# Sort by total_value (descending: biggest first)
sorted_df = df.sort_values(by="total_value", ascending=False)
print("\nSorted by total_value (biggest first):")
print(sorted_df)

Tip:

If you have a real CSV file named shop.csv in the same folder, you can do:

df = pd.read_csv("shop.csv")

Example 3: Handling Missing Values (NaN)

What it does: shows how to find, fill, and drop missing data.

import pandas as pd
import numpy as np

data = {
    "name": ["Ava", "Ben", "Cara", "Don", "Eva"],
    "age": [14, np.nan, 14, 16, np.nan],   # np.nan represents a missing value
    "score": [88, 92, None, 85, 90]        # None also becomes a missing value in pandas
}

df = pd.DataFrame(data)
print("Original data with missing values:")
print(df)

# Check how many missing values each column has
print("\nMissing value counts:")
print(df.isna().sum())

# Option 1: Fill missing ages with a default value (e.g., 15)
df_filled = df.copy()
df_filled["age"] = df_filled["age"].fillna(15)

# Fill missing scores with the average score (mean)
mean_score = df_filled["score"].mean()
df_filled["score"] = df_filled["score"].fillna(mean_score)

print("\nAfter filling missing values:")
print(df_filled)

# Option 2: Drop rows with any missing values (use carefully!)
df_dropped = df.dropna()
print("\nAfter dropping rows with missing values:")
print(df_dropped)

Saving Your Results

To save your cleaned table to a new CSV file (here df stands for whichever DataFrame you want to keep, such as df_filled from Example 3):

df.to_csv("cleaned_data.csv", index=False)

Small Practical Exercise: Mini Movie Rentals 🎬

Scenario: You help a small movie rental kiosk look at simple data.

Starter Data

(you can copy this into your code with StringIO like in Example 2):

item,genre,price_per_day,days_rented
Movie A,Action,3.5,2
Movie B,Comedy,2.0,5
Movie C,Action,4.0,1
Movie D,Drama,2.5,3
Movie E,Comedy,2.0,2

Tasks:

Load the CSV into a DataFrame.
Add a new column total_cost = price_per_day * days_rented.
Show only the rows where total_cost is at least 7.0.
Sort the whole table by total_cost from highest to lowest.
Save the sorted table to a CSV file named rentals_report.csv (no index).

Hints:

Use pd.read_csv with StringIO (or a real file).
For filtering, use df[df["total_cost"] >= 7.0].
For sorting, use df.sort_values(by="total_cost", ascending=False).
For saving, use df.to_csv("rentals_report.csv", index=False).

Challenge (optional):

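Which genre appears most often in this tiny dataset? Try df["genre"].value_counts(), which counts the rows for each genre. To total the days rented per genre instead, try df.groupby("genre")["days_rented"].sum().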

Recap

Pandas helps you work with table-like data (DataFrames) in Python.
You learned how to create a DataFrame, read CSVs, explore data (head, info, describe), select and filter rows, add new columns, handle missing values, sort, and save your work.
With these basics, you can start analyzing real-world data sets confidently.

