Project: Predicting the Future - Points, Performance & Breakouts (Baseball)

Introduction

Baseball prediction can start simple. If you know a hitter's recent hit rate and expected at-bats, you can estimate next-game production.

In this project, you will build a lightweight prediction model and flag breakout-watch performances.

What we're building

By the end of this project, you will:

Store player data in dictionaries
Detect trends in recent production
Predict next-game hits from recent rates
Classify breakout potential with clear rules

Why this matters

If you can build and explain a baseline model, you are learning the same workflow used in larger analytics systems.

Part 1: Quick Start

Set up your project in Cursor

For this project, you will create your files from scratch (no starter pack download).

Before you start: quick vocabulary

A folder is a container that holds files.
A file is one item inside a folder.
A file extension is the ending in the file name:
- .py means Python code.
- .csv means table-style data.

Pick an easy location for your project folder:
- Mac: Desktop in Finder
- Windows: Desktop in File Explorer
- Chromebook: My files in the Files app
Create a folder called baseball-performance-predictor.
In Cursor, go to File > Open Folder and open baseball-performance-predictor.
In Cursor's left file explorer:
- click New Folder and create data
- click New File and create performance_predictor.py
- open data, click New File, and create recent_games.csv
If you do not see file extensions on your computer, that is okay. Type the full names exactly, including .py and .csv.

Writing the code

Type this into performance_predictor.py:

recent_hits = [1, 2, 3, 1, 2]
recent_at_bats = [4, 5, 5, 4, 5]
projected_at_bats = 4

hits_per_at_bat = sum(recent_hits) / sum(recent_at_bats)
predicted_hits = hits_per_at_bat * projected_at_bats

print(f"Predicted hits next game: {predicted_hits:.2f}")

You just built a simple model: prediction = hits per at-bat x projected at-bats

Running the code

Save the file (Cmd + S on Mac, Ctrl + S on Windows/Chromebook).

Run the file with Cursor's built-in Run Python File play button in the top-right.

If the play button is missing, install/enable the Python extension in Cursor and reopen performance_predictor.py.

Expected output:

Predicted hits next game: 1.60

If your output matches, your setup is working.

Part 2: Project Milestones

Milestone 1: Store player data in a dictionary

Dictionaries let you name each value so your code stays readable.

Add this below your existing code:

player = {
    "name": "Diego Alvarez",
    "team": "Greenville Hawks",
    "season_average_hits": 1.35,
    "recent_hits": [1, 2, 3, 1, 2],
    "recent_at_bats": [4, 5, 5, 4, 5]
}

print(player["name"])
print(player["recent_hits"])

Expected output:

Diego Alvarez
[1, 2, 3, 1, 2]

Milestone 2: Build reusable prediction functions

Now wrap your model logic into reusable functions.

Add this below Milestone 1:

def average(values):
    return sum(values) / len(values)

def predict_hits(recent_hits, recent_at_bats, projected_at_bats):
    hits_per_at_bat = sum(recent_hits) / sum(recent_at_bats)
    return hits_per_at_bat * projected_at_bats

projected_at_bats = 4
prediction = predict_hits(
    player["recent_hits"],
    player["recent_at_bats"],
    projected_at_bats
)

print(f"{player['name']} predicted hits: {prediction:.2f}")

Expected output:

Diego Alvarez predicted hits: 1.60

Milestone 3: Detect trend direction

Prediction is not just one number. You also want to know if recent games are trending up or down.

Add this below your current code:

recent = player["recent_hits"]
first_half_avg = average(recent[:2])
second_half_avg = average(recent[-2:])

if second_half_avg > first_half_avg:
    trend = "up"
elif second_half_avg < first_half_avg:
    trend = "down"
else:
    trend = "flat"

print(f"Trend: {trend}")

Expected output:

Trend: up

Milestone 4: Add breakout watch logic

Now convert prediction + season average into a label.

Add this below your current code:

def breakout_label(predicted_value, season_average, breakout_margin):
    if predicted_value >= season_average + breakout_margin:
        return "Breakout watch"
    if predicted_value >= season_average:
        return "On pace"
    return "Below season pace"

tag = breakout_label(prediction, player["season_average_hits"], 0.35)
print(f"Tag: {tag}")

Expected output:

Tag: On pace

Milestone 5: Load recent game logs from CSV

Now keep game data in a CSV so you can update predictions without editing Python logic.

Open data/recent_games.csv.
Paste the data below.
Save the file.
Make sure the file name is exactly recent_games.csv (not recent_games.csv.txt).

Paste this into data/recent_games.csv:

date,hits,at_bats
2026-02-01,1,4
2026-02-03,2,5
2026-02-05,3,5
2026-02-07,1,4
2026-02-10,2,5

Now load that CSV in Python and re-run the prediction from file data. Add this below your existing code:

import csv

recent_hits = []
recent_at_bats = []

with open("data/recent_games.csv", "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        recent_hits.append(float(row["hits"]))
        recent_at_bats.append(float(row["at_bats"]))

player["recent_hits"] = recent_hits
player["recent_at_bats"] = recent_at_bats
prediction = predict_hits(recent_hits, recent_at_bats, projected_at_bats)

first_half_avg = average(recent_hits[:2])
second_half_avg = average(recent_hits[-2:])
if second_half_avg > first_half_avg:
    trend = "up"
elif second_half_avg < first_half_avg:
    trend = "down"
else:
    trend = "flat"

print(f"CSV-based prediction: {prediction:.2f}")

Click Run Python File again and verify your CSV-based prediction prints.

Expected output (your exact decimal may vary slightly):

CSV-based prediction: 1.60

Milestone 6: Build a final prediction card

Now combine prediction, trend, and breakout tag in one clean report block.

Add this below Milestone 5:

tag = breakout_label(prediction, player["season_average_hits"], 0.35)

print("=" * 50)
print(f"PREDICTION CARD: {player['name']}")
print("=" * 50)
print(f"Projected at-bats: {projected_at_bats}")
print(f"Predicted hits: {prediction:.2f}")
print(f"Season average hits: {player['season_average_hits']:.2f}")
print(f"Trend: {trend}")
print(f"Tag: {tag}")
print("=" * 50)

Expected result:

A final card with projected at-bats, prediction, season average, trend, and breakout tag.

If you made it this far, you built a complete baseball prediction workflow from scratch.

Common Fixes

FileNotFoundError: confirm path is exactly data/recent_games.csv.
ValueError: check hits and at_bats columns are numeric.
No play button: install/enable Python extension in Cursor and reopen performance_predictor.py.

Bonus Exercises: Push It Further with Your Agent

Use this short prompt structure for better AI help:

Goal: what you want to build
Context: file name + what code already exists
Constraints: keep beginner-friendly, minimal changes
Output format: ask for exact code + where to paste + expected output

Bonus 1: Predict Multiple Players

Try this prompt:

You are my Python tutor and coding assistant.

Goal:
Predict hits for 3 hitters and print one prediction card per player.

Context:
I am editing performance_predictor.py.
I already have predict_hits(...), breakout_label(...), and a final prediction card format.

Constraints:
- Keep my current single-player flow working.
- Add only what is needed for multi-player support.
- Use beginner-friendly Python.

Output format:
1) Plan (3 bullets)
2) CSV format I should use
3) Exact code changes
4) Where to paste each code block
5) Example output for 2 players

Bonus 2: Add a Prediction Range

Try this prompt:

You are my Python tutor and coding assistant.

Goal:
Show low / medium / high hit predictions instead of one value.

Context:
Current code calculates one prediction from recent hits and at-bats.

Constraints:
- Keep formulas simple (no advanced libraries).
- Explain the math in one short paragraph.
- Keep my current prediction card and add range lines.

Output format:
1) Range formula and reason
2) Code changes only
3) Expected output block
4) One way to tune the range width

Bonus 3: Compare Prediction vs Actual

Try this prompt:

You are my Python tutor and coding assistant.

Goal:
Compare predicted hits to actual hits and calculate error.

Context:
I already compute prediction in performance_predictor.py.
I want to add this after the final prediction card.

Constraints:
- Keep this beginner-level.
- Show both signed error and absolute error.
- Do not remove existing outputs.

Output format:
1) Code block to add
2) Where to paste it
3) Expected output with sample numbers
4) One sentence explaining how to interpret error