Introduction

Baseball prediction can start simple. If you know a hitter's recent hit rate and expected at-bats, you can estimate next-game production.

In this project, you will build a lightweight prediction model and flag breakout-watch performances.

What we're building

By the end of this project, you will:

  • Store player data in dictionaries
  • Detect trends in recent production
  • Predict next-game hits from recent rates
  • Classify breakout potential with clear rules

Why this matters

If you can build and explain a baseline model, you are learning the same workflow used in larger analytics systems.

Part 1: Quick Start

Set up your project in Cursor

For this project, you will create your files from scratch (no starter pack download).

Before you start: quick vocabulary

  • A folder is a container that holds files.
  • A file is one item inside a folder.
  • A file extension is the ending in the file name:
    • .py means Python code.
    • .csv means table-style data.
  1. Pick an easy location for your project folder:
    • Mac: Desktop in Finder
    • Windows: Desktop in File Explorer
    • Chromebook: My files in the Files app
  2. Create a folder called baseball-performance-predictor.
  3. In Cursor, go to File > Open Folder and open baseball-performance-predictor.
  4. In Cursor's left file explorer:
    • click New Folder and create data
    • click New File and create performance_predictor.py
    • open data, click New File, and create recent_games.csv
  5. If you do not see file extensions on your computer, that is okay. Type the full names exactly, including .py and .csv.

Writing the code

Type this into performance_predictor.py:

recent_hits = [1, 2, 3, 1, 2]
recent_at_bats = [4, 5, 5, 4, 5]
projected_at_bats = 4

hits_per_at_bat = sum(recent_hits) / sum(recent_at_bats)
predicted_hits = hits_per_at_bat * projected_at_bats

print(f"Predicted hits next game: {predicted_hits:.2f}")

You just built a simple model: prediction = hits per at-bat x projected at-bats

Running the code

Save the file (Cmd + S on Mac, Ctrl + S on Windows/Chromebook).

Run the file with Cursor's built-in Run Python File play button in the top-right.

If the play button is missing, install/enable the Python extension in Cursor and reopen performance_predictor.py.

Expected output:

Predicted hits next game: 1.60

If your output matches, your setup is working.

Part 2: Project Milestones

Milestone 1: Store player data in a dictionary

Dictionaries let you name each value so your code stays readable.

Add this below your existing code:

player = {
    "name": "Diego Alvarez",
    "team": "Greenville Hawks",
    "season_average_hits": 1.35,
    "recent_hits": [1, 2, 3, 1, 2],
    "recent_at_bats": [4, 5, 5, 4, 5]
}

print(player["name"])
print(player["recent_hits"])

Expected output:

Diego Alvarez
[1, 2, 3, 1, 2]

Milestone 2: Build reusable prediction functions

Now wrap your model logic into reusable functions.

Add this below Milestone 1:

def average(values):
    return sum(values) / len(values)

def predict_hits(recent_hits, recent_at_bats, projected_at_bats):
    hits_per_at_bat = sum(recent_hits) / sum(recent_at_bats)
    return hits_per_at_bat * projected_at_bats

projected_at_bats = 4
prediction = predict_hits(
    player["recent_hits"],
    player["recent_at_bats"],
    projected_at_bats
)

print(f"{player['name']} predicted hits: {prediction:.2f}")

Expected output:

Diego Alvarez predicted hits: 1.60

Milestone 3: Detect trend direction

Prediction is not just one number. You also want to know if recent games are trending up or down.

Add this below your current code:

recent = player["recent_hits"]
first_half_avg = average(recent[:2])
second_half_avg = average(recent[-2:])

if second_half_avg > first_half_avg:
    trend = "up"
elif second_half_avg < first_half_avg:
    trend = "down"
else:
    trend = "flat"

print(f"Trend: {trend}")

Expected output:

Trend: up

Milestone 4: Add breakout watch logic

Now convert prediction + season average into a label.

Add this below your current code:

def breakout_label(predicted_value, season_average, breakout_margin):
    if predicted_value >= season_average + breakout_margin:
        return "Breakout watch"
    if predicted_value >= season_average:
        return "On pace"
    return "Below season pace"

tag = breakout_label(prediction, player["season_average_hits"], 0.35)
print(f"Tag: {tag}")

Expected output:

Tag: On pace

Milestone 5: Load recent game logs from CSV

Now keep game data in a CSV so you can update predictions without editing Python logic.

  1. Open data/recent_games.csv.
  2. Paste the data below.
  3. Save the file.
  4. Make sure the file name is exactly recent_games.csv (not recent_games.csv.txt).

Paste this into data/recent_games.csv:

date,hits,at_bats
2026-02-01,1,4
2026-02-03,2,5
2026-02-05,3,5
2026-02-07,1,4
2026-02-10,2,5

Now load that CSV in Python and re-run the prediction from file data. Add this below your existing code:

import csv

recent_hits = []
recent_at_bats = []

with open("data/recent_games.csv", "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        recent_hits.append(float(row["hits"]))
        recent_at_bats.append(float(row["at_bats"]))

player["recent_hits"] = recent_hits
player["recent_at_bats"] = recent_at_bats
prediction = predict_hits(recent_hits, recent_at_bats, projected_at_bats)

first_half_avg = average(recent_hits[:2])
second_half_avg = average(recent_hits[-2:])
if second_half_avg > first_half_avg:
    trend = "up"
elif second_half_avg < first_half_avg:
    trend = "down"
else:
    trend = "flat"

print(f"CSV-based prediction: {prediction:.2f}")

Click Run Python File again and verify your CSV-based prediction prints.

Expected output (your exact decimal may vary slightly):

CSV-based prediction: 1.60

Milestone 6: Build a final prediction card

Now combine prediction, trend, and breakout tag in one clean report block.

Add this below Milestone 5:

tag = breakout_label(prediction, player["season_average_hits"], 0.35)

print("=" * 50)
print(f"PREDICTION CARD: {player['name']}")
print("=" * 50)
print(f"Projected at-bats: {projected_at_bats}")
print(f"Predicted hits: {prediction:.2f}")
print(f"Season average hits: {player['season_average_hits']:.2f}")
print(f"Trend: {trend}")
print(f"Tag: {tag}")
print("=" * 50)

Expected result:

  • A final card with projected at-bats, prediction, season average, trend, and breakout tag.

If you made it this far, you built a complete baseball prediction workflow from scratch.

Common Fixes

  • FileNotFoundError: confirm path is exactly data/recent_games.csv.
  • ValueError: check hits and at_bats columns are numeric.
  • No play button: install/enable Python extension in Cursor and reopen performance_predictor.py.

Bonus Exercises: Push It Further with Your Agent

Use this short prompt structure for better AI help:

  1. Goal: what you want to build
  2. Context: file name + what code already exists
  3. Constraints: keep beginner-friendly, minimal changes
  4. Output format: ask for exact code + where to paste + expected output

Bonus 1: Predict Multiple Players

Try this prompt:

You are my Python tutor and coding assistant.

Goal:
Predict hits for 3 hitters and print one prediction card per player.

Context:
I am editing performance_predictor.py.
I already have predict_hits(...), breakout_label(...), and a final prediction card format.

Constraints:
- Keep my current single-player flow working.
- Add only what is needed for multi-player support.
- Use beginner-friendly Python.

Output format:
1) Plan (3 bullets)
2) CSV format I should use
3) Exact code changes
4) Where to paste each code block
5) Example output for 2 players

Bonus 2: Add a Prediction Range

Try this prompt:

You are my Python tutor and coding assistant.

Goal:
Show low / medium / high hit predictions instead of one value.

Context:
Current code calculates one prediction from recent hits and at-bats.

Constraints:
- Keep formulas simple (no advanced libraries).
- Explain the math in one short paragraph.
- Keep my current prediction card and add range lines.

Output format:
1) Range formula and reason
2) Code changes only
3) Expected output block
4) One way to tune the range width

Bonus 3: Compare Prediction vs Actual

Try this prompt:

You are my Python tutor and coding assistant.

Goal:
Compare predicted hits to actual hits and calculate error.

Context:
I already compute prediction in performance_predictor.py.
I want to add this after the final prediction card.

Constraints:
- Keep this beginner-level.
- Show both signed error and absolute error.
- Do not remove existing outputs.

Output format:
1) Code block to add
2) Where to paste it
3) Expected output with sample numbers
4) One sentence explaining how to interpret error