> Source URL: /ai-and-sports/projects/6-predicting-the-future/predicting-the-future.baseball.project
---
title: Predicting the Future - Points, Performance & Breakouts (Baseball)
student_outcomes:
  - use dictionaries to store structured player and game data
  - find simple patterns in recent performance data
  - build a reusable prediction function for next-game output
  - label breakout candidates with a clear rule-based threshold
---

# Project: Predicting the Future - Points, Performance & Breakouts (Baseball)

---

## Introduction

Baseball prediction can start simple. If you know a hitter's recent hit rate and expected at-bats, you can estimate next-game production.

In this project, you will build a lightweight prediction model and flag breakout-watch performances.

### What we're building

By the end of this project, you will:

- **Store player data in dictionaries**
- **Detect trends** in recent production
- **Predict next-game hits** from recent rates
- **Classify breakout potential** with clear rules

### Why this matters

If you can build and explain a baseline model, you are learning the same workflow used in larger analytics systems.

---

## Part 1: Quick Start

### Set up your project in Cursor

For this project, you will create your files from scratch (no starter pack download).

#### Before you start: quick vocabulary

- A **folder** is a container that holds files.
- A **file** is one item inside a folder.
- A file **extension** is the ending in the file name:
  - `.py` means Python code.
  - `.csv` means table-style data.

1. Pick an easy location for your project folder:
   - Mac: `Desktop` in Finder
   - Windows: `Desktop` in File Explorer
   - Chromebook: `My files` in the Files app
2. Create a folder called `baseball-performance-predictor`.
3. In Cursor, go to `File > Open Folder` and open `baseball-performance-predictor`.
4. In Cursor's left file explorer:
   - click **New Folder** and create `data`
   - click **New File** and create `performance_predictor.py`
   - open `data`, click **New File**, and create `recent_games.csv`
5. If you do not see file extensions on your computer, that is okay. Type the full names exactly, including `.py` and `.csv`.

### Writing the code

Type this into `performance_predictor.py`:

```python
recent_hits = [1, 2, 3, 1, 2]
recent_at_bats = [4, 5, 5, 4, 5]
projected_at_bats = 4

hits_per_at_bat = sum(recent_hits) / sum(recent_at_bats)
predicted_hits = hits_per_at_bat * projected_at_bats

print(f"Predicted hits next game: {predicted_hits:.2f}")
```

You just built a simple model:
`prediction = hits per at-bat × projected at-bats`
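
To see where the number comes from, the arithmetic can be checked one step at a time. This is an optional sketch using the same sample lists; the names `total_hits` and `total_at_bats` are added here for illustration and are not needed in your project file.

```python
# Same sample data as the Quick Start code
recent_hits = [1, 2, 3, 1, 2]
recent_at_bats = [4, 5, 5, 4, 5]
projected_at_bats = 4

total_hits = sum(recent_hits)        # 1 + 2 + 3 + 1 + 2 = 9
total_at_bats = sum(recent_at_bats)  # 4 + 5 + 5 + 4 + 5 = 23

hits_per_at_bat = total_hits / total_at_bats          # 9 / 23, about 0.391
predicted_hits = hits_per_at_bat * projected_at_bats  # about 1.57

print(f"Hit rate: {hits_per_at_bat:.3f}")
print(f"Predicted hits: {predicted_hits:.2f}")
```

The `:.2f` in the f-string rounds the printed value to two decimal places; the stored number keeps its full precision.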

### Running the code

Save the file (<Kbd>Cmd + S</Kbd> on Mac, <Kbd>Ctrl + S</Kbd> on Windows/Chromebook).

Run the file with Cursor's built-in **Run Python File** play button in the top-right.

If the play button is missing, install/enable the Python extension in Cursor and reopen `performance_predictor.py`.

Expected output:

```
Predicted hits next game: 1.57
```

If your output matches, your setup is working.

---

## Part 2: Project Milestones

### Milestone 1: Store player data in a dictionary

Dictionaries let you name each value so your code stays readable.

Add this below your existing code:

```python
player = {
    "name": "Diego Alvarez",
    "team": "Greenville Hawks",
    "season_average_hits": 1.35,
    "recent_hits": [1, 2, 3, 1, 2],
    "recent_at_bats": [4, 5, 5, 4, 5]
}

print(player["name"])
print(player["recent_hits"])
```

Expected output:

```text
Diego Alvarez
[1, 2, 3, 1, 2]
```
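
Two other ways to read a dictionary are often handy. The optional sketch below uses a shortened copy of the player; the `"position"` key is made up for the demo and is not part of the project data.

```python
player = {
    "name": "Diego Alvarez",
    "team": "Greenville Hawks",
    "season_average_hits": 1.35,
}

# Square brackets raise a KeyError when the key is missing.
# .get() returns a fallback value instead of stopping the program.
position = player.get("position", "unknown")  # "position" is a hypothetical key
print(position)

# .items() walks through every key/value pair in the dictionary
for key, value in player.items():
    print(f"{key}: {value}")
```

Use square brackets when a missing key should be treated as a mistake, and `.get()` when a missing value is acceptable.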

---

### Milestone 2: Build reusable prediction functions

Now wrap your model logic into reusable functions. The `average` helper is not used yet; it comes into play in Milestone 3.

Add this below Milestone 1:

```python
def average(values):
    return sum(values) / len(values)

def predict_hits(recent_hits, recent_at_bats, projected_at_bats):
    hits_per_at_bat = sum(recent_hits) / sum(recent_at_bats)
    return hits_per_at_bat * projected_at_bats

projected_at_bats = 4
prediction = predict_hits(
    player["recent_hits"],
    player["recent_at_bats"],
    projected_at_bats
)

print(f"{player['name']} predicted hits: {prediction:.2f}")
```

Expected output:

```text
Diego Alvarez predicted hits: 1.57
```
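
One edge case to keep in mind: if a player has no recorded at-bats, dividing by `sum(recent_at_bats)` raises a `ZeroDivisionError`. The sketch below shows one possible guard. The name `predict_hits_safe` and the choice to return `0.0` are assumptions made for illustration, not part of the project's required code.

```python
def predict_hits_safe(recent_hits, recent_at_bats, projected_at_bats):
    """Like predict_hits, but returns 0.0 when there are no recorded at-bats."""
    total_at_bats = sum(recent_at_bats)
    if total_at_bats == 0:
        # Assumption: treat "no data" as a prediction of zero hits
        return 0.0
    return sum(recent_hits) / total_at_bats * projected_at_bats

print(predict_hits_safe([], [], 4))
print(f"{predict_hits_safe([1, 2, 3, 1, 2], [4, 5, 5, 4, 5], 4):.2f}")
```

Returning zero is only one option. Depending on how the result is used, returning `None` or printing a warning might be clearer.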

---

### Milestone 3: Detect trend direction

Prediction is not just one number. You also want to know if recent games are trending up or down.

Add this below your current code:

```python
recent = player["recent_hits"]
first_half_avg = average(recent[:2])
second_half_avg = average(recent[-2:])

if second_half_avg > first_half_avg:
    trend = "up"
elif second_half_avg < first_half_avg:
    trend = "down"
else:
    trend = "flat"

print(f"Trend: {trend}")
```

Expected output (the first two games and the last two games both average 1.5 hits, so the trend is flat):

```text
Trend: flat
```
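
The slices `recent[:2]` and `recent[-2:]` decide which games get compared. This optional sketch prints both slices and their averages for the sample data, so you can see exactly what the trend check is looking at. The names `first_two` and `last_two` are used only here.

```python
recent = [1, 2, 3, 1, 2]

first_two = recent[:2]   # the two oldest games in the list
last_two = recent[-2:]   # the two newest games in the list

print(first_two, sum(first_two) / len(first_two))
print(last_two, sum(last_two) / len(last_two))
```

Both slices are `[1, 2]`, so both averages are 1.5. The 3-hit game in the middle is not part of either slice, which is a limitation of this simple comparison.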

---

### Milestone 4: Add breakout watch logic

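Now compare the prediction with the season average and turn the result into a label. A prediction at least `breakout_margin` above the season average is tagged "Breakout watch", one at or above the season average is "On pace", and anything lower is "Below season pace".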

Add this below your current code:

```python
def breakout_label(predicted_value, season_average, breakout_margin):
    if predicted_value >= season_average + breakout_margin:
        return "Breakout watch"
    if predicted_value >= season_average:
        return "On pace"
    return "Below season pace"

tag = breakout_label(prediction, player["season_average_hits"], 0.35)
print(f"Tag: {tag}")
```

Expected output:

```text
Tag: On pace
```
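
To check all three branches of the rule, you can call the function with a few hand-picked predictions. The values 1.80 and 1.10 below are made-up test inputs, not real projections. The function is repeated so the sketch runs on its own.

```python
def breakout_label(predicted_value, season_average, breakout_margin):
    if predicted_value >= season_average + breakout_margin:
        return "Breakout watch"
    if predicted_value >= season_average:
        return "On pace"
    return "Below season pace"

# Season average 1.35 and margin 0.35 put the breakout line at about 1.70
for predicted in [1.80, 1.57, 1.10]:
    print(predicted, breakout_label(predicted, 1.35, 0.35))
```

Raising the margin makes "Breakout watch" harder to earn, and lowering it makes the tag more common.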

---

### Milestone 5: Load recent game logs from CSV

Now keep game data in a CSV so you can update predictions without editing Python logic.

1. Open `data/recent_games.csv`.
2. Paste the data below.
3. Save the file.
4. Make sure the file name is exactly `recent_games.csv` (not `recent_games.csv.txt`).

Paste this into `data/recent_games.csv`:

```text
date,hits,at_bats
2026-02-01,1,4
2026-02-03,2,5
2026-02-05,3,5
2026-02-07,1,4
2026-02-10,2,5
```

Now load that CSV in Python and re-run the prediction from file data.
Add this below your existing code:

```python
import csv

recent_hits = []
recent_at_bats = []

with open("data/recent_games.csv", "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        recent_hits.append(float(row["hits"]))
        recent_at_bats.append(float(row["at_bats"]))

player["recent_hits"] = recent_hits
player["recent_at_bats"] = recent_at_bats
prediction = predict_hits(recent_hits, recent_at_bats, projected_at_bats)

first_half_avg = average(recent_hits[:2])
second_half_avg = average(recent_hits[-2:])
if second_half_avg > first_half_avg:
    trend = "up"
elif second_half_avg < first_half_avg:
    trend = "down"
else:
    trend = "flat"

print(f"CSV-based prediction: {prediction:.2f}")
```

Click **Run Python File** again and verify your CSV-based prediction prints.

Expected output, if your CSV matches the data above:

```text
CSV-based prediction: 1.57
```
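
The `float(...)` calls in the loader matter because `csv.DictReader` gives every value back as text. The optional sketch below shows this with a one-row sample. It uses `io.StringIO` to stand in for a file so it runs without a CSV on disk.

```python
import csv
import io

# io.StringIO lets a string be read like an open file
sample = io.StringIO("date,hits,at_bats\n2026-02-01,1,4\n")
row = next(csv.DictReader(sample))  # read the first data row

print(row["hits"], type(row["hits"]))  # the value is a string, not a number
print(float(row["hits"]) + 1)          # after conversion, math works
```

Without the conversion, `sum(recent_hits)` would fail with a `TypeError`, because Python cannot add strings to numbers.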

---

### Milestone 6: Build a final prediction card

Now combine prediction, trend, and breakout tag in one clean report block.

Add this below Milestone 5:

```python
tag = breakout_label(prediction, player["season_average_hits"], 0.35)

print("=" * 50)
print(f"PREDICTION CARD: {player['name']}")
print("=" * 50)
print(f"Projected at-bats: {projected_at_bats}")
print(f"Predicted hits: {prediction:.2f}")
print(f"Season average hits: {player['season_average_hits']:.2f}")
print(f"Trend: {trend}")
print(f"Tag: {tag}")
print("=" * 50)
```

Expected result:
- A final card with projected at-bats, prediction, season average, trend, and breakout tag.
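
If you later want to reuse the card, for example once per player, one option is to build it inside a function. This is a sketch only. The name `format_prediction_card` is hypothetical, and the sample values passed in are based on this project's data (36 / 23 is the prediction from 9 hits in 23 at-bats over 4 projected at-bats).

```python
def format_prediction_card(name, projected_at_bats, prediction,
                           season_average, trend, tag):
    """Return the card as one string so it can be printed or saved."""
    border = "=" * 50
    lines = [
        border,
        f"PREDICTION CARD: {name}",
        border,
        f"Projected at-bats: {projected_at_bats}",
        f"Predicted hits: {prediction:.2f}",
        f"Season average hits: {season_average:.2f}",
        f"Trend: {trend}",
        f"Tag: {tag}",
        border,
    ]
    return "\n".join(lines)

# Sample values based on the data used in this project
card = format_prediction_card("Diego Alvarez", 4, 36 / 23, 1.35, "flat", "On pace")
print(card)
```

Returning a string instead of printing inside the function keeps the choice of what to do with the card (print it, save it, compare it) with the caller.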

If you made it this far, you built a complete baseball prediction workflow from scratch.

---

## Common Fixes

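- `FileNotFoundError`: confirm the path is exactly `data/recent_games.csv` and that the `baseball-performance-predictor` folder is the one open in Cursor.
- `ValueError`: check that every value in the `hits` and `at_bats` columns is a number.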
- No play button: install/enable Python extension in Cursor and reopen `performance_predictor.py`.
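
The first two fixes can also be handled in code. The sketch below is one possible approach, not a required part of the project. The function name `load_recent_games` is hypothetical, and skipping bad rows rather than stopping is an assumption about what you want to happen.

```python
import csv
from pathlib import Path

def load_recent_games(csv_path):
    """Return (hits, at_bats) lists, or None if the file is missing."""
    path = Path(csv_path)
    if not path.exists():
        # Showing the current folder helps explain why a relative path failed
        print(f"Could not find {path}. Current folder: {Path.cwd()}")
        return None

    hits, at_bats = [], []
    with open(path, "r", newline="") as file:
        # Data rows start on line 2 because line 1 is the header
        for line_number, row in enumerate(csv.DictReader(file), start=2):
            try:
                hit_value = float(row["hits"])
                at_bat_value = float(row["at_bats"])
            except (TypeError, ValueError):
                print(f"Skipping line {line_number}: expected numbers, got {row}")
                continue
            hits.append(hit_value)
            at_bats.append(at_bat_value)
    return hits, at_bats

result = load_recent_games("data/recent_games.csv")
if result is not None:
    recent_hits, recent_at_bats = result
    print(recent_hits, recent_at_bats)
```

Both values in a row are converted before either is stored, so a bad row never leaves the two lists with different lengths.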

---

## Bonus Exercises: Push It Further with Your Agent

Use this short prompt structure for better AI help:

1. **Goal**: what you want to build
2. **Context**: file name + what code already exists
3. **Constraints**: keep beginner-friendly, minimal changes
4. **Output format**: ask for exact code + where to paste + expected output

### Bonus 1: Predict Multiple Players

Try this prompt:

```text
You are my Python tutor and coding assistant.

Goal:
Predict hits for 3 hitters and print one prediction card per player.

Context:
I am editing performance_predictor.py.
I already have predict_hits(...), breakout_label(...), and a final prediction card format.

Constraints:
- Keep my current single-player flow working.
- Add only what is needed for multi-player support.
- Use beginner-friendly Python.

Output format:
1) Plan (3 bullets)
2) CSV format I should use
3) Exact code changes
4) Where to paste each code block
5) Example output for 2 players
```

### Bonus 2: Add a Prediction Range

Try this prompt:

```text
You are my Python tutor and coding assistant.

Goal:
Show low / medium / high hit predictions instead of one value.

Context:
Current code calculates one prediction from recent hits and at-bats.

Constraints:
- Keep formulas simple (no advanced libraries).
- Explain the math in one short paragraph.
- Keep my current prediction card and add range lines.

Output format:
1) Range formula and reason
2) Code changes only
3) Expected output block
4) One way to tune the range width
```

### Bonus 3: Compare Prediction vs Actual

Try this prompt:

```text
You are my Python tutor and coding assistant.

Goal:
Compare predicted hits to actual hits and calculate error.

Context:
I already compute prediction in performance_predictor.py.
I want to add this after the final prediction card.

Constraints:
- Keep this beginner-level.
- Show both signed error and absolute error.
- Do not remove existing outputs.

Output format:
1) Code block to add
2) Where to paste it
3) Expected output with sample numbers
4) One sentence explaining how to interpret error
```


