> Source URL: /ai-and-sports/projects/6-predicting-the-future/predicting-the-future.baseball.project

---
title: Predicting the Future - Points, Performance & Breakouts (Baseball)
student_outcomes:
  - use dictionaries to store structured player and game data
  - find simple patterns in recent performance data
  - build a reusable prediction function for next-game output
  - label breakout candidates with a clear rule-based threshold
---

# Project: Predicting the Future - Points, Performance & Breakouts (Baseball)

---

## Introduction

Baseball prediction can start simple. If you know a hitter's recent hit rate and expected at-bats, you can estimate next-game production.

In this project, you will build a lightweight prediction model and flag breakout-watch performances.

### What we're building

By the end of this project, you will:

- **Store player data in dictionaries**
- **Detect trends** in recent production
- **Predict next-game hits** from recent rates
- **Classify breakout potential** with clear rules

### Why this matters

If you can build and explain a baseline model, you are learning the same workflow used in larger analytics systems.

---

## Part 1: Quick Start

### Set up your project in Cursor

For this project, you will create your files from scratch (no starter pack download).

#### Before you start: quick vocabulary

- A **folder** is a container that holds files.
- A **file** is one item inside a folder.
- A file **extension** is the ending in the file name:
  - `.py` means Python code.
  - `.csv` means table-style data.

1. Pick an easy location for your project folder:
   - Mac: `Desktop` in Finder
   - Windows: `Desktop` in File Explorer
   - Chromebook: `My files` in the Files app
2. Create a folder called `baseball-performance-predictor`.
3. In Cursor, go to `File > Open Folder` and open `baseball-performance-predictor`.
4. In Cursor's left file explorer:
   - click **New Folder** and create `data`
   - click **New File** and create `performance_predictor.py`
   - open `data`, click **New File**, and create `recent_games.csv`
5. If you do not see file extensions on your computer, that is okay. Type the full names exactly, including `.py` and `.csv`.

### Writing the code

Type this into `performance_predictor.py`:

```python
# Hits and at-bats from the last five games
recent_hits = [1, 2, 3, 1, 2]
recent_at_bats = [4, 5, 5, 4, 5]
projected_at_bats = 4

# Recent hit rate: total hits divided by total at-bats
hits_per_at_bat = sum(recent_hits) / sum(recent_at_bats)
predicted_hits = hits_per_at_bat * projected_at_bats

print(f"Predicted hits next game: {predicted_hits:.2f}")
```

You just built a simple model:

`prediction = hits per at-bat x projected at-bats`

### Running the code

Save the file (Cmd + S on Mac, Ctrl + S on Windows/Chromebook).

Run the file with Cursor's built-in **Run Python File** play button in the top-right. If the play button is missing, install/enable the Python extension in Cursor and reopen `performance_predictor.py`.

Expected output:

```
Predicted hits next game: 1.57
```

That number comes from 9 hits / 23 at-bats x 4 projected at-bats, which is about 1.57. If your output matches, your setup is working.

---

## Part 2: Project Milestones

### Milestone 1: Store player data in a dictionary

Dictionaries let you name each value so your code stays readable.

Add this below your existing code:

```python
player = {
    "name": "Diego Alvarez",
    "team": "Greenville Hawks",
    "season_average_hits": 1.35,
    "recent_hits": [1, 2, 3, 1, 2],
    "recent_at_bats": [4, 5, 5, 4, 5]
}

# Look up values by their key names
print(player["name"])
print(player["recent_hits"])
```

Expected output:

```text
Diego Alvarez
[1, 2, 3, 1, 2]
```

---

### Milestone 2: Build reusable prediction functions

Now wrap your model logic into reusable functions.
Add this below Milestone 1:

```python
def average(values):
    # Mean of a list of numbers (the list must not be empty)
    return sum(values) / len(values)


def predict_hits(recent_hits, recent_at_bats, projected_at_bats):
    # Same formula as the Quick Start: hit rate x projected at-bats
    hits_per_at_bat = sum(recent_hits) / sum(recent_at_bats)
    return hits_per_at_bat * projected_at_bats


projected_at_bats = 4
prediction = predict_hits(
    player["recent_hits"],
    player["recent_at_bats"],
    projected_at_bats
)

print(f"{player['name']} predicted hits: {prediction:.2f}")
```

Expected output:

```text
Diego Alvarez predicted hits: 1.57
```

---

### Milestone 3: Detect trend direction

Prediction is not just one number. You also want to know if recent games are trending up or down.

Add this below your current code:

```python
recent = player["recent_hits"]

# Compare the first two games with the last two games
first_half_avg = average(recent[:2])
second_half_avg = average(recent[-2:])

if second_half_avg > first_half_avg:
    trend = "up"
elif second_half_avg < first_half_avg:
    trend = "down"
else:
    trend = "flat"

print(f"Trend: {trend}")
```

Expected output:

```text
Trend: flat
```

With the sample data, the first two games (`[1, 2]`) and the last two games (`[1, 2]`) both average 1.5 hits, so the trend is flat.

---

### Milestone 4: Add breakout watch logic

Now convert the prediction and season average into a label.

Add this below your current code:

```python
def breakout_label(predicted_value, season_average, breakout_margin):
    # Check the highest bar first, then fall through to the lower ones
    if predicted_value >= season_average + breakout_margin:
        return "Breakout watch"
    if predicted_value >= season_average:
        return "On pace"
    return "Below season pace"


tag = breakout_label(prediction, player["season_average_hits"], 0.35)
print(f"Tag: {tag}")
```

Expected output:

```text
Tag: On pace
```

The prediction (about 1.57) is above the season average (1.35) but below the breakout threshold (1.35 + 0.35 = 1.70), so the label is "On pace".

---

### Milestone 5: Load recent game logs from CSV

Now keep game data in a CSV so you can update predictions without editing Python logic.

1. Open `data/recent_games.csv`.
2. Paste the data below.
3. Save the file.
4. Make sure the file name is exactly `recent_games.csv` (not `recent_games.csv.txt`).
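
If you want to double-check step 4 from Python, here is a minimal optional sketch. It is not part of the project code: the helper name `check_csv_name` is made up for this example, and it assumes you run it from the project folder so that `data` resolves correctly. You could try it in a separate scratch file.

```python
from pathlib import Path


def check_csv_name(data_folder="data", expected_name="recent_games.csv"):
    """Return a short message about whether the CSV file is named correctly.

    Hypothetical helper for illustration only; the project does not require it.
    """
    folder = Path(data_folder)
    expected = folder / expected_name
    # A hidden ".txt" ending is the naming mistake mentioned in step 4.
    mistaken = folder / (expected_name + ".txt")

    if expected.is_file():
        return f"Found {expected.as_posix()}"
    if mistaken.is_file():
        return f"Found {mistaken.as_posix()} instead: rename it to {expected_name}"
    return f"Could not find {expected.as_posix()}"


print(check_csv_name())
```

The message tells you whether the file is in place, saved with an extra `.txt` ending, or missing from the `data` folder.
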

Paste this into `data/recent_games.csv`:

```text
date,hits,at_bats
2026-02-01,1,4
2026-02-03,2,5
2026-02-05,3,5
2026-02-07,1,4
2026-02-10,2,5
```

Now load that CSV in Python and re-run the prediction from file data.

Add this below your existing code:

```python
import csv

recent_hits = []
recent_at_bats = []

# newline="" is the setting the csv module documentation recommends for file objects
with open("data/recent_games.csv", "r", newline="") as file:
    reader = csv.DictReader(file)
    for row in reader:
        # Each row is a dictionary keyed by the header names
        recent_hits.append(float(row["hits"]))
        recent_at_bats.append(float(row["at_bats"]))

player["recent_hits"] = recent_hits
player["recent_at_bats"] = recent_at_bats

prediction = predict_hits(recent_hits, recent_at_bats, projected_at_bats)

# Recompute the trend from the file data (first two games vs. last two games)
first_half_avg = average(recent_hits[:2])
second_half_avg = average(recent_hits[-2:])
if second_half_avg > first_half_avg:
    trend = "up"
elif second_half_avg < first_half_avg:
    trend = "down"
else:
    trend = "flat"

print(f"CSV-based prediction: {prediction:.2f}")
```

Click **Run Python File** again and verify your CSV-based prediction prints.

Expected output (with the sample data above; different data will give a different number):

```text
CSV-based prediction: 1.57
```

---

### Milestone 6: Build a final prediction card

Now combine prediction, trend, and breakout tag in one clean report block.

Add this below Milestone 5:

```python
tag = breakout_label(prediction, player["season_average_hits"], 0.35)

print("=" * 50)
print(f"PREDICTION CARD: {player['name']}")
print("=" * 50)
print(f"Projected at-bats: {projected_at_bats}")
print(f"Predicted hits: {prediction:.2f}")
print(f"Season average hits: {player['season_average_hits']:.2f}")
print(f"Trend: {trend}")
print(f"Tag: {tag}")
print("=" * 50)
```

Expected result:

- A final card with projected at-bats, prediction, season average, trend, and breakout tag.

If you made it this far, you built a complete baseball prediction workflow from scratch.

---

## Common Fixes

- `FileNotFoundError`: confirm the path is exactly `data/recent_games.csv` and that you opened the `baseball-performance-predictor` folder itself in Cursor.
- `ValueError`: check that every value in the `hits` and `at_bats` columns is numeric.
- No play button: install/enable the Python extension in Cursor and reopen `performance_predictor.py`.

---

## Bonus Exercises: Push It Further with Your Agent

Use this short prompt structure for better AI help:

1. **Goal**: what you want to build
2. **Context**: file name + what code already exists
3. **Constraints**: keep beginner-friendly, minimal changes
4. **Output format**: ask for exact code + where to paste + expected output

### Bonus 1: Predict Multiple Players

Try this prompt:

```text
You are my Python tutor and coding assistant.

Goal:
Predict hits for 3 hitters and print one prediction card per player.

Context:
I am editing performance_predictor.py.
I already have predict_hits(...), breakout_label(...), and a final prediction card format.

Constraints:
- Keep my current single-player flow working.
- Add only what is needed for multi-player support.
- Use beginner-friendly Python.

Output format:
1) Plan (3 bullets)
2) CSV format I should use
3) Exact code changes
4) Where to paste each code block
5) Example output for 2 players
```

### Bonus 2: Add a Prediction Range

Try this prompt:

```text
You are my Python tutor and coding assistant.

Goal:
Show low / medium / high hit predictions instead of one value.

Context:
Current code calculates one prediction from recent hits and at-bats.

Constraints:
- Keep formulas simple (no advanced libraries).
- Explain the math in one short paragraph.
- Keep my current prediction card and add range lines.

Output format:
1) Range formula and reason
2) Code changes only
3) Expected output block
4) One way to tune the range width
```

### Bonus 3: Compare Prediction vs Actual

Try this prompt:

```text
You are my Python tutor and coding assistant.

Goal:
Compare predicted hits to actual hits and calculate error.

Context:
I already compute prediction in performance_predictor.py.
I want to add this after the final prediction card.

Constraints:
- Keep this beginner-level.
- Show both signed error and absolute error.
- Do not remove existing outputs.

Output format:
1) Code block to add
2) Where to paste it
3) Expected output with sample numbers
4) One sentence explaining how to interpret error
```

---

## Backlinks

The following sources link to this document:

- [>button: ⚾ Baseball](/ai-and-sports/projects/6-predicting-the-future/predicting-the-future.project.llm.md)