MLB Trade Analyzer

To help with my fantasy baseball league, I developed a tool that analyzes and suggests trades that would benefit both teams. The tool combines web scraping, data analysis, and forecasting to model the rest of a given fantasy baseball season.

Data Scraping and Cleaning

Since ESPN provides no public API for fantasy data, the tool first uses the Selenium library to scrape the data it needs. It launches a web driver, logs in to ESPN, and scrapes the standings table as well as each team’s player stats table. This data is then cleaned and placed into Pandas DataFrames, one for the standings and one for each team’s players, as in the following tables.

Owner           HR    TB  RBI  ...  SV  HD    ERA
Michael Smith  260  2690  854  ...  82  14  3.802
William Booth  261  2746  900  ...  85  27  3.637
Kenan Bateman  256  2600  850  ...  58  43  3.638
Dan Hay        260  2617  852  ...  72   3  3.702
Josh Melton    212  2471  764  ...  83  11  3.451

Slot  Player            Positions       (Season, HR)  (Season, RBI)  (Season, BB)  ...  (7 Days, AVG)  (7 Days, OBP)  (7 Days, FPCT)
C     J.T. Realmuto     C                         14             48            29  ...           .278           .350           1.000
1B    Matt Olson        1B                        43            107            79  ...           .414           .575           1.000
2B    Andres Gimenez    2B                        11             45            27  ...           .435           .480           1.000
...
UTIL  Ronald Acuna Jr.  OF, DH                    26             71            65  ...           .313           .450           1.000
UTIL  Brandon Nimmo     OF                        16             48            58  ...           .391           .481           0.929
UTIL  Justin Turner     3B, 1B, 2B, DH            19             73            39  ...           .444           .444           1.000
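
The cleaning step could look roughly like the sketch below. It is only an illustration, not the tool's actual code: the function names, the "--" placeholder for missing values, and the comma handling are all assumptions. It converts scraped cell text into typed values and builds one record per row, which could then be passed to a DataFrame constructor.

```python
def clean_cell(text):
    """Convert one scraped cell to an int, a float, or a stripped string."""
    text = text.strip()
    if text in ("", "--"):
        return None  # assumption: "--" marks a missing value on the page
    try:
        return int(text.replace(",", ""))  # assumption: thousands separators may appear
    except ValueError:
        pass
    try:
        return float(text)  # handles values such as ".278" and "3.802"
    except ValueError:
        return text  # names and position lists stay as strings


def clean_table(header, rows):
    """Turn scraped header and row strings into a list of typed records."""
    names = [h.strip() for h in header]
    return [dict(zip(names, (clean_cell(c) for c in row))) for row in rows]


header = ["Owner", "HR", "TB", "ERA"]
rows = [["Michael Smith ", "260", "2690", "3.802"],
        [" William Booth", "261", "2746", "3.637"]]
records = clean_table(header, rows)
print(records[0])
```

Trying `int` before `float` keeps counting stats such as HR as integers while rate stats such as ERA become floats.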

Roster Optimization

For my personal roster, I created an additional tool that optimizes the starting lineup by prioritizing certain stat categories. This is a generic constraint satisfaction problem as each player can only occupy certain roster positions. So, the tool first finds the top 14 players on the roster for a given stat category. It then tries to find a roster lineup using just those players.

If it can’t fill every roster spot, it loosens the constraint: it tries to fill the roster using only the top 13 players, then fills the final spot with the next best player who is eligible at the open position. If this doesn’t work either, it keeps loosening the constraint, each time trying with one fewer top player (n-1), until it finds a lineup that staffs all of them.

The image below shows how the tool steps through the logic using an example stat. It repeatedly iterates through the players and staffs any player with only a single position remaining (the tightest constraint). Once a position has been filled, the tool removes that position from the remaining players’ options. If a player has no positions remaining, the tool staffs that player in the utility spot. It continues iterating until all players are staffed.
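
The staffing loop and the loosening step described above can be sketched as follows. This is a simplified greedy heuristic written for illustration, not the tool's actual code: the function names, the data layout, the example players, and the tie-breaking rule (when no player is down to a single position, the one with the fewest options goes first) are assumptions. Because it is greedy rather than a full search, it may miss some valid lineups.

```python
from collections import Counter


def try_staff(eligible, slots):
    """Greedy, tightest-constraint-first staffing.

    eligible: dict of player name -> set of positions the player can fill
    slots:    lineup slots, e.g. ["C", "1B", "OF", "OF", "UTIL"]
    Returns a list of (slot, name) pairs, or None if a player cannot be placed.
    """
    open_slots = Counter(slots)
    # Keep only positions that exist as open lineup slots (drops e.g. "DH").
    options = {name: {p for p in pos if p != "UTIL" and open_slots[p] > 0}
               for name, pos in eligible.items()}
    lineup = []
    while options:
        # Tightest constraint first: the player with the fewest positions left.
        name = min(options, key=lambda n: (len(options[n]), n))
        remaining = options.pop(name)
        if remaining:
            slot = min(remaining)  # assumption: any deterministic pick will do
        elif open_slots["UTIL"] > 0:
            slot = "UTIL"  # no position left, so fall back to utility
        else:
            return None
        open_slots[slot] -= 1
        lineup.append((slot, name))
        if open_slots[slot] == 0:
            # Position is full: remove it from everyone still waiting.
            for positions in options.values():
                positions.discard(slot)
    return lineup


def optimize_lineup(roster, slots, stat):
    """Try the top k players by `stat`, loosening k until a lineup fits."""
    ranked = sorted(roster, key=lambda p: p[stat], reverse=True)
    elig = {p["name"]: set(p["positions"]) for p in roster}
    for k in range(min(len(slots), len(ranked)), 0, -1):
        chosen = [p["name"] for p in ranked[:k]]
        if try_staff({n: elig[n] for n in chosen}, slots) is None:
            continue  # the top-k group does not fit, so loosen to k-1
        # Fill any open spots with the next best players that still fit.
        for p in ranked[k:]:
            if len(chosen) == len(slots):
                break
            trial = chosen + [p["name"]]
            if try_staff({n: elig[n] for n in trial}, slots) is not None:
                chosen = trial
        return try_staff({n: elig[n] for n in chosen}, slots)
    return None


# Hypothetical players and stat values, for illustration only.
roster = [
    {"name": "Ann", "positions": ["1B"], "HR": 30},
    {"name": "Bob", "positions": ["1B", "DH"], "HR": 25},
    {"name": "Cy", "positions": ["1B"], "HR": 20},
    {"name": "Eve", "positions": ["OF"], "HR": 10},
    {"name": "Dee", "positions": ["C"], "HR": 5},
]
lineup = optimize_lineup(roster, ["C", "1B", "OF", "UTIL"], "HR")
print(lineup)
```

In this example the top four players by HR include three first basemen, but only two of them can play (one at 1B, one at UTIL). The search therefore loosens down to the top two and fills the remaining spots with the next best players that fit.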

Season Forecasting

Once the roster is staffed, the tool then forecasts the remainder of the season by simulating the stats for the remaining games. This is accomplished by weighting each player’s recent (7-day, 15-day, and 30-day) and season-long stats to project their stats for the rest of the season.
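
One way to picture the weighting is as a blend of per-game rates from each window, scaled by the games remaining. The sketch below is illustrative only: the weight values, the function name, and the example rates are assumptions, since the actual weights are not stated here.

```python
# Hypothetical weights; the real values used by the tool are not given.
WEIGHTS = {"7d": 0.2, "15d": 0.2, "30d": 0.2, "season": 0.4}


def project_rest_of_season(per_game_rates, games_left, weights=WEIGHTS):
    """Blend per-game rates from each window, then scale by games remaining.

    per_game_rates: window name -> stat per game in that window.
    Windows with no data can simply be left out; the weights of the
    windows that are present are renormalized.
    """
    total_weight = sum(weights[w] for w in per_game_rates)
    blended = sum(weights[w] * rate for w, rate in per_game_rates.items())
    return blended / total_weight * games_left


# Example: a hitter's home runs per game over each window (made-up numbers).
rates = {"7d": 0.40, "15d": 0.30, "30d": 0.20, "season": 0.25}
projected_hr = project_rest_of_season(rates, games_left=40)
print(round(projected_hr, 1))
```

With these made-up numbers the blended rate is 0.28 home runs per game, so 40 remaining games project to about 11.2 home runs.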

For certain statistics, these projections are a little more involved to calculate, as they may depend on metrics that ESPN doesn’t provide. Additionally, some pitchers throw significantly more or fewer innings, and some batters are on teams where they get more or fewer at-bats. Incorporating injuries into stat projections complicates things further. For example, does a pitcher have fewer innings thrown because they don’t go deep into games, or because they were hurt for a month early in the season? To handle this, the forecasting tool calculates trends for each player, then normalizes certain stats or adjusts the short-term and long-term weights when a value falls outside an expected range.
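
ERA is a good example of a stat that cannot simply be averaged or summed. ERA is nine times earned runs divided by innings pitched, so a season-ending ERA has to be rebuilt from its components. The sketch below is an assumption about how that could be done when only ERA and innings pitched are available; the function names and example numbers are hypothetical.

```python
def innings_to_float(ip):
    """Convert box-score innings (6.2 means 6 and 2/3) to a true decimal."""
    whole, outs = divmod(round(ip * 10), 10)
    return whole + outs / 3


def earned_runs(era, innings):
    """Recover earned runs from ERA and (decimal) innings pitched."""
    return era * innings / 9


def projected_final_era(era_now, ip_now, era_rest, ip_rest):
    """Combine current totals with a rest-of-season projection.

    Weighting by innings means a pitcher who missed a month contributes
    fewer past innings, rather than dragging down a per-game average.
    """
    runs = earned_runs(era_now, ip_now) + earned_runs(era_rest, ip_rest)
    return 9 * runs / (ip_now + ip_rest)


# Made-up example: 3.00 ERA over 90 innings so far, projected 4.50 over 60 more.
final_era = projected_final_era(3.00, 90.0, 4.50, 60.0)
print(round(final_era, 3))
```

Here 30 earned runs so far plus 30 projected over 150 total innings gives a final ERA of 3.60, not the 3.75 that a plain average of the two ERAs would suggest.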

Trade Simulations

To simulate a trade, the tool creates the temporary rosters that would result after the players are swapped. It then optimizes each roster for a given stat category or across all stat categories. After creating the optimal rosters, it simulates the remainder of the season using the techniques described above. From the season-ending statistics, it calculates the projected final standings and compares them with each team’s projection before the trade to show how the trade would affect each team.
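
The swap and the standings comparison can be sketched as below. This is a hedged illustration: the function names, the data layout, and the rotisserie-style scoring (each category ranks the teams, the best team earning the most points) are assumptions, as is treating ERA as the only lower-is-better category. Ties are not handled.

```python
import copy

LOWER_IS_BETTER = {"ERA"}  # assumption: other categories are higher-is-better


def apply_trade(rosters, team_a, gives_a, team_b, gives_b):
    """Return temporary rosters with the players swapped; input is untouched."""
    new = copy.deepcopy(rosters)
    for name in gives_a:
        new[team_a].remove(name)
        new[team_b].append(name)
    for name in gives_b:
        new[team_b].remove(name)
        new[team_a].append(name)
    return new


def standings_points(totals):
    """totals: team -> {category: projected final value}."""
    points = {team: 0 for team in totals}
    categories = next(iter(totals.values())).keys()
    for cat in categories:
        ranked = sorted(totals, key=lambda t: totals[t][cat],
                        reverse=cat not in LOWER_IS_BETTER)
        # Best team gets len(totals) points, worst gets 1.
        for place, team in enumerate(ranked):
            points[team] += len(ranked) - place
    return points


# Hypothetical rosters and projected totals, for illustration only.
rosters = {"A": ["p1", "p2"], "B": ["p3", "p4"]}
traded = apply_trade(rosters, "A", ["p1"], "B", ["p3"])

before = {"A": {"HR": 250, "ERA": 3.60},
          "B": {"HR": 240, "ERA": 3.80},
          "C": {"HR": 245, "ERA": 3.70}}
after = {"A": {"HR": 242, "ERA": 3.50},
         "B": {"HR": 252, "ERA": 3.75},
         "C": {"HR": 245, "ERA": 3.70}}
pts_before = standings_points(before)
pts_after = standings_points(after)
shift = {team: pts_after[team] - pts_before[team] for team in before}
print(traded, shift)
```

In a real run, the `after` totals would come from re-optimizing and re-simulating the traded rosters rather than being typed in by hand.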

As a command-line program, the tool prints its final output in the terminal window. It shows the traded players, how each roster would change, how each stat category would shift, and the projected final standings. Printing these in a visually appealing way was a fun challenge. The output doesn’t use any third-party libraries; I wrote all of the text alignment, tables, and color changes myself. It also follows baseball rules and color-codes each change correctly according to whether a lower or a higher value is an improvement for that statistic (a lower ERA, for example, is better). An example output is shown below.
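
Color-coded, aligned output of this kind can be produced with ANSI escape codes and plain string formatting. The sketch below is a minimal example under assumptions: the function name, column widths, color choices, and the set of lower-is-better categories are hypothetical. One detail worth noting is that escape codes count toward a string's length, so each value is padded first and wrapped in color afterwards.

```python
GREEN, RED, RESET = "\033[32m", "\033[31m", "\033[0m"
LOWER_IS_BETTER = {"ERA"}  # assumption: extend with other such categories


def format_row(category, before, after, fmt=".3f", width=8):
    """Format one stat row with the change colored by whether it improved."""
    delta = after - before
    # Pad before coloring, because escape codes would break the alignment.
    text = f"{delta:+{fmt}}".rjust(width)
    if delta != 0:
        improved = delta < 0 if category in LOWER_IS_BETTER else delta > 0
        text = f"{GREEN if improved else RED}{text}{RESET}"
    return f"{category:<6}{before:>{width}{fmt}}{after:>{width}{fmt}}{text}"


era_row = format_row("ERA", 3.802, 3.637)      # lower ERA, shown in green
hr_row = format_row("HR", 260, 255, fmt="d")   # fewer home runs, shown in red
print(era_row)
print(hr_row)
```

Most modern terminals render these escape codes directly, though older Windows consoles may need virtual terminal processing enabled.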

Languages

Python, HTML, CSS

Technology

Selenium, Pandas, NumPy