
The Madness of March

  • Writer: stookyabhay
  • Mar 5
  • 3 min read

Updated: 6 days ago

Abhay Pancharathi


Every year, 68 teams enter the highest-stakes tournament in college basketball, each hoping to hoist the national championship trophy and cement its place among the all-time great teams. Some have a better chance than others, but unlike the NBA and NFL playoffs, where viewers have a general sense of which teams will do well, the college tournament is close to a toss-up. A top seed can lose in the first round to a 16 seed or go all the way and win the title. Because the tournament is so unpredictable, people try every year to fill out a perfect bracket, a feat that has never been done and almost certainly never will be. With 63 games to pick, a bracket filled out by coin flips has a 1 in 2^63 (roughly 1 in 9.2 quintillion) chance of being perfect; you are more likely to be struck by lightning four times at once.

So I tried to build a predictive model that could, hopefully, give me a perfect bracket. I took the last five years' statistics from barttorvik.com (excluding the COVID year), found which ones correlate with tournament success, and used them to estimate who would win each head-to-head matchup.


Gathering Data

To identify trends, we first have to gather data. I took the last five years' stats for every team that qualified for the Round of 64 and loaded them into Google Sheets. That gave me 320 teams (64 per tournament across five tournaments), each with a range of statistics. I then added a column for games won in the tournament: 6 for the national champion, 5 for the runner-up, 4 for a team that lost in the Final Four, 3 for a team that lost in the Elite 8, 2 for the Sweet Sixteen, 1 for the Round of 32, and 0 for a first-round (Round of 64) exit.
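As a rough sketch of that encoding in Python (the original work was done in Google Sheets, and the names WINS_BY_FINISH, TEAMS_PER_FINISH, and games_won are made up here for illustration), the games-won column is just a lookup on where a team's run ended. The team counts per finish follow from the bracket structure and give a quick sanity check on the data:

```python
# Games won in the tournament, keyed by where a team's run ended.
# The labels mirror the encoding described above; the names are illustrative.
WINS_BY_FINISH = {
    "Round of 64": 0,
    "Round of 32": 1,
    "Sweet Sixteen": 2,
    "Elite 8": 3,
    "Final Four": 4,
    "Runner-up": 5,
    "National Champion": 6,
}

# How many teams end up with each finish in a single 64-team bracket.
TEAMS_PER_FINISH = {
    "Round of 64": 32,
    "Round of 32": 16,
    "Sweet Sixteen": 8,
    "Elite 8": 4,
    "Final Four": 2,
    "Runner-up": 1,
    "National Champion": 1,
}


def games_won(finish: str) -> int:
    """Return the value for the 'games won' column given a team's finish."""
    return WINS_BY_FINISH[finish]


# Sanity checks: one bracket has 64 teams and 63 games (so 63 total wins).
total_teams = sum(TEAMS_PER_FINISH.values())
total_wins = sum(
    WINS_BY_FINISH[finish] * count for finish, count in TEAMS_PER_FINISH.items()
)
print(total_teams, total_wins)  # 64 63
```

Five tournaments of 64 teams each gives the 320 rows mentioned above, and the games-won column for each season should sum to 63.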


Finding Correlations

I then calculated the correlation between each statistic and the number of games a team won in the tournament. The main goal was to eliminate some variables and make the regression step easier. I narrowed the list down to the eight statistics with the highest correlations: adjusted offensive efficiency, adjusted defensive efficiency, Barthag (an estimate of the percent chance a team has to beat an average D1 team), effective field goal percentage, effective field goal percentage defended, offensive rebounds, 2-point percentage defended, and 3-point percentage defended. It makes sense that these matter, since most of them boil down to playing either good offense or good defense. After isolating these statistics, I built an equation to estimate the number of tournament wins each team would get. To do this, I used a Google Sheets add-on called XLMiner Analysis ToolPak to find the coefficients for that equation.
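To make the two steps above concrete, here is a minimal Python sketch, not the actual spreadsheet. It assumes the correlation is the ordinary Pearson correlation and that the equation is linear (an intercept plus one coefficient per statistic). The data rows, column names, coefficients, and intercept are all placeholders invented for illustration; none of them are real team numbers or the fitted values from the model.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy columns, NOT real team data: one entry per team.
toy_stats = {
    "adj_off_eff": [120.0, 115.0, 110.0, 105.0],
    "adj_def_eff": [90.0, 95.0, 92.0, 100.0],  # lower is better on defense
    "tempo": [68.0, 70.0, 66.0, 69.0],
}
toy_wins = [6, 3, 1, 0]  # games won in the tournament, per team

# Step 1: rank statistics by the strength (absolute value) of the correlation.
correlations = {name: pearson(col, toy_wins) for name, col in toy_stats.items()}
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
for name in ranked:
    print(f"{name}: {correlations[name]:+.2f}")


# Step 2: turn regression coefficients into an estimate of tournament wins.
def predicted_wins(team_stats, coefficients, intercept):
    """Linear estimate: intercept + sum of coefficient * statistic."""
    return intercept + sum(
        coefficients[name] * team_stats[name] for name in coefficients
    )


# Hypothetical coefficients, chosen only to show the shape of the equation.
example_coefficients = {"adj_off_eff": 0.25, "adj_def_eff": -0.25}
example_team = {"adj_off_eff": 120.0, "adj_def_eff": 92.0}
print(predicted_wins(example_team, example_coefficients, intercept=-4.0))
```

Ranking by absolute value matters because a defensive statistic where lower is better shows up as a strong negative correlation. The post does not spell out how the estimate is used head to head; one plausible reading is to advance whichever team has the higher predicted win total.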



Takeaways

The model is pretty good at picking the national champion, getting it right in 4 of the last 5 years. It also tends to pick the Final Four well, predicting UConn and Purdue to make it (and advance) last year. Of course, there are limitations. The model was very high on Houston last year, but Jamal Shead's injury doomed them against Duke. The model cannot predict injuries and cannot account for them before a game. This is a major limitation of every model, since it is hard to quantify a single player's impact. It also will not predict upsets like a 16 seed beating a 1 seed.

The best way to read the model is this: if you simulated a matchup 100 times, the team the model picks would win the majority of them. For example, when Fairleigh Dickinson beat Purdue in 2023, that result would happen maybe 5 out of 100 times. The model would have taken Purdue, but that doesn't mean FDU had no chance, and that small chance is exactly what came through. This year, the model slightly favors Houston to win, with Duke close behind. I'll attach the results below as a conclusion.
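The "simulate it 100 times" idea can be sketched in a few lines of Python. This is only an illustration: the 0.95 win probability is assumed from the "maybe 5 out of 100" estimate above, and the function names are invented here rather than taken from the spreadsheet.

```python
import random


def simulate_matchup(favorite_win_prob, n_games=100, seed=0):
    """Replay one matchup n_games times; return how many the favorite wins."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return sum(rng.random() < favorite_win_prob for _ in range(n_games))


def model_pick(favorite, underdog, favorite_win_prob):
    """A deterministic model always takes the side that wins more often."""
    return favorite if favorite_win_prob >= 0.5 else underdog


# Assumed probability, based on the rough "5 out of 100" figure above.
PURDUE_WIN_PROB = 0.95

purdue_wins = simulate_matchup(PURDUE_WIN_PROB)
print(f"Purdue wins {purdue_wins} of 100 simulated games")
print("Model pick:", model_pick("Purdue", "FDU", PURDUE_WIN_PROB))
```

The pick is always the favorite, even though some of the simulated games go the other way. Across the 63 games in a bracket, a few of those unlikely results are almost guaranteed to show up, which is why a model that is right about most matchups still won't produce a perfect bracket.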



If you're interested, the model is on the first sheet, and the bracket (with live record checking) is to the right of it.


 
 
 
