When the NFL season started this past fall, I convinced my family (wife and two boys) to begin a weekly NFL picks contest and select winners for every game throughout the 2022 season. It was a fun way to spend some family time together, and one more data-collection project for me (I already wrote about a similar project I did with my friends during the 2022 World Cup, see here).
We were a little disorganized early in the season, resulting in a two-week delay, but after that we picked winners for every game (from week 3 forward). Below, I share a few nice-looking plots I created to display our season-long picks and trends over the season. Then, I discuss two prediction models I ran after conducting an imputation procedure to correct for the missing weeks. In the last part, I show how I did when picking against the spread.
How did we do?
First, NFL analysts would be relieved to learn that picking games is not totally random: the person with the most background knowledge (well... Dad) actually got the highest number of correct picks. So overall, we have evidence for the importance of knowledge, and a little less for random selection of outcomes.
Here are the full season results (regular season only). Using a coloring scheme and in-plot text, and removing the x-axis labels, I show the aggregate numbers of correct and wrong picks for all four of us (with a small label for the season-long winning percentage).
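For readers who want to build a similar display, here is a minimal sketch, assuming a hypothetical summary data frame season_totals with person, result ("correct"/"wrong"), and n columns (the names are my placeholders, not the actual code):

```r
library(ggplot2)

# season_totals: hypothetical frame with person, result ("correct"/"wrong"), n
ggplot(season_totals, aes(x = person, y = n, fill = result)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = n),                         # in-plot count labels
            position = position_dodge(width = 0.9),
            vjust = -0.4) +
  scale_fill_manual(values = c("steelblue", "grey70")) +
  theme_minimal() +
  theme(axis.text.x = element_blank(),              # drop the x-axis labels
        axis.title.x = element_blank()) +
  labs(y = "Picks")
```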
So how did we do? As mentioned above, there is evidence for the benefits of having some knowledge, as the guy who knows a little about this league got almost 65% of his picks correct (good for me...). But we have a few other interesting findings: first, everyone finished around .500, including Mom, who picked winners almost completely at random. Then, we have an 8-year-old who finished with 55% success while using the FanDuel lines as a guide, and an 11-year-old who did mostly blind guessing (I was happy to encourage more randomness) and betting against the obvious picks, and still ended with about a 50% success rate.
OK, so those are the aggregate results. We collected the data on a weekly basis, which means we had a winner (or winners) every week (whoever made the most correct picks). Below, I use the flexible tools of the tidyverse to highlight different aspects of the data, add labels at the base of every bar group, and more.
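As a rough sketch (not my exact code), a dodged weekly bar chart with labels at the base of each bar might look like this, assuming a hypothetical weekly_picks frame with week, person, and correct columns:

```r
library(ggplot2)

# weekly_picks: hypothetical frame with week, person, correct (picks per week)
ggplot(weekly_picks, aes(x = factor(week), y = correct, fill = person)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_text(aes(label = person, y = 0),           # label at the base of each bar
            position = position_dodge(width = 0.9),
            angle = 90, hjust = -0.1, size = 2.5) +
  theme_minimal() +
  labs(x = "Week", y = "Correct picks")
```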
Correcting missing data
As I mentioned earlier, our picks began only in week 3, and due to holiday traveling, we also missed week 11. That means there are three missing observations for every person. To make predictions, I first had to overcome the missing-data issue.
First, I created a weekly winning-percentage variable to serve as the main prediction measure. Then, to correct for the missing observations (12 missing values in total), I employed an imputation procedure with the mice package. The procedure assumes values are missing at random and replaces them based on an imputation model (I used the "pmm" option, for predictive mean matching).
To get more precise data and a valid process, I used the mice function to create five imputed datasets and then averaged across them to replace the missing values (check the code for the full process).
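Here is a minimal sketch of that step, assuming a long-format data frame picks with one row per person-week and NAs in win_pct for the missed weeks (all names are my placeholders; see my GitHub for the actual code):

```r
library(mice)
library(dplyr)

# five imputed datasets via predictive mean matching
imp <- mice(picks, m = 5, method = "pmm", seed = 2022)

# stack the five imputations and average them to fill each missing value
picks_complete <- mice::complete(imp, action = "long") %>%
  group_by(person, week) %>%
  summarise(win_pct = mean(win_pct), .groups = "drop")
```

For the observed weeks the average simply returns the observed value, so only the missing person-weeks are actually replaced.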
With the process complete, I had a full, imputed dataset of picks (18 weeks) to work with. Before building any prediction model, I plotted the full-season winning percentage. The figure below focuses on the winning-percentage measure and highlights the weekly winning percentage, a calculated moving average, and the year-long average. To build the moving-average layer, I used geom_ma (this blogpost was super helpful). The plot uses separate facets for a 'cleaner' view of the data, showing the weekly winning percentage, the 3-week moving average, and a horizontal line for the full-season average.
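The moving-average layer comes from geom_ma() in the tidyquant package; a minimal sketch of the full plot, using the same placeholder names:

```r
library(dplyr)
library(ggplot2)
library(tidyquant)  # provides geom_ma()

picks_complete %>%
  group_by(person) %>%
  mutate(season_avg = mean(win_pct)) %>%
  ggplot(aes(x = week, y = win_pct)) +
  geom_line(color = "grey70") +                       # weekly winning percentage
  geom_ma(ma_fun = SMA, n = 3, linetype = "solid") +  # 3-week moving average
  geom_hline(aes(yintercept = season_avg),
             linetype = "dashed") +                   # season-long average
  facet_wrap(~ person) +
  theme_minimal()
```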
Making predictions
One of the main objectives of this project was to build prediction models. With the missing data issue corrected, I had a full dataset to work with, and started with predicting the final week (week 18), followed by a second prediction of the expected full season average.
I began with a simple linear model in which the weekly winning percentage is a function of the previous week's rate. One more small data-wrangling step was creating a lagged variable (using dplyr) so I could run the model. That, in turn, raised the issue of missing values for the week 1 lagged variable; I chose to fill those with the season-long average.
With complete (imputed) data, I ran a linear model with standard errors clustered at the individual level (using lm_robust() from the estimatr package). The results point to a positive correlation, with an estimated coefficient of 0.16 (p < .10).
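Roughly, the wrangling and the model look like this, again with my placeholder names (picks_complete, win_pct, person):

```r
library(dplyr)
library(estimatr)

model_df <- picks_complete %>%
  group_by(person) %>%
  arrange(week, .by_group = TRUE) %>%
  mutate(win_pct_lag = lag(win_pct),
         # week 1 has no lagged value; fill it with the season-long average
         win_pct_lag = coalesce(win_pct_lag, mean(win_pct))) %>%
  ungroup()

# linear model with standard errors clustered at the person level
fit <- lm_robust(win_pct ~ win_pct_lag, data = model_df, clusters = person)
summary(fit)
```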
Based on this estimate, I conducted a simple prediction of the week 18 winning percentage and compared it to the actual results. To show both values for each of us, I plot the predicted value and its confidence interval alongside the actual winning percentage for week 18.
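The prediction itself can be sketched like this, feeding each person's week 17 value in as the lagged predictor (names are still my placeholders):

```r
# week-17 winning percentages serve as the lagged predictor for week 18
newdata <- model_df %>%
  filter(week == 17) %>%
  transmute(person, win_pct_lag = win_pct)

# point predictions with 95% confidence intervals
preds <- predict(fit, newdata = newdata, interval = "confidence")
cbind(newdata["person"], preds)
```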
Again, I use labels and distinct coloring to create a nice display of how close the linear prediction came to the actual result. Mom finished the year strong, beating her predicted value by more than 13%! On the other hand, my own picks ATS (against the spread, more on that later) fell well short of the expected value. The rest ended somewhere within, or fairly close to, the expected interval.
The imputation procedure corrected the missing values, yet I still faced a fairly small dataset (a panel of 90 observations over 18 weeks). With a small dataset, another popular prediction method is bootstrapping: repeated sampling from the data to build probability distributions for the measure of interest. One way to run this procedure in R is with the sample() function, creating the bootstrapped samples (500 iterations) for each of us. I then merge the four separate samples together and, with the ggridges package, present all four distributions on one plot.
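A compact way to generate those bootstrap distributions and plot them, assuming the same hypothetical picks_complete frame (and dplyr >= 1.1 for reframe()):

```r
library(dplyr)
library(ggplot2)
library(ggridges)

set.seed(2022)

# 500 bootstrap means of the weekly winning percentage, per person
boot_df <- picks_complete %>%
  group_by(person) %>%
  reframe(boot_mean = replicate(500, mean(sample(win_pct, replace = TRUE))))

ggplot(boot_df, aes(x = boot_mean, y = person)) +
  geom_density_ridges(alpha = 0.7) +
  theme_minimal() +
  labs(x = "Bootstrapped season average", y = NULL)
```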
The resulting probability distributions from this procedure were pretty good. Compared to the actual season averages (which are based on the reduced sample of the 15 weeks we actually picked), Dad's real outcome (64.3%) was closest to the 'edge' of his distribution (about 2 SD away from the bootstrap average of 61.53%), while the other three were well within the 95% range of the prediction.
Picking Against-The-Spread (ATS)
As a family, we picked winners every week (and Dad came out on top; substantive knowledge matters!). In addition, I picked every game against the spread line from FanDuel. While the total dataset of 15 weeks is limited for making predictions, it can be plotted in various ways to assess my skills (or, more likely, luck?!) when picking ATS.
The first viz below does a few things: it shows the number of games in which I picked the favorite or the underdog to cover (the 'external' light-blue bars, with labels highlighting each type), while the 'internal' grey bars detail how many of my picks were correct. While searching for tools to present these results, I stumbled on the bullet-chart version of barplots (super useful blog here), which is pretty cool, as it illustrates the idea of successful picks out of a total group using separate bars.
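To give a sense of the construction, here is a sketch of a bullet-style chart built from two overlaid geom_col() layers of different widths; the numbers in ats are made-up illustrations, not my actual results:

```r
library(ggplot2)

# hypothetical ATS totals (illustrative numbers only)
ats <- data.frame(
  pick_type = c("Favorite", "Underdog"),
  total     = c(70, 65),
  correct   = c(34, 38)
)

# bullet-chart style: a wide bar for all picks, a narrow bar for correct ones
ggplot(ats, aes(y = pick_type)) +
  geom_col(aes(x = total), fill = "lightblue", width = 0.6) +
  geom_col(aes(x = correct), fill = "grey40", width = 0.25) +
  geom_text(aes(x = total, label = pick_type), hjust = -0.1) +
  theme_minimal() +
  labs(x = "Picks", y = NULL)
```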
Overall, I ended with a pretty good success rate of 58.2% when picking dogs (and the points), but only 48.6% success when taking the favorites.
Lastly, a few weeks back I read a Twitter post on geom_textpath, which lets you integrate text into different plotting options. I use it to plot my success percentage when picking underdogs, favorites, and overall ATS over the entire season (the textpath can replace a legend and thus conserve plotting space).
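A minimal sketch of the idea with the geomtextpath package, assuming a hypothetical ats_weekly frame holding the weekly percentages by pick type:

```r
library(ggplot2)
library(geomtextpath)

# ats_weekly: hypothetical frame with week, win_pct,
# and type ("underdogs" / "favorites" / "overall")
ggplot(ats_weekly, aes(x = week, y = win_pct, color = type)) +
  geom_textpath(aes(label = type), size = 3, hjust = 0.25,
                show.legend = FALSE) +   # the line itself carries its label
  theme_minimal() +
  labs(x = "Week", y = "Winning percentage")
```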
Summary
This project was the second time since fall 2022 in which I "recruited" people (my friends and family) to collect data and run some analysis. The NFL picks data was a great starting point for working on data-correction procedures and conducting small-sample predictions (in future projects, I'll engage in more advanced prediction techniques). Finally, our NFL picks data provide evidence that while you can do pretty well picks-wise by randomly selecting results, substantive knowledge still matters in the long run.
Code and data are available on my GitHub.