Rotem_D

Beyond the big six? Studying Premier League trends with cluster analysis

The term "Big Six" is commonly used by pundits, writers and analysts when describing the EPL over the past decades. It primarily refers to the clubs populating the top of the league table, but it is also highly descriptive of their financial strength.

Yet, we may wonder, is it really a Big-Six group fighting for titles? Man.City and Liverpool have been pretty stable over the past half-decade, but there has been some turnover behind those two in the top spots of the league table. Think about Aston Villa just this past year, or Newcastle the year before. Arsenal only recently returned to "top" form, even Leicester City was a "top-six" member a few years back, and both Chelsea and Man.United have been zig-zagging in and out of this group for several seasons.

Perhaps we need a different descriptor? In other words, maybe we should think about grouping all EPL clubs in a different way? For starters, we should extend beyond a potential (and rather crude) Top-six, and also think about Bottom-3 (or more?), and what about all those clubs packing the middle?


A new season is upon us, so I decided to tackle this question using cluster analysis. K-Means clustering is an unsupervised machine learning method that splits a dataset into a predetermined number of clusters (or groups) based on similarities in the data. It has been used by some football writers to look at individual player performance or as part of a more nuanced tactical analysis.
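To make the idea concrete, here is a minimal sketch of the K-means loop (Lloyd's algorithm) in plain Python. It is not the code behind this post: the `kmeans` helper, the choice to seed centroids with the first k points, and the six two-column points are all invented for illustration.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    # Assumption: seed the centroids with the first k points so the run is reproducible.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Invented toy data, two made-up metrics per "club" (not real FBref numbers).
points = [(60, 55), (-35, -30), (5, 3), (55, 50), (-30, -28), (0, -2)]
labels, centroids = kmeans(points, 3)
print(labels)
```

The cluster ids are arbitrary; only the fact that two points share an id carries meaning.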

For this post, I'm using K-means clustering to identify different groups of EPL clubs based on multiple underlying performance metrics. What I hope to gain from this analysis is a better sense of the "state of the league" and how clubs stack up and fit certain descriptions.

I conduct a team-based cluster analysis with FBref data over the past five seasons (2019-2024), comparing if and how the grouping of all clubs has changed during that time. I adopt five years with the logic that it is a reasonable time frame that allows clubs to "move between groups" if they improve or decline.

Lastly, I use the most common performance metrics, goal difference (GD) and expected goal difference (xGD), and expand them a little further with shots on target (SoT), shot-creating actions per 90 minutes (SCA90), and goal-creating actions per 90 minutes (GCA90). This list goes beyond the basic GD (and the even more important xGD) with measurements of performance that account for multiple aspects of the game and their contribution to creating goal-scoring chances. For full transparency, I began by testing models with a much larger list of metrics, but the results did not change much, so I opted to simplify and take a clearer approach focusing on the aforementioned five metrics.
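These metrics live on very different scales (a season-long goal difference versus a per-90 rate), and K-means relies on distances, so a common pre-processing step is to convert each column to z-scores. The post does not say whether this was done, so treat the sketch below as an assumption; the `standardize` helper and the sample rows are invented.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Rescale every column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    return [
        # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
        [(v - mu) / sd if sd else 0.0 for v, mu, sd in zip(row, mus, sigmas)]
        for row in rows
    ]

# Invented rows: (season-long difference, per-90 rate). Not real club data.
rows = [(40, 30.0), (0, 22.0), (-40, 20.0)]
scaled = standardize(rows)
for row in scaled:
    print([round(v, 2) for v in row])
```

After scaling, no single metric dominates the distance calculation just because its raw numbers are larger.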


Grouping the Premier League (2019-2024)

I begin the analysis with tests that explore the recommended number of clusters to partition the data. The results shifted between three and four clusters for each of the five season datasets. That got me thinking - working with only three clusters is likely to create groups with a rather large number of members (we may end up with groups of 9-10 clubs, and perhaps more), which would make it much harder to learn about the characteristics of each group and its members. Instead, I settled on four clusters per season as a way to get a better idea of the uniqueness of each cluster and its members.
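The post does not name the tests it used. One quantity that commonly feeds such tests (the elbow method, for instance) is the within-cluster sum of squares (WCSS), which measures how tightly members sit around their cluster mean. The sketch below compares a three-group and a four-group labeling of the same invented points; the `wcss` helper and the data are illustrative assumptions.

```python
def wcss(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )
    return total

# Invented points and two hand-made labelings (not output from a real model).
points = [(0, 0), (0, 2), (10, 0), (10, 2), (20, 0), (26, 0)]
three_groups = [0, 0, 1, 1, 2, 2]
four_groups = [0, 0, 1, 1, 2, 3]  # splits the loosest group in two

wcss_three = wcss(points, three_groups)
wcss_four = wcss(points, four_groups)
print(wcss_three, wcss_four)
```

Splitting a cluster never increases WCSS, so the useful question is whether the drop from three to four groups is large enough to justify the extra group.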

For each season, I run the K-means algorithm with four clusters representing four groups: The Title challengers; Aspiring for Europe; The Middle-grounders; and Fighting against relegation. I present the results with cluster plots for each season. These visuals group clubs together based on their degree of similarity in the five performance metrics. The Title challengers are colored in gold-yellow, Aspiring for Europe in green, The Middle-grounders in blue, and Fighting against relegation in red.
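K-means returns arbitrary cluster ids, so the four group names have to be attached afterwards. One assumed way to do that (the groups may equally have been named by inspecting the plots) is to rank clusters by their mean xGD. The `name_clusters` helper and the centroid values below are hypothetical.

```python
TIERS = [
    "Title challengers",
    "Aspiring for Europe",
    "Middle-grounders",
    "Fighting against relegation",
]

def name_clusters(centroid_xgd):
    """Map each cluster id to a tier name, highest mean xGD first."""
    ranked = sorted(centroid_xgd, key=centroid_xgd.get, reverse=True)
    return {cluster_id: tier for cluster_id, tier in zip(ranked, TIERS)}

# Invented mean xGD per cluster id (not results from the analysis).
centroid_xgd = {0: -4.0, 1: 38.5, 2: -27.0, 3: 12.0}
tier_of = name_clusters(centroid_xgd)
for cluster_id, tier in sorted(tier_of.items()):
    print(cluster_id, tier)
```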


What can we learn about a Big-Six in the EPL over the past five seasons?

Several patterns emerge from the analysis. First and foremost, there is not much evidence that a Big-Six exists in the EPL. Across all five seasons, the top group (the leftmost on each plot) usually consists of three members (and never more than four). So right off the bat (apologies for the baseball term) we see that maybe it is time to retire the Big-Six terminology.



What else stands out? The top group, Title challengers, is more of a mini-group that always revolves around Man.City. For the most part, City had Liverpool as company, and recently Arsenal joined this top cluster. Either way, this tier (with two prominent members at any given time) has been the dominant force in the EPL over the past five years, and few other clubs were in close contention or can be viewed as real participants in the title race.

Speaking of the reigning champs, they are all alone in the top group for 2019-2020, which is odd considering it’s the only season in our data in which Man.City did not win the title. Yet 2019-20 should be analyzed carefully, as it was the crazy and truncated COVID season in which Liverpool (YNWA, sorry I had to) technically guaranteed the title by March and then relented, which is reflected in their final position in the second tier.

The second group, Aspiring for Europe, also appears to be pretty stable, with its member count ranging from a minimum of three to a maximum of six. Some are mainstays (Tottenham and Man.United) that have been seeking European glory (or the less glorious Europa League) pretty regularly. We also see a few clubs that “transit through” this tier, like Arsenal on their climb to the top tier between 2020 and 2024. Other clubs also "passed through" the tier, but on their way down - see Leicester City, who dropped all the way to the Championship between 2019 and 2023.

As we might expect, the largest group on average is the Middle-grounders. Several members are in something of a stasis and stay in this group for most of the 5-year period. This includes Everton, Crystal Palace, Wolves and Brentford. But this group also features transitions, as some clubs head to the relegation group (Southampton twice!) while others graduate to an upper tier (Arsenal, Brighton, Tottenham, Aston Villa).

The Fighting against relegation group ended as the second largest in most seasons, which may be somewhat surprising. In three seasons of our data, there are between five and seven teams trying to stave off relegation! At the same time, the difficulty of survival for newly promoted clubs is evident, as those are the most likely members of this group. This point was exemplified in the 2023-24 season, when the sole members of the Fighting against relegation group were the promoted and eventually relegated clubs (Burnley, Luton Town and Sheffield Utd.).

One more interesting result is how weird the 2022-2023 season was. We find Newcastle bolting to the top tier, a mere two years after being in the relegation danger zone, and finishing with a UEFA Champions League spot. Another one is Brighton, who spent several years in the Middle-grounders group before finally graduating to a higher group as the De Zerbi Ball mania was in full swing, and ended with the Seagulls qualifying for European competition for the first time (in some versions of the model, Brighton actually climbed all the way to the top tier in 2023).

 

Big-Six no more

The term Big-Six has been used for quite some time to describe the top tier of the EPL. People have used it as a way to distinguish those teams from the rest of the pack in terms of the league standings, but also their financial strength. I began this study with some doubts. I finish it pretty confident that this is no longer a relevant way to describe the EPL. From a financial perspective, the OG Big-Six still possess the highest market values. But we can also think about the market value, budget size and total expenses of non-Big-Six clubs like Newcastle, Aston Villa and West Ham. Using Transfermarkt data, the latter three clubs rank 6-8 in transfer values, and are actually ahead of both Liverpool and Man.City in terms of transfer balance over the past 5 years.

But more important and interesting are the league competition and results. First, several OG members have taken quite a fall in terms of top-6 league finishes: see Chelsea and Man.United over the past few seasons, or Arsenal, who only recently returned to the mix.

Beyond that, the term Big-Six suggests those are members of an exclusive club - the title chasers, a group that very few others are "allowed to join". In reality, the data suggests that this has not really been the case in the past half-decade. At best, there are three clubs in true contention form. And when looking even closer, it was more of a dynamic duo of title contenders (first Man.City and Liverpool, now Man.City and Arsenal).


What about the upcoming 2024-25 season?

Looking ahead, the trend of the past five years is likely to persist, with probably two teams in the top group and in true contention form (currently, Man.City are the heavy favorites, with Arsenal following, per Opta). It is possible that Liverpool will join the party, but the managerial changes might make it harder than expected. The others are likely to populate what seems like a bigger-than-usual Aspiring for Europe group, which may include Newcastle, Man.United, Aston Villa, Tottenham, Chelsea, and possibly 1-2 members of the Middle-grounders. Yet, one thing that the analysis above has taught us is that there is always someone who jumps to a group or drops out. So expect the unexpected and embrace the randomness - this is the fun part of football.


Enjoy the season!!


Additional data analytics posts are available on my blog. You can also check out my Github for the data and code behind many of these posts.

