Using Pressure to Evaluate Centre Backs

Originally published on StatsBomb.

Analysing centre backs is a subject likely to provoke either a shrug or a wistful smile from an analytics practitioner. To varying degrees, there are numbers and metrics aplenty for other positions but in public analytics at least, development has been limited and a genuine track record of successful application is yet to be found. If centre back analysis is the holy grail of public football analytics, then the search thus far has been more Monty Python than Indiana Jones.

One of the major issues with centre back analysis is that positioning isn’t measured directly by on-ball event data and any casual football watcher can tell you that positioning is a huge part of the defensive art. Tracking data would be the ideal means to assess positioning but it comes at a high cost both computationally and technically, while covering far fewer leagues than simpler event data provision.

StatsBomb’s new pressure event data serves as a bridge between the traditional on-ball event data and the detailed information provided by tracking data, offering a new prism to investigate the style and effectiveness of centre backs. While it won’t provide information on what a defender is up to when they are not in the immediate vicinity of the ball, it does provide extra information on how they go about their task.

Starting at the basic counting level, centre backs averaged six pressure actions per ninety minutes in the Premier League last season. Tackles and interceptions clock in at 0.8 and 1.3 per 90 respectively, which immediately illustrates that pressure provides a great deal more information to chew on when analysing more ‘proactive’ defending. I’m classing clearances and blocked shots as ‘reactive’ given they mostly take place in the penalty area and are more directly driven by the opponent, while aerial duels are a slightly different aspect of defending that I’m going to ignore for the purposes of this analysis.
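
As a minimal sketch of the per-90 normalisation behind these figures: the function name and the season totals below are hypothetical, chosen only so that the rates match the league averages quoted above. It is not the code used for the analysis.

```python
def per_90(count: float, minutes: float) -> float:
    """Scale a raw event count to a per-ninety-minutes rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count * 90.0 / minutes


# Hypothetical season totals for one centre back (illustrative numbers only).
minutes_played = 2700
season_totals = {"pressures": 180, "tackles": 24, "interceptions": 39}

# Convert each raw total into a per-90 rate.
rates = {name: per_90(total, minutes_played) for name, total in season_totals.items()}
```

With these made-up totals, the rates come out at roughly 6.0, 0.8 and 1.3 per 90, which shows how much larger the pressure sample is than the tackle or interception samples.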

The figure below maps out where these defensive actions occur on the pitch and is split between left and right centre backs. Pressure actions typically occur in wider areas in the immediate vicinity of the penalty area, with another peak in pressure just inside the top corner of the 18-yard box. This suggests that centre backs don’t engage too high up the pitch in terms of pressure and are generally moving out towards the flanks to engage opponents in a dangerous position and either slow down an attack, cut down an attacker’s options or directly contest possession.

DefensiveMaps.png

Maps illustrating the location of pressure actions, interceptions and tackles by centre backs in the 2017/18 EPL season. Top row is for left-sided centre backs and the bottom row is for right-sided centre backs.

The location of pressure actions is somewhat similar to the picture for interceptions, although the shape of the latter is less well-defined and tends to extend higher up the pitch. Tackles peak in the same zone just outside the top corners of the penalty area but are also less spatially distinct. Tackles also peak next to the edge of the pitch, a feature that is less distinct in the pressure and interception maps.

Partners in Crime

The number of pressure actions a centre back accumulates during a match will be driven by their own personal inclinations and role within the team, as well as the peculiarities of a given match and season e.g. the tactics of their own team and the opposition or the number of dangerous opportunities their opponent creates. The figure below explores this by plotting each individual centre back’s pressure actions per ninety minutes against their team name. The team axis is sorted by the average number of pressure actions the centre backs on each team make over the season.

CB_Pressure_Actions_per90

Pressure actions per 90 minutes by centre backs in the 2017/18 EPL season (minimum 900 minutes played) by team. Team axis is sorted by the weighted average number of pressure actions the centre backs on each team make over the season.
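
The caption does not say what the weights are. A reasonable assumption is minutes played, in which case the team figure is total pressures over total minutes, scaled to ninety. The sketch below takes that approach; the function name and input layout are hypothetical.

```python
from collections import defaultdict


def team_weighted_pressure(players):
    """Minutes-weighted pressures per 90 for each team.

    players: iterable of (team, pressures, minutes) tuples, one per centre back.
    Weighting by minutes is an assumption; the source does not state the weights.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # team -> [pressures, minutes]
    for team, pressures, minutes in players:
        totals[team][0] += pressures
        totals[team][1] += minutes
    return {team: p * 90.0 / m for team, (p, m) in totals.items() if m > 0}


# Illustrative input only; these are not real player figures.
example = [("A", 100, 900), ("A", 50, 1800), ("B", 30, 900)]
team_rates = team_weighted_pressure(example)

# Team axis order: most pressure actions first.
team_order = sorted(team_rates, key=team_rates.get, reverse=True)
```

Summing before dividing means a centre back with few minutes has proportionally little influence on the team figure, which a plain mean of per-90 rates would not guarantee.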

At the top end of the scale, we see Arsenal and Chelsea, two teams that regularly played a back-three over the past season. Nacho Monreal and César Azpilicueta led the league in pressure actions per ninety minutes by a fair distance and it appears the additional cover provided by playing in a back-three and their natural instincts developed as full backs meant they were frequently putting their opponents under pressure. Manchester United top the list in terms of those predominantly playing with two centre backs, with all of their centre backs applying pressure at similar rates.

At the other end of the scale, Brighton and Leicester’s centre backs appear to favour staying at home in general. Both though are clear examples of there being an obvious split between the number of pressure actions by the primary centre backs on a team, with one being more aggressive while the other presumably holds their position and plays a covering role. This division of roles is perhaps most clearly demonstrated by Chelsea’s centre backs, with Azpilicueta and Antonio Rüdiger as the side centre backs being more proactive than their counterpart in the central defensive slot (Cahill or Christensen).

Liverpool’s improved defensive performance over the course of the season has been attributed to a range of factors, with the signing of Virgil Van Dijk for a world-record fee garnering much of the credit. Intriguingly, his addition to the Liverpool backline has seemingly offered a significant contrast to the club’s incumbents, who all favoured a slightly greater than average number of pressure actions. Furthermore, Van Dijk ranked towards the bottom of the list in terms of pressure actions for Southampton (4.5 per 90) as well, with his figure for Liverpool (3.7 per 90) representing a small absolute decline. As an aside, Van Dijk brings a lot to the table in terms of heading skills, where he ranks highly for both total and successful aerial duels, so he is still an active presence in this aspect, while being a low-event player in others.

Centre backs are often referred to as a partnership and the above illustrates how defensive units often set up to complement each other’s skill sets and attempt to become greater than the sum of their parts.

The Thompson Triangle

Mark Thompson has led the way in terms of public analytics work on centre backs and has advocated for stylistic-driven evaluations as the primary means of analysis, which can then be built on with more traditional scouting. Pressure actions add another string to this particular bow and the figure below contrasts the three proactive defensive actions discussed earlier. Players in different segments of the triangle are biased towards certain actions, with those in the corners being more strongly inclined towards one action over the other two.

CB-TernaryGraph

Comparison of player tendencies in terms of ‘proactive’ defensive actions in the 2017/18 EPL season (minimum 900 minutes played). Apologies for triggering any flashbacks to chemistry classes.
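
A ternary plot needs three components that sum to one for each player. The article does not say how the axes were scaled, so the sketch below simply assumes each player's per-90 rates are normalised into shares; the function name is hypothetical.

```python
def action_shares(pressures: float, tackles: float, interceptions: float):
    """Return (pressure, tackle, interception) shares that sum to 1.

    Assumes plain normalisation of per-90 rates; the plotted figure may
    have used a different scaling.
    """
    total = pressures + tackles + interceptions
    if total <= 0:
        raise ValueError("at least one action rate must be positive")
    return (pressures / total, tackles / total, interceptions / total)


# League-average centre back rates quoted earlier in the article.
league_average = action_shares(6.0, 0.8, 1.3)
```

Under plain normalisation the league-average centre back sits close to the pressure corner, so a plot that spreads players across the triangle would probably rescale each action relative to the league first.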

There is a lot to pore over in the figure, so I’ll focus on defenders who are most inclined towards pressure actions. One clear theme is that such centre backs frequently featured on the sides of a back-three. Ryan Shawcross is unusual in this aspect given he was generally the middle centre back in Stoke’s back-three, as well as the right centre back in a back four. Ciaran Clark at Newcastle and Kevin Long at Burnley are the only players who featured mostly as one of two centre backs, with their partner adopting a more reserved role.

The additional cover provided by a back-three system and the frequent requirement for the player on the flanks to pull wide and cover in behind their wing-back seemingly plays a large part in determining the profile of centre backs. This illustrates the importance of considering team setup in determining a defender’s profile and should feed into any recruitment process alongside their individual inclinations.

The analysis presented provides descriptive metrics and illustrations of the roles played by centre backs and is very much a first look at this new data. While we can’t gain definitive information on positioning without constant tracking of a player, the pressure event data provides a new lens to evaluate centre backs and significantly increases the number of defensive actions that can be evaluated further. Armed with such information, these profiles can be built upon with further data-driven analysis and combined with video and in-person scouting to build a well-rounded profile on the potential fit of a player.

Now all we need is a shrubbery.

Measuring counter-pressing

Originally published on StatsBomb.

The concept of pressing has existed in football for decades but its profile has been increasingly raised over recent years due to its successful application by numerous teams. Jürgen Klopp and Pep Guardiola in particular have received acclaim across their careers, with pressing seen as a vital component of their success. There are numerous other recent examples, such as the rise of Atlético Madrid, Tottenham Hotspur and Napoli under Diego Simeone, Mauricio Pochettino and Maurizio Sarri respectively.

Alongside this rise, public analytics has sought to quantify pressing through various metrics. Perhaps the most notable and widely-used example was ‘passes per defensive action’ or PPDA, which was established by Colin Trainor and first came to prominence on this very website. Anecdotally, PPDA found its way inside clubs and serves as an example of public analytics penetrating the private confines of football. Various metrics have also examined pressing through the prism of ‘possessions’, which Michael Caley has put to effective use on numerous occasions. Over the past year, I sought to illustrate pressing by quantifying a team’s ability to disrupt pass completion. While this was built on some relatively complex numerical modelling, it did provide what I thought was a nice visual representation of the effectiveness of a team’s pressing.

While the above metrics and others have their merits, they tend to ignore that pressing can take several forms and are biased towards the outcome, rather than the actual process. The one public example that side-steps many of these problems is the incredible work by the Anfield Index team through their manual collection of Liverpool’s pressing over the past few seasons but this has understandably been limited to one team.

Step forward the new pressure event data supplied by StatsBomb Services. This new data is an event that is triggered when a player is within a five-yard radius of an opponent in possession. The radius increases where errors by the opponent would prove more costly, with a maximum range of ten yards that is usually associated with goalkeepers under pressure. As well as logging the players involved in the pressure event and its location, the duration of the event is also collected.

The data provides an opportunity to explore pressing in greater detail than ever before. Different teams use different triggers to instigate their press, which can now be isolated and quantified. Efficiency and success can be separated from the pressing process in a number of ways at both the team and player-level. Such tools can be used in team-evaluation, opposition scouting and player recruitment.

One such application of the new data is to explore gegenpressing or counter-pressing, which is the process where a team presses the opposition immediately after losing possession. The initial aim of counter-pressing is to disrupt the opponent’s counter-attack, which can be a significant danger during the transition phase from attack-to-defence when a team is more defensively-unstable. Ideally possession is quickly won back from the opponent, with some teams seeking to exploit such situations to attack quickly upon regaining possession. Five seconds is often used as a cut-off for the period where pressure on the opposition is most intensely applied during the counter-press.

The exciting new dimension provided by StatsBomb’s new pressure data is that the definition of counter-pressing you would find in a coaching manual can be directly drawn from the data i.e. a team applies pressure to their opponent following a change in possession. The frequency at which counter-pressing occurs can be quantified and then we can develop various metrics to examine the success or failure of this process. Furthermore, we can analyse counter-pressing at the player-level, which has been out-of-reach previously.

The figure below illustrates where on the pitch counter-pressing occurs based on data from 177 matches from the Premier League this past season. The pitch is split into six horizontal zones and is orientated so that the team out-of-possession is playing from left-to-right. The colouring on the pitch shows the proportion of open-play possessions starting in each zone where pressure is applied within five seconds of a new possession.

AvgCounterPressMap.png

The figure illustrates that pressure is most commonly applied on possessions starting in the midfield zones, with marginally more pressure in the opposition half. Possessions beginning in the highest zone up the pitch come under less pressure, which is likely driven by the lower density of players in this zone on average. Very few possessions actually begin in the deepest zone and a smaller proportion of them come under pressure quickly than those in midfield.

From a tactical perspective, pressing is generally reserved for areas outside of a team’s own defensive third. The exact boundary will vary but for the following analysis, I have only considered possessions starting higher up the pitch, as denoted by the counter-pressing line in the previous figure.

In the figures below, the proportion of possessions in the counter-pressing zones where pressure is applied within five seconds is referred to as the ‘counter-pressing fraction’. In the sample of matches from the Premier League this season, a little under half (0.47) of open-play possessions come under pressure from their opponent within five seconds. At the top of the counter-pressing rankings, we see Manchester City, Tottenham Hotspur and Liverpool, which is unsurprising given the reputations of their managers. At the bottom end of the scale, we find a collection of teams that have mostly been overseen by British managers who are better known for a deep defensive line.
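
The counter-pressing fraction as defined above can be sketched roughly as follows. The `Possession` record and its field names are hypothetical and do not reflect the StatsBomb data schema. The five-second window and the restriction to open-play possessions beyond the counter-pressing line come from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Possession:
    """Hypothetical record of one new possession for the team that won the ball."""
    start_time: float                 # seconds, when the possession begins
    open_play: bool                   # True if it did not start from a set piece
    in_counter_press_zone: bool       # starts beyond the counter-pressing line
    pressure_times: List[float] = field(default_factory=list)  # opponent pressure events (seconds)


def counter_pressing_fraction(possessions, window: float = 5.0) -> float:
    """Share of eligible possessions pressured within `window` seconds of starting."""
    eligible = [p for p in possessions if p.open_play and p.in_counter_press_zone]
    if not eligible:
        return float("nan")
    pressed = sum(
        any(0.0 <= t - p.start_time <= window for t in p.pressure_times)
        for p in eligible
    )
    return pressed / len(eligible)


# Illustrative data only.
sample = [
    Possession(10.0, True, True, [12.5]),   # pressured after 2.5 s: counts
    Possession(40.0, True, True, [47.0]),   # pressured after 7 s: too late
    Possession(80.0, True, True, []),       # never pressured
    Possession(95.0, True, False, [96.0]),  # starts too deep: excluded
    Possession(120.0, False, True, [121.0]),  # set piece: excluded
]
fraction = counter_pressing_fraction(sample)
```

Because the denominator is possessions rather than pressure events, the fraction describes how often a counter-press is attempted, independently of whether the ball is actually won back.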

Team_CounterPressFraction

On the right-hand figure above, the strong association between counter-pressing and possession is illustrated, with the two showing a high correlation coefficient of 0.86 in this aggregated sample. Interpreting causality here is somewhat problematic given the likely circular relationship between the two parameters; teams that dominate possession may have more energy to press intensively, leading to a greater counter-pressing fraction, which would lead to them winning possession back more quickly, which will potentially increase their possession share and so on. The correlation is weaker for individual matches (0.36), which hints at some greater complexity and is something that can be returned to at a later date.

Perhaps the most interesting finding in the above figures is Burnley’s high counter-pressing fraction. The majority of analysis on Burnley has focused on their defensive structure within their own box and how that affects their defensive performance in relation to expected goals. The figure illustrates that Burnley employ a relatively aggressive counter-press, especially in relation to their possession share.

Examining Burnley’s counter-pressing game in more detail reveals that they counter-press 18 possessions per game, which is above average and only slightly lower than Manchester City. However, they only actually regain possession within five seconds 2.5 times per game, which falls short of what you might expect on average and falls below their counter-pressing peers. In terms of the ratio between their counter-pressing regains and total counter-pressing possessions, they sit 17th on 14%.
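
The 14% figure follows directly from the two per-game numbers quoted above. A quick check, with variable names that are purely illustrative:

```python
counter_pressed_per_game = 18   # possessions Burnley counter-press per game
regains_per_game = 2.5          # of those, regained within five seconds

# Ratio of counter-pressing regains to counter-pressed possessions.
regain_ratio = regains_per_game / counter_pressed_per_game
regain_pct = f"{regain_ratio:.0%}"  # "14%"
```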

Burnley’s counter-press is the fourth least-effective at limiting shots, with 13% of such possessions ending with them conceding a shot compared to the average rate of 10%. However, one thing in their favour is that these possessions are typically around the league average in terms of their length and speed of attack, which will allow Burnley to regain their vaunted defensive organisation prior to conceding such shots.

The more dominant discourse around pressing is as an attacking rather than defensive weapon, so narratives are often formed around teams that regularly win back the ball through pressing and use this to generate fast attacks e.g. Liverpool and Tottenham Hotspur. As a result, a team like Burnley who seemingly employ counter-pressing as a defence-first tactic to prevent counter-attacks and slow attacking progress may be overlooked.

Burnley’s manager, Sean Dyche, has typically been lumped in with the tactical stylings of the perennially-employed British managers who aren’t generally associated with pressing tactics. Dyche was reportedly most impressed by the pressing game employed by Guardiola’s Barcelona and he has seemingly implemented some of these ideas at Burnley. He has instilled an approach that combines counter-pressing and a low-block with numbers behind the ball, which is a neat trick to pull off; Diego Simeone and Atlético Madrid are perhaps the more apt comparison given such traits.

The above analysis illustrates the ability of StatsBomb’s new pressure event data to illuminate an important aspect of the modern game. Furthermore, it is able to do this in a manner that directly translates tactical principles, separating underlying process and outcome, which is a giant step forward for analytics. It also led to an analysis discussing the similarity between Guardiola’s legendary Barcelona team and Sean Dyche’s Burnley, which was probably unexpected to say the least.

This is just a taster of what is possible with StatsBomb’s new data. There’s more information in this presentation from the StatsBomb launch event and you can expect more analysis to appear over the summer and beyond.

Liverpool and I

While I probably watched Liverpool play before then, the first match I remember watching was on the 4th January 1994, when a nine-year-old me saw them come back from three goals down, which would become something of a theme. As is the wont of memory, the events that leave an indelible mark are the ones that stand out; my first actual football memory is Paul Bodin missing that penalty and not really understanding the scale of the disappointment. Turned out Wales’ last World Cup match was in 1958 when some no-mark seventeen-year-old called Edson Arantes do Nascimento scored his first international goal and knocked them out in the quarter-final.

Other early memories include one of God’s defining miracles, with a hat-trick notched up in four minutes and thirty three seconds and learning about player aging curves when I realised that the slow yet classy guy in midfield used to be one of the most devastating and exciting wide-players the game had ever seen. My first match at Anfield was Ian Rush’s last there in a red shirt, while subsequent visits took in thrilling cup matches under the gaze of King Kenny and the best live sporting experience of my life as I bounced out of Anfield full of hope in April 2014.

While a league title has proved elusive during my supporting life, Europe has provided the greatest thrills, with tomorrow marking a third European Cup Final to go along with two finals in the junior competition. A European Cup Final once every eight years on average, with all three in the last fourteen years is pretty good going for a non-super club, albeit one with significant resources.

Real Madrid are clearly going to be a tough nut to crack, with Five Thirty Eight, Club Elo and Euro Club Index all ranking them as the second best team around. The same systems have Liverpool as the fifth, seventh and eleventh best, so underdogs with a good chance at glory overall.

According to Club Elo, the 2018 edition of Liverpool will be the best to contest a European Cup Final this century but on the flip-side, Real Madrid are stronger than either of the AC Milan teams that Liverpool faced in 2005 and 2007. Despite this, Liverpool are given a slightly better shot at taking home Old Big Ears than they had in 2005, as the gap between them and their opponents is narrower. The strides that the team made under Rafa between the 2005 and 2007 finals meant that the latter was contested by two equal teams.

Liverpool should evidently be approaching the final with optimism and further evidence of this is illustrated in the figure below, which shows the top-fifty teams by non-penalty expected goal difference in the past eight Premier League seasons. The current incarnation of Liverpool sit fifth and would usually be well-positioned to seriously challenge for the title. As the figure also illustrates, the scale of Manchester City’s dominance in their incredible season is well-warranted.

EPL-8-seasons-xGD.png

Top-fifty teams by non-penalty expected goal difference over the past eight Premier League seasons. Liverpool are highlighted in red, with the 17/18 season marked by the star marker. Data via Opta.

Liverpool’s stride forward under Klopp this past season has taken them beyond the 13/14 and 12/13 incarnations in terms of their underlying numbers. In retrospect, Rodgers’ first season was quietly impressive even if it wasn’t reflected in the table and it set the platform for the title challenge the following season.

Compared to those Luis Suárez-infused 12/13 and 13/14 seasons, the attacking output this past season is slightly ahead, with the team sitting sixth in the eight-season sample, which is their best over the period. Including penalties would take the 13/14 vintage beyond the latest incarnation, with the former scoring ten from the twelve (!) awarded, while 17/18 saw only three awarded (two scored).

The main difference with the current incarnation though is on the defensive end, with the team having the fifth best record in terms of non-penalty expected goals conceded this past season in the eight-year sample. The 13/14 season’s defence was the seventh worst by the club in this eight-year period and they lay thirty-fourth overall. These contrasting records equate to an eight non-penalty expected goal swing in their defensive performance.

While the exhilarating attacking intent of this Liverpool side is well-established, they are up against another attacking heavyweight; could it be that the defensive side of the game is the most decisive? The second half of this season is especially encouraging on this front, with improvements in both expected and actual performance. This period represents the sixth best half season over these eight seasons (out of a total of 320) and a three-goal swing compared to the first half of the season. This was slightly offset by a reduction in attacking output of two non-penalty expected goals but the overall story is one of improvement.

The loss of Coutinho, addition of van Dijk and employing a keeper with hands (edit 2203 26/05/18: well at least he gets his hands to it usually) between the sticks is a clear demarcation in Liverpool’s season and it is this period that has seen the thrilling run to the European Cup Final. The improved balance between attack and defence bodes well and I can’t wait to see what this team can do on the biggest stage in club football.

Allez, Allez, Allez!

On Burnley and expected goals

Originally published on StatsBomb.

Expected goals has found itself outside the confines of the analytics community this season, which has brought renewed questions regarding its flaws, foibles and failures. The poster-child for expected goals flaws has been Burnley and their over-performing defence, even finding themselves in the New York Times courtesy of Rory Smith. Smith’s article is a fine piece blending comments and insights from the analytics community and Sean Dyche himself.

Smith quotes Dyche describing Burnley’s game-plan when defending:

The way it is designed is to put a player in a position that it is statistically, visually and from experience, harder to score from.

Several analyses have dug deeper into Burnley’s defence last season, including an excellent piece by Mark Thompson for StatsBomb. In his article, Mark used data from Stratagem to describe how Burnley put more men between the shooter and the goal than their peers, which may go some way to explaining their over-performance compared with expected goals.

Further work on the EightyFivePoints blog quantified how the number of intervening defenders affected chance quality. The author found that when comparing an expected goal model with and without information on the number of intervening defenders and a rating for defensive pressure, Burnley saw the largest absolute difference between these models (approximately 4 goals over the season).

If there is a quibble with Smith’s article it is that it mainly focuses on this season, which was only 12 games old at the time of writing. Much can happen in small samples where variance often reigns, so perhaps expanding the analysis to more seasons would be prudent.

The table below summarises Burnley’s goals and expected goals conceded over the past three and a bit seasons.

xG_Table

Burnley’s non-penalty expected goals and goals conceded over the past four seasons. Figures for first 13 games of 2017/18 season. Data via Opta.

Each season is marked by over-performance, with fewer goals conceded than expected. This ranges from 5 goals last season to a scarcely believable 15 goals during their promotion season in the Championship.

The above table and cited articles paint a picture of a team that has developed a game-plan that somewhat flummoxes expected goals and gains Burnley an edge when defending. However, if we dig a little deeper, the story isn’t quite as neat as would perhaps be expected.

Below are cumulative totals for goals and expected goals as well as the cumulative difference over each season.

Burnley_xG_Timeline_Full

Burnley’s cumulative non-penalty goals and expected goals conceded (left) and the cumulative difference between them (right) over the past four seasons. Figures for first 13 games of 2017/18 season. Data via Opta.
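
The cumulative timelines are a running sum of goals and expected goals conceded, plus the gap between them. A minimal sketch, assuming one value per match in chronological order, with a hypothetical function name and made-up numbers:

```python
from itertools import accumulate


def cumulative_difference(goals_conceded, xg_conceded):
    """Running (goals - expected goals) conceded after each match.

    Negative values mean fewer goals conceded than expected (over-performance).
    """
    if len(goals_conceded) != len(xg_conceded):
        raise ValueError("inputs must have one entry per match")
    cum_goals = accumulate(goals_conceded)
    cum_xg = accumulate(xg_conceded)
    return [g - x for g, x in zip(cum_goals, cum_xg)]


# Illustrative three-match run, not Burnley's actual figures.
timeline = cumulative_difference([1, 0, 2], [1.5, 0.5, 1.0])
```

Plotting the running difference rather than the season total shows whether an over-performance built up steadily or came from a short streak, and that distinction is the point of the timelines.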

In their 2014/15 season, Burnley were actually conceding more goals than expected for the majority of the season until a run of clean sheets at the end of the season saw them out-perform expected goals. After the 12 game mark in their Championship season, they steadily out-performed expected goals, yielding a huge over-performance. This continued into their 2016/17 season in the Premier League over the first 10 games where they conceded only 12 goals compared to an expected 19. However, over the remainder of the season, they slightly under-performed as they conceded 39 goals compared with 36 expected goals.

The above illustrates that Burnley’s over-performance in previous Premier League seasons is actually driven by just a handful of games, rather than a systematic edge when examining expected goals.

This leaves us needing to put a lot of weight on their Championship season if we’re going to believe that Burnley are magical when it comes to expected goals. While the descriptive and predictive qualities of expected goals in the Premier League are well-established, there is less supporting evidence for the Championship. Consequently it may be wise to take their Championship figures with a few grains of salt.

This season and last have seen Burnley get off to hot starts, with what looks like a stone-cold classic example of regression to the mean last season. If we ignore positive variance for a moment, perhaps their opponents got wise to their defensive tactics and adapted last season but then you have to assume that they’ve either forgotten the lessons learned this season or Dyche has instigated a new and improved game-plan.

The cumulative timelines paint a different picture to the season aggregated numbers, which might lead us to conclude that Burnley’s tactics don’t give them quite the edge that we’ve been led to believe. In truth we’ll never be able to pin down exactly how much is positive variance and how much is driven by their game-plan.

However, we can state that given our knowledge of the greater predictive qualities of expected goals when compared to actual goals, we would expect Burnley’s goals against column to be closer to their expected rate (1.3 goals per game) than their current rate (0.6 goals per game) over the rest of the season.

Time will tell.

What has happened to the Klopp press?

Originally published on StatsBomb.

When asked how his Liverpool team would play by the media horde who greeted his unveiling as manager two years ago, Jürgen Klopp responded:

We will conquer the ball, yeah, each fucking time! We will chase the ball, we will run more, fight more.

The above is a neat synopsis of Klopp’s preferred style of play, which focuses on pressing the opponent after losing the ball and quickly transitioning into attack. It is a tactic that he successfully deployed at Borussia Dortmund and one that he has employed regularly at Liverpool.

However, a noticeable aspect of the new season has been Liverpool seemingly employing a less feverish press. The Anfield Index Under Pressure Podcast led the way with their analysis, which was followed by The Times’ Jonathan Northcroft writing about it here and Sam McGuire for Football Whispers.

Liverpool’s pass disruption map for the past three seasons is shown below. Red signifies more disruption (greater pressure), while blue indicates less disruption (less pressure). In the 2015/16 and 2016/17 seasons, the team pressed effectively high up the pitch but that has slid significantly so far this season. There is some disruption in the midfield zone but at a lower level than previously.

LFC_dxP.png

Liverpool’s zonal pass completion disruption across the past three seasons. Teams are attacking from left-to-right, so defensive zones are to the left of each plot. Data via Opta.

The above numbers are corroborated by the length of Liverpool’s opponent possessions increasing by approximately 10% this season compared to the rest of Klopp’s reign. Their opponents so far this season have an average possession length of 6.5 seconds, which is lower than the league average but contrasts strongly with the previous figures that have been among the shortest in the league.

Examining their pass disruption figures game-by-game reveals further the reduced pressure that Liverpool are putting on their opponents. During 2015/16 and 2016/17, their average disruption value was around -2.5%, which they’ve only surpassed once in Premier League matches this season, with the average standing at -0.66%.

LFC-xP-17-18

Liverpool’s game-by-game pass completion disruption for 2017/18 English Premier League season. Figures are calculated for zones with Opta x-coordinates greater than 40. Data via Opta.
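
The disruption values quoted above compare how often opponents actually complete passes with how often a pass model expects them to. The sketch below assumes the figure is actual completion rate minus mean expected completion, so that negative values mean more disruption. That sign convention is inferred from the percentages in the text, and the function name and inputs are hypothetical.

```python
def pass_disruption(passes) -> float:
    """Actual minus expected completion rate for a set of opponent passes.

    passes: iterable of (completed, expected_completion) pairs, where
    expected_completion is a model probability between 0 and 1.
    Negative results mean opponents completed fewer passes than expected.
    """
    passes = list(passes)
    if not passes:
        raise ValueError("need at least one pass")
    actual = sum(1 for completed, _ in passes if completed) / len(passes)
    expected = sum(prob for _, prob in passes) / len(passes)
    return actual - expected


# Illustrative passes only: two of four completed, model expected 75%.
example_passes = [(True, 0.9), (False, 0.8), (True, 0.7), (False, 0.6)]
disruption = pass_disruption(example_passes)
```

A per-match series like the one in the figure would come from applying the same calculation to each opponent's passes in the chosen zones, one match at a time.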

The Leicester match is the major outlier and examining their passing further indicates that the high pass disruption was a consequence of them attempting a lot of failed long passes. This is a common response to Liverpool’s press as teams go long to bypass the pressure.

Liverpool’s diminished press is likely a deliberate tactic that is driven by the added Champions League matches the team has faced so far this season. The slightly worrisome aspect of this tactical shift is that Liverpool’s defensive numbers have taken a hit.

In open-play, Liverpool’s expected goals against figure is 0.81 per game, which is up from 0.62 last season. Furthermore, their expected goals per shot has risen to 0.13 from 0.11 in open-play. To add further defensive misery, Liverpool’s set-piece woes (specifically corners) have actually got worse this season. The team currently sit eleventh in expected goals conceded this season, which is a fall from fifth last year.

This decline in underlying defensive performance has at least been offset by a rise on the attacking side of 0.4 expected goals per game to 1.78 this season. Overall, their expected goal difference of 0.79 this season almost exactly matches the 0.81 of last season.

Liverpool’s major problem last season was their soft under-belly but they were often able to count on their pressing game denying their opponents opportunities to exploit it. What seems to be happening this season is that the deficiencies at the back are being exploited more with the reduced pressure ahead of them.

With the season still being relatively fresh, the alarm bells shouldn’t be ringing too loudly but there is at least cause for concern in the numbers. As ever, the delicate balancing act between maximising the side’s attacking output while protecting the defense is the key.

Klopp will be searching for home-grown solutions in the near-term and a return to the familiar pressing game may be one avenue. Given the competition at the top of the table, he’ll need to find a solution sooner rather than later, lest they be left behind.

Under pressure

Originally published on StatsBomb.

Models that attempt to measure passing ability have been around for several years, with Devin Pleuler’s 2012 study being the first that I recall seeing publicly. More models have sprung up in the past year, including efforts by Paul Riley, Neil Charles and StatsBomb Services. These models aim to calculate the probability of a pass being completed using various inputs about the start and end location of the pass, the length of the pass, the angle of it, as well as whether it is played with the head or foot.

Most applications have analysed the outputs from such models from a player passing skill perspective but they can also be applied at the team level to glean insights. Passing is the primary means of constructing attacks, so perhaps examining how a defense disrupts passing could prove enlightening?

In the figure below, I’ve used a pass probability model (see end of post for details and code) to estimate the difficulty in completing a pass and then compared this to the actual passing outcomes at a team-level. This provides a global measure of how much a team disrupts their opponents’ passing. We see the Premier League’s main pressing teams with the greatest disruption, through to the barely corporeal form represented by Sunderland.

Team_PCDgraph

Pass completion disruption for the 2016/17 English Premier League season. Disruption is defined as actual pass completion percentage minus expected pass completion percentage. Negative values mean opponents complete fewer passes than expected. Data via Opta.
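As a minimal sketch of the definition in the caption above, the function below takes a set of opponent passes, each with an outcome and a model probability, and returns actual minus expected completion percentage. The function name and the input layout are illustrative assumptions rather than the code used for the post.

```python
def pass_completion_disruption(passes):
    """Return actual minus expected pass completion, in percentage points.

    `passes` is an iterable of (completed, expected_probability) pairs, where
    `completed` is True/False and `expected_probability` is the output of a
    pass probability model (hypothetical input layout).
    """
    passes = list(passes)
    if not passes:
        raise ValueError("need at least one pass")
    actual = 100.0 * sum(1 for completed, _ in passes if completed) / len(passes)
    expected = 100.0 * sum(prob for _, prob in passes) / len(passes)
    return actual - expected


# Hypothetical opponent passes against one team.
opponent_passes = [(True, 0.9), (False, 0.8), (True, 0.7), (False, 0.6)]
disruption = pass_completion_disruption(opponent_passes)
print(f"disruption: {disruption:+.1f}%")
```

A negative result means the opponents completed fewer passes than the model expected.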

The next step is to break this down by pitch location, which is shown in the figure below where the pitch has been broken into five bands with pass completion disruption calculated for each. The teams are ordered from most-to-least disruptive.

PressureMap

Zonal pass completion disruption for 2016/17 English Premier League season. Teams are attacking from left-to-right, so defensive zones are to the left of each plot. Data via Opta.
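The zonal breakdown can be sketched by grouping passes into bands along the length of the pitch before computing disruption. The sketch assumes five equal-width bands, Opta-style x-coordinates running from 0 to 100, and that each pass is assigned to a band by its starting x-coordinate; none of these details are confirmed in the text.

```python
from collections import defaultdict


def zonal_disruption(passes, n_bands=5, pitch_length=100.0):
    """Return {band_index: disruption in percentage points}.

    `passes` is an iterable of (start_x, completed, expected_probability).
    Band 0 is the band nearest x = 0 (assumed layout).
    """
    totals = defaultdict(lambda: [0, 0.0, 0])  # completed, expected, count
    for start_x, completed, prob in passes:
        band = int(start_x / pitch_length * n_bands)
        band = max(0, min(band, n_bands - 1))  # keep x = 100 in the last band
        entry = totals[band]
        entry[0] += int(bool(completed))
        entry[1] += prob
        entry[2] += 1
    return {
        band: 100.0 * (done - expected) / count
        for band, (done, expected, count) in sorted(totals.items())
    }


# Hypothetical passes: two in the deepest band, two in the most advanced band.
example_passes = [(10, True, 0.5), (10, False, 0.5), (95, True, 0.5), (100, True, 0.5)]
bands = zonal_disruption(example_passes)
print(bands)
```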

We see Manchester City and Spurs disrupt their opponents’ passing across the entire pitch, with Spurs’ disruption skewed somewhat higher. Liverpool dominate in the midfield zones but offer little disruption in their deepest-defensive zone, suggesting that once a team breaks through the press, they have time and/or space close to goal; a familiar refrain when discussing Liverpool’s defense.

Chelsea offer an interesting contrast with the high-pressing teams, with their disruption gradually increasing as their opponents inch closer to their goal. What stands out is their defensive zone sees the greatest disruption (-2.8%), which illustrates that they are highly disruptive where it most counts.

The antithesis of Chelsea is Bournemouth who put together an average amount of disruption higher up the pitch but are extremely accommodating in their defensive zones (+4.5% in their deepest-defensive zone). Sunderland place their opponents under limited pressure in all zones aside from their deepest-defensive zone where they are fairly average in terms of disruption.

The above offers a glimpse of the defensive processes and outcomes at the team level, which can be used to improve performance or identify weaknesses to exploit. Folding such approaches into pre-game routines could quickly and easily supplement video scouting.

Appendix: Pass probability model

For this post, I built two different passing models; the first used Logistic Regression and the second used Random Forests. The code for each model is available here and here.

Below is a comparison between the two, which compares expected success rates with actual success rates on out-of-sample test data.

Actual_vs_Expected

Actual versus expected pass success for two different models. Data via Opta.
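A comparison of this kind can be sketched by grouping test-set passes into bins of predicted probability and comparing the mean prediction with the observed success rate in each bin. The ten equal-width bins and the function name are assumptions for illustration.

```python
def calibration_table(outcomes, probabilities, n_bins=10):
    """Return a list of (mean_expected, actual_rate, count) per non-empty bin.

    `outcomes` are 1/0 (or True/False) pass results and `probabilities` are
    the model's predicted completion probabilities for the same passes.
    """
    bins = [[0, 0.0, 0] for _ in range(n_bins)]  # completed, expected, count
    for outcome, prob in zip(outcomes, probabilities):
        index = min(int(prob * n_bins), n_bins - 1)  # prob = 1.0 goes in last bin
        bins[index][0] += int(bool(outcome))
        bins[index][1] += prob
        bins[index][2] += 1
    return [
        (expected / count, completed / count, count)
        for completed, expected, count in bins
        if count
    ]


# Hypothetical out-of-sample passes.
table = calibration_table([1, 0, 1, 1], [0.05, 0.05, 0.95, 0.95])
for mean_expected, actual_rate, count in table:
    print(f"expected={mean_expected:.2f} actual={actual_rate:.2f} n={count}")
```

A well-calibrated model has actual rates close to the mean expected value in every bin, including the low-probability ones.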

The Random Forest method performs better than the Logistic Regression model, particularly for low probability passes. This result is confirmed when examining the Receiver Operating Characteristics (ROC) curves in the figure below. The Area Under the Curve (AUC) for the Random Forest model is 87%, while the Logistic Regression AUC is 81%.

xPass_AUC

Receiver Operating Characteristics (ROC) curves for the two different passing models. Data via Opta.
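For reference, the AUC can be computed without plotting the curve: it equals the probability that a randomly chosen completed pass receives a higher predicted probability than a randomly chosen failed pass, with ties counting as half. The pairwise sketch below is illustrative and only practical for small samples.

```python
def roc_auc(outcomes, probabilities):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [p for y, p in zip(outcomes, probabilities) if y]
    negatives = [p for y, p in zip(outcomes, probabilities) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one completed and one failed pass")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions: one failed pass is ranked above a completed one.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
print(f"AUC = {auc:.2f}")
```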

Given the better performance of the Random Forest model, I used this in the analysis in the main article.

Liverpool 2017/18 season preview

Originally published on StatsBomb.

Liverpool enter the season with aspirations of challenging for the title after an at times hugely promising and exciting first full season under Jürgen Klopp. The prospect of European adventures returning on Tuesday or Wednesday nights is tantalisingly close, provided they negotiate their Champions League qualifying round.

The story so far

Liverpool’s tally of 76 points last season was their joint-third best over the last decade and secured only their second top-four finish since the Benitez years. In fact, after a run of four top-four finishes, Liverpool haven’t registered back-to-back Champions League qualifications since Rafa left and have on average finished sixth during that time with 65 points on the board.

With the above in mind, it’s tempting to view a season of consolidation as the priority for the coming season, alongside beginning to re-establish the team as a European force. Liverpool’s underlying performance last season is encouraging, with their goal return reasonably in-line with expectation and their expected goal difference placing them well in contention for a title push.

Drilling further into their expected goal numbers reveals a team that experienced fluctuating underlying performance over the course of the season, with a significant decline once 2017 was rung in. The graphic below illustrates this alongside a longer-term outlook encompassing the past five seasons.

LFC_xG_TimeLine.png

Rolling 19-game average expected goal timeline over the past five seasons. Grey vertical lines denote new season.
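A rolling average of this kind is straightforward to compute. The sketch below assumes a plain list of per-game expected goal values in date order and returns one average per complete 19-game window; the function name is illustrative.

```python
def rolling_mean(values, window=19):
    """Return the mean of each complete trailing window of `window` games."""
    if window <= 0:
        raise ValueError("window must be positive")
    values = list(values)
    averages = []
    running_total = 0.0
    for index, value in enumerate(values):
        running_total += value
        if index >= window:
            running_total -= values[index - window]  # drop the oldest game
        if index >= window - 1:
            averages.append(running_total / window)
    return averages


# Small hypothetical example with a 3-game window.
smoothed = rolling_mean([1, 2, 3, 4, 5], window=3)
print(smoothed)
```

A 38-game season yields 20 complete 19-game windows, so a multi-season timeline is usually built from the concatenated match list rather than season by season.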

The heights of 2016/17 are close to those of the Suárez-powered team under Rodgers, while the low-point is more in-line with Klopp’s early tenure at the club. The past season thus illustrated that the team was capable of title-contending performances at times but also switched to a team competing for the fourth-place trophy at best.

Upping the pace

Closer examination of the downturn in performance using my ‘team strategy analysis’ shows a drying up of shot generation via high-quality chances born of fast-paced attacks from deep and after midfield-transitions.

Sadio Mané was evidently missed due to AFCON duties and injury over the latter half of the season and this is borne out by the numbers. According to my model, he was second best in the EPL (0.11 per 90) in terms of xG-contribution (the sum of expected goals and assists) from fast-paced attacks following a midfield-transition. For fast-attacks from deep, he ranked sixth for xG-contribution (0.12 per 90).

Thankfully, Mohamed Salah, the club’s major acquisition so far, brings complementary qualities to the table and adds much-needed depth to the wide-forward ranks. James Yorke of this parish has already praised the signing earlier this summer and my only addition is that Salah showed up quite highly for xG-contribution (0.07 per 90, ranking eleventh in Serie A) for fast-paced attacks following a midfield-transition. The addition of Salah improves what was already a healthy front-line attack.

Defensive issues

According to the Objective Football website run by Benjamin Pugsley, Liverpool conceded just 8.1 non-penalty shots per game, ranking second over the past eight seasons behind a Pep-infused Manchester City last year. Shots-on-target conceded (3.0 per game) told a similar story, ranking joint-sixth over the same period. However, they combined these extraordinary shot-suppression numbers with the highest expected goals per shot in the league (0.11), which is the worst value I have over the past five seasons. When Liverpool conceded shots, they were of high quality, which ultimately saw them sit fifth in terms of expected goals against last season.

Klopp’s tactical system deserves credit for melding a highly exciting attack with strong defensive aspects in terms of shot-suppression. The optimistic take here is that tweaks and a greater familiarity with his counter-pressing tactics could bring about improvements in shot quality conceded, thereby seeing better defensive numbers. It’s worth noting the period during November and December 2016 when their expected goals against was the lowest it has been consistently over a 19-game span in the past five seasons, so the current squad is capable of sustained excellence in this realm.

The pursuit of Virgil van Dijk does suggest that the club are aiming to recruit a new starting centre-back. That saga remains running at the time of writing as the world waits to find out just how costly a single ice cream can be. Centre-back depth is an issue that needs to be rectified; the fact that Lucas Leiva made six appearances as a centre-back last term is all the evidence needed for that statement.

The other aspect of Liverpool’s defense that could improve is in the goalkeeping stakes. From a pure shot-stopping perspective, Karius has the best pedigree; in my goalkeeper shot-stopping analysis, Karius came 31st across the data-set with a rating of 91%, which is a pretty decent indication that he is an above-average shot-stopper. Mignolet fared much worse with a rating of just 25%, which puts him at best as an average shot-stopper during his Liverpool career to date. I haven’t looked at numbers for the Championship but Mark Taylor’s numbers for Ward at Huddersfield were not encouraging. Playing Karius would be a bold move by Klopp given his limited exposure to English football thus far but Mignolet doesn’t provide much confidence either; personally, I would go with Karius.

Title talk

If I’ve learnt anything while sifting through the data for this preview, it’s that Manchester City should be strong favourites for the title this coming season.

Can Liverpool challenge them, while also competing in Europe? At present, I’d side with no, given that the depth issues of last season have yet to be addressed and question marks remain over the defense.

Liverpool’s other transfer saga involving Naby Keita could be a game-changer given that he could have a transformative impact on the team’s midfield but the likelihood of him signing appears to be receding by the day. Midfield depth is also potentially an issue unless Klopp is happy to rely on youth to cover midfield absentees over the season.

With potentially five teams in the Champions League group stages, progress to the latter rounds could have a strong bearing on league form post-Christmas. Six into four is likely the maths heading into the new season and Liverpool should be well in the mix.

Prediction: Third We’re gonna win the league

More thinking about goalkeepers

Following my previous article on the shot-stopping ability of goalkeepers, Mike Goodman posed an interesting question on Twitter:

This is certainly not an annoying question and I tend to think that such questions should be encouraged in the analytics community. Greater discussion should stimulate further work and enrich the community.

It certainly stands to reason and observation that goalkeepers can influence a striker’s options and decision-making when shooting but extracting robust signals of such a skill may prove problematic.

To try and answer this question, I built a quick model to calculate the likelihood that a non-blocked shot would end up on target. It’s essentially the same model as in my previous post but for expected shots on target rather than goals. The idea behind the model is that goalkeepers who are able to ‘force’ shots off-target would have a net positive rating when subtracting actual shots on target from the expected rate.

When I looked at the results, two of the standout names were Gianluigi Buffon and Jan Oblak; Buffon is a legend of the game and up there with the best of all time, while Oblak is certainly well regarded, so not a bad start.

However, after delving a little deeper, dragons started appearing in the analysis.

In theory, goalkeepers influencing shot-on-target rates would do so for shots closer to goal as they would narrow the amount of goal they can aim for via their positioning. However, I found the exact opposite. Further investigation of the model workings pointed to the problem – the model showed significant biases depending on whether the shot was inside or outside the area.

This is shown below where actual and expected shot-on-target totals for each goalkeeper are compared. For shots inside the box, the model tends to under-predict, while the opposite is the case for outside the box shots. These two biases cancelled each other out when looking at the full aggregated numbers (the slope was 0.998 for total shots-on-target vs the expected rate).

Act_vs_Ex_SoT.png

Actual vs expected shots-on-target totals for goalkeepers considered in the analysis. Dashed line is the 1:1 line, while the solid line is the line of best fit. Left-hand plot is for shots inside the box, while the right-hand plot is for shots outside the box. Data via Opta.
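The check described above can be sketched by fitting a line of best fit to actual versus expected totals separately for each shot group. The sketch uses ordinary least squares with an intercept, which is an assumption about how the lines in the figure were fitted, and the totals are made up for illustration.

```python
def best_fit_line(expected, actual):
    """Ordinary least-squares fit of actual = slope * expected + intercept."""
    n = len(expected)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(expected) / n
    mean_y = sum(actual) / n
    sxx = sum((x - mean_x) ** 2 for x in expected)
    if sxx == 0:
        raise ValueError("expected values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, actual))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical per-goalkeeper totals: (expected, actual) shots on target.
inside_box = [(40.0, 44.0), (60.0, 66.0), (80.0, 88.0)]
outside_box = [(20.0, 18.0), (30.0, 27.0), (40.0, 36.0)]
for label, rows in (("inside", inside_box), ("outside", outside_box)):
    slope, intercept = best_fit_line([e for e, _ in rows], [a for _, a in rows])
    print(f"{label}: slope={slope:.2f}, intercept={intercept:.2f}")
```

A slope above one for a group means the model under-predicts there, while a slope below one means it over-predicts; opposite biases can cancel in the aggregate and hide the problem.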

The upshot of this was that goalkeepers performing well-above expectation were doing so due to shots from longer-range being off-target when compared to the expected rates for the model. I suspect that the lack of information on defensive pressure is skewing the results and introducing bias into the model.

Now when we think of Buffon and Oblak performing well, we recall that they play behind probably the two best defenses in Europe at Juventus and Atlético respectively. Rather than ascribing the over-performance to goalkeeping skill, the effect is likely driven by the defensive pressure applied by their team-mates and issues with the model.

Exploring model performance is something I’ve written about previously and I would also highly recommend this recent article by Garry Gelade on assessing expected goals. While the above is an unsatisfactory ending for the analysis, it does illustrate the importance of testing model output prior to presenting results and testing whether such results match with our theoretical expectations.

Knowing what questions analytics can and cannot answer is pretty useful. Better luck next time hopefully.

 

Thinking about goalkeepers

Goalkeepers have typically been a tough nut to crack from a data analytics point-of-view. Randomness is an inherent aspect of goal-scoring, particularly over small samples, which makes drawing robust conclusions at best challenging and at worst foolhardy. Are we identifying skill in our ratings or are we just being sent down the proverbial garden path by variance?

To investigate some of these issues, I’ve built an expected save model that takes into account shot location and angle, whether the shot is a header or not and shot placement. So a shot taken centrally in the penalty area sailing into the top-corner will be unlikely to be saved, while a long-range shot straight at the keeper in the centre of goal should usually prove easier to handle.

The model is built using data from the past four seasons of the English, Spanish, German and Italian top leagues. Penalties are excluded from the analysis.

Similar models have been created by new Roma analytics guru Stephen McCarthy, Colin Trainor & Constantinos Chappas, and Thom Lawrence in the past.

The model thus provides an expected goal value for each shot that a goalkeeper faces, which we can then compare with the actual outcome. In a simpler world, we could easily identify shot-stopping skill by taking the difference between reality and expectation and then ranking goalkeepers by who has the best (or worst) difference.

However, this isn’t a simple world, so we run into problems like those illustrated in the graphic below.

Keeper_Funnel_Plot.png

Shot-stopper-rating (actual save percentage minus expected save percentage) versus number of shots faced. The central black line at approximately zero is the median, while the blue shaded region denotes the 90% confidence interval. Red markers are individual players. Data via Opta.

Each individual red marker is a player’s shot-stopper-rating over the past four seasons versus the number of shots they’ve faced. We see that for low shot totals, there is a huge range in the shot-stopper-rating but that the spread decreases as the number of shots increases, which is an example of regression to the mean.

To illustrate this further, I used a technique called boot-strapping to re-sample the data and generate confidence intervals for an average goalkeeper. This re-sampling is done 10,000 times to create a probability distribution built by randomly extracting groups of shots from the data-set and calculating actual and expected save percentages and then seeing how large the difference is. We see a strong narrowing of the blue uncertainty envelope up to around 50 shots, with further narrowing up to about 200 shots. After this, the narrowing is less steep.
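The re-sampling procedure described above can be sketched as follows. The sketch assumes each shot is stored as a (saved, expected save probability) pair and that groups of shots are drawn with replacement from the pooled data-set; the function name, input layout and percentile handling are illustrative choices.

```python
import random


def bootstrap_interval(shots, sample_size, n_resamples=10_000, level=0.90, seed=0):
    """Return (lower, upper) bounds, in percentage points, on actual minus
    expected save percentage for random groups of `sample_size` shots."""
    rng = random.Random(seed)  # fixed seed so results are repeatable
    differences = []
    for _ in range(n_resamples):
        sample = rng.choices(shots, k=sample_size)  # draw with replacement
        actual = sum(1 for saved, _ in sample if saved) / sample_size
        expected = sum(prob for _, prob in sample) / sample_size
        differences.append(100.0 * (actual - expected))
    differences.sort()
    lower_index = int((1.0 - level) / 2.0 * n_resamples)
    upper_index = min(int((1.0 + level) / 2.0 * n_resamples), n_resamples - 1)
    return differences[lower_index], differences[upper_index]


# Hypothetical pooled data: 70% of shots saved, each expected to be saved 70% of the time.
shots = [(True, 0.7)] * 70 + [(False, 0.7)] * 30
for size in (50, 200, 400):
    low, high = bootstrap_interval(shots, size, n_resamples=2000)
    print(f"{size} shots: {low:+.1f}% to {high:+.1f}%")
```

Running the loop over a range of sample sizes traces out an uncertainty envelope that narrows as the number of shots grows.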

What this effectively means is that there is a large band of possible outcomes that we can’t realistically separate from noise for an average goalkeeper. Over a season, a goalkeeper faces a little over 100 shots on target (119 on average according to the data used here). Thus, there is a huge opportunity for randomness to play a role and it is therefore of little surprise to find that there is little repeatability year-on-year for save percentage.

Things do start to settle down as shot totals increase though. After 200 shots, a goalkeeper would need to be performing more than ± 4% on the shot-stopper-rating scale to stand up to a reasonable level of statistical significance. After 400 shots, signal is easier to discern with a keeper needing to register more than ± 2% to emerge from the noise. That is not to say that we should be beholden to statistical significance but it is certainly worth bearing in mind in any assessment. An understanding of the uncertainty inherent in analytics can also be a powerful weapon to wield.

What we do see in the graphic above are many goalkeepers outside of the blue uncertainty envelope. This suggests that we might be able to identify keepers who are performing better or worse than the average goalkeeper, which would be pretty handy for player assessment purposes. Luckily, we can employ some more maths courtesy of Pete Owen who presented a binomial method to rank shot-stopping performance in a series of posts available here and here.

The table below lists the top-10 goalkeepers who have faced more than 200 shots over the past four seasons by the binomial ranking method.

GK-Top10.png

Top-10 goalkeepers as ranked by their binomial shot-stopper-ranking. Post-shot refers to expected save model that accounts for shot placement. Data via Opta.
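The details of the binomial method are in the linked posts and are not reproduced here. As a rough sketch of the general idea only, the function below assumes the simplest reading: treat every shot as having the same expected save probability (the keeper's average from the model) and ask how likely an average keeper would be to make no more saves than the keeper actually did. The function name and that simplification are assumptions, not the published method.

```python
from math import comb


def binomial_cdf(saves, shots, expected_save_rate):
    """P(X <= saves) for X ~ Binomial(shots, expected_save_rate).

    Suitable for shot totals up to roughly a thousand; beyond that the
    integer-to-float conversion of the binomial coefficient can overflow.
    """
    if not 0 <= saves <= shots:
        raise ValueError("saves must be between 0 and shots")
    if not 0.0 <= expected_save_rate <= 1.0:
        raise ValueError("expected_save_rate must be a probability")
    p = expected_save_rate
    return sum(
        comb(shots, k) * p**k * (1.0 - p) ** (shots - k)
        for k in range(saves + 1)
    )


# Hypothetical keeper: 185 saves from 250 shots, with an expected save rate of 70%.
rating = binomial_cdf(185, 250, 0.70)
print(f"rating: {100 * rating:.0f}%")
```

A value close to 100% suggests the keeper's save total would be hard for an average keeper to match by chance alone, while values near 50% are indistinguishable from average.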

I don’t know about you but that doesn’t look like too shabby a list of the top keepers. It may be that some of the names on the list have serious flaws in their game aside from shot-stopping but that will have to wait another day and another analysis.

So where does that leave us in terms of goalkeeping analytics? On one hand, we have noisy unrepeatable metrics from season-to-season. On the other, we appear to have some methods available to extract the signal from the noise over larger samples. Even then, we might be being fooled by aspects not included in the model or the simple fact that we expect to observe outliers.

Deficiencies in the model are likely our primary concern but these should be checked by a skilled eye and video clips, which should already be part of the review process (quit sniggering at the back there). Consequently, the risks ingrained in using an imperfect model can be at least partially mitigated.

Requiring 2-3 seasons of data to get a truly robust view on shot-stopping ability may be too long in some cases. However, perhaps we can afford to take a longer-term view for such an important position that doesn’t typically see too much turnover of personnel compared to other positions. The level of confidence you might want when short-listing might well depend on the situation at hand; perhaps an 80% chance of your target being an above average shot-stopper would be palatable in some cases?

All this is to say that I think you can assess goalkeepers by the saves they do or do not make. You just need to be willing to embrace a little uncertainty in the process.

On the anatomy of a counter-attack

Originally published on StatsBomb.

One of the most enduring aspects of football is the multitude of tactical and stylistic approaches that can be employed to be successful. Context is king in analytics and football as a whole, so the ability to identify and quantify these approaches is crucial for both opposition scouting and player transfer profiles.

At the OptaPro Forum this year, I looked at data from the past five Premier League seasons and used a sprinkling of maths to categorise shots into different types.

One such style I identified was ‘fast attacks from deep’, which were a distinct class of shots born of fast and direct possessions originating in the defensive zone. While these aren’t entirely synonymous with counter-attacks, there is likely a lot of overlap; the classical counter-attack is likely a subset of the deep fast-attacks identified in the data.

These fast-attacks from deep typically offer good scoring chances, with above average shot conversion (10.7%) due to the better shot locations afforded to them. They made up approximately 23% of the shots in my analysis.

So what do they look like?

To provide an overview of the key features of these attacks, I’ve averaged them together to get a broad picture of their progression up the pitch. I’ve presented this below and included a look at attacks from deep that involve more build-up play for comparison.

Fast-attacks_vs_Build-up

Comparison between fast-attacks from deep and attacks from deep that focus on slower build-up play. Vertical pitch position refers to the progression of an attack towards the opponent’s goal (vertical pitch position equal to 100). Both attack types start and end in similar locations on average but their progress with time is quite different. The shading is the standard deviation to give an idea of the spread inherent in the data. Data via Opta.
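An averaged picture like this can be sketched by interpolating each possession onto a common time grid and taking the mean and standard deviation at each time step. The sketch assumes each possession is a time-ordered list of (seconds elapsed, vertical pitch position) points and that a possession holds its final position once it ends; how attacks of different durations were actually aligned is not stated in the text.

```python
from statistics import mean, pstdev


def position_at(track, t):
    """Linearly interpolate vertical position at time t along one possession."""
    if t <= track[0][0]:
        return track[0][1]
    for (t0, y0), (t1, y1) in zip(track, track[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return y1
            return y0 + (y1 - y0) * (t - t0) / (t1 - t0)
    return track[-1][1]  # possession already over: hold the final position


def average_progression(tracks, times):
    """Return (time, mean position, standard deviation) for each time step."""
    rows = []
    for t in times:
        positions = [position_at(track, t) for track in tracks]
        rows.append((t, mean(positions), pstdev(positions)))
    return rows


# Two hypothetical possessions, each lasting ten seconds.
tracks = [[(0, 20), (10, 80)], [(0, 40), (10, 100)]]
profile = average_progression(tracks, times=range(0, 11, 5))
for t, avg, spread in profile:
    print(f"t={t}s mean={avg:.1f} sd={spread:.1f}")
```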

Fast-attacks from deep are characterised by an initial speedy progression towards goal within a team’s own half, followed by a steadier advance in the attacking half. This makes sense qualitatively as counter-attacks often see a quick transition in their early stages to properly establish the attacking opportunity. The attack can then be less frenetic as a team seeks to create the best opportunity possible from the situation.

Over the past five seasons, the standout teams as rated by shot volume and expected goals have been various incarnations of Arsenal, Manchester City, Chelsea and Liverpool.

The architects

Player-level metrics can be used to figure out who the crucial architects of a counter-attacking situation are. One method of examining this is how many yards a player’s passing progressed the ball during deep fast-attacking possessions.

Below I’ve listed the top 10 players from the 2016/17 season by this metric on a per 90 minute basis, alongside some other metrics for your delectation.

BallProgression_SummaryTable

Top players ranked by ball progression per 90 minutes (in yards) during fast-attacks from deep for the 2016/17 Premier League season. xGoals and Goals per 90 are for possessions that a player is involved in (known as xGChain in some parts). Players with more than 1800 minutes only. Data via Opta.
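A ball progression figure of this kind could be computed along the following lines. The sketch assumes Opta-style x-coordinates from 0 to 100, a nominal pitch length of 115 yards, and that only forward distance on each pass counts; these are all assumptions for illustration, since the exact definition is not given in the text.

```python
PITCH_LENGTH_YARDS = 115.0  # assumed nominal pitch length, not from the source


def ball_progression_per_90(passes, minutes, pitch_length_yards=PITCH_LENGTH_YARDS):
    """Forward yards gained by a player's passes, scaled to 90 minutes.

    `passes` is an iterable of (start_x, end_x) pairs in 0-100 coordinates for
    passes the player made during fast-attacks from deep.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    yards = sum(
        max(end_x - start_x, 0.0) * pitch_length_yards / 100.0
        for start_x, end_x in passes
    )
    return 90.0 * yards / minutes


# Hypothetical player: one 40-unit forward pass and one backward pass in 900 minutes.
progression = ball_progression_per_90([(20, 60), (50, 40)], minutes=900, pitch_length_yards=100.0)
print(f"{progression:.1f} yards per 90")
```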

While the focus was often on him kicking people rather than the ball, we see that Granit Xhaka stands alone in terms of ball progression, with Daley Blind a long way behind him in second place. Xhaka’s long-range passing skills are well known, so combining this with the most passes per 90 in such situations propels him to the top of the pile.

The graphic below illustrates Xhaka’s passing during deep fast-attacks, with his penchant for long passes spread all over the midfield zone evident. For comparison, I’ve included Eden Hazard’s passing map as someone who played many important passes that were limited in terms of ball progression as they were typically shorter or lateral passes in the final third.

Xhaka_Hazard_PassMaps.png

Passes played by Granit Xhaka and Eden Hazard during fast-attacks from deep during the 2016/17 season. Solid circles denote pass origin, while the arrows indicate the direction and end point of each pass. Data via Opta.

Evidently there is a link between position and ball progression, as players in deeper positions have greater scope to progress the ball as they have more grass in front of them. The likes of Coutinho, Özil and De Bruyne residing so high up the rankings is therefore impressive.

Coutinho_DeBruyne_PassMaps.png

Passes played by Philippe Coutinho and Kevin De Bruyne during fast-attacks from deep during the 2016/17 season. Data via Opta.

Coutinho’s passing chalkboard above illustrates his keen eye for a pass from midfield areas through opposition defensive lines, as does De Bruyne’s ability to find teammates inside the penalty area. De Bruyne’s contribution actually ranks highest in terms of xG per 90 for the past season.

The finishers

While ball progression through the defensive and midfield zones is important for these fast-attacks from deep, they still require the finishing touches in the final third. There are few more frustrating sights in football than watching a counter-attack be botched in its final moments.

The graphic below summarises the top players in this crucial aspect by examining their expected goal and assist outputs. Unsurprisingly, Kevin De Bruyne leads the way here and is powered by his exceptional creative passing.

xGandxA_DFA

Top-20 players rated by expected goals (xG) plus expected assists (xA) for fast-attacks from deep. Players with more than 1800 minutes only. Data via Opta.
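A ranking like this can be sketched in a few lines. The per-90 scaling is an assumption here (the caption does not say whether totals or rates were used), and the player names and numbers are made up.

```python
def top_contributors(players, min_minutes=1800, top_n=20):
    """Rank players by (xG + xA) per 90 minutes.

    `players` is an iterable of (name, minutes, xg, xa) tuples covering
    fast-attacks from deep only; players at or below `min_minutes` are dropped.
    """
    rated = [
        (name, 90.0 * (xg + xa) / minutes)
        for name, minutes, xg, xa in players
        if minutes > min_minutes
    ]
    return sorted(rated, key=lambda row: row[1], reverse=True)[:top_n]


# Hypothetical season totals.
sample_players = [
    ("Player A", 1900, 1.9, 1.9),
    ("Player B", 1700, 5.0, 5.0),  # excluded: too few minutes
    ("Player C", 3600, 2.0, 2.0),
]
ranking = top_contributors(sample_players)
for name, rate in ranking:
    print(f"{name}: {rate:.2f} xG+xA per 90")
```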

The list is dominated by players from the top-6 clubs, with Negredo the only interloper inside the top-10 ranking. Middlesbrough’s minimal attacking output left few scraps of solace for Negredo but at least he did get a few shots away in these high-value situations to alleviate the boredom.

Conclusion

The investigation of tactical and stylistic approaches carried out above merely scratches the surface of possibilities for opposition scouting and player profiling.

Being able to identify ‘successful’ attacking moves opens the door to examining ‘failed’ possessions, which would allow efficiency to be studied as well as defensive aspects. This is an area rich with promise that I’ll examine in the future, along with other styles identified within the same framework.