Using Pressure to Evaluate Centre Backs

Originally published on StatsBomb.

Analysing centre backs is a subject likely to provoke either a shrug or a wistful smile from an analytics practitioner. To varying degrees, there are numbers and metrics aplenty for other positions but, in public analytics at least, development for centre backs has been limited and a genuine track record of successful application is yet to be found. If centre back analysis is the holy grail of public football analytics, then the search thus far has been more Monty Python than Indiana Jones.

One of the major issues with centre back analysis is that positioning isn’t measured directly by on-ball event data and any casual football watcher can tell you that positioning is a huge part of the defensive art. Tracking data would be the ideal means to assess positioning but it comes at a high cost both computationally and technically, while having much smaller coverage in terms of leagues than simpler event data provision.

StatsBomb’s new pressure event data serves as a bridge between the traditional on-ball event data and the detailed information provided by tracking data, offering a new prism to investigate the style and effectiveness of centre backs. While it won’t provide information on what a defender is up to when they are not in the immediate vicinity of the ball, it does provide extra information on how they go about their task.

Starting at the basic counting level, centre backs averaged six pressure actions per ninety minutes in the Premier League last season. Tackles and interceptions clock in at 0.8 and 1.3 per 90 respectively, which immediately illustrates that pressure provides a great deal more information to chew on when analysing more ‘proactive’ defending. I’m classing clearances and blocking shots as ‘reactive’ given they mostly take place in the penalty area and are more directly driven by the opponent, while aerial duels are a slightly different aspect of defending that I’m going to ignore for the purposes of this analysis.

The figure below maps out where these defensive actions occur on the pitch and is split between left and right centre backs. Pressure actions typically occur in wider areas in the immediate vicinity of the penalty area, with another peak in pressure just inside the top corner of the 18-yard box. This suggests that centre backs don’t engage too high up the pitch in terms of pressure and are generally moving out towards the flanks to engage opponents in a dangerous position and either slow down an attack, cut down an attacker’s options or directly contest possession.

DefensiveMaps.png

Maps illustrating the location of pressure actions, interceptions and tackles by centre backs in the 2017/18 EPL season. Top row is for left-sided centre backs and the bottom row is for right-sided centre backs.

The location of pressure actions is somewhat similar to the picture for interceptions, although the shape of the latter is less well-defined and tends to extend higher up the pitch. Tackles peak in the same zone just outside the top corners of the penalty area but are also less spatially distinct. Tackles also peak next to the edge of the pitch, a feature that is less distinct in the pressure and interception maps.

Partners in Crime

The number of pressure actions a centre back accumulates during a match will be driven by their own personal inclinations and role within the team, as well as the peculiarities of a given match and season e.g. the tactics of their own team and the opposition or the number of dangerous opportunities their opponent creates. The figure below explores this by plotting each individual centre back’s pressure actions per ninety minutes against their team name. The team axis is sorted by the average number of pressure actions the centre backs on each team make over the season.

CB_Pressure_Actions_per90

Pressure actions per 90 minutes by centre backs in the 2017/18 EPL season (minimum 900 minutes played) by team. Team axis is sorted by the weighted average number of pressure actions the centre backs on each team make over the season.
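For anyone wanting to reproduce this kind of figure, the per-90 scaling and the minutes-weighted team average can be sketched as below. This is a minimal illustration rather than the exact code behind the plot: the field names and the sample players are hypothetical, and I am assuming the 900-minute threshold is applied before the team average is taken.

```python
from collections import defaultdict

MIN_MINUTES = 900  # threshold quoted in the figure caption


def per90(count, minutes):
    """Scale a season total to a rate per ninety minutes played."""
    return 90.0 * count / minutes


def team_weighted_per90(players):
    """Minutes-weighted average of per-90 pressure rates for each team.

    Weighting each player's rate by minutes played is equivalent to
    dividing the team's total actions by its total minutes and then
    scaling to ninety minutes.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # team -> [actions, minutes]
    for p in players:
        if p["minutes"] < MIN_MINUTES:
            continue  # assumption: low-minute players are excluded
        totals[p["team"]][0] += p["pressures"]
        totals[p["team"]][1] += p["minutes"]
    return {team: per90(a, m) for team, (a, m) in totals.items()}


# Hypothetical players, not real data.
players = [
    {"name": "CB A", "team": "Team X", "minutes": 2700, "pressures": 240},
    {"name": "CB B", "team": "Team X", "minutes": 1800, "pressures": 90},
    {"name": "CB C", "team": "Team X", "minutes": 450, "pressures": 60},
]
rates = team_weighted_per90(players)
```

Sorting the team axis is then just a matter of ordering the teams by the values in `rates`.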

At the top end of the scale, we see Arsenal and Chelsea, two teams that regularly played a back-three over the past season. Nacho Monreal and César Azpilicueta led the league in pressure actions per ninety minutes by a fair distance and it appears the additional cover provided by playing in a back-three and their natural instincts developed as full backs meant they were frequently putting their opponents under pressure. Manchester United top the list in terms of those predominantly playing with two centre backs, with all of their centre backs applying pressure at similar rates.

At the other end of the scale, Brighton and Leicester’s centre backs appear to favour staying at home in general. Both though are clear examples of there being an obvious split between the number of pressure actions by the primary centre backs on a team, with one being more aggressive while the other presumably holds their position and plays a covering role. This division of roles is perhaps most clearly demonstrated by Chelsea’s centre backs, with Azpilicueta and Antonio Rüdiger as the side centre backs being more proactive than their counterpart in the central defensive slot (Cahill or Christensen).

Liverpool’s improved defensive performance over the course of the season has been attributed to a range of factors, with the signing of Virgil van Dijk for a world-record fee for a defender garnering much of the credit. Intriguingly, his addition to the Liverpool backline has seemingly offered a significant contrast to the club’s incumbents, who all favoured a slightly greater than average number of pressure actions. Furthermore, van Dijk ranked towards the bottom of the list in terms of pressure actions for Southampton (4.5 per 90) as well, with his figure for Liverpool (3.7 per 90) representing a small absolute decline. As an aside, van Dijk brings a lot to the table in terms of heading skills, where he ranks highly for both total and successful aerial duels, so he is still an active presence in this aspect, while being a low-event player in others.

Centre backs are often referred to as a partnership and the above illustrates how defensive units often set up to complement each other’s skill sets and attempt to become greater than the sum of their parts.

The Thompson Triangle

Mark Thompson has led the way in terms of public analytics work on centre backs and has advocated for stylistic-driven evaluations as the primary means of analysis, which can then be built on with more traditional scouting. Pressure actions add another string to this particular bow and the figure below contrasts the three proactive defensive actions discussed earlier. Players in different segments of the triangle are biased towards certain actions, with those in the corners being more strongly inclined towards one action over the other two.

CB-TernaryGraph

Comparison of player tendencies in terms of ‘proactive’ defensive actions in the 2017/18 EPL season (minimum 900 minutes played). Apologies for triggering any flashbacks to chemistry classes.
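A ternary plot needs three shares per player that sum to one. The sketch below shows the simplest way to get there, using the league-average per-90 figures quoted earlier as input. It is an assumption on my part that raw shares are a fair stand-in here; a published figure may well rescale each action relative to league norms first, otherwise pressure would dominate for almost everyone.

```python
def ternary_shares(pressures, tackles, interceptions):
    """Return the share of each action type; the three shares sum to one."""
    total = pressures + tackles + interceptions
    if total == 0:
        raise ValueError("player has no proactive defensive actions")
    return (pressures / total, tackles / total, interceptions / total)


# League-average centre back per 90: 6 pressures, 0.8 tackles, 1.3 interceptions.
league_avg = ternary_shares(6.0, 0.8, 1.3)
```

On raw shares the average centre back sits at roughly 74% pressure, 10% tackles and 16% interceptions, which is why some form of rescaling is probably needed to spread players across the triangle.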

There is a lot to pore over in the figure, so I’ll focus on defenders who are most inclined towards pressure actions. One clear theme is that such centre backs frequently featured on the sides of a back-three. Ryan Shawcross is unusual in this aspect given he was generally the middle centre back in Stoke’s back-three, as well as the right centre back in a back four. Ciaran Clark at Newcastle and Kevin Long at Burnley are the only players who featured mostly as one of two centre backs, with their partner adopting a more reserved role.

The additional cover provided by a back-three system and the frequent requirement for the player on the flanks to pull wide and cover in behind their wing-back seemingly plays a large part in determining the profile of centre backs. This illustrates the importance of considering team setup in determining a defender’s profile and should feed into any recruitment process alongside their individual inclinations.

The analysis presented provides descriptive metrics and illustrations of the roles played by centre backs and is very much a first look at this new data. While we can’t gain definitive information on positioning without constant tracking of a player, the pressure event data provides a new lens to evaluate centre backs and significantly increases the number of defensive actions that can be evaluated further. Armed with such information, these profiles can be built upon with further data-driven analysis and combined with video and in-person scouting to build a well-rounded profile on the potential fit of a player.

Now all we need is a shrubbery.

Measuring counter-pressing

Originally published on StatsBomb.

The concept of pressing has existed in football for decades but its profile has been increasingly raised over recent years due to its successful application by numerous teams. Jürgen Klopp and Pep Guardiola in particular have received acclaim across their careers, with pressing seen as a vital component of their success. There are numerous other recent examples, such as the rise of Atlético Madrid, Tottenham Hotspur and Napoli under Diego Simeone, Mauricio Pochettino and Maurizio Sarri respectively.

Alongside this rise, public analytics has sought to quantify pressing through various metrics. Perhaps the most notable and widely-used example was ‘passes per defensive action’ or PPDA, which was established by Colin Trainor and first came to prominence on this very website. Anecdotally, PPDA found its way inside clubs and serves as an example of public analytics penetrating the private confines of football. Various metrics have also examined pressing through the prism of ‘possessions’, which Michael Caley has put to effective use on numerous occasions. Over the past year, I sought to illustrate pressing by quantifying a team’s ability to disrupt pass completion. While this was built on some relatively complex numerical modelling, it did provide what I thought was a nice visual representation of the effectiveness of a team’s pressing.

While the above metrics and others have their merits, they tend to ignore that pressing can take several forms and are biased towards the outcome, rather than the actual process. The one public example that side-steps many of these problems is the incredible work by the Anfield Index team through their manual collection of Liverpool’s pressing over the past few seasons but this has understandably been limited to one team.

Step forward the new pressure event data supplied by StatsBomb Services. A pressure event is triggered when a player is within a five-yard radius of an opponent in possession. The radius widens where errors by the opponent would prove more costly, with a maximum range of ten yards that is usually associated with goalkeepers under pressure. As well as logging the players involved in the pressure event and its location, the duration of the event is also collected.

The data provides an opportunity to explore pressing in greater detail than ever before. Different teams use different triggers to instigate their press, which can now be isolated and quantified. Efficiency and success can be separated from the pressing process in a number of ways at both the team and player-level. Such tools can be used in team-evaluation, opposition scouting and player recruitment.

One such application of the new data is to explore gegenpressing or counter-pressing, which is the process where a team presses the opposition immediately after losing possession. The initial aim of counter-pressing is to disrupt the opponent’s counter-attack, which can be a significant danger during the transition phase from attack-to-defence when a team is more defensively unstable. Ideally possession is quickly won back from the opponent, with some teams seeking to exploit such situations to attack quickly upon regaining possession. Five seconds is often used as a cut-off for the period where pressure on the opposition is most intensely applied during the counter-press.

The exciting new dimension provided by StatsBomb’s new pressure data is that the definition of counter-pressing you would find in a coaching manual can be directly drawn from the data i.e. a team applies pressure to their opponent following a change in possession. The frequency at which counter-pressing occurs can be quantified and then we can develop various metrics to examine the success or failure of this process. Furthermore, we can analyse counter-pressing at the player-level, which has been out-of-reach previously.

The figure below illustrates where on the pitch counter-pressing occurs based on data from 177 matches from the Premier League this past season. The pitch is split into six horizontal zones and is orientated so that the team out-of-possession is playing from left-to-right. The colouring on the pitch shows the proportion of open-play possessions starting in each zone where pressure is applied within five seconds of a new possession.

AvgCounterPressMap.png

The figure illustrates that pressure is most commonly applied on possessions starting in the midfield zones, with marginally more pressure in the opposition half. Possessions beginning in the highest zone up the pitch come under less pressure, which is likely driven by the lower density of players in this zone on average. Very few possessions actually begin in the deepest zone and a smaller proportion of them come under pressure quickly than those in midfield.

From a tactical perspective, pressing is generally reserved for areas outside of a team’s own defensive third. The exact boundary will vary but for the following analysis, I have only considered possessions starting higher up the pitch, as denoted by the counter-pressing line in the previous figure.

In the figures below, the proportion of possessions in the counter-pressing zones where pressure is applied within five seconds is referred to as the ‘counter-pressing fraction’. In the sample of matches from the Premier League this season, a little under half (0.47) of open-play possessions come under pressure from their opponent within five seconds. At the top of the counter-pressing rankings, we see Manchester City, Tottenham Hotspur and Liverpool, which is unsurprising given the reputations of their managers. At the bottom end of the scale, we find a collection of teams that have mostly been overseen by British managers who are better known for a deep defensive line.
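The counter-pressing fraction defined above can be sketched in a few lines. This is a simplified illustration and not the production calculation: the field names, the sample events and the position of the counter-pressing line (`PRESS_LINE_X`, on a 0-100 axis measured from the pressing team's own goal) are all hypothetical, and it ignores whether the possession has already ended before the pressure arrives.

```python
from bisect import bisect_left

WINDOW_SECONDS = 5.0  # cut-off quoted in the text
PRESS_LINE_X = 33.3   # hypothetical boundary, 0 = pressing team's own goal line


def is_counter_pressed(possession_start, pressure_times, window=WINDOW_SECONDS):
    """True if a pressure event occurs within `window` seconds of the start.

    `pressure_times` holds the pressing team's pressure event timestamps
    (in seconds) and must be sorted in ascending order.
    """
    i = bisect_left(pressure_times, possession_start)
    return i < len(pressure_times) and pressure_times[i] - possession_start <= window


def counter_pressing_fraction(possessions, pressure_times):
    """Share of eligible opponent possessions pressured within the window."""
    eligible = [
        p for p in possessions
        if p["open_play"] and p["start_x"] >= PRESS_LINE_X
    ]
    if not eligible:
        return 0.0
    pressed = sum(is_counter_pressed(p["start_time"], pressure_times) for p in eligible)
    return pressed / len(eligible)


# Hypothetical opponent possessions and pressure timestamps, not real data.
possessions = [
    {"start_time": 10.0, "start_x": 60.0, "open_play": True},   # pressed at 12.5s
    {"start_time": 40.0, "start_x": 50.0, "open_play": True},   # next pressure at 48s
    {"start_time": 70.0, "start_x": 20.0, "open_play": True},   # too deep, excluded
    {"start_time": 90.0, "start_x": 80.0, "open_play": False},  # set piece, excluded
]
pressure_times = [12.5, 48.0, 71.0]
fraction = counter_pressing_fraction(possessions, pressure_times)
```

Keeping the process (was pressure applied?) separate from the outcome (was the ball won back?) is the point of the exercise, so regains would be counted in a second pass over the pressed possessions.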

Team_CounterPressFraction

On the right-hand figure above, the strong association between counter-pressing and possession is illustrated, with the two showing a high correlation coefficient of 0.86 in this aggregated sample. Interpreting causality here is somewhat problematic given the likely circular relationship between the two parameters; teams that dominate possession may have more energy to press intensively, leading to a greater counter-pressing fraction, which would lead to them winning possession back more quickly, which will potentially increase their possession share and so on. The correlation is weaker for individual matches (0.36), which hints at some greater complexity and is something that can be returned to at a later date.

Perhaps the most interesting finding in the above figures is Burnley’s high counter-pressing fraction. The majority of analysis on Burnley has focused on their defensive structure within their own box and how that affects their defensive performance in relation to expected goals. The figure illustrates that Burnley employ a relatively aggressive counter-press, especially in relation to their possession share.

Examining Burnley’s counter-pressing game in more detail reveals that they counter-press 18 possessions per game, which is above average and only slightly lower than Manchester City. However, they only actually regain possession within five seconds 2.5 times per game, which falls short of what you might expect on average and falls below their counter-pressing peers. In terms of the ratio between their counter-pressing regains and total counter-pressing possessions, they sit 17th on 14%.
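As a quick sanity check on that ratio, the two per-game figures quoted above do reproduce it. The published 14% was presumably calculated from season totals rather than these rounded per-game numbers, so treat this as an approximation.

```python
counter_pressed_per_game = 18.0  # possessions counter-pressed per game
regains_per_game = 2.5           # regains within five seconds per game

regain_ratio = regains_per_game / counter_pressed_per_game
print(f"{regain_ratio:.0%}")
```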

Burnley’s counter-press is the fourth least-effective at limiting shots, with 13% of such possessions ending with them conceding a shot compared to the average rate of 10%. However, one thing in their favour is that these possessions are typically around the league average in terms of their length and speed of attack, which will allow Burnley to regain their vaunted defensive organisation prior to conceding such shots.

The more dominant discourse around pressing is as an attacking rather than defensive weapon, so narratives are often formed around teams that regularly win back the ball through pressing and use this to generate fast attacks e.g. Liverpool and Tottenham Hotspur. As a result, a team like Burnley who seemingly employ counter-pressing as a defence-first tactic to prevent counter-attacks and slow attacking progress may be overlooked.

Burnley’s manager, Sean Dyche, has typically been lumped in with the tactical stylings of the perennially-employed British managers who aren’t generally associated with pressing tactics. Dyche was reportedly most impressed by the pressing game employed by Guardiola’s Barcelona and he has seemingly implemented some of these ideas at Burnley. He has instilled an approach that combines counter-pressing and a low-block with numbers behind the ball, which is a neat trick to pull off; Diego Simeone and Atlético Madrid are perhaps the more apt comparison given such traits.

The above analysis illustrates the ability of StatsBomb’s new pressure event data to illuminate an important aspect of the modern game. Furthermore, it is able to do this in a manner that directly translates tactical principles, separating underlying process and outcome, which is a giant step forward for analytics. It also led to an analysis discussing the similarity between Guardiola’s legendary Barcelona team and Sean Dyche’s Burnley, which was probably unexpected to say the least.

This is just a taster of what is possible with StatsBomb’s new data. There’s more information in this presentation from the StatsBomb launch event and you can expect more analysis to appear over the summer and beyond.

Liverpool and I

While I probably watched Liverpool play before then, the first match I remember watching was on the 4th January 1994, when a nine-year-old me saw them come back from three goals down, which would become something of a theme. As is the wont of memory, the events that leave an indelible mark are the ones that stand out; my first actual football memory is Paul Bodin missing that penalty and not really understanding the scale of the disappointment. Turned out Wales’ last World Cup match was in 1958 when some no-mark seventeen-year-old called Edson Arantes do Nascimento scored his first international goal and knocked them out in the quarter-final.

Other early memories include one of God’s defining miracles, with a hat-trick notched up in four minutes and thirty-three seconds and learning about player aging curves when I realised that the slow yet classy guy in midfield used to be one of the most devastating and exciting wide-players the game had ever seen. My first match at Anfield was Ian Rush’s last there in a red shirt, while subsequent visits took in thrilling cup matches under the gaze of King Kenny and the best live sporting experience of my life as I bounced out of Anfield full of hope in April 2014.

While a league title has proved elusive during my supporting life, Europe has provided the greatest thrills, with tomorrow marking a third European Cup Final to go along with two finals in the junior competition. A European Cup Final once every eight years on average, with all three in the last fourteen years is pretty good going for a non-super club, albeit one with significant resources.

Real Madrid are clearly going to be a tough nut to crack, with Five Thirty Eight, Club Elo and Euro Club Index all ranking them as the second best team around. The same systems have Liverpool as the fifth, seventh and eleventh best, so underdogs with a good chance at glory overall.

According to Club Elo, the 2018 edition of Liverpool will be the best to contest a European Cup Final this century but on the flip-side, Real Madrid are stronger than either of the AC Milan teams that they faced in 2005 and 2007. Despite this, Liverpool are given a slightly better shot at taking home Old Big Ears than they had in 2005, as the gap between them and their opponents is narrower. The strides that the team made under Rafa between the 2005 and 2007 finals meant that the latter was contested by two equal teams.

Liverpool should evidently be approaching the final with optimism and further evidence of this is illustrated in the figure below, which shows the top-fifty teams by non-penalty expected goal difference in the past eight Premier League seasons. The current incarnation of Liverpool sit fifth and would usually be well-positioned to seriously challenge for the title. As the figure also illustrates, the scale of Manchester City’s dominance in their incredible season is well-warranted.

EPL-8-seasons-xGD.png

Top-fifty teams by non-penalty expected goal difference over the past eight Premier League seasons. Liverpool are highlighted in red, with the 17/18 season marked by the star marker. Data via Opta.

Liverpool’s stride forward under Klopp this past season has taken them beyond the 13/14 and 12/13 incarnations in terms of their underlying numbers. In retrospect, Rodgers’ first season was quietly impressive even if it wasn’t reflected in the table and it set the platform for the title challenge the following season.

Compared to those Luis Suárez-infused 12/13 and 13/14 seasons, the attacking output this past season is slightly ahead, with the team sitting sixth in the eight-season sample, which is their best over the period. Including penalties would take the 13/14 vintage beyond the latest incarnation, with the former scoring ten from the twelve (!) awarded, while 17/18 saw only three awarded (two scored).

The main difference between the current incarnation and those sides though is on the defensive end, with the team having the fifth best record in terms of non-penalty expected goals conceded this past season in the eight-year sample. The 13/14 season’s defence was the seventh worst by the club in this eight-year period and they lay thirty-fourth overall. These contrasting records equate to an eight non-penalty expected goal swing in their defensive performance.

While the exhilarating attacking intent of this Liverpool side is well-established, they are up against another attacking heavyweight; could it be that the defensive side of the game is the most decisive? The second half of this season is especially encouraging on this front, with improvements in both expected and actual performance. This period represents the sixth best half season over these eight seasons (out of a total of 320) and a three-goal swing compared to the first half of the season. This was slightly offset by a reduction in attacking output of two non-penalty expected goals but the overall story is one of improvement.

The loss of Coutinho, addition of van Dijk and employing a keeper with hands (edit 2203 26/05/18: well at least he gets his hands to it usually) between the sticks is a clear demarcation in Liverpool’s season and it is this period that has seen the thrilling run to the European Cup Final. The improved balance between attack and defence bodes well and I can’t wait to see what this team can do on the biggest stage in club football.

Allez, Allez, Allez!

What has happened to the Klopp press?

Originally published on StatsBomb.

When asked how his Liverpool team would play by the media horde who greeted his unveiling as manager two years ago, Jürgen Klopp responded:

We will conquer the ball, yeah, each fucking time! We will chase the ball, we will run more, fight more.

The above is a neat synopsis of Klopp’s preferred style of play, which focuses on pressing the opponent after losing the ball and quickly transitioning into attack. It is a tactic that he successfully deployed at Borussia Dortmund and one that he has employed regularly at Liverpool.

However, a noticeable aspect of the new season has been Liverpool seemingly employing a less feverish press. The Anfield Index Under Pressure Podcast led the way with their analysis, which was followed by The Times’ Jonathan Northcroft writing about it here and Sam McGuire for Football Whispers.

Liverpool’s pass disruption map for the past three seasons is shown below. Red signifies more disruption (greater pressure), while blue indicates less disruption (less pressure). In the 2015/16 and 2016/17 seasons, the team pressed effectively high up the pitch but that has slid so far this season to a significant extent. There is some disruption in the midfield zone but at a lower level than previously.

LFC_dxP.png

Liverpool’s zonal pass completion disruption across the past three seasons. Teams are attacking from left-to-right, so defensive zones are to the left of each plot. Data via Opta.

The above numbers are corroborated by the length of Liverpool’s opponent possessions increasing by approximately 10% this season compared to the rest of Klopp’s reign. Their opponents so far this season have an average possession length of 6.5 seconds, which is lower than the league average but contrasts strongly with the previous figures that have been among the shortest in the league.

Examining their pass disruption figures game-by-game reveals further the reduced pressure that Liverpool are putting on their opponents. During 2015/16 and 2016/17, their average disruption value was around -2.5%, which they’ve only surpassed once in Premier League matches this season, with the average standing at -0.66%.

LFC-xP-17-18

Liverpool’s game-by-game pass completion disruption for 2017/18 English Premier League season. Figures are calculated for zones with Opta x-coordinates greater than 40. Data via Opta.

The Leicester match is the major outlier and examining their passing further indicates that the high pass disruption was a consequence of them attempting a lot of failed long passes. This is a common response to Liverpool’s press as teams go long to bypass the pressure.

Liverpool’s diminished press is likely a deliberate tactic that is driven by the added Champions League matches the team has faced so far this season. The slightly worrisome aspect of this tactical shift is that Liverpool’s defensive numbers have taken a hit.

In open-play, Liverpool’s expected goals against figure is 0.81 per game, which is up from 0.62 last season. Furthermore, their expected goals per shot has risen to 0.13 from 0.11 in open-play. To add further defensive misery, Liverpool’s set-piece woes (specifically corners) have actually got worse this season. The team currently sit eleventh in expected goals conceded this season, which is a fall from fifth last year.

This decline in underlying defensive performance has at least been offset by a rise on the attacking side of 0.4 expected goals per game to 1.78 this season. Overall, their expected goal difference of 0.79 this season almost exactly matches the 0.81 of last season.

Liverpool’s major problem last season was their soft under-belly but they were often able to count on their pressing game denying their opponents opportunities to exploit it. What seems to be happening this season is that the deficiencies at the back are being exploited more with the reduced pressure ahead of them.

With the season still being relatively fresh, the alarm bells shouldn’t be ringing too loudly but there is at least cause for concern in the numbers. As ever, the delicate balancing act between maximising the side’s attacking output while protecting the defense is the key.

Klopp will be searching for home-grown solutions in the near-term and a return to the familiar pressing game may be one avenue. Given the competition at the top of the table, he’ll need to find a solution sooner rather than later, lest they be left behind.

Under pressure

Originally published on StatsBomb.

Models that attempt to measure passing ability have been around for several years, with Devin Pleuler’s 2012 study being the first that I recall seeing publicly. More models have sprung up in the past year, including efforts by Paul Riley, Neil Charles and StatsBomb Services. These models aim to calculate the probability of a pass being completed using various inputs about the start and end location of the pass, the length of the pass, the angle of it, as well as whether it is played with the head or foot.
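To make those inputs concrete, here is a rough sketch of how the features for a single pass might be derived from event coordinates. It assumes Opta-style coordinates running from 0 to 100 on both axes and a nominal 105 m by 68 m pitch; the pitch size, function name and feature names are my own illustrative choices rather than anything taken from the models mentioned above.

```python
import math

PITCH_LENGTH_M = 105.0  # assumed nominal pitch length
PITCH_WIDTH_M = 68.0    # assumed nominal pitch width


def pass_features(x0, y0, x1, y1, headed=False):
    """Build model inputs for one pass from 0-100 start and end coordinates."""
    dx = (x1 - x0) / 100.0 * PITCH_LENGTH_M  # metres towards the opposition goal
    dy = (y1 - y0) / 100.0 * PITCH_WIDTH_M   # metres across the pitch
    return {
        "start_x": x0,
        "start_y": y0,
        "end_x": x1,
        "end_y": y1,
        "length_m": math.hypot(dx, dy),
        "angle_rad": math.atan2(dy, dx),  # 0 means straight towards goal
        "headed": int(headed),
    }


# A straight forward pass from the centre spot covering a fifth of the pitch.
features = pass_features(50.0, 50.0, 70.0, 50.0)
```

Converting to metres before taking the length and angle matters because the two axes are scaled differently; a 10-unit pass up the pitch is longer than a 10-unit pass across it.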

Most applications have analysed the outputs from such models from a player passing skill perspective but they can also be applied at the team level to glean insights. Passing is the primary means of constructing attacks, so perhaps examining how a defense disrupts passing could prove enlightening?

In the figure below, I’ve used a pass probability model (see end of post for details and code) to estimate the difficulty in completing a pass and then compared this to the actual passing outcomes at a team-level. This provides a global measure of how much a team disrupts their opponents’ passing. We see the Premier League’s main pressing teams with the greatest disruption, through to the barely corporeal form represented by Sunderland.

Team_PCDgraph

Pass completion disruption for the 2016/17 English Premier League season. Disruption is defined as actual pass completion percentage minus expected pass completion percentage. Negative values mean opponents complete fewer passes than expected. Data via Opta.
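The definition in the caption translates directly into code. The sketch below takes the passes a team's opponents attempted, each paired with the completion probability from a pass model, and returns the disruption in percentage points. The sample passes and probabilities are made up for illustration.

```python
def pass_completion_disruption(passes):
    """Actual minus expected pass completion, in percentage points.

    `passes` is a list of (completed, expected_probability) pairs for the
    passes attempted against the team being evaluated.
    """
    if not passes:
        raise ValueError("no passes supplied")
    n = len(passes)
    actual = sum(1 for completed, _ in passes if completed) / n
    expected = sum(prob for _, prob in passes) / n
    return 100.0 * (actual - expected)


# Hypothetical opponent passes: two of four completed, 75% expected.
opponent_passes = [(True, 0.9), (False, 0.8), (True, 0.7), (False, 0.6)]
disruption = pass_completion_disruption(opponent_passes)
```

Here the opponents complete 50% of passes that the model expects to succeed 75% of the time, giving a disruption of about -25 percentage points, far more extreme than anything seen over a real season.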

The next step is to break this down by pitch location, which is shown in the figure below where the pitch has been broken into five bands with pass completion disruption calculated for each. The teams are ordered from most-to-least disruptive.

PressureMap

Zonal pass completion disruption for 2016/17 English Premier League season. Teams are attacking from left-to-right, so defensive zones are to the left of each plot. Data via Opta.
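Breaking the same measure down by zone only requires assigning each pass to a band first. In this sketch I assume five equal-width bands along a 0-100 x-axis, oriented so that band 0 is the defending team's deepest zone, and I bin passes by their start location; the actual band edges and binning choice behind the figure may differ.

```python
from collections import defaultdict

N_BANDS = 5  # number of bands described in the text


def band_index(x, n_bands=N_BANDS):
    """Map a 0-100 x-coordinate to a band index, 0 being the deepest band."""
    return min(int(x / (100.0 / n_bands)), n_bands - 1)


def zonal_disruption(passes):
    """Disruption (percentage points) per band.

    `passes` is an iterable of (start_x, completed, expected_probability).
    """
    sums = defaultdict(lambda: [0, 0.0, 0])  # band -> [completed, expected, n]
    for x, completed, prob in passes:
        s = sums[band_index(x)]
        s[0] += int(completed)
        s[1] += prob
        s[2] += 1
    return {b: 100.0 * (c - e) / n for b, (c, e, n) in sorted(sums.items())}


# Hypothetical opponent passes, not real data.
zone_passes = [(10.0, False, 0.8), (10.0, True, 0.6), (90.0, True, 0.5)]
by_band = zonal_disruption(zone_passes)
```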

We see Manchester City and Spurs disrupt their opponents’ passing across the entire pitch, with Spurs’ disruption skewed somewhat higher. Liverpool dominate in the midfield zones but offer little disruption in their deepest-defensive zone, suggesting that once a team breaks through the press, they have time and/or space close to goal; a familiar refrain when discussing Liverpool’s defense.

Chelsea offer an interesting contrast with the high-pressing teams, with their disruption gradually increasing as their opponents inch closer to their goal. What stands out is their defensive zone sees the greatest disruption (-2.8%), which illustrates that they are highly disruptive where it most counts.

The antithesis of Chelsea is Bournemouth who put together an average amount of disruption higher up the pitch but are extremely accommodating in their defensive zones (+4.5% in their deepest-defensive zone). Sunderland place their opponents under limited pressure in all zones aside from their deepest-defensive zone where they are fairly average in terms of disruption.

The above offers a glimpse of the defensive processes and outcomes at the team level, which can be used to improve performance or identify weaknesses to exploit. Folding such approaches into pre-game routines could quickly and easily supplement video scouting.

Appendix: Pass probability model

For this post, I built two different passing models; the first used Logistic Regression and the second used Random Forests. The code for each model is available here and here.

Below is a comparison between the two, which compares expected success rates with actual success rates on out-of-sample test data.

Actual_vs_Expected

Actual versus expected pass success for two different models. Data via Opta.

The Random Forest method performs better than the Logistic Regression model, particularly for low probability passes. This result is confirmed when examining the Receiver Operating Characteristic (ROC) curves in the figure below. The Area Under the Curve (AUC) for the Random Forest model is 87%, while the Logistic Regression AUC is 81%.

xPass_AUC

Receiver Operating Characteristic (ROC) curves for the two different passing models. Data via Opta.

Given the better performance of the Random Forest model, I used this in the analysis in the main article.
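As a reminder of what the AUC figures above measure, here is a minimal sketch in plain Python. It is not the code behind the figure; the function name and inputs are assumptions for illustration. It treats AUC as the probability that a randomly chosen completed pass is given a higher predicted success probability than a randomly chosen failed pass, with ties counted as half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a completed pass, 0 for a failed pass (hypothetical encoding).
    scores: the model's predicted completion probability for each pass.
    """
    completed = [s for y, s in zip(labels, scores) if y == 1]
    failed = [s for y, s in zip(labels, scores) if y == 0]
    if not completed or not failed:
        raise ValueError("need at least one completed and one failed pass")

    wins = 0.0
    for c in completed:
        for f in failed:
            if c > f:
                wins += 1.0   # model ranked the completed pass higher
            elif c == f:
                wins += 0.5   # ties count as half a win
    return wins / (len(completed) * len(failed))
```

The pairwise loop is quadratic in the number of passes, so it is only suitable for small samples; a rank-based calculation gives the same number far more quickly on a full season of data.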

Liverpool 2017/18 season preview

Originally published on StatsBomb.

Liverpool enter the season with aspirations of challenging for the title after an at times hugely promising and exciting first full season under Jürgen Klopp. The prospect of European adventures returning on Tuesday or Wednesday nights is tantalizingly close, providing they negotiate their Champions League qualifying round.

The story so far

Liverpool’s tally of 76 points last season was their joint-third best tally over the last decade and only their second top-four finish since the Benitez years. In fact, after a run of four top-four finishes, Liverpool haven’t registered back-to-back Champions League qualifications since Rafa left and have on average finished sixth during that time with 65 points on the board.

With the above in mind, it’s tempting to view a season of consolidation as the priority for the coming season, alongside beginning to re-establish the team as a European force. Liverpool’s underlying performance last season is encouraging, with their goal return reasonably in-line with expectation and their expected goal difference placing them well in contention for a title push.

Drilling further into their expected goal numbers reveals a team that experienced fluctuating underlying performance over the course of the season, with a significant decline once 2017 was rung in. The graphic below illustrates this alongside a longer-term outlook encompassing the past five seasons.

LFC_xG_TimeLine.png

Rolling 19-game average expected goal timeline over the past five seasons. Grey vertical lines denote new season.
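For anyone wanting to reproduce this kind of timeline, a rolling average is straightforward to compute. The sketch below is an assumption about the mechanics rather than the code used for the figure: it takes a list of per-match values (for example expected goal difference) in chronological order and returns one average for each complete 19-game window.

```python
def rolling_mean(values, window=19):
    """Mean of each consecutive `window`-length run of matches.

    The first output corresponds to matches 1..window, the second to
    matches 2..window+1, and so on. Fewer than `window` matches gives [].
    """
    if window <= 0:
        raise ValueError("window must be positive")

    averages = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the match that has just fallen out of the window.
            running_total -= values[i - window]
        if i >= window - 1:
            averages.append(running_total / window)
    return averages
```

A 19-game window corresponds to half a Premier League season, which smooths out single-match noise while still showing swings within a campaign.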

The heights of 2016/17 are close to those of the Suárez-powered team under Rodgers, while the low-point is more in-line with Klopp’s early tenure at the club. The past season thus illustrated that the team was capable of title-contending performances at times but also switched to a team competing for the fourth-place trophy at best.

Upping the pace

Closer examination of the downturn in performance using my ‘team strategy analysis’ shows a drying up of shot generation via high-quality chances born of fast-paced attacks from deep and after midfield-transitions.

Sadio Mané was evidently missed due to AFCON duties and injury over the latter half of the season and this is borne out by the numbers. According to my model, he was second best in the EPL (0.11 per 90) in terms of xG-contribution (the sum of expected goals and assists) from fast-paced attacks following a midfield-transition. For fast-attacks from deep, he ranked sixth for xG-contribution (0.12 per 90).

Thankfully, Mohamed Salah, the club’s major acquisition so far, brings complementary qualities to the table and adds much-needed depth to the wide-forward ranks. James Yorke of this parish has already praised the signing earlier this summer and my only addition is that Salah showed up quite highly for xG-contribution (0.07 per 90, ranking eleventh in Serie A) for fast-paced attacks following a midfield-transition. The addition of Salah improves what was already a healthy front-line attack.

Defensive issues

According to the Objective Football website run by Benjamin Pugsley, Liverpool conceded just 8.1 non-penalty shots per game, ranking second over the past eight seasons behind a Pep-infused Manchester City last year. Shots-on-target conceded (3.0 per game) told a similar story, ranking joint-sixth over the same period. However, they combined these extraordinary shot-suppression numbers with the highest expected goals per shot in the league (0.11), which is the worst value I have over the past five seasons. When Liverpool conceded shots, they were of high quality, which ultimately saw them sit fifth in terms of expected goals against last season.

Klopp’s tactical system deserves credit for melding a highly exciting attack with strong defensive aspects in terms of shot-suppression. The optimistic take here is that tweaks and a greater familiarity with his counter-pressing tactics could bring about improvements in shot quality conceded, thereby seeing better defensive numbers. It’s worth noting the period during November and December 2016 when their expected goals against was the lowest it has been consistently over a 19-game span in the past five seasons, so the current squad is capable of sustained excellence in this realm.

The pursuit of Virgil van Dijk does suggest that the club are aiming to recruit a new starting centre-back. That saga remains running at the time of writing as the world waits to find out just how costly a single ice cream can be. Centre-back depth is an issue that needs to be rectified; the fact that Lucas Leiva made six appearances as a centre-back last term is all the evidence needed for that statement.

The other aspect of Liverpool’s defense that could improve is in the goalkeeping stakes. From a pure shot-stopping perspective, Karius has the best pedigree; in my goalkeeper shot-stopping analysis, Karius came 31st across the data-set with a rating of 91%, which is a pretty decent indication that he is an above-average shot-stopper. Mignolet fared much worse with a ranking of just 25%, which puts him at best as an average shot-stopper during his Liverpool career to date. I haven’t looked at numbers for the Championship but Mark Taylor’s numbers for Ward at Huddersfield were not encouraging. Playing Karius would be a bold move by Klopp given his limited exposure to English football thus far but Mignolet doesn’t provide much confidence either; personally, I would go with Karius.

Title talk

If I’ve learnt anything while sifting through the data for this preview, it’s that Manchester City should be strong favourites for the title this coming season.

Can Liverpool challenge them, while also competing in Europe? At present, I’d side with no given the depth issues of last season have yet to be addressed and the remaining question marks in terms of the defense.

Liverpool’s other transfer saga involving Naby Keita could be a game-changer given that he could have a transformative impact on the team’s midfield but the likelihood of him signing appears to be receding by the day. Midfield depth is also potentially an issue unless Klopp is happy to rely on youth to cover midfield absentees over the season.

With potentially five teams in the Champions League group stages, progress to the latter rounds could have a strong bearing on league form post-Christmas. Six into four is likely the maths heading into the new season and Liverpool should be well in the mix.

Prediction: Third We’re gonna win the league

Thinking about goalkeepers

Goalkeepers have typically been a tough nut to crack from a data analytics point-of-view. Randomness is an inherent aspect of goal-scoring, particularly over small samples, which makes drawing robust conclusions at best challenging and at worst foolhardy. Are we identifying skill in our ratings or are we just being sent down the proverbial garden path by variance?

To investigate some of these issues, I’ve built an expected save model that takes into account shot location and angle, whether the shot is a header or not and shot placement. So a shot taken centrally in the penalty area sailing into the top-corner will be unlikely to be saved, while a long-range shot straight at the keeper in the centre of goal should usually prove easier to handle.

The model is built using data from the past four seasons of the English, Spanish, German and Italian top leagues. Penalties are excluded from the analysis.

Similar models have been created by new Roma analytics guru Stephen McCarthy, Colin Trainor & Constantinos Chappas, and Thom Lawrence in the past.

The model thus provides an expected goal value for each shot that a goalkeeper faces, which we can then compare with the actual outcome. In a simpler world, we could easily identify shot-stopping skill by taking the difference between reality and expectation and then ranking goalkeepers by who has the best (or worst) difference.

However, this isn’t a simple world, so we run into problems like those illustrated in the graphic below.

Keeper_Funnel_Plot.png

Shot-stopper-rating (actual save percentage minus expected save percentage) versus number of shots faced. The central black line at approximately zero is the median, while the blue shaded region denotes the 90% confidence interval. Red markers are individual players. Data via Opta.

Each individual red marker is a player’s shot-stopper rating over the past four seasons versus the number of shots they’ve faced. We see that for low shot totals, there is a huge range in the shot-stopper-rating but that the spread decreases as the number of shots increases, which is an example of regression to the mean.

To illustrate this further, I used a technique called boot-strapping to re-sample the data and generate confidence intervals for an average goalkeeper. This re-sampling is done 10,000 times to create a probability distribution built by randomly extracting groups of shots from the data-set and calculating actual and expected save percentages and then seeing how large the difference is. We see a strong narrowing of the blue uncertainty envelope up to around 50 shots, with further narrowing up to about 200 shots. After this, the narrowing is less steep.
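The re-sampling idea can be sketched in a few lines of Python. This is an illustration rather than the code behind the figure: the function name, its arguments and the way the percentiles are read off are all assumptions. It takes per-shot outcomes (1 for a save, 0 for a goal) and per-shot expected save probabilities as plain lists, repeatedly draws groups of shots with replacement, and records actual minus expected save percentage for each group.

```python
import random


def bootstrap_envelope(saved, expected_save_prob, n_shots,
                       n_resamples=10_000, level=0.90, seed=0):
    """Lower and upper bounds of the shot-stopper rating for an 'average'
    keeper facing n_shots, estimated by re-sampling the pooled shot data.

    saved: list of 1 (saved) or 0 (goal), one entry per shot.
    expected_save_prob: model save probability for the same shots.
    """
    if len(saved) != len(expected_save_prob) or not saved:
        raise ValueError("inputs must be non-empty and the same length")

    rng = random.Random(seed)
    shot_ids = range(len(saved))
    differences = []
    for _ in range(n_resamples):
        # Draw a random group of shots, with replacement.
        sample = rng.choices(shot_ids, k=n_shots)
        actual = sum(saved[i] for i in sample) / n_shots
        expected = sum(expected_save_prob[i] for i in sample) / n_shots
        differences.append(actual - expected)

    differences.sort()
    lower = differences[int((1 - level) / 2 * (n_resamples - 1))]
    upper = differences[int((1 + level) / 2 * (n_resamples - 1))]
    return lower, upper
```

Calling this for a range of `n_shots` values (say 10, 50, 200, 400) and plotting the bounds against shot count would trace out an envelope of the kind shaded in the graphic, which should narrow as the shot count grows.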

What this effectively means is that there is a large band of possible outcomes that we can’t realistically separate from noise for an average goalkeeper. Over a season, a goalkeeper faces a little over 100 shots on target (119 on average according to the data used here). Thus, there is a huge opportunity for randomness to play a role and it is therefore of little surprise to find that there is little repeatability year-on-year for save percentage.

Things do start to settle down as shot totals increase though. After 200 shots, a goalkeeper would need to be performing more than ± 4% on the shot-stopper-rating scale to stand up to a reasonable level of statistical significance. After 400 shots, signal is easier to discern with a keeper needing to register more than ± 2% to emerge from the noise. That is not to say that we should be beholden to statistical significance but it is certainly worth bearing in mind in any assessment. An understanding of the uncertainty inherent in analytics can also be a powerful weapon to wield.

What we do see in the graphic above are many goalkeepers outside of the blue uncertainty envelope. This suggests that we might be able to identify keepers who are performing better or worse than the average goalkeeper, which would be pretty handy for player assessment purposes. Luckily, we can employ some more maths courtesy of Pete Owen who presented a binomial method to rank shot-stopping performance in a series of posts available here and here.
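To give a flavour of what a binomial approach involves, here is a simplified sketch. It should not be read as Pete Owen’s exact formulation: it assumes every shot a keeper faced carries the same save probability (the keeper’s average expected save rate), whereas in reality each shot has its own difficulty. The function names are hypothetical. It asks how likely an average goalkeeper would be to make at least as many saves as the keeper in question.

```python
from math import exp, lgamma, log, log1p


def binomial_pmf(k, n, p):
    """P(exactly k saves from n shots) when each shot is saved with
    probability p. Computed in log space so large n does not overflow."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log1p(-p))


def prob_at_least(saves, shots, expected_save_rate):
    """P(an average keeper makes >= `saves` saves from `shots` shots)."""
    if not 0 < expected_save_rate < 1:
        raise ValueError("expected_save_rate must be strictly between 0 and 1")
    if not 0 <= saves <= shots:
        raise ValueError("saves must be between 0 and shots")
    return sum(binomial_pmf(k, shots, expected_save_rate)
               for k in range(saves, shots + 1))
```

A small probability suggests that the keeper’s save tally would be unusual for an average shot-stopper facing the same number of shots; one minus this value can be read as a rough confidence that the keeper is above average, under the simplifying assumption stated above.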

The table below lists the top-10 goalkeepers who have faced more than 200 shots over the past four seasons by the binomial ranking method.

GK-Top10.png

Top-10 goalkeepers as ranked by their binomial shot-stopper-ranking. Post-shot refers to expected save model that accounts for shot placement. Data via Opta.

I don’t know about you but that doesn’t look like too shabby a list of the top keepers. It may be that some of the names on the list have serious flaws in their game aside from shot-stopping but that will have to wait for another day and another analysis.

So where does that leave us in terms of goalkeeping analytics? On one hand, we have noisy unrepeatable metrics from season-to-season. On the other, we appear to have some methods available to extract the signal from the noise over larger samples. Even then, we might be being fooled by aspects not included in the model or the simple fact that we expect to observe outliers.

Deficiencies in the model are likely our primary concern but these should be checked by a skilled eye and video clips, which should already be part of the review process (quit sniggering at the back there). Consequently, the risks ingrained in using an imperfect model can be at least partially mitigated against.

Requiring 2-3 seasons of data to get a truly robust view on shot-stopping ability may be too long in some cases. However, perhaps we can afford to take a longer-term view for such an important position that doesn’t typically see too much turnover of personnel compared to other positions. The level of confidence you might want when short-listing might well depend on the situation at hand; perhaps an 80% chance of your target being an above average shot-stopper would be palatable in some cases?

All this is to say that I think you can assess goalkeepers by the saves they do or do not make. You just need to be willing to embrace a little uncertainty in the process.

On the anatomy of a counter-attack

Originally published on StatsBomb.

One of the most enduring aspects of football is the multitude of tactical and stylistic approaches that can be employed to be successful. Context is king in analytics and football as a whole, so the ability to identify and quantify these approaches is crucial for both opposition scouting and player transfer profiles.

At the OptaPro Forum this year, I looked at data from the past five Premier League seasons and used a sprinkling of maths to categorise shots into different types.

One such style I identified was ‘fast attacks from deep’, which were a distinct class of shots born of fast and direct possessions originating in the defensive zone. While these aren’t entirely synonymous with counter-attacks, there is likely a lot of overlap; the classical counter-attack is likely a subset of the deep fast-attacks identified in the data.

These fast-attacks from deep typically offer good scoring chances, with above average shot conversion (10.7%) due to the better shot locations afforded to them. They made up approximately 23% of the shots in my analysis.

So what do they look like?

To provide an overview of the key features of these attacks, I’ve averaged them together to get a broad picture of their progression up the pitch. I’ve presented this below and included a look at attacks from deep that involve more build-up play for comparison.

Fast-attacks_vs_Build-up

Comparison between fast-attacks from deep and attacks from deep that focus on slower build-up play. Vertical pitch position refers to the progression of an attack towards the opponent’s goal (vertical pitch position equal to 100). Both attack types start and end in similar locations on average but their progress with time is quite different. The shading is the standard deviation to give an idea of the spread inherent in the data. Data via Opta.

Fast-attacks from deep are characterised by an initial speedy progression towards goal within a team’s own half, followed by a steadier advance in the attacking half. This makes sense qualitatively as counter-attacks often see a quick transition in their early stages to properly establish the attacking opportunity. The attack can then be less frenetic as a team seeks to create the best opportunity possible from the situation.

Over the past five seasons, the stand out teams as rated by shot volume and expected goals have been various incarnations of Arsenal, Manchester City, Chelsea and Liverpool.

The architects

Player-level metrics can be used to figure out who the crucial architects of a counter-attacking situation are. One method of examining this is how many yards a player’s passing progressed the ball during deep fast-attacking possessions.
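A ball progression metric of this sort might be computed along the following lines. The details here are assumptions made for illustration and are not taken from the analysis itself: passes are described only by their start and end positions along the length of the pitch (in yards, increasing towards the opponent’s goal), only completed passes count, and backward passes contribute zero rather than a negative amount.

```python
from dataclasses import dataclass


@dataclass
class Pass:
    player: str
    start_x: float   # yards from own goal line at pass origin
    end_x: float     # yards from own goal line at pass destination
    completed: bool


def progression_per_90(passes, minutes_played):
    """Yards of forward ball progression per 90 minutes for each player.

    passes: iterable of Pass, already filtered to the possessions of interest.
    minutes_played: dict mapping player name to minutes on the pitch.
    """
    total_yards = {}
    for p in passes:
        if not p.completed:
            continue
        gain = max(0.0, p.end_x - p.start_x)  # ignore backward passes
        total_yards[p.player] = total_yards.get(p.player, 0.0) + gain

    return {
        player: 90.0 * yards / minutes_played[player]
        for player, yards in total_yards.items()
        if minutes_played.get(player, 0) > 0
    }
```

Whether backward passes should subtract from the total, or whether progression should be measured as distance gained towards goal rather than along the pitch, are design choices that would change the rankings at the margins.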

Below I’ve listed the top 10 players from the 2016/17 season by this metric on a per 90 minute basis, alongside some other metrics for your delectation.

BallProgression_SummaryTable

Top players ranked by ball progression per 90 minutes (in yards) during fast-attacks from deep for the 2016/17 Premier League season. xGoals and Goals per 90 are for possessions that a player is involved in (known as xGChain in some parts). Players with more than 1800 minutes only. Data via Opta.

While the focus was often on him kicking people rather than the ball, we see that Granit Xhaka stands alone in terms of ball progression, with Daley Blind a long way behind him in second place. Xhaka’s long-range passing skills are well known, so combining this with the most passes per 90 in such situations propels him to the top of the pile.

The graphic below illustrates Xhaka’s passing during deep fast-attacks, with his penchant for long passes spread all over the midfield zone evident. For comparison, I’ve included Eden Hazard’s passing map as someone who played many important passes that were limited in terms of ball progression as they were typically shorter or lateral passes in the final third.

Xhaka_Hazard_PassMaps.png

Passes played by Granit Xhaka and Eden Hazard during fast-attacks from deep during the 2016/17 season. Solid circles denote pass origin, while the arrows indicate the direction and end point of each pass. Data via Opta.

Evidently there is a link between position and ball progression, as players in deeper positions have greater scope to progress the ball as they have more grass in front of them. The likes of Coutinho, Özil and De Bruyne residing so high up the rankings is therefore impressive.

Coutinho_DeBruyne_PassMaps.png

Passes played by Philippe Coutinho and Kevin De Bruyne during fast-attacks from deep during the 2016/17 season. Data via Opta.

Coutinho’s passing chalkboard above illustrates his keen eye for a pass from midfield areas through opposition defensive lines, as does De Bruyne’s ability to find teammates inside the penalty area. De Bruyne’s contribution actually ranks highest in terms of xG per 90 for the past season.

The finishers

While ball progression through the defensive and midfield zones is important for these fast-attacks from deep, they still require the finishing touches in the final third. There are few more frustrating sights in football than watching a counter-attack be botched in its final moments.

The graphic below summarises the top players in this crucial aspect by examining their expected goal and assist outputs. Unsurprisingly, Kevin De Bruyne leads the way here and is powered by his exceptional creative passing.

xGandxA_DFA

Top-20 players rated by expected goals (xG) plus expected assists (xA) for fast-attacks from deep. Players with more than 1800 minutes only. Data via Opta.

The list is dominated by players from the top-6 clubs, with Negredo the only interloper inside the top-10 ranking. Middlesbrough’s minimal attacking output left few scraps of solace for Negredo but at least he did get a few shots away in these high-value situations to alleviate the boredom.

Conclusion

The investigation of tactical and stylistic approaches carried out above merely scratches the surface of possibilities for opposition scouting and player profiling.

Being able to identify ‘successful’ attacking moves opens the door to examining ‘failed’ possessions, which would allow efficiency to be studied as well as defensive aspects. This is an area rich with promise that I’ll examine in the future, along with other styles identified within the same framework.

Identifying and assessing team-level strategies: 2017 OptaPro Forum Presentation

At the recent OptaPro Analytics Forum, I was honoured to be selected to present for a second time to an audience of analysts and other representatives from the sporting industry. My aim was to explore the multifaceted approaches employed by teams using cluster analysis of possession chains.

My thinking was that this could be used to assess the strengths and weaknesses of teams in both attack and defense, which could be used for opposition scouting. The results can also be used to evaluate how well players contribute to certain styles of play and potentially use this in recruitment.

The video of the presentation is below, so go ahead and watch it for more details. The slides are available here and I’ve pulled out some of the key graphics below.

The main types of attacking moves that result in shots are in the table below. I used the past four full English Premier League seasons plus the current 2016/17 season for the analysis here but an obvious next step is to expand the analysis across multiple leagues.

Cluster Profile Summary.png

Below is a comparison of the efficiency (in terms of shot conversion) and frequency of these attack types. The value of regaining the ball closer to goal and quickly transitioning into attack is clear, while slower or flank-focussed build-up is less potent. Much of the explanation for these differences in conversion rate can be linked to the distance from which such shots are taken on average.

An interesting wrinkle is the similarity in conversion rates between the ‘deep build-up’ and ‘deep fast-attacks’ profiles, with shots taken in the build-up focussed profile being approximately 2 yards further away from goal on average than the faster attacks. Looking through examples of the ‘deep build-up’ attacks, these are often characterised by periods of ball circulation in deeper areas followed by a quick transition through the opposition half towards goal with the opposition defense caught higher up the pitch, which may explain the results somewhat.

EfficiencyVsFrequency

Finally, here is a look at how attacking styles have evolved over time. The major changes are the decline in ‘flank-focussed build-up’ and increase in the ‘midfield regain & fast attack’ profile, which is perhaps unsurprising given wider tactical trends and the managerial changes over the period. There is also a trend in attacks from deep being generated from faster-attacks rather than build-up focussed play. A greater emphasis on transitions coupled with fast/direct attacking appears to have emerged across the Premier League.

EPL_ProfileTimeline

These are just a few observations and highlights from the presentation and I’ll hopefully put together some more team and player focussed work in the near future. It has been nearly a year since my last post but hopefully I’ll be putting out a steadier stream of content over the coming months.

Leicester City: Need for Speed?

Originally published on StatsBomb.

Leicester City’s rise to the top of the Premier League has led to many an analysis by now. Reasons for their ascent have mainly focused on smart recruitment and their counter-attacking style of play, as well as a healthy dose of luck. While their underlying defensive numbers leave something to be desired, their attack is genuinely good. The pace and directness of their attack has regularly been identified as a key facet of their style by writers with analytical leanings.

Analysis by Daniel Altman has been cited in both the Economist and the Guardian, with the crux being that the ‘key’ to stopping Leicester is to ‘slow them down’. Using slightly different metrics, David Sumpter illustrated this further at the recent OptaPro Forum and on the Sky Sports website, where his analysis surmised that:

For Leicester, it’s about the speed of the attack.

An obvious and somewhat unaddressed question here is whether the pace of Leicester’s attack is the key to their increased effectiveness this season? Equating style with success in football is often a fraught exercise; the often tedious and pale imitations of Guardiola’s possession-orientated approach being a recent example across football.

Below are a raft of numbers comparing various facets of Leicester’s style and effectiveness this season with last season.

LCFC_Summary_Table.png

Comparison between Leicester City’s speed of attack and shot profile from ‘fast’ possessions. A possession is a passage of play where a team maintains unbroken control of the ball. Possessions moving at greater than 5 m/s on average are classed as ‘fast’. All values are for open-play possessions only. Data via Opta.
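The ‘fast’ classification in the caption can be expressed as a short sketch. The 5 m/s threshold comes from the caption; everything else is an assumption for illustration, in particular that speed is measured as the straight-line distance between where a possession starts and ends divided by its duration, with coordinates in metres.

```python
from math import hypot

FAST_THRESHOLD_M_PER_S = 5.0  # threshold quoted in the caption


def possession_speed(start, end, duration_s):
    """Average speed of a possession in metres per second.

    start, end: (x, y) pitch coordinates in metres (assumed units).
    duration_s: length of the possession in seconds.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    distance = hypot(end[0] - start[0], end[1] - start[1])
    return distance / duration_s


def is_fast(start, end, duration_s, threshold=FAST_THRESHOLD_M_PER_S):
    """True when the possession moves faster than the threshold on average."""
    return possession_speed(start, end, duration_s) > threshold
```

Measuring speed along the full path of the ball, or only in the direction of the opponent’s goal, would be equally defensible and would shift which possessions clear the threshold.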

The take home message here is that the average pace of Leicester’s play has barely shifted this season compared to last. Only Burnley in 2014/15 and Aston Villa in 2013/14 have attacked at a greater pace than Leicester this season over the past four years.

The proportion of their shots generated via fast paced possessions has risen this year (from 27.5% to 32.1%) and Leicester currently occupy the top position by this metric over this period. In terms of counter-attacking situations, their numbers have barely changed this season (20.1%) compared to last season (20.8%), with only the aforementioned Aston Villa having a greater proportion (21.3%) than them in my dataset.

What has altered is the effectiveness of their attacks this season, as we can see that their expected goal figures have risen. Below are charts comparing their shots from counter-attacking situations, where we can see more shots in the central zone of the penalty area this season and several better quality chances.

LCFC_CounterAttack_Shots.png

Comparison of Leicester City’s shots from ‘fast’ and ‘deep’ attacks in 2014/15 and 2015/16. Points are coloured by their expected goal value (red = higher xG, lighter = lower xG). Any resemblance to the MK Shot Maps is entirely intentional. Data via Opta.

Their improvement this year sees them currently rank first and second in expected goals per game from fast-attacks and counter-attacks respectively over the past four seasons (THAT Liverpool team rank second and first). Based on my figures, Leicester’s goals from these situations are closely in line with expectations also (N.B. my expected goal model doesn’t explicitly account for counter-attacking moves).

The figure below shows how this has evolved over the past two seasons, where we see fast-attacks helping drive their improved attack at the end of 2014/15, which continued into this season. There has been a gradual decline since an early-season peak, although their expected goals from fast-attacks have reduced more than their overall attacking output in open-play, indicating some compensation from other forms of attack.

LCFC_CA_TimeLine

Rolling ten-match samples of Leicester City’s expected goals for in 2014/15 and 2015/16. All data is for open-play shots only. Data via Opta.

The effectiveness of these attacks has gone a long way to improving Leicester’s offensive numbers. According to my expected goal figures in open-play, they’ve improved from 0.70 per game to 0.94 per game this season. About half of that improvement has come from ‘fast’ paced possessions, with many of these possessions starting from deep areas in their own half.

Examining the way these chances are being created highlights that Leicester are completing more through-balls during their build-up play this season. The absolute numbers are small, with an increase from 11 to 17 through-balls during ‘fast’ possessions and from 6 to 12 during ‘fast’ possessions from their own half, but they do help to explain the increased effectiveness of their play. Approximately 27% of their shots from counter-attacks include a through-ball during their build-up this season, compared to just 11% last season. Through-balls are an effective means of opening up space and increasing the likelihood of scoring during these fast-paced moves. Leicester’s counter-attacks are also far less reliant on crosses this season, with just 2 of these attacks featuring a cross during build-up compared to 9 last season, which will further increase the likelihood of scoring.

Speed is an illusion. Leicester’s doubly so.

Overall, attacking at pace is a difficult skill to master but the rewards can be high. The pace and verve of Leicester’s attack has been eye-catching but it is the execution of these attacks, rather than the actual speed of them that has been the most important factor. Slowing Leicester down isn’t the key to stopping them, rather the focus should be either on denying them those potential counter-attacking situations or diluting their impact should you find yourself on the receiving end of one.

Whether they can sustain their attacking output from these situations is a difficult question to answer. If we examine how well output is maintained from one year to the next, the correlation for expected goals from counter-attacks is reasonable (0.55), while goal expectation per shot is lower (0.30). Many factors will determine the values here, not least the relatively small number of shots per season of this type, as well as a host of other intrinsic football factors. For fast-attacks, the correlations rise to 0.59 for expected goals and 0.52 for expected goals per shot. For comparison, the values for all open-play shots in my data-set are 0.91 and 0.63.
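Year-on-year repeatability figures like those quoted above can be reproduced with a simple pairing of each team’s value in one season with its value in the next. The sketch below assumes a Pearson correlation, which the text does not specify, and uses hypothetical function names and a hypothetical input layout (a dictionary keyed by the season’s starting year).

```python
from math import sqrt


def year_on_year_pairs(by_season):
    """Pair each team's value in one season with its value in the next.

    by_season: {season_start_year: {team: value}}. Teams missing from the
    following season (for example through relegation) are skipped.
    """
    this_year, next_year = [], []
    for year in sorted(by_season):
        following = by_season.get(year + 1)
        if following is None:
            continue
        for team, value in by_season[year].items():
            if team in following:
                this_year.append(value)
                next_year.append(following[team])
    return this_year, next_year


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

Feeding the two lists from `year_on_year_pairs` into `pearson` gives a single repeatability number per metric; with only a few seasons of data, the small number of team pairs means the estimate itself carries a fair amount of uncertainty.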

Examining the data in a little more depth suggests that the better counter-attacking and/or fast-paced teams tend to maintain their output, particularly if they retain managerial and squad continuity. Leicester have a good attack overall that is excellent at exploiting space with fast-attacking moves.

Retaining and perhaps even supplementing their attacking core over the summer would likely go a long way to maintaining a style of play that has brought them rich rewards.