Burnley: 2018-19 Season Preview

Originally published on StatsBomb.

Burnley enter the new season holding ‘the best of the rest’ crown following their highest place finish since 1974 last time out. The club are breaking new ground in their recent history, with a third consecutive season in the top flight plus the visit of European football to Turf Moor for the first time since the mid-sixties.

According to StatsBomb’s underlying numbers, Burnley’s finishing position was reasonably well-merited, with the ninth best expected goal difference in the league. That performance was powered by the sixth best defence in the league in both their outcomes and expectation. Fifty-four points wouldn’t usually be enough to secure seventh but Burnley can hardly be blamed for the deficiencies of others; eighth place Everton sitting on forty-nine points tells the story well on that front.

With all of the above considered though, seventh seems like the very top of Burnley’s potential outcomes given they don’t have the financial means to vault into challenging the top-six, so where do they go from here?

Defensive Foundations

Unless this is your first time visiting StatsBomb towers, you’ll no doubt be familiar with Burnley’s annual dance with expected goal models. Over each of the last four seasons, they’ve conceded fewer goals than expected based on traditional expected goal models, with their extreme form of penalty box defending thought to at least partially explain these divergences.

StatsBomb’s new shot event data includes the position of the players at the moment of the shot and it’s fair to say that Burnley keep the data collectors busy. StatsBomb’s expected goal model includes all of that information and puts their average expected goal conceded per shot at 0.084, which is the lowest in the league by some distance. It’s an impressive feat and it is what powers their defensive performance as only Stoke City and West Ham conceded more shots last season, which would usually put you amongst the worst defensive teams in the league.

Sean Dyche has instilled a defensive system based around forcing opponents to take shots from poor locations and getting as many bodies between those shots and the goal as possible, even if those bodies don’t necessarily pressure the shot-taker. To give an overall picture of this, Burnley’s shots conceded rank:

  • Third longest distance from goal (the fundamental building block of expected goals).
  • Lowest proportion of shots where only the goalkeeper was between the shot-taker and the goal.
  • Fourth highest in density of players between the shot-taker and the goal.
  • Seventh highest in proportion of shots under pressure.
  • Ninth shortest distance between the shot-taker and the closest defender.

No other team comes close to putting all of those numbers together and when you add it all up you get a potent defensive cocktail that sees the highest proportion of blocked shots in the league and the lowest proportion of shots on target. Even with all that added to the model melting pot, Burnley still out-performed their expected goal figures to the tune of 12 goals, which is a similar result to traditional expected goal models. However, StatsBomb’s model rates them more highly relative to the rest of the league. There was still some air in their numbers, but the process has a more solid footing.

While their bunker-like approach might sound reminiscent of your ‘typical’ English defensive style, Burnley actually differ markedly further up the pitch where they apply a blanket of pressure on their opponents. The average distance of their defensive actions sat at a league average level, as did their opponents’ pass completion rate.

Burnley Defensive Activity Heatmap Premier League 2017_2018

Burnley counter-pressed at a league average intensity, sitting tenth overall. Based on a simple model of the strong relationship between counter-pressing and possession, Burnley counter-pressed more than any other team relative to their level of possession. The midfield pair of Steven Defour and Jack Cork led their defensive-pressing efforts, with able support from their wingers and attacking midfielders.
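The 'simple model' is not spelled out here, but one plausible reading is a straight-line fit of counter-pressing against possession, with each team judged by how far it sits above or below the line. The sketch below assumes exactly that; the team names and numbers are invented purely for illustration and are not real Premier League figures.

```python
# Illustrative, made-up values: (possession share %, counter-pressures per game).
teams = {
    "Team A": (65.0, 30.0),
    "Team B": (55.0, 26.0),
    "Team C": (50.0, 22.0),
    "Team D": (45.0, 21.0),
    "Team E": (42.0, 24.0),  # low possession, but presses a lot
}

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

xs = [poss for poss, _ in teams.values()]
ys = [cp for _, cp in teams.values()]
intercept, slope = fit_line(xs, ys)

# Residual = actual counter-pressing minus what possession alone would predict.
residuals = {
    name: cp - (intercept + slope * poss)
    for name, (poss, cp) in teams.items()
}
for name, res in sorted(residuals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {res:+.2f}")
```

In this toy data the hypothetical 'Team E' has the largest positive residual despite a mid-table raw count, which is the kind of pattern described for Burnley.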

Defence has been the bedrock of this Burnley side and there is no reason to expect 2018/19 to be any different.

Attacking Concerns

On the attacking end, Burnley were thirteenth in both shots (10.6) and expected goals (1.1) per game, which you can likely surmise meant their expected goal per shot was distinctly league average. They actually under-performed their expected goals to the tune of six goals, which caused them particular problems at home where they were down seven goals against expectation.
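As a quick check on that claim, dividing the two per-game figures gives the expected goals per shot. A minimal sketch using only the numbers quoted above (both are rounded, so the result is approximate):

```python
# Per-game figures quoted in the text (rounded values).
shots_per_game = 10.6
xg_per_game = 1.1

# Expected goals per shot is simply the ratio of the two.
xg_per_shot = xg_per_game / shots_per_game
print(f"xG per shot: {xg_per_shot:.3f}")
```

That works out at roughly 0.10 per shot, compared with the 0.084 per shot that Burnley conceded.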

Chris Wood provided very good numbers in his debut season, contributing 0.45 expected goals per 90, which was tenth highest of players who played over 900 minutes and third highest of those not at one of the top-six. His shot map illustrates his fondness for the central area of the box, with his expected goals per shot sitting fourth of players taking more than one shot per 90 minutes. Even his shots from outside of the penalty area were reasonably high quality, with two of them coming with the keeper out of position and no defender blocking his path to goal, one of which yielded a goal against Crystal Palace on his full debut.

Chris Wood Premier League 2017_2018

While Wood’s numbers were very good, there was a drop-off in goal-scoring contribution across the rest of the squad; Ashley Barnes and Welsh legend Sam Vokes were contributing at a 1 in 3 game rate in both expectation and actual output, with the midfield ranks providing limited goal-scoring support.

Wood’s medial ligament injury just before Christmas and two-month absence coincided with Burnley’s attack dropping below one expected goal per game. This was compounded by a poor run on the defensive side as well, leading to them collecting just 4 points in 9 games with Wood out of the starting eleven.

Burnley Premier League Trendlines

Gudmundsson carried the creative burden, with Brady chipping in when in the team but both relied heavily on set pieces when examining their expected assist contribution. Burnley were likely a touch unfortunate to not score more from dead-ball situations as their underlying process was good. However, there is certainly a lot of room for improvement in creativity from open-play.

With perhaps less scope for improvement on the defensive side, Burnley could do with improving their attacking output to really establish themselves in the top-half of the table. If Chris Wood can remain healthy and maintain his form then that would certainly help but ideally you would want to see an attack that is less reliant on one individual.

Transfers

From a departures point of view, Scott Arfield is the only player to leave who contributed reasonably significant minutes last season. After going most of the summer without an incoming transfer on the horizon, things have got busier over the last few days of the window.

It is unclear whether proper football men or air-conditioned analytics practitioners were more excited by the signing of Joe Hart, with the latter intrigued as to how a goalkeeper with a history of poor shot-stopping numbers will fare at a club where the previous incumbents have consistently out-performed said numbers. With Nick Pope’s dislocated shoulder expected to keep him out of action for several months and Tom Heaton’s calf strain disrupting his own return from a dislocated shoulder, Hart is likely to start the season between the sticks.

Ben Gibson arrives from one of the better defensive teams in the Championship to provide competition and depth to the central defensive ranks. He could potentially ease Ben Mee out of the starting eleven and form a peak-age partnership with James Tarkowski once he is up to speed with Burnley’s defensive system.

Another arrival from the Championship is Matej Vydra, whose 21 goals last season saw him top the scoring charts, although his total was inflated by 6 penalties. His 15 non-penalty goals put him joint-fourth across the season in terms of volume at a rate of 0.50 goals per 90. However, his goal-scoring record over his career could be charitably described as ‘patchy’, while his two previous stints in the Premier League were mostly spent on the side-lines. Those concerns aside, the hope is that he can form an effective partnership with Wood and provide more creativity and a greater goal-threat than Jeff Hendrick, which is a practically subterranean low-bar.

Burnley have a recent history of reasonably successful recruitment from the Championship and are seemingly following that model again with the signings of Gibson and Vydra. Adding a more creative option in open-play looks like the area where they could have clearly upgraded the existing squad. It’s hard not to wonder whether they could have used their success last season and the draw of European football to improve their first-eleven. That said, being financially prudent isn’t the worst strategy and has seen them progress over recent years, so it’s hard to be too critical.

Where Do We Go from Here?

Burnley’s prospects this season are likely closely-tied to whether they qualify for the Europa League group stage. Injuries aside, they played essentially a first-choice team against Aberdeen, so are clearly aiming to progress. İstanbul Başakşehir are ranked 66th in Europe based on their Elo ranking, with Burnley in 53rd, so their tie is expected to be evenly-balanced.

Burnley ran with the most settled line-up in the league by a wide margin last season and squad depth is a major concern with the potential Thursday-Sunday grind that comes with Europa League qualification. Add in whatever the league cup is called this year and the next few months could be perilous.

The range of potential outcomes for this Burnley squad seems quite broad and the bookmakers are certainly unconvinced. On one end of the scale they could secure another top-half finish and put together a European adventure to bore the next generation of fans with, while on the flip-side they could struggle with the extra strain on the squad and find themselves at the wrong end of the table. The backbone of their past two campaigns has been their form in the first half of the season, which has kept them well-outside the relegation battle. That wasn’t the case in 2014/15 when they spent the entire season in the bottom four and struggled for goals when they needed to put wins on the board. A similar scenario playing out this term amidst a potentially stronger bottom-half could well be in play heading into 2019.

However, Burnley have made a habit of defying expectations and even have the opportunity to expand their exploits abroad this year. We’ll see if Sean Dyche can weave further sorcery from his spell-book.

On quantifying passing skill

Quantifying passing skill has been a topic that has gained greater attention over the past 18 months in public analytics circles, with Paul Riley, StatsBomb and Played off the Park regularly publishing insights from their passing models. I talked a little about my own model last season but only published results on how teams disrupted their opponents’ passing. I thought delving into the nuts and bolts of the model plus reporting some player-centric results would be a good place to start as I plan to write more on passing over the next few months.

Firstly, the model quantifies the difficulty of an open-play pass based on its start and end location, as well as whether it was with the foot or head. So for example, relatively short backward passes by a centre back to their goalkeeper are completed close to 100% of the time, whereas medium-range forward passes from out-wide into the centre of the penalty area have pass completion rates of around 20%.

The data used for the model is split into training and testing sets to prevent over-fitting. The Random Forest-based model does a pretty good job of representing the different components that drive pass difficulty, some of which are illustrated in the figure below (also see the appendix here for some further diagnostics).

xP_val

Comparison between expected pass completion rates from two different passing models and actual pass completion rates based on the start and end location of an open-play pass. The horizontal dimension is orientated from left-to-right, with zero designating the centre of the pitch. The dashed lines in the vertical dimension plots show the location of the edge of each penalty area. Data via Opta.

One slight wrinkle with the model is that it has trouble with very short passes of less than approximately 5 yards due to the way the data is collected; if a player attempts a pass and an opponent in his immediate vicinity blocks it, then the pass is unsuccessful and makes it look like such passes are really hard, even though the player was actually attempting a much longer pass. Neil Charles reported something similar in his OptaPro Forum presentation in 2017. For the rest of the analysis, such passes are excluded.

None shall pass

That gets some of the under-the-hood stuff out of the way, so let’s take a look at ways of quantifying passing ‘skill’.

Similar to the concept of expected goals, the passing model provides a numerical likelihood of a given pass being completed by an average player; deviations from this expectation in reality may point to players with greater or less ‘skill’ at passing. The analogous concept from expected goals would be comparing the number of goals scored versus expectation and interpreting this as ‘finishing skill’ or lack thereof. However, when it comes to goal-scoring, such interpretations tend to be very uncertain due to significant sample size issues because shots and goals are relatively infrequent occurrences. This is less of a concern when it comes to passing though, as many players will often attempt more passes in a season than they would take shots in their entire career.

Another basic output of such models is an indication of how adventurous a player is in their passing – are they playing lots of simple sideways passes or are they regularly attempting defense-splitting passes?
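Both outputs can be sketched in a few lines. The exact formula behind the ratings is not stated in this post; the sketch below assumes the 'xP rating' is completed passes divided by the sum of the model's completion probabilities, and that pass difficulty is the mean expected completion. The pass values are invented for illustration.

```python
# Each pass: (expected completion probability from the model, completed?).
# Values are invented for illustration only.
passes = [
    (0.95, True),
    (0.90, True),
    (0.85, True),
    (0.60, True),
    (0.40, False),
    (0.20, True),
]

def xp_rating(passes):
    """Completed passes divided by expected completions (assumed formula)."""
    expected = sum(p for p, _ in passes)
    completed = sum(1 for _, done in passes if done)
    return completed / expected

def average_difficulty(passes):
    """Mean expected completion in percent; lower means harder passes."""
    return 100 * sum(p for p, _ in passes) / len(passes)

print(f"xP rating: {xp_rating(passes):.2f}")
print(f"Average expected completion: {average_difficulty(passes):.1f}%")
```

Under this assumed definition, a rating above 1 means a player completes more passes than an average player would be expected to from the same attempts.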

The figure below gives a broad overview of these concepts for out-field players from the top-five leagues (England, France, Germany, Italy and Spain) over the past two seasons. Only passes with the feet are included in the analysis.

dxP_avg_xP_scatter

Passing ‘skill’ compared to pass difficulty for outfield players from the past two seasons in the big-five leagues, with each data point representing a player who played more than 3420 minutes (equivalent to 38 matches) over the period. The dashed lines indicate the average values across each position. Foot-passes only. Data from Opta.

One of the things that is clear when examining the data is that pulling things apart by position is important as the model misses some contextual factors and player roles obviously vary a huge amount depending on their position. The points in the figure are coloured according to basic position profiles (I could be more nuanced here but I’ll keep it simpler for now), with the dashed lines showing the averages for each position.

In terms of pass difficulty, midfielders attempt the easiest passes with an average expected completion of 83.2%. Forwards (81.6%) attempt slightly easier passes than defenders (81.4%), which makes sense to me when compared to midfielders, as the former are often going for tough passes in the final third, while the latter are playing more long passes and crosses.

Looking at passing skill is interesting, as it suggests that the average defender is actually more skilled than the average midfielder?!? While the modern game requires defenders to be adept in possession, I’m unconvinced that their passing skills outstrip midfielders. What I suspect is happening is that passes by defenders are being rated as slightly harder than they are in reality due to the model not knowing about defensive pressure, which on average will be less for defenders than midfielders.

Forwards are rated worst in terms of passing skill, which is probably again a function of the lack of defensive pressure included as a variable, as well as other skills being more-valued for forwards than passing e.g. goal-scoring, dribbling, aerial-ability.

Pass muster

Now we’ve got all that out of the way, here are some lists separated by position. I don’t watch anywhere near as much football as I once did, so really can’t comment on quite a few of these and am open to feedback.

Note the differences between the players on these top-ten lists are tiny, so the order is pretty arbitrary and there are lots of other players that the model thinks are great passers who just missed the cut.

First-up, defenders: *shrugs*.

In terms of how I would frame this, I wouldn’t say ‘Faouzi Ghoulam is the best passer out of defenders in the big-five leagues’. Instead I would go for something along the lines of ‘Faouzi Ghoulam’s passing stands out and he is among the best left-backs according to the model’. The latter is more consistent with how football is talked about in a ‘normal’ environment, while also being a more faithful presentation of the model.

Looking at the whole list, there is quite a range of pass difficulty, with full-backs tending to play more difficult passes (passes into the final third, crosses into the penalty area) and the model clearly rates good-crossers like Ghoulam, Baines and Valencia. Obviously that is a very different skill-set to what you would look for in a centre back, so filtering the data more finely is an obvious next step.

Defenders (* denotes harder than average passes)

Name | Team | xP rating | Pass difficulty
Faouzi Ghoulam | Napoli | 1.06 | 80.3*
Leighton Baines | Everton | 1.06 | 76.5*
Stefan Radu | Lazio | 1.06 | 82.1
Thiago Silva | PSG | 1.06 | 91.0
Benjamin Hübner | Hoffenheim | 1.05 | 84.4
Mats Hummels | Bayern Munich | 1.05 | 86.0
Kevin Vogt | Hoffenheim | 1.05 | 87.4
César Azpilicueta | Chelsea | 1.05 | 83.4
Kalidou Koulibaly | Napoli | 1.05 | 87.8
Antonio Valencia | Manchester United | 1.05 | 80.0*

On to midfielders: I think this looks pretty reasonable with some well-known gifted passers making up the list, although I’m a little dubious about Dembélé and Fernandinho being quite this high up. Iwobi is an interesting one and will keep James Yorke happy.

Fàbregas stands out because his passes are much harder than average (his expected completion is well below the midfield average) without him having a cross-heavy profile – nobody gets near him for the volume of difficult passes he completes.

Midfielders (* denotes harder than average passes)

Name | Team | xP rating | Pass difficulty
Cesc Fàbregas | Chelsea | 1.06 | 79.8*
Toni Kroos | Real Madrid | 1.06 | 88.1
Luka Modric | Real Madrid | 1.06 | 85.9
Arjen Robben | Bayern Munich | 1.05 | 79.6*
Jorginho | Napoli | 1.05 | 86.8
Mousa Dembélé | Tottenham Hotspur | 1.05 | 89.9
Fernandinho | Manchester City | 1.05 | 87.2
Marco Verratti | PSG | 1.05 | 87.3
Alex Iwobi | Arsenal | 1.05 | 84.9
Juan Mata | Manchester United | 1.05 | 84.5

Finally, forwards AKA ‘phew, it thinks Messi is amazing’.

Özil is the highest-rated player across the dataset, which is driven by his ability to retain possession and create in the final third. Like Fàbregas above, Messi stands out for the difficulty of the passes he attempts and that he is operating in the congested central and half-spaces in the final third, where mere mortals (and the model) tend to struggle.

In terms of surprising names: Alejandro Gomez appears to be very good at crossing, while City’s meep-meep wide forwards being so far up the list makes me wonder about team-effects.

Also, I miss Philippe Coutinho.

Forwards (* denotes harder than average passes)

Name | Team | xP rating | Pass difficulty
Mesut Özil | Arsenal | 1.07 | 82.9
Eden Hazard | Chelsea | 1.05 | 81.9
Lionel Messi | Barcelona | 1.05 | 79.4*
Philippe Coutinho | Liverpool | 1.04 | 80.6*
Paulo Dybala | Juventus | 1.03 | 84.8
Alejandro Gomez | Atalanta | 1.03 | 74.4*
Raheem Sterling | Manchester City | 1.03 | 81.6*
Leroy Sané | Manchester City | 1.03 | 81.9
Lorenzo Insigne | Napoli | 1.03 | 84.3
Diego Perotti | Roma | 1.02 | 78.4*

Finally, the answer to what everyone really wants to know is, who is the worst passer? Step-forward Mario Gómez – I guess he made the right call when he pitched his tent in the heart of the penalty area.

Pass it on

While this kind of analysis can’t replace detailed video and live scouting for an individual, I think it can provide a lot of value. Traditional methods can’t watch every pass by every player across a league but data like this can. However, there is certainly a lot of room for improvement and further analysis.

A few things I particularly want to work on are:

  • Currently there is no information in the model about the type of attacking move that is taking place, which could clearly influence pass difficulty e.g. a pass during a counter-attacking situation or one within a long passing-chain with much slower build-up. Even if you didn’t include such parameters in the model, it would be a nice means of filtering different pass situations.
  • Another element in terms of context is attempting a pass after a dribble, especially given some of the ratings above e.g. Hazard and Dembélé. I can envisage the model somewhat conflates the ability to create space through dribbling and passing skill (although this isn’t necessarily a bad thing depending on what you want to assess).
  • Average difficulty is a bit of a blunt metric and hides a lot of information. Developing this area should be a priority for more detailed analysis as I think building a profile of a player’s passing tendencies would be a powerful tool.
  • You’ll have probably noticed the absence of goalkeepers in the results above. I’ve left them alone for now as the analysis tends to assign very high skill levels to some goalkeepers, especially those attempting lots of long passes. My suspicion is that long balls up-field that are successfully headed by a goalkeeper’s team-mate are receiving a bit too much credit i.e. yes the pass was ‘successful’ but that doesn’t always mean that possession was retained after the initial header. That isn’t necessarily the fault of the goalkeeper, who is generally adhering to the tactics of their team and the match situation but I’m not sure it really reflects what we envisage as passing ‘skill’ when it comes to goalkeepers. Discriminating between passes to feet and aerial balls would be a useful addition to the analysis here.
  • Using minutes as the cut-off for the skill ratings leaves a lot of information on the table. The best and worst passers can be pretty reliably separated after just a few hundred passes e.g. Ruben Loftus-Cheek shows up as an excellent passer after just ~2000 minutes in the Premier League. Being able to quickly assess young players and new signings should be possible. Taking into account the number of passes a player makes should also be used to assess the uncertainty in the ratings.
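On that last point, a rough sense of how uncertainty shrinks with pass volume can be had from a simple calculation. This is only a sketch under strong assumptions that are not part of the analysis above: each pass is treated as an independent Bernoulli trial with the model's probability, and the rating is taken to be completed passes divided by expected completions.

```python
import math

def rating_standard_error(expected_probs):
    """Standard error of completed/expected for an average passer,
    treating each pass as an independent Bernoulli trial."""
    variance = sum(p * (1 - p) for p in expected_probs)
    return math.sqrt(variance) / sum(expected_probs)

# Hypothetical players who only attempt passes with 80% expected completion.
for n_passes in (100, 500, 2000):
    se = rating_standard_error([0.8] * n_passes)
    print(f"{n_passes:>5} passes: rating 1.00 +/- {se:.3f}")
```

Under these assumptions the error shrinks with the square root of the number of passes, so a rating of 1.05 is within one standard error of average after 100 such passes but more than four standard errors away after 2000.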

I’ve gone on enough about this, so I’ll finish by saying that any feedback on the analysis and ratings is welcome. To facilitate that, I’ve built a Tableau dashboard that you can mess around with that is available from here and you can find the raw data here.

Time to pass and move.

Using Pressure to Evaluate Centre Backs

Originally published on StatsBomb.

Analysing centre backs is a subject likely to provoke either a shrug or a wistful smile from an analytics practitioner. To varying degrees, there are numbers and metrics aplenty for other positions but in public analytics at least, development has been limited and a genuine track record of successful application is yet to be found. If centre back analysis is the holy grail of public football analytics, then the search thus far has been more Monty Python than Indiana Jones.

One of the major issues with centre back analysis is that positioning isn’t measured directly by on-ball event data and any casual football watcher can tell you that positioning is a huge part of the defensive art. Tracking data would be the ideal means to assess positioning but it comes at a high-cost both computationally and technically, while having a much smaller coverage in terms of leagues than simpler event data provision.

StatsBomb’s new pressure event data serves as a bridge between the traditional on-ball event data and the detailed information provided by tracking data, offering a new prism to investigate the style and effectiveness of centre backs. While it won’t provide information on what a defender is up to when he is not in the immediate vicinity of the ball, it does provide extra information on how they go about their task.

Starting at the basic counting level, centre backs averaged six pressure actions per ninety minutes in the Premier League last season. Tackles and interceptions clock in at 0.8 and 1.3 per 90 respectively, which immediately illustrates that pressure provides a great deal more information to chew on when analysing more ‘proactive’ defending. I’m classing clearances and blocking shots as ‘reactive’ given they mostly take place in the penalty area and are more-directly driven by the opponent, while aerial duels are a slightly different aspect of defending that I’m going to ignore for the purposes of this analysis.

The figure below maps out where these defensive actions occur on the pitch and is split between left and right centre backs. Pressure actions typically occur in wider areas in the immediate vicinity of the penalty area, with another peak in pressure just inside the top corner of the 18-yard box. This suggests that centre backs don’t engage too high up the pitch in terms of pressure and are generally moving out towards the flanks to engage opponents in a dangerous position and either slow down an attack, cut down an attacker’s options or directly contest possession.

DefensiveMaps.png

Maps illustrating the location of pressure actions, interceptions and tackles by centre backs in the 2017/18 EPL season. Top row is for left-sided centre backs and the bottom row is for right-sided centre backs.

The location of pressure actions is somewhat similar to the picture for interceptions, although the shape of the latter is less well-defined and tends to extend higher up the pitch. Tackles peak in the same zone just outside the top corners of the penalty area but are also less spatially distinct. Tackles also peak next to the edge of the pitch, a feature that is less distinct in the pressure and interception maps.

Partners in Crime

The number of pressure actions a centre back accumulates during a match will be driven by their own personal inclinations and role within the team, as well as the peculiarities of a given match and season e.g. the tactics of their own team and the opposition or the number of dangerous opportunities their opponent creates. The figure below explores this by plotting each individual centre back’s pressure actions per ninety minutes against their team name. The team axis is sorted by the average number of pressure actions the centre backs on each team make over the season.

CB_Pressure_Actions_per90

Pressure actions per 90 minutes by centre backs in the 2017/18 EPL season (minimum 900 minutes played) by team. Team axis is sorted by the weighted average number of pressure actions the centre backs on each team make over the season.
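The per-90 rates and the team ordering can be reproduced along these lines. The caption does not say what the weights are, so this sketch assumes the average is weighted by minutes played; the players, teams and counts are invented.

```python
from collections import defaultdict

# (player, team, minutes played, pressure actions); invented values.
centre_backs = [
    ("Player 1", "Team A", 2700, 240),
    ("Player 2", "Team A", 1800, 100),
    ("Player 3", "Team B", 3000, 150),
    ("Player 4", "Team B", 600, 50),   # below the minutes cut-off
]

MIN_MINUTES = 900  # cut-off used in the figure caption

def per_90(count, minutes):
    """Scale a raw count to a rate per ninety minutes."""
    return 90 * count / minutes

totals = defaultdict(lambda: [0, 0])  # team -> [actions, minutes]
for player, team, minutes, actions in centre_backs:
    if minutes < MIN_MINUTES:
        continue
    print(f"{player} ({team}): {per_90(actions, minutes):.1f} per 90")
    totals[team][0] += actions
    totals[team][1] += minutes

# Summing actions and minutes before dividing gives the minutes-weighted
# average of the individual per-90 rates.
team_rates = {team: per_90(a, m) for team, (a, m) in totals.items()}
for team, rate in sorted(team_rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{team}: {rate:.1f} per 90")
```

Weighting by minutes stops a centre back with limited playing time from dragging the team figure as much as a regular starter.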

At the top end of the scale, we see Arsenal and Chelsea, two teams that regularly played a back-three over the past season. Nacho Monreal and César Azpilicueta led the league in pressure actions per ninety minutes by a fair distance and it appears the additional cover provided by playing in a back-three and their natural instincts developed as full backs meant they were frequently putting their opponents under pressure. Manchester United top the list in terms of those predominantly playing with two centre backs, with all of their centre backs applying pressure at similar rates.

At the other end of the scale, Brighton and Leicester’s centre backs appear to favour staying at home in general. Both though are clear examples of there being an obvious split between the number of pressure actions by the primary centre backs on a team, with one being more aggressive while the other presumably holds their position and plays a covering role. This division of roles is perhaps most clearly demonstrated by Chelsea’s centre backs, with Azpilicueta and Antonio Rüdiger as the side centre backs being more proactive than their counter-part in the central defensive slot (Cahill or Christensen).

Liverpool’s improved defensive performance over the course of the season has been attributed to a range of factors, with the signing of Virgil Van Dijk for a world-record fee for a defender garnering much of the credit. Intriguingly, his addition to the Liverpool backline has seemingly offered a significant contrast to the club’s incumbents, who all favoured a slightly greater than average number of pressure actions. Furthermore, Van Dijk ranked towards the bottom of the list in terms of pressure actions for Southampton (4.5 per 90) as well, with his figure for Liverpool (3.7 per 90) representing a small absolute decline. As an aside, Van Dijk brings a lot to the table in terms of heading skills, where he ranks highly for both total and successful aerial duels, so he is still an active presence in this aspect, while being a low-event player in others.

Centre backs are often referred to as a partnership and the above illustrates how defensive units often set up to complement each other’s skill sets and attempt to become greater than the sum of their parts.

The Thompson Triangle

Mark Thompson has led the way in terms of public analytics work on centre backs and has advocated for stylistic-driven evaluations as the primary means of analysis, which can then be built on with more traditional scouting. Pressure actions add another string to this particular bow and the figure below contrasts the three proactive defensive actions discussed earlier. Players in different segments of the triangle are biased towards certain actions, with those in the corners being more strongly inclined towards one action over the other two.

CB-TernaryGraph

Comparison of player tendencies in terms of ‘proactive’ defensive actions in the 2017/18 EPL season (minimum 900 minutes played). Apologies for triggering any flashbacks to chemistry classes. Click figure to open in new window.
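For anyone wanting to build a similar triangle, each player's position comes from the share of each action type among the three. The sketch below uses the league-average per-90 rates for centre backs quoted earlier; the placement of the corners is an arbitrary assumption and need not match the orientation of the figure.

```python
import math

def action_shares(pressures, tackles, interceptions):
    """Share of each action type among the three 'proactive' actions."""
    total = pressures + tackles + interceptions
    return pressures / total, tackles / total, interceptions / total

def ternary_xy(a, b, c):
    """Map shares (a, b, c) summing to 1 onto an equilateral triangle with
    corners a=(0, 0), b=(1, 0) and c=(0.5, sqrt(3)/2)."""
    return b + 0.5 * c, (math.sqrt(3) / 2) * c

# League-average per-90 rates for centre backs quoted earlier in the piece.
shares = action_shares(pressures=6.0, tackles=0.8, interceptions=1.3)
print("pressure/tackle/interception shares:", [round(s, 2) for s in shares])
print("triangle position:", ternary_xy(*shares))
```

Because pressure actions are so much more frequent than tackles and interceptions, raw shares cluster towards the pressure corner; rescaling each action relative to its league average is one possible way to spread players out, though whether the figure does so is not stated.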

There is a lot to pore over in the figure, so I’ll focus on defenders who are most inclined towards pressure actions. One clear theme is that such centre backs frequently featured on the sides of a back-three. Ryan Shawcross is unusual in this aspect given he was generally the middle centre back in Stoke’s back-three, as well as the right centre back in a back four. Ciaran Clark at Newcastle and Kevin Long at Burnley are the only players who featured mostly as one of two centre backs, with their partner adopting a more reserved role.

The additional cover provided by a back-three system and the frequent requirement for the player on the flanks to pull wide and cover in behind their wing-back seemingly plays a large part in determining the profile of centre backs. This illustrates the importance of considering team setup in determining a defender’s profile and should feed into any recruitment process alongside their individual inclinations.

The analysis presented provides descriptive metrics and illustrations of the roles played by centre backs and is very much a first look at this new data. While we can’t gain definitive information on positioning without constant tracking of a player, the pressure event data provides a new lens to evaluate centre backs and significantly increases the number of defensive actions that can be evaluated further. Armed with such information, these profiles can be built upon with further data-driven analysis and combined with video and in-person scouting to build a well-rounded profile on the potential fit of a player.

Now all we need is a shrubbery.

Measuring counter-pressing

Originally published on StatsBomb.

The concept of pressing has existed in football for decades but its profile has been increasingly raised over recent years due to its successful application by numerous teams. Jürgen Klopp and Pep Guardiola in particular have received acclaim across their careers, with pressing seen as a vital component of their success. There are numerous other recent examples, such as the rise of Atlético Madrid, Tottenham Hotspur and Napoli under Diego Simeone, Mauricio Pochettino and Maurizio Sarri respectively.

Alongside this rise, public analytics has sought to quantify pressing through various metrics. Perhaps the most notable and widely-used example was ‘passes per defensive action’ or PPDA, which was established by Colin Trainor and first came to prominence on this very website. Anecdotally, PPDA found its way inside clubs and serves as an example of public analytics penetrating the private confines of football. Various metrics have also examined pressing through the prism of ‘possessions’, which Michael Caley has put to effective use on numerous occasions. Over the past year, I sought to illustrate pressing by quantifying a team’s ability to disrupt pass completion. While this was built on some relatively complex numerical modelling, it did provide what I thought was a nice visual representation of the effectiveness of a team’s pressing.

While the above metrics and others have their merits, they tend to ignore that pressing can take several forms and are biased towards the outcome, rather than the actual process. The one public example that side-steps many of these problems is the incredible work by the Anfield Index team through their manual collection of Liverpool’s pressing over the past few seasons but this has understandably been limited to one team.

Step forward the new pressure event data supplied by StatsBomb Services. This new data is an event that is triggered when a player is within a five-yard radius of an opponent in possession. The radius varies as errors by the opponent would prove more costly, with a maximum range of ten yards that is usually associated with goalkeepers under pressure. As well as logging the players involved in the pressure event and its location, the duration of the event is also collected.

The data provides an opportunity to explore pressing in greater detail than ever before. Different teams use different triggers to instigate their press, which can now be isolated and quantified. Efficiency and success can be separated from the pressing process in a number of ways at both the team and player-level. Such tools can be used in team-evaluation, opposition scouting and player recruitment.

One such application of the new data is to explore gegenpressing or counter-pressing, which is the process where a team presses the opposition immediately after losing possession. The initial aim of counter-pressing is to disrupt the opponent’s counter-attack, which can be a significant danger during the transition phase from attack-to-defence when a team is more defensively-unstable. Ideally possession is quickly won back from the opponent, with some teams seeking to exploit such situations to attack quickly upon regaining possession. Five seconds is often used as a cut-off for the period where pressure on the opposition is most intensely applied during the counter-press.

The exciting new dimension provided by StatsBomb’s new pressure data is that the definition of counter-pressing you would find in a coaching manual can be directly drawn from the data i.e. a team applies pressure to their opponent following a change in possession. The frequency at which counter-pressing occurs can be quantified and then we can develop various metrics to examine the success or failure of this process. Furthermore, we can analyse counter-pressing at the player-level, which has been out-of-reach previously.

The figure below illustrates where on the pitch counter-pressing occurs based on data from 177 matches from the Premier League this past season. The pitch is split into six horizontal zones and is orientated so that the team out-of-possession is playing from left-to-right. The colouring on the pitch shows the proportion of open-play possessions starting in each zone where pressure is applied within five seconds of a new possession.

AvgCounterPressMap.png

The figure illustrates that pressure is most commonly applied on possessions starting in the midfield zones, with marginally more pressure in the opposition half. Possessions beginning in the highest zone up the pitch come under less pressure, which is likely driven by the lower density of players in this zone on average. Very few possessions actually begin in the deepest zone and a smaller proportion of them come under pressure quickly than those in midfield.

From a tactical perspective, pressing is generally reserved for areas outside of a team’s own defensive third. The exact boundary will vary but for the following analysis, I have only considered possessions starting higher up the pitch, as denoted by the counter-pressing line in the previous figure.

In the figures below, the proportion of possessions in the counter-pressing zones where pressure is applied within five seconds is referred to as the ‘counter-pressing fraction’. In the sample of matches from the Premier League this season, a little under half (0.47) of open-play possessions come under pressure from their opponent within five seconds. At the top of the counter-pressing rankings, we see Manchester City, Tottenham Hotspur and Liverpool, which is unsurprising given the reputations of their managers. At the bottom end of the scale, we find a collection of teams that have mostly been overseen by British managers who are better known for a deep defensive line.
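As a rough sketch of how the counter-pressing fraction could be computed, the Python below assumes a simplified, hypothetical record for each possession (its start time, whether it begins beyond the counter-pressing line, and the time of the first opposition pressure). These field names are illustrative and are not StatsBomb's actual data schema, and treating "within five seconds" as less than or equal to five is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified possession record; not the real StatsBomb schema.
@dataclass
class Possession:
    start_time: float                     # seconds, when the new possession begins
    in_counter_press_zone: bool           # starts beyond the counter-pressing line
    first_pressure_time: Optional[float]  # seconds, None if never pressured


def counter_pressing_fraction(possessions, window=5.0):
    """Share of eligible possessions pressured within `window` seconds."""
    eligible = [p for p in possessions if p.in_counter_press_zone]
    if not eligible:
        return 0.0
    pressed = sum(
        1
        for p in eligible
        if p.first_pressure_time is not None
        and p.first_pressure_time - p.start_time <= window
    )
    return pressed / len(eligible)


sample = [
    Possession(0.0, True, 2.0),      # pressured after 2s: counts
    Possession(30.0, True, 38.0),    # pressured after 8s: too late
    Possession(60.0, True, None),    # never pressured
    Possession(90.0, False, 91.0),   # outside the zone: ignored
    Possession(120.0, True, 124.5),  # pressured after 4.5s: counts
]
print(counter_pressing_fraction(sample))  # 2 of 4 eligible possessions
```

Separating the eligibility filter from the timing check keeps the two judgement calls (where the counter-pressing line sits and how long the window is) easy to vary independently.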

Team_CounterPressFraction

On the right-hand figure above, the strong association between counter-pressing and possession is illustrated, with the two showing a high correlation coefficient of 0.86 in this aggregated sample. Interpreting causality here is somewhat problematic given the likely circular relationship between the two parameters; teams that dominate possession may have more energy to press intensively, leading to a greater counter-pressing fraction, which would lead to them winning possession back more quickly, which will potentially increase their possession share and so on. The correlation is weaker for individual matches (0.36), which hints at some greater complexity and is something that can be returned to at a later date.

Perhaps the most interesting finding in the above figures is Burnley’s high counter-pressing fraction. The majority of analysis on Burnley has focused on their defensive structure within their own box and how that affects their defensive performance in relation to expected goals. The figure illustrates that Burnley employ a relatively aggressive counter-press, especially in relation to their possession share.

Examining Burnley’s counter-pressing game in more detail reveals that they counter-press 18 possessions per game, which is above average and only slightly lower than Manchester City. However, they only actually regain possession within five seconds 2.5 times per game, which falls short of what you might expect on average and falls below their counter-pressing peers. In terms of the ratio between their counter-pressing regains and total counter-pressing possessions, they sit 17th on 14%.
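The regain ratio quoted above can be checked with a line of arithmetic. The sketch below uses only the two per-game figures given in the text; because those are themselves rounded, the result only approximates the published 14%.

```python
# Per-game figures for Burnley as quoted in the text (both are rounded).
counter_pressed_per_game = 18.0
regains_per_game = 2.5

# Share of counter-pressed possessions that end in a regain within five seconds.
regain_rate = regains_per_game / counter_pressed_per_game
print(f"{regain_rate:.1%}")
```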

Burnley’s counter-press is the fourth least-effective at limiting shots, with 13% of such possessions ending with them conceding a shot compared to the average rate of 10%. However, one thing in their favour is that these possessions are typically around the league average in terms of their length and speed of attack, which will allow Burnley to regain their vaunted defensive organisation prior to conceding such shots.

The more dominant discourse around pressing is as an attacking rather than defensive weapon, so narratives are often formed around teams that regularly win back the ball through pressing and use this to generate fast attacks e.g. Liverpool and Tottenham Hotspur. As a result, a team like Burnley who seemingly employ counter-pressing as a defence-first tactic to prevent counter-attacks and slow attacking progress may be overlooked.

Burnley’s manager, Sean Dyche, has typically been lumped in with the tactical stylings of the perennially-employed British managers who aren’t generally associated with pressing tactics. Dyche was reportedly most impressed by the pressing game employed by Guardiola’s Barcelona and he has seemingly implemented some of these ideas at Burnley. He has instilled an approach that combines counter-pressing and a low-block with numbers behind the ball, which is a neat trick to pull off; Diego Simeone and Atlético Madrid are perhaps the more apt comparison given such traits.

The above analysis illustrates the ability of StatsBomb’s new pressure event data to illuminate an important aspect of the modern game. Furthermore, it is able to do this in a manner that directly translates tactical principles, separating underlying process and outcome, which is a giant step forward for analytics. It also led to an analysis discussing the similarity between Guardiola’s legendary Barcelona team and Sean Dyche’s Burnley, which was probably unexpected to say the least.

This is just a taster of what is possible with StatsBomb’s new data. There’s more information in this presentation from the StatsBomb launch event and you can expect more analysis to appear over the summer and beyond.

Liverpool and I

While I probably watched Liverpool play before then, the first match I remember watching was on the 4th January 1994, when a nine-year-old me saw them come back from three goals down, which would become something of a theme. As is the wont of memory, the events that leave an indelible mark are the ones that stand out; my first actual football memory is Paul Bodin missing that penalty and not really understanding the scale of the disappointment. Turned out Wales’ last World Cup match was in 1958 when some no-mark seventeen-year-old called Edson Arantes do Nascimento scored his first international goal and knocked them out in the quarter-final.

Other early memories include one of God’s defining miracles, with a hat-trick notched up in four minutes and thirty three seconds and learning about player aging curves when I realised that the slow yet classy guy in midfield used to be one of the most devastating and exciting wide-players the game had ever seen. My first match at Anfield was Ian Rush’s last there in a red shirt, while subsequent visits took in thrilling cup matches under the gaze of King Kenny and the best live sporting experience of my life as I bounced out of Anfield full of hope in April 2014.

While a league title has proved elusive during my supporting life, Europe has provided the greatest thrills, with tomorrow marking a third European Cup Final to go along with two finals in the junior competition. A European Cup Final once every eight years on average, with all three in the last fourteen years is pretty good going for a non-super club, albeit one with significant resources.

Real Madrid are clearly going to be a tough nut to crack, with Five Thirty Eight, Club Elo and Euro Club Index all ranking them as the second best team around. The same systems have Liverpool as the fifth, seventh and eleventh best, so underdogs with a good chance at glory overall.

According to Club Elo, the 2018 edition of Liverpool will be the best to contest a European Cup Final this century but on the flip-side, Real Madrid are stronger than either of the AC Milan teams that they faced in 2005 and 2007. Despite this, Liverpool are given a slightly better shot at taking home Old Big Ears than they had in 2005, as the gap between them and their opponents is narrower. The strides that the team made under Rafa between the 2005 and 2007 finals meant that the latter was contested by two equal teams.

Liverpool should evidently be approaching the final with optimism and further evidence of this is illustrated in the figure below, which shows the top-fifty teams by non-penalty expected goal difference in the past eight Premier League seasons. The current incarnation of Liverpool sit fifth and would usually be well-positioned to seriously challenge for the title. As the figure also illustrates, the scale of Manchester City’s dominance in their incredible season is well-warranted.

EPL-8-seasons-xGD.png

Top-fifty teams by non-penalty expected goal difference over the past eight Premier League seasons. Liverpool are highlighted in red, with the 17/18 season marked by the star marker. Data via Opta.

Liverpool’s stride forward under Klopp this past season has taken them beyond the 13/14 and 12/13 incarnations in terms of their underlying numbers. In retrospect, Rodgers’ first season was quietly impressive even if it wasn’t reflected in the table and it set the platform for the title challenge the following season.

Compared to those Luis Suárez-infused 12/13 and 13/14 seasons, the attacking output this past season is slightly ahead, with the team sitting sixth in the eight-season sample, which is their best over the period. Including penalties would take the 13/14 vintage beyond the latest incarnation, with the former scoring ten from the twelve (!) awarded, while 17/18 saw only three awarded (two scored).

The main difference with the current incarnation though is on the defensive end, with the team having the fifth best record in terms of non-penalty expected goals conceded this past season in the eight-year sample. The 13/14 season’s defence was the seventh worst by the club in this eight-year period and they lay thirty-fourth overall. These contrasting records equate to an eight non-penalty expected goal swing in their defensive performance.

While the exhilarating attacking intent of this Liverpool side is well-established, they are up against another attacking heavyweight; could it be that the defensive side of the game is the most decisive? The second half of this season is especially encouraging on this front, with improvements in both expected and actual performance. This period represents the sixth best half season over these eight seasons (out of a total of 320) and a three-goal swing compared to the first half of the season. This was slightly offset by a reduction in attacking output of two non-penalty expected goals but the overall story is one of improvement.

The loss of Coutinho, addition of van Dijk and employing a keeper with hands (edit 2203 26/05/18: well at least he gets his hands to it usually) between the sticks is a clear demarcation in Liverpool’s season and it is this period that has seen the thrilling run to the European Cup Final. The improved balance between attack and defence bodes well and I can’t wait to see what this team can do on the biggest stage in club football.

Allez, Allez, Allez!

What has happened to the Klopp press?

Originally published on StatsBomb.

When asked how his Liverpool team would play by the media horde who greeted his unveiling as manager two years ago, Jürgen Klopp responded:

We will conquer the ball, yeah, each fucking time! We will chase the ball, we will run more, fight more.

The above is a neat synopsis of Klopp’s preferred style of play, which focuses on pressing the opponent after losing the ball and quickly transitioning into attack. It is a tactic that he successfully deployed at Borussia Dortmund and one that he has employed regularly at Liverpool.

However, a noticeable aspect of the new season has been Liverpool seemingly employing a less feverish press. The Anfield Index Under Pressure Podcast led the way with their analysis, which was followed by The Times’ Jonathan Northcroft writing about it here and Sam McGuire for Football Whispers.

Liverpool’s pass disruption map for the past three seasons is shown below. Red signifies more disruption (greater pressure), while blue indicates less disruption (less pressure). In the 2015/16 and 2016/17 seasons, the team pressed effectively high up the pitch but that has slid so far this season to a significant extent. There is some disruption in the midfield zone but at a lower level than previously.

LFC_dxP.png

Liverpool’s zonal pass completion disruption across the past three seasons. Teams are attacking from left-to-right, so defensive zones are to the left of each plot. Data via Opta.

The above numbers are corroborated by the length of Liverpool’s opponent possessions increasing by approximately 10% this season compared to the rest of Klopp’s reign. Their opponents so far this season have an average possession length of 6.5 seconds, which is lower than the league average but contrasts strongly with the previous figures that have been among the shortest in the league.

Examining their pass disruption figures game-by-game reveals further the reduced pressure that Liverpool are putting on their opponents. During 2015/16 and 2016/17, their average disruption value was around -2.5%, which they’ve only surpassed once in Premier League matches this season, with the average standing at -0.66%.

LFC-xP-17-18

Liverpool’s game-by-game pass completion disruption for 2017/18 English Premier League season. Figures are calculated for zones with Opta x-coordinates greater than 40. Data via Opta.

The Leicester match is the major outlier and examining their passing further indicates that the high pass disruption was a consequence of them attempting a lot of failed long passes. This is a common response to Liverpool’s press as teams go long to bypass the pressure.

Liverpool’s diminished press is likely a deliberate tactic that is driven by the added Champions League matches the team has faced so far this season. The slightly worrisome aspect of this tactical shift is that Liverpool’s defensive numbers have taken a hit.

In open-play, Liverpool’s expected goals against figure is 0.81 per game, which is up from 0.62 last season. Furthermore, their expected goals per shot has risen to 0.13 from 0.11 in open-play. To add further defensive misery, Liverpool’s set-piece woes (specifically corners) have actually got worse this season. The team currently sit eleventh in expected goals conceded this season, which is a fall from fifth last year.

This decline in underlying defensive performance has at least been offset by a rise on the attacking side of 0.4 expected goals per game to 1.78 this season. Overall, their expected goal difference of 0.79 this season almost exactly matches the 0.81 of last season.

Liverpool’s major problem last season was their soft under-belly but they were often able to count on their pressing game denying their opponents opportunities to exploit it. What seems to be happening this season is that the deficiencies at the back are being exploited more with the reduced pressure ahead of them.

With the season still being relatively fresh, the alarm bells shouldn’t be ringing too loudly but there is at least cause for concern in the numbers. As ever, the delicate balancing act between maximising the side’s attacking output while protecting the defense is the key.

Klopp will be searching for home-grown solutions in the near-term and a return to the familiar pressing game may be one avenue. Given the competition at the top of the table, he’ll need to find a solution sooner rather than later, lest they be left behind.

Under pressure

Originally published on StatsBomb.

Models that attempt to measure passing ability have been around for several years, with Devin Pleuler’s 2012 study being the first that I recall seeing publicly. More models have sprung up in the past year, including efforts by Paul Riley, Neil Charles and StatsBomb Services. These models aim to calculate the probability of a pass being completed using various inputs about the start and end location of the pass, the length of the pass, the angle of it, as well as whether it is played with the head or foot.

Most applications have analysed the outputs from such models from a player passing skill perspective but they can also be applied at the team level to glean insights. Passing is the primary means of constructing attacks, so perhaps examining how a defense disrupts passing could prove enlightening?

In the figure below, I’ve used a pass probability model (see end of post for details and code) to estimate the difficulty in completing a pass and then compared this to the actual passing outcomes at a team-level. This provides a global measure of how much a team disrupts their opponents’ passing. We see the Premier League’s main pressing teams with the greatest disruption, through to the barely corporeal form represented by Sunderland.

Team_PCDgraph

Pass completion disruption for the 2016/17 English Premier League season. Disruption is defined as actual pass completion percentage minus expected pass completion percentage. Negative values mean opponents complete fewer passes than expected. Data via Opta.
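The disruption metric defined in the caption can be sketched in a few lines of Python. This is a minimal illustration that assumes each opposition pass has already been given a completion probability by some pass model; the `(expected_probability, completed)` pair layout and the toy values are hypothetical, not the format used in the original analysis.

```python
def pass_completion_disruption(passes):
    """Actual minus expected pass completion, in percentage points.

    `passes` is an iterable of (expected_probability, completed) pairs for
    passes attempted by a team's opponents (an assumed, simplified layout).
    """
    passes = list(passes)
    if not passes:
        raise ValueError("need at least one pass")
    actual = sum(1 for _, completed in passes if completed) / len(passes)
    expected = sum(prob for prob, _ in passes) / len(passes)
    return 100.0 * (actual - expected)


# Toy data: opponents were expected to complete 75% but completed only 50%.
opponent_passes = [(0.9, True), (0.8, True), (0.7, False), (0.6, False)]
print(round(pass_completion_disruption(opponent_passes), 1))
```

A negative result means the defending team's opponents completed fewer passes than the model expected, matching the sign convention in the caption.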

The next step is to break this down by pitch location, which is shown in the figure below where the pitch has been broken into five bands with pass completion disruption calculated for each. The teams are ordered from most-to-least disruptive.

PressureMap

Zonal pass completion disruption for 2016/17 English Premier League season. Teams are attacking from left-to-right, so defensive zones are to the left of each plot. Data via Opta.
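One way the zonal breakdown might be computed is sketched below. It assumes five equal bands along the length of the pitch, a 0-100 x-coordinate scale, and that each pass is assigned to a band by its origin; all three are assumptions for illustration, as is the `(x, expected_probability, completed)` tuple layout.

```python
from collections import defaultdict


def zonal_disruption(passes, n_bands=5, pitch_length=100.0):
    """Disruption (percentage points) per band along the pitch length.

    `passes` holds (x, expected_probability, completed) tuples, where x is
    the pass origin on an assumed 0-100 scale.
    """
    band_width = pitch_length / n_bands
    # band index -> [completed count, summed expected probability, pass count]
    totals = defaultdict(lambda: [0, 0.0, 0])
    for x, prob, completed in passes:
        # Clamp so a pass starting exactly on the far byline lands in the last band.
        band = min(int(x // band_width), n_bands - 1)
        totals[band][0] += int(completed)
        totals[band][1] += prob
        totals[band][2] += 1
    return {
        band: 100.0 * (done - expected) / count
        for band, (done, expected, count) in sorted(totals.items())
    }


toy_passes = [
    (10, 0.9, True),
    (15, 0.7, True),
    (55, 0.8, False),
    (58, 0.6, True),
    (100, 0.5, False),
]
for band, value in zonal_disruption(toy_passes).items():
    print(band, round(value, 1))
```

Bands with no passes are simply absent from the result, which avoids dividing by zero for thinly populated zones.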

We see Manchester City and Spurs disrupt their opponents’ passing across the entire pitch, with Spurs’ disruption skewed somewhat higher. Liverpool dominate in the midfield zones but offer little disruption in their deepest-defensive zone, suggesting that once a team breaks through the press, they have time and/or space close to goal; a familiar refrain when discussing Liverpool’s defense.

Chelsea offer an interesting contrast with the high-pressing teams, with their disruption gradually increasing as their opponents inch closer to their goal. What stands out is their defensive zone sees the greatest disruption (-2.8%), which illustrates that they are highly disruptive where it most counts.

The antithesis of Chelsea is Bournemouth who put together an average amount of disruption higher up the pitch but are extremely accommodating in their defensive zones (+4.5% in their deepest-defensive zone). Sunderland place their opponents under limited pressure in all zones aside from their deepest-defensive zone where they are fairly average in terms of disruption.

The above offers a glimpse of the defensive processes and outcomes at the team level, which can be used to improve performance or identify weaknesses to exploit. Folding such approaches into pre-game routines could quickly and easily supplement video scouting.

Appendix: Pass probability model

For this post, I built two different passing models; the first used Logistic Regression and the second used Random Forests. The code for each model is available here and here.

Below is a comparison between the two, which compares expected success rates with actual success rates on out-of-sample test data.

Actual_vs_Expected

Actual versus expected pass success for two different models. Data via Opta.

The Random Forest method performs better than the Logistic Regression model, particularly for low probability passes. This result is confirmed when examining the Receiver Operating Characteristics (ROC) curves in the figure below. The Area Under the Curve (AUC) for the Random Forest model is 87%, while the Logistic Regression AUC is 81%.

xPass_AUC

Receiver Operating Characteristics (ROC) curves for the two different passing models. Data via Opta.
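For readers unfamiliar with AUC, the sketch below computes it directly from its pairwise interpretation: the probability that a randomly chosen completed pass receives a higher predicted probability than a randomly chosen failed pass. The two sets of scores are toy values invented for illustration and are not outputs of the models described here.

```python
def roc_auc(labels, scores):
    """AUC as the chance a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one pass of each outcome")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# 1 = completed pass, 0 = failed pass (toy data).
labels = [1, 1, 1, 0, 0]
model_a_scores = [0.9, 0.8, 0.4, 0.5, 0.2]  # misorders one pair
model_b_scores = [0.9, 0.8, 0.7, 0.5, 0.2]  # orders every pair correctly

print(round(roc_auc(labels, model_a_scores), 3))
print(round(roc_auc(labels, model_b_scores), 3))
```

The double loop is quadratic in the number of passes, so it suits small illustrations only; a rank-based calculation or a library routine would be the practical choice on a full season of passes.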

Given the better performance of the Random Forest model, I used this in the analysis in the main article.

Liverpool 2017/18 season preview

Originally published on StatsBomb.

Liverpool enter the season with aspirations of challenging for the title after an at times hugely promising and exciting first full season under Jürgen Klopp. The prospect of European adventures returning on Tuesday or Wednesday nights is tantalizingly close providing they negotiate their Champions League qualifying round.

The story so far

Liverpool’s tally of 76 points last season was their joint-third best over the last decade and secured only their second top-four finish since the Benitez years. In fact, after a run of four top-four finishes, Liverpool haven’t registered back-to-back Champions League qualifications since Rafa left and have on average finished sixth during that time with 65 points on the board.

With the above in mind, it’s tempting to view a season of consolidation as the priority for the coming season, alongside beginning to re-establish the team as a European force. Liverpool’s underlying performance last season is encouraging, with their goal return reasonably in-line with expectation and their expected goal difference placing them well in contention for a title push.

Drilling further into their expected goal numbers reveals a team that experienced fluctuating underlying performance over the course of the season, with a significant decline once 2017 was rung in. The graphic below illustrates this alongside a longer-term outlook encompassing the past five seasons.

LFC_xG_TimeLine.png

Rolling 19-game average expected goal timeline over the past five seasons. Grey vertical lines denote new season.
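A rolling average of this kind is straightforward to reproduce. The sketch below is a generic implementation under the assumption that you have one value per match in chronological order (for example, expected goal difference); the input values shown are invented purely to demonstrate the mechanics.

```python
def rolling_mean(values, window=19):
    """Mean of each consecutive run of `window` match values."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the match that has just fallen out of the window.
            running -= values[i - window]
        if i >= window - 1:
            averages.append(running / window)
    return averages


# Invented per-match values, with a short window so the result is easy to check.
print(rolling_mean([1, 2, 3, 4, 5], window=3))
```

With a 19-game window, a single 38-game season yields 20 averaged points, which is why the timeline only becomes meaningful once several seasons are strung together.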

The heights of 2016/17 are close to those of the Suárez-powered team under Rodgers, while the low-point is more in-line with Klopp’s early tenure at the club. The past season thus illustrated that the team was capable of title-contending performances at times but also switched to a team competing for the fourth-place trophy at best.

Upping the pace

Closer examination of the downturn in performance using my ‘team strategy analysis’ shows a drying up of shot generation via high-quality chances born of fast-paced attacks from deep and after midfield-transitions.

Sadio Mané was evidently missed due to AFCON duties and injury over the latter half of the season and this is borne out by the numbers. According to my model, he was second best in the EPL (0.11 per 90) in terms of xG-contribution (the sum of expected goals and assists) from fast-paced attacks following a midfield-transition. For fast-attacks from deep, he ranked sixth for xG-contribution (0.12 per 90).
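For reference, xG-contribution per 90 is simply the sum of expected goals and expected assists scaled to a 90-minute rate. The sketch below uses invented totals to show the calculation; they are not any real player's figures.

```python
def xg_contribution_per_90(expected_goals, expected_assists, minutes):
    """Sum of expected goals and expected assists per 90 minutes played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return (expected_goals + expected_assists) * 90.0 / minutes


# Invented totals: 2.0 xG and 1.0 xA across 1,800 minutes.
print(xg_contribution_per_90(2.0, 1.0, 1800))
```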

Thankfully, Mohamed Salah, the club’s major acquisition so far, brings complementary qualities to the table and adds much-needed depth to the wide-forward ranks. James Yorke of this parish has already praised the signing earlier this summer and my only addition is that Salah showed up quite highly for xG-contribution (0.07 per 90, ranking eleventh in Serie A) for fast-paced attacks following a midfield-transition. The addition of Salah improves what was already a healthy front-line attack.

Defensive issues

According to the Objective Football website run by Benjamin Pugsley, Liverpool conceded just 8.1 non-penalty shots per game, ranking second over the past eight seasons behind a Pep-infused Manchester City last year. Shots-on-target conceded (3.0 per game) told a similar story, ranking joint-sixth over the same period. However, they combined these extraordinary shot-suppression numbers with the highest expected goals per shot in the league (0.11), which is the worst value I have over the past five seasons. When Liverpool conceded shots, they were of high quality, which ultimately saw them sit fifth in terms of expected goals against last season.

Klopp’s tactical system deserves credit for melding a highly exciting attack with strong defensive aspects in terms of shot-suppression. The optimistic take here is that tweaks and a greater familiarity with his counter-pressing tactics could bring about improvements in shot quality conceded, thereby seeing better defensive numbers. It’s worth noting the period during November and December 2016 when their expected goals against was the lowest it has been consistently over a 19-game span in the past five seasons, so the current squad is capable of sustained excellence in this realm.

The pursuit of Virgil van Dijk does suggest that the club are aiming to recruit a new starting centre-back. That saga remains running at the time of writing as the world waits to find out just how costly a single ice cream can be. Centre-back depth is an issue that needs to be rectified; the fact that Lucas Leiva made six appearances as a centre-back last term is all the evidence needed for that statement.

The other aspect of Liverpool’s defense that could improve is in the goalkeeping stakes. From a pure shot-stopping perspective, Karius has the best pedigree; in my goalkeeper shot-stopping analysis, Karius came 31st across the data-set with a rating of 91%, which is a pretty decent indication that he is an above-average shot-stopper. Mignolet fared much worse with a ranking of just 25%, which puts him at best as an average shot-stopper during his Liverpool career to date. I haven’t looked at numbers for the Championship but Mark Taylor’s numbers for Ward at Huddersfield were not encouraging. Playing Karius would be a bold move by Klopp given his limited exposure to English football thus far but Mignolet doesn’t provide much confidence either. Personally, I would go with Karius.

Title talk

If I’ve learnt anything while sifting through the data for this preview, it’s that Manchester City should be strong favourites for the title this coming season.

Can Liverpool challenge them, while also competing in Europe? At present, I’d side with no given the depth issues of last season have yet to be addressed and the remaining question marks in terms of the defense.

Liverpool’s other transfer saga involving Naby Keita could be a game-changer given that he could have a transformative impact on the team’s midfield but the likelihood of him signing appears to be receding by the day. Midfield depth is also potentially an issue unless Klopp is happy to rely on youth to cover midfield absentees over the season.

With potentially five teams in the Champions League group stages, progress to the latter rounds could have a strong bearing on league form post-Christmas. Six into four is likely the maths heading into the new season and Liverpool should be well in the mix.

Prediction: Third We’re gonna win the league

Thinking about goalkeepers

Goalkeepers have typically been a tough nut to crack from a data analytics point-of-view. Randomness is an inherent aspect of goal-scoring, particularly over small samples, which makes drawing robust conclusions at best challenging and at worst foolhardy. Are we identifying skill in our ratings or are we just being sent down the proverbial garden path by variance?

To investigate some of these issues, I’ve built an expected save model that takes into account shot location and angle, whether the shot is a header or not and shot placement. So a shot taken centrally in the penalty area sailing into the top-corner will be unlikely to be saved, while a long-range shot straight at the keeper in the centre of goal should usually prove easier to handle.

The model is built using data from the past four seasons of the English, Spanish, German and Italian top leagues. Penalties are excluded from the analysis.

Similar models have been created by new Roma analytics guru Stephen McCarthy, Colin Trainor & Constantinos Chappas and Thom Lawrence in the past.

The model thus provides an expected goal value for each shot that a goalkeeper faces, which we can then compare with the actual outcome. In a simpler world, we could easily identify shot-stopping skill by taking the difference between reality and expectation and then ranking goalkeepers by who has the best (or worst) difference.
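To make the comparison concrete, the sketch below computes a shot-stopper rating as actual save percentage minus expected save percentage. It assumes each shot faced already carries an expected goal value from a model like the one described; the `(expected_goal_probability, was_goal)` pair layout and the toy numbers are hypothetical.

```python
def shot_stopper_rating(shots):
    """Actual save % minus expected save %, in percentage points.

    `shots` holds (expected_goal_probability, was_goal) pairs for the
    shots a goalkeeper faced (an assumed, simplified layout).
    """
    shots = list(shots)
    if not shots:
        raise ValueError("need at least one shot")
    n = len(shots)
    actual_save = sum(1 for _, was_goal in shots if not was_goal) / n
    expected_save = sum(1.0 - xg for xg, _ in shots) / n
    return 100.0 * (actual_save - expected_save)


# Toy data: three saves from four shots that were expected to yield two goals.
faced = [(0.5, False), (0.25, False), (0.75, True), (0.5, False)]
print(shot_stopper_rating(faced))
```

A positive rating means the goalkeeper saved more than the model expected from the shots faced.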

However, this isn’t a simple world, so we run into problems like those illustrated in the graphic below.

Keeper_Funnel_Plot.png

Shot-stopper-rating (actual save percentage minus expected save percentage) versus number of shots faced. The central black line at approximately zero is the median, while the blue shaded region denotes the 90% confidence interval. Red markers are individual players. Data via Opta.

Each individual red marker is a player’s shot-stopper-rating over the past four seasons versus the number of shots they’ve faced. We see that for low shot totals, there is a huge range in the shot-stopper-rating but that the spread decreases as the number of shots increases, which is an example of regression to the mean.

To illustrate this further, I used a technique called boot-strapping to re-sample the data and generate confidence intervals for an average goalkeeper. This re-sampling is done 10,000 times to create a probability distribution built by randomly extracting groups of shots from the data-set and calculating actual and expected save percentages and then seeing how large the difference is. We see a strong narrowing of the blue uncertainty envelope up to around 50 shots, with further narrowing up to about 200 shots. After this, the narrowing is less steep.
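The re-sampling idea can be sketched roughly as follows. This is an illustration under my own assumptions rather than the exact procedure used for the graphic: the shot pool here is synthetic (each outcome is drawn from its own expected goal value, so the pool behaves like an average goalkeeper by construction), and the function and variable names are hypothetical.

```python
import random


def bootstrap_envelope(shots, sample_size, n_resamples=10_000, level=0.90, seed=0):
    """Return (low, median, high) of the rating over random groups of shots.

    shots: list of (expected_goal_probability, was_goal) tuples.
    """
    rng = random.Random(seed)
    ratings = []
    for _ in range(n_resamples):
        # Draw a group of shots with replacement from the pool.
        sample = rng.choices(shots, k=sample_size)
        goals = sum(1 for _, was_goal in sample if was_goal)
        expected_goals = sum(xg for xg, _ in sample)
        # Actual save % minus expected save % equals
        # (expected goals - actual goals) / number of shots.
        ratings.append((expected_goals - goals) / sample_size)
    ratings.sort()
    tail = (1.0 - level) / 2.0
    low = ratings[int(tail * (n_resamples - 1))]
    high = ratings[int((1.0 - tail) * (n_resamples - 1))]
    median = ratings[n_resamples // 2]
    return low, median, high


# Synthetic pool standing in for the real shot data.
pool_rng = random.Random(1)
pool = []
for _ in range(5000):
    xg = pool_rng.uniform(0.05, 0.60)
    pool.append((xg, pool_rng.random() < xg))

env_50 = bootstrap_envelope(pool, 50, n_resamples=2000)
env_400 = bootstrap_envelope(pool, 400, n_resamples=2000)
```

Running this for a range of sample sizes and plotting the low and high values against shots faced gives an envelope of the kind shown in the graphic, narrowing as the sample grows.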

What this effectively means is that there is a large band of possible outcomes that we can’t realistically separate from noise for an average goalkeeper. Over a season, a goalkeeper faces a little over 100 shots on target (119 on average according to the data used here). Thus, there is a huge opportunity for randomness to play a role and it is therefore of little surprise to find that there is little repeatability year-on-year for save percentage.

Things do start to settle down as shot totals increase though. After 200 shots, a goalkeeper would need a shot-stopper-rating beyond ±4% to stand up to a reasonable level of statistical significance. After 400 shots, signal is easier to discern, with a keeper needing to register beyond ±2% to emerge from the noise. That is not to say that we should be beholden to statistical significance, but it is certainly worth bearing in mind in any assessment. An understanding of the uncertainty inherent in analytics can also be a powerful weapon to wield.

What we do see in the graphic above are many goalkeepers outside of the blue uncertainty envelope. This suggests that we might be able to identify keepers who are performing better or worse than the average goalkeeper, which would be pretty handy for player assessment purposes. Luckily, we can employ some more maths courtesy of Pete Owen who presented a binomial method to rank shot-stopping performance in a series of posts available here and here.
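To give a flavour of the binomial idea, here is a simplified sketch. It is not Pete Owen’s exact formulation: it assumes every shot faced carries the same goal probability (the keeper’s average expected goal value per shot), whereas a fuller treatment would respect the individual shot probabilities. The function name and example numbers are invented.

```python
from math import comb


def prob_concede_at_most(goals, shots, mean_xg):
    """Probability that an average keeper concedes `goals` or fewer
    from `shots` shots, if each shot is a goal with probability `mean_xg`.
    """
    return sum(
        comb(shots, k) * mean_xg**k * (1.0 - mean_xg) ** (shots - k)
        for k in range(goals + 1)
    )


# Invented keeper: 200 shots faced, 60 goals expected, 50 conceded.
p_value = prob_concede_at_most(goals=50, shots=200, mean_xg=0.30)
```

A small value says that an average keeper would rarely concede so few goals from those shots, and ranking keepers by this probability is one way to fold sample size into the comparison.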

The table below lists the top-10 goalkeepers who have faced more than 200 shots over the past four seasons by the binomial ranking method.

GK-Top10.png

Top-10 goalkeepers as ranked by their binomial shot-stopper-ranking. Post-shot refers to the expected save model that accounts for shot placement. Data via Opta.

I don’t know about you but that doesn’t look like too shabby a list of the top keepers. It may be that some of the names on the list have serious flaws in their game aside from shot-stopping but that will have to wait for another day and another analysis.

So where does that leave us in terms of goalkeeping analytics? On one hand, we have noisy unrepeatable metrics from season-to-season. On the other, we appear to have some methods available to extract the signal from the noise over larger samples. Even then, we might be being fooled by aspects not included in the model or the simple fact that we expect to observe outliers.

Deficiencies in the model are likely our primary concern but these should be checked by a skilled eye and video clips, which should already be part of the review process (quit sniggering at the back there). Consequently, the risks ingrained in using an imperfect model can be at least partially mitigated.

Requiring 2-3 seasons of data to get a truly robust view on shot-stopping ability may be too long in some cases. However, perhaps we can afford to take a longer-term view for such an important position that doesn’t typically see too much turnover of personnel compared to other positions. The level of confidence you might want when short-listing might well depend on the situation at hand; perhaps an 80% chance of your target being an above average shot-stopper would be palatable in some cases?

All this is to say that I think you can assess goalkeepers by the saves they do or do not make. You just need to be willing to embrace a little uncertainty in the process.

On the anatomy of a counter-attack

Originally published on StatsBomb.

One of the most enduring aspects of football is the multitude of tactical and stylistic approaches that can be employed to be successful. Context is king in analytics and football as a whole, so the ability to identify and quantify these approaches is crucial for both opposition scouting and player transfer profiles.

At the OptaPro Forum this year, I looked at data from the past five Premier League seasons and used a sprinkling of maths to categorise shots into different types.

One such style I identified was ‘fast attacks from deep’, which were a distinct class of shots born of fast and direct possessions originating in the defensive zone. While these aren’t entirely synonymous with counter-attacks, there is likely a lot of overlap; the classical counter-attack is likely a subset of the deep fast-attacks identified in the data.

These fast-attacks from deep typically offer good scoring chances, with above average shot conversion (10.7%) due to the better shot locations afforded to them. They made up approximately 23% of the shots in my analysis.

So what do they look like?

To provide an overview of the key features of these attacks, I’ve averaged them together to get a broad picture of their progression up the pitch. I’ve presented this below and included a look at attacks from deep that involve more build-up play for comparison.

Fast-attacks_vs_Build-up

Comparison between fast-attacks from deep and attacks from deep that focus on slower build-up play. Vertical pitch position refers to the progression of an attack towards the opponent’s goal (vertical pitch position equal to 100). Both attack types start and end in similar locations on average but their progress with time is quite different. The shading is the standard deviation to give an idea of the spread inherent in the data. Data via Opta.

Fast-attacks from deep are characterised by an initial speedy progression towards goal within a team’s own half, followed by a steadier advance in the attacking half. This makes sense qualitatively as counter-attacks often see a quick transition in their early stages to properly establish the attacking opportunity. The attack can then be less frenetic as a team seeks to create the best opportunity possible from the situation.
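For anyone wanting to build a similar averaged profile, one possible approach is sketched below. The event layout, the two-second time bins and the function name are assumptions for illustration, not a description of how the graphic was actually produced.

```python
from collections import defaultdict
from statistics import mean, pstdev


def average_progression(possessions, bin_seconds=2.0):
    """Average vertical pitch position per time bin across possessions.

    possessions: list of possessions, each a list of
    (seconds_since_possession_start, vertical_position) events,
    with vertical_position on a 0-100 scale (100 = opponent's goal).
    Returns {bin_start_seconds: (mean_position, std_dev)}.
    """
    bins = defaultdict(list)
    for events in possessions:
        for seconds, position in events:
            bins[int(seconds // bin_seconds)].append(position)
    return {
        index * bin_seconds: (mean(values), pstdev(values))
        for index, values in sorted(bins.items())
    }


# Two invented possessions that both start deep and end near goal.
example_possessions = [
    [(0.0, 20.0), (3.0, 50.0), (6.0, 80.0)],
    [(1.0, 30.0), (2.5, 60.0), (7.0, 90.0)],
]
profile = average_progression(example_possessions)
```

The mean gives the central line for each attack type and the standard deviation gives the shaded spread around it.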

Over the past five seasons, the standout teams as rated by shot volume and expected goals have been various incarnations of Arsenal, Manchester City, Chelsea and Liverpool.

The architects

Player-level metrics can be used to figure out who the crucial architects of a counter-attacking situation are. One method of examining this is how many yards a player’s passing progressed the ball during deep fast-attacking possessions.
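A rough sketch of such a metric is given below. Several details are my assumptions rather than anything stated above: pass coordinates are taken to run from 0 to 100 along the length of the pitch, the pitch is taken to be 115 yards long, and backwards passes are counted as zero progression rather than negative. The names are hypothetical.

```python
PITCH_LENGTH_YARDS = 115.0  # assumed pitch length; real pitches vary


def progression_per_90(passes, minutes_played):
    """Yards of forward ball progression from passing, per 90 minutes.

    passes: list of (start_x, end_x) for completed passes during deep
    fast-attacking possessions, on a 0-100 scale towards the opponent's goal.
    """
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    yards = sum(
        max(0.0, end_x - start_x) * PITCH_LENGTH_YARDS / 100.0
        for start_x, end_x in passes
    )
    return yards * 90.0 / minutes_played


# Three invented passes over 1800 minutes; the backwards pass adds nothing.
example_passes = [(30.0, 60.0), (50.0, 45.0), (40.0, 80.0)]
per_90 = progression_per_90(example_passes, minutes_played=1800)
```

Whether backwards passes should subtract from the total is a design choice; clipping at zero rewards the forward passes without penalising a player for recycling possession.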

Below I’ve listed the top 10 players from the 2016/17 season by this metric on a per 90 minute basis, alongside some other metrics for your delectation.

BallProgression_SummaryTable

Top players ranked by ball progression per 90 minutes (in yards) during fast-attacks from deep for the 2016/17 Premier League season. xGoals and Goals per 90 are for possessions that a player is involved in (known as xGChain in some parts). Players with more than 1800 minutes only. Data via Opta.

While the focus was often on him kicking people rather than the ball, we see that Granit Xhaka stands alone in terms of ball progression, with Daley Blind a long way behind him in second place. Xhaka’s long-range passing skills are well known, so combining this with the most passes per 90 in such situations propels him to the top of the pile.

The graphic below illustrates Xhaka’s passing during deep fast-attacks, with his penchant for long passes spread all over the midfield zone evident. For comparison, I’ve included Eden Hazard’s passing map as someone who played many important passes that were limited in terms of ball progression as they were typically shorter or lateral passes in the final third.

Xhaka_Hazard_PassMaps.png

Passes played by Granit Xhaka and Eden Hazard during fast-attacks from deep during the 2016/17 season. Solid circles denote pass origin, while the arrows indicate the direction and end point of each pass. Data via Opta.

Evidently there is a link between position and ball progression, as players in deeper positions have greater scope to progress the ball as they have more grass in front of them. The likes of Coutinho, Özil and De Bruyne residing so high up the rankings is therefore impressive.

Coutinho_DeBruyne_PassMaps.png

Passes played by Philippe Coutinho and Kevin De Bruyne during fast-attacks from deep during the 2016/17 season. Data via Opta.

Coutinho’s passing chalkboard above illustrates his keen eye for a pass from midfield areas through opposition defensive lines, as does De Bruyne’s ability to find teammates inside the penalty area. De Bruyne’s contribution actually ranks highest in terms of xG per 90 for the past season.

The finishers

While ball progression through the defensive and midfield zones is important for these fast-attacks from deep, they still require the finishing touches in the final third. There are few more frustrating sights in football than watching a counter-attack be botched in its final moments.

The graphic below summarises the top players in this crucial aspect by examining their expected goal and assist outputs. Unsurprisingly, Kevin De Bruyne leads the way here and is powered by his exceptional creative passing.

xGandxA_DFA

Top-20 players rated by expected goals (xG) plus expected assists (xA) for fast-attacks from deep. Players with more than 1800 minutes only. Data via Opta.

The list is dominated by players from the top-6 clubs, with Negredo the only interloper inside the top-10 ranking. Middlesbrough’s minimal attacking output left few scraps of solace for Negredo but at least he did get a few shots away in these high-value situations to alleviate the boredom.

Conclusion

The investigation of tactical and stylistic approaches carried out above merely scratches the surface of possibilities for opposition scouting and player profiling.

Being able to identify ‘successful’ attacking moves opens the door to examining ‘failed’ possessions, which would allow efficiency to be studied as well as defensive aspects. This is an area rich with promise that I’ll examine in the future, along with other styles identified within the same framework.