More thinking about goalkeepers

Following my previous article on the shot-stopping ability of goalkeepers, Mike Goodman posed an interesting question on Twitter:

This is certainly not an annoying question and I tend to think that such questions should be encouraged in the analytics community. Greater discussion should stimulate further work and enrich the community.

It certainly stands to reason and observation that goalkeepers can influence a striker's options and decision-making when shooting, but extracting robust signals of such a skill may prove problematic.

To try and answer this question, I built a quick model to calculate the likelihood that a non-blocked shot would end up on target. It’s essentially the same model as in my previous post but for expected shots on target rather than goals. The idea behind the model is that goalkeepers who are able to ‘force’ shots off-target would have a net positive rating when subtracting actual shots on target from the expected rate.
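As a rough illustration of the rating described above, the sketch below sums the modelled on-target probabilities for the shots each goalkeeper faced and subtracts the shots that actually ended up on target. The tuple layout, the goalkeeper names and the probabilities are all made up for the example; the real model and data are not reproduced here.

```python
from collections import defaultdict

def keeper_ratings(shots):
    """Sum expected minus actual shots on target for each goalkeeper.

    `shots` is an iterable of (goalkeeper, p_on_target, on_target) tuples,
    where p_on_target is the model probability that the shot is on target.
    The layout is hypothetical, chosen only for this example.
    """
    expected = defaultdict(float)
    actual = defaultdict(int)
    for keeper, p_on_target, on_target in shots:
        expected[keeper] += p_on_target
        actual[keeper] += int(on_target)
    # Positive rating: fewer shots on target than the model expected.
    return {keeper: expected[keeper] - actual[keeper] for keeper in expected}

# Made-up shots, not real data.
shots = [
    ("Keeper A", 0.40, False),
    ("Keeper A", 0.30, False),
    ("Keeper A", 0.50, True),
    ("Keeper B", 0.50, True),
    ("Keeper B", 0.25, True),
]
ratings = keeper_ratings(shots)
```

With these numbers, Keeper A faces 1.2 expected shots on target and concedes one, so comes out slightly positive, while Keeper B comes out negative.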

When I looked at the results, two of the standout names were Gianluigi Buffon and Jan Oblak; Buffon is a legend of the game and up there with the best of all time, while Oblak is certainly well regarded, so not a bad start.

However, after delving a little deeper, dragons started appearing in the analysis.

In theory, goalkeepers influencing shot-on-target rates would do so for shots closer to goal, as their positioning would narrow the amount of goal that the shooter can aim for. However, I found the exact opposite. Further investigation of the model workings pointed to the problem: the model showed significant biases depending on whether the shot was inside or outside the area.

This is shown below where actual and expected shot-on-target totals for each goalkeeper are compared. For shots inside the box, the model tends to under-predict, while the opposite is the case for outside the box shots. These two biases cancelled each other out when looking at the full aggregated numbers (the slope was 0.998 for total shots-on-target vs the expected rate).


Actual vs expected shots-on-target totals for goalkeepers considered in the analysis. Dashed line is the 1:1 line, while the solid line is the line of best fit. Left-hand plot is for shots inside the box, while the right-hand plot is for shots outside the box. Data via Opta.
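A check along these lines can be sketched with an ordinary least-squares fit of actual against expected totals, run separately for each subset of shots. The per-goalkeeper totals below are invented to mimic the pattern described (under-prediction inside the box, over-prediction outside), and `fit_line` is a helper written for this example rather than the code behind the plots.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up per-goalkeeper totals: (expected, actual) shots on target.
inside_box = [(50.0, 56.0), (80.0, 90.0), (120.0, 134.0)]
outside_box = [(30.0, 26.0), (45.0, 39.0), (70.0, 61.0)]

for label, totals in (("inside", inside_box), ("outside", outside_box)):
    expected = [e for e, _ in totals]
    actual = [a for _, a in totals]
    slope, intercept = fit_line(expected, actual)
    print(f"{label} box: slope={slope:.3f}, intercept={intercept:.2f}")
```

A slope well away from 1 in either subset is a warning sign even when the aggregated fit looks close to the 1:1 line.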

The upshot of this was that goalkeepers performing well-above expectation were doing so due to shots from longer-range being off-target when compared to the expected rates for the model. I suspect that the lack of information on defensive pressure is skewing the results and introducing bias into the model.

Now when we think of Buffon and Oblak performing well, we recall that they play behind probably the two best defenses in Europe at Juventus and Atlético respectively. Rather than ascribing the over-performance to goalkeeping skill, the effect is likely driven by the defensive pressure applied by their team-mates and issues with the model.

Exploring model performance is something I’ve written about previously and I would also highly recommend this recent article by Garry Gelade on assessing expected goals. While the above is an unsatisfactory ending for the analysis, it does illustrate the importance of testing model output prior to presenting results and testing whether such results match with our theoretical expectations.

Knowing what questions analytics can and cannot answer is pretty useful. Better luck next time hopefully.


On the anatomy of a counter-attack

Originally published on StatsBomb.

One of the most enduring aspects of football is the multitude of tactical and stylistic approaches that can be employed to be successful. Context is king in analytics and football as a whole, so the ability to identify and quantify these approaches is crucial for both opposition scouting and player transfer profiles.

At the OptaPro Forum this year, I looked at data from the past five Premier League seasons and used a sprinkling of maths to categorise shots into different types.

One such style I identified was ‘fast attacks from deep’, which were a distinct class of shots born of fast and direct possessions originating in the defensive zone. While these aren’t entirely synonymous with counter-attacks, there is likely a lot of overlap; the classical counter-attack is likely a subset of the deep fast-attacks identified in the data.

These fast-attacks from deep typically offer good scoring chances, with above-average shot conversion (10.7%) due to the better shot locations afforded to them. They made up approximately 23% of the shots in my analysis.

So what do they look like?

To provide an overview of the key features of these attacks, I’ve averaged them together to get a broad picture of their progression up the pitch. I’ve presented this below and included a look at attacks from deep that involve more build-up play for comparison.


Comparison between fast-attacks from deep and attacks from deep that focus on slower build-up play. Vertical pitch position refers to the progression of an attack towards the opponent’s goal (vertical pitch position equal to 100). Both attack types start and end in similar locations on average but their progress with time is quite different. The shading is the standard deviation to give an idea of the spread inherent in the data. Data via Opta.
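One way to build this kind of averaged profile is sketched below. It assumes each possession is a time-sorted list of (seconds, vertical position) points, linearly interpolates every possession onto a common set of times, and then takes the mean and standard deviation at each time. The interpolation, the clamping of shorter possessions to their final position and the sample tracks are all assumptions made for the example, not a description of how the figure was produced.

```python
from statistics import mean, pstdev

def position_at(track, t):
    """Linearly interpolate vertical position at time t.

    `track` is a time-sorted list of (seconds, vertical_position) points;
    times outside the track are clamped to its first or last point.
    """
    if t <= track[0][0]:
        return track[0][1]
    for (t0, y0), (t1, y1) in zip(track, track[1:]):
        if t <= t1:
            return y0 + (y1 - y0) * (t - t0) / (t1 - t0)
    return track[-1][1]

def average_progression(tracks, times):
    """Mean and standard deviation of position across tracks at each time."""
    summary = []
    for t in times:
        positions = [position_at(track, t) for track in tracks]
        summary.append((t, mean(positions), pstdev(positions)))
    return summary

# Two made-up possessions on a 0-100 vertical scale.
tracks = [
    [(0, 20), (4, 60), (10, 90)],
    [(0, 30), (5, 70), (10, 80)],
]
profile = average_progression(tracks, times=[0, 5, 10])
```

Each entry of `profile` holds a time, the mean position (the line in the plot) and the standard deviation (the shading).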

Fast-attacks from deep are characterised by an initial speedy progression towards goal within a team’s own half, followed by a steadier advance in the attacking half. This makes sense qualitatively as counter-attacks often see a quick transition in their early stages to properly establish the attacking opportunity. The attack can then be less frenetic as a team seeks to create the best opportunity possible from the situation.

Over the past five seasons, the standout teams as rated by shot volume and expected goals have been various incarnations of Arsenal, Manchester City, Chelsea and Liverpool.

The architects

Player-level metrics can be used to figure out who the crucial architects of a counter-attacking situation are. One method of examining this is how many yards a player’s passing progressed the ball during deep fast-attacking possessions.
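A minimal sketch of such a metric is given below. It assumes pass coordinates on a 0-100 scale along the length of the pitch, a nominal pitch length of 115 yards, and that only forward passes count towards progression; none of these choices is confirmed as the exact definition used for the rankings, and the sample passes are invented.

```python
PITCH_LENGTH_YARDS = 115.0  # assumed; real pitch lengths vary by ground

def progression_per_90(passes, minutes_played, only_forward=True):
    """Yards of ball progression from passes, scaled to 90 minutes.

    `passes` holds (start_x, end_x) pairs on a 0-100 scale where 100 is the
    opponent's goal line. Backward passes are ignored when only_forward
    is True (an assumption, not necessarily the article's choice).
    """
    total = 0.0
    for start_x, end_x in passes:
        gain = (end_x - start_x) * PITCH_LENGTH_YARDS / 100.0
        if gain > 0 or not only_forward:
            total += gain
    return total * 90.0 / minutes_played

# Made-up passes during deep fast-attacking possessions.
passes = [(30.0, 60.0), (55.0, 50.0), (40.0, 70.0)]
per_90 = progression_per_90(passes, minutes_played=1800)
```

Setting `only_forward=False` nets backward passes off against forward ones, which would penalise players who recycle possession.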

Below I’ve listed the top 10 players from the 2016/17 season by this metric on a per 90 minute basis, alongside some other metrics for your delectation.


Top players ranked by ball progression per 90 minutes (in yards) during fast-attacks from deep for the 2016/17 Premier League season. xGoals and Goals per 90 are for possessions that a player is involved in (known as xGChain in some parts). Players with more than 1800 minutes only. Data via Opta.

While the focus was often on him kicking people rather than the ball, we see that Granit Xhaka stands alone in terms of ball progression, with Daley Blind a long way behind him in second place. Xhaka’s long-range passing skills are well known, so combining this with the most passes per 90 in such situations propels him to the top of the pile.

The graphic below illustrates Xhaka’s passing during deep fast-attacks, with his penchant for long passes spread all over the midfield zone evident. For comparison, I’ve included Eden Hazard’s passing map as someone who played many important passes that were limited in terms of ball progression as they were typically shorter or lateral passes in the final third.


Passes played by Granit Xhaka and Eden Hazard during fast-attacks from deep during the 2016/17 season. Solid circles denote pass origin, while the arrows indicate the direction and end point of each pass. Data via Opta.

Evidently there is a link between position and ball progression, as players in deeper positions have greater scope to progress the ball as they have more grass in front of them. The likes of Coutinho, Özil and De Bruyne residing so high up the rankings is therefore impressive.


Passes played by Philippe Coutinho and Kevin De Bruyne during fast-attacks from deep during the 2016/17 season. Data via Opta.

The passing chalkboards above illustrate Coutinho’s keen eye for a pass from midfield areas through opposition defensive lines and De Bruyne’s ability to find teammates inside the penalty area. De Bruyne’s contribution actually ranks highest in terms of xG per 90 for the past season.

The finishers

While ball progression through the defensive and midfield zones is important for these fast-attacks from deep, they still require the finishing touches in the final third. There are few more frustrating sights in football than watching a counter-attack be botched in its final moments.

The graphic below summarises the top players in this crucial aspect by examining their expected goal and assist outputs. Unsurprisingly, Kevin De Bruyne leads the way here and is powered by his exceptional creative passing.


Top-20 players rated by expected goals (xG) plus expected assists (xA) for fast-attacks from deep. Players with more than 1800 minutes only. Data via Opta.

The list is dominated by players from the top-6 clubs, with Negredo the only interloper inside the top-10 ranking. Middlesbrough’s minimal attacking output left few scraps of solace for Negredo but at least he did get a few shots away in these high-value situations to alleviate the boredom.


The investigation of tactical and stylistic approaches carried out above merely scratches the surface of possibilities for opposition scouting and player profiling.

Being able to identify ‘successful’ attacking moves opens the door to examining ‘failed’ possessions, which would allow efficiency to be studied as well as defensive aspects. This is an area rich with promise that I’ll examine in the future, along with other styles identified within the same framework.

OptaPro Analytics Forum 2016 accepting abstract proposals

OptaPro are inviting proposals to present at their Analytics Forum, which according to their announcement:

aims to connect football clubs with analytical communities and experts working outside of the professional game

This will be the third year that the forum has taken place and an impressive number of clubs and other football organisations are represented at the forum, along with plenty of laptop gurus with no relevant playing experience.

I was lucky/skillful enough to have my proposal accepted last year, so I thought it might be useful if I posted my abstract as an example. I’m told that the judges liked it as it was tailored to the audience i.e. club analysts.

When I wrote it, my aim was to define a clear and (hopefully) relevant question and give some idea of how feasible it was and how it could be used. I posted the slides and video of my presentation here if you want to check it out.

If you’re thinking of submitting, then I would highly recommend it. The forum is a great way to meet others working in football analytics and as a member of the online analytics community, it was great to properly meet people I had ‘known’ via Twitter. Presenting was a valuable experience also and led to interesting discussions with people during and after the event.

The closing date for submissions is midnight Sunday 18th October. My abstract is below and good luck with your submissions.

Finding square pegs for square holes: identifying player types for scouting

Proposed area of study: player evaluation

Proposed method: Principal component analysis and cluster analysis of on-ball player data

One consideration when scouting potential player signings is how well they will fit into their new team environment. A common criticism of a perceived failed player transfer is that the player was a “square peg for a round hole”. This study will aim to identify certain player types based on their statistical output to aid finding the “right fit” when scouting players.

I propose using Principal Component Analysis (PCA) to distinguish players based on their underlying performance data (specifically Opta’s on-ball data). PCA is an ideal method for exploring datasets with multiple variables in order to discern patterns in the underlying data. This study builds on my previous analysis that used a similar method to study playing styles at the team level1. I will further extend this by applying cluster analysis to the data to group the players into certain types based on their attributes.

I have already explored the feasibility of this method using publicly available Opta data from and the results are promising. In order to extend the analysis for the forum, I would look to apply the method to more granular data, with a focus on player actions in open-play; the current dataset I have used groups all on-field actions together, which is not ideal. Furthermore, inclusion of location data would provide additional context for the analysis and aid differentiation of players and styles.

The persistence of player traits and classification will be assessed. Providing the dataset is large enough, it should be possible to test this persistence for players staying at the same team and for those who transfer to a new one. This will be a crucial aspect of the analysis and its utility.

The output from the analysis can serve as an additional tool when identifying potential transfer signings by categorising players according to their team role and providing statistical baselines for their performance compared to their peers. For example, the method separates different styles of central midfielders, such as deep-lying playmakers and defensive midfield “destroyers”. Players can then be compared against their peers in that style category based on the important traits of those player types.

By applying these techniques, this study aims to provide a more robust “apples-to-apples” comparison technique and find the appropriate square peg for the square hole in question.
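For readers unfamiliar with the method proposed in the abstract, the sketch below shows the general idea in a heavily simplified form. It standardises a handful of per-90 player metrics, finds only the leading principal component by power iteration, and splits the players into two groups with a basic one-dimensional k-means loop. The player labels, metrics and values are invented, and a real analysis would more likely keep several components and lean on a library such as scikit-learn rather than these hand-written helpers.

```python
import math

def standardise(rows):
    """Scale each column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
           for c, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def first_component(rows, iterations=200):
    """Leading principal component of standardised rows via power iteration."""
    n_cols = len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / len(rows) for j in range(n_cols)]
           for i in range(n_cols)]
    # Uneven start vector, so it is unlikely to be orthogonal to the answer.
    vec = [1.0 / (i + 1) for i in range(n_cols)]
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(n_cols))
               for i in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec

def two_means(scores, iterations=20):
    """Split 1-D scores into two clusters with a basic k-means loop."""
    centres = [min(scores), max(scores)]
    labels = [0] * len(scores)
    for _ in range(iterations):
        labels = [0 if abs(s - centres[0]) <= abs(s - centres[1]) else 1
                  for s in scores]
        for k in (0, 1):
            members = [s for s, lab in zip(scores, labels) if lab == k]
            if members:
                centres[k] = sum(members) / len(members)
    return labels

# Made-up per-90 metrics: passes, long passes, tackles, interceptions.
players = {
    "Playmaker 1": [70, 8.0, 1.5, 1.0],
    "Playmaker 2": [65, 7.0, 1.8, 1.2],
    "Playmaker 3": [72, 9.0, 1.2, 0.9],
    "Destroyer 1": [40, 2.0, 4.0, 3.0],
    "Destroyer 2": [45, 3.0, 3.6, 2.8],
    "Destroyer 3": [38, 2.5, 4.2, 3.3],
}
scaled = standardise(list(players.values()))
component = first_component(scaled)
scores = [sum(v * w for v, w in zip(row, component)) for row in scaled]
labels = dict(zip(players, two_means(scores)))
```

In this toy dataset the leading component contrasts passing volume with defensive actions, so the two invented player types fall into separate clusters.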

1Relevant blog posts available here: