Building a Football Prediction Model

Could I program a model to accurately predict the result of a football match? Maybe. In this post I want to document my experience of building a match-prediction model from scratch.

Background

One thing I learned from reading research papers is that a football match isn’t a single event: it can be broken down into a series of attacks by each team. Some teams create attacks more frequently, some convert their chances more efficiently, and others defend well enough to suppress them altogether.

So rather than asking who will win, it is more effective to ask how many goals each team will score.

Foundations

M. J. Maher’s 1982 paper, ‘Modelling association football scores’, described every team using two parameters:

  1. Attacking strength (how often a team creates quality chances)
  2. Defensive strength (how well a team prevents goals)

In Python, this can be represented by giving each team an attack rating and a defence rating.
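A minimal sketch of that idea follows. Maher estimated his parameters by maximum likelihood; this sketch instead assumes a simpler, commonly used ratio form in which each rating is relative to the league average, so a `defence` above 1.0 means a team concedes more than average (a weaker defence). The `Team` class, `expected_goals` function and the ratings for "Team A" and "Team B" are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Team:
    name: str
    attack: float   # > 1.0: scores more than the league average
    defence: float  # > 1.0: concedes more than the league average (weaker defence)


def expected_goals(attacking: Team, defending: Team, league_avg_goals: float) -> float:
    """Expected goals for `attacking` against `defending` in a simple multiplicative model."""
    return attacking.attack * defending.defence * league_avg_goals


# Hypothetical ratings, for illustration only
team_a = Team("Team A", attack=1.2, defence=0.8)
team_b = Team("Team B", attack=0.9, defence=1.1)

print(round(expected_goals(team_a, team_b, league_avg_goals=1.4), 2))  # 1.85
print(round(expected_goals(team_b, team_a, league_avg_goals=1.4), 2))  # 1.01
```

Multiplying the ratings means a strong attack facing a weak defence is boosted twice, which is the intuition behind Maher’s two-parameter description.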

Poisson Distribution

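The Poisson distribution gives the probability of a given number of events occurring within a fixed period of time, assuming the events happen independently at a constant average rate. Applied to football, it turns a team’s expected goals for a match into a probability for each possible number of goals.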
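Written out, if λ is a team’s expected goals for the match and X is the number of goals it actually scores, the standard Poisson formula is:

```latex
P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots
```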

For example, if a team scores on average 1.4 goals per match, the Poisson distribution tells us how likely they are to score 0 goals, 1 goal, 2 goals, etc. Crucially, it doesn’t say they will score 1.4 goals. It says that 1.4 is the centre of gravity around which outcomes fluctuate.
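The 1.4-goal example can be checked with a few lines of Python using only the standard library. The `poisson_pmf` helper name is my own; it simply implements the formula above.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number of goals is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = 1.4  # average goals per match from the example above
for k in range(6):
    print(f"P({k} goals) = {poisson_pmf(k, lam):.3f}")
```

This prints roughly 0.247, 0.345, 0.242, 0.113, 0.039 and 0.011 for 0 to 5 goals, so a single goal is the most likely outcome even though the average is 1.4.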

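This is useful because it lets us calculate the probability of every scoreline: if the two teams’ goal counts are treated as independent, the probability of a given score is the product of each team’s probability of scoring that many goals. Adding up the scorelines where the home team scores more gives the probability of a home win, and doing the same for level scorelines and away-team-ahead scorelines gives the probabilities of a draw and an away win.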
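A sketch of that summation, assuming independent Poisson goal counts for each side and cutting the scoreline grid off at 10 goals (the probability beyond that is negligible for typical values). The function names and the 1.6 vs 1.1 expected goals are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probabilities(home_xg: float, away_xg: float, max_goals: int = 10):
    """Return (home win, draw, away win) probabilities.

    Assumes home and away goals are independent Poisson variables and ignores
    scorelines above max_goals.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            # Probability that the final score is exactly h-a
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


home, draw, away = outcome_probabilities(1.6, 1.1)  # hypothetical expected goals
print(f"Home {home:.1%}  Draw {draw:.1%}  Away {away:.1%}")
```

The independence assumption is a simplification; the later Dixon and Coles model adjusts low-scoring results such as 0-0 and 1-1 for this reason.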

I found an online dataset listing every Premier League score from this season. I then built the model in Python and set up an Excel sheet to track the actual scores alongside the model’s expected goals (xG).
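One way to get the model’s output into Excel, sketched under my own assumptions, is to write a CSV file (which Excel opens directly) with one row per fixture. The column names, the `write_tracking_sheet` function and the example row are hypothetical.

```python
import csv
import io

# Hypothetical column layout for the tracking sheet
FIELDS = ["gameweek", "home", "away", "home_xg", "away_xg", "home_goals", "away_goals"]


def write_tracking_sheet(rows, f):
    """Write one row per fixture; actual goals stay blank until the match is played."""
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"gameweek": 1, "home": "Team A", "away": "Team B",
     "home_xg": 1.85, "away_xg": 1.01, "home_goals": "", "away_goals": ""},
]

# An in-memory buffer keeps the example self-contained; in practice, pass a file
# opened with open("predictions.csv", "w", newline="").
buffer = io.StringIO()
write_tracking_sheet(rows, buffer)
print(buffer.getvalue())
```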

Results – Week 1

Ten games are played in each gameweek. Because I started this project in the middle of the season, I built the model from the teams’ previous games.
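A sketch of turning previous results into ratings, assuming the simple ratio approach rather than Maher’s maximum-likelihood fit: attack is goals scored per game divided by the league average, and defence is goals conceded per game divided by the same average. It ignores home advantage and the strength of the opponents each team has already faced. The `team_ratings` function and the three results are hypothetical.

```python
from collections import defaultdict


def team_ratings(results):
    """Estimate ratings from (home, away, home_goals, away_goals) tuples.

    attack  = goals scored per game   / league average goals per team per game
    defence = goals conceded per game / league average goals per team per game
    """
    scored = defaultdict(int)
    conceded = defaultdict(int)
    played = defaultdict(int)
    total_goals = 0
    for home, away, hg, ag in results:
        scored[home] += hg
        conceded[home] += ag
        scored[away] += ag
        conceded[away] += hg
        played[home] += 1
        played[away] += 1
        total_goals += hg + ag

    league_avg = total_goals / (2 * len(results))  # goals per team per game
    attack = {t: scored[t] / played[t] / league_avg for t in played}
    defence = {t: conceded[t] / played[t] / league_avg for t in played}
    return attack, defence, league_avg


# Hypothetical results, for illustration only
results = [("A", "B", 2, 0), ("B", "C", 1, 1), ("C", "A", 0, 3)]
attack, defence, league_avg = team_ratings(results)
print(attack, defence, league_avg)
```

With only a handful of games the ratings swing wildly (team A’s defence rating here is 0 after two clean sheets), which is one reason early-season predictions are unreliable.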

Across the 20 team scores (two per match), the model predicted the exact number of goals 6 times, an accuracy of 30%. That is not great, but the best models reach around 55-65% accuracy, so a good goal for this project is to push that figure up consistently.
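The accuracy figure depends on how an expected-goals value is turned into a single predicted goal count. A sketch, assuming the prediction is the most likely count under the Poisson distribution (rounding the xG is another option that can give different answers). The function names and the sample values are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


def most_likely_goals(xg: float, max_goals: int = 10) -> int:
    """The single most likely goal count under a Poisson distribution with mean xg."""
    return max(range(max_goals + 1), key=lambda k: poisson_pmf(k, xg))


def exact_goal_accuracy(predictions):
    """predictions: list of (model xG, actual goals) pairs, one per team per match."""
    hits = sum(most_likely_goals(xg) == actual for xg, actual in predictions)
    return hits / len(predictions)


# Hypothetical values, for illustration only
sample = [(1.4, 1), (0.8, 2), (2.5, 2)]
print(f"{exact_goal_accuracy(sample):.0%}")  # 67%
```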

A few things happened in the football world that weren’t factored into the model. The most notable was the sacking of head coach Ruben Amorim before the Manchester United vs Manchester City game, and his replacement by Michael Carrick. This could explain United’s surprise win against City, as the ‘new manager bounce’ is a well-known effect in sport.

Conclusions and next steps

The model will have to adapt over time: form rises and falls, players get injured, and squads change. Instead of calculating the expected goals (xG) of a team as a whole, I could try to calculate the xG of individual players. This would require a predicted lineup for both teams, which would then be factored into the model.
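A rough sketch of the player-level idea, under loudly stated assumptions: each predicted starter contributes their xG per 90 minutes scaled by the minutes they are expected to play, and the total is optionally scaled by the opponent’s defence rating. This additive approach is my own simplification; it ignores the fact that a player’s xG depends on teammates and tactics. The function name, player names and numbers are hypothetical.

```python
def lineup_xg(predicted_lineup, xg_per_90, expected_minutes=None, opponent_defence=1.0):
    """Team expected goals from a predicted lineup (a simple additive sketch).

    predicted_lineup: list of player names
    xg_per_90: player name -> xG per 90 minutes (unknown players count as 0)
    expected_minutes: player name -> expected minutes (defaults to 90)
    opponent_defence: > 1.0 for a weaker-than-average defence
    """
    expected_minutes = expected_minutes or {}
    total = 0.0
    for player in predicted_lineup:
        minutes = expected_minutes.get(player, 90)
        total += xg_per_90.get(player, 0.0) * minutes / 90
    return total * opponent_defence


# Hypothetical players and numbers, for illustration only
xg_per_90 = {"Striker": 0.5, "Winger": 0.3, "Midfielder": 0.1}
lineup = ["Striker", "Winger", "Midfielder"]
print(lineup_xg(lineup, xg_per_90, expected_minutes={"Winger": 45}))
```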

I aim to track the accuracy of this model and, ultimately, to try to find value bets on betting exchanges.
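A bet has value when the model’s probability for an outcome is higher than the probability implied by the odds. A sketch for backing an outcome at decimal odds, assuming the exchange takes a commission on net winnings only; the commission rate is a parameter here, not a real exchange’s figure, and the 50% / 2.2 example is hypothetical.

```python
def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal odds, ignoring any margin."""
    return 1.0 / decimal_odds


def back_bet_ev(model_prob: float, decimal_odds: float, commission: float = 0.0) -> float:
    """Expected profit per 1 unit staked when backing an outcome.

    commission: assumed fraction of net winnings taken by the exchange
    (e.g. 0.02 for 2%); check the exchange's own terms for the real rate.
    """
    win_profit = (decimal_odds - 1.0) * (1.0 - commission)
    return model_prob * win_profit - (1.0 - model_prob)


# Hypothetical: the model gives a home win 50%, and the exchange offers 2.2
p, odds = 0.50, 2.2
print(f"{implied_probability(odds):.3f}")  # 0.455
print(f"{back_bet_ev(p, odds):.3f}")       # 0.100
```

A positive expected value only means something if the model’s probabilities are well calibrated, which is why tracking accuracy comes first.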