Saturday 2 May 2015

Premier League 2014-15 Football Match Analysis

Premier League 2014-15 Football Match Analysis (prem)

The prem notebook demonstrates the power of IPython Notebook, python, pandas and matplotlib to analyse a data set. The data set I am using is the English Premier League 2014-15 football match results as of 27th April 2015.




The notebook answers questions such as: 
  • Which team scored the most goals in a match?
  • Which team had the most shots in a match?
  • Which team has the best shots per goal ratio?
  • Which referees brandish the most red cards?
  • What is the average number of goals per game?
  • Which teams have drawn the most and the fewest games?
  • What are the most and least common scores?
  • What does the graph comparing the top team performance look like?
  • What does the graph comparing the bottom team performance look like?
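To give a flavour of how pandas answers a few of the questions above, here is a minimal sketch. It is not the code from the notebook. It assumes the football-data.co.uk column layout (FTHG/FTAG for full-time home/away goals, HS/AS for shots, HR/AR for red cards), and the team names, referees and results in the sample are made up purely for illustration.

```python
import io

import pandas as pd

# Made-up sample rows in the assumed football-data.co.uk column layout.
# These are NOT real Premier League results.
SAMPLE_CSV = """HomeTeam,AwayTeam,FTHG,FTAG,HS,AS,HR,AR,Referee
Alpha,Bravo,2,1,10,8,0,1,Ref One
Bravo,Charlie,0,0,5,7,0,0,Ref Two
Charlie,Alpha,1,3,9,12,1,0,Ref One
Alpha,Charlie,0,0,6,6,0,0,Ref Two
"""

matches = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Average number of goals per game.
avg_goals = (matches["FTHG"] + matches["FTAG"]).mean()

# How often each score occurs, written as "home-away".
scores = matches["FTHG"].astype(str) + "-" + matches["FTAG"].astype(str)
score_counts = scores.value_counts()

# Total red cards shown by each referee.
reds_by_ref = (matches["HR"] + matches["AR"]).groupby(matches["Referee"]).sum()

# Number of draws each team has been involved in.
draws = matches[matches["FTHG"] == matches["FTAG"]]
draw_counts = pd.concat([draws["HomeTeam"], draws["AwayTeam"]]).value_counts()

# Stack home and away rows so each team has one row per match played,
# then total the goals and shots per team.
home = matches[["HomeTeam", "FTHG", "HS"]].rename(
    columns={"HomeTeam": "team", "FTHG": "goals", "HS": "shots"})
away = matches[["AwayTeam", "FTAG", "AS"]].rename(
    columns={"AwayTeam": "team", "FTAG": "goals", "AS": "shots"})
per_team = pd.concat([home, away]).groupby("team").sum()

# Fewer shots needed per goal means a better ratio.
per_team["shots_per_goal"] = per_team["shots"] / per_team["goals"]

print("Average goals per game:", avg_goals)
print("Most common score:", score_counts.idxmax())
print("Best shots per goal ratio:", per_team["shots_per_goal"].idxmin())
```

Stacking the home and away columns into one per-team table is the step that makes the team-level questions easy, because every match counts for both sides.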



View the notebook

You can view the output from the notebook by simply clicking on the prem notebook link. This runs nbviewer on the notebook.


Download and run the IPython notebook

Alternatively you can download the notebook and take advantage of the IPython Notebook interactive computing environment. You can run the code in each cell and, if you are in the mood, hack away to answer your own questions of the data.
The notebook and data file are stored in the prem repository on GitHub; see the GitHub bootcamp instructions for an introduction to GitHub.
To download prem:
$ git clone https://github.com/terry.dolan/prem.git project-prem
This will create a project-prem directory with all the key files.
$ cd project-prem
Now run the IPython Notebook:
$ ipython notebook prem-analysis.ipynb

Want an up to date data analysis?

Simply download the latest English Premier League stats CSV file into the data sub-directory in project-prem and re-run the notebook. Note that some of the comments in the notebook relate specifically to the data at 27th April 2015.
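Before re-running everything on a fresh download, it can help to check that the new file has the expected shape and to see how recent it is. The sketch below is one way to do that and is not part of the notebook. The helper name load_results, the list of required columns and the example file name are my assumptions; the date format (day first) is what I would expect from football-data.co.uk files.

```python
import pandas as pd

# Columns this sketch assumes the analysis relies on (an assumption,
# not a list taken from the notebook).
REQUIRED_COLUMNS = ["Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG", "FTR"]


def load_results(csv_path):
    """Load a match results CSV and fail early if key columns are missing."""
    results = pd.read_csv(csv_path)
    missing = [col for col in REQUIRED_COLUMNS if col not in results.columns]
    if missing:
        raise ValueError("CSV is missing expected columns: %s" % ", ".join(missing))
    # Drop completely empty rows, which sometimes trail the real data.
    results = results.dropna(how="all")
    # Dates are assumed to be written day first, e.g. 26/04/15.
    results["Date"] = pd.to_datetime(results["Date"], dayfirst=True)
    return results


if __name__ == "__main__":
    # Hypothetical file name; use whatever name your download has.
    results = load_results("data/latest-results.csv")
    print("Matches loaded:", len(results))
    print("Latest match date:", results["Date"].max().date())
```

If the latest match date is later than 27th April 2015, expect some of the written comments in the notebook to no longer match the numbers.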

Acknowledgements

Thanks to http://www.football-data.co.uk for the raw data. And to the community behind python, ipython, pandas and matplotlib. IPython Notebook is an excellent rapid prototyping environment. We've hardly scratched the surface of what can be done with pandas. Hopefully this has whetted your appetite for data munging and analysis.
