Page 1: The effects of weather on football matches played within the German Bundesliga

National College of Ireland

Hdip

2013/2014

Alastair Macnair

x13129325

Can the weather kill goals?

The effects of weather on goal outcome for football

matches played within the German Bundesliga

Dissertation

Table of Contents

Executive Summary

1 Introduction

1.1 Background

1.2 Aims

1.2.1 Research Questions

1.3 Solution Overview

1.4 Structure

2 Literature Review and Related Work

2.1 Introduction

2.2 Statistical analysis in football

2.2.1 Football Tradition versus Data Analysis

2.2.2 Statistical Methods in sports

2.3 The effect of weather on sports performance

2.3.1 Weather, Altitude and measurement Indices.

2.3.2 Weather Effects in Sport

2.4 Conclusion

3 System and Datasets

3.1 Design and Architecture

3.2 Datasets

3.2.1 Data Set 1 – Bundesliga Football Results

3.2.2 Data Set 2 – Weather Observation Data for Germany

3.2.3 Data Set 3 - Stadiums

3.2.4 Data Set 4 - Weather Stations

3.2.5 Limitations of the Data

3.3 Data Processing

3.3.1 Introduction

3.3.2 Football Match Data

3.3.3 Stadium Data Set creation

3.3.4 Distance Matrix Calculation

3.3.5 Checking Altitude Differential

3.4 Dealing with Missing Data

3.5 Mashup

3.6 Feature Engineering

4 Testing, Evaluation & Error Checking

4.1 Introduction

4.1.1 Test 01 – Check Downloaded files

4.1.2 Test 02 – Football match data set

4.1.3 Test 03 – Weather Station data

4.1.4 Test 04 – Stadium Data

4.1.5 Test 05 – Distance Matrix

4.1.6 Test 06 – Imputed Data

4.1.7 Test 07 – Final Checking

4.2 Data analysis

4.2.1 Introduction

4.2.2 Data Analysis Process and Methodology

4.2.3 Exploratory Analysis

4.2.4 Total Goals Analysis

4.2.5 Over/Under 2.5 goals scored

4.2.6 Goal Difference

4.2.7 Home/Away Win or Draw

4.3 Altitude Effects

4.4 Scatter plots and correlation

4.5 Data Mining and Predictive Modelling

4.6 Analysis Conclusion

5 Conclusions

5.1.1 Introduction

5.1.2 Theoretical Implications

5.1.3 Conclusion

6 Further development or research

7 References

8 Appendix

8.1 Glossary of Terms

8.2 Geography

8.2.1 Map of Europe and Germany

8.2.2 Map of Germany

8.2.3 States of Germany (and created Regions)

8.2.4 Weather Station & Stadium Locations

8.2.5 Final weather station and Stadium locations

8.3 Data Sets

8.3.1 Football Data

8.3.2 Weather Observation Data

8.3.3 Stadium Data

8.3.4 Weather Station Data

8.4 Data Set Variables

8.5 Naïve Bayes probability outcomes

8.6 List of Figures and Tables

8.7 List of Tables

8.8 Initial Project Plan

8.9 Initial Requirement Specification

8.10 Management Progress Reports

8.10.1 Management Progress Report 1

8.10.2 Management Progress Report 2

8.10.3 Management Progress Report 3

8.11 Other Material Used

Executive Summary

This study seeks to investigate the relationship between football match goal

outcome and weather effects within Germany’s top two football leagues:

Bundesliga 1 and 2. Twenty-one seasons of historical football results for both

Bundesliga tiers, from 1993 until 2013, are linked to weather

observation data by matching each of the 80+ weather stations around Germany

with the nearest stadium where the matches were played. Five weather variables

are used in the study: average daily temperature, rainfall, cloud cover,

wind speed and humidity. These are investigated against four Goal Outcome

measures: Total Goals, Goal Difference, Under/Over 2.5 and Home/Away/Draw

results. The R Studio open source software is used to investigate and determine

whether any relationship between them exists.

The study finds that temperature has a small but measurable effect on total goals

scored, reducing goal averages very slightly as temperature decreases,

predominantly during the winter season. Other effects such as humidity and cloud

cover have no measurable effect, with goals scored being consistent across the

entire range. Wind and rain also seem to have no obvious trend except at the extreme

ranges, where they interfere with all goal outcome variables. However, this

interference effect has no obvious trend or pattern, is generally unpredictable and

is based on a very low sample size, given the inherently rare nature of such events,

and as such the results should be treated cautiously. Overall there is no clear

relationship or correlation between weather and goal outcome, and the study raises

the question of how well we really understand the influence of such factors on sport.

1 Introduction

1.1 Background

The sports industry within the European Union (EU) is estimated to contribute €294

billion to the economy, based on a recent study by the European Commission

(2012), and supports 2% of all EU jobs, or around 4.46 million employees. Germany

provides the highest number of these jobs at 1.15 million or almost 27% of all the

sports-related jobs within the EU. The Bundesliga, Germany’s top football league,

is one of the ‘big five’ within Europe, alongside the top leagues of England, Spain, Italy and

France. In 2013/14 the combined revenue of these top five leagues grew by 5% to

€9.8 billion, representing almost half of the entire European football market, which

was valued at €19.9 billion in the same period (Deloitte, 2014). The German

Bundesliga is characterised by strong cost control, with the lowest wage-to-revenue

ratio at 51%, which resulted in it being one of only two leagues in Europe to

generate an operating profit (€264m) for the sixth successive year in the 2013/14

season. In this context football should not be seen solely as a hobby or weekly

television event but as an increasingly important part of the European economy and a major

provider of employment within the European Union.

In 2012 Germany passed new laws to promote liberalisation of its gambling and

betting market. Marketing company MECN (2012) estimates that the German

sports betting market will grow to around €1.5 billion in 2015 alone. This is part of

a global sports betting market which is estimated to be valued at around €733 billion (BBC,

2014) with over 70% of that total being derived from football matches. Betting on

football matches became popular around the early 1920s within the UK with the

creation of the Football Pools (2014), one of the oldest gaming companies in the

world, allowing fans to predict matches and win money if they proved to be correct.

Given the value of this market there is a critical need for gaming companies to

ensure they fully understand the products they are providing, and all of the

variables that affect the odds of the betting instruments they provide to customers,

as mistakes could be costly.

Finally, trainers, teams and players are always seeking to gain competitive

advantage to ensure continuing or future success. Utilising statistical information

and data analytics is becoming increasingly important within the football and sports

sector as these groups seek to become both smarter and more efficient (Lewis,

2014). The majority of this analysis currently focusses on the players and there

has been very little analysis of the influence of external factors such as weather.

There are claims that suggest that weather, in particular temperature, could be a

factor in the outcome of football matches (British Weather Services, 2014).

Germany has a typically moderate and temperate

climate, with temperatures ranging from around zero degrees in winter to the

mid-twenties in the summer (ECA&D, 2014). The use of a more temperate climate like

Germany’s reduces the known influence that extreme temperatures have on sports

performance and health (Hong, 2014). However, throughout the 21-year period

there have been colder and warmer periods ranging from -15°C to 28°C, as well

as wetter and windier periods (ECA&D, 2014). Germany is large enough to have

distinct regions of weather patterns (Encyclopedia Britannica, 2014) and the

Bundesliga stadiums are distributed around Germany widely enough (Appendix

8.2.3) to be able to see if this regional weather has any measurable effect on goal

outcome results.

1.2 Aims

The aim of the study is to determine if specific weather factors, either

independently or in combination, including temperature, precipitation, wind speed,

humidity and cloud cover, have any measurable effect on the goal outcome of

football matches played within the German Bundesliga 1 and 2. The proposed

theory is that weather conditions such as lower or higher temperatures, wind or

rainfall may have an effect on the goal outcome. Football match data and historic

recorded weather observations can be linked using Germany’s weather station

network, which covers the entire country and consists of around 100 potentially

viable weather stations, by geo-locating the stadiums where each match was

played. By linking these data sets it will be determined if weather effects play a role

in goal outcome when considered across a time period of 21 years ranging from

the 1993/94 season until the 2013/14 season.
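
The linking step described above can be sketched as a nearest-neighbour search over great-circle distances. This is only an illustrative sketch in Python (the study itself used R Studio); the function names, the haversine formula as the distance measure, and the coordinates are assumptions made for this example and are not taken from the study's data sets.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_station(stadium, stations):
    """Return (station_name, distance_km) for the station closest to the stadium."""
    name, (lat, lon) = min(
        stations.items(),
        key=lambda item: haversine_km(stadium[0], stadium[1], item[1][0], item[1][1]),
    )
    return name, haversine_km(stadium[0], stadium[1], lat, lon)


# Hypothetical, approximate coordinates for illustration only.
stations = {
    "Munich": (48.16, 11.54),
    "Hamburg": (53.63, 9.99),
    "Berlin": (52.47, 13.40),
}
stadium_in_munich = (48.22, 11.62)

print(nearest_station(stadium_in_munich, stations))
```

Repeating the search for every stadium gives a stadium-to-station lookup that can then be used to attach weather observations to each match.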

Goal outcome within this study is defined by four commonly used betting

instruments (PKR, 2014) as a means to assess and compare the data, identify any

relationships, and if viable provide a basis to make predictions. (See Appendix 8.1

for definitions.)

1) The under/over 2.5 goals scored (UO)

2) The Home / Away team win or Draw (HAD)

3) Goal difference (GD)

4) Total Match Goals scored (TG)
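
As a minimal sketch, the four measures listed above could be derived from a final score as follows. The definitions used here are assumptions for illustration (the study's own definitions are in Appendix 8.1): goal difference is taken as home goals minus away goals, and "Over" means a total strictly greater than 2.5. The function name and the output labels are hypothetical.

```python
def goal_outcomes(home_goals, away_goals):
    """Derive the four goal outcome measures from a final score."""
    total = home_goals + away_goals
    diff = home_goals - away_goals  # assumption: signed, home minus away
    if diff > 0:
        had = "H"  # home win
    elif diff < 0:
        had = "A"  # away win
    else:
        had = "D"  # draw
    return {
        "TG": total,
        "GD": diff,
        "UO": "Over" if total > 2.5 else "Under",
        "HAD": had,
    }


print(goal_outcomes(2, 1))
print(goal_outcomes(0, 0))
```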

If a particular pattern or lean towards an uneven spread is found due to one or

more weather effects then this would be of particular interest to managers, trainers,

players and in particular those companies that provide such betting instruments as

products where that weather effect is not currently a built-in factor.

Secondary objectives will consider the differences, if any, between the first and

second leagues and also if a particular geographical location or time of year affects

the goal outcome.

1.2.1 Research Questions

The primary problem being considered is the lack of information regarding the

relationship between weather factors such as temperature, precipitation, humidity,

wind and cloud cover and the effect that this may have on goal outcome in football

matches. The primary research question being considered is: -

“Is there any relationship between weather effects and goal outcome for football

matches played within the Bundesliga 1 & 2?”

From this the null hypothesis H0 and the alternative hypothesis H1 that will be tested are

established: -

H0: There is no relationship between goal outcome in football matches and the

corresponding daily average values of temperature, wind speed, precipitation, humidity

and cloud cover.

H1: There is a relationship between goal outcome in football matches and the

corresponding daily average values of temperature, wind speed, precipitation, humidity

and cloud cover.

The alternative hypothesis is non-directional and therefore a two-tailed test will be applied

where appropriate with a significance level of 5%.
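
To illustrate what a two-tailed test at the 5% level involves, the sketch below runs a permutation test on the Pearson correlation between a weather variable and total goals. This section does not state which tests the study applied, so the choice of test, the function names and the sample values are all assumptions made purely for illustration (and the example is in Python, whereas the study used R Studio).

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient (inputs must not be constant)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def two_tailed_permutation_p(xs, ys, n_perm=2000, seed=0):
    """Two-tailed p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real pairing between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical values: daily mean temperature (°C) and total goals per match.
temps = [-5, 0, 3, 8, 12, 15, 20, 25]
goals = [2, 3, 1, 4, 2, 3, 3, 2]

p_value = two_tailed_permutation_p(temps, goals)
print("reject H0 at 5%" if p_value < 0.05 else "fail to reject H0 at 5%")
```

Taking the absolute value of the correlation is what makes the test two-tailed: a strong relationship in either direction counts as evidence against H0.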

Secondary Research Questions

Within the context of the broader research question there are further questions that

will be considered: -

1) Is there any measurable difference between goal outcomes within the

Bundesliga 1 and 2 due to the effects of weather?

2) Is goal outcome affected by just one, or multiple weather variables?

3) Does regional weather affect goal outcome?

4) Can the goal outcome of future matches be better predicted using selected

betting instruments and weather factors?

5) Do some seasons, months or time periods see goal outcome affected more

due to weather effects?

6) Can goal outcome be better predicted using combined weather indices such as

apparent temperature?

1.3 Solution Overview

The project will primarily utilise the open source R Studio (2014) statistical

programming package to import, clean, merge, feature engineer and finally

analyse the data, and then provide all graphical and statistical outputs as required

throughout the study. This will be supported by programs such as Microsoft Excel,

Notepad and PeaZip (for extracting files) to ensure that all parts of the data can be

accessed, checked and manipulated as required at all stages of the project. The

R Studio platform will be used to handle the 200+ raw data files and apply the

relevant changes and functions to the files to remove unwanted information and

ultimately merge them all together into a single coherent data frame where each

match played is linked with its corresponding weather observation data across the

21 years of play. The stadium information will need to be manually extracted and

created as a fourth data set to link the weather stations and match results.
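
The merge described above can be pictured as a pair of lookups: each match is mapped to a weather station through its stadium, and then to that station's observation for the match date. The sketch below uses Python rather than the R Studio workflow used in the study, and every record, field name and station identifier in it is hypothetical.

```python
# Hypothetical records; field names are illustrative, not the study's column names.
matches = [
    {"date": "2012-12-01", "stadium": "Stadium A", "home_goals": 2, "away_goals": 1},
    {"date": "2013-03-09", "stadium": "Stadium B", "home_goals": 0, "away_goals": 0},
]
stadium_to_station = {"Stadium A": "ST001", "Stadium B": "ST002"}
weather = {
    ("ST001", "2012-12-01"): {"temp_c": -1.5, "rain_mm": 0.0},
    ("ST002", "2013-03-09"): {"temp_c": 6.0, "rain_mm": 3.2},
}


def merge_matches_with_weather(matches, stadium_to_station, weather):
    """Attach the linked station and its observation for the match date to each match."""
    merged = []
    for match in matches:
        station = stadium_to_station[match["stadium"]]
        observation = weather.get((station, match["date"]))  # None if missing
        row = dict(match, station=station)
        row.update(observation or {})  # leave weather fields absent when missing
        merged.append(row)
    return merged


for row in merge_matches_with_weather(matches, stadium_to_station, weather):
    print(row)
```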

The study will contribute to the overall body of knowledge of statistical analysis in

sport and also develop the understanding of the effects of weather on sport and

football in particular. The study will also help progress understanding on the issues

of handling and analysing large data sets.

1.4 Structure

The study is structured as follows: -

Section 1 – A background to the study, why the study is important and what

benefits it will potentially realise through its undertaking

Section 2 – A literature review to assess the existing body of knowledge in the

areas of weather effects on sports performance, sports statistics, stadium design

issues and data mining and analysis techniques

Section 3 – The system architecture and datasets utilised for the study and a

process flow description of how the various data sets were combined along with

key areas of interest detailed

Section 4 – Testing, evaluation and primary analysis of the data sets

Section 5 – Study conclusion and recommendations

Section 6 – Areas where the study could benefit from further future development

Section 7 – Supporting references

Section 8 – Appendices. All supporting information such as tables and graphs and

preliminary or supporting documents such as management progress reports

2 Literature Review and Related Work

2.1 Introduction

This literature review’s primary aim is to examine all relevant and recent research

relating to the effects of weather on sport, in particular football, in relation to its

impact on achieved or measured performance levels. To support this objective

research has also been conducted into climatic and meteorological weather

effects, and types of measurement index. Additionally, statistical and data analysis

approaches in sports and also general analytical and relevant data mining

techniques have been investigated.

A range of sources were used including Google Scholar, Google Books, IEEE,

ACE, Directory of Open Access Journals and Wiley as well as books, articles and

websites to support this. The primary search terms used were (but not limited to):

Football, Soccer, Bundesliga, Weather, Sport, Rain, Wind, Temperature, Humidity,

Okta, Germany, Stadium, Environment, Performance, Statistics, data mining and

data analysis. Specific date parameters were not incorporated into the search

although a number of key relevant papers that were found were from as early as

1970. Papers that did focus on weather effects in sports were both quantitative and

qualitative. Overall there is a reasonably good level of research in data analytics

on sports and weather effects on sport with recent publications in 2014 indicating

a strong ongoing interest in this area.

2.2 Statistical analysis in football

2.2.1 Football Tradition versus Data Analysis

Football is a game that has long been dominated by seven words: “That’s the way

it’s always been done”, highlighting the strong belief systems and dogmas that

dominate the game and still continue to exert influence over players, coaches and

fans alike (Anderson and Sally, 2013). Tradition exerts such a strong social

influence that knowledge can remain static, being rarely challenged or questioned,

with the suggestion that football as a game is full of unexamined clichés and myths

which have never been tested against real-life data (Kuper and Szymanski, 2012).

However, anecdotal beliefs within football, once commonly accepted, are now

being proven incorrect using statistical analysis, such as the belief that corner

kicks increase the chances of scoring (Anderson and Sally, 2013).

Seeking to understand issues like these can potentially provide competitive

advantage to those competing and help shape and adapt the way they play.

2.2.2 Statistical Methods in sports

As with most statistical analysis, variables in sport are classified as either

independent or dependent. For example, independent variables such as various

weather factors may have an effect on multiple dependent variables such as goal

outcome (O’Donoghue and Holmes, 2014). McGarry, O’Donoghue and Sampaio

(2013) suggest two statistical methods that could be of benefit in sports

performance analysis. Firstly, they suggest principal component analysis

(PCA) as a way of reducing the often large number of variables down into a smaller

manageable group which contains those components that account for the largest

amount of variance. They also suggest that regression analysis is an important tool

in being able to predict or forecast game outcome. Analysis of goals scored in

worldwide domestic matches by Greenhough, Birch, Chapman and Rowlands

(2002) using thin-tailed binomial Poisson and negative binomial distributions has

shown that beyond the lower score range such models do not fit well, except in

some cases such as the English Premier League, and that a best-fit extremal

distribution is also needed. A simpler approach taken by British Weather

Services (2013) was to use average goals scored and counts of the Under/Over 2.5

goals scored and whether match temperature was above or below zero degrees

to draw conclusions on the effect of cold on score outcome suggesting that it was

in fact highly correlated. Hamilton (2014) developed this for the same data by using

a logistic regression to better explore if any relationship actually exists between

the match outcome and weather and found that there was almost no change in the

Under/Over 2.5 goals scored probability due to temperature but noted that the

study only used 16 games on a single day.

Analytics company Kickdex (2014) again adopted a simpler approach, comparing the

Goal Difference metric against totals of whether the favourite did or did not win in

rainy conditions for any given Goal Difference score. Gelade and Dobson (2007)

used linear regression modelling and factor analysis in their investigation of

worldwide football team performance. O’Donoghue and Holmes (2014) also

suggest linear modelling and correlation techniques as being especially useful for

making predictive models but note that these are for making average

generalisations only and not as models for predicting individual match results.

They also note the use of non-parametric tests in sports analysis due to most

sports data not following assumptions applicable to data which is normally

distributed. Peters and O’Donoghue (2013) used logistic regression to analyse the

performance of the Qatari football team and the influence of temperature on home

advantage. Finally the use of data mining methods such as Naïve Bayes allows for

probability outcomes of binary goal outcome results such as Under/Over 2.5 to be

determined based on variables such as weather (Witten, Frank and Hall, 2011).

Overall there is a range of techniques and models applicable to sports analysis,

with an emphasis on simplicity, for drawing generalisations about overall trends or

making predictions of future match outcomes.
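
As a simple illustration of one of the count models mentioned in this section, the sketch below treats total goals in a match as Poisson distributed and derives the probability of the Over 2.5 outcome from the mean. It is not the model fitted in any of the cited studies, and the mean of 2.8 goals per match is a hypothetical value chosen only for the example.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when goals follow a Poisson(lam) distribution."""
    return exp(-lam) * lam ** k / factorial(k)


def prob_over_2_5(lam):
    """P(total goals >= 3) = 1 - P(0) - P(1) - P(2)."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(3))


mean_goals = 2.8  # hypothetical average total goals per match
print(round(prob_over_2_5(mean_goals), 3))
```

Under this model a small shift in the mean, for example one associated with temperature, translates directly into a shift in the Over 2.5 probability, which is why the two measures are often analysed together.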

2.3 The effect of weather on sports performance

2.3.1 Weather, Altitude and measurement Indices.

Germany has a temperate climate which is classified as Cfb based on the

Köppen-Geiger climate classification system (Rubel and Kottek, 2013). This

identifies it as having a warm temperate climate (C), being fully humid (f) and

having a generally warm summer (b). Recorded data for Germany (ECA&D, 2014)

indicate extreme temperatures ranging from -36°C to +40°C although these are

the exception and not typical of the overall climate. There have also been a number

of more extreme events classified for Germany occurring from 1999 onwards, but

despite these Germany is on average a temperate climatic region.

Physical features such as altitude can affect weather, altering temperature and

wind speed due to changes in atmospheric pressure and terrain. The lapse rate,

which is the rate at which temperature decreases for an increase in altitude, is

approximately a 0.65°C drop in temperature per 100m in height gain. However,

the actual lapse rate is specific to each location and the conditions at the time

(Stone and Carlson, 1979). The use of equivalent indices such as heat index, wind

chill and apparent temperature has been noted as being of particular use for

sports due to the combined interaction of weather effects being considered (Perry,

2004). One index of particular interest is the apparent temperature index

(Steadman, 1984) which was first outlined in 1979 and then revised to its currently

used format in 1984. A simplified version that uses temperature (T), wind speed

(V) and humidity (e) is most commonly used today and is often described as a

‘feels like’ temperature combining the wind chill and heat index to provide an

adjusted temperature that takes into account wind speed and humidity.

Steadman’s (1984) approach uses a linear model which is applicable to most

outdoor conditions because it contains wind speed but it should be noted that there

can be no accurate formula for what a person actually feels like.
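
The two calculations discussed in this section can be sketched as below. The altitude adjustment applies the approximate 0.65°C per 100m figure quoted above as a constant. The apparent temperature function uses the widely published simplified (non-radiation) form of Steadman's index, AT = T + 0.33e - 0.70V - 4.00, with vapour pressure e in hPa estimated from relative humidity; these coefficients are an assumption here, since this section does not reproduce the exact version used in the study, and the function names are hypothetical.

```python
from math import exp

LAPSE_RATE_C_PER_100M = 0.65  # approximate figure quoted in the text


def adjust_for_altitude(temp_c, station_alt_m, stadium_alt_m):
    """Estimate stadium temperature from a station reading using a constant lapse rate."""
    return temp_c - LAPSE_RATE_C_PER_100M * (stadium_alt_m - station_alt_m) / 100.0


def vapour_pressure_hpa(temp_c, rel_humidity_pct):
    """Water vapour pressure e (hPa) from temperature (°C) and relative humidity (%)."""
    return rel_humidity_pct / 100.0 * 6.105 * exp(17.27 * temp_c / (237.7 + temp_c))


def apparent_temperature(temp_c, rel_humidity_pct, wind_ms):
    """Simplified apparent temperature: T + 0.33e - 0.70V - 4.00 (assumed coefficients)."""
    e = vapour_pressure_hpa(temp_c, rel_humidity_pct)
    return temp_c + 0.33 * e - 0.70 * wind_ms - 4.00


# A station at 100m reading 10°C, with the stadium 200m higher.
print(round(adjust_for_altitude(10.0, 100, 300), 2))
# The same air temperature feels colder as wind speed (m/s) increases.
print(apparent_temperature(10.0, 50.0, 0.0) > apparent_temperature(10.0, 50.0, 5.0))
```

Because the actual lapse rate varies with location and conditions, the constant used here should be read as a rough correction rather than a measured value.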

2.3.2 Weather Effects in Sport

Sports performance analysis is the process by which the various persons involved

within a sport such as coaches, analysts or physiologists come together to break

down a game’s performance from observed data and then identify those factors

which contributed towards either a good or bad performance (McGarry,

O’Donoghue and Sampaio, 2013). The effects that weather and environmental factors

have on sport are an area where potentially considerable improvements could be

made, according to Thornes (1977), to sports management, sporting performance

and economic performance. There is evidence to suggest that some sports are

more adversely affected than others with endurance sports, in particular cycling,

being affected more by the weather (Pezzoli, Cristofori, Moncalero, Giacometto

and Boscolo, 2013). The study also found that most sports were affected by three

primary characteristics, namely temperature, humidity and wind. Rain was also

noted as a factor in a number of cases for some, but not all, sports.

Football is categorised as a ‘weather interference sport’ in that it is ideally suited

to weatherless days which are warm, dry, overcast, bright and with little wind. In

this category both teams experience the same weather conditions at the same time

and football’s structure of two halves where teams swap sides compensates for the

effects of factors such as wind. Weather for football therefore acts as an equal

interference factor (Thornes, 1977). In contrast, Perry (2004) suggests that factors

like wind in football could provide an unequal advantage to one team over the

other. Perry suggests that wind and rain have a significant subjective effect on

football matches played while temperature and other weather effects have only a

very limited impact.

Stephen Hawking’s research into England’s chances of success also factored in weather variables and altitude (Paddy Power, 2014). The study showed that a 5 degree Celsius increase in temperature could significantly reduce England’s chances of winning, as would playing at over 500m altitude. Taylor and Rollo (2014) also note that low altitudes, which range between 500 and 2000m, would impair aerobic performance due to reduced air pressure, although this can vary and is specific to geographical location and individual player conditioning. Acclimatisation has also been shown by Gelade and Dobson (2007) to have a potentially significant effect on international football where teams travel to a different climatic region to which they are unaccustomed.

Precipitation can affect overall playing conditions, making the ball drag, lose spin and skid on the playing surface. This can affect each team’s ability to control the ball and make it more difficult for the goalkeeper to make saves (Kickdex, 2014). The analytics company suggests that rain affects both playing conditions and goal outcome by altering the prospects of the favourite (the team with the shortest pre-match odds), which is more likely to prevail if it does not rain. Their data indicates that the favourite’s chance of winning is reduced to 50% when there is rain, compared to 67% when there is no rain, and that the total number of goals per match increases significantly if rain occurs. However, the Kickdex (2014) study only considers 147 matches played within London and only provides results. Rain has also been shown to affect ball characteristics, and studies of how a ball moves in flight (Carre, Asai, Akatsuka and Haake, 2002) show that precipitation affects the trajectory such that a shot is more likely to be off target.

Thornes (1977) notes that lower temperatures can be a hazard as blood flow to the hands and feet is lost much more quickly in colder conditions, affecting sports like football where players and goalkeepers rely on their extremities to maintain performance. Riley and Williams (2003) also suggest that colder weather reduces limb temperatures, which would detrimentally affect motor performance as well as strength and power. In fact muscle power was found to be reduced by 5% for every 1°C drop in muscle temperature below normal. The effects of very extreme hot or cold temperatures on human physiology are known to directly affect both performance and health (Hong, 2014), although the contribution of weather factors where extreme temperature is not involved is still not clearly understood or established. A qualitative study by Olszak (2012) found that while players’ perceptions of how games were affected by weather varied, they generally had no statistically significant relationship to real measured effects.

The effect of temperature on the ball, which is made of viscoelastic materials, is also a factor: with temperatures approaching zero degrees Celsius a goalkeeper has 7% more time to react to a penalty than at higher temperatures, when the ball moves quicker (Wiart, Kelley, James and Allen, 2012). The flight of the ball is also affected, with colder conditions causing the ball to drop and move slower overall with less power than at warmer temperatures. However, as Riley and Williams (2003) point out, in colder weather the goalkeeper is most susceptible to reduced limb temperature and dexterity unless they keep highly active. A study in baseball (Kraft and Skeeter, 1995) found that temperature appeared to play a significant role in how far a batter could hit a ball, compared to factors such as wind speed which are specific to a particular stadium and to local terrain and turbulence effects. Advanced Football Analytics (2014) also found that temperature significantly affected the success rate of field goals in American football, with lower temperatures reducing the distance from which players can successfully score.

Finally, stadiums have been shown to affect weather factors (Szucs, Allard and Moreau, 2009). Wind in particular can be mitigated significantly where stadiums are enclosed or have roofs, although both temperature and humidity remain unaffected. However, wind channelling effects and turbulence are highly localised and specific to each stadium and the surrounding area. Kraft and Skeeter (1995) also noted that such effects were hard to predict.

2.4 Conclusion

This review examined weather effects on sport and the statistical methods used in sports to understand how such variables affect sport performance and goal outcome. The review found multiple studies which suggest that weather factors affect performance in sports like football, although these studies often use a qualitative approach to justify this. However, there is some conflicting opinion on which weather effects influence performance and to what extent actual sports performance, like goal outcome, is affected. In that regard there is a lack of quantitative knowledge in this field. Some of the sources used are commercial in nature and so caution must be taken as to the data and results they present, which may not be entirely unbiased or subject to checking.

3 System and Datasets

3.1 Design and Architecture

The overall system architecture is shown below in figure 1. The data sources that form the basis of this study are freely available for download via a PC with internet access. A stand-alone PC with the open source R Studio (2014) software is used at all stages to gather, clean, combine, analyse and then make predictions. R Studio requires online access to Google’s Mapping API through a number of additional packages to facilitate geolocation functions, calculation of altitude and the creation of distance matrices to determine the nearest weather station to each stadium. The results of the study and any viable predictions are then provided to the customer as the end product.

Figure 1: System Architecture Design

3.2 Datasets

The study uses and combines four datasets to achieve a single data table for

feature engineering and analysis. One of the data sets, stadiums, was not available

in any single readily useable format and required the stadium names to be

manually obtained and entered for each unique team using secondary sources of

information. All the other data sets could be downloaded from their respective

sources as outlined below in either txt or csv file format. In total these four data

sets when combined will provide all 12,926 match results with the best available

weather observation information specific to each location where the match was

played and on the correct date.

3.2.1 Data Set 1 – Bundesliga Football Results

Useable football data for the Bundesliga 1 and 2 is available from an online source (Football-UK, 2014) which provides a single csv file for every season of play in each league. The data covers the 21 seasons from 1993 up to 2013, the most recent of which came to an end in May 2014. The data provides in all cases the date the game was played, the home and away team names and the home and away team goals scored at full time. Files for recent years contain additional data, but this is not available consistently for earlier years and so cannot be used. There are typically 306 games played per league per season, and across two leagues and 21 years this equates to 12,926 games in total which will be subject to analysis. An example of the raw football data is provided in appendix 8.3.1.

3.2.2 Data Set 2 – Weather Observation Data for Germany

Weather data for Germany was obtained from the European Climate Assessment & Dataset (ECA&D, 2014). There are 84 weather stations for Germany which contain every weather variable required in the study. The weather observation data is the actual recorded weather information such as rainfall, temperature and humidity. This is distinct from the weather station itself, which is the physical location at which weather observation data is recorded. Weather observation data is provided as a single comma delimited text file for every unique weather variable at every weather station (see Appendix 8.3.2 for an example). The unzipped file size of all five weather variables being considered is 8GB as they contain the entire European dataset. In total 32 stations are used in the final data set, each having five variables, resulting in 160 individual text files which need to be extracted from the data set for actual use. Each txt file contains on average 25,591 lines of data, which results in 4.1 million lines of raw data. The five average daily variables used are: Temperature (°C), Rainfall (mm), Cloud Cover (oktas), Wind Speed (m/s) and Humidity (%).

3.2.3 Data Set 3 - Stadiums

The stadium is the location at which each match is played and provides the physical link to the nearest weather station. In a league structure each game is played at the home team’s location as listed in the match results, which is used in conjunction with three online stadium databases (Appendix 8.3.3) and Google Maps to physically locate every stadium. Extracting the stadium name information was undertaken manually for each participating team. Once each name was obtained, the stadium’s decimal coordinates can be accessed through R using the Google API to provide geolocation, distance, mapping and altitude information to match the nearest weather station.

3.2.4 Data Set 4 - Weather Stations

The weather station data is obtained from ECA&D (2014) as a separate entity to the observation data and provides a list of every weather station within Europe. The lists contain latitude and longitude information, which is the critical geolocation information needed to match each stadium (and therefore the match played) to the actual weather observation data. As more than one stadium will be covered by a single weather station, some weather stations are located nowhere near any of the stadia, and there are potential errors in the recorded observation data for some stations, only a subset of the stations covering Germany is used, with some crossover. An example of the weather station data lists is provided in Appendix 8.3.4. In total 32 stations out of the 84 viable possibilities were used in the study.

3.2.5 Limitations of the Data

The data sets are considered to be relatively robust and accurate in their individual

states. The limitation is the way in which they are being combined in particular with

the weather data. The weather data provides numeric data, for example rainfall,

for every 24 hour period reflecting daily figures. Football matches last 105 minutes

including the half time break which represents only a portion of this overall weather

period or 7.3% overall. The weather data figures being used may not be exactly

representative of the actual conditions experienced when the match was played,

although they do represent the general average weather conditions of that day. In

that regard the study is not looking at specific days as such but considering any

overall trend between warmer and cooler periods, or wetter and drier periods of

time.

German football teams can have multiple variations on their names which are in

current use. The match data files and stadium database names lack unique

identifiers and are sufficiently different to prevent string matching using tools like

Levenshtein distance requiring manual selection of stadium names for each team.

Stadium names also vary as many are named after sponsors which can vary each

year. Commonly used, older names may prevail and are not always consistent with

current Google Map information. As such geocoding for stadiums needs to be

checked manually to ensure the right stadium has been matched.

3.3 Data Processing

3.3.1 Introduction

The study’s objective is to be able to analyse football match outcome with weather

effects. To achieve this the four datasets as outlined above must be merged into a

coherent and valid single data frame for analysis. Each of the four sets needs to

be treated both individually to remove unwanted information and ensure consistent

formatting but also be combined to match viable weather observation and weather

stations with stadium locations. As the data flow diagram in figure 2 shows this is

not just a simple join of two data sets but an iterative process where both weather

stations and their observation data need to be verified and checked.

In calculating this there are two key difficulties. Firstly weather stations are located

at a variety of altitudes and some are in mountainous regions which represent a

significant height and therefore temperature differential making them unusable.

Secondly, observation data for some variables is missing to such an extent that it

also renders that station unusable. The size and number of these files makes it

impossible to manually check observation files.

Removing stations requires the distance matrix to be recalculated and the next

nearest station selected with altitude and observation checks undertaken again.

This process continues until no more errors are found. The overall data flow diagram is shown below in figure 2. The colours indicate each data set’s role in the overall process, with blue being the weather station data, green the match data, grey the weather observation data and red the stadium data. The merged data set is denoted orange at the point where all four sets are combined. A detailed description of the key processes is outlined below.

Figure 2: Data flow diagram (legend: Weather Station Data Set, Match Data Set, Stadium Data Set, Weather Observation Data, Final Combined Data Set)

3.3.2 Football Match Data

All 42 files were downloaded simultaneously from Football-UK (2014) with the Firefox web browser using the ‘Download it All’ add on package. The files were vertically bound into a single data frame, the blank rows were removed, and all columns were dropped except the six key data fields (Division, Date, Home and Away team name, and Home and Away team score), which are constant across all years. Some team names are corrected at this stage as they have been written differently over time. A list of all unique teams was generated from this for each division, which will eventually form the basis of the stadium data set.

3.3.3 Stadium Data Set creation

The key part of the stadium set creation process is manual in that stadium names must be selected from a suitable database and entered into a csv file. Because of the multiple similar team names, string matching tools such as those that employ Levenshtein distance were not feasible. Four online databases as outlined in Appendix 8.3.3 were used to obtain the correct stadium name. Once done, the Google API was used through the geocode function within R Studio to automatically generate latitude and longitude co-ordinates for each stadium.

3.3.4 Distance Matrix Calculation

Calculating the distance between each stadium and weather station is a critical

part of the process. Two approaches were taken to calculate these distances. The

primary method uses a point distance calculation which takes each pair of stadium

coordinates and calculates the distance to every weather station on the current list

as shown in figure 3. This output matrix consists of three columns; Stadium Name,

Weather Station ID (STAID) and the distance in metres between them. The matrix

is then reduced by using a minimum distance function to output a final list of each

stadium and its nearest single matching weather station.
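
The steps above can be sketched in Python as follows. This is an illustration only and is not the R code shown in figure 3: the great-circle (haversine) formula is an assumed choice of point distance calculation, and the stadium names, coordinates and station IDs are invented for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two decimal-degree points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_matrix(stadiums, stations):
    """Three-column rows: (stadium name, STAID, distance in metres)."""
    return [
        (name, staid, haversine_m(slat, slon, wlat, wlon))
        for name, (slat, slon) in stadiums.items()
        for staid, (wlat, wlon) in stations.items()
    ]


def nearest_station(matrix):
    """Reduce the matrix to the single closest station for each stadium."""
    best = {}
    for name, staid, dist in matrix:
        if name not in best or dist < best[name][1]:
            best[name] = (staid, dist)
    return best


# Hypothetical coordinates, for illustration only.
stadiums = {"Stadium A": (48.22, 11.62), "Stadium B": (53.59, 9.90)}
stations = {101: (48.16, 11.54), 102: (53.63, 9.99), 103: (50.05, 8.60)}
nearest = nearest_station(distance_matrix(stadiums, stations))
```

When a station is later rejected, removing it from `stations` and calling the two functions again selects the next nearest candidate.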

Figure 3: Distance matrix Calculation Code in R Studio

Finally this list is merged with the original stadium list which adds the weather

station ID (STAID) as a new column against each stadium name. STAID is critical

as a unique identifier to merge the weather observation data later on as the

weather observation data uses this.

The process has a manual element in that the observation weather files for each

of the five variables need to be selected (based on the results of the distance

calculation) and moved into a working folder for merging and checking later on.

Subject to this check files may need to be subsequently removed or added and

also the list of weather stations needs to be manually corrected to remove stations

that contain missing observation data before the process is repeated again.

3.3.5 Checking Altitude Differential

While the weather station raw data includes altitude data, the stadium altitude must be obtained using the RJSONIO package within R. It is the height difference between the two that is being checked. A large difference between the stadium and weather station could result in the recorded temperatures being unrepresentative of the climate at the stadium. Station and stadium pairings with height differentials greater than 400m, or about 2.5°C, are omitted from the study.
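
A minimal Python sketch of this check is given below. The 400m threshold is taken from the text; the field names and the example altitudes are hypothetical.

```python
MAX_ALT_DIFF_M = 400  # threshold stated in the text


def altitude_ok(stadium_alt_m, station_alt_m, max_diff_m=MAX_ALT_DIFF_M):
    """True when the height differential does not exceed the threshold."""
    return abs(stadium_alt_m - station_alt_m) <= max_diff_m


def split_by_altitude(pairs):
    """Separate stadium/station pairings into kept and rejected lists."""
    kept, rejected = [], []
    for pair in pairs:
        if altitude_ok(pair["stadium_alt"], pair["station_alt"]):
            kept.append(pair)
        else:
            rejected.append(pair)
    return kept, rejected


# Hypothetical pairings, for illustration only.
pairs = [
    {"stadium": "Stadium A", "staid": 101, "stadium_alt": 520, "station_alt": 515},
    {"stadium": "Stadium B", "staid": 104, "stadium_alt": 110, "station_alt": 980},
]
kept, rejected = split_by_altitude(pairs)
```

Rejected pairings are the ones for which the next nearest station would need to be selected and checked again.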

3.4 Dealing with Missing Data

The weather observation data contains a variety of missing values. These are

identified during the checking process. Some missing values are one offs having

no discernible pattern and are classified as Missing Completely at Random

(MCAR) as they are no more or less likely to be missing than any other value.

However, the error files also highlighted data which is almost certainly Missing Not

At Random (MNAR) such as October 2001 and February 2014 being months that

had missing values for a large number of all the weather stations. Generally

missing data was in three categories; 1) MCAR and typically one or two

consecutive days only, 2) MNAR and typically 10-20 days for specific recurring

months and years (figure 4), 3) Large scale missing data for multiple months and

often years for an entire weather variable.

There are a variety of ways to deal with missing data. The approach taken in this

study is to try and preserve as much of the data as possible as the percentage of

weather observation data affected is small. Of the 241,753 total lines just 1323

lines contain errors equating to 0.5%. If this was evenly represented when the files

are joined then there would be 70 lines out of 12,926 with errors. However, as

match data is based on weekly occurring events and errors are clustered across

10-20 consecutive days the real error rate would be much lower. Deleting these values was considered but imputation (Yuan, 2010) was chosen to provide unbiased replacement values for those occurrences.

Figure 4: Example of missing data for wind speed for 12 days in Sept 2001

The use of the Amelia package (Cran, 2013) which is accessed through R Studio

allows for missing values to be imputed in lieu of NA values. Filling in the missing

values using an algorithm that approximates a best fit based on values on either

side avoids bias and the package has aspects that make it highly applicable to

time series data.
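
The idea of replacing a flagged value using the values on either side can be sketched as below. This Python example uses simple linear interpolation as a stand-in and is not the algorithm used by Amelia, which performs multiple imputation; the sentinel value is the one that appears in the wind speed column of figure 4, and the function names are illustrative.

```python
MISSING = -999.9  # sentinel seen in the wind speed (FG) column of figure 4


def mark_missing(values, sentinel=MISSING):
    """Replace the sentinel with None so it cannot be treated as a real reading."""
    return [None if v == sentinel else v for v in values]


def interpolate_gaps(values):
    """Fill None entries linearly between the nearest observed neighbours.

    Gaps at the start or end of the series copy the nearest observed value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)
    filled = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(values[right])
        elif right is None:
            filled.append(values[left])
        else:
            weight = (i - left) / (right - left)
            filled.append(values[left] + weight * (values[right] - values[left]))
    return filled


# Hypothetical wind speed series with a two-day gap.
series = interpolate_gaps(mark_missing([2.0, -999.9, -999.9, 5.0, 4.0]))
```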

3.5 Mashup

A mashup takes independent data sets and recombines them to reveal previously unknown information. The three separate data sets are now combined to create a new and unique data set. As the station codes (STAID) were added to the stadium data set earlier, the weather station list is no longer required: the STAID code bridges the gap between each stadium and the observation data, allowing them to be merged.
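
The join described above can be sketched in Python as a lookup from home team to STAID, followed by a lookup on the pair (STAID, date). The team and stadium names are hypothetical, the column names are assumptions, and the two weather rows reuse values for station 51 from figure 4.

```python
def merge_match_weather(matches, stadiums, observations):
    """Attach stadium and weather fields to each match via STAID and date."""
    merged, unmatched = [], []
    for match in matches:
        stadium = stadiums.get(match["HomeTeam"])
        weather = observations.get((stadium["STAID"], match["Date"])) if stadium else None
        if weather is None:
            unmatched.append(match)  # kept aside so dropped rows can be investigated
        else:
            merged.append({**match, **stadium, **weather})
    return merged, unmatched


# Hypothetical match and stadium rows, for illustration only.
matches = [
    {"Date": "22/09/2001", "HomeTeam": "Team A", "AwayTeam": "Team B", "FTHG": 2, "FTAG": 1},
    {"Date": "29/09/2001", "HomeTeam": "Team C", "AwayTeam": "Team A", "FTHG": 0, "FTAG": 0},
]
stadiums = {"Team A": {"Stadium": "Stadium A", "STAID": 51}}
observations = {
    (51, "22/09/2001"): {"RR": 0.0, "HU": 84, "TG": 10.6, "CC": 5},
    (51, "29/09/2001"): {"RR": 6.4, "HU": 92, "TG": 13.8, "CC": 7},
}
merged, unmatched = merge_match_weather(matches, stadiums, observations)
```

Returning the unmatched rows separately makes a missing stadium or observation file visible, rather than silently reducing the row count.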

STAID  DATE        RR    HU  TG    CC  FG
51     19/09/2001  12.8  86  10.7  6   -999.9
51     20/09/2001  11.9  97  12.6  8   -999.9
51     21/09/2001  0.2   86  13.1  6   -999.9
51     22/09/2001  0     84  10.6  5   -999.9
51     23/09/2001  0     84  10.9  5   -999.9
51     24/09/2001  0     88  13    6   -999.9
51     25/09/2001  0     93  10.8  7   -999.9
51     26/09/2001  0     83  13.3  4   -999.9
51     27/09/2001  0     82  15.8  5   -999.9
51     28/09/2001  0     79  15.3  4   -999.9
51     29/09/2001  6.4   92  13.8  7   -999.9
51     30/09/2001  0.7   89  16.3  5   -999.9

3.6 Feature Engineering

Feature Engineering is the process of extracting even more potentially useful

information from the existing data (Domingos, 2011) and is an important process

in knowledge discovery and analysis especially in data mining. In total 14 new

columns were created each containing a unique variable that allows for enhanced

subsetting, simplification, grouping or analysis of the data set. The new variables

each fall into one of four categories; Time, Weather, Goal Outcome or

Geographical.

Time Variables

Month: Extracted from the date field. Creates 12 standard calendar months;

January, February, March etc. (Categorical)

Year: Extracted from the date field. Creates a four digit numeric value for the year.

(Numeric)

Season: Groups together 3 consecutive months to generate spring, summer,

autumn and winter categories. Summer is June, July & August. (Categorical)

Weather Variables

BScale: Creates a simplified categorical wind variable based on the Beaufort scale

of measurement with values from 0 to 9. (Numeric)

HUscale: Creates a simplified scale from 1 to 10 for humidity range in 10%

increments. (Numeric)

Rain: Rain is grouped into 6 categories; ‘No Rain’, ‘Light rain’, ‘Moderate’, ‘Heavy’,

‘Very Heavy’ and ‘Violent’. (Categorical)

Atemp: Apparent Temperature is derived from temperature, humidity and

windspeed and provides a ‘real feel’ equivalent (Steadman, 1984.) (Numeric)

ATempScale: A simplification of the above range into 5 general temperature ranges of 10 degrees Celsius each. (Categorical)

Goal Outcome

TotalGoals: The total goals scored per match. (Numeric)

GDiff: The difference between home team goals and away team goals. (Numeric)

OverUnder2.5: Calculates based on total goals whether the result is ‘Under’ 2.5

goals or ‘Over’ 2.5 goals (Categorical)

H_A_Win: Creates a match result value of ‘HomeWin’, ‘AwayWin’, or ‘Draw’

(Categorical)

Geographical

Area: Germany is comprised of 16 states (Appendix 8.2.3) which can be obtained

using the ‘output=more’ parameter within the geocode function within R Studio.

Region: Combines the Areas above into larger general geographical weather

regions based on typical climate conditions experienced around the country.

(Encyclopedia Britannica, 2014 and Appendix 8.2.3)
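
Several of the variables listed above can be derived mechanically from a single merged row, as the Python sketch below illustrates. The Beaufort boundaries are the standard wind speed limits in m/s; the humidity rule (rounding up to the next 10% band) and the input column names are assumptions, and the example row is hypothetical.

```python
import bisect
import math
from datetime import datetime

# Lower bounds (m/s) of Beaufort forces 1 to 12; below 0.3 m/s is force 0.
BEAUFORT_LOWER_BOUNDS_MS = [0.3, 1.6, 3.4, 5.5, 8.0, 10.8, 13.9, 17.2, 20.8, 24.5, 28.5, 32.7]

MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]

SEASON_BY_MONTH = {12: "Winter", 1: "Winter", 2: "Winter",
                   3: "Spring", 4: "Spring", 5: "Spring",
                   6: "Summer", 7: "Summer", 8: "Summer",
                   9: "Autumn", 10: "Autumn", 11: "Autumn"}


def beaufort(wind_ms):
    """Beaufort force for a wind speed in m/s."""
    return bisect.bisect_right(BEAUFORT_LOWER_BOUNDS_MS, wind_ms)


def humidity_scale(humidity_pct):
    """Assumed rule: 1 to 10 scale in 10% bands, rounding up."""
    return max(1, math.ceil(humidity_pct / 10))


def engineer_features(row):
    """Derive time, weather and goal outcome variables from one merged row."""
    date = datetime.strptime(row["Date"], "%d/%m/%Y")
    total = row["FTHG"] + row["FTAG"]
    diff = row["FTHG"] - row["FTAG"]
    return {
        "Month": MONTH_NAMES[date.month - 1],
        "Year": date.year,
        "Season": SEASON_BY_MONTH[date.month],
        "BScale": beaufort(row["FG"]),
        "HUscale": humidity_scale(row["HU"]),
        "TotalGoals": total,
        "GDiff": diff,
        "OverUnder2.5": "Over" if total > 2.5 else "Under",
        "H_A_Win": "HomeWin" if diff > 0 else "AwayWin" if diff < 0 else "Draw",
    }


# Hypothetical merged row, for illustration only.
features = engineer_features({"Date": "22/09/2001", "FTHG": 2, "FTAG": 1, "FG": 3.1, "HU": 84})
```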

4 Testing, Evaluation & Error Checking

4.1 Introduction

The processing stage as outlined in figure 2 and section 3.3 was subjected to

rigorous testing, error checking and evaluation at every stage to ensure that the

data sets were correct and accurate.

4.1.1 Test 01 – Check Downloaded files

Downloaded files including football data, weather observation and weather station

are checked visually to make sure the download was successful and the files have

been obtained. A check of one or two files is made by opening them to validate integrity and that the information is present as expected.

4.1.2 Test 02 – Football match data set

1) Test for NA rows and remove from data frame.

2) Check final row count (12,926) against matches played based on number of

participating teams.

3) Unique team names checked & verified. Duplicates and spelling corrected to

remove ‘false copies’. Checked against Fussball Archiv (2014)

4.1.3 Test 03 – Weather Station data

1) Check for weather variable with least weather variables using nrow.

2) Check lat/lon coordinates have been correctly transformed into decimal codes.

4.1.4 Test 04 – Stadium Data

1) Check number of teams in stadium list is correct.

2) Check stadium names are valid within Google Map API using geocode.

4.1.5 Test 05 – Distance Matrix

3) Create error file of all combined -9999 values for weather observation data

based on current weather station selection. Remove large scale missing values

greater than 30 consecutive days.

4) Spot check results using single line code version and Google Maps.

4.1.6 Test 06 – Imputed Data

To ensure that the imputation process did not fundamentally adjust the original

data a statistical summary was undertaken before and after. Analysis found that

the imputed data had no overall effect on the original data. In particular no values

occurred at the extreme ranges and neither the mean nor median was altered.

Figure 5: Weather data before Imputation

Figure 6: Weather data after Imputation

If the imputation process had altered the data significantly then removing these

values would have been the next best alternative course of action.
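
A before and after comparison of this kind can be sketched in Python as follows. The tolerance of 0.05 and the function names are assumptions chosen for illustration, and the two series are invented.

```python
import statistics


def summary(values):
    """Minimum, maximum, mean and median of a series."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def imputation_is_benign(observed, imputed, tolerance=0.05):
    """Check that imputation added no new extremes and left the centre unchanged.

    observed: the original values with missing entries removed.
    imputed:  the full series after imputation.
    """
    before, after = summary(observed), summary(imputed)
    within_range = before["min"] <= after["min"] and after["max"] <= before["max"]
    stable = all(abs(after[k] - before[k]) <= tolerance for k in ("mean", "median"))
    return within_range and stable


# Hypothetical series: one imputed value of 4.0 inserted into the observed data.
observed = [3.0, 3.5, 4.0, 4.5, 5.0]
acceptable = imputation_is_benign(observed, [3.0, 3.5, 4.0, 4.0, 4.5, 5.0])
```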

4.1.7 Test 07 – Final Checking

Prior to feature engineering the final merged data set is checked for: -

Row count – Missing rows may indicate a missing weather observation file which

would remove all associated values during the join process. Use nrow.

NAs – Values missed by the imputation process. Filter for NA, -9999 values and

use summary analysis to check.

Validity – Several random rows are checked against the raw data sets to ensure

the right dates have been joined with every weather variable. Manual check.

File is saved as a csv file to preserve all processes and make retrieval easier for

the next stages.

4.2 Data analysis

4.2.1 Introduction

The study’s primary objective is to determine if any relationship exists between

weather effects and goal outcome and R Studio provides a platform to investigate

the data set using both inbuilt and additional packages. Not all of the data set

contains information that will be analysed and categorical data cannot be analysed

using traditional statistical methods although it can be tabulated, graphed and

explored using R Studio and data mining techniques.

4.2.2 Data Analysis Process and Methodology

The analysis is split into 5 main sections: -

1) Exploratory. Basic investigation of the variables using histograms and tables

to describe the main variables being considered.

2) Goal Outcome Analysis. Each of the four goal outcome variables is compared

against; Apparent Temperature, Rainfall, Wind Speed, Humidity, Cloud Cover,

Calendar Month, Season and Region. In each case the categorical expression

is used, as developed through feature engineering, to generalise the results as

much as possible. If any potential trend is identified then where possible this is

tested using appropriate inferential statistics.

3) Altitude Analysis. A general overview of altitude.

4) Scatter Plots and Correlation. The raw numeric weather data is compared to

total goals and goal difference to determine if any relationship exists. A multiple

linear regression model will be assessed to see if one or more of the variables

are related.

5) Data Mining. Naïve Bayes will be used for the Under/Over 2.5 event to

determine if the spread of matches can be predicted using weather variables

by splitting the data set into 80% training and 20% test sets.
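
The data mining step can be illustrated with a small categorical Naïve Bayes classifier and an 80/20 split, sketched below in Python. This is not the implementation used in the study: the function names, the random seed and the use of Laplace smoothing are assumptions, and the rows are synthetic toy data built so that the outcome follows the rain category exactly. They are not results from the match data set.

```python
import math
import random
from collections import Counter, defaultdict


def train_test_split(rows, train_fraction=0.8, seed=1):
    """Shuffle a copy of the rows and cut it into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_naive_bayes(rows, features, target):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)  # (feature, class) -> value counts
    levels = defaultdict(set)            # feature -> distinct values seen
    for r in rows:
        for f in features:
            value_counts[(f, r[target])][r[f]] += 1
            levels[f].add(r[f])
    return class_counts, value_counts, levels


def predict(model, row, features):
    """Return the class with the highest log posterior (Laplace smoothed)."""
    class_counts, value_counts, levels = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for f in features:
            count = value_counts[(f, label)][row[f]]
            score += math.log((count + 1) / (n + len(levels[f])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Synthetic toy rows, NOT study data: outcome is tied to the rain category.
rows = ([{"Rain": "Heavy", "OverUnder": "Over"}] * 50
        + [{"Rain": "No Rain", "OverUnder": "Under"}] * 50)
train, test = train_test_split(rows)
model = fit_naive_bayes(train, ["Rain"], "OverUnder")
correct = sum(predict(model, r, ["Rain"]) == r["OverUnder"] for r in test)
accuracy = correct / len(test)
```

On real match data the accuracy on the held-out 20% should be compared with the share of the more common class, since a model that always predicts that class sets the baseline.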

4.2.3 Exploratory Analysis

Descriptive analysis allows for a general overall understanding of the data to be

gained and will include bivariate analysis within. There are four categories of data

to be considered; Time, Goal Outcome, Weather Observations and Geography.

Both Time and Geography are in part weather items as they contain variables such

as season and region (Encyclopedia Britannica, 2014) which are associated with

a general weather type. From the 31 variables in the data set 13 are numeric, 12

are categorical and 6 are unused. The numerical values can be directly

investigated using the statistical function describe. A full list of all the data set

variables is provided in Appendix 8.4.
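
The summary statistics reported by such a function can be reproduced as in the Python sketch below. It assumes the column definitions used by the describe function in the psych package for R (a 10% trimmed mean and a median absolute deviation scaled by 1.4826); the text does not state which package was used, so this is an assumption, and the sample values are invented.

```python
import math
import statistics


def describe(values, trim=0.1):
    """Summary statistics for one numeric variable."""
    xs = sorted(values)
    n = len(xs)
    k = int(n * trim)  # observations dropped from each tail for the trimmed mean
    trimmed = xs[k:n - k] if k else xs
    med = statistics.median(xs)
    sd = statistics.stdev(xs)  # sample standard deviation
    return {
        "mean": statistics.mean(xs),
        "sd": sd,
        "median": med,
        "trimmed": statistics.mean(trimmed),
        "mad": 1.4826 * statistics.median([abs(x - med) for x in xs]),
        "min": xs[0],
        "max": xs[-1],
        "range": xs[-1] - xs[0],
        "se": sd / math.sqrt(n),  # standard error of the mean
    }


# Hypothetical goal counts, for illustration only.
stats = describe([0, 1, 1, 2, 3, 3, 4, 5, 6, 9])
```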

Table 1: Summary Overview of numerical variables

Variable     mean     sd      median   trimmed  mad    min      max      range    se
FTHG         1.63     1.33    1.00     1.50     1.48   0.00     9.00     9.00     0.01
FTAG         1.17     1.13    1.00     1.02     1.48   0.00     9.00     9.00     0.01
alt          154.18   147.09  105.00   129.16   94.89  4.00     553.00   549.00   1.29
RR           1.94     4.04    0.10     0.94     0.15   0.00     49.90    49.90    0.04
HU           77.40    12.08   79.00    78.33    11.86  24.00    100.00   76.00    0.11
TG           9.10     6.38    9.20     9.11     6.52   -15.80   28.70    44.50    0.06
CC           5.43     2.18    6.00     5.67     1.48   0.00     8.00     8.00     0.02
FG           3.48     1.78    3.10     3.28     1.63   0.00     21.60    21.60    0.02
TotalGoals   2.80     1.71    3.00     2.72     1.48   0.00     13.00    13.00    0.02
GDiff        0.46     1.78    0.00     0.46     1.48   -8.00    9.00     17.00    0.02
BScale       2.51     0.90    2.00     2.47     1.48   0.00     9.00     9.00     0.01
HUscale      8.19     1.24    8.00     8.28     1.48   3.00     10.00    7.00     0.01
Atemp        5.76     7.67    5.40     5.65     8.15   -20.30   30.10    50.40    0.07

A preliminary approach (table 1) reveals some useful information for the data set as a whole. As expected, more home goals are scored than away goals, although both range between zero and nine; the upper end is very high compared to the mean, indicating that these high scoring events are very rare. Mean goal difference is positive, indicating that home team wins are again more prevalent. For the weather data, rainfall is typically low with occasional heavy downpours, based on the mean and range. Humidity is often quite high, as is the cloud cover. The apparent temperature has a lower mean than temperature alone, suggesting that wind speed plays a greater role in reducing the ‘feels like’ temperature than humidity does in increasing it.

A range of graphs and charts considering goal structure and distribution and the

primary weather variables are considered.

Figure 7: Mean goals scored per match for all teams

Mean goals follow a relatively smooth curve although the top 7 teams do show a stepped increase in average goals scored. The mean number of goals scored per match is 2.8.

Figure 8: Total goals scored for all teams

Total goals scored in figure 8 show a stepped divide: teams have scored either 800
goals or more, or fewer than 450 goals, with only one team between these limits.
Some teams have remained in a league for longer than others, and Bundesliga 2 (B2)
sees much more volatility in team movement, with 66 teams having played in B2 over
the 21 years compared to 37 in B1.

Figure 9 shows the distribution of total goals for each stadium. The Allianz Stadium
in Munich has the highest goal density because it has hosted two teams that have
consistently participated in either B1 or B2 for the entire 21 year period. Generally
the overall density of goals scored is highest in the west of the country, where
most teams and stadia are located.

Figure 9: Total Goals scored per Stadium

Figure 10: Goal Distribution for B1 (top) and B2 (bottom)

Bundesliga 1: mean = 2.87, sd = 1.71

Bundesliga 2: mean = 2.72, sd = 1.72

Figure 10 shows that the goal distributions are positively (right) skewed. The
distribution structure, mean and standard deviation are virtually identical between
the two leagues, with B1 having a slightly higher overall scoring average than B2.

Figure 11: Goal Difference distribution histogram

As with goals scored, the goal difference distributions in figure 11 are closely
matched between B1 and B2.

Figure 12: Histogram of Average Temperature

Average temperature is approximately normally distributed, as shown in figure 12,
with a mean temperature of 9.1°C.

Figure 13: Histogram of Rainfall for Germany for all 21 years

Rainfall is very heavily right (positively) skewed, as shown in figure 13, with the
majority of all instances, 6500 days (over 50%) in the data set, having no recorded
rain at all. Heavier periods of rain are increasingly intermittent and rare, with
most rainfall being either light or moderate.

Figure 14: Histogram of Cloud cover

With a median cloud cover of 6, Germany can be considered a fairly overcast country,
with cloud cover a prominent feature at any time of year (figure 14). In fact, clear
days with little or no cloud (categories 0, 1 and 2) together account for just 12.5%
of the total time period.

Figure 15: Histogram of wind speed

Wind speed is typically low, with almost all occurrences below 6 m/s. This roughly
equates to Beaufort scale 3 or lower and accounts for 87% of all matches in the data
set, with only 13% of matches seeing higher wind speeds.

Figure 16: Histogram of density for average temperature and season.

An analysis of seasonal make-up and average daily temperature (figure 16) reveals
distinct distributions for summer and winter, with some slight overlap and with
warm and cold average temperatures respectively. Spring and autumn, however, are
much more similar, with an almost identical average temperature and distribution.
Analysis shows that the ranges, quartiles and medians of these two seasons are
almost the same. The other weather factors show even less distinction between
seasons, with significant overlap and with wind, humidity and rain occurring at
nearly all times of the year.

4.2.4 Total Goals Analysis

Total Goals are expressed in the plots below as average goals per match (rather than
totals) to provide a more meaningful comparison between groups for each weather
variable. The division grouping is also used in these plots to determine whether
there are any differences between the two leagues. The average goals scored across
all matches played is 2.8: slightly higher at 2.88 for B1 and slightly lower at 2.72
for B2.

Figure 17: Total Goals (Mean) graph for Apparent Temperature and Rainfall

Mean total goals appear to decrease as apparent temperature decreases. 32.3% of all
games played are below zero degrees. An ANOVA test of Total Goals against apparent
temperature returns a p value of 5.62e-06, indicating that the group means are not
identical and that there is a statistically significant difference between them. A
boxplot analysis in figure 18 shows the 5 different groups. In real terms, however,
20% of all games are played in the winter (60 per season), and a 0.3 drop in the
average would be equivalent to around 150 goals being scored rather than an average
of around 170, a difference of about 20 for that season. The balancing of the
OverUnder2.5 statistic, however, flattens the odds compared to other times when the
home team has an advantage (see 4.4.2). Statistically the effects are detectable,
but they are too small to be practically important and goal outcome is not
materially affected.
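The per-season arithmetic above can be reproduced in a few lines. This is a minimal sketch that uses only figures quoted in the text (60 winter matches per season, a mean of 2.8 goals and a drop of 0.3); the variable names are illustrative and not taken from the study.

```python
# Worked arithmetic for the practical size of the cold-weather effect.
# All inputs are figures quoted in the text; the names are illustrative only.
winter_matches = 60   # roughly 20% of one season's matches
mean_goals = 2.8      # overall mean goals per match
drop = 0.3            # approximate drop in mean goals in cold conditions

expected_normal = winter_matches * mean_goals
expected_cold = winter_matches * (mean_goals - drop)
goals_lost = expected_normal - expected_cold

print(f"Expected winter goals at the overall mean: {expected_normal:.0f}")
print(f"Expected winter goals at the reduced mean: {expected_cold:.0f}")
print(f"Difference per season: {goals_lost:.0f}")
```

The unrounded values are 168 and 150 goals, a gap of 18, which the text rounds to 170 and 20.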

Figure 18: Boxplot of Apparent Temperature ranges against Total Goals

Rainfall indicates a possible increase in goals scored for very heavy rain; however,
this category comprises only 49 matches and the violent category only 5 matches.
While a sample size of 30 is typically considered statistically sufficient, in this
case the unknown nature of the occurrence (see Limitations of the Data, 3.2.5) makes
it less reliable and the results should be treated with suitable caution. There is
no clear indication that rain affects total goals scored according to any particular
trend or pattern, although it may act as an interference factor.

Figure 19: Total Goals graph for Wind Speed and Humidity

The wind scale histogram demonstrated that the majority of all occurrences (87%)
were within the first four categories, which are relatively low wind speeds. At
Beaufort Scale 6 (BS6) there are just 28 matches, followed by 3, 6 and 2 matches for
BS7, 8 and 9 respectively. While the average scores in figure 19 suggest that wind
is having an interference effect by increasing goals scored, the sample is too small
to draw any reliable conclusions.

Likewise, humidity scale 3 consists of just 7 matches (2 for B1) and, while there
are 67 for scale 4, that sample is also probably too small. Analysis of the 7
matches shows that they were all played in mild conditions with no other significant
weather effects present at the time. There is no clear trend or pattern to suggest
that humidity has an effect on total goal outcome.

Figure 20: Total Goals scored for Cloud Cover and Month

Cloud cover in figure 20 shows no clear trend or pattern. Although B1 teams seem to
perform very slightly better in clearer conditions, the same cannot be said for B2
teams, and the overall effect is minimal. Turning to the monthly totals, a typical
football season in Germany runs from August through to May. However, there are some
years which started early in July (86 matches) and finished later in June (128
matches). May and June, together representing 10% of the data, seem to show much
higher averages at the end of the season. This could be the result of rising
temperatures at this time but could also be due to factors other than weather.

Figure 21: Mean Goals Scored by Season and Region

In Bundesliga 1 average goals scored remain high all year, but there is a small but
noticeable drop of 7.5%, from 2.95 to 2.73 average goals scored, as the season moves
from autumn into the winter period. In Bundesliga 2 the drop is less noticeable at
3.25%, although winter is still the period with the lowest average goals scored. A
Kruskal-Wallis test, applied because the data is not normally distributed, shows
that this is statistically significant, although again it is of limited practical
use. Playing in the east of the country (408 B1 and 1079 B2 matches for this region)
seems to affect B1 teams, with average goals lower than in any other region. The
south-east sees the biggest differential between B1 and B2 teams despite a
comparable 850 B1 matches and 1005 B2 matches being played there.

Kruskal-Wallis tests indicate no statistical significance for humidity and cloud
cover, indicating that those groups are very similar. All other variables show a
statistically significant difference between groups.
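To make the Kruskal-Wallis comparison concrete, the sketch below computes the H statistic by ranking the pooled observations and comparing rank sums across groups. It is not the study's implementation (the text does not state how the test was run), and the three small samples are hypothetical values invented purely for illustration.

```python
from collections import Counter
from itertools import chain


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)
    # Give every distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    # (Undefined if every observation is identical.)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


# HYPOTHETICAL total-goals samples for three seasons (not the study's data).
autumn = [3, 4, 2, 5, 3, 1]
winter = [2, 1, 0, 3, 2, 2]
spring = [3, 2, 4, 3, 5, 2]

h_stat = kruskal_wallis_h([autumn, winter, spring])
# H is compared with a chi-square distribution on (groups - 1) degrees of
# freedom; for three groups the 5% critical value is 5.991.
print(f"H = {h_stat:.3f}")
```

Because goal counts are small integers with many repeated values, the tie correction matters for data of this kind.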

4.2.5 Over/Under 2.5 goals scored

The overall over/under (OU) scores for the data set are 6818 (over) versus 6108

(under). This reflects an average proportion of 52.7% over, versus 47.3% under

which is a fairly even spread overall.

Figure 22: OU results against Apparent Temperature and Rainfall.

Apparent Temperature in figure 22 (left) appears to show the OU ratio flattening out
at temperatures of zero degrees and below, compared to temperatures above freezing
where the over metric has a clear advantage. It is also slightly flatter at higher
temperatures in the 20-31 category. Rainfall does not appear to show any change in
OU trends with changing rainfall intensity.

Figure 23: OU against wind speed and humidity

In figure 23 (left), at Beaufort scale 4 (1382 matches) and above there is no
distinct difference between over and under results; in fact, under 2.5 goals is
slightly more frequent at BS4 and the split is flat at BS5. Wind is potentially
having an interference effect by slightly reducing the number of goals scored.
Humidity (right), however, shows no trend or pattern, with all ranges being similar.

Figure 24: OU against Cloud Cover and Calendar month

Cloud cover appears to have no measurable effect on OU outcome. A monthly breakdown
indicates a very slightly higher differential in August, September, October and
November, around 1% above average. May, however, sees over results 59% of the time
and June 65% of the time, which is a noticeable shift against the average odds.
There has not been a June match since 2000, although matches in May do occur every
year.

Figure 25: Mean Goals Scored by Season

Winter shows some levelling out compared to the other three seasons due to lower
mean goals scored. Region seems to have some effect, with the east going against the
overall average and the south-east being generally flat compared to the coastal and
west regions.

Figure 26: Apparent Temperature and Wind Speed Proportion Graphs

Two graphs of further interest are those for apparent temperature and wind speed.
These are shown in figure 26, normalised to show the proportional difference between
groups. For apparent temperature there is less difference between the ranges than
the previous plots might suggest, although the probability of an over result
increases slightly to 55% for the two warmest categories. At Beaufort scale 6, 7 and
8, over results are achieved 71%, 67% and 67% of the time respectively. However,
with just 28, 3 and 6 matches played in these three ranges respectively, the sample
sizes are very low. The switch to all results being under for the two matches played
at BS9 highlights the potential volatility of these figures, although there is some
evidence that high wind speeds are potentially causing interference leading to more
goals being scored.
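One way to see how little these small groups can support is to put an approximate 95% confidence interval around each proportion. The sketch below uses the Wilson score interval, which is a technique swapped in here for illustration and not one used in the study. The match counts come from the text, but the numbers of over results (20, 2 and 4) are assumptions back-calculated from the quoted percentages.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Match counts are quoted in the text; the "over" counts are ASSUMED,
# back-calculated from the quoted 71%, 67% and 67%.
groups = {"BS6": (20, 28), "BS7": (2, 3), "BS8": (4, 6)}
for name, (overs, matches) in groups.items():
    lower, upper = wilson_interval(overs, matches)
    print(f"{name}: {overs}/{matches} over, interval {lower:.2f} to {upper:.2f}")
```

Under these assumptions the intervals for the three and six match groups span most of the possible range, which is consistent with the caution expressed above.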

4.2.6 Goal Difference

Goal difference is a numeric representation of the Home/Away/Draw result: a value of
zero is a draw, a negative value is an away win and a positive value is a home win,
with the magnitude giving the winning margin. The range is very large, with goal
differences from -8 to +9 (table 1) giving 18 possible values to be plotted against
each weather effect, which is largely impractical. Again a proportional plot is used
to determine whether there are any trends or patterns across the data as a whole, as
a frequency plot is ineffective for such large ranges. Even with a proportional
plot, most goal differences beyond +3 and -3 are not visible because the number of
games featuring such scores is very low.
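The relationship between goal difference and the Home/Away/Draw result described above can be written as a small function. This is an illustrative sketch that assumes goal difference is home goals minus away goals, which matches the sign convention in the text; the function name and the example scorelines are hypothetical.

```python
def result_from_goal_difference(goal_difference: int) -> str:
    """Map a home-minus-away goal difference to 'H', 'A' or 'D'."""
    if goal_difference > 0:
        return "H"  # home win
    if goal_difference < 0:
        return "A"  # away win
    return "D"      # draw


# Example scorelines as (home goals, away goals); illustrative only.
for home, away in [(3, 1), (0, 0), (1, 4)]:
    gd = home - away
    print(f"{home}-{away}: GD = {gd:+d}, result = {result_from_goal_difference(gd)}")
```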

Figure 27: Goal Difference (GD) vs Apparent Temperature and Rainfall

Apparent temperature indicates more draw results (GD=0) towards lower temperatures
and fewer home wins, as per the Home/Away/Draw results. There appears to be a slight
increase in away wins by margins of 3 and 4 goals (GD of -3 and -4) at very low
temperatures. Rainfall again shows interference for heavy and violent rainfall but,
as 99.6% of all matches played are within the four left categories, no clear trend
can be inferred given the very low sample size.

Figure 28: Goal Difference vs Wind speed and Humidity

Wind speed in figure 28 shows the same interference at BS6 and above, while at lower
speeds the number of draw results increases slightly as wind speed increases. There
is no clear trend for humidity.

Figure 29: Goal Difference vs Cloud Cover and calendar month

Figure 29 indicates no trend or pattern between GD and cloud cover or month.

Figure 30: Goal Difference vs season and Region

Figure 30 indicates no significant trends or patterns in goal difference by season
or region. On the whole, the graphical representation of goal difference as a more
detailed expression of the Home/Away/Draw result did not yield any useful
information beyond what those results already provide.

4.2.7 Home/Away Win or Draw

For the data as a whole there are 6074 Home wins (47%), 3395 Away wins (26.3%) and
3457 Draws (26.7%). The Home/Away or Draw results (HAD) are represented as
proportion plots because the number of matches played varied so widely between
groups that frequency plots made any visual analysis unfeasible.
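A proportion plot simply rescales the counts within each group so that they sum to one. The sketch below applies that normalisation to the overall counts quoted above; the helper name is illustrative, and the same function could be applied to the counts within any weather category.

```python
def proportions(counts):
    """Convert a mapping of outcome counts into shares that sum to 1."""
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


# Overall counts quoted in the text.
overall = {"Home": 6074, "Away": 3395, "Draw": 3457}
shares = proportions(overall)
for outcome, share in shares.items():
    print(f"{outcome}: {share:.1%}")
```

The three counts sum to 12,926 matches, and the resulting shares agree with the percentages quoted in the text.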

Figure 31: HAD vs Apparent Temperature and Rainfall

Apparent Temperature in figure 31 (left) shows a slight decrease in home win

advantage as temperature decreases. Away wins don’t increase however and the

difference is made up by draw results. Rainfall results show a fairly stable HAD

outcome except for the last two ranges which have very low sample sizes.

Figure 32: HAD vs Wind Speed and Humidity

Wind in figure 32 (left) shows a slight decrease in home wins as wind speed

increases from BS2 to BS5, after which there is significant interference. Home wins

appear to be benefitting but the small sample size precludes drawing any

conclusions. Humidity indicates a stable pattern of HAD results excluding

categories three and four, which have low sample sizes.

Figure 33: HAD vs Cloud Cover and Calendar month

Figure 33, for cloud cover and calendar month, shows no variation in HAD results.

Figure 34: HAD vs season and Region

Both graphs in figure 34 show no significant changes in HAD result.

Home, Away or Draw results do not seem to be affected by weather effects overall.
There is some evidence that falling temperature reduces home wins and increases
draws slightly, and that increasing wind speed also reduces home wins. However, the
effect is very small. As with previous results, the low sample size for the extreme
ranges of wind and rainfall prevents any conclusions being drawn.

4.3 Altitude Effects

Altitude, while not a direct weather effect, does affect player performance through
a reduction in atmospheric pressure. It is therefore an external environmental
effect which could be considered an indirect weather effect.

Figure 35: Goal Outcome against Altitude

Mean goals scored trend very slightly downward with altitude and drop to just 2.38
where the altitude is 500m or over. The OU metric shows a greater skew towards under
results at 59%. There have been 323 games played at 500m or more, at two stadia,
which is a reasonable sample size. Pressure tends to be constant, so there is less
risk that the recorded conditions did not coincide with the match itself. The issue
with these results is that they are driven by individual team performance.
Unterhaching and Augsburg are the residents at these stadia and, based on figure 7,
have goal averages of 2.25 and 2.58 respectively. Both are very low scoring teams
overall, and this is what is being represented rather than the effects of altitude.

4.4 Scatter plots and correlation

In looking for any linear relationship and correlation, the dependent variable Total
Goals is used against the five numeric independent variables of temperature (°C),
wind speed (m/s), humidity (%), cloud cover (oktas) and rainfall (mm). The addition
of the B1 and B2 variable shows how the two leagues differ from each other. The
output results are shown in figure 36 below.

Figure 36: Pairs and correlation analysis of numeric data

Overall there appears to be no real correlation between any of the weather effects
and total goals scored, with temperature having the strongest correlation at just
0.05. This is supported by the goal outcome analysis, which showed only small
changes overall and at extreme ranges, and no obvious trends or patterns.

Figure 37: Scatter plot of total goals against Daily average temperature

The lack of relationship is highlighted by the scatter plot in figure 37 above,
which shows an almost flat linear regression line between temperature and Total
Goals and a correlation value of just 0.0527. A multiple linear model using all the
available weather values was also created and tested for both Total Goals and Goal
Difference. The results indicated that no practically meaningful relationship exists
between them.
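The quoted correlation can be translated into a share of variance explained and a t statistic, which helps show why an effect can be statistically detectable yet practically negligible in a data set of this size. This is a sketch under one stated assumption: that all 12,926 matches entered the correlation, which the text does not confirm.

```python
import math

r = 0.0527   # Pearson correlation quoted in the text
n = 12926    # ASSUMPTION: every match in the data set entered the correlation

r_squared = r ** 2
# Standard t statistic for testing a zero correlation, on n - 2 degrees of freedom.
t_stat = r * math.sqrt((n - 2) / (1 - r_squared))

print(f"Share of variance explained: {r_squared:.2%}")
print(f"t statistic against zero correlation: {t_stat:.1f}")
```

Under that assumption temperature explains roughly 0.3% of the variance in total goals, even though the large sample makes the t statistic sizeable.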

4.5 Data Mining and Predictive Modelling

Although the results indicate there is no relationship between goal outcome and
weather, data mining techniques could allow for knowledge discovery and potentially
even predictive modelling. An approach based on a Naïve Bayes algorithm was selected
as it uses data from previous events to try to predict future outcomes. The event
being predicted is the OverUnder2.5 goals variable, using the feature-engineered
weather factors Rain, HUscale, CC, ATempScale and BScale. These are all factorised
for use within the algorithm. Naïve Bayes assumes that all of the features in the
data set are of equal importance and independent of one another, which can make it
good at detecting potentially weak effects, as could be likely in this instance.

The data set is split at random into an 80% training set and a 20% test set, to
allow the algorithm to learn any rules and then apply them in predicting goal
outcome on the test set. On the 2586 row test set, 1366 results were correctly
predicted, equating to a 52.8% success rate. As the Over Under metric is essentially
an even spread, this indicates that the Naïve Bayes predictive model using weather
variables is potentially no better than making a guess, where the probability of
getting a right result is essentially 50/50. An output of the Naïve Bayes
probabilities is provided in Appendix 8.5.
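The split sizes and the success rate can be checked against a simple baseline. The sketch below assumes the training set is 80% of the rows rounded down, and that the random test set has roughly the same over/under balance as the full data set; neither detail is stated in the text.

```python
total_matches = 12926  # 6818 over + 6108 under, from section 4.2.5

# ASSUMPTION: the training set is 80% of the rows, rounded down.
train_rows = total_matches * 80 // 100
test_rows = total_matches - train_rows

correct = 1366                 # correct predictions quoted in the text
accuracy = correct / 2586      # test-set size quoted in the text

# Baseline: always predict "over". ASSUMPTION: the test set has about the
# same over/under balance as the full data set.
baseline = 6818 / total_matches

print(f"Training rows: {train_rows}, test rows: {test_rows}")
print(f"Naive Bayes accuracy: {accuracy:.1%}")
print(f"Always-over baseline: {baseline:.1%}")
```

On these assumptions the model's 52.8% is effectively the same as the 52.7% obtained by always predicting an over result, which supports the conclusion that the weather features add no predictive value.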

4.6 Analysis Conclusion

The study has shown that mean goals scored are in fact reduced slightly as
temperature drops, which is reflected seasonally, with much colder apparent
temperatures of below -10°C seeing a slight further drop. While this is
statistically significant, the overall change in mean goals is not large enough from
a practical perspective to have any meaningful impact. Humidity, cloud cover,
rainfall and wind speed appear to have no measurable effect on goal outcome,
although wind and to some extent rainfall show interference at the extreme limits,
which cover just 0.4% of the total matches played. These extreme events are too
rare, and the sample of matches too small, to infer any specific pattern or trend,
although very high wind speeds seem to increase goals scored and favour home wins.
Realistically, the most we can infer is that there is potentially an interference
effect.

Goal difference, Over Under2.5 and Home/Away/Draw outcomes were all generally
consistent in that they showed no overall trend for weather effects. May showed a
higher share of over results at 59%, which is a noticeable shift from the 52.7%
average. Altitude, as an indirect environmental factor, did indicate that matches
played at 500m and above skewed the average goals and OU2.5 metric, but this was
attributed to the two resident teams being very low scoring overall.

Predictive modelling using the dataset or data mining techniques is not possible

due to the lack of relationship between goal outcome and weather effects. No

pattern or trend could be extracted from the data using a Naïve Bayes algorithm.

5 Conclusions

5.1.1 Introduction

The aim of the study was to establish if there is any relationship between weather

effects including temperature, humidity, wind speed, rainfall and cloud cover

against football match goal outcome within the German Bundesliga 1 & 2 leagues

over the last 21 years. The sports industry has been shown to be an economically

important part of western economies as outlined in chapter 1 and teams and

betting product providers are seeking to gain competitive advantage wherever

possible. The study seeks to redress the lack of knowledge with respect to

understanding how or if weather affects goal outcome in football matches. Existing

knowledge in this area cannot answer with certainty to what extent weather

variables affect match outcome or which weather factors play a role. The study
seeks to answer one primary question:

“Is there any relationship between weather effects and goal outcome for football

matches played within the Bundesliga 1 & 2?”

Supporting this, a number of secondary research questions were also asked, as
outlined in section 1.2.2. A summary of the findings is as follows:

Is there any difference between B1 and B2 due to weather effects?

The analysis shows that goal distributions are almost identical between the two
leagues. While B2 has a slightly lower overall average goal score, there is no
discernible pattern or trend to indicate that the second tier is affected any
differently by weather effects than the first tier.

Is goal outcome affected by just one, or multiple weather variables?

Goal outcome does not appear to be directly affected by any weather variable.

Slight effects are noted for reduced goal averages with temperature, reduced

home wins with increasing wind speed and higher over results in May. Multiple

linear regression modelling indicates no interaction effects between the variables.

Does regional weather affect games played in that area and goal outcome?

Region doesn’t appear to affect goal outcome. Although average goals showed a
greater differential between the B1 and B2 leagues, all other goal outcomes showed
no change due to region. The OU2.5 spread was found to be uneven, with the
probability of an ‘under’ score increasing to 64%.

Can the goal outcome of future matches be better predicted using selected

betting instruments and weather factors?

There is nothing to suggest that the four goal outcomes can be predicted from
weather factors. High winds and high rainfall showed interference effects, such that
wind speeds of BS6, 7 and 8 indicated a probability of around 0.7 of an over result
and probably a home win, but the sample size is too low (less than 0.4% of matches)
to say this with any certainty. Analysis using the Naïve Bayes algorithm to predict
the Over Under result was unable to make predictions beyond a 52.8% accuracy level,
which is no better than a random guess.

Do some seasons, months or time periods see goal outcome affected more

due to weather effects?

Seasonally there was a small but measurable drop in average goals scored moving from
autumn into winter. This was matched by a flattening of the Over Under2.5 goals
scored to an almost even spread. However, the effect was slight and in practice has
limited application and impact.

Can goal outcome be better predicted using combined weather indices such

as apparent temperature?

The use of apparent temperature has the effect of extending the temperature range
compared to using air temperature alone. However, there is nothing to suggest that
apparent temperature offers any significant benefit over temperature alone, and no
predictive advantage can be gained from using it based on its use in the study.

Is there any relationship between weather effects and goal outcome for

football matches played within the Bundesliga 1 & 2?

Aside from several small effects as outlined above, and some possible interference
effects for extreme events, there is no relationship between goal outcome and
weather effects. The null hypothesis (see section 1.2.2) is therefore not rejected.

5.1.2 Theoretical Implications

Based on the results of this analysis, the theoretical assumptions behind the effect
that weather has on football matches, which were explored in section 2, should be
re-examined (Kuper and Syzmanski, 2012). Claims that weather, and in particular cold
weather, detrimentally affects goal outcome (British Weather Services, 2013) are not
really supported by a more extensive analysis across the 12,926 matches considered
in this study. Recent studies (Perry, 2004) suggested that wind and rain have a
significant impact while temperature does not. If anything, this study has found the
opposite to be true, with temperature having a small effect overall and wind and
rain only having an interference effect, if at all, at the upper extreme limits.
These events are rare, at less than 0.4% of matches played, and due to the small
sample sizes are not reliable enough to draw patterns or trends from. However, the
use of internal factors such as betting odds in conjunction with rainfall (Kickdex,
2014) could yield additional information not considered within this study. Other
studies have made generalist statements suggesting that ‘most sports’ are affected
by humidity, temperature and wind (Pezzoli, Cristofori, Moncalero, Giacometto and
Boscolo, 2013), but again for the climatic region of Germany these factors almost
certainly play no role in affecting goal outcome and overall performance. Thornes’
(1977) suggestion that colder conditions are a hazard to goalkeepers in maintaining
performance is also not borne out by the lower goals scored in colder conditions,
although the effect on players’ extremities overall could be a factor in the lower
than average goals scored during colder months.

Overall the existing body of knowledge has perhaps been too general in its approach
and has lacked the analytical methods to quantify such claims, potentially
overplaying the effect weather really has on sports like football. This could be
avoided by being specific about the sport being analysed and limiting the study to a
specific geographic region to ensure accuracy. Almost certainly a larger sample size
is required to better understand the effects of high wind and rain.

5.1.3 Conclusion

Despite ongoing interest and opinion on the effects of weather on football matches

there does not appear to be any relationship between weather effects and football

match goal outcome. Temperature effects are minimal and while they do reduce

average goals scored in colder conditions the effect is not large enough to be

practically significant and could be attributed to external factors. All other variables

also don’t appear to have any measurable practical impact other than several slight

and often localised effects which are too small or rare in occurrence to draw any

meaningful conclusions except to say they potentially have an interference effect.

The weather it seems does not kill goals and football as a sport within the

Bundesliga 1 & 2 Leagues seems to be unaffected by weather effects overall.

6 Further development or research

The study undertaken was limited to a single European country and to a specific time
period. The project could be developed to include a greater range of countries, both
within and beyond Europe, and a longer time period of matches played. Match data is
available for the Bundesliga (Fussball Archiv, 2014) for even earlier time periods,
although this would have required web scraping to obtain. By including countries
with a warmer or colder climate, such as Greece or Norway, the impact on teams of
playing in overseas temperature regimes and climates they are unaccustomed to could
be investigated through tournaments such as the World Cup, the European Champions
League and the Euros. Some weather variables provided by the ECA&D (2014) were
unused, such as snow depth, wind direction, sunshine duration and pressure, although
based on the variables that were used it is not clear whether these would yield any
more useful results.

The inclusion of a stadium factor could help investigate the effects stadiums have

as they range from almost totally open fields to being fully enclosed venues with

roofs. This adds further complexity however in data preparation as stadiums have

changed over time due to refurbishment and teams moving.

As part of a broader and long term study the installation of dedicated weather

stations at every major stadium in Europe would allow for the recording of

continuous weather data at the specific point at which matches are played. As

many stadia are also used for athletics events this would also potentially provide

additional study in these areas of sport as well and help broaden the research base

being considered. By measuring some parameters such as wind speed outside the
stadium as well as at pitch level, the extent to which stadia mitigate weather
factors such as wind could be better understood.

7 References

Anderson, C., and Sally, D. (2013) The Numbers Game: Why everything you know

about football is wrong. Penguin Books.

BBC (2014) ‘Football Betting – The global industry worth Billion.’ [Online]. BBC.

Available at: http://www.bbc.com/sport/0/football/24354124 [Accessed 29th May

2014].

British Weather Services (2013) ‘Cold Kills Goals – The stats’ [Online]. British Weather Services. Available from: http://www.britishweatherservices.co.uk/cold-kills-goals-the-stats/ [Accessed 15th November 2014].

Advanced Football Analytics (2014) ‘Temperature and Field Goals’, Advanced Football Analytics. Available from: http://www.advancedfootballanalytics.com/index.php/home/research/weather/165-temperature-and-field-goals [Accessed 18th December 2014].

Carré, M. J., Asai, T., Akatsuka, T., and Haake, S. J. (2002) ‘The curve kick of a

football II: flight through the air’. Sports Engineering, 5(4): 193-200.

Cran (2014) ‘Amelia II: A Program for missing data’ [Online]. Cran r-project.

Available from: http://cran.r-project.org/web/packages/Amelia/index.html

[Accessed 15th November 2014].

Deloitte (2014) ‘Premium Blend: A review of football finance.’ [Online]. Deloitte. Available from: http://www.deloitte.com/assets/Dcom-Italy/Local%20Assets/Documents/Pubblicazioni/uk-sbg-annual-review-of-football-finance-2014.pdf [Accessed 15th November 2014].

Domingos, P. (2012) ‘A few useful things to know about machine learning’.

Communications of the ACM, 55(10): 78-87.

ECA&D (2014) ‘European Climate Assessment & Dataset Project’ [Online].

European Climate Assessment & Dataset Project. Available from:

http://eca.knmi.nl/ [Accessed 20th September 2014].

Encyclopedia Britannica (2014) ‘Germany - Climate’ [Online]. Encyclopedia

Britannica. Available from:

http://www.britannica.com/EBchecked/topic/231186/Germany/57996/Climate

[Accessed 28th May 2014].

European Commission (2012) ‘Study on the Contribution of Sport to Economic Growth and Employment in the EU.’ [Online]. European Commission. Available from: http://ec.europa.eu/sport/library/studies/study-contribution-spors-economic-growth-final-rpt.pdf [Accessed 1st June 2014].

Football Pools (2014) ‘The Pioneers of Football Pools’ [Online]. Football Pools.

Available from:

http://www.footballpools.com/cust?action=GoHelp&help_page=about_us

[Accessed 1st June 2014]

Fussball Archiv (2014) ‘Das Deutsch Fussball Archiv’ [Online]. Fussball Archiv.

Available from: http://www.f-archiv.de/ [Accessed 15th October 2014].

Gelade, G.A., and Dobson, P. (2007) ‘Predicting the Comparative Strengths of National Football Teams’. Social Science Quarterly, 88(1): 244-258.

Greenhough, J., Birch, P. C., Chapman, S. C., & Rowlands, G. (2002) ‘Football

goal distributions and extremal statistics’. Physica A: Statistical Mechanics and its

Applications, 316(1): 615-624.

Hamilton, H. (2014) ‘Does the cold really kill Goals?’ Howard Hamilton Blog, 1st May. Available from: http://www.soccermetrics.net/league-competitions/temperature-vs-goals-study-premier-league [Accessed 24th May 2014].

Hong, Y. (ed.) (2014) Routledge Handbook of Ergonomics in Sport and Exercise. New York: Routledge.

Kickdex (2014) ‘Does rain level the playing field?’, Kickdex Blog, 28th November. Available from: http://blog.kickdex.com/post/68368668405/does-rain-level-the-playing-field [Accessed 30th November 2014].

Kraft, M. D., & Skeeter, B. R. (1995) ‘The effect of meteorological conditions on fly

ball distances in North American Major League Baseball games’. The

Geographical Bulletin, 37(1): 40-48.

Kuper, S., and Szymanski, S. (2012) Soccernomics. Philadelphia: Nation Books

Lewis, T. (2014) ‘How computer analysts took over at Britain’s top football clubs’ [Online]. The Guardian, 9th March. Available from: http://www.theguardian.com/football/2014/mar/09/premier-league-football-clubs-computer-analysts-managers-data-winning [Accessed 28th May 2014].

MECN (2012) ‘The German Gambling and betting market’ [Online]. MECN. Available from: http://www.mecn.net/German_Betting_and_Gambling_Market-Report_Summary.pdf [Accessed 5th November 2014].

McGarry, T., O’Donoghue, P., and Sampaio, J. (eds) (2013) Routledge Handbook

of Sports Performance Analysis. New York: Routledge

O'Donoghue, P., and Holmes, L. (2014) Data Analysis in Sport. New York:

Routledge

Paddy Power (2014) ‘Stephen Hawking Exclusive: The maths that show us how England can triumph in the world cup’ [Online]. Paddy Power. Available from: http://blog.paddypower.com/2014/06/18/stephen-hawking-exclusive-the-maths-that-show-us-how-england-can-win-the-world-cup/ [Accessed 10th December 2014].

Perry, A. (2004). Sports tourism and climate variability. Advances in Tourism Cli.

Peters, D. M., and O'Donoghue, P. (Eds.). (2013) Performance analysis of sport

IX. New York: Routledge

Pezzoli, A., Cristoforu, E., Moncalero, M., Giacometto, F., and Boscolo, A. (2013)

‘Climatological Analysis, Weather Forecast and Sport Performance: Which are the

Connections?’, Journal Climatol Weather Forecasting 1: e105

PKR. (2014) ‘Under / Over Betting’ [Online]. PKR. Available from:

http://bet.pkr.com/en/get-started/bet-types/under-over/ [Accessed 28th May 2014].

R Studio (2014) R studio. [Online]. Rstudio. Available from:

http://www.rstudio.com/ [Accessed 1st September 2014]

Riley, T., Williams, A.M. (eds.) (2003) Science and Soccer. 2nd Edition. London:

Routledge.

Rubel, F., and Kottek, M. (2010) ‘Observed and projected climate shifts 1901–

2100 depicted by world maps of the Köppen-Geiger climate classification’.

Meteorologische Zeitschrift, 19(2): 135-141.

Steadman, R.G. (1984) ‘A universal scale of apparent temperature’. J. Appl.

Meteor., 23, 1674-1687.

Stone, P.H., and Carlson, J.H. (1979) ‘Atmospheric Lapse rates regimes and

their parametrization’. Journal of the Atmospheric Sciences, 36(3): 415-423.

Szucs, A., Allard, F., and Moreau, S. (2009) ‘Open Stadium Design Aspects for

cold climates’. PLEA2009 - 26th Conference on Passive and Low Energy

Architecture, Quebec City, Canada, 22-24.

Thornes, J. E. (1977) ‘The effect of weather on sport’. Weather, 32(7): 258-268.

Wiart, N., Kelley, J., James, D., and Allen, T. (2011) ‘Effect of temperature on the dynamic properties of soccer balls’. Journal of Sports Engineering and Technology, 225.

Witten, I. H., Frank, E., and Hall, M.E (2011) Data Mining: Practical machine

learning tools and techniques. 3rd ed. Morgan Kaufmann.

World Stadiums (2014) ‘World Stadiums’ [Online]. World Stadiums. Available from:

http://www.worldstadiums.com/ [Accessed 15th September 2014]

Yuan, Y. C (2010) ‘Multiple imputation for missing data: Concepts and new

development (Version 9.0)’. SAS Institute: Rockville.

8 Appendix

8.1 Glossary of Terms

App. Temperature: Apparent Temperature is an index that combines air temperature, humidity and wind speed to provide an equivalent ‘feels like’ temperature.

CSV: Comma Separated Value file format.

Goal Difference: The home goals scored minus the away goals scored. This returns a positive value where the home team wins and a negative value where the away team wins.

Goal Outcome: For the purpose of the study, Goal Outcome refers to one of four measurements: Under/Over 2.5 Goals, Goal Difference, Home/Away Win or Total Goals.

Home/Away Win: Either the home team wins, the away team wins or it is a draw.

Naïve Bayes: A data mining algorithm that is used to classify and predict based on probabilities.

PCA: Principal Component Analysis is a technique used to reduce the number of variables to a smaller number of components which account for the largest amount of variance.

UEFA: Union of European Football Associations, the administrative body for association football in Europe.

Under/Over 2.5: A commonly used betting instrument which allows the customer to bet that the total goals scored within a match will be less than or greater than 2.5. As average goals tend to be around 2.8, this is typically regarded as an even spread bet. The result is binary and will be either under or over.

R Studio: An open source development environment for the R statistical language, used to obtain knowledge from data sets and provide a range of graphical and tabular outputs.

Total Goals: The total goals scored in a match is the home team goals plus the away team goals.
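
The ‘even spread’ remark in the Under/Over 2.5 entry can be checked with a rough calculation. The sketch below is written in Python for illustration (the study itself used R Studio) and assumes that total goals per match follow a Poisson distribution with a mean of 2.8. That model is an assumption made here, not something the study states, and the function name prob_under is hypothetical.

```python
import math


def prob_under(line, mean_goals):
    """P(total goals < line) for a half-goal line such as 2.5,
    under an ASSUMED Poisson model for total goals."""
    max_goals = math.floor(line)  # a 2.5 line means 0, 1 or 2 goals is "under"
    return sum(
        math.exp(-mean_goals) * mean_goals ** k / math.factorial(k)
        for k in range(max_goals + 1)
    )


p_under = prob_under(2.5, 2.8)
p_over = 1.0 - p_under
print(f"under: {p_under:.3f}, over: {p_over:.3f}")
```

Under these assumptions the split is roughly 47% under to 53% over, which is close to, but not exactly, even.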

8.2 Geography

8.2.1 Map of Europe and Germany

Germany, shown above in orange, is part of the European Union (EU).

Image Source: http://wrm.org.uy/wp-content/uploads/2012/12/map-europe-germany.png

8.2.2 Map of Germany

Germany Map showing primary cities, towns and major topography like mountains

and plains.

Source: http://www.worldatlas.com/webimage/countrys/europe/lgcolor/decolor.gif

8.2.3 States of Germany (and created Regions)

Germany has 16 distinct states, three of which (Berlin, Bremen and Hamburg) are city states. These were simplified by combining them into four regions, as shown above.

Coastal and NW Region: Bremen, Niedersachsen, Hamburg, Schleswig-Holstein, Mecklenburg-Vorpommern

East Region: Brandenburg, Berlin, Sachsen, Thüringen

South East: Bavaria

West: North Rhine-Westphalia, Rhineland-Palatinate, Hessen, Saarland, Baden-Württemberg

Source: http://www.itcwebdesigns.com/tour_germany/map_german_states.gif

Map legend: Coastal & NW, East, South West, West

8.2.4 Weather Station & Stadium Locations

Map produced in R Studio showing all 73 stadiums for Bundesliga 1 & 2 and all 84 candidate weather stations that record every weather variable, prior to matching each stadium to its nearest station. Some weather stations are at the map edges or out at sea and will not be of any use. Overall there is a good match between the two sets of locations, although this assumes that all weather stations are viable at this stage. The Dusseldorf area, shown within the red dashed box, is shown enlarged in the next map as a point of interest.

A zoomed map showing the Dusseldorf area. This area has the highest density of stadiums and teams linked to a single weather station: 6,716 of all games played (52%) are linked to weather station ID 479. One of the project risks was that a station like this might prove unusable, which could have jeopardised the entire study.

8.2.5 Final weather station and Stadium locations

Map showing the final 32 weather stations with useable observation data for all

five weather variables and all 73 stadiums for both Bundesliga 1&2.

8.3 Data Sets

8.3.1 Football Data

Example of a raw data file for Bundesliga 1 results for the 2002/2003 season.

Football-Data (2014) provides a single file in CSV format for each season of play and for each division separately. Data is available from 1993 onwards. There are typically 306 matches per season, based on 18 teams participating in each league. Most files contain 52 columns of data covering the date, full-time results, half-time results, where the game was played and a variety of betting information. For earlier years not all of this information was recorded, so the files are inconsistent in their structure and layout. Twenty-one seasons of matches across two leagues give 12,926 games played in total. This is slightly higher than the 12,852 (42 × 306) that 306 games per season would give, as some years featured 20 teams in a season, resulting in more games being played. This data is spread across 42 CSV files. The first six columns are consistent across all 42 files, and all four goal outcome metrics can be calculated from them.

Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR
D1,09/08/2002,Dortmund,Hertha,2,2,D,2,1,H
D1,10/08/2002,Cottbus,Leverkusen,1,1,D,0,0,D
D1,10/08/2002,M'gladbach,Bayern Munich,0,0,D,0,0,D
D1,10/08/2002,Nurnberg,Bochum,1,3,A,0,2,A
D1,10/08/2002,Schalke 04,Wolfsburg,1,0,H,0,0,D
D1,10/08/2002,Stuttgart,Kaiserslautern,1,1,D,1,0,H
D1,11/08/2002,Bielefeld,Werder Bremen,3,0,H,1,0,H
D1,11/08/2002,Hamburg,Hannover,2,1,H,0,1,A
D1,14/08/2002,Munich 1860,Hansa Rostock,0,2,A,0,1,A
D1,17/08/2002,Bayern Munich,Bielefeld,6,2,H,3,0,H
D1,17/08/2002,Bochum,Cottbus,5,0,H,3,0,H
D1,17/08/2002,Hannover,Munich 1860,1,3,A,1,2,A
D1,17/08/2002,Hansa Rostock,Nurnberg,2,0,H,1,0,H
D1,17/08/2002,Hertha,Stuttgart,1,1,D,0,1,A
D1,17/08/2002,Kaiserslautern,Schalke 04,1,3,A,1,0,H
D1,17/08/2002,Leverkusen,Dortmund,1,1,D,1,0,H
D1,18/08/2002,Werder Bremen,Hamburg,2,1,H,1,1,D
D1,18/08/2002,Wolfsburg,M'gladbach,1,0,H,0,0,D
D1,24/08/2002,Bielefeld,Wolfsburg,1,0,H,1,0,H
D1,24/08/2002,Cottbus,Hansa Rostock,0,4,A,0,2,A
D1,24/08/2002,Dortmund,Stuttgart,3,1,H,1,0,H
D1,24/08/2002,Hamburg,Bayern Munich,0,3,A,0,1,A
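
The four goal outcome metrics can all be derived from the full-time home and away goals (FTHG and FTAG). The following is a minimal sketch in Python, given for illustration only since the study itself used R Studio. It uses three of the sample matches above, written as comma-separated text. The output column names follow the finished data set, but the function name goal_outcomes and the label "HomeWin" are assumptions.

```python
import csv
import io

# Three matches taken from the sample above, written as comma-separated text.
SAMPLE = """Div,Date,HomeTeam,AwayTeam,FTHG,FTAG
D1,09/08/2002,Dortmund,Hertha,2,2
D1,10/08/2002,Nurnberg,Bochum,1,3
D1,10/08/2002,Schalke 04,Wolfsburg,1,0
"""


def goal_outcomes(home_goals, away_goals):
    """Derive the four goal outcome metrics from full-time goals."""
    total = home_goals + away_goals
    diff = home_goals - away_goals  # positive: home win, negative: away win
    if diff > 0:
        result = "HomeWin"  # assumed label, not confirmed by the source
    elif diff < 0:
        result = "AwayWin"
    else:
        result = "Draw"
    return {
        "TotalGoals": total,
        "GDiff": diff,
        "OverUnder2.5": "Over" if total > 2.5 else "Under",
        "H_A_Win": result,
    }


matches = []
for row in csv.DictReader(io.StringIO(SAMPLE)):
    outcome = goal_outcomes(int(row["FTHG"]), int(row["FTAG"]))
    matches.append({**row, **outcome})

for m in matches:
    print(m["HomeTeam"], "v", m["AwayTeam"], m["TotalGoals"],
          m["GDiff"], m["OverUnder2.5"], m["H_A_Win"])
```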

8.3.2 Weather Observation Data

Examples of the five historic weather observation data sets are provided below. The European Climate Assessment & Dataset Project (ECA&D, 2014) provides data for weather stations across Europe. The five variables are temperature, humidity, rainfall, wind speed and cloud cover.

Temperature

Historic weather data example for Germany for Daily Mean Temperature

The above data sample is taken from station number 494 (Augsburg) for mean daily temperature. The text files are comma delimited and provide (from left to right) station number, source identifier, date (yyyy/mm/dd), temperature and quality code. Each location file contains around 25,591 lines of data. The temperature is recorded in units of 0.1 degrees Celsius, so each value must be divided by 10 to read correctly. For example, the first line of data above, for the 28th March, records a daily mean temperature of 6.5 degrees Celsius with no known errors or missing data. Temperatures below freezing are identified with a minus symbol (none shown in the example above).
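
Reading one observation line and applying the unit scaling can be sketched as follows. This is an illustrative Python sketch, not the code used in the study, which was written in R Studio. The example line, its source identifier and its year are hypothetical. The missing-value marker of -9999 is an assumption about the ECA&D files that should be checked against the file headers.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MISSING = -9999  # ASSUMED missing-value marker; check the file header


@dataclass
class Observation:
    station_id: int
    source_id: int
    day: date
    value: Optional[float]  # None when the raw value is marked missing
    quality: int


def parse_record(line, divisor=10):
    """Parse 'station, source, date, value, quality' and rescale the value.

    Use divisor=10 for variables stored in tenths (temperature, rainfall,
    wind speed) and divisor=1 for variables stored as-is (humidity, cloud).
    """
    staid, souid, raw_date, raw_value, quality = [
        field.strip() for field in line.split(",")
    ]
    digits = raw_date.replace("/", "")  # accepts yyyy/mm/dd or yyyymmdd
    day = date(int(digits[0:4]), int(digits[4:6]), int(digits[6:8]))
    raw = int(raw_value)
    value = None if raw == MISSING else raw / divisor
    return Observation(int(staid), int(souid), day, value, int(quality))


# Hypothetical lines: a raw temperature of 65 means 6.5 degrees Celsius.
temperature = parse_record("494, 100001, 2002/03/28, 65, 0")
wind = parse_record("494, 100002, 2002/04/01, 15, 0")
gap = parse_record("494, 100002, 2002/04/02, -9999, 9")
print(temperature.value, wind.value, gap.value)
```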

Humidity

Historic weather data example for Germany for Daily Humidity levels

The above ECA&D (2014) data sample is taken from station number 494

(Augsburg) for daily humidity. The text files are comma delimited and provide (from

left to right) station number, source identifier, date (yyyy/mm/dd), humidity in

percent and quality code.

Rainfall

Historic weather data example for Germany for Daily precipitation levels.

The above ECA&D (2014) data sample is taken from station number 494 (Augsburg) for daily precipitation. The text files are comma delimited and provide (from left to right) station number, source identifier, date (yyyy/mm/dd), precipitation in units of 0.1 mm and quality code.

Wind Speed

Historic weather data example for Germany for Daily mean wind speed.

The above ECA&D (2014) data sample is taken from station number 494 (Augsburg) for daily average wind speed. The text files are comma delimited and provide (from left to right) station number, source identifier, date (yyyy/mm/dd), average wind speed in units of 0.1 m/s and quality code. The actual wind speed is the above figure divided by 10. For example, the first value for April shown above would be 1.5 m/s.

Cloud Cover

Historic weather data for Germany for Cloud Cover.

The above ECA&D (2014) data sample is taken from station number 494 (Augsburg) for daily average cloud cover. The text files are comma delimited and provide (from left to right) station number, source identifier, date (yyyy/mm/dd), cloud cover in oktas and quality code. The okta scale provides a measure of cloud cover from 0 to 8 according to the portion of sky covered. Zero represents a totally clear sky while 8 would be totally overcast.

8.3.3 Stadium Data

Four online databases were used, as shown below with an example of the structure and format of the information. Three were stadium databases; the fourth was Google Maps, used for checking stadium names where geocoding indicated an error. German stadium names can change frequently to reflect sponsors.

Stadium Guide.com

Contains 34 listings of present day stadia currently in use.

Source: http://www.stadiumguide.com/present/germany/

World Stadiums.com

Lists 523 stadiums organised by state location. Includes all types of stadia

including football.

Source: http://www.worldstadiums.com/europe/countries/germany.shtml

Stadium Database.com

Contains listings for all current Bundesliga 1 & 2 clubs and a large number of other

stadia

Source: http://stadiumdb.com/stadiums/ger

Google Maps.ie

Large online mapping tool with all current stadiums listed.

Source: https://www.google.ie/maps

8.3.4 Weather Station Data

Example of weather station list; Average Daily Temperature stations shown.

There is one station list for each weather variable, as not every station records every weather variable. Each list is a comma delimited text file which can be downloaded directly from the ECA&D (2014) website using R Studio. Lists contain all European stations, which are identified through two-letter ISO 3166 country codes. The Station ID (STAID) is the unique identifier that allows the observation data to be matched to each weather station. Latitude, longitude and altitude are also provided. Altitude is critical to ensure that each weather station and stadium are at comparable heights.
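
Matching each stadium to its nearest station from these coordinates can be sketched as below. This is an illustrative Python sketch rather than the R Studio distance matrix code used in the study. The haversine great-circle formula, the optional altitude filter, the function names and the three station records are all assumptions made for the example. Only the stadium coordinates are taken from the first row of the finished data set.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(stadium, stations, max_alt_diff_m=None):
    """Return (staid, distance_km) of the closest station, optionally
    ignoring stations whose altitude differs too much from the stadium."""
    candidates = [
        s for s in stations
        if max_alt_diff_m is None
        or abs(s["alt"] - stadium["alt"]) <= max_alt_diff_m
    ]
    if not candidates:
        return None

    def distance(s):
        return haversine_km(stadium["lat"], stadium["lon"], s["lat"], s["lon"])

    best = min(candidates, key=distance)
    return best["staid"], distance(best)


stadium = {"lat": 49.5, "lon": 8.5, "alt": 96}
stations = [  # hypothetical stations for illustration only
    {"staid": 1, "lat": 49.55, "lon": 8.5, "alt": 100},
    {"staid": 2, "lat": 50.50, "lon": 8.5, "alt": 110},
    {"staid": 3, "lat": 49.51, "lon": 8.5, "alt": 900},
]
print(nearest_station(stadium, stations))
print(nearest_station(stadium, stations, max_alt_diff_m=200))
```

Without the altitude filter the closest station is chosen on distance alone. With the filter, a nearby station at a very different height is skipped in favour of the next closest.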

8.4 Data Set Variables

The finished data set used for analysis has 12,926 rows each representing a single

match and 31 columns or variables. 14 columns were feature engineered.

> str(master)

'data.frame': 12926 obs. of 31 variables:

$ STAID: int 4012 472 3990 2763 2758 4005 3990 479 477 491 ...

$ HomeTeam: Factor w/ 73 levels "Aachen","Aalen"….

$ Div: Factor w/ 2 levels "D1","D2": 2 2 2 2 2 2 2 2 2 2 ...

$ AwayTeam: Factor w/ 74 levels "Aachen","Aalen",..

$ FTHG: int 1 4 1 0 0 0 0 3 3 2 ...

$ FTAG: int 1 0 1 0 1 1 0 1 0 1 ...

$ Stadium_Name: Factor w/ 72 levels "Allianz Arena",.

$ lat: num 49.5 54.1 50.9 48.8 52.7 ...

$ lon: num 8.5 12.1 11.58 9.19 7.3 ...

$ alt: int 96 22 149 481 21 55 318 38 57 276 ...

$ dist: num 5.22 10.71 38.59 7.7 69.38 ...

$ RR: num 24 7.9 2.5 8.9 0.3 2.2 2.5 0.3 3.3 4.9 ...

$ HU: int 94 95 75 87 82 84 75 83 81 94 ...

$ TG: num 18 16.1 17.7 17.8 16.4 16.4 17.7 17.4 16.4 16.6 ...

$ CC: int 8 6 6 8 7 5 6 5 5 8 ...

$ FG: num 2.3 5.6 6 2.9 4.9 4.5 6 4.9 4.1 3.9 ...

The following data variables were feature engineered from the raw data set above:

$ month: Factor w/ 12 levels "April","August",..

$ year: int 1993 1993 1993 1993 1993 1993 1993 1993 1993 1993 ...

$ TotalGoals: int 2 4 2 0 1 1 0 4 3 3 ...

$ OverUnder2.5: Factor w/ 2 levels "Over","Under": 2 1 2 2 2 2 2 1 1 1 ...

$ H_A_Win: Factor w/ 3 levels "AwayWin","Draw",..: 2 3 2 2 1 1 2 3 3 3 ...

$ GDiff: int 0 4 0 0 -1 -1 0 2 3 1 ...

$ BScale: int 2 4 4 2 3 3 4 3 3 3 ...

$ HUscale: int 10 10 8 9 9 9 8 9 9 10 ...

$ Rain: Factor w/ 6 levels "Heavy","Light",..: 1 3 3 3 2 3 3 2 3 3 ...

$ season: Factor w/ 4 levels "Autumn","Spring",..

$ Atemp: num 18.8 13.9 14.5 17.6 14 14.4 14.5 15.4 14.5 15.7 ...

$ Area: Factor w/ 15 levels "baden-wurttemberg",..

$ Region: Factor w/ 4 levels "Coastal & NW",..

$ ATempScale: Factor w/ 5 levels "<-10","<0","0-10",..
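
The Atemp column can be illustrated with a commonly quoted non-radiation form of the apparent temperature index. The reference list cites Steadman (1984), but the exact formula used is not stated here, so the version below is an assumption. It is written in Python for illustration (the study used R Studio), and the function name is hypothetical. The inputs are the TG, HU and FG values from the first three rows of the output above.

```python
import math


def apparent_temperature(temp_c, humidity_pct, wind_ms):
    """Apparent temperature (degrees C) from air temperature, relative
    humidity (%) and wind speed (m/s). ASSUMED non-radiation formula."""
    # Water vapour pressure in hPa, derived from relative humidity.
    vapour_pressure = (humidity_pct / 100.0) * 6.105 * math.exp(
        17.27 * temp_c / (237.7 + temp_c)
    )
    return temp_c + 0.33 * vapour_pressure - 0.70 * wind_ms - 4.00


# (TG, HU, FG) for the first three matches listed in the str() output.
rows = [(18.0, 94, 2.3), (16.1, 95, 5.6), (17.7, 75, 6.0)]
atemp = [round(apparent_temperature(tg, hu, fg), 1) for tg, hu, fg in rows]
print(atemp)
```

For these three rows the rounded results agree with the Atemp values listed above (18.8, 13.9 and 14.5), which suggests, but does not confirm, that a formula of this form was used.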

8.5 Naïve Bayes probability outcomes

Most of the probabilities are essentially even, except for specific cases at the extreme ends of each variable's range and for altitudes above 500 m.

8.6 List of Figures and Tables

Figure 1: System Architecture Design
Figure 2: Data flow diagram
Figure 3: Distance matrix Calculation Code in R Studio
Figure 4: Example of missing data for wind speed for 12 days in Sept 2001
Figure 5: Weather data before Imputation
Figure 6: Weather data after Imputation
Figure 7: Mean goals scored per match for all teams
Figure 8: Total goals scored for all teams
Figure 9: Total Goals scored per Stadium
Figure 10: Goal Distribution for B1 (top) and B2 (bottom)
Figure 11: Goal Difference distribution histogram
Figure 12: Histogram of Average Temperature
Figure 13: Histogram of Rainfall for Germany for all 21 years
Figure 14: Histogram of Cloud cover
Figure 15: Histogram of wind speed
Figure 16: Histogram of density for average temperature and season
Figure 17: Total Goals (Mean) graph for Apparent Temperature and Rainfall
Figure 18: Boxplot of Apparent Temperature ranges against Total Goals
Figure 19: Total Goals graph for Wind Speed and Humidity
Figure 20: Total Goals scored for Cloud Cover and Month
Figure 21: Mean Goals Scored by Season and Region
Figure 22: OU results against Apparent Temperature and Rainfall
Figure 23: OU against wind speed and humidity
Figure 24: OU against Cloud Cover and Calendar month
Figure 25: Mean Goals Scored by Season
Figure 26: Apparent Temperature and Wind Speed Proportion Graphs
Figure 27: Goal Difference (GD) vs Apparent Temperature and Rainfall
Figure 28: Goal Difference vs Wind speed and Humidity
Figure 29: Goal Difference vs Cloud Cover and calendar month
Figure 30: Goal Difference vs season and Region
Figure 31: HAD vs Apparent Temperature and Rainfall
Figure 32: HAD vs Wind Speed and Humidity
Figure 33: HAD vs Cloud Cover and Calendar month
Figure 34: HAD vs season and Region
Figure 35: Goal Outcome against Altitude
Figure 36: Pairs and correlation analysis of numeric data
Figure 37: Scatter plot of total goals against Daily average temperature

Figure 37: Scatter plot of total goals against Daily average temperature ........... 61

8.7 List of Tables

Table 1: Summary Overview of numerical variables

8.8 Initial Project Plan

Project Proposal

Can the weather kill goals?

The effects of weather on goal outcome for football

matches played within the German Bundesliga

Alastair Macnair, x13129325

Higher Diploma in Data Analytics

4th June 2014

Table of Contents

1. Objectives
2. Background
3. Literature Review
4. Research Question
5. Requirements Elicitation and Analysis
6. Special Resources required
7. Project Plan
8. Consultation
9. Declaration
Appendix A – Examples of the Football Data Sets
Appendix B – Examples of the Weather Data Sets
Appendix C – Map of Current Bundesliga 1 & 2 Stadium locations
Appendix D – Project Plan Gantt chart
Appendix E – Map showing Principal Regions of Germany
Appendix F – Project Proposal Revisions
References

1. Objectives

The study will seek to assess whether certain weather factors such as temperature, cloud cover, precipitation, wind and humidity have any determinable effect on the goal outcome of football matches within the Bundesliga 1 & 2 football leagues held within Germany when considered across twenty seasons of historic play. The theory is that weather conditions, in particular lower temperatures, may have a detrimental impact on goals scored, although warmer temperatures will also be considered. By linking daily historic weather data for specific weather stations with stadiums and the dates and results of matches played, it will be determined whether weather plays any role in goal outcome when considered over a significant time period.

Secondary objectives will consider whether any difference exists between the first and second league in relation to weather effects on goal outcome, and also whether any particular stadium affects goal outcome, due to its geographical location or size, for the teams that play there. The results will also be compared against a particular betting instrument, the over/under goals scored bet (PKR, 2014), to see if any meaningful predictions can be made regarding total match goals scored. Any possible lean towards an uneven spread (more results under than over due to weather factors) would be of particular interest to football teams, coaches, trainers and, in particular, those companies that provide such betting products.

Summary of all objectives

Objective #1: Determine if there is any link between goals scored and weather effects within the Bundesliga 1 & 2 football leagues.

Objective #2: Determine if there is any difference between the Bundesliga 1 & Bundesliga 2 due to the effects of weather, location or smaller stadiums.

Objective #3: Determine if just single or multiple weather parameters predominantly affect goal outcome.

Objective #4: Investigate if stadium location and regional local weather affect games played there and match outcome.

Objective #5: Compare the outcomes of matches to under/over goal difference betting instruments to determine if the spread of match results could have been better predicted using the results of the analysis.

Objective #6: Attempt to use the data to predict goal outcome for a number of future matches using weather predictions and selected betting instruments.

Objective #7: Determine if goal difference between teams is greater in colder weather and if sustained cold weather affects a team’s performance over time.

Objective #8: Use analysis software including but not limited to Excel, Python, R and SQL to gain knowledge in their use for analysing large data sets.

2. Background

A recent study by the European Commission (2012) on the contribution of sport to the economy placed its value at €294 billion. Additionally, the betting market for sports is estimated to be worth around €733 billion globally (BBC, 2014), with 70% of that income coming from football matches. Betting on football matches became popular in the early 1920s with the creation in the UK of the football pools (Football Pools, 2014), the oldest gaming company in the world, which allowed fans to predict matches and win money if those predictions proved to be correct. With some individual bets now reaching figures of over €200,000 (BBC, 2014), it is important for gaming companies to understand the level of risk they are exposed to, as mistakes could be costly.

Additionally, trainers and teams are always looking to gain a competitive advantage to ensure success, and the use of statistical information and data analytics is becoming increasingly important within football as more managers and teams use data analysis to become smarter and more efficient (Lewis, 2014). While nearly all analysis focuses on the players, there has been much less analysis of external factors. There is some evidence, and a number of studies, indicating that weather factors, predominantly temperature, may be a factor in the outcome of European football matches (Hamilton, 2014). While extreme hot or cold temperatures are known to directly affect both performance and health through their effects on human physiology (Hong, 2014), the overall contribution that weather makes where extreme temperature is not a factor, in particular to the goal outcome of football matches, is still not clearly understood or established.

Germany has a moderate and temperate climate (See Appendix 2(e) for a typical weather year)

with temperatures ranging on average from just below zero degrees Celsius in winter to around

the mid-twenties during the summer (ECA, 2014). The use of a moderate temperate climate

seeks to reduce as much as possible any effects of very extreme temperatures. However there

are colder areas such as Munich which can see temperatures drop to around -10°C which can

affect performance (Hong, 2014) although at around 0°C there should be very little drop in

performance for persons engaged in moderate exercise even if wearing t-shirts. Germany is

large enough to have distinct regions of specific weather patterns (Encyclopaedia Britannica,

2014) with variable frequency of temperature, humidity and precipitation experienced in

different regions and throughout the year. The Bundesliga stadiums are distributed around

Germany widely enough to see if regional weather plays a role in match outcome (Appendix

C & E.)

The study considers two primary data sets which were identified as being

suitable for analysis and for meeting the study’s objectives. The first is the Bundesliga football league

results, for which reliable historic data exists for its entire history since 1963. From this, a

selection of data will be considered for the period 1993 until 2013, which represents twenty-one

seasons (years) of play. Usable football score data has been identified from an online provider

(Football-Data, 2014) which provides one csv file (refer to Appendix A) for each season played

detailing every game played within the season, the date played, half time and full time scores

and where it was played as well as a range of other match information. There are around 306

games played per season per league so the football data set will comprise around 306 (games)

x 21 (seasons) x 2 (leagues) = 12,852 football matches being analysed in total. Each of the csv

files is relatively small at around 100kb in size.

Compared to this will be daily weather data for Germany obtained from the European Climate

Assessment & Database (ECA, 2014.) This site provides data on numerous weather stations

positioned around Europe which can be matched geographically using name, latitude and

longitude co-ordinates to each stadium being considered to within a few miles. The data is

available as a number of individual text files, one for each unique weather station and weather

variable (Refer to Appendix B.) The blended data was selected for use which combines weather

data from different sources, although checking shows no difference for the weather stations

being used. The files contain comma delimited text in their raw format and the uncompressed

size of the files for each weather variable ranges from 200MB to 4GB containing

approximately 400 to 5,000 individual weather stations in each zipped folder for that particular

weather variable. Each file provides data for approximately 67 years equating to 25,591 lines

of raw data for each weather station and single weather variable. There are 18 stadiums in the

Bundesliga 1 and the same in the Bundesliga 2. However, over the 21-year period 38

teams have played in the top tier, with a similar number expected in the second, but there

will be instances of crossover where teams share stadiums, and the closeness of stadia may

allow for a common weather station to be used. This will be subject to a more detailed

assessment during the initial stages of the project. This means that there could be approximately

30 distinct weather files for each weather parameter. The total approximate size of the raw data

will therefore be 25,591 x 30 = 767,730 lines of data for each weather variable. It is intended

to consider five variables: temperature, precipitation, cloud cover, wind and humidity, which

will equate to almost 3.8 million lines of raw data prior to selecting the relevant lines that equate

to the 12,852 actual football matches that were played.
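The data set sizes quoted above can be checked with a few lines of arithmetic. The following is a minimal Python sketch; the constant names are illustrative only and the figures are those stated in the text, including the approximate count of 30 weather stations.

```python
# Figures quoted in the text; the constant names are illustrative only.
GAMES_PER_SEASON = 306      # 18 teams, each playing the other 17 home and away
SEASONS = 21                # 1993 to 2013 inclusive
LEAGUES = 2                 # Bundesliga 1 and Bundesliga 2
LINES_PER_STATION = 25_591  # daily records per station and weather variable
STATIONS = 30               # approximate number of distinct weather stations
VARIABLES = 5               # temperature, precipitation, cloud cover, wind, humidity

matches = GAMES_PER_SEASON * SEASONS * LEAGUES
lines_per_variable = LINES_PER_STATION * STATIONS
total_weather_lines = lines_per_variable * VARIABLES

print(matches)              # 12852
print(lines_per_variable)   # 767730
print(total_weather_lines)  # 3838650
```

Only the station count is an estimate at this stage, so the weather totals would scale directly with the final number of stations selected.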

There are some important limitations to the data being considered, in particular the weather

data. The data being used provides daily averages which may not equate to the conditions

experienced during the time the match was played. For example rain may have fallen before,

after or during the match. The study is seeking to determine if any relationship exists between

the historic weather data and goal outcome and so some caution is advised as links could be

established where none really exist. However, the primary objective is to consider any overall

trend over the course of a playing season i.e. changes in seasons and over months rather than

specific matches.

3. Literature Review

The literature review will at this stage examine primarily factors relating to the statistical

analysis of sports and the effects of weather on sports but should also extend to consider

statistical analysis in general, prediction analysis, climate and weather, sports performance and

stadium design. Literature has been sourced from Google Scholar, CiteSeerX,

Google Books and the Directory of Open Access Journals along with articles, websites and

other sources.

Statistical Analysis in sports

Sports performance analysis is the process by which the various persons involved within a sport,

such as coaches, analysts or physiologists, come together to break down a game’s performance

from observed data and then identify those factors which contributed towards either a good or

bad performance (McGarry, O’Donoghue and Sampaio, 2013). Statistical analysis has shown many

commonly accepted beliefs within football to be incorrect, such as the belief that

corner kicks increase the chances of scoring (Anderson and Sally, 2013). The authors

propose that understanding issues like this provides competitive advantage through knowledge

justifying the time and expense in undertaking such analysis in the first place.

The effect of cold weather in Sport

According to Thornes (1977), the effect of weather and environmental factors on sport is an area where

potentially considerable improvements could be made in sports

management, performance and economic performance. There is evidence to suggest that some

sports are more adversely affected than others with endurance sports, in particular cycling,

being affected by the weather (Pezzoli, Cristofori, Moncalero, Giacometto and Boscolo, 2013.)

The study also found that most sports were affected by three primary characteristics namely

temperature, humidity and wind. Rain was also a factor in a number of cases for some but not

all sports. Riley and Williams (2003) indicate that colder weather reduces limb temperatures

which would detrimentally affect motor performance as well as strength and power. In fact

muscle power was found to be reduced by 5% for every 1°C drop in muscle temperature below

normal.

The effect of temperature on ball properties is also a possible environmental factor: with

temperatures approaching zero degrees Celsius a goalkeeper has 7% more time to react to a

penalty than at higher temperatures, when the ball moves quicker (Wiart, Kelley, James and

Allen, 2012). The flight of the ball is also affected, with colder conditions causing the ball to

drop and move slower overall with less power than at warmer temperatures. However, as Riley

and Williams (2003) point out, in colder weather the goalkeeper is most susceptible to reduced

limb temperature and dexterity unless they keep highly active.

4. Research Question

The problem being considered is that there is a lack of information regarding the effects that

weather factors like temperature, precipitation, humidity and wind may have on goals scored

in football matches. The primary research question being considered is: -

“Does the weather affect the goal outcome in football matches within the Bundesliga 1 & 2?”

From this the null hypothesis H0 and the alternative hypothesis H1 to be tested are established: -

H0: There is no relationship between goal outcome in football matches and daily corresponding

average values of temperature, wind, precipitation and humidity.

H1: There is a relationship between goal outcome in football matches and daily corresponding

average values of temperature, wind, precipitation and humidity.

The alternative hypothesis is non-directional and therefore a two-tailed test will be applied where

appropriate, with a significance level of 5%.

Figure 1: Graphical representation of a two tailed test with rejection regions.
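As an illustration of a two-tailed test at the 5% significance level, the following Python sketch computes a z statistic and a two-tailed p-value using only the standard library. The function name and the figures in the usage example (mean goals, standard deviation and sample size) are hypothetical and are not results of the study.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(sample_mean, population_mean, population_sd, n, alpha=0.05):
    """Return (z, p_value, reject_h0) for a two-tailed one-sample z-test."""
    standard_error = population_sd / sqrt(n)
    z = (sample_mean - population_mean) / standard_error
    # Two-tailed: probability of a value at least this extreme in either tail.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Critical value that leaves alpha / 2 in each tail (about 1.96 for 5%).
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    return z, p_value, abs(z) > critical_z


# Hypothetical figures: 400 cold-weather matches averaging 2.9 goals,
# compared with an overall mean of 2.7 goals and standard deviation of 1.7.
z, p_value, reject_h0 = two_tailed_z_test(2.9, 2.7, 1.7, 400)
print(round(z, 2), round(p_value, 3), reject_h0)
```

H0 is rejected when the absolute value of z exceeds the critical value, which is equivalent to the p-value falling below 0.05.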

Within the context of the broader research question there are further questions that will be

considered: -

(i) “Is the Bundesliga 2 League’s goal outcome affected more by weather factors than

the Bundesliga 1 League?”

(ii) “Can goal outcome at any particular stadium be attributed to any possible regional

weather effects?”

(iii) “Does a single weather variable affect match outcome or are multiple factors

required?”

(iv) “Do smaller stadiums have a greater effect on goal outcome due to greater exposure

to the weather?”

These are components of the primary research question and will be investigated. Appropriate

hypothesis testing will need to be established for these questions. Further questions will be

developed for the project.

Predictions

Additionally there is the possibility of the results analysis being used to undertake match

outcome prediction for goals scored using next day weather forecasting. It is expected that

rather than being able to predict actual total goals for a match with any accuracy it is more

likely that prediction of average goals scored due to general weather conditions experienced

over a time period would be possible. The use of a betting tool such as the Under/Over (x)

goals instrument will be used based on the average number of goals per game and league across

the period being considered. For example if the average goals scored was 2.7 then Under/Over

2.5 goals would be used as the instrument to see if the results can be used to reliably determine

a significant push or pull above or below this level, which could potentially indicate that

predictions can be made. As the predictions depend on weather forecasts, the time period will

typically be in the 1 to 3 day period in line with weather forecasting but could increase to 10

days.
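The Under/Over comparison described above can be sketched in Python as follows. This is a minimal illustration only: the function name, the two groups and their goal totals are hypothetical, and the 2.5 goal line is taken from the example in the text.

```python
def over_share(total_goals, line=2.5):
    """Return the fraction of matches whose total goals exceed the line."""
    over = sum(1 for goals in total_goals if goals > line)
    return over / len(total_goals)


# Hypothetical total goals per match for two groups of matches.
cold_matches = [1, 2, 0, 3, 2, 1]
mild_matches = [3, 4, 2, 3, 1, 5]

cold_share = over_share(cold_matches)
mild_share = over_share(mild_matches)
print(cold_share, mild_share)
```

A consistent gap between the two shares across many matches would be the kind of push or pull above or below the line that the analysis is looking for.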

The research will be limited to only stadium locations within Germany, the weather data

identified and goals scored for a match. No other in match data or statistics will be used such

as corners or passes. Individual players will not be considered nor will any other variables other

than those indicated and referenced.

5. Requirements Elicitation and Analysis

Requirements elicitation is a preliminary stage in which the requirements of the process are

specified and defined which then leads to the correct solution being designed and implemented.

Undertaking requirements elicitation is primarily a process to understand a particular problem

which comes typically from a business need. The objective of requirements elicitation is to

identify all of the requirements, or as many of them as is feasibly possible (Kasirun, 2005). At

this stage the requirements are a preliminary step towards a more detailed project specification

later on during the second semester when the dissertation will be initiated, undertaken and

completed.

Elicitation techniques are the systems and tools used to bring forth the requirements and help

develop and find understanding. For this part of the process the tools used are Brainstorming

and Document Analysis as outlined by the IIBA (2009).

The brainstorming process was utilised primarily at this stage to help stimulate ideas on the

project. This did not take the format of a scheduled session but instead was an ongoing process

where ideas were jotted down in a notebook as and when they came to mind. Critiquing and

analysis of the ideas were deliberately avoided, as they run contrary to the brainstorming process,

whose purpose is to develop new ideas.

Before determining the functional and non-functional project requirements it is useful to first

restate the problem being considered which was explored in the previous section: - “The

problem being considered is that there is a lack of information regarding the effects that

weather factors like temperature, precipitation, humidity, cloud cover and wind may have on

goals scored in football matches.” From this we can then look to determine the project

requirements.

Project Scope

The project is a Big Data Analysis study which will use a relational database, most likely SQL,

in conjunction with R Studio to undertake analysis of a large data set to find trends, patterns,

links and predictions supported by graphing and tables to present results.

General Description

The database will be created and designed to facilitate the querying and manipulation of a large

amount of data to allow for the effects of weather such as temperature, humidity, precipitation

and wind on total goals scored in football matches to be analysed to determine if a relationship

exists. The aim is that the analysis will provide insight into the possible effects of weather on

sports like football.

The database must be designed in such a way that all the entities and their relationships are

robust and well understood and that the data has been normalised prior to database creation.

The ability to handle very large queries and joins will be required, as joining tables with thousands of

rows has a multiplying effect within SQL databases, which can place significant demands on

the processing ability of computers. If the database cannot function properly then either the number

of data points will have to be restricted or the amount of analysis limited which will not provide

a sufficient amount of information for a robust analysis which could damage the study as a

whole. The core function of the project is to compare the two primary datasets which must be

central to any design approach implemented.

System Interfaces

The database will be a self-contained system; however, it may interface with a PC or a server

located on Amazon Web Services or Windows Azure (to be decided subject to

further research). It may also need to receive input data from other programs such

as Microsoft Excel, R or Python and to export back to Excel and R Studio for

ongoing graphing and analysis.

Preliminary list of Functional Requirements

The purpose of the project is to utilise a database to either accept or reject the null hypothesis

as set out within the specified project timeline and to produce a dissertation report.

1. The weather data cleaning preparation tool (R or Python) must be able to discard the dates

and associated data that are not relevant to reduce the weather file size.

2. The weather data cleaning preparation tool (R or Python) must be able to read, re-organise

and output the data files into a readable and standardised format for entry into the SQL

database.

3. The data preparation must ensure that dates from both files are in a standardised ISO format

that are compatible with each other.

4. The weather stations should have specific identity codes matched to each stadium.

5. The SQL database system must be able to export results data out to other programs for

analysis, graphing and visualisation.

6. The SQL database being used for analysis must be able to hold several thousand entries.

7. The SQL database must be able to filter and select different columns and rows of

information for analysis and comparison.

8. The output data should be produced in a form that is capable of being analysed by

using a variety of statistical tools (it is assumed that all of these will be utilised at this stage

to some extent subject to verification during the next stage) including: -

a. z-test (hypothesis testing)

b. power analysis (due to the large sample size)

c. Analysis of variance (ANOVA) to compare each season of play and other sub

groups of means.

d. Mean (there will be multiple means considered)

e. Calculation of Standard deviation(s)

f. Calculation of Variance(s) and Covariance.

g. Time series analysis (for possible prediction analysis)

h. Cluster analysis

i. Correlation Analysis (Calculation of r)

j. Simple linear Regression, multiple & logistic regression tools.

9. The SQL database should be designed so that comparison against weather variables can be

made against the following football variables:

a. The entire range of matches played by date of match.

b. Each season of play (by Individual selection.)

c. By Stadium location.

d. By Team.

e. By a pre-determined or local region (Refer to Appendix E.)

10. The SQL database should be designed so that comparison against football variables can be

made against one, two, three or all of the following weather statistics:

a. Temperature

b. Humidity

c. Precipitation

d. Wind

e. Cloud Cover

11. The database team table should provide the number of years each team has played in each

league as not every team will have played for the entire time period being analysed.
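Requirements 3, 4, 9 and 10 above centre on joining match records to weather records by stadium, weather station and ISO date. The following is a minimal sketch of such a join using Python's built-in sqlite3 module with an in-memory database. The table names, column names and sample rows are all hypothetical (only station number 494 is taken from Appendix B), and the final schema is subject to the normalisation work planned for the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stadium (
    stadium_id INTEGER PRIMARY KEY,
    name       TEXT,
    station_id INTEGER            -- weather station matched to this stadium
);
CREATE TABLE matches (
    match_id   INTEGER PRIMARY KEY,
    match_date TEXT,              -- ISO format, yyyy-mm-dd
    stadium_id INTEGER,
    home_goals INTEGER,
    away_goals INTEGER
);
CREATE TABLE weather (
    station_id  INTEGER,
    obs_date    TEXT,             -- ISO format, yyyy-mm-dd
    mean_temp_c REAL,
    PRIMARY KEY (station_id, obs_date)
);
""")

# Hypothetical sample rows for illustration only.
conn.execute("INSERT INTO stadium VALUES (1, 'Example Stadium', 494)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?, ?)",
    [(1, "2013-07-20", 1, 2, 1), (2, "2013-12-14", 1, 0, 0)],
)
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [(494, "2013-07-20", 21.4), (494, "2013-12-14", -1.5), (494, "2013-12-15", 0.3)],
)

# Join each match to the weather recorded at its stadium's station that day.
rows = conn.execute("""
    SELECT m.match_date, s.name,
           m.home_goals + m.away_goals AS total_goals,
           w.mean_temp_c
    FROM matches AS m
    JOIN stadium AS s ON s.stadium_id = m.stadium_id
    JOIN weather AS w ON w.station_id = s.station_id
                     AND w.obs_date = m.match_date
    ORDER BY m.match_date
""").fetchall()

for row in rows:
    print(row)
```

Weather rows for days without a match, such as the 15th December row in the sample, simply fail to join and drop out of the result, so they need not be removed beforehand.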

Preliminary list of Non Functional requirements

Non-functional requirements are outlined below. They include:

1. The methodology section should enable another person to reproduce the research project

in its entirety and from the same data obtain the same/similar results.

2. The project and research objectives should be able to be understood by non-experts.

3. The data being used should be verified as authentic and reliable.

4. The author must invest a minimum of three hours a week on the project based on the project

plan.

5. The author must attend all lectures and tutorials within semester 2.

6. The database should be able to achieve a reasonable level of performance in its required

operation.

7. The project must be stored electronically on three different media sources at all times and

at least be updated once a week.

8. The project must be completed by the specified date.

6. Special Resources required

The proposed project will require a number of programs to undertake the required analysis and

then production of results: -

1/ Microsoft Excel – Required to read and open primary football data files and do basic checks

and tables, graphical outputs.

2/ Microsoft Word – To generate written reports.

3/ R. – R will be the primary program used to prepare and analyse, graph and tabulate the data.

It will be used to clean up all the football files removing unwanted columns and binding all

years of play into one file. Weather data will also be cleaned up removing unwanted lines and

error checking for NULL values.

4/ SQL - The data lends itself towards a relational database such as SQL where the weather

data can be combined with the football data based on, temperature, precipitation, humidity,

wind or geographic location or team for example.

6/ Map Reduce/Hadoop & Python – The use of a distributed computer system could offer

potential benefits for speed of computation as the data set may be too large to handle efficiently

on a single user PC. This will be investigated as to its necessity as the project develops.

7/ Pea Zip – A program that can easily un-compress a variety of large file formats to be used

for the weather data.

8/ Microsoft PowerPoint – To create the project presentation

9/ Adobe Photoshop - May be required to assist with image manipulation for the project and

presentation.

10/ Browser add-on for Mozilla; “Download it all!” to quickly extract and download all 42

football csv files.

At this stage there may be additional programs that may be useful but have not yet been

identified as being a requirement. This will be a part of the project plan to determine what

technologies should be used.

7. Project Plan

The project plan is provided in Appendix D and shows the general expected timeline for project

delivery in the second semester. The first half of the project is planned for research, preparing

all the data, building databases and becoming familiar with them as well as the initial parts of

the thesis. The second part focuses on the analysis, findings and writing the analysis which are

key parts of the project process. The plan has been updated based on confirmation of the

submission date in early January and additional deadlines for management reports and the

presentation.

8. Consultation

The project proposal was discussed with NCI Lecturer Padraig De Burca. The discussion took

place on 26th May 2014 and took the form of an informal discussion after scheduled classes.

Padraig provided valuable feedback relating to the potential for use of SQL to build a database

of all the normalised match and weather variables which can then be queried in multiple ways

with the results being outputted to other programs like Excel to generate graphs. The significant

benefit of using SQL would be firstly in the speed by which stadiums, teams, results and even

certain weather conditions can be isolated for comparative analysis but also would limit the

amount of preparation the weather files needed as there would be no need to eliminate all the

dates where games were not played; the data file need only be clipped at the start date to eliminate the

largest unneeded section prior to 1993. This would create potentially ‘redundant’ data within

the database and may affect times to undertake joins but could be quicker than trying to

eliminate certain dates in the raw weather files as there are potentially 70-100 individual

weather files.

As a result of the consultation several possible new ways to view the data were considered.

Firstly it opens the possibility of considering the weather over the few days prior to any match,

which had not been thought of, and secondly it allows the comparison of

sequential matches played by the same team in different locations to see if the effects of any

general ongoing weather such as sustained cold has a compounding effect. Padraig also noted

that SQL has some graphing capabilities which will be investigated as to their potential use.
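The idea of considering the weather over the few days before a match can be sketched in Python as a simple prior-days average. The function name, the three day window and the sample temperatures below are assumptions made for illustration; the actual window length would be decided during the analysis.

```python
from datetime import date, timedelta


def prior_days_mean(daily_values, match_day, days=3):
    """Mean of the values for the `days` days before match_day.

    The match day itself is excluded and days with no record are skipped.
    Returns None when no prior day has a record.
    """
    window = []
    for offset in range(1, days + 1):
        day = match_day - timedelta(days=offset)
        if day in daily_values:
            window.append(daily_values[day])
    return sum(window) / len(window) if window else None


# Hypothetical daily mean temperatures in degrees Celsius.
temps = {
    date(2013, 12, 11): -2.0,
    date(2013, 12, 12): -4.0,
    date(2013, 12, 13): -3.0,
    date(2013, 12, 14): 1.0,
}

print(prior_days_mean(temps, date(2013, 12, 14)))  # -3.0
```

The same function applied to a team's sequence of match dates would give one way of testing whether sustained cold has a compounding effect.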

9. Declaration

By submitting this proposal through the NCI Moodle system, I declare that unless otherwise specified,

all content in this proposal is my own work and has not been copied from other sources.

Appendix A – Examples of the Football Data Sets

Data Set 1 – Football results for the Bundesliga 1 & 2. Excerpt below shows Bundesliga 2

results for July 2013.

Football-Data (2014) provides a full season of play for either Bundesliga 1 or 2 as a csv file

available for download. Each csv file contains the results for one entire season of play. There

are 306 matches in total for each season, which corresponds to 18 teams each playing the other 17 home and away. Most files contain 52 columns of

data, comprising the date, full time results, half time results, where the

game was played and a variety of betting information. For earlier years not all this information

was recorded. Twenty-one seasons of historic football data for both leagues equates to 306 (games per

season) x 21 (seasons) x 2 (leagues) = 12,852 lines of data for the football matches, which in

its raw form exists in 42 corresponding csv files. Total goals is not a parameter but any program

or database such as SQL could calculate this from the home and away goals scored columns.
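As a sketch of how total goals could be derived from the home and away goal columns, the following Python example reads a small csv sample and converts the match date to ISO format at the same time. The column names (Date, HomeTeam, AwayTeam, FTHG, FTAG), the dd/mm/yy date format and the team names are assumptions made for illustration and should be checked against the actual files.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the assumed layout of a season csv file.
SAMPLE = """Date,HomeTeam,AwayTeam,FTHG,FTAG
19/07/13,Team A,Team B,2,1
20/07/13,Team C,Team D,0,0
"""


def load_matches(handle):
    """Read match rows, adding an ISO date and a total goals value."""
    rows = []
    for row in csv.DictReader(handle):
        match_date = datetime.strptime(row["Date"], "%d/%m/%y").date()
        rows.append({
            "date": match_date.isoformat(),
            "home": row["HomeTeam"],
            "away": row["AwayTeam"],
            # Total goals is not stored in the file, so it is derived here.
            "total_goals": int(row["FTHG"]) + int(row["FTAG"]),
        })
    return rows


matches = load_matches(io.StringIO(SAMPLE))
for match in matches:
    print(match["date"], match["home"], match["away"], match["total_goals"])
```

In practice the function would be given an open file handle for each of the 42 season files and the results appended into one list or table.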

Appendix B – Examples of the Weather Data Sets

Data Set 2(a) – Historic weather data example for Germany for Daily Mean Temperature at

station 494 (Augsburg, Germany)

The European Climate Assessment & Database Project (2014) provides data for weather

stations across Europe. The above data sample is taken from station number 494 (Augsburg)

for mean daily temperature. The text files are comma delimited and provide (from left to right)

station number, source identifier, date (yyyy/mm/dd), temperature and quality code. This file

contains 67 years, 3 months and 29 days of data which equates to around 25,591 lines of data

for each of the locations. The year the station began monitoring varies but typically covers a

significant time period in all cases. The temperature is provided in units of 0.1 degrees Celsius in its

current html format and must be divided by 10 to read correctly. For example the first line of

data above for the 28th March records a daily mean temperature of 6.5 degrees Celsius with no

known errors or missing data. Temperatures below freezing are identified with a minus symbol (none

shown in example above.)
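The conversion described above can be sketched in Python as a small line parser. Several details are assumptions rather than confirmed features of the files: the date is assumed to be written compactly as yyyymmdd, missing values are assumed to be marked with -9999, and the source identifier and year in the sample lines are hypothetical. The sketch also discards records before 1993, the start of the study period.

```python
from datetime import date, datetime

MISSING = -9999  # assumed missing-value marker; verify against the data files


def parse_weather_line(line, start=date(1993, 1, 1)):
    """Parse one comma delimited weather record.

    Fields (left to right): station number, source identifier, date,
    value in tenths of a unit, quality code. Returns None for records
    before `start` or with a missing value.
    """
    station, _source, raw_date, raw_value, quality = [
        part.strip() for part in line.split(",")
    ]
    day = datetime.strptime(raw_date, "%Y%m%d").date()
    value = int(raw_value)
    if day < start or value == MISSING:
        return None
    return {
        "station_id": int(station),
        "date": day.isoformat(),
        "value": value / 10,  # e.g. 65 becomes 6.5 degrees Celsius
        "quality": int(quality),
    }


record = parse_weather_line("494, 100133, 20130328, 65, 0")
print(record)
```

The same parser would apply to the precipitation and wind speed files, which are also recorded in tenths of a unit, but not to humidity, which is given directly in percent.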

Data Set 2(b) – Historic weather data example for Germany for Daily Humidity levels at

station 494 (Augsburg, Germany)

The above ECA (2014) data sample is taken from station number 494 (Augsburg) for daily

humidity. The text files are comma delimited and provide (from left to right) station number,

source identifier, date (yyyy/mm/dd), humidity in percent and quality code. This file also

contains 67 years, 3 months and 29 days of data which equates to around 25,591 lines of data

for each of the locations. The year the station began monitoring varies but typically covers a

significant time period in all cases.

Data Set 2(c) – Historic weather data example for Germany for Daily precipitation levels at

station 494 (Augsburg, Germany)

The above ECA (2014) data sample is taken from station number 494 (Augsburg) for daily

precipitation. The text files are comma delimited and provide (from left to right) station

number, source identifier, date (yyyy/mm/dd), precipitation in 0.1mm and quality code. This

file also contains 67 years, 3 months and 29 days of data which equates to around 25,591 lines

of data for each of the locations. The year each station began monitoring varies but typically

covers a significant time period in all cases.

Data Set 2(d) – Historic weather data example for Germany for Daily mean wind speed at

station 494 (Augsburg, Germany)

The above ECA (2014) data sample is taken from station number 494 (Augsburg) for daily

average wind speed. The text files are comma delimited and provide (from left to right) station

number, source identifier, date (yyyy/mm/dd), average wind speed in 0.1m/s and quality code.

This file also contains 67 years, 3 months and 29 days of data which equates to around 25,591

lines of data for each of the locations. In this data set all records prior to 1960 are Null. The

actual wind speed is the above figure divided by 10. For example the first value for April shown

above would be 1.5m/s.

Data Set 2(e) – Historic weather data for Germany for Cloud Cover

The cloud cover data files (not shown) are based on the oktas scale which provides a measure

of cloud cover from 0 to 8 subject to the overall portion of sky covered. Zero represents a

totally clear sky while 8 would be totally overcast.

Example Weather Year 2(e) -Typical Weather Year for Mean Daily Temperature for weather station 494 – Augsburg

Appendix C – Map of Current Bundesliga 1 & 2 Stadium locations.

Image Source: Total Football Forums, http://www.totalfootballforums.com/forums/topic/76502-german-football-fans/

Appendix D – Project Plan Gantt chart

Notes

1/ Dates shown are week commencing for the Monday of each week.

Gantt chart columns: Wk_01 (8/9/14) to Wk_21 (26/1/15), one column per week, September 2014 to January 2015.

Legend: Thesis writing; Supporting Processes; Key Landmarks.

Tasks:

Revised Proposal (28/09/14)

Statistical research (ongoing)

Technology & Tools research

Data Cleaning & Preparation

Normalisation & ERD

Requirements Specification

SQL/R Set up and programming

System Testing

Introduction

Literature Review

Methodology

Data Analysis (pre-testing)

Data Analysis & Programming

Discussion

Graphing and Visualisation

Refinement

Conclusion

Final Checking

Printing and Binding (x3 copies)

Submission (06/01/15)

Management Reports

Write Presentation

Practice Presentation

Presentations

Appendix E – Map showing Principal Regions of Germany

Note: The regions are a base point for further study as it is accepted that these region

locations do not necessarily equate to accepted regional weather.

Image Source: 24point0. http://www.24point0.com/ppt-shop/media/catalog/product/r/e/regions-map-of-germany-ppt-slides.jpg

Appendix F – Project Proposal Revisions

2/ Background

Extra season of play added increasing size and additional weather factor included (cloud

cover) also increasing the raw data size. (Minor Change)

4/ Research Question

A few extra sub research questions added and in the predictions section the limitations of

predictions are based on forecasting which is realistically limited to a few days.

6/ Special Resources

This area has been updated to better reflect the actual technology being used and for which

specific purpose based on time spent investigating each technology and undertaking small

scale tests.

7/ Project Plan

Updated to reflect known dates and revised to better break down sub components.

Appendix B

Cloud cover information added (without example picture) to note inclusion of this weather

data set in the project.

Appendix D

Project plan updated to reflect additional information such as key dates as outlined in section

seven.

Overall changes are considered minor with changes not exceeding 2-3% of the originally

submitted proposal.



8.9 Initial Requirement Specification


HDSDAJAN 2014

Requirements Specification (RS) The effects of weather on goal outcome

for football matches played within the

German Bundesliga

Alastair Macnair 10/12/2014


Requirements Specification

Page 1

Requirements Specification (RS)

Document Control

Revision History

Date Version Scope of Activity Prepared Reviewed Approved

12/10/2005 1 Create AM X X

Distribution List

Name Title Version

Ioana Ghergulescu Lecturer 1

Related Documents

Title Comments

Title of Use Case Model

Title of Use Case Description


Table of Contents

Requirements Specification (RS) 1

Document Control 1

Revision History 1

Distribution List 1

Related Documents 1

1 Introduction 4

1.1 Purpose 4

1.2 Project Scope 4

1.3 Definitions, Acronyms, and Abbreviations 7

2 Requirements Specification 7

2.1 Functional requirements 8

2.1.1 Use Case Diagram 8

2.1.2 Requirement 1: Extract/Collect Required Data Sets 8

2.1.3 Requirement 2: Filter Weather files 11

2.1.4 Requirement 3: Transform Data 13

2.1.5 Requirement 4: Database Management 16

2.1.6 Requirement 5: Data Analysis 18

2.1.7 Requirement 6: Data Prediction Modelling 20

2.1.8 Requirement 7: Report Production and Output 23

2.2 Non-Functional Requirements 25

2.2.1 Recover requirement 25

2.2.2 Reliability requirement 25

2.2.3 Extendibility requirement 25

2.2.4 Resource utilization requirement 25

3 Interface requirements 25

3.1 Application Programming Interfaces (API) 25

4 System Architecture 26

5 System Evolution 27


6 Special resources required 27


1 Introduction

1.1 Purpose

The purpose of this document is to set out the requirements for an analytical study of the relationship between weather variables recorded across Germany and the goal outcome of matches played in the Bundesliga 1 & 2 leagues.

The analysis, built on 21 years of historical weather and match data, will give users a better insight into how each of the weather variables considered affects goal outcome, and into likely future match outcomes through predictive analysis.

The intended primary customers are football teams, coaches and trainers, and in particular companies that provide betting instrument products for football matches. Secondary customers could include those involved in any competitive or non-competitive sport where weather is a factor.

1.2 Project Scope

The scope of the project is to develop an understanding of the relationship between weather conditions and football match goal outcome within the Bundesliga 1 & 2 leagues played in Germany. One of the project's primary outcomes is to provide all of the identified customers with a better understanding, knowledge and statistical insight into weather effects on goal outcome that can be used to improve match strategy and winning performance levels through the individual objectives outlined below.

The data to be used as the basis of the analysis is comprised of ECA&D weather

data files and Football data UK files which are both freely available for non-

commercial use.

The primary project objectives are listed below:

Objective #1 Determine if there is any link between goals scored and weather effects within the Bundesliga 1 & 2 football leagues.

Objective #2 Determine if there is any difference between the Bundesliga 1 & Bundesliga 2 due to the effects of weather, location or smaller stadiums.

Objective #3 Determine if just single or multiple weather parameters predominantly affect goal outcome.

Objective #4 Investigate if stadium location and regional local weather affect games played there and match outcome.

Objective #5 Compare the outcomes of matches to under/over goal difference betting instruments to determine if the spread of match results could have been better predicted using the results of the analysis.

Objective #6 Attempt to use the data to predict goal outcome for a number of future matches using weather predictions and selected betting instruments.

Objective #7 Determine if goal difference between teams is greater in periods of colder or warmer weather and, if sustained, whether this affects a team’s performance over time.

Objective #8 Use analysis software including but not limited to Excel, Python, R and SQL to gain knowledge in their use for analysing large data sets.


Successful Outcome Criteria: The project will be considered successful if the primary research question(s) can be answered. In the context of this study, a finding that weather has no determinable effect, or an outcome in which only one or two of the project objectives can be answered, will be considered equally successful.

There are a number of restrictions and limitations of the project:

Weather Data: The weather data uses daily averages, which may not equate to the conditions experienced during the actual match. For example, rain may have fallen before, during or after the match. The intention is to consider general trends rather than specific instances, e.g. colder weather, warmer weather, very wet, very dry. The weather data could also be used to generate secondary parameters, either from a single variable or by combining two or more variables, for example wind chill factor, Beaufort scale and heat index.
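As an illustration of such secondary parameters, the sketch below derives a wind chill value and a Beaufort number from daily temperature and wind speed. It is a minimal example rather than the project's own code: the function names are hypothetical, the wind chill uses the widely published North American metric formula (defined for air temperatures at or below 10 °C and wind above 4.8 km/h), and the Beaufort number uses the empirical relation v = 0.836 B^(3/2) m/s.

```python
def wind_chill_celsius(temp_c: float, wind_kmh: float) -> float:
    """Wind chill in degrees C from air temperature (C) and wind speed (km/h)."""
    # Outside the formula's stated range the air temperature is returned unchanged.
    if temp_c > 10.0 or wind_kmh <= 4.8:
        return temp_c
    v = wind_kmh ** 0.16
    return 13.12 + 0.6215 * temp_c - 11.37 * v + 0.3965 * temp_c * v


def beaufort_number(wind_ms: float) -> int:
    """Beaufort number (0 to 12) from wind speed in metres per second."""
    if wind_ms < 0:
        raise ValueError("wind speed cannot be negative")
    # Invert v = 0.836 * B**1.5, round to the nearest force, cap at 12.
    return min(12, round((wind_ms / 0.836) ** (2.0 / 3.0)))
```

Because the source data are daily averages, any value derived this way describes the day in general rather than conditions at kick-off.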

Football Data: The only variables being considered are the goal outcome (each team and total) and the respective geolocation based on where the match was played. No other variables such as the players, corners or other ‘in match’ data will be used.

Software and tools: All software and tools being used, such as MySQL, R Studio, Python and Microsoft Office, are available within the College without cost or unreasonable restriction.


Data use: Both data providers offer the data for free use. However the ECA&D in

their data policy document note that the data cannot be used for commercial

purposes. While the study is not commercial in nature any future usage of the study

must take this into account where and if applicable.

Time: The project must be completed within the specified time frame from the 13th

October 2014 until the deadline of 5th January 2015.

Budget: There is no required budget as all the required software and tools are

available without cost penalty within the College.

1.3 Definitions, Acronyms, and Abbreviations

CSV Comma Separated Values file format

ECA&D European Climate Assessment & Data Set Project

Firefox Open source internet browser

MySQL Relational Database Program

PeaZip Program to unzip and extract large zip files

Python Open source programming language

R Studio Open Source statistical analysis program

Txt Text file format

2 Requirements Specification

This section provides an overall description of the project and detailed descriptions of the functional requirements that represent the key steps and processes that are essential to ensure successful project completion.


2.1 Functional requirements

2.1.1 Use Case Diagram

The Use Case Diagram below provides a project overview of all the functional requirements. The Data Analyst is involved in every step, and each use case builds on the preceding one, from the first to the last step as indicated.

Figure 01 – Overall Use Case Diagram

2.1.2 Requirement 1: Extract/Collect Required Data Sets

2.1.2.1 Scope and Description

2.1.2.1.1 Description Overview & Priority

The data sets are downloaded from the websites identified. Downloading the data is essential to being able to undertake analysis.


2.1.2.1.2 Inputs

Weather Data Website: http://eca.knmi.nl/

Football Data Website: http://football-data.co.uk/

PC with Firefox web browser

High Speed internet access

PeaZip Program

Storage device with at least 10GB of free space for all files

2.1.2.1.3 Processing

1/ Check data usage policy and obtain any permissions to use data sets.

2/ Access the weather website and download the five zip files plus station lists.

3/ Access the football website and use Firefox to download all 42 files across all individual links simultaneously.

4/ Check files (sample) are viable and contain data as expected.

2.1.2.1.4 Outputs

Weather files: There will be around 5000 individual weather files in comma delimited txt format. There will be five station lists, one for each weather variable in comma delimited txt. Station lists provide a full description of every station for that weather variable with a country code and location allowing for files to be identified.

Football files: There will be 42 individual csv files: 21 for Bundesliga 1 and 21 for Bundesliga 2, one for each season of play.

2.1.2.1.5 Error handling

If the files or data are unavailable, corrupt, missing or damaged, or the links do not work, contact the site administrator to resolve the problem.
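The viability check in processing step 4 could be partly automated. The sketch below is one possible approach, not the procedure actually used (the use case describes a visual check): it counts the CSV files in a folder and flags any that contain no data rows. The function name, the flat folder layout and the latin-1 encoding are assumptions.

```python
import csv
from pathlib import Path
from typing import List

EXPECTED_FOOTBALL_FILES = 42  # 21 seasons for each of the two leagues


def check_football_files(directory: Path,
                         expected: int = EXPECTED_FOOTBALL_FILES) -> List[str]:
    """Return a list of problems found; an empty list means no problem was detected."""
    problems = []
    files = sorted(directory.glob("*.csv"))
    if len(files) != expected:
        problems.append(f"expected {expected} CSV files, found {len(files)}")
    for path in files:
        with path.open(newline="", encoding="latin-1") as handle:
            rows = list(csv.reader(handle))
        # A usable file needs a header row plus at least one match row.
        if len(rows) < 2:
            problems.append(f"{path.name}: no data rows")
    return problems
```

Any reported problem would then be handled as described under error handling above.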

2.1.2.2 Use Case 001 – Data Collection/Extraction

Scope

The scope of this use case is to acquire the weather and football data sets and any other supporting information required at this stage.

Description

The data analyst accesses each website and downloads the files to a dedicated project directory.

Use Case Diagram


Figure 02 – Use Case 001

Flow Description

Precondition

There is an active internet connection. There is a PC with internet browser and suitable storage device with at least 10GB of storage to hold the files when un-zipped.

Activation

The use case starts when the websites are accessed for the purposes of downloading the data sets.

Main flow

1. The ECA&D website is accessed.

2. The weather blended ZIP files for Average Temperature, Rainfall, Cloud Cover, Humidity and Average Wind Speed are downloaded to the dedicated project directory.

3. The station list txt files for each weather variable are downloaded.

4. The Football Data UK website is accessed.

5. The 42 individual links to the CSV files for each season of play and league are downloaded as a single operation using the Firefox add-on ‘download it all’ into the dedicated project directory.

6. The ZIP files are opened using PeaZip to unpack the files and visually check that the operation was successful.

7. Two random CSV files are opened to visually confirm that the data is not compromised and is all present.

Termination

The project folder shows all the required files as being present.

Post condition

The data sets are ready to be used.

2.1.3 Requirement 2: Filter Weather files

2.1.3.1 Scope and Description

2.1.3.1.1 Description Overview & Priority

The five zipped weather files each contain between 500 and 2600 individual weather station txt files, together representing the entire published weather station data set across Europe (5000+ txt files). The required txt files for each weather parameter need to be identified and extracted based on their unique numeric file name. The remaining weather files can be discarded. This requirement is high priority, as it is not possible to proceed until the correct files and locations are identified.

2.1.3.1.2 Inputs

The five zip files from use case 001.

The station list comma delimited txt file for each of the ZIP files.

Football data files.

2.1.3.1.3 Processing

The football data files are filtered to provide a list of the teams that played for each year of play. From this a ‘master list’ is created that shows every team that played in every season and the stadium location is matched to this with its decimal degrees location.

The station lists are copied into Microsoft Excel. Each parameter is filtered using the two-character country code to leave just Germany's entries. The smallest list is used as there must be five weather variables for each location from the same station. The latitude and longitude data is split into its three constituent parts and the decimal equivalent calculated. A geolocation program using R and the list of weather stations in Excel is used to determine the closest weather station for each of the stadiums. The weather station code is matched to each stadium. This eventually provides a list of the actual weather stations that will be used. These files are extracted and the rest discarded.
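The coordinate conversion and nearest-station matching described above can be sketched as follows. This is an illustrative Python version, not the R program referred to in the text. It assumes coordinates arrive as signed degrees:minutes:seconds strings (for example +52:30:00) and uses the haversine great-circle distance; the function names and station identifiers are hypothetical.

```python
import math


def dms_to_decimal(dms: str) -> float:
    """Convert a signed 'degrees:minutes:seconds' string to decimal degrees."""
    text = dms.strip()
    sign = -1.0 if text.startswith("-") else 1.0
    degrees, minutes, seconds = (float(part) for part in text.lstrip("+-").split(":"))
    return sign * (degrees + minutes / 60.0 + seconds / 3600.0)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points in decimal degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_station(stadium, stations):
    """Return the id of the station closest to the stadium.

    stadium is a (lat, lon) pair; stations maps station id to (lat, lon).
    """
    return min(stations, key=lambda sid: haversine_km(*stadium, *stations[sid]))
```

For example, `nearest_station((52.5, 13.4), {"A": (52.47, 13.40), "B": (48.1, 11.6)})` selects station "A". If a chosen station later proves to have large data gaps, the same function can be re-run with that station removed to find the next nearest one.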

2.1.3.1.4 Outputs

An Excel file that contains a list of every team that has played across all 21 seasons of play for both leagues and their respective stadium. A series of txt files (comma delimited) that are the actual weather files needed for analysis with the football data.

2.1.3.2 Use Case 002 – Filter weather files

Scope

The scope of this use case is to filter the weather files and identify just those particular weather stations that are relevant to the study and match actual football stadia as closely as possible.

Description

This use case describes the process of identifying only those weather files that match the actual stadia used over the 21 years of Bundesliga 1&2 match history.

Use Case Diagram

Figure 03 – Use Case Diagram 02


Flow Description

Precondition

The data files from use case 001. A PC with Microsoft Excel. R Studio with Google maps and Map distance packages installed.

Activation

This use case starts when the data analyst completes use case 001.

Main flow

1. The participating teams for each season of play are entered into an Excel file to provide an overall summary list of every unique team that has played across all 21 seasons of play for both Bundesliga 1 & 2.

2. The decimal degrees location of each stadium is established and added to the master list.

3. The weather station files are filtered on ‘DE’ for Germany for each of the five weather parameters. The smallest list is used to ensure that all five variables exist for each unique weather station being used.

4. The stadium location is matched to the nearest weather station using R Studio distance location. The results are entered onto the master list to provide a complete description of each stadium and the weather station that it relates to.

5. The relevant individual files are extracted from each of the five zip files. All other files are removed and discarded.

Termination

All the non-relevant weather files are discarded and the project folder contains only the files needed.

Post condition

The correct weather files to be used are located in the project directory.

2.1.4 Requirement 3: Transform Data

2.1.4.1 Scope and Description

2.1.4.1.1 Description Overview & Priority

The weather and football data files are prepared and treated to remove all unrequired data, columns, characters and noise. The data is error checked for NULL or missing values.


2.1.4.1.2 Inputs

The weather files and football data files from use case 002.

2.1.4.1.3 Processing

Football data: The 21 files associated with each season of play for an entire league are loaded into R and bound into a single file. All unwanted columns and data are removed. The date is reformatted to a standard ISO format.

Weather Data: The weather files are checked for null values.

All unwanted columns and headers are removed and the numeric element corrected by a factor of ten. Each of the five elements is joined on date to provide a single weather file for each weather station location containing all five parameters. All weather information prior to 1993 is removed.
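A minimal sketch of the weather processing is given below, under stated assumptions: each file has already been reduced to (date, raw value) pairs with dates as YYYYMMDD strings, the factor-of-ten correction means multiplying the stored integer by 0.1, and -9999 marks a missing reading. The function names are hypothetical and this is not the R script used in the project.

```python
from datetime import date

MISSING = -9999            # missing-value marker in the weather files
START = date(1993, 1, 1)   # earlier observations are discarded


def clean_series(rows, scale=0.1):
    """Turn (yyyymmdd, raw integer) pairs into {date: value}, with None for missing."""
    cleaned = {}
    for stamp, raw in rows:
        day = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
        if day < START:
            continue
        cleaned[day] = None if raw == MISSING else raw * scale
    return cleaned


def join_on_date(series_by_name):
    """Join several {date: value} series, keeping only dates present in all of them."""
    common = set.intersection(*(set(series) for series in series_by_name.values()))
    return {
        day: {name: series[day] for name, series in series_by_name.items()}
        for day in sorted(common)
    }
```

Applying `clean_series` to each of the five variables for one station and passing the results to `join_on_date` gives one record per day for that station, which corresponds to the single per-station weather file described above.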

2.1.4.1.4 Outputs

Two CSV files. One for each of the Bundesliga 1&2 leagues.

A single CSV file for each weather station location containing all five weather variables.

2.1.4.1.5 Error handling

If the error checking process detects null values in the weather data (shown as -9999), these will need to be checked against actual dates played to see if a conflict occurs. If the gap is small, across a few days only, a comparable value will be interpolated from the data either side. If the gap persists across a large range, the weather station will need to be discarded for all five variables and a new one (next nearest) selected in its place.
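The gap-handling rule can be sketched as below. The three-day threshold for a "small" gap is an assumption, as the text only says "a few days", and linear interpolation is one reasonable reading of taking a comparable value from the data either side. The function name is hypothetical.

```python
def fill_short_gaps(values, max_gap=3):
    """Linearly interpolate short runs of None in a date-ordered list.

    Returns (filled, unresolved), where unresolved lists (first, last) index
    pairs of runs that are too long or that touch either end of the series.
    """
    filled = list(values)
    unresolved = []
    i, n = 0, len(filled)
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        length = i - start
        if length <= max_gap and start > 0 and i < n:
            left, right = filled[start - 1], filled[i]
            step = (right - left) / (length + 1)
            for k in range(length):
                filled[start + k] = left + step * (k + 1)
        else:
            unresolved.append((start, i - 1))
    return filled, unresolved
```

A station with any unresolved runs overlapping match dates would then be a candidate for replacement by the next nearest station.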

2.1.4.2 Use Case 003 – Transform Data

Scope

The scope of this use case is to clean up the data files by removing all unwanted information, bind and join the files together to create a smaller number of files with common information and error check the weather data for null values to determine if an alternative station location will need to be used.

Description

This use case describes the process of cleaning, transforming and error checking the data prior to its use within a database and for analysis.

Use Case Diagram


Figure 04 – Use Case 003

Flow Description

Precondition

All of the required files have been downloaded and, where required, unzipped and extracted.

Activation

This use case starts when the data analyst starts the R script to begin cleaning and transforming the data files.

Main flow

1. The R script opens each weather file and clips the data by removing unwanted header information and removing all dates prior to 1993. All unwanted columns are removed leaving just the date and weather variable. R closes each file and saves as a csv.

2. Python script opens each weather file and checks for NULL values. An output file is created listing all detected values.

3. If an error is detected the error handling process commences (A1).

4. If there is no error across all five variables the files are joined and saved as a csv.

5. The football data files are bound vertically and all unwanted columns removed. The date is reformatted to a standard ISO format.
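Step 5 can be sketched in Python as follows, although the project itself describes doing this in R. The sketch assumes the season files use the column names Date, HomeTeam, AwayTeam, FTHG and FTAG and day/month/year dates with either a two-digit or four-digit year; the TotalGoals field and the function names are hypothetical additions for illustration.

```python
from datetime import datetime

KEEP = ("Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG")  # assumed column names


def to_iso(raw: str) -> str:
    """Convert dd/mm/yy or dd/mm/yyyy to ISO 8601 (yyyy-mm-dd)."""
    for fmt in ("%d/%m/%y", "%d/%m/%Y"):
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")


def bind_seasons(seasons):
    """Stack the rows of several seasons, keeping only the wanted columns.

    seasons is an iterable of row lists, each row a dict such as those
    produced by csv.DictReader.
    """
    bound = []
    for rows in seasons:
        for row in rows:
            if not row.get("Date"):
                continue  # skip blank trailing rows
            record = {name: row[name] for name in KEEP}
            record["Date"] = to_iso(record["Date"])
            record["TotalGoals"] = int(row["FTHG"]) + int(row["FTAG"])
            bound.append(record)
    return bound
```

With ISO dates on both sides, the football records can later be matched to the per-station weather records by date.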


Alternate flow

A1: Detection of NULL values in weather files

1. Where there are isolated null values the dates are checked against actual match dates, as matches are typically only played at weekends. If the dates don’t match then continue with the main flow. If the dates do conflict then interpolate a value based on the values either side.

2. If there are a significant number of NULL values denoting missing values across a number of weeks or months then the weather station is discarded and the next nearest one is selected in lieu.

3. The use case continues at position (1) or (4) of the main flow depending on whether the new weather file has already been cleaned and treated.

Termination

Python will have completed all error handling and R will have bound or joined all files.

Post condition

The project directory will contain the finished files for the weather and football.

2.1.5 Requirement 4: Database Management

2.1.5.1 Scope and Description

2.1.5.1.1 Description Overview & Priority

The data is entered into a relational database structure such as MySQL. All tables, entities, and relationships are created as required.

2.1.5.1.2 Inputs

The cleaned files from use case 003. A relational database system like MySQL or sqldf (an SQL add on for R) is used to manage the data. This will allow the data to be manipulated and visualised during the analysis stage.

2.1.5.1.3 Processing

An entity relationship diagram is created with all key entities, attributes and relationships. Primary and foreign keys are created or identified. The data is loaded into the database.
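To make the idea of primary and foreign keys concrete, the sketch below builds a small schema and joins matches to weather by stadium, station and date. It is only an illustration: the table and column names are hypothetical and do not come from the project's ERD, and SQLite (via Python's standard library) is swapped in for MySQL so that the example is self-contained.

```python
import sqlite3

SCHEMA = """
CREATE TABLE station (
    station_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    latitude   REAL NOT NULL,
    longitude  REAL NOT NULL
);
CREATE TABLE stadium (
    stadium_id INTEGER PRIMARY KEY,
    team       TEXT NOT NULL UNIQUE,
    station_id INTEGER NOT NULL REFERENCES station(station_id)
);
CREATE TABLE weather_obs (
    station_id  INTEGER NOT NULL REFERENCES station(station_id),
    obs_date    TEXT NOT NULL,          -- ISO date, yyyy-mm-dd
    temperature REAL, rainfall REAL, cloud_cover REAL,
    humidity    REAL, wind_speed REAL,
    PRIMARY KEY (station_id, obs_date)
);
CREATE TABLE fixture (
    fixture_id INTEGER PRIMARY KEY,
    league     TEXT NOT NULL,
    match_date TEXT NOT NULL,           -- ISO date, yyyy-mm-dd
    stadium_id INTEGER NOT NULL REFERENCES stadium(stadium_id),
    home_goals INTEGER NOT NULL,
    away_goals INTEGER NOT NULL
);
"""


def build_database(conn: sqlite3.Connection) -> None:
    """Create the illustrative tables on an open connection."""
    conn.executescript(SCHEMA)


def goals_with_weather(conn: sqlite3.Connection):
    """Return (match_date, total_goals, temperature) for every matched fixture."""
    return conn.execute(
        """
        SELECT f.match_date, f.home_goals + f.away_goals AS total_goals, w.temperature
        FROM fixture AS f
        JOIN stadium AS s ON s.stadium_id = f.stadium_id
        JOIN weather_obs AS w
          ON w.station_id = s.station_id AND w.obs_date = f.match_date
        ORDER BY f.match_date
        """
    ).fetchall()
```

Keeping the stadium-to-station link in its own table means a replacement weather station only needs to be changed in one place.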


2.1.5.1.4 Outputs

A relational database, fully normalized with clear relationships and no null values or errors.

2.1.5.2 Use Case 004 – Database Management

Scope

The scope of this use case is to create the relational database structure that will allow manipulation and visualisation of the data.

Description

This use case describes the process of the relational database creation and its management.

Use Case Diagram

Figure 05 – Use case 004

Flow Description

Precondition

Use case 003 is fully completed. There is access to a relational database structure like MySQL to allow data to be imported.

Activation

This use case can commence any time after use case 003 is completed and begins when a Python script is run to load the files into the database.

Main flow

1. The database is created with all tables and relationships.

2. A Python script loads the football and weather data files.

3. The program returns a successful outcome message.

4. The database is created and ready for use.

Termination

The database is fully created and all data is loaded and there are no errors or missing attributes or relationships. The python script returns a task completed message.

Post condition

The data is now entered into a relational database and is ready for use.

2.1.6 Requirement 5: Data Analysis

2.1.6.1 Scope and Description

2.1.6.1.1 Description Overview & Priority

Data analysis is a core function of the project, and the project's primary objectives relate to analysis of the data sets prepared in the previous requirements and use cases.

2.1.6.1.2 Inputs

The relational database with all data loaded in.

2.1.6.1.3 Processing

Access the database, manipulate the data, and use MySQL and R Studio to run scripts and statistical analysis on the generated queries based on the primary project objectives. Create the required graphs, tables, mapping and other outputs to visualise the results for compilation in the report and presentation. Interpret the results and document them.
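The specification does not name the statistical tests to be run. As one possible first-pass measure for Objective #1, the sketch below computes the Pearson correlation coefficient between a weather variable and total goals per match. The function name is hypothetical and the choice of statistic is an assumption rather than the project's stated method.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

A value near zero would be consistent with the "no determinable effect" outcome that the success criteria explicitly allow for. A coefficient of this kind only captures linear association, so it would complement rather than replace the graphs and tables described above.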

2.1.6.1.4 Outputs


A variety of graphs, tables, maps and charts to be included in the presentation and report.

2.1.6.2 Use Case 005 – Data Analysis

Scope

The scope of this use case is to undertake statistical and data mining activities to determine how weather is related to match goal outcome.

Description

This use case describes the process of statistical analysis and data mining activities on the data set followed by interpretation of the results and the creation of outputs to describe and explain the data.

Use Case Diagram

Figure 06 – Use Case 005

Flow Description

Precondition

The relational database in use case 004 is ready for use and accepting queries.

Activation


This use case starts when the relational database (use case 004) is completed.

Main flow

1. Undertake the data mining and statistical analysis based on the project's primary objectives.

2. Generate Queries and scripts to support research and project objectives.

3. Output tables, graphs, charts and maps to present the outcome of the analysis.

4. Interpret the results and document what they mean.

5. Create the appropriate part of the report using the gathered information.

Termination

The use case is terminated when the primary research objectives have been answered and the relevant report section and presentation is completed.

Post condition

A report & presentation draft structure that provides discussion and explanation of the results supported by graphs, tables, charts and maps.

2.1.7 Requirement 6: Data Prediction Modelling

2.1.7.1 Scope and Description

2.1.7.1.1 Description Overview & Priority

Predictive Modelling is the process by which the data analysis results can be used in conjunction with weather data to make predictions on matches yet to be played. This process has a high priority because being able to make predictions about data provides a potentially useful tool for the customer.

2.1.7.1.2 Inputs

The interpreted results from use case 005.

2.1.7.1.3 Processing

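The data is split at random into training data (80%) and test data (the remainder). The models learn from the training data and are then applied to the test data to see whether match outcome is correctly determined. Depending on time frames and availability, the model may also be applied to future matches using match fixtures and weather forecasts.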


2.1.7.1.4 Outputs

A predictive model that allows for customers and

2.1.7.1.5 Error handling

If the analysis is inconclusive or there is no relationship between the data sets then the scope for a predictive model is reduced. In this case it may be possible to determine some general trends or patterns if such exist. If no relationship exists at all then no predictive modelling may be possible.

2.1.7.2 Use Case 006 – Predictive Modelling

Scope

The scope of this use case is to provide a predictive modelling program or overall trends and insights that provides the customer(s) with information that may provide a competitive edge in actual play or through betting instruments.

Description

This use case describes the process to determine goal outcome using match fixture data and weather forecasting information between 1 and 5 days in advance of any game being played.

Use Case Diagram

Figure 07 – Use Case 006


Flow Description

Precondition

The primary data analysis is completed and all results have been interpreted and documented. The results show trends and patterns that allow for predictive modelling to be undertaken.

Activation

This use case starts when the analysis in use case 005 is complete and the predictive analysis is undertaken using R.

Main flow

1. Predictive modelling process commences.

2. 80% of the data is selected at random and designated as ‘training data’. The remainder will be the actual test data.

3. The programs and models learn from the training data and this is then applied to the test data to see if the model is able to correctly determine match outcome.

4. Depending on time frames and availability the model may also be applied to actual future football matches using match fixtures and predicted weather forecasting.

5. The results are documented and interpreted.
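The 80/20 split in step 2 and the evaluation in step 3 can be sketched as follows. The sketch is in Python, whereas the use case states that R is used, and the model shown is deliberately trivial: a baseline that always predicts the more common over/under result seen in the training data. The 2.5 goal line, the `total_goals` field and the function names are assumptions for illustration.

```python
import random


def split_train_test(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def majority_over_under(train, line=2.5):
    """Baseline: the more frequent result ('over' or 'under') in the training rows."""
    overs = sum(1 for row in train if row["total_goals"] > line)
    return "over" if overs * 2 >= len(train) else "under"


def accuracy(prediction, test, line=2.5):
    """Share of test rows whose actual result equals the single prediction."""
    hits = sum(
        1 for row in test
        if ("over" if row["total_goals"] > line else "under") == prediction
    )
    return hits / len(test)
```

This baseline ignores the weather entirely, so it gives a reference point: a weather-based model would need to beat its accuracy on the test data before it could be said to add predictive value.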

Alternate flow

A1: No clear relationships

1. Where no clear relationship exists between the data sets and predictive modelling is not applicable then any general trends or patterns will be discussed if applicable. (Returns to main process step 5.)

Termination

The use case ends either when the best predictive model is produced based on the data analysis or it is determined that no model can be created.

Post condition

A report and interpretation of the results providing either predictive modelling or general trends and patterns where possible.

2.1.8 Requirement 7: Report Production and Output

2.1.8.1 Scope and Description

2.1.8.1.1 Description Overview & Priority

A report is created which outlines the entire project and all results, findings, discussions and any predictive modelling.

2.1.8.1.2 Inputs

The primary data analysis results from use case 005 and predictive modelling results from use case 006.

2.1.8.1.3 Processing

A clear and detailed report is created to allow the customer to review the project and understand the data sets and all relationships and trends that exist. In addition a short presentation will be created alongside this to present the key findings to the customer.

2.1.8.1.4 Outputs

A detailed and clear printed and electronic report of 10,000 to 12,000 words in line with the customers’ requirements and a PowerPoint presentation file.

2.1.8.2 Use Case 007 – Report Production and Output

Scope

The scope of this use case is to create a final report and presentation to be presented to the customer.

Description

This use case describes the process of creating the final report and presentation material.

Use Case Diagram

Figure 08 – Use case 007

Flow Description

Precondition

The data analysis and predictive modelling use cases are completed.

Activation

This use case starts when the previous two use cases are completed.

Main flow

1. The data analysis is used to create the final report.
2. The data analysis and report are used to create the presentation.
3. Both the presentation and report are presented to the customer for review.

Termination

The use case ends once the final and finished report and presentation are fully completed and ready to be issued/presented to the customer.

Post condition

The report and presentation are delivered to the customer for review.

2.2 Non-Functional Requirements

Specifies any other particular non-functional attributes required by the system.

2.2.1 Recover requirement

The project data, analysis programs and output reports must be stored on at least three different media that are independent of each other. These should include at least one PC, one high capacity USB drive or external hard drive, and a cloud storage system such as Google Drive, so that in the event of accidental loss or damage all of the project data and analysis can be easily recovered and reinstated.

2.2.2 Reliability requirement

The weather data is checked on an ongoing basis by ECA&D and the database is updated to reflect the addition of new data as this is time series based. The football data is also checked for errors and both data sets are deemed to be highly accurate.

2.2.3 Extendibility requirement

There is no requirement to extend the project at this stage. However, due to the ongoing addition of new weather data and ongoing football games being played there is new data being created year on year which could add to the overall data sets being used for this project.

2.2.4 Resource utilization requirement

The project will require a PC with a high speed internet connection to ensure timely download of the estimated 1GB of data required. The following software packages will be required: Microsoft Office, Python, R Studio, a web browser, PeaZip and MySQL.

3 Interface requirements

3.1 Application Programming Interfaces (API)

The analysis process will use Google Maps to provide mapping visualisation tools for use within R Studio via dedicated add-ons. Google mapping is used to plot weather station and stadium locations and also to determine the shortest distance between them where the nearest station is not clear or there is a choice of weather stations in close proximity. Some analysis output results may be presented using Google mapping visualisation tools.
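To illustrate how the closest weather station to a stadium could be determined from decimal degree co-ordinates, the sketch below uses the haversine great-circle formula. It is a Python stand-in, not the project's R and Google mapping approach. The station names and co-ordinates are invented for the example, and `haversine_km` and `nearest_station` are hypothetical helper names.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_station(stadium, stations):
    """Return (station_name, distance_km) for the station closest to the stadium."""
    distances = (
        (name, haversine_km(stadium[0], stadium[1], lat, lon))
        for name, (lat, lon) in stations.items()
    )
    return min(distances, key=lambda pair: pair[1])


# Invented, approximate co-ordinates used purely for illustration
stadium = (48.22, 11.62)
stations = {
    "station_a": (48.16, 11.54),
    "station_b": (52.47, 13.40),
    "station_c": (50.05, 8.60),
}
name, dist = nearest_station(stadium, stations)
print(name, round(dist, 1))
```

Where two stations lie at a similar distance, the computed values give an objective basis for the choice rather than a visual judgement from the map.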

4 System Architecture

The overall system architecture is shown as a high level diagram in Figure 09. The system, with its steps and processes, is shown within the dotted line, which represents the use cases outlined above.

Overall System Architecture

Figure 09 – Overall system Architecture

UML Class Diagram for Database Structure

The database will have three classes as shown below with attributes. This is a draft class diagram which also forms the basis of an Entity Relationship diagram.

Figure 10 – UML Class diagram for the Database structure

5 System Evolution

As outlined in section 2.2.3 weather data and football results are time series data which are being added to on an ongoing basis. Both the ECA&D and Football results providers add to the data sets on a continual basis providing the option of additional data to be included in any future study.

A future study could also extend the range of countries, as ECA&D holds data for a wide range of European countries, although outside of modern countries such as Germany the data is less detailed and less reliable, with a higher incidence of null values. The inclusion of very hot or wet countries such as Spain and Italy could reveal patterns or trends across Europe as a whole.

6 Special resources required

No special resources are required or anticipated at this stage.

8.10 Management Progress Reports

8.10.1 Management Progress Report 1

Management Progress Report 01

Highlight Report

The effects of weather on goal outcome for football matches played within the German Bundesliga

Release: Management Progress Report 01

Date: 31st October 2014

Authors: Alastair Macnair

x13129325

The effects of weather on goal outcome for football matches played within the German Bundesliga

Highlight Report Date: 31 October 2014

Management Progress Report 01 Page 2 of 11

1 Report History

1.1 Document Location

This document is only valid on the day it was printed.

1.2 Revision History

| Revision date | Author | Version | Summary of Changes | Changes marked |
|---|---|---|---|---|
| 31/10/2014 | Alastair Macnair | 01 | Initial Issue | |

1.3 Approvals

This document requires the following approvals:

| Name | Title | Date of Issue | Version |
|---|---|---|---|
| Ioana Ghergulescu | Project Supervisor | 31/10/2014 | 01 |

1.4 Distribution

This document has additionally been distributed to:

Name Title Date of Issue Status

Table of Contents

1 Report History
    1.1 Document Location
    1.2 Revision History
    1.3 Approvals
    1.4 Distribution
2 Highlight Report from 15th September 2014 to 31st October 2014.
3 Highlight Report Purpose
4 Summary of Project Progress
    4.1 Project Plan Table and summary status
5 Key Milestones Achieved in this period
6 Problems encountered in this period
7 Highlighting Concerns (RAID Log)
8 Variance from Plan
9 Planned Work for Next Period (to 30-11-2014)
10 Appendix A – Revised Project Plan Gantt Chart
11 Appendix B – Risks, Assumptions, Issues and Dependencies (RAID) Logs

2 Highlight Report from 15th September 2014 to 31st October 2014.

3 Highlight Report Purpose

A Highlight report provides the Project Board and Client/Customer with a summary of the status of a project at agreed stages and is used to monitor progress. The Project manager/Data Analyst uses the Highlight report to advise the Project Board/Client of any potential problems or areas where they could help.

4 Summary of Project Progress

Data Sets Downloaded and preliminary check indicates they are OK

Bundesliga 1 Team and Stadium List 100% completed, weather station list compiled

Bundesliga 2 Team and Stadium List 65% completed, weather station list yet to be completed

R script written to match decimal degree co-ordinates of each stadium to nearest weather station to create stadium lists above

R script written to bind all football match data files

R script written to strip and clean football and weather data files

Python script written to error check weather files for missing values

Introduction started

Literature Review started. Primary sections initially identified

Data description Section started

All 21 years of Bundesliga results bound into one file

Some research started in technology and statistical analysis

Some areas have been started slightly ahead of schedule, as the project plan table summary indicates.
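The kind of check performed by the Python script that error checks the weather files for missing values might look like the sketch below. It is not the project's script. The column names `DATE` and `TG` and the missing-value code `-9999` are assumptions about the weather file layout, and the sample rows are invented.

```python
import csv
import io

MISSING = "-9999"  # assumed missing-value code in the weather files


def find_missing(lines, value_column):
    """Return the dates whose value in value_column equals the missing code."""
    reader = csv.DictReader(lines, skipinitialspace=True)
    missing_dates = []
    for row in reader:
        if row[value_column].strip() == MISSING:
            missing_dates.append(row["DATE"].strip())
    return missing_dates


# Invented sample in an assumed layout: station id, date, value, quality flag
sample = io.StringIO(
    "STAID, DATE, TG, Q_TG\n"
    "1, 20140101, 35, 0\n"
    "1, 20140102, -9999, 9\n"
    "1, 20140103, 41, 0\n"
)
missing = find_missing(sample, "TG")
print(missing)
```

Listing the affected dates, rather than only counting them, makes it easier to see whether gaps fall on match days.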

The project plan Gantt chart was redone from scratch to make the most of Excel's ability to provide a simple project plan. A table was built using all currently identified tasks, although some were omitted to keep the overall chart clear. The revised table can be adjusted and updated much more easily, the Gantt chart (Appendix A) reflects any changes automatically, and the use of colour allows specific sub-groups of tasks to be identified more readily. The table uses a traffic light system to identify sections completed, in progress and yet to be started. The project plan table is shown in section 4.1.

4.1 Project Plan Table and summary status

| Task | Start Date | Duration (days) | End Date | Status |
|---|---|---|---|---|
| PROJECT PROPOSAL | | | | |
| Project Proposal Submitted | 28-Sep-14 | 1 | 28-Sep-14 | Completed |
| REQUIREMENTS SPECIFICATION | | | | |
| Requirements Specification | 01-Oct-14 | 12 | 12-Oct-14 | Completed |
| Requirements Specification Submitted | 12-Oct-14 | 1 | 12-Oct-14 | Completed |
| MANAGEMENT REPORT | | | | |
| Management Report 01 | 31-Oct-14 | 1 | 31-Oct-14 | Completed |
| Management Report 02 | 30-Nov-14 | 1 | 30-Nov-14 | To be started |
| Management Report 03 | 20-Dec-14 | 1 | 20-Dec-14 | To be started |
| KEY RESEARCH AREAS | | | | |
| Germany's Weather | 01-Nov-14 | 12 | 12-Nov-14 | In Progress |
| Statistical Tools | 15-Sep-14 | 60 | 13-Nov-14 | In Progress |
| Sports Performance Factors | 01-Nov-14 | 12 | 12-Nov-14 | To be started |
| Stadium Design | 01-Nov-14 | 10 | 10-Nov-14 | To be started |
| Technology and Tools | 15-Sep-14 | 55 | 08-Nov-14 | In Progress |
| DATA EXTRACTION | | | | |
| Download Weather Files | 29-Sep-14 | 1 | 29-Sep-14 | Completed |
| Download Football Files | 29-Sep-14 | 1 | 29-Sep-14 | Completed |
| FILTER WEATHER FILES | | | | |
| Collate Bundesliga Stadium list | 14-Oct-14 | 20 | 02-Nov-14 | In Progress |
| Extract required weather stations | 02-Nov-14 | 1 | 02-Nov-14 | To be started |
| DATA TRANSFORMATION | | | | |
| Write R script to clean and transform weather files | 01-Nov-14 | 5 | 05-Nov-14 | In Progress |
| Write R script to clean and transform football files | 30-Oct-14 | 2 | 31-Oct-14 | Completed |
| Bind and clean all football files | 30-Oct-14 | 2 | 31-Oct-14 | Completed |
| JOIN all weather parameters for each station | 05-Nov-14 | 1 | 05-Nov-14 | To be started |
| DATABASE MANAGEMENT | | | | |
| Design Relational Database | 01-Nov-14 | 2 | 02-Nov-14 | In Progress |
| Insert Data into SQL database | 06-Nov-14 | 1 | 06-Nov-14 | To be started |
| DATA ANALYSIS | | | | |
| Analyse the database | 06-Nov-14 | 15 | 20-Nov-14 | To be started |
| Graphing and Visualisation | 15-Nov-14 | 6 | 20-Nov-14 | To be started |
| PREDICTION | | | | |
| Predictive Modelling | 20-Nov-14 | 2 | 21-Nov-14 | To be started |
| REPORT WRITING | | | | |
| Introduction | 31-Oct-14 | 4 | 03-Nov-14 | In Progress |
| Literature Review | 04-Nov-14 | 8 | 11-Nov-14 | In Progress |
| Data Set Description | 12-Nov-14 | 2 | 13-Nov-14 | In Progress |
| Discussion | 20-Nov-14 | 14 | 03-Dec-14 | To be started |
| Conclusion | 03-Dec-14 | 3 | 05-Dec-14 | To be started |
| Checking & Review | 08-Dec-14 | 5 | 12-Dec-14 | To be started |
| Print 3 Copies and Bind | 15-Dec-14 | 3 | 17-Dec-14 | To be started |
| Submit Dissertation | 06-Jan-15 | 1 | 06-Jan-15 | To be started |
| PROJECT PRESENTATION | | | | |
| Prepare & Practice Presentation | 06-Jan-15 | 14 | 19-Jan-15 | To be started |
| Make Presentation | 19-Jan-15 | 5 | 23-Jan-15 | To be started |
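The end dates in the project plan table are consistent with an inclusive day count, where a one day task starts and ends on the same date. A small Python sketch of that arithmetic is given below. The actual plan was built in Excel; the function name `end_date` is hypothetical.

```python
from datetime import date, timedelta


def end_date(start, duration_days):
    """Inclusive end date: a task lasting 1 day ends on its start date."""
    if duration_days < 1:
        raise ValueError("duration must be at least one day")
    return start + timedelta(days=duration_days - 1)


# Rows taken from the project plan table
statistical_tools = end_date(date(2014, 9, 15), 60)
stadium_list = end_date(date(2014, 10, 14), 20)
print(statistical_tools.isoformat(), stadium_list.isoformat())
```

Changing a start date or duration then moves the end date automatically, which is the behaviour the revised table relies on.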

5 Key Milestones Achieved in this period

Project Proposal document Updated and re-issued 28th September 2014

Requirements Specification document completed and issued 12th October 2014

Project Plan revised and updated 31st October 2014 (See Appendix A)

6 Problems encountered in this period

When the stadium list was compiled, the total number of individual teams that have played over the 21 year period was much larger than anticipated. Both the Bundesliga 1 & 2 leagues have a total of 18 teams each at any one time, but Bundesliga 1 features 38 unique teams and Bundesliga 2 features 67 over the 21 year period. This added significantly to the time needed to create the stadium list.

Some teams have changed name or stadium over the time period

The European Climate Assessment & Dataset (ECA&D) website was updating its datasets for a large part of October and the files were unavailable for download.

7 Highlighting Concerns (RAID Log)

The full RAID log is provided in Appendix B. Summary of the RAID log: -

Risks

Four risks have been identified that could affect the project in the next period. One has been resolved relatively easily.

Assumptions

One long range assumption has been identified relating to the holiday period and available resources over this period to complete the project.

Issues

Five issues were encountered in the reported period. Four were dealt with and the fifth is due to be resolved within the next few days.

Dependencies

There is only one current dependency: ensuring the cleaned and transformed data is completed on time to allow for database creation and analysis, which is a key part of the project.

8 Variance from Plan

The original plan had assumed that more work would have been completed during October, which has not been the case. Other external demands limited the time available to invest in the project.

The original plan assumed more work over the Christmas period than, on reflection, could be considered realistic. Based on bank holidays and potential business closures, the project timeline has been pulled back from the Christmas period.

9 Planned Work for Next Period (to 30-11-2014)

The vast majority of the key tasks, including the main analysis and report writing, will be undertaken in the next work period.

Finalise and complete stadium lists and weather file extraction

Clean and bind/join all required data files

Update all final data into database

Undertake analysis and all graphing tables and charts

Write first three primary report sections

Undertake ongoing research to support analysis and report writing

10 Appendix A – Revised Project Plan Gantt Chart

[Figure: revised project plan Gantt chart covering the tasks from the Project Plan Table (section 4.1), grouped as Project Proposal, Requirements Specification, Management Report, Key Research Areas, Data Extraction, Filter Weather Files, Data Transformation, Database Management, Data Analysis, Prediction, Report Writing and Project Presentation.]

11 Appendix B – Risks, Assumptions, Issues and Dependencies (RAID) Logs

RISKS

| ID | Date Raised | Risk Description | Likelihood | Impact | Severity | Mitigation Plan | Owner | Status | Date Closed |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 28/09/2014 | Available time until project deadline is limited | 3 | 3 | 9 | Ensure constant review of project plan and keep to project plan deadlines. | | Open | |
| 2 | 15/09/2014 | Project data is lost due to PC or USB key loss/damage | 3 | 5 | 15 | A dedicated Google Drive space has been set up in addition to storage on a PC and USB key, creating three distinct storage places. Back up latest files at least once a week or after significant work development | | Closed | 15/10/2014 |
| 3 | 20/09/2014 | Ability to apply appropriate statistical and algorithm applications to data limited due to time and knowledge shortfalls, impacting on project quality | 1 | 3 | 3 | Ensure ongoing research and adherence to project plan as well as timely undertaking of literature review to ensure all knowledge areas are complete | | Open | |
| 4 | 28/09/2014 | Period of Christmas is potentially much less useful than it appears, especially as to working days and using external providers to print and bind prior to the deadline | 3 | 5 | 15 | Pull back project timeline to allow for printing before Christmas. Identify businesses that provide this service and determine opening hours as early as possible | | Open | |
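In the risk log each severity score equals likelihood multiplied by impact (for example 3 × 5 = 15). The short Python sketch below reproduces the scores and ranks the risks by severity. The log does not state the scoring scale, so no bounds are enforced, and the function and variable names are hypothetical.

```python
def severity(likelihood, impact):
    """Severity score as the product of likelihood and impact."""
    return likelihood * impact


# Risk ID -> (likelihood, impact), values taken from the risk log
risks = {1: (3, 3), 2: (3, 5), 3: (1, 3), 4: (3, 5)}

scores = {risk_id: severity(l, i) for risk_id, (l, i) in risks.items()}
ranked = sorted(scores, key=lambda risk_id: scores[risk_id], reverse=True)
print(scores)
print(ranked)
```

Ranking by score puts data loss (risk 2) and the Christmas period (risk 4) ahead of the general time constraint (risk 1).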

ISSUES

| ID | Date Raised | Issue Description | Impact Description | Impact | Priority | Mitigation Plan | Owner | Status |
|---|---|---|---|---|---|---|---|---|
| 1 | 14/10/2014 | Bundesliga total team range over 21 years significantly higher than anticipated, adding to work and creating lots of 'one-off' teams over the period | Adds to time required to collate stadia list and identify stadia and potentially affects the analysis where single teams and stadia are present over the | Medium | High | Allow more time and adjust project plan accordingly | | Open |
| 2 | 14/10/2014 | Some teams have changed over the years and stadium location has changed | Over 21 years teams have changed in name and even location | Low | Low | Number of teams upon analysis is only one or two. Make a reasonable decision to place and re-name to ensure consistency across all 21 years | | Closed |
| 3 | 25/10/2014 | Project plan has slipped from original | Original plan had allowed for much more work to be completed in this period, placing additional stress into next work period. | Medium | Medium | Revise plan as realistically as possible and better allow for external pressures | | Closed |
| 4 | 14/10/2014 | ECA&D website undertook a major update and some data sets could not be downloaded | Unable to download primary data sets as more were added due to project proposal revision | High | High | Check every day until data sets become available for download | | Closed |
| 5 | 25/10/2014 | Existing Gantt chart cannot be easily adjusted or updated to reflect change | Unable to plan and manage project and significant time to adjust chart moving forward. Danger is that this is not done and ability to manage is compromised. | Medium | Medium | Revise plan to use changeable date format and days that automatically update the chart | | Closed |

ASSUMPTIONS

| ID | Date Raised | Assumption Description | Reason for Assumption | Action to Validate | Impact if Assumption Incorrect | Status |
|---|---|---|---|---|---|---|
| 1 | 30/10/2014 | Businesses that print and bind may be closed over the Christmas period | Holidays and reduced demand may see these businesses close over an extended period | Identify businesses required to bind report and clarify opening hours well before Christmas period | More time in project plan currently not being utilised. | Open |

DEPENDENCIES

| ID | Date Raised | Dependency Description | Location | Deliverables | Delivery Date | Importance | Status |
|---|---|---|---|---|---|---|---|
| 1 | 20/10/2014 | Analysis and database creation is dependent upon stadium list being compiled | Internal | Bound files for weather station locations and stadium lists. | 02/11/2014 | High | Open |

8.10.2 Management Progress Report 2

Management Progress Report 02

Highlight Report

The effects of weather on goal outcome for football matches played within the German Bundesliga

Release: Management Progress Report 02

Date: 7th December 2014

Authors: Alastair Macnair

x13129325

The effects of weather on goal outcome for football matches played within the German Bundesliga

Highlight Report Date: 7 December 2014

Management Progress Report 02 Page 2 of 13

1 Report History

1.1 Document Location

This document is only valid on the day it was printed.

1.2 Revision History

| Revision date | Author | Version | Summary of Changes | Changes marked |
|---|---|---|---|---|
| 31/10/2014 | Alastair Macnair | 01 | Initial Issue | |
| 7/12/2014 | Alastair Macnair | 02 | Progress Revision | |

1.3 Approvals

This document requires the following approvals:

| Name | Title | Date of Issue | Version |
|---|---|---|---|
| Ioana Ghergulescu | Project Supervisor | 31/10/2014 | 01 |
| Ioana Ghergulescu | Project Supervisor | 31/10/2014 | 01 |

1.4 Distribution

This document has additionally been distributed to:

Name Title Date of Issue Status

Table of Contents

1 Report History
    1.1 Document Location
    1.2 Revision History
    1.3 Approvals
    1.4 Distribution
2 Highlight Report from 1st November 2014 to 7th December.
3 Highlight Report Purpose
4 Summary of Project Progress
    4.1 Project Plan Table and summary status
5 Key Milestones Achieved in this period
6 Problems encountered in this period
7 Highlighting Concerns (RAID Log)
8 Variance from Plan
9 Contingency Planning
10 Planned Work for Next Period (to 20-12-2014)
11 Appendix A – Revised Project Plan Gantt Chart
    11.1 Detailed Project Plan for upcoming period
12 Appendix B – Risks, Assumptions, Issues and Dependencies (RAID) Logs
13 Appendix C - Contingency Plans
    13.1 File, Electronic and Hardcopy protection, Backup and recovery Plan
    13.2 Critical Circumstance & Threat Contingency Plan

2 Highlight Report from 1st November 2014 to 7th December.

3 Highlight Report Purpose

A Highlight report provides the Project Board and Client/Customer with a summary of the status of a project at agreed stages and is used to monitor progress. The Project manager/Data Analyst uses the Highlight report to advise the Project Board/Client of any potential problems or areas where they could help.

4 Summary of Project Progress

Stadiums and weather station (observation data) fully paired and calculated

Complete data set compiled and checked and 100% ready

R scripts written for Feature Engineering enhancements to Data set

Feature engineering elements added to data set. (Goal outcome measures, seasons etc.)

R scripts written for descriptive statistics and graphing

Report Section 01 – Introduction, completed

Report Section 02 – Literature Review, 20% Complete

Report Section 03 – Data Sets, 25% complete

Report Section 04 – Analysis, 5% complete

Research ongoing in statistical analysis and sports performance

Research ongoing in Data Mining techniques and predictive modelling

This period has seen progress in a number of areas. The Gantt chart has been split to provide an overview and a separate, more detailed plan for the next work period, to help better understand the various sub-tasks required. Weather forecast data will be gathered prior to each match from 7th December until 20th December.
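As an illustration of the feature engineering step listed above (goal outcome measures, seasons), the Python sketch below derives a few features from a single match record. The project's own enhancements were written in R and are not reproduced here. The feature names are hypothetical, and "season" is interpreted as the meteorological season, which is an assumption.

```python
from datetime import date


def meteorological_season(d):
    """Map a date to a meteorological season (northern hemisphere)."""
    if d.month in (12, 1, 2):
        return "winter"
    if d.month in (3, 4, 5):
        return "spring"
    if d.month in (6, 7, 8):
        return "summer"
    return "autumn"


def engineer_features(match_date, home_goals, away_goals):
    """Derive goal outcome measures and season from one match record."""
    if home_goals > away_goals:
        result = "H"  # home win
    elif away_goals > home_goals:
        result = "A"  # away win
    else:
        result = "D"  # draw
    return {
        "total_goals": home_goals + away_goals,
        "goal_difference": home_goals - away_goals,
        "result": result,
        "season": meteorological_season(match_date),
    }


# Invented match record used purely for illustration
features = engineer_features(date(2014, 12, 6), 3, 1)
print(features)
```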

4.1 Project Plan Table and summary status

| Task | Start Date | Duration (days) | End Date | Status |
|---|---|---|---|---|
| PROJECT PROPOSAL | | | | |
| Project Proposal Submitted | 28-Sep-14 | 1 | 28-Sep-14 | Completed |
| REQUIREMENTS SPECIFICATION | | | | |
| Requirements Specification | 01-Oct-14 | 12 | 12-Oct-14 | Completed |
| Requirements Specification Submitted | 12-Oct-14 | 1 | 12-Oct-14 | Completed |
| MANAGEMENT REPORT | | | | |
| Management Report 01 | 31-Oct-14 | 1 | 31-Oct-14 | Completed |
| Management Report 02 | 07-Dec-14 | 1 | 07-Dec-14 | Completed |
| Management Report 03 | 20-Dec-14 | 1 | 20-Dec-14 | To be started |
| KEY RESEARCH AREAS | | | | |
| Germany's Weather | 01-Nov-14 | 12 | 12-Nov-14 | Completed |
| Statistical Tools | 15-Sep-14 | 60 | 13-Nov-14 | Completed |
| Sports Performance Factors | 08-Dec-14 | 7 | 14-Dec-14 | In Progress |
| Stadium Design | 08-Dec-14 | 7 | 14-Dec-14 | In Progress |
| Technology and Tools | 15-Sep-14 | 55 | 08-Nov-14 | Completed |
| DATA EXTRACTION | | | | |
| Download Weather Files | 29-Sep-14 | 1 | 29-Sep-14 | Completed |
| Download Football Files | 29-Sep-14 | 1 | 29-Sep-14 | Completed |
| FILTER WEATHER FILES | | | | |
| Collate Bundesliga Stadium list | 14-Oct-14 | 20 | 02-Nov-14 | Completed |
| Extract required weather stations | 02-Nov-14 | 1 | 02-Nov-14 | Completed |
| DATA TRANSFORMATION | | | | |
| Write R script to clean and transform weather files | 01-Nov-14 | 5 | 05-Nov-14 | Completed |
| Write R script to clean and transform football files | 30-Oct-14 | 2 | 31-Oct-14 | Completed |
| Bind and clean all football files | 30-Oct-14 | 2 | 31-Oct-14 | Completed |
| JOIN all weather parameters for each station | 05-Nov-14 | 1 | 05-Nov-14 | Completed |
| DATABASE MANAGEMENT | | | | |
| Design Relational Database | 01-Nov-14 | 2 | 02-Nov-14 | Omitted |
| Insert Data into SQL database | 06-Nov-14 | 1 | 06-Nov-14 | Omitted |
| DATA ANALYSIS | | | | |
| Analyse the database | 01-Dec-14 | 20 | 20-Dec-14 | In Progress |
| Graphing and Visualisation | 01-Dec-14 | 20 | 20-Dec-14 | In Progress |
| PREDICTION | | | | |
| Predictive Modelling | 10-Dec-14 | 12 | 21-Dec-14 | To be started |
| REPORT WRITING | | | | |
| Introduction | 31-Oct-14 | 4 | 03-Nov-14 | Completed |
| Literature Review | 08-Dec-14 | 8 | 15-Dec-14 | In Progress |
| Data Set Description | 08-Dec-14 | 5 | 12-Dec-14 | In Progress |
| Analysis and Evaluation | 12-Dec-14 | 8 | 19-Dec-14 | In Progress |
| Conclusion | 17-Dec-14 | 4 | 20-Dec-14 | To be started |
| Checking & Review | 28-Dec-14 | 2 | 29-Dec-14 | To be started |
| Print 3 Copies and Bind | 02-Jan-15 | 2 | 03-Jan-15 | To be started |
| Submit Dissertation | 06-Jan-15 | 1 | 06-Jan-15 | To be started |
| PROJECT PRESENTATION | | | | |
| Prepare & Practice Presentation | 12-Jan-15 | 7 | 18-Jan-15 | To be started |
| Make Presentation | 19-Jan-15 | 5 | 23-Jan-15 | To be started |

5 Key Milestones Achieved in this period

All four data sets (Observations, Stations, Stadiums, and Matches) have been cleaned, corrected and joined to provide a complete data set of all matches, stadiums and weather observations. Problems encountered last period such as name changes were corrected. A small percentage of missing observation data was overcome using Multiple Imputation.

Feature Engineering enhancements have been added to the data set.

Descriptive statistics and graphing have been undertaken on the data.

Project Report has seen some sections completed and most others started.

Project Management Plan revised and updated 7th December 2014 (See Appendix A)

Contingency Plans created.

6 Problems encountered in this period

Although an R script works out the distances between stadia and weather stations, there is still a manual element in adding or removing the weather station observation files from the project folder. This needed to be checked, as it was found that even one missing file would affect the final joins, creating a list with values missing and making errors hard to detect.

Personal and Work/College factors continue to detrimentally impact on the project plan. However, the majority of all other NCI commitments are now completed which allows for the project to regain priority positioning.
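The manual file check described in the first problem above could be partly automated. The Python sketch below reports which required station files are absent from a folder before any joins are attempted. It is illustrative only: the file naming pattern `station_{id}.txt`, the station IDs and the function name are hypothetical and do not reflect the project's actual file names.

```python
import tempfile
from pathlib import Path


def missing_station_files(folder, required_ids, pattern="station_{id}.txt"):
    """Return the sorted station IDs whose observation file is not in the folder."""
    folder = Path(folder)
    return sorted(
        sid for sid in required_ids
        if not (folder / pattern.format(id=sid)).is_file()
    )


# Demonstration in a temporary folder holding two of three required files
with tempfile.TemporaryDirectory() as tmp:
    for sid in (101, 102):
        (Path(tmp) / f"station_{sid}.txt").write_text("placeholder")
    missing_files = missing_station_files(tmp, [101, 102, 103])
print(missing_files)
```

Running such a check before the joins would flag an absent file directly, rather than leaving it to surface later as missing values in the joined list.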

7 Highlighting Concerns (RAID Log)

The full RISK log is provided in Appendix B. Summary of the RAID log: -

Risks

Time continues to be the biggest risk factor to meet the required deadline. The previously unused time over Christmas has been utilised and the project super visor has confirmed that binding is not a critical requirement. Work and research in Data mining has developed knowledge in this area and creating contingency plans alongside existing data backup methods has limited risk in this area.

Assumptions

Project Supervisor has confirmed that binding is not an essential requirement.

Issues

All issues now resolved

Dependencies

Predictions are dependent on collecting weather forecast data (not historical data). This data will need to be recorded from a suitable forecast provider prior to every match. Without reliable forecast data, real match data cannot be used.

Page 173: The effects of weather on football matches played within the German Bundesliga

8 Variance from Plan

As other NCI deadlines and commitments built up over November, project work was affected, although it has continued at a slower than ideal pace. These other commitments are now essentially completed, allowing more time for project work.

The updated plan now uses the time over Christmas that was previously left unused, so that project deliverables can still be achieved despite the delays in this period.

The MySQL database that was to be used has been omitted after review, as everything can be achieved in RStudio, where calculations, tabulation and graphing will be quicker and easier.

9 Contingency Planning

Sections on contingency planning have been added for data loss and for external threats and circumstances. See Appendix C.

10 Planned Work for Next Period (to 20-12-2014)

The next work period will still be based on key sections such as the analysis and report writing being undertaken.

Complete detailed analysis of data using descriptive statistics

Undertake detailed analysis of data using inferential statistics

Undertake detailed analysis of data using Data Mining techniques

Capture weather forecasts for remaining matches to be played in 2014.

Complete Report sections 2 & 3

Commence report section 4

Finalise research into sports performance analysis

Page 174: The effects of weather on football matches played within the German Bundesliga

11 Appendix A – Revised Project Plan Gantt Chart

Page 175: The effects of weather on football matches played within the German Bundesliga

11.1 Detailed Project Plan for upcoming period

Page 176: The effects of weather on football matches played within the German Bundesliga

12 Appendix B – Risks, Assumptions, Issues and Dependencies (RAID) Logs

RISKS

ID 1. Date raised: 28/09/2014. Risk: Available time until project deadline is limited. Likelihood: 3. Impact: 3. Severity: 9. Mitigation plan: Ensure constant review of project plan and keep to project plan deadlines. Status: Open.

ID 2. Date raised: 15/09/2014. Risk: Project data is lost due to PC or USB key loss/damage. Likelihood: 3. Impact: 5. Severity: 15. Mitigation plan: A dedicated Google Drive space has been set up in addition to storage on a PC and USB key, creating three distinct storage places. Back up latest files at least once a week or after significant work development. Status: Closed. Date closed: 15/10/2014.

ID 3. Date raised: 20/09/2014. Risk: Ability to apply appropriate statistical and algorithm applications to data limited due to time and knowledge shortfalls, impacting on project quality. Likelihood: 1. Impact: 3. Severity: 3. Mitigation plan: Ensure ongoing research and adherence to project plan as well as timely undertaking of literature review to ensure all knowledge areas are complete. Status: Closed. Date closed: 01/12/2014.

ID 4. Date raised: 28/09/2014. Risk: Period of Christmas is potentially much less useful than it appears, especially as to working days and using external providers to print and bind prior to the deadline. Likelihood: 3. Impact: 5. Severity: 15. Mitigation plan: Pull back project timeline to allow for printing before Christmas. Identify businesses that provide this service and determine opening hours as early as possible. Status: Closed. Date closed: 01/12/2014.
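The severity figures in the risk log are consistent with severity being the product of likelihood and impact. The short Python sketch below checks that reading against the four logged risks; the scoring rule and the function name are assumptions inferred from the numbers, not something the report states.

```python
# (risk id, likelihood, impact, logged severity) as recorded in the risk log
RISK_LOG = [
    (1, 3, 3, 9),
    (2, 3, 5, 15),
    (3, 1, 3, 3),
    (4, 3, 5, 15),
]


def severity(likelihood, impact):
    """Assumed scoring rule: severity is likelihood multiplied by impact."""
    return likelihood * impact


# Risk ids whose logged severity disagrees with the assumed rule
mismatches = [rid for rid, lik, imp, sev in RISK_LOG if severity(lik, imp) != sev]
print("Risks not matching likelihood x impact:", mismatches)
```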

Page 177: The effects of weather on football matches played within the German Bundesliga

ISSUES

ID 1. Date raised: 14/10/2014. Issue: Bundesliga total team range over 21 years significantly higher than anticipated, adding to work and creating lots of 'one-off' teams over the period. Impact description: Adds to time required to collate stadia list and identify stadia, and potentially affects the analysis where single teams and stadia are present over the [...]. Impact: Medium. Priority: High. Mitigation plan: Allow more time and adjust project plan accordingly. Status: Closed.

ID 2. Date raised: 14/10/2014. Issue: Some teams have changed over the years and stadium location has changed. Impact description: Over 21 years teams have changed in name and even location. Impact: Low. Priority: Low. Mitigation plan: Number of teams upon analysis is only one or two. Make a reasonable decision to place and re-name to ensure consistency across all 21 years. Status: Closed.

ID 3. Date raised: 25/10/2014. Issue: Project plan has slipped from original. Impact description: Original plan had allowed for much more work to be completed in this period, placing additional stress into next work period. Impact: Medium. Priority: Medium. Mitigation plan: Revise plan as realistically as possible and better allow for external pressures. Status: Closed.

ID 4. Date raised: 14/10/2014. Issue: ECA&D website underwent a major update and some data sets could not be downloaded. Impact description: Unable to download primary data sets as more were added due to project proposal revision. Impact: High. Priority: High. Mitigation plan: Check every day until data sets become available for download. Status: Closed.

ID 5. Date raised: 25/10/2014. Issue: Existing Gantt Chart cannot be easily adjusted or updated to reflect change. Impact description: Unable to plan and manage project, and significant time needed to adjust chart moving forward. Danger is that this is not done and ability to manage is compromised. Impact: Medium. Priority: Medium. Mitigation plan: Revise plan to use changeable date format and days that automatically update the chart. Status: Closed.

Page 178: The effects of weather on football matches played within the German Bundesliga

ASSUMPTIONS

ID 1. Date raised: 30/10/2014. Assumption: Businesses that print and bind may be closed over the Christmas period. Reason for assumption: Holidays and reduced demand may see these businesses close over an extended period. Action to validate: Identify businesses required to bind report and clarify opening hours well before Christmas period. Impact if assumption incorrect: More time in project plan currently not being utilised. Status: Closed.

DEPENDENCIES

ID 1. Date raised: 20/10/2014. Dependency: Analysis and database creation is dependent upon stadium list being compiled. Location: Internal. Deliverables: Bound files for weather station locations and stadium lists. Delivery date: 02/11/2014. Importance: High. Status: Closed.

ID 2. Date raised: 01/12/2014. Dependency: Real match predictions need weather forecasts to be captured and recorded now for later analysis. Location: External. Deliverables: Weather forecast details for all weather parameters to be taken on day of match prior to being played. Delivery date: 20/12/2014. Importance: High. Status: Open.

Page 179: The effects of weather on football matches played within the German Bundesliga

13 Appendix C - Contingency Plans

13.1 File, Electronic and Hardcopy protection, Backup and recovery Plan

This contingency plan addresses how critical project data can be protected from events such as PC hard drive failure, USB loss, power failure, Internet loss or any other incident that threatens the project data, including all analysis and reports in electronic and hard copy. This process should be enacted if any of these events occur.

Critical Electronic Project Data

All project data is backed up onto PC from the working USB key and also onto a dedicated cloud storage file space. If one of these is compromised or lost, an alternative should be established immediately to ensure that multiple redundancy is maintained at all times. A secondary hard drive or storage device should be obtained with the most recent version of the project files copied onto it.

Hard Copy File Data

Hard copies will be created towards the end of the project. They should be suitably protected and bound to ensure they cannot be easily damaged, and stored in a safe place until submitted to the client or submitted early. Three copies should be printed.

13.2 Critical Circumstance & Threat Contingency Plan

In the event of personal circumstances beyond anyone’s control such as illness or family emergency the following plan should be put in place.

1. Determine as best as possible the impact on the project plan.

2. If the time lost will be detrimental to the project to the extent where delivery is compromised, immediately contact the Project Supervisor to advise in writing.

3. Ensure that personal circumstances forms are completed and the Project Supervisor is kept informed as to the situation.

Page 180: The effects of weather on football matches played within the German Bundesliga

8.10.3 Management Progress Report 3

Page 181: The effects of weather on football matches played within the German Bundesliga

Management Progress Report 03

Highlight Report

The effects of weather on goal outcome for football matches played within the German Bundesliga

Release: Management Progress Report 03

Date: 30th December 2014

Authors: Alastair Macnair

x13129325

Page 182: The effects of weather on football matches played within the German Bundesliga

The effects of weather on goal outcome for football matches played within the German Bundesliga

Highlight Report Date: 30 December 2014

Management Progress Report 03 Page 2 of 12

1 Report History

1.1 Document Location

This document is only valid on the day it was printed.

1.2 Revision History

Revision date Author Version Summary of Changes Changes marked

31/10/2014 Alastair Macnair 01 Initial Issue

7/12/2014 Alastair Macnair 02 Progress Revision

30/12/2014 Alastair Macnair 03 Progress Revision

1.3 Approvals

This document requires the following approvals:

Name Title Date of Issue Version

Ioana Ghergulescu Project Supervisor 31/10/2014 01

Ioana Ghergulescu Project Supervisor 30/12/2014 03

1.4 Distribution

This document has additionally been distributed to:

Name Title Date of Issue Status

Page 183: The effects of weather on football matches played within the German Bundesliga

Table of Contents Page

1 Report History .............................................. 2

1.1 Document Location ......................................... 2

1.2 Revision History .......................................... 2

1.3 Approvals ................................................. 2

1.4 Distribution .............................................. 2

2 Highlight Report from 7th December to 30th December 2014 .... 4

3 Highlight Report Purpose .................................... 4

4 Summary of Project Progress ................................. 4

4.1 Project Plan Table and summary status ..................... 5

5 Key Milestones Achieved in this period ...................... 6

6 Problems encountered in this period ......................... 6

7 Highlighting Concerns (RAID Log) ............................ 6

8 Variance from Plan .......................................... 6

9 Contingency Planning ........................................ 6

10 Planned Work for Next Period (to 06-01-2015) ............... 7

11 Appendix A – Revised Project Plan Gantt Chart .............. 8

12 Appendix B – Risks, Assumptions, Issues and Dependencies (RAID) Logs ... 9

13 Appendix C - Contingency Plans ............................. 12

13.1 File, Electronic and Hardcopy protection, Backup and recovery Plan ... 12

13.2 Critical Circumstance & Threat Contingency Plan .......... 12

Page 184: The effects of weather on football matches played within the German Bundesliga

2 Highlight Report from 7th December to 30th December 2014.

3 Highlight Report Purpose

A Highlight report provides the Project Board and Client/Customer with a summary of the status of a project at agreed stages and is used to monitor progress. The Project manager/Data Analyst uses the Highlight report to advise the Project Board/Client of any potential problems or areas where they could help.

4 Summary of Project Progress

Report Section 01 – Completed

Report Section 02 – Completed

Report Section 03 – Completed

Report Section 04 – 90% complete

Report Section 05 – 65% complete

Report Section 06 – 50% complete

Data Analysis complete

Data Mining (predictive analysis) – 50% complete

This period has seen the most work and completion of the various tasks. The Gantt chart has been updated to reflect the few outstanding tasks left to complete the project. Gathering weather forecast data has been omitted as the number of ‘real’ matches taking place (15) was considered to be too small a sample to be able to undertake meaningful predictive modelling. The data set will be split instead into training and test sets.
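A split of the existing data set into training and test sets could look like the minimal Python sketch below. The 70/30 proportion, the fixed seed and the function name are assumptions chosen for illustration; the report does not state which split was used, and the project itself worked in R.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the rows reproducibly and split them into (train, test) lists.

    test_fraction and seed are illustrative defaults, not values from the project.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split can be repeated
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Example with 100 placeholder match ids standing in for real match records
train, test = train_test_split(range(100))
print(len(train), "training rows,", len(test), "test rows")
```

Holding the test rows back until the model is fitted gives an out-of-sample check without needing forecasts for matches yet to be played.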

Page 185: The effects of weather on football matches played within the German Bundesliga

4.1 Project Plan Table and summary status

Task Start Date Duration (days) End Date Status

PROJECT PROPOSAL

Project Proposal Submitted 28-Sep-14 1 28-Sep-14 Completed

REQUIREMENTS SPECIFICATION

Requirements Specification 01-Oct-14 12 12-Oct-14 Completed

Requirements Specification Submitted 12-Oct-14 1 12-Oct-14 Completed

MANAGEMENT REPORT

Management Report 01 31-Oct-14 1 31-Oct-14 Completed

Management Report 02 07-Dec-14 1 07-Dec-14 Completed

Management Report 03 20-Dec-14 1 20-Dec-14 Completed

KEY RESEARCH AREAS

Germany's Weather 01-Nov-14 12 12-Nov-14 Completed

Statistical Tools 15-Sep-14 60 13-Nov-14 Completed

Sports Performance Factors 08-Dec-14 7 14-Dec-14 Completed

Stadium Design 08-Dec-14 7 14-Dec-14 Completed

Technology and Tools 15-Sep-14 55 08-Nov-14 Completed

DATA EXTRACTION

Download Weather Files 29-Sep-14 1 29-Sep-14 Completed

Download Football Files 29-Sep-14 1 29-Sep-14 Completed

FILTER WEATHER FILES

Collate Bundesliga Stadium list 14-Oct-14 20 02-Nov-14 Completed

Extract required weather stations 02-Nov-14 1 02-Nov-14 Completed

DATA TRANSFORMATION

Write R script to clean and transform weather files 01-Nov-14 5 05-Nov-14 Completed

Write R script to clean and transform football files 30-Oct-14 2 31-Oct-14 Completed

Bind and clean all football files 30-Oct-14 2 31-Oct-14 Completed

JOIN all weather parameters for each station 05-Nov-14 1 05-Nov-14 Completed

DATABASE MANAGEMENT

Design Relational Database 01-Nov-14 2 02-Nov-14 Omitted

Insert Data into SQL database 06-Nov-14 1 06-Nov-14 Omitted

DATA ANALYSIS

Analyse the database 01-Dec-14 20 20-Dec-14 Completed

Graphing and Visualisation 01-Dec-14 20 20-Dec-14 Completed

PREDICTION

Predictive Modelling 01-Jan-15 3 03-Jan-15 In Progress

REPORT WRITING

Introduction 31-Oct-14 4 03-Nov-14 Completed

Literature Review 08-Dec-14 8 15-Dec-14 Completed

Data Set Description 08-Dec-14 5 12-Dec-14 Completed

Analysis and Evaluation 12-Dec-14 8 19-Dec-14 Completed

Conclusion 28-Dec-14 8 04-Jan-15 In Progress

Checking & Review 28-Dec-14 8 04-Jan-15 In Progress

Print 3 Copies and Bind 05-Jan-15 1 05-Jan-15 To be started

Submit Dissertation 06-Jan-15 1 06-Jan-15 To be started

PROJECT PRESENTATION

Prepare & Practice Presentation 12-Jan-15 7 18-Jan-15 To be started

Make Presentation 19-Jan-15 5 23-Jan-15 To be started

Page 186: The effects of weather on football matches played within the German Bundesliga

5 Key Milestones Achieved in this period

All analysis and graphing has been completed and added to the report.

All primary sections of the report have been drafted and most have been finished with just the conclusion yet to complete.

Project Management Plan (03) revised and updated 30th December 2014 (See Appendix A)

6 Problems encountered in this period

With four dependent variables (goal outcome), three subsets of the data (All data, B1 and B2) and 8+ independent variables, there is a massive amount of graphing and analysis that potentially needs to be undertaken: roughly 96 distinct cases (4 × 3 × 8). Deciding how to approach this and which cases to omit has been the greatest challenge. It would have been better to focus on just one or two dependent variables.
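The size of the analysis grid can be made concrete by enumerating every combination. In the Python sketch below the subset names come from the report, while the variable names are placeholders, since the actual dependent and independent variables are not listed in this section.

```python
from itertools import product

# Placeholder names: the report gives only the counts (4 dependent, 8+ independent)
dependent_vars = [f"goal_outcome_{i}" for i in range(1, 5)]
subsets = ["All data", "B1", "B2"]  # subsets named in the report
independent_vars = [f"weather_var_{i}" for i in range(1, 9)]

# One case per (dependent variable, subset, independent variable) combination
cases = list(product(dependent_vars, subsets, independent_vars))
print(len(cases), "cases to graph and analyse")
```

Listing the cases this way also makes it easy to filter the grid down, for example by keeping only one or two dependent variables.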

Personal and Work/College factors continue to detrimentally impact on the project plan.

7 Highlighting Concerns (RAID Log)

The full RAID log is provided in Appendix B. Summary of the RAID log:

Risks

As the project is essentially finished, time risk is now reduced and the topic closed.

Assumptions

There are no ongoing assumptions

Issues

All issues now resolved

Dependencies

Collecting real match data has been omitted as the number of matches is too low to be viable for analysis. The existing data set will be used instead.

8 Variance from Plan

Overall the project plan has been adhered to. There has been some slippage over the Christmas period, but not detrimentally so.

The updated plan now reflects the outstanding tasks that need to be completed over the next 4-6 days to finish the project.

9 Contingency Planning

Sections on contingency planning continue to be monitored and back-ups are in progress. See Appendix C.

Page 187: The effects of weather on football matches played within the German Bundesliga

10 Planned Work for Next Period (to 06-01-2015)

The next work period will see the completion of the project and the finalising of unfinished elements.

Complete Data mining and predictive modelling

Complete Conclusion

Print and submit finished report

Page 188: The effects of weather on football matches played within the German Bundesliga

11 Appendix A – Revised Project Plan Gantt Chart

Page 189: The effects of weather on football matches played within the German Bundesliga

12 Appendix B – Risks, Assumptions, Issues and Dependencies (RAID) Logs

RISKS

ID 1. Date raised: 28/09/2014. Risk: Available time until project deadline is limited. Likelihood: 1. Impact: 3. Severity: 3. Mitigation plan: Ensure constant review of project plan and keep to project plan deadlines. Status: Closed.

ID 2. Date raised: 15/09/2014. Risk: Project data is lost due to PC or USB key loss/damage. Likelihood: 3. Impact: 5. Severity: 15. Mitigation plan: A dedicated Google Drive space has been set up in addition to storage on a PC and USB key, creating three distinct storage places. Back up latest files at least once a week or after significant work development. Status: Closed. Date closed: 15/10/2014.

ID 3. Date raised: 20/09/2014. Risk: Ability to apply appropriate statistical and algorithm applications to data limited due to time and knowledge shortfalls, impacting on project quality. Likelihood: 1. Impact: 3. Severity: 3. Mitigation plan: Ensure ongoing research and adherence to project plan as well as timely undertaking of literature review to ensure all knowledge areas are complete. Status: Closed. Date closed: 01/12/2014.

ID 4. Date raised: 28/09/2014. Risk: Period of Christmas is potentially much less useful than it appears, especially as to working days and using external providers to print and bind prior to the deadline. Likelihood: 3. Impact: 5. Severity: 15. Mitigation plan: Pull back project timeline to allow for printing before Christmas. Identify businesses that provide this service and determine opening hours as early as possible. Status: Closed. Date closed: 01/12/2014.

Page 190: The effects of weather on football matches played within the German Bundesliga

ISSUES

ID 1. Date raised: 14/10/2014. Issue: Bundesliga total team range over 21 years significantly higher than anticipated, adding to work and creating lots of 'one-off' teams over the period. Impact description: Adds to time required to collate stadia list and identify stadia, and potentially affects the analysis where single teams and stadia are present over the [...]. Impact: Medium. Priority: High. Mitigation plan: Allow more time and adjust project plan accordingly. Status: Closed.

ID 2. Date raised: 14/10/2014. Issue: Some teams have changed over the years and stadium location has changed. Impact description: Over 21 years teams have changed in name and even location. Impact: Low. Priority: Low. Mitigation plan: Number of teams upon analysis is only one or two. Make a reasonable decision to place and re-name to ensure consistency across all 21 years. Status: Closed.

ID 3. Date raised: 25/10/2014. Issue: Project plan has slipped from original. Impact description: Original plan had allowed for much more work to be completed in this period, placing additional stress into next work period. Impact: Medium. Priority: Medium. Mitigation plan: Revise plan as realistically as possible and better allow for external pressures. Status: Closed.

ID 4. Date raised: 14/10/2014. Issue: ECA&D website underwent a major update and some data sets could not be downloaded. Impact description: Unable to download primary data sets as more were added due to project proposal revision. Impact: High. Priority: High. Mitigation plan: Check every day until data sets become available for download. Status: Closed.

ID 5. Date raised: 25/10/2014. Issue: Existing Gantt Chart cannot be easily adjusted or updated to reflect change. Impact description: Unable to plan and manage project, and significant time needed to adjust chart moving forward. Danger is that this is not done and ability to manage is compromised. Impact: Medium. Priority: Medium. Mitigation plan: Revise plan to use changeable date format and days that automatically update the chart. Status: Closed.

Page 191: The effects of weather on football matches played within the German Bundesliga

ASSUMPTIONS

ID 1. Date raised: 30/10/2014. Assumption: Businesses that print and bind may be closed over the Christmas period. Reason for assumption: Holidays and reduced demand may see these businesses close over an extended period. Action to validate: Identify businesses required to bind report and clarify opening hours well before Christmas period. Impact if assumption incorrect: More time in project plan currently not being utilised. Status: Closed.

DEPENDENCIES

ID 1. Date raised: 20/10/2014. Dependency: Analysis and database creation is dependent upon stadium list being compiled. Location: Internal. Deliverables: Bound files for weather station locations and stadium lists. Delivery date: 02/11/2014. Importance: High. Status: Closed.

ID 2. Date raised: 01/12/2014. Dependency: Real match predictions need weather forecasts to be captured and recorded now for later analysis. Location: External. Deliverables: Weather forecast details for all weather parameters to be taken on day of match prior to being played. Delivery date: 20/12/2014. Importance: Low. Status: Closed.

Page 192: The effects of weather on football matches played within the German Bundesliga

13 Appendix C - Contingency Plans

13.1 File, Electronic and Hardcopy protection, Backup and recovery Plan

This contingency plan addresses how critical project data can be protected from events such as PC hard drive failure, USB loss, power failure, Internet loss or any other incident that threatens the project data, including all analysis and reports in electronic and hard copy. This process should be enacted if any of these events occur.

Critical Electronic Project Data

All project data is backed up onto PC from the working USB key and also onto a dedicated cloud storage file space. If one of these is compromised or lost, an alternative should be established immediately to ensure that multiple redundancy is maintained at all times. A secondary hard drive or storage device should be obtained with the most recent version of the project files copied onto it.

Hard Copy File Data

Hard copies will be created towards the end of the project. They should be suitably protected and bound to ensure they cannot be easily damaged, and stored in a safe place until submitted to the client or submitted early. Three copies should be printed.

13.2 Critical Circumstance & Threat Contingency Plan

In the event of personal circumstances beyond anyone’s control such as illness or family emergency the following plan should be put in place.

1. Determine as best as possible the impact on the project plan.

2. If the time lost will be detrimental to the project to the extent where delivery is compromised, immediately contact the Project Supervisor to advise in writing.

3. Ensure that personal circumstances forms are completed and the Project Supervisor is kept informed as to the situation.

Page 193: The effects of weather on football matches played within the German Bundesliga

8.11 Other Material Used

A CD containing all the code used in this study is attached to the front cover of this

document. Open the README.txt file for information.

