A Multivariate Analysis on the 2004 Summer Olympic Games

Wei Xiong, M.Sc. Student, Department of Mathematics and Statistics, University of Guelph

May 12-13, 2005

OUTLINE

1. Introduction

• 2004 Summer Olympic Games

• Multivariate techniques: cluster analysis, multivariate analysis of variance, multivariate regression analysis

• Literature review of analyses on Olympic Games

2. Data Analysis and Discussion

3. Conclusions

2004 Summer Olympic Games

• The largest event: 11,000 athletes from 202 countries; 929 medals won by 75 countries/regions.

Multivariate (more than one response variable) Techniques

• Cluster Analysis: observations (countries) are classified into clusters (groups) according to their similarity on several variables (number of gold, silver, bronze and total medals), by measuring the distance or dissimilarity between any two clusters.
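As a rough illustration of the "distance or dissimilarity" idea, the sketch below computes plain Euclidean distances between the medal vectors (gold, silver, bronze, total) of four countries listed in Table 1. Euclidean distance is an assumption made here for simplicity; it is not necessarily the criterion that the clustering procedure used in this study optimizes, and the helper name `medal_distance` is hypothetical.

```python
import math

# Medal vectors (gold, silver, bronze, total) copied from Table 1.
medals = {
    "USA": (35, 39, 29, 103),
    "Russia": (27, 27, 38, 92),
    "China": (32, 17, 14, 63),
    "Canada": (3, 6, 3, 12),
}

def medal_distance(a, b):
    """Euclidean distance between two countries' medal vectors (illustrative choice)."""
    return math.dist(medals[a], medals[b])

# Print every pairwise distance; a smaller value means more similar medal profiles.
names = list(medals)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a:>6} - {b:<6}: {medal_distance(a, b):7.2f}")
```

Under this simple measure the USA is closer to Russia than to China, which agrees with Table 1 placing the USA and Russia in the same cluster.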

• Multivariate Analysis of Variance (MANOVA): a generalization of ANOVA, used to compare more than two population mean vectors

Hypothesis:

$H_0: \mu_1 = \dots = \mu_t$ versus $H_a: \mu_j \neq \mu_k$ (for some $j \neq k$)

$H_0$ is rejected if $H$ = SS(Treatment) is much larger than $E$ = SS(Error), that is, if Wilks' statistic $\Lambda = |E| / |E + H|$ is close to 0.
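To make the statistic |E| / |E + H| concrete, here is a minimal sketch that builds the error matrix E (within-group sums of squares and cross-products) and the treatment matrix H (between-group) for two response variables and three groups. The data are invented toy values, not the Olympic data, and the function names are hypothetical.

```python
def mean(rows):
    """Column means of a list of 2-D observations."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]

def sscp(rows, center):
    """2x2 sums of squares and cross-products of rows about a center."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - center[0], r[1] - center[1]]
        for a in range(2):
            for c in range(2):
                s[a][c] += d[a] * d[c]
    return s

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def wilks_lambda(groups):
    """Wilks' Lambda = |E| / |E + H| for groups of 2-D observations."""
    grand = mean([r for g in groups for r in g])
    E = [[0.0, 0.0], [0.0, 0.0]]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        m = mean(g)
        within = sscp(g, m)          # spread of observations around the group mean
        between = sscp([m], grand)   # deviation of the group mean from the grand mean
        for a in range(2):
            for c in range(2):
                E[a][c] += within[a][c]
                H[a][c] += len(g) * between[a][c]
    total = [[E[a][c] + H[a][c] for c in range(2)] for a in range(2)]
    return det2(E) / det2(total)

# Toy data (hypothetical): three well-separated groups, two responses each.
groups = [
    [(30, 32), (34, 36), (32, 31)],
    [(20, 15), (22, 18), (21, 16)],
    [(1, 1), (2, 1), (1, 2)],
]
print(wilks_lambda(groups))
```

Well-separated groups give a value near 0, while groups with identical means give a value of 1, so small values are evidence against the null hypothesis.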

• Multivariate Regression

model: $Y_{(n \times p)} = X_{(n \times q)}\,\beta_{(q \times p)} + E_{(n \times p)}$, where

n: observations,

p: response variables,

q: explanatory variables

Least squares estimator of $\beta$ is:

$\hat{\beta} = (X'X)^{-1}X'Y$
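The estimator (X'X)^{-1}X'Y can be sketched in a few lines. The example below assumes q = 2 (a column of ones for an intercept plus one explanatory variable) so that a closed-form 2x2 inverse is enough; the data are made up so that Y is an exact linear function of X, and all function names are hypothetical.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("X'X is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def least_squares(X, Y):
    """Return (X'X)^-1 X'Y for an n x 2 design matrix X and an n x p response Y."""
    Xt = transpose(X)
    return matmul(inverse_2x2(matmul(Xt, X)), matmul(Xt, Y))

# Toy data (hypothetical): n = 4 observations, q = 2 columns in X, p = 2 responses.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]        # intercept column and one regressor
Y = [[1, 2], [4, 1], [7, 0], [10, -1]]      # y1 = 1 + 3x, y2 = 2 - x
beta_hat = least_squares(X, Y)
print(beta_hat)
```

Each column of the estimate holds the coefficients for one response variable, which is why the multivariate fit gives the same coefficients as fitting each response separately.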

Literature review

• Condon et al. [1] tried to predict a country's success at the Olympic Games using linear regression models and neural network models.

• Lins et al. [2] developed a Data Envelopment Analysis (DEA)-based model to rank each country based on its ability to win medals in relation to its available resources.

• Churilov and Flitman [3] improved the DEA-based model by combining different sets of input parameters with the DEA model.

This study uses multivariate techniques to analyze the 2004 Summer Olympic Games and explores the factors that influence the number of medals won.

Table 1: Rankings For Participating Countries

| Country | Gold (y1) | Silver (y2) | Bronze (y3) | Total (y4) | Ranking (by Gold) [4] | Ranking (by Cluster Analysis) |
|---|---|---|---|---|---|---|
| USA | 35 | 39 | 29 | 103 | 1 | 1 |
| China | 32 | 17 | 14 | 63 | 2 | 2 |
| Russia | 27 | 27 | 38 | 92 | 3 | 1 |
| Canada | 3 | 6 | 3 | 12 | 21 | 4 |
| Syria | 0 | 0 | 1 | 1 | 71 | 5 |
| Trinidad | 0 | 0 | 1 | 1 | 71 | 5 |

Note: the numbers of countries in clusters 1, 2, 3, 4 and 5 are 2, 3, 7, 7 and 56 respectively.

Table 2: Least Squares Means for Group Medals

| Group | y1 (Gold) | y2 (Silver) | y3 (Bronze) | y4 (Total) |
|---|---|---|---|---|
| 1 (USA, RUS) | 31.00 | 33.00 | 33.50 | 97.50 |
| 2 (CHN, AUS, GER) | 21.00 | 16.33 | 16.00 | 53.33 |
| 3 | 10.43 | 8.86 # | 11.00 | 30.29 |
| 4 | 4.86 | 7.00 # | 5.29 | 17.14 |
| 5 | 1.23 | 1.34 | 1.75 | 4.32 |

Note: # the silver means of groups 3 and 4 are close to each other
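For a one-way model with no covariates, the least squares mean of a group is simply its arithmetic mean. The short sketch below checks the first two rows of Table 2 against the medal counts listed in the appendix for the USA, Russia, China, Australia and Germany; the function name `group_means` is hypothetical.

```python
# Medal counts (gold, silver, bronze, total) for the members of groups 1 and 2,
# copied from the appendix table of medals per country/region.
groups = {
    "1 (USA, RUS)": [(35, 39, 29, 103), (27, 27, 38, 92)],
    "2 (CHN, AUS, GER)": [(32, 17, 14, 63), (17, 16, 16, 49), (14, 16, 18, 48)],
}

def group_means(rows):
    """Column-wise arithmetic means of a group's medal vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

for label, rows in groups.items():
    print(label, [f"{m:.2f}" for m in group_means(rows)])
```

The results reproduce the group 1 and group 2 rows of Table 2 to two decimal places.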

Multivariate Analysis of Variance (MANOVA): compares the medal means for the 5 groups

MANOVA Test: Hypothesis of No Overall Group Effect

| Statistic | Value | F Value | Pr > F |
|---|---|---|---|
| Wilks' Lambda | 0.02126952 | 49.34 | <.0001 |

proc glm;
  class group;
  model y1-y4 = group;
  manova h=group;
  lsmeans group / pdiff;
run;

Least Squares Means for effect group for silver (y2)

Pr > |t| for H0: LSMean(i)=LSMean(j)

| i/j | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 2 | <.0001 | | | |
| 3 | <.0001 | <.0001 | | |
| 4 | <.0001 | <.0001 | 0.0572 | |
| 5 | <.0001 | <.0001 | <.0001 | <.0001 |

Note: p-values for the other medals are < 0.0001

Why?

• Why did some countries win more medals and others fewer?

• Hypothesis: the larger the population and GDP, the more medals

Population (x1): the larger the population, the more outstanding athletes are available

GDP (Gross Domestic Product, x2): the higher the GDP, the more funding for athletes' training

Table 3: Multivariate Regression of Medals on Population (x1) [5] and GDP (x2) [6]

| x's \ y's | Number of Gold (y1): $\beta_1$ | p-value | Number of Silver (y2): $\beta_2$ | p-value | Number of Bronze (y3): $\beta_3$ | p-value | Number of Total (y4): $\beta_4$ | p-value |
|---|---|---|---|---|---|---|---|---|
| x1 (million) | 0.0116 | 0.0002 | 0.0043 | 0.1223 | 0.0031 | 0.3712 | 0.0190 | 0.0317 |
| x2 ($billion) | 0.0031 | <.0001 | 0.0033 | <.0001 | 0.0027 | <.0001 | 0.0091 | <.0001 |

proc glm;
  model y1-y4 = x1-x2 / xpx i;
run;
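The slopes in Table 3 can be read as expected changes in medal counts per unit change in each predictor. The sketch below applies them to a made-up scenario; only changes are computed because the intercepts are not reported, and the dictionary keys and the function name `expected_change` are hypothetical.

```python
# Slope estimates from Table 3: change in medals per 1 million people (x1)
# and per 1 billion dollars of GDP (x2). Intercepts are not reported there.
slopes = {
    "gold":   {"population_million": 0.0116, "gdp_billion": 0.0031},
    "silver": {"population_million": 0.0043, "gdp_billion": 0.0033},
    "bronze": {"population_million": 0.0031, "gdp_billion": 0.0027},
    "total":  {"population_million": 0.0190, "gdp_billion": 0.0091},
}

def expected_change(medal, d_population=0.0, d_gdp=0.0):
    """Expected change in a medal count for given changes in x1 and x2."""
    s = slopes[medal]
    return s["population_million"] * d_population + s["gdp_billion"] * d_gdp

# Hypothetical scenario: 10 million more people and 100 billion dollars more GDP.
for medal in slopes:
    print(medal, round(expected_change(medal, d_population=10, d_gdp=100), 3))
```

Because the total is the sum of gold, silver and bronze, the three medal slopes for each predictor add up to the slope for the total, which the values in Table 3 do.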

Conclusions

The 2004 Summer Olympic Games are analyzed using multivariate methods: Cluster Analysis, Multivariate Analysis of Variance and Multivariate Regression Analysis.

The 75 medal-winning countries/regions are classified into 5 groups based on the number of medals won. The groups differ significantly in their mean medal counts (Wilks' Lambda = 0.0213, p < .0001); the only pairwise difference that is not significant at the 5% level is silver between groups 3 and 4 (p = 0.0572).

Population and GDP are significant factors for the number of medals won. An increase of 1 million in population increases the expected number of gold medals by 0.0116 and the total by 0.0190; its effect on silver and bronze is not significant (p = 0.1223 and 0.3712). An increase of 1 billion dollars in GDP increases the expected number of gold medals by 0.0031, silver by 0.0033, bronze by 0.0027 and the total by 0.0091.

References

[1] Edward M. Condon, Bruce L. Golden and Edward A. Wasil (1999). Predicting the success of nations at the Summer Olympics using neural networks. Computers & Operations Research. 26(13), 1243-1265.

[2] Marcos P. Estellita Lins, Eliane G. Gomes, João Carlos C. B. Soares de Mello and Adelino José R. Soares de Mello (2003). Olympic ranking based on a zero sum gains DEA model. European Journal of Operational Research. 148(2), 312-322.

[3] L. Churilov and A. Flitman (2004). Towards fair ranking of Olympics achievements: the case of Sydney 2000. Computers & Operations Research. Available online 6 November 2004.

[4] http://www.athens2004.com/en/OlympicMedals/medals, accessed May 11, 2005.

[5] http://www.geohive.com/global/index.php, accessed Nov. 25, 2004.

[6] http://www.geohive.com/global/geo.php?xml=ec_gdp1&xsl=ec_gdp1, accessed May 11, 2005.

Appendix 1

Table 1. Number of medals for each country/region

Country/Region,Gold,Silver,Bronze,Total
USA,35,39,29,103
CHN,32,17,14,63
RUS,27,27,38,92
AUS,17,16,16,49
JPN,16,9,12,37
GER,14,16,18,48
FRA,11,9,13,33
ITA,10,11,11,32
KOR,9,12,9,30
GBR,9,9,12,30
CUB,9,7,11,27
UKR,9,5,9,23
HUN,8,6,3,17
ROM,8,5,6,19
GRE,6,6,4,16
NOR,5,0,1,6
NED,4,9,9,22
BRA,4,3,3,10
SWE,4,1,2,7
ESP,3,11,5,19
CAN,3,6,3,12
TUR,3,3,4,10
POL,3,2,5,10
NZL,3,2,0,5
THA Thailand,3,1,4,8
BLR Belarus,2,6,7,15
AUT Austria,2,4,1,7
ETH Ethiopia,2,3,2,7
IRI I.R. Iran,2,2,2,6
SVK Slovakia,2,2,2,6
TPE Chinese Taipei,2,2,1,5
GEO Georgia,2,2,0,4
BUL Bulgaria,2,1,9,12
JAM Jamaica,2,1,2,5
UZB Uzbekistan,2,1,2,5
MAR Morocco,2,1,0,3
DEN Denmark,2,0,6,8
ARG Argentina,2,0,4,6
CHI Chile,2,0,1,3
KAZ Kazakhstan,1,4,3,8
KEN Kenya,1,4,2,7
CZE Czech Republic,1,3,4,8
RSA South Africa,1,3,2,6
CRO Croatia,1,2,2,5
LTU Lithuania,1,2,0,3
EGY Egypt,1,1,3,5
SUI Switzerland,1,1,3,5
INA Indonesia,1,1,2,4
ZIM Zimbabwe,1,1,1,3
AZE Azerbaijan,1,0,4,5
BEL Belgium,1,0,2,3
BAH Bahamas,1,0,1,2
ISR Israel,1,0,1,2
CMR Cameroon,1,0,0,1
DOM Dominican Rep,1,0,0,1
IRL Ireland,1,0,0,1
UAE U Arab Emirates,1,0,0,1
PRK DPR Korea,0,4,1,5
LAT Latvia,0,4,0,4
MEX Mexico,0,3,1,4
POR Portugal,0,2,1,3
FIN Finland,0,2,0,2
SCG Serbia.Monteneg,0,2,0,2
SLO Slovenia,0,1,3,4
EST Estonia,0,1,2,3
HKG Hong Kong,0,1,0,1
IND India,0,1,0,1
PAR Paraguay,0,1,0,1
NGR Nigeria,0,0,2,2
VEN Venezuela,0,0,2,2
COL Colombia,0,0,1,1
ERI Eritrea,0,0,1,1
MGL Mongolia,0,0,1,1
SYR Syrian Arab Rep,0,0,1,1
TRI Trinidad.Tobago,0,0,1,1

SAS coding-1

data Anthemn2004SummerOlympic;
  input Country $ y1-y4;
  cards;
see Table 1 for data
;
proc cluster method=eml standard rmsstd rsquare outtree=tree;
  var y1-y4;
  id country;
run;
proc tree data=tree noprint n=5 out=countryout;
  id country;
run;
proc tree data=tree n=5;
  id country;
run;
proc sort;
  by country;
proc sort data=Anthemn2004SummerOlympic out=new;
  by country;
data temp;
  merge new countryout;
  by country;
proc sort;
  by cluster;
proc print;
  id country;
proc factor heywood rotate=varimax, quartimax;
  var y1-y4;
  by cluster;
proc princomp;
  var y1-y4;
run;
proc factor heywood rotate=varimax, quartimax;
  var y1-y4;
run;

SAS coding-2

data Anthemn2004SummerOlympic;
  input group y1-y4 x1-x2;
  cards;
5 35 39 29 103 273 10882
5 27 27 38 92 146 433
4 32 17 14 63 1247 1410
4 17 16 16 49 19 518
4 14 16 18 48 82 2401
;
proc glm;
  class group;
  model y1-y4 = group;
  manova h=group / printe printh;
  lsmeans group / pdiff;
run;

SAS coding

data Anthemn2004SummerOlympic;
  input group y1-y4 x1-x2;
  cards;
……………..
;
proc corr;
  var y1-y4 x1-x2;
run;
proc glm;
  model y1-y4 = x1-x2 / xpx i;
  MANOVA H=x1 x2 / printe printh;
run;

[Figure: dendrogram, "Cluster analysis: Countries Classified into 5 Groups". Axes: Log Likelihood (ticks from 2356 down to -644) and Country; the groups are labeled 5, 4, 3, 2, 1 and CAN (Canada) is marked.]

Table 2: Factor Analysis on Medals

| Group | 1 | 2 | 3 | 3 | 4 | 4 | 5 | 5 |
|---|---|---|---|---|---|---|---|---|
| Latent factor (%) | 1 (94.71) * | 1 (95.99) | 1 (61.35) | 2 (86.50) | 1 (52.10) | 2 (83.89) | 1 (58.04) | 2 (83.09) |
| Gold (y1) | 0.9634 # | 0.9997 | 0.8783 | -0.0055 | -0.0378 | 0.9844 | 0.7654 | 0.0052 |
| Silver (y2) | 0.9694 | 0.9839 | 0.1470 | 0.9873 | 0.6388 | -0.5449 | 0.1409 | 0.9893 |
| Bronze (y3) | 0.9595 | -0.9414 | 0.8314 | -0.0629 | 0.8151 | -0.1716 | 0.8546 | -0.1100 |
| Total (y4) | 0.9999 | 0.9928 | 0.8682 | 0.4932 | 0.9551 | 0.2727 | 0.9186 | 0.3908 |

Note: * cumulative eigenvalues: percentage of the total variation in the four variables (medals) that is explained. # Factor loading: correlation between the latent factor and the variable (Factor Analysis, rotation = quartimax, which makes each latent factor either strongly or weakly correlated with the variables).

Correlation Between y's and x's: x1 (Population #) [5], x2 (GDP, Gross Domestic Product #) [6]

Pearson Correlation Coefficients, with Prob > |r| under H0: Rho=0

| | y1 | y2 | y3 | y4 |
|---|---|---|---|---|
| x1: r | 0.46543 | 0.3038 | 0.23199 | 0.34887 |
| x1: p-value | <.0001 | 0.0081 | 0.0452 | 0.0022 |
| x2: r | 0.70219 | 0.76180 | 0.60769 | 0.71640 |
| x2: p-value | <.0001 | <.0001 | <.0001 | <.0001 |

Note: weak to moderate correlation between the y's and x1, strong correlation between the y's and x2.

# Both population and GDP are 2003 figures
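The p-values for a Pearson correlation come from a t statistic with n - 2 degrees of freedom. The sketch below computes that statistic for the population correlations, assuming n = 75 observations (one per medal-winning country); the sample size is an assumption here, since it is not stated next to the correlations, and the function name is hypothetical.

```python
import math

def t_statistic(r, n):
    """t statistic for testing H0: rho = 0, given a sample Pearson correlation r."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

N_COUNTRIES = 75  # assumption: one observation per medal-winning country

# Correlations between population (x1) and each medal count, as reported above.
for name, r in [("y1", 0.46543), ("y2", 0.3038), ("y3", 0.23199), ("y4", 0.34887)]:
    print(name, round(t_statistic(r, N_COUNTRIES), 2))
```

A larger absolute t value corresponds to a smaller p-value, which is why the weakest correlation (x1 with y3) has the largest reported p-value.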