Cripe 1

Predicting Outcomes of

NCAA Basketball Tournament Games

Aaron Cripe

March 12, 2008

Math Senior Project

Introduction

All my life I have really enjoyed watching college basketball. When I was younger my

favorite month of the year was March, solely because of the fact that the NCAA Division I

Basketball Tournament took place in March. I would look forward to filling out a bracket every

year to see how well I could predict the winning teams. I quickly realized that I was not an

expert on predicting the results of the tournament games. Then I started wondering if anyone

was really an expert in terms of predicting the results of the tournament games. For my project I

decided to find out, or at least compare some of the top rating systems and see which one is most

accurate in predicting the winner of each game in the Men’s Basketball Division I

Tournament. For my project I compared four rating systems (Massey, Pomeroy, Sagarin, RPI)

and the actual tournament seedings, five systems in all. I compared these systems by looking at the pre-tournament

ratings and the tournament results for 2004 through 2007. The goal of my project was to

determine which, if any, of these systems is the best predictor of the winning team in the

tournament games.

Project Goals

Each system that I compared gave a rating to every team in the tournament. For my

project I looked at each game and then compared the two teams’ ratings. In most cases the two

teams had different ratings; however, there were a few games where the two teams had the

same rating, which I will address later. When the two teams had different ratings, this created a

favored team (the team that should win according to the rating system) and a non-favored

team or underdog (the team that should lose the game according to the rating system).

Differing rating systems might have a different favored team for each game. Of course there are

going to be some games when every rating system has the same favored team, but there will also

be games when the systems could be split in terms of who the favored team is. For my project

we computed the relative strength of the favored team versus the underdog. If a rating system has

any predictive power then the higher the favorites’ relative strength, the more likely they are to

win. The relative strength of the two teams can be determined two different ways, ratio and

difference. Relative strength ratio can be computed by taking the favored team’s rating and

dividing it by the non-favored team’s rating. The relative strength difference can be computed

by taking the favored team’s rating and subtracting the non-favored team’s rating. When the

favored team has a much higher rating than the non-favored team, the relative strength will be

large. This occurs in the tournament when a one seed plays a sixteen seed, two seed plays a

fifteen seed, three seed plays a fourteen seed, or a four seed plays a thirteen seed. However,

when the two teams have a similar rating the relative strength of the favored team is much

smaller. This occurs in the tournament when an eight seed plays a nine seed, seven seed plays a

ten seed, or a six seed plays an eleven seed. A lower relative strength means that the favored

team is less likely to win than a favored team with a large relative strength. Another property of

a useful rating system is that the favored team should win at least 50% of the time (because if

you just pick one team at random there is still a 50% chance that team will win), even if the relative

strength is extremely small. If a rating system has an accuracy rate of 50% or less, then it has no

ability to predict which team will win each game. We are hoping to find that every

system has a much higher accuracy rate than 50%, because that would tell us that there are

valid methods for predicting the winner of each game. We also wish to determine if certain

methods are better at predicting winners. Then we will develop models for each method to

estimate the probability a favored team wins, given their relative strength versus their opponent.
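The two relative strength measures described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used in the project: the function name, the return format, and the example ratings are hypothetical, and the ratio is assumed to be meaningful only when both ratings are positive.

```python
from typing import Optional, Tuple


def relative_strength(rating_a: float, rating_b: float) -> Optional[Tuple[str, float, float]]:
    """Return (favored team label, ratio, difference) for one game.

    Returns None when the two ratings are equal, because then neither
    team is the favorite. The ratio assumes both ratings are positive.
    """
    if rating_a == rating_b:
        return None  # tie: no favored team can be named
    favored = max(rating_a, rating_b)
    underdog = min(rating_a, rating_b)
    label = "a" if rating_a > rating_b else "b"
    ratio = favored / underdog        # relative strength by ratio
    difference = favored - underdog   # relative strength by difference
    return label, ratio, difference


# Hypothetical ratings: team "b" is favored here.
print(relative_strength(75.0, 90.0))
print(relative_strength(80.0, 80.0))  # None, since the ratings tie
```

Returning None for equal ratings keeps tied games from being counted for either team.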

NCAA Tournament Selection Process

Before going into further detail about each rating method that we studied it is important

to understand the procedure for determining which teams make the NCAA tournament. There is

a ten-member committee chosen by the NCAA made up of athletic directors and conference

commissioners. However, an athletic director is not allowed to discuss his or her own team unless

another member directs a question to them. Likewise, a conference commissioner is not allowed to

discuss the teams in his or her conference, but may update the committee about each team.

Updates include any injuries, suspension, or anything else that may be relevant. The committee

is solely responsible for choosing which teams make the tournament. The actual process is very

detailed and extremely tedious. Thirty teams receive an “automatic bid” to the

tournament. The only way to receive one of these bids is to win your conference tournament.

The Ivy League, however, does not have a conference tournament, so the team that wins its

conference regular season receives the 31st “automatic bid.” The committee is then in charge of determining the

rest of the teams that make up the tournament, seeding the teams, and creating the brackets. The

other teams that make the tournament receive one of the 34 at-large bids. They did not win their

conference tournament; however, they are picked to be in the tournament by the committee. Most

of these at-large bids come from the teams in the most prestigious conferences. It is less

common for a small school to receive an at-large bid. The committee has access to any rating

system of their choice when deciding which teams make the tournament. Also, the national polls

and coaches’ poll play a large role in deciding which teams make the tournament. The teams

that are in the top twenty-five will almost always make the tournament. When deciding which

teams receive the at-large bids the committee is supposed to take into account how good that

team is at the start of the tournament to decide which teams should make it and which teams

should not. The committee is allowed to take into account any injuries or suspensions that any

given team may have at the start of the tournament. There are four (possibly more) aspects that

the committee cannot use to determine which teams get the at-large bids. Those four are the

geography of the schools, the coach of the school, the prestige of the school, and the style of

play of the team. The committee is allowed to use anything else when determining which teams

make it. The committee is also in charge of giving each team a seed. The tournament is a 65

team tournament and two teams have to play in what is known as the “play in game” to decide

which team plays in the 64 team tournament. The 64-team set up consists of four regional

brackets where each section consists of 16 teams seeded from 1 to 16. The team with the one

seed is the “best” team in that region and the team with the sixteen seed is the “worst” team in

that region according to the committee. As in other single-elimination tournaments, the top seed

plays the lowest seed, the next best seed plays the second-to-last seed, and so on. Each region

has one team go through without a loss and those winning teams make up what is known as the

Final Four. From there it is played like any four-team single-elimination tournament, until one

team is deemed the National Champion. Before explaining the actual seeding process, we now

discuss the rating systems to be investigated (NCAA 1).

Rating Systems

RPI

The Ratings Percentage Index, or RPI as it is more commonly known, is one of the most

widely known rating systems. It is rumored to be the most influential system in

the committee’s decisions about which teams make the tournament. This system has been in

use since 1981. RPI consists of three main parts: 1) the team’s winning percentage (25%), 2) the

team’s opponents’ winning percentage (50%), and 3) the winning percentage of those opponents’

opponents (25%). The third part represents the schedule strength of the teams that your team

plays. The RPI rating is determined as a weighted average of these three:

RPI = (0.25 × (1)) + (0.50 × (2)) + (0.25 × (3))

The major criticism of RPI is that it places too much emphasis on strength of schedule

and unfairly favors teams from major conferences. The teams in major conferences are

required to play the other schools in their conference, which are usually of high caliber. The smaller

schools are required to play the schools in their conference, which are usually also small schools.

Since RPI is weighed so heavily by the selection committee, the smaller schools try to get

the bigger schools to play them during the non-conference portion of the season. Small schools

will even go as far as paying the bigger schools to play them. It is a big risk for a

big school to play a talented smaller school because a loss will adversely affect their RPI while a

win provides little benefit. Thus this is a weakness of the RPI system because not every team

has complete control of their schedule (Pomeroy(2) 1).

Since 2004 the RPI has added one more aspect to its ratings:

home court advantage is now taken into consideration. For example, a home win now counts

as 0.6 wins, while a road win counts as 1.4 wins. Conversely, a home loss equals 1.4 losses,

while a road loss counts as 0.6 losses. This change is one of the main reasons we only look at

the tournaments as far back as 2004 (Pomeroy(2) 1).
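As a rough illustration of the RPI arithmetic described above, the following Python sketch combines the three components with the 25/50/25 weights and applies the 0.6/1.4 home and road weights to a team's own record. The function names and the sample record are hypothetical, and two points are assumptions rather than statements from the sources cited here: that the home and road weights apply only to the team's own winning percentage, and that neutral-site games count as 1.0.

```python
def weighted_win_pct(home_w: int, home_l: int, road_w: int, road_l: int,
                     neutral_w: int = 0, neutral_l: int = 0) -> float:
    """Winning percentage using the post-2004 home/road weights.

    Assumption: neutral-site results count as 1.0 each.
    """
    wins = 0.6 * home_w + 1.4 * road_w + 1.0 * neutral_w
    losses = 1.4 * home_l + 0.6 * road_l + 1.0 * neutral_l
    return wins / (wins + losses)


def rpi(win_pct: float, opp_win_pct: float, opp_opp_win_pct: float) -> float:
    """Weighted average of the three RPI components (25%, 50%, 25%)."""
    return 0.25 * win_pct + 0.50 * opp_win_pct + 0.25 * opp_opp_win_pct


# Hypothetical team: 10-2 at home, 6-4 on the road.
own = weighted_win_pct(home_w=10, home_l=2, road_w=6, road_l=4)
print(round(own, 4))
print(round(rpi(own, opp_win_pct=0.55, opp_opp_win_pct=0.52), 4))
```

The sketch makes the criticism in the text easy to see: half of the final number comes from the opponents' records, which a team in a weak conference cannot fully control.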

Pomeroy

The Pomeroy rating system was created by Ken Pomeroy. He states that his system is

designed to be purely a predictor. He claims that he does not rate the teams on how successful

their season has been, but instead by how strong he feels that team is if they played tonight,

independent of injuries and emotional factors. Thus as you can imagine there are a lot of aspects

about his system that he does not like to reveal to the public. However I will try and describe his

system as I best understand it. Pomeroy looks at a game that the team won and then looks to see

how much they won by. Similarly, for a loss he looks to see how much they lost by. There is

more emphasis placed on a 20-point win than say a 5-point win. This system gives a higher

rating to a team that lost a lot of close games against strong opposition than one that wins a lot of

close games against weak opposition. Thus there could be a team in a major conference with a

losing record that has a higher rating than a team in a small conference with a winning record.

Pomeroy’s system also takes home court advantage into consideration. To sum up, Pomeroy’s

system takes into account margin of victory, strength of opponent, schedule strength,

offensive efficiency, defensive efficiency, and how good your opponents’ offense and defense are.

His system is very mathematical and very complex. Ken Pomeroy is one of the few who truly

understands how his system works. One of the weaknesses of his system is that he keeps making

changes to it from year to year. He has not yet had a chance to evaluate his system

over a long period of time because he makes small changes each year, always trying to

make his rating system better. This could also be considered a strength of his system, because each

year he is trying to make it a stronger predictor (Pomeroy(1) 1).

Sagarin

Sagarin’s system was created by American sports statistician Jeff Sagarin. His rating system

has been used by the selection committee since 1984 to help determine which teams should or

should not make the tournament and, if they do, what seed they should be. It is also a widely

known fact that Sagarin does not like to reveal his exact methodology behind his system.

Sagarin’s system is made up of two different systems, which he then combines to form his actual

rating system for college basketball. The first system, which he calls “ELO Chess” (based on the

Elo rating system used internationally to rank chess players), only takes into

account wins and losses and makes no reference to margin of victory. The other system, which he calls

“Predictor,” takes victory margin into account. This system is meant to predict the margin of

victory for the stronger team at a neutral site from the difference in the two teams’ rating scores. The

rating that I used is a combination of the two; however, I was unable to find how he combines

them. I know that he does not simply take the average of the two. I would assume that he

weights one of the systems higher than the other. Also Sagarin’s system favors teams that lose

close games against stronger opponents versus teams that win close games against weak

opponents. For both systems, teams gain higher ratings within the Sagarin system by winning

games against stronger opponents, factoring in home-court advantage. Thus, in that aspect this

system is similar to Pomeroy’s. Even though Sagarin’s system is very secretive and not a lot is

known about it, it is a very widely used and respected system (Schatz 1).

Massey

The fourth rating system that we investigated is the Massey Rating System. This system

is a computer rating system created by Kenneth Massey. Like the authors of the other rating systems,

Massey does not like to divulge his exact methods to the public. However, we do know that

Massey inputs the score, venue, and date of each game to come up with his ratings. Massey does

not use any statistics such as rebounds or steals to come up with his ratings. He goes on to say

that it is impossible to include all aspects of the game such as injuries, suspension, or how the

ball bounces on any given night. However, Massey feels that the score of the game will usually

reflect those various components. He believes that, in a sense, he includes them when

finding his ratings because those outside factors affect the score of the game. Massey

states that his rating system is designed to measure the teams’ past performance, and it is not

an accurate predictor of future outcomes. Massey’s rating system uses a rating scale where the

highest team is around a 2.5 and the lowest teams are usually in the negatives. One difference

that Massey’s system has compared to the others is that he includes all the divisions of basketball

and not just Division I (Massey 1).

Table summarizing the different rating systems:

            Margin of   Home    Strength of   Strength of Opponents’   Record              Range of
            Victory     Court   Schedule      Schedule                 (wins and losses)   Ratings
RPI         No          Yes     Yes           Yes                      Yes                 0.0 to 1.0
Pomeroy     Yes         Yes     Yes           No                       Yes                 0.0 to 100
Sagarin     Yes         Yes     Yes           No                       Yes                 0 to 100
Massey      Yes         Yes     No            No                       Yes                 -4.0 to 4.0

Figure 1

Tournament Seeds

The selection committee’s seeding of each team can also be considered a type of rating

system. The selection procedure is extremely complex and not completely revealed to the

public. However, the selection committee first decides which 65 teams make the tournament

before they even think about seeds. Then they have to, in a sense, rank the teams from 1 to 65;

this process is the most mysterious part of the seeding selection. After the committee has the teams

ranked in order, the process is quite simple and systematic. The teams that are ranked 1

through 4 will all receive 1 seeds, and teams 5 through 8 will all receive 2 seeds, and so on until

there are four teams of each seed from 1 to 16. Then they have to determine what teams go into

what regional bracket, as discussed earlier. They take the number 1 overall seed (the team with

the 1 seed when seeded 1 through 65) and put that team into a regional bracket. Then they place

the other number 1 seeds in different regions. Thus if the Final Four occurred as the seedings

predict the top number 1 seed would play the fourth number 1 seed, while the second number 1

seed would play the third number 1 seed, in the Final Four. Then they divide the four number 2

seeds into the regions in a similar way. They take the fourth number 2 seed and put them with

the top number 1 seed, and the top number 2 seed with the fourth number 1 seed. They do this

accordingly for the rest of the seeds, but each time alternating which region to ensure that every

region is equal in difficulty according to the selection committee. While the selection committee

is known for doing a good job when assigning seeds to each team, it is inevitable that there will

still be upsets in every tournament. However, in all the years of the NCAA tournament a number

1 seed has never lost to a number 16 seed. A few number 2 seeds, however, have

lost to a number 15 seed.

Goals, Problems, and Assumptions

We analyzed all five of these rating systems to determine which one is the most accurate

predictor of the winner of the tournament games. The rating systems sometimes make

changes to keep themselves up to date. This led us to use data only as far back as 2004,

because in 2004 the RPI rating system made some changes, taking home court advantage into

account. We wanted to make sure that each rating system was consistent from year to year. Also,

when we were looking at the Massey rating system we noticed that in 2005 the range of the

ratings was different from the range of the other years. For example, the top team in 2005 had a

rating of 6.5, while the top team in the other three years had a rating around 2.5. However, the

lowest teams’ ratings were very similar across all the years, which caused a much larger

range in 2005. This change in the range is a problem for our data because we are using the

ratio and difference of the ratings. The ratios and differences would be drastically different in

2005. Thus we decided to re-scale the 2005 Massey ratings to align them

with the rest of the years. After consulting with Kenneth Massey we were able to re-scale

the data so it was comparable to the other years. A similar issue occurred with Pomeroy’s rating

system. In 2007 he used a different method, which resulted in different ratings than in his

previous three years of data. This time we were unable to manually adjust the data to correspond

with the other three years, so our only option was to omit the 2007 Pomeroy data. Now instead

of 252 games of data for Pomeroy, we used 189 games. Also there was the opportunity for a

team to have the same rating as its opponent. This is very possible for the tournament seeds,

since there are four regional brackets with essentially the same seeds. However the only time

this could occur would be during the Final Four, when each regional bracket has one team

emerge and play the winners of the other regional brackets. When looking at all five rating

systems over four years of tournament games we ran into three ties. Two out of the three ties

occurred with the seeds and the other was in 2005 with the RPI rating system. We decided to

simply not use those games in our study since neither team is considered to be the favorite. We

omitted the data of that one specific game only for that specific rating system.

Rating Systems’ Predicting Ability

Rating System    Success rate for predicting winner
Seeds            73.2%
Pomeroy          73.0%
Massey           72.6%
Sagarin          72.2%
RPI              70.9%

Figure 2

The table in Figure 2 provides the percentage of time that the favored team wins

according to each rating system. It is calculated by taking the total number of games predicted

correctly divided by the total number of games. If the rating systems cannot predict a winner

with an accuracy higher than 50% that means that the system has no merit, because you can just

simply flip a coin and have the same accuracy. The actual seeds were the most

accurate in terms of just predicting which team would win, at 73.2% of the time. However,

Figure 2 shows that the five rating systems are all within 3% of one another,

which leads me to believe that there is not a lot of difference between the systems in terms of purely

predicting the winner. We can use the hypothesis test for one sample proportion to determine

whether or not a rating system can predict the winner with accuracy higher than 50%. We let

our null hypothesis be that the proportion predicted correctly equals .50. Our alternative

hypothesis is that the proportion is greater than .50 (i.e. the rating system is effective at

predicting winners). According to SPSS 15, RPI had the lowest percentage of predicting the

winner at 70.9%. Thus for the hypothesis test we used RPI, because if it passes the test it is

clear that the other systems will too. Comparing the sample

proportion of .709 to .50, we obtain a z-score of 6.62. Since z is greater than 3 we know the

corresponding P-value is below .05. From this we can conclude that RPI is effective at

predicting at a higher accuracy than 50%. As a result we can conclude that all of the other rating

systems also are effective at predicting the winner.
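The one-sample test above can be checked with a short Python sketch. It is an illustration only and does not reproduce the SPSS procedure. The function names are hypothetical, and the sample size of 251 is an assumption: 252 tournament games minus the single RPI tie that was left out of the study.

```python
import math


def one_sample_proportion_z(p_hat: float, p0: float, n: int) -> float:
    """z statistic for H0: p = p0, using the standard error under H0."""
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)
    return (p_hat - p0) / standard_error


def upper_tail_p_value(z: float) -> float:
    """P(Z > z) for a standard normal Z, for the one-sided alternative p > p0."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Assumed sample size: 252 games minus the one tied RPI game.
z = one_sample_proportion_z(p_hat=0.709, p0=0.50, n=251)
print(round(z, 2))                   # about 6.62, as reported in the text
print(upper_tail_p_value(z) < 0.05)  # True
```

With this assumed sample size the sketch reproduces the reported z-score, and the p-value is far below .05.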

Another question that needs to be addressed is whether one system is “better” than any

other system at predicting the winner. For this we can use a hypothesis test comparing two

sample proportions. We will use Seeds (because according to Figure 2 it is the highest) as sample

1 and RPI (because according to Figure 2 it is the lowest) as sample 2. If the hypothesis test

shows no significant difference between these two systems then we have no evidence of a

difference between any of the systems. For our null hypothesis we assume the true proportions p1 and p2 are

equal to each other. For our alternative hypothesis we will say that

they are not equal. If we calculate a p-value that exceeds .05 then we fail to reject the null

hypothesis. Comparing .732 and .709 gives a z-score of z = .575,

resulting in a p-value of .562, which is greater than .05. Thus, there is no evidence to conclude

that any of the rating systems are different in terms of the likelihood of predicting the winner.
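The two-sample comparison can be sketched in Python as a pooled two-proportion z test. This is a hedged illustration, not the calculation actually run for the project. The function names are hypothetical, and the sample sizes are assumptions based on the ties described earlier: 250 games for Seeds (252 minus two ties) and 251 for RPI (252 minus one tie).

```python
import math


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Pooled z statistic for H0: p1 = p2."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / standard_error


def two_sided_p_value(z: float) -> float:
    """P(|Z| > |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Assumed sample sizes: Seeds 250 games, RPI 251 games.
z = two_proportion_z(p1=0.732, n1=250, p2=0.709, n2=251)
p_value = two_sided_p_value(z)
print(round(z, 3), round(p_value, 3))
print(p_value > 0.05)  # True: we fail to reject the null hypothesis
```

Under these assumptions the z-score and p-value land close to the reported .575 and .562; small differences are expected from rounding and from the assumed sample sizes.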

Predicting Game Outcomes using Relative Strength

The overall goal of my project is to use each rating system to develop a model for the

probability that the favored team wins; furthermore, we will identify which model makes the

most accurate predictions of game outcomes. In order for us to find the probability of winning

based on each rating system we used a logistic regression model. Logistic regression models the

natural log of the odds as a linear function of some independent variable x:

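1) $\ln\left(\dfrac{p}{1-p}\right) = \beta_0 + \beta_1 x$

The odds of winning are defined as $\dfrac{p}{1-p}$, where p is the probability the favored team wins. We will

use relative strength as our predictor variable x. The parameters of the logistic model are β0 and

β1. Solving equation (1) for p, we obtain:

2) $p(x) = \dfrac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$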

Here p(x) is the probability that a team with relative strength x (compared to their opponent) will

win the game. The parameters of the model will be found using SPSS 15. This model will form

what is known as a logistic curve.

[Figure 3: a typical logistic curve, with relative strength x on the horizontal axis and p(x) on the vertical axis.]

Figure 3 is an example of the typical shape of a logistic curve. It shows that as x

(that is, relative strength) increases, the probability of winning p(x) approaches 1. As the relative strength

decreases the probability of winning approaches 0, and as relative strength approaches 0

(meaning no difference in the rating of the two teams) p(x) logically approaches .5 or 50%. For

us the only part of Figure 3 that is relevant is the positive x values, which represent the relative

strength of the favored team. We will be measuring relative strength in two different ways: by

difference and by ratio. The relative strength by difference is found by subtracting the non-favored

team’s rating from the favored team’s rating. Two teams that have the same rating would each

have a relative strength of zero in terms of difference, as shown in Figure 3. However, for our

investigation the relative strength x will be positive. The relative strength by ratio is found by

dividing the favored team’s rating by the non-favored team’s rating. Thus, two teams with equal

ratings will have a relative strength of one, rather than the zero obtained when using differences. When two

teams have the same rating it is impossible to label one of them the favored team. Thus, there is no

way of predicting the winner better than by simply flipping a coin. Each team has the same

chance of winning the game. That is why it was important for us, in the project, to make sure

that the modeled logistic curve goes through the point that represents a 50% chance of winning

when two teams have the same rating. In order to ensure this happens we can force β0 to be

zero, which is permitted in SPSS. The logistic model then simplifies to:

3) $p(x) = \dfrac{e^{\beta_1 x}}{1 + e^{\beta_1 x}}$

SPSS will construct the function p(x) for each rating system using both the relative strength with

ratio and with difference. We hope the data will show that the higher the relative strength the

more likely the favored team is to win. Furthermore the logistic regression model will help me

determine which rating system is the most accurate predictor of the probability of winning for

the favored team (McCabe, Moore Chapter 16).
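To make the no-intercept logistic model concrete, here is a minimal Python sketch that evaluates p(x) with β0 fixed at zero and estimates β1 by Newton's method on the log-likelihood. It is not the SPSS procedure used in the project, only an illustration under stated assumptions: the function names are hypothetical, and the small data set is made up for demonstration and is not tournament data.

```python
import math
from typing import Sequence


def win_probability(x: float, beta1: float) -> float:
    """p(x) = e^(beta1*x) / (1 + e^(beta1*x)), the model with beta0 = 0."""
    return 1.0 / (1.0 + math.exp(-beta1 * x))


def fit_beta1(xs: Sequence[float], ys: Sequence[int], iterations: int = 25) -> float:
    """Maximum-likelihood estimate of beta1 via Newton's method.

    xs are relative strengths, ys are 1 if the favored team won, else 0.
    Assumes the data contain at least one loss, so the estimate is finite.
    """
    beta1 = 0.0
    for _ in range(iterations):
        probs = [win_probability(x, beta1) for x in xs]
        gradient = sum((y - p) * x for x, y, p in zip(xs, ys, probs))
        information = sum(p * (1.0 - p) * x * x for x, p in zip(xs, probs))
        if information == 0.0:
            break
        beta1 += gradient / information
    return beta1


# Made-up example data (not from the tournament).
strengths = [1, 1, 2, 2, 3, 3, 4, 4]
favored_won = [1, 0, 1, 0, 1, 1, 1, 1]
beta1_hat = fit_beta1(strengths, favored_won)
print(beta1_hat > 0)                    # True: larger x means a likelier win
print(win_probability(0.0, beta1_hat))  # 0.5 when the ratings are equal
```

Because β0 is fixed at zero, the curve passes through 50% at x = 0 no matter what value of β1 is estimated.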

After analyzing the data from SPSS we were able to construct a table of the numbers that

are particularly relevant to the project (see appendix 1). The table consists of β1, the P-value,

the Cox and Snell R², and the percentage of games that the favored team won.

                  β1       Sig. (P-Value)   Cox and Snell R²   % of games fav. team won
Sagarin RS_Df     .206     .000             .312               72.2
Sagarin RS_Rt     .930     .000             .202               72.2
RPI RS_Df         24.068   .000             .271               70.9
RPI RS_Rt         .864     .000             .181               70.9
Pomeroy RS_Df     .219     .000             .297               73.0
Pomeroy RS_Rt     .953     .000             .217               73.0
Massey RS_Df      3.078    .000             .278               72.6
Massey RS_Rt      .867     .000             .231               72.6
Seed RS_Df        .200     .000             .301               73.2
Seed RS_Rt        .472     .000             .293               73.2

Figure 4

The first concern we had was whether any of these rating systems provides a statistically significant

predictor of the winner; that is, whether the relative strength is useful when estimating the probability

the favored team wins. In order to tell if the rating system is statistically significant we look at

the associated p-value column. If the p-value is less than 0.05 then there is statistical evidence of

a relationship between p(x) and relative strength. As you can see from the table, all five rating

systems provide significant results for relative strength using both differences (RS_Df) and ratios

(RS_Rt). The Cox and Snell R² shows how accurate the relative strength is in predicting p(x).

If relative strength were 100% accurate in predicting the probability of the winner then the R²

rating would be 1. The Cox and Snell R² is extremely important to the project because that

number shows how useful relative strength is, which is exactly the goal of the project. Thus, as

you can see from the table, Sagarin difference is the most accurate when analyzing the relative

strengths, with the highest R² of .312. Roughly speaking, R² = .312 indicates that 31.2% of the variation

in the favored team’s chance of winning can be explained by knowing their relative strength

according to Sagarin’s system.

Based on Figure 4 we see that Sagarin_Df (R² = .312), Seeds_Df (R² = .301), and

Pomeroy_Df (R² = .297) provide the most accurate predictions of the probability of winning.

Based on the 4 years of tournament data that we examined, the best three models are:

Sagarin_Df: $p(x) = \dfrac{e^{.206x}}{1 + e^{.206x}}$

Seeds_Df: $p(x) = \dfrac{e^{.200x}}{1 + e^{.200x}}$

Pomeroy_Df: $p(x) = \dfrac{e^{.219x}}{1 + e^{.219x}}$

[Figure: Pomeroy RS Df. Fitted probability of winning (ProbWin) and observed outcomes (FavWin, 0 or 1) plotted against RS_Diff, with RS_Diff from 0 to 30.]

[Figure: Sagarin RS Df. Fitted probability of winning (ProbWin) and observed outcomes (FavWin, 0 or 1) plotted against RS_Diff, with RS_Diff from 0 to 30.]

[Figure: Seeds RS Df. Fitted probability of winning (ProbWin) and observed outcomes (FavWin, 0 or 1) plotted against RS_Diff, with RS_Diff from 0 to 14.]

The three graphs above represent the three most accurate predictors of the winning

teams. Each graph shows two sets of points. The first set lies on either zero or one,

where one means the favored team won and zero means the favored team lost. The other,

curved set is the data plotted on the logistic model. As you can see from the

three graphs above, as relative strength increases so does a team’s chance of winning. For

example, consider the 2004 Elite Eight game between Alabama and Connecticut. Connecticut was a 2 seed in the

tournament and Alabama was an 8 seed. In the project we reversed the seed numbers so the favored

team would have the higher rating, giving Connecticut a rating of 15 and Alabama a rating of 9. Thus

the relative strength using Seed_Df would be 6 for Connecticut. If you look at the graph of

Seed_Df for x = 6, you can see that the probability of winning for Connecticut is just below

80%. That means that Connecticut has just below an 80% chance of winning the game

according to the Seeds_Df ratings. It turns out that Connecticut did, indeed, win the game and

ended up winning the National Championship in 2004.
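The Connecticut and Alabama example can be reproduced numerically with a short Python sketch. It uses the Seed RS_Df coefficient of .200 from Figure 4 and the no-intercept logistic model. The function names are hypothetical, and the conversion rating = 17 − seed is an assumption inferred from the example (a 2 seed becomes 15 and an 8 seed becomes 9).

```python
import math

SEEDS_DF_BETA1 = 0.200  # Seed RS_Df coefficient reported in Figure 4


def seed_to_rating(seed: int) -> int:
    """Reverse the seed so that better teams get higher ratings.

    Assumption inferred from the example: rating = 17 - seed.
    """
    return 17 - seed


def favored_win_probability(seed_a: int, seed_b: int,
                            beta1: float = SEEDS_DF_BETA1) -> float:
    """Probability that the better-seeded team wins, using p(x) with beta0 = 0."""
    difference = abs(seed_to_rating(seed_a) - seed_to_rating(seed_b))
    return 1.0 / (1.0 + math.exp(-beta1 * difference))


# 2004 Elite Eight: Connecticut (2 seed) vs. Alabama (8 seed).
print(round(favored_win_probability(2, 8), 3))  # about 0.769, just below 80%
```

The computed value agrees with reading "just below 80%" off the Seeds RS Df graph at x = 6.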

In these models $e^{\beta_1}$ represents the factor increase in the odds of winning when relative

strength increases by 1 unit. This can be illustrated using a three-team tournament where team A

has a rating of 4, team B a rating of 3, and team C a rating of 2. If team A plays team B (say in

Game #1) then team A will have a relative strength Df of 1. If we plug that into

formula 3 and use the Sagarin_Df model (β1 = .206), we obtain p(1) = .5513. Then if we plug

our p(1) into our odds formula we obtain 1.23, which means that a team with a relative strength

Df of 1, according to Sagarin, has 1.23:1 odds of winning the game. Then if team A plays team C

(say Game #2) then team A will have a relative strength Df of 2. Using formula 3 and Sagarin

Df (β1 = .206) we obtain p(2) = .6016. Plugging that into the same odds formula we obtain

odds of 1.51:1. Now we can calculate what happens to a team’s odds of winning when their

relative strength Df increases by 1. By dividing the higher odds by the lower odds we obtain

1.228, which means that when a team’s relative strength Df increases by one their odds of

winning increase by a factor of 1.228. To see why, we should look at the original formula:

$\ln\left(\dfrac{p}{1-p}\right) = \beta_1 x$, where relative strength = x

Then by exponentiating both sides we obtain:

$\dfrac{p}{1-p} = e^{\beta_1 x}$, where relative strength = x

Thus when we add one to the relative strength we obtain:

$\dfrac{p_2}{1-p_2} = e^{\beta_1 (x+1)}$, where $\dfrac{p_2}{1-p_2}$ represents the odds Team A wins Game #2

$\dfrac{p_2}{1-p_2} = e^{\beta_1 x} \cdot e^{\beta_1} = \dfrac{p_1}{1-p_1} \cdot e^{\beta_1}$, where $\dfrac{p_1}{1-p_1}$ is the odds of Team A winning Game #1

Now you can see that adding one to the relative strength results in the odds of winning

being multiplied by $e^{\beta_1}$. If we calculate $e^{.206}$ (.206 is β1 for Sagarin_Df) we obtain approximately 1.229,

which agrees, up to rounding, with the 1.228 found in our example when the relative strength Df increases by 1.
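The three-team example can be verified with a few lines of Python. This is only a numerical check of the arithmetic above, using the Sagarin RS_Df coefficient of .206 from Figure 4; the function and variable names are hypothetical.

```python
import math

BETA1 = 0.206  # Sagarin RS_Df coefficient from Figure 4


def win_probability(x: float, beta1: float = BETA1) -> float:
    """p(x) = e^(beta1*x) / (1 + e^(beta1*x))."""
    return 1.0 / (1.0 + math.exp(-beta1 * x))


def odds(p: float) -> float:
    """Odds of winning, p / (1 - p)."""
    return p / (1.0 - p)


p_game1 = win_probability(1)  # Team A vs. Team B, relative strength 1
p_game2 = win_probability(2)  # Team A vs. Team C, relative strength 2

print(round(p_game1, 4), round(odds(p_game1), 2))  # 0.5513 1.23
print(round(p_game2, 4), round(odds(p_game2), 2))  # 0.6016 1.51

# The odds ratio for a one-unit increase equals e^beta1.
odds_ratio = odds(p_game2) / odds(p_game1)
print(abs(odds_ratio - math.exp(BETA1)) < 1e-9)    # True
```

The last line confirms the derivation: the ratio of the two odds is the same as e raised to β1, apart from floating-point error.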

Conclusions

After comparing and analyzing all of the data of the NCAA tournament games for the last

four years we can make some conclusions. All five rating systems do predict the winner better

than 50% of the time, which was supported by the hypothesis tests. Also, there is no clear difference between

the five systems in terms of predicting outcomes, as the two-sample

hypothesis test showed. The other question entering this project was whether or not the actual

numerical ratings of the teams matter in predicting their chance of winning in actual

games. Using the logistic model we determined that relative strength is a statistically

significant predictor of the probability of winning for all five systems. Also we can conclude

that relative strength difference and relative strength ratio were both effective methods of

determining relative strength. However, looking at the data it appears that the relative strength

by difference is a better predictor than relative strength by ratio. By using the R² we can

conclude that the three best predictors that provided the most accurate predictions were: Sagarin

Df, Seeds Df, and Pomeroy Df. I was glad to see that the actual Seeds rating system does just as

well as, if not better than, the other rating systems, because the seeds are what is used in the tournament. The

committee has access to all of the other rating systems, so it makes sense that the seeds rating

system is one of the most accurate.

Works Cited

Massey, Kenneth. (2006). “About the Ranking Comparision.” 5 February 2008.

http://www.masseyratings.com/cf/aboutcomp.htm

McCabe, George, and Moore, David. (2006). “Introduction to the Practice of Statistics. Fifth Edition.” W.H. Freeman and Company, New York. 5 February 2008.

National Collegiate Athletic Association (NCAA). (2007). “Principles and Procedures for Establishing the Men’s Bracket.” 5 February 2008.

http://www.ncaa.com/basketball-mensbracket/article.aspx?id=92514

Pomeroy, Ken(1). (2006). “Rating Explanation.” 5 February 2008.

http://www.kenpom.com/blog/index.php/weblog/ratings_explanation/

Pomeroy, Ken(2). (2008). “College Basketball Ratings Percentage Index.” 5 February 2008.

http://www.kenpom.com/rpi.php

Schatz, Amy. (2005). “Indiana Sports Guy Tackles Time Zones.” The Wall Street Journal. 5 February 2008.

http://www.post-gazette.com/pg/05292/591293.stm
