econometric paper final draft

Domestic Box Office Success (1980-2016)

I. Introduction

The movie industry has been developing since the late 19th century and is today a pivotal
part of the entertainment industry, bringing enjoyment to many people. Hundreds of films are
released each year, and they differ greatly in their success. This led us to research what
makes a film more or less successful. Our purpose is to measure a film’s success in the U.S.
domestic market and to determine which variables make a movie more or less successful. We
estimate an Ordinary Least Squares regression to explain monetary success using variables
such as budget, genre, and rating. Do observable factors such as review ratings, genre, or
MPAA rating affect the success of one film relative to another? This question is the
foundation of our research project, and we expand on it throughout the paper.

II. Literature Review

Although our project was built almost entirely on intuition and our own knowledge of the
movie industry, we wanted to find out whether other economists had tested what determines
success in the film industry. We found a few articles and research papers, and we base our
work on one that attempts to answer nearly the same question: “Examining Success in the
Motion Picture Industry” by Pat Topf (2009). Topf uses total revenue as the dependent
variable, with explanatory variables for production costs, star power, an age-appropriate
rating dummy, genre, sequel, summer/winter release, and holiday release, as well as an
interaction variable between advertising costs and professional review scores. We use this
paper as our basis. Our model differs from Topf’s in that we use a few different variables,
while pursuing the same goal of determining what drives the success of a film in the motion
picture industry.

III. Theoretical Model

We adjusted all of the monetary data for ticket price inflation to give a more accurate
representation of each movie’s impact at the box office. The dependent variable is domestic
gross in 2016 dollars. We test whether it is affected by the production budget, the maximum
number of theaters in which the film was shown, the number of weeks it was in theaters,
whether or not it is a sequel, the IMDB rating, the Rotten Tomatoes rating, the Motion
Picture Association of America (MPAA) film rating (G, PG, PG-13, R, or Unrated), the quarter
of the year in which it was released, and the genres under which Box Office Mojo lists it.
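The ticket price adjustment described above can be sketched as follows. This is a minimal illustration that assumes the adjustment scales a nominal gross by the ratio of average ticket prices; the function name `adjust_gross` and the two ticket prices are hypothetical placeholders, not figures from our dataset.

```python
# Hypothetical average ticket prices in dollars (placeholders, not real data).
AVG_TICKET_PRICE = {1990: 4.00, 2016: 8.00}


def adjust_gross(gross_millions, release_year, base_year=2016,
                 prices=AVG_TICKET_PRICE):
    """Scale a nominal gross to base-year dollars using the ticket price ratio."""
    return gross_millions * prices[base_year] / prices[release_year]


# A film that grossed 100 million in 1990, restated in base-year dollars.
adjusted = adjust_gross(100.0, 1990)
print(adjusted)
```

With these placeholder prices, a ticket costs twice as much in the base year, so the restated gross is twice the nominal gross.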

IV. Data

The dataset that we analyzed consists of the top 102 films by box office gross released in
the United States from 1980 to 2016. We restricted the sample to films released from 1980
onward because, when we looked at films from before 1980, we could not count on having a
production budget for each one, and we felt that the production budget has one of the largest
impacts on a movie’s domestic gross. This choice gives us complete data for the entire
sample. Because the sample is the top 102 grossing movies of the period, it is not
representative of movies in general, and our results cannot say that any given release will
earn a guaranteed amount from certain variables. Rather, they indicate which characteristics
of the most successful movies made them more or less successful. All of the data are
available from three of the most popular data websites in the movie industry: the Internet
Movie Database, Rotten Tomatoes, and Box Office Mojo.

Our dependent variable is the domestic gross of a film in millions of dollars, that is, the
revenue the film earned during its theatrical release. It is adjusted for inflation and
denoted (Domestic Gross). Our first independent variable, which we consider the most
important, is the production budget, denoted (Production): the amount of money that the
production company gives the directors and producers to make the film. Our second variable,
(Theaters), is the maximum number of theaters in which the movie was shown during its
theatrical release. (Weeks) is the number of weeks that the movie was in theaters. We also
use the average user rating from the Internet Movie Database on a scale of 1-10, denoted
(IMDB), and the Rotten Tomatoes critics’ rating on a scale of 1-100, denoted (RoT).

We have several dummy variables in our data, each equal to 1 if the film falls into the
category and 0 otherwise. The first is (ReRelease), which indicates whether or not the movie
was re-released in theaters at any time; we believe this is a pivotal factor in how much
revenue a movie can earn. The second is (Sequel), since being part of a series may give a
movie an edge in the market. We set this variable to 1 only if the film is truly a sequel to
a previous film. We also use dummy variables for the rating that the Motion Picture
Association of America gives a film, known as the MPAA rating, which ranges from G to NC-17.
We removed NC-17 and Unrated because no films in our sample carried either rating. The last
set of dummy variables indicates the quarter in which the movie was released. We originally
wanted to measure whether a film was released in the summer or winter months, as we felt
that this would be a good indicator (Topf, 2009). We decided against it because many movies
were released two or three weeks before those seasons rather than within them, so we opted
for quarters instead, much as many companies report their financial statements quarterly.
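The dummy coding described above can be sketched as follows. This is an illustration only: the function name `encode_film` is hypothetical, calendar quarters are assumed, and G and the first quarter are assumed to be the omitted baseline categories because the models include only PG, PG-13, R, Q2, Q3, and Q4.

```python
def encode_film(mpaa, release_month):
    """Return the 0/1 dummies for one film.

    Assumptions: G is the omitted MPAA baseline, Q1 is the omitted quarter,
    and quarters are calendar quarters (January-March is Q1, and so on).
    """
    quarter = (release_month - 1) // 3 + 1
    row = {name: int(mpaa == name) for name in ("PG", "PG-13", "R")}
    row.update({f"Q{q}": int(quarter == q) for q in (2, 3, 4)})
    return row


# A PG-13 film released in July falls in the third quarter.
print(encode_film("PG-13", 7))
```

A G-rated film released in the first quarter has every dummy equal to 0, so the intercept of the regression represents that baseline group.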

V. Empirical Model

For our project we ran several models, making adjustments as needed to find the best
feasible model given the data readily available to us. After much discussion as a group, we
settled on two models to analyze. Our preliminary model includes all of the independent
variables mentioned previously, while our adjusted model omits Sequel, IMDB rating, and
Rotten Tomatoes rating.

Preliminary Model:

Domestic Gross = β0 + β1(Production) + β2(Theaters) + β3(Weeks) + β4(IMDB) + β5(RoT) +
β6(Sequel) + β7(PG) + β8(PG-13) + β9(R) + β10(UR) + β11(Q2) + β12(Q3) + β13(Q4) + ε

On reflection, we decided to omit some variables from our original model to create an
adjusted model. This second model omits Sequel, IMDB rating, and Rotten Tomatoes rating,
which our preliminary testing showed to be insignificant at the 5% level, as seen in table
1.3.

Adjusted Model:

Domestic Gross = β0 + β1(Production) + β2(Theaters) + β3(Weeks) + β4(ReRelease) +

β5(PG) + β6(PG-13) + β7(R) + β8(Q2) + β9(Q3) + β10(Q4) + ε

We expect the adjusted model to give better results after omitting variables that we
previously thought were important but that are not statistically significant for our
hypothesis about what determines success in the motion picture industry.
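The estimation step for a model of this form can be sketched as follows. This is a minimal illustration on synthetic data, not our dataset: the regressor ranges, the coefficients in `true_beta`, and the noise level are all made up, and only three of the regressors are included to keep the example short. It uses NumPy's least-squares solver rather than the statistical package that produced the appendix tables.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 102  # same size as the sample described in the Data section

# Synthetic stand-ins for three regressors (made-up values, not our data).
production = rng.uniform(50, 250, n)    # budget, millions of dollars
theaters = rng.uniform(2000, 4500, n)   # widest release
weeks = rng.uniform(10, 40, n)          # weeks in theaters
true_beta = np.array([20.0, 1.5, 0.05, 3.0])  # made-up coefficients

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), production, theaters, weeks])
gross = X @ true_beta + rng.normal(0, 25, n)

# Ordinary Least Squares: choose beta to minimize the sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(X, gross, rcond=None)
residuals = gross - X @ beta_hat
r_squared = 1 - residuals @ residuals / np.sum((gross - gross.mean()) ** 2)

print(beta_hat)
print(r_squared)
```

The dummy variables enter the design matrix the same way, as additional 0/1 columns.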

VI. Empirical Results

After running a regression on our first model, we obtained an R-squared value of 0.452 with
an adjusted R-squared value of 0.370, as seen in table 1.1. These measure the overall fit of
our model. An adjusted R-squared of 0.370 does not indicate a strong model, so we ran other
tests to find out more about it. With this model it is hard to explain the dependent
variable, and we believe that a higher adjusted R-squared would better support the
hypothesis that we are attempting to test. Although we were not content with the model, we
ran an F-test with H0: β1 = β2 = ... = β13 = 0 and 13 degrees of freedom in the numerator.
Given the numerator and denominator degrees of freedom, the critical F-value is 1.83, and
the F-value is 5.513; all of these figures can be found in table 1.2. As 5.513 is clearly
greater than the critical F, we rejected the null. We also ran a t-test on each variable in
the preliminary model to assess whether its true slope differs from zero. The critical
t-value was 1.984, and we rejected the null for every variable whose t-value exceeded the
critical value in absolute terms. There were no signs of multicollinearity, as the Variance
Inflation Factor was less than 5 for the majority of our variables that were not dummy
variables, so we concluded that there is no multicollinearity in our model for our dataset.
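The adjusted R-squared and the overall F-statistic can both be recovered from R-squared, the number of observations, and the number of slope coefficients. The sketch below applies the standard formulas to the reported R-squared of 0.452 with 13 slopes. The value n = 101 is an assumption: it is inferred from the total degrees of freedom (100) in the appendix ANOVA table, which belongs to a different regression on the same sample.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k slope coefficients."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F-statistic for H0: all k slope coefficients equal zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


R2, K = 0.452, 13   # reported R-squared and number of slope coefficients
N = 101             # assumption: inferred from the appendix ANOVA (total df = 100)
CRITICAL_F = 1.83   # critical value quoted in the text

adj = adjusted_r2(R2, N, K)
f_value = f_statistic(R2, N, K)
print(round(adj, 3), round(f_value, 2), f_value > CRITICAL_F)
```

Under these assumptions the sketch gives an adjusted R-squared of about 0.370 and an F-value of about 5.52. The small gap from the reported 5.513 is consistent with R-squared being rounded to three decimals.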

After running several models, we decided to proceed with the adjusted model, as it appeared
to be the best feasible model given the data that we acquired. This is due to the
significance of every variable that we kept: Production, Theaters, Weeks, Re-release, the
MPAA ratings, and the quarters. The significance level (p-value) of every variable was less
than 8.5%, which we believe is the best we can achieve without finding the omitted variables
that may be missing from our model. The F-value increased from 5.513 in our preliminary
model to 6.950 in our adjusted model, so we concluded that it is unlikely that the slope
coefficients are all equal to zero and that the overall fit of our model is marginally
better. In the t-tests for the adjusted model, the t-values of the majority of our variables
were greater than the critical t, so we rejected the null for those variables. Because the
F-value also exceeds the critical F-value of 1.94, we reject the null of the F-test as well,
which shows that our adjusted model is marginally more significant than our preliminary
model.

VII. Empirical Testing

One of the “diseases” that we tested for was multicollinearity, which was straightforward to
check using Variance Inflation Factors (VIFs). Although we decided to use the adjusted
model, we computed VIFs for both models to see how they change between the two. Most of the
variables did not have VIFs above 5.0. The exceptions were dummy variables, which we set
aside because they are binary, taking the value 1 or 0 according to whether or not a film
meets the category. We therefore concluded that we do not have multicollinearity among the
variables in our model. The VIF values are in tables 1.3 and 2.3, and they are consistent
across both models: there are no signs of multicollinearity, with the exception of the dummy
variables.
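A VIF check of this kind can be sketched as follows. The VIF of a variable is 1 / (1 - R²), where R² comes from regressing that variable on all of the other explanatory variables. The function name `vif` and the synthetic columns are illustrative assumptions; the third column is deliberately built to be nearly collinear with the first so that the rule of thumb of 5 is visibly exceeded.

```python
import numpy as np


def vif(X):
    """Variance Inflation Factor for each column of X (no constant column)."""
    n, k = X.shape
    factors = []
    for j in range(k):
        y = X[:, j]
        # Regress column j on a constant plus every other column.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        factors.append(1 / (1 - r2))
    return np.array(factors)


rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = rng.normal(size=200)             # unrelated to a
c = a + 0.1 * rng.normal(size=200)   # nearly collinear with a
vifs = vif(np.column_stack([a, b, c]))
print(vifs)
```

In this synthetic example the first and third columns have VIFs far above 5, while the unrelated second column stays close to 1.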

After checking for multicollinearity, we checked whether our model has heteroscedasticity,
since a model should satisfy all of the classical assumptions, one of which is that the
variance of the error term is constant (homoscedastic). After plotting the residuals against
the standardized predicted values in table 3.1, as well as a histogram of the residuals in
table 3.2, we saw that we may have heteroscedasticity, so we decided to test for it and, if
necessary, correct for it. We chose the White Test because we do not know the Z factor
needed for the Park Test.

To run the White Test, we regressed the squared residuals on the independent variables from
the adjusted model, the squares of those variables, and the products of each pair of those
variables in all possible combinations, excluding the dummy variables from the squares and
cross products. The results of this regression are in tables 3.3 and 3.4. The critical
chi-square value is 26.30 at the 5% significance level with 16 degrees of freedom, one for
each regressor in the White Test regression. We obtained a chi-square statistic of 23.5 by
multiplying the sample size by the adjusted R-squared from the White Test regression. Since
23.5 is less than 26.30, we do not reject the null of homoscedasticity, and we conclude that
there is no heteroscedasticity present in our model.
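As a cross-check on the arithmetic, the sketch below evaluates the statistic from the figures in the appendix Model Summary and ANOVA output. The textbook form of the White Test statistic is the sample size times the unadjusted R-squared of the auxiliary regression, so both versions are shown. The value n = 101 is an assumption inferred from the total degrees of freedom (100) in the ANOVA output, and the critical value is the one quoted in the text.

```python
N = 101            # assumption: total df of 100 in the appendix ANOVA, plus one
R2 = 0.351         # R Square from the appendix Model Summary
ADJ_R2 = 0.227     # Adjusted R Square from the appendix Model Summary
CRITICAL = 26.30   # chi-square critical value from the text (5%, 16 df)


def white_statistic(n, r2):
    """Sample size times the R-squared of the auxiliary regression."""
    return n * r2


results = {}
for label, r2 in (("unadjusted", R2), ("adjusted", ADJ_R2)):
    stat = white_statistic(N, r2)
    results[label] = (round(stat, 2), stat > CRITICAL)
    print(label, round(stat, 2), "reject H0" if stat > CRITICAL else "do not reject H0")
```

Under these assumptions the adjusted version comes to about 22.9, below the critical value, while the unadjusted version comes to about 35.5, above it. The outcome of the test is therefore sensitive to which R-squared is used.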

VIII. Conclusion

The results from our research suggest that production budget, MPAA rating, the number of
weeks the movie was in theaters, and whether or not the movie was re-released have a
positive and statistically significant effect on the domestic gross revenue of a film.

We also found that the quarter in which the movie was released is inconclusive in our model.
It could be replaced by whether or not the movie was released on or near a holiday, which
may give our model a better fit. Ratings from websites such as the Internet Movie Database
and Rotten Tomatoes, as widely popular as they are in the film industry, are insignificant
in our model, as is whether or not a movie is a sequel, so we omitted those variables from
our original model.

After testing for both multicollinearity and heteroscedasticity, the implication of our
findings is that, although our model may not be extremely strong, it still has some
significance. Production budget is probably the most important factor in determining how
much a movie will make at the box office. Other variables, such as the MPAA rating and how
long the movie was in theaters, also matter, although it can be argued that the causation
runs the other way for the latter: a movie stays in theaters longer because it is making a
large amount of money, rather than making more money because it stays in theaters. There are
many variables that we did not include and improvements that could be made. Adding more
movies to the dataset would give a larger sample and a more accurate representation than the
top 102 movies that we chose for this project. Other dummy variables could also be added,
such as which production company helped the movie acquire funds and advertising, or whether
the movie was released on a holiday.

Consequently, we conclude that the film industry is very volatile and risky, and certain
variables can substantially change the domestic box office success of a film, although
production companies are not completely transparent with data such as advertising budgets.
The movie industry is extremely complex, and its success cannot be narrowed down to a few
variables. Rather, it is determined by large numbers of individuals being willing and able
to see a movie, which is hard to quantify statistically, especially whether or not they will
see the movie in theaters.

Appendix

Table 1.1

Table 1.2

Table 1.3

Table 2.1

Table 2.2

Table 2.3

Table 3.1

Table 3.2

Table 3.3

Table 3.4

Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .592 (a)   .351       .227                28238.27525

a. Predictors: (Constant), ReleaseWeeks, Quarter 2, PG, Re-Release, R, Quarter 3, ProductionSquared, WeeksSquared, PG-13, Quarter 4, ReleaseSquared, ProductionWeeks, # of Weeks, Production Budget (in millions of dollars), Widest Release (theatres), ProductionRelease
b. Dependent Variable: ResidualSquared

ANOVA (a)

Model           Sum of Squares      df    Mean Square       F       Sig.
1  Regression   36222956190.000     16    2263934762.000    2.839   .001 (b)
   Residual     66981615870.000     84    797400188.900
   Total        103204572100.000    100

a. Dependent Variable: ResidualSquared
b. Predictors: (Constant), ReleaseWeeks, Quarter 2, PG, Re-Release, R, Quarter 3, ProductionSquared, WeeksSquared, PG-13, Quarter 4, ReleaseSquared, ProductionWeeks, # of Weeks, Production Budget (in millions of dollars), Widest Release (theatres), ProductionRelease

Works Cited

“Box Office Mojo” BoxOfficeMojo. 5/2/2016. http://www.boxofficemojo.com/

“The Internet Movie Database” IMDB. 1980-2016. http://www.imdb.com/

“Rotten Tomatoes” RottenTomatoes. 5/2/2016. http://www.rottentomatoes.com/

Topf, P. (2009). Examining Success in the Motion Picture Industry. Retrieved May 2, 2016, from https://www.iwu.edu/economics/PPE18/8Topf.pdf
