Domestic Box Office Success (1980-2016)
I. Introduction
The movie industry has been developing for more than a century, from the late 19th century
to what it is today in the 21st century. It is a pivotal part of the entertainment industry
and has brought enjoyment to many people. Hundreds of films are released each year, and
they differ greatly in success. This led us to investigate what makes a film more or less
successful. Our purpose is to measure a film’s success in the U.S. domestic market and to
identify which variables are associated with higher or lower success. We estimate an
Ordinary Least Squares regression to explain monetary success using variables such as
production budget, genre, and MPAA rating. Do observable factors such as review scores,
genre, or MPAA rating affect the success of one film relative to another? This question is
the foundation of the research project, and we expand on it in the sections that follow.
II. Literature Review
Although our project was built almost entirely on intuition and our own knowledge of the
movie industry, we looked for existing work by other economists on what determines
success in the film industry. We found a few articles and research papers, and we base our
work on one paper that attempts to answer nearly the same question: “Examining Success in
the Motion Picture Industry” by Pat Topf (2009). Topf uses total revenue as the dependent
variable, with explanatory variables for production costs, star power, age-appropriate
rating (as a dummy variable), genre, sequel, summer/winter release, and holiday release, as
well as an interaction variable between advertising costs and professional review scores.
We use this paper as our basis. Our model differs from Topf’s in that we use a few different
variables, while pursuing the same goal of determining the success of a film in the motion
picture industry.
III. Theoretical Model
We adjust all monetary data for ticket price inflation to give a more accurate
representation of each movie’s impact at the box office. The dependent variable is domestic
gross in 2016 dollars. We test whether it is affected by: the production budget; the
maximum number of theaters in which the film was shown; the number of weeks it was in
theaters; whether or not it is a sequel; the IMDB rating; the Rotten Tomatoes rating; the
Motion Picture Association of America (MPAA) film rating (G, PG, PG-13, R, or Unrated); the
quarter of the year in which it was released; and the genres assigned to it by Box Office
Mojo.
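The ticket price adjustment can be sketched as a simple rescaling. This is a minimal illustration, assuming that the adjustment multiplies the nominal gross by the ratio of the 2016 average ticket price to the average ticket price in the release year; the function name and the prices in the example are hypothetical placeholders, not values from our data.

```python
def adjust_to_2016(gross_millions, ticket_price_release_year, ticket_price_2016):
    """Rescale a nominal gross (in millions) into 2016 ticket-price terms.

    Assumption: the adjustment is a simple ratio of average ticket prices.
    """
    if ticket_price_release_year <= 0 or ticket_price_2016 <= 0:
        raise ValueError("ticket prices must be positive")
    return gross_millions * ticket_price_2016 / ticket_price_release_year


# Hypothetical example: a film that grossed 100 (million) when the average
# ticket cost 4.00, expressed in a year where the average ticket costs 8.00.
example_adjusted = adjust_to_2016(100.0, 4.0, 8.0)
print(example_adjusted)  # 200.0
```

Under this assumption a film is credited with roughly the same number of tickets sold, valued at 2016 prices.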
IV. Data
The dataset that we analyzed consists of the top 102 films by box office gross released in
the United States from 1980 to 2016. We restricted the sample to films released in 1980 or
later because we expected production budget to have one of the largest impacts on domestic
gross, and this restriction gives us complete data for every film in the sample. Because the
sample is the 102 top-grossing movies from this time frame, it is not representative of all
released movies and cannot be used to predict that any given movie will earn a certain
amount from certain variables. Instead, it shows which characteristics of the most
successful movies are associated with their being more or less successful. All of the data
are available from three websites that are among the most popular data sources for the
movie industry: the Internet Movie Database, Rotten Tomatoes, and Box Office Mojo.
Our dependent variable is the domestic gross of a film, in millions of dollars. This
variable is adjusted for inflation and is denoted (Domestic Gross). It is the revenue that
a film grossed during its theatrical release.
Our first independent variable, which we consider the most important, is the production
budget, denoted (Production). It is the amount of money that the production company gives
the directors and producers to make the film. Our second variable is (Theaters), the
maximum number of theaters showing the movie at its widest release. (Weeks) is the number
of weeks that the movie was in theaters. We use the Internet Movie Database rating, the
average IMDB user review on a scale of 1-10, denoted (IMDB). We do the same for Rotten
Tomatoes critics, whose rating is on a scale of 0-100 and is denoted (RoT).
We have several dummy variables in our data, each equal to 1 if the film falls in the
category and 0 otherwise. The first is (ReRelease), which indicates whether or not the
movie was re-released in theaters at any time; we believe this is a pivotal factor in how
much revenue a movie can potentially earn. The second is (Sequel), since being part of a
series may give a movie an edge in the market. We set this variable to 1 only if the film
is truly a sequel to a previous film. We also use a set of dummy variables for the rating
that the Motion Picture Association of America gives a film, known as the MPAA rating,
which ranges from G to NC-17. We removed NC-17 and Unrated because no films in our sample
carry either rating. The last set of dummy variables indicates the quarter in which the
movie was released. We originally wanted to measure whether a film was released in the
summer or winter months, as we felt that would be a good indicator (Topf, 2009). However,
many movies were released two or three weeks before those specific seasons rather than
within them, so we opted for quarters instead, in line with the quarterly schedule many
companies use for their financial statements.
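The dummy coding described above can be sketched as follows. This is an illustration only: the function name and argument names are hypothetical, and it assumes that G and the first quarter are the omitted base categories, since neither appears in the model equations.

```python
def encode_film(mpaa, release_month, is_sequel, was_rereleased):
    """Return the 0/1 dummy variables for one film.

    Assumptions: G and Q1 are the base categories (all related dummies 0),
    and quarters are calendar quarters derived from the release month.
    """
    if not 1 <= release_month <= 12:
        raise ValueError("release_month must be between 1 and 12")
    quarter = (release_month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, ...
    return {
        "Sequel": int(is_sequel),
        "ReRelease": int(was_rereleased),
        "PG": int(mpaa == "PG"),
        "PG-13": int(mpaa == "PG-13"),
        "R": int(mpaa == "R"),
        "Q2": int(quarter == 2),
        "Q3": int(quarter == 3),
        "Q4": int(quarter == 4),
    }


# Hypothetical film: rated PG-13, released in July, re-released, not a sequel.
example_row = encode_film("PG-13", 7, is_sequel=False, was_rereleased=True)
print(example_row)
```

Leaving one category out of each set avoids perfect collinearity with the constant term, so each remaining coefficient is read relative to the omitted category.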
V. Empirical Model
For our project we ran a few models, making adjustments as needed to find the best
feasible model given the data that were readily available to us. After much discussion as
a group, we settled on two models to analyze. Our preliminary model includes the
independent variables described above. Our adjusted model omits the variables Sequel,
IMDB Rating, and Rotten Tomatoes Rating.
Preliminary Model:
Domestic Gross = β0 + β1(Production) + β2(Theaters) + β3(Weeks) + β4(IMDB) + β5(RoT) +
β6(Sequel) + β7(PG) + β8(PG-13) + β9(R) + β10(UR) + β11(Q2) + β12(Q3) + β13(Q4) + ε
After some consideration, we omitted some variables from our original model to make an
adjusted model. The adjusted model omits Sequel, IMDB rating, and Rotten Tomatoes rating,
because our preliminary testing, reported in table 1.3, showed that these variables were
not statistically significant at the 5% level.
Adjusted Model:
Domestic Gross = β0 + β1(Production) + β2(Theaters) + β3(Weeks) + β4(ReRelease) +
β5(PG) + β6(PG-13) + β7(R) + β8(Q2) + β9(Q3) + β10(Q4) + ε
We expect the adjusted model to give better results after omitting variables that we
previously thought were important but that turned out not to be statistically significant
in explaining success in the motion picture industry.
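For readers unfamiliar with how the β coefficients are obtained, the following is a minimal sketch of Ordinary Least Squares using the normal equations, written in plain Python. It is not the software used for our regressions, the function name is our own for illustration, and the numbers in the example are toy values chosen so the answer is obvious, not film data.

```python
def ols_coefficients(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination.

    X is a list of rows, each row starting with 1.0 for the intercept.
    Assumes X'X is invertible (no perfectly collinear columns).
    """
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Toy data generated from y = 2 + 3x with no noise.
X_toy = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y_toy = [2.0, 5.0, 8.0, 11.0]
beta = ols_coefficients(X_toy, y_toy)
print([round(b, 6) for b in beta])  # [2.0, 3.0]
```

For the adjusted model, each row of X would hold a 1 for the intercept followed by Production, Theaters, Weeks, and the seven dummy variables for one film.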
VI. Empirical Results
After running a regression on our first model, we obtained an R-squared value of
0.452 and an adjusted R-squared value of 0.370, as seen in table 1.1. These values measure
the overall fit of our model. An adjusted R-squared of 0.370 does not indicate a strong
model, so we ran other tests to find out more about it. With this fit, much of the
variation in the dependent variable remains unexplained, and a higher adjusted R-squared
would give a better answer to the question we are attempting to address. Although we were
not content with the model, we ran an F-test of overall significance with
H0: B1 = B2 = B3 = B4 = B5 = B6 = B7 = B8 = B9 = B10 = B11 = B12 = B13 = 0 and 13 degrees
of freedom in the numerator. The critical F-value for our numerator and denominator
degrees of freedom is 1.83, and the F-value is 5.513; these figures can be found in table
1.2. As 5.513 is clearly greater than the critical F, we rejected the null. We then ran a
t-test on each variable in the preliminary model to assess whether its true slope differs
from zero. The critical t-value was 1.984, and we rejected the null for every variable
whose t-statistic was greater than the critical value in absolute value. We found no signs
of serious multicollinearity: the Variance Inflation Factor was less than 5 for the
majority of our variables that were not dummy variables, so we concluded that
multicollinearity is not a problem in our model for our dataset.
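The relationship between R-squared, adjusted R-squared, and the overall F-statistic can be checked with a short calculation. This sketch assumes 101 usable observations, which we infer from the total of 100 degrees of freedom in the appendix ANOVA table rather than from a stated count, and 13 slope coefficients in the preliminary model; the function names are our own.

```python
def adjusted_r_squared(r_squared, n_obs, n_slopes):
    """Adjusted R-squared: penalizes R-squared for the number of regressors."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_slopes - 1)


def overall_f_statistic(r_squared, n_obs, n_slopes):
    """F-statistic for H0: all slope coefficients are jointly zero."""
    explained = r_squared / n_slopes
    unexplained = (1.0 - r_squared) / (n_obs - n_slopes - 1)
    return explained / unexplained


N_OBS = 101      # assumption: inferred from 100 total degrees of freedom
K_PRELIM = 13    # slope coefficients in the preliminary model

adj_r2 = adjusted_r_squared(0.452, N_OBS, K_PRELIM)
f_stat = overall_f_statistic(0.452, N_OBS, K_PRELIM)
print(round(adj_r2, 3))  # 0.37
print(round(f_stat, 2))  # 5.52
```

Under these assumptions the calculation reproduces the reported adjusted R-squared of 0.370 and comes close to the reported F-value of 5.513; the small gap is consistent with R-squared being rounded to three decimals.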
After running several models, we decided to proceed with the adjusted model, as it
appeared to be the best feasible model given the data that we acquired. This is due to the
significance of every variable that we kept: Production, Theaters, Weeks, Re-release, MPAA
Ratings, and Quarters. The p-value of every variable was less than 8.5%, which is the
closest we believe we can come without finding the omitted variables that may be missing
from our model. Our F-value increased from 5.513 in the preliminary model to 6.950 in the
adjusted model, which is stronger evidence against the null hypothesis that all slope
coefficients are jointly zero and makes the overall significance of our model marginally
better. In the t-tests for the adjusted model, the majority of our variables had t-values
greater than the critical t, so we rejected the null for them. The F-value also exceeds the
critical F-value of 1.94, so we reject the null of the F-test, which shows that our
adjusted model is marginally more significant than our preliminary model.
VII. Empirical Testing
One of the “diseases” that we tested for was multicollinearity, which was
straightforward to check using Variance Inflation Factors (VIFs). Although we decided to
use the adjusted model, we computed VIFs for both models to get a general idea of how
they change between the two. Most of the variables did not have VIFs above 5.0. The
exceptions were the dummy variables, which we treated as special cases because they are
binary, coded 1 or 0 according to whether or not a film meets the variable’s requirement.
We therefore concluded that we do not have multicollinearity among the variables in our
model. The VIF values are in tables 1.3 and 2.3, and the result is consistent for both
models: there are no signs of multicollinearity, with the exception of the dummy
variables.
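As a reminder of what the cutoff means, the VIF for a variable is 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing that variable on all of the other independent variables. The sketch below only converts an auxiliary R-squared into a VIF and applies the cutoff of 5 used above; the function names and example values are illustrative and are not taken from tables 1.3 or 2.3.

```python
def vif_from_r_squared(aux_r_squared):
    """VIF for one regressor, given the R-squared of its auxiliary regression."""
    if not 0.0 <= aux_r_squared < 1.0:
        raise ValueError("auxiliary R-squared must be in [0, 1)")
    return 1.0 / (1.0 - aux_r_squared)


def exceeds_cutoff(aux_r_squared, cutoff=5.0):
    """True if the implied VIF is above the chosen rule-of-thumb cutoff."""
    return vif_from_r_squared(aux_r_squared) > cutoff


# Illustrative values only: an auxiliary R-squared of 0.5 implies a VIF of 2,
# while 0.9 implies a VIF of about 10.
for r2 in (0.5, 0.9):
    print(r2, round(vif_from_r_squared(r2), 2), exceeds_cutoff(r2))
```

A VIF of 5 therefore corresponds to about 80% of a variable's variation being explained by the other regressors.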
After checking for multicollinearity, we checked whether our model has
heteroscedasticity. A model should satisfy all of the classical assumptions, one of which
is that the error term has constant variance (homoscedasticity). After plotting the
residuals against the standardized predicted values in table 3.1, and the frequency of
the residuals in a histogram in table 3.2, we saw that we may have heteroscedasticity, so
we decided to test for it and, if necessary, attempt to correct for it. We used the White
Test. The Park Test requires identifying the proportionality factor Z, which we do not
know, so we felt that the White Test would be the best choice.
To do the White Test, we regressed the squared residuals from the adjusted model on
three sets of variables: all of the independent variables from the adjusted model, the
squares of the independent variables, and the products of each pair of independent
variables. We covered all possible combinations but did not include the dummy variables
when squaring or multiplying the independent variables against one another. Running this
regression gives tables 3.3 and 3.4. The critical chi-square value is 26.30 at the 5%
significance level with 16 degrees of freedom, the number of explanatory variables in the
White Test regression. Multiplying the sample size by the adjusted R-squared from the
White Test regression gives a test statistic of 23.5. Since 23.5 is not greater than
26.30, we do not reject the null of homoscedasticity, so we conclude that there is no
heteroscedasticity present in our model.
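The construction of the White Test regressors and the test statistic can be sketched as follows. Two points are assumptions rather than statements from the text: the sample size of 101 is inferred from the 100 total degrees of freedom in the appendix ANOVA table, and the textbook form of the statistic is the sample size times the unadjusted R-squared of the auxiliary regression, whereas the calculation above uses the adjusted R-squared. The sketch computes both from the appendix values so they can be compared; the variable and function names are our own.

```python
from itertools import combinations

CONTINUOUS = ["Production", "Theaters", "Weeks"]
DUMMIES = ["ReRelease", "PG", "PG-13", "R", "Q2", "Q3", "Q4"]


def white_regressors(continuous, dummies):
    """Levels of every variable, plus squares and pairwise products
    of the continuous variables only (dummies are not squared or crossed)."""
    levels = list(continuous) + list(dummies)
    squares = [f"{name}^2" for name in continuous]
    crosses = [f"{a}*{b}" for a, b in combinations(continuous, 2)]
    return levels + squares + crosses


def white_statistic(n_obs, r_squared):
    """n * R-squared from the auxiliary regression of squared residuals."""
    return n_obs * r_squared


terms = white_regressors(CONTINUOUS, DUMMIES)
print(len(terms))  # 16, matching the 16 degrees of freedom

N_OBS = 101            # assumption: inferred from 100 total degrees of freedom
CRITICAL_CHI2 = 26.30  # 5% level, 16 degrees of freedom, as reported above

stat_unadjusted = white_statistic(N_OBS, 0.351)  # R Square from the appendix
stat_adjusted = white_statistic(N_OBS, 0.227)    # Adjusted R Square from the appendix
print(round(stat_unadjusted, 2), stat_unadjusted > CRITICAL_CHI2)
print(round(stat_adjusted, 2), stat_adjusted > CRITICAL_CHI2)
```

With these inputs the unadjusted form gives about 35.45 and the adjusted form about 22.93. Only the latter falls below 26.30, so the conclusion depends on which R-squared is used, and this is worth keeping in mind when reading the result.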
VIII. Conclusion
The results from our research suggest that production budget, MPAA rating, the
number of weeks the movie was in theaters, and whether or not the movie was re-released
have a positive and statistically significant effect on the domestic gross revenue of a
film.
We have also concluded that the quarter in which the movie was released is
inconclusive in our model. It might be replaced by whether or not the movie was released
on or near a holiday to give our model a better fit. Ratings from websites such as the
Internet Movie Database and Rotten Tomatoes, widely popular as they are in the film
industry, are insignificant in our model, as is whether or not a movie was a sequel, so we
omitted those variables from our original model.
After testing for both multicollinearity and heteroscedasticity, the implication of our
findings is that although our model may not be extremely strong, there is still some
significance to it. Production budget is probably the most important factor in determining
how much a movie will make at the box office. Other variables matter as well, such as the
MPAA rating or how long the movie was in theaters, although it can be argued that the
length of time a movie stays in theaters is a result of the movie making a large amount of
money rather than a factor in determining how much it will make. Our findings leave out
many variables that could be added. The model could also be improved by adding more
movies to the data set, giving a larger sample and a more accurate representation than the
top 102 movies that we chose for this project. Other dummy variables could be added as
well, such as which production company helped the movie acquire funds and advertising, or
whether a movie was released on a holiday.
Consequently, we can conclude that the film industry is very volatile and risky, and
certain variables can substantially change the domestic box office success of a film,
although there is not complete transparency from production companies about data such as
advertising budgets. The movie industry is extremely complex, and its success cannot be
narrowed down to a few variables. Rather, it is determined by large numbers of individuals
who are willing and able to see a movie, which is hard to quantify statistically,
especially whether they will see the movie in theaters or not.
Appendix
Table 1.1
Table 1.2
Table 1.3
Table 2.1
Table 2.2
Table 2.3
Table 3.1
Table 3.2
Table 3.3
Table 3.4
Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .592 (a)   .351       .227                28238.27525
a. Predictors: (Constant), ReleaseWeeks, Quarter 2, PG, Re-Release, R, Quarter 3, ProductionSquared, WeeksSquared, PG-13, Quarter 4, ReleaseSquared, ProductionWeeks, # of Weeks, Production Budget (in millions of dollars), Widest Release (theatres), ProductionRelease
b. Dependent Variable: ResidualSquared
ANOVA (a)
Model           Sum of Squares       df    Mean Square       F       Sig.
1  Regression   36222956190.000      16    2263934762.000    2.839   .001 (b)
   Residual     66981615870.000      84    797400188.900
   Total        103204572100.000     100
a. Dependent Variable: ResidualSquared
b. Predictors: (Constant), ReleaseWeeks, Quarter 2, PG, Re-Release, R, Quarter 3, ProductionSquared, WeeksSquared, PG-13, Quarter 4, ReleaseSquared, ProductionWeeks, # of Weeks, Production Budget (in millions of dollars), Widest Release (theatres), ProductionRelease
Works Cited
“Box Office Mojo” BoxOfficeMojo. 5/2/2016. http://www.boxofficemojo.com/
“The Internet Movie Database” IMDB. 1980-2016. http://www.imdb.com/
“Rotten Tomatoes” RottenTomatoes. 5/2/2016. http://www.rottentomatoes.com/
Topf, P. (2009). Examining Success in the Motion Picture Industry. Retrieved May 2, 2016, from https://www.iwu.edu/economics/PPE18/8Topf.pdf