Design and Implementation of a Dynamic Data MLP to Predict Motion Picture Revenue
David A. Gerasimow


Page 1

Design and Implementation of a Dynamic Data MLP to Predict Motion Picture Revenue

David A. Gerasimow

Page 2

Problem Statement

Problem: Motion picture revenue is seemingly unpredictable.

Solution: Develop an artificial neural network that takes into account the characteristics of successful films and predicts the opening weekend box-office revenue of upcoming releases.

However, the film industry is constantly changing, as is public taste.

Consequently, develop a dynamic data artificial neural network that continually retrains itself on the most up-to-date data.

Page 3

Data Collection 1

Determine the significant characteristics of a film that contribute to its success or failure at the box office.

The characteristics include:

1) Month and year of release
2) Genre
3) Rating (e.g., G, PG)
4) Runtime
5) Number of theatres in which the film is played
6) Production studio
7) Holiday weekend opening?
8) Sequel?
9) Color, black and white, or animation
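One way to turn the nine characteristics above into numeric network inputs is sketched below. Counting month and year separately gives ten values, which would match the 10-input MLP reported later in the slides, but that correspondence is an inference. The category lists, the `Film` class, and the integer coding are all hypothetical and are not taken from the author's data files.

```python
from dataclasses import dataclass

# Hypothetical category lists; the slides do not give the real ones.
GENRES = ["action", "comedy", "drama", "horror", "family"]
RATINGS = ["G", "PG", "PG-13", "R"]
STUDIOS = ["studio_a", "studio_b", "other"]
FORMATS = ["color", "black_and_white", "animation"]


@dataclass
class Film:
    month: int
    year: int
    genre: str
    rating: str
    runtime_min: int
    theatres: int
    studio: str
    holiday_opening: bool
    sequel: bool
    film_format: str


def encode(film: Film) -> list:
    """Map a film to ten numbers (categories become list indices)."""
    return [
        float(film.month),
        float(film.year),
        float(GENRES.index(film.genre)),
        float(RATINGS.index(film.rating)),
        float(film.runtime_min),
        float(film.theatres),
        float(STUDIOS.index(film.studio)),
        1.0 if film.holiday_opening else 0.0,
        1.0 if film.sequel else 0.0,
        float(FORMATS.index(film.film_format)),
    ]


# Made-up film, used only to show the shape of the encoded vector.
example = Film(7, 2001, "comedy", "PG-13", 95, 2800, "studio_a", True, False, "color")
vector = encode(example)
```

Integer indices impose an artificial ordering on genre and studio; a one-hot coding would avoid that but would give more than ten inputs.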

Page 4

Data Collection 2

All films released since 1989 that earned more than fifteen million dollars can be found at: www.boxofficeguru.com

Furthermore, film-specific information (e.g., genre) is listed at: www.imdb.com

Data collection application development (in Visual Basic 6.0):

  dataextractor.exe extracts information from files downloaded from www.boxofficeguru.com and converts them to a readable format.

  dataconcatenator.exe links the readable files into a single file.

  dataconverter.exe searches the single data file to determine which data fields need to be filled in manually at www.imdb.com

This data collection process needs to be performed only once, and using it to design an ANN will create a standard static data neural network.

Page 5

Dynamic Data Collection

Develop an application that gathers data continually and automatically, allowing the ANN to be retrained using up-to-date data.

updatewizard.exe (developed in Visual Basic 6.0)

Functionality of updatewizard.exe:

Step 1: Download up-to-date data from www.boxofficeguru.com, process and concatenate.

Step 2: Compare up-to-date data to current data. If there is a difference, the ANN needs to be retrained.

Step 3: Create new training and testing files from up-to-date data.
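Steps 2 and 3 can be sketched roughly as follows. This is an illustrative Python outline, not the Visual Basic code of updatewizard.exe: the row layout, the function names, the 25% test fraction, and the fixed seed are all assumptions, and the download in Step 1 is replaced by in-memory sample rows.

```python
import random


def needs_retraining(current_rows, fresh_rows):
    """Step 2: retrain only if the freshly downloaded data differs."""
    return set(fresh_rows) != set(current_rows)


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Step 3: shuffle the rows and split them into training and testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up (title, revenue in millions) rows standing in for downloaded data.
current = [("Film A", 20.1), ("Film B", 35.4)]
fresh = current + [("Film C", 18.7), ("Film D", 52.0)]

train_rows, test_rows = [], []
if needs_retraining(current, fresh):
    train_rows, test_rows = split_train_test(fresh)
```

Comparing the two data sets as sets ignores row order, so a re-sorted download alone does not trigger retraining.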

Page 6

Developing ANN

For the motion picture revenue application, an MLP is appropriate.

Determine the optimal MLP configuration using:

  Three-way cross-validation

  Multiple trials of MLP training

  Mean and standard deviation of classification rates, computed to choose the configuration
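The selection procedure above can be sketched as below, assuming "three-way" means three folds. A majority-class predictor stands in for MLP training so that the example stays self-contained; the labels, the trial count, and all function names are hypothetical.

```python
import random
import statistics


def three_fold_indices(n_samples, seed):
    """Shuffle sample indices and deal them into three folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::3] for i in range(3)]


def majority_class_rate(train_labels, test_labels):
    """Stand-in for MLP training: predict the most common training label."""
    prediction = max(set(train_labels), key=train_labels.count)
    return sum(label == prediction for label in test_labels) / len(test_labels)


def cross_validated_rates(labels, n_trials=5):
    """Collect one classification rate per fold per trial."""
    rates = []
    for trial in range(n_trials):
        folds = three_fold_indices(len(labels), seed=trial)
        for k in range(3):
            test = [labels[i] for i in folds[k]]
            train = [labels[i] for j in range(3) if j != k for i in folds[j]]
            rates.append(majority_class_rate(train, test))
    return rates


labels = [0, 1, 2, 3, 4] * 12  # hypothetical revenue-class labels
rates = cross_validated_rates(labels)
mean_rate = statistics.mean(rates)
std_rate = statistics.stdev(rates)
```

Running this once per candidate configuration and comparing the resulting means, with the standard deviations as a guide to stability, is one way to pick among configurations.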

Page 7

MLP Configuration

After three-way cross-validation and multiple trials, the results were:

  10-6-X configuration (X represents the number of output classes, which varies depending on the options chosen in updatewizard.exe)

  Learning rate: α = 0.1

  Momentum constant: μ = 0.7

  Max. number of epochs: 5000

  Samples per epoch: 64

  Scaling of input: [-5, 5]

  Other values are defaults as specified in bp.m
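To show how these settings are typically used, the sketch below scales a feature column onto [-5, 5] and applies one weight update with the standard momentum rule, using α = 0.1 and μ = 0.7. Per-column min-max scaling and this particular update form are assumptions; bp.m may implement either step differently, and the function names are hypothetical.

```python
def scale_column(values, low=-5.0, high=5.0):
    """Linearly map a feature column onto [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant column carries no information; place it mid-range.
        return [(low + high) / 2.0 for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]


def momentum_step(weight, gradient, previous_delta, alpha=0.1, mu=0.7):
    """One gradient-descent update with momentum for a single weight."""
    delta = -alpha * gradient + mu * previous_delta
    return weight + delta, delta


runtimes = [80, 100, 120]  # made-up runtimes in minutes
scaled = scale_column(runtimes)  # -> [-5.0, 0.0, 5.0]

new_weight, new_delta = momentum_step(weight=0.5, gradient=2.0, previous_delta=0.1)
```

The momentum term reuses part of the previous update, which tends to smooth the weight trajectory between successive epochs.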

Page 8

MATLAB Files for MLP

Project MATLAB m-files were modified from Professor Yu Hen Hu’s code for back-propagation MLP.

Modified code is contained in:

  moviesbp.m

  moviesbptest.m

  moviesbpconfig.m

The modifications allow for:

  application-specific characteristics

  hard-coding of the configuration

  interfacing with a Windows application to predict the opening weekend revenue of a newly released film

Page 9

Prediction

The Windows application newmovie.exe allows the user to enter a newly released film’s characteristics using a graphical user interface.

newmovie.exe stores the data in testsinglemovie.txt, which is read by moviesbp.m. moviesbp.m then classifies the film.

Page 10

Results 1

MLP classification rates: 54% - 59%

  Improvement over past ANN approaches used by students in CS/ECE/ME 539.

  Random classification: roughly 20%

  The MLP therefore performs well above chance.

Dynamic Data Aspect

  Data is updated weekly on www.boxofficeguru.com. Run updatewizard.exe to update automatically.
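A chance rate of roughly 20% is what uniform guessing over five classes would give. The sketch below shows that arithmetic together with a classification-rate calculation. The five-class count is inferred from the 20% figure rather than stated in the slides, and the bin edges and sample revenues are hypothetical.

```python
import bisect

# Hypothetical opening-weekend class boundaries in millions of dollars.
BIN_EDGES = [5.0, 10.0, 20.0, 40.0]  # four edges give five classes


def revenue_class(revenue_millions):
    """Return the index (0-4) of the revenue class a film falls into."""
    return bisect.bisect_right(BIN_EDGES, revenue_millions)


def classification_rate(predicted, actual):
    """Fraction of films whose predicted class matches the actual class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


n_classes = len(BIN_EDGES) + 1
chance_rate = 1.0 / n_classes  # uniform guessing over five classes: 0.2

# Made-up revenues and predictions, only to exercise the functions.
actual = [revenue_class(r) for r in [3.2, 7.5, 15.0, 33.0, 61.0]]
predicted = [0, 1, 1, 3, 4]
rate = classification_rate(predicted, actual)
```

If the classes are not equally populated, always guessing the largest class would beat 20%, so the class distribution matters when judging the reported rates against chance.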

Page 11

Results 2

The project was functional for less than two weeks.

  Thus, not enough time has passed to accumulate enough data to make a statistically significant improvement in MLP performance.

According to the dynamic data ANN model, performance should increase gradually over time.