cs229.stanford.edu/proj2019spr/poster/9.pdf
PREDICTING A MOVIE’S GENRE FROM ITS POSTER
GABRIEL BARNEY AND KRIS KAYA
STANFORD UNIVERSITY SYMBOLIC SYSTEMS PROGRAM
{barneyga, kkaya23}@stanford.edu
Motivation
● The film industry relies heavily on posters to promote movies.
  ○ A poster must convey a movie’s theme and genre to make the film seem as appealing as possible to a wide variety of people.
  ○ This makes the features that a poster includes incredibly important in the portrayal of a movie.
● Our project attempted to train a model that could learn features from a movie poster and predict the movie’s genre or genres on the basis of these features.
Results
References
Discussion
Model             At Least One Genre   All Genres   Hamming Loss
Baseline          12.34%               ---          ---
ML-kNN (K=40)     34.28%               7.77%        0.117
OVR-kNN (K=40)    35.428%              9.71%        0.118
Data/Features
Table 1: Performance of ML-kNN and OVR-kNN models
Future
● For this project, we used poster features to make predictions about genre; however, movie posters contain more potential information about a film.
  ○ Given more time, we would attempt to predict other things about a movie from its poster, such as viewer ratings or cast members.
● The current dataset had a large number of dramas and few TV movies. We could augment our dataset to expose our model to more examples and make it more robust.
Figure 1: A visualization of our problem with an example poster
1. “The Movies Dataset.” https://www.kaggle.com/rounakbanik/the-movies-datase
2. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/cvpr.2016.90
3. Zhang, Min-Ling and Zhi-Hua Zhou. “ML-KNN: A lazy learning approach to multi-label learning.” Pattern Recognition 40 (2007): 2038-2048.
Methods and Models
1. Random Baseline - Test Size: (2000). Randomly selects genres for a given movie.
2. K-Nearest Neighbors [3] - Dataset split: (2800, 350, 350). Uses MAP estimation to predict on unseen examples on the basis of their K nearest neighbors. Can be framed as a single multi-label classification problem (ML) or as multiple binary classification problems (OVR). The objective is to minimize Hamming loss, the fraction of individual genre labels that are predicted incorrectly.
3. ResNet34 [2] - Dataset split: (28000, 3500, 3500). Pretrained network. We replaced the final softmax layer with a sigmoid layer and changed the loss function from Cross Entropy Loss to Binary Cross Entropy Loss. ResNets address the problem of vanishing gradients by using residual blocks.
Figure 3: ResNet34 Architecture
Figure 4: Residual Block
4. Custom Architecture - Dataset Split: (28000, 3500, 3500). A simple CNN architecture with Maxpool and Dropout Layers as well as an Adam Optimizer and binary cross entropy loss (shown below).
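The Hamming loss named as the kNN objective above can be sketched in a few lines. This is a minimal illustration that assumes the standard definition (the share of example-label slots where prediction and truth disagree); the function name and the toy labels are hypothetical and are not taken from the project.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (example, label) slots where prediction and truth disagree.

    Both arguments are lists of equal-length 0/1 label vectors,
    one vector per movie poster.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must contain the same number of examples")
    wrong = 0
    total = 0
    for truth, pred in zip(y_true, y_pred):
        if len(truth) != len(pred):
            raise ValueError("label vectors must have the same length")
        wrong += sum(t != p for t, p in zip(truth, pred))
        total += len(truth)
    return wrong / total


# Hypothetical toy data: 3 posters, 4 genre slots each.
y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0]]

loss = hamming_loss(y_true, y_pred)   # 3 wrong slots out of 12
per_label_accuracy = 1 - loss         # share of slots predicted correctly
```

Under this definition, one minus the Hamming loss is the per-label accuracy. That would be consistent with the ResNet34 figures reported on this poster (Hamming loss 0.0938, accuracy 90.62%), although the poster does not state how its accuracy column was computed.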
● We used the Full MovieLens Dataset [1] from Kaggle, which consists of metadata collected from TMDB and GroupLens.
  ○ The dataset contains entries for 45,466 movies, and each entry contains various elements about the film such as genre, user rating, cast, and, most importantly, poster.
  ○ We preprocessed the dataset to remove entries with improper formatting, which simplified working with the data, and isolated the genres.
● We resized each individual poster to a 224x224 square grid.
  ○ Our raw input data is the color of each pixel in the image expressed in terms of RGB values: a 224x224x3 matrix.
  ○ The genres are encoded using a one-hot vector.
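A minimal sketch of the genre encoding described above, assuming that a movie with several genres gets a 1 in each matching slot. The genre vocabulary here is a hypothetical five-genre subset chosen for illustration, and `encode_genres` is not a function from the project.

```python
# Hypothetical subset of the genre vocabulary, for illustration only.
GENRES = ["Action", "Comedy", "Drama", "Horror", "Thriller"]

# Number of raw input values per poster: 224 x 224 pixels, 3 color channels.
INPUT_VALUES = 224 * 224 * 3


def encode_genres(movie_genres, vocabulary=GENRES):
    """Return a 0/1 vector with a 1 in the slot of every genre the movie has."""
    index = {genre: i for i, genre in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for genre in movie_genres:
        if genre not in index:
            raise ValueError(f"unknown genre: {genre}")
        vector[index[genre]] = 1
    return vector


label = encode_genres(["Drama", "Thriller"])
```

With this vocabulary, a movie tagged as both Drama and Thriller has two active slots, which is the form a sigmoid output layer with one unit per genre can be trained against.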
Figure 2: An example poster from our dataset
● The ResNet network and the Custom Architecture performed slightly better than ML-kNN in pure accuracy.
  ○ Both also performed better on other evaluation metrics such as F1 score, recall, and top-K categorical accuracy.
● The distribution of our dataset may predispose our model towards certain genres.
  ○ Class sizes ranged from 655 members (TV Movies) to 15,941 members (Dramas).
● Our model would also sometimes learn features that were prevalent in, but not intrinsic to, a given genre, which could account for errors.
  ○ This may also have been caused by issues of resolution.
Type Size Stride Outputs
CONV 5x5 2 64
MAXPOOL 2x2 --- 64
CONV 5x5 2 128
MAXPOOL 2x2 --- 128
CONV 5x5 2 256
MAXPOOL 2x2 --- 256
DROPOUT --- --- 256
FLATTEN --- --- 1024
RELU --- --- 128
DROPOUT --- --- 128
SIGMOID --- --- 20
Figure 5: The custom architecture
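The layer table can be checked with a short shape trace. This sketch assumes a 224x224 input, no padding on the convolutions, and a stride of 2 for each 2x2 max-pool (the table gives no pool stride); under those assumptions the flattened size works out to the 1024 listed in the table.

```python
def out_size(size, kernel, stride):
    """Spatial output size of a conv or pool layer with no padding."""
    return (size - kernel) // stride + 1


# (layer type, kernel size, stride, output channels) for the spatial layers.
# The pool stride of 2 is an assumption, not stated in the table.
LAYERS = [
    ("CONV", 5, 2, 64),
    ("MAXPOOL", 2, 2, 64),
    ("CONV", 5, 2, 128),
    ("MAXPOOL", 2, 2, 128),
    ("CONV", 5, 2, 256),
    ("MAXPOOL", 2, 2, 256),
]


def trace(input_size=224):
    """Return (layer type, spatial size, channels) after each layer."""
    size = input_size
    rows = []
    for name, kernel, stride, channels in LAYERS:
        size = out_size(size, kernel, stride)
        rows.append((name, size, channels))
    return rows


rows = trace()
sizes = [size for _, size, _ in rows]   # 110, 55, 26, 13, 5, 2
_, final_size, final_channels = rows[-1]
flattened = final_size * final_size * final_channels
```

The final feature map is 2x2 with 256 channels, so flattening gives 2 * 2 * 256 = 1024 values, matching the FLATTEN row.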
Binary Cross Entropy Loss
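A minimal sketch of binary cross entropy for a single poster, assuming the standard form: the mean over genre slots of -[y log p + (1 - y) log(1 - p)], where y is the 0/1 label and p is the sigmoid output. The function name and the clipping constant `eps` are illustrative choices, not details from the project.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross entropy over the genre slots of one example."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)   # clip so that log() stays finite
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


# A prediction of 0.5 for every slot costs log(2) per slot.
uninformed = binary_cross_entropy([1, 0], [0.5, 0.5])

# Confident, correct predictions give a loss close to zero.
confident = binary_cross_entropy([1, 0], [0.99, 0.01])
```

Because each genre slot contributes its own term, the loss treats the genres as independent yes/no decisions, which is what allows a poster to be assigned more than one genre.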
Model      At Least One Genre   All Genres   Hamming Loss   Test Loss   Accuracy
ResNet34   38.26%               12.49%       0.0938         0.2486      90.62%
Genre Recall Precision F1 Count
Animation 0.44 0.84 0.58 135
Comedy 0.40 0.77 0.52 1135
Drama 0.39 0.68 0.50 1558
Horror 0.32 0.52 0.40 406
Family 0.23 0.75 0.35 233
Table 2: Performance of ResNet34 Model
Table 3: Top 5 class performances by ResNet34
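The F1 column in the per-class tables can be reproduced from the precision and recall columns. This is a minimal sketch assuming the usual definition of F1 as the harmonic mean of precision and recall; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values taken from the ResNet34 per-class table.
animation_f1 = round(f1_score(0.84, 0.44), 2)
drama_f1 = round(f1_score(0.68, 0.39), 2)
```

Recomputed values can differ from the table in the last digit because the published precision and recall are themselves rounded to two decimals.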
Figure 6: Top 5 Class performances by Custom Architecture
Figure 6: Top K Categorical Accuracy - Custom Architecture
Genre Recall Precision F1 Count
Drama 0.46 0.48 0.47 1558
Comedy 0.38 0.46 0.41 1135
Thriller 0.17 0.35 0.23 632
Horror 0.10 0.27 0.15 406
Action 0.08 0.26 0.12 526