Source: samuel.molinari.me/files/dissertation.pdf


People’s Control Over Their Own Image in

Photos on Social Networks

By Samuel Molinari (092083932)

8th May 2012

Supervisor: Dr. Steve Riddle

Word Count: 16,950


Abstract

In recent years, social networks have recognised that sharing photos with friends, family, or even the entire world is an important part of the social experience, and they have worked hard to make photo sharing a very easy task: users can upload photos directly from their cameras, or share them on the go via mobile phones. Yet whilst the owners of these photos are given ever more control over how they are shared, the people who appear in them have very few privacy options over their own image.

The motivation of this project is to help those people gain more control over their image when a photo of them is uploaded by a third party.


Declaration

“I declare that this dissertation represents my own work, except where otherwise stated.”


Acknowledgements

I would like to thank my supervisor, Dr. Steve Riddle, for his support throughout this project. I would also like to thank Ashleigh Turnbull for her support and help, which got me through this busy year.


Table of Contents

Abstract
Declaration
Acknowledgements
1. Introduction
    1.1. Problem
    1.2. Aim & Objectives
    1.3. Document Structure
2. Background Research
    2.1. Current Options Flaws
        2.1.1. Issues with Photo Sharing
        2.1.2. Lack of Awareness
        2.1.3. Intrusive Tagging System
        2.1.4. Unbalanced Control
    2.2. Controls Improvements
        2.2.1. Instantly Raise Awareness
        2.2.2. Automatically Hide Users
        2.2.3. Tagging
    2.3. Face Detection
        2.3.1. Overview
        2.3.2. Haar Features
        2.3.3. Classifier
        2.3.4. Cascade
        2.3.5. Sub-Windows
        2.3.6. Integral Image
    2.4. Face Recognition
        2.4.1. Overview
        2.4.2. Generate Eigenfaces
        2.4.3. Use Eigenfaces for Recognition
3. System Design
    3.1. Requirements
    3.2. Face Detector/Recogniser
        3.2.1. Class Diagram
        3.2.2. Activity Diagrams
    3.3. Mock-up Web Application
        3.3.1. Database
        3.3.2. Models
        3.3.3. Controllers
4. Image Processing Implementation
    4.1. Introduction
    4.2. Technologies Used
        4.2.1. OpenCV
        4.2.2. RMagick
    4.3. Face Detection
        4.3.1. Basic face detection program
        4.3.2. Upright Frontal Faces
        4.3.3. Tilted Frontal Faces
        4.3.4. Conclusion
    4.4. Face Recognition
        4.4.1. Overview
        4.4.2. Background Removal
        4.4.3. Equalise Histogram
        4.4.4. Keep the Faces Straight Up
    4.5. Filters
        4.5.1. Introduction
        4.5.2. Plain Cover
        4.5.3. Blurring
        4.5.4. Pixelate
        4.5.5. Face Replacement
    4.6. Conclusion
5. Mock-up Web Application Implementation
    5.1. Technologies Used
        5.1.1. HTML 5
        5.1.2. JavaScript/jQuery
        5.1.3. Ruby on Rails
        5.1.4. MySQL
    5.2. Photo Upload
        5.2.1. Drag & Drop
        5.2.2. Background Upload
    5.3. Face Recognition
        5.3.1. Run Script from Web Application
        5.3.2. Parse Face Database to Script
        5.3.3. Handling Script Output
        5.3.4. Train User Recognition
    5.4. Recognition Notifications
        5.4.1. On Recognition
        5.4.2. Notifications Action Centre
    5.5. Privacy Settings
        5.5.1. Automatic General Settings
        5.5.2. Automatic User Specific Settings
        5.5.3. Photo Specific Settings
    5.6. Photo Viewing
        5.6.1. Access Control
        5.6.2. Image Processing
        5.6.3. Photo Viewer
    5.7. Conclusion
6. Testing & Results
    6.1. Questionnaire
        6.1.1. How much control do you feel you have over photos of yourself being uploaded by others on social networks?
        6.1.2. Did you ever report a photo you were part of?
        6.1.3. Have you ever asked someone to remove a photo of you?
        6.1.4. Did someone ever share a photo of you that made you feel uncomfortable?
        6.1.5. If a picture made you feel uncomfortable and you didn't report it, why didn't you?
        6.1.6. Rate each of those features that could improve your control over photos being uploaded of yourself on social networks.
        6.1.7. Rate the following ideas on how you could possibly be hidden from a photo
    6.2. Face Detection
        6.2.1. Image Size
        6.2.2. Scaled Window Size
        6.2.3. Minimum Neighbours
        6.2.4. Tilted Faces
        6.2.5. Rotated Out of Image Pane Faces
        6.2.6. Lighting
    6.3. Face Recognition
        6.3.1. Training Set Size
        6.3.2. Background
        6.3.3. Equalise Histogram
        6.3.4. Straight Faces Only
    6.4. Filters
        6.4.1. Introduction
        6.4.2. Blurring
        6.4.3. Pixelate
    6.5. Privacy Settings
        6.5.1. Introduction
        6.5.2. Face Specific Settings
        6.5.3. Auto Global Settings
        6.5.4. Auto User Specific Settings
7. Evaluation
    7.1. Face Detection
    7.2. Face Recognition
    7.3. Filters
    7.4. Face Detection & Recognition within Web Application
    7.5. Notification
    7.6. Privacy Settings
    7.7. Photo Viewing
8. Conclusion
    8.1. Project Overview
    8.2. Further Work
        8.2.1. Detect Profile Faces + Improve Detection of Tilted Faces
        8.2.2. Replace Eigenfaces with Fisherfaces
        8.2.3. Improve Recognition Speed
        8.2.4. Detect Full Body
        8.2.5. Object Removal
        8.2.6. Caching System
        8.2.7. WebSocket
Bibliography
Appendices
    Appendix A – Questionnaire
        Results
    Appendix B – Compute Pixel Region under Haar Feature without Integral Image
    Appendix C – Application of a Haar feature over an image
    Appendix D – AdaBoost Example
    Appendix E – Example of the Creation of an Integral Image
    Appendix F – Steps Undertaken to Implement a Non-Cropping Rotation
    Appendix G – Tracking a Detected Face During Rotation
    Appendix H – Straight Up Tilted Faces
    Appendix I – WebSocket v. AJAX Polling


1. Introduction

1.1. Problem

Social networks have become the easiest way to share photos with friends or anybody else on the web. More than 2.5 billion photos are uploaded to Facebook every month, making it one of the biggest photo-sharing websites on the web [1].

Many of the uploaded photos are portraits or group photos, often uploaded straight from the user's camera. These mass uploads are never filtered, and people are often unaware that a photo of themselves has been uploaded without their consent.

Social networks are currently improving the privacy controls a user has over their own profile; however, a flaw persists in the area of photo sharing: the people in a photo who are not its owner have very limited control over their own image. This is troubling, especially when these photos can be shared with millions worldwide.

1.2. Aim & Objectives

The aim of this project is to develop system-driven features (automated, requiring no user interaction) that can be implemented in a social network environment, to give users more control over shared photos of themselves, even when they are not the owners of those photos.

The objectives are:

- Investigate the flaws of sharing photos on social networks
- Use a survey to analyse different ways privacy could be improved when sharing photos, through the use of face detection and recognition
- Investigate face detection and recognition algorithms
- Establish a list of requirements using the data gathered from the two previous objectives
- Implement face detection and recognition, focusing mainly on achieving a low rate of false negatives while avoiding excessively high running times
- Build a photo-sharing web application mock-up to demonstrate the implementation of features found to improve the privacy of users in photos on social networks

1.3. Document Structure

This document is split into seven main sections. Sections 2.1 and 2.2 describe the current options social networks offer users for controlling their privacy in photos of themselves posted by others, and give a list of possible improvements that could be made.

Sections 2.3 and 2.4 describe the algorithms used in the final system for face detection and face recognition.

The design of the scripts and system is introduced in section 3.

Section 4 covers the work produced to achieve the goal of this project, focusing on the image processing. Section 4.3 presents the implementation of the face detection, with the improvements that were made to produce better results. It is followed by the implementation of the face recognition in section 4.4, explaining the techniques used to obtain more accurate recognition. Finally, a description of each filter used to hide faces in photos is given in section 4.5.

The construction of the mock-up social photo-sharing website, built to demonstrate the implementation of the improved privacy controls, can be found in section 5, with a run-through of how it was built.

Section 6 contains all the tests carried out during the implementation of the system, section 7 is the evaluation of the final system, and the conclusion in section 8 summarises how the project went and how it could be improved.


2. Background Research

2.1. Current Options Flaws

2.1.1. Issues with Photo Sharing

In the survey carried out for this project, 65.2% of respondents had experienced some discomfort when someone shared a photo of them, and over 70% felt they had little control over photos of themselves uploaded by someone else (see Appendix A – Questionnaire).

2.1.2. Lack of Awareness

When a third party uploads a photo of another user, that user mainly finds out through user-driven notifications: by being tagged, or by being told. Users may also become aware of a photo of themselves from their news feed, but this only occurs if they are friends with the photo's owner, and it also depends on the user checking their account regularly.

2.1.3. Intrusive Tagging System

Tagging is a good way for a user to become aware of the photos they are in, but the drawback is that once a user is tagged in a photo, the photo becomes available to all their friends. Tagging thus becomes more of a privacy leak, and can be very intrusive. One discovery made while researching tags on Facebook was how hard it is to un-tag oneself from a photo: when someone else tags a user in a photo, that user can't directly remove the tag; instead, they have to report the photo.

2.1.4. Unbalanced Control

The owner of an uploaded photo always has full control over it, disregarding any users who are actually present in the photo.

Owners have the following rights:

- Decide on the access control
- Tag people they know
- Remove any tags

People in the photo can:

- Tag anybody
- Filter tags (not set by default)
- Remove the photo from their timeline
- Report the photo

They can't, however, remove tags of themselves added by others.

Those controls are very weak, and can take some time to be applied. From the survey results, only 8.7% of users had ever used the report function, while slightly more than 50% had asked someone to remove a photo of them. When asked why they didn't report photos that made them feel uncomfortable: 34.8% said they weren't convinced that reporting the photo would lead to anything, 21.7% found it too much hassle, and 43.5% didn't report because it was a group photo (see Appendix A – Questionnaire).

2.2. Controls Improvements

2.2.1. Instantly Raise Awareness

To improve users' awareness of photos of themselves being shared, face recognition can be used as a notification tool: when a photo is uploaded, it can be scanned to find the people present in it, and a notification sent to each of them as soon as they have been found. From the survey, close to 90% of respondents thought this would be a good feature to have (see Appendix A – Questionnaire).

2.2.2. Automatically Hide Users

When users are found, pre-set privacy settings can be used to hide them from others; the user can remove this filter later on. 78.2% of respondents would like to see such a feature implemented to improve their control over their image on social networks (see Appendix A – Questionnaire).

A basic privacy setting can be chosen from the following options:

Hide from

1. everybody

2. people who are not part of the photo

3. the general public (not their friends)

4. nobody

Users can also be offered advanced privacy settings, such as hiding themselves from specific people, choosing to hide whenever they are found in photos with certain other users, and choosing the type of filter used to cover their face.
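The four "hide from" levels above amount to a visibility check at viewing time. The following is a minimal sketch of such a check, not the project's implementation; the function and setting names are illustrative, and the assumption that photo participants may see each other under the "general public" level is one possible interpretation.

```python
# Four "hide from" levels, matching the list above (names are hypothetical).
HIDE_EVERYBODY = 1
HIDE_NON_PARTICIPANTS = 2
HIDE_PUBLIC = 3
HIDE_NOBODY = 4

def face_visible_to(viewer, face_owner, setting, people_in_photo, friends):
    """Return True if `viewer` may see `face_owner`'s unfiltered face.

    `people_in_photo` is the set of users recognised in the photo;
    `friends` is the set of `face_owner`'s friends.
    """
    if viewer == face_owner:
        return True                      # users always see themselves
    if setting == HIDE_EVERYBODY:
        return False
    if setting == HIDE_NON_PARTICIPANTS:
        return viewer in people_in_photo
    if setting == HIDE_PUBLIC:
        # hidden from the general public, visible to friends and participants
        return viewer in friends or viewer in people_in_photo
    return True                          # HIDE_NOBODY

# Example: Alice hides her face from the general public.
photo = {"alice", "bob"}
alices_friends = {"carol"}
print(face_visible_to("bob", "alice", HIDE_PUBLIC, photo, alices_friends))   # True (in photo)
print(face_visible_to("dave", "alice", HIDE_PUBLIC, photo, alices_friends))  # False (stranger)
```

When the check fails, the system would serve the photo with the chosen filter applied over that face rather than the original pixels.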

2.2.3. Tagging

As previously shown in section 2.1.3, tagging can become very intrusive, but it can actually be used to improve users' privacy: when a user is tagged, they become the owner of that region of the photo, and can then apply any filter to that area if they wish to do so.
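Applying a filter to an owned region amounts to transforming only the pixels inside that rectangle. As an illustration, here is a pure-Python sketch of the pixelate filter (one of the filters listed in section 4.5) applied to a tagged region of a greyscale image held as a list of rows; the project itself uses RMagick for this, so this is not its code.

```python
def pixelate_region(image, x, y, w, h, block=2):
    """Replace each `block`x`block` cell inside the region (x, y, w, h)
    with the average of its pixels, coarsening the face beyond recognition.
    `image` is a list of rows of greyscale values, modified in place."""
    for by in range(y, y + h, block):
        for bx in range(x, x + w, block):
            # Collect the pixel coordinates of this cell, clipped to the region.
            cells = [(r, c)
                     for r in range(by, min(by + block, y + h))
                     for c in range(bx, min(bx + block, x + w))]
            avg = sum(image[r][c] for r, c in cells) // len(cells)
            for r, c in cells:
                image[r][c] = avg
    return image

img = [[0, 10, 20, 30],
       [40, 50, 60, 70],
       [80, 90, 100, 110],
       [120, 130, 140, 150]]
pixelate_region(img, 0, 0, 2, 2)   # pixelate only the top-left 2x2 region
print(img[0][:2])                  # [25, 25]: the block average replaces the detail
```

The same region-restricted approach works for the other filters (plain cover, blurring, face replacement); only the per-cell operation changes.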

2.3. Face Detection

2.3.1. Overview

A number of face detection algorithms have been proposed, such as the neural network-based face detection by H. Rowley, S. Baluja and T. Kanade [2], an algorithm to detect upright, frontal views of faces in grayscale images.


The 3D object detection method by H. Schneiderman and T. Kanade [3] is a statistical method for 3D object detection. It represents the statistics of both object appearance and "non-object" appearance using a product of histograms, each representing the joint statistics of a subset of wavelet [4] coefficients and their position on the object. The approach uses many such histograms, representing a wide variety of visual attributes.

Here is a comparison of those algorithms with the Viola-Jones detection framework:

Detector \ False detections    10      31      50      65      78      95      167
Viola-Jones                   76.1%   88.4%   91.4%   92.0%   92.1%   92.9%   93.9%
Viola-Jones (voting)          81.1%   89.7%   92.1%   93.1%   93.1%   93.2%   93.7%
Rowley-Baluja-Kanade          83.2%   86.0%     -       -       -     89.2%   90.1%
Schneiderman-Kanade             -       -       -     94.4%     -       -       -
Roth-Yang-Ahuja                 -       -       -       -    (94.8%)    -       -

Table 1: Detection rates for various numbers of false positives on the MIT+CMU test set containing 130 images and 507 faces [5]

The Viola-Jones detection framework [5] was chosen to handle the face detection for this project, as its performance is good and the framework is fully supported by one of the libraries that were used. The following section provides an outline of the framework.

2.3.2. Haar Features

Figure 1: Extended set of Haar like features from Lienhart, and Maydt [6] used for face detection in the Viola-Jones detection framework

The face detection framework by Paul Viola and Michael Jones uses a set of Haar-like features (Figure 1). Each feature is composed of two or three (sometimes more) regions, or rectangles. Each region has a scalable size, meaning it is not restricted by the size of the detection window, and each has a designated position within that window. Their original values are based on a window size of 20x20. Each region in a feature has a weight, represented by a black or white colour: black regions are given a negative weight and white regions a positive weight. Some features also have a rotation value if the feature needs to be tilted.


Figure 2: Structure of a Haar like feature

Haar-like features offer a way of analysing a set of pixels and turning them into useful data, giving an idea of whether the region being analysed could represent an edge, a line or another simple structure. All faces have similarities, features that can be found in most people. For example, shadowed areas can often be found under and alongside the nose, eyes, mouth and face edges. When a feature is placed over the detection window, the sum of all the pixels that lie within each of the feature's regions is calculated using the integral image described in section 2.3.6 (a pixel value can be between 0 and 255 in greyscale images, as shown in Figure 3), multiplied by the region's weight; the weighted totals of all the regions are then summed.

Figure 3: Greyscale images and pixel values

When saying the feature is placed over the image, it is an abstract description of what is really

happening. In a computing environment, the images are matrices, where the width is the number of

columns of the matrix, the height is the number of rows, and the pixels are the elements in the


matrix. Therefore, a plain black greyscale image of size 3x3 pixels would be represented as follows (0 = black):

    [ 0 0 0 ]
    [ 0 0 0 ]
    [ 0 0 0 ]

An example of how a Haar feature is applied on an image can be found in Appendix C – Application

of a Haar feature over an image.
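Alongside the appendix example, the weighted-sum idea can be sketched in a few lines of C++. This is a toy illustration (not the project's code): a two-region "edge" feature evaluated by direct pixel sums, with the top (white) half weighted +1 and the bottom (black) half -1. A real detector would compute these sums via the integral image instead.

```cpp
#include <vector>

// Evaluate a toy two-region "edge" feature over the window whose top-left
// corner is (x, y) and whose size is w x h: pixels in the top half count
// positively (white region), pixels in the bottom half negatively (black).
int haarEdgeFeature(const std::vector<std::vector<int>>& img,
                    int x, int y, int w, int h) {
    int value = 0;
    for (int r = y; r < y + h; ++r)
        for (int c = x; c < x + w; ++c)
            value += (r < y + h / 2) ? img[r][c] : -img[r][c];
    return value;
}
```

A large positive value suggests the top of the window is brighter than the bottom, the kind of contrast found, for example, between a forehead and the eye region.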

2.3.3. Classifier

The first step toward building a successful face detector is to train the machine to detect a face based on some given features. The goal of machine learning is to turn given data into useful information. In the case of face detection, the machine is given a matrix (the data) representing an image, and should tell us whether this matrix is a face or not. The Viola-Jones framework uses a type of machine learning called boosting; more specifically, it uses Adaptive Boosting (AdaBoost).

For the machine to accomplish the detection, a set of features must be provided and trained. The training process applies each feature to multiple images: a set of positive and a set of negative images (each set must contain thousands of images for the training to work). The positive images must all contain faces, where the position of the face is known, so the Haar feature can be applied properly. The negative images can be anything, but they must not contain any faces. Each image produces a result for each feature. This data will be used for boosting.

As described in the article written by Yoav Freund and Robert Schapire, boosting is a means of obtaining a more accurate prediction by combining a number of "rules of thumb" [7].

Each feature gets its own boost. The values from the learning session are put into a training set {(x_1, y_1), ..., (x_m, y_m)} where:

- x_i is the instance (data)
- y_i is the label (where y_i ∈ {-1, +1})

Each element of the training set is assigned a weight; for the first round of boosting they all take a weight of 1/m for i = 1, ..., m. This set of weights is also called a distribution, D_1.

The classification is split into a defined number of rounds T, where for each round t a new weak classifier is generated and called. During each round the distribution of the weights is updated: the weight of each misclassified item is increased, and the correctly classified items get their weight decreased.

The pseudo code of the AdaBoost algorithm is the following:

Given: (x_1, y_1), ..., (x_m, y_m) where x_i ∈ X, y_i ∈ {-1, +1}

Initialise D_1(i) = 1/m

For t = 1, ..., T:

    Train weak learner using distribution D_t

    Get weak hypothesis h_t : X → {-1, +1} with error ε_t = Pr_{i ~ D_t}[h_t(x_i) ≠ y_i]

    Choose α_t = (1/2) ln((1 - ε_t) / ε_t)

    Update:

        D_{t+1}(i) = (D_t(i) / Z_t) × e^{-α_t}   if h_t(x_i) = y_i
                     (D_t(i) / Z_t) × e^{α_t}    if h_t(x_i) ≠ y_i

                   = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t

    where Z_t is a normalisation factor (chosen so that D_{t+1} will be a distribution).

Output the final hypothesis:

    H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )

An example showing how AdaBoost works step by step can be found in Appendix D – AdaBoost

Example.
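The boosting loop above can also be sketched in code. The following is a minimal, self-contained C++ illustration using one-dimensional threshold "stumps" as the weak learners; the data, the stump learner and all names here are invented for illustration (the project's detector boosts Haar features, not raw 1-D values).

```cpp
#include <cmath>
#include <vector>

// A stump predicts +1 on one side of its threshold and -1 on the other.
struct Stump { double thresh; int polarity; double alpha; };

int stumpPredict(const Stump& s, double x) {
    return s.polarity * (x < s.thresh ? 1 : -1);
}

std::vector<Stump> adaboost(const std::vector<double>& xs,
                            const std::vector<int>& ys, int rounds) {
    const std::size_t m = xs.size();
    std::vector<double> D(m, 1.0 / m);  // D_1(i) = 1/m
    std::vector<Stump> model;
    for (int t = 0; t < rounds; ++t) {
        // Pick the stump with the lowest weighted error under D_t.
        Stump best{0.0, 1, 0.0};
        double bestErr = 1e9;
        for (double th : xs) {
            for (int pol : {1, -1}) {
                Stump cand{th, pol, 0.0};
                double err = 0.0;
                for (std::size_t i = 0; i < m; ++i)
                    if (stumpPredict(cand, xs[i]) != ys[i]) err += D[i];
                if (err < bestErr) { bestErr = err; best = cand; }
            }
        }
        best.alpha = 0.5 * std::log((1.0 - bestErr) / (bestErr + 1e-12));
        // Re-weight: misclassified samples gain weight, correct ones lose it.
        double Z = 0.0;
        for (std::size_t i = 0; i < m; ++i) {
            D[i] *= std::exp(-best.alpha * ys[i] * stumpPredict(best, xs[i]));
            Z += D[i];
        }
        for (double& d : D) d /= Z;
        model.push_back(best);
    }
    return model;
}

// Final hypothesis: H(x) = sign(sum_t alpha_t * h_t(x)).
int strongClassify(const std::vector<Stump>& model, double x) {
    double s = 0.0;
    for (const Stump& st : model) s += st.alpha * stumpPredict(st, x);
    return s >= 0.0 ? 1 : -1;
}
```

The structure mirrors the pseudocode: a uniform starting distribution, the lowest-error weak learner chosen each round, a weight update driven by exp(-α y h(x)), and a weighted vote as the final hypothesis.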

Viola and Jones slightly changed the algorithm so that it picks a few Haar features from the available set of features (180,000 for an image of size 24x24). It is also arranged so that all weak classifiers use a different feature:

Given example images (x_1, y_1), ..., (x_n, y_n) where y_i = 0, 1 for negative and positive examples respectively.

Initialise weights w_{1,i} = 1/(2m), 1/(2l) for y_i = 0, 1 respectively, where m and l are the number of negatives and positives respectively.

For t = 1, ..., T:

    1. Normalise the weights, w_{t,i} ← w_{t,i} / Σ_{j=1}^{n} w_{t,j}, so that w_t is a probability distribution.

    2. For each feature j, train a classifier h_j which is restricted to using a single feature. The error is evaluated with respect to w_t: ε_j = Σ_i w_i |h_j(x_i) - y_i|.

    3. Choose the classifier h_t with the lowest error ε_t.

    4. Update the weights: w_{t+1,i} = w_{t,i} β_t^{1-e_i}, where e_i = 0 if example x_i is classified correctly, e_i = 1 otherwise, and β_t = ε_t / (1 - ε_t).

The final strong classifier is:

    C(x) = 1 if Σ_{t=1}^{T} α_t h_t(x) ≥ (1/2) Σ_{t=1}^{T} α_t, and 0 otherwise

where α_t = log(1 / β_t).

The next section shows how those classifiers are used in the face detection.

2.3.4. Cascade

AdaBoost provides a set of weak classifiers. Each classifier h_j(x) consists of a feature f_j, a threshold θ_j and a parity p_j indicating the direction of the inequality sign:

    h_j(x) = 1 if p_j f_j(x) < p_j θ_j, and 0 otherwise

This set of selected features, each part of a weak classifier, is called a cascade. For an object to be successfully identified as a face, each of these classifiers must return a positive value. The features are ordered very carefully for the cascade to perform efficiently: the classifiers with the lower error rates (which perform well using little processing power) are used first. The last classifier used is the one requiring the most processing power to decide whether a face is detected or not.


Figure 4: Classifier Cascade
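The early-reject behaviour of the cascade can be sketched as follows. This is purely illustrative: the integer "window" and the stage predicates stand in for real sub-windows and boosted stage classifiers, which this sketch does not implement.

```cpp
#include <functional>
#include <vector>

// Stages are ordered cheapest-first; a window is rejected as soon as any
// stage says "no", so most sub-windows exit after the first cheap test.
bool runCascade(const std::vector<std::function<bool(int)>>& stages,
                int window) {
    for (const auto& stage : stages)
        if (!stage(window)) return false;  // early reject
    return true;  // accepted by every stage: reported as a face
}
```

The efficiency of the real cascade comes from exactly this shape: the overwhelming majority of sub-windows contain no face and are discarded by the first, cheapest stages.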

2.3.5. Sub-Windows

For the algorithm to cover the entire image, it must split it into many sub-windows. Each sub-window is scanned for faces using the cascade approach described above. The window starts with a minimum width w and height h, at position (x, y) where x = 0 and y = 0. This sub-window is then moved left to right, top to bottom. Once the sub-window has reached the bottom-right corner of the image, its size is increased by the scale factor and it returns to position (0, 0). This is repeated until the sub-window becomes too big for the image.
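The scan just described can be sketched as a small enumeration routine. The parameter names (minSize, scale, step) are ours, chosen for illustration; they loosely correspond to the minimum window size, the scale factor and the window stride.

```cpp
#include <vector>

struct Window { int x, y, size; };

// Enumerate (x, y, size) triples for a square scan window that sweeps the
// image left to right, top to bottom, then grows by `scale` and sweeps
// again, until it no longer fits.
std::vector<Window> subWindows(int imgW, int imgH, int minSize,
                               double scale, int step) {
    std::vector<Window> out;
    for (double s = minSize; s <= imgW && s <= imgH; s *= scale) {
        const int size = static_cast<int>(s);
        for (int y = 0; y + size <= imgH; y += step)       // top to bottom
            for (int x = 0; x + size <= imgW; x += step)   // left to right
                out.push_back({x, y, size});
    }
    return out;
}
```

Even on a tiny 4x4 image with a 2-pixel starting window, this yields ten sub-windows, which hints at why the per-window cost matters so much on real images.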

2.3.6. Integral Image

One of the main features of this face detection algorithm is the use of an integral image. When using Haar-like features, the sum of the pixels under each region must be computed before they can be analysed. The integral image allows this computation to be done in constant time (instead of computing the sum pixel by pixel, as shown in Appendix B – Compute Pixel Region under Haar Feature without Integral Image). The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive:

    ii(x, y) = Σ_{x' ≤ x, y' ≤ y} i(x', y')

where i(x, y) is the original image and ii(x, y) is the integral image. As shown in Figure 5, the sum of the pixels within rectangle D can be computed with four array references. The value of the integral image at location 1 is the sum of the pixels in rectangle A. The value at location 2 is A+B, at location 3 is A+C, and at location 4 is A+B+C+D. The sum within D can therefore be computed as 4+1-(2+3). [5]

Figure 5: Computing a rectangle sum with four integral-image references


Appendix E – Example of the Creation of an Integral Image goes through an example of how an integral image is created and used. Using an integral image in the algorithm dramatically improves the speed of detection, especially considering that the sum of pixels has to be computed for each feature's regions, in each of the sub-windows, at least once.
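The construction and the four-reference lookup can be sketched directly from the definitions above (an illustrative sketch, not the project's code):

```cpp
#include <vector>

using Image = std::vector<std::vector<int>>;

// ii(x, y) holds the sum of all pixels above and to the left of (x, y),
// inclusive, built in a single pass from the recurrence
// ii(x, y) = i(x, y) + ii(x-1, y) + ii(x, y-1) - ii(x-1, y-1).
Image integralImage(const Image& img) {
    const int h = img.size(), w = img[0].size();
    Image ii(h, std::vector<int>(w, 0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            ii[y][x] = img[y][x]
                     + (x > 0 ? ii[y][x - 1] : 0)
                     + (y > 0 ? ii[y - 1][x] : 0)
                     - (x > 0 && y > 0 ? ii[y - 1][x - 1] : 0);
    return ii;
}

// Sum of the rectangle with top-left (x0, y0) and bottom-right (x1, y1),
// inclusive, using the four-reference trick: D = 4 + 1 - (2 + 3).
int rectSum(const Image& ii, int x0, int y0, int x1, int y1) {
    const int a = (x0 > 0 && y0 > 0) ? ii[y0 - 1][x0 - 1] : 0;
    const int b = (y0 > 0) ? ii[y0 - 1][x1] : 0;
    const int c = (x0 > 0) ? ii[y1][x0 - 1] : 0;
    return ii[y1][x1] - b - c + a;
}
```

Once the integral image is built, every rectangle sum costs four lookups regardless of the rectangle's size, which is what makes evaluating thousands of Haar features per sub-window affordable.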

2.4. Face Recognition

2.4.1. Overview

When researching face recognition methods, two were found that seem to give decent results. One of them is called Fisherfaces [8]: Linear Discriminant Analysis, a type of discriminant analysis, is used to find a subspace representation of a set of face images, and the resulting basis vectors defining that space are known as Fisherfaces1. This technique is actually better than the one described below, because it is insensitive to large variations in lighting and changes in facial expression, which is a big drawback of Eigenfaces.

Eigenfaces aims to capture the important features of a face, not only the ones that are obvious to the human eye, such as the eyes, nose and mouth. It also represents images more efficiently, reducing the computation and space complexity. The following sections describe each step undertaken to build a reliable face recogniser using eigenfaces [9].

2.4.2. Generate Eigenfaces

As for face detection, the program has to be trained to recognise the face of a person; it is therefore required to give it a set of images of that individual's face so the person's eigenfaces can be computed2 (see Figure 6).

Figure 6: Training set of faces © Copyright: AMC 2011

1. Each image should have the same resolution, and the eyes, nose and mouth should all overlap (preferably). Training images are all treated as vectors: if the common resolution of all the images in the set is w x h, then by simply concatenating its rows, each image results in a single row vector with w x h elements (see Figure 7: Training image turned into a vector).

1 http://www.scholarpedia.org/article/Fisherfaces

2 http://www.scholarpedia.org/article/Eigenfaces#Computing_the_Eigenfaces


Figure 7: Training image turned into a vector

All training images are then stored in a single matrix where each row of the matrix is an image.

2. To generate the eigenfaces, the average of the images in the training set {Γ_1, ..., Γ_M} is computed:

    Ψ = (1/M) Σ_{i=1}^{M} Γ_i

In more detail, the pixels at coordinates (x, y) in each image of the set are added together, then each pixel sum is divided by the number of images in the set; the outcome is the average of the pixel at coordinates (x, y) (see Figure 8).

Figure 8: Average face of the training set in Figure 6

Once the average face is calculated, it is subtracted from each face in the training set:

    Φ_i = Γ_i - Ψ

3. The covariance matrix C is the matrix used to generate the eigenvectors and eigenvalues, where

    C = (1/M) Σ_{n=1}^{M} Φ_n Φ_nᵀ = A Aᵀ

where the matrix A = [Φ_1 Φ_2 ... Φ_M] and Aᵀ is the transpose of the matrix A:

    Aᵀ = [Φ_1ᵀ; Φ_2ᵀ; ...; Φ_Mᵀ]

4. Due to the complexity of generating the eigenvectors of the covariance matrix C, it is a near impossible task for typical image sizes, when N² ≫ M. The solution for computing the eigenvectors of A Aᵀ efficiently is to compute the eigenvectors of the much smaller M x M matrix AᵀA.

The eigenvector and eigenvalue matrices of AᵀA are defined as V = {v_1, ..., v_r} and Λ = {λ_1, ..., λ_r}, with AᵀA V = V Λ, where r is the rank of A.


5. The eigenvalue and eigenvector matrices of A Aᵀ are Λ and U = AV, where U = {u_1, ..., u_r} is the collection of eigenfaces.
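Steps 1 and 2 above (flattening, averaging and mean-subtraction) can be sketched in a few lines. This is an illustrative sketch with toy data; the eigen-decomposition itself (steps 3-5) is left to a linear algebra library in practice.

```cpp
#include <vector>

using Vec = std::vector<double>;

// Images are assumed already flattened row-by-row into vectors of w*h
// elements; compute the average face Ψ across the training set.
Vec averageFace(const std::vector<Vec>& faces) {
    Vec avg(faces[0].size(), 0.0);
    for (const Vec& f : faces)
        for (std::size_t p = 0; p < f.size(); ++p) avg[p] += f[p];
    for (double& v : avg) v /= faces.size();
    return avg;
}

// Φ_i = Γ_i - Ψ: subtract the average face from every training face.
std::vector<Vec> subtractAverage(const std::vector<Vec>& faces,
                                 const Vec& avg) {
    std::vector<Vec> diffs = faces;
    for (Vec& f : diffs)
        for (std::size_t p = 0; p < f.size(); ++p) f[p] -= avg[p];
    return diffs;
}
```

The resulting difference vectors Φ_i are the columns of the matrix A from which the covariance matrix and the eigenfaces are derived.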

2.4.3. Use Eigenfaces for Recognition

Figure 9: Eigenfaces and a new face projected onto the face space. Orange dot: new image projection. Green dots: projections of the training set.

Once the eigenfaces are calculated, a subset of M' of them is chosen: those associated with the largest eigenvalues:

    {u_1, ..., u_{M'}}

The outcome is the creation of a face space (of M' dimensions), where the origin is the average face and its axes are the eigenfaces. The recognition process is performed by computing distances within, or from, the face space.

To compare a new face Γ with the reconstructed known faces, all must be projected onto the face space. The new face projected into the face space is represented by the weights

    ω_k = u_kᵀ (Γ - Ψ), for k = 1, ..., M'

which together form the projection Ω = [ω_1, ..., ω_{M'}]ᵀ.

Once all the projections are created, one can start calculating the distance of the new face compared to each known face:

    ε_k = ||Ω - Ω_k||

where Ω_k is the projection of the known face k in the face space. After computing all the distances, the face k with the lowest distance is the most likely to resemble the new face. This value can then be compared against a threshold value θ3:

    recognised as face k,  if min_k ε_k < θ
    unknown face,          otherwise

3 http://www.scholarpedia.org/article/Eigenfaces#Face_recognition
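The nearest-neighbour decision just described can be sketched as follows. The function names and the two-dimensional toy projections are illustrative only; real weight vectors have M' components.

```cpp
#include <cmath>
#include <vector>

using Vec = std::vector<double>;

// Euclidean distance ||Ω - Ω_k|| between two weight vectors in face space.
double faceSpaceDistance(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Return the index of the closest known projection, or -1 when even the
// best match is further than the threshold (i.e. an unknown face).
int recognise(const Vec& newFace, const std::vector<Vec>& known,
              double threshold) {
    int best = -1;
    double bestDist = threshold;
    for (std::size_t k = 0; k < known.size(); ++k) {
        const double d = faceSpaceDistance(newFace, known[k]);
        if (d < bestDist) { bestDist = d; best = static_cast<int>(k); }
    }
    return best;
}
```

The threshold is what separates "closest known face" from "recognised face": without it, every new image would be labelled as whichever training face happens to be least distant.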


3. System Design

3.1. Requirements

ID  | Cross-Ref   | Priority | Requirement

Image Processing

R1  | 4.3.2       | H | Upright face detection
R2  | 4.3.3       | M | Tilted face detection
R3  | 4.3.3       | M | Profile face detection
R4  | 4.4         | H | Face recognition
R5  | 4.5.4       | H | Apply pixelate filter on a defined region of an image
R6  | 4.5.3       | H | Apply blur filter on a defined region of an image
R7  | 4.5.2       | H | Apply plain cover over a defined region of an image
R8  | 4.5.5       | L | Replace an image region with another image
R9  |             | L | Remove an object from an image

Mock-up Web Application

R10 | 5.3.1       | H | Run face detection and recognition from the Ruby on Rails framework
R11 | 5.3.2       | H | Run face recognition using an existing database of known users
R12 | 5.3.3       | H | Notification system so users instantly know when they are recognised in a photo
R13 | 5.4.2/5.6.3 | H | Allow users to confirm/deny a face recognition
R14 |             | M | Implement account creation
R15 |             | M | Implement safe login
R16 | 5.2         | H | Implement image upload
R17 | 5.2.2       | L | Simultaneous asynchronous multiple photo upload
R18 | 5.2.1       | L | Drag and drop image upload
R19 | 5.6.3       | M | Position tag circles over the areas of the image with detected faces
R20 | 5.6.3       | M | Allow users to tag detected faces that haven't been associated with a user
R21 | 5.5.3       | H | Allow users to apply a filter of their choice to their recognised/tagged face in a photo
R20 | 5.5.1       | H | Allow users to choose to hide automatically from a group of people when they are recognised in a photo, with the filter of their choice
R21 | 5.5.2       | M | Give users advanced options to hide from specific users with the filter of their choice
R22 | 5.5.2       | L | Give users an extra option to hide themselves from specific users when found in a photo with specific users
R23 | 5.4.1       | H | Generate a new image depending on the viewer and the privacy settings of each user recognised in the photo


3.2. Face Detector/Recogniser

3.2.1. Class Diagram

Figure 10: Face Detection and Recognition Class Diagram 4

4 See a better quality of this diagram on the attached CD: /umldiagrams/fig10.png


3.2.2. Activity Diagrams

Figure 11: Face Detection Activity Diagram

For the detection to happen, a parameter must be passed through targeting an image file. The face detection goes through a few steps. The first ones focus on preparing the image for detection (resizing and converting to greyscale); the detection is then repeated 12 times with different rotation angles, so that faces at any angle can be found. For each of those iterations, the maximum and minimum face sizes are updated so the detection can go faster. When the rotation of the image is not 0°, the detected faces' original positions are computed, to then be saved in a final vector.

The face recognition can only be run if a set of known faces is passed to the script. Multiple CSV files can be loaded to retrieve lists of image paths and their matching labels. Once the images and labels are loaded, the eigenfaces are computed and projected onto a face space.


Figure 12: Initialisation For Face Recognition

Once the script is ready to run the recognition, a face (image) can be passed in for recognition. A projection of this face is computed so it can be compared with all the projections generated from the data set, by calculating its distance to each of them. Some filtering and sorting is applied before the face can be given a label. At the end, the user is given a vector of pairs, where each pair has a label and its distance value; this vector is sorted so the first element is the label that is most likely to be the right one for that face.


Figure 13: Run Face Recognition on a Given Face


3.3. Mock-up Web Application

3.3.1. Database

Figure 14: Database Diagram5

3.3.2. Models

Here is the class diagram of the main models used in the web application. The User, Photo and Face models are the backbone of the application. The User model manages the users interacting with the system, the Photo model focuses on the photos uploaded by the users, and the Face model holds all the faces detected by the C++ scripts for each photo.

The PendingRecognition model is used for notifications and for recognitions waiting to be confirmed or denied by the targeted user.

5 See a better quality of this diagram on the attached CD: /umldiagrams/fig14.png


All the other models (AutoHideFromUser, AutoHideWhenWithUser and FaceHideFromUser) are used for privacy settings.

Figure 15: Ruby on Rails Model Diagram6

The relationships between the database tables are set directly within the models.

A user can have many photos, faces, pending recognitions, face hidden from users, auto hide when

with users, and auto hide from users.

A photo belongs to a user and can have many faces; a face can belong to a user and a photo, and

share many pending recognitions with a user.

All the models used for the privacy settings belong to a user.

6 See a better quality of this diagram on the attached CD: /umldiagrams/fig15.png


3.3.3. Controllers

Figure 16: Ruby on Rails Controller Diagram7

Controllers are classes where each public method, called an action, is accessible from the web browser. They all extend the ApplicationController class. The application controller has a before_filter (a set of actions that should be run before the requested action is executed). For each controller call, the init() action is first executed to set the user global variable that will be used in all the controllers. Once the user is initialised, a second method, require_auth, is executed before the controller is loaded. By default, all controllers require a user to be logged in, except for a few actions:

AuthController

o login()

o authenticate()

o reset()

AccountController

o create()

4. Image Processing Implementation

4.1. Introduction

Before writing an entire system with improved privacy settings, scripts had to be written to support those privacy settings and add automation to the system.

The following sections describe the implementation of the scripts needed to support such a system. Section 4.3 describes the functions used for the face detection, with some techniques developed to improve the rate of false negatives (faces not detected). The implementation of the face recognition is then described in section 4.4, with, once again, a few techniques developed to

7 See a better quality of this diagram on the attached CD: /umldiagrams/fig16.png


improve the quality of recognition. Finally, the features used for hiding faces are described in section 4.5.

All the following implementations focus mostly on the quality of the scripts; speed was still considered but was not a priority.

4.2. Technologies Used

4.2.1. OpenCV

OpenCV8 (Open Computer Vision) is a library distributed under the BSD licence (Berkeley Software Distribution license), provided for free by Intel Corporation. The following work uses the C/C++ interfaces of version 2.3.1 of the library for the face detection and recognition. The object detection feature provided by OpenCV uses the Viola-Jones object detection framework as described in section 2.3, with the improved set of Haar-like features by Lienhart and Maydt [6].

For the face recognition, a class written using OpenCV was used9, and only very small changes were made to the original code.

4.2.2. RMagick

ImageMagick10 is an image-processing library and RMagick11 is its API for Ruby. It allows one to easily manipulate images (rotating, resizing, applying effects and more). It is mostly used for resizing images when they are uploaded by the users, rotating them, and applying filters to them (described in sections 4.5.1, 4.5.3, 4.5.4 and 4.5.5).

4.3. Face Detection

4.3.1. Basic face detection program

Implementing a basic face detector in C++ using OpenCV isn't too hard, as the library already has a function to detect objects in an image when given the location of a haarcascade XML file. The main issue is to make the image compatible with the method detectMultiScale12, called on a CascadeClassifier13 object: the image must only contain one colour channel (greyscale image). The function cvtColor allows the conversion of colours in an image. Because OpenCV already provides haarcascades, there is no need to go through any training.

This program has been tested to see how each parameter affects the detection quality and speed. From those tests, results were analysed and solutions given when needed. The ideal face detector would need to be fast (detection run in less than a second) and have a high positive detection rate, meaning faces should be detected regardless of their tilted angle, rotation (left and right, up and

8 http://opencv.org

9 https://github.com/bytefish/opencv/tree/master/eigenfaces

10 http://www.imagemagick.org/

11 http://rmagick.rubyforge.org/

12 http://opencv.itseez.com/modules/objdetect/doc/cascade_classification.html#cascadeclassifier-detectmultiscale

13 http://opencv.itseez.com/modules/objdetect/doc/cascade_classification.html#cascadeclassifier


down), lighting conditions, skin colour or hair style (a fringe, for example, would cover the forehead).

4.3.2. Upright Frontal Faces

The face detection method provided by OpenCV works very well on upright, frontal faces. For the detection to be successful using the default parameters, the image is required to have a good enough resolution: a high-quality (high-resolution) photo is more likely to give an accurate detection. This was demonstrated in the test found in section 6.2.1. As shown below, all the faces were detected on the original photo (resolution of 1626x1123 pixels), but there were seven false negatives (faces not being detected) on the photo on the right, which had a resolution of 640x442 pixels. As the resolution decreases, so does the size of all the faces within the image; a person closer to the camera has a better chance of being detected than a person standing in the background.

Figure 17: Detection quality on the same image with different resolution. Left: 1626x1123 pixels – 29/29 faces detected Right: 640x442 pixels – 22/29 faces detected

The issue with running face detection on a high-resolution image is the area the detection has to cover: a high resolution means a large work area, which slows down the detection and uses more processing power and memory (the image has to be stored in memory, and the larger the image, the more memory it uses). From the test results, 1280x1280 was the lowest resolution an image could have for the detection to have a fairly good chance of detecting all faces. Due to limitations in processing power and memory on the private virtual host used for this project, the choice was made to provide the user with lower-quality photos, reasoning that faces left undetected in the background wouldn't be recognisable by the naked eye anyway. A maximum resolution of 800x800 pixels was chosen, as the detection could still run at that size while providing a good successful detection rate.

OpenCV provides a good set of options to increase or decrease the quality of detection; a few tests focusing on the main parameters of the following method were carried out to find out how each affects the detection.

void CascadeClassifier::detectMultiScale(
    const Mat& image,
    CV_OUT vector<Rect>& objects,
    double scaleFactor = 1.1,
    int minNeighbors = 3,
    int flags = 0,
    Size minSize = Size(),


    Size maxSize = Size());14

Image: Matrix of the type CV_8U containing an image where objects are detected

Objects: Vector of rectangles where each rectangle contains the detected object

ScaleFactor: Parameter specifying how much the image size is reduced at each image scale.

The scale factor affects the quality and speed of the detection, as shown in section 6.2.2: the lower it is, the more windows have to be considered. A scale factor of 1.1 increases the window size by 10% after each round (the image fully scanned using the current window size). It was demonstrated that increasing the window size by 5% at each round instead dramatically lowered the false negatives. As a drawback, the number of false positives (unwanted detected objects) increased proportionally, and on average the detection time doubled when increasing the window size by 5% instead of 10% (an average of 818ms instead of 412ms).

Other than affecting the speed and the number of false negatives and positives, the quality of detection was also affected when the scale factor was increased: the detection rectangles weren't positioned as well as they should be (the region of interest became too big or too small, see Figure 49).

MinNeighbors: Parameter specifying how many neighbours each candidate rectangle should have to retain it.

This parameter acts as a cleaning function: once the detection is over, the list of objects detected looks like this:

Figure 18: Face detection without grouping. Total of 294 "faces" detected instead of 29

14

http://opencv.itseez.com/modules/objdetect/doc/cascade_classification.html#cascadeclassifier-detectmultiscale


What this parameter allows the user to do is request that detected objects be grouped with a minimum neighbour count; if an object doesn't have that number of neighbours, it is removed from the list of valid objects. An object detected with a high neighbour count is more likely to be a face than a false positive. Therefore, when lowering this value, the detection becomes more flexible, resulting in fewer false negatives but more false positives (as demonstrated in section 6.2.3). Compared to the scale factor parameter, the speed of detection isn't affected by this parameter, because it doesn't directly affect the detection and only acts once the detection is over.

Flags: Parameter with the same meaning for an old cascade as in the function

cvHaarDetectObjects. It is not used for a new cascade.

MinSize: Minimum possible object size. Objects smaller than that are ignored.

MaxSize: Maximum possible object size. Objects larger than that are ignored.

4.3.3. Tilted Frontal Faces

The previous sections have shown that face detection works well when the faces are all upright; unfortunately, as demonstrated in section 6.2.4, the detection deteriorates when faces are tilted. The test showed that faces tilted no more than 17° left or right are still detected, but if the tilt angle is greater than 17°, the face will be ignored. This confirms what Viola and Jones had stated in their paper: the detection can only work on faces with a maximum tilt angle of ±15° (see Figure 19). ±15° only covers about 8% of all possible tilt angles.

Figure 19: Maximum tilted angle allowed for detection

There are two solutions to solve this problem so the detection can handle any faces with any tilted

angle:

1) Rotating the image


As described above, a face can only be detected within a tilt angle of ±15°, giving a total of 30° covered by default. The total number of possible angles is 360°, meaning 330° are not covered. The solution is to "slice" the image into 360°/30° = 12 segments, meaning the image has to be rotated by 30° each time a detection is completed. The main issue that had to be overcome was to achieve a full 360° rotation without cropping any part of the image, to avoid missing any faces during the process; such a function isn't provided by OpenCV. Appendix F – Steps Undertaken to Implement a Non-Cropping Rotation explains in great detail how such a rotation was achieved.

This new function to rotate an image without cropping allows the detection to be carried out over a full 360° rotation, so faces tilted at any angle can be detected.
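A minimal sketch of the key computation behind a non-cropping rotation (the full implementation is described in Appendix F): the rotated image must be placed on a canvas large enough to contain the whole rotated bounding box. The function name and the rounding choice are assumptions for this sketch.

```python
import math

def rotated_canvas_size(w, h, angle_deg):
    """Width and height of the smallest canvas that contains a
    w x h image rotated by angle_deg, so no pixels are cropped."""
    a = math.radians(angle_deg)
    new_w = abs(w * math.cos(a)) + abs(h * math.sin(a))
    new_h = abs(w * math.sin(a)) + abs(h * math.cos(a))
    return int(round(new_w)), int(round(new_h))
```

For a 400 × 300 image rotated by 90°, the canvas must be 300 × 400; for 45°, both dimensions grow, which is why the rotated image contains empty corner areas.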

When detecting faces on a rotating image, they need to be tracked: first to avoid any duplicated detections, and second to map each detection back to the region of the original image where it belongs. To fix those two issues, the detected faces have to be mapped using the rotation angle of the image at the time of detection, and the coordinates of a detected object have to be referenced to the centre of the image, not to the top-left corner of the rotated image. Appendix G – Tracking a Detected Face During Rotation shows each step to follow to track a detected face when the image is rotated.
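The centre-referenced mapping just described can be sketched as follows (Appendix G gives the full steps). The sign convention for the angle is an assumption; what matters is that the inverse rotation is applied about the image centre.

```python
import math

def map_back(x, y, cx, cy, angle_deg):
    """Map a point (x, y) found in an image that was rotated by
    angle_deg about its centre (cx, cy) back to the coordinate
    frame of the unrotated image."""
    a = math.radians(angle_deg)
    # express the point relative to the rotation centre
    dx, dy = x - cx, y - cy
    # apply the inverse rotation (-angle) about that centre
    ox = dx * math.cos(-a) - dy * math.sin(-a)
    oy = dx * math.sin(-a) + dy * math.cos(-a)
    return cx + ox, cy + oy
```

Referencing coordinates to the centre rather than the top-left corner is what makes this mapping work, because the centre is the only point unaffected by the rotation.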

Using the rotation technique greatly affects the speed of detection, making it roughly 14 times slower (see section 6.2.4, the second test). There are a few ways of improving the speed slightly, such as skipping detection over the empty areas generated by the rotation. Fortunately, there is a much more efficient way of handling tilted faces, and profile faces at the same time, without affecting the detection time.

2) Introducing new Haar like features, and implementing a two stage detector

This technique was introduced by Paul Viola and Michael Jones in 2003 [10]. For each detection window, the viewpoint is first determined, and a decision tree specific to that viewpoint is then used. The features used in the original Viola-Jones detection framework (section 2.3.2) were not sufficient to detect diagonal structures and edges; introducing a new set of features makes this possible. They consist of four overlapping rectangles that combine to yield blocky diagonal areas (see Figure 20 – A,C), and they operate in the same way as the previous filters.

Figure 20: Diagonal Filters

The detection happens in two stages: the first stage determines the pose of the face in the current window, and then only the detector trained on that pose is evaluated.


It was shown earlier that the basic Viola-Jones detection framework could only cover faces tilted by ±15°, i.e. 30° in total. As described in their article, Viola and Jones “trained 12 different detectors for frontal faces in 12 different rotation classes. Each rotation class covers 30 degrees of in-plane rotation so that together, the 12 detectors cover the full 360 degrees of possible rotations.” The profile face detectors have to be trained as well, for both left and right profile faces, at angles of 15° and 30° on each side.

Another detector has to be trained for the pose estimator, so the program knows which of the previously trained detectors has to be used.

Unfortunately, this technique could not be implemented in the final system, due to a lack of processing power and of a good training set. To implement such features, a few thousand images of rotated and profile faces would have been required for the training. The OpenCV source code would also have had to be updated to integrate the new Haar-like features for detecting diagonals more accurately. Finally, the training itself would have required a computer running for days, or even weeks. This level of processing power wasn't available when producing this work.

4.3.4. Conclusion

OpenCV provides a good detection tool for finding faces in images. The speed and the quality of detection are dependent on each other: a faster detection means a higher number of false positives and false negatives. The Haar cascades provided by OpenCV are good but could be improved; the detection of frontal faces only works well when the face is well exposed and the lighting conditions aren’t extreme, and faces are only detected if the tilt angle doesn’t exceed ±15°. The best solution is an improvement of the Viola-Jones detection framework that splits the detection into two steps: the first estimates the pose of the face within a window, and the second runs a detector trained for that pose. Due to limited resources, this method wasn’t used in the final system; it was replaced by a more costly technique of rotating the image, running the detection at each rotation, and tracking the detected faces, which greatly affects the speed of detection.

4.4. Face Recognition

4.4.1. Overview

As with face detection, face recognition is performed using the OpenCV library. One powerful tool offered in recent versions of OpenCV is the PCA15 class (Principal Component Analysis) in its API. It generates the eigenvectors of a covariance matrix computed from a set of vectors (images turned into vectors as described at the beginning of section 2.4.2).

The face recognition worked well using the class provided by Philipp Wagner. The object is first initialised with a vector of matrices and a vector of labels (or a matrix with one vector per row, plus a vector of labels); its predict method then takes a matrix as a parameter, goes through all the projections constructed during the initialisation of the object, and gives that matrix the label of the projection with the lowest distance value (i.e. the fewest differences between the projections).

15 http://opencv.itseez.com/modules/core/doc/operations_on_arrays.html#pca

The code was slightly changed: instead of outputting a single label, the program outputs a list of all the labels associated with projections whose distance value falls below a threshold.
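The modified prediction step can be sketched as follows. This is a simplified stand-in, not Philipp Wagner's actual class: it assumes the PCA projection coefficients have already been computed for each training face, and uses plain Euclidean distance between coefficient vectors.

```python
import numpy as np

def predict_labels(projections, labels, query, threshold):
    """Return every label whose stored projection lies within
    `threshold` of the query projection, closest first, instead
    of only the single closest label."""
    matches = []
    for coeffs, label in zip(projections, labels):
        distance = np.linalg.norm(coeffs - query)
        if distance < threshold:
            matches.append((distance, label))
    return [label for _, label in sorted(matches)]
```

Returning every label under the threshold, rather than only the nearest one, matches the change described above: the system can then ask each candidate user to confirm or deny the recognition.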

4.4.2. Background Removal

When detecting faces, some background noise still lies within the detected region. When building the eigenfaces, this background noise can be turned into a feature that misleads the recognition (see Figure 21).

Figure 21: The area in red is the noise that can mislead the recognition © Copyright: AMC 2011

This was proven in section 6.3.2, where it was shown that introducing noise into the training set introduced errors into the recognition. One possible solution, easy to implement, was to crop out the edges of all the faces involved in the recognition (both the faces in the training set and the faces to be recognised).

The downside of implementing such a solution is the loss of details that could be useful to the recognition (such as the ears). The plus side is the almost total removal of the background from the images (see Figure 22).

Figure 22: Remove the background by cropping and resizing the image to its original size © Copyright: AMC 2011

The tests in section 6.3.2 showed not only that backgrounds do affect the recognition, but also that removing them greatly improves it by reducing the minimum distance value.
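The cropping step can be sketched with simple array slicing. The 20% margin here is an assumed example value; the dissertation does not state the exact margin used.

```python
import numpy as np

def crop_edges(face, margin_ratio=0.2):
    """Crop a fixed margin from each edge of a face image to strip
    background noise; margin_ratio is an assumed example value."""
    h, w = face.shape[:2]
    my, mx = int(h * margin_ratio), int(w * margin_ratio)
    return face[my:h - my, mx:w - mx]
```

In the actual pipeline the cropped image would then be resized back to the common training dimensions (as in Figure 22), so that all faces keep the same vector length for the PCA.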

4.4.3. Equalise Histogram

Equalising the histogram means normalising the grayscale image’s brightness and contrast by normalising its histogram (see Figure 23).

Page 38: People’s Control Over Their Own Image in Photossamuel.molinari.me/files/dissertation.pdf · People’s Control Over Their Own Image in Photos on Social Networks By Samuel Molinari

38

Figure 23: Equalising the histogram using the function equalizeHist() in OpenCV © Copyright: AMC 2011

This step greatly improves the visibility of the faces and makes their features stand out. By applying this effect to all the faces involved in the recognition, the minimum distance value is greatly improved (see section 6.3.3). This process is achieved using the equalizeHist16 function provided by OpenCV.
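In OpenCV this is a single function call; the cumulative-histogram mapping underneath it can be sketched in plain NumPy as follows (a simplified illustration, not OpenCV's exact implementation):

```python
import numpy as np

def equalise_histogram(gray):
    """Spread a uint8 grayscale image's intensities over the full
    0-255 range using the cumulative distribution of its histogram."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    # normalise the CDF so the lowest occupied bin maps to 0
    cdf_min = cdf[cdf > 0][0]
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]  # apply the lookup table to every pixel
```

Stretching the intensity range in this way is what makes dark or washed-out faces comparable with well-lit ones before the eigenface projection.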

4.4.4. Keep the Faces Straight Up

One piece of advice when creating a training set and running recognition against an image is to keep the faces as straight as possible, so the differences between a face and the eigenfaces won’t be too big. One problem is that the detector can detect faces tilted by up to ±15°, which affects the training set.

The solution works only on faces of high enough quality, because it requires detecting the eyes on the face. Using the detector previously described with haarcascade_eye_tree_eyeglasses.xml as the Haar cascade, both eyes can be detected. By taking the detected rectangles and computing their centres, the eye positions on the face are obtained. These positions can be used to compute the angle of the face: if the left eye is higher than the right eye, the face is tilted toward the right; if the right eye is higher than the left eye, the face is tilted toward the left. This process is described step by step in Appendix H – Straight Up Tilted Faces.

The downside of this technique was the lack of assurance that both eyes would be detected (false negatives), and that no false positives would be introduced during the eye detection. Unlike the face detection, eye detection needs to be very accurate, with no errors. The best that could be done was to not straighten faces with 0, 1 or more than 2 eyes detected. Faces with 2 eyes detected but a tilt angle of more than ±15° should not be straightened either (an allowance margin can be added, because the tests showed that faces were still detected at a tilt of ±17°, so faces with a tilt angle of up to ±20° could still be considered valid).
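The angle computation and the validity checks above can be sketched as follows (Appendix H gives the full procedure). The eye centres are assumed to come from the eye-cascade detections; the function names are illustrative.

```python
import math

def tilt_angle(left_eye, right_eye):
    """Tilt of the face in degrees, computed from the two eye
    centres (x, y); 0 means the eyes are level."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

def should_straighten(eyes, max_angle=20.0):
    """Only straighten when exactly two eyes were detected and the
    implied tilt stays within the allowance margin (the +/-20 degree
    margin follows the test observations in the text)."""
    if len(eyes) != 2:
        return False
    left, right = sorted(eyes)  # order the centres by x coordinate
    return abs(tilt_angle(left, right)) <= max_angle
```

Faces that fail these checks are simply left untouched, which is safer than straightening on unreliable eye detections.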

Due to the low improvements shown by the test in section 6.3.4, this feature wasn’t implemented in the final system.

16 http://opencv.itseez.com/modules/imgproc/doc/histograms.html#equalizehist


4.5. Filters

4.5.1. Introduction

So far, the system has a good recognition script, allowing it to detect and recognise faces. The next step was to allow the system to hide a recognised person with the help of some image processing techniques. All the face coordinates were entered manually for the following examples. For each detected face, the data needed was: the coordinates, and the width and height of the detected region.

All the effects were produced using RMagick, directly within the web application, but they could have been done through some C/C++ scripts if needed.

4.5.2. Plain Cover

This is one of the easiest image processing methods for hiding a face in a photo, but the least elegant one. Using the coordinates and size of the detected region, a rectangle is drawn over the desired region.

Figure 24: Faces hidden with black rectangles © Copyright: AMC 2011

Taken from the API, the rectangle method of the Draw class takes four parameters:

draw.rectangle(x1, y1, x2, y2) -> self 17

17 http://www.imagemagick.org/RMagick/doc/draw.html#rectangle


Figure 25: Drawing a rectangle in RMagick
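An equivalent of the plain-cover filter in array terms: the dissertation uses RMagick's draw.rectangle, and this NumPy slicing version is only an illustrative stand-in for the same operation.

```python
import numpy as np

def plain_cover(image, x, y, w, h):
    """Draw a solid black rectangle over the detected region
    (x, y, w, h): the simplest, least elegant cover."""
    covered = image.copy()
    covered[y:y + h, x:x + w] = 0  # black out the face region
    return covered
```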

4.5.3. Blurring

This is a better effect for hiding faces in a photo; it is smoother and blends better into the image’s context. The process of blurring a specific area of an image is slightly more challenging than the previous effect. RMagick doesn’t provide a function to blur a specific area of an image; instead, the region of interest has to be singled out and saved into a new image object, the blur effect applied to that image object, and the result placed back over the original image in its original position so that it covers the original face.
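The single-out, blur, and paste-back sequence can be sketched as follows, using a naive box blur in place of RMagick's Gaussian blur (an assumption made so the example stays self-contained):

```python
import numpy as np

def box_blur(region, radius=1):
    """Naive box blur: average each pixel with its neighbours."""
    padded = np.pad(region.astype(float), radius, mode="edge")
    out = np.zeros_like(region, dtype=float)
    k = 2 * radius + 1
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + region.shape[0], dx:dx + region.shape[1]]
    return (out / (k * k)).astype(region.dtype)

def blur_face(image, x, y, w, h, radius=1):
    """Cut out the face region, blur it, and paste it back in place."""
    result = image.copy()
    result[y:y + h, x:x + w] = box_blur(result[y:y + h, x:x + w], radius)
    return result
```

The same cut-process-paste pattern is reused for the pixelate filter below; only the processing step in the middle changes.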

Figure 26: Faces blurred out © Copyright: AMC 2011

The next step was to make this script compatible with any image size, as a fixed Gaussian blur value only works for one image size, and not for bigger ones (see the test in section 6.4.1).

One aesthetic issue was that the blur was cut off sharply and did not fade out smoothly. The solution was to expand the regions to be blurred; generate a mask of white circles on a black background, with the circles drawn where the blurred regions are; blur the mask as well so its edges are smooth; then apply the mask to the blurred image, which outputs a transparent image with the blurred regions cropped into circles; and finally take the original (untouched) photo and cover it with the newly created transparent image containing the cropped-out blurred faces (see Figure 27).


Figure 27: Creating a smoother blurring filter

4.5.4. Pixelate

Pixelating the image works on the same idea as blurring it: each face has to be isolated, processed, and then placed back over the original image in its original position so that it covers the original face. RMagick doesn’t provide a function to pixelate an image, but considerably reducing the size of the face and then resizing it back to its original size, without any smoothing, results in a pixelated area of the image.

Figure 28: Pixelated faces © Copyright: AMC 2011

The issue with pixelating the image (as with blurring it) is that the strength has to match the size of the image. Shrinking the face by a fixed scale of 0.08 would only work for some image sizes; if the image were bigger, the filter would not be strong enough to hide the face (see the test in section 6.4.1).

To avoid the problem above, the resize scale value should be dynamic: scale = c / w, where w is the width of the face to be pixelated and the constant c is the number of visible pixels along the x axis of the region to be pixelated (the face). From the tests carried out, c should take a value of no more than 6, because the face becomes more and more recognisable with higher constants.
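The shrink-and-enlarge pixelation can be sketched with nearest-neighbour indexing. RMagick's resize calls are replaced here by pure NumPy indexing (an assumption for the sake of a self-contained example); the block size scales with the face width so that roughly a constant number of blocks remain visible, following the rule that the face should end up with no more than about 6 visible pixels across.

```python
import numpy as np

def pixelate(image, x, y, w, h, visible_pixels=6):
    """Pixelate the region (x, y, w, h) so roughly `visible_pixels`
    blocks remain across it, regardless of the image size."""
    result = image.copy()
    face = result[y:y + h, x:x + w]
    block = max(1, w // visible_pixels)  # dynamic: scales with face width
    # shrink by sampling every `block`-th pixel...
    small = face[::block, ::block]
    # ...then enlarge back with no smoothing (nearest neighbour)
    rows = np.clip(np.arange(h) // block, 0, small.shape[0] - 1)
    cols = np.clip(np.arange(w) // block, 0, small.shape[1] - 1)
    result[y:y + h, x:x + w] = small[np.ix_(rows, cols)]
    return result
```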

4.5.5. Face Replacement

Each user could be given the choice to upload a few images of their own face to be used as replacements for their original face. A user could then replace an undesirable picture with a face of their choice. This process would require some kind of input for the user to place their face properly over the detected one. The photos used as replacements should be PNGs so they can handle transparency. To apply such a feature, the program would need to read the original photo, then read the replacement face with its coordinates, size and possibly rotation angle.

Figure 29: Walter White's face replaced with one previously cut and saved as a PNG to support transparency © Copyright: AMC 2011

4.6. Conclusion

This section covered face detection, face recognition and filters: the main elements used in the final system discussed in the next section. Face detection gives the system the ability to find faces and obtain their position and size within a photo. Face recognition determines who the detected person is, allowing the system to give a label to a face. The filters give users a number of choices for hiding themselves in a photo.18 19

18 Find the source code of the script performing face detection and recognition on the attached CD: /applications/finalsystem/face_detection_and_recognition/src/
19 Find the source code for each filter on the attached CD: /applications/demo/filters/


5. Mock-up Web Application Implementation

5.1. Technologies Used

5.1.1. HTML 5

HTML5 is the new version of HTML; it has not yet been officially released, but many of its new features are already supported in most web browsers.20 The main new elements that HTML5 brings are new form widgets, the WebSocket protocol, custom attributes and canvases.

5.1.2. JavaScript/jQuery

JavaScript is the language that allows the client to interact with the DOM without refreshing the page. Because each web browser has its own JavaScript engine, developing web applications can be very frustrating, as many conditions have to be written to check whether a given engine supports a given function.

jQuery21 is a JavaScript library that is compatible across many different web browsers, making development in JavaScript much smoother.

5.1.3. Ruby on Rails

Ruby on Rails22 (RoR) is a widely used web framework written in Ruby. It uses the Model-View-Controller architecture.

The models are the objects that link the web application with the objects in the database. The framework automatically generates a range of methods that allow very easy and quick access to the database.

The controllers are the entities that handle the user’s requests. Generally, the controller is where the models and views interact: the user requests a page, the controller processes the request, accesses the data needed via a set of models, and sends a response back to the user.

The views are the pages generated, redirected to, or rendered by the controller; they are the front end of the web application. The views are often HTML documents (though they can be in any format) with some inline Ruby script to generate dynamic content.

5.1.4. MySQL

MySQL23 is a reference relational database management system. It is used worldwide, including by many big companies (Google, Facebook, Twitter…). It is used to keep the records for the Ruby on Rails application.

20 http://dev.w3.org/html5/
21 http://jquery.com/
22 http://rubyonrails.org/


5.2. Photo Upload

5.2.1. Drag & Drop

The photo uploader uses the drag and drop technique to upload files, relying on JavaScript DOM event handlers. For drag and drop to work, an element on the HTML page must first be used as a target; it can be anything from a small region of the page (a div tag) to the entire document. The target, identified through a unique ID, is then given event handlers dealing with dragging and dropping:

target.addEventListener(<event name>, function(event) { … }, false);

For the drag and drop to take place, there are four events to be handled:

dragover: triggered while something is dragged over the target. When an item is dragged over the drop area, the CSS is changed so the target highlights itself and changes its message to “Drop!”.

Figure 30: Target Before and After dragover

dragleave: reverts the target back to its original style when the drag leaves the target.

dragenter: a single event triggered when the drag enters the target.

drop: triggered when the dragged item(s) is dropped onto the target.

Each of those events must be prevented from triggering its default action: for example, dropping a photo onto a web page would normally make the web browser load that photo. Therefore, the event.preventDefault() method must be used, along with event.stopPropagation() to stop the action propagating to other elements on the page.

5.2.2. Background Upload

Once the drag and drop events are under control, the next step is to pull the elements being dropped onto the target; in the case of a drag and drop uploader, these are the files, which can be retrieved using event.dataTransfer.files.

23 http://www.mysql.com/

Each of those files can then be sent via AJAX as a form, making the upload of multiple images asynchronous. One handy feature supported in HTML5 is the ability to add an event listener for the progress of the upload24:

xhr.upload.addEventListener("progress",function(progress){

var percentComplete = progress.loaded / progress.total;

},false);

Using the updated percentage, one can give the user feedback on the current progress of the upload, for example by showing a progress bar:

Figure 31: Image Upload with Progress Bar

The uploader implemented lets users know when they can leave the page: they only need to stay on the page while the photos are being uploaded, not while a photo is being processed (image resize and face detection/recognition). Therefore, when a photo upload reaches 100%, it is automatically removed from the upload list on the left-hand side and added to the recently uploaded photos.

5.3. Face Recognition

5.3.1. Run Script from Web Application

To run the script implemented in sections 4.3 and 4.4, the path to an image is needed, followed by a list of locations of CSV files containing the paths of face images and their matching user IDs.

Ruby allows the execution of external commands from within the code itself: the command simply has to be written between backticks25 `<command>`, and any output generated by that command is returned.

24 http://dvcs.w3.org/hg/progress/raw-file/tip/Overview.html


Therefore, to run the face detection/recognition and retrieve the output:

output = `<path to script> <path to image> <path to csv1> <path to csv2>`

5.3.2. Parse Face Database to Script

The script was built to retrieve known faces from a set of CSV files. When a photo is uploaded, each user has their own CSV file generated from the records in the faces table. These generated files are then passed as parameters in the script call, as shown above.

5.3.3. Handling Script Output

The script has a method to output the detected faces and their recognition in JSON format, and Ruby has a good library for parsing a JSON string into a Ruby hash. Each detected face is processed: a record is created in the database, and the image of the face is saved in a private directory. If the face has an ID attached to it, a PendingRecognition record is created from the newly created face’s unique ID and the user ID. This record is then used to notify the user that they have been recognised and need to take action to confirm or deny the recognition.

5.3.4. Train User Recognition

There are two different sets of images used for the recognition of a user. The first set contains all the faces detected by the system and confirmed by the user in photos uploaded to be shared.

The second set contains the faces whose sole purpose is to be used for face recognition; they are not attached to any specific photo. This is a feature implemented in the user’s account page, where they can build their own training set of faces by dragging and dropping an image and choosing their face from the ones automatically detected by the system:

Figure 32: User can drop photos to improve their recognition

25 http://www.ruby-doc.org/core-1.9.3/Kernel.html#method-i-60


Figure 33: After dropping a photo, users can choose which face should be added to their set of faces

5.4. Recognition Notifications

5.4.1. On Recognition

The notification system uses the record generated when a face is detected with a label attached to it, which leads to the creation of a PendingRecognition record. Users needed to be notified as soon as their face was recognised in a freshly uploaded photo. To do so, an asynchronous AJAX call is sent every three seconds to the recognition action of the NotificationCentreController. This action returns the number of pending recognitions that the logged-in user needs to take action on. When the count is greater than zero, a specific HTML element already present in the DOM, but hidden, is made visible, and its content (the number to be displayed) is replaced with the number returned by the AJAX call.

Figure 34: Recognition Notification in the Top Bar

When a user tags another user, a PendingRecognition record is created as well.


5.4.2. Notifications Action Centre

Figure 35: Notifications Centre

When the notification button is clicked, the user is redirected to a page listing all their pending recognitions, each with the recognised face and a question asking the user to confirm whether or not they are the recognised person.

5.5. Privacy Settings

5.5.1. Automatic General Settings

These are the privacy options offered to users for when they are recognised or tagged in a photo. They are applied automatically on recognition:

1. A user must first choose the group of people they would like to be hidden from automatically. The options are to hide from:

Everybody: the user will be hidden from anybody who views a photo they have been recognised in.

People not in photo: only people who have been recognised or tagged in the same photo as the user will be able to see the user; all other viewers won’t be able to see the user.

Public: only people the user follows will be able to see that user in a photo they have been recognised in; all other people won’t be able to see that user.

Nobody: the user won’t be hidden from anybody viewing the photos they have been recognised in.

2. The user must then choose what type of filter (or cover) should be used over their face:

None: no cover will be applied to the user’s face.

Pixelate: the user’s face will be pixelated.

Blur: the user’s face will be blurred out.

Plain Cover: a plain black square will be placed over the user’s face.

Figure 36: Automatic Global Privacy Settings


5.5.2. Automatic User Specific Settings

Figure 37: 3-steps privacy settings

This option is a 3-step privacy setting:

1. Select Users You Want To Hide From: the user is given a list of selectable users; the selected users are the ones the user wants to hide from.

2. When In a Photo With: once one or more people have been selected in the first step, another list of selectable users appears; the selected users are those the user doesn’t want to be seen with. So if user “A” selects user “B” in the first list and user “C” in the second list, “A” will be hidden from “B” whenever “A” is recognised in a photo with “C”.

3. Choose a Cover Option: the list of available filters (covers).

5.5.3. Photo Specific Settings

The user can pick a filter (or cover) to hide their face in a specific photo. It is set to “auto” by default, meaning the privacy settings chosen by the recognised user are applied automatically:


Figure 38: Cover option specific to a photo

5.6. Photo Viewing

5.6.1. Access Control

The system had a privacy issue: if user “A” wants to be hidden from user “B”, but not from user “C”, what prevents “C” from sharing the direct link to the image showing “A” (without their face covered) with “B”?

The solution was to create a controller named “ImageAdministratorController” to take care of rendering the images. Instead of being stored in a public folder, all the uploaded photos were moved to a private location on the server where no one in the general public can access them directly (via a URL). When allowing a user to view a photo, a link to the controller is generated with some parameters:

http://localhost:3000/image_administrator/index?ref=1234

Here image_administrator is the name of the controller, index is the view, and the parameter ref references the unique string of an image.

When a user accesses this controller, the image matching the reference string is loaded into an object. Then, depending on the user trying to view it, the image is processed before being rendered to that user.

The benefit of using such a technique is having full control over how the image is displayed depending on the viewer: the images aren’t made directly available to the public and are instead handled by a controller. The unfortunate downside is the cost in processing and speed. For each image request, the image has to be loaded, processed, and then sent to the client. This affects the user, because requests are slower, and the provider, due to the extra processing work needed for each image request.

One solution that could later be implemented is a caching system, so the system wouldn’t need to process the image for each request; instead it could be done once, then stored locally for a defined amount of time in case similar requests are made later on.

Figure 39: Access Control on Images

5.6.2. Image Processing

Using a controller instead of a direct image link allows any kind of image manipulation before the image is sent to the client requesting it.

Using the privacy settings set by the people tagged in the photo, and depending on who wants to view it, images can be altered to respect other people’s privacy. Using the effects demonstrated in section 4.5, and respecting the user’s choice of privacy settings (0/nil: don’t hide; 1: plain black square; 2: blur face out; 3: pixelate face), one of the effects is applied to the region of interest stored in the database, scaled appropriately to match the requested image size.
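The mapping from stored setting codes to filters can be sketched with a simple dispatch table. The filter functions below are hypothetical placeholders standing in for the RMagick effects of section 4.5; only the code-to-effect mapping follows the text.

```python
# hypothetical placeholders for the real RMagick-based filters
def plain_cover(image, region): return f"plain({region})"
def blur(image, region): return f"blur({region})"
def pixelate(image, region): return f"pixelate({region})"

# 0 or nil/None: don't hide; 1: plain black square; 2: blur; 3: pixelate
FILTERS = {1: plain_cover, 2: blur, 3: pixelate}

def apply_privacy_setting(image, region, setting):
    """Apply the filter chosen by the recognised user, or return the
    image untouched when no hiding was requested."""
    effect = FILTERS.get(setting)
    return effect(image, region) if effect else image
```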


5.6.3. Photo Viewer

The photo viewer can have different designs depending on the current pending recognitions, the unknown people detected, and the privacy settings applied to the users present in the photo.

When a user is recognised in a photo, a small HTML fragment is placed under their pending face recognition.

All other detected users can be confirmed by anybody, but the tag still has to be confirmed by the recognised person themselves. A tag pending recognition can only be removed by the targeted user.

If a face is detected but isn't recognised automatically, anybody can tag that face; once again, the tagged user is the only one who can confirm or deny the tag.

Figure 40: Photo Viewer with Detected and Recognised faces

The photo viewer, accessed by someone not owning the photo, can have a very different page,

depending on the users’ privacy settings:


Figure 41: Anonymous Users in the Photo Viewer

Users who decide to hide themselves from a group of people or from a specific user have their chosen cover/filter applied over their face and are listed as "Anonymous User" on the right-hand side.

Hidden users never have their faces covered when the owner of the photo is viewing it, as it is assumed that the owner has access to the original photo anyway.
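The rendering rules above (the owner always sees the original; everyone else sees each tagged user's chosen filter) could be resolved with a minimal sketch like this (illustrative names, not the project's actual code):

```ruby
# The photo owner always sees the original; for any other viewer, the
# tagged user's chosen filter (:none, :rectangle, :blur or :pixelate)
# applies, defaulting to :none when no setting was picked.
def effective_filter(viewer_id, owner_id, tagged_user_filter)
  return :none if viewer_id == owner_id
  tagged_user_filter || :none
end
```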

5.7. Conclusion

All the sections above focus on the main features related to the aim of this project. The notification system raises users' awareness of the photos being shared on the web application. Users can confirm or deny their recognised faces, but their privacy settings are applied automatically even before they take any action. Everything is system driven, providing the automation needed to give users more control over their own images. Users' recognition profiles are built through tagging, but they are also given the opportunity to contribute to building a stronger training set.

Dynamically rendering photos depending on the users in the photo, their privacy settings and the person viewing them makes sharing the image of an individual more challenging.26

26 Find the entire Ruby on Rails web application source code on the attached CD: /applications/finalsystem/ruby_on_rails_application/DissertationSystem/


6. Testing & Results

6.1. Questionnaire

Results of the questionnaire released before the development stage (see Appendix A –

Questionnaire):

6.1.1. How much control do you feel you have over photos of yourself being uploaded by

others on social networks?

0 1 2 3 4 5

21.7% 30.4% 21.7% 8.7% 13% 4.3%

6.1.2. Did you ever report a photo you were part of?

Yes 8.7%

No 91.3%

6.1.3. Have you ever asked someone to remove a photo of you?

Yes 56.5%

No 43.5%

6.1.4. Did someone ever share a photo of you that made you feel uncomfortable?

Yes 65.2%

No 34.8%

6.1.5. If a picture made you feel uncomfortable and you didn't report it, why didn't you?

Not convinced 34.8%

Too much hassle 21.7%

Group picture 43.5%

Other 8.7%

6.1.6. Rate each of the features below that could improve your control over photos of yourself being uploaded on social networks.

Nope Bad Why not Good Excellent

Automatically notify me when my face has been recognised in an uploaded photo:

4.3% 8.7% 8.7% 26.1% 52.2%

Hide me from the photo depending on who is viewing it:

8.7% 13% 30.4% 26.1% 21.7%


6.1.7. Rate the following ideas on how you could be hidden from a photo

Nope Bad Why not Good Excellent

Replace your face with a plain black square: 30.4% 30.4% 30.5% 8.7% 0%

Blur your face out: 4.3% 13% 34.8% 34.8% 13%

Pixelate your face: 13% 8.7% 26.1% 43.5% 8.7%

Replace your face with another photo of yourself that you would have chosen in your settings:

47.8% 21.7% 17.4% 4.3% 8.7%

Try to remove your entire body off the photo with the background:

21.7% 8.7% 17.4% 30.4% 21.7%

6.2. Face Detection

The following tests show how the face detection performs for different images and settings. False negatives are faces that haven't been detected; false positives are detected objects that aren't faces. The best possible result is near-zero false negatives, as the aim of this project is to detect and recognise every single person in a photo.

In all the tests, unless stated otherwise, the haarcascade used is one provided by OpenCV, haarcascade_frontalface_alt2.xml.

Here are the images used in the tests below:


Figure 42: Test Image 1

29 faces

Figure 43: Test Image 2

1 face

Figure 44: Test Image 3

18 faces

6.2.1. Image Size

These tests aim to show how the detection handles different image sizes. Speed and detection quality were monitored.

The following tests all use these settings:

Window size increased at each round by a factor of 1.1
Minimum window size of (0,0)
No limit on maximum window size
3 minimum neighbours

All the images will be resized to the following widths: 1600, 1280, 1024, 800, 640, 320. Their aspect ratio won't be changed.

a) Image 1 (see Figure 42)

Image Size Time(ms) False negative False positive

1600x1105 1728 0 0

1280x884 1097 0 1

1024x707 774 1 1

800x552 419 1 1

640x442 272 7 0


320x221 90 29 0

b) Image 2 (see Figure 43)

Image Size Time(ms) False negative False positive

1600x1600 2666 0 1

1280x1280 1561 0 1

1024x1024 986 0 0

800x800 419 0 1

640x640 338 0 0

320x320 84 0 0

c) Image 3 (see Figure 44)

Image Size Time(ms) False negative False positive

1600x803 1164 0 5

1280x642 758 0 5

1024x514 458 1 5

800x401 284 3 0

640x321 177 11 0

320x160 39 18 0

When analysing the results, it is clear that the higher the photo's resolution, the better the detection. The second image is very favourable for detection: no matter the size of the image, the face is successfully detected. Because the face is the centre of focus of this image, the face remains large enough for detection even when the image is only 320x320 pixels. This is also why detection in both group photos becomes poor as soon as their resolution is reduced (see Figure 45).

The speed of detection also depends on the size of the image; detection runs faster on smaller images because the area to cover is smaller (see Figure 46).

Given these two findings, a compromise inevitably has to be made between speed and quality of detection. With the current settings, as described at the beginning of this section, photos should be resized to a maximum of 1280x1280 pixels so that speed and quality of detection can co-exist. The following tests might find better settings for detection in smaller images; ideally, a good detection rate would be achieved on an image of 800x800 pixels.
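That compromise might be computed as below (capped_size is a hypothetical helper, not the project's code; 1280 is the limit suggested above):

```ruby
# Resize so neither dimension exceeds max_side, preserving aspect ratio,
# per the compromise discussed above.
def capped_size(width, height, max_side = 1280)
  return [width, height] if width <= max_side && height <= max_side
  scale = max_side.to_f / [width, height].max
  [(width * scale).round, (height * scale).round]
end
```

For example, capped_size(1600, 1105) returns [1280, 884], matching the second row of the Image 1 table above.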


Figure 45: Successful detections depending on image size

Figure 46: Detection speed depending on image size (lowest in fastest)

6.2.2. Scaled Window Size

These tests demonstrate the effect the scale factor has on the detection.

The following tests all use these settings:

Minimum window size of (0,0)
No limit on maximum window size
3 minimum neighbours



Image size capped at a maximum of 800x800 pixels, as this is the size intended for reliable detection.

The scale factors to be tested are the following: 1.01, 1.02, 1.03, 1.04, 1.05, 1.06, 1.07, 1.08, 1.09,

1.10, 1.20, 1.30, 1.40, 1.50, 1.60, 1.70, 1.80, 1.90.

Normally, a detection using a lower factor should be more precise than one using a higher factor.

a) Image 1 (see Figure 42)

Scale Factor Time(ms) False negative False positive

1.01 3664 0 9

1.02 1842 0 3

1.03 1237 0 1

1.04 928 0 1

1.05 733 0 1

1.06 647 0 1

1.07 544 1 1

1.08 496 0 1

1.09 414 1 0

1.10 388 1 1

1.20 232 5 0

1.30 165 15 0

1.40 121 27 0

1.50 113 26 0

1.60 100 28 0

1.70 90 28 0

1.80 79 28 0

1.90 75 28 0


b) Image 2 (see Figure 43)

Scale Factor Time(ms) False negative False positive

1.01 5164 0 6

1.02 2579 0 2

1.03 1795 0 1

1.04 1355 0 1

1.05 1080 0 1

1.06 913 0 1

1.07 804 0 1

1.08 699 0 1

1.09 600 0 0

1.10 565 0 1

1.20 335 0 0

1.30 239 0 0

1.40 194 0 0

1.50 181 0 0

1.60 144 0 0

1.70 127 0 0

1.80 117 0 0

1.90 114 1 0

c) Image 3 (see Figure 44)

Scale Factor Time(ms) False negative False positive

1.01 2476 0 10

1.02 1215 0 4

1.03 865 0 3

1.04 631 1 0

1.05 642 0 2

1.06 439 0 0

1.07 370 3 1

1.08 329 1 0

1.09 298 3 1

1.10 284 3 0

1.20 151 9 0

1.30 108 10 0

1.40 82 15 0

1.50 83 16 0

1.60 66 16 0

1.70 65 16 0

1.80 57 16 0

1.90 52 16 0

As predicted, the lower the factor, the more objects are detected. In the previous tests, images 1 and 3 had a few false negatives (faces that weren't detected); by lowering the scale factor, those faces were detected successfully (using a scale factor of 1.06 instead of 1.10, see Figure 47). The scale factor also affects the speed of the detection: the lower the factor, the more sub-windows have to be taken into account (see Figure 48).

Figure 47: Successful detections depending on the scale factor

Figure 48: Detection speed depending on scale factor; speeds differ between images because of their different areas

During the detection, as the factor is increased, the accuracy of the detected regions becomes worse, as shown in Figure 49. The detection on the left contains the whole face, whereas the



one on the right only covers the area above the person's mouth. It is advisable to keep the scale factor below 1.2.

Figure 49: Scale factor of 1.1 on the left, 1.6 on the right
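The speed cost of a small scale factor follows from the number of window scales the detector must evaluate: each round multiplies the window size by the factor, so the number of rounds grows as log(side/base)/log(factor). A rough Ruby sketch (the 20-pixel base window is an assumption about the cascade, not a figure from these tests):

```ruby
# Approximate number of detection rounds between a cascade's base window
# and the image's shorter side, when each round scales the window by
# `factor`. Purely illustrative arithmetic.
def scale_rounds(side, base, factor)
  (Math.log(side.to_f / base) / Math.log(factor)).floor + 1
end

scale_rounds(800, 20, 1.1) # => 39 rounds
scale_rounds(800, 20, 1.9) # => 6 rounds
```

This back-of-the-envelope count mirrors the measured trend: the 1.01 runs take an order of magnitude longer than the 1.9 runs.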

6.2.3. Minimum Neighbours

These tests demonstrate how the minimum neighbours value affects the face detection.

They all use these settings:

Minimum window size of (0,0)
No limit on maximum window size
Image size capped at a maximum of 800x800 pixels, as this is the size intended for reliable detection.
Even though the previous tests have shown that a scale factor of 1.05/1.06 increases detection quality, the factor used in these tests is 1.10.

The minimum neighbours values tested are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

The expected outcome is that detections are most likely to be kept when using the lowest minimum neighbours value. When the value is set to 0, it is hard to tell which detection close to a face is a false positive; therefore, any detections beyond the number of faces expected to be found were counted as false positives.

a) Image 1 (see Figure 42)

Min. Neighbours Time(ms) False negative False positive

0 394 0 265

1 460 0 4

2 391 0 1

3 399 1 1

4 399 3 0

5 390 5 0

6 385 7 0

7 386 10 0

Page 64: People’s Control Over Their Own Image in Photossamuel.molinari.me/files/dissertation.pdf · People’s Control Over Their Own Image in Photos on Social Networks By Samuel Molinari

64

8 395 13 0

9 408 16 0

10 418 19 0

b) Image 2 (see Figure 43)

Min. Neighbours Time(ms) False negative False positive

0 575 0 60

1 572 0 1

2 579 0 1

3 591 0 1

4 586 0 0

5 585 0 0

6 586 0 0

7 564 0 0

8 585 0 0

9 592 0 0

10 573 0 0

c) Image 3 (see Figure 44)

Min. Neighbours Time(ms) False negative False positive

0 259 0 134

1 269 0 2

2 274 1 1

3 292 3 0

4 287 3 0

5 271 7 0

6 273 8 0

7 278 8 0

8 286 11 0

9 292 12 0

10 291 14 0

After analysing these results, it is clear that the restriction imposed by minimum neighbours doesn't affect the speed of the detection (see Figure 51); this is because it is applied as a post-processing step: the parameter prunes the candidate detections after the real detection has taken place.

As predicted, a low minimum neighbours value makes the grouping process less strict. This increases the number of false positives but lowers the number of false negatives, so fewer faces are missed (see Figure 50).
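The grouping step can be illustrated with a simplified sketch (OpenCV's actual rectangle grouping is more involved, and a real implementation would also merge clusters that touch; this only shows the principle):

```ruby
# Do two detection rectangles (hashes with :x, :y, :w, :h) overlap?
def overlap?(a, b)
  !(a[:x] + a[:w] < b[:x] || b[:x] + b[:w] < a[:x] ||
    a[:y] + a[:h] < b[:y] || b[:y] + b[:h] < a[:y])
end

# Cluster raw detections by overlap, drop clusters with too few members
# (the "minimum neighbours" restriction), and average the survivors.
def group_detections(rects, min_neighbours)
  clusters = []
  rects.each do |r|
    c = clusters.find { |cl| cl.any? { |m| overlap?(m, r) } }
    c ? c << r : clusters << [r]
  end
  clusters.select { |cl| cl.size > min_neighbours }.map do |cl|
    { x: cl.sum { |m| m[:x] } / cl.size, y: cl.sum { |m| m[:y] } / cl.size,
      w: cl.sum { |m| m[:w] } / cl.size, h: cl.sum { |m| m[:h] } / cl.size }
  end
end
```

With minimum neighbours 0, every isolated candidate survives (hence the many false positives measured above); raising the value discards weakly supported candidates, real faces included.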


Figure 50: Successful detections depending on minimum neighbours

Figure 51: Detection speed depending on minimum neighbours; speeds differ between images because of their different areas

6.2.4. Tilted Faces

The following tests check how the face detection handles tilted faces, with the face facing the

camera (frontal view). Only one image was used, making sure the face was well centred, and up-

right (90°, see Figure 52: Upright face (90°)) and the image was rotated 360°, 1° at a time to simulate

a tilted angle.



Figure 52: Upright face (90°)

Detection result for each 1° rotation of the image (1 = face detected, 0 = not detected), condensed by range:

Angle Detected

72°, 74°–108° 1

0°–71°, 73°, 109°–359° 0

The output shows that a face was only detected between its upright position (90°) and 108°, as well as between 74° and 89°. This represents only a small tilt of around ±17° from the upright position, covering only about 9% of all the possible angles a face can take in a photo.

When rotating the image to find faces with a tilt greater than ±15°, the detection improves, but the speed of detection gets dramatically worse, as shown in the test below:

Type Time(ms) False negative False positive

Without rotation 200 10 0

With rotation (30° intervals) 2912 2 2

Figure 53: Detection in an image with faces at different angles; without image rotation on the left, with image rotation on the right


6.2.5. Rotated Out of Image Pane Faces

This test shows how the face detection works with faces having different angles out of the image

pane.

The test below uses the following settings:

Increase of window size at each round by 1.1

Minimum window size of (0,0)

No limit on maximum window size

3 minimum neighbours

It appears that the detection stops working on faces where the second eye is hidden (see Figure 54).

Figure 54: Detection on faces with different profile angles

When using a different haarcascade provided by OpenCV, haarcascade_profileface.xml, the only faces detected are the ones looking towards the left (see Figure 55). It appears the image has to be mirrored (flipped horizontally) so the detector can find the faces looking towards the right. The benefit of using this haarcascade is small, as running two different haarcascades doubles the detection time.

Figure 55: Detection using haarcascade_profileface.xml

6.2.6. Lighting

This test shows how the face detection works under different lighting conditions; photos of the same person were taken under different lighting.

The test below uses the following settings:

Increase of window size at each round by 1.1


Minimum window size of (0,0)

No limit on maximum window size

3 minimum neighbours

The output is good, considering that all the false negatives (8 out of 44 faces) occurred in images with very high contrast, where the light hides more than 50% of the undetected face (see Figure 56).

Figure 56: Face under different lighting condition.

6.3. Face Recognition

6.3.1. Training Set Size

This test demonstrates how the size of the training set affects the recognition quality.

In this test, the training set of each label was incremented one by one for 24 rounds; the minimum distance values for each label were recorded at each round. The full training set used is shown below:


The image used to be recognised is:

Below is the output of the test:

Minimum distance value

# faces/label Label 0 Label 4

1 13,395.000 40.038

2 3,232.680 1,696.660

3 4,181.430 3,194.750

4 5,130.290 4,293.820

5 5,182.240 4,286.090

6 3,378.650 4,745.600

7 3,530.560 4,740.400

8 3,619.530 4,988.180

9 3,770.680 5,143.920

10 3,861.700 5,015.610

11 4,002.400 5,186.250

12 4,266.490 5,218.510

13 4,277.780 5,420.750

14 4,350.250 5,442.720

15 4,381.970 5,464.230

16 4,579.860 5,678.490

17 4,584.850 5,666.140

18 4,861.690 5,782.100

19 4,772.910 5,766.820

20 4,879.840 5,894.350

21 4,893.440 5,935.550

22 5,097.570 6,045.020

23 5,246.660 6,178.160

24 5,505.480 6,329.650

Figure: Minimum distance depending on the size of the training set (Labels 0 and 4)

Highlighted values are the ones defining the label for the recognised face: values in red gave a wrong recognition, values in green a correct one.

The graph shows how the recognition evolves as more and more faces are added to the training set. At the start, the recognition is very inaccurate, but it stabilises once each label has at least 6 faces in the training set; at this point, the recognition is at its most accurate, outputting the right label with the lowest minimum distance value. From that point onward, adding more faces still gives the correct label, but the minimum distance value gets slightly higher at each round.

Adding more than 6 faces per label in this test makes the recognition slightly less accurate, but this is balanced out by the fact that a larger set allows the recognition to handle many other different faces of the same person.
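The decision rule behind these figures is simply "the label with the lowest minimum distance wins", which can be sketched as follows (recognise is an illustrative name; the example values are the round-4 and round-6 figures from the table):

```ruby
# Pick the label whose minimum distance to the probe face is lowest.
def recognise(distances_by_label)
  distances_by_label.min_by { |_label, distance| distance }.first
end

recognise(0 => 5130.29, 4 => 4293.82) # round 4: label 4 wins (a mismatch)
recognise(0 => 3378.65, 4 => 4745.60) # round 6: label 0 wins (correct)
```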


6.3.2. Background

This test shows how background noise introduced into a training set affects the recognition of faces.

In these tests, two similar-looking people were used. The backgrounds of two of the faces were removed and replaced with the exact same high-contrast background:

The face highlighted in red was used as the face to be recognised in the next two tests.

Introducing similar background noise in the training set:

The training set was composed of the following faces with their matching labels:

Label 0:

Label 4:

With this training set, the matching label was 4 (with a minimum distance value of 4901.75), which was a mismatch.

Removing similar background noise:

When running the same test, but this time replacing the first image of the set of faces with label 4:

Label 4:

The recognition output the right label, 0 (with a minimum distance value of 5299.69).

These two tests have shown that introducing backgrounds or noise into a training set can mislead the recognition, making it much less accurate.

Removal of the background:

In the third test, the training set is reverted back to the one used in the first test. This time, the faces

were cropped to remove as much of the background as possible:


Label 0:

Label 4:

The image to be recognised has to be cropped in the same way as the faces in the training set:

The recognition output the right label, 0, with a much lower minimum distance value of 2129.95.

6.3.3. Equalise Histogram

This test demonstrates how equalising the histogram of the images used in the recognition affects the results.

The same test was used as the last one in section 6.3.2, except that the function equalizeHist() was run on every image/face used during the test:

Label 0:

Label 4:

The image used for the recognition is the one below:

The label output was the right one, 0, with a minimum distance value of 672.886. Compared with the result of the third test, where the value was 2129.95 (using exactly the same faces and image, but without equalising the histogram), this is a great improvement for the recognition.
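What equalizeHist() does can be sketched in pure Ruby: map each intensity through the normalised cumulative histogram so the levels spread across the full range (a simplified version of the standard algorithm, not OpenCV's exact implementation):

```ruby
# Histogram equalisation on an array of 8-bit greyscale pixel values.
def equalise_hist(pixels, levels = 256)
  hist = Array.new(levels, 0)
  pixels.each { |p| hist[p] += 1 }
  total = 0
  cdf = hist.map { |h| total += h; total } # cumulative histogram
  cdf_min = cdf.find { |v| v > 0 }         # first non-zero bin
  n = pixels.size
  return pixels.dup if n == cdf_min        # flat image: nothing to spread
  pixels.map { |p| ((cdf[p] - cdf_min) * (levels - 1).to_f / (n - cdf_min)).round }
end
```

For instance, equalise_hist([100, 100, 150, 150, 200, 200, 250, 250]) stretches the four mid-range levels out to [0, 0, 85, 85, 170, 170, 255, 255], which is why faces photographed under different lighting end up looking more alike to the recogniser.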


6.3.4. Straight Faces Only

This section demonstrates how having tilted faces in a training set affects the recognition. Three tests were run, using three different training sets:

Training set 1 (All faces tilted):

Training set 2 (Same faces as in training set 1, but all straight):

Training set 3 (A mix of both training sets above):

The images used for the test are a mix of tilted and non-tilted faces:

Results:

Minimum distance value

Training Set 1 Training Set 2 Training Set 3

Image 1 5196.29 3736.9 3954.41

Image 2 4262.91 3650.88 4149.29

Image 3 2270.36 2982.78 2397.42


Image 4 2891.65 2559.56 3052.56

Straightening up the faces of a training set improved the recognition accuracy by around 4% on average in the tests above. Mixing straight and tilted faces doesn't significantly improve the recognition.

Figure: Effect of tilted faces in the training set (minimum distance value per image for Training Sets 1–3)


6.4. Filters

6.4.1. Introduction

All the filters in the following tests are applied to the image below, on the character in the foreground.

Figure 57: © Copyright: AMC 2011

6.4.2. Blurring

This test demonstrates how the blur filter renders at different image sizes, using the same values for each size.

Below is the blurred face of the same image processed at different image sizes, with a radius of 5 and a sigma of 5:

1600x1280 1024x819 800x512 640x410 320x205

The blur renders differently at each size and hides the face most effectively when the image is smallest.


The next test is the same but uses dynamic radius and sigma values relative to the face's size: both are set to the face's width multiplied by a constant c:

c 1600x1280 1024x819 800x512 640x410 320x205

0.02

0.04

0.06

0.08

0.10

0.12

The blurring is now the same at any image size. A face blurred with a constant of 0.10-0.12 is barely recognisable.
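The dynamic parameters could be computed as below (blur_params is an illustrative name; it assumes, as described above, that both radius and sigma are the face width times the constant c):

```ruby
# Blur strength tied to the face's size rather than fixed values, so the
# rendered blur looks the same at every served image size.
def blur_params(face_width, c)
  r = (face_width * c).round
  { radius: r, sigma: r }
end

blur_params(200, 0.10) # full-size face    => { radius: 20, sigma: 20 }
blur_params(50, 0.10)  # quarter-size face => { radius: 5,  sigma: 5 }
```

Because the radius scales with the face, a face that occupies a quarter of the pixels receives a quarter of the blur radius, producing the size-invariant result shown in the table.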

6.4.3. Pixelate

This test demonstrates how the pixelate filter renders at different image sizes, using the same value for each size.

Below is the pixelated face of the same image processed at different image sizes, with a shrinking scale of 0.2:

1600x1280 1024x819 800x512 640x410 320x205

The pixelation renders differently at each size and hides the face most effectively when the image is smallest.


The next test is the same but uses a dynamic pixelation value, so that the number of pixels is the same on all the pixelated faces at every image size, with a shrinking scale of num_pixels / width, where num_pixels is the number of pixels to be displayed on the x axis over the face:

Num_pixels 1600x1280 1024x819 800x512 640x410 320x205

2

4

6

8

10

12

The pixelation is now the same at any image size. A face pixelated to 2, 4 or 6 pixels isn't very recognisable; from 8 upward, the face becomes more and more recognisable.
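Assuming the dynamic shrinking scale is num_pixels divided by the face width (consistent with the behaviour described above; shrink_scale is an illustrative name):

```ruby
# Shrinking the face region by num_pixels / width leaves num_pixels
# blocks across the face, whatever size the image is served at.
def shrink_scale(face_width, num_pixels)
  num_pixels / face_width.to_f
end

(100 * shrink_scale(100, 8)).round # => 8 blocks across a 100 px face
(25 * shrink_scale(25, 8)).round   # => 8 blocks across a 25 px face
```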

6.5. Privacy Settings

6.5.1. Introduction

These tests check that the privacy settings are applied properly. The following tests use three users:

User 1

User 2

User 3

6.5.2. Face Specific Settings

User 1 uploads a photo of himself and chooses to hide himself in that specific photo. User 1 has no global privacy settings.


i. User 1 is the owner of the photo

a) User 1 set his privacy setting to “none”

Output for User 1

Output for User 2 & 3

b) User 1 set his privacy setting to “rectangle”

Output for User 1


Output for User 2 & 3

c) User 1 set his privacy setting to “blur”

Output for User 1

Output for User 2 & 3


d) User 1 set his privacy setting to “pixelate”

Output for User 1

Output for User 2 & 3


ii. User 2 is the owner of the photo

The outputs are the same as above, except that for User 2, none of the privacy settings picked by User 1 are applied to the photo.

6.5.3. Auto Global Settings

User 1 is in the photo with User 3, and User 1 is following User 2:

i. User 1 chooses to hide from everybody

Output for User 1

Output for User 2

Output for User 3

ii. User 1 chooses to hide from everybody except the people who are in the photo

Output for User 1

Output for User 2

Output for User 3

iii. User 1 chooses to hide from everybody except the people he follows

Output for User 1

Output for User 2

Output for User 3

6.5.4. Auto User Specific Settings

User 1 uses the following settings:

User 1 wants to hide from a group of people that includes User 2 (Skyler White), and also whenever he is in a photo with anyone from another group of people that includes User 3 (Jesse Pinkman). In the photo used in this test, User 1 and User 3 are in the same photo:

Output for User 1

Output for User 2

Output for User 3
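The precedence these tests imply — the subject always sees themselves unfiltered, and a face-specific setting beats the automatic settings — can be sketched as follows. The setting names and the resolution order are assumptions drawn from the tests above, not the mock-up's actual code:

```ruby
# Hypothetical resolution of which filter a given viewer sees on a tagged
# face. Precedence (an assumption): subject first, then the per-photo
# face-specific setting, then the automatic hide lists.
def filter_for(face, viewer)
  return "none" if viewer == face[:subject]          # subjects see themselves
  return face[:face_setting] if face[:face_setting]  # per-photo choice wins
  return "none" unless face[:hidden_from].include?(viewer)
  face[:global_filter]                               # e.g. "pixelate"
end

face = { subject: "user1", face_setting: nil,
         hidden_from: ["user2"], global_filter: "pixelate" }
filter_for(face, "user1") # => "none"
filter_for(face, "user2") # => "pixelate"
filter_for(face, "user3") # => "none"
```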

7. Evaluation

7.1. Face Detection

The face detection works better than expected. When using the web application and mass uploading hundreds of photos, most faces were detected, except profile faces. The speed of detection was fast enough, though it varies with the uploaded photos, as some contain more elements than others, which can slow the detection right down.

7.2. Face Recognition

The face recognition often recognised the wrong user. To overcome this, the user must have a good number of trained faces to increase recognition precision. Contrary to face detection, face recognition took the longest, due to the time it takes for the script to project the known faces onto the face space. This takes longer and longer as the number of labelled faces grows.

7.3. Filters

The filters applied to the recognised faces worked very well. The pixelated filter looks the most effective at simultaneously hiding a face and preventing the original photo from being totally spoiled. The blur filter also looks good, but it takes the longest time to apply to a photo (blurring an image requires a lot of processing power in comparison to pixelation). The plain black rectangle filter is the most effective at covering the face; however, it leaves an ugly mark on the photo.

The object removal filter was to be implemented, but had to be abandoned due to time constraints. This filter also wouldn't have worked well, as it only removed the head and not the whole body. The filter replacing a person’s head was built but not implemented in the final mock-up.

7.4. Face Detection & Recognition within Web Application

The implementation of the script within the Ruby on Rails framework went very smoothly. The script was called as expected, and the JSON output made it very easy to turn the data into database records.

7.5. Notification

AJAX was used for generating notifications and worked well. The problem was that the notifications didn’t appear instantly after a photo was uploaded, because of the time it takes for the face detection and recognition to run.

7.6. Privacy Settings

The privacy settings offered to the users worked well. The interface itself could be improved because, currently, the system isn’t built to handle large lists of users to hide from.

7.7. Photo Viewing

The photo viewing system works extremely well when it comes to respecting users’ privacy. Generating an image depending on the viewer allows users to share the same link but receive different outputs depending on their privacy settings. The massive drawback of such a technique is the time it takes to generate each photo, as it requires a lot of processing power, especially when a user tries to view a large photo.

8. Conclusion

8.1. Project Overview

All the objectives set at the beginning of the project were achieved. Research on current social networks allowed the issues present when sharing a photo online to be narrowed down, and helped establish a few solutions to fix them. The survey gave some figures to visualise how each problem was perceived by the public, and helped confirm which ideas were to become requirements of the final system.

Understanding face detection was not as hard as expected. The most difficult part was understanding how the training of the Haar features worked using AdaBoost classification, but it became clearer after going through some online resources [11] [12] [13]. Eigenfaces was the most challenging element of this dissertation: writing about it in the background research required a good understanding of matrices and vectors, and its implementation took the longest.

The face detection and recognition were implemented using OpenCV, a very complete but complicated library; a lot of time was spent learning the C++ language and understanding OpenCV. The face detection and recognition were originally supposed to be implemented using the libface library, but this idea was abandoned after realising that the library didn’t produce good enough results. Nevertheless, it helped in understanding the OpenCV library, and made it possible to build the face detection and recognition with OpenCV instead.

Thanks to the very complete object detection function and Haar cascades provided by OpenCV, time was saved by avoiding the process of training the Haar features. As the detection rate was poor, due to an inability to detect tilted faces or to rotate images without cropping them, a function had to be written to overcome these problems. This turned out to be a very challenging step in the implementation of face detection, because it required a good knowledge of maths, geometry and trigonometry.

The newest version of OpenCV made the implementation of Eigenfaces for the recognition much easier, and finding an add-on also helped a lot, but it was still one of the most challenging parts of the project. Much time was spent trying to improve the recognition (finding a way of removing the background, equalising the histogram, etc.).

Prior knowledge of Ruby, together with the complete and simple RMagick API, made the implementation of the filters quite a short process. The most challenging side was finding functions to achieve the wanted effects, and striking a middle ground between keeping a user anonymous and not destroying the look of a photo.

Finally, implementing all the previously built scripts into a final mock-up system using the Ruby on Rails framework made the task much easier. Ruby on Rails allows the developer to experiment with creating and removing models and controllers in seconds without affecting the rest of the framework. It also allows reverting to previous database versions, or improving the current one, through the use of migrations, so fields can be added to or removed from existing tables.

The final mock-up system did achieve the aims set at the beginning of the project. It provided users with better awareness of when photos of them were uploaded, and allowed them more control over how their images were shared on social networks. The section below introduces some aspects of the current system that could be improved to produce a better solution to the aim of the project.

8.2. Further Work

8.2.1. Detect Profile Faces + Improve Detection of Tilted Faces

As described in section 4.3.3, there is a more efficient way to improve the detection of tilted faces, covering a full 360° face rotation and greatly improving the detection of profile faces at the same time. The source code of the OpenCV library would need to be extended by implementing the newly needed Haar-like features and training them with a set of rotated and profile faces.

8.2.2. Replace Eigenfaces with Fisherfaces

Eigenfaces is a really good face recognition algorithm, but it is greatly affected by changes in lighting and facial expression. The solution would be to replace Eigenfaces with Fisherfaces, which produces better recognition even under strong variations in lighting and changes in facial expression.

8.2.3. Improve Recognition Speed

At the moment, the CSV files are regenerated at each photo upload, which isn’t needed when a user hasn’t labelled a face since the last time their CSV file was created. The file timestamp should be compared against the last face added to the database for that user: if that face is more recent than the CSV file, then the CSV file should be updated.

The second improvement is to find a way to reuse the last recognition projections and add new faces to them, so the script doesn’t need to rebuild the face space each time it is called, as this is a very costly process and the main reason why the face recognition is slow.
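The staleness check described above could look like the following sketch; the file layout and method name are assumptions, not the dissertation's code:

```ruby
# Hypothetical staleness check for a user's face-label CSV: rebuild it only
# when a face has been labelled since the file was last written.
def csv_stale?(csv_path, last_face_labelled_at)
  return true unless File.exist?(csv_path) # never generated yet
  File.mtime(csv_path) < last_face_labelled_at
end
```

With this guard in place, an upload involving a user whose labels haven't changed skips the CSV rebuild entirely.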

8.2.4. Detect Full Body

The face is the only part of the body to be detected and covered. In some extreme cases, some users might want to hide their full body and not only their face. OpenCV provides a Haar cascade for detecting full bodies, which could be used to detect not only faces but a user’s whole body. The user could then choose to hide their face only, or their entire body.

8.2.5. Object Removal

This was one of the requirements but wasn’t implemented. It would still be worthwhile to find a good way to remove someone from a photo by replacing them with the background.

8.2.6. Caching System

Currently, when a user wants to view a photo, a brand new image is generated specifically for that request. This takes time and uses a lot of processing power, which wouldn’t be suitable for a social network used by millions of people. The solution would be to keep cached copies of the generated photos, so a fresh one wouldn’t need to be generated for every request.
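One way to key such a cache — an assumption, not the mock-up's code — is to hash the photo together with the resolved set of face filters, so that viewers who would see the same output share the same cached image:

```ruby
require "digest"

# Sketch of a cache key for privacy-filtered photo renditions.
# applied_filters: e.g. [[face_id, "pixelate"], ...]; sorted so the same
# filter set always yields the same key, whoever the viewer is.
def rendition_key(photo_id, applied_filters)
  Digest::SHA1.hexdigest([photo_id, applied_filters.sort].join("|"))
end
```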

8.2.7. WebSocket

Notifications are currently handled via AJAX calls. When a user is logged in, an HTTP request is sent to the server every 3 seconds, the server replies with an HTTP response, and the connection is closed, then re-opened 3 seconds later. This is an unnecessary use of traffic, especially if the user never gets recognised in a photo. By replacing AJAX polling with WebSockets, the amount of data transferred is greatly reduced (see Appendix I – WebSocket v. AJAX Polling).

Bibliography

[1] Chris Putnam. (2010, February) Facebook Blog. [Online].

https://blog.facebook.com/blog.php?post=206178097130

[2] Henry A. Rowley, Shumeet Baluja, and Takeo Kanade, "Neural Network-Based Face Detection,"

January 1998.

[3] Henry Schneiderman and Takeo Kanade, "A Statistical Method for 3D Object Detection Applied

to Faces and Cars".

[4] Peter Müller and Brani Vidakovic, Wavelets for Kids: A Tutorial Introduction. Durham, NC, United

States of America: Institute of Statistics and Decision Sciences, Duke University, 1994.

[5] Paul Viola and Michael Jones, "Rapid Object Detection using a Boosted Cascade of Simple

Features," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol.

1, p. 511, 2001.

[6] Reiner Lienhart and Jochen Maydt, "An extended set of Haar-like features for rapid object

detection," Proceedings International Conference on Image Processing, vol. 1, no. 1, pp. 900-

903, 2002.

[7] Yoav Freund and Robert E. Schapire, "A Short Introduction To Boosting," Journal of Japanese

Society for Artificial Intelligence, vol. 14, no. 5, pp. 771-780, September 1999.

[8] Peter N. Belhumeur, Joao P. Hespanha, and David J. Kriegman, "Eigenfaces vs. Fisherfaces:

Recognition Using Class Specific Linear Projection," in European Conference on Computer Vision,

1996.

[9] Matthew Turk and Alex Pentland, "Eigenfaces for Recognition," Journal of Cognitive

Neuroscience, vol. 3, no. 1, 1991.

[10] Paul Viola and Michael Jones, "Fast Multi-view Face Detection," Mitsubishi Electric Research

Laboratories, 2003.

[11] Robert Schapire. (2005, May) Boosting. Video. [Online].

http://videolectures.net/mlss05us_schapire_b/

[12] Jan Sochman and Jiri Matas. AdaBoost. Presentation. [Online].

http://cmp.felk.cvut.cz/~sochmj1/adaboost_talk.pdf

[13] Raul Rojas. (2009, Dec.) AdaBoost and the Super Bowl of Classifiers: A Tutorial Introduction to

Adaptive Boosting. Document. [Online]. http://www.inf.fu-berlin.de/inst/ag-ki/adaboost4.pdf

[14] Peter Lubbers, Frank Salim, and Brian Albers, Pro HTML5 Programming, 2nd ed.: Apress, 2011.

[15] Robert Laganière, OpenCV 2 Computer Vision Application Programming Cookbook. Birmingham,

United Kingdom: Packt Publishing, 2011.

[16] Gary Bradski and Adrian Kaehler, Learning OpenCV, 1st ed. Sebastopol: O'Reilly Media, Inc.,

2008.

Appendices

Appendix A – Questionnaire

Results

Appendix B – Compute Pixel Region under Haar Feature without Integral Image

The following formula gets the value of a feature, where n is the number of regions the feature has and P(x, y) is the value of the pixel at position x and y:

feature = Σ_{i=1}^{n} ( Σ_{x=x_i}^{x_i+w_i} Σ_{y=y_i}^{y_i+h_i} P(x, y) ) · weight_i

Each bracketed term is a region’s value – the sum of all the pixels that the region covers – which is then multiplied by the weight of that region.
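The formula above translates directly into a naive double loop; the region representation below (a hash of position, size and weight) is illustrative:

```ruby
# Naive evaluation of a Haar-like feature without an integral image: sum
# every pixel under each region, weight it, and accumulate the results.
def feature_value(pixels, regions)
  regions.sum do |r|
    region_sum = 0
    (r[:y]...(r[:y] + r[:h])).each do |y|
      (r[:x]...(r[:x] + r[:w])).each { |x| region_sum += pixels[y][x] }
    end
    region_sum * r[:weight]
  end
end
```

This is the O(width × height) cost per region that the integral image of Appendix E reduces to four look-ups.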

Appendix C – Application of a Haar feature over an image

The following example gives an idea of how the Haar-like features are applied to an image of size 20x20 pixels. The feature used in this example is shown in Figure 58.

Figure 58: Haar-like feature used in the example

The image used is represented in a matrix of 20 columns and 20 rows, where each element represents the value of a pixel (its colour). (The 20x20 matrix of pixel values is not reproduced here.)

When the feature is applied to the image, it can be visually represented as shown on the left: the sum of all the pixels lying under the black region will have a weight of −1, and the ones lying under the white region will have a weight of 2.

(Matrix with the black region’s elements highlighted, not reproduced.) The double sum Σx Σy P(x, y) over the highlighted elements is the value of the black region.

(Matrix with the white region’s elements highlighted, not reproduced.) The same double sum over its highlighted elements is the value of the white region.

Haar-like features translate a region of an image into data. The next step is to turn that data into useful information. The best way to do this is to train the machine to understand the data, using machine learning.

Appendix D – AdaBoost Example

The following example, taken from the lecture given by Robert Schapire (the Toy Example, at 0:20:00 into the video) [11], helps to visualise how AdaBoost works. Each distribution is represented on a canvas, where the (+) are the positive labelled instances and the (-) are the negative labelled instances. The size of each label represents its weight. In the first distribution, all instances have the same weight. The weak classifiers are represented by horizontal or vertical half-planes:

Figure 59: the weak classifier h1 in the distribution D1 has an error ε1 – the sum of the weights of the misclassified instances (circled) in this classification – and a vote α1 = ½ ln((1 − ε1) / ε1).

Figure 60: all the instances misclassified in the previous round have a new weight D2(i) = D1(i) · e^α1, i.e. they are multiplied by √((1 − ε1) / ε1); the correctly classified positives and negatives get D2(i) = D1(i) · e^(−α1), and the weights are then normalised. The weak classifier h2 in D2 has error ε2 and vote α2 = ½ ln((1 − ε2) / ε2).

Figure 61: likewise, the instances misclassified by h2 have a new weight D3(i) = D2(i) · e^α2, and the correctly classified ones D3(i) = D2(i) · e^(−α2), normalised again; the weak classifier h3 in D3 has error ε3 and vote α3.

Provided each weak classifier’s error rate is lower than ½, this is good enough, and after these rounds no more stages are required.

Figure 59: First Distribution (D1)

Figure 60: Second Distribution (D2)

Figure 61: Third Distribution (D3)

Once all the rounds have been executed, the final classifier Hfinal = sign(α1·h1 + α2·h2 + α3·h3) is calculated as shown in Figure 62.

Figure 62: Abstract representation of Hfinal

AdaBoost combines a few weak classifiers, producing a fairly reliable final classifier.
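One reweighting round of the scheme above can be sketched as follows (an illustrative implementation of the standard update, not the lecture's own code):

```ruby
# One AdaBoost round: compute the vote alpha from the weighted error of the
# weak classifier, then reweight (up for misclassified instances, down for
# correct ones) and renormalise the distribution.
def adaboost_round(weights, misclassified)
  eps = misclassified.sum { |i| weights[i] }
  alpha = 0.5 * Math.log((1 - eps) / eps)
  new_w = weights.each_with_index.map do |w, i|
    misclassified.include?(i) ? w * Math.exp(alpha) : w * Math.exp(-alpha)
  end
  z = new_w.sum
  [alpha, new_w.map { |w| w / z }]
end
```

A known property of this update is that, after normalisation, the misclassified instances carry exactly half of the total weight, which is why each round focuses the next weak classifier on the previous round's mistakes.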

Appendix E – Example of the Creation of an Integral Image

Below is an example of how an integral image is built using an image of size 5x3 pixels, and how the

integral image is used when doing a detection.

Figure 63: Pixels and locations of the image. The grey area is the "void" needed to calculate the locations such as (A,A). The value of the overflow is equal to 0 (black).

For starters, all locations with the two following formats – (x, A) and (A, y) – take the value 0 (the value of the “void” area). To calculate any other location:

I(x, y) = p(x, y) + I(x − 1, y) + I(x, y − 1) − I(x − 1, y − 1)

where x − 1 and y − 1 are the previous letters on the x and y axes, and p(x, y) is the value of the pixel at position (x, y). So, using the instructions above, each location can be filled in one after the other; the table below contains the value of each location, acting as the integral image of the one shown in Figure 63.

        A     B     C     D     E     F
A       0     0     0     0     0     0
B       0     0   255   255   510   765
C       0   255   765   765  1275  1785
D       0   255   765  1020  1785  2550

Table 2: Location values (columns are x locations A–F, rows are y locations A–D)

The computation of the sum of the pixels lying under any region of this image has now become constant. As an example, the sum of the pixels of a region of width 3px and height 2px with corner locations (B,B), (E,B), (B,D) and (E,D) is equal to I(E,D) + I(B,B) − I(E,B) − I(B,D) = 1785 + 0 − 510 − 255 = 1020, four look-ups instead of computing Σx Σy p(x, y) directly, which would require width × height iterations.
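The construction and the four-look-up region sum above can be written as a short sketch (with an explicit zero row and column playing the role of the "void" area):

```ruby
# Build an integral image with a zero "void" row and column, mirroring the
# recurrence I(x, y) = p(x, y) + I(x-1, y) + I(x, y-1) - I(x-1, y-1).
def integral_image(pixels)
  h = pixels.length
  w = pixels[0].length
  ii = Array.new(h + 1) { Array.new(w + 1, 0) }
  (1..h).each do |y|
    (1..w).each do |x|
      ii[y][x] = pixels[y - 1][x - 1] + ii[y - 1][x] + ii[y][x - 1] - ii[y - 1][x - 1]
    end
  end
  ii
end

# Sum of the w x h region whose top-left pixel is (x, y): four look-ups,
# regardless of the region's size.
def region_sum(ii, x, y, w, h)
  ii[y + h][x + w] + ii[y][x] - ii[y][x + w] - ii[y + h][x]
end
```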

Appendix F – Steps Undertaken to Implement a Non-Cropping Rotation

OpenCV provides a way to rotate matrices, but one of the issues encountered was that the canvas

stayed the same size, meaning that rotating an image of size 800x600 pixels by 90° would crop the

image: the canvas would stay the same size (800x600 pixels) and the rotated image would be of size

600x800 pixels, resulting in the loss of 600x100 pixels at the top and bottom (see Figure 64).

Figure 64: Cropped image when using rotation; areas in red are cropped out, areas in black are blank

To solve this issue, the canvas size had to be calculated before the rotation was applied on the

matrix. The canvas size can be computed by finding out the position of all the extremities of the

rotated image:

Using an image I with corners a, b, c and d (the points found at each corner of the image), where o is the centre of I (the point of rotation). To compute the canvas size, two consecutive points have to be tracked; in the example, the points a and b were used, but it could be any of the pairs (a, b), (b, c), (c, d), (d, a) (see Figure 65).

To track those two points, the length r between the centre of the image and any of the corners ((o, a), (o, b), (o, c), (o, d)) is needed; to find that length, Pythagoras’ theorem can be applied:

r = √((w / 2)² + (h / 2)²)

The second asset needed to track the rotations of those points is the base angle of a corner:

β = tan⁻¹((h / 2) / (w / 2))

Once all those values are computed, the canvas height and width can be calculated using the sine and cosine functions; the values they return are only half of the canvas height and width, therefore they have to be multiplied by two to obtain the whole values.

When the rotating angle (θ) is between 0° and 90° or 181° and 270°:

height = 2 · r · |sin(θ + β)|
width = 2 · r · |cos(θ − β)|

When the rotating angle (θ) is between 91° and 180° or 271° and 359°:

height = 2 · r · |sin(θ − β)|
width = 2 · r · |cos(θ + β)|

Figure 65: Image I

The second issue arises during the rotation process: when the image is rotated, it is then copied onto the new matrix (canvas), which still crops the newly rotated image (see Figure 66).

The solution was to add the extra margins (top, bottom, left and right) needed directly to the matrix to be rotated, without ever cropping any part of the image itself. For example, an image of size 800x600 pixels, rotated by a 90° angle, would take the size 600x800 pixels; a border of 100px should be added at the top and the bottom of the image ((800 − 600) / 2 = 100), and the left and right sides shouldn’t be changed because 800 > 600.

Figure 66: Rotated image with the right canvas size but badly positioned

Once the new matrix is ready, the following steps have to be followed to get the point of rotation so

the picture can fit in the canvas:

Figure 67: Overview of how to find the point of rotation for it to land in the wanted area.
Orange: the current corners’ positions.
Yellow: future position of the image after rotation.
Blue: perpendiculars to the lines between the current and the future positions of the corners, crossing the centre of these lines.
Pink: point of rotation, found at the intersection of all perpendiculars.

Calculate the current positions of all the corners in the new canvas:

a = (marginLeft, marginTop)
b = (marginLeft + w, marginTop)
c = (marginLeft + w, marginTop + h)
d = (marginLeft, marginTop + h)

Using the image centre as the current point of rotation, calculate the positions of all the corners after rotation:

p′ = (o_x + r · cos(φ + sθ), o_y + r · sin(φ + sθ))

where p′ is the new position of a point p, s is the rotation value, used to change the sign of θ, and φ is the extra degrees depending on where the point p is placed in the unit circle (one value per quadrant).

Find the two points that are the closest to the top and to the left corner of the canvas (coordinates (0, 0)); take the x value of the one closest to the left and the y value of the one closest to the top. With those values, move all the projected corner positions toward the top left. For any corner p:

p′ = (p′_x − x_min, p′_y − y_min)

Compute the coordinates of the centre point M between any point p and its projection p′:

M = ((p_x + p′_x) / 2, (p_y + p′_y) / 2)

Find the gradient of the line (p p′) of linear equation y = mx + c, where m is the gradient and c is the constant:

m = (p′_y − p_y) / (p′_x − p_x)

There are a few special cases to be considered, which require different approaches:

o When p_x = p′_x and p_y = p′_y, meaning both points overlap each other, m can’t be calculated, because 0 can’t be divided by 0.
o When p_x = p′_x and p_y ≠ p′_y, meaning the line (p p′) is vertical, m can’t be calculated, because the difference in y can’t be divided by 0.
o When p_y = p′_y and p_x ≠ p′_x, meaning the line (p p′) is horizontal, m = 0. Due to m being equal to 0, it won’t be possible to calculate the perpendicular.

The next step is to get the linear equation for the perpendicular to the line (p p′), passing through the centre M. The general formula for all cases except the three described above is:

m⊥ = −1 / m
c⊥ = M_y − m⊥ · M_x
perpendicular: y = m⊥ · x + c⊥

o In the first exception, there is no need to find the perpendicular: the point of rotation is simply the centre of the canvas, (w / 2, h / 2).
o In the case of finding the perpendicular of a vertical line, the perpendicular must be horizontal: y = M_y.
o In the case of finding the perpendicular of a horizontal line, the perpendicular must be vertical: x = M_x.

The final part is to get the point of intersection between two perpendiculars; the two perpendiculars must originate from two consecutive points. The following pairs would be valid to find the point of intersection: (a⊥, b⊥), (b⊥, c⊥), (c⊥, d⊥), (d⊥, a⊥). In all cases, except the two exceptions (horizontal and vertical lines), the formula to find the point of intersection between two lines y = m₁x + c₁ and y = m₂x + c₂ is as follows:

x = (c₂ − c₁) / (m₁ − m₂)
y = m₁ · x + c₁

o When dealing with a horizontal perpendicular line (y = M_y), substitute y into the other line’s equation: x = (M_y − c) / m.
o When dealing with a vertical perpendicular line (x = M_x), substitute x: y = m · M_x + c.

These steps only have to be followed when the canvas height after rotation is found to be smaller than the image height, and likewise for the width. A video was made to show the rotation process of an image using the previously described steps.27
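A compact closed form equivalent to the corner-tracking derivation above (an equivalence under the same geometry, not the dissertation's implementation) computes the bounding box of the rotated image directly:

```ruby
# Width and height of the bounding box of a w x h image rotated by
# theta degrees: project both side lengths onto each axis and add.
def rotated_canvas(w, h, theta_deg)
  t = theta_deg * Math::PI / 180
  new_w = (w * Math.cos(t).abs + h * Math.sin(t).abs).round
  new_h = (w * Math.sin(t).abs + h * Math.cos(t).abs).round
  [new_w, new_h]
end

rotated_canvas(800, 600, 90) # => [600, 800], as in the example above
```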

Appendix G – Tracking a Detected Face During Rotation

The following steps show how to track faces whilst rotating the image:

At rotation of angle 0°, save the detected faces in a final list of detected objects

For each rotation of angle θ, the positions of the currently detected objects have to be re-computed, so their positions can be translated as if the image hadn’t been rotated (see Figure 68):

27 http://youtu.be/c0CfTR-rkTc

Figure 68: Track detected face in a rotated photo

o Instead of tracking the region, get the centre point c of this region, to make tracking easier:

c = (x + w / 2, y + h / 2)

o Compute the position of c in reference to the centre o of the rotated image:

c′ = (c_x − o_x, c_y − o_y)

o Calculate r, the distance between the centre of the rotated image o, and c:

r = √(c′_x² + c′_y²)

o Get the angle φ of c:

φ = tan⁻¹(c′_y / c′_x), adjusted for the quadrant in which c′ lies

o Compute the normal position of c in reference to the centre of the image:

c″ = (r · cos(φ − θ), r · sin(φ − θ))

o Finally, compute the position in reference to the top left corner of the image:

c‴ = (c″_x + w / 2, c″_y + h / 2)

Figure 69: Tracked point matches position of the tracked face

Once the original position of the detected face is found, it has to be checked against the objects already stored in the final list. If the point isn’t found within any stored object’s region, the face can be added to the list; otherwise, the stored object’s region can be compared with the new one, to find which is the most suitable for this area of detection. To do so, each time an object is added to the list, the average area of all the regions belonging to the list is updated; using this average, the difference of each of the two compared regions is calculated, and the one with the lowest value replaces the other in the list.
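The back-rotation of a detected centre point described above can be sketched as follows; the coordinate convention (origin at the top left, θ the applied rotation in degrees) is an assumption:

```ruby
# Map the centre point (cx, cy) of a region detected in an image rotated by
# theta degrees back to the unrotated image's frame: express the point in
# polar form around the centre, subtract the rotation, convert back.
def unrotate_point(cx, cy, img_w, img_h, theta_deg)
  t = theta_deg * Math::PI / 180
  dx = cx - img_w / 2.0
  dy = cy - img_h / 2.0
  phi = Math.atan2(dy, dx)      # angle of the point, quadrant-aware
  r = Math.hypot(dx, dy)        # distance from the centre
  [img_w / 2.0 + r * Math.cos(phi - t),
   img_h / 2.0 + r * Math.sin(phi - t)]
end
```

Using atan2 avoids the explicit per-quadrant adjustment of φ listed in the steps above.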

Appendix H – Straight Up Tilted Faces

Using trigonometry, an accurate value of the tilted angle can be calculated following a few steps:

Figure 70: Steps used to get a face's tilted angle using the eyes' coordinates © Copyright: AMC 2011

First, find out which of the detected regions is the left eye and which is the right eye. The coordinates of the centre point of each region have to be calculated, where c is the centre point of a region. If the centre of the first region has a smaller x value than the second, the first region is the left eye l and the second is the right eye r; otherwise, the first is the right eye r and the second is the left eye l. If the eyes happened to be on the same vertical axis, they would be invalid, as the detector can’t detect faces tilted with an angle of ±90°.

Secondly, the centre m between the two eyes has to be found:

m = ((l_x + r_x) / 2, (l_y + r_y) / 2)

Then find the distance d between m and l (or r):

d = √((m_x − l_x)² + (m_y − l_y)²)

Finally, the tilted angle α can be calculated:

α = sin⁻¹((l_y − m_y) / d)

If α > 0, it means the face is tilted toward the left. If α < 0, it means the face is tilted toward the right. If α = 0, the face isn’t tilted.

To straighten the face back up, a simple rotation of −α should do the trick.
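The same tilt angle can be obtained in one step with atan2, which handles the quadrant bookkeeping; this is an equivalent formulation (up to sign convention), not the dissertation's code:

```ruby
# Tilt angle of the eye line in degrees, from the two eye centres given as
# [x, y] pairs; 0 means the eyes are level.
def tilt_angle_deg(left_eye, right_eye)
  dy = right_eye[1] - left_eye[1]
  dx = right_eye[0] - left_eye[0]
  Math.atan2(dy, dx) * 180 / Math::PI
end

tilt_angle_deg([0, 0], [10, 0]) # => 0.0 (level eyes)
```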

Appendix I – WebSocket v. AJAX Polling

Figure 71: Comparison of the unnecessary network overhead between the polling and the WebSocket traffic [14]

Use case A: 1,000 clients polling every second: network traffic is (871 × 1,000) = 871,000 bytes = 6,968,000 bits per second (6.6 Mbps)

Use case B: 10,000 clients polling every second: network traffic is (871 × 10,000) = 8,710,000 bytes = 69,680,000 bits per second (66 Mbps)

Use case C: 100,000 clients polling every second: network traffic is (871 × 100,000) = 87,100,000 bytes = 696,800,000 bits per second (665 Mbps)

Use case A: 1,000 clients receive 1 message per second: network traffic is (2 × 1,000) = 2,000 bytes = 16,000 bits per second (0.015 Mbps)

Use case B: 10,000 clients receive 1 message per second: network traffic is (2 × 10,000) = 20,000 bytes = 160,000 bits per second (0.153 Mbps)

Use case C: 100,000 clients receive 1 message per second: network traffic is (2 × 100,000) = 200,000 bytes = 1,600,000 bits per second (1.526 Mbps)
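The figures above can be reproduced with a few lines, using the per-message byte counts from [14] (871 bytes of HTTP polling overhead versus a 2-byte WebSocket frame):

```ruby
# Traffic in bits per second for a given per-message overhead and client count.
def traffic_bps(bytes_per_message, clients)
  bytes_per_message * clients * 8
end

[1_000, 10_000, 100_000].each do |clients|
  polling = traffic_bps(871, clients)
  websocket = traffic_bps(2, clients)
  puts "#{clients} clients: polling #{polling} bps, websocket #{websocket} bps"
end
```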