
Analysis of evaluation and benchmarking results

[M18]

Deliverable D4.3

Producer File ID: WP4_D4.3.docx
Version: 1.0
Deliverable number: D4.3
Authors: Kemal Görgülü (FE), Hans-Peter Richter (FE)
Contributors: Christopher Ververidis (HIT), Panagiotis Sabatakos (HIT), Ioanna Roussaki (ICCS), Sergio Ghizzardi (DOMINO), Simona Tonoli (MEDIASET), Cecilia Vanella (MEDIASET), Laura Zotta (MEDIASET), Nikos Kalatzis (ICCS), Giorgos Mitsis (ICCS), Giorgos Marinellis (ICCS), Pasquale Panuccio (FINCONS), Louay Bassbouss (FRAUNHOFER)
Internal reviewers: Nikos Kalatzis (ICCS), Sergio Ghizzardi (DOMINO)
Work Package: WP4
Task: T4.3
Nature: R – Report
Dissemination: PU – Public
Status: Draft
Contractual Delivery date: 30.06.2018


Versions and Controls

| Version | Date | Reason for change | Editor |
|---|---|---|---|
| 0.1 | 30.11.2017 | Initial structure | Hans-Peter Richter (FE) |
| 0.2 | 27.05.2018 | First chapters filled | Hans-Peter Richter (FE) |
| 0.3 | 18.06.2018 | Input from all contributing partners inserted | All contributing partners |
| 0.4 | 05.07.2018 | Contribution from Sunny Side of the Doc benchmarking event | Sergio Ghizzardi (DOMINO) |
| 0.5 | 06.07.2018 | Contribution from Sunny Side of the Doc benchmarking event | Cecilia Vanella & Simona Tonoli (MEDIASET-RTI) |
| 0.6 | 07.07.2018 | Restructuring of internal & external benchmarking chapters | Kemal Görgülü (FE) |
| 0.7 | 08.07.2018 | Consolidation of all inputs from the partners | Kemal Görgülü (FE) |
| 1.0 | 09.07.2018 | Formatting and proof-reading | Kemal Görgülü (FE) & Nikos Kalatzis (ICCS) |


Table of Contents

Executive Summary
1. Introduction
1.1 Background
1.2 Context
1.3 Objectives
1.4 Tasks
1.5 Methodology
1.6 Structure
2. Terms and abbreviations
3. Description of the prototype used for benchmarking
3.1 Description of tools and system integration level
3.2 Content used for benchmarking
4 Internal benchmarking
4.1 Scope and objective
4.2 Methodology
4.3 Results
4.3.1 Benchmarking against the user requirements
4.3.2 Objective benchmarking of ICCS tools
4.3.2.1 Automatic Annotation tool
4.3.2.2 Social Recommendation and Personalization tool
4.3.2.3 Integrated Trends Discovery tool – Gender Inference algorithm
4.3.3 Objective benchmarking of FOKUS tools
4.3.4 Subjective benchmarking by project partners
4.3.4.1 Subjective benchmarking by MEDIASET
4.3.4.1.1 MCSSR
4.3.4.1.2 ABT
4.3.4.2 Subjective benchmarking by DOMINO Production
4.3.4.2.1 ITDT
4.3.4.2.2 ABT
4.3.4.2.3 IEVCT
4.3.4.3 Subjective benchmarking of SSF by Fraunhofer FOKUS
5 External benchmarking
5.1 Scope and objectives
5.2 Methodology
5.3 Results
5.3.1 Strategic benchmarking by industry experts and professional users at NAB 2018 and the FKTG annual symposium 2018
5.3.2 Subjective benchmarking by professional users and system integrators at the FKTG Symposium 2018
5.3.3 Subjective benchmarking by professional users at the Sunny Side of the Doc market 2018 by MEDIASET
5.3.4 Subjective benchmarking by professional users at the Sunny Side of the Doc market 2018 by DOMINO
5.3.5 Subjective benchmarking by students for part of the tool set (ICCS)
5.3.5.1 Automatic Annotation tool
5.3.5.2 Integrated Trends Discovery Tool
5.3.5.3 Social Recommendation and Personalization tool
6 References


List of Figures

Figure 1: Producer Workflow and System Architecture
Figure 2: Object detection comparison
Figure 3: Effect of image resolution
Figure 4: Object detection performance comparison
Figure 5: Discounted cumulative gain versus time with respect to theta values
Figure 6: R-score versus time with respect to theta values
Figure 7: Profile distance versus time with respect to different similarity measures
Figure 8: Discounted cumulative gain versus time with respect to different similarity measures
Figure 9: R-score versus time with respect to different similarity measures
Figure 10: Accuracy and Coverage for PNN and SVM Hybrid Classifiers on identifying Twitter user gender
Figure 11: Bandwidth evaluation
Figure 12: 360° Pre-Rendering
Figure 13: Game of Truth Billboard
Figure 14: Game of Truth using ABT
Figure 15: Green Gold banner
Figure 16: Green Gold billboard
Figure 17: Green Gold using ABT
Figure 18: Yamal using IEVCT
Figure 19: Yamal Screenshots for SSF
Figure 20: Yamal Screenshots for SSF
Figure 21: Yamal Screenshots for SSF
Figure 22: Yamal Screenshots for SSF
Figure 23: 360VPT on multiple TV Screens for Playout
Figure 24: Age & gender of testers
Figure 25: Employment status and nature of current occupation of testers
Figure 26: Education and multimedia metadata experience
Figure 27: Ease of logging in and navigation (dashboard)
Figure 28: Ease of accessing tools from the dashboard and user-friendliness of dashboard
Figure 29: Rating the number of data transmitted between the tools and speed of data exchange
Figure 30: Fit of tools to testers' requirements
Figure 31: Improvement areas for the PRODUCER prototype collected at FKTG Symposium 2018
Figure 32: Age & gender of testers
Figure 33: Employment status and nature of current occupation of testers
Figure 34: Education and multimedia metadata experience
Figure 35: Level of experience with editing or tools for production
Figure 36: Overall opinion of the software about quality, ease of use, usefulness, meets requirements, meets expectations
Figure 37: Gender of the testers
Figure 38: Age of the testers
Figure 39: Employment status of the testers
Figure 40: Employment status of the testers
Figure 41: Education status of the testers
Figure 42: Experience with multimedia tools
Figure 43: Age & gender of AAT testers
Figure 44: General demographic information about AAT testers
Figure 45: Feedback for AAT
Figure 46: Feedback for AAT functionalities
Figure 47: Expectations for AAT
Figure 48: Gender, age & education of testers
Figure 49: Occupation of testers
Figure 50: Nature of occupation
Figure 51: Experience and hours per day utilisation of Social Media services
Figure 52: Number of connections and preferred social media service
Figure 53: Why do you use online social networks?
Figure 54: "To which extent your interactions with social media are contributing in formulating your opinion for various societal issues?"
Figure 55: "Do you think that Social Media analytics can support the extraction of information regarding public opinion (similar to the information extracted via opinion polls by survey companies)?"
Figure 56: What is your level of experience in using tools that attempt to discover and process popularity/trends in Social Media and Search Engines
Figure 57: Ethical issues on social media opinion mining
Figure 58: Ease of creating a new query at the "Add Query Parameters" page of the tool
Figure 59: Ease of managing the existing "Query Descriptions" page
Figure 60: Ease of producing results / reports
Figure 61: Ease of reading the results
Figure 62: How user-friendly is the Integrated Trends Discovery Tool?
Figure 63: How successful is the Integrated Trends Discovery Tool in performing its intended tasks
Figure 64: Meets expectations as these are defined in the innovations list presented upon video start
Figure 65: Evaluate overall software quality
Figure 66: The Integrated Trends Discovery Tool provides various reports. Which are the more useful for you?
Figure 67: Estimation of cost in order to utilise the ITD tool in a business environment
Figure 68: a) Education level b) Current Occupation
Figure 69: Level of experience with Social Recommendation and Personalization Tools (1: no experience, 5: much experience)
Figure 70: a) Difficulty of adding data to the system (1: very difficult, 5: very easy) b) Were the data needed by the system too much?
Figure 71: Matching of the generated with the expected user's profile (1: unacceptable, 5: excellent)
Figure 72: Matching of the recommended videos to the user's expectations (1: unacceptable, 5: excellent)
Figure 73: Overall Quality of Experience (1: unacceptable, 5: excellent)
Figure 74: Importance of recommendations on a) videos b) enrichments (1: not essential, 5: absolutely essential)
Figure 75: Preferred relation of enrichments to the video content (1: tightly related to video content, 5: tightly related to user profile)
Figure 76: Level of experience with Social Recommendation & Personalization Tools (1: no experience, 5: much experience)
Figure 77: a) Difficulty of adding data to the system (1: very difficult, 5: very easy) b) Were the data needed by the system too much?
Figure 78: a) Matching of the generated with the expected user's profile b) Matching of the recommended videos to the user's expectations c) Overall Quality of Experience (1: unacceptable, 5: excellent)
Figure 79: Importance of recommendation on a) videos b) enrichments c) preferred relation of enrichments to the video content

List of Tables

Table 1: Terms and abbreviations
Table 2: System integration levels
Table 3: List of benchmarking content
Table 4: OCD – benchmarking against user requirements
Table 5: ABT – benchmarking against user requirements
Table 6: ITDT – benchmarking against user requirements
Table 7: MCSSR – benchmarking against user requirements
Table 8: AAT – benchmarking against user requirements
Table 9: IEVCT – benchmarking against user requirements
Table 10: 360VPT – benchmarking against user requirements
Table 11: SSF – benchmarking against user requirements
Table 12: SRPT – benchmarking against user requirements
Table 13: Integrated Prototype – benchmarking against user requirements
Table 14: Evaluation of face detection frameworks
Table 15: Face recognition evaluation
Table 16: Evaluation of models on question-words
Table 17: Evaluation of models on the Miller analogies test
Table 18: Scoring for video "A new American toy"
Table 19: Scoring for video "Football and Ski Jumping"
Table 20: Scoring for video "Green Gold"
Table 21: Scoring for video "Documentary about Leonardo da Vinci"
Table 22: Scoring of video "Documentary about Archimedes"
Table 23: Scoring of video "From an animal's perspective, part 1"
Table 24: Profile distance versus time with respect to theta values
Table 25: Overview of subjective benchmarking tests
Table 26: Consolidated feedback about the integrated prototype at NAB 2018
Table 27: Consolidated feedback from "Sunny Side of the Doc"
Table 28: General quotes for AAT
Table 29: Main categories of replies on the question "How can we improve our Tool?"
Table 30: Are there use case scenarios where Integrated Trends Discovery Tools are suitable for your business? Please explain
Table 31: Which should be future R&D directions towards improving the Integrated Trends Discovery Tool?
Table 32: Do you know of any products or services with features similar to the ones of the Integrated Trends Discovery Tool?


Executive Summary

This deliverable summarizes the work performed in task T4.3 "Evaluation, Testing and Benchmarking", which is part of work package 4 "Platform Integration, Evaluation and Benchmarking". In accordance with the Project Description of Work (WP4 Description), the following subtasks are covered in this deliverable:

a) Evaluation of the different tools with carefully designed experiments and benchmarking procedures
b) Evaluation of the integrated PRODUCER prototype

The main target of T4.3 was the evaluation of the prototype solution against the following criteria:

- The prototype meets the user-defined specifications defined in WP2 (use cases and related end user / investor requirements)
- The prototype reaches or surpasses the state-of-the-art performance of tool-related algorithms and other specific performance metrics
- The prototype has a unique feature set, is technologically advanced compared to already available tool sets, and raises interest from end users, professional users and potential system integrators

These tasks were accomplished by:

1. Internal benchmarking against the end user and investor requirements
2. Internal benchmarking tests at the technical project partner sites (technical performance tests, end user tests)
3. External benchmarking tests by end users, professional users and relevant industry stakeholders (both subjective tests and strategic benchmarking)

The integration was tested by evaluating the dashboard as the integration layer and by testing the exchange of content data and metadata between the individual PRODUCER tools.

In summary, the prototype passed the benchmarking tests with very good results. This applies both to the benchmarking against the user requirements and to the benchmarking of some of the tools against the state of the art. Nearly all of the end user and investor requirements for the prototype were fully or at least partially fulfilled, and the technical benchmarking tests showed that the PRODUCER prototype surpasses or at least equals the state-of-the-art performance of comparable algorithms and tools. End users, professional users and potential investors rated the current feature set and performance of the prototype highly, but they also provided valuable feedback on areas for improvement, such as further harmonization of the user interfaces and enhanced integration features for existing production infrastructures.


1. Introduction

1.1 Background

From the earliest days of cinema, documentaries have provided a powerful way of engaging audiences with the world, by insisting on pursuing truth through a mode of seeing and artistic creation that no other art form provided. Furthermore, documentaries had social and market impact because they adapted to the available means of production and distribution. More than the makers of any other type of film, documentarians were avid adopters of new technologies, which periodically revitalized the classical documentary form.

Nowadays the documentary creation process faces several problems: it is still characterized by a particularly heterogeneous and demanding workflow, and it remains a long and demanding process in terms of cost, time and viewer engagement. The documentary creation process consists of three main phases: (a) pre-production, (b) production and (c) post-production. The pre-production phase contains the idea pitching based on current social trends and the audience's interests, the research about the documentary's topic and the available archival material, the script writing and story outlining, along with the arrangement of necessary interviews and the scouting of shooting locations. In the production phase the shooting of primary footage is realized, as well as the conducting of interviews, the capturing of audio and the keeping of shot logs in physical and/or digital format, so as to better organize the captured multimedia content. Finally, in the post-production phase the transcribing of the interviews, the indexing of the captured footage, the editing of the overall collected multimedia content and its fine-tuning towards meeting viewers' prerequisites and interests are carried out.

The PRODUCER prototype paves the path towards supporting the evolutionary transformation of the well-established and successful traditional model of linear documentaries into interactive documentaries, responding to the recent challenges of the convergence of interactive media and documentaries. This is achieved through the provision of a set of enhanced ICT tools that support various stages of the aforementioned documentary creation phases, ranging from user engagement and audience building to the final documentary delivery. Apart from directly reducing the overall production cost and time, and thus enabling small documentary production houses to become a significant part of the documentary production arena while increasing their market share in this field, the project enhances viewers' experience and satisfaction by generating multi-layered documentaries and delivering more personalized services to viewers, allowing them to choose different viewing paths, primarily with respect to the documentary format and playout system.

1.2 Context

The objective of the PRODUCER project is to devise innovative tools for a cost-effective and interactive-enriched production of video documentaries. In order to reach this objective, an underlying workflow model and system architecture for the project was developed (Figure 1).


Figure 1: Producer Workflow and System Architecture

The basic workflow as well as the system architecture are based on the user and investor requirements collected in WP2 "Requirements Analysis and Collection of Documentary Multimedia Content and User Generated Content". The tool set comprises:

- an Audience Building tool (ABT) which leverages the power of social media and of gamification techniques for assisting documentary producers to attract and engage audiences

- an Integrated Trends Discovery tool (ITDT) that allows the identification of the most engaging topics for a specified target audience, offering valuable insights to the documentary producer

- an Open Content Discovery tool (OCDT) mining across open repositories, libraries and archives, to discover research information, e.g. text, video, pictures, and in general multimedia content that may be related to the topic that the documentary will elaborate on

- a Multimedia Content Storage, Search & Retrieval tool (MCSSR) that serves as a cloud-based searchable multimedia content repository, where the content will be organized in thematic units and paired with the annotations provided by the Automatic Annotation tool (AAT), which will comprise the metadata

- an Automatic Annotation tool that not only supports content annotation through various automatic content analysis components but also provides input to the Interactive-enriched Video Creation tool (IEVCT) for an automatic pre-selection of enrichments that will be paired with the content

- an Interactive-enriched Video Creation tool that supports the generation of interactive and clickable objects on the videos

- a Second Screen Interaction tool (SSIT) that supports the delivery of "Interactive TV ADs", an interactive and extended version of an AD, which can be accessed on paired companion devices and is linked to and triggered by the AD shown on the main TV screen

- a Social Recommendation & Personalization tool (SRPT) that enables documentary production houses and, correspondingly, broadcasters to deliver personalized versions of documentaries targeted to the audience's interests

- a 360° Video Cloud Streaming tool (360VPT) providing a high-quality 360° video experience on low-capability devices, such as hybrid TVs (HbbTV), or in cases of constrained network connectivity, e.g. on mobile devices

A Dashboard provides functionalities such as Single Sign-On and user profile settings and supports navigation to the specific tools. In addition to the platform-internal interfaces between the different tools, quite a number of external interfaces to relevant repositories, social media platforms and professional production tools were implemented.

1.3 Objectives

The overall objectives of WP4 are:

- the specification of the functional design and the skeleton of the PRODUCER platform architecture as well as the platform integration plan
- the successful integration of all developed tools in the PRODUCER platform
- the development of appropriate interfaces to existing production infrastructures
- a thorough evaluation, testing and benchmarking of the PRODUCER platform as a whole in order to guarantee that it meets all the requirements for its roll-out as a fully functional solution

1.4 Tasks

The benchmarking of the PRODUCER prototype is part of WP4 "Platform Integration, Evaluation and Benchmarking". The tasks

- T4.1 Platform Toolset Integration, and
- T4.2 Platform Interfacing with Existing Production Infrastructures

provide the ground for the benchmarking task

- T4.3 Evaluation, Testing & Benchmarking

The benchmarking task is set up so that user requirements are traced from concept to design and, by applying various benchmarking methodologies and involving various types of benchmarkers, it finally verifies that the integration work has been successful.

1.5 Methodology

The benchmarking methodology for the PRODUCER prototype was based on various tools and procedures, depending on the type of benchmarking performed:

a) Workshop


A workshop consisting of several steps with all project partners was conducted in order to benchmark the investor and end user requirements as defined in Deliverable 2.1 against the results that were achieved with the PRODUCER prototype. During this workshop the list of user requirements was evaluated step-by-step and it was documented whether the requirements were fully, partially or not fulfilled.

b) Objective benchmarking against the state of the art

For some of the tools, an objective benchmarking against the state of the art was performed by the respective technical partners. This benchmarking focused on the performance of algorithms, e.g. for object detection and personalization, as well as on specific resource usage such as bandwidth (see the metric sketch after this list).

c) Non-guided / guided interview based on questionnaires

Tool-specific questionnaires were used for both external and internal subjective testing. As part of this benchmarking, the prototype results were measured against the expectations of end users (e.g. students), professional users and potential investors. Some of the sessions were conducted as guided interviews with the benchmarkers.

d) Public showing / demonstration

The PRODUCER concept and prototype were introduced to relevant business players and potential investors during the NAB (National Association of Broadcasters) exhibition in Las Vegas in April 2018. Additional interviews with experts were conducted at the FKTG bi-annual symposium in Nuremberg (June 2018) and at the Sunny Side of the Doc conference, which focuses on documentary and specialist factual content, in La Rochelle (June 2018). With this strategic type of benchmarking, very valuable input was received with respect to short- and medium-term exploitation options.

e) Documentary movie production

Last but not least, documentary movies were produced utilizing the PRODUCER tool set. Since experienced producers were involved in the production, this benchmark was the most critical one to pursue.
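The objective benchmarking stream b) above relies on quantitative metrics; for the Social Recommendation and Personalization tool, for instance, the report later plots discounted cumulative gain (DCG) and R-score curves (Section 4.3.2.2, Figures 5 to 9). The following snippet is a minimal, illustrative sketch of how a DCG value can be computed for a ranked recommendation list; the example relevance scores and the log2 discount are assumptions made for illustration, not the exact configuration used in the project's tests.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance scores.

    The score at rank 1 is taken as-is; scores at later ranks are divided
    by log2(rank + 1), so relevant items placed lower contribute less.
    """
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG normalised by the DCG of the ideal (best possible) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance judgements for the top-5 recommended videos of
    # one test user (3 = highly relevant, 0 = irrelevant); values are made up.
    ranked_relevances = [3, 2, 3, 0, 1]
    print(f"DCG@5  = {dcg(ranked_relevances):.3f}")
    print(f"nDCG@5 = {ndcg(ranked_relevances):.3f}")
```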

1.6 Structure

Section 2 provides a list of terms and abbreviations used throughout this deliverable. Section 3 gives a short description of the integrated prototype that was used for the benchmarking tests. A more detailed description of the integrated toolset is provided in the WP 4.1 deliverable. In addition, this section covers the video and metadata content used for the benchmarking.


Section 4 describes the scope and objectives, the methodology and finally the results of the internal benchmarking tests. Three different types of benchmarking tests were performed: a benchmarking of the prototype against the user requirements raised in Deliverable D2.1, a benchmarking of a subset of the PRODUCER toolset against the state of the art, and finally subjective benchmarking of all of the tools, ranging from online testing by various partners up to complete documentary movie productions conducted by Domino Productions. Section 5 then covers the results of the external benchmarking activities, covering subjective evaluations by end users and professional users as well as a strategic type of benchmarking with industry experts and potential investors.


2. Terms and abbreviations

| Abbreviation | Meaning |
|---|---|
| 360°VPT | 360° Video Cloud Streaming tool |
| AD | Advertisement |
| API | Application Programming Interface |
| AAT | Automatic Annotation Tool |
| ABT | Audience Building Tool |
| ITD | Integrated Trends Discovery tool |
| FKTG | Fernseh- und Kinotechnische Gesellschaft |
| GUI | Graphical User Interface |
| HbbTV | Hybrid Broadcast Broadband TV |
| IEVC | Interactive-Enriched Video Creation tool |
| JSON | JavaScript Object Notation |
| MCSSR | Multimedia Content Storage, Search & Retrieval tool |
| NAB | National Association of Broadcasters |
| OCD | Open Content Discovery tool |
| REST | Representational State Transfer |
| SRPT | Social Recommendation and Personalization Tool |
| SSF | Second Screen Framework |
| WP | Work Package |
| XML | Extensible Markup Language |

Table 1: Terms and abbreviations


3. Description of the prototype used for benchmarking

3.1 Description of tools and system integration level

The prototype used for the benchmarking was the final integrated PRODUCER platform, which consists of the Dashboard and all the pre-production, production and post-production tools that are detailed in D4.2. The tools integrated in the final PRODUCER platform are listed below:

1. Dashboard
2. Audience Building tool (ABT)
3. Integrated Trends Discovery tool (ITDT)
4. Open Content Discovery tool (OCDT)
5. Multimedia Content Storage, Search & Retrieval tool (MCSSR)
6. Automatic Annotation tool (AAT)
7. Interactive-enriched Video Creation tool (IEVCT)
8. 360° Video Playout tool (360VPT)
9. Second Screen Framework (SSF)
10. Social Recommendation & Personalization tool (SRPT)

The platform went through a staged, multi-level integration that led to the final integrated system. Table 2 displays the five levels of integration of the benchmarked prototype. The last column of the table lists the components involved in each level of integration.

| Level | Description | Involved components |
|---|---|---|
| 1 | Single Sign On (SSO) | Dashboard, ABT, ITD, OCD, MCSSR, AAT, IEVC, SRP |
| 2 | Back-end: i. web services implementation & exposure, ii. workflow adaptation & web services consumption | ABT, ITD, OCD, MCSSR, AAT, IEVC, SRP, 360VPT, SSF |
| 3 | Front-end: common style | Dashboard, ABT, ITD, OCD, MCSSR, AAT, IEVC, SRP, SSF |
| 4 | Front-end: UI updated for calling the exposed services | ABT, OCD, MCSSR, ITD, IEVC |
| 5 | Common domain naming | Dashboard, ABT, ITD, OCD, MCSSR, AAT, IEVC, SRP, 360VPT, SSF |

Table 2: System integration levels

In order for a user to be able to access all of the platform's tools with a single log-in step, a Single Sign On (SSO) approach was followed. Moreover, a new user interface (the Dashboard) was developed that facilitates user registration, SSO and navigation to the platform's tools. As Table 2 shows, all of the platform's components had to update their log-in mechanism in order to comply with the SSO approach, except the SSF and the 360VPT. Furthermore, integrating the platform also required back-end integration of the tools. For this purpose, several existing and newly implemented services were exposed as RESTful services. By adapting each tool's usage workflow so that it consumes the services exposed by the other tools, the back-end integration was achieved. All components except the Dashboard made the required modifications for the back-end integration to succeed. Apart from the back-end integration, the front-end was also important, so the front-ends of all tools were adjusted as far as possible to follow common style rules. In addition, the necessary modifications to each tool's UI were implemented so that the user can utilize the integrated functionalities of the rest of the platform's tools. Updates or new implementations of the UI were needed for the ABT, ITD, OCD, IEVC and MCSSR tools, as Table 2 shows. Finally, the domain names used by the Dashboard and each tool were carefully selected. producer-toolkit.eu was used as the main domain name of the platform, serving the Dashboard, and for all the other tools subdomains were created in the following format: <tool_name>.producer-toolkit.eu
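As an illustration of how the back-end integration and the common domain scheme fit together, the sketch below shows how one tool could consume a RESTful service exposed by another tool (here the MCSSR) using the SSO token obtained via the Dashboard. The endpoint path, query parameter and token handling are hypothetical assumptions for illustration only and do not reflect the actual PRODUCER APIs; only the sub-domain naming scheme is taken from the text above.

```python
import requests

# Hypothetical base URL following the <tool_name>.producer-toolkit.eu scheme.
MCSSR_BASE_URL = "https://mcssr.producer-toolkit.eu/api"


def search_content(sso_token: str, keyword: str) -> list:
    """Search the shared multimedia repository using the platform SSO token.

    The '/content/search' path, the 'q' parameter and the bearer-token header
    are illustrative assumptions, not the documented PRODUCER interface.
    """
    response = requests.get(
        f"{MCSSR_BASE_URL}/content/search",
        params={"q": keyword},
        headers={"Authorization": f"Bearer {sso_token}"},  # token issued via the Dashboard SSO
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    for item in search_content(sso_token="example-token", keyword="Green Gold"):
        print(item)
```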

3.2 Content used for benchmarking

The content ingested into the PRODUCER platform for the internal benchmarking and user testing comes mainly from the project partners Mediaset and Domino Productions. Besides already available content, Domino Productions produced special 360° content for the PRODUCER prototype benchmarking (Yamal). Depending on the specific functionalities of each tool, different pieces of content were used for the benchmarking.

The following table provides an overview of the content that was used for the benchmarks.

| Content source | Collection title | No. of clips | Genre | Description | Length | Audio | Video |
|---|---|---|---|---|---|---|---|
| DOMINO | Green Gold | 1 | Investigation | | 83 | Stereo-5.1 | 2K-H264 |
| DOMINO | A leak in paradise | 1 | Investigation | | 75 | Stereo-5.1 | HD-H264 |
| DOMINO | Spain, facing uncertainty | 1 | History | | 60 | Stereo | H264 |
| DOMINO | The Blow of chemic weapons | 1 | History | | 52 | Stereo | H264 |
| DOMINO | One way ticket | 1 | Society | | 35 | Stereo | H264 |
| DOMINO | Love is love | 1 | Society | | 20 | Stereo | H264 |
| DOMINO | What about Eric | 1 | Society | | 52 | Stereo | H264 |
| DOMINO | The contagion | 1 | Politics | | 52 | Stereo | H264 |
| DOMINO | Cinéma Inch Allah | 1 | Society | | 90 | Stereo | H264 |
| DOMINO | Copenhague, the climate war | 1 | Politics | | 52 | Stereo | H264 |
| DOMINO | Europe, 180 days to convince | 1 | Politics | | 90 | Stereo | HD |
| DOMINO | At the heart of Europe | 1 | Politics | | 90 | Stereo | HD |
| DOMINO | CONVENTION | 1 | Politics | | 52 | Stereo | SD |
| DOMINO | Be President | 1 | Politics | | 52 | Stereo | SD |
| MEDIASET | Afghanistan | 5 | Wars and conflict | Military life in Afghanistan | | Mono | .mov |
| MEDIASET | PREZIOSE DESK IMMAGINI NEWS | 2 | Rushes | Bridges and waterways in Pavia, Torino and Firenze | | Mono | .mov |
| MEDIASET | Archimede | 1 | Arts and Culture | Biography | 63'29" | Mono | mxf |
| MEDIASET | Gengis Khan | 1 | Arts and Culture | Biography | 53'50" | Mono | mxf |
| MEDIASET | Leonardo | 1 | Arts and Culture | Biography | 52'18" | Mono | mxf |
| MEDIASET | Secret Italy 1 | 1 | Travels | Tourist itineraries in Italy | 24'07" | Mono | mxf |
| MEDIASET | Secret Italy 2 | 1 | Travels | Tourist itineraries in Italy | 26'14" | Mono | mxf |
| MEDIASET | Secret Italy 3 | 1 | Travels | Tourist itineraries in Italy | 26'19" | Mono | mxf |
| MEDIASET | Secret Italy 4 | 1 | Travels | Tourist itineraries in Italy | 27'56" | Mono | mxf |
| MEDIASET | Secret Italy 5 | 1 | Travels | Tourist itineraries in Italy | 27'01" | Mono | mxf |
| MEDIASET | Secret Italy 6 | 1 | Travels | Tourist itineraries in Italy | 29'25" | Mono | mxf |
| MEDIASET | In defence of the animals 1 | 1 | Society and social issues | Animals' tales | 25'47" | Mono | .mov |
| MEDIASET | In defence of the animals 2 | 1 | Society and social issues | Animals' tales | 24'34" | Mono | .mov |
| MEDIASET | In defence of the animals 4 | 1 | Society and social issues | Animals' tales | 25'45" | Mono | .mov |
| MEDIASET | In defence of the animals 6 | 1 | Society and social issues | Animals' tales | 26'01" | Mono | .mov |
| MEDIASET | In defence of the animals 7 | 1 | Society and social issues | Animals' tales | 26'49" | Mono | .mov |
| MEDIASET | In defence of the animals 8 | 1 | Society and social issues | Animals' tales | 26'29" | Mono | .mov |
| MEDIASET | In defence of the animals 9 | 1 | Society and social issues | Animals' tales | 25'34" | Mono | .mov |

Table 3: List of benchmarking content


4 Internal benchmarking

4.1 Scope and objective

The scope and objective of this task was to evaluate the different developed components with carefully designed experiments and benchmarking procedures. This has been achieved by various benchmarking methods, which were applied both within the project team (internal benchmarking) and outside the project team (external benchmarking with end users, professional users and potential investors). For the internal benchmarking, the results of the prototype were benchmarked against the investor and end user requirements set up in Deliverable D2.1 "Report on investors' and viewers' requirements and usage scenarios". As part of a second internal benchmarking stream, the performance of a subset of the toolset was tested against the state of the art in the respective technology fields by ICCS and FOKUS. Finally, subjective benchmarking tests were performed by various project members for different tools, either by performing tests within the tools or by utilizing the toolset for a documentary movie production.

4.2 Methodology

The following methodologies were used for the internal benchmarking tests:

- Workshop utilizing a step-by-step approach for the objective benchmarking against the list of investor and end user requirements from D2.1

- Objective measurement of performance parameters against the state of the art
- Subjective benchmarking of individual tools by project members
- Utilizing the toolset for documentary video production

4.3 Results

4.3.1 Benchmarking against the user requirements

As part of a workshop including re-iteration steps, the project partners benchmarked the prototype against the list of user requirements (see D2.1). For this purpose, each requirement was reviewed and it was documented whether the requirement was fully, partially or not fulfilled. Where required, additional comments were provided as well (Tables 4 to 13). The tables below are restricted to the prototype-relevant requirements.


| User Requirement (OCD) | Fulfilment | Comment |
|---|---|---|
| Searching for content in "open" archives: one public domain archive ("EU-Archives"), one commercial archive (like Getty Images), Netflix or iTunes, Amazon Prime Video | complete | Repositories actually integrated: YouTube, Internet Archive, Pexels, Pixabay, Vimeo, Wikipedia. Integration with repositories is subject to the API availability of such systems. |
| Searching for content in commercial VoD platforms (such as iTunes) (see above) | partially | No integration with iTunes or Amazon Prime Video is in place. |
| Searching for content with keywords and tags that are the outcome of the Integrated Trends Discovery Tool | complete | Filtering properly on territorial information won't be possible / depends on the exact API endpoint that is provided. |
| Search results: show on which platform / open archive the content is available; show in which territory the content is available; show licensing information (commercial, free, region, ...) | partially | No territory information is provided by the currently integrated repositories. |
| Preview and download of multimedia content directly from OCD | complete | This functionality is subject to the licensing model of the repository providing the multimedia content. |
| Search engine should implement semantic, keyword and free-text lookup | complete | |
| Filtering based on multimedia content metadata | complete | Currently filters on media type and repository type. |
| Search results should include either AV or informative content from text libraries | complete | |
| Direct upload of found content into MCSSRT for further usage without quitting the session | complete | |
| Integration with MCSSR to ingest selected multimedia content | complete | |
| User role-based access control | complete | |
| Tag metadata information directly into OCD | complete | |
| Download of content directly from OCD | complete | |
| Integration with SSO functionalities provided by ABT | complete | |

Table 4: OCD – benchmarking against user requirements


| User Requirement (ABT) | Fulfilment | Comment |
|---|---|---|
| Track down the feedback from a certain post by extracting statistics on profile info from the responders | partially | ABT can support this assuming the audience member has logged in using FB/Twitter credentials in PRODUCER or has provided demographic data upon PRODUCER platform login. Note that non-follower generic interests and trends input is given by the Trends Discovery Tool. |
| Generation of statistics based on the activity and the available profile information of the responders. Additionally, the gamification framework will support personalized statistics for each player/user. | complete | |
| Publishing content (through forwarding/sharing) on the Facebook page and/or Twitter wall | complete | |
| Create new sessions (i.e. campaigns/projects) and keep the old ones organized, with editing capabilities | complete | |
| Post multimedia files (video, images, documents) to the linked Facebook page and/or Twitter accounts | complete | |
| Offer a gamification editor to be handled by the "producer" | complete | |
| Offer generation, storage and editing of multiple gamification scenarios per session (aka project/campaign), including choice of targetization, requesting either individual or collaborative (i.e. teaming-up) gaming | complete | |
| Generation and visualization of KPIs that directly or indirectly measure the acceptance, appeal, etc. of a certain session (aka project/campaign) to the responders. Additionally, the gamification framework will support PBL for all and personalized statistics for each user. | complete | |
| Support interactive gamification concepts with the users, with rewards such as graphical badges and/or tangible goods/items | complete | |
| Registration/signing up and unique linking with one's Facebook and/or Twitter account | complete | |
| Merging and central synchronous access to Facebook and Twitter accounts | complete | |
| Support of sharing of specific media files (incl. the final product) through forwarding to social media and/or of generating links to be distributed to selected users | complete | |
| Synchronization between activities occurring on the Twitter and Facebook page and the connected ABT session | complete | |
| Support of enhancing the retrieved Facebook/Twitter profile data with extra data | complete | |
| The system will support feedback answers on suggestions for improvement and clear and easy to understand messages | complete | |
| The GUI will be clearly visible and user friendly and it will be fully integrated into the PRODUCER front-end | complete | |
| A usage manual guide will accompany the ABT | complete | |
| Support of different profile types (i.e. investor, producer & simple audience) with the appropriate screens | complete | |
| Export network to other modules of the Producer | complete | |
| Support of inviting friends, creating a community network & chatting and file exchanging capabilities within the network | complete | |

Table 5: ABT – benchmarking against user requirements


| User Requirement (ITDT) | Fulfilment | Comment |
|---|---|---|
| Detect generic popularity of a documentary topic (a term escorted by an audience popularity metric) | complete | The ITD tool provides various popularity metrics escorting the provided keyword (e.g. normalised popularity metrics for region- and time-based results, absolute number of Google searches, number of retweets and Twitter likes, etc.). |
| Detect popularity of a documentary topic for specified region(s): "Europe", "Asia", "Northern America", "DACH" (Germany / Austria / Switzerland), ... | partially | The tool can provide popularity for individual countries and for the different regions/counties of a country specified by the user. |
| Detect popularity of a documentary topic for a specified time period (e.g. significant dates, identification of seasonal habits) | complete | The ITD tool is capable of performing research for a time period specified by the user. For the given time period the popularity of Google searches is presented in a time-series graph, which allows the user to identify significant dates, seasonal patterns, etc. |
| Detect popularity of a documentary topic based on combined criteria (time and location) | complete | Before a user starts a query, s/he can provide the location and time period of interest. Research is performed based on this input when possible. For example, the official free version of the Twitter API only provides data for the last week. |
| Assemble social media metadata (e.g. Twitter hashtags) and group them according to specific social media networks: Facebook (?), Twitter, Instagram (?), ... The idea is to see WHERE a specific topic is talked about | complete | It was a design decision not to integrate with Facebook, as it only allows very limited disclosure of user-originating data to entities (e.g. the ITD tool) that are not connected (e.g. friends) with the users. Currently the ITD tool utilises as information sources the Google Trends engine, the Google AdWords service and the Twitter API. |
| Assemble and show the most relevant Twitter hashtags for a specific topic (relevance = number of tweets etc. for a specific topic hashtag) | complete | |
| Identify correlated terms of a specific documentary topic; a metric defines the correlation level | complete | |
| Basic audience sentiment analysis for a documentary topic | complete | |
| Detect popular questions, issued by internet users, related to a topic | complete | Questions related to the provided keyword that are issued to the Google search service are extracted and presented as part of the overall research. |
| Present results in various forms (e.g. time based: date to date, latest X weeks, latest X months, latest X years) | complete | The ITD tool is capable of performing research for a time period specified by the user. For the given time period the popularity of Google searches is presented in a time-series graph, which allows the user to identify significant dates, seasonal patterns, etc. |
| Extract results in various file formats (Excel compatibility required) | complete | The ITD tool allows the user to extract and download results in Excel format. |
| Identify the right aspect/keywords for a specific topic in terms of popularity (e.g. "Brexit" instead of "bremain", "eco-solutions" instead of "Economic war between Oil companies") | complete | The user can issue different queries for the same topic. Based on the extracted results the user can decide which keywords are more appropriate to be utilised for further research or for promoting his/her content. |
| Implement population lookup based on queries by topic dictionary and/or free keywords | complete | The ITD tool allows users to specify the category (based on a predefined dictionary provided by Google) to which the keyword belongs. |

Table 6: ITDT – benchmarking against user requirements


User Requirement - MCSSR Prototype Benchmarking

Fulfilment Comment

Search for content (boolean search parameters) complete

Search for content that "matches" specific topics partially The MCSSR tool allows searching for content while also considering the annotation set generated by the AA tool. It is possible to retrieve content that has been "marked" with the specific topic which is the object of the search.

Showing search results complete

Showing search results and showing keywords and tags that describe the topic(s), persons, locations and subjects of the content

complete Tags and Keywords are inherited from the repositories and from AAT.

Export platform specific metadata presets in XML format for the following platforms:

- iTunes

- Netflix

- Amazon Prime Video

- ...

partially Export of metadata stored on MCSSR and coming from the currently integrated repository, which is not one of the listed platforms.

Search is possible using boolean operators complete

MCSSR has a complete video player where videos and audio can be played back

complete

MCSSR stores results of AAT tool and puts markers where e.g. a person or an object is detected

complete

MCSSR stores results of ITDT and puts markers where e.g. a topic detected

partially It was decided not to store any ITD data in the MCSSR database. Although the ITD tool is fully integrated with the platform and exchanges related findings with the OCD tool, it maintains all necessary information in its own database.


MCSSR will allow "bulk" upload of multiple assets (with choice of formats)

complete

Add "Target Group" entity partially The Target Group is managed by SRPT. The MCSSR give possibility to specify Target Group when invoking SRPT

Users can preview the selected component complete

Download and conversion of multimedia content complete

Add selected elements to a "collection" of related contents

complete

Extend metadata related to "enriched video" complete

Enriched video should be uploaded to MCSSR library

complete

Manual annotation of Multimedia Content complete

User role-based access control complete

Table 7: MCSSR – benchmarking against user requirements


User Requirement - AAT Prototype Benchmarking

Fulfilment Comment

Automatic / semi-automatic annotation complete

Option for visualization of bounding boxes for persons/objects

complete This option actually forces the tool to provide the exact position of the item found.

Face recognition complete

Tools API complete

Receive as input manual annotations complete This requirement is fulfilled through the JSON API.

Objects detection and recognition complete

Table 8: AAT – benchmarking against user requirements

User Requirement - IEVC Prototype Benchmarking

Fulfilment Comment

Build interactive and clickable objects on videos complete

Detailed analytics of content complete Reporting of content actions to SRPT

Non-video content as linked info in video complete

Content needs to be on accessible servers for playout

complete

Tool needs to be able to import and filter data from automatic content description tool

complete

Include basic browse/edit functionalities (import/export/save/navigation/markup/enriched playout etc.)

complete

Managing 360° videos complete

Managing of subtitles not fulfilled The IEVC tool is not the right place to manage subtitles (discussed and agreed at the 4th Plenary Meeting, Darmstadt)


Managing different layers of metadata (versions) for the content

complete

Managing manual enrichments (add/remove) complete

Table 9: IEVCT – benchmarking against user requirements

User Requirement - 360°VPT Prototype Benchmarking

Fulfilment Comment

360° analytics and object annotation well integrated complete

Availability of whole rendering and playout process complete

Get Player URLs for multiple platforms like HbbTV, Desktop, Android TV, etc.

complete

Video processing should keep enrichments along with the respective hyperlinks

complete

Table 10: 360VPT – benchmarking against user requirements

User Requirement - SSF Prototype Benchmarking

Fulfilment Comment

Non-video content for second screen consumption complete Main and second screen applications run in a browser context, for example HbbTV on TV sets, a browser on desktop, a mobile browser, or hybrid applications on mobile platforms like Android and iOS.

complete

(Pseudonymous) linking of TV to a specific user device (could be ID number, QR code or DLNA discovery)

complete

Automatic device discovery, pairing and synchronization

complete

Table 11: SSF – benchmarking against user requirements

User Requirement - SRPT Prototype Benchmarking

Fulfilment Comment

User profiling functionality complete


User profiling: Query result to be ranked based on user's profile

complete

User profiling: Stop timestamp of video not fulfilled The UR was withdrawn based on later discussions, since the stop timestamp on its own does not convey the user's preference.

User profiling: Start & stop timestamp of video (duration)

complete

User profiling: Click on enrichments complete no enrichment on video

User profiling: Share of enrichment in social networks, e.g. FB, Twitter etc.

complete

User profiling: Explicit relevance feedback related to video (maybe after the playtime)

complete

Find #k similar users to the target user and/or users' clusters for collaborative recommendation

complete

Provision of user clustering for creating videos addressed to specific groups.

complete

Export results in both a common format (e.g. Excel) and an IEVC-compatible one

complete

Provide level of preference and ranking about each suggestion

complete

User interface for querying and data visualization complete GUI for the user input and the presentation of the output

User query protocol for targeted data visualization complete functionality of the above GUI as to how the user will be providing the input and how the output will be visualized

Users' metadata involved: gender, age, education, country, occupation, type of enrichment

complete

Decision interface for the user validation of the system output (suggestion)

complete GUI for the user to decide if he will choose the suggestion or not

Automatic interfacing with IEVCT tool complete

Table 12: SRPT – benchmarking against user requirements

Since the user requirements documented in Deliverable D2.1 were established for the individual tools only, the benchmarking followed the same approach. In order to also cover requirements for the integrated prototype (such as single sign-on and data exchange between the individual tools), the requirement list was extended for the benchmarking procedure. The benchmarking results are shown in Table 13.


User Requirement - Integrated Prototype Prototype Benchmarking

Fulfilment Comment

Expansion of the system can be achieved with a minimum of additional system administration burden and staffing.

complete

If possible, high resolution videos should be used for visualization of search results

complete

User/password authentication with Single-Sign-On complete

Users rights and role management. complete user rights/role-based access control

Intuitive navigation to a variety of functions without having to move sequentially through excessive menus and screens.

complete

Highlighting and/or flagging of required and incomplete data fields.

partially Fulfilled for the most important data fields

APIs documentation complete

All tool interfaces according to the agreed data exchange table (Table 13) are implemented

complete

Table 13: Integrated Prototype – benchmarking against user requirements

Overall this benchmark provided very good results, with nearly all of the requirements fully or at least partially met.

4.3.2 Objective benchmarking of ICCS tools

4.3.2.1 Automatic Annotation tool

For the benchmarking of the Automatic Annotation tool, evaluations of the algorithms used for each of the services the tool provides are presented. In order to successfully accomplish its desired functionalities, the AAT is built on top of existing state-of-the-art algorithms, tuned to the creation of documentaries, which is the objective of the PRODUCER project. Both accuracy and speed have to be considered when using the tool, so tweaking to achieve the desired balance was considered necessary. More specifically, in the rest of the section, evaluations of the Face Detection, Face Recognition and Object Detection tasks are presented.

Face detection

As documented in deliverable D2.2, the Automatic Annotation tool relies on the established Viola-Jones algorithm [1] for the detection of faces within video frames. The algorithm is based on a cascade classifier, which evaluates a set of Haar-like features on every layer. From the analysis of the above algorithm, it has been shown that the speed of the detector is directly related to the number of evaluated Haar-like features. The evaluation of the related framework was performed using the MIT+CMU frontal face test set [2], consisting of 130 images with 507 labelled frontal faces. During the evaluation, the algorithm uses an average of 10 features out of a set of 6061 possible ones. This happens because a large set of images is rejected by the first and second layers of the cascade classifier; as a result, the Viola-Jones algorithm quickly rejects non-face images and spends most of its computation time classifying possible face images. On a 700 MHz Pentium III processor, the face detector can process a 384 by 288 pixel image in about 0.067 seconds. This is roughly 15 times faster than the Rowley-Baluja-Kanade detector [2] and about 600 times faster than the Schneiderman-Kanade detector [3]. For the comparison with other face detection frameworks, the Viola-Jones detection rate is compared with the rates of the other frameworks at certain numbers of false positive guesses. For the Rowley-Baluja-Kanade results [2], a number of different versions of their detector were tested, yielding a number of different results; they are all listed under the same heading. For the Roth-Yang-Ahuja detector [4], the authors reported results on the MIT+CMU test set after removing 5 images containing line-drawn faces.
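The deliverable does not specify the concrete implementation of the detector; as an illustration only, the following minimal sketch shows how a Viola-Jones style cascade is typically invoked through OpenCV's Haar-cascade implementation (the model file and input frame are assumptions):

```python
import cv2

# Pre-trained Haar cascade shipped with OpenCV (an assumed implementation
# of the Viola-Jones detector described above).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

def detect_faces(frame_bgr):
    """Return (x, y, w, h) bounding boxes of detected frontal faces."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors trade detection speed against accuracy,
    # mirroring the speed/accuracy tuning discussed in the text.
    return face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                         minNeighbors=5, minSize=(30, 30))

if __name__ == "__main__":
    frame = cv2.imread("frame.jpg")  # hypothetical video frame
    print(detect_faces(frame))
```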

Table 14: Evaluation of face detection frameworks

Face Recognition

For the task of recognizing known persons in the set of detected faces in a video, the Automatic Annotation tool utilises the Local Binary Pattern Histograms (LBPH) framework [5]. The CSU Face Identification Evaluation System [6] was utilised to test the performance of the referenced algorithm. The system follows the procedure of the FERET test for semi-automatic face recognition algorithms [7] with slight modifications. The system uses the full-frontal face images from the FERET database. The CSU system uses the same gallery and probe image sets that were used in the original FERET test. Each set contains at most one image per person. These sets are:

1. fa set, used as a gallery set, contains frontal images of 1196 people.
2. fb set (1195 images). The subjects were asked for an alternative facial expression than in the fa photograph.
3. fc set (194 images). The photos were taken under different lighting conditions.
4. dup I set (722 images). The photos were taken later in time.
5. dup II set (234 images). This is a subset of the dup I set containing those images that were taken at least a year after the corresponding gallery image.

In the CSU system, the LBPH algorithm was compared with implementations of the PCA, Bayesian intra/extrapersonal (BIC) and Elastic Bunch Graph Matching (EBGM) face recognition algorithms. For the BIC algorithm, the Maximum A Posteriori (MAP) decision rule was used. The following table shows the comparative results, using both a weighted dissimilarity measure version of the LBPH algorithm and an unweighted one.
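For illustration, a minimal sketch of LBPH-based recognition is given below, assuming the opencv-contrib implementation of LBPH; the gallery images, labels and parameter values are placeholders, not the configuration used by the AAT:

```python
import cv2
import numpy as np

# Requires opencv-contrib-python, which provides the cv2.face module.
recognizer = cv2.face.LBPHFaceRecognizer_create(radius=1, neighbors=8,
                                                grid_x=8, grid_y=8)

# Hypothetical gallery: grayscale face crops of known persons with integer labels.
gallery = [cv2.imread(p, cv2.IMREAD_GRAYSCALE)
           for p in ("person0_a.png", "person0_b.png", "person1_a.png")]
labels = np.array([0, 0, 1])
recognizer.train(gallery, labels)

# Predict the identity of a probe face crop; a lower distance means a better match.
probe = cv2.imread("probe.png", cv2.IMREAD_GRAYSCALE)
label, distance = recognizer.predict(probe)
print(f"predicted person {label} (distance {distance:.2f})")
```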

Table 15: Face recognition evaluation

Object Detection

Finally, for the object detection task, the Automatic Annotation Tool uses deep learning techniques, and more precisely a Convolutional Neural Network. The approach we implemented uses a Single Shot Detector (SSD) [8] to detect object areas and bounding boxes simultaneously. Furthermore, a MobileNet is used to detect the specific class of the object [9]. SSD with MobileNet provides the best accuracy trade-off among the fastest detectors. Other popular CNN-based object detection techniques include Faster R-CNN, Fast R-CNN, YOLO, Feature Pyramid Networks (FPN), Region-based Fully Convolutional Networks (R-FCN), RetinaNet, etc. Results have been produced using the MS-COCO dataset for training. The most important ones deal with the trade-off between speed and accuracy.
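As an illustrative sketch only (the deliverable does not state which framework the AAT uses to run the network), the snippet below shows how a COCO-trained SSD-MobileNet export could be run through OpenCV's dnn module; the model file names are placeholders:

```python
import cv2
import numpy as np

# Placeholder file names for a COCO-trained SSD-MobileNet export.
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb",
                                    "ssd_mobilenet_coco.pbtxt")

def detect_objects(frame_bgr, score_threshold=0.5):
    """Return (class_id, score, (x1, y1, x2, y2)) tuples in pixel coordinates."""
    h, w = frame_bgr.shape[:2]
    # SSD-MobileNet variants commonly expect a 300x300 input blob.
    blob = cv2.dnn.blobFromImage(frame_bgr, size=(300, 300), swapRB=True, crop=False)
    net.setInput(blob)
    detections = net.forward()  # shape: [1, 1, N, 7]
    results = []
    for det in detections[0, 0]:
        score = float(det[2])
        if score < score_threshold:
            continue
        box = (det[3:7] * np.array([w, h, w, h])).astype(int)
        results.append((int(det[1]), score, tuple(box)))
    return results
```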


Figure 2: Object detection comparison

As we can see from the figure, SSD with MobileNet proves to be the fastest detection solution with a quite acceptable accuracy level, being the closest solution to real-time object detection. Overall results are presented in the following table:

Another interesting result concerns the way the resolution of the input image frame affects the prediction. Looking at the resolution of the images, we realize that higher resolution input frames improve the accuracy level. On the contrary, lower resolution images decrease the accuracy level by a percentage close to 16%, while providing a decrease in inference time of about 25%.


Figure 3: Effect of image resolution

Moreover, the object size within the input frame can also be a critical point. SSD with MobileNet performs quite well when dealing with large objects. On the contrary, the performance is much worse with small ones.

Figure 4: Object detection performance comparison


In summary, we can state that SSD with MobileNet is the solution with the most balanced trade-off between accuracy and speed. Input image resolution is a critical factor and impacts accuracy significantly: reducing the image size by half in width and height lowers accuracy by 15.88% on average but also reduces inference time by 27.4% on average. Moreover, for large objects, SSD can outperform Faster R-CNN and R-FCN in accuracy with lighter and faster extractors; but for small objects, even though SSD runs fast, the overall performance of the algorithm is much worse. Finally, taking the above results into consideration, the choice of the object detection method to implement depends on the specific goals of the project at hand. In particular, in the PRODUCER project, considering that we mostly have to deal with high-resolution and professionally shot content, we decided to implement a method focusing on speed while keeping more than acceptable accuracy levels.

4.3.2.2 Social Recommendation and Personalization tool

Part of the benchmarking procedure was performed for the evaluation of the effectiveness of the algorithms used for the generation of the feature vectors of the content. As discussed, the first step in providing appropriate recommendations to users is a meaningful representation of the content. In our tool, we represent the content as a vector, where each element is one of the 14 categories we have specified, and the value is the percentage to which the content is relevant to this category. Since these scores are not provided manually by the content creators, ways to extract or approximate them need to be implemented. In order to achieve this task, we made use of the Word2Vec model [10]. Word2Vec is a shallow two-layer neural network model that is trained to find the linguistic context of words. It takes as input a word and returns a unique representation in a multidimensional vector space. The position of the word in this vector space is such that words that share common contexts are located in close proximity to each other. The models used in the evaluation process are four pre-trained models [11] on Wikipedia 2014 in GloVe representation [12], converted to the Word2Vec representation, which contain a vocabulary of 400k words with vector representations of 50, 100, 200 and 300 dimensions respectively, as well as a pre-trained model on Google News with a vocabulary of 3 million words and a vector representation of 300 dimensions.
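A minimal sketch of this conversion and loading step is shown below, assuming the gensim library (an implementation assumption; file names follow the public GloVe and Google News distributions):

```python
from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec

# Convert a GloVe text file (e.g. the 100d Wikipedia 2014 vectors) to the
# word2vec text format so it can be loaded as KeyedVectors (gensim 3.x API).
glove2word2vec("glove.6B.100d.txt", "glove.6B.100d.w2v.txt")
wiki_100d = KeyedVectors.load_word2vec_format("glove.6B.100d.w2v.txt")

# The Google News model is distributed directly in binary word2vec format.
google_news = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)
```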

In order to test the efficiency of those models, we performed evaluation of the models on two different tests, the default accuracy test of word2vec models questions-words [13], and a more sophisticated Miller Analogies Test [15].

Α) Question-words test

This test consists of 19544 sets of 4 words and is used to test how well a generated vector model does with analogies of different kinds, for example capital (Athens Greece Baghdad Iraq), currency (Algeria dinar Angola kwanza), etc. The idea is to predict the 4th word based on the three previous ones.


Once vectors are generated from a corpus with sentences containing these terms, the question-words file can be used to test how well the vectors do on analogy tests (assuming the corpus contains these terms). So, given the example from question-words.txt "Athens Greece Baghdad Iraq", the analogy test is to look at the nearest neighbours of the vector

Vector(Greece) - Vector(Athens) + Vector(Baghdad)

If the nearest neighbour is the vector of Iraq, then that analogy test passes. The test set can be found in [13]. After running the question-words test for all models, we collected all the successful and unsuccessful attempts of the algorithm, producing the following table.

Model Correct Incorrect

Wikipedia 50d 49.69% 50.31%

Wikipedia 100d 65.49% 34.51%

Wikipedia 200d 71.98% 28.02%

Wikipedia 300d 74.05% 25.95%

GoogleNews 77.08% 22.92%

Table 16: Evaluation of models on question-words
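For reference, the analogy check described above can be reproduced with a short gensim-based sketch (gensim and the file paths are assumptions):

```python
from gensim.models import KeyedVectors

# Model converted and loaded as in the previous sketch (file name is an assumption).
model = KeyedVectors.load_word2vec_format("glove.6B.100d.w2v.txt")

# Single analogy check: Athens -> Greece as Baghdad -> ?
# The test passes if the top neighbour of (Greece - Athens + Baghdad) is "iraq".
print(model.most_similar(positive=["greece", "baghdad"], negative=["athens"], topn=1))

# Full run over the question-words set; the score is the fraction of correct analogies.
score, sections = model.evaluate_word_analogies("questions-words.txt")
print(f"accuracy: {score:.2%}")
```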

All models perform fairly well, with even the smallest model (Wikipedia 50d, 49.69%) correctly predicting the missing word roughly once in every two attempts. What we notice is that the larger the model, the better the performance. Both larger vector representations and a larger vocabulary contribute to the increase in the percentage of correct predictions, as do the quality and length of the corpus used to train the model. As we can see from the results, the Google News model clearly performs best with a success rate of 77%, but due to its size it is not very practical on small infrastructures such as the one used for our prototype.

B) Miller Analogies test

Since the question-words analogies are not very sophisticated, consisting of relationships like city to nation and currency to nation, and syntactic relationships like adjectives to adverbs, we decided to test the models on some more interesting analogies as well. The Miller Analogies Test is a test given to students, consisting of quite diverse analogies testing logical and analytical reasoning. The questions are in the form "A is to B as C is to what", with 4 choices to choose from. The analogies dataset was found in [14].

It should be noted that by doing a random selection between the choices, the success rate would be 25%, which can be used as a baseline. It should also be noted that the Wikipedia models attempted 123 analogies, since some questions contained words not in the vocabulary of the model, or questions about numbers which are out of scope for our natural language use case. On the other hand, GoogleNews attempted 145 analogies since it has a richer vocabulary (*), while also skipping questions either not in the vocabulary or about numbers.

After running the test for all models, we collected the results shown in the following table.

Model Correct Correct %

Wikipedia 50d 52 42.28%

Wikipedia 100d 66 53.66%

Wikipedia 200d 67 54.47%

Wikipedia 300d 70 56.91%

GoogleNews 94* 64.83%

Table 17: Evaluation of models on the Miller Analogies Test

According to [15], the performance of the last three Wikipedia models places them among the top 30%-40% of applicants to graduate school programs, while the Google News model performs on par with the top 15%. The results are more than impressive, signifying that computers are now able to complete this rather complex task of successfully deducing relationships between words.

For the implementation of the tool prototype, as well as the one used in the evaluation with real users that we present extensively in a following section of the deliverable, we decided to use the Wikipedia 100d model, which achieves a lower but comparable success rate while being considerably smaller and more memory efficient, and is therefore much easier to use while testing the functionalities of the platform. In a larger infrastructure used in a production environment, a larger model (Google News) would fit best.

C) Examples from our database

To test the efficiency of the Word2Vec model on the actual problem of finding the relevance a video has to each of the 14 categories, we performed some evaluations on the actual data in our video database. The idea behind the evaluation is to provide the title together with some tags and the description of the video, and the neural network should be able to successfully deduce this relevance. The more metadata available for each video, the better the result of the algorithm is expected to be. For this evaluation process, we used the Google News model, which is the best performing one and which we expected to have the most accurate representations.

As far as the scoring is concerned, each word passes through the neural network and the similarity between the word and each category is calculated. To calculate the overall similarity score, we use a linear combination of the maximum score over all words in the document and the average score of the words. The average score is used in order to reduce the chance that a word which appears only a few times in the text, but is very close to the category in question, skews the result too much in its favour.
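A hedged sketch of this scoring step is given below; the 14 category labels follow the tables in this section, while the combination weight `lam` and the use of gensim are illustrative assumptions rather than the tool's actual configuration:

```python
import numpy as np
from gensim.models import KeyedVectors

# The 14 categories used throughout this section.
CATEGORIES = ["art", "business", "computer", "education", "game", "health", "home",
              "news", "recreation", "science", "shopping", "society", "sport", "child"]

model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def category_scores(words, lam=0.5):
    """Score a document (title + tags + description words) against the categories.

    The score per category is a linear combination of the maximum and the average
    word-to-category similarity, as described above; the weight `lam` is an
    illustrative assumption, not the value used by the tool.
    """
    words = [w for w in words if w in model]  # drop out-of-vocabulary words
    scores = {}
    for cat in CATEGORIES:
        sims = np.array([model.similarity(w, cat) for w in words])
        scores[cat] = float(lam * sims.max() + (1 - lam) * sims.mean())
    return scores

print(category_scores("a new american toy small stick rotate fingers dance".split()))
```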

Title

A new American toy

Description

A new game comes from United States: a small stick that one has to rotate fast with the fingers of the hand. Its name is like that of the famous dance: Hully-gully

Tags

Curiosities, Hobbies

Art Business Computer Education Game Health Home

0.2773 0.2467 0.2156 0.1612 0.7377 0.1342 0.2602

News Recreation Science Shopping Society Sport Child

0.1711 0.2293 0.1882 0.2281 0.1621 0.2640 0.2468

Table 18: Scoring for video “A new American toy”

As we can see from the above table, the algorithm successfully categorizes the video in the "Game" category, which is reasonable since it is about a new toy, giving it a much higher score than the rest of the categories.

Title

Football and Ski Jumping

Description

Germany: the German national team beats the Hungarian one for 1-0 in Hannover. Oberstdorf: the Russian Kanenski wins the competition of ski jumping

Tags

Football, Ski Jumping

Art Business Computer Education Game Health Home

0.1437 0.1851 0.1244 0.1525 0.3986 0.1879 0.2335


News Recreation Science Shopping Society Sport Child

0.1652 0.2656 0.1473 0.1767 0.1325 0.3960 0.1538

Table 19: Scoring for video “Football and Ski Jumping”

In this video, on the other hand, the algorithm gives a higher score to both the "Game" and "Sport" categories while keeping the other categories lower. This decision seems accurate since, in the previous video, the "American toy" can only be a game, while "Football" or "Ski Jumping" is both a game and a sport.

Title

Green Gold - L'or vert trailer

Description

Green Gold is an investigative documentary that takes us to the heart of renewable energies and the agrofuels that at one time were presented as the solution to the three major crises facing the world: the energy crisis, the economic and financial crisis, and the environmental crisis. Today the abandoning of fossil fuels in favour of renewable energies has become an imperative. But is our economy compatible with a clear and fair mode of development? What will be the price to pay for the continued existence of our model of society? An essential subject for the safeguarding of our model of society, the reproduction of life on earth and for the survival of humankind. Every day, nature sees its strength shaken that little bit more. But nature has not yet said its last word.

Tags

Crisis, Energy

Art Business Computer Education Game Health Home

0.1902 0.3422 0.1516 0.2559 0.2433 0.3060 0.2268

News Recreation Science Shopping Society Sport Child

0.2367 0.2356 0.2727 0.1637 0.7371 0.2200 0.2103

Table 20: Scoring for video “Green Gold”

The above video is a documentary provided by DOMINO. As we can see from the description, it is an investigative documentary about some major social issues such as the energy crisis and the economic and financial crisis. Our algorithm successfully captures this connection by giving a high score to the "Society" category, while giving a rather high (but significantly lower) score to the "Business" category, since it has incorporated the knowledge that business and economics/finance are semantically close.


Title

Documentary about Leonardo da Vinci

Description

Learn more about the life and the achievements of the Italian Renaissance polymath Leonardo da Vinci. His areas of interest included invention, painting, sculpting, architecture, science, music, mathematics, engineering, literature, anatomy, geology, astronomy, botany, writing, history, and cartography. He has been variously called the father of palaeontology, ichnology, and architecture, and is widely considered one of the greatest painters of all time. Sometimes credited with the inventions of the parachute, helicopter and tank, he epitomized the Renaissance humanist ideal

Tags

Sciences, History

Art Business Computer Education Game Health Home

0.4384 0.2049 0.2507 0.3659 0.2058 0.2527 0.2254

News Recreation Science Shopping Society Sport Child

0.1680 0.2531 0.7533 0.1324 0.3194 0.1944 0.3393

Table 21: Scoring for video “Documentary about Leonardo da Vinci”

In this example we have a documentary provided by Mediaset which concerns the life of Leonardo da Vinci. From the description provided we can see that he was a scientist as well as an artist, and so the algorithm gives a high score to the "Science" category and a lower but still high score to the "Art" category.

Title

Documentary about Archimedes

Description

Learn more about the life and the achievements of the famous Greek mathematician, physicist, engineer, inventor, and astronomer, Archimedes of Syracuse. Although few details of his life are known, he is regarded as one of the leading scientists in classical antiquity. Generally considered the greatest mathematician of antiquity and one of the greatest of all time with great influence on modern science and engineering.

Tags


Sciences, History

Art Business Computer Education Game Health Home

0.3139 0.2045 0.2476 0.3440 0.2085 0.2577 0.2231

News Recreation Science Shopping Society Sport Child

0.1910 0.2110 0.7487 0.1232 0.3195 0.1963 0.2097

Table 22: Scoring of video “Documentary about Archimedes”

On the other hand, this documentary, also provided by Mediaset, presents the life of Archimedes, who, as we see, was also a scientist but not an artist. Since there are no words or phrases showing a connection between Archimedes and art, the algorithm categorizes the documentary as "Science" and keeps the "Art" score significantly lower than for the "Documentary about Leonardo da Vinci".

Title

From an animal's perspective, part 1

Description

Part 1 of a series about pets. Everything from rescue dogs, not so wild cats and lovable rabbits. How to train your dog and some veterinary advices.

Tags

Animals

Art Business Computer Education Game Health Home

0.1696 0.1908 0.1601 0.2033 0.2376 0.3035 0.2379

News Recreation Science Shopping Society Sport Child

0.1384 0.1719 0.2239 0.1310 0.1666 0.1877 0.3086

Table 23: Scoring of video “From an animal’s perspective, part 1”

Finally, we have a video provided by Mediaset concerning animals and pets. The algorithm gives the highest score to "Child", because it finds a connection between pets and children, and to "Health", since, as we can see, the video contains veterinary advice.


As already mentioned, the more metadata each video has, the more accurate the representation provided by the algorithm. It should be noted, though, that since the method used cannot capture all real-life wordings (e.g. sarcasm), the algorithm is sensitive to false or unclear wording.

D) Recommendation algorithm evaluation via simulations

In order to evaluate the performance of the algorithm used in the Social Recommendation and Personalization Tool, we also performed some offline experiments via simulations in MATLAB. In order to achieve this task, sets of content items are given a score for each of the 14 categories, and sets of users with a specified behaviour are created. Based on their behaviour, the users have different probabilities of performing actions on a content item, depending on its relevance and thus the likelihood that the user is interested in the item. Although the users are artificial, we make reasonable assumptions trying to emulate real-life user behaviour.

In our simulation we have created 50 videos, each having 8 enrichments, 8 advertisements, and a feature vector of 14 categories. Videos are assigned to 5 classes, where in each class 2 elements of the feature vector get a higher score, corresponding to different video topics (e.g. arts and science). 30 users are created to interact with the content and are again divided into 5 classes, in a similar way as the videos. Each user class implies different interests and preferences, and so users that tend to select different videos and enrichments. The simulation consists of 200 recommendation rounds where, in each round, a list of the 6 most relevant videos according to the current profile of the user is presented to him, in ranked order. The hybrid recommendation approach we are using combines the content-based and the collaborative recommendation approaches, as already described in D3.3, as a linear combination

score(i, u) = θ · score_content(i, u) + (1 − θ) · score_collab(i, u), 0 ≤ θ ≤ 1,

where θ is a tunable parameter.
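The original simulations were implemented in MATLAB; purely as an illustration of the linear combination above, the following Python sketch computes a hybrid score, using a simplified Euclidean-similarity stand-in for the content-based and collaborative scores defined in D3.3:

```python
import numpy as np

def euclidean_similarity(x, y):
    """1 / (1 + Euclidean distance), as used later in the comparative evaluation."""
    return 1.0 / (1.0 + np.linalg.norm(np.asarray(x) - np.asarray(y)))

def hybrid_score(item_vec, user_profile, neighbour_profiles, theta=0.5):
    """score(i, u) = theta * score_content + (1 - theta) * score_collab.

    The collaborative term is approximated here as the average similarity of the
    item to the neighbours' profile vectors; the actual tool follows D3.3.
    """
    score_content = euclidean_similarity(item_vec, user_profile)
    score_collab = np.mean([euclidean_similarity(item_vec, p)
                            for p in neighbour_profiles])
    return theta * score_content + (1 - theta) * score_collab

# Example with 14-dimensional placeholder vectors.
rng = np.random.default_rng(0)
print(hybrid_score(rng.random(14), rng.random(14), rng.random((5, 14))))
```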

For the collaborative part of the algorithm, we randomly assign 7 users as each user's friends and we use the 5 closest to the user as neighbours, i.e. the ones whose profile vectors are used to provide the collaborative recommendations.

As similarity measure, we use a tunable parameter and perform a comparative evaluation between inner product, cosine and Euclidean similarities. More information on the similarity measures and the results will be presented later on.

As mentioned, user behavioural vectors are used to simulate how users interact with the video; more specifically, 5 interactions are considered:

● Percentage of video watched
● Number of clicks on enrichments
● Number of shares of enrichments
● Number of clicks on advertisements
● Explicit relevance feedback

These interactions are the same as the ones used in the actual tool.


Videos are watched by the user based on the video ranking the algorithm provides, and with a probability related to the video's rank and the user's behavioural vector, the user does or does not perform the above actions. The probabilistic nature of the process is used so that not all users perform all actions, as well as to capture the realistic tendency of users to follow a particular behaviour based on their actual interests.

After the user has finished his actions, an update procedure follows, similar to the one performed by the tool and described in D3.3. The importance given to each interaction is signified by the weight vector w = [w1, w2, w3, w4, w5], one weight per interaction listed above, and in our evaluations we consider

w = [10, 10, 10, 10, 20]

It should be noted that most of the parameters have been chosen to provide the best results based on the work of E. Stai et al. [18]; the same parameters were also used in the implementation of the Social Recommendation and Personalization tool.

In order to reduce the randomness of our results, we ran the experiment 10 times and report the average values in our figures.
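The exact update rule is specified in D3.3; the sketch below is only an illustrative weighted update that shows how the interaction weights above could enter such a rule, and is not the tool's actual formula:

```python
import numpy as np

# Interaction weights reconstructed above: watch percentage, enrichment clicks,
# enrichment shares, advertisement clicks, explicit feedback.
W = np.array([10, 10, 10, 10, 20], dtype=float)

def update_profile(profile, item_vec, interactions, rate=0.1):
    """Illustrative weighted update (the real rule is defined in D3.3).

    `interactions` is the 5-element vector of observed actions, normalised to
    [0, 1]; stronger weighted engagement pulls the profile towards the item.
    """
    engagement = float(np.dot(W, interactions) / W.sum())  # scalar in [0, 1]
    profile = np.asarray(profile, dtype=float)
    return (1 - rate * engagement) * profile + rate * engagement * np.asarray(item_vec)
```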

Evaluation Metrics

The system is evaluated based on three metrics, in order to measure its effectiveness: the Profile Distance, the Discounted Cumulative Gain and the R-score [16].

● Profile Distance

The Profile Distance metric measures the difference between the profile score of a user as generated by the tool and the predefined profile score that corresponds to the actual interests and preferences of the user. In the simulations, this corresponds to the Euclidean distance between the user profile and the user behaviour vector. From this metric we can see whether the user vector converges to the actual interests through the constant updates based on the user's interactions with the content, and, from its change over time, how fast this convergence takes place for a new user with no profile.

● Discounted Cumulative Gain

Another method of evaluating the system is by measuring how "correct" the ordering of the recommendations provided to the specific user is. Since actually knowing the correct ordering is impossible, we approximate it by assigning a utility score to the recommendation list, which is the sum of the utility scores of the individual recommendations. The utility of each recommendation is the utility of the recommended item, as a function of the explicit feedback provided by the user, discounted by a factor based on the position of the recommendation in the list. This metric assumes that the recommendations at the top of the list are more likely to be selected by the user, and thus discounts more heavily towards the end of the list. In the Discounted Cumulative Gain, the discount, as we go down the list, follows a logarithmic function; more specifically,


DCG = Σ_i (2^(r_i) − 1) / log₂(i + 1)

where i is the item position in the list and r_i is the user's rating on item i. The base of the logarithm typically takes a value between 2 and 10, but a base of 2 is the most commonly used [17].

● R-score

The R-score follows the same idea of evaluating the "correct" ordering of the recommendations but, instead of a logarithmic discount, uses an exponential one. Since the items towards the bottom of the list are mostly ignored by the scoring, the R-score measure is more appropriate when the user is expected to select only a few videos from the top of the list.

The equation used for the calculation of the R-score is the following:

R = Σ_i max(r_i, 0) / 2^((i − 1)/(α − 1))

where i is the item position in the list, r_i is the user's rating on item i, and α is a tunable parameter that controls the exponential decline [16]. At this point we should mention that in our evaluation we are not using the normalized DCG and R-score measures, since those require knowledge of the actual ideal values. Since we perform comparative evaluations of the SRP tool for different parameters of the algorithm, we do not consider it necessary to further complicate the evaluation scenarios with assumptions about the ideal values, which would involve further assumptions on user behaviour and actions.
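For clarity, the two ranking metrics can be computed as in the following sketch (Python used for illustration; the value of α is an assumption, since the deliverable does not state the one used):

```python
import numpy as np

def dcg(ratings):
    """Discounted Cumulative Gain of a ranked list; ratings[0] is the top item."""
    r = np.asarray(ratings, dtype=float)
    positions = np.arange(1, len(r) + 1)
    return float(np.sum((2 ** r - 1) / np.log2(positions + 1)))

def r_score(ratings, alpha=5):
    """R-score with exponential positional decay controlled by alpha."""
    r = np.asarray(ratings, dtype=float)
    positions = np.arange(1, len(r) + 1)
    return float(np.sum(np.maximum(r, 0) / 2 ** ((positions - 1) / (alpha - 1))))

print(dcg([3, 2, 3, 0, 1, 2]), r_score([3, 2, 3, 0, 1, 2]))
```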

Simulation Results

In the first part of the evaluation, we chose the Euclidean similarity as similarity metric and tuned the θ parameter of the hybrid recommendation scheme. The θ values used in this part of the experiment are:

● θ = 0 for collaborative recommendation only,
● θ = 1 for content-based recommendation only,
● θ = 0.5 for the hybrid approach, where both content and collaborative recommendations are equally taken into account.

In the following figure, we can see how the Profile Distance between the generated user profile and the expected one is affected with respect to θ. The smaller the distance, the more accurate the final representation of the user is concerning his interests and preferences. As expected, the content-based-only approach is the best performing one on this metric, while the hybrid approach's performance is close, since using only the user's own profile the algorithm can more easily tune it towards convergence. The least successful one is the collaborative-only approach, at a significant distance from the other two, which is expected since the algorithm tries to indirectly deduce the user's profile through the profiles of his friends. Even though the hybrid approach uses both content-based and collaborative methods, its performance on the metric is more than satisfactory, while also making use of the advantages provided by the collaborative method that we will discuss later on.

Table 24: Profile distance versus time with respect to theta values

The next figure shows the Discounted Cumulative Gain of the recommendations provided over time. We can see again that the two best performing approaches are the content-only and the hybrid approach, with the collaborative-only approach coming third. Again, the difference between the content-only and the hybrid approach is not significant, validating once more the effectiveness of the hybrid approach.

Figure 5: Discounted cumulative gain versus time with respect to theta values


In the final figure we present the R-score of the recommendation lists over time. The graphs follow the same pattern as the DCG, and so the hybrid approach succeeds in providing successful recommendations both over the total list and for the top recommended items.

Figure 6: R-score versus time with respect to theta values

The main disadvantage of using content-based recommendations only is the over-specialization of the algorithm on the user's past choices. Collaborative filtering is important for introducing novelty and diversity in recommendations, allowing the user to find interesting content that he would otherwise have missed. The element of surprise is important for a recommendation system, and such diverse recommendations can lead a user down unexpected paths in his research as well as help him evolve his own taste and preferences. This effect cannot be easily captured in an offline experiment and requires online experimentation.

Another problem the content-based-only approach has to face is the cold-start problem. When the system does not have enough information about a user, it is basically unable to provide any meaningful recommendations. In this case, the user's friend network can be utilized to exploit information about users the system already knows, and the recommendations provided become significantly more accurate. As a result, the collaborative approach is effective in overcoming this problem. From our analysis we can see that the hybrid recommendation scheme consistently achieves a smooth performance and thus successfully combines the advantages of both the content-based and the collaborative filtering approaches.

For the next part of the evaluation, we compare the different similarity metrics used in our algorithms. In this experiment, we fix the theta parameter to θ = 0.5, which corresponds to the hybrid recommendation scheme. As mentioned, a tunable parameter used in our simulation specifies the similarity measure used by our algorithms, which corresponds to one of the following:


1. Inner product similarity

similarity(x, y) = x · y

2. Cosine similarity

similarity(x, y) = (x · y) / (‖x‖ ‖y‖), i.e. the cosine of the angle between x and y

3. Euclidean similarity

similarity(x, y) = 1 / (1 + d(x, y)), with d(x, y) = √( Σ_i (x_i − y_i)² )

where d(x, y) is the Euclidean distance of the two vectors.
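The three measures translate directly into code; the short sketch below is a plain NumPy rendering of the formulas above:

```python
import numpy as np

def inner_product_similarity(x, y):
    return float(np.dot(x, y))

def cosine_similarity(x, y):
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def euclidean_similarity(x, y):
    return float(1.0 / (1.0 + np.linalg.norm(np.asarray(x) - np.asarray(y))))

# Example on two 14-dimensional profile/item vectors (random placeholders).
rng = np.random.default_rng(0)
a, b = rng.random(14), rng.random(14)
print(inner_product_similarity(a, b), cosine_similarity(a, b), euclidean_similarity(a, b))
```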

In the next figure we can see that the Euclidean similarity is the best performing similarity measure, achieving a slightly better score than the cosine similarity, while the inner product similarity is the worst performing. What is more, the Euclidean similarity seems conceptually more appropriate for our use case, since each user and each item can be modeled as a point in the 14-dimensional metric space, and the closer they are in the space, the more similar they are.

Figure 7: Profile distance versus time with respect to different similarity measures

The Discounted Cumulative Gain figure follows the same trend, showing that the Euclidean similarity outperforms the other two similarity measures by providing better overall recommendation lists to the user. The inner product, which is the simplest one, still performs the worst.

Figure 8: Discounted cumulative gain versus time with respect to different similarity measures

Finally, concerning the R-score, the Euclidean and the cosine similarity achieve the highest scores with minor differences, while the inner product achieves a significantly lower score. The fact that the first two measures perform almost the same here, while the Euclidean performs better on the DCG metric, could mean that the Euclidean similarity can better fine-tune the lower-scoring recommendations, since even the lower-scoring items, which the R-score ignores, are more likely to be relevant to the user's preferences.

Figure 9: R-score versus time with respect to different similarity measures


More simulations concerning the parameters used can be found in the work of E. Stai et al. [18].

4.3.2.3 Integrated Trends Discovery tool – Gender Inference algorithm

During the preproduction phase of a documentary, producers are highly interested in estimating trends in correlation with the potential audience's gender and age classification. This kind of information is not freely available from social media services due to user privacy protection data policies. There are various state-of-the-art attempts that focus on inferring user demographics through probabilistic approaches based on freely available user-related data on social media (e.g., tweet content, linguistic features, followers' profiles) [15][16]. The PRODUCER platform and the ITD tool tackle the task of age and gender estimation through the utilization of classification algorithms trained with ground-truth datasets of a number of Twitter users. The Twitter service proved to be the most suitable for extracting user profile information, as Twitter account data and content are openly available. The trained network is then utilized in order to generalize the training process and estimate missing information from wider networks of Twitter users. Part of the benchmarking procedure was performed for the evaluation of the effectiveness of the algorithms used for the inference of demographic characteristics, and more specifically the gender of individuals who have contributed through their Internet interactions (e.g. posting a tweet) to making specific keywords trend. For identifying users' gender, the ITD tool uses Twitter due to its openness and the availability of its users' profile information along with the respective user-generated content (tweets). A detailed description of the state of the art on this research topic, along with the results achieved through the method developed for the needs of the ITD tool, has been published in two research papers entitled:

• “Gender recognition based on social networks for multimedia production” which was presented at the international workshop “IEEE Image, Video, and Multidimensional Signal Processing (IVMSP)” that took place on 10-12 June 2018 in Aristi-Greece.[xx]

• “Social Media Analytics in Support of Documentary Production” which was presented at the “10th International Conference on Creative Content Technologies CONTENT 2018” Issue 9, Pages 7-13 and took place on 18-22 February 2018 at Barcelona / Spain.

The ITD tool's mechanism for identifying the gender of Twitter users couples three standalone classifiers that use as input the user name, profile photo, or theme colour preference to infer the user's gender. Our evaluation indicated (Figure 10) that both the Support Vector Machine and the Probabilistic Neural Network classifiers perform excellently, yielding ~87% accurate results. When minimum thresholds for gender estimation confidence are defined, PNNs are in principle more accurate, while SVM gender classifiers perform better in terms of coverage for thresholds up to 85%. Finally, one of the approach's undisputed advantages is that it is lightweight, scalable and requires minimal resources.
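As an illustration of the confidence-thresholded classification described above, the sketch below trains a generic SVM on placeholder features; the feature extraction of the three standalone classifiers and the training data are not reproduced here and are assumptions:

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder ground-truth set: one row per labelled Twitter account, with
# features standing in for the outputs of the three standalone classifiers
# (user name, profile photo, theme colour); the real features are not shown here.
rng = np.random.default_rng(0)
X_train = rng.random((200, 3))
y_train = rng.integers(0, 2, 200)  # 0 = female, 1 = male (placeholder labels)

clf = SVC(kernel="rbf", probability=True).fit(X_train, y_train)

def predict_gender(features, threshold=0.85):
    """Return a gender label only if the classifier is confident enough,
    mirroring the coverage/accuracy trade-off shown in Figure 10."""
    proba = clf.predict_proba([features])[0]
    return int(np.argmax(proba)) if proba.max() >= threshold else None

print(predict_gender([0.2, 0.7, 0.4]))
```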


Figure 10: Accuracy and Coverage for PNN and SVM Hybrid Classifiers on identifying Twitter user gender.

4.3.3 Objective benchmarking of FOKUS tools

For benchmarking the 360VPT, Fraunhofer FOKUS performed a comparison of existing 360° solutions and evaluated the tool against factors such as bandwidth and processing. The following sections explain them respectively.

1. Bandwidth

To compare the required bandwidth of the different solutions, we considered a 2160p (4k, 3840 x 2160) 360° equirectangular source video and a field-of-view (FOV) with a 60° vertical angle and a 106.7° horizontal angle, which is equivalent to 720p (1280 x 720) resolution. The recommended upload bitrate (according to YouTube) for 4k videos with H.264 encoding is 35-45 Mbit/s, and for a 720p video it is 5 Mbit/s. The average required bandwidth for streaming the source 360° 4k video is therefore about 8x higher than for streaming 720p FOVs. We took eight videos from different broadcasters (Arte, ZDF, RBB, and BR) for the experiment to compare the bandwidth between the source 360° video and the FOV videos. The source videos have a 4k resolution and H.264 encoding with a framerate of 30 fps. We generated the corresponding FOVs for each of these videos using our solution and then calculated the average FOV bitrate for each video. The figure below shows the comparison between the bitrates of the source 360° videos (in blue) and the average bitrates of the FOV videos (in orange). The observation showed that only two videos had the bitrate recommended by YouTube for 4k video. In the majority of cases, the FOV video bitrates were about 8x lower than the bitrates of the source 360° videos. This is due to the fact that equirectangular videos contain redundant pixels, especially in the polar regions.
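As a quick sanity check of the ~8x figure, the cited YouTube-recommended bitrates already imply roughly that ratio:

```python
# Back-of-the-envelope check of the saving quoted above, based on the
# YouTube-recommended H.264 upload bitrates cited in the text.
source_4k_mbit = (35 + 45) / 2   # 2160p equirectangular source video
fov_720p_mbit = 5                # 720p field-of-view stream
print(f"approximate saving factor: {source_4k_mbit / fov_720p_mbit:.1f}x")  # ~8x
```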


Figure 11: Bandwidth evaluation

2. Pre-rendering and Processing

Instead of live rendering for each client in a separate session, the solution pre-renders the 360° source video in a pre-processing step. This also solves the problem of scalability, along with the reduced bandwidth requirements discussed in the previous point, and later streams only individual fields of view (FOVs) to clients such as TVs, PCs and mobile devices. Another advantage of this solution is that there are no additional requirements in terms of processing and programmatic capabilities on the client, since the 360° video processing and view calculation happen in the cloud. The client is a simple video player and only needs to play a live stream that contains an individual field of view at a specific time.

Figure 12: 360° Pre-Rendering

Hybrid TV terminals such as Hybrid broadcast broadband TV (HbbTV) devices fall into the category of devices that are suitable for this solution. These devices are not capable, in terms of programmatic features, of performing the necessary image transformations for rendering 360° video content. The advantages of this option are summarized below:

● No additional bandwidth is required compared to traditional video streaming.
● No processing resources are required on the client.
● No need for additional APIs to process the video, so any video player can be used.
● No special hardware requirements compared to traditional video playback.
● No video processing on the server is required during streaming.

4.3.4 Subjective benchmarking by project partners

The following table provides an overview of the subjective benchmarking tests conducted by project partners. Some of the results are reported in the external benchmarking sections because external benchmarkers were included in the benchmarking activities. The "guided" column lists benchmarks which were conducted by a PRODUCER project member showing tool functionalities either online or by demonstrating videos and/or screenshots to the benchmarkers, with subsequent feedback collection using the tool questionnaires. The "unguided" benchmark tests were also based on either online testing or video representations of the toolset, but were not directly supported or guided by a tool presenter; the feedback was only collected via filled questionnaires.

Guided Unguided

ABT MEDIASET, DOMINO -

ITD ICCS (see external benchmarking results), DOMINO

ICCS (see external benchmarking results)

AAT ICCS (see external benchmarking results)

ICCS (see external benchmarking results)

OCD ICCS -

MCSSR MEDIASET -

IEVC Domino Production (as part of documentary movie production)

-


SSF Domino Production (as part of documentary movie production)

-

SRPT ICCS (see external benchmarking results)

ICCS (see external benchmarking results)

360VPT Domino Production (as part of documentary movie production)

-

Int. Prototype

Flying Eye -

Table 25: Overview of subjective benchmarking tests

4.3.4.1 Subjective benchmarking by MEDIASET

Mediaset's benchmarking focused on testing the functionalities of two PRODUCER platform tools: the ABT (Audience Building Tool) and the MCSSR (Multimedia Content Storage, Search & Retrieval Tool). A guided demo without hands-on testing was performed by Mediaset in order to show the potential of each tested tool, as agreed with the consortium. The benchmarking is based on a specific topic identified by a survey of the most popular interests on social media. Specifically, a growing interest in animals has been detected in the last few years, and market research has revealed that the adoption of pets has significantly increased; indeed, parents often give their kids a puppy for Christmas or their birthday. Thus, animals were chosen as the topic for the scenario realization, also thanks to a rich archive of content about animals provided by Mediaset and Domino. The internal benchmarking activities have been directed at end users and professional users.

4.3.4.1.1 MCSSR
The aim of the benchmarking related to the MCSSR is to demonstrate that this tool is able to store different kinds of content, such as videos, images and texts, in organized archives and to retrieve this content easily by means of metadata. The content stored in the tool is both free content, sent by the OCDT, and proprietary content of Mediaset and DOMINO. The test was realized by entering "dog" or "animals" as keywords and showing the different contents returned by the search. For instance, videos such as "Dalla Parte degli Animali" by Mediaset and "Dog in Siberia" by Domino, as well as images and videos coming from the OCDT showing dogs and other animals in some nice scenes, were visualized.


The tool makes it possible to convert the format and download the content of interest. Manual addition of tags for each content item is also allowed, in order to search for specific content more easily. Moreover, an additional functionality was shown during the benchmarking: the automatic annotation of videos to detect and recognize objects and faces on the screen frame by frame, performed by clicking on the annotate button.
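As an illustration of the kind of metadata-driven retrieval exercised in this test, the following sketch shows how a keyword search against a media archive such as the MCSSR might look; the endpoint, parameters and response shape are assumptions made for illustration and not the tool's documented API.

```typescript
// Illustrative sketch only: a keyword search against a hypothetical media
// archive endpoint. Field names and URL are assumptions, not the MCSSR API.

interface MediaAsset {
  id: string;
  title: string;
  type: "video" | "image" | "text";
  tags: string[];        // editorial and automatically generated annotations
  downloadUrl: string;
}

async function searchAssets(keyword: string): Promise<MediaAsset[]> {
  // Hypothetical endpoint; a real deployment would also require authentication.
  const response = await fetch(
    `https://mcssr.example.org/api/assets?query=${encodeURIComponent(keyword)}`
  );
  if (!response.ok) throw new Error(`Search failed: ${response.status}`);
  return (await response.json()) as MediaAsset[];
}

// Usage as in the benchmarking scenario: retrieve all assets matching "dog".
searchAssets("dog").then(assets =>
  assets.forEach(a => console.log(`${a.type}: ${a.title} (${a.tags.join(", ")})`))
);
```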

4.3.4.1.2 ABT
The aim of the benchmarking related to the ABT is to demonstrate the innovative tool functionalities related to generating an ecosystem where producers, investors and audience members can build social relations, promote documentaries with the help of gamification, connect their promotion efforts to other social media channels, and retrieve key performance indicators that reveal audience interest in the documentary under production or in candidate documentary topics. The tool has been benchmarked and tested by using different mockup/demo user accounts to show the functionalities related to the producer, investor and audience user types according to a specific benchmarking scenario. In the benchmarking scenario, the first step was the creation of a campaign about animals on the ABT. All functionality was tested, such as connecting the campaign with a mockup Facebook page and Twitter account and being able to sync content between the ABT campaign timeline and the social accounts in both directions. According to the scenario, the purpose of the campaign was to discover which topics on animals are the most appealing to audiences. In order to achieve this goal and test the relevant functionalities of the ABT, the producer user published several pictures, articles and videos on the ABT campaign and shared them on different social networks, such as Facebook and Twitter. The published posts concerned pictures and videos of dogs and cats and articles related to the mistreatment and abandonment of animals. For testing the abilities of the ABT to collect and integrate audience reactions to the campaign and to generate the related Key Performance Indicators, through which the producer can quickly understand the appeal of the alternative topics they consider, the scenario also involved the creation of mockup audience accounts and their usage to interact with the campaign content (submitting likes and posts, sharing over social media, etc.). The additional abilities of creating friend networks between producers, investors and audience members within the ABT have been tested and demonstrated as part of the scenario as well. Finally, the benchmarking scenario involved the demonstration of the ABT's gamification capabilities. This involved the creation of a gamified contest aimed at the public for motivating the audience to participate and provide feedback either on the ABT campaign timeline or over the connected social channels. The contest consists in establishing specific rules that allow participants to win prizes as a result of reactions, comments and sharing of posts. Overall, the technical performance of the tool was very good, with acceptable response times and without any glitches or errors, while the user experience was considered great since the tool manages to hide the underlying complexity of its functionality from the user.
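To illustrate the cross-channel publishing and KPI collection exercised in this scenario, the following sketch shows how such a workflow might be scripted; all endpoints, field names and the mirroring parameter are hypothetical and do not represent the ABT's actual API.

```typescript
// Minimal sketch, not the ABT's real API: publish one campaign post to the
// campaign timeline, mirror it to the connected social channels, and read back
// aggregated engagement KPIs. Endpoints and field names are invented.

interface CampaignPost { campaignId: string; text: string; mediaUrl?: string; }
interface CampaignKpis { likes: number; shares: number; comments: number; perChannel: Record<string, number>; }

const ABT_API = "https://abt.example.org/api";   // hypothetical base URL

async function publishEverywhere(post: CampaignPost): Promise<void> {
  // One call to the campaign timeline; the platform fans the post out to the
  // linked Facebook page and Twitter account (sync works in both directions).
  await fetch(`${ABT_API}/campaigns/${post.campaignId}/posts?mirror=facebook,twitter`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(post),
  });
}

async function fetchKpis(campaignId: string): Promise<CampaignKpis> {
  // Aggregated reactions collected from the ABT timeline and social channels.
  const res = await fetch(`${ABT_API}/campaigns/${campaignId}/kpis`);
  return (await res.json()) as CampaignKpis;
}
```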


4.3.4.2 Subjective benchmarking by DOMINO Production
The DOMINO benchmarking covered the evaluation and testing of the following PRODUCER tools:
- ABT (Audience Building Tool)
- ITDT (Integrated Trends Discovery Tool)
- IEVCT (Interactive Enriched Video Creation Tool)
DOMINO Production also produced a 360° documentary to evaluate the IEVCT technology and used the tool for its own 360° documentary production. The benchmarking process for the above-mentioned tools was conducted using three DOMINO documentaries that cover the following topics: environment ("Green Gold"), politics ("Game of Truth") and ethnography ("Yamal"). The internal benchmarking and evaluation activities have been communicated to end users and professional users such as sales agents, directors and producers.

4.3.4.2.1 ITDT
The evaluation and benchmarking focus of DOMINO concerning the ITDT was mainly to detect the "generic popularity" of a documentary topic and to understand the importance of specific topics in different regions. The idea was to find out where a specific documentary is relevant and talked about, and to help the documentary producer set up the best strategy for their documentary in order to gain access to the international and regional markets. The understanding of the different markets is crucial for production companies like DOMINO because of the financing necessities that usually require partners from different TV stations worldwide.

ITDT: Evaluation & benchmarking of the "topic" functionality
"Green Gold" is an investigative documentary that takes us to the heart of renewable energies and the agrofuels that were at one time presented as the solution to the three major crises facing the world: the energy crisis, the economic and financial crisis, and the environmental crisis. We wanted to test the awareness of the public about the links between biofuels and deforestation. The ITDT showed that, for the public, the perception of a "link" between these topics was clear and that the biofuels-deforestation connection was perceived negatively by public opinion. We also realized that public opinion on the topic was mainly dominated by males. In order to improve our evaluation, we tested the word "biofuels" with several tags such as transport, collusion, etc. These tests confirmed that public awareness exists, especially in countries where deforestation is a major issue.


ITDT: Evaluation & benchmarking of the "region" functionality
We tested the ITDT for "Game of Truth", a documentary on collusion during the Troubles in Northern Ireland. We wanted to know which regions were interested in the topic. After using and testing the ITDT, we realized that the interest in this topic was mainly in Ireland, the UK, Canada and the United States. In order to find financing and TV partners, DOMINO decided to go to "Hot Docs" in Toronto, where the most important documentary market in the world is held and where all the American, Canadian, English and Irish buyers are. We pitched our documentary project "Game of Truth" at the "Hot Docs" market in 2018. When we arrived in Toronto, we had the opportunity to go to the avant-première screening of an Irish documentary on the Troubles, on the life of an IRA activist, entitled "I, Dolours". After the screening we met the producer Nuala Cunningham (whom we did not know before) and concluded a co-production agreement by the end of the day, because "Game of Truth" was a documentary that they had wanted to produce but had not succeeded in doing. The financial input of the Irish sales agreement could be around €120,000, a quarter of the budget. Thanks to the ITDT, we found about 25% of the production budget for this project.

4.3.4.2.2 ABT
The objective of the Audience Building Tool (ABT) is to establish an audience / community through social media using one tool. Instead of using Twitter and Facebook separately, we used a standalone tool for creating a campaign to establish an audience and community interested in the topic of the documentary. The second objective was to monitor audience engagement: the ABT presents all necessary KPIs related to monitoring the audience's engagement in a unified way over all channels (ABT, Facebook, Twitter). Saving time on social media activities is a key issue and a major benefit for producers. For the benchmarking of the ABT, we used "Game of Truth" and "Green Gold" in two specific use cases.

First use case: "Game of Truth"
"Game of Truth" is a documentary that is still in the pre-production phase. We are testing the viability of the project and intend to attract the interest of TV channels, co-producers and viewers. For the campaign, DOMINO produced two communication / marketing elements:

• A trailer
  o https://vimeo.com/267217530
  o password: Gameoftruth18
• A billboard


Figure 13: Game of Truth billboard

In our scenario, the first step was the creation of a campaign in order to set up an audience community in an early production phase. The trailer is key to attracting the interest of audiences and TV channels. The scenario also involved the creation of mock-up audience accounts, whose objective is to interact with the campaign content: submitting likes and posts, sharing over social media, etc.

Figure 14: Game of Truth using ABT

Saving time was key, and the ABT is user friendly, with efficient KPIs to monitor the film's audiences and help us consolidate the launch of the documentary.

Second use case: "Green Gold"


"Green Gold" is a documentary that has already been produced and was ready for a theatrical and TV release. DOMINO produced promotional elements such as:

• A trailer
  o https://vimeo.com/240455839

• A banner

Figure 15: Green Gold banner

• A billboard

Figure 16: Green Gold billboard

• 6 excerpts


The objective of using the ABT for "Green Gold" was to support the theatrical and TV releases on social media, using the social media access and the gamification capabilities of the ABT to create contests offering free tickets in exchange for likes or feedback about the documentary. The aim is to motivate audiences and provoke feedback. The contest consists in establishing specific rules that allow participants to get free tickets as a result of likes and sharing of posts.

Figure 17: Green Gold using ABT

As can be seen in the pictures above, all the functionalities of the ABT were used to build up a social media campaign on Facebook and Twitter, such as connecting the campaign with a mock-up Facebook page and Twitter account and syncing content between the ABT campaign timeline and the social media accounts in both directions. We were also able to change the dates of marketing elements and messages. Saving time while conducting these activities on social media was at the center of our interest, as it allows us to be proactive and consistent on social media. The ability to create friend networks between producers, investors and audience members within the ABT was tested and demonstrated as part of the scenario as well. Clearly, in both cases, the Audience Building Tool allows us to save time and set up a clear social media campaign. The producer can easily submit likes and posts and share accompanying content over social media, where we can reach and build our audiences in a consumer- and user-friendly way.

4.3.4.2.3 IEVCT
In order to test the functionalities of the Interactive Enriched Video Creation Tool, DOMINO produced a 360° short documentary entitled "Yamal". The Nenets are a people at risk. They are the first reindeer herders on the planet and the only ones to have preserved an ancestral way of life. As nomads, they move with their reindeer following ancient migratory routes. They live on the Yamal Peninsula in Northern Siberia, stretching to the Kara Sea, well beyond the Arctic Circle. Yamal, which means 'end of the world', is a remote peninsula with permafrost, beaten by winds. It has been the territory of the Nenets people for millennia. Today, the Nenets are in danger. They are witnesses to climate change, new geological phenomena, gas overexploitation and the disappearance of the last nomadic tribe in Europe. In "Yamal", we experience an extraordinary, timeless journey with a family of nomads: three generations of men and women moving north for the Siberian transhumance, a long journey of 1,500 km. The "Yamal" video was shot in 5.5K and rendered in 4K from rushes shot by DOMINO, using a GoPro Fusion camera with stereo audio.

Figure 18: Yamal using IEVCT


On a single TV, the viewer just uses a standard remote control, which allows them to move all around the 360° video and zoom in. This does not change consumer habits: the viewer sits in front of the TV with the remote control in hand. The IEVCT functionalities used by DOMINO have the objective of facilitating the "viewer journey" inside the 360° video by giving direct access to defined areas. Three different areas of 120° inside the 360° view have been set up; the viewer can access each area straight away by selecting the corresponding ideogram. As can be seen in the pictures above, there are three circles with three defined white areas. On the right side, a blue circle with an arrow allows the viewer to move inside the picture step by step; an identical blue circle with an arrow exists on the left side of the picture. With the IEVCT, the journey inside the 360° video becomes smooth and allows viewers to go straight to wherever they want, following the narrative and the sound information. In 360° video, the sound is crucial as it drives the viewers' attention. As the Second Screen Framework (SSF) provides a framework, it is difficult to evaluate (objectively or subjectively) on its own. We therefore decided to integrate additional information about the 360° short documentary within the SSF. This additional information consists of pictures and facts about the characters and venues of the mini documentary, which the viewer can access at any time during the screening. The examples below show how this was achieved using the Second Screen Framework in our 360° documentary.
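The following sketch illustrates the navigation logic described above (step-wise rotation plus direct jumps to the three predefined 120° areas); it assumes a player function that re-centres the viewport and uses example key codes, so it is an illustration rather than the IEVCT's actual implementation.

```typescript
// Sketch of the remote-control navigation described above, under the
// assumption that the player exposes a way to re-centre the viewport.
// The key codes and the 15° step are illustrative, not the tool's real values.

const PRESET_AREAS_DEG = [0, 120, 240];  // centres of the three predefined 120° areas
const STEP_DEG = 15;                     // rotation step for the left/right arrows

let currentCenterDeg = 0;

function setViewCenter(angleDeg: number): void {
  currentCenterDeg = ((angleDeg % 360) + 360) % 360;
  // In a real player this would update the rendered viewport.
  console.log(`viewport centred at ${currentCenterDeg}°`);
}

function onRemoteKey(keyCode: number): void {
  switch (keyCode) {
    case 37: setViewCenter(currentCenterDeg - STEP_DEG); break;  // left arrow: step left
    case 39: setViewCenter(currentCenterDeg + STEP_DEG); break;  // right arrow: step right
    case 49:                                                     // digit keys 1-3:
    case 50:                                                     // jump directly to one
    case 51: setViewCenter(PRESET_AREAS_DEG[keyCode - 49]); break; // of the preset areas
  }
}
```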

Figure 19: Yamal Screenshots for SSF

The Yamal Peninsula is a region in Northern Siberia, well beyond the Arctic Circle. Yamal means 'edge of the world'. In Yamal, 600,000 reindeer are raised by 15,000 nomads.


Figure 20: Yamal Screenshots for SSF

The Nenets are the first reindeer herders on the planet and the only ones to have preserved a traditional way of life. Being nomads, they move their camp about 70 times a year, over more than 1,000 km, to find the pastures essential for the reindeer's survival.

Figure 21: Yamal Screenshots for SSF


Chums are traditional tents that serve as habitat and can shelter up to 15 people. A chum consists of thirty wooden poles, none of which enters the ground.

Figure 22: Yamal Screenshots for SSF

The village of Yar-Salé Salekhard is located on the right bank of the river Ob. It means "the sand course" in the language of the Nenets. It has about 6,000 inhabitants.

4.3.4.3 Subjective benchmarking of SSF by Fraunhofer FOKUS
As the Second Screen Framework provides a framework, it is difficult to evaluate (objectively or subjectively) on its own. Any benchmark or evaluation can and needs to be done in the context of a specific usage and application scenario. The primary use for the SSF in the PRODUCER context has been the synchronized playback of segments of a high-resolution 360° video on multiple HbbTV television screens.

Figure 23: 360VPT on multiple TV Screens for Playout


The "Caminandes" video was rendered in high source resolution (24K) from the freely available Blender project source files. Each individual screen shows a 60° angle view of the video. On a single TV, the user would just use a standard remote control to look to the left or the right in the 360° video. On multiple screens, using a single (or multiple) regular TV remote control would clearly be impractical. So a second screen application using the SSF has been written to provide synchronous play control for all TV screens, including PAUSE/PLAY and LEFT/RIGHT functionality, allowing all TVs involved (three at the IBC 2017 broadcast exhibition, five at MWS 2018) to act as part of a large 360° video wall. It should be noted that all TVs play their section of the video autonomously and individually; they do not use a common video view that is simply split across multiple screens. The only linking element between the TVs is the second screen device, which sends information about the required play status and viewing angles to all TVs. Questionnaire-based benchmarking would have been difficult to achieve with this application, as the functions for the end user are limited (essentially to "turn left" and "turn right") and the user experience would be more influenced by features of the application (video quality, layout of the televisions, appreciation of the video content) than by the underlying second screen feature, which would likely be 'invisible' to the user, as pressing a button on a tablet to control a TV would not even be seen as noteworthy in itself. So the benchmarking is only based on usage at industry events and showcases and the reaction of visitors. Technically, the system performed well, even facing the restrictions of industry fairs and exhibitions, with commonly bad Wi-Fi connections and other technical issues. Screens stayed synchronized spatially (all views lined up to a continuous panorama) and temporally (all video segments remained in sync with less than two frames difference). Booth visitors were generally impressed by the demonstration, although, as already mentioned, this is likely to be more about the video quality and the general showcase than the SSF specifically. And while their comments were probably more specific and subjective than most, the booth at IBC in Amsterdam was also visited by Ton Roosendaal (the founder and CEO of Blender) and Pablo Vazquez (the director and creator of the Caminandes video), who both expressed admiration and praise regarding the demonstration based on the material they created.
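The following simplified sketch illustrates the control flow described above: the second screen device broadcasts a single shared state, and each TV derives its own 60° slice of the panorama and re-synchronizes when the drift exceeds roughly two frames. The message format, transport and helper functions are assumptions, not the SSF API.

```typescript
// Simplified sketch of the control flow, not the SSF implementation itself:
// the second-screen app broadcasts one shared state, and every TV derives its
// own 60° viewing angle from its position in the video wall. The WebSocket
// usage and message format are assumptions.

interface WallState {
  playing: boolean;      // shared play/pause state
  baseAngleDeg: number;  // viewing angle of the leftmost screen
  mediaTimeSec: number;  // common media time for temporal sync
}

const SEGMENT_DEG = 60;  // each TV renders a 60° slice of the 360° panorama

// Controller side (tablet): send the same message to every connected TV.
function broadcast(sockets: WebSocket[], state: WallState): void {
  const message = JSON.stringify(state);
  sockets.forEach(socket => socket.send(message));
}

// TV side: apply the shared state, offset by the screen's index in the wall.
function applyState(state: WallState, screenIndex: number, video: HTMLVideoElement): void {
  const myAngle = (state.baseAngleDeg + screenIndex * SEGMENT_DEG) % 360;
  console.log(`screen ${screenIndex} shows the slice centred at ${myAngle}°`);
  if (Math.abs(video.currentTime - state.mediaTimeSec) > 2 / 25) {
    video.currentTime = state.mediaTimeSec;  // resync if drift exceeds ~2 frames (25 fps)
  }
  if (state.playing) { void video.play(); } else { video.pause(); }
}
```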


5 External benchmarking

5.1 Scope and objectives
The external benchmarking activities were directed at end users, professional users and potential investors for the PRODUCER tool set. The main intention was to get an outside view on the PRODUCER prototype, with the additional benefit that some of the external benchmarkers already have experience with similar software applications as part of their daily business. A quite comprehensive external benchmarking was performed by ICCS for their respective tools: up to 160 responses were collected from students, comprising valuable feedback for improvements regarding functionalities and GUI design. Another essential part of the external benchmarking was conducted during the NAB exhibition in April 2018 in Las Vegas, USA, with the advantage that many different stakeholders and potential investors from the industry were available on site and were also willing to spend some time on the benchmarking of the PRODUCER prototype. Besides the benchmarking of the tool set itself (functions, usability, etc.), it was also a good opportunity to get feedback on the principal architecture of the prototype (e.g. combination of the individual tools, missing tools) and on potential competition (comparable tool sets or individual tools) for it (strategic benchmarking). Additional benchmarking sessions took place at the FKTG bi-annual symposium in Nuremberg, Germany (professional users and system integrators) and at the Sunny Side of the Doc documentary market in La Rochelle, France. The Sunny Side of the Doc, now in its 29th year, is an international marketplace for documentary and other factual content. It is a four-day event, held in La Rochelle, bringing together filmmakers, broadcasters, financiers, distributors and producers.

5.2 Methodology
The set of methodologies for the external benchmarking was similar to the internal benchmarking, consisting of:
- Tool-specific questionnaires for the subjective testing by end users
- Videos and/or slide-decks with demo paths and user manuals to support the guided interviews and subjective benchmarking tests


5.3 Results

5.3.1 Strategic benchmarking by industry experts and professional users at NAB 2018 and the FKTG Symposium 2018

During NAB 2018 in Las Vegas (USA) and the FKTG Symposium in Nuremberg (Germany), the PRODUCER toolkit / integrated prototype was presented to relevant industrial players and potential users in order to collect feedback on the project's achievements. The participants in these benchmarking and evaluation activities were TV channels, software manufacturers and system integrators. The presentation of the integrated PRODUCER prototype was supported by short videos, slide-decks, "hands-on" demonstrations of individual tools and questionnaires that helped us conduct guided interviews. The overall goal of these evaluation and benchmarking activities at important B2B events was to create awareness for the PRODUCER toolkit on the one hand and to gather qualitative feedback on both the integrated prototype and the individual tools on the other hand. By nature of the events, technical and business-related perspectives were more in focus than user and/or end-user perspectives. The following table summarizes the main qualitative feedback gathered during the NAB 2018 evaluation & benchmarking activities.

Results of qualitative benchmarking during NAB 2018

NRK (Public Broadcaster)
Comments:
- Very interesting approach, especially for our editorial departments that either want to promote content by themselves or enable producers to do so
- It would be great to have the toolset as a "software-as-a-service" offering so that we can use it also in "peak times" across different editorial departments
Suggestions:
- Check if other social media platforms could be integrated (e.g. Instagram, Snapchat, Pinterest)
- Integration into existing post-production (editing) pipelines and media asset management systems is very crucial

Deutsche Welle (Public Broadcaster)
Comments:
- It is obvious that the UI / UX for the different tools has been manufactured by different parties. For a better "look & feel" it would be great to harmonize the different UI / UX aspects of the different tools in order to make them more "uniform" with each other
Suggestions:
- Design and implement a UI/UX guideline for all tools
- Provide tools from the toolkit also individually
- Allow single sign-on with Microsoft Active Directory accounts/mechanisms

WDR (Public Broadcaster)
Comments:
- The ITDT is very interesting. It would be great to apply even more categories/filters for socio-demographic aspects of the target groups and their interests
- It is a great idea to provide an integrated platform for documentary producers as they usually don't have the financial resources or the know-how to use these kinds of tools
- The toolset should be available as an "on-premise" installation due to possible GDPR issues
Suggestions:
- In a future product: make use of "paid" interfaces to Twitter, Facebook and Google in order to have more data / categories to apply filters for the trends discovery
- Connect the ITDT to data visualization tools like Kibana or Grafana (or similar) to further exploit the trends discovery mechanisms
- The toolset should provide good manuals and tutorial videos directly from within the webpage / tools

ZDF (Public Broadcaster)
Comments:
- The integration of the different tools into one platform is very good as it provides all necessary tools that we need for the overall documentary production value chain
- We see the usage of the tools within our internal editorial department as well as in collaboration with our producers
- The major advantage over "traditional" social media management systems is the integration with valuable different tools for the documentary production value chain, but we still think that, especially for social media promotion activities, some important features are missing (particularly scheduling of posts)
Suggestions:
- Offer an easily accessible and cost-attractive business model
- Also allow licenses to be easily shared with co-workers within the editorial departments (e.g. assistants that take care of promotion and community management)

Sony (Software Company)
Comments:
- We understand the aim and the goal of the toolset, but as a systems manufacturer we are only interested in some parts of the toolkit and not the overall platform. Is it possible to get access to the individual tools and integrate them individually on a project basis?
Suggestions:
- Technical documentation of the software architecture should be available
- REST APIs are crucial for integration into different system environments / infrastructures

MTI (System Integrator)
Comments:
- We like the AAT & the MCSSR a lot, especially that these two tools are integrated with each other
- We think there are strong use cases for the ITDT & 360VPT as well
Suggestions:
- Further expand the possibilities of the AAT by e.g. using commercially available "cognitive services" APIs from e.g. Google, Microsoft, IBM etc.

Mediastrat (System Integrator)
Comments:
- Interesting and unique approach. We are not aware of any competing product / toolset. (Of course there is competition for the individual tools but not for the overall integrated prototype.)
- What are your plans after the end of the EU-funded project?
- How will you market the toolset?
- Are you also aiming at new/different customers (other than documentary film producers)?
Suggestions:
- Individual tools & the overall platform should be deployable in hybrid models (i.e. in a data center on premise and in the public cloud)

dellEMC (Software / Hardware Company)
Comments:
- It would be great to provide the tools individually and offer them also in an "on-premise" installation scenario
- What are the storage and computing requirements for 360° video playout?
Suggestions:
- Maybe it's better to market not only the overall toolset but also integrated parts of it, e.g. just the ITDT, MCSSR & AAT in a "bundle", or the IEVCT and the 360VPT in another "bundle", or the ABT as a standalone tool

MMI (System Integrator)
Comments:
- What is the business model behind using the platform and / or just individual tools?
- How do you enable system integrators like us to integrate the tools into different environments? Do you provide "professional services" as an EU-funded project?
- What are your plans after the funding period? Who is taking over the commercialisation of the toolset?
Suggestions:
- Focus on the interfaces for a seamless integration into post-production systems like Avid Media Composer, Adobe Premiere or Adobe After Effects

Exozet (System Integrator)
Comments:
- How do we use the SRPT in a standalone way?
- The SRPT seems to have its potential more in the integration with OTT systems and not as an individual tool
Suggestions:
- What are the recommendation algorithms that you are using?
- Content-based recommendations?
- Item-based recommendations?
- What kind of fingerprinting technology are you using?

unreel.me (Software Company)
Comments:
- It should be ensured that the integration with standard post-production tools is possible
- We are very interested in the 360VPT: how is an integration into our product portfolio possible?
Suggestions:
- Provide access to individual tools and not only the overall toolset / integrated prototype

valossa (Software Company)
Comments:
- What kind of face and object detection algorithms have been implemented?
- Did you develop custom algorithms & models for the AAT?
- What are your future plans for the AAT (commercially and technologically)?
Suggestions:
- The tools should provide documented APIs with REST interfaces

wicketlabs (Software Company)
Comments:
- What "maturity" do the tools really have?
- Which ones are "ready to ship"?
- How is an integration into our product portfolio commercially possible?
Suggestions: -

icx media (Software Company)
Comments:
- The OCD tool is really valuable if you can add more repositories
- What happens at the end of the project with the platform and the individual tools?
Suggestions:
- Integrate more content repositories, especially commercial ones, in order to create an …

Adobe (Software Company)
Comments:
- We like the idea of an integrated toolset for documentary film producers. Nevertheless, we think that the combination of PRODUCER tools with existing industry-wide standards such as Adobe Premiere would make a lot of sense as these tools already exist at a lot of companies
Suggestions:
- The availability of individual tools and the integration into standard software systems should be a focus
- Integrate the MCSSR and AAT as a "panel" into the Adobe Premiere video editing software and profit from a deep integration into the Adobe toolset / toolchain

Avid (Software Company)
Comments:
- Who is "the" contact person for using the overall platform and / or just individual tools?
- Some of the tools are very interesting, like the ITDT, AAT and 360VPT, others less so (as we provide similar tools)
Suggestions:
- Outline technical and commercial aspects for the integration into Avid toolsets
- Provide well-documented APIs

identv (Software Company)
Comments:
- We like the idea of integrating the AAT with a media asset management system (MCSSR)
- How do you handle the heavy computing loads that come with tools like the AAT in a productive environment?
Suggestions:
- The AAT should be optimized for GPU processing of the algorithms
- Trained models for the AAT should be owned by the customer and not by the PRODUCER platform operator
- It should be possible to scale "on-demand" to cloud computing power if the AAT has to process a lot of newly ingested data

vidrovr (Software Company)
Comments:
- The combination of trends discovery (ITDT) and search in content repositories (OCD) is very unique and at the same time very promising
- It is a pity that you didn't integrate more content repositories into the OCD
- Is it possible to use the AAT with the OCD without using the MCSSR? It would help a lot not to rely on a MAM (like the MCSSR) for quick keyword extractions / editorial tags
- We are interested in the algorithms that you are using within the AAT
- Why didn't you also focus on "audio-based" cognitive services algorithms such as natural language processing?
Suggestions:
- Integrate more content repositories into the OCD
- Further develop the AAT with more cognitive services algorithms
- Make sure that the AAT works in public and private cloud environments and also enable "hybrid" deployments if you have a lot of content to process

Table 26: Consolidated feedback about the integrated prototype at NAB 2018

5.3.2 Subjective benchmarking by professional users and system integrators at FKTG Symposium 2018

During the FKTG Symposium in Nuremberg, Flying Eye presented the PRODUCER project in a paper session, followed by individual guided benchmarks with eight representatives from the professional media market. The main focus of the benchmark was not on individual tool benchmarking but on the integrated prototype and the way the tools interact with each other. Another topic of the benchmark was the importance of the different tools for the experts' daily business. The answers were collected in questionnaires and the overall results are shown in the diagrams and tables below.

A) General questions regarding the testers' profile

Figure 24: Age & gender of testers

Figure 25: Employment status and nature of current occupation of testers

Figure 26: Education and multimedia metadata experience

In summary, most of the testers were experienced professionals working in the media industry. All of them had at least basic familiarity with multimedia metadata handling, and most were even experienced users or experts in this domain.


B) Questions about the PRODUCER integrated prototype

Figure 27: Ease of logging in and navigation (dashboard)

Figure 28: Ease of accessing tools from the dashboard and user-friendliness of dashboard

Since the single-sign-on feature was not fully implemented at the time of the benchmark, the ease of logging in was rated only average. After logging in, the navigation in the dashboard itself as well as the access to the different tools was rated very positively.

The following questions covered the functionality and the performance of the interfaces between the different tools.

Figure 29: Rating the number of data transmitted between the tools and speed of data exchange


The amount of data transmitted between the tools was rated differently by the users (system integrators tend to prefer a high integration level between individual tools), and the speed of data exchange, which also depends on the resources booked in the AWS cloud, was rated average.

With the next question, the testers could indicate which subset of the tool set best fits their requirements.

Figure 30: Fit of tools to testers' requirements

Due to the background of the testers, and given that media management tools like the MCSSR and annotation tools like the AAT are well-known and frequently used in the broadcast industry, it was no surprise that these tools were rated highly.

The last two questions collected free-text entries regarding additional tools or functionalities the testers would like to see as part of the PRODUCER toolset and what, from their perspective, future R&D directions for the toolset should be. The results are compiled in the following table.


Category: User Interface
- "Interface to user management standard tools; user role management; Improved metadata editing functionalities"
- "user management functionalities (or link to a user management system)"
- "improving and harmonizing user interface; add functionalities for quality control"
- "improving GUI; more seamless transitions between the individual tools"
- "GUI improvement and improved editing integration"

Category: Integration with existing infrastructures and other tools
- "Publicly documented REST API for integration into existing production infrastructures"
- "Strengthen the integration of the ABT to more Social Media Networks (e.g. Snapchat, Instagram, Pinterest etc.)"
- "improved integration with editing systems"
- "GUI improvement and improved editing integration"
- "more focus on how to integrate the prototype into broadcasters' systems infrastructure"
- "integration of the ITDT and OCDT into existing newsroom systems for broadcasters"

Category: Additional features / functionalities
- "more knowledge-based algorithms included in the AAT"
- "Improved metadata editing functionalities"
- "user management functionalities (or link to a user management system)"
- "Integration of "Augmented Reality" functionalities into the IEVCT"
- "add functionalities for quality control"
- "Add more AI / "cognitive services" algorithms into the AAT"
- "Augmented Reality"

Category: Others
- "More research on how to reach and engage my audience as a producer"

Figure 31: Improvement areas for the PRODUCER prototype collected at FKTG Symposium 2018

There is obviously room for improvement regarding user interfaces and integration into existing infrastructures, which is not surprising given that nine individual tools had to be integrated for the prototype solution. The request for an integration of the OCD and ITDT into existing newsroom systems reflects the high interest in having such tools available for the day-to-day business. Regarding additional features and functionalities, quality control features and augmented reality extensions are rated highly.

5.3.3 Subjective benchmarking by professional users at Sunny-side-of-the-DOC market 2018 by Mediaset

During the Sunny Side of the Doc (25th-28th of June 2018), MEDIASET and DOMINO presented the PRODUCER project to several international producers, sales agents and directors by means of live demos of the integrated prototype at a dedicated booth. MEDIASET also presented the project by means of an HbbTV application; the application, containing some video clips, also showed the functionalities of each tool on a TV. The aim of participating in the event was to get in contact with a high number of documentary players and to collect significant feedback from the strategic benchmarking activities performed by MEDIASET and DOMINO. The two exhibiting partners came into contact with more than 40 SMEs and professionals in the documentary production sector, presenting a "pitch" of the project. Around half of them were eager to participate in the final "benchmarking demo session" of the toolkit and give their feedback about the use of the tools in the form of a written questionnaire. The overall collected results are shown in the diagrams and tables below.

A) General questions regarding the testers profile

Figure 32: Age & gender of testers


Figure 33: Employment status and nature of current occupation of testers

Figure 34: Education and multimedia metadata experience


In summary, most of the testers were experienced professionals working in the media industry (producers, directors), and most of them had at least a basic familiarity with multimedia metadata handling, some of whom were even experienced users and experts in this domain.

B) Questions about the PRODUCER integrated prototype

Figure 35: Level of experience with editing or tools for production

As highlighted in the histogram, most of the testers had extensive experience with editing or production tools, meaning that they are perfectly in line with PRODUCER's target group.


Figure 36: Overall opinion of the software about quality, ease of use, usefulness, meets requirements, meets expectations

In summary, the software was positively evaluated. Most of the testers expressed a medium-high judgment in terms of quality, ease of use, usefulness, and meeting business requirements and expectations.

In order to have a more complete picture of the evaluators' opinions about the toolkit, and with future refined releases in mind, the evaluators were asked to explicitly provide their comments, feedback, and proposals for future improvements. The question was: "How can we improve our toolkit? Please comment on any aspect, positive or negative, that you feel would be helpful for us to know." After receiving 20 text responses, these were processed and analysed. The answers were very positive; the users showed particular satisfaction with the results obtained in such a short time at this early stage of the product (prototype) and are interested in seeing an evolution of the toolkit with a more user-friendly and "responsive" web-design interface (where possible) and with extra functionalities like audio recognition and simultaneous multi-language subtitling. The following citations are the most interesting answers:
● "The software is user-friendly for the younger people but it could be difficult to use for older people"
● "I would like PRODUCER to be able to annotate documents offline and allow editing for a more integrated productive experience"
● "I would like to have the opportunity to perform multi language subtitling"
● "PRODUCER is a good product but if it were integrated with editing and stitching tools, it would be the best"
With the next question, "Are there business scenarios where the PRODUCER toolkit is suitable for your business?", the testers could indicate which subset of the toolkit best fits their requirements. Some testers replied that they see the PRODUCER toolkit as very useful in a pre-production phase for research and trend identification activities, while for others it is very useful in a production phase: every producer dreams of having an efficient automatic annotation system to avoid paying people to perform this work manually.


To the last question posed to testers, "Which toolkit functionalities are more important for you?", the answers were very varied and can be divided into three main categories:
1) Those who consider it essential to use pre-production tools to identify trends in order to produce a successful documentary
2) Those who experienced the MCSSR with the integrated AAT and, having benefited from automatic annotation, always want to use tools with this kind of functionality
3) Those who appreciated the most sophisticated post-production tools, such as the 360VPT and IEVCT, and believe that these tools will have an important impact on the future of documentaries.

5.3.4 Subjective benchmarking by professional users at Sunny-side-of-the-DOC market 2018 by DOMINO

During the "Sunny Side of the Doc" in La Rochelle, France, in June 2018, DOMINO disseminated and benchmarked the PRODUCER project in sessions with the participation of 12 representatives from the documentary industry, such as producers, sales agents and TV commissioning editors. The benchmarking sessions were conducted in one-to-one meetings. The main focus of the benchmarking was on three PRODUCER platform tools: the ITDT, ABT and IEVCT. The aim was to get feedback from the participants in order to evaluate the importance of these three tools for the daily documentary production business and how the tools are able to change and improve existing production processes. The benchmarking was supported by case studies based on three DOMINO documentaries: "Green Gold", "Game of Truth" and "Yamal". We used these documentaries and the accompanying usage of the three tools to explain the basic principles, functionalities and usage patterns of the tools and how they are integrated overall into the PRODUCER toolkit management dashboard. The feedback was collected in questionnaires, and the overall results, including the qualitative feedback we gathered, are shown in the diagrams and tables below. By nature of the event, business-related and technical perspectives were more in focus than user and/or end-user perspectives. The following graphs show the results of the questionnaires used during the Sunny Side of the Doc 2018 evaluation & benchmarking activities conducted by us.

Demographic and general information about the participants
Twelve professionals (58% male and 42% female) participated in the benchmarking activity.


Figure 37: Gender of the testers

Age of the benchmarking participants: 42% were between 45-54 years old, 33% between 55-64 years old, 17% were between 35-44 years old and 8% older than 65.

Figure 38: Age of the testers


Figure 39: Employment status of the testers

42% of the benchmarking participants are employed, 17 % are self-employed, the rest didn’t provide information about their employment status.

Figure 40: Nature of current occupation of the testers

17% are working in business units, 67% in art units and 17% in marketing departments.


Figure 41: Education status of the testers

92% of the benchmarking participants hold a master's degree and 8% a high school degree.

Figure 42: Experience with multimedia tools

50% of the benchmarking participants have a significant familiarity with multimedia projects & tools, 42% have a basic familiarity and 8% consider themselves as experts. The following tables summarize the main qualitative feedback during the Sunny Side of the Doc 2018 evaluation & benchmarking activities conducted by us.


Producer (F) (ITA)
Comments:
- Excellent and useful tool and excellent software quality that fully meets the expectations
- The most important tools are the ABT and ITDT
Suggestions:
- Develop fast and effective tools for searching within footage
- The tools should be improved through a better overall user experience (e.g. GUI)

Director (F) (US)
Comments:
- The PRODUCER toolkit fully meets expectations and requirements
- The most important tool is the ABT
- She does not know a product with similar functionalities
Suggestions:
- It should be more user friendly (improvement of UI / UX)

Producer (M) (BEL)
Comments:
- Excellent and useful tool and great software quality
- It meets the expectations and the PRODUCER toolkit may be useful to his business
- The most relevant functionalities are the research and analysis of tags and keywords
Suggestions:
- Sound recognition development could be an added value

Producer (M) (FRA)
Comments:
- Great overall software quality, easy to use and very useful
- Until now he has the feeling that all the functions of PRODUCER are interesting and he has never seen such a toolkit. Now he does everything manually and may adopt the PRODUCER toolkit
- PRODUCER is useful in pre-production, for research and for promotion
- PRODUCER will be very helpful to centralize the information on a production and to help with pre-production research
- He does not know any product like the PRODUCER toolkit
Suggestions:
- The design (GUI) should be improved and be more user friendly

Sales Agent (F) (B)
Comments:
- Great quality of the software, very useful, and it meets the requirements and expectations
- PRODUCER could be useful in her business
- What is really important is to know how the public reacts towards specific topics in order to analyse the potential impact of a story on the public
- She does not currently know any similar products in the market
Suggestions:
- It could be easier to use
- Archiving of searches etc. should be improved
- Improvement of the projects database in order to know if it fits TV broadcasters' objectives

TV Commissioning Editor (M) (ITA)
Comments:
- PRODUCER could be used for campaign development
- He does not currently know any similar product
Suggestions:
- It should be more user friendly and the interface design should be more developed
- "Big data" analytics offered by the ITDT should be visualized in detailed and customizable visual graphics
- It would be great to offer the tools as open source software

Producer (F) (B)
Comments:
- Excellent and useful tools and excellent software quality
- It fully meets the expectations
- The PRODUCER toolkit could be useful for her business
- The important functionalities for her are the research on a topic and the analysis of the public's interest towards the topics
Suggestions:
- We should improve how the research is archived (to use it again at a later stage)
- It could be very interesting to understand what the TV broadcasters are looking for as content; therefore, the development of a database which gathers this information would be great
- Develop a database on what the TV channels are looking for

Producer (F) (SPA)
Comments:
- The PRODUCER toolkit may be used in the development and pre-production phases
- She does not know any product like the PRODUCER toolkit
Suggestions:
- It should be more user friendly and visually more appealing
- Ease of use should be improved

Producer (M) (FRA)
Comments:
- He considers the toolkit very useful but it currently doesn't meet all the expectations yet
- He does not know any product like the PRODUCER toolkit
- PRODUCER could be used by him to evaluate the potential of a possible project
- The PRODUCER toolkit promises a lot of time savings
- The most important tool is the ITDT
Suggestions:
- It should be more user friendly
- Real-time feedback from social media campaigns could be very useful

Director (M) (B)
Comments:
- He considers the PRODUCER toolkit very useful
- He does not know any product like the PRODUCER toolkit
- The IEVCT seems to be an effective tool that allows the viewer to easily locate objects / interactive elements in a 360° video
Suggestions:
- The IEVCT is a great tool but could be more user friendly, and the interface design should be more developed
- This tool should almost always be integrated into a 360° film
- For the hyperlinks on the right, the idea is good but I think the video should stop (pause) at the moment of the click, or maybe slow down
- All these tools should be adapted to "responsive design patterns" in order to use them on mobile phones, tablets and computer screens

Table 27: Consolidated feedback from "Sunny Side of the Doc"


5.3.5 Subjective benchmarking by students for part of the tool set (ICCS)

5.3.5.1 Automatic Annotation tool

Towards the end of the project, a questionnaire-based evaluation with end users and professional users was conducted and is still ongoing. This benchmarking was supported by a video explaining the basic setup of the toolset and the management dashboard, and by a web-based implementation of the questionnaires on Google Forms. For the professional users, guided interviews supported by the questionnaires were performed. By now, 170 end users and professional users have contributed to this still ongoing activity. The following figures provide an overview of the benchmarking results achieved so far. Additional feedback was provided as textual comments (not shown in the figures below), giving specific input on how individual aspects of the different tools can be improved further. The main points of these remarks and recommendations are compiled in the short textual summaries for each tool within this chapter.

Figure 43: Age & gender of AAT testers


Figure 44: General demographic information about AAT testers


Figure 45: Feedback for AAT

We can see that our benchmarkers come from different kinds of occupations (mostly engineers) but have little familiarity with annotation tools. As a result, we can assume that the benchmarkers represent a more sophisticated user than the average one: they can easily understand the purpose of an automatic annotation process and evaluate its usefulness. Moreover, as professionals, the benchmarkers can provide critical information about the role of the Automatic Annotation tool in the market domain. Consequently, they can provide us with helpful and very informative results. Users with little familiarity with annotation tools vote positively about the accuracy of the tool. Looking deeper, the benchmarkers do not have the scientific background to evaluate the performance and built-in accuracy of the annotation algorithms; what they can evaluate instead is the presented results and how suitable they are for a user. We can state that the results of the automatic annotation process are satisfactory for the user.


Figure 46: Feedback for AAT functionalities

At first sight we understand that users are more interested in the process of annotation than in its raw performance. Experienced professionals understand that annotation is generally a heavy process, so the production of asynchronous results is acceptable. The user interface for automatic annotation offered by the Multimedia Content Search Storage Retrieval tool is friendly and easy to use. All in all, this led to a general satisfaction of the users with the tool. However, we should not overlook the wish for more customization of the overall process. This can be read as a need of users who, despite their inexperience with metadata tools, are satisfied and well served by the results of the process and want to explore more functionalities and realize the whole spectrum of capabilities that automatic annotation offers.
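As an illustration of why asynchronous processing is acceptable here, the following minimal Python sketch submits annotation jobs in the background and collects results as they complete. It is purely illustrative: the annotate_video function, the file names and the returned fields are assumptions, not the actual MCSSR/AAT interface.

import concurrent.futures
import time

def annotate_video(path):
    # Stand-in for a heavy annotation job (face/object detection, transcription);
    # a real job on full-length content would run for minutes rather than seconds.
    time.sleep(2)
    return {"video": path, "labels": ["person", "car"], "transcript": "..."}

# Submit jobs asynchronously so the user interface stays responsive; results are
# collected whenever they become ready, mirroring the asynchronous behaviour
# that the benchmarkers found acceptable.
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    jobs = {pool.submit(annotate_video, p): p for p in ["clip1.mp4", "clip2.mp4"]}
    for done in concurrent.futures.as_completed(jobs):
        print(jobs[done], "->", done.result()["labels"])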


Figure 47: Expectations for AAT

To summarize, all the results extracted from the individual graphs are reflected in these last graphs. The Automatic Annotation tool is considered useful for a user of PRODUCER: it helps them “describe” and, moreover, index a video with a set of words. Considering the textual feedback provided, we can extract critical information for future development. Users did not know of many similar products; most of them mentioned Youtube subtitles and Facebook's face detection, and some of them knew about Google's Speech-to-Text API. The most important functionalities for the users seem to be face detection along with object detection. In fact, these functionalities were referenced by the users two (2) times more often than transcription and face recognition; the latter are seen as helpful for specific purposes, such as watching a movie or identifying whether a famous person appears within the video. The tool is also adaptable to quite a few business scenarios. Most of them deal with face/people detection/recognition, but by far the most interesting is the one referring to traffic control. Below we present some interesting and feasible business scenarios stated by the users.


User Quotes

“great tool for traffic control” “Automating the operation of the building's parking facilities”

“Yes. It will be useful for employees who clock in the morning and clock out in the evening. Detection of their faces will save some time and errors. “

“Count trees & plants, wells and drillings, well lids, uncontrolled dump in aero-photographs or videos”

“For translating and annotating training videos so that it can be accessible to different language speaking employees. (2)”

“Yes, we would like to identify inappropriate content in videos based on bad language used or specific objects or faces.”

“Yes it could be used to identify the students that are in the class eliminating the need for an attendance book. Another case would be to detect the emotions of the students in order to find if the course is boring or too difficult to understand.”

“These Multimedia Automatic Annotation Tools could be used in order to recognise people who were attended an event from a video or a photo. This could be helpful for a journalism company”

“Yes, internal monitoring systems could be used the AAT in order to identify object and persons (security reasons, process optimization, etc) “

“ATM face recognition to prevent thieves from using stolen credit - debit cards.”

“Business scenarios that use similar techniques/algorithms: Land cover classification, multitemporal change detection. Feature recognition (e.g. buildings from an aerial image).”

“Surveillance cameras in front of banks for face recognition of known people with criminal record or weapon recognition to automatically block doors and alert the security guards, automatic

“For my family business (football academies facilities) could be suitable for analyzing videos, where could recognize different teams in order to make a strategy plan or analyzing tactics issues.”

“Useful for all businesses. Could be very useful for home videos and also enhanced monitoring of future ambient intelligence home entrances, able to produce reports for residents regarding deliveries,mailman, milkman, advertisers, family, pets, stray animals, a specific person of interest etc.”

Table 28: General quotes for AAT
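Since face and object detection underpin most of the scenarios quoted in Table 28, the following minimal Python sketch shows the class of technique involved, using OpenCV's bundled Haar cascade face detector. It is not the AAT implementation; the image file name is an assumption and the example only marks detected faces on a single frame.

import cv2

# Load OpenCV's bundled frontal-face Haar cascade (a Viola-Jones style detector).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("frame.jpg")                  # hypothetical extracted video frame
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Each detection is an (x, y, w, h) bounding box in pixel coordinates.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("frame_annotated.jpg", image)
print(f"{len(faces)} face(s) detected")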

Finally, an interesting point is that, for future R&D, users believe effort should be put into training the object detector with a wider database; this improvement may increase the importance of the object detection process. For example, important objects could be more specific information about cars (such as the car model, the colour or the licence plate), information about buildings and known monuments, animals, and electronic devices like smartphones, laptops, etc. Lastly, users submitted a request for the tool to recognize the gender and the age of people.


In general, the whole procedure is useful for the users, helping them put semantics on videos to make them easy to search and to correlate with other content.

B) Guided users

For the guided part of the evaluation, users were asked to watch the video showing the functionalities and were then asked to use the tool by themselves. The difference was that the actual purpose of the tool was communicated in person, so users understood better the purpose and the capabilities of the tool. As a result, they could actually evaluate the annotation process. For this part, 17 people participated as testers, mainly PhD candidates from the National Technical University of Athens. This specific pool of users is more experienced with metadata annotation tools and can thus provide a more precise evaluation of the system.

We immediately see a general satisfaction with the performance of the tool, paired with a confirmation of the usefulness of the whole annotation process. This overall satisfaction is quite a hopeful result, because these benchmarkers had the opportunity to practice on real multimedia content, which was processed asynchronously by the Automatic Annotation tool to produce textual information.


Finally, apart from the demonstrated usefulness of the tool, it is quite important to take into consideration the categories of objects that the hands-on benchmarkers consider important. As with the unguided users, vehicles and their specific details (plate, colour, model etc.) and animals are important, whereas landscapes - including well-known buildings (monuments) - and face characteristics (nose, eyes, shape of face, etc.) seem to be crucial as well. Detection of people is by far the most important task for users. These results are similar to the ones provided by the unguided users.

5.3.5.2 Integrated Trends Discovery Tool

This tool was evaluated by numerous individuals, mainly students from the National Technical University of Athens, with which ICCS is affiliated. The students were mainly coming from the Techno-Economics Masters program, jointly offered by the Department of Industrial Management and Technology at the University of Piraeus and the National Technical University of Athens, which is a highly interdisciplinary graduate programme targeted at professionals with existing market/business/working experience. The evaluation process included the following steps:

a) A document describing the core concepts of the PRODUCER project and the core innovations of the ITD tool was initially shared with the testers. The document is available at the following link: https://drive.google.com/drive/folders/1Kndvt6mmjP56tJtG97n5r5dH2mdgI-Cy

b) After reading the document, the testers watched a 10-minute video demonstrating the utilisation of the ITD tool. The video contained textual information about the internal mechanisms that contribute to generating the visualised outcome at the front end of the tool. The video is available at this link: https://drive.google.com/open?id=1Pu7iTySbb_NrAGa9qlcoVtriRDrU49ag

c) Finally, the testers answered an online Google Forms based questionnaire. The questionnaire is available at: https://docs.google.com/forms/d/e/1FAIpQLSfkjQPiQbyOxI2iCj2wTzmT8V2Ilee-s_eLyg8h3n_696vWBg/viewform


This process was completed by 157 individuals. In addition, another group of 20 individuals, after following steps a) and b), was asked to access a live version of the tool and to freely try the various functionalities; they then proceeded to step c) and answered the same questionnaire. As presented in figure 48, the ITD tool testers are mainly young persons (18-34 years old), students and/or full-time employees. The nature of their current (or most recent, or targeted if not employed) occupation is mainly related to engineering, IT and business/finance (figures 49 and 50).

Figure 48: Gender, age & education of testers

Figure 49: Occupation of testers


Figure 50: Nature of occupation

All testers are familiar with the concept of social media services, as they have been utilising them for a long time period (more than five years) and for 1 to 4 hours per day (figure 51). In addition, most testers are highly interconnected with other users - having more than 100 connections - and seem to prefer Facebook, LinkedIn, Google, Instagram and Twitter (figure 52).

Figure 51: Experience with Social Media services and hours of utilisation per day


Figure 52: Number of connections and preferred social media service

Testers were asked about their purpose in utilising social media services; their replies are presented in figure 53. Replies such as “To get opinions”, “To find information” and “To share your experience” concentrate a significant amount of answers, which is important because these views support the core objectives of the ITD tool. The core concept of the ITD tool is that it is possible to gain information about the population's opinions and interests through mining social media and search engine services.

Figure 53: Why do you use online social networks?

In a similar manner, when the testers were directly asked “To which extent your interactions with social media are contributing in formulating your opinion for various societal issues?”, they replied that they are only slightly affected (figure 54). On the other hand, most testers consider that social media analytics can support the extraction of information regarding public opinion, similar to the information extracted via opinion polls by survey companies (figure 55).


Figure 54: “To which extent your interactions with social media are contributing in formulating your opinion for various societal issues?”

Figure 55: “Do you think that Social Media analytics can support the extraction of information regarding public opinion (similar to the information extracted via opinion polls by survey companies)?”

The next question was about the testers’ experience in using similar tools (figure 56).

Figure 56: What is your level of experience in using tools that attempt to discover and process popularity/trends in Social Media and Search Engines

The final question concerned the ethical implications of social media opinion mining. The actual question was: “The Integrated Trends Discovery Tool processes data that are freely available on the Internet but originate from users' posts and searches. Do you consider that any ethical issues arise in this data aggregation process? Which of the following covers your opinion the most?”. The results illustrated in figure 57 show that most of the testers do not see any ethical issues, but a significant number of replies consider that such issues do exist.

Figure 57: Ethical issues on social media opinion mining

The next set of questions targeted the tool's utilisation and underlying functionality directly. The first question was about how easy it was for the testers to manage “Query Descriptions”. In order to create a new query process, users need to add the necessary information, e.g. textual descriptions, targeted keywords, time range and targeted regions, and to provide parameters about the inference of higher-level information. The replies about the ease of creating a new query process are presented in figure 58, while the replies about the ease of managing existing queries in general are presented in figure 59. Testers' replies are based on a scale from 1 to 5, where 1 corresponds to “Very difficult” and 5 to “Very easy / intuitive”.
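To make the query-creation step more concrete, a query description could be represented roughly as in the sketch below. The field names and values are illustrative assumptions only and do not reflect the ITD tool's actual schema.

import json

# Hypothetical "Query Description" payload; all field names are illustrative.
query_description = {
    "title": "Public interest in urban beekeeping",
    "keywords": ["urban beekeeping", "city bees"],
    "time_range": {"from": "2018-01-01", "to": "2018-06-30"},
    "regions": ["GR", "IT", "DE"],
    "sources": ["google_trends", "twitter"],
    "inference": {                      # higher-level information to infer
        "audience_demographics": True,
        "sentiment": True,
    },
}

print(json.dumps(query_description, indent=2))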

Figure 58: Ease of creating a new query at the "Add Query Parameters" page of the tool?


Figure 59: Ease of managing existing "Query Descriptions" page

Based on a “Query Description”, the user is able to initiate a trends discovery process. The evaluators' replies about how easy it was for them to trigger this process and to use the respective functionality for extracting result reports are presented in figure 60.

Figure 60: Ease of producing results / reports

The results in figures 59 and 60 reveal that the query configuration process was characterised as easy and/or very easy by the majority of the evaluators. The next question was about the ease of reading and understanding the results. Given that the rendered results are the outcome of integrating diverse statistical models derived from external APIs utilising heterogeneous data models, this task was one of the most challenging. Within the lifetime of the PRODUCER project we followed various iterations of design, evaluation and refinement of the way the trend discovery results are presented to the end user. For this reason, various intuitive graphs (time series graphs, bar charts, pie charts, node graphs) are utilised in order to make the results comprehensible to users that do not have a background in statistics or in data engineering. The outcome of this evaluation is presented in figure 61, and most of the tool evaluators find the results reading process relatively easy.
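The sketch below shows, with matplotlib, the kind of intuitive report views mentioned above, combining a time series of interest and a bar chart of regional shares; the data are invented for the example and do not come from the ITD tool.

import matplotlib.pyplot as plt
import numpy as np

days = np.arange(1, 31)                       # toy time axis (days)
interest = 50 + 20 * np.sin(days / 5.0)       # invented interest-over-time values
regions = ["GR", "IT", "DE", "FR"]
region_share = [35, 25, 22, 18]               # invented regional shares (%)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(days, interest)
ax1.set_title("Interest over time")
ax1.set_xlabel("Day")
ax1.set_ylabel("Relative interest")

ax2.bar(regions, region_share)
ax2.set_title("Share of queries per region")
ax2.set_ylabel("%")

fig.tight_layout()
fig.savefig("trend_report.png")               # report-style output for the end user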


Figure 61: Ease of reading the results.

The last question related to user interaction was the general one: “How user-friendly is the Integrated Trends Discovery Tool?”. The respective results are presented in figure 62.

Figure 62: How user-friendly is the Integrated Trends Discovery Tool?

As already described, at the first steps of the overall process the evaluators had to read a textual description of the ITD tool objectives, which were also presented in the first minutes of the video describing the tool's utilisation. Based on the presented list of innovations, and after the demonstration and actual utilisation of the tool, the evaluators replied to two different questions with the same target. The questions were: “How successful is the Integrated Trends Discovery Tool in performing its intended tasks?” and “Meets expectations as these are defined in the innovations list presented upon video start”. The results are presented in figures 63 and 64.


Figure 63: How successful is the Integrated Trends Discovery Tool in performing its intended tasks

Figure 64: Meets expectations as these are defined in the innovations list presented upon video start

The last question regarding the actual evaluation of the tool was related to the overall software quality, as this is disclosed through the execution of various tasks. As this question is difficult to answer for evaluators with a non-technical background, it was considered optional and hence was not answered by the whole set of testers. The respective results are illustrated in figure 65.

Figure 65: Evaluate overall software quality

In order to have a more complete picture of the evaluators' opinions about the tool, and targeting future refined releases of the ITD tool, the evaluators were asked to explicitly provide their comments, feedback and proposals for future improvements. The question was: “How can we improve our Tool? Please comment on any aspect, positive or negative, that you feel would be helpful for us to know.”


After processing 177 textual responses, we were able to group them into the categories described in Table 29. Replies stating that the tool is perfect and that no improvements are necessary are not included in this table.

Category: Addition of more data sources
Indicative responses: “You can add more social networks on Information Sources like Facebook/Instagram or Linkedin.” “To get information from more resources than Facebook, Google and Twitter...” “Get information from more resources.” “Business users may have their own private social network(s). You could offer them a (secure) way to include them in searches.” “I think that it would be better to retrieve information from more platforms such youtube by analyzing a video's content and comparing it with the reactions of its viewers.”

Category: Make the User Interface more friendly
Indicative responses: “it could be more user friendly” “I think your tool is very good, super--friendly and easy to use. However, the interface is a little boring.” “The environment of the tool is not very attractive.” “A more contemporary appearance would make the tool more desirable.” “more user-friendly interface, faster execution time” “if the software is for sale it has to be more user friendly.” “I would recommend further explanation on the trends discovery results.”

Category: Reduce duration of Trends discovery process
Indicative responses: “It will be appreciated if query description phase take less than one minute.” “I think that time needed for queries is too much. I understand that it may sounds unfair for the developer and the query process is time-demanding but from user's point of view continuous searching would be time-consuming and irritating.” “Reducing the time of search (easily work with massive amount of data)” “collect data faster” “I think that Integrated Trends Discovery Tool is really interesting and useful but the only drawback according to my opinion is that spends a lot of time in the phase where it retrieves data from external services.”

Category: Additional functionalities
Indicative responses: “You could maybe add more criteria in order to achieve better results. A great example would be to add an option for only negative or only positive opinions about the research topic and then study those two results independently.” “The ITD Tool can be improved by analysing the researches so as to estimate the age of the gender.” “It would be useful if news articles regarding the search topic were linked, for the periods of time with the most searches.”

Table 29: Main categories of replies to the question "How can we improve our Tool?"


The next question was “Are there use case scenarios, where Integrated Trends Discovery Tools are suitable for your business? Please explain.”. Table 30 summarises the most significant/popular outcomes out of 177 responses.

Business category: Marketing
Indicative responses: “Demand and supply in the paper trade market” “My company could use Trends Discovery Tool's data to take a decision for promoting a new product or service according to the trends in each country.” “I am working on oil and gas industry, so a use case scenario for which economies are most heavily reliant on oil?” “I think it would be great for advertising company's capabilities at the right time for the audience to listen.” “My business is B2B and I think that this is more suitable for B2C companies, retailers etc. Yet, it could help in analyzing market trends and help businesses to built their strategy and invest in new products and features upon these results, that can be founded and analyzed through research, but with this tool it would be faster.” “As I work for a construction company, the ITD tool could be used for utility preferences regarding a future project in HR Department. The company will know more about the workers” “I do not think that there is a scenario that Integrated Trends Discovery Tool is necessary but if it was free of charge I may use it in some occasions. For example if there is a reduction in sales of a certain product you can use it to see the trends of interest generally in the products of this industry.” “Supposing that I want to sell a service or a product this tool can give me valuable information about the region and the time period this product will have better potentiality.”

Business category: Education
Indicative responses: “Yes they could be used to find the opinion of the students about the quality of the Master degree program or even about specific courses of the program.” “Yes, we are interested in trends of interest for the Greek society in order to produce focused teaching material for students and information videos for citizens.”

Business category: Personal
Indicative responses: “Yes, because I can find the information that i am interested in about engineering.”

Business category: Pharmaceutical
Indicative responses: “In theory, many data are stored in the pharmaceutical companies' databases. Maybe a demographic search regarding the need of a special medicine would be an interesting scenario to explore”

Business category: Tourism
Indicative responses: “Yes. Every business need to know about opinion trends and preferences of people. I work in tourism industry (software) and it would be very helpful to know what people seek and write about our fields of interest.”

Business category: Energy
Indicative responses: “The company that i am working at the moment is an electricity supplier. This tool may be helpful to indicate the target group that is more willing in changing from PPC to an alternative supplier.”


Business category: Telecommunications
Indicative responses: “yes it could be used for gathering information about the mobile signal and the telecommunication problems in areas of the greek prefecture.”

Business category: Policy making
Indicative responses: “As I am employed in the banking industry, the public opinion for the bank itself or a specific marketing campaign in social networks could be extracted pretty easily by this tool using keywords that are uniquely connected with the specific company's profile or campaign.” “I consider that this tool could be useful to many companies because they could get some really useful extracts, such as statistics, charts and pivot tables they could also discover their influence in society and be used by management for policy making.”

Table 30: Are there use case scenarios, where Integrated Trends Discovery Tools are suitable for your business? Please explain

The next question was “Which should be future R&D directions towards improving the Integrated Trends Discovery Tool?”. Table 31 summarises the most significant/popular outcomes out of 105 responses.

Category: Audience analytics
Indicative responses: “My opinion is that Integrated Trends Discovery Tool should improve more the part of analysis of audience's characteristics.” “Moreover, as I was watching the video, and show the part where you present users from Twitter related to the subject, I would like to have extra information, apart from likes or shares, how many and who are the people that are influenced from these users of your list. and maybe add a "broccoli chart" that would reveal the most powerful users per subject. Last, I would try to find more connections apart from Twitter and Google in order to increase my base. Apart from data from social platforms I would also think of using my future clients' database through a connection with their systems and pull their data in my tool.” “In general, ITDT is an excellent idea and I believe that you can continue to develop this tool as a final product beyond the project. A good idea will be the collaboration with schools of journalism and social communication in order to help you develop a more complete product.” “The fact that ITD tool uses innovations such as inference of audience characteristics, discovery and identification of specific real-life events related with the investigated topic make it really appealing but I think that some more diagrams can be added for more information and analysis.” “With a further analysis of the keywords or trends, it will be useful to approach the profession of the user (e.g. job, studies).” “I think the future of these tools is the improvement of the way the system collect data about the user profile and the AI algorithms which make the process of these data in order to make more accurate recommendations” “age and education graphs would improve it”


Category: Increase tool’s efficiency
Indicative responses: “creation of more user-friendly environment and reduction of execution time” “The R&D should integrate big data and try to adapt real time data for many search and social platforms in order to deliver more accurate and updated data.” “More descriptive analytic tools (box plots, bar diagrams, etc)” “I believe that multidimensional data analysis would radically improve the existent tool. This will enable users to view the same data in different ways using multiple dimensions. For example, imagine a cube the front view shows interest vs time if you rotate the cube 90 degrees keyword volume vs time etc. This type of tools is more appealing for users.” “More social networks on Information Sources. Faster results. Maybe combine two or more keywords in one query and provide comparative/relative results.” “Caching predefined searches and results of the most recent news and topics that could be suggested to the user.”

Category: Inference of high level information
Indicative responses: “It would be very helpful that the tool could be able to suggest best decisions and practices about how to use data available. It could monitor historical data and actions and using machine learning could measure the impact and efficiency of each decision, in order to indicate those with the best results.” “Try to explain the spikes in the Time graph. Add news headlines of that day or the previous one” “I think 2 directions: 1) improving user interface, 2) allowing the choice of APIs to use, public or paid. Also, if possible add tool-set per trend category (e.g. politics, sports, research, economics). For example on politics I would like to know how far right or far left a trend is.” “Using text recognition to discover hoaxes or fake news.”

Table 31: Which should be future R&D directions towards improving the Integrated Trends Discovery Tool?

The next question was “Do you know of any products or services with features similar to the ones of the Integrated Trends Discovery Tool?”. Table 32 summarises the most significant/popular outcomes.

Category: None
Comment: More than 100 evaluators responded that they are not aware of similar tools.

Category: Google Analytics
Comment: Some users responded that ITD reminds them of Google Analytics. The main difference between ITD and Google Analytics is that the latter analyses data referring to a specific website; in addition, only the owner/administrator of the website has access to this information.

Category: Facebook Analytics, Facebook Groups Insights
Comment: Some users responded that ITD reminds them of Facebook Analytics. The main difference between ITD and Facebook Analytics is that the latter targets only specific Facebook pages and provides analytics services only to the owner/administrator of the page.

Category: Twitter trending
Comment: Some users responded that ITD reminds them of Twitter trending analytics. Twitter trending analytics act as a source and are part of the ITD tool's information sources.


Category: Google Trends
Comment: Some users responded that ITD reminds them of Google Trends. Google Trends analytics act as a source and are part of the ITD tool's information sources.

Table 32: Do you know of any products or services with features similar to the ones of the Integrated Trends Discovery Tool?
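Because Google Trends acts as one of the ITD tool's information sources (Table 32), the sketch below shows how such source data can be pulled programmatically. It assumes the unofficial third-party pytrends client and is not part of the PRODUCER code base; keyword, region and timeframe are arbitrary examples.

# Minimal Google Trends pull via the unofficial pytrends client (pip install pytrends).
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US", tz=120)
pytrends.build_payload(kw_list=["documentary"], timeframe="today 3-m", geo="GR")

interest = pytrends.interest_over_time()   # pandas DataFrame indexed by date
print(interest.tail())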

The ITD tool developers aim to continue refining the service and to extend the provided functionalities. To this end, the evaluators were asked which of the provided reports are the most useful to them. The responses are illustrated in figure 66.

Figure 66: The Integrated Trends Discovery Tool provides various reports. Which are the more useful for you?

Finally, the evaluators were asked: “The Integrated Trends Discovery Tool currently utilises mainly the free versions of public APIs (e.g., Google API, Twitter API, ...). Hence there are often delays and matters related to limited access to data. Do you believe that a company interested in the tool's results would be willing to purchase more advanced services (e.g., more detailed user demographics, data from larger user populations, data that span longer to the past) for an additional fee? If so, which of the following amounts do you consider as appropriate for the needs of a small company?”. The outcome of the 177 responses is illustrated in figure 67.

Figure 67: Estimation of cost in order to utilise ITD tool in business environment.


As stated at the beginning of this section, 20 out of the 177 evaluators had the opportunity to utilise the ITD tool hands-on. The questionnaire responses they gave afterwards are not particularly different from the overall outcomes presented in the previous graphs and tables. However, their opinion with regard to the utilisation of the tool and the quality of the underlying software services is of particular interest for the evaluation process. Their responses are illustrated in the following figures.


5.3.5.3 Social Recommendation and Personalization tool

A) Unguided

For the unguided part of the evaluation of the SRP tool, 143 students from the National Technical University of Athens, with which ICCS is affiliated, were asked to participate. The students mainly came from the Techno-Economics Masters program, jointly offered by the Department of Industrial Management and Technology at the University of Piraeus and the National Technical University of Athens, a highly interdisciplinary graduate programme targeted at professionals with existing market/business/working experience, as can be seen in the following figures.

Figure 68: a) Education level b) Current Occupation

A short video showing the functionalities of the tool and the expected interaction was shown to the students; unlike with the other tools, they were expected to use the tool on their own via its standalone GUI. After exploring the tool and using it until they were satisfied that they had formed an opinion of its capabilities, they were asked to respond to the corresponding questionnaire.


The experience with recommender systems of the users that participated in the process is shown in the following figure, confirming that a reasonable user diversity was achieved.

Figure 69: Level of experience with Social Recommendation and Personalization Tools (1: no experience, 5: much experience)

The user is asked to create an account on the tool, entering information in order to create a basic profile. The information required consists of certain demographics (age, country, etc.) and some personal information (name, email, etc.), as well as a username and a password. The information explicitly required from the users is limited, as can be confirmed by their responses.

Figure 70: a) Difficulty of adding data to the system (1: very difficult, 5: very easy) b) Were the data needed by the system too much?

After the user creates an account, he can continue to explore the actual functionalities of the tool. By clicking on the “Videos” tab, he is presented with two options. On the one hand, he can see the recommended videos that the tool suggests based on the profile it has created so far; in the beginning, the profile is created based on the demographics chosen by the user, so that content relevant to similar users is presented. On the other hand, a search functionality is available, where the user can search the SRP tool's database of more than 2600 videos by providing text relevant to what he is searching for. The concept is to use the search functionality together with the recommended videos; based on the user's interactions with the videos, the tool should be able to deduce the user's profile and suggest videos relevant to his interests. After some iterations of using the tool, the users were asked to rate the relevance of the recommended content, and the user's inferred interest in each of the 14 categories is presented. The results of the procedure can be seen in the following figures.
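The profile-deduction loop described above can be sketched as follows; the 14-dimensional category vectors, the blending update and the cosine-similarity ranking are illustrative assumptions rather than the SRP tool's actual algorithm.

import numpy as np

N_CATEGORIES = 14   # interest categories presented to the user

def update_profile(profile, watched_video, alpha=0.2):
    # Blend the current interest profile with the category vector of a watched video.
    return (1 - alpha) * profile + alpha * watched_video

def recommend(profile, catalogue, top_k=5):
    # Rank catalogue videos by cosine similarity to the user's interest profile.
    sims = catalogue @ profile / (
        np.linalg.norm(catalogue, axis=1) * np.linalg.norm(profile) + 1e-9)
    return np.argsort(-sims)[:top_k]

rng = np.random.default_rng(0)
profile = rng.random(N_CATEGORIES)             # initial profile seeded from demographics
catalogue = rng.random((2600, N_CATEGORIES))   # per-video category weights (toy data)

profile = update_profile(profile, catalogue[42])   # the user watched video 42
print("recommended video ids:", recommend(profile, catalogue))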

Figure 71: Matching of the generated with the expected user's profile (1: unacceptable, 5: excellent)

Figure 72: Matching of the recommended videos to the user's expectations (1: unacceptable, 5: excellent)


In both figures above, we see that the majority of the users rate the tool's performance as satisfactory: 39% rated the match between the profile generated by the tool and the one they had in mind while using the tool with 3 stars, and 38% rated it with 4 stars. The figure about the matching of the recommended videos to the user's expectations again shows that the majority was satisfied, with 39% giving 3 stars and 36% giving 4 stars. It is important to note that, in many cases, the actual content of the videos was rated by the users, which is not relevant to the functionality of the tool, so there could be some misinterpretation of the actual question. The limited availability of content could also play an important role in the results of the above questions.

When asked about the overall Quality of Experience they had while using the tool, the majority (49%) rated the system with 4 or 5 stars, stating that the Quality of Experience was more than satisfactory.

Figure 73: Overall Quality of Experience (1: unacceptable, 5: excellent)

One very interesting result coming from the questionnaires is the importance the users attribute to such recommendation systems on a documentary content provider platform such as the PRODUCER platform. According to the graph, the Social Recommendation and Personalization tool provides a highly appreciated feature of the platform that clearly increases the user's Quality of Experience, while helping him complete tasks faster and more efficiently.


Figure 74: Importance of recommendations on a) videos b) enrichments (1: not essential, 5: absolutely essential)

Finally, users were asked about the relation they expect between the video content and the enrichments that are recommended to the user by the tool. As we can see from the figure, the majority responded that they would like a balance between relevance to the video content and relevance to the user profile, which shows that they are open to recommendations that are more loosely tied to the content itself.

Figure 75: Preferred relation of enrichments to the video content (1: Tightly related to video content, 5: Tightly related to user profile)

Recommending something slightly out of context, as long as it is of interest to the user, seems to be an option that opens some interesting research topics for future exploration. Adding the capability to tune that relation based on the user's actions or the nature of the content could be appropriate.
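One way to expose such a tuning knob is a single blending weight between relevance to the video content and relevance to the user profile; the formulation below is a hypothetical sketch, not the tool's implementation.

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def score_enrichment(enrichment, video, profile, beta=0.5):
    # beta = 1.0 ranks enrichments purely by relevance to the video content,
    # beta = 0.0 purely by relevance to the user's profile; values in between blend both.
    return beta * cosine(enrichment, video) + (1 - beta) * cosine(enrichment, profile)

rng = np.random.default_rng(1)
video, profile = rng.random(14), rng.random(14)     # toy 14-category vectors
enrichments = rng.random((10, 14))

ranking = sorted(range(10),
                 key=lambda i: -score_enrichment(enrichments[i], video, profile, beta=0.5))
print("enrichment ranking:", ranking)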

B) Guided

For the guided part of the evaluation of the SRP tool, a similar approach to the unguided test was followed. The users were asked to watch the video showing the functionalities and were then asked to use the tool by themselves. The difference was that the actual purpose of the tool was communicated in person, so that we avoided misinterpretation of what should be evaluated (the recommendation system) and what not (e.g. the actual content of the videos or the search mechanism), and any questions arising could be answered on the spot. For this part, 17 people participated as testers, mainly PhD candidates from the National Technical University of Athens. Being PhD candidates in the field of Computer Science, these users were slightly more experienced with Social Recommendation and Personalization tools than the unguided pool of users, and thus better placed to evaluate the system.

Figure 76: Level of experience with Social Recommendation & Personalization Tools (1: No experience, 5: Much experience)

From the answers provided, we can see that there is agreement between the guided and unguided test users on the difficulty of adding data to the system and on the necessity of the required data. The following figures express that opinion. Notice that in the second figure, no 1- or 2-star responses were given, indicating the ease of adding the respective data.

Figure 77: a) Difficulty of adding data to the system (1: very difficult, 5: very easy) b) Were the data needed by the system too much?


Since the users were guided throughout the process, with a clear indication of what was to be evaluated and what not while using the tool, the responses on the matching of the videos and of the profile to the expectations are more skewed towards the higher ranks, with most of the users giving 4 stars on both questions. More specifically, 47% rated the matching of the generated profile to their actual profile with 4 stars, and 65% rated the matching of the recommended videos to their expectations with 4 stars. The overall Quality of Experience was also rated as more than satisfactory, since almost all users rated it with 4 or 5 stars. It is important to notice that no users rated the tool with 1 or 2 stars, so those ratings are omitted from the figures.

Figure 78: a) Matching of the generated with the expected user's profile b) Matching of the recommended videos to the user's expectations c) Overall Quality of Experience (1: unacceptable, 5: excellent)

Results on the rest of the questions are similar to the ones provided by the unguided test users and are provided here for the sake of completeness.


Figure 79: Importance of recommendation on a) videos b) enrichments c) preferred relation of enrichments to the video content

