Dynamic Story Shadows and Sounds
António João Martins Leonardo
Dissertation for obtaining the Master's Degree in
Information Systems and Computer Engineering
Jury
President: Prof. Paulo Jorge Pires Ferreira
Supervisor: Prof.ª Ana Maria Severino Almeida e Paiva
Members: Prof. David Manuel Martins de Matos
June 2009
Abstract
English
Sound and Music have always surrounded those lucky enough to be aware of their existence. The world would not be complete without them. Besides its entertainment value, music can also be used as a way of complementing and adding information to a story.
In the context of storytelling environments this is especially useful, since we want the viewers not only to enjoy watching the story but also to understand it as well as possible. The task of scoring a story becomes more challenging when the story is being created in real time, since there is no time to prepare and adapt the music before the story is shown to the viewers.
This work consists of creating a solution that answers the challenge of using music and sounds to score a story being created in real time in a storytelling environment. We use pre-composed music and sounds to score the story, and we change musical parameters according to changes happening in the story. The characters present in the scene, their actions and the emotions they feel are some of the features this work relies on to make those changes.
The experiments validated the proposed solution as adding entertainment and understanding to a story, making it more complete and enjoyable to watch.
Keywords: Music, Real-time scoring, Storytelling environment, Emotions, Entertainment, Understanding
Portuguese
Sound and Music have always been a constant presence in the lives of those lucky enough to be able to hear them. The world would not be complete without them. Besides the entertainment that music provides, it can also be used as a way of complementing and giving more information about a story.
In the context of storytelling environments this is especially useful, since it is desirable not only that the story be pleasant to watch, but also that what we are seeing be understood as well as possible. The task of creating a musical accompaniment for a story becomes more challenging when the story is being created in real time, since there is no time to prepare and adapt the music and sounds to be applied to it before it is shown to the audience.
This work consists of creating a solution that answers the challenge of using music and sounds to accompany a story being created in real time in a storytelling environment. Pre-composed music and sounds are used to accompany that story, and musical parameters are changed according to changes that arise in it. The characters present in the scene, their actions and the emotions they feel are some of the features on which this work relies to make these changes.
The results validated the proposed solution as adding entertainment and understanding to a story, making it more complete and enjoyable to watch.
Keywords: Music, Real-time musical accompaniment, Storytelling environments, Emotions, Entertainment, Understanding
Contents
1. Introduction
1.1 Motivation
1.2 Objectives
1.3 Outline
2. Storytelling environments
2.1. A Generic Storytelling Environment
2.2. Valence and Arousal
2.3. Story Development
2.4. Related Storytelling Environments
2.5. Summary
3. Background: Music, Emotions and Film Scoring
3.1. Music and Emotions
3.2. Film Scoring
3.3. Summary
4. Related Work
4.1. Marc Downie’s work
4.2. Roberto Bresin’s work
4.3. Pietro Casella’s work
4.4. Andrew de Quincey’s work
4.5. Nakamura’s work
4.6. Alex Brendt’s work
4.7. David Cope’s work
4.8. Summary
5. The sound of D3S
5.1. Understanding and enjoying a story
5.2. Event sounds and background music
5.3. The Artistic task
5.4. Adding dynamism to music and sounds
5.5. Summary
6. Architecture and Integration
6.1. I-Sounds
6.2. Apollo – D3S Music Manager
6.3. Integration with the storytelling systems
6.4. Summary
7. Evaluation
7.1. General system evaluation - D3S and story enjoyment
7.2. Background music’s tempo and environment’s intensity
7.3. Association between characters and instruments
7.4. Themes associated to characters
7.5. Use of sounds to underscore events
7.6. Volume and relations between characters
7.7. Character theme and mood
7.8. Music associated to story’s key moments – Friendship and Duel Themes
7.9. Music associated to story’s key moments – Climax Theme
8. Conclusions
8.1. Future Work
References
Appendix A
Appendix B
List of Figures
Fig. 1 – A generic storytelling environment
Fig. 2 – Emotion Model (Russell, 1980)
Fig. 3 – Freytag’s Pyramid
Fig. 4 – Affective Guideline
Fig. 5 – I-Shadows System
Fig. 6 – I-Shadows story
Fig. 7 – FearNot! story
Fig. 8 – Thayer’s model of mood
Fig. 9 – Features extracted from music and their correlation with emotional state
Fig. 10 – Relations between musical parameters and human expressions
Fig. 11 – Project (void *)
Fig. 12 – Ghostwriter
Fig. 13 – Berndt’s system overview of the approach to realtime adaptive orchestration and performance
Fig. 14 – Relation between discussed works and D3S
Fig. 15 – Story enjoyment and understanding
Fig. 16 – D3S Framework
Fig. 17 – Association between Mood Values and the type of Hero Theme
Fig. 18 – Changes of acts in I-Shadows
Fig. 19 – Possible discretization of the General Music
Fig. 20 – Disney’s Peter and the Wolf – 1946
Fig. 21 – Freytag’s pyramid with music’s Tempo variable
Fig. 22 – Original I-Sounds architecture
Fig. 23 – Integration of Apollo
Fig. 24 – D3S Interface
Fig. 25 – Input and Output of Apollo – The music manager
Fig. 26 – Internal Structure of Apollo for the Background Music
Fig. 27 – Communication between I-Shadows and I-Sounds
Fig. 28 – Interaction example between I-Shadows and I-Sounds
Fig. 29 – FearNot! Flow Chart
Fig. 30 – Different scoring systems and associated interest
Fig. 31 – Different music’s tempo speeds and associated perception of the story intensity
Fig. 32 – Percentages of recognition
Fig. 33 – Frequency percent of score
Fig. 34 – Characters used on this experiment
Fig. 35 – Percentages of recognition
Fig. 36 – Frequency percent of score
Fig. 37 – Hero, Friend and Villain Colors
Fig. 38 – Experiment 5 results
Fig. 39 – Experiment 6 box plot
Fig. 40 – Experiment 7 results
Fig. 41 – Participants distribution
Fig. 42 – Answers for the videos with the boy and the dragon
Fig. 43 – Answers for the videos with the boy and the fairy
Fig. 44 – Experiment 9 answers
Fig. 45 – Brief description of the testing procedure
Fig. 46 – Experiment 1 versions
Fig. 47 – Experiment 1 Descriptive Statistics
Fig. 48 – Experiment 1 Friedman Test
Fig. 49 – Experiment 2 versions
Fig. 50 – Experiment 2 Descriptive Statistics
Fig. 51 – Experiment 2 Kruskal-Wallis Test
Fig. 52 – Experiment 3 Descriptive Statistics
Fig. 53 – Experiment 3 Chi-Square Test
Fig. 54 – Experiment 4 versions
Fig. 55 – Experiment 4 Descriptive Statistics – with sound
Fig. 56 – Experiment 4 Chi-Square test – with sound
Fig. 57 – Experiment 4 Chi-Square Test – without sound
Fig. 58 – Experiment 5 versions
Fig. 59 – Experiment 5 Descriptive Statistics
Fig. 60 – Experiment 5 Mann-Whitney test
Fig. 61 – Experiment 6 versions
Fig. 62 – Experiment 6 Kruskal-Wallis test
Fig. 63 – Experiment 7 versions
Fig. 64 – Experiment 7 Kruskal-Wallis test
Fig. 65 – Experiment 8 versions
Fig. 66 – Experiment 8 Kruskal-Wallis test for themes
Fig. 67 – Experiment 8 Kruskal-Wallis test for characters
Fig. 68 – Experiment 8 Kruskal-Wallis test for versions
Fig. 69 – Experiment 9 versions
Fig. 70 – Experiment 9 Descriptive Statistics
Fig. 71 – Experiment 9 Mann-Whitney test
1. Introduction
Have you ever imagined the world without sounds and music? It would certainly be an incomplete world, where people could not communicate with each other through speech and where the birds could not sing. Bands and composers would not exist, and we would not be able to hear those great musical pieces that delight our ears every time they are played. Movies would be silent, with no music or sounds in them.
Sound and Music have a strong influence on the way we experience everything around us. Without them, the world would not be perceived as it is today. We cannot deny the existence and importance of sounds and music – they are everywhere. Therefore, they play a big role in entertainment as well.
Music and sounds can be used to score a story being told in cinema, in theatre, in animated cartoons or in games. By that, we mean that they can accompany the story in a way that emphasizes its situations, creates suspense, and generates fear or other emotions in the viewer.
Consider the silent movies of the early 20th century, when it was not possible to record any dialogue. Even without spoken dialogue, directors tried to at least use some background music to bring some color to the black and white silent movies of that era. Although that background music had little purpose beyond entertaining the audience, so that they would not sit in complete silence, the use of music to accompany films and stories has evolved since then.
Nowadays, music is used as a tool to help the audience understand the story: for example, by helping to identify who the characters are and what their roles are, the situations they are involved in, the emotions they feel, and other characteristics of the story.
Music can also create emotional effects on the audience. The audience's emotional response to a story can be enhanced, or even influenced, through the use of music.
This work aims to create a solution that scores stories automatically in real time, adding entertainment to the visualization of the story and helping the audience understand it better.
1.1 Motivation
Consider the music used in cinema, in theatre, in animated cartoons and even in video games. The importance of sound and music in these environments is undeniable. This scoring is used as a complement to the visual content, helping the audience understand it better. In these environments, the stories being told are static and pre-made. Therefore, when the music to accompany them is being created, it is possible to know what will happen in the story, and the use of music can be planned according to the events and situations happening in it.
In contrast, real-time storytelling environments are another place where we can consider the importance of sounds and music. In these environments, a story is created in real time by a computer and told to an audience that looks forward to understanding and enjoying it. We can consider extending the artistic elements of music and sounds to these environments as well.
In these environments, stories are told in a dynamic and interactive way, the actors are real and/or virtual, and they have a strong emotional component. Since the stories are created in real time and are most of the time interactive, one cannot predict what will happen throughout the story. The unpredictability of user actions, along with other factors such as the story's development, makes the process of creating music and sounds to accompany it more difficult. Extending the use of music to these environments is not straightforward and presents a problem that needs to be addressed. Furthermore, the presence of autonomous virtual agents makes these stories more emergent, which makes the process of creating adequate sounds and music harder.
In these environments, the story is the most important aspect. If we think about a movie with a large variety of interesting special effects, one might enjoy watching it without really understanding the story. In storytelling environments this is not possible, since it is all about the story.
We must then think about using sound and music to help the audience understand the story. By understanding a story, we consider it essential to understand the characters’ roles, their actions, their emotional states, and the interactions and relations between characters.
The challenge of this thesis can then be summarized as follows: how can we score a story being created in real time by these environments, using sounds and music, in a way that adds more understanding of what is happening in the story and consequently increases the viewer's enjoyment?
1.2 Objectives
The goal of this thesis is to create a system that generates music to accompany an interactive storytelling system, making the experience of watching the story more enjoyable for the viewer. We consider that, for a viewer to enjoy a story, he has to understand it. Therefore, we need to find a way to score the story that adds comprehension to it, helping the viewer understand what is happening in it.
On the other hand, we also want the music itself to be entertaining. We want the music to be enjoyable to hear, and not merely a tool to help the viewer understand the story being told.
Ultimately, we want to understand how important and influential music and sounds are for the scoring of storytelling environments.
We can then summarize the objectives of this thesis as follows:
1) Create a framework that allows the scoring of storytelling environments in real time. This framework should be generic and capable of being integrated with various storytelling environments.
2) Use music as a tool to enhance the understanding of the story. We want to use music to help the audience understand what is going on in the story.
To help define the understanding of a story, we can formulate the following hypothesis:
To understand a story, one must understand:
- The characters’ roles;
- The characters’ actions;
- The characters’ emotional states;
- The interaction and relations between characters.
3) Guarantee the entertainment of the audience through the use of music. Besides using it to enhance understanding, music should always be a good way of adding entertainment to the story, simply through the pleasure of listening to it.
4) Include the work of this thesis in the I-Sounds system, an existing framework that allows the scoring of storytelling environments. This framework was previously used to integrate a real-time music composition module and connect it to the storytelling environment I-Shadows.
5) Extend the use of the framework and adapt it to other storytelling environments, such as FearNot!
These are the objectives that Dynamic Story Shadows and Sounds (or D3S) will try to achieve.
1.3. Outline
The remainder of this document is organized into seven chapters.
This dissertation will be contextualized in chapter 2, where storytelling environments will be discussed. We will give a brief description of a generic storytelling environment, explaining its main components and their relevance for this work.
In the first part of the “Background: Music, Emotions and Film Scoring” chapter, we will report on some work that has been done on establishing relations between musical parameters and emotions. In the second part, we will give an overview of the area of creating music and
sound to score films (Film Scoring), an area tightly connected to this thesis and a major source of inspiration for its features.
In the “Related Work” chapter, we will give an overview of the computing systems that have been developed in the same area as this thesis.
The “Sound of D3S” chapter will explain the theory supporting what has been done in D3S; there, the framework and all its features will be explained in detail.
In the “Architecture” chapter, we will explain all the technical details related not only to the system in which D3S was built (the I-Sounds system), but also to the integration with other storytelling systems such as FearNot!. The music manager module Apollo, implemented to fulfill some of the objectives of this thesis, will also be described in this chapter, and an example of interaction between D3S and a storytelling system will be given.
In the “Evaluation” chapter, all the testing of the system and the subsequent analysis and discussion of its results will be explained. In this chapter, among other topics, we will see to what extent music is important and relevant for scoring a storytelling environment.
The final chapter, “Conclusions”, will give the final considerations about this thesis, along with a sub-section on future work that can be developed within the area of musical scoring of storytelling environments.
2. Storytelling environments
In this chapter we will contextualize our thesis within storytelling environments and discuss some of their main characteristics. This will allow the further study of music integration in them. We will also present the storytelling systems closely related to this work, since they are the systems on which it was tested: the I-Shadows system and the FearNot! system. The first consists of a shadow theatre, where shadows manipulated by real users interact with other shadows created by the computer. The second consists of the creation of improvised dramas to address bullying problems.
2.1. A Generic Storytelling Environment
A storytelling environment is a place where a story is being told. It is a combination of characters and their surrounding environment that together develop that story. Generally, these environments are created and developed in real time while the audience is watching the story. In some of these environments, it is possible for the audience to interact with the characters and join the story. In Fig. 1 we can see a representation of a generic storytelling environment.
Fig. 1 – A generic storytelling environment
The characters have a name, a role in the story and a mood. The name is what the characters will be recognized as, while the role is their function within the story. The mood corresponds to
their inner emotional state, whether they are happy or sad. The characters also have emotional relations among them. A character might like another character while, at the same time, disliking a third character.
The characters can perform actions in the environment. They can act towards other characters, act towards objects, or express emotions. These actions have an impact on the environment and may change the relations between the characters and their moods.
The environment is where the characters act. It not only contains the characters but may also contain objects with which they can interact. The environment also has generic arousal and valence values, which correspond to how tense/relaxed or happy/sad the emotions in the environment are. Due to the importance of these concepts for this dissertation, the next section is dedicated to a more detailed explanation of the arousal/valence concepts.
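As a concrete illustration, the characters, relations and environment-level valence/arousal just described can be sketched as a minimal data model. All names, value ranges and update rules below are illustrative assumptions for this sketch, not the actual D3S representation:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Character:
    name: str                                   # how the character is recognized
    role: str                                   # function within the story, e.g. "hero"
    mood: float = 0.0                           # inner emotional state, assumed in [-1, 1]
    relations: Dict[str, float] = field(default_factory=dict)  # liking of other characters

@dataclass
class Environment:
    valence: float = 0.0                        # how happy/sad the scene feels (assumed [-1, 1])
    arousal: float = 0.0                        # how tense/relaxed the scene is (assumed [0, 1])
    characters: Dict[str, Character] = field(default_factory=dict)

    def perform_action(self, actor: str, target: str, impact: float) -> None:
        """An action changes the actor's relation to the target, the target's
        mood, and the environment's valence and arousal (the update rules
        here are arbitrary choices for this sketch)."""
        self.characters[actor].relations[target] = (
            self.characters[actor].relations.get(target, 0.0) + impact)
        t = self.characters[target]
        t.mood = max(-1.0, min(1.0, t.mood + impact))
        self.valence = max(-1.0, min(1.0, self.valence + impact / 2))
        self.arousal = min(1.0, self.arousal + abs(impact) / 2)  # any action raises tension

env = Environment()
env.characters["hero"] = Character("hero", "hero")
env.characters["villain"] = Character("villain", "villain")
env.perform_action("villain", "hero", -0.4)     # a hostile action
print(env.characters["hero"].mood)              # the hero's mood drops
print(env.valence, env.arousal)                 # the scene turns negative and tenser
```

A music manager could then read the `valence` and `arousal` fields of such a structure to select and adjust background music as the story unfolds.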
2.2. Valence and Arousal
In psychology, there are two concepts related to emotions that are also connected to storytelling: “valence” and “arousal”. Valence is a term used in psychology, especially when discussing emotions, that means the intrinsic attractiveness (positive valence) or aversiveness (negative valence) of an event, object or situation [1]. For example, emotions usually referred to as “positive” (such as happiness or joy) have positive valence, while “negative” emotions (such as fear or anger) have negative valence. Arousal is a physiological and psychological state of being awake. It is important in regulating consciousness, attention and information processing, and in motivating certain behaviours. Emotions can then be conceptualized along these two dimensions. Arousal consists of a single axis of intensity, increasing from neutral to maximally arousing. Valence can be described as a bipolar dimension, consisting of positive and negative values [2]. It is then possible to associate emotions within this two-dimensional space. In 1980, Russell defined an Emotion Model in which several emotions were distributed in this space according to their valence and arousal characteristics. Valence is represented on the horizontal axis, while Arousal is on the vertical axis (Fig. 2).
Fig. 2 - Emotion Model (Russell, 1980)
This model is interesting in the context of this thesis because we can make a direct relation
between the most important human emotions and the dimensions of valence and arousal.
These dimensions are present in storytelling environments and can help in introducing
music to accompany the story. Next, we will see how stories can be created and developed
in a storytelling environment, and how we can associate valence and arousal with them.
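As a small illustration, placing emotions in this two-dimensional space lets us treat emotion labelling as a nearest-neighbour lookup. The coordinates below are rough, hand-picked values for the sake of the sketch, not Russell's published ones.

```python
# Sketch of an emotion lookup over the valence/arousal plane. Each emotion is
# a point (valence, arousal) in [-1, 1] x [-1, 1]; the coordinates are
# illustrative assumptions, loosely following Russell's circumplex layout.
import math

EMOTIONS = {
    "happy":   ( 0.8,  0.5),
    "excited": ( 0.6,  0.9),
    "calm":    ( 0.7, -0.6),
    "sad":     (-0.7, -0.4),
    "afraid":  (-0.6,  0.8),
    "angry":   (-0.8,  0.7),
    "bored":   (-0.4, -0.7),
}

def nearest_emotion(valence: float, arousal: float) -> str:
    """Return the labelled emotion closest to a (valence, arousal) point."""
    return min(EMOTIONS, key=lambda e: math.dist(EMOTIONS[e], (valence, arousal)))
```

A point with high valence and moderate arousal, such as (0.75, 0.45), would fall nearest to "happy" under these coordinates.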
2.3. Story Development
In storytelling environments with which users can interact, one is faced with the
problem of reconciling the story planned by the system with the actions the users are
free to take, which may change the story. Because of that, when creating a story to be followed
by characters (real or virtual) in real time, there is a need to follow a guideline so that the
story makes sense and the users do not deviate critically from it.
Freytag (1863) defined a pyramid describing drama development, which generally
follows the same pattern along a variable called tension.
On the Freytag Pyramid, there are 5 different acts:
• Exposition – provides the information about the environment, characters and their
relations.
• Rising Action – reaction to negative events that prevent the protagonist from reaching
his goal
• Climax – the turning point, where the protagonist usually succeeds in his goal
• Falling Action – everything returns to normal
• Denouement – conclusion of the story
Fig. 3 - Freytag’s Pyramid
It is possible to establish an association between this pyramid and valence. In that light, we can
say that the story starts with a positive mood, suffers a negative impact and then ends in a
positive conclusion. Figure 4 shows the affective guideline of story development used in the I-
Shadows storytelling environment, which illustrates that association.
Fig. 4 - Affective Guideline
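A guideline of this kind can be sketched as a piecewise-linear target-valence curve over normalized story time: positive at the start, dipping negative around the climax, and recovering to a positive conclusion. The breakpoints and values below are illustrative assumptions, not the actual I-Shadows guideline.

```python
# Sketch of an affective guideline: target valence as a function of
# normalized story time t in [0, 1]. Breakpoint values are assumptions.

def target_valence(t: float) -> float:
    """Positive opening, negative dip around the climax, positive resolution."""
    points = [(0.0, 0.4), (0.3, 0.2), (0.55, -0.6), (0.8, 0.1), (1.0, 0.7)]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            # linear interpolation between consecutive breakpoints
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t must be in [0, 1]")
```

A scoring component could sample this curve as the story advances and bias the chosen music towards the target valence at each moment.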
If we wish to introduce a musical component to these environments, it is essential to take
story development into account: not only its impact on the current emotional state of each
character and of the scene, but also its value for anticipating what will happen next in the
story, allowing the music to follow and predict it, helping to create effects of suspense or
anticipation.
2.4. Related Storytelling Environments
In storytelling environments, the story is the most important factor. However, it is possible to
emphasize the emotions emerging in each scene by using music to accompany the action.
Besides that, the use of music can create in the viewers effects of suspense and anticipation
about what will happen in the story. It is a powerful way to intensify the viewers' enjoyment and
experience when watching a story created in such environments.
We will dedicate special attention to the I-Shadows and FearNot! systems, since they are the
two environments in which this work was tested. A brief explanation and overview of the two
systems follows.
2.4.1. I-Shadows
I-Shadows (Fig. 5) is an Affective Interactive Drama system, based on autonomous affective
characters and drama theory. The objective of this system is to create an interactive
experience between real actors (the users) and virtual actors (Chinese shadow puppets).
In this system, users can use different puppets and create a story in cooperation with the
system. To be able to interact with the story, the virtual actors have an agent architecture that
supports emotional reactions, goal-oriented behaviour and social interactions. Besides that,
there is a specific agent (a story director) that coordinates the actors, allowing them to appear
in or disappear from the scene of the story. [3] [4]
Fig. 5 – I-Shadows System
Characters
The I-Shadows system implements a very rich cast of characters, with appropriate actions and
emotional behaviour. To achieve this emotional behaviour, the system uses FAtiMA [5], an
architecture based on the OCC model of emotions (Ortony, Clore, & Collins, 1988) and
developed at GAIPS (Intelligent Agents and Synthetic Characters Group), for the minds of the
characters and the director. [3]
There are two different types of characters in I-Shadows: the Real Characters, puppets
manipulated by the user and detected by the system using a vision component, and the
Virtual Autonomous Characters, implemented by the system itself. The Director's
function is to reconcile the perspectives of both Real and Virtual characters in the story
development [3].
Relation between characters
In I-Shadows, each character has its own view of the environment: not only do they react
differently to events, but they also have different relations among them. Each character thus
has an initial relation towards all the other characters. These relations can develop and change
dynamically as the story unfolds, influencing the characters' behaviour.
Some characters may be more relevant for the story, such as the hero, the villain and the
princess, since they are the main characters and the story develops around them. The relations
between these are usually stereotyped and can be set as initial relations. For example, the
villain's emotional relation towards the other two is negative (anger, hate), while the emotional
relation between the hero and the princess is positive (love, friendship).
Emotional state
Each character has its own personality; because of that, each character reacts differently when
perceiving an event from the environment. Each also has its own emotional state, which
corresponds to its mood at each moment. Based on its valenced reaction to each event, or on
its relations towards the other characters present in the scene, a character's emotional state
may change and adapt to the new situation.
Fig. 6 – I-Shadows story
2.4.2. FearNot!
This application was developed within VICTEC (Virtual ICT with Empathic
Characters), a European Framework V project carried out between 2002 and 2005.
The project applied 3D animated synthetic characters and emergent narrative to create
improvised dramas addressing bullying problems for children aged 8-12 in the UK, Germany
and Portugal. One of its main goals was to develop synthetic characters that would create an
empathic connection with the user. The users would be children exposed to bullying scenarios
through this application; they could then explore those scenarios and cooperate with the
synthetic characters to find strategies to solve their problems. [6][7]
Characters
The characters in FearNot! consist of a bullied child - the main character - a pair of bullies and a
pair of friends of the main character. The gender of the characters changes according to the
gender of the child using the application, so that users can better identify with the situations
they are exposed to during the story.
Story Sequence
The story presented in FearNot! has two different aspects: one is shown when the child is
outdoors being bullied; the other takes place when the child is at his/her home and has a
chat with the user about what is happening, asking for his/her help. In the first, the user
cannot interact and the story is generated automatically. In the second, the user can
write to the synthetic character using a text chat and cooperate with him to find strategies to
solve the problems.
Fig. 7 – FearNot! story
2.5. Summary
In this section we have presented the context behind the development of this thesis. A brief
explanation of generic storytelling systems was given. We discussed relevant themes such as
valence and arousal and their relation to the story development found in the I-Shadows
system. Special attention was given to the two storytelling systems that are closely related to
this thesis, I-Shadows and FearNot!. Before this work, I-Shadows' musical scoring was limited
to an association with the emotional state of the environment, and the FearNot! system had no
sound scoring at all. With D3S, the musical scoring of I-Shadows became more complete, and
D3S was also used to score FearNot! stories.
3. Background: Music, Emotions
and Film Scoring
This chapter is divided into two major sections. In the first section, we discuss relevant
aspects of the relation between music and emotions and the importance of that relation in
the context of this dissertation. We present some works that relate several musical
parameters to different emotions.
The second section focuses on film scoring, which is the process of creating a
musical accompaniment to a film. We discuss most of its characteristics in order to show its
relevance and purpose within movies. The film scoring area was a major source of
inspiration for the work developed in this dissertation.
3.1. Music and Emotions
Wittgenstein [8] tells us that we understand music and language in similar ways, but that music
is not a language, because we cannot communicate through music as we can through
language. However, among other things, it is still possible to communicate something through
music - emotions. From another perspective, Aristotle tells us that music is mimetic or imitative.
By that, he meant that music can be a representation of a person's emotions and moods. If he
was right, then perhaps we have a more direct connection to music than we do when
attempting to put words to our emotions or to our emotive responses to music, because music
can imitate our emotions. Following Wittgenstein and Aristotle, we can understand music as
something that goes beyond words and only exists beyond words. That can help explain why
we respond to music the way we do: music can be so close to the affective field that it cannot
be described by words. [8]
In storytelling environments, it is important for the viewer to understand the emotions that each
scene may suggest. That way, he can understand the story better and some ambiguities may
disappear. Music being an exceptional way to transmit emotions, we need to make an
association between musical features and emotional states.
The ability that music has to affect and manipulate emotions and the brain is undeniable, and
yet largely inexplicable. Until recently, few studies had been made in this area, since the fields
of music and biology were considered mutually exclusive.
One of the problems found when trying to study music's emotional power is that defining
music's emotional content can be very subjective, since a piece of music can be experienced in
different ways by each person who hears it. The emotion created can be affected by the
memories it evokes in the listener, the environment where the music is being played, the
mood of the listener at the time, his personality, culture and multiple other factors. [9]
A study was made [10] to test the brain's responses to pleasant and unpleasant music.
Unpleasant music was defined in this test as music with a high rate of dissonance.
The findings of this study suggested that music may recruit neural mechanisms similar to
those previously associated with pleasant/unpleasant emotional states.
The rewards given by the brain while listening to pleasant music were analyzed by Blood and
Zatorre in a study [11] in which several subjects were tested while listening to pleasant music.
Responses such as "shivers-down-the-spine" or "chills" were analyzed by measuring cerebral
blood flow. Besides those chills, there were reports of changes in heart rate, electromyogram
and respiration. While listening to pleasant music, brain structures were activated that are also
related to other euphoria-inducing stimuli, such as food, sex and drugs. They concluded that
this finding links music with biologically relevant, survival-related stimuli via their common
recruitment of brain circuitry involved in pleasure and reward.
Another quantifiable aspect of emotional responses to music is its effect on hormone levels in
the body [12] [13]. There is evidence that music can lower levels of cortisol in the body
(associated with arousal and stress) and raise levels of melatonin (which can induce sleep)
[12]. This helps explain music's ability to relax, calm and bring peace.
Despite several studies in this area, questions such as "How does music succeed in
prompting emotions within us?" or "Why are these emotions often so powerful?" cannot yet be
answered. It is possible to quantify the emotional responses caused by music, but not to
explain them. [9]
Classifying the emotions associated with music is a challenging problem. Simply assigning
an emotion class to a song segment in a deterministic way does not work well, because not
everyone shares the same feelings about a song: listening mood, environment, personality,
age, cultural background, etc., can all influence emotion perception. Because of these factors,
deterministic classification methods do not perform well in practice [14] [15] [16].
Yang, Liu and Chen [17] therefore proposed a fuzzy approach to classifying music emotions,
creating a model based on Thayer's model of mood [18] (Fig. 8). They divided the two-
dimensional (arousal - valence) space into four parts, each part being associated with a type
of music.
Fig. 8 - Thayer’s model of mood [18]
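In that spirit, a toy version of fuzzy classification over the four quadrants of the arousal-valence plane can be sketched as follows. The quadrant labels and the distance-based soft weighting are illustrative assumptions, not the actual method of Yang, Liu and Chen.

```python
# Toy fuzzy classifier over the four quadrants of a Thayer-style
# arousal-valence plane. Each quadrant has a prototype point; a sample gets
# a normalized membership weight per quadrant instead of one hard label.
import math

QUADRANTS = {  # label -> (valence, arousal) prototype; labels are assumptions
    "exuberance":  ( 1,  1),   # positive valence, high arousal
    "anxious":     (-1,  1),   # negative valence, high arousal
    "contentment": ( 1, -1),   # positive valence, low arousal
    "depression":  (-1, -1),   # negative valence, low arousal
}

def fuzzy_memberships(valence: float, arousal: float) -> dict:
    """Return a membership weight in [0, 1] for each quadrant, summing to 1."""
    scores = {q: math.exp(-math.dist((v, a), (valence, arousal)))
              for q, (v, a) in QUADRANTS.items()}
    total = sum(scores.values())
    return {q: s / total for q, s in scores.items()}
```

A sample near the top-right of the plane, such as (0.9, 0.9), would receive its largest weight for the high-arousal, positive-valence quadrant, while still keeping non-zero weights for the others, which is the point of the fuzzy formulation.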
The use of music to intensify emotions has also been applied to artistic areas other than
narration, such as painting. Cheng-Te and Man-Kwan [19] developed a system that
automatically generates music to accompany a slideshow of impressionist paintings. They
argued that paintings have more affective content than photos and that impressionists are
especially concerned with conveying emotions. Musical elements that affect emotion include
melody, rhythm, tempo, mode, key, harmony, dynamics and tone colour; among these, melody,
mode, tempo and rhythm have the strongest effects. Generally speaking, the major scale is
brighter and happier than the minor, and a rapid tempo is more exciting or tense than a slow
tempo.
Other relevant works have related emotions to different types of music. Due to their high
importance for this thesis, the next sub-section is dedicated to them.
3.1.1. Different types of music for different emotions
When listening to some type of music, one may feel emotions and associate them with that
music. As explained before, that association may depend on numerous factors and varies with
the listener. However, it is still possible to draw some relations between characteristics of
music and the emotions they may suggest. We will present three works in this area.
Juslin, Bresin and Friberg [20][21] correlated emotions with certain musical features. Several
users listened to music played in different performances, each performance having its own
musical features. It was possible to show that most listeners recognized the intended emotions
correctly when the features associated with those emotions were used. Some of the results of
their work are shown in the next table (Fig. 9).
Emotion – Music Feature – Value
Fear – Tempo – irregular
Fear – Sound level – low
Fear – Articulation – mostly non-legato
Anger – Tempo – very rapid
Anger – Sound level – loud
Anger – Articulation – mostly non-legato
Happiness – Tempo – fast
Happiness – Sound level – moderate or loud
Happiness – Articulation – airy
Fig. 9 - Features extracted from music and their correlation with emotional state[20][21]
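Read as data, the table in Fig. 9 amounts to a lookup from a target emotion to the musical features a performance should use. A minimal sketch, with values transcribed from the table and assumed key names:

```python
# Lookup from target emotion to musical features, transcribed from the
# Juslin/Bresin/Friberg results above. Key names are assumptions.
MUSIC_FEATURES = {
    "fear":      {"tempo": "irregular",  "sound_level": "low",
                  "articulation": "mostly non-legato"},
    "anger":     {"tempo": "very rapid", "sound_level": "loud",
                  "articulation": "mostly non-legato"},
    "happiness": {"tempo": "fast",       "sound_level": "moderate or loud",
                  "articulation": "airy"},
}

def features_for(emotion: str) -> dict:
    """Return the musical features associated with an emotion label."""
    return MUSIC_FEATURES[emotion.lower()]
```

A real-time scoring component could consult such a table when the dominant emotion of a scene changes and adjust the playback parameters accordingly.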
Gabrielsson and Lindström work
Similarly, Gabrielsson and Lindström [22] have made an association between the emotions of
Happiness and Sadness and several musical parameters: articulation, harmony, loudness,
melodic range, melodic direction, mode, pitch level, rhythm, tempo and timbre. The results were
the following:
Parameter – Happiness – Sadness
Articulation – staccato – legato
Harmony – simple and consonant – complex / dissonant
Loudness – loud – soft
Melodic range – wide – narrow
Melodic direction – ascending – descending
Mode – major – minor
Pitch level – high – low
Rhythm – regular / smooth – firm
Tempo – fast – slow
Timbre – few harmonics – few harmonics, soft
As a result of their study, they concluded that happiness and sadness behave like opposite
emotions with respect to these parameters, since most of them had opposite results. An
exception was found in the timbre, where both happiness and sadness are expressed with few
harmonics.
Jan Berg and Johnny Wingstedt work
Another example is Berg and Wingstedt's work [23]. Six musical parameters were considered
in their study involving the emotions 'happiness' and 'sadness'. Those parameters were mode
(major, minor), instrumentation (timbre), tempo, articulation (legato - staccato), volume and
register.
The mode of a piece of music is often associated with the emotions of happiness and sadness,
the major mode being linked to the former and the minor mode to the latter. Besides that, the
major mode can also be associated with grace, serenity and solemnity, while the minor mode
can be associated with dreamy or dignified qualities, tension, disgust and anger. [22] In this
work, tests were made with users with the objective of relating the major / minor mode of music
to the happiness / sadness emotions. Results show that over 95% of the users associated the
major mode with happiness and the minor mode with sadness. Besides that, they concluded
that users with musical training made this association even more strongly than non-musicians.
This was the test with the most relevant and conclusive results, which helps us conclude that
there is a strong link between mode and those two emotions.
Regarding the instrumentation parameter, it has been shown that sounds with a rich harmonic
spectrum may suggest potency, anger, disgust, fear, activity or surprise. Sounds with few, low
or suppressed harmonics may be associated with pleasantness or happiness, as well as with
tenderness, sadness or boredom [22] [23].
Tempo is another relevant factor: fast tempo can be associated with expressions such as
happiness/joy, activity/excitement, potency, surprise, anger or fear, while slow tempo may be
associated with sadness, calmness/serenity, dignity/solemnity, tenderness, boredom and
disgust [22][23].
Articulation defines the overall note length of the music played: staccato is associated with
shorter notes and with the expressions of gaiety, energy, activity, fear and anger, while legato
is associated with longer notes and may transmit the expressions of sadness, tenderness,
solemnity and softness [22][23].
Volume is associated with the loudness of the music. Loud music may transmit expressions like
joy, intensity, power, tension or anger. Soft music may transmit sadness, softness, tenderness,
solemnity or fear [22][23].
Register refers to how high or low the pitch level is. High pitches are usually associated with
happiness, grace, serenity, dreaminess, excitement, surprise, potency, anger, fear and activity.
Low pitches suggest sadness, dignity, solemnity, boredom or pleasantness [22][23].
Even though these parameters can be directly associated with emotions, when two or more are
combined, the result may differ. Berg and Wingstedt showed that the minor mode was mostly
associated with sadness; however, when combined with a fast tempo, the music may be
perceived as 'happy' [23]. The following table summarizes the associations between the
parameters and the expressions/emotions.
Fig. 10 – Relations between musical parameters and human expressions [22][23]
3.2. Film Scoring
Film scoring consists of creating a musical accompaniment to a film. This area has been
studied and developed for many years. It started in the early days of cinema, when scores
were used in silent movies to keep the viewer focused on and interested in the movie.
Nowadays, film scores are of great importance to a movie, with music composed exclusively
for films and played by large orchestras or well-known bands. Even if its purpose has changed,
film music remains a very important component of films and still serves other purposes, as we
will see in this section.
Since a film has a story involved, we can relate films to other storytelling environments, such
as I-Shadows. Since this thesis is focused on music to accompany stories, film scoring is the
closest existing practice to seek inspiration from, and it is important to dedicate the rest of this
chapter to it.
3.2.1. Film score functions
It is often hard to understand the difference between a film score and film songs. While film
songs are songs used in a film that can be compiled into a soundtrack, the film score is the
"illustration" of the movie through music composed exclusively for that film.
The score can have a variety of functions in a film. These functions are generally dictated by
the film's director, and one or more can be used at the same time. The main functions of a film
score are described next. [24]
Parameter – Variation – Emotions/Expressions associated
Mode – Major – happiness, grace, serenity, solemnity
Mode – Minor – sadness, dreamy, disgust, anger
Instrumentation – Rich harmonics – potency, anger, disgust, fear, activity, surprise
Instrumentation – Few harmonics – pleasantness, happiness, tenderness, sadness, boredom
Tempo – Fast – happiness/joy, activity/excitement, potency, surprise, anger, fear
Tempo – Slow – sadness, calmness/serenity, dignity/solemnity, tenderness, boredom, disgust
Articulation – Staccato – gaiety, energy, activity, fear, anger
Articulation – Legato – sadness, tenderness, solemnity, softness
Volume – Loud – joy, intensity, power, tension, anger
Volume – Soft – sadness, softness, tenderness, solemnity, fear
Register – High pitch – happiness, grace, serenity, dreaminess, excitement, surprise, potency, anger, fear, activity
Register – Low pitch – sadness, dignity, solemnity, boredom, pleasantness
Source Music
Source music is music that plays a part within a scene of the film, inserted in the scene in such
a way that every character in that scene is aware of it. This music is part of the film world and
can be heard or played by the characters. For example, if a character is playing an instrument
in the film, the sound of that instrument must be provided. The most complex type of source
music is music composed exclusively for the movie, such as orchestral music for an orchestra
that is playing in that movie.
Thematic music
Thematic music includes opening themes and character themes. This kind of music usually
parallels the action and can help the audience recognize a certain character just by hearing
its theme, or even notice changes in that character, as explained below in the section
"Thematic music in character development". [24]
Defining the ethnicity, location and period of the film
Film score music can be used to give the audience hints about the ethnicity, location and
period of the film, adding realism to the scenes. The director can choose an existing piece of
music that represents or defines the period or location of the movie, or ask the composer to
produce music with characteristics that give the audience those hints about where and when
the story of the film happens. Those hints can be given through the use of instruments, as in
the movie Braveheart (1995), where bagpipes were used to give a Scottish feeling to the film,
helping to clearly identify the location and ethnicity of its characters. [24]
Paralleling the action of the film
Music can also be used to emphasize action in the film, with the objective of accentuating
what the viewer is already seeing. One kind of film that uses this intensively is the animated
cartoon, where almost every action of a character is accompanied by music (for example,
when the coyote falls from a cliff while chasing the road runner, the music also falls in pitch,
usually in a chromatic sequence). Paralleling the action of the film is also called underscoring,
which will be expanded on further ahead. [24]
Commenting on the film and adding to scenes
This function is considered one of the most intelligent ones, since music here provides
additional information to the viewer that is not explicit in the scene being scored. This can be
done in numerous ways. The first is through an overture, an introductory piece that contains
all the major themes of the movie and summarizes it, giving hints about the plot. For example,
if there is a love theme present in the overture, the viewer might guess that it will be present
in the movie. A good example is Superman (1978), whose overture includes a heroic march
and a love theme, so the viewer knows that the hero will most probably find love in his
adventure.
Commenting on the film through the music score can also serve to describe a location shown
in the movie. Some landscapes may be misinterpreted if there is no musical score hinting at
their nature. For example, in the movie Legends of the Fall (1994), whose score was composed
by James Horner, the opening shows a beautiful landscape accompanied by the score. If the
scene were shown in silence, the viewer might not focus his attention on that beauty, and the
scene could be understood as expressing loneliness.
Another interesting example is the movie Jaws (1975), where the two-note musical motif is
played when the shark is coming, not only commenting on the film by hinting to the viewer
what is about to happen, but also providing emotional focus, as explained in the next part. [24]
Providing emotional focus
Another function of a film score is its ability to generate an emotional response in the viewer.
This is considered the strongest and most notable function of film scoring; all the other
functions relate to it in some way, since they all aim to produce certain emotions in the viewer.
The orchestra is the preferred means of doing this, where "strings emphasize romance and
tragedy, brass instruments emphasize power and sorrow (when used in solos), and percussion
heightens the suspense". [24] Other relations between music and emotions are described in
the section "Music and Emotions" of this document.
Underscoring
Underscoring consists of background music used to parallel the action of a film. It can also be
used as a mood-enhancing accompaniment to a situation in a movie; however, that music
should not distract the viewer from the movie. [24] Character themes can be used as
underscoring, where the music that represents each character is played when that character is
on scene, with the variation needed for the different scenes and to accompany the character's
development. We can also see the example of the movie Gladiator (2000), where the battle
scenes were scored with really intense music, while other parts of the movie were scored with
less intense music with other characteristics.
Sometimes underscoring is overused, and the effect of emphasizing the action disappears. A
bad example of underscoring is the movie The Rock (1996), where it is impossible to musically
distinguish the scenes, since they were all scored in the same bombastic manner. [24]
Thematic music in character development
As explained above, thematic music consists of opening themes and character themes. A
character theme is a piece of music associated with a certain character, usually played at the
most relevant moments of the film in which that character appears. It is mostly used for main
characters and can help the audience not only to recognize the hero or the villain, but also to
better notice changes in that character resulting from the action of the film. The music can
follow the character's development through the film, changing with the character. A good
example is the character theme scored for Luke Skywalker in Star Wars (1977) by the
composer John Williams. In the first scenes, Luke is shown as an inexperienced young adult
through vigorous orchestrations reflecting his youth, with a particular use of strings to
represent hope and innocence. By the film's end, when Luke finishes his hero's journey, the
same theme is played in a different way, using trumpets and presented as a march. This
expresses his maturity, newly found confidence and strength - changes in the character during
the film that are also reflected in the music. [24]
Other functions outside the film
Music can have other functions outside the film: it can be sold as a soundtrack so viewers can
listen to their favourite film music at home. Some film themes have become so popular that
they serve as publicity for the corresponding film, the theme sometimes being better known
than the film itself. [24]
3.2.2. Pre-made music vs composed music
Sometimes, pre-made music is used temporarily during the creation of a movie while the
composer is writing the dedicated music for it. That way, the director can get a first idea of
what he intends for each scene by using music with characteristics similar to those of the
music under production.
When composing music for film scoring, orchestras are preferred over popular music, not only
because the vast number of instruments in the orchestra allows the composer to better
express different emotions, but also because orchestral music ages much more gracefully,
while popular music tends to date quickly as styles rapidly evolve. [25]
3.2.3. Creation of the film score – production, composition and
synchronization
Since a film is the result of a collective effort, the composer working on a film is not free to
work alone. He must submit to demands such as the budget, the number of performers, timing
(sometimes calculated in fractions of a second), the mood of each sequence, the producer's
tastes, and the pressures of production schedules. [26]
After the complete film, or part of it, has been shot, the composer is shown a "rough cut" and
talks with the director about the type of music to be used; this process is called "spotting".
Sometimes, when the musical score needs time to be developed, due to its complex nature or
importance, or because the scenes depend on the music (dances, for example), the director
may talk with the composer before the shooting takes place.
During the composition process, the composer may choose to write paper scores or use a computer-based environment. The advantage of using the computer is that it is possible to generate a MIDI file of the music, which can be listened to and approved before it is given to the orchestra to perform. [25]
One of the hardest parts of the film score creation process is the timing and synchronization of the score with the film scenes. This synchronization may be done by adapting the image to the music or the music to the image, depending on which is easier. For example, in an animated cartoon it is usually the music that adapts to the image, since the continuity and fluency of the music is secondary and the animation is the main point. If, in a movie, the director wants a scene with a dance, then it should be the image that adapts to the music, the music being the most important element in this case. In the context of a real-time storytelling environment such as I-Shadows or FearNot!, the music has to adapt to the scenes, since what obviously matters is the story being shown on the shadow theatre, which is being created in real time, with no previous preparation (in terms of timing and synchronization) of the music produced for it.
Claudia Gorbman [27] defined seven principles describing the functions of background music accompanying a narrative and what should be achieved when composing, mixing and editing film music. Those principles are:
1 – ‘Invisibility’ – the technical apparatus that produces film music must remain invisible
2 – ‘Inaudibility’ – since background music is not the main focus of a film, it should not be heard consciously, so the viewer can focus on what is most important: the dialogues, the visuals, etc.
3 – ‘Signifier of emotion’ – music can suggest moods and emotions
4 – ‘Narrative cueing’ – music should work a) ‘referentially’/’narratively’, indicating point-of-view and character/setting, and b) ‘connotatively’, interpreting and illustrating narrative events
5 – ‘Continuity’ – one of the objectives of background music is to fill “gaps”
6 – ‘Unity’ – this is achieved by using variations and repetition of musical material (such as themes)
7 – “One might break with any of the above rules within the boundaries of reason”, making these rules not absolute but a guide to better understand the functions of background music.
3.2.4. The Symbolic, Real and Imaginary - three regimes of film
music
Robert Spande [28] proposed a theory according to which the film experience must imitate in some way all three overlapping dimensions of subjective reality – the symbolic, the real and the imaginary – and the use of film music is fundamental to this. These three dimensions are equally important, and none of them has any sort of “priority” or “primacy” over the others; all matter for the experience a subject has when watching a film.
The Symbolic relates to the fact that film music’s objective is to emphasize/underline the important dramatic aspects of the film.
The Real is the dimension related to facts that happened to the subject, such as traumas. The author argues that film music may have effects on the listener similar to those of traumas.
The Imaginary is connected to the fact that music is omniscient and always knows what is going to happen in each scene.
3.2.5. Animated Movies
These movies take special advantage of music scoring. Since most of them (especially the older ones) have no dialogue or spoken commentary, the soundtrack consists simply of sound effects and music, which are therefore of great importance for these movies.
One of the best examples of film scoring in animated movies is Walt Disney’s, where the synchronization between the soundtrack and the visual elements is so complete that it makes the cartoons difficult to imagine without sound.
“The music and sound effects in animated movies have to create a lot more of the environment
and, many times, the emotions of the film. Animated, 2-D figures can’t emote in the way a
human actor can. The music in animated features often has to convey a lot more.” [29]
3.2.6. Scoring in Games
With the evolution of technology and the growing demand in the games market, scoring games with music has become fundamental to improving the player’s experience.
Games are closer than films to storytelling environments such as I-Shadows, because there is interactivity between the user and the environment. For example, music can be triggered when the player reaches a certain point in a game, emphasizing the emotions the game conveys to the player at that point.
Besides maintaining all of the functions found in film or television sound, game audio works in some distinctly different ways. Karen Collins [30] listed several of those functions: commercial functions, kinetic functions, anticipating action, drawing attention, structural functions, reinforcement, illusionary and spatial functions, environmental functions, and communication of emotional meaning.
3.3. Summary
In the first part of this chapter, we discussed the relation between music and emotions. We presented some studies about the effect of listening to music on the human brain. We saw that listening to music may arouse emotions: when listening to pleasant music, the rewards given by the brain to the listener were similar to survival-related rewards. We also discussed that each person has his or her own experience when listening to music, and the emotions aroused by the same piece of music can differ between individuals. Then we saw what relations have been established between different kinds of emotions and musical characteristics, such as mode, instrumentation, tempo, articulation, volume, register and others. The most relevant association was between the emotions of happiness and sadness, which are opposite in terms of musical characteristics.
In this thesis’ context, knowing that each person feels differently towards a piece of music will be useful when gathering the music that will fit a possible emotional state. One cannot go into much detail when selecting music for each type of emotion, since the association between music and emotions is subjective and depends on the listener. It is more reasonable to group emotions according to their similarities. The knowledge obtained by associating emotions with different kinds of musical characteristics will help in that task as well.
In the second part of this chapter, we discussed the importance of music as accompaniment to films. Several of its functions were described, such as helping define ethnicity, location and period, paralleling the action, underscoring, and providing emotional focus. We also discussed topics such as the use of thematic music in character development and pre-made vs. composed music, and we gave an overview of the process of film score creation.
As a final conclusion for this section, we can say that a film score can’t save a bad film. However, done well and paired with a good film, it can multiply the emotions felt by the viewer and make his experience a more enjoyable one.
4. Related Work
After reviewing significant work in the fields of music, emotions and film scoring, it is essential to present some of the work done in computing that uses concepts mentioned in the previous sections. In this section we present seven distinct works: Marc Downie’s behavior, animation music: the music and movement of synthetic characters; Roberto Bresin’s Virtual Virtuosity – Studies in Automatic Music Performance; Pietro Casella’s Music, Agents and Emotions; Andrew de Quincey’s Herman; Nakamura et al.’s automatic background music generation based on actors’ mood and emotions; Axel Berndt’s Adaptive Musical Expression from Automatic Realtime Orchestration and Performance; and David Cope’s Experiments in Musical Intelligence.
These works are relevant to this thesis and inspired the work reported here, since some of their objectives and goals are similar.
4.1. Marc Downie’s work
Marc Downie’s work [31], named behavior, animation music: the music and movement of synthetic characters, consisted of the creation of synthetic characters within an architecture of reactive behavior systems. These are autonomous characters, complete but simple, situated inside virtual worlds, and we can interact with them. They relate to music because one of the objectives of Downie’s work was the creation of music through those characters: one abstract character would be responsible for the music control.
One musical character was built for the project (void *) (Fig. 11). This project consists of a group of characters that perform dance moves according to the user’s interaction with them. The musical character was created to complete the environment with a musical score. The objective was that the music it created would never sound bad and would attend to emotional changes and behavioral decisions of the dancing characters, as well as changes in the camera system. Music would also exist only as a background feature and should never be the primary target of attention.
Fig. 11 – Project (void *)
The flow of the system for this musical character consists of a behavior system that controls tiles of pre-composed music (which reduces the risk of “sounding bad”), choosing when to play them and how to change some of the tile features, such as volume. There are also pattern generators, between the behavior system and the tile layout, that decide the sequence of tiles to play. These pattern generators can be used to predict the future.
The author noticed that in games most of the music is environmental background, consisting of music loops and occasional fades to silence. Because of that, he wanted to go beyond possibly irrelevant background music and try to provide a musical score, for a medium with no script, that is up to the challenges offered by rich characters.
The use of music in (void *) follows the thematic development of the characters, uses special music for key events, and takes into account the emotions of the scene and which characters are in the scene at each moment.
The biggest problem raised by this musical system was the difficulty of dealing with time-related components, such as synchronization with the action in real time.
4.2. Roberto Bresin’s work
Roberto Bresin’s work [32], entitled Virtual Virtuosity – Studies in Automatic Music Performance, consisted of research in the field of automatic music performance, with a special focus on the piano. The objective of this work was to transfer the virtuosity of live performances made by humans into an automatic music system. That way, it should be possible to have a system that creates music including time, sound level and timbre deviations from a deadpan1 realization of the score.
He proposed a system based on artificial neural networks (ANNs). The system listens to the last played note, predicts the performance of the next note, looks three notes ahead in the score, and plays the current tone. It generates real-time sound level and time deviations for each note represented in the input to the ANN. The ANN model was also used to produce punctuation in performances; however, it failed at realizing legato and staccato articulation. These performance features were used to create six macro-rules that represented six different emotions: anger, sadness, happiness, fear, solemnity and tenderness.
4.3. Pietro Casella’s work
Pietro Casella’s work [33], named Music, Agents and Emotions, had as its objective the creation of a generic cinematic musical agent able to create film-like music on demand. To achieve that goal, he used automatic music generation algorithms with an emotional goal, and automatic management of the interaction between action and music with specific knowledge oriented to affective goals. Two components were created to achieve this objective: an agent named MAgentA and a system named Mediator.
MAgentA was the designation found for the music creation agent, which focused on building music through composition algorithms that incorporated emotions. However, it could not bring the “film-like” music to the setting since, according to the author, the technology is still missing.
1 A deadpan realization of a piece of music does not have any kind of performance expressions
Mediator was the system responsible for integrating the music into an interactive virtual environment. It also served as a test-bed for models of action-music interaction, as well as a research tool for game-music composers and directors.
4.4. Andrew de Quincey’s work
Andrew de Quincey’s work [34], named Herman, is a project capable of generating music at varying levels of tension in real time. It generates background music for a 3D haunted-house environment – Judy Robertson’s PhD system, Ghostwriter [35] (Fig. 12) – operated by children to produce plays. One of the main objectives is to convey fear using suspense and surprise effects in the music. Following the flow of the narrative, one of the human users can set the level of tension manually, and the system responds to it. The generation of music is made in 3 levels: 1 – high-level form, 2 – rhythm and volume, and 3 – melody and harmony. The system changes the parameters of each step according to the level of tension previously set by the user. In the melody and harmony level, a custom probabilistic model is used to choose the melody, drawing on a set of composition rules and parameter values from harmony theories.
Fig. 12 - Ghostwriter
4.5. Nakamura’s work
Nakamura et al.’s work [36] consisted of the creation of an automatic music generation system intended to improve the overall quality of computer-generated animations. The inputs for this system are music parameters (mood types and musical motifs2) and motion parameters for each scene of the animation that describe the action. The system then generates the music for that scene, starting by choosing the tempo according to the emotional state.
The harmony is then chosen using a system of rules, its selection based on the mood type. Next, the system chooses the melody, instantiating the motifs to the harmony and looping them as many times as necessary. The rhythm is chosen from pre-made rhythmic patterns associated with emotions, its selection based on the mood type and tempo.
2 Motifs are short pieces of music. In this case they are pre-composed.
Finally, sound effects for motions are selected according to the characteristics and intensity of the motions. These effects are also synchronized with the music. Both the background music and the sound effects are generated so that the transitions between scenes are smooth.
4.6. Axel Berndt’s work
Axel Berndt [37] developed a new approach, named Adaptive Musical Expression from Automatic Realtime Orchestration and Performance. It is based on musical orchestration principles and consists of adapting the performative expression characteristics of a musical piece.
The work focuses on how to make smooth transitions between musical pieces while including some humanization features in those transitions, through the use of dynamics and tempo.
This work uses pre-composed music and changes its dynamics and tempo. The changes in dynamics consist of giving loudness instructions and changing between different loudness levels (crescendos or decrescendos). The changes in tempo consist of subtle accelerations or slowings (accelerando, ritardando) and of delaying the music. The MIDI standard is used to output the music.
Fig. 13 – Berndt’s system overview of the approach to realtime adaptive orchestration and
performance
4.7. David Cope’s work
David Cope [38] started developing his work, called Experiments in Musical Intelligence (EMI), in 1981. He intended to create a computer program that composed complete works in the styles of various classical composers. He started by coding rules that the program would follow to generate new music. However, the music created was lifeless and without much musical energy. He then changed the initial concept and revised the program to create new output from music stored in a database. That way, the program would retrieve instructions contained in each piece of music and create different music based on those instructions. The discovery of instructions is based in part on the concept of recombinancy, which is the method of producing new music by recombining extant music into new logical successions. To do this recombinancy successfully, EMI uses three basic principles: 1) deconstruction (analyze and separate into parts); 2) signatures (commonality – retain that which signifies style); 3) compatibility (recombinancy – recombine into new works). This work is one of the best-known works in the field of automatic composition.
4.8. Summary
In this section we presented some works related to this thesis. In all of them a system was created that managed to create music automatically. In Downie’s work [31], a musical agent was created to assist the virtual environment (void *) with musical creation. In Bresin’s work [32], the main goal was to generate automatic music with some kind of virtuosity and expression, instead of just playing a deadpan reproduction of the score. In Casella’s work [33], a musical composition agent was created and integrated into a system that inserted the music it created into a virtual environment. In Quincey’s work [34], music was generated to accompany a story taking place in a haunted-house virtual environment. In Nakamura et al.’s work [36], music was generated to improve computer-generated animations, based on actors’ emotions and motions. In Berndt’s work [37], special attention is given to making smooth transitions between musical pieces through changes in dynamics and tempo. Finally, in Cope’s work [38], new music is automatically generated based on the recombinancy concept. To the work of this thesis we can relate the idea of a module that coordinates the creation of music, used in both Downie’s and Casella’s work, through the use of Apollo – the D3S music manager, which will be explained further on. We can also associate the use of virtuosity and expression in music with some of the changes in musical parameters made for this thesis, as in Berndt’s work. Other similarities between these works and D3S can be found. As a final conclusion for this chapter, we present a table summarizing the features of these systems that influenced the D3S work. The D3S features will be explained in more detail in chapter 5.
Fig. 14 – Relation between discussed works and D3S
5. The sound of D3S
The big difference between scoring a movie and scoring a real-time storytelling environment is that these environments create the story in real time. While in movies there is time to select the scenes to score, compose music suitable for each scene, and then edit and synchronize the music with the images, in real-time environments this is much harder or impossible to do without the aid of a computing system. We can make some associations between what was presented in the film scoring chapter and what we could have in our system. For instance, it would be interesting to have character themes associated with each main character. It is also possible to use music to give hints to the viewer, mostly about what could happen in the story’s near future. For example, when the villain is about to make his entrance on the shadow theatre, the music can hint at it by starting to play his theme just before he enters. The villain theme should be a piece of music with sinister and scary characteristics, as other character themes would have their own characteristics.
In this thesis’ work, we will consider the use of pre-made music, since making a system that not only accompanies a story in an effective way but also composes music in real time is beyond the objective of this thesis. Moreover, pre-made music, mostly classical or instrumental, suits better the storytelling environments designed for a young audience.
5.1. Understanding and enjoying a story
Our main goal in this thesis is to increase the viewer’s enjoyment of a story generated in a storytelling environment. To enjoy viewing a story, in the majority of cases one needs to understand it. One could argue that a movie with many interesting special effects can be enjoyable to watch because of those features, even if we don’t understand what is going on in the story. However, if we understand the story, then we have the chance of enjoying it even more. By adding music to these storytelling environments, besides adding entertainment value, we also want to add understanding to the story so the user can enjoy watching it even more.
To understand a story, it is essential that we understand the characters in it. Then we need to:
• Know their roles - who is the hero, who is the villain, etc;
• Understand their actions – who is doing what;
• Perceive how they are feeling – not only individually but also towards other characters;
• Understand what is happening in the scene as the result of the characters’
interaction – knowledge about the environment’s valence and arousal.
Besides understanding the story, we also want the user to be entertained by hearing music he could enjoy. Music can help to understand the story, but if the viewer doesn’t like the way it sounds, then it will be harder to reach the final goal of increased enjoyment.
The scheme in Fig. 15 illustrates the relation between the main goal – enjoyment of the story –
and the features of the thesis. Those will be explained in the following sections of this chapter.
Fig. 15 – Story enjoyment and understanding
[Figure content: the scheme divides the system into Background Music – Character Themes (Hero, Villain and Friend Themes), Key Moments Themes (Duel, Friendship and Climax Themes), Music that emphasizes emotions (Hero Theme PH, PL, NL and NH) and Filler Music – and Event Sounds, connected through the parameters tempo and arousal, instruments and characters, and volume and relations.]
Note that there is also the entertainment factor of music – we want the music to sound good to the story’s viewers. That is guaranteed mostly through the use of pre-composed music for all the different background music (the use of pre-composed music is not explicit in this scheme).
Based on the features of the bottom layer of that figure, we can organize them into a generic D3S framework for scoring storytelling environments, shown in the next figure:
Fig. 16 – D3S Framework
Using the understanding and enjoyment factors as guidelines, in this chapter we will explain this
framework and present the theory behind the development of this work.
5.2. Event sounds and background music
As we have seen before, several films and animated movies use different types of music and sounds to score what is happening in them. We can notice certain resemblances between affective environments such as I-Shadows and animated movies: both use fictional characters made specifically for them; both focus on a younger audience; both have actions and feelings that are stereotyped through images – for example, if a character is in love with another, some red hearts may appear in the scene as evidence of that feeling.
Using the last resemblance, we can analyze how these actions are scored in animated movies: through underscoring. This effect consists of scoring actions throughout the story so as to emphasize them and make them more noticeable to the audience [24]. That way, we can select specific actions in the affective environment – which we will call events – and associate a piece of music or a sound to be played while the action is happening in the scene. We can thus underscore those events, paralleling the action with sounds.
Besides using music to score events, we also want another kind of music that, as we have seen, is present in almost every animated movie (especially the Disney classics) and is used in films as well: background music. This type of music may have different purposes and objectives: it can be used to emphasize the presence of certain characters in a scene; it can hint to the audience the intensity and importance of the actions happening in the story arc, such as the climax of the story; or it can simply be used as a filler to complement the visual aspect of the story [24] [27].
We can then conclude that there will be two parallel layers of music execution – event music and background music. They are considered parallel since both can be playing at the same time. The first is oriented to short musical sequences that represent certain actions; the second is pre-composed music played while the story develops.
5.2.1. Event Sounds
Since events are associated with actions made by characters, we classified the arguments of an event as follows:
Subject – the character that executes the action
Target – the character that is the target of that action
Action – the action of the event itself
Thus, for every event there is always a subject that executes it; there may or may not be a target (since some events are individual and do not relate to other characters); and there is the action, which describes the kind of event and therefore which music will be played to score it.
Event sounds help the user understand which actions are being performed and who is performing them. For example, when a character is trying to tell us that he is happy, he can express it by making a certain movement with his body. Along with that movement, a happy sound can be played to reinforce the fact that the character is happy.
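The event structure just described – a subject, an optional target, and an action that selects the sound – can be sketched as a small data type. This is a minimal, hypothetical Python sketch: the class name, action names and sound file names are ours, not taken from the D3S implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    subject: str                   # the character that executes the action
    action: str                    # the kind of event; selects the sound to play
    target: Optional[str] = None   # optional: some events are individual

# Illustrative action-to-sound table; the file names are invented.
EVENT_SOUNDS = {
    "express_happiness": "happy_jingle.wav",
    "attack": "attack_hit.wav",
}

def sound_for(event: Event) -> Optional[str]:
    """Return the sound scored for this event, or None if it has none."""
    return EVENT_SOUNDS.get(event.action)
```

For example, `sound_for(Event(subject="Hero", action="express_happiness"))` would select the happy jingle to accompany the hero's movement described above.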
5.2.2. Background Music
With background music, we wanted music that could offer all the features we have seen of the purposes of scoring stories, with a special focus on enhancing the understanding of the story. We can then classify background music in four different categories:
Character Themes
This music is played whenever an important character enters the scene for the first time. This decision was based on the character themes that are also present in many movies, as we have seen in chapter 3. The objective of this music is to create an impact on the audience when an important character makes its first appearance, revealing that it is an important character that will take an important role in the story.
To choose which characters would be worthy of having a character theme, we relied on Propp’s study [39]. He analyzed the basic plot components of Russian folk tales to identify common narrative elements among them. By analyzing the structure of those stories, he concluded that all the existing characters could be resolved into 7 different types: hero, villain, princess, donor, helper, dispatcher and false hero. In the context of this thesis, we will consider only three of them as main characters: the hero, the villain and the princess. Since the princess character might be limiting, we used a broader definition and called it the friend character. That way, it can still be a princess, but it can also be just a good friend of the hero, someone on the hero’s side. We pick only 3 characters so that it is easier for the audience to remember and associate a certain theme with a character. If every character in the scene had an associated character theme, it would be confusing and hard to understand who the main characters of the story are. Thus, those 3 characters – the Hero, the Villain and the Friend – have a dedicated character theme. With that, we have our first type of background music:
Hero Theme – played when the Hero enters the scene for the first time.
Villain Theme – played when the Villain enters the scene for the first time and whenever the Villain is in the scene without the Hero also being present.
Friend Theme – played when the Friend enters the scene for the first time or when it is alone in the scene.
The Hero Theme is also played when the Hero is alone in the scene. However, some changes are made to it according to the emotions felt by the Hero at that time, as we will see next.
Background music that emphasizes emotions
As we have seen before, music can be used to emphasize the emotions present in the scene at each moment. The D3S system uses this type of music through the hero: the hero theme is played when the hero is alone in the scene (which will most likely happen often, since he is the main character). To make the theme emphasize the emotions felt by the hero at that time, we associate a different hero theme with each kind of mood the hero may be feeling at each moment; each theme carries its associated emotions within it. Using the notion of valence and intensity seen before, and the notation used by the I-Shadows system to classify moods, the mood of the hero can be divided into 4 main mood intervals:
Mood Value Type of Hero Theme
[-10;-5[ Hero Theme with Negative Valence and High Intensity (Hero Theme NH)
[-5;0[ Hero Theme with Negative Valence and Low Intensity (Hero Theme NL)
[0;5[ Hero Theme with Positive Valence and Low Intensity (Hero Theme PL)
[5;10] Hero Theme with Positive Valence and High Intensity (Hero Theme PH)
Fig. 17- Association between Mood Values and the type of Hero Theme
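The interval table of Fig. 17 can be read as a simple selection function. A minimal sketch (the function name is ours, not from the D3S implementation):

```python
def hero_theme_variant(mood: float) -> str:
    """Map a mood value in [-10, 10] to one of the four hero-theme
    variants, with the interval bounds of Fig. 17."""
    if not -10 <= mood <= 10:
        raise ValueError("mood must lie in [-10, 10]")
    if mood < -5:
        return "Hero Theme NH"  # negative valence, high intensity
    if mood < 0:
        return "Hero Theme NL"  # negative valence, low intensity
    if mood < 5:
        return "Hero Theme PL"  # positive valence, low intensity
    return "Hero Theme PH"      # positive valence, high intensity
```

Note that the intervals are half-open on the right except the last, so a mood of exactly 0 already selects the positive low-intensity variant.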
To adjust the theme’s valence and intensity, we can make an analogy with some well-known parameters seen before [2]: valence can be associated with mode, the major mode corresponding to positive valence and the minor mode to negative valence. Intensity can be analogized with rhythm and volume: a faster rhythm and loud volume correspond to high intensity, while a slower rhythm and softer volume correspond to low intensity. These associations between the musical parameters of mode, rhythm and volume and the valence/intensity they might suggest were based on the studies discussed in chapter 3 – more specifically, on Thayer’s model of mood [18] and on the other studies relating musical parameters to emotions also discussed in that chapter.
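Assuming these associations, the valence and intensity of a theme variant could be translated into concrete parameter settings along the following lines. The numeric tempo and volume values are illustrative assumptions only, not taken from D3S.

```python
# Hypothetical translation of valence/intensity into musical parameters,
# following the associations above: mode <-> valence; tempo (rhythm speed)
# and volume <-> intensity.

def theme_parameters(positive_valence: bool, high_intensity: bool) -> dict:
    return {
        "mode": "major" if positive_valence else "minor",   # valence
        "tempo_bpm": 140 if high_intensity else 80,         # faster = more intense
        "volume": 0.9 if high_intensity else 0.5,           # louder = more intense
    }
```

Under this sketch, the Hero Theme NH variant, for instance, would be rendered in minor mode at the faster tempo and higher volume.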
Background music for key moments
Throughout the story there will be some key moments of great importance. Such importance should also be emphasized through music. We have seen examples in films where the type of music played was related to what was happening in the scene at that moment.
Using the existing knowledge of the common structure of these kinds of stories, and following Propp’s structure [39], we identified some key moments that have background music associated with them. Those moments are:
Duel Theme – played when at least the hero and the villain are in the scene.
Friendship Theme – played when the hero and the friend are alone in the scene.
Climax Theme – played when the story reaches its climax. Since the climax itself is just a single point in the story – the one that ends the rising-action act and starts the denouement act, for example when the villain is finally defeated – we had to find a way to fit in the playing of its music. It therefore starts playing when the rising-action act reaches its halfway point and stops at the halfway point of the denouement act.
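The timing rule for the Climax Theme can be sketched as follows. We assume the storytelling system reports the start and end times of the rising-action and denouement acts in some story-time unit; the function name is illustrative:

```python
def climax_theme_window(rising_start: float, rising_end: float,
                        denouement_start: float, denouement_end: float):
    """Return the (start, end) interval during which the Climax Theme
    plays: from the halfway point of the rising-action act to the
    halfway point of the denouement act."""
    start = rising_start + (rising_end - rising_start) / 2
    end = denouement_start + (denouement_end - denouement_start) / 2
    return start, end
```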
Fig. 18 - Changes of acts in I-Shadows
Background music as filler
When there is nothing relevant happening in a scene, a different type of background music is played. We call it General Music, since it is the music played when, for example, only secondary characters are present in the scene, and these will most likely not do anything as relevant to the development of the story as a hero or a villain would.
This General Music is the only type of music in the system that is not pre-composed: it is generated automatically by the existing Amadeus module of I-Sounds, which produces music for the scene according to the environment's characteristics at that time. In this specific case, the music changes according to the valence of the characters' mood, and those changes are reflected in the mode and rhythm. Amadeus is explained in more detail in chapter 6.
This kind of background music could also use pre-composed music instead of automatically generated music. We could take an approach similar to the one used for the hero themes: discretize the environment's valence/arousal space into 4 intervals and associate a different piece of music with each of them.
Fig. 19 - Possible discretization of the General Music
5.2.3. The use of silence
A common saying tells us that "silence is golden". Even though we have said that music is important to emphasize several aspects of the story, sometimes silence can be used to emphasize other aspects.
As mentioned before, suspense can be created through the use of certain kinds of music, like the drum roll played in the circus before a dangerous act. The use of silence can have the same effect. When the music suddenly stops after playing constantly for some time, it may raise questions and emotions in the audience. The simple question "why did the music stop?" may make the audience start imagining and guessing what is going to happen next, creating and building an effect of suspense [40].
In the concrete case of D3S, silence is used before the villain's first entry in scene. Since D3S is informed about a character's entrance before it happens, that anticipation time is used to stop the music currently playing. That way, when the villain music starts and the villain itself then appears, the impact is bigger and the evilness of the villain is emphasized. We can break the villain's entry into 3 steps:
1 – The music currently being played stops – this happens when the system receives the information that a villain is about to enter the scene. That way we can anticipate the entry and prepare accordingly by stopping the music.
2 – The villain theme starts playing – suggesting to the viewers that something important is about to happen in the scene.
3 – The villain enters the scene – the suspense ends, and the entrance makes a greater impact on the audience.
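These three steps can be sketched as a small event handler; the player interface (stop/play) is an assumption for illustration, not the actual I-Sounds API:

```python
class VillainEntrySilence:
    """Sketch of the three-step villain entry described above."""

    def __init__(self, player):
        self.player = player  # assumed to expose stop() and play(name)

    def on_entry_announced(self):
        # Step 1: the entry is announced in advance, so we stop the
        # current music to create a moment of silence.
        self.player.stop()

    def on_theme_start(self):
        # Step 2: the villain theme begins, suggesting something
        # important is about to happen.
        self.player.play("Villain Theme First Time")

    def on_villain_in_scene(self):
        # Step 3: the villain appears; the running theme scores the
        # entrance, so no further musical action is needed.
        pass
```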
5.3. The Artistic task
The choice of which music and sounds are played for each event is mostly an artistic task. One can use pre-composed music or music generated by systems such as Amadeus. We chose to use pre-composed music and sounds, since that increases the chance of making the result entertaining. There have been advances in the field of automatic music composition, with good examples such as EMI, but despite those advances we have not yet reached the quality of music created by human composers.
Since we want to keep the system open to changes in the music to be played, all the sounds and music are read from files identified by name. Each event has an associated file called "name.mid", where name is the action's name. The same happens with the background music, where there is also a MIDI file associated with each type of background music. In the current system, some of the event sounds were composed by the author of this thesis and others were adapted from existing motifs. Most of the background music was chosen from existing music, with care taken to find music with the characteristics explained above for each type of background music.
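The "name.mid" convention can be sketched as a simple file lookup; the directory name is an assumption for illustration:

```python
from pathlib import Path

def event_sound_file(action: str, base_dir: str = "sounds") -> Path:
    """Resolve the MIDI file for an event following the "name.mid"
    convention, where name is the action's name."""
    return Path(base_dir) / f"{action.lower()}.mid"
```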
However, beyond changing the nature of the music by choosing different music files, it is also possible to change some musical parameters dynamically. Those changes are explained in the next section.
5.4. Adding dynamism to music and sounds
Since D3S is used to score dynamic affective environments, the music and the sounds need to be dynamic as well, to adapt to the nature of those environments. Because of that, we need to find which musical parameters we can change and when and how to change them. Besides, we need to make associations and analogies between those musical parameters and other parameters found in the affective environment. The MIDI format is useful in this matter: it is easier to tweak parameters in MIDI than in formats such as WAV or MP3, since there are existing libraries that allow those changes.
5.4.1. Volume and Emotional Relations
The volume of music can be associated with the intensity of its output in decibels: loud music has a high decibel output and soft music a lower one. We can associate this intensity parameter with the other kind of intensity we have seen before – the intensity of emotions.
Making an analogy between the volume of the music and the intensity of the emotions, we can change the first according to the second. Taking one example of the events: if the character John gives a flower to the character Mary, a sound associated with the action offer is played. The volume of that sound can be higher or lower according to the relation between the two characters. If John and Mary have a strong relationship, the sound is played with a higher volume; if their relationship is neutral or weak, the sound is lower and less significant.
This way, we add dynamism to the event sounds and hint to the audience at the importance of an action through the volume of the sound scoring it.
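This mapping can be sketched as a scaling of the MIDI note velocity; the value range of the relation strength and the linear mapping are assumptions for illustration:

```python
def event_velocity(relation_strength: float) -> int:
    """Scale a MIDI velocity (0-127) by the strength of the relation
    between the event's subject and target, assumed in [0, 10].
    A neutral relation plays at a moderate velocity; a strong one
    approaches the MIDI maximum of 127."""
    strength = max(0.0, min(10.0, relation_strength))
    return 64 + round(63 * strength / 10)
```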
5.4.2. Instruments and Characters
Instruments also play an important role when we want the audience to associate them with a specific kind of character. This kind of association has already been made in the musical world; the most famous example is Peter and the Wolf, a composition written by Sergei Prokofiev in 1936.
Fig. 20 - Disney’s Peter and the Wolf – 1946 [41]
This composition is a children's story with both music and text written by the composer. The story is spoken by a narrator and accompanied by an orchestra. There are several characters in it, and each character has an instrument associated with it: the bird's lines are played by a flute, the duck's by an oboe, the cat's by a clarinet, and so on [42]. The main objective was that the listener could easily associate the music with the characters and thus have a better perception of what is happening in the story. For example, if the flute and the oboe are playing at the same time, we know that the bird and the duck are both in scene.
Using a similar analogy, in D3S we can associate instruments with the characters as well. However, since the background music can be complex and use different instruments at the same time, the event sounds are the best place to make these associations. For example, when John offers a flower to Mary, a specific instrument scores that action; if Mary offers the same or something else to John, a different instrument is used. Even if the sound played is the same, the use of a different instrument makes it possible for the audience to distinguish who gave the object to whom just by listening to the sound.
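The role-to-instrument assignment can be sketched with General MIDI program numbers. The concrete instruments (trumpet for the hero, cello for the villain, flute for the friend) follow the assignment described in chapter 6; the fallback scheme for secondary characters is an assumption:

```python
# General MIDI program numbers (0-based): 56 = trumpet, 42 = cello, 73 = flute.
ROLE_INSTRUMENT = {"hero": 56, "villain": 42, "friend": 73}

def instrument_for(role: str, registration_order: int) -> int:
    """Return the MIDI program for a character: main roles get the
    fixed instruments; secondary characters get one derived from
    their order of registration."""
    if role in ROLE_INSTRUMENT:
        return ROLE_INSTRUMENT[role]
    return registration_order % 128  # keep within the valid program range
```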
5.4.3. Tempo and Environment Arousal
The Tempo of a piece of music is its speed or pace [43]. When the Tempo changes, the duration of each note changes too, making the music slower or faster.
Usually, the tempo of music is measured in BPM (beats per minute), the same unit used to measure one's heart rate. When we talk about the environment's arousal, we talk about how agitated or calm that environment is at a given time, and the heart of someone who is agitated usually beats faster than that of someone who is calm.
We can then make an analogy between the tempo of a piece of music and the environment arousal – the intensity of the scene at each moment – playing the background music at a faster Tempo when the environment arousal increases and at a slower Tempo when it decreases. With these changes, we expect the tempo of the background music to follow Freytag's Pyramid model (Fig. 21), with the Tension variable associated with the Tempo of the music being played.
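As an illustration, a linear mapping from the environment arousal to the playback tempo could look like this; the arousal range and the endpoint tempi are assumptions:

```python
def tempo_bpm(arousal: float, calm_bpm: float = 70.0,
              excited_bpm: float = 140.0) -> float:
    """Map an environment arousal value, assumed in [-10, 10], to a
    tempo in beats per minute, interpolating linearly between a calm
    and an excited tempo."""
    a = max(-10.0, min(10.0, arousal))
    return calm_bpm + (excited_bpm - calm_bpm) * (a + 10) / 20
```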
Fig. 21 – Freytag’s pyramid with music’s Tempo variable
5.5. Summary
In this chapter, we explained how we could meet the goals of making the viewer better
understand a story and at the same time enjoy it more through the use of music and sounds.
We considered the use of two different and parallel layers while scoring a storytelling environment: the event sounds layer and the background music layer. Within the first, we have seen that we can associate a sound with an event that happens in the environment, making it possible to better understand which kinds of events are happening. Changing the instrument according to the character that performed the event can help the viewer understand who did it, and changes in volume can help convey the strength of the relation between two characters. Within the background music dimension, we considered the use of character themes to help the viewer recognize the main characters of the story. The hero theme can be changed according to his mood, and that change can be transmitted to the audience through a variation of the theme. Other themes can be used in key moments of the story, such as the duel, friendship and climax scenes. We also considered changing the background music's tempo to emphasize changes in the environment's intensity along the story. We have seen that background music can also serve as a filler to entertain the audience while no major events are happening in the environment. Finally, the use of silence was explained as a way of creating anticipation and suspense effects on the audience.
6. Architecture and Integration
In this chapter we give an overview of the system architecture and of the integration with the storytelling systems. We describe the main components of the base system used to develop the D3S features – the I-Sounds system – and explain the changes that were needed to adapt it to D3S. Afterwards, we explain how the music manager module Apollo works and what its purpose is in our work. We also describe how the integration was made with the two storytelling systems used to test this work – the I-Shadows and FearNot! systems.
6.1. I-Sounds
I-Sounds is an existing framework developed by Ricardo Cruz [44] that allows the scoring of storytelling environments. It was created with the purpose of scoring the storytelling environment I-Shadows, as a response to the need for a system that could manage all the sound resources and handle the connection with interactive storytelling systems.
It is a 3-layered system that uses the information received from the storytelling system to produce sound that accompanies the story created by that system. The layers are an Affective Layer, a Composition Layer and an Output Layer.
Additionally, an application-specific driver was developed to facilitate the communication with the storytelling environments, and an interface was developed to ease the testing of the system and to keep track of its changes while a story is being scored. In the following sections, we give more details about the driver, about each of the layers that compose the I-Sounds system, and about the interface.
6.1.1. Application Specific Driver
This driver is responsible for decoding the messages that come from the storytelling system and arrive at the I-Sounds system. It then makes all the changes needed in the Affective Layer according to the content of those messages. Each driver is dedicated to a specific storytelling system; to adapt the system to a different storytelling system, a new driver needs to be created and integrated with it. Below, we present examples of the messages understood by a driver. Notice that the majority of these messages can be used for any generic storytelling environment.
Register Actor – registers a new actor in the local environment. It is also used to inform the I-Sounds system whenever an actor enters the scene.
Establish Connection – establishes an initial connection between two actors. That connection is an affective one, used to update their emotional relations.
Remove Actor – sent whenever an actor leaves the scene.
Update Mood – updates the mood of a specific actor. The mood can be positive if the actor is feeling positive emotions (such as happiness or joy) or negative in the case of negative emotions (sadness, anger, etc.).
Fig. 22 – Original I-Sounds architecture [44]
Update Emotional State – updates the emotional state of an actor, giving more information about the intensity of each specific emotion being felt. For example, it can update the emotion "sad" for the character "John" with the value 10 (the maximum), meaning that the character is very sad.
Update Connection – updates the emotional connection previously created between two actors. We can, for example, tell the system that a character hates another character with an intensity of 10 while at the same time liking a third character with an intensity of 5.
Story State Message – updates the information about the story state, namely the mood, valence and intensity of the environment and the current story act. The story acts correspond to the acts of Freytag's pyramid, which allows the D3S system to know the current story state and anticipate changes in the music accordingly.
Event Message – sent whenever an event occurs, informing the I-Sounds system about the subject, target and action of that event.
The last two messages were added for D3S, since we needed to retrieve more information about the storytelling environment, related to the story states and the events performed.
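The driver's dispatch of these messages onto the affective-layer state can be sketched as follows; the dictionary-based state and the field names are assumptions, since the real driver decodes XML received over UDP:

```python
def handle_message(env: dict, msg: dict) -> None:
    """Dispatch one decoded message to the affective environment."""
    kind = msg["type"]
    if kind == "RegisterActor":
        env.setdefault("actors", {})[msg["name"]] = {"role": msg["role"], "mood": 0}
    elif kind == "UpdateMood":
        env["actors"][msg["name"]]["mood"] = msg["mood"]
    elif kind == "RemoveActor":
        env["actors"].pop(msg["name"], None)
    elif kind == "StoryState":
        env["act"] = msg["act"]  # act of Freytag's pyramid
    elif kind == "Event":
        env.setdefault("events", []).append(
            (msg["subject"], msg["action"], msg["target"]))
```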
6.1.2. Affective Layer
In this layer, the I-Sounds system keeps a record of all the entities present in the affective environment, as well as a record of the environment's properties. This registry helps it decide what kind of music to generate at each moment.
The information about each entity consists of its name, role, mood and instrument. The name and role are given by the storytelling system through the Register Actor message, and the mood is updated whenever needed through the Update Mood message. The instrument is assigned to a character according to its role. We decided that the main characters (hero, villain and friend) would have pre-assigned instruments (trumpet, cello and flute, respectively), making it easier to distinguish between them. All other characters are assigned a different instrument according to their order of registration in the system.
The information about the environment's properties consists of its mood, valence, intensity and the background music being played. There is also a registry of the existing actors and of the actors currently acting in scene. This distinction is needed because knowing who enters or exits the scene, and who is acting at a given time, determines which music is played.
This layer needed slight changes to accommodate the new information received from systems whose story development follows Freytag's Pyramid: the affective environment now also keeps a registry of the story state, making it possible to know which act of the story we are in at a given moment. For storytelling systems that do not follow Freytag's Pyramid, this parameter is not used to produce the background music.
6.1.3. Composition Layer
This layer gets information from the previous layer, creates the music and/or sounds to be produced based on that information, and sends the results to the next layer, the output layer.
The original I-Sounds system was able to compose music in real time according to mood changes, using mode and rhythm [44]. That music creation was the job of the module Amadeus, the composition algorithm implemented in the I-Sounds framework. It is able to compose short music segments representative of certain defined emotions. Those compositions are based on two parameters: mode and rhythm. The mode parameter follows the diatonic major and minor modes, and three notes are used to produce the melody of the sequences: the tonic, mediant and dominant scale degrees. The rhythm parameter follows the Just in Time theory by Eduardo Lopes, which is based on pulse salience and kinetics to produce different rhythmic sequences and associate them with emotions [45].
To make it possible for the system to use pre-composed music without changing Amadeus, we created another composer – Ludwig – which is responsible for the selection of pre-composed music.
To manage which composer is selected at a given time, a music manager – Apollo – was created. This manager bridges the affective layer and the composition layer: whenever a message is received and the information in the first layer has been updated, it selects the most appropriate composer to produce the sounds or music to be output. More information about Apollo is given later in this chapter.
Fig. 23 – Integration of Apollo
6.1.4. Output Layer
This final layer is responsible for reading the results produced by the composition layer and outputting them as music or sounds. Originally, the output layer could reproduce only one layer of sound. Since we needed two different layers for D3S – background music and events – changes had to be made so that the output could reproduce two layers of music at the same time. Since the MIDI protocol is used, we added a new synthesizer responsible for playing the event sounds, while the existing one plays the background music. Other changes were made to support the dynamic variation of musical parameters (the music's tempo, volume and instruments).
6.1.5. Interface
The I-Sounds system has an interface built for the purpose of keeping track of all the changes happening in the environment. It has four divisions, where we can track the affective entities in the environment and their respective emotion values, the current environment policy control, the composer being used, and the output handler active at a given moment.
A new division was added to the interface for the purpose of testing the D3S changes. Through it, it is possible to track the following variables:
Actors Registered – the list of actors registered in the I-Sounds system;
Events List – the list of all events received by the system since the start of its execution;
Current State – the current story state. It can have values between 1 and 4, each integer corresponding to an act. It can also track middle states between acts, which is useful if we want to consider changes that happen between acts, such as the climax of the story;
Environment Mood – the overall mood of the environment;
Hero Mood – the mood of the first hero that showed up in the story;
Music Being Played – the music currently playing in the environment, which can be heard by the users;
Music Value – the value of the music being played. Each musical theme has an associated value: the most relevant and important themes for the story have the highest values (such as the Hero Theme or the Villain Theme), and the less relevant ones have the lowest (such as General Music). After a certain defined time passes, this value is reset to 0 by a timeout trigger, to give less valuable themes the opportunity to be played. This music hierarchy is described later in this chapter;
Actors in Scene – the list of actors that have entered the scene and will directly affect the story's development.
6.2. Apollo – D3S Music Manager
Since there are many musical features we want to include, we need a way of coordinating them and integrating them fully with the I-Sounds system. To do that, the module Apollo was created.
Apollo is the module responsible for updating the affective environment and performing the music selection in D3S.
Fig. 24 – D3S Interface
It has three main functions:
- Updating the affective environment whenever a change occurs due to a new message received from the storytelling system.
- Choosing which changes in sound/music need to be made according to the new changes in the environment.
- Selecting which composer module – Amadeus or Ludwig – will be activated to generate the sound or music to play according to those changes.
6.2.1. Amadeus and Ludwig
Amadeus and Ludwig are the two composition modules available in the I-Sounds system. They have different natures: Ludwig is oriented towards pre-composed music, while Amadeus is oriented towards music composed in real time. Ludwig was created in this work, whereas Amadeus already existed.
As we have seen, there are 2 parallel paths when considering music execution – event sounds and background music.
Since the events are sounds reproduced whenever an action occurs, they were pre-composed so that they are always available in the same form, and the same action always has the same sound associated with it.
The background music can be either pre-composed or composed in real time, depending on the kind of music we intend to have. Pre-composed music has the advantage of generally sounding better and being more interesting to hear, but it has the disadvantage of having to be composed beforehand by someone. Music composed in real time is created while the story develops and can adapt better to changes in the environment, but it might not always sound as good as pre-composed music. As such, Ludwig plays all the music and sounds associated with special events, as well as the pre-composed background music, while Amadeus is responsible for generating and playing background music whenever there is no obvious pre-composed music for certain concrete situations, as explained below.
6.2.2. Input
The input to Apollo consists of variables given by I-Shadows whenever relevant changes occur in the story that may also justify a change in the sound.
These variables are:
Scene Valence – the valence of the current scene, in the interval between -10 and 10;
Hero Mood – the mood of the hero, in the interval between -10 and 10;
Event – a special action made by a character;
Scene Change – given whenever the story act changes; it can take values corresponding to the beginning and the middle point of each act: Exposition, Rising Action, Climax, Falling Action, Denouement;
Character Entry – when a character enters the scene;
Character Exit – when a character exits the scene.
6.2.3. Output
The output created by Apollo is given to the destination composer, which uses it to generate the sound to play. When Amadeus is selected, Apollo gives information about the scene mood, which Amadeus uses to generate its music. When Ludwig is selected, Apollo indicates which music to play, whether an event sound or another type of background music.
6.2.4. Internal Structure
Since the events and the background music follow parallel paths, there are also two corresponding paths in the manager. Whenever an event occurs, the function associated with the treatment of events is called. When a change occurs in the environment that may lead to a change in the background music, another function is called to analyze that occurrence.
To select the background music, Apollo uses a hierarchy-based music selection structure: a list ordered by the importance and relevance of the music to the story. The list, in descending order of relevance, consists of:
• Hero Theme First Time – theme played when the hero enters the scene for the first time
• Villain Theme First Time – theme played when the villain enters the scene for the first time
• Friend Theme First Time – theme played when the friend enters the scene for the first time
• Climax Music – music played when the story reaches its climax
• Hero Theme – theme played when the hero is the only character in scene. The characteristics of this theme are based on the current mood of the hero.
• Friend Theme – theme played when the friend is the only character in scene.
• Duel Theme – theme played when at least the hero and the villain are in scene.
• Friendship Theme – theme played when the hero and the friend are alone in scene.
• Villain Theme – theme played when the villain is in scene, either alone or with other characters. The hero cannot be present.
• General Music – music played when none of the above situations apply, i.e., when only secondary characters are in scene. This is the only situation in which Amadeus is called, since the music it generates is more basic than the pre-composed music. All the other states are handled by the composer Ludwig.
The hierarchy is defined according to the importance and relevance of each theme to the understanding of the story. More concretely, we want to guarantee that the viewers understand the characters' roles, their actions and emotional states, how they interact with and feel about other characters, and the key moments of the story.
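Apollo's hierarchy-based selection can be sketched as follows; the numeric values are illustrative assumptions, and only their ordering mirrors the list above:

```python
# Themes in descending relevance; the numeric values are illustrative.
THEME_VALUES = {
    "Hero Theme First Time": 10, "Villain Theme First Time": 9,
    "Friend Theme First Time": 8, "Climax Music": 7, "Hero Theme": 6,
    "Friend Theme": 5, "Duel Theme": 4, "Friendship Theme": 3,
    "Villain Theme": 2, "General Music": 1,
}

def select_background(applicable: set, expired: set) -> str:
    """Pick the highest-valued applicable theme whose value has not
    been reset by the minimum-playback timeout; fall back to the
    Amadeus-generated General Music."""
    candidates = [t for t in applicable if t not in expired]
    if not candidates:
        return "General Music"
    return max(candidates, key=THEME_VALUES.__getitem__)
```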
Fig. 25 - Input and Output of Apollo – The music manager
Fig. 26 - Internal Structure of Apollo for the Background Music
6.3. Integration with the storytelling systems
For the I-Sounds system to connect to a storytelling system, we designed a driver which makes that connection. This driver can be modified, or a new one created, to adapt I-Sounds to other systems. During the development of this thesis, we used two different storytelling environments to test the D3S features: the I-Shadows system and the FearNot! system. As we have seen before, I-Shadows consists of a theatre where computer-generated shadows interact with user shadows to create children-oriented stories [4]. FearNot! is another storytelling environment whose main theme is bullying: it creates scenarios where a child character is being bullied by others, and the user's objective is to help that character through the story [6][7].
6.3.1. Integration with I-Shadows
The integration between I-Sounds and I-Shadows required the creation of the following components:
On the I-Sounds side:
A driver – called the I-Shadows driver – responsible for detecting incoming messages from the I-Shadows system. Those messages are transmitted through the UDP protocol and organized in an XML structure, for better portability and easier integration. This driver existed originally in the I-Sounds system.
On the I-Shadows side, 3 components were created:
1. Director – the entity that knows what is happening in the storytelling environment and is responsible for sending messages to the sound system whenever something relevant happens in the story.
2. Sound Interface – this component contains all the information about the possible messages that can be sent to I-Sounds. It is accessed directly by the I-Shadows director, which orders the creation of a message whenever needed.
3. Message Structure – this component is responsible for serializing the messages and converting them into the XML structure, so that the I-Sounds system can then read and decipher them.
Fig. 27 - Communication between I-Shadows and I-Sounds
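The serialization of a message into XML (step 3 on the I-Shadows side) can be sketched as follows; the tag and attribute names are illustrative assumptions, since the thesis does not fix the exact schema:

```python
import xml.etree.ElementTree as ET

def serialize_event(subject: str, action: str, target: str) -> bytes:
    """Serialize an Event message into the XML form sent to
    I-Sounds over UDP."""
    root = ET.Element("message", {"type": "Event"})
    ET.SubElement(root, "subject").text = subject
    ET.SubElement(root, "action").text = action
    ET.SubElement(root, "target").text = target
    return ET.tostring(root)
```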
Example of interaction between I-Shadows and I-Sounds
Following, we present an example of interaction between these two systems. Fig. 28 shows the sequence of messages sent by I-Shadows and the resulting selections in I-Sounds (S1 and S2 denote the two output synthesizers):
1) Register Hero, Villain, Friend – the characters are registered and instruments are assigned;
2) Put Hero in Scene – the Hero Theme is selected (S1: play Hero Theme);
3) Put Friend in Scene – the Friend Theme is selected (S1: play Friend Theme), later replaced by the Friendship Theme (S1: play Friendship Theme);
4) Event: Hero grabs flower – the grab event is selected (S2: play grab sound);
5) Event: Hero offers flower to friend – the offer event is selected (S2: play offer sound);
6) Event: Friend expresses happiness – the express-happiness event is selected (S2: play express happiness sound);
7) Put Villain in Scene – the Villain Theme is selected (S1: play Villain Theme), later replaced by the Duel Theme (S1: play Duel Theme);
8) Story reaches climax – the Climax Theme is selected (S1: play Climax Theme);
9) Villain dies – the die sound is selected (S2: play die sound);
10) Villain exits scene – the Friendship Theme is selected again (S1: play Friendship Theme).
Fig. 28 – Interaction example between I-Shadows and I-Sounds
Whenever I-Shadows sends a message to I-Sounds, the message is read and decoded by the driver, which then updates the affective layer according to the information it contains. Apollo then checks whether there is a need to change the background music or to play an event sound. If so, it forwards the music selection request to Ludwig, which locates the music to play and forwards it to the output layer; the output layer makes the necessary changes and outputs it in MIDI format. Notice that S1 and S2 correspond to the synthesizers used to output the music or sound: as explained before, we need two of them because we want 2 layers of music playing at the same time, the background music and the event sounds. To simplify, we assume that the environment objects have already been registered in I-Sounds.
Below, we give more details about the 10 messages in the above example.
1) I-Shadows sends the message “Register Actor” for the Hero, Villain and Friend characters. The driver decodes the messages and registers those entities in I-Sounds, and the respective instruments are assigned to the characters. Note that at this moment no sound is output, since the scene is still empty of characters.
2) I-Shadows sends the message “Register Actor” again for the Hero character. Since this character is already registered in I-Sounds, the system puts it in scene. Apollo notices that a significant change occurred in the environment and selects the Hero Theme to be played, since the hero is alone in scene. The output layer then starts outputting the music.
3) Similarly, this message puts the Friend character in scene. Since another significant change in the environment occurred, Apollo decides to change the music to the Friend Theme. Every background music theme has a minimum reproduction time, which exists so that changes between themes are not abrupt when the environment changes quickly. When that time expires, the value of the theme currently being played is reset to zero; that way, high-valued themes (such as the character themes) do not play forever, and other themes also get the chance to be played. In this case, the Friend Theme plays for the minimum reproduction time. When it expires, its value is reset to zero, and Apollo checks whether another playable theme has a value above zero. It finds the Friendship Theme as the theme with the next highest value and sends it to the output layer for reproduction.
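The minimum-time and value-reset behaviour of the background themes can be sketched as follows. The names and the MIN_PLAY_TIME value are assumptions of ours for illustration; the thesis does not state the actual figures here.

```python
MIN_PLAY_TIME = 10.0  # seconds; assumed value, not the thesis' actual setting


class ThemeSelector:
    """Sketch of the background-theme selection described above."""

    def __init__(self):
        self.values = {}      # theme name -> current value
        self.current = None   # theme currently playing
        self.started_at = 0.0

    def set_value(self, theme, value):
        self.values[theme] = value

    def tick(self, now):
        """Return the theme that should be playing at time `now`."""
        if self.current is not None and now - self.started_at < MIN_PLAY_TIME:
            return self.current               # respect the minimum reproduction time
        if self.current is not None:
            self.values[self.current] = 0     # expired: reset its value to zero
        # pick the highest-valued theme still above zero, if any
        candidates = {t: v for t, v in self.values.items() if v > 0}
        if candidates:
            self.current = max(candidates, key=candidates.get)
            self.started_at = now
        return self.current
```

With the values above, a theme plays for the minimum time, its value is then reset to zero, and the next highest-valued theme takes over, mirroring what happens with the Friend and Friendship Themes in message 3.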
4) The storytelling system sends an Event message with the Hero as subject, a flower as target and grab as the action. Apollo detects this new message and forwards the sound associated with the grab action to be played by the output layer. Since the Hero is the subject, the event is played with the instrument associated with that character.
5) Another event is sent to I-Sounds, this time the offer event: the subject is still the hero, but the target changes to the friend. Note that the volume of this sound depends on the strength of the relation between the hero and the friend at that moment.
6) A third event is sent, corresponding to an express happiness action. The subject is the character performing it, and there is no target, since it is an action performed alone.
7) At this point, the villain enters the scene. The Villain Theme plays for the minimum reproduction time; after that, Apollo changes the background music to the Duel Theme, since at least the hero and the villain are in scene at that moment.
8) A Story State message arrives at I-Sounds with the information that the story has reached its climax. Apollo therefore chooses the Climax Theme as the next theme to be played.
9) The villain dies and the corresponding event message is sent. The associated sound is played with the instrument assigned to the villain.
10) After the villain’s death, I-Shadows removes it from the scene, leaving the hero and friend alone. Apollo therefore changes the music back to the Friendship Theme.
Note that throughout the story, the environment’s mood and arousal changed, along with the affective relations between the characters. As the environment’s arousal changes, so does the background music’s tempo: faster when the arousal is higher and slower when it is lower. The messages updating the environment and the characters’ relations are not represented in the scheme above.
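The arousal-tempo relation could be expressed, for instance, as a simple linear mapping. The BPM range and the linear form below are our own assumptions; the thesis does not specify the exact function here:

```python
def tempo_for_arousal(arousal, base_bpm=100, span=40):
    """Map the environment's arousal in [0, 1] to a tempo in BPM.

    Higher arousal -> faster tempo; lower arousal -> slower tempo.
    """
    arousal = max(0.0, min(1.0, arousal))        # clamp out-of-range values
    return base_bpm + span * (2 * arousal - 1)   # base_bpm - span .. base_bpm + span
```

With these assumed defaults, an arousal of 0 maps to 60 BPM, 0.5 to the base 100 BPM, and 1 to 140 BPM.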
6.3.2. Integration with FearNot!
A driver for I-Shadows already existed in the original I-Sounds system, but a new one, adapted to the needs of this new system, had to be created for FearNot!. Besides, on the FearNot! side we needed not only to implement the SoundInterface and Messages components, but also to adapt the director existing in I-Shadows, making it responsible for creating a message whenever one was needed. Since both the I-Shadows and FearNot! systems use the OCC-based architecture FAtiMA [5], this adaptation was possible without major problems.
Interaction between FearNot! and I-Sounds
The story structure in FearNot! is different from I-Shadows: it follows a sequence that alternates episodes with coping parts. The episodes have no user interaction, and the characters of the story interact only with each other. The coping parts consist of a short chat between the bullied child and the user, in which the latter can give advice to the child.
In the next image we can see the flow chart of the story sequence, where 1, 2 and 3 are the scenes with no interaction, interleaved with the coping parts. The content of the final episode depends on the choices the user makes about coping strategies along the story.
Fig. 29 - FearNot! Flow Chart [35]
Legend:
- Introduction (I): type-in of code, name, age and gender; introduction of characters and school
- Bullying episodes (1-3)
- Between episodes: interaction with the victim character in the resource room (cope)
- Educational message (F): after the end of episode 3
Even though the structure is different, it does not change the way D3S interacts with the storytelling environment. During the episodes, the process is very similar to that of an I-Shadows story. Between episodes, all the actors are removed from the scene, so it remains silent while the user chats with the bullied child. FearNot! can therefore be seen as a set of small stories (the episodes) interleaved with the silent chat parts.
6.4. Summary
In this chapter, we described the architecture of I-Sounds and the changes made to it so that the new features of D3S could be integrated. We described the D3S music manager, Apollo, the module responsible for updating the affective environment whenever a change occurs, choosing the sound or music to be played, and selecting which composer module will generate the output. We defined its structure and described a new composer module, Ludwig, and how it is used alongside the existing module Amadeus: the former uses pre-composed music, while the latter composes music in real time. We presented an example of interaction between the I-Shadows and I-Sounds systems and the changes that happen in every layer of I-Sounds during that interaction. Finally, we explained how the integration was made with the two storytelling systems that served as the testing base for this thesis work: the I-Shadows and FearNot! systems.
7. Evaluation
With this dissertation work, we intended to find a solution to the problem of scoring a story being created in real time, using sounds and music in a way that adds understanding of what is happening in the story and consequently increases the viewer’s enjoyment.
In the previous chapters we discussed features that together constitute a solution to that problem. We can summarize those features in the following list:
1) Themes associated with characters
2) Association between characters and instruments
3) Use of sounds to underscore events
4) Change of a character’s theme according to mood
5) Volume of the sound associated with events
6) Music for key moments: Friendship Theme, Duel Theme, Climax Theme
7) Background music tempo
In this chapter we present the evaluation model followed to validate these features. To perform that validation, the features were evaluated individually in independent tests. Additionally, we also tested the system as a whole, to guarantee that the main goal of the thesis was met. The storytelling environment used for the evaluation was the I-Shadows system.
To obtain data for the different tests, all the experiments were gathered in online forms answered by the participants; we created 9 different versions of the form. The participants watched the videos and heard the sounds online. The sample was collected among university students aged between 18 and 30. We understand that the ideal would be to have answers from I-Shadows’ target age group, around 10 years old; unfortunately, it was not possible to gather a significant number of users of that age.
Note: even though several different variables are tested, we assume that they are independent of each other. That way, the evaluation of the experiments can be simplified.
7.1. General system evaluation - D3S and story
enjoyment
The objective of this experiment is to validate whether D3S brings a better global understanding of the story and consequently improves the viewer’s enjoyment while watching it, which was the main goal of this thesis. We evaluate whether D3S makes the story more interesting to viewers by comparing it with the absence of any sound system and with the Amadeus composing system from the original I-Sounds project [44], which uses only real-time composed music.
7.1.1. Method
Design
The independent variable of this experiment is the type of scoring system accompanying the storytelling environment, while the dependent variable is the enjoyment of the story. This experiment followed a repeated-measures design and the data collected consist of 5-point Likert-scale scores (0-5).
Participants
A total of 62 participants answered this experiment. They were asked to answer under all the conditions of the test, so there were 62 answers for each of the Silent, Amadeus and D3S versions.
Procedure
Three videos were shown to each participant:
- a first video showing a complete story, with no background music played during its exhibition;
- a second video where the story is scored by Amadeus;
- a third video of the same story where all the D3S features are present.
At the end, the participants were asked to rate their interest in each video on a 5-point Likert scale (0-5), where 0 corresponded to “Not Interesting” and 5 to “Very Interesting”.
To avoid any order influence on the answers, the order of the videos varied among the 9 versions.
7.1.2. Results
In this first experiment, D3S obtained the highest mean interest (3.21 out of 5), the silent version the lowest (1.73 out of 5) and Amadeus the middle result (2.27 out of 5).
The comparison between the three conditions can be seen in the box plot in the next figure:
Fig. 30 - Different scoring systems and associated interest.
A non-parametric Friedman test was performed on the data. The results show that the level of interest was significantly affected by the type of scoring system used, χ2(2) = 75.8, p < 0.001.
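For illustration, the Friedman statistic used here can be computed by hand from per-participant rank sums. The ratings below are invented for the example; they are NOT the study’s data, and a real analysis would also apply a tie correction and look up the p-value (e.g. with scipy.stats.friedmanchisquare):

```python
def friedman_chi2(*conditions):
    """Friedman chi-square over k related samples (no tie correction).

    Each condition is one list of ratings; position i in every list
    belongs to the same participant (repeated-measures design).
    """
    k, n = len(conditions), len(conditions[0])
    rank_sums = [0.0] * k
    for i in range(n):  # rank each participant's row of k ratings
        row = [c[i] for c in conditions]
        for j, v in enumerate(row):
            less = sum(1 for w in row if w < v)
            equal = sum(1 for w in row if w == v)
            rank_sums[j] += less + (equal + 1) / 2  # mid-rank handles ties
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)

# Invented example ratings (one per participant and condition):
silent  = [2, 1, 2, 1, 3, 2, 1, 2]
amadeus = [2, 3, 2, 2, 3, 2, 3, 2]
d3s     = [4, 3, 4, 3, 4, 3, 4, 4]
print(friedman_chi2(silent, amadeus, d3s))
```

A statistic above the chi-square(2) critical value of 5.99 would, as in the experiment above, indicate a significant effect of the condition at the 0.05 level.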
7.1.3. Discussion
The results of this experiment tell us that using a musical system to score a storytelling environment increases the interest of watching a story being created there; the results were validated with a significance value of p < 0.001. Participants preferred D3S over Amadeus, which suggests that the new features included in this project make the stories more interesting to viewers. This increase in interest may be related to the use of pre-composed music in D3S, as opposed to the real-time composed music used by Amadeus: the final result is more pleasant to the listener’s ears, which may have significantly influenced the interest ratings.
7.2. Background music’s tempo and
environment’s intensity
The objective of this experiment was to test whether changes in the background music’s tempo translate into a better perception of the story’s intensity. We aimed to study whether there is a relation between the music’s tempo and the environment’s intensity: specifically, whether a fast tempo induces the perception of greater environment intensity and a slow tempo induces the opposite.
7.2.1. Method
Design
The independent variable of this experiment is the background music’s tempo, while the dependent variable is the perception of the environment’s intensity. This experiment followed an independent design and the data collected consist of 5-point Likert-scale scores (0-5).
Participants
A total of 62 participants answered this experiment, each watching one version with a different background music tempo: 21 watched a video with a slow tempo, another 21 watched a video with a medium tempo and the remaining 20 watched a video with a fast tempo.
Procedure
The experiment participants were divided into 3 groups:
- We showed a video of a story segment with a slow music tempo to the first group;
- To the second group we showed a video of the same story segment, but with a medium music tempo;
- The same story segment was shown to the third group, with a fast music tempo.
At the end, we asked the participants to rate their perception of the story’s intensity in each video on a 5-point Likert scale (0-5), where 0 corresponded to “Not Intense” and 5 to “Very Intense”.
7.2.2. Results
In this experiment, the highest mean intensity belonged to the medium-tempo version (3.05 out of 5), while the slow-tempo version got the lowest (2.57 out of 5) and the fast-tempo version the middle result (2.80 out of 5).
The comparison between the three conditions can be seen in the box plot in the next figure:
Fig. 31 - Different music’s tempo speeds and associated perception of the story intensity
A Kruskal-Wallis test was applied to the data. Analyzing its results, we cannot affirm that the change in the music’s tempo significantly affected the perception of the environment’s intensity, since the result p = 0.465 is greater than the acceptance value of 0.05.
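As a between-groups counterpart to the Friedman test used earlier, the Kruskal-Wallis H statistic can likewise be computed by hand, ranking all ratings together. Again, the ratings below are invented for the example and are NOT the study’s data (and no tie correction is applied):

```python
def kruskal_h(*groups):
    """Kruskal-Wallis H over independent groups of ratings (no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    def mid_rank(v):
        # average rank of value v in the pooled, sorted sample (handles ties)
        first = pooled.index(v) + 1
        return first + (pooled.count(v) - 1) / 2

    total = sum(sum(mid_rank(v) for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Invented example ratings, one group per tempo condition:
slow   = [2, 3, 2, 3, 2]
medium = [3, 3, 4, 2, 3]
fast   = [3, 2, 3, 4, 2]
print(kruskal_h(slow, medium, fast))
```

An H below the chi-square(2) critical value of 5.99 indicates no significant difference between the groups at the 0.05 level, matching the non-significant outcome of this experiment.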
7.2.3. Discussion
The results of this test do not allow us to conclude that there is an association between music tempo and perceived environment intensity, since the significance value was above the 0.05 threshold. However, even though the results were inconclusive, we still believe the association exists. Several reasons may have led to the inconclusive results:
- The design of this experiment is independent, so each participant saw and rated only one video. The music used was one of the pieces assigned to the climax part, which already has an intense character; even at a medium tempo, the perceived intensity may be high. Participants who watched the medium-tempo video may therefore have rated the intensity with a four or five on the 5-point scale, inflating the results for the medium version.
- Participants may understand the word “intensity” differently. The term is ambiguous: some may relate it to being agitated or calm, while others may associate it with the number of characters or actions happening in the scene, or with other factors.
- The fact that this experiment came right after the first one may have had an unexpected impact on the answers. The first experiment asked participants to rate videos on a 5-point interest scale; in this experiment the question changed, but the scale stayed the same, and participants may not have noticed the change, answering as they would on an interest scale.
- Finally, it may be easier to perceive a change in the music’s tempo than to quantify its intensity: what one participant considers intensity 3, another may consider intensity 1. If instead we changed the tempo within a single video and asked about the intensity at its beginning and at its end, the results might agree better with the theory behind this experiment.
7.3. Association between characters and
instruments
The objective of this experiment was to test whether the association between a character and an instrument helps the viewer understand which character performed a certain action. We wanted to analyze whether participants instinctively associate an instrument with a character and can recognize the characters by hearing that instrument. That way, the listener could also better distinguish the characters’ actions by associating them with the respective instruments.
7.3.1. Method
Design
The independent variable of this experiment is the association between characters and instruments, while the dependent variable is the understanding of who performed an action. This experiment followed an independent design and the data collected consist of a score.
Participants
A total of 62 participants answered this experiment; they all watched the same video.
Procedure
We showed the participants a video in which several events are performed by characters with different associated instruments: a trumpet, a flute and a cello for the hero, his friend and the dragon, respectively.
After the video, the participants heard 3 events, each played by one of the instruments. The first was an “express happiness” event played by the trumpet; the second was also an “express happiness” event, but played by the flute; the last was a “show villainy intention” event played by the cello. These events consist of movements made by the characters that represent the respective intention of either expressing happiness or showing villainy. We then asked the participants to identify which character produced each sound they heard. This test was the same in all 9 versions.
7.3.2. Results
To analyze the results, each participant was assigned a score according to their answers, as follows:
0 – No characters recognized correctly
1 – One character recognized correctly
2 – Two characters recognized correctly
3 – All characters recognized correctly
Analyzing the results according to this score, the mean was 2.42 out of 3, a positive value. Considering scores of 0 and 1 as negative and scores of 2 and 3 as positive, 81% of the participants had a positive score, while 19% had a negative one.
Additionally, we can analyze the individual recognition rates for each character. The hero (boy) and his friend (the girl) were correctly recognized by 73% and 74% of participants, respectively, while the villain (dragon) got the highest rate, recognized by 92%. The Chi-Square test showed a significance of p < 0.001 for all characters and for the score. The next figures show the recognition percentages for each character and the percentage of participants in each score category:
Fig. 32 – Percentages of recognition
Fig. 33 – Frequency percent of score
7.3.3. Discussion
As the results show, about 81% of the participants could recognize at least 2 of the 3 characters. The biggest doubts in the recognition process concerned the sounds of the hero (boy) and the friend (girl). When this experiment was designed, the boy and the girl were given the same event sound: both played the “express happiness” event, only with different instruments. This may explain the higher recognition rate for the dragon and the lower, similar rates for the boy and girl: participants may recognize a musical motif more easily than an instrument, and so may have found it hard to distinguish the same motif played by two different instruments. That did not happen with the dragon, whose motif was different. Another explanation may be the timbre of the instruments: the trumpet and flute may sound closer to each other than to the cello, since the first two use air to produce sound while the latter uses the vibration of strings.
7.4. Themes associated to characters
The objective of this experiment was to test whether associating a theme with a character helps identify the characters’ roles in the story, especially the main characters. The test targeted the hero, friend and villain themes, which correspond to the main character types considered in this thesis work. We then checked whether the chosen themes fit the type of character they represent.
7.4.1. Method
Design
The independent variable of this experiment is the background theme associated with each main character, while the dependent variable is the understanding of the characters’ roles. This experiment followed an independent design and the data collected consist of a score.
Participants
A total of 62 participants answered this experiment: 35 watched the versions with sound, while the remaining 27 watched the versions without sound.
Procedure
The experiment participants were divided into 2 groups:
- The first group watched a video in which 4 characters appeared in a defined order. All the characters had the boy’s shape but different colors: blue, yellow, red and green. The same shape was used for all of them to minimize the influence of shape on the participants’ decisions, and white and black were avoided because they are easily associated with good and evil, respectively. No sound was played during the video.
- The second group watched another video in which the same 4 characters appeared; however, the hero, friend and villain themes were played. The order of the themes varied among the different versions.
Fig. 34 – Characters used in this experiment
7.4.2. Results
To analyze the results, each participant was assigned a score according to their answers, as follows:
0 – No themes recognized correctly
1 – One theme recognized correctly
2 – Two themes recognized correctly
3 – All themes recognized correctly
Analyzing the results according to this score, the mean was 1.71 out of 3, a positive value. However, the Chi-Square test for the score gave a non-significant value of p = 0.543, and also non-significant values for the recognition of the hero and friend themes (p = 0.866 for both). It was, however, significant for the recognition of the villain theme (p = 0.028). The next figures show the recognition percentages for each theme and the percentage of participants in each score category:
Fig. 35 – Percentages of recognition
Fig. 36 – Frequency percent of score
Additionally, we checked whether the order in which the characters were presented influenced the answers. A Kruskal-Wallis test was applied to verify this; the non-significant result of p = 0.108 indicates that the order was not relevant.
In the experiment versions without sound, we can also try to establish a connection between the characters’ types (hero, friend and villain) and colors (blue, yellow, red and green). The Chi-Square test gave significant values for all the characters (p = 0.009, p < 0.001 and p = 0.013 for the hero, friend and villain, respectively). Below, we can see how the participants picked colors for each character:
Fig. 37 – Hero, Friend and Villain colors
Legend: 1 – Blue; 2 – Yellow; 3 – Red; 4 – Green
7.4.3. Discussion
In the versions with sound, the villain theme was the only one successfully recognized by a significant share of participants (69%); it was the only one with a significance value within the acceptance threshold (p = 0.028 < 0.05). The hero and friend themes were hard to distinguish in this experiment. The reason may lie in the musical characteristics of the two themes, which are close to each other: both are in the major mode and are happy themes, while the villain theme, by contrast, is in the minor mode and has a sinister nature. It may also be hard for participants to distinguish characters through the theme alone: the visual aspect is also very important, and since all the characters had the same shape and differed only in color, it was hard to tell them apart.
In the versions without sound, the predominant color choices were blue, yellow and red for the hero, friend and villain, respectively. However, these were also the first three colors presented to the participants; with no sound and nothing else to base their decisions on, they may simply have picked the colors in the same order as the questions posed to them. The high number of red answers for the hero can also be explained by the use of that same color in the previous videos, which may have influenced these answers.
7.5. Use of sounds to underscore events
The objective of this experiment was to test whether underscoring events with sounds helps viewers understand the events happening in the story. By adding or removing the sound associated with an event, we wanted to study whether that sound helps to understand the action performed by the character responsible for it. We then compare the effect of the presence of an event sound with its absence.
7.5.1. Method
Design
The independent variable of this experiment is the use of sounds to underscore events, while the dependent variable is the understanding of which action was performed. This experiment followed an independent design and the data collected consist of a frequency.
Participants
A total of 62 participants answered this experiment: 35 watched the versions without sound, while the remaining 27 watched the versions with sound.
Procedure
The experiment participants were divided into 2 groups:
- The first group watched a video of a scene in which the characters perform 4 actions, none of which has an associated sound.
- The second group watched another video in which the same actions happen, but this time each was underscored by the respective sound.
The last action is the death of the dragon. While in the first video no sound accompanies it, in the second video a motif of the “Funeral March” from Frédéric Chopin’s Piano Sonata No. 2 in B-flat minor, Op. 35 is played, a piece known in popular culture as the music usually associated with death.
After watching the video, each participant was asked what they thought had happened to the dragon at the end, choosing between the following options: fell, ran away, died, fell asleep or none of the above.
7.5.2. Results
About 86% of the participants who watched the video without sound recognized the event in it; in the video with sound, 96% recognized it. However, the Mann-Whitney test gave a result of p = 0.166, which tells us that this difference in recognition is not significant for this test.
Fig. 38 – Experiment 5 results
7.5.3. Discussion
Even though recognition of the event increased in the version with sound, the inference test showed the difference was not significant (p > 0.05), so this test is inconclusive. This may be explained by the fact that the movement representing the “die” event (the dragon falling down the screen and rotating 45º at the end) may already be a good enough representation of dying. Besides that, the character that died was the dragon, a natural villain, which may have influenced the answers, as its death would seem natural. We still believe that sounds help the understanding of events, but for the effect to be significant they would have to be applied in a more ambiguous situation: for instance, a scenario where the character that dies is not the dragon but another one (such as the boy or the girl), with a more ambiguous movement as well.
7.6. Volume and relations between characters
The objective of this experiment was to test whether changes in volume translate into a better perception of the strength of the relation between characters. We intended to test the assumption that a high volume corresponds to a strong relation between two characters, while a lower volume corresponds to a weak one. In this experiment we changed the volume parameter across several videos and checked whether the participants’ answers changed accordingly.
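The assumed volume-relation mapping can be sketched as a simple linear function. The MIDI velocity range and the linearity below are illustrative assumptions of ours, not the thesis’ actual parameters:

```python
def event_volume(relation, min_vel=40, max_vel=127):
    """Map a relation strength in [0, 1] to a MIDI velocity.

    Stronger relation -> louder event sound; weaker relation -> quieter.
    """
    relation = max(0.0, min(1.0, relation))  # clamp out-of-range values
    return round(min_vel + (max_vel - min_vel) * relation)
```

Under this sketch, an event between two characters with a strong relation plays near the maximum velocity, while one between near-strangers plays near the minimum, which is the contrast the experiment below tries to make perceptible.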
7.6.1. Method
Design
The independent variable of this experiment is the volume of the sound associated with events, while the dependent variable is the perceived strength of the relation between the characters. This experiment followed an independent design and the data collected consist of a frequency.
Participants
A total of 62 participants answered this experiment: 20 watched the video without any sound, 15 watched the same-volume version, 13 the Low-High volume version and 14 the High-Low volume version.
Procedure
The experiment participants were divided into 4 groups:
- The first group watched a video in which the boy sends two hearts, first one to the girl and then one to the fairy. The heart is the icon that represents the “show love intention” event. At the end, the girl and the fairy leave the scene and the boy is left alone. No sound accompanies these events.
- The second group watched the same video, but with the sending of each heart underscored by the respective sound, at the same volume for both.
- The third group watched the same video, with the sound of the first heart (girl) louder than that of the second (fairy).
- The fourth group also watched the same video, with the sound of the first heart (girl) quieter than that of the second (fairy).
At the end, we asked the participants who they thought the boy liked most: the girl, the fairy, or both the same.
7.6.2. Results
In the silent version, the predominant answer was “both”, with 60%; the girl got 30% of the answers and the fairy only 10%.
In the version where the events were scored with sounds at the same volume for both hearts, 73% of the participants answered “both”, while 27% answered that the boy liked the girl more; no participant answered that he liked the fairy more.
In the version with a lower volume for the girl and a higher one for the fairy, the girl got 15% of the answers, the fairy 62% and “both” 23%.
In the version with a higher volume for the girl and a lower one for the fairy, the girl got 86%, the fairy 7% and “both” 7%.
A Kruskal-Wallis test applied to the results gave p = 0.001, which tells us that the answers changed significantly according to the video shown, validating our results. A box plot of the answers follows:
Fig. 39 – Experiment 6 box plot
Legend:
Conditions: Silent – video without any sound; Same Volume – same volume on both hearts; LH – lower volume for the girl’s heart, higher for the fairy’s; HL – higher volume for the girl’s heart, lower for the fairy’s.
Answers: Girl – the boy liked the girl more; Fairy – the boy liked the fairy more; Both – the boy liked both the same.
7.6.3. Discussion
This test confirmed our assumption that sound volume conveys the strength of the relation
between characters. The results were validated by a significance value of p = 0.001. In the
silent and same-volume versions, participants tended to answer that the boy liked both the
same, since they could not perceive any difference between the two actions. Even so, the girl
received a higher percentage of answers than the fairy, as she would be the more natural
character for the boy to like. When the volume was changed, the doubts dissipated and the
majority of answers fell on the side of the higher volume (62% and 86%). However, in the
version where the volume was higher for the fairy, some participants (15%) still answered that
the boy liked the girl more, probably because it seemed the more natural answer.
7.7. Character theme and mood
The objective of this experiment was to test whether changing the character's theme according
to his mood helps viewers understand how that character is feeling. To test this, we presented
the participants with three different situations in which the music played was different, and
we then analyzed whether these changes affected the answers about the character's mood.
7.7.1. Method
Design
The independent variable of this experiment is the change of the character's theme according
to mood, while the dependent variable is the understanding of the character's mood. This
experiment followed an independent design and the data collected consist of frequencies.
Participants
A total of 62 participants took part in this experiment: 20 watched the silent version, 21
watched the version with a happy theme and 21 the version with a sad theme.
Procedure
The video used in this experiment was the same as in the previous one. However, this test
focuses on its last part, when the boy is alone in the scene after sending the hearts to the
two other characters.
The test participants were divided into 3 groups:
- In the first group, the participants saw a silent video.
- In the second group, the participants saw a video where in the end - when the boy was alone -
a theme corresponding to a happy mood was played.
- In the last group, the participants saw a video where the ending theme corresponded to a sad
mood.
In the end, we asked the participants whether they thought that, in the end, the boy was very
happy, happy, sad, very sad, or felt some other feeling.
7.7.2. Results
In order to better analyze the results, the answers were grouped into 3 segments according to
their valence: neutral (other feeling), negative (sad and very sad) or positive (happy and
very happy).
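This grouping step amounts to a simple mapping from the five answer options to three valence segments; the string labels below are hypothetical renderings of the form's options:

```python
from collections import Counter

# Hypothetical answer labels; the mapping collapses the five options
# into the three valence segments described above.
VALENCE = {
    "very happy": "positive", "happy": "positive",
    "sad": "negative", "very sad": "negative",
    "other feeling": "neutral",
}

def group_by_valence(answers):
    """Count answers per valence segment."""
    return Counter(VALENCE[a] for a in answers)

counts = group_by_valence(["happy", "very sad", "sad", "other feeling"])
print(dict(counts))
```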
In the silent version, we can see that the negative valence got 65% of the answers, while the
positive valence got 20% of them. The remaining 15% corresponded to the neutral valence.
In the version where a happy theme was played, 81% of the answers were on the positive
valence, while 19% of the answers were on the negative side. There was no answer on the
neutral valence for this type of video.
In the version where a sad theme was played, 95% of the answers were on the negative valence
and 5% on the positive. There were no answers on the neutral valence for this type of video.
A Kruskal-Wallis test was applied to the results and we obtained a value of p < 0.001, which
indicates that the results are significant. The following box plot illustrates the results:
Fig. 40 – Experiment 7 results
7.7.3. Discussion
This test proved the initial hypothesis that the nature of the theme played can influence the
perception of a character's mood in a story. The results were validated by a significance
value of p < 0.001. The sad theme was less ambiguous than the happy theme: it got 95% of the
answers on the negative valence, as expected, while the happy theme got fewer (81%) on the
positive valence. When no music was played, the fact that 65% of the participants answered on
the negative valence might be related to two factors. First, the video began with the boy
sending hearts to the two other characters, who then left the scene, leaving him alone;
participants may have interpreted these events as making the boy sad, since the other
characters did not reciprocate his hearts. The second factor might be the association between
loneliness and sadness: since humans are social beings, they do not like to be alone, and
loneliness may therefore be associated with sadness.
7.8. Music associated to story’s key moments –
Friendship and Duel Themes
The objective of this experiment is to test whether the use of music associated with key
moments, such as friendship and duel scenes, helps viewers understand what is happening in the
story.
7.8.1. Method
Design
The independent variable of this experiment is the use of the friendship / duel themes, while
the dependent variable is the understanding of what is happening in the story. This experiment
followed an independent design and the data collected consist of frequencies.
Participants
A total of 62 participants took part in this experiment. Six different videos were used, and
the participants were distributed as follows:
Fig. 41 – Participants distribution
Characters Music Participants
Boy and Dragon Silent 8
Boy and Dragon Friendship Theme 13
Boy and Dragon Duel Theme 14
Boy and Fairy Silent 7
Boy and Fairy Friendship Theme 7
Boy and Fairy Duel Theme 13
Procedure
In this experiment, we showed the participants a video where two characters were moving and
interacting with each other in an ambiguous way. The characters that interacted and the music
played in the background varied among 6 different videos, which were shown to different groups
of participants:
1) Silent video with the boy and the dragon
2) Video with the boy and the dragon with the duel theme
3) Video with the boy and the dragon with the friendship theme
4) Silent video with the boy and the fairy
5) Video with the boy and the fairy with the duel theme
6) Video with the boy and the fairy with the friendship theme
In the end, we asked the participants what they thought the characters were doing: playing or
fighting.
7.8.2. Results
Next, we present the results for each video, showing the percentage of participants who gave
each answer – playing or fighting.
The video with the boy and dragon without any sound got 50% for playing and 50% for fighting.
The video with the boy and dragon scored with the duel theme got 7% for playing and 93% for
fighting. The video with the boy and dragon scored with the friendship theme got 100% for
playing and 0% for fighting. The video with the boy and fairy without any sound got 100% for
playing and 0% for fighting. The video with the boy and fairy scored with the duel theme got
15% for playing and 85% for fighting. The video with the boy and fairy scored with the
friendship theme got 100% for playing and 0% for fighting.
To analyze the significance of this test, we want to see not only whether the change of theme
between the videos (friendship and duel) influenced the answers, but also whether the change
of characters (dragon and fairy) did. We therefore applied the Kruskal-Wallis test to these
two dimensions of independent variables: themes and characters. The first got a result of
p < 0.001, which makes it highly significant, while the second got a result of p = 0.542,
which makes it not significant. Additionally, a third Kruskal-Wallis test verified that the
changes were significant among all 6 versions of the videos, with a significant result of
p < 0.001.
The following two graphs illustrate how the answers differed according to the change of theme:
Fig. 42 - Answers for the videos with the boy and the dragon
Fig. 43 - Answers for the videos with the boy and the fairy
7.8.3. Discussion
Since the change of characters did not have any influence on the answers (non-significant
Kruskal-Wallis test, p > 0.05), we can conclude that only the change of theme influenced the
answers. This influence was enough to completely change the participants' opinion about what
the characters were doing.
The success of this test might be associated with the ambiguity of the videos, which gave the
music more room to influence the participants. Since the viewers could not clearly tell what
the characters were doing just by looking at their movement, they had to rely on cues from
their hearing. Therefore, the music being played strongly affected their opinion about the
interaction of those characters.
7.9. Music associated to story’s key moments –
Climax Theme
The objective of this experiment was to test whether the use of music associated with the
climax helps viewers understand the increased intensity of the environment at that point. We
tested the participants under two different conditions. In the first, we showed a video of a
climax scene without any sound. In the second, the same video was played, but scored with one
of the climax themes existing in D3S. We then analyzed the intensity perceived by the
participants under the two conditions.
7.9.1. Method
Design
The independent variable of this experiment is the use of the climax theme, while the
dependent variable is the perception of the story's intensity. This experiment followed an
independent design and the data collected consist of Likert-scale scores (0-5).
Participants
A total of 62 participants took part in this experiment: 27 watched the version without any
sound, while 35 watched the version with sound.
Procedure
The experiment participants were divided into 2 groups:
- To the first group, we showed a video of a story reaching its climax, without any sound.
- To the second group, the same video was shown, but this time it was played along with a
climax theme.
In the end, participants were asked to rate their perception of the intensity of the story in
each video on a Likert scale (0-5), where 0 corresponded to “Not Intense” and 5 to “Very
Intense”.
7.9.2. Results
The video without any sound got a mean intensity of 2.85, while the video with sound got a
mean of 3.91.
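A comparison of two such independent groups can be sketched with SciPy's Mann-Whitney test; the ratings below are hypothetical illustrations, not the collected data:

```python
from scipy.stats import mannwhitneyu

# Hypothetical Likert-style intensity ratings (0-5) for two independent groups.
silent_ratings = [2, 3, 2, 4, 3, 1, 3, 2]
music_ratings  = [4, 5, 3, 4, 5, 4, 3, 5]

# Two-sided test: is the distribution of ratings the same in both groups?
u, p = mannwhitneyu(silent_ratings, music_ratings, alternative="two-sided")
print(u, p)  # U statistic and two-sided p-value
```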
A Mann-Whitney test was performed in order to validate the significance of this test. The
result of p = 0.001 indicates that the answers changed significantly according to the video
played. The following box plot illustrates the answers under the two conditions:
Fig. 44 – Experiment 9 answers
7.9.3. Discussion
The results of this experiment were validated by a significance value of p = 0.001. They
proved that climax music adds to the perceived intensity of a scene. This makes it beneficial
when we want viewers to understand the scene's high intensity, which is a characteristic of
the climax in almost every story.
8. Conclusions
In the beginning of this thesis, we proposed to find a solution to the following question: how
can we score a story that is being created in real time by a storytelling environment, using
sounds and music, in a way that adds understanding of what is happening in the story and
consequently increases the viewer's enjoyment?
The D3S project is intended as a contribution to both the interactive storytelling and
automatic scoring fields. With it, we explored a different approach to producing sound and
music that accompanies a story, helping the viewer understand it better while offering a more
entertaining experience.
To help the viewer understand the story, we used sounds to score the actions made by the
characters. The viewer could easily distinguish who performed an action through the use of a
different instrument for each character. In the case of actions directed at other characters,
the viewer could better understand the strength of the connection between those characters,
which was conveyed through changes in volume.
Additionally, we used background music to score the story being created in the storytelling
environment. This music helps the viewer understand the nature of the characters that appear
in the story, namely the hero, his friend and the villain. It also helps signal the key
moments of the story, such as the duel, friendship and climax scenes. The background music was
further used to emphasize the emotions felt by the hero, by changing the nature of his theme,
and to make the viewer perceive the intensity of the environment, through changes in tempo.
Adding sound and music to a previously silent environment increases the viewer's enjoyment,
since it engages the sense of hearing and makes the experience more complete. The use of
pre-composed music was an important contributing factor, since it minimizes the risk of the
music sounding unpleasant.
The integration with the I-Sounds system was successful: we managed to include all the
features of this thesis in that system. Additionally, we created the Apollo manager, which
eases the addition of future composition modules and increases the extensibility of the
I-Sounds system.
Moreover, even though the main storytelling system used to test this thesis's features was
I-Shadows, we also integrated I-Sounds with FearNot!, proving the adaptability not only of the
I-Sounds system but also of the work developed for D3S, since all the features implemented for
this thesis worked in that storytelling environment as well. The successful integration with
two different storytelling systems demonstrates the generality of the D3S framework.
The results obtained were satisfactory in that we reached our initially proposed goals:
viewers could better understand the story being created and have a more enjoyable experience
while watching it. We validated the results for the following features: increased enjoyment of
the story when using D3S compared with using Amadeus or no sound scoring system at all; better
understanding of the author of an event when using instruments associated with characters;
better understanding of the emotional relation between two characters when using differences
in sound volume; better understanding of a character's mood when changing his theme
accordingly; and better perception of key moments when using different kinds of background
music.
The results for the use of themes to identify the hero and friend characters were not
conclusive. From this we conclude that the choice of these themes was not the best possible,
which made it hard for the viewer to distinguish between the two. However, we have to take
into account that the shape of the character used was the same, which might have influenced
the result. Additionally, the results for better perception of the environment's intensity
through changes in the music's tempo, and for the use of sounds to score events as a way of
understanding them better, were not conclusive either. Even so, we believe that these two
tests could be reevaluated with some changes, which are explained in the next and final
section of this thesis.
8.1. Future Work
In this work we explored several features related to music and sound and used them to help the
viewer better understand the story. Those features, such as changing the background music's
tempo, using different instruments for different characters and changing volume according to
the strength of emotions, were used to enhance certain characteristics of the story being
created. Other features can be explored in future work. For instance, we could use different
octaves according to the character's gender. Male voices are usually lower in frequency than
female voices; making an analogy between voice pitch and octaves, we could play the sounds
associated with each character in an octave matching the character's gender. New features of
this kind could be integrated into the system to create new ways of understanding the story.
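As a sketch of this proposed idea (not implemented in D3S), octave shifts map directly onto MIDI note numbers, which move in steps of 12 semitones per octave; the gender mapping below is purely illustrative:

```python
# MIDI note numbers span 0-127; one octave is 12 semitones.
def transpose_octaves(note: int, octaves: int) -> int:
    """Shift a MIDI note by whole octaves, clamped to the valid range."""
    return max(0, min(127, note + 12 * octaves))

# Illustrative mapping: female characters an octave up, male an octave down.
GENDER_SHIFT = {"female": +1, "male": -1}

def character_note(base_note: int, gender: str) -> int:
    """Pick the octave for a character's sound according to gender."""
    return transpose_octaves(base_note, GENDER_SHIFT.get(gender, 0))

print(character_note(60, "male"))    # middle C shifted down one octave: 48
print(character_note(60, "female"))  # middle C shifted up one octave: 72
```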
Other changes could be made in the output system. As it stands, the system only supports
playing MIDI files; the reason is that this type of file is easier to manipulate, which served
our intention of changing musical parameters. In the future, the system could be adapted to
WAV or MP3 files, which have an overall greater quality and would produce more interesting
results. The challenge would be to adapt the parameter changes (such as music tempo, volume,
etc.) to the new format.
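One reason MIDI is easy to manipulate is that tempo lives in a single set_tempo meta event, stored as microseconds per quarter note, so a tempo change is simple arithmetic rather than audio time-stretching. A minimal sketch of that arithmetic:

```python
# Standard MIDI File tempo is expressed in microseconds per quarter note.
MICROSECONDS_PER_MINUTE = 60_000_000

def bpm_to_midi_tempo(bpm: float) -> int:
    """Convert beats per minute to the MIDI set_tempo value."""
    return round(MICROSECONDS_PER_MINUTE / bpm)

def scale_tempo(midi_tempo: int, factor: float) -> int:
    """Make the music `factor` times faster by shrinking the tempo value."""
    return round(midi_tempo / factor)

t = bpm_to_midi_tempo(120)
print(t)                     # 500000 microseconds per quarter note
print(scale_tempo(t, 1.25))  # 400000, i.e. 150 BPM
```

For WAV or MP3, the equivalent operation requires resampling or time-stretching the audio signal itself, which is the adaptation challenge mentioned above.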
Other approaches could also be explored in the D3S music manager, Apollo. Different semantics
could be tried, such as other ways of selecting the music to play. New composer modules (such
as Amadeus or Ludwig) could be created and integrated into the system, enriching the music
available to play and enabling new advances in the automatic composition field. The system was
built so that it is easy to extend by adding or changing composer modules; the coordination of
those modules, and the decisions on when and how to use them, are made at the manager level.
We could also reevaluate the tests that did not produce conclusive results by changing some of
their aspects. For the relation between the music's tempo and the environment's intensity, a
new test could focus on finding a relation between a change of music and a change in the
perceived intensity, instead of trying to relate a tempo value to an intensity value. For the
event sounds test, a new evaluation could use a more ambiguous video, which would likely
produce better results regarding the understanding of the events.
References
[1] Frijda, Nico H. The Emotions. Cambridge (UK): Cambridge University Press, 1986, p. 207
[2] Lewis PA, Critchley HD, Rotshtein P and Dolan RJ. Neural Correlates of Processing Valence
and Arousal in Affective Words, 2006
[3] Brisson, António and Paiva, Ana. Are we telling the same story? Balancing real and virtual
actors in a collaborative story creation system, AAAI Fall Symposium on Narrative Intelligence
Technologies, Westin Arlington Gateway, Arlington, Virginia, November 9-11, 2007
[4] Brisson, António. I-Shadows. Using Affect in an Emergent Interactive Drama. MSc Thesis,
Instituto Superior Técnico, Portugal, 2007
[5] Dias J, and Paiva A.: 2005, Feeling and reasoning: A computational model for emotional
characters. Proceedings of EPIA 2005, 127-140, Springer, LNCS, 2005.
[6] ECIRCUS – Education through Characters with emotional-Intelligence and Role-playing
Capabilities that Understand Social interaction - The FearNot! Demonstrator -
http://www.e-circus.org/ (Jan 2009)
[7] Aylett, Ruth , Vala, Marco, Sequeira, Pedro and Paiva, Ana. FearNot! – An Emergent
Narrative Approach to Virtual Dramas for Anti-bullying Education. ICVS 2007, LNCS 4871, pp.
202 – 205, 2007.
[8] Worth, Sarah E. Music, Emotion and Language: Using Music to Communicate, Furman
University, 1998
[9] Vaidya, Geetanjali, Music, Emotion and the Brain, 2004
[10] Blood, A.J., Zatorre, R.J, Bermudez, P., and Evans, A.C. Emotional responses to pleasant
and unpleasant music correlate with activity in paralimbic brain regions, Montreal Neurological
Institute, Montreal, Canada 1999
[11] Blood, A.J , Zatorre, R.J, Intensely pleasurable responses to music correlate with activity in
brain regions implicated with reward and emotion, Montreal Neurological Institute, Montreal,
Canada, 2001
[12] Heslet, Prof. Dr. Lars. Our Musical Brain, Danish State Hospital, 2003
[13] Lemonick, Michael, Music on the Brain: Biologists and psychologists join forces to
investigate how and why humans appreciate music., 2000
[14] Wang, M., Zhang, N., and Zhu, H., User-Adaptive Music Emotion Recognition, IEEE Int.
Conf. Signal Processing, pp. 1352-1355, 2004.
[15] Liu, D., Lu, L., and Zhang, H. J., Automatic Mood Detection from Acoustic Music Data,
ISMIR, 2003.
[16] Yang, D., and Lee, W., Disambiguating Music Emotion Using Software Agents, ISMIR,
2004.
[17] Yang, Yi-Hsuan, Liu, Chia-Chu and Chen, Homer H. Music Emotion Classification: A Fuzzy
Approach, Graduate Institute of Communication Engineering, National Taiwan University, 2006
[18] Thayer, R. E., The Biopsychology of Mood and Arousal, Oxford University Press, 1989
[19] Cheng-Te, Li, Man-Kwan, Shan. Emotion-based Impressionism Slideshow
with Automatic Music Accompaniment, Department of Computer Science, National Chengchi
University, Taipei, Taiwan, 2007
[20] Juslin, P.N. Cue Utilization in Communication of Emotion in Music Performance: Relating
Performance to Perception, Journal of Experimental Psychology: Human Perception and
Performance, vol. 26, no. 6, pp 1797-1813., 2000
[21] Bresin, R. and Friberg A., Synthesis and Decoding of Emotionally Expressive Music
Performance, IEEE International Conference on Systems, Man, and Cybernetics, Tokyo, Japan,
1999
[22] Gabrielsson, A. and Lindström, E. The Influence of Musical Structure on Emotional
Expression, 2001
[23] Jan Berg, Johnny Wingstedt, Relations between Selected Musical Parameters and
Expressed Emotions – Extending the Potential of Computer Entertainment, Interactive Institute,
Sweden, 2005
[24] Oppenheim, Y. The functions of film music Published in the online magazine
Film Score Monthly, http://www.filmscoremonthly.com/features/functions.asp (Dec 2007)
[25] Wikipedia, Film Score, http://en.wikipedia.org/wiki/Film_score (Dec 2007)
[26] Encyclopedia of Music in Canada – Film Scores - www.thecanadianencyclopedia.com (Dec
2007)
[27] Gorbman, Claudia. Unheard Melodies: Narrative Film Music. Bloomington: Indiana
University Press, 1987
[28] Spande, Robert. The Three Regimes: A Theory of Film Music, The University of Minnesota,
Minneapolis, USA, 1996
[29] John Debney, Interview in Playback Magazine,
http://www.ascap.com/playback/2003/march/debney.html (Dec 2007)
[30] Karen Collins, Functions of Game Audio, http://www.gamesound.com/functions.pdf (Dec
2007)
[31] Downie, Marc. behavior, animation music: the music and movement of synthetic characters.
BA, MSci, Magdalene College, University of Cambridge, 1998
[32] Bresin, Roberto. Virtual Virtuosity – Studies in Automatic Music Performance Doctoral
dissertation, Stockholm, Sweden: KTH, Speech Music and Hearing, 2000
[33] Casella, Pietro. Music, Agents and Emotions, MSc Thesis, Engenharia Informática e de
Computadores, Instituto Superior Técnico, Universidade Técnica, Lisboa, Portugal, 2004.
[34] De Quincey, Andrew. Herman. Master’s thesis, Division of Informatics, University of
Edinburgh, 1998
[35] Robertson, Judy, Good, Judith. Ghostwriter: A narrative virtual environment for children.
Interaction Design And Children, England, 2003
[36] Jun-Ichi Nakamura, Tetsuya Kaku, Kyungsil Hyun, Tsukasa Noma, Sho Yoshida, Automatic
background music generation based on actors’ mood and motions. The Journal of Visualization
and Computer Animation, Volume 5, Issue 4 , Pages 247 – 264, 1994
[37] Berndt, Axel, Theisel, Holger, Adaptive Musical Expression from Automatic Realtime
Orchestration and Performance, pp. 132-143, Springer, LNCS, 2008
[38] Cope, David. Experiments in Musical Intelligence. Madison, WI: A-R Editions, 1996
[39] Propp, Vladimir. Morphology of the Folktale. 1928. 2nd ed. Trans. Lawrence Scott. Austin:
U of Texas P, 1968.
[40] Lafrance, J.D. In Dreams I Talk To You…- The use of sound in David Lynch’s Blue Velvet,
http://www.armchairdirector.org/features/archive/bluevelvet/index.htm (Dec 2007)
[41] Disney’s 1946 movie – Peter and the Wolf - http://www.imdb.com/title/tt0038836/
[42] Thulga, Phil. Peter and the Wolf - a musical story by Sergei Prokofiev,
http://www.philtulga.com/Peter.html (Apr 2008)
[43] Music Theory Online: Tempo - http://www.dolmetsch.com/musictheory5.htm (Dec 2007)
[44] Cruz, Ricardo. I-Sounds: Emotion-based Music Composition for Virtual Environments, MSc
Thesis, IST, Apr 2008
[45] Lopes, E. Just In Time: Towards a theory of rhythm and metre. PhD thesis, University of
Southampton, UK, 2003.
Appendix A
The evaluation form
The test participants answered an online form in which they watched videos or heard sounds and
then answered questions about them. A brief description of the form was shown before it
started:
Dynamic Story Shadows and Sounds
Welcome!
D3S is a system that was developed to accompany a storytelling environment. Through the use of music and sounds, its objective is to provide a better experience to the viewer, helping the comprehension of the story and increasing the enjoyment.
The following tests were developed with the objective of validating the system as a whole and some of its features in particular.
Since many videos with sound are used, we would like to ask you to put on your headphones before you start. However, take into account that some videos don't have any sound. Watch every video with attention, since each should be watched only once. After each video, a question will be asked. There are no right or wrong answers.
We thank you for your participation. Click "next" when you are ready.
Fig. 45 – Brief description of the testing procedure
The structure of the form can be seen in the next figure. Notice that the questions and available
answers were all the same among the 9 versions. The only changes between them were the
videos displayed in each one.
Appendix B
Evaluation Tests – Organization and Data Calculation
Experiment 7.1 - General system evaluation - D3S and story enjoyment
Version Experiment 1
A S D A
B D S A
C S A D
D A D S
E A S D
F D A S
G S A D
H S D A
I A S D
Fig. 46 – Experiment 1 versions
Legend: S – Silent version A – Amadeus version D – D3S version
N Mean Std. Deviation Minimum Maximum
Silent 62 1,73 ,813 1 4
Amadeus 62 2,27 ,978 1 5
D3S 62 3,21 1,026 1 5
Fig. 47 - Experiment 1 Descriptive Statistics
Friedman Test
Ranks
Mean Rank
Silent 1,35
Amadeus 1,90
D3S 2,74
Test Statistics (a)
N 62
Chi-Square 75,798
df 2
Asymp. Sig. ,000
a. Friedman Test
Fig. 48 - Experiment 1 Friedman Test
Experiment 7.2 - Background music’s tempo and environment’s intensity
Version Experiment 2
A S
B M
C F
D S
E M
F F
G S
H M
I F
Fig. 49 – Experiment 2 versions
Legend: S – Slow tempo M – Medium tempo F – Fast tempo
Descriptive Statistics
N Mean Std. Deviation Minimum Maximum
Slow 21 2,57 1,028 1 5
Medium 21 3,05 1,203 1 5
Fast 20 2,80 1,196 1 5
Fig. 50 - Experiment 2 Descriptive Statistics
Kruskal-Wallis Test
Ranks (Intensity)
Tempo N Mean Rank
Slow 21 27,95
Medium 21 34,57
Fast 20 32,00
Total 62
Test Statistics (a,b)
Intensity
Chi-Square 1,532
df 2
Asymp. Sig. ,465
a. Kruskal Wallis Test
b. Grouping Variable: Tempo
Fig. 51 - Experiment 2 Kruskal-Wallis Test
Experiment 7.3 - Association between characters and instruments
Descriptive Statistics
N Mean Std. Deviation Minimum Maximum
Boy 62 ,73 ,450 0 1
Girl 62 ,74 ,441 0 1
Dragon 62 ,92 ,275 0 1
Score 62 2,42 ,897 0 3
Fig. 52 - Experiment 3 Descriptive Statistics
Chi-Square Test Frequencies
Boy
Observed N Expected N Residual
Wrong 17 31,0 -14,0
Right 45 31,0 14,0
Total 62
Girl
Observed N Expected N Residual
Wrong 16 31,0 -15,0
Right 46 31,0 15,0
Total 62
Dragon
Observed N Expected N Residual
Wrong 5 31,0 -26,0
Right 57 31,0 26,0
Total 62
Score
Observed N Expected N Residual
None 2 15,5 -13,5
1 Right 11 15,5 -4,5
2 Right 8 15,5 -7,5
All Right 41 15,5 25,5
Total 62
Test Statistics
Boy Girl Dragon Score
Chi-Square 12,645(a) 14,516(a) 43,613(a) 58,645(b)
df 1 1 1 3
Asymp. Sig. ,000 ,000 ,000 ,000
a. 0 cells (,0%) have expected frequencies less than 5. The minimum expected cell frequency is 31,0.
b. 0 cells (,0%) have expected frequencies less than 5. The minimum expected cell frequency is 15,5.
Fig. 53 - Experiment 3 Chi-Square Test
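The chi-square goodness-of-fit values above can be reproduced from the observed counts with SciPy, which assumes uniform expected frequencies by default; a sketch:

```python
from scipy.stats import chisquare

# Observed counts from the Experiment 3 tables (n = 62, uniform expected).
boy = chisquare([17, 45])          # wrong, right
score = chisquare([2, 11, 8, 41])  # 0, 1, 2 or all 3 correct associations

print(round(boy.statistic, 3))    # 12.645, matching the SPSS output
print(round(score.statistic, 3))  # 58.645
```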
Experiment 7.4 - Themes associated to characters
Version Experiment 4
A FVHN
B S
C HFVN
D S
E VFNH
F S
G HFVN
H S
I FVHN
Fig. 54 – Experiment 4 versions
Legend: F – Friend Theme V – Villain Theme H – Hero Theme N – No Theme associated S – Silent Version
Versions with sound:
Descriptive Statistics
N Mean Std. Deviation Minimum Maximum
Hero 35 ,51 ,507 0 1
Friend 35 ,49 ,507 0 1
Villain 35 ,69 ,471 0 1
Score 35 1,71 1,045 0 3
Fig. 55 - Experiment 4 Descriptive Statistics – with sound
Chi-Square Test Frequencies
Hero
Observed N Expected N Residual
Wrong 17 17,5 -,5
Right 18 17,5 ,5
Total 35
Friend
Observed N Expected N Residual
Wrong 18 17,5 ,5
Right 17 17,5 -,5
Total 35
Villain
Observed N Expected N Residual
Wrong 11 17,5 -6,5
Right 24 17,5 6,5
Total 35
Score
Observed N Expected N Residual
None 5 8,8 -3,8
1 Right 10 8,8 1,3
2 Right 10 8,8 1,3
All Right 10 8,8 1,3
Total 35
Test Statistics
Hero Friend Villain Score
Chi-Square ,029(a) ,029(a) 4,829(a) 2,143(b)
df 1 1 1 3
Asymp. Sig. ,866 ,866 ,028 ,543
a. 0 cells (,0%) have expected frequencies less than 5. The minimum expected cell frequency is 17,5.
b. 0 cells (,0%) have expected frequencies less than 5. The minimum expected cell frequency is 8,8.
Fig. 56 - Experiment 4 Chi-Square test – with sound
Chi-Square Test Frequencies
Hero
Observed N Expected N Residual
Blue 12 6,8 5,3
Yellow 4 6,8 -2,8
Red 10 6,8 3,3
Green 1 6,8 -5,8
Total 27
Friend
Observed N Expected N Residual
Blue 4 6,8 -2,8
Yellow 16 6,8 9,3
Red 1 6,8 -5,8
Green 6 6,8 -,8
Total 27
Villain
Observed N Expected N Residual
Blue 7 6,8 ,3
Yellow 1 6,8 -5,8
Red 13 6,8 6,3
Green 6 6,8 -,8
Total 27
Test Statistics
Hero Friend Villain
Chi-Square 11,667(a) 18,778(a) 10,778(a)
df 3 3 3
Asymp. Sig. ,009 ,000 ,013
a. 0 cells (,0%) have expected frequencies less than 5. The minimum expected cell frequency is 6,8.
Fig. 57 - Experiment 4 Chi-Square Test – without sound
Experiment 7.5 - Use of sounds to underscore events
Version Experiment 5
A X
B Y
C X
D Y
E X
F Y
G X
H Y
I X
Fig. 58 – Experiment 5 versions
Legend: X – Silent Y – With sound
Descriptive Statistics
N Mean Std. Deviation Minimum Maximum
Understanding 62 ,90 ,298 0 1
Fig. 59 - Experiment 5 Descriptive Statistics
Mann-Whitney Test
Ranks (Understanding)
Sound N Mean Rank Sum of Ranks
Without 35 30,07 1052,50
With 27 33,35 900,50
Total 62
Test Statistics (a)
Understanding
Mann-Whitney U 422,500
Wilcoxon W 1052,500
Z -1,386
Asymp. Sig. (2-tailed) ,166
a. Grouping Variable: Sound
Fig. 60 - Experiment 5 Mann-Whitney test
Experiment 7.6 - Volume and relations between characters
Version Experiment 6
A S
B S
C X
D LH
E LH
F X
G HL
H HL
I X
Fig. 61 – Experiment 6 versions
Legend: S – Same volume X – Silent L – Lower volume H – Higher volume
Kruskal-Wallis Test
Ranks (Answer)
Conditions N Mean Rank
Silent 20 36,15
Same Volume 15 39,27
LH 13 31,69
HL 14 16,36
Total 62
Test Statistics (a,b)
Answer
Chi-Square 16,360
df 3
Asymp. Sig. ,001
a. Kruskal Wallis Test
b. Grouping Variable: Conditions
Fig. 62 - Experiment 6 Kruskal-Wallis test
Experiment 7.7 - Character theme and mood
Version Experiment 7
A H
B S
C X
D H
E S
F X
G H
H S
I X
Fig. 63 – Experiment 7 versions
Legend: H – Happy theme played in the end S – Sad theme played in the end X – Silent
Kruskal-Wallis Test
Ranks
Valence
ThemeValence N Mean Rank
SilentValence 20 28,50
HappyValence 21 45,31
SadValence 21 20,55
Total 62
Test Statisticsa,b
Valence
Chi-Square 27,724
df 2
Asymp. Sig. ,000
a. Kruskal Wallis Test
b. Grouping Variable: ThemeValence
Fig. 64 - Experiment 7 Kruskal-Wallis test
Experiment 7.8 - Music associated with the story’s key moments – Friendship and Duel Themes
Version Experiment 8
A BDS
B BDDu
C BDFr
D BFDu
E BFS
F BFFr
G BDFr
H BFDu
I BDDu
Fig. 65 – Experiment 8 versions
Legend: B – Boy D – Dragon F – Fairy S – Silent Fr – Friendship theme Du – Duel theme
Kruskal-Wallis Test - Themes
Ranks
Answer
Theme N Mean Rank
Silent 15 25,77
Friendship 20 17,50
Duel 27 45,06
Total 62
Test Statisticsa,b
Answer
Chi-Square 38,753
df 2
Asymp. Sig. ,000
a. Kruskal Wallis Test
b. Grouping Variable: Theme
Fig. 66 - Experiment 8 Kruskal-Wallis test for themes
Kruskal-Wallis Test - Characters
Ranks
Answer
Character N Mean Rank
Dragon 35 32,56
Fairy 27 30,13
Total 62
Test Statisticsa,b
Answer
Chi-Square ,371
df 1
Asymp. Sig. ,542
a. Kruskal Wallis Test
b. Grouping Variable: Character
Fig. 67 - Experiment 8 Kruskal-Wallis test for characters
Kruskal-Wallis Test - Video versions
Ranks
Answer
Type N Mean Rank
BDS 8 33,00
BDFr 13 17,50
BDDu 14 46,29
BFS 7 17,50
BFFr 7 17,50
BFDu 13 43,73
Total 62
Test Statisticsa,b
Answer
Chi-Square 42,643
df 5
Asymp. Sig. ,000
a. Kruskal Wallis Test
b. Grouping Variable: Type
Fig. 68 - Experiment 8 Kruskal-Wallis test for versions
Experiment 7.9 - Music associated with the story’s key moments – Climax Theme
Version Experiment 9
A Y
B X
C Y
D X
E Y
F X
G Y
H X
I Y
Fig. 69 – Experiment 9 versions
Legend: Y – With climax music X – Silent
Descriptive Statistics
Intensity
Type Mean N Std. Deviation
Without Music 2,85 27 1,262
With Music 3,91 35 ,919
Total 3,45 62 1,197
Fig. 70 - Experiment 9 Descriptive Statistics
Mann-Whitney Test
Ranks
Intensity
Type N Mean Rank Sum of Ranks
Without Music 27 23,19 626,00
With Music 35 37,91 1327,00
Total 62
Test Statisticsa
Intensity
Mann-Whitney U 248,000
Wilcoxon W 626,000
Z -3,307
Asymp. Sig. (2-tailed) ,001
a. Grouping Variable: Type
Fig. 71 - Experiment 9 Mann-Whitney test