Automatic recognition of social roles using long term role transitions in small group interactions
Gaurav Fotedar (1), Aditya Gaonkar P (2,1), Saikat Chatterjee (3), Prasanta Kumar Ghosh (1)
(1) Department of Electrical Engineering, Indian Institute of Science, Bangalore, India
(2) Department of Electrical Engineering, Columbia University, New York, USA
(3) Department of Communication Theory, KTH Electrical Engg. School, Stockholm, Sweden
Interspeech, 2016
Gaurav Fotedar et al. Social role recognition Interspeech, 2016 1 / 39
Outline
1 Introduction
2 Data
3 Proposed Method
    Using Role Transitions
    Dynamic Programming Algorithm
4 Experiments and Results
    Features
    Experimental Setup
    Results and Discussion
5 Conclusions and Future work
Introduction
What are meeting roles?
Roles define who is doing what in a meeting
They can be of two types:
Formal Roles: Often relate to the official designation of a member, e.g. Manager, Developer, Designer. These stay constant for a member for the length of the meeting.
Social Roles: These characterise the behaviour of a member at a particular time in a meeting, e.g. Protagonist, Supporter. Hence, a member goes through many social roles as the meeting progresses.
We will be focusing on the recognition of social roles in our work.
Why social roles?
Social roles characterise relationships between members in a meeting and capture the dynamics of a meeting.
They answer semantic queries such as "Who is doing what in an event?"
Knowing social roles helps to determine engagement, social dominance and hot-spots in meetings.
Information about social roles has been used for topic segmentation in conversation discourses and for summarising spoken documents.
Types of social roles
Typically, the social roles in a meeting could be:
Gatekeeper - a group moderator.
Neutral - a passive participant.
Protagonist - the driver of the conversation.
Supporter - participant with cooperative attitude.
Attacker - participant expressing disagreement.
Challenges in recognition of social roles in meetings
Disfluencies in speech and overlapping speech among members increase the errors of ASR and speaker segmentation systems.
Short speaker turns reduce the data available for feature extraction for a particular speaker.
Limited availability of annotated corpora.
Related Work and Our Contribution
Prior Work
Zancanaro et al. [a] used speech activity and body fidgeting features with an SVM classifier.
Valente et al. [b] used prosodic and turn-taking features combined with the influence of speakers on one another.
Wilson et al. [c] used combinations of speech activity, subjectivity, and expressive prosodic features with a CRF.
Sapru et al. [d] annotated the AMI corpus with social roles and used an HCRF with combinations of lexical, acoustic and structural features.
[a] Zancanaro, Massimo, Bruno Lepri, and Fabio Pianesi. "Automatic detection of group functional roles in face to face interactions." Proceedings of the 8th international conference on Multimodal interfaces. ACM, 2006.
[b] Valente, Fabio, and Alessandro Vinciarelli. "Language-Independent Socio-Emotional Role Recognition in the AMI Meetings Corpus." INTERSPEECH. 2011.
[c] Wilson, Theresa, and Gregor Hofer. "Using linguistic and vocal expressiveness in social role recognition." Proceedings of the 16th international conference on Intelligent user interfaces. ACM, 2011.
[d] Sapru, Ashtosh, and Hervé Bourlard. "Automatic recognition of emergent social roles in small group interactions." IEEE Transactions on Multimedia 17.5 (2015): 746-760.
Our Contribution
All existing works have predicted roles in each meeting slice independently.
We incorporate role transition probabilities across meeting slices to predict social roles.
We propose a dynamic programming framework to reduce the runtime of estimating the sequence of predicted roles.
Data
The AMI Corpus
100 hours of audio-visual recordings of role-played meetings
4 members in each meeting with the formal roles:
Project Manager
Industrial Designer
User Interface Designer
Marketing Expert
Social Role Annotation
Social role annotations are available for 59 meetings. [1]
Each meeting has been segmented into meeting slices based on pauses longer than 1 second. It is assumed that the social role of a member remains constant within one meeting slice.
For each meeting slice, each of the 4 members has been assigned one social role from among Gatekeeper, Protagonist, Neutral, Supporter and Attacker.
[1] Annotation done by Sapru et al.
Figure: Distribution of social roles in the annotated AMI Corpus
Due to limited data for Attackers, we only consider the other 4 social roles.
Proposed Method Using Role Transitions
Why consider transition probabilities
Typically, role recognition is posed as a classification problem using features from the respective meeting slice.
Let $f_k$ be the feature vector and $L_k$ be the role in the $k$-th slice for a participant in the meeting. The probabilities $P^k_r = \mathrm{Prob}(L_k = \rho_r \mid f_k)$, $r = 1, 2, 3, 4$, are computed by the classifier, and the role with the highest probability becomes the predicted role.
However, the variation of the role of a participant across slices could depend on the group dynamics and personal characteristics.
So the prediction of the role in the $k$-th slice can use the information of the roles in previous slices.
Formulating the role recognition problem
We formulate the problem as maximizing the joint probability of the roles of a participant in all slices, expressed as
\[ \mathrm{Prob}(L_1, L_2, \cdots, L_K \mid f_1, f_2, \cdots, f_K) \]
Using the definition of conditional probability (Bayes' rule),
\[ \mathrm{Prob}(L_1, L_2, \cdots, L_K \mid f_1, f_2, \cdots, f_K) \propto p(f_1, f_2, \cdots, f_K \mid L_1, L_2, \cdots, L_K)\, \mathrm{Prob}(L_1, L_2, \cdots, L_K) \tag{1} \]
Next, we will consider the calculation of the two terms on the right side of the above equation.
Assuming that, given the roles in all $K$ slices, the feature vectors in these slices are independent, we obtain:
\[ p(f_1, f_2, \cdots, f_K \mid L_1, L_2, \cdots, L_K) = \prod_{k=1}^{K} p(f_k \mid L_k) \tag{2} \]
Also, assuming all roles are equally likely, we know
\[ P^k_r = \mathrm{Prob}(L_k = \rho_r \mid f_k) \propto p(f_k \mid L_k = \rho_r) \tag{3} \]
We propose to use a discriminative classifier to obtain $\mathrm{Prob}(L_k \mid f_k)$, which can then be used to obtain the first term in (1) by (2) and (3).
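The step behind (3) can be spelled out as follows. This is a sketch of the reasoning under the stated equal-prior assumption; the symbol $r_k$ (the index of the role assigned to slice $k$) is introduced here only for illustration and does not appear in the original slides.

```latex
% Bayes' rule for a single slice:
\[
p(f_k \mid L_k = \rho_r)
  = \frac{\mathrm{Prob}(L_k = \rho_r \mid f_k)\, p(f_k)}{\mathrm{Prob}(L_k = \rho_r)}
  = P^k_r \, \frac{p(f_k)}{\mathrm{Prob}(L_k = \rho_r)}
\]
% With four equally likely roles, Prob(L_k = rho_r) = 1/4, so the factor
% 4 p(f_k) is the same for every r and can be dropped:
\[
p(f_k \mid L_k = \rho_r) \propto P^k_r
\]
% Substituting into (2), with r_k the index of the role in slice k:
\[
p(f_1, \cdots, f_K \mid L_1, \cdots, L_K) \propto \prod_{k=1}^{K} P^k_{r_k}
\]
```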
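The second term in (1), $\mathrm{Prob}(L_1, L_2, \cdots, L_K)$, captures the long-term role dynamics of a participant.
Applying the first-order Markov chain assumption:
\[
\begin{aligned}
\mathrm{Prob}(L_1, \cdots, L_K)
  &= \mathrm{Prob}(L_K \mid L_1, \cdots, L_{K-1})\, \mathrm{Prob}(L_1, \cdots, L_{K-1}) \\
  &= \mathrm{Prob}(L_K \mid L_{K-1})\, \mathrm{Prob}(L_1, \cdots, L_{K-1}) = \cdots \\
  &= \mathrm{Prob}(L_1) \prod_{k=2}^{K} \mathrm{Prob}(L_k \mid L_{k-1})
   \propto \prod_{k=2}^{K} \mathrm{Prob}(L_k \mid L_{k-1})
   \quad \text{[assuming roles are equally likely]}
\end{aligned}
\tag{4}
\]
We can see that this term can be obtained using the role transition probabilities.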
Treating the role sequence across meeting slices as a first-order Markov process, we calculate count-based transition probabilities.

From \ To      Gatekeeper   Neutral   Protagonist   Supporter
Gatekeeper     0.70         0.16      0.04          0.10
Neutral        0.02         0.72      0.03          0.23
Protagonist    0.10         0.14      0.62          0.14
Supporter      0.06         0.35      0.07          0.52

Table: Role transition probabilities.
The table shows that the probability of staying in the same role is relatively high for all roles (0.52 to 0.72).
There is a significant probability of transition between Neutral and Supporter (0.23 and 0.35); however, a transition from Neutral to Protagonist is unlikely (0.03).
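As an illustration of what "count-based" could mean here, the sketch below counts consecutive role pairs and normalises each row. The function name, the uniform fallback for unseen roles, and the toy sequences are assumptions made for this example; they are not taken from the slides or from AMI data.

```python
from collections import Counter

ROLES = ["Gatekeeper", "Neutral", "Protagonist", "Supporter"]

def transition_probabilities(sequences, roles=ROLES):
    """Estimate Prob(to | from) by counting consecutive role pairs."""
    counts = Counter()
    for seq in sequences:
        # Pair the role in each slice with the role in the next slice.
        for prev, curr in zip(seq, seq[1:]):
            counts[(prev, curr)] += 1
    table = {}
    for prev in roles:
        total = sum(counts[(prev, curr)] for curr in roles)
        # Assumption: fall back to a uniform row if a role never occurs as a source.
        table[prev] = {
            curr: (counts[(prev, curr)] / total if total else 1.0 / len(roles))
            for curr in roles
        }
    return table

# Hypothetical toy sequences, one per participant (not AMI data).
toy_sequences = [
    ["Neutral", "Neutral", "Supporter", "Neutral"],
    ["Gatekeeper", "Gatekeeper", "Protagonist", "Protagonist"],
]
probs = transition_probabilities(toy_sequences)
print(probs["Neutral"])
```

Each row of the resulting table sums to 1, as do the rows of the table above.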
We can now re-write (1) as
\[
\mathrm{Prob}(L_1, L_2, \cdots, L_K \mid f_1, f_2, \cdots, f_K) \propto
\left( \prod_{k=1}^{K} p(f_k \mid L_k) \right)^{1-\gamma}
\left( \prod_{k=2}^{K} \mathrm{Prob}(L_k \mid L_{k-1}) \right)^{\gamma}
\tag{5}
\]
where the weights $(1-\gamma)$ and $\gamma$, with $0 \le \gamma \le 1$, control the contribution of the role transition probabilities.
The estimated sequence of roles can then be obtained by
\[
\hat{L}_k, \forall k = \arg\max_{L_1, \cdots, L_K} \mathrm{Prob}(L_1, \cdots, L_K \mid f_1, f_2, \cdots, f_K)
\tag{6}
\]
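To make (5) and (6) concrete, here is a minimal sketch that scores one candidate role sequence in the log domain and then solves (6) by trying every sequence. The function names, the two-role toy numbers and the `trans[previous][current]` indexing are assumptions for illustration; all probabilities are assumed strictly positive so that the logarithms exist.

```python
import itertools
import math

def log_score(roles, posteriors, trans, gamma):
    """Log of the right-hand side of (5) for one candidate role sequence.

    roles      : sequence of role indices, one per slice
    posteriors : posteriors[k][r] plays the part of P^k_r
    trans      : trans[i][j] = Prob(role j in a slice | role i in the previous slice)
    """
    emit = sum(math.log(posteriors[k][r]) for k, r in enumerate(roles))
    move = sum(math.log(trans[a][b]) for a, b in zip(roles, roles[1:]))
    return (1.0 - gamma) * emit + gamma * move

def exhaustive_search(posteriors, trans, gamma):
    """Solve (6) by scoring every sequence; the number of candidates is R**K."""
    n_roles = len(trans)
    candidates = itertools.product(range(n_roles), repeat=len(posteriors))
    best = max(candidates, key=lambda seq: log_score(seq, posteriors, trans, gamma))
    return list(best)

# Hypothetical two-role example with three slices.
toy_post = [[0.9, 0.1], [0.45, 0.55], [0.9, 0.1]]
toy_trans = [[0.8, 0.2], [0.3, 0.7]]
print(exhaustive_search(toy_post, toy_trans, 0.0))  # gamma = 0: slices decided independently
print(exhaustive_search(toy_post, toy_trans, 0.5))  # gamma = 0.5: transitions also count
```

In this toy case the middle slice is only weakly in favour of the second role, so giving weight to the transitions changes the decision for that slice.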
Proposed Method Dynamic Programming Algorithm
DP based Algorithm
Since there are 4 roles, a full search for solving (6) will have a complexity of $O(4^K)$.
We propose a DP based solution having a complexity of $O(16K)$, i.e. $4 \times 4$ transition comparisons per slice.
Let $D_r(k)$ be the maximum probability of a role assignment for the first $k$ slices that has $\rho_r$ as the role in the $k$-th meeting slice.
Let the back-tracking pointer be $\xi_r(k)$, which stores the role assigned to the $(k-1)$-th slice for obtaining the maximum probability $D_r(k)$.
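$D_r(k)$ is computed in a recursive manner as follows:
1 Initialization: Compute $D_r(1) = \left(P^1_r\right)^{(1-\gamma)}$ using equation (3).
2 Iteration: For $2 \le k \le K$ and $1 \le r \le 4$, compute the following:
\[ D_r(k) = \max_{1 \le r' \le 4} \left\{ D_{r'}(k-1) \times \left(\alpha_{r,r'}\right)^{\gamma} \right\} \times \left(P^k_r\right)^{(1-\gamma)} \]
\[ \xi_r(k) = \arg\max_{1 \le r' \le 4} \left\{ D_{r'}(k-1) \times \left(\alpha_{r,r'}\right)^{\gamma} \right\} \]
where $P^k_r$ is obtained using equation (3) and $\alpha_{r,r'} = \mathrm{Prob}(\rho_r \mid \rho_{r'})$ is the transition probability obtained from the training data.
3 Backtracking: $\hat{L}_K = \arg\max_r D_r(K)$, and
\[ \hat{L}_k = \xi_{\hat{L}_{k+1}}(k+1), \quad k = K-1, K-2, \cdots, 1 \tag{7} \]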
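The three steps above can be sketched in Python as follows. This is an illustrative implementation, not the authors' code: the function name, the 0-based slice indices, the `trans[previous][current]` layout and the use of logarithms (to avoid underflow over many slices) are choices made for this example, and all probabilities are assumed strictly positive.

```python
import math

def decode_roles(posteriors, trans, gamma):
    """Role sequence maximising (5), found with the recursion for D_r(k).

    posteriors : posteriors[k][r] plays the part of P^k_r (slices are 0-based here)
    trans      : trans[i][j] = Prob(role j | previous role i), i.e. alpha_{j,i}
    gamma      : weight of the transition term, 0 <= gamma <= 1
    """
    n_slices, n_roles = len(posteriors), len(trans)
    # Initialization: log D_r(1) = (1 - gamma) * log P^1_r
    score = [(1.0 - gamma) * math.log(p) for p in posteriors[0]]
    backptr = []  # one list of pointers xi_r per slice after the first
    for k in range(1, n_slices):
        new_score, pointers = [], []
        for r in range(n_roles):
            # Best previous role r' for ending in role r at slice k.
            best_prev = max(
                range(n_roles),
                key=lambda q: score[q] + gamma * math.log(trans[q][r]),
            )
            pointers.append(best_prev)
            new_score.append(
                score[best_prev]
                + gamma * math.log(trans[best_prev][r])
                + (1.0 - gamma) * math.log(posteriors[k][r])
            )
        score = new_score
        backptr.append(pointers)
    # Backtracking: start from the best final role and follow the pointers.
    path = [max(range(n_roles), key=lambda r: score[r])]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical two-role example with three slices.
toy_post = [[0.9, 0.1], [0.45, 0.55], [0.9, 0.1]]
toy_trans = [[0.8, 0.2], [0.3, 0.7]]
print(decode_roles(toy_post, toy_trans, 0.0))
print(decode_roles(toy_post, toy_trans, 0.5))
```

Each slice examines every (previous role, current role) pair once, which for 4 roles gives the 16 comparisons per slice behind the $O(16K)$ figure.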
Experiments and Results Features
Acoustic Features
Speaking style and vocal expression can give hints about the speaker role.
We use openSMILE to extract various features, such as: [2]
Average, standard deviation, skewness, range, kurtosis, minimum, maximum
Linear and quadratic regression coefficients and approximation errors of Low Level Descriptor (LLD) contours such as sub-band energy, spectral roll-off, spectral flux and short-time energy
This creates a 297-dimensional feature vector.
[2] Eyben, Florian, et al. "Recent developments in openSMILE, the Munich open-source multimedia feature extractor." Proceedings of the 21st ACM international conference on Multimedia. ACM, 2013.
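To give a feel for the kind of statistics listed above, the sketch below computes a few of them for a single contour in plain Python. It is not openSMILE and does not reproduce its configuration or output; the function name, the population (divide-by-n) moments, and the omission of the quadratic regression terms are simplifications assumed for this example.

```python
import math

def contour_functionals(contour):
    """Summary statistics of one low-level descriptor contour (a list of floats)."""
    n = len(contour)
    mean = sum(contour) / n
    centred = [x - mean for x in contour]
    var = sum(c * c for c in centred) / n  # population variance
    std = math.sqrt(var)
    # Standardised third and fourth moments; set to 0 for a flat contour.
    skew = sum(c ** 3 for c in centred) / (n * std ** 3) if std else 0.0
    kurt = sum(c ** 4 for c in centred) / (n * std ** 4) if std else 0.0
    # Least-squares line through (frame index, value).
    t_mean = (n - 1) / 2.0
    t_var = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * c for t, c in enumerate(centred)) / t_var if t_var else 0.0
    intercept = mean - slope * t_mean
    # Mean squared error of the linear approximation.
    lin_err = sum((x - (intercept + slope * t)) ** 2 for t, x in enumerate(contour)) / n
    return {
        "mean": mean, "std": std, "skewness": skew, "kurtosis": kurt,
        "min": min(contour), "max": max(contour), "range": max(contour) - min(contour),
        "slope": slope, "intercept": intercept, "linear_error": lin_err,
    }

# Hypothetical short-time energy values for five frames.
stats = contour_functionals([1.0, 2.0, 3.0, 4.0, 5.0])
print(stats)
```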
Lexical Features
The words used by speakers in a meeting hold information about their roles.
We use Linguistic Inquiry and Word Count (LIWC) to analyse the speech transcripts of meeting slices. [3]
Weights for various linguistic categories are obtained through LIWC, resulting in a 43-dimensional feature vector.
[3] Pennebaker, James W., Matthias R. Mehl, and Kate G. Niederhoffer. "Psychological aspects of natural language use: Our words, our selves." Annual review of psychology 54.1 (2003): 547-577.
Structural Features
Duration of speech and the number of speaker turns can hold significant information about the speaker role.
Transcripts of the meeting slices provide the timestamps of word utterances, which we utilise to calculate the fraction of speaking time and the number of turns taken by the speaker.
This results in a 2-dimensional feature vector.
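One possible way to compute these two numbers from word timestamps is sketched below. The slides do not define the terms precisely, so this example assumes that the fraction is taken relative to the slice duration and that a turn is a maximal run of consecutive words (ordered by start time) from the same speaker; the function name and the tuple layout are also hypothetical.

```python
def structural_features(words, speaker, slice_start, slice_end):
    """Return (fraction of slice time spoken, number of turns) for one speaker.

    words : list of (speaker_id, start_sec, end_sec) tuples for one meeting slice
    """
    spoken = sum(end - start for who, start, end in words if who == speaker)
    fraction = spoken / (slice_end - slice_start)
    turns, previous = 0, None
    for who, _, _ in sorted(words, key=lambda w: w[1]):
        # Assumption: a new turn starts whenever the floor passes to this speaker.
        if who == speaker and previous != speaker:
            turns += 1
        previous = who
    return fraction, turns

# Hypothetical word timings for a 10-second slice with speakers A, B and C.
toy_words = [
    ("A", 0.0, 1.0), ("A", 1.0, 2.0), ("B", 2.5, 3.0),
    ("A", 3.0, 4.5), ("C", 5.0, 6.0),
]
print(structural_features(toy_words, "A", 0.0, 10.0))
```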
Experiments and Results Experimental Setup
Experimental Setup
Five-fold cross-validation setup.
The 59 meetings are randomly divided into 5 sets: 4 sets with 12 meetings and 1 set with 11 meetings. For each fold:
3 sets are used for training.
1 set is used as the development set.
1 set is used for testing.
The γ parameter is optimised on the development set. In a grid search, γ is varied from 0 to 1 in steps of 0.1, and the γ that provides the highest accuracy is taken as the optimum.
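The split and the grid search could be organised along the following lines. This is only a sketch: the rotation that decides which set serves as development set in each fold, the random seed, and the `dev_accuracy` callable (which would run the decoder on the development set for a given γ) are all assumptions, since the slides do not specify them.

```python
import random

def make_folds(meeting_ids, n_sets=5, seed=0):
    """Split meetings into n_sets sets and rotate them into train/dev/test folds."""
    ids = list(meeting_ids)
    random.Random(seed).shuffle(ids)
    # Dealing round-robin gives set sizes that differ by at most one
    # (12, 12, 12, 12, 11 for 59 meetings).
    sets = [ids[i::n_sets] for i in range(n_sets)]
    folds = []
    for i in range(n_sets):
        dev_idx = (i + 1) % n_sets  # assumed rotation: next set is the development set
        train = [m for j, s in enumerate(sets) if j not in (i, dev_idx) for m in s]
        folds.append({"train": train, "dev": sets[dev_idx], "test": sets[i]})
    return folds

def pick_gamma(dev_accuracy):
    """Grid search over gamma = 0.0, 0.1, ..., 1.0 using development-set accuracy."""
    grid = [round(0.1 * i, 1) for i in range(11)]
    return max(grid, key=dev_accuracy)

folds = make_folds(range(59))
print([(len(f["train"]), len(f["dev"]), len(f["test"])) for f in folds])
```

A call such as `pick_gamma(lambda g: accuracy_on_dev(g))` would then return the selected γ for a fold, where `accuracy_on_dev` is a hypothetical function supplied by the experimenter.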
We use an HCRF for the classification task. [4]
The HCRF uses 3 hidden states and is trained with 500 function evaluations.
Various combinations of features have been used, namely Acoustic (A), Lexical (L), Structural (S), Acoustic+Lexical (AL), Acoustic+Structural (AS), Lexical+Structural (LS) and Acoustic+Lexical+Structural (ALS).
We use three performance metrics: Precision, Recall and F-score for each role, averaged across the five folds. Recall is also reported averaged across all roles.
The work by Sapru et al. is considered as the baseline scheme. [5]
[4] Python implementation freely available at https://github.com/dirko/pyhcrf
[5] Sapru, Ashtosh, and Hervé Bourlard. "Automatic recognition of emergent social roles in small group interactions." IEEE Transactions on Multimedia 17.5 (2015): 746-760.
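For reference, the three per-role metrics can be computed from true and predicted labels as in the sketch below. These are the standard definitions; the function name, the zero fallback for empty denominators and the toy labels are assumptions for this example, and the averaging across folds is left out.

```python
def per_role_metrics(true_roles, predicted_roles, roles):
    """Return {role: (precision, recall, f_score)} from two equal-length label lists."""
    metrics = {}
    for role in roles:
        pairs = list(zip(true_roles, predicted_roles))
        tp = sum(t == role and p == role for t, p in pairs)
        n_pred = sum(p == role for _, p in pairs)  # slices predicted as this role
        n_true = sum(t == role for t, _ in pairs)  # slices annotated with this role
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true if n_true else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        metrics[role] = (precision, recall, f_score)
    return metrics

# Hypothetical labels for six slices (N = Neutral, S = Supporter, G = Gatekeeper).
toy_true = ["N", "N", "N", "S", "S", "G"]
toy_pred = ["N", "N", "S", "S", "N", "G"]
toy_metrics = per_role_metrics(toy_true, toy_pred, ["G", "N", "S"])
print(toy_metrics)
```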
Experiments and Results Results and Discussion
Results I
Metric      Method     Gatekeeper   Neutral   Protagonist   Supporter
Precision   Baseline   0.50         0.90      0.50          0.67
Precision   Proposed   0.57         0.89      0.57          0.67
Recall      Baseline   0.43         0.91      0.47          0.73
Recall      Proposed   0.44         0.92      0.44          0.75
F-score     Baseline   0.46         0.91      0.46          0.69
F-score     Proposed   0.49         0.91      0.46          0.70

Table: Performance metrics averaged across all feature combinations for all roles.
Precision improves over the baseline from 0.50 to 0.57 for both roles with less data (Gatekeeper and Protagonist).
The recall (accuracy) averaged across all roles is 0.75 for the baseline and 0.76 for the proposed method.
Results II
[Figure panels: Precision, Recall and F-score for each of Gatekeeper, Neutral, Protagonist and Supporter; x-axis: feature combinations A, L, S, LS, AL, AS, ALS; legend: Baseline, Proposed]
Figure: Various performance metrics for different feature combinations for all roles
For the Neutral role, in terms of F-score, the 2-dimensional Structural features perform as well as the other, higher-dimensional feature combinations for both methods.
For the Gatekeeper role, a combination of Lexical and Structural features (LS) performs best in terms of F-score for both methods.
For the Gatekeeper role, the proposed method outperforms the baseline in terms of average F-score irrespective of the feature combination used.
For AL and ALS, the proposed method improves the F-score in 3 roles (Gatekeeper, Supporter, Protagonist).
Results III
Feature combinations:

Fold   A      L      S      LS     AL     AS     ALS
1      0.2    0.3    0.1    0.4    0.6    0.4    0.4
2      0.1    0.2    0.1    0.4    0.5    0.3    0.5
3      0.0    0.3    0.6    0.6    0.4    0.4    0.4
4      0.5    0.3    0.0    0.4    0.4    0.4    0.4
5      0.4    0.2    0.0    0.1    0.4    0.3    0.5
Avg    0.24   0.26   0.16   0.38   0.46   0.36   0.44

Table: Optimal γ values for different folds and feature combinations
The benefit from role transitions varies across feature combinations, which is reflected by the γ values.
The average γ values are highest for AL (0.46) and ALS (0.44), which, as previously noted, improve the F-score for 3 roles.
Conclusions and Future work
Conclusions
Precision of role recognition improves when role transition probability is included.
Improvement in precision is pronounced for Gatekeeper and Protagonist, which occur less frequently.
Future Work
Incorporating role dynamics of 3 or more consecutive slices.
Investigating the effect of including interpersonal dynamics in the model.
Investigating role recognition in realistic situations, since this work used a "constructed" corpus, AMI.