Centre for Data Analytics
Background Knowledge Injection for Interpretable Sequence Classification
Severin Gsponer¹, Luca Costabello², Chan Le Van², Sumit Pai², Christophe Gueret², Georgiana Ifrim¹, Freddy Lecue²
16.09.19
¹ Insight Centre for Data Analytics, University College Dublin
² Accenture Labs, Dublin
Contributions
Groups: Regex-like expansion of traditional k-mers
Emb-SEQL: Injection of Background Knowledge into Sequence Learning Algorithm
Semantic Fidelity: Metric to quantify interpretability
Symbolic Sequence Classification
Sequence                        Class
ABBCBAABCBABCBABBBCBABBABBCB    +1
CCBBABACAABBABBAAABBBCCBBABA    -1
ACBBCACCCBAABCBABCCABCAABCCA    +1
BABACCBABCTABCBABBCAABCBBBCA    ?
Σ = {A, B, C}
k-mers: Consecutive sequences of k symbols
2-mer: AB
5-mer: BCTCB
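A minimal sketch of k-mer extraction, assuming a sequence is a plain Python string with one character per symbol. The function name `kmers` is illustrative and not taken from the talk.

```python
def kmers(sequence, k):
    """Return every run of k consecutive symbols in `sequence`, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(kmers("ABBCB", 2))  # ['AB', 'BB', 'BC', 'CB']
print(kmers("ABBCB", 5))  # ['ABBCB']
```

A sequence shorter than k simply yields an empty list.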
SEQL - Sequence Learner
● Integrated approach
● Learns sparse k-mer based linear models
● Feature space of all possible k-mers
● Gauss-Southwell coordinate descent
● Iteratively add best k-mer to model
● Exploits structure in feature space
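The bullets above can be illustrated with a toy sketch of greedy (Gauss-Southwell) coordinate descent over k-mer features. This is not the SEQL implementation: it enumerates all k-mers up to a small length by brute force instead of exploiting structure in the feature space, and the logistic loss, binary presence features, fixed step size, and all function names are assumptions made for illustration.

```python
import math


def kmer_set(seq, max_k):
    """All distinct k-mers of length 1..max_k that occur in seq."""
    return {seq[i:i + k]
            for k in range(1, max_k + 1)
            for i in range(len(seq) - k + 1)}


def fit_sparse_kmer_model(seqs, labels, max_k=3, iterations=10, step=0.5):
    """Greedy coordinate descent: each iteration updates only the k-mer
    whose logistic-loss gradient has the largest magnitude."""
    feats = [kmer_set(s, max_k) for s in seqs]
    vocab = sorted(set().union(*feats))
    weights = {}  # sparse model: only selected k-mers get an entry
    for _ in range(iterations):
        grad = dict.fromkeys(vocab, 0.0)
        for fs, y in zip(feats, labels):
            margin = y * sum(weights.get(f, 0.0) for f in fs)
            g = -y / (1.0 + math.exp(margin))  # d/ds of log(1 + exp(-y*s))
            for f in fs:
                grad[f] += g
        best = max(vocab, key=lambda f: abs(grad[f]))  # Gauss-Southwell rule
        weights[best] = weights.get(best, 0.0) - step * grad[best]
    return weights


def predict(weights, seq, max_k=3):
    score = sum(weights.get(f, 0.0) for f in kmer_set(seq, max_k))
    return 1 if score >= 0 else -1


train_seqs = ["AAB", "ABA", "CCB", "BCC"]
train_labels = [1, 1, -1, -1]
model = fit_sparse_kmer_model(train_seqs, train_labels)
print(model)  # a handful of weighted k-mers
print([predict(model, s) for s in train_seqs])
```

Because at most one k-mer is added per iteration, the number of features in the model never exceeds the number of iterations, which is what keeps the model sparse and readable.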
SEQL - Sequence Learner with Groups
● Exploit structure in symbol space
● Use groups to gain more flexibility
● Groups are built by combining basic symbols with OR
● Groups predefined by user or automatically generated
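A minimal sketch of how a k-mer with Groups can be matched, assuming a pattern is a list of symbol sets where each set is the OR of its members (similar to a regex character class such as `A[BC]B`). The representation and function names are illustrative assumptions.

```python
def matches_at(pattern, sequence, start):
    """True if every position of `pattern` accepts the aligned symbol."""
    if start + len(pattern) > len(sequence):
        return False
    return all(sequence[start + i] in group
               for i, group in enumerate(pattern))


def contains(pattern, sequence):
    """True if `pattern` matches anywhere in `sequence`."""
    last_start = len(sequence) - len(pattern)
    return any(matches_at(pattern, sequence, s)
               for s in range(last_start + 1))


# A, then (B OR C), then B: one feature that covers both ABB and ACB.
pattern = [{"A"}, {"B", "C"}, {"B"}]
print(contains(pattern, "CABB"))  # True
print(contains(pattern, "AACB"))  # True
print(contains(pattern, "AAAB"))  # False
```

A single grouped feature stands in for several plain k-mers, which is where the extra flexibility comes from. The same functions work when `sequence` is a list of multi-character symbols.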
Automatic Group Generation
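One way groups might be generated automatically, sketched under strong assumptions: each symbol has an embedding vector, and symbols whose embeddings are similar are merged into one group. The toy vectors, the cosine threshold, and the greedy seed-based clustering are all illustrative choices, not the procedure used in the talk.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def build_groups(embeddings, threshold=0.9):
    """Greedy clustering: a symbol joins the first group whose seed
    (first member) is similar enough, otherwise it starts a new group."""
    groups = []
    for symbol, vec in embeddings.items():
        for group in groups:
            if cosine(vec, embeddings[group[0]]) >= threshold:
                group.append(symbol)
                break
        else:
            groups.append([symbol])
    return [set(g) for g in groups]


# Hypothetical 2-d embeddings, made up for this example.
toy_embeddings = {
    "milk": (1.0, 0.1),
    "cup": (0.9, 0.2),
    "drawer": (0.1, 1.0),
    "door": (0.2, 0.9),
}
groups = build_groups(toy_embeddings)
print(groups)
```

The threshold controls group size: a lower value produces fewer, larger groups.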
Emb-SEQL Pipeline
Interpretability
● Interpretability is crucial in some problems
● Accuracy-interpretability trade-off
● Measuring interpretability of models is an open question
Semantic Fidelity intuition:
● Positive features should be “close” to target class
● Negative features should be “close” to non-target class
Functionally grounded protocol as proxy measurement
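The intuition above can be made concrete with a small sketch. This is not the Semantic Fidelity metric from the talk; it is an illustrative agreement score that assumes every feature and both classes have embedding vectors, uses cosine distance, and weights each feature by the magnitude of its model weight. All names and vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def agreement_score(weights, feature_vecs, target_vec, other_vec):
    """Weighted share of features whose sign agrees with their position:
    positive weights closer to the target class, negative weights closer
    to the non-target class."""
    total = 0.0
    agree = 0.0
    for feature, w in weights.items():
        if w == 0:
            continue
        d_target = cosine_distance(feature_vecs[feature], target_vec)
        d_other = cosine_distance(feature_vecs[feature], other_vec)
        closer_to_target = d_target < d_other
        if (w > 0) == closer_to_target:
            agree += abs(w)
        total += abs(w)
    return agree / total if total else 0.0


# Toy data, invented for this example.
feature_vecs = {"milk": (1.0, 0.1), "cup": (0.9, 0.2), "drawer": (0.1, 1.0)}
target_vec, other_vec = (1.0, 0.0), (0.0, 1.0)

consistent = {"milk": 1.0, "cup": 0.5, "drawer": -0.5}
inconsistent = {"milk": 1.0, "cup": 0.5, "drawer": 0.5}
print(agreement_score(consistent, feature_vecs, target_vec, other_vec))
print(agreement_score(inconsistent, feature_vecs, target_vec, other_vec))
```

A model whose feature signs line up with the embedding geometry scores higher than one that gives a positive weight to a feature lying near the non-target class.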
k-mer - Target Class Distance
Weighted Target Class Distance
Semantic Fidelity
Experiment
Opportunity - Human Composite Activity Recognition (HAR) [1]:
- Predict composite activity
- Multiclass classification problem
- Combinations of 5 low-level feature categories (|Σ| > 1400 symbols)
PhosphoELM - Protein Classification [2]:
- Binary classification problem
- Predict kinase group
- Amino acid sequences (|Σ| = 21 symbols; 438 sequences)
Composite Activity Recognition
Predict the composite activity from a sequence of low-level activities
Sequence:
<stand, open, drawer, none, none>
<stand, none, none, grab, milk>
<stand, none, none, put, milk>
<stand, close, drawer, none, none>
...
<sit, move, plate, move, cup>
Composite activity (?): Coffee time, Early morning, Cleanup, Sandwich time, Relaxing
Composite Activity Recognition
Sandwich time model
Atomic symbol embeddings
Results - Semantic Fidelity
PCA Visualization
Target: Coffee time
Embedding: WordNet
Results - Classification Quality
SCIS_MA and HMM results from [2]
Achievements & Conclusion
● Introduction of Groups: regex-like k-mer symbols
● Generation of Groups from background knowledge sources
● Emb-SEQL, a method to learn sparse linear models
● Semantic Fidelity, a way to measure interpretability
● Background knowledge injection improves interpretability, as measured by Semantic Fidelity, without hurting the accuracy of the learned model
Limitations & Future Work
● High memory demand of Emb-SEQL for large Groups
● Clustering method and Group size are crucial
● Background knowledge source is needed
● Semantic Fidelity for non-linear models
● Human-based evaluation of Semantic Fidelity
Thank you!
Please email [email protected] if you have further questions
This work was funded by Science Foundation Ireland (SFI) under Insight Centre for Data Analytics (grant 12/RC/2289)
References
[1] D. Roggen, A. Calatroni, M. Rossi, T. Holleczek, K. Förster, G. Tröster, P. Lukowicz, D. Bannach, G. Pirkl, A. Ferscha, J. Doppler, C. Holzmann, M. Kurz, G. Holl, R. Chavarriaga, H. Sagha, H. Bayati, M. Creatura, and J. d. R. Millán. Collecting complex activity datasets in highly rich networked sensor environments. In 7th International Conference on Networked Sensing Systems (INSS).
[2] C. Zhou, B. Cule, and B. Goethals. Pattern based sequence classification. IEEE Transactions on Knowledge and Data Engineering, 28(5):1285–1298, 2016.