![Page 1: RECOVERING TRACEABILITY LINKS IN REQUIREMENTS DOCUMENTS …horacek/Ontologies-Presentation.pdf · RECOVERING TRACEABILITY LINKS IN REQUIREMENTS DOCUMENTS Zeheng Li Mingrui Chen LiGuo](https://reader031.vdocuments.mx/reader031/viewer/2022022423/5a9e14077f8b9a29228d45e2/html5/thumbnails/1.jpg)
RECOVERING TRACEABILITY LINKS IN REQUIREMENTS DOCUMENTS
Zeheng Li Mingrui Chen LiGuo Huang
Department of Computer Science & Engineering
Southern Methodist University
Dallas, TX 75275-0122
Vincent Ng
Human Language Technology Institute
University of Texas at Dallas
Richardson, TX 75083-0688
Presented By
Narendra Narisetti
Cuk,2552738
Introduction
• Software system development begins with the evaluation and refinement of requirements.
• These requirements are documented in natural language in “requirements documents”.
• The requirements are then refined with additional design details and implementation information.
• Linking requirements in which one is a refinement of the other is called “requirements traceability”.
Types of Requirements
• Specifically, requirements can be divided into two types:
1. High-level requirements (coarse-grained)
2. Low-level requirements (fine-grained)
• Requirements traceability links each high-level requirement with all the low-level requirements that refine it.
• The traceability mapping is many-to-many: a high-level requirement can be refined by several low-level requirements, and a low-level requirement can refine several high-level ones.
Example: Pine email system by Sultanov and Hayes
Figure 1: Sample of high- and low-level requirements
Drawbacks:
• Information that is irrelevant to establishing one link may be relevant to establishing another link involving the same requirement.
Example: The Description section in UC01 is irrelevant to linking with HR02 but relevant to linking with HR01.
• A link can exist between a pair of requirements even if they share no similar or overlapping content words.
Requirements Traceability Approaches:
• Approaches are classified into two types:
Manual approaches: Requirements traceability links are recovered manually by developers.
Automated approaches: These depend on information retrieval (IR) techniques to generate links automatically.
Automated approaches
• Traceability prediction is treated as a binary classification task.
• The similarity between high-level and low-level requirements is measured.
• A positive classification means the high-level and low-level requirements are linked.
• Information retrieval (IR) techniques are used for traceability link prediction.
Supervised Learning Methods
• Supervised methods are employed with two types of human-supplied knowledge:
i) Annotator rationales: These contain the information that the human annotator found relevant to establishing a link.
We use these rationales to create additional training instances for the learner.
ii) Hand-built ontology: It is defined by a domain expert and used to create additional training features for the learner. (see next slide)
Hand-built ontology of Pine
Hand-built Ontology
Why are ontology-based features useful for traceability links?
1. The ontology contains only verbs and nouns that appear in the training data.
2. The verbs and nouns in the ontology are those deemed relevant for link identification by the domain expert.
3. The verb and noun clusters provide robust generalizations of the words/phrases.
Manual vs. Automated
Manual Approach
1. System analysts use requirements management tools to build a requirements traceability matrix (RTM).
2. Ex: Rational DOORS, Rational RequisitePro, CASE.
3. It is human-intensive and therefore error-prone given a large set of requirements.
Automated Approach
1. Calculates textual similarity between requirements.
Ex: cosine coefficient, Jaccard coefficient
2. Ex: tf-idf-based vector space model, Latent Dirichlet Allocation (LDA).
3. Depends on IR techniques.
Datasets
For our evaluation we use two datasets: the Pine email system and a second dataset, “WorldVistA”, an electronic health information system developed by the U.S. Veterans Administration.
Table 1: Statistics on the Datasets
Manual ontology for WorldVistA
Manual ontology for WorldVistA
Baseline Systems
• Different baseline methods are employed for traceability prediction:
a) Unsupervised baselines: tf-idf, LDA
b) Supervised baselines: word pairs, LDA-induced topic pairs
Unsupervised Baselines
a) The tf-idf baseline: If the cosine similarity between two documents is greater than a given threshold, the pair is classified as positive.
b) The LDA baseline: Each document is represented as a vector of length n whose entries are the probabilities that the document belongs to each of the n topics; cosine similarity is then thresholded as in the tf-idf baseline.
Note: Here LDA is trained to produce n topics.
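The tf-idf baseline above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the plain `log(N/df)` idf weighting, and the toy requirement texts are assumptions, and the threshold value is left to the caller.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (term -> weight) for tokenized documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_link(u, v, threshold):
    """Classify a requirement pair as linked if similarity exceeds threshold."""
    return cosine(u, v) > threshold


# Hypothetical toy requirements (not taken from the Pine dataset).
high = ["send", "mail", "message"]
low_a = ["compose", "send", "message"]
low_b = ["print", "folder", "list"]
vec_high, vec_a, vec_b = tfidf_vectors([high, low_a, low_b])
print(predict_link(vec_high, vec_a, threshold=0.1))
print(predict_link(vec_high, vec_b, threshold=0.1))
```

The LDA baseline would reuse `cosine` and `predict_link` unchanged, with each document's n topic probabilities in place of the tf-idf weights.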
Supervised Baseline
• An instance is a pair of high-level and low-level requirements.
• An instance is positive if the two requirements are linked; otherwise it is negative.
• Instances can be represented using two types of features:
a) Word pairs: Each feature is a pair of words taken from the training instances, one from the high-level requirement and one from the low-level requirement.
b) LDA-induced topic pairs: Each feature is a pair of topics, and its value is positive if the two topics are the most probable topics of the high-level and low-level requirements, respectively.
Note: Here LDA is trained, with the additional parameter C, to produce n topics.
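As a rough illustration of the two feature types, the sketch below builds word-pair and topic-pair features for one instance. The function names and the optional `vocabulary` filter are assumptions made for this example; the topic probabilities would come from a trained LDA model, which is not shown.

```python
from itertools import product


def word_pair_features(high_tokens, low_tokens, vocabulary=None):
    """Return the set of (high-level word, low-level word) pairs for an instance.

    If `vocabulary` (pairs seen in training instances) is given, only pairs
    from that set are kept.
    """
    pairs = set(product(set(high_tokens), set(low_tokens)))
    if vocabulary is not None:
        pairs &= vocabulary
    return pairs


def topic_pair_feature(high_topic_probs, low_topic_probs):
    """Return (most probable topic of high-level, most probable topic of low-level)."""
    best_high = max(range(len(high_topic_probs)), key=high_topic_probs.__getitem__)
    best_low = max(range(len(low_topic_probs)), key=low_topic_probs.__getitem__)
    return best_high, best_low


# Hypothetical instance and topic distributions, for illustration only.
print(sorted(word_pair_features(["send", "mail"], ["compose", "mail"])))
print(topic_pair_feature([0.1, 0.7, 0.2], [0.6, 0.3, 0.1]))
```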
Exploiting Rationales
Extension:
• To generate extra training instances, i.e. pseudo instances, we need to extend the baseline systems.
• We train a binary SVM classifier with a linear kernel on the training data, setting all parameters to their default values except the C parameter.
Evaluation:
• We use five-fold cross-validation, with three folds as training data, one fold as the development set, and one fold for evaluation.
• The F-score on the development set gives the performance of the classifier.
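The 3/1/1 fold split can be sketched as follows. The slide does not say which fold serves as the development set in each round, so taking the fold after the evaluation fold is an assumption of this example, as is the function name.

```python
def fold_rotations(n_folds=5):
    """Yield (train_folds, dev_fold, test_fold) fold indices for each round.

    Assumption: the development fold is the one following the test fold.
    """
    for test_fold in range(n_folds):
        dev_fold = (test_fold + 1) % n_folds
        train_folds = [f for f in range(n_folds) if f not in (test_fold, dev_fold)]
        yield train_folds, dev_fold, test_fold


for train_folds, dev_fold, test_fold in fold_rotations():
    print("train:", train_folds, "dev:", dev_fold, "test:", test_fold)
```

In each round the classifier would be trained on the three training folds, scored by F-score on the development fold, and evaluated on the remaining fold.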
Rationale in Traceability Prediction
• According to Zaidan et al., a rationale is a human-annotated text fragment that motivated an annotator to assign a particular label to a training document.
• In traceability prediction, rationales are identified only for positive instances.
• This is because an instance is negative due to the absence of evidence that the two requirements should be linked, rather than the presence of evidence that they should not be linked.
Creating Negative Pseudo Instances
• Steps for creating negative pseudo instances:
i) Select a pair of linked requirements.
ii) Remove the rationales from the requirements; only text that does not indicate the link remains.
iii) The remaining text fragments form pseudo instances, which are negative in nature.
iv) From each positive instance, three types of negative pseudo instances are possible:
a) Removing all rationales from the high-level requirement.
b) Removing all rationales from the low-level requirement.
c) Removing all rationales from both requirements.
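The three variants a) to c) can be sketched as below. Representing a requirement as a token list and its rationales as a set of token positions is an assumption of this example, as are the function names and the toy data.

```python
def remove_rationales(tokens, rationale_positions):
    """Drop every token whose position is marked as part of a rationale."""
    return [tok for i, tok in enumerate(tokens) if i not in rationale_positions]


def negative_pseudo_instances(high, high_rationales, low, low_rationales):
    """Create the three negative pseudo instances from one linked pair."""
    high_stripped = remove_rationales(high, high_rationales)
    low_stripped = remove_rationales(low, low_rationales)
    return [
        (high_stripped, low),           # a) rationales removed from high-level only
        (high, low_stripped),           # b) rationales removed from low-level only
        (high_stripped, low_stripped),  # c) rationales removed from both
    ]


# Hypothetical linked pair; positions 1-2 and 0-1 are the annotated rationales.
high = ["user", "sends", "message", "quickly"]
low = ["send", "message", "via", "menu"]
for pseudo_high, pseudo_low in negative_pseudo_instances(high, {1, 2}, low, {0, 1}):
    print(pseudo_high, pseudo_low)
```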
Creating Positive Pseudo Instances
• Steps for creating positive pseudo instances:
i) Select a pair of linked requirements.
ii) Remove the text fragments in the pair that are not part of a rationale.
iii) The remaining text fragments form the positive pseudo instances.
iv) Add a constraint to the SVM learner so that pseudo instances are classified with less confidence.
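Steps i) to iii) amount to keeping only the rationale text of a linked pair. A minimal sketch, assuming each requirement is a token list and its rationales are given as a set of token positions (both the representation and the function names are assumptions of this example):

```python
def keep_rationales(tokens, rationale_positions):
    """Keep only the tokens whose position is marked as part of a rationale."""
    return [tok for i, tok in enumerate(tokens) if i in rationale_positions]


def positive_pseudo_instance(high, high_rationales, low, low_rationales):
    """Create the positive pseudo instance from one linked pair."""
    return keep_rationales(high, high_rationales), keep_rationales(low, low_rationales)


# Hypothetical linked pair with annotated rationale positions.
high = ["user", "sends", "message", "quickly"]
low = ["send", "message", "via", "menu"]
print(positive_pseudo_instance(high, {1, 2}, low, {0, 1}))
```

Step iv) is not covered here, since it concerns the learner's constraints rather than instance creation.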
Soft-margin SVM formulation
i) Positive instances:
ii) Positive pseudo instances:
iii) Negative pseudo instances:
• x_i = training example
• C = error penalty
• v_i, u_ij = positive/negative pseudo instances created from x_i
• c_i ∈ {−1, +1} = class label
• ξ_i = slack variable
• μ = margin size
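The constraint formulas themselves are not present in the text of this slide. The symbols listed above match a soft-margin SVM with extra constraints for pseudo instances, so one formulation consistent with them is sketched below, writing x_i, c_i, v_i, u_ij for the Xi, Ci, Vi, uij of the list. This is an assumed reconstruction and not necessarily the authors' exact formulation: the separate slack variables for pseudo instances, the shared penalty C, and the omission of a bias term are all assumptions. Only the first constraint is the standard soft-margin form.

```latex
\begin{aligned}
\min_{\mathbf{w},\,\xi}\quad
  & \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
    + C \sum_{i} \xi_{i}
    + C \sum_{i} \xi_{v_i}
    + C \sum_{i,j} \xi_{ij} \\
\text{s.t.}\quad
  & c_{i}\,(\mathbf{w}\cdot\mathbf{x}_{i}) \ge 1 - \xi_{i}
    && \text{(original training instances)} \\
  & \mathbf{w}\cdot\mathbf{v}_{i} \ge \mu\,(1 - \xi_{v_i})
    && \text{(positive pseudo instances)} \\
  & -\,\mathbf{w}\cdot\mathbf{u}_{ij} \ge \mu\,(1 - \xi_{ij})
    && \text{(negative pseudo instances)} \\
  & \xi_{i},\ \xi_{v_i},\ \xi_{ij} \ge 0
\end{aligned}
```

Under this reading, choosing μ < 1 asks the learner to separate pseudo instances by a smaller margin than the original instances, which matches the idea of classifying pseudo instances with less confidence.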
Exploiting an Ontology
• To generate additional features for the SVM learner, we employ a hand-built ontology that contains verb and noun clusters.
• Here, the features of each training instance are derived
i) from the high-level and low-level requirements
ii) from the clusters listed in the ontology.
Ontology-Based Features
• Verb pairs and noun pairs: focus on the verbs/nouns that are relevant to traceability prediction.
• Verb group pairs and noun group pairs: replace verbs/nouns with cluster IDs; create a binary file with the cluster IDs; best performance.
• Dependency pairs: a combination of a verb and a noun connected by a dependency relation, obtained using the Stanford dependency parser.
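The group-pair idea, replacing words with cluster IDs before pairing them, can be sketched as follows. The cluster IDs and their member verbs are hypothetical and are not taken from the Pine or WorldVistA ontologies; the function name is also an assumption.

```python
def group_pair_features(high_words, low_words, cluster_of):
    """Pair the cluster IDs of high-level words with those of low-level words.

    Words that are not in the ontology (absent from `cluster_of`) are ignored.
    """
    high_ids = {cluster_of[w] for w in high_words if w in cluster_of}
    low_ids = {cluster_of[w] for w in low_words if w in cluster_of}
    return {(h, l) for h in high_ids for l in low_ids}


# Hypothetical verb clusters for illustration.
verb_cluster = {"send": "V1", "forward": "V1", "compose": "V2", "write": "V2"}

print(group_pair_features(["send"], ["compose"], verb_cluster))
print(group_pair_features(["forward"], ["write"], verb_cluster))
```

Both calls yield the same feature even though the two instances share no verbs, which illustrates how cluster IDs generalize over individual words.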
Learning the Ontology
Is it possible to learn an ontology rather than hand-building it?
Yes, with a three-step procedure:
Step 1: Verb/noun selection
Select verbs, nouns, and noun phrases from the training data such that each one
a) appears more than once,
b) contains at least three characters (this excludes short words, e.g. be, is), and
c) appears in the high-level requirements but not in the low-level ones, or vice versa.
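The three selection criteria can be expressed as a simple filter. This sketch assumes the verbs and nouns have already been extracted by a part-of-speech tagger, which is not shown; the function name and the toy word lists are assumptions.

```python
from collections import Counter


def select_candidates(high_words, low_words):
    """Apply selection criteria a) to c) to candidate verbs/nouns.

    `high_words` and `low_words` are the candidate words found in the
    high-level and low-level requirements of the training data.
    """
    counts = Counter(high_words) + Counter(low_words)
    high_set, low_set = set(high_words), set(low_words)
    return {
        word
        for word, count in counts.items()
        if count > 1                                   # a) appears more than once
        and len(word) >= 3                             # b) at least three characters
        and ((word in high_set) != (word in low_set))  # c) in exactly one level
    }


# Hypothetical candidate words for illustration.
high_words = ["send", "send", "is", "is", "message", "archive", "print"]
low_words = ["message", "compose", "compose", "archive"]
print(sorted(select_candidates(high_words, low_words)))
```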
Learning the Ontology
• Step 2: Verb/noun representation
a) Represent each verb by the set of nouns/NPs related to it, obtained using the Stanford dependency parser.
b) Similarly, represent each noun by the set of verbs collected in Step 1.
• Step 3: Clustering
a) Cluster the verbs and the nouns separately using the single-link algorithm.
b) At each step, this algorithm merges the two most similar clusters according to a similarity measure, and it stops when the desired number of clusters is reached.
This gives the number of induced clusters for the given datasets.
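Step 3 can be sketched as a small single-link agglomerative clustering routine. The slide does not name the similarity measure, so cosine similarity over the noun sets from Step 2 is an assumption here, as are the function names and the toy verb representations.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors (item -> weight)."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def single_link_clusters(vectors, n_clusters):
    """Merge the two most similar clusters until n_clusters remain.

    Single-link: the similarity of two clusters is the similarity of their
    two most similar members.
    """
    clusters = [[item] for item in vectors]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = max(
                    cosine(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        # j > i, so removing cluster j does not shift index i.
        clusters[i].extend(clusters.pop(j))
    return clusters


# Hypothetical verbs, each represented by the nouns it governs (Step 2).
verbs = {
    "send": {"message": 1, "mail": 1},
    "forward": {"message": 1, "mail": 1, "copy": 1},
    "print": {"folder": 1, "list": 1},
    "display": {"folder": 1, "index": 1},
}
print(single_link_clusters(verbs, n_clusters=2))
```

The pairwise search is quadratic per merge, which is adequate for ontology-sized vocabularies; a library routine would be preferable for larger inputs.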
Evaluation
• In the evaluation, we compare the F-scores of the different methods, which depend on the combination of noun clustering and verb clustering and on the C value.
• The F-score depends on two terms:
i) Recall (R): the percentage of links in the gold standard that are recovered by our system.
ii) Precision (P): the percentage of links recovered by our system that are correct.
• The F-score is the harmonic mean of recall and precision.
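These definitions translate directly into code. In the sketch below, links are (high-level ID, low-level ID) pairs; HR01, HR02, and UC01 appear earlier in the slides, while the other IDs and the link sets themselves are made up for illustration.

```python
def precision_recall_f(predicted_links, gold_links):
    """Compute precision, recall, and F-score over sets of links."""
    predicted, gold = set(predicted_links), set(gold_links)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical system output and gold standard.
predicted = {("HR01", "UC01"), ("HR01", "UC02"), ("HR02", "UC03")}
gold = {("HR01", "UC01"), ("HR02", "UC03"), ("HR02", "UC04"), ("HR03", "UC05")}
print(precision_recall_f(predicted, gold))
```

Here two of the three predicted links are correct (P = 2/3) and two of the four gold links are recovered (R = 1/2), giving F = 4/7.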
Result of Supervised Systems
Conclusion
• Traceability prediction is a crucial task, which we address using annotator rationales and an ontology.
• The proposed approach reduces relative error by 11.1-19.7% compared to the supervised baseline.
• Induced clusters give F-scores competitive with manual clusters.
• The results might change depending on the dataset.