Post on 21-Jan-2021

Page 1: iTelos - Inception · 2020. 12. 27. · The Inception phase aims to define what are called Competency Questions (cq), that at the end of this phase will became Competency Queries

iTelos - Inception

W3.L6.M3.T7

Contents

1 Top level view

2 Schema Inception

3 Data Inception

4 Evaluation

5 Phase Iterations

6 Languages & Standards

7 Tools

8 Deliverables

9 Examples

Top level view

Figure: Inception Diagram

where:

0: Purpose Documentation.

i: Competency Queries with Data Objects definition.

ii: Preliminary datasets and informal metadata.

The Inception phase aims to define what are called Competency Questions (cq), which by the end of this phase will become Competency Queries defining all kinds of queries that can be generated to solve the problem.

The Knowledge Engineer and the Data Scientist are in charge of the Schema Inception and Data Inception activities, respectively.

Schema Inception

In Schema Inception, the main sub-activities are the following:

Competency Questions Definition.

Data Object Selection.

Generalized Query Definition.

Competency Questions Definition - 1/2

In this first sub-activity, the Knowledge Engineer uses the Purpose documentation to define the cq.

The Knowledge Engineer categorizes the cq collected during the phase iterations according to the following data typologies:

Core data.

Common data.

Contextual data.

Competency Questions Definition - 2/2

It is important to note that:

The three data typologies listed above define a dependency hierarchy.

The Common data have the strongest impact in terms of dependencies.

The Core data are the most important entities with regard to the project’s solution.

The Contextual data are a specification of the previous typologies of data.
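As an illustration only, the categorization of cq by data typology could be recorded in a small structure like the one below. The class names, fields, and sample questions are hypothetical and are not part of iTelos; the enum order simply mirrors the order in which the phase iterations address the typologies (Common, Core, Contextual).

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Typology(Enum):
    # Order mirrors the phase iterations: Common, then Core, then Contextual.
    COMMON = 0
    CORE = 1
    CONTEXTUAL = 2


@dataclass(frozen=True)
class CompetencyQuestion:
    text: str            # the informal question (hypothetical examples below)
    typology: Typology   # typology assigned by the Knowledge Engineer


def group_by_typology(cqs):
    """Return a dict mapping each typology to the list of cq texts in it."""
    groups = defaultdict(list)
    for cq in cqs:
        groups[cq.typology].append(cq.text)
    return dict(groups)


# Hypothetical cq, invented for illustration.
cqs = [
    CompetencyQuestion("Which city hosts the event?", Typology.COMMON),
    CompetencyQuestion("Which events take place this week?", Typology.CORE),
    CompetencyQuestion("What is the ticket price of the event?", Typology.CONTEXTUAL),
]
grouped = group_by_typology(cqs)
```

A spreadsheet with one row per cq and a typology column would carry the same information.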

Data Object Selection

Data Object Selection starts from the cq definitions.

The goal of this internal step is to identify and list the main data objects involved in the questions defined before.

These general objects are the first version of what will be called etypes; in the current phase they are used in the next sub-activity to improve the cq definitions.

Generalized Query Definition

This sub-activity aims to define more precisely all kinds of queries that can be useful in reaching the solution.

To obtain this result, the Knowledge Engineer:

uses the previously defined cq together with the data objects defined in the previous step,

proceeds to write a list of queries in a more precise format (i.e. an SQL-like language).
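A minimal sketch of how a cq and its data objects might be turned into an SQL-like generalized query is shown below. The helper `generalized_query`, the `Event` data object, its attributes, and the sample cq are all hypothetical; the slides do not prescribe a specific syntax beyond "SQL-like".

```python
def generalized_query(data_object, attributes, condition=None):
    """Render an SQL-like query string for one data object.

    The string is a documentation aid only; it is not sent to any database.
    """
    query = f"SELECT {', '.join(attributes)} FROM {data_object}"
    if condition:
        query += f" WHERE {condition}"
    return query


# Hypothetical cq: "Which events take place in a given city?"
# Hypothetical data object: Event, with attributes name, start_date, city.
q = generalized_query("Event", ["name", "start_date"], "city = :city")
print(q)  # SELECT name, start_date FROM Event WHERE city = :city
```

The placeholder `:city` keeps the query general, so the same pattern answers the cq for any city.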

Data Inception

In Data Inception, the sub-activities are the following:

Data sets Selection.

Data sets Metadata Enrichment.

Data sets Selection

The Data Scientist has to analyze all the data sources listed in the Purpose documentation.

The Data Scientist (working in parallel with the Knowledge Engineer) has to identify the correct datasets with respect to the users and scenarios previously defined.

Some data sources might not be repositories but specific web locations, from which the data has to be extracted through a scraping procedure.
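The scraping step for a web location could look roughly like the sketch below. It uses Python's standard-library `html.parser` as a stand-in for a dedicated scraping library, and it parses an inline HTML fragment rather than downloading a page; the fragment and its values are hypothetical.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # list of rows, each a list of cell strings
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:   # tolerate a cell that appears before any row
                self.rows.append([])
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Only text found inside a <td> is kept.
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


# Hypothetical page fragment; a real scraper would first download the page.
html = (
    "<table>"
    "<tr><td>Alpha</td><td>10</td></tr>"
    "<tr><td>Beta</td><td>20</td></tr>"
    "</table>"
)
parser = TableCellParser()
parser.feed(html)
rows = parser.rows
```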

Data sets Metadata Enrichment

The Data Scientist is tasked with enriching the selected and extracted datasets with record-level metadata.

Part of the metadata might have to be discovered by studying the problem context.
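One possible reading of record-level enrichment is sketched below: each extracted record receives extra fields describing where and when it was obtained. The field names `_source` and `_retrieved_on`, the source name, and the sample records are assumptions made for illustration; the slides do not specify which metadata to attach.

```python
from datetime import date


def enrich(records, source, retrieved_on):
    """Return copies of the records with record-level metadata attached."""
    return [
        {**record, "_source": source, "_retrieved_on": retrieved_on.isoformat()}
        for record in records
    ]


# Hypothetical records and source name.
records = [{"name": "Alpha"}, {"name": "Beta"}]
enriched = enrich(records, source="example-portal", retrieved_on=date(2021, 1, 21))
```

The original records are left untouched, so the raw extraction and the enriched version can be kept side by side.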

Evaluation - 1/4

The main aspect of Inception evaluation is the alignment between the informal project knowledge collected and the datasets collected.

The output of this phase consists of the cqs plus the preliminary list of data sets, with informal metadata, selected from the problem documentation.

The classes and properties are collected into two types of sets, S_c and S_p, from the cqs and from the reference alignment data set.

Evaluation - 2/4

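Definition

Coverage (Cov) of a set α onto a set β is the fraction of α that is also contained in β, computed as one minus the fraction of α left over after removing the elements of β. Here α − β denotes set difference.

$$\mathrm{Cov}(\alpha, \beta) = 1 - \frac{|\alpha - \beta|}{|\alpha|}$$

Definition

Flexibility (Flx) of a set α onto a set β is the fraction of β left over after removing the elements of α.

$$\mathrm{Flx}(\alpha, \beta) = \frac{|\beta - \alpha|}{|\beta|}$$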
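The two definitions translate directly into set operations. The sketch below is one way to compute them in Python; the function names and the sample class sets are hypothetical, and the handling of empty sets (raising an error) is an assumption, since the definitions leave that case open.

```python
def coverage(alpha, beta):
    """Cov(alpha, beta) = 1 - |alpha - beta| / |alpha|."""
    alpha, beta = set(alpha), set(beta)
    if not alpha:
        raise ValueError("coverage is undefined for an empty source set")
    return 1 - len(alpha - beta) / len(alpha)


def flexibility(alpha, beta):
    """Flx(alpha, beta) = |beta - alpha| / |beta|."""
    alpha, beta = set(alpha), set(beta)
    if not beta:
        raise ValueError("flexibility is undefined for an empty target set")
    return len(beta - alpha) / len(beta)


# Hypothetical class sets: one taken from the cqs, one from a reference data set.
cq_classes = {"Event", "City", "Venue", "Ticket"}
ref_classes = {"Event", "City", "Person", "Organization", "Place"}

cov = coverage(cq_classes, ref_classes)     # 2 of 4 cq classes are missing: 1 - 2/4
flx = flexibility(cq_classes, ref_classes)  # 3 of 5 reference classes are left over: 3/5
print(cov, flx)  # 0.5 0.6
```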

Evaluation - 3/4

Coverage computes the ratio of |α ∩ β| to |α|, that is, the percentage of the source set α that belongs to the intersection of the two sets.

Cov(C_{c/p}, D_{c/p}) evaluates the percentage of the cqs that overlaps with the reference alignment data set, where C and D stand for the cqs and the reference alignment data set.

c/p stands for the type of set: classes (c) or properties (p).
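The reading of Coverage as the ratio of the intersection to the source set follows from its definition in one step. Since α − β and α ∩ β are disjoint and together make up α, their sizes add up to |α|:

```latex
\begin{aligned}
|\alpha| &= |\alpha - \beta| + |\alpha \cap \beta| \\
\mathrm{Cov}(\alpha, \beta)
  &= 1 - \frac{|\alpha - \beta|}{|\alpha|}
   = \frac{|\alpha| - |\alpha - \beta|}{|\alpha|}
   = \frac{|\alpha \cap \beta|}{|\alpha|}
\end{aligned}
```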

Evaluation - 4/4

Flexibility returns the ratio of |β − α| to |β|, that is, the percentage of the target set β left over after removing the elements of α.

Flx(C_{c/p}, D_{c/p}) evaluates the leftover percentage of the reference alignment data set.

Phase Iterations

In the Inception phase, at least four iterations are required to produce a high-quality output.

The iterative process provides, at each iteration, a more defined and precise input to the activities.

In more detail, each iteration focuses on a specific data typology among the three already mentioned: Common, Core and Contextual.

Iteration Zero

In the first iteration, the main schema-level output is a document containing the general queries and the general object definitions for the Common data typology.

At the data level, instead, the first iteration aims to identify any missing data sources.

Iteration One

The Knowledge Engineer has to define the general queries and the general object definitions for the Core data typology.

At the data level, the Data Scientist has to extract the datasets and collect metadata for the Common typology.

The evaluation activity at the end of the iteration verifies the datasets extracted using the schema elements defined in the parallel activity.

Iteration Two

The Knowledge Engineer has to define the general queries and the general object definitions for the Contextual data typology.

At the data level, the Data Scientist has to extract the datasets and collect metadata for the Core typology.

The evaluation activity at the end of the iteration verifies the datasets extracted using the schema elements defined in the parallel activity.

Iteration Three

The Knowledge Engineer has to perform a general check on the documentation produced in the previous iteration.

At the data level, the Data Scientist has to extract the datasets and collect metadata for the Contextual typology.
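The four iterations described above can be summarized as a small schedule. The sketch below transcribes the schema-level and data-level focus of each iteration from the slides and checks the pattern they imply: the data level handles each typology one iteration after the schema level. The variable and function names are hypothetical.

```python
# Schedule transcribed from the four iteration slides.
SCHEDULE = [
    # (iteration, schema-level focus, data-level focus)
    (0, "Common", "identify missing data sources"),
    (1, "Core", "Common"),
    (2, "Contextual", "Core"),
    (3, "general check", "Contextual"),
]


def data_lags_schema(schedule):
    """Check that each typology reaches the data level one iteration after the schema level."""
    schema_at = {focus: i for i, focus, _ in schedule}
    data_at = {focus: i for i, _, focus in schedule}
    return all(
        data_at[t] == schema_at[t] + 1
        for t in ("Common", "Core", "Contextual")
    )


lag_ok = data_lags_schema(SCHEDULE)
```

This lag is what lets the evaluation at the end of an iteration check the extracted datasets against schema elements that have already been defined.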

Languages & Standards

In this phase, the Knowledge Engineer has to produce the Generalized Query Definitions as the final schema-level output.

The Data Scientist starts to collect the datasets needed from the data sources identified in the previous phase.

The datasets collected can be expressed using one or more of the following formats:

XML

HTML

CSV

JSON
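As a rough illustration, datasets arriving in several of these formats can be normalized into one in-memory shape (a list of records) before further work. The sketch below uses only the Python standard library and covers CSV, JSON and XML; HTML is left out because it usually needs a scraping step first. The function name, the assumed layout of each input, and the sample data are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def load_records(text, fmt):
    """Parse dataset text in the given format into a list of dict records."""
    if fmt == "csv":
        # Assumes the first line holds the column names.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    if fmt == "json":
        # Assumes a top-level JSON array of objects.
        return json.loads(text)
    if fmt == "xml":
        # Assumes one child element per record, with one sub-element per field.
        root = ET.fromstring(text)
        return [{field.tag: field.text for field in record} for record in root]
    raise ValueError(f"unsupported format: {fmt}")


# Hypothetical sample data describing the same record in three formats.
csv_text = "name,kind\nAlpha,city\n"
json_text = '[{"name": "Alpha", "kind": "city"}]'
xml_text = "<items><item><name>Alpha</name><kind>city</kind></item></items>"

loaded = [
    load_records(csv_text, "csv"),
    load_records(json_text, "json"),
    load_records(xml_text, "xml"),
]
```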

Tools

The Knowledge Engineer can use a spreadsheet tool such as Excel or Google Sheets to produce the documentation, which initially contains the cqs.

At the data level, the Data Scientist has to collect and manage data from the data sources defined in the previous phase.

Examples of libraries that can be used:

Data management: Pandas, NumPy, scikit-learn

Data scraping: Beautiful Soup, Scrapy, Selenium, lxml

Data formatting: Arrow, PrettyPandas, datacleaner

Deliverables - 1/2

In the Inception phase the main deliverables produced are:

iTelos project report.

Preliminary datasets sheet.

Metadata sheet.

Metadata description.

Examples of Schema Inception

Figure: Space Domain Competency Questions

Figure: Space Domain Competency Questions

Figure: Space Domain Query Patterns
