
Upload: cognitum

Post on 17-Aug-2015


The company, product and service names used in this web site are for identification purposes only. © Cognitum 2014. All trademarks and registered trademarks are the property of their respective owners.

Semantic OLAP with FluentEditor and Ontorion Semantic Excel Toolchain


Motivation

Business Intelligence (BI) is a technology that enables a business to make intelligent, data-driven decisions.

Intelligence here is governed by the laws of statistics applied to loosely coupled statistical variables. To understand the meaning of the data, however, we need to link those statistical variables to real-life entities.

This improvement can be implemented today with the aid of semantic technologies.


Online Analytical Processing (OLAP)

OLAP is a well-known method used in Business Analytics to provide decision makers with online access to analytical capabilities.

It is based on the concept of data cubes: multidimensional cubes of data that, when equipped with suitable tools, allow the data and the problems within it to be explored.


Transformation of a given dataset into the STAR schema (example)

Source dataset (dimensions: month, year, region, prod; measures: unit, price):

month      year       region      prod         unit  price
March      Year-2011  California  Computer-38  1     106
September  Year-2014  California  Computer-72  1     119
November   Year-2014  New-York    Computer-10  2     488
December   Year-2014  California  Computer-80  2     355
July       Year-2014  Quebec      Computer-70  1     176
September  Year-2012  Quebec      Computer-17  3     624

STAR schema:

Time (dimension): time_key; year, quarter, month, day, hour, minute, second, millisecond, nanosecond
Location (dimension): location_key; continent, country, region, city, postal_code
Product (dimension): product_key; category, brand, name, color
Sales (fact): time_key, location_key, product_key; measures: units, price
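The transformation above can be sketched in base R. This is only an illustration under simplifying assumptions: it uses the six example rows, builds reduced dimension tables (only the columns present in the flat dataset), and the helper name make_dimension is hypothetical, not part of the toolchain presented here.

```r
# Flat sales table, copied from the example rows of the dataset.
sales <- data.frame(
  month  = c("March", "September", "November", "December", "July", "September"),
  year   = c("Year-2011", "Year-2014", "Year-2014", "Year-2014", "Year-2014", "Year-2012"),
  region = c("California", "California", "New-York", "California", "Quebec", "Quebec"),
  prod   = c("Computer-38", "Computer-72", "Computer-10", "Computer-80", "Computer-70", "Computer-17"),
  unit   = c(1, 1, 2, 2, 1, 3),
  price  = c(106, 119, 488, 355, 176, 624),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one dimension table with a surrogate integer key.
make_dimension <- function(df, cols, key) {
  dim_table <- unique(df[, cols, drop = FALSE])
  rownames(dim_table) <- NULL
  dim_table[[key]] <- seq_len(nrow(dim_table))
  dim_table
}

time_dim     <- make_dimension(sales, c("month", "year"), "time_key")
location_dim <- make_dimension(sales, "region", "location_key")
product_dim  <- make_dimension(sales, "prod", "product_key")

# Fact table: look up the foreign keys, then keep only keys and measures.
fact <- merge(sales, time_dim)      # joins on month and year
fact <- merge(fact, location_dim)   # joins on region
fact <- merge(fact, product_dim)    # joins on prod
sales_fact <- fact[, c("time_key", "location_key", "product_key", "unit", "price")]
```

Each dimension table can later be extended with further attributes (quarter, country, brand and so on) without touching the fact table, which is the main reason for separating them.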


Extracting the data hypercube

[Figure: the STAR schema from the previous slide next to a data hypercube; the visible cube axes are product and location (dimensions), and the cells hold the measures.]


Slicing/rolling the data cube over dimensions

[Figure: the data cube sliced at location=California.]
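Slicing and rolling up can be illustrated with base R arrays. The sketch below is not the toolchain's implementation; it assumes the six example rows from the sample dataset and uses revenue (unit * price) as the measure stored in each cell.

```r
# Example rows from the sample dataset.
sales <- data.frame(
  year   = c("Year-2011", "Year-2014", "Year-2014", "Year-2014", "Year-2014", "Year-2012"),
  region = c("California", "California", "New-York", "California", "Quebec", "Quebec"),
  prod   = c("Computer-38", "Computer-72", "Computer-10", "Computer-80", "Computer-70", "Computer-17"),
  unit   = c(1, 1, 2, 2, 1, 3),
  price  = c(106, 119, 488, 355, 176, 624),
  stringsAsFactors = FALSE
)
sales$revenue <- sales$unit * sales$price

# Data cube: a 3-dimensional array (prod x region x year);
# each cell holds the summed revenue, empty cells are 0.
cube <- xtabs(revenue ~ prod + region + year, data = sales)

# Slice: fix one dimension to a single value (region = California).
california_slice <- cube[, "California", ]

# Roll-up: aggregate away all dimensions except the one of interest.
revenue_by_region <- apply(cube, 2, sum)  # margin 2 = region
revenue_by_year   <- apply(cube, 3, sum)  # margin 3 = year
total_revenue     <- sum(cube)            # roll up over everything
```

A slice reduces the cube by one dimension, while a roll-up keeps the chosen dimensions and sums over the rest; rolling up over all dimensions yields the grand total.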


Semantic OLAP with FluentEditor and Ontorion Semantic Excel Toolchain


FluentEditor 2014

Ontology editor - tool for editing and manipulating ontologies

Controlled Natural Language interface + Predictive Editor

Knowledge representation – semantic technologies (formal logic, OWL 2, SWRL)

Reasoning engine - HermiT


Controlled Natural Language in FE

CNL is a subset of natural language with a restricted grammar and vocabulary, designed to reduce the ambiguity and complexity inherent in full natural language.

[Diagram: Controlled English in FluentEditor <-> Ontology (OWL 2 + SWRL)]

Controlled English (CE) in FluentEditor is automatically translated into and from description logic (OWL 2, SWRL).


Ontology of dimensions

January is a month and is-in-quarter equal-to 1.
February is a month and is-in-quarter equal-to 1.
March is a month and is-in-quarter equal-to 1.
April is a month and is-in-quarter equal-to 2.
May is a month and is-in-quarter equal-to 2.
June is a month and is-in-quarter equal-to 2.
July is a month and is-in-quarter equal-to 3.
August is a month and is-in-quarter equal-to 3.
September is a month and is-in-quarter equal-to 3.
October is a month and is-in-quarter equal-to 4.
November is a month and is-in-quarter equal-to 4.
December is a month and is-in-quarter equal-to 4.

Year-2011 is a year.
Year-2012 is a year.
Year-2013 is a year.
Year-2014 is a year.

If X is-inside Y then Y contains X.

Usa is a country.
Canada is a country.

California is a place and is-inside Usa.
New-York is a place and is-inside Usa.
Washington is a place and is-inside Usa.
Ontario is a place and is-inside Canada.
Quebec is a place and is-inside Canada.

Every chromebook is a computer-type.
Every sleekbook is a computer-type.
Every laptop is a computer-type.

Samsung is a vendor.
Toshiba is a vendor.
Gateway is a vendor.
Lenovo is a vendor.
Dell is a vendor.
Acer is a vendor.
Asus is a vendor.
Hp is a vendor.

Touchsmart is a family.
Satellite is a family.
Elitebook is a family.
Alienware is a family.
Inspiron is a family.
Pavilion is a family.
Thinkpad is a family.
Qosimo is a family.
Aspire is a family.
Envy is a family.

Intel is a cpu-vendor.
Amd is a cpu-vendor.

Ssd is a disk-type.
Hdd is a disk-type.
Solid-State-Drive is a disk-type.
Flash-Drive is a disk-type.
Hard-Drive is a disk-type.

The-"windows-8.1" is an os.
Chrome-Os is an os.
Windows-7 is an os.
Windows-8 is an os.

Computer-1 is a laptop.
Computer-1 is-produced-by Hp.
Computer-1 has-diagonal-in-inches equal-to 15.6.
Computer-1 has-cpu-produced-by Intel.
Computer-1 has-cpu-model equal-to 'Intel Pentium N3520'.
Computer-1 has-ram-in-gb equal-to 4.
Computer-1 has-disk-capacity-in-gb equal-to 500.
Computer-1 has-disk-type Hdd.
Computer-1 has-os The-"windows-8.1".
Computer-1 has-color equal-to 'black licorice'.
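To show what such facts contribute to the analysis, the sketch below emulates two of them with plain lookup tables in base R. It is a deliberately simplified assumption: no reasoner and no rOntorion call is involved, and the table names (is_inside, contains, month_quarter) are made up for this illustration.

```r
# Facts of the form "<place> is a place and is-inside <country>."
is_inside <- data.frame(
  region  = c("California", "New-York", "Washington", "Ontario", "Quebec"),
  country = c("Usa", "Usa", "Usa", "Canada", "Canada"),
  stringsAsFactors = FALSE
)

# Rule "If X is-inside Y then Y contains X": the inverse relation,
# stored as a list mapping each country to the places it contains.
contains <- split(is_inside$region, is_inside$country)

# Facts of the form "<month> is a month and is-in-quarter equal-to <q>."
month_quarter <- data.frame(
  month   = month.name,            # built-in constant: "January" ... "December"
  quarter = rep(1:4, each = 3),
  stringsAsFactors = FALSE
)

# Three rows from the example sales dataset.
sales <- data.frame(
  month  = c("March", "July", "September"),
  year   = c("Year-2011", "Year-2014", "Year-2012"),
  region = c("California", "Quebec", "Quebec"),
  prod   = c("Computer-38", "Computer-70", "Computer-17"),
  unit   = c(1, 1, 3),
  price  = c(106, 176, 624),
  stringsAsFactors = FALSE
)

# Attach the derived attributes as extra columns.
sales <- merge(sales, is_inside)      # adds "country" (join on region)
sales <- merge(sales, month_quarter)  # adds "quarter" (join on month)

# Emulates selecting "a place that is-inside Canada" together with
# "a month that is-in-quarter equal-to 3".
canada_q3 <- sales[sales$region %in% contains[["Canada"]] & sales$quarter == 3, ]
```

In the toolchain itself such selections are written as CNL expressions and answered from the ontology; the lookup tables here only mimic the outcome for these two relations.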


Ontology of dimensions

[Figure: graph view of the ontology of dimensions. Nodes include the concepts "thing", month, year, place, country, computer-type (laptop, sleekbook, chromebook), vendor, family, cpu-vendor, disk-type and os, together with their instances: the twelve months, Year-2011 to Year-2014, the places and countries, the vendors and families, and Computer-1 to Computer-80.]



R + rOntorion

The R language is a widely used tool for statistical analysis.

Combining ontologies and statistics opens an efficient path to combined quantitative and qualitative analysis of data.

The rOntorion R package provides direct access to ontologies created with FluentEditor and opens them to semantic processing in the R environment.


source('SemanticOLAP.R')

# load dimensions from the ontology
dimentions <- ontorion.load.cnl.file('dimentions.encnl')

# load data from CSV
# CSV has columns: "month" "year" "region" "prod" "unit" "price"
sales_fact <- read.table('sales.csv', header = TRUE)

# build a SEMANTIC INDEX called "country", calculated as the country for a region
country_region_index <- build.index(dimentions, "country", "a country", "region",
                                    function(x) paste("a place that is-inside ", x))

# merge the index with the table from CSV so we have an additional column called "country"
sales_fact <- merge(sales_fact, country_region_index)

# build the cube
# price*unit = measurement
# dimensions are ("prod"), ("month","year"), ("region","country")
revenue_cube <- build.cube(sales_fact, c("price", "unit"), function(x, y) x * y,
                           c("prod", "month", "year", "region", "country"))

# SEMANTIC SLICE AND DICE
# prod = "a laptop that has-diagonal-in-inches lower-than 12.0"
# month = "a month that is-in-quarter equal-to 2"
# year = "Year-2012"
# region = "a place that is-inside Canada"
# country = any country
sliceddiced_cube <- slice.and.dice(revenue_cube, dimentions, c(
  "a laptop that has-diagonal-in-inches lower-than 12.0",
  "a month that is-in-quarter equal-to 2",
  "Year-2012",
  "a place that is-inside Canada",
  "a country"))

sliceddiced_cube

# ROLLUP the main cube (total sum)
roll.up(revenue_cube, c())

# ROLLUP sliced cube - show per months
roll.up(sliceddiced_cube, c("prod"), aggreg = sum)

# ROLLUP sliced cube
roll.up(sliceddiced_cube, c("prod", "region", "country"))



Conclusion

The semantic extension of OLAP has been shown to be fully functional using the domain ontology toolchain.

Moreover, it laid the foundations for Ask Data Anything (ADA!), a solution developed and maintained by Cognitum that is already available on the market.

ADA! lets users explore data directly in natural language rather than in CNL; we therefore classify it as a tool for natural-language data exploration.


Future Works

The modern approach to BI, called Big Data, is currently understood to face the problem of a “(…) growing number of insights that are being produced by big data through automated forms of analysis (…) What happens to the thousands of insights that are being generated automatically by all of those nifty machine learning algorithms? How do they find their way to a person at the right time?” [1]

[1] D. Woods. (2015) Why big data needs natural language generation to work. Forbes. [Online]. Available: http://www.forbes.com/sites/danwoods/2015/07/09/why-big-data-needs-natural-language-generation-to-work/ [retrieved: 1 June, 2015]


Source Code

You can try Semantic OLAP yourself.

Download link: https://cognitumwww.blob.core.windows.net/software/CognitumSemanticOlap.zip