The company, product and service names used in this web site are for identification purposes
only. © Cognitum 2014. All trademarks and registered trademarks are the property of their
respective owners.
Semantic OLAP with FluentEditor and Ontorion Semantic Excel Toolchain
Motivation
Business Intelligence (BI) is a technology that enables a business to make intelligent, data-driven decisions.
Intelligence here is governed by the laws of statistics applied to loosely coupled statistical variables. To understand the meaning of the data, however, we need to link those variables to real-life entities.
Semantic technologies now make this link possible.
Online Analytical Processing
OLAP is a well-known method used in business analytics to give decision makers online access to analytical capabilities.
It is based on the concept of data cubes: multidimensional cubes of data that, when equipped with suitable tools, allow the data and the problems within it to be explored.
Transformation of a given dataset into the STAR schema (example)

Example dataset (dimensions: month, year, region, prod; measures: unit, price)

month      year       region      prod         unit  price
March      Year-2011  California  Computer-38  1     106
September  Year-2014  California  Computer-72  1     119
November   Year-2014  New-York    Computer-10  2     488
December   Year-2014  California  Computer-80  2     355
July       Year-2014  Quebec      Computer-70  1     176
September  Year-2012  Quebec      Computer-17  3     624

Star schema

Time (dimension): time_key; year, quarter, month, day, hour, minute, second, millisecond, nanosecond
Location (dimension): location_key; continent, country, region, city, postal_code
Product (dimension): product_key; category, brand, name, color
Sales (fact): time_key, location_key, product_key (dimensions); units, price (measures)
Extracting the data hypercube
[Figure: the Sales fact table of the star schema (keys time_key, location_key, product_key; measures units, price) mapped onto a data hypercube. The cube axes are the dimensions (for example product and location) and the cells hold the measures.]
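As a rough illustration of what extracting a hypercube means, the sketch below builds a three-dimensional array from the six sample rows of the example dataset using only base R. It is not the deck's own implementation: the choice of `unit * price` as the measure and the names `sales` and `cube` are assumptions made for this example.

```r
# Sample fact rows copied from the example dataset in this deck.
sales <- data.frame(
  month  = c("March", "September", "November", "December", "July", "September"),
  year   = c("Year-2011", "Year-2014", "Year-2014", "Year-2014", "Year-2014", "Year-2012"),
  region = c("California", "California", "New-York", "California", "Quebec", "Quebec"),
  prod   = c("Computer-38", "Computer-72", "Computer-10", "Computer-80", "Computer-70", "Computer-17"),
  unit   = c(1, 1, 2, 2, 1, 3),
  price  = c(106, 119, 488, 355, 176, 624),
  stringsAsFactors = FALSE
)

# Assumption: the measure of interest is revenue = unit * price.
sales$revenue <- sales$unit * sales$price

# xtabs() sums the measure into one cell per combination of dimension values,
# which yields a 3-dimensional array indexed by prod, region and year.
cube <- xtabs(revenue ~ prod + region + year, data = sales)

dim(cube)                                     # 6 products x 3 regions x 3 years
cube["Computer-10", "New-York", "Year-2014"]  # 2 units * 488 = 976
```

Combinations that never occur in the data simply hold 0, which is why a dense array like this only suits small examples.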
Slicing/rolling the data-cube over dimensions
Example slice: location = California
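The two operations named on this slide can be sketched in base R on a plain data frame. This is only an illustration under stated assumptions: the rows come from the example dataset, revenue is taken to be `unit * price`, and the variable names are made up for the example.

```r
# A reduced copy of the sample facts: two dimensions and two measures.
sales <- data.frame(
  region = c("California", "California", "New-York", "California", "Quebec", "Quebec"),
  year   = c("Year-2011", "Year-2014", "Year-2014", "Year-2014", "Year-2014", "Year-2012"),
  unit   = c(1, 1, 2, 2, 1, 3),
  price  = c(106, 119, 488, 355, 176, 624),
  stringsAsFactors = FALSE
)
sales$revenue <- sales$unit * sales$price  # assumed measure

# Slice: fix one dimension to a single value (location = California).
california <- subset(sales, region == "California")

# Roll up: aggregate the measure over a remaining dimension, here per year.
by_year <- aggregate(revenue ~ year, data = california, FUN = sum)

# Roll up completely: one grand total for the slice.
total <- sum(california$revenue)
```

Here `by_year` holds one row per year found in the California slice, and `total` is the sum of those rows.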
Semantic OLAP with FluentEditor and Ontorion Semantic Excel Toolchain
FluentEditor 2014
Ontology editor - tool for editing and manipulating ontologies
Controlled Natural Language interface + Predictive Editor
Knowledge representation – semantic technologies (formal logic, OWL 2, SWRL)
Reasoning engine - HermiT
Controlled Natural Language in FE
CNL is a subset of natural language with restricted grammar and vocabulary, designed to reduce the ambiguity and complexity inherent in full natural language.
[Diagram: Ontology (OWL 2 + SWRL) <-> Controlled English in FluentEditor]
Controlled English (CE) in FluentEditor is automatically translated into and from description logic (OWL 2, SWRL).
Ontology of dimensions

January is a month and is-in-quarter equal-to 1. February is a month and is-in-quarter equal-to 1. March is a month and is-in-quarter equal-to 1.
April is a month and is-in-quarter equal-to 2. May is a month and is-in-quarter equal-to 2. June is a month and is-in-quarter equal-to 2.
July is a month and is-in-quarter equal-to 3. August is a month and is-in-quarter equal-to 3. September is a month and is-in-quarter equal-to 3.
October is a month and is-in-quarter equal-to 4. November is a month and is-in-quarter equal-to 4. December is a month and is-in-quarter equal-to 4.

Year-2011 is a year. Year-2012 is a year. Year-2013 is a year. Year-2014 is a year.

If X is-inside Y then Y contains X.
Usa is a country. Canada is a country.
California is a place and is-inside Usa. New-York is a place and is-inside Usa. Washington is a place and is-inside Usa.
Ontario is a place and is-inside Canada. Quebec is a place and is-inside Canada.

Every chromebook is a computer-type. Every sleekbook is a computer-type. Every laptop is a computer-type.
Samsung is a vendor. Toshiba is a vendor. Gateway is a vendor. Lenovo is a vendor. Dell is a vendor. Acer is a vendor. Asus is a vendor. Hp is a vendor.
Touchsmart is a family. Satellite is a family. Elitebook is a family. Alienware is a family. Inspiron is a family. Pavilion is a family. Thinkpad is a family. Qosimo is a family. Aspire is a family. Envy is a family.
Intel is a cpu-vendor. Amd is a cpu-vendor.
Ssd is a disk-type. Hdd is a disk-type. Solid-State-Drive is a disk-type. Flash-Drive is a disk-type. Hard-Drive is a disk-type.
The-"windows-8.1" is an os. Chrome-Os is an os. Windows-7 is an os. Windows-8 is an os.

Computer-1 is a laptop.
Computer-1 is-produced-by Hp.
Computer-1 has-diagonal-in-inches equal-to 15.6.
Computer-1 has-cpu-produced-by Intel.
Computer-1 has-cpu-model equal-to 'Intel Pentium N3520'.
Computer-1 has-ram-in-gb equal-to 4.
Computer-1 has-disk-capacity-in-gb equal-to 500.
Computer-1 has-disk-type Hdd.
Computer-1 has-os The-"windows-8.1".
Computer-1 has-color equal-to 'black licorice'.
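To make concrete what these facts contribute to an OLAP query, the sketch below re-states two of the relations as plain base R lookup vectors and uses them to resolve two class expressions. This is a hypothetical stand-in only: in the toolchain the ontology and a reasoner answer such questions, whereas the named vectors `quarter_of` and `country_of` are invented for this example and perform no inference.

```r
# Stand-in for the "is-in-quarter" facts: month -> quarter.
quarter_of <- c(January = 1, February = 1, March = 1,
                April = 2, May = 2, June = 2,
                July = 3, August = 3, September = 3,
                October = 4, November = 4, December = 4)

# Stand-in for the "is-inside" facts: place -> country.
country_of <- c(California = "Usa", "New-York" = "Usa", Washington = "Usa",
                Ontario = "Canada", Quebec = "Canada")

# Counterpart of the expression "a month that is-in-quarter equal-to 2".
months_in_q2 <- names(quarter_of)[quarter_of == 2]

# Counterpart of the expression "a place that is-inside Canada".
places_in_canada <- names(country_of)[country_of == "Canada"]
```

The resulting character vectors are the member lists that a slice over the month or region dimension would then filter on.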
[Figure: Ontology of dimensions, drawn as a graph. Nodes are the classes (month, year, place, country, computer-type, vendor, family, cpu-vendor, disk-type, os) and their instances, such as Computer-1, Hp, California and Year-2014.]
R + rOntorion
The R language is a widely used tool for statistical analysis.
Combining ontologies with statistics opens an efficient path to combined quantitative and qualitative analysis of data.
The rOntorion R package gives direct access to ontologies created with FluentEditor and opens them for semantic processing in the R environment.
source('SemanticOLAP.R')

# load dimensions from the ontology
dimentions <- ontorion.load.cnl.file('dimentions.encnl')

# load data from CSV
# CSV has columns: "month" "year" "region" "prod" "unit" "price"
sales_fact <- read.table('sales.csv', header = TRUE)

# build a SEMANTIC INDEX called "country", calculated as a country for a region
country_region_index <- build.index(dimentions, "country", "a country", "region",
                                    function(x) paste("a place that is-inside ", x))

# merge the index with the table from CSV so we have an additional column called "country"
sales_fact <- merge(sales_fact, country_region_index)

# build the cube
# price*unit = measurement
# dimensions are ("prod"), ("month","year"), ("region","country")
revenue_cube <- build.cube(sales_fact, c("price", "unit"), function(x, y) x * y,
                           c("prod", "month", "year", "region", "country"))

# SEMANTIC SLICE AND DICE
# prod    = "a laptop that has-diagonal-in-inches lower-than 12.0"
# month   = "a month that is-in-quarter equal-to 2"
# year    = "Year-2012"
# region  = "a place that is-inside Canada"
# country = any country
sliceddiced_cube <- slice.and.dice(revenue_cube, dimentions, c(
  "a laptop that has-diagonal-in-inches lower-than 12.0",
  "a month that is-in-quarter equal-to 2",
  "Year-2012",
  "a place that is-inside Canada",
  "a country"))

sliceddiced_cube

# ROLLUP the main cube (total sum)
roll.up(revenue_cube, c())

# ROLLUP sliced cube - show per months
roll.up(sliceddiced_cube, c("prod"), aggreg = sum)

# ROLLUP sliced cube
roll.up(sliceddiced_cube, c("prod", "region", "country"))
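The index-and-merge step in the listing can be pictured with base R alone. The sketch below assumes the semantic index is a two-column table mapping each region to its country, which is what the `merge` call in the listing suggests but is not confirmed by the deck. The index here is written by hand from the ontology facts rather than produced by `build.index`, and the sales rows are the six sample rows from the example dataset.

```r
# Six sample facts (region and measures only).
sales_fact <- data.frame(
  region = c("California", "California", "New-York", "California", "Quebec", "Quebec"),
  unit   = c(1, 1, 2, 2, 1, 3),
  price  = c(106, 119, 488, 355, 176, 624),
  stringsAsFactors = FALSE
)

# Hypothetical hand-written index; in the deck this table is derived from the ontology.
country_region_index <- data.frame(
  region  = c("California", "New-York", "Washington", "Ontario", "Quebec"),
  country = c("Usa", "Usa", "Usa", "Canada", "Canada"),
  stringsAsFactors = FALSE
)

# merge() joins on the shared column "region" and adds a "country" column.
sales_fact <- merge(sales_fact, country_region_index)

# Assumed measure, mirroring the price*unit comment in the listing.
sales_fact$revenue <- sales_fact$unit * sales_fact$price

# Roll the measure up to the newly added dimension.
revenue_by_country <- aggregate(revenue ~ country, data = sales_fact, FUN = sum)
```

Once the derived column is part of the fact table it behaves like any other dimension, so the same roll-up and slice operations apply to it.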
Conclusion
The semantic extension of OLAP has proved to be fully functional using the domain-ontology toolchain.
Moreover, it laid the foundations for Ask Data Anything (ADA!), a solution developed and maintained by Cognitum that is already available on the market.
ADA! lets users explore data directly in natural language rather than in CNL, so we classify it as a natural-language data exploration tool.
Future Works
The modern approach to BI, called Big Data, is currently understood to face the problem of a “(…) growing number of insights that are being produced by big data through automated forms of analysis (…) What happens to the thousands of insights that are being generated automatically by all of those nifty machine learning algorithms? How do they find their way to a person at the right time?” [1]
[1] D. Woods. (2015) Why big data needs natural language generation to work. Forbes. [Online]. Available: http://www.forbes.com/sites/danwoods/2015/07/09/why-big-data-needs-natural-language-generation-to-work/ [retrieved: 1 June, 2015]
Source Code
You can try Semantic OLAP yourself.
Download link: https://cognitumwww.blob.core.windows.net/software/CognitumSemanticOlap.zip