V.2.2
Eric Little, PhD
Chief Data [email protected]
Beyond the Models: Applying Semantic Technologies Across the Enterprise
Slide 2
The Current Situation Across Enterprises
Many challenges exist for data to be captured, integrated and shared
Data Silos
Incompatible instruments and software systems, proprietary data formats
Legacy architectures are brittle and rigid
SME knowledge resides in people’s heads, little common vocabulary
Data schemas are not explicitly understood
Lack of common vision between business units and scientists
Slide 3
The challenge of big data is here – and it is growing
By 2020 there will be 2.3 zettabytes of annual traffic on the Internet (1 ZB = 1,000,000,000,000,000,000,000 bytes)
The volume of business data worldwide is estimated to double every 1.2 years
Since 2012, more than 90 percent of the Fortune 500 have funded big data initiatives
100 terabytes of data are uploaded daily to Facebook
Data production will be 44 times greater in 2020 than it was in 2009
Storing and retrieving that amount of data is one challenge. Analyzing even a fraction of it is an even bigger challenge
Big Data’s Impacts
If each Gigabyte in a Zettabyte were a brick, 258 Great Walls of China (made of 3.8B bricks) could be built.
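The brick comparison can be checked with a few lines of arithmetic. This is a rough sketch that assumes decimal units (1 ZB = 10^21 bytes, 1 GB = 10^9 bytes) and takes the slide's figure of 3.8 billion bricks per wall at face value; the variable names are illustrative only. Under these assumptions the result lands in the same range as the 258 walls quoted on the slide, whose exact inputs are not stated.

```python
# Rough check of the "bricks" comparison, assuming decimal (SI) units.
ZETTABYTE = 10**21          # bytes in one zettabyte (as given on the slide)
GIGABYTE = 10**9            # bytes in one gigabyte (assumption: decimal GB)
BRICKS_PER_WALL = 3.8e9     # bricks per Great Wall, figure taken from the slide

# One brick per gigabyte: how many bricks does a zettabyte give us?
gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE

# How many walls could be built from that many bricks?
walls = gigabytes_per_zettabyte / BRICKS_PER_WALL

print(f"Gigabytes in a zettabyte: {gigabytes_per_zettabyte:,}")
print(f"Great Walls that could be built: {walls:.0f}")
```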
Slide 4
The Common Big Data Fallacy
Hypothesis:
If I have more data at my fingertips – then I will have more answers
Well…. Actually….. No.
One major hurdle: “Real-world data […] is messy data, filled with inconsistencies, potential biases, and noise.”
Copping & Li, Harvard Business Review, Nov 29, 2016
Need a new approach to Big Data
Slide 5
Understanding the 4V’s of Big Data
Normally the focus – Big Data analysis is more than just size
Performance is Critical to Success
Data complexity is increasing – Model complexity
Uncertainty abounds – requires statistics and probabilities
Majority of Big Data analytics approaches treat these two V’s
Semantic technologies provide clear advantages
Mathematical clustering techniques provide clear advantages
Slide 6
Moving to Smart Data – Enter Semantics
Smart data can be added to existing systems
Does not require replacement of existing tech
Smart data provides a separation of: Model Layer and Data Layer
Link to the model layer
Leave data in place
Smart data links information from the models to instance-level data
Smart Data uses metadata in order to capture logical context about data
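The separation of model layer and data layer can be sketched in a few lines. This is only an illustration under stated assumptions: the rows, column names, and model terms (`Subject.hasGender` and so on) are hypothetical and are not taken from any system described in the talk. The point is that the links are metadata, and the source records are read in place rather than copied or restructured.

```python
from dataclasses import dataclass

# Data layer: records stay in their original shape. A list of dicts stands in
# for rows in an existing system (hypothetical sample data).
legacy_rows = [
    {"pt_id": "P001", "sex": "F", "wt_kg": 61.5},
    {"pt_id": "P002", "sex": "M", "wt_kg": 82.0},
]


@dataclass(frozen=True)
class Link:
    """Metadata linking a model-layer term to a data-layer column."""
    model_term: str  # concept or property in the model layer (hypothetical)
    column: str      # where the value lives in the existing data


# Model layer: terms are defined independently of how the data is stored.
links = [
    Link("Subject.identifier", "pt_id"),
    Link("Subject.hasGender", "sex"),
    Link("Subject.hasWeight", "wt_kg"),
]


def view_through_model(row, links):
    """Express one source row in model terms without modifying the row."""
    return {link.model_term: row[link.column] for link in links}


smart_view = [view_through_model(row, links) for row in legacy_rows]
print(smart_view[0])
```

Swapping the storage system would, in this sketch, mean changing only the `column` values in `links`; the model terms used by queries stay the same.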
Slide 7
Semantic Spectrum of Knowledge Organization Systems
Sources
• Deborah L. McGuinness. "Ontologies Come of Age". In Dieter Fensel, Jim Hendler, Henry Lieberman, and Wolfgang Wahlster, editors. Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press, 2003.
• Michael Uschold and Michael Gruninger. "Ontologies and semantics for seamless connectivity". SIGMOD Rec. 33, 4 (December 2004), 58-64. DOI=http://dx.doi.org/10.1145/1041410.1041420
• Leo Obrst. "The Ontology Spectrum". Book section in Roberto Poli, Michael Healy, Achilles Kameas, "Theory and Applications of Ontology: Computer Applications". Springer Netherlands, 17 Sep 2010.
• Leo Obrst and Mills Davis. "Semantic Wave 2008 Report: Industry Roadmap to Web 3.0 & Multibillion Dollar Market Opportunities". 2008.
Slide 8
Ontologies provide a background for computations
Humans logically structure their world
Ontologies help to capture that structure
Background Beliefs
Ontologies capture important logical structures
But where does Machine Learning fit in?
Slide 9
The power of analytics is now just beginning to be felt
Moore’s Law pertaining to processing is not the problem
Focus on the growth of Analysis:
From 1988-2003 computer processing speed grew by 1000x
In the same period algorithm dev grew by 43,000x
Advanced analytics are increasingly adopted in mid-market organizations and large enterprises
The Growth of Analytics is Changing the Game
Slide 10
As data sources continue to increase – so too do new algorithmic approaches
Data Variety & Veracity are driving new innovations
More data is now better
Specialized hardware is evolving to match needs
Machine Learning and Deep Learning on Image Data
From: https://medium.com/@anthony_sarkis/the-age-of-the-algorithm-why-ai-progress-is-faster-than-moores-law-2fb7d5ae7943
Switch to Deep Learning Approach
Slide 11
THE MOVE FROM BIG DATA TO BIG ANALYSIS
STATISTICAL
SEMANTICS
MACHINE LEARNING
REASONING
Slide 12
Big Analysis Requires Hybrid Architectures
Semantic DBs
Unstructured Docs
Structured Data
Cloud DBs (NoSQL)
Analytics
Dashboards & Reports
Integration Layer
Slide 13
Two Extremes of a Spectrum of Possible Solutions for Big Data
Data Warehouse
+ Proven enterprise technology
- Big DWHs require too great an effort
- Not all data is suitable for rigid DWHs
Data Lake
+ Great flexibility and very little effort to store all sorts of data
- Data lakes are too loose a construct
- Tremendous efforts on retrieval
Slide 14
Data Science (machine learning, text analytics, clustering etc.)
Make Data FAIR (Findable, Accessible, Interoperable, Reusable)
Linked Open Data & Open APIs
Semantic Graph DB
(Knowledge Graph)
Operational DBs
…
Unstructured Documents
Analytics Tools: simulations, statistics, reasoning
Visualization: dashboards, exploration, search
…
Semi-structured Data
Instrument Data
Lightweight Semantic Integration Layer (semantic RMDM, APIs, semantic indexing, data annotation, catalogues, metadata and linking)
Reporting: regulatory, internal, external
Slide 15
Enter LeapAnalysis
Slide 16
LeapAnalysis
NOSQL Excel
Any kind of data source can be supported directly
Queries, Rules, Patterns, etc.
Big Analysis Concept: Semantics + Statistics
True Federated Analytics Across The Enterprise
Ref Data
Slide 17
• Companies must speed up the process of integrating data
• Cleaning or integrating data before you know its value is wasteful
• Making data just “smart” can make it very slow
• The world is moving to decentralization
• Virtualization
• Federation
• Complex problem solving
• Pattern/model reuse
Main Topics to Consider
Slide 18
Open Source Data
Aligned Data Sources
LA Alignment Store (MongoDB)
Data Integration Model
User Query Workspaces
Query Response
How LeapAnalysis Works
Subject Domain
Domain Models
…
Reference Model (Virtuoso RDF)
Patient Data (CSV File System)
Patient Data (MSSQL)
LA Alignment Store (MongoDB)
SPARQL Query
prefix core: <http://vocab.rd.astrazeneca.net/core/>
prefix bdm: <http://vocab.rd.astrazeneca.net/bdm/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?subject ?gender ?indication ?age ?height ?weight WHERE {
  ?subject rdf:type core:Subject .
  ?subject core:hasGender ?gender .
  ?subject core:hasIndication ?indication .
  ?subject bdm:hasAge ?age .
  ?subject bdm:hasHeight ?height .
  ?subject bdm:hasWeight ?weight .
}
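To make the federation idea on this slide more concrete, here is a minimal sketch of how a single model-level query could be answered from two sources that stay in place. It is not LeapAnalysis's implementation, which the slides do not describe: the `ALIGNMENT` dictionary, the column names, the sample rows, and the helper functions are all hypothetical, an in-memory string stands in for the CSV file system, and SQLite stands in for MSSQL. Only the property names (`core:hasGender`, `bdm:hasAge`, and so on) come from the query on the slide.

```python
import csv
import io
import sqlite3

# Hypothetical alignment store: model property -> (source name, column name).
ALIGNMENT = {
    "core:hasGender": ("csv", "sex"),
    "core:hasIndication": ("csv", "indication"),
    "bdm:hasAge": ("sql", "age_years"),
    "bdm:hasHeight": ("sql", "height_cm"),
    "bdm:hasWeight": ("sql", "weight_kg"),
}

# Source 1: patient data as CSV (invented sample rows).
CSV_TEXT = "subject_id,sex,indication\nS1,F,asthma\nS2,M,diabetes\n"

# Source 2: patient data in a relational database (SQLite stands in for MSSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patients "
    "(subject_id TEXT, age_years INTEGER, height_cm REAL, weight_kg REAL)"
)
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?, ?)",
    [("S1", 34, 165.0, 61.5), ("S2", 58, 180.0, 82.0)],
)


def fetch_csv(columns):
    """Read the requested columns from the CSV source, keyed by subject id."""
    reader = csv.DictReader(io.StringIO(CSV_TEXT))
    return {rec["subject_id"]: {c: rec[c] for c in columns} for rec in reader}


def fetch_sql(columns):
    """Read the requested columns from the SQL source, keyed by subject id."""
    # Column names come from ALIGNMENT above, never from user input.
    cur = conn.execute(f"SELECT subject_id, {', '.join(columns)} FROM patients")
    return {row[0]: dict(zip(columns, row[1:])) for row in cur}


FETCHERS = {"csv": fetch_csv, "sql": fetch_sql}


def federated_select(properties):
    """Answer a model-level request by querying each source and joining on subject."""
    by_source = {}
    for prop in properties:
        source, column = ALIGNMENT[prop]
        by_source.setdefault(source, []).append((prop, column))

    results = {}
    for source, pairs in by_source.items():
        fetched = FETCHERS[source]([column for _, column in pairs])
        for subject, values in fetched.items():
            merged = results.setdefault(subject, {})
            for prop, column in pairs:
                merged[prop] = values[column]

    # Like the basic graph pattern in the query, keep only subjects
    # for which every requested property was found.
    return {s: v for s, v in results.items() if len(v) == len(properties)}


rows = federated_select(
    ["core:hasGender", "core:hasIndication", "bdm:hasAge", "bdm:hasHeight", "bdm:hasWeight"]
)
for subject, values in sorted(rows.items()):
    print(subject, values)
```

The design choice illustrated here is that the caller names only model properties; which source holds each value is resolved through the alignment mapping, so neither source has to be copied or restructured.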
Slide 19
CONNECTING DATA, PEOPLE AND ORGANIZATIONS
Contact Information:
Email: [email protected]
www.osthus.com
www.biganalysis.com
Twitter: OntoEric