Text Mining and Knowledge Graphs in the Cloud: The Self-Service Semantic Suite (S4)

A webinar with Marin Dimitrov, CTO of Ontotext

Feb 26th, 2015

Upload: marin-dimitrov

Post on 14-Jul-2015

TRANSCRIPT

Page 1: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

Text Mining and Knowledge Graphs in the Cloud: The Self-Service Semantic Suite (S4)

A webinar with Marin Dimitrov, CTO of Ontotext

Feb 26th, 2015

Text Mining & Knowledge Graphs in the Cloud with S4 #1 Feb 2015

Page 2: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• Semantic technologies for data management

• Self-Service Semantic Suite (S4)

• Text analytics

• RDF data management in the Cloud

• Knowledge graphs

• S4 for developers

• Roadmap

• Q&A session

Today’s Topics

Page 3: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

About Ontotext

• Provides products & solutions for content enrichment and metadata management

– 70 employees, headquartered in Sofia (Bulgaria)

– Sales presence in London, Washington & Boston

• Major clients and industries

– Media & Publishing

– Health Care & Life Sciences

– Cultural Heritage & Digital Libraries

– Government

– Education

Page 4: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

Some of our clients

Page 5: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

Semantic Technologies for Smart Data Management

Page 6: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• How can we unlock more insight from text?

• How can we interlink & search across text and structured data sources?

• How can we improve data & content reuse?

• How can we integrate data sources faster?

• How can we reuse external open data sources?

• How can we discover relations between entities?

Typical challenges for our customers

Page 7: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

Ontotext’s vision for smart data management

Graph Database

• Flexible RDF graph data model

• Ontology metadata layer

Semantic Search

• Semantic, exploratory search

• Metadata driven content

Text Mining & Interlinking

• People, locations, organisations, topics

• Discover implicit relations

• Reuse open knowledge graphs

Page 8: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

Ontotext and AstraZeneca

Profile

• Global, Bio-pharma company

• $28 billion in sales in 2012

• $4 billion in R&D across three continents

Goals

• Efficient design of new clinical studies

• Quick access to all of the data

• Improved evidence based decision-making

• Strengthen the knowledge feedback loop

• Enable predictive science

Challenges

• Over 7,000 studies and 23,000 documents are difficult to obtain

• Searches returning 1,000 – 10,000 results

• Document repositories not designed for reuse

• Tedious process to arrive at evidence based decisions

Page 9: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

Ontotext and LMI

Profile

• Established in 1961 to enable federal agencies

• Specializes in logistics, financial, infrastructure & information management

Goals

• Unlock large collections of complex documents

• Improve analyst productivity

• Create an application they can sell to US Federal agencies

Challenges

• Analysts taking hours to find, download and search documents, using inaccurate keyword searches

• Needed a knowledge base to search quickly and guide the analysts – highly relevant searches

• Extracts knowledge from collection of documents

• Uses GraphDB to intuitively search and filter

• More than 90% savings in analyst time

• Accurate results

Page 10: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

Ontotext and Euromoney

Profile

• Euromoney Institutional Investor PLC, the international online information and events group

Goals

• Create a horizontal platform to serve 100 different publications / 80 business units

• Create a new unified publishing and information platform

Challenges

• Different domains covered

• Sophisticated content analytics incl. relation, template and scenario extraction

• Text analytics of reports and news in various domains

• Extraction of sophisticated macro economic views on markets and market conditions

• Triplestore for flexible data integration & reasoning

• Multi-faceted search

Page 11: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

The Self-Service Semantic Suite (S4)

Page 12: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• Unlock the value of semantic technologies to SMEs

– Most success stories so far come from bigger companies

• Lower the technology adoption barriers and risks

– Challenge: perceived risks associated with new technology adoption

– Challenge: insufficient resources to implement new technologies

– Challenge: bureaucratic budgeting, procurement & provisioning processes

Why did we create S4?

Page 13: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• Self-service capabilities for text analytics, content enrichment and metadata management

– Text analytics for news, life sciences and social media

– RDF graph database as-a-service

– Access to large open knowledge graphs

• Available anytime, anywhere

– Simple RESTful services

• Simple, pay-per-use pricing

– No upfront commitments

What is S4?

Page 14: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• Utilise semantic technology for smart data applications

– Extract more value hidden in text

– Interlink structured and unstructured data sources

– Semantic search (instead of keyword-based search)

– Reuse open knowledge graphs

• Low adoption cost and risk

• No need for complex planning & procurement

• Pay only for what you use, reduce TCO

S4 benefits

Page 15: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• Enables quick prototyping & shorter time-to-market, increases innovation speed

• Available on-demand in the cloud, no provisioning & operations required

• Based on enterprise grade semantic technology by Ontotext

• Migration path from S4 based prototypes to customised enterprise solutions with Ontotext technology

S4 benefits

Page 16: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• Instantly available

• Free tier

• Easy to start, shorter learning curve

– Various add-ons, SDKs and demo code

• Simplify the technology stack for smart data applications

• Focus on building applications, don’t worry about infrastructure & operations

• Quicker prototyping, shorter development cycles

S4 for developers

Page 17: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

Text Analytics

Page 18: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• Text analytics services

– News annotation

– News categorisation

– Biomedical

– Twitter

• Entity linking & disambiguation

– Mappings to DBpedia & GeoNames instances

– Mappings to biomedical data sources (LinkedLifeData)

• HTML, MS Word, XML, plain text input

• Simple JSON output

Text analytics with S4

Page 19: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• Entity types

– Person

– Organization

– Location

– Relation (affiliation, customer, competitor, partner, acquisition, role, …)

– Keywords and key phrases

• Enterprise grade technology

– Based on successful text mining solutions for big media & publishing companies

News analytics with S4

Page 20: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

News analytics with S4

Page 21: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

News analytics example

S4 result

Page 22: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

News analytics example

API key pair, REST service, text:

API_KEY=s4trm64sb76u
KEY_SECRET=lrcki2kkajslsp6
SERVICE_ENDPOINT="https://$API_KEY:[email protected]/v1/news"
CONTENT="President Barack Obama is urging parents to get their children vaccinated in the face of a measles outbreak that has infected more than 100 people in the United States. In excerpts from an interview with NBC News that will air on Monday, Obama said measles was a preventable disease."
CONTENT_TYPE="text/plain"
JSON_REQUEST="{\"document\" : \"$CONTENT\", \"documentType\" : \"$CONTENT_TYPE\"}"
curl -X POST -H "Content-Type: application/json" -d "$JSON_REQUEST" $SERVICE_ENDPOINT

Request structure:

{
  "document" : "President Barack Obama is urging parents to get their children vaccinated in the face of a measles outbreak that has infected more than 100 people in the United States. In excerpts from an interview with NBC News that will air on Monday, Obama said measles was a preventable disease",
  "documentType" : "text/plain"
}
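The same request can be sketched in Python. This is a minimal illustration, not code from the S4 SDKs: the function name `build_news_request` and the host `s4.example.invalid` are placeholders invented here, and the request is only prepared, never sent. The field names `document` and `documentType` come from the request structure on the slide.

```python
import base64
import json
import urllib.request


def build_news_request(endpoint, api_key, key_secret, text):
    """Prepare (but do not send) a POST request mirroring the curl call."""
    # json.dumps escapes quotes and newlines in the text, which plain
    # string interpolation into a shell variable would not do.
    body = json.dumps({"document": text, "documentType": "text/plain"})
    # curl turns the key:secret@host URL form into HTTP Basic auth;
    # here the Authorization header is built explicitly.
    token = base64.b64encode(f"{api_key}:{key_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Placeholder values: "s4.example.invalid" is NOT the real service host.
request = build_news_request(
    "https://s4.example.invalid/v1/news",
    "my-api-key",
    "my-key-secret",
    'Obama said measles was a "preventable" disease.',
)
# Sending it would be: urllib.request.urlopen(request).read()
```

Building the body with a JSON library rather than string concatenation keeps the request valid when the article text itself contains double quotes.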

Page 23: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• 17 top-level categories from the IPTC Subject Reference System

– Arts / Culture / Entertainment, Crime / Law / Justice, Disaster / Accident, Economy / Business / Finance, Education, Environment, Health, Politics, …

• Enterprise grade technology

– Based on successful text mining solutions for big media & publishing companies

News classification with S4

Page 24: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

News classification example

S4 result

Page 25: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

News classification example

API key pair, REST service, URL:

API_KEY=s4trm64sb76u
KEY_SECRET=lrcki2kkajslsp6
SERVICE_ENDPOINT="https://$API_KEY:[email protected]/v1/news-classifier"
CONTENT_URL="http://www.theguardian.com/world/2015/feb/04/taiwan-plane-crash-lands-in-river"
CONTENT_TYPE="text/html"
JSON_REQUEST="{\"documentUrl\" : \"$CONTENT_URL\", \"documentType\" : \"$CONTENT_TYPE\"}"
curl -X POST -H "Content-Type: application/json" -d "$JSON_REQUEST" $SERVICE_ENDPOINT

Request structure:

{
  "documentUrl" : "http://www.theguardian.com/world/2015/feb/04/taiwan-plane-crash-lands-in-river",
  "documentType" : "text/html"
}
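The two request slides differ only in whether the document is passed inline (`document`) or by reference (`documentUrl`). A small sketch of a helper that builds either body is shown here; the helper name `build_payload` and its argument names are made up for illustration, and only the three JSON field names are taken from the slides.

```python
import json


def build_payload(document_type, text=None, url=None):
    """Return the JSON body for an inline document or a document URL."""
    # Exactly one of the two sources must be supplied.
    if (text is None) == (url is None):
        raise ValueError("pass exactly one of text= or url=")
    key, value = ("document", text) if text is not None else ("documentUrl", url)
    return json.dumps({key: value, "documentType": document_type})


# A web page is HTML, so the URL variant is paired with text/html here.
by_url = build_payload(
    "text/html",
    url="http://www.theguardian.com/world/2015/feb/04/taiwan-plane-crash-lands-in-river",
)
inline = build_payload("text/plain", text="Obama said measles was a preventable disease.")
```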

Page 26: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• 130 biomedical entity types

– Organism, Virus, Animal, Anatomical Structure, Organ, Tissue, Cell, Genome, Chemical, Lab Result, Clinical Drug, Biologic Function, Organ Function, Disease/Syndrome, …

• Enterprise grade technology

– Based on successful text mining solutions for big pharmaceuticals and healthcare providers

Biomedical analytics with S4

Page 27: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

Biomedical analytics with S4

Page 28: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

Biomedical analytics example

S4 result

Page 29: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• Entity types

– Person, Location, Organisation, Date, Address, Money

– Hashtag, Emoticon, URL, @UserID

• Based on TwitIE microblog pipeline by GATE / University of Sheffield

Twitter analytics with S4
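To make the token-like entity types (Hashtag, URL, @UserID) concrete, here is a deliberately simplified regular-expression sketch. It is not TwitIE and not the S4 service: the patterns, the function name `surface_entities`, and the sample tweet are all invented for illustration, and real pipelines also handle emoticons, named entities and many edge cases.

```python
import re

# Simplified patterns for illustration only; TwitIE itself is a full
# GATE pipeline, not a handful of regular expressions.
PATTERNS = {
    "URL": re.compile(r"https?://\S+"),
    "Hashtag": re.compile(r"(?<!\w)#\w+"),
    "UserID": re.compile(r"(?<!\w)@\w+"),
}


def surface_entities(tweet):
    """Return {entity type: [matched strings]} for the token-like types."""
    found = {}
    remaining = tweet
    for name, pattern in PATTERNS.items():
        found[name] = pattern.findall(remaining)
        # Blank out matches so a '#fragment' inside a URL is not re-reported.
        remaining = pattern.sub(" ", remaining)
    return found


# Made-up tweet text.
result = surface_entities(
    "Join @Ontotext_S4 for the #S4 webinar http://bit.ly/s4-challenge #semweb"
)
```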

Page 30: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

Twitter analytics example

Page 31: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

RDF Data Management

Page 32: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• Standards compliance

– Based on a mature set of W3C standards: RDF/S, OWL, SPARQL

– Portability & interoperability

• Schema-less data integration, easy querying of diverse data

• Complex & exploratory queries

• Infer implicit relations in the graph

• Reuse open knowledge graphs (Linked Open Data)

RDF for smart data management

Page 33: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

A visual view of RDF data

Diagram labels: sub-properties, sub-classes, transitive relations, inference
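The four ideas named on the diagram can be illustrated with a toy forward-chaining loop over triples. This is a sketch of the general RDFS-style rules only, not of how GraphDB implements reasoning; the `ex:` identifiers and the `infer` function are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"


def infer(triples, transitive=()):
    """Forward-chain a few RDFS-style rules until no new triple appears."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # Sub-class and sub-property hierarchies are transitive.
                if p == p2 and p in (SUBCLASS, SUBPROP) and o == s2:
                    new.add((s, p, o2))
                # Relations the caller declares transitive, e.g. ex:locatedIn.
                if p == p2 and p in transitive and o == s2:
                    new.add((s, p, o2))
                # An instance of a class is an instance of its super-classes.
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))
                # A statement made with a sub-property also holds for
                # the super-property.
                if p2 == SUBPROP and p == s2:
                    new.add((s, o2, o))
        if new <= facts:
            return facts
        facts |= new


# Hypothetical example data in an "ex:" namespace.
data = {
    ("ex:Sofia", RDF_TYPE, "ex:City"),
    ("ex:City", SUBCLASS, "ex:Place"),
    ("ex:capitalOf", SUBPROP, "ex:locatedIn"),
    ("ex:Sofia", "ex:capitalOf", "ex:Bulgaria"),
    ("ex:Bulgaria", "ex:locatedIn", "ex:Europe"),
}
closure = infer(data, transitive={"ex:locatedIn"})
```

From five explicit statements the loop derives three implicit ones: Sofia is a Place, Sofia is located in Bulgaria, and Sofia is located in Europe.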

Page 34: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• High performance RDF database

• Full SPARQL 1.1 support

• Various reasoning profiles, including custom rules

• Efficient data integration (“sameAs” optimisations)

• Efficient deletion of statements & their inferences

• Geo-spatial indexing & querying with SPARQL

• RDF Rank, full-text search, 3rd party plugins

GraphDB by Ontotext
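The slides do not describe how the "sameAs" optimisation works internally. As a rough illustration of the underlying idea only, identifiers linked by owl:sameAs can be collapsed into equivalence classes with a single representative each, here using a small union-find. The function name and the identifiers below are invented for the example.

```python
def same_as_classes(same_as_pairs):
    """Group identifiers linked by owl:sameAs into equivalence classes."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller id as the representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    classes = {}
    for node in parent:
        classes.setdefault(find(node), set()).add(node)
    return classes


# Made-up identifiers standing in for the same city in three datasets.
pairs = [
    ("dbpedia:Sofia", "geonames:Sofia"),
    ("geonames:Sofia", "freebase:Sofia"),
    ("dbpedia:Varna", "geonames:Varna"),
]
classes = same_as_classes(pairs)
```

Storing facts against one representative per class, instead of repeating them for every equivalent identifier, is one plausible way such an optimisation can avoid a blow-up in inferred statements.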

Page 35: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• Ideal for customers who are…

– Still evaluating and testing RDF technology

– In the early phase of adoption / POC

• Enterprise grade RDF database in the Cloud

– No need for upfront payments for licenses & hardware

– Pay only for what you use, when you use it

– Instantly operational within minutes

– No need for complex planning - use as many DB instances for as long as needed

– Timely upgrades to the latest version

• Self-managed and fully managed options

RDF database in the Cloud with S4

Page 36: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• Available from AWS Marketplace

• Variety of hardware configurations

– 2 to 8 CPU cores / 8 to 61 GB RAM

– IOPS performance & encryption (EBS)

• Manage large data volumes

• Pay-per-hour pricing

Self-managed database in the Cloud

Page 37: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• (available in Q2’2015)

• Low-cost DBaaS available 24/7

• Ideal for small & moderate data volumes

• Instantly start new databases when needed

• Zero administration: automated operations, maintenance & upgrades

• Users pay only for the actual database utilisation

– database size + number of queries per period

Fully-managed database in the Cloud

Page 38: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

Knowledge Graphs

Page 39: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• SPARQL query endpoint to FactForge knowledge graph

– 500 million entities

– 5 billion triples

• Key LOD datasets integrated

– DBpedia, Freebase, GeoNames, WordNet

– Dublin Core, SKOS, PROTON ontologies and vocabularies

Knowledge graphs with S4

Page 40: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

Knowledge graph query example

SPARQL query using DBpedia data
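The query shown on the slide is an image and is not reproduced in this transcript. As a stand-in, the sketch below prepares a query over DBpedia vocabulary using the standard SPARQL 1.1 Protocol (query passed as a `query` parameter) and reads a response in the standard SPARQL JSON results format. The endpoint URL, the query itself, the helper names, and the hand-written sample response are all assumptions for illustration; nothing is sent over the network.

```python
import json
import urllib.parse
import urllib.request

# Illustrative query, not the one from the slide.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?person WHERE { ?person dbo:birthPlace dbr:Sofia } LIMIT 5
"""


def sparql_request(endpoint, query):
    """Prepare a SPARQL 1.1 Protocol query-via-GET request (not sent here)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def column(results_json, variable):
    """Pull one variable's values out of a SPARQL JSON results document."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return [row[variable]["value"] for row in bindings if variable in row]


# Hypothetical endpoint and a hand-written response in the standard format.
request = sparql_request("https://sparql.example.invalid/query", QUERY)
sample = json.dumps({
    "head": {"vars": ["person"]},
    "results": {"bindings": [
        {"person": {"type": "uri",
                    "value": "http://dbpedia.org/resource/Example_Person"}},
    ]},
})
people = column(sample, "person")
```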

Page 41: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

For Developers

Page 42: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

Getting started in minutes

1. Register a personal account at s4.ontotext.com

2. Generate an API key pair

3. Check out the docs, demos & code at

docs.s4.ontotext.com

4. Contact us with questions!
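Once an API key pair has been generated in step 2, one reasonable habit is to keep it out of source code. The sketch below reads the pair from environment variables; the variable names `S4_API_KEY` and `S4_KEY_SECRET` and the function `load_key_pair` are choices made for this example, not names defined by S4.

```python
import os


def load_key_pair(env=os.environ):
    """Read the API key pair from the environment instead of source code."""
    # S4_API_KEY / S4_KEY_SECRET are names chosen for this sketch.
    try:
        return env["S4_API_KEY"], env["S4_KEY_SECRET"]
    except KeyError as missing:
        raise RuntimeError(
            f"set {missing.args[0]} before calling the service"
        ) from None


# Demonstration with an explicit mapping instead of the real environment.
api_key, key_secret = load_key_pair(
    {"S4_API_KEY": "demo-key", "S4_KEY_SECRET": "demo-secret"}
)
```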

Page 43: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• Java & C# SDKs

• Sample code

– Java, C#, NodeJS, JavaScript, Python, PHP, Groovy

– Curl examples for the most impatient

• GATE plugin (UIMA plugin in Q2’2015)

• Firefox plugin

• Online documentation

S4 for developers

Page 44: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• March 1st – 30th 2015

• Submit a cool text analytics & Linked Data application using S4

• $1,000 for the winning submission

• More details at http://bit.ly/s4-challenge

S4 Developers Challenge

Page 45: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

Roadmap

Page 46: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• Text analytics

– Multi-lingual text analytics

– Sentiment analytics

– JSON-LD output format

• RDF databases

– Fully managed RDF DBaaS

– Regular updates of the self-managed GraphDB on AWS

• Knowledge Graphs

– Private knowledge graph databases with DBpedia/Wikidata

– 3rd party Linked Data visualisation & exploration tools

What to expect in 2015?

Page 47: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• Pricing plans

– Simple, transparent, usage based pricing

– Pay only for what you use, when you use it

• For developers

– UIMA plugin for S4

– More SDKs

– More add-ons

– Demos and sample code

– S4 Developers Challenges

What to expect in 2015?

Page 48: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

Key Takeaways

Page 49: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• Semantic technologies provide good capabilities for smart data management

• Key S4 benefits

– Lowers the risks and costs for semantic technology adoption

– Shortens time-to-market, reduces TCO

– Provides a safe migration path into custom enterprise solutions with Ontotext technology

• Key S4 capabilities

– Various text analytics components (more to come!)

– Self-managed & fully managed RDF DB in the Cloud

– Knowledge graphs with reusable open data

Key Takeaways

Page 50: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

• Online documentation

– http://docs.s4.ontotext.com/

• Sample code & demos on GitHub

– https://github.com/Ontotext-AD/S4

• Helpdesk

– http://support.s4.ontotext.com/

• Twitter

– @Ontotext_S4

Additional S4 resources

Page 51: Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

Thank you!

Text Mining and Knowledge Graphs in the Cloud:

The Self-Service Semantic Suite

A link to the recording will be sent out shortly

Feb 26th, 2015
