TRANSCRIPT
Text Mining and Knowledge Graphs in the Cloud: The Self-Service Semantic Suite (S4)
A webinar with Marin Dimitrov, CTO of Ontotext
Feb 26th, 2015
Text Mining & Knowledge Graphs in the Cloud with S4 #1 Feb 2015
Today’s Topics
• Semantic technologies for data management
• Self-Service Semantic Suite (S4)
• Text analytics
• RDF data management in the Cloud
• Knowledge graphs
• S4 for developers
• Roadmap
• Q&A session
About Ontotext
• Provides products & solutions for content enrichment and metadata management
– 70 employees, headquartered in Sofia (Bulgaria)
– Sales presence in London, Washington & Boston
• Major clients and industries
– Media & Publishing
– Health Care & Life Sciences
– Cultural Heritage & Digital Libraries
– Government
– Education
Some of our clients
Semantic Technologies for Smart Data Management
Typical challenges for our customers
• How can we unlock more insight from text?
• How can we interlink & search across text and structured data sources?
• How can we improve data & content reuse?
• How can we integrate data sources faster?
• How can we reuse external open data sources?
• How can we discover relations between entities?
Ontotext’s vision for smart data management
Graph Database
• Flexible RDF graph data model
• Ontology metadata layer
Semantic Search
• Semantic, exploratory search
• Metadata driven content
Text Mining & Interlinking
• People, locations, organisations, topics
• Discover implicit relations
• Reuse open knowledge graphs
Ontotext and AstraZeneca
Profile
• Global, Bio-pharma company
• $28 billion in sales in 2012
• $4 billion in R&D across three continents
Goals
• Efficient design of new clinical studies
• Quick access to all of the data
• Improved evidence based decision-making
• Strengthen the knowledge feedback loop
• Enable predictive science
Challenges
• Over 7,000 studies and 23,000 documents are difficult to obtain
• Searches returning 1,000 – 10,000 results
• Document repositories not designed for reuse
• Tedious process to arrive at evidence based decisions
Ontotext and LMI
Profile
• Established in 1961 to enable federal agencies
• Specializes in logistics, financial, infrastructure & information management
Goals
• Unlock large collections of complex documents
• Improve analyst productivity
• Create an application they can sell to US Federal agencies
Challenges
• Analysts taking hours to find, download and search documents, using inaccurate keyword searches
• Needed a knowledge base to search quickly and guide the analysts – highly relevant searches
• Extracts knowledge from collection of documents
• Uses GraphDB to intuitively search and filter
• More than 90% savings in analyst time
• Accurate results
Ontotext and Euromoney
Profile
• Euromoney Institutional Investor PLC, the international online information and events group
Goals
• Create a horizontal platform to serve 100 different publications / 80 business units
• Create a new unified publishing and information platform
Challenges
• Different domains covered
• Sophisticated content analytics incl. relation, template and scenario extraction
• Text analytics of reports and news in various domains
• Extraction of sophisticated macroeconomic views on markets and market conditions
• Triplestore for flexible data integration & reasoning
• Multi-faceted search
The Self-Service Semantic Suite (S4)
Why did we create S4?
• Unlock the value of semantic technologies to SMEs
– Most success stories so far come from bigger companies
• Lower the technology adoption barriers and risks
– Challenge: perceived risks associated with new technology adoption
– Challenge: insufficient resources to implement new technologies
– Challenge: bureaucratic budgeting, procurement & provisioning processes
What is S4?
• Self-service capabilities for text analytics, content enrichment and metadata management
– Text analytics for news, life sciences and social media
– RDF graph database as-a-service
– Access to large open knowledge graphs
• Available anytime, anywhere
– Simple RESTful services
• Simple, pay-per-use pricing
– No upfront commitments
S4 benefits
• Utilise semantic technology for smart data applications
– Extract more value hidden in text
– Interlink structured and unstructured data sources
– Semantic search (instead of keyword-based search)
– Reuse open knowledge graphs
• Low adoption cost and risk
• No need for complex planning & procurement
• Pay only for what you use, reduce TCO
S4 benefits
• Enables quick prototyping & shorter time-to-market, increases innovation speed
• Available on-demand in the cloud, no provisioning & operations required
• Based on enterprise grade semantic technology by Ontotext
• Migration path from S4 based prototypes to customised enterprise solutions with Ontotext technology
S4 for developers
• Instantly available
• Free tier
• Easy to start, shorter learning curve
– Various add-ons, SDKs and demo code
• Simplify the technology stack for smart data applications
• Focus on building applications, don’t worry about infrastructure & operations
• Quicker prototyping, shorter development cycles
Text Analytics
Text analytics with S4
• Text analytics services
– News annotation
– News categorisation
– Biomedical
• Entity linking & disambiguation
– Mappings to DBpedia & GeoNames instances
– Mappings to biomedical data sources (LinkedLifeData)
• HTML, MS Word, XML, plain text input
• Simple JSON output
News analytics with S4
• Entity types
– Person
– Organization
– Location
– Relation (affiliation, customer, competitor, partner, acquisition, role, …)
– Keywords and key phrases
• Enterprise grade technology
– Based on successful text mining solutions for big media & publishing companies
News analytics with S4
News analytics example
News analytics example: S4 result
# API key pair
API_KEY=s4trm64sb76u
KEY_SECRET=lrcki2kkajslsp6
# REST service
SERVICE_ENDPOINT="https://$API_KEY:[email protected]/v1/news"
# text
CONTENT="President Barack Obama is urging parents to get their children vaccinated in the face of a measles outbreak that has infected more than 100 people in the United States. In excerpts from an interview with NBC News that will air on Monday, Obama said measles was a preventable disease."
CONTENT_TYPE="text/plain"
# Request structure
JSON_REQUEST="{\"document\" : \"$CONTENT\", \"documentType\" : \"$CONTENT_TYPE\"}"
curl -X POST -H "Content-Type: application/json" -d "$JSON_REQUEST" $SERVICE_ENDPOINT
Request structure
{
  "document" : "President Barack Obama is urging parents to get their children vaccinated in the face of a measles outbreak that has infected more than 100 people in the United States. In excerpts from an interview with NBC News that will air on Monday, Obama said measles was a preventable disease",
  "documentType" : "text/plain"
}
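The same request can be sketched in Python. This is a minimal illustration rather than the official S4 SDK: the field names `document` and `documentType` and the use of an API key pair come from the slide, while the host name is a placeholder (the real host is obscured in the slide text) and the function name `build_annotation_request` is hypothetical. The request is only built, not sent.

```python
import base64
import json
import urllib.request

# Hypothetical placeholder: the real S4 host is obscured in the slide text.
S4_NEWS_ENDPOINT = "https://s4.example.invalid/v1/news"


def build_annotation_request(api_key, key_secret, text, endpoint=S4_NEWS_ENDPOINT):
    """Build (but do not send) a POST request carrying inline plain text."""
    # json.dumps escapes quotes and newlines inside the text, which a
    # hand-assembled JSON string in a shell variable would not do.
    body = json.dumps({"document": text, "documentType": "text/plain"}).encode("utf-8")
    # HTTP Basic authentication, the header form of key:secret@host in a URL.
    token = base64.b64encode(f"{api_key}:{key_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    request = build_annotation_request(
        "my-api-key", "my-key-secret", 'Obama said measles was "preventable".'
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually call the service: urllib.request.urlopen(request)
```

Building the body with `json.dumps` avoids broken JSON when the text itself contains double quotes, which is the main weakness of assembling the request string by hand.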
News classification with S4
• 17 top-level categories from the IPTC Subject Reference System
– Arts / Culture / Entertainment, Crime / Law / Justice, Disaster / Accident, Economy / Business / Finance, Education, Environment, Health, Politics, …
• Enterprise grade technology
– Based on successful text mining solutions for big media & publishing companies
News classification example
News classification example: S4 result
# API key pair
API_KEY=s4trm64sb76u
KEY_SECRET=lrcki2kkajslsp6
# REST service
SERVICE_ENDPOINT="https://$API_KEY:[email protected]/v1/news-classifier"
# URL
CONTENT_URL="http://www.theguardian.com/world/2015/feb/04/taiwan-plane-crash-lands-in-river"
CONTENT_TYPE="text/html"
# Request structure
JSON_REQUEST="{\"documentUrl\" : \"$CONTENT_URL\", \"documentType\" : \"$CONTENT_TYPE\"}"
curl -X POST -H "Content-Type: application/json" -d "$JSON_REQUEST" $SERVICE_ENDPOINT
Request structure
{
  "documentUrl" : "http://www.theguardian.com/world/2015/feb/04/taiwan-plane-crash-lands-in-river",
  "documentType" : "text/html"
}
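The two request shapes shown in the slides (inline content via `document`, remote content via `documentUrl`) can be produced by one small helper. The sketch below is illustrative only: the helper name `build_payload` is hypothetical, and the rule that exactly one of the two fields must be supplied is an assumption, not something the slides state.

```python
import json


def build_payload(document_type, text=None, url=None):
    """Return the JSON request body for either inline content or a URL."""
    # Assumption: a request carries exactly one content source.
    if (text is None) == (url is None):
        raise ValueError("pass exactly one of text or url")
    payload = {"documentType": document_type}
    if text is not None:
        payload["document"] = text      # inline content, as in the news example
    else:
        payload["documentUrl"] = url    # content fetched by the service from a URL
    return json.dumps(payload)


if __name__ == "__main__":
    print(build_payload(
        "text/html",
        url="http://www.theguardian.com/world/2015/feb/04/taiwan-plane-crash-lands-in-river",
    ))
```

Note that the document type describes the content being analysed, so a web page passed by URL would normally be declared as `text/html`.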
Biomedical analytics with S4
• 130 biomedical entity types
– Organism, Virus, Animal, Anatomical Structure, Organ, Tissue, Cell, Genome, Chemical, Lab Result, Clinical Drug, Biologic Function, Organ Function, Disease/Syndrome, …
• Enterprise grade technology
– Based on successful text mining solutions for big pharmaceuticals and healthcare providers
Biomedical analytics with S4
Biomedical analytics example: S4 result
Twitter analytics with S4
• Entity types
– Person, Location, Organisation, Date, Address, Money
– Hashtag, Emoticon, URL, @UserID
• Based on TwitIE microblog pipeline by GATE / University of Sheffield
Twitter analytics example
RDF Data Management
RDF for smart data management
• Standards compliance
– Based on a mature set of W3C standards: RDF/S, OWL, SPARQL
– Portability & interoperability
• Schema-less data integration, easy querying of diverse data
• Complex & exploratory queries
• Infer implicit relations in the graph
• Reuse open knowledge graphs (Linked Open Data)
A visual view of RDF data
[Figure: RDF graph with callouts for sub-properties, sub-classes, transitive relations and inference]
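To make the idea of inferring implicit relations concrete, here is a minimal sketch of two standard RDFS entailment rules applied by naive forward chaining over plain Python tuples: `rdfs:subClassOf` is transitive, and an instance of a class is also an instance of its superclasses. The `ex:` names are invented for illustration, and this toy loop is not how GraphDB implements reasoning.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer(triples):
    """Return the input triples plus everything the two rules entail."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no rule produces a new triple
        new = set()
        for s, p, o in inferred:
            if p not in (SUBCLASS, TYPE):
                continue
            for s2, p2, o2 in inferred:
                # (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
                # (x type B),       (B subClassOf C)  =>  (x type C)
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, p, o2))
        changed = not new <= inferred
        inferred |= new
    return inferred


if __name__ == "__main__":
    explicit = {
        ("ex:Sofia", TYPE, "ex:City"),
        ("ex:City", SUBCLASS, "ex:PopulatedPlace"),
        ("ex:PopulatedPlace", SUBCLASS, "ex:Location"),
    }
    for triple in sorted(infer(explicit) - explicit):
        print(triple)
```

From three explicit statements the loop derives three implicit ones, including that `ex:Sofia` is an `ex:Location`, a fact nobody asserted directly.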
GraphDB by Ontotext
• High performance RDF database
• Full SPARQL 1.1 support
• Various reasoning profiles, including custom rules
• Efficient data integration (“sameAs” optimisations)
• Efficient deletion of statements & their inferences
• Geo-spatial indexing & querying with SPARQL
• RDF Rank, full-text search, 3rd party plugins
RDF database in the Cloud with S4
• Ideal for customers who are…
– Still evaluating and testing RDF technology
– In the early phase of adoption / POC
• Enterprise grade RDF database in the Cloud
– No need for upfront payments for licenses & hardware
– Pay only for what you use, when you use it
– Instantly operational within minutes
– No need for complex planning: use as many DB instances for as long as needed
– Timely upgrades to the latest version
• Self-managed and fully managed options
Self-managed database in the Cloud
• Available from AWS Marketplace
• Variety of hardware configurations
– 2 to 8 CPU cores / 8 to 61 GB RAM
– IOPS performance & encryption (EBS)
• Manage large data volumes
• Pay-per-hour pricing
Fully-managed database in the Cloud (available in Q2’2015)
• Low-cost DBaaS available 24/7
• Ideal for small & moderate data volumes
• Instantly start new databases when needed
• Zero administration: automated operations, maintenance & upgrades
• Users pay only for the actual database utilisation
– database size + number of queries per period
Knowledge Graphs
Knowledge graphs with S4
• SPARQL query endpoint to FactForge knowledge graph
– 500 million entities
– 5 billion triples
• Key LOD datasets integrated
– DBpedia, Freebase, GeoNames, WordNet
– Dublin Core, SKOS, PROTON ontologies and vocabularies
Knowledge graph query example
SPARQL query using DBpedia data
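The query shown on the slide is not preserved in this transcript, so the sketch below uses an illustrative query of its own over the DBpedia ontology. It shows how a query can be sent to any SPARQL endpoint following the W3C SPARQL 1.1 Protocol ("query via GET"). The endpoint URL is a placeholder, the function name `build_sparql_request` is hypothetical, and any authentication the S4 endpoint may require is left out.

```python
import urllib.parse
import urllib.request

# Hypothetical placeholder: the slides do not give the endpoint URL.
SPARQL_ENDPOINT = "https://sparql.example.invalid/sparql"

# Illustrative query (not the one shown in the webinar):
# the ten most populous cities in Bulgaria according to DBpedia data.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:country dbr:Bulgaria ;
        dbo:populationTotal ?population .
}
ORDER BY DESC(?population)
LIMIT 10
"""


def build_sparql_request(query, endpoint=SPARQL_ENDPOINT):
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    # The query travels URL-encoded in the "query" parameter.
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


if __name__ == "__main__":
    request = build_sparql_request(QUERY)
    print(request.get_method(), request.full_url[:80] + "...")
    # To run it against a real endpoint: urllib.request.urlopen(request)
```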
For Developers
Getting started in minutes
1. Register a personal account at s4.ontotext.com
2. Generate an API key pair
3. Check out the docs, demos & code at docs.s4.ontotext.com
4. Contact us with questions!
S4 for developers
• Java & C# SDKs
• Sample code
– Java, C#, NodeJS, JavaScript, Python, PHP, Groovy
– Curl examples for the most impatient
• GATE plugin (UIMA plugin in Q2’2015)
• Firefox plugin
• Online documentation
S4 Developers Challenge
• March 1st – 30th 2015
• Submit a cool text analytics & Linked Data application using S4
• $1,000 for the winning submission
• More details at http://bit.ly/s4-challenge
Roadmap
What to expect in 2015?
• Text analytics
– Multi-lingual text analytics
– Sentiment analytics
– JSON-LD output format
• RDF databases
– Fully managed RDF DBaaS
– Regular updates of the self-managed GraphDB on AWS
• Knowledge Graphs
– Private knowledge graph databases with DBpedia/Wikidata
– 3rd party Linked Data visualisation & exploration tools
What to expect in 2015?
• Pricing plans
– Simple, transparent, usage based pricing
– Pay only for what you use, when you use it
• For developers
– UIMA plugin for S4
– More SDKs
– More add-ons
– Demos and sample code
– S4 Developers Challenges
Key Takeaways
• Semantic technologies provide good capabilities for smart data management
• Key S4 benefits
– Lowers the risks and costs for semantic technology adoption
– Shortens time-to-market, reduces TCO
– Provides a safe migration path into custom enterprise solutions with Ontotext technology
• Key S4 capabilities
– Various text analytics components (more to come!)
– Self-managed & fully managed RDF DB in the Cloud
– Knowledge graphs with reusable open data
Additional S4 resources
• Online documentation
– http://docs.s4.ontotext.com/
• Sample code & demos on GitHub
– https://github.com/Ontotext-AD/S4
• Helpdesk
– http://support.s4.ontotext.com/
– @Ontotext_S4
Thank you!
Text Mining and Knowledge Graphs in the Cloud: The Self-Service Semantic Suite
A link to the recording will be sent out shortly
Feb 26th, 2015