Analyzing Text & Images: Getting More Insight from Web Content with S4

Marin Dimitrov (CTO of Ontotext) & Georgi Kadrev (CEO of Imagga)

Nov 2015



Some Ontotext Customers

2 Nov 2015 S4 Webinar - Analyzing Text & Images


Smart Data Management


Graph Database

• Flexible RDF graph data model

• Ontology metadata layer

Semantic Search

• Semantic, exploratory search

• Information discovery

• Metadata-driven content

Text Mining & Interlinking

• People, locations, organisations, topics

• Discover implicit relations

• Reuse open knowledge graphs


Presentation Outline

• The Self-Service Semantic Suite (S4)

• Recent Updates

• Combining Text + Images Analytics

• Imagga Image Tagging

• Roadmap

• Q & A


What Is S4?

• Capabilities for Smart Data management and analytics

−Text analytics for news, life sciences and social media

−RDF graph database as-a-service

−Access to large open knowledge graphs

• Available on-demand, anytime, anywhere

−Simple RESTful services

• Simple pay-per-use pricing

−No upfront commitments


What Is S4?

+ Image analytics via Imagga

S4 Benefits

• Enables quick prototyping

− Instantly available, no provisioning & operations required

−Focus on building applications, don’t worry about software + infrastructure

• Free tier!

• Easy to start, shorter learning curve

−Detailed documentation, various add-ons, SDKs and demo code

• Based on enterprise technology by Ontotext

Getting Started in Minutes

1. Register a personal account at s4.ontotext.com

2. Generate an API key pair

3. Check out the docs, demos & sample code at docs.s4.ontotext.com

4. Contact us with questions!

Text Analytics with S4

• Text analytics services

− News annotation

− News categorisation

− Biomedical

− Twitter

• Entity linking & disambiguation

− Mappings to DBpedia & GeoNames instances

− Mappings to biomedical data sources (LinkedLifeData)

• HTML, MS Word, XML, plain text input

• Simple JSON output

+ Image analytics via Imagga

News Analytics Example

[Screenshots: S4 result]

News Classification

[Screenshot: S4 result]

Try It!

# News classification: credentials are sent as HTTP Basic auth via the URL
API_KEY=…
KEY_SECRET=…
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@text.s4.ontotext.com/v1/news-classifier"
URL="http://www.theguardian.com/world/2015/nov/11/germany-spied-fbi-un-bodies-french-foreign-minister"
JSON_REQUEST="{\"documentUrl\" : \"$URL\",
  \"documentType\" : \"text/html\" }"
curl -X POST -H "Content-Type: application/json" -H "Accept: application/json" \
  -d "$JSON_REQUEST" "$SERVICE_ENDPOINT"

# News annotation: same request, different service endpoint
API_KEY=…
KEY_SECRET=…
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@text.s4.ontotext.com/v1/news"
URL="http://www.theguardian.com/world/2015/nov/11/germany-spied-fbi-un-bodies-french-foreign-minister"
JSON_REQUEST="{\"documentUrl\" : \"$URL\",
  \"documentType\" : \"text/html\" }"
curl -X POST -H "Content-Type: application/json" -H "Accept: application/json" \
  -d "$JSON_REQUEST" "$SERVICE_ENDPOINT"
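For comparison, a minimal Python sketch of the same classification request using only the standard library. It builds the request object without sending it, so it can be inspected first, and `json.dumps` removes the need for hand-escaped JSON in a shell string. The base URL is an assumption taken from the Python SDK example later in the deck; sending the key pair as an HTTP Basic `Authorization` header is equivalent to curl's `user:password@host` URL form. The helper name `build_s4_request` is hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: the same host as in the Python SDK example.
S4_TEXT_BASE = "https://text.s4.ontotext.com/v1"


def build_s4_request(service, document_url, api_key, key_secret,
                     document_type="text/html"):
    """Build (but do not send) a POST request to an S4 text analytics service."""
    # json.dumps handles quoting, so no hand-escaped JSON string is needed.
    body = json.dumps({"documentUrl": document_url,
                       "documentType": document_type}).encode("utf-8")
    # HTTP Basic auth: base64 of "key:secret", as curl does for user:password@host.
    credentials = f"{api_key}:{key_secret}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        f"{S4_TEXT_BASE}/{service}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


article_url = ("http://www.theguardian.com/world/2015/nov/11/"
               "germany-spied-fbi-un-bodies-french-foreign-minister")
req = build_s4_request("news-classifier", article_url, "my-key", "my-secret")
print(req.get_method(), req.full_url)
# prints: POST https://text.s4.ontotext.com/v1/news-classifier
```

Sending it is then `urllib.request.urlopen(req)`, whose body can be parsed with `json.load`.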

Biomedical Analytics


What Is S4?


RDF Graph Databases – Advantages

• Simple, graph-based data model

• Agile schema / schema-less / schema-late

• Ontology-based schema

• Global identifiers of resources (entities)

• Rule-based inference of implicit facts

• Exploratory queries against unknown schemas

• Compliance with standards (RDF, SPARQL), no vendor lock-in


RDF Graph Databases – Inferring New Facts
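The inference slide itself is a diagram. As a toy illustration of rule-based inference (not the reasoner GraphDB actually uses), the sketch below forward-chains two standard RDFS entailment rules, subclass transitivity (rdfs11) and type propagation to superclasses (rdfs9), over an in-memory set of triples until no new facts appear. The `ex:` resources are made up for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer(triples):
    """Naive forward chaining of RDFS rules rdfs9 and rdfs11 to a fixpoint."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p != SUBCLASS_OF:
                continue
            # Combine each (s subClassOf o) with every other known fact.
            for s2, p2, o2 in facts:
                # rdfs11: (s subClassOf o), (o subClassOf o2) => (s subClassOf o2)
                if p2 == SUBCLASS_OF and s2 == o:
                    new.add((s, SUBCLASS_OF, o2))
                # rdfs9: (s subClassOf o), (s2 type s) => (s2 type o)
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if new <= facts:  # nothing new was derived: fixpoint reached
            return facts
        facts |= new


explicit = {
    ("ex:Ontotext", RDF_TYPE, "ex:Company"),
    ("ex:Company", SUBCLASS_OF, "ex:Organisation"),
    ("ex:Organisation", SUBCLASS_OF, "ex:Agent"),
}
for triple in sorted(infer(explicit) - explicit):
    print(triple)
```

Only the implicit facts are printed: Ontotext turns out to be an `ex:Organisation` and an `ex:Agent`, although neither was stated explicitly.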


RDF Graph Database-as-a-Service Benefits

• Evaluate the technology

• Instant deployment

• Faster experimentation & application development

• Data services / Open Data publishing

• Reduced TCO (total cost of ownership) & risk


Self-managed RDF DBaaS

• Available from AWS Marketplace, “1-Click” purchasing

• Variety of hardware configurations

• Manage large RDF data volumes

• Pay-per-hour pricing, 5-day trial

• Users take care of operations

−Backups, restores


Fully Managed RDF DBaaS

• Low-cost graph DBaaS available 24/7 on S4

• Ideal for small to moderate data & query volumes

− Database options: 1M, 10M, 50M, 250M & 1B triples

• Instantly deploy new databases when needed

• Zero administration

− Automated operations, maintenance & upgrades

• Users pay only for the actual database utilisation

• Standard OpenRDF REST API, support for 3rd-party tools
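To make the size options concrete, a small sketch that picks the smallest listed database option able to hold an expected number of triples. The function name and the 20% `headroom` default are hypothetical, illustrative choices, not an S4 sizing rule.

```python
# Fully managed database sizes from the slide, in triples (1M ... 1B).
DB_OPTIONS = [1_000_000, 10_000_000, 50_000_000, 250_000_000, 1_000_000_000]


def smallest_fitting_option(expected_triples, headroom=0.2):
    """Return the smallest option holding expected_triples plus headroom, else None.

    headroom is a hypothetical safety margin (0.2 = 20% extra), not an S4 rule.
    """
    needed = expected_triples * (1 + headroom)
    for size in DB_OPTIONS:
        if size >= needed:
            return size
    return None  # larger than the biggest fully managed option


print(smallest_fitting_option(8_000_000))
```

With the default margin, 8 million triples still fit the 10M option, while 9 million would already need the 50M option.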


OpenRDF REST API

Resource | Operations | Comments
/repositories | GET | Get info on DB repositories
/repositories/<REPOSITORY> | GET, POST, PUT, DELETE | Create*, delete, query a repository
/repositories/<REPOSITORY>/size | GET | Get the number of triples in a repository
/repositories/<REPOSITORY>/statements | GET, POST, PUT, DELETE | Add, read, update, delete statements
/repositories/<REPOSITORY>/rdf-graphs/<GRAPH> | GET, POST, PUT, DELETE | Same as above
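A minimal sketch of how three of these resources map onto HTTP requests, using Python's standard library. The `query` parameter, the `Accept` type for SPARQL JSON results and the Turtle `Content-Type` follow the Sesame/OpenRDF HTTP protocol; the base URL `https://rdf.example.com/repositories` and the helper names are hypothetical placeholders, and Basic auth with the S4 key pair is an assumption carried over from the text analytics examples. The functions only build requests; nothing is sent.

```python
import base64
import urllib.parse
import urllib.request

# Hypothetical placeholder; use the repository URL of your own S4 database.
BASE = "https://rdf.example.com/repositories"


def _auth_header(api_key, key_secret):
    # Assumption: the DBaaS accepts the S4 key pair via HTTP Basic auth.
    token = base64.b64encode(f"{api_key}:{key_secret}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def sparql_query_request(repo, query, api_key, key_secret):
    # GET /repositories/<REPOSITORY>?query=... evaluates a SPARQL query
    url = f"{BASE}/{repo}?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json",
               **_auth_header(api_key, key_secret)}
    return urllib.request.Request(url, headers=headers, method="GET")


def size_request(repo, api_key, key_secret):
    # GET /repositories/<REPOSITORY>/size returns the triple count as plain text
    return urllib.request.Request(f"{BASE}/{repo}/size",
                                  headers=_auth_header(api_key, key_secret),
                                  method="GET")


def add_statements_request(repo, turtle, api_key, key_secret):
    # POST /repositories/<REPOSITORY>/statements adds the RDF data in the body
    headers = {"Content-Type": "text/turtle", **_auth_header(api_key, key_secret)}
    return urllib.request.Request(f"{BASE}/{repo}/statements",
                                  data=turtle.encode("utf-8"),
                                  headers=headers, method="POST")


q = sparql_query_request("demo", "SELECT * WHERE { ?s ?p ?o } LIMIT 5", "key", "secret")
print(q.get_method(), q.full_url)
```

Against a real endpoint, `urllib.request.urlopen(size_request("demo", key, secret)).read().decode()` would return the triple count as text.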


Presentation Outline

• The Self-Service Semantic Suite (S4)

• Recent Updates

• Combining Text + Images Analytics

• Imagga Image Tagging

• Roadmap

• Q & A


Monthly Upgrades of the RDF DBaaS


Recent RDF DBaaS Updates

• Improved stability, bug fixes

• Fine-grained access control

− Repositories can be opened for read-only (R/O) public data access

− Useful for Open Data publishing

• Database exports in various formats

• Automated backup & restore

• Context indices

• Sample code in various programming languages

• Improved documentation

Python SDK


import json

import requests

key = '<api-key>'
secret = '<api-secret>'
endpoint = "https://text.s4.ontotext.com/v1/news"

# Prepare the data
data = {
    "documentUrl": "<document url goes here>",
    "documentType": "text/html",
}
jsonData = json.dumps(data)

# Prepare the POST headers
headers = {
    'Accept': "application/json",
    'Content-type': "application/json",
    'Accept-Encoding': "gzip",
}

# Prepare & execute the request (auth=(key, secret) sends HTTP Basic auth)
req = requests.post(endpoint, headers=headers, data=jsonData, auth=(key, secret))
req.raise_for_status()  # stop on HTTP errors such as a wrong key pair
response = json.loads(req.content.decode('utf-8'))

print(response)

More SDKs via Swagger


Presentation Outline

• The Self-Service Semantic Suite (S4)

• Recent Updates

• Combining Text + Images Analytics

• Imagga Image Tagging

• Roadmap

• Q & A


Text + Image Analytics

• Enrich the entities, categories & keywords extracted from text content with image tags & categories

• Image analytics via Imagga Image Tagging API

• Easy to use with S4


Request:

{
  "documentType": "text/html",
  "documentUrl": "<Paste your url here>",
  "imageTagging": true,
  "imageCategorization": true
}

Response:

{
  "text": "The text of the document",
  "entities": {
    "AnnotationType1": […],
    "AnnotationType2": […]
  },
  "images": [
    {
      "image": "imageURL",
      "tags": [
        { "confidence": …, "tag": "SampleTag" },
        { "confidence": …, "tag": "SampleTag2" }
      ],
      "categories": [
        { "confidence": …, "name": "SampleCategory1" },
        { "confidence": …, "name": "SampleCategory2" }
      ]
    }
  ]
}
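Once a response in the shape above comes back, the image labels can be collected next to the text entities. A small sketch that keeps tags and categories at or above a confidence threshold; the numeric confidences in `sample`, the 50.0 default threshold and the 0 to 100 confidence scale are invented assumptions for illustration, and `confident_labels` is a hypothetical helper name.

```python
def confident_labels(response, min_confidence=50.0):
    """Collect image tags and categories at or above min_confidence."""
    tags, categories = set(), set()
    # .get() tolerates responses without "images" (image flags switched off).
    for image in response.get("images", []):
        for t in image.get("tags", []):
            if t["confidence"] >= min_confidence:
                tags.add(t["tag"])
        for c in image.get("categories", []):
            if c["confidence"] >= min_confidence:
                categories.add(c["name"])
    return sorted(tags), sorted(categories)


# Invented sample mirroring the response structure; confidences are made up.
sample = {
    "text": "The text of the document",
    "entities": {},
    "images": [{
        "image": "imageURL",
        "tags": [{"confidence": 82.4, "tag": "SampleTag"},
                 {"confidence": 12.0, "tag": "SampleTag2"}],
        "categories": [{"confidence": 64.1, "name": "SampleCategory1"},
                       {"confidence": 30.5, "name": "SampleCategory2"}],
    }],
}
print(confident_labels(sample))
# → (['SampleTag'], ['SampleCategory1'])
```

Using sets deduplicates labels that recur across several images of the same page before they are merged with the extracted entities.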

News + Image Analytics


# News annotation with image tagging and categorisation enabled
API_KEY=…
KEY_SECRET=…
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@text.s4.ontotext.com/v1/news"
URL="http://www.theguardian.com/world/2015/nov/11/germany-spied-fbi-un-bodies-french-foreign-minister"
JSON_REQUEST="{\"documentUrl\" : \"$URL\",
  \"documentType\" : \"text/html\",
  \"imageTagging\" : true,
  \"imageCategorization\" : true }"
curl -X POST -H "Content-Type: application/json" -H "Accept: application/json" \
  -d "$JSON_REQUEST" "$SERVICE_ENDPOINT"


[Screenshot: S4 + Imagga result]

Presentation Outline

• The Self-Service Semantic Suite (S4)

• Recent Updates

• Combining Text + Images Analytics

• Imagga Image Tagging

• Roadmap

• Q & A


Roadmap

• RDF graph database-as-a-service

− Regular upgrades

− GraphDB Workbench

− Fully managed DBaaS of up to 1 billion triples

• Text Analytics

− Multilingual pipelines

− Large-scale processing

− Improvements to the integrated text + image analytics

• Video analytics via the Imagga API

Key Takeaways

• S4 provides key capabilities for Smart Data management & analytics

−Text analytics

−RDF graph database-as-a-service

−Knowledge graphs

• S4 enables faster prototyping

• Integrated text + image analytics for more insight from web content

• Check out http://s4.ontotext.com


Analyzing Text & Images: Getting More Insight from Web Content with S4

Thank You!