Post on 15-Feb-2017
Analyzing Text & Images: Getting More Insight from Web Content with S4
Marin Dimitrov (CTO of Ontotext) & Georgi Kadrev (CEO of Imagga)
Nov 2015
S4 Webinar - Analyzing Text & Images
Smart Data Management
Graph Database
• Flexible RDF graph data model
• Ontology metadata layer
Semantic Search
• Semantic, exploratory search
• Information discovery
• Metadata driven content
Text Mining & Interlinking
• People, locations, organisations, topics
• Discover implicit relations
• Reuse open knowledge graphs
Presentation Outline
• The Self-Service Semantic Suite (S4)
• Recent Updates
• Combining Text + Images Analytics
• Imagga Image Tagging
• Roadmap
• Q & A
What Is S4?
• Capabilities for Smart Data management and analytics
−Text analytics for news, life sciences and social media
−RDF graph database as-a-service
−Access to large open knowledge graphs
• Available on-demand, anytime, anywhere
−Simple RESTful services
• Simple pay-per-use pricing
−No upfront commitments
S4 Benefits
• Enables quick prototyping
− Instantly available, no provisioning & operations required
−Focus on building applications, don’t worry about software + infrastructure
• Free tier!
• Easy to start, shorter learning curve
−Detailed documentation, various add-ons, SDKs and demo code
• Based on enterprise technology by Ontotext
Getting Started in Minutes
1. Register a personal account at s4.ontotext.com
2. Generate an API key pair
3. Check out the docs, demos & sample code at docs.s4.ontotext.com
4. Contact us with questions!
Text Analytics with S4
• Text analytics services
−News annotation
−News categorisation
−Biomedical
• Entity linking & disambiguation
−Mappings to DBpedia & GeoNames instances
−Mappings to biomedical data sources (LinkedLifeData)
• HTML, MS Word, XML, plain text input
• Simple JSON output
+ Image analytics via Imagga
Try It!
# Request to the news-classifier endpoint
API_KEY=…
KEY_SECRET=…
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@text.s4.ontotext.com/v1/news-classifier"
URL="http://www.theguardian.com/world/2015/nov/11/germany-spied-fbi-un-bodies-french-foreign-minister"
JSON_REQUEST="{\"documentUrl\" : \"$URL\",
\"documentType\" : \"text/html\" }"
curl -X POST -H "Content-Type: application/json" -H "Accept: application/json" -d "$JSON_REQUEST" \
  "$SERVICE_ENDPOINT"

# Request to the news endpoint
API_KEY=…
KEY_SECRET=…
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@text.s4.ontotext.com/v1/news"
URL="http://www.theguardian.com/world/2015/nov/11/germany-spied-fbi-un-bodies-french-foreign-minister"
JSON_REQUEST="{\"documentUrl\" : \"$URL\",
\"documentType\" : \"text/html\" }"
curl -X POST -H "Content-Type: application/json" -H "Accept: application/json" -d "$JSON_REQUEST" \
  "$SERVICE_ENDPOINT"
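The `$API_KEY:$KEY_SECRET@` prefix in the endpoint URLs is HTTP Basic authentication: curl converts it into an `Authorization` header. A minimal Python sketch of that encoding follows. `basic_auth_header` is a hypothetical helper name used for illustration and is not part of any S4 SDK.

```python
import base64


def basic_auth_header(api_key: str, key_secret: str) -> dict:
    """Build the HTTP Basic Authorization header for a key/secret pair."""
    # Basic auth is "key:secret", base64-encoded (RFC 7617).
    raw = f"{api_key}:{key_secret}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # Placeholder credentials, for illustration only.
    print(basic_auth_header("my-key", "my-secret"))
```

HTTP clients that accept a key/secret pair directly normally perform this step for you, so the helper is mainly useful for debugging requests by hand.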
Biomedical Analytics
RDF Graph Databases – Advantages
• Simple, graph based data model
• Agile schema / schema-less / schema-late
• Ontology-based schema
• Global identifiers of resources (entities)
• Inference of implicit facts, based on rules
• Exploratory queries against unknown schema
• Compliance to standards (RDF, SPARQL), no vendor lock-in
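To make "inference of implicit facts, based on rules" concrete, here is a toy forward-chaining sketch in plain Python. It is not how GraphDB or S4 implements reasoning. The two rules (subclass transitivity and type inheritance) are standard RDFS-style entailments, and the `ex:` resources are made-up illustration data.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer(triples):
    """Apply two RDFS-style rules repeatedly until no new triples appear."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if p2 == SUBCLASS and o == s2:
                    if p == SUBCLASS:
                        # (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                        new.add((s, SUBCLASS, o2))
                    elif p == TYPE:
                        # (x type B), (B subClassOf C) => (x type C)
                        new.add((s, TYPE, o2))
        if new <= facts:
            return facts
        facts |= new


# Made-up explicit facts for illustration.
EXPLICIT = [
    ("ex:Sofia", TYPE, "ex:City"),
    ("ex:City", SUBCLASS, "ex:Location"),
    ("ex:Location", SUBCLASS, "ex:Place"),
]

if __name__ == "__main__":
    for triple in sorted(infer(EXPLICIT) - set(EXPLICIT)):
        print("inferred:", triple)
```

Only three facts are stated explicitly. The remaining ones, such as `ex:Sofia` being an `ex:Place`, are derived by the rules.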
RDF Graph Database-as-a-Service Benefits
• Evaluate the technology
• Instant deployment
• Faster experimentation & application development
• Data services / Open Data publishing
• Reducing TCO & risk
Self-managed RDF DBaaS
• Available from AWS Marketplace, “1-Click” purchasing
• Variety of hardware configurations
• Manage large RDF data volumes
• Pay-per-hour pricing, 5-day trial
• Users take care of operations
−Backups, restores
Fully Managed RDF DBaaS
• Low-cost graph DBaaS available 24/7 on S4
• Ideal for small to moderate data & query volumes
−database options: 1M, 10M, 50M, 250M & 1B triples
• Instantly deploy new databases when needed
• Zero administration
−automated operations, maintenance & upgrades
• Users pay only for the actual database utilisation
• Standard OpenRDF REST API, 3rd party tools
OpenRDF REST API
Resource                                        Operations               Comments
/repositories                                   GET                      Get info on DB repos
/repositories/<REPOSITORY>                      GET, POST, PUT, DELETE   Create*, delete, query a repository
/repositories/<REPOSITORY>/size                 GET                      Get the number of triples in a repository
/repositories/<REPOSITORY>/statements           GET, POST, PUT, DELETE   Add, read, update, delete statements
/repositories/<REPOSITORY>/rdf-graphs/<GRAPH>   GET, POST, PUT, DELETE   Same as above
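The resource paths in the OpenRDF REST API table can be assembled programmatically. The sketch below only builds URLs and sends nothing. `BASE_URL` is a placeholder (the real endpoint comes from your own database settings), and `repository_url` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import quote

# Placeholder base URL, an assumption for illustration only.
BASE_URL = "https://example.invalid/openrdf"


def repository_url(repository: str,
                   resource: Optional[str] = None,
                   graph: Optional[str] = None,
                   base_url: str = BASE_URL) -> str:
    """Build a resource URL following the path layout in the table."""
    url = f"{base_url}/repositories/{quote(repository, safe='')}"
    if graph is not None:
        # Named graph: /repositories/<REPOSITORY>/rdf-graphs/<GRAPH>
        return f"{url}/rdf-graphs/{quote(graph, safe='')}"
    if resource is not None:
        # Sub-resource such as "size" or "statements"
        return f"{url}/{resource}"
    return url


if __name__ == "__main__":
    print(repository_url("demo"))
    print(repository_url("demo", "size"))
    print(repository_url("demo", "statements"))
    print(repository_url("demo", graph="my graph"))
```

Percent-encoding the repository and graph names keeps the path valid when a name contains spaces or slashes.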
Recent RDF DBaaS Updates
• Improved stability, bugfixes
• Fine-grained access control
− Repositories can be opened for read-only (R/O) public data access
− Useful for Open Data publishing
• Database exports in various formats
• Automated backup & restore
• Context indices
• Sample code in various programming languages
• Improved documentation
Python SDK
import json
import requests

key = '<api-key>'
secret = '<api-secret>'
endpoint = "https://text.s4.ontotext.com/v1/news"
# Prepare the data
data = {
    "documentUrl": "<document url goes here>",
    "documentType": "text/html",
}
jsonData = json.dumps(data)
# Prepare the POST headers
headers = {
    'Accept': "application/json",
    'Content-type': "application/json",
    'Accept-Encoding': "gzip",
}
# Prepare & execute the request (key and secret are sent via HTTP Basic auth)
req = requests.post(endpoint, headers=headers, data=jsonData, auth=(key, secret))
response = json.loads(req.content.decode('utf-8'))
print(response)
Text + Image Analytics
• Enrich the entities, categories & keywords extracted from text content with image tags & categories
• Image analytics via Imagga Image Tagging API
• Easy to use with S4
Text + Image Analytics
Request
{
  "documentType": "text/html",
  "documentUrl": "<Paste your url here>",
  "imageTagging": true,
  "imageCategorization": true
}
Response
{ "text" : "The text of the document",
  "entities" : {
    "AnnotationType1" : […],
    "AnnotationType2" : […] },
  "images": [
    { "image" : "imageURL",
      "tags" : [
        { "confidence": …,
          "tag" : "SampleTag" },
        { "confidence": …,
          "tag" : "SampleTag2" }
      ],
      "categories": [
        { "confidence": …,
          "name" : "SampleCategory1" },
        { "confidence": …,
          "name" : "SampleCategory2" }
      ]
    }
  ]
}
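Once a response with the shape shown in this slide has been parsed into a Python dict, the image tags can be filtered by confidence. This is a sketch only: `confident_image_tags` is a hypothetical helper, the sample values are made up, and the slide does not state the confidence scale, so the threshold is left to the caller.

```python
def confident_image_tags(response: dict, min_confidence: float) -> dict:
    """Map each image URL to the tags whose confidence is >= min_confidence."""
    result = {}
    for image in response.get("images", []):
        result[image["image"]] = [
            tag["tag"]
            for tag in image.get("tags", [])
            if tag.get("confidence", 0) >= min_confidence
        ]
    return result


# Made-up sample following the response layout; values are illustrative only.
SAMPLE = {
    "text": "The text of the document",
    "entities": {},
    "images": [
        {
            "image": "http://example.invalid/a.jpg",
            "tags": [
                {"confidence": 80, "tag": "building"},
                {"confidence": 10, "tag": "sky"},
            ],
            "categories": [],
        }
    ],
}

if __name__ == "__main__":
    print(confident_image_tags(SAMPLE, min_confidence=50))
```

The same loop works for the "categories" list by reading the "name" field instead of "tag".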
News + Image Analytics
# News request with image tagging and image categorization enabled
API_KEY=…
KEY_SECRET=…
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@text.s4.ontotext.com/v1/news"
URL="http://www.theguardian.com/world/2015/nov/11/germany-spied-fbi-un-bodies-french-foreign-minister"
JSON_REQUEST="{\"documentUrl\" : \"$URL\",
\"documentType\" : \"text/html\",
\"imageTagging\" : true,
\"imageCategorization\" : true }"
curl -X POST -H "Content-Type: application/json" -H "Accept: application/json" -d "$JSON_REQUEST" \
  "$SERVICE_ENDPOINT"
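Escaping JSON by hand in a shell string is error-prone. The same request body can be generated with a small Python helper, sketched below. `build_news_request` is a hypothetical name; the field names are the ones shown on the slides, and omitting a flag when it is off is an assumption about how the service treats missing fields.

```python
import json


def build_news_request(document_url: str,
                       image_tagging: bool = False,
                       image_categorization: bool = False) -> str:
    """Serialize the JSON body for a news request on an HTML document."""
    payload = {"documentUrl": document_url, "documentType": "text/html"}
    # Assumption: a flag that is off can simply be left out of the body.
    if image_tagging:
        payload["imageTagging"] = True
    if image_categorization:
        payload["imageCategorization"] = True
    return json.dumps(payload)


if __name__ == "__main__":
    print(build_news_request("<document url goes here>",
                             image_tagging=True,
                             image_categorization=True))
```

The returned string can be passed as the body of a POST request with the `Content-Type: application/json` header.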
Roadmap
• RDF graph database-as-a-service
− Regular upgrades
− GraphDB Workbench
− Fully managed DBaaS of up to 1 billion triples
• Text Analytics
− Multi-lingual pipelines
− Large-scale processing
− Improvements of the integrated text + image analytics
• Video analytics via Imagga API
Key Takeaways
• S4 provides key capabilities for Smart Data management & analytics
−Text analytics
−RDF graph database-as-a-service
−Knowledge graphs
• S4 enables faster prototyping
• Integrated text+image analytics for more insight from web content
• Check out http://s4.ontotext.com
Useful Links
• Self-Service Semantic Suite (S4)
−http://s4.ontotext.com/
−http://docs.s4.ontotext.com/
−Twitter: @Ontotext_S4
• Imagga Image Tagging
−http://imagga.com/
−http://docs.imagga.com/
−Twitter: @Imagga