catching the wave dave clarke - synaptica llc · 11/6/2018 · concepts in taxonomies provide the...
TRANSCRIPT
© Synaptica LLC, 2018 www.synaptica.com
Synaptica Knowledge Organization Solutions Catching the wave
Dave Clarke, CEO, Synaptica
Catching the wave: Tools and Technology for Taxonomists
Taxonomy Bootcamp, Washington DC, November 6, 2018
Agenda
Three questions
Q1. What are taxonomists doing today?
Q2. How can new technology help?
Q3. What tools do you need to succeed?
AI, Linked Data, Ontology, Semantic Web, Knowledge Graphs, Blockchain, Big Data
Preface
What is it we taxonomists do?
1. Organize
Taxonomies are Knowledge Organization Systems.
When we do taxonomy we use industry standard data models to centralize and standardize the terminology used in our enterprise.
We define and unambiguously label enterprise terminology, and then we organize it into hierarchical and associative concept schemes, which we call taxonomies or ontologies.
These schemes help us to understand how concepts, people, places, products, processes and organizations all relate to one another.
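As a rough illustration of the hierarchical and associative concept schemes described above, here is a minimal sketch in Python. The `Concept` class, the `scheme` dictionary and all of the labels are hypothetical and are not taken from any real taxonomy or product.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One unambiguously labelled concept in a scheme (illustrative model only)."""
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)
    broader: list[str] = field(default_factory=list)   # hierarchical links (parents)
    related: list[str] = field(default_factory=list)   # associative links


# Hypothetical example data.
scheme = {
    "Vehicles": Concept("Vehicles"),
    "Cars": Concept("Cars", alt_labels=["Automobiles"], broader=["Vehicles"],
                    related=["Car dealerships"]),
    "Electric cars": Concept("Electric cars", alt_labels=["EVs"], broader=["Cars"]),
    "Car dealerships": Concept("Car dealerships", related=["Cars"]),
}


def ancestors(label: str) -> list[str]:
    """Walk the broader links upward, nearest parent first."""
    found = []
    queue = list(scheme[label].broader)
    while queue:
        parent = queue.pop(0)
        if parent not in found:
            found.append(parent)
            queue.extend(scheme[parent].broader)
    return found


print(ancestors("Electric cars"))  # ['Cars', 'Vehicles']
```

The hierarchy answers "what kind of thing is this?", while the `related` links record associations that do not fit a parent/child relationship.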
2. Categorize
Categorization enables an enterprise to retrieve, sort and rank content based on what it is about.
Concepts in taxonomies provide the metadata values for tagging documents and database records.
When we categorize content we use the semantics of our taxonomies plus contextualization rules to determine meaning and rank the relevance of content.
Taxonomy builds a bridge between the language people use to search and browse and the language found in documents.
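One way to picture taxonomy concepts acting as metadata values is a toy tagger that looks for each concept's labels in a document and ranks concepts by how often they are mentioned. This is only a sketch under simple assumptions: the `CONCEPTS` data is hypothetical, and counting label hits stands in for the richer contextualization rules a real categorization tool would apply.

```python
import re

# Hypothetical concepts: preferred label -> alternative labels seen in documents.
CONCEPTS = {
    "Electric cars": ["electric car", "electric vehicle", "electric vehicles", "EV", "EVs"],
    "Batteries": ["battery"],
}


def tag_document(text: str) -> list[tuple[str, int]]:
    """Return (concept, hit count) pairs, most frequently mentioned first."""
    scores = {}
    for concept, labels in CONCEPTS.items():
        # Longest variants first so each stretch of text is counted only once.
        variants = sorted([concept] + labels, key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(re.escape(v) for v in variants) + r")\b"
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits:
            scores[concept] = hits
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


sample = ("New EVs use a larger battery. An electric car with two batteries "
          "charges faster than older electric vehicles.")
print(tag_document(sample))  # [('Electric cars', 3), ('Batteries', 2)]
```

The document never uses the preferred label "Electric cars", yet it is still tagged with that concept, which is the bridge between search language and document language.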
3. Discover
The end goal of what we do is to help people retrieve more complete and accurate information and to discover latent knowledge.
Our work isn’t finished if we stop at categorization. We need to work with search teams and information architects to design and deliver semantic search and rich end-user discovery experiences, including browsable navigation, faceted query refinement, and the ability to recommend related content.
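Faceted query refinement can be sketched in a few lines once content has been tagged. The records, facet names and values below are hypothetical; the sketch only shows the idea of narrowing a result set by one facet and then counting the values of another.

```python
from collections import Counter

# Hypothetical tagged records: each facet holds a taxonomy concept.
RECORDS = [
    {"title": "Annual report", "topic": "Finance", "region": "Europe"},
    {"title": "Market outlook", "topic": "Finance", "region": "Asia"},
    {"title": "Hiring guide", "topic": "HR", "region": "Europe"},
]


def refine(records, **selected):
    """Keep records whose facet values match every selected value."""
    return [r for r in records if all(r.get(f) == v for f, v in selected.items())]


def facet_counts(records, facet):
    """Count how many records carry each value of a facet."""
    return Counter(r[facet] for r in records)


finance = refine(RECORDS, topic="Finance")
print(facet_counts(finance, "region"))
```

The counts are what a search interface would display next to each refinement option.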
Taxonomy Practitioners Survey
Q1. What are taxonomists doing today?
Online survey (takes 10 minutes): link from pinned tweet at https://twitter.com/DavidClarkeBlog
Participate in this survey during the conference and we will email you the revised results
Taxonomy Practitioners Survey
Q1. How does your enterprise currently make use of taxonomies?
Faceted query refinement: 66%
Improve search accuracy: 66%
Browsable navigation: 60%
Lookup lists & glossaries: 44%
Content classification: 42%
Recommend related content: 36%
Product Information Management: 28%
Machine-reasoning: 18%
Sentiment analysis: 6%
Chatbots and conversational search: 6%
What we would expect… core taxonomy applications
Under-exploited… room for greater adoption
Taxonomy Practitioners Survey
Q2. Do your taxonomies contain general categories or specific concepts and names?
Highly specific categories and subjects: 70%
Broad categories and topics: 58%
Individual named entities: 48%
Even mix of people doing broad-level categorization and highly specific subject indexing
Q3. How general or specific is your content tagging?
Document tagging: 78%
In-line tagging: 22%
Document retrieval more prevalent than page-level access
Q4. Where do you store your tagging data?
Metadata in content systems: 70%
Metadata in search engines: 48%
Full text in-line document mark-up: 6%
Most taxonomy ends up as metadata in CMS / search
Taxonomy Practitioners Survey
Q5. How are your lexicons / taxonomies / thesauri / ontologies structured?
Hierarchy: 86%
Preferred and alternative labels: 50%
Associative relationships: 44%
Disambiguators: 38%
Specialized ontology relationships: 22%
What we would expect… core taxonomy
Under-exploited… room for greater adoption
Q6. How public or private are your taxonomies?
Within a company intranet: 52%
Public-facing: 42%
Shared with trusted partners: 20%
Available for adoption and reuse: 14%
Linked Open Data: 12%
Personal prediction… semantic web and automation will drive greater sharing of taxonomies even between commercial enterprises
Taxonomy Practitioners Survey
Q7. How do you create your taxonomies and/or use third-party taxonomies?
Create our own in-house: 96%
Reference third-party taxonomies: 26%
Map to third-party taxonomies: 16%
Public-domain taxonomies: 14%
Linked Open Data taxonomies: 10%
Tag using third-party taxonomies: 10%
Commercial licenses: 4%
Outsource to consultants: 4%
Most taxonomy still done within and specifically for the enterprise
Personal prediction… much greater adoption of LOD taxonomies
Q8. Which industry standards do you use for your taxonomies and/or for tagging?
ANSI-NISO Z39.19 / ISO 25964: 44%
Dublin Core: 32%
W3C Linked Data: 20%
W3C SKOS: 18%
Schema.org: 14%
W3C OWL: 10%
W3C SKOS-XL: 2%
Minuscule adoption but this spec. means a lot to academic publishers and life-sciences
Helps your metadata drive SEO
Enabler for machine reasoning but low adoption because of its complexity
Taxonomy Practitioners Survey
Q9. What processes do you use to tag content?
In-house professional indexers: 44%
Self-tag content using taxonomies: 42%
Self-tag using free-text keywords: 34%
Tagging images, audio, and video: 30%
Editable auto-categorization rules: 20%
Human-supervised auto-categorization: 18%
Fully automated categorization: 10%
Training-sets and machine learning: 10%
Outsourced content tagging: 6%
Human tagging dominant
Rules-based, human-supervised auto-categorization more prevalent than fully automated or machine learning processes
Taxonomy Practitioners Survey
Q10. Which emerging technologies do you think are already or will soon impact your enterprise?
AI / ML / Deep Learning: 72%
Linked Data: 58%
Big data and data science: 54%
Semantic Web: 52%
Graph databases: 36%
Chatbots and conversational search: 20%
Internet of Things (IoT): 18%
Blockchain: 18%
No surprises given the hype around AI… but is the hype justified?
Linked Data identified as second highest impact on the enterprise… are you prepared?
Taxonomy Practitioners Survey
Survey Insights
• 70% of enterprises are building traditional taxonomies and thesauri
• 18% are doing machine reasoning
• Even mix doing broad-bucket categorization versus highly-granular subject indexing
• Most (80%) tagging at the document level, fewer (20%) indexing to page/paragraph
• Over 40% of enterprise taxonomies are public facing
• What tech do people think will impact them most: AI followed by Linked Data
• But, currently there is low adoption (10%) of ontologies and graphs that enable AI
• Low adoption of schema.org, despite it being key to SEO for public-facing content
• Big surprise: most tagging still being done by humans
• Most common aspiration is for better quality auto-categorization
New Technology
Q2. How can new technology help?
AI
Ontologies
Linked Data
https://medium.com/@jimmysong/why-blockchain-is-hard-60416ea4c5c
Decentralized web
Metadata management
Records management
Provenance & Governance
IP rights management
Supply chain management
AI & Human Intelligence
How might the development of new technologies such as AI affect how we do our work as information professionals?
Primarily it should make some of the leg work easier. Being able to process a large amount of data in a shorter time than a human could, is going to be very helpful when it comes to the day-to-day work that we do. Machine learning and AI should also help us spot patterns in information that we may not otherwise notice - and it can help us simulate what the consequences of particular decisions might be.
However, machine learning and AI won’t be a silver bullet: we’ve already seen examples of algorithms being applied with unfortunate consequences, or machine learning classifying images in a way that is problematic when we consider culture, race and politics. So, we’ll need to be aware of the limits of what is possible, manage expectations, and take on new responsibilities for helping machines understand our world, our biases, and our morality.
http://www.taxonomybootcamp.com/London/2018/LatestNews.aspx?ID=1632
What's the most exciting change you've seen in the industry in the last few years?
The deployment of knowledge graphs and non-visual interfaces have really brought the value of structured data to the fore. Voice and conversational interfaces are obviously very in vogue at the moment, and I think an under-appreciated aspect of these has to be the importance of structured data in powering these. I think more and more people are seeing the benefits of structured data in things like Google’s Knowledge Graph, Amazon’s ability to tell you which actors are on screen during a show you’re watching, and of course the perennial joy of doing a deep dive on Wikipedia.
Whilst ‘big data’ and ‘machine learning’ might be hogging the limelight at the moment, I think that the work of taxonomists and those who architect and develop structured data is quietly, gradually, revolutionising the kinds of things we do with computers and the Internet.
AI: an Anecdotal Review
AI & Human Intelligence
From: Jodie Ruby
Subject: Smarter AI requires a human touch
Date: 27 September 2018 at 17:03
Hi Dave,
Smarter AI requires even smarter data, but how does that data stay sharp in a rapidly changing industry? Sometimes it takes a human touch to bridge the gap, and curated crowds are the perfect source for that bit of human tuning.
“Behind all this AI, humans still need to touch the data. We still need human power to teach computers the subtleties of identifying data, language, dialects, and so on.”
Kerri Reynolds, SVP Human Resources & Crowdsourcing, Appen
“A fully automated generation of ontologies from text corpora is not possible and won’t be possible in the next couple of decades. This is the same kind of A.I. promise that has failed many times before… Generation and maintenance of taxonomies and ontologies will always remain to the realm of human beings and their knowledge of the world.”
Andreas Blumauer – Pool Party
Synaptica: What do you think are the biggest challenges in the future? Hedden: “Growing interest in AI and machine learning and its impact on indexing and tagging. I feel there are limits in automatic technology and taxonomy creation.”
Heather Hedden – Author The Accidental Taxonomist
“…we’ve increasingly come across organizations that have been promised Artificial Intelligence (AI) capabilities, but have not realized them… The message we consistently hear, however, is that these [AI] tools haven’t lived up to the promise. Though the demos are impressive, the reality is deflating.”
Zach Wahl – Enterprise Knowledge
AI Interactive Primer
https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/an-executives-guide-to-ai
There are many different types of AI, and different types are more or less suited to different business applications; some are simply not relevant to taxonomy and semantics…
…learn about different use-cases with this interactive online guide from McKinsey
AI take away
AI is very relevant to what we do, but it will not replace the need for human curated taxonomies or ontologies.
On the contrary, it is taxonomies and ontologies that will empower AI with the semantics and logic to improve search, categorization and machine reasoning.
Make sure management in your organization understands this.
Taxonomy & Ontology Analogy
An analogy…
Taxonomies have a singular top-down way of organizing information… great for classification or guided browse experiences.
…ontologies are graphs, they support many points of entry and many alternative pathways.
Now try modelling the Washington Metro as a taxonomy… it won’t work because it is a graph.
Do you need ontologies? …depends on the type of problem you are trying to solve.
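The tree-versus-graph distinction can be made concrete with a small sketch. In the taxonomy each concept has exactly one parent, so there is a single path to the top; in the graph there are several routes between the same two points. The station names are placeholders, not the real Metro map, and both data structures are hypothetical.

```python
from collections import deque

# Taxonomy: every concept has exactly one parent, so there is one path to the top.
PARENT = {"Electric cars": "Cars", "Cars": "Vehicles", "Vehicles": None}

# Graph: a hypothetical three-station interchange (not the real Metro map).
LINKS = {
    "Station A": ["Station B", "Station C"],
    "Station B": ["Station A", "Station C"],
    "Station C": ["Station A", "Station B"],
}


def path_to_top(concept):
    """Follow the single parent link until the root is reached."""
    path = [concept]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def all_routes(start, goal):
    """Enumerate every route that never revisits a station (breadth-first)."""
    routes, queue = [], deque([[start]])
    while queue:
        route = queue.popleft()
        if route[-1] == goal:
            routes.append(route)
            continue
        for nxt in LINKS[route[-1]]:
            if nxt not in route:
                queue.append(route + [nxt])
    return routes


print(path_to_top("Electric cars"))          # one path
print(all_routes("Station A", "Station C"))  # two alternative routes
```

Forcing the station network into the `PARENT` shape would mean choosing one parent per station and discarding the other connections, which is why a graph model suits that kind of problem better.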
Taxonomy & Ontology
[Diagram labels: Ontology, Taxonomy]
The boundary between Taxonomy and Ontology can be a confusing grey area…
…distinguishing Property from Value Vocabularies may help
…we take a holistic approach and use ontology to design and build taxonomy
Linked Data
LOD (Linked Open Data): a huge and growing set of re-useable public domain ontologies and taxonomies
LED (Linked Enterprise Data): enterprises can benefit from the data model even when their data is not open (shared openly behind the firewall)
Ontology & Linked Data take aways
Ontologies and Linked Data are highly relevant and offer many benefits including:
+ Build smarter search and discovery applications by leveraging the logical dependencies defined by ontologies.
+ Reduce costs and speed up project deliverables by reusing a vast and rapidly growing library of public domain property vocabularies (ontologies) and value vocabularies (taxonomies).
+ Simplify systems integrations by adopting open industry standard data models and portable data interchange formats.
Tools
Q3. What tools do you need to succeed?
Taxonomy, Ontology and Linked Data Management Systems
Text Analytics, Auto-Categorization & Human Tagging Systems
Semantic Search, Recommenders, Visualization & Chatbots
Designing and building standards compliant taxonomies and ontologies and developing good categorization rules isn’t easy…
…good software tools will simplify the complexity.
Taxonomy Tools
What to look for in Taxonomy Management Systems
Drag-and-drop editability
Collaboration & governance
Automated management reports
… and the ability to switch seamlessly between taxonomy editing and categorization rule editing
See also Synaptica’s Top 100 Features Checklist at https://www.synaptica.com/resources/
Ontology & Linked Data Tools
What to look for in Ontology & Linked Data Management Systems
An extensible library of public domain ontologies
Design your own taxonomy schemes using plug-and-play ontologies
Easy-to-use UI/UX to simplify the complexities of generating standards-compliant RDF triples and knowledge graphs.
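To give a feel for what such a tool produces behind the UI, here is a minimal sketch that writes a concept as RDF triples in N-Triples syntax using the W3C SKOS vocabulary. The `http://example.org/taxonomy/` namespace, the slugs and the helper functions are assumptions made for illustration; a production system would normally use an RDF library rather than string formatting.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/taxonomy/"  # hypothetical namespace


def uri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str, lang: str = "en") -> str:
    """Format a language-tagged string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_triples(slug, pref_label, broader_slug=None):
    """Return N-Triples lines describing one SKOS concept."""
    subject = uri(BASE + slug)
    lines = [
        f"{subject} {uri(RDF_TYPE)} {uri(SKOS + 'Concept')} .",
        f"{subject} {uri(SKOS + 'prefLabel')} {literal(pref_label)} .",
    ]
    if broader_slug:
        lines.append(f"{subject} {uri(SKOS + 'broader')} {uri(BASE + broader_slug)} .")
    return lines


for line in concept_triples("electric-cars", "Electric cars", "cars"):
    print(line)
```

Each line is one subject, predicate, object statement; a knowledge graph is the set of all such statements.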
Categorization Tools
What to look for from Categorization & Text Analytics Tools
Categorization rules that are transparent and easy to edit without having to learn esoteric syntax…
… a no-black-box principle allowing users to understand how rules work and quickly refine them.
Seamless integration between taxonomy management and categorization management workflows…
… a no-silo approach to reduce complexity and improve productivity.
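The no-black-box idea can be sketched as a rule that is plain data and that reports why it did or did not fire. The `Rule` class, its field names and the example rule are hypothetical and do not reflect the rule syntax of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    concept: str
    any_of: list[str]                                  # at least one term must appear
    none_of: list[str] = field(default_factory=list)   # context that vetoes the tag

    def explain(self, text: str) -> tuple[bool, str]:
        """Return (tag applies?, human-readable reason)."""
        lowered = text.lower()
        blocked = [t for t in self.none_of if t in lowered]
        if blocked:
            return False, f"blocked by {blocked}"
        matched = [t for t in self.any_of if t in lowered]
        if matched:
            return True, f"matched {matched}"
        return False, "no trigger terms found"


# Hypothetical rule separating the animal from the car brand.
jaguar = Rule("Jaguar (animal)", any_of=["jaguar"], none_of=["dealership", "engine"])

print(jaguar.explain("A jaguar was seen in the rainforest."))
print(jaguar.explain("The Jaguar dealership opened."))
```

Because every decision comes with its reason, an editor can see which term caused a wrong tag and adjust the rule directly.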
Tools
What to look for from Search and Discovery Tools
NLP to handle natural language queries.
Semantic search to improve precision and recommend related content.
Taxonomy-driven IA including faceted navigation and query refinement.
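One simple form of taxonomy-aware search is query expansion: the user's term is widened with its alternative labels and narrower concepts before matching. The sketch below assumes hypothetical `ALT_LABELS` and `NARROWER` lookups and uses naive substring matching, which a real search engine would replace with proper tokenization.

```python
# Hypothetical taxonomy fragments (all lower-case for simple matching).
ALT_LABELS = {"cars": ["automobiles"], "electric cars": ["evs"]}
NARROWER = {"cars": ["electric cars"]}


def expand(query: str) -> list[str]:
    """Return the query plus its synonyms and all narrower concepts."""
    terms, queue = [], [query.lower()]
    while queue:
        term = queue.pop(0)
        if term in terms:
            continue
        terms.append(term)
        queue.extend(ALT_LABELS.get(term, []))
        queue.extend(NARROWER.get(term, []))
    return terms


def search(query, documents):
    """Return documents containing any expanded term (naive substring match)."""
    terms = expand(query)
    return [d for d in documents if any(t in d.lower() for t in terms)]


docs = ["New EVs announced", "Bicycle lanes expand", "Used automobiles for sale"]
print(expand("Cars"))
print(search("cars", docs))
```

Neither matching document contains the word "cars"; they are found only because the taxonomy supplies the synonym and the narrower concept.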
Tools take away
Software tools exist to help taxonomists and indexers be more effective and productive... they must simplify the complex.
They must use open industry standards for data portability and interoperability.
They should help taxonomists to become experts without having to learn esoteric code.
Search needs to take advantage of the semantics of taxonomy and ontologies to deliver smarter applications and a richer knowledge discovery experience.
Thank You!
And remember to take the survey: use the link from the pinned tweet at https://twitter.com/DavidClarkeBlog