Veda Semantics - introduction document


Upload: rajatkr

Post on 27-Jan-2015


Category:

Technology



DESCRIPTION

As more and more organizations move from recognizing that unstructured data exists, and remains untapped, the field of semantic technology and text analysis capabilities is

TRANSCRIPT

Building intelligence through semantics

Text Analytics
Ontology Building
Context Analysis
Sentiment Analysis
Machine Learning

About Veda

Who we are
• A semantic technology service provider leveraging its capabilities to provide standardized and bespoke solutions

Formation and background
• Started as a JV with the Fraunhofer Institute, Germany
• Earlier part of 3i Infotech, a large listed IT firm. Acquired by the current promoters as part of a management buyout

Location
• Headquartered in Bangalore, India’s software capital, with ready access to critical talent

Team
• Currently a 20-member team, with a sales presence in Chicago, USA. Key members of the technology team each have over a decade of experience in semantic technology

Awards and references
• One of 5 companies worldwide named as Semantic Application Specialists by Gartner (Who’s Who of Text Analytics, September 2012)

Enterprise Information Distribution

Unstructured Data (~70%):
• Consists of textual information like contracts, emails, presentations
• About 70% of organizations’ information remains in unstructured form and is therefore not utilized at all

Structured Data (~30%):
• Consists of information from ERP, CRM systems, XML data
• It is organized and manageable
• Currently only about 30% of organizations’ information is analysed for decision making

Are we using only structured data for decision making? What are the critical misses that are made as a result?

What is hidden in unstructured data

Examples of unstructured data
• Customer complaints
• Employee feedback
• Brand perception
• Financial data from reports
• Competitive news
• And many more

What it contains
• Information
• Facts
• Events, etc.
• Insights
• Opportunities
• Risks

Just the things needed for good decision making!

Semantics – making sense of unstructured data

• Semantics is the study of meaning. It focuses on the relation between signifiers, like words, phrases, signs, and symbols, and what they stand for, their denotation. [Wikipedia]
• SEMANTICS = MEANING
• It is about describing things
• In linguistics, semantics is the subfield that is devoted to the study of meaning as inherent at the levels of words, phrases, sentences, and larger units of discourse.

Industry Overview - Need for Semantic Technology

Information overload
• Heterogeneous
• Distributed
• Unorganized

High data volumes
• Increasing numbers
• Increasing sources
• Unmanageable

Inefficient retrieval
• Keyword search is inefficient
• Lack of classification and relevance
• Focus on “Search” rather than “Find”

The definition of ‘Data’, which had been artificially restricted to only numerical data, can now extend to text and other unstructured data as well, providing more insights and richness for decision making.


Top 9 Technology Trends Likely to Impact Information Management in 2013

Technology Trend

Big Data

Modern information infrastructure

Semantic technologies

The logical data warehouse

NoSQL DBMSs

In-memory computing

Chief data officer and other information-centric roles

Information stewardship applications

Information valuation / infonomics

Source: Gartner

Broadly, text based offerings can be clubbed under two main heads

Statistical text mining
• Looks for documents based on statistical techniques
• Helps identify high frequency terms or expressions
• Identifies other terms being used in conjunction with them
• Assigns match probability to documents based on mathematical techniques to facilitate searches and knowledge management
• Accuracy could be improved further by using machine learning principles
• Primary applications: Text mining and document matching (e.g. VoC analysis, email analysis, eDiscovery, etc.)

Natural language processing
• Parses a sentence to identify the nature of the words in it
• More relevant for sentence level analysis as opposed to document level analysis
• Principles of English, as opposed to statistical techniques, take precedence in analysis
• Accuracy dependent on strengths of algorithms written
• Primary applications: Named Entity Extraction (knowledge management), sentiment analysis (VoC analysis, email monitoring, etc.)
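
As an illustration of the statistical side, the sketch below counts high frequency terms and the terms used in conjunction with them. It is a minimal sketch using only the Python standard library; the stop-word list, the function names and the sample feedback are assumptions made for illustration, not Veda's implementation.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed stop-word list; a real deployment would tune this per client.
STOPWORDS = {"the", "a", "is", "was", "and", "to", "of", "in", "it", "but"}

def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop-words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def term_frequencies(docs):
    """Count how often each term occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts

def cooccurrences(docs):
    """Count pairs of distinct terms that appear in the same document."""
    pairs = Counter()
    for doc in docs:
        terms = sorted(set(tokenize(doc)))
        pairs.update(combinations(terms, 2))
    return pairs

if __name__ == "__main__":
    feedback = ["The service was slow", "Slow service and rude staff", "Great food"]
    print(term_frequencies(feedback).most_common(3))
    print(cooccurrences(feedback).most_common(3))
```

Pairs are stored in alphabetical order, so ("service", "slow") and ("slow", "service") are counted as the same pair.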

Industry Overview – usual application areas

Marketing
• Areas: Social media analytics, better advertising placement, CRM information capture and action
• Technique used: Sentiment analysis using NLP, coupled with vertical specific taxonomies

Compliance
• Areas: eDiscovery, auto classification, forensic analysis
• Technique used: Statistical text mining, Named Entity Recognition (NER), machine learning

Risk analysis, fraud detection
• Areas: Pattern analysis, predictive modelling
• Technique used: Statistical text mining and Named Entity Recognition, coupled with structured data (e.g. frequency of mails, department information, etc.)

Knowledge management
• Areas: Auto tagging and classification, discovery (e.g. healthcare information sharing)
• Technique used: NER (for named entities), statistical text mining, custom ontologies / semantic networks

Vertical specific use cases
• Areas: Examples include financial services, publishing, pharma, healthcare, legal, insurance, etc.
• Technique used: Various degrees of text mining, NLP and sentiment analysis, and entity extraction techniques

But purely from an R&D perspective, quality thresholds have a very high standard deviation

NLP
• Attaching sentiment to attribute, and attribute to object
• Handling basic keywords (e.g. "I like something" vs. "something is like another")
• Vertical taxonomies that allow aggregation
• Vertical specific sentiment words (e.g. executing a man vs. executing a transaction, high fuel economy vs. high fuel consumption)

eDiscovery
• High variability in recall and precision rates
• Tagging of concepts remains difficult
• Summarization techniques based on basic lexical parsing

Ontology
• Limited use cases
• Often seen as multi-year projects as opposed to quick win areas

The reason for the quality difference is that, in many cases, the client context is not fully understood and the software is not trained on that context

• What is the primary purpose for which the tool will be used: finding trends, better search, forensics, fraud prevention, building predictive models, etc.?
• Are certain terms so common that they must be ignored while doing an analysis?
• Are there domain specific words that take on a different meaning than in other domains (e.g. ‘execution’ has a different meaning in financial services than in the news domain)?
• Should the weightages assigned to certain kinds of documents / words be increased to improve relevance?
• How will the results be presented: are they to be shown visually and not be connected to other enterprise systems, or should they be an integrated part of the overall BI roadmap of the organization?

Unlike traditional systems, text analytics has a large dependency on context. Consequently, in order to unleash its full potential, the usual bifurcation between consultancy, software development and software implementation must disappear in the case of text analytics. An off-the-shelf product approach will definitely not help, and one must adopt a services model to better serve client needs!
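
Two of the context questions above (terms to ignore, and terms whose weightage should be raised) can be pictured as per-client configuration. The sketch below is a minimal illustration under that assumption; `ClientContext`, `relevance` and the sample weights are hypothetical names and values, not part of any Veda product.

```python
import re
from dataclasses import dataclass, field

@dataclass
class ClientContext:
    """Hypothetical per-client settings, for illustration only."""
    stopwords: set = field(default_factory=set)       # terms too common to be useful for this client
    term_weights: dict = field(default_factory=dict)  # weightage boosts for terms that matter more

def relevance(text, context):
    """Score a text as the sum of its term weights, skipping the client's stop-words."""
    score = 0.0
    for term in re.findall(r"[a-z]+", text.lower()):
        if term in context.stopwords:
            continue
        score += context.term_weights.get(term, 1.0)
    return score

if __name__ == "__main__":
    # For a bank, the word "bank" is assumed to carry little signal, while "fraud" is boosted.
    bank = ClientContext(stopwords={"the", "bank"}, term_weights={"fraud": 5.0})
    print(relevance("The bank reported fraud", bank))
    print(relevance("The bank reported fraud", ClientContext()))
```

The same sentence scores differently under the two contexts, which is the point of the questions above: the engine is unchanged, only the client context differs.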

In addition, there is limited focus on client needs and use cases

Technology focused
• Companies mostly founded and run by technology experts

Customer language
• Focus on technology capability and terms as opposed to problems to be solved

Product approach
• Leaves out value to be derived by examining enterprise specific data more closely, or integrating it with structured data for greater insights

An example of our Natural Language Processing capabilities

Example inputs:
“The car model looks like the old one”
“I loved the food, but the service was terrible”
“Did anyone like the car?”
“I really luuuuv it”
“The Tokyo office does not like the current prototype of the product. Bob said we should talk to them to find out why they are unhappy. Must close this ASAP to get the launch done by August 2013.”

Capabilities:
• Can tag sentiments to attributes, and attributes to products
• Can handle difficult words, e.g. ‘like’, based on context – most engines cannot
• Can handle anaphora resolution (e.g. pronouns)
• Can handle Named Entity Recognition with high recall and precision

IP protection:
• Patent being filed for clause based sentiment extraction process
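
To make the idea of attaching sentiment to attributes concrete, the sketch below splits a sentence into clauses and scores each clause against a tiny lexicon. It is a toy illustration only: the word lists, the clause splitting rule and the function name are assumptions, and it is not the clause based process for which the patent is being filed. It also leaves out context-dependent words such as ‘like’.

```python
import re

# Assumed toy lexicons; a real engine would use far richer, domain specific resources.
POSITIVE = {"loved", "love", "great", "good", "excellent"}
NEGATIVE = {"terrible", "bad", "awful", "unhappy", "poor"}
ATTRIBUTES = {"food", "service", "price", "comfort"}

def clause_sentiments(sentence):
    """Return (attribute, sentiment) pairs, scoring each clause separately."""
    results = []
    # Split on commas, semicolons and the conjunctions "but" / "and".
    for clause in re.split(r",|;|\bbut\b|\band\b", sentence.lower()):
        words = re.findall(r"[a-z]+", clause)
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        for word in words:
            if word in ATTRIBUTES:
                results.append((word, label))
    return results

if __name__ == "__main__":
    print(clause_sentiments("I loved the food, but the service was terrible"))
```

Scoring per clause rather than per sentence is what lets the positive word attach to the food and the negative word to the service instead of cancelling out.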


Our Discovery product demonstrates the NLP capability in a powerful manner, making consumer feedback actionable

• In this example about a vehicle, most people care about comfort, and luckily, the product gets mostly positive reviews in this area

• Clickthrough allows deeper dives into each category

• Though price gets mainly negative reviews, not too many people seem to talk about it. Perhaps a discount scheme could help?

• Actual sentences are displayed, and things to which the sentiments are attached are highlighted

• Sentiments are associated with specific aspects of the product

Example of Natural Language Processing in Financial Domain (continuing R&D)

• Extracts economic factors that have been impacted
• Recommendations and predictions help analyze complex financial information in the quickest time
• Helps in predictive analytics

Example of Natural Language Processing in Financial Domain – highlighting outlook by driver (continuing R&D)

• Linguistic rules to extract financial / economic indicators
• Domain specific verbs and nouns to understand movement

Financial markets rebounded strongly in 2006's third quarter .

FINANCE ENT : Financial markets
ACTION : rebounded
TIME : 2006's third quarter
MOVEMENT : UP

By the end of the third quarter , crude oil had fallen over 20 % from its[crude_oil] July peak , while a similar retreat in natural gas prices produced the latest high-profile hedge fund debacle .

FINANCE ENT : crude oil
ACTION : had fallen
TIME : the end of the third quarter
QUANTITY : 20 %
MOVEMENT : DOWN

FINANCE ENT : natural gas prices
ACTION : produced the latest high-profile hedge fund debacle
MOVEMENT : DOWN

Prices of longer-dated bonds rallied too : the 10-year U. S. Treasury bond yield fell over 60 basis points during the third quarter .

FINANCE ENT : Prices of longer-dated bonds
ACTION : rallied
MOVEMENT : UP

FINANCE ENT : the 10-year U. S. Treasury bond yield
ACTION : fell over 60 basis points
TIME : the third quarter
QUANTITY : 60 basis points
MOVEMENT : DOWN
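
The role of domain specific verbs in deciding movement can be sketched with a small lookup table. This is a simplified illustration under stated assumptions: the verb list, the auxiliary list, the quantity pattern and the function name are all hypothetical, and the treatment of the subject (everything before the verb) is far cruder than real linguistic rules.

```python
import re

# Assumed mapping from domain verbs to direction of movement.
MOVEMENT_VERBS = {
    "rebounded": "UP", "rallied": "UP", "rose": "UP",
    "fell": "DOWN", "fallen": "DOWN", "declined": "DOWN", "slow": "DOWN",
}
AUXILIARIES = {"had", "has", "have", "will"}
QUANTITY = re.compile(r"\d+(?:\.\d+)?\s*(?:%|basis points)")

def extract_movement(sentence):
    """Find the first movement verb; treat the words before it as the entity."""
    words = sentence.split()
    for i, word in enumerate(words):
        verb = word.strip(".,:;").lower()
        if verb in MOVEMENT_VERBS:
            subject = words[:i]
            # Drop trailing auxiliaries such as "had" in "had fallen".
            while subject and subject[-1].lower() in AUXILIARIES:
                subject.pop()
            quantity = QUANTITY.search(sentence)
            return {
                "entity": " ".join(subject),
                "action": verb,
                "movement": MOVEMENT_VERBS[verb],
                "quantity": quantity.group(0) if quantity else None,
            }
    return None

if __name__ == "__main__":
    print(extract_movement("Financial markets rebounded strongly in 2006's third quarter."))
    print(extract_movement("Crude oil had fallen over 20 % from its July peak."))
```

Sentences without a verb from the table return None, which is one reason a production system needs a much larger, vertical specific vocabulary.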

Example of Natural Language Processing in Financial Domain - extracting Cause and Effect (continuing R&D)

As the fourth quarter begins , financial markets remain supported by positive earnings and interest rate trends .

FINANCE ENT : financial markets
ACTION : remain supported
TIME : the fourth quarter
CAUSE : positive earnings and interest rate trends
EFFECT : financial markets remain supported

However , the pace of U. S. economic activity will slow further by year-end as weakness in the housing and automotive sectors becomes increasingly acute .

FINANCE ENT : the pace of U. S. economic activity
ACTION : will slow
TIME : year-end
MOVEMENT : DOWN
CAUSE : weakness in the housing and automotive sectors becomes increasingly acute .
EFFECT : the pace of U. S. economic activity will slow year-end
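
A very rough way to picture cause and effect extraction is to split a sentence at a causal connective. The sketch below does only that; the list of connectives and the function name are assumptions, and it deliberately ignores cases the slide handles, such as a cause introduced by "supported by" or a sentence-initial temporal "As".

```python
import re

# Assumed causal connectives; they must appear mid-sentence, surrounded by whitespace.
MARKER = re.compile(r"\s+(because of|because|due to|as)\s+", re.IGNORECASE)

def split_cause_effect(sentence):
    """Split at the first causal connective: effect before it, cause after it."""
    text = sentence.strip().rstrip(".").strip()
    match = MARKER.search(text)
    if match is None:
        return None
    effect = text[:match.start()].strip(" ,")
    cause = text[match.end():].strip(" ,")
    if not effect or not cause:
        return None
    return {"cause": cause, "effect": effect}

if __name__ == "__main__":
    print(split_cause_effect(
        "However, the pace of U.S. economic activity will slow further by year-end "
        "as weakness in the housing and automotive sectors becomes increasingly acute."
    ))
```

Because the pattern requires whitespace before the connective, a sentence that merely starts with "As" is not split at that word.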

An example of our Enterprise capabilities

• Ontology modeling using RDF and OWL semantic web standards
• Document matching / similarity using statistical models and a concept based approach, for patent search, knowledge management, etc.
• Information extraction using linguistic models, for fraud detection, analysis of news stories, etc.
• Demonstrated capability for patent search, legal cases, handling survey data
• Machine learning capability allows precision to be tuned and increased for specific client situations
• Can disambiguate based on domain specific situations, e.g. ‘execution’ may mean one thing in a news domain and another (executing a transaction) in the financial services domain
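
Document matching with a statistical model can be illustrated with cosine similarity over word counts. This is a generic textbook technique shown as a minimal sketch; the function names are assumptions and nothing here is claimed to be the model used in Veda's patent search or knowledge management work.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Represent a document as a bag of lower-cased word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    va, vb = vectorize(doc_a), vectorize(doc_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

if __name__ == "__main__":
    query = "method for semantic search of patent documents"
    candidate = "semantic search method applied to patent documents"
    print(round(cosine_similarity(query, candidate), 3))
```

A score near 1 means the two documents use much the same vocabulary; a concept based approach would go further and match documents that share meaning but not words.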

Veda Text Mining capability – key features

Input
• Data input in various forms (e.g. txt, doc, etc.)
• Can accept data from public sources (e.g. Facebook, Twitter) apart from enterprise sources

Preprocessing
• Removal of junk text around emails
• Removal of small emails like “Thanks”
• Removal of forwarded emails attached to the main email from analysis
• Spell checks and autocorrects
• Language parsing for English

Processing
• Natural language and statistical processing techniques
• Extraction of key discussion items from the text, and what is being said in relation to them
• Key themes from messages and semantic chaining. Can be combined with sentiment analysis as well.
• Ability to handle high velocity and high volume data using Big Data infrastructure (Hadoop, Storm, etc.)

Categorization
• Group discussion items into categories and sub categories, while identifying what is being said about them:
• Automatic for synonyms, singular and plural, etc.
• Ability to add / delete categories
• Ability to further analyse sub-categories

UI, editing and export
• Simple, easy custom built UI with filtering and drill down capability
• Machine learning approach where human insight guides further results
• Output not only available in visual format, but exportable to other applications or databases
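
Two of the preprocessing steps listed above (dropping very short emails and removing forwarded content) can be sketched as follows. This is a minimal sketch under assumptions: the forwarded-message marker format, the three-word threshold and the function names are illustrative choices, not a description of the Veda pipeline.

```python
import re

# Assumed marker format for forwarded content; real mail clients vary.
FORWARD_MARKER = re.compile(
    r"^-+\s*Forwarded message\s*-+\s*$", re.IGNORECASE | re.MULTILINE
)
MIN_WORDS = 3  # assumed threshold below which an email is treated as noise

def strip_forwarded(body):
    """Keep only the text above the first forwarded-message marker."""
    match = FORWARD_MARKER.search(body)
    return body[:match.start()] if match else body

def preprocess(emails):
    """Remove forwarded content, then drop emails that are too short to analyse."""
    cleaned = []
    for body in emails:
        text = strip_forwarded(body).strip()
        if len(text.split()) < MIN_WORDS:
            continue
        cleaned.append(text)
    return cleaned

if __name__ == "__main__":
    inbox = [
        "Thanks",
        "Please review the attached contract.\n"
        "---------- Forwarded message ----------\n"
        "Earlier discussion that should not be analysed again.",
    ]
    print(preprocess(inbox))
```

Running forwarded-content removal before the length check means an email that only says "FYI" above a long forward is also dropped.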


Veda Text Mining capability – screens of analysis in progress

Clustering conversations into categories using semantic analysis.

Example customized outputs

Our Delivery Capabilities

Trial & Demonstration (Proof of Concept): High-level client requirements
- Define the scope of work
- Proof of concept

Delivery Methodology: Detailed solution requirements
- Delivery framework (core offering + value added services)
- Documented external interfaces, with volume and associated recurring cost (if any) information
- User guide & training
- Methodology (Agile, Waterfall or client specified approach)
- Timelines for each deliverable
- Responsibility matrix

Delivery Methodology

[Figure: delivery methodology for client assignments, shown as three groups of activities]

Program Activities: Program Initiation, Program Benefits Tracking, Program Mgmt, Program HR Mgmt, Change Analysis

Project Delivery: Project Kick-off, Business Requirements, Analysis and Design, Data Set Creation, Feature Selection, Machine Learning, Development, Test & Verify, Release, Post Release Support, Project Closure

Support Activities: Infrastructure Readiness, Support Delivery, Training, Operational Readiness

For bespoke development, we are prepared to start small, to show clients clear value and RoI

Phase 1: Veda will solve a business challenge you choose, to demonstrate the power of a semantics based solution in a quick turnaround exercise (typically within a few days)

Phase 2: Taking the next step
* Implement for a business function / division / a single geography
* Multiple features of SIS implemented, including cross business solutions, leading to concrete measurable gains

Phase 3: Replicating the success of the previous phase
* Across larger sections of the enterprise
* Wider data consolidation scope
* Multiple output delivery channels
* Visible long term gains

But ultimately, we believe that clients will benefit considerably from a unified Semantic Information System

* Collecting unstructured data from disparate sources
* Analysing all collected unstructured data and organizing it using rich knowledge representation / domain ontologies
* Insights from unstructured data coupled with analytics from structured data assets (e.g. BI, Big Data)

[Figure: Semantic Information System architecture. Labels recoverable from the diagram:
Structured data: LOB applications (Marketing, Purchasing, Payroll, Operations, Sales), Databases, Staging Area, Data Warehouse, Data Marts, Store into Cubes, Reporting
Unstructured & semi-structured data: Unstructured data (Server, SAN, SAS), Internet / public web data, Online, Social media chatter
Veda Collection Processes: Web Crawler, Email Crawler, Files Crawler, Social Media Crawler
Veda Organising Processes: Natural Language Processing, Ontologies, Semantic Analysis, Knowledge Base, Auto Classification, Visual Segregation
Outputs: Formatted data, Processed data, Categorized data, Ready insights, Dashboards, Alerts]

Veda Approach – COP Framework

Our proprietary Collect – Organize – Present framework and tools allow us to undertake quick bespoke development

• Connectors: Collect information from a variety of (heterogeneous) sources
• Information Extraction: Using NLP and semantic analysis
• Semantic Net / Ontology Editor: Smart knowledge representation of a domain
• Auto Classifier: Classify data and tag it to industry specific concepts automatically
• Ontology Reasoning: Analyze industry knowledge and infer from ontological knowledge
• Analytics: Identify various patterns and insights from the data
• Semantic Matching: Provide most relevant information
• Semantic Search and Browsing: Semantic explorer to retrieve contextual concept-based information

Veda’s Value Proposition

Technology
• Deep understanding of the Semantics space
  - In the semantic technology space for more than a decade
  - Expertise in both NLP and ontologies / taxonomies, and in standards (RDF / OWL)
  - Team has provided services not only to clients, but to other semantic service providers
• Tie up with academia
  - Tie up with leading Indian university in the area
  - Allows for cutting edge R&D
  - High quality talent pipeline

Delivery
• Live - Delivery and Support Turnaround
  - The Veda Platform is the core that:
  - Is a solution accelerator giving a head start to all our assignments (tested and certified components)
  - Allows for lower costs
  - Allows for incremental rollouts

Veda’s Value Proposition (contd)

Experience
• Expertise in multiple business domains
• Healthy mix of business and technology expertise – can provide clear use cases for semantics and help establish clear RoI metrics
• Core team members have had experience in semantic technology since 2003, longer than most other companies
• Technology team experienced in providing expertise in a wide variety of business domains, leading to speedy and effective solution implementations

Location
• Located in India, with associated inherent advantages
• Lower cost options for clients with onshore – offshore model
• 24 hour work cycle
• Large talent pool
• Tie ups with companies focused on various other related technologies to offer integrated offerings, e.g. full service offering / working with offshore vendor to make outsourced processes more efficient using semantics

Veda’s End-to-End Semantic Expertise

• Text Analytics: Analyzing unstructured text, converting to structured data
• Machine learning: Statistical techniques resulting in increasing accuracy over time (with more inputs)
• Sentiment Analysis: Identifying if the sentiment of a sentence is positive, negative or neutral (and the various shades in between)
• Semantic Information Retrieval: More artifacts searched, more accurately (emails, documents, spreadsheets, output from existing structured data sources)
• Semantic Web Standards: Standardized storage and output formats for easier information sharing

Past Experience

Client profile: A global publishing house in legal, tax, finance and healthcare
Project description: Context-based content research platform for the tax & legal domain. Automatic meta-tagging, ontology modeling and ontology driven content reference system.

Client profile: A prominent product manufacturer on inference and reasoning engine
Project description: Leveraged semantics for a supply chain process to integrate systems with heterogeneous data sources and help in automatic decision making in case of any disruptions in the cycle. Provided ontology modeling and application development services.

Client profile: A reputed university and complex systems research lab in Australia
Project description: Produced a method for organizing and potentially navigating the wide range of web pages associated with the Murray-Darling river system in a seamless fashion.

Client profile: An analytics software manufacturer in Australia
Project description: Assist investigation of fraud and terrorism by establishing links between entities. Unstructured data analysis.

Client profile: A premier worldwide online provider of news, information, communication, entertainment and shopping services
Project description: Developed a web analytics platform for analyzing click-stream data in real time.

Some sample use cases mapped to our current technology demonstrators

Legal contracts
• Current situation: Saved in C drives or in DMS, with separate Excel sheets maintained to check on timely renewals, etc. Tough to compare specific clauses across contracts or find the relevant clause as needed.
• How Semantics will help: Search for a specific kind of contract and specific clause will throw up (a) master template (b) earlier contracts entered into in the area (c) extracts from the relevant clause.
• Mapping to current Veda technology demonstrator: Patent search demonstrator uses similar techniques, allowing the user to also see probabilistic match of documents.

Process changes
• Current situation: Dig deep into embedded code to see what departments and areas will get impacted.
• How Semantics will help: Ontology based relational steps make it easy to see connected departments, processes, etc. that will be impacted.
• Mapping to current Veda technology demonstrator: Tax caselaw and section ontology created.

Marketing
• Current situation: Mapping social sentiment and reviews done manually or using dictionary based social monitoring tools.
• How Semantics will help: Some social marketing and social listening already being done, though not accurately. A better quality NLP engine allows for more accurate results (e.g. the word ‘like’).
• Mapping to current Veda technology demonstrator: Veda Discovery Engine, which has sentiment capabilities.

HR
• Current situation: Obtaining the right resumes using keyword search remains time consuming. Employee suggestions in open ended surveys cannot be aggregated. Qualitative comments in employee evaluations are not aggregated.
• How Semantics will help: Identify key intervention areas at aggregate levels. Map trends in overall ratings to key strength and weakness areas.
• Mapping to current Veda technology demonstrator: Veda Discovery for aggregation, Veda Txt for identification of gist of comments.

Knowledge management
• Current situation: Metatagging remains a manual process and as a result, searches remain searches, not findings.
• How Semantics will help: Automatic metatagging (persons, locations, organizations, concepts, etc.).
• Mapping to current Veda technology demonstrator: Veda Discovery – NER Engine, Veda Legal demonstrator, Veda Msg (for alerts).

Sample use cases by industries

Publishing, media
Allows automatic extraction of people, locations, dates and events, being extended to themes and concepts. Helps in automatic metatagging.
• Current tagging process is manual and time consuming. Technology provides clear RoI by reducing this time and manual labour, providing consistent tagging, and allowing easier search for future reference, rather than relying on keywords (e.g. Mahatma vs Gandhi vs Mahatma Gandhi).

Oil and Gas
Can make incident monitoring and reporting systems more robust, thereby reducing the risk of major accidents
• For incident reporting, a user need not fill in multiple structured data fields. Text analytics can quickly match data to structured inputs.
• Witness reports, once converted to text, can be monitored across incidents for patterns that would otherwise have gone unnoticed.
Helps make process changes easier and allows all linked aspects to be seen at one go
• Helps determine what other processes and safety regulations are relevant if a sub process is sought to be changed (could also include contractual information etc. if relevant)
Usually, companies have millions of oil well logs which can be classified by performing named entity extraction and enrichment

Sample use cases by industries

Financial services
• Contract matching (including addendums)
• VoC analysis
• Churn prediction
• Highlights capability gaps
• Promotion management
• Avoids duplication of creation of similar material across divisions / locations. Saving in man hours and resources by leveraging all available material produced earlier
• Risk analysis
• Manage and gather customer documents from various sources to look for areas of concern
• “Know your customer” analysis
• Competitor analysis
• Financial news analysis for investment managers

Telecom
• Legal interception and pattern recognition
• SMS analyses for recognizing spam to avoid penalties
• VoC analysis

Airlines
• Analysis of unstructured problem and safety logs to avoid incidents

Sample use cases by industries

Healthcare
• Link and compare patient records to obtain insights on:
  - Symptoms, medicines and discharge times, to determine if some medication mixes may be more beneficial than others across a wide set of patient records
  - Why some patients may be re-admitted

Pharma
• R&D improvement by allowing scientists, who need to refer to papers but may not know exactly what to look for, to see relevant topics (based on automatic metatagging, and linked ontology at the backend)
• Better knowledge management - automatically tag papers, saving scientist time and making search consistent
• Feedback analysis for product from distributors, doctors and end patients

Insurance
• Broker document analysis to deepen insight on insured risks to improve risk management

Sample functional use cases

Marketing
• Voice of Customer analysis
• New product ideas
• Competitor analysis
• Complaint monitoring

HR
• Drawing insights from employee suggestions
• Analysing unstructured inputs in evaluations and improving training efficacy

Risk
• Internal document monitoring for risk and compliance

Legal
• Better contract management

Veda Solutions Currently Deployed

Veda for Business Process Workflow
• Configurable to any business requirement across industries
• Sources of content can be structured and unstructured
• Can be integrated with various business applications: ERP, content management, portals, etc.
• Configurable user interface with features such as:
– Saving of search for later reference
– Tabbed views
– Number of results to be displayed, with sort order

Veda Solutions Currently Deployed

Veda Social Media Analytics
• Registration & log in
• Inputs from social media
• Inputs from blogs, websites
• Hierarchy & relevance analysis
• Sentiment analysis
• Rich reporting

Veda Solutions Currently Deployed

Veda Recruiter

Veda Solutions Currently Deployed

Veda Patent Search
• Registration & log in
• Subscription
• Payment gateway
• Keyword search
• Semantic search
• Rich Internet Application
• Saved search
• Filters

Veda Solutions Currently Deployed

Veda SMS Service
• Crunches judgment text into high relevance words that can be sent through an SMS for immediate access
• Is combined with website service offering full access for relevant cases
• Registration & log in
• Subscription
• Payment gateway
• Keyword search
• Semantic search
• Legal ontology (Indian)
• Filters

Contact details

Veda Semantics Pvt Ltd
www.vedasemantics.com
Contact person: Rajat Kumar (CEO)
Email: [email protected]
Phone: +91-9619308745