Language Technologies for Geomatics: From Intelligence to Agility

Vision Géomatique - 2014-11-12

Stéphane Gagnon, Ph.D.

Professor, DSA, UQO



Outline

1. Business Intelligence

2. Language Technologies

3. Geomatics Applications

4. Big Data and Geo-Agility


Abstract

Language Technologies are used for automated text analytics, and rely on a blend of Linguistics, Artificial Intelligence (AI), and Decision Sciences.

They include such applications as content management, document indexing and search, text classification, automated translation, geographic and contextual localization, semantic web, real-time text stream processing, event pattern analysis, and others.

We present a brief discussion of how Language Technologies may be integrated with geomatics applications, not simply to enhance business and decisional intelligence, but with the aim of making organizations more agile and resilient in the face of risk and uncertainty.


Sources

Bachelor of Business Administration - Management Information Systems

SIG1003 - Information Systems for Managers
Efraim Turban, Linda Volonino, Gregory Wood, and Janice Sipior (2013), Information Technology for Management: Advancing Sustainable, Profitable Business Growth, 9th edition, New York, Wiley, 480 pages, ISBN: 9781118547861

SIG1043 - Business Intelligence
Ramesh Sharda, Dursun Delen, and Efraim Turban (2013), Business Intelligence: A Managerial Perspective on Analytics, CourseSmart eTextbook, 3rd edition, New York, Pearson Higher Education, 330 pages, ISBN: 9780133051070


1. Business Intelligence


Goals of BI


BI Evolution


Modern BI Dashboard


BI Project Lifecycle


Typical BI Architecture

[Figure: Typical BI architecture]
Data Warehouse Environment: data sources; technical staff build the data warehouse (organizing, summarizing, standardizing)
Business Analytics Environment: business users (access, manipulation, results)
Performance and Strategy: managers / executives (BPM strategy)
User Interface: browser, portal, dashboard
Future component: intelligent systems

BI Data Management


2. Language Technologies


Data Mining (DM)

[Figure: Data mining at the intersection of Statistics, Management Science & Information Systems, Artificial Intelligence, Databases, Pattern Recognition, Machine Learning, and Mathematical Modeling]

DM Tasks

Data Mining Task | Learning Method | Popular Algorithms
Prediction | Supervised | Classification and Regression Trees, ANN, SVM, Genetic Algorithms
Prediction: Classification | Supervised | Decision trees, ANN/MLP, SVM, Rough sets, Genetic Algorithms
Prediction: Regression | Supervised | Linear/Nonlinear Regression, Regression trees, ANN/MLP, SVM
Association | Unsupervised | Apriori, OneR, ZeroR, Eclat
Association: Link analysis | Unsupervised | Expectation Maximization, Apriori Algorithm, Graph-based Matching
Association: Sequence analysis | Unsupervised | Apriori Algorithm, FP-Growth technique
Clustering | Unsupervised | K-means, ANN/SOM
Clustering: Outlier analysis | Unsupervised | K-means, Expectation Maximization (EM)

Language Technologies

Statistical Methods: analyze documents as bags of words

Semantic Methods: analyze documents using tags from ontologies describing relationships
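The contrast between the two families can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the `ONTOLOGY` dictionary, its concepts and relations, and the sample sentence are all hypothetical, and a real semantic method would use a proper ontology language and reasoner rather than a lookup table.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Statistical view: lowercase the text, keep letter-only tokens, count them."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(tokens)

# Hypothetical mini-ontology (an assumption for illustration):
# term -> (concept, relation, related concept)
ONTOLOGY = {
    "gatineau": ("City", "locatedIn", "Quebec"),
    "flood": ("Hazard", "affects", "Infrastructure"),
}

def semantic_tags(text):
    """Semantic view: tag known terms with concepts and relationships."""
    bow = bag_of_words(text)
    return {term: ONTOLOGY[term] for term in bow if term in ONTOLOGY}

doc = "Flood warning issued for Gatineau; flood levels rising."
counts = bag_of_words(doc)
tags = semantic_tags(doc)
print(counts.most_common(3))
print(tags)
```

The bag of words only knows that "flood" occurs twice; the tagged view additionally records what kind of thing each term is and how it relates to other concepts.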


Statistical Methods

Information retrieval/search

Topic/keyword tracking

Geo-language recognition

Categorization/classification

Clustering/recommendation

Concept linking/association rules


Semantic Methods

Natural Language Processing (NLP)

Part-of-speech tagging

Text segmentation

Word sense disambiguation

Syntax ambiguity

Imperfect or irregular input

Speech acts


NLP Tasks

Information extraction

Named-entity recognition

Question answering

Automatic summarization

Natural language generation & understanding

Machine translation

Foreign language reading & writing

Speech recognition

Text proofing, optical character recognition

Sentiment analysis


Text Mining (TM) Process

Task 1. Establish the Corpus: collect and organize the domain-specific unstructured data
Inputs: a variety of relevant unstructured (and semi-structured) data sources such as text, XML, HTML, etc.
Output: a collection of documents in some digitized format for computer processing

Task 2. Create the Term-Document Matrix: introduce structure to the corpus
Output: a flat file called the term-document matrix, where the cells are populated with the term frequencies

Task 3. Extract Knowledge: discover novel patterns from the T-D matrix
Output: a number of problem-specific classification, association, and clustering models and visualizations

Feedback loops link the tasks.
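Task 2 and a very small instance of Task 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the three-document corpus is invented, documents are rows and terms are columns, and cosine similarity between rows stands in for the much richer classification, association, and clustering models the slide mentions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep letter-only tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())

def term_document_matrix(corpus):
    """Return (terms, matrix); matrix[i][j] is the frequency of terms[j] in document i."""
    counts = [Counter(tokenize(doc)) for doc in corpus]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix

def cosine(u, v):
    """Cosine similarity between two term-frequency rows (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy corpus, invented for illustration (Task 1 would collect real documents).
corpus = [
    "Road closed near the bridge",
    "Bridge closed after flood",
    "Election poll results",
]
terms, matrix = term_document_matrix(corpus)
print(terms)
print(matrix)
print(cosine(matrix[0], matrix[1]), cosine(matrix[0], matrix[2]))
```

The first two documents share "bridge" and "closed" and so score above zero, while the third shares no terms with the first and scores zero.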

Enterprise Index/Search


Web Mining

[Figure: Customer interaction on the Web -> analysis of interactions (Web analytics, voice of customer, customer experience management) -> knowledge about the holistic view of the customer]

IBM Watson QA

[Figure: IBM Watson question-answering architecture]
Pipeline: Question -> Question analysis -> Query decomposition -> Hypothesis generation -> Soft filtering -> Hypothesis and evidence scoring -> Synthesis -> Final merging and ranking -> Answer and confidence
Hypothesis generation draws on answer sources (primary search, candidate answer generation).
Hypothesis and evidence scoring draws on evidence sources (support evidence retrieval, deep evidence scoring).
Hypothesis generation, soft filtering, and scoring are repeated in parallel branches; final merging and ranking uses trained models.

TM for Lies

[Figure: Text mining process for deception detection]
Statements transcribed for processing
Statements labeled as truthful or deceptive by law enforcement
Cues extracted and selected
Text processing software identified cues in statements
Text processing software generated quantified cues
Classification models trained and tested on quantified cues
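The last two boxes of this process, quantifying cues and training a classifier on them, can be sketched as below. Everything specific here is an assumption made for illustration: the two cue lexicons, the four invented training statements, and the nearest-centroid classifier are not taken from the slides, and they say nothing about which cues actually indicate deception.

```python
import re

# Hypothetical cue lexicons, for illustration only; real studies select cues empirically.
FIRST_PERSON = {"i", "me", "my", "we", "our"}
NEGATIONS = {"no", "not", "never", "none"}

def quantify_cues(statement):
    """Turn a statement into two numeric cues: first-person rate and negation rate."""
    words = re.findall(r"[a-z']+", statement.lower())
    n = len(words) or 1  # avoid division by zero on empty input
    return [
        sum(w in FIRST_PERSON for w in words) / n,
        sum(w in NEGATIONS for w in words) / n,
    ]

def train_centroids(statements, labels):
    """Nearest-centroid training: average the cue vectors of each label."""
    sums, counts = {}, {}
    for s, y in zip(statements, labels):
        v = quantify_cues(s)
        acc = sums.setdefault(y, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [x / counts[y] for x in acc] for y, acc in sums.items()}

def classify(statement, centroids):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    v = quantify_cues(statement)
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(v, c))
    return min(centroids, key=lambda y: dist(centroids[y]))

# Invented toy statements and labels; not real data.
train = [
    ("I was at home and I saw my brother", "truthful"),
    ("We drove to our cabin", "truthful"),
    ("Nobody was there, not ever, no", "deceptive"),
    ("Never happened, that is not true", "deceptive"),
]
centroids = train_centroids([s for s, _ in train], [y for _, y in train])
print(classify("We drove to our cabin", centroids))
print(classify("Never happened, that is not true", centroids))
```

In practice the model would be tested on held-out statements rather than on its own training examples, and any classifier from the DM Tasks slide could replace the nearest-centroid rule.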


3. Geomatics Applications


Geo-Analytics


Geo-Textual Contextualization

[Figure: Extract knowledge from available data sources (A0)]
Inputs: unstructured data (text); structured data (databases)
Output: context-specific knowledge
Constraints: software/hardware limitations; privacy issues; linguistic limitations
Mechanisms: tools and techniques; domain expertise

Geo-Localized Contents

Geographic Information

Geo-Intelligence Models

Geo-Information Sensors
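One simple way to attach geographic context to unstructured text is a gazetteer lookup, sketched below. The `GAZETTEER` table, its approximate coordinates, and the sample message are assumptions for illustration; a real system would also need to disambiguate place names and handle multi-word toponyms.

```python
import re

# Hypothetical gazetteer: place name -> (latitude, longitude); coordinates are approximate.
GAZETTEER = {
    "gatineau": (45.48, -75.70),
    "ottawa": (45.42, -75.70),
    "montréal": (45.50, -73.57),
}

def geo_contextualize(text):
    """Return the text together with (place, lat, lon) for each gazetteer name it mentions."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    places = []
    found = set()
    for tok in tokens:
        # Keep the first mention of each place, in order of appearance.
        if tok in GAZETTEER and tok not in found:
            lat, lon = GAZETTEER[tok]
            places.append((tok, lat, lon))
            found.add(tok)
    return {"text": text, "places": places}

result = geo_contextualize("Flooding reported in Gatineau and Ottawa; Gatineau bridges closed.")
print(result["places"])
```

The geo-localized record can then be joined with structured geographic information or fed to the data and text mining steps described earlier.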

Geo-Social Network Analysis


Geo-Analytics of Voter Talk

INPUT: Data Sources
- Census data: population specifics, age, race, sex, income, etc.
- Election databases: party affiliations, previous election outcomes, trends and distributions
- Market research: polls, recent trends and movements
- Social media: Facebook, Twitter, LinkedIn, newsgroups, blogs, etc.
- Web (in general): web pages, posts and replies, search trends, etc.
- Other data sources

OUTPUT: Goals
- Raise money contributions
- Increase number of volunteers
- Organize movements
- Mobilize voters to get out and vote
- Other goals and objectives
- ...

Big Data & Analytics (Data Mining, Web Mining, Text Mining, Multi-media Mining)
- Predicting outcomes and trends
- Identifying associations between events and outcomes
- Assessing and measuring the sentiments
- Profiling (clustering) groups with similar behavioral patterns
- Other knowledge nuggets

Geo-Contextualized Text and Voice Messages

Geo-Analytics of Test Reports


4. Big Data and Geo-Agility


Competitive Advantage


Pressures for Agility


BI and Agility

Process efficiency and cost reduction

Brand management

Revenue maximization, cross-selling/up-selling

Enhanced customer experience

Churn identification, customer recruiting

Improved customer service

Identifying new products and market opportunities

Risk management

Regulatory compliance

Enhanced security capabilities


Big Data - Definition

Big Data means different things to people with different backgrounds and interests

Traditionally, “Big Data” = gigabytes and terabytes of data

E.g., volume of data at CERN, NASA, Google, …

The Vs that define Big Data

Volume

Variety

Velocity

Veracity

Variability

Value


Big Data Examples

Data Sources

Web text documents

Multimedia annotations

Web logs

RFID

GPS systems

Sensor networks

Social networks

Internet search indexes

Call detail records

Application Domains

Financial markets

Broadcasting

Biology and life sciences

Healthcare informatics

Transportation

Security and defense

Atmospheric science

Genomics and research

Energy and SCADA


Big Data Architecture

[Figure: Unified Data Architecture, system conceptual view]
Big data sources: ERP, SCM, CRM, images, audio and video, machine logs, text, web and social
Move / Manage / Access: event processing, data platform, integrated data warehouse, discovery platform
Analytic tools and apps: math and stats, data mining, business intelligence, applications, languages, marketing
Users: marketing, executives, operational systems, frontline workers, customers, partners, engineers, data scientists, business analysts

Big Data Requirements

Keys to Success with Big Data Analytics

A clear business need
Strong, committed sponsorship
Alignment between the business and IT strategy
A fact-based decision-making culture
A strong data infrastructure
The right analytics tools
Personnel with advanced analytical skills

Conclusion: Toward Geo-Agility

People-Centric: Track geo-information from key individuals and assets across and around the enterprise
Contextualize: Add geo-information to unstructured contents; use DM and TM with geo-analytics
Exploration: Link contextualized geo-information to real-time decision requirements
Open: Leverage open and mobile sources
Big Data: Build real-time streaming capabilities
Event-Driven: Develop organizational agility and resilience, and the capability to automate adaptation
Emergent Strategies: Adapt business strategy along with evidence-based decision-making
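The streaming and event-driven points can be illustrated with a sliding-window monitor over geo-tagged events. This is a minimal sketch under stated assumptions: the class name `RegionAlertMonitor`, the window and threshold values, and the sample stream are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque

class RegionAlertMonitor:
    """Flag a region when it receives `threshold` events within `window` seconds."""

    def __init__(self, window=60.0, threshold=3):
        self.window = window
        self.threshold = threshold
        self.events = defaultdict(deque)  # region -> timestamps still inside the window

    def observe(self, timestamp, region):
        """Record one geo-tagged event; return True if the region is at or over threshold."""
        q = self.events[region]
        q.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while q and timestamp - q[0] > self.window:
            q.popleft()
        return len(q) >= self.threshold

# Invented event stream: (seconds since start, region name).
stream = [(0, "Hull"), (10, "Aylmer"), (20, "Hull"), (45, "Hull"), (200, "Hull")]
monitor = RegionAlertMonitor(window=60, threshold=3)
alerts = [(t, region) for t, region in stream if monitor.observe(t, region)]
print(alerts)
```

An alert like this could trigger an automated adaptation step, which is the kind of event-driven response the conclusion points toward.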


Thank you!

Stéphane Gagnon, Ph.D.
Associate Professor
Department of Administrative Sciences
Université du Québec en Outaouais
Pavillon Lucien-Brault
101, rue St-Jean-Bosco, Local A2228
C.P. 1250, succursale Hull
Gatineau (Québec) J8X 3X7
Telephone: 819-595-3900, ext. 1942
Fax: 819-773-1747
Email: [email protected]
Web: http://www.gagnontech.org
Skype: stephanegagnon
Photo credits: SJ: http://www.lgt.ws, AT and LB: http://www.flickr.com/photos/uqo/
