A presentation by W H Inmon BRIDGING THE GAP BETWEEN UNSTRUCTURED DATA AND STRUCTURED DATA

Post on 18-Dec-2015


TRANSCRIPT

Page 1:

A presentation by W H Inmon

BRIDGING THE GAP BETWEEN UNSTRUCTURED DATA AND STRUCTURED DATA

Page 2:

The informal systems of the corporation:

- unstructured data
- .doc files
- .txt files
- .xls files
- email
- transcribed telephone conversations

[Diagram: Email, .Txt, .Doc]

The formal systems of a corporation:

- structured systems
- structured data
- corporate transactions
- corporate reports
- corporate databases
- customer files
- audit reports

[Diagram: Program]

Page 3:

It is estimated that less than 20% of corporate systems are structured.

[Diagram: 80% (Email, .Txt, .Doc); 20% (Program)]

Page 4:

Unstructured world (Email, .Txt, .Doc):

- search engines
- legal discovery
- email archive
- taxonomy
- ontology
- document mgmt
- web content

Structured world (Program):

- dbms
- business intelligence
- applications
- transactions (OLTP)
- ERP
- compliance

Imagine what would happen if the two worlds could be integrated...

The world of dbms, analytics, and other processing opens up.

Page 5:

[Diagram: the same two worlds and labels as on Page 4, with Email, .Txt, and .Doc shown a second time]

Tight integration between the two types of data.

Page 6:

There is a gulf between the two worlds:

- technology
- business practice
- organizational
- historical

[Diagram: Email, .Txt, .Doc | Program]

Page 7:

Think of the possibilities!

[Diagram: Email, .Txt, .Doc | Program]

Page 8:

Imagine this:

Reports and visualization show a lot.

Have you ever wondered why you can’t hook up your Business Objects to email? Or telephone conversations?

Page 9:

There is a fundamental disconnect between unstructured data and business intelligence.

[Diagram: text (Email, .Txt, .Doc) | numbers (Business Intelligence)]

So what would happen if we had powerful visualization for text?

Page 10:

Page 11:

[Visualization of terms: liver cancer, skin cancer, thirst, diabetes, blood pressure]

Correlative information becomes very easy to spot.
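
One way to picture how correlative information could be surfaced from text is to count how often pairs of terms appear in the same document. The sketch below is only an illustration under assumptions: the function name `cooccurrence`, the simple substring matching, and the sample notes are all invented here and are not taken from the presentation.

```python
from collections import Counter
from itertools import combinations

# Terms of interest, taken from the slide.
TERMS = ["liver cancer", "skin cancer", "thirst", "diabetes", "blood pressure"]


def cooccurrence(documents, terms=TERMS):
    """Count how many documents mention each pair of terms together."""
    counts = Counter()
    for doc in documents:
        text = doc.lower()
        # Sort so that each pair is always stored in the same order.
        present = sorted(t for t in terms if t in text)
        counts.update(combinations(present, 2))
    return counts


# Invented sample notes, for illustration only.
notes = [
    "Patient reports constant thirst; history of diabetes.",
    "Diabetes, elevated blood pressure, complains of thirst.",
    "Follow-up for skin cancer screening.",
]
pairs = cooccurrence(notes)
for pair, n in pairs.most_common():
    print(pair, n)
```

Pairs with high counts are candidates for a closer look; a count by itself says nothing about cause.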

Page 12:

- for the general population
- for women
- for women who smoke
- for women who smoke over the age of 50

Doing analysis on subpopulations of women.

Page 13:

- for the general population
- for women who smoke over the age of 50

The contrast between the different correlations of different populations leads to great insight.
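
The contrast between populations can be sketched by pairing each text with a few structured attributes and comparing how often a term is mentioned in each group. Everything below is assumed for illustration: the record fields (`sex`, `smoker`, `age`, `text`), the helper `mention_rate`, and the sample records are hypothetical.

```python
def mention_rate(records, term, predicate=lambda r: True):
    """Fraction of records in a sub-population whose text mentions `term`."""
    selected = [r for r in records if predicate(r)]
    if not selected:
        return 0.0
    hits = sum(term in r["text"].lower() for r in selected)
    return hits / len(selected)


# Invented records: structured attributes plus free text.
records = [
    {"sex": "F", "smoker": True, "age": 62, "text": "High blood pressure noted."},
    {"sex": "F", "smoker": True, "age": 55, "text": "Blood pressure elevated; thirst."},
    {"sex": "F", "smoker": False, "age": 34, "text": "Routine visit."},
    {"sex": "M", "smoker": False, "age": 47, "text": "Blood pressure normal."},
]

general = mention_rate(records, "blood pressure")
women_smokers_over_50 = mention_rate(
    records,
    "blood pressure",
    lambda r: r["sex"] == "F" and r["smoker"] and r["age"] > 50,
)
print(general, women_smokers_over_50)
```

Note that this counts mentions, not diagnoses: "Blood pressure normal." still counts as a mention, which is the kind of case a negation-handling step would need to address.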

Page 14:

What about looking at customer feedback (complaints)? Now you can see the broader picture of what is happening.

[Visualization of terms: service, delivery, late, broken, installation, salesman attitude, wait too long, did not fit]

Page 15:

But there are plenty of other places where the technology applies:

- manufacturing warranties (what patterns of defects are there?)
- weblogs (marketing: who is saying what?)
- customer complaints (what are the problem products?)
- general email (what’s the buzz? what is on people’s minds?)
- insurance claims (what are the circumstances of accidents?)

Page 16:

[Diagram: Email, .Txt, .Doc]

Another possibility is the monitoring of email and the transport of email to the structured environment.

Page 17:

Monitoring emails and other corporate conversations:

[Diagram: Email, .Txt, .Doc; Sarbanes Oxley, HIPAA, BASEL II]

Compliance: making sure that email is being used properly

- compliance
- corporate standard for language

Page 18:

A bunch of emails and conversations:

- Jan 3, vp to vp: “This is going to be a real barn burner of a quarter…”
- Jan 5, finance to vp: “It looks like we are going to do $9,000,000 this quarter…”
- Jan 5, president to analyst: “This quarter looks like we are going to break new records…”
- Feb 1, employee to employee: “Did you see the stock market? Everything is going down…”
- Feb 3, president to vp: “What is happening to sales in the midwest? We didn’t expect this…”
- Feb 4, sales manager to vp: “The sales cycle looks like it is extending. The economy is tanking…”
- Feb 3, vp to vp: “It looks like we are going to be a little short this quarter…”
- Feb 6, president to vp: “What are we going to do to get sales up? Do we need to do some discounting?”
- Mar 2, sales person to vp: “Demand has dried up. We aren’t going to close as many sales this quarter as we thought…”

What do you do with them?

Page 19:

Examining emails (“combing” them) for important corporate information:

[The same emails and conversations as on Page 18]

External categories:

- Sarbanes Oxley: quarter, stock, sales, discount, demand, sales cycle
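
The “combing” step can be pictured as scanning each message for the terms of an external category and recording where each term was found. This is a minimal sketch under assumptions, not the presenter's method: the function `comb`, the tuple layout, and the prefix matching are invented for illustration. The term list and the three sample messages come from the slides.

```python
import re

# Terms listed on the slide under the "Sarbanes Oxley" external category.
SARBANES_OXLEY_TERMS = ["quarter", "stock", "sales", "discount", "demand", "sales cycle"]


def comb(messages, terms=SARBANES_OXLEY_TERMS):
    """Return (term, date, parties) rows for every term found in a message."""
    rows = []
    for date, parties, text in messages:
        for term in terms:
            # Prefix match at a word boundary, so "discount" also finds
            # "discounting"; a crude stand-in for real stemming.
            if re.search(r"\b" + re.escape(term), text, re.IGNORECASE):
                rows.append((term, date, parties))
    return rows


# Three of the messages quoted on the slides.
messages = [
    ("Jan 5", "finance to vp",
     "It looks like we are going to do $9,000,000 this quarter"),
    ("Feb 6", "president to vp",
     "What are we going to do to get sales up? Do we need to do some discounting?"),
    ("Mar 2", "sales person to vp",
     "Demand has dried up. We aren't going to close as many sales this quarter as we thought"),
]
rows = comb(messages)
for row in rows:
    print(row)
```

Each row is already in a shape that a relational table can hold, which is what makes the move to the structured environment straightforward.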

Page 20:

The “combed” information is brought over to the structured environment.

Structured Environment:

- sales: email (Feb 2), email (Mar 5), phone (Mar 8), …
- quarter: email (Jan 2), email (Jan 4), email (Feb 5), …
- discount: phone conversation (Jan 6), email (Jan 12), email (Jan 14), …
- sales cycle: email (Feb 24), phone conversation (Mar 14), meeting notes (Mar 18), …

Now you can use standard tools, such as Cognos, Business Objects, Crystal Reports, and MicroStrategy, to do analysis.
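
Once the combed terms sit in a relational table, ordinary SQL (and therefore any reporting tool that speaks SQL) can work on them. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for the structured environment; the table name `term_occurrence` and its columns are hypothetical. The rows are the ones listed on this slide.

```python
import sqlite3

# In-memory database as a stand-in for the structured environment.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE term_occurrence (term TEXT, source TEXT, occurred TEXT)"
)

# Rows taken from the slide: term, kind of communication, date.
rows = [
    ("sales", "email", "Feb 2"),
    ("sales", "email", "Mar 5"),
    ("sales", "phone", "Mar 8"),
    ("quarter", "email", "Jan 2"),
    ("quarter", "email", "Jan 4"),
    ("quarter", "email", "Feb 5"),
    ("discount", "phone conversation", "Jan 6"),
    ("discount", "email", "Jan 12"),
    ("discount", "email", "Jan 14"),
    ("sales cycle", "email", "Feb 24"),
    ("sales cycle", "phone conversation", "Mar 14"),
    ("sales cycle", "meeting notes", "Mar 18"),
]
conn.executemany("INSERT INTO term_occurrence VALUES (?, ?, ?)", rows)

# A report-style query: how often was each term mentioned?
counts = dict(
    conn.execute(
        "SELECT term, COUNT(*) FROM term_occurrence GROUP BY term"
    ).fetchall()
)
print(counts)
```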

Page 21:

But there are other ways that communications can be used.

Emails and telephone conversations can be linked to CDI/CRM data.

[Diagram: customer data; probabilistic match]
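
Linking a communication to customer data by probabilistic match can be sketched as scoring a name found in the text against each customer record and accepting the best score only if it clears a threshold. The presentation does not say how its matching works. The sketch below swaps in plain string similarity from Python's `difflib`, and the function `best_match`, the 0.8 threshold, and the customer records are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def best_match(name, customers, threshold=0.8):
    """Return (customer, score) for the closest name, or (None, score)."""
    scored = [
        (SequenceMatcher(None, name.lower(), c["name"].lower()).ratio(), c)
        for c in customers
    ]
    score, customer = max(scored, key=lambda pair: pair[0])
    # Below the threshold the link is treated as too uncertain to make.
    return (customer, score) if score >= threshold else (None, score)


# Hypothetical customer records.
customers = [
    {"id": 1, "name": "Mary Jones"},
    {"id": 2, "name": "Mark Johnson"},
]

match, score = best_match("Mrs Mary Jones", customers)
print(match, round(score, 2))
```

A real matcher would normally weigh several fields (address, phone number, email address) rather than a name alone, and the threshold would be tuned against known matches.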

Page 22:

A true 360 degree view of the customer can be formed.

“I placed an order last week and when it arrived it was the wrong size. And then your company would not take it back. I’m mad.”

How easy is it going to be to engage Mrs Jones until she has satisfaction about her order?

Page 23:

A true 360 degree view of the customer can be formed.

[Diagram: communications; demographics]

Delivering on the promise of CDI.

Page 24:

Can’t I just use a search engine to link the two worlds?

[Diagram: Email, .Txt, .Doc | integration | Program]

Search engines do not integrate textual information.

Page 25:

[Diagram: Email, .Txt, .Doc | integration | Program]

Text doesn’t need to be searched; it needs to be integrated.

Page 26:

[Diagram: Email, .Txt, .Doc | integration | Program]

“ha” can mean:

- “headache”
- “heart attack”
- “Hepatitis A”
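
Resolving an ambiguous abbreviation such as “ha” usually depends on the surrounding context. As a rough sketch under assumptions, the code below picks the sense whose cue words appear most often in the same text and leaves the abbreviation alone when no cue is present. The cue-word lists and the function names `resolve_ha` and `expand` are invented for illustration and are not from the presentation.

```python
import re

# Hypothetical cue words for each possible meaning of "ha".
HA_SENSES = {
    "heart attack": {"cardiology", "cardiologist", "cardiac", "chest"},
    "Hepatitis A": {"liver", "hepatology", "vaccine", "jaundice"},
    "headache": {"migraine", "neurology", "aspirin"},
}


def resolve_ha(text, senses=HA_SENSES):
    """Pick the sense with the most cue words in the text, else keep 'ha'."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(words & cues) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "ha"


def expand(text):
    """Replace the standalone token 'ha' with its resolved meaning."""
    sense = resolve_ha(text)
    return re.sub(r"\bha\b", lambda m: sense, text, flags=re.IGNORECASE)


print(expand("Cardiology note: pt admitted with ha and chest pain"))
print(expand("pt has ha"))
```

Leaving the term unresolved when there is no supporting context is a deliberate choice in this sketch: a wrong expansion would corrupt the integrated data.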

Page 27:

[Diagram: Email, .Txt, .Doc | integration | Program]

- “oblique fractured ulna”
- “oblique fractured tibia”
- “obliq fractured tarsi”

All of these resolve to “broken bone”.
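
Mapping specific phrases to a common term can be sketched in two small steps: normalize alternate spellings, then look the normalized phrase up in a table of general terms. The dictionaries and the function name `integrate` below are assumptions made for illustration; only the three phrases and the target term “broken bone” come from the slide.

```python
# Alternate spelling seen on the slide, mapped to a standard form.
SPELLING = {"obliq": "oblique"}

# Specific phrases from the slide, mapped to the general term.
GENERAL_TERM = {
    "oblique fractured ulna": "broken bone",
    "oblique fractured tibia": "broken bone",
    "oblique fractured tarsi": "broken bone",
}


def integrate(phrase):
    """Normalize spelling, then replace a known phrase with its general term."""
    words = [SPELLING.get(w, w) for w in phrase.lower().split()]
    normalized = " ".join(words)
    # Unknown phrases pass through unchanged (apart from normalization).
    return GENERAL_TERM.get(normalized, normalized)


for phrase in ["Oblique fractured ulna", "obliq fractured tarsi", "sprained ankle"]:
    print(phrase, "->", integrate(phrase))
```

With every variant stored under one term, a query for “broken bone” finds all three records, which a plain text search for that phrase would not.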

Page 28:

What is meant by editing, integrating text?

1. stop word editing
2. stemming
3. synonym replacement
4. synonym concatenation
5. homograph resolution
6. alternate spelling resolution
7. external category classification
8. theming
9. probabilistic matching
10. negation exclusion
11. concept clustering
12. mid process editing
13. change sensitivity

[Diagram: Email, .Txt, .Doc | integration | Program]
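
The first three editing steps (stop word editing, stemming, and synonym replacement) can be sketched as a small pipeline. This is a deliberately crude illustration under assumptions: the stop-word list, the suffix-stripping `stem` function (a stand-in for a real stemmer), the synonym table, and the name `edit_text` are all hypothetical.

```python
import re

# Small, hypothetical word lists for illustration.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "it"}
SYNONYMS = {"client": "customer", "purchaser": "customer"}
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def edit_text(text):
    """Apply stop word editing, stemming, and synonym replacement in order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 1. stop word editing
    tokens = [stem(t) for t in tokens]                   # 2. stemming
    tokens = [SYNONYMS.get(t, t) for t in tokens]        # 3. synonym replacement
    return tokens


print(edit_text("The clients are complaining of delayed orders"))
```

The order matters in this sketch: stemming runs before synonym replacement so that “clients” is reduced to “client” and then found in the synonym table.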

Page 29:

[Diagram: DW 2.0, “The architecture for the next generation of data warehousing”. Sectors: Interactive (very current), Integrated (current++), Near line (less than current), Archival (older). Structured component: transaction data; application (Appl) data; detailed and summary data by subject (Subj); continuous snapshot data; profile data; reference and master data. Unstructured component: captured text; textual subjects (internal, external); text id; text-to-subject linkage; simple pointers. Other labels: Business, Technical.]

© Copyright 2006 Bill Inmon and Inmon Data Systems. DW 2.0 is a trademark of Bill Inmon and Inmon Data Systems. All rights reserved. “The architecture for the next generation of data warehousing” is copyrighted by Bill Inmon and Inmon Data Systems, 2006.

For a detailed description of how the unstructured environment should be linked to the structured environment, go to www.inmoncif.com and look for DW 2.0 TM, or go to www.inmondatasystems.com

Page 30:

[Diagram labels: Unstructured Data; probabilistic match; Structured Environment (DB2); Query (Business Objects, Cognos, MicroStrategy, Crystal Reports); visualization]