The Reading to Learn Project Peter Clark Phil Harrison Tom Jenkins John Thompson Rick Wojcik (Boeing Phantom Works) David Israel (SRI)

Post on 30-Dec-2015


The Reading to Learn Project

Peter Clark, Phil Harrison, Tom Jenkins, John Thompson, Rick Wojcik (Boeing Phantom Works)

David Israel (SRI)

SRI-Boeing’s Reading to Learn Seedling

• Goal:
  – Study issues in machine reading by working with a reduced version of the problem: controlled language (CL) rather than unrestricted natural language (NL). The NLP task is factored into two steps:
    1. full NL → CL
    2. CL → logic (this project)

• Rationale:
  – By sidestepping some of the grammatical issues of full NLP, we can focus on issues of understanding and reasoning
  – Methods for full NL → CL can be studied separately

SRI-Boeing’s Reading to Learn Seedling

• Approach:

– Rewrite 5 pages of chemistry text into our controlled language, CPL

– Extend and use our CPL interpreter to generate logic

– Integrate this new knowledge with an existing chemistry knowledge base (from the Halo Pilot)

– Report on the problems encountered and solutions developed

CPL: Computer-Processable Language

• Simplified version of English. Basic sentence:

  subj + verb + complements + adjuncts

  – Set of guidelines (present tense, no adverbs, no subordinate clauses, etc.)
  – Translates into first-order logic axioms
  – Used in several projects in Boeing
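As a rough illustration of the basic sentence pattern, the sketch below holds one CPL-style sentence in a small Python structure. The class name `CplSentence` and its fields are hypothetical and are not taken from the CPL interpreter; the example sentence is one of the reformulated textbook sentences in this deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CplSentence:
    """Hypothetical container for the pattern subj + verb + complements + adjuncts."""
    subject: str
    verb: str
    complements: List[str] = field(default_factory=list)
    adjuncts: List[str] = field(default_factory=list)

    def render(self) -> str:
        # Join the non-empty parts in pattern order and close with a period.
        parts = [self.subject, self.verb, *self.complements, *self.adjuncts]
        return " ".join(p for p in parts if p) + "."


sentence = CplSentence(subject="Acids", verb="have", complements=["a sour taste"])
print(sentence.render())
```

This only models the surface pattern; it does not check the guidelines (tense, adverbs, subordinate clauses) and does not produce logic.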

Two Paths from Language to Logic…

[Diagram labels: Real Text; Real(istic) Controlled Language Text; Literal/messy logic representation; “The Knowledge Gap”; Declarative CPL rules; Inference-supporting Representation]

Two Paths from Language to Logic: Approach I: CPL as simplified English

[Diagram labels: Real Text; Real(istic) Controlled Language Text; Literal/messy logic representation; “The Knowledge Gap”; Declarative CPL rules; Inference-supporting Representation]

Approach I: Reformulation of the 5 pages…

• Note: introductory material, flowery language, fluff, complex sentences, parentheticals.

CPL Reformulation:

Acids have a sour taste.
Acids cause some dyes to change color.
Bases have a bitter taste.
Bases have a slippery feel.

CPL Reformulation:

Hydrogen chloride is an Arrhenius acid.
Hydrogen chloride gas is soluble in water.
Hydrogen chloride gas in water reacts with the water.
The reaction produces H-plus ions and Cl-minus ions.
HCl is hydrogen chloride.
Hydrochloric acid is an aqueous solution of HCl.
37 percent of the mass of concentrated hydrochloric acid is HCl.
The concentration of HCl in concentrated hydrochloric acid is 12 M.

Hydrogen chloride gas in water reacts with the water.

FORALL ?Water0, ?HydrogenChloride0, ?Gas0:
  isa(?Water0, H2O)
  isa(?Gas0, Gas-Substance)
  isa(?HydrogenChloride0, HCl)
  has-basic-unit(?Gas0, ?HydrogenChloride0)
  is-inside(?Gas0, ?Water0)
===>
EXISTS ?React0:
  isa(?React0, Reaction)
  raw-material(?React0, ?Gas0)
  raw-material(?React0, ?Water0)
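To make the rule above concrete, here is a minimal sketch of how such an axiom could be applied to a small set of ground facts. It is not the project's reasoner: the triple encoding, the function names (`match`, `solutions`, `apply_rule`), the individuals `water1`, `gas1`, `hcl1`, and the generated name `react1` are all assumptions made for illustration. Only the predicates and class names come from the slide.

```python
from itertools import count

# Antecedent and consequent of the rule, written as (predicate, arg1, arg2) triples.
ANTECEDENT = [
    ("isa", "?Water0", "H2O"),
    ("isa", "?Gas0", "Gas-Substance"),
    ("isa", "?HydrogenChloride0", "HCl"),
    ("has-basic-unit", "?Gas0", "?HydrogenChloride0"),
    ("is-inside", "?Gas0", "?Water0"),
]
CONSEQUENT = [
    ("isa", "?React0", "Reaction"),
    ("raw-material", "?React0", "?Gas0"),
    ("raw-material", "?React0", "?Water0"),
]


def match(pattern, fact, bindings):
    """Unify one pattern with one ground fact; return extended bindings or None."""
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if new.setdefault(p, f) != f:
                return None  # variable already bound to something else
        elif p != f:
            return None      # constant mismatch
    return new


def solutions(patterns, facts, bindings=None):
    """Yield every binding set that satisfies all patterns (simple backtracking)."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for fact in facts:
        extended = match(patterns[0], fact, bindings)
        if extended is not None:
            yield from solutions(patterns[1:], facts, extended)


def apply_rule(facts):
    """Return the facts plus the consequent, with a fresh individual for ?React0."""
    fresh = count(1)
    derived = set(facts)
    for b in solutions(ANTECEDENT, sorted(facts)):
        b = {**b, "?React0": f"react{next(fresh)}"}  # stands in for EXISTS ?React0
        for pred, x, y in CONSEQUENT:
            derived.add((pred, b.get(x, x), b.get(y, y)))
    return derived


facts = {
    ("isa", "water1", "H2O"),
    ("isa", "gas1", "Gas-Substance"),
    ("isa", "hcl1", "HCl"),
    ("has-basic-unit", "gas1", "hcl1"),
    ("is-inside", "gas1", "water1"),
}
new_facts = apply_rule(facts) - facts
print(sorted(new_facts))
```

The existential in the consequent is handled by inventing a new individual per match, which is one common choice; a real reasoner would also need to avoid re-deriving the same reaction on repeated application.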

Do We Get Inference-Capable Knowledge Out?

Encoded and processed ~250 CPL sentences. Limited inference ability. Several key problems:

1. Complex Notions/Idioms/Special Phrases: "The reaction favors transfer of protons to the weaker acid"

2. Examples: "An HCl molecule in water donates a proton to an H2O molecule."

3. Diagrams and Tables:

Do We Get Inference-Capable Knowledge Out?

4. Generics: "Acids contain hydrogen." "Acids cause some dyes to change color."

5. Procedural/Problem-solving knowledge: "A conjugate base is formed by removing a proton from the acid."

(Base = Acid – Proton)

6. Representational challenges: "An acid and a base differing only in a proton are called a conjugate acid-base pair."

7. Algebra: "NaOH dissociates into Na+ and OH- ions."

8. Metonymy: "Nitrous acid reacts with water in Equation 16.7."

Two Paths from Language to Logic: Approach II: CPL as a rule language

[Diagram labels: Real Text; Real(istic) CPL Text; Literal/messy logic representation; “The Knowledge Gap”; Declarative CPL rules; Inference-supporting Representation]

Rewriting a Sentence into CPL

Textbook:

“In every acid-base reaction the position of the equilibrium favors transfer of the proton to the stronger base.”

More Precise Encoding 1:

IF there is a reaction
AND one base in the reaction is stronger than the other base in the reaction
THEN the direction of the reaction is away from the stronger base.
[“favors transfer to” → “direction is away from”]

More Precise Encoding 2:

IF there is a reaction
AND there is a base on the left side of the reaction
AND there is a base on the right side of the reaction
AND the first base is stronger than the second base
THEN the direction of the reaction is to the right.
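The left-side/right-side version of the rule can be sketched as an ordinary conditional. This is an illustration only, not output of the CPL interpreter: the `Reaction` class, the `direction` function, and the numeric strength ranks are hypothetical, and the ranks merely say that water is a stronger base than the chloride ion.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Reaction:
    left_base: str   # the base on the left side of the reaction
    right_base: str  # the base on the right side of the reaction


def direction(reaction: Reaction, base_strength: Dict[str, int]) -> Optional[str]:
    """IF the left base is stronger than the right base THEN the direction is 'right'."""
    left = base_strength[reaction.left_base]
    right = base_strength[reaction.right_base]
    if left > right:
        return "right"
    return None  # the rule as written is silent about every other case


# Illustrative ranks only (higher = stronger base); the numbers are made up.
ranks = {"H2O": 2, "Cl-": 1}

# HCl + H2O -> H3O+ + Cl-: the base H2O is on the left, the base Cl- on the right.
print(direction(Reaction(left_base="H2O", right_base="Cl-"), ranks))
```

Returning `None` when the condition fails keeps the sketch faithful to the rule text, which only covers the case where the left-side base is the stronger one.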

Can we now integrate this with the KB?

• Goal:
  – Manually remove the target knowledge from KB
  – Add the new knowledge in

• However:
  – Hard to remove and add knowledge
    • KB is complex and intertwined
  – 2 case studies:
    • Conjugate acid calculations
    • Relative acid strengths

Text Book: Conjugate Acid

• Computing the conjugate acid:
  • i.e. Acid = Base + proton
  • e.g. H3O+ = H2O + H+

“Every base has associated with it a conjugate acid, formed by adding a proton to the base.”
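The textbook rule (Acid = Base + proton) is simple to state procedurally. The sketch below is one possible way to do it, assuming formulas without parentheses and with the charge tracked separately; `parse_formula` and `conjugate_acid` are hypothetical helper names, not functions from the Halo knowledge base.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as "H2O" (no parentheses, no charge)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def conjugate_acid(base_atoms: Counter, base_charge: int):
    """Acid = Base + proton: one more H atom and one more unit of positive charge."""
    acid_atoms = base_atoms + Counter({"H": 1})
    return acid_atoms, base_charge + 1


atoms, charge = conjugate_acid(parse_formula("H2O"), 0)
print(dict(atoms), charge)
```

Keeping the charge outside the formula string avoids having to parse notations such as "H3O+", which the slides themselves write in more than one way.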

KB: Conjugate Acid

Compute-Conjugate-Acid:

  input: Chemical
  parent?formula: Chemical (input) → Molecule → FormulaObject → Formula
  target-unit: LISP: Formula (parent?formula) → Formula (conjugate)
  output: Formula (target-unit) → FormulaObject → Molecule → ClassifiedMolecule → Chemical → ClassifiedChemical (result)

[Example values from the diagram: “H2O” → 2H+O → 3H+O → “H3O”]

Text Book: Relative Strengths of Acids

• The textbook presents a rank-order of relative strengths.

• Encoded as large nested structure in KB

KB: Relative Strengths of Acids

(every Acid-Role has
  (intensity (
    (a Intensity-Value with
      (value (
        (:pair
          ;; Case statement for Acids.
          (if ((the played-by of Self) isa Ionic-Compound-Substance)
           then (if (((the played-by of Self) isa HCl-Substance) or
                     ((the played-by of Self) isa HBr-Substance) or
                     ((the played-by of Self) isa HI-Substance) or
                     ((the played-by of Self) isa HClO3-Substance) or
                     ((the played-by of Self) isa HClO4-Substance) or
                     ((the played-by of Self) isa H2SO4-Substance) or
                     ((the played-by of Self) isa HNO3-Substance))
                 then *strong
                 else (if (((the played-by of Self) isa H3PO4-Substance) or
                           ((the played-by of Self) isa HF-Substance) or
                           ((the played-by of Self) isa HC2H3O2-Substance) or
                           ((the played-by of Self) isa H2CO3-Substance) or

Requirements for an Extensible KB

1. Simple structures

2. Declarative and procedural knowledge separated

– include a constraint reasoner

3. Elaboration-tolerant organization

4. Error-tolerant reasoner
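One way to picture requirements 1 and 2 is to compare the nested acid-strength case statement from the KB with a flat table plus a separate lookup. The sketch below is an assumption about what "simple" and "declarative" could look like, not a design from the project. It covers only the acids the KB excerpt marks as *strong; the value for the second group is cut off in the excerpt, so it is left out, and the outer Ionic-Compound-Substance test is ignored. `ACID_INTENSITY`, `acid_intensity`, and `HypotheticalAcid` are made-up names.

```python
# Declarative part: acids the KB excerpt classifies as *strong.
STRONG_ACIDS = frozenset({"HCl", "HBr", "HI", "HClO3", "HClO4", "H2SO4", "HNO3"})
ACID_INTENSITY = {acid: "strong" for acid in STRONG_ACIDS}


def acid_intensity(formula, table=ACID_INTENSITY):
    """Procedural part: a plain lookup; None means the table does not cover the acid."""
    return table.get(formula)


# Extending the knowledge is one new entry, not an edit inside nested conditionals.
# "HypotheticalAcid" is a placeholder, not a real chemical.
extended = {**ACID_INTENSITY, "HypotheticalAcid": "strong"}

print(acid_intensity("HCl"))
print(acid_intensity("HF"))
print(acid_intensity("HypotheticalAcid", extended))
```

Because the table and the lookup are separate, removing or replacing the facts about acid strength does not require touching the code that uses them.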

What Have We Learned?

• Controlled language only partially alleviates the problem
  – many interpretation problems remain

• Text often doesn't clearly state knowledge
  – need to use multiple texts

• AP Chemistry is particularly hard
  – chemical/molecule/formula distinction; algebra

• Knowledge bases need to be built for extensibility
  – syntactically simple, declarative, elaboration tolerant

• Bridging the gap:
  – "Smart" language interpretation; use of knowledge
  – Where might this come from? bootstrapping

Another View of the Path from Language to Logic…

[Diagram labels: Real Text; Inference-supporting Representation; Expectations; Context; Prior knowledge; Hypotheses; Fragments of new knowledge; Confirmations; Examples; Refinements]