Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Upload: erik-parrish

Post on 02-Jan-2016


Page 1: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Reading to Learn
Q3 Review

Peter Clark
John Thompson
Tom Jenkins
Phil Harrison
Bill Murray

Page 2:

Agenda

• This Seedling and Mobius
  – Major lessons learned
• Reformulations in CPL
  – Whole 5 pages
  – Key Sentences
• How do other texts compare?
• Generics
• How to identify “important” text
• Principles for an extensible KB
• Evaluation discussion
• Tuples as another source of knowledge

Page 3:

SRI-Boeing’s Reading to Learn Seedling

• Goal:
  – study issues in learning through reading by working with a reduced version of the problem, namely working with controlled, rather than unrestricted natural language. The NLP task is factored into two:
    • full NL → CL, CL → logic
• Rationale:
  – by sidestepping some of the linguistic issues of full NLP, can focus on knowledge integration issues
  – methods for full NL → CL can be studied separately

this project

Page 4:

SRI-Boeing’s Reading to Learn Seedling

• Approach:
  – Rewrite 5 pages of chemistry text into our controlled language, CPL
  – Extend and use our CPL interpreter to generate logic
  – Integrate this new knowledge with an existing chemistry knowledge base (from the Halo Pilot), which has the new knowledge surgically deleted from it
  – Evaluate the performance of the CPL-extended KB against the original
  – Report on the problems encountered and solutions developed

Page 5:

This Seedling in Mobius

[Diagram of Mobius components: Knowledge Integration, Introspection, Natural Language Processing, Test Generation. A callout marks “This seedling”.]

Page 6:

Summary

• Q3:
  – Completed coding of key sentences in CPL
  – Demonstration of inference with that knowledge
  – Study of cues for identifying important text
  – Assembly of key lessons learned
  – Interaction with ISI
  – Exploration of shallow knowledge extraction
• Q4:
  – Finish interpretation of additional sentences
  – Assemble qualitative and quantitative evaluations
  – Continue interaction with ISI: side-by-side study
  – Final report

Page 7:

Main Results and Messages

• With some hand-holding, part of the “Mobius loop” can be done
  – But: chemistry is a formidable domain
• Contributions:
  – 10 key lessons learned for a larger project
  – Qualitative and quantitative evaluation data

Page 8:

10 Key Lessons

• Much of the text is irrelevant (“fluff”)

• Much important knowledge is conveyed by examples & diagrams

• General principles are rarely spelt out clearly

• Text is full of ambiguity, metaphor, and metonymy/“loosespeak”

• Declarative knowledge may be hidden in procedural descriptions

• Text creates disconnected knowledge, which may not chain well

• Discourse structure is important

• Generic sentences are ubiquitous

• Many sentences pose major representational challenges

• Traditional KR structures are difficult to extend

Page 9:

Two Reformulations into CPL…

• Reformulation of the whole 5 pages into CPL
  – Approximately 250 sentences
  – Syntactic conversion + pseudo-logic
  – generally not inference capable, esp. generics
  • Re-reformulation of first subsection into explicit if-thens
  • Inference capable but greater distance from source text
• Reformulation of key pieces into CPL
  – approximately 10 if-then rules
  – inference capable
  – barely recognizable from the original source text

Page 10:

Agenda

• This Seedling and Mobius
  – Major lessons learned
• Reformulations in CPL
  – Whole 5 pages
  – Key Sentences
• How do other texts compare?
• Generics
• How to identify “important” text
• Principles for an extensible KB
• Evaluation discussion
• Tuples as another source of knowledge

Page 11:

Some CPL Rules

IF a substance is an acid THEN the substance tastes sour.
IF an acid contacts an acid-sensitive dye THEN the acid changes the color of the dye.
IF a substance is a base THEN the substance tastes bitter.
IF a substance is a base THEN the substance feels slippery.
IF a substance is an acid THEN the substance contains hydrogen.
IF a thing is a base THEN the thing is a substance.
IF an Arrhenius base contacts water THEN the base emits OH-minus ions in the water.
IF an Arrhenius acid is dissolving in water THEN the dissolving is increasing the concentration of H-plus ions in the water.
IF an Arrhenius base is dissolving in water THEN the dissolving is increasing the concentration of OH-minus ions in the water.
IF a substance is a HCl substance THEN the substance is an Arrhenius acid.
IF hydrogen chloride gas is in water THEN the gas dissolves easily in the water.
IF hydrogen chloride gas is in water THEN the gas reacts with the water.
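A minimal Python sketch of how if-then rules like these could be chained by simple forward chaining. The string encoding of facts and the `BRIDGE` rule are assumptions for illustration only and are not part of CPL or the slides; the bridge shows that rules written in different vocabularies ("Arrhenius acid" vs. "acid") do not chain unless something links them.

```python
# Each rule is (set of premises, conclusion); facts are plain strings.
RULES = [
    ({"is an HCl substance"}, "is an Arrhenius acid"),
    ({"is an acid"}, "tastes sour"),
    ({"is an acid"}, "contains hydrogen"),
    ({"is a base"}, "tastes bitter"),
    ({"is a base"}, "feels slippery"),
]

# Assumption (not on the slide): a bridging rule linking the two vocabularies.
BRIDGE = ({"is an Arrhenius acid"}, "is an acid")


def forward_chain(facts, rules):
    """Apply rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


without_bridge = forward_chain({"is an HCl substance"}, RULES)
with_bridge = forward_chain({"is an HCl substance"}, RULES + [BRIDGE])
```

Without the bridging rule the chain stops at "is an Arrhenius acid"; with it, "tastes sour" and "contains hydrogen" also follow.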

Page 12:

Reformulation of the 5 pages…

• Note: introductory material, flowery language, fluff, complex sentences, parentheticals.

Page 13:

IF a substance is an acid THEN the substance tastes sour.
IF an acid contacts an acid-sensitive dye THEN the acid changes the color of the dye.
IF a substance is a base THEN the substance tastes bitter.
IF a substance is a base THEN the substance feels slippery.

Page 14:

IF a substance is a HCl substance THEN the substance is an Arrhenius acid.
IF hydrogen chloride gas is in water THEN the gas dissolves easily in the water.
IF hydrogen chloride gas is in water THEN the gas reacts with the water.
HCl is the chemical symbol for hydrogen chloride.
IF a substance is an aqueous solution of HCl substance THEN the substance is hydrochloric acid.
IF a substance is concentrated hydrochloric acid THEN 37 percent of the mass of the substance is HCl.
IF a substance is concentrated hydrochloric acid THEN the concentration of HCl in the substance is 12 M.

← (Implied but not explicit)

Page 15:

CPL:

IF a substance is an aqueous solution of HCl substance
THEN the substance is hydrochloric acid.

Halo KB style:

(every Hydrochloric-Acid has-definition
  (instance-of (Aqueous-Solution))
  (has-solute ((a HCl-Substance))))

Surface logical form:

the'(e1,x1,e2) & aqueous'(e3,x1) & solution'(e2,x1) & of'(e4,x1,x2) & hcl'(e5,x2) & know'(e6,z1,x1,x3) & as'(e7,e6,x3) & hydrochloric'(e8,x3) & acid'(e9,x3)

Page 16:

Summary of Interpretation Challenges

• Interpreting generics.
  – "Acids cause some dyes to change color."
• how to handle negation.
  – "Some substances containing hydrogen are not acids."
  – "The transfer leaves no undissociated acid molecules"
• Vague attributes ("properties", "due to")
  – "Properties of aqueous solutions of Arrhenius acids are due to H-plus ions"
• coreference with nominalizations ("react"/"reaction")
  – "Hydrogen chloride reacts... The reaction produces..."
• naming: how to represent both the name and the symbol for a chemical.
  – "An aqueous solution of HCl is called hydrochloric acid."
• how to get new technical vocabulary + meanings into the system.
  – "NaOH dissociates in water."
  – "H2O abstracts the proton from HX"
• how to represent definitions.
  – "Arrhenius acids are defined..."
• how to state that one category is more general than another.
  – "Bronsted-Lowry acids are more general than Arrhenius acids."

Page 17:

Summary of Interpretation Challenges (cont)

• how to represent "sometimes".
  – "An H3O-plus ion sometimes reacts with an H2O molecule."
• how to represent modals/tendencies like "can".
  – "A molecule of a Bronsted-Lowry acid can donate a proton..."
• how to represent an argument (proof), and generalize from it.
  – "Therefore, the H2O molecule acts as a Bronsted-Lowry base."
  – "Substances with negligible acidity contain hydrogen, but the substances do not behave as acids in water."
• vagueness ("is mostly", "nearby", "some")
  – "The NH4Cl is mostly solid particles."
  – "Some acids are better proton donors than other acids."
  – "A weak acid partly transfers the acid's protons to the water."
  – "Proton-transfer reactions are governed by the relative strengths of the bases"
  – "The solution has a negligible concentration of HCl molecules."
  – "An aqueous solution of acetic acid consists mainly of HC2H3O2 molecules"
  – "The aqueous solution has relatively few H3O-plus ions"
• metonymy
  – "The H2O molecule in Equation 16.5 donates a proton"
  – "In Equation 16.9 HX dissolves in water."
  – "Equation 16.9 describes the behavior of a strong acid in water."

Page 18:

Summary of Interpretation Challenges (cont)

• definitions with negation.
  – "An H-plus ion is a proton with no valence electron."
• presuppositions
  – "Acids cause some dyes to change color."
  – "A Bronsted-Lowry acid always reacts with a nearby Bronsted-Lowry base."
• generalized formulae and equations
  – "In Equation 16.6 the symbol HX denotes an acid."
• how to compute and represent differences
  – "An acid and a base differing only in a proton are called a conjugate pair"
• how to handle definite references ("the" base) that haven't been introduced.
  – "Removing a proton from the acid produces the conjugate base."
• change over time
  – "The HNO2 molecule becomes the NO2-minus ion."
  – "The H2O molecule changes into the hydronium ion"
  – "Acids cause some dyes to change color."
• semi-malformed sentences
  – "A stronger acid has a weaker conjugate base."
• How to state and represent hypothetical situations.
  – "Assume that H2O is a stronger base than X-minus in Equation 16.9."

Page 19:

Summary of Interpretation Challenges (cont)

• Generalization from examples
  – “In any reaction we can identify two sets of conjugate acid-base pairs. For example, consider the reaction…”
• Information in tables and diagrams

Page 20:

Agenda

• This Seedling and Mobius
  – Major lessons learned
• Reformulations in CPL
  – Whole 5 pages
  – Key Sentences
• How do other texts compare?
• How to identify “important” text
• Principles for an extensible KB
• Evaluation discussion
• Tuples as another source of knowledge

Page 21:

Recall from Last Time …

• Most of the textbook sentences are “fluff” and examples
  – and are not needed to solve test questions
• A few key sentences (and a table) are the heart of this section of the textbook
  – and are often given in italics
• These key sentences are not worded as precisely as needed for automatic translation into axioms that can chain together to solve a problem
  – in fact, some parts are not stated at all
  – students look at diagrams and examples and figure it out

Page 22:

Overview

• 4 key pieces of knowledge in the Section:
  – Computing the direction of the reaction
    • Rewriting in CPL
    • Compare to UT’s KM encoding
    • Compare to ISI’s shallow logical form
  – Identifying the acids/bases in a reaction
  – Computing the conjugate of an acid/base
  – Comparing the strengths of two acids/bases

Page 24:

A Key Sentence in Our Textbook

• Let’s look at one example of a key sentence:

• “From these examples we conclude that in every acid-base reaction the position of the equilibrium favors transfer of the proton to the stronger base.”

• Restated in Sample Exercise 16.3:

• “Thus, the equilibrium favors the direction in which the proton moves from the stronger acid and becomes bonded to the stronger base.”

• “In other words, the reaction favors consumption of the stronger acid and stronger base and formation of the weaker acid and weaker base.”

Page 26:

Rewriting a Sentence into CPL

Textbook:

“In every acid-base reaction the position of the equilibrium favors transfer of the proton to the stronger base.”

Naïve Encoding 1:

IF there is a reaction
AND one base in the reaction is stronger than the other base in the reaction
THEN the direction of the reaction is away from the stronger base.
[“favors transfer to” → “direction is away from”]

Naïve Encoding 2:

IF there is a reaction
AND there is a base on the left side of the reaction
AND there is a base on the right side of the reaction
AND the first base is stronger than the second base
THEN the direction of the reaction is to the right.

Page 27:

Further Refinement of the CPL

Naïve Encoding 2:

IF there is a reaction
AND there is a base on the left side of the reaction
AND there is a base on the right side of the reaction
AND the first base is stronger than the second base
THEN the direction of the reaction is to the right.

“The chemical entity whose formula is on the left side of the equation of the reaction and which plays a base role”

Page 28:

Final CPL Rule That Worked!

IF there is an equation of a reaction

AND a first chemical entity has a chemical formula
AND the first chemical formula is part of the left side of the equation
AND the first chemical entity is playing a base role
[“the base on the LHS”]

AND a second chemical entity has a second chemical formula
AND the second chemical formula is part of the right side of the equation
AND the second chemical entity is playing a base role
[“the base on the RHS”]

AND the first chemical entity is stronger than the second chemical entity
[means “stronger base than”]

THEN the direction of the reaction is right [to the right]
AND the equilibrium side of the reaction is right. [lies on the right]

(UT’s rep. uses Reaction, but should use Equation)

Page 29:

Compare Sentence to Final CPL

• In every acid-base reaction the position of the equilibrium favors transfer of the proton to the stronger base.

• IF there is an equation of a reaction
  AND a first chemical entity has a chemical formula
  AND a second chemical entity has a second chemical formula
  AND the first chemical formula is part of the left side of the equation
  AND the second chemical formula is part of the right side of the equation
  AND the first chemical entity is playing a base role
  AND the second chemical entity is playing a base role
  AND the first chemical entity is stronger than the second chemical entity
  THEN the direction of the reaction is right
  AND the equilibrium side of the reaction is right.

• (There’s a 2nd rule like this that concludes the direction is left)

not actually used!
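The final rule and its mirror image can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Entity` and `Equation` classes, the `direction` function, and the `STRONGER_BASE_THAN` set are hypothetical names, not the CPL interpreter's output or the KB's vocabulary. The example equation is HCl + H2O → H3O+ + Cl-.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    formula: str
    role: str  # "acid" or "base" (hypothetical encoding of the role played)


@dataclass(frozen=True)
class Equation:
    left: tuple   # entities whose formulas are on the left side
    right: tuple  # entities whose formulas are on the right side


def direction(equation, stronger_base_than):
    """Return "right", "left", or None if no rule fires.

    stronger_base_than is a set of (stronger_formula, weaker_formula) pairs.
    """
    left_bases = [e for e in equation.left if e.role == "base"]
    right_bases = [e for e in equation.right if e.role == "base"]
    for first in left_bases:
        for second in right_bases:
            if (first.formula, second.formula) in stronger_base_than:
                return "right"  # the rule shown on the slide
            if (second.formula, first.formula) in stronger_base_than:
                return "left"   # the second, mirrored rule
    return None


hcl_in_water = Equation(
    left=(Entity("HCl", "acid"), Entity("H2O", "base")),
    right=(Entity("H3O+", "acid"), Entity("Cl-", "base")),
)
STRONGER_BASE_THAN = {("H2O", "Cl-")}
result = direction(hcl_in_water, STRONGER_BASE_THAN)
```

Here the base on the left (H2O) is stronger than the base on the right (Cl-), so the first rule fires and the direction is "right".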

Page 30:

KM Generated from CPL

IF
– (_Equation7461 equation-of _Reaction7462)
chem. on LHS:
– (|_Chemical Entity7468| has-chemical-formula |_Chemical Formula7469|)
– (|_Chemical Formula7469| equal _Part7485)
– (_Part7485 is-part-of |_Left Side7483|)
– (|_Left Side7483| is-region-of _Equation7461)
chem. on RHS:
– (|_Chemical Entity7475| has-chemical-formula |_Chemical Formula7476|)
– (|_Chemical Formula7476| equal _Part7494)
– (_Part7494 is-part-of |_Right Side7492|)
– (|_Right Side7492| is-region-of _Equation7461)

– (|_Chemical Entity7468| plays |_Base Role7501|)
– (|_Chemical Entity7475| plays |_Base Role7508|)
– (|_Chemical Entity7468| stronger-base-than |_Chemical Entity7475|)

THEN
– (_Direction7518 value *right)
– (_Direction7518 direction-of _Reaction7462)
– (|_Equilibrium Side7524| property *right)
– (|_Equilibrium Side7524| equilibrium-side-of _Reaction7462)

Page 31:

Structure of the CPL Axioms

1. Find equilibrium side (or direction) of equation
2. Find out if a chemical is playing a base role in the equation
3. Find out if a chemical is the conjugate base of another chemical
   3a. Look in Table, or …
   3b. Check whether one formula differs from another in an H+
4. Check whether one base is stronger than another base
   4a. Look in Table

(not in CPL)

Page 32:

Notes on our CPL Rule

• The wording is way different from the original text!

• The literal sentence translation would not have produced anything that could solve a problem, given an equation

• “In every acid-base reaction the position of the equilibrium favors transfer of the proton to the stronger base.”
  – this would create a Favoring event
  – the position of the equilibrium is the agent
  – the transfer of the proton is the object
  – what does this mean?

Page 33:

Overview

• 4 key pieces of knowledge in the Section:
  – Computing the direction of the reaction
    • Rewriting in CPL
    • Compare to UT’s KM encoding
    • Compare to ISI’s shallow logical form
  – Identifying the acids/bases in a reaction
  – Computing the conjugate of an acid/base
  – Comparing the strengths of two acids/bases

Page 34:

How UT Encoded This

• "In acid/base equilibrium reactions, the reaction proceeds in the direction of the side where equilibrium lies" [their comment for use in explanations]

• (every Reaction has …
    (direction ((if (not (the direction of Self))
                 then (a Direction-Value with
                        (value ((if (the output of
                                      (a Compute-Equilibrium-Position with (input (Self))))
                                 then (if ((the output of
                                             (a Compute-Equilibrium-Position with (input (Self))))
                                           = (the raw-material of Self))
                                       then *left else *right)))))

Paraphrase:
– To find the direction of a reaction…
– Compute the equilibrium position …
– If the chemicals match the raw materials
– Then the direction is left, else right

Page 35:

UT’s Compute-Equilibrium-Position

(every Compute-Equilibrium-Position has
  (input ((a Reaction)))
  (output (
    ;; See if both the strong acid and base are on the LHS.
    (if (;; Check the acids.
         ((the output of
            (a Compare-Relative-Strengths-of-Acids with
               (input (
                 (oneof (the raw-material of (the input of Self)) where (the Acid-Role plays of It))
                 (oneof (the result of (the input of Self)) where (the Acid-Role plays of It))))))
          = (oneof (the raw-material of (the input of Self)) where (the Acid-Role plays of It)))
         and
         ;; Check the bases.
         ((the output of
            (a Compare-Relative-Strengths-of-Bases with
               (input (
                 (oneof (the raw-material of (the input of Self)) where (the Base-Role plays of It))
                 (oneof (the result of (the input of Self)) where (the Base-Role plays of It))))))
          = (oneof (the raw-material of (the input of Self)) where (the Base-Role plays of It))))
     then (the result of (the input of Self))
     else (the raw-material of (the input of Self))))))

Paraphrase:
– If the stronger of… the raw material acid… and the result acid… is the raw material acid…
– (same for bases)
– then equilibrium is on the result side, else the raw material side
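For readers unfamiliar with KM, the logic of the two UT frames can be paraphrased in Python roughly as below. This is a sketch, not UT's code: the dictionary layout of a reaction and the numeric `acid_rank` / `base_rank` tables are assumptions standing in for UT's Compare-Relative-Strengths-of-Acids and Compare-Relative-Strengths-of-Bases methods.

```python
def compute_equilibrium_position(reaction, acid_rank, base_rank):
    """Return "result" if the stronger acid and the stronger base are both
    raw materials, otherwise "raw_material" (mirrors the if/then/else above)."""
    raw, result = reaction["raw_material"], reaction["result"]
    stronger_acid_is_raw = acid_rank[raw["acid"]] > acid_rank[result["acid"]]
    stronger_base_is_raw = base_rank[raw["base"]] > base_rank[result["base"]]
    if stronger_acid_is_raw and stronger_base_is_raw:
        return "result"
    return "raw_material"


def reaction_direction(reaction, acid_rank, base_rank):
    """Direction is left if equilibrium lies with the raw materials, else right."""
    side = compute_equilibrium_position(reaction, acid_rank, base_rank)
    return "left" if side == "raw_material" else "right"


# Hypothetical strength rankings (higher number = stronger); illustration only.
ACID_RANK = {"HCl": 2, "H3O+": 1}
BASE_RANK = {"H2O": 2, "Cl-": 1}

forward = {
    "raw_material": {"acid": "HCl", "base": "H2O"},
    "result": {"acid": "H3O+", "base": "Cl-"},
}
backward = {"raw_material": forward["result"], "result": forward["raw_material"]}

forward_direction = reaction_direction(forward, ACID_RANK, BASE_RANK)
backward_direction = reaction_direction(backward, ACID_RANK, BASE_RANK)
```

The paraphrase makes the procedural flavor visible: the answer is computed by a nested method call on the reaction rather than stated as a declarative relation between bases.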

Page 36:

Notes on UT’s Encoding

• Very procedural!

• Various procedural methods are encoded
  – both qualitative and quantitative

• Nothing like the textbook sentences

• Their representation does not match the natural conceptual model we expected
  – see the next slide

Page 37:

Mismatches between UT and CPL

• UT put a “direction” slot on a Reaction, we expected it to be on an Equation

• UT has no model of the left and right sides of an Equation, only the “raw-materials” and “result” slots of a Reaction

• UT has a Conjugate-Acid-Base-Pair concept, but lacks the conjugate-base & conjugate-acid relations we expected

• UT has no slot for the “equilibrium-side” of an Equation, only the “direction” of a reaction

Page 38:

More Mismatches between UT and CPL

• UT gives us no primitives to use for formula manipulation (adding an H+), it’s buried within their Compute-Conjugate-Acid

• UT’s model of Formula does not include a “charge” slot, they’ve only attached it to the Chemical itself

• UT has no notion of “stronger-base-than,” they only label a chemical with “intensity” = strong or weak.

• So, it would help if the conceptual model were closer to natural language!

Page 39:

Overview

• 4 key pieces of knowledge in the Section:
  – Computing the direction of the reaction
    • Rewriting in CPL
    • Compare to UT’s KM encoding
    • Compare to ISI’s shallow logical form
  – Identifying the acids/bases in a reaction
  – Computing the conjugate of an acid/base
  – Comparing the strengths of two acids/bases

Page 40:

ISI’s Shallow Logical Form for our Sentence

from'(e1,e2,x1) & these'(e3,s1,e4) & example'(e4,x1) & plural'(e7,x1,s1) & we'(e8,x2) & plural'(e9,x2,s2) & conclude'(e2,x2,x3,z1) & that'(e10,e2,e11) & in'(e12,e11,x4) & every'(e13,x4,e14) & acid-base'(e15,x4) & reaction'(e14,x4) & the'(e16,x5,e17) & position'(e17,x5) & of'(e18,x5,x6) & the'(e19,x6,e20) & equilibrium'(e20,x6) & favor'(e11,x5,x7,z2) & transfer'(e21,x7) & of'(e22,x7,x8) & the'(e23,x8,e24) & proton'(e24,x8) & to'(e25,x7,x9) & the'(e26,x9,e27) & strong'(e28,x9) & base'(e27,x9)

“From these examples we conclude that in every acid-base reaction the position of the equilibrium favors transfer of the proton to the stronger base.”
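To inspect a shallow logical form like this one, the conjuncts can be split into (predicate, arguments) tuples. The sketch below is an assumption about how one might do that with a regular expression; `parse_slf` and `mentioning` are hypothetical helper names, not part of ISI's tooling.

```python
import re

# One conjunct looks like: name'(arg1,arg2,...); names may contain hyphens.
ATOM = re.compile(r"([\w-]+)'\(([^)]*)\)")


def parse_slf(text):
    """Split a shallow logical form into (predicate, argument tuple) pairs."""
    atoms = []
    for match in ATOM.finditer(text):
        predicate = match.group(1)
        args = tuple(arg.strip() for arg in match.group(2).split(","))
        atoms.append((predicate, args))
    return atoms


def mentioning(atoms, variable):
    """List the predicates that have the given variable among their arguments."""
    return [predicate for predicate, args in atoms if variable in args]


fragment = (
    "favor'(e11,x5,x7,z2) & transfer'(e21,x7) & of'(e22,x7,x8) "
    "& proton'(e24,x8) & acid-base'(e15,x4)"
)
atoms = parse_slf(fragment)
about_transfer = mentioning(atoms, "x7")
```

Collecting the predicates that share a variable (here `x7`, the transfer) is one way to see how close the form stays to the syntactic structure of the sentence.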

Page 41:

Graph of ISI’s Shallow Logical Form

[Graph; node and edge labels as extracted:]
– z1 = conclude(x2, x3); x2 = we; x3 = [missing!]
– from(x1); x1 = example; these
– that
– in(x4)?; x4 = reaction; every(x4); acid-base(x4)
– z2 = favor(x5, x7)
– x5 = position; of(x5, x6); x6 = equilibrium
– x7 = transfer; of(x7, x8); x8 = proton; to(x7, x9); x9 = base; strong(x9)
– Several links are marked “?”

Page 42:

Notes on ISI’s Shallow Logical Form

• Not far removed from a syntactic parse

• They plan to do much more development of this

• Will probably produce a literal translation
  – there will be a Favoring event, with agent & object

• As with the naïve CPL sentence, a literal translation would not help solve a Chemistry problem

Page 43:

Overview

• 4 key pieces of knowledge in the Section:
  – Computing the direction of the reaction
    • Rewriting in CPL
    • Compare to UT’s KM encoding
    • Compare to ISI’s shallow logical form
  – Identifying the acids/bases in a reaction
  – Computing the conjugate of an acid/base
  – Comparing the strengths of two acids/bases

Page 45:

CPL for 2nd Key Sentence

• “In any acid-base (proton transfer) reaction we can identify two sets of conjugate acid-base pairs.”

• IF there is an equation of a reaction
  AND a first chemical entity has a chemical formula
  AND a second chemical entity has a second chemical formula
  AND the first chemical formula is part of the left side of the equation
  AND the second chemical formula is part of the right side of the equation
  AND the first chemical entity is the conjugate base of the second chemical entity
  THEN the first chemical entity is playing a base role
  AND the second chemical entity is playing an acid role.

• (There’s a 2nd rule like this with first & second reversed)

Page 46:

UT Code for 2nd Key Sentence

(every Chemical has
  (plays (
    (if ((the term of (the atomic-chemical-formula of (the has-basic-structural-unit of Self)))
         and (not (the Base-Role plays of Self)))
     then (if ((has-value
                (oneof (the result of (the Reaction raw-material-of of Self))
                 where (((the elements of (the term of (the atomic-chemical-formula of (the has-basic-structural-unit of It))))
                         = (forall2 (the elements of (the term of (the atomic-chemical-formula of (the has-basic-structural-unit of Self))))
                                    (if ((the2 of It2) = H) then (:pair ((the1 of It2) + 1) H) else It2)))
                        or...
           then (a Base-Role)

Paraphrase:
– “IF one of the chemicals on the other side of the reaction…” (jump to the other side of the equation!)
– “… has an extra H”
– “…THEN this chemical’s a base”

[Diagram: a Reaction linked to one Chemical by raw-material and to another Chemical by result.]
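The “has an extra H” test buried in this code amounts to comparing element counts of two formulas. A minimal Python sketch of that check follows, under the assumption that formulas are written as plain strings without charges (for example "H3O" rather than "H3O+"); `element_counts` and `has_one_extra_h` are hypothetical names, not UT primitives.

```python
import re
from collections import Counter


def element_counts(formula):
    """Count atoms per element in a flat formula such as "HC2H3O2"."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def has_one_extra_h(acid, base):
    """True if `acid` equals `base` plus exactly one hydrogen atom."""
    difference = element_counts(acid)
    difference.subtract(element_counts(base))
    nonzero = {symbol: n for symbol, n in difference.items() if n != 0}
    return nonzero == {"H": 1}


pairs = [("HCl", "Cl"), ("H3O", "H2O"), ("HC2H3O2", "C2H3O2")]
checks = [has_one_extra_h(acid, base) for acid, base in pairs]
```

Exposing this comparison as its own small function is the kind of formula-manipulation primitive that a more declarative encoding could reuse.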

Page 47:

Overview

• 4 key pieces of knowledge in the Section:
  – Computing the direction of the reaction
    • Rewriting in CPL
    • Compare to UT’s KM encoding
    • Compare to ISI’s shallow logical form
  – Identifying the acids/bases in a reaction
  – Computing the conjugate of an acid/base
  – Comparing the strengths of two acids/bases

• These last two items are presented in a table

Page 48:

Conjugate Acid-Base Pairs

[Slide columns: CPL | Textbook]

IF there is an HCl and a Cl-Minus
THEN the conjugate base of the HCl is the Cl-Minus.

IF there is an H3O-Plus and an H2O
THEN the conjugate base of the H3O-Plus is the H2O.

Etc.

Page 49:

Relative Strengths of Bases

[Slide columns: CPL | Textbook]

IF there is a Cl-Minus and an HSO4-Minus
THEN the HSO4-Minus is a stronger base than the Cl-Minus.

IF there is an HSO4-Minus and an NO3-Minus
THEN the NO3-Minus is a stronger base than the HSO4-Minus.

IF there is an NO3-Minus and an H2O
THEN the H2O is a stronger base than the NO3-Minus.

Etc.
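Each CPL rule here records only one adjacent pair from the textbook table, so comparing two bases that are not adjacent needs the pairs to be chained. A minimal Python sketch of that chaining is shown below; the `STRONGER_THAN` dictionary and `is_stronger_base` function are hypothetical names for illustration, and treating "stronger base than" as transitive is an assumption not stated on the slide.

```python
# Adjacent pairs from the three rules above: stronger base -> weaker bases.
STRONGER_THAN = {
    "HSO4-": ["Cl-"],
    "NO3-": ["HSO4-"],
    "H2O": ["NO3-"],
}


def is_stronger_base(a, b, table=STRONGER_THAN):
    """True if a chain of recorded pairs leads from a down to b."""
    stack, seen = [a], set()
    while stack:
        current = stack.pop()
        for weaker in table.get(current, []):
            if weaker == b:
                return True
            if weaker not in seen:
                seen.add(weaker)
                stack.append(weaker)
    return False


water_vs_chloride = is_stronger_base("H2O", "Cl-")
chloride_vs_water = is_stronger_base("Cl-", "H2O")
```

With only the three pairwise rules, H2O versus Cl- is answerable only through two intermediate steps.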

Page 50:

Lessons from Key Sentences - 1

• The key sentences did not translate literally into useful logic
  – they had to be carefully rewritten in CPL
  – and knowledge was added from studying diagrams and examples
  – and they were tested with each other to chain together

• It was difficult to make use of the UT representations
  – they were very procedural
  – their representations were further removed from the English
  – so, we should use more natural representations

• ISI’s shallow logical forms may produce literal translations
  – again, not useful for solving problems

Page 51:

Lessons from Key Sentences - 2

• Reading knowledge directly from a Chemistry text would be very challenging

– the knowledge has to be written precisely enough for a computer (with little common sense) to encode

– knowledge in tables and diagrams may be critical

– the knowledge has to chain together to solve difficult exam problems

– we need text that is written much more dryly and precisely

– we need a domain that doesn’t have such difficult exam problems

Page 52:

Agenda

• This Seedling and Mobius
  – Major lessons learned
• Reformulations in CPL
  – Whole 5 pages
  – Key Sentences
• How do other texts compare?
• Generics
• How to identify “important” text
• Principles for an extensible KB
• Evaluation discussion
• Tuples as another source of knowledge

Page 53:

Are Other Chemistry Texts Better?

• We looked at Web explanations and at ‘Chemistry Made Simple’ types of books

• Discovered that each teacher explains it differently

• Most jump right into quantitative formulas for computing where a reaction’s equilibrium lies

– but our textbook teaches it qualitatively first, which is rare

• Other sources are not any easier to process

Page 54:

Examples of Other Sources

• “Think of a Bronsted acid-base reaction as a competition between the 2 bases in the system for protons. The stronger base ‘wins’ and forces the equilibrium in the direction of the weaker acid and base.” (Web)

• [some books say that an acid is a proton donor] “… the acid molecule does not ‘give’ or ‘donate’ the proton, it has it taken away. In the same sense, you do not donate your wallet to the pickpocket, you have it removed from you.” (another website)

• “The base is a molecule with a built-in ‘drive’ to collect protons. As soon as the base approaches the acid, it will (if it is strong enough) rip the proton off the acid molecule and add it to itself.”

Page 55:

More Examples from Other Sources

• “You see, some bases are stronger than others, meaning some have a large ‘desire’ for protons, while other bases have a weaker drive. It’s the same way with acids, some have very weak bonds and the proton is easy to pick off, while other acids have stronger bonds, making it harder to ‘get the proton’.”

• “Remember that an acid-base reaction is a competition between two bases (think about it!) for a proton. If the stronger of the two acids and the stronger of the two bases are reactants (appear on the left side of the equation), the reaction is said to proceed to a large extent.”

• Note the heavy use of metaphors in these qualitative explanations!

• The more readable by humans, the less readable by computers!

Page 56:

Agenda

• This Seedling and Mobius
  – Major lessons learned
• Reformulations in CPL
  – Whole 5 pages
  – Key Sentences
• How do other texts compare?
• Generics
• How to identify “important” text
• Principles for an extensible KB
• Evaluation discussion
• Tuples as another source of knowledge

Page 58: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Review

• Earlier analysis:
  – Much of the textbook is “irrelevant” for the purposes of computer-based reading
    • motivational material, illustrative material, humor
  – Other sentences/parts are critical
• Questions:
  – Can a computer automatically find the critical items?
  – What cues might indicate the important material?

Page 59: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

This brief analysis…

• Here, just consider two categories:
  – important vs. unimportant material
• Categories of surface cues:
  – linguistic
  – context
  – layout
  – typography (e.g., font changes)
• Looked at several textbooks:
  – B&L, Chemistry Made Simple, Cliffs Notes

Page 60: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Cues for Importance/Unimportance

• Verb tense: past tense suggests irrelevance
  – chemical facts are generally presented in the present tense; past tense usually signals a historical digression; but biological facts include evolutionary facts, which require past tense.
• Cue phrases for important generalizations
  – “for example” and (less so) “thus” precede examples but follow important generalizations.

Page 61: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Cues for Importance/Unimportance

• Long sentences (>20 words) suggest irrelevance
  – Average sentence length for chemistry is about 15 words; for biology, about 24 words.
  – 15 words seems to allow a good balance of simplicity and complexity for stepping through explanations. CPL should target this number.
  – Summaries tend to have long, complex sentences that are harder to process. This is also true for sentences in review texts: Cliffs Notes, Instant Notes.

Page 62: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Cues for Importance/Unimportance

• Everyday words suggest applications.
• Nominalized verbs suggest irrelevance
  – exception: basic chemical changes (e.g., reaction, combustion, evaporation)
• Keywords:
  – “if”, “when”, “because”, “for” indicate important sentences
  – “For example” precedes an illustration
    • also indicates that the preceding text is an important generality
  – “although”: typically part of a fluffy sentence

Page 63: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Cues for Importance/Unimportance

• Definitional patterns: important!
  – “x is substance y”, “x is a y that does z”, “x is called y”
• First and last sentences in a paragraph tend to be important (unless transitional)
  – set the topic of the paragraph
• Text in bold or italics is often important
• Repetition: could this be exploited?
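A few of the cues listed on these slides can be combined into a rough sentence scorer. The following is only a sketch: the function name, the regular expressions, and especially the weights are assumptions made for illustration, not values taken from the analysis.

```python
import re

# Weights and patterns are illustrative assumptions, not results from the study.
IMPORTANT_KEYWORDS = {"if", "when", "because", "for"}
DEFINITION_PATTERNS = [
    re.compile(r"\bis called\b"),        # "x is called y"
    re.compile(r"\bis an? \w+ that\b"),  # "x is a y that does z"
]


def importance_score(sentence: str) -> int:
    """Rough score: positive suggests important, negative suggests fluff."""
    text = sentence.lower()
    words = re.findall(r"[a-z']+", text)
    score = 0
    if text.startswith("for example"):
        score -= 1         # the sentence itself is an illustration
        words = words[2:]  # do not count this "for" as a keyword
    if len(words) > 20:
        score -= 1         # long sentences suggest irrelevance
    if "although" in words:
        score -= 1         # typically part of a fluffy sentence
    if IMPORTANT_KEYWORDS & set(words):
        score += 1
    if any(p.search(text) for p in DEFINITION_PATTERNS):
        score += 2         # definitional patterns are marked "important!"
    return score
```

Under these assumed weights, a definitional sentence such as “An acid is a substance that donates a proton.” scores higher than an illustration introduced by “For example”.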

Page 64: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Summary

• Many surface cues exist
• Could identify important material by
  – surface cues
  – “deeper” model of the document structure
    • e.g. Motivation → General principle → Example → Reinforce general principle
• Could the document automatically be turned into a labeled, networked structure like this?
• How document-specific are these patterns?

Page 65: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Agenda

• This Seedling and Mobius
  – Major lessons learned
• Reformulations in CPL
  – Whole 5 pages
  – Key Sentences
• How do other texts compare?
• Generics
• How to identify “important” text
• Principles for an extensible KB
• Evaluation discussion
• Tuples as another source of knowledge

Page 66: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Principles for an Extensible KB

Elaboration Tolerance:

A formalism is elaboration tolerant to the extent that it is convenient to modify a set of facts expressed in the formalism to take into account new phenomena or changed circumstances. [John McCarthy]

e.g.: add/modify knowledge (semantics) by (only) adding formulae (syntactics)

Three Key Desirables for this:

• Syntactic simplicity
• Metonymy-tolerant reasoning
• Separate procedural and declarative knowledge

Page 67: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Syntactic Simplicity

Many syntactically large and complex structures in the original Halo KB, e.g.,

(every Acid-Role has
  (intensity (
    (a Intensity-Value with
      (value (
        (:pair
          ;; Case statement for Acids.
          (if ((the played-by of Self) isa Ionic-Compound-Substance)
           then (if (((the played-by of Self) isa HCl-Substance) or
                     ((the played-by of Self) isa HBr-Substance) or
                     ((the played-by of Self) isa HI-Substance) or
                     ((the played-by of Self) isa HClO3-Substance) or
                     ((the played-by of Self) isa HClO4-Substance) or
                     ((the played-by of Self) isa H2SO4-Substance) or
                     ((the played-by of Self) isa HNO3-Substance))
                 then *strong
                 else

Not elaboration-tolerant

Page 68: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Syntactic Simplicity

Better would be to factor them into smaller units, e.g.,

intensity(HCl-Substance, *strong)
intensity(HBr-Substance, *strong)
intensity(HI-Substance, *strong)
intensity(HClO3-Substance, *strong)
intensity(HClO4-Substance, *strong)
intensity(H2SO4-Substance, *strong)
intensity(HNO3-Substance, *strong)
…
intensity(HF-Substance, *weak)
intensity(HC2H3O2-Substance, *weak)
intensity(H2CO3-Substance, *weak)
…

Elaboration-tolerant

Page 69: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

CPL Produces Syntactically Simple Structures…

“Traditional” KM:

(every Compare-Relative-Strengths-of-Acids has
  (output ((if ((the intensity of (the first of (the Chemicals)) = *strong)
            and ((the intensity of (the second of (the Chemicals)) = *weak)
            then (the strongest of (the Chemicals)) = (the first of (the Chemicals)))))

CPL triples:

IF   (_Intensity9 instance-of Intensity-Value)
     (_Chemical8 instance-of Chemical)
     (_Intensity5 instance-of Intensity-Value)
     (_Chemical4 instance-of Chemical)
     (_Intensity5 property *strong)
     (_Intensity5 intensity-of _Chemical4)
     (_Intensity9 property *weak)
     (_Intensity9 intensity-of _Chemical8)
THEN (_Chemical4 stronger-than _Chemical8)
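To illustrate why flat triples are easy to work with, here is a minimal sketch of applying the IF/THEN rule on this slide to a set of ground triples. It is not the CPL interpreter or KM; the function name and the tuple representation are assumptions chosen for the example.

```python
def stronger_than(triples):
    """Derive (a, 'stronger-than', b) facts from intensity triples.

    Mirrors the slide's rule: a Chemical with a *strong Intensity-Value
    is stronger than a Chemical with a *weak Intensity-Value.
    """
    facts = set(triples)

    def instances(cls):
        return {s for (s, p, o) in facts if p == "instance-of" and o == cls}

    chemicals = instances("Chemical")
    values = instances("Intensity-Value")

    def chemicals_with(prop):
        # Chemicals whose intensity value carries the given property.
        return {
            c
            for v in values
            if (v, "property", prop) in facts
            for c in chemicals
            if (v, "intensity-of", c) in facts
        }

    return {
        (a, "stronger-than", b)
        for a in chemicals_with("*strong")
        for b in chemicals_with("*weak")
    }


# Hypothetical ground facts in the same triple shape as the slide.
example = [
    ("i1", "instance-of", "Intensity-Value"),
    ("HCl", "instance-of", "Chemical"),
    ("i1", "property", "*strong"),
    ("i1", "intensity-of", "HCl"),
    ("i2", "instance-of", "Intensity-Value"),
    ("HF", "instance-of", "Chemical"),
    ("i2", "property", "*weak"),
    ("i2", "intensity-of", "HF"),
]
derived = stronger_than(example)
```

Because each condition is a separate triple, adding a new chemical only requires adding more triples; the rule itself stays unchanged.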

Page 70: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Metonymy/Loosespeak

• Metonymy: One word substitutes for a closely related word
• Loosespeak: More generally, the “literal” interpretation is wrong
• Examples:
  – “The kettle is boiling.”
  – “I’m just going to change the washing machine.”
  – “It’s your turn to clean out the rabbit.”
  – “Remove a proton from the acid”
  – “The acid on the left of the equation”
  – “The reaction moves to the right”
  – “NaCl dissolves in water”

Page 71: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Handling Metonymy/Loosespeak

1. Detect inconsistencies / “unusualities”
  – Need extensive world knowledge for this
2. If found, create and evaluate alternative interpretations
  – Metonymic transformation rules (e.g., Lakoff, Fass)
    • PART for WHOLE (“Get your butt over here”)
    • PLACE for INSTITUTION (“The White House isn’t saying anything”)
    • PLACE for EVENT (“Remember the Alamo”)
    • SUBSTANCE for MOLECULE (“NaCl dissolves”)
    • FORMULA for SUBSTANCE (“NaCl is on the left of the eqn”)
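The two steps above (detect a type inconsistency, then try a transformation rule) can be sketched as a small lookup-driven coercion. Everything in this example is hypothetical: the type names, the relation names, and the toy tables stand in for the extensive world knowledge the slide says is needed.

```python
# Hypothetical toy knowledge: entity types and the argument type each predicate expects.
ENTITY_TYPE = {"White-House": "Place", "NaCl": "Substance"}
REQUIRED_TYPE = {"say": "Institution", "left-of-equation": "Formula"}

# Transformation rules: (literal type, required type) -> relation to follow.
COERCIONS = {
    ("Place", "Institution"): "institution-at",
    ("Substance", "Formula"): "formula-of",
}

# Hypothetical relation instances used to resolve a coercion.
RELATED = {
    ("White-House", "institution-at"): "US-Administration",
    ("NaCl", "formula-of"): "NaCl-Formula",
}


def interpret(predicate: str, entity: str) -> str:
    """Return the entity the predicate most plausibly applies to."""
    literal, required = ENTITY_TYPE[entity], REQUIRED_TYPE[predicate]
    if literal == required:
        return entity  # step 1: no inconsistency, keep the literal reading
    relation = COERCIONS.get((literal, required))
    if relation is None:
        raise ValueError(f"no metonymic rule from {literal} to {required}")
    return RELATED[(entity, relation)]  # step 2: alternative interpretation
```

A real system would need to rank several candidate interpretations rather than take the first rule that fits; this sketch only shows the shape of the detect-then-transform loop.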

Page 73: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Metonymy Tolerance

(every Compare-Relative-Strengths-of-Acids has
  (output ((if (((the1 of (the value of (the intensity of
                   (the Acid-Role plays of (the first of (the input of Self)))))) = *strong)
            and ((the1 of (the value of (the intensity of
                   (the Acid-Role plays of (the second of (the input of Self)))))) /= *strong))
            then (the first of (the input of Self)))))

if we had a metonymy-tolerant reasoner, we could instead write…

(every Compare-Relative-Strengths-of-Acids has
  (output ((if ((the intensity of (the first of (the Chemicals)) = *strong)
            and ((the intensity of (the second of (the Chemicals)) /= *strong)
            then (the strongest of (the Chemicals)) = (the first of (the Chemicals)))))

Page 74: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Separating Procedural and Declarative Knowledge

• Procedural descriptions are uni-directional, and difficult to introspect on

• Better: domain-specific, declarative knowledge + general-purpose procedural algorithms

“Every acid has a conjugate base, formed by removing a proton from the acid. ... Similarly, every base has associated with it a conjugate acid, formed by adding a proton to the base.”

Acid-Chemical = Base-Chemical + H+
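The point of the single relation Acid-Chemical = Base-Chemical + H+ is that one declarative statement can be used in both directions. A minimal sketch, assuming a species is represented as a pair of (atom counts, charge); this representation and the function names are illustrative assumptions, not the Halo KB's or CPL's encoding.

```python
def add_protons(species, n):
    """Return the species with n protons (H+) added; n may be negative."""
    atoms, charge = species
    new_atoms = dict(atoms)
    new_atoms["H"] = new_atoms.get("H", 0) + n
    if new_atoms["H"] < 0:
        raise ValueError("no proton available to remove")
    if new_atoms["H"] == 0:
        del new_atoms["H"]
    return new_atoms, charge + n


def is_conjugate_pair(acid, base):
    """Declarative check of: acid = base + H+."""
    return add_protons(base, 1) == acid


# The same relation answers both question directions.
HCl = ({"H": 1, "Cl": 1}, 0)
NH3 = ({"N": 1, "H": 3}, 0)
conjugate_base_of_HCl = add_protons(HCl, -1)  # Cl-
conjugate_acid_of_NH3 = add_protons(NH3, +1)  # NH4+
```

Two separate procedures ("remove a proton", "add a proton") would each answer only one of the two questions, which is the uni-directionality the slide warns about.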

Page 75: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Separating Procedural and Declarative Knowledge

Mixed:

(every Compare-Relative-Strengths-of-Acids has
  (output ((if (((the1 of (the value of (the intensity of
                   (the Acid-Role plays of (the first of (the input of Self)))))) = *strong)
            and ((the1 of (the value of (the intensity of
                   (the Acid-Role plays of (the second of (the input of Self)))))) /= *strong))
            then (the first of (the input of Self))
  [Compare-Relative-Strengths-of-Acids-output-1] )))

Declarative:

HCl *strong
H2CO3 *weak
… …

+

“Theory of magnitudes”: *strong > *weak > …

+

Procedural (PSM):

Find object(s) with qualitatively largest attribute value
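The decomposition on this slide can be sketched as three independent pieces: a fact table, an ordering over qualitative values, and one general-purpose procedure. The names below are assumptions for illustration, and `*negligible` is taken from the “strong/weak/negligible” scale mentioned elsewhere in the deck rather than from this slide.

```python
# Declarative, domain-specific facts.
INTENSITY = {"HCl": "*strong", "H2CO3": "*weak"}

# "Theory of magnitudes": qualitative values in ascending order.
# *negligible is an assumed lowest value, not shown on this slide.
MAGNITUDE_ORDER = ["*negligible", "*weak", "*strong"]


def qualitatively_largest(objects, attribute, order=MAGNITUDE_ORDER):
    """General-purpose procedure: objects whose attribute value ranks highest."""
    rank = {value: i for i, value in enumerate(order)}
    best = max(rank[attribute[o]] for o in objects)
    return [o for o in objects if rank[attribute[o]] == best]


strongest = qualitatively_largest(["HCl", "H2CO3"], INTENSITY)
```

Nothing in `qualitatively_largest` mentions acids, so the same procedure could be reused for any attribute with a qualitative scale, and new acids are added by extending `INTENSITY` alone.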

Page 76: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Agenda

• This Seedling and Mobius
  – Major lessons learned
• Reformulations in CPL
  – Whole 5 pages
  – Key Sentences
• How do other texts compare?
• Generics
• How to identify “important” text
• Principles for an extensible KB
• Evaluation discussion
• Tuples as another source of knowledge

Page 77: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Possible Quantitative Metrics

• Behavioral:
  – Ablation study: Question-answering performance
• Analytic:
  – Complexity of CPL vs Halo KB encodings
  – Amount of domain K added by Boeing in writing CPL
  – % of Halo KB that would be simplified if metonymy handled
  – % of original text encodable in CPL
  – Time taken to encode the KBs
  – % of source text which is important (vs. fluff)
  – Bar graph of textual phenomena vs. frequency of occurrence
    • e.g., metaphor, examples, metonymy, diagrams
  – Measure of redundancy in the text book

Page 78: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Behavioral Evaluation: Ablation Methodology

Approach:
a. Create set of questions
b. Send questions to Halo KB, measure % correct
c. Ablate the Halo KB, add in ours
d. Send questions to new KB, measure % correct

Issues:
• How to ensure a fair comparison?
  – defining the space of questions to look at
• How to ablate the UT KB?
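Steps a to d can be sketched as a small harness. This is a toy under strong assumptions: each KB is modeled as a plain question-to-answer dictionary, and all names and data are hypothetical; the real KBs answer by inference, not lookup.

```python
def percent_correct(answer, questions):
    """Steps b/d: share of questions (as a percentage) answered correctly."""
    correct = sum(1 for q, expected in questions if answer(q) == expected)
    return 100.0 * correct / len(questions)


def ablate_and_extend(kb, removed, ours):
    """Step c: drop the ablated entries, then add the CPL-derived ones."""
    ablated = {q: a for q, a in kb.items() if q not in removed}
    return {**ablated, **ours}


# Step a: hypothetical question set with expected answers.
questions = [("q1", "a"), ("q2", "b"), ("q3", "c"), ("q4", "d")]
halo_kb = {"q1": "a", "q2": "b"}
new_kb = ablate_and_extend(halo_kb, removed={"q2"}, ours={"q2": "b", "q3": "c"})

before = percent_correct(halo_kb.get, questions)
after = percent_correct(new_kb.get, questions)
```

The fairness issue raised on the slide shows up directly here: the scores depend entirely on which questions go into `questions`.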

Page 79: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Behavioral Evaluation: Relevant AP Questions from the Halo Pilot

Questions from Halo Pilot Syllabus & Sample Qns
• Q10. Given an equilibrium reaction, which species in the reaction act as bases?
• Q33. Each of the following can act as both a Bronsted acid and a Bronsted base EXCEPT ...

Questions from Challenge Exam, Project Halo
• Q18. Given an equilibrium reaction, the species that act as acids include which of the following?
• Q19. Given an equilibrium reaction, the correct acid/conjugate base pair is ...
• Q37. Which of the following species forms an acid when added to water?
• Q38. Which of the following (lists of chemicals) is in correct order of increasing acidity?

Page 80: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Behavioral Evaluation: Variations on a Theme…

• Four main question patterns:
  – What is the conjugate base/acid of X?
  – Is X stronger/weaker acid/base than Y?
  – Find the conjugate acid-base pairs in equation E
  – What is the direction of the equilibrium?

Page 81: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Core Knowledge Encodings

Task: Conjugate pairs
– Halo KB: Giant KM procedure for formula manipulation
– CPL: Lookup table

Task: Relative strengths
– Halo KB: Qualitative absolute strengths (strong/weak/negligible) + qualitative comparison
– CPL: Relative strength assertions

Task: Labelling acid/bases in a reaction
– Halo KB: Giant KM procedure for reaction manipulation
– CPL: if-then rule using conjugate pairs

Task: Computing direction of the reaction
– Halo KB: KM rule
– CPL: if-then rule

Page 82: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Core Knowledge Encodings

Task: Conjugate pairs
– Halo KB: Giant KM procedure for formula manipulation
– CPL: Lookup table

Task: Relative strengths
– Halo KB: Qualitative absolute strengths (strong/weak/negligible) + qualitative comparison
– CPL: Relative strength assertions

Task: Labelling acid/bases in a reaction
– Halo KB: Giant KM procedure for reaction manipulation
– CPL: if-then rule using conjugate pairs

Task: Computing direction of the reaction
– Halo KB: KM rule
– CPL: if-then rule

Annotations (row placement lost in extraction): “More general”; “≈” twice (equivalent)

Page 83: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Behavioral Evaluation: Discussion Points

• We can predict the outcome of any evaluation
  – can see the internals of each system
• So what is a fair sample set?
  – Generate instantiations of the 4 templates?
  – AP exam questions?
  – Extend to cover other knowledge in the 5 pages?
    • none of it contained in Halo KB

Page 84: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Analytic Evaluation

• Possible Metrics include:
  – Complexity of CPL vs Halo KB encodings
  – Amount of domain K added by us in writing CPL
  – % of KB simplified if metonymy handled
  – % of original text encodable in CPL
  – Time taken to encode the KBs
  – % of source text which is important (vs. fluff)
  – Bar graph of textual phenomena vs. frequency
    • e.g., metaphor, examples, metonymy, diagrams
  – Measure of redundancy in the text book

Page 85: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Agenda

• This Seedling and Mobius
  – Major lessons learned
• Reformulations in CPL
  – Whole 5 pages
  – Key Sentences
• How do other texts compare?
• Generics
• How to identify “important” text
• Principles for an extensible KB
• Evaluation discussion
• Tuples as another source of knowledge

Page 86: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Knowledge Mining

Schubert’s Conjecture:

There is a largely untapped source of general knowledge in texts, lying at a level beneath the explicit assertional content, and which can be harnessed.

“The camouflaged helicopter landed near the embassy.”
– helicopters can land
– helicopters can be camouflaged

Our attempt: “lightweight” LFs generated from Reuters

LF forms:
(S subject verb object (prep noun) (prep noun) …)
(NN noun … noun)
(AN adj noun)

Page 87: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Knowledge Mining

Newswire Article:

HUTCHINSON SEES HIGHER PAYOUT. HONG KONG. Mar 2. Li said Hong Kong’s property market remains strong while its economy is performing better than forecast. Hong Kong Electric reorganized and will spin off its non-electricity related activities. Hongkong Electric shareholders will receive one share in the new subsidiary for every owned share in the sold company. Li said the decision to spin off …

Implicit, tacit knowledge:

Shareholders may receive shares.
Companies may be sold.
Shares may be owned.

Page 88: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Knowledge Mining – our attempt

Fragment of the raw data (Brown & Lemay)

;; Atoms can combine
(S "atom" "combine")

;; For example, combustion reactions are redox reactions because elemental oxygen is converted to compounds of oxygen (Section 3.2).
(S "reaction" "be" "reaction")
(S-ADJ "oxygen" "converted" ("to" "compound"))
(AN "elemental" "oxygen")

;; Plan: Metals react with acids to form salts and gas.
(S "metal" "react" (PP "with" "acid"))

;; Extensive oxidation can lead to the failure of metal machinery parts or the deterioration of metal structures.
(S "oxidation" "lead" (PP "to" "failure"))
(S "oxidation" "lead" (PP "to" "deterioration"))
(AN "extensive" "oxidation")
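As a rough illustration of how such tuples might be turned into the generic statements shown earlier (“helicopters can land”, “helicopters can be camouflaged”), here is a minimal sketch. The Python tuple encoding, the function names, and the naive pluralization are assumptions for this example; only the S and AN forms are handled.

```python
def plural(noun: str) -> str:
    # Naive pluralization, an assumption; real text would need a morphology step.
    return noun + "s"


def to_generic(lf) -> str:
    """Turn an S or AN tuple into a 'can ...' generic statement."""
    kind, *args = lf
    if kind == "AN":
        adjective, noun = args
        return f"{plural(noun)} can be {adjective}"
    if kind == "S":
        subject, verb, *rest = args
        parts = [plural(subject), "can", verb]
        for item in rest:
            if isinstance(item, tuple) and item[0] == "PP":
                _, prep, noun = item
                parts += [prep, plural(noun)]  # prepositional phrase
            else:
                parts.append(plural(item))     # direct object
        return " ".join(parts)
    raise ValueError(f"unsupported LF form: {kind}")


statements = [
    to_generic(("S", "atom", "combine")),
    to_generic(("S", "metal", "react", ("PP", "with", "acid"))),
    to_generic(("AN", "camouflaged", "helicopter")),
]
```

Mass nouns such as “oxygen” and the copular tuple (S "reaction" "be" "reaction") show where this naive rendering breaks down and where filtering would be needed.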

Page 89: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray
Page 90: Reading to Learn Q3 Review Peter Clark John Thompson Tom Jenkins Phil Harrison Bill Murray

Summary

• Q3:
  – Completed coding of key sentences in CPL
  – Demonstration of inference with that knowledge
  – Study of cues for identifying important text
  – Assembly of key lessons learned
  – Interaction with ISI
  – Exploration of shallow knowledge extraction
• Q4:
  – Finish interpretation of additional sentences
  – Assemble qualitative and quantitative evaluations
  – Continue interaction with ISI: Side-by-side study
  – Final report