Yves Marcoux - OLST-RALI - 21 March 2007
A natural-language-based approach to modeling structured documents
Yves MARCOUX
GRDS – EBSI
Université de Montréal
A natural-language approach to modeling
Why is some XML so difficult to write?
<http://www.idealliance.org/papers/extreme/proceedings/html/2006/Marcoux01/EML2006Marcoux01.html>
Structure of the talk
1. The problem
2. Proposed direction for solution
3. Conclusion
4. Question period
Writing well-formed XML: author’s choices
• <sex><male /></sex>
• <is-female>FALSE</is-female>
• <gender gender="♂" />
• <note>It's a boy!</note>
&#x2642; = ♂
Writing valid XML is collaborative work
• Modeler has chosen the markup (container)
• Author supplies the contents
• Much like a form
• Collaborative work implies communication between parties: modeler and author
• But the modeler is gone…
Problem
• Authoring environments are:
  – good at conveying the syntactic intentions (or decisions) of the modeler
  – not as good at conveying the semantic intentions of the modeler
• Often, all there is is a generic ID or some slightly more developed form
  – Ex.: “date” in a memo
What is available?
• More or less developed forms of genIDs (and attribute names)
• General documentation of the model
• Per element (attribute) documentation
• OK for tooltips or popups
• Could we do better?
• (Applications / stylesheets are not appropriate)
Could we aim at…
• Having a semantic conversation right in the editing window?
• In the same way that there is currently a syntactic conversation?
• Yes…
Structure of the talk
1. The problem
2. Proposed direction for solution
3. Conclusion
4. Question period
Key idea
• Have modeler prepare bits of NL (prose)
• That can be intertwined with author-supplied contents to give them meaning
• Allows “fill-in”-like sentences
• And thus, a semantic conversation in the editing window
• NB: modeler segments can contain hyperlinks
Example
Facts about some US cities
City          Population   Annual snowfall (inches)
Denver        850,000      23
Rochester     240,000      88
Palm Spring   48,000       0
Raw XML
<facts-about-US-cities>
  <city>
    <name>Denver</name>
    <population>850,000</population>
    <annual-snowfall-in-inches>23</annual-snowfall-in-inches>
  </city>
  <city>
    <name>Rochester</name>
    <population>240,000</population>
    <annual-snowfall-in-inches>88</annual-snowfall-in-inches>
  </city>
  ...
</facts-about-US-cities>
Prose equivalent
Here are facts about some US cities. The city of Denver has a population of 850,000 and an annual snowfall of 23 inches. The city of Rochester has a population of 240,000 and an annual snowfall of 88 inches. The city of Palm Spring has a population of 48,000 and an annual snowfall of 0 inches.
Modeler prepares “peritext” segments
Element                     text-before                               text-after
facts-about-US-cities       "Here are facts about some US cities."    empty
city                        " The city "                              "."
name                        "named "                                  empty
population                  " has a population of "                   empty
annual-snowfall-in-inches   " and an annual snowfall of "             " inches"
Possible “semantic” view
Here are facts about some US cities. The city named Denver has a population of 850,000 and an annual snowfall of 23 inches. The city named Rochester has a population of 240,000 and an annual snowfall of 88 inches. The city named Palm Spring has a population of 48,000 and an annual snowfall of 0 inches.
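A semantic view of this kind can be computed mechanically from the instance and the peritext segments. The Python sketch below is one possible way to do it, not the talk's implementation: the names PERITEXTS and semantic_view are made up for illustration, and stripping the whitespace around author-supplied text is an assumption made so that indented XML gives clean sentences.

```python
import xml.etree.ElementTree as ET

# Peritext segments from the modeler's table: tag -> (text-before, text-after).
# "empty" in the table is represented here by the empty string.
PERITEXTS = {
    "facts-about-US-cities": ("Here are facts about some US cities.", ""),
    "city": (" The city ", "."),
    "name": ("named ", ""),
    "population": (" has a population of ", ""),
    "annual-snowfall-in-inches": (" and an annual snowfall of ", " inches"),
}


def semantic_view(element):
    """Interleave the peritexts with the author-supplied contents, depth first."""
    before, after = PERITEXTS.get(element.tag, ("", ""))
    # Assumption: whitespace around author text is layout only, so it is stripped.
    parts = [before, (element.text or "").strip()]
    for child in element:
        parts.append(semantic_view(child))
        parts.append((child.tail or "").strip())
    parts.append(after)
    return "".join(parts)


if __name__ == "__main__":
    raw = """
    <facts-about-US-cities>
      <city>
        <name>Denver</name>
        <population>850,000</population>
        <annual-snowfall-in-inches>23</annual-snowfall-in-inches>
      </city>
      <city>
        <name>Rochester</name>
        <population>240,000</population>
        <annual-snowfall-in-inches>88</annual-snowfall-in-inches>
      </city>
    </facts-about-US-cities>
    """
    print(semantic_view(ET.fromstring(raw)))
```

Elements with no entry in the table contribute only their contents, so a partially documented model still renders.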
What it allows during editing (in semantic view)
• Peritexts convey the semantic intentions of the modeler
• A semantic conversation takes place in the editing window (instead of a syntactic one)
• Fill-in sentences:
  – Make “tag abuse” embarrassing…
  – Likely to reduce some kinds of errors
• Other views / fragment viewing / hyperlink
Discussion
• This is not like defining an application
  – Not a stylesheet mechanism
• Peritexts (fixed here) could be allowed to vary with some parameters:
  – position among siblings
  – attribute value
  – etc.
• (Attributes should be treated)
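To make the idea of parameter-dependent peritexts concrete, here is a hedged Python sketch in which each peritext is a small rule that receives the element and its position among its siblings. Everything in it is hypothetical: the rule names, the wording " Next, the city ", the shortened tag annual-snowfall and its unit attribute are invented for illustration and are not part of the talk's model.

```python
import xml.etree.ElementTree as ET


def city_rule(element, position):
    # Hypothetical: the text-before varies with the position among siblings.
    before = " The city " if position == 0 else " Next, the city "
    return before, "."


def snowfall_rule(element, position):
    # Hypothetical: the text-after varies with a made-up "unit" attribute.
    return " has an annual snowfall of ", " " + element.get("unit", "inches")


# tag -> rule returning (text-before, text-after) for one element occurrence
RULES = {
    "city": city_rule,
    "name": lambda element, position: ("named ", ""),
    "annual-snowfall": snowfall_rule,
}


def render(element, position=0):
    """Like a fixed-peritext rendering, but peritexts are computed per occurrence."""
    rule = RULES.get(element.tag)
    before, after = rule(element, position) if rule else ("", "")
    parts = [before, (element.text or "").strip()]
    for index, child in enumerate(element):
        parts.append(render(child, index))
        parts.append((child.tail or "").strip())
    parts.append(after)
    return "".join(parts)


if __name__ == "__main__":
    doc = ET.fromstring(
        "<cities>"
        "<city><name>Denver</name><annual-snowfall>23</annual-snowfall></city>"
        '<city><name>Rochester</name><annual-snowfall unit="in">88</annual-snowfall></city>'
        "</cities>"
    )
    print(render(doc))
```

Keeping the rules as plain functions leaves room for other parameters (ancestors, sibling count) without changing the traversal.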
Why does it work?
• Sometimes tricky (see paper), but…
• NL has very high affordance
• NL can act as its own metalanguage
• XML contents + NL usually mix pretty well
Intertextual semantics
• Meaning of a text fragment is given by placing it in a network of other texts
• That network can simply consist of a sentence (or “quasi-sentence”)
• Or more elaborate topology: peritexts can contain hyperlinks, determining sense-making / learning paths
  – Too much hyperlinking can spoil the idea!
Interpretation workflow
• d is document or fragment, H is a human
• S(d) is the intertextual semantics of d
• S(d) is in NL
• S is machine computable
• Actual meaning of d for H may vary:
  – with H
  – for a same H, from one “reading” of S(d) to another
d --S--> S(d) --H--> actual “meaning” of d for H
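For the fixed-peritext case of the cities example, S can be written down recursively. This is only a sketch: the symbols b(e) and a(e), standing for the text-before and text-after of element type e, are introduced here for illustration and do not appear in the talk.

```latex
\[
S(t) = t \quad \text{for a text node } t, \qquad
S\bigl(\langle e\rangle\, c_1 \cdots c_n\, \langle /e\rangle\bigr)
  = b(e)\; S(c_1) \cdots S(c_n)\; a(e)
\]
```

Juxtaposition denotes string concatenation, so S(d) is obtained in a single traversal of d, which is what makes S machine computable in this simple setting.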
Suggests a modeling process
• Modeler starts with the prose
• Identify peritexts
• Work out more and more abbreviated forms
  – Will correspond to different “views” in the editor
• Tersest level gives markup
• Increase model usability?
Mixed content question revisited
• Known: can get rid of mixed content with
  <!ELEMENT text (#PCDATA)>
  Example (e stands for any element name; XML requires #PCDATA to come first):
  <!ELEMENT e (#PCDATA | e1 | e2 | …)*>
  becomes:
  <!ELEMENT e (e1 | e2 | … | text)*>
• Why does it feel bad?
  – Tags “text” are not abbreviations of any reasonable peritexts!
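On the instance side, the rewriting amounts to wrapping every text run of a mixed-content element in a text element. The Python sketch below shows one way to do that; the function name wrap_text and the mixed_tags parameter are hypothetical, and whitespace-only runs are assumed to be layout and are dropped.

```python
import xml.etree.ElementTree as ET


def wrap_text(element, mixed_tags):
    """Return a copy of element where text runs inside mixed elements sit in <text>."""
    copy = ET.Element(element.tag, dict(element.attrib))
    is_mixed = element.tag in mixed_tags

    def add_run(run):
        # Assumption: whitespace-only runs are layout and can be dropped.
        if run and run.strip():
            ET.SubElement(copy, "text").text = run

    if is_mixed:
        add_run(element.text)
    else:
        copy.text = element.text
    for child in element:
        new_child = wrap_text(child, mixed_tags)
        copy.append(new_child)
        if is_mixed:
            add_run(child.tail)
        else:
            new_child.tail = child.tail
    return copy


if __name__ == "__main__":
    source = ET.fromstring("<e>Exit <e1>Hamlet</e1> now</e>")
    print(ET.tostring(wrap_text(source, {"e"}), encoding="unicode"))
```

The resulting text tags carry no meaning of their own, which is the slide's point: no reasonable peritext abbreviates to them.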
Is NL too much to ask for?
• Relative to some “target” community
• Can go a long way (previous slide)
• Hyperlinks are allowed in peritexts
  – Allows defining “sense-making” or learning paths
• (Almost) anything formal can be turned into NL…
NL as formalism common denominator
[Diagram: Expression in artificial formalism + Textbook explaining formalism → STAPLER → Equivalent expression in NL]
Editing setup without intertextual semantics
[Diagram labels: Modeler; Author; World; NL and presupposed knowledge of target community; XML EDITOR; XML DTD; Doc. / tr. material; Valid XML instance or fragment]
Editing setup with intertextual semantics
[Diagram labels: Modeler; Author; World; NL and presupposed knowledge of target community; XML EDITOR; XML DTD; text-before and text-after segments; NL equivalent; Valid XML instance or fragment]
Structure of the talk
1. The problem
2. Proposed direction for solution
3. Conclusion
4. Question period
What it suggests
• Bring some of the discipline of producing “good documents” (manuals of style) into model & interface design
  – E.g., don’t abuse hyperlinking
• Literate modeling, literate interfaces
  – Literate interface / interaction design
• Benefit: make explicit prerequisite knowledge & sense-making / learning paths
Other possible uses of intertextual semantics
• Legal documents with multiple renditions
• NLP systems that cannot treat markup
  – Including full-text indexing
    • <ex>Hamlet</ex>
    • “Exit Hamlet”
• Other data models
  – Ex.: relational
    • Normal forms
  – A new look at expressivity
Future work
• Editing:
  – Work out a few existing / new models
  – Properly integrate attributes
  – More powerful peritext computation
  – Implement ideas in a real editor
    • Display peritexts when choosing insertion
    • Hyperlinks in displayed peritexts
  – Experiment with real authors
Future work
• More than peritexts?
• More than NL (icons, sound, …)?
• Compare with other semantic frameworks
  – Downstream semantics: Wrightson, Renear et al.
• Other models
• Tackle literate modeling / interface design