MUSTE: Multimodal Semantic Text Editing
VR project 2015–2019
Peter Ljunglöf, Data- och informationsteknik (Computer Science and Engineering)
CLT workshop, 20 October 2015
How to edit syntax trees on the surface
I will describe an interactive system where the user can build and modify texts in a grammatical manner.
The user can insert, delete and change words and phrases, and the system ensures that the resulting sentences are always grammatically correct.
This is accomplished by automatically rearranging words and changing inflection, if necessary.
One single editing operation
The tool has only one type of editing operation: ”click” on a word to display a menu of alternatives.
There is no keyboard input, and no special ”gestures”.
It can easily be adapted to different devices, such as tablets or smartphones,
and to different input methods, such as eye-tracking and switch access scanning.
Usage
It can be used for automatic translation:
as an interactive travel phrasebook on your smartphone, more useful than a Berlitz book and with better translations than Google
It can be used for authoring quick messages:
e.g., standard mobile text answers that can be modified quickly
e.g., a tool for quickly correcting minor errors in speech recognition (could be used on a smartphone or smartwatch)
More usage
It can be used by disabled people:
as a communication aid that helps the user to create utterances when talking with other people
It can be used for computer-assisted language learning:
as a kind of interactive textbook, where the student can play with different linguistic features, such as inflection, word order and other constructions
Demo time!
http://heatherleaf.github.io/muste/demo/muste.html
Overview
System overview
sentences are stored as syntax trees
the user performs editing operations on the surface string
the editing operations are translated to constraints on the underlying tree
the system finds similar trees satisfying the constraints
the user selects among the suggested variants
the underlying grammar formalism is GF (Grammatical Framework)
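The pipeline above can be summarised as one function. This is a minimal TypeScript sketch under stated assumptions, not the MUSTE code. The names `Tree`, `suggest` and `positionalDist`, and the hand-written candidates, are hypothetical. The constraints derived from a click are modelled as a single predicate on trees.

```typescript
// A GF-style abstract syntax tree: a function symbol applied to subtrees.
type Tree = { fun: string; args: Tree[] };

// One editing step (hypothetical signature): `satisfies` stands for the
// constraints derived from the click, `generate` proposes trees, and
// `dist` measures how similar two trees are.
function suggest(
  current: Tree,
  satisfies: (candidate: Tree) => boolean,
  generate: (current: Tree) => Tree[],
  dist: (a: Tree, b: Tree) => number,
  limit: number,
): Tree[] {
  return generate(current)
    .filter(satisfies)
    .map((tree) => ({ tree, d: dist(current, tree) }))
    .sort((x, y) => x.d - y.d) // most similar first
    .slice(0, limit)
    .map((scored) => scored.tree);
}

// Toy usage: a crude positional distance over pre-order symbol lists.
const leaf = (fun: string): Tree => ({ fun, args: [] });
const flatten = (t: Tree): string[] => [t.fun].concat(...t.args.map(flatten));
const positionalDist = (a: Tree, b: Tree): number => {
  const xs = flatten(a);
  const ys = flatten(b);
  let d = Math.abs(xs.length - ys.length);
  for (let i = 0; i < Math.min(xs.length, ys.length); i++) if (xs[i] !== ys[i]) d++;
  return d;
};

const current: Tree = { fun: "sleep", args: [{ fun: "the", args: [leaf("idea")] }] };
const candidates: Tree[] = [
  { fun: "yawn", args: [{ fun: "all", args: [leaf("cat")] }] },
  current,
  { fun: "yawn", args: [{ fun: "the", args: [leaf("idea")] }] },
];
const suggestions = suggest(
  current,
  (c) => JSON.stringify(c) !== JSON.stringify(current), // constraint: something must change
  () => candidates,
  positionalDist,
  2,
);
// suggestions: yawn(the(idea)) (distance 1), then yawn(all(cat)) (distance 3)
```

Passing the generator and the distance as parameters keeps the two open problems named later in the talk (candidate generation and tree similarity) swappable.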
Tree editing without trees
Each word in the surface sentence is introduced by exactly one node in the abstract tree.
When the user selects a word, the system searches for similar trees starting from the corresponding tree node.
It is not always the selected node that changes:
”ideas” – ”an idea” – ”this idea”
if the user wants to change ”ideas” to singular, the system has to change the determiner: (indef_det idea_noun) ➙ (a_det idea_noun) or (this_det idea_noun)
Measuring tree similarity
Ideally, I want to use tree edit distance: the minimal number of node replacements, insertions and deletions needed to turn one tree into the other.
But this is complicated to implement, so I use the Levenshtein string edit distance instead:
first I flatten the trees into lists of function symbols
then I calculate the edit distance between the lists
But this doesn't work very well; perhaps something for a PhD student to look into?
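The flatten-then-Levenshtein measure described above, as a TypeScript sketch. It assumes a pre-order flattening, which the slide does not specify. `flatten`, `levenshtein` and `treeDistance` are hypothetical names. The edit distance is the standard dynamic-programming version with unit costs.

```typescript
type Tree = { fun: string; args: Tree[] };

// Pre-order flattening: a node's function symbol, then its children's symbols.
function flatten(tree: Tree): string[] {
  const out = [tree.fun];
  for (const arg of tree.args) out.push(...flatten(arg));
  return out;
}

// Levenshtein distance between two symbol lists (unit cost for
// insertion, deletion and substitution), keeping one DP row at a time.
function levenshtein(xs: string[], ys: string[]): number {
  let prev: number[] = [];
  for (let j = 0; j <= ys.length; j++) prev.push(j);
  for (let i = 1; i <= xs.length; i++) {
    const cur: number[] = [i];
    for (let j = 1; j <= ys.length; j++) {
      const substitute = prev[j - 1] + (xs[i - 1] === ys[j - 1] ? 0 : 1);
      cur.push(Math.min(prev[j] + 1, cur[j - 1] + 1, substitute));
    }
    prev = cur;
  }
  return prev[ys.length];
}

const treeDistance = (a: Tree, b: Tree): number => levenshtein(flatten(a), flatten(b));

const leaf = (fun: string): Tree => ({ fun, args: [] });
const t1: Tree = { fun: "sleep", args: [{ fun: "the", args: [leaf("idea")] }] };
const t2: Tree = {
  fun: "sleep",
  args: [{ fun: "indef", args: [{ fun: "adjn", args: [leaf("green"), leaf("idea")] }] }],
};
// flatten(t1) = [sleep, the, idea]; flatten(t2) = [sleep, indef, adjn, green, idea]
// treeDistance(t1, t2) = 3: substitute the → indef, insert adjn and green
```

The proper tree edit distance mentioned above needs a dedicated algorithm (Zhang and Shasha's is the classic one for ordered trees). This sketch only shows the simpler substitute.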
Generating tree candidates
The system has to generate candidate phrases that it can suggest to the user.
Ideally it would generate candidates in order of similarity with the current tree,
but I don't know how to do that, so instead I generate all trees (up to a given level) and sort them according to similarity.
This is of course not the right way to do it…
I also have some additional techniques for filtering the candidates.
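The "generate everything up to a given depth, then sort" approach, sketched in TypeScript over the abstract syntax of the example grammar from the Theory part. The signature table, `generate` and `rankCandidates` are hypothetical. Depth is counted so that a leaf has depth 1. Similarity is the flattened Levenshtein distance from the previous slide.

```typescript
type Tree = { fun: string; args: Tree[] };
type FunType = { args: string[]; result: string };

// Abstract syntax of the example grammar (fun declarations).
const grammar: { [fun: string]: FunType } = {
  idea: { args: [], result: "Noun" },
  cat: { args: [], result: "Noun" },
  green: { args: [], result: "Adj" },
  furious: { args: [], result: "Adj" },
  colourless: { args: [], result: "Adj" },
  the: { args: ["Noun"], result: "NP" },
  all: { args: ["Noun"], result: "NP" },
  indef: { args: ["Noun"], result: "NP" },
  sleep: { args: ["NP"], result: "S" },
  yawn: { args: ["NP"], result: "S" },
  adjn: { args: ["Adj", "Noun"], result: "Noun" },
  adjs: { args: ["Adj", "S"], result: "S" },
};

// Every tree of category `cat` with depth at most `depth`.
function generate(cat: string, depth: number): Tree[] {
  if (depth < 1) return [];
  const trees: Tree[] = [];
  for (const fun of Object.keys(grammar)) {
    const sig = grammar[fun];
    if (sig.result !== cat) continue;
    // Cartesian product of the argument alternatives, one position at a time.
    let argLists: Tree[][] = [[]];
    for (const argCat of sig.args) {
      const options = generate(argCat, depth - 1);
      const extended: Tree[][] = [];
      for (const prefix of argLists) for (const option of options) extended.push(prefix.concat([option]));
      argLists = extended;
    }
    for (const args of argLists) trees.push({ fun, args });
  }
  return trees;
}

function flatten(tree: Tree): string[] {
  const out = [tree.fun];
  for (const arg of tree.args) out.push(...flatten(arg));
  return out;
}

function levenshtein(xs: string[], ys: string[]): number {
  let prev: number[] = [];
  for (let j = 0; j <= ys.length; j++) prev.push(j);
  for (let i = 1; i <= xs.length; i++) {
    const cur: number[] = [i];
    for (let j = 1; j <= ys.length; j++) {
      cur.push(Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (xs[i - 1] === ys[j - 1] ? 0 : 1)));
    }
    prev = cur;
  }
  return prev[ys.length];
}

// Generate everything, then sort by distance to the current tree.
function rankCandidates(current: Tree, cat: string, depth: number): Tree[] {
  return generate(cat, depth)
    .map((tree) => ({ tree, d: levenshtein(flatten(current), flatten(tree)) }))
    .sort((x, y) => x.d - y.d)
    .map((scored) => scored.tree);
}

const current: Tree = { fun: "sleep", args: [{ fun: "the", args: [{ fun: "idea", args: [] }] }] };
const ranked = rankCandidates(current, "S", 3); // ranked[0] is current itself (distance 0)
```

Even in this tiny grammar the number of sentences grows quickly: 12 trees of category S up to depth 3 and 84 up to depth 4. This illustrates why exhaustive generation followed by sorting does not scale.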
Multilingual grammars
Editing operations are translated to operations on the abstract syntax tree, which is language-independent:
the system shows each translation on a new row
when the user selects a phrase, the corresponding words are highlighted in the other languages
the user can change the sentence in any language and the other translations change automagically
the second language can also be a symbol language such as sign language or Blissymbolics
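A minimal TypeScript sketch of why one abstract tree can drive several rows of text. It assumes a tiny fragment of the example grammar (idea, cat, the, indef, sleep, yawn). The second language, Swedish, is an illustration chosen here, with its word forms supplied by hand. In the real system each language would be a GF concrete syntax. `linEng`, `linSwe` and `rows` are hypothetical names.

```typescript
type Tree = { fun: string; args: Tree[] };
const leaf = (fun: string): Tree => ({ fun, args: [] });

// English: "the" makes the subject singular, "indef" plural; the verb agrees.
const engNoun: { [fun: string]: { sg: string; pl: string } } = {
  idea: { sg: "idea", pl: "ideas" },
  cat: { sg: "cat", pl: "cats" },
};
function linEng(s: Tree): string {
  const np = s.args[0];
  const noun = engNoun[np.args[0].fun];
  const singular = np.fun === "the";
  return (singular ? "the " + noun.sg : noun.pl) + " " + s.fun + (singular ? "s" : "");
}

// Swedish (illustrative forms): definiteness is a noun suffix,
// and the present-tense verb does not agree with the subject.
const sweNoun: { [fun: string]: { defSg: string; indefPl: string } } = {
  idea: { defSg: "idén", indefPl: "idéer" },
  cat: { defSg: "katten", indefPl: "katter" },
};
const sweVerb: { [fun: string]: string } = { sleep: "sover", yawn: "gäspar" };
function linSwe(s: Tree): string {
  const np = s.args[0];
  const noun = sweNoun[np.args[0].fun];
  return (np.fun === "the" ? noun.defSg : noun.indefPl) + " " + sweVerb[s.fun];
}

// One abstract tree, one row per language: editing the tree updates every row.
const rows = (t: Tree): string[] => [linEng(t), linSwe(t)];

const before: Tree = { fun: "sleep", args: [{ fun: "the", args: [leaf("idea")] }] };
const after: Tree = { fun: "yawn", args: [{ fun: "indef", args: [leaf("cat")] }] };
// rows(before) = ["the idea sleeps", "idén sover"]
// rows(after)  = ["cats yawn", "katter gäspar"]
```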
Theory
Example grammar (1)
cat Noun ; Adj ; NP ; S
lincat
  Noun = {s : Num => Str}
  Adj  = {adj, adv : Str}
  NP   = {s : Str ; n : Num}
  S    = {s : Str}
param Num = Sg | Pl
Example grammar (2)
fun idea, cat : Noun
lin idea = mkNoun "idea"
    cat = mkNoun "cat"
fun green, furious, colourless : Adj
lin green = mkAdj "green"
    furious = mkAdj "furious"
    colourless = mkAdj "colourless"
fun the, all, indef : Noun -> NP
lin the = mkDet Sg "the"
    all = mkDet Pl "all"
    indef = mkDet Pl ""
fun sleep, yawn : NP -> S
lin sleep = mkVerb "sleep"
    yawn = mkVerb "yawn"
Example grammar (3)
fun  adjn : Adj -> Noun -> Noun
     adjs : Adj -> S -> S
lin  adjn x y = {s = \\num => x.adj ++ y.s ! num}
     adjs x y = {s = y.s ++ x.adv}
oper mkDet num d = \n -> {s = d ++ n.s ! num ; n = num}
     mkAdj a = {adj = a ; adv = a + "ly"}
     mkNoun n = {s = mkNum n (n + "s")}
     mkVerb v np = {s = np.s ++ mkNum (v + "s") v ! np.n}
     mkNum sg pl = table {Sg => sg ; Pl => pl}
Terms and linearisations
Here are two example terms licensed by the grammar, and their corresponding linearisations:
〚(sleep (the (idea)))〛
  = 〚sleep〛 (〚the〛 (〚idea〛))
  = 〚sleep〛 ({s = ”the” ⧺ 〚idea〛.s ! Sg ; n = Sg})
  = 〚sleep〛 ({s = ”the” ⧺ table{Sg⇒”idea”;Pl⇒”ideas”} ! Sg ; n = Sg})
  = 〚sleep〛 ({s = ”the” ⧺ ”idea” ; n = Sg})
  = …
  = {s = ”the idea sleeps”}
〚(sleep (indef (idea)))〛
  = 〚sleep〛 (〚indef〛 (〚idea〛))
  = 〚sleep〛 ({s = ”” ⧺ 〚idea〛.s ! Pl ; n = Pl})
  = 〚sleep〛 ({s = ”ideas” ; n = Pl})
  = …
  = {s = ”ideas sleep”}
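The example grammar can be hand-ported to TypeScript to check these linearisations mechanically. This is a sketch and not GF output. GF records become objects, tables over Num become `{ Sg, Pl }` objects, and opers become functions. `glue` is a hypothetical stand-in for ⧺. As in the derivation above, an empty determiner string contributes no word.

```typescript
type Num = "Sg" | "Pl";
type Tree = { fun: string; args: Tree[] };
type Noun = { s: Record<Num, string> };
type Adj = { adj: string; adv: string };
type NP = { s: string; n: Num };
type S = { s: string };

// Stand-in for ⧺: join tokens with spaces, skipping empty strings.
const glue = (...tokens: string[]): string => tokens.filter((tok) => tok !== "").join(" ");

// opers
const mkNum = (sg: string, pl: string): Record<Num, string> => ({ Sg: sg, Pl: pl });
const mkNoun = (n: string): Noun => ({ s: mkNum(n, n + "s") });
const mkAdj = (a: string): Adj => ({ adj: a, adv: a + "ly" });
const mkDet = (num: Num, d: string) => (n: Noun): NP => ({ s: glue(d, n.s[num]), n: num });
const mkVerb = (v: string) => (np: NP): S => ({ s: glue(np.s, mkNum(v + "s", v)[np.n]) });

// lin rules; for lexical items the function name doubles as the word stem.
const linAdj = (t: Tree): Adj => mkAdj(t.fun); // green, furious, colourless

function linNoun(t: Tree): Noun {
  if (t.fun === "adjn") {
    // adjn x y = {s = \\num => x.adj ++ y.s ! num}
    const x = linAdj(t.args[0]);
    const y = linNoun(t.args[1]);
    return { s: mkNum(glue(x.adj, y.s.Sg), glue(x.adj, y.s.Pl)) };
  }
  return mkNoun(t.fun); // idea, cat
}

const dets: { [fun: string]: (n: Noun) => NP } = {
  the: mkDet("Sg", "the"),
  all: mkDet("Pl", "all"),
  indef: mkDet("Pl", ""),
};
const linNP = (t: Tree): NP => dets[t.fun](linNoun(t.args[0]));

function linS(t: Tree): S {
  if (t.fun === "adjs") {
    // adjs x y = {s = y.s ++ x.adv}
    return { s: glue(linS(t.args[1]).s, linAdj(t.args[0]).adv) };
  }
  return mkVerb(t.fun)(linNP(t.args[0])); // sleep, yawn
}

const leaf = (fun: string): Tree => ({ fun, args: [] });
const app = (fun: string, ...args: Tree[]): Tree => ({ fun, args });
const figure = app("adjs", leaf("furious"),
  app("sleep", app("indef", app("adjn", leaf("colourless"), app("adjn", leaf("green"), leaf("idea"))))));
// linS(app("sleep", app("the", leaf("idea")))).s   → "the idea sleeps"
// linS(app("sleep", app("indef", leaf("idea")))).s → "ideas sleep"
// linS(figure).s → "colourless green ideas sleep furiously"
```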
Side note: How about parsing?
Does the system parse the sentence into a tree? No!
There is no parsing involved:
the syntax tree is always known
everything is based on linearisations,
which is good, since linearisation is a much less difficult problem than parsing
GF terms are syntax trees
GF terms are trees where nodes are labeled with function symbols
I will write v for a subtree in a tree t, and 〚v〛 for its linearisation
Each word in 〚t〛 is introduced by a specific tree node v
Syntax tree (figure) for the following term, and its linearisation:
(adjs (furious) (sleep (indef (adjn (colourless) (adjn (green) (idea))))))
colourless green ideas sleep furiously
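One way to make "each word is introduced by a specific node" concrete is to tag every word with the path of the node that produced it during linearisation. This TypeScript sketch assumes a hand port of the example grammar. `Token`, `Path` and the tagging scheme are hypothetical, not taken from MUSTE.

```typescript
type Num = "Sg" | "Pl";
type Tree = { fun: string; args: Tree[] };
type Path = number[]; // child indices from the root
type Token = { word: string; node: Path }; // a word and the node that introduced it

// A tagged token; the empty string introduces no word at all.
const tok = (word: string, node: Path): Token[] => (word === "" ? [] : [{ word, node }]);
const child = (p: Path, i: number): Path => p.concat([i]);

function linAdj(t: Tree, p: Path): { adj: Token[]; adv: Token[] } {
  return { adj: tok(t.fun, p), adv: tok(t.fun + "ly", p) };
}

function linNoun(t: Tree, p: Path): Record<Num, Token[]> {
  if (t.fun === "adjn") {
    const x = linAdj(t.args[0], child(p, 0));
    const y = linNoun(t.args[1], child(p, 1));
    return { Sg: x.adj.concat(y.Sg), Pl: x.adj.concat(y.Pl) };
  }
  return { Sg: tok(t.fun, p), Pl: tok(t.fun + "s", p) };
}

const dets: { [fun: string]: [Num, string] } = { the: ["Sg", "the"], all: ["Pl", "all"], indef: ["Pl", ""] };

function linNP(t: Tree, p: Path): { s: Token[]; n: Num } {
  const [num, det] = dets[t.fun];
  return { s: tok(det, p).concat(linNoun(t.args[0], child(p, 0))[num]), n: num };
}

function linS(t: Tree, p: Path): Token[] {
  if (t.fun === "adjs") {
    return linS(t.args[1], child(p, 1)).concat(linAdj(t.args[0], child(p, 0)).adv);
  }
  const np = linNP(t.args[0], child(p, 0));
  return np.s.concat(tok(np.n === "Sg" ? t.fun + "s" : t.fun, p));
}

const leaf = (fun: string): Tree => ({ fun, args: [] });
const app = (fun: string, ...args: Tree[]): Tree => ({ fun, args });
const figure = app("adjs", leaf("furious"),
  app("sleep", app("indef", app("adjn", leaf("colourless"), app("adjn", leaf("green"), leaf("idea"))))));
const tokens = linS(figure, []);
// words: "colourless green ideas sleep furiously"
// "ideas" comes from idea at [1, 0, 0, 1, 1], "sleep" from sleep at [1],
// "furiously" from furious at [0]; indef at [1, 0] introduces no word.
```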
Constrained linearisation
Let's define the constrained linearisation 〚v〛_t as the string(s) in 〚v〛 that are actually used in 〚t〛
The linearisation of idea is: 〚idea〛 = {s = table {Sg ⇒ ”idea” ; Pl ⇒ ”ideas”}}
It has the following constrained linearisations:
〚idea〛_t = ”idea” when t = sleep(the(idea))
〚idea〛_t = ”ideas” when t = sleep(all(idea))
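A TypeScript sketch of 〚v〛_t. It assumes that the linearisation of t has already tagged each word with the path of the node that introduced it; here the tags are written by hand for the two trees above. It also reads v as the whole subtree rooted at v, so words introduced by nodes below v count too. `within` and `constrainedLin` are hypothetical names.

```typescript
type Path = number[];
type Token = { word: string; node: Path };

// True if `path` lies in the subtree rooted at `root`.
const within = (root: Path, path: Path): boolean =>
  root.length <= path.length && root.every((x, i) => x === path[i]);

// 〚v〛_t: the words of 〚t〛 introduced by v or by a node below v.
function constrainedLin(tokensOfT: Token[], v: Path): string[] {
  return tokensOfT.filter((tk) => within(v, tk.node)).map((tk) => tk.word);
}

// Hand-tagged linearisations of the two trees above (paths from the root).
const theIdea: Token[] = [ // sleep(the(idea))
  { word: "the", node: [0] },
  { word: "idea", node: [0, 0] },
  { word: "sleeps", node: [] },
];
const allIdeas: Token[] = [ // sleep(all(idea))
  { word: "all", node: [0] },
  { word: "ideas", node: [0, 0] },
  { word: "sleep", node: [] },
];
// constrainedLin(theIdea, [0, 0])  → ["idea"]
// constrainedLin(allIdeas, [0, 0]) → ["ideas"]
// constrainedLin(theIdea, [0])     → ["the", "idea"]
```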
Calculating the candidate menus
When the user clicks a word w, the system first finds its corresponding node v, and then generates candidate trees:
all subtrees that can replace v, or the parent of v, or the grandparent, …
It filters out candidates:
if the words are not changed (e.g., singular ”fish” to plural ”fish”)
if they are subsumed by other candidates
It ”compresses” the candidates by only showing the affected words:
e.g., sleep(the(idea)) vs. yawn(the(idea)): the linearisation is ”the idea sleeps”, but the only affected word is ”sleeps”
It groups the candidates by which words in 〚v〛_t are affected (…)
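A simplified TypeScript sketch of the filter, compress and group steps. It assumes the candidate trees have already been generated and linearised to word lists. The affected words are approximated by trimming the common prefix and suffix of the old and new sentences. Candidates are grouped by the affected span of the whole sentence rather than by 〚v〛_t, and the subsumption filter is not modelled. `Candidate`, `Menu`, `affectedSpan` and `buildMenus` are hypothetical names.

```typescript
// Hypothetical input: a candidate tree (as a label) already linearised to words.
type Candidate = { tree: string; words: string[] };
// A menu groups candidates that change the same span [from, to) of the sentence.
type Menu = { from: number; to: number; items: { tree: string; shown: string[] }[] };

// Trim the longest common prefix and suffix; what is left is the part
// of the current sentence that the candidate changes.
function affectedSpan(current: string[], words: string[]) {
  let from = 0;
  while (from < current.length && from < words.length && current[from] === words[from]) from++;
  let endCur = current.length;
  let endNew = words.length;
  while (endCur > from && endNew > from && current[endCur - 1] === words[endNew - 1]) {
    endCur--;
    endNew--;
  }
  return { from, to: endCur, shown: words.slice(from, endNew) };
}

function buildMenus(current: string[], candidates: Candidate[]): Menu[] {
  const menus: Menu[] = [];
  for (const c of candidates) {
    // Filter: candidates whose words are unchanged are useless in a menu.
    if (c.words.join(" ") === current.join(" ")) continue;
    // Compress: keep only the affected words of the candidate.
    const span = affectedSpan(current, c.words);
    // Group: one menu per affected span.
    let menu: Menu | undefined = menus.filter((m) => m.from === span.from && m.to === span.to)[0];
    if (menu === undefined) {
      menu = { from: span.from, to: span.to, items: [] };
      menus.push(menu);
    }
    menu.items.push({ tree: c.tree, shown: span.shown });
  }
  return menus;
}

const current = ["the", "idea", "sleeps"];
const menus = buildMenus(current, [
  { tree: "yawn(the(idea))", words: ["the", "idea", "yawns"] },
  { tree: "sleep(the(cat))", words: ["the", "cat", "sleeps"] },
  { tree: "hypothetical, same surface", words: ["the", "idea", "sleeps"] },
  { tree: "yawn(the(cat))", words: ["the", "cat", "yawns"] },
  { tree: "sleep(the(adjn(green,idea)))", words: ["the", "green", "idea", "sleeps"] },
]);
// menus: [2,3) yawns | [1,2) cat | [1,3) cat yawns | [1,1) green (an insertion)
```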
Displaying the menus
When the menus are calculated, the system displays them:
the system shows the first menu
if the user selects a menu item, the system changes the sentence and the underlying tree
if the user clicks on an affected word in 〚v〛_t, the system shows the next candidate menu
if the user clicks on another word, the system calculates new menus
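The menu behaviour above is a small state machine, sketched here in TypeScript. The state shape, `onClickWord` and `onSelectItem` are hypothetical. Two behaviours are assumptions, not stated in the talk: cycling past the last menu wraps around to the first, and selecting an item closes all menus.

```typescript
// Hypothetical UI state: candidate menus for the clicked word, and which one is open.
type Menu = { affected: number[]; items: string[] }; // positions of affected words
type UiState = { menus: Menu[]; open: number };

// The user clicks word number `position` in the sentence.
function onClickWord(state: UiState, position: number, computeMenus: (position: number) => Menu[]): UiState {
  const shown = state.menus[state.open];
  if (shown !== undefined && shown.affected.indexOf(position) >= 0) {
    // An affected word of the open menu: show the next menu (wrapping around is an assumption).
    return { menus: state.menus, open: (state.open + 1) % state.menus.length };
  }
  // Any other word: calculate new menus and show the first one.
  return { menus: computeMenus(position), open: 0 };
}

// Selecting an item changes the sentence and tree via `applyEdit`;
// closing the menus afterwards is an assumption.
function onSelectItem(state: UiState, item: number, applyEdit: (choice: string) => void): UiState {
  applyEdit(state.menus[state.open].items[item]);
  return { menus: [], open: 0 };
}

// Toy menus: the first affects only the clicked word, the second also the next one.
const toyMenus = (position: number): Menu[] => [
  { affected: [position], items: ["A", "B"] },
  { affected: [position, position + 1], items: ["C"] },
];
let state: UiState = { menus: [], open: 0 };
state = onClickWord(state, 2, toyMenus); // new word: menus for word 2, first menu open
state = onClickWord(state, 2, toyMenus); // affected word: second menu open
const chosen: string[] = [];
state = onSelectItem(state, 0, (choice) => chosen.push(choice)); // chosen = ["C"]
```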
Filtering candidates
Apart from removing candidates that do not affect the selected word, candidates that are too similar to each other are also filtered out:
suppose you have the original tree t, and candidates c1 and c2
if dist(t, c1) + dist(c1, c2) ≤ dist(t, c2), then c2 is excluded, since it can easily be selected from c1 in the next step
this is very slow, so it is currently turned off
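The exclusion rule, written as a generic TypeScript filter. `filterReachable` is a hypothetical name. The distance is a parameter; in MUSTE it would be the tree distance. The toy usage treats numbers as "trees" with absolute difference as the distance.

```typescript
// Drop candidate c2 if some other candidate c1 satisfies
// dist(t, c1) + dist(c1, c2) <= dist(t, c2): c2 is then easy to reach from c1.
function filterReachable<T>(t: T, candidates: T[], dist: (a: T, b: T) => number): T[] {
  return candidates.filter(
    (c2) => !candidates.some((c1) => c1 !== c2 && dist(t, c1) + dist(c1, c2) <= dist(t, c2)),
  );
}

// Toy example: numbers as "trees", absolute difference as the distance.
const kept = filterReachable(0, [1, 2, -1], (a, b) => Math.abs(a - b));
// kept = [1, -1]: 2 is dropped because dist(0, 1) + dist(1, 2) = 2 <= dist(0, 2) = 2
```

Every candidate is compared with every other, so the number of distance computations grows quadratically with the number of candidates. That fits the remark that the filter is slow. As written, c2 is dropped even if c1 is itself dropped by a third candidate.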
Discussion
Summary
when the user clicks a word/phrase in the sentence, the system calculates similar words/phrases that can replace it
the system is implemented in JavaScript
actually, it has been converted to TypeScript
the theory is grammar- and language-neutral: there are no grammar-specific hacks
but some grammars work better than others
the source code is on GitHub:
https://github.com/heatherleaf/muste
Future work (i.e., wish list)
Implementation
I want a smarter way of generating candidates
I want working tablet/smartphone apps:
language trainer, communication aid, text message editor
Evaluation
I want to evaluate the apps on real users
I want a method for evaluating on text corpora
Theory
I want to make a corpus-based variant, where candidates are selected from corpus phrases