
MUSTE: Multimodal Semantic Text Editing

VR project 2015–2019

Peter Ljunglöf, Data- och informationsteknik (Computer Science and Engineering)

CLT workshop, 20 October 2015


How to edit syntax trees on the surface

I will describe an interactive system where the user can build and modify texts in a grammatical manner.

The user can insert, delete and change words and phrases, and the system ensures that the resulting sentences are always grammatically correct.

This is accomplished by automatically rearranging words and changing inflection, if necessary.


One single editing operation

The tool has only one type of editing operation:

- "click" on a word to display a menu of alternatives
- there is no keyboard input, and no special "gestures"
- it can easily be adapted to different devices, such as tablets or smartphones
- and to different input methods, such as eye-tracking and switch access scanning


Usage

It can be used for automatic translation:

- as an interactive travel phrasebook on your smartphone, more useful than a Berlitz book and with better translations than Google

It can be used for authoring quick messages:

- e.g., standard mobile text answers that can be modified quickly
- e.g., a tool for quickly correcting minor errors in speech recognition (could be used in a smartphone or smartwatch)


More usage

It can be used by disabled people:

- as a communication aid that helps the user to create utterances when talking with other people

It can be used for computer-assisted language learning:

- as a kind of interactive textbook, where the student can play with different linguistic features, such as inflection, word order and other constructions


Demo time!

http://heatherleaf.github.io/muste/demo/muste.html


Overview


System overview

- sentences are stored as syntax trees
- the user performs editing operations on the surface string
- the editing operations are translated to constraints on the underlying tree
- the system finds similar trees satisfying the constraints
- the user selects among the suggested variants
- the underlying grammar formalism is GF (Grammatical Framework)


Tree editing without trees

Each word in the surface sentence is introduced by exactly one node in the abstract tree.

- when the user selects a word, the system searches for similar trees from the corresponding tree node
- it is not always the selected node that changes:
  - "ideas" – "an idea" – "this idea"
  - if the user wants to change "ideas" to singular, the system has to change the determiner: (indefdet ideanoun) ➙ (adet ideanoun) or (thisdet ideanoun)


Measuring tree similarity

Ideally, I want to use tree edit distance:

- the number of replacements, insertions and deletions between two trees

But this is complicated to implement, so I use the Levenshtein string edit distance instead:

- first I flatten the trees into lists of function symbols
- then I calculate the edit distance between the lists

But this doesn't work very well:

- perhaps something for a PhD student to look into?
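The flatten-then-Levenshtein idea can be sketched in a few lines. This is only an illustration in Python, not the MUSTE code: the tuple encoding of trees, the prefix-order flattening and the names `flatten`, `levenshtein` and `tree_distance` are all assumptions made for the example.

```python
# A tree is a tuple (function_symbol, child, child, ...); this encoding is an assumption.

def flatten(tree):
    """Flatten a tree into the list of its function symbols, in prefix order."""
    symbol, *children = tree
    symbols = [symbol]
    for child in children:
        symbols.extend(flatten(child))
    return symbols

def levenshtein(a, b):
    """Minimal number of replacements, insertions and deletions turning list a into list b."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # replace x by y (free if they are equal)
            ))
        previous = current
    return previous[-1]

def tree_distance(t1, t2):
    """Approximate tree distance: string edit distance between the flattened trees."""
    return levenshtein(flatten(t1), flatten(t2))

the_idea = ("sleep", ("the", ("idea",)))
ideas = ("sleep", ("indef", ("idea",)))
the_green_idea = ("sleep", ("the", ("adjn", ("green",), ("idea",))))

print(tree_distance(the_idea, ideas))           # one symbol replaced
print(tree_distance(the_idea, the_green_idea))  # two symbols inserted
```

Because the trees are compared as flat symbol lists, the measure ignores which node is the parent of which, which may be one reason it does not behave like a real tree edit distance.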


Generating tree candidates

The system has to generate candidate phrases that it can suggest to the user.

- ideally it would generate candidates in order of similarity with the current tree
- but I don't know how to do that, so I instead generate all trees (up to a given level), and sort them according to similarity
  - this is of course not the right way to do it…
- I also have some additional techniques for filtering the candidates
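A rough sketch of "generate everything, then sort" could look as follows. It is written in Python for illustration and is not the actual implementation. The `FUNS` table encodes the function symbols of the example grammar from the theory slides. The tuple encoding of trees and the names `generate` and `candidates` are assumptions, and the distance is the flattened Levenshtein distance described on the similarity slide.

```python
from itertools import product

# Function symbols of the example grammar: name -> (argument categories, result category).
FUNS = {
    "idea": ([], "Noun"), "cat": ([], "Noun"),
    "green": ([], "Adj"), "furious": ([], "Adj"), "colourless": ([], "Adj"),
    "the": (["Noun"], "NP"), "all": (["Noun"], "NP"), "indef": (["Noun"], "NP"),
    "sleep": (["NP"], "S"), "yawn": (["NP"], "S"),
    "adjn": (["Adj", "Noun"], "Noun"), "adjs": (["Adj", "S"], "S"),
}

def generate(cat, depth):
    """All trees of category `cat` whose depth is at most `depth`."""
    if depth <= 0:
        return []
    trees = []
    for fun, (args, result) in FUNS.items():
        if result == cat:
            options = [generate(arg, depth - 1) for arg in args]
            trees.extend((fun, *kids) for kids in product(*options))
    return trees

def flatten(tree):
    """List of function symbols in prefix order."""
    symbol, *children = tree
    return [symbol] + [s for child in children for s in flatten(child)]

def levenshtein(a, b):
    """String edit distance between two lists."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (x != y)))
        previous = current
    return previous[-1]

def candidates(current, cat, depth):
    """All other trees of the same category, most similar first."""
    others = [t for t in generate(cat, depth) if t != current]
    return sorted(others, key=lambda t: levenshtein(flatten(current), flatten(t)))

for tree in candidates(("the", ("idea",)), "NP", 2):
    print(tree)
```

The number of generated trees grows quickly with the depth bound, which illustrates why exhaustive generation is not a satisfying solution.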


Multilingual grammars

Editing operations are translated to operations on the abstract syntax tree, which is language-independent:

- the system shows each translation on a new row
- when the user selects a phrase, the corresponding words are highlighted in the other languages
- the user can change the sentence in any language and the other translations change automagically
- the second language can also be a symbol language such as sign language or Blissymbolics


Theory


Example grammar (1)

    cat      lincat
    Noun     = {s : Num => Str}
    Adj      = {adj, adv : Str}
    NP       = {s : Str; n : Num}
    S        = {s : Str}

    param Num = Sg | Pl


Example grammar (2)

    fun                     lin
    idea,                   = mkNoun "idea"
    cat : Noun              = mkNoun "cat"

    green,                  = mkAdj "green"
    furious,                = mkAdj "furious"
    colourless : Adj        = mkAdj "colourless"

    the,                    = mkDet Sg "the"
    all,                    = mkDet Pl "all"
    indef : Noun -> NP      = mkDet Pl ""

    sleep,                  = mkVerb "sleep"
    yawn : NP -> S          = mkVerb "yawn"


Example grammar (3)

    fun
      adjn : Adj -> Noun -> Noun
      adjs : Adj -> S -> S

    lin
      adjn x y = {s = \\num => x.adj ++ y.s ! num}
      adjs x y = {s = y.s ++ x.adv}

    oper
      mkDet num d = \n -> {s = d ++ n.s ! num; n = num}
      mkAdj a = {adj = a; adv = a+"ly"}
      mkNoun n = {s = mkNum n (n+"s")}
      mkVerb v np = {s = np.s ++ mkNum (v+"s") v ! np.n}
      mkNum sg pl = table {Sg => sg; Pl => pl}


Terms and linearisations

Here are two example terms licensed by the grammar, and their corresponding linearisations:

    〚(sleep (the (idea)))〛
      = 〚sleep〛 (〚the〛 (〚idea〛))
      = 〚sleep〛 ({s = "the" ⧺ 〚idea〛.s ! Sg ; n = Sg})
      = 〚sleep〛 ({s = "the" ⧺ table {Sg ⇒ "idea" ; Pl ⇒ "ideas"} ! Sg ; n = Sg})
      = 〚sleep〛 ({s = "the" ⧺ "idea" ; n = Sg})
      = …
      = {s = "the idea sleeps"}

    〚(sleep (indef (idea)))〛
      = 〚sleep〛 (〚indef〛 (〚idea〛))
      = 〚sleep〛 ({s = "" ⧺ 〚idea〛.s ! Pl ; n = Pl})
      = 〚sleep〛 ({s = "ideas" ; n = Pl})
      = …
      = {s = "ideas sleep"}
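The two derivations can be replayed mechanically. The following Python sketch mimics the example grammar, with records and tables modelled as dictionaries. It is not how GF itself evaluates linearisations, and the helper names (`join`, `mk_noun`, `linearise`, and so on) are made up for this illustration.

```python
def join(*parts):
    """Stand-in for GF's ++ : concatenate tokens with spaces, skipping empty strings."""
    return " ".join(p for p in parts if p)

def mk_num(sg, pl):
    return {"Sg": sg, "Pl": pl}

def mk_noun(n):
    return {"s": mk_num(n, n + "s")}

def mk_adj(a):
    return {"adj": a, "adv": a + "ly"}

def mk_det(num, d):
    # A determiner picks the noun form for its own number and passes that number on.
    return lambda n: {"s": join(d, n["s"][num]), "n": num}

def mk_verb(v):
    # The verb agrees with the number of its subject NP.
    return lambda np: {"s": join(np["s"], mk_num(v + "s", v)[np["n"]])}

LIN = {
    "idea": mk_noun("idea"), "cat": mk_noun("cat"),
    "green": mk_adj("green"), "furious": mk_adj("furious"),
    "colourless": mk_adj("colourless"),
    "the": mk_det("Sg", "the"), "all": mk_det("Pl", "all"), "indef": mk_det("Pl", ""),
    "sleep": mk_verb("sleep"), "yawn": mk_verb("yawn"),
    "adjn": lambda x, y: {"s": {num: join(x["adj"], y["s"][num]) for num in ("Sg", "Pl")}},
    "adjs": lambda x, y: {"s": join(y["s"], x["adv"])},
}

def linearise(tree):
    """Linearise a tree given as a tuple (function_symbol, child, child, ...)."""
    fun, *children = tree
    lin = LIN[fun]
    return lin(*[linearise(child) for child in children]) if children else lin

print(linearise(("sleep", ("the", ("idea",))))["s"])
print(linearise(("sleep", ("indef", ("idea",))))["s"])
```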


Side note: How about parsing?

Does the system parse the sentence into a tree?

- no!

There is no parsing involved.

- the syntax tree is always known
- everything is based on linearisations
  - which is good, since it is a much less difficult problem than parsing


GF terms are syntax trees

GF terms are trees where nodes are labeled with function symbols

I will write v for a subtree in a tree t, and 〚v〛 for its linearisation

Each word in 〚t〛 is introduced by a specific tree node v


    adjs
      furious
      sleep
        indef
          adjn
            colourless
            adjn
              green
              idea

(adjs (furious) (sleep (indef (adjn (colourless) (adjn (green) (idea))))))

colourless green ideas sleep furiously

Constrained linearisation

Let's define the constrained linearisation 〚v〛_t as the string(s) in 〚v〛 that are actually used in 〚t〛

The linearisation of idea is:

    〚idea〛 = {s = table {Sg ⇒ "idea" ; Pl ⇒ "ideas"}}

It has the following constrained linearisations:

    〚idea〛_t = "idea" when t = sleep(the(idea))
    〚idea〛_t = "ideas" when t = sleep(all(idea))
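One way to compute a constrained linearisation is to tag every word with the tree node that introduced it while linearising. The Python sketch below does this for a fragment of the example grammar. The path encoding (a tuple of child indices from the root) and the names `word`, `lin`, `constrained` and `node_of_word` are assumptions made for illustration, not the data structures of the actual system.

```python
def word(w, path):
    """A token list holding one word tagged with the path of its introducing node."""
    return [(w, path)] if w else []

def noun(n):
    return lambda path: {"s": {"Sg": word(n, path), "Pl": word(n + "s", path)}}

def det(num, d):
    return lambda path, n: {"s": word(d, path) + n["s"][num], "n": num}

def verb(v):
    return lambda path, np: {"s": np["s"] + word(v + "s" if np["n"] == "Sg" else v, path)}

# A fragment of the example grammar, enough for the sentences used here.
RULES = {
    "idea": noun("idea"), "cat": noun("cat"),
    "the": det("Sg", "the"), "all": det("Pl", "all"), "indef": det("Pl", ""),
    "sleep": verb("sleep"), "yawn": verb("yawn"),
}

def lin(tree, path=()):
    """Linearise a tree (function_symbol, child, ...), tagging each word with its node path."""
    fun, *kids = tree
    args = [lin(kid, path + (i,)) for i, kid in enumerate(kids)]
    return RULES[fun](path, *args)

def constrained(tree, path):
    """Words of the whole sentence that come from the node at `path` or from below it."""
    return [w for w, p in lin(tree)["s"] if p[:len(path)] == path]

def node_of_word(tree, index):
    """Path of the node that introduced the word at position `index` in the sentence."""
    return lin(tree)["s"][index][1]

t1 = ("sleep", ("the", ("idea",)))
t2 = ("sleep", ("all", ("idea",)))
print(constrained(t1, (0, 0)))  # the form of idea used in t1
print(constrained(t2, (0, 0)))  # the form of idea used in t2
print(node_of_word(t1, 2))      # the node that introduced the third word of t1
```

The same tagging also gives the mapping from a clicked word back to its tree node, so no parsing is needed.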


Calculating the candidate menus

When the user clicks a word w, first find its corresponding node v.

- then generate candidate trees:
  - all subtrees that can replace v, or the parent of v, or the grandparent…
- filter out candidates:
  - if the words are not changed (e.g., singular "fish" to plural "fish")
  - if they are subsumed by other candidates
- "compress" the candidates by only showing the affected words
  - e.g., sleep(the(idea)) vs. yawn(the(idea)): the linearisation is "the idea sleeps", but the only affected word is "sleeps"
- group the candidates by which words in 〚v〛_t are affected (…)
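The "unchanged words" filter and the compression step can be approximated on the word level. The Python sketch below strips the common prefix and suffix of the current sentence and a candidate sentence. This is a simplifying assumption, not necessarily how the system determines the affected words, and `affected` and `menu` are hypothetical names. Subsumption filtering and grouping are left out.

```python
def affected(old, new):
    """Strip the common prefix and suffix of two word lists.

    Returns the differing middle parts as (old_words, new_words).
    """
    start = 0
    while start < len(old) and start < len(new) and old[start] == new[start]:
        start += 1
    end = 0
    while (end < len(old) - start and end < len(new) - start
           and old[-1 - end] == new[-1 - end]):
        end += 1
    return old[start:len(old) - end], new[start:len(new) - end]

def menu(current, candidates):
    """Menu entries (replaced words, new words) for candidate sentences.

    Candidates whose words equal the current sentence are dropped,
    and so are entries that would appear twice.
    """
    entries = []
    for candidate in candidates:
        if candidate == current:
            continue  # the words are not changed, so there is nothing to show
        old, new = affected(current.split(), candidate.split())
        entry = (" ".join(old), " ".join(new))
        if entry not in entries:
            entries.append(entry)
    return entries

print(menu("the idea sleeps",
           ["the idea sleeps", "the idea yawns", "the cat sleeps", "the green idea sleeps"]))
```

An insertion shows up as an entry whose first component is empty, as for "green" in the last candidate.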


Displaying the menus

When the menus are calculated, display them:

- the system shows the first menu
- if the user selects a menu item, the system changes the sentence and the underlying tree
- if the user clicks on an affected word in 〚v〛_t, the system shows the next candidate menu
- if the user clicks on another word, the system calculates new menus


Filtering candidates

Apart from removing candidates that do not affect the selected word, candidates that are too similar to each other are also filtered out:

- suppose you have the original tree t, and candidates c1 and c2
- if dist(t, c1) + dist(c1, c2) ≤ dist(t, c2), then c2 is excluded, since it can easily be selected from c1 in the next step
- this is very slow, so it is currently turned off
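The exclusion rule can be written down directly. The Python sketch below is an illustration under stated assumptions. It uses the flattened Levenshtein distance as `dist`, it skips pairs at distance zero so that identical candidates do not exclude each other, and the name `filter_candidates` is hypothetical.

```python
def flatten(tree):
    """List of function symbols of a tree (function_symbol, child, ...) in prefix order."""
    symbol, *children = tree
    return [symbol] + [s for child in children for s in flatten(child)]

def dist(t1, t2):
    """Levenshtein distance between the flattened trees."""
    a, b = flatten(t1), flatten(t2)
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (x != y)))
        previous = current
    return previous[-1]

def filter_candidates(t, candidates):
    """Keep a candidate c2 unless another candidate c1 lies "on the way" from t to c2."""
    kept = []
    for i, c2 in enumerate(candidates):
        excluded = any(
            j != i
            and dist(c1, c2) > 0  # assumption: identical candidates never exclude each other
            and dist(t, c1) + dist(c1, c2) <= dist(t, c2)
            for j, c1 in enumerate(candidates)
        )
        if not excluded:
            kept.append(c2)
    return kept

t = ("sleep", ("the", ("idea",)))
c1 = ("sleep", ("the", ("cat",)))
c2 = ("yawn", ("the", ("cat",)))
c3 = ("yawn", ("the", ("idea",)))
print(filter_candidates(t, [c1, c2, c3]))  # c2 can be reached via c1 or c3 in a second step
```

Written this way, the filter computes distances for every pair of candidates, which is consistent with the remark that it is very slow.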


Discussion


Summary

- when the user clicks a word/phrase in the sentence, the system calculates similar words/phrases that can replace the word/phrase
- the system is implemented in JavaScript
  - actually, it has been converted to TypeScript
- the theory is grammar- and language-neutral, there are no grammar-specific hacks
  - but some grammars work better than others
- the source code is on GitHub:

https://github.com/heatherleaf/muste


Future work (i.e., wish list)

Implementation

- I want a smarter way of generating candidates
- I want working tablet/smartphone apps
  - language trainer, communication aid, text message editor

Evaluation

- I want to evaluate the apps on real users
- I want a method for evaluating on text corpora

Theory

- I want to make a corpus-based variant, where candidates are selected from corpus phrases
