Gaiku: Generating Haiku with Word Associations Norms. Yael Netzer, David Gabay, Yoav Goldberg and Michael Elhadad. Department of Computer Science, Ben Gurion University of the Negev, Israel. CALC’09, May 35th 2009


Page 1:

Gaiku: Generating Haiku with Word Associations Norms

Yael Netzer, David Gabay, Yoav Goldberg and Michael Elhadad
Department of Computer Science
Ben Gurion University of the Negev, Israel

CALC’09, May 35th 2009

Page 2:

Creativity
“the forming of associative elements into new combinations which either meet specified requirements or are in some way useful…”

[Mednick 1969]

Three main paths to a creative solution:
- serendipity
- similarity
- mediation

Page 3:

Poetry

Computational Creativity

WAN

Generating Haiku!

Page 4:

Haiku

Page 5:

Haiku
• Form of poetry that originated in Japan, 16th century
• Three lines of 5, 7, 5 phonetic units (mora)
• Use present tense and use no judgmental words
• Adopted in Western languages, 20th century
• 5, 7, 5 → 3 short lines
• Traditionally, reference to nature and seasons, but modern Haiku are not restricted

Basho Haiku

古池や蛙飛込む水の音
old pond . . .
a frog leaps in
water’s sound

Page 6:

iced over pond
I skip a rock
the entire width

blossomless
but not unloved
the old magnolia

fishing guides
boat in the background
a new trip

a holy cow
a carton of milk
seeking a church

blind snakes
on the wet grass
tombstoned terror

first date —
the little pile
of anchovies

Page 7:

Poetry Generation

Page 8:

Body

Soul

Page 9:

Body: Structure

Soul

Page 10:

Body: Structure

Soul: Content

Page 11:

Body: 3 lines, Grammatical, Haiku-like

Soul: Inspiring, Interesting, Intriguing, Joyful, …

Page 12:

Previous works

- Manurung [2003]
- Manurung et al. [2000]
- Gervas [2001]

Emphasis on Structure, less on Content

Page 13:

Body / Structure

• Haiku Corpus
– ~3,500 Haiku in English
– Various sources
• amateurish sites
• children’s writings
• translations of classic Japanese Haiku of Basho and others
• ’official’ sites of Haiku Associations (e.g., Haiku Path - Haiku Society of America).

Page 14:

Body / Structure

POS Tag → Count

Line 1 Patterns:
280 JJ NN
276 NN NN
...

Line 2 Patterns:
64 DT_the JJ NN

Line 3 Patterns:
…

Pattern Transitions:
P(line2==DT_the NN | line1==JJ NN) = ...

Example tagged Haiku:

NN IN_of NNP
DT_a NN IN_of
NNS

NN NN
NNS CC NNS
IN_on DT_a NN NN
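
A minimal Python sketch of how the per-line pattern counts and the pattern transition probabilities on this slide could be collected from a POS-tagged Haiku corpus. The three-poem toy corpus and the function names are illustrative assumptions, not the authors' data or code.

```python
from collections import Counter, defaultdict

def pattern_stats(tagged_haiku):
    """Count POS patterns per line position and estimate P(line2 | line1).

    tagged_haiku: list of poems, each a list of three POS-pattern strings.
    """
    line_counts = [Counter(), Counter(), Counter()]
    transitions = defaultdict(Counter)  # line-1 pattern -> counts of line-2 patterns
    for poem in tagged_haiku:
        for position, pattern in enumerate(poem):
            line_counts[position][pattern] += 1
        transitions[poem[0]][poem[1]] += 1

    def p_line2_given_line1(line2, line1):
        # Relative frequency of line2 among poems whose first line is line1.
        total = sum(transitions[line1].values())
        return transitions[line1][line2] / total if total else 0.0

    return line_counts, p_line2_given_line1

# Toy corpus (hypothetical): each poem is already reduced to POS patterns.
toy_corpus = [
    ["JJ NN", "DT_the JJ NN", "IN_of NN"],
    ["JJ NN", "DT_the NN", "NN NN"],
    ["NN NN", "NNS CC NNS", "IN_on DT_a NN NN"],
]
line_counts, p21 = pattern_stats(toy_corpus)
print(line_counts[0]["JJ NN"])    # 2
print(p21("DT_the NN", "JJ NN"))  # 0.5
```

With the real corpus, the same counters would yield figures such as "280 JJ NN" for line 1.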

Page 16:

Body / Structure

Google 1T-Web / Project Gutenberg
POS Tagged
match

Pattern:
JJ NNS
DT_a JJ NN
IN_of NN

Line 1 Patterns:
280 JJ NN
276 NN NN
...

Line 2 Patterns:
64 DT_the JJ NN
…

Line 3 Patterns:
…

Pattern Transitions:
P(line2==DT_the NN | line1==JJ NN) = ...
…

Page 17:

Body / Structure

Google 1T-Web / Project Gutenberg
POS Tagged
match

Pattern:
JJ NNS
DT_a JJ NN
IN_of NN

Matched text:
pouring cats
a pilot care
of fighter

Line 1 Patterns:
280 JJ NN
276 NN NN
...

Line 2 Patterns:
64 DT_the JJ NN
…

Line 3 Patterns:
…

Pattern Transitions:
P(line2==DT_the NN | line1==JJ NN) = ...
…

Page 18:

Body / Structure

Line 1 Patterns:
AA BB CC / 12
BB CC DD / 10

Line 2 Patterns:
CC DD EE / 20

Line 3 Patterns:
…

Pattern Transitions:
P(Line2=AA BB | Line1= XX YY)

Google 1T-Web / Project Gutenberg
POS Tagged
match

Grammatical output
Preserves Haiku “Texture”

Pattern:
JJ NNS
DT_a JJ NN
IN_of NN

Matched text:
pouring cats
a pilot care
of fighter
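
The matching step can be sketched roughly as follows: POS-tagged n-grams are indexed by their pattern, and each line of a Haiku pattern is filled from that index. This is only an illustration. The tagged n-grams below are hand-written stand-ins for the tagged n-gram source, and the choice to keep the word for determiners and prepositions (DT_a, IN_of) is an assumption read off the pattern notation on the slides.

```python
from collections import defaultdict

# Assumption: closed-class tags keep their word in the pattern (DT_a, IN_of).
LEXICALIZED_TAGS = {"DT", "IN"}

def to_pattern(tagged_ngram):
    """Map [(word, tag), ...] to a pattern string such as 'DT_a JJ NN'."""
    return " ".join(
        f"{tag}_{word.lower()}" if tag in LEXICALIZED_TAGS else tag
        for word, tag in tagged_ngram
    )

def index_by_pattern(tagged_ngrams):
    """Group n-gram surface strings by their POS pattern."""
    index = defaultdict(list)
    for ngram in tagged_ngrams:
        index[to_pattern(ngram)].append(" ".join(word for word, _ in ngram))
    return index

# Hypothetical tagged n-grams standing in for the POS-tagged n-gram source.
ngrams = [
    [("pouring", "JJ"), ("cats", "NNS")],
    [("a", "DT"), ("pilot", "JJ"), ("care", "NN")],
    [("of", "IN"), ("fighter", "NN")],
    [("old", "JJ"), ("ponds", "NNS")],
]
index = index_by_pattern(ngrams)
haiku_pattern = ["JJ NNS", "DT_a JJ NN", "IN_of NN"]
print([index[line][0] for line in haiku_pattern])
# ['pouring cats', 'a pilot care', 'of fighter']
```

Filling by pattern alone gives grammatical but unrelated lines, which is the gap the "Soul" slides address.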

Page 19:

Soul?

• Requirements: good “story”
– cohesive
– surprising
– provoke feelings/emotions
– metaphorical
– “Should leave the reader wondering…”

… Creative!

Page 20:

Soul?
• An idea: capture “story” seed as sequence of concepts

butterfly, spring, flower
thief, steal, jail
mosquito, blood, vampire

but not any seed will do

cat, feline, claw: too cohesive
computer, coat, queen: too divergent

Page 21:

Soul?

Is WordNet a good soul?

not really

it may give cohesiveness, but bad stories

Page 22:

Soul?

Is WordNet a good soul?

not really

We actually measured it in the Haiku Corpus

Page 23:

Page 24:

Butterfly Spring Flower

• The connection between these words can be reconstructed by humans

• It is not available in WordNet

• Where can we find such relations?

Page 25:

Word Association Norms

Page 26:

Word Association Norms (WAN)

• A collection of cue words, each with a set of free associations (targets) and quantitative and statistical measures.

(mouse: CAT 0.5, RAT 0.08, CHEESE 0.07, HOLE 0.05…)

• Given a cue, collect the immediate response: the first word that comes to mind.

• The largest WAN we know of for English is the University of South Florida Free Association Norms (Nelson et al., 1998). http://w3.usf.edu/FreeAssociation/

• 5,019 cue words and 10,469 additional targets, collected from more than 6,000 participants since 1973.

WAN – weighted directed graph, nodes are stemmed words.
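
As a rough illustration of the weighted directed graph view, a WAN can be held in a nested Python dictionary mapping each cue to its targets and association strengths. The "mouse" row uses the numbers from the slide; the "cat" row and the helper name `strongest_targets` are made up for the example.

```python
# Cue -> {target: association strength}. The "mouse" row is the example from
# the slide; the "cat" row is invented for illustration.
wan = {
    "mouse": {"cat": 0.5, "rat": 0.08, "cheese": 0.07, "hole": 0.05},
    "cat": {"dog": 0.6, "mouse": 0.1},
}

def strongest_targets(graph, cue, k=3):
    """Return the k targets most strongly associated with a cue."""
    targets = graph.get(cue, {})
    return sorted(targets, key=targets.get, reverse=True)[:k]

print(strongest_targets(wan, "mouse"))  # ['cat', 'rat', 'cheese']
# Edges are directed: mouse -> cat and cat -> mouse carry different weights.
print(wan["mouse"]["cat"], wan["cat"]["mouse"])  # 0.5 0.1
```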

Page 27:

[Figure: fragment of the word association graph; visible node labels: water, flower, butterfly, spring, fall, green, bloom]

Page 28:

Why Word Associations

• Added value of WAN: insights on language that are not found in WordNet or are hard to acquire from corpora [Sinopalnikova & Smrz 2004]

• Associative thinking takes part in the process of writing and reading poetry.

• Haiku, being so short, relies on lexical associations for concept progression

Hypothesis: word associations are good catalysts for creativity and can be used as a building block in the creative process of Haiku generation.

Page 29:

We first test this hypothesis by analyzing a corpus of existing Haiku poems.

• Can the creativity of text as reflected in word associations be quantified?

• Are Haiku poems indeed more associative than newswire text or prose?

Page 30:

Two nodes are connected iff one of them is a cue for the other.

Associative distance: number of edges in the shortest path between the words in the associations-graph.

WordNet distance: number of edges in the shortest path between any synset of one word to any synset of the other word.

Associativity of a text: the number of associated word pairs in the text, normalized by the number of word pairs in the text of which both words are in the WAN.

WordNet-relations level: the number of WordNet-related word pairs in the text.
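
The associative distance and associativity definitions can be sketched in Python as below. This is a simplified reading, not the authors' implementation: treating a pair as "associated" when its distance is below a cutoff (`max_dist`, default 3) is an assumption, repeated words are counted once, and the toy graph is invented.

```python
from collections import deque
from itertools import combinations

def undirected(wan):
    """Connect two words iff one of them is a cue for the other."""
    adj = {}
    for cue, targets in wan.items():
        for target in targets:
            adj.setdefault(cue, set()).add(target)
            adj.setdefault(target, set()).add(cue)
    return adj

def associative_distance(adj, source, goal):
    """Number of edges on the shortest path (BFS), or None if unreachable."""
    if source not in adj or goal not in adj:
        return None
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        word, dist = queue.popleft()
        if word == goal:
            return dist
        for nxt in adj[word] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None

def associativity(adj, words, max_dist=3):
    """Associated pairs divided by pairs whose words are both in the WAN."""
    known = [w for w in dict.fromkeys(words) if w in adj]  # dedupe, keep order
    pairs = list(combinations(known, 2))
    if not pairs:
        return 0.0
    hits = 0
    for a, b in pairs:
        d = associative_distance(adj, a, b)
        if d is not None and d < max_dist:  # assumed cutoff for "associated"
            hits += 1
    return hits / len(pairs)

# Hypothetical toy WAN: cue -> {target: strength}.
toy_wan = {
    "butterfly": {"flower": 0.2},
    "flower": {"spring": 0.1, "bloom": 0.3},
    "computer": {"keyboard": 0.4},
}
adj = undirected(toy_wan)
print(associative_distance(adj, "butterfly", "spring"))  # 2
print(associativity(adj, ["butterfly", "spring", "flower", "computer", "sky"]))  # 0.5
```

In the second call "sky" is ignored because it is not in the toy WAN, and three of the six remaining pairs are within the cutoff.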

Page 31:

Average Associativity

We measure the associativity and WordNet-relations levels of 200 of the Haiku in our Haiku Corpus, as well as of random 12-word sequences from Project Gutenberg and from the NANC newswire corpus.

Source   Avg. Assoc. Relations (<3)   Avg. WordNet Relations (<4)
News     0.26                         2.02
Prose    0.22                         1.4
Haiku    0.32                         1.38

Page 32:

Filling body with soul: Theme Selection

• Generating the seed of the story:
– Start with a word
– Random walk on a word graph

Many possible variants. We currently use:
– start with the node of the seed word
– do several short random walks
– keep the resulting word set
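
The theme selection steps above can be sketched as follows. The number of walks, the walk length, and the choice to weight each step by association strength are all assumptions, since the slide only says "several short random walks". The toy graph is invented.

```python
import random

def theme_words(graph, seed_word, walks=5, steps=2, rng=None):
    """Collect the words visited by several short random walks from seed_word.

    graph: cue -> {target: strength}. Weighting steps by strength is an
    assumption; the slide only mentions a random walk on a word graph.
    """
    rng = rng or random.Random()
    visited = set()
    for _walk in range(walks):
        word = seed_word
        for _step in range(steps):
            targets = graph.get(word)
            if not targets:
                break  # dead end: this word is not a cue
            word = rng.choices(list(targets), weights=list(targets.values()))[0]
            visited.add(word)
    visited.discard(seed_word)
    return visited

# Hypothetical toy graph around the seed word "spring".
toy = {
    "spring": {"flower": 0.3, "water": 0.2, "fall": 0.2},
    "flower": {"butterfly": 0.1, "bloom": 0.2},
    "water": {"fall": 0.1},
}
print(theme_words(toy, "spring", rng=random.Random(0)))
```

Passing a seeded `random.Random` makes a run repeatable; the returned set plays the role of the word set kept for the poem.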

Page 33:

[Figure: word association graph fragment; visible node labels: water, flower, butterfly, spring, fall, green, bloom]

Spring {flower, butterfly…}

Page 34:

Filling body with soul

• For a given structure:
– Choose first line containing seed word
– Choose other lines containing a word from the set

• This is adequate, but relations might be straightforward

Searching for a better soul:
– Generate several poems for the pattern
– Rerank them based on associativity measure
– Reranking catches further “residual” relations
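
A small sketch of the generate-then-rerank idea, under stated assumptions: the candidate lines per slot are hand-picked, and `toy_score` is a stand-in that counts word pairs found in a made-up set of associated pairs rather than the associativity measure over a real WAN.

```python
from itertools import product

def candidate_poems(lines_by_slot, seed_word, theme):
    """First line must contain the seed word; other lines a theme word."""
    first = [l for l in lines_by_slot[0] if seed_word in l.split()]
    rest = [
        [l for l in slot if theme & set(l.split())]
        for slot in lines_by_slot[1:]
    ]
    return [list(poem) for poem in product(first, *rest)]

def rerank(poems, score):
    """Sort candidate poems by a scoring function, best first."""
    return sorted(poems, key=score, reverse=True)

# Hypothetical associated pairs standing in for a real WAN lookup.
ASSOCIATED = {frozenset(p) for p in [("pear", "tree"), ("tree", "fall"), ("spices", "fall")]}

def toy_score(poem):
    """Count word pairs in the poem that appear in ASSOCIATED."""
    words = sorted({w for line in poem for w in line.split()})
    return sum(
        frozenset((a, b)) in ASSOCIATED
        for i, a in enumerate(words)
        for b in words[i + 1:]
    )

# Hypothetical candidate lines, one list per line of the pattern.
lines_by_slot = [
    ["pear tree", "pear salad", "old pond"],
    ["a seasoning of spices", "a kind of boots", "a season of tears"],
    ["in the fall", "in the summer"],
]
poems = candidate_poems(lines_by_slot, "pear", {"spices", "tears", "fall", "summer"})
best = rerank(poems, toy_score)[0]
print(len(poems))  # 8
print(best)        # ['pear tree', 'a seasoning of spices', 'in the fall']
```

The constraint step only guarantees one theme word per line; the scoring pass is what rewards additional links between lines, such as "spices" and "fall" here.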

Page 35:

Pattern:
NN NN
DET_a NN of NNS
PP_in DET_the NN

6
alligator pear
a handful of whites
in the spring

8
avocado pear
a kind of boots
in the fall

10
pear salad
a season of tears
in the summer

10
pear tree
a seasoning of spices
in the fall

10
alligator pear
a spring of tears
in the blackness

Page 36:

Evaluation Method

• ‘Turing test’:
– Was this Haiku written by a human or by a computer?
– How would you grade it from 1 to 5?

• Settings:
– AUTO Haiku set: 15 Haiku created by Gaiku without any manual selection, 10 random human Haiku on the same subjects
– SEL set: 17 Haiku created by Gaiku, selected manually out of several runs, 9 award-winning human Haiku

• 52 subjects

Page 37:

Results: AUTO set

Page 38:

Results: SEL set

Page 39:

The Best of Gaiku

• Best in SEL. Classified as human - 77.2%, average grade 3.09

• Best in AUTO. Classified as human - 72.2%, average grade 2.75

early dew
the water contains
teaspoons of honey

cherry tree
poisonous flowers lie
blooming

Page 40:

Conclusions

• Word Association Norms have good potential in creative content generation

Future Work: Lots!
– Haiku: improve theme selection
– Additional forms of creative texts

• Test WAN in general NLP tasks:
– Use WAN for (non-creative) generation
– Word Sense Disambiguation
– Lexical chains
– ‘Guess the word’ given associations (for people with SLI)

Page 41:

iced over pond
I skip a rock
the entire width

blossomless
but not unloved
the old magnolia

first date —
the little pile
of anchovies

fishing guides
boat in the background
a new trip

a holy cow
a carton of milk
seeking a church

blind snakes
on the wet grass
tombstoned terror

Page 42:

iced over pond
I skip a rock
the entire width

blossomless
but not unloved
the old magnolia

fishing guides
boat in the background
a new trip

a holy cow
a carton of milk
seeking a church

blind snakes
on the wet grass
tombstoned terror

first date —
the little pile
of anchovies