Informatics 2A: Language Complexity and the Chomsky Hierarchy

Slides by Bonnie Webber (modified by Stuart Anderson)

September 28, 2010



Review

Chomsky’s Models

Dependency as a measure of Complexity of Language

The Chomsky Hierarchy


Starter 1

Is there a finite state machine that recognises all those strings s from the alphabet {a, b} where the difference between the number of a’s and the number of b’s is less than k, for some constant k?

- True, or

- False?


Starter 2

Is there a finite state machine that recognises all those strings s from the alphabet {a, b} where the difference between the number of a’s and the number of b’s is less than k, for some constant k, in every prefix of s?

A prefix of any string s is a string p such that there is a string q such that s = pq. Note that it is possible that q = ε.

- True, or

- False?
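One way to explore Starter 2 is to treat the running difference between a's and b's as the state of a machine. The sketch below is only an illustration under two assumptions that the slide does not spell out: k is a positive constant fixed before the string is read, and "difference" means the absolute value of (number of a's minus number of b's). The function name `accepts_bounded_prefixes` is made up for this example.

```python
def accepts_bounded_prefixes(s: str, k: int) -> bool:
    """Return True if every prefix of s satisfies |#a - #b| < k.

    Assumption: k is a positive constant chosen before reading s.
    The only thing remembered is the current difference, which must stay
    within -(k-1) .. (k-1), so 2k - 1 live states plus one dead state suffice.
    """
    diff = 0  # current state: (#a read so far) - (#b read so far)
    for symbol in s:
        if symbol == "a":
            diff += 1
        elif symbol == "b":
            diff -= 1
        else:
            return False  # symbol outside the alphabet {a, b}
        if abs(diff) >= k:
            return False  # dead state: this prefix breaks the bound
    return True


print(accepts_bounded_prefixes("abab", 2))  # running differences 1, 0, 1, 0
print(accepts_bounded_prefixes("aab", 2))   # the prefix "aa" has difference 2
```

It is worth comparing this with Starter 1, where nothing limits the difference in the intermediate prefixes.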


Readings and Labs

- J&M [2nd Ed.], ch. 15 (pp. 1–4)

- Kozen: Lecture 21


Languages: Collection and Generation

A formal language is a possibly infinite set of strings over a finite set of symbols (called a vocabulary or lexicon).

Such strings are also called sentences of the language.

Where do the sentences come from?

- from a (finite) list – useful, but not very interesting (maybe more interesting when we have collections of really large samples of speech or text).

- from a grammar – an abstract characterisation of the strings belonging to a language. Grammars are a generative mechanism: they give rules for generating a potentially infinite collection of finite strings.


Different kinds of Language

Programming language: Programmers are given an explicit grammar for the syntactically valid strings of the language that they must adhere to.

Human language: Children hear/see sentences of a language (their “mother tongue” or other languages used at home or in their community) and are sometimes (but not always!) corrected if a string they generate isn’t in the language.

Without being given an explicit grammar, how do children learn a grammar(s) for the infinite number of sentences that belong to the language(s) they speak and understand?


Structure and Meaning

Small red androids sleep quietly. ✓

Colorless green ideas sleep furiously. ✓

Sleep green furiously ideas colorless. ✗

Mary persuaded John to wash himself with lavender soap. ✓

Mary persuaded John to wash herself with lavender soap. ✗

Mary persuaded John to wash her with lavender soap.

Mary promised John to wash herself with lavender soap. ✓

Mary promised John to wash himself with lavender soap. ✗

Mary promised John to wash him with lavender soap.

- Characterising child language acquisition is one goal of Linguistics.

- Characterising language learnability (grammar induction) is one goal of Informatics.


Natural and Formal Languages

More broadly, the goals of Linguistics are to characterise:

- individual languages: figuring out and specifying their sound systems, grammars, and semantics;

- how children learn language and what allows them to do so;

- the social systems of language use;

- how individual languages change over time, and how new languages arise.

Work on formal languages in Informatics contributes to achieving these goals through

- clear computational methods of characterising the complexity of languages;

- clear computational methods for processing languages;

- clear computational theories of language learnability.


Questions

We heard from Lecture 2 that grammars differ in their complexity.

- What is complex about a complex grammar?

- How does adding a data structure to an automaton allow its corresponding grammar to be more complex?

- How does removing limits on how the store on an automaton is accessed allow its corresponding grammar to be more complex?

- Is there any relationship between language complexity and how hard a language is to learn?

Chomsky’s desire to find a “simple and revealing” grammar that generates exactly the sentences of English led him to the discovery that some models of language were more powerful than others. [Noam Chomsky, Three Models for the Description of Language, IRE Transactions on Information Theory 2 (1956), pp. 113–124.]


Noam Chomsky

- Credited with the creation of the theory of generative grammar

- Significant contributions to the field of theoretical linguistics

- Sparked the cognitive revolution in psychology through his review of B.F. Skinner’s Verbal Behavior

- Credited with the establishment of the Chomsky-Schützenberger hierarchy, a classification of formal languages in terms of their generative power


Three Models for the Description of Language

- Linguistic theory attempts to explain the ability of a speaker to produce and understand new sentences, and to reject as ungrammatical other new sequences, on the basis of his limited linguistic experience. [Chomsky 1956, p. 113]

- The adequacy of a linguistic theory can be tested by looking at a grammar for a language constructed according to the theory and seeing if it makes predictions that accord with what’s found in a large corpus of sentences of that language.

- What about what is not found in a large corpus of sentences?

- Chomsky’s paper explores the sort of linguistic theory that is “required as a basis for an English grammar that will describe the set of English sentences in an interesting and satisfying manner”.


Three Models for the Description of Language

For that description to be “interesting and satisfying”, Chomsky felt that a grammar had to be

- finite

- “revealing”, in allowing strings to be associated with meaning (semantics) in a systematic way

The three models he considered were:

1. Grammars based on finite-state Markov processes [Shannon & Weaver 1949, The Mathematical Theory of Communication] – regular grammars

2. Phrase structure grammars reflecting pedagogical ideas of “sentence diagramming”

3. Transformational grammars


Dependency and Complexity

Much of Chomsky’s argument in 3MDL (Three Models for the Description of Language) is based on the notion of dependency:

Suppose $s = a_1 a_2 \ldots a_n$ is a sentence of language L. We say that s has an i-j dependency if, when symbol $a_i$ is replaced with symbol $b_i$, the string is no longer a sentence of L, and when symbol $a_j$ is then replaced by some new symbol $b_j$, the resulting string is a sentence of L.

We’ve already seen such a dependency in English:

Mary persuaded John to wash himself with lavender soap.

John ⇒ Sue
himself ⇒ herself

Mary persuaded Sue to wash herself with lavender soap.
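The definition can be checked mechanically once membership in L can be tested. The sketch below is an illustration only: the function name `shows_dependency`, the two-sentence toy language `TOY`, and the choice to treat words as symbols are all assumptions made for the example, not part of Chomsky's paper or the lecture.

```python
from typing import Callable, Sequence, Tuple


def shows_dependency(
    sentence: Sequence[str],
    i: int,
    j: int,
    b_i: str,
    b_j: str,
    in_language: Callable[[Tuple[str, ...]], bool],
) -> bool:
    """Check one witness (b_i, b_j) for an i-j dependency.

    Positions i and j are 1-indexed, as on the slide. Replacing the i-th
    symbol with b_i must take the string out of the language, and then
    replacing the j-th symbol with b_j must bring it back in.
    """
    if not in_language(tuple(sentence)):
        return False  # the starting string must itself be a sentence
    broken = list(sentence)
    broken[i - 1] = b_i
    if in_language(tuple(broken)):
        return False  # the first replacement did not break the sentence
    broken[j - 1] = b_j
    return in_language(tuple(broken))


# Hypothetical toy language: just the two sentences from the slide.
TOY = {
    tuple("Mary persuaded John to wash himself with lavender soap".split()),
    tuple("Mary persuaded Sue to wash herself with lavender soap".split()),
}

sentence = "Mary persuaded John to wash himself with lavender soap".split()
# Word 3 is "John" and word 6 is "himself".
print(shows_dependency(sentence, 3, 6, "Sue", "herself", lambda t: t in TOY))
```

A realistic test would replace `TOY` with a membership test for a grammar of English, which is exactly what is hard to write down.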


Dependencies don’t need to be binary

- R.D. Laing took this to extremes in Knots, his play on sanity in everyday language.

- “There must be something the matter with him because he would not be acting as he does unless there was therefore he is acting as he is because there is something the matter with him”


Dependency Sets

- If we restrict ourselves to binary dependencies, then for any sentence s we can construct a dependency set $D = \{(i_1, j_1), \ldots, (i_k, j_k)\}$ where each pair is a dependency in s.

- For example: If Mary has persuaded John to wash himself with lavender soap, then he is clean. (dep set size = 4)

- Sentences in the language generated by a regular grammar can have dependencies.

- Consider the regular language described by a regular expression:

$L_0 = (b^* + (ab^*c))^*$

i.e. the language in which every a is eventually followed by a c, and only b’s may intervene.
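Membership in L0 can be tested directly with a regular-expression library. The sketch below uses Python's `re` module, where union is written `|` rather than `+`. As a simplification it uses the pattern `(b|ab*c)*`, which denotes the same language as the expression above but avoids a star nested directly inside a star. The name `in_L0` is made up for the example.

```python
import re

# (b|ab*c)* describes the same strings as (b* + (ab*c))*.
L0_PATTERN = re.compile(r"(?:b|ab*c)*")


def in_L0(s: str) -> bool:
    """Return True if the whole string s belongs to L0."""
    return L0_PATTERN.fullmatch(s) is not None


print(in_L0("bbbabbcbbbabcbbbb"))  # every a is closed by a c
print(in_L0("bbbbbbcbbbabcbbbb"))  # the first c has no a before it
```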


An example: L0

- bbbabbcbbbabcbbbb ∈ L0 is a typical sentence in the language.

- {(4, 7), (11, 13)} is the dependency set for the sentence.

- Suppose we colour the two symbols of each pair in the dependency set the same colour, and allow a colour to be reused for parts of the string after the later symbol of its pair has appeared. How many colours do we need to colour the symbols in sentences of L0?

- bbbabbcbbbabcbbbb uses just one colour.


Limits to Dependencies

- The number of colours we need to colour the dependency set of a sentence gives us a measure of the amount that has to be remembered about earlier symbols to get the dependencies right. If we need k colours then we need to remember k symbols at most at any one time.

- For any regular language R there must exist a constant $k_R$ such that the dependency set for any sentence in the language can be coloured with at most $k_R$ colours.

- What do you make of this claim?
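The colouring convention can be turned into a small calculation. Each pair (i, j) occupies its colour from position i to position j, so the fewest colours needed is the largest number of pairs that are "open" at the same position (a standard fact about colouring intervals). The sketch below assumes that no position belongs to two different pairs; the function name `colours_needed` is made up for this example.

```python
from typing import Iterable, Tuple


def colours_needed(dependency_set: Iterable[Tuple[int, int]]) -> int:
    """Fewest colours for a dependency set under the reuse convention.

    A colour is held from the earlier position of a pair up to and
    including the later one, and is free for reuse after that.
    Assumption: no position appears in more than one pair.
    """
    pairs = [(min(i, j), max(i, j)) for i, j in dependency_set]
    if not pairs:
        return 0
    endpoints = {p for pair in pairs for p in pair}
    # The maximum overlap is always reached at some endpoint.
    return max(
        sum(1 for lo, hi in pairs if lo <= p <= hi) for p in endpoints
    )


print(colours_needed({(4, 7), (11, 13)}))  # the two pairs never overlap
print(colours_needed({(1, 4), (2, 3)}))    # a hypothetical nested set
```

The same function can be applied to whatever dependency sets are proposed for other languages.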


Example 1

- L1 consists of all (and only) sentences over {a, b} containing n a’s followed by n b’s: e.g., ab, aabb, aaabbb, . . ..

- Suggest a dependency set for aaaaaabbbbbb.

- How many colours does it take to colour the dependencies?

- How many colours does it take to colour the dependencies for $a^n b^n$?

- Is this a good example? What would you need to add to improve it?


Example 2

- L2 consists of all (and only) sentences over {a, b} containing a string of a’s and b’s followed by its reverse, $\{\alpha\alpha^R \mid \alpha \in \{a, b\}^*\}$: e.g., aa, bb, abba, baab, abaabbaaabbbbbbaaabbaaba, . . ..

- What is the dependency set for aaaaaaaa?

- How many colours are required to colour this dependency set?

- How many colours does it take to colour the dependency set for $a^{2n}$?


Example 3

- L3 consists of all (and only) sentences over {a, b} containing a string of a’s and b’s followed by the same string over again, $\{\alpha\alpha \mid \alpha \in \{a, b\}^*\}$: e.g., aa, bb, abab, baba, abbabb, abaaba, . . ..

- What is the dependency set for aaaaaaaa?

- How many colours does it take to colour the dependencies?

- How many colours does it take to colour the dependency set for $a^{2n}$?


Questions

- For any string of length 2k in L2, what is its dependency set?

- For any string of length 2k in L3, what is its dependency set?

- Is the dependency set unique for strings in L1? strings in L2? strings in L3?

- For each of the languages L1, L2 and L3, what is the minimum and maximum size of the dependency set for any string of length 2k?

- Give an example language in which some sentences have more than one dependency set.

- Can you devise a language which is regular (i.e. recognisable by a FSM) and whose dependency set needs more than one colour?


The simplest languages – ones that can be described by a regular grammar – need at most a finite number of colours to colour any dependency set in the language.

They are at the lowest rung of the Chomsky Hierarchy.

[Figure: the hierarchy so far: regular grammars]

Are all languages with arbitrarily many dependencies equally complex?


Phrase Structure Grammars

Phrase structure grammars provide a way of analysing sentences very much like some of us were taught to do:

the man took the book

[Sentence [NP the man] [VP [verb took] [NP the book]]]

This is called an “Immediate Constituent Analysis”. It shows a sentence made of a noun phrase (NP) followed by a verb phrase (VP), and a verb phrase made of a verb followed by an NP.

How is phrase structure specified?


A phrase structure grammar consists of

- a finite vocabulary V

- a finite set Σ of initial strings over V

- a finite set of rules of the form X → Y where

  1. X and Y are strings over V

  2. Y is formed from X by replacing one symbol of X with a string over V

  3. Neither the replaced symbol nor the replacing string is empty (ε).
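Conditions 2 and 3 on rules can be checked mechanically. The sketch below is one reading of the definition, not an official formalisation: it represents X and Y as lists of symbols and asks whether some single symbol of X can be replaced by a non-empty string to give Y. The function name `is_ps_rule` is made up for the example.

```python
from typing import Sequence


def is_ps_rule(x: Sequence[str], y: Sequence[str]) -> bool:
    """Return True if y arises from x by replacing exactly one symbol
    of x with a non-empty string of symbols."""
    x, y = list(x), list(y)
    for i in range(len(x)):
        prefix, suffix = x[:i], x[i + 1:]
        # y must be: prefix + (non-empty replacement) + suffix
        if len(y) < len(prefix) + 1 + len(suffix):
            continue
        if y[:len(prefix)] == prefix and y[len(y) - len(suffix):] == suffix:
            return True
    return False


print(is_ps_rule(["S"], ["NP", "VP"]))      # S is replaced by NP VP
print(is_ps_rule(["b", "B"], ["b", "b"]))   # B is replaced in the context b_
print(is_ps_rule(["c", "B"], ["B", "c"]))   # a swap changes two symbols
```

Read strictly, a swap such as cB → Bc changes two symbols at once, so a grammar using it would have to break the swap into several single-symbol replacements to meet this definition literally.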


Context-free Phrase Structure Grammars

Rules of the simplest PS Grammars contain only a single symbol on their left-hand side – e.g.,

Σ: {S}
S → NP VP
VP → verb NP
NP → the man
NP → the book
verb → took

These are called Context-free PSGs or, for short, Context-free Grammars (CFGs).
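Because this small grammar has no recursion, the language it generates is finite and can be listed in full. The sketch below stores the rules in a dictionary and expands each symbol in every possible way; the names `RULES` and `expand` are made up for the example, and the approach would not terminate for a recursive grammar.

```python
from itertools import product
from typing import Iterator, List

# The example grammar: each left-hand side maps to its right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "VP": [["verb", "NP"]],
    "NP": [["the", "man"], ["the", "book"]],
    "verb": [["took"]],
}


def expand(symbol: str) -> Iterator[List[str]]:
    """Yield every word sequence derivable from symbol.

    Symbols with no rule are treated as words of the language.
    Terminates only because this grammar is not recursive.
    """
    if symbol not in RULES:
        yield [symbol]
        return
    for rhs in RULES[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(list(expand(sym)) for sym in rhs)):
            yield [word for part in parts for word in part]


sentences = sorted(" ".join(words) for words in expand("S"))
for sentence in sentences:
    print(sentence)
```

The grammar generates four sentences, including some (such as "the book took the man") that are grammatical but odd in meaning.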


Derivations in CFGs

The sequence of strings over V produced by a sequence of PS rule applications, starting from an initial string, is called a derivation:

S ⇒ NP VP ⇒ NP verb NP ⇒ NP verb the book ⇒ NP took the book ⇒ the man took the book


Dependency and PS Grammars

Some dependencies that are beyond the capability of a regular grammar can be captured by a context-free grammar. Such dependencies are ones that can be generated locally.

Recall L1: all (and only) sentences over {a, b} containing n a’s followed by n b’s.

Here, the presence of a b on the right of the string depends on there being a comparable a on the left.

Simple PSG for generating L1:

V = {a, b, S}
Σ = {S}
PS rules: S → aSb
          S → ab


Derivation

Sample derivation:

S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaaabbbb
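The derivation above follows a fixed recipe: apply S → aSb until enough a's and b's have been produced, then finish with S → ab. A minimal sketch of that recipe, with the made-up function name `derive_anbn`:

```python
from typing import List


def derive_anbn(n: int) -> List[str]:
    """Return the sentential forms in a derivation of a^n b^n.

    Uses S -> aSb exactly n - 1 times, then S -> ab once.
    """
    if n < 1:
        raise ValueError("the grammar only generates a^n b^n for n >= 1")
    form = "S"
    steps = [form]
    for _ in range(n - 1):
        form = form.replace("S", "aSb", 1)  # wrap one more a ... b pair
        steps.append(form)
    form = form.replace("S", "ab", 1)       # close off the derivation
    steps.append(form)
    return steps


print(" ⇒ ".join(derive_anbn(4)))
```

Each a is introduced by the same rule application as its matching b, which is the sense in which the dependency is generated locally.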


Dependency and Complexity Revisited

Are all dependencies local?
Are there dependencies that cannot be captured in a CFG?

[Figure: the hierarchy so far: context-free grammars, regular grammars]

The dependency in L3 = {XX}, where X is a string over {a, b}, cannot be captured by a CFG, nor can the dependency in L4, consisting of all (and only) sentences over {a, b, c} containing a string of n a’s, then n b’s, followed by n c’s – e.g., abc, aabbcc, aaabbbccc, etc.


Context-sensitive PSGs

Phrase structure grammars with rules whose LHS contains more than one symbol are called context-sensitive phrase structure grammars or, simply, context-sensitive grammars.

Simple context-sensitive grammar for generating L4:

V = {a, b, c, S, B}
Σ = {S}
PS rules: S → abc | aSBc
          cB → Bc
          bB → bb


Sample Derivation

Sample derivation:

S ⇒ aSBc ⇒ aaSBcBc ⇒ aaabcBcBc ⇒ aaabBccBc ⇒ aaabbccBc ⇒ aaabbcBcc ⇒ aaabbBccc ⇒ aaabbbccc

Context on the LHS allows for more dependencies and hence more complexity.

[Figure: the hierarchy so far: context-sensitive grammars, context-free grammars, regular grammars]
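The grammar for L4 can be run as a string re-writing system. None of its rules makes a string shorter, so a search for a derivation of a given target can discard any form longer than the target, and only finitely many forms remain. The sketch below uses a breadth-first search; the names `RULES` and `derives` are made up for the example, and each grammar symbol is assumed to be a single character.

```python
from collections import deque

# The rules of the L4 grammar as (left-hand side, right-hand side) pairs.
RULES = [("S", "abc"), ("S", "aSBc"), ("cB", "Bc"), ("bB", "bb")]


def derives(target: str, start: str = "S") -> bool:
    """Return True if target can be derived from start using RULES.

    Breadth-first search over sentential forms. No rule shortens a form,
    so forms longer than the target are pruned and the search is finite.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for lhs, rhs in RULES:
            pos = form.find(lhs)
            while pos != -1:  # try the rule at every position it matches
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= len(target) and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return False


print(derives("aaabbbccc"))  # the string derived on the slide
print(derives("aabbc"))      # unequal counts, so no derivation exists
```

The length-based pruning depends on the rules never shrinking a string, so it would not be safe for arbitrary re-write systems.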


Top of the Chomsky Hierarchy

At the top are arbitrary re-write systems, which can take account of any amount of context on the LHS and re-write any number of symbols. These are called Type 0 grammars.

[Figure: the full hierarchy: Type 0 grammars, context-sensitive grammars, context-free grammars, regular grammars]

This is what is normally called the Chomsky hierarchy.
