a sentence (s) is composed of a noun phrase (np) and a verb phrase (vp). a noun phrase may be...

Download A sentence (S) is composed of a noun phrase (NP) and a verb phrase (VP). A noun phrase may be composed of a determiner (D/DET) and a noun (N). A noun phrase

Post on 01-Jan-2016




1 download

Embed Size (px)


PowerPoint Presentation

A sentence (S) is composed of a noun phrase (NP) and a verb phrase (VP).

A noun phrase may be composed of a determiner (D/DET) and a noun (N).A noun phrase may also be composed of an adjective (ADJ) and a noun (N)

A verb phrase may be composed of a verb (V) and a noun (N) or noun phrase (NP).CMSC 330

English GrammarCMSC 330: Organization of Programming LanguagesContext-Free Grammars2CMSC 3303Program structureSyntax Source code formWhat a program looks likeIn general, syntax is described using grammars.

Semantics Execution behaviorWhat a program does3CMSC 3304MotivationPrograms are just strings of textBut theyre strings that have a certain structureA C program is a list of declarations and definitionsA function definition contains parameters and a bodyA function body is a sequence of statementsA statement is either an expression, an if, a goto, etc.An expression may be assignment, addition, subtraction, etcWe want to solve two problemsWe want to describe programming languages preciselyWe need to describe more than the regular languagesRecall that regular expressions, DFAs, and NFAs are limited in their expressiveness4CMSC 3305Context-Free Grammars (CFGs)A way of generating sets of strings or languagesThey subsume regular expressions (and DFAs and NFAs)There is a CFG that generates any regular language(But regular expressions are a better notation for languages which are regular.)They can be used to describe programming languagesThey (mostly) describe the parsing process5CMSC 3306Simple ExampleS 0 | 1 | 0S | 1S |

This is the same as the regular expression(0|1)*

But CFGs can do a lot more!6CMSC 3307Formal DefinitionA context-free grammar G is a 4-tuple: a finite set of terminal or alphabet symbolsOften written in lowercaseV a finite, nonempty set of nonterminal symbolsOften written in uppercaseSometimes called variablesNo common elements; i.e., it must be that V = P a set of productions of the form V (|V)*Informally this means that the nonterminal can be replaced by the string of zero or more terminals or non-terminals to the right of the S V the start symbol

7CMSC 3308Notational ShortcutsIf S is not specified, assume the left-hand side of the first listed production is the start symbolProductions with the same left-hand sides are usually combined with |If a production has an empty right-hand side it means 8CMSC 3309Informal Definition of AcceptanceA string is accepted by a CFG if there is some path that can be followed starting at the start symbol which generates the string

Example:S 0 | 1 | 0S | 1S |

0101:S 0S 01S 010S 01019CMSC 33010Example: Arithmetic Expressions (Limited)E a | b | c | E+E | E-E | E*E | (E)An expression E is either a letter a, b, or cOr an E followed by + followed by an Eetc.

This describes or generates a set of strings{a, b, c, a+b, a+a, a*c, a-(b*a), c*(b + a), }

Example strings not in the languaged, c(a), a+, b**c, etc.10CMSC 33011Formal Description of ExampleFormally, the grammar we just showed is = { +, -, *, (, ), a, b, c }V = { E }P = { E a, E b, E c, E E-E, E E+E, E E*E, E (E)}S = E11CMSC 33012Uniqueness of GrammarsGrammars are not unique. Different grammars can generate the same set of strings.The following grammar generates the same set of strings as the previous grammar:

E E+T | E-T | TT T*P | PP (E) | a | b | c

12CMSC 33013Another Example GrammarS aS | TT bT | UU cU |

What are some strings in the language?

13CMSC 33014PracticeTry to make a grammar which accepts0*|1*anbnRemember, we couldn't do this with a regex!

Give some example strings from this language:S 0 | 1S

What language is it?14CMSC 33015Backus-Naur FormContext-free grammar production rules are also called Backus-Naur Form or BNFA production like A B c D is written in BNF as ::= c (Non-terminals written with angle brackets and ::= is used instead of )Often used to describe language syntaxJohn BackusChair of the Algol committee in the early 1960sPeter NaurSecretary of the committee, who used this notation to describe Algol in 196215Type 0: Any formal grammarTuring machinesType-1:Linear bounded automataType-2:Pushdown automataType-3: Regular expressionsFinite state automataChomsky HierarchyCategorization of various languages and grammarsEach is strictly more descriptive than the previousFirst described by Noam Chomsky in 1956CMSC 330CMSC 33017Sentential FormsA sentential form is a string of terminals and nonterminals produced from the start symbolInductively:The start symbolIf A is a sentential form for a grammar, where ( and (V|)*), and A is a production, then is a sentential form for the grammarIn this case, we say that A derives in one step, which is written as A 17CMSC 33018Derivations is used to indicate a derivation of one step+ is used to indicate a derivation of one or more steps* indicates a derivation of zero or more stepsExample:S 0|1|0S|1S|0101:S 0S 01S 010S 0101S + 0101S * S18CMSC 33019Language Generated by GrammarA slightly more formal definitionThe language defined by a CFG is the set of all sentential forms made up of only terminals.Example:S 0|1|0S|1S|In language:Not in language:01, 000, 11, 0S, a, 11S, 19CMSC 33020ExampleS aS | TT bT | UU cU | A derivation:S aS aT aU acU acAbbreviated as S + acSo S, aS, aT, aU, acU, ac are all sentential forms for this grammarS T U Is there any derivationS + ccc ? S + Sa ?S + bab ? S + bU ?20CMSC 33021The Language Generated by a CFGThe language generated by a grammar G isL(G) = { | * and S + }(where S is the start symbol of the grammar and is the alphabet for that grammar)I.e., all sentential forms with only terminalsI.e., all strings over that can be derived from the start symbol via one or more productions21CMSC 33022Example (contd)S aS | TT bT | UU cU | Generates what language?Do other grammars generate this language?S ABCA aA | B bB | C cC | So grammars are not unique22CMSC 33023Parse TreesA parse tree shows how a string is produced by a grammarThe root node is the start symbolEach interior node is a nonterminalChildren of node are symbols on r.h.s of production applied to that nonterminalLeaves are all terminal symbolsReading the leaves left-to-right shows the string corresponding to the tree23CMSC 33024ExampleS aS | TT bT | UU cU | S aS aT aU acU ac24CMSC 33025Parse Trees for ExpressionsA parse tree shows the structure of an expression as it corresponds to a grammar E a | b | c | d | E+E | E-E | E*E | (E)aa*cc*(b+d)25CMSC 33026PracticeE a | b | c | d | E+E | E-E | E*E | (E)Make a parse tree fora*ba+(b-c)d*(d+b)-a(a+b)*(c-d)a+(b-c)*d26