Parse tree of Python code with inset tokenization. The syntax of textual programming languages is...


Upload: toradeskijer

Post on 30-May-2018


8/9/2019 Parse Tree of Python Code With Inset TokenizationThe Syntax of Textual Programming Languages is Usually Define…

http://slidepdf.com/reader/full/parse-tree-of-python-code-with-inset-tokenizationthe-syntax-of-textual-programming 1/12

In computer science, the syntax of a programming language is the set of rules that define the combinations of symbols that are considered to be correctly structured programs in that language. The syntax of a language defines its surface form.[1] Text-based programming languages are based on sequences of characters, while visual programming languages are based on the spatial layout and connections between symbols (which may be textual or graphical).

The lexical grammar of a textual language specifies how characters must be chunked into tokens. Other syntax rules specify the permissible sequences of these tokens, and the process of assigning meaning to these token sequences is part of semantics.

The syntactic analysis of source code usually entails the transformation of the linear sequence of tokens into a hierarchical syntax tree (abstract syntax trees are one convenient form of syntax tree). This process is called parsing, as it is in syntactic analysis in linguistics. Tools have been written that automatically generate parsers from a specification of a language grammar written in Backus-Naur form, e.g., Yacc (yet another compiler compiler).
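The two stages described above, chunking characters into tokens and then building a tree from the tokens, can be observed with Python's standard `tokenize` and `ast` modules. This is only an illustrative sketch: the one-line `source` string and its variable names are made up for the example, and Python is used simply because it ships with both tools.

```python
import ast
import io
import tokenize

# Hypothetical one-line program used only for illustration.
source = "total = price * (1 + rate)\n"

# Lexical analysis: chunk the character stream into (kind, text) tokens.
# The trailing NEWLINE and ENDMARKER tokens are dropped for readability.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type not in (tokenize.NEWLINE, tokenize.ENDMARKER)
]
print(tokens)

# Syntactic analysis: turn the linear token sequence into a syntax tree.
tree = ast.parse(source)
print(ast.dump(tree))
```

The token list is flat, while the dumped tree nests the multiplication and the parenthesized addition inside the assignment, which is the hierarchy that parsing recovers.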

Syntax definition

[Figure: Parse tree of Python code with inset tokenization]

The syntax of textual programming languages is usually defined using a combination of regular expressions (for lexical structure) and Backus-Naur Form (for grammatical structure) to inductively specify syntactic categories (nonterminals) and terminal symbols. Syntactic categories are defined by rules called productions, which specify the values that belong to a particular syntactic category.[1] Terminal symbols are the concrete characters or strings of characters (for example keywords such as define, if, let, or void) from which syntactically valid programs are constructed.

Below is a simple grammar, based on Lisp, which defines productions for the syntactic categories expression, atom, number, symbol, and list:

expression ::= atom | list
atom ::= number | symbol
number ::= [+-]?['0'-'9']+
symbol ::= ['A'-'Z''a'-'z'].*
list ::= '(' expression* ')'

This grammar specifies the following:

an expression is either an atom or a list;
an atom is either a number or a symbol;
a number is an unbroken sequence of one or more decimal digits, optionally preceded by a plus or minus sign;
a symbol is a letter followed by zero or more of any characters (excluding whitespace); and
a list is a matched pair of parentheses, with zero or more expressions inside it.

Here the decimal digits, upper- and lower-case characters, and parentheses are terminal symbols.

The following are examples of well-formed token sequences in this grammar: '12345', '()', '(a b c232 (1))'
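As a rough illustration of how such a grammar is used, the sketch below implements it in Python: a regular expression covers the lexical rules (number, symbol, parentheses) and a small recursive-descent function covers the expression and list productions. The names `lex`, `parse_expression`, and `parse` are hypothetical, and one assumption departs from the grammar as written: a symbol is taken to stop at a parenthesis as well as at whitespace, so that a symbol directly followed by ')' is not swallowed into one token.

```python
import re

# Lexical grammar: one alternative per token kind.
# Assumption: symbols stop at whitespace *and* at parentheses.
TOKEN_RE = re.compile(r"""
    (?P<number>[+-]?[0-9]+)
  | (?P<symbol>[A-Za-z][^\s()]*)
  | (?P<lparen>\()
  | (?P<rparen>\))
  | (?P<space>\s+)
""", re.VERBOSE)


def lex(text):
    """Chunk the characters of `text` into (kind, value) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "space":  # whitespace only separates tokens
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse_expression(tokens, i=0):
    """expression ::= atom | list. Returns (tree, index of next token)."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, value = tokens[i]
    if kind == "number":
        return int(value), i + 1
    if kind == "symbol":
        return value, i + 1
    if kind == "lparen":
        # list ::= '(' expression* ')'
        items, i = [], i + 1
        while i < len(tokens) and tokens[i][0] != "rparen":
            item, i = parse_expression(tokens, i)
            items.append(item)
        if i >= len(tokens):
            raise SyntaxError("missing ')'")
        return items, i + 1
    raise SyntaxError("unexpected ')'")


def parse(text):
    """Parse exactly one expression; nested Python lists model Lisp lists."""
    tokens = lex(text)
    tree, i = parse_expression(tokens)
    if i != len(tokens):
        raise SyntaxError("trailing tokens after expression")
    return tree


for example in ["12345", "()", "(a b c232 (1))"]:
    print(example, "->", parse(example))
```

Each of the three well-formed examples parses, while an unbalanced input such as `(a b` raises `SyntaxError`.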

The grammar needed to specify a programming language can be classified by its position in the Chomsky hierarchy. The syntax of most programming languages can be specified using a Type-2 grammar, i.e., they are context-free grammars.[2] However, there are exceptions. In some languages, such as Perl and Lisp, the specification (or implementation) of the language allows constructs that execute during the parsing phase. Furthermore, these languages have constructs that allow the programmer to alter the behavior of the parser. This combination effectively blurs the distinction between parsing and execution, and makes syntax analysis an undecidable problem in these languages, meaning that the parsing phase may not finish. For example, in Perl it is possible to execute code during parsing using a BEGIN statement, and Perl function prototypes may alter the syntactic interpretation, and possibly even the syntactic validity, of the remaining code.[3] Similarly, Lisp macros introduced by the defmacro syntax also execute during parsing, meaning that a Lisp compiler must have an entire Lisp run-time system present. In contrast, C macros are merely string replacements, and do not require code execution.[4][5]

Syntax versus semantics

The syntax of a language describes the form of a valid program, but does not provide any information about the meaning of the program or the results of executing that program. The meaning given to a combination of symbols is handled by semantics (either formal or hard-coded in a reference implementation). Not all syntactically correct programs are semantically correct. Many syntactically correct programs are nonetheless ill-formed, per the language's rules; and may (depending on the language specification and the soundness of the implementation) result in an error on translation or execution. In some cases, such programs may exhibit undefined behavior. Even when a program is well-defined within a language, it may still have a meaning that is not intended by the person who wrote it.
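The distinction can be made concrete with a small Python sketch (Python is chosen here only for illustration, and the `source` string is invented for the example). The first statement is accepted by the parser but fails when executed, whereas the second string is rejected by the parser before anything runs.

```python
import ast

# Syntactically valid, but adding an int to a str has no defined meaning.
source = "result = 1 + 'one'\n"

# Parsing succeeds: the text matches Python's grammar.
tree = ast.parse(source)

# Execution fails: the error is semantic, not syntactic.
try:
    exec(compile(tree, "<example>", "exec"))
    outcome = "ran without error"
except TypeError as exc:
    outcome = f"TypeError: {exc}"
print(outcome)

# A string that violates the grammar is rejected during parsing.
try:
    ast.parse("result = 1 +\n")
    syntax_ok = True
except SyntaxError:
    syntax_ok = False
print("second snippet parses:", syntax_ok)
```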

Using natural language as an example, it may not be possible to assign a meaning to a grammatically correct sentence or the sentence may be false:

"Colorless green ideas sleep furiously." is grammatically well-formed but has no generally accepted meaning.
"John is a married bachelor." is grammatically well-formed but expresses a meaning that cannot be true.

The following C language fragment is syntactically correct, but performs an operation that is not semantically defined (because p is a null pointer, the operations p->real and p->im have no meaning):

complex *p = NULL;
complex abs_p = sqrt (p->real * p->real + p->im * p->im);

References

[1] Friedman, Daniel P.; Mitchell Wand; Christopher T. Haynes (1992). Essentials of Programming Languages (1st ed.). The MIT Press. ISBN 0-262-06145-7.
[2] Michael Sipser (1997). Introduction to the Theory of Computation. PWS Publishing. ISBN 0-534-94728-X. Section 2.2: Pushdown Automata, pp. 101–114.
[3] The following discussions give examples: Perl and Undecidability; LtU comment clarifying that the undecidable problem is membership in the class of Perl programs; chromatic's example of Perl code that gives a syntax error depending on the value of random variable.
[4] http://www.apl.jhu.edu/~hall/Lisp-Notes/Macros.html
[5] http://cl-cookbook.sourceforge.net/macros.html

Retrieved from "http://en.wikipedia.org/wiki/Syntax_(programming_languages)"


This page was last modified on 14 June 2010 at 15:38.

Text is available under the Creative Commons Attribution-ShareAlike L
