a second course in formal languages and automata...

9
A Second Course in Formal Languages and Automata Theory Intended for graduate students and advanced undergraduates in computer science, A Second Course in Formal Languages and Automata Theory treats topics in the theory of computation not usually covered in a first course. After a review of basic concepts, the book covers combinatorics on words, regular languages, context-free languages, parsing and recognition, Turing machines, and other language classes. Many topics often absent from other textbooks, such as repetitions in words, state complexity, the interchange lemma, 2DPDAs, and the incompressibility method, are covered here. The author places particular emphasis on the resources needed to represent certain languages. The book also includes a diverse collection of more than 200 exercises, suggestions for term projects, and research problems that remain open. jeffrey shallit is professor in the David R. Cheriton School of Computer Science at the University of Waterloo. He is the author of Algorithmic Number Theory (co- authored with Eric Bach) and Automatic Sequences: Theory, Applications, Generaliza- tions (coauthored with Jean-Paul Allouche). He has published approximately 90 articles on number theory, algebra, automata theory, complexity theory, and the history of math- ematics and computing. © Cambridge University Press www.cambridge.org Cambridge University Press 978-0-521-86572-2 - A Second Course in Formal Languages and Automata Theory Jeffrey Shallit Frontmatter More information

Upload: dinhbao

Post on 25-Apr-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Second Course in Formal Languages and Automata Theoryassets.cambridge.org/97805218/65722/frontmatter/9780521865722... · A Second Course in Formal Languages and Automata ... Second

A Second Course in Formal Languages and Automata Theory

Intended for graduate students and advanced undergraduates in computer science, ASecond Course in Formal Languages and Automata Theory treats topics in the theoryof computation not usually covered in a first course.

After a review of basic concepts, the book covers combinatorics on words, regularlanguages, context-free languages, parsing and recognition, Turing machines, and otherlanguage classes. Many topics often absent from other textbooks, such as repetitionsin words, state complexity, the interchange lemma, 2DPDAs, and the incompressibilitymethod, are covered here. The author places particular emphasis on the resources neededto represent certain languages. The book also includes a diverse collection of more than200 exercises, suggestions for term projects, and research problems that remain open.

jeffrey shallit is professor in the David R. Cheriton School of Computer Scienceat the University of Waterloo. He is the author of Algorithmic Number Theory (co-authored with Eric Bach) and Automatic Sequences: Theory, Applications, Generaliza-tions (coauthored with Jean-Paul Allouche). He has published approximately 90 articleson number theory, algebra, automata theory, complexity theory, and the history of math-ematics and computing.

© Cambridge University Press www.cambridge.org

Cambridge University Press978-0-521-86572-2 - A Second Course in Formal Languages and Automata TheoryJeffrey ShallitFrontmatterMore information

Page 2: A Second Course in Formal Languages and Automata Theoryassets.cambridge.org/97805218/65722/frontmatter/9780521865722... · A Second Course in Formal Languages and Automata ... Second

A Second Course in Formal Languages andAutomata Theory

JEFFREY SHALLITUniversity of Waterloo

© Cambridge University Press www.cambridge.org

Cambridge University Press978-0-521-86572-2 - A Second Course in Formal Languages and Automata TheoryJeffrey ShallitFrontmatterMore information

Page 3: A Second Course in Formal Languages and Automata Theoryassets.cambridge.org/97805218/65722/frontmatter/9780521865722... · A Second Course in Formal Languages and Automata ... Second

cambridge university pressCambridge, New York, Melbourne, Madrid, Cape Town, Singapore, Sao Paulo, Delhi

Cambridge University Press32 Avenue of the Americas, New York, NY 10013-2473, USA

www.cambridge.orgInformation on this title: www.cambridge.org/9780521865722

C© Jeffrey Shallit 2009

This publication is in copyright. Subject to statutory exceptionand to the provisions of relevant collective licensing agreements,

no reproduction of any part may take place withoutthe written permission of Cambridge University Press.

First published 2009

Printed in the United States of America

A catalog record for this publication is available from the British Library.

Library of Congress Cataloging in Publication Data

Shallit, Jeffrey Outlaw.A second course in formal languages and automata theory / Jeffrey Shallit.

p. cm.Includes bibliographical references and index.

ISBN 978-0-521-86572-2 (hardback)1. Formal languages. 2. Machine theory. I. Title.

QA267.3.S53 2009005.13′1–dc22 2008030065

ISBN 978-0-521-86572-2 hardback

Cambridge University Press has no responsibility forthe persistence or accuracy of URLs for external or

third-party Internet Web sites referred to in this publicationand does not guarantee that any content on such

Web sites is, or will remain, accurate or appropriate.

© Cambridge University Press www.cambridge.org

Cambridge University Press978-0-521-86572-2 - A Second Course in Formal Languages and Automata TheoryJeffrey ShallitFrontmatterMore information

Page 4: A Second Course in Formal Languages and Automata Theoryassets.cambridge.org/97805218/65722/frontmatter/9780521865722... · A Second Course in Formal Languages and Automata ... Second

Contents

Preface page ix

1 Review of formal languages and automata theory 11.1 Sets 11.2 Symbols, strings, and languages 11.3 Regular expressions and regular languages 31.4 Finite automata 41.5 Context-free grammars and languages 81.6 Turing machines 131.7 Unsolvability 171.8 Complexity theory 191.9 Exercises 211.10 Projects 261.11 Research problems 261.12 Notes on Chapter 1 26

2 Combinatorics on words 282.1 Basics 282.2 Morphisms 292.3 The theorems of Lyndon–Schutzenberger 302.4 Conjugates and borders 342.5 Repetitions in strings 372.6 Applications of the Thue–Morse sequence

and squarefree strings 402.7 Exercises 432.8 Projects 472.9 Research problems 472.10 Notes on Chapter 2 47

v

© Cambridge University Press www.cambridge.org

Cambridge University Press978-0-521-86572-2 - A Second Course in Formal Languages and Automata TheoryJeffrey ShallitFrontmatterMore information

Page 5: A Second Course in Formal Languages and Automata Theoryassets.cambridge.org/97805218/65722/frontmatter/9780521865722... · A Second Course in Formal Languages and Automata ... Second

vi Contents

3 Finite automata and regular languages 493.1 Moore and Mealy machines 493.2 Quotients 523.3 Morphisms and substitutions 543.4 Advanced closure properties of regular languages 583.5 Transducers 613.6 Two-way finite automata 653.7 The transformation automaton 713.8 Automata, graphs, and Boolean matrices 733.9 The Myhill–Nerode theorem 773.10 Minimization of finite automata 813.11 State complexity 893.12 Partial orders and regular languages 923.13 Exercises 953.14 Projects 1053.15 Research problems 1053.16 Notes on Chapter 3 106

4 Context-free grammars and languages 1084.1 Closure properties 1084.2 Unary context-free languages 1114.3 Ogden’s lemma 1124.4 Applications of Ogden’s lemma 1144.5 The interchange lemma 1184.6 Parikh’s theorem 1214.7 Deterministic context-free languages 1264.8 Linear languages 1304.9 Exercises 1324.10 Projects 1384.11 Research problems 1384.12 Notes on Chapter 4 138

5 Parsing and recognition 1405.1 Recognition and parsing in general grammars 1415.2 Earley’s method 1445.3 Top-down parsing 1525.4 Removing LL(1) conflicts 1615.5 Bottom-up parsing 1635.6 Exercises 1725.7 Projects 1735.8 Notes on Chapter 5 173

© Cambridge University Press www.cambridge.org

Cambridge University Press978-0-521-86572-2 - A Second Course in Formal Languages and Automata TheoryJeffrey ShallitFrontmatterMore information

Page 6: A Second Course in Formal Languages and Automata Theoryassets.cambridge.org/97805218/65722/frontmatter/9780521865722... · A Second Course in Formal Languages and Automata ... Second

Contents vii

6 Turing machines 1746.1 Unrestricted grammars 1746.2 Kolmogorov complexity 1776.3 The incompressibility method 1806.4 The busy beaver problem 1836.5 The Post correspondence problem 1866.6 Unsolvability and context-free languages 1906.7 Complexity and regular languages 1936.8 Exercises 1986.9 Projects 2006.10 Research problems 2006.11 Notes on Chapter 6 200

7 Other language classes 2027.1 Context-sensitive languages 2027.2 The Chomsky hierarchy 2127.3 2DPDAs and Cook’s theorem 2137.4 Exercises 2217.5 Projects 2227.6 Research problems 2227.7 Notes on Chapter 7 223

Bibliography 225Index 233

© Cambridge University Press www.cambridge.org

Cambridge University Press978-0-521-86572-2 - A Second Course in Formal Languages and Automata TheoryJeffrey ShallitFrontmatterMore information

Page 7: A Second Course in Formal Languages and Automata Theoryassets.cambridge.org/97805218/65722/frontmatter/9780521865722... · A Second Course in Formal Languages and Automata ... Second

Preface

Goals of this bookThis is a textbook for a second course on formal languages and automata theory.

Many undergraduates in computer science take a course entitled “Introduc-tion to Theory of Computing,” in which they learn the basics of finite automata,pushdown automata, context-free grammars, and Turing machines. However,few students pursue advanced topics in these areas, in part because there is noreally satisfactory textbook.

For almost 20 years I have been teaching such a second course for fourth-year undergraduate majors and graduate students in computer science at theUniversity of Waterloo: CS 462/662, entitled “Formal Languages and Parsing.”For many years we used Hopcroft and Ullman’s Introduction to AutomataTheory, Languages, and Computation as the course text, a book that has provedvery influential. (The reader will not have to look far to see its influence on thepresent book.)

In 2001, however, Hopcroft and Ullman released a second edition of theirtext that, in the words of one professor, “removed all the good parts.” In otherwords, their second edition is geared toward second- and third-year students,and omits nearly all the advanced topics suitable for fourth-year students andbeginning graduate students.

Because the first edition of Hopcroft and Ullman’s book is no longer easilyavailable, and because I have been regularly supplementing their book with myown handwritten course notes, it occurred to me that it was a good time to writea textbook on advanced topics in formal languages. The result is this book.

The book contains many topics that are not available in other textbooks. Toname just a few, it addresses the Lyndon–Schutzenberger theorem, Thue’s re-sults on avoiding squares, state complexity of finite automata, Parikh’s theorem,the interchange lemma, Earley’s parsing method, Kolmogorov complexity, andCook’s theorem on the simulation of 2DPDAs. Furthermore, some well-known

ix

© Cambridge University Press www.cambridge.org

Cambridge University Press978-0-521-86572-2 - A Second Course in Formal Languages and Automata TheoryJeffrey ShallitFrontmatterMore information

Page 8: A Second Course in Formal Languages and Automata Theoryassets.cambridge.org/97805218/65722/frontmatter/9780521865722... · A Second Course in Formal Languages and Automata ... Second

x Preface

theorems have new (and hopefully simpler) proofs. Finally, there are almost200 exercises to test students’ knowledge. I hope this book will prove useful toadvanced undergraduates and beginning graduate students who want to dig alittle deeper in the theory of formal languages.

PrerequisitesI assume the reader is familiar with the material contained in a typical first coursein the theory of computing and algorithm design. Because not all textbooksuse the same terminology and notation, some basic concepts are reviewed inChapter 1.

Algorithm descriptionsAlgorithms in this book are described in a “pseudocode” notation similar toPascal or C, which should be familiar to most readers. I do not provide a formaldefinition of this notation. The readers should note that the scope of loops isdenoted by indentation.

Proof ideasAlthough much of this book follows the traditional theorem/proof style, it doeshave one nonstandard feature. Many proofs are accompanied by “proof ideas,”which attempt to capture the intuition behind the proofs. In some cases, proofideas are all that is provided.

Common errorsI have tried to point out some common errors that students typically make whenencountering this material for the first time.

ExercisesThere are a wide variety of exercises, from easy to hard, in no particular order.Most readers will find the exercises with one star challenging and exerciseswith two stars very challenging.

ProjectsEach chapter has a small number of suggested projects that are suitable for termpapers. Students should regard the provided citations to the literature only asstarting points; by tracing forward and backward in the citation history, manymore papers can usually be found.

Research problemsEach chapter has a small number of research problems. Currently, no one knowshow to solve these problems, so if you make any progress, please contact me.

© Cambridge University Press www.cambridge.org

Cambridge University Press978-0-521-86572-2 - A Second Course in Formal Languages and Automata TheoryJeffrey ShallitFrontmatterMore information

Page 9: A Second Course in Formal Languages and Automata Theoryassets.cambridge.org/97805218/65722/frontmatter/9780521865722... · A Second Course in Formal Languages and Automata ... Second

Preface xi

AcknowledgmentsOver the last 18 years, many students in CS 462/662 at the University of Wa-terloo have contributed to this book by finding errors and suggesting betterarguments. I am particularly grateful to the following students (in alphabet-ical order): Margareta Ackerman, Craig Barkhouse, Matthew Borges, TracyDamon, Michael DiRamio, Keith Ellul, Sarah Fernandes, Johannes Franz,Cristian Gaspar, Ryan Golbeck, Mike Glover, Daniil Golod, Ryan Hadley,Joe Istead, Richard Kalbfleisch, Jui-Yi Kao, Adam Keanie, Anita Koo, DaliaKrieger, David Landry, Jonathan Lee, Abninder Litt, Brendan Lucier, Bob Lutz,Ian MacDonald, Angela Martin, Glen Martin, Andrew Martinez, GyanendraMehta, Olga Miltchman, Siamak Nazari, Sam Ng, Lam Nguyen, MatthewNichols, Alex Palacios, Joel Reardon, Joe Rideout, Kenneth Rose, JessicaSocha, Douglas Stebila, Wenquan Sun, Aaron Tikuisis, Jim Wallace, XiangWang, Zhi Xu, Colin Young, Ning Zhang, and Bryce Zimny.

Four people have been particularly influential in the development of thisbook, and I single them out for particular thanks: Jonathan Buss, who taughtfrom this book while I was on sabbatical in 2001 and allowed me to use hisnotes on the closure of CSLs under complement; Narad Rampersad, who taughtfrom this book in 2005, suggested many exercises, and proofread the final draft;Robert Robinson, who taught from a manuscript version of this book at theUniversity of Georgia and identified dozens of errors; and Nic Santean, whocarefully read the draft manuscript and made dozens of useful suggestions.All errors that remain, of course, are my responsibility. If you find an error,please send it to me. The current errata list can be found on my homepage;http://www.cs.uwaterloo.ca/~shallit/ is the URL.

This book was written at the University of Waterloo, Ontario, and the Uni-versity of Arizona. This work was supported in part by grants from the NaturalSciences and Engineering Research Council (Canada).

Waterloo, Ontario; January 2008 Jeffrey Shallit

© Cambridge University Press www.cambridge.org

Cambridge University Press978-0-521-86572-2 - A Second Course in Formal Languages and Automata TheoryJeffrey ShallitFrontmatterMore information