-
Data Structures
Theory of Computation
Dept. of Electrical Engineering
National Taiwan University
( Introduction) Data Structures Fall 2019 1 / 46
-
Course Organization
E-mail: [email protected]
Web: http://www.ee.ntu.edu.tw/~yen
Time: 9:10-12:10, Monday
Place: BL 112
Office hours: by appointment
Class web page: http://ccf.ee.ntu.edu.tw/~yen/courses/ds19F.html
-
Instructor
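Hsu-Chun Yen (顏嗣鈞)
• Education
    Ph.D., Univ. of Texas at Austin (CS), 1986
    M.S., Institute of Computer Engineering, National Chiao Tung University, 1982
    B.S., Dept. of Electrical Engineering, National Taiwan University, 1980
• Experience
    Professor, Dept. of Electrical Engineering, National Taiwan University, 1991 - present
    Director, Computer and Information Networking Center, National Taiwan University, 2014 - present
    Chair, Dept. of Electrical Engineering, National Taiwan University, 2010 - 2013
    Associate Professor, Dept. of Electrical Engineering, National Taiwan University, 1990 - 1991
    Assistant Professor, Dept. of Computer Science, Iowa State Univ., USA, 1986 - 1990
• Expertise: design and analysis of algorithms, information visualization, theory of computation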
-
Prerequisites
Familiarity with C, C++, Java, or Python
Textbook: There is no required textbook for this class. Some relevant and useful books:
  • Mark Allen Weiss, Data Structures and Algorithm Analysis in C++ (3rd or 4th Ed.), Addison-Wesley, 2006/2013
  • Peter Brass, Advanced Data Structures, Cambridge, 2008
    https://doi.org/10.1017/CBO9780511800191
    http://www-cs.ccny.cuny.edu/~peter/dstest.html
  • D. Sheehy, Data Structures
    https://donsheehy.github.io/datastructures/
-
Topics
PRELIMINARIES: Introduction. Asymptotic Notations. Algorithm Analysis.
ELEMENTARY STRUCTURES: Arrays vs. Pointers. Abstract data types. Stacks. Queues. Lists. Tree operations. Tree representations. Tree traversals. Binary trees.
SEARCH TREES, BALANCED SEARCH TREES: AVL trees. 2-3-4 trees. B-trees. Red-black trees. Binomial trees. Splay trees. Top-down splay trees. Skip lists, and more.
HEAPS: Priority queues. Binary heaps. Binomial heaps. Fibonacci heaps. Min-max heaps. Leftist heaps. Skew heaps.
HASHING: Hash tables. Chaining. Open addressing. Collision handling. Universal hashing. Perfect hashing.
DISJOINT SET UNION/FIND: Set operations. Set representations. Union-find. Path compression.
SORTING: Insertion sort. Selection sort. Quicksort. Heapsort. Mergesort. Shellsort. Lower bound of sorting.
-
Topics (Cont’d)
DATA STRUCTURES FOR STRINGS: Tries. Compressed tries. Suffix trees. Suffix arrays.
GRAPH DATA STRUCTURES/ALGORITHMS: Graph operations. Graph representations. Basic graph algorithms. Algorithm design techniques.
SUPPLEMENTARY MATERIALS: AMORTIZED ANALYSIS: Amortized complexity. Potential method.
-
Grading
Homework + Programming Assignment(s): 20-30%
Midterm Exam: 30-40%
Final Exam: 30-40%
Academic Integrity: With the exception of group assignments, the work (including homework, programming assignments, tests) must be the result of your individual effort. This implies that one student should never have in his/her possession a copy of all or part of another student's homework. It is your responsibility to protect your work from unauthorized access. Academic dishonesty has no place in a university, in particular, in NTUEE. It wastes our time and yours, and it is unfair to the majority of students. Any form of cheating will automatically result in a failing grade in the course.
-
Data Structures vs. Programming
-
Data Structures, Algorithms, and Programs
Data structure
  • Organization of data to solve the problem at hand
Algorithm
  • Outline, the essence of a computational procedure, step-by-step instructions
Program
  • Implementation of an algorithm in some programming language
-
Overall Picture
Using a computer to help solve problems:
Precisely specifying the problem
Designing programs
  • architecture
  • algorithms
Writing programs
Verifying (testing) programs
-
Algorithmic Solution: Example
-
Insertion Sort
-
Analysis of Algorithms
Efficiency:
  • Running time
  • Space used
Efficiency as a function of input size:
  • Number of data elements (numbers, points)
  • Number of bits in an input number
Very important to choose the level of detail
The RAM model:
  • Instructions (each taking constant time):
      • Arithmetic (add, subtract, multiply, etc.)
      • Data movement (assign)
      • Control (branch, subroutine call, return)
  • Data types: characters, integers, floats ...
-
Analysis of Insertion Sort
Time to compute the running time as a function of the input size
where t_j is the number of comparisons needed for key A[j].
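The slide's pseudocode and running-time formula did not survive extraction. As a minimal sketch, assuming the standard textbook formulation of insertion sort, the following Python function sorts a list and records how many comparisons each key needs. The function name and the returned list of counts are illustrative choices, not from the slides. Indices are 0-based here, whereas the slides use 1-based A[1..n], and only key-to-element comparisons are counted.

```python
def insertion_sort_with_counts(a):
    """Sort list a in place; return the comparison count for each inserted key."""
    counts = []
    for j in range(1, len(a)):         # slides: j = 2..n (1-based)
        key = a[j]
        i = j - 1
        t = 0                          # comparisons made for this key
        while i >= 0:
            t += 1                     # one comparison of key against a[i]
            if a[i] <= key:
                break                  # key is already in place
            a[i + 1] = a[i]            # shift the larger element one slot right
            i -= 1
        a[i + 1] = key
        counts.append(t)
    return counts


already_sorted = [1, 2, 3, 4]
reversed_input = [4, 3, 2, 1]
print(insertion_sort_with_counts(already_sorted))   # one comparison per key
print(insertion_sort_with_counts(reversed_input))   # counts grow with j
```

On sorted input every key needs a single comparison, while on reversed input the key at 0-based index j needs j comparisons, which is the gap between linear and quadratic behavior.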
-
Best/Worst/Average Case
Best case: elements already sorted ⇒ t_j = 1, running time ≈ n, i.e., linear time.
Worst case: elements are sorted in inverse order ⇒ t_j = j, running time ≈ n^2, i.e., quadratic time.
Average case: t_j = j/2, running time ≈ n^2, i.e., quadratic time.
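The three cases follow from summing the per-key comparison counts over j = 2, ..., n. A short worked derivation, assuming the running time is dominated by the total number of comparisons and ignoring constant factors:

```latex
\begin{align*}
\text{Best case } (t_j = 1):\quad
  & \sum_{j=2}^{n} 1 = n - 1 \approx n \\
\text{Worst case } (t_j = j):\quad
  & \sum_{j=2}^{n} j = \frac{n(n+1)}{2} - 1 \approx \frac{n^2}{2} \\
\text{Average case } (t_j = j/2):\quad
  & \sum_{j=2}^{n} \frac{j}{2} = \frac{1}{2}\left(\frac{n(n+1)}{2} - 1\right) \approx \frac{n^2}{4}
\end{align*}
```

The worst and average cases both grow as n^2; only the constant factor differs.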
– For a specific size of input n, investigate running times for different input instances:
-
Best/Worst/Average Case (Cont’d)
– For inputs of all sizes:
-
Data Structure Design
Search for a number k in a set of N numbers
Solution #1: Linear Search
  • Store numbers in an array of size N
  • Iterate through the array until k is found
  • Number of checks
      • Best case: 1 (k = 15)
      • Worst case: N (k = 27)
      • Average case: N/2
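The linear search described above can be sketched as follows. This is a minimal illustration; the function name, the returned check count, and the sample data (chosen so that 15 comes first and 27 last, matching the slide's best and worst cases) are assumptions, since the slide's figure is not available.

```python
def linear_search(numbers, k):
    """Return (index, checks); index is -1 if k is absent."""
    checks = 0
    for i, value in enumerate(numbers):
        checks += 1                     # one check per element examined
        if value == k:
            return i, checks
    return -1, checks


data = [15, 8, 42, 4, 16, 23, 27]       # hypothetical data, N = 7
print(linear_search(data, 15))          # best case: found on the first check
print(linear_search(data, 27))          # worst case: all N elements checked
```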
-
Data Structure Design (Cont’d)
Solution #2: Binary Search Tree (BST)
  • Store numbers in a binary search tree
      • Requires: elements to be kept in sorted order (keys must be comparable)
  • Properties:
      • The left (resp., right) subtree of a node contains only nodes with keys less than (resp., greater than) the node's key
      • Both the left and right subtrees must also be binary search trees
  • Search the tree until k is found
  • Number of checks
      • Best case: 1 (k = 15)
      • Worst case: log2 N (k = 27), provided the tree is balanced
      • Average case: (log2 N)/2
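A minimal sketch of the BST solution, assuming an unbalanced textbook BST with distinct keys. The class and function names are illustrative, and the keys are hypothetical, inserted in an order that happens to produce a balanced tree with 15 at the root and 27 at a leaf.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(root, key):
    """Insert key into the BST rooted at root; return the (possibly new) root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right
        else:
            return root                 # duplicate key: nothing to insert


def bst_search(root, k):
    """Return (found, checks), where checks is the number of nodes visited."""
    checks = 0
    node = root
    while node is not None:
        checks += 1
        if k == node.key:
            return True, checks
        node = node.left if k < node.key else node.right
    return False, checks


root = None
for key in [15, 8, 23, 4, 12, 16, 27]:  # hypothetical keys, N = 7
    root = bst_insert(root, key)

print(bst_search(root, 15))             # best case: the root
print(bst_search(root, 27))             # a leaf of the balanced tree
```

Inserting the same keys in sorted order would produce a chain instead of a balanced tree, and a search could then take N checks. The balanced search trees listed in the course topics address that case.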
-
Should things be Ordered?
-
Example
Problem Artifacts
  • N = 1,000,000,000
      • 1 billion (Walmart transactions in 100 days)
  • 1 GHz processor = 10^9 cycles per second
Solution #1
  • Worst case: 1 billion checks = 10 seconds
Solution #2
  • Worst case: 30 checks = 0.0000003 seconds
Does it matter? N vs. log2 N
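The arithmetic behind these figures can be reproduced with a few lines of Python. The slide's timings are consistent with roughly 10 processor cycles per check; that value is inferred here and is not stated in the slides. The constant names are illustrative.

```python
import math

N = 1_000_000_000              # number of records (from the slide)
CYCLES_PER_SECOND = 10**9      # 1 GHz processor (from the slide)
CYCLES_PER_CHECK = 10          # assumption inferred from the slide's timings

linear_checks = N                          # worst case for linear search
bst_checks = math.ceil(math.log2(N))       # worst case for a balanced BST

linear_seconds = linear_checks * CYCLES_PER_CHECK / CYCLES_PER_SECOND
bst_seconds = bst_checks * CYCLES_PER_CHECK / CYCLES_PER_SECOND

print(bst_checks)        # number of checks for Solution #2
print(linear_seconds)    # seconds for Solution #1
print(bst_seconds)       # seconds for Solution #2
```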
-
Computational Complexity
Computational complexity: an abstract measure of the time and space necessary to execute an algorithm as functions of its input size.
Input size: size of encoded "binary" strings.
  • sort n words of bounded length - input size: n
  • the input is the integer n - input size: log2 n
  • the input is the graph G = (V, E) - input size: |V| and |E|
Runtime comparison: assume 1 BIPS, 1 instruction/op

Time     Big-Oh       n = 10        n = 100          n = 10^4      n = 10^6      n = 10^8
500      O(1)         5*10^-7 sec   5*10^-7 sec      5*10^-7 sec   5*10^-7 sec   5*10^-7 sec
3n       O(n)         3*10^-8 sec   3*10^-7 sec      3*10^-5 sec   0.003 sec     0.3 sec
n lg n   O(n lg n)    3*10^-8 sec   6*10^-7 sec      1*10^-4 sec   0.018 sec     2.5 sec
n^2      O(n^2)       1*10^-7 sec   1*10^-5 sec      0.1 sec       16.7 min      116 days
n^3      O(n^3)       1*10^-6 sec   0.001 sec        16.7 min      31.7 yr       ∞
2^n      O(2^n)       1*10^-6 sec   4*10^11 cent.    ∞             ∞             ∞
n!       O(n!)        0.003 sec     ∞                ∞             ∞             ∞
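The polynomial rows of the table can be roughly reproduced by dividing the operation count by 10^9 operations per second. This is a minimal sketch; the `seconds` helper and the `growth` dictionary are illustrative names, and the 2^n and n! rows are left out because their values quickly become astronomically large.

```python
import math

INSTRUCTIONS_PER_SECOND = 10**9   # 1 BIPS, one instruction per operation


def seconds(op_count):
    """Time in seconds to execute op_count operations at 1 BIPS."""
    return op_count / INSTRUCTIONS_PER_SECOND


growth = {
    "500": lambda n: 500,
    "3n": lambda n: 3 * n,
    "n lg n": lambda n: n * math.log2(n),
    "n^2": lambda n: n ** 2,
    "n^3": lambda n: n ** 3,
}

for name, f in growth.items():
    row = [seconds(f(n)) for n in (10, 100, 10**4, 10**6, 10**8)]
    print(name, ["%.1e s" % t for t in row])
```

For instance, n^2 at n = 10^6 gives 1000 seconds, which is the table's 16.7 minutes.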
-
Can’t Finish the Assigned Task
I can’t find an efficient algorithm, I guess I’m just too dumb.
-
Mission Impossible
I can’t find an efficient algorithm, because no such algorithm is possible.
-
Difficult because ...
I can’t find an efficient algorithm, but neither can all these famous people.
-
Life Cycle of Software Development
-
Correctness of Programs
The program is (totally) correct if for any legal input it terminates and produces the desired output.
Automatic proof of correctness is not possible: it is an unsolvable problem.
But there are practical techniques and rigorous formalisms that help to reason about the correctness of programs.
-
Correctness of Programs (Cont’d)
-
Assertions
To prove partial correctness we associate a number of assertions (statements about the state of the execution) with specific checkpoints in the algorithm.
  • E.g., A[1], ..., A[k] form an increasing sequence
Preconditions: assertions that must be valid before the execution of an algorithm or a subroutine
Postconditions: assertions that must be valid after the execution of an algorithm or a subroutine
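Preconditions and postconditions can be written directly as runtime checks. The sketch below uses binary search purely as an illustration (it is not taken from the slides): the precondition is that the input forms an increasing sequence, and the postcondition ties the returned index to the presence of k. All names are hypothetical, and the checks cost linear time, so they are meant for reasoning and testing rather than production use.

```python
def is_increasing(a):
    """True if a[0] <= a[1] <= ... <= a[-1]."""
    return all(a[i] <= a[i + 1] for i in range(len(a) - 1))


def binary_search(a, k):
    """Return an index of k in the sorted list a, or -1 if k is absent."""
    # Precondition: the input must already be sorted.
    assert is_increasing(a), "precondition violated: input not sorted"

    lo, hi = 0, len(a) - 1
    result = -1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == k:
            result = mid
            break
        if a[mid] < k:
            lo = mid + 1
        else:
            hi = mid - 1

    # Postcondition: result points at k, or k does not occur in a at all.
    if result != -1:
        assert a[result] == k
    else:
        assert k not in a
    return result


print(binary_search([1, 3, 5, 7], 5))
print(binary_search([1, 3, 5, 7], 4))
```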
-
Loop Invariants
Invariants: assertions that are valid any time they are reached (many times during the execution of an algorithm, e.g., in loops)
We must show three things about loop invariants:
  • Initialization: it is true prior to the first iteration
  • Maintenance: if it is true before an iteration, it remains true before the next iteration
  • Termination: when the loop terminates, the invariant gives a useful property to show the correctness of the algorithm
-
Example of Loop Invariants
Invariant: at the start of each iteration of the for loop, A[1..j-1] consists of the elements originally in A[1..j-1], but in sorted order.
Initialization: j = 2; the invariant trivially holds because A[1] is a sorted array.
Maintenance: the inner while loop moves elements A[j-1], A[j-2], ..., A[j-k] one position right without changing their order. Then the former A[j] element is inserted into the k-th position so that A[k-1] ≤ A[k] ≤ A[k+1].
  A[1..j-1] sorted + A[j] ⇒ A[1..j] sorted
Termination: the loop terminates when j = n + 1. The invariant then states that A[1..n] consists of the elements originally in A[1..n], but in sorted order, which is the desired result.
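The invariant can also be checked mechanically while the sort runs. This is a minimal sketch assuming the standard insertion sort with 0-based indices; the function name is illustrative, and Python's built-in `sorted` serves only as an oracle for the assertions.

```python
def insertion_sort_checked(a):
    """Insertion sort that asserts the loop invariant on every iteration."""
    original = list(a)                  # snapshot of the input for the checks
    n = len(a)
    for j in range(1, n):
        # Invariant (0-based): a[0..j-1] holds the original a[0..j-1], sorted.
        assert a[:j] == sorted(original[:j])
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]             # shift larger elements one slot right
            i -= 1
        a[i + 1] = key
    # Termination: the invariant with j = n says the whole list is sorted.
    assert a == sorted(original)
    return a


print(insertion_sort_checked([5, 2, 4, 6, 1, 3]))
```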
-
Data "Structures"
Array: inserting an element requires that you copy all the elements in the array over
Linked List: allows you to make the insertion very quickly
General areas include:
  • Sequential storage (e.g., Lists)
  • Hierarchical storage (e.g., Trees)
  • Adjacency storage (e.g., Graphs)
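The array versus linked-list contrast for insertion can be sketched as follows. This is an illustration under assumptions: a Python list stands in for a fixed array with the shifting done by hand, and `ListNode`, `array_insert`, and `linked_insert_after` are hypothetical names.

```python
class ListNode:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def array_insert(arr, index, value):
    """Insert into a list used as an array, shifting elements by hand."""
    arr.append(None)                            # grow by one slot
    for i in range(len(arr) - 1, index, -1):
        arr[i] = arr[i - 1]                     # copy each later element right
    arr[index] = value


def linked_insert_after(node, value):
    """Insert a new node right after node: constant work, no copying."""
    node.next = ListNode(value, node.next)


arr = [1, 2, 4, 5]
array_insert(arr, 2, 3)                         # shifts 4 and 5 to make room

head = ListNode(1, ListNode(2, ListNode(4)))
linked_insert_after(head.next, 3)               # relinks two pointers only
```

The linked list pays for its fast insertion elsewhere: reaching the insertion point requires walking the list from the head, whereas an array can index any position directly.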
-
Goal
Learn to write efficient and elegant software
How to choose between two algorithms
  • Which to use? bubble-sort, insertion-sort, merge-sort
How to choose appropriate data structures
  • Which to use? array, vector, linked list, binary tree
In this course, we will look at:
  • different techniques for storing, accessing, and modifying information on a computer
  • algorithms which can efficiently solve problems
We will see that all data structures have trade-offs - there is no ultimate data structure...
The choice depends on our requirements
-
Why should you care?
Complex data structures and algorithms are used in every real program
  • Data compression uses trees: MP3, GIF, etc.
  • Networking uses graphs: Routers and telephone networks
  • Security uses complex math algorithms: GCD and large decimals
  • Operating systems use queues and stacks: Scheduling and recursion
Many problems can only be solved using complex data structures and algorithms
-
What this course is NOT about
This course is not about C++
  • Although we will use C++ to implement some of the concepts
This course is not about MATH
  • Although we will use math to formalize many of the concepts
Competency in both math and C++ (or other programming languages) is therefore welcomed.
  • C++: inheritance, overloading, overriding, files, linked-lists, multi-dimensional arrays
  • Math: polynomials, logarithms, inductive proofs, logic
-
The Big Idea
Definition of Abstract Data Type
  • A collection of data along with specific operations that manipulate that data
  • Has nothing to do with a programming language!
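To illustrate the definition, here is a minimal sketch of one abstract data type, a stack, with two different underlying representations. The class and function names are illustrative and not from the slides. The client function `reverse` depends only on the operations `push`, `pop`, and `is_empty`, so it works with either representation.

```python
class ArrayStack:
    """Stack ADT backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, x):
        self._items.append(x)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return not self._items


class LinkedStack:
    """The same stack ADT backed by a chain of (value, next) pairs."""

    def __init__(self):
        self._top = None

    def push(self, x):
        self._top = (x, self._top)

    def pop(self):
        if self._top is None:
            raise IndexError("pop from empty stack")
        x, self._top = self._top
        return x

    def is_empty(self):
        return self._top is None


def reverse(items, stack):
    """Client code written only against the ADT's operations."""
    for x in items:
        stack.push(x)
    out = []
    while not stack.is_empty():
        out.append(stack.pop())
    return out


print(reverse([1, 2, 3], ArrayStack()))
print(reverse([1, 2, 3], LinkedStack()))
```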
Two fundamental goals of algorithm analysis
  • Correctness: Prove that a program works as expected
  • Efficiency: Characterize the run-time of an algorithm
Alternative goals of algorithm analysis
  • Characterize the amount of memory required
  • Characterize the size of a program's code
  • Characterize the readability of a program
  • Characterize the robustness of a program
-
Clever? Efficient?
-
Why study data structures?
Clever ways to organize information in order to enable efficient computation
-
Why so many data structures?
Ideal data structure: fast, elegant, memory efficient
Generates tensions:
  • time vs. space
  • performance vs. elegance
  • generality vs. simplicity
  • one operation's performance vs. another's
Dictionary ADT
  • list
  • binary search tree
  • AVL tree
  • Splay tree
  • Red-Black tree
  • hash table
  • ...
-
An Example: Shortest Path
Given a weighted graph and two vertices u and v, we want to find a path of minimum total weight between u and v.
  • Length of a path is the sum of the weights of its edges.
Applications: Internet packet routing, Flight reservations, Driving directions
-
Dijkstra’s algorithm
-
Example (cont.)
-
Questions
Operations performed?
-
Key steps
-
Straightforward approach
Questions:
  • Is the above efficient?
  • Can we do better?
-
Priority Queues
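The slide's figure is not available, so what follows is a sketch under assumptions rather than the slide's own content. One common use of a priority queue in the shortest-path example is to pick the closest not-yet-finalized vertex in Dijkstra's algorithm. The version below uses Python's standard `heapq` module (a binary heap) and assumes non-negative edge weights; the function name, graph encoding, and example graph are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Return a dict of shortest-path distances from source.

    graph maps each vertex to a list of (neighbor, weight) pairs with
    non-negative weights.
    """
    dist = {source: 0}
    heap = [(0, source)]                  # priority queue keyed by distance
    done = set()
    while heap:
        d, u = heapq.heappop(heap)        # closest vertex not yet finalized
        if u in done:
            continue                      # stale entry: u was finalized earlier
        done.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd              # found a shorter path to v
                heapq.heappush(heap, (nd, v))
    return dist


graph = {                                 # hypothetical example graph
    "u": [("a", 2), ("b", 5)],
    "a": [("b", 1), ("v", 6)],
    "b": [("v", 2)],
    "v": [],
}
print(dijkstra(graph, "u"))
```

Instead of decreasing a key in place, this sketch pushes a fresh entry and skips stale ones when they are popped, which keeps the code short at the cost of a somewhat larger heap.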
-
Questions?