精研课程. 9/17/20152/74 课程基本信息 课程编号: 21190120 上课时间、地点:...
TRANSCRIPT
精研课程
04/19/23 2/74
课程基本信息课程编号: 21190120
上课时间、地点: 2013 年 夏学期周二 (1~2) 曹光彪 西 101 、周四( 1~2 )曹光彪 西 101
考试时间: 7 月 3 日 闭卷
04/19/23 3/74
Office & Homepage
Office :工商楼 215
My Homepage :http://www.cs.zju.edu.cn/people/yedeshi/
04/19/23 4/74
Examination
Grading Polices :1) Homework: 10%
2) Quiz: 15% (May 21)
3) Presentation: 15%
4) Programming Projects: 20% (2 programms)
5) Final Exam: 40%
TA information
史如意 : rolesGuide to group the students
Collect the homework, program, and project
Review of homework, programs
Email: [email protected]
04/19/23 5/74
04/19/23 6/74
The core of TCSAlgorithms and complexity of computation
Computational limits of proof methods
Logic and program verification
The power of randomization
Cryptography
Quantum computation
Distributed computation and communication
Computational learning theory
Important themes in TCSEfficiency common measures: computation time, memory, parallelism, randomness,..
Impossibility results intellectual ancestors: impossibility of perpetual motion, impossibility of trisecting an angle, incompleteness theorem, undecidability, etc.
Approximationapproximately optimal answers, algorithms that work “most of the time”,mathematical characterizations that are approximate (e.g., approximatemax-flow min-cut theorem)
Central role of randomnessrandomized algorithms and protocols, probabilistic encryption,random graph models, probabilistic models of the WWW, etc.
ReductionsNP-completeness and other intractability results (including complexity-based cryptography)
What is the position of algorithms in CS
1. Linguists: what shall we talk to the machines?
2. Algorithms: what is a good method for solving a problem fast on my computer
3. Architects: Can I build a better computer?
4. Sculptors of Machine Intelligence: Can I write a computer program that can find its own solution.
04/19/23 9/74
04/19/23 10/74
Algorithms in Computer ScienceA
lgo
rithm
s
Hardware
Compilers, Programming languages
Machine learning, Statistics, Information retrieval, AI
Networking, Distributed systems, Fault tolerance, Security
Bioinformatics .......
04/19/23 11/74
MIT Undergraduate Programs
04/19/23 12/74
What is algorithm ?( Oxford Dict. ) Algorithm:
A set of rules that must be followed when solving a particular problem.
From Math world A specific set of instructions for carrying out a procedure or solving a problem, usually with the requirement that the procedure terminate at some point.
An algorithm is any well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values, as output.
04/19/23 13/74
Algorithm
Problem definition 问题
Objective 目标 (very important)
Evaluation 算法评价
Methods 方法
04/19/23 14/74
Algorithm evaluation
Quality: how far away from the optimal solution ?
Cost: Running time Space needed
Our goal is to design algorithm with high quality, but in low cost
04/19/23 15/74
Reasonable times
Poly(|I|), Time polynomial in |I|, where |I| is the size of the problem instanceInput size: size(x) of an instance x with rational data is the total number of bits needed for the binary prepresentation.
Time complexity
logarithmic time if T(n) = O(log n).
sub-linear time if T(n) = o(n)
linear time, or O(n) time
linearithmic function: T(n) = O(n log n), quasilinear time if T(n) = O(n logk n)
polynomial time: T(n) = O(nk) for some constant kstrongly polynomial time:
the number of operations in the arithmetic model of computation is bounded by a polynomial in the number of integers in the input instance; and
the space used by the algorithm is bounded by a polynomial in the size of the input.
weakly polynomial time: P but not strongly P
04/19/23 16/74
Time complexity
Quasi-polynomial time: for some fixed c.
Sub-exponential time if T(n) = 2o(n)
Exponential time, if T(n) is upper bounded by 2poly(n)
04/19/23 17/74
((log ) )2cO n
04/19/23 18/74
Hardness of problems
Polynomial (e.g. n2, n log n, n3, n1000).
Quasi-polynomial(e.g.:n log n, n log2n, c log7n).
Sub-exponential (e.g.: 2√n, 5(n0.98)).
Exponential (e.g.: 2n, 8n, n!, nn).
Easy
Hard
04/19/23 19/74
Running timeComputer A is 100 times faster than computer B
Sort n numbersComputer A requires instructions
Computer B requires 50nlgn instructions
n = 1,000, 000Computer A: 2(10^6)^2/10^9 = 2000 seconds
Computer B: 50*10^6 lg 10^6/10^7 ~ 100 seconds
22n
04/19/23 20/74
Running time
10 < 1 s < 1s < 1 s < 1 s < 1 s 4 s
100 < 1 s < 1 s < 1 s 1 s 18 min
year
1,000 < 1 s < 1 s 1 s 18 min
Very long
Very long
10,000 < 1 s < 1 s 2 min 12 day
Very long
Very long
1 s 20 s 12 days
31710 year
Very long
Very long
n2n3nlognn 2n n!
1025
106
04/19/23 21/74
Sorting输入: A sequence of n number
输出:排列( permutation )< a0
1, a02,… ,a0
n >
使得: a01 <=a0
2 <= ... <=a0n
Example:
Input: 8 2 4 9 3 6Output: 2 3 4 6 8 9
< a1,a2, …,an >
04/19/23 22/74
EX. of insertion sort
8 2 4 9 3 6
04/19/23 23/74
8 2 4 9 3 6
2 8 4 9 3 6
EX. of insertion sort
04/19/23 24/74
8 2 4 9 3 6
2 8 4 9 3 6
EX. of insertion sort
04/19/23 25/74
8 2 4 9 3 6
2 8 4 9 3 6
2 4 8 9 3 6
EX. of insertion sort
04/19/23 26/74
8 2 4 9 3 6
2 8 4 9 3 6
2 4 8 9 3 6
EX. of insertion sort
04/19/23 27/74
8 2 4 9 3 6
2 8 4 9 3 6
2 4 8 9 3 6
2 4 8 9 3 6
EX. of insertion sort
04/19/23 28/74
8 2 4 9 3 6
2 8 4 9 3 6
2 4 8 9 3 6
2 4 8 9 3 6
EX. of insertion sort
04/19/23 29/74
8 2 4 9 3 6
2 8 4 9 3 6
2 4 8 9 3 6
2 4 8 9 3 6
2 3 4 8 9 6
EX. of insertion sort
04/19/23 30/74
8 2 4 9 3 6
2 8 4 9 3 6
2 4 8 9 3 6
2 4 8 9 3 6
2 3 4 8 9 6
EX. of insertion sort
04/19/23 31/74
8 2 4 9 3 6
2 8 4 9 3 6
2 4 8 9 3 6
2 4 8 9 3 6
2 3 4 8 9 6
2 3 4 6 8 9 done
EX. of insertion sort
04/19/23 32/74
Insertion sortINSERTION-SORT (A, n) ⊳ A[1 . . n]
for j ← 2 to ndo key ← A[ j]
i ← j – 1while i > 0 and A[i] > key
do A[i+1] ← A[i]i ← i – 1
A[i+1] = key
“pseudocode”
i j
keysortedsorted
AA::1 n
04/19/23 33/74
Analyzing algorithms
Need a computational model
Random-access machine (RAM) modelInstructions are executed one after another. No concurrent operations.
Arithmetic: add, subtract, multiply, divide, remainder, floor, ceiling
Data movement: load, store, copy
Control: conditional/unconditional branch, subroutine call and return.
Each of these instructions takes a constant amount of time.
04/19/23 34/74
Running time
Running time: The running time of an algorithm on a particular input is the number of primitive operations or “steps” executed.line consists only of primitive operations and takes constant time
Input size: number of items the total number of bits.more than one number: Graph
the number of vertices and the number of edges
04/19/23 35/74
The input size of sorting problem is n.Worst-case running time of Insert sort is O(n2).
Example:
04/19/23 36/74
Running time
The running time depends on the input: an already sorted sequence is easier to sort.
Parameterize the running time by the size of the input, since short sequences are easier to sort than long ones.
Generally, we seek upper bounds on the running time, because everybody likes a guarantee.
04/19/23 37/74
Map of Algorithm Design
Off-line problem
Polynomial NP-C problem
Improve cost running time
Exact Algorithm
HeuristicApproximate
Algorithm
Improve cost running time
QualityAppro. ratio
On-line problem
New problem
QualityAppro. ratio
Polynomial
04/19/23 38/74
课程内容1. 数学基础
1.1 算法基础 1.2 和 (SUMS) 集合运算 (Sets) 1.3 特殊数 ( Stirling numbers, Harmonic numbers, Eulerian numbers et al. )
2. 基本算法2.1 分治 ( Divide-and-Conquer ) *
2.1.1 Mergesort * 2.1.2 自然数相乘( Multiplication ) * 2.1.3 矩阵相乘( Matrix multiplication ) 2.1.4 Discrete Fourier transform and Fast Fourier transform
04/19/23 39/74
2.2 动态规划 (Dynamic Programming) 2.2.1 背包问题( Knapsack problem ) 2.2.2 最长递增子序列( Longest increasing subsequence ) 2.2.3 Sequence alignment 2.2.4 最长相同子序列( Longest common subsequence ) 2.3.5 Matrix-chain multiplication 2.3.6 树上的独立集 (Max Independent set in tree)
课程内容
04/19/23 40/74
2.3 贪婪算法 ( Greedy ) – 2.3.1 区间规划( Interval scheduling )– 2.3.2 集合覆盖( Set cover )– 2.3.3 拟阵( Matroids )
• 2.4 NP 问题 ( NP-completeness ) – 2.4.1 The classes P and NP – 2.4.2 NP-completeness and reducibility – 2.4.3 NP-complete problems *
课程内容
04/19/23 41/74
• 2.5 近似算法 (Approximate Algorithm) – 2.5.1 顶点覆盖问题 ( Vertex cover ) – 2.5.2 负载平衡问题 (Load balancing)
– 2.5.3 旅行商问题 (Traveling salesman problem)
– 2.5.4 子集和问题 (Subset sum problem)
课程内容
04/19/23 42/74
• 3. 算法的应用• 3.1 局部搜索 (Local Search)
– 3.1.1 The Metropolis Algorithm and Simulated Annealing
– 3.1.2 Local Search to Hopfield Neural Networks ( Nash Equilibria )
– 3.1.3 Maximum Cut Approximation via Local Search
课程内容
04/19/23 43/74
• 3.2 图论 (Graph Theorem) *– 3.2.1 图论的基本知识 ( Fundamental )– 3.2.2 线性规划 (Linear Programming)
• 网络流( Network Flow ),二分图,完全图的匹配 • 3.3 计算几何学 (Computational Geometry)*
– 3.3.1 基本概念与折线段的性质 (Line-segment ) – 3.3.2 线段的一些性质 (Segments intersects )– 3.3.3 凸包问题 (Convex Hull )– 3.3.4 最近点对问题 (The closet pair of points)– 3.3.5 多边形三角剖分 (Polygon Triangulation)
课程内容
04/19/23 44/74
• 3.4 随机算法 (Randomized Algorithm) – 3.4.1 随机变量与期望 – 3.4.2 A Randomized MAX-3-SAT – 3.4.3 Randomized Divide-and-Conquer
• 3.5 在线算法( Online Algorithm )– 3.5.1 Online Skying– 3.5.2 Online Hiring
课程内容
* :备选内容
04/19/23 45/74
课程内容
04/19/23 46/74
教材 • Textbook:Textbook: Introduction to algorithms Introduction to algorithms,
Second Edition. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. The MIT Press, 2001. ISBN: 0262032937.
• Recommended:Recommended: Algorithm Design.Algorithm Design. Jon Kleinberg, Éva Tardos. Addison Wesley, 2005. ISBN: 0-321-29535-8.
Rolf Nevanlinna Rolf Nevanlinna Prize, 06Prize, 06
04/19/23 47/74
04/19/23 48/74
• Algorithms.Algorithms. S. Dasgupta, C.H. Papadimitriou, and U. V. Vazirani. May 2006.
• Combinatorial Algorithms. Jeff Erickson. University of Illinois, Urbana-Champaign. Lecture Notes. Fall 2002.
• Concrete MathematicsConcrete Mathematics. Ronald L. Graham, Donald E. Knuth, Oren Patashnik. Addison-Wesley Publishing Company, 2005. ISBN: o-201-14236-8.
参考教材
04/19/23 49/74
Algorithms in Computer Science• P = NPP = NP ?
• Can we solve a problem efficiently?
• Tradeoff between quality of solution and the running time– Solve a problem with optimal solution, but it
might cost long time– Solve a problem approximately in short time
04/19/23 50/74
$1,000,000 problem
• P = NPP = NP ? http://www.claymath.org/millennium/
• Solved???!!!!
Algorithms in Computer Science
• Selfish Routing
• Privacy preserve in database
• TSP
• Ad auction
04/19/23 51/74
04/19/23 52/74
Perspective
• Algorithms we can find everywhere
• They have been developed to easy our daily life– Train/Airplane timetable schedule– Routing
• We live in the age of information – Text, numbers, images, video, audio
04/19/23 53/74
Selfish routing• Pigou's Example
Suburb s, a nearby train station t.
Assuming that all drivers aim to minimize the driving time from s to t
s t
C(x) = 1
C(x) = x, with x in [0, 1]
04/19/23 54/74
Selfish routing
• We have good reason to expect all traffic to follow the lower road
• Social optimal? ½ to the long, wide highway, ½ to the lower road. – selfish behavior need not produce a socially
optimal outcome
04/19/23 55/74
Braess's Paradox
s
v
w
t
C(x) = 1
C(x) = xC(x) = 1
C(x) = x
04/19/23 56/74
Braess's Paradox
s
v
w
t
C(x) = 1
C(x) = xC(x) = 1
C(x) = x
C(x) = 0
04/19/23 57/74
Braess's Paradox
• Paradox thus shows that the intuitively helpful action of adding a new zero-cost link can negatively impact all of the traffic!
• With selfish routing, network improvements can degrade network performance.
Link attack example
• Re-identify the medical record of the governor of Massachussetts– MA collects and publishes sanitized medical data for state
employees (microdata) left circle– voter registration list of MA (publicly available data) right circle
• looking for governor’s record• join the tables:
– 6 people had his birth date
– 3 were men
– 1 in his zipcode
• regarding the US 1990 census data– 87% of the population are unique based on (zipcode,
gender, dob)
Privacy in microdata• the role of attributes in microdata
– explicit identifiers are removed
– quasi identifiers can be used to re-identify individuals
– sensitive attributes (may not exist!) carry sensitive information
Name Birthdate
Sex Zipcode Disease
Andre 21/1/79 male 53715 Flu
Beth 10/1/81 female 55410 Hepatitis
Carol 1/10/44 female 90210 Brochitis
Dan 21/2/84 male 02174 Sprained Ankle
Ellen 19/4/72 female 02237 AIDS
identifier
quasi identifiers sensitive
Name Birthdate
Sex Zipcode Disease
Andre 21/1/79 male 53715 FluBeth 10/1/81 female 55410 HepatitisCarol 1/10/44 female 90210 BrochitisDan 21/2/84 male 02174 Sprained
AnkleEllen 19/4/72 female 02237 AIDS
k-anonymity• k-anonymity: intuitively, hide each individual among k-1 others
– each QI set of values should appear at least k times in the released microdata
– linking cannot be performed with confidence > 1/k
– sensitive attributes are not considered (going to revisit this...)
• how to achieve this?– generalization and suppression
– value perturbation is not considered (we should remain truthful to original values )
• privacy vs utility tradeoff– do not anonymize more than necessary
04/19/23 61/74
Advertisement Auction
• Auction – Dutch auction– Vickrey auction
• Ad placement
k-anonymity exampletools for anonymization• generalization
– publish more general values, i.e., given a domain hierarchy, roll-up
• suppression– remove tuples, i.e., do not publish outliers– often the number of suppressed tuples is bounded
Birthdate
Sex Zipcode
21/1/79 male 53715
10/1/79 female 55410
1/10/44 female 90210
21/2/83 male 02274
19/4/82 male 02237
Birthdate
SexZipcode
group 1*/1/79 person 5*****/1/79 person 5****
suppressed
1/10/44 female 90210
group 2*/*/8* male 022***/*/8* male 022**
original microdata 2-anonymous data
04/19/23 63/74
TSP• Trucking company with a central warehouse• Each day, it loads up the truck at the warehouse
and sends it around to several locations to make deliveries.
• At the end of the day, the truck must end up back at the warehouse so that it ready to be loaded for the next day.
• To reduce the costs, the company wants to select an order of delivery stops that yields the lowest overall distance traveled by the truck.
04/19/23 64/74
04/19/23 65/74
04/19/23 66/74
04/19/23 67/74
Pizza delivery
• One can give a call or via internet to order a pizza for dinner
• We want the hot, fresh and tasty pizzas
• How should they delivery the pizzas upon the reception of orders??
• Immediately or wait some minutes for next orders in the near places?
04/19/23 68/74
The Ski problem
• The Ski problem [Karp 92]: A skier must decide every day she goes skiing whether to rent or buy skis, unless or until she decides to buy them. The skiier doesn’t know how many days she will go on skiing before she gets tired of this hobbie. The cost to rent skis for a day is 1 unit, while the cost to buy the skis is B units.
• How can she save money?
04/19/23 69/74
Lost cow problem
• A short-sighted cow (or assume it’s dark, or foggy, or ...) is standing in front of a fence and does not know in which direction the only gate in the fence might be. How can the cow find the gate without walking too great a detour?
• How can two soldiers get together when lost in battlefield ?
04/19/23 70/74
Erdős project – shortest path
• Paul Erdős(1913-1996) has an Erdős number of zero. If the lowest Erdős number of a coauthor is X, then the author's Erdős number is X + 1.
04/19/23 71/74
Nevanlinna Prize winners
• NAME YEAR COUNTRY ERDÖS NUMBER
• Robert Tarjan 1982 USA 2
• Leslie Valiant 1986 Hungary/Gt Brtn 3
• Alexander Razborov 1990 Russia 2
• Avi Wigderson 1994 Israel 2
• Peter Shor 1998 USA 2
• Madhu Sudan 2002 India/USA 2
• Jon Kleinberg 2006 USA 3
• Daniel Alan Spielman 2010 USA
04/19/23 72/74
Other famous people • Albert Einstein 1921 Physics 2 • Chen Ning Yang 1957 Physics 4 • Tsung-dao Lee 1957 Physics 5 • John F. Nash 1994 Economics 4 • Edmund S. Phelps 2006 Economics 4 • Shing-Tung Yau 1982 China 2 • Shiing Shen Chern 1983-84 China 2 • Alan Turing computer science 5 • John von Neumann mathematics 3 • David Hilbert mathematics 4 • Donald E. Knuth 2
Extensions of shortest path
• On k-skip Shortest Paths (SIGMOD 2011)
04/19/23 73/74
04/19/23 74/74
History of Algorithm
• The word algorithm comes from the name of the 9th century Persian mathematician Abu Abdullah Muhammad ibn Musa al-Khwarizmi whose works introduced Arabic numerals and algebraic concepts.
• The word algorism originally referred only to the rules of performing arithmetic using Arabic numerals but evolved into algorithm by the 18th century. The word has now evolved to include all definite procedures for solving problems or performing tasks.
04/19/23 75/74
History – con.
• The first case of an algorithm written for a computer was Ada Byron's notes on the analytical engine written in 1842, for which she is considered by many to be the world's first programmer. However, since Charles Babbage never completed his analytical engine the algorithm was never implemented on it.
• This problem was largely solved with the description of the Turing machine, an abstract model of a computer formulated by Alan Turing, and the demonstration that every method yet found for describing "well-defined procedures" advanced by other mathematicians could be emulated on a Turing machine (a statement known as the Church-Turing thesis).
04/19/23 76/74
Why you come here?
04/19/23 77/74
Requirement • Come to the class (*)
• Ask questions
• Thinking :Why it is ok now ?How about other methods ?
04/19/23 78/74
Kinds of analyses
• Worst-case: (usually)– T(n) = maximum time of algorithm on any input of size
n.
• Average-case: (sometimes)– T(n) = expected time of algorithm over all inputs of
size n.– Need assumption of statistical distribution of inputs.
• Best-case: (bogus) Cheat with a slow algorithm that works fast on
some input.
04/19/23 79/74
Uniform distribution
04/19/23 80/74
Performance Measures for On-line Algorithms
• Competitive ratio
• Max/Max ratio
• Smoothed Competitiveness