Lecture 6 - Stanford University (web.stanford.edu/.../lecture6/lecture6-compressed.pdf)
Lecture 6: Sorting lower bounds and O(n)-time sorting
Announcements
• HW2 due Friday
• HW3 posted Friday
• Please send any OAE letters to Jessica Su (jtysu) ASAP.
Sorting
• We’ve seen a few O(n log(n))-time algorithms.
  • MERGESORT has worst-case running time O(n log(n))
  • QUICKSORT has expected running time O(n log(n))
Can we do better?
Depends on who you ask…
An O(1)-time algorithm for sorting: StickSort
• Problem: sort these n sticks by length.
• Algorithm:
  • Drop them on a table.
  • Now they are sorted this way.
That may have been unsatisfying
• But StickSort does raise some important questions:
  • What is our model of computation?
    • Input: array
    • Output: sorted array
    • Operations allowed: comparisons
    -vs-
    • Input: sticks
    • Output: sorted sticks in vertical order
    • Operations allowed: dropping on tables
  • What are reasonable models of computation?
Today: two (more) models
• Comparison-based sorting model
  • This includes MergeSort, QuickSort, InsertionSort
  • We’ll see that any algorithm in this model must take at least Ω(n log(n)) steps.
• Another model (more reasonable than the stick model…)
  • BucketSort and RadixSort
  • Both run in time O(n)
Comparison-based sorting
Comparison-based sorting algorithms
• Want to sort these items. There’s some ordering on them, but we don’t know what it is.
• There is a genie who knows what the right order is.
• The genie can answer YES/NO questions of the form: is [this] bigger than [that]?
• The algorithm’s job is to output a correctly sorted list of all the objects.
[Figure: the algorithm asks the genie “Is [item] bigger than [item]?” and the genie answers YES. Each item picture is shorthand for a position, e.g. “the first thing in the input list”.]
All the sorting algorithms we have seen work like this.
E.g., QuickSort on 7 6 3 5 1 4 2, with 5 as the pivot:
• Is 7 bigger than 5? YES
• Is 6 bigger than 5? YES
• Is 3 bigger than 5? NO
• etc.
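The genie model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Genie` class and its `is_bigger` method are hypothetical names, and this QuickSort always picks the first item as its pivot (the slide uses 5), so the questions asked differ from the ones above. The point is that the algorithm never looks at the values directly; it only asks the genie, and we can count the questions.

```python
class Genie:
    """Hypothetical genie: answers 'is a bigger than b?' and counts the questions."""

    def __init__(self):
        self.questions = 0

    def is_bigger(self, a, b):
        self.questions += 1
        return a > b


def quicksort(items, genie):
    """QuickSort that touches the items only through genie.is_bigger.

    Assumption for this sketch: the pivot is always the first item.
    """
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller, larger = [], []
    for x in rest:
        if genie.is_bigger(x, pivot):
            larger.append(x)
        else:
            smaller.append(x)
    return quicksort(smaller, genie) + [pivot] + quicksort(larger, genie)


genie = Genie()
result = quicksort([7, 6, 3, 5, 1, 4, 2], genie)
print(result, "using", genie.questions, "questions")
```

The number of questions is a lower bound on the running time of any algorithm written this way, which is what the rest of the lecture exploits.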
Lower bound of Ω(n log(n)).
• Theorem:
  • Any deterministic comparison-based sorting algorithm must take Ω(n log(n)) steps.
  • Any randomized comparison-based sorting algorithm must take Ω(n log(n)) steps in expectation.
• How might we prove this?
  1. Consider all comparison-based algorithms, one-by-one, and analyze them.
  2. Don’t do that.
Instead, argue that all comparison-based sorting algorithms give rise to a decision tree. Then analyze decision trees.
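Decision trees
Sort these three things.
[Figure: a binary tree. Each internal node asks a question of the form “[item] ≤ [item]?”, with one child for YES and one child for NO, etc.]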
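All comparison-based algorithms look like this
Example: Sort these three things using QuickSort.
[Figure: the decision tree for QuickSort on three items. After choosing a pivot, each node asks “[item] ≤ [pivot]?” and the YES/NO answer places the item in L or R. On one branch we’re then done (after some base-case stuff); on another we now recurse on R with a new pivot. In either case, we’re done (after some base case stuff and returning recursive calls). Each leaf returns an ordering. etc...]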
All comparison-based algorithms have an associated decision tree.
[Figure: a binary tree of “?” comparison nodes with YES/NO branches.]
• The leaves of this tree are all possible orderings of the items: when we reach a leaf we return it.
• Running the algorithm on a given input corresponds to taking a particular path through the tree.
Ollie the over-achieving ostrich: What does the decision tree for MERGESORTING four elements look like?
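One way to explore a decision tree without drawing it is to record the YES/NO answers along each path. The sketch below is an illustration, not the lecture's code: `merge_sort` and `decision_paths` are names chosen here, and the particular MergeSort variant (split in the middle, compare with ≤) is an assumption. It runs MergeSort on every ordering of four items and collects the sequence of answers, which identifies the leaf that input reaches.

```python
from itertools import permutations


def merge_sort(items, answers):
    """MergeSort that appends each comparison's YES/NO answer to `answers`."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], answers)
    right = merge_sort(items[mid:], answers)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        yes = left[i] <= right[j]
        answers.append("YES" if yes else "NO")
        if yes:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


def decision_paths(n):
    """Map each input ordering of 0..n-1 to its root-to-leaf path of answers."""
    paths = {}
    for perm in permutations(range(n)):
        answers = []
        assert merge_sort(list(perm), answers) == list(range(n))
        paths[perm] = tuple(answers)
    return paths


paths = decision_paths(4)
num_leaves = len(set(paths.values()))          # distinct leaves that are reached
longest_path = max(len(p) for p in paths.values())
print(num_leaves, "leaves reached; longest path has", longest_path, "comparisons")
```

Every one of the 4! = 24 orderings ends at its own leaf, because two inputs that received identical answers would be rearranged identically and could not both come out sorted.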
What’s the runtime on a particular input?
[Figure: the same decision tree with one root-to-leaf path highlighted.]
• If we take this path through the tree, the runtime is Ω(length of the path): at least the number of comparisons that are made on that input.
What’s the worst-case runtime?
• At least Ω(length of the longest path).
How long is the longest path?
We want a statement: in all such trees, the longest path is at least _____
• This is a binary tree with at least n! leaves (every possible ordering of the n items must appear at some leaf).
• The shallowest tree with n! leaves is the completely balanced one, which has depth log(n!). (Being sloppy about floors and ceilings!)
• So in all such trees, the longest path is at least log(n!).
• n! is about (n/e)^n (Stirling’s approx.*).
• log(n!) is about n log(n/e) = Ω(n log(n)).
Conclusion: the longest path has length at least Ω(n log(n)).
*Stirling’s approximation is a bit more complicated than this, but this is good enough for the asymptotic result we want.
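To get a feel for the numbers, the bound can be evaluated directly. The following is a small sketch (the function name `min_worst_case_comparisons` is made up for this example) that computes ⌈log2(n!)⌉, the depth of the shallowest binary tree with n! leaves, and prints it next to n·log2(n) and the Stirling-style estimate n·log2(n/e).

```python
import math


def min_worst_case_comparisons(n):
    """ceil(log2(n!)): depth of the shallowest binary tree with n! leaves."""
    return math.ceil(math.log2(math.factorial(n)))


for n in (4, 16, 256, 1024):
    bound = min_worst_case_comparisons(n)
    upper = n * math.log2(n)              # n log n, for comparison
    stirling = n * math.log2(n / math.e)  # n log(n/e), the estimate from the slide
    print(f"n={n}: log2(n!) rounds up to {bound}, "
          f"n*log2(n/e) = {stirling:.0f}, n*log2(n) = {upper:.0f}")
```

For each n the bound sits between the two estimates, which is the sense in which log(n!) "is about" n log(n/e) and is still Ω(n log(n)).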
Lower bound of Ω(n log(n)).
• Theorem:
  • Any deterministic comparison-based sorting algorithm must take Ω(n log(n)) steps.
• Proof recap:
  • Any deterministic comparison-based algorithm can be represented as a decision tree with n! leaves.
  • The worst-case running time is at least the depth of the decision tree.
  • All decision trees with n! leaves have depth Ω(n log(n)).
  • So any comparison-based sorting algorithm must have worst-case running time at least Ω(n log(n)).
\end{Aside}
Aside: What about randomized algorithms?
• For example, QuickSort?
• Theorem:
  • Any randomized comparison-based sorting algorithm must take Ω(n log(n)) steps in expectation.
• Proof:
  • see lecture notes
  • (same ideas as deterministic case)
Ollie the over-achieving ostrich: Try to prove this yourself!
• Theorem:
  • Any deterministic comparison-based sorting algorithm must take Ω(n log(n)) steps.
• Theorem:
  • Any randomized comparison-based sorting algorithm must take Ω(n log(n)) steps in expectation.
So that’s bad news.
But look on the bright side!
• This is one of the cool things about lower bounds like this: we know when we can declare victory! MergeSort is optimal!
But what about StickSort?
• StickSort can’t be implemented as a comparison-based sorting algorithm. So these lower bounds don’t apply.
• But StickSort was kind of dumb. (Especially if I have to spend time cutting all those sticks to be the right size!)
But might there be another model of computation that’s less dumb, in which we can sort faster?
Beyond comparison-based sorting algorithms
Pre-lecture exercise
• Sorting CS161 students by their month of birth.
• [Discussion on board]
Another model of computation
• The items you are sorting have meaningful values.
[Figure: the numbers 9 6 3 5 2 1 2 instead of pictures of opaque items.]
Why might this help?
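BucketSort:
• Input: 9 6 3 5 2 1 2
• Buckets are labeled 1 through 9. Drop each item into its bucket: bucket 1 holds 1; bucket 2 holds 2, 2; bucket 3 holds 3; bucket 5 holds 5; bucket 6 holds 6; bucket 9 holds 9.
• Concatenate the buckets!
• SORTED! In time O(n).
Implement the buckets as linked lists. They are first-in, first-out. This will be useful later.
Note: this is a simplification of what CLRS calls “BucketSort”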
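A minimal sketch of this simplified BucketSort, assuming every key is an integer in a known range `0 .. num_buckets - 1` (the function name, the `key` parameter, and the use of `collections.deque` in place of a hand-written linked list are choices made for this example):

```python
from collections import deque


def bucket_sort(items, num_buckets, key=lambda x: x):
    """Sort items whose key(x) is an integer in range(num_buckets).

    Each bucket is a first-in, first-out queue, so items with equal keys
    keep their original relative order.
    """
    buckets = [deque() for _ in range(num_buckets)]
    for x in items:                  # O(n): drop each item into its bucket
        buckets[key(x)].append(x)
    result = []
    for bucket in buckets:           # O(n + num_buckets): concatenate the buckets
        result.extend(bucket)
    return result


print(bucket_sort([9, 6, 3, 5, 2, 1, 2], num_buckets=10))
# prints [1, 2, 2, 3, 5, 6, 9]
```

No two items are ever compared with each other; the work is one pass over the items plus one pass over the buckets, so the Ω(n log(n)) lower bound does not apply.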
Issues
• Need to be able to know what bucket to put something in.
  • Where does [item] go?
  • That’s okay for now: it’s part of the model.
• Need to know what values might show up ahead of time.
• Space…
  • e.g. 2 12345 13 21000 50 100000000 1
One solution: RadixSort
Say we’re sorting integers.
• Idea: BucketSort on the least-significant digit first, then the next least-significant, and so on.
Input: 21 345 13 101 50 234 1
Step 1: BucketSort on the least-significant digit
• Buckets: 0: 50 | 1: 21, 101, 1 | 3: 13 | 4: 234 | 5: 345
• Result: 50 21 101 1 13 234 345
Step 2: BucketSort on the 2nd digit
• Buckets: 0: 101, 1 | 1: 13 | 2: 21 | 3: 234 | 4: 345 | 5: 50
• Result: 101 1 13 21 234 345 50
Step 3: BucketSort on the 3rd digit
• Buckets: 0: 1, 13, 21, 50 | 1: 101 | 2: 234 | 3: 345
• Result: 1 13 21 50 101 234 345
It worked!!
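The three steps can be reproduced with a short sketch. It assumes non-negative integers and base 10 by default; `bucket_pass` and `radix_sort` are names picked for this example, and the function returns the array after every pass so the intermediate results can be compared with the slides.

```python
def bucket_pass(nums, digit_index, base=10):
    """One stable bucket pass keyed on one digit (digit_index 0 = least significant)."""
    buckets = [[] for _ in range(base)]
    for x in nums:
        buckets[(x // base**digit_index) % base].append(x)
    return [x for bucket in buckets for x in bucket]


def radix_sort(nums, base=10):
    """LSD RadixSort for non-negative integers; returns the list after each pass."""
    biggest = max(nums, default=0)
    num_digits = 1
    while biggest >= base**num_digits:   # count the digits of the biggest number
        num_digits += 1
    passes = []
    for k in range(num_digits):
        nums = bucket_pass(nums, k, base)
        passes.append(nums)
    return passes


for step, arr in enumerate(radix_sort([21, 345, 13, 101, 50, 234, 1]), start=1):
    print(f"Step {step}:", arr)
# Step 1: [50, 21, 101, 1, 13, 234, 345]
# Step 2: [101, 1, 13, 21, 234, 345, 50]
# Step 3: [1, 13, 21, 50, 101, 234, 345]
```

Appending to a Python list and reading it front to back gives the first-in, first-out behaviour that the buckets need.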
Why does this work?
• Original array: 21 345 13 101 50 234 1
• Next array is sorted by the first (least-significant) digit: 50 21 101 1 13 234 345
• Next array is sorted by the first two digits: 101 01 13 21 234 345 50
• Next array is sorted by all three digits: 001 013 021 050 101 234 345
• Sorted array
Formally… (Lucky the lackadaisical lemur: Or at least a little formally!)
• Argue by induction.
• Inductive hypothesis:
  • After the k’th iteration, the array is sorted by the first k least-significant digits.
• Base case:
  • “Sorted by 0 least-significant digits” means not sorted.
• Inductive step:
  • (See lecture notes or CLRS)
  • Plucky the pedantic penguin: This needs to use: (1) bucket sort works, and (2) we treat each bucket as a FIFO queue.*
• Conclusion:
  • After the d’th iteration, the array is sorted by the d least-significant digits. Aka, it’s sorted.
*The buzzword here is that BucketSort is stable.
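The inductive hypothesis, and the role of the FIFO buckets, can be checked experimentally. This is a sketch under assumptions: base 10, non-negative integers, and helper names (`bucket_pass`, `sorted_by_low_digits`, `invariant_holds`) invented for the example. The `fifo=False` option empties each bucket last-in, first-out, which makes the pass unstable.

```python
def bucket_pass(nums, k, fifo=True):
    """One base-10 bucket pass on digit k; fifo=False empties buckets in LIFO order."""
    buckets = [[] for _ in range(10)]
    for x in nums:
        buckets[(x // 10**k) % 10].append(x)
    return [x for b in buckets for x in (b if fifo else reversed(b))]


def sorted_by_low_digits(nums, k):
    """True if nums is in non-decreasing order of its k least-significant digits."""
    keys = [x % 10**k for x in nums]
    return keys == sorted(keys)


def invariant_holds(nums, d, fifo=True):
    """Check the inductive hypothesis after each of the d iterations."""
    for k in range(d):
        nums = bucket_pass(nums, k, fifo)
        if not sorted_by_low_digits(nums, k + 1):
            return False
    return True


data = [21, 345, 13, 101, 50, 234, 1]
print(invariant_holds(data, 3, fifo=True))    # True: stable passes keep the invariant
print(invariant_holds(data, 3, fifo=False))   # False: it breaks on the third pass
```

With LIFO buckets the third pass reverses 1, 13, 21, 50 inside bucket 0, undoing the order established by the earlier passes. That is exactly the step where the proof needs stability.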
What is the running time?
• Say they are d-digit numbers.
  • There are d iterations.
  • Each iteration takes time O(n + 10) = O(n). The “10” is because we are working base 10.
  • Total time: O(nd).
• Say the biggest integer is M. What is d?
  • $d = \lfloor \log_{10} M \rfloor + 1$
  • so O(nd) = O(n log_10(M)).
Can we do better? What if M = n?
Trade-offs…
• RadixSort works with any base.
• Before we did it base r = 10.
• But we could do it base r = 2 or r = 20 just as easily.
  • [On board]
• Running time for general r and M?
  • [On board]
Trade-offs ctd…
• There are n numbers, biggest one is M.
• What should we choose for r (in terms of M, n)?
Running time: $O((n + r) \cdot \log_r M)$
There’s some sweet spot… (and maybe it’s growing with M and n?)
IPython Notebook for Lecture 6
We get…
• [Discussion on board…]
• If we choose r = n, running time is $T(n) = O(n \cdot \log_n M)$
  • If M = O(n), T(n) = O(n). Awesome!
  • If M = n^n, T(n) = O(n^2)…
Choosing r = n is pretty good.
Ollie the over-achieving ostrich: What’s the optimal choice of r?
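The trade-off is easy to tabulate. The sketch below uses a deliberately crude cost model taken from the running-time formula, with constants dropped: each pass costs n + r and there is one pass per base-r digit of M. The function names and the sample values n = 1000, M = 10^6 are assumptions for illustration only.

```python
def num_digits(M, r):
    """Number of base-r digits of M >= 1, i.e. floor(log_r(M)) + 1."""
    d = 0
    while M > 0:
        M //= r
        d += 1
    return d


def radix_cost(n, M, r):
    """Crude cost model: (n + r) work per pass, one pass per base-r digit of M."""
    return (n + r) * num_digits(M, r)


n, M = 1000, 10**6
for r in (2, 10, n, M + 1):
    print(f"r={r}: {num_digits(M, r)} passes, cost about {radix_cost(n, M, r)}")
```

A tiny base means many passes, while a huge base means few passes but a large number of mostly empty buckets to walk through on each one. In this example r = n lands between the two extremes with the lowest cost of the four bases tried.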
The story so far
• If we use a comparison-based sorting algorithm, it MUST run in time Ω(n log(n)).
• If we assume a bit of structure on the values, we have an O(n)-time sorting algorithm.
Why would we ever use a comparison-based sorting algorithm?
• Lots of precision…
  • e.g. π, 123456/987654, e, 140!, 2.1234123, n^n, 42
  • We can compare these pretty quickly (just look at the most-significant digit):
    • π = 3.14…
    • e = 2.71…
  • But to do RadixSort we’d have to look at every digit.
  • This is especially problematic since both of these have infinitely many digits...
• RadixSort needs extra memory for the buckets.
  • Not in-place
• I want to sort emoji by talking to a genie.
  • RadixSort makes more assumptions on the input.
Even with integers, if the biggest one is really big, RadixSort is slow.
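As a rough illustration of the first point, a comparison-based sort handles a mix of awkward values as long as any two of them can be compared. The sketch below makes several assumptions: the value list is a guess at what the slide shows (with 123456/987654 read as a fraction and n^n left out), `math.pi` and `math.e` are only floating-point approximations of π and e, and `genie_compare` is a hypothetical comparator that counts how often it is called.

```python
import math
from fractions import Fraction
from functools import cmp_to_key

comparisons = 0


def genie_compare(a, b):
    """Three-way comparison that counts how many times it is asked."""
    global comparisons
    comparisons += 1
    return (a > b) - (a < b)


values = [math.pi, Fraction(123456, 987654), math.e,
          math.factorial(140), 2.1234123, 42]
ordered = sorted(values, key=cmp_to_key(genie_compare))
print(ordered[:5], "... then 140!")
print("comparisons used:", comparisons)
```

Nothing here needs the values to share a digit length or a fixed range, which is the kind of assumption RadixSort relies on.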
Recap
• How difficult a problem is depends on the model of computation.
• How reasonable a model of computation is is up for debate.
• Comparison-based sorting model
  • This includes MergeSort, QuickSort, InsertionSort
  • Any algorithm in this model must use at least Ω(n log(n)) operations.
• But if we are sorting small integers (or other reasonable data):
  • BucketSort and RadixSort
  • Both run in time O(n)
Next time
• Binary search trees!
• Balanced binary search trees!
• Special guest lecturer: Sam Kim!
Before next time
• Pre-lecture exercise for Lecture 7
  • Remember binary search trees?