three concepts: information
TRANSCRIPT
![Page 1: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/1.jpg)
OutlineOccam’s RazorMDL Principle
Three Concepts: InformationLecture 5: MDL Principle
Teemu Roos
Complex Systems Computation GroupDepartment of Computer Science, University of Helsinki
Fall 2007
Teemu Roos Three Concepts: Information
![Page 2: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/2.jpg)
OutlineOccam’s RazorMDL Principle
Lecture 5: MDL Principle
Jorma Rissanen (left) receiving the IEEE Information TheorySociety Best Paper Award from Claude Shannon in 1986.
IEEE Golden Jubilee Award forTechnological Innovation (for theinvention of arithmetic coding)1998; IEEE Richard W. HammingMedal (for fundamentalcontribution to information theory,statistical inference, control theory,and the theory of complexity)1993; Kolmogorov Medal 2006;IBM Corporate Award (for theMDL/PMDL principles andstochastic complexity) 1991; IBMOutstanding Innovation Award(for work in statistical inference,information theory, and the theoryof complexity) 1988; ...
Teemu Roos Three Concepts: Information
![Page 3: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/3.jpg)
OutlineOccam’s RazorMDL Principle
1 Occam’s RazorHouseVisual RecognitionAstronomyRazor
2 MDL PrincipleIdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Teemu Roos Three Concepts: Information
![Page 4: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/4.jpg)
OutlineOccam’s RazorMDL Principle
1 Occam’s RazorHouseVisual RecognitionAstronomyRazor
2 MDL PrincipleIdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Teemu Roos Three Concepts: Information
![Page 5: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/5.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
House
Teemu Roos Three Concepts: Information
![Page 6: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/6.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
House
Brandon has
1 cough,
2 severe abdominal pain,
3 nausea,
4 low blood pressure,
5 fever.
1 pneumonia,
common cold,
2 appendicitis,
gout medicine,
3 food poisoning,
gout medicine,
4 hemorrhage,
gout medicine,
5 meningitis.
common cold.
No single disease causes all of these.
Each symptom can be caused by some (possibly different) disease...
Dr. House explains the symptoms with two simple causes:
1 common cold, causing the cough and fever,
2 pharmacy error: cough medicine replaced by gout medicine.
Teemu Roos Three Concepts: Information
![Page 7: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/7.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
House
Brandon has
1 cough,
2 severe abdominal pain,
3 nausea,
4 low blood pressure,
5 fever.
1 pneumonia,
common cold,
2 appendicitis,
gout medicine,
3 food poisoning,
gout medicine,
4 hemorrhage,
gout medicine,
5 meningitis.
common cold.
No single disease causes all of these.
Each symptom can be caused by some (possibly different) disease...
Dr. House explains the symptoms with two simple causes:
1 common cold, causing the cough and fever,
2 pharmacy error: cough medicine replaced by gout medicine.
Teemu Roos Three Concepts: Information
![Page 8: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/8.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
House
Brandon has
1 cough,
2 severe abdominal pain,
3 nausea,
4 low blood pressure,
5 fever.
1 pneumonia,
common cold,
2 appendicitis,
gout medicine,
3 food poisoning,
gout medicine,
4 hemorrhage,
gout medicine,
5 meningitis.
common cold.
No single disease causes all of these.
Each symptom can be caused by some (possibly different) disease...
Dr. House explains the symptoms with two simple causes:
1 common cold, causing the cough and fever,
2 pharmacy error: cough medicine replaced by gout medicine.
Teemu Roos Three Concepts: Information
![Page 9: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/9.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
House
Brandon has
1 cough,
2 severe abdominal pain,
3 nausea,
4 low blood pressure,
5 fever.
1 pneumonia,
common cold,
2 appendicitis,
gout medicine,
3 food poisoning,
gout medicine,
4 hemorrhage,
gout medicine,
5 meningitis.
common cold.
No single disease causes all of these.
Each symptom can be caused by some (possibly different) disease...
Dr. House explains the symptoms with two simple causes:
1 common cold, causing the cough and fever,
2 pharmacy error: cough medicine replaced by gout medicine.
Teemu Roos Three Concepts: Information
![Page 10: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/10.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
House
Brandon has
1 cough,
2 severe abdominal pain,
3 nausea,
4 low blood pressure,
5 fever.
1 pneumonia,
common cold,
2 appendicitis,
gout medicine,
3 food poisoning,
gout medicine,
4 hemorrhage,
gout medicine,
5 meningitis.
common cold.
No single disease causes all of these.
Each symptom can be caused by some (possibly different) disease...
Dr. House explains the symptoms with two simple causes:
1 common cold, causing the cough and fever,
2 pharmacy error: cough medicine replaced by gout medicine.
Teemu Roos Three Concepts: Information
![Page 11: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/11.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
House
Brandon has
1 cough,
2 severe abdominal pain,
3 nausea,
4 low blood pressure,
5 fever.
1 pneumonia,
common cold,
2 appendicitis,
gout medicine,
3 food poisoning,
gout medicine,
4 hemorrhage,
gout medicine,
5 meningitis.
common cold.
No single disease causes all of these.
Each symptom can be caused by some (possibly different) disease...
Dr. House explains the symptoms with two simple causes:
1 common cold, causing the cough and fever,
2 pharmacy error: cough medicine replaced by gout medicine.
Teemu Roos Three Concepts: Information
![Page 12: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/12.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
House
Brandon has
1 cough,
2 severe abdominal pain,
3 nausea,
4 low blood pressure,
5 fever.
1 pneumonia,
common cold,
2 appendicitis,
gout medicine,
3 food poisoning,
gout medicine,
4 hemorrhage,
gout medicine,
5 meningitis.
common cold.
No single disease causes all of these.
Each symptom can be caused by some (possibly different) disease...
Dr. House explains the symptoms with two simple causes:
1 common cold, causing the cough and fever,
2 pharmacy error: cough medicine replaced by gout medicine.
Teemu Roos Three Concepts: Information
![Page 13: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/13.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
House
Brandon has
1 cough,
2 severe abdominal pain,
3 nausea,
4 low blood pressure,
5 fever.
1 pneumonia,
common cold,
2 appendicitis,
gout medicine,
3 food poisoning,
gout medicine,
4 hemorrhage,
gout medicine,
5 meningitis.
common cold.
No single disease causes all of these.
Each symptom can be caused by some (possibly different) disease...
Dr. House explains the symptoms with two simple causes:
1 common cold, causing the cough and fever,
2 pharmacy error: cough medicine replaced by gout medicine.
Teemu Roos Three Concepts: Information
![Page 14: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/14.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
House
Brandon has
1 cough,
2 severe abdominal pain,
3 nausea,
4 low blood pressure,
5 fever.
1 pneumonia,
common cold,
2 appendicitis,
gout medicine,
3 food poisoning,
gout medicine,
4 hemorrhage,
gout medicine,
5 meningitis.
common cold.
No single disease causes all of these.
Each symptom can be caused by some (possibly different) disease...
Dr. House explains the symptoms with two simple causes:
1 common cold, causing the cough and fever,
2 pharmacy error: cough medicine replaced by gout medicine.
Teemu Roos Three Concepts: Information
![Page 15: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/15.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
House
Brandon has
1 cough,
2 severe abdominal pain,
3 nausea,
4 low blood pressure,
5 fever.
1 common cold,
2 appendicitis,
gout medicine,
3 food poisoning,
gout medicine,
4 hemorrhage,
gout medicine,
5 common cold.
No single disease causes all of these.
Each symptom can be caused by some (possibly different) disease...
Dr. House explains the symptoms with two simple causes:
1 common cold, causing the cough and fever,
2 pharmacy error: cough medicine replaced by gout medicine.
Teemu Roos Three Concepts: Information
![Page 16: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/16.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
House
Brandon has
1 cough,
2 severe abdominal pain,
3 nausea,
4 low blood pressure,
5 fever.
1 common cold,
2 gout medicine,
3 gout medicine,
4 gout medicine,
5 common cold.
No single disease causes all of these.
Each symptom can be caused by some (possibly different) disease...
Dr. House explains the symptoms with two simple causes:
1 common cold, causing the cough and fever,
2 pharmacy error: cough medicine replaced by gout medicine.
Teemu Roos Three Concepts: Information
![Page 17: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/17.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Visual Recognition
Teemu Roos Three Concepts: Information
![Page 18: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/18.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Visual Recognition
Teemu Roos Three Concepts: Information
![Page 19: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/19.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Visual Recognition
Teemu Roos Three Concepts: Information
![Page 20: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/20.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Visual Recognition
Teemu Roos Three Concepts: Information
![Page 21: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/21.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Visual Recognition
Teemu Roos Three Concepts: Information
![Page 22: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/22.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Visual Recognition
Teemu Roos Three Concepts: Information
![Page 23: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/23.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Visual Recognition
Teemu Roos Three Concepts: Information
![Page 24: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/24.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Visual Recognition
Teemu Roos Three Concepts: Information
![Page 25: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/25.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Astronomy
Teemu Roos Three Concepts: Information
![Page 26: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/26.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Astronomy
Teemu Roos Three Concepts: Information
![Page 27: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/27.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Astronomy
Teemu Roos Three Concepts: Information
![Page 28: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/28.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
William of Ockham (c. 1288–1348)
Teemu Roos Three Concepts: Information
![Page 29: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/29.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Occam’s Razor
Occam’s Razor
Entities should not be multiplied beyond necessity.
Isaac Newton: “We are to admit no more causes of natural thingsthan such as are both true and sufficient to explain theirappearances.”
Diagnostic parsimony: Find the fewest possible causes thatexplain the symptoms.
(Hickam’s dictum: “Patients can have as many diseases as they damnwell please.”)
Teemu Roos Three Concepts: Information
![Page 30: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/30.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Occam’s Razor
Occam’s Razor
Entities should not be multiplied beyond necessity.
Isaac Newton: “We are to admit no more causes of natural thingsthan such as are both true and sufficient to explain theirappearances.”
Diagnostic parsimony: Find the fewest possible causes thatexplain the symptoms.
(Hickam’s dictum: “Patients can have as many diseases as they damnwell please.”)
Teemu Roos Three Concepts: Information
![Page 31: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/31.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Occam’s Razor
Occam’s Razor
Entities should not be multiplied beyond necessity.
Isaac Newton: “We are to admit no more causes of natural thingsthan such as are both true and sufficient to explain theirappearances.”
Diagnostic parsimony: Find the fewest possible causes thatexplain the symptoms.
(Hickam’s dictum: “Patients can have as many diseases as they damnwell please.”)
Teemu Roos Three Concepts: Information
![Page 32: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/32.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Occam’s Razor
Occam’s Razor
Entities should not be multiplied beyond necessity.
Isaac Newton: “We are to admit no more causes of natural thingsthan such as are both true and sufficient to explain theirappearances.”
Diagnostic parsimony: Find the fewest possible causes thatexplain the symptoms.
(Hickam’s dictum: “Patients can have as many diseases as they damnwell please.”)
Teemu Roos Three Concepts: Information
![Page 33: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/33.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Visual Recognition
Teemu Roos Three Concepts: Information
![Page 34: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/34.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Visual Recognition
Teemu Roos Three Concepts: Information
![Page 35: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/35.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Visual Recognition
Teemu Roos Three Concepts: Information
![Page 36: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/36.jpg)
OutlineOccam’s RazorMDL Principle
HouseVisual RecognitionAstronomyRazor
Visual Recognition
Teemu Roos Three Concepts: Information
![Page 37: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/37.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
1 Occam’s RazorHouseVisual RecognitionAstronomyRazor
2 MDL PrincipleIdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Teemu Roos Three Concepts: Information
![Page 38: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/38.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
MDL Principle
Minimum Description Length (MDL) Principle (2-part)
Choose the hypothesis which minimizes the sum of
1 the codelength of the hypothesis, and
2 the codelength of the data with the help of the hypothesis.
How to encode data with the help of a hypothesis?
Teemu Roos Three Concepts: Information
![Page 39: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/39.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
MDL Principle
Minimum Description Length (MDL) Principle (2-part)
Choose the hypothesis which minimizes the sum of
1 the codelength of the hypothesis, and
2 the codelength of the data with the help of the hypothesis.
How to encode data with the help of a hypothesis?
Teemu Roos Three Concepts: Information
![Page 40: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/40.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Encoding Data: Rules & Exceptions
Idea 1: Hypothesis = rule; encode exceptions.
Black box of size 25× 25 = 625, whitedots at (x1, y1), (x2, y2), (x3, y3).
For image of size n = 625, there are 2n
different images, and(n
k
)=
n!
k!(n − k)!
different groups of k exceptions.
Teemu Roos Three Concepts: Information
![Page 41: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/41.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Encoding Data: Rules & Exceptions
Idea 1: Hypothesis = rule; encode exceptions.
Black box of size 25× 25 = 625, whitedots at (x1, y1), (x2, y2), (x3, y3).
For image of size n = 625, there are 2n
different images, and(n
k
)=
n!
k!(n − k)!
different groups of k exceptions.
Teemu Roos Three Concepts: Information
![Page 42: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/42.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Encoding Data: Rules & Exceptions
Idea 1: Hypothesis = rule; encode exceptions.
Black box of size 25× 25 = 625, whitedots at (x1, y1), (x2, y2), (x3, y3).
For image of size n = 625, there are 2n
different images, and(n
k
)=
n!
k!(n − k)!
different groups of k exceptions.
Teemu Roos Three Concepts: Information
![Page 43: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/43.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Encoding Data: Rules & Exceptions
Idea 1: Hypothesis = rule; encode exceptions.
Black box of size 25× 25 = 625, whitedots at (x1, y1), (x2, y2), (x3, y3).
For image of size n = 625, there are 2n
different images, and(n
k
)=
n!
k!(n − k)!
different groups of k exceptions.
k = 1 :
(n
1
)= 625� 2625.
Codelength log2(n + 1) + log2
(n
k
)≈ 19 vs. 625
Teemu Roos Three Concepts: Information
![Page 44: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/44.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Encoding Data: Rules & Exceptions
Idea 1: Hypothesis = rule; encode exceptions.
Black box of size 25× 25 = 625, whitedots at (x1, y1), (x2, y2), (x3, y3).
For image of size n = 625, there are 2n
different images, and(n
k
)=
n!
k!(n − k)!
different groups of k exceptions.
k = 2 :
(n
2
)= 195 000� 2625.
Codelength log2(n + 1) + log2
(n
k
)≈ 27 vs. 625
Teemu Roos Three Concepts: Information
![Page 45: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/45.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Encoding Data: Rules & Exceptions
Idea 1: Hypothesis = rule; encode exceptions.
Black box of size 25× 25 = 625, whitedots at (x1, y1), (x2, y2), (x3, y3).
For image of size n = 625, there are 2n
different images, and(n
k
)=
n!
k!(n − k)!
different groups of k exceptions.
k = 3 :
(n
3
)= 40 495 000� 2625.
Codelength log2(n + 1) + log2
(n
k
)≈ 35 vs. 625
Teemu Roos Three Concepts: Information
![Page 46: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/46.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Encoding Data: Rules & Exceptions
Idea 1: Hypothesis = rule; encode exceptions.
Black box of size 25× 25 = 625, whitedots at (x1, y1), (x2, y2), (x3, y3).
For image of size n = 625, there are 2n
different images, and(n
k
)=
n!
k!(n − k)!
different groups of k exceptions.
k = 10 :
(n
10
)= 2331 354 000 000 000 000 000� 2625.
Codelength log2(n + 1) + log2
(n
k
)≈ 80 vs. 625
Teemu Roos Three Concepts: Information
![Page 47: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/47.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Encoding Data: Rules & Exceptions
Idea 1: Hypothesis = rule; encode exceptions.
Black box of size 25× 25 = 625, whitedots at (x1, y1), (x2, y2), (x3, y3).
For image of size n = 625, there are 2n
different images, and(n
k
)=
n!
k!(n − k)!
different groups of k exceptions.
k = 100 :
(n
100
)≈ 9.5× 10117 � 2625.
Codelength log2(n + 1) + log2
(n
k
)≈ 401 vs. 625
Teemu Roos Three Concepts: Information
![Page 48: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/48.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Encoding Data: Rules & Exceptions
Idea 1: Hypothesis = rule; encode exceptions.
Black box of size 25× 25 = 625, whitedots at (x1, y1), (x2, y2), (x3, y3).
For image of size n = 625, there are 2n
different images, and(n
k
)=
n!
k!(n − k)!
different groups of k exceptions.
k = 300 :
(n
300
)≈ 2.7× 10186 < 2625.
Codelength log2(n + 1) + log2
(n
k
)≈ 629 vs. 625
Teemu Roos Three Concepts: Information
![Page 49: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/49.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Encoding Data: Probabilistic Models
Idea 2: Hypothesis = probability distribution.
Noisy PSNR=19.8 MDL (A-B) PSNR=32.9
How to encode a distribution?
Teemu Roos Three Concepts: Information
![Page 50: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/50.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Encoding Data: Probabilistic Models
Idea 2: Hypothesis = probability distribution.
Noisy PSNR=19.8 MDL (A-B) PSNR=32.9
How to encode a distribution?
Teemu Roos Three Concepts: Information
![Page 51: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/51.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Two-Part Codes
Let M = {pθ : θ ∈ Θ} be a parametric probabilistic model class,i.e., a set of distributions pθ indexed by parameter θ.
If the parameter space Θ is discrete, we can construct a (prefix)code C1 : Θ→ {0, 1}∗ which maps each parameter value to acodeword.
For each distribution pθ there is a prefix code Cθ : D → {0, 1}∗where D ∈ D is a data-set to be encoded, such that the codewordlengths satisty
`θ(D) ≈ log21
pθ(D).
Using parameter value θ, the total codelength becomes (≈)
`1(θ) + log21
pθ(D).
Teemu Roos Three Concepts: Information
![Page 52: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/52.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Two-Part Codes
Let M = {pθ : θ ∈ Θ} be a parametric probabilistic model class,i.e., a set of distributions pθ indexed by parameter θ.
If the parameter space Θ is discrete, we can construct a (prefix)code C1 : Θ→ {0, 1}∗ which maps each parameter value to acodeword.
For each distribution pθ there is a prefix code Cθ : D → {0, 1}∗where D ∈ D is a data-set to be encoded, such that the codewordlengths satisty
`θ(D) ≈ log21
pθ(D).
Using parameter value θ, the total codelength becomes (≈)
`1(θ) + log21
pθ(D).
Teemu Roos Three Concepts: Information
![Page 53: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/53.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Two-Part Codes
Let M = {pθ : θ ∈ Θ} be a parametric probabilistic model class,i.e., a set of distributions pθ indexed by parameter θ.
If the parameter space Θ is discrete, we can construct a (prefix)code C1 : Θ→ {0, 1}∗ which maps each parameter value to acodeword.
For each distribution pθ there is a prefix code Cθ : D → {0, 1}∗where D ∈ D is a data-set to be encoded, such that the codewordlengths satisty
`θ(D) ≈ log21
pθ(D).
Using parameter value θ, the total codelength becomes (≈)
`1(θ) + log21
pθ(D).
Teemu Roos Three Concepts: Information
![Page 54: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/54.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Two-Part Codes
Let M = {pθ : θ ∈ Θ} be a parametric probabilistic model class,i.e., a set of distributions pθ indexed by parameter θ.
If the parameter space Θ is discrete, we can construct a (prefix)code C1 : Θ→ {0, 1}∗ which maps each parameter value to acodeword.
For each distribution pθ there is a prefix code Cθ : D → {0, 1}∗where D ∈ D is a data-set to be encoded, such that the codewordlengths satisty
`θ(D) ≈ log21
pθ(D).
Using parameter value θ, the total codelength becomes (≈)
`1(θ) + log21
pθ(D).
Teemu Roos Three Concepts: Information
![Page 55: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/55.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Two-Part Codes
The parameter value minimizing the codelength is given by themaximum likelihood parameter θ̂:
minθ∈Θ
log21
pθ(D)= log2
1
maxθ∈Θ
pθ(D)= log2
1
pθ̂(D).
It could of course be that `1(θ̂) is so large that some otherparameter value gives a shorter total codelength.
Teemu Roos Three Concepts: Information
![Page 56: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/56.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Two-Part Codes
The parameter value minimizing the codelength is given by themaximum likelihood parameter θ̂:
minθ∈Θ
log21
pθ(D)= log2
1
maxθ∈Θ
pθ(D)= log2
1
pθ̂(D).
It could of course be that `1(θ̂) is so large that some otherparameter value gives a shorter total codelength.
Teemu Roos Three Concepts: Information
![Page 57: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/57.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Multi-Part Codes
If there are more than one model classes, M1,M2, . . . it ispossible to construct multi-part codes where the parts are
1 Encoding of the model class index: C0(i), i ∈ N.
2 Encoding of the parameter (vector): Ci (θ), θ ∈ Θi .
3 Encoding of the data: Cθ(D), D ∈ D.
For instance, the models could be polynomials with differentdegrees, the parameters are the coefficients
θ0 + θ1x + θ2x2 + θ3x
3 + . . . + θkxk .
The more complex the model class (the higher the degree), thebetter it fits the data but the longer the second part Ci (θ)becomes.
Teemu Roos Three Concepts: Information
![Page 58: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/58.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Multi-Part Codes
If there are more than one model classes, M1,M2, . . . it ispossible to construct multi-part codes where the parts are
1 Encoding of the model class index: C0(i), i ∈ N.
2 Encoding of the parameter (vector): Ci (θ), θ ∈ Θi .
3 Encoding of the data: Cθ(D), D ∈ D.
For instance, the models could be polynomials with differentdegrees, the parameters are the coefficients
θ0 + θ1x + θ2x2 + θ3x
3 + . . . + θkxk .
The more complex the model class (the higher the degree), thebetter it fits the data but the longer the second part Ci (θ)becomes.
Teemu Roos Three Concepts: Information
![Page 59: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/59.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Multi-Part Codes
If there are more than one model classes, M1,M2, . . . it ispossible to construct multi-part codes where the parts are
1 Encoding of the model class index: C0(i), i ∈ N.
2 Encoding of the parameter (vector): Ci (θ), θ ∈ Θi .
3 Encoding of the data: Cθ(D), D ∈ D.
For instance, the models could be polynomials with differentdegrees, the parameters are the coefficients
θ0 + θ1x + θ2x2 + θ3x
3 + . . . + θkxk .
The more complex the model class (the higher the degree), thebetter it fits the data but the longer the second part Ci (θ)becomes.
Teemu Roos Three Concepts: Information
![Page 60: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/60.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Multi-Part Codes
If there are more than one model classes, M1,M2, . . . it ispossible to construct multi-part codes where the parts are
1 Encoding of the model class index: C0(i), i ∈ N.
2 Encoding of the parameter (vector): Ci (θ), θ ∈ Θi .
3 Encoding of the data: Cθ(D), D ∈ D.
For instance, the models could be polynomials with differentdegrees, the parameters are the coefficients
θ0 + θ1x + θ2x2 + θ3x
3 + . . . + θkxk .
The more complex the model class (the higher the degree), thebetter it fits the data but the longer the second part Ci (θ)becomes.
Teemu Roos Three Concepts: Information
![Page 61: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/61.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Multi-Part Codes
If there are more than one model classes, M1,M2, . . . it ispossible to construct multi-part codes where the parts are
1 Encoding of the model class index: C0(i), i ∈ N.
2 Encoding of the parameter (vector): Ci (θ), θ ∈ Θi .
3 Encoding of the data: Cθ(D), D ∈ D.
For instance, the models could be polynomials with differentdegrees, the parameters are the coefficients
θ0 + θ1x + θ2x2 + θ3x
3 + . . . + θkxk .
The more complex the model class (the higher the degree), thebetter it fits the data but the longer the second part Ci (θ)becomes.
Teemu Roos Three Concepts: Information
![Page 62: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/62.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Polynomials
Teemu Roos Three Concepts: Information
![Page 63: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/63.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Continuous Parameters
What if the parameters are continuous (like polynomialcoefficients)? How to encode continuous values?
Solution: Quantization. Choose a discrete subset of points,θ(1), θ(2), . . ., and use only them.
Information Geometry!
If the points are sufficiently dense (in a codelength sense) then thecodelength for data is still almost as short as minθ∈Θ `θ(D).
Teemu Roos Three Concepts: Information
![Page 64: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/64.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Continuous Parameters
What if the parameters are continuous (like polynomialcoefficients)? How to encode continuous values?
Solution: Quantization. Choose a discrete subset of points,θ(1), θ(2), . . ., and use only them.
Information Geometry!
If the points are sufficiently dense (in a codelength sense) then thecodelength for data is still almost as short as minθ∈Θ `θ(D).
Teemu Roos Three Concepts: Information
![Page 65: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/65.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Continuous Parameters
What if the parameters are continuous (like polynomialcoefficients)? How to encode continuous values?
Solution: Quantization. Choose a discrete subset of points,θ(1), θ(2), . . ., and use only them.
Θ
Information Geometry!
If the points are sufficiently dense (in a codelength sense) then thecodelength for data is still almost as short as minθ∈Θ `θ(D).
Teemu Roos Three Concepts: Information
![Page 66: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/66.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Continuous Parameters
What if the parameters are continuous (like polynomialcoefficients)? How to encode continuous values?
Solution: Quantization. Choose a discrete subset of points,θ(1), θ(2), . . ., and use only them.
Θ
Information Geometry!
If the points are sufficiently dense (in a codelength sense) then thecodelength for data is still almost as short as minθ∈Θ `θ(D).
Teemu Roos Three Concepts: Information
![Page 67: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/67.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Continuous Parameters
What if the parameters are continuous (like polynomialcoefficients)? How to encode continuous values?
Solution: Quantization. Choose a discrete subset of points,θ(1), θ(2), . . ., and use only them.
Θ
Information Geometry!
If the points are sufficiently dense (in a codelength sense) then thecodelength for data is still almost as short as minθ∈Θ `θ(D).
Teemu Roos Three Concepts: Information
![Page 68: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/68.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Continuous Parameters
What if the parameters are continuous (like polynomialcoefficients)? How to encode continuous values?
Solution: Quantization. Choose a discrete subset of points,θ(1), θ(2), . . ., and use only them.
Θ
Information Geometry!
If the points are sufficiently dense (in a codelength sense) then thecodelength for data is still almost as short as minθ∈Θ `θ(D).
Teemu Roos Three Concepts: Information
![Page 69: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/69.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Continuous Parameters
What if the parameters are continuous (like polynomialcoefficients)? How to encode continuous values?
Solution: Quantization. Choose a discrete subset of points,θ(1), θ(2), . . ., and use only them.
Θ
Information Geometry!
If the points are sufficiently dense (in a codelength sense) then thecodelength for data is still almost as short as minθ∈Θ `θ(D).
Teemu Roos Three Concepts: Information
![Page 70: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/70.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Continuous Parameters
What if the parameters are continuous (like polynomialcoefficients)? How to encode continuous values?
Solution: Quantization. Choose a discrete subset of points,θ(1), θ(2), . . ., and use only them.
Θ
Information Geometry!
If the points are sufficiently dense (in a codelength sense) then thecodelength for data is still almost as short as minθ∈Θ `θ(D).
Teemu Roos Three Concepts: Information
![Page 71: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/71.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Continuous Parameters
What if the parameters are continuous (like polynomialcoefficients)? How to encode continuous values?
Solution: Quantization. Choose a discrete subset of points,θ(1), θ(2), . . ., and use only them.
Θ
Information Geometry!
If the points are sufficiently dense (in a codelength sense) then thecodelength for data is still almost as short as minθ∈Θ `θ(D).
Teemu Roos Three Concepts: Information
![Page 72: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/72.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
About Quantization
How many points should there be in the subset θ(1), θ(2), . . .?
Intuition: Estimation accuracy of order1√n.
Theorem
Optimal quantization accuracy is of order1√n.
⇒ number of points ≈√
nk
= nk/2, where k = dim(Θ).
Teemu Roos Three Concepts: Information
![Page 73: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/73.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
About Quantization
How many points should there be in the subset θ(1), θ(2), . . .?
Intuition: Estimation accuracy of order1√n.
Theorem
Optimal quantization accuracy is of order1√n.
⇒ number of points ≈√
nk
= nk/2, where k = dim(Θ).
Teemu Roos Three Concepts: Information
![Page 74: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/74.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
About Quantization
How many points should there be in the subset θ(1), θ(2), . . .?
Intuition: Estimation accuracy of order1√n.
Theorem
Optimal quantization accuracy is of order1√n.
⇒ number of points ≈√
nk
= nk/2, where k = dim(Θ).
Teemu Roos Three Concepts: Information
![Page 75: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/75.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
About Quantization
How many points should there be in the subset θ(1), θ(2), . . .?
Intuition: Estimation accuracy of order1√n.
Theorem
Optimal quantization accuracy is of order1√n.
⇒ number of points ≈√
nk
= nk/2, where k = dim(Θ).
Teemu Roos Three Concepts: Information
![Page 76: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/76.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
About Quantization
How many points should there be in the subset θ(1), θ(2), . . .?
Intuition: Estimation accuracy of order1√n.
Theorem
Optimal quantization accuracy is of order1√n.
⇒ number of points ≈√
nk
= nk/2, where k = dim(Θ).
The codelength for the quantized parameters becomes
`(θq) ≈ log2 nk/2 =k
2log2 n .
Teemu Roos Three Concepts: Information
![Page 77: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/77.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Old-Style MDL
With the precision 1√n
the codelength for data is almost optimal:
minθq∈{θ(1),θ(2),...}
`θq(D) ≈ minθ∈Θ
`θ(D) = log21
pθ̂(D).
This gives the total codelength formula:
“Steam MDL”
`θq(D) + `(θq) ≈ log21
pθ̂(D)+
k
2log2 n .
Teemu Roos Three Concepts: Information
![Page 78: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/78.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Old-Style MDL
With the precision 1√n
the codelength for data is almost optimal:
minθq∈{θ(1),θ(2),...}
`θq(D) ≈ minθ∈Θ
`θ(D) = log21
pθ̂(D).
This gives the total codelength formula:
“Steam MDL”
`θq(D) + `(θq) ≈ log21
pθ̂(D)+
k
2log2 n .
Teemu Roos Three Concepts: Information
![Page 79: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/79.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Old-Style MDL
The k2 log2 n formula is only a rough approximation, and works well
only for very large samples.
Next week:
More advanced codes: mixtures, normalized maximumlikelihood, etc.
Foundations of MDL.
Teemu Roos Three Concepts: Information
![Page 80: Three Concepts: Information](https://reader034.vdocuments.mx/reader034/viewer/2022052017/628676cb9687c630d1714718/html5/thumbnails/80.jpg)
OutlineOccam’s RazorMDL Principle
IdeaRules & ExceptionsProbabilistic ModelsOld-Style MDL
Old-Style MDL
The k2 log2 n formula is only a rough approximation, and works well
only for very large samples.
Next week:
More advanced codes: mixtures, normalized maximumlikelihood, etc.
Foundations of MDL.Teemu Roos Three Concepts: Information