- A Bayesian perspective on Occam’s razor
- The Minimum Description Length principle is motivated by interpreting the definition of hMAP in the light of basic concepts from information theory.
- Equation (1) can be interpreted as a statement that short hypotheses are preferred, assuming a particular representation scheme for encoding hypotheses and data.
- -log2 P(h): the description length of h under the optimal encoding for the hypothesis space H. That is, L_{C_H}(h) = -log2 P(h), where C_H is the optimal code for hypothesis space H.
- -log2 P(D|h): the description length of the training data D given hypothesis h, under its optimal encoding. That is, L_{C_{D|h}}(D|h) = -log2 P(D|h), where C_{D|h} is the optimal code for describing data D assuming that both the sender and receiver know the hypothesis h.
- Rewrite Equation (1) to show that hMAP is the hypothesis h that minimizes the sum of the description length of the hypothesis and the description length of the data given the hypothesis:
hMAP = argmin_{h in H} [ L_{C_H}(h) + L_{C_{D|h}}(D|h) ]
where C_H and C_{D|h} are the optimal encodings for H and for D given h, respectively.
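
The equivalence between maximizing P(D|h)P(h) and minimizing the summed description lengths can be checked numerically. The following is a minimal sketch, assuming a toy hypothesis space of three hypotheses; the names `priors`, `likelihoods`, `description_length`, and `total_length`, and all the probability values, are illustrative assumptions and do not come from the text.

```python
import math

# Hypothetical toy hypothesis space (illustrative values, not from the text):
# priors[h] plays the role of P(h), likelihoods[h] the role of P(D | h).
priors = {"h1": 0.5, "h2": 0.25, "h3": 0.25}
likelihoods = {"h1": 0.125, "h2": 0.5, "h3": 0.25}


def description_length(p):
    """Length in bits of an event of probability p under an optimal code."""
    return -math.log2(p)


def total_length(h):
    """Description length of h plus description length of D given h."""
    return description_length(priors[h]) + description_length(likelihoods[h])


# MAP hypothesis: maximize P(D | h) * P(h).
h_map = max(priors, key=lambda h: priors[h] * likelihoods[h])

# Minimum total description length: minimize -log2 P(h) - log2 P(D | h).
h_min_length = min(priors, key=total_length)

for h in priors:
    print(h, description_length(priors[h]),
          description_length(likelihoods[h]), total_length(h))
print("MAP hypothesis:", h_map)
print("Shortest total description:", h_min_length)
```

With these assumed numbers, h1 has the shortest description on its own (1 bit) but needs 3 bits to describe the data, so h2, at 2 + 1 = 3 bits in total, is both the MAP hypothesis and the one with the shortest combined description. Because -log2 is strictly decreasing, the two selections agree for any choice of probabilities, up to ties.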
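Minimum Description Length principle: choose the hypothesis h_MDL that minimizes the sum of the two description lengths, h_MDL = argmin_{h in H} [ L_{C1}(h) + L_{C2}(D|h) ], where C1 and C2 are the codes chosen to represent the hypothesis and the data given the hypothesis. When C1 = C_H and C2 = C_{D|h}, h_MDL = hMAP.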