- Figure (a): before any training data is observed, all hypotheses have the same (prior) probability.
- Figures (b) and (c): as training data accumulates, the posterior probability of hypotheses inconsistent with the data drops to zero, while the total probability (summing to 1) is shared equally among the remaining consistent hypotheses.
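The behaviour sketched in the figure can be reproduced in a few lines. This is a minimal sketch, not from the source: the toy hypothesis space `H` and the `posterior` helper are hypothetical, and it simply applies Bayes theorem with a uniform prior and a 0/1 likelihood for noise-free data.

```python
def posterior(hypotheses, examples):
    """Return P(h|D) for each hypothesis given noise-free training examples.

    hypotheses: list of functions h(x) -> label
    examples:   list of (x, label) pairs observed so far
    """
    prior = 1.0 / len(hypotheses)                               # uniform P(h)
    likelihood = [                                              # P(D|h): 1 if consistent, else 0
        1.0 if all(h(x) == y for x, y in examples) else 0.0
        for h in hypotheses
    ]
    evidence = sum(l * prior for l in likelihood)               # P(D)
    return [l * prior / evidence for l in likelihood]           # Bayes theorem

# Toy hypothesis space over single-bit inputs (hypothetical example):
H = [lambda x: 0, lambda x: 1, lambda x: x, lambda x: 1 - x]

print(posterior(H, []))                 # (a) no data: all hypotheses equal, 0.25 each
print(posterior(H, [(0, 0)]))           # (b) one example: inconsistent hypotheses drop to 0
print(posterior(H, [(0, 0), (1, 1)]))   # (c) two examples: only h(x) = x remains, posterior 1
```

As more examples arrive, the surviving consistent hypotheses share the posterior mass equally, exactly as in panels (a) through (c).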
MAP Hypotheses and Consistent Learners
- A learning algorithm is a consistent learner if it outputs a hypothesis that commits zero errors over the training examples.
- Every consistent learner outputs a MAP hypothesis if we assume a uniform prior probability distribution over H (P(hi) = P(hj) for all i, j) and deterministic, noise-free training data (P(D|h) = 1 if D and h are consistent, and 0 otherwise); the short derivation after this list makes this explicit.
- Because FIND-S outputs a consistent hypothesis, it will output a MAP hypothesis under the probability distributions P(h) and P(D|h) defined above.
- Are there other probability distributions for P(h) and P(D|h) under which FIND-S outputs MAP hypotheses? Yes.
- Because FIND-S outputs a maximally specific hypothesis from the version space, its output hypothesis will be a MAP hypothesis relative to any prior probability distribution that favours more specific hypotheses.
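To make the MAP claim explicit, here is the standard derivation under the two assumptions above, written as a sketch in the notation used so far; VS_{H,D} denotes the version space, i.e. the subset of H consistent with D:

```latex
P(D) = \sum_{h \in H} P(D \mid h)\, P(h) = |VS_{H,D}| \cdot \tfrac{1}{|H|}

P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}
            = \frac{1 \cdot \tfrac{1}{|H|}}{|VS_{H,D}| \cdot \tfrac{1}{|H|}}
            = \frac{1}{|VS_{H,D}|}
\quad \text{if } h \text{ is consistent with } D,
\qquad
P(h \mid D) = 0 \text{ otherwise.}
```

Since every hypothesis in the version space attains the same maximal posterior 1/|VS_{H,D}|, any hypothesis output by a consistent learner (FIND-S included) is a MAP hypothesis.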
Note
- The Bayesian framework is a way to characterize the behaviour of learning algorithms.
- By identifying probability distributions P(h) and P(D|h) under which the output is an optimal (MAP) hypothesis, the implicit assumptions of the algorithm can be characterized (its inductive bias).
- Inductive inference is modelled by an equivalent probabilistic reasoning system based on Bayes theorem.