
BAYES THEOREM AND CONCEPT LEARNING

What is the relationship between Bayes theorem and the problem of concept learning?

Bayes theorem provides a principled way to calculate the posterior probability of each hypothesis given the training data. It can therefore serve as the basis for a straightforward learning algorithm that calculates the posterior probability of every candidate hypothesis and outputs the most probable one.

Brute-Force Bayes Concept Learning 

Consider the concept learning problem

  • Assume the learner considers some finite hypothesis space H defined over the instance space X, in which the task is to learn some target concept c : X → {0,1}.
  • The learner is given some sequence of training examples ((x1, d1) . . . (xm, dm)) where xi is some instance from X and where di is the target value of xi (i.e., di = c(xi)).
  • The sequence of target values is written as D = (d1 . . . dm).

We can design a straightforward concept learning algorithm to output the maximum a posteriori hypothesis, based on Bayes theorem, as follows:


BRUTE-FORCE MAP LEARNING algorithm: 

  • For each hypothesis h in H, calculate the posterior probability

        P(h|D) = P(D|h) P(h) / P(D)

  • Output the hypothesis hMAP with the highest posterior probability

        hMAP = argmax over h in H of P(h|D)
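
To make the procedure concrete, below is a minimal Python sketch of such a brute-force MAP learner. It assumes hypotheses are represented as callables mapping an instance to 0 or 1, and it takes the prior P(h) and likelihood P(D|h) as arguments; the names brute_force_map, prior, and likelihood are illustrative choices, not part of the original algorithm statement.

    def brute_force_map(H, examples, prior, likelihood):
        """Return the hypothesis in H with the highest posterior P(h|D).

        H          -- finite list of hypotheses (callables X -> {0, 1})
        examples   -- training data: list of (x_i, d_i) pairs
        prior      -- function h -> P(h)
        likelihood -- function (h, examples) -> P(D|h)
        """
        # Unnormalized posteriors P(D|h) * P(h) for every hypothesis in H.
        scores = [(likelihood(h, examples) * prior(h), h) for h in H]
        # P(D) is the same for every h, so dividing by it cannot change
        # which hypothesis is maximal; return the best-scoring hypothesis.
        best_score, h_map = max(scores, key=lambda pair: pair[0])
        return h_map

Because P(D) does not depend on h, the sketch skips the normalization step entirely when it only needs to identify hMAP.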

In order to specify a learning problem for the BRUTE-FORCE MAP LEARNING algorithm, we must specify what values are to be used for P(h) and for P(D|h).

Let’s choose P(h) and P(D|h) to be consistent with the following assumptions:
  • The training data D is noise free (i.e., di = c(xi))
  • The target concept c is contained in the hypothesis space H
  • We have no a priori reason to believe that any hypothesis is more probable than any other.

What values should we specify for P(h)?

  • Given no prior knowledge that one hypothesis is more likely than another, it is reasonable to assign the same prior probability to every hypothesis h in H.
  • Assume the target concept is contained in H and require that these prior probabilities sum to 1. Together, these constraints imply that we should choose P(h) = 1/|H| for all h in H, as sketched below.
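
This uniform prior can be expressed as a small helper; the name uniform_prior is hypothetical, chosen so the result can be passed to the brute_force_map sketch above as its prior argument.

    def uniform_prior(H):
        """Return a prior assigning P(h) = 1/|H| to every hypothesis h."""
        n = len(H)
        return lambda h: 1.0 / n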

What choice shall we make for P(D|h)?

  • P(D|h) is the probability of observing the target values D = (d1 . . . dm) for the fixed set of instances (x1 . . . xm), given a world in which hypothesis h holds.
  • Since we assume noise-free training data, the probability of observing classification di given h is just 1 if di = h(xi) and 0 if di ≠ h(xi). Therefore,

        P(D|h) = 1 if di = h(xi) for all di in D
        P(D|h) = 0 otherwise
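
This deterministic likelihood can also be written as a small helper; the name noise_free_likelihood is hypothetical, and it plugs into the brute_force_map sketch above as its likelihood argument.

    def noise_free_likelihood(h, examples):
        """P(D|h) for noise-free data: 1 if h(x_i) = d_i for every example, else 0."""
        return 1.0 if all(h(x) == d for x, d in examples) else 0.0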


Given these choices for P(h) and for P(D|h) we now have a fully-defined problem for the above BRUTE-FORCE MAP LEARNING algorithm.

Recalling Bayes theorem, we have

        P(h|D) = P(D|h) P(h) / P(D)

Consider the case where h is inconsistent with the training data D. Since P(D|h) = 0 when h is inconsistent with D, we have

        P(h|D) = (0 · P(h)) / P(D) = 0

The posterior probability of a hypothesis inconsistent with D is zero.

 

Consider the case where h is consistent with D. Since P(D|h) = 1 when h is consistent with D and P(h) = 1/|H|, we have

        P(h|D) = (1 · (1/|H|)) / P(D) = (1/|H|) / (|VSH,D| / |H|) = 1 / |VSH,D|

where VSH,D is the subset of hypotheses from H that are consistent with D (the version space of H with respect to D). The value P(D) = |VSH,D| / |H| follows from the theorem of total probability: P(D) is the sum of P(D|h) P(h) over all h in H, and each of the |VSH,D| consistent hypotheses contributes 1/|H| while every other hypothesis contributes 0.

To summarize, Bayes theorem implies that the posterior probability P(h|D) under our assumed P(h) and P(D|h) is

        P(h|D) = 1 / |VSH,D|   if h is consistent with D
        P(h|D) = 0             otherwise
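
As a sanity check on this result, the pieces above can be combined on a small toy problem. The threshold hypothesis space, instances, and target concept below are invented purely for illustration, and the sketch reuses the hypothetical uniform_prior and noise_free_likelihood helpers from the earlier sketches.

    # Toy check that consistent hypotheses get posterior 1/|VS_H,D| and
    # inconsistent ones get posterior 0.
    H = [lambda x, t=t: 1 if x >= t else 0 for t in range(5)]   # 5 threshold concepts
    target = H[2]                                  # assume c(x) = "x >= 2" is in H
    examples = [(0, target(0)), (3, target(3))]    # D observed on two instances

    prior = uniform_prior(H)                       # P(h) = 1/|H| = 1/5
    version_space = [h for h in H if noise_free_likelihood(h, examples) == 1.0]
    p_D = sum(noise_free_likelihood(h, examples) * prior(h) for h in H)  # = |VS|/|H|

    for h in H:
        posterior = noise_free_likelihood(h, examples) * prior(h) / p_D
        expected = 1.0 / len(version_space) if h in version_space else 0.0
        assert abs(posterior - expected) < 1e-12   # matches the summary above

    print(len(version_space), 1.0 / len(version_space))   # 3 consistent hypotheses, each with posterior 1/3

Here three of the five hypotheses are consistent with the two training examples, so each receives posterior 1/3 and the rest receive 0, exactly as the summary formula predicts.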






