Thursday, July 1, 2021

BAYES THEOREM AND CONCEPT LEARNING

What is the relationship between Bayes theorem and the problem of concept learning?

Bayes theorem provides a principled way to calculate the posterior probability of each hypothesis given the training data, so we can use it as the basis for a straightforward learning algorithm that calculates the posterior probability of each possible hypothesis and then outputs the most probable one.

Brute-Force Bayes Concept Learning 

Consider the following concept learning problem:

  • Assume the learner considers some finite hypothesis space H defined over the instance space X, in which the task is to learn some target concept c : X → {0,1}.
  • The learner is given some sequence of training examples ((x1, d1), . . . , (xm, dm)) where xi is some instance from X and where di is the target value of xi (i.e., di = c(xi)).
  • The sequence of target values is written as D = (d1 . . . dm).

We can design a straightforward concept learning algorithm to output the maximum a posteriori hypothesis, based on Bayes theorem, as follows:


BRUTE-FORCE MAP LEARNING algorithm: 

  • For each hypothesis h in H, calculate the posterior probability P(h|D), as written out below

  • Output the hypothesis hMAP with the highest posterior probability
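Written out, the algorithm computes, for each h in H,

$$P(h|D) = \frac{P(D|h)\,P(h)}{P(D)}$$

and then outputs

$$h_{MAP} = \underset{h \in H}{\operatorname{argmax}}\; P(h|D)$$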

In order to specify a learning problem for the BRUTE-FORCE MAP LEARNING algorithm, we must specify what values are to be used for P(h) and for P(D|h).

Let’s choose P(h) and P(D|h) to be consistent with the following assumptions:
  • The training data D is noise-free (i.e., di = c(xi)).
  • The target concept c is contained in the hypothesis space H.
  • We have no a priori reason to believe that any hypothesis is more probable than any other.

What values should we specify for P(h)?

  • Given no prior knowledge that one hypothesis is more likely than another, it is reasonable to assign the same prior probability to every hypothesis h in H.
  • Because we assume the target concept is contained in H, we require that these prior probabilities sum to 1.
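Taken together, these two constraints give the uniform prior

$$P(h) = \frac{1}{|H|} \quad \text{for all } h \in H$$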

What choice shall we make for P(D|h)?

  • P(D|h) is the probability of observing the target values D = (d1 . . . dm) for the fixed set of instances (x1 . . . xm), given a world in which hypothesis h holds.
  • Since we assume noise-free training data, the probability of observing classification di given h is just 1 if di = h(xi) and 0 if di ≠ h(xi). Therefore,
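$$P(D|h) = \begin{cases} 1 & \text{if } d_i = h(x_i) \text{ for all } d_i \in D \\ 0 & \text{otherwise} \end{cases}$$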


Given these choices for P(h) and P(D|h), we now have a fully defined problem for the above BRUTE-FORCE MAP LEARNING algorithm.
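As a concrete illustration, the following is a minimal Python sketch of the BRUTE-FORCE MAP LEARNING algorithm under the uniform prior and noise-free likelihood chosen above. The function name, the representation of hypotheses as callables, and the toy threshold concepts at the end are illustrative assumptions, not part of the original formulation.

from fractions import Fraction

def brute_force_map_learning(hypotheses, examples):
    # hypotheses: the finite space H, given as callables h(x) -> 0 or 1
    # examples:   the training data D, a list of (x_i, d_i) pairs with d_i = c(x_i)
    prior = Fraction(1, len(hypotheses))        # P(h) = 1/|H| for every h in H

    def likelihood(h):
        # Noise-free data: P(D|h) = 1 if h agrees with every d_i, else 0
        return 1 if all(h(x) == d for x, d in examples) else 0

    # P(D|h) P(h) for each h; summing these over H gives P(D)
    joint = [likelihood(h) * prior for h in hypotheses]
    p_data = sum(joint)                         # equals |VS_{H,D}| / |H|; nonzero because c is in H

    posteriors = [p / p_data for p in joint]    # P(h|D) by Bayes theorem
    best = max(range(len(hypotheses)), key=posteriors.__getitem__)
    return hypotheses[best], posteriors[best]

# Toy usage: instances are integers, hypotheses are threshold concepts x >= t.
H = [lambda x, t=t: int(x >= t) for t in range(5)]
D = [(1, 0), (3, 1)]
h_map, posterior = brute_force_map_learning(H, D)
print(posterior)   # prints 1/2: two thresholds (t = 2 and t = 3) are consistent with D

Because every hypothesis consistent with D ends up with the same posterior, max simply returns the first of the equally probable hypotheses, which is the usual convention when several hypotheses tie for hMAP.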

 Recalling Bayes theorem, we have
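$$P(h|D) = \frac{P(D|h)\,P(h)}{P(D)}$$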

Consider the case where h is inconsistent with the training data D.
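For such a hypothesis our choice of P(D|h) gives P(D|h) = 0, so

$$P(h|D) = \frac{0 \cdot P(h)}{P(D)} = 0$$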


The posterior probability of a hypothesis inconsistent with D is zero.

 

Consider the case where h is consistent with D.
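Here P(D|h) = 1, so

$$P(h|D) = \frac{1 \cdot \frac{1}{|H|}}{P(D)} = \frac{1 \cdot \frac{1}{|H|}}{\frac{|VS_{H,D}|}{|H|}} = \frac{1}{|VS_{H,D}|}$$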

where VS_{H,D} is the subset of hypotheses from H that are consistent with D.
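The value P(D) = |VS_{H,D}| / |H| used above follows from the theorem of total probability, together with the fact that the hypotheses are mutually exclusive:

$$P(D) = \sum_{h_i \in H} P(D|h_i)\,P(h_i) = \sum_{h_i \in VS_{H,D}} 1 \cdot \frac{1}{|H|} + \sum_{h_i \notin VS_{H,D}} 0 \cdot \frac{1}{|H|} = \frac{|VS_{H,D}|}{|H|}$$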

To summarize, Bayes theorem implies that the posterior probability P(h|D) under our assumed P(h) and P(D|h) is
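$$P(h|D) = \begin{cases} \dfrac{1}{|VS_{H,D}|} & \text{if } h \text{ is consistent with } D \\ 0 & \text{otherwise} \end{cases}$$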






