Bayes Optimal Classifier

The Bayes optimal classifier is a probabilistic model that makes the most probable prediction for a new example, given the training dataset.

This model is also referred to as the Bayes optimal learner, the Bayes classifier, Bayes optimal decision boundary, or the Bayes optimal discriminant function.

  • Bayes Classifier: Probabilistic model that makes the most probable prediction for new examples.

Specifically, the Bayes optimal classifier answers the question:

What is the most probable classification of the new instance given the training data?

This is different from the MAP (maximum a posteriori) framework, which seeks the most probable hypothesis (model). Instead, we are interested in making a specific prediction.

In general, the most probable classification of the new instance is obtained by combining the predictions of all hypotheses, weighted by their posterior probabilities.

The equation below demonstrates how to calculate the conditional probability for a new instance (vj), given the training data (D) and a space of hypotheses (H).

  • P(vj | D) = sum {hi in H} P(vj | hi) * P(hi | D)

Where vj is a new instance to be classified, H is the set of hypotheses for classifying the instance, hi is a given hypothesis, P(vj | hi) is the probability of classification vj under hypothesis hi, and P(hi | D) is the posterior probability of the hypothesis hi given the data D.

Selecting the outcome with the maximum probability is an example of a Bayes optimal classification.

  • max {vj} sum {hi in H} P(vj | hi) * P(hi | D)

Any model that classifies examples using this equation is a Bayes optimal classifier and no other model can outperform this technique, on average.
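To make the weighted-vote idea concrete, here is a minimal Python sketch; the hypothesis names, the posteriors P(hi | D), and the per-hypothesis class probabilities P(vj | hi) below are made-up illustrative values, not taken from any real dataset:

```python
# Assumed posteriors P(h | D) over a tiny, hypothetical hypothesis space.
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}

# Assumed per-hypothesis class probabilities P(v | h) for classes "yes" / "no".
likelihood = {
    "h1": {"yes": 1.0, "no": 0.0},
    "h2": {"yes": 0.0, "no": 1.0},
    "h3": {"yes": 0.0, "no": 1.0},
}

# P(v | D) = sum over h of P(v | h) * P(h | D)
p_v_given_D = {
    v: sum(likelihood[h][v] * posterior[h] for h in posterior)
    for v in ("yes", "no")
}

# The Bayes optimal classification picks the class with the largest combined probability.
prediction = max(p_v_given_D, key=p_v_given_D.get)
print(p_v_given_D, "->", prediction)   # {'yes': 0.4, 'no': 0.6} -> no
```

Notice that the single most probable hypothesis (h1, the MAP hypothesis) votes "yes", yet the combined, posterior-weighted vote is "no"; the Bayes optimal prediction follows the combination, not any one hypothesis.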

Any system that classifies new instances according to [the equation] is called a Bayes optimal classifier, or Bayes optimal learner. No other classification method using the same hypothesis space and same prior knowledge can outperform this method on average.

We have to let that sink in.

It is a big deal.

It means that any other algorithm that operates on the same data, the same set of hypotheses, and same prior probabilities cannot outperform this approach, on average. Hence the name “optimal classifier.”

Although the classifier makes optimal predictions, it is not perfect given the uncertainty in the training data and incomplete coverage of the problem domain and hypothesis space. As such, the model will make errors. These errors are often referred to as Bayes errors.

The Bayes classifier produces the lowest possible test error rate, called the Bayes error rate. […] The Bayes error rate is analogous to the irreducible error …

Because the Bayes classifier is optimal, the Bayes error is the minimum possible error that can be made.

  • Bayes Error: The minimum possible error that can be made when making predictions.

Further, the model is often described in terms of classification, e.g. the Bayes Classifier. Nevertheless, the principle applies just as well to regression: that is, predictive modeling problems where a numerical value is predicted instead of a class label.

It is a theoretical model, but it is held up as an ideal that we may wish to pursue.

In theory we would always like to predict qualitative responses using the Bayes classifier. But for real data, we do not know the conditional distribution of Y given X, and so computing the Bayes classifier is impossible. Therefore, the Bayes classifier serves as an unattainable gold standard against which to compare other methods.

Because of the computational cost of this optimal strategy, we can instead work with direct simplifications of the approach.

Two of the most commonly used simplifications are to sample hypotheses with a sampling algorithm, such as the Gibbs algorithm, or to use the simplifying assumptions of the Naive Bayes classifier.

  • Gibbs Algorithm. Randomly sample a single hypothesis, with probability based on its posterior probability, and use it to classify the new instance.
  • Naive Bayes. Assume that the variables in the input data are conditionally independent of one another, given the class.
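A minimal sketch of the Gibbs simplification (the hypotheses, posteriors, and votes below are made-up for illustration): rather than summing over every hypothesis, draw a single hypothesis at random with probability proportional to its posterior P(h | D) and let it classify the new instance on its own.

```python
import random

# Assumed, illustrative hypothesis space with posteriors P(h | D) and per-hypothesis votes.
hypotheses = ["h1", "h2", "h3"]
posterior = {"h1": 0.5, "h2": 0.3, "h3": 0.2}
vote = {"h1": "yes", "h2": "yes", "h3": "no"}

# Sample one hypothesis in proportion to its posterior and use only its prediction.
sampled_h = random.choices(hypotheses, weights=[posterior[h] for h in hypotheses], k=1)[0]
print(sampled_h, "->", vote[sampled_h])   # e.g. h1 -> yes
```

Under certain conditions, the expected error of this sampled classifier is at most twice that of the Bayes optimal classifier, which is what makes it a useful cheap approximation.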


Ques. Explain the Naive Bayes classifier.

Answer:

Naive Bayes classifiers are a family of probabilistic classification algorithms based on Bayes’ Theorem. They are called “naive” because they make a rigid independence assumption between the input variables.

  • Bayes’ Theorem gives the conditional probability of an event A given an event B.
  • Naive Bayes classifiers are highly scalable, requiring a number of parameters that is linear in the number of variables in a learning problem.
  • They assume that the presence of a particular feature in a class is unrelated to the presence of any other feature.
  • Despite its simplicity, Naive Bayes can perform surprisingly well, sometimes rivaling far more sophisticated classification methods.

Advantages of Naive Bayes Classifier

  • It is simple and easy to implement
  • It is fast and can be used to make real-time predictions
  • It handles both continuous and discrete data
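As a quick, hedged illustration of these points, here is a minimal sketch using scikit-learn's GaussianNB (assuming scikit-learn is installed; the toy data below is made up purely for demonstration):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Made-up continuous features: two measurements per sample, two classes.
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 3.7]])
y = np.array([0, 0, 1, 1])

model = GaussianNB()   # the Gaussian variant handles continuous inputs
model.fit(X, y)

print(model.predict([[1.1, 2.0]]))        # predicted class for a new sample, e.g. [0]
print(model.predict_proba([[3.9, 3.9]]))  # per-class probabilities for a new sample
```

For discrete (categorical) inputs, a categorical or multinomial variant of the model is typically used instead, which is essentially what the worked example below computes by hand.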

 

 

Ques. Consider a two-class (Tasty or non-Tasty) problem with the following training data. Use a Naive Bayes classifier to classify the pattern

“Cook = Asha, Health-Status = Bad, Cuisine = Continental”.

No.   Cook   Health-Status   Cuisine       Tasty
1     Asha   Bad             Indian        yes
2     Asha   Good            Continental   yes
3     Sita   Bad             Indian        no
4     Sita   Good            Indian        yes
5     Usha   Bad             Indian        yes
6     Usha   Bad             Continental   no
7     Sita   Bad             Continental   no
8     Sita   Good            Continental   yes
9     Usha   Good            Indian        yes
10    Usha   Good            Continental   no
11    Asha   Bad             Continental   ?    (the instance the question asks us to classify)

Answer:

 

Counts of each attribute value by class (Tasty):

Cook    Yes   No
Asha    2     0
Sita    2     2
Usha    2     2

Health-Status   Yes   No
Bad             2     3
Good            4     1

Cuisine       Yes   No
Indian        4     1
Continental   2     3

Tasty   Yes   No
        6     4

Conditional probabilities P(attribute value | class), obtained by dividing each count by the class total (6 for Yes, 4 for No):

Cook    Yes   No
Asha    2/6   0
Sita    2/6   2/4
Usha    2/6   2/4

Health-Status   Yes   No
Bad             2/6   3/4
Good            4/6   1/4

Cuisine       Yes   No
Indian        4/6   1/4
Continental   2/6   3/4

Tasty   Yes    No
        6/10   4/10

The class prior probabilities are computed with the formula P(C) = Nc / N, where Nc is the number of training instances in class C and N is the total number of training instances:

P(Yes) = 6/10

P(No) = 4/10

All of the attributes are discrete:

Cook has 3 discrete values: Asha, Sita, Usha.

Health-Status has 2 discrete values: Bad, Good.

Cuisine has 2 discrete values: Indian, Continental.

Tasty (the class) has 2 discrete values: Yes, No.

So the formula to be used is:

P(Ai | Ck) = |Aik| / Nk

where |Aik| is the number of training instances that have attribute value Ai and belong to class Ck, and Nk is the number of training instances in class Ck. For example, P(Cook = Asha | Yes) = 2/6, because 2 of the 6 Tasty = Yes instances have Cook = Asha.

So, we will find the probabilities of both Yes and No for Cook = “Asha”, Health-Status = “Bad”, and Cuisine = “Continental”.

For ease, we solve it in tabular form:

No:

P(Cook = Asha | No) = 0

P(Health-Status = Bad | No) = 3/4

P(Cuisine = Continental | No) = 3/4

Yes:

P(Cook = Asha | Yes) = 2/6

P(Health-Status = Bad | Yes) = 2/6

P(Cuisine = Continental | Yes) = 2/6

Overall score for No = 0 × 3/4 × 3/4 × 4/10 = 0

Overall score for Yes = 2/6 × 2/6 × 2/6 × 6/10 ≈ 0.022

Since the score for Yes is greater than the score for No, the answer is Yes, which indicates:

“Continental Cuisine made by Cook Asha when Health-Status is Bad is Tasty (Yes).”
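As a sanity check (this script is not part of the original answer), the hand-worked numbers above can be recomputed directly from the training table with a few lines of Python:

```python
# The training data from the table above: (Cook, Health-Status, Cuisine, Tasty).
data = [
    ("Asha", "Bad",  "Indian",      "yes"),
    ("Asha", "Good", "Continental", "yes"),
    ("Sita", "Bad",  "Indian",      "no"),
    ("Sita", "Good", "Indian",      "yes"),
    ("Usha", "Bad",  "Indian",      "yes"),
    ("Usha", "Bad",  "Continental", "no"),
    ("Sita", "Bad",  "Continental", "no"),
    ("Sita", "Good", "Continental", "yes"),
    ("Usha", "Good", "Indian",      "yes"),
    ("Usha", "Good", "Continental", "no"),
]
query = ("Asha", "Bad", "Continental")   # Cook, Health-Status, Cuisine to classify

def score(label):
    """Prior P(C) times the product of P(Ai | C) for each attribute of the query."""
    rows = [r for r in data if r[3] == label]
    s = len(rows) / len(data)                                     # P(C) = Nc / N
    for i, value in enumerate(query):
        s *= sum(1 for r in rows if r[i] == value) / len(rows)   # P(Ai | C) = |Aik| / Nk
    return s

for label in ("yes", "no"):
    print(label, round(score(label), 4))   # yes 0.0222, no 0.0
```

The script reproduces the same result: the score for Yes (about 0.022) beats the score for No (0), so the pattern is classified as Tasty = Yes.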
