
NAIVE BAYES CLASSIFIER

  • The naive Bayes classifier applies to learning tasks where each instance x is described by a conjunction of attribute values and where the target function f(x) can take on any value from some finite set V.
  • A set of training examples of the target function is provided, and a new instance is presented, described by the tuple of attribute values (a1, a2, ..., an).
  • The learner is asked to predict the target value, or classification, for this new instance.

The Bayesian approach to classifying the new instance is to assign the most probable target value, v_MAP, given the attribute values (a1, a2, ..., an) that describe the instance:

$$ v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid a_1, a_2, \ldots, a_n) $$

We can use Bayes theorem to rewrite this expression as

$$ v_{MAP} = \arg\max_{v_j \in V} \frac{P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j)}{P(a_1, a_2, \ldots, a_n)} = \arg\max_{v_j \in V} P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j) \qquad (1) $$

  • The naive Bayes classifier is based on the simplifying assumption that the attribute values are conditionally independent given the target value. That is, given the target value of the instance, the probability of observing the conjunction (a1, a2, ..., an) is just the product of the probabilities for the individual attributes:

$$ P(a_1, a_2, \ldots, a_n \mid v_j) = \prod_i P(a_i \mid v_j) $$

Substituting this into Equation (1), we obtain the approach used by the naive Bayes classifier.

Naive Bayes classifier:

$$ v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j) \qquad (2) $$

where v_NB denotes the target value output by the naive Bayes classifier.
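
As a concrete illustration of Equation (2), the sketch below implements this decision rule in Python for categorical attributes. It is a minimal sketch under our own assumptions: the function names (train_naive_bayes, classify) and the data layout (each example as a dict of attribute values plus a label) are hypothetical, not part of the original text.

```python
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Estimate P(v_j) and P(a_i | v_j) by relative frequency over the training set.

    `examples` is a list of (attributes_dict, label) pairs.
    """
    label_counts = Counter(label for _, label in examples)
    # value_counts[label][attribute][value] = number of co-occurrences
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for attrs, label in examples:
        for attr, value in attrs.items():
            value_counts[label][attr][value] += 1
    priors = {v: n / len(examples) for v, n in label_counts.items()}
    return priors, value_counts, label_counts

def classify(instance, priors, value_counts, label_counts):
    """Return v_NB = argmax_v P(v) * prod_i P(a_i | v), as in Equation (2)."""
    best_label, best_score = None, -1.0
    for label, prior in priors.items():
        score = prior
        for attr, value in instance.items():
            score *= value_counts[label][attr][value] / label_counts[label]
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

The worked example in the next section applies exactly this rule, by hand, to the PlayTennis data.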


An Illustrative Example

  • Let us apply the naive Bayes classifier to a concept learning problem: classifying days according to whether someone will play tennis.
  • The table below provides a set of 14 training examples of the target concept PlayTennis, where each day is described by the attributes Outlook, Temperature, Humidity, and Wind.

Day   Outlook   Temperature   Humidity   Wind     PlayTennis
D1    Sunny     Hot           High       Weak     No
D2    Sunny     Hot           High       Strong   No
D3    Overcast  Hot           High       Weak     Yes
D4    Rain      Mild          High       Weak     Yes
D5    Rain      Cool          Normal     Weak     Yes
D6    Rain      Cool          Normal     Strong   No
D7    Overcast  Cool          Normal     Strong   Yes
D8    Sunny     Mild          High       Weak     No
D9    Sunny     Cool          Normal     Weak     Yes
D10   Rain      Mild          Normal     Weak     Yes
D11   Sunny     Mild          Normal     Strong   Yes
D12   Overcast  Mild          High       Strong   Yes
D13   Overcast  Hot           Normal     Weak     Yes
D14   Rain      Mild          High       Strong   No

  • Use the naive Bayes classifier and the training data from this table to classify the following novel instance:

< Outlook = sunny, Temperature = cool, Humidity = high, Wind = strong >

  • Our task is to predict the target value (yes or no) of the target concept PlayTennis for this new instance:

$$ v_{NB} = \arg\max_{v_j \in \{yes,\, no\}} P(v_j)\, P(Outlook = sunny \mid v_j)\, P(Temperature = cool \mid v_j)\, P(Humidity = high \mid v_j)\, P(Wind = strong \mid v_j) $$

The probabilities of the different target values can easily be estimated based on their frequencies over the 14 training examples:

  • P(PlayTennis = yes) = 9/14 = 0.64
  • P(PlayTennis = no) = 5/14 = 0.36

Similarly, we can estimate the conditional probabilities. For example, those for Wind = strong:

  • P(Wind = strong | PlayTennis = yes) = 3/9 = 0.33
  • P(Wind = strong | PlayTennis = no) = 3/5 = 0.60

Calculating v_NB according to Equation (2):

$$ P(yes)\, P(sunny \mid yes)\, P(cool \mid yes)\, P(high \mid yes)\, P(strong \mid yes) = 0.0053 $$
$$ P(no)\, P(sunny \mid no)\, P(cool \mid no)\, P(high \mid no)\, P(strong \mid no) = 0.0206 $$

Thus, the naive Bayes classifier assigns the target value PlayTennis = no to this new instance, based on the probability estimates learned from the training data.
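
To make this arithmetic reproducible, the following Python sketch recomputes the two quantities directly from the 14 training examples and then normalizes them. The list-of-tuples encoding of the table and the variable names are our own choices, not from the original text.

```python
from collections import Counter

# The 14 PlayTennis training examples: (Outlook, Temperature, Humidity, Wind, PlayTennis)
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),        ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),     ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),  ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),  ("Rain", "Mild", "High", "Strong", "No"),
]
instance = ("Sunny", "Cool", "High", "Strong")  # the novel day to classify

label_counts = Counter(row[-1] for row in data)
scores = {}
for label, n in label_counts.items():
    score = n / len(data)                      # prior P(v_j)
    for i, value in enumerate(instance):
        n_c = sum(1 for row in data if row[-1] == label and row[i] == value)
        score *= n_c / n                       # P(a_i = value | v_j)
    scores[label] = score

print(scores)                                  # {'No': 0.0206, 'Yes': 0.0053} (approximately)
print(max(scores, key=scores.get))             # -> 'No'
print(scores["No"] / sum(scores.values()))     # normalized: 0.0206 / (0.0206 + 0.0053) ≈ 0.795
```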

By normalizing the above quantities to sum to one, we can calculate the conditional probability that the target value is no, given the observed attribute values:

$$ P(PlayTennis = no \mid sunny, cool, high, strong) = \frac{0.0206}{0.0206 + 0.0053} \approx 0.795 $$


Estimating Probabilities

  • So far we have estimated probabilities by the fraction of times the event is observed to occur over the total number of opportunities.
  • For example, in the above case we estimated P(Wind = strong | PlayTennis = no) by the fraction n_c/n, where n = 5 is the total number of training examples for which PlayTennis = no, and n_c = 3 is the number of these for which Wind = strong.
  • When n_c = 0, the estimate n_c/n is zero, and this probability term will dominate the quantity calculated in Equation (2), because Equation (2) multiplies all the other probability terms by this zero value.
  • To avoid this difficulty we can adopt a Bayesian approach to estimating the probability, using the m-estimate defined as follows.

m-estimate of probability:

$$ \frac{n_c + m\,p}{n + m} $$

  • Here p is our prior estimate of the probability we wish to determine, and m is a constant called the equivalent sample size, which determines how heavily to weight p relative to the observed data.
  • A typical method for choosing p in the absence of other information is to assume uniform priors: if an attribute has k possible values, we set p = 1/k. (A short sketch of this estimate follows.)
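
Below is a minimal Python sketch of the m-estimate under these conventions; the helper name m_estimate is our own, and the example numbers come from the Wind = strong, PlayTennis = no case above.

```python
def m_estimate(n_c, n, p, m):
    """Bayesian m-estimate of probability: (n_c + m*p) / (n + m).

    n_c: count of training examples where the event occurs
    n:   total count of training examples for the given target value
    p:   prior estimate of the probability (e.g. 1/k for k attribute values)
    m:   equivalent sample size, weighting p against the observed data
    """
    return (n_c + m * p) / (n + m)

# Example: P(Wind = strong | PlayTennis = no) with n_c = 3, n = 5.
# Wind has k = 2 values (Weak, Strong), so a uniform prior gives p = 1/2.
print(m_estimate(3, 5, p=0.5, m=2))   # 0.571..., vs. the raw estimate 3/5 = 0.6
print(m_estimate(0, 5, p=0.5, m=2))   # 0.142..., avoids a hard zero when n_c = 0
```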

 
