
Bayes Theorem

Bayes Theorem of Conditional Probability

Before we dive into Bayes theorem, let’s review marginal, joint, and conditional probability.

Recall that marginal probability is the probability of an event irrespective of other random variables. If the random variable is independent of all other variables, it is simply the probability of the event. If the variable depends on other variables, the marginal probability is the probability of the event summed over all outcomes of those other variables, a calculation known as the sum rule.

  • Marginal Probability: The probability of an event irrespective of the outcomes of other random variables, e.g. P(A).

The joint probability is the probability of two (or more) simultaneous events, often described in terms of events A and B from two dependent random variables, e.g. X and Y. The joint probability is often summarized using just the outcomes, e.g. P(A and B).

  • Joint Probability: Probability of two (or more) simultaneous events, e.g. P(A and B) or P(A, B).

The conditional probability is the probability of one event given the occurrence of another event, often described in terms of events A and B from two dependent random variables e.g. X and Y.

  • Conditional Probability: Probability of one (or more) event given the occurrence of another event, e.g. P(A given B) or P(A | B).

The joint probability can be calculated using the conditional probability; for example:

  • P(A, B) = P(A | B) * P(B)

This is called the product rule. Importantly, the joint probability is symmetrical, meaning that:

  • P(A, B) = P(B, A)

The conditional probability can be calculated using the joint probability; for example:

  • P(A | B) = P(A, B) / P(B)

The conditional probability is not symmetrical; for example:

  • P(A | B) != P(B | A)
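As a quick check of these rules, the short Python sketch below plugs in some made-up numbers (P(B), P(A | B), and P(A) are assumptions chosen purely for illustration) and verifies the product rule, the symmetry of the joint probability, and the recovery of the conditional from the joint.

```python
# Illustrative numbers only: these probabilities do not come from any real problem.
p_b = 0.4            # P(B)
p_a_given_b = 0.25   # P(A | B)
p_a = 0.3            # P(A)
p_b_given_a = p_a_given_b * p_b / p_a   # P(B | A), derived so the numbers stay consistent

# Product rule: P(A, B) = P(A | B) * P(B)
p_ab = p_a_given_b * p_b

# Symmetry of the joint probability: P(A, B) = P(B, A) = P(B | A) * P(A)
p_ba = p_b_given_a * p_a
assert abs(p_ab - p_ba) < 1e-12

# Conditional from the joint: P(A | B) = P(A, B) / P(B)
assert abs(p_ab / p_b - p_a_given_b) < 1e-12

print(p_ab)  # 0.1
```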

We are now up to speed with marginal, joint, and conditional probability. If you would like more background on these fundamentals, see a dedicated tutorial on the basics of probability.


Bayes Theorem: Principled way of calculating a conditional probability without the joint probability.

  • P(A|B) = P(B|A) * P(A) / P(B)

It is often the case that we do not have access to the denominator directly, e.g. P(B).

We can calculate it in an alternative way; for example:

  • P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)

This gives a formulation of Bayes Theorem that uses the alternate calculation of P(B), described below:

  • P(A|B) = P(B|A) * P(A) / P(B|A) * P(A) + P(B|not A) * P(not A)

Or with brackets around the denominator for clarity:

  • P(A|B) = P(B|A) * P(A) / (P(B|A) * P(A) + P(B|not A) * P(not A))

Note: the denominator is simply the expansion we gave above.

As such, if we have P(A), then we can calculate P(not A) as its complement; for example:

  • P(not A) = 1 – P(A)

Additionally, if we have P(not B|not A), then we can calculate P(B|not A) as its complement; for example:

  • P(B|not A) = 1 – P(not B|not A)
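Putting the pieces together, here is a small Python sketch that evaluates Bayes Theorem using the alternate calculation of P(B); the input probabilities are made-up values used only for illustration.

```python
def bayes_theorem(p_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) computed with the alternate calculation of P(B) in the denominator."""
    p_not_a = 1.0 - p_a                                   # P(not A) as the complement of P(A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a   # P(B) = P(B|A)*P(A) + P(B|not A)*P(not A)
    return p_b_given_a * p_a / p_b

# Made-up numbers purely for illustration (e.g. a diagnostic-test style setup).
p_a = 0.02              # P(A)
p_b_given_a = 0.95      # P(B|A)
p_b_given_not_a = 0.10  # P(B|not A), e.g. 1 - P(not B|not A)

print(bayes_theorem(p_a, p_b_given_a, p_b_given_not_a))  # about 0.162
```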

Now that we are familiar with the calculation of Bayes Theorem, let’s take a closer look at the meaning of the terms in the equation.


Ques. Explain Bayesian learning. Explain two-category classification.

Answer:

Bayesian Learning:

  • Bayesian learning uses Bayes' theorem to determine the conditional probability of a hypothesis given some evidence or observations.
  • It is a fundamental statistical approach to the problem of pattern classification.
  • This type of learning focuses on quantifying the tradeoffs between various classification decisions using probability and the costs that accompany such decisions.
  • The decision problem is posed in probabilistic terms, and hence it is assumed that all of the relevant probability values are known.

Bayes' theorem:

  • It tells how the conditional probability of an event or a hypothesis can be computed using evidence and prior knowledge.
  • The formula for Bayes' Theorem in this setting is P(θ|X) = P(X|θ) * P(θ) / P(X), where P(θ|X) is the posterior probability of the hypothesis θ given the evidence X.
  • P(X|θ) — Likelihood is the conditional probability of the evidence given a hypothesis.
  • P(X) — Evidence term denotes the probability of evidence or data. 

This evidence term can be expanded as a summation (or integral) over all possible hypotheses, each weighted by its likelihood: P(X) = Σθ P(X|θ) * P(θ).

  • P(θ) — Prior Probability is the probability of the hypothesis θ being true before applying Bayes' theorem. The prior encodes what has been learned from past experience.
  • For example, θ could be the hypothesis that our code has no bugs, the evidence X could be that the code has passed all of its test cases, and the prior could encode our past experience of rarely observing any bugs in our code.
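To make these terms concrete for the code-has-no-bugs example, the Python sketch below computes the posterior; every number is a made-up assumption used only for illustration.

```python
# Hypothetical numbers for the "code has no bugs" example; not taken from any real data.
p_no_bugs = 0.9               # P(θ): prior that the code has no bugs (bugs are rarely observed)
p_bugs = 1.0 - p_no_bugs      # P(not θ)

p_pass_given_no_bugs = 0.99   # P(X|θ): likelihood of passing all tests given no bugs
p_pass_given_bugs = 0.50      # P(X|not θ): buggy code may still pass all tests

# Evidence term: a sum over the hypotheses, each weighted by its likelihood.
p_pass = p_pass_given_no_bugs * p_no_bugs + p_pass_given_bugs * p_bugs

# Posterior: P(θ|X) = P(X|θ) * P(θ) / P(X)
posterior_no_bugs = p_pass_given_no_bugs * p_no_bugs / p_pass
print(posterior_no_bugs)  # about 0.947
```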

Two Category Classification:

In the special case of two-category classification problems:

  • action a1 corresponds to deciding that the true state of nature is w1, and
  • action a2 corresponds to deciding that it is w2.

For notational simplicity, let lij = l(ai|wj) be the loss incurred for deciding wi when the true state of nature is wj. The conditional risk, i.e. the expected loss associated with taking action ai, is

  • R(ai|x) = Σj l(ai|wj) * P(wj|x)

where P(wj|x) is the probability that the true state of nature is wj. Writing this out for the two categories, we obtain

  • R(a1|x) = l11 * P(w1|x) + l12 * P(w2|x)
  • R(a2|x) = l21 * P(w1|x) + l22 * P(w2|x)

There are a variety of ways of expressing the minimum-risk decision rule, each having its own minor advantages. The fundamental rule is to decide w1 if R(a1|x) < R(a2|x). In terms of the posterior probabilities, we decide w1 if

  • (l21 - l11) * P(w1|x) > (l12 - l22) * P(w2|x)

or, in terms of the prior probabilities and the class-conditional densities, decide w1 if

  • (l21 - l11) * p(x|w1) * P(w1) > (l12 - l22) * p(x|w2) * P(w2)

or alternatively, as a likelihood ratio, decide w1 if

  • p(x|w1) / p(x|w2) > [(l12 - l22) / (l21 - l11)] * [P(w2) / P(w1)]

This form of the decision rule focuses on the x-dependence of the probability densities. We can consider p(x|wj) a function of wj (i.e., the likelihood function) and then form the likelihood ratio p(x|w1) / p(x|w2). Thus the Bayes decision rule can be interpreted as calling for deciding w1 if the likelihood ratio exceeds a threshold value that is independent of the observation x.
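As a sketch of this rule, the Python snippet below compares the two conditional risks for a made-up loss matrix and made-up posterior probabilities; all of the numbers are assumptions chosen only for illustration.

```python
# Hypothetical loss matrix: l[0][0]=l11, l[0][1]=l12, l[1][0]=l21, l[1][1]=l22 (illustrative values).
l = [[0.0, 2.0],
     [1.0, 0.0]]

def decide(p_w1_given_x):
    """Minimum-risk decision between w1 and w2 given the posterior P(w1|x)."""
    p_w2_given_x = 1.0 - p_w1_given_x
    r1 = l[0][0] * p_w1_given_x + l[0][1] * p_w2_given_x  # R(a1|x)
    r2 = l[1][0] * p_w1_given_x + l[1][1] * p_w2_given_x  # R(a2|x)
    return "w1" if r1 < r2 else "w2"

print(decide(0.7))  # w1: deciding w1 carries the lower expected loss
print(decide(0.5))  # w2: the asymmetric losses push the decision toward w2
```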

 

Ques. Explain how the decision error for Bayesian classification can be minimized.

Answer:

The Bayes classifier is optimal with respect to minimising the decision error, i.e. the probability of classification error.

In classification problems, each state of nature is associated with a different one of the classes, and the action ai is usually interpreted as the decision that the true state of nature is wi. If action ai is taken and the true state of nature is wj, then the decision is correct if i = j and in error if i ≠ j. If errors are to be avoided, it is natural to seek a decision rule that minimizes the probability of error, that is, the error rate.

This loss function, the so-called symmetrical or zero-one loss function, is given as

  • l(ai|wj) = 0 if i = j
  • l(ai|wj) = 1 if i ≠ j

This loss function assigns no loss to a correct decision, and assigns a unit loss to any error: thus, all errors are equally costly. The risk corresponding to this loss function is precisely the average probability of error, because the conditional risk becomes

  • R(ai|x) = Σ(j≠i) P(wj|x) = 1 - P(wi|x)

and P(wi|x) is the conditional probability that action ai is correct.

The Bayes deci­sion rule to minimize risk calls for selecting the action that minimizes the conditional risk.

Thus, to minimize the average probability of error, we should select the i that maximizes the posterior probability P(wi|x).

In other words, for minimum error rate:

Decide wi if P(wi|x) > P(wj|x) for all j ≠ i
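To see that the zero-one loss reduces the minimum-risk rule to choosing the largest posterior, the short sketch below (with made-up posterior values) compares the two formulations.

```python
# Made-up posteriors P(w1|x) and P(w2|x) for a single observation x.
posteriors = [0.35, 0.65]

# Conditional risks under the zero-one loss: R(ai|x) = 1 - P(wi|x).
risks = [1.0 - p for p in posteriors]

# Minimizing the risk selects the same class as maximizing the posterior.
best_by_risk = min(range(2), key=lambda i: risks[i])
best_by_posterior = max(range(2), key=lambda i: posteriors[i])
assert best_by_risk == best_by_posterior

print("decide w%d" % (best_by_posterior + 1))  # decide w2
```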


Ques. Define Bayes classifier. Explain how classification is done using the Bayes classifier.

Answer:

The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps build fast machine learning models that can make quick predictions. It is a probabilistic classifier, which means it predicts on the basis of the probability that an object belongs to each class. Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.

When there are multiple X variables (features), Bayes' rule is extended by assuming the X's are independent of each other given the class, which simplifies the likelihood to a product of per-feature probabilities. The classifier is called "Naive" because of this naive independence assumption; regardless of its name, it is a powerful formula.

Working of Bayes’ Classifier:

Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether or not we should play on a particular day according to the weather conditions. To solve this problem, we need to follow the steps below (a small code sketch follows the list):

  1. Convert the given dataset into frequency tables.
  2. Generate Likelihood table by finding the probabilities of given features.
  3. Now, use Bayes theorem to calculate the posterior probability.
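As a sketch of these three steps, the Python snippet below builds the frequency and likelihood tables from a tiny, made-up weather dataset and then applies Bayes theorem; the records and the query day are assumptions used for illustration only.

```python
from collections import Counter

# Hypothetical weather records (outlook, play), made up purely for illustration.
data = [("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"),
        ("Rainy", "Yes"), ("Rainy", "No"), ("Overcast", "Yes"), ("Sunny", "Yes"),
        ("Sunny", "Yes"), ("Rainy", "Yes"), ("Overcast", "Yes"), ("Sunny", "No")]

# Step 1: frequency tables.
play_counts = Counter(play for _, play in data)    # counts of each Play value
pair_counts = Counter(data)                        # counts of each (outlook, play) pair

# Step 2: likelihood table, P(outlook | play).
def likelihood(outlook, play):
    return pair_counts[(outlook, play)] / play_counts[play]

# Step 3: posterior P(play | outlook) via Bayes theorem.
def posterior(outlook, play):
    prior = play_counts[play] / len(data)
    evidence = sum(likelihood(outlook, p) * play_counts[p] / len(data) for p in play_counts)
    return likelihood(outlook, play) * prior / evidence

# Decide whether to play on a Sunny day: pick the class with the higher posterior.
print({p: round(posterior("Sunny", p), 3) for p in ("Yes", "No")})  # {'Yes': 0.4, 'No': 0.6}
```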

Advantages of  Bayes’ Classifier:

  • Naïve Bayes is a fast and easy ML algorithm for predicting the class of a dataset.
  • It can be used for Binary as well as Multi-class Classifications.

Disadvantages of Bayes’ Classifier:

  • Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the relationship between features.

  

Ques. Discuss Bayes classification using some examples in detail.

Answer:

Bayes Classification:

Bayesian classification uses Bayes theorem to predict the occurrence of any event. 

Bayesian classifiers are statistical classifiers: they can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class.

 

 

Ques. Let blue, green, and red be three classes of objects with prior probabilities given by P(blue) = ¼, P(green) = ½, P(red) = ¼.

Let there be three types of objects: pencils, pens, and paper. Let the class-conditional probabilities of these objects be given as follows.

Use the Bayes classifier to classify pencil, pen, and paper.

  • P(pencil/green)=⅓,
  • P(pen/green)=½,
  • P(paper/green)=⅙,
  • P(pencil/blue)=½,
  • P(pen/blue)=⅙,
  • P(paper/blue)=⅓,
  • P(pencil/red)=⅙,
  • P(pen/red)=⅓,
  • P(paper/red)=½.

Answer:
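As a sketch of the computation, the Python snippet below applies the Bayes classifier to the numbers given in the question, assigning each object to the class with the highest posterior probability.

```python
from fractions import Fraction as F

# Prior probabilities of the classes, as given in the question.
priors = {"blue": F(1, 4), "green": F(1, 2), "red": F(1, 4)}

# Class-conditional probabilities P(object | class), as given in the question.
likelihoods = {
    "green": {"pencil": F(1, 3), "pen": F(1, 2), "paper": F(1, 6)},
    "blue":  {"pencil": F(1, 2), "pen": F(1, 6), "paper": F(1, 3)},
    "red":   {"pencil": F(1, 6), "pen": F(1, 3), "paper": F(1, 2)},
}

for obj in ("pencil", "pen", "paper"):
    # Evidence: P(object) = sum over classes of P(object | class) * P(class).
    evidence = sum(likelihoods[c][obj] * priors[c] for c in priors)
    # Posteriors: P(class | object) = P(object | class) * P(class) / P(object).
    posteriors = {c: likelihoods[c][obj] * priors[c] / evidence for c in priors}
    best = max(posteriors, key=posteriors.get)
    print(obj, "->", best, dict(posteriors))

# Result: pencil and pen are assigned to class green, and paper to class red.
```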

 
