- The naive Bayes classifier makes significant use of the assumption that the values of the attributes a1 . . . an are conditionally independent given the target value v.
- This assumption dramatically reduces the complexity of learning the target function.
A Bayesian belief network describes the probability distribution governing a set of variables by specifying a set of conditional independence assumptions along with a set of conditional probabilities. Unlike the naive Bayes classifier, Bayesian belief networks allow stating conditional independence assumptions that apply to subsets of the variables.
Notation
- Consider an arbitrary set of random variables Y1 . . . Yn, where each variable Yi can take on the set of possible values V(Yi).
- We define the joint space of the set of variables Y to be the cross product V(Y1) x V(Y2) x . . . x V(Yn).
- In other words, each item in the joint space corresponds to one of the possible assignments of values to the tuple of variables (Y1 . . . Yn). The probability distribution over this joint space is called the joint probability distribution.
- The joint probability distribution specifies the probability for each of the possible variable bindings for the tuple (Y1 . . . Yn).
- A Bayesian belief network describes the joint probability distribution for a set of variables.
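As a quick illustration of these definitions, the joint space and a joint distribution over it can be enumerated directly for a handful of boolean variables. This is a minimal sketch: the variable names are borrowed from the example later in these notes, and the uniform probabilities are placeholders only.

```python
# Sketch: enumerating the joint space of three boolean variables and
# attaching a joint probability distribution to it. The variable names are
# borrowed from the example later in these notes; the uniform probabilities
# are placeholders only.
from itertools import product

variables = ["Storm", "Lightning", "Thunder"]
values = [True, False]

# The joint space is the cross product V(Y1) x V(Y2) x ... x V(Yn).
joint_space = list(product(values, repeat=len(variables)))
print(len(joint_space))  # 2**3 = 8 possible variable bindings

# A joint probability distribution assigns one probability to each binding;
# uniform values are used here purely as a placeholder.
joint = {assignment: 1 / len(joint_space) for assignment in joint_space}
assert abs(sum(joint.values()) - 1.0) < 1e-9
```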
Conditional Independence
Let X, Y, and Z be three discrete-valued random variables. X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given a value for Z; that is, if

(∀ xi, yj, zk) P(X = xi | Y = yj, Z = zk) = P(X = xi | Z = zk)

where xi ∈ V(X), yj ∈ V(Y), and zk ∈ V(Z). The above expression is written in abbreviated form as

P(X | Y, Z) = P(X | Z)
Conditional independence can be extended to sets of variables. The set of variables X1 . . . Xl is conditionally independent of the set of variables Y1 . . . Ym given the set of variables Z1 . . . Zn if

P(X1 . . . Xl | Y1 . . . Ym, Z1 . . . Zn) = P(X1 . . . Xl | Z1 . . . Zn)
The naive Bayes classifier assumes that the instance attribute A1 is conditionally independent of instance attribute A2 given the target value V. This allows the naive Bayes classifier to calculate P(A1, A2 | V) as follows:

P(A1, A2 | V) = P(A1 | A2, V) P(A2 | V) = P(A1 | V) P(A2 | V)
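This identity can be checked numerically. The sketch below builds the full joint P(A1, A2, V) from the factored form (all probability values are invented for illustration) and confirms that the conditional recovered from the joint matches P(A1 | V) P(A2 | V).

```python
# Sketch: numerically checking P(A1, A2 | V) = P(A1 | V) * P(A2 | V).
# The full joint P(A1, A2, V) is built from the factored form (all values
# invented), then the conditional recovered from the joint is compared
# against the product of the individual conditionals.
from itertools import product

p_v = {True: 0.4, False: 0.6}    # P(V = v)
p_a1 = {True: 0.7, False: 0.2}   # P(A1 = True | V = v)
p_a2 = {True: 0.6, False: 0.3}   # P(A2 = True | V = v)

def b(table, x, v):
    """P(X = x | V = v) for a boolean attribute with table P(X=True | v)."""
    return table[v] if x else 1.0 - table[v]

# Joint built under the conditional independence assumption.
joint = {(a1, a2, v): b(p_a1, a1, v) * b(p_a2, a2, v) * p_v[v]
         for a1, a2, v in product([True, False], repeat=3)}

for a1, a2, v in joint:
    p_cond = joint[(a1, a2, v)] / p_v[v]            # P(A1, A2 | V)
    factored = b(p_a1, a1, v) * b(p_a2, a2, v)      # P(A1 | V) P(A2 | V)
    assert abs(p_cond - factored) < 1e-12
```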
Representation
A Bayesian belief network represents the joint probability distribution for a set of variables. Bayesian networks (BN) are represented by directed acyclic graphs.
The Bayesian network in the figure above represents the joint probability distribution over the boolean variables Storm, Lightning, Thunder, ForestFire, Campfire, and BusTourGroup.
A Bayesian network (BN) represents the joint probability distribution by specifying a set of conditional independence assumptions
- A BN is represented by a directed acyclic graph, together with sets of local conditional probabilities
- Each variable in the joint space is represented by a node in the Bayesian network
- The network arcs represent the assertion that each variable is conditionally independent of its non-descendants in the network given its immediate predecessors (parents) in the network
- A conditional probability table (CPT) is given for each variable, describing the probability distribution for that variable given the values of its immediate predecessors
The joint probability for any desired assignment of values (y1, . . . , yn) to the tuple of network variables (Y1 . . . Yn) can be computed by the formula

P(y1, . . . , yn) = ∏i=1..n P(yi | Parents(Yi))

where Parents(Yi) denotes the set of immediate predecessors of Yi in the network.
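This formula translates directly into code. In the sketch below, the network structure follows the forest-fire figure; every CPT value is an assumed placeholder except P(Campfire = True | Storm = True, BusTourGroup = True) = 0.4, which appears in the text.

```python
# Sketch: computing the joint probability P(y1,...,yn) as the product of
# P(yi | Parents(Yi)) over all network variables. Structure follows the
# forest-fire network; all CPT values are assumed placeholders except
# P(Campfire=True | Storm=True, BusTourGroup=True) = 0.4 from the text.
from itertools import product

parents = {
    "Storm": [],
    "BusTourGroup": [],
    "Lightning": ["Storm"],
    "Campfire": ["Storm", "BusTourGroup"],
    "Thunder": ["Lightning"],
    "ForestFire": ["Storm", "Lightning", "Campfire"],
}

# cpt[var][parent_values] = P(var = True | Parents = parent_values)
cpt = {
    "Storm": {(): 0.3},
    "BusTourGroup": {(): 0.2},
    "Lightning": {(True,): 0.8, (False,): 0.1},
    "Campfire": {(True, True): 0.4, (True, False): 0.1,
                 (False, True): 0.8, (False, False): 0.2},
    "Thunder": {(True,): 0.9, (False,): 0.05},
    # Placeholder: higher fire risk when at least two causes are present.
    "ForestFire": {pv: (0.6 if pv.count(True) >= 2 else 0.1)
                   for pv in product([True, False], repeat=3)},
}

def joint_probability(assignment):
    """P(assignment) = product over all variables of P(yi | Parents(Yi))."""
    prob = 1.0
    for var, pars in parents.items():
        pv = tuple(assignment[p] for p in pars)
        p_true = cpt[var][pv]
        prob *= p_true if assignment[var] else 1.0 - p_true
    return prob

one_assignment = {"Storm": True, "BusTourGroup": True, "Lightning": True,
                  "Thunder": True, "Campfire": False, "ForestFire": True}
print(joint_probability(one_assignment))
```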
Example:
Consider the node Campfire. The network nodes and arcs represent the assertion that Campfire is conditionally independent of its non-descendants Lightning and Thunder, given its immediate parents Storm and BusTourGroup. This means that once we know the values of the variables Storm and BusTourGroup, the variables Lightning and Thunder provide no additional information about Campfire.
Consider the conditional probability table associated with the variable Campfire. One entry in this table expresses the assertion that P(Campfire = True | Storm = True, BusTourGroup = True) = 0.4.
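The conditional independence assertion can also be verified by brute force: build the full joint from the CPTs and check that conditioning additionally on Lightning and Thunder does not change the posterior of Campfire once Storm and BusTourGroup are fixed. In this sketch ForestFire is omitted for brevity, and all CPT values other than the 0.4 entry are assumed.

```python
# Sketch: verifying that Campfire is conditionally independent of Lightning
# and Thunder given Storm and BusTourGroup, by summing over the full joint.
# ForestFire is omitted for brevity; all values except the 0.4 entry are
# assumed placeholders.
from itertools import product

p_storm, p_bus = 0.3, 0.2                      # assumed priors
p_lightning = {True: 0.8, False: 0.1}          # P(Lightning=T | Storm)
p_thunder = {True: 0.9, False: 0.05}           # P(Thunder=T | Lightning)
p_campfire = {(True, True): 0.4, (True, False): 0.1,
              (False, True): 0.8, (False, False): 0.2}

def b(p_true, x):
    """Probability of a boolean outcome x under P(True) = p_true."""
    return p_true if x else 1.0 - p_true

# Full joint over (Storm, BusTourGroup, Lightning, Thunder, Campfire).
joint = {(s, g, l, t, c): (b(p_storm, s) * b(p_bus, g) *
                           b(p_lightning[s], l) * b(p_thunder[l], t) *
                           b(p_campfire[(s, g)], c))
         for s, g, l, t, c in product([True, False], repeat=5)}

def p_campfire_given(s, g, l=None, t=None):
    """P(Campfire=True | evidence), summing out unobserved variables."""
    match = lambda k: (k[0] == s and k[1] == g and
                       (l is None or k[2] == l) and
                       (t is None or k[3] == t))
    num = sum(p for k, p in joint.items() if match(k) and k[4])
    den = sum(p for k, p in joint.items() if match(k))
    return num / den

# Lightning and Thunder add no information once Storm and BusTourGroup
# are known; both queries return 0.4, the CPT entry from the text.
assert abs(p_campfire_given(True, True) -
           p_campfire_given(True, True, l=False, t=True)) < 1e-12
print(p_campfire_given(True, True))
```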
Inference
- Use a Bayesian network to infer the value of some target variable (e.g., ForestFire) given the observed values of the other variables.
- Inference can be straightforward if values for all of the other variables in the network are known exactly.
- A Bayesian network can be used to compute the probability distribution for any subset of network variables given the values or distributions for any subset of the remaining variables.
- Exact inference of probabilities for an arbitrary Bayesian network is known to be NP-hard.
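A minimal sketch of exact inference by enumeration follows: the posterior for a target variable is obtained by summing the joint over all assignments consistent with the evidence. A three-variable network with assumed probabilities keeps the example small; the loop over 2^n assignments is the intuition behind why exact inference does not scale to arbitrary networks.

```python
# Sketch: exact inference by brute-force enumeration. The posterior for a
# target variable is obtained by summing the joint over all assignments
# consistent with the evidence. All probabilities here are assumed.
from itertools import product

names = ["Storm", "Lightning", "ForestFire"]
p_s = 0.3                                      # P(Storm=T), assumed
p_l = {True: 0.8, False: 0.1}                  # P(Lightning=T | Storm)
p_f = {(True, True): 0.7, (True, False): 0.2,  # P(ForestFire=T | S, L)
       (False, True): 0.4, (False, False): 0.01}

def b(p_true, x):
    return p_true if x else 1.0 - p_true

joint = {(s, l, f): b(p_s, s) * b(p_l[s], l) * b(p_f[(s, l)], f)
         for s, l, f in product([True, False], repeat=3)}

def query(target, evidence):
    """P(target=True | evidence), with evidence a dict of observed values."""
    idx = {n: i for i, n in enumerate(names)}
    consistent = lambda a: all(a[idx[n]] == v for n, v in evidence.items())
    den = sum(p for a, p in joint.items() if consistent(a))
    num = sum(p for a, p in joint.items() if consistent(a) and a[idx[target]])
    return num / den

print(query("ForestFire", {"Storm": True}))   # posterior given the evidence
```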
Learning Bayesian Belief Networks
Effective algorithms for learning Bayesian belief networks from training data can be devised by considering several different settings for the learning problem:
Ø First, the network structure might be given in advance, or it might have to be inferred from the training data.
Ø Second, all the network variables might be directly observable in each training example, or some might be unobservable.
- In the case where the network structure is given in advance and the variables are fully observable in the training examples, learning the conditional probability tables is straightforward: we simply estimate the conditional probability table entries from their relative frequencies over the training examples, just as we would for a naive Bayes classifier (see the counting sketch after this list).
- In the case where the network structure is given but only some of the variable values are observable in the training data, the learning problem is more difficult. It is analogous to learning the weights for the hidden units of an artificial neural network, and it can similarly be attacked by gradient ascent, as described in the next section.
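For the fully observable case, the counting sketch below estimates the entries of a single CPT, P(Campfire | Storm, BusTourGroup), as relative frequencies; the training tuples are invented for illustration.

```python
# Sketch: estimating CPT entries by relative frequency when the structure is
# known and every variable is observed. Here the single table
# P(Campfire=True | Storm, BusTourGroup) is estimated from invented
# training tuples of the form (Storm, BusTourGroup, Campfire).
from collections import Counter

data = [
    (True, True, True), (True, True, False), (True, True, True),
    (True, False, False), (False, True, True), (False, True, True),
    (False, False, False), (True, True, False), (False, True, False),
]

seen = Counter()    # how often each parent configuration occurs
fires = Counter()   # ... and how often Campfire is True alongside it
for storm, bus, campfire in data:
    seen[(storm, bus)] += 1
    if campfire:
        fires[(storm, bus)] += 1

# Maximum-likelihood estimate of each table entry:
cpt = {pv: fires[pv] / n for pv, n in seen.items()}
print(cpt[(True, True)])  # estimate of P(Campfire=T | Storm=T, Bus=T) = 0.5
```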
Gradient Ascent Training of Bayesian Network
The gradient ascent rule maximizes P(D|h) by following the gradient of ln P(D|h) with respect to the parameters wijk that define the conditional probability tables of the Bayesian network.
Let wijk denote a single entry in one of the conditional probability tables. In particular, let wijk denote the conditional probability that the network variable Yi will take on the value yij given that its immediate parents Ui take on the values given by uik.
The gradient of ln P(D|h) is given by the derivatives ∂ ln P(D|h)/∂wijk for each of the wijk. Writing Ph(D) as an abbreviation for P(D|h), and assuming the training examples d in the data set D are drawn independently, each of these derivatives can be calculated as

∂ ln Ph(D)/∂wijk = Σd∈D Ph(Yi = yij, Ui = uik | d) / wijk

for all i, j, and k.
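A minimal sketch of one gradient ascent step on a single CPT follows. It assumes fully observed training examples, so Ph(yij, uik | d) reduces to a 0/1 indicator of whether example d matches (yij, uik); with hidden variables this term would instead be computed by inference over the network. The data, initial weights, and learning rate are all illustrative.

```python
# Sketch: one gradient ascent step on a single CPT. Training examples are
# assumed fully observed, so Ph(yij, uik | d) reduces to a 0/1 indicator of
# whether example d matches (yij, uik); with hidden variables this term
# would instead be computed by inference over the network.

eta = 0.05  # learning rate, illustrative

# w[(yj, uk)] = P(Y = yj | Parent = uk) for one boolean variable Y with a
# single boolean parent; initialized uniformly.
w = {(yj, uk): 0.5 for yj in (True, False) for uk in (True, False)}

# Fully observed (y, u) pairs drawn from the training data (invented).
data = [(True, True), (True, True), (False, True), (True, False),
        (False, False), (False, False), (True, True)]

# Gradient step: wijk <- wijk + eta * sum over d of Ph(yij, uik | d) / wijk.
for key in w:
    grad = sum(1.0 / w[key] for d in data if d == key)
    w[key] += eta * grad

# Renormalize so each conditional distribution sums to one; the raw update
# does not by itself keep the weights valid probabilities.
for uk in (True, False):
    total = w[(True, uk)] + w[(False, uk)]
    w[(True, uk)] /= total
    w[(False, uk)] /= total

print(w)
```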