BAYESIAN BELIEF NETWORKS

  • The naive Bayes classifier makes significant use of the assumption that the values of the attributes a1 . . . an are conditionally independent given the target value v.
  • This assumption dramatically reduces the complexity of learning the target function, as quantified below.
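For example, with n boolean attributes a1 . . . an and a boolean target value v, tabulating the full conditional distribution P(a1, . . . , an | v) directly would require on the order of 2(2^n - 1) independent entries, whereas the naive Bayes factorization

P(a1, . . . , an | v) = P(a1 | v) P(a2 | v) . . . P(an | v)

requires only about 2n entries, one value P(ai | v) per attribute and per target value.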

A Bayesian belief network describes the probability distribution governing a set of variables by specifying a set of conditional independence assumptions along with a set of conditional probabilities

Bayesian belief networks allow stating conditional independence assumptions that apply to subsets of the variables

 

Notation

  • Consider an arbitrary set of random variables Y1 . . . Yn , where each variable Yi can take on the set of possible values V(Yi).
  • The joint space of the set of variables Y is defined to be the cross product V(Y1) x V(Y2) x . . . x V(Yn).
  • In other words, each item in the joint space corresponds to one of the possible assignments of values to the tuple of variables (Y1 . . . Yn). The probability distribution over this joint space is called the joint probability distribution; a small illustration is given after this list.
  • The joint probability distribution specifies the probability for each of the possible variable bindings for the tuple (Y1 . . . Yn).
  • A Bayesian belief network describes the joint probability distribution for a set of variables. 
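As a small illustration (with variables not taken from the text), for two boolean variables Y1 and Y2 the joint space contains four tuples, and the joint probability distribution assigns one probability to each of them:

from itertools import product

# Joint space of two boolean variables Y1 and Y2: the cross product V(Y1) x V(Y2).
# A joint probability distribution would assign one probability to each tuple below.
joint_space = list(product((True, False), (True, False)))
print(joint_space)   # [(True, True), (True, False), (False, True), (False, False)]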

Conditional Independence 

Let X, Y, and Z be three discrete-valued random variables. X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given a value for Z; that is, if

(∀ xi, yj, zk)  P(X = xi | Y = yj, Z = zk) = P(X = xi | Z = zk)

where xi ∈ V(X), yj ∈ V(Y), and zk ∈ V(Z).
The above expression is written in abbreviated form as 

P(X | Y, Z) = P(X | Z) 


Conditional independence can be extended to sets of variables. The set of variables X1 . . . Xl is conditionally independent of the set of variables Y1 . . . Ym given the set of variables Z1 . . . Zn if

P(X1 . . . Xl | Y1 . . . Ym, Z1 . . . Zn) = P(X1 . . . Xl | Z1 . . . Zn)
The naive Bayes classifier assumes that the instance attribute A1 is conditionally independent of instance attribute A2 given the target value V. This allows the naive Bayes classifier to calculate P(A1, A2 | V) as follows:

P(A1, A2 | V) = P(A1 | A2, V) P(A2 | V) = P(A1 | V) P(A2 | V)

where the second equality uses the conditional independence of A1 and A2 given V.
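The following minimal Python sketch (all probability values are made up for illustration) builds a joint distribution in which A1 and A2 are conditionally independent given V, and checks numerically that P(A1, A2 | V) = P(A1 | V) P(A2 | V) for every combination of values:

from itertools import product

# Assumed distributions: P(V), P(A1 | V), P(A2 | V); all numbers are illustrative.
p_v  = {True: 0.3, False: 0.7}
p_a1 = {True: {True: 0.9, False: 0.1},
        False: {True: 0.2, False: 0.8}}
p_a2 = {True: {True: 0.6, False: 0.4},
        False: {True: 0.5, False: 0.5}}

def joint(a1, a2, v):
    # Joint built so that A1 and A2 are conditionally independent given V.
    return p_v[v] * p_a1[v][a1] * p_a2[v][a2]

for a1, a2, v in product((True, False), repeat=3):
    lhs = joint(a1, a2, v) / p_v[v]        # P(A1, A2 | V)
    rhs = p_a1[v][a1] * p_a2[v][a2]        # P(A1 | V) * P(A2 | V)
    assert abs(lhs - rhs) < 1e-12
print("P(A1, A2 | V) = P(A1 | V) P(A2 | V) holds for every value combination")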

Representation 

A Bayesian belief network represents the joint probability distribution for a set of variables. Bayesian networks (BN) are represented by directed acyclic graphs.


The Bayesian network in the figure above represents the joint probability distribution over the boolean variables Storm, Lightning, Thunder, ForestFire, Campfire, and BusTourGroup.

 

A Bayesian network (BN) represents the joint probability distribution by specifying a set of conditional independence assumptions

  • BN represented by a directed acyclic graph, together with sets of local conditional probabilities
  • Each variable in the joint space is represented by a node in the Bayesian network
  • The network arcs represent the assertion that each variable is conditionally independent of its non-descendants in the network, given its immediate predecessors in the network.
  • A conditional probability table (CPT) is given for each variable, describing the probability distribution for that variable given the values of its immediate predecessors 

The joint probability for any desired assignment of values (y1, . . . , yn) to the tuple of network variables (Y1 . . . Yn) can be computed by the formula

P(y1, . . . , yn) = ∏ i=1..n  P(yi | Parents(Yi))

where Parents(Yi) denotes the set of immediate predecessors of Yi in the network. The values P(yi | Parents(Yi)) are precisely the values stored in the conditional probability table associated with node Yi.
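As a minimal illustration, the Python sketch below applies this product formula to a three-node fragment of the forest-fire network, with Storm and BusTourGroup as parentless nodes and Campfire as a child of both. All CPT values are assumed for illustration, apart from the entry P(Campfire = True | Storm = True, BusTourGroup = True) = 0.4 quoted later in the text:

# Joint probability as the product of each node's CPT entry given its parents.
# Three-node fragment: Storm, BusTourGroup (no parents) -> Campfire.
# All numbers are illustrative assumptions, except the 0.4 entry mentioned in the text.

p_storm = {True: 0.2, False: 0.8}                    # assumed P(Storm)
p_bus   = {True: 0.5, False: 0.5}                    # assumed P(BusTourGroup)
p_camp  = {                                          # P(Campfire | Storm, BusTourGroup)
    (True, True):   {True: 0.4, False: 0.6},         # 0.4 is the entry quoted in the text
    (True, False):  {True: 0.1, False: 0.9},         # assumed
    (False, True):  {True: 0.8, False: 0.2},         # assumed
    (False, False): {True: 0.2, False: 0.8},         # assumed
}

def joint(storm, bus, campfire):
    # P(Storm, BusTourGroup, Campfire) = P(Storm) P(BusTourGroup) P(Campfire | Storm, BusTourGroup)
    return p_storm[storm] * p_bus[bus] * p_camp[(storm, bus)][campfire]

print(joint(True, True, True))   # 0.2 * 0.5 * 0.4 = 0.04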


Example:

Consider the node Campfire. The network nodes and arcs represent the assertion that Campfire is conditionally independent of its non-descendants Lightning and Thunder, given its immediate parents Storm and BusTourGroup


This means that once we know the value of the variables Storm and BusTourGroup, the variables Lightning and Thunder provide no additional information about Campfire

The conditional probability table associated with the variable Campfire gives the distribution of Campfire for each combination of values of its parents Storm and BusTourGroup; for example, it asserts that P(Campfire = True | Storm = True, BusTourGroup = True) = 0.4.

Inference

  • Use a Bayesian network to infer the value of some target variable (e.g., ForestFire) given the observed values of the other variables.
  • Inference can be straightforward if values for all of the other variables in the network are known exactly.
  • A Bayesian network can be used to compute the probability distribution for any subset of network variables given the values or distributions for any subset of the remaining variables.
  • Exact inference in an arbitrary Bayesian network is known to be NP-hard; for small networks, however, exact inference by brute-force enumeration is feasible, as sketched below.
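A minimal Python sketch of inference by enumeration on the same three-node fragment used earlier (Storm, BusTourGroup -> Campfire), again with assumed CPT values: to answer the query P(Campfire | BusTourGroup = True), the joint probability is summed over the unobserved variable Storm and then normalized.

# Exact inference by enumeration on the three-node fragment; CPT values are assumed.
p_storm = {True: 0.2, False: 0.8}
p_bus   = {True: 0.5, False: 0.5}
p_camp  = {(True, True):   {True: 0.4, False: 0.6},
           (True, False):  {True: 0.1, False: 0.9},
           (False, True):  {True: 0.8, False: 0.2},
           (False, False): {True: 0.2, False: 0.8}}

def joint(storm, bus, campfire):
    return p_storm[storm] * p_bus[bus] * p_camp[(storm, bus)][campfire]

def query_campfire(bus_obs):
    # Unnormalized P(Campfire = c, BusTourGroup = bus_obs), summing out Storm,
    # then normalized to give the posterior P(Campfire | BusTourGroup = bus_obs).
    scores = {c: sum(joint(s, bus_obs, c) for s in (True, False)) for c in (True, False)}
    z = sum(scores.values())
    return {c: scores[c] / z for c in scores}

print(query_campfire(True))   # e.g. {True: 0.72, False: 0.28} with the assumed numbers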

Learning Bayesian Belief Networks 

Effective algorithms can be devised for learning Bayesian belief networks from training data by considering several different settings for the learning problem:

  • First, the network structure might be given in advance, or it might have to be inferred from the training data.

  • Second, all the network variables might be directly observable in each training example, or some might be unobservable.

  • In the case where the network structure is given in advance and the variables are fully observable in the training examples, learning the conditional probability tables is straightforward: simply estimate the conditional probability table entries as relative frequencies over the training examples, as sketched after this list.
  • In the case where the network structure is given but only some of the variable values are observable in the training data, the learning problem is more difficult; it can be compared to learning the weights of the hidden units in an artificial neural network (ANN), whose values are not observed in the training examples.
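A minimal Python sketch of the fully observable case, estimating the table P(Campfire | Storm, BusTourGroup) as relative frequencies over a small, made-up set of fully observed training tuples:

from collections import Counter

# Each training example is a fully observed tuple (storm, bus_tour_group, campfire).
# The data below is invented purely to illustrate relative-frequency estimation.
data = [(True, True, True), (True, True, False), (False, True, True),
        (False, False, False), (True, False, False), (False, True, True)]

joint_counts  = Counter((s, b, c) for s, b, c in data)   # counts of (parents, child) combinations
parent_counts = Counter((s, b) for s, b, _ in data)      # counts of parent configurations

def cpt_entry(storm, bus, campfire):
    # Estimated P(Campfire = campfire | Storm = storm, BusTourGroup = bus);
    # unseen parent configurations would need smoothing, which is omitted here.
    return joint_counts[(storm, bus, campfire)] / parent_counts[(storm, bus)]

print(cpt_entry(True, True, True))   # fraction of Storm & BusTourGroup examples with a campfire (0.5 here)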

Gradient Ascent Training of Bayesian Network 

The gradient ascent rule maximizes P(D|h) by following the gradient of ln P(D|h) with respect to the parameters that define the conditional probability tables of the Bayesian network.

 

Let wijk denote a single entry in one of the conditional probability tables. In particular, let wijk denote the conditional probability that the network variable Yi will take on the value yij, given that its immediate parents Ui take on the values given by uik; that is, wijk = P(Yi = yij | Ui = uik).

The gradient of ln P(D|h) is given by the derivatives ∂ ln P(D|h) / ∂wijk for each of the wijk. As shown below, each of these derivatives can be calculated as

∂ ln P(D|h) / ∂wijk = Σ d∈D  P(Yi = yij, Ui = uik | d) / wijk

To derive the gradient defined by the set of derivatives ∂ ln P(D|h) / ∂wijk for all i, j, and k, write Ph(D) as an abbreviation for P(D|h). Assuming the training examples d in the data set D are drawn independently, this derivative can be written as

∂ ln Ph(D) / ∂wijk = ∂/∂wijk ln ∏ d∈D Ph(d) = Σ d∈D ∂ ln Ph(d) / ∂wijk
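Gradient ascent then repeatedly moves each weight a small step in the direction of this derivative, wijk ← wijk + η Σ d∈D P(Yi = yij, Ui = uik | d) / wijk, and renormalizes the weights so that each table row remains a valid probability distribution. The Python sketch below illustrates these mechanics on a tiny assumed network with one hidden boolean parent U and one observed boolean child Y; only the child's table is trained, and the posterior needed by the update is obtained by enumerating the single hidden variable. This is a hedged illustration of the update, not a reproduction of any particular implementation.

# Tiny assumed network: hidden boolean U -> observed boolean Y.
# w[u][y] stores the trainable CPT entry P(Y = y | U = u); p_u is a fixed assumed prior on U.
p_u = {True: 0.6, False: 0.4}
w = {True:  {True: 0.5, False: 0.5},
     False: {True: 0.5, False: 0.5}}

data = [True, True, False, True]   # observed values of Y; U is never observed
eta = 0.05                         # learning rate

def posterior_joint(y, u, y_obs):
    # P(Y = y, U = u | d) for an example d that observes only Y = y_obs.
    if y != y_obs:
        return 0.0
    z = sum(p_u[u2] * w[u2][y_obs] for u2 in (True, False))   # P(Y = y_obs)
    return p_u[u] * w[u][y_obs] / z

for step in range(100):
    # Gradient step: w_uy <- w_uy + eta * sum_d P(Y = y, U = u | d) / w_uy
    grad = {u: {y: sum(posterior_joint(y, u, y_obs) for y_obs in data) / w[u][y]
                for y in (True, False)}
            for u in (True, False)}
    for u in (True, False):
        for y in (True, False):
            w[u][y] += eta * grad[u][y]
        # Renormalize so that the row sums to 1 and stays a valid distribution.
        total = w[u][True] + w[u][False]
        w[u][True], w[u][False] = w[u][True] / total, w[u][False] / total

print(w)   # trained estimate of P(Y | U) under the assumed prior and data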





