
CANDIDATE-ELIMINATION Learning Algorithm

The CANDIDATE-ELIMINATION algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.


Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do

    If d is a positive example
        Remove from G any hypothesis inconsistent with d
        For each hypothesis s in S that is not consistent with d
            Remove s from S
            Add to S all minimal generalizations h of s such that
                h is consistent with d, and some member of G is more general than h
            Remove from S any hypothesis that is more general than another hypothesis in S

    If d is a negative example
        Remove from S any hypothesis inconsistent with d
        For each hypothesis g in G that is not consistent with d
            Remove g from G
            Add to G all minimal specializations h of g such that
                h is consistent with d, and some member of S is more specific than h
            Remove from G any hypothesis that is less general than another hypothesis in G


The CANDIDATE-ELIMINATION algorithm using version spaces
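The pseudocode above maps naturally onto a small program. The following is a minimal Python sketch, not from the source: it assumes conjunctive hypotheses represented as tuples whose entries are a specific attribute value, '?' (any value), or None (standing in for the empty constraint), and the function names are illustrative choices.

def matches(h, x):
    # A hypothesis covers an instance if every constraint is '?' or equals
    # the instance's value; None (the empty constraint) covers nothing.
    return all(a is not None and (a == '?' or a == v) for a, v in zip(h, x))

def more_general(h1, h2):
    # h1 >= h2 in the generality ordering: every instance h2 covers, h1 covers.
    return all(a == '?' or b is None or a == b for a, b in zip(h1, h2))

def min_generalize(s, x):
    # Minimal generalization of s that covers the positive instance x.
    return tuple(v if a is None else a if a in ('?', v) else '?'
                 for a, v in zip(s, x))

def min_specialize(g, x, domains):
    # Minimal specializations of g that exclude the negative instance x:
    # replace one '?' with any attribute value that differs from x's.
    out = []
    for i, (a, v) in enumerate(zip(g, x)):
        if a == '?':
            for val in domains[i]:
                if val != v:
                    out.append(g[:i] + (val,) + g[i + 1:])
    return out

def candidate_elimination(examples, domains):
    n = len(domains)
    S = {(None,) * n}   # most specific boundary
    G = {('?',) * n}    # most general boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}
            for s in [s for s in S if not matches(s, x)]:
                S.remove(s)
                h = min_generalize(s, x)  # already consistent with x
                if any(more_general(g, h) for g in G):
                    S.add(h)
            # drop any S member strictly more general than another S member
            S = {s for s in S if not any(t != s and more_general(s, t) for t in S)}
        else:
            S = {s for s in S if not matches(s, x)}
            for g in [g for g in G if matches(g, x)]:
                G.remove(g)
                for h in min_specialize(g, x, domains):  # each excludes x
                    if any(more_general(h, s) for s in S):
                        G.add(h)
            # drop any G member strictly less general than another G member
            G = {g for g in G if not any(t != g and more_general(t, g) for t in G)}
    return S, G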

 An Illustrative Example

Example   Sky     AirTemp   Humidity   Wind     Water   Forecast   EnjoySport
1         Sunny   Warm      Normal     Strong   Warm    Same       Yes
2         Sunny   Warm      High       Strong   Warm    Same       Yes
3         Rainy   Cold      High       Strong   Warm    Change     No
4         Sunny   Warm      High       Strong   Cool    Change     Yes

The CANDIDATE-ELIMINATION algorithm begins by initializing the version space to the set of all hypotheses in H:

         Initializing the G boundary set to contain the most general hypothesis in H

G0 ← ⟨?, ?, ?, ?, ?, ?⟩

         Initializing the S boundary set to contain the most specific (least general) hypothesis

S0 ← ⟨∅, ∅, ∅, ∅, ∅, ∅⟩
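In the Python sketch above, these initial boundaries correspond to G = {('?',) * 6} and S = {(None,) * 6}, with None playing the role of the empty constraint ∅.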

  • When the first training example is presented, the CANDIDATE-ELIMINATION algorithm checks the S boundary and finds that it is overly specific: it fails to cover the positive example.
  • The boundary is therefore revised by moving it to the least general hypothesis that covers this new example.
  • No update of the G boundary is needed in response to this training example, because G0 already correctly covers it. The resulting boundaries are shown below.
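Applying the update rules to the first example, ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩, gives:

S1 = {⟨Sunny, Warm, Normal, Strong, Warm, Same⟩}
G1 = G0 = {⟨?, ?, ?, ?, ?, ?⟩}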

  • When the second training example is observed, it has a similar effect of generalizing S further to S2, leaving G again unchanged (G2 = G1 = G0).
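Generalizing S1 just enough to cover the second example yields:

S2 = {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}
G2 = G1 = G0 = {⟨?, ?, ?, ?, ?, ?⟩}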

  • Consider the third training example. This negative example reveals that the G boundary of the version space is overly general; that is, the hypothesis in G incorrectly predicts that this new example is positive.
  • The hypothesis in the G boundary must therefore be specialized until it correctly classifies this new negative example.

Given that there are six attributes that could be specified to specialize G2, why are there only three new hypotheses in G3?

For example, the hypothesis h = ⟨?, ?, Normal, ?, ?, ?⟩ is a minimal specialization of G2 that correctly labels the new example as negative, but it is not included in G3. This hypothesis is excluded because it is inconsistent with the previously encountered positive examples.
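The three surviving minimal specializations form G3, while the S boundary is unaffected by the negative example:

S3 = S2 = {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}
G3 = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩, ⟨?, ?, ?, ?, ?, Same⟩}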

  • Consider the fourth training example.
  • This positive example further generalizes the S boundary of the version space. It also removes one member of the G boundary, because that member fails to cover the new positive example. The resulting boundaries are shown below.
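Generalizing S3 to cover the fourth example, and dropping ⟨?, ?, ?, ?, ?, Same⟩ from G because it fails to cover that example, gives:

S4 = {⟨Sunny, Warm, ?, Strong, ?, ?⟩}
G4 = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}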

After processing these four examples, the boundary sets S4 and G4 delimit the version space of all hypotheses consistent with the set of incrementally observed training examples.
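Running the sketch given after the pseudocode on these four examples reproduces S4 and G4. The example list below is taken from the table above; deriving each attribute's domain from the data is a simplifying assumption, since the full domains may contain unseen values (for instance, Cloudy for Sky).

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
domains = [sorted({x[i] for x, _ in examples}) for i in range(6)]

S, G = candidate_elimination(examples, domains)
print(S)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}  i.e. S4
print(G)  # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}  i.e. G4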




