
CANDIDATE-ELIMINATION Learning Algorithm

The CANDIDATE-ELIMINATION algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.


Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do

    If d is a positive example
        Remove from G any hypothesis inconsistent with d
        For each hypothesis s in S that is not consistent with d
            Remove s from S
            Add to S all minimal generalizations h of s such that
                h is consistent with d, and some member of G is more general than h
            Remove from S any hypothesis that is more general than another hypothesis in S

    If d is a negative example
        Remove from S any hypothesis inconsistent with d
        For each hypothesis g in G that is not consistent with d
            Remove g from G
            Add to G all minimal specializations h of g such that
                h is consistent with d, and some member of S is more specific than h
            Remove from G any hypothesis that is less general than another hypothesis in G


The CANDIDATE-ELIMINATION algorithm using version spaces
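The pseudocode above maps naturally onto a small program. The following is a minimal Python sketch, not from the source: it assumes conjunctive hypotheses represented as tuples whose entries are a specific attribute value, '?' (any value), or None (standing in for the empty constraint), and the function names are illustrative choices.

def matches(h, x):
    # A hypothesis covers an instance if every constraint is '?' or equals
    # the instance's value; None (the empty constraint) covers nothing.
    return all(a is not None and (a == '?' or a == v) for a, v in zip(h, x))

def more_general(h1, h2):
    # h1 >= h2 in the generality ordering: every instance h2 covers, h1 covers.
    return all(a == '?' or b is None or a == b for a, b in zip(h1, h2))

def min_generalize(s, x):
    # Minimal generalization of s that covers the positive instance x.
    return tuple(v if a is None else a if a in ('?', v) else '?'
                 for a, v in zip(s, x))

def min_specialize(g, x, domains):
    # Minimal specializations of g that exclude the negative instance x:
    # replace one '?' with any attribute value that differs from x's.
    out = []
    for i, (a, v) in enumerate(zip(g, x)):
        if a == '?':
            for val in domains[i]:
                if val != v:
                    out.append(g[:i] + (val,) + g[i + 1:])
    return out

def candidate_elimination(examples, domains):
    n = len(domains)
    S = {(None,) * n}   # most specific boundary
    G = {('?',) * n}    # most general boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}
            for s in [s for s in S if not matches(s, x)]:
                S.remove(s)
                h = min_generalize(s, x)  # already consistent with x
                if any(more_general(g, h) for g in G):
                    S.add(h)
            # drop any S member strictly more general than another S member
            S = {s for s in S if not any(t != s and more_general(s, t) for t in S)}
        else:
            S = {s for s in S if not matches(s, x)}
            for g in [g for g in G if matches(g, x)]:
                G.remove(g)
                for h in min_specialize(g, x, domains):  # each excludes x
                    if any(more_general(h, s) for s in S):
                        G.add(h)
            # drop any G member strictly less general than another G member
            G = {g for g in G if not any(t != g and more_general(t, g) for t in G)}
    return S, G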

 An Illustrative Example

Example   Sky     AirTemp   Humidity   Wind     Water   Forecast   EnjoySport
1         Sunny   Warm      Normal     Strong   Warm    Same       Yes
2         Sunny   Warm      High       Strong   Warm    Same       Yes
3         Rainy   Cold      High       Strong   Warm    Change     No
4         Sunny   Warm      High       Strong   Cool    Change     Yes

The CANDIDATE-ELIMINATION algorithm begins by initializing the version space to the set of all hypotheses in H:

         Initializing the G boundary set to contain the most general hypothesis in H

G0 ← ⟨?, ?, ?, ?, ?, ?⟩

         Initializing the S boundary set to contain the most specific (least general) hypothesis

S0 ← ⟨∅, ∅, ∅, ∅, ∅, ∅⟩
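In the Python sketch above, these initial boundaries correspond to G = {('?',) * 6} and S = {(None,) * 6}, with None playing the role of the empty constraint ∅.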

  • When the first training example is presented, the CANDIDATE-ELIMINATION algorithm checks the S boundary and finds that it is overly specific: it fails to cover the positive example.
  • The boundary is therefore revised by moving it to the least general hypothesis that covers this new example.
  • No update of the G boundary is needed in response to this training example, because G0 already correctly covers it. The resulting boundaries are shown below.
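Applying the update rules to the first example, ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩, gives:

S1 = {⟨Sunny, Warm, Normal, Strong, Warm, Same⟩}
G1 = G0 = {⟨?, ?, ?, ?, ?, ?⟩}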

  • When the second training example is observed, it has a similar effect of generalizing S further to S2, leaving G again unchanged (G2 = G1 = G0).
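Generalizing S1 just enough to cover the second example yields:

S2 = {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}
G2 = G1 = G0 = {⟨?, ?, ?, ?, ?, ?⟩}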

  • Consider the third training example. This negative example reveals that the G boundary of the version space is overly general; that is, the hypothesis in G incorrectly predicts that this new example is positive.
  • The hypothesis in the G boundary must therefore be specialized until it correctly classifies this new negative example.

Given that there are six attributes that could be specified to specialize G2, why are there only three new hypotheses in G3?

For example, the hypothesis h = ⟨?, ?, Normal, ?, ?, ?⟩ is a minimal specialization of G2 that correctly labels the new example as negative, but it is not included in G3. This hypothesis is excluded because it is inconsistent with the previously encountered positive examples.
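The three surviving minimal specializations form G3, while the S boundary is unaffected by the negative example:

S3 = S2 = {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}
G3 = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩, ⟨?, ?, ?, ?, ?, Same⟩}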

  • Consider the fourth training example.
  • This positive example further generalizes the S boundary of the version space. It also removes one member of the G boundary, because that member fails to cover the new positive example. The resulting boundaries are shown below.
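Generalizing S3 to cover the fourth example, and dropping ⟨?, ?, ?, ?, ?, Same⟩ from G because it fails to cover that example, gives:

S4 = {⟨Sunny, Warm, ?, Strong, ?, ?⟩}
G4 = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}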

After processing these four examples, the boundary sets S4 and G4 delimit the version space of all hypotheses consistent with the set of incrementally observed training examples.
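Running the sketch given after the pseudocode on these four examples reproduces S4 and G4. The example list below is taken from the table above; deriving each attribute's domain from the data is a simplifying assumption, since the full domains may contain unseen values (for instance, Cloudy for Sky).

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
domains = [sorted({x[i] for x, _ in examples}) for i in range(6)]

S, G = candidate_elimination(examples, domains)
print(S)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}  i.e. S4
print(G)  # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}  i.e. G4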




