The CANDIDATE-ELIMINATION algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do:
• If d is a positive example
  • Remove from G any hypothesis inconsistent with d
  • For each hypothesis s in S that is not consistent with d
    • Remove s from S
    • Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h
    • Remove from S any hypothesis that is more general than another hypothesis in S
• If d is a negative example
  • Remove from S any hypothesis inconsistent with d
  • For each hypothesis g in G that is not consistent with d
    • Remove g from G
    • Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
    • Remove from G any hypothesis that is less general than another hypothesis in G
The CANDIDATE-ELIMINATION algorithm using version spaces
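The pseudocode above translates directly into a short program. Below is a minimal Python sketch, assuming the usual tuple representation of conjunctive hypotheses in which each entry is a specific attribute value, '?' (any value is acceptable), or '0' (standing for ∅, no value is acceptable). The function names (candidate_elimination, covers, min_generalize, min_specializations) are illustrative choices, not from the text.

```python
def covers(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))


def more_general_or_equal(h1, h2):
    """True if h1 is more general than or equal to h2."""
    return all(a == '?' or b == '0' or a == b for a, b in zip(h1, h2))


def min_generalize(s, x):
    """Minimal generalization of hypothesis s that covers positive instance x."""
    return tuple(xv if sv == '0' else (sv if sv == xv else '?')
                 for sv, xv in zip(s, x))


def min_specializations(g, domains, x):
    """Minimal specializations of hypothesis g that exclude negative instance x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, gv in enumerate(g) if gv == '?'
            for v in domains[i] if v != x[i]]


def candidate_elimination(examples, domains):
    """examples: iterable of (instance tuple, bool); domains: per-attribute value sets."""
    n = len(domains)
    G = {('?',) * n}  # maximally general boundary
    S = {('0',) * n}  # maximally specific boundary
    for x, positive in examples:
        if positive:
            # remove from G any hypothesis inconsistent with x
            G = {g for g in G if covers(g, x)}
            S_new = set()
            for s in S:
                if covers(s, x):
                    S_new.add(s)
                else:
                    h = min_generalize(s, x)
                    # keep h only if some member of G is more general than h
                    if any(more_general_or_equal(g, h) for g in G):
                        S_new.add(h)
            # remove any hypothesis more general than another hypothesis in S
            S = {s for s in S_new
                 if not any(s != t and more_general_or_equal(s, t)
                            for t in S_new)}
        else:
            # remove from S any hypothesis inconsistent with x
            S = {s for s in S if not covers(s, x)}
            G_new = set()
            for g in G:
                if not covers(g, x):
                    G_new.add(g)
                else:
                    for h in min_specializations(g, domains, x):
                        # keep h only if some member of S is more specific than h
                        if any(more_general_or_equal(h, s) for s in S):
                            G_new.add(h)
            # remove any hypothesis less general than another hypothesis in G
            G = {g for g in G_new
                 if not any(g != h and more_general_or_equal(h, g)
                            for h in G_new)}
    return S, G
```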
An Illustrative Example
| Example | Sky | AirTemp | Humidity | Wind | Water | Forecast | EnjoySport |
|---------|-----|---------|----------|------|-------|----------|------------|
| 1 | Sunny | Warm | Normal | Strong | Warm | Same | Yes |
| 2 | Sunny | Warm | High | Strong | Warm | Same | Yes |
| 3 | Rainy | Cold | High | Strong | Warm | Change | No |
| 4 | Sunny | Warm | High | Strong | Cool | Change | Yes |
The CANDIDATE-ELIMINATION algorithm begins by initializing the version space to the set of all hypotheses in H:

G0 = ⟨?, ?, ?, ?, ?, ?⟩
S0 = ⟨∅, ∅, ∅, ∅, ∅, ∅⟩
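In the representation used by the Python sketch above (with '0' standing for ∅), these initial boundary sets would be written as:

```python
G0 = {('?', '?', '?', '?', '?', '?')}  # the single maximally general hypothesis
S0 = {('0', '0', '0', '0', '0', '0')}  # the single maximally specific hypothesis
```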
- When the first training example is presented, the CANDIDATE-ELIMINATION algorithm checks the S boundary and finds that it is overly specific: it fails to cover the positive example.
- The boundary is therefore revised by moving it to the least more general hypothesis that covers this new example.
- No update of the G boundary is needed in response to this training example, because G0 correctly covers this example.
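- Concretely, S1 = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩: the minimal generalization of S0 that covers the example is the example's own attribute values.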
- When the second training example is observed, it has a similar effect, generalizing S further to S2 and again leaving G unchanged, i.e., G2 = G1 = G0.
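- Concretely, S2 = ⟨Sunny, Warm, ?, Strong, Warm, Same⟩: the Humidity constraint is relaxed to ? because the two positive examples disagree on that attribute.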
- Consider the third training example. This negative example reveals that the G boundary of the version space is overly general; that is, the hypothesis in G incorrectly predicts that this new example is a positive example.
- The hypothesis in the G boundary must therefore be specialized until it correctly classifies this new negative example.
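- Specializing G2 against this example yields G3 = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩, ⟨?, ?, ?, ?, ?, Same⟩}, while S3 = S2 (the S boundary already excludes the negative example).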
Given that there are six attributes that could be specified to specialize G2, why are there only three new hypotheses in G3? For example, the hypothesis h = ⟨?, ?, Normal, ?, ?, ?⟩ is a minimal specialization of G2 that correctly labels the new example as a negative example, but it is not included in G3. The reason this hypothesis is excluded is that it is inconsistent with the previously encountered positive examples.
- Consider the fourth training example.
- This positive example further generalizes the S boundary of the version space. It also results in removing one member of the G boundary, because this member fails to cover the new positive example.
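- Concretely, S4 = ⟨Sunny, Warm, ?, Strong, ?, ?⟩ and G4 = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}; the member ⟨?, ?, ?, ?, ?, Same⟩ is dropped from G because this example has Forecast = Change.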
After processing these four examples, the boundary sets S4 and G4 delimit the version space of all hypotheses consistent with the set of incrementally observed training examples.
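As a check on this trace, the candidate_elimination sketch from above can be run on the four examples, with each attribute's domain read off the table. (In the full EnjoySport task some attributes have extra values, such as Cloudy for Sky, but the additional specializations these generate are pruned by the S-boundary check, so the resulting boundary sets are the same.)

```python
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
]
# attribute domains observed in the training data
domains = [sorted({x[i] for x, _ in examples}) for i in range(6)]

S, G = candidate_elimination(examples, domains)
print('S4:', S)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print('G4:', G)  # {('Sunny', '?', '?', '?', '?', '?'),
                 #  ('?', 'Warm', '?', '?', '?', '?')}
```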