The version space is represented by its most general and least general
members. These members form general and specific boundary sets that delimit the
version space within the partially ordered hypothesis space.
Definition: The general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D
is the set of minimally general
(i.e., maximally specific) members of H consistent
with D.
S º{s Î H | Consistent (s, D)Ù(Ø$s' Î H)[(s >gs') Ù Consistent(s', D)]}
Theorem:
Version Space representation theorem
Theorem: Let X be an arbitrary set of instances and Let H be a
set of Boolean-valued hypotheses defined over X. Let c: X →{O, 1} be an
arbitrary target concept defined over X, and
let D be an arbitrary
set of training examples {(x, c(x))). For all X, H, c, and D such that S
and G are well defined,
V SH, D ={ h ÃŽ H | ($s ÃŽ S ) ($g ÃŽ G ) ( g ³gh ³ gs )}
To Prove:
1. Every h satisfying the right hand side of the above expression is in V SH, D
2. Every member of V SH, D satisfies the right-hand side of the expression
Sketch of proof:
- let g, h, s be arbitrary members of G, H, S respectively with g ³g h ³g s
- By the definition of S, s must be satisfied by all positive examples in D. Because h ³g s, h must also be satisfied by all positive examples in D.
- By the definition of G, g cannot be satisfied by any negative example in D, and because g ³g h, h cannot be satisfied by any negative example in D. Because h is satisfied by all positive examples in D and by no negative examples in D, h is consistent with D, and therefore h is a member of VSH,D.
0 Comments